494 Commits

Author SHA1 Message Date
Ethan
acf53bea4d satellite/orders;accounting: Add monthly project download bandwidth rollup
See https://storjlabs.atlassian.net/browse/SM-776

Change-Id: Ifd5cccea43c556fd59822d17344f399cfe9a7164
2020-05-04 15:49:57 +00:00
Egon Elbre
8928399d02 all: rename CreateTables to MigrateToLatest
CreateTables hasn't been quite true for a while now, rename to
MigrateToLatest to be clearer in its behavior.

Change-Id: Ida48e95122a5d9b7a814e922d3698e00024a2ba7
2020-04-30 07:21:17 +00:00
Jessica Grebenschikov
6a6427526b satellite/overlay: remove old updateaddress method
The UpdateAddress method used to be called when storage nodes checked in with the Satellite, but once the contact service was created this method was no longer used. This PR finally removes it.

Change-Id: Ib3f83c8003269671d97d54f21ee69665fa663f24
2020-04-30 06:41:48 +00:00
Moby von Briesen
de366537a8 satellite/satellitedb/overlaycache: fix behavior around gracefully exited nodes
Sometimes nodes who have gracefully exited will still be holding pieces
according to the satellite. This has some unintended side effects
currently, such as nodes getting disqualified after having successfully
exited.
* When the audit reporter attempts to update node stats, do not update
stats (alpha, beta, suspension, disqualification) if the node has
finished graceful exit (audit/reporter_test.go TestGracefullyExitedNotUpdated)
* Treat gracefully exited nodes as "not reputable" so that the repairer
and checker do not count them as healthy (overlay/statdb_test.go
TestKnownUnreliableOrOffline, repair/repair_test.go
TestRepairGracefullyExited)

Change-Id: I1920d60dd35de5b2385a9b06989397628a2f1272
2020-04-28 23:58:43 +00:00
Egon Elbre
85c45cd56f private/dbutil/pgtest: support multiple databases for testing
Currently Cockroach isn't performant for concurrent database setup and
tear-down. Instead of a single instance allow setting multiple potential
connection strings and let the tests pick one connection string
randomly.

This improves test duration by ~10 minutes.

While we are significantly changing how pgtest works, introduce the
helpers PickPostgres and PickCockroach for selecting the database, to
reduce code duplication in multiple places.
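
A minimal sketch of the random selection described above. The function name pickConnStr, the semicolon separator, and the connection strings are assumptions made for illustration; the log does not show how pgtest actually parses its configuration.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// pickConnStr chooses one connection string at random from a
// semicolon-separated list. The separator and the function name are
// assumptions for this sketch, not taken from pgtest.
func pickConnStr(list string) (string, bool) {
	var candidates []string
	for _, s := range strings.Split(list, ";") {
		if s = strings.TrimSpace(s); s != "" {
			candidates = append(candidates, s)
		}
	}
	if len(candidates) == 0 {
		return "", false // nothing configured: the caller would skip the test
	}
	return candidates[rand.Intn(len(candidates))], true
}

func main() {
	// Hypothetical connection strings for two local Cockroach instances.
	list := "cockroach://root@localhost:26257/master;cockroach://root@localhost:26258/master"
	connstr, ok := pickConnStr(list)
	fmt.Println(connstr, ok)
}
```

Spreading tests across several instances this way avoids funnelling all concurrent database setup and tear-down through a single server.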

Change-Id: I8ad171d5c4c8a4fc081ec2ae9bdd0cc948a80619
2020-04-28 21:55:49 +03:00
Natalie Villasana
6f84be133a satellite/metainfo: add MigrateToLatest to PointerDB
In cases like the segment reaper script connecting to the metainfodb,
we don't want a db migration to happen automatically when we call
metainfo.NewStore. This adds a MigrateToLatest method for postgreskv
and cockroachkv, and calls MigrateToLatest in places where NewStore used
to create tables.

Change-Id: I682d0f26d609af0601dfdb32a24866cdf5d32a7e
2020-04-28 17:26:35 +00:00
Egon Elbre
ef913be234 satellite/satellitedb/satellitedbtest: don't use subtest naming
A/B indicates that B is a subtest of A, however in this case they
represent a configuration of the test, not a subtest.

Change-Id: I64eed5d5bcb12759e54fe4b5373f8e88488e50f7
2020-04-27 19:32:09 +03:00
Ivan Fraixedes
03871d17c3 satellite/satellitedb: Update ticket ref
Update a reference to a ticket in a code comment.

Change-Id: Ib82220e94527482c5ca1a58d8614b919d1113ab5
2020-04-27 08:50:41 +00:00
Stefan Benten
d73630fd4a satellite/satellitedb: Ensure we just return bucket usage for buckets that exist (#3863)
2020-04-24 22:25:16 +02:00
Moby von Briesen
720e26d235 satellite/satellitedb/overlaycache: update unknown alpha/beta values properly
Update unknown_audit_reputation_alpha and unknown_audit_reputation_beta.
Add test to verify that BatchUpdateStats properly modifies unknown audit
alpha/beta

Change-Id: I0d5f9cac96a99f64905cf575b772402db0756a9d
2020-04-23 10:40:53 -04:00
Moby von Briesen
72b93f3120 satellite/satellitedb: disqualify suspended nodes when the grace period passes
If a node is suspended and receives an unknown or failing audit,
disqualify them if the grace period (default 1w in production) has
passed.

Migrate the nodes table so any node that is currently suspended gets
unsuspended when the satellite starts up.

Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82
2020-04-22 15:45:00 -04:00
Ethan Adams
60e07f0a8b Revert "satellite/accounting: Remove unnecessary index bucket_bandwidth_rollups_project_id_action_interval_index"
This reverts commit 105dc7acc6.

Reason for revert: Recent changes to the Postgres query plan seem to want to use this index now. Reverting until we have time to analyze what's happening.

Change-Id: I74b4b5a8f15c3850d8a958a29f51dbc80e7c282c
2020-04-22 14:49:04 +00:00
Qweder93
805e328c47 storagenode/heldamount payments removed
Change-Id: I87cc04f43d182a4190a571ef417be85d02db9d34
2020-04-21 17:15:31 +00:00
Ethan
105dc7acc6 satellite/accounting: Remove unnecessary index bucket_bandwidth_rollups_project_id_action_interval_index
See https://storjlabs.atlassian.net/browse/SM-738

Change-Id: I9ba3cc3fbff9f13fc0b95d25feee5a19e5a5c486
2020-04-21 16:43:09 +00:00
Qweder93
6e3585e394 satellite/heldamount/endpoint : GetAllPaystubs added
Change-Id: Ic8cdd9db8b2a68796f9579c7fed2d49d9054bd64
2020-04-19 19:21:54 +03:00
Ethan
4cd86ff780 satellite/accounting: Add index on bucket_bandwidth_rollups for action, interval_start, and project_id
See https://storjlabs.atlassian.net/browse/SM-551 for details

Change-Id: I104c4e87d5aef500cc4a3893817763808f76c484
2020-04-17 19:14:45 +00:00
Jess G
5ea1602ca5 satellite/overlay: add selected node cache (#3846)
* init implementation cache

Change-Id: Ia54a1943e0707a77189bc5f4a9aaa8339c98d99a

* one query to init cache

Change-Id: I7c04b3ae104b553ae23fca372351a4328f632c66

* add monit tracking of cache

Change-Id: I7d209e12c8f32d43708b23bf2126c5d5098e0a07

* add first test

Change-Id: I0646a9349d457a9eb3920f7cd2d62fb72ffc3ab5

* add staleness to cache

Change-Id: If002329bfdd53a4b200ad14dbd2ffc8b280aedb8

* add init test

Change-Id: I3a3d0aa74cfac1d125fa93cb749316ed2a74d5b1

* fix comment

Change-Id: I73353d00ccf0952b38c0f8ef7d1755c15cbfe9d9

* mv to nodeselection pkg

Change-Id: I62487f768296c7a7b597fa398a4c42daf6e9c5b7

* add state to cache

Change-Id: I081e77ec0e16706faee1a267de9a7fa643d6ac11

* add refresh concurrent test

Change-Id: Idcba72508291099f280edc65355273c0acc3d3ce

* add a few more tests

Change-Id: I9422e9eaa22bf01c11f14bdb892ebcf7b3e5e5fb

* fix tests, add min version to select allnodes

Change-Id: I926f41d568951ad4ff70c6d4ceb87abb1e3e5009

* update comments

Change-Id: I6ffe33e245ca65fb523c880cd72e63ce35776eb9

* fixes and rm Init

Change-Id: Ifbe09b668978b5d9af09ca38cb080d02a2154cf4

* fix format

Change-Id: I03cc217e28dc1839190c5c6dbdbb602c132a5a38
2020-04-14 13:50:02 -07:00
Moby von Briesen
d7794a4851 satellite/overlay: hardcode default values for audit alpha/beta
Alpha=1 and beta=0 are the expected first values for any alpha/beta
reputation system we are using in the codebase. So we are removing the
configurability of these values.

Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3
2020-04-14 19:12:40 +00:00
Cameron Ayer
02613407ae satellite/satellitedb: only suspend node if not already suspended
Whenever the node's reputation is updated, if its unknown audit
reputation is below the suspension threshold, its suspension field
is set to the current time. This could overwrite the previous
"suspendedAt" value, resulting in a node that never reaches the end of
its suspension.

Also log whenever a node is disqualified or its suspension status
changes

Change-Id: I5e8c8f1c46f66d79cb279b5b16a84fe03f533deb
2020-04-10 09:37:37 +00:00
Egon Elbre
d86cce202c satellite/satellitedb: use arrays for arguments in node selection
This simplifies the code and makes queries faster:

name                               old time/op  new time/op  delta
SelectStorageNodes-32              7.72ms ± 6%  7.22ms ± 3%  -6.44%  (p=0.016 n=5+5)
SelectNewStorageNodes-32           7.75ms ± 2%  7.37ms ± 1%  -4.89%  (p=0.008 n=5+5)
SelectStorageNodesExclusion-32     16.9ms ± 0%  16.6ms ± 0%  -2.15%  (p=0.008 n=5+5)
SelectNewStorageNodesExclusion-32  17.2ms ± 0%  16.6ms ± 2%  -3.69%  (p=0.008 n=5+5)
FindStorageNodes-32                45.5ms ± 0%  45.1ms ± 1%    ~     (p=0.056 n=5+5)
FindStorageNodesExclusion-32       77.4ms ± 0%  75.9ms ± 0%  -1.91%  (p=0.008 n=5+5)

Change-Id: I38f77f6282b9738e8416113d42c6acb46c03da7b
2020-04-09 21:16:10 +03:00
Egon Elbre
ccf4f9ed2d satellite/satellitedb: node selection code cleanup
Reduce the number of non-method funcs in the package namespace. Also
combine a func to condense the code slightly.

Change-Id: Ifbe728eb8c8ca4c981df648decd259c2097b6b40
2020-04-09 20:41:29 +03:00
Natalie Villasana
cf80b3caf3 satellite/overlay: combine SelectStorageNodes and SelectNewStorageNodes (#3831)
2020-04-09 11:19:44 -04:00
Egon Elbre
11a44cdd88 all: don't depend on gogo/proto directly
Change-Id: I8822dea0d1b7b99e0b828e0373a0308a42dde2be
2020-04-08 17:32:15 +00:00
Egon Elbre
cf26951a5b satellite/satellitedb/pbold: remove dead code
Change-Id: I7464773c20b8f99a601ca9cc4bee804f1ac14cf9
2020-04-08 15:22:31 +03:00
Jeff Wendling
2ded64ba2c satellite/compensation: more fixes to get prod running smoothly
Change-Id: I13a76d9d49222fb10796415a015f224d4084fde3
2020-04-07 10:10:27 +00:00
Jennifer Johnson
1547e791a3 satellitedb: remove free_bandwidth column from nodes table
Change-Id: I9d1d3de9216c6533c1042ef473631721a011d086
2020-04-06 09:30:28 +00:00
Egon Elbre
9200efc61f satellite/satellitedb: fix selecting a nullable string
Change-Id: I59e645966e09da586512c69101691b47055c1e5a
2020-04-03 21:30:20 +03:00
Egon Elbre
6492b13d81 all: remove old uuid
Change-Id: I3a137f73456f010c37d3933dbe12cbbb840b809f
2020-04-02 19:30:36 +03:00
Egon Elbre
8f73fb7a32 all: simplify uuid usage
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.

Replace uses of dbutil.BytesToUUID with uuid.FromBytes.
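
For illustration, a type can be passed to and read from database/sql directly when it implements the standard library's driver.Valuer and sql.Scanner interfaces, which is presumably the mechanism the message refers to. The ID type below is a hypothetical stand-in; the methods of the real uuid.UUID are not shown in this log.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"fmt"
)

// ID is a hypothetical 16-byte identifier standing in for a UUID type.
type ID [16]byte

// Value implements driver.Valuer so an ID can be used as a query argument.
func (id ID) Value() (driver.Value, error) {
	return id[:], nil
}

// Scan implements sql.Scanner so an ID can be used as a Scan destination.
func (id *ID) Scan(src interface{}) error {
	b, ok := src.([]byte)
	if !ok {
		return fmt.Errorf("ID: unsupported source type %T", src)
	}
	if len(b) != len(id) {
		return fmt.Errorf("ID: expected %d bytes, got %d", len(id), len(b))
	}
	copy(id[:], b)
	return nil
}

// Compile-time checks that both interfaces are satisfied.
var (
	_ driver.Valuer = ID{}
	_ sql.Scanner   = (*ID)(nil)
)

func main() {
	src := ID{1, 2, 3}
	v, err := src.Value()
	if err != nil {
		panic(err)
	}
	var dst ID
	if err := dst.Scan(v); err != nil {
		panic(err)
	}
	fmt.Println(dst == src) // true: the value round-trips without a helper
}
```

With both methods in place, conversion helpers of the BytesToUUID kind become unnecessary at call sites.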

Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
2020-04-02 05:48:58 +00:00
Egon Elbre
a416b03941 satellite/accounting: fix TestProjectBandwidthTotal
The test inserted data for the past 4 days, but summed usage over the
current month.

Change-Id: I509afdc6a76b314a6bb90652ab70cd2c2bab1288
2020-04-01 11:50:18 +03:00
Egon Elbre
0a69da4ff1 all: switch to storj.io/common/uuid
Change-Id: I178a0a8dac691e57bce317b91411292fb3c40c9f
2020-03-31 19:16:41 +03:00
Qweder93
dc32f1da55 storagenode/cache/heldamount added, errNoRows ignored
Change-Id: If6b675e622d6c1324c0893c43cca93dc5323cd78
2020-03-31 11:35:58 +00:00
Jeff Wendling
e2ff2ce672 satellite: compensation package and commands
Change-Id: I7fd6399837e45ff48e5f3d47a95192a01d58e125
2020-03-30 14:08:14 -06:00
Jennifer Johnson
d77f3b8786 satellitedb/migrate: set vetted_at backfill to now.day
Change-Id: Ib2b12be43dbd3f3705b1891bc703ae15abb75e09
2020-03-30 16:50:23 +00:00
Egon Elbre
439aba922a satellite/overlay: reduce overhead of GetNodes
Instead of filtering on the client side it's better to filter on the
database side.

Change-Id: I845fbbe5ed28c2ffdb0b8a3f789b59c094fd1069
2020-03-30 18:36:23 +03:00
Egon Elbre
cb781d66c7 satellite/overlay: optimize FindStorageNodes
Reduce the number of fields returned from the query.

Benchmark results in `satellite/overlay`:

benchstat before.txt after2.txt
name                               old time/op  new time/op  delta
SelectStorageNodes-32              7.85ms ± 1%  6.27ms ± 1%  -20.18%  (p=0.002 n=10+4)
SelectNewStorageNodes-32           8.21ms ± 1%  6.61ms ± 0%  -19.53%  (p=0.002 n=10+4)
SelectStorageNodesExclusion-32     17.2ms ± 1%  15.9ms ± 1%   -7.55%  (p=0.002 n=10+4)
SelectNewStorageNodesExclusion-32  17.8ms ± 2%  16.1ms ± 0%   -9.38%  (p=0.002 n=10+4)
FindStorageNodes-32                48.4ms ± 1%  45.1ms ± 0%   -6.69%  (p=0.002 n=10+4)
FindStorageNodesExclusion-32       79.2ms ± 1%  76.1ms ± 1%   -3.89%  (p=0.002 n=10+4)

Benchmark results from `satellite/overlay` after making them parallel:

benchstat before-parallel.txt after2-parallel.txt
name                               old time/op  new time/op  delta
SelectStorageNodes-32               548µs ± 1%   353µs ± 1%  -35.60%  (p=0.029 n=4+4)
SelectNewStorageNodes-32            562µs ± 0%   368µs ± 0%  -34.51%  (p=0.029 n=4+4)
SelectStorageNodesExclusion-32     1.02ms ± 1%  0.84ms ± 0%  -18.08%  (p=0.029 n=4+4)
SelectNewStorageNodesExclusion-32  1.03ms ± 1%  0.86ms ± 2%  -16.22%  (p=0.029 n=4+4)
FindStorageNodes-32                3.11ms ± 0%  2.79ms ± 1%  -10.27%  (p=0.029 n=4+4)
FindStorageNodesExclusion-32       4.75ms ± 0%  4.43ms ± 1%   -6.56%  (p=0.029 n=4+4)

Change-Id: I1d85e2764eb270f4c2b1998303ccfc1179d65b26
2020-03-30 18:36:23 +03:00
Egon Elbre
e1a443b04a private/testplanet: allow modifying created database
Instead of providing the database from outside to testplanet create it
inside and then allow wrapping and modifying it. This is more convenient
to use.

Change-Id: I9b8f69e6e0a19ff984b4e2bfe927c9100c77bc6c
2020-03-27 19:14:48 +00:00
Ethan
df462d7265 satellite/accounting: Add index on bucket_bandwidth_rollups to minimize full table scans
https://storjlabs.atlassian.net/browse/SM-545

Change-Id: I5599a72a991d70236f17beca027e9bc032777177
2020-03-26 19:53:50 +00:00
Jeff Wendling
97e980cd8a private/dbutil: add database name to configure as a tag
storagenodes have like 10 or more databases. without this
tag they all get sent as the same value, stomping on each
other.

Change-Id: Ib12019684d6ea8f2a5b83df584056dfa79e3c4b3
2020-03-26 16:50:15 +00:00
Jennifer Johnson
b75cbc8e24 satellite,storagenode: remove references to free bandwidth
Change-Id: I42a6597544804fa9235e89ec656ebc365eb522e5
2020-03-25 22:28:34 +00:00
Michal Niewrzal
fdf40a7526 storj: remove storj/private/version package which was moved to
`storj/private` repo

Change-Id: I81c3f5b9d5e4fe7bca760999eb045ee9734e5e2e
2020-03-24 14:31:33 +00:00
Jessica Grebenschikov
aeab599d21 satellitedb: removed unused id on storagenode_storage_tallies table, add index on node_id
The goal of this change is to improve the storagenode_storage_tallies table by removing the unneeded id column that is not being used but only taking up space, and also to add an index on a different column that needs it. Removing and adding a column seems simple, but ended up being more complicated because of some cockroachdb limitations.

The cockroachdb limitations when trying to remove a column from a table and create a new primary key are:
1. only allows primary key creation at table creation time (docs: https://www.cockroachlabs.com/docs/stable/primary-key.html)
2. table drop or rename is performed async and cannot be done in a transaction (issue: https://github.com/cockroachdb/cockroach/issues/12123, https://github.com/cockroachdb/cockroach/issues/22868)

To address these differences between cockroachdb and Postgres, this PR performs different migrations for the two databases. The Postgres migration is straightforward and what you would expect, but the cockroach migration has two main changes:
1. To change a primary key, use the recommended process from the cockroachdb docs to create a new table with the new primary key you want and then migrate the data.
2. In order to do 1, we needed to do the new table renaming in a separate transaction from the data migration.

Ref: SM-65

Change-Id: Idc9aee3ab57aa4d5570e3d2980afea853cd966bf
2020-03-20 14:39:44 -07:00
Jennifer Johnson
9b78473c0c satellitedb: adds vetted_at nullable timestamp to nodes table
Change-Id: I42d5a396b4eecbad26b683c6aee51e043d2ff034
2020-03-20 01:37:28 +00:00
Qweder93
0df586c3a8 satellitedb/heldamount updated, tests added + storagenode console updated
Change-Id: I10f568a426d0fc42069d025de2accbef5b26dc0c
2020-03-19 15:37:45 +02:00
Jeff Wendling
115f4559e5 satellite/orders: more efficient processing of orders
by doing an indexed anti-join we're able to reduce the time to
select the pending orders by over 10x on postgres. this should
help us process pending orders much more quickly.

it probably won't do as good a job on cockroach because it does
not do an indexed anti-join and instead does a hash join after
scanning the entire consumed serials table. we should either
remove orders entirely or try to make that more efficient
when necessary.

Change-Id: I8ca0535acd21c51e74955b24c9b86d20e4f2ff9c
2020-03-18 09:03:30 +00:00
Moby von Briesen
2f991b6c56 satellite/{overlay, satellitedb}: account for suspended field in overlay cache
Make sure that suspended nodes are treated appropriately by the overlay
cache. This means we should expect the following behavior:
* suspended nodes (vetted or not) should not be selected for uploading
new segments
* suspended nodes should be treated by the checker and repairer as
"unhealthy", and should be removed upon successful repair

This commit also removes unused overlay functionality.

Fixes a bug with commit 8b72181a1f where
the audit reporter was automatically suspending nodes regardless of
audit outcome (see test added).

Tests:
* updates repair tests to ensure that a suspended node is treated as
unhealthy and will be removed from the pointer on successful repair
* updates overlay tests for KnownUnreliableOrOffline and KnownReliable
to expect suspended nodes to be considered "unreliable"
* adds satellitedb test that ensures overlay.SelectStorageNodes and
overlay.SelectNewStorageNodes do not include suspended nodes
* adds audit reporter test to ensure that different audit outcomes
result in the correct suspended/disqualified states

Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561
2020-03-17 17:14:56 +00:00
Michal Niewrzal
81afbcc12e satellite/metainfo: check bucket existence on upload and listing
Initial change for checking bucket existence on the satellite side for
requests like BeginObject and ListObjects. This is a simple implementation
that just checks for the bucket in the DB, but it should be improved in
the future to avoid DB calls as much as possible.

Part of https://storjlabs.atlassian.net/browse/USR-365

Change-Id: I9076acddc44d7dbfa7612a1c24a007de01621583
2020-03-17 15:43:22 +00:00
Jeff Wendling
7baa59753a satellite/orders: add tests for double sending the same order
Change-Id: If2fa7f035257df3b04f506f81aa8b2e0916f5033
2020-03-17 14:18:03 +00:00
Ethan
bdbf764b86 satellite/orders;overlay: Consolidate order limit storage node lookups into 1 query.
https://storjlabs.atlassian.net/browse/SM-449
Change-Id: Idc62cc2978fba67cf48f7c98b27b0f996f9c58ac
2020-03-16 23:15:47 +00:00
Moby von Briesen
8b72181a1f satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension
* change overlay.UpdateStats to allow a third audit outcome. Now it can
handle successful, failed, and unknown audits.
* when "unknown audit reputation"
(unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the
DQ threshold, put node into suspension.
* when unknown audit reputation goes above the DQ threshold, remove node
from suspension.
* record unknown audits from audit reporter.
* add basic tests around unknown audits and suspension.
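
The reputation formula and suspension rule above can be sketched as follows. The function names and the threshold value 0.6 are assumptions for illustration; only the formula alpha/(alpha+beta) and the comparison against the DQ threshold come from the message.

```go
package main

import "fmt"

// unknownAuditReputation computes alpha/(alpha+beta), the formula given in
// the commit message.
func unknownAuditReputation(alpha, beta float64) float64 {
	return alpha / (alpha + beta)
}

// shouldSuspend reports whether the reputation has fallen below the
// threshold. The threshold is passed in; 0.6 below is a made-up value.
func shouldSuspend(alpha, beta, dqThreshold float64) bool {
	return unknownAuditReputation(alpha, beta) < dqThreshold
}

func main() {
	const threshold = 0.6 // hypothetical DQ threshold

	fmt.Println(shouldSuspend(1, 0, threshold)) // false: reputation is 1.0
	fmt.Println(shouldSuspend(2, 8, threshold)) // true: reputation is 0.2
}
```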

Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1
2020-03-16 20:29:26 +00:00