Add a flag that allows us to easily switch disqualification from
suspension mode on or off. A node will only be disqualified from
suspension mode if it has been suspended for longer than the grace
period AND the SuspensionDQEnabled flag is true.
Change-Id: I9e67caa727183cd52ab2042b0a370a1bcaebe792
CreateTables hasn't been quite true for a while now, rename to
MigrateToLatest to be clearer in it's behavior.
Change-Id: Ida48e95122a5d9b7a814e922d3698e00024a2ba7
The UpdateAddress method use to be used when storage node's checked in with the Satellite, but once the contact service was created this method was no longer used. This PR finally removes it.
Change-Id: Ib3f83c8003269671d97d54f21ee69665fa663f24
Sometimes nodes who have gracefully exited will still be holding pieces
according to the satellite. This has some unintended side effects
currently, such as nodes getting disqualified after having successfully
exited.
* When the audit reporter attempts to update node stats, do not update
stats (alpha, beta, suspension, disqualification) if the node has
finished graceful exit (audit/reporter_test.go TestGracefullyExitedNotUpdated)
* Treat gracefully exited nodes as "not reputable" so that the repairer
and checker do not count them as healthy (overlay/statdb_test.go
TestKnownUnreliableOrOffline, repair/repair_test.go
TestRepairGracefullyExited)
Change-Id: I1920d60dd35de5b2385a9b06989397628a2f1272
Currently Cockroach isn't performant for concurrent database setup and
tear-down. Instead of a single instance allow setting multiple potential
connection strings and let the tests pick one connection string
randomly.
This improves test duration by ~10 minutes.
While we are at significantly changing how pgtest works, introduce
helper PickPostgres and PickCockroach for selecting the database to
reduce code duplications in multiple places.
Change-Id: I8ad171d5c4c8a4fc081ec2ae9bdd0cc948a80619
In cases like the segment reaper script connecting to the metainfodb,
we don't want a db migration to happen automatically when we call
metainfo.NewStore. This adds MigrateToLatest method for postgreskv
and cockroackv, and calls MigrateToLatest in places where NewStore used
to create tables.
Change-Id: I682d0f26d609af0601dfdb32a24866cdf5d32a7e
A/B indicates that B is a subtest of A, however in this case they
represent a configuration of the test, not a subtest.
Change-Id: I64eed5d5bcb12759e54fe4b5373f8e88488e50f7
Update unknown_audit_reputation_alpha and unknown_audit_reputation_beta.
Add test to verify that BatchUpdateStats properly modifies unknown audit
alpha/beta
Change-Id: I0d5f9cac96a99f64905cf575b772402db0756a9d
If a node is suspended and receives an unknown or failing audit,
disqualify them if the grace period (default 1w in production) has
passed.
Migrate the nodes table so any node that is currently suspended gets
unsuspended when the satellite starts up.
Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82
This reverts commit 105dc7acc6.
Reason for revert: Recent changes to the Postgres query plan seems to want to use this index now. Reverting until we have time to analyze what's happening.
Change-Id: I74b4b5a8f15c3850d8a958a29f51dbc80e7c282c
Alpha=1 and beta=0 are the expected first values for any alpha/beta
reputation system we are using in the codebase. So we are removing the
configurability of these values.
Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3
Whenever the node's reputation is updated, if its unknown audit
reputation is below the suspension threshold, its suspension field
is set to the current time. This could overwrite the previous
"suspendedAt" value resulting a node that never reaches the end of
its suspension.
Also log whenever a node is disqualified or its suspension status
changes
Change-Id: I5e8c8f1c46f66d79cb279b5b16a84fe03f533deb
Reduce the number of non-methods to reduce funcs in the namespace also
combine a func to slightly condense the code more.
Change-Id: Ifbe728eb8c8ca4c981df648decd259c2097b6b40
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.
Replace uses of dbutil.BytesToUUID with uuid.FromBytes.
Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
Instead of providing the database from outside to testplanet create it
inside and then allow wrapping and modifying it. This is more convenient
to use.
Change-Id: I9b8f69e6e0a19ff984b4e2bfe927c9100c77bc6c
storagenodes have like 10 or more databases. without this
tag they all get sent as the same value, stomping on each
other.
Change-Id: Ib12019684d6ea8f2a5b83df584056dfa79e3c4b3
The goal of this change is to improve the storagenode_storage_tallies table by removing the unneeded id column that is not being used but only taking up space, and also to add an index on a different column that needs it. Removing and adding a column seems simple, but ended up being more complicated because of some cockroachdb limitations.
The cockroachdb limitation when trying to remove a column from a table and create a new primary key are:
1. only allows primary key creation at table creation time (docs: https://www.cockroachlabs.com/docs/stable/primary-key.html)
2. table drop or rename is performed async and cannot be done in a transaction (issue: https://github.com/cockroachdb/cockroach/issues/12123, https://github.com/cockroachdb/cockroach/issues/22868)
To address these differences between cockroachdb and Postgres, this PR performs different migrations for the two database. The Postgres migration is straight forward and what you would expect, but the cockroach migration has two main changes:
1. To change a primary key, use the recommended process from the cockroachdb docs to create a new table with the new primary key you want and then migrate the data.
2. In order to do 1, we needed to do the new table renaming in a separate transaction from the data migration.
Ref: SM-65
Change-Id: Idc9aee3ab57aa4d5570e3d2980afea853cd966bf
by doing an indexed anti-join we're able to reduce the time to
select the pending orders by over 10x on postgres. this should
help us process pending orders much more quickly.
it probably won't do as good a job on cockroach because it does
not do an indexed anti-join and instead does a hash join after
scanning the entire consumed serials table. we should either
remove orders entirely or try to make that more efficient
when necessary.
Change-Id: I8ca0535acd21c51e74955b24c9b86d20e4f2ff9c
Make sure that suspended nodes are treated appropriately by the overlay
cache. This means we should expect the following behavior:
* suspended nodes (vetted or not) should not be selected for uploading
new segments
* suspended nodes should be treated by the checker and repairer as
"unhealthy", and should be removed upon successful repair
This commit also removes unused overlay functionality.
Fixes a bug with commit 8b72181a1f where
the audit reporter was automatically suspending nodes regardless of
audit outcome (see test added).
Tests:
* updates repair tests to ensure that a suspended node is treated as
unhealthy and will be removed from the pointer on successful repair
* updates overlay tests for KnownUnreliableOrOffline and KnownReliable
to expect suspended nodes to be considered "unreliable"
* adds satellitedb test that ensures overlay.SelectStorageNodes and
overlay.SelectNewStorageNodes do not include suspended nodes
* adds audit reporter test to ensure that different audit outcomes
result in the correct suspended/disqualified states
Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561
Initial change for checking bucket existence on satellite side for
requests like BeginObject and ListObjects. This is simple implementation
that is just checking bucket in DB but should be improved in future to
avoid DB calls as much as possible.
Part of https://storjlabs.atlassian.net/browse/USR-365
Change-Id: I9076acddc44d7dbfa7612a1c24a007de01621583