With the new storage node downtime tracking feature, we need remove current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and
UptimeReputationDQ. This is the first step of removing the uptime
reputation columns from satellitedb
Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36
Disqualifies a node when the node fails to complete a graceful
exit.
Adds a new DisqualifyNode method to the overlay cache, since there
wasn't an existing method to disqualify a node but do nothing else
to its stats.
Adds checks to existing tests to make sure that a storage node that
fails a graceful exit is marked as disqualified in the overlay
cache.
https: //storjlabs.atlassian.net/browse/V3-3342
Change-Id: I4d554a519ab59db31ad3b8e28764c8683a6e3888
Remove direct dependency on uplink.RSConfig, this simplifies
moving the config file without introducing weird dependencies.
Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.
https://storjlabs.atlassian.net/browse/V3-3357
Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
Fixes a data race caused by not waiting for workers to finish
before shutting down. Currently this ended up failing logging
because it was closed when test tried to write to it.
Change-Id: I074045cd83bbf49e658f51353aa7901e9a5d074b
* If a node claims to fail a transfer due to piece not found, remove that node from the pointer, delete the transfer queue item.
* If the pointer is piece hash verified, penalize the node. Otherwise, do not penalize the node.
* check duplicate node id before update pointer
* add test for transfer failure when pointer already contain the receiving node id
* check exiting and receiving nod are still in the pointer
* check node id only exists once in a pointer
* return error if the existing node doesn't match with the piece info in the pointer
* try to recreate the issue on jenkins
* should not remove exiting node piece in test
* Update satellite/gracefulexit/endpoint.go
Co-Authored-By: Maximillian von Briesen <mobyvb@gmail.com>
* Update satellite/gracefulexit/endpoint.go
Co-Authored-By: Maximillian von Briesen <mobyvb@gmail.com>
* Make the exiting node check piece hashes, piece IDs, and piece hash signatures before relaying successful transfer data to the satellite.
* Enable immediate graceful exit failure for "successful" transfers that fail satellite-side validation.
* Move transfer piece logic in storagenode worker to separate function (to make the worker easier to understand)
* add overall failure percentage check and inactive time frame check before sending a response to sno
* update comment
* delete node from transfer queue if it has been inactive for too long
* fix linting error
* add test config value
* fix nil pointer
* add config value into testplanet
* add unit test for overall failure threshold
* move timeframe threshold to chore
* update protolock
* add chore test
* add per peiece failure count logic
* change config name from EndpointMaxFailures to MaxFailuresPerPiece
* address comments
* fix linting error
* add error handling for no row returned from progress table
* fix test for graceful exit chore on storagenode
* fix typo InActive -> Inactive
* improve readability for failure threshold calculation
* update config lock
* change error handling for GetProgress in graceful exit endpoint on the satellite side
* return proper rpc error in endpoint
* add check in chore test for checking finish timestamp and queue