* satellite/metainfo: Rollback path parts check in loop
We have to rollback the changes applied in checking the rawPath parts
from 4 to 3 because the production prointerDB is still storing buckets.
* satellite/metainfo: Don't return path parts less 4
Don't return an error in the metainfo loop iterator when a path doesn't
have 4 parts because it belongs to bucket metadata, not an actual
object.
* merge migration
* rm migration versions
* rm unneeded migration test data
* create index w/postgres + crdb compatible syntax
* add default to offers.invitee_credit_duration_days
* changes so that schema matches from master to branch
* change to be crdb compatible
* add check to confirm db version
* mv version check to migration
* update tests
* add minversion to sadb migration, update tests
* confirm min version for all dbs in a migration
* add validate migration to sadb
* fix lint err
* rm min version check from migrate
* change sadb check
* hard code min db version
* fix comment
* skip unknown errors (wip)
* add tests to make sure nodes that time out are added to containment
* add bad blobs store
* call "Skipped" "Unknown"
* add tests to ensure unknown errors do not trigger containment
* add monkit stats to lockfile
* typo
* add periods to end of bad blobs comments
* satellite/nodeselection: dont select nodes that havent checked in for a while
* change testplanet online window to one minute
* remove satellite reconfigure online window = 0 in repair tests
* pass timestamp into UpdateCheckIn
* change timestamp to timestamptz
* edit tests to set last_contact_success to 4 hours ago
* fix syntax error
* remove check for last_contact_success > last_contact_failure in IsOnline
Large conditional blocks are hard to read.
When the conditional block only has one branch it's easy to understand
the logic of the function to early return switching the condition.
We don't use reverse listing in any of our code, outside of tests, and
it is only exposed through libuplink in the
lib/uplink.(*Project).ListBuckets() API. We also don't know of any users
who might have a need for reverse listing through ListBuckets().
Since one of our prospective pointerdb backends can not support
backwards iteration, and because of the above considerations, we are
going to remove the reverse listing feature.
Change-Id: I8d2a1f33d01ee70b79918d584b8c671f57eef2a0
* during audit Verify, return error and delete segment if segment is expired
* delete "main" reverify segment and return error if expired
* delete contained nodes and pointers when pointers to audit are expired
* update testplanet.Upload and testplanet.UploadWithConfig to use an expiration time of an hour from now
* Revert "update testplanet.Upload and testplanet.UploadWithConfig to use an expiration time of an hour from now"
This reverts commit e9066151cf84afbff0929a6007e641711a56b6e5.
* do not count ExpirationDate=time.Time{} as expired
* If a node claims to fail a transfer due to piece not found, remove that node from the pointer, delete the transfer queue item.
* If the pointer is piece hash verified, penalize the node. Otherwise, do not penalize the node.
* change satellite.Peer name to Core
* change to Core in testplanet
* missed a few places
* keep shared stuff in peer.go to stay consistent with storj/docs
* separate sadb migration, add version check
* update checkversion to do same validation as migration
* changes per CR
* add sa migration to storj-sim
* add different debug port in storj-sim for migration
* add wait for exit for storj-sim migration
* update sa docker entrypoint to support migration
* storj-sim satellite parts all wait for migration
* upgrade golang-migrate/migrate to v4 because bug
* fix go mod tidy
* rm dup api code from sa peer, update storj-sim
* fix for backwards compat tests
* use env var instead of localhost
* changes per CR
* fix env var name
* skip peer for setup
* improve errors in satellite contact endpoints
* add changes per CR comments
* update pingback method so it still updates node table
* fix err and returns
* fix zap logging to be better
* set up satellite repair run command
* add separated repair process to storj-sim
* add repairer peer to satellite in testplanet
* move api run cmd into api.go
* add satellite run repair to entrypoint
* check duplicate node id before update pointer
* add test for transfer failure when pointer already contain the receiving node id
* check exiting and receiving nod are still in the pointer
* check node id only exists once in a pointer
* return error if the existing node doesn't match with the piece info in the pointer
* try to recreate the issue on jenkins
* should not remove exiting node piece in test
* Update satellite/gracefulexit/endpoint.go
Co-Authored-By: Maximillian von Briesen <mobyvb@gmail.com>
* Update satellite/gracefulexit/endpoint.go
Co-Authored-By: Maximillian von Briesen <mobyvb@gmail.com>
* add signatures, fix process loop bug, move delete to on success
* added tests for signatures
* PR comment updates
* fixed setting reason by default.
* updates for PR comments
* added signed failure when verificationi fails
* moved to sign_test
* fix panic
* removed testplanet from test
* Make the exiting node check piece hashes, piece IDs, and piece hash signatures before relaying successful transfer data to the satellite.
* Enable immediate graceful exit failure for "successful" transfers that fail satellite-side validation.
* Move transfer piece logic in storagenode worker to separate function (to make the worker easier to understand)
* add overall failure percentage check and inactive time frame check before sending a response to sno
* update comment
* delete node from transfer queue if it has been inactive for too long
* fix linting error
* add test config value
* fix nil pointer
* add config value into testplanet
* add unit test for overall failure threshold
* move timeframe threshold to chore
* update protolock
* add chore test
* add per peiece failure count logic
* change config name from EndpointMaxFailures to MaxFailuresPerPiece
* address comments
* fix linting error
* add error handling for no row returned from progress table
* fix test for graceful exit chore on storagenode
* fix typo InActive -> Inactive
* improve readability for failure threshold calculation
* update config lock
* change error handling for GetProgress in graceful exit endpoint on the satellite side
* return proper rpc error in endpoint
* add check in chore test for checking finish timestamp and queue
* update lock file and add comment
* add created at and bytes transferred
* cleanup
* rename db func to GetGracefulExitNodesByTimeFrame
* fix flag
* split into two overlay functions
* := to =
* fix test
* add node not found error class
* fix overlay test
* suggested test changes
* review suggestions
* get exit status from overlay.Get()
* check rows.Err
* fix panic when ExitFinishedAt is nil
* fix comments in cmdGracefulExit
libuplink was incorrectly setting timeouts to 10 seconds still, but
should have been at least 10 minutes. the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic as it establishes
a minimum rate on tcp streams.
instead of all of this, just use tcp keep alive. tcp keep alive packets are
sent every 15 seconds and if the peer stops responding the connection
dies. this is enabled by default with go. this will kill tcp connections
when they stop working.
Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
* uplink/storage/segments: return error no optimal threshold
Return an error if the store get less uploaded pieces than the indicated
by the optimal threshold.
* satellite/metainfo: Fix gRPC status error & add reason
This commit fix the CommitSegment endpoint method to return an
"Invalid Argument" status code when uplink submits invalid data which is
detected when filtering invalid pieces by filterInvalidPieces endpoint
method.
Because filterInvalidPieces is also used by CommitSegmentOld, such
method part has been changed accordingly.
* An initial check in CommitSegment to detect earlier if uplink sends an
invalid number of upload pieces.
* Add more information to some log messages.
* Return more information to uplink when it sends a number of invalid
pieces which make impossible to finish the operation successfully.
* satellite/metainfo: Swap some "sugar" loggers to normal ones
Swap "sugar" loggers to normal ones because they impact the performance
in production systems and they should only be used under specific
circumstances which were none of the ones changed.
* add metrics counter and chore
* updates metrics observer interval release default and dev default to 15min
* add more specific check for remote pointers
* add Counter field to metrics chore, add counter tests
* rm redundant ObjectCount suffix
* make pointer check easier to read
* change metrics.Config.Interval to ChoreInterval
* rm unneeded var
* fix comment
* update satellite config lock
* set up redis support in live accounting
* move live.Service interface into accounting package and rename to Cache, pass into satellite
* refactor Cache to store one int64 total, add IncrBy method to redis client implementation
* add monkit tracing to live accounting