* add outline for ECRepairer
* add description of process in TODO comments
* begin download/getting hash for a single piece
* verify piece hash and order limit during download
* fix download piece
* begin filling out ESREpair. Get
* wip move ecclient.Repair to ecrepairer.Repair
* pass satellite signee into repairer
* reconstruct original stripe from pieces
* move rebuildStripe()
* calculate piece size differently, increment successful count
* fix shares slices initialization
* rename stripeData to segment
* do not pad reader in Repair()
* temp debug
* create unsafeRSScheme
* use decode reader
* rename file name to be all lowercase
* make repair downloader async
* declare condition variable inside Get method
* set downloadAndVerifyPiece's in-memory buffer to be share size
* update unusedLimits var
* address comments
* remove unnecessary comments
* move initialization of segmentRepaire to be outside of repairer service
* use ReadAll during download
* remove dots and move hashing to after validating for order limit signature
* wip test
* make sure files exactly at min threshold are repaired
* remove unused code
* use corrput data and write back to storagenode
* only create corrupted node and piece ids once
* add comment
* address nat's comment
* fix linting and checker_test
* update comment
* add comments
* remove "copied from ecclient" comments
* add clarification comments in ec.Repair
* satellite/satellitedb: Always release savepoint
Release the savepoint when processing orders in any case.
* satellite/satellitedb: Wrap errors exec savepoints
Wrap the errors returned by the execution of savepoints operations when
processing orders.
* V3-2529: Add DB savepoint to fix issue with postgres. Add test force a rejected order
Co-Authored-By: Ivan Fraixedes <ivan@fraixed.es>
* Update satellite/satellitedb/orders.go
* nicer flags
* fix concurrency
* add concurrent workers
* initialize things
* fix tests
* close retain service
* ensure we don't have workers working on the same satellite
* ensure things compile
* fix other compilation issues:
* concurrency changes
ran this with `go test -count=1000` and it passed all of them.
- we add a closed channel so that we can select on it with
context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
because we may want to wake up N pending Wait calls or other
concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
anymore.
- ensure Run doesn't start multiple times so that we don't have
to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
with Run can't decide that there's nothing started and exit
and then have Run start things.
- make sure to poll the closed/context channels through loops
or at the start of Run calls in case Close happens first.
- these polls should be under a mutex because they have a default
case which makes it possible to schedule such that Close hasn't
executed the channel close so it starts more work.
- cancel a local Run context when it's going to exit to make sure
that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
digest what's going on.
Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7
* use the retain error class
Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc
* concurrency fixes again
- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
to exit, because they may be blocked on the condition
variable
- additionally, we need to broadcast when we close the
signal channel because the state changed: they want
to wake up and exit.
Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5
* undo my misguided rename
Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881
* remove pollInterval
* format paragraph more nicely
* move skew calculation into retain pieces
The call to monkit for functions which mostly run from the beginning to
the end of the satellite process must be done because it only causes a
little overhead.
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
* add a writer wrapper
* remove unused code
* read out the rest of the connection in client
* remove unused code
* no panic
* check response status code
What: this change makes sure the count of segments is not encrypted.
Why: having the segment count encrypted just makes things hard for no reason - a satellite operator can figure out how many segments an object has by looking at the other segments in the database. but if a user has access but has lost their encryption key, they now can't clean up or delete old segments because they can't know how many there are without just guessing until they get errors. :(
Backwards compatibility: clients will still understand old pointers and will still write old pointers. at some point in the future perhaps we can do a migration for remaining old pointers so we can delete the old code.
Please describe the tests: covered by existing tests
Please describe the performance impact: none
This PR introduces functionality for routine deletion of archived orders.
The user may specify an interval at which to run archive cleanup and a TTL for archived items. During each cleanup, all items that have reached the TTL are deleted
This archive cleanup job is combined with the order sender into a new combined orders service
Add retain service on storagenode. This service runs retain jobs that have been queued by the storagenodes. Rather than running retain jobs during the grpc Retain() call, the grpc call queues a retain job to the retain service and returns immediately afterwards, removing a significant bottleneck in garbage collection.
* pkg/process: Fatal show complete error information
Change the general process execution function to not using the sugared
logger for outputting the full error information.
Delete some unreachable code because Zap logger Fatal method calls exit
1 internally.
* storagenode/storagenodedb: Add info to error
Add more information to an error returned due to some data
inconsistency.
* storagenode/orders: Don't use sugared logger
Don't use sugar logger and provide better contextualized error messages
in settle method.
* storagenode/orders: Add some log fields to error msgs
Add some relevant log fields to some logged errors of the sender settle
method.
* satellite/orders: Remove always nil error from debug
Remove an error which as logged in debug level which was always nil and
makes the logic that used this variable clear.
* storagenode/orders: Don't return error Archiving unsent
Don't stop the process which archive unsent orders if some of them
aren't found the DB because it cause the Storage Node to stop with a
fatal error.
* update offer once redemption cap has reached
* use transaction to get offer info before insert
* update offer status when redeemable capacity has reached
* fix format
* use pgutil to check constraint error
* change error message