* create upsert query for check-in method
* add tests
* fix lint err
* add benchmark test for db query
* fix lint and tests
* add a unit test, fix lint
* add address to tests
* replace print w/ b.Fatal
* refactor query per CR comments
* fix disqualified, only set if null
* fix query
* add version to updatecheckin query
* fix version
* fix tests
* change version for tests
* add version to tests
* add IP, add transport, mv unit test
* use node.address as arg
* add last ip
* fix lint
What: we move api keys out of the grpc connection-level metadata on the client side and into the request protobufs directly. the server side still supports both mechanisms for backwards compatibility.
Why: dRPC won't support connection-level metadata. the only thing we currently use connection-level metadata for is api keys. we need to move all information needed by a request into the request protobuf itself for drpc support. check out the .proto changes for the main details.
One fun side-fact: Did you know that protobuf fields 1-15 are special and only use one byte for both the field number and type? Additionally did you know we don't use field 15 anywhere yet? So the new request header will use field 15, and should use field 15 on all protobufs going forward.
Please describe the tests: all existing tests should pass
Please describe the performance impact: none
* add test to make sure we will reverify the share in the containment db rather than in the pointer passed into reverify
* use pending audit information only when running reverify
* update audit status as failed for nodes that failed piece hash verification
* remove comment
* fix lint error
* add test
* fix format
* use named return value for Get
* add comments
* add more better comment
* format
this is a trivial operation for storagenode/console, as it doesn't
really need or use kademlia in the first place.
What:
Removes kademlia from storagenode/console
Why:
We are in the process of getting rid of kademlia, and this is one place where it's particularly easy.
Please describe the tests:
Existing tests exercise storagenode/console behavior; if they continue to work, everything here should be tested satisfactorily.
Please describe the performance impact:
None
* implement contact.checkin method
* add batching to update uptime checks
* rm batching
* rm other unneeded things
* fix lint
* fix unit test
* changes per CR comments
* couple more CR changes
* add identity check into grpcOpt
* fix lint
* why do you fix the test
* revert test change
* stop contact chore for repair test
* put node in cache
* comment out contact chore. See what happens
* Revert "comment out contact chore. See what happens"
This reverts commit 2e45008e36a50e0a842ae455ac83de77093d4daa.
* try stopping contact earlier
* stop contact chore in uplink_test
* replace self on chore with *RoutingTable for access to latest node info
* Revert "stop contact chore in uplink_test"
This reverts commit 302db70f4071112d1b9f7ee0279225ea12757723.
* Revert "try stopping contact earlier"
This reverts commit 806cc3b82f9d598899dafd83da9315a1cb0cb43c.
* Revert "stop contact chore for repair test"
This reverts commit dd34de1cfdfc09b972186c9ab9a4f1e822446b79.
* add outline for ECRepairer
* add description of process in TODO comments
* begin download/getting hash for a single piece
* verify piece hash and order limit during download
* fix download piece
* begin filling out ESREpair. Get
* wip move ecclient.Repair to ecrepairer.Repair
* pass satellite signee into repairer
* reconstruct original stripe from pieces
* move rebuildStripe()
* calculate piece size differently, increment successful count
* fix shares slices initialization
* rename stripeData to segment
* do not pad reader in Repair()
* temp debug
* create unsafeRSScheme
* use decode reader
* rename file name to be all lowercase
* make repair downloader async
* declare condition variable inside Get method
* set downloadAndVerifyPiece's in-memory buffer to be share size
* update unusedLimits var
* address comments
* remove unnecessary comments
* move initialization of segmentRepaire to be outside of repairer service
* use ReadAll during download
* remove dots and move hashing to after validating for order limit signature
* wip test
* make sure files exactly at min threshold are repaired
* remove unused code
* use corrput data and write back to storagenode
* only create corrupted node and piece ids once
* add comment
* address nat's comment
* fix linting and checker_test
* update comment
* add comments
* remove "copied from ecclient" comments
* add clarification comments in ec.Repair
* satellite/satellitedb: Always release savepoint
Release the savepoint when processing orders in any case.
* satellite/satellitedb: Wrap errors exec savepoints
Wrap the errors returned by the execution of savepoints operations when
processing orders.
* V3-2529: Add DB savepoint to fix issue with postgres. Add test force a rejected order
Co-Authored-By: Ivan Fraixedes <ivan@fraixed.es>
* Update satellite/satellitedb/orders.go
* nicer flags
* fix concurrency
* add concurrent workers
* initialize things
* fix tests
* close retain service
* ensure we don't have workers working on the same satellite
* ensure things compile
* fix other compilation issues:
* concurrency changes
ran this with `go test -count=1000` and it passed all of them.
- we add a closed channel so that we can select on it with
context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
because we may want to wake up N pending Wait calls or other
concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
anymore.
- ensure Run doesn't start multiple times so that we don't have
to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
with Run can't decide that there's nothing started and exit
and then have Run start things.
- make sure to poll the closed/context channels through loops
or at the start of Run calls in case Close happens first.
- these polls should be under a mutex because they have a default
case which makes it possible to schedule such that Close hasn't
executed the channel close so it starts more work.
- cancel a local Run context when it's going to exit to make sure
that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
digest what's going on.
Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7
* use the retain error class
Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc
* concurrency fixes again
- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
to exit, because they may be blocked on the condition
variable
- additionally, we need to broadcast when we close the
signal channel because the state changed: they want
to wake up and exit.
Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5
* undo my misguided rename
Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881
* remove pollInterval
* format paragraph more nicely
* move skew calculation into retain pieces
The call to monkit for functions which mostly run from the beginning to
the end of the satellite process must be done because it only causes a
little overhead.
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
* add a writer wrapper
* remove unused code
* read out the rest of the connection in client
* remove unused code
* no panic
* check response status code
What: this change makes sure the count of segments is not encrypted.
Why: having the segment count encrypted just makes things hard for no reason - a satellite operator can figure out how many segments an object has by looking at the other segments in the database. but if a user has access but has lost their encryption key, they now can't clean up or delete old segments because they can't know how many there are without just guessing until they get errors. :(
Backwards compatibility: clients will still understand old pointers and will still write old pointers. at some point in the future perhaps we can do a migration for remaining old pointers so we can delete the old code.
Please describe the tests: covered by existing tests
Please describe the performance impact: none
This PR introduces functionality for routine deletion of archived orders.
The user may specify an interval at which to run archive cleanup and a TTL for archived items. During each cleanup, all items that have reached the TTL are deleted
This archive cleanup job is combined with the order sender into a new combined orders service
Add retain service on storagenode. This service runs retain jobs that have been queued by the storagenodes. Rather than running retain jobs during the grpc Retain() call, the grpc call queues a retain job to the retain service and returns immediately afterwards, removing a significant bottleneck in garbage collection.
* pkg/process: Fatal show complete error information
Change the general process execution function to not using the sugared
logger for outputting the full error information.
Delete some unreachable code because Zap logger Fatal method calls exit
1 internally.
* storagenode/storagenodedb: Add info to error
Add more information to an error returned due to some data
inconsistency.
* storagenode/orders: Don't use sugared logger
Don't use sugar logger and provide better contextualized error messages
in settle method.
* storagenode/orders: Add some log fields to error msgs
Add some relevant log fields to some logged errors of the sender settle
method.
* satellite/orders: Remove always nil error from debug
Remove an error which as logged in debug level which was always nil and
makes the logic that used this variable clear.
* storagenode/orders: Don't return error Archiving unsent
Don't stop the process which archive unsent orders if some of them
aren't found the DB because it cause the Storage Node to stop with a
fatal error.
* update offer once redemption cap has reached
* use transaction to get offer info before insert
* update offer status when redeemable capacity has reached
* fix format
* use pgutil to check constraint error
* change error message
* when there's partner id, we will not require an activation token for creating a new account
* create new token if user has a partner id on creation
* validate partner id first
* fix format
* remove unnecessary code
* display error message instead of reroute
* add more test
* add comments
* add comment
* satellitedb/certDB: refactors of the node certificate storage DB table
The existing implementation doesnt allow to store the complete certificate chain of uplinkIDs or storagenodeIDs, so the current table is dropped and new table will be added which addresses the storage and retrieval of certificates
pkg/identity: fixes spelling mistakes that I missed on PR#2754
Fixes V3-1992/V3-2388
Deprecate the pieceinfo database, and start storing piece info as a header to
piece files. Institute a "storage format version" concept allowing us to handle
pieces stored under multiple different types of storage. Add a piece_expirations
table which will still be used to track expiration times, so we can query it, but
which should be much smaller than the pieceinfo database would be for the
same number of pieces. (Only pieces with expiration times need to be stored in piece_expirations, and we don't need to store large byte blobs like the serialized
order limit, etc.) Use specialized names for accessing any functionality related
only to dealing with V0 pieces (e.g., `store.V0PieceInfo()`). Move SpaceUsed-
type functionality under the purview of the piece store. Add some generic
interfaces for traversing all blobs or all pieces. Add lots of tests.
* Added batch update stats for recordAuditSuccessStatus
* Added batch update stats to recordAuditFailStatus
* added configurable batch size
* build individual update/delete statements so the statements can be batched into 1 call to the DB
* notified #config-changes channel and ran make update-satellite-config-lock
* updated tests to use batch update stats
* pkg/server: don't use global logger
* satellite/overlay: use correct logger
* pkg/kademlia: use correct logger
* linksharing: use conventional way to pass in logger
* use zaptest in tests
* parent 13dd501042
author Yingrong Zhao <yingrong.zhao@gmail.com> 1563560530 -0400
committer Yingrong Zhao <yingrong.zhao@gmail.com> 1563581673 -0400
parent 13dd501042
author Yingrong Zhao <yingrong.zhao@gmail.com> 1563560530 -0400
committer Yingrong Zhao <yingrong.zhao@gmail.com> 1563581428 -0400
satellite/console: add referral link logic (#2576)
* setup referral route
* referredBy
* add user id
* modify user query
* separate optional field from userInfo
* get current reward on init of satellite gui
* remove unsed code
* fix format
* only apply 0 credit on registration
* only pass required information for rewards
* fix time parsing
* fix test and linter
* rename method
* add todo
* remove user referral logic
* add null check and fix format
* get current offer
* remove partnerID on CreateUser struct
* fix storj-sim user creation
* only redeem credit when there's an offer
* fix default offer configuration
* fix migration
* Add helper function for get correct credit duration
* add comment
* only store userid into user_credit table
* add check for partner id to set correct offer type
* change free credit to use invitee credits
* remove unecessary code
* add credit update in activateAccount
* remove unused code
* fix format
* close reader and fix front-end build
* move create credit logic into CreateUser method
* when there's no offer set, user flow shouldn't be interrupted by referral program
* add appropriate error messages
* remove unused code
* add comment
* add error class for no current offer error
* add error class for credits update
* add comment for migration
* only log secret when it's in debug level
* fix typo
* add testdata
* Update overlaycache.go
Removes one select statement and columns gets filtered in first query.
Needs to be tested agains real database that this query is working and faster!
* Correct linting
reorder scans that this fit to new sql result order
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
* Added a gc package at satellite/gc, which contains the gc.Service, which runs garbage collection integrated with the metainfoloop, and the gc PieceTracker, which implements the metainfo loop Observer interface and stores all of the filters (about which pieces are good) for each node.
* Added a gc config located at satellite/gc/service.go (loop disabled by default in release)
* Creates bloom filters with pieces to be retained inside the metainfo loop
* Sends RetainRequests (or filters with good piece ids) to all storage nodes.
* satellite/satellitedb: User var block for Error
To follow with the code style of the majority of the sources of the
current code base the Error variable should be in a block.
Replacing a single var expression to a block one makes the godoc more
consistent across the repository.
* satellite/satellitedb: Remove empty spaces end of line
* pkg/datarepair/repairer: Track always time for repair
Make a minor change in the worker function of the repairer, that when
successful, always track the metric time for repair independently if the
time since checker queue metric can be tracked.
* storage/postgreskv: Wrap error in Get func
Wrap the returned error of the Get function as it is done when the
query doesn't return any row.
* satellite/metainfo: Move debug msg to the right place
NewStore function was writing a debug log message when the DB was
connected, however it was always writing it out despite if an error
happened when getting the connection.
* pkg/datarepair/repairer: Wrap error before logging it
Wrap the error returned by process which is executed by the Run method
of the repairer service to add context to the error log message.
* pkg/datarepair/repairer: Make errors more specific in worker
Make the error messages of the "worker" method of the Service more
specific and the logged message for such errors.
* pkg/storage/repair: Improve error reporting Repair
In order of improving the error reporting by the
pkg/storage/repair.Repair method, several errors of this method and
functions/methods which this one relies one have been updated to be
wrapper into their corresponding classes.
* pkg/storage/segments: Track path param of Repair method
Track in monkit the path parameter passed to the Repair method.
* satellite/satellitedb: Wrap Error returned by Delete
Wrap the error returned by repairQueue.Delete method to enhance the
error with a class and stack and the
pkg/storage/segments.Repairer.Repair method get a more contextualized
error from it.
* setup referral route
* referredBy
* add user id
* modify user query
* separate optional field from userInfo
* get current reward on init of satellite gui
* remove unsed code
* fix format
* only apply 0 credit on registration
* only pass required information for rewards
* fix time parsing
* fix test and linter
* rename method
* add todo
* remove user referral logic
* add null check and fix format
* get current offer
* remove partnerID on CreateUser struct
* fix storj-sim user creation
* only redeem credit when there's an offer
* fix default offer configuration
* fix migration
* Add helper function for get correct credit duration
* add comment
* only store userid into user_credit table
* add check for partner id to set correct offer type
* change free credit to use invitee credits
* remove unecessary code
* Add partnerID on user creation
* added support for partner ID on create user in consoleql User
* add partner ID to api key if the user creating it has a partner ID associated with it
* updates for consoleal user and userinfo
* add RedeemRewards method
* remove redeem from reward.db
* add redeemable cap check in redeem
* rename offerCap to redeemableCap
* remove redeem test
* update error message
* fix build
* Trigger Jenkins
* use correct credit setting for redeem
* fix comment
* change create qury to get redeemable_cap from offers table
* change referredBy to a pointer in user credit struct