* storagenode/storagenodedb: Migrate to separate dbs
* storagenode/storagenodedb: Add migration to drop versions tables
* Put drop table statements into a transaction.
* Fix CI errors.
* Fix CI errors.
* Changes requested from PR feedback.
* storagenode/storagenodedb: fix tx commit
* test that all nodes can check in with all satellites
* keep kademlia config
* add untrusted satellite test
* use getversion
* remove kademlia config changes in test-sim-backwards.sh
* add kademlia flags back to storj-sim storagenode
* reset kademlia flags in storagenode entrypoint
What:
cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:
cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia
Please describe the tests:
• internal/testplanet/planet_test.go:
TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:
TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:
TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
* nicer flags
* fix concurrency
* add concurrent workers
* initialize things
* fix tests
* close retain service
* ensure we don't have workers working on the same satellite
* ensure things compile
* fix other compilation issues:
* concurrency changes
ran this with `go test -count=1000` and it passed all of them.
- we add a closed channel so that we can select on it with
context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
because we may want to wake up N pending Wait calls or other
concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
anymore.
- ensure Run doesn't start multiple times so that we don't have
to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
with Run can't decide that there's nothing started and exit
and then have Run start things.
- make sure to poll the closed/context channels through loops
or at the start of Run calls in case Close happens first.
- these polls should be under a mutex because they have a default
case which makes it possible to schedule such that Close hasn't
executed the channel close so it starts more work.
- cancel a local Run context when it's going to exit to make sure
that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
digest what's going on.
Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7
* use the retain error class
Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc
* concurrency fixes again
- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
to exit, because they may be blocked on the condition
variable
- additionally, we need to broadcast when we close the
signal channel because the state changed: they want
to wake up and exit.
Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5
* undo my misguided rename
Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881
* remove pollInterval
* format paragraph more nicely
* move skew calculation into retain pieces
This PR introduces functionality for routine deletion of archived orders.
The user may specify an interval at which to run archive cleanup and a TTL for archived items. During each cleanup, all items that have reached the TTL are deleted
This archive cleanup job is combined with the order sender into a new combined orders service
Add retain service on storagenode. This service runs retain jobs that have been queued by the storagenodes. Rather than running retain jobs during the grpc Retain() call, the grpc call queues a retain job to the retain service and returns immediately afterwards, removing a significant bottleneck in garbage collection.
* add cache, update cache w/piece create/delete
* add service w/loop to cache to recalculate space used cache
* add piecestore cache to other sn svcs to use
* add table to persist the total space used
* rm cache where not needed
* rm stuff from sn svcs
* start fixing tests, changes per comments
* update commits
* add unit tests
* fix commiting before we write header bytes
* fix cache create test
* copy cache map, add started back to recalc
* fix test
* add test, update comments
* re-organizing into bandwidth service. re-enable rollup loop
* Prevent uniqueness failure in bandwidth rollup
* Add test to make sure the rollup select date range works correctly
* add bandwidth config for rollup interval
* add voucher service on storage node
* config field tag syntax, go routines for requests
* hook up voucher service in storagenode/peer.go
* add voucher config to testplanet
* add voucher config to testplanet
* add voucher response status INVALID, ACCEPTED, REJECTED
* add a test for vouchers service
* handle no row from GetValid, test it
* add trust pool to voucher service
* use trusted list to get satellites
* verify vouchers upon receipt
* test VerifyVoucher