As per discussed we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting and burst limiting if need be in the future.
The default for this rate limiting is still no limits so it stays the
same as our previous functionality.
Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.
https://storjlabs.atlassian.net/browse/V3-3357
Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
test-sim-versions.sh tests upgrading the satellite, storagenodes, and uplinks from the most recent release to master, and ensures that compatibility across all uplink versions since v0.15 is maintained.
Change-Id: I80a54236d0eb2d681716caf4b825a883bdc25ef1
- also updated ping chore to pick up trust changes
- fixed small typo in blueprint
- fixed flags for storj-sim
- wired up changes to testplanet
Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d
* add integration tests to jenksin
* have jenkins run storj-sim integration tests w/crdb
Change-Id: I696d55c5894aaf630dcd7a566e1dd705ee88486b
* rm crdb integration tests to see if postgres passes
Change-Id: I1727a027ff802acbff5fc55961a0d605faefcf2d
* comment out aws tests to see if that is the error
Change-Id: I456c3d36f6a4ce7760ea0b6c402b6ea16cfe77e3
* add aws profile to integration tests
Change-Id: Ic01185dbc7b84ac48dfb846f8f272b34b50379b6
* add tmp path for aws profile and creds
Change-Id: I7b82ee5a99937edd3f66ae01bfb5cb21028a62cf
* change linux KiB syntax to bytes to support osx
Change-Id: Ia1f1027ba8da64a6ba537062deb9b3519973621f
* satellite/console: Add X-Frame-Options and Referrer-Policy security headers
* Update to use CSP instead of XFO and include tardigrade.io
* Make FrameAncestors a config option
* Update satellite-config lock
* Make help text for FrameAncestors better
* change satellite.Peer name to Core
* change to Core in testplanet
* missed a few places
* keep shared stuff in peer.go to stay consistent with storj/docs
* rm dup api code from sa peer, update storj-sim
* fix for backwards compat tests
* use env var instead of localhost
* changes per CR
* fix env var name
* skip peer for setup
* add overall failure percentage check and inactive time frame check before sending a response to sno
* update comment
* delete node from transfer queue if it has been inactive for too long
* fix linting error
* add test config value
* fix nil pointer
* add config value into testplanet
* add unit test for overall failure threshold
* move timeframe threshold to chore
* update protolock
* add chore test
* add per peiece failure count logic
* change config name from EndpointMaxFailures to MaxFailuresPerPiece
* address comments
* fix linting error
* add error handling for no row returned from progress table
* fix test for graceful exit chore on storagenode
* fix typo InActive -> Inactive
* improve readability for failure threshold calculation
* update config lock
* change error handling for GetProgress in graceful exit endpoint on the satellite side
* return proper rpc error in endpoint
* add check in chore test for checking finish timestamp and queue
* add metrics counter and chore
* updates metrics observer interval release default and dev default to 15min
* add more specific check for remote pointers
* add Counter field to metrics chore, add counter tests
* rm redundant ObjectCount suffix
* make pointer check easier to read
* change metrics.Config.Interval to ChoreInterval
* rm unneeded var
* fix comment
* update satellite config lock
* set up redis support in live accounting
* move live.Service interface into accounting package and rename to Cache, pass into satellite
* refactor Cache to store one int64 total, add IncrBy method to redis client implementation
* add monkit tracing to live accounting
* test that all nodes can check in with all satellites
* keep kademlia config
* add untrusted satellite test
* use getversion
* remove kademlia config changes in test-sim-backwards.sh
* add kademlia flags back to storj-sim storagenode
* reset kademlia flags in storagenode entrypoint
What:
cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:
cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia
Please describe the tests:
• internal/testplanet/planet_test.go:
TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:
TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:
TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
* Adjust arm32v6 and aarch64 docker images to match the hello-world image
* Update from master, fix potential bug in push-images target, and update storagenode deploy to handle arm64 image
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
* + add git worktree cleanup
+ specify commit in git worktree add
* fail backwards compatibility test if no postgres
* fix typo
* rm dir after worktree rm
* add jenkins runs backwards compat test, fix sa config issue
* try to use current branch
* add jenkins stage condition of branch name
* try when with branch env var
* only run test on master branch
* fix != to ==
* add comment, fix indentation
* run for all branches
* make comment more specific
* add stage to jenkins, add script for backwards compat tests
* debug backwards compat tests
* add setup in a script
* set env vars
* add more env ars
* mv to one script to debug
* debug api key problem
* add tmp dir
* rm print statements