* update lock file and add comment
* add created at and bytes transferred
* cleanup
* rename db func to GetGracefulExitNodesByTimeFrame
* fix flag
* split into two overlay functions
* := to =
* fix test
* add node not found error class
* fix overlay test
* suggested test changes
* review suggestions
* get exit status from overlay.Get()
* check rows.Err
* fix panic when ExitFinishedAt is nil
* fix comments in cmdGracefulExit
libuplink was incorrectly setting timeouts to 10 seconds still, but
should have been at least 10 minutes. the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic as it establishes
a minimum rate on tcp streams.
instead of all of this, just use tcp keep alive. tcp keep alive packets are
sent every 15 seconds and if the peer stops responding the connection
dies. this is enabled by default with go. this will kill tcp connections
when they stop working.
Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
* uplink/storage/segments: return error no optimal threshold
Return an error if the store get less uploaded pieces than the indicated
by the optimal threshold.
* satellite/metainfo: Fix gRPC status error & add reason
This commit fix the CommitSegment endpoint method to return an
"Invalid Argument" status code when uplink submits invalid data which is
detected when filtering invalid pieces by filterInvalidPieces endpoint
method.
Because filterInvalidPieces is also used by CommitSegmentOld, such
method part has been changed accordingly.
* An initial check in CommitSegment to detect earlier if uplink sends an
invalid number of upload pieces.
* Add more information to some log messages.
* Return more information to uplink when it sends a number of invalid
pieces which make impossible to finish the operation successfully.
* satellite/metainfo: Swap some "sugar" loggers to normal ones
Swap "sugar" loggers to normal ones because they impact the performance
in production systems and they should only be used under specific
circumstances which were none of the ones changed.
* add metrics counter and chore
* updates metrics observer interval release default and dev default to 15min
* add more specific check for remote pointers
* add Counter field to metrics chore, add counter tests
* rm redundant ObjectCount suffix
* make pointer check easier to read
* change metrics.Config.Interval to ChoreInterval
* rm unneeded var
* fix comment
* update satellite config lock
* set up redis support in live accounting
* move live.Service interface into accounting package and rename to Cache, pass into satellite
* refactor Cache to store one int64 total, add IncrBy method to redis client implementation
* add monkit tracing to live accounting
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
What:
cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:
cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia
Please describe the tests:
• internal/testplanet/planet_test.go:
TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:
TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:
TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
* create upsert query for check-in method
* add tests
* fix lint err
* add benchmark test for db query
* fix lint and tests
* add a unit test, fix lint
* add address to tests
* replace print w/ b.Fatal
* refactor query per CR comments
* fix disqualified, only set if null
* fix query
* add version to updatecheckin query
* fix version
* fix tests
* change version for tests
* add version to tests
* add IP, add transport, mv unit test
* use node.address as arg
* add last ip
* fix lint
What: we move api keys out of the grpc connection-level metadata on the client side and into the request protobufs directly. the server side still supports both mechanisms for backwards compatibility.
Why: dRPC won't support connection-level metadata. the only thing we currently use connection-level metadata for is api keys. we need to move all information needed by a request into the request protobuf itself for drpc support. check out the .proto changes for the main details.
One fun side-fact: Did you know that protobuf fields 1-15 are special and only use one byte for both the field number and type? Additionally did you know we don't use field 15 anywhere yet? So the new request header will use field 15, and should use field 15 on all protobufs going forward.
Please describe the tests: all existing tests should pass
Please describe the performance impact: none
* add test to make sure we will reverify the share in the containment db rather than in the pointer passed into reverify
* use pending audit information only when running reverify
* update audit status as failed for nodes that failed piece hash verification
* remove comment
* fix lint error
* add test
* fix format
* use named return value for Get
* add comments
* add more better comment
* format