Commit Graph

19 Commits

Author SHA1 Message Date
Andrew Harding
cb89496569 storagenode/trust: wire up list into pool
- also updated ping chore to pick up trust changes
- fixed small typo in blueprint
- fixed flags for storj-sim
- wired up changes to testplanet

Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d
2019-12-13 20:32:50 +00:00
Rafael Antonio Ribeiro Gomes
da39c71d35
storagenode: add new metric satellite.request (#3610)
* storagenode: add new metric satellite.request

* storagenode: metrics fixed

* switch from Counter to Meter
2019-11-19 18:11:31 -03:00
Egon Elbre
ee6c1cac8a
private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00
Jeff Wendling
ebcd37c572 storagenode/contact: fix connection leak with contact checkin
Change-Id: If86002557144d5d8dbff939d2b6a2dfec6537577
2019-11-06 18:00:09 +00:00
littleskunk
7eb6724c92
logging: unify logging around satellite ID, node ID and piece ID (#3491)
* logging: unify logging around satellite ID, node ID and piece ID

* unify segment index
2019-11-05 22:04:07 +01:00
Jennifer Li Johnson
aa7d15a365
storagenode/contact: exponential backoff retries for pinging Satellites (#3372) 2019-11-04 16:03:21 -05:00
Egon Elbre
87687938d1 storagenode/contact: fix panic in ping satellites (#3447) 2019-11-01 16:20:53 +01:00
Jess G
4d85b11574
satellite/contact: improve errors in contact endpoints (#3356)
* improve errors in satellite contact endpoints

* add changes per CR comments

* update pingback method so it still updates node table

* fix err and returns

* fix zap logging to be better
2019-10-30 11:57:21 -07:00
Egon Elbre
93353df4d6
internal/sync2: make Fence accept context (#3393) 2019-10-28 16:04:31 +02:00
paul cannon
1469f7f41f
storagenode/contact: wait for UpdateSelf before start (#3332)
When the contact chore starts running before the monitor service has
provided any useful capacity data, the first outgoing contact has
not-very-helpful data for the satellite. This change causes the contact
chore to wait until capacity data is available. The wait should be quite
short in all reasonable cases: even when a node starts with a lot of
stored pieces and no cached spaceUsedDB data, new data will have been
calculated and cached by the call to
`peer.Storage2.CacheService.Init(ctx)` in `storagenode.cmdRun()` before
`peer.Run(ctx)`.

Change-Id: Ibc26d5c1fc10a23006c00bc3f13ff6cf71f8bf1d
2019-10-26 12:16:25 -05:00
Cameron
d17be58237 remove random sleep in storagenode contact (#3243) 2019-10-11 16:44:18 -04:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Jennifer Li Johnson
724bb44723
Remove Kademlia dependencies from Satellite and Storagenode (#2966)
What:

cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:

cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia

Please describe the tests:
• internal/testplanet/planet_test.go:

TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:

TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:

TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
2019-09-19 15:56:34 -04:00
Jess G
93788e5218
remove kademlia: create upsert query to update uptime (#2999)
* create upsert query for check-in method

* add tests

* fix lint err

* add benchmark test for db query

* fix lint and tests

* add a unit test, fix lint

* add address to tests

* replace print w/ b.Fatal

* refactor query per CR comments

* fix disqualified, only set if null

* fix query

* add version to updatecheckin query

* fix version

* fix tests

* change version for tests

* add version to tests

* add IP, add transport, mv unit test

* use node.address as arg

* add last ip

* fix lint
2019-09-19 11:37:31 -07:00
Jess G
d3ef574b20 pkg/pb: minor changes to contact.proto (#3048)
* minor fixes to contact proto

* simply and rm nodeAddr object from client
2019-09-13 19:37:32 -05:00
Cameron
44bcdd222f storagenode/contact: test node info is updated (#3039) 2019-09-13 07:53:48 -04:00
Jess G
2fc4d61610
implement contact.checkin method (#2952)
* implement contact.checkin method

* add batching to update uptime checks

* rm batching

* rm other unneeded things

* fix lint

* fix unit test

* changes per CR comments

* couple more CR changes

* add identity check into grpcOpt

* fix lint

* why do you fix the test

* revert test change

* stop contact chore for repair test

* put node in cache

* comment out contact chore. See what happens

* Revert "comment out contact chore. See what happens"

This reverts commit 2e45008e36a50e0a842ae455ac83de77093d4daa.

* try stopping contact earlier

* stop contact chore in uplink_test

* replace self on chore with *RoutingTable for access to latest node info

* Revert "stop contact chore in uplink_test"

This reverts commit 302db70f4071112d1b9f7ee0279225ea12757723.

* Revert "try stopping contact earlier"

This reverts commit 806cc3b82f9d598899dafd83da9315a1cb0cb43c.

* Revert "stop contact chore for repair test"

This reverts commit dd34de1cfdfc09b972186c9ab9a4f1e822446b79.
2019-09-10 09:05:07 -07:00
Egon Elbre
a801fab66a
all: add archview annotations (#2964) 2019-09-10 16:24:16 +03:00
Jennifer Li Johnson
3387750280 storagenode/contact: create chore for nodes to ping satellites (#2877)
Creates a chore for nodes to announce themselves to their trusted satellites. Runs on startup and every hour thereafter
2019-09-06 12:14:03 -04:00