Commit Graph

1246 Commits

Author SHA1 Message Date
Jennifer Li Johnson
7ceaabb18e
Delete Bootstrap and Kademlia (#2974) 2019-10-04 16:48:41 -04:00
Jeff Wendling
4fab22d691 pkg/rpc: don't leak goroutines during a drpc dial
we spawned a goroutine to wait on the context's done
channel sending the error afterward, but we forgot
to ensure the context was eventually done, so the
goroutine would be leaked until then.

instead, we can just do a select on two channels to
get the error rather than spawn a goroutine which
makes it impossible to leak a goroutine.

Change-Id: I2fdba206ae6ff7a3441b00708b86b36dfeece2b5
2019-10-04 20:09:36 +00:00
Jeff Wendling
64e43e555e pkg/rpc: return context error if ready after DialContext fails
the net package does not make it easy to know if DialContext
failed because the context was done. it's important for some
of our tests that canceled contexts are detected as such, so
we accept the small race that's arguably correct (the context
must be canceled asynchronously) to ensure we always return
the context error if available.

Change-Id: I058064d5c666e5353b74fb5bd300bf7abe537ff5
2019-10-04 20:09:00 +00:00
Jeff Wendling
c9e0aa7c70 pkg/kademlia: make tests run/work with drpc
Change-Id: I69372fd8f0d52913e1ad2cf7d01115460ba8eeda
2019-10-03 15:33:25 -06:00
littleskunk
b2e328f118 storagenode/dashboard: update online status (#3168) 2019-10-03 20:31:39 +02:00
Isaac Hess
94c7df0d6e
pkg/rpc/rpcstatus: Fix return type (#3162) 2019-10-02 14:46:18 -06:00
Jennifer Li Johnson
29b96a666b
internal/testplanet: fix conn leak (#3132) 2019-09-27 09:47:57 -06:00
Jeff Wendling
93349f247e pkg/rpc: add WithInsecure when doing non-tls dials
Change-Id: I993f223f4ac78824b75a7725342ebf2ae0f74254
2019-09-27 09:07:14 -06:00
Bryan White
c8aa821ccb
pkg/certificates: move certificate package to root (#3107) 2019-09-26 09:11:05 -07:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Bryan White
a7040647a4
run certificate authorization endpoint (#3108) 2019-09-23 15:19:13 -07:00
Jeff Wendling
d32d85a717 pkg/listenmux: resolve deadlock in test
it was possible, because we spawned Run before we did any calls
to Route, that the listenmux would send multiple connections to
the default listener. Fix that by ensuring we call Route before
we call Run.

Change-Id: Ie8fd754997975969a99fd2a3f8d3010c24cdc73d
2019-09-20 21:16:59 +00:00
Jeff Wendling
a20a7db793 pkg/rpc: build tag based selection of rpc details
It provides an abstraction around the rpc details so that one
can use dprc or gprc with the same code. It subsumes using the
protobuf package directly for client interfaces as well as
the pkg/transport package to perform dials.

Change-Id: I8f5688bd71be8b0c766f13029128a77e5d46320b
2019-09-20 21:07:33 +00:00
Jennifer Li Johnson
724bb44723
Remove Kademlia dependencies from Satellite and Storagenode (#2966)
What:

cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:

cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia

Please describe the tests:
• internal/testplanet/planet_test.go:

TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:

TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:

TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
2019-09-19 15:56:34 -04:00
Jess G
93788e5218
remove kademlia: create upsert query to update uptime (#2999)
* create upsert query for check-in method

* add tests

* fix lint err

* add benchmark test for db query

* fix lint and tests

* add a unit test, fix lint

* add address to tests

* replace print w/ b.Fatal

* refactor query per CR comments

* fix disqualified, only set if null

* fix query

* add version to updatecheckin query

* fix version

* fix tests

* change version for tests

* add version to tests

* add IP, add transport, mv unit test

* use node.address as arg

* add last ip

* fix lint
2019-09-19 11:37:31 -07:00
Kaloyan Raev
45df0c5340
storagenode/process: respond to Windows Service events (#3025) 2019-09-19 19:37:40 +03:00
JT Olio
946ec201e2
metainfo: move api keys to part of the request (#3069)
What: we move api keys out of the grpc connection-level metadata on the client side and into the request protobufs directly. the server side still supports both mechanisms for backwards compatibility.

Why: dRPC won't support connection-level metadata. the only thing we currently use connection-level metadata for is api keys. we need to move all information needed by a request into the request protobuf itself for drpc support. check out the .proto changes for the main details.

One fun side-fact: Did you know that protobuf fields 1-15 are special and only use one byte for both the field number and type? Additionally did you know we don't use field 15 anywhere yet? So the new request header will use field 15, and should use field 15 on all protobufs going forward.

Please describe the tests: all existing tests should pass

Please describe the performance impact: none
2019-09-19 10:19:29 -06:00
Jess G
695de9dcd7
rm noisy debug logs that we dont need (#3083) 2019-09-18 12:43:57 -07:00
Egon Elbre
186e67e056 pkg/transport: set default timeout to 10 minutes (#3075) 2019-09-18 11:56:23 -04:00
Maximillian von Briesen
574c96c350
satellite/metainfo: Verify storagenode signature on satellite upload (#2985) 2019-09-18 09:50:33 -04:00
Jess G
7c203b4884
add satelliteSystem to testplanet and update tests (#3066) 2019-09-17 13:14:49 -07:00
Natalie Villasana
cc70cd2329
satellite/repair: add metric trackers for segment age before repair (#3056) 2019-09-17 15:18:48 -04:00
Ivan Fraixedes
febe32bc7a pkg/miniogw: Add a missed stack trace error (#3035)
Add the stack trace to an error returned by one of the functions for
being able to track down the origin in case the error happens and gets
logged.
2019-09-16 23:00:50 +03:00
Jess G
d3ef574b20 pkg/pb: minor changes to contact.proto (#3048)
* minor fixes to contact proto

* simply and rm nodeAddr object from client
2019-09-13 19:37:32 -05:00
Andrew Harding
f550ab5d1c
Uplink "import" command (#2981)
* uplink import cmd

* pkg/process: fix import order

* fix golangci-lint failures

* remove "help" from the satellite config lock file
2019-09-13 12:33:30 -06:00
Jeff Wendling
0dcbd3dc08 bootstrap/satellite/certificate/storagenode: register drpc services
Change-Id: Id29f14b76a8c9cb2be31001b9a7a4356a4bda183
2019-09-12 15:09:46 -06:00
Jeff Wendling
007662d49e pkg/server: serve drpc listeners along with grpc
The fundamental problem is that both drpc and grpc servers
want to close the listener and they both want to ignore the
error from Accept after the listener is closed. There's no
way to do this in a race free way. Fortunately, the mux
hands out listeners that can be independently closed. That
means they can both do their own shutdown logic where they
ignore the error, and then after they're closed, the code
orchestrating the servers can close the listeners.

The final weird bit is that the server's Close method is
required to wait until the Run method has exited (or at
least enough for the listeners to definitely be closed)
because tests depend on that behavior, so we have to add
some channels/mutexes/onces to ensure that Run has exited
and that a new call can't start after Close is called.

Change-Id: I7c4ef293f7963f83138815f51824fd5b8d09ce15
2019-09-12 19:18:30 +00:00
Jeff Wendling
477b47f554 pkg/server: use a listenmux with nothing registered
Change-Id: I25577b4afb907f4f8b57fc0428de6e6ea4ce9ba9
2019-09-12 19:12:53 +00:00
paul cannon
c139ed8ea1 storagenode/console: remove kademlia (#2942)
this is a trivial operation for storagenode/console, as it doesn't
really need or use kademlia in the first place.

What:

Removes kademlia from storagenode/console

Why:

We are in the process of getting rid of kademlia, and this is one place where it's particularly easy.

Please describe the tests:

Existing tests exercise storagenode/console behavior; if they continue to work, everything here should be tested satisfactorily.
Please describe the performance impact:

None
2019-09-11 16:41:43 -04:00
Jeff Wendling
708c95d044 pkg/listenmux: multiplex listener based on first bytes
Change-Id: If96bfd216e55f9950a42ab5be712a3cdec257a10
2019-09-11 11:16:09 -06:00
Bryan White
6c80f01bf0
pkg/certificates: add authorization endpoint and refactor (#2971) 2019-09-11 10:36:44 +02:00
paul cannon
7cf5650560 pkg/certificate: properly close certificateclient.Client (#2986) 2019-09-10 19:24:41 +03:00
Michal Niewrzal
64c467ffe7
uplink: integrate new Metainfo calls (#2640) 2019-09-10 08:39:47 -07:00
Michal Niewrzal
0ccae6b061 cmd: windows log file workaround (#2979) 2019-09-10 12:35:59 +03:00
Egon Elbre
e03e6844f8
pkg/process: reduce noise in storj-sim (#2988)
* Don't display stack for failure to start debug endpoints
* Don't display stack for disabled telemetry
2019-09-10 10:32:53 +03:00
Egon Elbre
40ca660c06
all: use min tls 1.2 for grpc (#2967) 2019-09-09 23:09:01 +03:00
Jeff Wendling
60eba990eb use-drpc: use protoc-gen-drpc to generate protobufs
Change-Id: I5c23256068e30864022dba5137c499796ab9d6ad
2019-09-06 13:28:27 -06:00
Egon Elbre
a3e0955e16
satellite/satellitedb: ensure that we process orders in order (#2950)
When transactions are handled in different orders there is a potential for a deadlock.
2019-09-06 17:49:30 +03:00
Bryan White
1fc0c63a1d
{cmd,pkg}/certificates: service refactor (#2938) 2019-09-05 17:11:21 +02:00
Jess G
f7bae57e5b
pk/pb: add initial proto for the new contact endpoints (#2948)
* add init contact proto

* make 2 svcs

* add geenerated proto code

* fux conflict with NodeRequest name

* use correct version on proto

* rm extra node fields

* add protolock

* update field names so they are better

* rm node id since we dont need it
2019-09-04 10:07:06 -07:00
Ivan Fraixedes
d7d6e23a3e
pkg/server: Don't use Sugared logger (#2935)
Replace the usage of the Sugared logger by the normal one.
2019-09-03 11:39:26 +02:00
JT Olio
48ba45eb55
pkg/process prometheus: forgot digits (#2920)
Change-Id: I1825d31dedcce45a80663d27261e2db0cb14c0f1
2019-08-29 17:51:03 -06:00
Jess G
b20dcfd64c add type to prometheus metrics (#2916) 2019-08-29 23:42:38 +02:00
JT Olio
b3f9a8813d pkg/process: remove prometheus help (#2914)
the current prometheus help messages have enough unexpected
characters that they are breaking prometheus parsing. they
may also be triggering prometheus to expect more from us (type
annotations) than we have to offer.

we're really not adding a lot of value with these help messages,
so just take them out

Change-Id: I9b723447a294bb492a6292480e9f88634346a80b
2019-08-29 12:42:11 -07:00
Egon Elbre
c309bd3fec
lint: add linting for errs package (#2881) 2019-08-27 19:07:12 +03:00
Bill Thorp
a250551b6d storagenode/piecestore + uplink/piecestore: return PieceHash and original OrderLimit during GET_REPAIR (#2775) 2019-08-26 14:57:41 -04:00
Yaroslav Vorobiov
2ae4129d06
satellite/nodestats: add disqualified flag #2856 2019-08-23 13:58:20 +03:00
JT Olio
12d50ebb99
streams: don't encrypt segment count (#2859)
What: this change makes sure the count of segments is not encrypted.

Why: having the segment count encrypted just makes things hard for no reason - a satellite operator can figure out how many segments an object has by looking at the other segments in the database. but if a user has access but has lost their encryption key, they now can't clean up or delete old segments because they can't know how many there are without just guessing until they get errors. :(

Backwards compatibility: clients will still understand old pointers and will still write old pointers. at some point in the future perhaps we can do a migration for remaining old pointers so we can delete the old code.

Please describe the tests: covered by existing tests

Please describe the performance impact: none
2019-08-22 15:15:58 -06:00
Egon Elbre
00b2e1a7d7 all: enable staticcheck (#2849)
* by having megacheck in disable it also disabled staticcheck

* fix closing body

* keep interfacer disabled

* hide bodies

* don't use deprecated func

* fix dead code

* fix potential overrun

* keep stylecheck disabled

* don't pass nil as context

* fix infinite recursion

* remove extraneous return

* fix data race

* use correct func

* ignore unused var

* remove unused consts
2019-08-22 13:40:15 +02:00
Egon Elbre
9ec0ceddf3
pkg/revocation: ensure we close revocation databases (#2825) 2019-08-20 18:04:17 +03:00