Commit Graph

30 Commits

Author SHA1 Message Date
Egon Elbre
b04f9996c5 pkg/rpc: move test that needs testplanet
Move rpc test that uses testplanet into private/testplanet.

This ensures that rpc doesn't have the whole system as a dependency
making it easier to separate.

This unfortunately leaves pkg/rpc without specific tests, but
we would need to write new tests that only use the core packages.

Change-Id: I402ab3c2d50282af159c2ef3371d23b0997fef0a
2019-12-17 13:31:12 +00:00
Cameron Ayer
a4f9865b47 satellite: adds and enables cockroachdb compatibility for tests
Change-Id: I85a3ad8c3b9d7e15ea8675b6c55af0002933db57
2019-12-16 22:29:25 +00:00
Isaac Hess
0008aebf80 pkg/rpc: Change drpcheader to save a packet
This changes when we write the drpcheader. Rather than making it its own
write to the connection, it now prepends the drpc header to the first
write on the connection (typically the tls handshake). This results in
one less packet being sent at the beginning of each drpc connection.

For an operation like uploading a file from uplink, this saves many
packets: one when communicating with the satellite, and
one for each communication with the storage nodes.

Change-Id: I7644b46e90ffa7acea73ac56831396307352ed7a
2019-12-16 13:33:39 -07:00
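
As a rough illustration of the header change described above, the following sketch wraps a connection so that a header is prepended to the first write instead of being sent on its own. The names `headerConn` and `recordConn` and the header value "HDR" are hypothetical and are not taken from pkg/rpc.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// headerConn (hypothetical name) prepends header to the first Write so that
// the header and the first payload go out in a single write call.
type headerConn struct {
	net.Conn
	mu     sync.Mutex
	header []byte
}

func (c *headerConn) Write(p []byte) (int, error) {
	c.mu.Lock()
	if c.header == nil {
		c.mu.Unlock()
		return c.Conn.Write(p)
	}
	defer c.mu.Unlock()

	header := c.header
	c.header = nil

	buf := make([]byte, 0, len(header)+len(p))
	buf = append(buf, header...)
	buf = append(buf, p...)

	n, err := c.Conn.Write(buf)
	// Report only the bytes of p that were written, not the header bytes.
	n -= len(header)
	if n < 0 {
		n = 0
	}
	return n, err
}

// recordConn is a test double that records every Write it receives.
type recordConn struct {
	net.Conn // left nil; only Write is used in this sketch
	writes   [][]byte
}

func (r *recordConn) Write(p []byte) (int, error) {
	r.writes = append(r.writes, append([]byte(nil), p...))
	return len(p), nil
}

func main() {
	rec := &recordConn{}
	conn := &headerConn{Conn: rec, header: []byte("HDR")} // placeholder header
	_, _ = conn.Write([]byte("hello"))
	_, _ = conn.Write([]byte("world"))
	for _, w := range rec.writes {
		fmt.Printf("%q\n", w)
	}
}
```

Only the first write carries the header; later writes pass through unchanged.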
Jeff Wendling
23df647a15 pkg/rpc/rpcpool: add idle expiration to connections
long lived uplinks could just hold on to connections forever
if their client to the storagenode or satellite isn't closed.
this will prevent that from happening on the client. more
changes will be necessary to add appropriate prevention on
the servers.

Change-Id: Ib36d85e70cbafb315664ad7657bb70b936b3828c
2019-12-10 20:32:11 +00:00
Cameron Ayer
6fae361c31 replace planet.Start in tests with planet.Run
planet.Start starts a testplanet system, whereas planet.Run starts a testplanet
and runs a test against it with each DB backend (cockroach compat).

Change-Id: I39c9da26d9619ee69a2b718d24ab00271f9e9bc2
2019-12-10 16:55:54 +00:00
Jeff Wendling
53176dcb0e pkg/rpc/rpcstatus: do not depend on grpc/drpc build mode
if your server is built to make drpc connections, clients can
still connect with grpc. thus, your responses to grpc clients
must still look the same, so we have to have all of our status
wrapping include codes for both the drpc and grpc servers to
return the right thing.

Change-Id: If99fa0e674dec2e20ddd372a827f1c01b4d305b2
2019-11-18 15:51:58 -07:00
Jeff Wendling
f3b20215b0 pkg/{rpc,server,tlsopts}: pick larger defaults for buffer sizes
these may not be optimal but they're probably better based on
our previous testing. we can tune better in the future now that
the groundwork is there.

Change-Id: Iafaee86d3181287c33eadf6b7eceb307dda566a6
2019-11-18 21:22:49 +00:00
Egon Elbre
ee6c1cac8a private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00
Yingrong Zhao
d2a8ab5d7f pkg/pb: add referral manager protobuf definition (#3561) 2019-11-14 12:33:00 -05:00
JT Olio
a72bf6c254 pkg/rpc: generate drpc/grpc tags correctly (#3556)
Change-Id: Iac79d6134246e92876dd57e269a9c96c2de95884
2019-11-12 16:22:21 -07:00
Jeff Wendling
013e0d94bc pkg/rpc: ensure connections are quickly closed
drpc will call Close on any transport we pass to it, but some
transports (like tls.Conn) will attempt to notify the remote
side of things. we don't want to do that, so pass a new
interface that just closes the underlying socket.

Change-Id: I53344d2747de21b3146abe4f82b8394bb8948cb5
2019-11-12 15:53:36 +00:00
Jeff Wendling
f62107d3e9 pkg/rpc: fix grpc dial timeouts (#3517)
grpc doesn't exit dials right away if the context dialer
returns an error. since that's the only spot where we were
enforcing dial timeouts, dials could just leak for an
unknown amount of time.

add a timeout above the grpc dial because that's the documented
way that grpc expects to be canceled.

Change-Id: Ic47ac61ce8a5f721510cc2c4584f63d43fe4f2d5
2019-11-06 16:42:20 -07:00
Yaroslav Vorobiov
35edc2bcc3 satellite/payments: invoice creation (#3468) 2019-11-05 15:16:02 +02:00
Jeff Wendling
17e9044c0f pkg/rpc/rpcpeer: check both drpc and grpc for peers on a context
we don't know if an incoming connection is from drpc or grpc during
the migration time, so check both.

Change-Id: I2418dde8b651dcc4a23726057178465224a48103
2019-11-01 17:04:53 -06:00
JT Olio
41c0093e5b drpc: enable by default (#3452) 2019-11-01 22:43:24 +01:00
Jeff Wendling
51d5d8656a pkg/rpc: drpc connection pooling
keep a pool of connections open when dialing for drpc. this
makes it so that long lived clients (like lib/uplink's Project)
don't continue to use a bad connection forever. it also allows
for concurrent rpcs.

Change-Id: If649b286050e4f09c413fadc3e1ce88f5fc6e600
2019-10-22 18:15:24 -06:00
JT Olio
2c6fa3c5f8 pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335)
libuplink was incorrectly setting timeouts to 10 seconds still, but
should have been at least 10 minutes. the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic as it establishes
a minimum rate on tcp streams.

instead of all of this, just use tcp keep alive. tcp keep alive packets are
sent every 15 seconds and if the peer stops responding the connection
dies. this is enabled by default with go. this will kill tcp connections
when they stop working.

Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
2019-10-22 17:57:24 -06:00
Egon Elbre
f929310add pkg/rpc/rpcstatus: fix drpc grpc compatibility (#3306)
When code is compiled without -tags=drpc the statuses for drpc server
weren't handled, which meant an uplink using -tags=drpc didn't get the
correct status code.
2019-10-17 15:21:20 -04:00
Marc Schubert
93d5eeda31 Update dial.go (#3261)
What:
Bring back partial nodeID to debug.trace-out

Why:
The information is useful for interpreting the trace file and was there before drpc. This just brings it back.
https://github.com/storj/storj/blob/v0.21.3/pkg/transport/transport.go#L76

Please describe the tests:

Test 1:
Test 2:
Please describe the performance impact:
No impact.
2019-10-14 15:44:15 -06:00
Jennifer Li Johnson
b185dbbee2 satellite/discovery: remove discovery related code (#3175) 2019-10-14 10:57:01 -04:00
JT Olio
6ede140df1 pkg/rpc: defeat MITM attacks in most cases (#3215)
This change adds a trusted registry (via the source code) of node address to node id mappings (currently only for well known Satellites) to defeat MITM attacks to Satellites. It also extends the uplink UI such that when entering a satellite address by hand, a node id prefix can also be added to defeat MITM attacks with unknown satellites.

When running uplink setup, satellite addresses can now be of the form 12EayRS2V1k@us-central-1.tardigrade.io (not even using a full node id) to ensure that the peer contacted is the peer that was expected. When using a known satellite address, the known node ids are used if no override is provided.
2019-10-12 14:34:41 -06:00
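
The `prefix@address` form described above could be handled along these lines. This is only a sketch: `splitNodeAddress` and `matchesPrefix` are hypothetical helpers, node IDs are treated as plain strings, and the full ID used in `main` is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNodeAddress (hypothetical) splits "idprefix@host" into its parts.
// If there is no "@", the prefix is empty and the input is the address.
func splitNodeAddress(s string) (idPrefix, address string) {
	if i := strings.Index(s, "@"); i >= 0 {
		return s[:i], s[i+1:]
	}
	return "", s
}

// matchesPrefix reports whether the ID presented by the peer starts with
// the expected prefix. An empty prefix accepts any ID.
func matchesPrefix(peerID, prefix string) bool {
	return strings.HasPrefix(peerID, prefix)
}

func main() {
	prefix, addr := splitNodeAddress("12EayRS2V1k@us-central-1.tardigrade.io")
	fmt.Println(prefix, addr)

	// "12EayRS2V1kMADEUPSUFFIX" is an invented ID, for illustration only.
	fmt.Println(matchesPrefix("12EayRS2V1kMADEUPSUFFIX", prefix))
	fmt.Println(matchesPrefix("1OtherNodeID", prefix))
}
```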
Isaac Hess
9256399872 CI: test drpc and grpc (#3163)
* wip: test drpc

* Add parallel integration test

* Add jenkinsfile.drpc

* Remove unnecessary jenkinsfile items

* testing: GOFLAGS=-drpc (#3236)

* Use GOFLAGS

* add debug

* revert tags

* revert changes

* move goflags to the correct place

* add sanity check
2019-10-11 08:30:06 -06:00
Yingrong Zhao
743a0fc38b storagenode/cmd: create start graceful exit CLI (#3202) 2019-10-11 09:58:12 -04:00
Ethan Adams
447c219d92 satellite/gracefulexit: Add protobuf definitions for communication between storage node and satellite (#3201) 2019-10-08 13:42:56 -04:00
Jeff Wendling
4fab22d691 pkg/rpc: don't leak goroutines during a drpc dial
we spawned a goroutine to wait on the context's done
channel sending the error afterward, but we forgot
to ensure the context was eventually done, so the
goroutine would be leaked until then.

instead, we can just do a select on two channels to
get the error rather than spawn a goroutine which
makes it impossible to leak a goroutine.

Change-Id: I2fdba206ae6ff7a3441b00708b86b36dfeece2b5
2019-10-04 20:09:36 +00:00
Jeff Wendling
64e43e555e pkg/rpc: return context error if ready after DialContext fails
the net package does not make it easy to know if DialContext
failed because the context was done. it's important for some
of our tests that canceled contexts are detected as such, so
we accept the small race that's arguably correct (the context
must be canceled asynchronously) to ensure we always return
the context error if available.

Change-Id: I058064d5c666e5353b74fb5bd300bf7abe537ff5
2019-10-04 20:09:00 +00:00
Isaac Hess
94c7df0d6e pkg/rpc/rpcstatus: Fix return type (#3162) 2019-10-02 14:46:18 -06:00
Jeff Wendling
93349f247e pkg/rpc: add WithInsecure when doing non-tls dials
Change-Id: I993f223f4ac78824b75a7725342ebf2ae0f74254
2019-09-27 09:07:14 -06:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Jeff Wendling
a20a7db793 pkg/rpc: build tag based selection of rpc details
It provides an abstraction around the rpc details so that one
can use drpc or grpc with the same code. It subsumes using the
protobuf package directly for client interfaces as well as
the pkg/transport package to perform dials.

Change-Id: I8f5688bd71be8b0c766f13029128a77e5d46320b
2019-09-20 21:07:33 +00:00