This PR introduces a new listener that can listen for quic traffic on
both storagenodes and satellites.
Change-Id: I5eb5bc82c37dde20d3be2ec8fa5f69c18fae0af0
This will help to determine how many grpc calls are made to the
satellite.
Also remove the grpc funcs that have been added to upstream.
Change-Id: I91878f4fd10f9bfe601c94222c102eaaf4d35963
This moves grpc related tlsopts methods to private/grpctlsopts.
This allows to remove grpc dependency from tlsopts.
Change-Id: I25090b82b1e7a0633417ad600f8587b0c30ace73
Go will, by default, set tcp keep alives on sockets. But
the kernel does not send keep alives to sockets that have
a non-empty send queue. That can cause connections that
hang forever.
So we set TCP_USER_TIMEOUT on all of the sockets as well.
That option will close any connection that has not received
an ack for any sent data (keep alive or otherwise) in the
configured time period. This places an upper bound on the
amount of time a socket can be stuck due to a client not
acknowleding data.
See https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
for more information on what these options do and how they
interact.
Additionally, make sure that we close every connection coming
from the listeners by wrapping them in a type with a finalizer
that closes the connection, much like the os package does for
file handles. It monitors if a connection was closed due to a
finalizer so that we can go and look for the bug if we ever
see a non-zero value.
Change-Id: Idc6c0564224b8dc2e4c9d769e80374ed1fe8cce0
these may not be optimal but they're probably better based on
our previous testing. we can tune better in the future now that
the groundwork is there.
Change-Id: Iafaee86d3181287c33eadf6b7eceb307dda566a6
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
The fundamental problem is that both drpc and grpc servers
want to close the listener and they both want to ignore the
error from Accept after the listener is closed. There's no
way to do this in a race free way. Fortunately, the mux
hands out listeners that can be independently closed. That
means they can both do their own shutdown logic where they
ignore the error, and then after they're closed, the code
orchestrating the servers can close the listeners.
The final weird bit is that the server's Close method is
required to wait until the Run method has exited (or at
least enough for the listeners to definitely be closed)
because tests depend on that behavior, so we have to add
some channels/mutexes/onces to ensure that Run has exited
and that a new call can't start after Close is called.
Change-Id: I7c4ef293f7963f83138815f51824fd5b8d09ce15
* pkg/server: don't use global logger
* satellite/overlay: use correct logger
* pkg/kademlia: use correct logger
* linksharing: use conventional way to pass in logger
* use zaptest in tests
* add private listener to grpc server
* add changes per init CR
* fix server.close
* add insecure grpc connection, update logs msg
* fix tests, move insecure client
* add private ports to storj-sim, add insecure client to other inspectors
* add ports to test so there arent conflicts
* fix lint err
* fix node started log msg, close public listener
* remove commented out line
* separate TLS options from server options (because we need them for dialing too)
* stop creating transports in multiple places
* ensure that we actually check revocation, whitelists, certificate signing, etc, for all connections.