Up to now, we have been implementing the DistinctIP preference with code
in two places:
1. On check-in, the last_net is determined by taking the /24 or /64
(in ResolveIPAndNetwork()) and we store it with the node record.
2. On node selection, a preference parameter defines whether to return
results that are distinct on last_net.
It can be observed that we have never yet had the need to switch from
DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the
same satellite, and we will probably never need to do so in an automated
way. It can also be observed that this arrangement makes tests more
complicated, because we often have to arrange for test nodes to have IP
addresses in different /24 networks (a particular pain on macOS).
Those two considerations, plus some pending work on the repair framework
that will make repair take last_net into consideration, motivate this
change.
With this change, in the #2 place, we will _always_ return results that
are distinct on last_net. We implement the DistinctIP preference, then,
by making the #1 place (ResolveIPAndNetwork()) more flexible. When
DistinctIP is enabled, last_net will be calculated as it was before. But
when DistinctIP is _off_, last_net can be the same as address (IP and
port). That will effectively implement !DistinctIP because every
record will have a distinct last_net already.
As a side effect, this flexibility will allow us to change the rules
about last_net construction arbitrarily. We can do tests where last_net
is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and
be able to exercise the production logic without requiring a virtual
network bridge.
This change should be safe to make without any migration code, because
all known production satellite deployments use DistinctIP, and the
associated last_net values will not change for them. They will only
change for satellites with !DistinctIP, which are mostly test
deployments that can be recreated trivially. For those satellites which
are both permanent and !DistinctIP, node selection will suddenly start
acting as though DistinctIP is enabled, until the operator runs a single
SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done
either before or after deploying software with this change.
I also assert that this will not hurt performance for production
deployments. It's true that adding the distinct requirement to node
selection makes things a little slower, but the distinct requirement is
already present for all production deployments, and they will see no
change.
Refs: https://github.com/storj/storj/issues/5391
Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924
this change uses the new storj/common noise helpers, which:
* add a security fix (require an expected node id for validating
noise key attestations)
* stops doing an unnecessary order signature validation (it's
already been done inside of PutPiece)
* removes some duplicate code
Change-Id: I5e67a08ff216cd9c5b0b82e40b4d9de664b6b0fc
This change swaps net.IP.IsPrivateIP usages with custom isPrivateIP to
unbreak the build as we want to build for earlier than Go 1.17.
Change-Id: I44badbb487f35e43b8b0433ad0f3b9c87af718d4
prevent network enumeration by rejecting privateIPs in PingMe and
Checkin endpoints
Closesstorj/storj-private#32
Change-Id: I63f00483ff4128ebd5fa9b7b8da826a5706748c9
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:
ERROR node stats service error: satellitedbs error: node stats database error: no rows
Whereas something like:
ERROR nodestats service: satellitedbs: nodestatsdb: no rows
Would contain all the necessary information without the stutter.
Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
QUIC
We want to encourage storagenodes to open their udp port. This PR
changes contact service in satellite to try to connect to nodes through
QUIC. If satellite can't reach nodes through quic, it will send an error
message back to nodes. On the nodes side, it will always log out error
message from check in if the error message is not empty.
Whether satellite can reach nodes through quic has no affect on nodes'
uptime check.
Change-Id: I5ebf80f921c4a6504997d83c8bd45226da9d3703
My understanding is that the nodes table has the following fields:
- `address` field which can be a hostname or an IP
- `last_net` field that is the /24 subnet of the IP resolved from the address
This PR does the following:
1) add back the `last_ip` field to the nodes table
2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time.
3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date
4) try to reduce confusion about hostname, ip, subnet, and address in the code base
Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac
* satellite/nodeselection: dont select nodes that havent checked in for a while
* change testplanet online window to one minute
* remove satellite reconfigure online window = 0 in repair tests
* pass timestamp into UpdateCheckIn
* change timestamp to timestamptz
* edit tests to set last_contact_success to 4 hours ago
* fix syntax error
* remove check for last_contact_success > last_contact_failure in IsOnline
* improve errors in satellite contact endpoints
* add changes per CR comments
* update pingback method so it still updates node table
* fix err and returns
* fix zap logging to be better
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
* create upsert query for check-in method
* add tests
* fix lint err
* add benchmark test for db query
* fix lint and tests
* add a unit test, fix lint
* add address to tests
* replace print w/ b.Fatal
* refactor query per CR comments
* fix disqualified, only set if null
* fix query
* add version to updatecheckin query
* fix version
* fix tests
* change version for tests
* add version to tests
* add IP, add transport, mv unit test
* use node.address as arg
* add last ip
* fix lint
* implement contact.checkin method
* add batching to update uptime checks
* rm batching
* rm other unneeded things
* fix lint
* fix unit test
* changes per CR comments
* couple more CR changes
* add identity check into grpcOpt
* fix lint
* why do you fix the test
* revert test change
* stop contact chore for repair test
* put node in cache
* comment out contact chore. See what happens
* Revert "comment out contact chore. See what happens"
This reverts commit 2e45008e36a50e0a842ae455ac83de77093d4daa.
* try stopping contact earlier
* stop contact chore in uplink_test
* replace self on chore with *RoutingTable for access to latest node info
* Revert "stop contact chore in uplink_test"
This reverts commit 302db70f4071112d1b9f7ee0279225ea12757723.
* Revert "try stopping contact earlier"
This reverts commit 806cc3b82f9d598899dafd83da9315a1cb0cb43c.
* Revert "stop contact chore for repair test"
This reverts commit dd34de1cfdfc09b972186c9ab9a4f1e822446b79.