Commit Graph

59 Commits

Author SHA1 Message Date
JT Olio
2a63225b98 satellite/{contact,satellitedb}: preserve node message debounce support
Change-Id: I453ad35fda4e61d068db0c476dd86b50d7f2d705
2023-03-20 16:13:06 +00:00
paul cannon
2522ff09b6 satellite/overlay: configurable meaning of last_net
Up to now, we have been implementing the DistinctIP preference with code
in two places:

 1. On check-in, the last_net is determined by taking the /24 or /64
    (in ResolveIPAndNetwork()) and we store it with the node record.
 2. On node selection, a preference parameter defines whether to return
    results that are distinct on last_net.

It can be observed that we have never yet had the need to switch from
DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the
same satellite, and we will probably never need to do so in an automated
way. It can also be observed that this arrangement makes tests more
complicated, because we often have to arrange for test nodes to have IP
addresses in different /24 networks (a particular pain on macOS).

Those two considerations, plus some pending work on the repair framework
that will make repair take last_net into consideration, motivate this
change.

With this change, in the #2 place, we will _always_ return results that
are distinct on last_net. We implement the DistinctIP preference, then,
by making the #1 place (ResolveIPAndNetwork()) more flexible. When
DistinctIP is enabled, last_net will be calculated as it was before. But
when DistinctIP is _off_, last_net can be the same as address (IP and
port). That will effectively implement !DistinctIP because every
record will have a distinct last_net already.

As a side effect, this flexibility will allow us to change the rules
about last_net construction arbitrarily. We can do tests where last_net
is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and
be able to exercise the production logic without requiring a virtual
network bridge.

This change should be safe to make without any migration code, because
all known production satellite deployments use DistinctIP, and the
associated last_net values will not change for them. They will only
change for satellites with !DistinctIP, which are mostly test
deployments that can be recreated trivially. For those satellites which
are both permanent and !DistinctIP, node selection will suddenly start
acting as though DistinctIP is enabled, until the operator runs a single
SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done
either before or after deploying software with this change.

I also assert that this will not hurt performance for production
deployments. It's true that adding the distinct requirement to node
selection makes things a little slower, but the distinct requirement is
already present for all production deployments, and they will see no
change.

Refs: https://github.com/storj/storj/issues/5391
Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924
2023-03-09 02:20:12 +00:00
JT Olio
522aed083d private/server,satellite/contact,misc: use new storj/common noise helpers
this change uses the new storj/common noise helpers, which:
 * add a security fix (require an expected node id for validating
   noise key attestations)
 * stops doing an unnecessary order signature validation (it's
   already been done inside of PutPiece)
 * removes some duplicate code

Change-Id: I5e67a08ff216cd9c5b0b82e40b4d9de664b6b0fc
2023-02-07 09:53:45 -05:00
JT Olio
2753d5a32f satellite/overlay: keep track of noise info per node
Change-Id: Icef04c3e87dbf4bb57d3837274c323bf6dd2c81f
2023-02-01 23:03:35 -05:00
JT Olio
572c5b305b satellite/context: fix pingme test
Change-Id: I30fcfe1dea43d65e457c76dd44f475b5c197cbee
2023-01-30 15:01:32 -05:00
JT Olio
e40191afd6 storj: upgrade to use latest storj/common NodeAddress
Change-Id: I5987391bcfe5f6dfd7b525698c337a4cbda9b76e
2023-01-25 01:37:26 +00:00
Márton Elek
97679a39ff satellite/contact: emit evenkit events in case of node checkin
Change-Id: I2da4b3055b410e476d63cc6addf982a130dba611
2022-11-14 10:38:58 +00:00
Artur M. Wolff
8c1caea5db satellite/contact: swap net.IP.IsPrivateIP with isPrivateIP
This change swaps net.IP.IsPrivateIP usages with custom isPrivateIP to
unbreak the build as we want to build for earlier than Go 1.17.

Change-Id: I44badbb487f35e43b8b0433ad0f3b9c87af718d4
2022-06-13 01:01:44 +02:00
Paul Willoughby
911cc1e163 satellite/contact: reject privateIPs in PingMe and CheckIn endpoints
prevent network enumeration by rejecting privateIPs in PingMe and
Checkin endpoints

Closes storj/storj-private#32

Change-Id: I63f00483ff4128ebd5fa9b7b8da826a5706748c9
2022-06-07 08:09:14 +00:00
Clement Sam
2de6feefde satellite/contact: fix connection leak in PingMe endpoint
Fixes connection leak in pingMe endpoint.
includes other minor fixes.

Change-Id: I7c61f620565f46dd113d21a772de7c439be550e3
2022-01-20 14:20:24 +00:00
Clement Sam
9f3c1f9cda satellite/contact: add PingMe endpoint
Change-Id: I832a72fafeacf76ad64a0129bcc6582cc4f9290d
2022-01-19 17:52:33 +00:00
Artur M. Wolff
2cd68bf4fb private/lrucache: import from common
Change-Id: Ia1f43d0440fef21122b071b05da59b4cf2689d6c
2021-08-16 10:04:32 +00:00
Yingrong Zhao
077ec96d94 private/server: use quic implementation from storj.io/common
Change-Id: I820cf6444a3ddccee0d7c647dc84c80b2752068c
2021-08-10 13:32:21 +00:00
JT Olio
da9ca0c650 testplanet/satellite: reduce the number of places default values need to be configured
Satellites set their configuration values to default values using
cfgstruct, however, it turns out our tests don't test these values
at all! Instead, they have a completely separate definition system
that is easy to forget about.

As is to be expected, these values have drifted, and it appears
in a few cases test planet is testing unreasonable values that we
won't see in production, or perhaps worse, features enabled in
production were missed and weren't enabled in testplanet.

This change makes it so all values are configured the same,
systematic way, so it's easy to see when test values are different
than dev values or release values, and it's less hard to forget
to enable features in testplanet.

In terms of reviewing, this change should be actually fairly
easy to review, considering private/testplanet/satellite.go keeps
the current config system and the new one and confirms that they
result in identical configurations, so you can be certain that
nothing was missed and the config is all correct.
You can also check the config lock to see what actual config
values changed.

Change-Id: I6715d0794887f577e21742afcf56fd2b9d12170e
2021-06-01 22:14:17 +00:00
JT Olio
b07a39bfea satellite: log check in success node id
This is so we can see what's going on if we get a weird node DoS thing again

Change-Id: I5a14c95277562e496fcefb6d368068a6ec1dbc9f
2021-06-01 19:35:24 +00:00
JT Olio
1852773e3e satellite/contact: rate limit node checkins
Change-Id: Ied386a2350aa073de46443e5259b56d49ec61dbf
2021-05-17 08:15:04 +00:00
Egon Elbre
961e841bd7 all: fix error naming
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:

    ERROR node stats service error: satellitedbs error: node stats database error: no rows

Whereas something like:

    ERROR nodestats service: satellitedbs: nodestatsdb: no rows

Would contain all the necessary information without the stutter.

Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
2021-04-29 15:38:21 +03:00
Egon Elbre
7802ab714f pkg/,private/: merge with private package
Initially there were pkg and private packages, however for all practical
purposes there's no significant difference between them. It's clearer to
have a single private package - and when we do get a specific
abstraction that needs to be reused, we can move it to storj.io/common
or storj.io/private.

Change-Id: Ibc2036e67f312f5d63cb4a97f5a92e38ae413aa5
2021-04-23 16:37:28 +03:00
Yingrong Zhao
43f5888052 satellite/contact: only test PingBack failure case
The original test caused the testplanet to timeout due to contact
service stuck in a loop. We can change it to only test the failure case for
PingBack method instead of closing the TCP port on storagenodes.

Change-Id: Ic96aee637b39ae95050c6902c2bf9ca51fb586c3
2021-04-21 14:20:36 -04:00
Yingrong Zhao
a3c437a7bf satellite/contact,storagenode/contact: try ping back to nodes through
QUIC

We want to encourage storagenodes to open their udp port. This PR
changes contact service in satellite to try to connect to nodes through
QUIC. If satellite can't reach nodes through quic, it will send an error
message back to nodes. On the nodes side, it will always log out error
message from check in if the error message is not empty.
Whether satellite can reach nodes through quic has no affect on nodes'
uptime check.

Change-Id: I5ebf80f921c4a6504997d83c8bd45226da9d3703
2021-04-20 19:25:37 +00:00
Egon Elbre
86e698f572 pb: use *UnimplementedServer to avoid breaking API changes
Change-Id: I99a34eeb37ac4453411f273511710562a519f57a
2021-03-29 12:26:10 +03:00
Yaroslav Vorobiov
966535e9de {storagenode,satellite}/nodeoperator: add wallet features
Change-Id: Iac7eb40a52b8fddcc573aebaad2e3a30a10cded9
2021-02-08 22:09:45 +02:00
Cameron Ayer
d14607a5f7 satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns
Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de
2021-01-19 15:43:02 -05:00
Moby von Briesen
d75e4be11f satellite/{accounting, contact}: Remove periods and spaces from metrics.
Change-Id: I84179c2931293e3a1eb0ff8050416d25e481ce07
2020-12-03 15:33:01 +00:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Egon Elbre
bef84a5f9d storagenode: remove dependency to overlay.NodeDossier
This is the last dependency from storage node to satellite.

Change-Id: I12f7abb91e84f823ba5af126c6e2979519838612
2020-05-21 08:37:13 +03:00
Egon Elbre
5d016425f1 satellite/{contact,downtime,overlay}: use NodeURL
Change-Id: I555a479a89e0ddbf0499898bdbc8574282cd6846
2020-05-20 11:09:05 +00:00
Egon Elbre
941d10cbc3 private/testplanet: remove Peer.Local()
Currently storagenode depends on overlay.NodeDossier, this is the first
step in removing it.

Change-Id: I034a3f1601835f8349bd41752455022e19bcc707
2020-05-20 11:05:34 +00:00
Egon Elbre
ed627144ed all: use DialNodeURL throughout the codebase
Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c
2020-05-20 10:36:30 +00:00
Jess G
6fac1ed130
satellite/contact: rm unhelpful verbose logging (#3894)
* unhelpful rm verbose logging

Change-Id: I74dc29f18969250fa6838fed9bec8ba7ea160b77

* change to debug

Change-Id: I4ba880e68b40bc3e0f29d9b1aa9154e63124c904
2020-05-19 08:09:00 -07:00
Cameron Ayer
42be4bdc0f satellite/contact: add timeout to PingBack method
Change-Id: I2ec2f82e2e10d8be16f82e9de13ce42358e47c98
2020-04-04 18:26:30 +00:00
Yingrong Zhao
b7b19289d1 bump storj.io/common to latest
Change-Id: I16e337660ce8e1ef332cc842dbf4cfa067b9b98b
2020-03-25 09:08:40 -04:00
Jess G
39cb821196
satellite/overlay: rm combinedcache, fix IP naming to be network (#3798)
* rn combinedcache, rm dns node lookup

Change-Id: I239f07211764b097d851230d8c81900a47756e9e

* excludeIPs -> excludedNetworks

Change-Id: Ifa6f44ab17457cdd5aff4cd5694296867c18b179

* use lowercase var name

Change-Id: I825aad2b718c71f455e747be18f8cabd02aabe55

* update Getnetwork name

Change-Id: I002a1b7bc6b4ef40159c0cd2b0ef209f80a9c503

* fix comments

Change-Id: Ibddf5b9ffa9d685af6c392d893db063ef18e45fa

* update comments with ipv6

Change-Id: I31758b7d4979e7c27d014668f4fb532ad838cda2

Co-authored-by: Stefan Benten <mail@stefan-benten.de>
2020-03-12 11:37:57 -07:00
Jessica Grebenschikov
803e2930f4 satellite: use IP for all uplink operations, use hostname for audit and repairs
My understanding is that the nodes table has the following fields:
- `address` field which can be a hostname or an IP
- `last_net` field that is the /24 subnet of the IP resolved from the address

This PR does the following:
1) add back the `last_ip` field to the nodes table
2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time.
3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date
4) try to reduce confusion about hostname, ip, subnet, and address in the code base

Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac
2020-03-11 09:11:40 -07:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Yingrong Zhao
db8aee0806 satellite/contact; storagenode/preflight: add clock check on startup for storagenode
add config preflight.enabled-local-time

Change-Id: I7b942c9bee063aae409ee6721ae9d079dff0144f
2020-01-15 15:35:26 +00:00
Yingrong Zhao
ee87846f0b satellite/contact: add placeholder for GetTime endpoint
Change-Id: I42f8479708f0558350c2280a398d84d145e8118f
2020-01-14 06:38:47 +00:00
Ethan
8859c36234 satellite/{downtime,contact}: Add CheckNodeAvailability for use within the downtime tracking chores.
https://storjlabs.atlassian.net/browse/V3-2545

Change-Id: I1dd54a0c77cb4905bb1f350beeb82c6f7700ee70
2020-01-02 18:24:11 +00:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Egon Elbre
d55288cf68 pkg/rpc: replace methods with direct calls to pb
Change-Id: I8bd015d8d316a2c12c1daceca1d9fd257f6f57bc
2019-12-22 17:12:43 +02:00
Egon Elbre
006baa9ca6 pkg/rpc: remove drpc aliases
We need to split up pb package, which means we cannot have a core package
that depends on them.

Change-Id: I7f4f6fd82f89a51a9b2ad08bf2b1207253b8a215
2019-12-22 16:58:08 +02:00
littleskunk
8b3444e088
satellite/nodeselection: don't select nodes that haven't checked in for a while (#3567)
* satellite/nodeselection: dont select nodes that havent checked in for a while

* change testplanet online window to one minute

* remove satellite reconfigure online window = 0 in repair tests

* pass timestamp into UpdateCheckIn

* change timestamp to timestamptz

* edit tests to set last_contact_success to 4 hours ago

* fix syntax error

* remove check for last_contact_success > last_contact_failure in IsOnline
2019-11-15 23:43:06 +01:00
Egon Elbre
ee6c1cac8a
private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00
littleskunk
7eb6724c92
logging: unify logging around satellite ID, node ID and piece ID (#3491)
* logging: unify logging around satellite ID, node ID and piece ID

* unify segment index
2019-11-05 22:04:07 +01:00
Jess G
4d85b11574
satellite/contact: improve errors in contact endpoints (#3356)
* improve errors in satellite contact endpoints

* add changes per CR comments

* update pingback method so it still updates node table

* fix err and returns

* fix zap logging to be better
2019-10-30 11:57:21 -07:00
JT Olio
1b66517664 contact: small typo
Change-Id: If9126ae518b5672bfd9163a8c9fc518727d5138b
2019-10-14 13:21:05 -06:00
Jennifer Li Johnson
b185dbbee2
satellite/discovery: remove discovery related code (#3175) 2019-10-14 10:57:01 -04:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Egon Elbre
9ceff9f9c6 satellite/overlay: move CheckIn benchmark to overlay (#3095) 2019-09-20 16:35:52 -04:00
Jennifer Li Johnson
724bb44723
Remove Kademlia dependencies from Satellite and Storagenode (#2966)
What:

cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:

cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia

Please describe the tests:
• internal/testplanet/planet_test.go:

TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:

TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:

TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
2019-09-19 15:56:34 -04:00