storj

Author	SHA1	Message	Date
JT Olio	2a63225b98	satellite/{contact,satellitedb}: preserve node message debounce support Change-Id: I453ad35fda4e61d068db0c476dd86b50d7f2d705	2023-03-20 16:13:06 +00:00
paul cannon	2522ff09b6	satellite/overlay: configurable meaning of last_net Up to now, we have been implementing the DistinctIP preference with code in two places: 1. On check-in, the last_net is determined by taking the /24 or /64 (in ResolveIPAndNetwork()) and we store it with the node record. 2. On node selection, a preference parameter defines whether to return results that are distinct on last_net. It can be observed that we have never yet had the need to switch from DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the same satellite, and we will probably never need to do so in an automated way. It can also be observed that this arrangement makes tests more complicated, because we often have to arrange for test nodes to have IP addresses in different /24 networks (a particular pain on macOS). Those two considerations, plus some pending work on the repair framework that will make repair take last_net into consideration, motivate this change. With this change, in the #2 place, we will _always_ return results that are distinct on last_net. We implement the DistinctIP preference, then, by making the #1 place (ResolveIPAndNetwork()) more flexible. When DistinctIP is enabled, last_net will be calculated as it was before. But when DistinctIP is _off_, last_net can be the same as address (IP and port). That will effectively implement !DistinctIP because every record will have a distinct last_net already. As a side effect, this flexibility will allow us to change the rules about last_net construction arbitrarily. We can do tests where last_net is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and be able to exercise the production logic without requiring a virtual network bridge. This change should be safe to make without any migration code, because all known production satellite deployments use DistinctIP, and the associated last_net values will not change for them. They will only change for satellites with !DistinctIP, which are mostly test deployments that can be recreated trivially. For those satellites which are both permanent and !DistinctIP, node selection will suddenly start acting as though DistinctIP is enabled, until the operator runs a single SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done either before or after deploying software with this change. I also assert that this will not hurt performance for production deployments. It's true that adding the distinct requirement to node selection makes things a little slower, but the distinct requirement is already present for all production deployments, and they will see no change. Refs: https://github.com/storj/storj/issues/5391 Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924	2023-03-09 02:20:12 +00:00
JT Olio	522aed083d	private/server,satellite/contact,misc: use new storj/common noise helpers this change uses the new storj/common noise helpers, which: * add a security fix (require an expected node id for validating noise key attestations) * stops doing an unnecessary order signature validation (it's already been done inside of PutPiece) * removes some duplicate code Change-Id: I5e67a08ff216cd9c5b0b82e40b4d9de664b6b0fc	2023-02-07 09:53:45 -05:00
JT Olio	2753d5a32f	satellite/overlay: keep track of noise info per node Change-Id: Icef04c3e87dbf4bb57d3837274c323bf6dd2c81f	2023-02-01 23:03:35 -05:00
JT Olio	572c5b305b	satellite/context: fix pingme test Change-Id: I30fcfe1dea43d65e457c76dd44f475b5c197cbee	2023-01-30 15:01:32 -05:00
JT Olio	e40191afd6	storj: upgrade to use latest storj/common NodeAddress Change-Id: I5987391bcfe5f6dfd7b525698c337a4cbda9b76e	2023-01-25 01:37:26 +00:00
Márton Elek	97679a39ff	satellite/contact: emit eventkit events in case of node checkin Change-Id: I2da4b3055b410e476d63cc6addf982a130dba611	2022-11-14 10:38:58 +00:00
Artur M. Wolff	8c1caea5db	satellite/contact: swap net.IP.IsPrivateIP with isPrivateIP This change swaps net.IP.IsPrivateIP usages with custom isPrivateIP to unbreak the build as we want to build for earlier than Go 1.17. Change-Id: I44badbb487f35e43b8b0433ad0f3b9c87af718d4	2022-06-13 01:01:44 +02:00
Paul Willoughby	911cc1e163	satellite/contact: reject privateIPs in PingMe and CheckIn endpoints prevent network enumeration by rejecting privateIPs in PingMe and Checkin endpoints Closes storj/storj-private#32 Change-Id: I63f00483ff4128ebd5fa9b7b8da826a5706748c9	2022-06-07 08:09:14 +00:00
Clement Sam	2de6feefde	satellite/contact: fix connection leak in PingMe endpoint Fixes connection leak in pingMe endpoint. includes other minor fixes. Change-Id: I7c61f620565f46dd113d21a772de7c439be550e3	2022-01-20 14:20:24 +00:00
Clement Sam	9f3c1f9cda	satellite/contact: add PingMe endpoint Change-Id: I832a72fafeacf76ad64a0129bcc6582cc4f9290d	2022-01-19 17:52:33 +00:00
Artur M. Wolff	2cd68bf4fb	private/lrucache: import from common Change-Id: Ia1f43d0440fef21122b071b05da59b4cf2689d6c	2021-08-16 10:04:32 +00:00
Yingrong Zhao	077ec96d94	private/server: use quic implementation from storj.io/common Change-Id: I820cf6444a3ddccee0d7c647dc84c80b2752068c	2021-08-10 13:32:21 +00:00
JT Olio	da9ca0c650	testplanet/satellite: reduce the number of places default values need to be configured Satellites set their configuration values to default values using cfgstruct, however, it turns out our tests don't test these values at all! Instead, they have a completely separate definition system that is easy to forget about. As is to be expected, these values have drifted, and it appears in a few cases test planet is testing unreasonable values that we won't see in production, or perhaps worse, features enabled in production were missed and weren't enabled in testplanet. This change makes it so all values are configured the same, systematic way, so it's easy to see when test values are different than dev values or release values, and it's less hard to forget to enable features in testplanet. In terms of reviewing, this change should be actually fairly easy to review, considering private/testplanet/satellite.go keeps the current config system and the new one and confirms that they result in identical configurations, so you can be certain that nothing was missed and the config is all correct. You can also check the config lock to see what actual config values changed. Change-Id: I6715d0794887f577e21742afcf56fd2b9d12170e	2021-06-01 22:14:17 +00:00
JT Olio	b07a39bfea	satellite: log check in success node id This is so we can see what's going on if we get a weird node DoS thing again Change-Id: I5a14c95277562e496fcefb6d368068a6ec1dbc9f	2021-06-01 19:35:24 +00:00
JT Olio	1852773e3e	satellite/contact: rate limit node checkins Change-Id: Ied386a2350aa073de46443e5259b56d49ec61dbf	2021-05-17 08:15:04 +00:00
Egon Elbre	961e841bd7	all: fix error naming errs.Class should not contain "error" in the name, since that causes a lot of stutter in the error logs. As an example a log line could end up looking like: ERROR node stats service error: satellitedbs error: node stats database error: no rows Whereas something like: ERROR nodestats service: satellitedbs: nodestatsdb: no rows Would contain all the necessary information without the stutter. Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0	2021-04-29 15:38:21 +03:00
Egon Elbre	7802ab714f	pkg/,private/: merge with private package Initially there were pkg and private packages, however for all practical purposes there's no significant difference between them. It's clearer to have a single private package - and when we do get a specific abstraction that needs to be reused, we can move it to storj.io/common or storj.io/private. Change-Id: Ibc2036e67f312f5d63cb4a97f5a92e38ae413aa5	2021-04-23 16:37:28 +03:00
Yingrong Zhao	43f5888052	satellite/contact: only test PingBack failure case The original test caused the testplanet to timeout due to contact service stuck in a loop. We can change it to only test the failure case for PingBack method instead of closing the TCP port on storagenodes. Change-Id: Ic96aee637b39ae95050c6902c2bf9ca51fb586c3	2021-04-21 14:20:36 -04:00
Yingrong Zhao	a3c437a7bf	satellite/contact,storagenode/contact: try ping back to nodes through QUIC We want to encourage storagenodes to open their UDP port. This PR changes the contact service in the satellite to try to connect to nodes through QUIC. If the satellite can't reach a node through QUIC, it will send an error message back to the node. On the node side, it will always log the error message from check-in if the error message is not empty. Whether the satellite can reach nodes through QUIC has no effect on the nodes' uptime check. Change-Id: I5ebf80f921c4a6504997d83c8bd45226da9d3703	2021-04-20 19:25:37 +00:00
Egon Elbre	86e698f572	pb: use *UnimplementedServer to avoid breaking API changes Change-Id: I99a34eeb37ac4453411f273511710562a519f57a	2021-03-29 12:26:10 +03:00
Yaroslav Vorobiov	966535e9de	{storagenode,satellite}/nodeoperator: add wallet features Change-Id: Iac7eb40a52b8fddcc573aebaad2e3a30a10cded9	2021-02-08 22:09:45 +02:00
Cameron Ayer	d14607a5f7	satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de	2021-01-19 15:43:02 -05:00
Moby von Briesen	d75e4be11f	satellite/{accounting, contact}: Remove periods and spaces from metrics. Change-Id: I84179c2931293e3a1eb0ff8050416d25e481ce07	2020-12-03 15:33:01 +00:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Egon Elbre	bef84a5f9d	storagenode: remove dependency to overlay.NodeDossier This is the last dependency from storage node to satellite. Change-Id: I12f7abb91e84f823ba5af126c6e2979519838612	2020-05-21 08:37:13 +03:00
Egon Elbre	5d016425f1	satellite/{contact,downtime,overlay}: use NodeURL Change-Id: I555a479a89e0ddbf0499898bdbc8574282cd6846	2020-05-20 11:09:05 +00:00
Egon Elbre	941d10cbc3	private/testplanet: remove Peer.Local() Currently storagenode depends on overlay.NodeDossier, this is the first step in removing it. Change-Id: I034a3f1601835f8349bd41752455022e19bcc707	2020-05-20 11:05:34 +00:00
Egon Elbre	ed627144ed	all: use DialNodeURL throughout the codebase Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c	2020-05-20 10:36:30 +00:00
Jess G	6fac1ed130	satellite/contact: rm unhelpful verbose logging (#3894 ) * unhelpful rm verbose logging Change-Id: I74dc29f18969250fa6838fed9bec8ba7ea160b77 * change to debug Change-Id: I4ba880e68b40bc3e0f29d9b1aa9154e63124c904	2020-05-19 08:09:00 -07:00
Cameron Ayer	42be4bdc0f	satellite/contact: add timeout to PingBack method Change-Id: I2ec2f82e2e10d8be16f82e9de13ce42358e47c98	2020-04-04 18:26:30 +00:00
Yingrong Zhao	b7b19289d1	bump storj.io/common to latest Change-Id: I16e337660ce8e1ef332cc842dbf4cfa067b9b98b	2020-03-25 09:08:40 -04:00
Jess G	39cb821196	satellite/overlay: rm combinedcache, fix IP naming to be network (#3798 ) * rm combinedcache, rm dns node lookup Change-Id: I239f07211764b097d851230d8c81900a47756e9e * excludeIPs -> excludedNetworks Change-Id: Ifa6f44ab17457cdd5aff4cd5694296867c18b179 * use lowercase var name Change-Id: I825aad2b718c71f455e747be18f8cabd02aabe55 * update Getnetwork name Change-Id: I002a1b7bc6b4ef40159c0cd2b0ef209f80a9c503 * fix comments Change-Id: Ibddf5b9ffa9d685af6c392d893db063ef18e45fa * update comments with ipv6 Change-Id: I31758b7d4979e7c27d014668f4fb532ad838cda2 Co-authored-by: Stefan Benten <mail@stefan-benten.de>	2020-03-12 11:37:57 -07:00
Jessica Grebenschikov	803e2930f4	satellite: use IP for all uplink operations, use hostname for audit and repairs My understanding is that the nodes table has the following fields: - `address` field which can be a hostname or an IP - `last_net` field that is the /24 subnet of the IP resolved from the address This PR does the following: 1) add back the `last_ip` field to the nodes table 2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time. 3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date 4) try to reduce confusion about hostname, ip, subnet, and address in the code base Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac	2020-03-11 09:11:40 -07:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Yingrong Zhao	db8aee0806	satellite/contact; storagenode/preflight: add clock check on startup for storagenode add config preflight.enabled-local-time Change-Id: I7b942c9bee063aae409ee6721ae9d079dff0144f	2020-01-15 15:35:26 +00:00
Yingrong Zhao	ee87846f0b	satellite/contact: add placeholder for GetTime endpoint Change-Id: I42f8479708f0558350c2280a398d84d145e8118f	2020-01-14 06:38:47 +00:00
Ethan	8859c36234	satellite/{downtime,contact}: Add CheckNodeAvailability for use within the downtime tracking chores. https://storjlabs.atlassian.net/browse/V3-2545 Change-Id: I1dd54a0c77cb4905bb1f350beeb82c6f7700ee70	2020-01-02 18:24:11 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Egon Elbre	d55288cf68	pkg/rpc: replace methods with direct calls to pb Change-Id: I8bd015d8d316a2c12c1daceca1d9fd257f6f57bc	2019-12-22 17:12:43 +02:00
Egon Elbre	006baa9ca6	pkg/rpc: remove drpc aliases We need to split up pb package, which means we cannot have a core package that depends on them. Change-Id: I7f4f6fd82f89a51a9b2ad08bf2b1207253b8a215	2019-12-22 16:58:08 +02:00
littleskunk	8b3444e088	satellite/nodeselection: don't select nodes that haven't checked in for a while (#3567 ) * satellite/nodeselection: dont select nodes that havent checked in for a while * change testplanet online window to one minute * remove satellite reconfigure online window = 0 in repair tests * pass timestamp into UpdateCheckIn * change timestamp to timestamptz * edit tests to set last_contact_success to 4 hours ago * fix syntax error * remove check for last_contact_success > last_contact_failure in IsOnline	2019-11-15 23:43:06 +01:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573 )	2019-11-14 21:46:15 +02:00
littleskunk	7eb6724c92	logging: unify logging around satellite ID, node ID and piece ID (#3491 ) * logging: unify logging around satellite ID, node ID and piece ID * unify segment index	2019-11-05 22:04:07 +01:00
Jess G	4d85b11574	satellite/contact: improve errors in contact endpoints (#3356 ) * improve errors in satellite contact endpoints * add changes per CR comments * update pingback method so it still updates node table * fix err and returns * fix zap logging to be better	2019-10-30 11:57:21 -07:00
JT Olio	1b66517664	contact: small typo Change-Id: If9126ae518b5672bfd9163a8c9fc518727d5138b	2019-10-14 13:21:05 -06:00
Jennifer Li Johnson	b185dbbee2	satellite/discovery: remove discovery related code (#3175 )	2019-10-14 10:57:01 -04:00
Jeff Wendling	098cbc9c67	all: use pkg/rpc instead of pkg/transport all of the packages and tests work with both grpc and drpc. we'll probably need to do some jenkins pipelines to run the tests with drpc as well. most of the changes are really due to a bit of cleanup of the pkg/transport.Client api into an rpc.Dialer in the spirit of a net.Dialer. now that we don't need observers, we can pass around stateless configuration to everything rather than stateful things that issue observations. it also adds a DialAddressID for the case where we don't have a pb.Node, but we do have an address and want to assert some ID. this happened pretty frequently, and now there's no more weird contortions creating custom tls options, etc. a lot of the other changes are being consistent/using the abstractions in the rpc package to do rpc style things like finding peer information, or checking status codes. Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412	2019-09-25 15:37:06 -06:00
Egon Elbre	9ceff9f9c6	satellite/overlay: move CheckIn benchmark to overlay (#3095 )	2019-09-20 16:35:52 -04:00
Jennifer Li Johnson	724bb44723	Remove Kademlia dependencies from Satellite and Storagenode (#2966 ) What: cmd/inspector/main.go: removes kad commands internal/testplanet/planet.go: Waits for contact chore to finish satellite/contact/nodesservice.go: creates an empty nodes service implementation satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover() satellite/peer.go: sets up contact service and endpoints storagenode/console/service.go: replaces nodeID with contact.Local() storagenode/contact/chore.go: replaces routing table with contact service storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method storagenode/contact/service.go: creates a service to return the local node and update its own capacity storagenode/monitor/monitor.go: uses contact service in place of routing table storagenode/operator.go: moves operatorconfig from kad into its own setup storagenode/peer.go: sets up contact service, chore, pingstats and endpoints satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection Removes kademlia setups in: cmd/storagenode/main.go cmd/storj-sim/network.go internal/testplanet/planet.go internal/testplanet/satellite.go internal/testplanet/storagenode.go satellite/peer.go scripts/test-sim-backwards.sh scripts/testdata/satellite-config.yaml.lock storagenode/inspector/inspector.go storagenode/peer.go storagenode/storagenodedb/database.go Why: Replacing Kademlia Please describe the tests: • internal/testplanet/planet_test.go: TestBasic: assert that the storagenode can check in with the satellite without any errors TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup • satellite/contact/contact_test.go: TestFetchInfo: Tests that the FetchInfo method returns the correct info • storagenode/contact/contact_test.go: TestNodeInfoUpdated: tests that the contact chore updates the node information TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).	2019-09-19 15:56:34 -04:00
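
The last_net rule described in commit 2522ff09b6 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the satellite's actual code: the helper name `lastNet` and its signature are hypothetical, and the real logic lives in ResolveIPAndNetwork(), which also resolves hostnames. The sketch assumes the address is already an IP and port. With DistinctIP on, it returns the /24 (IPv4) or /64 (IPv6) network; with DistinctIP off, it returns the address itself, so every node ends up with a distinct last_net.

```go
package main

import (
	"fmt"
	"net"
)

// lastNet is a hypothetical helper (not part of storj) that mirrors the rule
// from the commit message: with distinctIP enabled, last_net is the /24 (IPv4)
// or /64 (IPv6) network of the node's IP; with distinctIP disabled, last_net
// is simply the full address (IP and port).
func lastNet(address string, distinctIP bool) (string, error) {
	if !distinctIP {
		return address, nil
	}
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return "", err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		// Assumption: hostnames are resolved elsewhere before this step.
		return "", fmt.Errorf("not an IP address: %q", host)
	}
	if v4 := ip.To4(); v4 != nil {
		return v4.Mask(net.CIDRMask(24, 32)).String(), nil
	}
	return ip.Mask(net.CIDRMask(64, 128)).String(), nil
}

func main() {
	for _, distinct := range []bool{true, false} {
		n, err := lastNet("203.0.113.57:28967", distinct)
		fmt.Println(distinct, n, err)
	}
}
```

Because node selection always filters on distinct last_net after this change, the choice of what goes into last_net is the only place the DistinctIP preference needs to be expressed.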


59 Commits