storj

Author	SHA1	Message	Date
Clement Sam	8b40a071a0	storagenode: check if QUIC is properly configured This change adds a check to confirm if UDP port is properly configured for QUIC Resolves https://github.com/storj/storj/issues/4332 Partly resolves https://github.com/storj/storj/issues/4358 Change-Id: I9a66f26a115e48b4fcd168f50a7d0b4d81712f4e	2022-01-20 12:04:04 +00:00
Egon Elbre	961e841bd7	all: fix error naming errs.Class should not contain "error" in the name, since that causes a lot of stutter in the error logs. As an example a log line could end up looking like: ERROR node stats service error: satellitedbs error: node stats database error: no rows Whereas something like: ERROR nodestats service: satellitedbs: nodestatsdb: no rows Would contain all the necessary information without the stutter. Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0	2021-04-29 15:38:21 +03:00
Yingrong Zhao	a3c437a7bf	satellite/contact,storagenode/contact: try ping back to nodes through QUIC We want to encourage storagenodes to open their udp port. This PR changes contact service in satellite to try to connect to nodes through QUIC. If satellite can't reach nodes through quic, it will send an error message back to nodes. On the nodes side, it will always log out error message from check in if the error message is not empty. Whether satellite can reach nodes through quic has no effect on nodes' uptime check. Change-Id: I5ebf80f921c4a6504997d83c8bd45226da9d3703	2021-04-20 19:25:37 +00:00
Cameron Ayer	48d8114b3f	satellite/contact: treat pingback failure as error If the satellite fails to pingback the storage node during CheckIn an error message is returned to the node in the response, but the actual error value returned is nil. We are only checking the error. This means the node has no feedback about the failure, and the node also does not attempt to retry the connection. Change-Id: Iaed00e422ba91af573e72255cc6671ea97928eae	2020-11-16 18:26:37 +00:00
Egon Elbre	0bdb952269	all: use keyed special comment Change-Id: I57f6af053382c638026b64c5ff77b169bd3c6c8b	2020-10-13 15:13:41 +03:00
Egon Elbre	94a09ce20b	all: add missing dots Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a	2020-08-11 17:50:01 +03:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Egon Elbre	bef84a5f9d	storagenode: remove dependency to overlay.NodeDossier This is the last dependency from storage node to satellite. Change-Id: I12f7abb91e84f823ba5af126c6e2979519838612	2020-05-21 08:37:13 +03:00
Egon Elbre	94b2b315f7	storagenode/trust: refactor GetAddress to GetNodeURL Most places now need the NodeURL rather than the ID and Address separately. This simplifies code in multiple places. Change-Id: I52621d8ca52296a8b5bf7afbc1001cf8bfb44239	2020-05-20 11:05:15 +00:00
Egon Elbre	ed627144ed	all: use DialNodeURL throughout the codebase Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c	2020-05-20 10:36:30 +00:00
Yingrong Zhao	f663906357	storagenode/contact: call return value from mon.Task() on function finish Change-Id: I5e6462acb99ac1d28b5d2518d5db8a4afe593d11	2020-04-01 23:26:14 +00:00
Yingrong Zhao	b7b19289d1	bump storj.io/common to latest Change-Id: I16e337660ce8e1ef332cc842dbf4cfa067b9b98b	2020-03-25 09:08:40 -04:00
Cameron Ayer	7244a6a84e	storagenode/{contact, piecestore}: implement low disk notification with cooldown When a storagenode begins to run low on capacity, we want to notify the satellite before completely running out of space. To achieve this, at the end of an upload request, the SN checks if its available space has fallen below a certain threshold. If so, trigger a notification to the satellites. The new NotifyLowDisk method on the monitor chore is implemented using the common/sync2.Cooldown type, which allows us to execute contact only once within a given timeframe; avoiding hammering the satellites with requests. This PR contains changes to the storagenode/contact package, namely moving methods involving the actual satellite communication out of Chore and into Service. This allows us to ping satellites from the monitor chore Change-Id: I668455748cdc6741291b61130d8ef9feece86458	2020-03-03 10:45:37 -05:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573)	2019-11-14 21:46:15 +02:00
Jennifer Li Johnson	11f0ea3258	5s (#3477)	2019-11-04 16:20:31 -05:00
Egon Elbre	93353df4d6	internal/sync2: make Fence accept context (#3393)	2019-10-28 16:04:31 +02:00
paul cannon	1469f7f41f	storagenode/contact: wait for UpdateSelf before start (#3332) When the contact chore starts running before the monitor service has provided any useful capacity data, the first outgoing contact has not-very-helpful data for the satellite. This change causes the contact chore to wait until capacity data is available. The wait should be quite short in all reasonable cases: even when a node starts with a lot of stored pieces and no cached spaceUsedDB data, new data will have been calculated and cached by the call to `peer.Storage2.CacheService.Init(ctx)` in `storagenode.cmdRun()` before `peer.Run(ctx)`. Change-Id: Ibc26d5c1fc10a23006c00bc3f13ff6cf71f8bf1d	2019-10-26 12:16:25 -05:00
Jennifer Li Johnson	b185dbbee2	satellite/discovery: remove discovery related code (#3175)	2019-10-14 10:57:01 -04:00
Cameron	d17be58237	remove random sleep in storagenode contact (#3243)	2019-10-11 16:44:18 -04:00
Jennifer Li Johnson	29b96a666b	internal/testplanet: fix conn leak (#3132)	2019-09-27 09:47:57 -06:00
Jennifer Li Johnson	724bb44723	Remove Kademlia dependencies from Satellite and Storagenode (#2966) What: cmd/inspector/main.go: removes kad commands internal/testplanet/planet.go: Waits for contact chore to finish satellite/contact/nodesservice.go: creates an empty nodes service implementation satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover() satellite/peer.go: sets up contact service and endpoints storagenode/console/service.go: replaces nodeID with contact.Local() storagenode/contact/chore.go: replaces routing table with contact service storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method storagenode/contact/service.go: creates a service to return the local node and update its own capacity storagenode/monitor/monitor.go: uses contact service in place of routing table storagenode/operator.go: moves operatorconfig from kad into its own setup storagenode/peer.go: sets up contact service, chore, pingstats and endpoints satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection Removes kademlia setups in: cmd/storagenode/main.go cmd/storj-sim/network.go internal/testplane/planet.go internal/testplanet/satellite.go internal/testplanet/storagenode.go satellite/peer.go scripts/test-sim-backwards.sh scripts/testdata/satellite-config.yaml.lock storagenode/inspector/inspector.go storagenode/peer.go storagenode/storagenodedb/database.go Why: Replacing Kademlia Please describe the tests: • internal/testplanet/planet_test.go: TestBasic: assert that the storagenode can check in with the satellite without any errors TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup • satellite/contact/contact_test.go: TestFetchInfo: Tests that the FetchInfo method returns the correct info • storagenode/contact/contact_test.go: TestNodeInfoUpdated: tests that the contact chore updates the node information TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).	2019-09-19 15:56:34 -04:00
Jennifer Li Johnson	3387750280	storagenode/contact: create chore for nodes to ping satellites (#2877) Creates a chore for nodes to announce themselves to their trusted satellites. Runs on startup and every hour thereafter	2019-09-06 12:14:03 -04:00
paul cannon	9821a21e5c	satellite,storagenode,bootstrap: add contact service to peer (#2951) * satellite,storagenode,bootstrap: add contact service to peer	2019-09-04 15:04:18 -04:00
paul cannon	adfa16188b	pkg/contact: bare-bones service and endpoint (#2941) * pkg/contact: bare-bones service and endpoint * split contact package into satellite and node * use new contact protobuf types	2019-09-04 11:29:34 -07:00

26 Commits