storj

Author	SHA1	Message	Date
Egon Elbre	87687938d1	storagenode/contact: fix panic in ping satellites (#3447 )	2019-11-01 16:20:53 +01:00
Ethan Adams	43103ae13f	lower storage node counts in tests (#3427 )	2019-10-31 10:57:54 -04:00
Jess G	4d85b11574	satellite/contact: improve errors in contact endpoints (#3356 ) * improve errors in satellite contact endpoints * add changes per CR comments * update pingback method so it still updates node table * fix err and returns * fix zap logging to be better	2019-10-30 11:57:21 -07:00
Natalie Villasana	4878135068	satellite/gracefulexit, storagenode/gracefulexit: add timeouts (#3407 )	2019-10-30 13:40:57 -04:00
Natalie Villasana	5453886231	satellite/repair, uplink/ecclient: remove unused expiration arg from ec.Repair and ec.putPiece (#3416 )	2019-10-30 11:35:00 -04:00
Yingrong Zhao	3ee0b89f8f	storagenode/gracefulexit: delete pieces when receive Delete or Completed message from satellite (#3406 )	2019-10-30 10:46:56 -04:00
Egon Elbre	65a8e0bcbc	{satellite,storagenode}/gracefulexit: clearer log messages (#3413 )	2019-10-30 10:21:27 +02:00
Isaac Hess	1defd4dbfe	storagenode/piecestore: Respect config.MaxConcurrentRequests for drpc (#3402 )	2019-10-28 13:12:49 -06:00
Ethan Adams	5b0398a718	storagenode/gracefulexit: Exclude finished exits from chore/worker processing. Fix update status bug (#3399 )	2019-10-28 13:59:45 -04:00
Egon Elbre	93353df4d6	internal/sync2: make Fence accept context (#3393 )	2019-10-28 16:04:31 +02:00
paul cannon	1469f7f41f	storagenode/contact: wait for UpdateSelf before start (#3332 ) When the contact chore starts running before the monitor service has provided any useful capacity data, the first outgoing contact has not-very-helpful data for the satellite. This change causes the contact chore to wait until capacity data is available. The wait should be quite short in all reasonable cases: even when a node starts with a lot of stored pieces and no cached spaceUsedDB data, new data will have been calculated and cached by the call to `peer.Storage2.CacheService.Init(ctx)` in `storagenode.cmdRun()` before `peer.Run(ctx)`. Change-Id: Ibc26d5c1fc10a23006c00bc3f13ff6cf71f8bf1d	2019-10-26 12:16:25 -05:00
Jeff Wendling	ed48e74e20	gracefulexit: fix build for drpc (#3387 ) Change-Id: I335e9f8991a10c9e8a0737bc7c9ea3f04cbe2546	2019-10-26 15:53:35 +02:00
Maximillian von Briesen	6df4d7bc73	storagenode/gracefulexit + satellite/gracefulexit: add storagenode-side transfer validation (#3371 ) * Make the exiting node check piece hashes, piece IDs, and piece hash signatures before relaying successful transfer data to the satellite. * Enable immediate graceful exit failure for "successful" transfers that fail satellite-side validation. * Move transfer piece logic in storagenode worker to separate function (to make the worker easier to understand)	2019-10-25 13:16:20 -04:00
Yingrong Zhao	fa1ac24e19	satellite/gracefulexit: add failure threshold check (#3329 ) * add overall failure percentage check and inactive time frame check before sending a response to sno * update comment * delete node from transfer queue if it has been inactive for too long * fix linting error * add test config value * fix nil pointer * add config value into testplanet * add unit test for overall failure threshold * move timeframe threshold to chore * update protolock * add chore test * add per piece failure count logic * change config name from EndpointMaxFailures to MaxFailuresPerPiece * address comments * fix linting error * add error handling for no row returned from progress table * fix test for graceful exit chore on storagenode * fix typo InActive -> Inactive * improve readability for failure threshold calculation * update config lock * change error handling for GetProgress in graceful exit endpoint on the satellite side * return proper rpc error in endpoint * add check in chore test for checking finish timestamp and queue	2019-10-24 12:24:42 -04:00
Isaac Hess	75412e54e5	storagenode/piecestore: Rename liveGRPCRequests back to liveRequests (#3354 )	2019-10-23 13:43:43 -06:00
Isaac Hess	14c7648530	storagenode/piecestore: Only limit grpc requests (#3342 )	2019-10-23 10:14:02 -06:00
JT Olio	2c6fa3c5f8	pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335 ) libuplink was incorrectly setting timeouts to 10 seconds still, but should have been at least 10 minutes. the order sender was setting them to 1 hour. we don't want timeouts in uplink-side logic as it establishes a minimum rate on tcp streams. instead of all of this, just use tcp keep alive. tcp keep alive packets are sent every 15 seconds and if the peer stops responding the connection dies. this is enabled by default with go. this will kill tcp connections when they stop working. Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b	2019-10-22 17:57:24 -06:00
Ethan Adams	3e0d12354a	storagenode/gracefulexit: Implement storage node graceful exit worker - part 1 (#3322 )	2019-10-22 16:42:21 -04:00
paul cannon	5e78f4000b	storagenode/pieces: remove old comment (#3334 ) the reservedSpace member it's talking about was removed quite a while ago. Change-Id: I28433b2a44467376a408453d875c389656347cab	2019-10-22 12:51:51 +03:00
Bryan White	f468816f13	{internal/version,versioncontrol,cmd/storagenode-updater}: add rollout to storagenode updater (#3276 )	2019-10-21 12:50:59 +02:00
Bryan White	243ba1cb17	{versioncontrol,internal/version,cmd/*}: refactor version control (#3253 )	2019-10-20 09:56:23 +02:00
Yingrong Zhao	e5099f31f3	add context.Clean and correct rpc error code (#3295 )	2019-10-16 13:50:01 -04:00
Isaac Hess	ed6b88a12d	piecestore: update usage before completing upload (#3286 ) The upload code currently updates the usage in a deferred call to saveOrder(). The consequence is that in the success case, the RPC is completed before the usage has been updated. This change repurposes the deferred call to update usage in the failure case, while explicitly updating the usage before completing the RPC. This fixes some test flakiness when using dRPC. gRPC waits until the final status is written before a Recv call completes, and the final status is written by the server after the handler function has exited. In practice this means that the client is blocked until the defer call is also finished. So this change will not change performance at all. It has two advantages: (1) It fixes test flakiness and, more importantly: (2) reduces the chances that someone will accidentally write a flaky test in the future	2019-10-15 20:17:17 -06:00
Yingrong Zhao	87e3764390	storagenode/cmd: add exit-status command for graceful exit (#3264 ) * add exit-status command * remove todo and fix format * fix status display * change startExit to exit progress * fix linting error * add successful column in exit progress * fix test * remove extra new line * fix TYPOS * format the percentage better	2019-10-15 18:07:32 -04:00
Andrew Harding	4962c6843e	piecestore: fix test flakiness around upload/download usage tracking (#3282 )	2019-10-15 11:22:15 -06:00
Simon Guindon	abb5b6c499	storagenode/piecestore: Fix to ignore both gRPC and dRPC EOF errors. (#3274 ) * Fix to ignore both gRPC and dRPC EOF errors. * Fix to ignore both gRPC and dRPC EOF errors.	2019-10-15 12:13:53 -04:00
Ethan Adams	1ad2ba7e3e	storagenode/gracefulexit: Add graceful exit chore and worker. (#3262 ) Adds graceful exit chore and worker for V3-2614	2019-10-15 11:29:47 -04:00
Jennifer Li Johnson	b185dbbee2	satellite/discovery: remove discovery related code (#3175 )	2019-10-14 10:57:01 -04:00
littleskunk	96aeedcdee	OrderLimit/GracePeriod: Increase time window from 1h to 24h (#3255 ) * OrderLimit/GracePeriod: Increase time window from 1h to 24h * update satellite config lock	2019-10-13 17:40:24 +02:00
JT Olio	6ede140df1	pkg/rpc: defeat MITM attacks in most cases (#3215 ) This change adds a trusted registry (via the source code) of node address to node id mappings (currently only for well known Satellites) to defeat MITM attacks to Satellites. It also extends the uplink UI such that when entering a satellite address by hand, a node id prefix can also be added to defeat MITM attacks with unknown satellites. When running uplink setup, satellite addresses can now be of the form 12EayRS2V1k@us-central-1.tardigrade.io (not even using a full node id) to ensure that the peer contacted is the peer that was expected. When using a known satellite address, the known node ids are used if no override is provided.	2019-10-12 14:34:41 -06:00
Isaac Hess	e567f27634	storagenode/piecestore: Change test to use ioutil.ReadAll to attempt to reduce test flake (#3250 )	2019-10-11 15:57:59 -06:00
Cameron	d17be58237	remove random sleep in storagenode contact (#3243 )	2019-10-11 16:44:18 -04:00
Vitalii Shpital	78a71ad3b6	web/storagenode: node status updated (#3220 )	2019-10-11 19:28:47 +03:00
Yingrong Zhao	743a0fc38b	storagenode/cmd: create start graceful exit CLI (#3202 )	2019-10-11 09:58:12 -04:00
littleskunk	d5b2e1ef89	storagenode/signature: Reject uploads with a timestamp too far in the future (#3194 )	2019-10-08 13:09:46 +02:00
JT Olio	37491d0d32	storagenode: embed the console into the binary and makefile (#3164 ) * web/storagenode: add package-lock.json * storagenode: compile console into binary	2019-10-08 10:52:19 +02:00
Jennifer Li Johnson	7ceaabb18e	Delete Bootstrap and Kademlia (#2974 )	2019-10-04 16:48:41 -04:00
Jeff Wendling	64e43e555e	pkg/rpc: return context error if ready after DialContext fails. the net package does not make it easy to know if DialContext failed because the context was done. it's important for some of our tests that canceled contexts are detected as such, so we accept the small race that's arguably correct (the context must be canceled asynchronously) to ensure we always return the context error if available. Change-Id: I058064d5c666e5353b74fb5bd300bf7abe537ff5	2019-10-04 20:09:00 +00:00
Yaroslav Vorobiov	a11619e7f3	storagenode/console: use bandwidth monthly summary (#3183 )	2019-10-04 09:29:25 -06:00
Yaroslav Vorobiov	4824ecdb8d	storagenode/console: use bytes for remaining info (#3186 )	2019-10-04 18:17:28 +03:00
littleskunk	b2e328f118	storagenode/dashboard: update online status (#3168 )	2019-10-03 20:31:39 +02:00
Maximillian von Briesen	08ed50bcaa	satellite/metainfo: add commit interval to prevent long delays between order limit creation and segment commit (#3149 )	2019-10-01 12:55:02 -04:00
Bill Thorp	89c59d06f9	storagenode/storagenodedb: add SQL receiver logic for graceful exit (#3067 ) * added graceful exit db methods	2019-10-01 10:34:03 -04:00
Jennifer Li Johnson	755cbd4dce	storagenode/main: map aliases for kademlia config values (#3118 )	2019-09-30 19:33:00 -04:00
Jennifer Li Johnson	29b96a666b	internal/testplanet: fix conn leak (#3132 )	2019-09-27 09:47:57 -06:00
Isaac Hess	2c5e169888	storagenode/storagenodedb: Vacuum info.db to prepare for splitting storagenodedbs (#3134 )	2019-09-27 07:55:51 -06:00
Jeff Wendling	098cbc9c67	all: use pkg/rpc instead of pkg/transport. all of the packages and tests work with both grpc and drpc. we'll probably need to do some jenkins pipelines to run the tests with drpc as well. most of the changes are really due to a bit of cleanup of the pkg/transport.Client api into an rpc.Dialer in the spirit of a net.Dialer. now that we don't need observers, we can pass around stateless configuration to everything rather than stateful things that issue observations. it also adds a DialAddressID for the case where we don't have a pb.Node, but we do have an address and want to assert some ID. this happened pretty frequently, and now there are no more weird contortions creating custom tls options, etc. a lot of the other changes are being consistent/using the abstractions in the rpc package to do rpc style things like finding peer information, or checking status codes. Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412	2019-09-25 15:37:06 -06:00
Isaac Hess	580e511b4c	storagenode/storagenodedb: Migrate to separate dbs (#3081 ) * storagenode/storagenodedb: Migrate to separate dbs * storagenode/storagenodedb: Add migration to drop versions tables * Put drop table statements into a transaction. * Fix CI errors. * Fix CI errors. * Changes requested from PR feedback. * storagenode/storagenodedb: fix tx commit	2019-09-23 12:36:46 -07:00
Jennifer Li Johnson	d2502bb51b	Adds tests for kad replacement and restores kad operator configs (#3094 ) * test that all nodes can check in with all satellites * keep kademlia config * add untrusted satellite test * use getversion * remove kademlia config changes in test-sim-backwards.sh * add kademlia flags back to storj-sim storagenode * reset kademlia flags in storagenode entrypoint	2019-09-20 16:02:23 -04:00
Jennifer Li Johnson	724bb44723	Remove Kademlia dependencies from Satellite and Storagenode (#2966 ) What: cmd/inspector/main.go: removes kad commands internal/testplanet/planet.go: Waits for contact chore to finish satellite/contact/nodesservice.go: creates an empty nodes service implementation satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover() satellite/peer.go: sets up contact service and endpoints storagenode/console/service.go: replaces nodeID with contact.Local() storagenode/contact/chore.go: replaces routing table with contact service storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method storagenode/contact/service.go: creates a service to return the local node and update its own capacity storagenode/monitor/monitor.go: uses contact service in place of routing table storagenode/operator.go: moves operatorconfig from kad into its own setup storagenode/peer.go: sets up contact service, chore, pingstats and endpoints satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection Removes kademlia setups in: cmd/storagenode/main.go cmd/storj-sim/network.go internal/testplanet/planet.go internal/testplanet/satellite.go internal/testplanet/storagenode.go satellite/peer.go scripts/test-sim-backwards.sh scripts/testdata/satellite-config.yaml.lock storagenode/inspector/inspector.go storagenode/peer.go storagenode/storagenodedb/database.go Why: Replacing Kademlia Please describe the tests: • internal/testplanet/planet_test.go: TestBasic: assert that the storagenode can check in with the satellite without any errors TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup • satellite/contact/contact_test.go: TestFetchInfo: Tests that the FetchInfo method returns the correct info • storagenode/contact/contact_test.go: TestNodeInfoUpdated: tests that the contact chore updates the node information TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on startup since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).	2019-09-19 15:56:34 -04:00


269 Commits