Currently, storage nodes only report their capacity to satellites
once per hour. If a node fills up, it will fail all uploads until
the next contact cycle begins. With these changes, at the end of an
upload we check whether the MinimumDiskSpace threshold has been
passed. If so, we trigger the monitor chore to update the node's
capacity, then trigger the contact chore to report the new
capacity to the satellites.
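A minimal sketch of that end-of-upload check, assuming hypothetical chore and monitor interfaces; this is not the actual storagenode code:

```go
package capacity

import "context"

// Chore stands in for the storagenode's monitor and contact chores; TriggerWait
// is a hypothetical "run now and wait" hook.
type Chore interface {
	TriggerWait(ctx context.Context)
}

// SpaceMonitor reports how much disk space is still available for pieces.
type SpaceMonitor interface {
	AvailableSpace(ctx context.Context) (int64, error)
}

// ReportIfLow is the end-of-upload check: once free space drops below the
// MinimumDiskSpace threshold, re-run the monitor chore to update the node's
// capacity and then the contact chore to push it to the satellites immediately,
// instead of waiting up to an hour for the next contact cycle.
func ReportIfLow(ctx context.Context, monitor SpaceMonitor, monitorChore, contactChore Chore, minimumDiskSpace int64) error {
	free, err := monitor.AvailableSpace(ctx)
	if err != nil {
		return err
	}
	if free < minimumDiskSpace {
		monitorChore.TriggerWait(ctx) // recompute and store the node's capacity
		contactChore.TriggerWait(ctx) // report the new capacity to the satellites
	}
	return nil
}
```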
Change-Id: Ie6aadaade1e2c12c87e03f8ff9059a50121380a0
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer somewhat because the metrics are no
longer a dot-delimited tree. hopefully there will be time to update
rothko to index on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
By default our server address listens on all IP addresses on the machine.
This caused our preflight check to fail, because it had no hostname to look up.
With this change, the check accepts this configuration and proceeds.
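A sketch of the adjusted check, assuming "listening on all IP addresses" means an address such as ":28967" or "0.0.0.0:28967"; the helper is illustrative, not the actual preflight code:

```go
package preflight

import (
	"net"
	"strings"
)

// needsHostnameLookup is an illustrative helper: addresses without a hostname
// (":28967") or with an unspecified IP ("0.0.0.0:28967", "[::]:28967") listen on
// all interfaces, so there is nothing to resolve and the check lets them pass.
func needsHostnameLookup(address string) bool {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		host = address // no port; treat the whole string as the host
	}
	if strings.TrimSpace(host) == "" {
		return false // ":port" style, all interfaces
	}
	if ip := net.ParseIP(host); ip != nil && ip.IsUnspecified() {
		return false // 0.0.0.0 or ::
	}
	return true
}
```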
Change-Id: I9eb5c891c099eb35f679d6d7e79ec38bb43b619f
Remove starting-up messages from peers. We expect all of them to start;
if one doesn't, it should return an error explaining why it failed to start.
The only informative message is when a service is disabled.
When doing the initial database setup, individual migration steps aren't
informative, so print only a single line with the final version.
Also use shorter log scopes.
Change-Id: Ic8b61411df2eeae2a36d600a0c2fbc97a84a5b93
- also updated ping chore to pick up trust changes
- fixed small typo in blueprint
- fixed flags for storj-sim
- wired up changes to testplanet
Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d
* change satellite.Peer name to Core
* change to Core in testplanet
* missed a few places
* keep shared stuff in peer.go to stay consistent with storj/docs
libuplink was still incorrectly setting timeouts to 10 seconds when they
should have been at least 10 minutes, and the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic at all, because a
timeout effectively imposes a minimum transfer rate on tcp streams.
instead of all of this, just use tcp keep alive. keep alive packets are
sent every 15 seconds, and if the peer stops responding the connection
dies. this is enabled by default in go and kills tcp connections that
have stopped working.
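for illustration, relying on keep alive with go's net.Dialer looks roughly like this (the wrapper itself is hypothetical; the 15 second default is go's documented behavior):

```go
package dialer

import (
	"context"
	"net"
	"time"
)

// dialTCP leans on tcp keep alive instead of per-request timeouts. Go's
// net.Dialer enables keep alive by default (currently a 15 second period);
// setting it explicitly only makes the choice visible. Dead peers fail the
// probes and the connection is torn down, without forcing a minimum transfer
// rate the way a deadline would.
func dialTCP(ctx context.Context, address string) (net.Conn, error) {
	d := net.Dialer{KeepAlive: 15 * time.Second}
	return d.DialContext(ctx, "tcp", address)
}
```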
Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there are no more weird contortions
creating custom tls options, etc.
a lot of the other changes are about consistently using the
abstractions in the rpc package to do rpc-style things like
finding peer information or checking status codes.
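a rough sketch of the resulting shape; aside from the DialAddressID name, which comes from this change, the fields and body here are assumptions:

```go
package rpcsketch

import (
	"context"
	"crypto/tls"
	"net"
)

// NodeID stands in for the real node id type.
type NodeID [32]byte

// Dialer is a stateless bag of configuration in the spirit of net.Dialer; the
// fields are assumptions made for the sketch.
type Dialer struct {
	TLSConfig *tls.Config
	Transport net.Dialer
}

// DialAddressID covers the common case of having only an address plus an id we
// want to assert, rather than a full pb.Node. The real implementation wires the
// expected id into certificate verification; that part is elided here.
func (d Dialer) DialAddressID(ctx context.Context, address string, id NodeID) (net.Conn, error) {
	raw, err := d.Transport.DialContext(ctx, "tcp", address)
	if err != nil {
		return nil, err
	}
	cfg := d.TLSConfig.Clone()
	if cfg == nil {
		cfg = &tls.Config{}
	}
	_ = id // the asserted id would be checked during the handshake
	return tls.Client(raw, cfg), nil
}
```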
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
* storagenode/storagenodedb: Migrate to separate dbs
* storagenode/storagenodedb: Add migration to drop versions tables
* Put drop table statements into a transaction (a sketch follows this list).
* Fix CI errors.
* Fix CI errors.
* Changes requested from PR feedback.
* storagenode/storagenodedb: fix tx commit
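A minimal sketch of the transaction item above, using database/sql; the function and table names are placeholders, not the actual migration code:

```go
package migratesketch

import "database/sql"

// dropVersionsTables runs all of the drop statements inside one transaction so
// the migration either fully applies or not at all.
func dropVersionsTables(db *sql.DB) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	for _, stmt := range []string{
		`DROP TABLE IF EXISTS versions`,
		// ... one statement per legacy table ...
	} {
		if _, err := tx.Exec(stmt); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```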
* test that all nodes can check in with all satellites
* keep kademlia config
* add untrusted satellite test
* use getversion
* remove kademlia config changes in test-sim-backwards.sh
* add kademlia flags back to storj-sim storagenode
* reset kademlia flags in storagenode entrypoint
What:
cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:
cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplanet/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia
Please describe the tests:
• internal/testplanet/planet_test.go:
TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:
TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:
TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant, since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in wall-clock time on startup, since each node waits a random amount of time (less than 1 hr) before initializing its first connection (jitter).
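For illustration, the jitter mentioned above amounts to something like the following (the helper is hypothetical):

```go
package contact

import (
	"math/rand"
	"time"
)

// initialJitter returns a random delay below max (max must be > 0); each node
// sleeps this long before its first check-in so a fleet restarting at once does
// not hit the satellites simultaneously.
func initialJitter(max time.Duration) time.Duration {
	return time.Duration(rand.Int63n(int64(max)))
}

// usage: time.Sleep(initialJitter(time.Hour)) before the first check-in.
```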
this is a trivial operation for storagenode/console, as it doesn't
really need or use kademlia in the first place.
What:
Removes kademlia from storagenode/console
Why:
We are in the process of getting rid of kademlia, and this is one place where it's particularly easy.
Please describe the tests:
Existing tests exercise storagenode/console behavior; if they continue to work, everything here should be tested satisfactorily.
Please describe the performance impact:
None
* implement contact.checkin method
* add batching to update uptime checks
* rm batching
* rm other unneeded things
* fix lint
* fix unit test
* changes per CR comments
* couple more CR changes
* add identity check into grpcOpt
* fix lint
* why do you fix the test
* revert test change
* stop contact chore for repair test
* put node in cache
* comment out contact chore. See what happens
* Revert "comment out contact chore. See what happens"
This reverts commit 2e45008e36a50e0a842ae455ac83de77093d4daa.
* try stopping contact earlier
* stop contact chore in uplink_test
* replace self on chore with *RoutingTable for access to latest node info
* Revert "stop contact chore in uplink_test"
This reverts commit 302db70f4071112d1b9f7ee0279225ea12757723.
* Revert "try stopping contact earlier"
This reverts commit 806cc3b82f9d598899dafd83da9315a1cb0cb43c.
* Revert "stop contact chore for repair test"
This reverts commit dd34de1cfdfc09b972186c9ab9a4f1e822446b79.
* nicer flags
* fix concurrency
* add concurrent workers
* initialize things
* fix tests
* close retain service
* ensure we don't have workers working on the same satellite
* ensure things compile
* fix other compilation issues
* concurrency changes
ran this with `go test -count=1000` and it passed all of them.
- we add a closed channel so that we can select on it with
context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
because we may want to wake up N pending Wait calls or other
concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
anymore.
- ensure Run doesn't start multiple times so that we don't have
to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
with Run can't decide that there's nothing started and exit
and then have Run start things.
- make sure to poll the closed/context channels through loops
or at the start of Run calls in case Close happens first.
- these polls should be under a mutex: because they have a default
  case, the goroutine can be scheduled so that it polls before Close
  has actually closed the channel, and then it starts more work.
- cancel a local Run context when it's going to exit to make sure
that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
  digest what's going on. (a condensed sketch of the pattern follows this list.)
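the condensed sketch referenced above. it compresses the bullets into one hypothetical service; the names and structure are assumptions, not the actual retain code:

```go
package retain

import (
	"context"
	"sync"
)

// Service condenses the bullets above into one hypothetical type: a closed
// channel selectable alongside ctx.Done, a sync.Once so it is closed exactly
// once, and a sync.Cond broadcast on every queue/running change so nothing polls.
type Service struct {
	mu      sync.Mutex
	cond    sync.Cond
	queue   []func(context.Context)
	running int
	started bool

	closed    chan struct{}
	closeOnce sync.Once
}

func NewService() *Service {
	s := &Service{closed: make(chan struct{})}
	s.cond.L = &s.mu
	return s
}

// Close closes the signal channel under the mutex and broadcasts: the state
// changed, so anything blocked on the condition variable must wake and notice.
func (s *Service) Close() {
	s.mu.Lock()
	s.closeOnce.Do(func() { close(s.closed) })
	s.cond.Broadcast()
	s.mu.Unlock()
}

// Queue adds a job and wakes a worker.
func (s *Service) Queue(job func(context.Context)) {
	s.mu.Lock()
	s.queue = append(s.queue, job)
	s.cond.Broadcast()
	s.mu.Unlock()
}

// Wait blocks until shutdown, or until the queue is drained and nothing is
// running; every state change broadcasts, so no polling loop is needed.
func (s *Service) Wait() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for !s.isClosed() && (len(s.queue) != 0 || s.running > 0) {
		s.cond.Wait()
	}
}

// Run is a single worker loop: it refuses to start twice, polls the closed and
// context state under the mutex, and cancels a local context when it exits.
func (s *Service) Run(ctx context.Context) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	go func() { <-ctx.Done(); s.cond.Broadcast() }() // wake waiters on cancellation

	s.mu.Lock()
	if s.started || s.isClosed() || ctx.Err() != nil {
		s.mu.Unlock()
		return
	}
	s.started = true
	for {
		for len(s.queue) == 0 && !s.isClosed() && ctx.Err() == nil {
			s.cond.Wait()
		}
		if s.isClosed() || ctx.Err() != nil {
			s.mu.Unlock()
			return
		}
		job := s.queue[0]
		s.queue, s.running = s.queue[1:], s.running+1
		s.mu.Unlock()

		job(ctx)

		s.mu.Lock()
		s.running--
		s.cond.Broadcast() // queue/running state changed
	}
}

// isClosed polls the closed channel; callers hold s.mu, and Close closes it
// under the same mutex, so the default case cannot let more work start late.
func (s *Service) isClosed() bool {
	select {
	case <-s.closed:
		return true
	default:
		return false
	}
}
```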
Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7
* use the retain error class
Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc
* concurrency fixes again
- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
to exit, because they may be blocked on the condition
variable
- additionally, we need to broadcast when we close the signal
  channel, because the state changed: the waiting workers need
  to wake up and exit. (a small sketch of both fixes follows.)
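the small sketch of both fixes, with hypothetical fields (cond.L is assumed to be &mu, and workers a WaitGroup tracking the worker goroutines):

```go
package retainfix

import "sync"

type closer struct {
	mu        sync.Mutex
	cond      sync.Cond
	closed    chan struct{}
	closeOnce sync.Once
	workers   sync.WaitGroup
}

func (c *closer) Close() {
	c.mu.Lock()
	c.closeOnce.Do(func() { close(c.closed) })
	c.cond.Broadcast() // the state changed: wake anyone blocked on the condition
	c.mu.Unlock()      // drop the lock; exiting workers need it to re-check state

	c.workers.Wait() // only now is it safe to block until every worker returns
}
```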
Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5
* undo my misguided rename
Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881
* remove pollInterval
* format paragraph more nicely
* move skew calculation into retain pieces
This PR introduces functionality for routine deletion of archived orders.
The user may specify an interval at which to run archive cleanup and a TTL for archived items. During each cleanup, all items that have reached the TTL are deleted.
The archive cleanup job and the order sender are combined into a new orders service.
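A sketch of the cleanup loop under the behaviour described above; the function signature and names are illustrative:

```go
package orders

import (
	"context"
	"time"
)

// runArchiveCleanup sketches the cleanup half of the combined service: every
// interval, delete archived orders older than the TTL. deleteBefore stands in
// for the real database call.
func runArchiveCleanup(ctx context.Context, interval, ttl time.Duration,
	deleteBefore func(ctx context.Context, cutoff time.Time) error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			// anything archived before this cutoff has exceeded its TTL
			if err := deleteBefore(ctx, time.Now().Add(-ttl)); err != nil {
				return err
			}
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```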
Add a retain service on the storagenode. This service runs retain jobs queued by incoming satellite Retain requests: rather than doing the work during the grpc Retain() call itself, the call now queues a retain job to the retain service and returns immediately, removing a significant bottleneck in garbage collection.
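A sketch of that queue-and-return shape; the request fields and the queue interface are assumptions, not the actual endpoint:

```go
package retainrpc

import (
	"context"
	"errors"
)

// RetainRequest is an assumed shape: a serialized bloom filter of pieces to keep
// plus a creation-time cutoff for deletion candidates.
type RetainRequest struct {
	CreatedBefore int64
	Filter        []byte
}

// queue is the part of the retain service the endpoint needs; Queue returns
// false if the service is shutting down.
type queue interface {
	Queue(req RetainRequest) bool
}

// retain sketches the grpc handler's new behaviour: it enqueues the job and
// returns immediately, leaving the actual piece walk to the service's workers.
func retain(ctx context.Context, jobs queue, req RetainRequest) error {
	if !jobs.Queue(req) {
		return errors.New("retain service is closing")
	}
	return nil
}
```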