storj

Author	SHA1	Message	Date
Egon Elbre	64fb2d3d2f	Revert "dbutil: statically require all databases accesses to use contexts" This reverts commit `8e242cd012`. Revert because lib/pq has known issues with context cancellation. These issues need to be resolved before these changes can be merged. Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555	2020-01-15 07:28:00 +00:00
JT Olio	8e242cd012	dbutil: statically require all databases accesses to use contexts this will allow for some nice runtime analysis down the road. also, this allows for wrapping database handles in a way that can interact with these contexts requires https://review.dev.storj.io/c/storj/dbx/+/514 Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b	2020-01-14 18:20:47 -05:00
Egon Elbre	5af1f9e6d1	storagenode/{piecestore,storagenodedb}: use context in queries In endpoint.saveOrder, ensure we always try to save orders such that they can be settled. Change-Id: Ic9ac8f4bf684d8493282912ca97f386c1762e364	2020-01-14 20:27:26 +00:00
Egon Elbre	d80cfeb4ab	storagenode: ensure we don't eat the underlying error When error is formatted using %v it's not possible to check whether the error was caused by a context cancellation. Change-Id: Ia77dfb0817e49d9a7b168c12a6300d131007d0ee	2020-01-14 20:26:23 +00:00
Egon Elbre	23e2664327	storagenode/inspector: return rpcstatus Change-Id: I7e13b6dc8c9c3f4550f77885b1ef99662f5a5727	2020-01-14 20:24:46 +00:00
Egon Elbre	ff267168c5	private/migrate: add ctx argument Change-Id: I3d65912d89261386413c494c7ed1576fed4dcaf4	2020-01-13 15:52:26 +02:00
Egon Elbre	c7b846589e	private/dbutil/sqliteutil: add ctx argument Change-Id: If1caa9cde746817e62cae32a152eeec81959129c	2020-01-13 15:03:30 +02:00
Qweder93	cf19e141e0	storagenode/notifications: return unread count and fix json id, list-notifications method fix Change-Id: Ic56beac1f388d91a29c9e8266161715d09364520	2020-01-09 17:56:00 +00:00
Yingrong Zhao	ebeee58001	storagenode/gracefulexit: remove satellite entry when node fails precondition Change-Id: I3c215170f10f0053e4f8718ee31d64d93f52ec80	2020-01-08 18:11:58 +00:00
Egon Elbre	082ec81714	uplink: move to storj.io/uplink (#3746 )	2020-01-08 15:40:19 +02:00
paul cannon	0c88a7b475	private/migrate: use transactional helpers and not Begin() This code needs to work against cockroachDB, so transactions must be retried when a retryable error is returned. This change puts migrate transactions into the dbutil.WithTx transactional helpers to achieve this in the easiest way. Change-Id: Ib930e82d55cb0257357a222ce9131e6e53372c03	2020-01-07 18:25:38 +00:00
Egon Elbre	f41d440944	all: reduce number of log messages Remove starting up messages from peers. We expect all of them to start, if they don't, then they should return an error why they don't start. The only informative message is when a service is disabled. When doing initial database setup then each migration step isn't informative, hence print only a single line with the final version. Also use shorter log scopes. Change-Id: Ic8b61411df2eeae2a36d600a0c2fbc97a84a5b93	2020-01-06 19:03:46 +00:00
Egon Elbre	2680bae88c	private/testplanet: remove dependency to uplink Remove direct dependency on uplink.RSConfig, this simplifies moving the config file without introducing weird dependencies. Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec	2020-01-02 09:40:46 +00:00
Stefan Benten	758fe35aba	storagenode/orders: adding jitter to sending (#3725 )	2019-12-30 21:35:26 +01:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Fadila	115b8b0fc8	storagenode/piecestore: delete several pieces in a single request This is part of the deletion performance improvement. See https://storjlabs.atlassian.net/browse/V3-3349 Change-Id: Idcd83a302f2bd5cc3299e1a4195a7e177f452599	2019-12-27 10:58:04 +00:00
Isaac Hess	7d1e28ea30	storagenode: Include trash space when calculating space used This commit adds functionality to include the space used in the trash directory when calculating available space on the node. It also includes this trash value in the space used cache, with methods to keep the cache up-to-date as files are trashed, restored, and emptied. As part of the commit, the RestoreTrash and EmptyTrash methods have slightly changed signatures. RestoreTrash now also returns the keys that were restored, while EmptyTrash also returns the total disk space recovered. Each of these changes makes it possible to keep the cache up-to-date and know how much space is being used/recovered. Also changed is the signature of the PieceStoreAccess.ContentSize method. Previously this method returned only the content size of the blob, excluding the size of any header data. This method has been renamed `Size` and returns both the full disk size and content size of the blob. This allows us to stat the file only once, and in some instances (e.g. the cache) knowing the full file size is useful. Note: This commit simply adds the trash size data to the piece size data we were already collecting. The piece size data is not accurate for all use-cases (e.g. because it does not contain piece header data); however, this commit does not fix that problem. Now that the ContentSize (Size) method returns the full size of the file, it should be easier to fix this problem in a future commit. Change-Id: I4a6cae09e262c8452a618116d1dc66b687f59f85	2019-12-23 19:07:03 -07:00
Egon Elbre	d55288cf68	pkg/rpc: replace methods with direct calls to pb Change-Id: I8bd015d8d316a2c12c1daceca1d9fd257f6f57bc	2019-12-22 17:12:43 +02:00
Egon Elbre	006baa9ca6	pkg/rpc: remove drpc aliases We need to split up pb package, which means we cannot have a core package that depends on them. Change-Id: I7f4f6fd82f89a51a9b2ad08bf2b1207253b8a215	2019-12-22 16:58:08 +02:00
Yingrong Zhao	6e71591b9b	satellitedb;storagenodedb: remove unnecessary use of DB transactions in graceful exit Change-Id: Ief0a28c6750c130896b48bfebfbea7fb3caa810f	2019-12-20 21:24:38 +00:00
Qweder93	e47ec84dee	storagenode notification service and api added Change-Id: I36898d7c43e1768e0cae0da8d83bb20b16f0cdde	2019-12-20 18:42:23 +00:00
Egon Elbre	afe05edff2	{storagenode,satellite}/gracefulexit: ensure workers finish their work Fixes a data race caused by not waiting for workers to finish before shutting down. Before this change, logging failed because the logger had already been closed when the test tried to write to it. Change-Id: I074045cd83bbf49e658f51353aa7901e9a5d074b	2019-12-17 17:21:52 +02:00
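The fix pattern is to wait for workers before shutting down. The sketch below does this with `sync.WaitGroup`. The `workerPool` type is hypothetical and is not the gracefulexit code. `Close` does not return until every started worker has called `Done`, so shared resources such as a logger outlive their users.

```go
package main

import (
	"fmt"
	"sync"
)

// workerPool (hypothetical) tracks in-flight workers so Close can wait for them.
type workerPool struct {
	wg   sync.WaitGroup
	mu   sync.Mutex
	done []int // IDs of workers that finished
}

// Start launches one worker. Add is called before the goroutine starts,
// so Close can never observe a zero counter while a worker is being launched.
func (p *workerPool) Start(id int) {
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		p.mu.Lock()
		p.done = append(p.done, id)
		p.mu.Unlock()
	}()
}

// Close blocks until all workers started so far have returned; only after
// that is it safe to tear down shared resources.
func (p *workerPool) Close() { p.wg.Wait() }

func main() {
	var p workerPool
	for i := 0; i < 4; i++ {
		p.Start(i)
	}
	p.Close()
	fmt.Println(len(p.done)) // 4
}
```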
Egon Elbre	7a36507a0a	private/testcontext: ensure we call cleanup everywhere Change-Id: Icb921144b651611d78f3736629430d05c3b8a7d3	2019-12-17 14:16:09 +00:00
littleskunk	08947e177d	storagenode/garbagecollection: enable in production Change-Id: I627b7a37ca4a85eb19936ca2c7ca907d7cc63f5b	2019-12-16 22:44:04 +00:00
Vitalii Shpital	53d9bc4530	storagenode/notifications: db created (#3707 )	2019-12-16 19:59:01 +02:00
littleskunk	c2ea75208f	storagenode/orderdb: fix db lock Change-Id: Id1add0ba7ae1b20bd98099bd4d3aff0fcfdd90c9	2019-12-15 23:41:22 +01:00
Andrew Harding	cb89496569	storagenode/trust: wire up list into pool - also updated ping chore to pick up trust changes - fixed small typo in blueprint - fixed flags for storj-sim - wired up changes to testplanet Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d	2019-12-13 20:32:50 +00:00
Andrew Harding	2867b6a466	storagenode/trust: list implementation Change-Id: Ia886e84990efaf2c783f199741552a7a8ff41d4e	2019-12-12 17:15:47 +00:00
Jeff Wendling	fb8e78132d	storagenodedb: reenable utccheck in tests Change-Id: If7d64dd4ae58e4b656ff9122ae3195b2a5173cb3	2019-12-10 23:17:14 +00:00
Andrew Harding	5ed9373dba	storagenode/trust: source entry cache Implements a cache that can persist trust entries returned by sources Change-Id: I72579e42e9f72d34a54b7510c9b665844f187314	2019-12-10 21:45:01 +00:00
Andrew Harding	715d97e3d8	storagenode/trust: rule and excluders Change-Id: I84ed542e1ef3cfaa5cc3d3f631cdc295393bf978	2019-12-10 21:08:12 +00:00
Cameron Ayer	6fae361c31	replace planet.Start in tests with planet.Run planet.Start starts a testplanet system, whereas planet.Run starts a testplanet and runs a test against it with each DB backend (cockroach compat). Change-Id: I39c9da26d9619ee69a2b718d24ab00271f9e9bc2	2019-12-10 16:55:54 +00:00
Andrew Harding	eb52ac623b	storagenode/trust: source implementations Change-Id: Ie36e79cc15257db88051f63e5b9463fd9d7b4736	2019-12-09 20:00:02 +00:00
Andrew Harding	7d0aadfeca	storagenode/trust: satellite URL implementation Satellite URL is a stricter form of the STORJ Node URL. It requires both the ID and port specifier. Change-Id: I7fd302064f864c1de8240a7915bf5263b898dfd1	2019-12-09 17:05:57 +00:00
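The sketch below illustrates the stricter parsing described here. It assumes the `id@host:port` form, as in the `12EayRS2V1k@us-central-1.tardigrade.io` example elsewhere in this log. `parseSatelliteURL` and `satelliteURL` are hypothetical names. The sketch only checks that an ID and a port are present. It does not decode or validate the ID. Port 7777 in the example is an assumption for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// satelliteURL (hypothetical) holds the parts of "id@host:port".
type satelliteURL struct {
	ID   string
	Host string
	Port string
}

// parseSatelliteURL (hypothetical) rejects input that lacks a node ID or a port.
func parseSatelliteURL(s string) (satelliteURL, error) {
	at := strings.Index(s, "@")
	if at <= 0 {
		return satelliteURL{}, errors.New("satellite URL must include a node ID")
	}
	id, addr := s[:at], s[at+1:]
	host, port, err := net.SplitHostPort(addr) // errors on "missing port in address"
	if err != nil {
		return satelliteURL{}, fmt.Errorf("satellite URL must include a port: %w", err)
	}
	if host == "" || port == "" {
		return satelliteURL{}, errors.New("satellite URL must include both host and port")
	}
	return satelliteURL{ID: id, Host: host, Port: port}, nil
}

func main() {
	fmt.Println(parseSatelliteURL("12EayRS2V1k@us-central-1.tardigrade.io:7777"))
	_, err := parseSatelliteURL("us-central-1.tardigrade.io:7777")
	fmt.Println(err) // missing ID
	_, err = parseSatelliteURL("12EayRS2V1k@us-central-1.tardigrade.io")
	fmt.Println(err) // missing port
}
```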
littleskunk	9d1faeee58	storagenode/garbagecollection: increase MaxTimeSkew to be higher than satellite MaxCommitInterval Change-Id: I86f8d0b44bea3aa005ff26d52588611c59df5e9a	2019-12-09 16:03:55 +00:00
Ethan Adams	9420fa9fc5	satellite/gracefulexit: Add graceful exit completed/failed receipt verification to satellite CLI (#3679 )	2019-12-03 17:09:39 -05:00
Ivan Fraixedes	42c61138e8	storage: Improve doc comments of delete methods (#3591 ) Improve the documentation of several methods involved in the delete operation to make their behavior clear without having to inspect their logic.	2019-12-02 12:18:20 +01:00
Ivan Fraixedes	bf97ef06fc	storagenode: Add new endpoint to receive satellite requests for… (#3590 ) * pkg/pg: Add new service function storage node Add a new service function to the storage node piece store for deleting pieces when satellites request them. * storagenode/piecestore: Add endpoint to delete piece Add a new endpoint to receive from trusted satellites to delete a piece. * private/testplanet: Fix storagenode mock Add to the storagenode mock the new endpoint method. * proto.lock: Update it with the latest protobuf changes * storagenode/piecestore: Reuse test piece upload Extract the repeated logic from several test functions for uploading a test piece to a test helper function. * uplink/piecestore: Implement client side method Implement the client side method of the new piecestore RPC function. * storagenode/piecestore: Add test DeletePiece endpoint Implement a test for the DeletePiece new endpoint method.	2019-11-26 18:47:19 +01:00
Yingrong Zhao	66f1a1680f	add completion receipt to exit-status cli command on storage node (#3650 )	2019-11-26 12:32:26 -05:00
Isaac Hess	56f8fd2dd7	storagenode/pieces: Add EmptyTrash functionality (#3640 ) * storagenode/pieces: Add EmptyTrash functionality * storagenode/pieces: Fix err * storagenode/pieces: Fix lint	2019-11-26 09:25:21 -07:00
Vitalii Shpital	038ac58600	web/storagenode: minimal allowed version view implemented (#3583 )	2019-11-26 18:08:24 +02:00
littleskunk	8842b0c252	storagenode/gracefulexit: improve logging (#3633 )	2019-11-21 21:10:02 -05:00
Rafael Antonio Ribeiro Gomes	2739771761	storagenode: add bandwidth metrics (#3623 ) * storagenode: add bandwidth metrics * remove unnecessary metric	2019-11-21 16:51:40 -03:00
Isaac Hess	6aeddf2f53	storagenode/pieces: Add Trash and RestoreTrash to piecestore (#3575 ) * storagenode/pieces: Add Trash and RestoreTrash to piecestore * Add index for expiration trash	2019-11-20 09:28:49 -07:00
Kaloyan Raev	6d728d6ea0	storagenode/collect: delete piece 24 hours after expiration (#3613 )	2019-11-20 17:02:57 +02:00
Vitalii Shpital	61c8bcc9a6	web/storagenode: egress chart implemented (#3574 )	2019-11-20 16:37:57 +02:00
Rafael Antonio Ribeiro Gomes	da39c71d35	storagenode: add new metric satellite.request (#3610 ) * storagenode: add new metric satellite.request * storagenode: metrics fixed * switch from Counter to Meter	2019-11-19 18:11:31 -03:00
Ivan Fraixedes	8e1e4cc342	piecestore: Fix invalid comment and typos (#3604 )	2019-11-19 16:30:48 +01:00
Nikolai Siedov	24318d74b3	storagenode/console: show satellite url in satellite selection (#3602 )	2019-11-19 14:16:56 +02:00
Nikolai Siedov	0d35505fe1	SNOboard/console: router changed for gorillaMux, caching added (#3577 )	2019-11-15 14:36:43 +02:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573 )	2019-11-14 21:46:15 +02:00
Egon Elbre	1a54007f1c	storagenode/storagenodedb: dont log opening of each database (#3571 )	2019-11-14 17:08:16 +02:00
Egon Elbre	1e64006e32	lint: add staticcheck as a separate step (#3569 )	2019-11-14 10:31:30 +02:00
paul cannon	bd89f51c66	Keep v0pieceinfo database isolated (#3364 ) * put TestCreateV0 back in StoreForTest * avoid direct handles to V0 pieceinfo db * type mismatch fix * use storage.Blobs interface in store_test.go ..instead of filestore.Store. this will allow filestore.Store to become unexported. * unexport filestore.Store rename it to blobStore. things should use the storage.Blobs interface instead. changes in this commit are purely mechanical (made through the "refactor" tool in Gocode followed by search/replace on the word "Store" within the storage/filestore/ directory). * kill filestore.StoreForTest now that filestore.blobStore is unexported, there isn't a need for a specialized wrapper type. this (not coincidentally) also makes it possible for the WriterForFormatVersion() method on storagenode/pieces.StoreForTest to work, without requiring everything to wrap the store.blobs attribute in a filestore.StoreForTest, which was impractical.	2019-11-13 13:15:31 -06:00
Yingrong Zhao	db8294cfba	storagenode/gracefulexit: get hash and limit using original piece ID (#3557 )	2019-11-13 12:45:55 -05:00
Jeff Wendling	ebcd37c572	storagenode/contact: fix connection leak with contact checkin Change-Id: If86002557144d5d8dbff939d2b6a2dfec6537577	2019-11-06 18:00:09 +00:00
littleskunk	7eb6724c92	logging: unify logging around satellite ID, node ID and piece ID (#3491 ) * logging: unify logging around satellite ID, node ID and piece ID * unify segment index	2019-11-05 22:04:07 +01:00
Maximillian von Briesen	257d3946d5	storagenode/gracefulexit: allow storagenodes to concurrently transfer pieces for graceful exit (#3478 )	2019-11-05 10:33:44 -05:00
Jennifer Li Johnson	11f0ea3258	5s (#3477 )	2019-11-04 16:20:31 -05:00
Jennifer Li Johnson	aa7d15a365	storagenode/contact: exponential backoff retries for pinging Satellites (#3372 )	2019-11-04 16:03:21 -05:00
Jess G	5abb91afcf	satellite: change the Peer name to Core (#3472 ) * change satellite.Peer name to Core * change to Core in testplanet * missed a few places * keep shared stuff in peer.go to stay consistent with storj/docs	2019-11-04 11:01:02 -08:00
Isaac Hess	4d26d0a6a6	storagenode/pieces: Add migration from v0 piece to v1 piece (#3401 )	2019-11-04 17:59:45 +01:00
Egon Elbre	87687938d1	storagenode/contact: fix panic in ping satellites (#3447 )	2019-11-01 16:20:53 +01:00
Ethan Adams	43103ae13f	lower storage node counts in tests (#3427 )	2019-10-31 10:57:54 -04:00
Jess G	4d85b11574	satellite/contact: improve errors in contact endpoints (#3356 ) * improve errors in satellite contact endpoints * add changes per CR comments * update pingback method so it still updates node table * fix err and returns * fix zap logging to be better	2019-10-30 11:57:21 -07:00
Natalie Villasana	4878135068	satellite/gracefulexit, storagenode/gracefulexit: add timeouts (#3407 )	2019-10-30 13:40:57 -04:00
Natalie Villasana	5453886231	satellite/repair, uplink/ecclient: remove unused expiration arg from ec.Repair and ec.putPiece (#3416 )	2019-10-30 11:35:00 -04:00
Yingrong Zhao	3ee0b89f8f	storagenode/gracefulexit: delete pieces when receive Delete or Completed message from satellite (#3406 )	2019-10-30 10:46:56 -04:00
Egon Elbre	65a8e0bcbc	{satellite,storagenode}/gracefulexit: clearer log messages (#3413 )	2019-10-30 10:21:27 +02:00
Isaac Hess	1defd4dbfe	storagenode/piecestore: Respect config.MaxConcurrentRequests for drpc (#3402 )	2019-10-28 13:12:49 -06:00
Ethan Adams	5b0398a718	storagenode/gracefulexit: Exclude finished exits from chore/worker processing. Fix update status bug (#3399 )	2019-10-28 13:59:45 -04:00
Egon Elbre	93353df4d6	internal/sync2: make Fence accept context (#3393 )	2019-10-28 16:04:31 +02:00
paul cannon	1469f7f41f	storagenode/contact: wait for UpdateSelf before start (#3332 ) When the contact chore starts running before the monitor service has provided any useful capacity data, the first outgoing contact has not-very-helpful data for the satellite. This change causes the contact chore to wait until capacity data is available. The wait should be quite short in all reasonable cases: even when a node starts with a lot of stored pieces and no cached spaceUsedDB data, new data will have been calculated and cached by the call to `peer.Storage2.CacheService.Init(ctx)` in `storagenode.cmdRun()` before `peer.Run(ctx)`. Change-Id: Ibc26d5c1fc10a23006c00bc3f13ff6cf71f8bf1d	2019-10-26 12:16:25 -05:00
Jeff Wendling	ed48e74e20	gracefulexit: fix build for drpc (#3387 ) Change-Id: I335e9f8991a10c9e8a0737bc7c9ea3f04cbe2546	2019-10-26 15:53:35 +02:00
Maximillian von Briesen	6df4d7bc73	storagenode/gracefulexit + satellite/gracefulexit: add storagenode-side transfer validation (#3371 ) * Make the exiting node check piece hashes, piece IDs, and piece hash signatures before relaying successful transfer data to the satellite. * Enable immediate graceful exit failure for "successful" transfers that fail satellite-side validation. * Move transfer piece logic in storagenode worker to separate function (to make the worker easier to understand)	2019-10-25 13:16:20 -04:00
Yingrong Zhao	fa1ac24e19	satellite/gracefulexit: add failure threshold check (#3329 ) * add overall failure percentage check and inactive time frame check before sending a response to sno * update comment * delete node from transfer queue if it has been inactive for too long * fix linting error * add test config value * fix nil pointer * add config value into testplanet * add unit test for overall failure threshold * move timeframe threshold to chore * update protolock * add chore test * add per piece failure count logic * change config name from EndpointMaxFailures to MaxFailuresPerPiece * address comments * fix linting error * add error handling for no row returned from progress table * fix test for graceful exit chore on storagenode * fix typo InActive -> Inactive * improve readability for failure threshold calculation * update config lock * change error handling for GetProgress in graceful exit endpoint on the satellite side * return proper rpc error in endpoint * add check in chore test for checking finish timestamp and queue	2019-10-24 12:24:42 -04:00
Isaac Hess	75412e54e5	storagenode/piecestore: Rename liveGRPCRequests back to liveRequests (#3354 )	2019-10-23 13:43:43 -06:00
Isaac Hess	14c7648530	storagenode/piecestore: Only limit grpc requests (#3342 )	2019-10-23 10:14:02 -06:00
JT Olio	2c6fa3c5f8	pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335 ) libuplink was incorrectly setting timeouts to 10 seconds still, but should have been at least 10 minutes. the order sender was setting them to 1 hour. we don't want timeouts in uplink-side logic as it establishes a minimum rate on tcp streams. instead of all of this, just use tcp keep alive. tcp keep alive packets are sent every 15 seconds and if the peer stops responding the connection dies. this is enabled by default with go. this will kill tcp connections when they stop working. Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b	2019-10-22 17:57:24 -06:00
Ethan Adams	3e0d12354a	storagenode/gracefulexit: Implement storage node graceful exit worker - part 1 (#3322 )	2019-10-22 16:42:21 -04:00
paul cannon	5e78f4000b	storagenode/pieces: remove old comment (#3334 ) the reservedSpace member it's talking about was removed quite a while ago. Change-Id: I28433b2a44467376a408453d875c389656347cab	2019-10-22 12:51:51 +03:00
Bryan White	f468816f13	{internal/version,versioncontrol,cmd/storagenode-updater}: add rollout to storagenode updater (#3276 )	2019-10-21 12:50:59 +02:00
Bryan White	243ba1cb17	{versioncontrol,internal/version,cmd/*}: refactor version control (#3253 )	2019-10-20 09:56:23 +02:00
Yingrong Zhao	e5099f31f3	add context.Clean and correct rpc error code (#3295 )	2019-10-16 13:50:01 -04:00
Isaac Hess	ed6b88a12d	piecestore: update usage before completing upload (#3286 ) The upload code currently updates the usage in a deferred call to saveOrder(). The consequence is that in the success case, the RPC is completed before the usage has been updated. This change repurposes the deferred call to update usage in the failure case, while explicitly updating the usage before completing the RPC. This fixes some test flakiness when using dRPC. gRPC waits until the final status is written before a Recv call completes, and the final status is written by the server after the handler function has exited. In practice this means that the client is blocked until the defer call is also finished. So this change will not change performance at all. It has two advantages: (1) It fixes test flakiness and, more importantly: (2) reduces the chances that someone will accidentally write a flaky test in the future	2019-10-15 20:17:17 -06:00
Yingrong Zhao	87e3764390	storagenode/cmd: add exit-status command for graceful exit (#3264 ) * add exit-status command * remove todo and fix format * fix status display * change startExit to exit progress * fix linting error * add successful column in exit progress * fix test * remove extra new line * fix TYPOS * format the percentage better	2019-10-15 18:07:32 -04:00
Andrew Harding	4962c6843e	piecestore: fix test flakiness around upload/download usage tracking (#3282 )	2019-10-15 11:22:15 -06:00
Simon Guindon	abb5b6c499	storagenode/piecestore: Fix to ignore both gRPC and dRPC EOF errors. (#3274 ) * Fix to ignore both gRPC and dRPC EOF errors. * Fix to ignore both gRPC and dRPC EOF errors.	2019-10-15 12:13:53 -04:00
Ethan Adams	1ad2ba7e3e	storagenode/gracefulexit: Add graceful exit chore and worker. (#3262 ) Adds graceful exit chore and worker for V3-2614	2019-10-15 11:29:47 -04:00
Jennifer Li Johnson	b185dbbee2	satellite/discovery: remove discovery related code (#3175 )	2019-10-14 10:57:01 -04:00
littleskunk	96aeedcdee	OrderLimit/GracePeriod: Increase time window from 1h to 24h (#3255 ) * OrderLimit/GracePeriod: Increase time window from 1h to 24h * update satellite config lock	2019-10-13 17:40:24 +02:00
JT Olio	6ede140df1	pkg/rpc: defeat MITM attacks in most cases (#3215 ) This change adds a trusted registry (via the source code) of node address to node id mappings (currently only for well known Satellites) to defeat MITM attacks to Satellites. It also extends the uplink UI such that when entering a satellite address by hand, a node id prefix can also be added to defeat MITM attacks with unknown satellites. When running uplink setup, satellite addresses can now be of the form 12EayRS2V1k@us-central-1.tardigrade.io (not even using a full node id) to ensure that the peer contacted is the peer that was expected. When using a known satellite address, the known node ids are used if no override is provided.	2019-10-12 14:34:41 -06:00
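The sketch below illustrates the node ID prefix pinning described above. `knownSatellites`, `pinnedID`, and `verifyPeer` are hypothetical. The registry entry is a made-up placeholder, not a real satellite ID. In the real system the peer's ID is expected to come from the certificate the peer presents during the TLS handshake. This sketch takes it as a plain string.

```go
package main

import (
	"fmt"
	"strings"
)

// knownSatellites (hypothetical) maps well-known addresses to node IDs.
// The ID below is a placeholder, not a real satellite ID.
var knownSatellites = map[string]string{
	"us-central-1.tardigrade.io": "12EayRS2V1kPLACEHOLDER",
}

// pinnedID (hypothetical) splits "idprefix@address" into the expected ID
// prefix and the address. Without "@", it falls back to the registry; an
// unknown address yields an empty prefix, meaning there is nothing to pin.
func pinnedID(input string) (prefix, addr string) {
	if i := strings.LastIndex(input, "@"); i >= 0 {
		return input[:i], input[i+1:]
	}
	return knownSatellites[input], input
}

// verifyPeer (hypothetical) checks the peer's authenticated ID against the pin.
func verifyPeer(prefix, peerID string) error {
	if prefix == "" {
		return nil // no pin available, so a MITM cannot be ruled out here
	}
	if !strings.HasPrefix(peerID, prefix) {
		return fmt.Errorf("peer ID %q does not match expected prefix %q", peerID, prefix)
	}
	return nil
}

func main() {
	prefix, addr := pinnedID("12EayRS2V1k@us-central-1.tardigrade.io")
	fmt.Println(prefix, addr)
	fmt.Println(verifyPeer(prefix, "12EayRS2V1kPLACEHOLDER")) // <nil>
	fmt.Println(verifyPeer(prefix, "1AttackerNodeID") != nil) // true
}
```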
Isaac Hess	e567f27634	storagenode/piecestore: Change test to use ioutil.ReadAll to attempt to reduce test flake (#3250 )	2019-10-11 15:57:59 -06:00
Cameron	d17be58237	remove random sleep in storagenode contact (#3243 )	2019-10-11 16:44:18 -04:00
Vitalii Shpital	78a71ad3b6	web/storagenode: node status updated (#3220 )	2019-10-11 19:28:47 +03:00
Yingrong Zhao	743a0fc38b	storagenode/cmd: create start graceful exit CLI (#3202 )	2019-10-11 09:58:12 -04:00
littleskunk	d5b2e1ef89	storagenode/signature: Reject uploads with a timestamp too far in the future (#3194 )	2019-10-08 13:09:46 +02:00
JT Olio	37491d0d32	storagenode: embed the console into the binary and makefile (#3164 ) * web/storagenode: add package-lock.json * storagenode: compile console into binary	2019-10-08 10:52:19 +02:00
Jennifer Li Johnson	7ceaabb18e	Delete Bootstrap and Kademlia (#2974 )	2019-10-04 16:48:41 -04:00
Jeff Wendling	64e43e555e	pkg/rpc: return context error if ready after DialContext fails the net package does not make it easy to know if DialContext failed because the context was done. it's important for some of our tests that canceled contexts are detected as such, so we accept the small race that's arguably correct (the context must be canceled asynchronously) to ensure we always return the context error if available. Change-Id: I058064d5c666e5353b74fb5bd300bf7abe537ff5	2019-10-04 20:09:00 +00:00
Yaroslav Vorobiov	a11619e7f3	storagenode/console: use bandwidth monthly summary (#3183 )	2019-10-04 09:29:25 -06:00
Yaroslav Vorobiov	4824ecdb8d	storagenode/console: use bytes for remaining info (#3186 )	2019-10-04 18:17:28 +03:00
littleskunk	b2e328f118	storagenode/dashboard: update online status (#3168 )	2019-10-03 20:31:39 +02:00
Maximillian von Briesen	08ed50bcaa	satellite/metainfo: add commit interval to prevent long delays between order limit creation and segment commit (#3149 )	2019-10-01 12:55:02 -04:00
Bill Thorp	89c59d06f9	storagenode/storagenodedb: add SQL receiver logic for graceful exit (#3067 ) * added graceful exit db methods	2019-10-01 10:34:03 -04:00
Jennifer Li Johnson	755cbd4dce	storagenode/main: map aliases for kademlia config values (#3118 )	2019-09-30 19:33:00 -04:00
Jennifer Li Johnson	29b96a666b	internal/testplanet: fix conn leak (#3132 )	2019-09-27 09:47:57 -06:00
Isaac Hess	2c5e169888	storagenode/storagenodedb: Vacuum info.db to prepare for splitting storagenodedbs (#3134 )	2019-09-27 07:55:51 -06:00
Jeff Wendling	098cbc9c67	all: use pkg/rpc instead of pkg/transport all of the packages and tests work with both grpc and drpc. we'll probably need to do some jenkins pipelines to run the tests with drpc as well. most of the changes are really due to a bit of cleanup of the pkg/transport.Client api into an rpc.Dialer in the spirit of a net.Dialer. now that we don't need observers, we can pass around stateless configuration to everything rather than stateful things that issue observations. it also adds a DialAddressID for the case where we don't have a pb.Node, but we do have an address and want to assert some ID. this happened pretty frequently, and now there's no more weird contortions creating custom tls options, etc. a lot of the other changes are being consistent/using the abstractions in the rpc package to do rpc style things like finding peer information, or checking status codes. Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412	2019-09-25 15:37:06 -06:00
Isaac Hess	580e511b4c	storagenode/storagenodedb: Migrate to separate dbs (#3081 ) * storagenode/storagenodedb: Migrate to separate dbs * storagenode/storagenodedb: Add migration to drop versions tables * Put drop table statements into a transaction. * Fix CI errors. * Fix CI errors. * Changes requested from PR feedback. * storagenode/storagenodedb: fix tx commit	2019-09-23 12:36:46 -07:00
Jennifer Li Johnson	d2502bb51b	Adds tests for kad replacement and restores kad operator configs (#3094 ) * test that all nodes can check in with all satellites * keep kademlia config * add untrusted satellite test * use getversion * remove kademlia config changes in test-sim-backwards.sh * add kademlia flags back to storj-sim storagenode * reset kademlia flags in storagenode entrypoint	2019-09-20 16:02:23 -04:00
Jennifer Li Johnson	724bb44723	Remove Kademlia dependencies from Satellite and Storagenode (#2966 ) What: cmd/inspector/main.go: removes kad commands internal/testplanet/planet.go: Waits for contact chore to finish satellite/contact/nodesservice.go: creates an empty nodes service implementation satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover() satellite/peer.go: sets up contact service and endpoints storagenode/console/service.go: replaces nodeID with contact.Local() storagenode/contact/chore.go: replaces routing table with contact service storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method storagenode/contact/service.go: creates a service to return the local node and update its own capacity storagenode/monitor/monitor.go: uses contact service in place of routing table storagenode/operator.go: moves operatorconfig from kad into its own setup storagenode/peer.go: sets up contact service, chore, pingstats and endpoints satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection Removes kademlia setups in: cmd/storagenode/main.go cmd/storj-sim/network.go internal/testplanet/planet.go internal/testplanet/satellite.go internal/testplanet/storagenode.go satellite/peer.go scripts/test-sim-backwards.sh scripts/testdata/satellite-config.yaml.lock storagenode/inspector/inspector.go storagenode/peer.go storagenode/storagenodedb/database.go Why: Replacing Kademlia Please describe the tests: • internal/testplanet/planet_test.go: TestBasic: assert that the storagenode can check in with the satellite without any errors TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup • satellite/contact/contact_test.go: TestFetchInfo: Tests that the FetchInfo method returns the correct info • storagenode/contact/contact_test.go: TestNodeInfoUpdated: tests that the contact chore updates the node information TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).	2019-09-19 15:56:34 -04:00
Jess G	93788e5218	remove kademlia: create upsert query to update uptime (#2999 ) * create upsert query for check-in method * add tests * fix lint err * add benchmark test for db query * fix lint and tests * add a unit test, fix lint * add address to tests * replace print w/ b.Fatal * refactor query per CR comments * fix disqualified, only set if null * fix query * add version to updatecheckin query * fix version * fix tests * change version for tests * add version to tests * add IP, add transport, mv unit test * use node.address as arg * add last ip * fix lint	2019-09-19 11:37:31 -07:00
Simon Guindon	a2b1e9fa95	storagenode/storagenodedb: refactor both data access objects and migrations to support multiple DB connections (#3057 ) * Split the info.db database into multiple DBs using Backup API. * Remove location. Prev refactor assumed we would need this but don't. * Added VACUUM to reclaim space after splitting storage node databases. * Added unique names to SQLite3 connection hooks to fix testplanet. * Moving DB closing to the migration step. * Removing the closing of the versions DB. It's already getting closed. * Swapping the database connection references on reconnect. * Moved sqlite closing logic away from the boltdb closing logic. * Moved sqlite closing logic away from the boltdb closing logic. * Remove certificate and vouchers from DB split migration. * Removed vouchers and bumped up the migration version. * Use same constructor in tests for storage node databases. * Use same constructor in tests for storage node databases. * Adding method to access underlying SQL database connections and cleanup * Adding logging for migration diagnostics. * Moved migration closing database logic to minimize disk usage. * Cleaning up error handling. * Fix missing copyright. * Fix linting error. * Add test for migration 21 (#3012) * Refactoring migration code into a nicer to use object. * Refactoring migration code into a nicer to use object. * Fixing broken migration test. * Removed unnecessary code that is no longer needed now that we close DBs. * Removed unnecessary code that is no longer needed now that we close DBs. * Fixed bug where an invalid database path was being opened. * Fixed linting errors. * Renamed VersionsDB to LegacyInfoDB and refactored DB lookup keys. * Renamed VersionsDB to LegacyInfoDB and refactored DB lookup keys. * Fix migration test. NOTE: This change does not address new tables satellites and satellite_exit_progress * Removing v22 migration to move into its own PR. * Removing v22 migration to move into its own PR. * Refactored schema, rebind and configure functions to be reusable. * Renamed LegacyInfoDB to DeprecatedInfoDB. * Cleaned up closeDatabase function. * Renamed storageNodeSQLDB to migratableDB. * Switched from using errs.Combine() to errs.Group in closeDatabases func. * Removed constructors from storage node data access objects. * Reformatted usage of const. * Fixed broken test snapshots. * Fixed linting error.	2019-09-18 12:17:28 -04:00
Simon Guindon	91d54af705	Add satellites database business objects. (#3055 ) * Add satellites database business objects. * Fixed linting error.	2019-09-16 13:54:53 -04:00
Jess G	d3ef574b20	pkg/pb: minor changes to contact.proto (#3048 ) * minor fixes to contact proto * simplify and rm nodeAddr object from client	2019-09-13 19:37:32 -05:00
Egon Elbre	ca058e606f	storagenode/orders: fix data race in settle (#3042 )	2019-09-13 15:50:39 +03:00
Cameron	44bcdd222f	storagenode/contact: test node info is updated (#3039 )	2019-09-13 07:53:48 -04:00
Jeff Wendling	0dcbd3dc08	bootstrap/satellite/certificate/storagenode: register drpc services Change-Id: Id29f14b76a8c9cb2be31001b9a7a4356a4bda183	2019-09-12 15:09:46 -06:00
Cameron	ab1147afb6	storagenode/pieces: fix race condition in cache service (#2972 ) * add mutex lock to PersistCacheTotals, move lock around copyCacheTotals to inside function	2019-09-12 12:42:39 -04:00
Egon Elbre	e5ac95b6e9	storagenode/inspector: fix TestInspectorStats flakiness by waiting for requests to be handled (#3018 )	2019-09-12 02:26:55 -07:00
Yingrong Zhao	9f2f1527c5	storagenode/storagenodedb: add new tables for graceful exit (#3008 ) * add database schema * add migration * change table name and update blueprint	2019-09-11 18:57:53 -04:00
Natalie Villasana	aa3567187e	satellite/audit: worker now verifies and reverifies (#2965 )	2019-09-11 18:37:01 -04:00
paul cannon	c139ed8ea1	storagenode/console: remove kademlia (#2942 ) this is a trivial operation for storagenode/console, as it doesn't really need or use kademlia in the first place. What: Removes kademlia from storagenode/console Why: We are in the process of getting rid of kademlia, and this is one place where it's particularly easy. Please describe the tests: Existing tests exercise storagenode/console behavior; if they continue to work, everything here should be tested satisfactorily. Please describe the performance impact: None	2019-09-11 16:41:43 -04:00
Isaac Hess	7718802f0c	storagenode/storagenodedb: prepare for multiple databases (#3005 ) * Migrate test: prepare for multiple databases * Add copyright * Fix unused variables * Move data to testdata, split MultiDBSnapshot from MultiDBState	2019-09-11 14:31:46 -06:00
Isaac Hess	0b32572ae6	migrate: Allow work on separate dbs (#2996 )	2019-09-10 13:42:23 -06:00
Jess G	2fc4d61610	implement contact.checkin method (#2952 ) * implement contact.checkin method * add batching to update uptime checks * rm batching * rm other unneeded things * fix lint * fix unit test * changes per CR comments * couple more CR changes * add identity check into grpcOpt * fix lint * why do you fix the test * revert test change * stop contact chore for repair test * put node in cache * comment out contact chore. See what happens * Revert "comment out contact chore. See what happens" This reverts commit 2e45008e36a50e0a842ae455ac83de77093d4daa. * try stopping contact earlier * stop contact chore in uplink_test * replace self on chore with RoutingTable for access to latest node info Revert "stop contact chore in uplink_test" This reverts commit 302db70f4071112d1b9f7ee0279225ea12757723. * Revert "try stopping contact earlier" This reverts commit 806cc3b82f9d598899dafd83da9315a1cb0cb43c. * Revert "stop contact chore for repair test" This reverts commit dd34de1cfdfc09b972186c9ab9a4f1e822446b79.	2019-09-10 09:05:07 -07:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964 )	2019-09-10 16:24:16 +03:00
Jennifer Li Johnson	3387750280	storagenode/contact: create chore for nodes to ping satellites (#2877 ) Creates a chore for nodes to announce themselves to their trusted satellites. Runs on startup and every hour thereafter	2019-09-06 12:14:03 -04:00
Yaroslav Vorobiov	c35ad5cbfc	storagenode/console: update api (#2969 )	2019-09-06 15:01:03 +03:00
paul cannon	9821a21e5c	satellite,storagenode,bootstrap: add contact service to peer (#2951 ) * satellite,storagenode,bootstrap: add contact service to peer	2019-09-04 15:04:18 -04:00
paul cannon	adfa16188b	pkg/contact: bare-bones service and endpoint (#2941 ) * pkg/contact: bare-bones service and endpoint * split contact package into satellite and node * use new contact protobuf types	2019-09-04 11:29:34 -07:00
Yaroslav Vorobiov	f7403f97b0	storagenode/storageusage: add summary, rename timestamp to interval_start (#2911 )	2019-09-04 17:13:43 +03:00
Yaroslav Vorobiov	758f7cb3dd	storagenode/bandwidth: remove bandwidth concerns from console, add satellite summary (#2923 )	2019-09-04 17:01:55 +03:00
Michal Niewrzal	ee614bf032	storagenode: add custom dial timeout for orders sending (#2939 )	2019-09-03 17:32:28 +02:00
Michal Niewrzal	3fbe31aada	storagenode: Increase order sending request timeout (#2930 )	2019-09-02 13:24:02 +02:00
littleskunk	9d1910cb2b	storagenode/orderarchive: Reduce TTL from 45 to 7 days (#2915 )	2019-08-29 22:38:09 +02:00
Ivan Fraixedes	537769d7fa	storagenode/orders: Don't return error Archiving unsent (#2903 ) Don't return error when archiving errors which aren't found in the DB, because it causes the Storage Node send-orders cycle to stop. This was applied in the commit `e47b8ed131`, but the last call to the orders.Archive function was missed, so the errors for not-found orders weren't returned in the first call but were returned in the second call. This commit addresses the second call, so that the handleBatches function never returns an error on not-found orders.	2019-08-29 20:22:22 +02:00
Egon Elbre	8a5db77e04	storagenode/retain: add comment (#2910 )	2019-08-29 19:42:17 +03:00
Yaroslav Vorobiov	b4d7d6778f	storagenode/reputation: add disqualified flag (#2862 )	2019-08-28 23:54:12 +03:00
Egon Elbre	62e3bf5b34	storagenode/retain: fix concurrency issues (#2828 ) * nicer flags * fix concurrency * add concurrent workers * initialize things * fix tests * close retain service * ensure we don't have workers working on the same satellite * ensure things compile * fix other compilation issues: * concurrency changes ran this with `go test -count=1000` and it passed all of them. - we add a closed channel so that we can select on it with context cancellation. - we put a once in so we only close the channel once. - every time the queue/running state changes, we have to broadcast because we may want to wake up N pending Wait calls or other concurrent workers. - because we broadcast, we don't need to do the polling in Wait anymore. - ensure Run doesn't start multiple times so that we don't have to worry about concurrent Close with multiple Runs. - hold the lock while we start workers so that a concurrent Close with Run can't decide that there's nothing started and exit and then have Run start things. - make sure to poll the closed/context channels through loops or at the start of Run calls in case Close happens first. - these polls should be under a mutex because they have a default case which makes it possible to schedule such that Close hasn't executed the channel close so it starts more work. - cancel a local Run context when it's going to exit to make sure that any retainPieces calls have a canceled context. - hopefully enough comments to both check my work and help readers digest what's going on. Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7 * use the retain error class Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc * concurrency fixes again - forgot to update the gc test to use the old Wait api. - we need to drop the lock while we wait for the workers to exit, because they may be blocked on the condition variable - additionally, we need to broadcast when we close the signal channel because the state changed: they want to wake up and exit. 
Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5 * undo my misguided rename Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881 * remove pollInterval * format paragraph more nicely * move skew calculation into retain pieces	2019-08-28 16:35:25 -04:00
Bill Thorp	a250551b6d	storagenode/piecestore + uplink/piecestore: return `PieceHash` and original `OrderLimit` during GET_REPAIR (#2775 )	2019-08-26 14:57:41 -04:00
Egon Elbre	977472ed32	all: use fewer storage nodes to improve test memory usage (#2875 ) * storagenode/inspector: use less storage nodes * lib/uplinkc: use fewer storage nodes	2019-08-26 14:40:44 -04:00
Cameron	1f3537d4a9	storagenode/vouchers: remove storagenode vouchers (#2873 )	2019-08-26 19:35:19 +03:00
Maximillian von Briesen	65e2d2e711	storagenode/piecestore: ignore canceled errors on download (#2822 ) * ignore canceled errors on piecestore endpoint download	2019-08-23 11:16:43 -04:00
Cameron	3d9441999a	storagenode/orders: add archive cleanup to orders service (#2821 ) This PR introduces functionality for routine deletion of archived orders. The user may specify an interval at which to run archive cleanup and a TTL for archived items. During each cleanup, all items that have reached the TTL are deleted This archive cleanup job is combined with the order sender into a new combined orders service	2019-08-22 10:33:14 -04:00
Egon Elbre	00b2e1a7d7	all: enable staticcheck (#2849 ) * by having megacheck in disable it also disabled staticcheck * fix closing body * keep interfacer disabled * hide bodies * don't use deprecated func * fix dead code * fix potential overrun * keep stylecheck disabled * don't pass nil as context * fix infinite recursion * remove extraneous return * fix data race * use correct func * ignore unused var * remove unused consts	2019-08-22 13:40:15 +02:00
Jeff Wendling	14e36e4d60	storagenode/nodestats: fix issue on 32 bit platforms (#2841 ) * storagenode/nodestats: fix issue on 32 bit platforms time.Duration is an int64, so casting it down to an int can cause it to become negative, causing a panic. Change-Id: I33da7c29ddd59be60d8deec944a25f4a025902c7 * storagenode/nodestats: fix lint issue in test Change-Id: Ie68598d724d2cae0dc959d4877098a08f4eb9af7	2019-08-21 18:57:44 -06:00
Egon Elbre	2d69d47655	all: fix Error.New formatting (#2840 )	2019-08-21 19:30:29 +03:00
Simon Guindon	476fbf919a	storagenode/storagenodedb: refactor SQLite3 database connection initialization. (#2732 ) * Rebasing changes against master. * Added back withTx(). * Fix using new error type. * Moving back database initialization back into the struct. * Fix failing migration tests. * Fix linting errors. * Renamed database object names to be consistent. * Fixing linting error in imports. * Rebasing changes against master. * Added back withTx(). * Fix using new error type. * Moving back database initialization back into the struct. * Fix failing migration tests. * Fix linting errors. * Renamed database object names to be consistent. * Fixing linting error in imports. * Adding missing change from merge. * Fix error name.	2019-08-21 10:32:25 -04:00


431 Commits