storj

Author	SHA1	Message	Date
Egon Elbre	4044b8eeea	storagenode/pieces: ensure chore is stopped before test ends Change-Id: Ibc26e156d13011bf0f91b4206980200a24d348fe	2020-02-21 10:14:44 +02:00
Cameron Ayer	3e70a893dd	storagenode/{piecestore, contact}: report capacity to satellites if below specific threshold Currently, storage nodes only report their capacity to satellites once per hour. If a node fills up, it will fail all uploads until the next contact cycle begins. With these changes, at the end of an upload we check whether the MinimumDiskSpace threshold has been passed. If so, trigger the monitor chore to update the node's capacity, then trigger the contact chore to report the new capacity to the satellites Change-Id: Ie6aadaade1e2c12c87e03f8ff9059a50121380a0	2020-02-18 15:42:48 -05:00
Egon Elbre	8f20085683	storagenode/piecestore: clearer client cancellation error message Change-Id: Ia0595f71eb3eb1c0f091e615652e2de376d5609d	2020-02-14 09:36:03 +00:00
Jeff Wendling	05a240050e	storagenode: monitor available space and bandwidth Change-Id: I5763597327c5b32982faab8910c136c6c8dc18c5	2020-02-13 07:07:29 +00:00
Michal Niewrzal	426c8eb31a	private/testplanet: add DeleteBucket method for uplink New method added to be able to delete easily bucket during tests. Change-Id: Iaae89618cc676ddbbbd4b0df2eeacd143ea6f3c2	2020-02-11 15:58:13 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Isaac Hess	17580fdf57	storagenode/pieces: Add test to cache store This test checks that we are actually walking over the pieces when starting the cache, and that it is returning expected values. A recent outage was partially caused by the fact that this cache was accidentally reading itself (via the pieces store, which has the cache embedded). This test ensures that does not happen, and checks that when the cache's `Run` method is called, the space used values are read from disk and accurately update the cache. Change-Id: I9ec61c4299ed06c90f79b17de3ffdbbb06bc502e	2020-02-05 11:39:06 -07:00
igor gaidaienko	efa0f6d443	storagenode/monitor: set MinimumDiskSpace default to 500GB. As a workaround it was set to 0 in previous release. Now according to the TOC must be set to 500GB. Change-Id: Ia2743d49e86683396958aff51b95df743af4f872	2020-02-04 15:55:42 +00:00
Egon Elbre	9e5679fdaa	storagenode/console/consoleserver: set content-type manually http.FileServer relies on mime types defined in the operating system. These values may be misconfigured, so a javascript file might end up being served as "plain/text". Change-Id: I3c13c8a9ac484bd765a4de0f8253bfe40dde7513	2020-02-03 15:37:47 +02:00
Jeff Wendling	d20db90cff	private/dbutil/txutil: create new transactions for retries it was noticed that if you had a long lived transaction A that was blocking some other transaction B and A was being aborted due to retriable errors, then transaction B was never given priority. this was due to using savepoints to do lightweight retries. this behavior was problematic because we had some queries blocked for over 16 hours, so this commit addresses the issue with two prongs: 1. bound the amount of time we will retry a transaction 2. create new transactions when a retry is needed the first ensures that we never wait for 16 hours, and the value chosen is 10 minutes. that should be long enough for an ample amount of retries for small queries, and huge queries probably shouldn't be retried, even if possible: it's more preferable to find a way to make them smaller. the second ensures that even in the case of retries, queries that are blocked on the aborted transaction gain priority to run. between those two changes, the maximum stall time due to retries should be bounded to around 10 minutes. Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4	2020-02-01 18:34:28 +00:00
Jeff Wendling	71ff044edb	storagenode/bandwidth: fix tests to not fail for 10 hours near the end of the month Change-Id: I390569a8702164c42edddd3be020e93782227c2e	2020-01-31 16:25:52 -07:00
Jeff Wendling	03166d6be3	storagenode/piecestore: log available bandwidth and space on uploads Change-Id: Ia92228cb2a178da45f4f123b48c476e5ec821fe8	2020-01-31 19:47:14 +00:00
Isaac Hess	78d0868bc9	storagenode/pieces: Log error if cannot calculate piece size Change-Id: I33b49315a0f6044a801a8b118e6b61dbcd751bfe	2020-01-31 09:57:44 -05:00
Egon Elbre	d0b4272467	storagenode: fix global logger in tests https://github.com/storj/storj/wiki/Testing#logging Change-Id: Ic6a31360bcfedae3f37f6b2536a345f00e33cd78	2020-01-31 14:09:28 +00:00
Isaac Hess	2968857e21	storagenode/pieces: Prevent recalculate from having negative numbers Change-Id: Iafd2bcb9963e85508cb5e2bd69f229d89c589a6c	2020-01-30 17:47:54 -05:00
paul cannon	157b8c4d71	storagenode/pieces: accumulate errors in traversal instead of aborting on the first error, so that we can hit all satellites and get the best numbers we can Change-Id: I21d5163884940612d7d39eaf73a6fac07235cd9e	2020-01-30 19:31:29 +00:00
Isaac Hess	5a053483b7	storagenode/pieces: read trash from blobstore Change-Id: Ib134e63a13b8a5dda5d6a9ead42013ce18411227	2020-01-30 13:30:48 -05:00
Isaac Hess	4dafd03f11	storagenode: Prevent negative values in piece_space_used, migrate negatives to 0 Change-Id: Ibd663db087058c928190aa52c520f22e9338dd04	2020-01-30 13:03:18 -05:00
Isaac Hess	00fc192f6b	storagenode/pieces: Explicitly walk satellite pieces in SpaceUsedTotalAndBySatellite Change-Id: I7ff9a1120d4ced0b5cba7d7765ef8aed7a1edae0	2020-01-30 12:01:50 -06:00
Jeff Wendling	21b65ca3b0	storagenode/storagenodedb: migrate to set total to content_size Change-Id: I4906c2fe9cdb3a32c045c98039d4bde6b8b809e3	2020-01-30 08:53:12 -07:00
Egon Elbre	4e2bf81719	pkg/debug: add better title Change-Id: Icc6114f4e7523cfe6c7984ef1f6eec664ae4ee65	2020-01-30 07:49:40 -05:00
littleskunk	81eddaa2c1	storagenode/monitor: reduce space requirement to 0 We have added a bug with v0.31.7 and deploying it would kick out all the storage nodes that are full. Easy fix is setting the requirement to 0. That will allow them to still start up even if they are full. Change-Id: Ie66f369952d929fcfd47f44f6e5e57eea8f51ff6	2020-01-30 01:44:45 +01:00
Egon Elbre	d10d6fd153	storagenode,satellite: ignore error on listening debug port Change-Id: Id3a6d153535776ce41f8edf2bd6f6dad5e2a60bf	2020-01-29 18:06:02 -05:00
Egon Elbre	10be538602	storagenode: add pkg/debug support Change-Id: If941095b886c28a0d53fff4c9bf9fa0ce7471dea	2020-01-29 16:30:31 -05:00
Egon Elbre	f237d70098	storagenode,satellite: use pkg/debug Use debug.Server in storage node and satellite for customizing debug server. Change-Id: I7979412376d028cadf29656d838ab94f18e2aa99	2020-01-29 16:30:31 -05:00
Egon Elbre	e319660f7a	private/lifecycle: implement Group lifecycle.Group implements controlling multiple items such that their startup and close works. Change-Id: Idb4f4a6c3a1f07cdcf44d3147a6c959686df0007	2020-01-29 00:37:33 +00:00
Yingrong Zhao	d8e3556a22	storagenode/preflight: wait for server to shutdown when tests are finished Change-Id: Ie3ede9f285cb61bb6bc6b0158e41d8ea10b2497e	2020-01-28 17:54:19 +00:00
Stefan Benten	3abb8c8ed7	Don't require an IP address being set Per default our server address is listening on all IP addresses on the machine. This caused our preflight check to fail, as it did not have a hostname to lookup. With this change, we are fine with this and go ahead. Change-Id: I9eb5c891c099eb35f679d6d7e79ec38bb43b619f	2020-01-28 15:25:17 +01:00
nerdatwork	9ea32016c2	storagenode/orders: fix typos in log messages (#3760 )	2020-01-26 13:45:57 -05:00
littleskunk	5c68f4fc7c	storagenode/gracefulexit: higher concurrency and shorter timeouts 1 transfer with a minimum speed of 128 Bytes was a nice try but it is way too low. Even a pi3 was able to handle 7 grpc transfers. We have 4 satellites and with 5 concurrent transfers that should be a total of 20 concurrent transfers. Each transfer will have a minimum speed of 5KB/s. That should give us a better throughput and still be Ok on a pi3. Change-Id: I650a7baf890080901ef70ea3b5636d93009b4e60	2020-01-24 23:51:39 +00:00
littleskunk	226bc4de36	storagenode/preflightcheck: enable database check by default With the v0.30.5 release we asked the storage node operators to manually enable the preflight check while they are in front of their machine. We didn't want to risk taking too many storage nodes offline at the same time because of some unknown bug. The preflight check worked. We have no negative feedback. We can now enable it by default. Change-Id: Ic670ee52becd0b35eca84af7a0841ea983d7b19d	2020-01-24 23:23:35 +00:00
Moby von Briesen	e4cff1c938	storagenode/preflight: update allowed time difference for preflight clock sync Change 24h and 1h to 30m and 10m respectively for clock sync. If a storagenode's clock is off by more than 30m for every trusted satellite, it will not start. If it is off by more than 10m for any trusted satellite, a warning is displayed. Change-Id: I05ef611a30a49c1783e3b68b513745922c2f7e28	2020-01-24 22:57:13 +00:00
Jeff Wendling	16bb374deb	storagenode/piecestore: add large timeouts to read/write operations this is to help protect against intentional or unintentional slowloris style problems where a client keeps a tcp connection alive but never sends any data. because grpc is great, we have to spawn a separate goroutine for every read/write to the stream so that we can return from the server handler to cancel it if necessary. yep. really. additionally, we update the rpcstatus package to do some stack trace capture and add a Wrap method for the times where we want to just use the existing error. also fixes a number of TODOs where we attach status codes to the returned errors in the endpoints. Change-Id: Id8bb8ff84aa34e0f711b0cf9bce3908b36a1d3c1	2020-01-23 19:20:49 +00:00
Isaac Hess	44de90ecc8	storagenode/pieces: Rename vars and update comments A few variables were not renamed to the new standard piecesTotal and piecesContentSize, so it was unclear which value was being used. These have been updated, and some comments made more thorough. Change-Id: I363bad4dec2a8e5c54d22c3c4cd85fc3d2b3096c	2020-01-23 11:00:24 -07:00
Isaac Hess	14fd6a9ef0	storagenode/pieces: Track total piece size This change updates the storagenode piecestore apis to expose access to the full piece size stored on disk. Previously we only had access to (and only kept a cache of) the content size used for all pieces. This was inaccurate when reporting the amount of disk space used by nodes. We now have access to the total content size, as well as the total disk usage, of all pieces. The pieces cache also keeps a cache of the total piece size along with the content size. Change-Id: I4fffe7e1257e04c46021a2e37c5adc6fe69bee55	2020-01-23 11:00:24 -07:00
stefanbenten	62d3783928	storagenode/peer: ensure contact.external-address and server.address is valid Change-Id: I634f0d355b0be18ba419726ace746921adda3ac0	2020-01-23 15:51:46 +00:00
Egon Elbre	5a4745eddb	all: remove usages of testplanet.New Ensure that tests use testplanet.Run, so we always require running against all database backends. Change-Id: I6b0209e6a4912cf3328bd35b2c31bb8598930acb	2020-01-22 22:42:57 +02:00
Michal Niewrzal	6502454947	satellite/metainfo: move RS configuration to satellite With this change RS configuration will be set on satellite. Uplink will get RS values with BeginObject request and will use it. For backward compatibility and to avoid super large change redundancy scheme stored with bucket is not touched. This can be done in future. Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff	2020-01-22 09:33:53 +00:00
Egon Elbre	c1c878efcf	all: fix import groupings check-imports was broken and didn't complain about things. Change-Id: I38adafd16b4aba86f0eb4f53427b4393f9a6c710	2020-01-20 17:47:44 +00:00
Egon Elbre	21f53e38da	storagenode/storagenodedb/storagenodedbtest: pass ctx as an argument Change-Id: I10b0a8ef3a7d5001e7d361f1873ad5987af1f9c2	2020-01-20 16:56:12 +02:00
Egon Elbre	f3b4bf2b7c	satellite/satellitedb/satellitedbtest: pass ctx as an argument ctx is created in most tests, instead pass in as argument to reduce code duplication. Change-Id: I466c51c008392001129c8b007c9d6b3619935ac4	2020-01-20 16:35:42 +02:00
Egon Elbre	1279eeae39	private/tagsql,storage: fixes to context cancellation Replace all the remaining uses of sql.DB with tagsql.DB to fix issues with context cancellation. Introduce tagsql.Open which helps to get rid of all tagsql.Wrap-s. Use tagsql in cockroachkv and postgreskv. Change-Id: I8946d203341cb85a25976896fc7881e1f704e779	2020-01-20 15:44:39 +02:00
Egon Elbre	d5438036b5	{satellite,storagenode}/gracefulexit: reduce logging Change-Id: I9f274ede77a582fc43ef14a47bf9341d4e3083df	2020-01-19 22:36:13 +02:00
Egon Elbre	3cd584c007	storagenode/gracefulexit: move database test Database tests belong to the interface, not the implementation. Change-Id: I5d76fdc7df0b0f32391ebad1b595ef26b062a9cb	2020-01-19 18:12:01 +00:00
Egon Elbre	7bc76624cf	storagenode/storagenodedb: fix closing in-use database Migration step was closing a database that was used by the migration itself. There is an active transaction over the database. Instead of closing in the same transaction we can wait until restart for the database cleanup. Change-Id: Ic971d8cea81a3ab783f4a1bdc6357009c8b31386	2020-01-19 16:18:46 +02:00
Egon Elbre	25b76fe63f	storagenode/storagenodedb: use tagsql Change-Id: Iba3b34a97b982deb4f72ce55517a294f249b6b55	2020-01-19 14:39:16 +02:00
Egon Elbre	59d06644b9	private/migrate: switch to tagsql Also added temporary types withRebind and withTagTx, which will be later removed. Currently they help to avoid changing the whole codebase at the same time. Change-Id: I7f07ba8f4709a23a463bfa67464628665a05808f	2020-01-19 14:39:16 +02:00
Moby von Briesen	273eb66fae	cmd/storagenode,storagenode/preflight: add config flag to disable storagenode database preflight check. Disable preflight database check by default, and have the option to enable it. This will allow us to enable it once it is definitely working. Also change the name of the config flag for preflight time sync. Change-Id: Ie2e20f9e25dcb38794eafa7e1505e7c6ff287c99	2020-01-17 17:53:17 +00:00
Isaac Hess	614e04d055	storagenode/pieces: Cache inits trash info from db On pieces usage cache init we now load the trash info from the db. Also fixes a test that was masking the failure here. Change-Id: I9ff7da5bc6c0f74cf0942e20931b40e0c88d70fa	2020-01-17 09:33:05 -07:00
Bill Thorp	6f2f97b313	storagenode\gracefulexit: broke worker deleteOnePieceOrAll into deleteOnePiece and deleteAllPieces and deletePiece Change-Id: Ic3bd21e89fa71e962c2bb1c4943f4696bc4f83e5	2020-01-17 15:07:34 +00:00
Moby von Briesen	e115bc1903	cmd/storagenode;storagenode/storagenodedb: add preflight database check for storagenode Ensure that database schema matches latest test migration schema before allowing the node to start up. Ensure minimal read/write functionality for each storagenode database before allowing the node to start up. This will eliminate many unhandled audit errors we are seeing. Change-Id: Ic0e628b04a9c35b7a8243f6a81d4683918170ba9	2020-01-16 18:44:46 +00:00
Egon Elbre	81d53b8097	storagenode/storagenodedb: fixes to row handling Change-Id: I3813310b48337428f13678a9fcba5c8a0e0b2b2a	2020-01-16 15:08:37 +00:00
Yingrong Zhao	db8aee0806	satellite/contact; storagenode/preflight: add clock check on startup for storagenode add config preflight.enabled-local-time Change-Id: I7b942c9bee063aae409ee6721ae9d079dff0144f	2020-01-15 15:35:26 +00:00
Yingrong Zhao	07c2824d94	storagenode/gracefulexit: fix exit-status command output When exit succeeded, cli should display `Y` in Successful column and `100%` in PercentComplete. Change-Id: I6093eca207ecd618bb332af12e5e455bc8224dde	2020-01-15 14:58:15 +00:00
Egon Elbre	08f63614be	private/context2: add WithoutCancellation Change-Id: I38557c16f41b8983886f256353cc6afb7634d9e6	2020-01-15 14:23:46 +02:00
Egon Elbre	64fb2d3d2f	Revert "dbutil: statically require all databases accesses to use contexts" This reverts commit `8e242cd012`. Revert because lib/pq has known issues with context cancellation. These issues need to be resolved before these changes can be merged. Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555	2020-01-15 07:28:00 +00:00
JT Olio	8e242cd012	dbutil: statically require all databases accesses to use contexts this will allow for some nice runtime analysis down the road. also, this allows for wrapping database handles in a way that can interact with these contexts requires https://review.dev.storj.io/c/storj/dbx/+/514 Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b	2020-01-14 18:20:47 -05:00
Egon Elbre	5af1f9e6d1	storagenode/{piecestore,storagenodedb}: use context in queries In endpoint.saveOrder, ensure we always try to save orders such that they can be settled. Change-Id: Ic9ac8f4bf684d8493282912ca97f386c1762e364	2020-01-14 20:27:26 +00:00
Egon Elbre	d80cfeb4ab	storagenode: ensure we don't eat the underlying error When error is formatted using %v it's not possible to check whether the error was caused by a context cancellation. Change-Id: Ia77dfb0817e49d9a7b168c12a6300d131007d0ee	2020-01-14 20:26:23 +00:00
Egon Elbre	23e2664327	storagenode/inspector: return rpcstatus Change-Id: I7e13b6dc8c9c3f4550f77885b1ef99662f5a5727	2020-01-14 20:24:46 +00:00
Egon Elbre	ff267168c5	private/migrate: add ctx argument Change-Id: I3d65912d89261386413c494c7ed1576fed4dcaf4	2020-01-13 15:52:26 +02:00
Egon Elbre	c7b846589e	private/dbutil/sqliteutil: add ctx argument Change-Id: If1caa9cde746817e62cae32a152eeec81959129c	2020-01-13 15:03:30 +02:00
Qweder93	cf19e141e0	storagenode/notifications: return unread count and fix json id, list-notifications method fix Change-Id: Ic56beac1f388d91a29c9e8266161715d09364520	2020-01-09 17:56:00 +00:00
Yingrong Zhao	ebeee58001	storagenode/gracefulexit: remove satellite entry when node fail precondition Change-Id: I3c215170f10f0053e4f8718ee31d64d93f52ec80	2020-01-08 18:11:58 +00:00
Egon Elbre	082ec81714	uplink: move to storj.io/uplink (#3746 )	2020-01-08 15:40:19 +02:00
paul cannon	0c88a7b475	private/migrate: use transactional helpers and not Begin() This code needs to work against cockroachDB, so transactions must be retried when a retryable error is returned. This change puts migrate transactions into the dbutil.WithTx transactional helpers to achieve this in the easiest way. Change-Id: Ib930e82d55cb0257357a222ce9131e6e53372c03	2020-01-07 18:25:38 +00:00
Egon Elbre	f41d440944	all: reduce number of log messages Remove starting up messages from peers. We expect all of them to start, if they don't, then they should return an error why they don't start. The only informative message is when a service is disabled. When doing initial database setup then each migration step isn't informative, hence print only a single line with the final version. Also use shorter log scopes. Change-Id: Ic8b61411df2eeae2a36d600a0c2fbc97a84a5b93	2020-01-06 19:03:46 +00:00
Egon Elbre	2680bae88c	private/testplanet: remove dependency to uplink Remove direct dependency on uplink.RSConfig, this simplifies moving the config file without introducing weird dependencies. Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec	2020-01-02 09:40:46 +00:00
Stefan Benten	758fe35aba	storagenode/orders: adding jitter to sending (#3725 )	2019-12-30 21:35:26 +01:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Fadila	115b8b0fc8	storagenode/piecestore: delete several pieces in a single request This is part of the deletion performance improvement. See https://storjlabs.atlassian.net/browse/V3-3349 Change-Id: Idcd83a302f2bd5cc3299e1a4195a7e177f452599	2019-12-27 10:58:04 +00:00
Isaac Hess	7d1e28ea30	storagenode: Include trash space when calculating space used This commit adds functionality to include the space used in the trash directory when calculating available space on the node. It also includes this trash value in the space used cache, with methods to keep the cache up-to-date as files are trashed, restored, and emptied. As part of the commit, the RestoreTrash and EmptyTrash methods have slightly changed signatures. RestoreTrash now also returns the keys that were restored, while EmptyTrash also returns the total disk space recovered. Each of these changes makes it possible to keep the cache up-to-date and know how much space is being used/recovered. Also changed is the signature of PieceStoreAccess.ContentSize method. Previously this method returns only the content size of the blob, removing the size of any header data. This method has been renamed `Size` and returns both the full disk size and content size of the blob. This allows us to only stat the file once, and in some instances (i.e. cache) knowing the full file size is useful. Note: This commit simply adds the trash size data to the piece size data we were already collecting. The piece size data is not accurate for all use-cases (e.g. because it does not contain piece header data); however, this commit does not fix that problem. Now that the ContentSize (Size) method returns the full size of the file, it should be easier to fix this problem in a future commit. Change-Id: I4a6cae09e262c8452a618116d1dc66b687f59f85	2019-12-23 19:07:03 -07:00
Egon Elbre	d55288cf68	pkg/rpc: replace methods with direct calls to pb Change-Id: I8bd015d8d316a2c12c1daceca1d9fd257f6f57bc	2019-12-22 17:12:43 +02:00
Egon Elbre	006baa9ca6	pkg/rpc: remove drpc aliases We need to split up pb package, which means we cannot have a core package that depends on them. Change-Id: I7f4f6fd82f89a51a9b2ad08bf2b1207253b8a215	2019-12-22 16:58:08 +02:00
Yingrong Zhao	6e71591b9b	satellitedb;storagenodedb: remove unnecessary use of DB transactions in graceful exit Change-Id: Ief0a28c6750c130896b48bfebfbea7fb3caa810f	2019-12-20 21:24:38 +00:00
Qweder93	e47ec84dee	storagenode notification service and api added Change-Id: I36898d7c43e1768e0cae0da8d83bb20b16f0cdde	2019-12-20 18:42:23 +00:00
Egon Elbre	afe05edff2	{storagenode,satellite}/gracefulexit: ensure workers finish their work Fixes a data race caused by not waiting for workers to finish before shutting down. Currently this ended up failing logging because it was closed when test tried to write to it. Change-Id: I074045cd83bbf49e658f51353aa7901e9a5d074b	2019-12-17 17:21:52 +02:00
Egon Elbre	7a36507a0a	private/testcontext: ensure we call cleanup everywhere Change-Id: Icb921144b651611d78f3736629430d05c3b8a7d3	2019-12-17 14:16:09 +00:00
littleskunk	08947e177d	storagenode/garbagecollection: enable in production Change-Id: I627b7a37ca4a85eb19936ca2c7ca907d7cc63f5b	2019-12-16 22:44:04 +00:00
Vitalii Shpital	53d9bc4530	storagenode/notifications: db created (#3707 )	2019-12-16 19:59:01 +02:00
littleskunk	c2ea75208f	storagenode/orderdb: fix db lock Change-Id: Id1add0ba7ae1b20bd98099bd4d3aff0fcfdd90c9	2019-12-15 23:41:22 +01:00
Andrew Harding	cb89496569	storagenode/trust: wire up list into pool - also updated ping chore to pick up trust changes - fixed small typo in blueprint - fixed flags for storj-sim - wired up changes to testplanet Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d	2019-12-13 20:32:50 +00:00
Andrew Harding	2867b6a466	storagenode/trust: list implementation Change-Id: Ia886e84990efaf2c783f199741552a7a8ff41d4e	2019-12-12 17:15:47 +00:00
Jeff Wendling	fb8e78132d	storagenodedb: reenable utccheck in tests Change-Id: If7d64dd4ae58e4b656ff9122ae3195b2a5173cb3	2019-12-10 23:17:14 +00:00
Andrew Harding	5ed9373dba	storagenode/trust: source entry cache Implements a cache that can persist trust entries returned by sources Change-Id: I72579e42e9f72d34a54b7510c9b665844f187314	2019-12-10 21:45:01 +00:00
Andrew Harding	715d97e3d8	storagenode/trust: rule and excluders Change-Id: I84ed542e1ef3cfaa5cc3d3f631cdc295393bf978	2019-12-10 21:08:12 +00:00
Cameron Ayer	6fae361c31	replace planet.Start in tests with planet.Run planet.Start starts a testplanet system, whereas planet.Run starts a testplanet and runs a test against it with each DB backend (cockroach compat). Change-Id: I39c9da26d9619ee69a2b718d24ab00271f9e9bc2	2019-12-10 16:55:54 +00:00
Andrew Harding	eb52ac623b	storagenode/trust: source implementations Change-Id: Ie36e79cc15257db88051f63e5b9463fd9d7b4736	2019-12-09 20:00:02 +00:00
Andrew Harding	7d0aadfeca	storagenode/trust: satellite URL implementation Satellite URL is a stricter form of the STORJ Node URL. It requires both the ID and port specifier. Change-Id: I7fd302064f864c1de8240a7915bf5263b898dfd1	2019-12-09 17:05:57 +00:00
littleskunk	9d1faeee58	storagenode/garbagecollection: increase MaxTimeSkew to be higher than satellite MaxCommitInterval Change-Id: I86f8d0b44bea3aa005ff26d52588611c59df5e9a	2019-12-09 16:03:55 +00:00
Ethan Adams	9420fa9fc5	satellite/gracefulexit: Add graceful exit completed/failed receipt verification to satellite CLI (#3679 )	2019-12-03 17:09:39 -05:00
Ivan Fraixedes	42c61138e8	storage: Improve doc comments delete methods (#3591 ) Improve the documentation of several methods involved in the delete operation to make clear their behavior without having to inspect their logic.	2019-12-02 12:18:20 +01:00
Ivan Fraixedes	bf97ef06fc	storagenode: Add new endpoint to receive satellite requests for… (#3590 ) * pkg/pg: Add new service function storage node Add a new service function to the storage node piece store for deleting pieces when satellites request them. * storagenode/piecestore: Add endpoint to delete piece Add a new endpoint to receive from trusted satellites to delete a piece. * private/testplanet: Fix storagenode mock Add to the storagenode mock the new endpoint method. * proto.lock: Update it with the last protobuf changes * storagenode/piecestore: Reuse test piece upload Extract the repeated logic from several tests functions for uploading a test piece to a test helper function. * uplink/piecestore: Implement client side method Implement the client side method of the new piecestore RPC function. * storagenode/piecestore: Add test DeletePiece endpoint Implement a test for the DeletePiece new endpoint method.	2019-11-26 18:47:19 +01:00
Yingrong Zhao	66f1a1680f	add completion receipt to exit-status cli command on storage node (#3650 )	2019-11-26 12:32:26 -05:00
Isaac Hess	56f8fd2dd7	storagenode/pieces: Add EmptyTrash functionality (#3640 ) * storagenode/pieces: Add EmptyTrash functionality * storagenode/pieces: Fix err * storagenode/pieces: Fix lint	2019-11-26 09:25:21 -07:00
Vitalii Shpital	038ac58600	web/storagenode: minimal allowed version view implemented (#3583 )	2019-11-26 18:08:24 +02:00
littleskunk	8842b0c252	storagenode/gracefulexit: improve logging (#3633 )	2019-11-21 21:10:02 -05:00
Rafael Antonio Ribeiro Gomes	2739771761	storagenode: add bandwidth metrics (#3623 ) * storagenode: add bandwidth metrics * remove unnecessary metric	2019-11-21 16:51:40 -03:00
Isaac Hess	6aeddf2f53	storagenode/pieces: Add Trash and RestoreTrash to piecestore (#3575 ) * storagenode/pieces: Add Trash and RestoreTrash to piecestore * Add index for expiration trash	2019-11-20 09:28:49 -07:00
Kaloyan Raev	6d728d6ea0	storagenode/collect: delete piece 24 hours after expiration (#3613 )	2019-11-20 17:02:57 +02:00
Vitalii Shpital	61c8bcc9a6	web/storagenode: egress chart implemented (#3574 )	2019-11-20 16:37:57 +02:00
Rafael Antonio Ribeiro Gomes	da39c71d35	storagenode: add new metric satellite.request (#3610 ) * storagenode: add new metric satellite.request * storagenode: metrics fixed * switch from Counter to Meter	2019-11-19 18:11:31 -03:00
Ivan Fraixedes	8e1e4cc342	piecestore: Fix invalid comment and typos (#3604 )	2019-11-19 16:30:48 +01:00
Nikolai Siedov	24318d74b3	storagenode/console: show satellite url in satellite selection (#3602 )	2019-11-19 14:16:56 +02:00
Nikolai Siedov	0d35505fe1	SNOboard/console: router changed for gorillaMux, caching added (#3577 )	2019-11-15 14:36:43 +02:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573 )	2019-11-14 21:46:15 +02:00
Egon Elbre	1a54007f1c	storagenode/storagenodedb: dont log opening of each database (#3571 )	2019-11-14 17:08:16 +02:00
Egon Elbre	1e64006e32	lint: add staticcheck as a separate step (#3569 )	2019-11-14 10:31:30 +02:00
paul cannon	bd89f51c66	Keep v0pieceinfo database isolated (#3364 ) * put TestCreateV0 back in StoreForTest * avoid direct handles to V0 pieceinfo db * type mismatch fix * use storage.Blobs interface in store_test.go ..instead of filestore.Store. this will allow filestore.Store to become unexported. * unexport filestore.Store rename it to blobStore. things should use the storage.Blobs interface instead. changes in this commit are purely mechanical (made through the "refactor" tool in Gocode followed by search/replace on the word "Store" within the storage/filestore/ directory). * kill filestore.StoreForTest now that filestore.blobStore is unexported, there isn't a need for a specialized wrapper type. this (not coincidentally) also makes it possible for the WriterForFormatVersion() method on storagenode/pieces.StoreForTest to work, without requiring everything to wrap the store.blobs attribute in a filestore.StoreForTest, which was impractical.	2019-11-13 13:15:31 -06:00
Yingrong Zhao	db8294cfba	storagenode/gracefulexit: get hash and limit using original piece ID (#3557 )	2019-11-13 12:45:55 -05:00
Jeff Wendling	ebcd37c572	storagenode/contact: fix connection leak with contact checkin Change-Id: If86002557144d5d8dbff939d2b6a2dfec6537577	2019-11-06 18:00:09 +00:00
littleskunk	7eb6724c92	logging: unify logging around satellite ID, node ID and piece ID (#3491 ) * logging: unify logging around satellite ID, node ID and piece ID * unify segment index	2019-11-05 22:04:07 +01:00
Maximillian von Briesen	257d3946d5	storagenode/gracefulexit: allow storagenodes to concurrently transfer pieces for graceful exit (#3478 )	2019-11-05 10:33:44 -05:00
Jennifer Li Johnson	11f0ea3258	5s (#3477 )	2019-11-04 16:20:31 -05:00
Jennifer Li Johnson	aa7d15a365	storagenode/contact: exponential backoff retries for pinging Satellites (#3372 )	2019-11-04 16:03:21 -05:00
Jess G	5abb91afcf	satellite: change the Peer name to Core (#3472 ) * change satellite.Peer name to Core * change to Core in testplanet * missed a few places * keep shared stuff in peer.go to stay consistent with storj/docs	2019-11-04 11:01:02 -08:00
Isaac Hess	4d26d0a6a6	storagenode/pieces: Add migration from v0 piece to v1 piece (#3401 )	2019-11-04 17:59:45 +01:00
Egon Elbre	87687938d1	storagenode/contact: fix panic in ping satellites (#3447 )	2019-11-01 16:20:53 +01:00
Ethan Adams	43103ae13f	lower storage node counts in tests (#3427 )	2019-10-31 10:57:54 -04:00
Jess G	4d85b11574	satellite/contact: improve errors in contact endpoints (#3356 ) * improve errors in satellite contact endpoints * add changes per CR comments * update pingback method so it still updates node table * fix err and returns * fix zap logging to be better	2019-10-30 11:57:21 -07:00
Natalie Villasana	4878135068	satellite/gracefulexit, storagenode/gracefulexit: add timeouts (#3407 )	2019-10-30 13:40:57 -04:00
Natalie Villasana	5453886231	satellite/repair, uplink/ecclient: remove unused expiration arg from ec.Repair and ec.putPiece (#3416 )	2019-10-30 11:35:00 -04:00
Yingrong Zhao	3ee0b89f8f	storagenode/gracefulexit: delete pieces when receive Delete or Completed message from satellite (#3406 )	2019-10-30 10:46:56 -04:00
Egon Elbre	65a8e0bcbc	{satellite,storagenode}/gracefulexit: clearer log messages (#3413 )	2019-10-30 10:21:27 +02:00
Isaac Hess	1defd4dbfe	storagenode/piecestore: Respect config.MaxConcurrentRequests for drpc (#3402 )	2019-10-28 13:12:49 -06:00
Ethan Adams	5b0398a718	storagenode/gracefulexit: Exclude finished exits from chore/worker processing. Fix update status bug (#3399 )	2019-10-28 13:59:45 -04:00
Egon Elbre	93353df4d6	internal/sync2: make Fence accept context (#3393 )	2019-10-28 16:04:31 +02:00
paul cannon	1469f7f41f	storagenode/contact: wait for UpdateSelf before start (#3332 ) When the contact chore starts running before the monitor service has provided any useful capacity data, the first outgoing contact has not-very-helpful data for the satellite. This change causes the contact chore to wait until capacity data is available. The wait should be quite short in all reasonable cases: even when a node starts with a lot of stored pieces and no cached spaceUsedDB data, new data will have been calculated and cached by the call to `peer.Storage2.CacheService.Init(ctx)` in `storagenode.cmdRun()` before `peer.Run(ctx)`. Change-Id: Ibc26d5c1fc10a23006c00bc3f13ff6cf71f8bf1d	2019-10-26 12:16:25 -05:00
Jeff Wendling	ed48e74e20	gracefulexit: fix build for drpc (#3387 ) Change-Id: I335e9f8991a10c9e8a0737bc7c9ea3f04cbe2546	2019-10-26 15:53:35 +02:00
Maximillian von Briesen	6df4d7bc73	storagenode/gracefulexit + satellite/gracefulexit: add storagenode-side transfer validation (#3371 ) * Make the exiting node check piece hashes, piece IDs, and piece hash signatures before relaying successful transfer data to the satellite. * Enable immediate graceful exit failure for "successful" transfers that fail satellite-side validation. * Move transfer piece logic in storagenode worker to separate function (to make the worker easier to understand)	2019-10-25 13:16:20 -04:00
Yingrong Zhao	fa1ac24e19	satellite/gracefulexit: add failure threshold check (#3329 ) * add overall failure percentage check and inactive time frame check before sending a response to sno * update comment * delete node from transfer queue if it has been inactive for too long * fix linting error * add test config value * fix nil pointer * add config value into testplanet * add unit test for overall failure threshold * move timeframe threshold to chore * update protolock * add chore test * add per piece failure count logic * change config name from EndpointMaxFailures to MaxFailuresPerPiece * address comments * fix linting error * add error handling for no row returned from progress table * fix test for graceful exit chore on storagenode * fix typo InActive -> Inactive * improve readability for failure threshold calculation * update config lock * change error handling for GetProgress in graceful exit endpoint on the satellite side * return proper rpc error in endpoint * add check in chore test for checking finish timestamp and queue	2019-10-24 12:24:42 -04:00
Isaac Hess	75412e54e5	storagenode/piecestore: Rename liveGRPCRequests back to liveRequests (#3354 )	2019-10-23 13:43:43 -06:00
Isaac Hess	14c7648530	storagenode/piecestore: Only limit grpc requests (#3342 )	2019-10-23 10:14:02 -06:00
JT Olio	2c6fa3c5f8	pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335 ) libuplink was incorrectly setting timeouts to 10 seconds still, but should have been at least 10 minutes. the order sender was setting them to 1 hour. we don't want timeouts in uplink-side logic as it establishes a minimum rate on tcp streams. instead of all of this, just use tcp keep alive. tcp keep alive packets are sent every 15 seconds and if the peer stops responding the connection dies. this is enabled by default with go. this will kill tcp connections when they stop working. Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b	2019-10-22 17:57:24 -06:00
Ethan Adams	3e0d12354a	storagenode/gracefulexit: Implement storage node graceful exit worker - part 1 (#3322 )	2019-10-22 16:42:21 -04:00
paul cannon	5e78f4000b	storagenode/pieces: remove old comment (#3334 ) the reservedSpace member it's talking about was removed quite a while ago. Change-Id: I28433b2a44467376a408453d875c389656347cab	2019-10-22 12:51:51 +03:00
Bryan White	f468816f13	{internal/version,versioncontrol,cmd/storagenode-updater}: add rollout to storagenode updater (#3276 )	2019-10-21 12:50:59 +02:00
Bryan White	243ba1cb17	{versioncontrol,internal/version,cmd/*}: refactor version control (#3253 )	2019-10-20 09:56:23 +02:00
Yingrong Zhao	e5099f31f3	add context.Clean and correct rpc error code (#3295 )	2019-10-16 13:50:01 -04:00
Isaac Hess	ed6b88a12d	piecestore: update usage before completing upload (#3286 ) The upload code currently updates the usage in a deferred call to saveOrder(). The consequence is that in the success case, the RPC is completed before the usage has been updated. This change repurposes the deferred call to update usage in the failure case, while explicitly updating the usage before completing the RPC. This fixes some test flakiness when using dRPC. gRPC waits until the final status is written before a Recv call completes, and the final status is written by the server after the handler function has exited. In practice this means that the client is blocked until the defer call is also finished. So this change will not change performance at all. It has two advantages: (1) It fixes test flakiness and, more importantly: (2) reduces the chances that someone will accidentally write a flaky test in the future	2019-10-15 20:17:17 -06:00
Yingrong Zhao	87e3764390	storagenode/cmd: add exit-status command for graceful exit (#3264 ) * add exit-status command * remove todo and fix format * fix status display * change startExit to exit progress * fix linting error * add successful column in exit progress * fix test * remove extra new line * fix TYPOS * format the percentage better	2019-10-15 18:07:32 -04:00
Andrew Harding	4962c6843e	piecestore: fix test flakiness around upload/download usage tracking (#3282 )	2019-10-15 11:22:15 -06:00
Simon Guindon	abb5b6c499	storagenode/piecestore: Fix to ignore both gRPC and dRPC EOF errors. (#3274 ) * Fix to ignore both gRPC and dRPC EOF errors. * Fix to ignore both gRPC and dRPC EOF errors.	2019-10-15 12:13:53 -04:00
Ethan Adams	1ad2ba7e3e	storagenode/gracefulexit: Add graceful exit chore and worker. (#3262 ) Adds graceful exit chore and worker for V3-2614	2019-10-15 11:29:47 -04:00
Jennifer Li Johnson	b185dbbee2	satellite/discovery: remove discovery related code (#3175 )	2019-10-14 10:57:01 -04:00
littleskunk	96aeedcdee	OrderLimit/GracePeriod: Increase time window from 1h to 24h (#3255 ) * OrderLimit/GracePeriod: Increase time window from 1h to 24h * update satellite config lock	2019-10-13 17:40:24 +02:00
JT Olio	6ede140df1	pkg/rpc: defeat MITM attacks in most cases (#3215 ) This change adds a trusted registry (via the source code) of node address to node id mappings (currently only for well known Satellites) to defeat MITM attacks to Satellites. It also extends the uplink UI such that when entering a satellite address by hand, a node id prefix can also be added to defeat MITM attacks with unknown satellites. When running uplink setup, satellite addresses can now be of the form 12EayRS2V1k@us-central-1.tardigrade.io (not even using a full node id) to ensure that the peer contacted is the peer that was expected. When using a known satellite address, the known node ids are used if no override is provided.	2019-10-12 14:34:41 -06:00
Isaac Hess	e567f27634	storagenode/piecestore: Change test to use ioutil.ReadAll to attempt to reduce test flake (#3250 )	2019-10-11 15:57:59 -06:00
Cameron	d17be58237	remove random sleep in storagenode contact (#3243 )	2019-10-11 16:44:18 -04:00
Vitalii Shpital	78a71ad3b6	web/storagenode: node status updated (#3220 )	2019-10-11 19:28:47 +03:00

486 Commits