storj

Author	SHA1	Message	Date
paul cannon	72189330fd	satellite/gracefulexit: revamp graceful exit Currently, graceful exit is a complicated subsystem that keeps a queue of all pieces expected to be on a node, and asks the node to transfer those pieces to other nodes one by one. The complexity of the system has, unfortunately, led to numerous bugs and unexpected behaviors. We have decided to remove this entire subsystem and restructure graceful exit as follows: * Nodes will signal their intent to exit gracefully * The satellite will not send any new pieces to gracefully exiting nodes * Pieces on gracefully exiting nodes will be considered by the repair subsystem as "retrievable but unhealthy". They will be repaired off of the exiting node as needed. * After one month (with an appropriately high online score), the node will be considered exited, and held amounts for the node will be released. The repair worker will continue to fetch pieces from the node as long as the node stays online. * If, at the end of the month, a node's online score is below a certain threshold, its graceful exit will fail. Refs: https://github.com/storj/storj/issues/6042 Change-Id: I52d4e07a4198e9cb2adf5e6cee2cb64d6f9f426b	2023-09-27 08:40:01 +00:00
Michal Niewrzal	6ac5bf0d7c	satellite/gracefulexit: remove segments loop parts We are switching completely to ranged loop. https://github.com/storj/storj/issues/5368 Change-Id: Ia3e2d7879d91f7f5ffa99b8e8f108380e3b39f31	2023-04-24 15:00:26 +00:00
Andrew Harding	4241e6bf5f	satellite/gracefulexit: implement rangedloop observer The tests are forked from the chore tests with slight adaptations for being run against the ranged loop. I also moved a benchmark for the database from chore_test.go to db_test.go. The pathcollector is reused as a rangedloop.Partial. https://github.com/storj/storj/issues/5234 Change-Id: I56182031d133812a9f4d4a433c01b9150af39f31	2022-12-22 10:47:10 -07:00
Fadila Khadar	c00ecae75c	satellite/gracefulexit: stop using gracefulexit_transfer_queue Remove the logic associated to the old transfer queue. A new transfer queue (gracefulexit_segment_transfer_queue) has been created for migration to segmentLoop. Transfers from the old queue were not moved to the new queue. Instead, it was still used for nodes which have initiated graceful exit before migration. There is no such node left, so we can remove all this logic. In a next step, we will drop the table. Change-Id: I3aa9bc29b76065d34b57a73f6e9c9d0297587f54	2021-09-14 11:52:34 +00:00
Michał Niewrzał	c258f4bbac	private/testplanet: move Metabase outside Metainfo for satellite At some point we moved metabase package outside Metainfo but we didn't do that for satellite structure. This change refactors only tests. When uplink will be adjusted we can remove old entries in Metainfo struct. Change-Id: I2b66ed29f539b0ec0f490cad42c72840e0351bcb	2021-09-09 07:15:51 +00:00
Fadila Khadar	c4202b9451	satellite/gracefulexit: use graceful_exit_segment_transfer_queue For being able to use the segment metainfo loop, graceful exit transfers have to include the segment stream_id/position instead of the path. For this, we created a new table graceful_exit_segment_transfer_queue that will replace the graceful_exit_transfer_queue. The table has been created in a previous migration and made accessible through graceful exit db in another one. This changes makes graceful exit enqueue transfer items for new exiting nodes in the new table. Change-Id: I7bd00de13e749be521d63ef3b80c168df66b9433	2021-07-21 14:02:20 +00:00
Fadila Khadar	b0d98b1c1a	satellite/gracefulexit: allow use of graceful_exit_segment_transfer_queue For being able to use the segment metainfo loop, graceful exit transfers have to include the segment stream_id/position instead of the path. For this, we created a new table graceful_exit_segment_transfer_queue that will replace the graceful_exit_transfer_queue. The table has been created in a previous migration. This change gives access to this table. Graceful Exit doesn't use the table yet, this will be done in a next change. Change-Id: I6c09cff4cc45f0529813a8898ddb2d14aadb2cb8	2021-07-21 12:34:44 +00:00
Ivan Fraixedes	7fb86617fc	satellite/satellitedb: Use CRDB AS OF SYSTEM & batch for GE Use the 'AS OF SYSTEM TIME' Cockroach DB clause for the Graceful Exit (a.k.a GE) queries that count the delete the GE queue items of nodes which have already exited the network. Split the subquery used for deleting all the transfer queue items of nodes which has exited when CRDB is used and batch the queries because CRDB struggles when executing in a single query unlike Postgres. The new test which has been added to this commit to verify the CRDB batch logic for deleting all the transfer queue items of the exited nodes has raised that the Enqueue method has to run in baches when CRDB is used otherwise CRDB has return the error "driver: bad connection" when a big a amount of items are passed to be enqueued. This error didn't happen with the current test implementation it was with an initial one that it was creating a big amount of exited nodes and transfer queue items for those nodes. Change-Id: I6a099cdbc515a240596bc93141fea3182c2e50a9	2021-05-07 13:09:19 -04:00
Michał Niewrzał	7944df20d6	storj: use multipart API Change-Id: I10b401434e3e77468d12ecd225b41689568fd197	2021-04-26 13:15:09 +00:00
Egon Elbre	267506bb20	satellite/metabase: move package one level higher metabase has become a central concept and it's more suitable for it to be directly nested under satellite rather than being part of metainfo. metainfo is going to be the "endpoint" logic for handling requests. Change-Id: I53770d6761ac1e9a1283b5aa68f471b21e784198	2021-04-21 15:54:22 +03:00
Kaloyan Raev	035c393da0	satellite: update tests to pass etag.Reader to multipart.PutObjectPart Change-Id: Ibe99357945ae7a91f5b5d4f87b83d425c9fa84a5	2021-03-29 13:18:11 +00:00
Michał Niewrzał	27ae0d1f15	satellite/metainfo/metabase: add NewRedundancy parameter for UpdateSegmentPieces method At some point we might try to change original segment RS values and set Pieces according to the new values. This change adds add NewRedundancy parameter for UpdateSegmentPieces method to give ability to do that. As a part of change NewPieces are validated against NewRedundancy. Change-Id: I8ea531c9060b5cd283d3bf4f6e4c320099dd5576	2021-03-22 08:12:56 +00:00
Fadila Khadar	5dd76522af	gracefulexit: use GetSegmentByLocation instead of GetObjectLatestVersion This enables the transfer of pieces from an on-going multipart upload. Tests are also modified to take into account pending multipart uploads. See https://storjlabs.atlassian.net/browse/PG-161 Change-Id: I35d433c44dd6e618667e5e8f9f998ef867b9f1ad	2021-02-16 10:49:36 +00:00
Michal Niewrzal	b3aa28cc02	satellite/gracefulexit: migrate to metabase Change-Id: I8be9cc68894124427e4a30d7631126b3afb1f281	2020-12-18 10:57:39 +00:00
Michal Niewrzal	7dde184cb5	Merge 'master' branch Change-Id: I6070089128a150a4dd501bbc62a1f8b394aa643e	2020-11-10 11:58:59 +00:00
Michal Niewrzal	8649a00557	satellite/gracefulexit: replace `Path []byte` to `Key metabaseSegmentKey` TransferQueueItem We are unifying which name (and type) we are using for value we are using to point to segment. We want to use `key` instead of `path`. Dedicated type `metabase.SegmentKey` was created for this purposes also. This change is doing refactoring around gracefulexit. Change-Id: I90d51ff087b206179e61d5f1bc95f4709d76f917	2020-09-04 11:09:48 +00:00
Michal Niewrzal	aa47e70f03	satellite/metainfo: use metabase.SegmentKey with metainfo.Service Instead of using string or []byte we will be using dedicated type SegmentKey. Change-Id: I6ca8039f0741f6f9837c69a6d070228ed10f2220	2020-09-03 15:11:32 +00:00
Egon Elbre	5bdcd86fa7	ci: test benchmarks This runs each benchmark for one iteration to ensure that they are valid. Unfortunately, it does not give any useful metrics as output. Change-Id: I68940398c8dd849aed656bd12656f48d5df10128	2020-07-10 13:26:49 +00:00
Egon Elbre	bcd93ee375	private/testplanet: add StopNodeAndUpdate This was commonly used and code with it can be simplified. Change-Id: I2f2b91f7de54269aee6ef027f97f9e8a7d222e39	2020-05-08 13:02:19 +00:00
Egon Elbre	11a44cdd88	all: don't depend on gogo/proto directly Change-Id: I8822dea0d1b7b99e0b828e0373a0308a42dde2be	2020-04-08 17:32:15 +00:00
Bill Thorp	94c11c5212	satellite: remove some unnecessary UTC() calls Fixes some easy cases of extraneous UTC() calls Change-Id: I3f4c287ae622a455b9a492a8892a699e0710ca9a	2020-03-13 13:49:44 +00:00
Michal Niewrzal	6502454947	satellite/metainfo: move RS configuration to satellite With this change RS configuration will be set on satellite. Uplink with get RS values with BeginObject request and will use it. For backward compatibility and to avoid super large change redundancy scheme stored with bucket is not touched. This can be done in future. Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff	2020-01-22 09:33:53 +00:00
Egon Elbre	2680bae88c	private/testplanet: remove dependency to uplink Remove direct dependency on uplink.RSConfig, this simplifies moving the config file without introducing weird dependencies. Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec	2020-01-02 09:40:46 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Ethan Adams	bba92911cc	fix calcuation of durability ration (#3656)	2019-11-26 12:04:48 -05:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573)	2019-11-14 21:46:15 +02:00
Ethan Adams	f3dccb56b1	satellite/gracefulexit: Check if pointer has been overwritten or deleted before sending transfer message. (#3481)	2019-11-07 11:13:05 -05:00
Yingrong Zhao	fa1ac24e19	satellite/gracefulexit: add failure threshold check (#3329) * add overall failure percentage check and inactive time frame check before sending a response to sno * update comment * delete node from transfer queue if it has been inactive for too long * fix linting error * add test config value * fix nil pointer * add config value into testplanet * add unit test for overall failure threshold * move timeframe threshold to chore * update protolock * add chore test * add per peiece failure count logic * change config name from EndpointMaxFailures to MaxFailuresPerPiece * address comments * fix linting error * add error handling for no row returned from progress table * fix test for graceful exit chore on storagenode * fix typo InActive -> Inactive * improve readability for failure threshold calculation * update config lock * change error handling for GetProgress in graceful exit endpoint on the satellite side * return proper rpc error in endpoint * add check in chore test for checking finish timestamp and queue	2019-10-24 12:24:42 -04:00
Ethan Adams	4c4519f0be	satellite/gracefulexit: add transfer queue for pieces (#3174) initial impl of transfer queue updated docs represent the new design how we handle durability during exit	2019-10-07 16:38:05 -04:00

29 Commits