docs/blueprints: Fix typos horizontal scale GC

Fix a few typos in the horizontal scale garbages collection blueprint
document.

Change-Id: I25a0e59ae526e6f270ea4e8b5eac36d779d597dc
This commit is contained in:
Ivan Fraixedes 2020-08-11 15:29:03 +02:00 committed by Ivan Fraixedes
parent c921710247
commit 0525858b72

View File

@ -29,10 +29,10 @@ Reference: [Redash query](https://redash.datasci.storj.io/queries/1224) for curr
## Design ## Design
#### GC Manager and Workers #### GC Manager and Workers
The idea here is to split the garbage collection process into a manager process and many worker processes. The GC Manger will join the metainfo loop to retrieve pointer data. It will assign a portion of the storage nodes to each of the GC workers so that each worker is only responsible for creating bloom filters for a subset of all storage nodes. The GC master will send the piece IDs from the metainfo loop to the correct worker responsible for that storage node. Once the GC cycle is complete, the workers send the completed bloom filters to the storage nodes. The idea here is to split the garbage collection process into a manager process and many worker processes. The GC Manager will join the metainfo loop to retrieve pointer data. It will assign a portion of the storage nodes to each of the GC workers so that each worker is only responsible for creating bloom filters for a subset of all storage nodes. The GC master will send the piece IDs from the metainfo loop to the correct worker responsible for that storage node. Once the GC cycle is complete, the workers send the completed bloom filters to the storage nodes.
#### Reliable data transfer mechanism #### Reliable data transfer mechanism
With this design, the GC Manager will be sending piece ID data to the workers, it's very important we never lose a piece ID. We need a way to confirm each worker received every piece ID from the GC Manger for any given GC iteration. Otherwise it could cause the storage node to delete the unintended data. One way to solve this is to assign a sequence number to each piece ID sent from GC Manager to the worker so that at the end of the GC cycle, the manager and worker must confirm the end sequence number match and all sequences in between have been received. With this design, the GC Manager will be sending piece ID data to the workers, it's very important we never lose a piece ID. We need a way to confirm each worker received every piece ID from the GC Manager for any given GC iteration. Otherwise it could cause the storage node to delete the unintended data. One way to solve this is to assign a sequence number to each piece ID sent from GC Manager to the worker so that at the end of the GC cycle, the manager and worker must confirm the end sequence number match and all sequences in between have been received.
#### GC Sessions #### GC Sessions
Since the GC process runs on an interval, the GC workers need to keep track of which GC cycle relates to which bloom filters, we should store some sort of session ID or timestamp related to each GC cycle to track this. Since the GC process runs on an interval, the GC workers need to keep track of which GC cycle relates to which bloom filters, we should store some sort of session ID or timestamp related to each GC cycle to track this.