satellite/gc: improve comments

Change-Id: I9e71c9bee3447f78365ba1593e4a4ef55b28356f
Erik van Velzen 2022-10-11 20:19:58 +02:00 committed by Storj Robot
parent 0408997e6c
commit 464ceb1c0e
4 changed files with 23 additions and 15 deletions


@@ -44,7 +44,7 @@ type Config struct {
 	AccessGrant string `help:"Access Grant which will be used to upload bloom filters to the bucket" default:""`
 	Bucket string `help:"Bucket which will be used to upload bloom filters" default:"" testDefault:"gc-queue"` // TODO do we need full location?
 	ZipBatchSize int `help:"how many bloom filters will be packed in a single zip" default:"500" testDefault:"2"`
-	ExpireIn time.Duration `help:"how quickly uploaded bloom filters will be automatically deleted" default:"336h"`
+	ExpireIn time.Duration `help:"how long bloom filters will remain in the bucket for gc/sender to consume before being automatically deleted" default:"336h"`
 }

 // Service implements service to collect bloom filters for the garbage collection.
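
For illustration, here is a minimal sketch (not the service's actual upload code) of how ExpireIn could translate into an object-level expiration when the packed bloom filters are uploaded, so the bucket cleans itself up after gc/sender has had a chance to consume them. It assumes the public storj.io/uplink Go API (UploadObject with UploadOptions.Expires); the real service may use a different upload path.

// Sketch only: upload a packed bloom-filter zip with a TTL derived from ExpireIn.
package main

import (
	"context"
	"time"

	"storj.io/uplink"
)

func uploadBloomFilterZip(ctx context.Context, project *uplink.Project, bucket, key string, zipData []byte, expireIn time.Duration) error {
	upload, err := project.UploadObject(ctx, bucket, key, &uplink.UploadOptions{
		// The object is removed automatically once ExpireIn has passed,
		// whether or not gc/sender processed it in time.
		Expires: time.Now().Add(expireIn),
	})
	if err != nil {
		return err
	}
	if _, err := upload.Write(zipData); err != nil {
		_ = upload.Abort()
		return err
	}
	return upload.Commit()
}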


@@ -4,18 +4,25 @@
 /*
 Package gc contains the functions needed to run garbage collection.

-The gc.PieceTracker implements the segments loop Observer interface
-allowing us to subscribe to the loop to get information for every segment
-in the metabase db.
+Package gc/bloomfilter creates retain instructions. These are compressed
+lists of piece id's that storage nodes must keep. It uploads those to a bucket.

-The gc.PieceTracker handling functions are used by the gc.Service to periodically
-account for all existing pieces on storage nodes and create "retain requests"
-which contain a bloom filter of all pieces that possibly exist on a storage node.
+gc/bloomfilter is designed to run on a satellite pointed to a backup snapshot
+of the metainfo database. It should not be run on a live satellite: due to the
+features move and copy object, pieces can move to different segments in the
+segments table during the iteration of that table. Because of that it is not
+guaranteed that all piece ids are in the bloom filter if the segments table
+is live.

-The gc.Service will send that request to the storagenode after a full segments loop
-iteration, and the storage node will use that request to delete the "garbage" pieces
-that are not in the bloom filter.
+Package gc/sender reads the retain instructions from the bucket and sends
+them to the storage nodes. It is intended to run on a satellite connected
+to the live database.

-See storj/docs/design/garbage-collection.md for more info.
+Should we ever delete all segments from the satellite's metainfo, then no
+bloom filters will be generated, because GC only considers NodeID's inside
+the segments table. There is also an explicit check that stops sending out
+an empty bloom filter to a storage node.
+
+See storj/docs/blueprints/garbage-collection.md for more info.
 */
 package gc
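
To make the retain-instruction idea above concrete, here is a small sketch: the satellite adds every piece id it still references to a bloom filter, and a storage node keeps any piece whose id is possibly in that filter. It assumes the storj.io/common/bloomfilter and storj.io/common/testrand APIs (NewOptimal/Add/Contains, testrand.PieceID); the real services additionally zip the filters, upload them, and send them over RPC.

// Sketch only: how a bloom filter of piece ids drives retain/delete decisions.
package main

import (
	"fmt"

	"storj.io/common/bloomfilter"
	"storj.io/common/storj"
	"storj.io/common/testrand"
)

func main() {
	// Piece ids the satellite still references in the segments table.
	live := []storj.PieceID{testrand.PieceID(), testrand.PieceID()}
	// A piece the satellite no longer references, i.e. garbage.
	garbage := testrand.PieceID()

	// Size the filter for the expected piece count and false-positive rate.
	filter := bloomfilter.NewOptimal(int64(len(live)), 0.1)
	for _, id := range live {
		filter.Add(id)
	}

	for _, id := range append(live, garbage) {
		if filter.Contains(id) {
			// Live pieces are always retained; a false positive only means
			// some garbage survives until a later GC run.
			fmt.Println(id, "retain")
		} else {
			fmt.Println(id, "delete")
		}
	}
}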


@@ -42,7 +42,7 @@ type Config struct {
 	AccessGrant string `help:"Access to download the bloom filters. Needs read and write permission."`
 	Bucket string `help:"bucket where retain info is stored" default:"" testDefault:"gc-queue"`
-	ExpireIn time.Duration `help:"Expiration of newly created objects. These objects store error messages." default:"336h"`
+	ExpireIn time.Duration `help:"Expiration of newly created objects in the bucket. These objects are under the prefix error-[timestamp] and store error messages." default:"336h"`
 }

 // NewService creates a new instance of the gc sender service.
@@ -58,7 +58,8 @@ func NewService(log *zap.Logger, config Config, dialer rpc.Dialer, overlay overl
 }

 // Service reads bloom filters of piece IDs to retain from a Storj bucket
-// and sends them out to the storage nodes.
+// and sends them out to the storage nodes. This is intended to run on a live satellite,
+// not on a backup database.
 //
 // The split between creating retain info and sending it out to storagenodes
 // is made so that the bloom filter can be created from a backup database.
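
As a rough illustration of the sending side, the sketch below wraps downloaded bloom-filter bytes into a retain request for a single storage node. It assumes the pb.RetainRequest message from storj.io/common/pb (CreationDate plus raw filter bytes); the real gc/sender service also dials each node, handles failures, and records error messages under the error-[timestamp] prefix mentioned in the config above.

// Sketch only: build a retain request from bloom-filter bytes read out of the bucket.
package main

import (
	"time"

	"storj.io/common/pb"
)

// buildRetainRequest wraps raw bloom-filter bytes for one storage node.
// creationDate marks when the filter was created, so pieces stored after
// the backup snapshot are not treated as garbage.
func buildRetainRequest(filterBytes []byte, creationDate time.Time) *pb.RetainRequest {
	return &pb.RetainRequest{
		CreationDate: creationDate,
		Filter:       filterBytes,
	}
}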


@ -445,7 +445,7 @@ contact.external-address: ""
# set if garbage collection bloom filters is enabled or not
# garbage-collection-bf.enabled: true
# how quickly uploaded bloom filters will be automatically deleted
# how long bloom filters will remain in the bucket for gc/sender to consume before being automatically deleted
# garbage-collection-bf.expire-in: 336h0m0s
# the false positive rate used for creating a garbage collection bloom filter
@ -478,7 +478,7 @@ contact.external-address: ""
# set if loop to send garbage collection retain filters is enabled
# garbage-collection.enabled: false
# Expiration of newly created objects. These objects store error messages.
# Expiration of newly created objects in the bucket. These objects are under the prefix error-[timestamp] and store error messages.
# garbage-collection.expire-in: 336h0m0s
# the time between each attempt to download and send garbage collection retain filters to storage nodes