storj/cmd/tools/segment-verify
Fadila Khadar 995f78d579 satellite/cmd: segment-verify verifies segments in given bucket list
Provides the `segment-verify run buckets` command for verifying segments within a list of buckets.

Bucket list is a csv file with `project_id,bucket_name` to be checked.

https://github.com/storj/storj-private/issues/101

Change-Id: I3d25c27b56fcab4a6a1aebb6f87514d6c97de3ff
2022-12-13 20:10:00 +00:00
batch.go cmd/tools/segment-verify: add failure tests 2022-09-26 19:38:16 +00:00
csv_test.go cmd/tools/segment-verify: add csv writer 2022-09-15 13:28:21 +00:00
csv.go cmd/tools/segment-verify: add csv writer 2022-09-15 13:28:21 +00:00
main.go satellite/cmd: segment-verify verifies segments in given bucket list 2022-12-13 20:10:00 +00:00
nodealias.go cmd/tools/segment-verify: allow ignoring specific nodes 2022-10-10 20:14:38 +03:00
process_test.go cmd/tools/segment-verify: don't mark node immediately offline 2022-10-14 08:10:26 +00:00
process.go cmd/tools/segment-verify: don't mark node immediately offline 2022-10-14 08:10:26 +00:00
README.md satellite/cmd: segment-verify verifies segments in given bucket list 2022-12-13 20:10:00 +00:00
service_test.go satellite/cmd: segment-verify verifies segments in given bucket list 2022-12-13 20:10:00 +00:00
service.go satellite/cmd: segment-verify verifies segments in given bucket list 2022-12-13 20:10:00 +00:00
summarize.go cmd/tools/segment-verify: add tool for summarizing log 2022-10-10 20:02:50 +03:00
verify_test.go cmd/tools/segment-verify: don't mark node immediately offline 2022-10-14 08:10:26 +00:00
verify.go mod: bump dependencies 2022-10-19 17:01:53 +00:00

segment-verify is a tool for verifying that segments are present on the storage nodes that should hold them.

High Level Overview

segment-verify verifies segment status on storage nodes in a few stages:

  1. First it loads a batch of --service.batch-size=10000 segments from the metabase.
  2. The segments are then distributed into queues across all storage nodes. It preferentially chooses the nodes listed in the --service.priority-nodes-path file, which contains one storage node ID per line.
  3. Then it requests a single byte of each segment from each storage node. At most --service.concurrency=1000 concurrent connections are made at a time.
  4. Every segment will be checked --service.check=3 times. However, any failed attempt (e.g. node is offline) is only retried once.
  5. When the verification process itself fails, the affected segments are written to the --service.retry-path=segments-retry.csv file.
  6. When a segment isn't found on at least one of the nodes, it is written to the --service.not-found-path=segments-not-found.csv file.
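
As a rough illustration of stage 2, the sketch below assigns each segment to per-node queues and prefers nodes from a priority set. It is not the tool's actual code: the `Segment` type, the `distribute` function, and the node IDs are hypothetical names chosen for this example, and the selection rule (priority nodes first, then original order) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Segment is a hypothetical stand-in for a metabase segment: an ID plus
// the IDs of the storage nodes that hold its pieces.
type Segment struct {
	ID    string
	Nodes []string
}

// distribute appends each segment to the queues of up to `check` of its
// nodes, choosing nodes from the priority set first.
func distribute(segments []Segment, priority map[string]bool, check int) map[string][]string {
	queues := make(map[string][]string)
	for _, seg := range segments {
		// Copy so the caller's slice is not reordered.
		nodes := append([]string(nil), seg.Nodes...)
		// Stable sort: priority nodes move to the front, everything
		// else keeps its original relative order.
		sort.SliceStable(nodes, func(i, j int) bool {
			return priority[nodes[i]] && !priority[nodes[j]]
		})
		if len(nodes) > check {
			nodes = nodes[:check]
		}
		for _, node := range nodes {
			queues[node] = append(queues[node], seg.ID)
		}
	}
	return queues
}

func main() {
	segments := []Segment{
		{ID: "s1", Nodes: []string{"a", "b", "c", "d"}},
		{ID: "s2", Nodes: []string{"b", "c", "d", "e"}},
	}
	// Assumed priority set, as it might be loaded from a file with one
	// node ID per line.
	priority := map[string]bool{"d": true}

	queues := distribute(segments, priority, 3)
	for _, node := range []string{"a", "b", "c", "d", "e"} {
		fmt.Println(node, queues[node])
	}
}
```

In this sketch node `d` ends up in both segments' selections because it is in the priority set, while node `e` is never chosen because three nodes are already picked before it.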

There are a few parameters for controlling the verification itself:

# This throttles requests to avoid overloading the storage nodes.
--verify.request-throttle duration         minimum interval for sending out each request (default 150ms)
# When there's a failure to make a request, the process will retry after this duration.
--verify.order-retry-throttle duration     how much to wait before retrying order creation (default 50ms)
# This is the time each storage-node has to respond to the request.
--verify.per-piece-timeout duration        duration to wait per piece download (default 800ms)
# Just the regular dialing timeout.
--verify.dial-timeout duration             how long to wait for a successful dial (default 2s)

Running the tool

  • by specifying range boundaries:
segment-verify run range --low 00 --high ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff --config-dir ./satellite-config-dir
  • by specifying buckets to be checked:
segment-verify run buckets --buckets-csv bucket.csv