storj/satellite/repair
paul cannon 985ccbe721 satellite/repair: in dns redial, don't retry if CloseError
To save load on DNS servers, the repair code first tries to dial the
last known good ip and port for a node, and then falls back to a DNS
lookup only if we fail to connect to the last known good ip and port.

However, it looks like we are seeing errors during the client stream
Close() call (probably from the quic-go code), and those are classified
the same as errors encountered during Dial. The repairer sees such an
error, assumes that we failed to contact the node, and retries. But
since the first connection actually succeeded, the retry submits the
same order limit (with the same serial number) to the storage node,
which (rightfully) rejects it.
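The fix described above can be sketched as a retry predicate; the sentinel errors here are illustrative stand-ins for the real rpc error classes, not the actual storj API:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the real error classes
// that the repairer distinguishes after this change.
var (
	ErrDial  = errors.New("dial failed")
	ErrClose = errors.New("stream close failed")
)

// shouldRedialWithDNS reports whether the repairer should retry via a
// fresh DNS lookup. Only a dial-time failure qualifies: an error surfaced
// during Close() means the connection (and the order limit submission)
// already went through, so a retry would resubmit the same serial number
// and be rejected by the storage node.
func shouldRedialWithDNS(err error) bool {
	return err != nil && !errors.Is(err, ErrClose)
}

func main() {
	fmt.Println(shouldRedialWithDNS(ErrDial))  // true: never connected, safe to retry
	fmt.Println(shouldRedialWithDNS(ErrClose)) // false: already connected, do not retry
}
```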

So together with change I055c186d5fd4e79560f67763175bc3130b9bc7d2 in
storj/uplink, this should avoid the double submission and avoid dinging
nodes' suspension scores unfairly.

See https://github.com/storj/storj/issues/4687.

Also, move the testsuite directory check above the check-monkit step in
the Jenkins Lint task, so that a non-tidy testsuite/go.mod is recognized
and handled before it breaks things in weird and seemingly random ways
later on.

Change-Id: Icb2b05aaff921d0af6aba10e450ac7e0a7bb2655
2022-04-04 17:01:09 +00:00
checker satellite/repair: add test that confirms that repairer is ignoring copied segments 2022-03-16 09:00:34 +00:00
queue satellite/repair: migrate to new repair_queue table 2021-06-30 17:12:24 +02:00
repairer satellite/repair: in dns redial, don't retry if CloseError 2022-04-04 17:01:09 +00:00
priority_test.go satellite/repair: test inmemory/disk difference only once 2022-03-29 14:08:13 +03:00
priority.go satellite/repair: clamp totalNodes to 100 or higher 2020-12-30 10:39:14 -06:00
repair_test.go satellite/repair: test inmemory/disk difference only once 2022-03-29 14:08:13 +03:00
repair.go satellite/repair: move test files (#2649) 2019-07-28 12:15:34 +03:00