storj/satellite
paul cannon 726c95160b satellite/repair: avoid retrying GET_REPAIR incorrectly
We retry a GET_REPAIR operation in one case, and one case only (as far
as I can determine): when we are trying to connect to a node using its
last known working IP and port combination rather than its supplied
hostname, and we think the operation failed the first time because of a
Dial failure.

However, logs collected from storage node operators, along with logs
collected from satellites, strongly indicate that we retry GET_REPAIR
operations in some cases even when we succeeded in connecting to the
node the first time. This results in the node complaining loudly about
being given a duplicate order limit (as it should), whereupon the
satellite counts that as an unknown error and potentially penalizes the
node.

See discussion at:
https://forum.storj.io/t/get-repair-error-used-serial-already-exists-in-store/17922/36

Investigation into this problem has revealed that
`!piecestore.CloseError.Has(err)` may not be the best way of determining
whether a problem occurred during Dial. In fact, it is probably
downright Wrong. Handling of errors on a stream is somewhat complicated,
but it would appear that there are several paths by which an RPC error
originating on the remote side might show up during the Close() call,
and would thus be labeled as a "CloseError".

This change creates a new error class, repairer.ErrDialFailed, with
which we will now wrap errors that _really definitely_ occurred during
a Dial call. We will use this class to determine whether or not to retry
a GET_REPAIR operation. The error will still also be wrapped with
whatever wrapper classes it used to be wrapped with, so the potential
for breakage here should be minimal.

Refs: https://github.com/storj/storj/issues/4687
Change-Id: Ifdd3deadc8258f34cf3fbc42aff393fa545794eb
2022-07-18 05:11:56 +00:00
accounting storagenode/console/consoleapi: use fixed time.Now() 2022-07-01 12:36:01 +03:00
admin web/{multinode,satellite,storagenode}: revert go.mod 2022-07-08 19:51:51 +03:00
analytics satellite/analytics: Added tracks calls for product activity metrics (#4907) 2022-06-17 12:57:10 -07:00
attribution {cmd/satellite/reports, satellite/attribution}: type and variable name adjustments 2022-04-26 20:12:38 +00:00
audit satellite/reputation: add a reputation write cache 2022-07-14 21:40:16 +00:00
buckets satellite/metainfo: propagate geofencing between buckets and stream id 2021-11-24 08:05:05 +00:00
compensation satellite/compensation: add a code that crypthopper-go now uses 2022-04-25 10:46:51 +00:00
console satellite/console: added new email which is sent on unknown password reset 2022-07-14 14:32:59 +00:00
contact satellite/contact: swap net.IP.IsPrivateIP with isPrivateIP 2022-06-13 01:01:44 +02:00
gc satellite: use more optimal monkit call for loop observers methods 2022-05-20 11:03:41 +00:00
geoip satellite/geoip: update node check-in to associate a country code 2021-11-10 16:44:41 +01:00
gracefulexit satellite: use more optimal monkit call for loop observers methods 2022-05-20 11:03:41 +00:00
inspector satellite/metabase: drop GetObjectLatestVersion method 2022-02-02 09:40:53 +00:00
internalpb satellite/reputation: don't need 3 identical AuditHistory types 2022-05-24 05:48:46 +00:00
mailservice satellite/{mailservice,oidc}: improvements to debugging 2022-05-11 19:59:42 +00:00
metabase satellite/metabase: don't return expired objects 2022-07-04 21:32:30 +03:00
metainfo satellite/metainfo: use metaclient type instead of storj 2022-07-08 14:17:19 +02:00
metrics satellite/{audit,metrics}: optimize loop methods 2022-05-05 15:10:56 +00:00
nodeapiversion satellite/nodeapiversion: new table for tracking node api usage 2020-07-09 15:02:25 +00:00
nodeselection/uploadselection satellite/repairer: handle excluded countries 2022-03-14 10:59:36 -04:00
nodestats satellite/reputation: don't need 3 identical AuditHistory types 2022-05-24 05:48:46 +00:00
oidc satellite/console: integrate sessions into satellite UI 2022-06-13 08:02:02 +00:00
orders satellite/{orders,overlay}: use cache for downloads 2022-07-12 11:04:34 +00:00
overlay satellite/reputation: add a reputation write cache 2022-07-14 21:40:16 +00:00
payments satellite: wire storjscan chore to core process 2022-07-14 15:07:52 +00:00
repair satellite/repair: avoid retrying GET_REPAIR incorrectly 2022-07-18 05:11:56 +00:00
reputation satellite/reputation: add a reputation write cache 2022-07-14 21:40:16 +00:00
revocation satellite/satellitedb: move tests to their domains 2021-02-19 17:29:15 +02:00
rewards satellite/rewards: adding SeaweedFS to partners list (#4230) 2021-10-19 21:30:31 +02:00
satellitedb satellite/reputation: add a reputation write cache 2022-07-14 21:40:16 +00:00
snopayouts all: fix error naming 2021-04-29 15:38:21 +03:00
admin.go satellite/admin: fix console config handling 2022-05-27 22:26:06 +00:00
api.go satellite/reputation: add a reputation write cache 2022-07-14 21:40:16 +00:00
configlock_test.go all: fix linter complaints 2020-10-13 15:59:01 +03:00
core.go satellite/reputation: add a reputation write cache 2022-07-14 21:40:16 +00:00
gc.go satellite: more detailed goroutine labels 2022-05-11 17:50:55 +00:00
peer.go satellite/payments/storjscan: add payments DB 2022-06-10 13:44:27 +01:00
repairer.go satellite/reputation: add a reputation write cache 2022-07-14 21:40:16 +00:00