storj

paul cannon 7f1cad6faf satellite/repair: better handling of piece fetch errors	2022-09-23 09:35:06 +00:00

We have an alert on `repair_too_many_nodes_failed` which fires too frequently. Every time so far, it has been because of a network blip of some nature on the satellite side. Satellite operators are expected to have other means in place for alerting on network problems and fixing them, so it's not necessary for the repair framework to act in that way.

Instead, in this change, we change the way that `repair_too_many_nodes_failed` works. When a repair fails, we collect piece fetch errors by type and determine from them whether it looks like we are having network problems (most errors are connection failures, possibly also some successful connections which subsequently time out) or whether something else has happened.

We will now only emit `repair_too_many_nodes_failed` when the outcome does not look like a network failure. In the network failure case, we will instead emit `repair_suspected_network_problem`.

Refs: https://github.com/storj/storj/issues/4669
Change-Id: I49df98da5df9c606b95ad08a2bdfec8092fba926
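
The classification described in the commit message above can be illustrated with a minimal Go sketch. This is not the code from the change itself: the type `FetchErrorKind`, the function `FailureMetric`, and the threshold (a strict majority of connection failures, with every remaining error being a timeout) are assumptions made for illustration. Only the two metric names come from the commit message.

```go
package main

import "fmt"

// FetchErrorKind is a hypothetical classification of one piece fetch failure.
type FetchErrorKind int

const (
	KindDialFailure FetchErrorKind = iota // could not connect to the node at all
	KindTimeout                           // connected, but the transfer later timed out
	KindOther                             // anything else, e.g. a missing or corrupt piece
)

// Metric names as given in the commit message.
const (
	MetricTooManyNodesFailed      = "repair_too_many_nodes_failed"
	MetricSuspectedNetworkProblem = "repair_suspected_network_problem"
)

// FailureMetric picks which metric to emit for a failed repair, given the
// kinds of piece fetch errors that were collected.
//
// Assumed heuristic (not taken from the real implementation): the failure
// looks like a network problem when more than half of the errors are
// connection failures and all remaining errors are timeouts.
func FailureMetric(errs []FetchErrorKind) string {
	counts := make(map[FetchErrorKind]int)
	for _, kind := range errs {
		counts[kind]++
	}

	total := len(errs)
	if total == 0 {
		// No fetch errors were recorded, so nothing suggests a network problem.
		return MetricTooManyNodesFailed
	}

	mostlyDialFailures := counts[KindDialFailure]*2 > total
	onlyNetworkLike := counts[KindOther] == 0
	if mostlyDialFailures && onlyNetworkLike {
		return MetricSuspectedNetworkProblem
	}
	return MetricTooManyNodesFailed
}

func main() {
	errs := []FetchErrorKind{KindDialFailure, KindDialFailure, KindTimeout}
	fmt.Println(FailureMetric(errs))
}
```

Grouping errors by kind before deciding keeps the alerting decision separate from the fetch logic, so the threshold can be tuned without touching the code that talks to storage nodes.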
checker	satellite: fix segment loop observers metrics	2022-08-10 14:13:16 +00:00
queue	satellite/repair/checker: buffer repair queue	2022-05-12 16:28:05 +00:00
repairer	satellite/repair: better handling of piece fetch errors	2022-09-23 09:35:06 +00:00
priority_test.go	satellite/repair: test inmemory/disk difference only once	2022-03-29 14:08:13 +03:00
priority.go	all: reformat comments as required by gofmt 1.19	2022-08-10 18:24:55 +00:00
repair_test.go	satellite/repair: better handling of piece fetch errors	2022-09-23 09:35:06 +00:00
repair.go	satellite/repair: move test files (#2649)	2019-07-28 12:15:34 +03:00