storj

paul cannon 802ff18bd8 satellite/audit: better handling of piece fetch errors	2022-09-28 17:02:06 +00:00

We have an alert on `not_enough_shares_for_audit` which fires too frequently. Every time so far, it has been because of a network blip of some nature on the satellite side. Satellite operators are expected to have other means in place for alerting on network problems and fixing them, so it's not necessary for the audit framework to act in that way.

Instead, in this change, we add three new metrics, `audit_not_enough_nodes_online`, `audit_not_enough_shares_acquired`, and `audit_suspected_network_problem`. When an audit fails, and emits `not_enough_shares_for_audit`, we will now determine whether it looks like we are having network problems (most errors are connection failures, possibly also some successful connections which subsequently time out) or whether something else has happened.

After this is deployed, we can remove the alert on `not_enough_shares_for_audit` and add new alerts on `audit_not_enough_nodes_online` and `audit_not_enough_shares_acquired`. `audit_suspected_network_problem` does not need an alert.

Refs: https://github.com/storj/storj/issues/4669
Change-Id: Ibb256bc19d2578904f71f5229111ac98e5212fcb
..
audit_test.go	satellite/audit: verify auditing of copies	2022-04-12 15:24:54 +00:00
chore.go	satellite/audit: move to segmentloop	2021-06-28 11:32:00 +00:00
collector_test.go	all: reformat comments as required by gofmt 1.19	2022-08-10 18:24:55 +00:00
collector.go	satellite: fix segment loop observers metrics	2022-08-10 14:13:16 +00:00
containment_test.go	satellite/{audit, reputation}: fix potential nodes reputation status	2022-01-06 21:05:59 +00:00
containment.go	satellite/audit: migrate to new segment_pending_audit table	2021-06-28 13:19:49 +02:00
cryptosource.go	all: fix dots	2020-07-16 14:58:28 +00:00
disqualification_test.go	all: reformat comments as required by gofmt 1.19	2022-08-10 18:24:55 +00:00
getshare_test.go	private/testplanet: move Metabase outside Metainfo for satellite	2021-09-09 07:15:51 +00:00
integration_test.go	satellite/audits: migrate to metabase	2020-12-17 14:38:48 +02:00
pieces_test.go	satellite/repair: update audit records during repair	2021-09-24 00:48:13 +00:00
pieces.go	satellite/repair: move over audit.Pieces	2022-09-22 16:43:03 +00:00
queue_test.go	satellite/audit: move to segmentloop	2021-06-28 11:32:00 +00:00
queue.go	satellite/audits: migrate to metabase	2020-12-17 14:38:48 +02:00
reporter_test.go	satellite/reputation: reconfigure lambda and alpha	2022-08-17 18:52:53 +00:00
reporter.go	satellite/{repair,audit}: simplify reputation reporter	2022-05-10 14:04:43 +00:00
reservoir_test.go	satellite/audit: account for piece size during audit reservoir sampling	2021-12-01 18:17:52 +00:00
reservoir.go	satellite/audit: account for piece size during audit reservoir sampling	2021-12-01 18:17:52 +00:00
reverify_test.go	deps: upgrade storj.io/common	2022-02-16 18:59:19 +00:00
verifier_test.go	satellite/reputation: add a reputation write cache	2022-07-14 21:40:16 +00:00
verifier_unit_test.go	satellite/audit: move to segmentloop	2021-06-28 11:32:00 +00:00
verifier.go	satellite/audit: better handling of piece fetch errors	2022-09-28 17:02:06 +00:00
worker.go	satellite/audit,storagenode/gracefulexit: fixes to limiter	2022-08-03 10:24:16 +03:00
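
The latest commit message (satellite/audit: better handling of piece fetch errors) describes deciding whether a failed audit "looks like" a network problem because most errors are connection failures or timeouts. The sketch below illustrates that idea only; it is not the code in verifier.go. The type `FetchOutcome`, its constants, and the function `looksLikeNetworkProblem` are hypothetical names, and reading "most" as "more than half of the failed fetches" is an assumption.

```go
package main

// FetchOutcome is a hypothetical classification of a single piece-fetch
// attempt during an audit. It is not a type from the storj codebase.
type FetchOutcome int

const (
	FetchOK         FetchOutcome = iota // share was retrieved
	FetchDialFailed                     // could not connect to the node
	FetchTimedOut                       // connected, but the request later timed out
	FetchOtherError                     // any other failure, e.g. missing piece
)

// looksLikeNetworkProblem reports whether the failed fetches are dominated by
// connection failures and timeouts. The "more than half" threshold is an
// assumption made for this sketch.
func looksLikeNetworkProblem(outcomes []FetchOutcome) bool {
	failures, networkLike := 0, 0
	for _, o := range outcomes {
		switch o {
		case FetchOK:
			// Successful fetches do not count towards either total.
		case FetchDialFailed, FetchTimedOut:
			failures++
			networkLike++
		default:
			failures++
		}
	}
	// With no failures at all there is nothing to attribute to the network.
	return failures > 0 && networkLike*2 > failures
}
```

A caller could run this check when an audit cannot gather enough shares and, on a true result, record the suspected network problem instead of raising the alerting metric. How the real verifier chooses between the three new metrics is not shown in this listing.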