storj

Author	SHA1	Message	Date
paul cannon	fc905a15f7	satellite/audit: newContainment->containment Now that all the reverification changes have been made and the old code is out of the way, this commit renames the new things back to the old names. Mostly, this involves renaming "newContainment" to "containment" or "NewContainment" to "Containment", but there are a few other renames that have been promised and are carried out here. Refs: https://github.com/storj/storj/issues/5230 Change-Id: I34e2b857ea338acbb8421cdac18b17f2974f233c	2022-12-16 17:59:52 +00:00
paul cannon	0342ca1aa6	satellite/audit: delete now-unused code Now that we are doing scalable piecewise reverifications, the code for handling the old way of doing things (containment, pending audits, reporting, testing) can now be removed. Refs: https://github.com/storj/storj/issues/5230 Change-Id: Ief1a75f423eff682e8f3d57804e343b3409a6631	2022-12-16 14:53:39 +00:00
paul cannon	a66503b444	satellite/audit: Begin using piecewise reverifications This commit pulls the big switch! We have been setting up piecewise reverifications (the workers for which can be scaled independently of the core) for several commits now, and this commit actually begins making use of them. The core of this commit is fairly small, but it requires changing the semantics in all the tests that relate to reverifications, so it ends up being a large change. The changes to the tests are mostly mechanical and repetitive, though, so reviewers needn't worry much. Refs: https://github.com/storj/storj/issues/5230 Change-Id: Ibb421cc021664fd6e0096ffdf5b402a69b2d6f18	2022-12-16 14:21:13 +00:00
paul cannon	1854351da6	satellite/audit: teach Reporter about piecewise audits The Reporter is responsible for processing results from auditing operations, logging the results, disqualifying nodes that reached the maximum reverification count, and passing the results on to the reputation system. In this commit, we extend the Reporter so that it knows how to process the results of piecewise reverification audits. We also change most reporter-related tests so that reverifications happen as piecewise reverification audits, exercising the new code. Note that piecewise reverification audits are not yet being done outside of tests. In a later commit, we will switch from doing segmentwise reverifications to piecewise reverifications, as part of the audit-scaling effort. Refs: https://github.com/storj/storj/issues/5230 Change-Id: I9438164ce1ea4d9a1790d18d0e1046a8eb04d8e9	2022-12-12 11:28:02 +00:00
paul cannon	378b8915c4	satellite/{satellitedb,audit}: add NewContainment NewContainment will replace Containment later in this commit chain, but for now it is not yet being used. NewContainment will allow a node to be contained for multiple pending reverify jobs at a time. It is implemented by way of the reverify queue. Refs: https://github.com/storj/storj/issues/5231 Change-Id: I126eda0b3dfc4710a88fe4a5f41780618ec19101	2022-12-07 18:03:37 +00:00
paul cannon	fd01c6cc25	satellite/{repair,audit}: simplify reputation reporter Also, make it an interface so that the upcoming write cache can be dropped in to the same place. Change-Id: I2c286743825e647c0cef5b6578245391851fa10c	2022-05-10 14:04:43 +00:00
Yingrong Zhao	1f8f7ebf06	satellite/{audit, reputation}: fix potential nodes reputation status inconsistency The original design had a flaw which can potentially cause discrepancy for nodes reputation status between reputations table and nodes table. In the event of a failure(network issue, db failure, satellite failure, etc.) happens between update to reputations table and update to nodes table, data can be out of sync. This PR tries to fix above issue by passing through node's reputation from the beginning of an audit/repair(this data is from nodes table) to the next update in reputation service. If the updated reputation status from the service is different from the existing node status, the service will try to update nodes table. In the case of a failure, the service will be able to try update nodes table again since it can see the discrepancy of the data. This will allow both tables to be in-sync eventually. Change-Id: Ic22130b4503a594b7177237b18f7e68305c2f122	2022-01-06 21:05:59 +00:00
Yingrong Zhao	0b500a30e4	satellite/audit: move audit metrics out of reporter Since we are sharing the reporting logic between repair and audit. We need to remove metric reporting logic in reporter. Change-Id: Ib87295ab19079329e7438327d785a7f5c21d3b21	2021-09-16 17:58:56 +00:00
Yingrong Zhao	b64d8084e1	satellite/audit: fix metric reporting when fail to complete an audit Change-Id: I39df8d4291db35afbba824281cb23438a91c45db	2021-08-31 17:02:30 +00:00
Cameron Ayer	5a1a29a62e	satellite/audit: fix containment bug where nodes not removed When a node gets enough timeouts, it is supposed to be removed from pending_audits and get an audit failure. We would give them a failure, but we missed the removal. This change fixes it. Change-Id: I2f7014e28d7d9b01a9d051f5bbb4f67c86c7b36b	2021-08-20 14:48:27 +00:00
Yingrong Zhao	6c7bf357cd	satellite/{reputation,audit,overlay}: replace overlay with reputation package in audit This PR implements reputation store and replace overlay in audit service to use such store for storing node's audit stats. In order to keep the changeset smaller, most of the changes in this PR is for copying audit logic in overlay to reputation package. In a following PR, the duplicating code will be removed from overlay. Change-Id: I16c12494a0970f44c422b26cf603c1dc489e5bc1	2021-07-28 13:10:48 -04:00
Michał Niewrzał	8ce619706b	satellite/audit: migrate to new segment_pending_audit table Currently, pending audit is finding segment by segment location (path) because we want to move audit to segmentloop and we will have only StreamID and Position we need to add columns for those fields. Altering existing table can cause issues while migration and deployment. Cleaner choise is to make new table. This change contains migration with new segment_pending_audit table that will replace pending_audits table and adjustments to use new table in the code. Table pending_audits will be dropped with next release. Change-Id: Id507e29c152da594bac1fd812c78d7ecf45ec51f	2021-06-28 13:19:49 +02:00
Michał Niewrzał	237782813b	Merge remote-tracking branch 'origin/multipart-upload' Change-Id: If6c5a450b238adab55d1e0dea67d01e5f5768a9f	2021-03-23 09:44:49 +01:00
Cameron Ayer	a04495713d	satellite/audit: add missing logs for audit failure conditions Among other conditions, nodes fail audits by returning incorrect data and by reaching the max reverify count, but we weren't logging these events. This commit adds the missing logs. Change-Id: I80749a7e95e8cb97bc8dd7dac1e523e223114b7f	2021-03-18 17:33:11 +00:00
Kaloyan Raev	9aa61245d0	satellite/audits: migrate to metabase Change-Id: I480c941820c5b0bd3af0539d92b548189211acb2	2020-12-17 14:38:48 +02:00
Stefan Benten	494bd5db81	all: golangci-lint v1.33.0 fixes (#3985 )	2020-12-05 17:01:42 +01:00
Cameron Ayer	dc67ce74c9	satellite: remove IsUp field from overlay.UpdateRequest With the new overlay.AuditOutcome type for offline audits, the IsUp field is redundant. If AuditOutcome != AuditOffline, then the node is online. In addition to removing the field itself, other changes needed to be made regarding the relationship between 'uptime' and 'audits'. Previously, uptime and audit outcome were completely separated. For example, it was possible to update a node's stats to give it a successful/failed/unknown audit while simultaneously indicating that the node was offline by setting IsUp to false. This is no longer possible under this changeset. Some test which did this have been changed slightly in order to pass. Also add new benchmarks for UpdateStats and BatchUpdateStats with different audit outcomes. Change-Id: I998892d615850b1f138dc62f9b050f720ea0926b	2020-11-02 15:34:17 -05:00
Cameron Ayer	bb7be23115	satellite/{audit,overlay,satellitedb}: enable reporting offline audits - Remove flag for switching off offline audit reporting. - Change the overlay method used from UpdateUptime to BatchUpdateStats, as this is where the new online scoring is done. - Add a new overlay.AuditOutcome type: AuditOffline. Since we now use the same method to record offline audits as success, failure, and unknown, we need to distinguish offline audits from the rest. Change-Id: Iadcfe10cf13466fa1a1c2dc542db8994a6423355	2020-10-27 10:44:46 +00:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
littleskunk	23e5a0471f	satellite/audit: clean up logging (#3832 ) Co-authored-by: Ivan Fraixedes <ivan@fraixed.es>	2020-03-30 12:09:50 -06:00
Moby von Briesen	8b72181a1f	satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension * change overlay.UpdateStats to allow a third audit outcome. Now it can handle successful, failed, and unknown audits. * when "unknown audit reputation" (unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the DQ threshold, put node into suspension. * when unknown audit reputation goes above the DQ threshold, remove node from suspension. * record unknown audits from audit reporter. * add basic tests around unknown audits and suspension. Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1	2020-03-16 20:29:26 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
paul cannon	af24581ac0	satellite/audit: do not report offline to overlay (#3547 )	2019-12-18 04:51:24 -06:00
Maximillian von Briesen	8653dda2b1	satellite/audit: do not contain nodes for unknown errors (#3592 ) * skip unknown errors (wip) * add tests to make sure nodes that time out are added to containment * add bad blobs store * call "Skipped" "Unknown" * add tests to ensure unknown errors do not trigger containment * add monkit stats to lockfile * typo * add periods to end of bad blobs comments	2019-11-19 17:30:28 +01:00
littleskunk	eeb38245ff	satellite/audit: improve logging (#3285 )	2019-10-16 13:48:05 +02:00
Maximillian von Briesen	784ca1582a	satellite/audit: fix audit panic (#3217 )	2019-10-09 10:06:58 -04:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964 )	2019-09-10 16:24:16 +03:00
Ivan Fraixedes	df29699641	satellite/audit: Improve code comment in reporter (#2838 )	2019-08-22 14:13:43 +02:00
Egon Elbre	c8edeb0257	satellite/overlay: rename overlay.Cache to overlay.Service (#2717 )	2019-08-06 19:35:59 +03:00
ethanadams	c9b46f2fe2	V3-1987: Optimize audits stats persistence (#2632 ) * Added batch update stats for recordAuditSuccessStatus * Added batch update stats to recordAuditFailStatus * added configurable batch size * build individual update/delete statements so the statements can be batched into 1 call to the DB * notified #config-changes channel and ran make update-satellite-config-lock * updated tests to use batch update stats	2019-07-31 13:21:06 -04:00
aligeti	6dee5a3918	added more logging info (#2667 )	2019-07-30 12:38:10 -04:00
aligeti	55e3a37d19	Adds more detailed error logging information V3-2239 (#2654 ) * added more logging info	2019-07-30 12:03:25 -04:00
Egon Elbre	5d0816430f	rename all the things (#2531 ) * rename pkg/linksharing to linksharing * rename pkg/httpserver to linksharing/httpserver * rename pkg/eestream to uplink/eestream * rename pkg/stream to uplink/stream * rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo * rename pkg/auth/signing to pkg/signing * rename pkg/storage to uplink/storage * rename pkg/accounting to satellite/accounting * rename pkg/audit to satellite/audit * rename pkg/certdb to satellite/certdb * rename pkg/discovery to satellite/discovery * rename pkg/overlay to satellite/overlay * rename pkg/datarepair to satellite/repair	2019-07-28 08:55:36 +03:00

33 Commits