storj

Author	SHA1	Message	Date
Cameron Ayer	8c124c6fa4	satellite/{reputation,overlay,satellitedb}: create reputation service, DB, add overlay method UpdateReputation Define service and DB interface for storing node reputation data and updating the overlay cache. Add overlay service and DB method UpdateReputation. See https://github.com/storj/storj/pull/4144 Change-Id: Iedd8bd3274457d26c595919303d55327c1464b8c	2021-06-24 16:19:15 +00:00
Jeremy Wharton	8a070e7c25	satellite/overlay: Ignore unnecessary check-ins This prevents the database from being contacted unnecessarily, reducing load. Change-Id: Ib2420f68a20636ec35eb3dd3df8e02bd5341b419	2021-06-22 09:00:41 +00:00
Egon Elbre	0858c3797a	satellite/{metabase,satellitedb}: deduplicate AS OF SYSTEM TIME code Currently we were duplicating code for AS OF SYSTEM TIME in several places. This replaces the code with using a method on dbutil.Implementation. As a consequence it's more useful to use a shorter name for implementation - 'impl' should be sufficiently clear in the context. Similarly, using AsOfSystemInterval and AsOfSystemTime to distinguish between the two modes is useful and slightly shorter without causing confusion. Change-Id: Idefe55528efa758b6176591017b6572a8d443e3d	2021-05-11 12:40:36 +03:00
Cameron Ayer	a0c5da6643	satellite/satellitedb: in stray nodes DQ, don't DQ nodes where last_contact_success = '0001-01-01 00:00:00+00' When nodes check in for the very first time, if the satellite can't ping them back, they are inserted into the nodes table with last_contact_success of '0001-01-01 00:00:00+00'. If the stray nodes chore runs before the node can fix their problem, they are DQd. Solution: when DQing stray nodes, don't DQ where last_contact_success = '0001-01-01 00:00:00+00'::timestamptz Change-Id: I477a02d5ef85b2c930ed6b7d99a4d1995169bca8	2021-04-22 10:13:13 -04:00
Egon Elbre	267506bb20	satellite/metabase: move package one level higher metabase has become a central concept and it's more suitable for it to be directly nested under satellite rather than being part of metainfo. metainfo is going to be the "endpoint" logic for handling requests. Change-Id: I53770d6761ac1e9a1283b5aa68f471b21e784198	2021-04-21 15:54:22 +03:00
Jeff Wendling	a65aecfd98	compensation: always generate invoices for every node instead of only generating invoices for nodes that had some activity, we generate it for every node so that we can find and pay terminal nodes that did not meet thresholds before we recognized them as terminal. Change-Id: Ibb3433e1b35f1ddcfbe292c034238c9fa1b66c44	2021-03-29 14:15:45 +00:00
Egon Elbre	d57873fd45	satellite/overlay: remove Inspector Currently overlay.Inspector had two rpc methods and both of them were unimplemented. Change-Id: I1a2ecc7b7113898fa234a1c1fe451c8cc9e2ee81	2021-03-29 12:26:10 +03:00
Michał Niewrzał	237782813b	Merge remote-tracking branch 'origin/multipart-upload' Change-Id: If6c5a450b238adab55d1e0dea67d01e5f5768a9f	2021-03-23 09:44:49 +01:00
Cameron Ayer	2607b16070	satellite/{overlay/straynodes,satellitedb}: rework DQNodesLastSeenBefore to return DQd node IDs and last contact successes We would like to log Node IDs and last contact successes of nodes DQd in this manner. We would also like to avoid returning an unbounded list of items from the db. Therefore we change the query to select a limited number of nodes that meet the DQ conditions and iterate until 0 rows are returned. Each column of the query is already indexed. Change-Id: Iaec2d9b56e7202b7c2028ba21750d40c8dd506ee	2021-03-22 13:01:30 -04:00
Michał Niewrzał	d995fb497f	Merge remote-tracking branch 'origin/main' into multipart-upload Change-Id: I367da03351ab80f7343332420490dde9282aa47a	2021-02-23 12:31:31 +01:00
JT Olio	b2ed7edd30	cmd/satellite: restore-trash parallel workers Change-Id: Ic7466b21c20bda334e7ba4268a494e96b6528ac1	2021-02-18 19:11:19 +02:00
JT Olio	3ae3389ddc	cmd/satellite: restore-trash command Change-Id: I80fc932c12147692d49cde277784871ac611fcad	2021-02-18 09:19:22 -07:00
Kaloyan Raev	d0612199f0	Merge remote-tracking branch 'origin/main' into multipart-upload Conflicts: go.mod go.sum satellite/metainfo/config.go satellite/metainfo/metainfo_test.go Change-Id: I95cf3c1d020a7918795b5eec63f36112fdb86749	2021-02-01 14:32:12 +02:00
Egon Elbre	b7a0739219	satellite/overlay: use DownloadSelectionCache for getting node IPs Change-Id: Ib8f4eedb2bf465767050693a1e961b37a294ca06	2021-01-29 16:47:10 +02:00
Egon Elbre	54e01d37f9	satellite/overlay: add DownloadSelectionCache Change-Id: Ic0779280172325f8d03f55a2e9673722f72bdd44	2021-01-29 16:47:06 +02:00
Egon Elbre	19e3dc4ec0	satellite/overlay: rename NodeSelectionCache to UploadSelectionCache It wasn't obvious that NodeSelectionCache was only for uploads. Change-Id: Ifeeaa6fdb50a4b7916245b48d8634d70ac54459c	2021-01-28 14:56:53 +02:00
Kaloyan Raev	c24ada7114	Merge remote-tracking branch 'origin/main' into multipart-upload Conflicts: go.mod go.sum Change-Id: Icf7c029e9d800e5f6a9fdd208c36f28e05468690	2021-01-20 17:35:57 +02:00
Cameron Ayer	d14607a5f7	satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de	2021-01-19 15:43:02 -05:00
Cameron Ayer	75d828200c	private,satellite: add chore to dq stray nodes Full scope: private/testplanet,satellite/{overlay,satellitedb} Description: In most cases, downtime tracking with audits will eventually lead to DQ for nodes who are unresponsive. However, if a stray node has no pieces, it will not be audited and will thus never be disqualified. This chore will check for nodes who have not successfully been contacted in some set time and DQ them. There are some new flags for toggling DQ of stray nodes and the timeframes for running the chore and how long nodes can go without contact. Change-Id: Ic9d41fdbf214736798925e728245180fb3c55615	2021-01-19 14:21:56 -05:00
Kaloyan Raev	6dff40f5c5	Merge remote-tracking branch 'origin/main' into multipart-upload Conflicts: go.mod go.sum satellite/metainfo/metainfo.go Change-Id: Ib5c49f3c911c58319855a171f9ce73657da976d9	2021-01-14 14:33:59 +02:00
Egon Elbre	85fb964afe	satellite/{metainfo,overlay}: improvements to GetObjectIPs * Deduplicate NodeID list prior to fetching IPs. * Use NodeSelectionCache for fetching reliable IPs. * Return number of segments, reliable pieces and all pieces. Change-Id: I13e679caab275488b4037624b840a4068dad9589	2021-01-14 09:12:45 +00:00
Cameron Ayer	0403e99a5b	satellite/{overlay,satellitedb}: remove unused methods for old downtime tracking GetSuccessfulNodeNotCheckedInSince and GetOfflineNodesLimited are overlay methods which were only used by the previous downtime tracking system which has been removed. These methods should also be removed. Change-Id: Idb829d742e1f987e095604423fff656fe581183e	2021-01-11 15:21:28 +00:00
Michał Niewrzał	ec88d21a3c	Merge 'main' branch. Change-Id: I6e8162d1a6caf75e89c9f9c9f9522730aebf83ae	2021-01-11 10:26:58 +01:00
Moby von Briesen	6e2ef3b9ee	Revert "satellite/satellitedb: Do not consider nodes with offline_suspended as reputable." This reverts commit `e24262c2c9`. Change-Id: I287deb2e52d03bbd698ed055f0f216b0b5bf2798	2021-01-04 14:28:37 +00:00
Michał Niewrzał	ad3e3a38c5	Merge 'main' branch Change-Id: Ia0db1b1f9ef3e0671d3f2208881b0abc3064e200	2021-01-04 12:13:45 +01:00
Moby von Briesen	825dc71227	satellite/{overlay, satellitedb}: Refactor audit history * Separate audit history interface into its own file in the overlay package * Add overlay.AuditHistory struct so that internalpb.AuditHistory is only used from within the database layer * Add overlay.GetAuditHistory function for features that will require access to detailed audit history information * Do not return full audit history from UpdateAuditHistory - callers to that function only need to know the online score and whether a full tracking period has been completed * Move audit history tests out of satellite/satellitedb, since they are independent of database implementation Change-Id: I35b0c4ac23bbaabd80624f8a9631c3cb1a1f33bd	2020-12-29 18:50:22 +00:00
Moby von Briesen	e24262c2c9	satellite/satellitedb: Do not consider nodes with offline_suspended as reputable. Nodes which are offline_suspended will no longer be considered for new uploads. The current threshold that enters a node into offline suspension is 0.6. Disqualification for offline suspension is still disabled. Change-Id: I0da9abf47167dd5bf6bb21e0bc2186e003e38d1a	2020-12-29 17:59:09 +00:00
Ethan Adams	6070018021	satellite/overlay: use AS OF SYSTEM TIME with Cockroach Query nodes table using AS OF SYSTEM TIME '-10s' (by default) when on CRDB to alleviate contention on the nodes table and minimize CRDB retries. Queries for standard uploads are already cached, and node lookups for graceful exit uploads has retry logic so it isn't necessary for the nodes returned to be current.	2020-12-22 21:07:07 +02:00
Michal Niewrzal	8d3ea9c251	satellite/repair/repairer: implement SegmentRepairer with metabase Change-Id: I647c625e00a626c44e812602ad9bc3e85a7b602c	2020-12-17 10:47:21 +00:00
Stefan Benten	494bd5db81	all: golangci-lint v1.33.0 fixes (#3985)	2020-12-05 17:01:42 +01:00
Cameron Ayer	dc67ce74c9	satellite: remove IsUp field from overlay.UpdateRequest With the new overlay.AuditOutcome type for offline audits, the IsUp field is redundant. If AuditOutcome != AuditOffline, then the node is online. In addition to removing the field itself, other changes needed to be made regarding the relationship between 'uptime' and 'audits'. Previously, uptime and audit outcome were completely separated. For example, it was possible to update a node's stats to give it a successful/failed/unknown audit while simultaneously indicating that the node was offline by setting IsUp to false. This is no longer possible under this changeset. Some tests which did this have been changed slightly in order to pass. Also add new benchmarks for UpdateStats and BatchUpdateStats with different audit outcomes. Change-Id: I998892d615850b1f138dc62f9b050f720ea0926b	2020-11-02 15:34:17 -05:00
Egon Elbre	11338e9beb	satellite/internalpb: move audithistory.pb Change-Id: I8eee84d49ed90459168ddaf04ae57f790c2a22c4	2020-10-30 15:30:11 +02:00
Cameron Ayer	bb7be23115	satellite/{audit,overlay,satellitedb}: enable reporting offline audits - Remove flag for switching off offline audit reporting. - Change the overlay method used from UpdateUptime to BatchUpdateStats, as this is where the new online scoring is done. - Add a new overlay.AuditOutcome type: AuditOffline. Since we now use the same method to record offline audits as success, failure, and unknown, we need to distinguish offline audits from the rest. Change-Id: Iadcfe10cf13466fa1a1c2dc542db8994a6423355	2020-10-27 10:44:46 +00:00
Egon Elbre	2268cc1df3	all: fix linter complaints Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95	2020-10-13 15:59:01 +03:00
Cameron Ayer	b39a99bae6	satellite/{overlay,satellitedb}: always show node's real online score Previously if a node did not have audit history data for each of the windows over the tracking period, we would give them the benefit of the doubt and set their score to 1. This was to prevent nodes from being suspended right out the gate. We need a minimum amount of data to evaluate them. However, a node who is actually failing at being online will have no idea until they have received enough audits and we suspend them. Instead, we will always use their real score, but use a flag to determine whether they are eligible for suspension/dq. Change-Id: I382218f12e8770f95d4bcddcf101ef348940cadf	2020-10-02 12:28:11 -04:00
Jennifer Johnson	4e2413a99d	satellite/satellitedb: uses vetted_at field to select for reputable nodes Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1. This is because many tests relied on selecting nodes that were reputable based on audit and uptime counts of 0, in effect, selecting new nodes as reputable ones. However, since reputation is now indicated by a vetted_at db field that is explicitly set rather than implied by audit and uptime counts, it would be more complicated to try to update all of the nodes' reputations before selecting nodes for tests. Now we just allow all test nodes to be new if needed. Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d	2020-09-04 16:45:32 +00:00
Moby von Briesen	2d01dd9732	satellite/satellitedb: Add online_score column to nodes table Add online score used for the new audit history offline tracking system to the nodes table. This allows us easy access to the node's online score for the storagenode dashboard as well as for data analysis. Change-Id: Ie99be1192e5236862a5b3dbed2e5ef03b9169410	2020-08-31 15:07:07 +00:00
Moby von Briesen	60a95d0dc9	satellite/{satellitedb,overlay}: Enable offline suspension and review period When a node's audit history "online score" passes below a configured threshold, the node goes into "offline suspension" mode and begins a review period, where the operator is given an opportunity to bring their node back online. After the review period passes, offline suspension is turned off for the node. In the future, if a node still has a bad online score at the end of the review period, it will be disqualified. This is disabled right now. In the future, if a node is in offline suspension, it will be treated as "unhealthy". Right now, there are no consequences for being in offline suspension. Minor changes: * Moves AuditHistoryConfig out of UpdateStats/BatchUpdateStats args and into UpdateRequest. * Adds "now" argument to UpdateStats/BatchUpdateStats args for easy testing. * Changes formatting strings inside buildUpdateStatement to use specific types. Change-Id: I032b60298840fc16e6ef831da750f2d57619a397	2020-08-28 16:35:48 +00:00
Moby von Briesen	959cd5cd83	satellite/satellitedb: Update audit history from overlay.UpdateStats and overlay.BatchUpdateStats Change-Id: Ib530b61895ca4a8b12ba022c408a416b237b56d7	2020-08-20 22:46:28 +00:00
Moby von Briesen	5f0477ebe9	satellite/{overlay,satellitedb}: Create database functionality for updating audit history Add a function to the overlay cache called UpdateAuditHistory, which allows us to add online or offline audits to a particular node's audit history, and get that node's "online score" for the configured tracking period. The next step will be to use UpdateAuditHistory from inside BatchUpdateStats/UpdateStats, so that audit history is actually updated when nodes get audited, and we can suspend nodes based on their online score. Change-Id: I2289105e6961e68e829a987ff756b0e576fab120	2020-08-20 17:34:27 +00:00
Qweder93	01bb2bd17d	satellite/audit: verifier checks if node made a successful GE before auditing Change-Id: Ia6cde4e9fcf11020a5301d38065f7159f276eb80	2020-08-17 23:37:57 +03:00
Moby von Briesen	5dfe27f175	satellite/{repair,overlay}: Use overlay NodeSelectionCache for repair uploads This change removes the overlay function FindStorageNodesForRepair, which skips using the node selection cache and hits the database directly. Otherwise, it is functionally identical to FindStorageNodesForUpload, which checks the node selection cache first. When selecting nodes for PUT_REPAIRs, we now call FindStorageNodesForUpload instead of FindStorageNodesForRepair to reduce database load. Change-Id: If34e109695b2ed2b8fb6759115bf769a3459684e	2020-08-04 12:50:12 -04:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Cameron Ayer	3b4b5f45c7	satellite: replace references to Suspended with UnknownAuditSuspended Change-Id: I3d2d00c95954c0546ad077702617895f262926ef	2020-06-23 14:19:22 +00:00
Egon Elbre	f68e7b3fde	satellite/overlay: replace pb.InfoResponse pb.InfoResponse wasn't used for protocol buffer communication, but instead as a satellite type. Change-Id: I755619f2deec5b76c4fe488591b7d8c1b9fcdafb	2020-06-16 15:16:55 +03:00
Jennifer Johnson	03e5f922c3	satellite/overlay: updates node with a vetted_at timestamp if they meet the vetting criteria What: As soon as a node passes the vetting criteria (total_audit_count and total_uptime_count are greater than the configured thresholds), we set vetted_at to the current timestamp. Why: We may want to use this timestamp in future development to select new vs vetted nodes. It also allows flexibility in node vetting experiments and allows for better metrics around vetting times. Please describe the tests: satellitedb_test: TestUpdateStats and TestBatchUpdateStats make sure vetted_at is set appropriately Please describe the performance impact: This change does add extra logic to BatchUpdateStats and UpdateStats and commits another variable to the db (vetted_at), but this should be negligible. Change-Id: I3de804549b5f1bc359da4935bc859758ceac261d	2020-05-20 16:30:26 -04:00
Egon Elbre	5d016425f1	satellite/{contact,downtime,overlay}: use NodeURL Change-Id: I555a479a89e0ddbf0499898bdbc8574282cd6846	2020-05-20 11:09:05 +00:00
Egon Elbre	678b859172	satellite/overlay: remove MinimumRequiredNodes In non-test code we were only using RequestedCount, so there is no need to have MinimumRequiredNodes. Change-Id: I40736f4b028b41e94abfdeb221bce5aa86a5cb82	2020-05-07 15:41:23 +00:00
Egon Elbre	2955c50bc1	satellite/overlay: fix data-race in node selection cache GetNodes returned references to nodes in the immutable state, however some parts of code expect them to be modified. Change-Id: I5be1866f95e0dbe062a6b6be60e29f2365c35faa	2020-05-06 20:03:06 +03:00
Egon Elbre	4e94da3fda	satellite/overlay: add feature flag for node selection cache Also distinguish the purpose for selecting nodes to avoid potential confusion, what should allow caching and what shouldn't. Change-Id: Iee2451c1f10d0f1c81feb1641507400d89918d61	2020-05-06 16:13:47 +03:00
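
Commits bb7be23115 and dc67ce74c9 replace the separate IsUp flag with a single audit outcome: if AuditOutcome != AuditOffline, the node is online. The following is a minimal sketch of that derivation. Only AuditOffline is named in the commits. The other constant names, their order, and `nodeIsUp` are hypothetical.

```go
package main

import "fmt"

// AuditOutcome follows the idea of overlay.AuditOutcome; constant names
// other than AuditOffline, and all values, are assumed for illustration.
type AuditOutcome int

const (
	AuditSuccess AuditOutcome = iota
	AuditFailure
	AuditUnknown
	AuditOffline
)

// nodeIsUp derives the old IsUp flag from the outcome: any outcome other
// than AuditOffline means the node responded.
func nodeIsUp(o AuditOutcome) bool {
	return o != AuditOffline
}

func main() {
	for _, o := range []AuditOutcome{AuditSuccess, AuditFailure, AuditUnknown, AuditOffline} {
		fmt.Println(o, nodeIsUp(o))
	}
}
```

With one field there is no way to record a successful, failed, or unknown audit for a node that is also marked offline. That is the inconsistency dc67ce74c9 removes.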


93 Commits