storj

Author	SHA1	Message	Date
Egon Elbre	b7a0739219	satellite/overlay: use DownloadSelectionCache for getting node IPs Change-Id: Ib8f4eedb2bf465767050693a1e961b37a294ca06	2021-01-29 16:47:10 +02:00
Egon Elbre	54e01d37f9	satellite/overlay: add DownloadSelectionCache Change-Id: Ic0779280172325f8d03f55a2e9673722f72bdd44	2021-01-29 16:47:06 +02:00
Egon Elbre	19e3dc4ec0	satellite/overlay: rename NodeSelectionCache to UploadSelectionCache It wasn't obvious that NodeSelectionCache was only for uploads. Change-Id: Ifeeaa6fdb50a4b7916245b48d8634d70ac54459c	2021-01-28 14:56:53 +02:00
littleskunk	0b2568d712	satellite/overlay/straynodes: increase development duration without contact Stopping storj-sim for over 5 minutes caused nodes to be disqualified. Set development max duration without contact to 300 days.	2021-01-26 12:24:39 +02:00
Cameron Ayer	d14607a5f7	satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de	2021-01-19 15:43:02 -05:00
Cameron Ayer	75d828200c	private,satellite: add chore to dq stray nodes Full scope: private/testplanet,satellite/{overlay,satellitedb} Description: In most cases, downtime tracking with audits will eventually lead to DQ for nodes who are unresponsive. However, if a stray node has no pieces, it will not be audited and will thus never be disqualified. This chore will check for nodes who have not successfully been contacted in some set time and DQ them. There are some new flags for toggling DQ of stray nodes and the timeframes for running the chore and how long nodes can go without contact. Change-Id: Ic9d41fdbf214736798925e728245180fb3c55615	2021-01-19 14:21:56 -05:00
Egon Elbre	85fb964afe	satellite/{metainfo,overlay}: improvements to GetObjectIPs * Deduplicate NodeID list prior to fetching IPs. * Use NodeSelectionCache for fetching reliable IPs. * Return number of segements, reliable pieces and all pieces. Change-Id: I13e679caab275488b4037624b840a4068dad9589	2021-01-14 09:12:45 +00:00
Cameron Ayer	0403e99a5b	satellite/{overlay,satellitedb}: remove unused methods for old downtime tracking GetSuccessfulNodeNotCheckedInSince and GetOfflineNodesLimited are overlay methods which were only used by the previous downtime tracking system which has been removed. These methods should also be removed. Change-Id: Idb829d742e1f987e095604423fff656fe581183e	2021-01-11 15:21:28 +00:00
Moby von Briesen	6e2ef3b9ee	Revert "satellite/satellitedb: Do not consider nodes with offline_suspended as reputable." This reverts commit `e24262c2c9`. Change-Id: I287deb2e52d03bbd698ed055f0f216b0b5bf2798	2021-01-04 14:28:37 +00:00
Moby von Briesen	edbee53888	satellite,storagenode: Pass audit history over GetStats endpoint Full prefix: satellite/{overlay,nodestats},storagenode/{reputation,nodestats} Allow the storagenode to receive its audit history data from the satellite via the satellite's GetStats endpoint. The storagenode does not save this data for use in the API yet. Change-Id: I9488f4d7a4ccb4ccf8336b8e4aeb3e5beee54979	2020-12-30 19:13:26 +00:00
Moby von Briesen	825dc71227	satellite/{overlay, satellitedb}: Refactor audit history * Separate audit history interface into its own file in the overlay package * Add overlay.AuditHistory struct so that internalpb.AuditHistory is only used from within the database layer * Add overlay.GetAuditHistory function for features that will require access to detailed audit history information * Do not return full audit history from UpdateAuditHistory - callers to that function only need to know the online score and whether a full tracking period has been completed * Move audit history tests out of satellite/satellitedb, since they are independent of database implementation Change-Id: I35b0c4ac23bbaabd80624f8a9631c3cb1a1f33bd	2020-12-29 18:50:22 +00:00
Moby von Briesen	e24262c2c9	satellite/satellitedb: Do not consider nodes with offline_suspended as reputable. Nodes which are offline_suspended will no longer be considered for new uploads. The current threshold that enters a node into offline suspension is 0.6. Disqualification for offline suspension is still disabled. Change-Id: I0da9abf47167dd5bf6bb21e0bc2186e003e38d1a	2020-12-29 17:59:09 +00:00
Ethan Adams	6070018021	satellite/overlay: use AS OF SYSTEM TIME with Cockroach Query nodes table using AS OF SYSTEM TIME '-10s' (by default) when on CRDB to alleviate contention on the nodes table and minimize CRDB retries. Queries for standard uploads are already cached, and node lookups for graceful exit uploads has retry logic so it isn't necessary for the nodes returned to be current.	2020-12-22 21:07:07 +02:00
Stefan Benten	494bd5db81	all: golangci-lint v1.33.0 fixes (#3985 )	2020-12-05 17:01:42 +01:00
Moby von Briesen	3fc76f4ffe	satellite/downtime: Remove deprecated downtime tracking service. We are no longer planning on implementing downtime penalization using the method described in docs/blueprints/archive/storage-node-downtime-tracking-deprecated.md. Now, we are implementing the design described in docs/blueprints/storage-node-downtime-tracking-with-audits.md. This change removes the downtime estimation chores from the satellite core as well as the package satellite/downtime. A future change will remove the database table. Change-Id: I1a1d3cf9dceeba36255d25243294865b89925518	2020-12-02 15:16:13 -05:00
Cameron Ayer	dc67ce74c9	satellite: remove IsUp field from overlay.UpdateRequest With the new overlay.AuditOutcome type for offline audits, the IsUp field is redundant. If AuditOutcome != AuditOffline, then the node is online. In addition to removing the field itself, other changes needed to be made regarding the relationship between 'uptime' and 'audits'. Previously, uptime and audit outcome were completely separated. For example, it was possible to update a node's stats to give it a successful/failed/unknown audit while simultaneously indicating that the node was offline by setting IsUp to false. This is no longer possible under this changeset. Some test which did this have been changed slightly in order to pass. Also add new benchmarks for UpdateStats and BatchUpdateStats with different audit outcomes. Change-Id: I998892d615850b1f138dc62f9b050f720ea0926b	2020-11-02 15:34:17 -05:00
Egon Elbre	11338e9beb	satellite/internalpb: move audithistory.pb Change-Id: I8eee84d49ed90459168ddaf04ae57f790c2a22c4	2020-10-30 15:30:11 +02:00
Egon Elbre	7ce372c686	satellite/internalpb: add inspectors Change-Id: Ib688e43d05135c0c31ae95df533f1e4535ea396a	2020-10-30 13:28:17 +02:00
Cameron Ayer	bb7be23115	satellite/{audit,overlay,satellitedb}: enable reporting offline audits - Remove flag for switching off offline audit reporting. - Change the overlay method used from UpdateUptime to BatchUpdateStats, as this is where the new online scoring is done. - Add a new overlay.AuditOutcome type: AuditOffline. Since we now use the same method to record offline audits as success, failure, and unknown, we need to distinguish offline audits from the rest. Change-Id: Iadcfe10cf13466fa1a1c2dc542db8994a6423355	2020-10-27 10:44:46 +00:00
Moby von Briesen	7c3afe164b	satellite/overlay: uncomment dq for offline and disable with feature flag Change-Id: Ib39e2be32e880b822a94eddfb81af99a38843a27	2020-10-16 12:55:16 +00:00
Egon Elbre	2268cc1df3	all: fix linter complaints Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95	2020-10-13 15:59:01 +03:00
Cameron Ayer	b39a99bae6	satellite/{overlay,satellitedb}: always show node's real online score Previously if a node did not have audit history data for each of the windows over the tracking period, we would give them the benefit of the doubt and set their score to 1. This was to prevent nodes from being suspended right out the gate. We need a minimum amount of data to evaluate them. However, a node who is actually failing at being online will have no idea until they have received enough audits and we suspend them. Instead, we will always use their real score, but use a flag to determine whether they are eligible for suspension/dq. Change-Id: I382218f12e8770f95d4bcddcf101ef348940cadf	2020-10-02 12:28:11 -04:00
Jennifer Johnson	4e2413a99d	satellite/satellitedb: uses vetted_at field to select for reputable nodes Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1. This is because many tests relied on selecting nodes that were reputable based on audit and uptime counts of 0, in effect, selecting new nodes as reputable ones. However, since reputation is now indicated by a vetted_at db field that is explicitly set rather than implied by audit and uptime counts, it would be more complicated to try to update all of the nodes' reputations before selecting nodes for tests. Now we just allow all test nodes to be new if needed. Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d	2020-09-04 16:45:32 +00:00
Moby von Briesen	2d01dd9732	satellite/satellitedb: Add online_score column to nodes table Add online score used for the new audit history offline tracking system to the nodes table. This allows us easy access to the node's online score for the storagenode dashboard as well as for data analysis. Change-Id: Ie99be1192e5236862a5b3dbed2e5ef03b9169410	2020-08-31 15:07:07 +00:00
Moby von Briesen	60a95d0dc9	satellite/{satellitedb,overlay}: Enable offline suspension and review period When a node's audit history "online score" passes below a configured threshold, the node goes into "offline suspension" mode and begins a review period, where the operator is given an opportunity to bring their node back online. After the review period passes, offline suspension is turned off for the node. In the future, if a node still has a bad online score at the end of the review period, it will be disqualified. This is disabled right now. In the future, if a node is in offline suspension, it will be treated as "unhealthy". Right now, there are no consequences for being in offline suspension. Minor changes: * Moves AuditHistoryConfig out of UpdateStats/BatchUpdateStats args and into UpdateRequest. * Adds "now" argument to UpdateStats/BatchUpdateStats args for easy testing. * Changes formatting strings inside buildUpdateStatement to use specific types. Change-Id: I032b60298840fc16e6ef831da750f2d57619a397	2020-08-28 16:35:48 +00:00
Moby von Briesen	959cd5cd83	satellite/satellitedb: Update audit history from overlay.UpdateStats and overlay.BatchUpdateStats Change-Id: Ib530b61895ca4a8b12ba022c408a416b237b56d7	2020-08-20 22:46:28 +00:00
Moby von Briesen	5f0477ebe9	satellite/{overlay,satellitedb}: Create database functionality for updating audit history Add a function to the overlay cache called UpdateAuditHistory, which allows us to add online or offline audits to a particular node's audit history, and get that node's "online score" for the configured tracking period. The next step will be to use UpdateAuditHistory from inside BatchUpdateStats/UpdateStats, so that audit history is actually updated when nodes get audited, and we can suspend nodes based on their online score. Change-Id: I2289105e6961e68e829a987ff756b0e576fab120	2020-08-20 17:34:27 +00:00
Qweder93	01bb2bd17d	satellite/audit: verifier checks if node made sucess GE before auditing Change-Id: Ia6cde4e9fcf11020a5301d38065f7159f276eb80	2020-08-17 23:37:57 +03:00
Egon Elbre	94a09ce20b	all: add missing dots Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a	2020-08-11 17:50:01 +03:00
Qweder93	4ee1b2d45a	storagenode/console: added list of all audits per satellite to sno dashboard/satellites Change-Id: I52e58748d6467f372d9a308347fc77e400d137e2	2020-08-10 12:55:07 +00:00
Moby von Briesen	e02adfe5e9	satellite/overlay/config.go: Add AuditHistoryConfig to overlay Adds AuditHistory{WindowSize, TrackingPeriod, GracePeriod, OfflineThreshold}. These values will be used to track offline audits over time, and to suspend/disqualify nodes for being offline for too long. Change-Id: I05f7dbc3c034bdc53c4fbd7719c71a44f37ec6a5	2020-08-04 18:18:56 +00:00
Moby von Briesen	5dfe27f175	satellite/{repair,overlay}: Use overlay NodeSelectionCache for repair uploads This change removes the overlay function FindStorageNodesForRepair, which skips using the node selection cache and hits the database directly. Otherwise, it is functionally identical to FindStorageNodesForUpload, which checks the node selection cache first. When selecting nodes for PUT_REPAIRs, we now call FindStorageNodesForUpload instead of FindStorageNodesForRepair to reduce database load. Change-Id: If34e109695b2ed2b8fb6759115bf769a3459684e	2020-08-04 12:50:12 -04:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
stefanbenten	257855b5de	all: replace == comparison with errors.Is Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714	2020-07-14 15:50:25 +00:00
Egon Elbre	5bdcd86fa7	ci: test benchmarks This runs each benchmark for one iteration to ensure that they are valid. Unfortunately, it does not give any useful metrics as output. Change-Id: I68940398c8dd849aed656bd12656f48d5df10128	2020-07-10 13:26:49 +00:00
Cameron Ayer	3b4b5f45c7	satellite: replace references to Suspended with UnknownAuditSuspended Change-Id: I3d2d00c95954c0546ad077702617895f262926ef	2020-06-23 14:19:22 +00:00
Egon Elbre	f68e7b3fde	satellite/overlay: replace pb.InfoResponse pb.InfoResponse wasn't used for protocol buffer communication, but instead as a satellite type. Change-Id: I755619f2deec5b76c4fe488591b7d8c1b9fcdafb	2020-06-16 15:16:55 +03:00
Cameron Ayer	bad299b541	satellite/satellitedb: serialize UpdateStats and BatchUpdateStats transactions Since we increased the number of audit workers from 1 to 2, we need to make sure concurrent updates do not trample each other. We can do this by serializing the transactions. Change-Id: If1b2f71cabe3c779c12ffa33c0c3271778ac3ae0	2020-06-10 17:11:28 +00:00
Egon Elbre	c5d4a13158	satellite/nodeselection: use NodeURL Change-Id: I2ebd4dbf993ff5c7864f3a3a665b5c8fc48aa7d1	2020-05-27 05:46:11 +00:00
littleskunk	2fbb34c3ea	nodeselection: Increase minimum free space to 500MB (#3898 )	2020-05-25 12:13:28 +02:00
Egon Elbre	bef84a5f9d	storagenode: remove dependency to overlay.NodeDossier This is the last dependency from storage node to satellite. Change-Id: I12f7abb91e84f823ba5af126c6e2979519838612	2020-05-21 08:37:13 +03:00
Jennifer Johnson	03e5f922c3	satellite/overlay: updates node with a vetted_at timestamp if they meet the vetting criteria What: As soon as a node passes the vetting criteria (total_audit_count and total_uptime_count are greater than the configured thresholds), we set vetted_at to the current timestamp. Why: We may want to use this timestamp in future development to select new vs vetted nodes. It also allows flexibility in node vetting experiments and allows for better metrics around vetting times. Please describe the tests: satellitedb_test: TestUpdateStats and TestBatchUpdateStats make sure vetted_at is set appropriately Please describe the performance impact: This change does add extra logic to BatchUpdateStats and UpdateStats and commits another variable to the db (vetted_at), but this should be negligible. Change-Id: I3de804549b5f1bc359da4935bc859758ceac261d	2020-05-20 16:30:26 -04:00
Egon Elbre	5d016425f1	satellite/{contact,downtime,overlay}: use NodeURL Change-Id: I555a479a89e0ddbf0499898bdbc8574282cd6846	2020-05-20 11:09:05 +00:00
Egon Elbre	941d10cbc3	private/testplanet: remove Peer.Local() Currently storagenode depends on overlay.NodeDossier, this is the first step in removing it. Change-Id: I034a3f1601835f8349bd41752455022e19bcc707	2020-05-20 11:05:34 +00:00
littleskunk	8ec64f3daf	satellite/overlay: enable node selection cache on all satellites (#3895 )	2020-05-19 19:25:53 +02:00
Egon Elbre	16abf02b35	satellite/{nodeselection,overlay}: use the new package Change-Id: I034fdbe578dec2e5c906aca82231cd3e56f26aeb	2020-05-18 21:38:43 +00:00
Jessica Grebenschikov	1ecbfc453f	satellite/overlay: add random test for node selection from cache Change-Id: I57b57db919b5afb3fc81176e078ac1e3a873565a	2020-05-18 16:59:52 +00:00
Egon Elbre	ec589a8289	all: fix comments about grpc Change-Id: Id830fbe2d44f083c88765561b6c07c5689afe5bd	2020-05-11 13:05:34 +03:00
Egon Elbre	678b859172	satellite/overlay: remove MinimumRequiredNodes In non-test code we were only using RequestedCount, not need to have MinimumRequiredNodes. Change-Id: I40736f4b028b41e94abfdeb221bce5aa86a5cb82	2020-05-07 15:41:23 +00:00
Egon Elbre	ce6a500b0c	satellite/overlay: support DistinctIP=false in selection cache Most of our tests and storj-sim are using DistinctIP. Also fix bug with newNodeCount calculation. Change-Id: I1a6d0efe7074908896a3322d19f487b929f0f0fc	2020-05-07 11:04:32 +00:00

1 2 3

126 Commits