Commit Graph

17 Commits

Author SHA1 Message Date
paul cannon
0dcc0a9ee0 satellite/reputation: reconfigure lambda and alpha
This is in response to community feedback that our existing reputation
calculation is too likely to disqualify storage nodes unfairly with
extreme swings up and down.

For details and analysis, please see the data_loss_vs_dq_chance_sim.py
tool, the "tuning reputation further.ipynb" Jupyter notebook in the
storj/datascience repository, and the discussion at

    https://forum.storj.io/t/tuning-audit-scoring/14084

In brief: changing the lambda and initial-alpha parameters in this way
causes the swings in reputation to be smaller and less likely to put a
node past the disqualification threshold unfairly.

Note: this change will cause a one-time reset of all (non-disqualified)
node reputations, because the new initial alpha value of 1000 is
dramatically different, and the disqualification threshold is going to
be much higher.

Change-Id: Id6dc4ba8fde1be3db4255b72282207bab5491ca3
2022-08-17 18:52:53 +00:00
paul cannon
2f20bbf4d8 satellite/reputation: add a reputation write cache
This should lower the amount of database load coming from
reputation updates.

Change-Id: Iaacfb81480075261da77c5cc93e08b24f69f8949
2022-07-14 21:40:16 +00:00
Cameron Ayer
56fe636123 satellite/{reputation/satellitedb}: remove references to contained column in reputations table
We don't use this column for anything. If you want to know if a node is
contained, you can check the pending_audits table.

Change-Id: I5671722a5fc6e1749d3a49e187a56556000ff941
2021-10-14 19:59:03 +00:00
Cameron Ayer
bb21551a9c satellite/satellitedb: remove references to contained column in nodes table
We don't use this column for anything. If you want to know if a node is
contained, you can check the pending_audits table.

Change-Id: I8da1d8e01a2dcaff63c5067a7927b5451424ad04
2021-10-14 19:17:46 +00:00
Yingrong Zhao
6c7bf357cd satellite/{reputation,audit,overlay}: replace overlay with reputation
package in audit

This PR implements reputation store and replace overlay in audit service
to use such store for storing node's audit stats.

In order to keep the changeset smaller, most of the changes in this PR is for copying audit logic in overlay to
reputation package. In a following PR, the duplicating code will be
removed from overlay.

Change-Id: I16c12494a0970f44c422b26cf603c1dc489e5bc1
2021-07-28 13:10:48 -04:00
Kaloyan Raev
9aa61245d0 satellite/audits: migrate to metabase
Change-Id: I480c941820c5b0bd3af0539d92b548189211acb2
2020-12-17 14:38:48 +02:00
Cameron Ayer
bb7be23115 satellite/{audit,overlay,satellitedb}: enable reporting offline audits
- Remove flag for switching off offline audit reporting.
- Change the overlay method used from UpdateUptime to BatchUpdateStats, as this
is where the new online scoring is done.
- Add a new overlay.AuditOutcome type: AuditOffline. Since we now use the same
method to record offline audits as success, failure, and unknown, we need to
distinguish offline audits from the rest.

Change-Id: Iadcfe10cf13466fa1a1c2dc542db8994a6423355
2020-10-27 10:44:46 +00:00
Cameron Ayer
3b4b5f45c7 satellite: replace references to Suspended with UnknownAuditSuspended
Change-Id: I3d2d00c95954c0546ad077702617895f262926ef
2020-06-23 14:19:22 +00:00
Moby von Briesen
de366537a8 satellite/satellitedb/overlaycache: fix behavior around gracefully exited nodes
Sometimes nodes who have gracefully exited will still be holding pieces
according to the satellite. This has some unintended side effects
currently, such as nodes getting disqualified after having successfully
exited.
* When the audit reporter attempts to update node stats, do not update
stats (alpha, beta, suspension, disqualification) if the node has
finished graceful exit (audit/reporter_test.go TestGracefullyExitedNotUpdated)
* Treat gracefully exited nodes as "not reputable" so that the repairer
and checker do not count them as healthy (overlay/statdb_test.go
TestKnownUnreliableOrOffline, repair/repair_test.go
TestRepairGracefullyExited)

Change-Id: I1920d60dd35de5b2385a9b06989397628a2f1272
2020-04-28 23:58:43 +00:00
Cameron Ayer
02613407ae satellite/satellitedb: only suspend node if not already suspended
Whenever the node's reputation is updated, if its unknown audit
reputation is below the suspension threshold, its suspension field
is set to the current time. This could overwrite the previous
"suspendedAt" value resulting a node that never reaches the end of
its suspension.

Also log whenever a node is disqualified or its suspension status
changes

Change-Id: I5e8c8f1c46f66d79cb279b5b16a84fe03f533deb
2020-04-10 09:37:37 +00:00
Moby von Briesen
2f991b6c56 satellite/{overlay, satellitedb}: account for suspended field in overlay cache
Make sure that suspended nodes are treated appropriately by the overlay
cache. This means we should expect the following behavior:
* suspended nodes (vetted or not) should not be selected for uploading
new segments
* suspended nodes should be treated by the checker and repairer as
"unhealthy", and should be removed upon successful repair

This commit also removes unused overlay functionality.

Fixes a bug with commit 8b72181a1f where
the audit reporter was automatically suspending nodes regardless of
audit outcome (see test added).

Tests:
* updates repair tests to ensure that a suspended node is treated as
unhealthy and will be removed from the pointer on successful repair
* updates overlay tests for KnownUnreliableOrOffline and KnownReliable
to expect suspended nodes to be considered "unreliable"
* adds satellitedb test that ensures overlay.SelectStorageNodes and
overlay.SelectNewStorageNodes do not include suspended nodes
* adds audit reporter test to ensure that different audit outcomes
result in the correct suspended/disqualified states

Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561
2020-03-17 17:14:56 +00:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Egon Elbre
ee6c1cac8a
private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00
littleskunk
eeb38245ff
satellite/audit: improve logging (#3285) 2019-10-16 13:48:05 +02:00
Maximillian von Briesen
784ca1582a
satellite/audit: fix audit panic (#3217) 2019-10-09 10:06:58 -04:00
Natalie Villasana
aa3567187e
satellite/audit: worker now verifies and reverifies (#2965) 2019-09-11 18:37:01 -04:00
Egon Elbre
5d0816430f
rename all the things (#2531)
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
2019-07-28 08:55:36 +03:00