Make sure that suspended nodes are treated appropriately by the overlay
cache. This means we should expect the following behavior:
* suspended nodes (vetted or not) should not be selected for uploading
new segments
* suspended nodes should be treated by the checker and repairer as
"unhealthy", and should be removed upon successful repair
This commit also removes unused overlay functionality.
Fixes a bug with commit 8b72181a1f where
the audit reporter was automatically suspending nodes regardless of
audit outcome (see test added).
Tests:
* updates repair tests to ensure that a suspended node is treated as
unhealthy and will be removed from the pointer on successful repair
* updates overlay tests for KnownUnreliableOrOffline and KnownReliable
to expect suspended nodes to be considered "unreliable"
* adds satellitedb test that ensures overlay.SelectStorageNodes and
overlay.SelectNewStorageNodes do not include suspended nodes
* adds audit reporter test to ensure that different audit outcomes
result in the correct suspended/disqualified states
Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561
* change overlay.UpdateStats to allow a third audit outcome. Now it can
handle successful, failed, and unknown audits.
* when "unknown audit reputation"
(unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the
DQ threshold, put node into suspension.
* when unknown audit reputation goes above the DQ threshold, remove node
from suspension.
* record unknown audits from audit reporter.
* add basic tests around unknown audits and suspension.
Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1
My understanding is that the nodes table has the following fields:
- `address` field which can be a hostname or an IP
- `last_net` field that is the /24 subnet of the IP resolved from the address
This PR does the following:
1) add back the `last_ip` field to the nodes table
2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time.
3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date
4) try to reduce confusion about hostname, ip, subnet, and address in the code base
Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac
With the new storage node downtime tracking feature, we need remove current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and
UptimeReputationDQ. This is the first step of removing the uptime
reputation columns from satellitedb
Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36
* satellite/nodeselection: dont select nodes that havent checked in for a while
* change testplanet online window to one minute
* remove satellite reconfigure online window = 0 in repair tests
* pass timestamp into UpdateCheckIn
* change timestamp to timestamptz
* edit tests to set last_contact_success to 4 hours ago
* fix syntax error
* remove check for last_contact_success > last_contact_failure in IsOnline
* create upsert query for check-in method
* add tests
* fix lint err
* add benchmark test for db query
* fix lint and tests
* add a unit test, fix lint
* add address to tests
* replace print w/ b.Fatal
* refactor query per CR comments
* fix disqualified, only set if null
* fix query
* add version to updatecheckin query
* fix version
* fix tests
* change version for tests
* add version to tests
* add IP, add transport, mv unit test
* use node.address as arg
* add last ip
* fix lint
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair