Commit Graph

99 Commits

Author SHA1 Message Date
Clement Sam
bce022ea7a satellite/overlay: remove Type field from NodeDossier
The overlay.NodeDossier struct only tracks information about a
storagenode, the field is deprecated and no longer needed.
This is a kademlia left-over.

Updates https://github.com/storj/storj/issues/5426

Change-Id: Ie278ffd88d1b9a9fde6c81eb5f0e287bab8c9ef0
2023-10-18 18:21:26 +00:00
Márton Elek
98921f9faa satellite/overlay: fix placement selection config parsing
When we do `satellite run api --placement '...'`, the placement rules are not parsed well.

The problem is based on `viper.AllSettings()`, and the main logic is sg. like this (from a new unit test):

```
		r := ConfigurablePlacementRule{}
		err := r.Set(p)
		require.NoError(t, err)
		serialized := r.String()

		r2 := ConfigurablePlacementRule{}
		err = r2.Set(serialized)
		require.NoError(t, err)

		require.Equal(t, p, r2.String())
```

All settings evaluates the placement rules in `ConfigurablePlacementRules` and stores the string representation.

The problem is that we don't have proper `String()` implementation (it prints out the structs instead of the original definition.

There are two main solutions for this problem:

 1. We can fix the `String()`. When we parse a placement rule, the `String()` method should print out the original definition
 2. We can switch to use pure string as configuration parameter, and parse the rules only when required.

I feel that 1 is error prone, we can do it (and in this patch I added a lot of `String()` implementations, but it's hard to be sure that our `String()` logic is inline with the parsing logic.

Therefore I decided to make the configuration value of the placements a string (or a wrapper around string).

That's the main reason why this patch seems to be big, as I updated all the usages.

But the main part is in beginning of the `placement.go` (configuration parsing is not a pflag.Value implementation any more, but a separated step).

And `filter.go`, (a few more String implementation for filters.

https://github.com/storj/storj/issues/6248

Change-Id: I47c762d3514342b76a2e85683b1c891502a0756a
2023-09-21 14:31:41 +00:00
Márton Elek
e2006d821c satellite/overlay: change Reliable and KnownReliable
as GetParticipatingNodes and GetNodes, respectively.

We now want these functions to include offline and suspended nodes as
well, so that we can force immediate repair when pieces are out of
placement or in excluded countries. With that change, the old names no
longer made sense.

Change-Id: Icbcbad43dbde0ca8cbc80a4d17a896bb89b078b7
2023-09-02 23:34:50 +00:00
Márton Elek
f7b39aaed4 satellite/nodeselection: remove stats/size from nodeselection state
stats/size/count is not used by any production code, and it's not required, as we can assert the state with other checks.

real motivation: next commits will make the Selector of the State configurable, therefore we won't have one single Stat, it depends on the request parameters.

(we plan to support both network and id based randomization)

Change-Id: I631828fc0046d2fef5b7a674fc0268a0446e9655
2023-08-01 18:29:41 +00:00
Michal Niewrzal
1d62dc63f5 satellite/repair/repairer: fix NumHealthyInExcludedCountries calculation
Currently, we have issue were while counting unhealthy pieces we are
counting twice piece which is in excluded country and is outside segment
placement. This can cause unnecessary repair.

This change is also doing another step to move RepairExcludedCountryCodes
from overlay config into repair package.

Change-Id: I3692f6e0ddb9982af925db42be23d644aec1963f
2023-07-10 12:01:19 +02:00
Márton Elek
97a89c3476 satellite: switch to use nodefilters instead of old placement.AllowedCountry
placement.AllowedCountry is the old way to specify placement, with the new approach we can use a more generic (dynamic method), which can check full node information instead of just the country code.

The 90% of this patch is just search and replace:

 * we need to use NodeFilters instead of placement.AllowedCountry
 * which means, we need an initialized PlacementRules available everywhere
 * which means we need to configure the placement rules

The remaining 10% is the placement.go, where we introduced a new type of configuration (lightweight expression language) to define any kind of placement without code change.

Change-Id: Ie644b0b1840871b0e6bbcf80c6b50a947503d7df
2023-07-07 16:55:45 +00:00
Márton Elek
70cdca5d3c
satellite: move satellite/nodeselection/uploadselection => satellite/nodeselection
All the files in uploadselection are (in fact) related to generic node selection, and used not only for upload,
but for download, repair, etc...

Change-Id: Ie4098318a6f8f0bbf672d432761e87047d3762ab
2023-07-07 10:32:03 +02:00
Michal Niewrzal
21c1e66a85 satellite/overlay: refactor ReliabilityCache to keep more data
ReliabilityCache will be now using refactored overlay Reliable method.
This method will provide more info about nodes (e.g. country code) and
with this we are able to add two dedicated methods to classify pieces:
* OutOfPlacementPieces
* PiecesNodesLastNetsInOrder

With those new method we will fix issue where offline but reliable node
won't be checked for clumped pieces and off placement pieces.

https://github.com/storj/storj/issues/5998

Change-Id: I9ffbed9f07f4881c9db3bd0e5f0412f1a418dd82
2023-07-05 11:19:10 +02:00
Michal Niewrzal
f2cd7b0928 satellite/overlay: refactor Reliable to be used with repair checker
Currently we are using Reliable to get missing pieces for repair
checker. The issue is that now checker is looking at more things than
just missing pieces (clumped/off, placement pieces) and using only node
ID is not enough. We have issue where we are skipping offline nodes from
clumped and off placement pieces check.

Reliable was refactored to get data (e.g. country, lastNet) about all
reliable nodes. List is split into online and offline. This data will be
cached for quick use by repair checker. It will be also possible to
check nodes metadata like country code or lastNet.

We are also slowly moving `RepairExcludedCountryCodes` config from
overlay to repair which makes more sens for it.

This this first part of changes.

https://github.com/storj/storj/issues/5998

Change-Id: If534342488c0e440affc2894a8fbda6507b8959d
2023-07-05 10:56:31 +02:00
Márton Elek
500b6244f8
satellite/satellitedb: create table for node tags
Change-Id: I884bb740974e6b8241aa6b85faf266b85fe892d4
2023-07-05 09:38:53 +02:00
Márton Elek
d38b8fa2c4 satellite/nodeselection: use the same Node object from overlay and nodeselection
We use two different Node types in `overlay` and `uploadnodeselection` and converting back and forth.

Using the same object would allow us to use a unified node selection interface everywhere.

Change-Id: Ie71e29d60184ee0e5b4547eb54325f09c418f73c
2023-07-03 16:59:33 +00:00
Michal Niewrzal
98f4f249b2 satellite/overlay: refactor KnownReliable to be used with repairer
Currently we are using KnownUnreliableOrOffline to get missing pieces
for segment repairer (GetMissingPieces). The issue is that now repairer
is looking at more things than just missing pieces (clumped/off
placement pieces).

KnownReliable was refactored to get data (e.g. country, lastNet) about
all reliable nodes from provided list. List is split into online and
offline. This way we will be able to use results from this method to all
checks: missing pieces, clumped pieces, out of placement pieces.

This this first part of changes to handle different kind of pieces in
segment repairer.

https://github.com/storj/storj/issues/5998

Change-Id: I6cbaf59cff9d6c4346ace75bb814ccd985c0e43e
2023-06-27 13:27:23 +02:00
Michal Niewrzal
9e3fd4d514 satellite/overlay: delete unused method
Change-Id: I87828fcac4f4a9fb08c86af188aa6ea28c5c64af
2023-06-22 12:45:59 +00:00
Michal Niewrzal
fe21fd42f7 satellite/overlay: add GetNodesOutOfPlacement method
We would like to verify if nodes matches specific placement e.g. to
validate segment pieces are correctly geofenced.

https://github.com/storj/storj/issues/5896

Change-Id: I842767dccc121a3c60224f677ab55e5dc150c76e
2023-05-30 14:57:20 +02:00
Michal Niewrzal
36e046375c satellite/repair/checker: remove segments loop parts
We are switching completely to ranged loop.

https://github.com/storj/storj/issues/5368

Change-Id: I8583549973cd36aa0e0c482c20d7a75cb7568ab3
2023-05-08 12:19:13 +00:00
Michal Niewrzal
1aa24b9f0d satellite/audit: remove segments loop parts
We are switching completely to ranged loop.

https://github.com/storj/storj/issues/5368

Change-Id: I9cec0ac454f40f19d52c078a8b1870c4d192bd7a
2023-04-24 15:52:11 +00:00
Márton Elek
ffaf15a3b0 satellite/overlay: remove unused mail service from overlay
It was surprising that `satellite auditor` complained about SMTP mail settings, even if it's not supposed to sending any mail.

Looks like we can remove the mail service dependency, as it's not a hard requirement for overlay.Service.

Change-Id: I29a52eeff3f967ddb2d74a09458dc0ee2f051bd7
2023-03-09 12:17:35 +00:00
paul cannon
2522ff09b6 satellite/overlay: configurable meaning of last_net
Up to now, we have been implementing the DistinctIP preference with code
in two places:

 1. On check-in, the last_net is determined by taking the /24 or /64
    (in ResolveIPAndNetwork()) and we store it with the node record.
 2. On node selection, a preference parameter defines whether to return
    results that are distinct on last_net.

It can be observed that we have never yet had the need to switch from
DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the
same satellite, and we will probably never need to do so in an automated
way. It can also be observed that this arrangement makes tests more
complicated, because we often have to arrange for test nodes to have IP
addresses in different /24 networks (a particular pain on macOS).

Those two considerations, plus some pending work on the repair framework
that will make repair take last_net into consideration, motivate this
change.

With this change, in the #2 place, we will _always_ return results that
are distinct on last_net. We implement the DistinctIP preference, then,
by making the #1 place (ResolveIPAndNetwork()) more flexible. When
DistinctIP is enabled, last_net will be calculated as it was before. But
when DistinctIP is _off_, last_net can be the same as address (IP and
port). That will effectively implement !DistinctIP because every
record will have a distinct last_net already.

As a side effect, this flexibility will allow us to change the rules
about last_net construction arbitrarily. We can do tests where last_net
is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and
be able to exercise the production logic without requiring a virtual
network bridge.

This change should be safe to make without any migration code, because
all known production satellite deployments use DistinctIP, and the
associated last_net values will not change for them. They will only
change for satellites with !DistinctIP, which are mostly test
deployments that can be recreated trivially. For those satellites which
are both permanent and !DistinctIP, node selection will suddenly start
acting as though DistinctIP is enabled, until the operator runs a single
SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done
either before or after deploying software with this change.

I also assert that this will not hurt performance for production
deployments. It's true that adding the distinct requirement to node
selection makes things a little slower, but the distinct requirement is
already present for all production deployments, and they will see no
change.

Refs: https://github.com/storj/storj/issues/5391
Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924
2023-03-09 02:20:12 +00:00
JT Olio
686faeedbd satellite/overlay: return noise info with selected nodes
we have two more fields in the database (noise_proto and
noise_public_key) that now need to go into pb.NodeAddress when
returning AddressedOrderLimits.

the only real complication is making sure type conversions between
database types and NodeURLs and so on don't lose this new
pb.NodeAddress field (NoiseInfo). otherwise this is a relatively
straightforward commit

Change-Id: I45b59d7b2d3ae21c2e6eb95497f07cd388d454b3
2023-02-02 15:46:27 +00:00
JT Olio
2753d5a32f satellite/overlay: keep track of noise info per node
Change-Id: Icef04c3e87dbf4bb57d3837274c323bf6dd2c81f
2023-02-01 23:03:35 -05:00
Cameron
0596651580 satellite/satellitedb: fix updating nodes.last_software_update_email
The CASE expression used to determine which value to set
last_software_update_email to did not have an ELSE clause. Therefore,
when the node is both below the minimum version and did not receive a
version update email (no condition is true), the value would be set to
NULL.

Additionally, replace `time.Now()` with `timestamp` in the check to
determine if the email cooldown has passed.

Change-Id: I2e2e93f1a865e123ed8b665be9621cebfb72236f
2023-02-01 17:25:58 -05:00
paul cannon
b6bcb32ecf satellite/reputation: more accurate "reputation changes" list
`overlay.(*Service).UpdateReputation()` takes a "reputationChanges"
parameter, a slice of node events indicating whether we think the node's
disqualification or suspension status is changing. This is necessary so
that the overlay service can notify the nodeevents DB about these
changes.

In several cases, however, this list of events is not constructed
correctly, because of missing information about the previous state.
In most cases, this is because the node was offline, and the order limit
creation functions (which usually obtain and return the prior reputation
status) ignored that node.

This change makes it so that all callers to
`overlay.(*Service).UpdateReputation()` can be expected to provide a
correct list of change events (as correct as feasible, given that we
can't lock the node's information in the database during the entire
operation).

It ended up that there was only one caller we needed to worry about, and
that was reputation.(*Service).ApplyAudit(). So the bulk of this change
is teaching that function how to recognize when the prior reputation
status was not filled in, and fill it in.

Refs: https://github.com/storj/storj/issues/5464
Change-Id: I52ce385fc9c0ce3b283b998d517998e7f4ec8792
2023-01-31 18:39:40 +00:00
JT Olio
e40191afd6 storj: upgrade to use latest storj/common NodeAddress
Change-Id: I5987391bcfe5f6dfd7b525698c337a4cbda9b76e
2023-01-25 01:37:26 +00:00
Cameron
8681a36164 satellite/overlay: add ability to get offline nodes in need of email
Add LastOfflineEmail to overlay.NodeDossier. This is the last time a
node got an offline email. Add two new overlay db methods,
GetOfflineNodesForEmail and UpdateLastOfflineEmail. Edit db method
UpdateCheckIn to nullify last_offline_email if node is up.

Change-Id: I1ee60e7d98dd1b68348a57f9a4fb77c6c9895d6d
2022-11-17 19:03:04 +00:00
Cameron
2a25974261 satellite/overlay: insert node software update events into node events
When a node checks in and its version is below the minimum, insert
BelowMinVersion event into node events

Change-Id: I0e437ac34496778369515cbc40c15676da8b27ae
2022-11-11 20:43:56 +00:00
Cameron
74ddfab810 satellite/overlay: insert DQ event into node events in overlay.DisqualifyNode
Also, return node email from overlaycache db DisqualifyNode to be used
in node events insertion

Change-Id: I41534cf01351c1690c3966a8055c5fe6fcf0d6a6
2022-11-04 15:18:31 +00:00
Cameron
68fe26ebe5 satelite/overlay: insert reputation events into node events
Insert reputation event into node events if reputation change occurs.

Change-Id: If1c5526092cb6834fe2faa6aa6e0306d4d88a4b7
2022-11-02 18:32:20 +00:00
Cameron
865974950d satellite/overlay: insert node online events into node events table
Instead of sending emails at the time the node is seen to be back
online, we have decided to send the event to the node events table,
which will initiate the email sending process at some point.

Change-Id: Id756209498112579de8e78ee20ad2df54571a617
2022-11-02 16:26:19 +00:00
Cameron
f06da25c3d satellite/overlay: add nodeevents.DB to satellite overlay service
Add nodeevents.DB to satellite overlay service so we can insert node
events into the nodeevents DB.

Change-Id: I642c0ccc9941ecdb08cb22d5c8cf701959a55156
2022-11-02 15:56:37 +00:00
Cameron
a2ca443e29 satellite/overlay: send Node Online emails
Send an email when a node goes from offline to online.

Change-Id: I82d3f9001b9b0669e096d784edf097ffdcb1697d
2022-10-20 18:56:35 +00:00
Cameron
a52f766273 satellite/overlay: add email-sending functionality to overlay service
We want to send emails to SNOs. Node status changes go through the
overlay service, so it's a good place to add the mail service.
Add the mailservice.Service, satellite address, and satellite name to
overlay service. Also add feature flag --overlay.send-node-emails

Change-Id: I3bd2cb3bf22f9724954ce2374f8b651b902b3a24
2022-10-13 18:01:05 +00:00
Egon Elbre
48b0a65fbd satellite/overlay: use ReadCache in Download/UploadSelectionCache
sync2.ReadCache implements preemptive refreshing preventing stalling
while it's being updated.

Change-Id: Iee9ef36049b986f0e426c14a139b2bc9ac17fb53
2022-07-12 13:52:48 +03:00
Yaroslav Vorobiov
3f47d19aa6 satellite/overlay: add disqualification reason
Add disqualification reason to NodeDossier.
Extend DB.DisqualifyNode with disqualification reason.
Extend reputation Service.TestDisqualifyNode with disqualification reason.

Change-Id: I8611b6340c7f42ac1bb8bd0fd7f0648ad650ab2d
2022-04-20 13:29:31 +00:00
Yaroslav Vorobiov
4223fa01f8 satellite/reputation: add disqualification reason for status update
Set disqualification reason when reputations stats are updated on DB.Update.
Added tests for DisqualifyNode and for disqualification cases which happens during Update.

Change-Id: I00130ab5d9722422805159ad2f183c205de60f7e
2022-04-20 13:29:10 +00:00
Fadila Khadar
29fd36a20e satellite/repairer: handle excluded countries
For nodes in excluded areas, we don't necessarily want to remove them
from the pointer, but we do want to increase the number of pieces in the
segment in case those excluded area nodes go down. To do that, we
increase the number of pieces repaired by the number of pieces in
excluded areas.

Change-Id: I0424f1bcd7e93f33eb3eeeec79dbada3b3ea1f3a
2022-03-14 10:59:36 -04:00
Fadila Khadar
e776c65172 satellite/checker: pieces in excluded countries are not healthy
Add a RepairExcludedCountryCodes config flag for overlay for providing a list of country codes to exclude nodes from target repair selection.

Mark segments with less than repairThreshold pieces in countries not in the RepairExcludedCountryCodes as not healthy.
With this change, the repair process is not affected. The segment will be removed from the repair queue by the repairer.

Another change will handle the logic at the repairer level.

Fixes https://github.com/storj/team-metainfo/issues/95

Change-Id: I9231b32de117a116488de055a3e94efcabb46e81
2022-03-02 09:59:09 +00:00
Egon Elbre
be02aa9b17 satellite/overlay: add test for UpdateCheckIn panic
The database table got invalid input and the resulting error
was not checked. This adds updates that contain invalid fields
to trigger different errors.

Change-Id: Iacea32cbef5599aab562c88e4113073596cc9996
2022-01-24 15:01:59 +02:00
Yingrong Zhao
1f8f7ebf06 satellite/{audit, reputation}: fix potential nodes reputation status
inconsistency

The original design had a flaw which can potentially cause discrepancy
for nodes reputation status between reputations table and nodes table.
In the event of a failure(network issue, db failure, satellite failure, etc.)
happens between update to reputations table and update to nodes table, data
can be out of sync.
This PR tries to fix above issue by passing through node's reputation from
the beginning of an audit/repair(this data is from nodes table) to the next
update in reputation service. If the updated reputation status from the service
is different from the existing node status, the service will try to update nodes
table. In the case of a failure, the service will be able to try update nodes
table again since it can see the discrepancy of the data. This will allow
both tables to be in-sync eventually.

Change-Id: Ic22130b4503a594b7177237b18f7e68305c2f122
2022-01-06 21:05:59 +00:00
Egon Elbre
a42b9d1a48 all: fix uses of email.com
email.com is not a domain that should be used for examples
nor tests.

Change-Id: I654d4287d02633d5ed9740e81a79150470eeaf25
2022-01-05 16:29:19 +02:00
Cameron Ayer
bb21551a9c satellite/satellitedb: remove references to contained column in nodes table
We don't use this column for anything. If you want to know if a node is
contained, you can check the pending_audits table.

Change-Id: I8da1d8e01a2dcaff63c5067a7927b5451424ad04
2021-10-14 19:17:46 +00:00
Yingrong Zhao
646ce5b8cc satellite/overlay: remove reputation logic from overlay
Change-Id: I3492860e4537c7a8e4e824ec4c9c8d179134a0c0
2021-07-28 15:15:28 -04:00
Yingrong Zhao
f8914ccce0 satellite/{repair, overlay}: use reputation store in repair
Change-Id: I48db9e68f48239d48621ccc77d33618ecb83ce1a
2021-07-28 13:22:05 -04:00
Yingrong Zhao
e91574cee1 satellite/{reputation, gracefulexit}: use reputation store in
gracefulexit

With the effort to move audit related data into reputation store, this
PR updates gracefulexit endpoint to use reputation service to get a
node's audit score

Change-Id: Iad93ea689ad67ff9c57c7be16687e21e715fab7a
2021-07-28 13:21:41 -04:00
Yingrong Zhao
6c7bf357cd satellite/{reputation,audit,overlay}: replace overlay with reputation
package in audit

This PR implements reputation store and replace overlay in audit service
to use such store for storing node's audit stats.

In order to keep the changeset smaller, most of the changes in this PR is for copying audit logic in overlay to
reputation package. In a following PR, the duplicating code will be
removed from overlay.

Change-Id: I16c12494a0970f44c422b26cf603c1dc489e5bc1
2021-07-28 13:10:48 -04:00
Cameron Ayer
8c124c6fa4 satellite/{reputation,overlay,satellitedb}: create reputation service, DB, add overlay method UpdateReputation
Define service and DB interface for storing node reputation data
and updating the overlay cache.
Add overlay service and DB method UpdateReputation.
See https://github.com/storj/storj/pull/4144

Change-Id: Iedd8bd3274457d26c595919303d55327c1464b8c
2021-06-24 16:19:15 +00:00
Cameron Ayer
05f8d2d0b1 satellite/satellitedb: filter offline suspended nodes from selection
Change-Id: I5a6f413453332238d579a7bf50eb30e9156f96c2
2021-03-27 23:36:46 +00:00
Yaroslav Vorobiov
966535e9de {storagenode,satellite}/nodeoperator: add wallet features
Change-Id: Iac7eb40a52b8fddcc573aebaad2e3a30a10cded9
2021-02-08 22:09:45 +02:00
Egon Elbre
19e3dc4ec0 satellite/overlay: rename NodeSelectionCache to UploadSelectionCache
It wasn't obvious that NodeSelectionCache was only for uploads.

Change-Id: Ifeeaa6fdb50a4b7916245b48d8634d70ac54459c
2021-01-28 14:56:53 +02:00
Cameron Ayer
d14607a5f7 satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns
Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de
2021-01-19 15:43:02 -05:00
Cameron Ayer
0403e99a5b satellite/{overlay,satellitedb}: remove unused methods for old downtime tracking
GetSuccessfulNodeNotCheckedInSince and GetOfflineNodesLimited are overlay methods
which were only used by the previous downtime tracking system which has been removed.
These methods should also be removed.

Change-Id: Idb829d742e1f987e095604423fff656fe581183e
2021-01-11 15:21:28 +00:00