Commit Graph

230 Commits

Author SHA1 Message Date
Márton Elek
ad87d1de74
satellite/satellitedb/overlaycache: fill node tags with join for limited number of nodes
The easiest way to get node information WITH node tags is executing two queries:

 1. select all nodes
 2. select all tags

And we can pair them with a loop, using the in-memory data structures.

But this approach does work only, if we select all nodes, which is true when we use cache (upload, download, repair checker).

But repair process selects only the required nodes, where this approach is suboptimal. (full table scan for all tags, even if we need only tags for a few dozens nodes).

Possible solutions:

 1. We can introduce a cache for repair (similar to upload cache)
 2. Or we can select both node and tag information with one query (join).

This patch implements the second approach.

Note: repair itself is quite slow (10-20 seconds per segements to repair). With 15 seconds execution time and 3 minutes cache staleness, we would use the cache only 12 times per worker. Probably we don't need cache for now.

https://github.com/storj/storj/issues/6198

Change-Id: I0364d94306e9815a1c280b71e843b8f504e3d870
2023-09-07 19:27:53 +02:00
Márton Elek
e2006d821c satellite/overlay: change Reliable and KnownReliable
as GetParticipatingNodes and GetNodes, respectively.

We now want these functions to include offline and suspended nodes as
well, so that we can force immediate repair when pieces are out of
placement or in excluded countries. With that change, the old names no
longer made sense.

Change-Id: Icbcbad43dbde0ca8cbc80a4d17a896bb89b078b7
2023-09-02 23:34:50 +00:00
Márton Elek
de7aabc8c9 satellite/{repair,rangedloop,overlay}: fix node tag placement selection for repair
This patch fixes the node tag based placement of rangedloop/repairchecker + repair process.

The main change is just adding the node tags for Reliable and KnownReliabel database calls + adding new tests to prove, it works.

https://github.com/storj/storj/issues/6126

Change-Id: I245d654a18c1d61b2c72df49afa0718d0de76da1
2023-08-16 15:45:41 +00:00
Egon Elbre
dc41978743 all: fix golangci failures
Change-Id: I07421388d53c837e35a4727cead26fc21c324d04
2023-08-09 11:44:44 +03:00
paul cannon
6e46a926bb satellite/nodeselection: expand SelectedNode
In the repair subsystem, it is necessary to acquire several extra
properties of nodes that are holding pieces of things or may be
selected to hold pieces. We need to know if a node is 'online' (the
definition of "online" may change somewhat depending on the situation),
if a node is in the process of graceful exit, and whether a node is
suspended. We can't just filter out nodes with all of these properties,
because sometimes we need to know properties about nodes even when the
nodes are suspended or gracefully exiting.

I thought the best way to do this was to add fields to SelectedNode,
and (to avoid any confusion) arrange for the added fields to be
populated wherever SelectedNode is returned, whether or not the new
fields are necessarily going to be used.

If people would rather I use a separate type from SelectedNode, I can do
that instead.

Change-Id: I7804a0e0a15cfe34c8ff47a227175ea5862a4ebc
2023-08-07 12:44:49 +00:00
Michal Niewrzal
1d62dc63f5 satellite/repair/repairer: fix NumHealthyInExcludedCountries calculation
Currently, we have issue were while counting unhealthy pieces we are
counting twice piece which is in excluded country and is outside segment
placement. This can cause unnecessary repair.

This change is also doing another step to move RepairExcludedCountryCodes
from overlay config into repair package.

Change-Id: I3692f6e0ddb9982af925db42be23d644aec1963f
2023-07-10 12:01:19 +02:00
Márton Elek
70cdca5d3c
satellite: move satellite/nodeselection/uploadselection => satellite/nodeselection
All the files in uploadselection are (in fact) related to generic node selection, and used not only for upload,
but for download, repair, etc...

Change-Id: Ie4098318a6f8f0bbf672d432761e87047d3762ab
2023-07-07 10:32:03 +02:00
Márton Elek
8b4387a498 satellite/satellitedb: add tag information to nodes selected for upload/downloads
Change-Id: I0fa7daebcf83f7949726e5fffe68e0bdc6fd1d7a
2023-07-07 07:54:16 +00:00
Michal Niewrzal
f2cd7b0928 satellite/overlay: refactor Reliable to be used with repair checker
Currently we are using Reliable to get missing pieces for repair
checker. The issue is that now checker is looking at more things than
just missing pieces (clumped/off, placement pieces) and using only node
ID is not enough. We have issue where we are skipping offline nodes from
clumped and off placement pieces check.

Reliable was refactored to get data (e.g. country, lastNet) about all
reliable nodes. List is split into online and offline. This data will be
cached for quick use by repair checker. It will be also possible to
check nodes metadata like country code or lastNet.

We are also slowly moving `RepairExcludedCountryCodes` config from
overlay to repair which makes more sens for it.

This this first part of changes.

https://github.com/storj/storj/issues/5998

Change-Id: If534342488c0e440affc2894a8fbda6507b8959d
2023-07-05 10:56:31 +02:00
Márton Elek
500b6244f8
satellite/satellitedb: create table for node tags
Change-Id: I884bb740974e6b8241aa6b85faf266b85fe892d4
2023-07-05 09:38:53 +02:00
Márton Elek
d38b8fa2c4 satellite/nodeselection: use the same Node object from overlay and nodeselection
We use two different Node types in `overlay` and `uploadnodeselection` and converting back and forth.

Using the same object would allow us to use a unified node selection interface everywhere.

Change-Id: Ie71e29d60184ee0e5b4547eb54325f09c418f73c
2023-07-03 16:59:33 +00:00
Michal Niewrzal
98f4f249b2 satellite/overlay: refactor KnownReliable to be used with repairer
Currently we are using KnownUnreliableOrOffline to get missing pieces
for segment repairer (GetMissingPieces). The issue is that now repairer
is looking at more things than just missing pieces (clumped/off
placement pieces).

KnownReliable was refactored to get data (e.g. country, lastNet) about
all reliable nodes from provided list. List is split into online and
offline. This way we will be able to use results from this method to all
checks: missing pieces, clumped pieces, out of placement pieces.

This this first part of changes to handle different kind of pieces in
segment repairer.

https://github.com/storj/storj/issues/5998

Change-Id: I6cbaf59cff9d6c4346ace75bb814ccd985c0e43e
2023-06-27 13:27:23 +02:00
Michal Niewrzal
eb407b2ae3 satellite/overlay: delete unused KnownOffline method
Change-Id: Ief9288fee83f9c381dd7840f48333babcd3d6bf7
2023-06-23 13:24:30 +00:00
Michal Niewrzal
9e3fd4d514 satellite/overlay: delete unused method
Change-Id: I87828fcac4f4a9fb08c86af188aa6ea28c5c64af
2023-06-22 12:45:59 +00:00
JT Olio
1437257dbf satellite: save and return which node features are enabled
current feature is if tcp fastopen was successfully enabled

Change-Id: Ide251863a9790b0fbebdf2e82dfd2afa8f25c408
2023-06-06 21:13:29 +00:00
Michal Niewrzal
c48bd81e5f satellite/satellitedb: update SelectAllStorageNodes* to set country code
Methods SelectAllStorageNodesUpload and SelectAllStorageNodesDownload
are not returning full info with overlay.SelectedNode because its
missing CountryCode.

Change-Id: Ie3cb396bf28d7ec4c6ab8927e5bb560236036aa6
2023-05-26 11:02:29 +00:00
paul cannon
915f3952af satellite/repair: repair pieces on the same last_net
We avoid putting more than one piece of a segment on the same /24
network (or /64 for ipv6). However, it is possible for multiple pieces
of the same segment to move to the same network over time. Nodes can
change addresses, or segments could be uploaded with dev settings, etc.
We will call such pieces "clumped", as they are clumped into the same
net, and are much more likely to be lost or preserved together.

This change teaches the repair checker to recognize segments which have
clumped pieces, and put them in the repair queue. It also teaches the
repair worker to repair such segments (treating clumped pieces as
"retrievable but unhealthy"; i.e., they will be replaced on new nodes if
possible).

Refs: https://github.com/storj/storj/issues/5391
Change-Id: Iaa9e339fee8f80f4ad39895438e9f18606338908
2023-04-06 17:34:25 +00:00
JT Olio
2a63225b98 satellite/{contact,satellitedb}: preserve node message debounce support
Change-Id: I453ad35fda4e61d068db0c476dd86b50d7f2d705
2023-03-20 16:13:06 +00:00
paul cannon
97e20bc579 scripts/tests: fix rollingupgrade test even more
This might be pretty awful, but at least it is a complete and non-flaky
solution.

**Only when using the rollingupgrade test** (which implies a throwaway
satellite and also a PostgreSQL backend), create a trigger on the nodes
table which forces last_net to be equal to last_ip_port always.

Change-Id: I8448cf131e46576d96a414d06780270c7b2b1892
2023-03-13 15:49:07 +00:00
paul cannon
fd6ce6b9a5 scripts/tests: fix test-sim-rolling-upgrade.sh
This test involves a satellite with dev defaults (DistinctIP=no) being
upgraded past commit 2522ff09b6, which
means we need to run the dev-defaults-satellite-upgrade migration SQL
to avoid getting DistinctIP=yes behavior (which breaks the tests).

Change-Id: I29fb596d1ffa568dad635d98cfe9abacd3aaa48f
2023-03-09 23:35:36 +00:00
paul cannon
0e2fef977f satellite/overlay: add SetAllContainedNodes method to overlay.DB
We will be needing an infrequent chore to check which nodes are in the
reverify queue and synchronize that set with the 'contained' field in
the nodes db, since it is easily possible for them to get out of sync.
(We can't require that the reverification queue table be in the same
database as the nodes table, so maintaining consistency with SQL
transactions is out. Plus, even if they were in the same database, using
such SQL transactions to maintain consistency would be slow and
unwieldy.)

This commit adds a method to the overlay allowing the caller to set the
contained status of all nodes in the nodes table at once. This is valid
because our definition of "contained" now depends solely on whether a
node appears at least once in the reverification queue. Only rows whose
contained field does not match the expectation will be updated; the
contained timestamp will not be updated for a node which is supposed to
be contained and was already contained.

Change-Id: I8cabe56ad897b6027e11aa5b17175295391aa3ac
2023-02-06 10:18:54 +00:00
JT Olio
686faeedbd satellite/overlay: return noise info with selected nodes
we have two more fields in the database (noise_proto and
noise_public_key) that now need to go into pb.NodeAddress when
returning AddressedOrderLimits.

the only real complication is making sure type conversions between
database types and NodeURLs and so on don't lose this new
pb.NodeAddress field (NoiseInfo). otherwise this is a relatively
straightforward commit

Change-Id: I45b59d7b2d3ae21c2e6eb95497f07cd388d454b3
2023-02-02 15:46:27 +00:00
JT Olio
2753d5a32f satellite/overlay: keep track of noise info per node
Change-Id: Icef04c3e87dbf4bb57d3837274c323bf6dd2c81f
2023-02-01 23:03:35 -05:00
Cameron
0596651580 satellite/satellitedb: fix updating nodes.last_software_update_email
The CASE expression used to determine which value to set
last_software_update_email to did not have an ELSE clause. Therefore,
when the node is both below the minimum version and did not receive a
version update email (no condition is true), the value would be set to
NULL.

Additionally, replace `time.Now()` with `timestamp` in the check to
determine if the email cooldown has passed.

Change-Id: I2e2e93f1a865e123ed8b665be9621cebfb72236f
2023-02-01 17:25:58 -05:00
paul cannon
b6bcb32ecf satellite/reputation: more accurate "reputation changes" list
`overlay.(*Service).UpdateReputation()` takes a "reputationChanges"
parameter, a slice of node events indicating whether we think the node's
disqualification or suspension status is changing. This is necessary so
that the overlay service can notify the nodeevents DB about these
changes.

In several cases, however, this list of events is not constructed
correctly, because of missing information about the previous state.
In most cases, this is because the node was offline, and the order limit
creation functions (which usually obtain and return the prior reputation
status) ignored that node.

This change makes it so that all callers to
`overlay.(*Service).UpdateReputation()` can be expected to provide a
correct list of change events (as correct as feasible, given that we
can't lock the node's information in the database during the entire
operation).

It ended up that there was only one caller we needed to worry about, and
that was reputation.(*Service).ApplyAudit(). So the bulk of this change
is teaching that function how to recognize when the prior reputation
status was not filled in, and fill it in.

Refs: https://github.com/storj/storj/issues/5464
Change-Id: I52ce385fc9c0ce3b283b998d517998e7f4ec8792
2023-01-31 18:39:40 +00:00
JT Olio
e40191afd6 storj: upgrade to use latest storj/common NodeAddress
Change-Id: I5987391bcfe5f6dfd7b525698c337a4cbda9b76e
2023-01-25 01:37:26 +00:00
Fadila Khadar
5c3a148d6e satellite/overlaycache: fix typo in UpdateCheckIn request
- fix a malformed SQL query
- add test to be sure we don't have this problem again.

Change-Id: I3fde8c59ba01335411e51d964bec95bc26cfc961
2022-12-14 22:21:45 +01:00
paul cannon
ed0fa59f23 satellite/overlay: add SetNodeContained() method
SetNodeContained() will change the contained flag in the nodes table,
which will affect whether nodes are selected for new uploads. This flag
_should_ correlate with whether or not a given node has any entries in
the reverification queue. However, the reverification queue is intended
to be 'safely partitionable' from the nodes table, so we can't enforce
that characteristic transactionally. But this is ok; there are no dire
consequences if they are out of sync.

We will be adding a chore that updates the contained flag based on the
contents of the reverification queue periodically, if something fails
to set it directly when appropriate.

Refs: https://github.com/storj/storj/issues/5231
Change-Id: I26460d8718dee63fd55d00a44568b2065fc8fe30
2022-12-01 12:43:40 +00:00
Cameron
8681a36164 satellite/overlay: add ability to get offline nodes in need of email
Add LastOfflineEmail to overlay.NodeDossier. This is the last time a
node got an offline email. Add two new overlay db methods,
GetOfflineNodesForEmail and UpdateLastOfflineEmail. Edit db method
UpdateCheckIn to nullify last_offline_email if node is up.

Change-Id: I1ee60e7d98dd1b68348a57f9a4fb77c6c9895d6d
2022-11-17 19:03:04 +00:00
Cameron
2a25974261 satellite/overlay: insert node software update events into node events
When a node checks in and its version is below the minimum, insert
BelowMinVersion event into node events

Change-Id: I0e437ac34496778369515cbc40c15676da8b27ae
2022-11-11 20:43:56 +00:00
Cameron
cb0c359b81 satellite/overlay: insert DQ node events for stray nodes
Change-Id: I99da11e506ab7f6bcebdb08a5815078a3297c932
2022-11-04 15:48:17 +00:00
Cameron
74ddfab810 satellite/overlay: insert DQ event into node events in overlay.DisqualifyNode
Also, return node email from overlaycache db DisqualifyNode to be used
in node events insertion

Change-Id: I41534cf01351c1690c3966a8055c5fe6fcf0d6a6
2022-11-04 15:18:31 +00:00
Cameron
8c8688ca6b satellite/overlay: return email as part of NodeReputation
Make email accessible for email sending after reputation updates

Change-Id: I760feee0f6ca58b76a2955a04c0c366c618656bb
2022-10-26 17:33:22 +00:00
Michal Niewrzal
a22e6bdf67 satellite/gc/bloomfilter: use int64 to count pieces
Pieces count in DB are stored as int64 and we would like to align bloom
filter processing with this type.

Change-Id: Iaec767e609a40d802077ae057520541805a7c44f
2022-09-22 09:39:53 +00:00
JT Olio
12ef68bdf4 satellitedb: restore-trash shouldn't bother with nodes we've never talked to
Change-Id: I3c631e92a9a2670c52c9fa23b2b1baf67d3c9a9c
2022-08-25 16:14:29 +00:00
Egon Elbre
65b5a0fe82 satellite/{overlay,satellitedb}: add AS OF to download selection
This should reduce the load caused by download selection queries.

Change-Id: Ic1a89d9c3eb5418f0792eb20ec2aece18dc63f2c
2022-06-28 05:18:01 +00:00
paul cannon
aa728bd6ea satellite/satellitedb: add reputations.disqualification_reason
We added nodes.disqualification_reason recently, but we didn't add a
corresponding column in the reputations table (despite having a
corresponding `disqualified` column there).

Without this change, the (very useful and informative) assignments to
updateFields.DisqualificationReason in reputations.go have no effect.

Refs: https://github.com/storj/storj/issues/4601

Change-Id: I77404902ca64b56aed72f1de76b303fe82b76aab
2022-05-17 10:09:36 -05:00
Yaroslav Vorobiov
3f47d19aa6 satellite/overlay: add disqualification reason
Add disqualification reason to NodeDossier.
Extend DB.DisqualifyNode with disqualification reason.
Extend reputation Service.TestDisqualifyNode with disqualification reason.

Change-Id: I8611b6340c7f42ac1bb8bd0fd7f0648ad650ab2d
2022-04-20 13:29:31 +00:00
Yaroslav Vorobiov
4223fa01f8 satellite/reputation: add disqualification reason for status update
Set disqualification reason when reputations stats are updated on DB.Update.
Added tests for DisqualifyNode and for disqualification cases which happens during Update.

Change-Id: I00130ab5d9722422805159ad2f183c205de60f7e
2022-04-20 13:29:10 +00:00
Fadila Khadar
29fd36a20e satellite/repairer: handle excluded countries
For nodes in excluded areas, we don't necessarily want to remove them
from the pointer, but we do want to increase the number of pieces in the
segment in case those excluded area nodes go down. To do that, we
increase the number of pieces repaired by the number of pieces in
excluded areas.

Change-Id: I0424f1bcd7e93f33eb3eeeec79dbada3b3ea1f3a
2022-03-14 10:59:36 -04:00
Fadila Khadar
e776c65172 satellite/checker: pieces in excluded countries are not healthy
Add a RepairExcludedCountryCodes config flag for overlay for providing a list of country codes to exclude nodes from target repair selection.

Mark segments with less than repairThreshold pieces in countries not in the RepairExcludedCountryCodes as not healthy.
With this change, the repair process is not affected. The segment will be removed from the repair queue by the repairer.

Another change will handle the logic at the repairer level.

Fixes https://github.com/storj/team-metainfo/issues/95

Change-Id: I9231b32de117a116488de055a3e94efcabb46e81
2022-03-02 09:59:09 +00:00
Yingrong Zhao
1f8f7ebf06 satellite/{audit, reputation}: fix potential nodes reputation status
inconsistency

The original design had a flaw which can potentially cause discrepancy
for nodes reputation status between reputations table and nodes table.
In the event of a failure(network issue, db failure, satellite failure, etc.)
happens between update to reputations table and update to nodes table, data
can be out of sync.
This PR tries to fix above issue by passing through node's reputation from
the beginning of an audit/repair(this data is from nodes table) to the next
update in reputation service. If the updated reputation status from the service
is different from the existing node status, the service will try to update nodes
table. In the case of a failure, the service will be able to try update nodes
table again since it can see the discrepancy of the data. This will allow
both tables to be in-sync eventually.

Change-Id: Ic22130b4503a594b7177237b18f7e68305c2f122
2022-01-06 21:05:59 +00:00
Márton Elek
76c2228fbd satellite/metainfo: propagate geofencing between buckets and stream id
Github: https://github.com/storj/storj/issues/4245

Change-Id: I83d34367aab1f3c0d46a044f54980b2d50174b19
2021-11-24 08:05:05 +00:00
Mya
bf51c286d9
satellite/geoip: update node check-in to associate a country code
Resolves https://github.com/storj/storj/issues/4247

Change-Id: Idfd71bf1795d48ca3c686066bbdb95b9c6594f00
2021-11-10 16:44:41 +01:00
Cameron Ayer
1de8a695e8 satellite/{overlay,satellitedb}: fix stray nodes DQ bug
We had a bug in the stray nodes chore where nodes who had not been seen
in several months were not being DQd. We figured out that this was
happening because we were using two queries: The first to grab
nodes where last_contact_success < some cutoff, the second to DQ them
unless last_contact_success == '0001-01-01 00:00:00+00'. The problem
is that if all of the nodes returned from the first query had
last_contact_success of '0001-01-01 00:00:00+00', we would pass them to
the second query which would not DQ them. This would result in the stray
nodes DQ loop ending since we found a number of nodes to DQ less than the
limit.

The fix: add the "WHERE last_contact_success != '0001-01-01
00:00:00+00'::timestamptz" to the selection query.

Change-Id: I4e60de90b68d8745d641b4467c2b23e0e56f7dff
2021-11-02 17:05:00 +00:00
Cameron Ayer
bb21551a9c satellite/satellitedb: remove references to contained column in nodes table
We don't use this column for anything. If you want to know if a node is
contained, you can check the pending_audits table.

Change-Id: I8da1d8e01a2dcaff63c5067a7927b5451424ad04
2021-10-14 19:17:46 +00:00
Yingrong Zhao
6c34ff64ad satellite/satellitedb: remove referrence to audit information in
nodes and audit_history tables

This PR removes all code reference to audit_histories table and
```
audit_reputation_alpha, audit_reputation_beta,
unknown_audit_reputation_alpha, unknown_audit_reputation_beta,
```
columns from nodes table.

It also drops audit_histories table from the db since the code
that's referencing it currently are not being used.

Change-Id: Ifcda8db36afb3a333d487ff831f2fdefc8b02a4c
2021-08-13 21:11:28 +00:00
Yingrong Zhao
e4cc965c39 satellite/satellitedb: replace explicit transaction with dbx query for
UpdateReputation

Change-Id: I7c139ededea83d4b58107536c3a031c4f92d6eb4
2021-08-05 17:09:49 +00:00
Egon Elbre
65804801ec all: fix mon.Task leak
Change-Id: Ifd58c7ac5631b9c3c750b3f4cc50525167e90709
2021-08-05 14:07:45 +03:00
Yingrong Zhao
646ce5b8cc satellite/overlay: remove reputation logic from overlay
Change-Id: I3492860e4537c7a8e4e824ec4c9c8d179134a0c0
2021-07-28 15:15:28 -04:00