Commit Graph

217 Commits

Author SHA1 Message Date
paul cannon
0e2fef977f satellite/overlay: add SetAllContainedNodes method to overlay.DB
We will be needing an infrequent chore to check which nodes are in the
reverify queue and synchronize that set with the 'contained' field in
the nodes db, since it is easily possible for them to get out of sync.
(We can't require that the reverification queue table be in the same
database as the nodes table, so maintaining consistency with SQL
transactions is out. Plus, even if they were in the same database, using
such SQL transactions to maintain consistency would be slow and
unwieldy.)

This commit adds a method to the overlay allowing the caller to set the
contained status of all nodes in the nodes table at once. This is valid
because our definition of "contained" now depends solely on whether a
node appears at least once in the reverification queue. Only rows whose
contained field does not match the expectation will be updated; the
contained timestamp will not be updated for a node which is supposed to
be contained and was already contained.

Change-Id: I8cabe56ad897b6027e11aa5b17175295391aa3ac
2023-02-06 10:18:54 +00:00
JT Olio
686faeedbd satellite/overlay: return noise info with selected nodes
we have two more fields in the database (noise_proto and
noise_public_key) that now need to go into pb.NodeAddress when
returning AddressedOrderLimits.

the only real complication is making sure type conversions between
database types and NodeURLs and so on don't lose this new
pb.NodeAddress field (NoiseInfo). otherwise this is a relatively
straightforward commit

Change-Id: I45b59d7b2d3ae21c2e6eb95497f07cd388d454b3
2023-02-02 15:46:27 +00:00
JT Olio
2753d5a32f satellite/overlay: keep track of noise info per node
Change-Id: Icef04c3e87dbf4bb57d3837274c323bf6dd2c81f
2023-02-01 23:03:35 -05:00
Cameron
0596651580 satellite/satellitedb: fix updating nodes.last_software_update_email
The CASE expression used to determine which value to set
last_software_update_email to did not have an ELSE clause. Therefore,
when the node is both below the minimum version and did not receive a
version update email (no condition is true), the value would be set to
NULL.

Additionally, replace `time.Now()` with `timestamp` in the check to
determine if the email cooldown has passed.

Change-Id: I2e2e93f1a865e123ed8b665be9621cebfb72236f
2023-02-01 17:25:58 -05:00
paul cannon
b6bcb32ecf satellite/reputation: more accurate "reputation changes" list
`overlay.(*Service).UpdateReputation()` takes a "reputationChanges"
parameter, a slice of node events indicating whether we think the node's
disqualification or suspension status is changing. This is necessary so
that the overlay service can notify the nodeevents DB about these
changes.

In several cases, however, this list of events is not constructed
correctly, because of missing information about the previous state.
In most cases, this is because the node was offline, and the order limit
creation functions (which usually obtain and return the prior reputation
status) ignored that node.

This change makes it so that all callers to
`overlay.(*Service).UpdateReputation()` can be expected to provide a
correct list of change events (as correct as feasible, given that we
can't lock the node's information in the database during the entire
operation).

It ended up that there was only one caller we needed to worry about, and
that was reputation.(*Service).ApplyAudit(). So the bulk of this change
is teaching that function how to recognize when the prior reputation
status was not filled in, and fill it in.

Refs: https://github.com/storj/storj/issues/5464
Change-Id: I52ce385fc9c0ce3b283b998d517998e7f4ec8792
2023-01-31 18:39:40 +00:00
JT Olio
e40191afd6 storj: upgrade to use latest storj/common NodeAddress
Change-Id: I5987391bcfe5f6dfd7b525698c337a4cbda9b76e
2023-01-25 01:37:26 +00:00
Fadila Khadar
5c3a148d6e satellite/overlaycache: fix typo in UpdateCheckIn request
- fix a malformed SQL query
- add test to be sure we don't have this problem again.

Change-Id: I3fde8c59ba01335411e51d964bec95bc26cfc961
2022-12-14 22:21:45 +01:00
paul cannon
ed0fa59f23 satellite/overlay: add SetNodeContained() method
SetNodeContained() will change the contained flag in the nodes table,
which will affect whether nodes are selected for new uploads. This flag
_should_ correlate with whether or not a given node has any entries in
the reverification queue. However, the reverification queue is intended
to be 'safely partitionable' from the nodes table, so we can't enforce
that characteristic transactionally. But this is ok; there are no dire
consequences if they are out of sync.

We will be adding a chore that updates the contained flag based on the
contents of the reverification queue periodically, if something fails
to set it directly when appropriate.

Refs: https://github.com/storj/storj/issues/5231
Change-Id: I26460d8718dee63fd55d00a44568b2065fc8fe30
2022-12-01 12:43:40 +00:00
Cameron
87660bd9b3 satellite/overlay/offlinenodes: insert offline nodes into node events
Add a new chore to periodically insert nodes who are offline and
have not gotten an offline email in a certain amount of time into node
events

Change-Id: I658b385bb777b0240c98092946a93d65bee94abc
2022-11-18 12:10:06 -05:00
Cameron
8681a36164 satellite/overlay: add ability to get offline nodes in need of email
Add LastOfflineEmail to overlay.NodeDossier. This is the last time a
node got an offline email. Add two new overlay db methods,
GetOfflineNodesForEmail and UpdateLastOfflineEmail. Edit db method
UpdateCheckIn to nullify last_offline_email if node is up.

Change-Id: I1ee60e7d98dd1b68348a57f9a4fb77c6c9895d6d
2022-11-17 19:03:04 +00:00
Cameron
2a25974261 satellite/overlay: insert node software update events into node events
When a node checks in and its version is below the minimum, insert
BelowMinVersion event into node events

Change-Id: I0e437ac34496778369515cbc40c15676da8b27ae
2022-11-11 20:43:56 +00:00
Cameron
d856569935 satellite/overlay: node software update email cooldown config
Change-Id: I792eef4a570e38e94f79a3f73c204b65f86ab541
2022-11-11 20:14:25 +00:00
Cameron
cb0c359b81 satellite/overlay: insert DQ node events for stray nodes
Change-Id: I99da11e506ab7f6bcebdb08a5815078a3297c932
2022-11-04 15:48:17 +00:00
Cameron
74ddfab810 satellite/overlay: insert DQ event into node events in overlay.DisqualifyNode
Also, return node email from overlaycache db DisqualifyNode to be used
in node events insertion

Change-Id: I41534cf01351c1690c3966a8055c5fe6fcf0d6a6
2022-11-04 15:18:31 +00:00
Cameron
68fe26ebe5 satelite/overlay: insert reputation events into node events
Insert reputation event into node events if reputation change occurs.

Change-Id: If1c5526092cb6834fe2faa6aa6e0306d4d88a4b7
2022-11-02 18:32:20 +00:00
Cameron
865974950d satellite/overlay: insert node online events into node events table
Instead of sending emails at the time the node is seen to be back
online, we have decided to send the event to the node events table,
which will initiate the email sending process at some point.

Change-Id: Id756209498112579de8e78ee20ad2df54571a617
2022-11-02 16:26:19 +00:00
Cameron
f06da25c3d satellite/overlay: add nodeevents.DB to satellite overlay service
Add nodeevents.DB to satellite overlay service so we can insert node
events into the nodeevents DB.

Change-Id: I642c0ccc9941ecdb08cb22d5c8cf701959a55156
2022-11-02 15:56:37 +00:00
Cameron
8c8688ca6b satellite/overlay: return email as part of NodeReputation
Make email accessible for email sending after reputation updates

Change-Id: I760feee0f6ca58b76a2955a04c0c366c618656bb
2022-10-26 17:33:22 +00:00
Cameron
a2ca443e29 satellite/overlay: send Node Online emails
Send an email when a node goes from offline to online.

Change-Id: I82d3f9001b9b0669e096d784edf097ffdcb1697d
2022-10-20 18:56:35 +00:00
Cameron
a52f766273 satellite/overlay: add email-sending functionality to overlay service
We want to send emails to SNOs. Node status changes go through the
overlay service, so it's a good place to add the mail service.
Add the mailservice.Service, satellite address, and satellite name to
overlay service. Also add feature flag --overlay.send-node-emails

Change-Id: I3bd2cb3bf22f9724954ce2374f8b651b902b3a24
2022-10-13 18:01:05 +00:00
Michal Niewrzal
a22e6bdf67 satellite/gc/bloomfilter: use int64 to count pieces
Pieces count in DB are stored as int64 and we would like to align bloom
filter processing with this type.

Change-Id: Iaec767e609a40d802077ae057520541805a7c44f
2022-09-22 09:39:53 +00:00
Egon Elbre
cf50696745 cmd/tools/segment-verify: wire up overlay logic
Change-Id: I0a4c737a8b0995a1c3e3adeac728fe833d0ce684
2022-09-19 11:32:18 +03:00
paul cannon
07bbe7d340 satellite/overlay: don't insert new nodes if contact check failed
Currently, the satellite tracks connectivity information about all nodes
that have contacted it, even if we have never successfully contacted
the node back.

This behavior was leveraged during a security audit to create hundreds
of thousands of "junk nodes" in the nodes table on one satellite, which
affected performance of queries such as node selection.

With this change, we should no longer track information about nodes that
have never been successfully contacted.

Note that it will still be possible to cause the creation of "junk
node" entries in the db; the attacker just has to set up individual
publicly-routable IP+port pairs for each node as it is created, so it
can respond to a PingBack.

Change-Id: Ibb6da6cc908fd4fc85aae1ba00313ba2738409ab
2022-08-26 09:05:04 +00:00
JT Olio
12ef68bdf4 satellitedb: restore-trash shouldn't bother with nodes we've never talked to
Change-Id: I3c631e92a9a2670c52c9fa23b2b1baf67d3c9a9c
2022-08-25 16:14:29 +00:00
paul cannon
0dcc0a9ee0 satellite/reputation: reconfigure lambda and alpha
This is in response to community feedback that our existing reputation
calculation is too likely to disqualify storage nodes unfairly with
extreme swings up and down.

For details and analysis, please see the data_loss_vs_dq_chance_sim.py
tool, the "tuning reputation further.ipynb" Jupyter notebook in the
storj/datascience repository, and the discussion at

    https://forum.storj.io/t/tuning-audit-scoring/14084

In brief: changing the lambda and initial-alpha parameters in this way
causes the swings in reputation to be smaller and less likely to put a
node past the disqualification threshold unfairly.

Note: this change will cause a one-time reset of all (non-disqualified)
node reputations, because the new initial alpha value of 1000 is
dramatically different, and the disqualification threshold is going to
be much higher.

Change-Id: Id6dc4ba8fde1be3db4255b72282207bab5491ca3
2022-08-17 18:52:53 +00:00
paul cannon
2f20bbf4d8 satellite/reputation: add a reputation write cache
This should lower the amount of database load coming from
reputation updates.

Change-Id: Iaacfb81480075261da77c5cc93e08b24f69f8949
2022-07-14 21:40:16 +00:00
Egon Elbre
51d4e5c275 satellite/{orders,overlay}: use cache for downloads
Use DownloadSelectionCache to avoid querying database for every
download.
This change only addresses downloads from users. The download selection
cache is not currently used for audit and repair.

Change-Id: I96a49e121dac0b4204f97592a63131edabd73fb5
2022-07-12 11:04:34 +00:00
Egon Elbre
48b0a65fbd satellite/overlay: use ReadCache in Download/UploadSelectionCache
sync2.ReadCache implements preemptive refreshing preventing stalling
while it's being updated.

Change-Id: Iee9ef36049b986f0e426c14a139b2bc9ac17fb53
2022-07-12 13:52:48 +03:00
Egon Elbre
65b5a0fe82 satellite/{overlay,satellitedb}: add AS OF to download selection
This should reduce the load caused by download selection queries.

Change-Id: Ic1a89d9c3eb5418f0792eb20ec2aece18dc63f2c
2022-06-28 05:18:01 +00:00
paul cannon
4e81a60838 satellite/overlay: fix TestNodeSelectionGracefulExit
This test had an effective config.Reputation.AuditCount = 0, meaning all
nodes that had _any_ positive audit results were considered vetted.
Because of that, only one node in the test setup was "new". And that
node was marked as being in GE, so could not be returned by node
selection.

The reason the tests still worked is because of the node selection rule
that says "if there are no new nodes at all, just get all reputable
nodes to satisfy the request".

This commit makes it so half of the nodes are vetted and half new, which
makes the test somewhat more interesting (and means we aren't
concentrating too much on testing details of behavior when AuditCount is
0).

Change-Id: I09157b7dc20ecaddd2a6e60cfe146e9186e3603b
2022-06-17 18:28:04 +00:00
Paul Willoughby
911cc1e163 satellite/contact: reject privateIPs in PingMe and CheckIn endpoints
prevent network enumeration by rejecting privateIPs in PingMe and
Checkin endpoints

Closes storj/storj-private#32

Change-Id: I63f00483ff4128ebd5fa9b7b8da826a5706748c9
2022-06-07 08:09:14 +00:00
Yaroslav Vorobiov
3f47d19aa6 satellite/overlay: add disqualification reason
Add disqualification reason to NodeDossier.
Extend DB.DisqualifyNode with disqualification reason.
Extend reputation Service.TestDisqualifyNode with disqualification reason.

Change-Id: I8611b6340c7f42ac1bb8bd0fd7f0648ad650ab2d
2022-04-20 13:29:31 +00:00
Yaroslav Vorobiov
4223fa01f8 satellite/reputation: add disqualification reason for status update
Set disqualification reason when reputations stats are updated on DB.Update.
Added tests for DisqualifyNode and for disqualification cases which happens during Update.

Change-Id: I00130ab5d9722422805159ad2f183c205de60f7e
2022-04-20 13:29:10 +00:00
Fadila Khadar
29fd36a20e satellite/repairer: handle excluded countries
For nodes in excluded areas, we don't necessarily want to remove them
from the pointer, but we do want to increase the number of pieces in the
segment in case those excluded area nodes go down. To do that, we
increase the number of pieces repaired by the number of pieces in
excluded areas.

Change-Id: I0424f1bcd7e93f33eb3eeeec79dbada3b3ea1f3a
2022-03-14 10:59:36 -04:00
Moby von Briesen
b2d342aa9b satellite/overlay: Add ability to exclude country codes on upload
Create global config to specify a list of country codes that should be
excluded from node selection during uploads.

This exclusion is not implemented when the upload selection cache is
disabled.

Change-Id: Ic41e8b4f18857a11045668eac23107da99668a72
2022-03-03 16:58:48 +00:00
Fadila Khadar
e776c65172 satellite/checker: pieces in excluded countries are not healthy
Add a RepairExcludedCountryCodes config flag for overlay for providing a list of country codes to exclude nodes from target repair selection.

Mark segments with less than repairThreshold pieces in countries not in the RepairExcludedCountryCodes as not healthy.
With this change, the repair process is not affected. The segment will be removed from the repair queue by the repairer.

Another change will handle the logic at the repairer level.

Fixes https://github.com/storj/team-metainfo/issues/95

Change-Id: I9231b32de117a116488de055a3e94efcabb46e81
2022-03-02 09:59:09 +00:00
Egon Elbre
be02aa9b17 satellite/overlay: add test for UpdateCheckIn panic
The database table got invalid input and the resulting error
was not checked. This adds updates that contain invalid fields
to trigger different errors.

Change-Id: Iacea32cbef5599aab562c88e4113073596cc9996
2022-01-24 15:01:59 +02:00
Yingrong Zhao
1f8f7ebf06 satellite/{audit, reputation}: fix potential nodes reputation status
inconsistency

The original design had a flaw which can potentially cause discrepancy
for nodes reputation status between reputations table and nodes table.
In the event of a failure(network issue, db failure, satellite failure, etc.)
happens between update to reputations table and update to nodes table, data
can be out of sync.
This PR tries to fix above issue by passing through node's reputation from
the beginning of an audit/repair(this data is from nodes table) to the next
update in reputation service. If the updated reputation status from the service
is different from the existing node status, the service will try to update nodes
table. In the case of a failure, the service will be able to try update nodes
table again since it can see the discrepancy of the data. This will allow
both tables to be in-sync eventually.

Change-Id: Ic22130b4503a594b7177237b18f7e68305c2f122
2022-01-06 21:05:59 +00:00
Egon Elbre
a42b9d1a48 all: fix uses of email.com
email.com is not a domain that should be used for examples
nor tests.

Change-Id: I654d4287d02633d5ed9740e81a79150470eeaf25
2022-01-05 16:29:19 +02:00
Márton Elek
76c2228fbd satellite/metainfo: propagate geofencing between buckets and stream id
Github: https://github.com/storj/storj/issues/4245

Change-Id: I83d34367aab1f3c0d46a044f54980b2d50174b19
2021-11-24 08:05:05 +00:00
Kaloyan Raev
f773bb80c8 mod: bump common to fetch latest placement type changes
Change-Id: I3d0813f05622e706c9be1a578b5e4d4159d16dfc
2021-11-16 12:42:25 +00:00
Mya
bf51c286d9
satellite/geoip: update node check-in to associate a country code
Resolves https://github.com/storj/storj/issues/4247

Change-Id: Idfd71bf1795d48ca3c686066bbdb95b9c6594f00
2021-11-10 16:44:41 +01:00
Egon Elbre
d5628740fd satellite/overlay/straynodes: prevent disqualification in tests
Currently the test threshold for unneeded node checkins are 30s.
The disqualification threshold was at 30s, which means, it was possible
for all the nodes to get disqualified.

Hopefully fixes #4267

Change-Id: I6b0a10c09b7fd90a9729794885c9e7a593781bad
2021-11-09 12:29:21 +00:00
Márton Elek
fb604be460 satellite/gracefulexit: use nodecache with gracefulexit
Change-Id: I700caf6dfd06bd3b3970a8cf7c5a79da4af27b5f
2021-11-05 19:14:14 +00:00
Cameron Ayer
1de8a695e8 satellite/{overlay,satellitedb}: fix stray nodes DQ bug
We had a bug in the stray nodes chore where nodes who had not been seen
in several months were not being DQd. We figured out that this was
happening because we were using two queries: The first to grab
nodes where last_contact_success < some cutoff, the second to DQ them
unless last_contact_success == '0001-01-01 00:00:00+00'. The problem
is that if all of the nodes returned from the first query had
last_contact_success of '0001-01-01 00:00:00+00', we would pass them to
the second query which would not DQ them. This would result in the stray
nodes DQ loop ending since we found a number of nodes to DQ less than the
limit.

The fix: add the "WHERE last_contact_success != '0001-01-01
00:00:00+00'::timestamptz" to the selection query.

Change-Id: I4e60de90b68d8745d641b4467c2b23e0e56f7dff
2021-11-02 17:05:00 +00:00
Egon Elbre
f2d8e97d97 satellite/satellitedb: simplify select nodes query construction
Change-Id: I07009b28762d4485929a2a999e8f4be8179bee51
2021-10-22 07:41:07 +00:00
Egon Elbre
51f488fc4b satellite/satellitedb: fix selection query with AOST
Union query for both reputable and new nodes didn't properly work. The
top level query is required to have an `AS OF SYSTEM TIME` statement as
well.

Change-Id: I8ee6dd5b700c2b1ed2aa562962bfa72be7eec30a
2021-10-19 16:59:40 +03:00
Cameron Ayer
bb21551a9c satellite/satellitedb: remove references to contained column in nodes table
We don't use this column for anything. If you want to know if a node is
contained, you can check the pending_audits table.

Change-Id: I8da1d8e01a2dcaff63c5067a7927b5451424ad04
2021-10-14 19:17:46 +00:00
Yingrong Zhao
e00b573f8f satellite/overlay: fix UpdateCheckIn comment
Change-Id: I8584895249d7e5be6dbec79974fc9f77b7a1930c
2021-08-06 05:54:00 +00:00
Egon Elbre
65804801ec all: fix mon.Task leak
Change-Id: Ifd58c7ac5631b9c3c750b3f4cc50525167e90709
2021-08-05 14:07:45 +03:00