Commit Graph

139 Commits

Author SHA1 Message Date
Kaloyan Raev
98b82486bd satellite/overlay: update country code on every node check-in
We have a specific issue where a user uploaded a file to a bucket
geo-fenced to EU and one of the pieces appeared to be on a node in the
US. The country code of this node is set to SE (Sweden) in the satellite
DB. It turns out that some time ago MaxMind changed the country code of
this node's IP from Sweden to US, but this change hasn't been reflected
in the satellite's database.

Until now, the satellite has updated the country code of a node only
when its IP changes. It was assumed that if the IP does not change, its
country code would not change either. This turned out to be a wrong
assumption.

With this change, the satellite will look up the MaxMindDB on every
check-in to see if the country code of the node's IP has changed.
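
A minimal sketch of the behavior described above, not the satellite's actual code. `NodeRecord`, `check_in`, and the `lookup_country` callback are hypothetical names used only for illustration; the callback stands in for the MaxMindDB lookup.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NodeRecord:
    """Hypothetical subset of a node row: last known IP and country code."""
    last_ip: str
    country_code: Optional[str]


def check_in(node: NodeRecord, ip: str,
             lookup_country: Callable[[str], Optional[str]]) -> bool:
    """Process a check-in and return True if the stored record changed."""
    changed = False
    if ip != node.last_ip:
        node.last_ip = ip
        changed = True
    # Old behavior (assumed): the lookup ran only inside the branch above.
    # New behavior: the lookup runs on every check-in, even for the same IP.
    country = lookup_country(ip)
    if country != node.country_code:
        node.country_code = country
        changed = True
    return changed
```

With this shape, a node whose IP stays the same but whose GeoIP entry moves from SE to US gets its stored country code corrected on its next check-in.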

Change-Id: Icdf659b09be9fc6ad14601902032b35ba5ea78c4
2023-03-22 08:38:51 +00:00
paul cannon
fd6ce6b9a5 scripts/tests: fix test-sim-rolling-upgrade.sh
This test involves a satellite with dev defaults (DistinctIP=no) being
upgraded past commit 2522ff09b6, which
means we need to run the dev-defaults-satellite-upgrade migration SQL
to avoid getting DistinctIP=yes behavior (which breaks the tests).

Change-Id: I29fb596d1ffa568dad635d98cfe9abacd3aaa48f
2023-03-09 23:35:36 +00:00
Márton Elek
ffaf15a3b0 satellite/overlay: remove unused mail service from overlay
It was surprising that `satellite auditor` complained about SMTP mail settings, even though it's not supposed to send any mail.

Looks like we can remove the mail service dependency, as it's not a hard requirement for overlay.Service.

Change-Id: I29a52eeff3f967ddb2d74a09458dc0ee2f051bd7
2023-03-09 12:17:35 +00:00
paul cannon
2522ff09b6 satellite/overlay: configurable meaning of last_net
Up to now, we have been implementing the DistinctIP preference with code
in two places:

 1. On check-in, the last_net is determined by taking the /24 or /64
    (in ResolveIPAndNetwork()) and we store it with the node record.
 2. On node selection, a preference parameter defines whether to return
    results that are distinct on last_net.

It can be observed that we have never yet had the need to switch from
DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the
same satellite, and we will probably never need to do so in an automated
way. It can also be observed that this arrangement makes tests more
complicated, because we often have to arrange for test nodes to have IP
addresses in different /24 networks (a particular pain on macOS).

Those two considerations, plus some pending work on the repair framework
that will make repair take last_net into consideration, motivate this
change.

With this change, in the #2 place, we will _always_ return results that
are distinct on last_net. We implement the DistinctIP preference, then,
by making the #1 place (ResolveIPAndNetwork()) more flexible. When
DistinctIP is enabled, last_net will be calculated as it was before. But
when DistinctIP is _off_, last_net can be the same as address (IP and
port). That will effectively implement !DistinctIP because every
record will have a distinct last_net already.
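
The rule described above can be sketched as follows. This is only an illustration: `resolve_last_net` is a hypothetical stand-in for ResolveIPAndNetwork(), and the string formats it returns are assumptions, since the message does not say how last_net is stored.

```python
import ipaddress


def resolve_last_net(ip: str, port: int, distinct_ip: bool) -> str:
    """Return an illustrative last_net value for a node address."""
    addr = ipaddress.ip_address(ip)
    if not distinct_ip:
        # DistinctIP off: last_net is the address itself (IP and port), so
        # every node already has a distinct last_net.
        host = f"[{addr}]" if addr.version == 6 else str(addr)
        return f"{host}:{port}"
    # DistinctIP on: use the /24 network for IPv4 and the /64 for IPv6.
    prefix = 24 if addr.version == 4 else 64
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(resolve_last_net("192.0.2.57", 28967, distinct_ip=True))
print(resolve_last_net("192.0.2.57", 28967, distinct_ip=False))
```

Because node selection always deduplicates on last_net, changing only this function is enough to switch between the two behaviors.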

As a side effect, this flexibility will allow us to change the rules
about last_net construction arbitrarily. We can do tests where last_net
is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and
be able to exercise the production logic without requiring a virtual
network bridge.

This change should be safe to make without any migration code, because
all known production satellite deployments use DistinctIP, and the
associated last_net values will not change for them. They will only
change for satellites with !DistinctIP, which are mostly test
deployments that can be recreated trivially. For those satellites which
are both permanent and !DistinctIP, node selection will suddenly start
acting as though DistinctIP is enabled, until the operator runs a single
SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done
either before or after deploying software with this change.

I also assert that this will not hurt performance for production
deployments. It's true that adding the distinct requirement to node
selection makes things a little slower, but the distinct requirement is
already present for all production deployments, and they will see no
change.

Refs: https://github.com/storj/storj/issues/5391
Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924
2023-03-09 02:20:12 +00:00
JT Olio
77bf88e916 satellite/overlay: check node difficulty before entering database
closes https://github.com/storj/storj/issues/5568

Change-Id: Id413637c2678e7a7cf8dbf414e082c687c8e8a39
2023-02-15 17:46:25 +00:00
paul cannon
0e2fef977f satellite/overlay: add SetAllContainedNodes method to overlay.DB
We will be needing an infrequent chore to check which nodes are in the
reverify queue and synchronize that set with the 'contained' field in
the nodes db, since it is easily possible for them to get out of sync.
(We can't require that the reverification queue table be in the same
database as the nodes table, so maintaining consistency with SQL
transactions is out. Plus, even if they were in the same database, using
such SQL transactions to maintain consistency would be slow and
unwieldy.)

This commit adds a method to the overlay allowing the caller to set the
contained status of all nodes in the nodes table at once. This is valid
because our definition of "contained" now depends solely on whether a
node appears at least once in the reverification queue. Only rows whose
contained field does not match the expectation will be updated; the
contained timestamp will not be updated for a node which is supposed to
be contained and was already contained.
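
A rough in-memory sketch of the update rule described above, assuming the contained state is kept as a nullable timestamp per node. The function name and data shapes are hypothetical; the real method operates on the nodes table.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Set


def set_all_contained(nodes: Dict[str, Optional[datetime]],
                      contained_ids: Set[str],
                      now: datetime) -> int:
    """Synchronize contained status for all nodes; return rows updated.

    nodes maps node ID to its contained timestamp (None = not contained).
    contained_ids is the set of nodes present in the reverification queue.
    """
    updated = 0
    for node_id, contained_at in nodes.items():
        should_be_contained = node_id in contained_ids
        is_contained = contained_at is not None
        if should_be_contained == is_contained:
            # Already matches the expectation: leave the row (and any
            # existing contained timestamp) untouched.
            continue
        nodes[node_id] = now if should_be_contained else None
        updated += 1
    return updated
```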

Change-Id: I8cabe56ad897b6027e11aa5b17175295391aa3ac
2023-02-06 10:18:54 +00:00
JT Olio
686faeedbd satellite/overlay: return noise info with selected nodes
we have two more fields in the database (noise_proto and
noise_public_key) that now need to go into pb.NodeAddress when
returning AddressedOrderLimits.

the only real complication is making sure type conversions between
database types and NodeURLs and so on don't lose this new
pb.NodeAddress field (NoiseInfo). otherwise this is a relatively
straightforward commit

Change-Id: I45b59d7b2d3ae21c2e6eb95497f07cd388d454b3
2023-02-02 15:46:27 +00:00
JT Olio
2753d5a32f satellite/overlay: keep track of noise info per node
Change-Id: Icef04c3e87dbf4bb57d3837274c323bf6dd2c81f
2023-02-01 23:03:35 -05:00
Cameron
0596651580 satellite/satellitedb: fix updating nodes.last_software_update_email
The CASE expression used to determine which value to set
last_software_update_email to did not have an ELSE clause. Therefore,
when the node is both below the minimum version and did not receive a
version update email (no condition is true), the value would be set to
NULL.
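
The pitfall can be reproduced in isolation. The sketch below uses an in-memory SQLite database purely for convenience, and the two WHEN conditions are assumptions about the shape of the query, not the actual satellite SQL. A CASE expression with no matching branch and no ELSE evaluates to NULL, which is standard SQL behavior.

```python
import sqlite3

# Hypothetical query shape; only the missing/added ELSE clause matters here.
CASE_TEMPLATE = """
UPDATE nodes SET last_software_update_email = CASE
    WHEN :below_min = 0 THEN NULL
    WHEN :email_sent = 1 THEN :now
    {else_clause}
END WHERE id = 'n1'
"""
BUGGY = CASE_TEMPLATE.format(else_clause="")
FIXED = CASE_TEMPLATE.format(else_clause="ELSE last_software_update_email")


def run(sql: str, below_min: int, email_sent: int):
    """Apply the update to a fresh one-row table and return the new value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id TEXT PRIMARY KEY, last_software_update_email TEXT)"
    )
    conn.execute("INSERT INTO nodes VALUES ('n1', '2023-01-01')")
    conn.execute(sql, {"below_min": below_min, "email_sent": email_sent,
                       "now": "2023-02-01"})
    row = conn.execute("SELECT last_software_update_email FROM nodes").fetchone()
    conn.close()
    return row[0]


# Below minimum version, no email sent this time: no WHEN branch matches.
print(run(BUGGY, below_min=1, email_sent=0))  # value is wiped to NULL
print(run(FIXED, below_min=1, email_sent=0))  # previous value is kept
```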

Additionally, replace `time.Now()` with `timestamp` in the check to
determine if the email cooldown has passed.

Change-Id: I2e2e93f1a865e123ed8b665be9621cebfb72236f
2023-02-01 17:25:58 -05:00
JT Olio
e40191afd6 storj: upgrade to use latest storj/common NodeAddress
Change-Id: I5987391bcfe5f6dfd7b525698c337a4cbda9b76e
2023-01-25 01:37:26 +00:00
Fadila Khadar
5c3a148d6e satellite/overlaycache: fix typo in UpdateCheckIn request
- fix a malformed SQL query
- add test to be sure we don't have this problem again.

Change-Id: I3fde8c59ba01335411e51d964bec95bc26cfc961
2022-12-14 22:21:45 +01:00
paul cannon
ed0fa59f23 satellite/overlay: add SetNodeContained() method
SetNodeContained() will change the contained flag in the nodes table,
which will affect whether nodes are selected for new uploads. This flag
_should_ correlate with whether or not a given node has any entries in
the reverification queue. However, the reverification queue is intended
to be 'safely partitionable' from the nodes table, so we can't enforce
that characteristic transactionally. But this is ok; there are no dire
consequences if they are out of sync.

We will be adding a chore that periodically updates the contained flag
based on the contents of the reverification queue, in case something
fails to set it directly when appropriate.

Refs: https://github.com/storj/storj/issues/5231
Change-Id: I26460d8718dee63fd55d00a44568b2065fc8fe30
2022-12-01 12:43:40 +00:00
Cameron
87660bd9b3 satellite/overlay/offlinenodes: insert offline nodes into node events
Add a new chore to periodically insert nodes that are offline and
have not gotten an offline email in a certain amount of time into node
events.

Change-Id: I658b385bb777b0240c98092946a93d65bee94abc
2022-11-18 12:10:06 -05:00
Cameron
8681a36164 satellite/overlay: add ability to get offline nodes in need of email
Add LastOfflineEmail to overlay.NodeDossier. This is the last time a
node got an offline email. Add two new overlay db methods,
GetOfflineNodesForEmail and UpdateLastOfflineEmail. Edit db method
UpdateCheckIn to nullify last_offline_email if node is up.

Change-Id: I1ee60e7d98dd1b68348a57f9a4fb77c6c9895d6d
2022-11-17 19:03:04 +00:00
Cameron
2a25974261 satellite/overlay: insert node software update events into node events
When a node checks in and its version is below the minimum, insert
BelowMinVersion event into node events

Change-Id: I0e437ac34496778369515cbc40c15676da8b27ae
2022-11-11 20:43:56 +00:00
Cameron
cb0c359b81 satellite/overlay: insert DQ node events for stray nodes
Change-Id: I99da11e506ab7f6bcebdb08a5815078a3297c932
2022-11-04 15:48:17 +00:00
Cameron
74ddfab810 satellite/overlay: insert DQ event into node events in overlay.DisqualifyNode
Also, return node email from overlaycache db DisqualifyNode to be used
in node events insertion

Change-Id: I41534cf01351c1690c3966a8055c5fe6fcf0d6a6
2022-11-04 15:18:31 +00:00
Cameron
68fe26ebe5 satellite/overlay: insert reputation events into node events
Insert reputation event into node events if reputation change occurs.

Change-Id: If1c5526092cb6834fe2faa6aa6e0306d4d88a4b7
2022-11-02 18:32:20 +00:00
Cameron
865974950d satellite/overlay: insert node online events into node events table
Instead of sending emails at the time the node is seen to be back
online, we have decided to send the event to the node events table,
which will initiate the email sending process at some point.

Change-Id: Id756209498112579de8e78ee20ad2df54571a617
2022-11-02 16:26:19 +00:00
Cameron
f06da25c3d satellite/overlay: add nodeevents.DB to satellite overlay service
Add nodeevents.DB to satellite overlay service so we can insert node
events into the nodeevents DB.

Change-Id: I642c0ccc9941ecdb08cb22d5c8cf701959a55156
2022-11-02 15:56:37 +00:00
Cameron
8c8688ca6b satellite/overlay: return email as part of NodeReputation
Make email accessible for email sending after reputation updates

Change-Id: I760feee0f6ca58b76a2955a04c0c366c618656bb
2022-10-26 17:33:22 +00:00
Cameron
a2ca443e29 satellite/overlay: send Node Online emails
Send an email when a node goes from offline to online.

Change-Id: I82d3f9001b9b0669e096d784edf097ffdcb1697d
2022-10-20 18:56:35 +00:00
Cameron
a52f766273 satellite/overlay: add email-sending functionality to overlay service
We want to send emails to SNOs. Node status changes go through the
overlay service, so it's a good place to add the mail service.
Add the mailservice.Service, satellite address, and satellite name to
overlay service. Also add feature flag --overlay.send-node-emails

Change-Id: I3bd2cb3bf22f9724954ce2374f8b651b902b3a24
2022-10-13 18:01:05 +00:00
Michal Niewrzal
a22e6bdf67 satellite/gc/bloomfilter: use int64 to count pieces
Pieces count in DB are stored as int64 and we would like to align bloom
filter processing with this type.

Change-Id: Iaec767e609a40d802077ae057520541805a7c44f
2022-09-22 09:39:53 +00:00
Egon Elbre
cf50696745 cmd/tools/segment-verify: wire up overlay logic
Change-Id: I0a4c737a8b0995a1c3e3adeac728fe833d0ce684
2022-09-19 11:32:18 +03:00
paul cannon
07bbe7d340 satellite/overlay: don't insert new nodes if contact check failed
Currently, the satellite tracks connectivity information about all nodes
that have contacted it, even if we have never successfully contacted
the node back.

This behavior was leveraged during a security audit to create hundreds
of thousands of "junk nodes" in the nodes table on one satellite, which
affected performance of queries such as node selection.

With this change, we should no longer track information about nodes that
have never been successfully contacted.

Note that it will still be possible to cause the creation of "junk
node" entries in the db; the attacker just has to set up individual
publicly-routable IP+port pairs for each node as it is created, so it
can respond to a PingBack.

Change-Id: Ibb6da6cc908fd4fc85aae1ba00313ba2738409ab
2022-08-26 09:05:04 +00:00
JT Olio
12ef68bdf4 satellitedb: restore-trash shouldn't bother with nodes we've never talked to
Change-Id: I3c631e92a9a2670c52c9fa23b2b1baf67d3c9a9c
2022-08-25 16:14:29 +00:00
Egon Elbre
51d4e5c275 satellite/{orders,overlay}: use cache for downloads
Use DownloadSelectionCache to avoid querying database for every
download.
This change only addresses downloads from users. The download selection
cache is not currently used for audit and repair.

Change-Id: I96a49e121dac0b4204f97592a63131edabd73fb5
2022-07-12 11:04:34 +00:00
Egon Elbre
48b0a65fbd satellite/overlay: use ReadCache in Download/UploadSelectionCache
sync2.ReadCache implements preemptive refreshing preventing stalling
while it's being updated.

Change-Id: Iee9ef36049b986f0e426c14a139b2bc9ac17fb53
2022-07-12 13:52:48 +03:00
Egon Elbre
65b5a0fe82 satellite/{overlay,satellitedb}: add AS OF to download selection
This should reduce the load caused by download selection queries.

Change-Id: Ic1a89d9c3eb5418f0792eb20ec2aece18dc63f2c
2022-06-28 05:18:01 +00:00
Paul Willoughby
911cc1e163 satellite/contact: reject privateIPs in PingMe and CheckIn endpoints
prevent network enumeration by rejecting privateIPs in PingMe and
Checkin endpoints

Closes storj/storj-private#32

Change-Id: I63f00483ff4128ebd5fa9b7b8da826a5706748c9
2022-06-07 08:09:14 +00:00
Yaroslav Vorobiov
3f47d19aa6 satellite/overlay: add disqualification reason
Add disqualification reason to NodeDossier.
Extend DB.DisqualifyNode with disqualification reason.
Extend reputation Service.TestDisqualifyNode with disqualification reason.

Change-Id: I8611b6340c7f42ac1bb8bd0fd7f0648ad650ab2d
2022-04-20 13:29:31 +00:00
Yaroslav Vorobiov
4223fa01f8 satellite/reputation: add disqualification reason for status update
Set disqualification reason when reputation stats are updated on DB.Update.
Added tests for DisqualifyNode and for disqualification cases which happen during Update.

Change-Id: I00130ab5d9722422805159ad2f183c205de60f7e
2022-04-20 13:29:10 +00:00
Fadila Khadar
29fd36a20e satellite/repairer: handle excluded countries
For nodes in excluded areas, we don't necessarily want to remove them
from the pointer, but we do want to increase the number of pieces in the
segment in case those excluded area nodes go down. To do that, we
increase the number of pieces repaired by the number of pieces in
excluded areas.
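
As a small arithmetic sketch of the adjustment described above: the function and parameter names are hypothetical, and the baseline formula (target minus healthy pieces) is an assumption, since the message only states that the count is increased by the number of pieces in excluded areas.

```python
def pieces_to_repair(target: int, healthy: int, in_excluded: int) -> int:
    """Return how many new pieces a repair should create.

    target      -- desired number of healthy pieces after repair (assumed)
    healthy     -- pieces currently considered retrievable, including
                   those on nodes in excluded areas (assumed)
    in_excluded -- pieces on nodes in excluded areas; they stay in the
                   pointer but are backed by extra pieces elsewhere
    """
    baseline = max(0, target - healthy)
    return baseline + in_excluded


print(pieces_to_repair(target=80, healthy=60, in_excluded=5))
```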

Change-Id: I0424f1bcd7e93f33eb3eeeec79dbada3b3ea1f3a
2022-03-14 10:59:36 -04:00
Moby von Briesen
b2d342aa9b satellite/overlay: Add ability to exclude country codes on upload
Create global config to specify a list of country codes that should be
excluded from node selection during uploads.

This exclusion is not implemented when the upload selection cache is
disabled.

Change-Id: Ic41e8b4f18857a11045668eac23107da99668a72
2022-03-03 16:58:48 +00:00
Fadila Khadar
e776c65172 satellite/checker: pieces in excluded countries are not healthy
Add a RepairExcludedCountryCodes config flag for overlay for providing a list of country codes to exclude nodes from target repair selection.

Mark segments with fewer than repairThreshold pieces in countries not in the RepairExcludedCountryCodes as not healthy.
With this change, the repair process is not affected. The segment will be removed from the repair queue by the repairer.

Another change will handle the logic at the repairer level.

Fixes https://github.com/storj/team-metainfo/issues/95

Change-Id: I9231b32de117a116488de055a3e94efcabb46e81
2022-03-02 09:59:09 +00:00
Yingrong Zhao
1f8f7ebf06 satellite/{audit, reputation}: fix potential nodes reputation status
inconsistency

The original design had a flaw that could cause a discrepancy in a
node's reputation status between the reputations table and the nodes
table. If a failure (network issue, db failure, satellite failure, etc.)
happens between the update to the reputations table and the update to
the nodes table, the data can get out of sync.
This PR tries to fix the above issue by passing the node's reputation
through from the beginning of an audit/repair (this data is from the
nodes table) to the next update in the reputation service. If the
updated reputation status from the service differs from the existing
node status, the service will try to update the nodes table. In the case
of a failure, the service will be able to try updating the nodes table
again, since it can see the discrepancy in the data. This allows both
tables to be in sync eventually.

Change-Id: Ic22130b4503a594b7177237b18f7e68305c2f122
2022-01-06 21:05:59 +00:00
Márton Elek
76c2228fbd satellite/metainfo: propagate geofencing between buckets and stream id
Github: https://github.com/storj/storj/issues/4245

Change-Id: I83d34367aab1f3c0d46a044f54980b2d50174b19
2021-11-24 08:05:05 +00:00
Mya
bf51c286d9 satellite/geoip: update node check-in to associate a country code
Resolves https://github.com/storj/storj/issues/4247

Change-Id: Idfd71bf1795d48ca3c686066bbdb95b9c6594f00
2021-11-10 16:44:41 +01:00
Márton Elek
fb604be460 satellite/gracefulexit: use nodecache with gracefulexit
Change-Id: I700caf6dfd06bd3b3970a8cf7c5a79da4af27b5f
2021-11-05 19:14:14 +00:00
Yingrong Zhao
e00b573f8f satellite/overlay: fix UpdateCheckIn comment
Change-Id: I8584895249d7e5be6dbec79974fc9f77b7a1930c
2021-08-06 05:54:00 +00:00
Egon Elbre
65804801ec all: fix mon.Task leak
Change-Id: Ifd58c7ac5631b9c3c750b3f4cc50525167e90709
2021-08-05 14:07:45 +03:00
Yingrong Zhao
ae02f6deda satellite/reputation: return default reputation stats when node is not
found

Change-Id: I587d0ab36ffa0efaf345a6a6e221ae5d2068e1c5
2021-08-04 19:34:54 +00:00
Yingrong Zhao
646ce5b8cc satellite/overlay: remove reputation logic from overlay
Change-Id: I3492860e4537c7a8e4e824ec4c9c8d179134a0c0
2021-07-28 15:15:28 -04:00
Yingrong Zhao
f8914ccce0 satellite/{repair, overlay}: use reputation store in repair
Change-Id: I48db9e68f48239d48621ccc77d33618ecb83ce1a
2021-07-28 13:22:05 -04:00
Yingrong Zhao
6c7bf357cd satellite/{reputation,audit,overlay}: replace overlay with reputation
package in audit

This PR implements a reputation store and replaces overlay in the audit
service with that store for storing nodes' audit stats.

In order to keep the changeset smaller, most of the changes in this PR
are for copying audit logic in overlay to the reputation package. In a
following PR, the duplicated code will be removed from overlay.

Change-Id: I16c12494a0970f44c422b26cf603c1dc489e5bc1
2021-07-28 13:10:48 -04:00
Cameron Ayer
8c124c6fa4 satellite/{reputation,overlay,satellitedb}: create reputation service, DB, add overlay method UpdateReputation
Define service and DB interface for storing node reputation data
and updating the overlay cache.
Add overlay service and DB method UpdateReputation.
See https://github.com/storj/storj/pull/4144

Change-Id: Iedd8bd3274457d26c595919303d55327c1464b8c
2021-06-24 16:19:15 +00:00
Jeremy Wharton
8a070e7c25 satellite/overlay: Ignore unnecessary check-ins
This prevents the database from being contacted unnecessarily,
reducing load.

Change-Id: Ib2420f68a20636ec35eb3dd3df8e02bd5341b419
2021-06-22 09:00:41 +00:00
Egon Elbre
0858c3797a satellite/{metabase,satellitedb}: deduplicate AS OF SYSTEM TIME code
We were duplicating the code for AS OF SYSTEM TIME in several places.
This replaces that code with a method on
dbutil.Implementation.

As a consequence it's more useful to use a shorter name for
implementation - 'impl' should be sufficiently clear in the context.

Similarly, using AsOfSystemInterval and AsOfSystemTime to distinguish
between the two modes is useful and slightly shorter without causing
confusion.

Change-Id: Idefe55528efa758b6176591017b6572a8d443e3d
2021-05-11 12:40:36 +03:00
Cameron Ayer
a0c5da6643 satellite/satellitedb: in stray nodes DQ, don't DQ nodes where last_contact_success = '0001-01-01 00:00:00+00'
When nodes check in for the very first time, if the satellite can't ping
them back, they are inserted into the nodes table with
last_contact_success of '0001-01-01 00:00:00+00'. If the stray nodes
chore runs before the node can fix their problem, they are DQd.

Solution: when DQing stray nodes, don't DQ where last_contact_success =
'0001-01-01 00:00:00+00'::timestamptz
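
The selection rule can be sketched in memory as below. The function name, the dict shape, and the `stray_after` parameter are hypothetical; the real change is a condition in the SQL that disqualifies stray nodes.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Zero time stored for nodes the satellite has never pinged back successfully.
ZERO_TIME = datetime(1, 1, 1, tzinfo=timezone.utc)


def stray_nodes_to_dq(last_contact_success: Dict[str, datetime],
                      now: datetime,
                      stray_after: timedelta) -> List[str]:
    """Return IDs of nodes whose last successful contact is too old.

    Nodes with the zero timestamp are skipped: they have never been
    contacted successfully, so they are not treated as stray.
    """
    cutoff = now - stray_after
    return sorted(
        node_id
        for node_id, last_success in last_contact_success.items()
        if last_success != ZERO_TIME and last_success < cutoff
    )
```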

Change-Id: I477a02d5ef85b2c930ed6b7d99a4d1995169bca8
2021-04-22 10:13:13 -04:00