All the files in uploadselection are in fact related to generic node selection, and are
used not only for upload, but also for download, repair, etc.
Change-Id: Ie4098318a6f8f0bbf672d432761e87047d3762ab
ReliabilityCache will now use the refactored overlay Reliable method.
This method provides more info about nodes (e.g. country code), and
with it we are able to add two dedicated methods to classify pieces:
* OutOfPlacementPieces
* PiecesNodesLastNetsInOrder
With these new methods we will fix the issue where offline but reliable
nodes are not checked for clumped pieces and out-of-placement pieces.
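A minimal sketch of what the two classification helpers could look like, assuming simplified local types in place of the real overlay and metabase types; only the method names come from this change, everything else is illustrative.

```go
package reliability

// Sketch only: simplified stand-ins for the real node and piece types.
type NodeID [32]byte

type Piece struct {
	Number      uint16
	StorageNode NodeID
}

type cachedNode struct {
	LastNet     string
	CountryCode string
	Online      bool
	InPlacement bool // whether the node satisfies the segment's placement rule
}

type ReliabilityCache struct {
	// Refreshed periodically from the refactored overlay Reliable call, which
	// now also includes offline-but-reliable nodes and their metadata.
	nodes map[NodeID]cachedNode
}

// OutOfPlacementPieces returns the pieces whose node does not satisfy the
// placement constraint, whether the node is online or offline.
func (c *ReliabilityCache) OutOfPlacementPieces(pieces []Piece) []Piece {
	var out []Piece
	for _, piece := range pieces {
		if node, ok := c.nodes[piece.StorageNode]; ok && !node.InPlacement {
			out = append(out, piece)
		}
	}
	return out
}

// PiecesNodesLastNetsInOrder returns each piece's node last_net in piece
// order, so the caller can detect pieces clumped into the same network.
func (c *ReliabilityCache) PiecesNodesLastNetsInOrder(pieces []Piece) []string {
	lastNets := make([]string, len(pieces))
	for i, piece := range pieces {
		lastNets[i] = c.nodes[piece.StorageNode].LastNet
	}
	return lastNets
}
```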
https://github.com/storj/storj/issues/5998
Change-Id: I9ffbed9f07f4881c9db3bd0e5f0412f1a418dd82
Currently we are using Reliable to get missing pieces for the repair
checker. The issue is that the checker now looks at more things than
just missing pieces (clumped and out-of-placement pieces), and using
only node IDs is not enough. We have an issue where offline nodes are
skipped in the clumped and out-of-placement pieces checks.
Reliable was refactored to return data (e.g. country, lastNet) about all
reliable nodes. The list is split into online and offline nodes. This data
will be cached for quick use by the repair checker. It will also be possible
to check node metadata like country code or lastNet.
We are also slowly moving the `RepairExcludedCountryCodes` config from
overlay to repair, which makes more sense for it.
This is the first part of the changes.
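A hedged sketch of the shape the split result could take and of how the country metadata could let the repair side apply `RepairExcludedCountryCodes` itself; the type and function names are illustrative, not the actual satellite API.

```go
package checker

// Sketch only: the refactored Reliable returns reliable nodes split into
// online and offline sets, each carrying metadata such as country and
// lastNet. Names here are illustrative.
type ReliableNode struct {
	ID          string
	LastNet     string
	CountryCode string
}

type ReliableResult struct {
	Online  []ReliableNode
	Offline []ReliableNode
}

// withoutExcludedCountries drops nodes whose country is in the
// RepairExcludedCountryCodes list, so the exclusion can live in the repair
// config instead of overlay.
func withoutExcludedCountries(nodes []ReliableNode, excluded map[string]bool) []ReliableNode {
	kept := make([]ReliableNode, 0, len(nodes))
	for _, node := range nodes {
		if !excluded[node.CountryCode] {
			kept = append(kept, node)
		}
	}
	return kept
}
```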
https://github.com/storj/storj/issues/5998
Change-Id: If534342488c0e440affc2894a8fbda6507b8959d
We use two different Node types in `overlay` and `uploadnodeselection` and convert back and forth between them.
Using the same object would allow us to use a unified node selection interface everywhere.
Change-Id: Ie71e29d60184ee0e5b4547eb54325f09c418f73c
Currently we are using KnownUnreliableOrOffline to get missing pieces
for the segment repairer (GetMissingPieces). The issue is that the
repairer now looks at more things than just missing pieces (clumped and
out-of-placement pieces).
KnownReliable was refactored to return data (e.g. country, lastNet) about
all reliable nodes from the provided list. The list is split into online and
offline nodes. This way we will be able to use the results from this method
for all checks: missing pieces, clumped pieces, and out-of-placement pieces.
This is the first part of the changes to handle different kinds of pieces in
the segment repairer.
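A rough sketch of how the online/offline split might feed the missing-piece and placement checks in the segment repairer; all names are simplified stand-ins for the real types.

```go
package repairer

// Sketch only: simplified stand-in for the real overlay node data.
type reliableInfo struct {
	LastNet string
	Country string
}

// classifyPieces treats a piece as missing only when its node is in neither
// the online nor the offline reliable set, while pieces on offline-but-
// reliable nodes still take part in the placement check.
func classifyPieces(
	pieces map[uint16]string, // piece number -> node ID
	online, offline map[string]reliableInfo,
	countryAllowed func(country string) bool,
) (missing, outOfPlacement []uint16) {
	for num, nodeID := range pieces {
		info, ok := online[nodeID]
		if !ok {
			if info, ok = offline[nodeID]; !ok {
				missing = append(missing, num)
				continue
			}
		}
		if !countryAllowed(info.Country) {
			outOfPlacement = append(outOfPlacement, num)
		}
	}
	return missing, outOfPlacement
}
```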
https://github.com/storj/storj/issues/5998
Change-Id: I6cbaf59cff9d6c4346ace75bb814ccd985c0e43e
For now we will use bucket placement to determine if we should exclude
some node IPs from metainfo.GetObjectIPs results. The bucket placement is
retrieved directly from the DB, in parallel with the metabase
GetStreamPieceCountByNodeID request.
GetObjectIPs is not heavily used, so an additional DB request shouldn't
be a problem for now.
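A small sketch of running the two lookups concurrently with errgroup; the closure parameters are hypothetical stand-ins for the real bucket DB and metabase calls.

```go
package metainfo

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// Sketch only: fetch the bucket placement and the per-node piece counts in
// parallel. Both function parameters stand in for the real bucket DB lookup
// and the metabase GetStreamPieceCountByNodeID call.
func fetchPlacementAndPieceCounts(
	ctx context.Context,
	getPlacement func(ctx context.Context) (int, error),
	getPieceCounts func(ctx context.Context) (map[string]int64, error),
) (placement int, counts map[string]int64, err error) {
	group, groupCtx := errgroup.WithContext(ctx)

	group.Go(func() error {
		var err error
		placement, err = getPlacement(groupCtx)
		return err
	})
	group.Go(func() error {
		var err error
		counts, err = getPieceCounts(groupCtx)
		return err
	})

	if err := group.Wait(); err != nil {
		return 0, nil, err
	}
	return placement, counts, nil
}
```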
https://github.com/storj/storj/issues/5950
Change-Id: Idf58b1cfbcd1afff5f23868ba2f71ce239f42439
We would like to verify whether nodes match a specific placement, e.g. to
validate that segment pieces are correctly geofenced.
https://github.com/storj/storj/issues/5896
Change-Id: I842767dccc121a3c60224f677ab55e5dc150c76e
Methods SelectAllStorageNodesUpload and SelectAllStorageNodesDownload
are not returning full info in overlay.SelectedNode because the
CountryCode is missing.
Change-Id: Ie3cb396bf28d7ec4c6ab8927e5bb560236036aa6
We were using the UploadSelectionCache previously, which does _not_ have
all nodes, or even all online nodes, in it. So all nodes with less than
MinimumVersion, or with less than MinimumDiskSpace, or nodes suspended
for unknown audit errors, or nodes that have started graceful exit, were
all missing, and ended up having empty last_nets. Even with all that,
I'm kind of surprised how many nodes this involved, but using the upload
selection cache was definitely wrong.
This change uses the download selection cache instead, which excludes
nodes only when they are disqualified, gracefully exited (completely),
or offline.
Change-Id: Iaa07c988aa29c1eb05796ac48a6f19d69f5826c1
The query for GetNodesNetworkInOrder is causing far too much load on the
database. Since it is not critical that the repair checker have
perfectly up-to-date node network information, we can use a cache
instead.
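A hedged sketch of the caching idea: keep the node-to-last_net mapping in a small TTL cache and answer GetNodesNetworkInOrder-style lookups from it, accepting slightly stale data. Names and types are illustrative.

```go
package checker

import (
	"context"
	"sync"
	"time"
)

// Sketch only: instead of querying the DB for every segment, refresh a
// node ID -> last_net map at most once per TTL.
type networksCache struct {
	mu      sync.Mutex
	fetched time.Time
	ttl     time.Duration
	nets    map[string]string // node ID -> last_net
	refresh func(ctx context.Context) (map[string]string, error)
}

// NetworksInOrder returns the last_net for each requested node, refreshing
// the cached map only when it is older than the TTL.
func (c *networksCache) NetworksInOrder(ctx context.Context, nodeIDs []string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.nets == nil || time.Since(c.fetched) >= c.ttl {
		nets, err := c.refresh(ctx)
		if err != nil {
			return nil, err
		}
		c.nets, c.fetched = nets, time.Now()
	}

	ordered := make([]string, len(nodeIDs))
	for i, id := range nodeIDs {
		ordered[i] = c.nets[id]
	}
	return ordered, nil
}
```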
Change-Id: I07ad45bfdeb46529da093941a06c2da8a00ce878
We avoid putting more than one piece of a segment on the same /24
network (or /64 for ipv6). However, it is possible for multiple pieces
of the same segment to move to the same network over time. Nodes can
change addresses, or segments could be uploaded with dev settings, etc.
We will call such pieces "clumped", as they are clumped into the same
net, and are much more likely to be lost or preserved together.
This change teaches the repair checker to recognize segments which have
clumped pieces, and put them in the repair queue. It also teaches the
repair worker to repair such segments (treating clumped pieces as
"retrievable but unhealthy"; i.e., they will be replaced on new nodes if
possible).
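A minimal sketch of the kind of check the repair checker needs to recognize clumped pieces, assuming the caller already has each piece's node network; names are illustrative.

```go
package checker

// Sketch only: lastNets[i] is the last_net of the node holding piece i; an
// empty string means the network is unknown (e.g. the node is missing).
func clumpedPieces(lastNets []string) []int {
	firstPieceOnNet := make(map[string]int)
	var clumped []int
	for pieceIdx, lastNet := range lastNets {
		if lastNet == "" {
			continue
		}
		if _, seen := firstPieceOnNet[lastNet]; seen {
			// Every piece after the first on a given network is clumped and
			// therefore "retrievable but unhealthy".
			clumped = append(clumped, pieceIdx)
		} else {
			firstPieceOnNet[lastNet] = pieceIdx
		}
	}
	return clumped
}
```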
Refs: https://github.com/storj/storj/issues/5391
Change-Id: Iaa9e339fee8f80f4ad39895438e9f18606338908
We have a specific issue that a user uploaded a file to a bucket
geo-fenced to EU and one of the pieces appeared to be on a node in the
US. The country code of this node is set to SE (Sweden) in the satellite
DB. It turns out that some time ago MaxMind changed the country code of
this node's IP from Sweden to US, but this change hasn't been reflected
in the satellite's database.
So far the satellite updates the country code of a node only if its IP
changes. It was assumed that if the IP does not change, its country code
shouldn't change either. This turned out to be a wrong assumption.
With this change, the satellite will look up the MaxMindDB on every
check-in to see if the country code of the node's IP has changed.
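A hedged sketch of the check-in-time refresh; the lookup and update callbacks are hypothetical stand-ins for the MaxMind geoIP reader and the nodes-table update.

```go
package overlay

// Sketch only: re-resolve the country code for the node's IP on every
// check-in and persist it when it differs from what is stored.
func refreshCountryCode(
	storedCode, ip string,
	lookup func(ip string) (countryCode string, err error),
	update func(code string) error,
) error {
	code, err := lookup(ip)
	if err != nil {
		return err
	}
	if code != storedCode {
		// The IP did not change, but the geoIP database now maps it to a
		// different country, so persist the new code.
		return update(code)
	}
	return nil
}
```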
Change-Id: Icdf659b09be9fc6ad14601902032b35ba5ea78c4
This test involves a satellite with dev defaults (DistinctIP=no) being
upgraded past commit 2522ff09b6, which
means we need to run the dev-defaults-satellite-upgrade migration SQL
to avoid getting DistinctIP=yes behavior (which breaks the tests).
Change-Id: I29fb596d1ffa568dad635d98cfe9abacd3aaa48f
It was surprising that `satellite auditor` complained about SMTP mail settings, even though it's not supposed to send any mail.
It looks like we can remove the mail service dependency, as it's not a hard requirement for overlay.Service.
Change-Id: I29a52eeff3f967ddb2d74a09458dc0ee2f051bd7
Up to now, we have been implementing the DistinctIP preference with code
in two places:
1. On check-in, the last_net is determined by taking the /24 or /64
(in ResolveIPAndNetwork()) and we store it with the node record.
2. On node selection, a preference parameter defines whether to return
results that are distinct on last_net.
It can be observed that we have never yet had the need to switch from
DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the
same satellite, and we will probably never need to do so in an automated
way. It can also be observed that this arrangement makes tests more
complicated, because we often have to arrange for test nodes to have IP
addresses in different /24 networks (a particular pain on macOS).
Those two considerations, plus some pending work on the repair framework
that will make repair take last_net into consideration, motivate this
change.
With this change, in the #2 place, we will _always_ return results that
are distinct on last_net. We implement the DistinctIP preference, then,
by making the #1 place (ResolveIPAndNetwork()) more flexible. When
DistinctIP is enabled, last_net will be calculated as it was before. But
when DistinctIP is _off_, last_net can be the same as address (IP and
port). That will effectively implement !DistinctIP because every
record will have a distinct last_net already.
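A minimal sketch of the #1 place becoming more flexible, assuming a simplified helper in place of the real ResolveIPAndNetwork(); this shows only the network-derivation part.

```go
package overlay

import "net"

// Sketch only: derive last_net according to the DistinctIP preference.
func lastNetFor(ip net.IP, port string, distinctIP bool) string {
	if !distinctIP {
		// Every record gets a unique last_net (IP and port), so the
		// always-distinct node selection effectively behaves as !DistinctIP.
		return net.JoinHostPort(ip.String(), port)
	}
	if ipv4 := ip.To4(); ipv4 != nil {
		return ipv4.Mask(net.CIDRMask(24, 32)).String() // /24 for IPv4
	}
	return ip.Mask(net.CIDRMask(64, 128)).String() // /64 for IPv6
}
```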
As a side effect, this flexibility will allow us to change the rules
about last_net construction arbitrarily. We can do tests where last_net
is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and
be able to exercise the production logic without requiring a virtual
network bridge.
This change should be safe to make without any migration code, because
all known production satellite deployments use DistinctIP, and the
associated last_net values will not change for them. They will only
change for satellites with !DistinctIP, which are mostly test
deployments that can be recreated trivially. For those satellites which
are both permanent and !DistinctIP, node selection will suddenly start
acting as though DistinctIP is enabled, until the operator runs a single
SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done
either before or after deploying software with this change.
I also assert that this will not hurt performance for production
deployments. It's true that adding the distinct requirement to node
selection makes things a little slower, but the distinct requirement is
already present for all production deployments, and they will see no
change.
Refs: https://github.com/storj/storj/issues/5391
Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924
This change uses the new storj/common noise helpers, which:
* add a security fix (require an expected node id for validating
noise key attestations)
* stop doing an unnecessary order signature validation (it has
already been done inside of PutPiece)
* remove some duplicate code
Change-Id: I5e67a08ff216cd9c5b0b82e40b4d9de664b6b0fc
We will be needing an infrequent chore to check which nodes are in the
reverify queue and synchronize that set with the 'contained' field in
the nodes db, since it is easily possible for them to get out of sync.
(We can't require that the reverification queue table be in the same
database as the nodes table, so maintaining consistency with SQL
transactions is out. Plus, even if they were in the same database, using
such SQL transactions to maintain consistency would be slow and
unwieldy.)
This commit adds a method to the overlay allowing the caller to set the
contained status of all nodes in the nodes table at once. This is valid
because our definition of "contained" now depends solely on whether a
node appears at least once in the reverification queue. Only rows whose
contained field does not match the expectation will be updated; the
contained timestamp will not be updated for a node which is supposed to
be contained and was already contained.
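A hedged sketch of what a set-all-at-once method could look like; the SQL, package, and method names are illustrative, not the production query.

```go
package overlay

import (
	"context"
	"database/sql"

	"github.com/lib/pq"
)

// Sketch only: mark exactly the given nodes as contained and clear the flag
// everywhere else, touching only rows whose current value disagrees. Nodes
// that stay contained are never selected for update, so their contained
// timestamp is preserved.
const setAllContainedSQL = `
	UPDATE nodes
	SET contained = CASE WHEN id = ANY($1::bytea[]) THEN now() ELSE NULL END
	WHERE (id = ANY($1::bytea[])) != (contained IS NOT NULL)
`

type containmentSync struct {
	db *sql.DB
}

// SetAllContainedNodes is a hypothetical method name for the overlay DB.
func (c *containmentSync) SetAllContainedNodes(ctx context.Context, containedIDs [][]byte) error {
	_, err := c.db.ExecContext(ctx, setAllContainedSQL, pq.ByteaArray(containedIDs))
	return err
}
```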
Change-Id: I8cabe56ad897b6027e11aa5b17175295391aa3ac
We have two more fields in the database (noise_proto and
noise_public_key) that now need to go into pb.NodeAddress when
returning AddressedOrderLimits.
The only real complication is making sure that type conversions between
database types and NodeURLs and so on don't lose this new
pb.NodeAddress field (NoiseInfo). Otherwise this is a relatively
straightforward commit.
Change-Id: I45b59d7b2d3ae21c2e6eb95497f07cd388d454b3
The CASE expression used to determine which value to set
last_software_update_email to did not have an ELSE clause. Therefore,
when the node is below the minimum version but did not receive a
version update email (i.e. no condition is true), the value would be set
to NULL.
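An illustrative fragment of the general fix (the actual conditions and bind parameters differ); the point is that adding an ELSE keeps the previous value instead of letting the CASE fall through to NULL when no WHEN branch matches.

```go
package nodesdb

// Sketch only: illustrative SQL, not the actual query.
const updateLastEmailSQL = `
	UPDATE nodes
	SET last_software_update_email = CASE
		WHEN $1::bool THEN $2::timestamptz  -- a version update email was sent at $2
		ELSE last_software_update_email     -- otherwise keep the current value
	END
	WHERE id = $3
`
```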
Additionally, replace `time.Now()` with `timestamp` in the check to
determine if the email cooldown has passed.
Change-Id: I2e2e93f1a865e123ed8b665be9621cebfb72236f
`overlay.(*Service).UpdateReputation()` takes a "reputationChanges"
parameter, a slice of node events indicating whether we think the node's
disqualification or suspension status is changing. This is necessary so
that the overlay service can notify the nodeevents DB about these
changes.
In several cases, however, this list of events is not constructed
correctly, because of missing information about the previous state.
In most cases, this is because the node was offline, and the order limit
creation functions (which usually obtain and return the prior reputation
status) ignored that node.
This change makes it so that all callers to
`overlay.(*Service).UpdateReputation()` can be expected to provide a
correct list of change events (as correct as feasible, given that we
can't lock the node's information in the database during the entire
operation).
It ended up that there was only one caller we needed to worry about, and
that was reputation.(*Service).ApplyAudit(). So the bulk of this change
is teaching that function how to recognize when the prior reputation
status was not filled in, and fill it in.
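A hedged sketch of the shape that fix could take inside ApplyAudit; the types, callbacks, and event strings are illustrative stand-ins, not the real reputation API.

```go
package reputation

// Sketch only: Known is false when the caller had no prior information.
type priorStatus struct {
	Known        bool
	Disqualified bool
	Suspended    bool
}

// applyAudit fills in the prior status from the DB when the caller could not
// provide it (e.g. the node was offline during order limit creation), so the
// list of change events passed on to the overlay is correct.
func applyAudit(
	prior priorStatus,
	loadPrior func() (priorStatus, error),
	nowDisqualified, nowSuspended bool,
) (events []string, err error) {
	if !prior.Known {
		if prior, err = loadPrior(); err != nil {
			return nil, err
		}
	}
	if !prior.Disqualified && nowDisqualified {
		events = append(events, "disqualified")
	}
	if !prior.Suspended && nowSuspended {
		events = append(events, "suspended")
	}
	return events, nil
}
```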
Refs: https://github.com/storj/storj/issues/5464
Change-Id: I52ce385fc9c0ce3b283b998d517998e7f4ec8792
SetNodeContained() will change the contained flag in the nodes table,
which will affect whether nodes are selected for new uploads. This flag
_should_ correlate with whether or not a given node has any entries in
the reverification queue. However, the reverification queue is intended
to be 'safely partitionable' from the nodes table, so we can't enforce
that characteristic transactionally. But this is ok; there are no dire
consequences if they are out of sync.
We will be adding a chore that updates the contained flag based on the
contents of the reverification queue periodically, if something fails
to set it directly when appropriate.
Refs: https://github.com/storj/storj/issues/5231
Change-Id: I26460d8718dee63fd55d00a44568b2065fc8fe30
Add a new chore to periodically insert into node events those nodes that
are offline and have not gotten an offline email in a certain amount of
time.
Change-Id: I658b385bb777b0240c98092946a93d65bee94abc
Add LastOfflineEmail to overlay.NodeDossier. This is the last time a
node got an offline email. Add two new overlay db methods,
GetOfflineNodesForEmail and UpdateLastOfflineEmail. Edit db method
UpdateCheckIn to nullify last_offline_email if the node is up.
Change-Id: I1ee60e7d98dd1b68348a57f9a4fb77c6c9895d6d
When a node checks in and its version is below the minimum, insert a
BelowMinVersion event into node events.
Change-Id: I0e437ac34496778369515cbc40c15676da8b27ae
Instead of sending emails at the time the node is seen to be back
online, we have decided to send the event to the node events table,
which will initiate the email sending process at some point.
Change-Id: Id756209498112579de8e78ee20ad2df54571a617
Add nodeevents.DB to satellite overlay service so we can insert node
events into the nodeevents DB.
Change-Id: I642c0ccc9941ecdb08cb22d5c8cf701959a55156
We want to send emails to SNOs. Node status changes go through the
overlay service, so it's a good place to add the mail service.
Add the mailservice.Service, satellite address, and satellite name to
the overlay service. Also add the feature flag --overlay.send-node-emails.
Change-Id: I3bd2cb3bf22f9724954ce2374f8b651b902b3a24
Piece counts in the DB are stored as int64, and we would like to align bloom
filter processing with this type.
Change-Id: Iaec767e609a40d802077ae057520541805a7c44f
Currently, the satellite tracks connectivity information about all nodes
that have contacted it, even if we have never successfully contacted
the node back.
This behavior was leveraged during a security audit to create hundreds
of thousands of "junk nodes" in the nodes table on one satellite, which
affected performance of queries such as node selection.
With this change, we should no longer track information about nodes that
have never been successfully contacted.
Note that it will still be possible to cause the creation of "junk
node" entries in the db; the attacker just has to set up individual
publicly-routable IP+port pairs for each node as it is created, so it
can respond to a PingBack.
Change-Id: Ibb6da6cc908fd4fc85aae1ba00313ba2738409ab
This is in response to community feedback that our existing reputation
calculation is too likely to disqualify storage nodes unfairly with
extreme swings up and down.
For details and analysis, please see the data_loss_vs_dq_chance_sim.py
tool, the "tuning reputation further.ipynb" Jupyter notebook in the
storj/datascience repository, and the discussion at
https://forum.storj.io/t/tuning-audit-scoring/14084
In brief: changing the lambda and initial-alpha parameters in this way
causes the swings in reputation to be smaller and less likely to put a
node past the disqualification threshold unfairly.
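A hedged sketch of the kind of beta-reputation update the lambda and initial-alpha parameters tune; the exact production formula may differ, this just shows why the new values produce smaller swings.

```go
package reputation

// Sketch only: v is +1 for a successful audit and -1 for a failure, w is the
// audit weight, lambda is the forgetting factor.
func updateScore(alpha, beta, lambda, w, v float64) (newAlpha, newBeta, score float64) {
	newAlpha = lambda*alpha + w*(1+v)/2
	newBeta = lambda*beta + w*(1-v)/2
	score = newAlpha / (newAlpha + newBeta)
	return newAlpha, newBeta, score
}
```

With an initial alpha around 1000 instead of a small value, a single failed audit barely moves the alpha/(alpha+beta) ratio, which is the smaller, less unfair swing described above.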
Note: this change will cause a one-time reset of all (non-disqualified)
node reputations, because the new initial alpha value of 1000 is
dramatically different, and the disqualification threshold is going to
be much higher.
Change-Id: Id6dc4ba8fde1be3db4255b72282207bab5491ca3