storj

Author	SHA1	Message	Date
Márton Elek	ad87d1de74	satellite/satellitedb/overlaycache: fill node tags with join for limited number of nodes The easiest way to get node information WITH node tags is executing two queries: 1. select all nodes 2. select all tags And we can pair them with a loop, using the in-memory data structures. But this approach works only if we select all nodes, which is true when we use the cache (upload, download, repair checker). But the repair process selects only the required nodes, where this approach is suboptimal (full table scan for all tags, even if we need only tags for a few dozen nodes). Possible solutions: 1. We can introduce a cache for repair (similar to upload cache) 2. Or we can select both node and tag information with one query (join). This patch implements the second approach. Note: repair itself is quite slow (10-20 seconds per segment to repair). With 15 seconds execution time and 3 minutes cache staleness, we would use the cache only 12 times per worker. Probably we don't need a cache for now. https://github.com/storj/storj/issues/6198 Change-Id: I0364d94306e9815a1c280b71e843b8f504e3d870	2023-09-07 19:27:53 +02:00
Márton Elek	e2006d821c	satellite/overlay: change Reliable and KnownReliable as GetParticipatingNodes and GetNodes, respectively. We now want these functions to include offline and suspended nodes as well, so that we can force immediate repair when pieces are out of placement or in excluded countries. With that change, the old names no longer made sense. Change-Id: Icbcbad43dbde0ca8cbc80a4d17a896bb89b078b7	2023-09-02 23:34:50 +00:00
Márton Elek	de7aabc8c9	satellite/{repair,rangedloop,overlay}: fix node tag placement selection for repair This patch fixes the node tag based placement of rangedloop/repairchecker + repair process. The main change is just adding the node tags for Reliable and KnownReliable database calls + adding new tests to prove it works. https://github.com/storj/storj/issues/6126 Change-Id: I245d654a18c1d61b2c72df49afa0718d0de76da1	2023-08-16 15:45:41 +00:00
Egon Elbre	dc41978743	all: fix golangci failures Change-Id: I07421388d53c837e35a4727cead26fc21c324d04	2023-08-09 11:44:44 +03:00
paul cannon	6e46a926bb	satellite/nodeselection: expand SelectedNode In the repair subsystem, it is necessary to acquire several extra properties of nodes that are holding pieces of things or may be selected to hold pieces. We need to know if a node is 'online' (the definition of "online" may change somewhat depending on the situation), if a node is in the process of graceful exit, and whether a node is suspended. We can't just filter out nodes with all of these properties, because sometimes we need to know properties about nodes even when the nodes are suspended or gracefully exiting. I thought the best way to do this was to add fields to SelectedNode, and (to avoid any confusion) arrange for the added fields to be populated wherever SelectedNode is returned, whether or not the new fields are necessarily going to be used. If people would rather I use a separate type from SelectedNode, I can do that instead. Change-Id: I7804a0e0a15cfe34c8ff47a227175ea5862a4ebc	2023-08-07 12:44:49 +00:00
Michal Niewrzal	1d62dc63f5	satellite/repair/repairer: fix NumHealthyInExcludedCountries calculation Currently, we have an issue where, while counting unhealthy pieces, we count twice a piece which is in an excluded country and is outside the segment placement. This can cause unnecessary repair. This change also takes another step to move RepairExcludedCountryCodes from overlay config into repair package. Change-Id: I3692f6e0ddb9982af925db42be23d644aec1963f	2023-07-10 12:01:19 +02:00
Márton Elek	70cdca5d3c	satellite: move satellite/nodeselection/uploadselection => satellite/nodeselection All the files in uploadselection are (in fact) related to generic node selection, and used not only for upload, but for download, repair, etc... Change-Id: Ie4098318a6f8f0bbf672d432761e87047d3762ab	2023-07-07 10:32:03 +02:00
Márton Elek	8b4387a498	satellite/satellitedb: add tag information to nodes selected for upload/downloads Change-Id: I0fa7daebcf83f7949726e5fffe68e0bdc6fd1d7a	2023-07-07 07:54:16 +00:00
Michal Niewrzal	f2cd7b0928	satellite/overlay: refactor Reliable to be used with repair checker Currently we are using Reliable to get missing pieces for repair checker. The issue is that now checker is looking at more things than just missing pieces (clumped/off placement pieces) and using only node ID is not enough. We have an issue where we are skipping offline nodes from clumped and off placement pieces check. Reliable was refactored to get data (e.g. country, lastNet) about all reliable nodes. List is split into online and offline. This data will be cached for quick use by repair checker. It will be also possible to check nodes metadata like country code or lastNet. We are also slowly moving `RepairExcludedCountryCodes` config from overlay to repair which makes more sense for it. This is the first part of changes. https://github.com/storj/storj/issues/5998 Change-Id: If534342488c0e440affc2894a8fbda6507b8959d	2023-07-05 10:56:31 +02:00
Márton Elek	500b6244f8	satellite/satellitedb: create table for node tags Change-Id: I884bb740974e6b8241aa6b85faf266b85fe892d4	2023-07-05 09:38:53 +02:00
Márton Elek	d38b8fa2c4	satellite/nodeselection: use the same Node object from overlay and nodeselection We use two different Node types in `overlay` and `uploadnodeselection` and converting back and forth. Using the same object would allow us to use a unified node selection interface everywhere. Change-Id: Ie71e29d60184ee0e5b4547eb54325f09c418f73c	2023-07-03 16:59:33 +00:00
Michal Niewrzal	98f4f249b2	satellite/overlay: refactor KnownReliable to be used with repairer Currently we are using KnownUnreliableOrOffline to get missing pieces for segment repairer (GetMissingPieces). The issue is that now repairer is looking at more things than just missing pieces (clumped/off placement pieces). KnownReliable was refactored to get data (e.g. country, lastNet) about all reliable nodes from provided list. List is split into online and offline. This way we will be able to use results from this method for all checks: missing pieces, clumped pieces, out of placement pieces. This is the first part of changes to handle different kinds of pieces in segment repairer. https://github.com/storj/storj/issues/5998 Change-Id: I6cbaf59cff9d6c4346ace75bb814ccd985c0e43e	2023-06-27 13:27:23 +02:00
Michal Niewrzal	eb407b2ae3	satellite/overlay: delete unused KnownOffline method Change-Id: Ief9288fee83f9c381dd7840f48333babcd3d6bf7	2023-06-23 13:24:30 +00:00
Michal Niewrzal	9e3fd4d514	satellite/overlay: delete unused method Change-Id: I87828fcac4f4a9fb08c86af188aa6ea28c5c64af	2023-06-22 12:45:59 +00:00
JT Olio	1437257dbf	satellite: save and return which node features are enabled current feature is if tcp fastopen was successfully enabled Change-Id: Ide251863a9790b0fbebdf2e82dfd2afa8f25c408	2023-06-06 21:13:29 +00:00
Michal Niewrzal	c48bd81e5f	satellite/satellitedb: update SelectAllStorageNodes* to set country code Methods SelectAllStorageNodesUpload and SelectAllStorageNodesDownload are not returning full info with overlay.SelectedNode because it's missing CountryCode. Change-Id: Ie3cb396bf28d7ec4c6ab8927e5bb560236036aa6	2023-05-26 11:02:29 +00:00
paul cannon	915f3952af	satellite/repair: repair pieces on the same last_net We avoid putting more than one piece of a segment on the same /24 network (or /64 for ipv6). However, it is possible for multiple pieces of the same segment to move to the same network over time. Nodes can change addresses, or segments could be uploaded with dev settings, etc. We will call such pieces "clumped", as they are clumped into the same net, and are much more likely to be lost or preserved together. This change teaches the repair checker to recognize segments which have clumped pieces, and put them in the repair queue. It also teaches the repair worker to repair such segments (treating clumped pieces as "retrievable but unhealthy"; i.e., they will be replaced on new nodes if possible). Refs: https://github.com/storj/storj/issues/5391 Change-Id: Iaa9e339fee8f80f4ad39895438e9f18606338908	2023-04-06 17:34:25 +00:00
JT Olio	2a63225b98	satellite/{contact,satellitedb}: preserve node message debounce support Change-Id: I453ad35fda4e61d068db0c476dd86b50d7f2d705	2023-03-20 16:13:06 +00:00
paul cannon	97e20bc579	scripts/tests: fix rollingupgrade test even more This might be pretty awful, but at least it is a complete and non-flaky solution. Only when using the rollingupgrade test (which implies a throwaway satellite and also a PostgreSQL backend), create a trigger on the nodes table which forces last_net to be equal to last_ip_port always. Change-Id: I8448cf131e46576d96a414d06780270c7b2b1892	2023-03-13 15:49:07 +00:00
paul cannon	fd6ce6b9a5	scripts/tests: fix test-sim-rolling-upgrade.sh This test involves a satellite with dev defaults (DistinctIP=no) being upgraded past commit `2522ff09b6`, which means we need to run the dev-defaults-satellite-upgrade migration SQL to avoid getting DistinctIP=yes behavior (which breaks the tests). Change-Id: I29fb596d1ffa568dad635d98cfe9abacd3aaa48f	2023-03-09 23:35:36 +00:00
paul cannon	0e2fef977f	satellite/overlay: add SetAllContainedNodes method to overlay.DB We will be needing an infrequent chore to check which nodes are in the reverify queue and synchronize that set with the 'contained' field in the nodes db, since it is easily possible for them to get out of sync. (We can't require that the reverification queue table be in the same database as the nodes table, so maintaining consistency with SQL transactions is out. Plus, even if they were in the same database, using such SQL transactions to maintain consistency would be slow and unwieldy.) This commit adds a method to the overlay allowing the caller to set the contained status of all nodes in the nodes table at once. This is valid because our definition of "contained" now depends solely on whether a node appears at least once in the reverification queue. Only rows whose contained field does not match the expectation will be updated; the contained timestamp will not be updated for a node which is supposed to be contained and was already contained. Change-Id: I8cabe56ad897b6027e11aa5b17175295391aa3ac	2023-02-06 10:18:54 +00:00
JT Olio	686faeedbd	satellite/overlay: return noise info with selected nodes we have two more fields in the database (noise_proto and noise_public_key) that now need to go into pb.NodeAddress when returning AddressedOrderLimits. the only real complication is making sure type conversions between database types and NodeURLs and so on don't lose this new pb.NodeAddress field (NoiseInfo). otherwise this is a relatively straightforward commit Change-Id: I45b59d7b2d3ae21c2e6eb95497f07cd388d454b3	2023-02-02 15:46:27 +00:00
JT Olio	2753d5a32f	satellite/overlay: keep track of noise info per node Change-Id: Icef04c3e87dbf4bb57d3837274c323bf6dd2c81f	2023-02-01 23:03:35 -05:00
Cameron	0596651580	satellite/satellitedb: fix updating nodes.last_software_update_email The CASE expression used to determine which value to set last_software_update_email to did not have an ELSE clause. Therefore, when the node is both below the minimum version and did not receive a version update email (no condition is true), the value would be set to NULL. Additionally, replace `time.Now()` with `timestamp` in the check to determine if the email cooldown has passed. Change-Id: I2e2e93f1a865e123ed8b665be9621cebfb72236f	2023-02-01 17:25:58 -05:00
paul cannon	b6bcb32ecf	satellite/reputation: more accurate "reputation changes" list `overlay.(Service).UpdateReputation()` takes a "reputationChanges" parameter, a slice of node events indicating whether we think the node's disqualification or suspension status is changing. This is necessary so that the overlay service can notify the nodeevents DB about these changes. In several cases, however, this list of events is not constructed correctly, because of missing information about the previous state. In most cases, this is because the node was offline, and the order limit creation functions (which usually obtain and return the prior reputation status) ignored that node. This change makes it so that all callers to `overlay.(Service).UpdateReputation()` can be expected to provide a correct list of change events (as correct as feasible, given that we can't lock the node's information in the database during the entire operation). It ended up that there was only one caller we needed to worry about, and that was reputation.(*Service).ApplyAudit(). So the bulk of this change is teaching that function how to recognize when the prior reputation status was not filled in, and fill it in. Refs: https://github.com/storj/storj/issues/5464 Change-Id: I52ce385fc9c0ce3b283b998d517998e7f4ec8792	2023-01-31 18:39:40 +00:00
JT Olio	e40191afd6	storj: upgrade to use latest storj/common NodeAddress Change-Id: I5987391bcfe5f6dfd7b525698c337a4cbda9b76e	2023-01-25 01:37:26 +00:00
Fadila Khadar	5c3a148d6e	satellite/overlaycache: fix typo in UpdateCheckIn request - fix a malformed SQL query - add test to be sure we don't have this problem again. Change-Id: I3fde8c59ba01335411e51d964bec95bc26cfc961	2022-12-14 22:21:45 +01:00
paul cannon	ed0fa59f23	satellite/overlay: add SetNodeContained() method SetNodeContained() will change the contained flag in the nodes table, which will affect whether nodes are selected for new uploads. This flag _should_ correlate with whether or not a given node has any entries in the reverification queue. However, the reverification queue is intended to be 'safely partitionable' from the nodes table, so we can't enforce that characteristic transactionally. But this is ok; there are no dire consequences if they are out of sync. We will be adding a chore that updates the contained flag based on the contents of the reverification queue periodically, if something fails to set it directly when appropriate. Refs: https://github.com/storj/storj/issues/5231 Change-Id: I26460d8718dee63fd55d00a44568b2065fc8fe30	2022-12-01 12:43:40 +00:00
Cameron	8681a36164	satellite/overlay: add ability to get offline nodes in need of email Add LastOfflineEmail to overlay.NodeDossier. This is the last time a node got an offline email. Add two new overlay db methods, GetOfflineNodesForEmail and UpdateLastOfflineEmail. Edit db method UpdateCheckIn to nullify last_offline_email if node is up. Change-Id: I1ee60e7d98dd1b68348a57f9a4fb77c6c9895d6d	2022-11-17 19:03:04 +00:00
Cameron	2a25974261	satellite/overlay: insert node software update events into node events When a node checks in and its version is below the minimum, insert BelowMinVersion event into node events Change-Id: I0e437ac34496778369515cbc40c15676da8b27ae	2022-11-11 20:43:56 +00:00
Cameron	cb0c359b81	satellite/overlay: insert DQ node events for stray nodes Change-Id: I99da11e506ab7f6bcebdb08a5815078a3297c932	2022-11-04 15:48:17 +00:00
Cameron	74ddfab810	satellite/overlay: insert DQ event into node events in overlay.DisqualifyNode Also, return node email from overlaycache db DisqualifyNode to be used in node events insertion Change-Id: I41534cf01351c1690c3966a8055c5fe6fcf0d6a6	2022-11-04 15:18:31 +00:00
Cameron	8c8688ca6b	satellite/overlay: return email as part of NodeReputation Make email accessible for email sending after reputation updates Change-Id: I760feee0f6ca58b76a2955a04c0c366c618656bb	2022-10-26 17:33:22 +00:00
Michal Niewrzal	a22e6bdf67	satellite/gc/bloomfilter: use int64 to count pieces Pieces count in DB are stored as int64 and we would like to align bloom filter processing with this type. Change-Id: Iaec767e609a40d802077ae057520541805a7c44f	2022-09-22 09:39:53 +00:00
JT Olio	12ef68bdf4	satellitedb: restore-trash shouldn't bother with nodes we've never talked to Change-Id: I3c631e92a9a2670c52c9fa23b2b1baf67d3c9a9c	2022-08-25 16:14:29 +00:00
Egon Elbre	65b5a0fe82	satellite/{overlay,satellitedb}: add AS OF to download selection This should reduce the load caused by download selection queries. Change-Id: Ic1a89d9c3eb5418f0792eb20ec2aece18dc63f2c	2022-06-28 05:18:01 +00:00
paul cannon	aa728bd6ea	satellite/satellitedb: add reputations.disqualification_reason We added nodes.disqualification_reason recently, but we didn't add a corresponding column in the reputations table (despite having a corresponding `disqualified` column there). Without this change, the (very useful and informative) assignments to updateFields.DisqualificationReason in reputations.go have no effect. Refs: https://github.com/storj/storj/issues/4601 Change-Id: I77404902ca64b56aed72f1de76b303fe82b76aab	2022-05-17 10:09:36 -05:00
Yaroslav Vorobiov	3f47d19aa6	satellite/overlay: add disqualification reason Add disqualification reason to NodeDossier. Extend DB.DisqualifyNode with disqualification reason. Extend reputation Service.TestDisqualifyNode with disqualification reason. Change-Id: I8611b6340c7f42ac1bb8bd0fd7f0648ad650ab2d	2022-04-20 13:29:31 +00:00
Yaroslav Vorobiov	4223fa01f8	satellite/reputation: add disqualification reason for status update Set disqualification reason when reputations stats are updated on DB.Update. Added tests for DisqualifyNode and for disqualification cases which happens during Update. Change-Id: I00130ab5d9722422805159ad2f183c205de60f7e	2022-04-20 13:29:10 +00:00
Fadila Khadar	29fd36a20e	satellite/repairer: handle excluded countries For nodes in excluded areas, we don't necessarily want to remove them from the pointer, but we do want to increase the number of pieces in the segment in case those excluded area nodes go down. To do that, we increase the number of pieces repaired by the number of pieces in excluded areas. Change-Id: I0424f1bcd7e93f33eb3eeeec79dbada3b3ea1f3a	2022-03-14 10:59:36 -04:00
Fadila Khadar	e776c65172	satellite/checker: pieces in excluded countries are not healthy Add a RepairExcludedCountryCodes config flag for overlay for providing a list of country codes to exclude nodes from target repair selection. Mark segments with less than repairThreshold pieces in countries not in the RepairExcludedCountryCodes as not healthy. With this change, the repair process is not affected. The segment will be removed from the repair queue by the repairer. Another change will handle the logic at the repairer level. Fixes https://github.com/storj/team-metainfo/issues/95 Change-Id: I9231b32de117a116488de055a3e94efcabb46e81	2022-03-02 09:59:09 +00:00
Yingrong Zhao	1f8f7ebf06	satellite/{audit, reputation}: fix potential nodes reputation status inconsistency The original design had a flaw which can potentially cause discrepancy for nodes reputation status between reputations table and nodes table. In the event of a failure(network issue, db failure, satellite failure, etc.) happens between update to reputations table and update to nodes table, data can be out of sync. This PR tries to fix above issue by passing through node's reputation from the beginning of an audit/repair(this data is from nodes table) to the next update in reputation service. If the updated reputation status from the service is different from the existing node status, the service will try to update nodes table. In the case of a failure, the service will be able to try update nodes table again since it can see the discrepancy of the data. This will allow both tables to be in-sync eventually. Change-Id: Ic22130b4503a594b7177237b18f7e68305c2f122	2022-01-06 21:05:59 +00:00
Márton Elek	76c2228fbd	satellite/metainfo: propagate geofencing between buckets and stream id Github: https://github.com/storj/storj/issues/4245 Change-Id: I83d34367aab1f3c0d46a044f54980b2d50174b19	2021-11-24 08:05:05 +00:00
Mya	bf51c286d9	satellite/geoip: update node check-in to associate a country code Resolves https://github.com/storj/storj/issues/4247 Change-Id: Idfd71bf1795d48ca3c686066bbdb95b9c6594f00	2021-11-10 16:44:41 +01:00
Cameron Ayer	1de8a695e8	satellite/{overlay,satellitedb}: fix stray nodes DQ bug We had a bug in the stray nodes chore where nodes who had not been seen in several months were not being DQd. We figured out that this was happening because we were using two queries: The first to grab nodes where last_contact_success < some cutoff, the second to DQ them unless last_contact_success == '0001-01-01 00:00:00+00'. The problem is that if all of the nodes returned from the first query had last_contact_success of '0001-01-01 00:00:00+00', we would pass them to the second query which would not DQ them. This would result in the stray nodes DQ loop ending since we found a number of nodes to DQ less than the limit. The fix: add the "WHERE last_contact_success != '0001-01-01 00:00:00+00'::timestamptz" to the selection query. Change-Id: I4e60de90b68d8745d641b4467c2b23e0e56f7dff	2021-11-02 17:05:00 +00:00
Cameron Ayer	bb21551a9c	satellite/satellitedb: remove references to contained column in nodes table We don't use this column for anything. If you want to know if a node is contained, you can check the pending_audits table. Change-Id: I8da1d8e01a2dcaff63c5067a7927b5451424ad04	2021-10-14 19:17:46 +00:00
Yingrong Zhao	6c34ff64ad	satellite/satellitedb: remove reference to audit information in nodes and audit_history tables This PR removes all code references to audit_histories table and ``` audit_reputation_alpha, audit_reputation_beta, unknown_audit_reputation_alpha, unknown_audit_reputation_beta, ``` columns from nodes table. It also drops audit_histories table from the db since the code that's referencing it is currently not being used. Change-Id: Ifcda8db36afb3a333d487ff831f2fdefc8b02a4c	2021-08-13 21:11:28 +00:00
Yingrong Zhao	e4cc965c39	satellite/satellitedb: replace explicit transaction with dbx query for UpdateReputation Change-Id: I7c139ededea83d4b58107536c3a031c4f92d6eb4	2021-08-05 17:09:49 +00:00
Egon Elbre	65804801ec	all: fix mon.Task leak Change-Id: Ifd58c7ac5631b9c3c750b3f4cc50525167e90709	2021-08-05 14:07:45 +03:00
Yingrong Zhao	646ce5b8cc	satellite/overlay: remove reputation logic from overlay Change-Id: I3492860e4537c7a8e4e824ec4c9c8d179134a0c0	2021-07-28 15:15:28 -04:00


230 Commits