storj

Author	SHA1	Message	Date
Márton Elek	6304046e80	satellite/nodeselection: read email + wallet from db to SelectedNode NodeSelection struct is used to make decisions (and assertions) related to node selection. Usually we don't use email and wallet for placement decision, as they are not reliable. But there are cases, when we know that the email address is confirmed. Also, it can be used for upper-bound estimations (if same wallet is used for too many pieces in a segment, it's a sign of a risk, even if not all the risks can be detected with this approach, as one owner can use different wallets). Long story short: let's put wallet and email to the SelectedNode. Change-Id: I922185e3769d43eb7762b8d60d88ecd3d50991bb	2023-10-03 18:15:56 +00:00
Márton Elek	ad87d1de74	satellite/satellitedb/overlaycache: fill node tags with join for limited number of nodes The easiest way to get node information WITH node tags is executing two queries: 1. select all nodes 2. select all tags And we can pair them with a loop, using the in-memory data structures. But this approach does work only, if we select all nodes, which is true when we use cache (upload, download, repair checker). But repair process selects only the required nodes, where this approach is suboptimal. (full table scan for all tags, even if we need only tags for a few dozens nodes). Possible solutions: 1. We can introduce a cache for repair (similar to upload cache) 2. Or we can select both node and tag information with one query (join). This patch implements the second approach. Note: repair itself is quite slow (10-20 seconds per segements to repair). With 15 seconds execution time and 3 minutes cache staleness, we would use the cache only 12 times per worker. Probably we don't need cache for now. https://github.com/storj/storj/issues/6198 Change-Id: I0364d94306e9815a1c280b71e843b8f504e3d870	2023-09-07 19:27:53 +02:00
Márton Elek	e2006d821c	satellite/overlay: change Reliable and KnownReliable as GetParticipatingNodes and GetNodes, respectively. We now want these functions to include offline and suspended nodes as well, so that we can force immediate repair when pieces are out of placement or in excluded countries. With that change, the old names no longer made sense. Change-Id: Icbcbad43dbde0ca8cbc80a4d17a896bb89b078b7	2023-09-02 23:34:50 +00:00
paul cannon	6e46a926bb	satellite/nodeselection: expand SelectedNode In the repair subsystem, it is necessary to acquire several extra properties of nodes that are holding pieces of things or may be selected to hold pieces. We need to know if a node is 'online' (the definition of "online" may change somewhat depending on the situation), if a node is in the process of graceful exit, and whether a node is suspended. We can't just filter out nodes with all of these properties, because sometimes we need to know properties about nodes even when the nodes are suspended or gracefully exiting. I thought the best way to do this was to add fields to SelectedNode, and (to avoid any confusion) arrange for the added fields to be populated wherever SelectedNode is returned, whether or not the new fields are necessarily going to be used. If people would rather I use a separate type from SelectedNode, I can do that instead. Change-Id: I7804a0e0a15cfe34c8ff47a227175ea5862a4ebc	2023-08-07 12:44:49 +00:00
Márton Elek	70cdca5d3c	satellite: move satellite/nodeselection/uploadselection => satellite/nodeselection All the files in uploadselection are (in fact) related to generic node selection, and used not only for upload, but for download, repair, etc... Change-Id: Ie4098318a6f8f0bbf672d432761e87047d3762ab	2023-07-07 10:32:03 +02:00
Márton Elek	8b4387a498	satellite/satellitedb: add tag information to nodes selected for upload/downloads Change-Id: I0fa7daebcf83f7949726e5fffe68e0bdc6fd1d7a	2023-07-07 07:54:16 +00:00
Michal Niewrzal	f2cd7b0928	satellite/overlay: refactor Reliable to be used with repair checker Currently we are using Reliable to get missing pieces for repair checker. The issue is that now checker is looking at more things than just missing pieces (clumped/off, placement pieces) and using only node ID is not enough. We have issue where we are skipping offline nodes from clumped and off placement pieces check. Reliable was refactored to get data (e.g. country, lastNet) about all reliable nodes. List is split into online and offline. This data will be cached for quick use by repair checker. It will be also possible to check nodes metadata like country code or lastNet. We are also slowly moving `RepairExcludedCountryCodes` config from overlay to repair which makes more sens for it. This this first part of changes. https://github.com/storj/storj/issues/5998 Change-Id: If534342488c0e440affc2894a8fbda6507b8959d	2023-07-05 10:56:31 +02:00
Márton Elek	d38b8fa2c4	satellite/nodeselection: use the same Node object from overlay and nodeselection We use two different Node types in `overlay` and `uploadnodeselection` and converting back and forth. Using the same object would allow us to use a unified node selection interface everywhere. Change-Id: Ie71e29d60184ee0e5b4547eb54325f09c418f73c	2023-07-03 16:59:33 +00:00
Michal Niewrzal	98f4f249b2	satellite/overlay: refactor KnownReliable to be used with repairer Currently we are using KnownUnreliableOrOffline to get missing pieces for segment repairer (GetMissingPieces). The issue is that now repairer is looking at more things than just missing pieces (clumped/off placement pieces). KnownReliable was refactored to get data (e.g. country, lastNet) about all reliable nodes from provided list. List is split into online and offline. This way we will be able to use results from this method to all checks: missing pieces, clumped pieces, out of placement pieces. This this first part of changes to handle different kind of pieces in segment repairer. https://github.com/storj/storj/issues/5998 Change-Id: I6cbaf59cff9d6c4346ace75bb814ccd985c0e43e	2023-06-27 13:27:23 +02:00
Michal Niewrzal	c48bd81e5f	satellite/satellitedb: update SelectAllStorageNodes* to set country code Methods SelectAllStorageNodesUpload and SelectAllStorageNodesDownload are not returning full info with overlay.SelectedNode because its missing CountryCode. Change-Id: Ie3cb396bf28d7ec4c6ab8927e5bb560236036aa6	2023-05-26 11:02:29 +00:00
paul cannon	915f3952af	satellite/repair: repair pieces on the same last_net We avoid putting more than one piece of a segment on the same /24 network (or /64 for ipv6). However, it is possible for multiple pieces of the same segment to move to the same network over time. Nodes can change addresses, or segments could be uploaded with dev settings, etc. We will call such pieces "clumped", as they are clumped into the same net, and are much more likely to be lost or preserved together. This change teaches the repair checker to recognize segments which have clumped pieces, and put them in the repair queue. It also teaches the repair worker to repair such segments (treating clumped pieces as "retrievable but unhealthy"; i.e., they will be replaced on new nodes if possible). Refs: https://github.com/storj/storj/issues/5391 Change-Id: Iaa9e339fee8f80f4ad39895438e9f18606338908	2023-04-06 17:34:25 +00:00
paul cannon	0e2fef977f	satellite/overlay: add SetAllContainedNodes method to overlay.DB We will be needing an infrequent chore to check which nodes are in the reverify queue and synchronize that set with the 'contained' field in the nodes db, since it is easily possible for them to get out of sync. (We can't require that the reverification queue table be in the same database as the nodes table, so maintaining consistency with SQL transactions is out. Plus, even if they were in the same database, using such SQL transactions to maintain consistency would be slow and unwieldy.) This commit adds a method to the overlay allowing the caller to set the contained status of all nodes in the nodes table at once. This is valid because our definition of "contained" now depends solely on whether a node appears at least once in the reverification queue. Only rows whose contained field does not match the expectation will be updated; the contained timestamp will not be updated for a node which is supposed to be contained and was already contained. Change-Id: I8cabe56ad897b6027e11aa5b17175295391aa3ac	2023-02-06 10:18:54 +00:00
Fadila Khadar	5c3a148d6e	satellite/overlaycache: fix typo in UpdateCheckIn request - fix a malformed SQL query - add test to be sure we don't have this problem again. Change-Id: I3fde8c59ba01335411e51d964bec95bc26cfc961	2022-12-14 22:21:45 +01:00
paul cannon	ed0fa59f23	satellite/overlay: add SetNodeContained() method SetNodeContained() will change the contained flag in the nodes table, which will affect whether nodes are selected for new uploads. This flag _should_ correlate with whether or not a given node has any entries in the reverification queue. However, the reverification queue is intended to be 'safely partitionable' from the nodes table, so we can't enforce that characteristic transactionally. But this is ok; there are no dire consequences if they are out of sync. We will be adding a chore that updates the contained flag based on the contents of the reverification queue periodically, if something fails to set it directly when appropriate. Refs: https://github.com/storj/storj/issues/5231 Change-Id: I26460d8718dee63fd55d00a44568b2065fc8fe30	2022-12-01 12:43:40 +00:00
Cameron	8681a36164	satellite/overlay: add ability to get offline nodes in need of email Add LastOfflineEmail to overlay.NodeDossier. This is the last time a node got an offline email. Add two new overlay db methods, GetOfflineNodesForEmail and UpdateLastOfflineEmail. Edit db method UpdateCheckIn to nullify last_offline_email if node is up. Change-Id: I1ee60e7d98dd1b68348a57f9a4fb77c6c9895d6d	2022-11-17 19:03:04 +00:00
Egon Elbre	1137620baf	satellite/satellitedb: move tests to their domains Testing interfaces is slightly clearer when it's in the package needing the database rather than each individual implementation. Change-Id: I10334c214a205f7e510b939b4359a2214c4e060a	2021-02-19 17:29:15 +02:00
Yaroslav Vorobiov	966535e9de	{storagenode,satellite}/nodeoperator: add wallet features Change-Id: Iac7eb40a52b8fddcc573aebaad2e3a30a10cded9	2021-02-08 22:09:45 +02:00
Cameron Ayer	d14607a5f7	satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de	2021-01-19 15:43:02 -05:00
Cameron Ayer	75d828200c	private,satellite: add chore to dq stray nodes Full scope: private/testplanet,satellite/{overlay,satellitedb} Description: In most cases, downtime tracking with audits will eventually lead to DQ for nodes who are unresponsive. However, if a stray node has no pieces, it will not be audited and will thus never be disqualified. This chore will check for nodes who have not successfully been contacted in some set time and DQ them. There are some new flags for toggling DQ of stray nodes and the timeframes for running the chore and how long nodes can go without contact. Change-Id: Ic9d41fdbf214736798925e728245180fb3c55615	2021-01-19 14:21:56 -05:00
Cameron Ayer	dc67ce74c9	satellite: remove IsUp field from overlay.UpdateRequest With the new overlay.AuditOutcome type for offline audits, the IsUp field is redundant. If AuditOutcome != AuditOffline, then the node is online. In addition to removing the field itself, other changes needed to be made regarding the relationship between 'uptime' and 'audits'. Previously, uptime and audit outcome were completely separated. For example, it was possible to update a node's stats to give it a successful/failed/unknown audit while simultaneously indicating that the node was offline by setting IsUp to false. This is no longer possible under this changeset. Some test which did this have been changed slightly in order to pass. Also add new benchmarks for UpdateStats and BatchUpdateStats with different audit outcomes. Change-Id: I998892d615850b1f138dc62f9b050f720ea0926b	2020-11-02 15:34:17 -05:00
Jennifer Johnson	4e2413a99d	satellite/satellitedb: uses vetted_at field to select for reputable nodes Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1. This is because many tests relied on selecting nodes that were reputable based on audit and uptime counts of 0, in effect, selecting new nodes as reputable ones. However, since reputation is now indicated by a vetted_at db field that is explicitly set rather than implied by audit and uptime counts, it would be more complicated to try to update all of the nodes' reputations before selecting nodes for tests. Now we just allow all test nodes to be new if needed. Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d	2020-09-04 16:45:32 +00:00
Moby von Briesen	60a95d0dc9	satellite/{satellitedb,overlay}: Enable offline suspension and review period When a node's audit history "online score" passes below a configured threshold, the node goes into "offline suspension" mode and begins a review period, where the operator is given an opportunity to bring their node back online. After the review period passes, offline suspension is turned off for the node. In the future, if a node still has a bad online score at the end of the review period, it will be disqualified. This is disabled right now. In the future, if a node is in offline suspension, it will be treated as "unhealthy". Right now, there are no consequences for being in offline suspension. Minor changes: * Moves AuditHistoryConfig out of UpdateStats/BatchUpdateStats args and into UpdateRequest. * Adds "now" argument to UpdateStats/BatchUpdateStats args for easy testing. * Changes formatting strings inside buildUpdateStatement to use specific types. Change-Id: I032b60298840fc16e6ef831da750f2d57619a397	2020-08-28 16:35:48 +00:00
Moby von Briesen	959cd5cd83	satellite/satellitedb: Update audit history from overlay.UpdateStats and overlay.BatchUpdateStats Change-Id: Ib530b61895ca4a8b12ba022c408a416b237b56d7	2020-08-20 22:46:28 +00:00
Jennifer Johnson	03e5f922c3	satellite/overlay: updates node with a vetted_at timestamp if they meet the vetting criteria What: As soon as a node passes the vetting criteria (total_audit_count and total_uptime_count are greater than the configured thresholds), we set vetted_at to the current timestamp. Why: We may want to use this timestamp in future development to select new vs vetted nodes. It also allows flexibility in node vetting experiments and allows for better metrics around vetting times. Please describe the tests: satellitedb_test: TestUpdateStats and TestBatchUpdateStats make sure vetted_at is set appropriately Please describe the performance impact: This change does add extra logic to BatchUpdateStats and UpdateStats and commits another variable to the db (vetted_at), but this should be negligible. Change-Id: I3de804549b5f1bc359da4935bc859758ceac261d	2020-05-20 16:30:26 -04:00
Egon Elbre	3d410add40	satellite/overlay: avoid large statement for piece counts (#3001 )	2019-09-12 00:38:58 +03:00
Bryan White	a33106df1c	satellite/satellitedb: persist piece counts to/from db (#2803 )	2019-08-27 14:37:42 +02:00

26 Commits