storj

Author	SHA1	Message	Date
Márton Elek	97a89c3476	satellite: switch to use nodefilters instead of old placement.AllowedCountry placement.AllowedCountry is the old way to specify placement; with the new approach we can use a more generic (dynamic) method, which can check full node information instead of just the country code. About 90% of this patch is just search and replace: * we need to use NodeFilters instead of placement.AllowedCountry * which means we need an initialized PlacementRules available everywhere * which means we need to configure the placement rules The remaining 10% is placement.go, where we introduced a new type of configuration (a lightweight expression language) to define any kind of placement without code changes. Change-Id: Ie644b0b1840871b0e6bbcf80c6b50a947503d7df	2023-07-07 16:55:45 +00:00
Michal Niewrzal	98f4f249b2	satellite/overlay: refactor KnownReliable to be used with repairer Currently we are using KnownUnreliableOrOffline to get missing pieces for the segment repairer (GetMissingPieces). The issue is that the repairer now looks at more things than just missing pieces (clumped/off placement pieces). KnownReliable was refactored to get data (e.g. country, lastNet) about all reliable nodes from the provided list. The list is split into online and offline. This way we will be able to use results from this method for all checks: missing pieces, clumped pieces, out of placement pieces. This is the first part of the changes to handle different kinds of pieces in the segment repairer. https://github.com/storj/storj/issues/5998 Change-Id: I6cbaf59cff9d6c4346ace75bb814ccd985c0e43e	2023-06-27 13:27:23 +02:00
Márton Elek	ffaf15a3b0	satellite/overlay: remove unused mail service from overlay It was surprising that `satellite auditor` complained about SMTP mail settings, even though it's not supposed to send any mail. Looks like we can remove the mail service dependency, as it's not a hard requirement for overlay.Service. Change-Id: I29a52eeff3f967ddb2d74a09458dc0ee2f051bd7	2023-03-09 12:17:35 +00:00
paul cannon	2522ff09b6	satellite/overlay: configurable meaning of last_net Up to now, we have been implementing the DistinctIP preference with code in two places: 1. On check-in, the last_net is determined by taking the /24 or /64 (in ResolveIPAndNetwork()) and we store it with the node record. 2. On node selection, a preference parameter defines whether to return results that are distinct on last_net. It can be observed that we have never yet had the need to switch from DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the same satellite, and we will probably never need to do so in an automated way. It can also be observed that this arrangement makes tests more complicated, because we often have to arrange for test nodes to have IP addresses in different /24 networks (a particular pain on macOS). Those two considerations, plus some pending work on the repair framework that will make repair take last_net into consideration, motivate this change. With this change, in the #2 place, we will _always_ return results that are distinct on last_net. We implement the DistinctIP preference, then, by making the #1 place (ResolveIPAndNetwork()) more flexible. When DistinctIP is enabled, last_net will be calculated as it was before. But when DistinctIP is _off_, last_net can be the same as address (IP and port). That will effectively implement !DistinctIP because every record will have a distinct last_net already. As a side effect, this flexibility will allow us to change the rules about last_net construction arbitrarily. We can do tests where last_net is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and be able to exercise the production logic without requiring a virtual network bridge. This change should be safe to make without any migration code, because all known production satellite deployments use DistinctIP, and the associated last_net values will not change for them. They will only change for satellites with !DistinctIP, which are mostly test deployments that can be recreated trivially. For those satellites which are both permanent and !DistinctIP, node selection will suddenly start acting as though DistinctIP is enabled, until the operator runs a single SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done either before or after deploying software with this change. I also assert that this will not hurt performance for production deployments. It's true that adding the distinct requirement to node selection makes things a little slower, but the distinct requirement is already present for all production deployments, and they will see no change. Refs: https://github.com/storj/storj/issues/5391 Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924	2023-03-09 02:20:12 +00:00
Cameron	74ddfab810	satellite/overlay: insert DQ event into node events in overlay.DisqualifyNode Also, return node email from overlaycache db DisqualifyNode to be used in node events insertion Change-Id: I41534cf01351c1690c3966a8055c5fe6fcf0d6a6	2022-11-04 15:18:31 +00:00
Cameron	f06da25c3d	satellite/overlay: add nodeevents.DB to satellite overlay service Add nodeevents.DB to satellite overlay service so we can insert node events into the nodeevents DB. Change-Id: I642c0ccc9941ecdb08cb22d5c8cf701959a55156	2022-11-02 15:56:37 +00:00
Cameron	a52f766273	satellite/overlay: add email-sending functionality to overlay service We want to send emails to SNOs. Node status changes go through the overlay service, so it's a good place to add the mail service. Add the mailservice.Service, satellite address, and satellite name to overlay service. Also add feature flag --overlay.send-node-emails Change-Id: I3bd2cb3bf22f9724954ce2374f8b651b902b3a24	2022-10-13 18:01:05 +00:00
Egon Elbre	48b0a65fbd	satellite/overlay: use ReadCache in Download/UploadSelectionCache sync2.ReadCache implements preemptive refreshing preventing stalling while it's being updated. Change-Id: Iee9ef36049b986f0e426c14a139b2bc9ac17fb53	2022-07-12 13:52:48 +03:00
Yaroslav Vorobiov	3f47d19aa6	satellite/overlay: add disqualification reason Add disqualification reason to NodeDossier. Extend DB.DisqualifyNode with disqualification reason. Extend reputation Service.TestDisqualifyNode with disqualification reason. Change-Id: I8611b6340c7f42ac1bb8bd0fd7f0648ad650ab2d	2022-04-20 13:29:31 +00:00
Egon Elbre	f2d8e97d97	satellite/satellitedb: simplify select nodes query construction Change-Id: I07009b28762d4485929a2a999e8f4be8179bee51	2021-10-22 07:41:07 +00:00
Yingrong Zhao	646ce5b8cc	satellite/overlay: remove reputation logic from overlay Change-Id: I3492860e4537c7a8e4e824ec4c9c8d179134a0c0	2021-07-28 15:15:28 -04:00
Yingrong Zhao	f8914ccce0	satellite/{repair, overlay}: use reputation store in repair Change-Id: I48db9e68f48239d48621ccc77d33618ecb83ce1a	2021-07-28 13:22:05 -04:00
Egon Elbre	8f15f975a2	satellite/overlay: improve contended update checkin Improve UpdateCheckIn on a contended row: name old time/op new time/op delta UpdateCheckInContended-100x-32 2.29s ±55% 0.17s ±61% -92.45% (p=0.008 n=5+5) Change-Id: I053ab9f1cff136c306e5fb57f5e355cdc0269a8c	2021-05-16 20:41:12 +03:00
Egon Elbre	19e3dc4ec0	satellite/overlay: rename NodeSelectionCache to UploadSelectionCache It wasn't obvious that NodeSelectionCache was only for uploads. Change-Id: Ifeeaa6fdb50a4b7916245b48d8634d70ac54459c	2021-01-28 14:56:53 +02:00
Cameron Ayer	d14607a5f7	satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de	2021-01-19 15:43:02 -05:00
Ethan Adams	6070018021	satellite/overlay: use AS OF SYSTEM TIME with Cockroach Query nodes table using AS OF SYSTEM TIME '-10s' (by default) when on CRDB to alleviate contention on the nodes table and minimize CRDB retries. Queries for standard uploads are already cached, and node lookups for graceful exit uploads have retry logic, so it isn't necessary for the nodes returned to be current.	2020-12-22 21:07:07 +02:00
Cameron Ayer	dc67ce74c9	satellite: remove IsUp field from overlay.UpdateRequest With the new overlay.AuditOutcome type for offline audits, the IsUp field is redundant. If AuditOutcome != AuditOffline, then the node is online. In addition to removing the field itself, other changes needed to be made regarding the relationship between 'uptime' and 'audits'. Previously, uptime and audit outcome were completely separated. For example, it was possible to update a node's stats to give it a successful/failed/unknown audit while simultaneously indicating that the node was offline by setting IsUp to false. This is no longer possible under this changeset. Some tests which did this have been changed slightly in order to pass. Also add new benchmarks for UpdateStats and BatchUpdateStats with different audit outcomes. Change-Id: I998892d615850b1f138dc62f9b050f720ea0926b	2020-11-02 15:34:17 -05:00
Jennifer Johnson	4e2413a99d	satellite/satellitedb: uses vetted_at field to select for reputable nodes Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1. This is because many tests relied on selecting nodes that were reputable based on audit and uptime counts of 0, in effect, selecting new nodes as reputable ones. However, since reputation is now indicated by a vetted_at db field that is explicitly set rather than implied by audit and uptime counts, it would be more complicated to try to update all of the nodes' reputations before selecting nodes for tests. Now we just allow all test nodes to be new if needed. Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d	2020-09-04 16:45:32 +00:00
Moby von Briesen	60a95d0dc9	satellite/{satellitedb,overlay}: Enable offline suspension and review period When a node's audit history "online score" passes below a configured threshold, the node goes into "offline suspension" mode and begins a review period, where the operator is given an opportunity to bring their node back online. After the review period passes, offline suspension is turned off for the node. In the future, if a node still has a bad online score at the end of the review period, it will be disqualified. This is disabled right now. In the future, if a node is in offline suspension, it will be treated as "unhealthy". Right now, there are no consequences for being in offline suspension. Minor changes: * Moves AuditHistoryConfig out of UpdateStats/BatchUpdateStats args and into UpdateRequest. * Adds "now" argument to UpdateStats/BatchUpdateStats args for easy testing. * Changes formatting strings inside buildUpdateStatement to use specific types. Change-Id: I032b60298840fc16e6ef831da750f2d57619a397	2020-08-28 16:35:48 +00:00
Moby von Briesen	959cd5cd83	satellite/satellitedb: Update audit history from overlay.UpdateStats and overlay.BatchUpdateStats Change-Id: Ib530b61895ca4a8b12ba022c408a416b237b56d7	2020-08-20 22:46:28 +00:00
Egon Elbre	5bdcd86fa7	ci: test benchmarks This runs each benchmark for one iteration to ensure that they are valid. Unfortunately, it does not give any useful metrics as output. Change-Id: I68940398c8dd849aed656bd12656f48d5df10128	2020-07-10 13:26:49 +00:00
Cameron Ayer	3b4b5f45c7	satellite: replace references to Suspended with UnknownAuditSuspended Change-Id: I3d2d00c95954c0546ad077702617895f262926ef	2020-06-23 14:19:22 +00:00
Egon Elbre	f68e7b3fde	satellite/overlay: replace pb.InfoResponse pb.InfoResponse wasn't used for protocol buffer communication, but instead as a satellite type. Change-Id: I755619f2deec5b76c4fe488591b7d8c1b9fcdafb	2020-06-16 15:16:55 +03:00
Egon Elbre	678b859172	satellite/overlay: remove MinimumRequiredNodes In non-test code we were only using RequestedCount, so there is no need to have MinimumRequiredNodes. Change-Id: I40736f4b028b41e94abfdeb221bce5aa86a5cb82	2020-05-07 15:41:23 +00:00
Egon Elbre	4e94da3fda	satellite/overlay: add feature flag for node selection cache Also distinguish the purpose for selecting nodes to avoid potential confusion, what should allow caching and what shouldn't. Change-Id: Iee2451c1f10d0f1c81feb1641507400d89918d61	2020-05-06 16:13:47 +03:00
Jessica Grebenschikov	6a6427526b	satellite/overlay: remove old updateaddress method The UpdateAddress method used to be used when storage nodes checked in with the Satellite, but once the contact service was created this method was no longer used. This PR finally removes it. Change-Id: Ib3f83c8003269671d97d54f21ee69665fa663f24	2020-04-30 06:41:48 +00:00
Jess G	825226c98e	satellite/overlay: use node selection cache for uploads (#3859) * satellite/overlay: use node selection cache for uploads Change-Id: Ibd16cccee979d0544f2f4a01749af9f36f02a6ad * fix config lock Change-Id: Idd307e4dee8ab92749f1ec3f996419ea0af829fd * start fixing tests Change-Id: I207d373a3b2a2d9312c9e72fe9bd0b01e06ad6cf * fix test, add some more Change-Id: I82b99c2004fca2510965f9b389f87dd4474bc722 * change config name Change-Id: I0c0f7fc726b2565dc3828cb723f5459a940f2a0b * add benchmarks Change-Id: I05fa25bff8d5b65f94d918556855b95163d002e9 * revert bench to put in different PR Change-Id: I0f6942296895594768f19614bd7b2e3b9b106ade * add staleness to benchmark Change-Id: Ia80a310623d5a342afa6d835402170b531b0f870 * add cache config to testplanet Change-Id: I39abdab8cc442694da543115a9e470b2a8a25dff * have repair select old way Change-Id: I25a938457d7d1bcf89fd15130cb6b0ac19585252 * lower testplanet config time Change-Id: Ib56a2ed086c06bc6061388d15a10a2526a663af7 * fix test Change-Id: I3868e9cacde2dfbf9c407afab04dc5fc2f286f69	2020-04-24 09:11:04 -07:00
Jess G	7a4dcd61f7	satellite/overlay: add changes to selected node benchmarks (#3862) * add changes to selected node benchmarks Change-Id: I0259af155f9151cc2c7830d10f8907634c5e494f * fix lint Change-Id: I6c7b82bbfa579b468712f90fc03b12a931874a54 * restart jenkins Change-Id: I1d7300343e94e695cd1c93a3b59895f52bbcb11e	2020-04-23 15:30:50 -07:00
Moby von Briesen	d7794a4851	satellite/overlay: hardcode default values for audit alpha/beta Alpha=1 and beta=0 are the expected first values for any alpha/beta reputation system we are using in the codebase. So we are removing the configurability of these values. Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3	2020-04-14 19:12:40 +00:00
Natalie Villasana	cf80b3caf3	satellite/overlay: combine SelectStorageNodes and SelectNewStorageNodes (#3831)	2020-04-09 11:19:44 -04:00
Egon Elbre	c970969503	satellite/overlay: add benchmark for node selection Change-Id: I15b767a78b662f8276e656b3fb73a15ec59e76c8	2020-03-27 23:09:29 +02:00
Moby von Briesen	8b72181a1f	satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension * change overlay.UpdateStats to allow a third audit outcome. Now it can handle successful, failed, and unknown audits. * when "unknown audit reputation" (unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the DQ threshold, put node into suspension. * when unknown audit reputation goes above the DQ threshold, remove node from suspension. * record unknown audits from audit reporter. * add basic tests around unknown audits and suspension. Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1	2020-03-16 20:29:26 +00:00
Jessica Grebenschikov	803e2930f4	satellite: use IP for all uplink operations, use hostname for audit and repairs My understanding is that the nodes table has the following fields: - `address` field which can be a hostname or an IP - `last_net` field that is the /24 subnet of the IP resolved from the address This PR does the following: 1) add back the `last_ip` field to the nodes table 2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time. 3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date 4) try to reduce confusion about hostname, ip, subnet, and address in the code base Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac	2020-03-11 09:11:40 -07:00
Jennifer Johnson	1c1750e6be	removes bandwidth limiting On satellite, remove all references to free_bandwidth column in nodes table. On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated. Protobuf message, NodeCapacity, is left intact for backwards compatibility. Once this is released to all satellites, we can drop the column from the DB. Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838	2020-03-04 14:04:00 +00:00
Yingrong Zhao	76ee8a1b4c	satellite: remove UptimeReputation configs from codebase With the new storage node downtime tracking feature, we need to remove the current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and UptimeReputationDQ. This is the first step of removing the uptime reputation columns from satellitedb Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36	2020-01-08 18:54:15 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
littleskunk	8b3444e088	satellite/nodeselection: don't select nodes that haven't checked in for a while (#3567) * satellite/nodeselection: dont select nodes that havent checked in for a while * change testplanet online window to one minute * remove satellite reconfigure online window = 0 in repair tests * pass timestamp into UpdateCheckIn * change timestamp to timestamptz * edit tests to set last_contact_success to 4 hours ago * fix syntax error * remove check for last_contact_success > last_contact_failure in IsOnline	2019-11-15 23:43:06 +01:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573)	2019-11-14 21:46:15 +02:00
Egon Elbre	9ceff9f9c6	satellite/overlay: move CheckIn benchmark to overlay (#3095)	2019-09-20 16:35:52 -04:00
ethanadams	c9b46f2fe2	V3-1987: Optimize audits stats persistence (#2632) * Added batch update stats for recordAuditSuccessStatus * Added batch update stats to recordAuditFailStatus * added configurable batch size * build individual update/delete statements so the statements can be batched into 1 call to the DB * notified #config-changes channel and ran make update-satellite-config-lock * updated tests to use batch update stats	2019-07-31 13:21:06 -04:00
Egon Elbre	5d0816430f	rename all the things (#2531) * rename pkg/linksharing to linksharing * rename pkg/httpserver to linksharing/httpserver * rename pkg/eestream to uplink/eestream * rename pkg/stream to uplink/stream * rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo * rename pkg/auth/signing to pkg/signing * rename pkg/storage to uplink/storage * rename pkg/accounting to satellite/accounting * rename pkg/audit to satellite/audit * rename pkg/certdb to satellite/certdb * rename pkg/discovery to satellite/discovery * rename pkg/overlay to satellite/overlay * rename pkg/datarepair to satellite/repair	2019-07-28 08:55:36 +03:00

41 Commits
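
The last_net rule described in commit 2522ff09b6 can be sketched roughly as follows. This is a minimal illustration, not the repository's ResolveIPAndNetwork(): the helper name lastNet, its signature, and the sample addresses are all assumptions made for this example. It takes the /24 of an IPv4 address or the /64 of an IPv6 address when DistinctIP is on, and returns the full address (IP and port) when it is off.

```go
package main

import (
	"fmt"
	"net"
)

// lastNet is a hypothetical helper (not part of storj) that mimics the rule
// from the commit message: with distinctIP enabled, reduce the address to its
// /24 (IPv4) or /64 (IPv6) network; with distinctIP disabled, keep the whole
// "ip:port" string so every node ends up with a distinct value.
func lastNet(address string, distinctIP bool) (string, error) {
	if !distinctIP {
		return address, nil
	}
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return "", err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return "", fmt.Errorf("invalid IP address %q", host)
	}
	if ip4 := ip.To4(); ip4 != nil {
		// IPv4: keep the first 24 bits.
		return ip4.Mask(net.CIDRMask(24, 32)).String(), nil
	}
	// IPv6: keep the first 64 bits.
	return ip.Mask(net.CIDRMask(64, 128)).String(), nil
}

func main() {
	// Sample addresses from the documentation ranges, chosen for illustration.
	for _, addr := range []string{"203.0.113.77:28967", "[2001:db8:1:2:3:4:5:6]:28967"} {
		for _, distinct := range []bool{true, false} {
			n, err := lastNet(addr, distinct)
			fmt.Println(addr, distinct, n, err)
		}
	}
}
```

Because node selection always de-duplicates on last_net after that commit, two nodes in the same /24 collapse to one candidate under the first rule, while under the second rule every node with a different IP and port remains selectable.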