storj

Author	SHA1	Message	Date
Michal Niewrzal	e1215d5da8	satellite/overlay: add AOST to GetParticipatingNodes method This method sometimes ends with a transaction error, most probably because it's trying to do a full table scan on the nodes table, which is heavily used. Adding AOST should help with DB contention. Change-Id: Ibd4358d28dc26922b60c6b30862f20e7c0662cd1	2023-09-27 12:00:10 +00:00
paul cannon	2522ff09b6	satellite/overlay: configurable meaning of last_net Up to now, we have been implementing the DistinctIP preference with code in two places: 1. On check-in, the last_net is determined by taking the /24 or /64 (in ResolveIPAndNetwork()) and we store it with the node record. 2. On node selection, a preference parameter defines whether to return results that are distinct on last_net. It can be observed that we have never yet had the need to switch from DistinctIP to !DistinctIP, or from !DistinctIP to DistinctIP, on the same satellite, and we will probably never need to do so in an automated way. It can also be observed that this arrangement makes tests more complicated, because we often have to arrange for test nodes to have IP addresses in different /24 networks (a particular pain on macOS). Those two considerations, plus some pending work on the repair framework that will make repair take last_net into consideration, motivate this change. With this change, in the #2 place, we will _always_ return results that are distinct on last_net. We implement the DistinctIP preference, then, by making the #1 place (ResolveIPAndNetwork()) more flexible. When DistinctIP is enabled, last_net will be calculated as it was before. But when DistinctIP is _off_, last_net can be the same as address (IP and port). That will effectively implement !DistinctIP because every record will have a distinct last_net already. As a side effect, this flexibility will allow us to change the rules about last_net construction arbitrarily. We can do tests where last_net is set to the source IP, or to a /30 prefix, or a /16 prefix, etc., and be able to exercise the production logic without requiring a virtual network bridge. This change should be safe to make without any migration code, because all known production satellite deployments use DistinctIP, and the associated last_net values will not change for them. They will only change for satellites with !DistinctIP, which are mostly test deployments that can be recreated trivially. For those satellites which are both permanent and !DistinctIP, node selection will suddenly start acting as though DistinctIP is enabled, until the operator runs a single SQL update "UPDATE nodes SET last_net = last_ip_port". That can be done either before or after deploying software with this change. I also assert that this will not hurt performance for production deployments. It's true that adding the distinct requirement to node selection makes things a little slower, but the distinct requirement is already present for all production deployments, and they will see no change. Refs: https://github.com/storj/storj/issues/5391 Change-Id: I0e7e92498c3da768df5b4d5fb213dcd2d4862924	2023-03-09 02:20:12 +00:00
JT Olio	77bf88e916	satellite/overlay: check node difficulty before entering database closes https://github.com/storj/storj/issues/5568 Change-Id: Id413637c2678e7a7cf8dbf414e082c687c8e8a39	2023-02-15 17:46:25 +00:00
Cameron	d856569935	satellite/overlay: node software update email cooldown config Change-Id: I792eef4a570e38e94f79a3f73c204b65f86ab541	2022-11-11 20:14:25 +00:00
Cameron	a52f766273	satellite/overlay: add email-sending functionality to overlay service We want to send emails to SNOs. Node status changes go through the overlay service, so it's a good place to add the mail service. Add the mailservice.Service, satellite address, and satellite name to overlay service. Also add feature flag --overlay.send-node-emails Change-Id: I3bd2cb3bf22f9724954ce2374f8b651b902b3a24	2022-10-13 18:01:05 +00:00
Egon Elbre	65b5a0fe82	satellite/{overlay,satellitedb}: add AS OF to download selection This should reduce the load caused by download selection queries. Change-Id: Ic1a89d9c3eb5418f0792eb20ec2aece18dc63f2c	2022-06-28 05:18:01 +00:00
Moby von Briesen	b2d342aa9b	satellite/overlay: Add ability to exclude country codes on upload Create global config to specify a list of country codes that should be excluded from node selection during uploads. This exclusion is not implemented when the upload selection cache is disabled. Change-Id: Ic41e8b4f18857a11045668eac23107da99668a72	2022-03-03 16:58:48 +00:00
Fadila Khadar	e776c65172	satellite/checker: pieces in excluded countries are not healthy Add a RepairExcludedCountryCodes config flag for overlay for providing a list of country codes to exclude nodes from target repair selection. Mark segments with less than repairThreshold pieces in countries not in the RepairExcludedCountryCodes as not healthy. With this change, the repair process is not affected. The segment will be removed from the repair queue by the repairer. Another change will handle the logic at the repairer level. Fixes https://github.com/storj/team-metainfo/issues/95 Change-Id: I9231b32de117a116488de055a3e94efcabb46e81	2022-03-02 09:59:09 +00:00
Mya	bf51c286d9	satellite/geoip: update node check-in to associate a country code Resolves https://github.com/storj/storj/issues/4247 Change-Id: Idfd71bf1795d48ca3c686066bbdb95b9c6594f00	2021-11-10 16:44:41 +01:00
Yingrong Zhao	646ce5b8cc	satellite/overlay: remove reputation logic from overlay Change-Id: I3492860e4537c7a8e4e824ec4c9c8d179134a0c0	2021-07-28 15:15:28 -04:00
Cameron Ayer	8c124c6fa4	satellite/{reputation,overlay,satellitedb}: create reputation service, DB, add overlay method UpdateReputation Define service and DB interface for storing node reputation data and updating the overlay cache. Add overlay service and DB method UpdateReputation. See https://github.com/storj/storj/pull/4144 Change-Id: Iedd8bd3274457d26c595919303d55327c1464b8c	2021-06-24 16:19:15 +00:00
Egon Elbre	59e3b586e7	satellite/{gracefulexit,overlay}: enable as of system time queries Change-Id: I2af5eb0e8a51fca7893ce07b78b5633be71dfef8	2021-06-22 11:50:50 +00:00
Jeremy Wharton	8a070e7c25	satellite/overlay: Ignore unnecessary check-ins This prevents the database from being contacted unnecessarily, reducing load. Change-Id: Ib2420f68a20636ec35eb3dd3df8e02bd5341b419	2021-06-22 09:00:41 +00:00
JT Olio	da9ca0c650	testplanet/satellite: reduce the number of places default values need to be configured Satellites set their configuration values to default values using cfgstruct, however, it turns out our tests don't test these values at all! Instead, they have a completely separate definition system that is easy to forget about. As is to be expected, these values have drifted, and it appears in a few cases test planet is testing unreasonable values that we won't see in production, or perhaps worse, features enabled in production were missed and weren't enabled in testplanet. This change makes it so all values are configured the same, systematic way, so it's easy to see when test values are different than dev values or release values, and it's harder to forget to enable features in testplanet. In terms of reviewing, this change should be actually fairly easy to review, considering private/testplanet/satellite.go keeps the current config system and the new one and confirms that they result in identical configurations, so you can be certain that nothing was missed and the config is all correct. You can also check the config lock to see what actual config values changed. Change-Id: I6715d0794887f577e21742afcf56fd2b9d12170e	2021-06-01 22:14:17 +00:00
Egon Elbre	961e841bd7	all: fix error naming errs.Class should not contain "error" in the name, since that causes a lot of stutter in the error logs. As an example a log line could end up looking like: ERROR node stats service error: satellitedbs error: node stats database error: no rows Whereas something like: ERROR nodestats service: satellitedbs: nodestatsdb: no rows Would contain all the necessary information without the stutter. Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0	2021-04-29 15:38:21 +03:00
Cameron Ayer	1a51049ac0	satellite/{overlay,satellitedb}: add flag to toggle suspending nodes for offline audits This change introduces a new config flag, --overlay.audit-history.offline-suspension-enabled, to toggle suspending nodes for offline audits. If the flag is set to true, nodes will be suspended if they meet the requirements. If the flag is false, nodes will not be suspended. If they are already suspended and/or under review, these will be cleared. Change-Id: Ibeba759c42d6e504f6b7598120d4fd4dab85ca74	2021-03-27 16:28:27 +00:00
Egon Elbre	19e3dc4ec0	satellite/overlay: rename NodeSelectionCache to UploadSelectionCache It wasn't obvious that NodeSelectionCache was only for uploads. Change-Id: Ifeeaa6fdb50a4b7916245b48d8634d70ac54459c	2021-01-28 14:56:53 +02:00
Cameron Ayer	d14607a5f7	satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de	2021-01-19 15:43:02 -05:00
Ethan Adams	6070018021	satellite/overlay: use AS OF SYSTEM TIME with Cockroach Query nodes table using AS OF SYSTEM TIME '-10s' (by default) when on CRDB to alleviate contention on the nodes table and minimize CRDB retries. Queries for standard uploads are already cached, and node lookups for graceful exit uploads have retry logic, so it isn't necessary for the nodes returned to be current.	2020-12-22 21:07:07 +02:00
Moby von Briesen	7c3afe164b	satellite/overlay: uncomment dq for offline and disable with feature flag Change-Id: Ib39e2be32e880b822a94eddfb81af99a38843a27	2020-10-16 12:55:16 +00:00
Jennifer Johnson	4e2413a99d	satellite/satellitedb: uses vetted_at field to select for reputable nodes Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1. This is because many tests relied on selecting nodes that were reputable based on audit and uptime counts of 0, in effect, selecting new nodes as reputable ones. However, since reputation is now indicated by a vetted_at db field that is explicitly set rather than implied by audit and uptime counts, it would be more complicated to try to update all of the nodes' reputations before selecting nodes for tests. Now we just allow all test nodes to be new if needed. Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d	2020-09-04 16:45:32 +00:00
Egon Elbre	94a09ce20b	all: add missing dots Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a	2020-08-11 17:50:01 +03:00
Qweder93	4ee1b2d45a	storagenode/console: added list of all audits per satellite to sno dashboard/satellites Change-Id: I52e58748d6467f372d9a308347fc77e400d137e2	2020-08-10 12:55:07 +00:00
Moby von Briesen	e02adfe5e9	satellite/overlay/config.go: Add AuditHistoryConfig to overlay Adds AuditHistory{WindowSize, TrackingPeriod, GracePeriod, OfflineThreshold}. These values will be used to track offline audits over time, and to suspend/disqualify nodes for being offline for too long. Change-Id: I05f7dbc3c034bdc53c4fbd7719c71a44f37ec6a5	2020-08-04 18:18:56 +00:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
littleskunk	2fbb34c3ea	nodeselection: Increase minimum free space to 500MB (#3898)	2020-05-25 12:13:28 +02:00
Moby von Briesen	8f60cfc4fb	satellite/overlay: Add flag for enabling/disabling disqualification from suspension mode Add a flag that allows us to easily switch disqualification from suspension mode on or off. A node will only be disqualified from suspension mode if it has been suspended for longer than the grace period AND the SuspensionDQEnabled flag is true. Change-Id: I9e67caa727183cd52ab2042b0a370a1bcaebe792	2020-05-04 17:25:09 +00:00
Jess G	825226c98e	satellite/overlay: use node selection cache for uploads (#3859) * satellite/overlay: use node selection cache for uploads Change-Id: Ibd16cccee979d0544f2f4a01749af9f36f02a6ad * fix config lock Change-Id: Idd307e4dee8ab92749f1ec3f996419ea0af829fd * start fixing tests Change-Id: I207d373a3b2a2d9312c9e72fe9bd0b01e06ad6cf * fix test, add some more Change-Id: I82b99c2004fca2510965f9b389f87dd4474bc722 * change config name Change-Id: I0c0f7fc726b2565dc3828cb723f5459a940f2a0b * add benchmarks Change-Id: I05fa25bff8d5b65f94d918556855b95163d002e9 * revert bench to put in different PR Change-Id: I0f6942296895594768f19614bd7b2e3b9b106ade * add staleness to benchmark Change-Id: Ia80a310623d5a342afa6d835402170b531b0f870 * add cache config to testplanet Change-Id: I39abdab8cc442694da543115a9e470b2a8a25dff * have repair select old way Change-Id: I25a938457d7d1bcf89fd15130cb6b0ac19585252 * lower testplanet config time Change-Id: Ib56a2ed086c06bc6061388d15a10a2526a663af7 * fix test Change-Id: I3868e9cacde2dfbf9c407afab04dc5fc2f286f69	2020-04-24 09:11:04 -07:00
Moby von Briesen	72b93f3120	satellite/satellitedb: disqualify suspended nodes when the grace period passes If a node is suspended and receives an unknown or failing audit, disqualify them if the grace period (default 1w in production) has passed. Migrate the nodes table so any node that is currently suspended gets unsuspended when the satellite starts up. Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82	2020-04-22 15:45:00 -04:00
Moby von Briesen	d7794a4851	satellite/overlay: hardcode default values for audit alpha/beta Alpha=1 and beta=0 are the expected first values for any alpha/beta reputation system we are using in the codebase. So we are removing the configurability of these values. Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3	2020-04-14 19:12:40 +00:00
Jennifer Johnson	699b635e5d	satellite/overlay: rename newNodePercentage to newNodeFraction Change-Id: Ie66de91f88183b44de0773589e83e4ade9aa997a	2020-03-19 20:09:32 +00:00
Cameron Ayer	b22bf16b35	satellite/overlay: add config flag for node selection free disk requirement Currently SNs report their free disk space once per hour. If a node becomes full, it has to wait until the next contact cycle begins to report; all the while receiving and failing upload requests. By increasing the minimum required disk space, we can give the storage nodes more time to report their space before they completely fill up. This change goes hand-in-hand with another change we want to implement: trigger capacity report on SN immediately upon falling below threshold. Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27	2020-02-11 18:08:25 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Yingrong Zhao	76ee8a1b4c	satellite: remove UptimeReputation configs from codebase With the new storage node downtime tracking feature, we need to remove the current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and UptimeReputationDQ. This is the first step of removing the uptime reputation columns from satellitedb. Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36	2020-01-08 18:54:15 +00:00
Jennifer Li Johnson	ce3203e910	update NodeSelectionConfig.OnlineWindow to 4hr default (#3082)	2019-09-18 14:57:57 -04:00
Egon Elbre	c8edeb0257	satellite/overlay: rename overlay.Cache to overlay.Service (#2717)	2019-08-06 19:35:59 +03:00
ethanadams	c9b46f2fe2	V3-1987: Optimize audits stats persistence (#2632) * Added batch update stats for recordAuditSuccessStatus * Added batch update stats to recordAuditFailStatus * added configurable batch size * build individual update/delete statements so the statements can be batched into 1 call to the DB * notified #config-changes channel and ran make update-satellite-config-lock * updated tests to use batch update stats	2019-07-31 13:21:06 -04:00
Egon Elbre	5d0816430f	rename all the things (#2531) * rename pkg/linksharing to linksharing * rename pkg/httpserver to linksharing/httpserver * rename pkg/eestream to uplink/eestream * rename pkg/stream to uplink/stream * rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo * rename pkg/auth/signing to pkg/signing * rename pkg/storage to uplink/storage * rename pkg/accounting to satellite/accounting * rename pkg/audit to satellite/audit * rename pkg/certdb to satellite/certdb * rename pkg/discovery to satellite/discovery * rename pkg/overlay to satellite/overlay * rename pkg/datarepair to satellite/repair	2019-07-28 08:55:36 +03:00

38 Commits
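
The last_net rule described in commit 2522ff09b6 can be illustrated with a minimal Go sketch. It is not the actual ResolveIPAndNetwork() implementation: the helper name lastNet, its signature, and the sample addresses are hypothetical and chosen only for illustration. It assumes that with DistinctIP enabled last_net is the /24 (IPv4) or /64 (IPv6) network of the node's address, and that with DistinctIP off last_net is simply the address with its port.

```go
package main

import (
	"fmt"
	"net/netip"
)

// lastNet is a hypothetical helper (not storj's ResolveIPAndNetwork) that
// derives a last_net value from an "ip:port" string.
func lastNet(addrPort string, distinctIP bool) (string, error) {
	ap, err := netip.ParseAddrPort(addrPort)
	if err != nil {
		return "", err
	}
	if !distinctIP {
		// DistinctIP off: every node gets its own last_net (IP and port).
		return ap.String(), nil
	}
	ip := ap.Addr().Unmap()
	bits := 64 // IPv6 nodes are grouped by /64
	if ip.Is4() {
		bits = 24 // IPv4 nodes are grouped by /24
	}
	prefix, err := ip.Prefix(bits)
	if err != nil {
		return "", err
	}
	// Prefix masks off the host bits; keep only the network address.
	return prefix.Addr().String(), nil
}

func main() {
	for _, distinct := range []bool{true, false} {
		for _, a := range []string{"203.0.113.77:28967", "203.0.113.12:28967"} {
			n, err := lastNet(a, distinct)
			if err != nil {
				fmt.Println("error:", err)
				continue
			}
			fmt.Printf("distinctIP=%v %s -> %s\n", distinct, a, n)
		}
	}
}
```

With distinctIP set to true, both sample addresses map to the same last_net, so a selection query that is always distinct on last_net returns at most one of them. With it set to false, each address is its own last_net and both remain selectable.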