storj

Author	SHA1	Message	Date
Egon Elbre	c5d4a13158	satellite/nodeselection: use NodeURL Change-Id: I2ebd4dbf993ff5c7864f3a3a665b5c8fc48aa7d1	2020-05-27 05:46:11 +00:00
littleskunk	2fbb34c3ea	nodeselection: Increase minimum free space to 500MB (#3898)	2020-05-25 12:13:28 +02:00
Egon Elbre	bef84a5f9d	storagenode: remove dependency to overlay.NodeDossier This is the last dependency from storage node to satellite. Change-Id: I12f7abb91e84f823ba5af126c6e2979519838612	2020-05-21 08:37:13 +03:00
Jennifer Johnson	03e5f922c3	satellite/overlay: updates node with a vetted_at timestamp if they meet the vetting criteria What: As soon as a node passes the vetting criteria (total_audit_count and total_uptime_count are greater than the configured thresholds), we set vetted_at to the current timestamp. Why: We may want to use this timestamp in future development to select new vs vetted nodes. It also allows flexibility in node vetting experiments and allows for better metrics around vetting times. Please describe the tests: satellitedb_test: TestUpdateStats and TestBatchUpdateStats make sure vetted_at is set appropriately Please describe the performance impact: This change does add extra logic to BatchUpdateStats and UpdateStats and commits another variable to the db (vetted_at), but this should be negligible. Change-Id: I3de804549b5f1bc359da4935bc859758ceac261d	2020-05-20 16:30:26 -04:00
Egon Elbre	5d016425f1	satellite/{contact,downtime,overlay}: use NodeURL Change-Id: I555a479a89e0ddbf0499898bdbc8574282cd6846	2020-05-20 11:09:05 +00:00
Egon Elbre	941d10cbc3	private/testplanet: remove Peer.Local() Currently storagenode depends on overlay.NodeDossier, this is the first step in removing it. Change-Id: I034a3f1601835f8349bd41752455022e19bcc707	2020-05-20 11:05:34 +00:00
littleskunk	8ec64f3daf	satellite/overlay: enable node selection cache on all satellites (#3895)	2020-05-19 19:25:53 +02:00
Egon Elbre	16abf02b35	satellite/{nodeselection,overlay}: use the new package Change-Id: I034fdbe578dec2e5c906aca82231cd3e56f26aeb	2020-05-18 21:38:43 +00:00
Jessica Grebenschikov	1ecbfc453f	satellite/overlay: add random test for node selection from cache Change-Id: I57b57db919b5afb3fc81176e078ac1e3a873565a	2020-05-18 16:59:52 +00:00
Egon Elbre	ec589a8289	all: fix comments about grpc Change-Id: Id830fbe2d44f083c88765561b6c07c5689afe5bd	2020-05-11 13:05:34 +03:00
Egon Elbre	678b859172	satellite/overlay: remove MinimumRequiredNodes In non-test code we were only using RequestedCount, so there is no need to have MinimumRequiredNodes. Change-Id: I40736f4b028b41e94abfdeb221bce5aa86a5cb82	2020-05-07 15:41:23 +00:00
Egon Elbre	ce6a500b0c	satellite/overlay: support DistinctIP=false in selection cache Most of our tests and storj-sim are using DistinctIP. Also fix bug with newNodeCount calculation. Change-Id: I1a6d0efe7074908896a3322d19f487b929f0f0fc	2020-05-07 11:04:32 +00:00
Egon Elbre	2955c50bc1	satellite/overlay: fix data-race in node selection cache GetNodes returned references to nodes in the immutable state, however some parts of code expect them to be modified. Change-Id: I5be1866f95e0dbe062a6b6be60e29f2365c35faa	2020-05-06 20:03:06 +03:00
Egon Elbre	4e94da3fda	satellite/overlay: add feature flag for node selection cache Also distinguish the purpose for selecting nodes to avoid potential confusion, what should allow caching and what shouldn't. Change-Id: Iee2451c1f10d0f1c81feb1641507400d89918d61	2020-05-06 16:13:47 +03:00
Moby von Briesen	8f60cfc4fb	satellite/overlay: Add flag for enabling/disabling disqualification from suspension mode Add a flag that allows us to easily switch disqualification from suspension mode on or off. A node will only be disqualified from suspension mode if it has been suspended for longer than the grace period AND the SuspensionDQEnabled flag is true. Change-Id: I9e67caa727183cd52ab2042b0a370a1bcaebe792	2020-05-04 17:25:09 +00:00
Jessica Grebenschikov	6a6427526b	satellite/overlay: remove old updateaddress method The UpdateAddress method used to be called when storage nodes checked in with the Satellite, but once the contact service was created this method was no longer used. This PR finally removes it. Change-Id: Ib3f83c8003269671d97d54f21ee69665fa663f24	2020-04-30 06:41:48 +00:00
Egon Elbre	e26a5598ec	satellite/overlay: fix UpdateCheckIn test time.Now() in a short amount of time can return the exact same value. Ensure that the test uses times that are distinct. Change-Id: Ia653ce0af4bfcf7b5da133a9cf98b823033d9592	2020-04-29 17:59:00 +00:00
Moby von Briesen	de366537a8	satellite/satellitedb/overlaycache: fix behavior around gracefully exited nodes Sometimes nodes who have gracefully exited will still be holding pieces according to the satellite. This has some unintended side effects currently, such as nodes getting disqualified after having successfully exited. * When the audit reporter attempts to update node stats, do not update stats (alpha, beta, suspension, disqualification) if the node has finished graceful exit (audit/reporter_test.go TestGracefullyExitedNotUpdated) * Treat gracefully exited nodes as "not reputable" so that the repairer and checker do not count them as healthy (overlay/statdb_test.go TestKnownUnreliableOrOffline, repair/repair_test.go TestRepairGracefullyExited) Change-Id: I1920d60dd35de5b2385a9b06989397628a2f1272	2020-04-28 23:58:43 +00:00
Jess G	825226c98e	satellite/overlay: use node selection cache for uploads (#3859) * satellite/overlay: use node selection cache for uploads Change-Id: Ibd16cccee979d0544f2f4a01749af9f36f02a6ad * fix config lock Change-Id: Idd307e4dee8ab92749f1ec3f996419ea0af829fd * start fixing tests Change-Id: I207d373a3b2a2d9312c9e72fe9bd0b01e06ad6cf * fix test, add some more Change-Id: I82b99c2004fca2510965f9b389f87dd4474bc722 * change config name Change-Id: I0c0f7fc726b2565dc3828cb723f5459a940f2a0b * add benchmarks Change-Id: I05fa25bff8d5b65f94d918556855b95163d002e9 * revert bench to put in different PR Change-Id: I0f6942296895594768f19614bd7b2e3b9b106ade * add staleness to benchmark Change-Id: Ia80a310623d5a342afa6d835402170b531b0f870 * add cache config to testplanet Change-Id: I39abdab8cc442694da543115a9e470b2a8a25dff * have repair select old way Change-Id: I25a938457d7d1bcf89fd15130cb6b0ac19585252 * lower testplanet config time Change-Id: Ib56a2ed086c06bc6061388d15a10a2526a663af7 * fix test Change-Id: I3868e9cacde2dfbf9c407afab04dc5fc2f286f69	2020-04-24 09:11:04 -07:00
Jess G	7a4dcd61f7	satellite/overlay: add changes to selected node benchmarks (#3862) * add changes to selected node benchmarks Change-Id: I0259af155f9151cc2c7830d10f8907634c5e494f * fix lint Change-Id: I6c7b82bbfa579b468712f90fc03b12a931874a54 * restart jenkins Change-Id: I1d7300343e94e695cd1c93a3b59895f52bbcb11e	2020-04-23 15:30:50 -07:00
Moby von Briesen	720e26d235	satellite/satellitedb/overlaycache: update unknown alpha/beta values properly Update unknown_audit_reputation_alpha and unknown_audit_reputation_beta. Add test to verify that BatchUpdateStats properly modifies unknown audit alpha/beta Change-Id: I0d5f9cac96a99f64905cf575b772402db0756a9d	2020-04-23 10:40:53 -04:00
Moby von Briesen	72b93f3120	satellite/satellitedb: disqualify suspended nodes when the grace period passes If a node is suspended and receives an unknown or failing audit, disqualify them if the grace period (default 1w in production) has passed. Migrate the nodes table so any node that is currently suspended gets unsuspended when the satellite starts up. Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82	2020-04-22 15:45:00 -04:00
Jess G	5ea1602ca5	satellite/overlay: add selected node cache (#3846 ) * init implementation cache Change-Id: Ia54a1943e0707a77189bc5f4a9aaa8339c98d99a * one query to init cache Change-Id: I7c04b3ae104b553ae23fca372351a4328f632c66 * add monit tracking of cache Change-Id: I7d209e12c8f32d43708b23bf2126c5d5098e0a07 * add first test Change-Id: I0646a9349d457a9eb3920f7cd2d62fb72ffc3ab5 * add staleness to cache Change-Id: If002329bfdd53a4b200ad14dbd2ffc8b280aedb8 * add init test Change-Id: I3a3d0aa74cfac1d125fa93cb749316ed2a74d5b1 * fix comment Change-Id: I73353d00ccf0952b38c0f8ef7d1755c15cbfe9d9 * mv to nodeselection pkg Change-Id: I62487f768296c7a7b597fa398a4c42daf6e9c5b7 * add state to cache Change-Id: I081e77ec0e16706faee1a267de9a7fa643d6ac11 * add refresh concurrent test Change-Id: Idcba72508291099f280edc65355273c0acc3d3ce * add a few more tests Change-Id: I9422e9eaa22bf01c11f14bdb892ebcf7b3e5e5fb * fix tests, add min version to select allnodes Change-Id: I926f41d568951ad4ff70c6d4ceb87abb1e3e5009 * update comments Change-Id: I6ffe33e245ca65fb523c880cd72e63ce35776eb9 * fixes and rm Init Change-Id: Ifbe09b668978b5d9af09ca38cb080d02a2154cf4 * fix format Change-Id: I03cc217e28dc1839190c5c6dbdbb602c132a5a38	2020-04-14 13:50:02 -07:00
Moby von Briesen	d7794a4851	satellite/overlay: hardcode default values for audit alpha/beta Alpha=1 and beta=0 are the expected first values for any alpha/beta reputation system we are using in the codebase. So we are removing the configurability of these values. Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3	2020-04-14 19:12:40 +00:00
Natalie Villasana	cf80b3caf3	satellite/overlay: combine SelectStorageNodes and SelectNewStorageNodes (#3831)	2020-04-09 11:19:44 -04:00
Natalie Villasana	0a1bbc9824	satellite/overlay: add TestEnsureMinimumRequested Adds a test to make sure that the correct amount of total nodes, reputable nodes, and new nodes are returned by SelectStorageNodes in different cases. Change-Id: I0939159600afde8a46c35735f1edf0576fcdb4cd	2020-04-08 16:06:25 +00:00
Egon Elbre	439aba922a	satellite/overlay: reduce overhead of GetNodes Instead of filtering on the client side it's better to filter on the database side. Change-Id: I845fbbe5ed28c2ffdb0b8a3f789b59c094fd1069	2020-03-30 18:36:23 +03:00
Egon Elbre	cb781d66c7	satellite/overlay: optimize FindStorageNodes Reduce the number of fields returned from the query. Benchmark results in `satellite/overlay`: benchstat before.txt after2.txt name old time/op new time/op delta SelectStorageNodes-32 7.85ms ± 1% 6.27ms ± 1% -20.18% (p=0.002 n=10+4) SelectNewStorageNodes-32 8.21ms ± 1% 6.61ms ± 0% -19.53% (p=0.002 n=10+4) SelectStorageNodesExclusion-32 17.2ms ± 1% 15.9ms ± 1% -7.55% (p=0.002 n=10+4) SelectNewStorageNodesExclusion-32 17.8ms ± 2% 16.1ms ± 0% -9.38% (p=0.002 n=10+4) FindStorageNodes-32 48.4ms ± 1% 45.1ms ± 0% -6.69% (p=0.002 n=10+4) FindStorageNodesExclusion-32 79.2ms ± 1% 76.1ms ± 1% -3.89% (p=0.002 n=10+4) Benchmark results from `satellite/overlay` after making them parallel: benchstat before-parallel.txt after2-parallel.txt name old time/op new time/op delta SelectStorageNodes-32 548µs ± 1% 353µs ± 1% -35.60% (p=0.029 n=4+4) SelectNewStorageNodes-32 562µs ± 0% 368µs ± 0% -34.51% (p=0.029 n=4+4) SelectStorageNodesExclusion-32 1.02ms ± 1% 0.84ms ± 0% -18.08% (p=0.029 n=4+4) SelectNewStorageNodesExclusion-32 1.03ms ± 1% 0.86ms ± 2% -16.22% (p=0.029 n=4+4) FindStorageNodes-32 3.11ms ± 0% 2.79ms ± 1% -10.27% (p=0.029 n=4+4) FindStorageNodesExclusion-32 4.75ms ± 0% 4.43ms ± 1% -6.56% (p=0.029 n=4+4) Change-Id: I1d85e2764eb270f4c2b1998303ccfc1179d65b26	2020-03-30 18:36:23 +03:00
Egon Elbre	a6540dc3ef	satellite/overlay: remove unused KeyLock Change-Id: Ie99f97772824ceafb3d97453545fc6e96be2fb6f	2020-03-30 16:47:48 +03:00
Egon Elbre	c970969503	satellite/overlay: add benchmark for node selection Change-Id: I15b767a78b662f8276e656b3fb73a15ec59e76c8	2020-03-27 23:09:29 +02:00
Jennifer Johnson	699b635e5d	satellite/overlay: rename newNodePercentage to newNodeFraction Change-Id: Ie66de91f88183b44de0773589e83e4ade9aa997a	2020-03-19 20:09:32 +00:00
Moby von Briesen	2f991b6c56	satellite/{overlay, satellitedb}: account for `suspended` field in overlay cache Make sure that suspended nodes are treated appropriately by the overlay cache. This means we should expect the following behavior: * suspended nodes (vetted or not) should not be selected for uploading new segments * suspended nodes should be treated by the checker and repairer as "unhealthy", and should be removed upon successful repair This commit also removes unused overlay functionality. Fixes a bug with commit `8b72181a1f` where the audit reporter was automatically suspending nodes regardless of audit outcome (see test added). Tests: * updates repair tests to ensure that a suspended node is treated as unhealthy and will be removed from the pointer on successful repair * updates overlay tests for KnownUnreliableOrOffline and KnownReliable to expect suspended nodes to be considered "unreliable" * adds satellitedb test that ensures overlay.SelectStorageNodes and overlay.SelectNewStorageNodes do not include suspended nodes * adds audit reporter test to ensure that different audit outcomes result in the correct suspended/disqualified states Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561	2020-03-17 17:14:56 +00:00
Ethan	bdbf764b86	satellite/orders;overlay: Consolidate order limit storage node lookups into 1 query. https://storjlabs.atlassian.net/browse/SM-449 Change-Id: Idc62cc2978fba67cf48f7c98b27b0f996f9c58ac	2020-03-16 23:15:47 +00:00
Moby von Briesen	8b72181a1f	satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension * change overlay.UpdateStats to allow a third audit outcome. Now it can handle successful, failed, and unknown audits. * when "unknown audit reputation" (unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the DQ threshold, put node into suspension. * when unknown audit reputation goes above the DQ threshold, remove node from suspension. * record unknown audits from audit reporter. * add basic tests around unknown audits and suspension. Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1	2020-03-16 20:29:26 +00:00
Bill Thorp	94c11c5212	satellite: remove some unnecessary UTC() calls Fixes some easy cases of extraneous UTC() calls Change-Id: I3f4c287ae622a455b9a492a8892a699e0710ca9a	2020-03-13 13:49:44 +00:00
Jess G	39cb821196	satellite/overlay: rm combinedcache, fix IP naming to be network (#3798) * rm combinedcache, rm dns node lookup Change-Id: I239f07211764b097d851230d8c81900a47756e9e * excludeIPs -> excludedNetworks Change-Id: Ifa6f44ab17457cdd5aff4cd5694296867c18b179 * use lowercase var name Change-Id: I825aad2b718c71f455e747be18f8cabd02aabe55 * update Getnetwork name Change-Id: I002a1b7bc6b4ef40159c0cd2b0ef209f80a9c503 * fix comments Change-Id: Ibddf5b9ffa9d685af6c392d893db063ef18e45fa * update comments with ipv6 Change-Id: I31758b7d4979e7c27d014668f4fb532ad838cda2 Co-authored-by: Stefan Benten <mail@stefan-benten.de>	2020-03-12 11:37:57 -07:00
Jessica Grebenschikov	803e2930f4	satellite: use IP for all uplink operations, use hostname for audit and repairs My understanding is that the nodes table has the following fields: - `address` field which can be a hostname or an IP - `last_net` field that is the /24 subnet of the IP resolved from the address This PR does the following: 1) add back the `last_ip` field to the nodes table 2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time. 3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date 4) try to reduce confusion about hostname, ip, subnet, and address in the code base Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac	2020-03-11 09:11:40 -07:00
Jennifer Johnson	1c1750e6be	removes bandwidth limiting On satellite, remove all references to free_bandwidth column in nodes table. On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated. Protobuf message, NodeCapacity, is left intact for backwards compatibility. Once this is released to all satellites, we can drop the column from the DB. Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838	2020-03-04 14:04:00 +00:00
Cameron Ayer	f22bddf122	{storagenode/contact, private/testplanet}: remove ErrFailureToStart and panic in testplanet.Start Change-Id: I252e8c9407400af7bda95a7657c8154660c3c801	2020-02-24 18:24:23 +00:00
Cameron Ayer	3e70a893dd	storagenode/{piecestore, contact}: report capacity to satellites if below specific threshold Currently, storage nodes only report their capacity to satellites once per hour. If a node fills up, it will fail all uploads until the next contact cycle begins. With these changes, at the end of an upload we check whether the MinimumDiskSpace threshold has been passed. If so, trigger the monitor chore to update the node's capacity, then trigger the contact chore to report the new capacity to the satellites. Change-Id: Ie6aadaade1e2c12c87e03f8ff9059a50121380a0	2020-02-18 15:42:48 -05:00
Cameron Ayer	b22bf16b35	satellite/overlay: add config flag for node selection free disk requirement Currently SNs report their free disk space once per hour. If a node becomes full, it has to wait until the next contact cycle begins to report, all the while receiving and failing upload requests. By increasing the minimum required disk space, we can give the storage nodes more time to report their space before they completely fill up. This change goes hand-in-hand with another change we want to implement: trigger capacity report on SN immediately upon falling below threshold. Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27	2020-02-11 18:08:25 +00:00
Natalie Ventura Villasana	3900dadafd	satellite/overlay: find new nodes with ExcludedIPs Adds ExcludedIPs to the NodeCriteria for selecting new storage nodes. Previously, ExcludedIPs was only added to the NodeCriteria for selecting reputable storage nodes. Now that both are included in the FindStorageNodesWithPreferences call, it should no longer be possible to repair pieces to nodes that are on the same IP as nodes already storing pieces from that segment. Adds TestSelectNewStorageNodesExcludedIPs to make sure that SelectNewStorageNodes returns nodes with different IP addresses. https://storjlabs.atlassian.net/browse/V3-3011 Change-Id: Ic2d5e607cadeba6e8d5c40f9717149cb30880335	2020-02-10 23:45:17 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Egon Elbre	0c0b47823d	satellite: use require.WithinDuration Noticed that assert/require has WithinDuration for comparing time.Time-s. Change-Id: Ia340896443f610d38799b7ef245b5775eecfc92b	2020-01-21 19:43:53 +02:00
Egon Elbre	f3b4bf2b7c	satellite/satellitedb/satellitedbtest: pass ctx as an argument ctx is created in most tests, instead pass in as argument to reduce code duplication. Change-Id: I466c51c008392001129c8b007c9d6b3619935ac4	2020-01-20 16:35:42 +02:00
Egon Elbre	a4026f97b8	satellite: fix test time comparisons Correct way to compare time that may have an error is to use InDelta. Change-Id: I0140892119c44c63fa042bbc7292ab91bb33a350	2020-01-20 10:17:20 +00:00
Natalie Ventura Villasana	6b1829f3c3	satellite/downtime: new chore estimates downtime Adds EstimationChore to the downtime package, which is an independent chore that finds offline nodes given a configurable limit, then uptime checks those nodes, and sets a last contact success or failure given a response. For failed nodes, the chore updates the amount of downtime the node has been offline in the DowntimeTracking table. Design doc section: https://github.com/storj/storj/blob/master/docs/blueprints/storage-node-downtime-tracking.md#estimating-offline-time Jira: https://storjlabs.atlassian.net/browse/V3-2545 Change-Id: I60af95803930bf9b33232b248bb20cca6f0e0b5f	2020-01-09 15:05:13 -05:00
Yingrong Zhao	76ee8a1b4c	satellite: remove UptimeReputation configs from codebase With the new storage node downtime tracking feature, we need to remove the current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and UptimeReputationDQ. This is the first step of removing the uptime reputation columns from satellitedb. Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36	2020-01-08 18:54:15 +00:00
Natalie Ventura Villasana	1cb0f80a8d	satellite/gracefulexit: dq node on exit fail Disqualifies a node when the node fails to complete a graceful exit. Adds a new DisqualifyNode method to the overlay cache, since there wasn't an existing method to disqualify a node but do nothing else to its stats. Adds checks to existing tests to make sure that a storage node that fails a graceful exit is marked as disqualified in the overlay cache. https://storjlabs.atlassian.net/browse/V3-3342 Change-Id: I4d554a519ab59db31ad3b8e28764c8683a6e3888	2020-01-06 19:16:26 -05:00
Moby von Briesen	6c2e4cc0cd	satellite/overlay: Return NodeLastContact instead of a node dossier from overlay.GetOfflineNodesLimited We only care about node ID, address, and last contact success/failure from the downtime service, so the overlay should only return these values for the downtime-specific queries. Change-Id: I08a6ecfdd2a12b82cae62e87d6adeab53975bfce	2020-01-06 17:12:30 -05:00


88 Commits