storj

Author	SHA1	Message	Date
Egon Elbre	34db4a80fd	ci: fix staticcheck failures Change-Id: I176fb24214755a1940a0a1a4e9cc8e39f184870b	2020-06-05 13:15:34 +00:00
Yingrong Zhao	163c027a6d	satellite/satellitedb: remove monkit trace from convertDBNode In jaeger, it shows that this function gets called repetitively in a single request. Most of the time, it's less than 1ms. Therefore, it doesn't add much value in our trace but create noises. Change-Id: I20234f36bbcf0fc22f91e5e1a5634c0cad577ed0	2020-06-01 17:58:43 +00:00
Jennifer Johnson	03e5f922c3	satellite/overlay: updates node with a vetted_at timestamp if they meet the vetting criteria What: As soon as a node passes the vetting criteria (total_audit_count and total_uptime_count are greater than the configured thresholds), we set vetted_at to the current timestamp. Why: We may want to use this timestamp in future development to select new vs vetted nodes. It also allows flexibility in node vetting experiments and allows for better metrics around vetting times. Please describe the tests: satellitedb_test: TestUpdateStats and TestBatchUpdateStats make sure vetted_at is set appropriately Please describe the performance impact: This change does add extra logic to BatchUpdateStats and UpdateStats and commits another variable to the db (vetted_at), but this should be negligible. Change-Id: I3de804549b5f1bc359da4935bc859758ceac261d	2020-05-20 16:30:26 -04:00
Egon Elbre	5d016425f1	satellite/{contact,downtime,overlay}: use NodeURL Change-Id: I555a479a89e0ddbf0499898bdbc8574282cd6846	2020-05-20 11:09:05 +00:00
Moby von Briesen	8f60cfc4fb	satellite/overlay: Add flag for enabling/disabling disqualification from suspension mode Add a flag that allows us to easily switch disqualification from suspension mode on or off. A node will only be disqualified from suspension mode if it has been suspended for longer than the grace period AND the SuspensionDQEnabled flag is true. Change-Id: I9e67caa727183cd52ab2042b0a370a1bcaebe792	2020-05-04 17:25:09 +00:00
Jessica Grebenschikov	6a6427526b	satellite/overlay: remove old updateaddress method The UpdateAddress method use to be used when storage node's checked in with the Satellite, but once the contact service was created this method was no longer used. This PR finally removes it. Change-Id: Ib3f83c8003269671d97d54f21ee69665fa663f24	2020-04-30 06:41:48 +00:00
Moby von Briesen	de366537a8	satellite/satellitedb/overlaycache: fix behavior around gracefully exited nodes Sometimes nodes who have gracefully exited will still be holding pieces according to the satellite. This has some unintended side effects currently, such as nodes getting disqualified after having successfully exited. * When the audit reporter attempts to update node stats, do not update stats (alpha, beta, suspension, disqualification) if the node has finished graceful exit (audit/reporter_test.go TestGracefullyExitedNotUpdated) * Treat gracefully exited nodes as "not reputable" so that the repairer and checker do not count them as healthy (overlay/statdb_test.go TestKnownUnreliableOrOffline, repair/repair_test.go TestRepairGracefullyExited) Change-Id: I1920d60dd35de5b2385a9b06989397628a2f1272	2020-04-28 23:58:43 +00:00
Moby von Briesen	720e26d235	satellite/satellitedb/overlaycache: update unknown alpha/beta values properly Update unknown_audit_reputation_alpha and unknown_audit_reputation_beta. Add test to verify that BatchUpdateStats properly modifies unknown audit alpha/beta Change-Id: I0d5f9cac96a99f64905cf575b772402db0756a9d	2020-04-23 10:40:53 -04:00
Moby von Briesen	72b93f3120	satellite/satellitedb: disqualify suspended nodes when the grace period passes If a node is suspended and receives an unknown or failing audit, disqualify them if the grace period (default 1w in production) has passed. Migrate the nodes table so any node that is currently suspended gets unsuspended when the satellite starts up. Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82	2020-04-22 15:45:00 -04:00
Jess G	5ea1602ca5	satellite/overlay: add selected node cache (#3846) * init implementation cache Change-Id: Ia54a1943e0707a77189bc5f4a9aaa8339c98d99a * one query to init cache Change-Id: I7c04b3ae104b553ae23fca372351a4328f632c66 * add monit tracking of cache Change-Id: I7d209e12c8f32d43708b23bf2126c5d5098e0a07 * add first test Change-Id: I0646a9349d457a9eb3920f7cd2d62fb72ffc3ab5 * add staleness to cache Change-Id: If002329bfdd53a4b200ad14dbd2ffc8b280aedb8 * add init test Change-Id: I3a3d0aa74cfac1d125fa93cb749316ed2a74d5b1 * fix comment Change-Id: I73353d00ccf0952b38c0f8ef7d1755c15cbfe9d9 * mv to nodeselection pkg Change-Id: I62487f768296c7a7b597fa398a4c42daf6e9c5b7 * add state to cache Change-Id: I081e77ec0e16706faee1a267de9a7fa643d6ac11 * add refresh concurrent test Change-Id: Idcba72508291099f280edc65355273c0acc3d3ce * add a few more tests Change-Id: I9422e9eaa22bf01c11f14bdb892ebcf7b3e5e5fb * fix tests, add min version to select allnodes Change-Id: I926f41d568951ad4ff70c6d4ceb87abb1e3e5009 * update comments Change-Id: I6ffe33e245ca65fb523c880cd72e63ce35776eb9 * fixes and rm Init Change-Id: Ifbe09b668978b5d9af09ca38cb080d02a2154cf4 * fix format Change-Id: I03cc217e28dc1839190c5c6dbdbb602c132a5a38	2020-04-14 13:50:02 -07:00
Moby von Briesen	d7794a4851	satellite/overlay: hardcode default values for audit alpha/beta Alpha=1 and beta=0 are the expected first values for any alpha/beta reputation system we are using in the codebase. So we are removing the configurability of these values. Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3	2020-04-14 19:12:40 +00:00
Cameron Ayer	02613407ae	satellite/satellitedb: only suspend node if not already suspended Whenever the node's reputation is updated, if its unknown audit reputation is below the suspension threshold, its suspension field is set to the current time. This could overwrite the previous "suspendedAt" value resulting a node that never reaches the end of its suspension. Also log whenever a node is disqualified or its suspension status changes Change-Id: I5e8c8f1c46f66d79cb279b5b16a84fe03f533deb	2020-04-10 09:37:37 +00:00
Natalie Villasana	cf80b3caf3	satellite/overlay: combine SelectStorageNodes and SelectNewStorageNodes (#3831)	2020-04-09 11:19:44 -04:00
Egon Elbre	9200efc61f	satellite/satellitedb: fix selecting a nullable string Change-Id: I59e645966e09da586512c69101691b47055c1e5a	2020-04-03 21:30:20 +03:00
Jeff Wendling	e2ff2ce672	satellite: compensation package and commands Change-Id: I7fd6399837e45ff48e5f3d47a95192a01d58e125	2020-03-30 14:08:14 -06:00
Egon Elbre	439aba922a	satellite/overlay: reduce overhead of GetNodes Instead of filtering on the client side it's better to filter on the database side. Change-Id: I845fbbe5ed28c2ffdb0b8a3f789b59c094fd1069	2020-03-30 18:36:23 +03:00
Egon Elbre	cb781d66c7	satellite/overlay: optimize FindStorageNodes Reduce the number of fields returned from the query. Benchmark results in `satellite/overlay`: benchstat before.txt after2.txt name old time/op new time/op delta SelectStorageNodes-32 7.85ms ± 1% 6.27ms ± 1% -20.18% (p=0.002 n=10+4) SelectNewStorageNodes-32 8.21ms ± 1% 6.61ms ± 0% -19.53% (p=0.002 n=10+4) SelectStorageNodesExclusion-32 17.2ms ± 1% 15.9ms ± 1% -7.55% (p=0.002 n=10+4) SelectNewStorageNodesExclusion-32 17.8ms ± 2% 16.1ms ± 0% -9.38% (p=0.002 n=10+4) FindStorageNodes-32 48.4ms ± 1% 45.1ms ± 0% -6.69% (p=0.002 n=10+4) FindStorageNodesExclusion-32 79.2ms ± 1% 76.1ms ± 1% -3.89% (p=0.002 n=10+4) Benchmark results from `satellite/overlay` after making them parallel: benchstat before-parallel.txt after2-parallel.txt name old time/op new time/op delta SelectStorageNodes-32 548µs ± 1% 353µs ± 1% -35.60% (p=0.029 n=4+4) SelectNewStorageNodes-32 562µs ± 0% 368µs ± 0% -34.51% (p=0.029 n=4+4) SelectStorageNodesExclusion-32 1.02ms ± 1% 0.84ms ± 0% -18.08% (p=0.029 n=4+4) SelectNewStorageNodesExclusion-32 1.03ms ± 1% 0.86ms ± 2% -16.22% (p=0.029 n=4+4) FindStorageNodes-32 3.11ms ± 0% 2.79ms ± 1% -10.27% (p=0.029 n=4+4) FindStorageNodesExclusion-32 4.75ms ± 0% 4.43ms ± 1% -6.56% (p=0.029 n=4+4) Change-Id: I1d85e2764eb270f4c2b1998303ccfc1179d65b26	2020-03-30 18:36:23 +03:00
Jennifer Johnson	b75cbc8e24	satellite,storagenode: remove references to free bandwidth Change-Id: I42a6597544804fa9235e89ec656ebc365eb522e5	2020-03-25 22:28:34 +00:00
Michal Niewrzal	fdf40a7526	storj: remove `storj/private/version` package which was moved to `storj/private` repo Change-Id: I81c3f5b9d5e4fe7bca760999eb045ee9734e5e2e	2020-03-24 14:31:33 +00:00
Moby von Briesen	2f991b6c56	satellite/{overlay, satellitedb}: account for `suspended` field in overlay cache Make sure that suspended nodes are treated appropriately by the overlay cache. This means we should expect the following behavior: * suspended nodes (vetted or not) should not be selected for uploading new segments * suspended nodes should be treated by the checker and repairer as "unhealthy", and should be removed upon successful repair This commit also removes unused overlay functionality. Fixes a bug with commit `8b72181a1f` where the audit reporter was automatically suspending nodes regardless of audit outcome (see test added). Tests: * updates repair tests to ensure that a suspended node is treated as unhealthy and will be removed from the pointer on successful repair * updates overlay tests for KnownUnreliableOrOffline and KnownReliable to expect suspended nodes to be considered "unreliable" * adds satellitedb test that ensures overlay.SelectStorageNodes and overlay.SelectNewStorageNodes do not include suspended nodes * adds audit reporter test to ensure that different audit outcomes result in the correct suspended/disqualified states Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561	2020-03-17 17:14:56 +00:00
Ethan	bdbf764b86	satellite/orders;overlay: Consolidate order limit storage node lookups into 1 query. https://storjlabs.atlassian.net/browse/SM-449 Change-Id: Idc62cc2978fba67cf48f7c98b27b0f996f9c58ac	2020-03-16 23:15:47 +00:00
Moby von Briesen	8b72181a1f	satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension * change overlay.UpdateStats to allow a third audit outcome. Now it can handle successful, failed, and unknown audits. * when "unknown audit reputation" (unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the DQ threshold, put node into suspension. * when unknown audit reputation goes above the DQ threshold, remove node from suspension. * record unknown audits from audit reporter. * add basic tests around unknown audits and suspension. Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1	2020-03-16 20:29:26 +00:00
Jess G	39cb821196	satellite/overlay: rm combinedcache, fix IP naming to be network (#3798) * rn combinedcache, rm dns node lookup Change-Id: I239f07211764b097d851230d8c81900a47756e9e * excludeIPs -> excludedNetworks Change-Id: Ifa6f44ab17457cdd5aff4cd5694296867c18b179 * use lowercase var name Change-Id: I825aad2b718c71f455e747be18f8cabd02aabe55 * update Getnetwork name Change-Id: I002a1b7bc6b4ef40159c0cd2b0ef209f80a9c503 * fix comments Change-Id: Ibddf5b9ffa9d685af6c392d893db063ef18e45fa * update comments with ipv6 Change-Id: I31758b7d4979e7c27d014668f4fb532ad838cda2 Co-authored-by: Stefan Benten <mail@stefan-benten.de>	2020-03-12 11:37:57 -07:00
Jessica Grebenschikov	803e2930f4	satellite: use IP for all uplink operations, use hostname for audit and repairs My understanding is that the nodes table has the following fields: - `address` field which can be a hostname or an IP - `last_net` field that is the /24 subnet of the IP resolved from the address This PR does the following: 1) add back the `last_ip` field to the nodes table 2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time. 3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date 4) try to reduce confusion about hostname, ip, subnet, and address in the code base Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac	2020-03-11 09:11:40 -07:00
Jennifer Johnson	1c1750e6be	removes bandwidth limiting On satellite, remove all references to free_bandwidth column in nodes table. On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated. Protobuf message, NodeCapacity, is left intact for backwards compatibility. Once this is released to all satellites, we can drop the column from the DB. Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838	2020-03-04 14:04:00 +00:00
Moby von Briesen	f495544c56	satellite/satellitedb/dbx: add fields to node table for placing nodes into suspended mode for too many unknown-error audits Change-Id: Iac9a619e5c08377de87ffdf4acdd0155027f5eb3	2020-03-03 03:30:59 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Egon Elbre	76fdb5d863	storage: add configurable lookup limits Currently storage tests were tied to the default lookup limit. By increasing the limits, the tests will take longer and sometimes cause a large number of goroutines to be started. This change adds configurable lookup limit to all storage backends. Also remove boltdb.NewShared, since it's not used any more. Change-Id: I1a052f149da471246fac5745da133c3cfc27582e	2020-01-22 21:35:56 +02:00
Egon Elbre	1abfe42142	satellite: use tagsql Change-Id: I2170dee409fb0c2fe85913ddd36e7811a3b853ed	2020-01-19 14:39:16 +02:00
Egon Elbre	7d79aab14e	satellite/satellitedb: fixes to row handling Change-Id: I48fae692bcca152143a12f333296c42471538850	2020-01-16 17:07:26 +02:00
Jeff Wendling	9da16b1d9e	satellite/satellitedb/dbx: name the package dbx everyone was importing it as dbx anyway. why should it be named satellitedb? so yeah just pass the "-p dbx" flag. Change-Id: I5efa669f4f00f196b38a9acd0d402009475a936f	2020-01-15 15:16:39 -07:00
Egon Elbre	64fb2d3d2f	Revert "dbutil: statically require all databases accesses to use contexts" This reverts commit `8e242cd012`. Revert because lib/pq has known issues with context cancellation. These issues need to be resolved before these changes can be merged. Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555	2020-01-15 07:28:00 +00:00
JT Olio	8e242cd012	dbutil: statically require all databases accesses to use contexts this will allow for some nice runtime analysis down the road. also, this allows for wrapping database handles in a way that can interact with these contexts requires https://review.dev.storj.io/c/storj/dbx/+/514 Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b	2020-01-14 18:20:47 -05:00
Yingrong Zhao	76ee8a1b4c	satellite: remove UptimeReputation configs from codebase With the new storage node downtime tracking feature, we need remove current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and UptimeReputationDQ. This is the first step of removing the uptime reputation columns from satellitedb Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36	2020-01-08 18:54:15 +00:00
Natalie Ventura Villasana	1cb0f80a8d	satellite/gracefulexit: dq node on exit fail Disqualifies a node when the node fails to complete a graceful exit. Adds a new DisqualifyNode method to the overlay cache, since there wasn't an existing method to disqualify a node but do nothing else to its stats. Adds checks to existing tests to make sure that a storage node that fails a graceful exit is marked as disqualified in the overlay cache. https://storjlabs.atlassian.net/browse/V3-3342 Change-Id: I4d554a519ab59db31ad3b8e28764c8683a6e3888	2020-01-06 19:16:26 -05:00
Moby von Briesen	6c2e4cc0cd	satellite/overlay: Return NodeLastContact instead of a node dossier from overlay.GetOfflineNodesLimited We only care about node ID, address, and last contact success/failure from the downtime service, so the overlay should only return these values for the downtime-specific queries. Change-Id: I08a6ecfdd2a12b82cae62e87d6adeab53975bfce	2020-01-06 17:12:30 -05:00
paul cannon	4203e25c54	satellite/satellitedb: use transaction helpers in overlaycache Transactions in our code that might need to work against CockroachDB need to be retried in the event of a retryable error. The transaction helper functions in dbutil do that automatically. I am changing this code to use those helpers instead. Change-Id: Icd3da71448a84c582c6afdc6b52d1f345fe9469f	2020-01-06 21:42:57 +00:00
Ethan	05b406e992	satellite:{downtime,overlay}: Implement offline node detection chore https://storjlabs.atlassian.net/browse/V3-3398 Change-Id: I598c3bad819026377d1d113c099dc9bba8b02742	2020-01-03 17:10:03 +00:00
Moby von Briesen	ff74b44c5f	satellite/overlay: Add ability for overlay to get offline nodes ordered by last checked time This is required for the downtime tracking service: https://storjlabs.atlassian.net/browse/V3-2545 Change-Id: I286cdc07d802393948eb10c25c45ba78cc3ceafc	2020-01-02 16:39:38 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Yingrong Zhao	6e71591b9b	satellitedb;storagenodedb: remove unnecessary use of DB transactions in graceful exit Change-Id: Ief0a28c6750c130896b48bfebfbea7fb3caa810f	2019-12-20 21:24:38 +00:00
Kaloyan Raev	5ee1a00857	satellite/overlay: filter reliable nodes from a list Adds the KnownReliable method to Overlay Service that filters all nodes from the given list to be only reliable nodes (online and qualified). The method return []*pb.Node of reliable nodes. The pb.Node values are ready for dialing. The first use case is when deleting an object to efficiently dial all reliable nodes holding a piece of that object and send them a delete request. Change-Id: I13e0a8666f3807c5c31ef1a1087476018a5d3acb	2019-12-17 21:20:08 +00:00
paul cannon	b5ddfc6fa5	satellite/satellitedb: unexport satellitedb.DB Backstory: I needed a better way to pass around information about the underlying driver and implementation to all the various db-using things in satellitedb (at least until some new "cockroach driver" support makes it to DBX). After hitting a few dead ends, I decided I wanted to have a type that could act like a dbx.DB but which would also carry information about the implementation, etc. Then I could pass around that type to all the things in satellitedb that previously wanted dbx.DB. But then I realized that satellitedb.DB was, essentially, exactly that already. One thing that might have kept satellitedb.DB from being directly usable was that embedding a dbx.DB inside it would make a lot of dbx methods publicly available on a satellitedb.DB instance that previously were nicely encapsulated and hidden. But after a quick look, I realized that _nothing_ outside of satellite/satellitedb even needs to use satellitedb.DB at all. It didn't even need to be exported, except for some trivially-replaceable code in migrate_postgres_test.go. And once I made it unexported, any concerns about exposing new methods on it were entirely moot. So I have here changed the exported satellitedb.DB type into the unexported satellitedb.satelliteDB type, and I have changed all the places here that wanted raw dbx.DB handles to use this new type instead. Now they can just take a gander at the implementation member on it and know all they need to know about the underlying database. This will make it possible for some other pending code here to differentiate between postgres and cockroach backends. Change-Id: I27af99f8ae23b50782333da5277b553b34634edc	2019-12-16 19:09:30 +00:00
Cameron	7abad3c6bb	refactor sql to be compatible with pq and cockroach (#3647)	2019-11-26 09:26:22 -05:00
littleskunk	8b3444e088	satellite/nodeselection: don't select nodes that haven't checked in for a while (#3567) * satellite/nodeselection: dont select nodes that havent checked in for a while * change testplanet online window to one minute * remove satellite reconfigure online window = 0 in repair tests * pass timestamp into UpdateCheckIn * change timestamp to timestamptz * edit tests to set last_contact_success to 4 hours ago * fix syntax error * remove check for last_contact_success > last_contact_failure in IsOnline	2019-11-15 23:43:06 +01:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573)	2019-11-14 21:46:15 +02:00
Yingrong Zhao	6331f839ae	satellite/gracefulexit: not allow disqualified node to graceful exit (#3493)	2019-11-07 12:19:34 -05:00
Natalie Villasana	68a7790069	satellite/gracefulexit: select new node filtered by Distinct IP (#3435)	2019-11-06 16:38:51 -05:00
Egon Elbre	aa761700af	satellite/satellitedb: update nodes in sorted order (#3446)	2019-11-01 18:07:23 +01:00
Maximillian von Briesen	54594e79c3	satellite/gracefulexit: add metrics on satellite for graceful exit (#3355)	2019-10-29 16:22:20 -04:00

129 Commits