storj

Author	SHA1	Message	Date
Michał Niewrzał	ec88d21a3c	Merge 'main' branch. Change-Id: I6e8162d1a6caf75e89c9f9c9f9522730aebf83ae	2021-01-11 10:26:58 +01:00
Moby von Briesen	6e2ef3b9ee	Revert "satellite/satellitedb: Do not consider nodes with offline_suspended as reputable." This reverts commit `e24262c2c9`. Change-Id: I287deb2e52d03bbd698ed055f0f216b0b5bf2798	2021-01-04 14:28:37 +00:00
Michał Niewrzał	ad3e3a38c5	Merge 'main' branch Change-Id: Ia0db1b1f9ef3e0671d3f2208881b0abc3064e200	2021-01-04 12:13:45 +01:00
Moby von Briesen	825dc71227	satellite/{overlay, satellitedb}: Refactor audit history * Separate audit history interface into its own file in the overlay package * Add overlay.AuditHistory struct so that internalpb.AuditHistory is only used from within the database layer * Add overlay.GetAuditHistory function for features that will require access to detailed audit history information * Do not return full audit history from UpdateAuditHistory - callers to that function only need to know the online score and whether a full tracking period has been completed * Move audit history tests out of satellite/satellitedb, since they are independent of database implementation Change-Id: I35b0c4ac23bbaabd80624f8a9631c3cb1a1f33bd	2020-12-29 18:50:22 +00:00
Moby von Briesen	85ae13f11d	satellite/satellitedb: Drop nodes_offline_times table. Now that the deprecated downtime tracking service is removed (`3fc76f4ffe`), we can safely remove the nodes_offline_times table. Change-Id: Ia7c6efe32ba104dff5a830af5f2beee3337eefe5	2020-12-29 18:17:50 +00:00
Moby von Briesen	e24262c2c9	satellite/satellitedb: Do not consider nodes with offline_suspended as reputable. Nodes which are offline_suspended will no longer be considered for new uploads. The current threshold that enters a node into offline suspension is 0.6. Disqualification for offline suspension is still disabled. Change-Id: I0da9abf47167dd5bf6bb21e0bc2186e003e38d1a	2020-12-29 17:59:09 +00:00
Ethan Adams	6070018021	satellite/overlay: use AS OF SYSTEM TIME with Cockroach. Query the nodes table using AS OF SYSTEM TIME '-10s' (by default) when on CRDB to alleviate contention on the nodes table and minimize CRDB retries. Queries for standard uploads are already cached, and node lookups for graceful exit uploads have retry logic, so the nodes returned do not need to be current.	2020-12-22 21:07:07 +02:00
Michal Niewrzal	9a8959d429	Merge 'master' branch Change-Id: Iba69ea73ca4d3f1cd4ae94243eaaae033c5324e8	2020-12-22 14:55:57 +01:00
Ethan Adams	563197c628	satellite/overlay: Add index on nodes table (#4012) satellite/accounting: Add index for project_id on bucket_storage_tallies	2020-12-21 12:48:48 -05:00
Ethan Adams	9b52283570	satellite/accounting: Add index for project_id on bucket_storage_tallies (#4010) Change-Id: I47ab2d1e24f94307c3383c497cffe2a150fa8ab7	2020-12-21 11:42:00 -05:00
Ethan Adams	6e501898c3	satellite/accounting: Performance improvements to getNodeIds used by GetBandwidthSince (#4009)	2020-12-21 16:37:01 +01:00
Jessica Grebenschikov	da0327c9b7	satellite/dbcleanup: remove expired serial chore Change-Id: Ib71d41eb6679d6435e5bc10b6244dac66380a74e	2020-12-18 09:36:28 -08:00
Michal Niewrzal	2111740236	Merge 'master' branch Change-Id: Ib73af0ff3ce0e9a1547b0b9fc55bf88704f6f394	2020-12-18 09:13:24 +01:00
Cameron Ayer	28eaae66af	satellite/satellitedb: drop num_healthy_pieces column from injuredsegments This column is no longer used as it has been replaced by the segment_health column. Change-Id: I6b4df89cd4f994d8418976f88e8c5f57615f8115	2020-12-17 20:17:08 +00:00
Michal Niewrzal	70ba4deea9	satellite/repair/checker: adjust irreparable part of repair checker Change-Id: I0732104a97ba18a5359de3966cd692677a0ff790	2020-12-17 14:11:22 +00:00
Kaloyan Raev	9aa61245d0	satellite/audits: migrate to metabase Change-Id: I480c941820c5b0bd3af0539d92b548189211acb2	2020-12-17 14:38:48 +02:00
Michal Niewrzal	57f374af24	Merge 'master' branch Change-Id: Idf6b10ea7ca94e4d232e6a3b6a38ef2e646ba197	2020-12-15 08:26:53 +01:00
Stefan Benten	9fe477899b	satellite/satellitedb: add lint ignore rule to support staticcheck 2020.2. staticcheck 2020.2 does not like our dbx files, so we need to ignore them. Change-Id: I6becc3619bb088473f9776d0878ce240d4935936	2020-12-14 21:16:31 +00:00
Jessica Grebenschikov	0649d2b930	satellite/repair: improve contention for injuredsegments table on CRDB. We migrated satelliteDB off of Postgres and over to CockroachDB (CRDB), but contention on the injuredsegments table was far too high, so we had to roll back to Postgres for the repair queue. A couple of things contributed to this problem: 1) CRDB doesn't support `FOR UPDATE SKIP LOCKED`; 2) the original CRDB select query was doing 2 full table scans and not using any indexes; 3) the SLC Satellite (where we were doing the migration) was running 48 repair worker processes, each of which runs up to 5 goroutines that all try to select from the repair queue, which caused a lot of contention. The changes in this PR should reduce that contention and improve performance on CRDB. The changes include: 1) Use an update/set query instead of select/update to take advantage of the new implicit row locking for `UPDATE` in CRDB. - Details: As of CRDB v20.2.2, there is implicit row locking with update/set queries (contention reduction and performance gains are described in this blog post: https://www.cockroachlabs.com/blog/when-and-why-to-use-select-for-update-in-cockroachdb/). 2) Remove the `ORDER BY` clause, since it was causing a full table scan and also prevented use of the row locking capability. - While in the long term it is very important to `ORDER BY segment_health`, this change is only supposed to be a temporary band-aid to get us migrated to CRDB quickly. Since segment_health has been set to infinity for some time now (re: https://review.dev.storj.io/c/storj/storj/+/3224), it seems acceptable to continue not using it for the short term. However, in the long term this needs to be fixed with a redesign of the repair workers, possibly in the trusted delegated repair design (https://review.dev.storj.io/c/storj/storj/+/2602), with something similar to what is recommended here on how to implement a queue on CRDB: https://dev.to/ajwerner/quick-and-easy-exactly-once-distributed-work-queues-using-serializable-transactions-jdp, or by migrating to a RabbitMQ priority queue or something similar. This PR's improved query uses the index to avoid full scans and locks the row it is going to update, and CRDB retries for us if there are any lock errors. Change-Id: Id29faad2186627872fbeb0f31536c4f55f860f23	2020-12-10 09:51:26 -08:00
Michal Niewrzal	b3acc1101a	Merge 'master' branch Change-Id: Iee99400c7095770e61cde94b3b2c8eb0ddec463d	2020-12-10 15:42:52 +01:00
Michal Niewrzal	c2a97aeb14	satellite/satellitedb: add ListAllBuckets method We need to be able to list all buckets in DB without knowing project ID. This method will be used to list buckets for metainfo loop implementation based on metabase. Change-Id: Iac75af0eee4f31e80a15577575a8249cbca787b2	2020-12-10 14:19:27 +00:00
Michal Niewrzal	218bbeaffa	Merge 'master' branch Change-Id: Ica5c25607a951076dd9f77e35e308062f71ce3f0	2020-12-07 15:05:52 +01:00
Stefan Benten	494bd5db81	all: golangci-lint v1.33.0 fixes (#3985)	2020-12-05 17:01:42 +01:00
Ethan Adams	f90ea10a4a	Allow for DB application names per process. (#3983)	2020-12-04 11:24:39 +01:00
Moby von Briesen	3fc76f4ffe	satellite/downtime: Remove deprecated downtime tracking service. We are no longer planning on implementing downtime penalization using the method described in docs/blueprints/archive/storage-node-downtime-tracking-deprecated.md. Now, we are implementing the design described in docs/blueprints/storage-node-downtime-tracking-with-audits.md. This change removes the downtime estimation chores from the satellite core as well as the package satellite/downtime. A future change will remove the database table. Change-Id: I1a1d3cf9dceeba36255d25243294865b89925518	2020-12-02 15:16:13 -05:00
JT Olio	1728c3a992	satellite/dbx: standardize on assignment Change-Id: I8f87bc8391e765e4480b0590d92d3601248e1f93	2020-12-01 16:10:18 +00:00
JT Olio	70b91aac54	satellitedb: remove cruft caused by https://review.dev.storj.io/c/storj/storj/+/3223 Change-Id: I198bb2f869cc7177b9ecafdd8932bbf2b58be5b8	2020-12-01 00:16:26 +00:00
Michal Niewrzal	5a7bc9657d	Merge 'master' branch Change-Id: If583132a821274dc4b78cf5f72b853ba8460c619	2020-11-30 12:57:22 +01:00
Egon Elbre	f456d7ce03	satellite: remove implementation detail from DB interface. Which database is accessed, and how it performs migrations internally, are implementation details and do not belong in the requirements interface. Change-Id: Ia4a6994f39470063a96a8e5f3a1bd27aa79fe5cd	2020-11-30 13:29:20 +02:00
Egon Elbre	28ea63be92	satellite/repair: avoid TestDBAccess Change-Id: I34adb58cd67fba5917032f2f328d75b1c4afdbbf	2020-11-30 13:29:08 +02:00
JT Olio	71e11b27f3	satellite/dbx: only retry with cockroach Change-Id: Id3630c26dbfda36dcbece2849e2353d5ab2882af	2020-11-29 18:10:07 -07:00
JT Olio	bd23d12bb9	satellite/dbx: add cockroach retries for other QueryContext operations Change-Id: Ia30fbba55c926892702fa96fb9dd01b75347d351	2020-11-29 18:09:56 -07:00
JT Olio	ea2f39ca7f	satellite/dbx: add retries for QueryRowContext-based operations Change-Id: Ie2527b673dd4ce5250cf5c0cbf8f14921262f665	2020-11-29 18:09:46 -07:00
JT Olio	d3b0691bbd	satellite/dbx: import dbx templates these are unchanged from storj.io/dbx. we're importing them because in a later commit we will change them, and it'd be nice to see that diff as a separate commit. Change-Id: I8315130ed6bab397bd65b9a1a90c29d130b8c02d	2020-11-29 18:09:33 -07:00
JT Olio	5d8a67a4f7	satellitedb: retry GetBandwidthSince on cockroach Change-Id: I2bf20f3a19e7f3af97630d8a679410feba70661e	2020-11-29 16:36:15 -07:00
Ethan	5dc013d3bd	satellite/overlay: Add retry to all selects in overlaycache Change-Id: I0356d71a35701f8e0ca04a34b2bb2aea666c1394	2020-11-29 16:46:57 -05:00
JT Olio	6bce907cb0	satellite: try to stream rollups to aggregation function to use less memory this change tries really hard to never have all of the storage node rollups in memory at the same time, up until the rollups are actually getting summed together. Change-Id: If67f49e7d71106798d996a6850b3e48671bd9e18	2020-11-29 10:26:32 -07:00
JT Olio	6aae21541f	satellitedb: do saverollup in batches Change-Id: I78278a192cba60541eee2986f54a88d5a479bd3e	2020-11-28 19:26:46 -07:00
JT Olio	0ba516d405	satellite: support pointing db components at different databases the immediate need is to be able to move the repair queue back out of cockroach if we can't save it. Change-Id: If26001a4e6804f6bb8713b4aee7e4fd6254dc326	2020-11-28 18:39:16 +00:00
Michal Niewrzal	efaba85c73	Merge 'master' branch Change-Id: I3520b3e327732929f5167b07a15ddb92d26cae1b	2020-11-24 10:03:20 +01:00
Egon Elbre	55d5e1fd7d	satellite/orders: ensure that expired deletion doesn't stall Add checks to ensure that when somebody uses empty options, the deletion doesn't loop infinitely. Change-Id: I1738fb1e7e1f8efbbb954c491cb6489f7bcdc2db	2020-11-23 14:52:40 +02:00
Ethan	2b92bba563	satellite/satellitedb/orders: Handle serial_numbers deletes in smaller increments on CRDB CRDB doesn't like large deletes. While testing in the POC environment we found that deletes on the serial_numbers table could take hours. This change limits deletes to 1000 at a time (configurable) to avoid blocking other queries. Change-Id: I08455e25db1574579dd4d7b7125a08e9c913dff1	2020-11-20 13:44:52 +00:00
Moby von Briesen	a8b66dce17	satellite/accounting: account for old orders that can be submitted in satellite rollup With the new phase 3 order submission, orders can be added to the storage and bandwidth rollup tables at timestamps before the most recent rollup was run. This change shifts the start time of each new rollup window to account for any unexpired orders that might have been added since the previous rollup. A satellitedb migration is necessary to allow upserts in the accounting_rollups table when entries with identical node_ids and start_times are inserted. Change-Id: Ib3022081f4d6be60cfec8430b45867ad3c01da63	2020-11-18 14:46:00 -05:00
Moby von Briesen	0ec685b173	satellite/{satellitedb, repair/{queue, checker}}: Use new column "segmentHealth" instead of "numHealthy" in injured segments queue We plan to add support for a new Reed-Solomon scheme soon, but our repair queue orders segments by least number of healthy pieces first. With a second RS scheme, fewer healthy pieces will not necessarily correlate to lower health. This change just adds the new column in a migration. A separate change will add the new health function. Right now, since we only support one RS scheme, behavior will not change. Number of healthy pieces is being inserted as "segment health" until the new health function is merged. Segment health is calculated with a new priority function created in commit `3e5640359`. In order to use the function, a new config value is added, called NodeFailureRate, representing the approximate probability of any individual node going down in the duration of one checker run. Change-Id: I51c4202203faf52528d923befbe886dbf86d02f2	2020-11-16 21:18:09 +00:00
Michal Niewrzal	7c384c8293	Merge 'master' branch Change-Id: I1eefd5a56449e577820977d61fa4a22bdd4fc230	2020-11-16 10:02:54 +01:00
Jessica Grebenschikov	f558cc825e	satellite/orders: add storagenode_bw_phase2 table and don't delete tallies for longer. It turns out we need to make 2 more changes in order for the new order submission phase 3 to get deployed. This PR makes 2 changes: 1) when the rollup service deletes tallies, we now keep tallies around until orders expire (vs. 1 day like before). 2) the reported rollup chore will now write the storagenode_bandwidth_rollups to a new table _phase2 as an intermediary step so it doesn't conflict with phase 3 order settlement. These changes need to be deployed for 2 days before we can turn on phase 3 of the new orders settlement workflow. Change-Id: Iafbff577ba7d55f8f17b7db857311b2ce799de60	2020-11-13 17:15:24 +00:00
Michal Niewrzal	7dde184cb5	Merge 'master' branch Change-Id: I6070089128a150a4dd501bbc62a1f8b394aa643e	2020-11-10 11:58:59 +00:00
Cameron Ayer	dc67ce74c9	satellite: remove IsUp field from overlay.UpdateRequest. With the new overlay.AuditOutcome type for offline audits, the IsUp field is redundant. If AuditOutcome != AuditOffline, then the node is online. In addition to removing the field itself, other changes needed to be made regarding the relationship between 'uptime' and 'audits'. Previously, uptime and audit outcome were completely separated. For example, it was possible to update a node's stats to give it a successful/failed/unknown audit while simultaneously indicating that the node was offline by setting IsUp to false. This is no longer possible under this changeset. Some tests that did this have been changed slightly so that they pass. Also add new benchmarks for UpdateStats and BatchUpdateStats with different audit outcomes. Change-Id: I998892d615850b1f138dc62f9b050f720ea0926b	2020-11-02 15:34:17 -05:00
Egon Elbre	7183dca6cb	all: fix defers in loop. defer should not be called in a loop. Change-Id: Ifa5a25a56402814b974bcdfb0c2fce56df8e7e59	2020-11-02 15:06:38 +02:00
Egon Elbre	716068a1e0	Merge branch 'master'. Change-Id: Ic14325edc291573582dce0cea3e04991a820b48b	2020-11-02 13:02:01 +02:00


641 Commits