storj

Author	SHA1	Message	Date
Malcolm Bouzi	24d60384c5	satellite/satellitedb: add columns for professional users (#4028) Co-authored-by: Egon Elbre <egonelbre@gmail.com>	2021-01-26 11:38:53 -05:00
Isaac Hess	c92bda7e75	tally monkit: change location to monitor piecesize When we observed the value for total piecesizes stored in the network, we were doing it after converting them to byte-hours, rather than using the actual piece sizes. This fixes that issue. Change-Id: I1564d21b519f70eb59f298d97dbd777baf127723	2021-01-26 15:37:02 +00:00
Moby von Briesen	0a48071854	satellite/console: Add pagination fields for ListProjectsByOwnerID. Add ProjectsCursor type for pagination. Add PageCount, CurrentPage, and TotalCount to ProjectsPage. This allows us to mimic the logic of GetBucketTotals and the implementation of BucketUsages in graphql for the new ProjectsByOwnerID functionality. Change-Id: I4e1613859085db65971b44fcacd9813d9ddad8eb	2021-01-20 16:15:29 +00:00
Cameron Ayer	d14607a5f7	satellite/{contact,nodestats,overlay,satellitedb}: remove references to total_uptime_count and uptime_success_count columns Change-Id: I1f92022909bc564e9b1e31bf937fdfe7c16554de	2021-01-19 15:43:02 -05:00
Cameron Ayer	75d828200c	private,satellite: add chore to dq stray nodes Full scope: private/testplanet,satellite/{overlay,satellitedb} Description: In most cases, downtime tracking with audits will eventually lead to DQ for nodes that are unresponsive. However, if a stray node has no pieces, it will not be audited and will thus never be disqualified. This chore checks for nodes that have not been successfully contacted within a set time and DQs them. New flags toggle DQ of stray nodes and set how often the chore runs and how long nodes can go without contact. Change-Id: Ic9d41fdbf214736798925e728245180fb3c55615	2021-01-19 14:21:56 -05:00
Qweder93	6ba8f6c8a9	storagenode, satellite: payout renamed to payouts, expected payout estimation added, console API for audits reworked Change-Id: I4aa5e99bffaa87d0a800a429a4c83aa498ad4b7b	2021-01-18 10:56:03 +00:00
Ivan Fraixedes	678b07b314	satellite: Fix typos & code formatting Fix some typos in the doc comments and readdress some code formatting applied automatically. Change-Id: I605b4eff2e7c6c58227ecf16be4c1d26f5322eb6	2021-01-15 16:40:26 +01:00
Moby von Briesen	c24f84914c	satellite/console: Add ability to list projects by owner ID Listing projects by owner ID also includes the number of members in each project. Change-Id: I53a09674b60c199ef378943851bb0f164e92e4e2	2021-01-15 14:22:22 +00:00
Cameron Ayer	0184d33e96	satellite/satellitedb: set default 0 on uptime columns This is the first step in the removal of uptime columns from the nodes table. These columns are no longer used: uptime_success_count, total_uptime_count, uptime_reputation_alpha, uptime_reputation_beta. In order to avoid breaking backwards compatibility, we need to remove all references to these columns before removing the columns themselves from the database. However, since uptime_success_count and total_uptime_count are NOT NULLABLE, we can't remove them from the insert statements in the overlay. So we can't remove the columns because of the references, and we can't remove the references because the columns can't be null. What a pickle. To remedy this, we will set a default on the columns. Then we should be able to remove them from the insert statements. Change-Id: I75f6c56fb7897835bbf29869f86f39de1d9dd345	2021-01-12 17:44:37 +00:00
Cameron Ayer	0403e99a5b	satellite/{overlay,satellitedb}: remove unused methods for old downtime tracking GetSuccessfulNodeNotCheckedInSince and GetOfflineNodesLimited are overlay methods which were only used by the previous downtime tracking system which has been removed. These methods should also be removed. Change-Id: Idb829d742e1f987e095604423fff656fe581183e	2021-01-11 15:21:28 +00:00
Moby von Briesen	6e2ef3b9ee	Revert "satellite/satellitedb: Do not consider nodes with offline_suspended as reputable." This reverts commit `e24262c2c9`. Change-Id: I287deb2e52d03bbd698ed055f0f216b0b5bf2798	2021-01-04 14:28:37 +00:00
Moby von Briesen	825dc71227	satellite/{overlay, satellitedb}: Refactor audit history * Separate audit history interface into its own file in the overlay package * Add overlay.AuditHistory struct so that internalpb.AuditHistory is only used from within the database layer * Add overlay.GetAuditHistory function for features that will require access to detailed audit history information * Do not return full audit history from UpdateAuditHistory - callers to that function only need to know the online score and whether a full tracking period has been completed * Move audit history tests out of satellite/satellitedb, since they are independent of database implementation Change-Id: I35b0c4ac23bbaabd80624f8a9631c3cb1a1f33bd	2020-12-29 18:50:22 +00:00
Moby von Briesen	85ae13f11d	satellite/satellitedb: Drop nodes_offline_times table. Now that the deprecated downtime tracking service is removed (`3fc76f4ffe`), we can safely remove the nodes_offline_times table. Change-Id: Ia7c6efe32ba104dff5a830af5f2beee3337eefe5	2020-12-29 18:17:50 +00:00
Moby von Briesen	e24262c2c9	satellite/satellitedb: Do not consider nodes with offline_suspended as reputable. Nodes which are offline_suspended will no longer be considered for new uploads. The current threshold that enters a node into offline suspension is 0.6. Disqualification for offline suspension is still disabled. Change-Id: I0da9abf47167dd5bf6bb21e0bc2186e003e38d1a	2020-12-29 17:59:09 +00:00
Ethan Adams	6070018021	satellite/overlay: use AS OF SYSTEM TIME with Cockroach Query the nodes table using AS OF SYSTEM TIME '-10s' (by default) when on CRDB to alleviate contention on the nodes table and minimize CRDB retries. Queries for standard uploads are already cached, and node lookups for graceful exit uploads have retry logic, so it isn't necessary for the nodes returned to be current.	2020-12-22 21:07:07 +02:00
Ethan Adams	563197c628	satellite/overlay: Add index on nodes table (#4012) satellite/accounting: Add index for project_id on bucket_storage_tallies	2020-12-21 12:48:48 -05:00
Ethan Adams	9b52283570	satellite/accounting: Add index for project_id on bucket_storage_tallies (#4010) Change-Id: I47ab2d1e24f94307c3383c497cffe2a150fa8ab7	2020-12-21 11:42:00 -05:00
Ethan Adams	6e501898c3	satellite/accounting: Performance improvements to getNodeIds used by GetBandwidthSince (#4009)	2020-12-21 16:37:01 +01:00
Jessica Grebenschikov	da0327c9b7	satellite/dbcleanup: remove expired serial chore Change-Id: Ib71d41eb6679d6435e5bc10b6244dac66380a74e	2020-12-18 09:36:28 -08:00
Cameron Ayer	28eaae66af	satellite/satellitedb: drop num_healthy_pieces column from injuredsegments This column is no longer used as it has been replaced by the segment_health column. Change-Id: I6b4df89cd4f994d8418976f88e8c5f57615f8115	2020-12-17 20:17:08 +00:00
Stefan Benten	9fe477899b	satellite/satellitedb: add lint ignore rule to support staticcheck 2020.2 staticcheck 2020.2 is not liking our dbx files, so we need to ignore them. Change-Id: I6becc3619bb088473f9776d0878ce240d4935936	2020-12-14 21:16:31 +00:00
Jessica Grebenschikov	0649d2b930	satellite/repair: reduce contention on injuredsegments table on CRDB We migrated satelliteDB off of Postgres and over to CockroachDB (crdb), but contention on the injuredsegments table was far too high, so we had to roll back to Postgres for the repair queue. A few things contributed to this problem: 1) crdb doesn't support `FOR UPDATE SKIP LOCKED`; 2) the original crdb Select query was doing 2 full table scans and not using any indexes; 3) the SLC Satellite (where we were doing the migration) was running 48 repair worker processes, each running up to 5 goroutines that were all trying to select from the repair queue, which caused a lot of contention. The changes in this PR should reduce that contention and improve performance on CRDB. The changes include: 1) Use an update/set query instead of select/update to take advantage of the new implicit row locking for `UPDATE` in CRDB. - Details: As of CRDB v20.2.2, there is implicit row locking with update/set queries (the contention reduction and performance gains are described in this blog post: https://www.cockroachlabs.com/blog/when-and-why-to-use-select-for-update-in-cockroachdb/). 2) Remove the `ORDER BY` clause, since it was causing a full table scan and also prevented the use of row locking. - While in the long term it is very important to `ORDER BY segment_health`, this change is only supposed to be a temporary band-aid to get us migrated over to CRDB quickly. Since segment_health has been set to infinity for some time now (re: https://review.dev.storj.io/c/storj/storj/+/3224), it seems acceptable to keep not using it in the short term. In the long term, however, this needs to be fixed with a redesign of the repair workers, possibly in the trusted delegated repair design (https://review.dev.storj.io/c/storj/storj/+/2602), with something similar to what is recommended here for implementing a queue on CRDB: https://dev.to/ajwerner/quick-and-easy-exactly-once-distributed-work-queues-using-serializable-transactions-jdp, or by migrating to a RabbitMQ priority queue or something similar. This PR's improved query uses the index to avoid full scans, locks the row it is going to update, and lets CRDB retry for us if there are any lock errors. Change-Id: Id29faad2186627872fbeb0f31536c4f55f860f23	2020-12-10 09:51:26 -08:00
Michal Niewrzal	c2a97aeb14	satellite/satellitedb: add ListAllBuckets method We need to be able to list all buckets in DB without knowing project ID. This method will be used to list buckets for metainfo loop implementation based on metabase. Change-Id: Iac75af0eee4f31e80a15577575a8249cbca787b2	2020-12-10 14:19:27 +00:00
Stefan Benten	494bd5db81	all: golangci-lint v1.33.0 fixes (#3985)	2020-12-05 17:01:42 +01:00
Ethan Adams	f90ea10a4a	Allow for DB application names per process. (#3983)	2020-12-04 11:24:39 +01:00
Moby von Briesen	3fc76f4ffe	satellite/downtime: Remove deprecated downtime tracking service. We are no longer planning on implementing downtime penalization using the method described in docs/blueprints/archive/storage-node-downtime-tracking-deprecated.md. Now, we are implementing the design described in docs/blueprints/storage-node-downtime-tracking-with-audits.md. This change removes the downtime estimation chores from the satellite core as well as the package satellite/downtime. A future change will remove the database table. Change-Id: I1a1d3cf9dceeba36255d25243294865b89925518	2020-12-02 15:16:13 -05:00
JT Olio	1728c3a992	satellite/dbx: standardize on assignment Change-Id: I8f87bc8391e765e4480b0590d92d3601248e1f93	2020-12-01 16:10:18 +00:00
JT Olio	70b91aac54	satellitedb: remove cruft caused by https://review.dev.storj.io/c/storj/storj/+/3223 Change-Id: I198bb2f869cc7177b9ecafdd8932bbf2b58be5b8	2020-12-01 00:16:26 +00:00
Egon Elbre	f456d7ce03	satellite: remove implementation detail from DB interface Which database is accessed, and how it handles migrations internally, are implementation details and do not belong in the requirements interface. Change-Id: Ia4a6994f39470063a96a8e5f3a1bd27aa79fe5cd	2020-11-30 13:29:20 +02:00
Egon Elbre	28ea63be92	satellite/repair: avoid TestDBAccess Change-Id: I34adb58cd67fba5917032f2f328d75b1c4afdbbf	2020-11-30 13:29:08 +02:00
JT Olio	71e11b27f3	satellite/dbx: only retry with cockroach Change-Id: Id3630c26dbfda36dcbece2849e2353d5ab2882af	2020-11-29 18:10:07 -07:00
JT Olio	bd23d12bb9	satellite/dbx: add cockroach retries for other QueryContext operations Change-Id: Ia30fbba55c926892702fa96fb9dd01b75347d351	2020-11-29 18:09:56 -07:00
JT Olio	ea2f39ca7f	satellite/dbx: add retries for QueryRowContext-based operations Change-Id: Ie2527b673dd4ce5250cf5c0cbf8f14921262f665	2020-11-29 18:09:46 -07:00
JT Olio	d3b0691bbd	satellite/dbx: import dbx templates these are unchanged from storj.io/dbx. we're importing them because in a later commit we will change them, and it'd be nice to see that diff as a separate commit. Change-Id: I8315130ed6bab397bd65b9a1a90c29d130b8c02d	2020-11-29 18:09:33 -07:00
JT Olio	5d8a67a4f7	satellitedb: retry GetBandwidthSince on cockroach Change-Id: I2bf20f3a19e7f3af97630d8a679410feba70661e	2020-11-29 16:36:15 -07:00
Ethan	5dc013d3bd	satellite/overlay: Add retry to all selects in overlaycache Change-Id: I0356d71a35701f8e0ca04a34b2bb2aea666c1394	2020-11-29 16:46:57 -05:00
JT Olio	6bce907cb0	satellite: try to stream rollups to aggregation function to use less memory this change tries really hard to never have all of the storage node rollups in memory at the same time, up until the rollups are actually getting summed together. Change-Id: If67f49e7d71106798d996a6850b3e48671bd9e18	2020-11-29 10:26:32 -07:00
JT Olio	6aae21541f	satellitedb: do saverollup in batches Change-Id: I78278a192cba60541eee2986f54a88d5a479bd3e	2020-11-28 19:26:46 -07:00
JT Olio	0ba516d405	satellite: support pointing db components at different databases the immediate need is to be able to move the repair queue back out of cockroach if we can't save it. Change-Id: If26001a4e6804f6bb8713b4aee7e4fd6254dc326	2020-11-28 18:39:16 +00:00
Egon Elbre	55d5e1fd7d	satellite/orders: ensure that expired deletion doesn't stall Add checks to ensure that when somebody uses empty options, the deletion doesn't loop infinitely. Change-Id: I1738fb1e7e1f8efbbb954c491cb6489f7bcdc2db	2020-11-23 14:52:40 +02:00
Ethan	2b92bba563	satellite/satellitedb/orders: Handle serial_numbers deletes in smaller increments on CRDB CRDB doesn't like large deletes. While testing in the POC environment we found that deletes on the serial_numbers table could take hours. This change limits deletes to 1000 at a time (configurable) to avoid blocking other queries. Change-Id: I08455e25db1574579dd4d7b7125a08e9c913dff1	2020-11-20 13:44:52 +00:00
Moby von Briesen	a8b66dce17	satellite/accounting: account for old orders that can be submitted in satellite rollup With the new phase 3 order submission, orders can be added to the storage and bandwidth rollup tables at timestamps before the most recent rollup was run. This change shifts the start time of each new rollup window to account for any unexpired orders that might have been added since the previous rollup. A satellitedb migration is necessary to allow upserts in the accounting_rollups table when entries with identical node_ids and start_times are inserted. Change-Id: Ib3022081f4d6be60cfec8430b45867ad3c01da63	2020-11-18 14:46:00 -05:00
Moby von Briesen	0ec685b173	satellite/{satellitedb, repair/{queue, checker}}: Use new column "segmentHealth" instead of "numHealthy" in injured segments queue We plan to add support for a new Reed-Solomon scheme soon, but our repair queue orders segments by least number of healthy pieces first. With a second RS scheme, fewer healthy pieces will not necessarily correlate to lower health. This change just adds the new column in a migration. A separate change will add the new health function. Right now, since we only support one RS scheme, behavior will not change. Number of healthy pieces is being inserted as "segment health" until the new health function is merged. Segment health is calculated with a new priority function created in commit `3e5640359`. In order to use the function, a new config value is added, called NodeFailureRate, representing the approximate probability of any individual node going down during one checker run. Change-Id: I51c4202203faf52528d923befbe886dbf86d02f2	2020-11-16 21:18:09 +00:00
Jessica Grebenschikov	f558cc825e	satellite/orders: add storagenode_bw_phase2 table and don't delete tallies for longer It turns out we need to make 2 more changes in order for phase 3 of the new order submission to be deployed. This PR makes 2 changes: 1) when the rollup service deletes tallies, we now keep tallies around until orders expire (vs 1 day like before); 2) the reported rollup chore will now write the storagenode_bandwidth_rollups to a new table _phase2 as an intermediary step so it doesn't conflict with phase 3 order settlement. These changes need to be deployed for 2 days before we can turn on phase 3 of the new orders settlement workflow. Change-Id: Iafbff577ba7d55f8f17b7db857311b2ce799de60	2020-11-13 17:15:24 +00:00
Cameron Ayer	dc67ce74c9	satellite: remove IsUp field from overlay.UpdateRequest With the new overlay.AuditOutcome type for offline audits, the IsUp field is redundant. If AuditOutcome != AuditOffline, then the node is online. In addition to removing the field itself, other changes needed to be made regarding the relationship between 'uptime' and 'audits'. Previously, uptime and audit outcome were completely separated. For example, it was possible to update a node's stats to give it a successful/failed/unknown audit while simultaneously indicating that the node was offline by setting IsUp to false. This is no longer possible under this changeset. Some tests which did this have been changed slightly in order to pass. Also add new benchmarks for UpdateStats and BatchUpdateStats with different audit outcomes. Change-Id: I998892d615850b1f138dc62f9b050f720ea0926b	2020-11-02 15:34:17 -05:00
Egon Elbre	7183dca6cb	all: fix defers in loop defer should not be called in a loop. Change-Id: Ifa5a25a56402814b974bcdfb0c2fce56df8e7e59	2020-11-02 15:06:38 +02:00
Egon Elbre	11338e9beb	satellite/internalpb: move audithistory.pb Change-Id: I8eee84d49ed90459168ddaf04ae57f790c2a22c4	2020-10-30 15:30:11 +02:00
Egon Elbre	7ce372c686	satellite/internalpb: add inspectors Change-Id: Ib688e43d05135c0c31ae95df533f1e4535ea396a	2020-10-30 13:28:17 +02:00
Egon Elbre	004e610d0f	satellite/internalpb: move datarepair.pb to internal Change-Id: If901d9ff4e5ee6715b963eeeb46513a602a44b3d	2020-10-30 13:28:14 +02:00
Egon Elbre	caefde6b32	private/{dbutil,tagsql}: pass ctx to database opening Opening a database usually involves dialing, so we should pass ctx to it. Change-Id: Iaa2875981570d83e65be3710f841cf30349f807b	2020-10-29 10:51:29 +00:00

636 Commits