storj/satellite
Jessica Grebenschikov 0649d2b930 satellite/repair: improve contention for injuredsegments table on CRDB
We migrated satelliteDB off of Postgres and over to CockroachDB (crdb), but contention on the injuredsegments table was far too high, so we had to roll back to Postgres for the repair queue. A few things contributed to this problem:
1) crdb doesn't support `FOR UPDATE SKIP LOCKED`
2) the original crdb Select query was doing 2 full table scans and not using any indexes
3) the SLC Satellite (where we were doing the migration) was running 48 repair worker processes, each of which runs up to 5 goroutines (up to 240 in total), all trying to select out of the repair queue, and this was causing a ton of contention.

The changes in this PR should help to reduce that contention and improve performance on CRDB.
The changes include:
1) Use an update/set query instead of select/update to capitalize on the new `UPDATE` implicit row locking ability in CRDB.
- Details: As of CRDB v20.2.2, there is implicit row locking with update/set queries (contention reduction and performance gains are described in this blog post: https://www.cockroachlabs.com/blog/when-and-why-to-use-select-for-update-in-cockroachdb/).

2) Remove the `ORDER BY` clause, since it was causing a full table scan and also prevented the use of the row locking capability.
- While long term it is very important to `ORDER BY segment_health`, the change here is only supposed to be a temporary band-aid to get us migrated over to CRDB quickly. Since segment_health has been set to infinity for some time now (re: https://review.dev.storj.io/c/storj/storj/+/3224), it seems like it might be OK to continue not making use of it in the short term. However, long term this needs to be fixed with a redesign of the repair workers, possibly in the trusted delegated repair design (https://review.dev.storj.io/c/storj/storj/+/2602), with something similar to what is recommended here on how to implement a queue on CRDB (https://dev.to/ajwerner/quick-and-easy-exactly-once-distributed-work-queues-using-serializable-transactions-jdp), or by migrating to a RabbitMQ priority queue or something similar.

This PR's improved query uses the index to avoid full scans and also locks the row it's going to update, and CRDB retries for us if there are any lock errors.
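
For illustration only, a minimal Go sketch of the update/set approach described above: one `UPDATE ... LIMIT 1 RETURNING` statement claims a segment instead of a separate select followed by an update. This is not the query from this change. The column names `attempted` and `data`, the 6 hour retry interval, and the names `claimQuery` and `claimSegment` are all assumptions made for the example.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// claimQuery marks one eligible row as attempted and returns its payload in a
// single statement, so the row lock is taken by the UPDATE itself.
// Assumptions (not from the original change): the columns "attempted" and
// "data", and the 6 hour retry interval. There is deliberately no ORDER BY.
const claimQuery = `UPDATE injuredsegments
SET attempted = now()
WHERE attempted IS NULL OR attempted < now() - interval '6 hours'
LIMIT 1
RETURNING data`

// claimSegment is a hypothetical helper: it runs claimQuery and returns the
// claimed segment payload, or nil when no row is currently eligible.
func claimSegment(ctx context.Context, db *sql.DB) ([]byte, error) {
	var data []byte
	err := db.QueryRowContext(ctx, claimQuery).Scan(&data)
	if errors.Is(err, sql.ErrNoRows) {
		return nil, nil // queue is empty or everything was attempted recently
	}
	if err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	// No database is opened here; print the statement so the sketch runs as-is.
	fmt.Println(claimQuery)
}
```

A worker would call `claimSegment` in a loop with a `*sql.DB` opened through whichever Postgres-compatible driver the deployment uses. Without an `ORDER BY`, which eligible row gets claimed is unspecified, which matches the short-term trade-off described above.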

Change-Id: Id29faad2186627872fbeb0f31536c4f55f860f23
2020-12-10 09:51:26 -08:00
accounting all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
admin satellite/admin: add apikey endpoints 2020-10-20 11:26:56 +00:00
attribution all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
audit all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
compensation all: add missing dots 2020-08-11 17:50:01 +03:00
console all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
contact satellite/{accounting, contact}: Remove periods and spaces from metrics. 2020-12-03 15:33:01 +00:00
dbcleanup satellite/orders: ensure that expired deletion doesn't stall 2020-11-23 14:52:40 +02:00
gc all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
gracefulexit all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
inspector all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
internalpb {satellite,storagenode}/internalpb: use specific package name 2020-10-30 17:31:08 +02:00
mailservice all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
marketingweb all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
metainfo satellite/satellitedb: add ListAllBuckets method 2020-12-10 14:19:27 +00:00
metrics satellite/metainfo: get away from using pb.Pointer in Metainfo Loop 2020-10-27 13:06:47 +00:00
nodeapiversion satellite/nodeapiversion: new table for tracking node api usage 2020-07-09 15:02:25 +00:00
nodeselection all: add missing dots 2020-08-11 17:50:01 +03:00
nodestats all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
orders all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
overlay all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
payments satellite/satellitedb: make limits per default NULL 2020-10-14 20:28:16 +00:00
referrals all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
repair satellite/repair: improve contention for injuredsegments table on CRDB 2020-12-10 09:51:26 -08:00
revocation all: fix dots 2020-07-16 14:58:28 +00:00
rewards all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
satellitedb satellite/repair: improve contention for injuredsegments table on CRDB 2020-12-10 09:51:26 -08:00
snopayout all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
admin.go all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
api.go all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
configlock_test.go all: fix linter complaints 2020-10-13 15:59:01 +03:00
core.go all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
gc.go all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
peer.go all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
repairer.go satellite/repair: Update repair override config to support multiple RS schemes. 2020-11-23 18:01:15 +00:00