Commit Graph

1653 Commits

Author SHA1 Message Date
Kaloyan Raev
5934969dd6 satellite/orders: remove obsolete CreateDeleteOrderLimits method
Not used anywhere.

Change-Id: I878635d2d533ad4b06ba0d07a94908105546cb82
2020-12-22 13:15:11 +02:00
Kaloyan Raev
777f6e583b satellite/metainfo: replace a call to PointerDB with call to Metabase
Change-Id: I10aa89dacf91cb6c1528698031e6b53c52915bd9
2020-12-22 12:44:28 +02:00
Egon Elbre
c4578eb3ec satellite/gc: add test for pending object
Change-Id: Ifb076ab38442f88f94a3e0c2ae1b19528a55f724
2020-12-22 09:42:32 +00:00
Michal Niewrzal
66d4d5eb48 satellite/metainfo/metabase: implement IterateObjectsAllVersions for
pending/committed objects

Change-Id: Ibf390821b6a23919078de4f18c2653e308320904
2020-12-22 10:27:09 +01:00
Michal Niewrzal
dad8360b39 satellite/metainfo/metabase: rename IterateObjectsAllVersions to
IterateObjectsAllVersionsWithStatus

We need different implementation for IterateObjectsAllVersions because
we want to iterate over all object without specifying object status.
Existing method will have new name but implementation details are not
changed.

Change-Id: I01b987996772fa7f8fd73da9910d52db2d1aa0d7
2020-12-21 16:47:32 +00:00
Kaloyan Raev
bafc6af992 ci: remove workaround for failing tests
Change-Id: I3eb673fae6c81bee17d7437cb870d5f5ba6978d5
2020-12-21 18:07:40 +02:00
Egon Elbre
529bae3052 satellite/accounting: fix TestBilling_AuditRepairTraffic
Change-Id: I1743b2ab9e1afeb16ca31338a6b067da88e906ee
2020-12-21 13:18:35 +00:00
Michal Niewrzal
f3ef8088e7 satellite/metainfo/metabase: add Verify method for Pieces
This change adds Verify method for pieces to do some basic checks.

Change-Id: I0ff4313b594d2cb3aad7da545f940e10ee654b77
2020-12-21 12:49:03 +00:00
Michal Niewrzal
18825d1e0b satellite/{metainfo,gracefulexit}: fix failing tests
Change-Id: I3428ea601255c36a316732c9f75135d6e5fa4d79
2020-12-21 12:22:32 +00:00
Kaloyan Raev
4d37d14929 satellite/{metrics,repair}: adjust monitoring to new metainfo loop
Change-Id: I87a2145daa5ed49bb2c08d6967baa09c0b14b4c6
2020-12-21 09:05:17 +02:00
Egon Elbre
4e19839669 satellite/accounting: fix TestProjectUsage_ tests
Change-Id: Id3972ca592c4459cc95668a53daca09f818e34b7
2020-12-18 16:41:02 +02:00
Egon Elbre
65b22be417 satellite/accounting/tally: use metabase
Change-Id: I6d49dc103a18e8a110bfa7775d53a65d208b6c2c
2020-12-18 16:18:03 +02:00
Michal Niewrzal
f7a31308db satellite/repair: enable TestRemoveExpiredSegmentFromQueue test
Change adds ability to set `now` time during test for repair.

Change-Id: Idb8826b7b58b8789b0abc65817b888ecdc752a3f
2020-12-18 10:58:05 +00:00
Michal Niewrzal
b3aa28cc02 satellite/gracefulexit: migrate to metabase
Change-Id: I8be9cc68894124427e4a30d7631126b3afb1f281
2020-12-18 10:57:39 +00:00
Michal Niewrzal
311b082838 satellite/metainfo: fix metainfo loop
This fix issues with passing observers between iteration methods.

It's not best implementation but I think we will need to optimize it
soon one way or another.

Change-Id: I574599bfd10822d84e2d2f1800bcd88e176a76ea
2020-12-18 10:38:12 +00:00
Michal Niewrzal
2111740236 Merge 'master' branch
Change-Id: Ib73af0ff3ce0e9a1547b0b9fc55bf88704f6f394
2020-12-18 09:13:24 +01:00
littleskunk
2437d5b171
satellite/access-grants: default auth service url (#4002)
* satellite/access-grants: default auth service url
2020-12-17 23:38:16 +01:00
paul cannon
d3604a5e90 satellite/repair: use survivability model for segment health
The chief segment health models we've come up with are the "immediate
danger" model and the "survivability" model. The former calculates the
chance of losing a segment becoming lost in the next time period (using
the CDF of the binomial distribution to estimate the chance of x nodes
failing in that period), while the latter estimates the number of
iterations for which a segment can be expected to survive (using the
mean of the negative binomial distribution). The immediate danger model
was a promising one for comparing segment health across segments with
different RS parameters, as it is more precisely what we want to
prevent, but it turns out that practically all segments in production
have infinite health, as the chance of losing segments with any
reasonable estimate of node failure rate is smaller than DBL_EPSILON,
the smallest possible difference from 1.0 representable in a float64
(about 1e-16).

Leaving aside the wisdom of worrying about the repair of segments that
have less than a 1e-16 chance of being lost, we want to be extremely
conservative and proactive in our repair efforts, and the health of the
segments we have been repairing thus far also evaluates to infinity
under the immediate danger model. Thus, we find ourselves reaching for
an alternative.

Dr. Ben saves the day: the survivability model is a reasonably close
approximation of the immediate danger model, and even better, it is
far simpler to calculate and yields manageable values for real-world
segments. The downside to it is that it requires as input an estimate
of the total number of active nodes.

This change replaces the segment health calculation to use the
survivability model, and reinstates the call to SegmentHealth() where it
was reverted. It gets estimates for the total number of active nodes by
leveraging the reliability cache.

Change-Id: Ia5d9b9031b9f6cf0fa7b9005a7011609415527dc
2020-12-17 21:30:17 +00:00
littleskunk
3feee9f4f8
satellite/accounting: default project limits (#4001) 2020-12-17 22:27:05 +01:00
Cameron Ayer
28eaae66af satellite/satellitedb: drop num_healthy_pieces column from injuredsegments
This column is no longer used as it has been replaced by the segment_health
column.

Change-Id: I6b4df89cd4f994d8418976f88e8c5f57615f8115
2020-12-17 20:17:08 +00:00
VitaliiShpital
f4bbd0f5df web/satellite: use brotli instead of gzip
WHAT:
we'll use brotli instead of gzip from now on

WHY:
better compression

Change-Id: Ibeadd6bfc783e9c15cf3f62f719af692071a7721
2020-12-17 19:23:44 +00:00
Michal Niewrzal
70ba4deea9 satellite/repair/checker: adjust irreparable part of repair checker
Change-Id: I0732104a97ba18a5359de3966cd692677a0ff790
2020-12-17 14:11:22 +00:00
Kaloyan Raev
9aa61245d0 satellite/audits: migrate to metabase
Change-Id: I480c941820c5b0bd3af0539d92b548189211acb2
2020-12-17 14:38:48 +02:00
Michal Niewrzal
2381ca2810 Merge 'master' branch
Change-Id: I4a3e45a2a2cdacfd87d16b148cfb4c6671c20b15
2020-12-17 13:17:17 +01:00
Fadila Khadar
c705237beb satellite/inspector: migrate to metabase
Change-Id: Ibfc12bf9bce0f9f065f4859a6818e5c18bbd526a
2020-12-17 11:56:08 +00:00
Michal Niewrzal
8d3ea9c251 satellite/repair/repairer: implement SegmentRepairer with metabase
Change-Id: I647c625e00a626c44e812602ad9bc3e85a7b602c
2020-12-17 10:47:21 +00:00
Kaloyan Raev
6206aa88e4 {satellite/metainfo,private/testplanet} use TestingAllSegments in tests
Change-Id: I8c641a24fabf3ea537312978a42501eab8d6a339
2020-12-17 09:58:31 +00:00
Egon Elbre
1728e45e55 satellite/metainfo/metabase: optimize DeleteBucket
Change-Id: If34cdeae0f688cb96717905fc8287c66ea3034be
2020-12-16 14:39:33 +00:00
Egon Elbre
12055e7864 all: minor cleanups
Change-Id: I4248dbe36a62a223b06135254b32851485a2eec1
2020-12-16 10:47:46 +00:00
Fadila Khadar
724b0f91eb satellite/gc: update tests to use metabase
Change-Id: I13c6c02a46254ea1d7176c0c6045fd24dd117a58
2020-12-16 10:38:24 +00:00
Egon Elbre
4706f01876 satellite/metainfo: add TestingAll{Segments,Objects}
Change-Id: Ia758c119d5ebd7bbb21216a463c99c2e9afcdeb0
2020-12-16 10:21:36 +00:00
Cameron Ayer
8c52bb3a18 satellite/checker: use numHealthy as segment health in repair queue
A few weeks ago it was discovered that the segment health function
was not working as expected with production values. As a bandaid,
we decided to insert the number of healthy pieces into the segment
health column. This should have effectively reverted our means of
prioritizing repair to the previous implementation.

However, it turns out that the bandaid was placed into the code which
removes items from the irreparable db and inserts them into the repair
queue.

This change: insert number of healthy pieces into the repair queue in the
method, RemoteSegment

Change-Id: Iabfc7984df0a928066b69e9aecb6f615253f1ad2
2020-12-15 17:16:59 -05:00
Jeff Wendling
9db28b92e7 satellite/console/wasm: fix issue from merge
there were two changes to this package: one that modified
some things and renamed access.go to main.go, and one that
reduced the binary sizes against main.go.

somehow the latter change was included on this branch but
the former was not, which made the folder contain both
main.go and access.go. building the package then fails
with duplicate definitions.

the easiest fix is to remove access.go which makes the folder
match the contents of the current master branch.

Change-Id: I7070a0d25a0399cef3166b8b2189427bc55587e0
2020-12-15 16:29:04 -05:00
Cameron Ayer
2ac72eaf16 satellite/repair/checker: add new monkit stats tagged with rs scheme
There is a new checker field called statsCollector. This contains
a map of stats pointers where the key is a stringified redundancy
scheme. stats contains all tagged monkit metrics. These metrics exist
under the key name, "tagged_repair_stats", which is tagged with the
name of each metric and a corresponding rs scheme.

As the metainfo observer works on a segment, it checks statsCollector
for a stats corresponding to the segment's redundancy scheme. If one
doesn't exist, it is created and chained to the monkit scope. Now we can call
Observe, Inc, etc on the fields just like before, and they have tags!

durabilityStats has also been renamed to aggregateStats.

At the end of the metainfo loop, we insert the aggregateStats totals into the
corresponding stats fields for metric reporting.

Change-Id: I8aa1918351d246a8ef818b9712ed4cb39d1ea9c6
2020-12-15 14:08:01 +00:00
Kaloyan Raev
4fba9921f6 satellite/metainfo/metabase: define ErrSegmentNotFound error class
This makes it easier to callers of GetSegmentByPosition to determine if
the segment is missing.

Change-Id: I2d8546dddf07dcf790d2f7c08d308ed589b34f2f
2020-12-15 15:48:41 +02:00
Michal Niewrzal
934ae32ca4 satellite/repair/checker: fix checker tests
Change-Id: I63d3368a07b800fdb10bb93b847eb32927b8c0dc
2020-12-15 10:47:42 +00:00
Michal Niewrzal
57f374af24 Merge 'master' branch
Change-Id: Idf6b10ea7ca94e4d232e6a3b6a38ef2e646ba197
2020-12-15 08:26:53 +01:00
Stefan Benten
9fe477899b satellite/satellitedb: add lint ignore rule to support staticcheck 2020.2
staticcheck 2020.2 is not liking our dbx files, so we need to ignore them.

Change-Id: I6becc3619bb088473f9776d0878ce240d4935936
2020-12-14 21:16:31 +00:00
Jessica Grebenschikov
3cc98de3ee satellite/console/wasm: reduce size to <9MB
Make changes so that we only import the necessary files from the console package so that the generated wasm code is as small as possible.

This change gets the compiled wasm code down to 8.6MB uncompressed and 2MB when compressed with `gzip --best`.

https://review.dev.storj.io/c/storj/storj/+/3396

Change-Id: Ifdd4be285810757b46bbbe43327c0d0139e5f8f7
2020-12-14 16:41:39 +00:00
Ivan Fraixedes
2dddcffe43 satellite/accounting/rollout: Remove unused variable
Remove a declared variable that's set by never read nor passed to any
function so it's unused code.

Change-Id: I8daf9d1f71d29ab39d7a80011d1b4813ada1c67d
2020-12-14 14:11:41 +00:00
Kaloyan Raev
fc85179a19 satellite/metainfo: refactor SegmentLocation.Index to SegmentPosition
Change-Id: Ic9403c8126712693326dd83d6ba4f3b84be3e0c7
2020-12-14 13:35:53 +02:00
Kaloyan Raev
7d8f19e94d satellite/metainfo: metainfo loop should yield StreamID for segments
Change-Id: If6c86add75ce79ffcfe95353225719c7d4b5a459
2020-12-14 13:04:38 +02:00
Michal Niewrzal
e7e6985ae9 satellite/metainfo/metabase: add UpdateSegmentPieces method
We need to be able to update just remote_pieces column in DB. This is
needed at least for repair process.

Change-Id: I20dcc9b06babfefbbf102f32b1d14946379f26c2
2020-12-14 10:25:09 +00:00
Kaloyan Raev
2bb010e7c5 cmd: remove segment reaper
It was designed to detect and remove zombie segments in the PointerDB.
This tool should be not relevant with the MetabaseDB anymore.

Change-Id: I112552203b1329a5a659f69a0043eb1f8dadb551
2020-12-14 09:36:37 +00:00
Michal Niewrzal
7e6e0d3e2e satellite/metainfo: metainfo loop implementation with metabase
Change-Id: Iadac469519de605a88e624df23265289771b2006
2020-12-11 16:15:57 +01:00
Brandon Iglesias
ca1e6b9756
Adding Fastly (#3994) 2020-12-11 15:53:05 +02:00
Stefan Benten
8fe829d5fd
build: add wasm bits to Dockerfile and bump to go v1.15.6 (#3992) 2020-12-11 02:23:39 +01:00
Jessica Grebenschikov
0649d2b930 satellite/repair: improve contention for injuredsegments table on CRDB
We migrated satelliteDB off of Postgres and over to CockroachDB (crdb), but there was way too high contention for the injuredsegments table so we had to rollback to Postgres for the repair queue. A couple things contributed to this problem:
1) crdb doesn't support `FOR UPDATE SKIP LOCKED`
2) the original crdb Select query was doing 2 full table scans and not using any indexes
3) the SLC Satellite (where we were doing the migration) was running 48 repair worker processes, each of which run up to 5 goroutines which all are trying to select out of the repair queue and this was causing a ton of contention.

The changes in this PR should help to reduce that contention and improve performance on CRDB.
The changes include:
1) Use an update/set query instead of select/update to capitalize on the new `UPDATE` implicit row locking ability in CRDB.
- Details: As of CRDB v20.2.2, there is implicit row locking with update/set queries (contention reduction and performance gains are described in this blog post: https://www.cockroachlabs.com/blog/when-and-why-to-use-select-for-update-in-cockroachdb/).

2) Remove the `ORDER BY` clause since this was causing a full table scan and also prevented the use of the row locking capability.
- While long term it is very important to `ORDER BY segment_health`, the change here is only suppose to be a temporary bandaid to get us migrated over to CRDB quickly. Since segment_health has been set to infinity for some time now (re: https://review.dev.storj.io/c/storj/storj/+/3224), it seems like it might be ok to continue not making use of this for the short term. However, long term this needs to be fixed with a redesign of the repair workers, possible in the trusted delegated repair design (https://review.dev.storj.io/c/storj/storj/+/2602) or something similar to what is recommended here on how to implement a queue on CRDB https://dev.to/ajwerner/quick-and-easy-exactly-once-distributed-work-queues-using-serializable-transactions-jdp, or migrate to rabbit MQ priority queue or something similar..

This PRs improved query uses the index to avoid full scans and also locks the row its going to update and CRDB retries for us if there are any lock errors.

Change-Id: Id29faad2186627872fbeb0f31536c4f55f860f23
2020-12-10 09:51:26 -08:00
Michal Niewrzal
b3acc1101a Merge 'master' branch
Change-Id: Iee99400c7095770e61cde94b3b2c8eb0ddec463d
2020-12-10 15:42:52 +01:00
Michal Niewrzal
c2a97aeb14 satellite/satellitedb: add ListAllBuckets method
We need to be able to list all buckets in DB without knowing project ID.
This method will be used to list buckets for metainfo loop
implementation based on metabase.

Change-Id: Iac75af0eee4f31e80a15577575a8249cbca787b2
2020-12-10 14:19:27 +00:00