this way we don't have to do 2 steps, and by using the ctid, postgres
is going to do two very efficient prefix scans.
Change-Id: Ia9d0546cdf0a1af67ceec9cd508d336a5fdcbdb9
also remove the continuation support from the queue, otherwise
we may end up sequential scanning the entire table to get
a few rows at the end.
then, in the core, instead of looping both to get a big enough
batch inside of the queue, as well as outside of it to ensure
we consume the whole queue, just get a single batch at a time.
also, make the queue size configurable because we'll need to
do some tuning in production.
Change-Id: If1a997c6012898056ace89366a847c4cb141a025
In jaeger, it shows that this function gets called repetitively in
a single request. Most of the time, it's less than 1ms. Therefore, it
doesn't add much value in our trace but create noises.
Change-Id: I20234f36bbcf0fc22f91e5e1a5634c0cad577ed0
struct
This change is removing ProjectID from code. Next change will be about
dropping this colum from DB table.
Change-Id: Idb949e2829e2c304a2b6b011259c7cc7667082e1
the initial calculations for the historical values of comp_at_rest
were wrong. because our historical data only included total amounts
as well as compensation for bandwidth, the at rest value was
calculated as
at_rest = total - bandwidth
unfortunately, that calculation did not take surge pricing into
account correctly. the at rest and bandwidth values do not
include surge pricing, but the total that was used did. so what
we actually calculated was
no_surge_at_rest = surge_total - no_surge_bandwidth
which will create a value that is too large. this migration
fixes the calculation for imports that are old enough and
of a non-negligable difference.
Change-Id: I61eb0b670510f6d7fb8fc3de39ba79150fac10eb
* add monkit stat new_remote_segments_needing_repair, which reports the
number of new unhealthy segments in the repair queue since the previous
checker iteration
Change-Id: I2f10266006fdd6406ece50f4759b91382059dcc3
This attempts to add a README.md to help create consistent migrations
that maximize our test coverage and do not include unnecessary
statements.
It also adds a feature to have an `-- OLD DATA --` section as well
as a `-- NEW DATA --` section so that we can fix mistakes made in
previous snapshots (like a row that was forgotten to be added when a
table was created) without editing them going forward.
Change-Id: I28a786f8ef163cae1de1bb08f61af1e1104b0a88
What: As soon as a node passes the vetting criteria (total_audit_count and total_uptime_count
are greater than the configured thresholds), we set vetted_at to the current timestamp.
Why: We may want to use this timestamp in future development to select new vs vetted nodes.
It also allows flexibility in node vetting experiments and allows for better metrics around
vetting times.
Please describe the tests: satellitedb_test: TestUpdateStats and TestBatchUpdateStats make sure vetted_at is set appropriately
Please describe the performance impact: This change does add extra logic to BatchUpdateStats and UpdateStats and
commits another variable to the db (vetted_at), but this should be negligible.
Change-Id: I3de804549b5f1bc359da4935bc859758ceac261d
To avoid including multiple months in a single invoice, we need all
inspector's invoice commands to run in for specific period.
See https://storjlabs.atlassian.net/browse/USR-725
Change-Id: I3637dc189234f02350daca8d897c21765762ea55
There is a subtle problem when one does a cast with `::date`. Observe:
teststorj=# set timezone = 'US/Eastern';
SET
teststorj=# select (timestamp with time zone '2020-02-01 00:00:00+00')::date;
date
------------
2020-01-31
(1 row)
teststorj=# set timezone = 'UTC';
SET
teststorj=# select (timestamp with time zone '2020-02-01 00:00:00+00')::date;
date
------------
2020-02-01
(1 row)
In order to correctly determine the date a timestamp is in, one has to
explicitly pick the time zone that the date truncation should use
otherwise postgres will use whatever setting the client has. These
tests were failing for me locally, because I run my postgres in
the US/Eastern time zone to try to tickle these bugs out. So it
should be `(x at time zone 'UTC')::date` instead of just `x::date`.
Change-Id: I4e9e32d4b53abc6165a4d0474f4702f8b9f801c7
Add a flag that allows us to easily switch disqualification from
suspension mode on or off. A node will only be disqualified from
suspension mode if it has been suspended for longer than the grace
period AND the SuspensionDQEnabled flag is true.
Change-Id: I9e67caa727183cd52ab2042b0a370a1bcaebe792
CreateTables hasn't been quite true for a while now, rename to
MigrateToLatest to be clearer in it's behavior.
Change-Id: Ida48e95122a5d9b7a814e922d3698e00024a2ba7
The UpdateAddress method use to be used when storage node's checked in with the Satellite, but once the contact service was created this method was no longer used. This PR finally removes it.
Change-Id: Ib3f83c8003269671d97d54f21ee69665fa663f24
Sometimes nodes who have gracefully exited will still be holding pieces
according to the satellite. This has some unintended side effects
currently, such as nodes getting disqualified after having successfully
exited.
* When the audit reporter attempts to update node stats, do not update
stats (alpha, beta, suspension, disqualification) if the node has
finished graceful exit (audit/reporter_test.go TestGracefullyExitedNotUpdated)
* Treat gracefully exited nodes as "not reputable" so that the repairer
and checker do not count them as healthy (overlay/statdb_test.go
TestKnownUnreliableOrOffline, repair/repair_test.go
TestRepairGracefullyExited)
Change-Id: I1920d60dd35de5b2385a9b06989397628a2f1272
Currently Cockroach isn't performant for concurrent database setup and
tear-down. Instead of a single instance allow setting multiple potential
connection strings and let the tests pick one connection string
randomly.
This improves test duration by ~10 minutes.
While we are at significantly changing how pgtest works, introduce
helper PickPostgres and PickCockroach for selecting the database to
reduce code duplications in multiple places.
Change-Id: I8ad171d5c4c8a4fc081ec2ae9bdd0cc948a80619
In cases like the segment reaper script connecting to the metainfodb,
we don't want a db migration to happen automatically when we call
metainfo.NewStore. This adds MigrateToLatest method for postgreskv
and cockroackv, and calls MigrateToLatest in places where NewStore used
to create tables.
Change-Id: I682d0f26d609af0601dfdb32a24866cdf5d32a7e
A/B indicates that B is a subtest of A, however in this case they
represent a configuration of the test, not a subtest.
Change-Id: I64eed5d5bcb12759e54fe4b5373f8e88488e50f7
Update unknown_audit_reputation_alpha and unknown_audit_reputation_beta.
Add test to verify that BatchUpdateStats properly modifies unknown audit
alpha/beta
Change-Id: I0d5f9cac96a99f64905cf575b772402db0756a9d
If a node is suspended and receives an unknown or failing audit,
disqualify them if the grace period (default 1w in production) has
passed.
Migrate the nodes table so any node that is currently suspended gets
unsuspended when the satellite starts up.
Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82
This reverts commit 105dc7acc6.
Reason for revert: Recent changes to the Postgres query plan seems to want to use this index now. Reverting until we have time to analyze what's happening.
Change-Id: I74b4b5a8f15c3850d8a958a29f51dbc80e7c282c
Alpha=1 and beta=0 are the expected first values for any alpha/beta
reputation system we are using in the codebase. So we are removing the
configurability of these values.
Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3
Whenever the node's reputation is updated, if its unknown audit
reputation is below the suspension threshold, its suspension field
is set to the current time. This could overwrite the previous
"suspendedAt" value resulting a node that never reaches the end of
its suspension.
Also log whenever a node is disqualified or its suspension status
changes
Change-Id: I5e8c8f1c46f66d79cb279b5b16a84fe03f533deb
Reduce the number of non-methods to reduce funcs in the namespace also
combine a func to slightly condense the code more.
Change-Id: Ifbe728eb8c8ca4c981df648decd259c2097b6b40
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.
Replace uses of dbutil.BytesToUUID with uuid.FromBytes.
Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe