With the new phase 3 order submission, orders can be added to the
storage and bandwidth rollup tables at timestamps before the most recent
rollup was run. This change shifts the start time of each new rollup
window to account for any unexpired orders that might have been added
since the previous rollup.
A satellitedb migration is necessary to allow upserts in the
accounting_rollups table when entries with identical node_ids and
start_times are inserted.
Change-Id: Ib3022081f4d6be60cfec8430b45867ad3c01da63
It turns out we need to make 2 more changes in order for the new order submission phase 3 to get deployed.
This PR makes 2 changes:
1) when the rollup service deletes tallies, we now keep tallies around until orders expire (vs 1 day like before).
2) the reported rollup chore will now write the storagenode_bandwidth_rollups to a new table _phase2 as an intermediary step so it doesn't conflict with phase 3 order settlement.
These changes need to be deployed for 2 days before we can turn on phase 3 of the new orders settlement workflow.
Change-Id: Iafbff577ba7d55f8f17b7db857311b2ce799de60
Doing it at the ProcessOrders level was insufficient: the endpoints
make multiple database calls. It was a misguided attempt to only
have one spot enter the semaphore. By putting it in the endpoint
we can not only be sure that the concurrency is correctly limited
but it can be configurable easily.
Change-Id: I937149dd077adf9eb87fce52a1a17dc0afe96f64
we have thundering herds of order submissions that take all of the
database connections causing temporary periodic outages. limit
the amount of concurrent order processing to 2.
Change-Id: If3f86cdbd21085a4414c2ff17d9ef6d8839a6c2b
Why: We need a way to cut down on database traffic due to bandwidth
measurement and tracking.
What: This changeset is the Satellite side of settling orders in 1 hr windows.
See design doc for more details: https://review.dev.storj.io/c/storj/storj/+/1732
Change-Id: I2e1c151e2e65516ebe1b7f47b7c5f83a3a220b31
What:
Use the github.com/jackc/pgx postgresql driver in place of
github.com/lib/pq.
Why:
github.com/lib/pq has some problems with error handling and context
cancellations (i.e. it might even issue queries or DML statements more
than once! see https://github.com/lib/pq/issues/939). The
github.com/jackx/pgx library appears not to have these problems, and
also appears to be better engineered and implemented (in particular, it
doesn't use "exceptions by panic"). It should also give us some
performance improvements in some cases, and even more so if we can use
it directly instead of going through the database/sql layer.
Change-Id: Ia696d220f340a097dee9550a312d37de14ed2044
This ensures that rows are closed to avoid leaks.
Also verifies that Err() is called, to ensure that no
error is left behind.
Change-Id: Idd1bec9bf479f40021da67b2c80ce83033149469
this way we don't have to do 2 steps, and by using the ctid, postgres
is going to do two very efficient prefix scans.
Change-Id: Ia9d0546cdf0a1af67ceec9cd508d336a5fdcbdb9
also remove the continuation support from the queue, otherwise
we may end up sequential scanning the entire table to get
a few rows at the end.
then, in the core, instead of looping both to get a big enough
batch inside of the queue, as well as outside of it to ensure
we consume the whole queue, just get a single batch at a time.
also, make the queue size configurable because we'll need to
do some tuning in production.
Change-Id: If1a997c6012898056ace89366a847c4cb141a025
by doing an indexed anti-join we're able to reduce the time to
select the pending orders by over 10x on postgres. this should
help us process pending orders much more quickly.
it probably won't do as good a job on cockroach because it does
not do an indexed anti-join and instead does a hash join after
scanning the entire consumed serials table. we should either
remove orders entirely or try to make that more efficient
when necessary.
Change-Id: I8ca0535acd21c51e74955b24c9b86d20e4f2ff9c
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue rather than database driven.
For more details, see the added fast billing changes blueprint.
It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.
Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35
Enhance the documentation of the UseSerialNumber method (interface and
implementation) and add several missing dots in doc comments of the
methods of the same interface and implementation.
Change-Id: I792cd344f0d2542e060fa2ec288b71231cae69de
A uuid.UUID is an array of bytes, and slicing it refers to the
underlying value, much like taking the address. Because range
in Go reuses the same value for every loop iteration, this means
that later iterations would overwrite earlier stored project
ids. We fix that by making a copy of the value before slicing it
for every loop iteration.
Change-Id: Iae3f11138d11a176ce360bd5af2244307c74fdad
Since incoming times may be in any time zone, and we want the output
to be in UTC and for them to have 00:00:00 hours, minutes and seconds
we first convert the incoming timestamp to UTC before doing the
truncate to the day and adding a day.
Because the old code always returned a timestamp that was in the
future, this is just for efficiency.
Change-Id: Ie692d47bca8691e73852c822d5c56cf8773d99b4
warning: databases migrated to version 77 before this commit
is merged must be manually re-migrated. this should not be a
problem for anything but staging databases.
Change-Id: Ie1631c48379472352014183ee43f1465e22200f7
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.
the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.
after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.
since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.
Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
everyone was importing it as dbx anyway. why should it be
named satellitedb? so yeah just pass the "-p dbx" flag.
Change-Id: I5efa669f4f00f196b38a9acd0d402009475a936f
This reverts commit 8e242cd012.
Revert because lib/pq has known issues with context cancellation.
These issues need to be resolved before these changes can be merged.
Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555
this will allow for some nice runtime analysis down the road.
also, this allows for wrapping database handles in a way that
can interact with these contexts
requires https://review.dev.storj.io/c/storj/dbx/+/514
Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b
When an uplink requests an upload or download from the satellite we are trackig the
allocated bandwidth twice. The value in bucket_bandwidth_rollups is used
for project limits but the value in storagenode_bandwidth_rollups is not
used at all. We can increase the performance by removing it. Uplinks
will get a faster response from the satellite.
Change-Id: Icccd41f94107ef34668f30f99bf5f728c384b07e
any database error doesn't mean the order wasn't found. for example
in cockroach it may say that the transaction is aborted. then what?
maybe we get big old row level deadlocks like we've observed? so
instead explicitly check for ErrNoRows to reject the order and bail
out otherwise. the surrounding logic will give it a retry.
Change-Id: I6e1f8f6e6a6def3e45b44f5088cbdc158e1098e4
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The WithTx
helper functions in dbutil and dbx do that automatically. I am changing
this code to use those helpers instead.
Change-Id: Iaf492af35471931125f2b7365aa4338f44154881
Backstory: I needed a better way to pass around information about the
underlying driver and implementation to all the various db-using things
in satellitedb (at least until some new "cockroach driver" support makes
it to DBX). After hitting a few dead ends, I decided I wanted to have a
type that could act like a *dbx.DB but which would also carry
information about the implementation, etc. Then I could pass around that
type to all the things in satellitedb that previously wanted *dbx.DB.
But then I realized that *satellitedb.DB was, essentially, exactly that
already.
One thing that might have kept *satellitedb.DB from being directly
usable was that embedding a *dbx.DB inside it would make a lot of dbx
methods publicly available on a *satellitedb.DB instance that previously
were nicely encapsulated and hidden. But after a quick look, I realized
that _nothing_ outside of satellite/satellitedb even needs to use
satellitedb.DB at all. It didn't even need to be exported, except for
some trivially-replaceable code in migrate_postgres_test.go. And once
I made it unexported, any concerns about exposing new methods on it were
entirely moot.
So I have here changed the exported *satellitedb.DB type into the
unexported *satellitedb.satelliteDB type, and I have changed all the
places here that wanted raw dbx.DB handles to use this new type instead.
Now they can just take a gander at the implementation member on it and
know all they need to know about the underlying database.
This will make it possible for some other pending code here to
differentiate between postgres and cockroach backends.
Change-Id: I27af99f8ae23b50782333da5277b553b34634edc
* satellite/satellitedb: Always release savepoint
Release the savepoint when processing orders in any case.
* satellite/satellitedb: Wrap errors exec savepoints
Wrap the errors returned by the execution of savepoints operations when
processing orders.
* V3-2529: Add DB savepoint to fix issue with postgres. Add test force a rejected order
Co-Authored-By: Ivan Fraixedes <ivan@fraixed.es>
* Update satellite/satellitedb/orders.go
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
* fix orderdDB methods to take correct args
* update tally to save projectID in correct format
* update var names in splitBucket test
* changes per CR comments