Commit Graph

130 Commits

Author SHA1 Message Date
Egon Elbre
0e3be60b79 satellite/satellitedb: simplify migrate step
Change-Id: Ie4574144fb6ddd057d5fca740702c59fbdb2c5e4
2020-05-12 18:27:07 +03:00
Stefan Benten
e23bd806b4 satellite/accounting: separate usage and bandwidth limit (#3878)
2020-05-12 15:01:15 +02:00
Ethan
acf53bea4d satellite/orders;accounting: Add monthly project download bandwidth rollup
See https://storjlabs.atlassian.net/browse/SM-776

Change-Id: Ifd5cccea43c556fd59822d17344f399cfe9a7164
2020-05-04 15:49:57 +00:00
Egon Elbre
8928399d02 all: rename CreateTables to MigrateToLatest
CreateTables hasn't been quite true for a while now, rename to
MigrateToLatest to be clearer in its behavior.

Change-Id: Ida48e95122a5d9b7a814e922d3698e00024a2ba7
2020-04-30 07:21:17 +00:00
Ivan Fraixedes
03871d17c3 satellite/satellitedb: Update ticket ref
Update a reference to a ticket in a code comment.

Change-Id: Ib82220e94527482c5ca1a58d8614b919d1113ab5
2020-04-27 08:50:41 +00:00
Moby von Briesen
72b93f3120 satellite/satellitedb: disqualify suspended nodes when the grace period passes
If a node is suspended and receives an unknown or failing audit,
disqualify them if the grace period (default 1w in production) has
passed.

Migrate the nodes table so any node that is currently suspended gets
unsuspended when the satellite starts up.

Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82
2020-04-22 15:45:00 -04:00
Ethan Adams
60e07f0a8b Revert "satellite/accounting: Remove unnecessary index bucket_bandwidth_rollups_project_id_action_interval_index"
This reverts commit 105dc7acc6.

Reason for revert: Recent changes to the Postgres query plan seem to want to use this index now. Reverting until we have time to analyze what's happening.

Change-Id: I74b4b5a8f15c3850d8a958a29f51dbc80e7c282c
2020-04-22 14:49:04 +00:00
Ethan
105dc7acc6 satellite/accounting: Remove unnecessary index bucket_bandwidth_rollups_project_id_action_interval_index
See https://storjlabs.atlassian.net/browse/SM-738

Change-Id: I9ba3cc3fbff9f13fc0b95d25feee5a19e5a5c486
2020-04-21 16:43:09 +00:00
Ethan
4cd86ff780 satellite/accounting: Add index on bucket_bandwidth_rollups for action, interval_start, and project_id
See https://storjlabs.atlassian.net/browse/SM-551 for details

Change-Id: I104c4e87d5aef500cc4a3893817763808f76c484
2020-04-17 19:14:45 +00:00
Jeff Wendling
2ded64ba2c satellite/compensation: more fixes to get prod running smoothly
Change-Id: I13a76d9d49222fb10796415a015f224d4084fde3
2020-04-07 10:10:27 +00:00
Jennifer Johnson
1547e791a3 satellitedb: remove free_bandwidth column from nodes table
Change-Id: I9d1d3de9216c6533c1042ef473631721a011d086
2020-04-06 09:30:28 +00:00
Jennifer Johnson
d77f3b8786 satellitedb/migrate: set vetted_at backfill to now.day
Change-Id: Ib2b12be43dbd3f3705b1891bc703ae15abb75e09
2020-03-30 16:50:23 +00:00
Ethan
df462d7265 satellite/accounting: Add index on bucket_bandwidth_rollups to minimize full table scans
https://storjlabs.atlassian.net/browse/SM-545

Change-Id: I5599a72a991d70236f17beca027e9bc032777177
2020-03-26 19:53:50 +00:00
Jessica Grebenschikov
aeab599d21 satellitedb: removed unused id on storagenode_storage_tallies table, add index on node_id
The goal of this change is to improve the storagenode_storage_tallies table by removing the unneeded id column that is not being used but only taking up space, and also to add an index on a different column that needs it. Removing and adding a column seems simple, but ended up being more complicated because of some cockroachdb limitations.

The cockroachdb limitations when trying to remove a column from a table and create a new primary key are:
1. only allows primary key creation at table creation time (docs: https://www.cockroachlabs.com/docs/stable/primary-key.html)
2. table drop or rename is performed async and cannot be done in a transaction (issue: https://github.com/cockroachdb/cockroach/issues/12123, https://github.com/cockroachdb/cockroach/issues/22868)

To address these differences between cockroachdb and Postgres, this PR performs different migrations for the two databases. The Postgres migration is straightforward and what you would expect, but the cockroach migration has two main changes:
1. To change a primary key, use the recommended process from the cockroachdb docs to create a new table with the new primary key you want and then migrate the data.
2. In order to do 1, we needed to do the new table renaming in a separate transaction from the data migration.

Ref: SM-65

Change-Id: Idc9aee3ab57aa4d5570e3d2980afea853cd966bf
2020-03-20 14:39:44 -07:00
Jennifer Johnson
9b78473c0c satellitedb: adds vetted_at nullable timestamp to nodes table
Change-Id: I42d5a396b4eecbad26b683c6aee51e043d2ff034
2020-03-20 01:37:28 +00:00
Jeff Wendling
41887883f3 satellite/satellitedb: check indexes on migration
Change-Id: I5ba7ae2b512d77c70405ce332158f12128e27eed
2020-03-13 10:45:22 +00:00
Jessica Grebenschikov
803e2930f4 satellite: use IP for all uplink operations, use hostname for audit and repairs
My understanding is that the nodes table has the following fields:
- `address` field which can be a hostname or an IP
- `last_net` field that is the /24 subnet of the IP resolved from the address

This PR does the following:
1) add back the `last_ip` field to the nodes table
2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time.
3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date
4) try to reduce confusion about hostname, ip, subnet, and address in the code base

Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac
2020-03-11 09:11:40 -07:00
Moby von Briesen
1baf1bd249 satellite/satellitedb: Add index on num_healthy_pieces column in injuredsegments table
We missed this in the migration that added the num_healthy_pieces
column. It exists in dbx, but not on the actual satellite table.

Change-Id: If16b5ec2325d56406250298531b3285215188bf3
2020-03-10 16:59:35 +00:00
Bill Thorp
e99e675fb1 satellite/satellitedb: use time zones with all timestamps
The migration was broken into one migration per table to reduce table locking and reduce the
chances of failure due to SQL timeouts.

Of the 14 fields that lacked time zones, only the 3 named 'interval_start' seemed to have non-UTC data in them.
These fields are fixed in the migration by removing the +00 and adding AT TIME ZONE current_setting('TIMEZONE')
Fields with good data are migrated by adding AT TIME ZONE 'UTC'

Note that postgres's timezone() is different than cockroach's timezone() so AT TIME ZONE is used.

https://storjlabs.atlassian.net/browse/SM-104

Change-Id: I410f2f1d7c11b143f17844347f37e6f4b1e70fce
2020-03-05 21:11:25 +00:00
Moby von Briesen
f495544c56 satellite/satellitedb/dbx: add fields to node table for placing nodes into suspended mode for too many unknown-error audits
Change-Id: Iac9a619e5c08377de87ffdf4acdd0155027f5eb3
2020-03-03 03:30:59 +00:00
Jeff Wendling
1db087cfba satellite/satellitedb: migration to create tables for compensation
these tables are used in future commits with respect to the new
storagenode payments code. if we create them now, it will make
backfilling them with historical data easier.

Change-Id: I3c08c9770ec5b2baa38b4f2fd18c2f07746a61c2
2020-02-27 17:34:50 +00:00
Moby von Briesen
4e5a7f13c7 satellite/repair/queue: Prioritize selection of items off repair queue by segment health
Add a column to the repair queue table in the satellite db for healthy
piece count. When an item is selected from the repair queue, the least
durable segment that has not been attempted in the past hour should be
selected first. This prevents our repairer from getting stuck doing work
on segments that are close to the repair threshold while allowing
segments that are more unhealthy to degrade further.

The migration also clears the repair queue so that the migration runs
quickly and we can properly account for segment health in future repair
work.

We do not select items off the repair queue that have been attempted in
the past six hours. This was changed from one hour to allow us time to
try a wider variety of segments when the repair queue is very large.

Change-Id: Iaf183f1e5fd45cd792a52e3563a3e43a2b9f410b
2020-02-26 09:54:16 -05:00
Jeff Wendling
f671eb2beb satellite/satellitedb: use queue for orders to get back fast billing
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue rather than database driven.

For more details, see the added fast billing changes blueprint.

It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.

Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35
2020-02-24 17:07:07 +00:00
Qweder93
dc075eaa96 satellite/payments : deposit bonuses (credits) added
Change-Id: Ib151bbb9b02d655fa619c53bfbc04ed6f3bb39e0
2020-02-11 11:11:42 +00:00
Jeff Wendling
d20db90cff private/dbutil/txutil: create new transactions for retries
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.

this behavior was problematic because we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:

    1. bound the amount of time we will retry a transaction
    2. create new transactions when a retry is needed

the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's more preferable to
find a way to make them smaller.

the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.

between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.

Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
2020-02-01 18:34:28 +00:00
littleskunk
90cf78e6f2 satellite/coinpayments: fix migration
The old migration was not working. It was updating pending (status 0)
and failed (status -1) to completed (status 100).

Change-Id: I808ff3cc692fe6c698ce26a8b411b134e67b752b
2020-01-25 00:12:35 +00:00
Egon Elbre
fc2766eefc private/testplanet: flatten migration for running tests
Currently Cockroach DB setup takes a significant amount of time.
This flattens the database setup into a single query,
which improves the test time significantly.

The migration tests still test each migration separately.

Change-Id: Iaca16f34a6af3926fa2b5ebf618f939fd59460b3
2020-01-22 15:09:11 +00:00
Ethan
21a5d70a83 satellite/metainfo: Rate limiting - API requests
Limits how many times metainfo APIs can be called per second by project ID. If limit is exceeded, the API will return Unauthorized/Too Many requests.

Limit per second and the size of the limiter cache per project are configurable, as well as whether the limiter is enabled.

Tests added/updated for the new rate_limit field in projects table.
Tests added for exceeding limits and disabling limiter.

Change-Id: Ic8ad102de3b690a475809d4f684156d5715f20fa
2020-01-21 14:25:04 +00:00
Egon Elbre
1abfe42142 satellite: use tagsql
Change-Id: I2170dee409fb0c2fe85913ddd36e7811a3b853ed
2020-01-19 14:39:16 +02:00
Egon Elbre
59d06644b9 private/migrate: switch to tagsql
Also added temporary types withRebind and withTagTx,
which will be later removed. Currently they help to avoid
changing the whole codebase at the same time.

Change-Id: I7f07ba8f4709a23a463bfa67464628665a05808f
2020-01-19 14:39:16 +02:00
Yaroslav
d8368d0b30 satellite/payments: coinpayments add completed status, treat received status as pending, add balance for completed transactions only
Change-Id: I20494bdddfda6d4f37ba2c5b6f7955cd29a6d798
2020-01-17 17:26:34 +00:00
Isaac Hess
cd48dc369a satellite/satellitedb: Remove unused indexes
Change-Id: I875b94574eacf9d2df537bcf1f42f30e0bf60ab9
2020-01-16 16:06:21 -07:00
Jeff Wendling
696d98a232 satellite/satellitedb: fix nitpicks and timestamp issue found in review
warning: databases migrated to version 77 before this commit
is merged must be manually re-migrated. this should not be a
problem for anything but staging databases.

Change-Id: Ie1631c48379472352014183ee43f1465e22200f7
2020-01-16 21:22:38 +00:00
Jeff Wendling
78c6d5bb32 satellite/satellitedb: reported_serials table for processing orders
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.

the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.

after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.

since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.

Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
2020-01-15 19:21:21 -07:00
Egon Elbre
64fb2d3d2f Revert "dbutil: statically require all databases accesses to use contexts"
This reverts commit 8e242cd012.

Revert because lib/pq has known issues with context cancellation.
These issues need to be resolved before these changes can be merged.

Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555
2020-01-15 07:28:00 +00:00
JT Olio
8e242cd012 dbutil: statically require all databases accesses to use contexts
this will allow for some nice runtime analysis down the road.
also, this allows for wrapping database handles in a way that
can interact with these contexts

requires https://review.dev.storj.io/c/storj/dbx/+/514

Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b
2020-01-14 18:20:47 -05:00
crawter
a57ce18f58 satellite/payments: coupons, coupons usage, invoice generation with pricing model applied
Change-Id: Ic5d5a2fc116388647efe46896cfccc2038c77537
2020-01-14 12:45:00 +00:00
Egon Elbre
ff267168c5 private/migrate: add ctx argument
Change-Id: I3d65912d89261386413c494c7ed1576fed4dcaf4
2020-01-13 15:52:26 +02:00
Egon Elbre
24958bd7d3 satellite: add ctx to DB.CreateTables
Change-Id: I9ecad624cf5a7fc9c86bb91c68f96a3a4efd2e92
2020-01-13 15:31:09 +02:00
Egon Elbre
0835b9024c private/dbutil/pgutil: add ctx argument
Change-Id: Icfd56ca8c1f831ad56c0195a0b883e8f0618daaf
2020-01-13 15:27:06 +02:00
littleskunk
bcc23f6869 Satellite/orders: remove allocated bandwidth from storagenode_bandwidth_rollups
When an uplink requests an upload or download from the satellite we are tracking the
allocated bandwidth twice. The value in bucket_bandwidth_rollups is used
for project limits but the value in storagenode_bandwidth_rollups is not
used at all. We can increase the performance by removing it. Uplinks
will get a faster response from the satellite.

Change-Id: Icccd41f94107ef34668f30f99bf5f728c384b07e
2020-01-12 16:20:47 +01:00
Moby von Briesen
bb3baf5a4e satellite/satellitedb: Add nodes_offline_times table for downtime tracking
Change-Id: If6b80fe0a20d88cedacaf4b76b75aa21d0af2465
2019-12-30 15:45:02 -05:00
paul cannon
b5ddfc6fa5 satellite/satellitedb: unexport satellitedb.DB
Backstory: I needed a better way to pass around information about the
underlying driver and implementation to all the various db-using things
in satellitedb (at least until some new "cockroach driver" support makes
it to DBX). After hitting a few dead ends, I decided I wanted to have a
type that could act like a *dbx.DB but which would also carry
information about the implementation, etc. Then I could pass around that
type to all the things in satellitedb that previously wanted *dbx.DB.

But then I realized that *satellitedb.DB was, essentially, exactly that
already.

One thing that might have kept *satellitedb.DB from being directly
usable was that embedding a *dbx.DB inside it would make a lot of dbx
methods publicly available on a *satellitedb.DB instance that previously
were nicely encapsulated and hidden. But after a quick look, I realized
that _nothing_ outside of satellite/satellitedb even needs to use
satellitedb.DB at all. It didn't even need to be exported, except for
some trivially-replaceable code in migrate_postgres_test.go. And once
I made it unexported, any concerns about exposing new methods on it were
entirely moot.

So I have here changed the exported *satellitedb.DB type into the
unexported *satellitedb.satelliteDB type, and I have changed all the
places here that wanted raw dbx.DB handles to use this new type instead.
Now they can just take a gander at the implementation member on it and
know all they need to know about the underlying database.

This will make it possible for some other pending code here to
differentiate between postgres and cockroach backends.

Change-Id: I27af99f8ae23b50782333da5277b553b34634edc
2019-12-16 19:09:30 +00:00
Jessica Grebenschikov
c5116cb2a0 satellitedb: fix migration cockroach test
Change-Id: Ie3b4a4b0795d156238d50a58078282cc0918a334
2019-12-16 18:02:31 +00:00
Yaroslav Vorobiov
8cf1aa6e4f satellite/accounting: fix project limits migration (#3717)
2019-12-10 18:12:49 +02:00
Jeff Wendling
48da8baab5 storj-sim: work with cockroach:// urls for satellite databases
for storj-sim to work, we need to avoid schemas in cockroach urls
so we have storj-sim create namespaced databases instead of schemas
and we have the migrate command create the database in the same way
that it would create a schema for postgres. then it works!

a follow up commit will move the creation of the database/schemas
into storj-sim's setup step so that we can avoid doing these icky
creations during normal migration calls. it will also make the
pointerdb have an explicit call to migrate instead of just doing
it every time it's opened.

Change-Id: If69ef5cb96b6866b0438c761bd445afb3597ae5f
2019-12-09 23:44:00 +00:00
Natalie Villasana
c3c02bec3c satellite/satellitedb: reset storage node reputations to re-enable disqualification (#3693)
2019-12-09 12:04:00 -05:00
Egon Elbre
56a3b62bef satellite/satellitedb: ensure migration tests run (#3706)
satellitedb migration tests ran against multiple base versions, however after merging all the steps the base versions didn't exist anymore - which meant none of the migration tests were actually running.
2019-12-09 09:26:58 -06:00
paul cannon
378b863b2b private,satellite: unite all the "temp db schema" things
first, so that they all work the same way, because it's getting
complicated, and second, so that we can do the appropriate thing
instead of CREATE SCHEMA for cockroachdb.

Change-Id: I27fbaeeb6223a3e06d97bcf692a2d014b31465f7
2019-12-05 15:36:59 +00:00
Yehor Butko
756b9b9e2b satellite/payments: coupons and coupon usage (#3648)
2019-11-26 19:58:51 +02:00