Commit Graph

75 Commits

Author SHA1 Message Date
Fadila Khadar
fb0d055a41 satellites/orders: populate egress_dead in project_bandwidth_daily_rollups
Populate the egress_dead column for taking into account allocated bandwidth that can be removed because orders have been sent by the storage nodes. The bandwidth not used in these orders can be allocated again.

Change-Id: I78c333a03945cd7330aec052edd3562ec671118e
2021-10-06 16:54:49 +00:00
Jeff Wendling
8a6efa1f58 satellite/orders: query for node first before upsert/replace
the very common case is that the node api version is
indeed at least the requested version, so query for
that first to avoid write traffic.

Change-Id: Ib047d93078205bc07fee75d1f635503b792307f0
2021-06-22 15:16:12 -04:00
Egon Elbre
961e841bd7 all: fix error naming
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:

    ERROR node stats service error: satellitedbs error: node stats database error: no rows

Whereas something like:

    ERROR nodestats service: satellitedbs: nodestatsdb: no rows

Would contain all the necessary information without the stutter.

Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
2021-04-29 15:38:21 +03:00
Egon Elbre
267506bb20 satellite/metabase: move package one level higher
metabase has become a central concept and it's more suitable for it to
be directly nested under satellite rather than being part of metainfo.

metainfo is going to be the "endpoint" logic for handling requests.

Change-Id: I53770d6761ac1e9a1283b5aa68f471b21e784198
2021-04-21 15:54:22 +03:00
Egon Elbre
86e698f572 pb: use *UnimplementedServer to avoid breaking API changes
Change-Id: I99a34eeb37ac4453411f273511710562a519f57a
2021-03-29 12:26:10 +03:00
Ivan Fraixedes
9c9f481469 satellite/orders: Remove deprecated endpoint
Remove the orders Settlement endpoint because it isn't used and it was
already always returning an error.

Change-Id: I81486fbe7044a1444182173bc0693698ee7cfe7e
2021-02-03 23:47:07 +00:00
Ivan Fraixedes
d93944c57b satellite/orders: Delete unused methods & DB tables
Delete satellite order methods and DB tables which aren't used anymore
after we have done a refactoring on the orders to stuck bucket
information in the orders' encrypted metadata.

There are also configuration parameters and a satellite chore that
aren't needed anymore after the orders refactoring.

Change-Id: Ida3682b95921df70792284b42c96d2508bf8ca9c
2021-02-01 18:01:29 +00:00
Egon Elbre
51731db121 satellite/orders: use smaller encrypted metadata
Avoid using project uuid string representation, because
it uses more bandwidth.

This reduces the encrypted metadata size from 118 -> 97 bytes.

Change-Id: Ic53a81b83acc065f24f28cd404f9c0b1fe592594
2021-01-08 16:40:31 +00:00
Jessica Grebenschikov
da0327c9b7 satellite/dbcleanup: remove expired serial chore
Change-Id: Ib71d41eb6679d6435e5bc10b6244dac66380a74e
2020-12-18 09:36:28 -08:00
Jessica Grebenschikov
97a5e6c814 satellite/orders: stop inserting/reading from serial_numbers table
This PR contains the minimum changes needed to stop inserting into the serial_numbers table. This is the first step in completely deprecating that table.
The next step is to create another PR to remove the expiredSerial chore, fix more tests, and remove any other methods on the serial_number table.

Change-Id: I5f12a56ebf3fa4d1a1976141d2911f25a98d2cc3
2020-12-18 08:35:13 -08:00
Stefan Benten
494bd5db81
all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
Jessica Grebenschikov
b261110352 satellite/orders: get bucketID from encrypted metadata in order instead of serial_numbers table
We want to stop using the serial_numbers table in satelliteDB. One of the last places using the serial_numbers table is when storagenodes settle orders, we look up the bucket name and project ID from the serial number from the serial_numbers table.

Now that we have support to add encrypted metadata into the OrderLimit, this PR makes use of that and now attempts to read the project ID and bucket name from the encrypted orderLimit metadata instead of from the serial_numbers table. For backwards compatibility and to ensure no errors, we will still fallback to the old way of getting that info from the serial_numbers table, but this will be removed in the next release as long as there are no errors.

All processes that create orderLimits must have an orders.encryption-keys set. The services that create orderLimits (and thus need to encrypt the order metadata) are the satellite apiProcess, the repair process, audit service (core process), and graceful exit (core process). Only the satellite api process decrypts the order metadata when storagenodes settle orders. This means that the same encryption key needs to be provided in the config for the satellite api process, repair process, and the core process like so:
orders.include-encrypted-metadata=true
orders.encryption-keys="<"encryptionKeyID>=<encryptionKey>"

Change-Id: Ie2c037971713d6fbf69d697bfad7f8b672eedd66
2020-12-01 15:29:32 +00:00
Egon Elbre
55d5e1fd7d satellite/orders: ensure that expired deletion doesn't stall
Add checks to ensure that when somebody uses empty options, the deletion
doesn't loop infinitely.

Change-Id: I1738fb1e7e1f8efbbb954c491cb6489f7bcdc2db
2020-11-23 14:52:40 +02:00
Ethan
2b92bba563 satellite/satellitedb/orders: Handle serial_numbers deletes in smaller increments on CRDB
CRDB doesn't like large deletes. While testing in the POC environment we found that deletes on the serial_numbers table could take hours.  This change limits deletes to 1000 at a time (configurable) to avoid blocking other queries.

Change-Id: I08455e25db1574579dd4d7b7125a08e9c913dff1
2020-11-20 13:44:52 +00:00
Moby von Briesen
a8b66dce17 satellite/accounting: account for old orders that can be submitted in satellite rollup
With the new phase 3 order submission, orders can be added to the
storage and bandwidth rollup tables at timestamps before the most recent
rollup was run. This change shifts the start time of each new rollup
window to account for any unexpired orders that might have been added
since the previous rollup.

A satellitedb migration is necessary to allow upserts in the
accounting_rollups table when entries with identical node_ids and
start_times are inserted.

Change-Id: Ib3022081f4d6be60cfec8430b45867ad3c01da63
2020-11-18 14:46:00 -05:00
Jessica Grebenschikov
f558cc825e satellite/orders: add storagenode_bw_phase2 table and dont delete tallies for longer
It turns out we need to make 2 more changes in order for the new order submission phase 3 to get deployed.

This PR makes 2 changes:
1) when the rollup service deletes tallies, we now keep tallies around until orders expire (vs 1 day like before).
2) the reported rollup chore will now write the storagenode_bandwidth_rollups to a new table _phase2 as an intermediary step so it doesn't conflict with phase 3 order settlement.

These changes need to be deployed for 2 days before we can turn on phase 3 of the new orders settlement workflow.

Change-Id: Iafbff577ba7d55f8f17b7db857311b2ce799de60
2020-11-13 17:15:24 +00:00
Jessica Grebenschikov
f5880f6833 satellite/orders: rollout phase3 of SettlementWithWindow endpoint
Change-Id: Id19fae4f444c83157ce58c933a18be1898430ad0
2020-10-26 14:56:28 +00:00
Jessica Grebenschikov
205c39d404 satellite/orders: upgrade to phase 2 rollout ordersWithWindow
We are moving an error into rejectErr since its preventing storage nodes from being able to settle other orders.

Change-Id: I3ac97c340e491b127f5e0024c5e8bd9f4df8d5c3
2020-10-15 21:20:19 +00:00
Stefan Benten
1d3b728766 satellite/{console/payments/satellitedb}: add validation for deletion of account and project
The same was that our Admin API handles project and account deletions currently, we would like
to have the same checks on the user-facing API. This PR adds the same checks to the console service.
General more applicable checks have been moved directly into the payments service.

In addition it adds the BucketsDB to the console DB, to have easier access and avoiding import cycles with
the metainfo package.

A small cleanup around our unnecessary monkit imports made it in as well.

Change-Id: I8769b01c2271c1687fbd2269a738a41764216e51
2020-10-13 07:55:26 +00:00
Jeff Wendling
4cbd4d52a9 satellite/orders: only hold the orders semaphore during database calls
holding it during node i/o means slow nodes can hold up order
processing for everyone else. this dramatically increases
the amount of tiem spent handling orders.

Change-Id: Iec999b7ed0817c921a0fd039097a75bdd3c70ea2
2020-10-10 15:40:50 -04:00
Jeff Wendling
0f0faf0a9f satellite/orders: do a better job limiting concurrent requests
Doing it at the ProcessOrders level was insufficient: the endpoints
make multiple database calls. It was a misguided attempt to only
have one spot enter the semaphore. By putting it in the endpoint
we can not only be sure that the concurrency is correctly limited
but it can be configurable easily.

Change-Id: I937149dd077adf9eb87fce52a1a17dc0afe96f64
2020-10-09 16:27:15 -04:00
Jeff Wendling
1fecaed7df satellite/orders: don't version check old endpoint
nodes are submitting using both the legacy and windowed endpoints
and thus having their legacy submissions rejected.

it is legal to use both the legacy and windowed endpoints
in phase1 since they use the same backend. the legacy endpoint
is disabled in phase2 and phase3.

therefore, if we wait an order expiration period (2 days) after
we determine enough nodes have started using the windowed
endpoint, we can be sure that any orders they did have to
submit with the legacy endpoint will have expired.

Change-Id: I4418a881bf8bb9377efaef4c651e6103a5dc6ed0
2020-09-09 10:23:48 -04:00
Egon Elbre
3ca405aa97 satellite/orders: use metabase types as arguments
Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef
2020-08-28 15:52:37 +03:00
Egon Elbre
94a09ce20b all: add missing dots
Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a
2020-08-11 17:50:01 +03:00
Jeff Wendling
85a74b47e7 satellite/orders: 3-phase rollout
This adds a config flag orders.window-endpoint-rollout-phase
that can take on the values phase1, phase2 or phase3.

In phase1, the current orders endpoint continues to work as
usual, and the windowed orders endpoint uses the same backend
as the current one (but also does a bit extra).

In phase2, the current orders endpoint is disabled and the
windowed orders endpoint continues to use the same backend.

In phase3, the current orders endpoint is still disabled and
the windowed orders endpoint uses the new backend that requires
much less database traffic and state.

The intention is to deploy in phase1, roll out code to nodes
to have them use the windowed endpoint, switch to phase2, wait
a couple days for all existing orders to expire, then switch
to phase3.

Additionally, it fixes a bug where a node could submit a bunch
of orders and rack up charges for a bucket.

Change-Id: Ifdc10e09ae1645159cbec7ace687dcb2d594c76d
2020-08-03 17:01:42 +00:00
Egon Elbre
b84923558b satellite: fix scoping, formatting
Change-Id: I21ef9edc2d449d75ad74891df7f966fb150d80fd
2020-07-16 19:13:14 +03:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Jeff Wendling
1944d734ef satellite/orders: check and enforce node api version
Change-Id: Ibdeb1a85dfed8b534bfed32a7cdaae5c3dc8b420
2020-07-16 10:38:12 +00:00
stefanbenten
257855b5de all: replace == comparison with errors.Is
Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714
2020-07-14 15:50:25 +00:00
Jessica Grebenschikov
1f1e3f0604 satellite/orders: disable settlementwithwindow, skip flaky tests
Change-Id: Ia60d7e0f2d383919650cdc736ba4569bb26ff2d7
2020-07-14 07:17:18 -07:00
Jessica Grebenschikov
8abb907010 satellite/orders: add settle orders with window
Why: We need a way to cut down on database traffic due to bandwidth
measurement and tracking.

What: This changeset is the Satellite side of settling orders in 1 hr windows.
See design doc for more details: https://review.dev.storj.io/c/storj/storj/+/1732

Change-Id: I2e1c151e2e65516ebe1b7f47b7c5f83a3a220b31
2020-07-13 15:41:29 -07:00
Egon Elbre
7d29f2e0d3 all: remove drpc wrappers
Change-Id: I45016f7d2a771dc00776196c1f531f3343e93b40
2020-05-11 08:20:34 +03:00
Egon Elbre
e6d5ce6b77 all: remove grpc
It seems everyone has migrated to drpc.

Change-Id: Ica6b2d0bdef68c6603083f2963458843eca71e9e
2020-05-10 06:36:09 +00:00
Jeff Wendling
a409bd5dec satellite/orders: check for expired orders first
there are a subset of storagenodes hammering the satellite with
expired orders. if we check for expiration first, we don't have
to do a bunch of pointless signature verification. since a && b
is equal to b && a, we can order these checks in any way we want
and have it still be correct.

Change-Id: I6ffc8025c8b0d54949a1daf5f5ea1fed9e213372
2020-04-02 12:35:11 -06:00
Egon Elbre
0a69da4ff1 all: switch to storj.io/common/uuid
Change-Id: I178a0a8dac691e57bce317b91411292fb3c40c9f
2020-03-31 19:16:41 +03:00
JT Olio
5511827662 satellite/orders: don't log expired order limits
we still need to come up with a better plan to get storage nodes
to stop doing this, but in the meantime, we know this is happening,
just stop logging it and keep some stats instead.

Change-Id: Icb6bcba275e0e955c54b1a90da2b37219fff2349
2020-03-26 22:31:10 -06:00
Jeff Wendling
115f4559e5 satellite/orders: more efficient processing of orders
by doing an indexed anti-join we're able to reduce the time to
select the pending orders by over 10x on postgres. this should
help us process pending orders much more quickly.

it probably won't do as good a job on cockroach because it does
not do an indexed anti-join and instead does a hash join after
scanning the entire consumed serials table. we should either
remove orders entirely or try to make that more efficient
when necessary.

Change-Id: I8ca0535acd21c51e74955b24c9b86d20e4f2ff9c
2020-03-18 09:03:30 +00:00
Egon Elbre
64330c55b3 all: use pbgrpc
common/pb moved grpc to a separate package common/pb/pbgrpc.
This updates this repository to use it.

Change-Id: I2de2a190688871cf9cb61f7ea511f8a01e264e4e
2020-02-26 21:27:47 +02:00
Jeff Wendling
f671eb2beb satellite/satellitedb: use queue for orders to get back fast billing
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue rather than database driven.

For more details, see the added fast billing changes blueprint.

It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.

Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35
2020-02-24 17:07:07 +00:00
Ivan Fraixedes
1a84a00cc9
satellite/orders: Fix doc comments
Enhance the documentation of the UseSerialNumber method (interface and
implementation) and add several missing dots in doc comments of the
methods of the same interface and implementation.

Change-Id: I792cd344f0d2542e060fa2ec288b71231cae69de
2020-02-18 13:03:23 +01:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
paul cannon
a0a94a9ac7 satellite/satellitedb: insert into reported_serials w/ arrays
Change-Id: Icb682de09ded3e3159e3590594dcf13f2e7f40f0
2020-01-24 18:36:21 -06:00
Jeff Wendling
696d98a232 satellite/satellitedb: fix nitpicks and timestamp issue found in review
warning: databases migrated to version 77 before this commit
is merged must be manually re-migrated. this should not be a
problem for anything but staging databases.

Change-Id: Ie1631c48379472352014183ee43f1465e22200f7
2020-01-16 21:22:38 +00:00
Jeff Wendling
f42851b1ab satellite/satellitedb: remove the big honkin mutex
no longer necessary/desired with reported_serials.

Change-Id: I69b5c535488eb5f98b250d73a7c8e6deaed0254e
2020-01-15 19:24:35 -07:00
Jeff Wendling
78c6d5bb32 satellite/satellitedb: reported_serials table for processing orders
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.

the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.

after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.

since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.

Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
2020-01-15 19:21:21 -07:00
Isaac Hess
4950d7106a satellite/orders: Add write cache for bw rollups
Change-Id: I8ba454cb2ab4742cafd6ed09120e4240874831fc
2020-01-13 22:40:51 +00:00
Jeff Wendling
71ec0ad374 satellite/satellitedb: add big honkin mutex to ProcessOrders
the hope is that it is mostly interfering with itself, so this
will make it not do that (well, N api servers, but hopefully
that's not enough to cause it to have issues).

Change-Id: Ifd0c9e6617457785ab25fe5b714d8556cdc8e2d3
2020-01-13 11:33:12 -07:00
littleskunk
bcc23f6869
Satellite/orders: remove allocated bandwith from storagenode_bandwidth_rollups
When an uplink requests an upload or download from the satellite we are trackig the
allocated bandwidth twice. The value in bucket_bandwidth_rollups is used
for project limits but the value in storagenode_bandwidth_rollups is not
used at all. We can increase the performance by removing it. Uplinks
will get a faster response from the satellite.

Change-Id: Icccd41f94107ef34668f30f99bf5f728c384b07e
2020-01-12 16:20:47 +01:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00