storj

Author	SHA1	Message	Date
Moby von Briesen	a8b66dce17	satellite/accounting: account for old orders that can be submitted in satellite rollup With the new phase 3 order submission, orders can be added to the storage and bandwidth rollup tables at timestamps before the most recent rollup was run. This change shifts the start time of each new rollup window to account for any unexpired orders that might have been added since the previous rollup. A satellitedb migration is necessary to allow upserts in the accounting_rollups table when entries with identical node_ids and start_times are inserted. Change-Id: Ib3022081f4d6be60cfec8430b45867ad3c01da63	2020-11-18 14:46:00 -05:00
Moby von Briesen	0ec685b173	satellite/{satellitedb, repair/{queue, checker}}: Use new column "segmentHealth" instead of "numHealthy" in injured segments queue We plan to add support for a new Reed-Solomon scheme soon, but our repair queue orders segments by least number of healthy pieces first. With a second RS scheme, fewer healthy pieces will not necessarily correlate to lower health. This change just adds the new column in a migration. A separate change will add the new health function. Right now, since we only support one RS scheme, behavior will not change. Number of healthy pieces is being inserted as "segment health" until the new health function is merged. Segment health is calculated with a new priority function created in commit `3e5640359`. In order to use the function, a new config value is added, called NodeFailureRate, representing the approximate probability of any individual node going down in the duration of one checker run. Change-Id: I51c4202203faf52528d923befbe886dbf86d02f2	2020-11-16 21:18:09 +00:00
Jessica Grebenschikov	f558cc825e	satellite/orders: add storagenode_bw_phase2 table and dont delete tallies for longer It turns out we need to make 2 more changes in order for the new order submission phase 3 to get deployed. This PR makes 2 changes: 1) when the rollup service deletes tallies, we now keep tallies around until orders expire (vs 1 day like before). 2) the reported rollup chore will now write the storagenode_bandwidth_rollups to a new table _phase2 as an intermediary step so it doesn't conflict with phase 3 order settlement. These changes need to be deployed for 2 days before we can turn on phase 3 of the new orders settlement workflow. Change-Id: Iafbff577ba7d55f8f17b7db857311b2ce799de60	2020-11-13 17:15:24 +00:00
Ethan	9a29ec5b3e	Add index to graceful_exit_transfer_queue table This fixes a slow query that was taking up to 4 seconds in production SELECT node_id, path, piece_num, root_piece_id, durability_ratio, queued_at, requested_at, last_failed_at, last_failed_code, failed_count, finished_at, order_limit_send_count FROM graceful_exit_transfer_queue WHERE node_id = '[redacted]' AND finished_at is NULL AND last_failed_at is NULL ORDER BY durability_ratio asc, queued_at asc LIMIT 300 OFFSET 0; Change-Id: Ib89743ca35f1d8d0a1456b20fa08c683ebdc1549	2020-10-26 14:47:48 +00:00
Stefan Benten	0b43b93259	satellite/satellitedb: make limits per default NULL This change completes the column migration of `5f6fccc6e8` and `2f648fd981`. It resets every users project limits who are below or equal to our current production defaults. Change-Id: Ie041d08bb67b62844f6023190fc00bc2dad5b1cb	2020-10-14 20:28:16 +00:00
Jessica Grebenschikov	4a2c66fa06	satellite/accounting: add cache for getting project storage and bw limits This PR adds the following items: 1) an in-memory read-only cache thats stores project limit info for projectIDs This cache is stored in-memory since this is expected to be a small amount of data. In this implementation we are only storing in the cache projects that have been accessed. Currently for the largest Satellite (eu-west) there is about 4500 total projects. So storing the storage limit (int64) and the bandwidth limit (int64), this would end up being about 200kb (including the 32 byte project ID) if all 4500 projectIDs were in the cache. So this all fits in memory for the time being. At some point it may not as usage grows, but that seems years out. The cache is a read only cache. When requests come in to upload/download a file, we will read from the cache what the current limits are for that project. If the cache does not contain the projectID, it will get the info from the database (satellitedb project table), then add it to the cache. The only time the values in the cache are modified is when either a) the project ID is not in the cache, or b) the item in the cache has expired (default 10mins), then the data gets refreshed out of the database. This occurs by default every 10 mins. This means that if we update the usage limits in the database, that change might not show up in the cache for 10 mins which mean it will not be reflected to limit end users uploading/downloading files for that time period.. Change-Id: I3fd7056cf963676009834fcbcf9c4a0922ca4a8f	2020-09-25 16:28:49 +00:00
Stefan Benten	5f6fccc6e8	satellite/satellitedb: makes limits nullable change backwards compatible Our current endpoints bail on us, if the column data is null. Thus we need to take the intermediate step and set the default to a fixed value and reset those with the following release. It sets the default column value to our current config values of 50GB for storage and bandwidth and 100 buckets, while still enabling the field to be nullable. All 0 values are migrated to be the default as well to ensure they can keep using their projects, as with the original change, 0 actually means 0. Change-Id: I797be80ce2d2105091599dc1b3fc76f74336b66b	2020-09-23 17:54:42 +02:00
Stefan Benten	2f648fd981	satellite: make limits be nullable Currently we have no way to actually set one of the following limits to 0 (meaning not usable): - maxBuckets - usageLimit - bandwidthLimit With having the field nullable, NULL corresponds to the global default, 0 now actually 0 and a set value determines a custom limit. Change-Id: I92bb77529dcbd0881ae8368921be9d246eb0919e	2020-09-21 19:34:19 +00:00
Cameron Ayer	e7c34a053d	satellite/satellitedb: add column and index "updated_at" to injuredsegments Change-Id: I59e9bb2077885f09e17795375fe98ed31bd83d54	2020-09-14 12:53:04 -04:00
Moby von Briesen	2d01dd9732	satellite/satellitedb: Add online_score column to nodes table Add online score used for the new audit history offline tracking system to the nodes table. This allows us easy access to the node's online score for the storagenode dashboard as well as for data analysis. Change-Id: Ie99be1192e5236862a5b3dbed2e5ef03b9169410	2020-08-31 15:07:07 +00:00
Egon Elbre	94a09ce20b	all: add missing dots Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a	2020-08-11 17:50:01 +03:00
Kaloyan Raev	7552ff26ec	satellite/db: drop project_invoice_stamps table It's an obsolete table from earlier state of Stripe invoices implementation. No code is currently using it. It is confirmed that this table is currently empty across all satellites. Change-Id: I12d2756578faf8418ea8f3b09088e885694b8925	2020-08-10 13:22:10 +00:00
Kaloyan Raev	edfd3d7661	satellite/payments: delete `credits` and `credits_spendings` db tables Jira: https://storjlabs.atlassian.net/browse/USR-822 This the last step of dropping these 2 db tables. It also deletes all code associate with them. Change-Id: I8be840dc2a7be255cf6308c9434b729fe4d9391e	2020-07-30 12:19:57 +03:00
Egon Elbre	36ed939b89	satellite/orders: add buckets db to service We need to add bucket UUID into the order limit, hence we need access to the buckets table. Change-Id: I348ce1f709c9fcdec5c4034acaab59805b33da9f	2020-07-24 17:36:49 +03:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
stefanbenten	0209a2095f	satellite/{console,satellitedb}: add project_limit column to users table Change-Id: I603f085f17ca5b413dd1c6837c2081f9e7e791a1	2020-07-15 17:27:31 +00:00
Jennifer Johnson	784a156eea	satellite: prevents uplink from creating a bucket once it exceeds the max bucket allocation. Change-Id: I4b3822ed723c03dbbc0df136b2201027e19ba0cd	2020-07-15 17:27:05 +00:00
stefanbenten	0a32ba0e6b	satellite/admin: add project rename functionality Change-Id: I4c0f42d4c2c26859279f247f94cef97a8ff630a9	2020-07-14 11:36:49 +00:00
Jessica Grebenschikov	8abb907010	satellite/orders: add settle orders with window Why: We need a way to cut down on database traffic due to bandwidth measurement and tracking. What: This changeset is the Satellite side of settling orders in 1 hr windows. See design doc for more details: https://review.dev.storj.io/c/storj/storj/+/1732 Change-Id: I2e1c151e2e65516ebe1b7f47b7c5f83a3a220b31	2020-07-13 15:41:29 -07:00
paul cannon	bbdb351e5e	all: use jackc/pgx in place of lib/pq What: Use the github.com/jackc/pgx postgresql driver in place of github.com/lib/pq. Why: github.com/lib/pq has some problems with error handling and context cancellations (i.e. it might even issue queries or DML statements more than once! see https://github.com/lib/pq/issues/939). The github.com/jackx/pgx library appears not to have these problems, and also appears to be better engineered and implemented (in particular, it doesn't use "exceptions by panic"). It should also give us some performance improvements in some cases, and even more so if we can use it directly instead of going through the database/sql layer. Change-Id: Ia696d220f340a097dee9550a312d37de14ed2044	2020-07-13 15:54:41 +00:00
Jeff Wendling	885ef70c58	satellite/nodeapiversion: new table for tracking node api usage This system tracks an abstract "api version" from nodes based on their usage, allowing us to have latching behavior where if a node ever uses a new api, it can be blocked from using the old api. This is better than using self-reported semver version information because the node cannot lie, there's no confusion about what semver version implies which features, no questions about dev and ci environments, and no dependencies between reporting the version and using the new api. Change-Id: Ifeced5c9ae8e0a16102d79635e176a7d3bdd8ed4	2020-07-09 15:02:25 +00:00
Cameron Ayer	e3088d9ad5	satellite/satellitedb: add new DB table audit_histories Change-Id: I5f854514994cab9a68cf978f2dabfb588df695f5	2020-07-01 21:14:35 +00:00
Isaac Hess	2d727bb14e	satellite: Check macaroon revocation When a request comes in on the satellite api and we validate the macaroon, we now also check if any of the macaroon's tails have been revoked. Change-Id: I80ce4312602baf431cfa1b1285f79bed88bb4497	2020-06-22 13:50:07 -06:00
Cameron Ayer	0885ba5646	satellite/satellitedb: add new columns for offline suspension add new columns `offline_suspended` and `under_review` to nodes table. `unknown_audit_suspended` is a new column which will replace `suspended` Change-Id: I22ddeb338ea0ff63f14332a7ebd0f3e9e4c06cdc	2020-06-15 04:00:20 +00:00
Egon Elbre	36c461bd59	private/tagsql: track proper closing of rows and statements This ensures that rows are closed to avoid leaks. Also verifies that Err() is called, to ensure that no error is left behind. Change-Id: Idd1bec9bf479f40021da67b2c80ce83033149469	2020-06-05 18:25:43 +00:00
Michal Niewrzal	b20ced9519	satellite/satellitedb: drop project_id column from coupons table This is last part of https://storjlabs.atlassian.net/browse/USR-818 Change-Id: I053d11b37df962c12e46645bae2fc2dad49c9755	2020-06-03 14:56:41 +00:00
Kaloyan Raev	49571f1a23	satellite/payments: all invoice commands require period To avoid including multiple months in a single invoice, we need all inspector's invoice commands to run in for specific period. See https://storjlabs.atlassian.net/browse/USR-725 Change-Id: I3637dc189234f02350daca8d897c21765762ea55	2020-05-14 11:50:19 +00:00
Stefan Benten	e23bd806b4	satellite/accounting: separate usage and bandwidth limit (#3878 )	2020-05-12 15:01:15 +02:00
Ethan	acf53bea4d	satellite/orders;accounting: Add monthly project download bandwidth rollup See https://storjlabs.atlassian.net/browse/SM-776 Change-Id: Ifd5cccea43c556fd59822d17344f399cfe9a7164	2020-05-04 15:49:57 +00:00
Ethan Adams	60e07f0a8b	Revert "satellite/accounting: Remove unnecessary index bucket_bandwidth_rollups_project_id_action_interval_index" This reverts commit `105dc7acc6`. Reason for revert: Recent changes to the Postgres query plan seems to want to use this index now. Reverting until we have time to analyze what's happening. Change-Id: I74b4b5a8f15c3850d8a958a29f51dbc80e7c282c	2020-04-22 14:49:04 +00:00
Ethan	105dc7acc6	satellite/accounting: Remove unnecessary index bucket_bandwidth_rollups_project_id_action_interval_index See https://storjlabs.atlassian.net/browse/SM-738 Change-Id: I9ba3cc3fbff9f13fc0b95d25feee5a19e5a5c486	2020-04-21 16:43:09 +00:00
Ethan	4cd86ff780	satellite/accounting: Add index on bucket_bandwidth_rollups for action, interval_start, and project_id See https://storjlabs.atlassian.net/browse/SM-551 for details Change-Id: I104c4e87d5aef500cc4a3893817763808f76c484	2020-04-17 19:14:45 +00:00
Jeff Wendling	2ded64ba2c	satellite/compensation: more fixes to get prod running smoothly Change-Id: I13a76d9d49222fb10796415a015f224d4084fde3	2020-04-07 10:10:27 +00:00
Jennifer Johnson	1547e791a3	satellitedb: remove free_bandwidth column from nodes table Change-Id: I9d1d3de9216c6533c1042ef473631721a011d086	2020-04-06 09:30:28 +00:00
Ethan	df462d7265	satellite/accounting: Add index on bucket_bandwidth_rollups to minimize full table scans https://storjlabs.atlassian.net/browse/SM-545 Change-Id: I5599a72a991d70236f17beca027e9bc032777177	2020-03-26 19:53:50 +00:00
Jessica Grebenschikov	aeab599d21	satellitedb: removed unused id on storagenode_storage_tallies table, add index on node_id The goal of this change is to improve the storagenode_storage_tallies table by removing the unneeded id column that is not being used but only taking up space, and also to add an index on a different column that needs it. Removing and adding a column seems simple, but ended up being more complicated because of some cockroachdb limitations. The cockroachdb limitation when trying to remove a column from a table and create a new primary key are: 1. only allows primary key creation at table creation time (docs: https://www.cockroachlabs.com/docs/stable/primary-key.html) 2. table drop or rename is performed async and cannot be done in a transaction (issue: https://github.com/cockroachdb/cockroach/issues/12123, https://github.com/cockroachdb/cockroach/issues/22868) To address these differences between cockroachdb and Postgres, this PR performs different migrations for the two database. The Postgres migration is straight forward and what you would expect, but the cockroach migration has two main changes: 1. To change a primary key, use the recommended process from the cockroachdb docs to create a new table with the new primary key you want and then migrate the data. 2. In order to do 1, we needed to do the new table renaming in a separate transaction from the data migration. Ref: SM-65 Change-Id: Idc9aee3ab57aa4d5570e3d2980afea853cd966bf	2020-03-20 14:39:44 -07:00
Jennifer Johnson	9b78473c0c	satellitedb: adds vetted_at nullable timestamp to nodes table Change-Id: I42d5a396b4eecbad26b683c6aee51e043d2ff034	2020-03-20 01:37:28 +00:00
Moby von Briesen	2f991b6c56	satellite/{overlay, satellitedb}: account for `suspended` field in overlay cache Make sure that suspended nodes are treated appropriately by the overlay cache. This means we should expect the following behavior: * suspended nodes (vetted or not) should not be selected for uploading new segments * suspended nodes should be treated by the checker and repairer as "unhealthy", and should be removed upon successful repair This commit also removes unused overlay functionality. Fixes a bug with commit `8b72181a1f` where the audit reporter was automatically suspending nodes regardless of audit outcome (see test added). Tests: * updates repair tests to ensure that a suspended node is treated as unhealthy and will be removed from the pointer on successful repair * updates overlay tests for KnownUnreliableOrOffline and KnownReliable to expect suspended nodes to be considered "unreliable" * adds satellitedb test that ensures overlay.SelectStorageNodes and overlay.SelectNewStorageNodes do not include suspended nodes * adds audit reporter test to ensure that different audit outcomes result in the correct suspended/disqualified states Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561	2020-03-17 17:14:56 +00:00
Jeff Wendling	41887883f3	satellite/satellitedb: check indexes on migration Change-Id: I5ba7ae2b512d77c70405ce332158f12128e27eed	2020-03-13 10:45:22 +00:00
Jessica Grebenschikov	803e2930f4	satellite: use IP for all uplink operations, use hostname for audit and repairs My understanding is that the nodes table has the following fields: - `address` field which can be a hostname or an IP - `last_net` field that is the /24 subnet of the IP resolved from the address This PR does the following: 1) add back the `last_ip` field to the nodes table 2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time. 3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date 4) try to reduce confusion about hostname, ip, subnet, and address in the code base Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac	2020-03-11 09:11:40 -07:00
Bill Thorp	e99e675fb1	satellite/satellitedb: use time zones with all timestamps The migration was broken into one migration per table to reduce table locking and reduce the chances of failure due to SQL timeouts. Of the 14 fields that lacked time zones, only the 3 named 'interval_start` seemed to have non-UTC data in them. These fields are fixed in the migration by removing the +00 and adding AT TIME ZONE current_setting('TIMEZONE') Field with good data are migrated by adding AT TIME ZONE 'UTC' Note that postgres's timezone() is different than cockroach's timezone() so AT TIME ZONE is used. https://storjlabs.atlassian.net/browse/SM-104 Change-Id: I410f2f1d7c11b143f17844347f37e6f4b1e70fce	2020-03-05 21:11:25 +00:00
Moby von Briesen	f495544c56	satellite/satellitedb/dbx: add fields to node table for placing nodes into suspended mode for too many unknown-error audits Change-Id: Iac9a619e5c08377de87ffdf4acdd0155027f5eb3	2020-03-03 03:30:59 +00:00
Jeff Wendling	1db087cfba	satellite/satellitedb: migration to create tables for compensation these tables are used in future commits with respect to the new storagenode payments code. if we create them now, it will make backfilling them with historical data easier. Change-Id: I3c08c9770ec5b2baa38b4f2fd18c2f07746a61c2	2020-02-27 17:34:50 +00:00
Moby von Briesen	4e5a7f13c7	satellite/repair/queue: Prioritize selection of items off repair queue by segment health Add a column to the repair queue table in the satellite db for healthy piece count. When an item is selected from the repair queue, the least durable segment that has not been attempted in the past hour should be selected first. This prevents our repairer from getting stuck doing work on segments that are close to the repair threshold while allowing segments that are more unhealthy to degrade further. The migration also clears the repair queue so that the migration runs quickly and we can properly account for segment health in future repair work. We do not select items off the repair queue that have been attempted in the past six hours. This was changed from on hour to allow us time to try a wider variety of segments when the repair queue is very large. Change-Id: Iaf183f1e5fd45cd792a52e3563a3e43a2b9f410b	2020-02-26 09:54:16 -05:00
Jeff Wendling	f671eb2beb	satellite/satellitedb: use queue for orders to get back fast billing This change adds two new tables to process orders as fast as we used to but in an asynchronous manner and with hopefully less storage usage. This should help scale on cockroach, but limits us to one worker. It lays the groundwork for the order processing pipeline to be queue rather than database driven. For more details, see the added fast billing changes blueprint. It also fixes the orders db so that all the timestamps that are passed to columns that do not contain a time zone are converted to UTC at the last possible opportunity, making it less likely to use the APIs incorrectly. We really should migrate to include timezones on all of our timestamp columns. Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35	2020-02-24 17:07:07 +00:00
Jeff Wendling	2d2f5e1a7f	satellite/satellitedb/dbx: remove typo in dbx file and format it Change-Id: I756315d6228ac9edd35cad8b496d36ecf2b5d26f	2020-02-11 14:15:13 -07:00
Qweder93	dc075eaa96	satellite/payments : deposit bonuses (credits) added Change-Id: Ib151bbb9b02d655fa619c53bfbc04ed6f3bb39e0	2020-02-11 11:11:42 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Jeff Wendling	d20db90cff	private/dbutil/txutil: create new transactions for retries it was noticed that if you had a long lived transaction A that was blocking some other transaction B and A was being aborted due to retriable errors, then transaction B was never given priority. this was due to using savepoints to do lightweight retries. this behavior was problematic becaue we had some queries blocked for over 16 hours, so this commit addresses the issue with two prongs: 1. bound the amount of time we will retry a transaction 2. create new transactions when a retry is needed the first ensures that we never wait for 16 hours, and the value chosen is 10 minutes. that should be long enough for an ample amount of retries for small queries, and huge queries probably shouldn't be retried, even if possible: it's more preferrable to find a way to make them smaller. the second ensures that even in the case of retries, queries that are blocked on the aborted transaction gain priority to run. between those two changes, the maximum stall time due to retries should be bounded to around 10 minutes. Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4	2020-02-01 18:34:28 +00:00
Jeff Wendling	d09bd4a749	satellite/satellitedb/dbx: regenerate with paged composite key fixes before dbx would generate a compilcated blob of conditions that encoded a row comparison, which only optimized to an index seek on cockroachdb. this means that sqlite and postgres both had quadratic behavior on paged queries of this form. instead, use the implicit row construction feature supported in all of the databases to do paged support so that they all optimize well. Change-Id: Iac8703929ba2a59ee3ffa619b916d12663422887	2020-01-27 12:43:16 -07:00

1 2 3 4 5

207 Commits