Update the billing table to use generated IDs, to include the source and status fields, and to add metadata jsonb field that can be used for fields specific to a particular type of billing transaction. Additionally a new table was added to keep track of user balances
Change-Id: Ieb3a63aafd8fe21fc3386bafd43d52081b7d2838
satellite/{payments/billing,satellitedb}: refactor billing DB
Update the billing table to use generated IDs, to include the source and status fields, and to add metadata jsonb field that can be used for fields specific to a particular type of billing transaction. Additionally a new table was added to keep track of user balances
Change-Id: Ieb3a63aafd8fe21fc3386bafd43d52081b7d2838
Adding an interval_end_time column to the accounting_rollups table
to keep the last interval_end_time for each daily storage tallies.
Updates https://github.com/storj/storj/issues/4178
Change-Id: If7a8210c5e9fe2fc9df84b137a8b6e3db2471c58
Add billing DB to the satellite. This DB will hold all transactions on the users account and can be used to compute the users current usable account balance.
Change-Id: I056416efc169e5e5e30c9f30cd8bc766b7bc8073
Add storjscan wallets DB to the satellite. For now this DB is a one to one mapping of the users account to a storjscan wallet that can be used by the account holder to make payments on their Storj account.
relates to https://github.com/storj/storj/issues/4347
Change-Id: I6e65b15817b90ceb75641244f9bf173c3b4228a7
We added nodes.disqualification_reason recently, but we didn't add a
corresponding column in the reputations table (despite having a
corresponding `disqualified` column there).
Without this change, the (very useful and informative) assignments to
updateFields.DisqualificationReason in reputations.go have no effect.
Refs: https://github.com/storj/storj/issues/4601
Change-Id: I77404902ca64b56aed72f1de76b303fe82b76aab
When a new user registers, we send a verification request to their
email. Currently, if they do not verify their email, we take no further
action. We want to send these users reminders: one after about one day
and one after about 5 days. To do this we will use this new
verification_reminders column.
It will look something like this:
```
SELECT email FROM users
WHERE status = 0
AND (
(verification_reminders = 0 AND created_at < now() - 'INTERVAL 1d')
OR (verification_reminders = 1 AND created_at < now() - 'INTERVAL 5d')
)
```
Change-Id: If0620e08c97e9e337c9563481d665c5bd462693b
Things that make debugging easier.
* Added logging to automatic link clicking to make it obvious, when it
fails.
* Added monitoring to oidc.
* Made dbx create calls noreturn for oauth_*
Change-Id: I37397b4e84ce5bfd82954aed9c38fdfd52595f24
Added failed_login_count and login_lockout_expiration columns to users table to control users failed login attempts.
We want to prevent brute forcing of user login so this is the first step.
Change-Id: I06b0b9f5415a1922e08cd9908893b2fd3c26bca0
This sets the corresponding _numeric columns to be NOT NULL (it has been
verified manually that there are no more NULL _numeric values on any
known satellites, and it should be impossible with current code to get
new NULL values in the _numeric columns.
We can't drop the _gob columns immediately, as there will still be code
running that expects them, but once this version is deployed we can
finally drop them and be totally done with this crazy 5-step migration.
Change-Id: I518302528d972090d56b3eedc815656610ac8e73
We are in the process of creating an api to allow users to manage their
accounts programmatically. We would like to use api keys for
authorization. We were originally going to create an entirely new table
for these api keys, but seeing as we already have 2 other tables for
keys/tokens, api_keys and oauth_tokens, we thought it might be better to
use one of these. We're using oauth_tokens.
We create a new oidc.OAuthTokenKind for account management api keys:
KindAccountManagementTokenV0. We made the key versioned because we
likely want to improve the implementation in the future, but we want to
get something functional out the door ASAP because the account management
api feature is highly desired.
Add a new method to oidc.OAuthTokens interface for revoking v0 account
management api keys, RevokeAccountManagementTokenV0. Add update method
to dbx implementation to allow updating the expiration. We will revoke
these keys by setting the expiration to 0 so they are expired.
Change-Id: Ideb8ae04b23aa55d5825b064b5e43e32eadc1fba
Remove redundant suspension timestamp column from nodes and reputation tables.
Suspended timestamp was moved to unknown_audit_suspended and suspended column is
no longer used so there is no point in keeping both.
Change-Id: Ieea3f12141b33ec9efe7594f4c9dbc7e10675b0e
For a thorough explanation of the overall transition, see the message on
commit c053bdbd70.
This change will rename the columns containing gob-encoded big.Floats
and add new columns which will contain the equivalent data in a more
sql-friendly format.
The change should *not* break already-running satellite processes,
because all functionality touching these tables has already been taught
to work with these new columns if it sees any "undefined column" errors.
Change-Id: I229324376533e383c5d05064b8aedad149cf825b
This change adds some more checks to the deletion process for projects and
users, since we ran into a race condition during invoicing, where projects
have been deleted before the invoicing was finished, leading to missing
references.
This PR changes the logic to block user deletion if we are in exactly that period,
while also allowing the deletion of projects/users on free tier during the month.
Change-Id: Ic0735205e6633762fb7e3c2fa13e744cdfa5ec32
We want to issue a reminder to users when they don't verify their email within 24hrs of registering. This change only adds a column to the users table.
Change-Id: I92e2baeabf179338ffec01574d4752c0ccdba88b
All limits we have for projects have also parent limits stored
with user data. New created project is first taking limits from
owner (user) limits.
This change is extending users table with project_segment_limit
column and adds functionality to get and set value for this
column.
Change-Id: Iff5e36c62b517652390b649fc05992475916ecff
Exposes functionality to get and update project segment
limit. It will be used to limit number of segments per project
while uploading object.
Change-Id: I971d48eebb4e7db8b01535c3091829e73437f48d
We want to set maximum number of segments per
project. This change adds only column to projects table.
Default value 1M is set to make later migration easier as
we need to set 1M for paid tier users and 140K for free
tier users.
Change-Id: I8e83712e08c5bd91dfa59f652d17e45c14240a36
To allow for changing limits for new users, while leaving existing users limits as they are, we must store the project limits for each user. We currently store the limit for the number of projects a user can create in the user DB table. This change would also store the project bandwidth and storage limits in the same table.
Change-Id: If8d79b39de020b969f3445ef2fcc370e51d706c6
Alter satellites DB nodes table to add `disqualification_reason` int column
which contains disqualification type enum of why node has been disqualified.
Change-Id: Ia514557018ca27e1984216dc5004346d59869d16
table and drop not null constraint on objects column
Since, we want to move from charging our customers by object count to
segment count, this PR prepares the database to be able to record segments count
instead of objects count for satellite's billing system
Change-Id: Ie91ef354e78d24a268bc1cdc4327c182f733321e
This update is to set up users being able to register with a promo code added to their account in place of the free tier coupon.
Change-Id: I7badf87937b12664f145520b6dcc4b26fe750407
We don't need column uses_segment_transfer_queue in graceful_exit_progress
as now all exiting nodes are using graceful_exit_segment_transfer_queue and
table graceful_exit_transfer_queue has been dropped.
Change-Id: I4b7c087433f04138cf09bcf8ad3d8de2c185502a
Uplink needs only part of columns we are reading from DB.
To improve performance we should read only those that are
realy needed.
Change-Id: Ib39259318169c46afe5fa4c6ce2184da82e960c8
We don't use this column for anything. If you want to know if a node is
contained, you can check the pending_audits table.
Change-Id: I5671722a5fc6e1749d3a49e187a56556000ff941
We don't use this column for anything. If you want to know if a node is
contained, you can check the pending_audits table.
Change-Id: I8da1d8e01a2dcaff63c5067a7927b5451424ad04
Drop table graceful_exit_transfer_queue which is not used anymore (replaced by graceful_exit_segment_transfer_queue).
Change-Id: Ie254fe9a54fb0784e350a439ce7a9bc99a3a58b5
In order to limit the amount of overall requests a user can issue in a
time span, we need to have the ability to define such limit separate
from per second request rate.
This PR adds a new column on the projects table to store the burst limit
per project.
Change-Id: I7efc2ccdda4579252347cc6878cf846b85146dc7
nodes and audit_history tables
This PR removes all code reference to audit_histories table and
```
audit_reputation_alpha, audit_reputation_beta,
unknown_audit_reputation_alpha, unknown_audit_reputation_beta,
```
columns from nodes table.
It also drops audit_histories table from the db since the code
that's referencing it currently are not being used.
Change-Id: Ifcda8db36afb3a333d487ff831f2fdefc8b02a4c
Currently, reputation table is only populated when a node has been
audited. This is ok in production, however a lot of our tests doesn't
upload any data or trigger audits.
This PR adds an initialization step in testplanet to populate reputation
table with zero value for nodes reputation.
Change-Id: I11b381236669db346dc68a48a6d4a27334a0a8b8
Columns for MFA status, secret key, and JSON-encoded array of
recovery codes are added to the users table.
Change-Id: Ifed7e50ec9767c1670d9682df1575678984daa60
We want to calculate bucket tally only from iterating objects.
Object currently has an info about totals for bytes and segments.
We need to adjust tallies to keep those totals. Older entries will
be untouched and code will use totals only if available. Change
is adding columns for totals to bucket_storage_tally table and
is adding general handling for them.
Next step is to start using total columns instead of inline/remote.
This will be done with next change.
Change-Id: I37fed1b327789efcf1d0570318aee3045db17fad
We want to use StreamID/Position to identify injured
segment. As it is hard to alter existing injuredsegments
table we are adding a new table that will replace existing
one. Old table will be dropped later.
Change-Id: I0d3b06522645013178b6678c19378ebafe485c49
So that we can easily see whether a user is in the paid tier without
querying for payment methods.
Change-Id: I122566ddd0953203f852741fa12c71795bc1ec5c
Currently, pending audit is finding segment by segment location
(path) because we want to move audit to segmentloop and we will
have only StreamID and Position we need to add columns for those
fields. Altering existing table can cause issues while
migration and deployment. Cleaner choise is to make new table.
This change contains migration with new segment_pending_audit
table that will replace pending_audits table and adjustments
to use new table in the code.
Table pending_audits will be dropped with next release.
Change-Id: Id507e29c152da594bac1fd812c78d7ecf45ec51f
table graceful_exit_segment_transfer_queue will be used to replace graceful_exit_transfer_queue. Currently, it uses the path of a segment to keep track of pieces to be transferred. As we want to use the segment metainfo loop, we will need to record stream_id and position of the segment instead of relying on object path.
This change also add a uses_segment_transfer_queue column to the graceful_exit_progress table to be able to know if a transfer has been initiated while using the old table.
Change-Id: Iafb1e8e65ba124e20de4a9ff76da181c3222de7e
The reputation table duplicates the reputation information in the
nodes table. It will be used for implementing the reputation
service.
Change-Id: I36c0318e8fa5f535e9d527df95b22a4f9eb365d4
Migration step for adding a 'egress_dead' column to the project_bandwidth_daily_rollups.
It will be used to track bandwidth allocation that won't be consumned
as the corresponding order has already been processed and has a settled
bandwidth amount lower than the order limit (allocated bandwidth).
Change-Id: Ic07592e69292ae2076e69f6038bb0e0fae79b271
The DBs of our production satellites have some indexes that we didn't
have in the migrations because at that time we weren't able to add them
because our migration test was not able to deal with Cockroach indexes
with the STORING clause.
We have recently modified the storj.io/private/dbutil/pgutil package to
support the CRDB STRORING clause, so we are adding the missing indexes
to our migrations for being able to have them if we have to recover a DB
from scratch or we deploy a new DB satellite.
Change-Id: I686ff84e5b4c02d9615f50fa531261363affefb8
For business accounts we need to track the sales contact.
It will be a question to business accounts during onboarding.
Change-Id: I8d101ce1b52091478dfb0ddd875e1cc717d765d3
Initially we duplicated the code to avoid large scale changes to
the packages. Now we are past metainfo refactor we can remove the
duplication.
Change-Id: I9d0b2756cc6e2a2f4d576afa408a15273a7e1cef
* Add a nullable billing_periods column in the coupons table
* Add nullable billing_periods column to the currently unused
coupon_codes table
* Drop the duration column from the coupon_codes table
* Replace duration config type so that the default promotional coupon
can be configured to never expire
Zero downtime migration plan:
* Add billing_periods column to coupons and coupon_codes tables (this change)
* After one release, remove all references to the old duration column,
replacing with references to billing_periods. At this point, we can also
change the defult promotional coupon to never expire and migrate over
values from the old duration column.
* After another release, drop the duration column.
Change-Id: I374e8dc9fab9f81b4a5bc681771955662d4c007a
instead of only generating invoices for nodes that had some
activity, we generate it for every node so that we can find
and pay terminal nodes that did not meet thresholds before
we recognized them as terminal.
Change-Id: Ibb3433e1b35f1ddcfbe292c034238c9fa1b66c44
The coupon_codes table will allow for administrators to create new promo
codes associated with coupon information (amount, duration, etc...).
A user will be able to enter a promo code (aka coupon code) in order to
apply a new coupon to their account. The coupon in the coupons table is
linked to the template defined in the coupon_codes table.
Change-Id: I50e49fa92afbc6aa9d01d8a895c069efb59e472b
Migration step 148 will cause errors because we missed some
references to the columns being dropped. Removing the step
altogether causes problems with backwards compatibility tests
because the change already exists in the latest release tag.
To circumvent, we change v148 to an empty migration.
Add methods FindTable and RemoveColumn in private/dbutil/dbschema
Change-Id: Ia527e95b88a88c5dc82800928ce6f8cfb879e334
cockroach is having problems with huge transactions and
having them complete before timeouts or whatever, so
do smaller transactions.
because we can have partial recording of payments which
are not unique, we have to do a thing where we read and
check if it already exists before writing. this is not
concurrency safe.
Change-Id: Ia7d59499a43ce6d70cb2a23754edbdd1b643ef1a
Turns out that many methods generated with dbx where not used at all.
Lets remove them.
As a next step we can think about dropping tables like:
* user_credit
* offer
Change-Id: Id6cda81a701348db2a6b8b26daa22ae9c4f87cb4
Limit bucket name lookup to date range of the calling methods since we only need distinct bucket names for that time period.
Adds new index and removes an index specific to project ID since it is no longer needed.
Change-Id: Ic07bbfb1c32280e0c0e39f8da020b284e1e5d974
Delete satellite order methods and DB tables which aren't used anymore
after we have done a refactoring on the orders to stuck bucket
information in the orders' encrypted metadata.
There are also configuration parameters and a satellite chore that
aren't needed anymore after the orders refactoring.
Change-Id: Ida3682b95921df70792284b42c96d2508bf8ca9c
using redash i manually checked that the only times the sum of
the payments does not match the paid column is for 2020-12 and
if it does not match then there are no payments.
Change-Id: I71ce0571de7e38e21548d7d6757b25abc3bfa781
The rollup archiver chore moves bucket bandwidth rollups and
storagenode rollups that are older than a given duration
to two new archive tables.
Change-Id: I1626a3742ad4271bc744fbcefa6355a29d49c6a5
This index is obsolete and duplicates a similiar (project_id, name)
index on the same table.
Moreover, it might confuse CockroachDB which of the two index to use,
which may might affect DB performance.
Change-Id: If8d1df8347714942cea9dca82864ba5f4973bed3
This is the first step in the removal of uptime columns on the
nodes table. These columns are no longer used:
uptime_success_count
total_uptime_count
uptime_reputation_alpha
uptime_reputation_beta
In order to avoid breaking backwards compatibility, we need to
remove all references to these columns before removing the columns
themselves from the database. However, since uptime_success_count
and total_uptime_count are NOT NULLABLE, we can't remove them from
the insert statements in the overlay. So we can't remove the columns
because of the references, and we can't remove the references because
the columns can't be null. What a pickle. To remedy this, we will set a
default on the columns. Then we should be able to remove them from the
insert statements
Change-Id: I75f6c56fb7897835bbf29869f86f39de1d9dd345
Now that the deprecated downtime tracking service is removed
(3fc76f4ffe), we can safely remove
the nodes_offline_times table.
Change-Id: Ia7c6efe32ba104dff5a830af5f2beee3337eefe5
We need to be able to list all buckets in DB without knowing project ID.
This method will be used to list buckets for metainfo loop
implementation based on metabase.
Change-Id: Iac75af0eee4f31e80a15577575a8249cbca787b2
these are unchanged from storj.io/dbx.
we're importing them because in a later commit we
will change them, and it'd be nice to see that
diff as a separate commit.
Change-Id: I8315130ed6bab397bd65b9a1a90c29d130b8c02d
this change tries really hard to never have all of the storage node
rollups in memory at the same time, up until the rollups are actually
getting summed together.
Change-Id: If67f49e7d71106798d996a6850b3e48671bd9e18
With the new phase 3 order submission, orders can be added to the
storage and bandwidth rollup tables at timestamps before the most recent
rollup was run. This change shifts the start time of each new rollup
window to account for any unexpired orders that might have been added
since the previous rollup.
A satellitedb migration is necessary to allow upserts in the
accounting_rollups table when entries with identical node_ids and
start_times are inserted.
Change-Id: Ib3022081f4d6be60cfec8430b45867ad3c01da63
We plan to add support for a new Reed-Solomon scheme soon, but our
repair queue orders segments by least number of healthy pieces first.
With a second RS scheme, fewer healthy pieces will not necessarily
correlate to lower health.
This change just adds the new column in a migration. A separate change
will add the new health function.
Right now, since we only support one RS scheme, behavior will not
change. Number of healthy pieces is being inserted as "segment health"
until the new health function is merged.
Segment health is calculated with a new priority function created in
commit 3e5640359. In order to use the function, a new config value is
added, called NodeFailureRate, representing the approximate probability
of any individual node going down in the duration of one checker run.
Change-Id: I51c4202203faf52528d923befbe886dbf86d02f2
It turns out we need to make 2 more changes in order for the new order submission phase 3 to get deployed.
This PR makes 2 changes:
1) when the rollup service deletes tallies, we now keep tallies around until orders expire (vs 1 day like before).
2) the reported rollup chore will now write the storagenode_bandwidth_rollups to a new table _phase2 as an intermediary step so it doesn't conflict with phase 3 order settlement.
These changes need to be deployed for 2 days before we can turn on phase 3 of the new orders settlement workflow.
Change-Id: Iafbff577ba7d55f8f17b7db857311b2ce799de60
This fixes a slow query that was taking up to 4 seconds in production
SELECT node_id, path, piece_num, root_piece_id, durability_ratio, queued_at, requested_at, last_failed_at, last_failed_code, failed_count, finished_at, order_limit_send_count
FROM graceful_exit_transfer_queue
WHERE node_id = '[redacted]'
AND finished_at is NULL
AND last_failed_at is NULL
ORDER BY durability_ratio asc, queued_at asc LIMIT 300 OFFSET 0;
Change-Id: Ib89743ca35f1d8d0a1456b20fa08c683ebdc1549
This change completes the column migration of
5f6fccc6e8 and
2f648fd981.
It resets every users project limits who are below or equal to our
current production defaults.
Change-Id: Ie041d08bb67b62844f6023190fc00bc2dad5b1cb
This PR adds the following items:
1) an in-memory read-only cache thats stores project limit info for projectIDs
This cache is stored in-memory since this is expected to be a small amount of data. In this implementation we are only storing in the cache projects that have been accessed. Currently for the largest Satellite (eu-west) there is about 4500 total projects. So storing the storage limit (int64) and the bandwidth limit (int64), this would end up being about 200kb (including the 32 byte project ID) if all 4500 projectIDs were in the cache. So this all fits in memory for the time being. At some point it may not as usage grows, but that seems years out.
The cache is a read only cache. When requests come in to upload/download a file, we will read from the cache what the current limits are for that project. If the cache does not contain the projectID, it will get the info from the database (satellitedb project table), then add it to the cache.
The only time the values in the cache are modified is when either a) the project ID is not in the cache, or b) the item in the cache has expired (default 10mins), then the data gets refreshed out of the database. This occurs by default every 10 mins. This means that if we update the usage limits in the database, that change might not show up in the cache for 10 mins which mean it will not be reflected to limit end users uploading/downloading files for that time period..
Change-Id: I3fd7056cf963676009834fcbcf9c4a0922ca4a8f
Our current endpoints bail on us, if the column data is null. Thus we need
to take the intermediate step and set the default to a fixed value and
reset those with the following release.
It sets the default column value to our current config values of 50GB
for storage and bandwidth and 100 buckets, while still enabling the field to be nullable.
All 0 values are migrated to be the default as well to ensure they can
keep using their projects, as with the original change, 0 actually means 0.
Change-Id: I797be80ce2d2105091599dc1b3fc76f74336b66b
Currently we have no way to actually set one
of the following limits to 0 (meaning not usable):
- maxBuckets
- usageLimit
- bandwidthLimit
With having the field nullable,
NULL corresponds to the global default,
0 now actually 0 and
a set value determines a custom limit.
Change-Id: I92bb77529dcbd0881ae8368921be9d246eb0919e
Add online score used for the new audit history offline tracking system
to the nodes table. This allows us easy access to the node's online
score for the storagenode dashboard as well as for data analysis.
Change-Id: Ie99be1192e5236862a5b3dbed2e5ef03b9169410
It's an obsolete table from earlier state of Stripe invoices
implementation. No code is currently using it. It is confirmed that this
table is currently empty across all satellites.
Change-Id: I12d2756578faf8418ea8f3b09088e885694b8925
Jira: https://storjlabs.atlassian.net/browse/USR-822
This the last step of dropping these 2 db tables. It also deletes all
code associate with them.
Change-Id: I8be840dc2a7be255cf6308c9434b729fe4d9391e
Why: We need a way to cut down on database traffic due to bandwidth
measurement and tracking.
What: This changeset is the Satellite side of settling orders in 1 hr windows.
See design doc for more details: https://review.dev.storj.io/c/storj/storj/+/1732
Change-Id: I2e1c151e2e65516ebe1b7f47b7c5f83a3a220b31
What:
Use the github.com/jackc/pgx postgresql driver in place of
github.com/lib/pq.
Why:
github.com/lib/pq has some problems with error handling and context
cancellations (i.e. it might even issue queries or DML statements more
than once! see https://github.com/lib/pq/issues/939). The
github.com/jackx/pgx library appears not to have these problems, and
also appears to be better engineered and implemented (in particular, it
doesn't use "exceptions by panic"). It should also give us some
performance improvements in some cases, and even more so if we can use
it directly instead of going through the database/sql layer.
Change-Id: Ia696d220f340a097dee9550a312d37de14ed2044
This system tracks an abstract "api version" from nodes based on
their usage, allowing us to have latching behavior where if a node
ever uses a new api, it can be blocked from using the old api.
This is better than using self-reported semver version information
because the node cannot lie, there's no confusion about what semver
version implies which features, no questions about dev and ci
environments, and no dependencies between reporting the version
and using the new api.
Change-Id: Ifeced5c9ae8e0a16102d79635e176a7d3bdd8ed4
When a request comes in on the satellite api and we validate the
macaroon, we now also check if any of the macaroon's tails have been
revoked.
Change-Id: I80ce4312602baf431cfa1b1285f79bed88bb4497
add new columns `offline_suspended` and `under_review` to nodes table.
`unknown_audit_suspended` is a new column which will replace `suspended`
Change-Id: I22ddeb338ea0ff63f14332a7ebd0f3e9e4c06cdc
This ensures that rows are closed to avoid leaks.
Also verifies that Err() is called, to ensure that no
error is left behind.
Change-Id: Idd1bec9bf479f40021da67b2c80ce83033149469
To avoid including multiple months in a single invoice, we need all
inspector's invoice commands to run in for specific period.
See https://storjlabs.atlassian.net/browse/USR-725
Change-Id: I3637dc189234f02350daca8d897c21765762ea55
This reverts commit 105dc7acc6.
Reason for revert: Recent changes to the Postgres query plan seems to want to use this index now. Reverting until we have time to analyze what's happening.
Change-Id: I74b4b5a8f15c3850d8a958a29f51dbc80e7c282c
The goal of this change is to improve the storagenode_storage_tallies table by removing the unneeded id column that is not being used but only taking up space, and also to add an index on a different column that needs it. Removing and adding a column seems simple, but ended up being more complicated because of some cockroachdb limitations.
The cockroachdb limitation when trying to remove a column from a table and create a new primary key are:
1. only allows primary key creation at table creation time (docs: https://www.cockroachlabs.com/docs/stable/primary-key.html)
2. table drop or rename is performed async and cannot be done in a transaction (issue: https://github.com/cockroachdb/cockroach/issues/12123, https://github.com/cockroachdb/cockroach/issues/22868)
To address these differences between cockroachdb and Postgres, this PR performs different migrations for the two database. The Postgres migration is straight forward and what you would expect, but the cockroach migration has two main changes:
1. To change a primary key, use the recommended process from the cockroachdb docs to create a new table with the new primary key you want and then migrate the data.
2. In order to do 1, we needed to do the new table renaming in a separate transaction from the data migration.
Ref: SM-65
Change-Id: Idc9aee3ab57aa4d5570e3d2980afea853cd966bf
Make sure that suspended nodes are treated appropriately by the overlay
cache. This means we should expect the following behavior:
* suspended nodes (vetted or not) should not be selected for uploading
new segments
* suspended nodes should be treated by the checker and repairer as
"unhealthy", and should be removed upon successful repair
This commit also removes unused overlay functionality.
Fixes a bug with commit 8b72181a1f where
the audit reporter was automatically suspending nodes regardless of
audit outcome (see test added).
Tests:
* updates repair tests to ensure that a suspended node is treated as
unhealthy and will be removed from the pointer on successful repair
* updates overlay tests for KnownUnreliableOrOffline and KnownReliable
to expect suspended nodes to be considered "unreliable"
* adds satellitedb test that ensures overlay.SelectStorageNodes and
overlay.SelectNewStorageNodes do not include suspended nodes
* adds audit reporter test to ensure that different audit outcomes
result in the correct suspended/disqualified states
Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561
My understanding is that the nodes table has the following fields:
- `address` field which can be a hostname or an IP
- `last_net` field that is the /24 subnet of the IP resolved from the address
This PR does the following:
1) add back the `last_ip` field to the nodes table
2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time.
3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date
4) try to reduce confusion about hostname, ip, subnet, and address in the code base
Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac
The migration was broken into one migration per table to reduce table locking and reduce the
chances of failure due to SQL timeouts.
Of the 14 fields that lacked time zones, only the 3 named 'interval_start` seemed to have non-UTC data in them.
These fields are fixed in the migration by removing the +00 and adding AT TIME ZONE current_setting('TIMEZONE')
Field with good data are migrated by adding AT TIME ZONE 'UTC'
Note that postgres's timezone() is different than cockroach's timezone() so AT TIME ZONE is used.
https://storjlabs.atlassian.net/browse/SM-104
Change-Id: I410f2f1d7c11b143f17844347f37e6f4b1e70fce
these tables are used in future commits with respect to the new
storagenode payments code. if we create them now, it will make
backfilling them with historical data easier.
Change-Id: I3c08c9770ec5b2baa38b4f2fd18c2f07746a61c2
Add a column to the repair queue table in the satellite db for healthy
piece count. When an item is selected from the repair queue, the least
durable segment that has not been attempted in the past hour should be
selected first. This prevents our repairer from getting stuck doing work
on segments that are close to the repair threshold while allowing
segments that are more unhealthy to degrade further.
The migration also clears the repair queue so that the migration runs
quickly and we can properly account for segment health in future repair
work.
We do not select items off the repair queue that have been attempted in
the past six hours. This was changed from on hour to allow us time to
try a wider variety of segments when the repair queue is very large.
Change-Id: Iaf183f1e5fd45cd792a52e3563a3e43a2b9f410b
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue rather than database driven.
For more details, see the added fast billing changes blueprint.
It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.
Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.
this behavior was problematic becaue we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:
1. bound the amount of time we will retry a transaction
2. create new transactions when a retry is needed
the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's more preferrable to
find a way to make them smaller.
the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.
between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.
Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
before dbx would generate a compilcated blob of conditions that
encoded a row comparison, which only optimized to an index seek
on cockroachdb. this means that sqlite and postgres both had
quadratic behavior on paged queries of this form. instead, use
the implicit row construction feature supported in all of the
databases to do paged support so that they all optimize well.
Change-Id: Iac8703929ba2a59ee3ffa619b916d12663422887
Limits how many times metainfo APIs can be called per second by project ID. If limit is exceeded, the API will return Unauthorized/Too Many requests.
Limit per second and the size of the limiter cache per project are configurable, as well as whether the limiter is enabled.
Tests added/updated for the new rate_limit field in projects table.
Tests added for exceeding limits and disableing limiter.
Change-Id: Ic8ad102de3b690a475809d4f684156d5715f20fa