Commit Graph

76 Commits

Author SHA1 Message Date
Natalie Villasana
3cc1a0b393 dbutil/cockroachutil: make crdb retry when tcp error
Adds checks in the existing NeedsRetry cockroachutil method
to see if the error is from the net package and a tcp error,
in which case it should retry. Also adds checks for if the
error contains EOF-- we don't retry in this case because it's
unclear if the query succeeded or failed.
These checks are needed to prevent our app from crashing when
it gets "connection reset by peer" and other tcp errors that
we've seen frequently coming from the metainfo loop.

Change-Id: I194d4b120082393cb6dbda2dd86b44f4696a66a4
2021-02-08 08:49:22 +00:00
Egon Elbre
12055e7864 all: minor cleanups
Change-Id: I4248dbe36a62a223b06135254b32851485a2eec1
2020-12-16 10:47:46 +00:00
Ethan Adams
f90ea10a4a
Allow for DB application names per process. (#3983) 2020-12-04 11:24:39 +01:00
JT Olio
0ba516d405 satellite: support pointing db components at different databases
the immediate need is to be able to move the repair queue back out
of cockroach if we can't save it.

Change-Id: If26001a4e6804f6bb8713b4aee7e4fd6254dc326
2020-11-28 18:39:16 +00:00
Egon Elbre
cbc1922590 private/dbutil/pgtest: use round robin to pick databases
Currently we were picking databases randomly for testing,
however a round-robin picking might have more predictable
behavior and cause less cockroach timeouts.

Change-Id: I74ac0d5b38c89452d3c46d3811330e46e7449514
2020-11-06 12:55:55 +00:00
Egon Elbre
7183dca6cb all: fix defers in loop
defer should not be called in a loop.

Change-Id: Ifa5a25a56402814b974bcdfb0c2fce56df8e7e59
2020-11-02 15:06:38 +02:00
Egon Elbre
caefde6b32 private/{dbutil,tagsql}: pass ctx to database opening
Database opening usually dial and hence we should pass ctx to them.

Change-Id: Iaa2875981570d83e65be3710f841cf30349f807b
2020-10-29 10:51:29 +00:00
Egon Elbre
4e8d53c8fb private/dbutil/pgutil: ensure storagenode doesn't depend on pgx
pgx is a large dependency and there's no need to include it in
storagenode binary.

Change-Id: I49c304c6420733d5f095d7edb35d32811210e41a
2020-09-30 14:28:47 +00:00
Egon Elbre
c23a8e3b81 go.mod: update pgx to v4.9.0
Fix query to use TextArray instead of VarcharArray.
Fix queries to use the correct type.

Change-Id: Ibb7e55adba277d05778118d81ca697470e72c374
2020-09-29 19:03:08 +00:00
Egon Elbre
2d27bc8787 satellite/satellitedb: separate cockroach for migration tests
Currently Cockroach migration test is the most heavy with regards to
schema changes. This causes other tests to time out. This adds an
alternate cockroach instance that is used for migration tests.

Change-Id: I01fe9313527ff002f0bb0914dd52c3645b8eaf6d
2020-09-29 09:31:33 +00:00
stefanbenten
4645805b18 private/dbutil: set connMaxLifetime to 30 minutes
To prevent longlived unused connections, set the maximum time to 30 minutes to
prevent proxies and loadbalancers forcefully cutting the connection.
This helps in scenarios with low load/requests to a DB.

Change-Id: I7dba15ef97f6f6541e872a6fb1d3a9bbbfe5bb50
2020-08-28 18:00:41 +00:00
Egon Elbre
94a09ce20b all: add missing dots
Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a
2020-08-11 17:50:01 +03:00
paul cannon
fd7bfc94fe private/dbutil: don't sort column names in an index
The order in which column names appear in an index should be
deterministic (for both our sqlite and postgresql code). Also, the order
is very relevant as to whether a given schema is correct.

Change-Id: I227ea057fcd9c3e967dd241a7e1c787d1bc4baa1
2020-07-17 10:07:01 +00:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Jennifer Johnson
784a156eea satellite: prevents uplink from creating a bucket once it exceeds the max bucket allocation.
Change-Id: I4b3822ed723c03dbbc0df136b2201027e19ba0cd
2020-07-15 17:27:05 +00:00
stefanbenten
257855b5de all: replace == comparison with errors.Is
Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714
2020-07-14 15:50:25 +00:00
paul cannon
bbdb351e5e all: use jackc/pgx in place of lib/pq
What:

Use the github.com/jackc/pgx postgresql driver in place of
github.com/lib/pq.

Why:

github.com/lib/pq has some problems with error handling and context
cancellations (i.e. it might even issue queries or DML statements more
than once! see https://github.com/lib/pq/issues/939). The
github.com/jackx/pgx library appears not to have these problems, and
also appears to be better engineered and implemented (in particular, it
doesn't use "exceptions by panic"). It should also give us some
performance improvements in some cases, and even more so if we can use
it directly instead of going through the database/sql layer.

Change-Id: Ia696d220f340a097dee9550a312d37de14ed2044
2020-07-13 15:54:41 +00:00
Egon Elbre
9dc9cd8a17 tests: allow STORJ_TEST_POSTGRES
STORJ_POSTGRES_TEST naming was not consistent with STORJ_SIM_POSTGRES.

This allows to use STORJ_TEST_POSTGRES for clarity, it still has a
fallback to STORJ_POSTGRES_TEST.

Change-Id: I6f294c66c80fcfd6750fea2a89795f3b7f5dd691
2020-07-10 16:43:49 +03:00
Stefan Benten
9dbd511396
private/dbutil: reduce db connection defaults (#3920) 2020-07-08 19:59:42 +02:00
Egon Elbre
1ed5a1bac5 satellite/satellitedb/satellitedbtest: skip omitted database
The first implementation missed some changes.

Change-Id: I7ae696175e0a9ea46954970ba8547638a05ed5a9
2020-06-11 13:28:16 +00:00
Ivan Fraixedes
dc5502cb81 private: Prepare pkg for enabling gosec
Prepare package for enabling gosec linter.

Change-Id: I0cce91d83969385f95e5bf82269d6c23629e04a0
2020-06-11 12:00:52 +00:00
Egon Elbre
1c30efd3a1 private/testplanet: allow setting "omit" as database to reduce output
Change-Id: I7af90fdefe2ff2df1340aa2b17f40806d889ca18
2020-06-09 12:41:58 +03:00
Egon Elbre
36c461bd59 private/tagsql: track proper closing of rows and statements
This ensures that rows are closed to avoid leaks.
Also verifies that Err() is called, to ensure that no
error is left behind.

Change-Id: Idd1bec9bf479f40021da67b2c80ce83033149469
2020-06-05 18:25:43 +00:00
Yaroslav Vorobiov
09ca382abf storagenode/db: preflight improve index discovery
Change-Id: I876b321f6cd4e91dfced87aa4d39f2cf9a8e63d0
2020-06-05 14:03:25 +03:00
Ethan
b1bb665c78 satellite/metainfo: Handle "server is not accepting clients" error during CRDB node rejoins
https: //storjlabs.atlassian.net/browse/SM-1035
Change-Id: I27243b0d8fc3250916c86ceb915f973cbf80f656
2020-05-29 16:21:56 +00:00
Natalie Villasana
8bd4d7b43e storage/cockroachkv: add check if retry is needed during iteration
This changeset replaces https://review.dev.storj.io/c/storj/storj/+/1839
which did the same thing but Nat couldn't figure out how to fix conflicting
files the correct gerrity way.

Change-Id: If05a8902aca986ea9f6c9168a90b31beebab839a
2020-05-26 14:32:06 -04:00
Jeff Wendling
074649835b satellite/satellitedb: add some docs and improve some snapshots
This attempts to add a README.md to help create consistent migrations
that maximize our test coverage and do not include unnecessary
statements.

It also adds a feature to have an `-- OLD DATA --` section as well
as a `-- NEW DATA --` section so that we can fix mistakes made in
previous snapshots (like a row that was forgotten to be added when a
table was created) without editing them going forward.

Change-Id: I28a786f8ef163cae1de1bb08f61af1e1104b0a88
2020-05-22 21:27:36 +00:00
Natalie Villasana
2514d6328d dbutil/cockroachutil: add monkit to QueryContext
This will help us keep track of crdb errors in influx.

Change-Id: I997596aa4eb9a2b9b81305d123c3452ecdf5dde5
2020-05-20 14:56:25 -04:00
Natalie Villasana
8d87a6efc9 cockroachutil/driver: handle retryable errors returned from Next
This will only work if retryable errors are returned on the first
call to Next. Otherwise if they're returned later, we will need
deeper changes at the application code level throughout the
codebase 😬👎

Change-Id: I46d795a13670f66b7f085605ba1b779f69c339c3
2020-05-15 14:49:43 -04:00
Egon Elbre
85c45cd56f private/dbutil/pgtest: support multiple databases for testing
Currently Cockroach isn't performant for concurrent database setup and
tear-down. Instead of a single instance allow setting multiple potential
connection strings and let the tests pick one connection string
randomly.

This improves test duration by ~10 minutes.

While we are at significantly changing how pgtest works, introduce
helper PickPostgres and PickCockroach for selecting the database to
reduce code duplications in multiple places.

Change-Id: I8ad171d5c4c8a4fc081ec2ae9bdd0cc948a80619
2020-04-28 21:55:49 +03:00
Jeff Wendling
e33da90879 private/dbutil/cockroachutil: stop checking for jackc/pgx
we do not use that driver, and removing the case from the
type assertion reduces the satellite binary size by 5%.

Change-Id: I1c1b5e1e57dc4a98415103cfddd4f8c091588573
2020-04-10 07:19:02 +00:00
Jeff Wendling
d658a6a6ec private/dbutil/txutil: fix logic in transaction retries
before this change, any transaction that took longer than 5 minutes
even if it succeeded, would get a retry error included in the
result.

try to make the logic more clear and add comments for the reader.

Change-Id: Ib84a89a33907a24426ecf52c90404be0e0dfa307
2020-04-09 13:58:53 +00:00
Egon Elbre
8f73fb7a32 all: simplify uuid usage
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.

Replace uses of dbutil.BytesToUUID with uuid.FromBytes.

Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
2020-04-02 05:48:58 +00:00
Egon Elbre
0a69da4ff1 all: switch to storj.io/common/uuid
Change-Id: I178a0a8dac691e57bce317b91411292fb3c40c9f
2020-03-31 19:16:41 +03:00
Jeff Wendling
97e980cd8a private/dbutil: add database name to configure as a tag
storagenodes have like 10 or more databases. without this
tag they all get sent as the same value, stomping on each
other.

Change-Id: Ib12019684d6ea8f2a5b83df584056dfa79e3c4b3
2020-03-26 16:50:15 +00:00
Jeff Wendling
41887883f3 satellite/satellitedb: check indexes on migration
Change-Id: I5ba7ae2b512d77c70405ce332158f12128e27eed
2020-03-13 10:45:22 +00:00
Jeff Wendling
443aa08a06 private/dbutil/txutil: remove the individual retry events
Change-Id: I63d06e57d7e6723b4d00d51f77c46345a11c4671
2020-03-03 08:38:19 +00:00
Jeff Wendling
948589d38b private/dbutil/txutil: include details about retry attempts in error
Change-Id: I978ae44c4890df31185ec6077c9fb3a2b2fce8f1
2020-02-17 14:18:13 +00:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Jeff Wendling
d20db90cff private/dbutil/txutil: create new transactions for retries
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.

this behavior was problematic becaue we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:

    1. bound the amount of time we will retry a transaction
    2. create new transactions when a retry is needed

the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's more preferrable to
find a way to make them smaller.

the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.

between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.

Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
2020-02-01 18:34:28 +00:00
paul cannon
5a1838bc28 private/dbutil: retry single statements on cockroachdb
This ought to make it so that all single statements (Exec- or Query-) on
a CockroachDB backend will get retried as necessary. As there is no need
for savepoints to be allocated or released in this case, there is no
round-trip overhead except when statements actually do need to be
retried.

Change-Id: Ibd7f1725ff727477c456cb309120d080f3cd7099
2020-01-24 09:01:47 +00:00
paul cannon
fd84fa6316 private/dbutil: rollback pending transactions on panic
We don't do a lot of panicking in our main code, so hopefully this won't
matter much, but we /do/ call panic a lot in our tests (t.Fatal,
require.NoError, etc). And when that happens, we need pending
transactions to be aborted or we can get into a deadlock situation when
something else tries to /Close/ that connection.

Change-Id: Idaf0d543ac95afea34f9b2393d1187f5322e9f0f
2020-01-23 16:30:19 +00:00
Jeff Wendling
3b86917cc9 private/dbutil/pgutil: faster cockroach constraint finding
Change-Id: Ia100b9ef7d2d59dfad0389feb8f2e7c47c2c4c9b
2020-01-22 15:47:04 +00:00
Egon Elbre
1279eeae39 private/tagsql,storage: fixes to context cancellation
Replace all the remaining uses of sql.DB with tagsql.DB to
fix issues with context cancellation.

Introduce tagsql.Open which helps to get rid of all tagsql.Wrap-s.
Use tagsql in cockroachkv and postgreskv.

Change-Id: I8946d203341cb85a25976896fc7881e1f704e779
2020-01-20 15:44:39 +02:00
Egon Elbre
ee0293c212 private/dbutil/sqliteutil: add missing err check
Change-Id: Ie18c76d0e6d02a5c55e2d6503437b8a07b47a64e
2020-01-19 19:24:58 +00:00
Egon Elbre
1abfe42142 satellite: use tagsql
Change-Id: I2170dee409fb0c2fe85913ddd36e7811a3b853ed
2020-01-19 14:39:16 +02:00
Egon Elbre
59d06644b9 private/migrate: switch to tagsql
Also added temporary types withRebind and withTagTx,
which will be later removed. Currently they help to avoid
changing the whole codebase at the same time.

Change-Id: I7f07ba8f4709a23a463bfa67464628665a05808f
2020-01-19 14:39:16 +02:00
Egon Elbre
5fd833b108 private/dbutil: remove basic Query
dbschema.Query is used only for testing and sqlite,
so this won't cause us problems in production.

Change-Id: Ib296a7daf161a9d3de23a7dfdc4f505d47ac4a37
2020-01-19 14:39:16 +02:00
Egon Elbre
5d80e22af9 private/tagsql: implement wrapper for sql.DB
Wrapper adds tracing and fixes context usage issues.

Change-Id: Ie6f7650eac87e2a2b64b760198498ba5857ad535
2020-01-17 13:52:12 +00:00
Egon Elbre
64fb2d3d2f Revert "dbutil: statically require all databases accesses to use contexts"
This reverts commit 8e242cd012.

Revert because lib/pq has known issues with context cancellation.
These issues need to be resolved before these changes can be merged.

Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555
2020-01-15 07:28:00 +00:00