Commit Graph

102 Commits

Author SHA1 Message Date
Egon Elbre
f237d70098 storagenode,satellite: use pkg/debug
Use debug.Server in storage node and satellite for customizing debug server.

Change-Id: I7979412376d028cadf29656d838ab94f18e2aa99
2020-01-29 16:30:31 -05:00
Ethan
149273c63f satellite/metainfo: add cache expiration for project level rate limiting
Allow rate limit project cache to expire so we can make project level rate limit changes without restarting the satellite process.

Change-Id: I159ea22edff5de7cbfcd13bfe70898dcef770e42
2020-01-29 16:14:10 +00:00
Egon Elbre
e319660f7a private/lifecycle: implement Group
lifecycle.Group implements controlling multiple items such
that their startup and close works.

Change-Id: Idb4f4a6c3a1f07cdcf44d3147a6c959686df0007
2020-01-29 00:37:33 +00:00
paul cannon
5a1838bc28 private/dbutil: retry single statements on cockroachdb
This ought to make it so that all single statements (Exec- or Query-) on
a CockroachDB backend will get retried as necessary. As there is no need
for savepoints to be allocated or released in this case, there is no
round-trip overhead except when statements actually do need to be
retried.

Change-Id: Ibd7f1725ff727477c456cb309120d080f3cd7099
2020-01-24 09:01:47 +00:00
Isaac Hess
2f77ce48f0 private/testplanet: Add databases to testplanet.databases near creation
We now close databases in testplanet in reverse order, knowing that some
caches and other objects need to close prior to the underlying db. Some
dbs were not being added near the list of closeable databases near their
creation, causing an issue with shutdown order.

Change-Id: I23391f4d77649030493e47bd7169002a72b3bf7a
2020-01-23 15:30:52 -07:00
Jeff Wendling
16bb374deb storagenode/piecestore: add large timeouts to read/write operations
this is to help protect against intentional or unintentional
slowloris style problems where a client keeps a tcp connection
alive but never sends any data. because grpc is great, we have
to spawn a separate goroutine for every read/write to the stream
so that we can return from the server handler to cancel it if
necessary. yep. really.

additionally, we update the rpcstatus package to do some stack
trace capture and add a Wrap method for the times where we want
to just use the existing error.

also fixes a number of TODOs where we attach status codes to the
returned errors in the endpoints.

Change-Id: Id8bb8ff84aa34e0f711b0cf9bce3908b36a1d3c1
2020-01-23 19:20:49 +00:00
Egon Elbre
89a148047d private/testplanet: shutdown databases in reverse order
Since we have caches on top of databases and they are included in the
databases list, we need to shut them down in-reverse order to avoid
issues with flushing to a closed database.

Change-Id: I3f23a527a2a5425638b1a7e2cab84741f019d493
2020-01-23 18:55:57 +00:00
paul cannon
fd84fa6316 private/dbutil: rollback pending transactions on panic
We don't do a lot of panicking in our main code, so hopefully this won't
matter much, but we /do/ call panic a lot in our tests (t.Fatal,
require.NoError, etc). And when that happens, we need pending
transactions to be aborted or we can get into a deadlock situation when
something else tries to /Close/ that connection.

Change-Id: Idaf0d543ac95afea34f9b2393d1187f5322e9f0f
2020-01-23 16:30:19 +00:00
Isaac Hess
40a890639d satellite/orders: Flush all pending bandwidth rollup writes on shutdown
Currently we risk losing pending bandwidth rollup writes even on a clean
shutdown. This change ensures that all pending writes are actually
written to the db when shutting down the satellite.

Change-Id: Ideab62fa9808937d3dce9585c52405d8c8a0e703
2020-01-23 08:12:41 -07:00
Egon Elbre
c6f94ce9e4 satellite/metainfo: remove support for boltdb based pointerDB
By previous changes we can now remove testplanet.New and
also remove metainfo boltdb support.

Change-Id: I5bdfbbbb45967492728e705b34b2fedb4f28c381
2020-01-23 13:54:00 +02:00
Egon Elbre
5a4745eddb all: remove usages of testplanet.New
Ensure that tests use testplanet.Run, so we always require running
against all database backends.

Change-Id: I6b0209e6a4912cf3328bd35b2c31bb8598930acb
2020-01-22 22:42:57 +02:00
Jeff Wendling
3b86917cc9 private/dbutil/pgutil: faster cockroach constraint finding
Change-Id: Ia100b9ef7d2d59dfad0389feb8f2e7c47c2c4c9b
2020-01-22 15:47:04 +00:00
Egon Elbre
fc2766eefc private/testplanet: flatten migration for running tests
Currently Cockroach DB setup takes a significant amount of time.
This flattens the database setup into a single query,
which improves the test time significantly.

The migration tests still test each migration separately.

Change-Id: Iaca16f34a6af3926fa2b5ebf618f939fd59460b3
2020-01-22 15:09:11 +00:00
Egon Elbre
8b3db70329 private/testplanet: increase metainfo rate limit
Rate limit was causing tests to fail due to making too many request.

Change-Id: Iafbc97b4880b6d98c86045b28ca7583d27f51720
2020-01-22 13:57:38 +00:00
Michal Niewrzal
6502454947 satellite/metainfo: move RS configuration to satellite
With this change RS configuration will be set on satellite. Uplink with
get RS values with BeginObject request and will use it. For backward
compatibility and to avoid super large change redundancy scheme stored
with bucket is not touched. This can be done in future.

Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff
2020-01-22 09:33:53 +00:00
Ethan
21a5d70a83 satellite/metainfo: Rate limiting - API requests
Limits how many times metainfo APIs can be called per second by project ID. If limit is exceeded, the API will return Unauthorized/Too Many requests.

Limit per second and the size of the limiter cache per project are configurable, as well as whether the limiter is enabled.

Tests added/updated for the new rate_limit field in projects table.
Tests added for exceeding limits and disableing limiter.

Change-Id: Ic8ad102de3b690a475809d4f684156d5715f20fa
2020-01-21 14:25:04 +00:00
Michal Niewrzal
86f194769f uplink: adjust to changes in storj/uplink
This change is adjusting code base to changes in storj/uplink.

https://review.dev.storj.io/c/storj/uplink/+/643

Change-Id: Ieca87f9f5983e391bf4b4fec8b9d5491fd32bfa1
2020-01-20 22:06:19 +00:00
Egon Elbre
c1c878efcf all: fix import groupings
check-imports was broken and didn't complain about things.

Change-Id: I38adafd16b4aba86f0eb4f53427b4393f9a6c710
2020-01-20 17:47:44 +00:00
Egon Elbre
1279eeae39 private/tagsql,storage: fixes to context cancellation
Replace all the remaining uses of sql.DB with tagsql.DB to
fix issues with context cancellation.

Introduce tagsql.Open which helps to get rid of all tagsql.Wrap-s.
Use tagsql in cockroachkv and postgreskv.

Change-Id: I8946d203341cb85a25976896fc7881e1f704e779
2020-01-20 15:44:39 +02:00
Egon Elbre
10d932fd65 lib/uplinkc: fix test flakiness by setting MaxTimeSkew
Not having a skew caused an issue where:

1. Uplink calls "begin segment", where segment isn't committed to the
database.
2. Uplink stores piece X to the storage node A with timestamp 1.
3. Satellite runs garbage collection with timestamp 2.
4. Satellite sends retain request to storage node A with timestamp 2.
5. Storage node A deletes piece X, because 1 < 2.
6. Uplink calls "commit segment" with storage node A in it.
7. Download of segment fails, because A doesn't have piece X.

In production this is not an issue since the MaxTimeSkew is 72h by
default.

Change-Id: Id87ca3ddc44103dcd85d031b1367168c014b8e7b
2020-01-20 12:44:42 +00:00
Egon Elbre
ee0293c212 private/dbutil/sqliteutil: add missing err check
Change-Id: Ie18c76d0e6d02a5c55e2d6503437b8a07b47a64e
2020-01-19 19:24:58 +00:00
Egon Elbre
1abfe42142 satellite: use tagsql
Change-Id: I2170dee409fb0c2fe85913ddd36e7811a3b853ed
2020-01-19 14:39:16 +02:00
Egon Elbre
25b76fe63f storagenode/storagenodedb: use tagsql
Change-Id: Iba3b34a97b982deb4f72ce55517a294f249b6b55
2020-01-19 14:39:16 +02:00
Egon Elbre
59d06644b9 private/migrate: switch to tagsql
Also added temporary types withRebind and withTagTx,
which will be later removed. Currently they help to avoid
changing the whole codebase at the same time.

Change-Id: I7f07ba8f4709a23a463bfa67464628665a05808f
2020-01-19 14:39:16 +02:00
Egon Elbre
5fd833b108 private/dbutil: remove basic Query
dbschema.Query is used only for testing and sqlite,
so this won't cause us problems in production.

Change-Id: Ib296a7daf161a9d3de23a7dfdc4f505d47ac4a37
2020-01-19 14:39:16 +02:00
stefanbenten
f4097d518c satellite: reduce logging of node status
Change-Id: I6618cf4bf31b856acd7a28b54011a943c03ab22a
2020-01-18 17:47:59 +00:00
Moby von Briesen
273eb66fae cmd/storagenode,storagenode/preflight: add config flag to disable
storagenode database preflight check.

Disable preflight database check by default, and have the option to
enable it. This will allow us to enable it once it is definitely
working.

Also change the name of the config flag for preflight  time sync.

Change-Id: Ie2e20f9e25dcb38794eafa7e1505e7c6ff287c99
2020-01-17 17:53:17 +00:00
Egon Elbre
5d80e22af9 private/tagsql: implement wrapper for sql.DB
Wrapper adds tracing and fixes context usage issues.

Change-Id: Ie6f7650eac87e2a2b64b760198498ba5857ad535
2020-01-17 13:52:12 +00:00
Cameron Ayer
4424697d7f satellite/accounting: refactor live accounting to hold current estimated totals
live accounting used to be a cache to store writes before they are picked up during
the tally iteration, after which the cache is cleared. This created a window in which
users could potentially exceed the storage limit. This PR refactors live accounting to
hold current estimations of space used per project. This should also reduce DB load
since we no longer need to query the satellite DB when checking space used for limiting.

The mechanism by which the new live accounting system works is as follows:

During the upload of any segment, the size of that segment is added to its respective
project total in live accounting. At the beginning of the tally iteration we record
the current values in live accounting as `initialLiveTotals`. At the end of the tally
iteration we again record the current totals in live accounting as `latestLiveTotals`.
The metainfo loop observer in tally allows us to get the project totals from what it
observed in metainfo DB which are stored in `tallyProjectTotals`. However, for any
particular segment uploaded during the metainfo loop, the observer may or may not
have seen it. Thus, we take half of the difference between `latestLiveTotals` and
`initialLiveTotals`, and add that to the total that was found during tally and set that
as the new live accounting total.

Initially, live accounting was storing the total stored amount across all nodes rather than
the segment size, which is inconsistent with how we record amounts stored in the project
accounting DB, so we have refactored live accounting to record segment size

Change-Id: Ie48bfdef453428fcdc180b2d781a69d58fd927fb
2020-01-16 10:26:49 -05:00
Jeff Wendling
78c6d5bb32 satellite/satellitedb: reported_serials table for processing orders
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.

the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.

after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.

since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.

Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
2020-01-15 19:21:21 -07:00
Yingrong Zhao
db8aee0806 satellite/contact; storagenode/preflight: add clock check on startup for storagenode
add config preflight.enabled-local-time

Change-Id: I7b942c9bee063aae409ee6721ae9d079dff0144f
2020-01-15 15:35:26 +00:00
Egon Elbre
08f63614be private/context2: add WithoutCancellation
Change-Id: I38557c16f41b8983886f256353cc6afb7634d9e6
2020-01-15 14:23:46 +02:00
Egon Elbre
64fb2d3d2f Revert "dbutil: statically require all databases accesses to use contexts"
This reverts commit 8e242cd012.

Revert because lib/pq has known issues with context cancellation.
These issues need to be resolved before these changes can be merged.

Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555
2020-01-15 07:28:00 +00:00
JT Olio
c01cbe0130 satellitedb: save out all db-touching traces
Change-Id: Ib1e192221f9da813fd9cbb55f620a047b82c9523
2020-01-14 18:47:45 -05:00
JT Olio
8e242cd012 dbutil: statically require all databases accesses to use contexts
this will allow for some nice runtime analysis down the road.
also, this allows for wrapping database handles in a way that
can interact with these contexts

requires https://review.dev.storj.io/c/storj/dbx/+/514

Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b
2020-01-14 18:20:47 -05:00
Egon Elbre
64f056bee4 private/dbutil/sqlutil: use context in queries
Change-Id: Icb92daa483d13e6d57013f3917571d476126bfd2
2020-01-14 20:27:09 +00:00
Egon Elbre
df9e53ea0b private: ensure we don't eat the underlying error
When error is formatted using %v it's not possible to check
whether the error was caused by a context cancellation.

Change-Id: I164d1c83cdf5e7e6eacf082145b5c6a47078d041
2020-01-14 20:26:51 +00:00
Egon Elbre
cd4ff0722e private/testplanet: use defaultInterval
Change-Id: Ife2810be46faaaf8cd51b193a859a88fff894a0e
2020-01-14 16:07:36 +00:00
Isaac Hess
4950d7106a satellite/orders: Add write cache for bw rollups
Change-Id: I8ba454cb2ab4742cafd6ed09120e4240874831fc
2020-01-13 22:40:51 +00:00
Egon Elbre
b9740f0c0a storage/cockroachkv: add ctx argument
Change-Id: Ib6c29f44722b0354afcd499a0e567f04aef7eb28
2020-01-13 15:57:47 +02:00
Egon Elbre
ff267168c5 private/migrate: add ctx argument
Change-Id: I3d65912d89261386413c494c7ed1576fed4dcaf4
2020-01-13 15:52:26 +02:00
Egon Elbre
24958bd7d3 satellite: add ctx to DB.CreateTables
Change-Id: I9ecad624cf5a7fc9c86bb91c68f96a3a4efd2e92
2020-01-13 15:31:09 +02:00
Egon Elbre
0835b9024c private/dbutil/pgutil: add ctx argument
Change-Id: Icfd56ca8c1f831ad56c0195a0b883e8f0618daaf
2020-01-13 15:27:06 +02:00
Egon Elbre
c7b846589e private/dbutil/sqliteutil: add ctx argument
Change-Id: If1caa9cde746817e62cae32a152eeec81959129c
2020-01-13 15:03:30 +02:00
Michal Niewrzal
b579c260ab cmd: rename "scope" flag to "access"
We decided that better name for "scope" will be "access". This change
refactors cmd part of code but don't touch libuplink. For backward
compatibility old configs with "scope" field will be loaded without any
issue. Old flag "scope" won't be supported directly from command line.

https://storjlabs.atlassian.net/browse/V3-3488

Change-Id: I349d6971c798380d147937c91e887edb5e9ae4aa
2020-01-10 15:27:53 +00:00
Natalie Ventura Villasana
6b1829f3c3
satellite/downtime: new chore estimates downtime
Adds EstimationChore to the downtime package, which is an
independent chore that finds offline nodes given a configurable
limit, then uptime checks those nodes, and sets a last contact
success or failure given a response. For failed nodes, the chore
updates the amount of downtime the node has been offline in the
DowntimeTracking table.

Design doc section: https://github.com/storj/storj/blob/master/docs/blueprints/storage-node-downtime-tracking.md#estimating-offline-time
Jira: https://storjlabs.atlassian.net/browse/V3-2545

Change-Id: I60af95803930bf9b33232b248bb20cca6f0e0b5f
2020-01-09 15:05:13 -05:00
Egon Elbre
8d8d57c3b5 mod: update sqlite module to v2.0.2
This updates SQLite amalgamation from 3.29.0 to 3.30.1.

The module contains fixes for races.

Change-Id: Ic6a06a43ba404de0091d8a2f7444a8f4b1d5d54c
2020-01-08 21:21:15 +02:00
Yingrong Zhao
76ee8a1b4c satellite: remove UptimeReputation configs from codebase
With the new storage node downtime tracking feature, we need remove current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and
UptimeReputationDQ. This is the first step of removing the uptime
reputation columns from satellitedb

Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36
2020-01-08 18:54:15 +00:00
Egon Elbre
082ec81714
uplink: move to storj.io/uplink (#3746) 2020-01-08 15:40:19 +02:00
Egon Elbre
cf2128d3b9 uplink: avoid cyclic dependency to storj.io
This helps to simplify splitting and running tests.

Change-Id: I4aaf077df7fd6bd6f14f10cb902850883349eaf5
2020-01-08 14:51:33 +02:00