Commit Graph

171 Commits

Author SHA1 Message Date
Jeff Wendling
91698207cf storagenode: live tracking of order window usage
This change accomplishes multiple things:

1. Instead of having a max in flight time, which means
   we effectively have a minimum bandwidth for uploads
   and downloads, we keep track of what windows have
   active requests happening in them.

2. We don't double check when we save the order to see if it
   is too old: by then, it's too late. A malicious uplink
   could just submit orders outside of the grace window and
   receive all the data, but the node would just not commit
   it, so the uplink gets free traffic. Because the endpoints
   also check for the order being too old, this would be a
   very tight race that depends on knowledge of the node system
   clock, but best to not have the race exist. Instead, we piggy
   back off of the in flight tracking and do the check when
   we start to handle the order, and commit at the end.

3. Change the functions that send orders and list unsent
   orders to accept a time at which that operation is
   happening. This way, in tests, we can pretend we're
   listing or sending far into the future after the windows
   are available to send, rather than exposing test functions
   to modify internal state about the grace period to get
   the desired effect. This brings tests closer to actual
   usage in production.

4. Change the calculation for if an order is allowed to be
   enqueued due to the grace period to just look at the
   order creation time, rather than some computation involving
   the window it will be in. In this way, you can easily
   answer the question of "will this order be accepted?" by
   asking "is it older than X?" where X is the grace period.

5. Increases the frequency we check to send up orders to once
   every 5 minutes instead of once every hour because we already
   have hour-long buffering due to the windows. This decreases
   the maximum latency that an order will be reported back to
   the satellite by 55 minutes.

Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036
2020-08-19 19:42:33 +00:00
Egon Elbre
be3fd0147e storagenode/storagenodedb: database name in all preflight errors
Shorten the error strings and include database name in all potential
preflight errors.

Change-Id: Ic92ca1ec6e14ffbddb0a0cf89e357eec9532d27e
2020-08-18 16:31:19 +03:00
Yaroslav Vorobiov
4d2a505788 storagenode/db: explicitly open and create dbs
To prevent storagenode from implicitly recreating missing dbs and storage,
as such behaviour leads to audit failures. Do not allow storagenode to
start if any of dbs or storage is missing, corrupted, or dedicated storage disk is
unmounted, to get downtime instead.

Change-Id: Ic64e1f0ff4d8ef5b2fddbe7a7e53df4f4bd8652e
2020-07-24 14:08:47 +03:00
Egon Elbre
d8dcae3075 all: fix error checking
Change-Id: Ia0da1bbd6ce695139922f94096c2419281905e32
2020-07-16 19:13:14 +03:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
stefanbenten
257855b5de all: replace == comparison with errors.Is
Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714
2020-07-14 15:50:25 +00:00
Qweder93
ac716e1514 storagenode/heldamount: payment receipt added to monthly paystub, heldamount.service separated for service and endpoint
Change-Id: Id759586c6362edbef34c230d4f0d2585c11c9b47
2020-07-06 09:51:52 +00:00
Cameron Ayer
35b709ba18 storagenode/storagenodedb: check if db is nil before closing
In the event of an error in storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases
the caller will attempt to close all databases. However, the error prevents the DB from
being opened and set in the proper place. Attempting to close results in a nil pointer
dereference

https://forum.storj.io/t/node-wont-start-after-update-to-v1-6-4-runtime-error-invalid-memory-address-or-nil-pointer-dereference/7889

Change-Id: Ibfe6f3e13c36d9d15a0cb46e384f0120afdab60b
2020-07-02 15:02:38 +00:00
Qweder93
9b90712aa0 storagenode/heldamount: payents added to db
Change-Id: Ib6c486251ca08d34003c35379d10314127edf103
2020-06-30 17:24:35 +03:00
Qweder93
9a02149654 storagenode/heladamount: held history extended with joined_at date, total_held and total_disposed amounts
Change-Id: I41fe9ab8c5667aa988257a94848ea70225305d79
2020-06-30 13:33:25 +00:00
Yaroslav Vorobiov
09ca382abf storagenode/db: preflight improve index discovery
Change-Id: I876b321f6cd4e91dfced87aa4d39f2cf9a8e63d0
2020-06-05 14:03:25 +03:00
Egon Elbre
07050eea26 all: use common/storj
Change-Id: Id1e36d52f9807b5ffbb72ce73f4b60cb21b68a78
2020-05-29 11:57:32 +03:00
Moby von Briesen
dc57640d9c storagenode/piecestore: switch usedserials db for in-memory usedserials store
Part 2 of moving usedserials in memory
* Drop usedserials table in storagenodedb
* Use in-memory usedserials store in place of db for order limit
verification
* Update order limit grace period to be only one hour - this means
uplinks must send their order limits to storagenodes within an hour of
receiving them

Change-Id: I37a0e1d2ca6cb80854a3ef495af2d1d1f92e9f03
2020-05-28 12:52:52 -04:00
Qweder93
73214c6d1c storagenode/heldamount: heldhistory reworked to all satellites
Change-Id: I8d7707fddfbdc52d29951a8a002978c7fbb07049
2020-05-28 11:44:26 +00:00
Qweder93
f2a0c64425 storage/filestore: log potential disk corruption
In walkNamespaceWithPrefix log in case of "lstat" error, because this may indicate an underlying disk corruption.

SG-50

Change-Id: I867c3ffc47cfac325ae90658ec4780d213ff3e63
2020-05-27 12:12:55 +00:00
crawter
f5ac678b0a storagenode/satellitesdb: added FK constraint to satelliteID
Change-Id: If5adf2b92627fcf80850670ba672b346320ddd87
2020-05-21 13:01:20 +00:00
crawter
2c9afe7f17 storagenode/console/api/helamount: periods with heldamount data endpoint added
Change-Id: Ie893f56f02c7a76bcfc21c32c10bd1f1d05660e7
2020-05-20 11:45:06 +00:00
Qweder93
49ad90dcd8 storagenode/reputation: unknown_score (unknown_alpha / unknown_alpha+unknown_beta) added to reputation stats, https://storjlabs.atlassian.net/browse/SG-326
Change-Id: I0b29ad736f7a11c7e57a846b6891f4b40755aa48
2020-05-20 11:25:14 +00:00
Jeff Wendling
efee3563a8 storagenode/storagenodedb: fix joined_at migration
Change-Id: I9091741bbec6e7a2fc6addc23249a40176ec040a
2020-05-05 15:01:07 -06:00
Qweder93
b754806b23 storagenode/reputation: unknown_audit_reputation_alpha and beta added to db, and reputation endpoint
Change-Id: I5d92268851bb9a202cc5d6b4d403467e6f692726
2020-05-05 15:48:04 +00:00
Qweder93
16cd9b06ec storagenode/heldamount: added api for heldamount history separated by periods
Change-Id: I170010364269822848bc6cd051e0e0fb3df95d91
2020-05-05 12:29:44 +00:00
Egon Elbre
c630cf2490 storagenode/pieces: implement buffering for writing
Currently uploads can cause a lot of IOPS, reduce this by introducing a
in-memory buffer on-top of the file.

Change-Id: I5f4e3e01c0a36258271d180b922107de447bcb59
2020-05-04 06:01:32 +00:00
Jeff Wendling
4ad01e8170 storagenode/storagenodedb: backfill reputation.joined_at
it was being used in ways that implied it should be NOT NULL
even though it was possibly null. we used to get this data
from the satellite db's added_at column as seen in 30369b02,
so backfill it using that data where joined_at is NULL, and
then alter the table to constrain the column to be NOT NULL.

Fixes #3866.

Change-Id: If2d856189209740d985f71dada7b93525e625ef3
2020-05-01 17:59:53 +00:00
Jeff Wendling
202e65e408 storagenode/storagenodedb: do correct generalized alter table procedure
According to the docs at https://www.sqlite.org/lang_altertable.html
doing the steps

    1. Rename old table
    2. Create new table
    3. Copy data
    4. Drop old table

is incorrect and should be

    1. Create new table
    2. Copy data
    3. Drop old table
    4. Rename new into old

Additionally, each step was being run in a different transaction,
which could cause permanent failures if a problem happened during
the migration.

Avoid both of those problems by changing up some previous migrations
that ran in this way. Since they are semantically identical, it's
fine to change up these old migrations. It will help make newer
nodes coming up for the first time more robust.

Change-Id: I43fb004fa1b6cb2fe2554f9920925420da28fb4a
2020-05-01 13:53:26 +00:00
Egon Elbre
8928399d02 all: rename CreateTables to MigrateToLatest
CreateTables hasn't been quite true for a while now, rename to
MigrateToLatest to be clearer in it's behavior.

Change-Id: Ida48e95122a5d9b7a814e922d3698e00024a2ba7
2020-04-30 07:21:17 +00:00
Egon Elbre
85c45cd56f private/dbutil/pgtest: support multiple databases for testing
Currently Cockroach isn't performant for concurrent database setup and
tear-down. Instead of a single instance allow setting multiple potential
connection strings and let the tests pick one connection string
randomly.

This improves test duration by ~10 minutes.

While we are at significantly changing how pgtest works, introduce
helper PickPostgres and PickCockroach for selecting the database to
reduce code duplications in multiple places.

Change-Id: I8ad171d5c4c8a4fc081ec2ae9bdd0cc948a80619
2020-04-28 21:55:49 +03:00
Qweder93
805e328c47 storagenode/heldamount payments removed
Change-Id: I87cc04f43d182a4190a571ef417be85d02db9d34
2020-04-21 17:15:31 +00:00
Qweder93
30369b027c storagenode/storagenodedb/reputation: add joined_at
Change-Id: Ic471fac97bf54b537f2c34f24b4069b0641c746d
2020-04-17 12:12:09 +00:00
Kaloyan Raev
a2ce836761 remove sugar logging
Change-Id: I6b6ca9704837cb3f5f5449ba7f55661487814d9f
2020-04-15 12:37:47 +00:00
Qweder93
743b3fb226 storagenode/nodestats: add pricing model, storagenode/cache: add paystub history storing
Change-Id: I9bc104a1407c8f286a964c796656d89b122bf752
2020-04-14 19:04:00 +03:00
Moby von Briesen
14b3704f56 storagenode: add suspended status to storagenode dashboard/api
* Add migration to storagenode reputation table to add suspended
timestamp
* Send suspended info to storagenode from satellite nodestats endpoint
* Add suspended status to storagenode api
* Add an indicator on the storagenode dashboard informing operator of
the satellites the node is suspended on

Change-Id: Ie3669f6069cc0258ba76ec99d17006e1b5fd9c8a
2020-04-09 13:36:23 +00:00
Egon Elbre
11a44cdd88 all: don't depend on gogo/proto directly
Change-Id: I8822dea0d1b7b99e0b828e0373a0308a42dde2be
2020-04-08 17:32:15 +00:00
Stefan Benten
0a4d253990
storagenode/storagenodedb: Improve preflight schema error message (#3844) 2020-04-03 11:20:24 +02:00
Egon Elbre
8f73fb7a32 all: simplify uuid usage
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.

Replace uses of dbutil.BytesToUUID with uuid.FromBytes.

Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
2020-04-02 05:48:58 +00:00
Egon Elbre
0a69da4ff1 all: switch to storj.io/common/uuid
Change-Id: I178a0a8dac691e57bce317b91411292fb3c40c9f
2020-03-31 19:16:41 +03:00
Jeff Wendling
97e980cd8a private/dbutil: add database name to configure as a tag
storagenodes have like 10 or more databases. without this
tag they all get sent as the same value, stomping on each
other.

Change-Id: Ib12019684d6ea8f2a5b83df584056dfa79e3c4b3
2020-03-26 16:50:15 +00:00
Michal Niewrzal
f0aeda3091 storj: remove from storj/pkg packages moved to storj/private repo
* debug
* traces
* cfgstruct
* process

Package `storj/private/version` will be removed as a separate change.

Change-Id: Iadc40faa782e6225513b28218952f02d9c240a9f
2020-03-24 09:56:29 +01:00
Qweder93
8597e6b512 storagenode/console/api period payment api extended
Change-Id: I18ec331c6a684e3a9351e3c917bacdb8b8f18c28
2020-03-19 16:51:31 +02:00
crawter
fde5c3542b storagenode/console/api: period payStub api extended
Change-Id: I624bbf7a9640f9df97789bea109201cbfb556753
2020-03-19 14:42:02 +02:00
crawter
45507d285b storagenode/storagenodedb: heldamount tests added
Change-Id: I862b0b38349a11254426855ffafb1f2f0845cb4c
2020-03-15 14:55:11 +00:00
Qweder93
5ccce04338 storagenode/storagenodedb: heldamount added
Change-Id: I213e3abffd7356bbfccb3f33bcbafa558674b8d9
2020-03-13 16:23:59 +00:00
Moby von Briesen
178dbb4683 storagenode/storagenodedb: allow storagenodes to start test_table exists
In many cases when a storagenode fails the preflight check, it is due to
test_table existing, which is used to determine read/write capabilities
after the initial schema verification. If preflight ends early due to a
failure or stopped storagenode, it may not get the chance to drop this
table.

This change excludes test_table from the schema comparison to ensure
that it never prevents a storagenode from starting up.

It also adds Preflight DB test for storagenode.

Change-Id: Ib8e71df2e42fda3b2a364fbf7a801891c5831d39
2020-03-09 14:29:46 -04:00
NikolaiYurchenko
2601f25c98 web/storagenode: notification logic implementation
Change-Id: Iec741997312203117213674ef85125fa8a976249
2020-02-21 15:49:27 +00:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Jeff Wendling
d20db90cff private/dbutil/txutil: create new transactions for retries
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.

this behavior was problematic becaue we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:

    1. bound the amount of time we will retry a transaction
    2. create new transactions when a retry is needed

the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's more preferrable to
find a way to make them smaller.

the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.

between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.

Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
2020-02-01 18:34:28 +00:00
Jeff Wendling
71ff044edb storagenode/bandwidth: fix tests to not fail for 10 hours near the end of the month
Change-Id: I390569a8702164c42edddd3be020e93782227c2e
2020-01-31 16:25:52 -07:00
Egon Elbre
d0b4272467 storagenode: fix global logger in tests
https://github.com/storj/storj/wiki/Testing#logging

Change-Id: Ic6a31360bcfedae3f37f6b2536a345f00e33cd78
2020-01-31 14:09:28 +00:00
Isaac Hess
4dafd03f11 storagenode: Prevent negative values in piece_space_used, migrate negatives to 0
Change-Id: Ibd663db087058c928190aa52c520f22e9338dd04
2020-01-30 13:03:18 -05:00
Jeff Wendling
21b65ca3b0 storagenode/storagenodedb: migrate to set total to content_size
Change-Id: I4906c2fe9cdb3a32c045c98039d4bde6b8b809e3
2020-01-30 08:53:12 -07:00
Isaac Hess
14fd6a9ef0 storagenode/pieces: Track total piece size
This change updates the storagenode piecestore apis to expose access to
the full piece size stored on disk. Previously we only had access to
(and only kept a cache of) the content size used for all pieces. This
was inaccurate when reporting the amount of disk space used by nodes.

We now have access to the total content size, as well as the total disk
usage, of all pieces. The pieces cache also keeps a cache of the total
piece size along with the content size.

Change-Id: I4fffe7e1257e04c46021a2e37c5adc6fe69bee55
2020-01-23 11:00:24 -07:00