while satellites have also run this logic, old satellites that
no longer exist cannot and so the node cannot get the updated
data. this locally migrates it so that the calculations for
the undistributed amounts are correct.
there's also some tab/space whitespace and gofmt fixes.
Change-Id: I470879703314fe6541eaba5f21b47849781894f8
Full scope:
storagenode/{console,nodestats,notifications,reputation,storagenodedb},
web/storagenode
These columns are deprecated. They used to be for the uptime reputation
system which has been replaced by downtime tracking with audits.
Change-Id: I151d6569577d89733ac97af21a1d885323522b21
Use tagsql.DB pointer as step database, to propagate changes
back and forth between actual database and migration.
Adds CreateDB operation to the migration step to be able to
create new dbs before executing migration action.
Adjusts storagenode database migration to use inner tagsql.DB
pointer of each database as step.DB.
Adjusts satellite dabase migration, adds proxy migrationDB field
to satellite db that wraps itself as tagsql.DB, pointer of which
is used as step.DB.
Change-Id: Ifed4de5b01a356cf7b37db64d2eaeb7b61982c5c
To prevent storagenode from implicitly recreating missing dbs and storage,
as such behaviour leads to audit failures. Do not allow storagenode to
start if any of dbs or storage is missing, corrupted, or dedicated storage disk is
unmounted, to get downtime instead.
Change-Id: Ic64e1f0ff4d8ef5b2fddbe7a7e53df4f4bd8652e
Part 2 of moving usedserials in memory
* Drop usedserials table in storagenodedb
* Use in-memory usedserials store in place of db for order limit
verification
* Update order limit grace period to be only one hour - this means
uplinks must send their order limits to storagenodes within an hour of
receiving them
Change-Id: I37a0e1d2ca6cb80854a3ef495af2d1d1f92e9f03
In walkNamespaceWithPrefix log in case of "lstat" error, because this may indicate an underlying disk corruption.
SG-50
Change-Id: I867c3ffc47cfac325ae90658ec4780d213ff3e63
Currently uploads can cause a lot of IOPS, reduce this by introducing a
in-memory buffer on-top of the file.
Change-Id: I5f4e3e01c0a36258271d180b922107de447bcb59
it was being used in ways that implied it should be NOT NULL
even though it was possibly null. we used to get this data
from the satellite db's added_at column as seen in 30369b02,
so backfill it using that data where joined_at is NULL, and
then alter the table to constrain the column to be NOT NULL.
Fixes#3866.
Change-Id: If2d856189209740d985f71dada7b93525e625ef3
According to the docs at https://www.sqlite.org/lang_altertable.html
doing the steps
1. Rename old table
2. Create new table
3. Copy data
4. Drop old table
is incorrect and should be
1. Create new table
2. Copy data
3. Drop old table
4. Rename new into old
Additionally, each step was being run in a different transaction,
which could cause permanent failures if a problem happened during
the migration.
Avoid both of those problems by changing up some previous migrations
that ran in this way. Since they are semantically identical, it's
fine to change up these old migrations. It will help make newer
nodes coming up for the first time more robust.
Change-Id: I43fb004fa1b6cb2fe2554f9920925420da28fb4a
CreateTables hasn't been quite true for a while now, rename to
MigrateToLatest to be clearer in it's behavior.
Change-Id: Ida48e95122a5d9b7a814e922d3698e00024a2ba7
* Add migration to storagenode reputation table to add suspended
timestamp
* Send suspended info to storagenode from satellite nodestats endpoint
* Add suspended status to storagenode api
* Add an indicator on the storagenode dashboard informing operator of
the satellites the node is suspended on
Change-Id: Ie3669f6069cc0258ba76ec99d17006e1b5fd9c8a
storagenodes have like 10 or more databases. without this
tag they all get sent as the same value, stomping on each
other.
Change-Id: Ib12019684d6ea8f2a5b83df584056dfa79e3c4b3
In many cases when a storagenode fails the preflight check, it is due to
test_table existing, which is used to determine read/write capabilities
after the initial schema verification. If preflight ends early due to a
failure or stopped storagenode, it may not get the chance to drop this
table.
This change excludes test_table from the schema comparison to ensure
that it never prevents a storagenode from starting up.
It also adds Preflight DB test for storagenode.
Change-Id: Ib8e71df2e42fda3b2a364fbf7a801891c5831d39
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.
this behavior was problematic becaue we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:
1. bound the amount of time we will retry a transaction
2. create new transactions when a retry is needed
the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's more preferrable to
find a way to make them smaller.
the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.
between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.
Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
This change updates the storagenode piecestore apis to expose access to
the full piece size stored on disk. Previously we only had access to
(and only kept a cache of) the content size used for all pieces. This
was inaccurate when reporting the amount of disk space used by nodes.
We now have access to the total content size, as well as the total disk
usage, of all pieces. The pieces cache also keeps a cache of the total
piece size along with the content size.
Change-Id: I4fffe7e1257e04c46021a2e37c5adc6fe69bee55