563 Commits

Author SHA1 Message Date
JT Olio
b872fe52a1 satellite/repair: switch to piecestore.UploadReader
Change-Id: Ia99ad2cf5422e6ba1d98b32946740f9cadba7b6d
2020-09-01 09:26:54 -06:00
Cameron Ayer
ca0c1a5f0c storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.

Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
2020-08-31 21:20:49 +00:00
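A minimal sketch of the writability loop described in ca0c1a5f0c, not the code from the commit. The names checkWritable, monitor and runDemo, the temp file pattern, and the bounded number of rounds are assumptions made for illustration; the real node would shut down on the first failure instead of returning an error.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// checkWritable creates and then deletes a temp file in dir.
// Any failure is treated as "directory not writable".
func checkWritable(dir string) error {
	f, err := os.CreateTemp(dir, "write-test-*")
	if err != nil {
		return fmt.Errorf("create temp file: %w", err)
	}
	name := f.Name()
	if err := f.Close(); err != nil {
		_ = os.Remove(name)
		return fmt.Errorf("close temp file: %w", err)
	}
	if err := os.Remove(name); err != nil {
		return fmt.Errorf("remove temp file: %w", err)
	}
	return nil
}

// monitor runs checkWritable once per interval, for at most rounds
// iterations, and returns the first error it sees.
func monitor(dir string, interval time.Duration, rounds int) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < rounds; i++ {
		if err := checkWritable(dir); err != nil {
			return err
		}
		<-ticker.C
	}
	return nil
}

// runDemo exercises the check against a scratch directory, then removes
// the directory and confirms that the check starts failing.
func runDemo() error {
	dir, err := os.MkdirTemp("", "storage-*")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	if err := monitor(dir, 10*time.Millisecond, 3); err != nil {
		return err
	}
	if err := os.RemoveAll(dir); err != nil {
		return err
	}
	if checkWritable(dir) == nil {
		return errors.New("expected the check to fail for a missing directory")
	}
	return nil
}

func main() {
	if err := runDemo(); err != nil {
		fmt.Println("demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```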
nerdatwork
e072febbcc Fixed typo in log for allocated space (#3934)
2020-08-29 16:36:37 +02:00
Egon Elbre
c86c732fc0 satellite: simplify tests
satellite.DB.Console().Projects().GetAll database query
can be replaced with planet.Uplinks[0].Projects[0].ID

Change-Id: I73b82b91afb2dde7b690917345b798f9d81f6831
2020-08-28 22:28:04 +00:00
Egon Elbre
3ca405aa97 satellite/orders: use metabase types as arguments
Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef
2020-08-28 15:52:37 +03:00
Qweder93
c4a4745dd8 storagenode/console: audit per satellite now uses satelliteName instead of satelliteID
Change-Id: I8221ec840f654a62aedfb62a4194616db890f539
2020-08-25 12:52:47 +00:00
Qweder93
f16cf5cccf storagenode/console & /inspector: added recalculation of disk space info
Change-Id: Id003d031a6464ec095c31290fd6a756ead644261
2020-08-25 14:19:10 +03:00
Egon Elbre
f0ef01de5b storagenode/gracefulexit: retry workers faster
Change-Id: Ica20a691ff117a2b36a6362ee1fed21ce49a9ac1
2020-08-24 12:27:27 +03:00
Egon Elbre
e6bea41083 Revert "gracefulexit: reconnect added"
This reverts commit cff44fbd19.

Change-Id: I6590f483493e308b8244151e1df7570fd32ca2f8
2020-08-23 18:11:24 +03:00
Qweder93
cff44fbd19 gracefulexit: reconnect added
Change-Id: I236689af944effe3e79ef92e852ae264d3b372e5
2020-08-22 14:59:46 +03:00
Moby von Briesen
68b67c83a7 storagenode/{orders,piecestore}: Always unlock unsent orders file, even with an empty order.
When we call ordersStore.BeginEnqueue, the unsent orders file for that
satellite and hour is prevented from being sent. It is freed when the
commit callback returned by BeginEnqueue is used. This change ensures
that we always call the commit callback, even when we have an empty
order or an order with Amount <= 0.

Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93
2020-08-21 11:35:31 -04:00
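The pattern from 68b67c83a7 can be sketched with a deferred call, so the commit callback runs on every return path. The types and the simplified beginEnqueue signature below are hypothetical stand-ins, not the storagenode API.

```go
package main

import "fmt"

// order is a hypothetical, stripped-down order record.
type order struct{ amount int64 }

// ordersStore is a hypothetical stand-in for the unsent orders store.
// inUse counts open enqueues per window; a window with a non-zero
// count must not be sent.
type ordersStore struct {
	inUse map[string]int
	saved []order
}

// beginEnqueue marks the window as in use and returns a commit callback.
// The callback stores the order if there is one and always frees the window.
func (s *ordersStore) beginEnqueue(window string) (commit func(*order)) {
	s.inUse[window]++
	return func(o *order) {
		if o != nil {
			s.saved = append(s.saved, *o)
		}
		s.inUse[window]--
	}
}

// saveOrder defers the commit callback so that it also runs when the
// order is empty or has amount <= 0.
func saveOrder(s *ordersStore, window string, amount int64) {
	commit := s.beginEnqueue(window)
	var toSave *order
	defer func() { commit(toSave) }()

	if amount <= 0 {
		return // nothing to store, but the deferred commit still frees the window
	}
	toSave = &order{amount: amount}
}

func main() {
	s := &ordersStore{inUse: map[string]int{}}
	saveOrder(s, "2020-08-21T11", 0)
	saveOrder(s, "2020-08-21T11", 1024)
	fmt.Println(s.inUse["2020-08-21T11"], len(s.saved)) // prints "0 1"
}
```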
littleskunk
db57d76ee9 storagenode/gracefulexit: fix wrong error handling for corrupted pieces (#3930)
2020-08-21 11:35:03 +02:00
Jeff Wendling
91698207cf storagenode: live tracking of order window usage
This change accomplishes multiple things:

1. Instead of having a max in flight time, which means
   we effectively have a minimum bandwidth for uploads
   and downloads, we keep track of what windows have
   active requests happening in them.

2. We don't double check when we save the order to see if it
   is too old: by then, it's too late. A malicious uplink
   could just submit orders outside of the grace window and
   receive all the data, but the node would just not commit
   it, so the uplink gets free traffic. Because the endpoints
   also check for the order being too old, this would be a
   very tight race that depends on knowledge of the node system
   clock, but best to not have the race exist. Instead, we piggy
   back off of the in flight tracking and do the check when
   we start to handle the order, and commit at the end.

3. Change the functions that send orders and list unsent
   orders to accept a time at which that operation is
   happening. This way, in tests, we can pretend we're
   listing or sending far into the future after the windows
   are available to send, rather than exposing test functions
   to modify internal state about the grace period to get
   the desired effect. This brings tests closer to actual
   usage in production.

4. Change the calculation for if an order is allowed to be
   enqueued due to the grace period to just look at the
   order creation time, rather than some computation involving
   the window it will be in. In this way, you can easily
   answer the question of "will this order be accepted?" by
   asking "is it older than X?" where X is the grace period.

5. Increases the frequency we check to send up orders to once
   every 5 minutes instead of once every hour because we already
   have hour-long buffering due to the windows. This decreases
   the maximum latency that an order will be reported back to
   the satellite by 55 minutes.

Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036
2020-08-19 19:42:33 +00:00
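Points 4 and 5 of 91698207cf reduce to simple time arithmetic, sketched below. The one-hour gracePeriod, the helper name tooOld, and the choice to accept an order that is exactly as old as the grace period are assumptions for illustration; the commit message does not state them.

```go
package main

import (
	"fmt"
	"time"
)

// gracePeriod is a hypothetical value; the commit only calls it X.
const gracePeriod = time.Hour

// latencySaved is the reduction in worst-case reporting latency when the
// send check runs every 5 minutes instead of every hour.
const latencySaved = time.Hour - 5*time.Minute

// exampleNow is a fixed reference time so the example is deterministic.
var exampleNow = time.Date(2020, 8, 19, 12, 0, 0, 0, time.UTC)

// tooOld answers "is it older than X?" using only the order creation time.
// An order exactly grace old is still accepted in this sketch.
func tooOld(createdAt, now time.Time, grace time.Duration) bool {
	return now.Sub(createdAt) > grace
}

func main() {
	fresh := exampleNow.Add(-30 * time.Minute)
	stale := exampleNow.Add(-2 * time.Hour)
	fmt.Println(tooOld(fresh, exampleNow, gracePeriod)) // prints "false"
	fmt.Println(tooOld(stale, exampleNow, gracePeriod)) // prints "true"
	fmt.Println(latencySaved)                           // prints "55m0s"
}
```

Passing the current time as a parameter mirrors point 3: tests can pretend to run far in the future without touching internal state.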
Cameron Ayer
0155c21b44 private/testplanet, storagenode/{monitor,pieces}: write storage dir verification file on run and verify on loop
On run, write the storage directory verification file.

Every time the node runs, it writes the file, even if the file already exists.
We do this because, if the verification file is missing, the SN cannot tell
whether the directory is incorrect or the file simply has not been written yet,
and we want to keep nodes running without needing operator intervention.

Once this change has been a part of the minimum version for several releases,
we will move the file creation from the run command to the setup
command. Run will only verify its existence.

Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb
2020-08-19 19:12:21 +00:00
Cameron Ayer
586e6f2f13 private/testblobs, storage, storage/filestore: add storage dir verification to filestore
Sometimes SNOs fail to properly configure or lose connection to their storage directory
which can result in DQ. This causes unnecessary repair and is unfortunate for all parties.

This change introduces the creation of a special file in the storage directory at runtime
containing the node ID. While the storage node runs, it periodically verifies that it can
find said file with the correct contents in the correct location. If not, the node will
shut down with an error message.

This change will solve the issue of nodes losing access to the storage directory, but it will not
solve the issue of nodes pointing to the wrong directory, as the identifying file is created each
time the node starts up. After this change has been the minimum version for a few releases, we will
remove the creation of the directory-identifying file from the storage node run command and add it
to the setup command.

Change-Id: Ib7b10e96ac07373219835e39239e93957e7667a4
2020-08-19 17:18:14 +00:00
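A rough sketch of the verification file idea from 586e6f2f13 and 0155c21b44. The file name, the helper names, and the raw byte representation of the node ID are assumptions; the commits do not spell them out.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// verificationFile is a hypothetical file name.
const verificationFile = "storage-dir-verification"

// writeVerification stores the node ID in the storage directory,
// overwriting any existing file.
func writeVerification(dir string, nodeID []byte) error {
	return os.WriteFile(filepath.Join(dir, verificationFile), nodeID, 0o600)
}

// verify checks that the file exists in dir and contains nodeID.
func verify(dir string, nodeID []byte) error {
	got, err := os.ReadFile(filepath.Join(dir, verificationFile))
	if err != nil {
		return fmt.Errorf("read verification file: %w", err)
	}
	if !bytes.Equal(got, nodeID) {
		return errors.New("verification file belongs to a different node")
	}
	return nil
}

// runDemo covers the three cases: correct directory, wrong node ID,
// and a directory that has become unavailable.
func runDemo() error {
	dir, err := os.MkdirTemp("", "storage-*")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	id := []byte("node-A")
	if err := writeVerification(dir, id); err != nil {
		return err
	}
	if err := verify(dir, id); err != nil {
		return err
	}
	if verify(dir, []byte("node-B")) == nil {
		return errors.New("expected mismatch for a different node ID")
	}
	if err := os.RemoveAll(dir); err != nil {
		return err
	}
	if verify(dir, id) == nil {
		return errors.New("expected failure once the directory is gone")
	}
	return nil
}

func main() {
	if err := runDemo(); err != nil {
		fmt.Println("demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```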
Moby von Briesen
708cb48aa6 storagenode/orders: implement orders filestore on storagenode
* Add all new orders to the orders filestore instead of the database.
* Submit orders from the filestore to the new satellite SettleWindow
endpoint.

The orders filestore will eventually replace the orders DB completely.
For now, we will still be checking the orders DB and submitting those
orders if they exist. In a later release, we will completely remove the
orders DB, but we need both the DB and filestore for the transitionary
period.

Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc
2020-08-19 15:00:35 +00:00
Ethan
5445d595c0 storagenode/gracefulexit: Wait for the worker delete and transfer goroutines to finish before completing the exit
A failed test showed the same piece being deleted twice. This happens if the graceful exit completes before a previous piece deletion finishes. This change adds a "wait" on the limiter before executing the delete all step when GE is done.

Change-Id: I1c8c49d1e501c2728c80d4224a4854e742be27da
2020-08-19 14:20:26 +00:00
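The fix in 5445d595c0 amounts to waiting for in-flight workers before the final "delete all" step. The sketch below uses sync.WaitGroup as a stand-in for the limiter mentioned in the commit; pieceStore and finishExit are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// pieceStore is a hypothetical in-memory store that counts how often
// each piece is deleted.
type pieceStore struct {
	mu      sync.Mutex
	pieces  map[string]bool
	deletes map[string]int
}

func newStore(ids ...string) *pieceStore {
	s := &pieceStore{pieces: map[string]bool{}, deletes: map[string]int{}}
	for _, id := range ids {
		s.pieces[id] = true
	}
	return s
}

// delete removes a piece and records the deletion.
func (s *pieceStore) delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.pieces, id)
	s.deletes[id]++
}

// remaining lists the pieces that are still stored.
func (s *pieceStore) remaining() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ids := make([]string, 0, len(s.pieces))
	for id := range s.pieces {
		ids = append(ids, id)
	}
	return ids
}

// finishExit deletes transferred pieces in worker goroutines, waits for
// all of them, and only then deletes whatever is left.
func finishExit(s *pieceStore, transferred []string) {
	var wg sync.WaitGroup
	for _, id := range transferred {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			s.delete(id)
		}(id)
	}
	wg.Wait() // without this, the loop below could race with the workers

	for _, id := range s.remaining() {
		s.delete(id)
	}
}

func main() {
	s := newStore("a", "b", "c")
	finishExit(s, []string{"a", "b"})
	fmt.Println(s.deletes["a"], s.deletes["b"], s.deletes["c"]) // prints "1 1 1"
}
```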
Egon Elbre
be3fd0147e storagenode/storagenodedb: database name in all preflight errors
Shorten the error strings and include database name in all potential
preflight errors.

Change-Id: Ic92ca1ec6e14ffbddb0a0cf89e357eec9532d27e
2020-08-18 16:31:19 +03:00
NickolaiYurchenko
4cdba365ef web/storagenode: payout history table
Change-Id: I448ea8424baf31400d9868ef9ca2b8002caa7bbd
2020-08-13 12:05:56 +00:00
Egon Elbre
94a09ce20b all: add missing dots
Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a
2020-08-11 17:50:01 +03:00
Qweder93
4ee1b2d45a storagenode/console: added list of all audits per satellite to sno dashboard/satellites
Change-Id: I52e58748d6467f372d9a308347fc77e400d137e2
2020-08-10 12:55:07 +00:00
Qweder93
373934efb2 storagenode/heldamount: payout history: removed extra doubling with surge percent, added held percent
Change-Id: Idd3927c3130bff771e5437b9b18b4a4907f787e4
2020-08-10 15:29:34 +03:00
Qweder93
f804f03b1f storagenode/heldamount: payout history updated
Change-Id: I6dc91e9eed51f9b81af3e47a45168c43d254356a
2020-08-05 09:58:34 +00:00
Qweder93
53a5d18e1a storagenode: fixed logging about piece being moved to trash, and added logging when piece was actually deleted
Change-Id: I46f6a141b27033c2087b5c4681506d80b90f4a18
2020-08-02 20:00:05 +03:00
Qweder93
b4c9badab1 storagenode/console: estimation payout fix
Change-Id: I5d9f11fffd74978f3ca684fd08aac44a27a83c71
2020-07-27 21:41:07 +03:00
Qweder93
5988ad6646 storagenode/heldamount: payout history extended with satellite's id, url, surge percent
Change-Id: I669b7b6073ded48fd5686c587357b6c86a970fc7
2020-07-25 14:30:07 +03:00
Qweder93
123aebd79f storagenode/version: version chore test fix
Change-Id: I61537ea325779cefbb1f8d7c5d373dc4bf80a7aa
2020-07-24 20:17:35 +03:00
Qweder93
f531bc8638 storagenode/heldamount: payout-history route fix, usage_at_rest in estimation payout calculations fixed
Change-Id: I6f819a404a45b2a96c1aae33c67ebea1ab83aef0
2020-07-24 15:19:45 +00:00
Yaroslav Vorobiov
4d2a505788 storagenode/db: explicitly open and create dbs
Prevent storagenode from implicitly recreating missing dbs and storage,
as such behaviour leads to audit failures. Do not allow storagenode to
start if any db or the storage is missing or corrupted, or if the dedicated
storage disk is unmounted, so that the node gets downtime instead.

Change-Id: Ic64e1f0ff4d8ef5b2fddbe7a7e53df4f4bd8652e
2020-07-24 14:08:47 +03:00
Qweder93
92efffb48a storagenode/version: notification flow now based on cursor, chore_test added, versioncontrol added to reconfigure.
Change-Id: I70713def8d585228270ec5a8c586ecc5b4d510c4
2020-07-23 14:13:24 +00:00
Cameron Ayer
1f5d5235a6 storagenode/{monitor,piecestore}: if free disk < expected available space, return free disk
We only sync free disk and available space, if necessary, on startup. If the
SN's disk fills up with non-storj data, we will not know about it when reporting
available space to the satellite.

Solution: whenever we check the node's capacity, double check free disk.
If free disk < expected available space, return free disk.

Change-Id: I66265c16e03be45b6e1f5817c70df7eac0a76455
2020-07-22 15:08:37 +00:00
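The rule in 1f5d5235a6 is a minimum of two values. A small sketch, assuming that expected available space is allocated space minus used space; the function and parameter names are illustrative.

```go
package main

import "fmt"

// availableSpace returns the space to report to the satellite: the
// expected available space, capped by what the disk actually has free.
// All values are in bytes.
func availableSpace(allocated, used, freeDisk int64) int64 {
	expected := allocated - used
	if freeDisk < expected {
		return freeDisk
	}
	return expected
}

func main() {
	// Disk partly filled with other data: free disk wins.
	fmt.Println(availableSpace(1000, 200, 500)) // prints "500"
	// Plenty of free disk: the allocation wins.
	fmt.Println(availableSpace(1000, 200, 5000)) // prints "800"
}
```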
Qweder93
aa6afc3879 error handling in heldamount cache and collector delete fixed
Change-Id: I8fe58c50f844a6b819eacc14a40bc5c67268ed5c
2020-07-22 12:26:13 +00:00
Qweder93
0949731caa storagenode/console: estimation payout held split from total payout, calculations fixed
Change-Id: I064f473ffeb3a3051c9228d1dd84fe0fc86dd3ef
2020-07-21 15:31:51 +03:00
Egon Elbre
d8dcae3075 all: fix error checking
Change-Id: Ia0da1bbd6ce695139922f94096c2419281905e32
2020-07-16 19:13:14 +03:00
Egon Elbre
e70da5cd4e all: fix comments
Change-Id: I2d2307e3fab87de47a72b3595d051e2c95ff4f8a
2020-07-16 19:13:14 +03:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Qweder93
dfdf73282d storagenode/heldamount: db tests updated with payout.Receipt
Change-Id: I17699b923c5a4d7decbd446c382f0c886c36d5e1
2020-07-16 12:24:22 +00:00
Moby von Briesen
1b807761bd storagenode/orders: Update orders filestore to be compatible with new satellite endpoint
* Instead of archiving a list of orders and deleting an "unsent" file in
separate steps, archival simply moves the old unsent file to a new
archived file
* Add maxInFlightTime to be used along with grace period for sending
buffer
* Create unsent/archival directories in constructor
* Code cleanup

Change-Id: Ia3bc2aaf60cced6c6d413465423d78c7d5151188
2020-07-15 14:21:56 -04:00
Qweder93
62fec25104 storagenode/heldamount: returns usage_at_rest in tbm instead of tbh
Change-Id: I183a56460ea76a53680ca6861d02cecebe3576ec
2020-07-15 15:46:13 +03:00
stefanbenten
257855b5de all: replace == comparison with errors.Is
Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714
2020-07-14 15:50:25 +00:00
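Why 257855b5de matters: a wrapped error no longer compares equal with ==, while errors.Is walks the wrap chain. The sentinel and helper names below are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error.
var errNotFound = errors.New("not found")

// lookup returns the sentinel wrapped with extra context.
func lookup() error {
	return fmt.Errorf("lookup piece: %w", errNotFound)
}

// compare reports the result of both comparison styles on a wrapped error.
func compare() (equal, is bool) {
	err := lookup()
	return err == errNotFound, errors.Is(err, errNotFound)
}

func main() {
	equal, is := compare()
	fmt.Println(equal, is) // prints "false true"
}
```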
Qweder93
7b4a8c4d6d storagenode/heldamount: payoutHistory added
Change-Id: I93dd3d024085d19ecff76075e52bf66796207fd6
2020-07-14 17:35:03 +03:00
Qweder93
7d6973b5a2 satellite: heldamount and nodestats not returning error node not found by rpc
Change-Id: Ifb00b16a4a04603251de60da6a6612fd5e98d597
2020-07-14 16:31:02 +03:00
Egon Elbre
262da14359 storagenode/console/consoleapi: disable flaky TestStorageNodeApi
Change-Id: I076c9a46fece86d34eae117ab84f94f99e7e64e0
2020-07-13 18:35:38 +03:00
Isaac Hess
78f5755d46 storagenode/nodestats: Add sat to heldamount error
Nodes are receiving an error that heldamount rpc doesn't exist on a
satellite. This simply adds which satellite to the error.

Change-Id: I7708e0511b55fdd2425969db2a545645339bad81
2020-07-13 14:29:09 +00:00
Qweder93
facde770de storagenode/heldamount: removed logging errors node not found during getAllPaystubs/Payments from trusted satellites
Change-Id: I87f6c697d98546812450fcfb090623c76dec4bbc
2020-07-13 16:45:23 +03:00
Qweder93
f73e92c268 storagenode/gracefulexit: added blobs clean
On node start, check whether any trusted satellite has the GE status "Exited successfully".
If so, try to delete the blobs/satellite folder, so that no trash is left on the SNO.

Change-Id: I566266c84f2a872df54cd01bc2f15a9934f138ed
2020-07-13 11:49:18 +00:00
Qweder93
e17243fcd7 storagenode/console: estimation payout for current and previous month reworked
Change-Id: I937d5d8f7c17949b539dcd6e36af27400a5043e2
2020-07-10 12:18:53 +00:00
Qweder93
0521435e08 storagenode/gracefulexit: added deletion of all files left in storage/blobs/satellite after successful GE
https://storjlabs.atlassian.net/browse/SG-368

Change-Id: I29a978fe0d0153aedf2be91dc7f45b4ef386d447
2020-07-08 14:38:31 +03:00
Bill Thorp
a3c902ab84 storagenode/pieces: hours in a month should be 720
Per https://documentation.tardigrade.io/pricing/billing-and-payment:
"The calculation of per object fees is based on a standard 720-hour month."

In most years, the average value is 730 (365*24/12), except leap years.
However, we want ours to be 720 (30*24) so it lines up with days.

Change-Id: Ifb9691878f1a7ea81ed36c92b37985493295fe31
2020-07-07 15:26:15 -04:00
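The arithmetic behind a3c902ab84, written out as constants. The constant names are illustrative and are not taken from the storagenode code.

```go
package main

import "fmt"

const (
	hoursPerDay = 24

	// Average month in a non-leap year: 365*24/12.
	averageHoursPerMonth = 365 * hoursPerDay / 12

	// Standard billing month of 30 days: 30*24.
	billingHoursPerMonth = 30 * hoursPerDay
)

func main() {
	fmt.Println(averageHoursPerMonth) // prints "730"
	fmt.Println(billingHoursPerMonth) // prints "720"
}
```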
Moby von Briesen
e9dd5b2845 storagenode/piecestore: Properly log/send metrics for all successful pieces
When an uplink or repair worker finishes uploading a piece to a
storagenode, it has no reason to wait another round trip after the piece
is committed to gracefully close the connection - in many (most?) cases,
the connection is simply canceled once the upload is complete. This has
the unintended side effect of producing a lot of "piece canceled" logs
and metrics on the storagenode side, when the reality is that the piece
uploads were successful, and not really canceled. This commit fixes
that.

Change-Id: Icbc1f7857d380134560219c1c19c186df2783cd0
2020-07-07 15:19:17 +00:00