
577 Commits

Author SHA1 Message Date
Moby von Briesen
8287e3a32d storagenode/orders/store.go: combine writeLimit/writeOrder operations
Combine store.writeLimit and store.writeOrder into
store.writeLimitAndOrder, which requires only a single call to
file.Write(). This simplifies the code, and it also reduces the risk
of file corruption that comes from making multiple calls to Write().

Also combine the corresponding readLimit/readOrder functions for
consistency.

Change-Id: I62ed406fa2c02708465a678d18293f510f666440
2020-09-22 17:53:12 +00:00
nerdatwork
54dd430048 storagenode/pieces: fix typo for satellite id and piece id
2020-09-22 08:19:12 +03:00
nerdatwork
96ec44ff1b storagenode/pieces: make log more legible
2020-09-18 15:10:13 +03:00
Qweder93
8182fdad0b storagenode: heldamount renamed to payouts, renamed some methods and structs to more meaningful names. Grouped estimated payout with payouts
satellite: heldamount renamed to SNOpayouts.

Change-Id: I244b4d2454e0621f4b8e22d3c0d3e602c0bbcb02
2020-09-16 14:57:35 +00:00
Moby von Briesen
7db5794c16 storagenode/orders/store: Do not lock order enqueues for entire duration of ListUnsentBySatellite
We only need to acquire mutexes inside ListUnsentBySatellite when we
want to determine whether a file has an active enqueue in progress.
On some nodes, ListUnsentBySatellite can take a particularly long time, having
undesired side-effects, so if we can minimize locking time, those nodes
will be better off.

Also, lock archive mu during ListUnsentBySatellite so files cannot be
archived and listed at the same time.

Change-Id: Ieb7e2a759c20c724a74dd8315728c873ccab14a3
2020-09-15 15:15:30 +00:00
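Below is a simplified sketch of the narrower locking. It assumes a hypothetical store type. Its mu guards only a map of in-progress enqueues, and its archiveMu is held for the whole listing. The sketch shows the locking pattern only. It is not the real ListUnsentBySatellite.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical, simplified stand-in for the orders file store.
type store struct {
	mu             sync.Mutex     // guards activeEnqueues only
	activeEnqueues map[string]int // file name -> enqueues in progress
	archiveMu      sync.Mutex     // held while listing, so files cannot be archived mid-list
	files          []string
}

// hasActiveEnqueue holds mu only for the duration of this lookup.
func (s *store) hasActiveEnqueue(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.activeEnqueues[name] > 0
}

// listUnsent returns the files that have no enqueue in progress.
func (s *store) listUnsent() []string {
	s.archiveMu.Lock()
	defer s.archiveMu.Unlock()

	var ready []string
	for _, name := range s.files {
		if s.hasActiveEnqueue(name) {
			continue // still being written to; pick it up on a later pass
		}
		// Slow work (reading and parsing the file) would go here, without
		// holding mu, so new enqueues are not blocked in the meantime.
		ready = append(ready, name)
	}
	return ready
}

func main() {
	s := &store{
		activeEnqueues: map[string]int{"sat1-hour2": 1},
		files:          []string{"sat1-hour1", "sat1-hour2"},
	}
	fmt.Println(s.listUnsent()) // [sat1-hour1]
}
```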
Qweder93
528aa76ae6 storagenode/payouts: payoutHistoryMonthly surge reworked, empty receipt now won't return error
Change-Id: If99f8aec102550cd30e5906f986a4417903100be
2020-09-14 18:19:17 +03:00
Moby von Briesen
789b07e226 storagenode/orders/store.go: Do not return error from ListUnsentBySatellite when order files are corrupted.
If we see an UnexpectedEOF error when attempting to read orders, return
the orders we have been able to read successfully and do not return an
error. This behavior ensures that the storagenode orders service
attempts to archive corrupted files instead of retrying them repeatedly
and getting stuck.

Change-Id: I0d00d1e174f968af6e99ca861eddad190f1339e2
2020-09-10 23:36:05 +00:00
Qweder93
ac29d80495 storagenode: heldamount GetPaystub refactored, estimationPayouts logic moved from console into a separate service, storagenodeapi tests fixed.
Change-Id: I902823ef40a62861ce32799e9fb7a67a1e14710d
2020-09-09 15:31:16 +00:00
Stefan Benten
179b5adad4 storagenode/orders: add missing mon.Task parameter
Change-Id: If98cf347a81f29698a6bdb0907520d60f71db433
2020-09-06 00:05:53 +00:00
Jennifer Johnson
4e2413a99d satellite/satellitedb: uses vetted_at field to select for reputable nodes
Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1.
This is because many tests relied on selecting nodes that were reputable based on audit and uptime
counts of 0, in effect, selecting new nodes as reputable ones.
However, since reputation is now indicated by a vetted_at db field that is explicitly set
rather than implied by audit and uptime counts, it would be more complicated to try to
update all of the nodes' reputations before selecting nodes for tests.
Now we just allow all test nodes to be new if needed.

Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d
2020-09-04 16:45:32 +00:00
Michal Niewrzal
aa47e70f03 satellite/metainfo: use metabase.SegmentKey with metainfo.Service
Instead of using string or []byte we will be using dedicated type
SegmentKey.

Change-Id: I6ca8039f0741f6f9837c69a6d070228ed10f2220
2020-09-03 15:11:32 +00:00
Qweder93
36d752e92d storagenode/reputation: offline_under_review_at added
Change-Id: Ia7ec79b2d6f20fe29de0c36223f9485380d2845c
2020-09-02 18:48:28 +03:00
Qweder93
7d9897b7af storagenode/nodestats: online_score added
Change-Id: I84b50a6cace306e5f10d53a2073fe8810d4d2960
2020-09-02 17:45:01 +03:00
JT Olio
1f711523d5 satellite/repair: switch to piecestore.UploadReader part 2
Change-Id: I5a91d2960b037c7a3c96d01bc40404316ba028e3
2020-09-01 12:40:54 -06:00
JT Olio
b872fe52a1 satellite/repair: switch to piecestore.UploadReader
Change-Id: Ia99ad2cf5422e6ba1d98b32946740f9cadba7b6d
2020-09-01 09:26:54 -06:00
Cameron Ayer
ca0c1a5f0c storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability
Periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.

Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
2020-08-31 21:20:49 +00:00
nerdatwork
e072febbcc Fixed typo in log for allocated space (#3934)
2020-08-29 16:36:37 +02:00
Egon Elbre
c86c732fc0 satellite: simplify tests
satellite.DB.Console().Projects().GetAll database query
can be replaced with planet.Uplinks[0].Projects[0].ID

Change-Id: I73b82b91afb2dde7b690917345b798f9d81f6831
2020-08-28 22:28:04 +00:00
Egon Elbre
3ca405aa97 satellite/orders: use metabase types as arguments
Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef
2020-08-28 15:52:37 +03:00
Qweder93
c4a4745dd8 storagenode/console: audit per satellite now uses satelliteName instead of satelliteID
Change-Id: I8221ec840f654a62aedfb62a4194616db890f539
2020-08-25 12:52:47 +00:00
Qweder93
f16cf5cccf storagenode/console & /inspector: added recalculation of disk space info
Change-Id: Id003d031a6464ec095c31290fd6a756ead644261
2020-08-25 14:19:10 +03:00
Egon Elbre
f0ef01de5b storagenode/gracefulexit: retry workers faster
Change-Id: Ica20a691ff117a2b36a6362ee1fed21ce49a9ac1
2020-08-24 12:27:27 +03:00
Egon Elbre
e6bea41083 Revert "gracefulexit: reconnect added"
This reverts commit cff44fbd19.

Change-Id: I6590f483493e308b8244151e1df7570fd32ca2f8
2020-08-23 18:11:24 +03:00
Qweder93
cff44fbd19 gracefulexit: reconnect added
Change-Id: I236689af944effe3e79ef92e852ae264d3b372e5
2020-08-22 14:59:46 +03:00
Moby von Briesen
68b67c83a7 storagenode/{orders,piecestore}: Always unlock unsent orders file, even with an empty order.
When we call ordersStore.BeginEnqueue, the unsent orders file for that
satellite and hour is prevented from being sent. It is freed when the
commit callback returned by BeginEnqueue is called. This change ensures
that we always call the commit callback, even when we have an empty
order or an order with Amount <= 0.

Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93
2020-08-21 11:35:31 -04:00
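Below is a simplified sketch of the lock and commit pairing. It assumes a hypothetical fileLocks map in place of the real orders store. The caller invokes the commit callback on every path, including an empty order. The callback itself decides whether there is anything to write.

```go
package main

import "fmt"

// order is a hypothetical, simplified order.
type order struct{ Amount int64 }

// fileLocks is a hypothetical stand-in for the unsent-orders bookkeeping:
// a file with a count above zero is held back from sending.
type fileLocks map[string]int

// beginEnqueue holds back the file for one (satellite, hour) pair and returns
// the commit callback that releases it.
func (l fileLocks) beginEnqueue(file string) (commit func(o *order) error) {
	l[file]++
	return func(o *order) error {
		defer func() { l[file]-- }() // release on every path
		if o == nil || o.Amount <= 0 {
			return nil // nothing to store, but the file is still released
		}
		// ... append o to the unsent orders file here ...
		return nil
	}
}

func main() {
	locks := fileLocks{}
	commit := locks.beginEnqueue("sat1-hour1")
	// The caller calls commit even for an empty order instead of returning early.
	if err := commit(&order{Amount: 0}); err != nil {
		panic(err)
	}
	fmt.Println(locks["sat1-hour1"]) // 0: the file can be sent again
}
```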
littleskunk
db57d76ee9 storagenode/gracefulexit: fix wrong error handling for corrupted pieces (#3930)
2020-08-21 11:35:03 +02:00
Jeff Wendling
91698207cf storagenode: live tracking of order window usage
This change accomplishes multiple things:

1. Instead of having a max in flight time, which means
   we effectively have a minimum bandwidth for uploads
   and downloads, we keep track of what windows have
   active requests happening in them.

2. We don't double check when we save the order to see if it
   is too old: by then, it's too late. A malicious uplink
   could just submit orders outside of the grace window and
   receive all the data, but the node would just not commit
   it, so the uplink gets free traffic. Because the endpoints
   also check for the order being too old, this would be a
   very tight race that depends on knowledge of the node system
   clock, but it is best for the race not to exist at all. Instead, we
   piggyback on the in-flight tracking and do the check when
   we start to handle the order, and commit at the end.

3. Change the functions that send orders and list unsent
   orders to accept a time at which that operation is
   happening. This way, in tests, we can pretend we're
   listing or sending far into the future after the windows
   are available to send, rather than exposing test functions
   to modify internal state about the grace period to get
   the desired effect. This brings tests closer to actual
   usage in production.

4. Change the calculation of whether an order is allowed to be
   enqueued due to the grace period to just look at the
   order creation time, rather than some computation involving
   the window it will be in. In this way, you can easily
   answer the question of "will this order be accepted?" by
   asking "is it older than X?" where X is the grace period.

5. Increase how often we check whether to send orders, to once
   every 5 minutes instead of once every hour, because we already
   have hour-long buffering due to the windows. This decreases
   the maximum latency with which an order is reported back to
   the satellite by 55 minutes.

Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036
2020-08-19 19:42:33 +00:00
Cameron Ayer
0155c21b44 private/testplanet, storagenode/{monitor,pieces}: write storage dir verification file on run and verify on loop
On run, write the storage directory verification file.

Every time the node runs it will write the file even if it already exists.
We do this because if the verification file is missing, the SN
doesn't know whether it is an incorrect directory, or it simply hasn't written
the file yet, and we want to keep nodes running without needing operator intervention.

Once this change has been a part of the minimum version for several releases,
we will move the file creation from the run command to the setup
command. Run will only verify its existence.

Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb
2020-08-19 19:12:21 +00:00
Cameron Ayer
586e6f2f13 private/testblobs, storage, storage/filestore: add storage dir verification to filestore
Sometimes SNOs misconfigure their storage directory or lose the connection to it,
which can result in disqualification (DQ). This causes unnecessary repair and is unfortunate for all parties.

This change introduces the creation of a special file in the storage directory at runtime
containing the node ID. While the storage node runs, it periodically verifies that it can
find said file with the correct contents in the correct location. If not, the node will
shut down with an error message.

This change will solve the issue of nodes losing access to the storage directory, but it will not
solve the issue of nodes pointing to the wrong directory, as the identifying file is created each
time the node starts up. After this change has been the minimum version for a few releases, we will
remove the creation of the directory-identifying file from the storage node run command and add it
to the setup command.

Change-Id: Ib7b10e96ac07373219835e39239e93957e7667a4
2020-08-19 17:18:14 +00:00
Moby von Briesen
708cb48aa6 storagenode/orders: implement orders filestore on storagenode
* Add all new orders to the orders filestore instead of the database.
* Submit orders from the filestore to the new satellite SettleWindow
endpoint.

The orders filestore will eventually replace the orders DB completely.
For now, we will still be checking the orders DB and submitting those
orders if they exist. In a later release, we will completely remove the
orders DB, but we need both the DB and filestore for the transitionary
period.

Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc
2020-08-19 15:00:35 +00:00
Ethan
5445d595c0 storagenode/gracefulexit: Wait for the worker delete and transfer goroutines to finish before completing the exit
A failed test showed the same piece being deleted twice. This happens if the graceful exit completes before a previous piece deletion finishes. This change adds a "wait" on the limiter before executing the delete all step when GE is done.

Change-Id: I1c8c49d1e501c2728c80d4224a4854e742be27da
2020-08-19 14:20:26 +00:00
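Below is a sketch of the race and its fix. A sync.WaitGroup stands in for the limiter mentioned above. The names pieceStore, finishExit, and deleteAll are hypothetical. Waiting before the delete-all step means deleteAll takes its snapshot of the remaining pieces only after every worker is done, so no piece is deleted twice.

```go
package main

import (
	"fmt"
	"sync"
)

// pieceStore is a hypothetical piece store that counts deletions per piece.
type pieceStore struct {
	mu      sync.Mutex
	pieces  map[string]bool
	deletes map[string]int
}

func (s *pieceStore) deletePiece(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.deletes[id]++
	delete(s.pieces, id)
}

// deleteAll snapshots the remaining pieces and deletes each of them.
func (s *pieceStore) deleteAll() {
	s.mu.Lock()
	ids := make([]string, 0, len(s.pieces))
	for id := range s.pieces {
		ids = append(ids, id)
	}
	s.mu.Unlock()
	for _, id := range ids {
		s.deletePiece(id)
	}
}

// finishExit runs one worker per transferred piece, then deletes everything left.
func finishExit(s *pieceStore, transferred []string) {
	var wg sync.WaitGroup
	for _, id := range transferred {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			s.deletePiece(id) // worker deletes the piece after transferring it
		}(id)
	}
	// Without this wait, deleteAll could snapshot a piece that a worker is
	// about to delete, and that piece would be deleted twice.
	wg.Wait()
	s.deleteAll()
}

func main() {
	s := &pieceStore{
		pieces:  map[string]bool{"p1": true, "p2": true, "p3": true},
		deletes: map[string]int{},
	}
	finishExit(s, []string{"p1", "p2"})
	fmt.Println(s.deletes) // map[p1:1 p2:1 p3:1]
}
```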
Egon Elbre
be3fd0147e storagenode/storagenodedb: database name in all preflight errors
Shorten the error strings and include database name in all potential
preflight errors.

Change-Id: Ic92ca1ec6e14ffbddb0a0cf89e357eec9532d27e
2020-08-18 16:31:19 +03:00
NickolaiYurchenko
4cdba365ef web/storagenode: payout history table
Change-Id: I448ea8424baf31400d9868ef9ca2b8002caa7bbd
2020-08-13 12:05:56 +00:00
Egon Elbre
94a09ce20b all: add missing dots
Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a
2020-08-11 17:50:01 +03:00
Qweder93
4ee1b2d45a storagenode/console: added list of all audits per satellite to sno dashboard/satellites
Change-Id: I52e58748d6467f372d9a308347fc77e400d137e2
2020-08-10 12:55:07 +00:00
Qweder93
373934efb2 storagenode/heldamount: payout history: removed extra doubling with surge percent, added held percent
Change-Id: Idd3927c3130bff771e5437b9b18b4a4907f787e4
2020-08-10 15:29:34 +03:00
Qweder93
f804f03b1f storagenode/heldamount: payout history updated
Change-Id: I6dc91e9eed51f9b81af3e47a45168c43d254356a
2020-08-05 09:58:34 +00:00
Qweder93
53a5d18e1a storagenode: fixed logging about piece being moved to trash, and added logging when piece was actually deleted
Change-Id: I46f6a141b27033c2087b5c4681506d80b90f4a18
2020-08-02 20:00:05 +03:00
Qweder93
b4c9badab1 storagenode/console: estimation payout fix
Change-Id: I5d9f11fffd74978f3ca684fd08aac44a27a83c71
2020-07-27 21:41:07 +03:00
Qweder93
5988ad6646 storagenode/heldamount: payout history extended with satellite's id, url, surge percent
Change-Id: I669b7b6073ded48fd5686c587357b6c86a970fc7
2020-07-25 14:30:07 +03:00
Qweder93
123aebd79f storagenode/version: version chore test fix
Change-Id: I61537ea325779cefbb1f8d7c5d373dc4bf80a7aa
2020-07-24 20:17:35 +03:00
Qweder93
f531bc8638 storagenode/heldamount: payout-history route fix, usage_at_rest in estimation payout calculations fixed
Change-Id: I6f819a404a45b2a96c1aae33c67ebea1ab83aef0
2020-07-24 15:19:45 +00:00
Yaroslav Vorobiov
4d2a505788 storagenode/db: explicitly open and create dbs
Prevent storagenode from implicitly recreating missing dbs and storage,
as such behaviour leads to audit failures. Do not allow storagenode to
start if any of the dbs or the storage is missing or corrupted, or if the
dedicated storage disk is unmounted, so that the node gets downtime
instead of audit failures.

Change-Id: Ic64e1f0ff4d8ef5b2fddbe7a7e53df4f4bd8652e
2020-07-24 14:08:47 +03:00
Qweder93
92efffb48a storagenode/version: notification flow now based on cursor, chore_test added, versioncontrol added to reconfigure.
Change-Id: I70713def8d585228270ec5a8c586ecc5b4d510c4
2020-07-23 14:13:24 +00:00
Cameron Ayer
1f5d5235a6 storagenode/{monitor,piecestore}: if free disk < expected available space, return free disk
We only sync free disk and available space, if necessary, on startup. If the
SN's disk fills up with non-storj data, we will not know about it when reporting
available space to the satellite.

Solution: whenever we check the node's capacity, double check free disk.
If free disk < expected available space, return free disk.

Change-Id: I66265c16e03be45b6e1f5817c70df7eac0a76455
2020-07-22 15:08:37 +00:00
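Below is a minimal sketch of the capacity rule. It assumes that expected available space is computed as allocated minus used. availableSpace is a hypothetical name, and clamping negative values to zero is an added assumption.

```go
package main

import "fmt"

// availableSpace reports how much space the node can advertise: the
// allocation minus what the node already uses, capped by the disk's actual
// free space (other data may have filled the disk).
func availableSpace(allocated, usedByNode, freeDisk int64) int64 {
	expected := allocated - usedByNode
	if expected < 0 {
		expected = 0 // assumption: never report negative space
	}
	if freeDisk < expected {
		return freeDisk
	}
	return expected
}

func main() {
	const GB = int64(1e9)
	// 2 TB allocated, 500 GB used, but only 300 GB actually free on disk.
	fmt.Println(availableSpace(2000*GB, 500*GB, 300*GB) / GB) // 300
}
```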
Qweder93
aa6afc3879 error handling in heldamount cash and collector delete fixed
Change-Id: I8fe58c50f844a6b819eacc14a40bc5c67268ed5c
2020-07-22 12:26:13 +00:00
Qweder93
0949731caa storagenode/console: estimation payout held split from total payout, calculations fixed
Change-Id: I064f473ffeb3a3051c9228d1dd84fe0fc86dd3ef
2020-07-21 15:31:51 +03:00
Egon Elbre
d8dcae3075 all: fix error checking
Change-Id: Ia0da1bbd6ce695139922f94096c2419281905e32
2020-07-16 19:13:14 +03:00
Egon Elbre
e70da5cd4e all: fix comments
Change-Id: I2d2307e3fab87de47a72b3595d051e2c95ff4f8a
2020-07-16 19:13:14 +03:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00