Commit Graph

189 Commits

Author SHA1 Message Date
Clement Sam
b6026b9ff3 storagenode/piecestore: fix ingress graph skewed by larger signed orders
Storagenodes are currently getting larger signed orders due to
a performance optimization in uplink, which now messes with the
ingress graph because the storagenode plots the graph using
the order amount instead of actually uploaded bytes, which this
change fixes.

The egress graph might have a similar issue if the order amount
is larger than the actually downloaded bytes but since we pay
for orders, whether fulfilled or unfulfilled, we continue using
the order amount for the egress graph.

Resolves https://github.com/storj/storj/issues/5853

Change-Id: I2af7ee3ff249801ce07714bba055370ebd597c6e
2023-06-20 13:28:08 +00:00
Jeff Wendling
80b3edf1d1 storagenode/piecestore: respect maximum chunk size requests
See https://review.dev.storj.io/c/storj/common/+/10297 for
more details.

Change-Id: Id5d19f029ae872780a554874592679191c1b5b2f
2023-04-28 16:31:41 -04:00
Egon Elbre
f5020de57c storagenode/blobstore: move blob store logic
The blobstore implementation is entirely related to storagenode, so the
rightful place is together with the storagenode implementation.

Fixes https://github.com/storj/storj/issues/5754

Change-Id: Ie6637b0262cf37af6c3e558556c7604d9dc3613d
2023-04-05 18:06:20 +00:00
Márton Elek
462c16eb9b storagenode/piecestore: use actual Initial/MaxStep defaults
storj/storj depends on storj/uplink, and storj/uplink depends on storj/storj (for its integration tests).

If storj/storj uses hard-coded values instead of the real defaults, we cannot modify them: any change in uplink would fail when storj/storj, with its unchanged hard-coded defaults, is used for the integration tests.

Change-Id: Ifa68567dc2d5c8d08af8041ac338870c4fc26d45
2023-04-05 19:18:12 +02:00
Márton Elek
00420b5904 storagenode/piecestore: better monkit metric for download
Download is served from two goroutines:

 * one waits for the orders (and updates the actual limit)
 * the other sends the valuable bytes back to the client (if the actual order is big enough)

These two tasks are synchronized with the help of a `sync2.NewThrottle()`.

But all of this happens in the same method, so we have no idea how much time is spent waiting for the next orders
 (the throttle can wait until we receive a new order limit), and how much time is spent on actual work.

This patch moves the actual work (after the sending routine is woken up) to a separate method to have better visibility and measure the actual work (read data + send it).

Change-Id: Ia5068c544560a53bc2fcea6cb6fce85cfacbd95b
2023-03-30 10:46:18 +00:00
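The idea behind 00420b5904 (measure the sending work separately from the time spent waiting for orders) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the piecestore code: a channel of cumulative limits stands in for the `sync2` throttle, and `timings`, `serveDownload` and `sendChunk` are hypothetical names.

```go
package main

import "time"

// timings separates time spent waiting for orders from time spent sending.
type timings struct {
	waiting time.Duration
	working time.Duration
	sent    int64
}

// serveDownload reads cumulative order limits from limits and sends data up
// to the highest limit seen, stopping at total bytes or when limits is closed.
func serveDownload(limits <-chan int64, total int64, send func(n int64)) timings {
	var t timings
	for t.sent < total {
		waitStart := time.Now()
		limit, ok := <-limits // blocks until the next order arrives
		t.waiting += time.Since(waitStart)
		if !ok {
			break
		}
		if limit > total {
			limit = total
		}
		if limit > t.sent {
			t.working += sendChunk(limit-t.sent, send)
			t.sent = limit
		}
	}
	return t
}

// sendChunk holds the actual work in its own function, so it can be timed
// (or instrumented) independently of the waiting in serveDownload.
func sendChunk(n int64, send func(n int64)) time.Duration {
	start := time.Now()
	send(n)
	return time.Since(start)
}
```

Because the work lives in its own function, any per-function instrumentation attached to `sendChunk` reports only read-and-send time, while waiting time is accounted for in the caller.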
JT Olio
58f465c8d8 storagenode/piecestore: improve cancelation detection logic
if a drpc.ClosedError was returned, it would always take the
first (failure) branch, despite the second branch's existence.

Change-Id: Ife3b27869c4e9d37ca2914e2d1d1a2c60d326309
2023-03-21 17:50:49 +00:00
JT Olio
ade808375c storagenode/piecestore: be more flexible with bandwidth usage max
this is to prepare for https://review.dev.storj.io/c/storj/uplink/+/9814

Change-Id: Ib74fd63a352a0ae4cc1f8b66fb70df649844c33a
2023-03-06 18:00:17 +00:00
Márton Elek
644aca0e42 storagenode: fix piecestore download metrics
Storagenode download metrics are not accurate:

 * the current code bumps the cancel metric only for specific error messages, but there are cases where the error is already handled (err == nil)
 * instead of the full size of the piece we need to use the size of the downloaded bytes

Change-Id: I6ca75770e2d40bf514f5e273785c78e02968c919
2023-02-28 12:52:58 +00:00
JT Olio
529e3674e4 storagenode/piecestore: handle upload write if provided in first message
we may in the future want to accept writes and commits as part of the
initial request message, just like
https://review.dev.storj.io/c/storj/storj/+/9245

this change is forward compatible but continues to work with
existing clients.

Change-Id: Ifd3ac8606d498a43bb35d0a3751859656e1e8995
2023-02-23 18:33:08 +00:00
JT Olio
ca13eca718 go.mod: bump libuplink to include noise
Change-Id: If5bceb139ce6fdf6c0792b4bb536bc61b54e32bb
2023-02-10 17:03:51 +00:00
JT Olio
ae9ea22193 storagenode/piecestore: return node certificate chain at upload conclusion
uplinks currently get the node's certificate chain over TLS. once Noise
is in use, uplinks will no longer be able to do this. we should start
having the upload request return the certificate chain in the same
release that starts supporting noise.

Change-Id: I619b23cb8e25691bcc62d760f884403a4ccd64a0
2023-02-01 01:49:50 +00:00
Clement Sam
95960572b3 storagenode/piecestore: improve logs for incoming requests
- Adds "Remote Address" field to all INFO logs related to GET,
PUT, and DELETE requests
- Adds Offset and Size fields to all info logs related to GET
requests

Resolves https://github.com/storj/storj/issues/5404

Change-Id: I5dab1867619385362e5f1e0455dfab17d295a37a
2023-01-24 10:23:34 +00:00
JT Olio
8d69837f02 storagenode/piecestore: handle order if provided in first message
we may in the future want to accept orders as part of the initial
request message
(e.g. https://review.dev.storj.io/c/storj/uplink/+/9246).

this change is forward compatible but continues to work with
existing clients.

Change-Id: I475ad50d6cbfee8a1f843383230698e4ef9b9e54
2023-01-20 08:53:22 +00:00
Egon Elbre
9544a670d7 storagenode/pieces: fix concurrent empty and restore trash
This ensures that empty trash and restore trash cannot run at the same
time.

Fixes https://github.com/storj/storj/issues/5416

Change-Id: I9d2e3aa3d66e61e5c8a7427a95208bb96089792d
2023-01-03 15:01:54 +00:00
Michal Niewrzal
d1d617d654 storagenode/piecestore: small cleanup
* normalize ExistsCheckWorkers flag to avoid setting incorrect values
* wait for limiter to finish if context was canceled

Change-Id: I688a395bd958cd09233605fc264d43f91ec45ed1
2022-12-19 12:37:12 +00:00
Michal Niewrzal
5110803102 storagenode/piecestore: add Exists endpoint
Adds a new method, Exists, which can be used to verify which
requested piece IDs exist on the storage node. It verifies only pieces
that belong to the satellite that used the endpoint.

The minimum WASM size was increased a bit.

https://github.com/storj/storj/issues/5415

Change-Id: Ia5f9cadeb526541b2776a8973eb7d50133ad8636
2022-12-17 04:08:26 +00:00
Egon Elbre
ee71fbb41d storagenode/piecestore: start restore trash in the background
Starting restore trash in the background allows the satellite to
continue to the next storagenode without needing to wait until
completion.

Of course, this means the satellite doesn't get feedback on whether it
succeeded or not. This means that restore-trash needs to
be executed several times.

Change-Id: I62d43f6f2e4a07854f6d083a65badf897338083b
2022-12-16 18:15:52 +02:00
Egon Elbre
ff22fc7ddd all: fix deprecated ioutil commands
Change-Id: I59db35116ec7215a1b8e2ae7dbd319fa099adfac
2022-10-11 15:27:29 +00:00
Márton Elek
4b1be6bf8e storagenode/satellite: support different piece hash algorithms
Change-Id: I3db321e79f12f3ebaa249e6c32fa37fd9615687e
2022-08-23 18:15:06 +00:00
Óscar de Arriba
4fdb81c510 storagenode/pieces: allow to configure initial piece scan (#5024)
* Allow configure initial piece scan

* Update tests to include new flag

* Update default config specification

* Rename configuration flag

* Rename variable and fix formatting

* Fix format

* Fix typo

Co-authored-by: Stefan Benten <mail@stefan-benten.de>
Co-authored-by: Clement Sam <clementsam75@gmail.com>
Co-authored-by: littleskunk <jens.heimbuerge@googlemail.com>
2022-08-07 22:40:59 +00:00
Egon Elbre
466832e4bc storagenode/piecestore: check for remote closing
Remote closing during upload or download is entirely expected and
it shouldn't lead to an error in the log.

Bump drpc to get the version that contains correct error code
for it. Also bump errs, which contains a fix for .Has.

Fixes https://github.com/storj/storj/issues/4609

Change-Id: I9297cabcfdc4b3a2c19d478dc729f779a2aef0c3
2022-03-17 19:27:42 +02:00
littleskunk
07fad75912 storagenode/piecestore: upload and download metrics for Grafana alerts (#4280)
* storagenode/piecestore: Add upload and download metrics for Grafana alerts

* storagenode/piecestore: group download metrics by piece action

Change-Id: Ib2a42b60c56c3f581915d512f4907c8db71e4624

Co-authored-by: Clement Sam <clement@storj.io>
2021-11-18 12:50:39 +00:00
Yingrong Zhao
2df41028a3 storagenode/piecestore: add logs for restore trash endpoint
When a satellite operator runs the restore trash command, it's hard for node
operators to know whether their node has received the request. Adding
logs so this process is visible from the node operator's perspective.

Change-Id: I08c670b1b0c5971e6b73e038a0935cb0caaca63d
2021-10-20 15:37:28 +00:00
Clement Sam
0d58172c38 storagenode: add doc.go files for sno packages
Change-Id: I23d4b8b462e1b03718d0c4801cc2aaff520e7356
2021-09-29 08:24:56 +00:00
Egon Elbre
71eb184ef3 storagenode/piecestore: simplify TestTooManyRequests
Change-Id: I1452735bef206bc8cf9bde72fc503fec3ae5b067
2021-09-21 17:18:37 +03:00
TungHoang
e1379bea0f storagenode/piecestore: allow rejecting slow clients
Estimate the speed of uploads by calculating the time a client has been in a non-congested state. The storage node is considered congested when it has used up over 80% of its maximum concurrent requests capacity.

Currently it's disabled by default.
2021-07-01 09:59:04 +03:00
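A rough sketch of the speed estimate described in e1379bea0f follows. It assumes the node counts as congested once live requests exceed a fraction (such as 0.8) of the maximum, and that speed is bytes transferred divided by uncongested time. The names `isCongested` and `tooSlow` and their parameters are hypothetical, not the storagenode configuration or API.

```go
package main

import "time"

// isCongested reports whether live requests exceed the given fraction
// (for example 0.8) of the maximum number of concurrent requests.
func isCongested(live, maxConcurrent int, threshold float64) bool {
	return float64(live) > float64(maxConcurrent)*threshold
}

// tooSlow reports whether a client that transferred bytes while the node was
// uncongested for the given duration is below minBytesPerSecond.
func tooSlow(bytes int64, uncongested time.Duration, minBytesPerSecond float64) bool {
	if uncongested <= 0 {
		return false // no uncongested time observed yet, so no estimate
	}
	return float64(bytes)/uncongested.Seconds() < minBytesPerSecond
}
```

Counting only uncongested time avoids penalizing a client for slowness that the node itself caused while it was busy.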
Yaroslav Vorobiov
c9cfb5ed0c storagenode/retain: add more verbose monkit monitoring
Change-Id: Ibb9804268751b4b1842eb729bc510dba83e9b28b
2021-06-04 20:20:11 +00:00
Egon Elbre
10372afbe4 ci: fix lint errors
Change-Id: Ib5893440807811f77175ccd347aa3f8ca9cccbdf
2021-05-17 13:37:31 +00:00
Egon Elbre
69b149a66f mod: bump uplink
uplink stopped using zap, hence some of the private methods needed to be
changed.

Change-Id: Iac1fae45a40cd3f1649b9f672bf8c250344986d5
2021-05-06 14:48:36 +00:00
Egon Elbre
961e841bd7 all: fix error naming
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:

    ERROR node stats service error: satellitedbs error: node stats database error: no rows

Whereas something like:

    ERROR nodestats service: satellitedbs: nodestatsdb: no rows

Would contain all the necessary information without the stutter.

Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
2021-04-29 15:38:21 +03:00
Yingrong Zhao
307886ffe8 storagenode/piecestore: fix error handling in TestDownload
If a read error is returned, we want to preserve it instead of
overriding it with the close error.

Change-Id: Ie253a453c7b6b62598b89dffd7635b8266357074
2021-04-23 18:43:45 +00:00
Jennifer Johnson
444b1f4757 storagenode/piecestore: update endpoint_test TestDownload to avoid index out of range runtime errors
Change-Id: Ib3c166a1db84f4caafa970eb800a169d35ba3a54
2021-04-07 13:34:35 -04:00
Egon Elbre
86e698f572 pb: use *UnimplementedServer to avoid breaking API changes
Change-Id: I99a34eeb37ac4453411f273511710562a519f57a
2021-03-29 12:26:10 +03:00
Ivan Fraixedes
2223877439 storagenode/piecestore: Log upload size on Upload
Add the upload size to the log lines of the storagenode Upload endpoint
to provide this information to storage node operators.

Change-Id: Ife661d28be72c2bf02579093e21fa811566ac8dd
2020-12-28 10:28:09 +00:00
Egon Elbre
b892a00143 mod: bump dependencies and reenable test
We shouldn't have any EOF issues with the recent drpc fix. Let's re-enable
the test and see whether it's still flaky.

Change-Id: I0de312bcb087c7f70ec9d3281d73d86f971845d5
2020-11-10 10:32:21 +00:00
Egon Elbre
7183dca6cb all: fix defers in loop
defer should not be called in a loop.

Change-Id: Ifa5a25a56402814b974bcdfb0c2fce56df8e7e59
2020-11-02 15:06:38 +02:00
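To illustrate the rule from 7183dca6cb: a `defer` inside a loop only runs when the enclosing function returns, so resources pile up across iterations. A common fix, sketched below with hypothetical helper names (`sumFileSizes`, `fileSize`), is to move the loop body into its own function so each deferred call runs at the end of its iteration. `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"os"
)

// sumFileSizes returns the combined size in bytes of all paths.
// Each file is handled in its own call to fileSize, so the deferred Close
// runs at the end of that iteration instead of when sumFileSizes returns.
func sumFileSizes(paths []string) (int64, error) {
	var total int64
	for _, path := range paths {
		size, err := fileSize(path)
		if err != nil {
			return 0, err
		}
		total += size
	}
	return total, nil
}

// fileSize opens path, reads its size and closes it again.
func fileSize(path string) (size int64, err error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	// Keep any earlier error and add the close error, if there is one.
	defer func() { err = errors.Join(err, f.Close()) }()

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}
```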
Egon Elbre
2268cc1df3 all: fix linter complaints
Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95
2020-10-13 15:59:01 +03:00
Egon Elbre
0bdb952269 all: use keyed special comment
Change-Id: I57f6af053382c638026b64c5ff77b169bd3c6c8b
2020-10-13 15:13:41 +03:00
Moby von Briesen
fbf2c0b242 storagenode/orders: Refactor orders store
Abstract details of writing and reading data to/from orders files so
that adding V1 and future maintenance are easier.

Change-Id: I85f4a91761293de1a782e197bc9e09db228933c9
2020-10-06 15:28:07 -04:00
JT Olio
1f711523d5 satellite/repair: switch to piecestore.UploadReader part 2
Change-Id: I5a91d2960b037c7a3c96d01bc40404316ba028e3
2020-09-01 12:40:54 -06:00
JT Olio
b872fe52a1 satellite/repair: switch to piecestore.UploadReader
Change-Id: Ia99ad2cf5422e6ba1d98b32946740f9cadba7b6d
2020-09-01 09:26:54 -06:00
Cameron Ayer
ca0c1a5f0c storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.

Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
2020-08-31 21:20:49 +00:00
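The periodic writability check from ca0c1a5f0c could look roughly like this. It is a sketch using only the standard library. `verifyWritable` and `watchWritable` are hypothetical names, and returning the error to the caller (which would then shut the node down) is an assumption about how the failure is propagated.

```go
package main

import (
	"context"
	"errors"
	"os"
	"time"
)

// verifyWritable creates a temporary file in dir, writes to it and removes it.
// Any failure along the way is reported as an error.
func verifyWritable(dir string) (err error) {
	f, err := os.CreateTemp(dir, "write-test-*")
	if err != nil {
		return err
	}
	// Always remove the file, and keep a removal error if there is one.
	defer func() { err = errors.Join(err, os.Remove(f.Name())) }()

	if _, err := f.Write([]byte("ok")); err != nil {
		return errors.Join(err, f.Close())
	}
	return f.Close()
}

// watchWritable checks dir immediately and then once per interval. It returns
// the first check failure, or ctx.Err() once the context is canceled.
func watchWritable(ctx context.Context, dir string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := verifyWritable(dir); err != nil {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```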
Moby von Briesen
68b67c83a7 storagenode/{orders,piecestore}: Always unlock unsent orders file, even with an empty order.
When we call ordersStore.BeginEnqueue, the unsent orders file for that
satellite and hour is prevented from being sent. It is freed when the
commit callback returned by BeginEnqueue is used. This change ensures
that we always call the commit callback, even when we have an empty
order or an order with Amount <= 0.

Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93
2020-08-21 11:35:31 -04:00
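The fix in 68b67c83a7 comes down to releasing the window on every code path. A minimal sketch, assuming a hypothetical `windowStore` whose `beginEnqueue` returns a release callback (the real `ordersStore.BeginEnqueue` signature is not shown in this log), uses `defer` so that the early return for an empty order still frees the window:

```go
package main

import "sync"

// windowStore is a hypothetical stand-in for the orders store. It counts
// in-flight enqueues per window; a window may only be sent when it is idle.
type windowStore struct {
	mu     sync.Mutex
	active map[string]int
}

func newWindowStore() *windowStore {
	return &windowStore{active: map[string]int{}}
}

// beginEnqueue marks the window busy and returns a callback that releases it.
// The callback is safe to call more than once.
func (s *windowStore) beginEnqueue(window string) (release func()) {
	s.mu.Lock()
	s.active[window]++
	s.mu.Unlock()

	var once sync.Once
	return func() {
		once.Do(func() {
			s.mu.Lock()
			s.active[window]--
			s.mu.Unlock()
		})
	}
}

// canSend reports whether no enqueue is in flight for the window.
func (s *windowStore) canSend(window string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.active[window] == 0
}

// handleOrder reports whether the order was recorded. Deferring release
// means the early return for amount <= 0 still frees the window.
func (s *windowStore) handleOrder(window string, amount int64) bool {
	release := s.beginEnqueue(window)
	defer release()

	if amount <= 0 {
		return false // nothing to record, but the window is still released
	}
	// A real store would write the order to the window's file here.
	return true
}
```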
Jeff Wendling
91698207cf storagenode: live tracking of order window usage
This change accomplishes multiple things:

1. Instead of having a max in flight time, which means
   we effectively have a minimum bandwidth for uploads
   and downloads, we keep track of what windows have
   active requests happening in them.

2. We don't double check when we save the order to see if it
   is too old: by then, it's too late. A malicious uplink
   could just submit orders outside of the grace window and
   receive all the data, but the node would just not commit
   it, so the uplink gets free traffic. Because the endpoints
   also check for the order being too old, this would be a
   very tight race that depends on knowledge of the node system
   clock, but best to not have the race exist. Instead, we piggy
   back off of the in flight tracking and do the check when
   we start to handle the order, and commit at the end.

3. Change the functions that send orders and list unsent
   orders to accept a time at which that operation is
   happening. This way, in tests, we can pretend we're
   listing or sending far into the future after the windows
   are available to send, rather than exposing test functions
   to modify internal state about the grace period to get
   the desired effect. This brings tests closer to actual
   usage in production.

4. Change the calculation for if an order is allowed to be
   enqueued due to the grace period to just look at the
   order creation time, rather than some computation involving
   the window it will be in. In this way, you can easily
   answer the question of "will this order be accepted?" by
   asking "is it older than X?" where X is the grace period.

5. Increases the frequency we check to send up orders to once
   every 5 minutes instead of once every hour because we already
   have hour-long buffering due to the windows. This decreases
   the maximum latency that an order will be reported back to
   the satellite by 55 minutes.

Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036
2020-08-19 19:42:33 +00:00
Cameron Ayer
0155c21b44 private/testplanet, storagenode/{monitor,pieces}: write storage dir verification file on run and verify on loop
On run, write the storage directory verification file.

Every time the node runs it will write the file even if it already exists.
The reason we do this is because if the verification file is missing, the SN
doesn't know whether it is an incorrect directory, or it simply hasn't written
the file yet, and we want to keep nodes running without needing operator intervention.

Once this change has been a part of the minimum version for several releases,
we will move the file creation from the run command to the setup
command. Run will only verify its existence.

Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb
2020-08-19 19:12:21 +00:00
Moby von Briesen
708cb48aa6 storagenode/orders: implement orders filestore on storagenode
* Add all new orders to the orders filestore instead of the database.
* Submit orders from the filestore to the new satellite SettleWindow
endpoint.

The orders filestore will eventually replace the orders DB completely.
For now, we will still be checking the orders DB and submitting those
orders if they exist. In a later release, we will completely remove the
orders DB, but we need both the DB and filestore for the transitionary
period.

Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc
2020-08-19 15:00:35 +00:00
Cameron Ayer
1f5d5235a6 storagenode/{monitor,piecestore}: if free disk < expected available space, return free disk
We only sync free disk and available space, if necessary, on startup. If the
SN's disk fills up with non-storj data, we will not know about it when reporting
available space to the satellite.

Solution: whenever we check the node's capacity, double check free disk.
If free disk < expected available space, return free disk.

Change-Id: I66265c16e03be45b6e1f5817c70df7eac0a76455
2020-07-22 15:08:37 +00:00
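The capacity rule in 1f5d5235a6 can be sketched as follows. This is a minimal illustration, not the storagenode code: the function name `reportedAvailable` and its parameters are hypothetical, and clamping a negative remainder to zero is an added assumption.

```go
package main

// reportedAvailable returns the space a node should report as available.
// allocated, used and freeDisk are byte counts; all names are hypothetical.
func reportedAvailable(allocated, used, freeDisk int64) int64 {
	expected := allocated - used
	if expected < 0 {
		expected = 0 // assumption: never report negative space
	}
	// If the disk has less room than the bookkeeping expects (for example
	// because non-storj data filled it up), trust the disk.
	if freeDisk < expected {
		return freeDisk
	}
	return expected
}
```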
Egon Elbre
e70da5cd4e all: fix comments
Change-Id: I2d2307e3fab87de47a72b3595d051e2c95ff4f8a
2020-07-16 19:13:14 +03:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Moby von Briesen
e9dd5b2845 storagenode/piecestore: Properly log/send metrics for all successful pieces
When an uplink or repair worker finishes uploading a piece to a
storagenode, it has no reason to wait another round trip after the piece
is committed to gracefully close the connection - in many (most?) cases,
the connection is simply canceled once the upload is complete. This has
the unintended side effect of producing a lot of "piece canceled" logs
and metrics on the storagenode side, when the reality is that the piece
uploads were successful, and not really canceled. This commit fixes
that.

Change-Id: Icbc1f7857d380134560219c1c19c186df2783cd0
2020-07-07 15:19:17 +00:00