Commit Graph

127 Commits

Author SHA1 Message Date
JT Olio
ae9ea22193 storagenode/piecestore: return node certificate chain at upload conclusion
uplinks currently get the node's certificate chain over TLS. once Noise
is in use, uplinks will no longer be able to do this. we should start
having the upload request return the certificate chain in the same
release that starts supporting noise.

Change-Id: I619b23cb8e25691bcc62d760f884403a4ccd64a0
2023-02-01 01:49:50 +00:00
Clement Sam
95960572b3 storagenode/piecestore: improve logs for incoming requests
- Adds "Remote Address" field to all INFO logs related to GET,
PUT, and DELETE requests
- Adds Offset and Size fields to all info logs related to GET
requests

Resolves https://github.com/storj/storj/issues/5404

Change-Id: I5dab1867619385362e5f1e0455dfab17d295a37a
2023-01-24 10:23:34 +00:00
JT Olio
8d69837f02 storagenode/piecestore: handle order if provided in first message
we may in the future want to accept orders as part of the initial
request message
(e.g. https://review.dev.storj.io/c/storj/uplink/+/9246).

this change is forward compatible but continues to work with
existing clients.

Change-Id: I475ad50d6cbfee8a1f843383230698e4ef9b9e54
2023-01-20 08:53:22 +00:00
Egon Elbre
9544a670d7 storagenode/pieces: fix concurrent empty and restore trash
This ensures that empty trash and restore trash cannot run at the same
time.

Fixes https://github.com/storj/storj/issues/5416

Change-Id: I9d2e3aa3d66e61e5c8a7427a95208bb96089792d
2023-01-03 15:01:54 +00:00
Michal Niewrzal
d1d617d654 storagenode/piecestore: small cleanup
* normalize ExistsCheckWorkers flag to avoid setting incorrect values
* wait for limiter to finish if context was canceled

Change-Id: I688a395bd958cd09233605fc264d43f91ec45ed1
2022-12-19 12:37:12 +00:00
Michal Niewrzal
5110803102 storagenode/piecestore: add Exists endpoint
Adds new method Exists which can be used to verify which
requested piece ids exists on storage node. Will verify only pieces
which belongs to the satellite that used that endpoint.

Minum WASM size was increased a bit.

https://github.com/storj/storj/issues/5415

Change-Id: Ia5f9cadeb526541b2776a8973eb7d50133ad8636
2022-12-17 04:08:26 +00:00
Egon Elbre
ee71fbb41d storagenode/piecestore: start restore trash in the background
Starting restore trash in the background allows the satellite to
continue to the next storagenode without needing to wait until
completion.

Of course, this means the satellite doesn't get feedback whether it
succeeds successfully or not. This means that the restore-trash needs to
be executed several times.

Change-Id: I62d43f6f2e4a07854f6d083a65badf897338083b
2022-12-16 18:15:52 +02:00
Márton Elek
4b1be6bf8e storagenode/satellite: support different piece hash algorithms
Change-Id: I3db321e79f12f3ebaa249e6c32fa37fd9615687e
2022-08-23 18:15:06 +00:00
Óscar de Arriba
4fdb81c510
storagenode/pieces: allow to configure initial piece scan (#5024)
* Allow configure initial piece scan

* Update tests to include new flag

* Update default config specification

* Rename configuration flag

* Rename variable and fix formatting

* Fix format

* Fix typo

Co-authored-by: Stefan Benten <mail@stefan-benten.de>
Co-authored-by: Clement Sam <clementsam75@gmail.com>
Co-authored-by: littleskunk <jens.heimbuerge@googlemail.com>
2022-08-07 22:40:59 +00:00
Egon Elbre
466832e4bc storagenode/piecestore: check for remote closing
Remote closing during upload or download is entirely expected and
it shouldn't lead to an error in the log.

Bump drpc to get the version that contains correct error code
for it. Also bump errs, which contains a fix for .Has.

Fixes https://github.com/storj/storj/issues/4609

Change-Id: I9297cabcfdc4b3a2c19d478dc729f779a2aef0c3
2022-03-17 19:27:42 +02:00
littleskunk
07fad75912
storagenode/piecestore: upload and download metrics for Grafana alerts (#4280)
* storagenode/piecestore: Add upload and download metrics for Grafana alerts

* storagenode/piecestore: group download metrics by piece action

Change-Id: Ib2a42b60c56c3f581915d512f4907c8db71e4624

Co-authored-by: Clement Sam <clement@storj.io>
2021-11-18 12:50:39 +00:00
Yingrong Zhao
2df41028a3 storagenode/piecestore: add logs for restore trash endpoint
When a satellite operator runs restore trash command, it's hard for node
operator to know whether their node has received the request. Adding
logs so this process is visible from node operators perspective.

Change-Id: I08c670b1b0c5971e6b73e038a0935cb0caaca63d
2021-10-20 15:37:28 +00:00
TungHoang
e1379bea0f
storagenode/piecestore: allow rejecting slow clients
Estimate speed of the uploads by calculating the time a client has been in a non-congested state. Storage node is uncongested when it has used up over 80% of its maximum concurrent requests capacity.

Currently it's disabled by default.
2021-07-01 09:59:04 +03:00
Yaroslav Vorobiov
c9cfb5ed0c storagenode/retain: add more verbose monkit monitoring
Change-Id: Ibb9804268751b4b1842eb729bc510dba83e9b28b
2021-06-04 20:20:11 +00:00
Egon Elbre
10372afbe4 ci: fix lint errors
Change-Id: Ib5893440807811f77175ccd347aa3f8ca9cccbdf
2021-05-17 13:37:31 +00:00
Egon Elbre
86e698f572 pb: use *UnimplementedServer to avoid breaking API changes
Change-Id: I99a34eeb37ac4453411f273511710562a519f57a
2021-03-29 12:26:10 +03:00
Ivan Fraixedes
2223877439 storgenode/piecestore: Log upload size on Upload
Add the upload size to the log lines of the storagenode Upload endpoint
to provides the information to Storage node operators.

Change-Id: Ife661d28be72c2bf02579093e21fa811566ac8dd
2020-12-28 10:28:09 +00:00
Egon Elbre
2268cc1df3 all: fix linter complaints
Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95
2020-10-13 15:59:01 +03:00
Moby von Briesen
fbf2c0b242 storagenode/orders: Refactor orders store
Abstract details of writing and reading data to/from orders files so
that adding V1 and future maintenance are easier.

Change-Id: I85f4a91761293de1a782e197bc9e09db228933c9
2020-10-06 15:28:07 -04:00
Cameron Ayer
ca0c1a5f0c storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.

Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
2020-08-31 21:20:49 +00:00
Moby von Briesen
68b67c83a7 storagenode/{orders,piecestore}: Always unlock unsent orders file, even with an empty order.
When we call ordersStore.BeginEnqueue, the unsent orders file for that
satellite and hour is prevented from being sent. It is freed when the
commit callback returned by BeginEnqueue is used. This change ensures
that we always call the commit callback, even when we have an empty
order or an order with Amount <= 0.

Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93
2020-08-21 11:35:31 -04:00
Jeff Wendling
91698207cf storagenode: live tracking of order window usage
This change accomplishes multiple things:

1. Instead of having a max in flight time, which means
   we effectively have a minimum bandwidth for uploads
   and downloads, we keep track of what windows have
   active requests happening in them.

2. We don't double check when we save the order to see if it
   is too old: by then, it's too late. A malicious uplink
   could just submit orders outside of the grace window and
   receive all the data, but the node would just not commit
   it, so the uplink gets free traffic. Because the endpoints
   also check for the order being too old, this would be a
   very tight race that depends on knowledge of the node system
   clock, but best to not have the race exist. Instead, we piggy
   back off of the in flight tracking and do the check when
   we start to handle the order, and commit at the end.

3. Change the functions that send orders and list unsent
   orders to accept a time at which that operation is
   happening. This way, in tests, we can pretend we're
   listing or sending far into the future after the windows
   are available to send, rather than exposing test functions
   to modify internal state about the grace period to get
   the desired effect. This brings tests closer to actual
   usage in production.

4. Change the calculation for if an order is allowed to be
   enqueued due to the grace period to just look at the
   order creation time, rather than some computation involving
   the window it will be in. In this way, you can easily
   answer the question of "will this order be accepted?" by
   asking "is it older than X?" where X is the grace period.

5. Increases the frequency we check to send up orders to once
   every 5 minutes instead of once every hour because we already
   have hour-long buffering due to the windows. This decreases
   the maximum latency that an order will be reported back to
   the satellite by 55 minutes.

Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036
2020-08-19 19:42:33 +00:00
Cameron Ayer
0155c21b44 private/testplanet, storagenode/{monitor,pieces}: write storage dir verification file on run and verify on loop
On run, write the storage directory verification file.

Every time the node runs it will write the file even if it already exists.
The reason we do this is because if the verification file is missing, the SN
doesn't know whether it is an incorrect directory, or it simply hasn't written
the file yet, and we want to keep nodes running without needing operator intervention.

Once this change has been a part of the minimum version for several releases,
we will move the file creation from the run command to the setup
command. Run will only verify its existence.

Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb
2020-08-19 19:12:21 +00:00
Moby von Briesen
708cb48aa6 storagenode/orders: implement orders filestore on storagenode
* Add all new orders to the orders filestore instead of the database.
* Submit orders from the filestore to the new satellite SettleWindow
endpoint.

The orders filestore will eventually replace the orders DB completely.
For now, we will still be checking the orders DB and submitting those
orders if they exist. In a later release, we will completely remove the
orders DB, but we need both the DB and filestore for the transitionary
period.

Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc
2020-08-19 15:00:35 +00:00
Cameron Ayer
1f5d5235a6 storagenode/{monitor,piecestore}: if free disk < expected available space, return free disk
We only sync free disk and available space, if necessary, on startup. If the
SNs disk fills up with non-storj data, we will not know about it when reporting
available space to the satellite.

Solution: whenever we check the node's capacity, double check free disk.
If free disk < than expected available space, return free disk.

Change-Id: I66265c16e03be45b6e1f5817c70df7eac0a76455
2020-07-22 15:08:37 +00:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Moby von Briesen
e9dd5b2845 storagenode/piecestore: Properly log/send metrics for all successful pieces
When an uplink or repair work finishes uploading a piece to a
storagenode, it has no reason to wait another round trip after the piece
is committed to gracefully close the connection - in many (most?) cases,
the connection is simply canceled once the upload is complete. This has
the unintended side effect of producing a lot of "piece canceled" logs
and metrics on the storagenode side, when the reality is that the piece
uploads were successful, and not really canceled. This commit fixes
that.

Change-Id: Icbc1f7857d380134560219c1c19c186df2783cd0
2020-07-07 15:19:17 +00:00
Egon Elbre
4d0cb1af7e storagenode/piecestore: verify before checking free disk
Change-Id: I3fe0383f9b13b99ef9d63ff235616ff204cf6d76
2020-06-02 17:49:14 +03:00
Qweder93
89c9672ce0 storagenode/piecestore: available storage check added in Upload
Change-Id: I71e9e5f335d4320d5de8b374fe747fec43179f78
2020-06-01 16:55:22 +00:00
Moby von Briesen
dc57640d9c storagenode/piecestore: switch usedserials db for in-memory usedserials store
Part 2 of moving usedserials in memory
* Drop usedserials table in storagenodedb
* Use in-memory usedserials store in place of db for order limit
verification
* Update order limit grace period to be only one hour - this means
uplinks must send their order limits to storagenodes within an hour of
receiving them

Change-Id: I37a0e1d2ca6cb80854a3ef495af2d1d1f92e9f03
2020-05-28 12:52:52 -04:00
littleskunk
2fbb34c3ea
nodeselection: Increase minimum free space to 500MB (#3898) 2020-05-25 12:13:28 +02:00
littleskunk
ef2671927d
storagenode/piecestore: move queue size defaults (#3881) 2020-05-15 19:10:26 +02:00
Egon Elbre
ec589a8289 all: fix comments about grpc
Change-Id: Id830fbe2d44f083c88765561b6c07c5689afe5bd
2020-05-11 13:05:34 +03:00
Egon Elbre
7d29f2e0d3 all: remove drpc wrappers
Change-Id: I45016f7d2a771dc00776196c1f531f3343e93b40
2020-05-11 08:20:34 +03:00
Egon Elbre
e6d5ce6b77 all: remove grpc
It seems everyone has migrated to drpc.

Change-Id: Ica6b2d0bdef68c6603083f2963458843eca71e9e
2020-05-10 06:36:09 +00:00
Jeff Wendling
57eb8a17e2 storagenode: allow configuring database path independently
Fixes #3852

Change-Id: I021c29c4dd7c393399f6abef41d8457514032833
2020-05-04 06:04:31 +00:00
Matt Robinson
4a95ad0b2f
storagenode/piecestore: remove error from info log for download canceled (#3869) 2020-04-30 18:42:16 +03:00
Isaac Hess
db0371703f storagenode/pieces: Return UnhandledCount to satellite
When we receive a piece deletion request, include the number of piece
IDs we couldn't add to the queue in the reponse

Change-Id: Ibebbe92ac50105bb5c74b18211ed38d468eb33f3
2020-04-27 08:56:56 -06:00
Isaac Hess
a785d37157 storagenode/pieces: Process deletes asynchronously
To improve delete performance, we want to process deletes asynchronously
once the message has been received from the satellite. This change makes
it so that storagenodes will send the delete request to a piece Deleter,
which will process a "best-effort" delete asynchronously and return a
success message to the satellite.

There is a configurable number of max delete workers and a max delete
queue size.

Change-Id: I016b68031f9065a9b09224f161b6783e18cf21e5
2020-04-23 11:51:19 -06:00
littleskunk
1336070fec
storagenode/piecestore: add missing log message about audit errors (#3861)
* storagenode/piecestore: add missing log message about audit errors
* storagenode/piecestore: add monkit data for oder limit verification errors
2020-04-23 13:20:47 -04:00
Jess G
dc78cd9634
storagenode/piecestore: fix annoying info log for upload canceled (#3857)
* storagenode/piecestore: fix annoying info upload canceled log

Change-Id: I3dd0f44226e7b946a2b30c3d0f30f28749ca6e88

* keep as info

Change-Id: I7ebb5c19e4865e3030a8a6bb7f6279d316853e89

Co-authored-by: Fadila <Fadila82@users.noreply.github.com>
2020-04-15 14:15:07 -07:00
Jess G
75b9a5971e
satellite: update log levels (#3851)
* satellite: update log levels

Change-Id: I86bc32e042d742af6dbc469a294291a2e667e81f

* log version on start up for every service

Change-Id: Ic128bb9c5ac52d4dc6d6c4cb3059fbad73f5d3de

* Use monkit for tracking failed ip resolutions

Change-Id: Ia5aa71d315515e0c5f62c98d9d115ef984cd50c2

* fix compile errors

Change-Id: Ia33c8b6e34e780bd1115120dc347a439d99e83bf

* add request limit value to storage node rpc err

Change-Id: I1ad6706a60237928e29da300d96a1bafa94156e5

* we cant track storage node ids in monkit metrics so lets use logging to track that for expired orders

Change-Id: I1cc1d240b29019ae2f8c774792765df3cbeac887

* fix build errs

Change-Id: I6d0ffe058e9a38b7ed031c85a29440f3d68e8d47
2020-04-15 12:32:22 -07:00
Egon Elbre
1b6ab173a8 private/context2: moved to storj.io/common/context2
Change-Id: Ic1dd1ed645ff3e1057c9b2b143e2c3ddf29d678e
2020-03-20 14:39:46 +00:00
Jennifer Johnson
1c1750e6be removes bandwidth limiting
On satellite, remove all references to free_bandwidth column in nodes table.
On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated.

Protobuf message, NodeCapacity, is left intact for backwards compatibility.
Once this is released to all satellites, we can drop the column from the DB.

Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838
2020-03-04 14:04:00 +00:00
Cameron Ayer
7244a6a84e storagenode/{contact, piecestore}: implement low disk notification with cooldown
When a storagenode begins to run low on capacity, we want to notify
the satellite before completely running out of space. To achieve this,
at the end of an upload request, the SN checks if its available space has
fallen below a certain threshold. If so, trigger a notification to the
satellites.

The new NotifyLowDisk method on the monitor chore is implemented using the
common/syn2.Cooldown type, which allows us to execute contact only once
within a given timeframe; avoiding hammering the satellites with requests.
This PR contains changes to the storagenode/contact package, namely moving
methods involving the actual satellite communication out of Chore and into
Service. This allows us to ping satellites from the monitor chore

Change-Id: I668455748cdc6741291b61130d8ef9feece86458
2020-03-03 10:45:37 -05:00
Egon Elbre
64330c55b3 all: use pbgrpc
common/pb moved grpc to a separate package common/pb/pbgrpc.
This updates this repository to use it.

Change-Id: I2de2a190688871cf9cb61f7ea511f8a01e264e4e
2020-02-26 21:27:47 +02:00
Cameron Ayer
d578102672 storagenode/piecestore: add workgroup to endpoint to prevent stray goroutine after shutdown
Change-Id: Ie8444c3c8f870745b73342de2e9a93027fcad371
2020-02-24 21:38:52 +00:00
Cameron Ayer
f22bddf122 {storagenode/contact, private/testplanet}: remove ErrFailureToStart and panic in testplanet.Start
Change-Id: I252e8c9407400af7bda95a7657c8154660c3c801
2020-02-24 18:24:23 +00:00
Yingrong Zhao
5011e78311 storagenode/piecestore: remove unused DeletePiece endpoint
With commit: 3331b443e7, satellite will
start calling `DeletePieces`. Therefore, we can remove the old endpoint
once the above commit is deployed with all satellites

Change-Id: I0124bc00a7cb808d119eb59f8fcd7fadf68158bb
2020-02-21 21:03:49 +00:00
Cameron Ayer
3e70a893dd storagenode/{piecestore, contact}: report capacity to satellites if below specific threshold
Curently, storage nodes only report their capacity to satellites
once per hour. If a node fills up, it will fail all uploads until
the next contact cycle begins. With these changes, at the end of an
upload we check whether the MinimumDiskSpace threshold has been
passed. If so, trigger the monitor chore to update the node's
capacity, then trigger the contact chore to report the new
capacity to the satellites

Change-Id: Ie6aadaade1e2c12c87e03f8ff9059a50121380a0
2020-02-18 15:42:48 -05:00