if a drpc.ClosedError was returned, it would always take the
first (failure) branch, despite the second branch's existence.
Change-Id: Ife3b27869c4e9d37ca2914e2d1d1a2c60d326309
Storagenode download metrics are not accurate:
* the current code bumps the cancel metrics only for specific error messages, but there are cases where the error is already handled (err == nil)
* instead of the full size of the piece we need to use the number of bytes actually downloaded
Change-Id: I6ca75770e2d40bf514f5e273785c78e02968c919
we may in the future want to accept writes and commits as part of the
initial request message, just like
https://review.dev.storj.io/c/storj/storj/+/9245
this change is forward compatible but continues to work with
existing clients.
Change-Id: Ifd3ac8606d498a43bb35d0a3751859656e1e8995
uplinks currently get the node's certificate chain over TLS. once Noise
is in use, uplinks will no longer be able to do this. we should start
having the upload request return the certificate chain in the same
release that starts supporting noise.
Change-Id: I619b23cb8e25691bcc62d760f884403a4ccd64a0
- Adds "Remote Address" field to all INFO logs related to GET,
PUT, and DELETE requests
- Adds Offset and Size fields to all info logs related to GET
requests
Resolves https://github.com/storj/storj/issues/5404
Change-Id: I5dab1867619385362e5f1e0455dfab17d295a37a
we may in the future want to accept orders as part of the initial
request message
(e.g. https://review.dev.storj.io/c/storj/uplink/+/9246).
this change is forward compatible but continues to work with
existing clients.
Change-Id: I475ad50d6cbfee8a1f843383230698e4ef9b9e54
This ensures that empty trash and restore trash cannot run at the same
time.
Fixes https://github.com/storj/storj/issues/5416
Change-Id: I9d2e3aa3d66e61e5c8a7427a95208bb96089792d
* normalize ExistsCheckWorkers flag to avoid setting incorrect values
* wait for limiter to finish if context was canceled
Change-Id: I688a395bd958cd09233605fc264d43f91ec45ed1
Adds a new method Exists which can be used to verify which of the
requested piece IDs exist on the storage node. It verifies only pieces
which belong to the satellite that called the endpoint.
Minimum WASM size was increased a bit.
https://github.com/storj/storj/issues/5415
Change-Id: Ia5f9cadeb526541b2776a8973eb7d50133ad8636
Starting restore trash in the background allows the satellite to
continue to the next storagenode without needing to wait until
completion.
Of course, this means the satellite doesn't get feedback on whether it
succeeded or not. This means that restore-trash needs to
be executed several times.
Change-Id: I62d43f6f2e4a07854f6d083a65badf897338083b
* Allow configuring the initial piece scan
* Update tests to include new flag
* Update default config specification
* Rename configuration flag
* Rename variable and fix formatting
* Fix format
* Fix typo
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
Co-authored-by: Clement Sam <clementsam75@gmail.com>
Co-authored-by: littleskunk <jens.heimbuerge@googlemail.com>
Remote closing during upload or download is entirely expected and
it shouldn't lead to an error in the log.
Bump drpc to get the version that contains correct error code
for it. Also bump errs, which contains a fix for .Has.
Fixes https://github.com/storj/storj/issues/4609
Change-Id: I9297cabcfdc4b3a2c19d478dc729f779a2aef0c3
* storagenode/piecestore: Add upload and download metrics for Grafana alerts
* storagenode/piecestore: group download metrics by piece action
Change-Id: Ib2a42b60c56c3f581915d512f4907c8db71e4624
Co-authored-by: Clement Sam <clement@storj.io>
When a satellite operator runs the restore trash command, it's hard for a
node operator to know whether their node has received the request. Add
logs so this process is visible from the node operator's perspective.
Change-Id: I08c670b1b0c5971e6b73e038a0935cb0caaca63d
Estimate the speed of uploads by calculating the time a client has been in a non-congested state. A storage node is considered congested when it has used up over 80% of its maximum concurrent requests capacity.
Currently it's disabled by default.
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:
ERROR node stats service error: satellitedbs error: node stats database error: no rows
Whereas something like:
ERROR nodestats service: satellitedbs: nodestatsdb: no rows
Would contain all the necessary information without the stutter.
Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
Add the upload size to the log lines of the storagenode Upload endpoint
to provide this information to storage node operators.
Change-Id: Ife661d28be72c2bf02579093e21fa811566ac8dd
We shouldn't have any EOF issues with the recent drpc fix. Let's
re-enable the test and see whether it's still flaky.
Change-Id: I0de312bcb087c7f70ec9d3281d73d86f971845d5
Abstract details of writing and reading data to/from orders files so
that adding V1 and future maintenance are easier.
Change-Id: I85f4a91761293de1a782e197bc9e09db228933c9
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.
Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
When we call ordersStore.BeginEnqueue, the unsent orders file for that
satellite and hour is prevented from being sent. It is freed when the
commit callback returned by BeginEnqueue is used. This change ensures
that we always call the commit callback, even when we have an empty
order or an order with Amount <= 0.
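The pattern can be sketched as follows. The types order and orderStore
and the function saveOrder are simplified, hypothetical stand-ins (the
real BeginEnqueue takes more arguments and can return an error); the
sketch only shows that the commit callback is invoked on every path:

```go
package main

// order and orderStore are hypothetical stand-ins, not the real types.
type order struct{ Amount int64 }

type orderStore struct {
	inProgress int      // enqueues begun but not yet committed
	saved      []*order // orders actually persisted
}

// BeginEnqueue marks the window as busy and returns the commit callback
// that releases it again. Passing nil releases the window without
// storing anything.
func (s *orderStore) BeginEnqueue() (commit func(o *order)) {
	s.inProgress++
	return func(o *order) {
		s.inProgress--
		if o != nil {
			s.saved = append(s.saved, o)
		}
	}
}

// saveOrder always calls commit, including for an empty order or an
// order with Amount <= 0, so the window is never left blocked.
func saveOrder(s *orderStore, o *order) {
	commit := s.BeginEnqueue()
	if o == nil || o.Amount <= 0 {
		commit(nil)
		return
	}
	commit(o)
}
```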
Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93
This change accomplishes multiple things:
1. Instead of having a max in flight time, which means
we effectively have a minimum bandwidth for uploads
and downloads, we keep track of what windows have
active requests happening in them.
2. We don't double check when we save the order to see if it
is too old: by then, it's too late. A malicious uplink
could just submit orders outside of the grace window and
receive all the data, but the node would just not commit
it, so the uplink gets free traffic. Because the endpoints
also check for the order being too old, this would be a
very tight race that depends on knowledge of the node system
clock, but best to not have the race exist. Instead, we piggy
back off of the in flight tracking and do the check when
we start to handle the order, and commit at the end.
3. Change the functions that send orders and list unsent
orders to accept a time at which that operation is
happening. This way, in tests, we can pretend we're
listing or sending far into the future after the windows
are available to send, rather than exposing test functions
to modify internal state about the grace period to get
the desired effect. This brings tests closer to actual
usage in production.
4. Change the calculation for if an order is allowed to be
enqueued due to the grace period to just look at the
order creation time, rather than some computation involving
the window it will be in. In this way, you can easily
answer the question of "will this order be accepted?" by
asking "is it older than X?" where X is the grace period.
5. Increases the frequency we check to send up orders to once
every 5 minutes instead of once every hour because we already
have hour-long buffering due to the windows. This decreases
the maximum latency that an order will be reported back to
the satellite by 55 minutes.
Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036
On run, write the storage directory verification file.
Every time the node runs it will write the file even if it already exists.
The reason we do this is because if the verification file is missing, the SN
doesn't know whether it is an incorrect directory, or it simply hasn't written
the file yet, and we want to keep nodes running without needing operator intervention.
Once this change has been a part of the minimum version for several releases,
we will move the file creation from the run command to the setup
command. Run will only verify its existence.
Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb
* Add all new orders to the orders filestore instead of the database.
* Submit orders from the filestore to the new satellite SettleWindow
endpoint.
The orders filestore will eventually replace the orders DB completely.
For now, we will still be checking the orders DB and submitting those
orders if they exist. In a later release, we will completely remove the
orders DB, but we need both the DB and filestore for the transitionary
period.
Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc
We only sync free disk and available space, if necessary, on startup. If the
SN's disk fills up with non-storj data, we will not know about it when reporting
available space to the satellite.
Solution: whenever we check the node's capacity, double check free disk.
If free disk < expected available space, return free disk.
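In other words, the reported value is the smaller of the two numbers. A
minimal sketch, with a hypothetical function name and both inputs in
bytes:

```go
package main

// reportedAvailable is a hypothetical helper: expectedAvailable is what
// the node believes it has left, freeDisk is what the filesystem
// reports. The smaller of the two is what gets reported.
func reportedAvailable(expectedAvailable, freeDisk int64) int64 {
	if freeDisk < expectedAvailable {
		return freeDisk
	}
	return expectedAvailable
}
```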
Change-Id: I66265c16e03be45b6e1f5817c70df7eac0a76455
When an uplink or repair worker finishes uploading a piece to a
storagenode, it has no reason to wait another round trip after the piece
is committed to gracefully close the connection - in many (most?) cases,
the connection is simply canceled once the upload is complete. This has
the unintended side effect of producing a lot of "piece canceled" logs
and metrics on the storagenode side, when the reality is that the piece
uploads were successful, and not really canceled. This commit fixes
that.
Change-Id: Icbc1f7857d380134560219c1c19c186df2783cd0
This will give storagenode operators a better idea of whether the memory
allocated to the usedserials store is sufficient.
Change-Id: I5c30f2e39473a573f43409511ad9e2e32680479c
Part 2 of moving usedserials in memory
* Drop usedserials table in storagenodedb
* Use in-memory usedserials store in place of db for order limit
verification
* Update order limit grace period to be only one hour - this means
uplinks must send their order limits to storagenodes within an hour of
receiving them
Change-Id: I37a0e1d2ca6cb80854a3ef495af2d1d1f92e9f03
Implement an in-memory store for keeping track of order limit serial
numbers. It automatically deletes items if its size exceeds a configured
limit.
This change is part 1: it creates the store.
In part 2, the in-memory store will replace the usedserials database.
Change-Id: I36f540ed809f034a27c1d7cede8a0a8b080af818