Storagenodes are currently getting larger signed orders due to
a performance optimization in uplink, which now messes with the
ingress graph because the storagenode plots the graph using
the order amount instead of actually uploaded bytes, which this
change fixes.
The egress graph might have a similar issue if the order amount
is larger than the actually downloaded bytes but since we pay
for orders, whether fulfilled or unfulfilled, we continue using
the order amount for the egress graph.
Resolves https://github.com/storj/storj/issues/5853
Change-Id: I2af7ee3ff249801ce07714bba055370ebd597c6e
* storagenode/orders/ordersfiles: unit test coverage
This change implements unit testing on common.go from the ordersfile package.
* storagenode/orders/ordersfiles: unit test coverage
This change implements the zeebo assert library instead of gotools as to not introduce a new dependency.
* storagenode/orders/ordersfiles: unit test coverage
This change implements the zeebo assert library instead of gotools as to not introduce a new dependency.
Lazyfilewalker was failing with SIGPIPE which was quite
misleading. The command was failing because the
the value of the --lower-io-priority flag was assumed
to be an arguement since it was passed as
"--lower-io-priority true" instead "--lower-io-priority=true"
Resolves https://github.com/storj/storj/issues/5900
Change-Id: Icf79fcce76dafee21659d76ee0ce19d8520c8f1d
Instead of the hardcoded payout rates that is assumed for all satellites,
this change adds a new endpoint for fetching the pricing model for
each satellite.
The pricing model is then displayed on the Info & Estimation table
on the dashboard
Updates https://github.com/storj/storj-private/issues/245
Change-Id: Iac7669e3e6eb690bbaad6e64bbbe42dfd775f078
This is particularly useful for monitoring the lazyfilewalker to
make sure it is not checking the wrong directory.
Updates https://github.com/storj/storj/issues/5349
Change-Id: I7e5fcfd4545ec4157d33a9225cd1bce607ccd154
The execwrapper package wraps the exec.Cmd and has a Command
interface that mimics the behaviour of the exec.Cmd.
This is useful for testing the lazyfilewalker subprocesses
by stubbing instead of spawning a real subprocess.
Updates https://github.com/storj/storj/issues/5349
Change-Id: I14084139c76a531f2b6d7163f9aa35c3f5e192d7
We've had issues with forgetting to close readers and writers.
Add leak tracking to find those pesky issues.
Change-Id: If6b0ad6e9958318a7e0affee9c6d0a1ece412b6d
As part of fixing the IO priority of filewalker related
processes such as the garbage collection and used-space
calculation, this change allows the initial used-space
calculation to run as a separate subprocess with lower
IO priority.
This can be enabled with the `--storage2.enable-lazy-filewalker`
config item. It falls back to the old behaviour when the
subprocess fails.
Updates https://github.com/storj/storj/issues/5349
Change-Id: Ia6ee98ce912de3e89fc5ca670cf4a30be73b36a6
We automatically start a chore to check whether the blobstore is
writeable and readable, however, we don't want to fail the tests due to
that reason. Usually we want to test some other failure.
There probably should be some nicer way to achieve this, but this is an
easier fix.
Change-Id: I77ada75329f88d3ea52edd2022e811e337c5255a
this change makes it so that the storage node no longer
cares if the cert of peers it talks to has been signed
by the sno registration server. this is fine because
the only reason a storage node would talk to a peer
besides the explicitly configured satellites is because
a satellite told it to.
we have already disabled this on uplinks (uplinks don't
care about the peer ca whitelist), and we are starting
to consider disabling this on satellites entirely.
however, before we really can disable it on satellites,
we need to disable it on storage nodes so that graceful
exit and node to node transfers can work correctly.
Change-Id: I2e0a0781bd247e574b82f0065aafb88804e59c71
The blobstore implementation is entirely related to storagenode, so the
rightful place is together with the storagenode implementation.
Fixes https://github.com/storj/storj/issues/5754
Change-Id: Ie6637b0262cf37af6c3e558556c7604d9dc3613d
storj/storj uses storj/uplink and storj/uplink uses storj/storj (for integration test).
Without using the real defaults (instead of hard coded ones) in storj/storj, we couldn't modify them. (modification in uplink will fail when storj/storj is used for integration test, with the unchanged, hard-coded defaults).
Change-Id: Ifa68567dc2d5c8d08af8041ac338870c4fc26d45
This is not recommended for most nodes; leaving your node running when
it can't handle requests fast enough is a good way to fail audits and
get disqualified, which may happen before you even know about the
problem.
But some Windows users are finding that this is being triggered
regularly on their nodes, and that it apparently causes the whole system
to lock up occasionally. We are adding this option as a way to mitigate
that problem until we can collect more information.
Change-Id: I7a652b0f9f970bbb9ed9f2cb3ad1cb89d90db8d7
FileWalker implements methods to walk over pieces in
in a storage directory.
This is just a refactor to separate filewalker functions
from pieces.Store. This is needed to simplify the work
to create a separate filewalker subprocess and reduce the
number of config flags passed to the subprocess.
You might want to check https://review.dev.storj.io/c/storj/storj/+/9773
Change-Id: I4e9567024e54fc7c0bb21a7c27182ef745839fff
Download is server from two goroutines:
* one is waiting for the orders (and updates the actual limit)
* other one sends the valuable bytes back to the client (in case the actual order is big enough)
These two tasks are syncrhonized with the help of a `sync2.NewThrottle()`
But all of these happens in the same method, therefore we have no idea how much time is spent on waiting for next orders
(throttle can wait until we receive new orderlimit), and how much time is spent with actual work.
This patch moves the actual work (after sending routine is waked up) to a separated method to have better visibility and measure the actual work (read data + send it).
Change-Id: Ia5068c544560a53bc2fcea6cb6fce85cfacbd95b
if a drpc.ClosedError was returned, it would always take the
first (failure) branch, despite the second branch's existence.
Change-Id: Ife3b27869c4e9d37ca2914e2d1d1a2c60d326309
to support TCP_FAST_OPEN, we're considering just using
two TCP connections in parallel per request, one with
and one without. this allows us to safely fire both
concurrently without stressing out the node too much.
see https://review.dev.storj.io/c/storj/storj/+/9933
Change-Id: I9aa8a0252350db5ace04ee125bfe469203e980ec
Storagenode download metrics are not accurate:
* the current metric bump cancel metrics only for specific error messages, but there are cases where the error is already handled (err == nill)
* instead of the full size of the piece we need to use the size of the downloaded bytes
Change-Id: I6ca75770e2d40bf514f5e273785c78e02968c919
we may in the future want to accept writes and commits as part of the
initial request message, just like
https://review.dev.storj.io/c/storj/storj/+/9245
this change is forward compatible but continues to work with
existing clients.
Change-Id: Ifd3ac8606d498a43bb35d0a3751859656e1e8995
this change uses the new storj/common noise helpers, which:
* add a security fix (require an expected node id for validating
noise key attestations)
* stops doing an unnecessary order signature validation (it's
already been done inside of PutPiece)
* removes some duplicate code
Change-Id: I5e67a08ff216cd9c5b0b82e40b4d9de664b6b0fc
The used space graph values are correct when a single satellite is
selected but wrong for 'All satellites'. This is related to the
queries for getting the individual disk usages for all satellites
per day and the summary and average for all satellites per day:
1. dividing the sum of at_rest_total by the total_hours is wrong.
Simply put, we were assuming that, for example (4/2)+(6/3) equals
to (4+6)/(2+3), assuming we had 4 and 6 at_rest_total values with
2 and 3 respective hours.
2. To get the average, we need to first find the sum of the
at_rest_total_bytes for each timestamp across all satellites
before taking the average of the sums instead of just taking the
average from the individual satellite values.
Closes https://github.com/storj/storj/issues/5519
Change-Id: Ib1314e238b695a6c1ecd9f9171ee86dd56bb3b24
uplinks currently get the node's certificate chain over TLS. once Noise
is in use, uplinks will no longer be able to do this. we should start
having the upload request return the certificate chain in the same
release that starts supporting noise.
Change-Id: I619b23cb8e25691bcc62d760f884403a4ccd64a0
A user on the forum was seeing the error "bad message", which was not
very helpful. This case from the ext4 filesystem using the code EBADMSG
to indicate it detected an invalid CRC, suggesting disk corruption.
This change adds some explanatory information about probable disk
corruption to all errors coming from the (*blobInfo).Stat() call, which
is where storagenode fs corruption problems will usually manifest.
Refs: https://github.com/storj/storj/issues/5375
Change-Id: I87f4a800236050415c4191ef1a0fc952f9def315
Calling stat() (really, lstat()) on every file during a directory walk
is the step that takes up the most time. Furthermore, not all directory
walk uses _need_ to have a stat done on every file. Therefore, in this
commit we avoid doing the stat at the lowest level of
walkNamespaceInPath. The stat will still be done when it is requested,
with the Stat() method on the blobInfo object.
The major upside of this is that we can avoid the stat call on most
files during a Retain operation. This should speed up garbage collection
considerably.
The major downside is that walkNamespaceInPath will no longer
automatically skip over directories that are named like blob files, or
blob files which are deleted between readdir() and stat(). Callers to
walkNamespaceInPath and its variants (WalkNamespace,
WalkSatellitePieces, etc) are now expected to handle these cases
individually.
Thanks to forum member Toyoo for the insight that this would speed up
garbage collection.
Refs: https://github.com/storj/storj/issues/5454
Change-Id: I72930573d58928fa25057ed89cd4ec474b884199
- Adds "Remote Address" field to all INFO logs related to GET,
PUT, and DELETE requests
- Adds Offset and Size fields to all info logs related to GET
requests
Resolves https://github.com/storj/storj/issues/5404
Change-Id: I5dab1867619385362e5f1e0455dfab17d295a37a
we may in the future want to accept orders as part of the initial
request message
(e.g. https://review.dev.storj.io/c/storj/uplink/+/9246).
this change is forward compatible but continues to work with
existing clients.
Change-Id: I475ad50d6cbfee8a1f843383230698e4ef9b9e54
This ensures that empty trash and restore trash cannot run at the same
time.
Fixes https://github.com/storj/storj/issues/5416
Change-Id: I9d2e3aa3d66e61e5c8a7427a95208bb96089792d
* normalize ExistsCheckWorkers flag to avoid setting incorrect values
* wait for limiter to finish if context was canceled
Change-Id: I688a395bd958cd09233605fc264d43f91ec45ed1
Adds new method Exists which can be used to verify which
requested piece ids exists on storage node. Will verify only pieces
which belongs to the satellite that used that endpoint.
Minum WASM size was increased a bit.
https://github.com/storj/storj/issues/5415
Change-Id: Ia5f9cadeb526541b2776a8973eb7d50133ad8636
Starting restore trash in the background allows the satellite to
continue to the next storagenode without needing to wait until
completion.
Of course, this means the satellite doesn't get feedback whether it
succeeds successfully or not. This means that the restore-trash needs to
be executed several times.
Change-Id: I62d43f6f2e4a07854f6d083a65badf897338083b
Previously because of the use of a LAG to calculate the hour_interval
the first record, which is usually the first day of the month usually,
doesn’t have a previous record and always assumes the at_rest_total is
for 24 hours.
Resolves https://github.com/storj/storj/issues/5390
Change-Id: Id532f8b38fe9df61432e62655318ff119a733d13
The query changes we did while fixing the usage graph led to wrong
payout calculations directly linked to disk space.
This change:
- avoids converting from Bh to B directly in the query
- returns the at_rest_total in the original bytes*hour value
- returns at_rest_total_bytes as the calculated disk spaced used in bytes
- uses the at_rest_total_bytes only for the disk space graph
- return summary_bytes as the average disk space used within the specified date
- updates the disk space graph header to "average disk space used this month"
The total disk used in the month is also displayed in B not B*day
Resolves https://github.com/storj/storj/issues/5355
Change-Id: I2cfefb0fe711f9c59de2adb547c4ab50b05c7cbb