This patch adds two new monkit metrics:
* piece_writer_io: the total time spent in io.Write calls during a piece upload (excluding the fs sync of the commit)
* piece_writer_hash: the total time spent hashing the data
The second is especially important: my storagenode (hosted on a cloud server) spends ~30 ms hashing data, while the piece_writer_io time is usually around 5 ms for me.
These metrics can help us identify the cause of slowness on the storagenode side.
Both depend on the size of the piece. To make them more meaningful without exploding the cardinality, I created a few size categories and classified the pieces accordingly. Measurements show that this provides useful results (>2MB uploads are usually 23-28 ms).
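A minimal sketch of the bucketing idea, assuming the monkit v3 API; the category boundaries here are illustrative, not the actual ones:

    package pieces

    import (
        "time"

        "github.com/spacemonkeygo/monkit/v3"
    )

    var mon = monkit.Package()

    // sizeCategory maps a piece size to a coarse bucket so the series
    // tag stays low-cardinality.
    func sizeCategory(size int64) string {
        switch {
        case size < 128*1024:
            return "128KB"
        case size < 1024*1024:
            return "1MB"
        case size < 2*1024*1024:
            return "2MB"
        default:
            return "bigger"
        }
    }

    // recordTimings observes the accumulated io.Write and hashing
    // durations for one uploaded piece, tagged with its size category.
    func recordTimings(size int64, ioTime, hashTime time.Duration) {
        tag := monkit.NewSeriesTag("size", sizeCategory(size))
        mon.DurationVal("piece_writer_io", tag).Observe(ioTime)
        mon.DurationVal("piece_writer_hash", tag).Observe(hashTime)
    }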
Change-Id: Ifa1c205a490046655bcc34891003e7b43ed9c0bc
This change allows a node to look for a piece in the trash when
serving a download request.
If the piece is found in the trash, it restores it to the blobs
directory and continues to serve the request as expected.
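A simplified sketch of the flow, assuming a hypothetical store interface; TryRestoreTrashPiece is named after what the commit describes and may not match the real API:

    package piecestore

    import (
        "context"
        "errors"
        "io"
        "io/fs"

        "storj.io/common/storj"
    )

    type trashRestorer interface {
        Reader(ctx context.Context, satellite storj.NodeID, pieceID storj.PieceID) (io.ReadCloser, error)
        TryRestoreTrashPiece(ctx context.Context, satellite storj.NodeID, pieceID storj.PieceID) error
    }

    func openPiece(ctx context.Context, store trashRestorer, satellite storj.NodeID, pieceID storj.PieceID) (io.ReadCloser, error) {
        reader, err := store.Reader(ctx, satellite, pieceID)
        if err == nil || !errors.Is(err, fs.ErrNotExist) {
            return reader, err
        }
        // Not in blobs: try to move the piece back from the trash, then retry.
        if restoreErr := store.TryRestoreTrashPiece(ctx, satellite, pieceID); restoreErr != nil {
            return nil, err // keep the original not-found error
        }
        return store.Reader(ctx, satellite, pieceID)
    }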
Resolves https://github.com/storj/storj/issues/6145
Change-Id: Ibfa3c0b4954875fa977bc995fc4dd2705ca3ce42
This adds tests to the zapwrapper package and also adds a test
to verify the issue in https://github.com/storj/storj/issues/6006
Change-Id: Iec3f568e72683af71e1718017109a1ed52794b0b
There are many cases where the keywords `free` and `available`
are confused in their usage.
In most cases, `free` space is the amount of free space left
on the whole disk, not just in the allocation, while
`available` space is the amount of free space left in the
allocated disk space.
What the user/SNO wants to see is not the free space but the
available space. From the SNO's perspective, free space is the
free space left in the allocated disk space.
Because of this confusion, the multinode dashboard displays
the `free` disk space instead of the free space in the
allocated disk space (https://github.com/storj/storj/issues/5248),
while the storagenode dashboard shows the correct free space
in the allocation.
This change fixes the wrongly reported free disk space. I also
added a few comments to distinguish the `free` and `available`
fields in the `DiskSpace*` structs.
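A rough sketch of the distinction; the field names are modeled on the `DiskSpace*` structs but are not guaranteed to match them exactly:

    package pieces

    // DiskSpace is an illustrative stand-in for the DiskSpace* structs.
    type DiskSpace struct {
        Allocated int64 // bytes the SNO dedicated to the node
        Used      int64 // bytes already occupied by pieces
        Trash     int64 // bytes currently held in the trash

        // Free is the free space left on the whole disk,
        // not just in the allocation.
        Free int64
        // Available is the free space left in the allocated disk
        // space; this is what the SNO wants to see on the dashboard.
        Available int64
    }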
Change-Id: I11b372ca53a5ac05dc3f79834c18f85ebec11855
Lazyfilewalker was failing with SIGPIPE, which was quite
misleading. The command was failing because the
value of the --lower-io-priority flag was assumed
to be a positional argument, since it was passed as
"--lower-io-priority true" instead of "--lower-io-priority=true".
Resolves https://github.com/storj/storj/issues/5900
Change-Id: Icf79fcce76dafee21659d76ee0ce19d8520c8f1d
The execwrapper package wraps exec.Cmd behind a Command
interface that mimics the behaviour of exec.Cmd.
This is useful for testing the lazyfilewalker subprocesses
by stubbing instead of spawning a real subprocess.
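A rough sketch of the idea; the exact method set of the interface in the execwrapper package may differ:

    package execwrapper

    import "io"

    // Command mirrors the parts of exec.Cmd that the caller uses.
    type Command interface {
        Start() error
        Wait() error
        Run() error
        SetIn(io.Reader)
        SetOut(io.Writer)
        SetErr(io.Writer)
    }

Production code wraps a real *exec.Cmd, while tests supply a stub that records the arguments and writes canned output, so no subprocess is ever spawned.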
Updates https://github.com/storj/storj/issues/5349
Change-Id: I14084139c76a531f2b6d7163f9aa35c3f5e192d7
We've had issues with forgetting to close readers and writers.
Add leak tracking to find those pesky issues.
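A minimal sketch of one way to do such tracking, using a finalizer; the names and the mechanism here are assumptions, not the repo's actual implementation:

    package leak

    import (
        "fmt"
        "io"
        "os"
        "runtime"
        "runtime/debug"
    )

    type trackedCloser struct {
        io.Closer
        stack []byte // captured where the reader/writer was opened
    }

    // Track wraps a closer and reports it if it is garbage collected
    // without Close ever having been called.
    func Track(c io.Closer) io.Closer {
        t := &trackedCloser{Closer: c, stack: debug.Stack()}
        runtime.SetFinalizer(t, func(t *trackedCloser) {
            fmt.Fprintf(os.Stderr, "leaked closer, opened at:\n%s", t.stack)
        })
        return t
    }

    func (t *trackedCloser) Close() error {
        runtime.SetFinalizer(t, nil) // closed properly, cancel the report
        return t.Closer.Close()
    }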
Change-Id: If6b0ad6e9958318a7e0affee9c6d0a1ece412b6d
As part of fixing the IO priority of filewalker-related
processes, such as garbage collection and the used-space
calculation, this change allows the initial used-space
calculation to run as a separate subprocess with lower
IO priority.
This can be enabled with the `--storage2.enable-lazy-filewalker`
config item. It falls back to the old behaviour when the
subprocess fails.
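A simplified sketch of the fallback path; all names here are hypothetical, only the config flag mirrors the commit:

    package monitor

    import (
        "context"

        "storj.io/common/storj"
    )

    type spaceWalker interface {
        UsedSpace(ctx context.Context, satellite storj.NodeID) (int64, error)
    }

    // usedSpace prefers the lazy walker (subprocess, lower IO priority)
    // and falls back to the old in-process walker when it fails.
    func usedSpace(ctx context.Context, lazyEnabled bool, lazy, inProcess spaceWalker, satellite storj.NodeID) (int64, error) {
        if lazyEnabled { // set via --storage2.enable-lazy-filewalker
            if total, err := lazy.UsedSpace(ctx, satellite); err == nil {
                return total, nil
            }
        }
        return inProcess.UsedSpace(ctx, satellite)
    }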
Updates https://github.com/storj/storj/issues/5349
Change-Id: Ia6ee98ce912de3e89fc5ca670cf4a30be73b36a6
The blobstore implementation is entirely related to the storagenode,
so its rightful place is together with the storagenode implementation.
Fixes https://github.com/storj/storj/issues/5754
Change-Id: Ie6637b0262cf37af6c3e558556c7604d9dc3613d
FileWalker implements methods to walk over pieces in
a storage directory.
This is just a refactor to separate filewalker functions
from pieces.Store. This is needed to simplify the work
to create a separate filewalker subprocess and reduce the
number of config flags passed to the subprocess.
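A rough sketch of the extraction; the method signature is hypothetical and the real FileWalker API may differ:

    package pieces

    import (
        "context"

        "storj.io/common/storj"
        "storj.io/storj/storage"
    )

    // FileWalker owns only what a future subprocess will need: the blob
    // store, rather than all of pieces.Store and its config.
    type FileWalker struct {
        blobs storage.Blobs
    }

    func NewFileWalker(blobs storage.Blobs) *FileWalker {
        return &FileWalker{blobs: blobs}
    }

    // WalkSatellitePieces calls fn for every blob stored for the given
    // satellite; logic like this used to live on pieces.Store.
    func (fw *FileWalker) WalkSatellitePieces(ctx context.Context, satellite storj.NodeID, fn func(storage.BlobInfo) error) error {
        return fw.blobs.WalkNamespace(ctx, satellite.Bytes(), fn)
    }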
You might want to check https://review.dev.storj.io/c/storj/storj/+/9773
Change-Id: I4e9567024e54fc7c0bb21a7c27182ef745839fff
Calling stat() (really, lstat()) on every file during a directory walk
is the step that takes up the most time. Furthermore, not all uses
of the directory walk _need_ a stat done on every file. Therefore, in this
commit we avoid doing the stat at the lowest level of
walkNamespaceInPath. The stat will still be done when it is requested,
with the Stat() method on the blobInfo object.
The major upside of this is that we can avoid the stat call on most
files during a Retain operation. This should speed up garbage collection
considerably.
The major downside is that walkNamespaceInPath will no longer
automatically skip over directories that are named like blob files, or
blob files which are deleted between readdir() and stat(). Callers to
walkNamespaceInPath and its variants (WalkNamespace,
WalkSatellitePieces, etc) are now expected to handle these cases
individually.
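A simplified sketch of the new contract from the caller's side; the wrapper function is hypothetical, but Stat() on the blob info is the on-demand mechanism described above:

    package pieces

    import (
        "context"
        "os"

        "storj.io/storj/storage"
    )

    // walkWithDeferredStat stats only when it needs the result, and
    // tolerates entries that vanish in between.
    func walkWithDeferredStat(ctx context.Context, blobs storage.Blobs, namespace []byte) error {
        return blobs.WalkNamespace(ctx, namespace, func(info storage.BlobInfo) error {
            stat, err := info.Stat(ctx) // lstat() happens here, on demand
            if err != nil {
                if os.IsNotExist(err) {
                    return nil // deleted between readdir() and stat(); skip
                }
                return err // may also be a directory named like a blob file
            }
            _ = stat.Size()
            return nil
        })
    }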
Thanks to forum member Toyoo for the insight that this would speed up
garbage collection.
Refs: https://github.com/storj/storj/issues/5454
Change-Id: I72930573d58928fa25057ed89cd4ec474b884199
This ensures that empty trash and restore trash cannot run at the same
time.
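A minimal sketch of the exclusion, assuming a shared mutex; the real code may scope the lock differently (for example per satellite):

    package pieces

    import (
        "context"
        "sync"
    )

    // Trash serializes the two operations with one mutex.
    type Trash struct {
        mu sync.Mutex
    }

    func (t *Trash) EmptyTrash(ctx context.Context) error {
        t.mu.Lock()
        defer t.mu.Unlock()
        // ... delete pieces whose trash retention has expired ...
        return nil
    }

    func (t *Trash) RestoreTrash(ctx context.Context) error {
        t.mu.Lock()
        defer t.mu.Unlock()
        // ... move pieces from the trash back into blobs ...
        return nil
    }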
Fixes https://github.com/storj/storj/issues/5416
Change-Id: I9d2e3aa3d66e61e5c8a7427a95208bb96089792d
Adds a new method Exists which can be used to verify which of the
requested piece ids exist on the storage node. It will verify only
pieces which belong to the satellite that calls the endpoint.
The minimum WASM size was increased a bit.
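A rough sketch with hypothetical types: the satellite ID is derived from the peer identity, so a satellite can only ever query its own pieces, and the indices of the missing pieces are returned:

    package piecestore

    import (
        "context"
        "errors"
        "io/fs"
        "os"

        "storj.io/common/storj"
    )

    type pieceStatter interface {
        Stat(ctx context.Context, satellite storj.NodeID, pieceID storj.PieceID) (os.FileInfo, error)
    }

    // exists returns the indices of the requested pieces that are missing.
    func exists(ctx context.Context, store pieceStatter, satellite storj.NodeID, pieceIDs []storj.PieceID) (missing []uint32, err error) {
        for i, pieceID := range pieceIDs {
            _, statErr := store.Stat(ctx, satellite, pieceID)
            if errors.Is(statErr, fs.ErrNotExist) {
                missing = append(missing, uint32(i))
                continue
            }
            if statErr != nil {
                return nil, statErr
            }
        }
        return missing, nil
    }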
https://github.com/storj/storj/issues/5415
Change-Id: Ia5f9cadeb526541b2776a8973eb7d50133ad8636
Starting restore trash in the background allows the satellite to
continue to the next storagenode without needing to wait until
completion.
Of course, this means the satellite doesn't get feedback on whether
the restore succeeded or not. This means that restore-trash needs to
be executed several times.
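A simplified sketch with hypothetical names: the endpoint starts the restore in a goroutine and returns immediately, so errors are only logged and the satellite compensates by calling again later:

    package piecestore

    import (
        "context"

        "go.uber.org/zap"

        "storj.io/common/storj"
    )

    type trashStore interface {
        RestoreTrash(ctx context.Context, satelliteID storj.NodeID) error
    }

    // restoreTrashAsync queues the restore and returns right away.
    func restoreTrashAsync(log *zap.Logger, store trashStore, satelliteID storj.NodeID) {
        go func() {
            // context.Background(): the work must outlive the RPC response;
            // any error is only logged, never reported to the satellite.
            if err := store.RestoreTrash(context.Background(), satelliteID); err != nil {
                log.Error("restoring trash failed", zap.Error(err))
            }
        }()
    }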
Change-Id: I62d43f6f2e4a07854f6d083a65badf897338083b
The collector tries deleting a piece over and over again, even
though the piece no longer exists on the storagenode's filesystem.
We need to delete the piece info from the expired db if the
targeted file does not exist.
This does not resolve the base problem of why the file
is deleted before the collector tries deleting it.
This change deletes the piece info from the expired db
if the file does not exist, since we're already trying
to delete that piece anyway.
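A minimal sketch of the fix, with hypothetical method names: a missing file is treated as already deleted, and the expired-db row is removed either way so the collector stops retrying:

    package collector

    import (
        "context"
        "errors"
        "io/fs"

        "storj.io/common/storj"
    )

    type pieceDeleter interface {
        Delete(ctx context.Context, satellite storj.NodeID, pieceID storj.PieceID) error
    }

    type expirationDB interface {
        DeleteExpired(ctx context.Context, satellite storj.NodeID, pieceID storj.PieceID) error
    }

    func deleteExpired(ctx context.Context, store pieceDeleter, db expirationDB, satellite storj.NodeID, pieceID storj.PieceID) error {
        err := store.Delete(ctx, satellite, pieceID)
        if err != nil && errors.Is(err, fs.ErrNotExist) {
            err = nil // the file is already gone; retrying cannot help
        }
        if err != nil {
            return err
        }
        return db.DeleteExpired(ctx, satellite, pieceID)
    }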
Closes https://github.com/storj/storj/issues/4192
Change-Id: If659185ca14f1cb29fd3c4237374df6fcd535df8
* Allow configuring the initial piece scan
* Update tests to include new flag
* Update default config specification
* Rename configuration flag
* Rename variable and fix formatting
* Fix format
* Fix typo
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
Co-authored-by: Clement Sam <clementsam75@gmail.com>
Co-authored-by: littleskunk <jens.heimbuerge@googlemail.com>
Currently TestMaxVerifyCount flakes in some tests; try increasing the
sleep time to ensure that things are slow enough to trigger the error
condition.
Also pass ctx to all the funcs so we can handle sleep better.
Change-Id: I605b6ea8b14a0a66d81a605ce3251f57a1669c00
This is a temporary precaution to avoid incorrectly auditing nodes for pieces that were deleted between database backups if we have to restore from a previous backup.
Here we send pieces to trash rather than directly deleting them from storage nodes so we can restore from trash after a db restoration.
Change-Id: Icd979d2a9a755e7428190c0129c9bc969649d544
Periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.
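A minimal sketch of the check, with hypothetical names; the real monitor wires a failure into the node's shutdown path:

    package monitor

    import "os"

    // verifyWritability creates, writes, closes, and removes a temp file
    // in the storage directory; any failure means the directory is not
    // usable and the node should shut down.
    func verifyWritability(dir string) error {
        f, err := os.CreateTemp(dir, "write-test-")
        if err != nil {
            return err
        }
        name := f.Name()
        if _, err := f.Write([]byte("write test")); err != nil {
            _ = f.Close()
            return err
        }
        if err := f.Close(); err != nil {
            return err
        }
        return os.Remove(name)
    }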
Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
Sometimes SNOs fail to properly configure or lose connection to their storage directory,
which can result in DQ. This causes unnecessary repair and is unfortunate for all parties.
This change introduces the creation of a special file in the storage directory at runtime
containing the node ID. While the storage node runs, it periodically verifies that it can
find said file with the correct contents in the correct location. If not, the node will
shut down with an error message.
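A rough sketch of both halves; the file name and the helpers are assumptions:

    package storagenodedir

    import (
        "bytes"
        "fmt"
        "os"
        "path/filepath"

        "storj.io/common/storj"
    )

    const verificationFileName = "storage-dir-verification" // assumed name

    // writeVerificationFile runs at startup for now (and, after a few
    // releases, is planned to move to the setup command).
    func writeVerificationFile(dir string, nodeID storj.NodeID) error {
        return os.WriteFile(filepath.Join(dir, verificationFileName), nodeID.Bytes(), 0o644)
    }

    // verifyStorageDir runs periodically; a missing file or a mismatched
    // node ID means the directory is lost or wrong, so the node should
    // shut down with an error message.
    func verifyStorageDir(dir string, nodeID storj.NodeID) error {
        content, err := os.ReadFile(filepath.Join(dir, verificationFileName))
        if err != nil {
            return err
        }
        if !bytes.Equal(content, nodeID.Bytes()) {
            return fmt.Errorf("node ID in %s does not match the running node", verificationFileName)
        }
        return nil
    }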
This change will solve the issue of nodes losing access to the storage directory, but it will not
solve the issue of nodes pointing to the wrong directory, as the identifying file is created each
time the node starts up. After this change has been the minimum version for a few releases, we will
remove the creation of the directory-identifying file from the storage node run command and add it
to the setup command.
Change-Id: Ib7b10e96ac07373219835e39239e93957e7667a4
In walkNamespaceWithPrefix, log in case of an "lstat" error, because this may indicate an underlying disk corruption.
SG-50
Change-Id: I867c3ffc47cfac325ae90658ec4780d213ff3e63
Currently uploads can cause a lot of IOPS; reduce this by introducing
an in-memory buffer on top of the file.
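A minimal sketch of the idea using a plain bufio.Writer; the buffer size and the actual type used in the repo are assumptions:

    package pieces

    import (
        "bufio"
        "os"
    )

    // writePiece coalesces many small Write calls in memory so that only
    // a few large writes reach the disk.
    func writePiece(path string, chunks [][]byte) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()

        buf := bufio.NewWriterSize(f, 256*1024) // assumed buffer size
        for _, chunk := range chunks {
            if _, err := buf.Write(chunk); err != nil { // stays in memory
                return err
            }
        }
        return buf.Flush() // the buffered bytes hit the disk in large writes
    }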
Change-Id: I5f4e3e01c0a36258271d180b922107de447bcb59