periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.
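A minimal sketch of what such a check could look like (the names verifyWritability, the temp-file prefix, and the shutdown hook are invented for illustration, not the actual implementation):

    package monitor

    import (
        "context"
        "os"
        "time"
    )

    // verifyWritability creates and removes a temp file in dir; any failure
    // means the storage directory is not usable.
    func verifyWritability(dir string) error {
        f, err := os.CreateTemp(dir, "write-test-")
        if err != nil {
            return err
        }
        name := f.Name()
        if _, err := f.Write([]byte("write test")); err != nil {
            _ = f.Close()
            _ = os.Remove(name)
            return err
        }
        if err := f.Close(); err != nil {
            _ = os.Remove(name)
            return err
        }
        return os.Remove(name)
    }

    // runWritabilityCheck runs the check on an interval and shuts the node
    // down (via the supplied hook) on the first failure.
    func runWritabilityCheck(ctx context.Context, dir string, interval time.Duration, shutdown func(error)) {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                if err := verifyWritability(dir); err != nil {
                    shutdown(err)
                    return
                }
            }
        }
    }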
Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
Sometimes SNOs fail to properly configure, or lose connection to, their storage directory,
which can result in disqualification (DQ). This causes unnecessary repair and is unfortunate for all parties.
This change introduces the creation of a special file in the storage directory at runtime
containing the node ID. While the storage node runs, it periodically verifies that it can
find said file with the correct contents in the correct location. If not, the node will
shut down with an error message.
This change will solve the issue of nodes losing access to the storage directory, but it will not
solve the issue of nodes pointing to the wrong directory, as the identifying file is created each
time the node starts up. After this change has been the minimum version for a few releases, we will
remove the creation of the directory-identifying file from the storage node run command and add it
to the setup command.
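A simplified sketch of the mechanism (the file name and helper names here are assumptions for illustration, not the actual implementation):

    package storagedir

    import (
        "bytes"
        "errors"
        "os"
        "path/filepath"
    )

    const verificationFileName = "storage-dir-verification"

    // WriteVerificationFile records the node ID in the storage directory.
    // Today this happens on every run; eventually it will move to setup.
    func WriteVerificationFile(dir string, nodeID []byte) error {
        return os.WriteFile(filepath.Join(dir, verificationFileName), nodeID, 0644)
    }

    // VerifyStorageDir is called periodically while the node runs; if the file
    // is missing or its contents do not match, the node shuts down with an error.
    func VerifyStorageDir(dir string, nodeID []byte) error {
        content, err := os.ReadFile(filepath.Join(dir, verificationFileName))
        if err != nil {
            return err
        }
        if !bytes.Equal(content, nodeID) {
            return errors.New("content of storage dir verification file does not match node ID")
        }
        return nil
    }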
Change-Id: Ib7b10e96ac07373219835e39239e93957e7667a4
In walkNamespaceWithPrefix, log the error when "lstat" fails, because this may indicate underlying disk corruption.
SG-50
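In outline (a generic filepath.WalkDir stands in here for walkNamespaceWithPrefix):

    package filestore

    import (
        "io/fs"
        "log"
        "path/filepath"
    )

    func walkNamespace(root string) error {
        return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
            if err != nil {
                // an lstat failure may indicate underlying disk corruption,
                // so log it instead of silently skipping the entry
                log.Printf("error while walking %q: %v", path, err)
                return nil // continue walking the rest of the namespace
            }
            return nil
        })
    }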
Change-Id: I867c3ffc47cfac325ae90658ec4780d213ff3e63
Currently uploads can cause a lot of IOPS; reduce this by introducing an
in-memory buffer on top of the file.
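This is essentially the standard buffered-writer pattern; a stand-alone sketch, with an illustrative buffer size rather than the configured value:

    package pieces

    import (
        "bufio"
        "io"
        "os"
    )

    // writeBuffered copies an upload to disk through an in-memory buffer so
    // that many small writes become a few large ones, reducing IOPS.
    func writeBuffered(path string, upload io.Reader) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        w := bufio.NewWriterSize(f, 256*1024)
        if _, err := io.Copy(w, upload); err != nil {
            _ = f.Close()
            return err
        }
        if err := w.Flush(); err != nil { // flush the buffer before closing
            _ = f.Close()
            return err
        }
        return f.Close()
    }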
Change-Id: I5f4e3e01c0a36258271d180b922107de447bcb59
Previously, the deleter would close its done channel only once, so if additional
tests shared a storagenode, even when not run in parallel, the later waits
would not work properly. This fixes that problem.
Change-Id: I7dcacf6699cef7c2c2948ba0f4369ef520601bf5
There was a race in the test code for piece deleter, which made it
possible to broadcast on the condition variable before anyone was
waiting. This change fixes that and has Wait take a context so it times
out with the context.
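The pattern, roughly (not the exact Deleter code): broadcasts are harmless even with no waiter because waiters recheck the predicate under the lock, and Wait watches the context so it cannot block forever:

    package pieces

    import (
        "context"
        "sync"
    )

    type waiter struct {
        mu      sync.Mutex
        cond    *sync.Cond
        pending int
    }

    func newWaiter() *waiter {
        w := &waiter{}
        w.cond = sync.NewCond(&w.mu)
        return w
    }

    func (w *waiter) add(n int) {
        w.mu.Lock()
        w.pending += n
        w.mu.Unlock()
    }

    func (w *waiter) done() {
        w.mu.Lock()
        w.pending--
        w.mu.Unlock()
        w.cond.Broadcast() // safe even if nobody is waiting yet
    }

    // Wait blocks until all pending work finishes or ctx expires.
    func (w *waiter) Wait(ctx context.Context) {
        finished := make(chan struct{})
        go func() {
            w.mu.Lock()
            for w.pending > 0 && ctx.Err() == nil {
                w.cond.Wait()
            }
            w.mu.Unlock()
            close(finished)
        }()
        select {
        case <-finished:
        case <-ctx.Done():
            w.mu.Lock()
            w.cond.Broadcast() // wake the waiter so it can see the expired context
            w.mu.Unlock()
            <-finished
        }
    }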
Change-Id: Ia4f77a7b7d2287d5ab1d7ba541caeb1ba036dba3
When we receive a piece deletion request, include in the response the number
of piece IDs we couldn't add to the queue.
Change-Id: Ibebbe92ac50105bb5c74b18211ed38d468eb33f3
Each time we process a piece deletion on the storagenode, monitor how
long the item was in the queue and the size of the queue.
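With monkit, the measurements could be taken roughly like this (metric names and surrounding types are invented for the sketch):

    package pieces

    import (
        "time"

        "github.com/spacemonkeygo/monkit/v3"
    )

    var mon = monkit.Package()

    type queuedDelete struct {
        enqueuedAt time.Time
        // piece identifiers omitted
    }

    type deleter struct {
        queue chan queuedDelete
    }

    func (d *deleter) process(item queuedDelete) {
        // how long the item sat in the queue before a worker picked it up
        mon.DurationVal("piecedeleter_queue_time").Observe(time.Since(item.enqueuedAt))
        // queue depth observed at processing time
        mon.IntVal("piecedeleter_queue_size").Observe(int64(len(d.queue)))
        // ... perform the actual delete here ...
    }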
Change-Id: I23f1a44f8b9cecb901bdf4739d55c005ffed4bef
To improve delete performance, we want to process deletes asynchronously
once the message has been received from the satellite. This change makes
it so that storagenodes will send the delete request to a piece Deleter,
which will process a "best-effort" delete asynchronously and return a
success message to the satellite.
Both the maximum number of delete workers and the maximum delete queue size
are configurable.
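In outline, the Deleter is a bounded channel drained by a fixed pool of workers, and enqueueing never blocks, so the response to the satellite is not delayed (names and wiring are simplified, not the actual implementation):

    package pieces

    import (
        "context"
        "sync"
    )

    type DeleteRequest struct {
        SatelliteID string
        PieceID     string
    }

    type Deleter struct {
        queue chan DeleteRequest
        wg    sync.WaitGroup
    }

    func NewDeleter(queueSize int) *Deleter {
        return &Deleter{queue: make(chan DeleteRequest, queueSize)}
    }

    // Enqueue is best-effort: requests that do not fit in the queue are
    // dropped, and the count of dropped requests is reported to the caller.
    func (d *Deleter) Enqueue(requests []DeleteRequest) (unhandled int) {
        for _, req := range requests {
            select {
            case d.queue <- req:
            default:
                unhandled++
            }
        }
        return unhandled
    }

    // Run starts the configured number of workers draining the queue.
    func (d *Deleter) Run(ctx context.Context, workers int, deleteOne func(context.Context, DeleteRequest)) {
        for i := 0; i < workers; i++ {
            d.wg.Add(1)
            go func() {
                defer d.wg.Done()
                for {
                    select {
                    case <-ctx.Done():
                        return
                    case req := <-d.queue:
                        deleteOne(ctx, req) // errors are logged, not returned to the satellite
                    }
                }
            }()
        }
    }

The unhandled count returned by Enqueue is also what makes it possible to report back how many piece IDs did not fit in the queue.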
Change-Id: I016b68031f9065a9b09224f161b6783e18cf21e5
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer somewhat, since metric names are no longer
a dot-separated tree. hopefully there will be time to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
This test checks that we are actually walking over the pieces when
starting the cache, and that it is returning expected values.
A recent outage was partially caused by the fact that this cache was
accidentally reading itself (via the pieces store, which has the cache
embedded). This test ensures that does not happen, and checks that when
the cache's `Run` method is called, the space used values are read from
disk and accurately update the cache.
Change-Id: I9ec61c4299ed06c90f79b17de3ffdbbb06bc502e
instead of aborting on the first error, so that we can hit all
satellites and get the best numbers we can
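The general shape of the change is to collect errors and keep going; a generic sketch (the query callback and package name are placeholders):

    package nodestats

    import (
        "context"
        "errors"
    )

    // queryAll visits every satellite even when some of them fail, and returns
    // the combined errors at the end instead of aborting on the first one.
    func queryAll(ctx context.Context, satellites []string, query func(context.Context, string) error) error {
        var errlist []error
        for _, sat := range satellites {
            if err := query(ctx, sat); err != nil {
                errlist = append(errlist, err)
            }
        }
        return errors.Join(errlist...)
    }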
Change-Id: I21d5163884940612d7d39eaf73a6fac07235cd9e
A few variables were not renamed to the new standard piecesTotal and
piecesContentSize, so it was unclear which value was being used. These
have been updated, and some comments made more thorough.
Change-Id: I363bad4dec2a8e5c54d22c3c4cd85fc3d2b3096c
This change updates the storagenode piecestore APIs to expose access to
the full piece size stored on disk. Previously we only had access to
(and only kept a cache of) the content size used for all pieces. This
was inaccurate when reporting the amount of disk space used by nodes.
We now have access to the total content size, as well as the total disk
usage, of all pieces. The pieces cache also keeps a cache of the total
piece size along with the content size.
Change-Id: I4fffe7e1257e04c46021a2e37c5adc6fe69bee55
On pieces usage cache init we now load the trash info from the db. Also
fixes a test that was masking the failure here.
Change-Id: I9ff7da5bc6c0f74cf0942e20931b40e0c88d70fa
When an error is formatted using %v, it's not possible to check
whether the error was caused by a context cancellation.
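The difference in practice: wrapping with %w (or returning the error unmodified) preserves the chain that errors.Is inspects, while formatting with %v flattens it to a string:

    package main

    import (
        "context"
        "errors"
        "fmt"
    )

    func main() {
        cause := context.Canceled

        flattened := fmt.Errorf("piece deletion failed: %v", cause) // message only
        wrapped := fmt.Errorf("piece deletion failed: %w", cause)   // keeps the cause

        fmt.Println(errors.Is(flattened, context.Canceled)) // false
        fmt.Println(errors.Is(wrapped, context.Canceled))   // true
    }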
Change-Id: Ia77dfb0817e49d9a7b168c12a6300d131007d0ee
Remove starting-up messages from peers. We expect all of them to start;
if they don't, they should return an error explaining why.
The only informative message is when a service is disabled.
During initial database setup, each migration step isn't
informative, hence print only a single line with the final version.
Also use shorter log scopes.
Change-Id: Ic8b61411df2eeae2a36d600a0c2fbc97a84a5b93
This commit adds functionality to include the space used in the trash
directory when calculating available space on the node.
It also includes this trash value in the space used cache, with methods
to keep the cache up-to-date as files are trashed, restored, and
emptied.
As part of the commit, the RestoreTrash and EmptyTrash methods have
slightly changed signatures. RestoreTrash now also returns the keys that
were restored, while EmptyTrash also returns the total disk space
recovered. Each of these changes makes it possible to keep the cache
up-to-date and know how much space is being used/recovered.
Also changed is the signature of the PieceStoreAccess.ContentSize method.
Previously this method returned only the content size of the blob,
excluding the size of any header data. It has been renamed
`Size` and returns both the full disk size and the content size of the blob.
This allows us to stat the file only once, and in some instances (e.g. the
cache) knowing the full file size is useful.
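Sketched as interfaces, the new shapes are roughly as follows (simplified; the real signatures take more parameters and differ in detail):

    package pieces

    import (
        "context"
        "time"
    )

    // Trash summarizes the changed trash-related methods.
    type Trash interface {
        // RestoreTrash moves pieces out of the trash and reports which keys
        // were restored, so the space-used cache can be updated.
        RestoreTrash(ctx context.Context, namespace []byte) (keysRestored [][]byte, err error)

        // EmptyTrash deletes trashed pieces older than trashedBefore and
        // reports how much disk space was recovered.
        EmptyTrash(ctx context.Context, namespace []byte, trashedBefore time.Time) (bytesEmptied int64, err error)
    }

    // PieceStoreAccess shows the renamed method: one stat yields both values.
    type PieceStoreAccess interface {
        // Size returns the full size on disk and the content size
        // (full size minus the piece header).
        Size(ctx context.Context) (size, contentSize int64, err error)
    }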
Note: This commit simply adds the trash size data to the piece size data
we were already collecting. The piece size data is not accurate for all
use-cases (e.g. because it does not contain piece header data); however,
this commit does not fix that problem. Now that the ContentSize (Size)
method returns the full size of the file, it should be easier to fix
this problem in a future commit.
Change-Id: I4a6cae09e262c8452a618116d1dc66b687f59f85
- also updated ping chore to pick up trust changes
- fixed small typo in blueprint
- fixed flags for storj-sim
- wired up changes to testplanet
Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d
* put TestCreateV0 back in StoreForTest
* avoid direct handles to V0 pieceinfo db
* type mismatch fix
* use storage.Blobs interface in store_test.go
..instead of filestore.Store. this will allow filestore.Store to become
unexported.
* unexport filestore.Store
rename it to blobStore. things should use the storage.Blobs interface
instead. changes in this commit are purely mechanical (made through the
"refactor" tool in Gocode followed by search/replace on the word "Store"
within the storage/filestore/ directory).
* kill filestore.StoreForTest
now that filestore.blobStore is unexported, there isn't a need for a
specialized wrapper type. this (not coincidentally) also makes it
possible for the WriterForFormatVersion() method on
storagenode/pieces.StoreForTest to work, without requiring everything to
wrap the store.blobs attribute in a filestore.StoreForTest, which was
impractical.