Commit Graph

32 Commits

Michał Niewrzał
d72f9525d4 satellite/metainfo/piecedeletion: use nodes cache
The piece deletion service was using the KnownReliable method from
overlaycache to get node addresses for sending delete requests.
KnownReliable always hit the DB because it did not use a cache. This
change uses the new DownloadSelectionCache to avoid direct DB calls.

The change is not perfect: DownloadSelectionCache is not as precise
as KnownReliable and can select a few extra nodes to which we will
send delete requests, but the difference should be small and we can
improve it later.
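
A rough sketch of the cached lookup; Cache, Node, and NodeID below
are hypothetical stand-ins for the real DownloadSelectionCache types:

    package piecedeletion // sketch only

    import "context"

    type NodeID [32]byte

    type Node struct{ Address string }

    // Cache mirrors the part of the selection cache relied on here:
    // a periodically refreshed snapshot of reliable nodes.
    type Cache interface {
        GetNodes(ctx context.Context, ids []NodeID) (map[NodeID]*Node, error)
    }

    // nodeAddresses resolves addresses from the in-memory snapshot
    // instead of querying the overlay database directly.
    func nodeAddresses(ctx context.Context, cache Cache, ids []NodeID) (map[NodeID]string, error) {
        nodes, err := cache.GetNodes(ctx, ids)
        if err != nil {
            return nil, err
        }
        addrs := make(map[NodeID]string, len(nodes))
        for id, n := range nodes {
            addrs[id] = n.Address
        }
        return addrs, nil
    }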

Updates https://github.com/storj/storj/issues/4959

Change-Id: I4c3d91089a18ac35ebcb469a56536c33f76e44ea
2022-07-21 00:43:24 +00:00
Egon Elbre
edb8d656de satellite/metainfo: adjust piecedeletion timeouts
Currently, slower storage nodes can slow down the deletion queue.
To make piece deletion faster, reduce the maximum time spent in
either dialing or piece deletion requests.

With this change:
* dial timeout is 3s
* request timeout is 15s
* fail threshold is set to 10min

Similarly, we'll mark a storage node as failed when a timeout occurs.
A timeout usually indicates that the storage node is overwhelmed.
Garbage collection will ensure that the pieces get deleted eventually.
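
A sketch of these values as a config struct; the field names are
illustrative, not the actual satellite configuration keys:

    package piecedeletion // sketch only

    import "time"

    // Config mirrors the timeouts described above.
    type Config struct {
        DialTimeout    time.Duration // give up dialing slow nodes quickly
        RequestTimeout time.Duration // per piece-deletion request
        FailThreshold  time.Duration // mark a node failed after this long
    }

    var defaults = Config{
        DialTimeout:    3 * time.Second,
        RequestTimeout: 15 * time.Second,
        FailThreshold:  10 * time.Minute,
    }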

Change-Id: Iec5de699f5917905f5807140e2c3252088c6399b
2021-10-28 13:37:01 +03:00
Egon Elbre
9fd091831d ci: optimize benchmarks
We are not using the benchmark results for anything; they are mostly
there to ensure that we don't break the benchmarks. So we can disable
CockroachDB for them.

Similarly, add short versions of other tests.

Also try to precompile test/benchmark code.

Change-Id: I60b501789f70c289af68c37a052778dc75ae2b69
2021-10-08 19:42:40 +03:00
Michał Niewrzał
8a21a8cbc5 satellite/metainfo: don't enable connection pool if was setup earlier
It's safer to create a new connection pool for piece deletion
only if the dialer has no existing pool assigned.
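
A minimal sketch of the guard, with Dialer and Pool as hypothetical
stand-ins for the real rpc types:

    package piecedeletion // sketch only

    type Pool struct{ /* pooled connections */ }

    type Dialer struct{ Pool *Pool }

    // ensurePool attaches a connection pool only when the dialer
    // has none yet, so a pool configured earlier is never replaced.
    func ensurePool(dialer *Dialer, newPool func() *Pool) {
        if dialer.Pool != nil {
            return // a pool was set up earlier; keep it
        }
        dialer.Pool = newPool()
    }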

Change-Id: I26661683ab7c0198587905478057c01c8f533a7e
2021-10-06 15:25:23 +00:00
Michał Niewrzał
f93dc5a166 satellite/metainfo/piecedeletion: enable connection pool
We want to enable the connection pool for piece deletion to avoid
doing multiple SSL handshakes to storage nodes during a massive
deletion process.

Change-Id: Ic917e4eda304ee16a286926ef046fe9e38bf38ca
2021-10-05 13:30:42 +00:00
Márton Elek
5b40922aca satellite/metainfo: avoid endless loop when startup fails
Change-Id: Ic7e97996e950bd6d4c19a1e38a6517e801dbf92b
2021-09-30 13:26:26 +00:00
JT Olio
da9ca0c650 testplanet/satellite: reduce the number of places default values need to be configured
Satellites set their configuration values to default values using
cfgstruct; however, it turns out our tests don't test these values
at all! Instead, they have a completely separate definition system
that is easy to forget about.

As is to be expected, these values have drifted, and it appears that
in a few cases testplanet is testing unreasonable values that we
won't see in production, or perhaps worse, features enabled in
production were missed and weren't enabled in testplanet.

This change makes it so all values are configured in the same,
systematic way, so it's easy to see when test values differ from dev
or release values, and it's harder to forget to enable features in
testplanet.

In terms of reviewing, this change should actually be fairly easy to
review: private/testplanet/satellite.go keeps both the current config
system and the new one and confirms that they result in identical
configurations, so you can be certain that nothing was missed and
the config is all correct.
You can also check the config lock to see what actual config
values changed.
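
A sketch of the single-definition approach, assuming cfgstruct-style
struct tags (the tag names here are illustrative):

    package testconfig // sketch only

    import "time"

    // All environments live side by side on one struct, so drift
    // between test and production values is visible in one place
    // instead of in a separate test-only definition.
    type DeletionConfig struct {
        MaxConcurrency int           `default:"10000" testDefault:"100" help:"max concurrent deletions"`
        DialTimeout    time.Duration `default:"3s" help:"timeout for dialing storage nodes"`
    }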

Change-Id: I6715d0794887f577e21742afcf56fd2b9d12170e
2021-06-01 22:14:17 +00:00
Rafael Gomes
8b2e4bfa7e satellite/metainfo/piecedeletion: remove spaces from metrics
Change-Id: Iaf1d8a96a43087f2fcc579347f581e8a78a0fb58
2020-12-30 14:27:39 -03:00
Moby von Briesen
db6bc6503d satellite/metainfo: Update metainfo RS config to more easily support multiple RS schemes.
Make metainfo.RSConfig a valid pflag config value. This allows us to
configure the RSConfig as a string like k/m/o/n-shareSize, which makes
having multiple supported RS schemes easier in the future.

RS-related config values that are no longer needed have been removed
(MinTotalThreshold, MaxTotalThreshold, MaxBufferMem, Verify).
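
A sketch of what implementing pflag's Value interface for such a
config can look like; the field names and exact format handling are
assumptions based on the k/m/o/n-shareSize description:

    package metainfo // sketch only

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // RSConfig holds the four node counts and the share size.
    type RSConfig struct {
        Min, Repair, Optimal, Total int
        ShareSize                   int64
    }

    func (c *RSConfig) Type() string { return "metainfo.RSConfig" }

    func (c *RSConfig) String() string {
        return fmt.Sprintf("%d/%d/%d/%d-%d",
            c.Min, c.Repair, c.Optimal, c.Total, c.ShareSize)
    }

    // Set parses "k/m/o/n-shareSize", e.g. "29/35/80/110-256".
    func (c *RSConfig) Set(s string) error {
        parts := strings.Split(s, "-")
        if len(parts) != 2 {
            return fmt.Errorf("expected k/m/o/n-shareSize, got %q", s)
        }
        counts := strings.Split(parts[0], "/")
        if len(counts) != 4 {
            return fmt.Errorf("expected four node counts, got %q", parts[0])
        }
        vals := make([]int, 4)
        for i, raw := range counts {
            v, err := strconv.Atoi(raw)
            if err != nil {
                return err
            }
            vals[i] = v
        }
        size, err := strconv.ParseInt(parts[1], 10, 64)
        if err != nil {
            return err
        }
        c.Min, c.Repair, c.Optimal, c.Total = vals[0], vals[1], vals[2], vals[3]
        c.ShareSize = size
        return nil
    }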

Change-Id: I0178ae467dcf4375c504e7202f31443d627c15e1
2020-11-09 22:16:13 +00:00
Yingrong Zhao
2cc3c19d55 satellite/piecedeletion: add test for disproportionate requests/nodes ratio

Change-Id: I7902e4ffeca39c5a6a5b9170ce6f010d21e7e19e
2020-09-03 21:00:53 +00:00
Yingrong Zhao
7c7334e9d5 satellite/metainfo/piecedeletion: fix deadlock when waiting for threshold to be reached
This PR fixes a deadlock that could happen when the number of piece
deletion requests differed from the distinct node count of those
requests. The success threshold should be based on the number of
nodes instead of the number of requests.
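
A sketch of the corrected threshold computation, with Request and
NodeID as illustrative types:

    type NodeID [32]byte

    type Request struct{ Node NodeID }

    // successThreshold counts distinct nodes rather than raw
    // requests, so the threshold can actually be reached when many
    // requests target the same node.
    func successThreshold(requests []Request, fraction float64) int {
        distinct := make(map[NodeID]struct{})
        for _, r := range requests {
            distinct[r.Node] = struct{}{}
        }
        return int(float64(len(distinct)) * fraction)
    }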

Change-Id: I83073a22eb1e111be1e27641cebcefecdc16afcb
2020-09-02 07:07:49 +00:00
Yingrong Zhao
14ad7a4f1c satellite/metainfo: add limiter for objectdeletion and piecedeletion services

This PR adds a limiter on the number of concurrent object deletions
that can be handled, so we don't run out of memory.
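
A minimal sketch of such a limiter, using a buffered channel as a
semaphore (the codebase's sync2.Limiter plays a similar role):

    package sync2sketch // sketch only

    // Limiter caps the number of goroutines running at once.
    type Limiter struct{ sem chan struct{} }

    func NewLimiter(n int) *Limiter {
        return &Limiter{sem: make(chan struct{}, n)}
    }

    // Go blocks while n functions are already in flight, bounding
    // the memory used by concurrent deletions.
    func (l *Limiter) Go(fn func()) {
        l.sem <- struct{}{}
        go func() {
            defer func() { <-l.sem }()
            fn()
        }()
    }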

Change-Id: Id2ce368af6f86845fcdfd34cb2f5e460efe9b272
2020-08-19 16:08:29 +00:00
Yingrong Zhao
0518b16370 satellite/piecedeletion: move node info retrieval into the service
This change requires less work from the user of the piecedeletion
service by moving the overlay database call into the package.

Change-Id: I14a150ab71fe885780e7a7a74db006a779507ae5
2020-08-13 16:07:54 +00:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Egon Elbre
5bdcd86fa7 ci: test benchmarks
This runs each benchmark for one iteration to ensure that the
benchmarks are valid. Unfortunately, it does not give any useful
metrics as output.
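
Running each benchmark exactly once can be done with the standard
go test flags, for example:

    go test -run='^$' -bench=. -benchtime=1x ./...

Here -run='^$' skips regular tests and -benchtime=1x runs each
benchmark for a single iteration, enough to catch benchmarks that no
longer compile or that panic.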

Change-Id: I68940398c8dd849aed656bd12656f48d5df10128
2020-07-10 13:26:49 +00:00
Egon Elbre
3567b49ef4 satellite/metainfo/piecedeletion: fix int to string conversion
Change-Id: I8d0cdec7cee182ade479c0cfb1d38f7f97af2ab7
2020-06-29 17:22:52 +03:00
Rafael Gomes
bdaabd611d satellite/metainfo/piecedeletion: add metrics for deletion requests
Change-Id: I48ad96e78dab84b9238c63d62bda679fc65c2072
2020-06-26 11:12:33 -03:00
Michal Niewrzal
84892631c8 private/testplanet: remove old libuplink from testplanet
Change-Id: Ib1553f84d0b3ae12a5b00382f0f53357b6a273e2
2020-05-28 13:50:23 +00:00
Egon Elbre
7f323754a4 metainfo/piecedeletion: use NodeURL-s
Change-Id: I247dbfe03e7864e940e4cd1d0f343f38e84099e0
2020-05-21 08:37:13 +03:00
Egon Elbre
ed627144ed all: use DialNodeURL throughout the codebase
Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c
2020-05-20 10:36:30 +00:00
Stefan Benten
01e0ba2e0d satellite/metainfo: move logging to debug for piece deletion (#3873)
2020-05-01 23:29:28 +02:00
Isaac Hess
237d9da477 storagenode/pieces: Deleter can handle multiple tests
Before, the deleter would close its done channel only once, so if
additional tests shared a storagenode, even when not run in parallel,
the later waits would not work properly. This fixes that problem.
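
A sketch of the per-run reset, with illustrative names:

    package pieces // sketch only

    import "sync"

    type Deleter struct {
        mu   sync.Mutex
        done chan struct{}
    }

    // start recreates the done channel, so each run gets its own
    // signal instead of reusing one closed by an earlier test.
    func (d *Deleter) start() {
        d.mu.Lock()
        d.done = make(chan struct{})
        d.mu.Unlock()
    }

    // finish closes the current run's channel exactly once.
    func (d *Deleter) finish() {
        d.mu.Lock()
        close(d.done)
        d.mu.Unlock()
    }

    // Wait blocks until the current run finishes.
    func (d *Deleter) Wait() {
        d.mu.Lock()
        done := d.done
        d.mu.Unlock()
        <-done
    }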

Change-Id: I7dcacf6699cef7c2c2948ba0f4369ef520601bf5
2020-04-29 11:26:56 -06:00
Yingrong Zhao
518946fab9 satellite/metainfo/piecedeletion: add metrics for unhandled pieces
Change-Id: I0cd66e09a8de7c7c0a708b2a9fe44ed1739770b0
2020-04-29 13:30:51 +00:00
Isaac Hess
baccfd36b1 private/testplanet: Mark sn peer deleter test mode
When running testplanet tests, mark the storagenode peer's
PieceDeleter as being in testing mode so that you don't have to do
it in each test.

Change-Id: I2592e02c63f8bcc9152ecf436bac4e798b08bccf
2020-04-28 15:57:29 -06:00
Isaac Hess
13bf0c62ab satellite/pieces: Fix race in piece deleter
There was a race in the test code for the piece deleter which made
it possible to broadcast on the condition variable before anyone was
waiting. This change fixes that and has Wait take a context, so it
times out with the context.
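
A sketch of a context-aware Wait; the actual fix keeps the condition
variable, while this variant uses a hypothetical idle channel for
brevity:

    package pieces // sketch only

    import "context"

    type Deleter struct{ idle chan struct{} }

    // Wait blocks until the deleter is idle or the context expires,
    // so a missed broadcast can no longer hang the test forever.
    func (d *Deleter) Wait(ctx context.Context) error {
        select {
        case <-d.idle: // closed once the queue drains
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }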

Change-Id: Ia4f77a7b7d2287d5ab1d7ba541caeb1ba036dba3
2020-04-28 10:50:20 -06:00
Isaac Hess
a785d37157 storagenode/pieces: Process deletes asynchronously
To improve delete performance, we want to process deletes asynchronously
once the message has been received from the satellite. This change makes
it so that storagenodes will send the delete request to a piece Deleter,
which will process a "best-effort" delete asynchronously and return a
success message to the satellite.

There is a configurable number of max delete workers and a max delete
queue size.
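
A compact sketch of the asynchronous best-effort deleter described
above; the types, sizes, and names are illustrative:

    package pieces // sketch only

    type PieceID [32]byte

    // Deleter accepts piece IDs on a bounded queue and deletes them
    // in the background with a fixed number of workers.
    type Deleter struct{ queue chan PieceID }

    func NewDeleter(workers, queueSize int, del func(PieceID)) *Deleter {
        d := &Deleter{queue: make(chan PieceID, queueSize)}
        for i := 0; i < workers; i++ {
            go func() {
                for id := range d.queue {
                    del(id) // best effort; failures are left to GC
                }
            }()
        }
        return d
    }

    // Enqueue returns immediately so the satellite gets a fast
    // success response; a full queue drops the piece, which garbage
    // collection cleans up later.
    func (d *Deleter) Enqueue(id PieceID) bool {
        select {
        case d.queue <- id:
            return true
        default:
            return false
        }
    }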

Change-Id: I016b68031f9065a9b09224f161b6783e18cf21e5
2020-04-23 11:51:19 -06:00
Egon Elbre
676f3e8516 satellite/metainfo/piecedeletion: try to make batches larger
Previously it was possible for PopAll to return 1010 items, make one
RPC call with 1000 items, and then another RPC call with only 10
items, even though 500 new items had been added to the queue in the
meantime.

This change ensures that we pull items from the queue early and
try to make RPC batches as large as possible.
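
A sketch of the batching loop; the popAll function and Job type are
illustrative:

    type Job struct{ /* piece ids for one node */ }

    // processBatches tops the pending list up from the queue before
    // every RPC, so each call carries close to batchLimit items.
    func processBatches(popAll func() []Job, send func([]Job), batchLimit int) {
        var pending []Job
        for {
            pending = append(pending, popAll()...) // pull early, every round
            if len(pending) == 0 {
                return
            }
            n := len(pending)
            if n > batchLimit {
                n = batchLimit
            }
            send(pending[:n])
            pending = pending[n:]
        }
    }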

Change-Id: I1a30dde9164c2ff7b90c906a9544593c4f1cf0e9
2020-04-22 18:43:29 +00:00
Egon Elbre
45d1ca87f5 satellite/metainfo/piecedeletion: handle zero case
During testing it's possible to get into a scenario where all nodes
are offline and the list of requests is empty.

Change-Id: I271c0ca2c72009244df13e8bc1441fcd5f3da9e0
2020-04-16 13:00:32 +03:00
Egon Elbre
e1a443b04a private/testplanet: allow modifying created database
Instead of providing the database to testplanet from outside, create
it inside and then allow wrapping and modifying it. This is more
convenient to use.

Change-Id: I9b8f69e6e0a19ff984b4e2bfe927c9100c77bc6c
2020-03-27 19:14:48 +00:00
Egon Elbre
09e0f3de63 satellite/metainfo/piecedeletion: add Service
Change-Id: Id7e32ed569701fa0be66f9527c43a67052994570
2020-03-18 14:50:08 +00:00
Egon Elbre
22ea0c7c1a satellite/metainfo/piecedeletion: add Dialer
This adds a piece deletion handler that has debounce for failed dialing
and batching multiple jobs into a single request.

Change-Id: If64021bebb2faae7f3e6bdcceef705aed41e7d7b
2020-03-16 23:36:01 +00:00
Egon Elbre
3d6518081a satellite/metainfo/piecedeletion: add Combiner
To handle concurrent deletion requests we need to combine them into
a single request.

To implement this, we introduce a few concurrency primitives (a
compact sketch follows the list):

* Combiner, which takes a node id and a Job and handles combining
  multiple requests into a single batch.

* Job, which represents deleting multiple piece ids, with a
  notification mechanism for the caller.

* Queue, which provides communication from Combiner to Handler.
  It can limit the number of requests per work queue.

* Handler, which takes an active Queue and processes it until it has
  consumed all the jobs.
  It can limit handling concurrency.
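
The combining idea in miniature; all types here are illustrative
stand-ins, not the real API:

    package piecedeletion // sketch only

    import "sync"

    type NodeID [32]byte

    type PieceID [32]byte

    // Job carries piece ids to delete plus a way to notify the caller.
    type Job struct {
        Pieces []PieceID
        Done   chan error
    }

    // Combiner batches jobs per node and hands each batch to handle.
    type Combiner struct {
        mu     sync.Mutex
        queues map[NodeID][]Job
        handle func(NodeID, []Job)
    }

    func NewCombiner(handle func(NodeID, []Job)) *Combiner {
        return &Combiner{queues: make(map[NodeID][]Job), handle: handle}
    }

    // Enqueue appends the job to the node's queue; the first job
    // for an idle node spawns a drainer for it.
    func (c *Combiner) Enqueue(node NodeID, job Job) {
        c.mu.Lock()
        c.queues[node] = append(c.queues[node], job)
        first := len(c.queues[node]) == 1
        c.mu.Unlock()
        if first {
            go c.drain(node)
        }
    }

    // drain repeatedly grabs everything queued for the node and
    // processes it as one batch; grabbing and clearing happen under
    // the lock, so every job is handled exactly once.
    func (c *Combiner) drain(node NodeID) {
        for {
            c.mu.Lock()
            jobs := c.queues[node]
            delete(c.queues, node)
            c.mu.Unlock()
            if len(jobs) == 0 {
                return
            }
            c.handle(node, jobs)
        }
    }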

Change-Id: I3299325534abad4bae66969ffa16c6ed95d5574f
2020-03-16 17:13:26 +00:00