Currently, slower storagenodes can slow down the deletion queue.
To make piece deletion faster, reduce the maximum time spent in
either dialing or piece deletion requests.
With this change:
* dial timeout is 3s
* request timeout is 15s
* fail threshold is set to 10min
We'll also mark the storagenode as failed when a timeout occurs.
A timeout usually indicates that the storagenode is overwhelmed.
Garbage collection will ensure that the pieces get deleted eventually.
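For illustration, a minimal sketch of what the configuration looks
like with these values; the struct and field names below are
assumptions based on this description, not the exact satellite
config:

    package piecedeletion

    import "time"

    // Config collects the timeouts used while deleting pieces from
    // storagenodes.
    type Config struct {
        DialTimeout    time.Duration // give up dialing a node after this long
        RequestTimeout time.Duration // give up a single deletion request after this long
        FailThreshold  time.Duration // mark a node as failed after timing out for this long
    }

    // defaultConfig mirrors the values listed above.
    var defaultConfig = Config{
        DialTimeout:    3 * time.Second,
        RequestTimeout: 15 * time.Second,
        FailThreshold:  10 * time.Minute,
    }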
Change-Id: Iec5de699f5917905f5807140e2c3252088c6399b
It's safer to create a new connection pool for piece deletion
only if the dialer has no existing pool assigned.
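A rough sketch of that guard; Dialer and Pool here are simplified
stand-ins, not the real rpc types:

    // Simplified stand-ins for the real dialer and pool types.
    type Pool struct{ /* cached, already-handshaked connections */ }

    type Dialer struct{ Pool *Pool }

    // ensurePool attaches a deletion-specific pool only when the dialer
    // has none yet, so a pool that was already assigned elsewhere is
    // never replaced.
    func ensurePool(d *Dialer, newPool func() *Pool) {
        if d.Pool == nil {
            d.Pool = newPool()
        }
    }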
Change-Id: I26661683ab7c0198587905478057c01c8f533a7e
We want to enable the connection pool for piece deletion to avoid
doing multiple SSL handshakes to a storagenode during a massive
deletion process.
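A minimal illustration of the idea (not the actual rpcpool
implementation): only the first request to a node pays for a
handshake, later requests reuse the cached connection:

    // conn stands for an established, already-handshaked connection.
    type conn struct{ addr string }

    type pool struct {
        conns map[string]*conn // keyed by storagenode address
    }

    // get returns a cached connection when one exists, so during a
    // massive deletion only the first request to each node performs a
    // handshake.
    func (p *pool) get(addr string, dial func(string) *conn) *conn {
        if c, ok := p.conns[addr]; ok {
            return c
        }
        if p.conns == nil {
            p.conns = map[string]*conn{}
        }
        c := dial(addr)
        p.conns[addr] = c
        return c
    }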
Change-Id: Ic917e4eda304ee16a286926ef046fe9e38bf38ca
Satellites set their default configuration values using cfgstruct;
however, it turns out our tests don't test these values
at all! Instead, they have a completely separate definition system
that is easy to forget about.
As is to be expected, these values have drifted, and it appears
in a few cases testplanet is testing unreasonable values that we
won't see in production, or perhaps worse, features enabled in
production were missed and weren't enabled in testplanet.
This change makes it so all values are configured in the same,
systematic way, so it's easy to see when test values differ from
dev or release values, and it's harder to forget to enable
features in testplanet.
In terms of reviewing, this change should actually be fairly
easy to review, considering private/testplanet/satellite.go keeps
both the current config system and the new one and confirms that
they result in identical configurations, so you can be certain
that nothing was missed and the config is correct.
You can also check the config lock to see what actual config
values changed.
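As a sketch of the approach, tests can bind the same config struct
through cfgstruct so defaults come from the struct tags the binaries
use; the exact import path and options testplanet passes may differ
from the assumptions below:

    package testplanet

    import (
        "github.com/spf13/pflag"

        "storj.io/private/cfgstruct"
        "storj.io/storj/satellite"
    )

    // testConfig fills satellite.Config from its cfgstruct defaults
    // instead of a hand-maintained copy of the values.
    func testConfig() satellite.Config {
        var cfg satellite.Config
        cfgstruct.Bind(pflag.NewFlagSet("test", pflag.PanicOnError), &cfg,
            cfgstruct.UseDevDefaults(),
            cfgstruct.ConfDir("."), cfgstruct.IdentityDir("."))
        return cfg
    }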
Change-Id: I6715d0794887f577e21742afcf56fd2b9d12170e
This PR fixes a deadlock that can happen when the number of piece
deletion requests differs from the distinct node count of those
requests. The success threshold should be based on the number of
nodes instead of the number of requests.
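A sketch of the corrected calculation; the Request type and fraction
handling are simplified stand-ins for the actual piecedeletion
service:

    // Request is a simplified stand-in: one batch of pieces for one node.
    type Request struct {
        NodeID string
        Pieces []string
    }

    // successThreshold is based on the number of distinct nodes rather
    // than the number of requests, so duplicate requests for the same
    // node can't raise the target above what is actually reachable.
    func successThreshold(requests []Request, fraction float64) int {
        distinct := map[string]struct{}{}
        for _, r := range requests {
            distinct[r.NodeID] = struct{}{}
        }
        return int(float64(len(distinct)) * fraction)
    }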
Change-Id: I83073a22eb1e111be1e27641cebcefecdc16afcb
This PR adds a limiter on the number of concurrent object deletions
that can be handled, so we don't run out of memory.
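An illustrative version of such a limiter using a buffered channel as
a semaphore; the real code may use a different mechanism, and the
names and limit below are hypothetical:

    package metainfo

    import "context"

    // Object is a placeholder for whatever identifies an object to delete.
    type Object struct{ Key string }

    // sem caps how many object deletions run at once so a burst of
    // delete requests can't exhaust memory. 50 is an assumed limit.
    var sem = make(chan struct{}, 50)

    func deleteObject(ctx context.Context, obj Object,
        deletePieces func(context.Context, Object) error) error {
        select {
        case sem <- struct{}{}: // acquire a slot
        case <-ctx.Done():
            return ctx.Err()
        }
        defer func() { <-sem }() // release the slot
        return deletePieces(ctx, obj)
    }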
Change-Id: Id2ce368af6f86845fcdfd34cb2f5e460efe9b272
This change requires less work from the user of the piecedeletion
service by moving the overlay database call into the package.
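A hedged sketch of the simplification; the interface and method names
below (OverlayDB, KnownReliable, Service.Delete) are approximations,
not the package's exact API:

    package piecedeletion

    import "context"

    // Node and Request are simplified stand-ins for the real types.
    type Node struct{ ID, Address string }

    type Request struct {
        NodeID string
        Pieces []string
    }

    // OverlayDB is the lookup the service now performs itself.
    type OverlayDB interface {
        KnownReliable(ctx context.Context, ids []string) ([]Node, error)
    }

    type Service struct {
        overlay OverlayDB // injected once, so callers no longer query it
    }

    // Delete resolves node addresses internally; callers only pass the
    // deletion requests instead of also talking to the overlay database.
    func (s *Service) Delete(ctx context.Context, requests []Request) error {
        ids := make([]string, 0, len(requests))
        for _, r := range requests {
            ids = append(ids, r.NodeID)
        }
        nodes, err := s.overlay.KnownReliable(ctx, ids)
        if err != nil {
            return err
        }
        _ = nodes // dialing the nodes and sending batches is omitted here
        return nil
    }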
Change-Id: I14a150ab71fe885780e7a7a74db006a779507ae5
During testing it's possible to get into a scenario where all nodes
are offline and the list of requests is empty.
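A sketch of the kind of guard that covers this case; the actual fix
may differ, and the names are illustrative:

    // Request is a simplified stand-in: one batch of pieces for one node.
    type Request struct {
        NodeID string
        Pieces []string
    }

    // deleteAll returns early when there is nothing to send, e.g. when
    // every node was offline and all requests were filtered out, instead
    // of waiting on a success threshold that can never be reached.
    func deleteAll(requests []Request, send func(Request) error) error {
        if len(requests) == 0 {
            return nil // all nodes offline: nothing to send
        }
        for _, r := range requests {
            if err := send(r); err != nil {
                return err
            }
        }
        return nil
    }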
Change-Id: I271c0ca2c72009244df13e8bc1441fcd5f3da9e0