This is fixing two small issues with logging:
* repair checker was logging empty last net values as clumped pieces
but in main logic we stopped classifying it this way
* repairer summary log was showing incorrect number of pieces removed
from segment because list contains duplicated entries
There was no real issue here.
Change-Id: Ifc6a83637d621d628598200cad00cce44fa4cbf9
This patch finishes the placement aware repair.
We already introduced the parameters to select only the jobs for specific placements, the remaining part is just to configure the exclude/include rules. + a full e2e unit test.
Change-Id: I223ba84e8ab7481a53e5a444596c7a5ae51573c5
The repair checker and repair worker both need to determine which pieces
are healthy, which are retrievable, and which should be replaced, but
they have been doing it in different ways in different code, which has
been the cause of bugs. The same term could have very similar but subtly
different meanings between the two, causing much confusion.
With this change, the piece- and node-classification logic is
consolidated into one place within the satellite/repair package, so that
both subsystems can use it. This ought to make decision-making code more
concise and more readable.
The consolidated classification logic has been expanded to create more
sets, so that the decision-making code does not need to do as much
precalculation. It should now be clearer in comments and code that a
piece can belong to multiple sets arbitrarily (except where the
definition of the sets makes this logically impossible), and what the
precise meaning of each set is. These sets include Missing, Suspended,
Clumped, OutOfPlacement, InExcludedCountry, ForcingRepair,
UnhealthyRetrievable, Unhealthy, Retrievable, and Healthy.
Some other side effects of this change:
* CreatePutRepairOrderLimits no longer needs to special-case excluded
countries; it can just create as many order limits as requested (by
way of len(newNodes)).
* The repair checker will now queue a segment for repair when there are
any pieces out of placement. The code calls this "forcing a repair".
* The checker.ReliabilityCache is now accessed by way of a GetNodes()
function similar to the one on the overlay. The classification methods
like MissingPieces(), OutOfPlacementPieces(), and
PiecesNodesLastNetsInOrder() are removed in favor of the
classification logic in satellite/repair/classification.go. This
means the reliability cache no longer needs access to the placement
rules or excluded countries list.
Change-Id: I105109fb94ee126952f07d747c6e11131164fadb
When we do `satellite run api --placement '...'`, the placement rules are not parsed well.
The problem is based on `viper.AllSettings()`, and the main logic is sg. like this (from a new unit test):
```
r := ConfigurablePlacementRule{}
err := r.Set(p)
require.NoError(t, err)
serialized := r.String()
r2 := ConfigurablePlacementRule{}
err = r2.Set(serialized)
require.NoError(t, err)
require.Equal(t, p, r2.String())
```
All settings evaluates the placement rules in `ConfigurablePlacementRules` and stores the string representation.
The problem is that we don't have proper `String()` implementation (it prints out the structs instead of the original definition.
There are two main solutions for this problem:
1. We can fix the `String()`. When we parse a placement rule, the `String()` method should print out the original definition
2. We can switch to use pure string as configuration parameter, and parse the rules only when required.
I feel that 1 is error prone, we can do it (and in this patch I added a lot of `String()` implementations, but it's hard to be sure that our `String()` logic is inline with the parsing logic.
Therefore I decided to make the configuration value of the placements a string (or a wrapper around string).
That's the main reason why this patch seems to be big, as I updated all the usages.
But the main part is in beginning of the `placement.go` (configuration parsing is not a pflag.Value implementation any more, but a separated step).
And `filter.go`, (a few more String implementation for filters.
https://github.com/storj/storj/issues/6248
Change-Id: I47c762d3514342b76a2e85683b1c891502a0756a
As I learned, the `Include` supposed to communicate that some internal change also "included" to the filters during the check -> filters might be stateful.
But it's not the case any more after 552242387, where we removed the only one stateful filter.
Change-Id: I7c36ddadb2defbfa3b6b67bcc115e4427ba9e083
This patch fixes the node tag based placement of rangedloop/repairchecker + repair process.
The main change is just adding the node tags for Reliable and KnownReliabel database calls + adding new tests to prove, it works.
https://github.com/storj/storj/issues/6126
Change-Id: I245d654a18c1d61b2c72df49afa0718d0de76da1
This feature flag was disabled by default to test it slowly. Its enabled
for some time on one production satellite and test satellites without
any issue. We can enable it by default in code.
Change-Id: If9c36895bbbea12bd4aefa30cb4df912e1729e4c
Sometimes DownloadSelectionCache doesn't keep up with all node
placement changes we are doing during this test.
Change-Id: Idbda6511e3324b560cee3be85f980bf8d5b9b7ef
Currently, we have issue were while counting unhealthy pieces we are
counting twice piece which is in excluded country and is outside segment
placement. This can cause unnecessary repair.
This change is also doing another step to move RepairExcludedCountryCodes
from overlay config into repair package.
Change-Id: I3692f6e0ddb9982af925db42be23d644aec1963f
At the moment segment repairer is skipping offline nodes in checks like
clumped pieces and off placement pieces. This change is fixing this
problem using new version of KnownReliable method. New method is
returning both online and offline nodes. Provided data can be used to
find clumped and off placement pieces.
We are not using DownloadSelectionCache anymore with segment repairer.
https://github.com/storj/storj/issues/5998
Change-Id: I236a1926e21f13df4cdedc91130352d37ff97e18
Additional test case to cover situation where we are trying to
repair segment with specific placement set. We need to be sure
that segment won't be repaired into nodes that are outside
segment placement, even if that means that repair will fail.
Change-Id: I99d238aa9d9b9606eaf89cd1cf587a2585faee91
We missed to set placement as a part of selection request. It can case
uploading repaired data out of specified placement.
I will provide test as a separate change.
Change-Id: I4efe67f2d5f545a1d70e831e5d297f0977a4eed1
Segment repairer should take into account segment 'placement' field
and remove or repair pieces from nodes that are outside this placement.
In case when after considering pieces out of placement we are still above
repair threshold we are only updating segment pieces to remove
problematic pieces. Otherwise we are doing regular repair.
https://github.com/storj/storj/issues/5896
Change-Id: I72b652aff2e6b20be3ac6dbfb1d32c2840ce3d59