Reorganize some operations in the observer.processSegment method:
* Slightly reduce memory usage by dropping the segments of objects
marked as skipped. Skipped objects are ignored by the analysis stage,
so their segments aren't needed.
* Return early when an object is skipped, because no further
processing is needed.
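
A minimal sketch of the early return described above. The types, the
string path, and the boolean skip parameter are simplified, hypothetical
stand-ins, not the real segment reaper code; it also assumes that an
object stays skipped once it has been marked.

```go
package main

import "fmt"

// object is a hypothetical, simplified stand-in for the tracked object.
type object struct {
	segments []int // indexes of the segments seen so far
	skip     bool  // true when the object is excluded from the analysis
}

// observer is a hypothetical, simplified stand-in for the real observer.
type observer struct {
	objects map[string]*object
}

// processSegment records one segment of the object at path. When the
// object is (or already was) skipped, its tracked segments are released
// and the method returns before doing any further work.
func (o *observer) processSegment(path string, index int, skip bool) {
	obj, ok := o.objects[path]
	if !ok {
		obj = &object{}
		o.objects[path] = obj
	}

	if skip || obj.skip {
		obj.skip = true
		obj.segments = nil // let the garbage collector reclaim the slice
		return
	}

	obj.segments = append(obj.segments, index)
}

func main() {
	o := &observer{objects: map[string]*object{}}
	o.processSegment("a", 0, false)
	o.processSegment("a", 1, true)  // marks "a" as skipped
	o.processSegment("a", 2, false) // ignored, "a" stays skipped

	fmt.Println(o.objects["a"].skip, len(o.objects["a"].segments)) // true 0
}
```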
Change-Id: I210a26c394477ee411ff7f640507dcc07733a47f
Change the bitmask used by segment reaper to use []byte rather than uint64
This passes tests but I have literally no clue how to integration test this.
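
A rough sketch of what a []byte-backed bitmask can look like. The
bitArray name and its methods are assumptions made for illustration and
may differ from the type in this change; indexes are assumed to be
non-negative.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitArray is a hypothetical growable bitmask; bit i lives in byte i/8.
// Unlike a uint64, it is not limited to 64 positions.
type bitArray []byte

// Set turns on the bit at index, growing the slice when needed.
func (b *bitArray) Set(index int) {
	pos := index / 8
	if pos >= len(*b) {
		grown := make([]byte, pos+1)
		copy(grown, *b)
		*b = grown
	}
	(*b)[pos] |= 1 << (uint(index) % 8)
}

// Has reports whether the bit at index is on.
func (b bitArray) Has(index int) bool {
	pos := index / 8
	if pos >= len(b) {
		return false
	}
	return b[pos]&(1<<(uint(index)%8)) != 0
}

// Count returns the number of bits that are on.
func (b bitArray) Count() int {
	n := 0
	for _, v := range b {
		n += bits.OnesCount8(v)
	}
	return n
}

func main() {
	var b bitArray
	b.Set(0)
	b.Set(3)
	b.Set(70) // beyond what a uint64 mask could hold

	fmt.Println(b.Has(70), b.Has(64), b.Count(), len(b)) // true false 3 9
}
```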
Change-Id: I393f4598b27cae6e427da2190dd3109bca721c34
As discussed, we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting, and for burst limiting if needed in the
future. The default is still no limit, so the behavior stays the same
as before.
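
To illustrate the idea, here is a minimal token bucket sketch using
only the standard library. It is not the mechanism added by this change:
the limiter type, its fields, and the at helper are hypothetical, and a
rate of 0 is assumed to mean "no limit" to mirror the default.

```go
package main

import (
	"fmt"
	"time"
)

// limiter is a hypothetical token bucket. A rate of 0 disables limiting.
type limiter struct {
	rate   float64   // tokens added per second; 0 means unlimited
	burst  float64   // maximum number of tokens that can accumulate
	tokens float64   // tokens currently available
	last   time.Time // last time the bucket was refilled
}

// newLimiter returns a limiter that starts with a full bucket.
func newLimiter(rate float64, burst int, now time.Time) *limiter {
	return &limiter{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow reports whether one more item may be processed at time now.
// The time is passed in explicitly to keep the example deterministic.
func (l *limiter) allow(now time.Time) bool {
	if l.rate <= 0 {
		return true // no limit configured
	}
	if elapsed := now.Sub(l.last).Seconds(); elapsed > 0 {
		l.tokens += elapsed * l.rate
		if l.tokens > l.burst {
			l.tokens = l.burst
		}
		l.last = now
	}
	if l.tokens < 1 {
		return false
	}
	l.tokens--
	return true
}

// at builds a timestamp ms milliseconds after a fixed start time.
func at(ms int64) time.Time {
	return time.Unix(0, 0).Add(time.Duration(ms) * time.Millisecond)
}

func main() {
	l := newLimiter(2, 1, at(0)) // 2 items per second, burst of 1
	fmt.Println(l.allow(at(0)), l.allow(at(100)), l.allow(at(600))) // true false true
}
```

A loop would call allow (or a blocking equivalent) before handling each
item and wait when it returns false.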
Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
Implement some unit test cases for the observer.processSegment method and fix a bug found by these tests.
A production snapshot would give more certainty, but it's too large to keep in the repository and to run the tests against in CI.
We want tests for the different cases of zombie segment detection, and relying on production data cannot guarantee that all of them are covered.
NOTE: the test uses random values so that it doesn't always exercise the same combination of segment lists and values, which keeps the implementation from going stale against a fixed test. The downside is that the test is harder to understand and, if the implementation has a bug, CI may fail only intermittently.
The random tests can eventually check cases that a static test may not cover, either because the case was not written down or because covering it would require implementing a large number of cases.
We are worried that the test implementation is complex and could have bugs of its own, although a bug would go unnoticed only if it existed in both the implementation and the test, and the two were written by different people. Because of that concern, we may entirely replace these random tests with a list of static ones.
* Move the observer implementation, the type definitions related to
it, and its helper functions to their own file.
* The Cluster struct type is used as the key of the ObjectsMap type.
The Observer struct type has an ObjectsMap field.
Cluster has a field for the project ID.
The Observer processSegment method uses its ObjectsMap field for
tracking objects.
However, processSegment clears the map once the project ID differs
from the one kept in the Observer lastProjectID field, so the project
ID doesn't need to be part of the ObjectsMap key.
For this reason, ObjectsMap can use just the bucket name as its key,
and the Cluster struct isn't needed.
Because of this change, the ObjectsMap type has been renamed to a more
descriptive name.
* Unexport the types defined for this specific package.
* Create a constructor function for observer to encapsulate the map
allocation.
* Don't throw away the entire buckets objects map when an empty one is
needed, so that part of its allocations can be reused.
Encapsulate the clearing logic into a method.
* Make the analyzeProject function a method of observer.
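
The resulting shape can be sketched roughly as follows. The names
bucketsObjects, newObserver, and clearBucketsObjects, as well as the
string parameters, are assumptions made for illustration; the real
types and signatures may differ.

```go
package main

import "fmt"

// object is a hypothetical, simplified tracked object.
type object struct{ segments int }

// bucketsObjects maps bucket name -> object path -> object. The project
// ID is not part of the key because the map only ever holds the objects
// of one project at a time.
type bucketsObjects map[string]map[string]*object

type observer struct {
	lastProjectID string
	objects       bucketsObjects
}

// newObserver encapsulates the map allocation.
func newObserver() *observer {
	return &observer{objects: make(bucketsObjects)}
}

// clearBucketsObjects empties the map in place instead of replacing it,
// so the top-level map's allocation can be reused.
func (o *observer) clearBucketsObjects() {
	for bucket := range o.objects {
		delete(o.objects, bucket)
	}
}

// processSegment tracks one segment; the map is cleared when the
// project ID changes (real code would analyze the project first).
func (o *observer) processSegment(projectID, bucket, path string) {
	if o.lastProjectID != "" && o.lastProjectID != projectID {
		o.clearBucketsObjects()
	}
	o.lastProjectID = projectID

	objs, ok := o.objects[bucket]
	if !ok {
		objs = make(map[string]*object)
		o.objects[bucket] = objs
	}
	obj, ok := objs[path]
	if !ok {
		obj = &object{}
		objs[path] = obj
	}
	obj.segments++
}

func main() {
	o := newObserver()
	o.processSegment("p1", "bucket", "x")
	o.processSegment("p1", "bucket", "x")
	o.processSegment("p2", "bucket", "y") // new project: map is cleared

	fmt.Println(len(o.objects), len(o.objects["bucket"])) // 1 1
}
```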