Currently we are using Reliable to get missing pieces for repair
checker. The issue is that now checker is looking at more things than
just missing pieces (clumped/off, placement pieces) and using only node
ID is not enough. We have issue where we are skipping offline nodes from
clumped and off placement pieces check.
Reliable was refactored to get data (e.g. country, lastNet) about all
reliable nodes. List is split into online and offline. This data will be
cached for quick use by repair checker. It will be also possible to
check nodes metadata like country code or lastNet.
We are also slowly moving `RepairExcludedCountryCodes` config from
overlay to repair which makes more sens for it.
This this first part of changes.
https://github.com/storj/storj/issues/5998
Change-Id: If534342488c0e440affc2894a8fbda6507b8959d
In case some invalid characters in bucket name we need to cast
bucket name to byte array for query argument. This change is
doing this for some missed cases.
Change-Id: I47d0d8e3c85a69bdf63de1137adcd533dcfe50a8
By mistake AOST was added to query in deleteInactiveObjectsAndSegments
in DeleteZombieObjects. Delete statement is not supporting it.
Unfortunately unit tests didn't cover this case. This change removes
AOST from mentioned method and it adding AOST cases to unit tests.
Change-Id: Ib7f65134290df08c490c96b7e367d12f497a3373
Because we are saving all tallies as a single SQL statement we finally
reached maximum message size. With this change we will call SaveTallies multiple times in batches.
https://github.com/storj/storj/issues/5977
Change-Id: I0c7dd27779b1743ede66448fb891e65c361aa3b0
Some of tally queries are not passing bucket name as byte but as string.
If bucket contains some special characters encoding to bytea can fail.
This change makes sure all parts of tally passes bucket name correctly.
Change-Id: I7330d546b44d86a2e4614c814580e9e5262370ed
We decided that we will stop sending explicit delete requests to nodes
and we will cleanup deleted with GC instead.
https://github.com/storj/storj/issues/5888
Change-Id: I65a308cca6fb17e97e3ba85eb3212584c96a32cd
* optimize SQL for zombie objects deletion query by reducing some direct
selects to segments table
* set AOST for expired/zombie object deletion (was 0 since now)
https://github.com/storj/storj/issues/5881
Change-Id: I50482151d056a86fe0e31678a463f413d410759d
this change adds code to CreateGetOrderLimits to filter
out any nodes that are not in the placement specified
by the segment. notably, it does not change the audit
or repair order limits. the list segments code had to be
changed to include getting the placement field from the
database.
Change-Id: Ice3e42a327811bb20928c619a72ed94e0c1464ac
We will remove segments loop soon so we need first to move
Segment definition to rangedloop package.
https://github.com/storj/storj/issues/5237
Change-Id: Ibe6aad316ffb7073cc4de166f1f17b87aac07363
A chore responsible for purging data from the console DB has been
implemented. Currently, it removes old records for unverified user
accounts. We plan to extend this functionality to include expired
project member invitations in the future.
Resolves#5790
References #5816
Change-Id: I1f3ef62fc96c10a42a383804b3b1d2846d7813f7
The "object not found" included an additional prefix "metabase:", which
broke uplink error message detection.
Ideally, changing an internal error message shouldn't break the uplink.
Change-Id: I5ce7789cc11742d3435af1ec555bc96927f1bedc
This combines the ListStreamPositions and GetSegmentByPosition
calls with a ListSegments call that now knows how to return
only the segments within a Range, just like ListStreamPositions.
It would theoretically be possible to also include the
GetObjectLastCommitted call by having it do one of three
queries based on the incoming request range, but that would
mean duplicating the data for the object in every single
row that is returned for each segment in the range.
One gross thing that ListSegments has to do now is update the
first segment returned with the information from any ancestor
segment because GetSegmentByPosition used to do that. It only
updates the first segment so that it doesn't do O(N) database
queries. It seems difficult to have it do a single query to
update all of the segments at once. I'm not certain this change
should be merged on this basis alone.
This change has made me think a couple of things should happen:
1. Server side copy with ancestor segments strikes again
making the code less clear and potentially more buggy
or inefficient for a rare case (empirically <0.1%)
2. The download code requests individual segments from
the satellite lazily as part of its download which
requires the satellite telling it the locations of
all of the segments which requires the satellite
querying the locations of all of the segments. Instead
the download RPC could return the orders for all of
the segments for a range and the download code could
issue N download calls rather than 1 download call and
N get segment calls. I believe both sides of the code
paths would be simpler and more efficient this way.
3. In looking at the timing information for downloads when
testing this, we really need to focus on getting the
auth key and bandwidth limit verification times down.
Here's the timing I saw:
- 42ms: validate auth
- 52ms: bandwidth usage checking
- 14ms: get object info
- 26ms: get segment position info
- 26ms: getting the first segment full info
- 20ms: unaccounted for by spans
- 6ms: creating the orders
This change will remove 26ms, but there's a good 90ms
in just validation. With improved semantics hitting the
database only once and improved validation, a download
rpc taking ~30ms seems doable compared to our current
~200ms.
Change-Id: I4109dba082eaedb79e634c61dbf86efa93ab1222
By accident query to get latest table stats was ordered by 'row_count'
column instead 'create'. We need latest stats so we need ordering by
creation time.
Change-Id: I9d0a0edda8bab59c3d96b7a15cd6502ed51633fc
This flag was in general one time switch to enable versions internally.
New we can remove it as it makes code more complex.
Change-Id: I740b6e8fae80d5fac51d9425793b02678357490e
We want to eliminate usages of LoopSegmentEntry.Pieces, because it is
costing a lot of cpu time to look up node IDs with every piece of every
segment we read.
In this change, we are eliminating use of LoopSegmentEntry.Pieces in the
node tally observer (both the ranged loop and segments loop variants).
It is not necessary to have a fully resolved nodeID until it is time to
store totals in the database. We can use NodeAliases as the map key
instead, and resolve NodeIDs just before storing totals.
Refs: https://github.com/storj/storj/issues/5622
Change-Id: Iec12aa393072436d7c22cc5a4ae1b63966cbcc18
Currently, to get number of entries in segments table we are doing
heavy SELECT count(*) operation. For biggest satellite it's taking
25min now. We are using this method to get stat before and after
segments loop so it adds almost 1h to overall loop time.
With current version of crdb we are using this additional code won't be
used because global configuration for stats refresh rate is inaccurate
for such large table like `segments`. Soon we should be able to upgrade
crdb and be able to adjust refresh rate per table and configure it to
satisfy defined threshold.
https://github.com/storj/storj/issues/5544
Change-Id: I05cfd9154f08894d2bc56bf716b436d1b03b87f1
Segments loop have build-in sanity check to verify if number of segments
processed by loop is roughly fine. We want to have the same verification
for ranged loop.
https://github.com/storj/storj/issues/5544
Change-Id: Ia19edc0fb4aa8dc45993498a8e6a4eb5928485e9
This code is essentially replacement for eestream.CalcPieceSize. To call
eestream.CalcPieceSize we need eestream.RedundancyStrategy which is not
trivial to get as it requires infectious.FEC. For example infectious.FEC
creation is visible on GE loop observer CPU profile because we were
doing this for each segment in DB.
New method was added to storj.Redundancy and here we are just wiring it
with metabase Segment.
BenchmarkSegmentPieceSize
BenchmarkSegmentPieceSize/eestream.CalcPieceSize
BenchmarkSegmentPieceSize/eestream.CalcPieceSize-8 5822 189189 ns/op 9776 B/op 8 allocs/op
BenchmarkSegmentPieceSize/segment.PieceSize
BenchmarkSegmentPieceSize/segment.PieceSize-8 94721329 11.49 ns/op 0 B/op 0 allocs/op
Change-Id: I5a8b4237aedd1424c54ed0af448061a236b00295
This is first attempt to use AliasPieces inastead Pieces with
segments/range loop. So far we were always using Pieces
which are always converted from AliasPieces for easy use.
Side effect is that using NodeID with loop observers is heavy
e.g. we are using maps which behaves slower with NodeIDs.
We are starting with audit observer because it's easy to change
it as in feact it doesn't need access to real NodeID at all. We just
need to reference node in some way and this way is NodeAlias.
Results of BenchmarkRemoteSegment:
name old time/op new time/op delta
RemoteSegment/Cockroach/multiple_segments-8 1.79µs ± 6% 0.03µs ± 4% -98.29% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
RemoteSegment/Cockroach/multiple_segments-8 0.00B 0.00B ~ (all equal)
name old allocs/op new allocs/op delta
RemoteSegment/Cockroach/multiple_segments-8 0.00 0.00 ~ (all equal)
Change-Id: Ib7fc87e568a4d3a9af27b5e3b644ea68ab6db7aa
Additional elements added:
* monkit metric for observers methods like Start/Fork/Join/Finish to
be able to check how much time those methods are taking
* few more logs e.g. entries with processed range
* segmentsProcessed metric to be able to check loop progress
Change-Id: I65dd51f7f5c4bdbb4014fbf04e5b6b10bdb035ec
We have an issue where object can appear in two different listing pages.
It's because protobuf listing cursor doesn't have version included and
now we can have internally versions higher than 1. On satellite side
version 1 was always used as a default cursor version.
As a workaround for existing implementation of libuplink library we will
use always maximum version for listing cursor on satellite side.
Fixing protobuf and libuplink implementation will happen later.
https://github.com/storj/storj/issues/5570
Change-Id: Ibd27b174556c9d8b8bd60fab8cff7862fd11e994
Peer for generating bloom filters will be able to use ranged loop.
As an addition some cleanup were made:
* remove unused parts of GC BF peer (identity, version control)
* added missing Close method for ranged loop service
* some additional tests added
https://github.com/storj/storj/issues/5545
Change-Id: I9a3d85f5fffd2ebc7f2bf7ed024220117ab2be29
TestLoopContinuesAfterObserverError was failing due to system
granularity measuring the duration as 0.
TestDialer_DialTimeout was failing due to connection failure came with a
delay and wasn't being handled.
Change-Id: I4638c86f5d021a86c3d3529fab13cf3608f35c40
Our DB support in storj/private was updated to enable basic context
support for executing SQL queries. This change requires some small
adjustments as not all parts were working correctly.
storj/private commit with change:
4bc77107b7acfcc2f7ad65796d5dd3d7c64801e4
Change-Id: I64d7ed92788ea0920d12cecd1aa0e414720e9b9c
This is automated test around metabase tests. It's detecting queries
which were performing full table scan during test execution.
After merging this and checking that this is not problematic in any way
we will enable this also for testplanet tests.
One query was adjusted to avoid full table scan and pass new check.
https://github.com/storj/storj/issues/5471
Change-Id: I2d176f0c25deda801e8a731b75954f83d18cc1ce
implemented observer and partial, created new structures to keep mon
metrics remain in same way as in segment loop
Change-Id: I209c126096c84b94d4717332e56238266f6cd004
Add node tally ranged loop observer and partial.
Add node tally randed observer to range loop peer.
Add config flag to select which loop to use for node tally.
Update satellite core to use segement/ranged loop based on a flag.
Duplicate existing node tally test but using ranged loop.
Change-Id: I6786f1a16933463fab5f79601bf438203a7a5f9e
Small cleanups of the observer stats init code:
1. Use sync.Once for race free addition to the monitoring chain
(purely defensive)
2. Set the observer durations before adding to the monitoring chain on
first use.
3. observerDurations slice does not need to be initialized to non-nil
Change-Id: I9ae8ec96debc7d52c4ee5d22762a89f21bb2e38c
When an observer errors we still want to finish the other observers.
This changes store the error and continues the loop, skipping
the observer which errored out and setting the duration metric to -1.
When the error occurs in the process stage, it does continue the other
ranges of the same observer. It removes the observer entirely after the process
stage. To improve this would make it more complex due to race
conditions.
Closes https://github.com/storj/storj/issues/5389
Change-Id: I528432c491d4340817d6950f1200ee2b9e703309
Add option AsOfSystemTime to segment provider to make it equivalent to
the old segment loop.
There's no comment on what it does because it's pretty complex and
makes no sense, but we can improve it later.
closes https://github.com/storj/storj/issues/5434
Change-Id: I8f09b03803e681e2fd41008c5dba67804b0f37a1
Add an observer to monitor ranged segment loop progress.
Tested by running the segment loop in storj-up and navigating to
http://<container>:11111/mon/stats and there is the entry:
rangedloop-live,scope=storj.io/storj/satellite/metabase/rangedloop numSegments=364523630000.000000
part of https://github.com/storj/storj/issues/5223
Change-Id: If3d2774d2f17f51eac86f47c6dda1fb8ad696dfe
Support interruption of the ranged segment loop through context.
Part of https://github.com/storj/storj/issues/5223
Change-Id: Iae0260e250f8ea33affed95c6592a1f42df384eb
added in storj-sim rangedloop for each satellite, to verify it works for metrics oveserver,
removed identity from rangedloop peer as we never use it, added logs on service run, added loop
to service instead of endless for loop, interval value to config
Closes: https://github.com/storj/storj/issues/5414
Change-Id: Ibc3b06071b68feda4a35b45da2bbe36e22a02fc8