Commit Graph

255 Commits

Author SHA1 Message Date
Michal Niewrzal
9b3488276d satellite/gracefulexit: use node alias instead id with observer
Using node alias helps using less cpu and memory.

Fixes https://github.com/storj/storj/issues/5654

Change-Id: If3a5c7810732cbb1bff4dcb78706c81aca56b71b
2023-05-18 22:37:46 +00:00
Michal Niewrzal
c0e7f463fe satellite/metabase: remove segmentsloop package
Last change to remove segments loop from codebase.

https://github.com/storj/storj/issues/5237

Change-Id: I77b12911b6b4e390a7385e6e8057c7587e74b70a
2023-05-18 19:08:29 +00:00
Michal Niewrzal
4bdbb25d83 satellite/metabase/rangedloop: move Segment definition
We will remove segments loop soon so we need first to move
Segment definition to rangedloop package.

https://github.com/storj/storj/issues/5237

Change-Id: Ibe6aad316ffb7073cc4de166f1f17b87aac07363
2023-05-16 12:37:17 +00:00
Michal Niewrzal
2592aaef9c satellite/gc/bloomfilter: remove segments loop parts
We are switching completely to ranged loop.

https://github.com/storj/storj/issues/5368

Change-Id: I1a22ac4b242998e287b2b7d8167b64e850b61a0f
2023-05-15 11:46:26 +00:00
Michal Niewrzal
36e046375c satellite/repair/checker: remove segments loop parts
We are switching completely to ranged loop.

https://github.com/storj/storj/issues/5368

Change-Id: I8583549973cd36aa0e0c482c20d7a75cb7568ab3
2023-05-08 12:19:13 +00:00
Jeremy Wharton
f61230a670 satellite/console/dbcleanup: create console DB cleanup chore
A chore responsible for purging data from the console DB has been
implemented. Currently, it removes old records for unverified user
accounts. We plan to extend this functionality to include expired
project member invitations in the future.

Resolves #5790
References #5816

Change-Id: I1f3ef62fc96c10a42a383804b3b1d2846d7813f7
2023-05-05 19:11:53 +00:00
Egon Elbre
ee28103b29 satellite/metabase: fix error message
The "object not found" included an additional prefix "metabase:", which
broke uplink error message detection.

Ideally, changing an internal error message shouldn't break the uplink.

Change-Id: I5ce7789cc11742d3435af1ec555bc96927f1bedc
2023-05-02 13:53:57 +00:00
Michal Niewrzal
6a55682bc6 satellite/accounting/nodetally: remove segments loop parts
We are switching completely to ranged loop.

https://github.com/storj/storj/issues/5368

Change-Id: I6176a129ba14cf83fb635048d09e6748276b52a1
2023-04-24 14:25:53 +00:00
Egon Elbre
2405bc8f3b satellite/metabase: stop using the common error type
Updates https://github.com/storj/storj/issues/5291

Change-Id: I7b57a4b454d3619cb5d8ae4cd92f818ad2839c8b
2023-04-19 19:18:18 +03:00
Egon Elbre
54beea82da satellite/metabase: define a local ErrObjectNotFound
Updates https://github.com/storj/storj/issues/5291

Change-Id: If847cd95a9b500fb18535d20b46b1b1714021584
2023-04-19 12:03:01 +00:00
Egon Elbre
48256c91b5 storage: move errors to better locations
Change-Id: Ia44570949a8f6bb50220dc838c5b6aa21e851a4d
2023-04-06 17:26:29 +03:00
Jeff Wendling
7b06575f6f satellite/meta{base,info}: reduce db round trips for download
This combines the ListStreamPositions and GetSegmentByPosition
calls with a ListSegments call that now knows how to return
only the segments within a Range, just like ListStreamPositions.

It would theoretically be possible to also include the
GetObjectLastCommitted call by having it do one of three
queries based on the incoming request range, but that would
mean duplicating the data for the object in every single
row that is returned for each segment in the range.

One gross thing that ListSegments has to do now is update the
first segment returned with the information from any ancestor
segment because GetSegmentByPosition used to do that. It only
updates the first segment so that it doesn't do O(N) database
queries. It seems difficult to have it do a single query to
update all of the segments at once. I'm not certain this change
should be merged on this basis alone.

This change has made me think a couple of things should happen:

1. Server side copy with ancestor segments strikes again
   making the code less clear and potentially more buggy
   or inefficient for a rare case (empirically <0.1%)

2. The download code requests individual segments from
   the satellite lazily as part of its download which
   requires the satellite telling it the locations of
   all of the segments which requires the satellite
   querying the locations of all of the segments. Instead
   the download RPC could return the orders for all of
   the segments for a range and the download code could
   issue N download calls rather than 1 download call and
   N get segment calls. I believe both sides of the code
   paths would be simpler and more efficient this way.

3. In looking at the timing information for downloads when
   testing this, we really need to focus on getting the
   auth key and bandwidth limit verification times down.
   Here's the timing I saw:

       - 42ms: validate auth
       - 52ms: bandwidth usage checking
       - 14ms: get object info
       - 26ms: get segment position info
       - 26ms: getting the first segment full info
       - 20ms: unaccounted for by spans
       - 6ms:  creating the orders

   This change will remove 26ms, but there's a good 90ms
   in just validation. With improved semantics hitting the
   database only once and improved validation, a download
   rpc taking ~30ms seems doable compared to our current
   ~200ms.

Change-Id: I4109dba082eaedb79e634c61dbf86efa93ab1222
2023-04-03 16:49:00 +00:00
Michal Niewrzal
dc9cbe36ae satellite/metabase: sort stats by creation time not number of entries
By accident query to get latest table stats was ordered by 'row_count'
column instead 'create'. We need latest stats so we need ordering by
creation time.

Change-Id: I9d0a0edda8bab59c3d96b7a15cd6502ed51633fc
2023-03-30 12:31:45 +00:00
Michal Niewrzal
54b6e1614a satellite/metainfo: drop MultipleVersions config flag
This flag was in general one time switch to enable versions internally.
New we can remove it as it makes code more complex.

Change-Id: I740b6e8fae80d5fac51d9425793b02678357490e
2023-03-29 13:45:09 +00:00
paul cannon
ae5947327b satellite/accounting: Use metabase.AliasPiece with tally observer
We want to eliminate usages of LoopSegmentEntry.Pieces, because it is
costing a lot of cpu time to look up node IDs with every piece of every
segment we read.

In this change, we are eliminating use of LoopSegmentEntry.Pieces in the
node tally observer (both the ranged loop and segments loop variants).
It is not necessary to have a fully resolved nodeID until it is time to
store totals in the database. We can use NodeAliases as the map key
instead, and resolve NodeIDs just before storing totals.

Refs: https://github.com/storj/storj/issues/5622
Change-Id: Iec12aa393072436d7c22cc5a4ae1b63966cbcc18
2023-03-29 12:24:05 +00:00
Michal Niewrzal
06b51258be satellite/metabase: use table stats if are up to date
Currently, to get number of entries in segments table we are doing
heavy SELECT count(*) operation. For biggest satellite it's taking
25min now. We are using this method to get stat before and after
segments loop so it adds almost 1h to overall loop time.

With current version of crdb we are using this additional code won't be
used because global configuration for stats refresh rate is inaccurate
for such large table like `segments`. Soon we should be able to upgrade
crdb and be able to adjust refresh rate per table and configure it to
satisfy defined threshold.

https://github.com/storj/storj/issues/5544

Change-Id: I05cfd9154f08894d2bc56bf716b436d1b03b87f1
2023-03-13 14:54:13 +00:00
Michal Niewrzal
67ad792d1a satellite/rangedloop: migrate segments verification from segment loop
Segments loop have build-in sanity check to verify if number of segments
processed by loop is roughly fine. We want to have the same verification
for ranged loop.

https://github.com/storj/storj/issues/5544

Change-Id: Ia19edc0fb4aa8dc45993498a8e6a4eb5928485e9
2023-03-08 17:00:11 +00:00
Michal Niewrzal
5a9577bb43 satellite/metabase/rangedloop: test loop boundaries
Fixes https://github.com/storj/storj/issues/5620

Change-Id: Ibf23e8866c1a8a8e78281c3d07e81ffc7db30fcf
2023-03-03 09:56:01 +00:00
Márton Elek
dfe3c87b87 satellite/metabase/rangedloop: save observer durations to eventkit
Change-Id: I2d027cd7f7421e2c38dcc1277d640d4183af4cec
2023-02-24 13:30:19 +00:00
Michal Niewrzal
16b7901fde satellite/metabase: add piece size calculation to segment
This code is essentially replacement for eestream.CalcPieceSize. To call
eestream.CalcPieceSize we need eestream.RedundancyStrategy which is not
trivial to get as it requires infectious.FEC. For example infectious.FEC
creation is visible on GE loop observer CPU profile because we were
doing this for each segment in DB.

New method was added to storj.Redundancy and here we are just wiring it
with metabase Segment.

BenchmarkSegmentPieceSize
BenchmarkSegmentPieceSize/eestream.CalcPieceSize
BenchmarkSegmentPieceSize/eestream.CalcPieceSize-8         	    5822	    189189 ns/op	    9776 B/op	       8 allocs/op
BenchmarkSegmentPieceSize/segment.PieceSize
BenchmarkSegmentPieceSize/segment.PieceSize-8              	94721329	        11.49 ns/op	       0 B/op	       0 allocs/op

Change-Id: I5a8b4237aedd1424c54ed0af448061a236b00295
2023-02-22 11:04:02 +00:00
Michal Niewrzal
06f4aede9b satellite/audit: use NodeAlias instead pure NodeID
This is first attempt to use AliasPieces inastead Pieces with
segments/range loop. So far we were always using Pieces
which are always converted from AliasPieces for easy use.
Side effect is that using NodeID with loop observers is heavy
e.g. we are using maps which behaves slower with NodeIDs.

We are starting with audit observer because it's easy to change
it as in feact it doesn't need access to real NodeID at all. We just
need to reference node in some way and this way is NodeAlias.

Results of BenchmarkRemoteSegment:
name                                         old time/op    new time/op    delta
RemoteSegment/Cockroach/multiple_segments-8    1.79µs ± 6%    0.03µs ± 4%  -98.29%  (p=0.008 n=5+5)

name                                         old alloc/op   new alloc/op   delta
RemoteSegment/Cockroach/multiple_segments-8     0.00B          0.00B          ~     (all equal)

name                                         old allocs/op  new allocs/op  delta
RemoteSegment/Cockroach/multiple_segments-8      0.00           0.00          ~     (all equal)

Change-Id: Ib7fc87e568a4d3a9af27b5e3b644ea68ab6db7aa
2023-02-21 15:31:59 +00:00
Michal Niewrzal
aba2f14595 satellite/metabase/rangedloop: few additions for monitoring
Additional elements added:
* monkit metric for observers methods like Start/Fork/Join/Finish to
be able to check how much time those methods are taking
* few more logs e.g. entries with processed range
* segmentsProcessed metric to be able to check loop progress

Change-Id: I65dd51f7f5c4bdbb4014fbf04e5b6b10bdb035ec
2023-02-17 08:46:00 +00:00
Michal Niewrzal
41bcc6bb62 satellite/metainfo: fix duplicates while listing committed objects
We have an issue where object can appear in two different listing pages.
It's because protobuf listing cursor doesn't have version included and
now we can have internally versions higher than 1. On satellite side
version 1 was always used as a default cursor version.

As a workaround for existing implementation of libuplink library we will
use always maximum version for listing cursor on satellite side.

Fixing protobuf and libuplink implementation will happen later.

https://github.com/storj/storj/issues/5570

Change-Id: Ibd27b174556c9d8b8bd60fab8cff7862fd11e994
2023-02-14 14:47:27 +01:00
Michal Niewrzal
94d341bcf3 satellite: use ranged loop with GC-GF peer
Peer for generating bloom filters will be able to use ranged loop.

As an addition some cleanup were made:
* remove unused parts of GC BF peer (identity, version control)
* added missing Close method for ranged loop service
* some additional tests added

https://github.com/storj/storj/issues/5545

Change-Id: I9a3d85f5fffd2ebc7f2bf7ed024220117ab2be29
2023-02-13 18:32:21 +00:00
Egon Elbre
b14019b8c5 satellite/{metabase/rangedloop,metainfo/piecedeletion}: fix flaky tests
TestLoopContinuesAfterObserverError was failing due to system
granularity measuring the duration as 0.

TestDialer_DialTimeout was failing due to connection failure came with a
delay and wasn't being handled.

Change-Id: I4638c86f5d021a86c3d3529fab13cf3608f35c40
2023-02-09 16:07:00 +02:00
Egon Elbre
dcebcf770d satellite/metabase: add database comments
Change-Id: I78e19fb9301044d076e3eabf7e92fa6b715ddf03
2023-02-07 11:42:21 +02:00
Michal Niewrzal
4e0c7b2d90 satellite/metabase/metabasetest: fix race while running tests
Change-Id: I4fdb328443617bc9710ee6b9168b31870fe336d9
2023-02-02 10:35:28 +00:00
Michal Niewrzal
bd8867cd09 satellite: adjust code to handle context cancelation for SQL queries
Our DB support in storj/private was updated to enable basic context
support for executing SQL queries. This change requires some small
adjustments as not all parts were working correctly.

storj/private commit with change:
4bc77107b7acfcc2f7ad65796d5dd3d7c64801e4

Change-Id: I64d7ed92788ea0920d12cecd1aa0e414720e9b9c
2023-01-27 10:07:43 +01:00
Michal Niewrzal
b5c5c62d7b satellite/metabase: add missing error check
Change-Id: I6891f0647cb8e4a8dd6534eaa3588bbe76e2721d
2023-01-26 11:18:14 +00:00
Michal Niewrzal
bb2ac4279a satellite/metainfo: enable multiple versions fix by default
Change-Id: I6cc7ba928e59cc8b8fa50f2ab19ec5418dc76507
2023-01-26 09:35:20 +00:00
Michal Niewrzal
8850fde9f5 satellite/metabase/metabasetest: detect full scan table queries
This is automated test around metabase tests. It's detecting queries
which were performing full table scan during test execution.

After merging this and checking that this is not problematic in any way
we will enable this also for testplanet tests.

One query was adjusted to avoid full table scan and pass new check.

https://github.com/storj/storj/issues/5471

Change-Id: I2d176f0c25deda801e8a731b75954f83d18cc1ce
2023-01-23 19:40:20 +00:00
Qweder93
d6a948f59d satellite/repair : implemented ranged loop observer
implemented observer and partial, created new structures to keep mon
metrics remain in same way as in segment loop

Change-Id: I209c126096c84b94d4717332e56238266f6cd004
2023-01-23 14:23:03 +00:00
Yaroslav Vorobiov
5644fb1a7e satellite/accounting/nodetally: add ranged loop
Add node tally ranged loop observer and partial.
Add node tally randed observer to range loop peer.
Add config flag to select which loop to use for node tally.
Update satellite core to use segement/ranged loop based on a flag.
Duplicate existing node tally test but using ranged loop.

Change-Id: I6786f1a16933463fab5f79601bf438203a7a5f9e
2023-01-17 13:50:18 +01:00
Andrew Harding
c5b5695bca satellite/metabase/rangedloop: clean up observerstats init
Small cleanups of the observer stats init code:
1. Use sync.Once for race free addition to the monitoring chain
   (purely defensive)
2. Set the observer durations before adding to the monitoring chain on
   first use.
3. observerDurations slice does not need to be initialized to non-nil

Change-Id: I9ae8ec96debc7d52c4ee5d22762a89f21bb2e38c
2023-01-13 10:40:30 +00:00
Erik van Velzen
ed910b6087
satellite/metabase/rangedloop: continue after error (#5430)
When an observer errors we still want to finish the other observers.

This changes store the error and continues the loop, skipping
the observer which errored out and setting the duration metric to -1.

When the error occurs in the process stage, it does continue the other
ranges of the same observer. It removes the observer entirely after the process
stage. To improve this would make it more complex due to race
conditions.

Closes https://github.com/storj/storj/issues/5389

Change-Id: I528432c491d4340817d6950f1200ee2b9e703309
2023-01-11 22:23:17 +01:00
Erik van Velzen
2d863759b0 satellite/metabase/rangedloop: add AsOfSystemTime
Add option AsOfSystemTime to segment provider to make it equivalent to
the old segment loop.

There's no comment on what it does because it's pretty complex and
makes no sense, but we can improve it later.

closes https://github.com/storj/storj/issues/5434

Change-Id: I8f09b03803e681e2fd41008c5dba67804b0f37a1
2023-01-11 16:22:18 +00:00
Michal Niewrzal
282aaf8945 satellite/metabase: fix GetStreamPieceCountByNodeID full table scan
Previous version of SQL query was causing full table scan.

Output of EXPLAIN:
---
distribution: local
vectorized: true

• lookup join
│ table: segments@segments_pkey
│ equality: (?column?) = (stream_id)
│ pred: remote_alias_pieces IS NOT NULL
│
└── • union
    │
    ├── • values
    │     size: 1 column, 1 row
    │
    └── • scan
          missing stats
          table: segment_copies@segment_copies_pkey
          spans: [/'\xff135285155378d980b8c49148cef3ca' - /'\xff135285155378d980b8c49148cef3ca']
---

Change-Id: I708d1df204ac2d33cefe80b23594442b193424d2
2023-01-10 23:35:22 +00:00
Erik van Velzen
23b92da490
satellite/metabase/rangedloop: live reporting (#5366)
Add an observer to monitor ranged segment loop progress.

Tested by running the segment loop in storj-up and navigating to
http://<container>:11111/mon/stats and there is the entry:

rangedloop-live,scope=storj.io/storj/satellite/metabase/rangedloop numSegments=364523630000.000000

part of https://github.com/storj/storj/issues/5223

Change-Id: If3d2774d2f17f51eac86f47c6dda1fb8ad696dfe
2023-01-06 09:49:14 +01:00
Erik van Velzen
1d4411f166
satellite/metabase/rangedloop: cancellation (#5364)
Support interruption of the ranged segment loop through context.

Part of https://github.com/storj/storj/issues/5223

Change-Id: Iae0260e250f8ea33affed95c6592a1f42df384eb
2023-01-05 16:32:30 +01:00
Qweder93
8c69ee62fc {cmd/storj-sim, satellite/rangedloop}: added rangedloop to storj-sim, removed identity
added in storj-sim rangedloop for each satellite, to verify it works for metrics oveserver,
removed identity from rangedloop peer as we never use it, added logs on service run, added loop
to service instead of endless for loop, interval value to config

Closes: https://github.com/storj/storj/issues/5414

Change-Id: Ibc3b06071b68feda4a35b45da2bbe36e22a02fc8
2023-01-05 11:29:00 +00:00
Erik van Velzen
1da9fd1eee
satellite/metabase/rangedloop: monkit durations (#5365)
Wire up duration measurement of observers with monkit.

Tested by attaching a SleepObserver, starting the rangedloop in storj-up
and navigating to http://<container>:11111/mon/stats. It reports the
following statistic:

completed-observer-duration,observer=*rangedlooptest.SleepObserver,scope=storj.io/storj/satellite/metabase/rangedloop duration=10.000117

Change-Id: Ief131d34001dd5d3ba1d7be6f161986e1f66440d
2023-01-04 12:16:47 +00:00
Michal Niewrzal
77afdae741 satellite/metabase: handle target pending/committed objects while move
Before we introduced objects versions internally move operation was
always failing when under target location object exists. But then we
had only single version 1 all the time. With versions different than 1
we need to check all existing objects under target location.

To be backward compatible with our API new logic looks like this:
* if there is no object under target location use source object version
as target version
* if there are only pending objects find first free (highest) version
which could be used to move object there
* if there is committed object under target location reject move
operation

Fixes https://github.com/storj/storj/issues/5403

Change-Id: I717f3e7c42470b406287d6ec335f6f057d3fc3b5
2023-01-04 08:50:51 +00:00
Erik van Velzen
37b4981cc0
satellite/metabase/rangedloop: measure observer duration (#5350)
Track duration of all segment loop observers. Factor out functions to
reduce size.

Still need to send the measurements out via monkit.

Part of https://github.com/storj/storj/issues/5223

Change-Id: Iae0260e250f8ea33affed95c6592a1f42df384eb
2022-12-21 21:58:08 +01:00
Egon Elbre
04f16f8768 cmd/tools/segment-verify: tool for checking duplicate net
Change-Id: Ie47c1282e580ffc418bf3b1f3c8820a48973aefc
2022-12-15 22:58:36 +00:00
Michal Niewrzal
0bbbb9c4c1 satellite/metabase: fix log for multiple committed version
Change-Id: I2556c5b523091c11937a01efff07be9e0dd964aa
2022-12-13 13:08:02 +00:00
Michal Niewrzal
0759cbdc7f satellite/metabase: handle copies with GetStreamPieceCountByNodeID
We missed proper handling of object copies for method
GetStreamPieceCountByNodeID which is used by metabase.GetObjectIPs.
That caused some lack of IPs returned when queriyng IPs of copy and
broke things like pices map on linksharing.

Fixes https://github.com/storj/storj/issues/5406

Change-Id: I9574776f34880788c2dc9ff78a6ae20d44fe628f
2022-12-13 12:32:56 +01:00
Andrew Harding
b562cbf98f satellite/metrics: provide a rangedloop observer
https://github.com/storj/storj/issues/5236

Change-Id: Ic1ed7a5533dccacd58285b64579dbdd6210de4f9
2022-12-09 12:04:39 -07:00
Andrew Harding
633ab8dcf6 satellite/metadabase/rangedloop: stream affinity for test provider
Some observers assume that they will observe all the segments for a
given stream, and that they will observe those segments in a sequential
stream over one or more iterations.

This change updates the range provider from rangedlooptest to provide
these guarantees.

The change also removes the Mock suffix from the provider/splitter types
since the package name (rangedlooptest) implies that the type is a test
double.

Change-Id: I927c409807e305787abcde57427baac22f663eaa
2022-12-09 16:49:02 +00:00
Michal Niewrzal
5c2131ed0d satellite/metabase: always try to remove old version on commit
We have a bug in our behavior while doing API pods deployment. At this
time its possible to have pods with multiple versions flag set true only
partially for some of pods. Because of that it's possible to start new
object without removing existing/older version on BeginObject
(new behavior) and also don't remove that existing/older object on
CommitObject. That can cause to have two committed objects with
different versions and that's a state we want to avoid.

To fix it we are removing multiple versions flag from CommitObject to
always try delete existing objects. This way even if we don't remove
existing object on BeginObject it will be always removed while
committing.

Fixes https://github.com/storj/storj/issues/5373

Change-Id: Idc334bf5cc785d2f559af96e92c3de6d82ca58ba
2022-12-09 13:45:03 +00:00
Erik van Velzen
3cf7ebfad0
satellite/metabase/rangedloop: database abstraction (#5337)
Add an abstraction rangedloop.SegmentProvider to fetch chunks of
segments from the metainfo database in parallel.

Part of https://github.com/storj/storj/issues/5223

Change-Id: Ife26467ea0c3be550bde0b05464ef1db62dd4d2a
2022-12-09 03:01:07 +01:00