Commit Graph

13 Commits

Author SHA1 Message Date
paul cannon
231c783698 satellite/audit: fix reservoir sampling bias
While researching logs from a large set of audits, I noticed that nearly
all of them had streamIDs starting with 0 or 1. This seemed very odd,
because streamIDs are supposed to be pretty much entirely random, and
every hex digit from 0-f should have been represented with roughly equal
frequency.

It turned out that our A-Chao implementation of reservoir sampling is
flawed. As far as we can tell, so is the Wikipedia implementation. No
one has yet reviewed the original 1982 paper by Dr. Chao in enough
detail to know where the error originated, but we do know that we have
been auditing segments near the beginning of the segment loop (low
streamIDs) far more often than segments near the end of the segment loop
(high streamIDs).

This change uses an algorithm Wikipedia calls "A-Res" instead, and adds
a test to check for that sort of bias creeping back in somehow. A-Res
will be slightly slower than A-Chao, because of a few extra steps that
need to be done, but it does appear to be selecting items uniformly.

Change-Id: I45eba4c522bafc729cebe2aab6f3fe65cd6336be
2022-12-09 18:16:58 -06:00
Michal Niewrzal
e37435602f satellite/audit: optimize loop observer
Two things were done to optimize audit observer:
* monik call was removed as we have different way to track it
* no new allocation for audit.Segment struct inside observer

Benchmark against 'main':
name                                         old time/op    new time/op    delta
RemoteSegment/Cockroach/multiple_segments-8    5.85µs ± 1%    0.74µs ± 4%   -87.28%  (p=0.008 n=5+5)

name                                         old alloc/op   new alloc/op   delta
RemoteSegment/Cockroach/multiple_segments-8    2.72kB ± 0%    0.00kB           ~     (p=0.079 n=4+5)

name                                         old allocs/op  new allocs/op  delta
RemoteSegment/Cockroach/multiple_segments-8      50.0 ± 0%       0.0       -100.00%  (p=0.008 n=5+5)

Change-Id: Ib973e48782bad4346eee1cd5aee77f0a50f69258
2022-10-02 22:24:37 +00:00
dlamarmorgan
b3cea3d1b6 satellite/audit: account for piece size during audit reservoir sampling
Treat the piece size as a weight, and perform weighted reservoir sampling as given in Algorithm A-Chao (https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_A-Chao)

Change-Id: I299d0026d9e02d03b3d2130b0f32192928e6e326
2021-12-01 18:17:52 +00:00
Michał Niewrzał
70e6cdfd06 satellite/audit: move to segmentloop
Change-Id: I10e63a1e4b6b62f5cd3098f5922ad3de1ec5af51
2021-06-28 11:32:00 +00:00
Jeff Wendling
944bceabcd satellite/audit: fix reservoir sampling bias
Change-Id: Icc522fd86538b8182a1b7d42c1588c32a257acaf
2021-06-10 13:47:22 +03:00
Egon Elbre
4c9ed64f75 satellite/metabase/metaloop: move loop under metabase
Currently the loop handling is heavily related to the metabase rather
than metainfo.

metainfo over time has become related to the "public API" for accessing
the metabase data.

Currently updates monkit.lock, because monkit monitoring does not handle
ScopeNamed correctly. Needs a followup change to monitoring check.

Change-Id: Ie50519991d718dfb872ec9a0176a82e732c97584
2021-04-22 12:58:09 +03:00
Egon Elbre
267506bb20 satellite/metabase: move package one level higher
metabase has become a central concept and it's more suitable for it to
be directly nested under satellite rather than being part of metainfo.

metainfo is going to be the "endpoint" logic for handling requests.

Change-Id: I53770d6761ac1e9a1283b5aa68f471b21e784198
2021-04-21 15:54:22 +03:00
Egon Elbre
f19ef4afe5 satellite/metainfo/metaloop: move loop to a separate package
Change-Id: I94c931a27c1af6062185ec62688624ec02050f11
2021-03-23 15:37:34 +00:00
Kaloyan Raev
9aa61245d0 satellite/audits: migrate to metabase
Change-Id: I480c941820c5b0bd3af0539d92b548189211acb2
2020-12-17 14:38:48 +02:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Natalie Villasana
6d363fb756
satellite/audit: create the audit queue, chore, and worker (#2888) 2019-09-05 11:40:52 -04:00
Natalie Villasana
243cedb628
satellite/audit: implement reservoir struct and RemoteSegment observer method (#2744) 2019-08-21 11:49:27 -04:00