This change fixes the access of unset segments and keys on the reservoir
when the reservoir size is less than the max OR the number of sampled
segments is smaller than the reservoir size. It does so by tucking away
the segments and keys behind methods that return properly sized slices
into the segments/keys arrays.
It also fixes a bug in the housekeeping for the internal index variable
that holds onto how many items in the array have been populated. As part
of this fix, it changes the type of index to int8, which reduces the
size of the reservoir struct by 8 bytes.
The tests have been updated to provide better coverage for this case.
Change-Id: I3ceb17b692fe456fc4c1ca5d67d35c96aeb0a169
While researching logs from a large set of audits, I noticed that nearly
all of them had streamIDs starting with 0 or 1. This seemed very odd,
because streamIDs are supposed to be pretty much entirely random, and
every hex digit from 0-f should have been represented with roughly equal
frequency.
It turned out that our A-Chao implementation of reservoir sampling is
flawed. As far as we can tell, so is the Wikipedia implementation. No
one has yet reviewed the original 1982 paper by Dr. Chao in enough
detail to know where the error originated, but we do know that we have
been auditing segments near the beginning of the segment loop (low
streamIDs) far more often than segments near the end of the segment loop
(high streamIDs).
This change uses an algorithm Wikipedia calls "A-Res" instead, and adds
a test to check for that sort of bias creeping back in somehow. A-Res
will be slightly slower than A-Chao, because of a few extra steps that
need to be done, but it does appear to be selecting items uniformly.
Change-Id: I45eba4c522bafc729cebe2aab6f3fe65cd6336be
Two things were done to optimize audit observer:
* monik call was removed as we have different way to track it
* no new allocation for audit.Segment struct inside observer
Benchmark against 'main':
name old time/op new time/op delta
RemoteSegment/Cockroach/multiple_segments-8 5.85µs ± 1% 0.74µs ± 4% -87.28% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
RemoteSegment/Cockroach/multiple_segments-8 2.72kB ± 0% 0.00kB ~ (p=0.079 n=4+5)
name old allocs/op new allocs/op delta
RemoteSegment/Cockroach/multiple_segments-8 50.0 ± 0% 0.0 -100.00% (p=0.008 n=5+5)
Change-Id: Ib973e48782bad4346eee1cd5aee77f0a50f69258