storj/satellite
paul cannon 231c783698 satellite/audit: fix reservoir sampling bias
While researching logs from a large set of audits, I noticed that nearly
all of them had streamIDs starting with 0 or 1. This seemed very odd,
because streamIDs are supposed to be pretty much entirely random, and
every hex digit from 0-f should have been represented with roughly equal
frequency.

It turned out that our A-Chao implementation of reservoir sampling is
flawed. As far as we can tell, so is the Wikipedia implementation. No
one has yet reviewed the original 1982 paper by Dr. Chao in enough
detail to know where the error originated, but we do know that we have
been auditing segments near the beginning of the segment loop (low
streamIDs) far more often than segments near the end of the segment loop
(high streamIDs).

This change uses an algorithm Wikipedia calls "A-Res" instead, and adds
a test to check for that sort of bias creeping back in somehow. A-Res
will be slightly slower than A-Chao, because of a few extra steps that
need to be done, but it does appear to be selecting items uniformly.

Change-Id: I45eba4c522bafc729cebe2aab6f3fe65cd6336be
2022-12-09 18:16:58 -06:00
abtesting console/abTesting: add support for AB testing 2022-10-27 10:57:12 +00:00
accounting satellite/accounting/tally: fix looping over all buckets 2022-12-07 10:32:23 +00:00
admin satellite/{admin,ui}: implement changes for oauth2 proxy 2022-12-06 09:44:04 +00:00
analytics satellite/analytics: Added analytics to track project members addition/deletion (#5340) 2022-11-29 14:56:03 -08:00
attribution {cmd/satellite/reports, satellite/attribution}: type and variable name adjustments 2022-04-26 20:12:38 +00:00
audit satellite/audit: fix reservoir sampling bias 2022-12-09 18:16:58 -06:00
buckets satellite/accounting/tally: fix looping over all buckets 2022-12-07 10:32:23 +00:00
compensation all: reformat comments as required by gofmt 1.19 2022-08-10 18:24:55 +00:00
console satellite/payments/stripecoinpayments: add config for price overrides 2022-12-09 15:33:27 +00:00
contact satellite/contact: emit evenkit events in case of node checkin 2022-11-14 10:38:58 +00:00
gc satellite/gc/sender: avoid sending BF to disqualified and exited nodes 2022-11-29 09:56:32 +00:00
geoip satellite/geoip: update node check-in to associate a country code 2021-11-10 16:44:41 +01:00
gracefulexit satellite/gracefulexit: observer cleanup 2022-11-05 12:10:44 +00:00
inspector {satellite/metabase, satellite/metainfo, satellite/inspector} : Use metabase.GetObjectLastCommitted instead metabase.GetObjectExactVersion 2022-09-08 07:27:22 +00:00
internalpb all: fix deprecated ioutil commands 2022-10-11 15:27:29 +00:00
mailservice satellite/consoleweb: fix flaky TestAuth tests 2022-08-04 19:06:07 +00:00
metabase satellite/metrics: provide a rangedloop observer 2022-12-09 12:04:39 -07:00
metainfo Revert "satellite/metainfo: enable metainfo.multiple-versions flag by default" 2022-12-07 19:43:20 +00:00
metrics satellite/metrics: provide a rangedloop observer 2022-12-09 12:04:39 -07:00
nodeapiversion satellite/nodeapiversion: new table for tracking node api usage 2020-07-09 15:02:25 +00:00
nodeevents satellite/nodeevents: validate emails before notifying 2022-12-06 09:59:45 -05:00
nodeselection/uploadselection satellite/repairer: handle excluded countries 2022-03-14 10:59:36 -04:00
nodestats satellite: return interval_end_time in DailyStorageUsage endpoint 2022-07-27 18:24:27 +00:00
oidc all: fix deprecated ioutil commands 2022-10-11 15:27:29 +00:00
orders satellite/orders: decrease order expiration time to 24hours 2022-11-21 14:52:32 +00:00
overlay satellite/overlay: add SetNodeContained() method 2022-12-01 12:43:40 +00:00
payments satellite/payments/stripecoinpayments: add config for price overrides 2022-12-09 15:33:27 +00:00
repair satellite/repair: Add flag to allow disabling reputation updates 2022-11-24 08:31:11 -05:00
reputation satelite/overlay: insert reputation events into node events 2022-11-02 18:32:20 +00:00
revocation satellite/satellitedb: move tests to their domains 2021-02-19 17:29:15 +02:00
rewards satellite/rewards: adding SeaweedFS to partners list (#4230) 2021-10-19 21:30:31 +02:00
satellitedb satellite/{satellitedb,audit}: add NewContainment 2022-12-07 18:03:37 +00:00
snopayouts all: fix error naming 2021-04-29 15:38:21 +03:00
admin.go satellite/payments/stripecoinpayments: add config for price overrides 2022-12-09 15:33:27 +00:00
api.go satellite/payments/stripecoinpayments: add config for price overrides 2022-12-09 15:33:27 +00:00
configlock_test.go all: fix deprecated ioutil commands 2022-10-11 15:27:29 +00:00
core.go satellite/payments/stripecoinpayments: add config for price overrides 2022-12-09 15:33:27 +00:00
gc-bf.go satellite/gc: Optionally run the GC bloomfilter process once, instead of in a loop 2022-11-01 18:19:40 +00:00
gc.go satellite/gc/sender: new service to send retain filters 2022-09-20 11:49:40 +00:00
peer.go satellite/{satellitedb,audit}: add NewContainment 2022-12-07 18:03:37 +00:00
rangedloop.go satellite/metadabase/rangedloop: stream affinity for test provider 2022-12-09 16:49:02 +00:00
repairer.go satellite/{satellitedb,audit}: add NewContainment 2022-12-07 18:03:37 +00:00