This change implements the ranged loop observer to replace the audit
chore that builds the audit queue.
The strategy employed by this change is to use a collector for each
segment range to build separate per-node segment reservoirs that are
then merge them during the join step.
In previous observer migrations, there were only a handful of tests so
the strategy was to duplicate them. In this package, there are dozens
of tests that utilize the chore. To reduce code churn and maintenance
burden until the chore is removed, this change introduces a helper that
runs tests under both the chore and observer, providing a pair of
functions that can be used to pause or run the queueing function.
https://github.com/storj/storj/issues/5232
Change-Id: I8bb4b4e55cf98b1aac9f26307e3a9a355cb3f506
The tests are forked from the chore tests with slight adaptations for
being run against the ranged loop. I also moved a benchmark for the
database from chore_test.go to db_test.go.
The pathcollector is reused as a rangedloop.Partial.
https://github.com/storj/storj/issues/5234
Change-Id: I56182031d133812a9f4d4a433c01b9150af39f31
This change creates a new independent process, the 'auditor', comparable
to the repairer, gc, and api processes. This will allow auditors to be
scaled independently of the core.
Refs: https://github.com/storj/storj/issues/5251
Change-Id: I8a29eeb0a6e35753dfa0eab5c1246048065d1e91
Now that all the reverification changes have been made and the old code
is out of the way, this commit renames the new things back to the old
names. Mostly, this involves renaming "newContainment" to "containment"
or "NewContainment" to "Containment", but there are a few other renames
that have been promised and are carried out here.
Refs: https://github.com/storj/storj/issues/5230
Change-Id: I34e2b857ea338acbb8421cdac18b17f2974f233c
Now that we are doing scalable piecewise reverifications, the code for
handling the old way of doing things (containment, pending audits,
reporting, testing) can now be removed.
Refs: https://github.com/storj/storj/issues/5230
Change-Id: Ief1a75f423eff682e8f3d57804e343b3409a6631
Here we add a worker class comparable to audit.Worker, which will be
responsible for pulling items off of the reverification queue and
calling reverifier.ReverifyPiece on them.
Note that piecewise reverification audits (which this will control) are
not yet being done. That is, nothing is being added to the
reverification queue at this point.
Refs: https://github.com/storj/storj/issues/5251
Change-Id: I94e28830e27caa49f2c8bd4a2336533e187ab69c
The Reporter is responsible for processing results from auditing
operations, logging the results, disqualifying nodes that reached
the maximum reverification count, and passing the results on to
the reputation system.
In this commit, we extend the Reporter so that it knows how to process
the results of piecewise reverification audits.
We also change most reporter-related tests so that reverifications
happen as piecewise reverification audits, exercising the new code.
Note that piecewise reverification audits are not yet being done outside
of tests. In a later commit, we will switch from doing segmentwise
reverifications to piecewise reverifications, as part of the
audit-scaling effort.
Refs: https://github.com/storj/storj/issues/5230
Change-Id: I9438164ce1ea4d9a1790d18d0e1046a8eb04d8e9
NewContainment will replace Containment later in this commit chain, but
for now it is not yet being used.
NewContainment will allow a node to be contained for multiple pending
reverify jobs at a time. It is implemented by way of the reverify queue.
Refs: https://github.com/storj/storj/issues/5231
Change-Id: I126eda0b3dfc4710a88fe4a5f41780618ec19101
Adding a new worker comparable to Verifier, called Reverifier; as the
name suggests, it will be used for reverifications, whereas Verifier
will be used for verifications.
This allows distinct logging from the two classes, plus we can add some
configuration that is specific to the Reverifier.
There is a slight modification to GetNextJob that goes along with this.
This should have no impact on operational concerns.
Refs: https://github.com/storj/storj/issues/5251
Change-Id: Ie60d2d833bc5db8660bb463dd93c764bb40fc49c
As part of the effort of splitting out the auditor workers to their own
process, we are transitioning the communication between the auditor
chore and the verification workers to a queue implemented in the
database, rather than the sequence of in-memory queues we used to use.
This logical database is safely partitionable from the rest of
satelliteDB.
Refs: https://github.com/storj/storj/issues/5251
Change-Id: I6cd31ac5265423271fbafe6127a86172c5cb53dc
Add a new chore to periodically insert nodes who are offline and
have not gotten an offline email in a certain amount of time into node
events
Change-Id: I658b385bb777b0240c98092946a93d65bee94abc
Create NodeEvents Chore on satellite core to read nodeevents DB and
notify node operators on node events. The chore sends notifications
grouped by email and event type: it selects the oldest entry in
nodeevents.DB and also any other event with the same email and event
type no matter how old it is. The oldest entry of a group must exist for
a minimum amount of time before that group can be selected, however.
This minimum amount of time is a configurable value:
--node-events.selection-wait-period. This wait period allows us to
combine events of the same time and same email address into a singular
email.
Change-Id: I8b444aa324d2dae265cc27d9e9e85faef79195d8
Add nodeevents.DB to satellite overlay service so we can insert node
events into the nodeevents DB.
Change-Id: I642c0ccc9941ecdb08cb22d5c8cf701959a55156
since amount of objects is growing and looping through all of them
starts taking lot of time, we are switching for SQL query to do it
in chunks of tallies per bucket. 2nd part of issue fix.
Closes https://github.com/storj/team-metainfo/issues/125
Change-Id: Ia26bcac0a7e2c6503df9ebbf4817a636841d3284
We want to send emails to SNOs. Node status changes go through the
overlay service, so it's a good place to add the mail service.
Add the mailservice.Service, satellite address, and satellite name to
overlay service. Also add feature flag --overlay.send-node-emails
Change-Id: I3bd2cb3bf22f9724954ce2374f8b651b902b3a24
In an effort to distribute load on the reputation database, the
reputation write cache scheduled nodes to be written at a time offset by
the local nodeID. The idea was that no two repair workers would have the
same nodeID, so they would not tend to write to the same row at the same
time.
Instead, since all satellite processes share the same satellite ID
(duh), this caused _all_ workers to try and write to the same row at the
same time _always_. This was not ideal.
This change uses a random number instead of the satellite ID. The random
number is sourced from the number of nanoseconds since the Unix epoch.
As long as workers are not started at the exact same nanosecond, they
ought to get well-distributed offsets.
Change-Id: I149bdaa6ca1ee6043cfedcf1489dd9d3e3c7a163
Currently we have a significant number of tallies that need to be
deleted together. Add a limit (by default 10k) to how many will
be deleted at the same time.
Change-Id: If530383f19b4d3bb83ed5fe956610a2e52f130a1
We want to remind unverified users to verify their emails:
once after 24 hours has passed and again after 5 days has passed.
Add mailservice.Service to satellite core because it is needed by the
chore for sending emails. To add the mailservice.Service to the core,
we create a helper function in satellite/peer.go to avoid duplicating
the code in both api.go and core.go. In addition to the chore, this
change adds methods to users.DB to get unverified users in need of
reminder.
Change-Id: I4e515bdf43f922788b4f965b2efb34fa32288bd1
This will apply an appropriate "subsystem" label to goroutines which are
part of the core, api, repairer, admin, or gc subsystems.
It will also label goroutines whose job it is to watch for slow shutdown
of lifecycle groups (there are a lot of these).
Finally, this will also label goroutines whose job it is to wait on the
toplevel errgroup of a subsystem.
Change-Id: I560b5fff4a0101300d6c9a67609c2d80d7424486
This sets the corresponding _numeric columns to be NOT NULL (it has been
verified manually that there are no more NULL _numeric values on any
known satellites, and it should be impossible with current code to get
new NULL values in the _numeric columns.
We can't drop the _gob columns immediately, as there will still be code
running that expects them, but once this version is deployed we can
finally drop them and be totally done with this crazy 5-step migration.
Change-Id: I518302528d972090d56b3eedc815656610ac8e73
All code on known satellites at this moment in time should know how to
populate and use the new numeric columns on the
stripecoinpayments_tx_conversion_rates and coinpayments_transactions
tables in the satellite db. However, there are still gob-encoded
big.Float values in the database from before these columns existed. To
get rid of those values, so that we can excise the gob-decoding code
from the relevant sections, however, we need something to read the gob
bytestrings and convert them to numeric values, a few at a time, until
they're all gone.
To accomplish that, this change adds two chores to be run in the
satellite core process- one for the coinpayments_transactions table, and
one for the stripecoinpayments_tx_conversion_rates table. They should
run relatively infrequently, so that we do not impose any undue load on
processing resources or the db.
Both of these chores work without using explicit sql transactions, but
should still be concurrent-safe, since they work by way of
compare-and-swap type operations.
If the satellite core process needs to be restarted, both of these
chores will start scanning for migrateable rows from the beginning of
the id space again. This is not ideal, but shouldn't be a problem (as
far as I can tell, there are only a few thousand rows at most in either
of these tables on any production satellite).
Change-Id: I733b7cd96760d506a1cf52735f598c6c3aa19735
In addition to upgrading the storj.io/common library, this change
moves off the TCPConnector in favor of the HybridConnector per
the deprecation warning.
Change-Id: I7e7e1e7568e8b95e4a99ad9caa158a799e68e1e3
The main motivation is to wrap the bucket DB and metainfo DB, so we
could check if a bucket is empty before applying geofencing config.
Change-Id: I8bac21555e01d51a663fb557bc1acfc8106bc2e1
Removes database tables and functionality related to our custom
coupon implementation because it has been superseded by the Stripe
coupon and promo code system. Requires implementations of the
payments Invoices interface to return coupon usages along with
invoices.
Change-Id: Iac52d2ff64afca8cc4dbb2d1f20e6ad4b39ddfde
We made decision to avoid satellite shutdown when segment loop
will return error. Loop still can reeturn error but it will be logged
and we will make monitoring/alert around that error.
Change-Id: I6aa8e284406edf644a09d6b1fe00c3155c5430c9
gracefulexit
With the effort to move audit related data into reputation store, this
PR updates gracefulexit endpoint to use reputation service to get a
node's audit score
Change-Id: Iad93ea689ad67ff9c57c7be16687e21e715fab7a
package in audit
This PR implements reputation store and replace overlay in audit service
to use such store for storing node's audit stats.
In order to keep the changeset smaller, most of the changes in this PR is for copying audit logic in overlay to
reputation package. In a following PR, the duplicating code will be
removed from overlay.
Change-Id: I16c12494a0970f44c422b26cf603c1dc489e5bc1
Bucket tally calculation will be removed from metaloop and will
use metabase objects iterator directly.
At the moment only bucket tally needs objects so it make no sense
to implement separate objects loop.
Change-Id: Iee60059fc8b9a1bf64d01cafe9659b69b0e27eb1
Current tally is calculating storage both for buckets and
storage nodes. This change is moving nodes storage
calculation to separate service that will be using
segment loop.
Change-Id: I9e68bfa0bc751c82ff738c71ca58d311f257bd8d