As the tables that get cleaned up by this job get a lot of inserts and deletes over the course of a day, the autovacuum process on PostgreSQL struggles fairly easily/quickly.
Due to its limitation, it can only delete 180,000,000 tuples in one go, before it has to rescan the entire table/index.
With the current load, the most busy satellites accumulate about 1,000,000,000 tuples per day (consumed_serials). With our current 24h interval that results in ~6-7 scans, slowing the entire database down for a quite long time.
This PR reduces the interval to 4 hours, which under a constant load, results in less than 180,000,000 entries per run.
That way, we do not scan twice for only a small gain over said amount. Reducing the interval further would also increase the DB load unnecessary, as each run scans the entire tables at least once.
For future reference, we might need to adjust the interval, if the load is significantly changing.
Change-Id: I18fdd45d93d468cff126e719c8380c29a49f43dd
also remove the continuation support from the queue, otherwise
we may end up sequential scanning the entire table to get
a few rows at the end.
then, in the core, instead of looping both to get a big enough
batch inside of the queue, as well as outside of it to ensure
we consume the whole queue, just get a single batch at a time.
also, make the queue size configurable because we'll need to
do some tuning in production.
Change-Id: If1a997c6012898056ace89366a847c4cb141a025
WHAT:
1. updated verification page URL in config
2. added list of partnered satellites to config
3. added logic for satellites dropdown on new signup/login pages
WHY:
1. signup/login flow was reworked in tardigrade.io repo (iframe removed, new pages etc.)
2. new config flag was added to check if satellite name matches at least one member of partnered satellites list to redirect user to verification page
3. new pages will have dropdown with partnered satellites list. Appropriate logic was added.
Change-Id: I33399ab66ca31f07b297a433f6b1f41da4cb6e66
The current satellite config lock code relies on bash scripts and
gnu diff, it must be run as root and hence it typically requires
docker. The old version will be removed at a later date..
I tried for several hours to run directly against cmdSetup() in
cmd/satellite/main.go, to avoid the ctx.Compile() call. I had no
luck.
Change-Id: I0a4888421e743b436d32b6af69d04759d7816751
Also distinguish the purpose for selecting nodes to avoid potential
confusion, what should allow caching and what shouldn't.
Change-Id: Iee2451c1f10d0f1c81feb1641507400d89918d61
Add a flag that allows us to easily switch disqualification from
suspension mode on or off. A node will only be disqualified from
suspension mode if it has been suspended for longer than the grace
period AND the SuspensionDQEnabled flag is true.
Change-Id: I9e67caa727183cd52ab2042b0a370a1bcaebe792
Previously we are using tracing.sampled to be the switch for turning on/off tracing.
However we would like to separate sampling rate from being the switch,
so we can set sampling rate to be 0 but still intialize tracing for
satellite and storagenodes
Change-Id: I27e6ba25ea6f6b612b4e1a57cf1301889ded41ec
Added a per IP rate limiter to the console web.
Cleaned up password check to leak less bcyrpt info.
Change-Id: I3c882978bd8de3ee9428cb6434a41ab2fc405fb2
If a node is suspended and receives an unknown or failing audit,
disqualify them if the grace period (default 1w in production) has
passed.
Migrate the nodes table so any node that is currently suspended gets
unsuspended when the satellite starts up.
Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82
* Delete expired segments in expired segments service using metainfo
loop
* Add test to verify expired segments service deletes expired segments
* Ignore expired segments in checker observer
* Modify checker tests to verify that expired segments are ignored
* Ignore expired segments in segment repairer and drop from repair queue
* Add repair test to verify that a segment that expires after being
added to the repair queue is ignored and dropped from the repair queue
Change-Id: Ib2b0934db525fef58325583d2a7ca859b88ea60d
Alpha=1 and beta=0 are the expected first values for any alpha/beta
reputation system we are using in the codebase. So we are removing the
configurability of these values.
Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3
We want to increase our throughput for downtime estimation. This commit
adds the ability to reach out to multiple nodes concurrently for downtime
estimation. The number of concurrent routines is determined by a new config
flag, EstimationConcurrencyLimit. It also increases the default
EstimationBatchSize to 1000.
Change-Id: I800ce7ec1035885afa194c3c3f64eedd4f6f61eb
BeginObject response
We want to control inline segment size and segment size on satellite
side. We need to return such information to uplink like with redundancy
scheme.
Change-Id: If04b0a45a2757a01c0cc046432c115f475e9323c
Add flag to satellite repairer, "InMemoryRepair" that allows the
satellite to decide whether to download the entire segment being
repaired into memory (this is what the satellite already does), or to
download it into temporary files on disk that will be read from in the
upload phase of repair.
This should help with handling high repair traffic on satellites that
cannot afford to spend 64mb of memory per repair worker.
Updates tests to test repair for both in memory and to disk.
Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194
Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are. We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP stackdriver has support for continuous profiling so lets set that up and see if it is helpful to gather more data.
This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console.
To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console
The current implementation assumes the workload is running in GKE, however if we find if useful we can add support to enable this from anywhere. But for simplicity, its configured this way assuming the main goal is to enable in production systems.
Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
This new repair timeout (configured as TotalTimeout) will include both
the time to download pieces and the time to upload pieces, as well as
the time to pop the segment from the repair queue.
This is a move from Github PR #3645.
Change-Id: I47d618f57285845d8473fcd285f7d9be9b4318c8
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue rather than database driven.
For more details, see the added fast billing changes blueprint.
It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.
Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35
rationale: if GC kills the satellite, it would be nice to make
it through a repair checker sweep first
Change-Id: Id56171dc8e13940cfb6481e36a910bad077a01ed
rate
Graceful exit is very slow at the moment. Over the last couple days we
increase the batch size on Stefans satellite to 1000 but as a side
effect the error rate was increased. With a batch size of 500 the error
rate looks stable.
This PR will increase the default to batch size to 300. Graceful exit
will still be painful slow but at least it will be a bit faster. At the
same time this PR also increases the number of errors we tolerate. We
don't want to DQ slow storage nodes just because they didn't finish all
300 transfers in time. We want to give them more retries.
Change-Id: I92e3f99e116d4988457d8b902a88e85ed1bcc1a7
Currently SNs report their free disk space once per hour. If a node
becomes full, it has to wait until the next contact cycle begins to
report; all the while receiving and failing upload requests. By increasing
the minimum required disk space, we can give the storage nodes more time
to report their space before the completely fill up. This change goes
hand-in-hand with another change we want to implement: trigger capacity
report on SN immediately upon falling below threshold.
Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27
Control Panel allows to control different chores and services.
Currently this adds controlling of cycles.
Change-Id: I734f1676b2a0d883b8f5ba937e93c45ac1a9ce21
Allow rate limit project cache to expire so we can make project level rate limit changes without restarting the satellite process.
Change-Id: I159ea22edff5de7cbfcd13bfe70898dcef770e42
For the last few month we had no issues with order submission. I would
call it stable and now it is time to risk a lower expire time. This will
increase the database performance on the satellite and it will reduce
the delay for billing.
The long term goal is 6h but for that step we need to change graceful
exit first. At the moment storage nodes would get disuqlaified for not
transfering alle pieces in less than 6 hours.
Change-Id: I421a2c2421c5374c4e706e2338f1c2161fedc14c
With this change RS configuration will be set on satellite. Uplink with
get RS values with BeginObject request and will use it. For backward
compatibility and to avoid super large change redundancy scheme stored
with bucket is not touched. This can be done in future.
Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff
Limits how many times metainfo APIs can be called per second by project ID. If limit is exceeded, the API will return Unauthorized/Too Many requests.
Limit per second and the size of the limiter cache per project are configurable, as well as whether the limiter is enabled.
Tests added/updated for the new rate_limit field in projects table.
Tests added for exceeding limits and disableing limiter.
Change-Id: Ic8ad102de3b690a475809d4f684156d5715f20fa
live accounting used to be a cache to store writes before they are picked up during
the tally iteration, after which the cache is cleared. This created a window in which
users could potentially exceed the storage limit. This PR refactors live accounting to
hold current estimations of space used per project. This should also reduce DB load
since we no longer need to query the satellite DB when checking space used for limiting.
The mechanism by which the new live accounting system works is as follows:
During the upload of any segment, the size of that segment is added to its respective
project total in live accounting. At the beginning of the tally iteration we record
the current values in live accounting as `initialLiveTotals`. At the end of the tally
iteration we again record the current totals in live accounting as `latestLiveTotals`.
The metainfo loop observer in tally allows us to get the project totals from what it
observed in metainfo DB which are stored in `tallyProjectTotals`. However, for any
particular segment uploaded during the metainfo loop, the observer may or may not
have seen it. Thus, we take half of the difference between `latestLiveTotals` and
`initialLiveTotals`, and add that to the total that was found during tally and set that
as the new live accounting total.
Initially, live accounting was storing the total stored amount across all nodes rather than
the segment size, which is inconsistent with how we record amounts stored in the project
accounting DB, so we have refactored live accounting to record segment size
Change-Id: Ie48bfdef453428fcdc180b2d781a69d58fd927fb
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.
the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.
after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.
since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.
Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
With the new storage node downtime tracking feature, we need remove current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and
UptimeReputationDQ. This is the first step of removing the uptime
reputation columns from satellitedb
Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36
We don't want slowloris nodes to be able to indefinitely block
up the satellite, so add a timeout. Some monitoring inspection
showed the largest success times being on the order of 30s, so
a 1min timeout should be sufficient to kill the misbehaving nodes.
Change-Id: I5e2c3480a15f6304e37262d0a4d30d07eae99bb3
As per discussed we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting and burst limiting if need be in the future.
The default for this rate limiting is still no limits so it stays the
same as our previous functionality.
Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.
https://storjlabs.atlassian.net/browse/V3-3357
Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
* satellite/console: Add X-Frame-Options and Referrer-Policy security headers
* Update to use CSP instead of XFO and include tardigrade.io
* Make FrameAncestors a config option
* Update satellite-config lock
* Make help text for FrameAncestors better
* add overall failure percentage check and inactive time frame check before sending a response to sno
* update comment
* delete node from transfer queue if it has been inactive for too long
* fix linting error
* add test config value
* fix nil pointer
* add config value into testplanet
* add unit test for overall failure threshold
* move timeframe threshold to chore
* update protolock
* add chore test
* add per peiece failure count logic
* change config name from EndpointMaxFailures to MaxFailuresPerPiece
* address comments
* fix linting error
* add error handling for no row returned from progress table
* fix test for graceful exit chore on storagenode
* fix typo InActive -> Inactive
* improve readability for failure threshold calculation
* update config lock
* change error handling for GetProgress in graceful exit endpoint on the satellite side
* return proper rpc error in endpoint
* add check in chore test for checking finish timestamp and queue
* add metrics counter and chore
* updates metrics observer interval release default and dev default to 15min
* add more specific check for remote pointers
* add Counter field to metrics chore, add counter tests
* rm redundant ObjectCount suffix
* make pointer check easier to read
* change metrics.Config.Interval to ChoreInterval
* rm unneeded var
* fix comment
* update satellite config lock
* set up redis support in live accounting
* move live.Service interface into accounting package and rename to Cache, pass into satellite
* refactor Cache to store one int64 total, add IncrBy method to redis client implementation
* add monkit tracing to live accounting
What:
cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:
cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia
Please describe the tests:
• internal/testplanet/planet_test.go:
TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:
TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:
TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
What: Change cmd/uplink to use scopes
It moves the fields that will be subsumed by scopes into an explicit legacy section and hides their configuration flags.
Why: So that it can read scopes in from files and stuff
* Added batch update stats for recordAuditSuccessStatus
* Added batch update stats to recordAuditFailStatus
* added configurable batch size
* build individual update/delete statements so the statements can be batched into 1 call to the DB
* notified #config-changes channel and ran make update-satellite-config-lock
* updated tests to use batch update stats
* Added a gc package at satellite/gc, which contains the gc.Service, which runs garbage collection integrated with the metainfoloop, and the gc PieceTracker, which implements the metainfo loop Observer interface and stores all of the filters (about which pieces are good) for each node.
* Added a gc config located at satellite/gc/service.go (loop disabled by default in release)
* Creates bloom filters with pieces to be retained inside the metainfo loop
* Sends RetainRequests (or filters with good piece ids) to all storage nodes.
* pkg/datarepair: Add test to check num upload pieces
Add a new test for ensuring the number of pieces that the repair process
upload when a segment is injured.
* satellite/orders: Don't create "put order limits" over total
Repair must not create "put order limits" more than the total count.
* pkg/datarepair: Update upload repair pieces test
Update the test which checks the number of pieces which are uploaded
during a repair for using the same excess over the success threshold
value than the implementation.
* satellites/orders: Limit repair put order for not being total
Limit the number of put orders to be used by repair for only uploading
pieces to a % excess over the successful threshold.
* pkg/datarepair: Change DataRepair test to pass again
Make some changes in the DataRepair test to make pass again after the
repair upload repaired pieces only until a % excess over success
threshold.
Also update the steps description of the DataRepair test after it has been
changed, to match on what's now, besides to leave it more generic for
avoiding having to update it on minimal future refactorings.
* satellite: Make repair excess optimal threshold configurable
Add a new configuration parameter to the satellite for being able to
configure the percentage excess over the optimal threshold, used for
determining how many pieces should be repaired/uploaded, rather than
having the value hard coded.
* repairer: Add configurable param to segments/repairer
Add a new parameters to the segment/repairer to calculate the maximum
number of excess nodes, based on the optimal threshold, that repaired
pieces can be uploaded.
This new parameter has been added for not returning more nodes than the
number of upload orders for data repair satellite service calculate for
repairing pieces.
* pkg/storage/ec: Update log message in clien.Repair
* satellite: Update configuration lock file
* pkg/process/metrics: add an instance prefix
the distinction between which satellite is sending which
data should go in the instance field, not the suffix or application
fields. (un)fortunately, the instance id is deliberately not
configurable because we don't want it to be easy to accidentally
have multiple applications collide with the same instance id.
so we're currently stuffing the human readable instance in the
suffix. :(
perhaps a reasonable tradeoff would be an optional instance
prefix that allows operators to put their domain name in
the instance
Change-Id: I6fcc8498be908c5740439cc00f77474ad151febd
* linting
Change-Id: I9f9a44fa9a2634ef5e4f89548d42d57ce9e4450e
* add voucher service on storage node
* config field tag syntax, go routines for requests
* hook up voucher service in storagenode/peer.go
* add voucher config to testplanet
* add voucher config to testplanet
* add voucher response status INVALID, ACCEPTED, REJECTED
* add a test for vouchers service
* handle no row from GetValid, test it
* add trust pool to voucher service
* use trusted list to get satellites
* verify vouchers upon receipt
* test VerifyVoucher
* set to only listen on 127.0.0.1, move static files to same location, better template handling
* handle error
* fix path in storj-sim
* revert template handling changes
* code shouldn't panic on invalid tempalte
* do not rewrite once writing has started
* write correct error code
* use filepath for path handling
* revert change
* fix
* fix mod tidy
* use correct error code for not found, avoid infinite loop on failure
* Set up new port 8090 for in offers
Clean up commented code
Rename offers to offersweb
Remove unused code
Add todos for adding front-end templates
Add middleware for only allow local access
Add comment
Fix linting error
Remove commented code
Update storj-sim
Check request IP against Host IP
Use net pakcage to retrieve IP address
Rename service to marketing
* Add wrapper for all errors
* fix conflicts
* update the config file
* fix linting error
* remove unused packages
* remove global runtime var and add flag to storj-sim for mar static dir
* remove debugging lines
* add new config for test data and check if static dir flag is set before passing to mux
* change 'console' to 'marketing' for test data config
* fix linting errors
* update config flag
* Trigger Jenkins
* Trigger CLA
* reputation: Add configuration parameters
Add the configuration parameters which will be used by the algorithm
which will calculate the storage node reputation.
Because the reputation calculation is based on audit and uptime check
results some configuration parameters are in pkg/audit, others in
pkg/discovery and other in the satellite which will combine the both
reputation results to obtain the storage node reputation for repair and
uplink.
* satellite-config: Refresh lock file with new params
Refresh the Satellite configuration yaml lock file with the new
parameters added in this branch.
What: this will make it so release binaries default to whatever-release instead of whatever-dev in metrics collection
Why: So we can monitor release binaries with default configuration without getting drowned out by dev binaries
* set up voucher service skeleton, basic test
* add VetNode db method
* basic test for VetNode
* encode and sign voucher functions
* fill out and sign vouchers
* test pass/fail voucher request
* match EncodeVoucher to other Encode functions
* add last_ip field to dbx model node, generate dbx
* add last_ip to node proto, generate pb
* migrate
* resolve address in transport.DialNode, update lastIp in cache.UpdateAddress
* use net.SplitHostPort to isolate host address from port
* define DistinctIPs flag
* add test for GetIP
* select last_ip when querying for nodes
* if distinctIPs flag == true, query for nodes with distinct IPs
* some basic tests
* change last_ip to field 14 in proto
* remove comments
* check err
* change distinctIPs to distinctIP
* exclude IPs from newNodes in query for reputable nodes
* add index on last_ip
* only add to excludedIPs if flag is true
* test half new nodes returns distinct IPs
* fix alignment
* add test
* rework ip filter query, add retry logic, add switch for database driver
* add retry to SelectNewNodes
* change discovery intervals so IPs don't get overwritten
* remove TestGetIP
* edit updating node stats in test
* split exclude into nodeIDs and IPs
* separate non-distinct IP query into other function
* trigger checks
* remove else block
Ran into difficulties trying to find the ideal solution for sharing
these counts between multiple satellite servers, so for now this is a
dumb solution storing recent space-usage changes in a big dumb in-memory
map with a big dumb lock around it. The interface used, though, should
allow us to swap out the implementation without much difficulty
elsewhere once we know what we want it to be.
* scripts: check for satellite configuration changes
Create a simple script that checks if the satellite configuration has
differed with the last changes.
* Makefile: add target to check satellite cfg changes
Add a Makefile target which executes the script that checks if the last
changes has made a change in the satellite configuration.
* ci: add satellite cfg check on integration stage
* FIXUP: use releae defaults rather than development ones
* FIXUP: show the message when config differs & some cleanups
* scripts: add script to update the satellite cfg lock
Add a script for allowing to update the satellite configuration lock
file and add Makefile target to run it.
* scripts/testsdata: update satellite cfg lock
Update the satellite configuration lock file with the last changes that
satellite has suffered upstream.