Currently Cockroach isn't performant for concurrent database setup and
tear-down. Instead of a single instance allow setting multiple potential
connection strings and let the tests pick one connection string
randomly.
This improves test duration by ~10 minutes.
While we are at significantly changing how pgtest works, introduce
helper PickPostgres and PickCockroach for selecting the database to
reduce code duplications in multiple places.
Change-Id: I8ad171d5c4c8a4fc081ec2ae9bdd0cc948a80619
In cases like the segment reaper script connecting to the metainfodb,
we don't want a db migration to happen automatically when we call
metainfo.NewStore. This adds MigrateToLatest method for postgreskv
and cockroackv, and calls MigrateToLatest in places where NewStore used
to create tables.
Change-Id: I682d0f26d609af0601dfdb32a24866cdf5d32a7e
There was a race in the test code for piece deleter, which made it
possible to broadcast on the condition variable before anyone was
waiting. This change fixes that and has Wait take a context so it times
out with the context.
Change-Id: Ia4f77a7b7d2287d5ab1d7ba541caeb1ba036dba3
A/B indicates that B is a subtest of A, however in this case they
represent a configuration of the test, not a subtest.
Change-Id: I64eed5d5bcb12759e54fe4b5373f8e88488e50f7
Added a per IP rate limiter to the console web.
Cleaned up password check to leak less bcyrpt info.
Change-Id: I3c882978bd8de3ee9428cb6434a41ab2fc405fb2
To improve delete performance, we want to process deletes asynchronously
once the message has been received from the satellite. This change makes
it so that storagenodes will send the delete request to a piece Deleter,
which will process a "best-effort" delete asynchronously and return a
success message to the satellite.
There is a configurable number of max delete workers and a max delete
queue size.
Change-Id: I016b68031f9065a9b09224f161b6783e18cf21e5
Update unknown_audit_reputation_alpha and unknown_audit_reputation_beta.
Add test to verify that BatchUpdateStats properly modifies unknown audit
alpha/beta
Change-Id: I0d5f9cac96a99f64905cf575b772402db0756a9d
If a node is suspended and receives an unknown or failing audit,
disqualify them if the grace period (default 1w in production) has
passed.
Migrate the nodes table so any node that is currently suspended gets
unsuspended when the satellite starts up.
Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82
Currently it was possible that PopAll returns 1010 items, then
makes one RPC call with 1000 items, then RPC call 10 items. Meanwhile,
there have been added 500 new items added to the queue.
This change ensures that we pull items from the queue early and
try to make rpc batches as large as possible.
Change-Id: I1a30dde9164c2ff7b90c906a9544593c4f1cf0e9
This reverts commit 105dc7acc6.
Reason for revert: Recent changes to the Postgres query plan seems to want to use this index now. Reverting until we have time to analyze what's happening.
Change-Id: I74b4b5a8f15c3850d8a958a29f51dbc80e7c282c
* Delete expired segments in expired segments service using metainfo
loop
* Add test to verify expired segments service deletes expired segments
* Ignore expired segments in checker observer
* Modify checker tests to verify that expired segments are ignored
* Ignore expired segments in segment repairer and drop from repair queue
* Add repair test to verify that a segment that expires after being
added to the repair queue is ignored and dropped from the repair queue
Change-Id: Ib2b0934db525fef58325583d2a7ca859b88ea60d
Replace most of old libuplink usages in testplanet. 100% migration will
be possible when we will be able to implement UploadWithClientConfig
with new libuplink.
Change-Id: I432d7d4917c7b67d46a058abd0a2a6a13f565ac4
Automatically attach attribution information to bucket during
BeginObject or CreateBucket when the UserAgent is set.
Change-Id: I405cb26c5a2f7394b30e3f2cf5d2214c8781eb8b
We want to avoid net/http dependency in errs2 package, hence we removed
http.ErrServerClosed from IgnoreCanceled and IsCanceled check. Now we
need to add that check explicitly to every http endpoint.
Change-Id: I62b1cc0a0a2d3b43301d713a7951e5022145f88f
This adds support for observers to join the loop together. This allows
to ensure that when multiple observers join, they will be part of the
same loop iteration.
Change-Id: Ie887d4cedfb074b65c782690a2c09c1704f56dfe
During testing it's possible to get into a scenario where all nodes are
offline and list of requests is empty.
Change-Id: I271c0ca2c72009244df13e8bc1441fcd5f3da9e0
* satellite: update log levels
Change-Id: I86bc32e042d742af6dbc469a294291a2e667e81f
* log version on start up for every service
Change-Id: Ic128bb9c5ac52d4dc6d6c4cb3059fbad73f5d3de
* Use monkit for tracking failed ip resolutions
Change-Id: Ia5aa71d315515e0c5f62c98d9d115ef984cd50c2
* fix compile errors
Change-Id: Ia33c8b6e34e780bd1115120dc347a439d99e83bf
* add request limit value to storage node rpc err
Change-Id: I1ad6706a60237928e29da300d96a1bafa94156e5
* we cant track storage node ids in monkit metrics so lets use logging to track that for expired orders
Change-Id: I1cc1d240b29019ae2f8c774792765df3cbeac887
* fix build errs
Change-Id: I6d0ffe058e9a38b7ed031c85a29440f3d68e8d47
Currently storj-sim relies on the log lines to be exactly the same,
when they change it cannot find the necessary information from log.
Change-Id: Ia039915ef3375a7cf60f107b2c05c958de15b6d5
Currently ListV2 loaded the whole data into memory, even when all the
data wasn't being used, using up more memory than needed.
Change-Id: I5846d979344729b447c108a6cc9f4227229ec981
Alpha=1 and beta=0 are the expected first values for any alpha/beta
reputation system we are using in the codebase. So we are removing the
configurability of these values.
Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3
We want to increase our throughput for downtime estimation. This commit
adds the ability to reach out to multiple nodes concurrently for downtime
estimation. The number of concurrent routines is determined by a new config
flag, EstimationConcurrencyLimit. It also increases the default
EstimationBatchSize to 1000.
Change-Id: I800ce7ec1035885afa194c3c3f64eedd4f6f61eb
until we do a good job of cleaning them up, we should at least
not charge or pay people for them. nodes already locally delete
expired segments.
subsumes the tests in 1112.
Change-Id: I5961185764e02f6136b3231b44ecc75a9a8832c9
Whenever the node's reputation is updated, if its unknown audit
reputation is below the suspension threshold, its suspension field
is set to the current time. This could overwrite the previous
"suspendedAt" value resulting a node that never reaches the end of
its suspension.
Also log whenever a node is disqualified or its suspension status
changes
Change-Id: I5e8c8f1c46f66d79cb279b5b16a84fe03f533deb
Reduce the number of non-methods to reduce funcs in the namespace also
combine a func to slightly condense the code more.
Change-Id: Ifbe728eb8c8ca4c981df648decd259c2097b6b40
* Add migration to storagenode reputation table to add suspended
timestamp
* Send suspended info to storagenode from satellite nodestats endpoint
* Add suspended status to storagenode api
* Add an indicator on the storagenode dashboard informing operator of
the satellites the node is suspended on
Change-Id: Ie3669f6069cc0258ba76ec99d17006e1b5fd9c8a
potential encryption overhead.
This is the same approach we have for validating remote segment size.
https://storjlabs.atlassian.net/browse/USR-619
Change-Id: I2597ee734313a3068fd986001680bbedbf1bed2a
Adds a test to make sure that the correct amount of total nodes,
reputable nodes, and new nodes are returned by SelectStorageNodes
in different cases.
Change-Id: I0939159600afde8a46c35735f1edf0576fcdb4cd
This adds new endpoint /api/user/{user-email} which allows to get the
projects where the user is a member.
It also moves existing endpoint:
/project/{projectid}/limit -> /api/project/{projectid}/limit
To avoid future conflicts for displaying pages.
Change-Id: I5efe3e1c8f79894c136f92ed815f635a34ba6f98
size with BeginObject
Such solution will add one round trip to satellite during upload so for
now we are reverting this until we will have solution for this.
Change-Id: Ic2d826448ab7b0318cd6922df05deee9167cf2f0
We have been using the SQL expression `name='(*Verifier).Verify' AND
error_name='not enough shares for successful audit'` thus far to detect
cases of this problem and alert on them. Unfortunately, since this
rarely (hopefully never) happens, influxdb has no data for most of the
auditor instances, and when it has no data for a time series, it returns
no columns either. This makes Redash upset when it tries to perform a
query for an alert and can't find the column whose value it expects to
check.
This change should make it so zero values are reported when the problem
has not happened, and higher values when it has.
Change-Id: I79e5e000f879678b661dac88caae1e2915b39ab1
there are a subset of storagenodes hammering the satellite with
expired orders. if we check for expiration first, we don't have
to do a bunch of pointless signature verification. since a && b
is equal to b && a, we can order these checks in any way we want
and have it still be correct.
Change-Id: I6ffc8025c8b0d54949a1daf5f5ea1fed9e213372
BeginObject response
We want to control inline segment size and segment size on satellite
side. We need to return such information to uplink like with redundancy
scheme.
Change-Id: If04b0a45a2757a01c0cc046432c115f475e9323c
all permissions
Without read and list permissions BeginObjectDelete won't return error
if occurs. This was breaking Batch processing because there was
assumption that without error response will be always not nil.
https://storjlabs.atlassian.net/browse/SM-590
Change-Id: I0fc9539e429110a660eb28725b266d5e4771d198
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.
Replace uses of dbutil.BytesToUUID with uuid.FromBytes.
Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
Instead of providing the database from outside to testplanet create it
inside and then allow wrapping and modifying it. This is more convenient
to use.
Change-Id: I9b8f69e6e0a19ff984b4e2bfe927c9100c77bc6c
Add flag to satellite repairer, "InMemoryRepair" that allows the
satellite to decide whether to download the entire segment being
repaired into memory (this is what the satellite already does), or to
download it into temporary files on disk that will be read from in the
upload phase of repair.
This should help with handling high repair traffic on satellites that
cannot afford to spend 64mb of memory per repair worker.
Updates tests to test repair for both in memory and to disk.
Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194
we still need to come up with a better plan to get storage nodes
to stop doing this, but in the meantime, we know this is happening,
just stop logging it and keep some stats instead.
Change-Id: Icb6bcba275e0e955c54b1a90da2b37219fff2349
storagenodes have like 10 or more databases. without this
tag they all get sent as the same value, stomping on each
other.
Change-Id: Ib12019684d6ea8f2a5b83df584056dfa79e3c4b3
This will help to determine how many grpc calls are made to the
satellite.
Also remove the grpc funcs that have been added to upstream.
Change-Id: I91878f4fd10f9bfe601c94222c102eaaf4d35963
* debug
* traces
* cfgstruct
* process
Package `storj/private/version` will be removed as a separate change.
Change-Id: Iadc40faa782e6225513b28218952f02d9c240a9f
The goal of this change is to improve the storagenode_storage_tallies table by removing the unneeded id column that is not being used but only taking up space, and also to add an index on a different column that needs it. Removing and adding a column seems simple, but ended up being more complicated because of some cockroachdb limitations.
The cockroachdb limitation when trying to remove a column from a table and create a new primary key are:
1. only allows primary key creation at table creation time (docs: https://www.cockroachlabs.com/docs/stable/primary-key.html)
2. table drop or rename is performed async and cannot be done in a transaction (issue: https://github.com/cockroachdb/cockroach/issues/12123, https://github.com/cockroachdb/cockroach/issues/22868)
To address these differences between cockroachdb and Postgres, this PR performs different migrations for the two database. The Postgres migration is straight forward and what you would expect, but the cockroach migration has two main changes:
1. To change a primary key, use the recommended process from the cockroachdb docs to create a new table with the new primary key you want and then migrate the data.
2. In order to do 1, we needed to do the new table renaming in a separate transaction from the data migration.
Ref: SM-65
Change-Id: Idc9aee3ab57aa4d5570e3d2980afea853cd966bf