Repair workers prioritize the most unhealthy segments. This has the consequence that when we
finally begin to reach the end of the queue, a good portion of the remaining segments are
healthy again as their nodes have come back online. This makes it appear that there are more
injured segments than there actually are.
solution:
Any time the checker observes an injured segment it inserts it into the repair queue or
updates it if it already exists. Therefore, we can determine which segments are no longer
injured if they were not inserted or updated by the last checker iteration. To do this we
add a new column to the injured segments table, updated_at, which is set to the current time
when a segment is inserted or updated. At the end of the checker iteration, we can delete any
items where updated_at < checker start.
Change-Id: I76a98487a4a845fab2fbc677638a732a95057a94
Jira: https://storjlabs.atlassian.net/browse/PG-69
There are a number of segments with piece_hashes_verified = false in
their metadata) on US-Central-1, Europe-West-1, and Asia-East-1
satellites. Most probably, this happened due to a bug we had in the
past. We want to verify them before executing the main migration to
metabase. This would simplify the main migration to metabase with one
less issue to think about.
Change-Id: I8831af1a254c560d45bb87d7104e49abd8242236
We have configlock_test for checking changes so the separate script and
makefile target are not needed.
Change-Id: I2bbc1c21ad849c9b7ec8bba43c0e11e94e04f6a6
Currently Cockroach migration test is the most heavy with regards to
schema changes. This causes other tests to time out. This adds an
alternate cockroach instance that is used for migration tests.
Change-Id: I01fe9313527ff002f0bb0914dd52c3645b8eaf6d
Currently a user is only able to create a project if either
a STORJ deposit or CC was added to his account. With this change, an existing
coupon is also valid to let the user proceed.
Change-Id: I7be8d2d9ec58a15c50755b3fe33af04d2fd64ea2
This PR adds the following items:
1) an in-memory read-only cache thats stores project limit info for projectIDs
This cache is stored in-memory since this is expected to be a small amount of data. In this implementation we are only storing in the cache projects that have been accessed. Currently for the largest Satellite (eu-west) there is about 4500 total projects. So storing the storage limit (int64) and the bandwidth limit (int64), this would end up being about 200kb (including the 32 byte project ID) if all 4500 projectIDs were in the cache. So this all fits in memory for the time being. At some point it may not as usage grows, but that seems years out.
The cache is a read only cache. When requests come in to upload/download a file, we will read from the cache what the current limits are for that project. If the cache does not contain the projectID, it will get the info from the database (satellitedb project table), then add it to the cache.
The only time the values in the cache are modified is when either a) the project ID is not in the cache, or b) the item in the cache has expired (default 10mins), then the data gets refreshed out of the database. This occurs by default every 10 mins. This means that if we update the usage limits in the database, that change might not show up in the cache for 10 mins which mean it will not be reflected to limit end users uploading/downloading files for that time period..
Change-Id: I3fd7056cf963676009834fcbcf9c4a0922ca4a8f
Our current endpoints bail on us, if the column data is null. Thus we need
to take the intermediate step and set the default to a fixed value and
reset those with the following release.
It sets the default column value to our current config values of 50GB
for storage and bandwidth and 100 buckets, while still enabling the field to be nullable.
All 0 values are migrated to be the default as well to ensure they can
keep using their projects, as with the original change, 0 actually means 0.
Change-Id: I797be80ce2d2105091599dc1b3fc76f74336b66b
Currently we have no way to actually set one
of the following limits to 0 (meaning not usable):
- maxBuckets
- usageLimit
- bandwidthLimit
With having the field nullable,
NULL corresponds to the global default,
0 now actually 0 and
a set value determines a custom limit.
Change-Id: I92bb77529dcbd0881ae8368921be9d246eb0919e
Jira: https://storjlabs.atlassian.net/browse/PG-67
There are a number of old-style objects (without the number of segments
in their metadata) on US-Central-1, Europe-West-1, and Asia-East-1
satellites. We want to migrate their metadata to contain the number of
segments before executing the main migration to metabase. This would
simplify the main migration to metabase with one less issue to think
about.
Change-Id: I42497ae0375b5eb972aab08c700048b9a93bb18f
WHAT:
added functionality for user to update project name. Logic only, without actual GUI updates.
WHY:
better user experience
Change-Id: I1e38e33ba827b0bdf2c89e29de24e4e87edb474a
nodes are submitting using both the legacy and windowed endpoints
and thus having their legacy submissions rejected.
it is legal to use both the legacy and windowed endpoints
in phase1 since they use the same backend. the legacy endpoint
is disabled in phase2 and phase3.
therefore, if we wait an order expiration period (2 days) after
we determine enough nodes have started using the windowed
endpoint, we can be sure that any orders they did have to
submit with the legacy endpoint will have expired.
Change-Id: I4418a881bf8bb9377efaef4c651e6103a5dc6ed0
Another change which is a part of refactoring to replace path parameter
(string/[]byte) with key paramter (metabase.SegmentKey)
Change-Id: I617878442442e5d59bbe5c995f913c3c93c16928
objectdeletion.ObjectIdentifier with metabase.ObjectLocation
Another change to use metabase.ObjectLocation across satellite codebase
to avoid duplication and provide better type safety.
Change-Id: I82cb52b94a9107ed3144255a6ef4ad9f3fc1ca63
WHAT:
notification bar added to project dashboard page. It is shown when projects count limit is reached.
Create project button is removed after creating last available project
WHY:
inform user that their projects count limit was reached
Change-Id: If0d67148003be40cc9eb4d8b25cc17f8204008d4
Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1.
This is because many tests relied on selecting nodes that were reputable based on audit and uptime
counts of 0, in effect, selecting new nodes as reputable ones.
However, since reputation is now indicated by a vetted_at db field that is explicitly set
rather than implied by audit and uptime counts, it would be more complicated to try to
update all of the nodes' reputations before selecting nodes for tests.
Now we just allow all test nodes to be new if needed.
Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d
metabaseSegmentKey` TransferQueueItem
We are unifying which name (and type) we are using for value we are
using to point to segment. We want to use `key` instead of `path`.
Dedicated type `metabase.SegmentKey` was created for this purposes also.
This change is doing refactoring around gracefulexit.
Change-Id: I90d51ff087b206179e61d5f1bc95f4709d76f917
This PR updates `uplink rb --force` command to use the new libuplink API
`DeleteBucketWithObjects`.
It also updates `DeleteBucket` endpoint to return a specific error
message when a given bucket has concurrent writes while being deleted.
Change-Id: Ic9593d55b0c27b26cd8966dd1bc8cd1e02a6666e
This PR fixes a deadlock that can happen when the number of piece
deletion requests is different from the distinct node count from those
requests. The success threshold should be based on the number of nodes
instead of the amount of requests
Change-Id: I83073a22eb1e111be1e27641cebcefecdc16afcb
This change forces the test of GetObjectIPs to use multiple remote
segments (earlier versions of the test were accidentally using inline
segments). This change also revealed a small bug in the for loop code,
which is fixed.
Change-Id: Ic486b079d221952ba13553acf0ca41a8873f3f21
* The audit worker wants to get items from the queue and process them.
* The audit chore wants to create new queues and swap them in when the
old queue has been processed.
This change adds a "Queues" struct which handles the concurrency
issues around the worker fetching a queue and the chore swapping a new
queue in. It simplifies the logic of the "Queue" struct to its bare
bones, so that it behaves like a normal queue with no need to understand
the details of swapping and worker/chore interactions.
Change-Id: Ic3689ede97a528e7590e98338cedddfa51794e1b
Add online score used for the new audit history offline tracking system
to the nodes table. This allows us easy access to the node's online
score for the storagenode dashboard as well as for data analysis.
Change-Id: Ie99be1192e5236862a5b3dbed2e5ef03b9169410
We were seeing error on the last day of the month with TestProjectAllocatedBandwidthRetainTwo.
This is due to AddDate normalizes its result in the same way that Date does, so, for example,
adding one month to October 31 yields December 1, the normalized form for November 31."
I also fixed a minor UTC issue with this test as well.
Change-Id: I0157873e7befa57810e5f264a922b188890fa46a
satellite.DB.Console().Projects().GetAll database query
can be replaced with planet.Uplinks[0].Projects[0].ID
Change-Id: I73b82b91afb2dde7b690917345b798f9d81f6831
When a node's audit history "online score" passes below a configured
threshold, the node goes into "offline suspension" mode and begins a
review period, where the operator is given an opportunity to bring their
node back online.
After the review period passes, offline suspension is turned off for the
node.
In the future, if a node still has a bad online score at the end of the
review period, it will be disqualified. This is disabled right now.
In the future, if a node is in offline suspension, it will be treated as
"unhealthy". Right now, there are no consequences for being in offline
suspension.
Minor changes:
* Moves AuditHistoryConfig out of UpdateStats/BatchUpdateStats args and
into UpdateRequest.
* Adds "now" argument to UpdateStats/BatchUpdateStats args for easy
testing.
* Changes formatting strings inside buildUpdateStatement to use specific
types.
Change-Id: I032b60298840fc16e6ef831da750f2d57619a397