The previous default FlushBatchSize of 10000 was causing major
slow down in select and insert statements on bucket_bandwidth_rollups.
We saw on the saltlake satellite that a FlushBatchSize of 1000 helped
reduce contention and query latency.
Change-Id: Ib95e73482219bc5aedc11925b1849fa5999774ba
WHAT:
config flag to indicate if satellite is in beta
WHY:
to avoid using hardcoded satellite names which may cause issues
Change-Id: If92eb7417c340bf343a9a91e2f6b11f0349020c5
This PR removes all back-end related referral program code including the
marketing portal.
We will have a separate PR for front-end code and database migration to
drop `offers` and `usercredits` table
Change-Id: If59f952cddfe0558a7dc03a0eac7cc1081517f88
Delete satellite order methods and DB tables which aren't used anymore
after we have done a refactoring on the orders to stuck bucket
information in the orders' encrypted metadata.
There are also configuration parameters and a satellite chore that
aren't needed anymore after the orders refactoring.
Change-Id: Ida3682b95921df70792284b42c96d2508bf8ca9c
The rollup archiver chore moves bucket bandwidth rollups and
storagenode rollups that are older than a given duration
to two new archive tables.
Change-Id: I1626a3742ad4271bc744fbcefa6355a29d49c6a5
Create a storj-sim test that checks that uplinks operations works when
satellite runs and can connect to Redis and when it cannot connect to
simulate a Redis downtime. Also verifies that the satellite can start
despite of Redis being downtime.
This test currently doesn't pass and it will be the one used to verify
the work that has to be done to make sure that the satellite allow the
clients to perform their operations despite of Redis being unavailable.
We require these changes before we deploy any customer face satellite on
a multi-region architecture.
NOTE that this test will be added later on to Jenkins to run this test
every time that we apply changes and at that time we'll see if it has to
be adjusted for being able to run on Jenkins because as it's now it may
not work because the scripts start and stop a Redis docker container.
Change-Id: I22acb22f0ca594583e36b45c88f8c03bac73b329
Full scope:
private/testplanet,satellite/{overlay,satellitedb}
Description:
In most cases, downtime tracking with audits will eventually lead
to DQ for nodes who are unresponsive. However, if a stray node has no
pieces, it will not be audited and will thus never be disqualified.
This chore will check for nodes who have not successfully been contacted
in some set time and DQ them.
There are some new flags for toggling DQ of stray nodes and the timeframes
for running the chore and how long nodes can go without contact.
Change-Id: Ic9d41fdbf214736798925e728245180fb3c55615
Query nodes table using AS OF SYSTEM TIME '-10s' (by default) when on CRDB to alleviate contention on the nodes table and minimize CRDB retries. Queries for standard uploads are already cached, and node lookups for graceful exit uploads has retry logic so it isn't necessary for the nodes returned to be current.
It turns out that alpine dropped support/updates for the aarch64 image.
Instead they have been using the arm64v8 notation for quite a while,
which resulted in breaking our recent aarch64 builds due to missing
dependencies/updates.
Both arches are exactly the same, aarch64 was created originally by Apple
and arm64 by GNU. The backends have been merged by now and the arm64 became
the de facto standard.
Since the Satellite now requires the order encryption functionality (since serial_number table is deprecated) to properly function, we can remove the config flag to turn on/off the feature.
Change-Id: Ie973f72a9a05a81cef9e53dc9c99d22c940c2488
This PR contains the minimum changes needed to stop inserting into the serial_numbers table. This is the first step in completely deprecating that table.
The next step is to create another PR to remove the expiredSerial chore, fix more tests, and remove any other methods on the serial_number table.
Change-Id: I5f12a56ebf3fa4d1a1976141d2911f25a98d2cc3
WHAT:
added brotli compression for wasm files and added copying of those files to static/wasm folder in Dockerfile
WHY:
those files are a part of web worker webpack bundle and I didn't find a way to compress them separately using webpack.
I'm open to any other ideas if they come up
Change-Id: I105cc1582e9816fd9b63052ba48358525c85a164
version
The test-versions test currently takes 1h 40min to run each time. By
running each installation concurrently, hopefully, it will reduce the execution
time for the whole test.
Change-Id: I680c7d9945e982894b11825c9075c167f754e087
We are no longer planning on implementing downtime penalization using
the method described in
docs/blueprints/archive/storage-node-downtime-tracking-deprecated.md.
Now, we are implementing the design described in
docs/blueprints/storage-node-downtime-tracking-with-audits.md.
This change removes the downtime estimation chores from the satellite
core as well as the package satellite/downtime. A future change will
remove the database table.
Change-Id: I1a1d3cf9dceeba36255d25243294865b89925518
WHAT:
POST request to get gateway credentials using access grant.
Put request url to config and use it for request.
WHY:
to show gateway credentials on UI
Change-Id: I15ef43ecdeed69b0961d5796aacb47f36d560b1b
this change tries really hard to never have all of the storage node
rollups in memory at the same time, up until the rollups are actually
getting summed together.
Change-Id: If67f49e7d71106798d996a6850b3e48671bd9e18
Rather than having a single repair override value, we will now support
repair override values based on a particular segment's RS scheme.
The new format for RS override values is
"k/o/n-override,k/o/n-override..."
Change-Id: Ieb422638446ef3a9357d59b2d279ee941367604d
CRDB doesn't like large deletes. While testing in the POC environment we found that deletes on the serial_numbers table could take hours. This change limits deletes to 1000 at a time (configurable) to avoid blocking other queries.
Change-Id: I08455e25db1574579dd4d7b7125a08e9c913dff1
We plan to add support for a new Reed-Solomon scheme soon, but our
repair queue orders segments by least number of healthy pieces first.
With a second RS scheme, fewer healthy pieces will not necessarily
correlate to lower health.
This change just adds the new column in a migration. A separate change
will add the new health function.
Right now, since we only support one RS scheme, behavior will not
change. Number of healthy pieces is being inserted as "segment health"
until the new health function is merged.
Segment health is calculated with a new priority function created in
commit 3e5640359. In order to use the function, a new config value is
added, called NodeFailureRate, representing the approximate probability
of any individual node going down in the duration of one checker run.
Change-Id: I51c4202203faf52528d923befbe886dbf86d02f2
Currently our code is only using github version of the code, so there
shouldn't be need for the exception.
Change-Id: I0c6e8a8465ab7b525d4b5d1b29e4e5298384286d
This PR does follwing changes:
1. Change oldest uplink version in the test to v0.35.3
When the test is first created, we decided to support uplink
version starting from v0.17.1, however with many API changes,
older uplinks are not usable with latest version of the network
anymore. One of the reasons being older uplinks uses deprecated
endpoint. Therefore, we will change the oldest uplink version to
the one that's using only new endpoints.
2. Disable tls certificate verification in uplink
3. Use storj-sim version control server instead of production one
4. Skip uplink version v1.3.x due to bug in that release
Change-Id: I926a6bb9829cb7181ee752437cdcb67e59197fe0
This PR fixes below issues:
1. remove concurrent installation for various versions
We were doing this to decrease the amount of execution time the versions test.
However, it's returning incorrect exit code when there's an
installation failure.
Right now, we are only installing two versions of `storj-sim` and
the rest are only doing uplink cli installation. The performance of
this test should be hugely impacted by the setup step now.
2. only remove release settings instead of deleting the entire file
uplink CLI is referrencing `private/version` package. Therefore, we
cannot delete it
3. add back `GATEWAY_0_API_KEY` in storj-sim
In order to set up older version of uplink cli, we need access to
the gate way api key.
Change-Id: Ia3c37c197bd007b6e1f7c2bd71adde42181d46f0
Make metainfo.RSConfig a valid pflag config value. This allows us to
configure the RSConfig as a string like k/m/o/n-shareSize, which makes
having multiple supported RS schemes easier in the future.
RS-related config values that are no longer needed have been removed
(MinTotalThreshold, MaxTotalThreshold, MaxBufferMem, Verify).
Change-Id: I0178ae467dcf4375c504e7202f31443d627c15e1
branch
Currently, we are testing previous release version upgrading to latest
master on each master build
However, this behavior is only desired when the test is running on a
release branch.
Change-Id: Iaeb66f44951c9e4934ca3c8316d1e490d7958239
We are moving an error into rejectErr since its preventing storage nodes from being able to settle other orders.
Change-Id: I3ac97c340e491b127f5e0024c5e8bd9f4df8d5c3
We've had bizarre crashes for satellite and it's difficult to debug
stripped binaries. The only binary where stripping is useful, is uplink
cli. This change will increase uplink by 5MB.
Change-Id: I4d1dfd36452063c22e8471d99eec97f6de6167b8
Doing it at the ProcessOrders level was insufficient: the endpoints
make multiple database calls. It was a misguided attempt to only
have one spot enter the semaphore. By putting it in the endpoint
we can not only be sure that the concurrency is correctly limited
but it can be configurable easily.
Change-Id: I937149dd077adf9eb87fce52a1a17dc0afe96f64
Remove usage of --non-interactive flag. It is not provided (and
necessary) by the multitenant S3 gateway anymore.
ACCESS_KEY and SECRET_KEY env vars are not provided anymore as they are
not generated by the multitenant S3 gateway.
Change-Id: I3ecfb92110e31ae13977de3899dad273daae6c1e