Commit Graph

322 Commits

Author SHA1 Message Date
JT Olio
3b66ba6f02 scripts/tests: uplink no longer respects --client.segment-size
this is going to make all the tests slower but it is what it is

test-sim-aws.sh is removed because it was moved to storj/gateway repo.

Change-Id: I10727e747a4c3740b1c9054ce7d17313b4fa310b
2020-03-22 17:48:57 +00:00
Jennifer Johnson
699b635e5d satellite/overlay: rename newNodePercentage to newNodeFraction
Change-Id: Ie66de91f88183b44de0773589e83e4ade9aa997a
2020-03-19 20:09:32 +00:00
Jessica Grebenschikov
5142874144 satellite/gc: move garbage collection to its own process
Change-Id: I7235aa83f7c641e31c62ba9d42192b2232dca4a5
2020-03-18 16:44:01 +00:00
Egon Elbre
09e0f3de63 satellite/metainfo/piecedeletion: add Service
Change-Id: Id7e32ed569701fa0be66f9527c43a67052994570
2020-03-18 14:50:08 +00:00
Matt Robinson
bd4982e249
test/backwards-compatibility: Exclude rc tags from testing (#3787)
Change-Id: Id486709306c5c75a262ea17410ae641be53df8ed

Co-authored-by: Ivan Fraixedes <ivan@fraixed.es>
Co-authored-by: littleskunk <jens.heimbuerge@googlemail.com>
2020-03-18 00:12:05 +01:00
littleskunk
b10b69d9ce
test/rollingupgrade: fix stage 1 release version (#3810) 2020-03-17 22:30:07 +01:00
littleskunk
80acf33abc
script/release: fix error in regex (#3809)
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
2020-03-17 17:36:17 +01:00
Stefan Benten
49a30ce4a7
satellite/payments: Set proper defaults for the release (#3806)
* Slight adjustments to the migration

Change-Id: I68ae81c010c3414fde2845df16ab124f8d17834b

* Change Coupon Value

Change-Id: I0f241d09e5f716f1d1b3f0688643ba7f614d83c4

* Change AlphaUsage to 5GB

Change-Id: I5d25c6b5750684510cda8b14a27f38d5b2b07408

* change config lock

Change-Id: Ib7c7a54555ba2387c9aa8dd60a0501b0ee6491dd

* Use Scan properly

Change-Id: Ie39cf4644e3ddd703a254e2f5e616763dd805235

* Fix Config Lock

Change-Id: I558ecc1c1becfaaefc7aea5ad2fe83fd6bf6b561
2020-03-16 22:53:12 +01:00
Stefan Benten
52590197c2
satellite/payments: More Cleanup and Satellite command to ensure we have stripe customers (#3805) 2020-03-16 20:34:15 +01:00
Kaloyan Raev
4f0bf3fe1d
build: cleanup more gateway targets from Makefile (#3802)
Change-Id: Ia95caa2187b3e9e056a83cbea4230788ed4e8abd

Co-authored-by: Michal Niewrzal <michal@storj.io>
2020-03-16 15:07:52 +01:00
Stefan Benten
bd603c0751
satellite/payments: Improve Invoice Generation (#3800) 2020-03-13 17:07:39 +01:00
Yingrong Zhao
3cf05b24d8 scripts: add benchmark test for delete operation
Change-Id: I448cc50375c1c712d704d8cf93f5b8481372b0c8
2020-03-12 17:03:19 +00:00
JT Olio
051569c69f
satellite: enable open registration (and add flag that disables it) SM-441
Change-Id: I47bfedb312089f6d2bfbab013bd74ad4b8aa5f5e
2020-03-11 03:53:34 +01:00
Yingrong Zhao
1a875baa1d scripts/tests: fix arguments orders passed to test-versions.sh
Change-Id: I2e3b501477823413137f6a43fcdc4347af27e43c
2020-03-11 13:50:18 +00:00
Yingrong Zhao
46b04a38cc scripts/tests/: fix uplink access to contain satellite id
Change-Id: I7dfb6bc2da3bf84a81d75a286ee9488e32ea4f01
2020-03-10 21:15:39 +00:00
Michal Niewrzal
c20cf25f35 cmd: migrate uplink CLI to new API
Change-Id: I8f8fcc8dd9a68aac18fd79c4071696fb54853a60
2020-03-09 13:26:29 +00:00
littleskunk
842c8d8ed9
scripts/tests/rollingupgrade: fix installation for current commit 2020-03-06 17:19:55 -05:00
littleskunk
8fa8178f04
release/rollingupgrade: on release tags run rolling upgrade against previous release (#3792)
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
2020-03-03 23:56:32 +01:00
Michal Niewrzal
fb2711d05e scripts: update postgres helper script to set password
Latest postgres docker image requires non empty password.

Change-Id: I03017e1b7ff4803fefc24c39087d9ccd4042373b
2020-02-27 10:33:37 +00:00
Yingrong Zhao
ac34485f5d scripts/tests: install correct version of gateway
1. only run release tags that don't contain 'rc'
2. install gateway version that's the same as satellite
3. update gateway access to contain satellite id

Change-Id: I8ca1418302c3aafdf0c4eaaf8361422a1eec2bd4
2020-02-26 13:12:31 +00:00
Jessica Grebenschikov
e19e3c1101 pkg/process:
Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are.  We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP stackdriver has support for continuous profiling so lets set that up and see if it is helpful to gather more data.

This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console.

To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console

The current implementation assumes the workload is running in GKE, however if we find if useful we can add support to enable this from anywhere. But for simplicity, its configured this way assuming the main goal is to enable in production systems.

Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
2020-02-25 09:04:23 -08:00
paul cannon
92d86fa044 satellite/repair: fix repair concurrency
This new repair timeout (configured as TotalTimeout) will include both
the time to download pieces and the time to upload pieces, as well as
the time to pop the segment from the repair queue.

This is a move from Github PR #3645.

Change-Id: I47d618f57285845d8473fcd285f7d9be9b4318c8
2020-02-24 19:57:09 +00:00
Jeff Wendling
f671eb2beb satellite/satellitedb: use queue for orders to get back fast billing
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue rather than database driven.

For more details, see the added fast billing changes blueprint.

It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.

Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35
2020-02-24 17:07:07 +00:00
Egon Elbre
e30f7b35b6 cmd/gateway: use a separate repository
Change-Id: Idbb0b2b6cf0e60c6d5d91218c24524d72285cf26
2020-02-24 10:03:03 +02:00
Michal Niewrzal
54e38b8986 pkg/miniogw: gateway implementation with new libuplink
Change-Id: I170c3a68cfeea33b528eeb27e6aecb126ecb0365
2020-02-21 16:20:38 +01:00
Yingrong Zhao
77f67a8086 satellite/metainfo: add timeout for delete request
Change-Id: I9cad6d7ea185fc2c0ed4e58b42e4e3a78178a79f
2020-02-20 09:10:16 +00:00
JT Olio
2ae9978304 satellite/gc: skip first gc run
rationale: if GC kills the satellite, it would be nice to make
it through a repair checker sweep first

Change-Id: Id56171dc8e13940cfb6481e36a910bad077a01ed
2020-02-13 13:41:15 +02:00
littleskunk
76849558cb satellite/gracefulexit: increase performance and tolerate higher error
rate

Graceful exit is very slow at the moment. Over the last couple days we
increase the batch size on Stefans satellite to 1000 but as a side
effect the error rate was increased. With a batch size of 500 the error
rate looks stable.
This PR will increase the default to batch size to 300. Graceful exit
will still be painful slow but at least it will be a bit faster. At the
same time this PR also increases the number of errors we tolerate. We
don't want to DQ slow storage nodes just because they didn't finish all
300 transfers in time. We want to give them more retries.

Change-Id: I92e3f99e116d4988457d8b902a88e85ed1bcc1a7
2020-02-12 11:40:15 +00:00
Egon Elbre
dbf46c4aa7 satellite/admin: administrative endpoint
Admin server allows creating basic REST and html API-s
for different administrative tasks.

Change-Id: I3dc1786abe1c87350eed60ec90e48130f44e63cf
2020-02-12 12:12:50 +02:00
Cameron Ayer
b22bf16b35 satellite/overlay: add config flag for node selection free disk requirement
Currently SNs report their free disk space once per hour. If a node
becomes full, it has to wait until the next contact cycle begins to
report; all the while receiving and failing upload requests. By increasing
the minimum required disk space, we can give the storage nodes more time
to report their space before the completely fill up. This change goes
hand-in-hand with another change we want to implement: trigger capacity
report on SN immediately upon falling below threshold.

Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27
2020-02-11 18:08:25 +00:00
Qweder93
dc075eaa96 satellite/payments : deposit bonuses (credits) added
Change-Id: Ib151bbb9b02d655fa619c53bfbc04ed6f3bb39e0
2020-02-11 11:11:42 +00:00
Moby von Briesen
8c19855871 scripts/tests/rollingupgrade: explicitly set debug port for old
satellite api during rolling upgrade test

The old api is using the same config file as the new satellite in the
rolling upgrade test, so we need to set it to something different so
that there is no conflict when we spin up a new storj-sim instance while
the old api is running concurrently.

Change-Id: Ia4ec2db4953f36f43275495710992831ad3916a2
2020-01-29 18:32:03 -05:00
Egon Elbre
a2b2bc676b pkg/debug: implement control panel
Control Panel allows to control different chores and services.
Currently this adds controlling of cycles.

Change-Id: I734f1676b2a0d883b8f5ba937e93c45ac1a9ce21
2020-01-29 16:30:31 -05:00
littleskunk
e0cb8037c1 satellite/projectusage: reduce usage limit from 5GB to 0GB
Change-Id: Ie3d2509613e7a4336e2a8d2b136b32f5f308aafc
2020-01-29 20:38:39 +00:00
Ethan
149273c63f satellite/metainfo: add cache expiration for project level rate limiting
Allow rate limit project cache to expire so we can make project level rate limit changes without restarting the satellite process.

Change-Id: I159ea22edff5de7cbfcd13bfe70898dcef770e42
2020-01-29 16:14:10 +00:00
Yaroslav Vorobiov
083b396c16 satellite/payments: allow floating point numbers for pricing
Change-Id: I78b60134cf043746efef5371b761939a10f75aaf
2020-01-28 22:52:13 -05:00
littleskunk
a0c9f7f3b0
satellite/projectusage: reduce usage limit from 25GB to 5GB
Change-Id: I2819012b520fd687ab8058000aa38d76b8208158
2020-01-29 04:01:09 +01:00
littleskunk
a6c6440ab7 satellite/order: decrease expire time from 7 days to 2 days
For the last few month we had no issues with order submission. I would
call it stable and now it is time to risk a lower expire time. This will
increase the database performance on the satellite and it will reduce
the delay for billing.

The long term goal is 6h but for that step we need to change graceful
exit first. At the moment storage nodes would get disuqlaified for not
transfering alle pieces in less than 6 hours.

Change-Id: I421a2c2421c5374c4e706e2338f1c2161fedc14c
2020-01-24 23:37:39 +00:00
Yingrong Zhao
5de4f66553 scripts/tests: change multisegment file to be 128kb
To cover a special case: an object that has 2 remote segments
and 1 inline segment.

Change-Id: Ia8d82bb67fc6cf76af9c7f44cd738cab6df591e9
2020-01-22 17:12:11 +00:00
Michal Niewrzal
6502454947 satellite/metainfo: move RS configuration to satellite
With this change RS configuration will be set on satellite. Uplink with
get RS values with BeginObject request and will use it. For backward
compatibility and to avoid super large change redundancy scheme stored
with bucket is not touched. This can be done in future.

Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff
2020-01-22 09:33:53 +00:00
Moby von Briesen
d32626fe8c scripts/tests: update uplink config migration for test versions
Updates config migration to occur for any v0.30.x release rather than
specifically 30.4

Also updates the config for the rolling upgrade test to use 64 kib
segments, and use smaller files for the final upload of rolling upgrade.

Change-Id: I941f77fe2b9011b45f28a5f3a2430e882d2ae6b3
2020-01-21 11:50:13 -05:00
Ethan
21a5d70a83 satellite/metainfo: Rate limiting - API requests
Limits how many times metainfo APIs can be called per second by project ID. If limit is exceeded, the API will return Unauthorized/Too Many requests.

Limit per second and the size of the limiter cache per project are configurable, as well as whether the limiter is enabled.

Tests added/updated for the new rate_limit field in projects table.
Tests added for exceeding limits and disableing limiter.

Change-Id: Ic8ad102de3b690a475809d4f684156d5715f20fa
2020-01-21 14:25:04 +00:00
Moby von Briesen
0def7a9d2a scripts/tests/testversions;scripts/tests/rollingupgrade: update test versions script
Fix uplink setup step for uplink versions that requires an access field.

Update how script selects uplink versions to test.

Use significantly smaller remote files for test (performance).

Change-Id: If590b8798767e2a0621fb84cd3b8852d02f6d1da
2020-01-20 11:46:11 -05:00
stefanbenten
f4097d518c satellite: reduce logging of node status
Change-Id: I6618cf4bf31b856acd7a28b54011a943c03ab22a
2020-01-18 17:47:59 +00:00
igaass
491cd8d8ab
scripts: automated test for testing uplink share command (#3736)
* scripts: automated test for testing uplink share command

* Replace "scope" to "access"

* Remove redundant access flag

* Rename variables
Remove retVal variable
2020-01-17 12:57:38 +02:00
littleskunk
b6f1a91c67 scripts/testversions,rollingupgrade: remove encrytion key
Change-Id: I6fd35fa4b29707f53e988bd00d6523b934767ecc
2020-01-16 16:01:34 +00:00
Cameron Ayer
4424697d7f satellite/accounting: refactor live accounting to hold current estimated totals
live accounting used to be a cache to store writes before they are picked up during
the tally iteration, after which the cache is cleared. This created a window in which
users could potentially exceed the storage limit. This PR refactors live accounting to
hold current estimations of space used per project. This should also reduce DB load
since we no longer need to query the satellite DB when checking space used for limiting.

The mechanism by which the new live accounting system works is as follows:

During the upload of any segment, the size of that segment is added to its respective
project total in live accounting. At the beginning of the tally iteration we record
the current values in live accounting as `initialLiveTotals`. At the end of the tally
iteration we again record the current totals in live accounting as `latestLiveTotals`.
The metainfo loop observer in tally allows us to get the project totals from what it
observed in metainfo DB which are stored in `tallyProjectTotals`. However, for any
particular segment uploaded during the metainfo loop, the observer may or may not
have seen it. Thus, we take half of the difference between `latestLiveTotals` and
`initialLiveTotals`, and add that to the total that was found during tally and set that
as the new live accounting total.

Initially, live accounting was storing the total stored amount across all nodes rather than
the segment size, which is inconsistent with how we record amounts stored in the project
accounting DB, so we have refactored live accounting to record segment size

Change-Id: Ie48bfdef453428fcdc180b2d781a69d58fd927fb
2020-01-16 10:26:49 -05:00
littleskunk
0c365d157f
scripts/testversions: replace apikey with access
Change-Id: I4f899cc49b63b2f04f31a6df478475d7bdbab30d
2020-01-16 14:37:31 +01:00
littleskunk
1e77cb88e7
scripts/rollingupgrade: replace apikey with access
Change-Id: I356587cdc417dabc6f15769592d70269d25051dc
2020-01-16 14:00:33 +01:00
Jeff Wendling
78c6d5bb32 satellite/satellitedb: reported_serials table for processing orders
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.

the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.

after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.

since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.

Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
2020-01-15 19:21:21 -07:00