Commit Graph

435 Commits

Author SHA1 Message Date
Michal Niewrzal
c178a08cb8 satellite/metainfo: add max segment size and max inline size to
BeginObject response

We want to control inline segment size and segment size on the satellite
side. We need to return this information to the uplink, as we already do
with the redundancy scheme.

Change-Id: If04b0a45a2757a01c0cc046432c115f475e9323c
2020-04-02 12:41:28 +00:00
Egon Elbre
90319dbec1 scripts: fix test-sim-backwards
Change-Id: I9d6644d4b493f5f6f60a960cacfaac2b5b828a5f
2020-04-02 00:13:42 +02:00
Egon Elbre
644df8dcdc private/version: minimal fix for tag-release.sh
Previous split to a storj.io/private repository broke tag-release.sh
script. This is the minimal temporary fix to make things work.

This links the build information to specified variables and sets them
inline. This approach, of course, is very fragile.

Change-Id: I73db2305e6c304146e5a14b13f1d917881a7455c
2020-04-01 13:46:45 +00:00
Michal Niewrzal
d444fbadea scripts: cleanup rolling upgrade test
* add script for easy rolling upgrade test local execution
* remove unneeded binaries building for rolling upgrade and versions
tests
* unify build process for Jenkins and local execution for rolling
upgrade and versions tests

Change-Id: Ic11211b83f3f447494bbd5827d2af77ea4b20dfe
2020-04-01 12:30:08 +00:00
Michal Niewrzal
0374e30678 scripts: fix storj-sim installation function for rolling upgrade tests
Change-Id: Ifb749e0a41c57b90cc03293c2588b2ceb6a0a15a
2020-03-31 09:11:00 +00:00
Jeff Wendling
e2ff2ce672 satellite: compensation package and commands
Change-Id: I7fd6399837e45ff48e5f3d47a95192a01d58e125
2020-03-30 14:08:14 -06:00
Yingrong Zhao
831668478a scripts/tests: fix gateway installation
Change-Id: I1629644f543505c27d4edb8f7bbe97d037cdc8a8
2020-03-29 19:24:05 +00:00
JT Olio
f28100b73f bump storj.io/private
Change-Id: I4ddd5c34521602967b89bd18e2a71a6f1e29f436
2020-03-27 21:57:35 +00:00
Moby von Briesen
a933bcc99a satellite/repair/repairer/ec.go: add option for downloading pieces onto disk instead of in memory during repair
Add flag to satellite repairer, "InMemoryRepair" that allows the
satellite to decide whether to download the entire segment being
repaired into memory (this is what the satellite already does), or to
download it into temporary files on disk that will be read from in the
upload phase of repair.
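
As a rough illustration of the choice this flag describes, the sketch below picks between an in-memory buffer and a temporary file for holding downloaded data until the upload phase reads it back. The names pieceBuffer, memBuffer, fileBuffer, newPieceBuffer and roundTrip are hypothetical and are not taken from the repairer code; only the InMemoryRepair idea comes from the commit message.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// pieceBuffer is a hypothetical abstraction over where downloaded data
// lives until the upload phase reads it back.
type pieceBuffer interface {
	io.Writer
	Reader() (io.Reader, error) // reader positioned at the start of the data
	Close() error               // releases memory or removes the temp file
}

// memBuffer keeps everything in memory.
type memBuffer struct{ buf bytes.Buffer }

func (m *memBuffer) Write(p []byte) (int, error) { return m.buf.Write(p) }
func (m *memBuffer) Reader() (io.Reader, error)  { return bytes.NewReader(m.buf.Bytes()), nil }
func (m *memBuffer) Close() error                { return nil }

// fileBuffer spills the data to a temporary file on disk.
type fileBuffer struct{ f *os.File }

func (b *fileBuffer) Write(p []byte) (int, error) { return b.f.Write(p) }

func (b *fileBuffer) Reader() (io.Reader, error) {
	if _, err := b.f.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	return b.f, nil
}

func (b *fileBuffer) Close() error {
	closeErr := b.f.Close()
	if err := os.Remove(b.f.Name()); err != nil {
		return err
	}
	return closeErr
}

// newPieceBuffer chooses the backing store from the (assumed) flag value.
func newPieceBuffer(inMemoryRepair bool) (pieceBuffer, error) {
	if inMemoryRepair {
		return &memBuffer{}, nil
	}
	f, err := os.CreateTemp("", "repair-*")
	if err != nil {
		return nil, err
	}
	return &fileBuffer{f: f}, nil
}

// roundTrip writes data (the "download") and reads it back (the "upload").
func roundTrip(inMemoryRepair bool, data []byte) ([]byte, error) {
	buf, err := newPieceBuffer(inMemoryRepair)
	if err != nil {
		return nil, err
	}
	defer buf.Close()
	if _, err := buf.Write(data); err != nil {
		return nil, err
	}
	r, err := buf.Reader()
	if err != nil {
		return nil, err
	}
	return io.ReadAll(r)
}

func main() {
	for _, inMemory := range []bool{true, false} {
		out, err := roundTrip(inMemory, []byte("segment data"))
		fmt.Println(inMemory, string(out), err)
	}
}
```

Both paths return the same bytes; the on-disk variant trades speed for a smaller memory footprint per repair worker.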

This should help with handling high repair traffic on satellites that
cannot afford to spend 64mb of memory per repair worker.

Updates tests to test repair for both in memory and to disk.

Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194
2020-03-27 16:41:00 +00:00
Natalie Villasana
8e0ca0e6f5
satellite/gc: update release default for gc to run separately (#3830) 2020-03-26 14:44:18 -04:00
Michal Niewrzal
3a3648c80b scripts: add script for running versions tests locally
Change-Id: Iaa2a2d085b7edd46f63a0d79b4e731ea72412953
2020-03-24 15:44:00 +00:00
Michal Niewrzal
fdf40a7526 storj: remove storj/private/version package which was moved to
`storj/private` repo

Change-Id: I81c3f5b9d5e4fe7bca760999eb045ee9734e5e2e
2020-03-24 14:31:33 +00:00
Egon Elbre
6a7571f73e cmd/s3-benchmark: move to storj.io/benchmark
Change-Id: Idca2b836bdf876ca28eb5cabc9bfae1d576e4a3e
2020-03-23 19:09:42 +02:00
JT Olio
3b66ba6f02 scripts/tests: uplink no longer respects --client.segment-size
this is going to make all the tests slower but it is what it is

test-sim-aws.sh is removed because it was moved to storj/gateway repo.

Change-Id: I10727e747a4c3740b1c9054ce7d17313b4fa310b
2020-03-22 17:48:57 +00:00
Jennifer Johnson
699b635e5d satellite/overlay: rename newNodePercentage to newNodeFraction
Change-Id: Ie66de91f88183b44de0773589e83e4ade9aa997a
2020-03-19 20:09:32 +00:00
Jessica Grebenschikov
5142874144 satellite/gc: move garbage collection to its own process
Change-Id: I7235aa83f7c641e31c62ba9d42192b2232dca4a5
2020-03-18 16:44:01 +00:00
Egon Elbre
09e0f3de63 satellite/metainfo/piecedeletion: add Service
Change-Id: Id7e32ed569701fa0be66f9527c43a67052994570
2020-03-18 14:50:08 +00:00
Matt Robinson
bd4982e249
test/backwards-compatibility: Exclude rc tags from testing (#3787)
Change-Id: Id486709306c5c75a262ea17410ae641be53df8ed

Co-authored-by: Ivan Fraixedes <ivan@fraixed.es>
Co-authored-by: littleskunk <jens.heimbuerge@googlemail.com>
2020-03-18 00:12:05 +01:00
littleskunk
b10b69d9ce
test/rollingupgrade: fix stage 1 release version (#3810) 2020-03-17 22:30:07 +01:00
littleskunk
80acf33abc
script/release: fix error in regex (#3809)
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
2020-03-17 17:36:17 +01:00
Stefan Benten
49a30ce4a7
satellite/payments: Set proper defaults for the release (#3806)
* Slight adjustments to the migration

Change-Id: I68ae81c010c3414fde2845df16ab124f8d17834b

* Change Coupon Value

Change-Id: I0f241d09e5f716f1d1b3f0688643ba7f614d83c4

* Change AlphaUsage to 5GB

Change-Id: I5d25c6b5750684510cda8b14a27f38d5b2b07408

* change config lock

Change-Id: Ib7c7a54555ba2387c9aa8dd60a0501b0ee6491dd

* Use Scan properly

Change-Id: Ie39cf4644e3ddd703a254e2f5e616763dd805235

* Fix Config Lock

Change-Id: I558ecc1c1becfaaefc7aea5ad2fe83fd6bf6b561
2020-03-16 22:53:12 +01:00
Stefan Benten
52590197c2
satellite/payments: More Cleanup and Satellite command to ensure we have stripe customers (#3805) 2020-03-16 20:34:15 +01:00
Kaloyan Raev
4f0bf3fe1d
build: cleanup more gateway targets from Makefile (#3802)
Change-Id: Ia95caa2187b3e9e056a83cbea4230788ed4e8abd

Co-authored-by: Michal Niewrzal <michal@storj.io>
2020-03-16 15:07:52 +01:00
Stefan Benten
bd603c0751
satellite/payments: Improve Invoice Generation (#3800) 2020-03-13 17:07:39 +01:00
Yingrong Zhao
3cf05b24d8 scripts: add benchmark test for delete operation
Change-Id: I448cc50375c1c712d704d8cf93f5b8481372b0c8
2020-03-12 17:03:19 +00:00
JT Olio
051569c69f
satellite: enable open registration (and add flag that disables it) SM-441
Change-Id: I47bfedb312089f6d2bfbab013bd74ad4b8aa5f5e
2020-03-11 03:53:34 +01:00
Yingrong Zhao
1a875baa1d scripts/tests: fix arguments orders passed to test-versions.sh
Change-Id: I2e3b501477823413137f6a43fcdc4347af27e43c
2020-03-11 13:50:18 +00:00
Yingrong Zhao
46b04a38cc scripts/tests/: fix uplink access to contain satellite id
Change-Id: I7dfb6bc2da3bf84a81d75a286ee9488e32ea4f01
2020-03-10 21:15:39 +00:00
Michal Niewrzal
c20cf25f35 cmd: migrate uplink CLI to new API
Change-Id: I8f8fcc8dd9a68aac18fd79c4071696fb54853a60
2020-03-09 13:26:29 +00:00
littleskunk
842c8d8ed9
scripts/tests/rollingupgrade: fix installation for current commit 2020-03-06 17:19:55 -05:00
littleskunk
8fa8178f04
release/rollingupgrade: on release tags run rolling upgrade against previous release (#3792)
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
2020-03-03 23:56:32 +01:00
Michal Niewrzal
fb2711d05e scripts: update postgres helper script to set password
Latest postgres docker image requires non empty password.

Change-Id: I03017e1b7ff4803fefc24c39087d9ccd4042373b
2020-02-27 10:33:37 +00:00
Yingrong Zhao
ac34485f5d scripts/tests: install correct version of gateway
1. only run release tags that don't contain 'rc'
2. install gateway version that's the same as satellite
3. update gateway access to contain satellite id

Change-Id: I8ca1418302c3aafdf0c4eaaf8361422a1eec2bd4
2020-02-26 13:12:31 +00:00
Jessica Grebenschikov
e19e3c1101 pkg/process:
Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are. We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP stackdriver has support for continuous profiling so let's set that up and see if it is helpful to gather more data.

This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console.

To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console

The current implementation assumes the workload is running in GKE; however, if we find it useful we can add support to enable this from anywhere. For simplicity, it's configured this way, assuming the main goal is to enable it in production systems.

Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
2020-02-25 09:04:23 -08:00
paul cannon
92d86fa044 satellite/repair: fix repair concurrency
This new repair timeout (configured as TotalTimeout) will include both
the time to download pieces and the time to upload pieces, as well as
the time to pop the segment from the repair queue.
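
A minimal sketch of a single deadline shared by several stages, assuming the stages are run sequentially under one context. The helper names (stage, runWithTotalTimeout, simulated, repairTimedOut) are made up for illustration; only the idea of one total timeout covering queue pop, download and upload comes from the message above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// stage is one step of a hypothetical repair job (queue pop, download, upload).
type stage func(ctx context.Context) error

// runWithTotalTimeout runs all stages under one shared deadline, so time
// spent in an early stage reduces what is left for the later ones.
func runWithTotalTimeout(parent context.Context, totalTimeout time.Duration, stages ...stage) error {
	ctx, cancel := context.WithTimeout(parent, totalTimeout)
	defer cancel()
	for _, s := range stages {
		if err := s(ctx); err != nil {
			return err
		}
	}
	return nil
}

// simulated returns a stage that takes d to finish unless the context ends first.
func simulated(d time.Duration) stage {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// repairTimedOut runs three equal stages and reports whether the shared
// deadline was exceeded. Durations are given in milliseconds.
func repairTimedOut(totalMs, perStageMs int) bool {
	total := time.Duration(totalMs) * time.Millisecond
	per := time.Duration(perStageMs) * time.Millisecond
	err := runWithTotalTimeout(context.Background(), total,
		simulated(per), simulated(per), simulated(per))
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(repairTimedOut(60, 40))   // each stage fits alone, the sum does not
	fmt.Println(repairTimedOut(1000, 10)) // all three stages fit
}
```

With a per-stage timeout each 40 ms stage would pass a 60 ms limit; with a total timeout the second stage is cut off.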

This is a move from Github PR #3645.

Change-Id: I47d618f57285845d8473fcd285f7d9be9b4318c8
2020-02-24 19:57:09 +00:00
Jeff Wendling
f671eb2beb satellite/satellitedb: use queue for orders to get back fast billing
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue rather than database driven.

For more details, see the added fast billing changes blueprint.

It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.

Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35
2020-02-24 17:07:07 +00:00
Egon Elbre
e30f7b35b6 cmd/gateway: use a separate repository
Change-Id: Idbb0b2b6cf0e60c6d5d91218c24524d72285cf26
2020-02-24 10:03:03 +02:00
Michal Niewrzal
54e38b8986 pkg/miniogw: gateway implementation with new libuplink
Change-Id: I170c3a68cfeea33b528eeb27e6aecb126ecb0365
2020-02-21 16:20:38 +01:00
Yingrong Zhao
77f67a8086 satellite/metainfo: add timeout for delete request
Change-Id: I9cad6d7ea185fc2c0ed4e58b42e4e3a78178a79f
2020-02-20 09:10:16 +00:00
JT Olio
2ae9978304 satellite/gc: skip first gc run
rationale: if GC kills the satellite, it would be nice to make
it through a repair checker sweep first

Change-Id: Id56171dc8e13940cfb6481e36a910bad077a01ed
2020-02-13 13:41:15 +02:00
littleskunk
76849558cb satellite/gracefulexit: increase performance and tolerate higher error
rate

Graceful exit is very slow at the moment. Over the last couple of days we
increased the batch size on Stefan's satellite to 1000, but as a side
effect the error rate increased. With a batch size of 500 the error
rate looks stable.
This PR will increase the default batch size to 300. Graceful exit
will still be painfully slow but at least it will be a bit faster. At the
same time this PR also increases the number of errors we tolerate. We
don't want to DQ slow storage nodes just because they didn't finish all
300 transfers in time. We want to give them more retries.

Change-Id: I92e3f99e116d4988457d8b902a88e85ed1bcc1a7
2020-02-12 11:40:15 +00:00
Egon Elbre
dbf46c4aa7 satellite/admin: administrative endpoint
Admin server allows creating basic REST and HTML APIs
for different administrative tasks.

Change-Id: I3dc1786abe1c87350eed60ec90e48130f44e63cf
2020-02-12 12:12:50 +02:00
Cameron Ayer
b22bf16b35 satellite/overlay: add config flag for node selection free disk requirement
Currently SNs report their free disk space once per hour. If a node
becomes full, it has to wait until the next contact cycle begins to
report; all the while receiving and failing upload requests. By increasing
the minimum required disk space, we can give the storage nodes more time
to report their space before they completely fill up. This change goes
hand-in-hand with another change we want to implement: trigger capacity
report on SN immediately upon falling below threshold.
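
To make the effect of the threshold concrete, here is a small sketch that filters candidate nodes by their last reported free space. The node type, the selectableNodes function and the byte values are assumptions for illustration; the real flag name and the selection query are not shown in this log.

```go
package main

import "fmt"

// node is a hypothetical view of what the satellite knows about a storage node.
type node struct {
	ID       string
	FreeDisk int64 // bytes, as last reported by the node
}

// selectableNodes keeps only nodes whose last reported free space meets the
// configured minimum. A higher minimum drops nearly full nodes earlier,
// before their next report arrives.
func selectableNodes(nodes []node, minimumFreeDisk int64) []node {
	var out []node
	for _, n := range nodes {
		if n.FreeDisk >= minimumFreeDisk {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	const mb = int64(1) << 20
	nodes := []node{
		{ID: "a", FreeDisk: 50 * mb},
		{ID: "b", FreeDisk: 200 * mb},
		{ID: "c", FreeDisk: 900 * mb},
	}
	fmt.Println(len(selectableNodes(nodes, 10*mb)))  // low threshold keeps all three
	fmt.Println(len(selectableNodes(nodes, 100*mb))) // higher threshold drops node "a"
}
```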

Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27
2020-02-11 18:08:25 +00:00
Qweder93
dc075eaa96 satellite/payments : deposit bonuses (credits) added
Change-Id: Ib151bbb9b02d655fa619c53bfbc04ed6f3bb39e0
2020-02-11 11:11:42 +00:00
Moby von Briesen
8c19855871 scripts/tests/rollingupgrade: explicitly set debug port for old
satellite api during rolling upgrade test

The old api is using the same config file as the new satellite in the
rolling upgrade test, so we need to set it to something different so
that there is no conflict when we spin up a new storj-sim instance while
the old api is running concurrently.

Change-Id: Ia4ec2db4953f36f43275495710992831ad3916a2
2020-01-29 18:32:03 -05:00
Egon Elbre
a2b2bc676b pkg/debug: implement control panel
The control panel allows controlling different chores and services.
Currently this adds controlling of cycles.

Change-Id: I734f1676b2a0d883b8f5ba937e93c45ac1a9ce21
2020-01-29 16:30:31 -05:00
littleskunk
e0cb8037c1 satellite/projectusage: reduce usage limit from 5GB to 0GB
Change-Id: Ie3d2509613e7a4336e2a8d2b136b32f5f308aafc
2020-01-29 20:38:39 +00:00
Ethan
149273c63f satellite/metainfo: add cache expiration for project level rate limiting
Allow rate limit project cache to expire so we can make project level rate limit changes without restarting the satellite process.
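
A minimal sketch of an expiring cache entry, assuming a fixed time-to-live per project and a loader that reads the current limit from somewhere authoritative. The type and function names (limitCache, cachedLimit, newLimitCache, demo) are hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// cachedLimit is a rate limit value together with the time it stops being valid.
type cachedLimit struct {
	limit   int
	expires time.Time
}

// limitCache caches per-project limits for ttl and reloads them afterwards.
type limitCache struct {
	ttl     time.Duration
	load    func(projectID string) int // assumed to read the authoritative value
	entries map[string]cachedLimit
}

func newLimitCache(ttl time.Duration, load func(string) int) *limitCache {
	return &limitCache{ttl: ttl, load: load, entries: make(map[string]cachedLimit)}
}

// get returns the cached limit while it is fresh and reloads it once expired.
func (c *limitCache) get(projectID string, now time.Time) int {
	if e, ok := c.entries[projectID]; ok && now.Before(e.expires) {
		return e.limit
	}
	limit := c.load(projectID)
	c.entries[projectID] = cachedLimit{limit: limit, expires: now.Add(c.ttl)}
	return limit
}

// demo changes the underlying limit after the first read and shows that the
// change becomes visible only after the entry expires.
func demo() [3]int {
	current := 10
	c := newLimitCache(time.Minute, func(string) int { return current })
	base := time.Unix(0, 0)
	first := c.get("project-a", base)
	current = 20
	cached := c.get("project-a", base.Add(30*time.Second))
	refreshed := c.get("project-a", base.Add(2*time.Minute))
	return [3]int{first, cached, refreshed}
}

func main() {
	fmt.Println(demo()) // [10 10 20]
}
```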

Change-Id: I159ea22edff5de7cbfcd13bfe70898dcef770e42
2020-01-29 16:14:10 +00:00
Yaroslav Vorobiov
083b396c16 satellite/payments: allow floating point numbers for pricing
Change-Id: I78b60134cf043746efef5371b761939a10f75aaf
2020-01-28 22:52:13 -05:00
littleskunk
a0c9f7f3b0
satellite/projectusage: reduce usage limit from 25GB to 5GB
Change-Id: I2819012b520fd687ab8058000aa38d76b8208158
2020-01-29 04:01:09 +01:00
littleskunk
a6c6440ab7 satellite/order: decrease expire time from 7 days to 2 days
For the last few months we had no issues with order submission. I would
call it stable and now it is time to risk a lower expire time. This will
increase the database performance on the satellite and it will reduce
the delay for billing.

The long term goal is 6h but for that step we need to change graceful
exit first. At the moment storage nodes would get disqualified for not
transferring all pieces in less than 6 hours.

Change-Id: I421a2c2421c5374c4e706e2338f1c2161fedc14c
2020-01-24 23:37:39 +00:00
Yingrong Zhao
5de4f66553 scripts/tests: change multisegment file to be 128kb
To cover a special case: an object that has 2 remote segments
and 1 inline segment.

Change-Id: Ia8d82bb67fc6cf76af9c7f44cd738cab6df591e9
2020-01-22 17:12:11 +00:00
Michal Niewrzal
6502454947 satellite/metainfo: move RS configuration to satellite
With this change RS configuration will be set on satellite. Uplink will
get RS values with the BeginObject request and will use them. For backward
compatibility and to avoid a super large change, the redundancy scheme stored
with the bucket is not touched. This can be done in the future.

Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff
2020-01-22 09:33:53 +00:00
Moby von Briesen
d32626fe8c scripts/tests: update uplink config migration for test versions
Updates config migration to occur for any v0.30.x release rather than
specifically 30.4

Also updates the config for the rolling upgrade test to use 64 kib
segments, and use smaller files for the final upload of rolling upgrade.

Change-Id: I941f77fe2b9011b45f28a5f3a2430e882d2ae6b3
2020-01-21 11:50:13 -05:00
Ethan
21a5d70a83 satellite/metainfo: Rate limiting - API requests
Limits how many times metainfo APIs can be called per second by project ID. If limit is exceeded, the API will return Unauthorized/Too Many requests.

Limit per second and the size of the limiter cache per project are configurable, as well as whether the limiter is enabled.
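
One common way to implement a per-project requests-per-second limit is a token bucket keyed by project ID. The sketch below is that technique under stated assumptions, not the satellite's implementation: the names projectLimiter, bucket, allow and simulate are hypothetical, the clock is passed in explicitly, and the map of buckets is unbounded, whereas the commit mentions a configurable cache size.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// bucket tracks the tokens left for one project.
type bucket struct {
	tokens float64
	last   time.Time
}

// projectLimiter is a token bucket per project ID.
type projectLimiter struct {
	mu        sync.Mutex
	perSecond float64 // refill rate
	burst     float64 // bucket capacity
	buckets   map[string]*bucket
}

func newProjectLimiter(perSecond float64, burst int) *projectLimiter {
	return &projectLimiter{perSecond: perSecond, burst: float64(burst), buckets: make(map[string]*bucket)}
}

// allow refills the project's bucket for the elapsed time and spends one
// token if available. It returns false when the request should be rejected.
func (l *projectLimiter) allow(projectID string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[projectID]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[projectID] = b
	}
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(l.burst, b.tokens+elapsed*l.perSecond)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// simulate issues one request per offset (milliseconds from a fixed start)
// for a single project and records which requests were allowed.
func simulate(perSecond float64, burst int, offsetsMs []int) []bool {
	l := newProjectLimiter(perSecond, burst)
	base := time.Unix(0, 0)
	out := make([]bool, 0, len(offsetsMs))
	for _, ms := range offsetsMs {
		at := base.Add(time.Duration(ms) * time.Millisecond)
		out = append(out, l.allow("project-a", at))
	}
	return out
}

func main() {
	// Three requests at once, then one a second later.
	fmt.Println(simulate(2, 2, []int{0, 0, 0, 1000})) // [true true false true]
}
```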

Tests added/updated for the new rate_limit field in projects table.
Tests added for exceeding limits and disabling limiter.

Change-Id: Ic8ad102de3b690a475809d4f684156d5715f20fa
2020-01-21 14:25:04 +00:00
Moby von Briesen
0def7a9d2a scripts/tests/testversions;scripts/tests/rollingupgrade: update test versions script
Fix uplink setup step for uplink versions that require an access field.

Update how script selects uplink versions to test.

Use significantly smaller remote files for test (performance).

Change-Id: If590b8798767e2a0621fb84cd3b8852d02f6d1da
2020-01-20 11:46:11 -05:00
stefanbenten
f4097d518c satellite: reduce logging of node status
Change-Id: I6618cf4bf31b856acd7a28b54011a943c03ab22a
2020-01-18 17:47:59 +00:00
igaass
491cd8d8ab
scripts: automated test for testing uplink share command (#3736)
* scripts: automated test for testing uplink share command

* Replace "scope" to "access"

* Remove redundant access flag

* Rename variables
Remove retVal variable
2020-01-17 12:57:38 +02:00
littleskunk
b6f1a91c67 scripts/testversions,rollingupgrade: remove encryption key
Change-Id: I6fd35fa4b29707f53e988bd00d6523b934767ecc
2020-01-16 16:01:34 +00:00
Cameron Ayer
4424697d7f satellite/accounting: refactor live accounting to hold current estimated totals
live accounting used to be a cache to store writes before they are picked up during
the tally iteration, after which the cache is cleared. This created a window in which
users could potentially exceed the storage limit. This PR refactors live accounting to
hold current estimations of space used per project. This should also reduce DB load
since we no longer need to query the satellite DB when checking space used for limiting.

The mechanism by which the new live accounting system works is as follows:

During the upload of any segment, the size of that segment is added to its respective
project total in live accounting. At the beginning of the tally iteration we record
the current values in live accounting as `initialLiveTotals`. At the end of the tally
iteration we again record the current totals in live accounting as `latestLiveTotals`.
The metainfo loop observer in tally allows us to get the project totals from what it
observed in metainfo DB which are stored in `tallyProjectTotals`. However, for any
particular segment uploaded during the metainfo loop, the observer may or may not
have seen it. Thus, we take half of the difference between `latestLiveTotals` and
`initialLiveTotals`, and add that to the total that was found during tally and set that
as the new live accounting total.
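
The arithmetic in this paragraph can be sketched as follows. The map names mirror the ones quoted in the message; the function name reconcile, the use of plain string project IDs, and the handling of projects missing from one of the maps are assumptions made for the example.

```go
package main

import "fmt"

// reconcile computes the new live accounting totals after a tally run:
// the tally result plus half of what was uploaded while the loop was running.
// Half is used because any given segment uploaded during the loop may or may
// not have been seen by the observer.
func reconcile(tallyProjectTotals, initialLiveTotals, latestLiveTotals map[string]int64) map[string]int64 {
	out := make(map[string]int64)
	for project, tally := range tallyProjectTotals {
		out[project] = tally
	}
	for project, latest := range latestLiveTotals {
		delta := latest - initialLiveTotals[project] // uploaded during the loop
		out[project] += delta / 2
	}
	return out
}

func main() {
	tally := map[string]int64{"project-a": 100}
	initial := map[string]int64{"project-a": 90}
	latest := map[string]int64{"project-a": 110}
	// 20 bytes arrived during the loop, so 10 are added to the tally of 100.
	fmt.Println(reconcile(tally, initial, latest)["project-a"]) // 110
}
```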

Initially, live accounting was storing the total stored amount across all nodes rather than
the segment size, which is inconsistent with how we record amounts stored in the project
accounting DB, so we have refactored live accounting to record segment size.

Change-Id: Ie48bfdef453428fcdc180b2d781a69d58fd927fb
2020-01-16 10:26:49 -05:00
littleskunk
0c365d157f
scripts/testversions: replace apikey with access
Change-Id: I4f899cc49b63b2f04f31a6df478475d7bdbab30d
2020-01-16 14:37:31 +01:00
littleskunk
1e77cb88e7
scripts/rollingupgrade: replace apikey with access
Change-Id: I356587cdc417dabc6f15769592d70269d25051dc
2020-01-16 14:00:33 +01:00
Jeff Wendling
78c6d5bb32 satellite/satellitedb: reported_serials table for processing orders
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.

the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.
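
A small sketch of the rounding step described above, assuming day boundaries are taken in UTC and that a timestamp exactly at midnight also moves to the following day. The function names roundToNextDay and bucketFor are illustrative and not taken from the satellitedb code.

```go
package main

import (
	"fmt"
	"time"
)

// roundToNextDay moves an expiration time forward to the start of the next
// UTC day, so every order expiring on the same day shares one key prefix.
func roundToNextDay(t time.Time) time.Time {
	t = t.UTC()
	// time.Date normalizes an out-of-range day, e.g. January 32 becomes February 1.
	return time.Date(t.Year(), t.Month(), t.Day()+1, 0, 0, 0, 0, time.UTC)
}

// bucketFor parses an RFC 3339 timestamp and returns its rounded form.
func bucketFor(rfc3339 string) string {
	t, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		panic(err)
	}
	return roundToNextDay(t).Format(time.RFC3339)
}

func main() {
	// Two different expirations on the same day land in the same bucket.
	fmt.Println(bucketFor("2020-01-15T03:00:00Z")) // 2020-01-16T00:00:00Z
	fmt.Println(bucketFor("2020-01-15T19:21:21Z")) // 2020-01-16T00:00:00Z
}
```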

after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.

since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.

Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
2020-01-15 19:21:21 -07:00
Michal Niewrzal
c8ccd26e04 cmd/uplink: import imports 'access' into existing configuration
https://storjlabs.atlassian.net/browse/V3-3491

Change-Id: I9c5f649ded314bb3a2235588c746913a3ec2d203
2020-01-14 13:18:48 +00:00
Isaac Hess
4950d7106a satellite/orders: Add write cache for bw rollups
Change-Id: I8ba454cb2ab4742cafd6ed09120e4240874831fc
2020-01-13 22:40:51 +00:00
Michal Niewrzal
36db00b2bf cmd/uplink: don't require setup or import if --access is set
We want to make using uplink as easy as possible. That's why we want to
avoid requiring the setup or import command before normal usage if the user
specified the --access flag. If this flag is set, the remaining flags should be
set to their defaults.

https://storjlabs.atlassian.net/browse/V3-3490

Change-Id: I95a7bd77a3f00b8d9981fee513e9e77aef298bca
2020-01-11 07:47:53 +00:00
Jeff Wendling
77fd41a02e satellite: add an expiring lru cache around api keys
Change-Id: I995429c66affd33da59b091f28f09ca122070b5e
2020-01-09 22:13:41 -07:00
Natalie Ventura Villasana
6b1829f3c3
satellite/downtime: new chore estimates downtime
Adds EstimationChore to the downtime package, which is an
independent chore that finds offline nodes given a configurable
limit, then uptime checks those nodes, and sets a last contact
success or failure given a response. For failed nodes, the chore
updates the amount of downtime the node has been offline in the
DowntimeTracking table.

Design doc section: https://github.com/storj/storj/blob/master/docs/blueprints/storage-node-downtime-tracking.md#estimating-offline-time
Jira: https://storjlabs.atlassian.net/browse/V3-2545

Change-Id: I60af95803930bf9b33232b248bb20cca6f0e0b5f
2020-01-09 15:05:13 -05:00
Yingrong Zhao
76ee8a1b4c satellite: remove UptimeReputation configs from codebase
With the new storage node downtime tracking feature, we need to remove the current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and
UptimeReputationDQ. This is the first step of removing the uptime
reputation columns from satellitedb.

Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36
2020-01-08 18:54:15 +00:00
Egon Elbre
fb4b11d13e
scripts: remove old scripts (#3742) 2020-01-07 13:28:41 +02:00
littleskunk
6861f28bbf release/script: allow RC release tags
Change-Id: I635f4579e990b638c6579318dee632dce15e3cf1
2020-01-07 10:43:03 +00:00
Moby von Briesen
ea84af578b scripts/tests/rollingupgrade: create new test files for final upload
stage

The test-versions script no longer uses the `testfiles` directory, which
the final upload for the rolling-upgrade script depended on. This change
creates and populates a `testfiles` directory during the final upload
stage of the rolling upgrade test.

Change-Id: Iabeccbadc55a8c85a1febbd5eb4e7d889a57a8dc
2020-01-06 12:31:12 -05:00
Yingrong Zhao
07a1702f41 scripts/tests/rollingupgrade: fix test-versions.sh path reference
Change-Id: I5c696e5d38c087c50f025796e2f48876883d0f4a
2020-01-04 19:42:15 -05:00
Yingrong Zhao
71c5c2213f scripts/tests/testversions: make binary installation and upload/download running in parallel
Change-Id: I16d87f7e16e2daf30e4d7ee5490b76c175b06930
2020-01-04 16:39:45 +00:00
Jeff Wendling
29fe206b9a satellite/gc: add timeout to retain requests
We don't want slowloris nodes to be able to indefinitely block
up the satellite, so add a timeout. Some monitoring inspection
showed the largest success times being on the order of 30s, so
a 1min timeout should be sufficient to kill the misbehaving nodes.

Change-Id: I5e2c3480a15f6304e37262d0a4d30d07eae99bb3
2020-01-03 21:46:46 +00:00
Simon Guindon
e1e7cebe49 satellite/metainfo: added rate limiting support to the metainfo loop.
As discussed, we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting and burst limiting if need be in the future.

The default for this rate limiting is still no limits so it stays the
same as our previous functionality.

Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
2020-01-03 15:00:29 -05:00
Ethan
05b406e992 satellite:{downtime,overlay}: Implement offline node detection chore
https://storjlabs.atlassian.net/browse/V3-3398

Change-Id: I598c3bad819026377d1d113c099dc9bba8b02742
2020-01-03 17:10:03 +00:00
Moby von Briesen
e34ac3ef3a ci,scripts/tests/rolling-upgrade: run rolling upgrade test on private jenkins
Change-Id: Ic1c9f7539ee0ac371bcb856bdbcac2ff6c0ccc65
2020-01-02 16:27:41 -05:00
Moby von Briesen
aecea820fc scripts: add rolling upgrade test script
Change-Id: Ibf79c8e40da54520ce17e2e1f66124c117b32b53
2020-01-02 13:38:56 -05:00
Natalie Ventura Villasana
aa3e183c2e
satellite/gracefulexit: add ge eligibility check
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.
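
The eligibility check can be sketched as a date comparison. The config name NodeMinAgeInMonths and the CreatedAt field come from the message above; the function names, the calendar-month arithmetic via AddDate, and the strict "older than" boundary are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// eligibleForGracefulExit reports whether a node created at createdAt is
// older than nodeMinAgeInMonths calendar months at time now.
func eligibleForGracefulExit(createdAt, now time.Time, nodeMinAgeInMonths int) bool {
	earliest := createdAt.AddDate(0, nodeMinAgeInMonths, 0)
	return now.After(earliest)
}

// mustDate parses a YYYY-MM-DD date in UTC and panics on malformed input.
func mustDate(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := mustDate("2019-12-31")
	fmt.Println(eligibleForGracefulExit(mustDate("2019-01-01"), now, 6)) // true: almost a year old
	fmt.Println(eligibleForGracefulExit(mustDate("2019-10-01"), now, 6)) // false: three months old
}
```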

https://storjlabs.atlassian.net/browse/V3-3357

Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
2019-12-31 09:31:58 -05:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Egon Elbre
ea455b6df0 all: remove code to default to grpc
We have moved to drpc so we don't need to have code for building
with grpc only.

Change-Id: I55732314dca0d5b4ce1132b68de4186a15d91b21
2019-12-20 20:12:04 +02:00
Yingrong Zhao
c6854accdf scripts: add test-versions stage to private Jenkins
test-sim-versions.sh tests upgrading the satellite, storagenodes, and uplinks from the most recent release to master, and ensures that compatibility across all uplink versions since v0.15 is maintained.

Change-Id: I80a54236d0eb2d681716caf4b825a883bdc25ef1
2019-12-20 15:52:54 +00:00
Egon Elbre
ef8bc88328 ci: use external repository
Change-Id: If26a005df45f6067240511d603fb4dd613f92b79
2019-12-19 12:05:49 +00:00
littleskunk
d5c5b57fac satellite/db: enable DeleteTallies
Change-Id: I1e2a6873b3e6398260e053592d676993272b960d
2019-12-18 13:16:06 +00:00
Yingrong Zhao
1a625887ed scripts: Add script to automate testing against all highest release
points from major releases starting from v0.15.4 for uplink

Change-Id: I7a3a300466691a47b0324ee5440d70cac42df641
2019-12-17 17:47:17 +00:00
Simon Guindon
a47d7ac89b scripts: Add script that filters postgres plaintext backup to cockroachdb compat.
Change-Id: I457e8f0566186fc76b7ae61db77c01153c3e1079
2019-12-16 22:11:16 +00:00
Simon Guindon
8242eecea6 Adding benchmarking script that reports response times.
Change-Id: Ide6f439849ec51cd41f491eb3ff00a7ad0f8a560
2019-12-16 16:34:01 -05:00
Andrew Harding
cb89496569 storagenode/trust: wire up list into pool
- also updated ping chore to pick up trust changes
- fixed small typo in blueprint
- fixed flags for storj-sim
- wired up changes to testplanet

Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d
2019-12-13 20:32:50 +00:00
Jess G
4f282921c4
jenkins: run storj-sim integration tests with cockroachdb (#3723)
* add integration tests to jenkins

* have jenkins run storj-sim integration tests w/crdb

Change-Id: I696d55c5894aaf630dcd7a566e1dd705ee88486b

* rm crdb integration tests to see if postgres passes

Change-Id: I1727a027ff802acbff5fc55961a0d605faefcf2d

* comment out aws tests to see if that is the error

Change-Id: I456c3d36f6a4ce7760ea0b6c402b6ea16cfe77e3

* add aws profile to integration tests

Change-Id: Ic01185dbc7b84ac48dfb846f8f272b34b50379b6

* add tmp path for aws profile and creds

Change-Id: I7b82ee5a99937edd3f66ae01bfb5cb21028a62cf

* change linux KiB syntax to bytes to support osx

Change-Id: Ia1f1027ba8da64a6ba537062deb9b3519973621f
2019-12-10 11:18:02 -08:00
littleskunk
71b58edb2c satellite/repair: decrease repair interval
Change-Id: Id9efdbfaa82521c35dc41e7a52b700522c197e77
2019-12-10 00:36:00 +00:00
littleskunk
6ab72a6e79 satellite/gracefulexit: enable graceful exit in production
Change-Id: I526ce4a4de9c318f1333b793e3167f5f86d65adc
2019-12-09 17:32:34 +00:00
Malcolm Bouzi
18a5e614d9 satellite/web: add segmentio plugin (#3405) 2019-11-27 11:57:59 -05:00
Yingrong Zhao
63e51df9a6
private/testplanet: add a mock referral manager server into testplanet (#3631) 2019-11-21 17:34:49 -05:00
Matt Robinson
976881f72b
satellite/console: Add security headers (#3615)
* satellite/console: Add X-Frame-Options and Referrer-Policy security headers

* Update to use CSP instead of XFO and include tardigrade.io

* Make FrameAncestors a config option

* Update satellite-config lock

* Make help text for FrameAncestors better
2019-11-21 11:15:22 -05:00
Matt Robinson
b5707d1e5d
scripts: make update-tools.sh more verbose (#3572)
* Make update-tools.sh more verbose

* Was checking the wrong filehandle
2019-11-20 09:41:06 -05:00
littleskunk
c52c7275ad
satellite/repair: reduce upload timeout (#3597) 2019-11-18 18:52:56 +01:00
Nikolai Siedov
3fe518d547
satellite: added ability to inject stripe public key post build (#3560) 2019-11-18 13:38:43 +02:00
Jeff Wendling
ecd2ef4a21 all: build release fully drpc and test in mixed mode
Change-Id: I3bded3edf25a0b113601c8b12ecf1337f596649b
2019-11-15 10:03:18 -07:00
Yaroslav Vorobiov
53c6741ba6
satellite/payments: add API for retrieving conversion ratio, convert tokens to USD before applying to balance (#3530) 2019-11-15 16:59:39 +02:00
Yehor Butko
a8e4e9cb03
satellite/payments: project usage charges (#3512) 2019-11-15 16:27:44 +02:00
Egon Elbre
ee6c1cac8a
private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00
Matt Robinson
b2a7a9f4c4 scripts: add script to update tools (#3570) 2019-11-14 16:53:32 +02:00
Natalie Villasana
1a9757a7f2 satellite/gracefulexit: add count for order limits sent from satellite to exiting node (#3544) 2019-11-13 09:54:50 -05:00
Egon Elbre
994a69cfdc jenkins: use lower segment size for back comp test (#3097) 2019-11-06 05:53:38 -08:00
Egon Elbre
3b18c864dc
test/backwards-compatibility: fix port change (#3509) 2019-11-06 13:46:54 +02:00
Yaroslav Vorobiov
0b32690d0a satellite/peer: add payments config (#3488)
* satellite/peer: add payments config

* remove stripe-key from console config

* update config lock

* fix imports

* fix config-lock
2019-11-05 21:26:19 +01:00
littleskunk
def3dcbaa9
satellite/audit: increase timeout to 5 minutes (#3480)
* satellite/audit: increase timeout to 5 minutes

* fix lint error
2019-11-05 11:21:25 +01:00
Jess G
5abb91afcf
satellite: change the Peer name to Core (#3472)
* change satellite.Peer name to Core

* change to Core in testplanet

* missed a few places

* keep shared stuff in peer.go to stay consistent with storj/docs
2019-11-04 11:01:02 -08:00
Jennifer Li Johnson
76b64b79ba
cmd/identity: allow using redis for RevocationDB (#3259) 2019-11-01 13:27:47 -04:00
Maximillian von Briesen
590312970d satellite/gracefulexit: add flag for enabling/disabling graceful exit on the satellite (#3437) 2019-11-01 16:21:24 +02:00
Maximillian von Briesen
d9bb25b4b9 satellite/metainfo: support a wider range of values for RS.Total in satellite metainfo validation (#3431)
change uplink RS default configuration from 130 to 95
2019-10-31 15:04:33 -04:00
Michal Niewrzal
acc7b116aa
scripts: use postgres script with all tests (#3404) 2019-10-31 07:03:54 -07:00
Yingrong Zhao
bfa6699e2c
satellite/repair: add timeout for repair download from a single node (#3418) 2019-10-30 16:31:08 -04:00
Jess G
e96d615013
satellite: remove satellite API code from peer (#3414)
* rm dup api code from sa peer, update storj-sim

* fix for backwards compat tests

* use env var instead of localhost

* changes per CR

* fix env var name

* skip peer for setup
2019-10-30 12:23:09 -07:00
Natalie Villasana
4878135068
satellite/gracefulexit, storagenode/gracefulexit: add timeouts (#3407) 2019-10-30 13:40:57 -04:00
Michal Niewrzal
da2eaa7085
scripts: dev script to start postgres before tests (#3344) 2019-10-27 12:02:10 -07:00
Yingrong Zhao
fa1ac24e19
satellite/gracefulexit: add failure threshold check (#3329)
* add overall failure percentage check and inactive time frame check before sending a response to sno

* update comment

* delete node from transfer queue if it has been inactive for too long

* fix linting error

* add test config value

* fix nil pointer

* add config value into testplanet

* add unit test for overall failure threshold

* move timeframe threshold to chore

* update protolock

* add chore test

* add per piece failure count logic

* change config name from EndpointMaxFailures to MaxFailuresPerPiece

* address comments

* fix linting error

* add error handling for no row returned from progress table

* fix test for graceful exit chore on storagenode

* fix typo InActive -> Inactive

* improve readability for failure threshold calculation

* update config lock

* change error handling for GetProgress in graceful exit endpoint on the satellite side

* return proper rpc error in endpoint

* add check in chore test for checking finish timestamp and queue
2019-10-24 12:24:42 -04:00
Egon Elbre
3c438f31bd
satellite/satellitedb: remove sqlite support (#3296) 2019-10-19 00:27:57 +03:00
littleskunk
2a5526fcc4
satellite/repair: reduce timeout (#3302) 2019-10-18 13:43:24 +02:00
Natalie Villasana
855fca003d satellite/metrics: create a metrics chore (#3263)
* add metrics counter and chore

* updates metrics observer interval release default and dev default to 15min

* add more specific check for remote pointers

* add Counter field to metrics chore, add counter tests

* rm redundant ObjectCount suffix

* make pointer check easier to read

* change metrics.Config.Interval to ChoreInterval

* rm unneeded var

* fix comment

* update satellite config lock
2019-10-16 14:08:33 -04:00
Cameron
76ad83f12c
satellite/accounting: add redis support to live accounting (#3213)
* set up redis support in live accounting

* move live.Service interface into accounting package and rename to Cache, pass into satellite

* refactor Cache to store one int64 total, add IncrBy method to redis client implementation

* add monkit tracing to live accounting
2019-10-16 12:50:29 -04:00
Natalie Villasana
cf430d2d73
scripts: add check-monitoring script to detect changes to monkit calls (#3114) 2019-10-15 13:00:14 -04:00
Jennifer Li Johnson
b185dbbee2
satellite/discovery: remove discovery related code (#3175) 2019-10-14 10:57:01 -04:00
littleskunk
96aeedcdee
OrderLimit/GracePeriod: Increase time window from 1h to 24h (#3255)
* OrderLimit/GracePeriod: Increase time window from 1h to 24h

* update satellite config lock
2019-10-13 17:40:24 +02:00
Ethan Adams
a1275746b4
satellite/gracefulexit: Implement the 'process' endpoint on the satellite (#3223) 2019-10-11 17:18:05 -04:00
Isaac Hess
9256399872
CI: test drpc and grpc (#3163)
* wip: test drpc

* Add parallel integration test

* Add jenkinsfile.drpc

* Remove unnecessary jenkinsfile items

* testing: GOFLAGS=-drpc (#3236)

* Use GOFLAGS

* add debug

* revert tags

* revert changes

* move goflags to the correct place

* add sanity check
2019-10-11 08:30:06 -06:00
Ethan Adams
4c4519f0be
satellite/gracefulexit: add transfer queue for pieces (#3174)
initial impl of transfer queue
updated docs to represent the new design of how we handle durability during exit
2019-10-07 16:38:05 -04:00
Jeff Wendling
c1fbfea7fa drpc: bump to latest version
Change-Id: I8426c2dd7f6263050c746c2724524ff687c7298a
2019-10-04 15:09:10 -06:00
Jennifer Li Johnson
7ceaabb18e
Delete Bootstrap and Kademlia (#2974) 2019-10-04 16:48:41 -04:00
Stefan Benten
1db4251234 Satellite/repair: Add Repair Threshold Override to allow earlier repair (#3151) 2019-10-02 14:58:37 +02:00
Egon Elbre
ef5e0dce20
scripts: ignore .build directory for size checks (#3153) 2019-10-02 15:22:35 +03:00
Maximillian von Briesen
08ed50bcaa
satellite/metainfo: add commit interval to prevent long delays between order limit creation and segment commit (#3149) 2019-10-01 12:55:02 -04:00
Bogdan Artemenko
423d35fb3f
satellite/console: Added support URLs and other fields to config file (#3090) 2019-09-27 10:48:53 -06:00
Matt Robinson
02f68d68d6 Put -s and -w in the right spot (#3135) 2019-09-27 17:38:02 +02:00
Bryan White
c8aa821ccb
pkg/certificates: move certificate package to root (#3107) 2019-09-26 09:11:05 -07:00
Stefan Benten
c71f3a3f4a internal/version: Change default endpoint to query (#3126)
* change default domain name

change default domain name to point to the new version control

* Update satellite-config.yaml.lock
2019-09-25 22:55:38 +02:00
Bryan White
515799267f fix certificates auth export command (#3110)
* fix certificates auth export command

* actually fix command config logic

* fix test-certificates.sh

* simplify
2019-09-24 10:38:18 -06:00
Jennifer Li Johnson
d2502bb51b Adds tests for kad replacement and restores kad operator configs (#3094)
* test that all nodes can check in with all satellites

* keep kademlia config

* add untrusted satellite test

* use getversion

* remove kademlia config changes in test-sim-backwards.sh

* add kademlia flags back to storj-sim storagenode

* reset kademlia flags in storagenode entrypoint
2019-09-20 16:02:23 -04:00
Jennifer Li Johnson
724bb44723
Remove Kademlia dependencies from Satellite and Storagenode (#2966)
What:

cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:

cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplanet/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia

Please describe the tests:
• internal/testplanet/planet_test.go:

TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:

TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:

TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
2019-09-19 15:56:34 -04:00
Michal Niewrzal
1c72e80e40 uplink/satellite: fix for case when inline segment is last one (#3062)
* uplink/satellite: fix when inline seg is last one

* review comments
2019-09-19 01:18:14 +02:00
Jennifer Li Johnson
ce3203e910
update NodeSelectionConfig.OnlineWindow to 4hr default (#3082) 2019-09-18 14:57:57 -04:00
Maximillian von Briesen
684b07b2c1
scripts/protobuf.go: update drpc version for protobuf generation (#3059) 2019-09-17 13:14:38 -04:00
Andrew Harding
f550ab5d1c
Uplink "import" command (#2981)
* uplink import cmd

* pkg/process: fix import order

* fix golangci-lint failures

* remove "help" from the satellite config lock file
2019-09-13 12:33:30 -06:00
Ethan Adams
731016cd85
Increase file size limit to 650 KB (#3034) 2019-09-12 13:54:44 -04:00
Ivan Fraixedes
cc8a47324a
scripts: Fix warn message update sat config lock (#3029)
Fix the warning message to indicate the Slack channel where the
satellite configuration changes must be posted.
2019-09-12 18:10:22 +02:00
Kaloyan Raev
208327835f
Script for deploying the Docker manifest for watchtower (#3015) 2019-09-12 17:38:48 +03:00
Natalie Villasana
aa3567187e
satellite/audit: worker now verifies and reverifies (#2965) 2019-09-11 18:37:01 -04:00
Bryan White
6c80f01bf0
pkg/certificates: add authorization endpoint and refactor (#2971) 2019-09-11 10:36:44 +02:00
Egon Elbre
7589ca796f
cmd/storj-sim: allow overriding executables (#2976)
The backward compatibility test upgrades only half of the storage nodes and
tests with both the release and new uplink.
2019-09-09 22:13:38 +03:00