Rather than starting all servers on 127.0.0.1 start them
on a random local host to try avoid port exhaustion.
The port exhaustion is just a guess.
Change-Id: Ibf31d6a017852238d836291d703642b44ff66c0c
For nodes in excluded areas, we don't necessarily want to remove them
from the pointer, but we do want to increase the number of pieces in the
segment in case those excluded area nodes go down. To do that, we
increase the number of pieces repaired by the number of pieces in
excluded areas.
Change-Id: I0424f1bcd7e93f33eb3eeeec79dbada3b3ea1f3a
Move storagnode/console caching headers to private/web. Also,
start using them in multinode/console/server.
Change-Id: I1f0f3c9833a183476009737cece515ae7537fb83
Added new endpoint to get project's single bucket usage rollup.
Extended generation code to handle service method args.
Change-Id: Ief768632a801c047c66e0617056fbd7b30427b33
Added new projectaccounting query to get project's single bucket usage rollup.
Added new service method to call new query.
Added implementation for IsAuthenticated method which is used by new generated API.
Change-Id: I7cde5656d489953b6c7d109f236362eb465fa64a
Added a feture flag which will be used to indicate if new generated console api is used.
Fixed some comments from previous PR.
Change-Id: Ice31c998b0b347028a491c971a648fd1269bfd49
We would like to disable in production those parts of code
which are now mixed with new server-side copy logic.
Change-Id: Iff50682bc9545207330f58dd19b5eee53d404d7f
Currently the metainfo/metabase DB connections are missing the proper
application_name in order to differentiate and filter queries on the DB
side for analytics.
Without it, it is very time-consuming to correlate processes and their load.
This change adds the "check" on DB connection init and passes the fallbacks
in all places to catch connection strings, that do not set it.
Change-Id: Iea5cea8658bc63778ff89038e5c1c352bf482cfd
The existing implementation doesn't send proper rfc1341 compatible mails
according to https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html as
the closing `--boundary_id--` is missing from our mails. (see 7.2.1 from
the standard).
Mailservers which are not very flexible, deny the mails.
Example mail log: (see the
missing boundary delimiter at the end).
```
Subject: Activate your email
From: "Storj DCS - EU1" <noreply@eu1.storj.io>
To: "..." <...>
...
--26d7220f6f1c9f6fb47b535319eb15dce513bb6e1d941a0efddf25e96712
Content-Type: text/html; charset=UTF-8
....
</body>
</html>
.
smtp: DATA error {"msg_id":"6d82c9e3","reason":"unexpected EOF"}
smtp: 554 5.0.0 Internal server error (msg ID = 6d82c9e3)
```
This patch moves all the Multipart writing to a function to make sure
that the `wr.Close()` (which writes out the last part) is executed
BEFORE `body.Bytes()` (defer added it AFTER `body.Bytes()` was calculated)
Change-Id: I8f18fc81b1857b646470eab32e73d6cbdc50d2ad
Users signing up through a url containing a promo code will have that code applied to their stripe account instead of the free tier coupon.
Change-Id: I071041b0934648ef3f5bdb05b6ec97c400f89ae4
This change includes storagenode QUIC status on SNO dashboard.
If disabled, it displays warning for SNOs to foward their
UDP port for quic.
Change-Id: I8d28c9c0f5f1e90d80b7c18b9e1e7b78c5e45609
The main motivation is to wrap the bucket DB and metainfo DB, so we
could check if a bucket is empty before applying geofencing config.
Change-Id: I8bac21555e01d51a663fb557bc1acfc8106bc2e1
* Remove "enable-logging", because it ends up spawning consoles on
Windows.
* Remove "disable-gpu", if we have a GPU, then let's use it.
* Create custom client, so we can attach logging to CDP.
* Ignore potential context.Canceled.
* Fix onboarding wizard test for new objects browser.
* Return an error on a context cancellation.
* Wire all loggers to planet.
Change-Id: I67eb138ba31252f55ac5b383679d033bcf71f1b2
Multipart upload limits added. Last part has no size limit.
Max number of parts: 10000, min part size: 5 MiB
Change-Id: Ic2262ce25f989b34d92f662bde720d4c4d0dc93d
When an email is verified, insert an auth cookie so that when the user
is redirected after verifying their email, they are immediately taken to
the onboarding flow.
Change-Id: I557d8a2805b24dd8039ada255522bc1b56cc8b53
Testplanet does not have a multinode instance, hence,
makes it difficult to run ui tests for the multinode dashboard.
Instead of using storj-sim for multinode ui tests,
it's better to use the same approach (testplanet) fir all UI tests.
This changes adds a multinode instance to testplanet.
Change-Id: I58aa8c4597e789275f9e7ea7059703c742903492
Removes database tables and functionality related to our custom
coupon implementation because it has been superseded by the Stripe
coupon and promo code system. Requires implementations of the
payments Invoices interface to return coupon usages along with
invoices.
Change-Id: Iac52d2ff64afca8cc4dbb2d1f20e6ad4b39ddfde
This allows to set `STORJ_TEST_MONKIT` to either
`svg` or `json` to write individual testplanet test
traces to disk.
It also allows to specify an absolute directory:
STORJ_TEST_MONKIT=svg,dir:/abs/dir/path
It requires an absolute path, because from the context of
tests, there's no easy way to find the folder where tests
were called.
Change-Id: I6fe008a4d4237d221cf5a5bede798b46399ee197
Added tests for signup with invalid email or password.
Added test for signup screen's satellites dropdown.
Change-Id: I76d975039543e315b3e9c9416e3ec1f2a3331a6a
Currently TextMaxVerifyCount flakes in some tests, try increasing the
sleep time to ensure that things are slow enough to trigger the error
condition.
Also pass ctx to all the funcs so we can handle sleep better.
Change-Id: I605b6ea8b14a0a66d81a605ce3251f57a1669c00
At some point we moved metabase package outside Metainfo
but we didn't do that for satellite structure. This change
refactors only tests.
When uplink will be adjusted we can remove old entries in
Metainfo struct.
Change-Id: I2b66ed29f539b0ec0f490cad42c72840e0351bcb
Rate limits application of coupon codes by user ID to prevent
brute forcing. Refactors the rate limiter to allow limiting based
on arbitrary criteria and not just by IP.
Change-Id: I99d6749bd5b5e47d7e1aeb0314e363a8e7259dba
Currently, reputation table is only populated when a node has been
audited. This is ok in production, however a lot of our tests doesn't
upload any data or trigger audits.
This PR adds an initialization step in testplanet to populate reputation
table with zero value for nodes reputation.
Change-Id: I11b381236669db346dc68a48a6d4a27334a0a8b8
package in audit
This PR implements reputation store and replace overlay in audit service
to use such store for storing node's audit stats.
In order to keep the changeset smaller, most of the changes in this PR is for copying audit logic in overlay to
reputation package. In a following PR, the duplicating code will be
removed from overlay.
Change-Id: I16c12494a0970f44c422b26cf603c1dc489e5bc1
Added MFA passcode and recovery code field for token requests.
Added endpoints for MFA-related activity: enabling MFA,
disabling MFA, generating a new MFA secret key, and
generating new MFA recovery codes.
Change-Id: Ia1443f05d3a2fecaa7f170f56d73c7a4e9b69ad5
Current tally is calculating storage both for buckets and
storage nodes. This change is moving nodes storage
calculation to separate service that will be using
segment loop.
Change-Id: I9e68bfa0bc751c82ff738c71ca58d311f257bd8d
The user must complete a reCAPTCHA in order to register.
ReCAPTCHA verification failure results in rejection of the
registration attempt.
Change-Id: I34ba7db414d756fd1aaebdc3d19cccbfc7fc1ea3
This is part of metaloop refactoring. We plan to remove
irreparable at some point but there was not time for it.
Now instead refatoring it for segmentloop its just easier
to drop it.
Later we still need to drop table with migration step.
Change-Id: I270e77f119273d39a1ecdcf5e1c37a5662a29ab4
Currently we did not limit the "as of system time" for iterating over
objects table. Using just an interval would cause problems with the
tests. That could be overcome skipping that interval for tests
altogether, however, we should probably test those more to ensure that
GC stays working as intended.
This is a safer code, however, maybe not as straigthforward as it could
be.
Change-Id: I374f77783b2af42bb6da846735ceea20a7ce5e60
Satellites set their configuration values to default values using
cfgstruct, however, it turns out our tests don't test these values
at all! Instead, they have a completely separate definition system
that is easy to forget about.
As is to be expected, these values have drifted, and it appears
in a few cases test planet is testing unreasonable values that we
won't see in production, or perhaps worse, features enabled in
production were missed and weren't enabled in testplanet.
This change makes it so all values are configured the same,
systematic way, so it's easy to see when test values are different
than dev values or release values, and it's less hard to forget
to enable features in testplanet.
In terms of reviewing, this change should be actually fairly
easy to review, considering private/testplanet/satellite.go keeps
the current config system and the new one and confirms that they
result in identical configurations, so you can be certain that
nothing was missed and the config is all correct.
You can also check the config lock to see what actual config
values changed.
Change-Id: I6715d0794887f577e21742afcf56fd2b9d12170e
We want to move some of current metainfo loop observers to
segment loop. This change adds new service, similar to metainfo
loop but which is iterating only over segments.
Change-Id: I67f7f461781723a4476e2b83377f31736d7c4870
When services take long to shutdown it's useful to get a stack trace for
diagnosing the underlying problem.
Change-Id: Ic73a45741dfbe8fdddafd56a5b72121da886d133
Add test with NotBefore and NotAfter restricted permission to verify that we don't have an access to bucket
Change-Id: I7ec98a5b02c0098ee7ec81034278398f4435f1cf
Currently the interface is not useful. When we need to vary the
implementation for testing purposes we can introduce a local interface
for the service/chore that needs it, rather than using the large api.
Unfortunately, this requires adding a cleanup callback for tests, there
might be a better solution to this problem.
Change-Id: I079fe4dbe297b0ae08c10081a1cea4dfbc277682
Initially metabase was developed separately and it was useful to have a
separate environment flag for tests, however, it's more convenient to
use the same as rest of the testsuite.
Change-Id: Ia4d79be27ce5911cbae68d57cdf0b30f63459444
Use the 'AS OF SYSTEM TIME' Cockroach DB clause for the Graceful Exit
(a.k.a GE) queries that count the delete the GE queue items of nodes
which have already exited the network.
Split the subquery used for deleting all the transfer queue items of
nodes which has exited when CRDB is used and batch the queries because
CRDB struggles when executing in a single query unlike Postgres.
The new test which has been added to this commit to verify the CRDB
batch logic for deleting all the transfer queue items of the exited
nodes has raised that the Enqueue method has to run in baches when CRDB
is used otherwise CRDB has return the error "driver: bad connection"
when a big a amount of items are passed to be enqueued. This error
didn't happen with the current test implementation it was with an
initial one that it was creating a big amount of exited nodes and
transfer queue items for those nodes.
Change-Id: I6a099cdbc515a240596bc93141fea3182c2e50a9
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:
ERROR node stats service error: satellitedbs error: node stats database error: no rows
Whereas something like:
ERROR nodestats service: satellitedbs: nodestatsdb: no rows
Would contain all the necessary information without the stutter.
Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
Initially there were pkg and private packages, however for all practical
purposes there's no significant difference between them. It's clearer to
have a single private package - and when we do get a specific
abstraction that needs to be reused, we can move it to storj.io/common
or storj.io/private.
Change-Id: Ibc2036e67f312f5d63cb4a97f5a92e38ae413aa5
Initially we duplicated the code to avoid large scale changes to
the packages. Now we are past metainfo refactor we can remove the
duplication.
Change-Id: I9d0b2756cc6e2a2f4d576afa408a15273a7e1cef
Currently the loop handling is heavily related to the metabase rather
than metainfo.
metainfo over time has become related to the "public API" for accessing
the metabase data.
Currently updates monkit.lock, because monkit monitoring does not handle
ScopeNamed correctly. Needs a followup change to monitoring check.
Change-Id: Ie50519991d718dfb872ec9a0176a82e732c97584
* Add a nullable billing_periods column in the coupons table
* Add nullable billing_periods column to the currently unused
coupon_codes table
* Drop the duration column from the coupon_codes table
* Replace duration config type so that the default promotional coupon
can be configured to never expire
Zero downtime migration plan:
* Add billing_periods column to coupons and coupon_codes tables (this change)
* After one release, remove all references to the old duration column,
replacing with references to billing_periods. At this point, we can also
change the defult promotional coupon to never expire and migrate over
values from the old duration column.
* After another release, drop the duration column.
Change-Id: I374e8dc9fab9f81b4a5bc681771955662d4c007a
Rename the functions that are prefixed with 'New' which connect with
Redis by 'Open' to make clear that they perform network operations.
Change-Id: I1351e89a642e8e2c2586626646315ad0fb2c6242
This is one step for implementing the free tier:
* Change the default project limit from 10 to 3
* Move storage and bandwidth project usage limits from the metainfo
package to the console package (otherwise there is a cyclical
dependency, and metainfo doesn't use these values anyway)
* Change the default storage usage limit per project from 500gb to 50gb
* Change the default bandwidth usage limit per project from 500gb to 50gb
* Migrate the database so that old users and projects continue to have
the old defaults (10 projects/500gb usage)
Change-Id: Ice9ee6a738bc6410da18c336c672d3fcd0cab1b9
We would like to log Node IDs and last contact successes of nodes DQd
in this manner. We would also like to avoid returning an unbounded list
of items from the db. Therefore we change the query to select a limited
number of nodes that meet the DQ conditions and iterate until 0 rows are
returned. Each column of the query is already indexed.
Change-Id: Iaec2d9b56e7202b7c2028ba21750d40c8dd506ee
Update the Redis dependency to use the last major production version.
The last version accepts a context parameter in all the network methods
so it allows us to pass it through them.
Change-Id: I34121b2ec3c2728602115c724933ad24c9e6e4fd
Migration step 148 will cause errors because we missed some
references to the columns being dropped. Removing the step
altogether causes problems with backwards compatibility tests
because the change already exists in the latest release tag.
To circumvent, we change v148 to an empty migration.
Add methods FindTable and RemoveColumn in private/dbutil/dbschema
Change-Id: Ia527e95b88a88c5dc82800928ce6f8cfb879e334
Move a specific interface & types used for testing to be a private
subpackage with a name that clearly identifies it for testing purpose.
Change-Id: I646cf3b6f0a3b518a6f9a125998dc5a02df02db6
Pregenerate the database schema we should use for most tests.
Currently, Cockroach is slow with regards to migration and it's
better if it happens in as few transactions as possible.
This reduces test time from ~21min to ~15min.
Change-Id: Ife8117053e6b9ecf3c93fe63677edf15d4d7c254
In ec client in uplink we have two methods Put and PutSingleResult. Logic is the same but result for PutSingleResult is combined into single var. We would like to remove unused Put method but to do this it needs to be replaced in test.
Change-Id: Ia65eff3ecc9e68d5b3180b557ea82fa31d3c969c
Trackers check whether rows, transactions and statments are properly
closed when tests finish. Stmt wasn't closing the tracker, which
ended up causing false-positives.
Change-Id: I5ac57bc1579f903f19283b85a1996c06ce87741b
This PR removes all back-end related referral program code including the
marketing portal.
We will have a separate PR for front-end code and database migration to
drop `offers` and `usercredits` table
Change-Id: If59f952cddfe0558a7dc03a0eac7cc1081517f88
Adds checks in the existing NeedsRetry cockroachutil method
to see if the error is from the net package and a tcp error,
in which case it should retry. Also adds checks for if the
error contains EOF-- we don't retry in this case because it's
unclear if the query succeeded or failed.
These checks are needed to prevent our app from crashing when
it gets "connection reset by peer" and other tcp errors that
we've seen frequently coming from the metainfo loop.
Change-Id: I194d4b120082393cb6dbda2dd86b44f4696a66a4
allow disabling tcp/quic
In order to have more control of a server so that we can
simulate connection failures in `testplanet`, this PR changes
quic.Listener to accept an existing UDPConn instead of relying on the
quic-go library to create the UDPConn.
This PR also adds two flags on the `server.Config` struct to allow
enabling/disabling tcp/tls listener and quic listener. By default, they
are both set to true.
- `DisableTCPTLS`: internal flag, disables tcp/tls listener.
- `DisableQUIC`: hidden flag, disables quic listener
By making the `DisableQUIC` a hidden flag, it allows storagenode operators to
have the ability to disable quic traffic in case their set up can't work
with udp traffic.
Change-Id: I853b12435d988b9c41ad9b873fd57480d792e378
Delete satellite order methods and DB tables which aren't used anymore
after we have done a refactoring on the orders to stuck bucket
information in the orders' encrypted metadata.
There are also configuration parameters and a satellite chore that
aren't needed anymore after the orders refactoring.
Change-Id: Ida3682b95921df70792284b42c96d2508bf8ca9c
The rollup archiver chore moves bucket bandwidth rollups and
storagenode rollups that are older than a given duration
to two new archive tables.
Change-Id: I1626a3742ad4271bc744fbcefa6355a29d49c6a5
This PR introduces a new listener that can listen for quic traffic on
both storagenodes and satellites.
Change-Id: I5eb5bc82c37dde20d3be2ec8fa5f69c18fae0af0
We wanto have single uplink branch for standard and multipart-upload satellite but some tests are using helper methods from multipart. This change adds methods used by uplink test.
Change-Id: I82352ed56674ff7e8743b58061ba594018e78e3b
On servers with non-UTC it would have calculated a different month boundary.
If node joined in current month calculations will be related on amount of days node've been working.
Change-Id: Ie572b197f50c6cdff5a044a53dfb5b9138f82f24
Full scope:
private/testplanet,satellite/{overlay,satellitedb}
Description:
In most cases, downtime tracking with audits will eventually lead
to DQ for nodes who are unresponsive. However, if a stray node has no
pieces, it will not be audited and will thus never be disqualified.
This chore will check for nodes who have not successfully been contacted
in some set time and DQ them.
There are some new flags for toggling DQ of stray nodes and the timeframes
for running the chore and how long nodes can go without contact.
Change-Id: Ic9d41fdbf214736798925e728245180fb3c55615
Avoid using project uuid string representation, because
it uses more bandwidth.
This reduces the encrypted metadata size from 118 -> 97 bytes.
Change-Id: Ic53a81b83acc065f24f28cd404f9c0b1fe592594
Since the Satellite now requires the order encryption functionality (since serial_number table is deprecated) to properly function, we can remove the config flag to turn on/off the feature.
Change-Id: Ie973f72a9a05a81cef9e53dc9c99d22c940c2488
We are no longer planning on implementing downtime penalization using
the method described in
docs/blueprints/archive/storage-node-downtime-tracking-deprecated.md.
Now, we are implementing the design described in
docs/blueprints/storage-node-downtime-tracking-with-audits.md.
This change removes the downtime estimation chores from the satellite
core as well as the package satellite/downtime. A future change will
remove the database table.
Change-Id: I1a1d3cf9dceeba36255d25243294865b89925518
We want to stop using the serial_numbers table in satelliteDB. One of the last places using the serial_numbers table is when storagenodes settle orders, we look up the bucket name and project ID from the serial number from the serial_numbers table.
Now that we have support to add encrypted metadata into the OrderLimit, this PR makes use of that and now attempts to read the project ID and bucket name from the encrypted orderLimit metadata instead of from the serial_numbers table. For backwards compatibility and to ensure no errors, we will still fallback to the old way of getting that info from the serial_numbers table, but this will be removed in the next release as long as there are no errors.
All processes that create orderLimits must have an orders.encryption-keys set. The services that create orderLimits (and thus need to encrypt the order metadata) are the satellite apiProcess, the repair process, audit service (core process), and graceful exit (core process). Only the satellite api process decrypts the order metadata when storagenodes settle orders. This means that the same encryption key needs to be provided in the config for the satellite api process, repair process, and the core process like so:
orders.include-encrypted-metadata=true
orders.encryption-keys="<"encryptionKeyID>=<encryptionKey>"
Change-Id: Ie2c037971713d6fbf69d697bfad7f8b672eedd66
the immediate need is to be able to move the repair queue back out
of cockroach if we can't save it.
Change-Id: If26001a4e6804f6bb8713b4aee7e4fd6254dc326
CRDB doesn't like large deletes. While testing in the POC environment we found that deletes on the serial_numbers table could take hours. This change limits deletes to 1000 at a time (configurable) to avoid blocking other queries.
Change-Id: I08455e25db1574579dd4d7b7125a08e9c913dff1
Previously, we created a new file to use for directory verification
every time the storage node starts. This is not helpful if the storage node
points to the wrong directory when restarting. Now we will only create the file
on setup. Now the file should be created only once and will be verified at
runtime.
Change-Id: Id529f681469138d368e5ea3c63159befe62b1a5b
Make metainfo.RSConfig a valid pflag config value. This allows us to
configure the RSConfig as a string like k/m/o/n-shareSize, which makes
having multiple supported RS schemes easier in the future.
RS-related config values that are no longer needed have been removed
(MinTotalThreshold, MaxTotalThreshold, MaxBufferMem, Verify).
Change-Id: I0178ae467dcf4375c504e7202f31443d627c15e1
Currently we were picking databases randomly for testing,
however a round-robin picking might have more predictable
behavior and cause less cockroach timeouts.
Change-Id: I74ac0d5b38c89452d3c46d3811330e46e7449514
Use tagsql.DB pointer as step database, to propagate changes
back and forth between actual database and migration.
Adds CreateDB operation to the migration step to be able to
create new dbs before executing migration action.
Adjusts storagenode database migration to use inner tagsql.DB
pointer of each database as step.DB.
Adjusts satellite dabase migration, adds proxy migrationDB field
to satellite db that wraps itself as tagsql.DB, pointer of which
is used as step.DB.
Change-Id: Ifed4de5b01a356cf7b37db64d2eaeb7b61982c5c
To avoid further name collisions, the very broad named package gets moved into
the consoleauth package where its also mainly being used.
Change-Id: Ie563c9700adbf0553baca2b7b8ba4a1d9c29d144
Currently Cockroach migration test is the most heavy with regards to
schema changes. This causes other tests to time out. This adds an
alternate cockroach instance that is used for migration tests.
Change-Id: I01fe9313527ff002f0bb0914dd52c3645b8eaf6d
This PR adds the following items:
1) an in-memory read-only cache thats stores project limit info for projectIDs
This cache is stored in-memory since this is expected to be a small amount of data. In this implementation we are only storing in the cache projects that have been accessed. Currently for the largest Satellite (eu-west) there is about 4500 total projects. So storing the storage limit (int64) and the bandwidth limit (int64), this would end up being about 200kb (including the 32 byte project ID) if all 4500 projectIDs were in the cache. So this all fits in memory for the time being. At some point it may not as usage grows, but that seems years out.
The cache is a read only cache. When requests come in to upload/download a file, we will read from the cache what the current limits are for that project. If the cache does not contain the projectID, it will get the info from the database (satellitedb project table), then add it to the cache.
The only time the values in the cache are modified is when either a) the project ID is not in the cache, or b) the item in the cache has expired (default 10mins), then the data gets refreshed out of the database. This occurs by default every 10 mins. This means that if we update the usage limits in the database, that change might not show up in the cache for 10 mins which mean it will not be reflected to limit end users uploading/downloading files for that time period..
Change-Id: I3fd7056cf963676009834fcbcf9c4a0922ca4a8f
Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1.
This is because many tests relied on selecting nodes that were reputable based on audit and uptime
counts of 0, in effect, selecting new nodes as reputable ones.
However, since reputation is now indicated by a vetted_at db field that is explicitly set
rather than implied by audit and uptime counts, it would be more complicated to try to
update all of the nodes' reputations before selecting nodes for tests.
Now we just allow all test nodes to be new if needed.
Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.
Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
* The audit worker wants to get items from the queue and process them.
* The audit chore wants to create new queues and swap them in when the
old queue has been processed.
This change adds a "Queues" struct which handles the concurrency
issues around the worker fetching a queue and the chore swapping a new
queue in. It simplifies the logic of the "Queue" struct to its bare
bones, so that it behaves like a normal queue with no need to understand
the details of swapping and worker/chore interactions.
Change-Id: Ic3689ede97a528e7590e98338cedddfa51794e1b