Currently the interface is not useful. When we need to vary the
implementation for testing purposes we can introduce a local interface
for the service/chore that needs it, rather than using the large api.
Unfortunately, this requires adding a cleanup callback for tests; there
might be a better solution to this problem.
Change-Id: I079fe4dbe297b0ae08c10081a1cea4dfbc277682
Initially metabase was developed separately and it was useful to have a
separate environment flag for tests; however, it's more convenient to
use the same one as the rest of the test suite.
Change-Id: Ia4d79be27ce5911cbae68d57cdf0b30f63459444
Use the 'AS OF SYSTEM TIME' CockroachDB clause for the Graceful Exit
(a.k.a. GE) queries that count and delete the GE queue items of nodes
which have already exited the network.
Split the subquery used for deleting all the transfer queue items of
nodes which have exited when CRDB is used, and batch the queries
because CRDB struggles to execute them in a single query, unlike
Postgres.
The new test added in this commit to verify the CRDB batch logic for
deleting all the transfer queue items of the exited nodes revealed that
the Enqueue method has to run in batches when CRDB is used; otherwise
CRDB returns the error "driver: bad connection" when a large number of
items are passed to be enqueued. This error didn't happen with the
current test implementation; it happened with an initial one that
created a large number of exited nodes and transfer queue items for
those nodes.
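A minimal sketch of the batching approach, with hypothetical table,
column, and helper names (the real queries live in the GE satellite DB
code); it assumes "context", "database/sql", and github.com/zeebo/errs:

    // deleteExitedNodeItems reads the exited nodes from a stale
    // AS OF SYSTEM TIME snapshot and deletes their transfer queue items
    // in bounded batches, since one huge DELETE makes CRDB struggle.
    func deleteExitedNodeItems(ctx context.Context, db *sql.DB, batchSize int) error {
        rows, err := db.QueryContext(ctx,
            `SELECT id FROM nodes AS OF SYSTEM TIME '-10s'
             WHERE exit_finished_at IS NOT NULL`)
        if err != nil {
            return err
        }
        var nodeIDs [][]byte
        for rows.Next() {
            var id []byte
            if err := rows.Scan(&id); err != nil {
                return errs.Combine(err, rows.Close())
            }
            nodeIDs = append(nodeIDs, id)
        }
        if err := errs.Combine(rows.Err(), rows.Close()); err != nil {
            return err
        }
        for _, id := range nodeIDs {
            for {
                // CRDB, unlike Postgres, supports LIMIT on DELETE,
                // which makes the batching straightforward.
                res, err := db.ExecContext(ctx,
                    `DELETE FROM graceful_exit_transfer_queue
                     WHERE node_id = $1 LIMIT $2`, id, batchSize)
                if err != nil {
                    return err
                }
                deleted, err := res.RowsAffected()
                if err != nil {
                    return err
                }
                if deleted < int64(batchSize) {
                    break
                }
            }
        }
        return nil
    }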
Change-Id: I6a099cdbc515a240596bc93141fea3182c2e50a9
multiregion satellites have complex database connection strings
largely due to using a different backend for the repair queue than
cockroach.
billing stuff didn't work right with this.
Change-Id: Ie8759a8c47e71347c3a190abfc9d53945d7b8855
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example, a log line could end
up looking like:
ERROR node stats service error: satellitedbs error: node stats database error: no rows
Whereas something like:
ERROR nodestats service: satellitedbs: nodestatsdb: no rows
Would contain all the necessary information without the stutter.
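For illustration, a sketch of the convention using the zeebo/errs
package:

    package nodestats

    import "github.com/zeebo/errs"

    // Before: Error.Wrap(err) produced
    //   "node stats service error: <err>"
    // var Error = errs.Class("node stats service error")

    // After: Error.Wrap(err) produces "nodestats service: <err>"
    var Error = errs.Class("nodestats service")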
Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
Initially there were pkg and private packages, however for all practical
purposes there's no significant difference between them. It's clearer to
have a single private package - and when we do get a specific
abstraction that needs to be reused, we can move it to storj.io/common
or storj.io/private.
Change-Id: Ibc2036e67f312f5d63cb4a97f5a92e38ae413aa5
cache is a really common variable and type name, and we have already
used the package name alias in multiple places.
Change-Id: I6435785b7549b541d533de59ec94557b9bd11e04
Initially we duplicated the code to avoid large scale changes to
the packages. Now that we are past the metainfo refactor, we can
remove the duplication.
Change-Id: I9d0b2756cc6e2a2f4d576afa408a15273a7e1cef
We need some chores to join without triggering the loop.
For example, it's fine to run metrics only when something else is
running.
Change-Id: I9d8bd16f59c28c540c8d72971bc4e233a8660c02
Currently the tool verifies:
* validity of plain_offset
* whether plain_size is smaller than encrypted_size
Change-Id: I9ec4fb5ead3356a196392c26ca377fcdb367138e
This tool attempts to make TCP+TLS+DRPC and QUIC+DRPC connections to a
specified IP:port. If successful with either, it shows the node ID of
the opposite end of the connection.
This should be useful for SNOs who want to check that their packet
forwarding is working or that their node is listening for both
connection types.
Change-Id: I935dff037dd7f1106941a35567f6445230259d1a
Currently the loop handling is heavily related to the metabase rather
than metainfo.
metainfo over time has become related to the "public API" for accessing
the metabase data.
This also updates monkit.lock, because monkit monitoring does not
handle ScopeNamed correctly. Needs a followup change to the monitoring
check.
Change-Id: Ie50519991d718dfb872ec9a0176a82e732c97584
Get the storagenode and storagenode-updater binaries during the
container run so that we don't need to release a new docker image on
each new version of the storagenode binary.
Change-Id: Ic0eb4a9c18a98598dfd9b96c1d352c7399496fd2
metabase has become a central concept and it's more suitable for it to
be directly nested under satellite rather than being part of metainfo.
metainfo is going to be the "endpoint" logic for handling requests.
Change-Id: I53770d6761ac1e9a1283b5aa68f471b21e784198
os.Create was leaving a file open, causing the atomic file write to
fail with access denied.
Also add a specific test for importing an access.
Change-Id: Id188bc480e795849ec7fdc72b1fc86433d76c47a
Check that the bloom filter creation date is earlier than the
metainfo loop system time used for db scanning.
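A sketch of the guard, with hypothetical variable names (errs is the
zeebo/errs package):

    // The bloom filter must have been created strictly before the
    // system time the metainfo loop used for scanning the db.
    if !filterCreationDate.Before(loopSystemTime) {
        return errs.New("bloom filter created at %v, not before the loop time %v",
            filterCreationDate, loopSystemTime)
    }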
Change-Id: Ib0f47c124f5651deae0fd7e7996abcdcaac98fb4
We already merged the multipart-upload branch to main. These two tools
make only sense if we are migrating a satellite from Pointer DB to
Metabase. There is one remaining satellite to migrate, but these tools
should be used from the respective release branch instead of from main.
Removing these tools from main will:
1) Avoid the mistake to use them from the main branch instead of from
the respective release branch.
2) Allow us to finally remove any code related to the old Pointer DB.
Change-Id: Ied66098c5d0b8fefeb5d6e92b5e0ef5c6603df5d
We recently added a created_at column to the segments table.
Old segments need to get this value from the objects table.
This tool will iterate over all objects and update the corresponding
segments if the created_at column is not set.
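A hypothetical sketch of the backfill (the real tool batches and tracks
progress; table and column names assumed; uses "context",
"database/sql", "time", and github.com/zeebo/errs):

    // backfillCreatedAt walks all objects and copies their created_at
    // onto any of their segments that don't have one yet.
    func backfillCreatedAt(ctx context.Context, db *sql.DB) error {
        rows, err := db.QueryContext(ctx,
            `SELECT stream_id, created_at FROM objects`)
        if err != nil {
            return err
        }
        for rows.Next() {
            var streamID []byte
            var createdAt time.Time
            if err := rows.Scan(&streamID, &createdAt); err != nil {
                return errs.Combine(err, rows.Close())
            }
            _, err := db.ExecContext(ctx,
                `UPDATE segments SET created_at = $2
                 WHERE stream_id = $1 AND created_at IS NULL`,
                streamID, createdAt)
            if err != nil {
                return errs.Combine(err, rows.Close())
            }
        }
        return errs.Combine(rows.Err(), rows.Close())
    }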
Change-Id: Ib5aedc384637e739ee9af84454af0639e2559416
* Add a nullable billing_periods column in the coupons table
* Add nullable billing_periods column to the currently unused
coupon_codes table
* Drop the duration column from the coupon_codes table
* Replace duration config type so that the default promotional coupon
can be configured to never expire
Zero downtime migration plan:
* Add billing_periods column to coupons and coupon_codes tables (this change)
* After one release, remove all references to the old duration column,
replacing them with references to billing_periods. At this point, we
can also change the default promotional coupon to never expire and
migrate over values from the old duration column.
* After another release, drop the duration column.
Change-Id: I374e8dc9fab9f81b4a5bc681771955662d4c007a
Test had two issues:
* sub test was using parent test to fail (ctx.Check)
* pointerDB and metabaseDB were using the same unique DB and pointerDB was closing/deleting it first
Change-Id: I5741b76d518663d80c4e6448c76e4ee9dd86c8e1
instead of only generating invoices for nodes that had some
activity, we generate one for every node so that we can find
and pay terminal nodes that did not meet thresholds before
we recognized them as terminal.
Change-Id: Ibb3433e1b35f1ddcfbe292c034238c9fa1b66c44
Rename the functions which connect to Redis and are prefixed with
'New' to use the 'Open' prefix, to make clear that they perform
network operations.
Change-Id: I1351e89a642e8e2c2586626646315ad0fb2c6242
Uplink tests run the storj/storj tests only against Postgres. Without
this change the integration test on uplink will fail, as 'omit' is not
supported in the migrator tests code.
Change-Id: Ic72406f52439e98683d050508fb42aa41632e183
Update the Redis dependency to the latest major production version.
The new version accepts a context parameter in all the network
methods, which allows us to pass it through them.
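For illustration, with go-redis v8 (assuming that's the client in
use), every network method now takes a context:

    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
    val, err := rdb.Get(ctx, "live-accounting-key").Result()
    if err != nil && err != redis.Nil {
        return err
    }
    _ = val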
Change-Id: I34121b2ec3c2728602115c724933ad24c9e6e4fd
Old pointerdbs can have keys with just the project ID, segment index,
and bucket, without an object key. We need to ignore such keys.
Change-Id: I80a466a94e317a229da236fe6bc9e762e6f7ced6
We can have objects with missing segments. Such objects will be removed
by the segment reaper, but between its executions they can exist. We
should not interrupt the migration, but instead collect such objects so
we can clean up the migrated DB later.
Change-Id: I5cc4a66395c1773a6430f34cc25a0f2449133f80
The new 'consistency ge-cleanup-orphaned-data' cli command deleted
orphaned transfer queue items, but not entries in the
graceful_exit_progress table. This will delete orphaned entries
from the exit progress table too.
Change-Id: I5f927aac1f258490678deaf179be92ccfe10fcd8
Share URLs should be of the form
https://link.tardigradeshare.io/s/<access>/<bucket>/<path>
Legacy URLs have the /s/ missing, but a redirect is issued.
Change-Id: Ic2a3dc092ff68d7706fd888a9fbfc27716877c6e
This PR removes all back-end related referral program code including the
marketing portal.
We will have a separate PR for the front-end code and the database
migration to drop the `offers` and `usercredits` tables.
Change-Id: If59f952cddfe0558a7dc03a0eac7cc1081517f88
Delete satellite order methods and DB tables which aren't used anymore
after the refactoring of the orders to store bucket information in the
orders' encrypted metadata.
There are also configuration parameters and a satellite chore that
aren't needed anymore after the orders refactoring.
Change-Id: Ida3682b95921df70792284b42c96d2508bf8ca9c
Add a command to the satellite for cleaning up the Graceful Exit (a.k.a
GE) transfer queue items of nodes that have exited.
The commit adds a couple of new methods to the GE satellite DB, with
their corresponding tests, for performing the operations of the new
command.
Change-Id: I29a572a59689d63b24990ac13c52e76d65aaa917
Provide a clearer error message to users who confuse the API Key with
the Access Grant and suggest the right command to them.
Change-Id: If73ae8cde140b68a19f4cfc3f59bb88a3b74c9c1
Currently the first satellite's GC would've conflicted with the second
satellite's public RPC port. Instead, assign "satellite workers" a new
peer class.
Change-Id: Id6bdaa17243556482e88da708c5147149788f6be
In the past we were storing a fixed segment size with StreamInfo, encrypted in metadata. The stored value was the unencrypted size of the segment, not the encrypted one.
Change-Id: Id6b18440c674223eabbb152b1636c83e1ab6462c
We add a timeout for the http client used to register the access with
the auth service. We have a hard-coded common default for now.
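A sketch of the idea (the timeout value here is illustrative, not the
actual default):

    client := &http.Client{Timeout: 30 * time.Second}
    resp, err := client.Post(authServiceURL, "application/json", body)
    if err != nil {
        return err
    }
    defer func() { _ = resp.Body.Close() }()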
Change-Id: I50207ad83c9221b7cb61f39310e24b140b95673b
Allow the satellite commands which use the live accounting cache (core
and API) to run even when, at the time they are instantiated, there is
an error connecting to the cache backend.
This prevents a situation where we cannot run these services because
the live accounting backend is down:
1. The services must run despite the cache backend being down, although
they may be degraded.
2. We may need to start new replicas of the services, or the services
in a different place, while we are troubleshooting and fixing the cache
backend system.
3. Our services may restart when the cache backend or the network
connecting to it fails momentarily.
Change-Id: Ic93f9571bc0865c9488d64ab1356376fae797efc
Jens noticed that 'uplink access register' wasn't working with named
accesses.
This was because GetNamedAccess was hardcoded to use inspectCfg, which
in the case of 'uplink access register' wasn't being bound to the
config file.
Change-Id: I49403b45af28ad33408cfc5ec6545a395f0f080d
Fail all the processes immediately when one of the processes fails. This
is to make it more obvious that one of them has failed.
To disable failfast, use `-failfast=false`.
Change-Id: I2bbedf12fb653e42739d00273aa9ae515d34eda6
Previously, we were trying to overwrite accesses, which is a nested map
in the uplink config, by calling viper.MergeWithConfig with a nested
map. While this works for keys that don't exist already, it does not
overwrite already existing keys. In order to do that, we need to call
MergeWithConfig with "accesses.<accessname> -> value" rather than using
"accesses -> <accessname> -> value".
Change-Id: I74d7a9decf2078cdf2ff440eaf24821e30474b53
The current uplink access register method has the ability to write to
AWS credential files. This caused issues with repeat usage in recent
AWS CLI code, and there was concern that it was an unstable solution.
This version instead specifies various output formats: "env" and "aws".
"env" formats the text so that it can be used with 'export'. "aws"
generates "aws configure" commands to persist the credentials to the
AWS credential files as the previous version could.
Example usages:
Setting ephemeral environment variables in bash:
export $(uplink access register $(storj-sim network env GATEWAY_0_ACCESS)
--auth-service http://localhost:8000 --format env)
Setting persistent configs via the AWS CLI in bash:
source <(uplink access register $(storj-sim network env GATEWAY_0_ACCESS)
--auth-service http://localhost:8000 --format aws --aws-profile storjsim)
Change-Id: I5d78d6462a3537780af3717a298bb2bebf9c2799
Gateway-MT requires integration tests, which would be aided by having an
exported RegisterAccess() method in uplink/cmd.
To support this change, a little of the Uplink cmd logic was shifted around
and a method was made public. I also normalized finding the access
between accessInspect and accessRegister.
Change-Id: I29369296521c2cc179e27233f5451b95f46109d8
Since the Satellite now requires the order encryption functionality to function properly (the serial_number table is deprecated), we can remove the config flag for turning the feature on/off.
Change-Id: Ie973f72a9a05a81cef9e53dc9c99d22c940c2488
WHAT:
added brotli compression for wasm files and added copying of those files to the static/wasm folder in the Dockerfile
WHY:
those files are a part of the web worker webpack bundle and I didn't
find a way to compress them separately using webpack.
I'm open to any other ideas if they come up.
Change-Id: I105cc1582e9816fd9b63052ba48358525c85a164
Currently the node ID in the access grant is '1', which cannot be
parsed as a valid node ID. This change updates the access grant
satellite address with a randomly generated node ID.
Change-Id: Id1684ac71509bc5a8177b069a914355be3c72d43
and replaces access grant with access
uplink share <path> --> creates access grant
uplink share --register <path> --> registers access grant
uplink share --url <path> --> creates URL, implies register and public
uplink share --dns <hostname> <path> --> creates dns info, implies register and public
Change-Id: I7930c4973a602d3d721ec6f77170f90957dad8c0
It was designed to detect and remove zombie segments in the PointerDB.
This tool is not relevant to the MetabaseDB anymore.
Change-Id: I112552203b1329a5a659f69a0043eb1f8dadb551
It turns out that running a docker image build for specific arches is
not possible from amd64 (e.g. installing ca-certificates fails).
Change-Id: I8b8f002b7e532fb4a0c6542d5b573c294c501068
Jeff provided feedback on https://review.dev.storj.io/c/storj/storj/+/3176 after
the changeset was already merged. I attempt to address that feedback here.
Change-Id: Ibc7dba3e4e2c73736042fe4b4ee49ce679ba7f44
We want to stop using the serial_numbers table in satelliteDB. One of the last places using the serial_numbers table is when storagenodes settle orders: we look up the bucket name and project ID from the serial number in the serial_numbers table.
Now that we have support to add encrypted metadata into the OrderLimit, this PR makes use of that and now attempts to read the project ID and bucket name from the encrypted orderLimit metadata instead of from the serial_numbers table. For backwards compatibility, and to ensure no errors, we will still fall back to the old way of getting that info from the serial_numbers table, but this will be removed in the next release as long as there are no errors.
All processes that create orderLimits must have an orders.encryption-keys set. The services that create orderLimits (and thus need to encrypt the order metadata) are the satellite apiProcess, the repair process, audit service (core process), and graceful exit (core process). Only the satellite api process decrypts the order metadata when storagenodes settle orders. This means that the same encryption key needs to be provided in the config for the satellite api process, repair process, and the core process like so:
orders.include-encrypted-metadata=true
orders.encryption-keys="<encryptionKeyID>=<encryptionKey>"
Change-Id: Ie2c037971713d6fbf69d697bfad7f8b672eedd66
Added flag to append a new profile to ~/.aws/credentials using
the provided profile name. This is handy for the AWS CLI, so
you can do things like 'aws configure get aws_access_key_id --profile=me'
Change-Id: I0469a18ca76e078624ed455a06bd7aabd95a1b97
this change tries really hard to never have all of the storage node
rollups in memory at the same time, up until the rollups are actually
getting summed together.
Change-Id: If67f49e7d71106798d996a6850b3e48671bd9e18
Previously uplink register only accepted a fully serialized access grant.
This is kind of annoying, so I changed it to also accept access names.
Change-Id: If6d4d1baa8d4fb3d87fdedb895d459fa12743f1a
Firstly, this changes the repair functionality to return Canceled errors
when a repair is canceled during the Get phase. Previously, because we
do not track individual errors per piece, this would just show up as a
failure to download enough pieces to repair the segment, which would
cause the segment to be added to the IrreparableDB, which is entirely
unhelpful.
Then, ignore Canceled errors in the return value of the repair worker.
Apparently, when the worker returns an error, that makes Cobra exit the
program with a nonzero exit code, which causes some piece of our
deployment automation to freak out and page people. And when we ask the
repair worker to shut down, "canceled" errors are what we _expect_, not
an error case.
Change-Id: Ia3eb1c60a8d6ec5d09e7cef55dea523be28e8435
For coordinating with other processes it can be useful to wait until
another process is accepting requests on an address.
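A minimal sketch of such a wait, assuming a TCP address (uses
"context", "net", and "time" from the standard library):

    // waitForAddress blocks until something accepts connections on addr
    // or the context is canceled.
    func waitForAddress(ctx context.Context, addr string) error {
        for {
            conn, err := net.DialTimeout("tcp", addr, time.Second)
            if err == nil {
                return conn.Close()
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(100 * time.Millisecond):
            }
        }
    }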
Change-Id: Id623ed815149f14f9f0344e2f396ab70fc4dec6a
Previously, we created a new file to use for directory verification
every time the storage node starts. This is not helpful if the storage node
points to the wrong directory when restarting. Now we only create the
file on setup, so it is created exactly once and verified at runtime.
Change-Id: Id529f681469138d368e5ea3c63159befe62b1a5b
Previously, we ran setup if no config file was found in the expected dir.
However, there may be situations where a previously set up node's files
may be unreachable. In this case, we would prefer to exit with an error
rather than assume this node needs to be initialized.
The solution here is to add a new env variable to call the setup command.
If SETUP == true, the node will run setup but not start. If SETUP !=
true, the node will start without running setup.
If a previously set up node runs with SETUP, it will return an error.
If a node runs without an initial SETUP, it will return an error.
Change-Id: Id2c796ec3d43f2add5e5f34fb777a563eae59f2f
This change future-proofs the satellite selection on uplink setup.
Currently we have hard-coded response values, so if we add satellites
in the future we have to remember to update the switch statement below.
With this change it should work for any number of future satellites.
Change-Id: I3250fe2154dbeb4820efadf49780b20c4b7a3408
This PR fixes the issues below:
1. remove concurrent installation for various versions
We were doing this to decrease the execution time of the versions
test. However, it was returning an incorrect exit code when there was
an installation failure.
Right now, we are only installing two versions of `storj-sim` and the
rest are only doing the uplink CLI installation. The performance of
this test should be mostly determined by the setup step now.
2. only remove release settings instead of deleting the entire file
The uplink CLI references the `private/version` package; therefore, we
cannot delete it.
3. add back `GATEWAY_0_API_KEY` in storj-sim
In order to set up older versions of the uplink CLI, we need access to
the gateway API key.
Change-Id: Ia3c37c197bd007b6e1f7c2bd71adde42181d46f0
As part of the Metainfo Refactoring, we need to make the Metainfo Loop
working with both the current PointerDB and the new Metabase. Thus, the
Metainfo Loop should pass to the Observer interface more specific Object
and Segment types instead of pb.Pointer.
After this change, there are still a couple of use cases that require
access to the pb.Pointer (hence we have it as a field in the
metainfo.Segment type):
1. Expired Deletion Service
2. Repair Service
It would require additional refactoring in these two services before we
are able to clean this up.
Change-Id: Ib3eb6b7507ed89d5ba745ffbb6b37524ef10ed9f
Check the binary version on self-update instead of the current process
version to prevent updating an already updated binary.
Add info logs to report the current version of the service being
updated.
Change-Id: Id22dee188a99d6d45db925104786f49f5d3a61ae
This makes it possible to remove this obsolete flag from the
multi-tenant gateway.
As a consequence, displaying the GATEWAY_0_ACCESS env var will always
require a running storj-sim. Until now, it was required only the first
time; afterwards the value was stored in the 'access' config. But this
is no longer possible.
The changes in StripeMock are required to fix failures in integration
tests. StripeMock is in-memory and its data does not survive restarts of
storj-sim. The second and following starts of storj-sim had invalid
state of StripeMock, which failed requests that were required to
populate the GATEWAY_0_ACCESS env var. The changes in StripeMock make
it repopulate the Stripe customers from the database.
Change-Id: I981a208172b76577f12ecdaae485f5ae4ea269bc
log.Fatal immediately terminates the program without running any defers.
We should properly close all the services and databases.
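A minimal sketch of the problem (openDB and run are hypothetical):

    func main() {
        db := openDB()
        defer db.Close() // never runs if log.Fatal fires below

        if err := run(db); err != nil {
            // log.Fatal calls os.Exit(1), skipping every deferred
            // call; services and databases must be closed explicitly
            // before exiting.
            log.Fatal(err)
        }
    }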
Change-Id: I5e959cef3eafedeacb3a2062e3da47e8d04e8e75
Sadly the build process with this command is very, very flaky and often fails pulling down curl via apk.
As we currently do not need it anyway, it is safe to remove.
Change-Id: I8a396c560d61a7fe6324560152a68c07c6b31638
The VerifyPieceHashes method has a sanity check for the number of
pieces to be removed from the pointer after the audit for verifying the
piece hashes.
This sanity check failed when we executed the command on the production
satellites because the Verify command removes Fails and PendingAudits
nodes from the audit report if piece_hashes_verified = false.
A new temporary UsedToVerifyPieceHashes flag is added to
audits.Verifier. It is set to true only by the verify-piece-hashes
command. If the flag is true then the Verify method will always include
Fails and PendingAudits nodes in the report.
A test case is added to cover this use case.
Change-Id: I2c7cb6b12029d52b2fc565365eee0826c3de6ee8
Remove usage of the --non-interactive flag. It is not provided (nor
necessary) by the multitenant S3 gateway anymore.
ACCESS_KEY and SECRET_KEY env vars are not provided anymore as they are
not generated by the multitenant S3 gateway.
Change-Id: I3ecfb92110e31ae13977de3899dad273daae6c1e
Jira: https://storjlabs.atlassian.net/browse/PG-69
There are a number of segments with piece_hashes_verified = false in
their metadata on US-Central-1, Europe-West-1, and Asia-East-1
satellites. Most probably, this happened due to a bug we had in the
past. We want to verify them before executing the main migration to
metabase. This would simplify the main migration to metabase with one
less issue to think about.
Change-Id: I8831af1a254c560d45bb87d7104e49abd8242236
Jira: https://storjlabs.atlassian.net/browse/PG-67
There are a number of old-style objects (without the number of segments
in their metadata) on US-Central-1, Europe-West-1, and Asia-East-1
satellites. We want to migrate their metadata to contain the number of
segments before executing the main migration to metabase. This would
simplify the main migration to metabase with one less issue to think
about.
Change-Id: I42497ae0375b5eb972aab08c700048b9a93bb18f
Fix the case where a line is larger than maxline and the writer ended
up in an infinite loop trying to buffer more data.
Change-Id: I243da738b331279d6bf27255778b5798e7f37f95
This PR updates `uplink rb --force` command to use the new libuplink API
`DeleteBucketWithObjects`.
It also updates `DeleteBucket` endpoint to return a specific error
message when a given bucket has concurrent writes while being deleted.
Change-Id: Ic9593d55b0c27b26cd8966dd1bc8cd1e02a6666e
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.
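A minimal sketch of the check, with hypothetical names (uses
"io/ioutil", "os", and github.com/zeebo/errs):

    // checkWritability verifies that the storage directory is still
    // writable by creating, writing, and removing a small temp file.
    func checkWritability(dir string) error {
        f, err := ioutil.TempFile(dir, "write-test-")
        if err != nil {
            return err
        }
        _, writeErr := f.Write([]byte("write test"))
        return errs.Combine(writeErr, f.Close(), os.Remove(f.Name()))
    }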
Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
Jira: https://storjlabs.atlassian.net/browse/USR-822
This is the last step of dropping these 2 db tables. It also deletes
all code associated with them.
Change-Id: I8be840dc2a7be255cf6308c9434b729fe4d9391e
Add a config so that some percent of users require credit cards /
account balances in order to create a project or have a promotional
coupon applied.
The UI was updated to match the needed paywall status.
At this point we decided not to use a field to store if a user is in an
A/B test, and instead just use math to see if they're in a test. We
decided to use MD5 (because it's in Postgres too) and the user UUID for
that math.
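A hypothetical sketch of that math (names and bucketing scheme
assumed; uses "crypto/md5"):

    // inPaywall reports whether a user falls into the paywalled
    // percentage, derived deterministically from the MD5 of their UUID.
    func inPaywall(userID [16]byte, fraction float64) bool {
        sum := md5.Sum(userID[:])
        return float64(sum[0]) < fraction*256
    }

Since Postgres also has an md5() function, the same bucketing can be
reproduced in SQL when needed.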
Change-Id: I0fcd80707dc29afc668632d078e1b5a7a24f3bb3
Prevent the storagenode from implicitly recreating missing dbs and
storage, as such behaviour leads to audit failures. Do not allow the
storagenode to start if any of its dbs or storage is missing or
corrupted, or if the dedicated storage disk is unmounted, so that we
get downtime instead.
Change-Id: Ic64e1f0ff4d8ef5b2fddbe7a7e53df4f4bd8652e
We passed in revocationDB and metainfoDB for no reason.
Let's remove them from the dependency list to further reduce the
footprint.
Change-Id: Ic0317bb92670fbd305d4a8b0ed1cb82858e2f6d3
Jira: https://storjlabs.atlassian.net/browse/USR-821
The `migrate-credits` billing command checks the available credits
balance for all users and moves it to the Stripe balance by creating a
new credit balance transaction.
Change-Id: Iafc7b95a4edad47f7c145a22e210f8c821ac183d
Add a long description to the graceful exit command to clearly mention
that the command is interactive, asking which satellite the SNO wants
to exit.
Change-Id: Icd4056a470e707322f600133e63d9dc56eb877b7
When a request comes in on the satellite api and we validate the
macaroon, we now also check if any of the macaroon's tails have been
revoked.
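A hypothetical sketch of the check (helper and method names assumed):

    // A macaroon is rejected if any of its tails appears in the
    // revocation list.
    revoked, err := revocationDB.ContainsAny(ctx, mac.Tails(secret))
    if err != nil {
        return err
    }
    if revoked {
        return errs.New("macaroon is revoked")
    }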
Change-Id: I80ce4312602baf431cfa1b1285f79bed88bb4497
when a user runs `uplink share`, they get a bunch of results back,
given their configuration and existing access. one of the results
is a URL for in-browser sharing and hosting of the file.
first off, we want to make sure this URL is read only. we want to
avoid a situation where someone posts this URL to some public
location, not realizing the access allows writes or deletes. if
a user really wants a URL with write/delete access, they can
construct it themselves.
secondly, we want to make sure the url is sharing a single path or
path prefix. a url for multiple paths/path prefixes can of course be
constructed independently, but that should not be the default
behavior
Change-Id: I2ca2ebeea9f1c7d4bfbd7a437a32dc7a3b2a32cc
Reverts https://review.dev.storj.io/c/storj/storj/+/2074
That change broke all billing commands with `sql: database is closed`.
Fixing the issue does not make sense as the whole setup of billing
commands is now error-prone. It's better to revert it.
Change-Id: Id13f817b73c7313ba60181f740b0712e4bdce077
During setup we don't have default ports as a part of the
configuration anymore, because the satellite-addr flag was removed.
Change-Id: Ibf9fc4b399beaf51ebb9461de2d8994a322f9686
This changeset allows a user agent string to be set during uplink
setup, which is thereafter used for partner value attribution, e.g.
uplink setup --client.user-agent "MyCompany"
Change-Id: Iefa8755fccc06acb8a303a342b943cece44a81f7
Remove unneeded observer test cases after the changes that enhanced
the segment-reaper to no longer be limited to objects with up to 65
segments.
Also remove some test cases that were already covered by the
TestObserver_processSegment_from_to test function.
Change-Id: I2385b406b50f7810ed22d3297b903a993f83fcfe
Viper has a feature/bug where all YAML config keys are cast
to lowercase (see https://github.com/spf13/viper/issues/260
and https://github.com/spf13/viper/issues/411).
Since we use Viper to load our config values, it doesn't seem
like there's an easy way to preserve case sensitivity for the
access names chosen by users right now. Adding this prompt
should help user experience by clarifying that all access
names must be lowercase.
Change-Id: I47e8344bb0ca7e78458405496f20e78e3c9f9a88
Rename a test function to indicate the function that it actually tests
rather than indicating the function that calls the function under test.
Change-Id: I2c1fe165e3acf952993a9a65dfbfbdcda0d1cccc
After the segment-reaper was enhanced to support objects with any
number of segments, the test helper function that generates test data
no longer has to support generating objects with a number of segments
up to or over the previous limit.
This commit removes that unnecessary test logic.
Change-Id: I2897c5a96fb133f61de2cdb0ef7d13ee5e0151e8
This allows seeing logs in the output of the invoice commands.
The existing ensure-stripe-customer command is moved from the 'reports'
to the new 'billing' root command.
Change-Id: I752c7ab6ca59bfac8e0f174a45d2ab45fc18e467
This makes it possible to cover more testing scenarios.
Currently, it's hard to implement a test suite for payments because
mockpayments is at too high a level and we cannot emulate many things,
e.g. adding a credit card. This change is the first step towards being
able to add a mock for the Stripe client and do more granular tests.
Change-Id: Ied85d4bd0642debdffe1161657c1e475202e9d23
To avoid including multiple months in a single invoice, we need all
the inspector's invoice commands to run for a specific period.
See https://storjlabs.atlassian.net/browse/USR-725
Change-Id: I3637dc189234f02350daca8d897c21765762ea55
See https://storjlabs.atlassian.net/browse/SM-752
These changes allow us to change the log level at runtime through a handler off of the debug endpoint.
Examples of changing the log level on storj-sim
To get the current level for the satellite api process:
curl -XGET 'http://127.0.0.1:10009/logging' --header 'Content-Type: text/plain'
To change the log level:
curl -XPUT 'http://127.0.0.1:10009/logging' --header 'Content-Type: text/plain' --data-raw '{"level":"error"}'
Change-Id: I05d164b290929fa06b6d78c01075ee41f8238044
Reorganize some operations in the observer.processSegment method:
* To slightly reduce memory usage, remove the segments of the objects
marked as skipped. Skipped objects aren't discarded by the analysis
stage, so their segments aren't needed.
* Return earlier when an object is skipped, because no further
processing is needed.
Change-Id: I210a26c394477ee411ff7f640507dcc07733a47f
Rename a test name and an error class because the bitmask was renamed
to bitArray in a previous commit; however, they weren't updated.
Change-Id: I92afcf747dec9e0a15d4c59b896e07b942f3518b
The bitArray.IsSequence method had a bug that sometimes caused the test
"Bitmask/IsSequence/sequence starts at other index than 0" to fail
because it uses random values.
The test mostly found a corner case; I've added new tests to catch the
bug and then applied the fix.
Change-Id: If01dc07006d261ba40bbd99d36ef776e09ed3efc
Change the bitmask used by the segment reaper to use []byte rather than
uint64.
This passes tests but I have literally no clue how to integration test this.
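Roughly, a []byte-backed bit array works like this (a sketch, not the
actual implementation):

    type bitArray []byte

    // Set marks segment index i as seen, growing the backing slice as
    // needed, so objects are no longer capped at 64 segments.
    func (b *bitArray) Set(i int) {
        if byteIndex := i / 8; byteIndex >= len(*b) {
            *b = append(*b, make([]byte, byteIndex-len(*b)+1)...)
        }
        (*b)[i/8] |= 1 << uint(i%8)
    }

    // Has reports whether segment index i has been seen.
    func (b bitArray) Has(i int) bool {
        return i/8 < len(b) && b[i/8]&(1<<uint(i%8)) != 0
    }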
Change-Id: I393f4598b27cae6e427da2190dd3109bca721c34
Currently uploads can cause a lot of IOPS; reduce this by introducing
an in-memory buffer on top of the file.
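A minimal sketch of the idea using the standard library (the actual
change may buffer differently; upload and path are placeholders):

    f, err := os.Create(path)
    if err != nil {
        return err
    }
    // Writes accumulate in memory and hit the disk in large chunks,
    // reducing the number of I/O operations per upload.
    w := bufio.NewWriterSize(f, 256*1024)
    if _, err := io.Copy(w, upload); err != nil {
        return errs.Combine(err, f.Close())
    }
    return errs.Combine(w.Flush(), f.Close())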
Change-Id: I5f4e3e01c0a36258271d180b922107de447bcb59
CreateTables hasn't been quite true for a while now; rename it to
MigrateToLatest to be clearer about its behavior.
Change-Id: Ida48e95122a5d9b7a814e922d3698e00024a2ba7
when tracing is enabled, we should also set sampling rate to
a non-zero value. For now, we will set it to 1.
Uplink CLI users should be able to override it with the sample
flag.
Change-Id: I8bcf514fb14c2a1c4349b7957dd24ec23e4a85e5
In cases like the segment reaper script connecting to the metainfodb,
we don't want a db migration to happen automatically when we call
metainfo.NewStore. This adds a MigrateToLatest method for postgreskv
and cockroachkv, and calls MigrateToLatest in places where NewStore
used to create tables.
Change-Id: I682d0f26d609af0601dfdb32a24866cdf5d32a7e
Previously we were using tracing.sampled as the switch for turning
tracing on/off. However, we would like to separate the sampling rate
from being the switch, so we can set the sampling rate to 0 but still
initialize tracing for the satellite and storagenodes.
Change-Id: I27e6ba25ea6f6b612b4e1a57cf1301889ded41ec
We want to make tracing opt-in.
For now, we will use `tracing.sample` as the toggle config to enable
or disable tracing, and default to sampling every trace from the
uplink cli.
If a user wants to change the default sampling rate, they can do so by
using the `--tracing.sample` flag to override the default value.
Change-Id: I6f25dac0f43024c50a8aaf6c549e6a514211f834
currently production uses a different application suffix for gc
services, so chronograf can distinguish between gc processes and core
processes, but it'd be nice to be a bit more consistent with repairers
and api servers
Change-Id: Icb96fed006c59d7afd730317d35636a6e4573b58
Currently storj-sim relies on the log lines being exactly the same;
when they change, it cannot find the necessary information in the log.
Change-Id: Ia039915ef3375a7cf60f107b2c05c958de15b6d5
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.
Replace uses of dbutil.BytesToUUID with uuid.FromBytes.
Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
The previous split into a storj.io/private repository broke the
tag-release.sh script. This is a minimal temporary fix to make things
work.
This links the build information to specified variables and sets them
inline. This approach, of course, is very fragile.
Change-Id: I73db2305e6c304146e5a14b13f1d917881a7455c
* debug
* traces
* cfgstruct
* process
Package `storj/private/version` will be removed as a separate change.
Change-Id: Iadc40faa782e6225513b28218952f02d9c240a9f
We are not generating an identity for the gateway during setup anymore;
it's generated on the fly by libuplink. With this change we can remove
`--identity-dir` from the gateway.
Change-Id: I63298115d4399564b2f29541b8dc16e3b3acdcf2
The admission/v3 protocol now supports arbitrary key/value headers to be
included in each packet of metrics. This commit creates support for
this, so the lua config file can declare a filter taking into account
the key/value headers.
Change-Id: I41de8c018d33304ccf46ec221ae689d55c5fb1ee
Previously, we were simply discarding rows from the repair queue when
they couldn't be repaired (either because the overlay said too many
nodes were down, or because we failed to download enough pieces).
Now, such segments will be put into the irreparableDB for further
and (hopefully) more focused attention.
This change also better differentiates some error cases from Repair()
for monitoring purposes.
Change-Id: I82a52a6da50c948ddd651048e2a39cb4b1e6df5c
The new API has a limited number of options to configure at the
moment. We should remove unused flags from the Uplink CLI and add them
back if needed in the future.
Change-Id: Icf3f3dadd43cb61a3b408b02d0762aef34425dbf
On satellite, remove all references to free_bandwidth column in nodes table.
On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated.
Protobuf message, NodeCapacity, is left intact for backwards compatibility.
Once this is released to all satellites, we can drop the column from the DB.
Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838
Using a require directive for storj.io/storj makes the import
unambiguous. This means it is possible to have a module named
storj.io/storj/cmd/gateway.
Change-Id: I98439cbbaf433ae31309b7f80a19ced896018f65
Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are. We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP Stackdriver has support for continuous profiling, so let's set that up and see if it is helpful to gather more data.
This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console.
To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console
The current implementation assumes the workload is running in GKE, however if we find it useful we can add support to enable this from anywhere. But for simplicity, it's configured this way assuming the main goal is to enable it in production systems.
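For reference, a sketch of wiring up the GCP profiler (the service name
is illustrative):

    import "cloud.google.com/go/profiler"

    if err := profiler.Start(profiler.Config{
        Service: "satellite-api", // the debug.profilename config value
    }); err != nil {
        log.Printf("profiler failed to start: %v", err)
    }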
Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
The code to generate monkit.lock has a bug where it doesn't take
ScopeNamed into account and assumes the package. Since the downgrade
file was created from monkit.lock, we also assumed the package, so
we were downgrading to the wrong metric.
No other places call ScopeNamed that would cause a problem.
Change-Id: If9fbbd971a7d755f5de33ed20b8a6bcc95670ee3
This peer will contain our administrative panels.
It's completely separated from our other satellite
processes because it allows better control for restricting
access to it.
Change-Id: Ifca473bee82ff6c680b346918ba32b835a7a6847
In case the endpoint doesn't start, it might end up waiting
indefinitely for it to come up, stalling jenkins.
Change-Id: Ib10bf1a25461e7532ec56ca705178bc9a7f85d12
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
Currently we risk losing pending bandwidth rollup writes even on a clean
shutdown. This change ensures that all pending writes are actually
written to the db when shutting down the satellite.
Change-Id: Ideab62fa9808937d3dce9585c52405d8c8a0e703
The setup command of uplink has to create the configuration directory
just before saving the configuration file, to make it more robust than
creating it at the start of the process.
Creating the directory at the beginning of the process leaves open the
possibility of that directory being deleted during the setup process,
which leads to a failure.
Ticket https://storjlabs.atlassian.net/browse/V3-3545
Change-Id: I30db0175e23a597e9675d267b4d7e25d5d4c5119
storagenode database preflight check.
Disable preflight database check by default, and have the option to
enable it. This will allow us to enable it once it is definitely
working.
Also change the name of the config flag for preflight time sync.
Change-Id: Ie2e20f9e25dcb38794eafa7e1505e7c6ff287c99
for storagenode
Ensure that database schema matches latest test migration schema before
allowing the node to start up.
Ensure minimal read/write functionality for each storagenode database
before allowing the node to start up.
This will eliminate many unhandled audit errors we are seeing.
Change-Id: Ic0e628b04a9c35b7a8243f6a81d4683918170ba9
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.
the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.
after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.
since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.
Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
JIRA: https://storjlabs.atlassian.net/browse/V3-3499
The `uplink share` command does not print the restricted API key and the
restricted encryption access anymore.
Change-Id: Ie4ebe0b27067ee00af97c775f4e06f558b894fe2
We want to make using uplink as easy as possible. That's why we want
to avoid requiring the setup or import command before normal usage if
the user specified the --access flag. If this flag is set, then the
rest of the flags should be set to defaults.
https://storjlabs.atlassian.net/browse/V3-3490
Change-Id: I95a7bd77a3f00b8d9981fee513e9e77aef298bca
We decided that a better name for "scope" is "access". This change
refactors the cmd part of the code but doesn't touch libuplink. For
backward compatibility, old configs with the "scope" field will be
loaded without any issue. The old "scope" flag won't be supported
directly from the command line.
https://storjlabs.atlassian.net/browse/V3-3488
Change-Id: I349d6971c798380d147937c91e887edb5e9ae4aa
Remove starting up messages from peers. We expect all of them to
start; if they don't, then they should return an error explaining why
they don't start.
The only informative message is when a service is disabled.
When doing the initial database setup, each migration step isn't
informative, hence print only a single line with the final version.
Also use shorter log scopes.
Change-Id: Ic8b61411df2eeae2a36d600a0c2fbc97a84a5b93
As discussed, we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting and burst limiting if need be in the future.
The default for this rate limiting is still no limits so it stays the
same as our previous functionality.
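A sketch of the mechanism using golang.org/x/time/rate (assuming
that's the limiter in use; cfg, items, and process are placeholders):

    // A limit of rate.Inf disables limiting, which preserves the
    // previous behavior as the default.
    limiter := rate.NewLimiter(rate.Limit(cfg.RateLimit), cfg.Burst)
    for _, item := range items {
        if err := limiter.Wait(ctx); err != nil {
            return err
        }
        process(item)
    }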
Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
Fixes Least Authority Issue F:
https://storjlabs.atlassian.net/browse/V3-3409
If the --allowed-path-prefix flag is not passed to the `share` command,
any command arguments will be used as allowed path prefixes.
This patch also improves the output of the `share` command to print the
state of all restrictions, so users can confirm they match their
intention.
Change-Id: Id1b4df20b182d3fe04cb2196feea090975fce8b4
flate compression with default settings plays very poorly together
with the race detector, causing tests to take a significant amount
of time.
Use pass-through compression to avoid the issue.
Improves test from 2m45s to 17s.
Change-Id: Iadf1381c538736d48e018164697bdfd3356e24b8
This change updates the three satellite report commands that accept date
ranges to parse and treat those dates uniformly.
- End dates are now uniformly exclusive. Exclusive end dates help
operators avoid off-by-one errors on month boundaries, as in the
operator does not have to remember how many days are in that month and
can just run the report from the 1st (inclusive) through the 1st
(exclusive).
- Fixed the date range validity check which only failed if the start
date came after the end date (it should have failed dates that were
equal since the check happened after adjusting for inclusivity).
Change-Id: Ib2ee1c71ddb916c6e1906834d5ff0dc47d1a5801
- also updated ping chore to pick up trust changes
- fixed small typo in blueprint
- fixed flags for storj-sim
- wired up changes to testplanet
Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d
Add randomized test cases for testing the observer processSegment method
when the observer has a time range (from and to) set and there are
objects with segments whose creation date is outside of this range.
Change-Id: Ieac82a21f278f0850b95275bfdd2e8a812cc57a5
the old code to eventually create an api key on a satellite had
some issues. namely, there were some ignored errors, swallowed
errors, incorrect returns of nil errors when there should have
been an error, and it did not handle working against an already
existing database.
this commit fixes the above issues and organizes the code into
a set of methods performing individual steps rather than one big
function. it adds retries and attempts to get existing values
instead of creating them when possible, which means that it will
work if the values already exist. additionally, it removes the
3 second sleep in favor of a bounded retry loop with a small sleep
which improves startup times.
Change-Id: I4de04659e5a62dd3f675fbf3c76f3311c410a03e
After changing how we execute the storagenode-updater process we lost
timestamps in the log.
The fix is to start using zap logging.
The Windows Installer is changed to register the storagenode-updater
service in a way that the Windows Service Manager passes the
--log.output flag instead of the old --log.
The old --log flag is deprecated, but not removed. We will support it
for backward compatibility. This is required as the storagenode-updater
can auto-update itself, but the Windows Service Manager of this old
installation will continue passing the old --log flag when starting it.
Change-Id: I690dff27e01335e617aa314032ecbadc4ea8cbd5
Signed-off-by: Kaloyan Raev <kaloyan@storj.io>
Implement some unit test cases for the observer.processSegment method
and fix a bug found by these tests.
A production snapshot would be more certain, but it's too huge to keep
in the repository and to run the test with in the CI.
We want to have tests for the different cases of detecting zombie
segments, and relying on production data cannot guarantee covering all
of them.
NOTE: the tests have been implemented with random values so that they
don't always use the same combination of segment lists and values,
avoiding the implementation getting stale due to the test. The downside
is that they are harder to understand and, if the implementation has
some bug, we could get failures in the CI only sometimes.
The random tests allow us to eventually check cases that a static test
may not cover, because they are not expressed or because expressing
them would require implementing a large number of cases.
Because the test implementation is complex, we are worried that we
could have bugs in it, despite the fact that the same bugs should then
exist in both the implementation and the test; moreover, the
implementation and the tests have been written by different people.
Because of that, we may replace these random tests entirely with a
list of static ones.
for storj-sim to work, we need to avoid schemas in cockroach urls
so we have storj-sim create namespaced databases instead of schemas
and we have the migrate command create the database in the same way
that it would create a schema for postgres. then it works!
a follow up commit will move the creation of the database/schemas
into storj-sim's setup step so that we can avoid doing these icky
creations during normal migration calls. it will also make the
pointerdb have an explicit call to migrate instead of just doing
it every time it's opened.
Change-Id: If69ef5cb96b6866b0438c761bd445afb3597ae5f
* Move the observer implementation, the type definitions related to
it, and the helper functions to their own file.
* The Cluster struct type was used as the key of the ObjectsMap type.
The Observer struct type has a field of type ObjectsMap.
Cluster has a field for the project ID.
The Observer processSegment method uses its ObjectsMap field for
tracking objects.
However, Observer processSegment clears the map once the projectID
diverges from the one kept in the Observer lastProjectID field, meaning
that there is no need to keep the projectID as part of the ObjectsMap
key.
For this reason, ObjectsMap can use just the bucket name as its key,
and the Cluster struct isn't needed.
Because of this change, the ObjectsMap type has been renamed to a more
descriptive name.
* Make the types defined for this specific package unexported.
* Create a constructor function for observer to encapsulate the map
allocation.
* Don't throw away the entire buckets objects map; clear and reuse it
rather than allocating an empty one, to keep part of the allocations.
Encapsulate the clearing logic into a method.
* Make the analyzeProject function a method of observer.