This change is adjusting metainfo endpoint to use metabase for uploading
and downloading remote objects. Inline segments will be added later.
Change-Id: I109d45bf644cd48096c47361043ebd8dfeaea0f3
We shouldn't have any EOF issues with recent drpc fix, let's reenable
and see whether it's still flaky.
Change-Id: I0de312bcb087c7f70ec9d3281d73d86f971845d5
Jira: https://storjlabs.atlassian.net/browse/PG-69
There are a number of segments with piece_hashes_verified = false in
their metadata) on US-Central-1, Europe-West-1, and Asia-East-1
satellites. Most probably, this happened due to a bug we had in the
past. We want to verify them before executing the main migration to
metabase. This would simplify the main migration to metabase with one
less issue to think about.
Change-Id: I8831af1a254c560d45bb87d7104e49abd8242236
This PR updates `uplink rb --force` command to use the new libuplink API
`DeleteBucketWithObjects`.
It also updates `DeleteBucket` endpoint to return a specific error
message when a given bucket has concurrent writes while being deleted.
Change-Id: Ic9593d55b0c27b26cd8966dd1bc8cd1e02a6666e
This PR changes DeleteBucket to be able to delete all objects within a
bucket if `DeleteAll` is set in `BucketDeleteRequest`.
It also changes `DeleteBucket` API to treat `ErrBucketNotFound` as a
successful delete operation instead of returning an error back to the
client.
Change-Id: I3a22c16224c7894f2d0c2a40ba1ae8717fa1005f
This adds the unimplemented GetObjectIPs method to metainfo endpoint so
we can import new common protobuf definitions.
Change-Id: I154f26baccb6bb3c66de3eb25611930545c9754b
Why: We need a way to cut down on database traffic due to bandwidth
measurement and tracking.
What: This changeset is the Satellite side of settling orders in 1 hr windows.
See design doc for more details: https://review.dev.storj.io/c/storj/storj/+/1732
Change-Id: I2e1c151e2e65516ebe1b7f47b7c5f83a3a220b31
What:
Use the github.com/jackc/pgx postgresql driver in place of
github.com/lib/pq.
Why:
github.com/lib/pq has some problems with error handling and context
cancellations (i.e. it might even issue queries or DML statements more
than once! see https://github.com/lib/pq/issues/939). The
github.com/jackx/pgx library appears not to have these problems, and
also appears to be better engineered and implemented (in particular, it
doesn't use "exceptions by panic"). It should also give us some
performance improvements in some cases, and even more so if we can use
it directly instead of going through the database/sql layer.
Change-Id: Ia696d220f340a097dee9550a312d37de14ed2044
When a request comes in on the satellite api and we validate the
macaroon, we now also check if any of the macaroon's tails have been
revoked.
Change-Id: I80ce4312602baf431cfa1b1285f79bed88bb4497
This allows to seeing logs in the output of the invoice commands.
Existing ensure-stripe-customer commands is moved from the 'reports' to
the new 'billing' root command.
Change-Id: I752c7ab6ca59bfac8e0f174a45d2ab45fc18e467
To avoid including multiple months in a single invoice, we need all
inspector's invoice commands to run in for specific period.
See https://storjlabs.atlassian.net/browse/USR-725
Change-Id: I3637dc189234f02350daca8d897c21765762ea55
See https://storjlabs.atlassian.net/browse/SM-752
These changes allow us to change the log level at runtime through a handler off of the debug endpoint.
Examples of changing the log level on storj-sim
To get the current level for the satellite api process:
curl -XGET 'http://127.0.0.1:10009/logging' --header 'Content-Type: text/plain'
To change the log level:
curl -XPUT 'http://127.0.0.1:10009/logging' --header 'Content-Type: text/plain' --data-raw '{"level":"error"}'
Change-Id: I05d164b290929fa06b6d78c01075ee41f8238044
when tracing is enabled, we should also set sampling rate to
a non-zero value. For now, we will set it to 1.
Uplink CLI users should be able to override it with the sample
flag.
Change-Id: I8bcf514fb14c2a1c4349b7957dd24ec23e4a85e5
Previously we are using tracing.sampled to be the switch for turning on/off tracing.
However we would like to separate sampling rate from being the switch,
so we can set sampling rate to be 0 but still intialize tracing for
satellite and storagenodes
Change-Id: I27e6ba25ea6f6b612b4e1a57cf1301889ded41ec
When we receive a piece deletion request, include the number of piece
IDs we couldn't add to the queue in the reponse
Change-Id: Ibebbe92ac50105bb5c74b18211ed38d468eb33f3
Automatically attach attribution information to bucket during
BeginObject or CreateBucket when the UserAgent is set.
Change-Id: I405cb26c5a2f7394b30e3f2cf5d2214c8781eb8b
* Add migration to storagenode reputation table to add suspended
timestamp
* Send suspended info to storagenode from satellite nodestats endpoint
* Add suspended status to storagenode api
* Add an indicator on the storagenode dashboard informing operator of
the satellites the node is suspended on
Change-Id: Ie3669f6069cc0258ba76ec99d17006e1b5fd9c8a
size with BeginObject
Such solution will add one round trip to satellite during upload so for
now we are reverting this until we will have solution for this.
Change-Id: Ic2d826448ab7b0318cd6922df05deee9167cf2f0
BeginObject response
We want to control inline segment size and segment size on satellite
side. We need to return such information to uplink like with redundancy
scheme.
Change-Id: If04b0a45a2757a01c0cc046432c115f475e9323c
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.
Replace uses of dbutil.BytesToUUID with uuid.FromBytes.
Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
This will help to determine how many grpc calls are made to the
satellite.
Also remove the grpc funcs that have been added to upstream.
Change-Id: I91878f4fd10f9bfe601c94222c102eaaf4d35963
* debug
* traces
* cfgstruct
* process
Package `storj/private/version` will be removed as a separate change.
Change-Id: Iadc40faa782e6225513b28218952f02d9c240a9f
Switch back to the original DeleteBucket and DeleteObject methods.
Next step: remove the DeleteBucketReturnDeleted and
DeleteObjectReturnDeleted from storj.io/uplink.
Change-Id: I273a305326d411e51ce354ce72fcc6ecadf4dd5f
step 1 in https://review.dev.storj.io/c/storj/uplink/+/1236
Now the old libuplink uses the temporary DeleteBucketReturnDeleted and
DeleteObjectReturnDeleted methods. This way, in the next step, we will
be able to change the DeleteBucket and DeleteObject methods to return
the deleted bucket/object.
Change-Id: I2e638be1960bca6ce1456c92849fcdd6d93e5252
The admission/v3 protocol now supports arbitrary key/value headers to be
included in each packet of metrics. This commit creates support for
this, so the lua config file can declare a filter taking into account
the key/value headers.
Change-Id: I41de8c018d33304ccf46ec221ae689d55c5fb1ee
When a storagenode begins to run low on capacity, we want to notify
the satellite before completely running out of space. To achieve this,
at the end of an upload request, the SN checks if its available space has
fallen below a certain threshold. If so, trigger a notification to the
satellites.
The new NotifyLowDisk method on the monitor chore is implemented using the
common/syn2.Cooldown type, which allows us to execute contact only once
within a given timeframe; avoiding hammering the satellites with requests.
This PR contains changes to the storagenode/contact package, namely moving
methods involving the actual satellite communication out of Chore and into
Service. This allows us to ping satellites from the monitor chore
Change-Id: I668455748cdc6741291b61130d8ef9feece86458
common/pb moved grpc to a separate package common/pb/pbgrpc.
This updates this repository to use it.
Change-Id: I2de2a190688871cf9cb61f7ea511f8a01e264e4e
Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are. We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP stackdriver has support for continuous profiling so lets set that up and see if it is helpful to gather more data.
This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console.
To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console
The current implementation assumes the workload is running in GKE, however if we find if useful we can add support to enable this from anywhere. But for simplicity, its configured this way assuming the main goal is to enable in production systems.
Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
With commit: 3331b443e7, satellite will
start calling `DeletePieces`. Therefore, we can remove the old endpoint
once the above commit is deployed with all satellites
Change-Id: I0124bc00a7cb808d119eb59f8fcd7fadf68158bb
we want to return back to the user as quick as possible but also keep
deleting remaining pieces on the storagenodes
Change-Id: I04e9e7a80b17a8c474c841cceae02bb21d2e796f
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.
this behavior was problematic becaue we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:
1. bound the amount of time we will retry a transaction
2. create new transactions when a retry is needed
the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's more preferrable to
find a way to make them smaller.
the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.
between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.
Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
We move PathCipher to encryption.Store and we need to adjust
storj/uplink for those changes. Uplink repo is also using libuplink to
run tests so we need first adjust storj/storj libuplink and later
storj/uplink.
Change-Id: I84f23e6bad18ac139f72c19939dc526f9f46d88b
this is to help protect against intentional or unintentional
slowloris style problems where a client keeps a tcp connection
alive but never sends any data. because grpc is great, we have
to spawn a separate goroutine for every read/write to the stream
so that we can return from the server handler to cancel it if
necessary. yep. really.
additionally, we update the rpcstatus package to do some stack
trace capture and add a Wrap method for the times where we want
to just use the existing error.
also fixes a number of TODOs where we attach status codes to the
returned errors in the endpoints.
Change-Id: Id8bb8ff84aa34e0f711b0cf9bce3908b36a1d3c1
This reverts commit 8e242cd012.
Revert because lib/pq has known issues with context cancellation.
These issues need to be resolved before these changes can be merged.
Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555
this will allow for some nice runtime analysis down the road.
also, this allows for wrapping database handles in a way that
can interact with these contexts
requires https://review.dev.storj.io/c/storj/dbx/+/514
Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.
https://storjlabs.atlassian.net/browse/V3-3357
Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
these may not be optimal but they're probably better based on
our previous testing. we can tune better in the future now that
the groundwork is there.
Change-Id: Iafaee86d3181287c33eadf6b7eceb307dda566a6
* separate sadb migration, add version check
* update checkversion to do same validation as migration
* changes per CR
* add sa migration to storj-sim
* add different debug port in storj-sim for migration
* add wait for exit for storj-sim migration
* update sa docker entrypoint to support migration
* storj-sim satellite parts all wait for migration
* upgrade golang-migrate/migrate to v4 because bug
* fix go mod tidy
keep a pool of connections open when dialing for drpc. this
makes it so that long lived clients (like lib/uplink's Project)
don't continue to use a bad connection forever. it also allows
for concurrent rpcs.
Change-Id: If649b286050e4f09c413fadc3e1ce88f5fc6e600
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
The fundamental problem is that both drpc and grpc servers
want to close the listener and they both want to ignore the
error from Accept after the listener is closed. There's no
way to do this in a race free way. Fortunately, the mux
hands out listeners that can be independently closed. That
means they can both do their own shutdown logic where they
ignore the error, and then after they're closed, the code
orchestrating the servers can close the listeners.
The final weird bit is that the server's Close method is
required to wait until the Run method has exited (or at
least enough for the listeners to definitely be closed)
because tests depend on that behavior, so we have to add
some channels/mutexes/onces to ensure that Run has exited
and that a new call can't start after Close is called.
Change-Id: I7c4ef293f7963f83138815f51824fd5b8d09ce15
What: Change cmd/uplink to use scopes
It moves the fields that will be subsumed by scopes into an explicit legacy section and hides their configuration flags.
Why: So that it can read scopes in from files and stuff
* update UI to reflect final mockups
* implement create handler and render offers table data to UI
* fix line-height unit and remove important from selectors
* update file names and ids for clarity
* shorten 'label' in ids
* localize global vars, fix endpoint names, remove unnecessary receiver, fix comments
* fix unnecessary initialization of pointer
* correct file-naming conventions
* register timeConverter in an init func for safety and remove unnecessary important from css
* consolidate create endpoints and add comments
* register timeConverter in init func
* add copyright to files
* introduce require pkg
* add proper http server unit test
* update linting and create offers concurrently in unit test
* fix getOffers comment
* add copy-right to unit-test
* fix data-races
* fix linting
* remove converter in NewServer
* requested changes in progress
* add require for checking status code
* renamed template file
* fix 400 handler
* fix missing copyright and remove extra line
* fix build
* run goroutine for testing parallel
* evaluate reqType with switch stmt and promp for credit amount in cents
* fix lint issue
* add default case
* remove unnecessary var
* fix range scope error
* remove empty lines and use long form for struct field
* fix merge conflicts
* fix template reference
* fix modal id
* not delete package
* add currency formatting and requested changes
* markup formatting
* lean out currency logic and move wait outside loop
* pass ToDollars func to home template
* fix lint
* update UI to reflect final mockups
* fix line-height unit and remove important from selectors
* update file names and ids for clarity
* shorten 'label' in ids
* correct file-naming conventions
* add copyright to files
* check if break in imports is causing lint error
* resolve lint
* tidy go mod
* fix shorthands
* Set up new port 8090 for in offers
Clean up commented code
Rename offers to offersweb
Remove unused code
Add todos for adding front-end templates
Add middleware for only allow local access
Add comment
Fix linting error
Remove commented code
Update storj-sim
Check request IP against Host IP
Use net pakcage to retrieve IP address
Rename service to marketing
* Add wrapper for all errors
* fix conflicts
* update the config file
* fix linting error
* remove unused packages
* remove global runtime var and add flag to storj-sim for mar static dir
* remove debugging lines
* add new config for test data and check if static dir flag is set before passing to mux
* change 'console' to 'marketing' for test data config
* fix linting errors
* update config flag
* Trigger Jenkins
* Trigger CLA
* change BindSetup to be an option to Bind
* add process.Bind to allow composite structures
* hack fix for noprefix flags
* used tagged version of structs
Before this PR, some flags were created by calling `cfgstruct.Bind` and having their fields create a flag. Once the flags were parsed, `viper` was used to acquire all the values from them and config files, and the fields in the struct were set through the flag interface.
This doesn't work for slices of things on config structs very well, since it can only set strings, and for a string slice, it turns out that the implementation in `pflag` appends an entry rather than setting it.
This changes three things:
1. Only have a `Bind` call instead of `Bind` and `BindSetup`, and make `BindSetup` an option instead.
2. Add a `process.Bind` call that takes in a `*cobra.Cmd`, binds the struct to the command's flags, and keeps track of that struct in a global map keyed by the command.
3. Use `viper` to get the values and load them into the bound configuration structs instead of using the flags to propagate the changes.
In this way, we can support whatever rich configuration we want in the config yaml files, while still getting command like flags when important.
this change removes the cryptopasta dependency.
a couple possible sources of problem with this change:
* the encoding used for ECDSA signatures on SignedMessage has changed.
the encoding employed by cryptopasta was workable, but not the same
as the encoding used for such signatures in the rest of the world
(most particularly, on ECDSA signatures in X.509 certificates). I
think we'll be best served by using one ECDSA signature encoding from
here on, but if we need to use the old encoding for backwards
compatibility with existing nodes, that can be arranged.
* since there's already a breaking change in SignedMessage, I changed
it to send and receive public keys in raw PKIX format, instead of
PEM. PEM just adds unhelpful overhead for this case.
* cmd/statreceiver: lua-scriptable stat receiver
Change-Id: I3ce0fe3f1ef4b1f4f27eed90bac0e91cfecf22d7
* some updates
Change-Id: I7c3485adcda1278fce01ae077b4761b3ddb9fb7a
* more comments
Change-Id: I0bb22993cd934c3d40fc1da80d07e49e686b80dd
* linter fixes
Change-Id: Ied014304ecb9aadcf00a6b66ad28f856a428d150
* catch errors
Change-Id: I6e1920f1fd941e66199b30bc427285c19769fc70
* review feedback
Change-Id: I9d4051851eab18970c5f5ddcf4ff265508e541d3
* errorgroup improvements
Change-Id: I4699dda3022f0485fbb50c9dafe692d3921734ff
* too tricky
the previous thing was better for memory with lots of errors at a time
but https://play.golang.org/p/RweTMRjoSCt is too much of a foot gun
Change-Id: I23f0b3d77dd4288fcc20b3756a7110359576bf44
* adds channel for getting node out of lookup
* WIP adding the channels to lookups
* WIP adding channel to node lookups
* Wires up FindNodes method with channels
* WIP adds a test suite for lookup - tests are still failing
* WIP wires up use of testplanet for kademlia lookup tests
* WIP merging in node id changes
* Merges in pkg/storj node type changes
* Tests passing
* Lookup node working via Inspector now
* updates
* WIP working on getting tests passing
* WIP getting tests passing
* FindNode works
* Linter fix
* Adds copyrights to lookup_test
* removes a fmt.Printf I missed
* Removes commented out lines
* preparing for use of `customtype` gogo extension with `NodeID` type
* review changes
* preparing for use of `customtype` gogo extension with `NodeID` type
* review changes
* wip
* tests passing
* wip fixing tests
* more wip test fixing
* remove NodeIDList from proto files
* linter fixes
* linter fixes
* linter/review fixes
* more freaking linter fixes
* omg just kill me - linterrrrrrrr
* travis linter, i will muder you and your family in your sleep
* goimports everything - burn in hell travis
* goimports update
* go mod tidy
* protobuf for sending bandwidth agreements to satellite from storage nodes
* Setup process for sending agreements
* Add payer_id to db with bandwidth agreements for better sorting
* Linter errors
* Read agreements from PSDB
* Try writing message to server
* Cleanup
* Basic functionality
* Better error handelling
* Fix test
* setup config and server structure for receiving bandwidth agreements
* Resolve linter issues
* Optional commit for if we want to handle deletes all at once
* add identity to Server, add logic for receiving bandwidth messsages
* Bandwidth agreement DBX creation and integration with bw agreement endpoint
Co-authored-by: Kishore <kishore@storj.io>
Co-authored-by: Cam <cameron@storj.io>
* protobuf for sending bandwidth agreements to satellite from storage nodes
* Setup process for sending agreements
* Add payer_id to db with bandwidth agreements for better sorting
* Linter errors
* Read agreements from PSDB
* Try writing message to server
* Cleanup
* Basic functionality
* Better error handelling
* Fix test
* setup config and server structure for receiving bandwidth agreements
* Resolve linter issues
* Optional commit for if we want to handle deletes all at once
* add identity to Server, add logic for receiving bandwidth messsages
* Bandwidth agreement DBX creation and integration with bw agreement endpoint
Co-authored-by: Kishore <kishore@storj.io>
Co-authored-by: Cam <cameron@storj.io>
* added postgres create/read/delete test function
Co-authored-by: kishore <kishore@storj.io
Co-authored-by: cam <cameron@storj.io>
* edit comment
* removed sqlite3 driver from dbx
* remove generated sqlite code, add dbx read limitoffset
* remove getServerAndDB function, rename getDBPath to getPSQLInfo
* WIP writing server endpoint test
* code review changes
..although it ought to work for other storage.KeyValueStore needs as
well. it's just optimized to work pretty well for a largish hierarchy of
paths.
This includes the addition of "long benchmarks" for KeyValueStore
testing. These will only be run when -test-bench-long is added to the
test flags. In these benchmarks, a large corpus of paths matching a
natural ("real-life") hierarchy is read from paths.data.gz (which you
can get from https://github.com/storj/path-test-corpus) and imported
into a particular KeyValueStore. Recursive and non-recursive queries are
run on it to detect performance problems that arise only at scale.
This also includes alternate implementation of the postgreskv client,
which works in a less-bizarre way for non-recursive queries, but suffers
from poor performance in tests such as the long benchmarks. Once this
alternate impl is committed to the tree, we can remove it again; I just
want it to be available for future reference.
* begin adding path encryption
* do not encrypt/decrypt first element of path (bucket)
* add path encryption for delete and list
* use encrypted paths in streamstore.Meta
* fix listing with encrypted paths
* move encrypt/decryptAfterBucket to streamstore
* fix listing with no prefix
* remove duplicate logic for listing with no prefix
* initial commit- wip, working on testing and library
* wip working on testing library functioon to get poointer
* working on nil reference for testing
* tests wip
* wip-working on getting tests to work
* working on tests
* put test passes
* working on test- need to export
* created pdclient, and now working on testing function
* tests working for list- getting object back
* wip - got derived piece id
* fixed making grpc public
* fixed linter errors and minor added method for size
* need to work on testing, added random integer function
* got psc server working for testing
* working on ranger test and ranger method
* testing creds for new computer
* working on getting segment metadata
* get random stripe
* added caveat to random fn
* fixed data types
* modified library to have one public function that returns a random stripe
* removed extra comments
* added commons.go file for audit
* added last path to be remembered
* changed random function to cryto/rand/ & worked on tests passing
* working on testing to get analysis of randomness
* changed to track last item in pagination
* finished testing randomness, cleaned up code
* fixed rebase errors
* removed error, kept common file
* fixed travis errors
* attempt to fix overlay issue
* fixed travis error
* updated pointer parameters
* made smaller functions, renamed audit
* made changes per suggestions
* removed gosum
* fixed pr per suggestions
* removed comment
* Loads cache from context for PointerDB access
* WIP adds overlay lookups to pointerdb requests
* Pointer lookup code is added for Get
* adds feature flag for pointerdb return
* refactors pointerdb code
* removes some unnecessary debug logs
* Fixes indent in config
* adds early return for non-remote pointers
* formats code, removes some comments
* Fixes tests broken by pointer proto changes
* adds error check and merges variable declaration
* removes commented out proto import
* adds error check to pdbclient