This should stop the failures when TestCommandLineTool is run with
parallel test execution enabled.
Change-Id: Id80d5eacb78fcec886be786ae8b182517b17fbc6
Clarify and expand some tests, and add a large test that tries to cover
the whole segment-verify stack.
Trying to reproduce a problem we saw in production where the Found count
in the output csv increases with each entry, eventually reaching the
thousands, instead of representing the actual number of pieces found for
each entry.
Change-Id: I65e342fad7dc1c350830fd0c8ce75a87a01d495c
Copy full wasm folder to /app/static/ instead of explicit files because access.wasm file name is dynamic now (includes hash).
Change-Id: I5e8cebccdf5afdf024174b6bdc169fc4923b9ed1
This part can be called from multiple goroutines, therefore we should bw prepared for concurrent run.
Change-Id: I7acf1a29bdb51427d3d03f501b58b190dcf08412
There are 3 different ways to execute segment verify.
When the bucket based segment list is used, the code tries to reuse Segments objects.
But without resetting the stat, it will create bad results.
(This is not the case of the other type of runs, as there we create arrays in each loop)
Change-Id: Ie2d52c7e44088a85d4a3ce541da1c5ff767591d6
Add billing commands to manually complete or fail storjscan invoice
payments that are stuck in a pending state.
Change-Id: Ia19f0a2597201d9d17aad0889eaedff095d706b9
This change allows tag-signer to fail when key-value pairs
provided as arguments are comma-separated.
However, for cases where a value is expected to contain a comma,
we validate the value only if --confirm flag is specified
Resolves https://github.com/storj/storj/issues/6336
Change-Id: Ib6a100ee3adf529f44c8b3ca620a3c0b4f953a17
This change ensures that files generated by building the Vuetify
project are copied to the satellite container.
References #6251
Change-Id: If56fe754d51f1487a3b3c2cf98c40e3010539121
Currently, graceful exit is a complicated subsystem that keeps a queue
of all pieces expected to be on a node, and asks the node to transfer
those pieces to other nodes one by one. The complexity of the system
has, unfortunately, led to numerous bugs and unexpected behaviors.
We have decided to remove this entire subsystem and restructure graceful
exit as follows:
* Nodes will signal their intent to exit gracefully
* The satellite will not send any new pieces to gracefully exiting nodes
* Pieces on gracefully exiting nodes will be considered by the repair
subsystem as "retrievable but unhealthy". They will be repaired off of
the exiting node as needed.
* After one month (with an appropriately high online score), the node
will be considered exited, and held amounts for the node will be
released. The repair worker will continue to fetch pieces from the
node as long as the node stays online.
* If, at the end of the month, a node's online score is below a certain
threshold, its graceful exit will fail.
Refs: https://github.com/storj/storj/issues/6042
Change-Id: I52d4e07a4198e9cb2adf5e6cee2cb64d6f9f426b
This patch removes the following lines from the output (with disable tracing + set the log level to warn):
```
2023-09-25T15:30:58+02:00 INFO process/tracing.go:73 Anonymized tracing enabled
2023-09-25T15:30:58+02:00 DEBUG tracing collector monkit-jaeger@v0.0.0-20220915074555-d100d7589f41/udp.go:128 started
2023-09-25T15:30:58+02:00 DEBUG process/debug.go:37 debug server listening on 127.0.0.1:34803
```
Change-Id: Iccbf4fc3bde9436e0571943d0d85c51ebc766ef9
The repair checker and repair worker both need to determine which pieces
are healthy, which are retrievable, and which should be replaced, but
they have been doing it in different ways in different code, which has
been the cause of bugs. The same term could have very similar but subtly
different meanings between the two, causing much confusion.
With this change, the piece- and node-classification logic is
consolidated into one place within the satellite/repair package, so that
both subsystems can use it. This ought to make decision-making code more
concise and more readable.
The consolidated classification logic has been expanded to create more
sets, so that the decision-making code does not need to do as much
precalculation. It should now be clearer in comments and code that a
piece can belong to multiple sets arbitrarily (except where the
definition of the sets makes this logically impossible), and what the
precise meaning of each set is. These sets include Missing, Suspended,
Clumped, OutOfPlacement, InExcludedCountry, ForcingRepair,
UnhealthyRetrievable, Unhealthy, Retrievable, and Healthy.
Some other side effects of this change:
* CreatePutRepairOrderLimits no longer needs to special-case excluded
countries; it can just create as many order limits as requested (by
way of len(newNodes)).
* The repair checker will now queue a segment for repair when there are
any pieces out of placement. The code calls this "forcing a repair".
* The checker.ReliabilityCache is now accessed by way of a GetNodes()
function similar to the one on the overlay. The classification methods
like MissingPieces(), OutOfPlacementPieces(), and
PiecesNodesLastNetsInOrder() are removed in favor of the
classification logic in satellite/repair/classification.go. This
means the reliability cache no longer needs access to the placement
rules or excluded countries list.
Change-Id: I105109fb94ee126952f07d747c6e11131164fadb
When we do `satellite run api --placement '...'`, the placement rules are not parsed well.
The problem is based on `viper.AllSettings()`, and the main logic is sg. like this (from a new unit test):
```
r := ConfigurablePlacementRule{}
err := r.Set(p)
require.NoError(t, err)
serialized := r.String()
r2 := ConfigurablePlacementRule{}
err = r2.Set(serialized)
require.NoError(t, err)
require.Equal(t, p, r2.String())
```
All settings evaluates the placement rules in `ConfigurablePlacementRules` and stores the string representation.
The problem is that we don't have proper `String()` implementation (it prints out the structs instead of the original definition.
There are two main solutions for this problem:
1. We can fix the `String()`. When we parse a placement rule, the `String()` method should print out the original definition
2. We can switch to use pure string as configuration parameter, and parse the rules only when required.
I feel that 1 is error prone, we can do it (and in this patch I added a lot of `String()` implementations, but it's hard to be sure that our `String()` logic is inline with the parsing logic.
Therefore I decided to make the configuration value of the placements a string (or a wrapper around string).
That's the main reason why this patch seems to be big, as I updated all the usages.
But the main part is in beginning of the `placement.go` (configuration parsing is not a pflag.Value implementation any more, but a separated step).
And `filter.go`, (a few more String implementation for filters.
https://github.com/storj/storj/issues/6248
Change-Id: I47c762d3514342b76a2e85683b1c891502a0756a
This change fixes an issue where the configured user agent wasn't used
by Uplink commands.
Resolvesstorj/customer-issues#999
Change-Id: I2d9f38308eddad7c471a100c968082783c05a3b3
This change adds a new forget-satellite sub-command to
the storagenode CLI which cleans up untrusted satellite
data.
Issue: https://github.com/storj/storj/issues/6068
Change-Id: Iafa109fdc98afdba7582f568a61c22222da65f02
This change removes unused GraphQL code. It also updates storj sim code
to use the GraphQL replacement HTTP endpoints and removes the GraphQL
dependency.
Issue: https://github.com/storj/storj/issues/6142
Change-Id: Ie502553706c4b1282cd883a9275ea7332b8fc92d
the progress bar was being set to inconsistent lengths
multiple times, causing a crash. this fixes that by
only setting the progress bar length once to the length
of the full object. it avoids a round trip by doing so
only after it has gotten the first read handle from the
source, so the length information is cached.
Change-Id: I112d7c79016e54ba3794e96c6174cc01b8baedb4
for very large machines (>10Gbit) it is still useful
to have parallelism for uploads because we're actually
bound by getting new pieces from the satellite, so doing
that in parallel provides a big win.
this change adds back that flag to exist for uploads, and
removes the backwards compatibility code for the flag with
the maximum-concurrent-pieces as they are now independent.
the upload code parallelism story is now this:
- each object is a transfer
- each transfer happens in N parts (size dynamically
chosen to avoid having >10000 parts)
- each part can happen in parallel up to the limit
specified
- each parallel part can have up to the limit of
max concurrent pieces and segments
this change also changes some defaults to be better.
- the connection pool capacity now takes into acount
transfers, parallelism and max concurrent pieces
- the default smallest part size is 1GiB to allow the
new upload code path to upload multiple segments
Change-Id: Iff6709ae73425fbc2858ed360faa2d3ece297c2d
Placement can be null in DB and we need adjust scanning this column
from DB.
Additionally this change sets application name for DB connection.
Change-Id: I3c7d6294f4a3e5e441160b2fd4aeafffe705ec76
We need migrate all existing segment copies to contain all the same
metadata as original segment. So far we were not duplicating stored
pieces but we are changing this behavior right now. We will use this
tool after enabling new way of doing server side copies.
Fixes https://github.com/storj/storj/issues/5890
Change-Id: Ia9ca12486f3c527abd28949eb438d1c4c7138d55
This change creates the ability to run a server separate from the
console web server to serve the front end app. You can run it with
`satellite run ui`. Since there are now potentially two servers instead
of one, the UI server has the option to act as a reverse proxy to the
api server for local development by setting `--console.address` to the
console backend address and `--console.backend-reverse-proxy` to the
console backend's http url. Also, a feature flag has been implemented
on the api server to retain the ability to serve the front end app. It
is toggled with `--console.frontend-enable`.
github issue: https://github.com/storj/storj/issues/5843
Change-Id: I0d30451a20636e3184110dbe28c8a2a8a9505804
placement.AllowedCountry is the old way to specify placement, with the new approach we can use a more generic (dynamic method), which can check full node information instead of just the country code.
The 90% of this patch is just search and replace:
* we need to use NodeFilters instead of placement.AllowedCountry
* which means, we need an initialized PlacementRules available everywhere
* which means we need to configure the placement rules
The remaining 10% is the placement.go, where we introduced a new type of configuration (lightweight expression language) to define any kind of placement without code change.
Change-Id: Ie644b0b1840871b0e6bbcf80c6b50a947503d7df
All the files in uploadselection are (in fact) related to generic node selection, and used not only for upload,
but for download, repair, etc...
Change-Id: Ie4098318a6f8f0bbf672d432761e87047d3762ab
We use two different Node types in `overlay` and `uploadnodeselection` and converting back and forth.
Using the same object would allow us to use a unified node selection interface everywhere.
Change-Id: Ie71e29d60184ee0e5b4547eb54325f09c418f73c
This patch makes satellite container images compatible with storj-up.
Which means that any official release can be easily tested locally.
It means that we need some binaries (like storj-up, dlv) and shall fragments part of the production image, but I think the risk is very low and the benefit is very high.
This is the first towards to unify all the images and make it possible to test/run the same components everywhere (storj-up/nigttly/prod).
https://github.com/storj/storj/issues/5946
Change-Id: Ib53b6213d94f93a793a841dedfe32cc59ef69b72
This change makes dial timeout configurable and change it also from
defatul 20s to 5s. Main motivation is that during repair we often loose
lots of time to dial which eventually will fail. New timeout should be
still enough to dial but we will move forward quicker to next node if
that one will fail.
Timeout is also applied directly as context timeout in case we will
use noise of tcp fast open one day.
Change-Id: I021bf459af49b11241e314fa1a7887c81d5214ea
downloads still need the old copy code because they aren't
parallel in the same way uploads are. revert all the code
that removed the parallel copy, only use the non-parallel
copy for uploads, and add back the parallelism and chunk
size flags and have them set the maximum concurrent pieces
flags to values based on each other when only one is set
for backwards compatibility.
mostly reverts 54ef1c8ca2
Change-Id: I8b5f62bf18a6548fa60865c6c61b5f34fbcec14c
We have a special method to exclude methods (which are called to frequently) from distributed traces.
https://github.com/storj/common/blob/main/tracing/excluded.go
But this works only, if we define the exclusion function during the initialization.
Let's do it in this patch.
Change-Id: Icf12202bd7213b5c0009332ce2755b267f2bdbae
also change the config creation to be more robust to
changes that add defaults in the future by not fully
reconstructing the config value passed in to the
project.
Change-Id: I673e8b54ce0b951ae735bf4658525c477c26ac5a
the inspector tool of satellite is removed recently. We need to remove it from the build/docker files.
Change-Id: Icd75474ea64fe5f9dd4ed76d8597982108f93536