to support TCP_FAST_OPEN, we're considering just using
two TCP connections in parallel per request, one with
and one without. this allows us to safely fire both
concurrently without stressing out the node too much.
see https://review.dev.storj.io/c/storj/storj/+/9933
Change-Id: I9aa8a0252350db5ace04ee125bfe469203e980ec
this change uses the new storj/common noise helpers, which:
* add a security fix (require an expected node id for validating
noise key attestations)
* stops doing an unnecessary order signature validation (it's
already been done inside of PutPiece)
* removes some duplicate code
Change-Id: I5e67a08ff216cd9c5b0b82e40b4d9de664b6b0fc
uplinks currently get the node's certificate chain over TLS. once Noise
is in use, uplinks will no longer be able to do this. we should start
having the upload request return the certificate chain in the same
release that starts supporting noise.
Change-Id: I619b23cb8e25691bcc62d760f884403a4ccd64a0
Starting restore trash in the background allows the satellite to
continue to the next storagenode without needing to wait until
completion.
Of course, this means the satellite doesn't get feedback whether it
succeeds successfully or not. This means that the restore-trash needs to
be executed several times.
Change-Id: I62d43f6f2e4a07854f6d083a65badf897338083b
Running all of the migrations necessary to initialize a storage node
database takes a significant amount of time during runs.
The package current supports initializing a database from manually coalesced
migration data (i.e. snapshot) which improves the situation somewhat.
This change takes things a bit further by changing the snapshot code to
instead hydrate the database directory from a pre-generated snapshot zip
file.
name old time/op new time/op delta
Run_StorageNodeCount_4/Postgres-16 2.50s ± 0% 0.16s ± 0% ~ (p=1.000 n=1+1)
Change-Id: I213bbba5f9199497fbe8ce889b627e853f8b29a0
The current implementation blocks the the startup until one or none
of the trusted satellites is able to reach the node via QUIC.
This can cause delayed startup. Also, the quic check is done
once during startup, and if there is a misconfiguration later,
snos would have to restart to node.
In this change, we reuse the contact service which pings the satellite
periodically for node checkin. During checkin the satellite tries
pinging the node back via both TCP and QUIC and reports both statuses.
WIth this, we are able to get a periodic update of the QUIC status
without restarting the node.
Also adds the time the node was last pinged via QUIC to the tooltip
on the QUIC status tab.
Resolves https://github.com/storj/storj/issues/4398
Change-Id: I18aa2a8e8d44e8187f8f2eb51f398fa6073882a4
Today each storagenode should have a port which is opened for the internet, and handles DRPC protocol calls.
When we do a HTTP call on the DRPC endpoint, it hangs until a timeout.
This patch changes the behavior: the main DRPC port of the storagenodes can accept HTTP requests and can be used to monitor the status of the node:
* if returns with HTTP 200 only if the storagnode is healthy (not suspended / disqualified + online score > 0.9)
* it CAN include information about the current status (per satellite). It's opt-in, you should configure it so.
In this way it becomes extremely easy to monitor storagenodes with external uptime services.
Note: this patch exposes some information which was not easily available before (especially the node status, and used satellites). I think it should be acceptable:
* Until having more community satellites, all storagenodes are connected to the main Storj satellites.
* With community satellites, it's good thing to have more transparency (easy way to check who is connected to which satellites)
The implementation is based on this line:
```
http.Serve(NewPrefixedListener([]byte("GET / HT"), publicMux.Route("GET / HT")), p.public.http)
```
This line answers to the TCP requests with `GET / HT...` (GET HTTP request to the route), but puts back the removed prefix.
Change-Id: I3700c7e24524850825ecdf75a4bcc3b4afcb3a74
Similar to the existing snapshot based tests of satellite/metabase db we make a migration here which is:
* dedicated to unit tests
* faster (with less steps)
* but safe: additional unit test ensures that the snapshot based migration and normal prod migration have the same results.
Change-Id: Ie324b09f64b4553df02247a9461ece305a6cf832
* Allow configure initial piece scan
* Update tests to include new flag
* Update default config specification
* Rename configuration flag
* Rename variable and fix formatting
* Fix format
* Fix typo
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
Co-authored-by: Clement Sam <clementsam75@gmail.com>
Co-authored-by: littleskunk <jens.heimbuerge@googlemail.com>
Go can now directly embed files without relying on external tools.
This makes code use go:embed and avoid the external tooling.
go:embed requires files to be present in the embedded directory,
hence we need to add .keep to "dist" folder. We also add one to
public/.keep, such that it won't be deleted when building storagenode.
Change-Id: I8bef81236be6829ed37ed4c16ef693677b93a631
This change includes storagenode QUIC status on SNO dashboard.
If disabled, it displays warning for SNOs to foward their
UDP port for quic.
Change-Id: I8d28c9c0f5f1e90d80b7c18b9e1e7b78c5e45609
Currently, if a node has untrusted a satellite, the satellite can still
successfully ping the node. If a node decide to untrust a satellite, the
satellite should also mark it as conact failed
Change-Id: Idf80fa00d9849205533dd3e5b3b775b5b9686705
Initially there were pkg and private packages, however for all practical
purposes there's no significant difference between them. It's clearer to
have a single private package - and when we do get a specific
abstraction that needs to be reused, we can move it to storj.io/common
or storj.io/private.
Change-Id: Ibc2036e67f312f5d63cb4a97f5a92e38ae413aa5
allow disabling tcp/quic
In order to have more control of a server so that we can
simulate connection failures in `testplanet`, this PR changes
quic.Listener to accept an existing UDPConn instead of relying on the
quic-go library to create the UDPConn.
This PR also adds two flags on the `server.Config` struct to allow
enabling/disabling tcp/tls listener and quic listener. By default, they
are both set to true.
- `DisableTCPTLS`: internal flag, disables tcp/tls listener.
- `DisableQUIC`: hidden flag, disables quic listener
By making the `DisableQUIC` a hidden flag, it allows storagenode operators to
have the ability to disable quic traffic in case their set up can't work
with udp traffic.
Change-Id: I853b12435d988b9c41ad9b873fd57480d792e378
On servers with non-UTC it would have calculated a different month boundary.
If node joined in current month calculations will be related on amount of days node've been working.
Change-Id: Ie572b197f50c6cdff5a044a53dfb5b9138f82f24
Right now, the best way for a storage node operator to get the current
space used for each satellite is to run the `storagenode exit-satellite`
command for graceful exit, and cancel at the second confirmation prompt.
This is convoluted and the data is readily available from the Blobs
Usage Cache.
This change adds the current space used by each satellite to the
endpoints `/api/sno` and `/api/sno/satellite/<Satellite ID>`
Change-Id: I2173005bb016fc76db96fd598d26b485e5b2aa0b
log.Fatal immediately terminates the program without running any defers.
We should properly close all the services and databases.
Change-Id: I5e959cef3eafedeacb3a2062e3da47e8d04e8e75
If we see an UnexpectedEOF error when attempting to read orders, return
the orders we have been able to read successfully and do not return an
error. This behavior ensures that the storagenode orders service
attempts to archive corrupted files and does not retry them repeatedly
and get stuck.
Change-Id: I0d00d1e174f968af6e99ca861eddad190f1339e2