Commit Graph

795 Commits

Author SHA1 Message Date
Andrew Harding
5c744d7ed4 storagenode/pieces: close reader after use
Change-Id: Icd9df821edb668c5521732396b7d6be3b8e75c7a
2023-03-13 14:06:10 +00:00
JT Olio
ade808375c storagenode/piecestore: be more flexible with bandwidth usage max
this is to prepare for https://review.dev.storj.io/c/storj/uplink/+/9814

Change-Id: Ib74fd63a352a0ae4cc1f8b66fb70df649844c33a
2023-03-06 18:00:17 +00:00
Egon Elbre
71e42ac2a6 storagnode/storagenodedb: fix database queries
Change-Id: Ic5036c30aed7f68b9c28d6190650028461503465
2023-03-01 15:57:08 +02:00
Márton Elek
644aca0e42 storagenode: fix piecestore download metrics
Storagenode download metrics are not accurate:

 * the current metric bump cancel metrics only for specific error messages, but there are cases where the error is already handled (err == nill)
 * instead of the full size of the piece we need to use the size of the downloaded bytes

Change-Id: I6ca75770e2d40bf514f5e273785c78e02968c919
2023-02-28 12:52:58 +00:00
JT Olio
529e3674e4 storagenode/piecestore: handle upload write if provided in first message
we may in the future want to accept writes and commits as part of the
initial request message, just like
https://review.dev.storj.io/c/storj/storj/+/9245

this change is forward compatible but continues to work with
existing clients.

Change-Id: Ifd3ac8606d498a43bb35d0a3751859656e1e8995
2023-02-23 18:33:08 +00:00
JT Olio
ca13eca718 go.mod: bump libuplink to include noise
Change-Id: If5bceb139ce6fdf6c0792b4bb536bc61b54e32bb
2023-02-10 17:03:51 +00:00
JT Olio
522aed083d private/server,satellite/contact,misc: use new storj/common noise helpers
this change uses the new storj/common noise helpers, which:
 * add a security fix (require an expected node id for validating
   noise key attestations)
 * stops doing an unnecessary order signature validation (it's
   already been done inside of PutPiece)
 * removes some duplicate code

Change-Id: I5e67a08ff216cd9c5b0b82e40b4d9de664b6b0fc
2023-02-07 09:53:45 -05:00
Clement Sam
3d3f9d133a storagenode: fix B*h to bytes disk usage conversion
The used space graph values are correct when a single satellite is
selected but wrong for 'All satellites'. This is related to the
queries for getting the individual disk usages for all satellites
per day and the summary and average for all satellites per day:

1. dividing the sum of at_rest_total by the total_hours is wrong.
Simply put, we were assuming that, for example (4/2)+(6/3) equals
to (4+6)/(2+3), assuming we had 4 and 6 at_rest_total values with
2 and 3 respective hours.

2. To get the average, we need to first find the sum of the
at_rest_total_bytes for each timestamp across all satellites
before taking the average of the sums instead of just taking the
average from the individual satellite values.

Closes https://github.com/storj/storj/issues/5519

Change-Id: Ib1314e238b695a6c1ecd9f9171ee86dd56bb3b24
2023-02-06 18:50:31 +00:00
JT Olio
ae9ea22193 storagenode/piecestore: return node certificate chain at upload conclusion
uplinks currently get the node's certificate chain over TLS. once Noise
is in use, uplinks will no longer be able to do this. we should start
having the upload request return the certificate chain in the same
release that starts supporting noise.

Change-Id: I619b23cb8e25691bcc62d760f884403a4ccd64a0
2023-02-01 01:49:50 +00:00
JT Olio
382af95499 storagenode/contact: send noise key and settings as contact info
Change-Id: I1e7a83de36d5cf16eed8874091b15af1e0b73df7
2023-01-31 21:49:20 +00:00
paul cannon
2f04e20627 storage/filestore: better error message on data corruption
A user on the forum was seeing the error "bad message", which was not
very helpful. This case from the ext4 filesystem using the code EBADMSG
to indicate it detected an invalid CRC, suggesting disk corruption.

This change adds some explanatory information about probable disk
corruption to all errors coming from the (*blobInfo).Stat() call, which
is where storagenode fs corruption problems will usually manifest.

Refs: https://github.com/storj/storj/issues/5375
Change-Id: I87f4a800236050415c4191ef1a0fc952f9def315
2023-01-30 08:54:06 -06:00
paul cannon
ed7c82439d storage/filestore: avoid stat() during walkNamespaceInPath
Calling stat() (really, lstat()) on every file during a directory walk
is the step that takes up the most time. Furthermore, not all directory
walk uses _need_ to have a stat done on every file. Therefore, in this
commit we avoid doing the stat at the lowest level of
walkNamespaceInPath. The stat will still be done when it is requested,
with the Stat() method on the blobInfo object.

The major upside of this is that we can avoid the stat call on most
files during a Retain operation. This should speed up garbage collection
considerably.

The major downside is that walkNamespaceInPath will no longer
automatically skip over directories that are named like blob files, or
blob files which are deleted between readdir() and stat(). Callers to
walkNamespaceInPath and its variants (WalkNamespace,
WalkSatellitePieces, etc) are now expected to handle these cases
individually.

Thanks to forum member Toyoo for the insight that this would speed up
garbage collection.

Refs: https://github.com/storj/storj/issues/5454
Change-Id: I72930573d58928fa25057ed89cd4ec474b884199
2023-01-30 13:47:03 +00:00
JT Olio
e40191afd6 storj: upgrade to use latest storj/common NodeAddress
Change-Id: I5987391bcfe5f6dfd7b525698c337a4cbda9b76e
2023-01-25 01:37:26 +00:00
Clement Sam
95960572b3 storagenode/piecestore: improve logs for incoming requests
- Adds "Remote Address" field to all INFO logs related to GET,
PUT, and DELETE requests
- Adds Offset and Size fields to all info logs related to GET
requests

Resolves https://github.com/storj/storj/issues/5404

Change-Id: I5dab1867619385362e5f1e0455dfab17d295a37a
2023-01-24 10:23:34 +00:00
JT Olio
8d69837f02 storagenode/piecestore: handle order if provided in first message
we may in the future want to accept orders as part of the initial
request message
(e.g. https://review.dev.storj.io/c/storj/uplink/+/9246).

this change is forward compatible but continues to work with
existing clients.

Change-Id: I475ad50d6cbfee8a1f843383230698e4ef9b9e54
2023-01-20 08:53:22 +00:00
Egon Elbre
90b7076d26 storagenode/pieces: fix log line
Change-Id: I8dba6b0f3d6af3140dfa503c8d6b33e6808d004f
2023-01-17 11:04:47 +02:00
Egon Elbre
9544a670d7 storagenode/pieces: fix concurrent empty and restore trash
This ensures that empty trash and restore trash cannot run at the same
time.

Fixes https://github.com/storj/storj/issues/5416

Change-Id: I9d2e3aa3d66e61e5c8a7427a95208bb96089792d
2023-01-03 15:01:54 +00:00
Michal Niewrzal
d1d617d654 storagenode/piecestore: small cleanup
* normalize ExistsCheckWorkers flag to avoid setting incorrect values
* wait for limiter to finish if context was canceled

Change-Id: I688a395bd958cd09233605fc264d43f91ec45ed1
2022-12-19 12:37:12 +00:00
Michal Niewrzal
5110803102 storagenode/piecestore: add Exists endpoint
Adds new method Exists which can be used to verify which
requested piece ids exists on storage node. Will verify only pieces
which belongs to the satellite that used that endpoint.

Minum WASM size was increased a bit.

https://github.com/storj/storj/issues/5415

Change-Id: Ia5f9cadeb526541b2776a8973eb7d50133ad8636
2022-12-17 04:08:26 +00:00
Egon Elbre
ee71fbb41d storagenode/piecestore: start restore trash in the background
Starting restore trash in the background allows the satellite to
continue to the next storagenode without needing to wait until
completion.

Of course, this means the satellite doesn't get feedback whether it
succeeds successfully or not. This means that the restore-trash needs to
be executed several times.

Change-Id: I62d43f6f2e4a07854f6d083a65badf897338083b
2022-12-16 18:15:52 +02:00
Clement Sam
951d5db7f7 storagenode: fix hour_interval for first day defaulted to 24h
Previously because of the use of a LAG to calculate the hour_interval
the first record, which is usually the first day of the month usually,
doesn’t have a previous record and always assumes the at_rest_total is
for 24 hours.

Resolves https://github.com/storj/storj/issues/5390

Change-Id: Id532f8b38fe9df61432e62655318ff119a733d13
2022-12-15 13:30:11 +00:00
Clement Sam
7461ffe148 {storagenode,web/multinode}: fix storage usage db/cache retrieval queries
The query changes we did while fixing the usage graph led to wrong
payout calculations directly linked to disk space.

This change:

- avoids converting from Bh to B directly in the query
- returns the at_rest_total in the original bytes*hour value
- returns at_rest_total_bytes as the calculated disk spaced used in bytes
- uses the at_rest_total_bytes only for the disk space graph
- return summary_bytes as the average disk space used within the specified date
- updates the disk space graph header to "average disk space used this month"

The total disk used in the month is also displayed in B not B*day

Resolves https://github.com/storj/storj/issues/5355

Change-Id: I2cfefb0fe711f9c59de2adb547c4ab50b05c7cbb
2022-12-09 11:07:33 +00:00
Egon Elbre
8777523255 private/testplanet: disable WAL for storagenodes
Change-Id: I1be4f7901c830e829118afeb90f04b87df555459
2022-12-05 11:41:06 +00:00
Andrew Harding
4fdea51d5c storagenode/storagenodedb: faster test db init
Running all of the migrations necessary to initialize a storage node
database takes a significant amount of time during runs.

The package current supports initializing a database from manually coalesced
migration data (i.e. snapshot) which improves the situation somewhat.

This change takes things a bit further by changing the snapshot code to
instead hydrate the database directory from a pre-generated snapshot zip
file.

name                                old time/op  new time/op  delta
Run_StorageNodeCount_4/Postgres-16   2.50s ± 0%   0.16s ± 0%   ~     (p=1.000 n=1+1)

Change-Id: I213bbba5f9199497fbe8ce889b627e853f8b29a0
2022-12-01 20:45:36 +00:00
Clement Sam
3e0a4230a5 storagenode/payout: fix disk space value at payout
Payout is still calculating using the tb*h. 0So it’s getting the total disk space used this month and dividing by (24*30) instead of just 30.

More context here: https://forum.storj.io/t/current-month-earnings-in-node-v1-67-1/20319/5

Follow up PR for https://github.com/storj/storj/issues/5146

Change-Id: Ie2d48497f2a9bdbc995c99ee27e70b46580ff638
2022-11-18 01:04:20 +00:00
Clement Sam
f5156296d4 storagenode/pieces: warn and trash v0 pieces when not found in v0pieceInfoDB
Context: https://github.com/storj/storj/issues/4225#issuecomment-1307575782

Closes https://github.com/storj/storj/issues/4225

Change-Id: Ib8c3189f86118338556d48a6af657e6dc109b4c0
2022-11-14 14:54:16 +00:00
Clement Sam
59b37db670 storagenode: overhaul QUIC check implementation
The current implementation blocks the the startup until one or none
of the trusted satellites is able to reach the node via QUIC.
This can cause delayed startup. Also, the quic check is done
once during startup, and if there is a misconfiguration later,
snos would have to restart to node.

In this change, we reuse the contact service which pings the satellite
periodically for node checkin. During checkin the satellite tries
pinging the node back via both TCP and QUIC and reports both statuses.
WIth this, we are able to get a periodic update of the QUIC status
without restarting the node.

Also adds the time the node was last pinged via QUIC to the tooltip
on the QUIC status tab.

Resolves https://github.com/storj/storj/issues/4398

Change-Id: I18aa2a8e8d44e8187f8f2eb51f398fa6073882a4
2022-11-09 03:15:57 +00:00
Egon Elbre
aeb645d32b all: replace deprecated ioutil
Change-Id: I60b0bbf5b68b066e2d44b8b99438594d600a3c2d
2022-10-31 15:50:41 +00:00
Clement Sam
d625eb85fd storagenode: use bytes instead of bytes*hour unit for used space graph
Closes https://github.com/storj/storj/issues/5146

Change-Id: I1b135da81a68193b5b50c761088d79471ca3a2fe
2022-10-28 18:42:45 +00:00
JT Olio
58a9c55f36 mod: bump dependencies
- storj.io/common

Change-Id: Ib78154acc253a13683495abfdd96d702625fdce8
2022-10-19 17:01:53 +00:00
Egon Elbre
ff22fc7ddd all: fix deprecated ioutil commands
Change-Id: I59db35116ec7215a1b8e2ae7dbd319fa099adfac
2022-10-11 15:27:29 +00:00
Erik van Velzen
e6b5501f9b satellite/gc/sender: new service to send retain filters
Implement a new service to read retain filter from a bucket and
send them out to storagenodes.

This allows the retain filters to be generated by a separate command on
a backup of the database.

Paralellism (setting ConcurrentSends) and end-to-end garbage collection
tests will be restored in a subsequent commit.

Solves https://github.com/storj/team-metainfo/issues/121

Change-Id: Iaf8a33fbf6987676cc3cf74a18a8078916fe673d
2022-09-20 11:49:40 +00:00
Clement Sam
07beef378d storagenode/collector: delete expired piece info if file does not exist
The collector tries deleting a piece over and over again, though
the piece does not exist on the storagenode's filesystem.
We need to delete the piece info from the expired db if the
targeted file does not exist.
This does not resolve the base problem of why the file
is deleted before the collector tries deleting it.
This change deletes the piece info from the expired db
if the file does not exist, since we're already trying
to delete that piece anyway.

Closes https://github.com/storj/storj/issues/4192

Change-Id: If659185ca14f1cb29fd3c4237374df6fcd535df8
2022-09-15 12:29:29 +00:00
Clement Sam
a848c29b9b storagenode/nodestats: add monkit metrics for reputation scores
Closes https://github.com/storj/storj/issues/4835

Change-Id: Ib56e34145b962bede3525066f9bd7ef950d21e9b
2022-09-15 08:43:48 +00:00
Clement Sam
64e5fb7772 storagenode/collector: fix error check when file does not exist
The collector calls the Delete() method on the pieces
which returns an error which is wrapped by many error classes.
Delete() method is using Stat() from
1aec831d98/storage/filestore/dir.go (L328)
under the hood.
os.IsNotExist(errors.Unwrap(err) will always be false unless
errors.Unwrap(err) is called multiple times till it gets to
the core os.ErrNotExist. Here is a test case to explain better:

    func TestABC(t *testing.T) {
	classA := errs.Class("A")
	classB := errs.Class("B")

	wrappedError := classB.Wrap(classA.Wrap(os.ErrNotExist))

	require.True(t, os.IsNotExist(errs.Unwrap(wrappedError)))
	require.True(t, os.IsNotExist(errors.Unwrap(wrappedError)))
    }

Using errs.Is() seems to resolve this even without unwrapping the error:

    func TestABC(t *testing.T) {
	classA := errs.Class("A")
	classB := errs.Class("B")

	wrappedError := classB.Wrap(classA.Wrap(os.ErrNotExist))

	require.True(t, errs.Is(wrappedError, os.ErrNotExist))
	require.False(t, errs.Is(wrappedError, os.ErrExist))
	require.False(t, os.IsNotExist(wrappedError))
    }

Does not resolve the collector issue here but enhances it:
https://github.com/storj/storj/issues/4192

Change-Id: Ifb75dd15b54c1e1a5e23f6eba2d621d64874a5cc
2022-09-02 12:26:33 +00:00
Márton Elek
7e71986493 storagenode: accept HTTP calls on public port, listening for monitoring requests
Today each storagenode should have a port which is opened for the internet, and handles DRPC protocol calls.

When we do a HTTP call on the DRPC endpoint, it hangs until a timeout.

This patch changes the behavior: the main DRPC port of the storagenodes can accept HTTP requests and can be used to monitor the status of the node:

 * if returns with HTTP 200 only if the storagnode is healthy (not suspended / disqualified + online score > 0.9)
 * it CAN include information about the current status (per satellite). It's opt-in, you should configure it so.

In this way it becomes extremely easy to monitor storagenodes with external uptime services.

Note: this patch exposes some information which was not easily available before (especially the node status, and used satellites). I think it should be acceptable:

 * Until having more community satellites, all storagenodes are connected to the main Storj satellites.
 * With community satellites, it's good thing to have more transparency (easy way to check who is connected to which satellites)

The implementation is based on this line:

```
http.Serve(NewPrefixedListener([]byte("GET / HT"), publicMux.Route("GET / HT")), p.public.http)
```

This line answers to the TCP requests with `GET / HT...` (GET HTTP request to the route), but puts back the removed prefix.

Change-Id: I3700c7e24524850825ecdf75a4bcc3b4afcb3a74
2022-08-26 09:38:09 +00:00
Márton Elek
4b1be6bf8e storagenode/satellite: support different piece hash algorithms
Change-Id: I3db321e79f12f3ebaa249e6c32fa37fd9615687e
2022-08-23 18:15:06 +00:00
Clement Sam
bac0155664 storagenode/storagenodedb: fix null at_rest_total values for storage usage
Wrapping a COALESCE around the computed at_rest_total value to fallback
to the original at_rest_total value when the computed value is null.

https://forum.storj.io/t/release-preparation-v1-62/19444/5?u=clement

Change-Id: Ifa268ccbe35a63e3b68f07464194fa034ad261b5
2022-08-23 12:56:28 +00:00
Márton Elek
6b27c64833 testplanet: support snapshot based migration for storagenode
Similar to the existing snapshot based tests of satellite/metabase db we make a migration here which is:
 * dedicated to unit tests
 * faster (with less steps)
 * but safe: additional unit test ensures that the snapshot based migration and normal prod migration have the same results.

Change-Id: Ie324b09f64b4553df02247a9461ece305a6cf832
2022-08-22 09:46:27 +00:00
paul cannon
37a4edbaff all: reformat comments as required by gofmt 1.19
I don't know why the go people thought this was a good idea, because
this automatic reformatting is bound to do the wrong thing sometimes,
which is very annoying. But I don't see a way to turn it off, so best to
get this change out of the way.

Change-Id: Ib5dbbca6a6f6fc944d76c9b511b8c904f796e4f3
2022-08-10 18:24:55 +00:00
Clement Sam
25f8f678ab storagenode/nodestats: retrieve storage usage starting from last day of previous month
For the storagenode usage graph currently,
1. the graph is still likely to contain spikes on the first day of
the month for a newly setup storagenode because there's no previous
interval_end_time to deduct from.

2. if a node goes offline for too long, say the last usage report
in the storageusage cache has an interval_end_time of
2020-07-30 00:00:00+00:00 and later, it comes back online a few
days later, it requests for the storage usage from the satellite
starting from the current month, say 2021-01-01 00:00:00+00:00,
the calculated hours for the first day would be 48 hours and it
could be wrong because the cache is missing one day usage report.

This PR addresses second issue on the storagenode side by requesting
storage usage data, instead of just a month boundary, request for an
interval starting from the last day of the previous month to the
current day of the current month.
The first one will be a tradeoff and wouldn't really matter since
it will just be an issue on the first day the storagenode joined
the satellite.

Updates https://github.com/storj/storj/issues/4178

Change-Id: I041c56c412030ce013dd77dce11b0b5d6550927b
2022-08-10 01:37:02 +00:00
Clement Sam
9a539c4830 storagenode/storageusage: add interval_end_time, rename interval_start to timestamp
The satellite now returns the last interval_end_time for each
daily storage usage.
We need to store the interval_end_time in the storage usage cache.
Also, renamed interval_start to timestamp to avoid ambiguity since
the interval_start only stores just the date/day returned by the
satellite.

Updates https://github.com/storj/storj/issues/4178

Change-Id: I94138ba8a506eeedd6703787ee03ab3e072efa32
2022-08-10 01:03:00 +00:00
Óscar de Arriba
4fdb81c510
storagenode/pieces: allow to configure initial piece scan (#5024)
* Allow configure initial piece scan

* Update tests to include new flag

* Update default config specification

* Rename configuration flag

* Rename variable and fix formatting

* Fix format

* Fix typo

Co-authored-by: Stefan Benten <mail@stefan-benten.de>
Co-authored-by: Clement Sam <clementsam75@gmail.com>
Co-authored-by: littleskunk <jens.heimbuerge@googlemail.com>
2022-08-07 22:40:59 +00:00
Egon Elbre
e9692c5681 storagenode/gracefulexit: remove unused interface
Change-Id: Ie6c3d69f5177872d8f4308ac476bc87655da9e4b
2022-08-04 11:26:14 +03:00
Egon Elbre
cf92220c20 {satellite,storagenode}/gracefulexit: simplify limiter usage
Change-Id: Ied7091fe5355b96d327e3f893c5bdd4946a9e6af
2022-08-04 08:18:15 +00:00
Egon Elbre
bc9ab8ee5e satellite/audit,storagenode/gracefulexit: fixes to limiter
Ensure we don't rely on limiter to wait multiple times.

Change-Id: I75d48420236216d4c2fc6fa99293f51f80cd9c33
2022-08-03 10:24:16 +03:00
Qweder93
98b8c7be06 storagenode/console: consoleAPI EstimatedPayout flacky test fixed
In rare cases time frame between creating time.Now() variable and calling
service method that receives it (after API method call) was big enough to distort
current month expectations and make test fail with last digit difference in 1

Change-Id: Ib811492d62f6598a5c40a09de6a87bffeaa0a78e
2022-08-01 15:13:15 +00:00
Stefan Benten
c3171b4ba4
storagenode/retain: Move summary and start logs to info level (#4954)
We currently do not log the GC information/stats under normal circumstances.
This is not good for monitoring and troubleshooting.
2022-07-08 18:19:08 +02:00
Egon Elbre
05e165283f storagenode/console/consoleapi: use fixed time.Now()
It seems the tests relied on time.Now(), which might cause some
discrepancies in calculations. Use a fixed time.Now() rather than
recalculating.

As a sidefix, remove "Test" prefix from t.Run. These are unnecessary.

Change-Id: I1de903fcf0fcf46fc8e3acf2463e17239b8e3cc6
2022-07-01 12:36:01 +03:00
René Smeekes
fd042a92b9
storagenode/reputation: clarify wording on suspension notification (#4921)
Fixes https://github.com/storj/storj/issues/4920
2022-06-20 22:07:12 +02:00