Commit Graph

223 Commits

Author SHA1 Message Date
Michal Niewrzal
c4c391e154 satellite/accounting/live: replace address parsing with redis util
With this change we are replacing parsing code with existing go-redis
util.

We also switch redis client to version 9.

Change-Id: Ie4a651e3ae6960e68958c690873925d319b70e10
2023-04-05 13:20:11 +00:00
paul cannon
0e1c99f75a storage/filestore: fix panic on fs error in EmptyTrash
fileInfo is, of course, nil here, so we can't use fileInfo.Name() to
report the location of the problem.

Change-Id: Ia858a3795b127da0fc812d0a158b36ad344ff76b
2023-02-14 18:09:19 -06:00
Egon Elbre
0e00c7b8da certificate/authorization,cmd/certificates: remove gob code
Now that we've migrated data we can remove gob handling.

Change-Id: I55f772b7a0dda71c51db0683bad3db66b89867ac
2023-02-08 15:05:40 +00:00
Egon Elbre
b8c7dcbf7b certificate: improve gob migration
`storage.KeyValueStore` requires ordered iteration, which redis
doesn't support natively. This would require loading all the keys
into memory and then processing them, rather than iterating over them
one-by-one.

This adds a temporary `IterateUnordered` to handle the migrations
more gracefully.

Change-Id: I55b763500523077c7ab8fdfad175c32cc7788e47
2023-02-01 10:21:48 +00:00
paul cannon
2f04e20627 storage/filestore: better error message on data corruption
A user on the forum was seeing the error "bad message", which was not
very helpful. This case from the ext4 filesystem using the code EBADMSG
to indicate it detected an invalid CRC, suggesting disk corruption.

This change adds some explanatory information about probable disk
corruption to all errors coming from the (*blobInfo).Stat() call, which
is where storagenode fs corruption problems will usually manifest.

Refs: https://github.com/storj/storj/issues/5375
Change-Id: I87f4a800236050415c4191ef1a0fc952f9def315
2023-01-30 08:54:06 -06:00
paul cannon
ed7c82439d storage/filestore: avoid stat() during walkNamespaceInPath
Calling stat() (really, lstat()) on every file during a directory walk
is the step that takes up the most time. Furthermore, not all directory
walk uses _need_ to have a stat done on every file. Therefore, in this
commit we avoid doing the stat at the lowest level of
walkNamespaceInPath. The stat will still be done when it is requested,
with the Stat() method on the blobInfo object.

The major upside of this is that we can avoid the stat call on most
files during a Retain operation. This should speed up garbage collection
considerably.

The major downside is that walkNamespaceInPath will no longer
automatically skip over directories that are named like blob files, or
blob files which are deleted between readdir() and stat(). Callers to
walkNamespaceInPath and its variants (WalkNamespace,
WalkSatellitePieces, etc) are now expected to handle these cases
individually.

Thanks to forum member Toyoo for the insight that this would speed up
garbage collection.

Refs: https://github.com/storj/storj/issues/5454
Change-Id: I72930573d58928fa25057ed89cd4ec474b884199
2023-01-30 13:47:03 +00:00
Egon Elbre
aeb645d32b all: replace deprecated ioutil
Change-Id: I60b0bbf5b68b066e2d44b8b99438594d600a3c2d
2022-10-31 15:50:41 +00:00
Egon Elbre
4fd59fb3c9 go.mod: bump common
Change-Id: I1ea76bad3d1fbe0cb78b8ab23a1512c6007a166f
2022-10-18 13:21:49 -04:00
Egon Elbre
ff22fc7ddd all: fix deprecated ioutil commands
Change-Id: I59db35116ec7215a1b8e2ae7dbd319fa099adfac
2022-10-11 15:27:29 +00:00
Márton Elek
ac6bb1e187 storage: experimental flag to turn off file sync
Full file sync before saving files to piecestore seems to be very expensive.

Bwfore we do any step to eliminate them we are planning to do more measurement which requires a temporary flag to turn it off. (not for production).

Change-Id: I5cb8f8cb348ca3590fb5eae14d02edb3f0424617
2022-10-06 13:10:34 +00:00
Egon Elbre
0d2d59f884 all: fix linting issues
Change-Id: Idfc93948e59a181321d79b365e638d63e256a16f
2022-03-21 15:26:42 +00:00
Ethan Adams
27c6c6aeae
ci: Golangci lint v1.43.0 changes (#4307)
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
2021-12-09 20:49:48 +01:00
Egon Elbre
9fd091831d ci: optimize benchmarks
We are not using the benchmark results for anything, they are mostly
there to ensure that we don't break the benchmarks. So we can disable
CockroachDB for them.

Similarly add short versions of other tests.

Also try to precompile test/benchmark code.

Change-Id: I60b501789f70c289af68c37a052778dc75ae2b69
2021-10-08 19:42:40 +03:00
littleskunk
3c3df0f362
storage/filestore: test interaction between VerifyDirReadable and VerifyDirWriteable 2021-09-15 18:57:56 +03:00
Egon Elbre
1aec831d98 satellite/audit,storage: increase sleep delay in TestMaxVerifyCount
Currently TextMaxVerifyCount flakes in some tests, try increasing the
sleep time to ensure that things are slow enough to trigger the error
condition.

Also pass ctx to all the funcs so we can handle sleep better.

Change-Id: I605b6ea8b14a0a66d81a605ce3251f57a1669c00
2021-09-10 15:30:37 +00:00
JT Olio
1c3688f1d6 storage/filestore: include satellite id in open_file_in_trash metrics
Change-Id: Ib7aa15f84c3acd9eedc4778bd2cde3072d0bf64b
2021-06-29 23:25:25 +00:00
Egon Elbre
10372afbe4 ci: fix lint errors
Change-Id: Ib5893440807811f77175ccd347aa3f8ca9cccbdf
2021-05-17 13:37:31 +00:00
Egon Elbre
961e841bd7 all: fix error naming
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:

    ERROR node stats service error: satellitedbs error: node stats database error: no rows

Whereas something like:

    ERROR nodestats service: satellitedbs: nodestatsdb: no rows

Would contain all the necessary information without the stutter.

Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
2021-04-29 15:38:21 +03:00
Egon Elbre
4d37378129 storage/{cockroachkv,postgreskv}: delete unused code
Change-Id: I11da924508d207d8816bda206d28261931d9cd4e
2021-04-23 16:37:28 +03:00
Egon Elbre
a2e20c93ae private/dbutil: use dbutil and tagsql from storj.io/private
Initially we duplicated the code to avoid large scale changes to
the packages. Now we are past metainfo refactor we can remove the
duplication.

Change-Id: I9d0b2756cc6e2a2f4d576afa408a15273a7e1cef
2021-04-23 14:36:52 +03:00
Ivan Fraixedes
c5cb4dce4d redis: Rename functions prefixed with New by Open
Rename the functions that are prefixed with 'New' which connect with
Redis by 'Open' to  make clear that they perform network operations.

Change-Id: I1351e89a642e8e2c2586626646315ad0fb2c6242
2021-03-25 06:09:27 +00:00
Ivan Fraixedes
4c1098e571 Redis: Update Redis package to last major version
Update the Redis dependency to use the last major production version.
The last version accepts a context parameter in all the network methods
so it allows us to pass it through them.

Change-Id: I34121b2ec3c2728602115c724933ad24c9e6e4fd
2021-03-18 14:19:49 +00:00
Ivan Fraixedes
84b844a2a7 redis-server: Move testing type to specific testing pkg
Move a specific interface & types used for testing to be a private
subpackage with a name that clearly identifies it for testing purpose.

Change-Id: I646cf3b6f0a3b518a6f9a125998dc5a02df02db6
2021-03-10 06:09:46 +00:00
Ivan Fraixedes
ce26616647
satellite/accounting/live: Use Redis client directly
We have to adapt the live accounting to allow the packages that use it
to differentiate about errors for being able to ignore them and make our
satellite resilient to Redis downtime.

For differentiating errors we should make changes in the live accounting
but also in the storage/redis.Client, however, we may need to do some
dirty workarounds or break other parts of the implementation that
depends on it.

On the other hand we want to get rid of the storage/redis.Client because
it has more functionality that the one that we are using and some
process has been started to remove it.

Hence, we have refactored the live accounting to directly use the Redis
client library for later on (in a future commit) adapt the satellite for
being resilient to Redis downtime.

Last but not least, a test for expired bandwidth keys have been added
and with it a bug was spotted and fix it.

Change-Id: Ibd191522cd20f6a9a15e5ccb7beb83a678e530ff
2021-01-12 15:33:29 +01:00
Ethan Adams
f90ea10a4a
Allow for DB application names per process. (#3983) 2020-12-04 11:24:39 +01:00
Egon Elbre
7183dca6cb all: fix defers in loop
defer should not be called in a loop.

Change-Id: Ifa5a25a56402814b974bcdfb0c2fce56df8e7e59
2020-11-02 15:06:38 +02:00
Egon Elbre
e0dca4042d all: add pprof labels for debugger
By using pprof.Labels debugger is able to show service/peer names in
goroutine names.

Change-Id: I5f55253470f7cc7e556f8e8b87f746394e41675f
2020-10-29 15:10:07 +00:00
Egon Elbre
caefde6b32 private/{dbutil,tagsql}: pass ctx to database opening
Database opening usually dial and hence we should pass ctx to them.

Change-Id: Iaa2875981570d83e65be3710f841cf30349f807b
2020-10-29 10:51:29 +00:00
Egon Elbre
e3985799a1 storage/{cockroachkv,postgreskv}: add ctx to opening
Database opening usually dial and hence we should pass ctx to them.

Change-Id: Iecf41241aaa94d54506cbc80b0e53449848d8819
2020-10-29 10:49:08 +00:00
Egon Elbre
22ec940f7e storage/filestore: defer closing
Change-Id: Iccd6d1c64c1b7a6eecaa4c675bb7b554b381d0f5
2020-10-26 21:09:58 +02:00
Egon Elbre
2268cc1df3 all: fix linter complaints
Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95
2020-10-13 15:59:01 +03:00
Egon Elbre
0bdb952269 all: use keyed special comment
Change-Id: I57f6af053382c638026b64c5ff77b169bd3c6c8b
2020-10-13 15:13:41 +03:00
Cameron Ayer
ca0c1a5f0c storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.

Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
2020-08-31 21:20:49 +00:00
Cameron Ayer
586e6f2f13 private/testblobs, storage, storage/filestore: add storage dir verification to filestore
Sometimes SNOs fail to properly configure or lose connection to their storage directory
which can result in DQ. This causes unnecessary repair and is unfortunate for all parties.

This change introduces the creation of a special file in the storage directory at runtime
containing the node ID. While the storage node runs, it periodically verifies that it can
find said file with the correct contents in the correct location. If not, the node will
shut down with an error message.

This change will solve the issue of nodes losing access to the storage directory, but it will not
solve the issue of nodes pointing to the wrong directory, as the identifying file is created each
time the node starts up. After this change has been the minimum version for a few releases, we will
remove the creation of the directory-identifying file from the storage node run command and add it
to the setup command.

Change-Id: Ib7b10e96ac07373219835e39239e93957e7667a4
2020-08-19 17:18:14 +00:00
Egon Elbre
94a09ce20b all: add missing dots
Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a
2020-08-11 17:50:01 +03:00
Yaroslav Vorobiov
4d2a505788 storagenode/db: explicitly open and create dbs
To prevent storagenode from implicitly recreating missing dbs and storage,
as such behaviour leads to audit failures. Do not allow storagenode to
start if any of dbs or storage is missing, corrupted, or dedicated storage disk is
unmounted, to get downtime instead.

Change-Id: Ic64e1f0ff4d8ef5b2fddbe7a7e53df4f4bd8652e
2020-07-24 14:08:47 +03:00
Egon Elbre
b67d7ecbc5 cmd/storagenode,storage/cockroachkv: better error handling
Change-Id: I6646aa046dc365c0dee38f23041be4fc2defb759
2020-07-16 20:03:50 +03:00
Egon Elbre
d8dcae3075 all: fix error checking
Change-Id: Ia0da1bbd6ce695139922f94096c2419281905e32
2020-07-16 19:13:14 +03:00
Egon Elbre
e70da5cd4e all: fix comments
Change-Id: I2d2307e3fab87de47a72b3595d051e2c95ff4f8a
2020-07-16 19:13:14 +03:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
stefanbenten
257855b5de all: replace == comparison with errors.Is
Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714
2020-07-14 15:50:25 +00:00
paul cannon
bbdb351e5e all: use jackc/pgx in place of lib/pq
What:

Use the github.com/jackc/pgx postgresql driver in place of
github.com/lib/pq.

Why:

github.com/lib/pq has some problems with error handling and context
cancellations (i.e. it might even issue queries or DML statements more
than once! see https://github.com/lib/pq/issues/939). The
github.com/jackx/pgx library appears not to have these problems, and
also appears to be better engineered and implemented (in particular, it
doesn't use "exceptions by panic"). It should also give us some
performance improvements in some cases, and even more so if we can use
it directly instead of going through the database/sql layer.

Change-Id: Ia696d220f340a097dee9550a312d37de14ed2044
2020-07-13 15:54:41 +00:00
Rafael Gomes
24a1eac16c storage/postgreskv: Sort storage keys before delete (postgres)
Change-Id: I63599e142b387ded25d110458ae10c2c96cd8ea6
2020-07-10 20:43:45 +00:00
Rafael Gomes
569b49768e storage/cockroachkv: Sort storage keys before delete (cockroach)
Change-Id: Iee8e56ff66e2760e933f3860d5fc75230a507558
2020-07-10 16:27:26 -04:00
Egon Elbre
5bdcd86fa7 ci: test benchmarks
This runs each benchmark for one iteration to ensure that they are
valid. Unfortunately, it does not give any useful metrics as output.

Change-Id: I68940398c8dd849aed656bd12656f48d5df10128
2020-07-10 13:26:49 +00:00
Qweder93
0521435e08 storagenode/gracefulexit: added deletion of all files left in storage/blobs/satellite after successful GE
https://storjlabs.atlassian.net/browse/SG-368

Change-Id: I29a978fe0d0153aedf2be91dc7f45b4ef386d447
2020-07-08 14:38:31 +03:00
JT Olio
78c0d9352d storage: support monkit traces of limit exceeded
errors.New errors will not show up in monkit tracing
as a useful error type. this change fixes a test (!)
and makes it so monkit will tell us what the error
type is, if we have this failure

Change-Id: Ic9933704e4095495c7ee286d9df3eb7eb94b25c9
2020-07-06 15:14:02 +00:00
Ivan Fraixedes
ed9816fd30 storage/filestore: Ignore IsNotExist error walking files
A file piece could be deleted in between walking the list of files read
from a directory and before we actually perform any operation on such
file. When that happens, we don't want to return an error, we want to
just ignore it and carry on.

Change-Id: I8f6986070e5883599a08fccf8b125c075b30fe1b
2020-06-23 19:28:42 +00:00
Rafael Gomes
958ea1b9df satellite/accounting: add download limit cache
Change-Id: I722930cab8bd5d240f4878dc6997e9bc7637311f
2020-06-12 16:33:46 -03:00
Egon Elbre
36c461bd59 private/tagsql: track proper closing of rows and statements
This ensures that rows are closed to avoid leaks.
Also verifies that Err() is called, to ensure that no
error is left behind.

Change-Id: Idd1bec9bf479f40021da67b2c80ce83033149469
2020-06-05 18:25:43 +00:00