storj

Author	SHA1	Message	Date
paul cannon	fd55dad735	storagenode/retain: don't quit on error It has been noted in the forum that, during a Retain operation, when a piece can't be deleted, the process never completes. The error is written to the log, but the completion line "Moved pieces to trash during retain" never is. This `return` line is the reason. We should instead continue the loop. Change-Id: I0f51d34aba0e81ad60a75802069b42dc135ad907 Refs: https://github.com/storj/storj/issues/6482	2023-11-06 10:54:46 -06:00
Clement Sam	e0542c2d24	storagenode: run garbage collection filewalker as a low I/O subprocess Updates https://github.com/storj/storj/issues/5349 Change-Id: I7d810d737b17f0b74943765f7f7cc30b9fcf1425	2023-05-02 19:43:38 +00:00
paul cannon	2f04e20627	storage/filestore: better error message on data corruption A user on the forum was seeing the error "bad message", which was not very helpful. This came from the ext4 filesystem using the code EBADMSG to indicate it detected an invalid CRC, suggesting disk corruption. This change adds some explanatory information about probable disk corruption to all errors coming from the (*blobInfo).Stat() call, which is where storagenode fs corruption problems will usually manifest. Refs: https://github.com/storj/storj/issues/5375 Change-Id: I87f4a800236050415c4191ef1a0fc952f9def315	2023-01-30 08:54:06 -06:00
paul cannon	ed7c82439d	storage/filestore: avoid stat() during walkNamespaceInPath Calling stat() (really, lstat()) on every file during a directory walk is the step that takes up the most time. Furthermore, not all directory walk uses _need_ to have a stat done on every file. Therefore, in this commit we avoid doing the stat at the lowest level of walkNamespaceInPath. The stat will still be done when it is requested, with the Stat() method on the blobInfo object. The major upside of this is that we can avoid the stat call on most files during a Retain operation. This should speed up garbage collection considerably. The major downside is that walkNamespaceInPath will no longer automatically skip over directories that are named like blob files, or blob files which are deleted between readdir() and stat(). Callers to walkNamespaceInPath and its variants (WalkNamespace, WalkSatellitePieces, etc) are now expected to handle these cases individually. Thanks to forum member Toyoo for the insight that this would speed up garbage collection. Refs: https://github.com/storj/storj/issues/5454 Change-Id: I72930573d58928fa25057ed89cd4ec474b884199	2023-01-30 13:47:03 +00:00
Erik van Velzen	e6b5501f9b	satellite/gc/sender: new service to send retain filters Implement a new service to read retain filters from a bucket and send them out to storagenodes. This allows the retain filters to be generated by a separate command on a backup of the database. Parallelism (setting ConcurrentSends) and end-to-end garbage collection tests will be restored in a subsequent commit. Solves https://github.com/storj/team-metainfo/issues/121 Change-Id: Iaf8a33fbf6987676cc3cf74a18a8078916fe673d	2022-09-20 11:49:40 +00:00
Stefan Benten	c3171b4ba4	storagenode/retain: Move summary and start logs to info level (#4954) We currently do not log the GC information/stats under normal circumstances. This is not good for monitoring and troubleshooting.	2022-07-08 18:19:08 +02:00
Yaroslav Vorobiov	c9cfb5ed0c	storagenode/retain: add more verbose monkit monitoring Change-Id: Ibb9804268751b4b1842eb729bc510dba83e9b28b	2021-06-04 20:20:11 +00:00
Qweder93	53a5d18e1a	storagenode: fixed logging about piece being moved to trash, and added logging when piece was actually deleted Change-Id: I46f6a141b27033c2087b5c4681506d80b90f4a18	2020-08-02 20:00:05 +03:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
littleskunk	08947e177d	storagenode/garbagecollection: enable in production Change-Id: I627b7a37ca4a85eb19936ca2c7ca907d7cc63f5b	2019-12-16 22:44:04 +00:00
littleskunk	9d1faeee58	storagenode/garbagecollection: increase MaxTimeSkew to be higher than satellite MaxCommitInterval Change-Id: I86f8d0b44bea3aa005ff26d52588611c59df5e9a	2019-12-09 16:03:55 +00:00
Isaac Hess	6aeddf2f53	storagenode/pieces: Add Trash and RestoreTrash to piecestore (#3575) * storagenode/pieces: Add Trash and RestoreTrash to piecestore * Add index for expiration trash	2019-11-20 09:28:49 -07:00
littleskunk	7eb6724c92	logging: unify logging around satellite ID, node ID and piece ID (#3491) * logging: unify logging around satellite ID, node ID and piece ID * unify segment index	2019-11-05 22:04:07 +01:00
Maximillian von Briesen	08ed50bcaa	satellite/metainfo: add commit interval to prevent long delays between order limit creation and segment commit (#3149)	2019-10-01 12:55:02 -04:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964)	2019-09-10 16:24:16 +03:00
Egon Elbre	8a5db77e04	storagenode/retain: add comment (#2910)	2019-08-29 19:42:17 +03:00
Egon Elbre	62e3bf5b34	storagenode/retain: fix concurrency issues (#2828) * nicer flags * fix concurrency * add concurrent workers * initialize things * fix tests * close retain service * ensure we don't have workers working on the same satellite * ensure things compile * fix other compilation issues: * concurrency changes ran this with `go test -count=1000` and it passed all of them. - we add a closed channel so that we can select on it with context cancellation. - we put a once in so we only close the channel once. - every time the queue/running state changes, we have to broadcast because we may want to wake up N pending Wait calls or other concurrent workers. - because we broadcast, we don't need to do the polling in Wait anymore. - ensure Run doesn't start multiple times so that we don't have to worry about concurrent Close with multiple Runs. - hold the lock while we start workers so that a concurrent Close with Run can't decide that there's nothing started and exit and then have Run start things. - make sure to poll the closed/context channels through loops or at the start of Run calls in case Close happens first. - these polls should be under a mutex because they have a default case which makes it possible to schedule such that Close hasn't executed the channel close so it starts more work. - cancel a local Run context when it's going to exit to make sure that any retainPieces calls have a canceled context. - hopefully enough comments to both check my work and help readers digest what's going on. Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7 * use the retain error class Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc * concurrency fixes again - forgot to update the gc test to use the old Wait api. - we need to drop the lock while we wait for the workers to exit, because they may be blocked on the condition variable - additionally, we need to broadcast when we close the signal channel because the state changed: they want to wake up and exit. Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5 * undo my misguided rename Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881 * remove pollInterval * format paragraph more nicely * move skew calculation into retain pieces	2019-08-28 16:35:25 -04:00
Maximillian von Briesen	d83a965139	storagenode/piecestore: Add retain service on storagenode (#2785) Add retain service on storagenode. This service runs retain jobs that have been queued by the storagenodes. Rather than running retain jobs during the grpc Retain() call, the grpc call queues a retain job to the retain service and returns immediately afterwards, removing a significant bottleneck in garbage collection.	2019-08-19 14:52:47 -04:00
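
The fix described in fd55dad735 (continue the loop instead of returning when one piece cannot be deleted) can be sketched roughly as follows. This is a minimal illustration, not the storagenode code: `retainPieces`, `trashFunc`, and `errDenied` are hypothetical names introduced here, and only the log text "Moved pieces to trash during retain" is taken from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// errDenied is a hypothetical error used only for the demonstration below.
var errDenied = errors.New("permission denied")

// trashFunc is a hypothetical stand-in for whatever moves one piece to the trash.
type trashFunc func(pieceID string) error

// retainPieces tries to trash every piece in ids. A failure on one piece is
// recorded and the loop moves on, so the caller always reaches the summary.
func retainPieces(ids []string, trash trashFunc) (moved int, failed []error) {
	for _, id := range ids {
		if err := trash(id); err != nil {
			failed = append(failed, fmt.Errorf("piece %s: %w", id, err))
			continue // a `return` here would abandon the remaining pieces
		}
		moved++
	}
	return moved, failed
}

func main() {
	trash := func(id string) error {
		if id == "b" {
			return errDenied // simulate one piece that cannot be deleted
		}
		return nil
	}
	moved, failed := retainPieces([]string{"a", "b", "c"}, trash)
	for _, err := range failed {
		fmt.Println("failed to trash:", err)
	}
	// The completion line is printed even though one piece failed.
	fmt.Printf("Moved pieces to trash during retain: moved=%d failed=%d\n", moved, len(failed))
}
```

Collecting the errors rather than discarding them keeps each failure visible while still letting the operation run to completion.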

20 Commits
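
The trade-off described in ed7c82439d (skip the per-file stat during the directory walk, and let callers handle directories and vanished files themselves) can be illustrated with Go's standard `os.ReadDir` and `fs.DirEntry.Info`. This is a sketch under stated assumptions, not the filestore implementation: `walkBlobs`, `totalBlobSize`, `makeDemoDir`, and the `.sj1` file names are hypothetical choices for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// walkBlobs lists dir and calls visit for each entry. It never stats a file
// itself; visit may call entry.Info() when it actually needs metadata.
func walkBlobs(dir string, visit func(path string, entry fs.DirEntry) error) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		if err := visit(filepath.Join(dir, entry.Name()), entry); err != nil {
			return err
		}
	}
	return nil
}

// totalBlobSize is a caller that does need a stat, so it handles the two
// cases the walker no longer filters out.
func totalBlobSize(dir string) (int64, error) {
	var total int64
	err := walkBlobs(dir, func(path string, entry fs.DirEntry) error {
		if entry.IsDir() {
			return nil // a directory that happens to be named like a blob
		}
		info, err := entry.Info() // stat on demand
		if errors.Is(err, fs.ErrNotExist) {
			return nil // file removed between the directory read and the stat
		}
		if err != nil {
			return err
		}
		total += info.Size()
		return nil
	})
	return total, err
}

// makeDemoDir builds a temporary directory holding two files (5 bytes in
// total) and one subdirectory, and returns its path with a cleanup function.
func makeDemoDir() (string, func(), error) {
	dir, err := os.MkdirTemp("", "blobs")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { _ = os.RemoveAll(dir) }
	if err := os.WriteFile(filepath.Join(dir, "aa.sj1"), []byte("abc"), 0o644); err != nil {
		cleanup()
		return "", nil, err
	}
	if err := os.WriteFile(filepath.Join(dir, "bb.sj1"), []byte("de"), 0o644); err != nil {
		cleanup()
		return "", nil, err
	}
	if err := os.Mkdir(filepath.Join(dir, "cc.sj1"), 0o755); err != nil {
		cleanup()
		return "", nil, err
	}
	return dir, cleanup, nil
}

func main() {
	dir, cleanup, err := makeDemoDir()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()
	total, err := totalBlobSize(dir)
	fmt.Println("total bytes:", total, "err:", err)
}
```

A caller that only needs file names, which the commit message suggests is the common case during a Retain operation, would simply never call `Info()` and so pay no stat cost.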