storj

Author	SHA1	Message	Date
paul cannon	2f04e20627	storage/filestore: better error message on data corruption A user on the forum was seeing the error "bad message", which was not very helpful. This came from the ext4 filesystem, which uses the code EBADMSG to indicate that it detected an invalid CRC, suggesting disk corruption. This change adds some explanatory information about probable disk corruption to all errors coming from the (*blobInfo).Stat() call, which is where storagenode fs corruption problems will usually manifest. Refs: https://github.com/storj/storj/issues/5375 Change-Id: I87f4a800236050415c4191ef1a0fc952f9def315	2023-01-30 08:54:06 -06:00
paul cannon	ed7c82439d	storage/filestore: avoid stat() during walkNamespaceInPath Calling stat() (really, lstat()) on every file during a directory walk is the step that takes up the most time. Furthermore, not all uses of the directory walk _need_ to have a stat done on every file. Therefore, in this commit we avoid doing the stat at the lowest level of walkNamespaceInPath. The stat will still be done when it is requested, with the Stat() method on the blobInfo object. The major upside of this is that we can avoid the stat call on most files during a Retain operation. This should speed up garbage collection considerably. The major downside is that walkNamespaceInPath will no longer automatically skip over directories that are named like blob files, or blob files which are deleted between readdir() and stat(). Callers of walkNamespaceInPath and its variants (WalkNamespace, WalkSatellitePieces, etc.) are now expected to handle these cases individually. Thanks to forum member Toyoo for the insight that this would speed up garbage collection. Refs: https://github.com/storj/storj/issues/5454 Change-Id: I72930573d58928fa25057ed89cd4ec474b884199	2023-01-30 13:47:03 +00:00
Erik van Velzen	e6b5501f9b	satellite/gc/sender: new service to send retain filters Implement a new service to read retain filters from a bucket and send them out to storagenodes. This allows the retain filters to be generated by a separate command on a backup of the database. Parallelism (setting ConcurrentSends) and end-to-end garbage collection tests will be restored in a subsequent commit. Solves https://github.com/storj/team-metainfo/issues/121 Change-Id: Iaf8a33fbf6987676cc3cf74a18a8078916fe673d	2022-09-20 11:49:40 +00:00
Márton Elek	4b1be6bf8e	storagenode/satellite: support different piece hash algorithms Change-Id: I3db321e79f12f3ebaa249e6c32fa37fd9615687e	2022-08-23 18:15:06 +00:00
Stefan Benten	c3171b4ba4	storagenode/retain: Move summary and start logs to info level (#4954) We currently do not log the GC information/stats under normal circumstances. This is not good for monitoring and troubleshooting.	2022-07-08 18:19:08 +02:00
Yaroslav Vorobiov	c9cfb5ed0c	storagenode/retain: add more verbose monkit monitoring Change-Id: Ibb9804268751b4b1842eb729bc510dba83e9b28b	2021-06-04 20:20:11 +00:00
Qweder93	53a5d18e1a	storagenode: fixed logging about piece being moved to trash, and added logging when piece was actually deleted Change-Id: I46f6a141b27033c2087b5c4681506d80b90f4a18	2020-08-02 20:00:05 +03:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Egon Elbre	c630cf2490	storagenode/pieces: implement buffering for writing Currently uploads can cause a lot of IOPS; reduce this by introducing an in-memory buffer on top of the file. Change-Id: I5f4e3e01c0a36258271d180b922107de447bcb59	2020-05-04 06:01:32 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Egon Elbre	21f53e38da	storagenode/storagenodedb/storagenodedbtest: pass ctx as an argument Change-Id: I10b0a8ef3a7d5001e7d361f1873ad5987af1f9c2	2020-01-20 16:56:12 +02:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Egon Elbre	7a36507a0a	private/testcontext: ensure we call cleanup everywhere Change-Id: Icb921144b651611d78f3736629430d05c3b8a7d3	2019-12-17 14:16:09 +00:00
littleskunk	08947e177d	storagenode/garbagecollection: enable in production Change-Id: I627b7a37ca4a85eb19936ca2c7ca907d7cc63f5b	2019-12-16 22:44:04 +00:00
littleskunk	9d1faeee58	storagenode/garbagecollection: increase MaxTimeSkew to be higher than satellite MaxCommitInterval Change-Id: I86f8d0b44bea3aa005ff26d52588611c59df5e9a	2019-12-09 16:03:55 +00:00
Isaac Hess	6aeddf2f53	storagenode/pieces: Add Trash and RestoreTrash to piecestore (#3575) * storagenode/pieces: Add Trash and RestoreTrash to piecestore * Add index for expiration trash	2019-11-20 09:28:49 -07:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573)	2019-11-14 21:46:15 +02:00
paul cannon	bd89f51c66	Keep v0pieceinfo database isolated (#3364) * put TestCreateV0 back in StoreForTest * avoid direct handles to V0 pieceinfo db * type mismatch fix * use storage.Blobs interface in store_test.go ..instead of filestore.Store. this will allow filestore.Store to become unexported. * unexport filestore.Store rename it to blobStore. things should use the storage.Blobs interface instead. changes in this commit are purely mechanical (made through the "refactor" tool in Gocode followed by search/replace on the word "Store" within the storage/filestore/ directory). * kill filestore.StoreForTest now that filestore.blobStore is unexported, there isn't a need for a specialized wrapper type. this (not coincidentally) also makes it possible for the WriterForFormatVersion() method on storagenode/pieces.StoreForTest to work, without requiring everything to wrap the store.blobs attribute in a filestore.StoreForTest, which was impractical.	2019-11-13 13:15:31 -06:00
littleskunk	7eb6724c92	logging: unify logging around satellite ID, node ID and piece ID (#3491) * logging: unify logging around satellite ID, node ID and piece ID * unify segment index	2019-11-05 22:04:07 +01:00
Maximillian von Briesen	08ed50bcaa	satellite/metainfo: add commit interval to prevent long delays between order limit creation and segment commit (#3149)	2019-10-01 12:55:02 -04:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964)	2019-09-10 16:24:16 +03:00
Egon Elbre	8a5db77e04	storagenode/retain: add comment (#2910)	2019-08-29 19:42:17 +03:00
Egon Elbre	62e3bf5b34	storagenode/retain: fix concurrency issues (#2828) * nicer flags * fix concurrency * add concurrent workers * initialize things * fix tests * close retain service * ensure we don't have workers working on the same satellite * ensure things compile * fix other compilation issues: * concurrency changes ran this with `go test -count=1000` and it passed all of them. - we add a closed channel so that we can select on it with context cancellation. - we put a once in so we only close the channel once. - every time the queue/running state changes, we have to broadcast because we may want to wake up N pending Wait calls or other concurrent workers. - because we broadcast, we don't need to do the polling in Wait anymore. - ensure Run doesn't start multiple times so that we don't have to worry about concurrent Close with multiple Runs. - hold the lock while we start workers so that a concurrent Close with Run can't decide that there's nothing started and exit and then have Run start things. - make sure to poll the closed/context channels through loops or at the start of Run calls in case Close happens first. - these polls should be under a mutex because they have a default case which makes it possible to schedule such that Close hasn't executed the channel close so it starts more work. - cancel a local Run context when it's going to exit to make sure that any retainPieces calls have a canceled context. - hopefully enough comments to both check my work and help readers digest what's going on. Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7 * use the retain error class Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc * concurrency fixes again - forgot to update the gc test to use the old Wait api. - we need to drop the lock while we wait for the workers to exit, because they may be blocked on the condition variable - additionally, we need to broadcast when we close the signal channel because the state changed: they want to wake up and exit. Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5 * undo my misguided rename Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881 * remove pollInterval * format paragraph more nicely * move skew calculation into retain pieces	2019-08-28 16:35:25 -04:00
Maximillian von Briesen	d83a965139	storagenode/piecestore: Add retain service on storagenode (#2785) Add retain service on storagenode. This service runs retain jobs that have been queued by the storagenodes. Rather than running retain jobs during the grpc Retain() call, the grpc call queues a retain job to the retain service and returns immediately afterwards, removing a significant bottleneck in garbage collection.	2019-08-19 14:52:47 -04:00

24 Commits