storj

Author	SHA1	Message	Date
Egon Elbre	004e610d0f	satellite/internalpb: move datarepair.pb to internal Change-Id: If901d9ff4e5ee6715b963eeeb46513a602a44b3d	2020-10-30 13:28:14 +02:00
Egon Elbre	2268cc1df3	all: fix linter complaints Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95	2020-10-13 15:59:01 +03:00
Cameron Ayer	c2525ba2b5	satellite/{repair,satellitedb}: clean up healthy segments from repair queue at end of checker iteration Repair workers prioritize the most unhealthy segments. This has the consequence that when we finally begin to reach the end of the queue, a good portion of the remaining segments are healthy again as their nodes have come back online. This makes it appear that there are more injured segments than there actually are. solution: Any time the checker observes an injured segment it inserts it into the repair queue or updates it if it already exists. Therefore, we can determine which segments are no longer injured if they were not inserted or updated by the last checker iteration. To do this we add a new column to the injured segments table, updated_at, which is set to the current time when a segment is inserted or updated. At the end of the checker iteration, we can delete any items where updated_at < checker start. Change-Id: I76a98487a4a845fab2fbc677638a732a95057a94	2020-09-29 20:38:22 +00:00
Cameron Ayer	e7c34a053d	satellite/satellitedb: add column and index "updated_at" to injuredsegments Change-Id: I59e9bb2077885f09e17795375fe98ed31bd83d54	2020-09-14 12:53:04 -04:00
stefanbenten	257855b5de	all: replace == comparison with errors.Is Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714	2020-07-14 15:50:25 +00:00
Jeff Wendling	254b42ff65	satellite/satellitedb: fix leaked rows from repairQueue.Insert Change-Id: If5e62c49770f591ebe3f4d2dd4dd2658c229a022	2020-06-03 14:31:21 -06:00
Moby von Briesen	290c006a10	satellite/repair/{checker,queue}: add metric for new segments added to repair queue * add monkit stat new_remote_segments_needing_repair, which reports the number of new unhealthy segments in the repair queue since the previous checker iteration Change-Id: I2f10266006fdd6406ece50f4759b91382059dcc3	2020-05-27 06:23:47 +00:00
Bill Thorp	e99e675fb1	satellite/satellitedb: use time zones with all timestamps The migration was broken into one migration per table to reduce table locking and reduce the chances of failure due to SQL timeouts. Of the 14 fields that lacked time zones, only the 3 named 'interval_start' seemed to have non-UTC data in them. These fields are fixed in the migration by removing the +00 and adding AT TIME ZONE current_setting('TIMEZONE'). Fields with good data are migrated by adding AT TIME ZONE 'UTC'. Note that postgres's timezone() is different from cockroach's timezone(), so AT TIME ZONE is used. https://storjlabs.atlassian.net/browse/SM-104 Change-Id: I410f2f1d7c11b143f17844347f37e6f4b1e70fce	2020-03-05 21:11:25 +00:00
Moby von Briesen	4e5a7f13c7	satellite/repair/queue: Prioritize selection of items off repair queue by segment health Add a column to the repair queue table in the satellite db for healthy piece count. When an item is selected from the repair queue, the least durable segment that has not been attempted in the past hour should be selected first. This prevents our repairer from getting stuck doing work on segments that are close to the repair threshold while allowing segments that are more unhealthy to degrade further. The migration also clears the repair queue so that the migration runs quickly and we can properly account for segment health in future repair work. We do not select items off the repair queue that have been attempted in the past six hours. This was changed from one hour to allow us time to try a wider variety of segments when the repair queue is very large. Change-Id: Iaf183f1e5fd45cd792a52e3563a3e43a2b9f410b	2020-02-26 09:54:16 -05:00
Egon Elbre	76fdb5d863	storage: add configurable lookup limits Currently storage tests were tied to the default lookup limit. By increasing the limits, the tests will take longer and sometimes cause a large number of goroutines to be started. This change adds configurable lookup limit to all storage backends. Also remove boltdb.NewShared, since it's not used any more. Change-Id: I1a052f149da471246fac5745da133c3cfc27582e	2020-01-22 21:35:56 +02:00
Egon Elbre	7d79aab14e	satellite/satellitedb: fixes to row handling Change-Id: I48fae692bcca152143a12f333296c42471538850	2020-01-16 17:07:26 +02:00
paul cannon	4a26fb5bd5	satellite/satellitedb: don't use crdb.ExecuteTx with postgres crdb.ExecuteTx is great, but I don't think it will work right with PostgreSQL. It works by way of cockroach savepoints, which allows it to react to retryable errors, whereas tx.Commit() doesn't. But I don't think PostgreSQL savepoints work exactly the same way. I'm not 100% sure, but it doesn't seem worth the risk. So, I'm switching one case here to use the new dbutil.WithTx instead, which will use crdb.ExecuteTx if appropriate. The other case doesn't need a transaction at all. Change-Id: I39283f3b5d8d47596db7aff5048bb74597e5918f	2020-01-06 23:51:35 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
paul cannon	b5ddfc6fa5	satellite/satellitedb: unexport satellitedb.DB Backstory: I needed a better way to pass around information about the underlying driver and implementation to all the various db-using things in satellitedb (at least until some new "cockroach driver" support makes it to DBX). After hitting a few dead ends, I decided I wanted to have a type that could act like a dbx.DB but which would also carry information about the implementation, etc. Then I could pass around that type to all the things in satellitedb that previously wanted dbx.DB. But then I realized that satellitedb.DB was, essentially, exactly that already. One thing that might have kept satellitedb.DB from being directly usable was that embedding a dbx.DB inside it would make a lot of dbx methods publicly available on a satellitedb.DB instance that previously were nicely encapsulated and hidden. But after a quick look, I realized that _nothing_ outside of satellite/satellitedb even needs to use satellitedb.DB at all. It didn't even need to be exported, except for some trivially-replaceable code in migrate_postgres_test.go. And once I made it unexported, any concerns about exposing new methods on it were entirely moot. So I have here changed the exported satellitedb.DB type into the unexported satellitedb.satelliteDB type, and I have changed all the places here that wanted raw dbx.DB handles to use this new type instead. Now they can just take a gander at the implementation member on it and know all they need to know about the underlying database. This will make it possible for some other pending code here to differentiate between postgres and cockroach backends. Change-Id: I27af99f8ae23b50782333da5277b553b34634edc	2019-12-16 19:09:30 +00:00
Jennifer Johnson	8e9532dbd9	satellitedb/repairqueue: use errs.New() instead of fmt.Errorf() to retain stack trace Change-Id: I47f1985aaeace556e3e0f7a20d2718410936db17	2019-12-05 15:52:25 +00:00
Jennifer Johnson	7c5f777a4f	satellitedb/repairqueue: runs a different implementation of the query within Select() for postgres vs cockroach Change-Id: Ie34dbdb9d870d7d9f8f269702b6b3bad0c55b98e	2019-12-04 21:32:02 +00:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573)	2019-11-14 21:46:15 +02:00
Egon Elbre	3c438f31bd	satellite/satellitedb: remove sqlite support (#3296)	2019-10-19 00:27:57 +03:00
Alexander Leitner	159ad439b1	Add count to repair queue (#2661) * Add count to repair queue	2019-07-30 11:21:40 -04:00
Egon Elbre	0cdeae1922	add missing error handling (#2630)	2019-07-25 17:01:44 +02:00
Ivan Fraixedes	3c8f1370d2	[v3 2137] - Add more info to find out repair failures (#2623) * pkg/datarepair/repairer: Always track time for repair Make a minor change in the worker function of the repairer so that, when successful, it always tracks the time-for-repair metric, independently of whether the time-since-checker-queue metric can be tracked. * storage/postgreskv: Wrap error in Get func Wrap the error returned by the Get function, as is done when the query doesn't return any rows. * satellite/metainfo: Move debug msg to the right place The NewStore function wrote a debug log message when the DB was connected, but it always wrote it, even if an error occurred while getting the connection. * pkg/datarepair/repairer: Wrap error before logging it Wrap the error returned by process, which is executed by the Run method of the repairer service, to add context to the error log message. * pkg/datarepair/repairer: Make errors more specific in worker Make the error messages of the Service's "worker" method, and the messages logged for those errors, more specific. * pkg/storage/repair: Improve error reporting of Repair To improve the error reporting of the pkg/storage/repair.Repair method, several errors returned by this method, and by the functions/methods it relies on, are now wrapped into their corresponding classes. * pkg/storage/segments: Track path param of Repair method Track in monkit the path parameter passed to the Repair method. * satellite/satellitedb: Wrap Error returned by Delete Wrap the error returned by the repairQueue.Delete method to enhance it with a class and stack, so that the pkg/storage/segments.Repairer.Repair method gets a more contextualized error from it.	2019-07-23 16:28:06 +02:00
Maximillian von Briesen	b590e53d64	Order by attempted time in injured segments select (#2533)	2019-07-12 13:35:20 -04:00
Michal Niewrzal	268c629ba8	Replace base64 encoding for path segments (#2345)	2019-07-11 13:26:07 -04:00
JT Olio	29d16b4d68	satellite: add monkit task to missing places (#2108)	2019-06-04 13:55:37 +02:00
Bill Thorp	a6c4019288	using DB time only (#2018) * using DB time only, using UTC	2019-05-21 12:50:55 -04:00
Egon Elbre	f7ed63a119	handle database error checks properly (#1796)	2019-04-23 14:13:57 +03:00
Bill Thorp	17a227e6e9	refactor injuredsegments db so that we can't have duplicates (#1717) made repairqueue not use a true queue, forbid duplicates	2019-04-16 14:14:09 -04:00
Bill Thorp	665fd33e3c	Repair queue isolation level fix (#1466) Implemented custom SQLite and Postgres Repairqueue Dequeue handlers	2019-03-14 17:12:47 -04:00
Jennifer Li Johnson	856b98997c	updates copyright 2018 to 2019 (#1133)	2019-01-24 15:15:10 -05:00
Michal Niewrzal	b712fbcbb0	Fix 'empty queue' error when satellite starts (#939)	2019-01-02 17:00:32 +01:00
Egon Elbre	4346cd060f	Implement mutex around satellitedb (#932)	2018-12-27 11:56:25 +02:00
Cameron	f70b826fd4	repair queue masterDB support (#865) * add injuredsegment model to satellitedb.dbx * add context to queue.RepairQueue interface * use queue.RepairQueue interface, use masterdb	2018-12-21 10:11:19 -05:00

32 Commits