storj

Author	SHA1	Message	Date
paul cannon	355ea2133b	satellite/audit: remove pieces when audits fail When pieces fail an audit (hard fail, meaning the node acknowledged it did not have the piece or the piece was corrupted), we will now remove those pieces from the segment. Previously, we did not do this, and some node operators were seeing the same missing piece audited over and over again and losing reputation every time. This change will include both verification and reverification audits. It will also apply to pieces found to be bad during repair, if repair-to-reputation reporting is enabled. Change-Id: I0ca7af7e3fecdc0aebbd34fee4be3a0eab53f4f7	2023-06-22 14:19:00 +00:00
Michal Niewrzal	cb9a7bdc71	satellite/repair/repairer: make DialTimeout configurable This change makes the dial timeout configurable and also changes the default from 20s to 5s. The main motivation is that during repair we often lose a lot of time on dials that eventually fail. The new timeout should still be enough to dial, but we will move on to the next node more quickly if a dial fails. The timeout is also applied directly as a context timeout in case we use Noise or TCP Fast Open one day. Change-Id: I021bf459af49b11241e314fa1a7887c81d5214ea	2023-06-16 12:23:25 +00:00
Michal Niewrzal	0c177ef91f	satellite: cleanup orders dependencies Only the API peer needs access to the orders DB (and rollups cache) because it's the only place where we create orders for PUT and GET operations. For other peers, like the auditor and repairer, we can set a noop implementation to reduce the number of dependencies they need. Change-Id: Ic32d1879f0b97ffc4516f401898e31e95ae892e4	2023-03-09 13:34:21 +00:00
Márton Elek	ffaf15a3b0	satellite/overlay: remove unused mail service from overlay It was surprising that `satellite auditor` complained about SMTP mail settings, even though it's not supposed to send any mail. It looks like we can remove the mail service dependency, as it's not a hard requirement for overlay.Service. Change-Id: I29a52eeff3f967ddb2d74a09458dc0ee2f051bd7	2023-03-09 12:17:35 +00:00
paul cannon	fc905a15f7	satellite/audit: newContainment->containment Now that all the reverification changes have been made and the old code is out of the way, this commit renames the new things back to the old names. Mostly, this involves renaming "newContainment" to "containment" or "NewContainment" to "Containment", but there are a few other renames that have been promised and are carried out here. Refs: https://github.com/storj/storj/issues/5230 Change-Id: I34e2b857ea338acbb8421cdac18b17f2974f233c	2022-12-16 17:59:52 +00:00
paul cannon	0342ca1aa6	satellite/audit: delete now-unused code Now that we are doing scalable piecewise reverifications, the code for handling the old way of doing things (containment, pending audits, reporting, testing) can now be removed. Refs: https://github.com/storj/storj/issues/5230 Change-Id: Ief1a75f423eff682e8f3d57804e343b3409a6631	2022-12-16 14:53:39 +00:00
paul cannon	1854351da6	satellite/audit: teach Reporter about piecewise audits The Reporter is responsible for processing results from auditing operations, logging the results, disqualifying nodes that reached the maximum reverification count, and passing the results on to the reputation system. In this commit, we extend the Reporter so that it knows how to process the results of piecewise reverification audits. We also change most reporter-related tests so that reverifications happen as piecewise reverification audits, exercising the new code. Note that piecewise reverification audits are not yet being done outside of tests. In a later commit, we will switch from doing segmentwise reverifications to piecewise reverifications, as part of the audit-scaling effort. Refs: https://github.com/storj/storj/issues/5230 Change-Id: I9438164ce1ea4d9a1790d18d0e1046a8eb04d8e9	2022-12-12 11:28:02 +00:00
paul cannon	378b8915c4	satellite/{satellitedb,audit}: add NewContainment NewContainment will replace Containment later in this commit chain, but for now it is not yet being used. NewContainment will allow a node to be contained for multiple pending reverify jobs at a time. It is implemented by way of the reverify queue. Refs: https://github.com/storj/storj/issues/5231 Change-Id: I126eda0b3dfc4710a88fe4a5f41780618ec19101	2022-12-07 18:03:37 +00:00
Moby von Briesen	3501656e98	satellite/repair: Add flag to allow disabling reputation updates Reputation updates during repair currently consume a lot of database resources. Sometimes increasing the rate of repair is more important than auditing a node based on whether it has the correct piece during repair; that is the job of the audit service. This commit implements an intermediate solution from this issue: https://github.com/storj/storj/issues/5089 This commit does not address the more in-depth fix discussed here: https://github.com/storj/storj/issues/4939 Change-Id: I4163b18d78a96fadf5265789fd73c8aa8def0e9f	2022-11-24 08:31:11 -05:00
Cameron	f06da25c3d	satellite/overlay: add nodeevents.DB to satellite overlay service Add nodeevents.DB to satellite overlay service so we can insert node events into the nodeevents DB. Change-Id: I642c0ccc9941ecdb08cb22d5c8cf701959a55156	2022-11-02 15:56:37 +00:00
Cameron	4be042154c	satellite/reputation: add overlay service to reputation service Change-Id: I7b1af819f9e1989578335247730378a8658bfe7f	2022-10-26 17:03:29 +00:00
Cameron	a52f766273	satellite/overlay: add email-sending functionality to overlay service We want to send emails to SNOs. Node status changes go through the overlay service, so it's a good place to add the mail service. Add the mailservice.Service, satellite address, and satellite name to overlay service. Also add feature flag --overlay.send-node-emails Change-Id: I3bd2cb3bf22f9724954ce2374f8b651b902b3a24	2022-10-13 18:01:05 +00:00
Michal Niewrzal	a97cd97789	satellite/orders: remove unused service dependency Orders service doesn't need buckets service anymore. Change-Id: I27853cda87e82b528f53667e4b4866801f7bfb62	2022-09-28 08:56:36 +00:00
paul cannon	799b159bba	satellite/reputation: offset write times by random, not by satelliteID In an effort to distribute load on the reputation database, the reputation write cache scheduled nodes to be written at a time offset by the local nodeID. The idea was that no two repair workers would have the same nodeID, so they would not tend to write to the same row at the same time. Instead, since all satellite processes share the same satellite ID (duh), this caused _all_ workers to try and write to the same row at the same time _always_. This was not ideal. This change uses a random number instead of the satellite ID. The random number is sourced from the number of nanoseconds since the Unix epoch. As long as workers are not started at the exact same nanosecond, they ought to get well-distributed offsets. Change-Id: I149bdaa6ca1ee6043cfedcf1489dd9d3e3c7a163	2022-08-03 21:14:06 +00:00
paul cannon	2f20bbf4d8	satellite/reputation: add a reputation write cache This should lower the amount of database load coming from reputation updates. Change-Id: Iaacfb81480075261da77c5cc93e08b24f69f8949	2022-07-14 21:40:16 +00:00
Egon Elbre	48b0a65fbd	satellite/overlay: use ReadCache in Download/UploadSelectionCache sync2.ReadCache implements preemptive refreshing preventing stalling while it's being updated. Change-Id: Iee9ef36049b986f0e426c14a139b2bc9ac17fb53	2022-07-12 13:52:48 +03:00
paul cannon	41c5879f7c	satellite: more detailed goroutine labels This will apply an appropriate "subsystem" label to goroutines which are part of the core, api, repairer, admin, or gc subsystems. It will also label goroutines whose job it is to watch for slow shutdown of lifecycle groups (there are a lot of these). Finally, this will also label goroutines whose job it is to wait on the toplevel errgroup of a subsystem. Change-Id: I560b5fff4a0101300d6c9a67609c2d80d7424486	2022-05-11 17:50:55 +00:00
paul cannon	fd01c6cc25	satellite/{repair,audit}: simplify reputation reporter Also, make it an interface so that the upcoming write cache can be dropped into the same place. Change-Id: I2c286743825e647c0cef5b6578245391851fa10c	2022-05-10 14:04:43 +00:00
Mya	814e3126fa	satellite/buckets: add new buckets service The main motivation is to wrap the bucket DB and metainfo DB, so we could check if a bucket is empty before applying geofencing config. Change-Id: I8bac21555e01d51a663fb557bc1acfc8106bc2e1	2021-11-16 12:36:17 +02:00
Yaroslav Vorobiov	469ae72c19	satellite/repair: update audit records during repair Change-Id: I788b2096968f043601aba6502a2e4e784f1f02a0	2021-09-24 00:48:13 +00:00
Michał Niewrzał	495e530933	satellite/metainfo: drop metainfo.Service Drop a service that doesn't actually make sense and is just a wrapper for direct DB methods. Change-Id: I1cb76f3ecc2d8765964d919c88541179957645c1	2021-09-09 17:30:10 +02:00
Yingrong Zhao	f8914ccce0	satellite/{repair, overlay}: use reputation store in repair Change-Id: I48db9e68f48239d48621ccc77d33618ecb83ce1a	2021-07-28 13:22:05 -04:00
Michał Niewrzał	a93e47514a	satellite: remove irreparabledb This is part of the metaloop refactoring. We planned to remove irreparable at some point, but there was no time for it. Now, instead of refactoring it for segmentloop, it's just easier to drop it. Later we still need to drop the table with a migration step. Change-Id: I270e77f119273d39a1ecdcf5e1c37a5662a29ab4	2021-06-17 07:20:15 +00:00
Egon Elbre	10372afbe4	ci: fix lint errors Change-Id: Ib5893440807811f77175ccd347aa3f8ca9cccbdf	2021-05-17 13:37:31 +00:00
Egon Elbre	910eec8eee	satellite/metainfo: remove MetabaseDB interface Currently the interface is not useful. When we need to vary the implementation for testing purposes we can introduce a local interface for the service/chore that needs it, rather than using the large api. Unfortunately, this requires adding a cleanup callback for tests, there might be a better solution to this problem. Change-Id: I079fe4dbe297b0ae08c10081a1cea4dfbc277682	2021-05-13 13:22:14 +00:00
Kaloyan Raev	2ee3030275	all: remove code related to PointerDB Change-Id: I6675c9597f87019020f6233b83ab2f1119d2bc46	2021-04-21 12:35:31 +00:00
Michał Niewrzał	ec88d21a3c	Merge 'main' branch. Change-Id: I6e8162d1a6caf75e89c9f9c9f9522730aebf83ae	2021-01-11 10:26:58 +01:00
Jeff Wendling	2d2359667d	satellite/orders: remove unused satelliteAddress field Change-Id: I58091769472688433c48becc8dfc9029bddd87aa	2021-01-08 12:25:39 -05:00
Michał Niewrzał	ad3e3a38c5	Merge 'main' branch Change-Id: Ia0db1b1f9ef3e0671d3f2208881b0abc3064e200	2021-01-04 12:13:45 +01:00
Ethan Adams	6070018021	satellite/overlay: use AS OF SYSTEM TIME with Cockroach Query the nodes table using AS OF SYSTEM TIME '-10s' (by default) when on CRDB to alleviate contention on the nodes table and minimize CRDB retries. Queries for standard uploads are already cached, and node lookups for graceful exit uploads have retry logic, so it isn't necessary for the nodes returned to be current.	2020-12-22 21:07:07 +02:00
Michal Niewrzal	8d3ea9c251	satellite/repair/repairer: implement SegmentRepairer with metabase Change-Id: I647c625e00a626c44e812602ad9bc3e85a7b602c	2020-12-17 10:47:21 +00:00
Michal Niewrzal	efaba85c73	Merge 'master' branch Change-Id: I3520b3e327732929f5167b07a15ddb92d26cae1b	2020-11-24 10:03:20 +01:00
Moby von Briesen	575f50df84	satellite/repair: Update repair override config to support multiple RS schemes. Rather than having a single repair override value, we will now support repair override values based on a particular segment's RS scheme. The new format for RS override values is "k/o/n-override,k/o/n-override..." Change-Id: Ieb422638446ef3a9357d59b2d279ee941367604d	2020-11-23 18:01:15 +00:00
Kaloyan Raev	b8c6fb764c	satellite/metainfo: add metabase to metainfo service Change-Id: Ie3ff238b138d8a57d99e32b13f7a71aa624d53e3	2020-10-30 12:49:47 +02:00
Egon Elbre	dc48197bd8	satellite/orders: add bucket id to order limit Change-Id: I9019ec77d692e62ac17b67a1da71dc3535cde50c	2020-09-03 10:50:11 +03:00
Egon Elbre	61b17f1214	satellite/orders: add encryption keys flag to Service Change-Id: Ie96e75bc96241b799d04654ef5e05b82e6a899bb	2020-09-02 05:02:14 +00:00
Egon Elbre	36ed939b89	satellite/orders: add buckets db to service We need to add bucket UUID into the order limit, hence we need access to the buckets table. Change-Id: I348ce1f709c9fcdec5c4034acaab59805b33da9f	2020-07-24 17:36:49 +03:00
Egon Elbre	44f9193404	satellite/orders: make optimal threshold multiplier into an argument It feels weird having a repairer configuration part of order services. Let's have a single source of truth for it. Change-Id: I24f7c897aec80f3293f8af24876cbb6733d85a0b	2020-07-24 16:35:59 +03:00
Egon Elbre	ba4c3d9986	satellite/orders: remove unused node status logging flag Change-Id: I24da78a11cc5d3d88cdf6aca85c4238e4086e59c	2020-07-24 16:35:59 +03:00
Ethan	159df8b2e4	Add logging listener for retrieving and setting log levels See https://storjlabs.atlassian.net/browse/SM-752 These changes allow us to change the log level at runtime through a handler off of the debug endpoint. Examples of changing the log level on storj-sim To get the current level for the satellite api process: curl -XGET 'http://127.0.0.1:10009/logging' --header 'Content-Type: text/plain' To change the log level: curl -XPUT 'http://127.0.0.1:10009/logging' --header 'Content-Type: text/plain' --data-raw '{"level":"error"}' Change-Id: I05d164b290929fa06b6d78c01075ee41f8238044	2020-05-12 16:38:06 -04:00
Jess G	75b9a5971e	satellite: update log levels (#3851) * satellite: update log levels Change-Id: I86bc32e042d742af6dbc469a294291a2e667e81f * log version on start up for every service Change-Id: Ic128bb9c5ac52d4dc6d6c4cb3059fbad73f5d3de * Use monkit for tracking failed ip resolutions Change-Id: Ia5aa71d315515e0c5f62c98d9d115ef984cd50c2 * fix compile errors Change-Id: Ia33c8b6e34e780bd1115120dc347a439d99e83bf * add request limit value to storage node rpc err Change-Id: I1ad6706a60237928e29da300d96a1bafa94156e5 * we can't track storage node ids in monkit metrics so let's use logging to track that for expired orders Change-Id: I1cc1d240b29019ae2f8c774792765df3cbeac887 * fix build errs Change-Id: I6d0ffe058e9a38b7ed031c85a29440f3d68e8d47	2020-04-15 12:32:22 -07:00
Kaloyan Raev	a2ce836761	remove sugar logging Change-Id: I6b6ca9704837cb3f5f5449ba7f55661487814d9f	2020-04-15 12:37:47 +00:00
Moby von Briesen	a933bcc99a	satellite/repair/repairer/ec.go: add option for downloading pieces onto disk instead of in memory during repair Add flag to satellite repairer, "InMemoryRepair" that allows the satellite to decide whether to download the entire segment being repaired into memory (this is what the satellite already does), or to download it into temporary files on disk that will be read from in the upload phase of repair. This should help with handling high repair traffic on satellites that cannot afford to spend 64mb of memory per repair worker. Updates tests to test repair for both in memory and to disk. Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194	2020-03-27 16:41:00 +00:00
Michal Niewrzal	fdf40a7526	storj: remove `storj/private/version` package which was moved to `storj/private` repo Change-Id: I81c3f5b9d5e4fe7bca760999eb045ee9734e5e2e	2020-03-24 14:31:33 +00:00
Michal Niewrzal	f0aeda3091	storj: remove from `storj/pkg` packages moved to `storj/private` repo * debug * traces * cfgstruct * process Package `storj/private/version` will be removed as a separate change. Change-Id: Iadc40faa782e6225513b28218952f02d9c240a9f	2020-03-24 09:56:29 +01:00
paul cannon	79553059cb	satellite/repair: put irreparable segments in irreparableDB Previously, we were simply discarding rows from the repair queue when they couldn't be repaired (either because the overlay said too many nodes were down, or because we failed to download enough pieces). Now, such segments will be put into the irreparableDB for further and (hopefully) more focused attention. This change also better differentiates some error cases from Repair() for monitoring purposes. Change-Id: I82a52a6da50c948ddd651048e2a39cb4b1e6df5c	2020-03-09 21:45:16 +00:00
Qweder93	484ec7463a	storagenode: notifications on outdated software version Change-Id: If19b075c78a7b2c441e11b783c3c09fed55060c7	2020-03-02 16:48:02 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Egon Elbre	8dea4f52db	satellite: add control panel Change-Id: Id48246e9bcd4c6ec643277fe740937b2e42ad85b	2020-01-30 08:06:43 -05:00
Egon Elbre	4e2bf81719	pkg/debug: add better title Change-Id: Icc6114f4e7523cfe6c7984ef1f6eec664ae4ee65	2020-01-30 07:49:40 -05:00


61 Commits
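Commit 575f50df84 introduces the repair override format "k/o/n-override,k/o/n-override...". The minimal Go sketch below shows how such a string could be parsed into a per-scheme lookup table. The names `rsKey` and `parseRepairOverrides` are hypothetical and are not the satellite's actual code. The validation rule 0 < k <= o <= n is also an assumption, and the scheme values in `main` are illustrative only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rsKey identifies a Reed-Solomon scheme by its k/o/n values
// (required, optimal, total). Hypothetical type for illustration.
type rsKey struct {
	Required, Optimal, Total int
}

// parseRepairOverrides parses "k/o/n-override,k/o/n-override..." into a map
// from RS scheme to repair override value. Hypothetical helper; the
// 0 < k <= o <= n check is an assumed sanity rule.
func parseRepairOverrides(s string) (map[rsKey]int, error) {
	overrides := make(map[rsKey]int)
	if strings.TrimSpace(s) == "" {
		return overrides, nil
	}
	for _, entry := range strings.Split(s, ",") {
		// Split "k/o/n-override" into the scheme and the override value.
		pieces := strings.SplitN(strings.TrimSpace(entry), "-", 2)
		if len(pieces) != 2 {
			return nil, fmt.Errorf("entry %q: missing '-'", entry)
		}
		parts := strings.Split(pieces[0], "/")
		if len(parts) != 3 {
			return nil, fmt.Errorf("entry %q: scheme must be k/o/n", entry)
		}
		// Convert k, o, n and the override value to integers.
		var nums [4]int
		for i, p := range append(parts, pieces[1]) {
			n, err := strconv.Atoi(p)
			if err != nil {
				return nil, fmt.Errorf("entry %q: %w", entry, err)
			}
			nums[i] = n
		}
		k, o, n, override := nums[0], nums[1], nums[2], nums[3]
		if k <= 0 || k > o || o > n {
			return nil, fmt.Errorf("entry %q: expected 0 < k <= o <= n", entry)
		}
		overrides[rsKey{k, o, n}] = override
	}
	return overrides, nil
}

func main() {
	// Illustrative input only; these RS values are not taken from a real config.
	overrides, err := parseRepairOverrides("29/80/110-52,10/20/30-15")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(overrides[rsKey{29, 80, 110}], overrides[rsKey{10, 20, 30}]) // prints "52 15"
}
```

Keying the map by the full k/o/n triple means that segments with different RS schemes each get their own override. This is the point of the commit.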