storj

Author	SHA1	Message	Date
Jess G	75b9a5971e	satellite: update log levels (#3851 ) * satellite: update log levels Change-Id: I86bc32e042d742af6dbc469a294291a2e667e81f * log version on start up for every service Change-Id: Ic128bb9c5ac52d4dc6d6c4cb3059fbad73f5d3de * Use monkit for tracking failed ip resolutions Change-Id: Ia5aa71d315515e0c5f62c98d9d115ef984cd50c2 * fix compile errors Change-Id: Ia33c8b6e34e780bd1115120dc347a439d99e83bf * add request limit value to storage node rpc err Change-Id: I1ad6706a60237928e29da300d96a1bafa94156e5 * we cant track storage node ids in monkit metrics so lets use logging to track that for expired orders Change-Id: I1cc1d240b29019ae2f8c774792765df3cbeac887 * fix build errs Change-Id: I6d0ffe058e9a38b7ed031c85a29440f3d68e8d47	2020-04-15 12:32:22 -07:00
littleskunk	048ca4558f	satellite/repair: clean up logging (#3833 ) Co-authored-by: Michal Niewrzal <michal@storj.io>	2020-03-30 11:59:56 +02:00
Egon Elbre	480ea1e4b5	satellite/repair/repairer: fix temporary file handling Change-Id: Ice1a467510737b3375c018ae37b16431c7dffe9e	2020-03-27 21:36:23 +02:00
Moby von Briesen	a933bcc99a	satellite/repair/repairer/ec.go: add option for downloading pieces onto disk instead of in memory during repair Add flag to satellite repairer, "InMemoryRepair" that allows the satellite to decide whether to download the entire segment being repaired into memory (this is what the satellite already does), or to download it into temporary files on disk that will be read from in the upload phase of repair. This should help with handling high repair traffic on satellites that cannot afford to spend 64mb of memory per repair worker. Updates tests to test repair for both in memory and to disk. Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194	2020-03-27 16:41:00 +00:00
Moby von Briesen	8b72181a1f	satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension * change overlay.UpdateStats to allow a third audit outcome. Now it can handle successful, failed, and unknown audits. * when "unknown audit reputation" (unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the DQ threshold, put node into suspension. * when unknown audit reputation goes above the DQ threshold, remove node from suspension. * record unknown audits from audit reporter. * add basic tests around unknown audits and suspension. Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1	2020-03-16 20:29:26 +00:00
Jess G	39cb821196	satellite/overlay: rm combinedcache, fix IP naming to be network (#3798 ) * rn combinedcache, rm dns node lookup Change-Id: I239f07211764b097d851230d8c81900a47756e9e * excludeIPs -> excludedNetworks Change-Id: Ifa6f44ab17457cdd5aff4cd5694296867c18b179 * use lowercase var name Change-Id: I825aad2b718c71f455e747be18f8cabd02aabe55 * update Getnetwork name Change-Id: I002a1b7bc6b4ef40159c0cd2b0ef209f80a9c503 * fix comments Change-Id: Ibddf5b9ffa9d685af6c392d893db063ef18e45fa * update comments with ipv6 Change-Id: I31758b7d4979e7c27d014668f4fb532ad838cda2 Co-authored-by: Stefan Benten <mail@stefan-benten.de>	2020-03-12 11:37:57 -07:00
paul cannon	79553059cb	satellite/repair: put irreparable segments in irreparableDB Previously, we were simply discarding rows from the repair queue when they couldn't be repaired (either because the overlay said too many nodes were down, or because we failed to download enough pieces). Now, such segments will be put into the irreparableDB for further and (hopefully) more focused attention. This change also better differentiates some error cases from Repair() for monitoring purposes. Change-Id: I82a52a6da50c948ddd651048e2a39cb4b1e6df5c	2020-03-09 21:45:16 +00:00
Jennifer Johnson	1c1750e6be	removes bandwidth limiting On satellite, remove all references to free_bandwidth column in nodes table. On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated. Protobuf message, NodeCapacity, is left intact for backwards compatibility. Once this is released to all satellites, we can drop the column from the DB. Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838	2020-03-04 14:04:00 +00:00
paul cannon	92d86fa044	satellite/repair: fix repair concurrency This new repair timeout (configured as TotalTimeout) will include both the time to download pieces and the time to upload pieces, as well as the time to pop the segment from the repair queue. This is a move from Github PR #3645. Change-Id: I47d618f57285845d8473fcd285f7d9be9b4318c8	2020-02-24 19:57:09 +00:00
Egon Elbre	5342dd9fe6	go.mod: update uplink Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811	2020-02-21 16:08:12 +02:00
Cameron Ayer	b22bf16b35	satellite/overlay: add config flag for node selection free disk requirement Currently SNs report their free disk space once per hour. If a node becomes full, it has to wait until the next contact cycle begins to report; all the while receiving and failing upload requests. By increasing the minimum required disk space, we can give the storage nodes more time to report their space before the completely fill up. This change goes hand-in-hand with another change we want to implement: trigger capacity report on SN immediately upon falling below threshold. Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27	2020-02-11 18:08:25 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Moby von Briesen	006a2824ba	satellite/repair: lock monkit stats in checker and repairer Change-Id: Ia10fc8da0177389a500359ce51d21a5806f3f7b1	2020-01-30 14:09:56 +00:00
Egon Elbre	8dea4f52db	satellite: add control panel Change-Id: Id48246e9bcd4c6ec643277fe740937b2e42ad85b	2020-01-30 08:06:43 -05:00
Egon Elbre	082ec81714	uplink: move to storj.io/uplink (#3746 )	2020-01-08 15:40:19 +02:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Egon Elbre	72d407559e	satellite/metainfo: don't leak error implementation detail (#3722 ) * satellite/metainfo: don't leak implementation detail * add missing wrap	2019-12-10 15:21:30 -05:00
littleskunk	71b58edb2c	satellite/repair: decrease repair interval Change-Id: Id9efdbfaa82521c35dc41e7a52b700522c197e77	2019-12-10 00:36:00 +00:00
Yingrong Zhao	7af42e3c10	satellite/metainfo, satellite/repair, uplink/eestream: add metric for download failed due to not enough pieces available (#3665 )	2019-12-04 16:24:36 -05:00
littleskunk	c52c7275ad	satellite/repair: reduce upload timeout (#3597 )	2019-11-18 18:52:56 +01:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573 )	2019-11-14 21:46:15 +02:00
littleskunk	7eb6724c92	logging: unify logging around satellite ID, node ID and piece ID (#3491 ) * logging: unify logging around satellite ID, node ID and piece ID * unify segment index	2019-11-05 22:04:07 +01:00
Yingrong Zhao	bfa6699e2c	satellite/repair: add timeout for repair download from a single node(#3418 )	2019-10-30 16:31:08 -04:00
Natalie Villasana	5453886231	satellite/repair, uplink/ecclient: remove unused expiration arg from ec.Repair and ec.putPiece (#3416 )	2019-10-30 11:35:00 -04:00
littleskunk	2a5526fcc4	satellite/repair: reduce timeout (#3302 )	2019-10-18 13:43:24 +02:00
littleskunk	6e7607239c	satellite/repair: improve logging (#3287 ) * satellite/repair: improve logging * use Stringer wherever possible	2019-10-16 17:28:56 +02:00
littleskunk	5b20c716e6	satellite/repair: dont error on deleted segments (#3252 )	2019-10-15 05:39:28 +02:00
Maximillian von Briesen	a75e3e6b81	satellite/repairer: fix segment_time_until_repair metric (#3199 )	2019-10-07 13:54:12 -04:00
Cameron	eb5413ae5e	defer close piecestore in downloadAndVerifyPiece (#3192 )	2019-10-06 13:41:53 -04:00
Stefan Benten	1db4251234	Satellite/repair: Add Repair Threshold Override to allow earlier repair (#3151 )	2019-10-02 14:58:37 +02:00
Jeff Wendling	098cbc9c67	all: use pkg/rpc instead of pkg/transport all of the packages and tests work with both grpc and drpc. we'll probably need to do some jenkins pipelines to run the tests with drpc as well. most of the changes are really due to a bit of cleanup of the pkg/transport.Client api into an rpc.Dialer in the spirit of a net.Dialer. now that we don't need observers, we can pass around stateless configuration to everything rather than stateful things that issue observations. it also adds a DialAddressID for the case where we don't have a pb.Node, but we do have an address and want to assert some ID. this happened pretty frequently, and now there's no more weird contortions creating custom tls options, etc. a lot of the other changes are being consistent/using the abstractions in the rpc package to do rpc style things like finding peer information, or checking status codes. Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412	2019-09-25 15:37:06 -06:00
Natalie Villasana	cc70cd2329	satellite/repair: add metric trackers for segment age before repair (#3056 )	2019-09-17 15:18:48 -04:00
Yingrong Zhao	b37ea864b1	satellite/repair: delete pieces that failed piece hashes verification from pointer (#3051 ) * add test * add implementation * remove todo comments * modifies cooment * fix linting * typo oops	2019-09-16 13:13:24 -04:00
Yingrong Zhao	95aa33c964	satellite/repair/repairer: update audit status as failed after failing piece hash verification (#2997 ) * update audit status as failed for nodes that failed piece hash verification * remove comment * fix lint error * add test * fix format * use named return value for Get * add comments * add more better comment * format	2019-09-13 12:21:20 -04:00
Maximillian von Briesen	289cfe8ff2	satellite/repair: do not log "retrieved segment" if repair queue empty (#2995 )	2019-09-11 16:06:36 +03:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964 )	2019-09-10 16:24:16 +03:00
Maximillian von Briesen	fb10815229	Repair with hashes (#2925 ) * add outline for ECRepairer * add description of process in TODO comments * begin download/getting hash for a single piece * verify piece hash and order limit during download * fix download piece * begin filling out ESREpair. Get * wip move ecclient.Repair to ecrepairer.Repair * pass satellite signee into repairer * reconstruct original stripe from pieces * move rebuildStripe() * calculate piece size differently, increment successful count * fix shares slices initialization * rename stripeData to segment * do not pad reader in Repair() * temp debug * create unsafeRSScheme * use decode reader * rename file name to be all lowercase * make repair downloader async * declare condition variable inside Get method * set downloadAndVerifyPiece's in-memory buffer to be share size * update unusedLimits var * address comments * remove unnecessary comments * move initialization of segmentRepaire to be outside of repairer service * use ReadAll during download * remove dots and move hashing to after validating for order limit signature * wip test * make sure files exactly at min threshold are repaired * remove unused code * use corrput data and write back to storagenode * only create corrupted node and piece ids once * add comment * address nat's comment * fix linting and checker_test * update comment * add comments * remove "copied from ecclient" comments * add clarification comments in ec.Repair	2019-09-06 15:20:36 -04:00
Egon Elbre	c8edeb0257	satellite/overlay: rename overlay.Cache to overlay.Service (#2717 )	2019-08-06 19:35:59 +03:00
Bill Thorp	fcbc9d71da	satellite/repair: add shouldDelete (#2702 ) * add shouldDelete to repair	2019-08-05 11:09:16 -04:00
Alexander Leitner	4632ab0a67	Delete irreparable segments (#2642 ) * Delete irreparable segments	2019-07-30 11:38:25 -04:00
Egon Elbre	e75813d094	satellite/repair: move segment repairer to satellite and simplify (#2651 )	2019-07-29 13:24:56 +02:00
Egon Elbre	5d0816430f	rename all the things (#2531 ) * rename pkg/linksharing to linksharing * rename pkg/httpserver to linksharing/httpserver * rename pkg/eestream to uplink/eestream * rename pkg/stream to uplink/stream * rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo * rename pkg/auth/signing to pkg/signing * rename pkg/storage to uplink/storage * rename pkg/accounting to satellite/accounting * rename pkg/audit to satellite/audit * rename pkg/certdb to satellite/certdb * rename pkg/discovery to satellite/discovery * rename pkg/overlay to satellite/overlay * rename pkg/datarepair to satellite/repair	2019-07-28 08:55:36 +03:00

42 Commits