storj

Author	SHA1	Message	Date
Michal Niewrzal	7dde184cb5	Merge 'master' branch Change-Id: I6070089128a150a4dd501bbc62a1f8b394aa643e	2020-11-10 11:58:59 +00:00
Egon Elbre	2268cc1df3	all: fix linter complaints Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95	2020-10-13 15:59:01 +03:00
Egon Elbre	0bdb952269	all: use keyed special comment Change-Id: I57f6af053382c638026b64c5ff77b169bd3c6c8b	2020-10-13 15:13:41 +03:00
Michal Niewrzal	aa47e70f03	satellite/metainfo: use metabase.SegmentKey with metainfo.Service Instead of using string or []byte we will be using dedicated type SegmentKey. Change-Id: I6ca8039f0741f6f9837c69a6d070228ed10f2220	2020-09-03 15:11:32 +00:00
Egon Elbre	3ca405aa97	satellite/orders: use metabase types as arguments Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef	2020-08-28 15:52:37 +03:00
Moby von Briesen	5dfe27f175	satellite/{repair,overlay}: Use overlay NodeSelectionCache for repair uploads This change removes the overlay function FindStorageNodesForRepair, which skips using the node selection cache and hits the database directly. Otherwise, it is functionally identical to FindStorageNodesForUpload, which checks the node selection cache first. When selecting nodes for PUT_REPAIRs, we now call FindStorageNodesForUpload instead of FindStorageNodesForRepair to reduce database load. Change-Id: If34e109695b2ed2b8fb6759115bf769a3459684e	2020-08-04 12:50:12 -04:00
Egon Elbre	44f9193404	satellite/orders: make optimal threshold multiplier into an argument It feels weird having a repairer configuration part of order services. Let's have a single source of truth for it. Change-Id: I24f7c897aec80f3293f8af24876cbb6733d85a0b	2020-07-24 16:35:59 +03:00
Cameron Ayer	e14f7a3fb4	satellite/repair: update healthyPieces and unhealthyPieces after CreateGetRepairOrderLimits Inside CreateGetRepairOrderLimits we pass in a list of healthy pieces, but when we query node info from this list we apply the "reliable" filter again. We sometimes end up with nodes which at first were healthy, but then became unhealthy, and thus can be repaired, but we do not update the 'unhealthyPieces' list with these nodes. This causes an error, 'piece to add already exists', as we fail to remove these pieces from the pointer before replacing them with repaired pieces. Change-Id: I6e2445f342ac117ded30351fa7e5e523c9ec26bd	2020-07-23 13:24:46 +00:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Moby von Briesen	acf8b72cd0	satellite/repair/repairer: cut off long tail when minimum number of required uploads is met This will speed up the Put step of repair by not waiting to time out for a handful of slow nodes, at the expense of a slightly less durable pointer. It will still repair to the optimal threshold, but not every node that is selected will end up in the pointer. Change-Id: I02a0658e3fe6fc0383f26af0f50a065b8b11a651	2020-05-28 16:25:28 -04:00
Jess G	825226c98e	satellite/overlay: use node selection cache for uploads (#3859) * satellite/overlay: use node selection cache for uploads Change-Id: Ibd16cccee979d0544f2f4a01749af9f36f02a6ad * fix config lock Change-Id: Idd307e4dee8ab92749f1ec3f996419ea0af829fd * start fixing tests Change-Id: I207d373a3b2a2d9312c9e72fe9bd0b01e06ad6cf * fix test, add some more Change-Id: I82b99c2004fca2510965f9b389f87dd4474bc722 * change config name Change-Id: I0c0f7fc726b2565dc3828cb723f5459a940f2a0b * add benchmarks Change-Id: I05fa25bff8d5b65f94d918556855b95163d002e9 * revert bench to put in different PR Change-Id: I0f6942296895594768f19614bd7b2e3b9b106ade * add staleness to benchmark Change-Id: Ia80a310623d5a342afa6d835402170b531b0f870 * add cache config to testplanet Change-Id: I39abdab8cc442694da543115a9e470b2a8a25dff * have repair select old way Change-Id: I25a938457d7d1bcf89fd15130cb6b0ac19585252 * lower testplanet config time Change-Id: Ib56a2ed086c06bc6061388d15a10a2526a663af7 * fix test Change-Id: I3868e9cacde2dfbf9c407afab04dc5fc2f286f69	2020-04-24 09:11:04 -07:00
Moby von Briesen	178aa8b5e0	satellite/{metainfo,repair}: Delete expired segments from metainfo * Delete expired segments in expired segments service using metainfo loop * Add test to verify expired segments service deletes expired segments * Ignore expired segments in checker observer * Modify checker tests to verify that expired segments are ignored * Ignore expired segments in segment repairer and drop from repair queue * Add repair test to verify that a segment that expires after being added to the repair queue is ignored and dropped from the repair queue Change-Id: Ib2b0934db525fef58325583d2a7ca859b88ea60d	2020-04-22 13:02:31 +00:00
littleskunk	048ca4558f	satellite/repair: clean up logging (#3833) Co-authored-by: Michal Niewrzal <michal@storj.io>	2020-03-30 11:59:56 +02:00
Moby von Briesen	a933bcc99a	satellite/repair/repairer/ec.go: add option for downloading pieces onto disk instead of in memory during repair Add flag to satellite repairer, "InMemoryRepair" that allows the satellite to decide whether to download the entire segment being repaired into memory (this is what the satellite already does), or to download it into temporary files on disk that will be read from in the upload phase of repair. This should help with handling high repair traffic on satellites that cannot afford to spend 64mb of memory per repair worker. Updates tests to test repair for both in memory and to disk. Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194	2020-03-27 16:41:00 +00:00
Moby von Briesen	8b72181a1f	satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension * change overlay.UpdateStats to allow a third audit outcome. Now it can handle successful, failed, and unknown audits. * when "unknown audit reputation" (unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the DQ threshold, put node into suspension. * when unknown audit reputation goes above the DQ threshold, remove node from suspension. * record unknown audits from audit reporter. * add basic tests around unknown audits and suspension. Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1	2020-03-16 20:29:26 +00:00
Jess G	39cb821196	satellite/overlay: rm combinedcache, fix IP naming to be network (#3798) * rm combinedcache, rm dns node lookup Change-Id: I239f07211764b097d851230d8c81900a47756e9e * excludeIPs -> excludedNetworks Change-Id: Ifa6f44ab17457cdd5aff4cd5694296867c18b179 * use lowercase var name Change-Id: I825aad2b718c71f455e747be18f8cabd02aabe55 * update Getnetwork name Change-Id: I002a1b7bc6b4ef40159c0cd2b0ef209f80a9c503 * fix comments Change-Id: Ibddf5b9ffa9d685af6c392d893db063ef18e45fa * update comments with ipv6 Change-Id: I31758b7d4979e7c27d014668f4fb532ad838cda2 Co-authored-by: Stefan Benten <mail@stefan-benten.de>	2020-03-12 11:37:57 -07:00
paul cannon	79553059cb	satellite/repair: put irreparable segments in irreparableDB Previously, we were simply discarding rows from the repair queue when they couldn't be repaired (either because the overlay said too many nodes were down, or because we failed to download enough pieces). Now, such segments will be put into the irreparableDB for further and (hopefully) more focused attention. This change also better differentiates some error cases from Repair() for monitoring purposes. Change-Id: I82a52a6da50c948ddd651048e2a39cb4b1e6df5c	2020-03-09 21:45:16 +00:00
Jennifer Johnson	1c1750e6be	removes bandwidth limiting On satellite, remove all references to free_bandwidth column in nodes table. On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated. Protobuf message, NodeCapacity, is left intact for backwards compatibility. Once this is released to all satellites, we can drop the column from the DB. Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838	2020-03-04 14:04:00 +00:00
Egon Elbre	5342dd9fe6	go.mod: update uplink Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811	2020-02-21 16:08:12 +02:00
Cameron Ayer	b22bf16b35	satellite/overlay: add config flag for node selection free disk requirement Currently SNs report their free disk space once per hour. If a node becomes full, it has to wait until the next contact cycle begins to report; all the while receiving and failing upload requests. By increasing the minimum required disk space, we can give the storage nodes more time to report their space before they completely fill up. This change goes hand-in-hand with another change we want to implement: trigger capacity report on SN immediately upon falling below threshold. Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27	2020-02-11 18:08:25 +00:00
Moby von Briesen	006a2824ba	satellite/repair: lock monkit stats in checker and repairer Change-Id: Ia10fc8da0177389a500359ce51d21a5806f3f7b1	2020-01-30 14:09:56 +00:00
Egon Elbre	082ec81714	uplink: move to storj.io/uplink (#3746)	2020-01-08 15:40:19 +02:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Egon Elbre	72d407559e	satellite/metainfo: don't leak error implementation detail (#3722) * satellite/metainfo: don't leak implementation detail * add missing wrap	2019-12-10 15:21:30 -05:00
Yingrong Zhao	bfa6699e2c	satellite/repair: add timeout for repair download from a single node (#3418)	2019-10-30 16:31:08 -04:00
Natalie Villasana	5453886231	satellite/repair, uplink/ecclient: remove unused expiration arg from ec.Repair and ec.putPiece (#3416)	2019-10-30 11:35:00 -04:00
littleskunk	6e7607239c	satellite/repair: improve logging (#3287) * satellite/repair: improve logging * use Stringer wherever possible	2019-10-16 17:28:56 +02:00
littleskunk	5b20c716e6	satellite/repair: dont error on deleted segments (#3252)	2019-10-15 05:39:28 +02:00
Maximillian von Briesen	a75e3e6b81	satellite/repairer: fix segment_time_until_repair metric (#3199)	2019-10-07 13:54:12 -04:00
Stefan Benten	1db4251234	Satellite/repair: Add Repair Threshold Override to allow earlier repair (#3151)	2019-10-02 14:58:37 +02:00
Jeff Wendling	098cbc9c67	all: use pkg/rpc instead of pkg/transport all of the packages and tests work with both grpc and drpc. we'll probably need to do some jenkins pipelines to run the tests with drpc as well. most of the changes are really due to a bit of cleanup of the pkg/transport.Client api into an rpc.Dialer in the spirit of a net.Dialer. now that we don't need observers, we can pass around stateless configuration to everything rather than stateful things that issue observations. it also adds a DialAddressID for the case where we don't have a pb.Node, but we do have an address and want to assert some ID. this happened pretty frequently, and now there's no more weird contortions creating custom tls options, etc. a lot of the other changes are being consistent/using the abstractions in the rpc package to do rpc style things like finding peer information, or checking status codes. Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412	2019-09-25 15:37:06 -06:00
Natalie Villasana	cc70cd2329	satellite/repair: add metric trackers for segment age before repair (#3056)	2019-09-17 15:18:48 -04:00
Yingrong Zhao	b37ea864b1	satellite/repair: delete pieces that failed piece hashes verification from pointer (#3051) * add test * add implementation * remove todo comments * modifies comment * fix linting * typo oops	2019-09-16 13:13:24 -04:00
Yingrong Zhao	95aa33c964	satellite/repair/repairer: update audit status as failed after failing piece hash verification (#2997) * update audit status as failed for nodes that failed piece hash verification * remove comment * fix lint error * add test * fix format * use named return value for Get * add comments * add more better comment * format	2019-09-13 12:21:20 -04:00
Maximillian von Briesen	fb10815229	Repair with hashes (#2925) * add outline for ECRepairer * add description of process in TODO comments * begin download/getting hash for a single piece * verify piece hash and order limit during download * fix download piece * begin filling out ESREpair. Get * wip move ecclient.Repair to ecrepairer.Repair * pass satellite signee into repairer * reconstruct original stripe from pieces * move rebuildStripe() * calculate piece size differently, increment successful count * fix shares slices initialization * rename stripeData to segment * do not pad reader in Repair() * temp debug * create unsafeRSScheme * use decode reader * rename file name to be all lowercase * make repair downloader async * declare condition variable inside Get method * set downloadAndVerifyPiece's in-memory buffer to be share size * update unusedLimits var * address comments * remove unnecessary comments * move initialization of segmentRepaire to be outside of repairer service * use ReadAll during download * remove dots and move hashing to after validating for order limit signature * wip test * make sure files exactly at min threshold are repaired * remove unused code * use corrupt data and write back to storagenode * only create corrupted node and piece ids once * add comment * address nat's comment * fix linting and checker_test * update comment * add comments * remove "copied from ecclient" comments * add clarification comments in ec.Repair	2019-09-06 15:20:36 -04:00
Egon Elbre	c8edeb0257	satellite/overlay: rename overlay.Cache to overlay.Service (#2717)	2019-08-06 19:35:59 +03:00
Bill Thorp	fcbc9d71da	satellite/repair: add shouldDelete (#2702) * add shouldDelete to repair	2019-08-05 11:09:16 -04:00
Alexander Leitner	4632ab0a67	Delete irreparable segments (#2642) * Delete irreparable segments	2019-07-30 11:38:25 -04:00
Egon Elbre	e75813d094	satellite/repair: move segment repairer to satellite and simplify (#2651)	2019-07-29 13:24:56 +02:00

39 Commits