storj

Author	SHA1	Message	Date
Michal Niewrzal	27a9d14e2a	satellite/repair: use metabase.SegmentKey type in repair package Another change which is a part of refactoring to replace path parameter (string/[]byte) with key parameter (metabase.SegmentKey) Change-Id: I617878442442e5d59bbe5c995f913c3c93c16928	2020-09-08 19:35:20 +00:00
Michal Niewrzal	9202295348	satellite/metainfo: replace ScopedPath with metabase.SegmentLocation Change-Id: I7e89c9e8eaeae58be828a32ad47ed3028501f4c7	2020-09-04 10:06:52 +00:00
Michal Niewrzal	aa47e70f03	satellite/metainfo: use metabase.SegmentKey with metainfo.Service Instead of using string or []byte we will be using dedicated type SegmentKey. Change-Id: I6ca8039f0741f6f9837c69a6d070228ed10f2220	2020-09-03 15:11:32 +00:00
JT Olio	b872fe52a1	satellite/repair: switch to piecestore.UploadReader Change-Id: Ia99ad2cf5422e6ba1d98b32946740f9cadba7b6d	2020-09-01 09:26:54 -06:00
Michal Niewrzal	0604a672c1	satellite/metainfo: use metabase in loop Change-Id: I1bb0c6fe0a762895fde950690b06f7dd9d77e178	2020-09-01 10:06:16 +00:00
Moby von Briesen	5d21e85529	satellite/audit/queue: Separate audit queue into two separate structs. * The audit worker wants to get items from the queue and process them. * The audit chore wants to create new queues and swap them in when the old queue has been processed. This change adds a "Queues" struct which handles the concurrency issues around the worker fetching a queue and the chore swapping a new queue in. It simplifies the logic of the "Queue" struct to its bare bones, so that it behaves like a normal queue with no need to understand the details of swapping and worker/chore interactions. Change-Id: Ic3689ede97a528e7590e98338cedddfa51794e1b	2020-08-31 20:51:25 +00:00
Egon Elbre	3ca405aa97	satellite/orders: use metabase types as arguments Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef	2020-08-28 15:52:37 +03:00
Moby von Briesen	5dfe27f175	satellite/{repair,overlay}: Use overlay NodeSelectionCache for repair uploads This change removes the overlay function FindStorageNodesForRepair, which skips using the node selection cache and hits the database directly. Otherwise, it is functionally identical to FindStorageNodesForUpload, which checks the node selection cache first. When selecting nodes for PUT_REPAIRs, we now call FindStorageNodesForUpload instead of FindStorageNodesForRepair to reduce database load. Change-Id: If34e109695b2ed2b8fb6759115bf769a3459684e	2020-08-04 12:50:12 -04:00
Moby von Briesen	76030a8237	satellite/audit/{queue,chore}: Wait for audit queue to be finished before swapping * Do not swap the active audit queue with the pending audit queue until the active audit queue is empty. * Do not begin creating a new pending audit queue until the existing pending audit queue has been swapped to the active queue. Change-Id: I81db5bfa01458edb8cdbe71f5baeebdcb1b94317	2020-07-28 16:56:26 +00:00
Egon Elbre	44f9193404	satellite/orders: make optimal threshold multiplier into an argument It feels weird having a repairer configuration part of order services. Let's have a single source of truth for it. Change-Id: I24f7c897aec80f3293f8af24876cbb6733d85a0b	2020-07-24 16:35:59 +03:00
Cameron Ayer	e14f7a3fb4	satellite/repair: update healthyPieces and unhealthyPieces after CreateGetRepairOrderLimits Inside CreateGetRepairOrderLimits we pass in a list of healthy pieces, but when we query node info from this list we apply the "reliable" filter again. We sometimes end up with nodes which at first were healthy, but then became unhealthy, and thus can be repaired, but we do not update the 'unhealthyPieces' list with these nodes. This causes an error, 'piece to add already exists', as we fail to remove these pieces from the pointer before replacing them with repaired pieces. Change-Id: I6e2445f342ac117ded30351fa7e5e523c9ec26bd	2020-07-23 13:24:46 +00:00
Egon Elbre	d8dcae3075	all: fix error checking Change-Id: Ia0da1bbd6ce695139922f94096c2419281905e32	2020-07-16 19:13:14 +03:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
paul cannon	bbdb351e5e	all: use jackc/pgx in place of lib/pq What: Use the github.com/jackc/pgx postgresql driver in place of github.com/lib/pq. Why: github.com/lib/pq has some problems with error handling and context cancellations (i.e. it might even issue queries or DML statements more than once! see https://github.com/lib/pq/issues/939). The github.com/jackc/pgx library appears not to have these problems, and also appears to be better engineered and implemented (in particular, it doesn't use "exceptions by panic"). It should also give us some performance improvements in some cases, and even more so if we can use it directly instead of going through the database/sql layer. Change-Id: Ia696d220f340a097dee9550a312d37de14ed2044	2020-07-13 15:54:41 +00:00
paul cannon	4997fd55d0	satellite/repair: remove healthy from irreparabledb Change-Id: Ia9d300d0359883f03734d0bdf204d56d6642ce34	2020-06-26 21:26:00 +00:00
Cameron Ayer	3b4b5f45c7	satellite: replace references to Suspended with UnknownAuditSuspended Change-Id: I3d2d00c95954c0546ad077702617895f262926ef	2020-06-23 14:19:22 +00:00
Egon Elbre	410d897840	satellite: fix string(int) conversions Change-Id: I54c6ca8c2dad3c321175f72271b7536cc2a4df09	2020-06-12 06:41:34 +00:00
Moby von Briesen	e7e69f383a	satellite/repair/repairer/ec.go: Add monkit tracing for ec repairer Adds monkit tracing for ecrepairer.downloadAndVerifyPiece and ecrepairer.putPiece so we can get more accurate estimates of node performance during repair. Change-Id: Ic05025bf3c493bb3d6f5d325d090c5b7c9e5465d	2020-05-29 14:00:45 +00:00
Moby von Briesen	acf8b72cd0	satellite/repair/repairer: cut off long tail when minimum number of required uploads is met This will speed up the Put step of repair by not waiting to time out for a handful of slow nodes, at the expense of a slightly less durable pointer. It will still repair to the optimal threshold, but not every node that is selected will end up in the pointer. Change-Id: I02a0658e3fe6fc0383f26af0f50a065b8b11a651	2020-05-28 16:25:28 -04:00
Moby von Briesen	290c006a10	satellite/repair/{checker,queue}: add metric for new segments added to repair queue * add monkit stat new_remote_segments_needing_repair, which reports the number of new unhealthy segments in the repair queue since the previous checker iteration Change-Id: I2f10266006fdd6406ece50f4759b91382059dcc3	2020-05-27 06:23:47 +00:00
Egon Elbre	bef84a5f9d	storagenode: remove dependency to overlay.NodeDossier This is the last dependency from storage node to satellite. Change-Id: I12f7abb91e84f823ba5af126c6e2979519838612	2020-05-21 08:37:13 +03:00
Egon Elbre	941d10cbc3	private/testplanet: remove Peer.Local() Currently storagenode depends on overlay.NodeDossier, this is the first step in removing it. Change-Id: I034a3f1601835f8349bd41752455022e19bcc707	2020-05-20 11:05:34 +00:00
Egon Elbre	ed627144ed	all: use DialNodeURL throughout the codebase Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c	2020-05-20 10:36:30 +00:00
Egon Elbre	ec589a8289	all: fix comments about grpc Change-Id: Id830fbe2d44f083c88765561b6c07c5689afe5bd	2020-05-11 13:05:34 +03:00
Egon Elbre	bcd93ee375	private/testplanet: add StopNodeAndUpdate This was commonly used and code with it can be simplified. Change-Id: I2f2b91f7de54269aee6ef027f97f9e8a7d222e39	2020-05-08 13:02:19 +00:00
Moby von Briesen	de366537a8	satellite/satellitedb/overlaycache: fix behavior around gracefully exited nodes Sometimes nodes who have gracefully exited will still be holding pieces according to the satellite. This has some unintended side effects currently, such as nodes getting disqualified after having successfully exited. * When the audit reporter attempts to update node stats, do not update stats (alpha, beta, suspension, disqualification) if the node has finished graceful exit (audit/reporter_test.go TestGracefullyExitedNotUpdated) * Treat gracefully exited nodes as "not reputable" so that the repairer and checker do not count them as healthy (overlay/statdb_test.go TestKnownUnreliableOrOffline, repair/repair_test.go TestRepairGracefullyExited) Change-Id: I1920d60dd35de5b2385a9b06989397628a2f1272	2020-04-28 23:58:43 +00:00
Jess G	825226c98e	satellite/overlay: use node selection cache for uploads (#3859 ) * satellite/overlay: use node selection cache for uploads Change-Id: Ibd16cccee979d0544f2f4a01749af9f36f02a6ad * fix config lock Change-Id: Idd307e4dee8ab92749f1ec3f996419ea0af829fd * start fixing tests Change-Id: I207d373a3b2a2d9312c9e72fe9bd0b01e06ad6cf * fix test, add some more Change-Id: I82b99c2004fca2510965f9b389f87dd4474bc722 * change config name Change-Id: I0c0f7fc726b2565dc3828cb723f5459a940f2a0b * add benchmarks Change-Id: I05fa25bff8d5b65f94d918556855b95163d002e9 * revert bench to put in different PR Change-Id: I0f6942296895594768f19614bd7b2e3b9b106ade * add staleness to benchmark Change-Id: Ia80a310623d5a342afa6d835402170b531b0f870 * add cache config to testplanet Change-Id: I39abdab8cc442694da543115a9e470b2a8a25dff * have repair select old way Change-Id: I25a938457d7d1bcf89fd15130cb6b0ac19585252 * lower testplanet config time Change-Id: Ib56a2ed086c06bc6061388d15a10a2526a663af7 * fix test Change-Id: I3868e9cacde2dfbf9c407afab04dc5fc2f286f69	2020-04-24 09:11:04 -07:00
Moby von Briesen	178aa8b5e0	satellite/{metainfo,repair}: Delete expired segments from metainfo * Delete expired segments in expired segments service using metainfo loop * Add test to verify expired segments service deletes expired segments * Ignore expired segments in checker observer * Modify checker tests to verify that expired segments are ignored * Ignore expired segments in segment repairer and drop from repair queue * Add repair test to verify that a segment that expires after being added to the repair queue is ignored and dropped from the repair queue Change-Id: Ib2b0934db525fef58325583d2a7ca859b88ea60d	2020-04-22 13:02:31 +00:00
Jess G	75b9a5971e	satellite: update log levels (#3851 ) * satellite: update log levels Change-Id: I86bc32e042d742af6dbc469a294291a2e667e81f * log version on start up for every service Change-Id: Ic128bb9c5ac52d4dc6d6c4cb3059fbad73f5d3de * Use monkit for tracking failed ip resolutions Change-Id: Ia5aa71d315515e0c5f62c98d9d115ef984cd50c2 * fix compile errors Change-Id: Ia33c8b6e34e780bd1115120dc347a439d99e83bf * add request limit value to storage node rpc err Change-Id: I1ad6706a60237928e29da300d96a1bafa94156e5 * we cant track storage node ids in monkit metrics so lets use logging to track that for expired orders Change-Id: I1cc1d240b29019ae2f8c774792765df3cbeac887 * fix build errs Change-Id: I6d0ffe058e9a38b7ed031c85a29440f3d68e8d47	2020-04-15 12:32:22 -07:00
Egon Elbre	6492b13d81	all: remove old uuid Change-Id: I3a137f73456f010c37d3933dbe12cbbb840b809f	2020-04-02 19:30:36 +03:00
Egon Elbre	0a69da4ff1	all: switch to storj.io/common/uuid Change-Id: I178a0a8dac691e57bce317b91411292fb3c40c9f	2020-03-31 19:16:41 +03:00
littleskunk	048ca4558f	satellite/repair: clean up logging (#3833 ) Co-authored-by: Michal Niewrzal <michal@storj.io>	2020-03-30 11:59:56 +02:00
Egon Elbre	480ea1e4b5	satellite/repair/repairer: fix temporary file handling Change-Id: Ice1a467510737b3375c018ae37b16431c7dffe9e	2020-03-27 21:36:23 +02:00
Moby von Briesen	a933bcc99a	satellite/repair/repairer/ec.go: add option for downloading pieces onto disk instead of in memory during repair Add flag to satellite repairer, "InMemoryRepair" that allows the satellite to decide whether to download the entire segment being repaired into memory (this is what the satellite already does), or to download it into temporary files on disk that will be read from in the upload phase of repair. This should help with handling high repair traffic on satellites that cannot afford to spend 64mb of memory per repair worker. Updates tests to test repair for both in memory and to disk. Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194	2020-03-27 16:41:00 +00:00
Egon Elbre	e8f18a2cfe	private/testplanet: expose storagenode and satellite Config Change-Id: I80fe7ed8ef7356948879afcc6ecb984c5d1a6b9d	2020-03-27 17:01:25 +02:00
paul cannon	ba5991dc86	satellite/repair: add monitoring for remote_segments_healthy_percentage Change-Id: I6ad29fe1a947ac19d15e40ea33164a510eb33d4f	2020-03-17 17:45:59 +00:00
Moby von Briesen	2f991b6c56	satellite/{overlay, satellitedb}: account for `suspended` field in overlay cache Make sure that suspended nodes are treated appropriately by the overlay cache. This means we should expect the following behavior: * suspended nodes (vetted or not) should not be selected for uploading new segments * suspended nodes should be treated by the checker and repairer as "unhealthy", and should be removed upon successful repair This commit also removes unused overlay functionality. Fixes a bug with commit `8b72181a1f` where the audit reporter was automatically suspending nodes regardless of audit outcome (see test added). Tests: * updates repair tests to ensure that a suspended node is treated as unhealthy and will be removed from the pointer on successful repair * updates overlay tests for KnownUnreliableOrOffline and KnownReliable to expect suspended nodes to be considered "unreliable" * adds satellitedb test that ensures overlay.SelectStorageNodes and overlay.SelectNewStorageNodes do not include suspended nodes * adds audit reporter test to ensure that different audit outcomes result in the correct suspended/disqualified states Change-Id: I40dba67278c8e8d2ce0bcec5e0a5cb6e4ce2f561	2020-03-17 17:14:56 +00:00
Moby von Briesen	8b72181a1f	satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension * change overlay.UpdateStats to allow a third audit outcome. Now it can handle successful, failed, and unknown audits. * when "unknown audit reputation" (unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the DQ threshold, put node into suspension. * when unknown audit reputation goes above the DQ threshold, remove node from suspension. * record unknown audits from audit reporter. * add basic tests around unknown audits and suspension. Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1	2020-03-16 20:29:26 +00:00
Bill Thorp	94c11c5212	satellite: remove some unnecessary UTC() calls Fixes some easy cases of extraneous UTC() calls Change-Id: I3f4c287ae622a455b9a492a8892a699e0710ca9a	2020-03-13 13:49:44 +00:00
Jess G	39cb821196	satellite/overlay: rm combinedcache, fix IP naming to be network (#3798 ) * rm combinedcache, rm dns node lookup Change-Id: I239f07211764b097d851230d8c81900a47756e9e * excludeIPs -> excludedNetworks Change-Id: Ifa6f44ab17457cdd5aff4cd5694296867c18b179 * use lowercase var name Change-Id: I825aad2b718c71f455e747be18f8cabd02aabe55 * update Getnetwork name Change-Id: I002a1b7bc6b4ef40159c0cd2b0ef209f80a9c503 * fix comments Change-Id: Ibddf5b9ffa9d685af6c392d893db063ef18e45fa * update comments with ipv6 Change-Id: I31758b7d4979e7c27d014668f4fb532ad838cda2 Co-authored-by: Stefan Benten <mail@stefan-benten.de>	2020-03-12 11:37:57 -07:00
Jessica Grebenschikov	803e2930f4	satellite: use IP for all uplink operations, use hostname for audit and repairs My understanding is that the nodes table has the following fields: - `address` field which can be a hostname or an IP - `last_net` field that is the /24 subnet of the IP resolved from the address This PR does the following: 1) add back the `last_ip` field to the nodes table 2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time. 3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date 4) try to reduce confusion about hostname, ip, subnet, and address in the code base Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac	2020-03-11 09:11:40 -07:00
paul cannon	79553059cb	satellite/repair: put irreparable segments in irreparableDB Previously, we were simply discarding rows from the repair queue when they couldn't be repaired (either because the overlay said too many nodes were down, or because we failed to download enough pieces). Now, such segments will be put into the irreparableDB for further and (hopefully) more focused attention. This change also better differentiates some error cases from Repair() for monitoring purposes. Change-Id: I82a52a6da50c948ddd651048e2a39cb4b1e6df5c	2020-03-09 21:45:16 +00:00
Moby von Briesen	e4da7bd9cd	satellite/repair/checker: use repair override if available in checker and irreparable In production, the satellite is overriding the default repair threshold (35) to a higher value (52). In some places in the checker and irreparable processes, the repair threshold on the redundancy scheme is used in place of the override value. This fixes those cases. Change-Id: Ie7387217d9fb3886f050b5e5b67be51f276196de	2020-03-06 15:39:53 -05:00
Bill Thorp	e99e675fb1	satellite/satellitedb: use time zones with all timestamps The migration was broken into one migration per table to reduce table locking and reduce the chances of failure due to SQL timeouts. Of the 14 fields that lacked time zones, only the 3 named `interval_start` seemed to have non-UTC data in them. These fields are fixed in the migration by removing the +00 and adding AT TIME ZONE current_setting('TIMEZONE'). Fields with good data are migrated by adding AT TIME ZONE 'UTC'. Note that postgres's timezone() is different than cockroach's timezone() so AT TIME ZONE is used. https://storjlabs.atlassian.net/browse/SM-104 Change-Id: I410f2f1d7c11b143f17844347f37e6f4b1e70fce	2020-03-05 21:11:25 +00:00
Jennifer Johnson	1c1750e6be	removes bandwidth limiting On satellite, remove all references to free_bandwidth column in nodes table. On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated. Protobuf message, NodeCapacity, is left intact for backwards compatibility. Once this is released to all satellites, we can drop the column from the DB. Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838	2020-03-04 14:04:00 +00:00
Moby von Briesen	d5540c89a1	satellite/repair/checker: add monkit metrics for segments immediately above repair threshold Record counts for segments at health=rt+1 through health=rt+5 for every checker iteration. Change-Id: I2a00c0bc34d17beb21cacdeab4dac77f755faefe	2020-02-26 20:27:15 +00:00
Moby von Briesen	4e5a7f13c7	satellite/repair/queue: Prioritize selection of items off repair queue by segment health Add a column to the repair queue table in the satellite db for healthy piece count. When an item is selected from the repair queue, the least durable segment that has not been attempted in the past hour should be selected first. This prevents our repairer from getting stuck doing work on segments that are close to the repair threshold while allowing segments that are more unhealthy to degrade further. The migration also clears the repair queue so that the migration runs quickly and we can properly account for segment health in future repair work. We do not select items off the repair queue that have been attempted in the past six hours. This was changed from one hour to allow us time to try a wider variety of segments when the repair queue is very large. Change-Id: Iaf183f1e5fd45cd792a52e3563a3e43a2b9f410b	2020-02-26 09:54:16 -05:00
paul cannon	92d86fa044	satellite/repair: fix repair concurrency This new repair timeout (configured as TotalTimeout) will include both the time to download pieces and the time to upload pieces, as well as the time to pop the segment from the repair queue. This is a move from Github PR #3645. Change-Id: I47d618f57285845d8473fcd285f7d9be9b4318c8	2020-02-24 19:57:09 +00:00
Egon Elbre	5342dd9fe6	go.mod: update uplink Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811	2020-02-21 16:08:12 +02:00
Cameron Ayer	b22bf16b35	satellite/overlay: add config flag for node selection free disk requirement Currently SNs report their free disk space once per hour. If a node becomes full, it has to wait until the next contact cycle begins to report; all the while receiving and failing upload requests. By increasing the minimum required disk space, we can give the storage nodes more time to report their space before they completely fill up. This change goes hand-in-hand with another change we want to implement: trigger capacity report on SN immediately upon falling below threshold. Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27	2020-02-11 18:08:25 +00:00
Michal Niewrzal	426c8eb31a	private/testplanet: add DeleteBucket method for uplink New method added to be able to delete easily bucket during tests. Change-Id: Iaae89618cc676ddbbbd4b0df2eeacd143ea6f3c2	2020-02-11 15:58:13 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Moby von Briesen	006a2824ba	satellite/repair: lock monkit stats in checker and repairer Change-Id: Ia10fc8da0177389a500359ce51d21a5806f3f7b1	2020-01-30 14:09:56 +00:00
Egon Elbre	8dea4f52db	satellite: add control panel Change-Id: Id48246e9bcd4c6ec643277fe740937b2e42ad85b	2020-01-30 08:06:43 -05:00
Michal Niewrzal	6502454947	satellite/metainfo: move RS configuration to satellite With this change RS configuration will be set on satellite. Uplink will get RS values with BeginObject request and will use them. For backward compatibility and to avoid a super large change, the redundancy scheme stored with bucket is not touched. This can be done in future. Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff	2020-01-22 09:33:53 +00:00
Egon Elbre	f3b4bf2b7c	satellite/satellitedb/satellitedbtest: pass ctx as an argument ctx is created in most tests, instead pass in as argument to reduce code duplication. Change-Id: I466c51c008392001129c8b007c9d6b3619935ac4	2020-01-20 16:35:42 +02:00
Jeff Wendling	9da16b1d9e	satellite/satellitedb/dbx: name the package dbx everyone was importing it as dbx anyway. why should it be named satellitedb? so yeah just pass the "-p dbx" flag. Change-Id: I5efa669f4f00f196b38a9acd0d402009475a936f	2020-01-15 15:16:39 -07:00
Yingrong Zhao	76ee8a1b4c	satellite: remove UptimeReputation configs from codebase With the new storage node downtime tracking feature, we need to remove the current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and UptimeReputationDQ. This is the first step of removing the uptime reputation columns from satellitedb Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36	2020-01-08 18:54:15 +00:00
Egon Elbre	082ec81714	uplink: move to storj.io/uplink (#3746 )	2020-01-08 15:40:19 +02:00
Egon Elbre	2680bae88c	private/testplanet: remove dependency to uplink Remove direct dependency on uplink.RSConfig, this simplifies moving the config file without introducing weird dependencies. Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec	2020-01-02 09:40:46 +00:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Cameron Ayer	a4f9865b47	satellite: adds and enables cockroachdb compatibility for tests Change-Id: I85a3ad8c3b9d7e15ea8675b6c55af0002933db57	2019-12-16 22:29:25 +00:00
Egon Elbre	72d407559e	satellite/metainfo: don't leak error implementation detail (#3722 ) * satellite/metainfo: don't leak implementation detail * add missing wrap	2019-12-10 15:21:30 -05:00
littleskunk	71b58edb2c	satellite/repair: decrease repair interval Change-Id: Id9efdbfaa82521c35dc41e7a52b700522c197e77	2019-12-10 00:36:00 +00:00
Yingrong Zhao	7af42e3c10	satellite/metainfo, satellite/repair, uplink/eestream: add metric for download failed due to not enough pieces available (#3665 )	2019-12-04 16:24:36 -05:00
Jennifer Johnson	ecb960f506	private/dbutil: distinguishes between db drivers and implementations to allow for different implementations of SQL queries. Change-Id: I2dc8d1d371139aa8bc805e92a2b80b71f580fd64	2019-12-04 18:31:26 +00:00
littleskunk	c52c7275ad	satellite/repair: reduce upload timeout (#3597 )	2019-11-18 18:52:56 +01:00
littleskunk	8b3444e088	satellite/nodeselection: don't select nodes that haven't checked in for a while (#3567 ) * satellite/nodeselection: dont select nodes that havent checked in for a while * change testplanet online window to one minute * remove satellite reconfigure online window = 0 in repair tests * pass timestamp into UpdateCheckIn * change timestamp to timestamptz * edit tests to set last_contact_success to 4 hours ago * fix syntax error * remove check for last_contact_success > last_contact_failure in IsOnline	2019-11-15 23:43:06 +01:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573 )	2019-11-14 21:46:15 +02:00
paul cannon	0c025fa937	storage/: remove reverse-key-listing feature We don't use reverse listing in any of our code, outside of tests, and it is only exposed through libuplink in the lib/uplink.(*Project).ListBuckets() API. We also don't know of any users who might have a need for reverse listing through ListBuckets(). Since one of our prospective pointerdb backends can not support backwards iteration, and because of the above considerations, we are going to remove the reverse listing feature. Change-Id: I8d2a1f33d01ee70b79918d584b8c671f57eef2a0	2019-11-12 18:47:51 +00:00
Egon Elbre	cc032d3151	satellite/metainfo: fix some uses of metainfo.Delete (#3513 ) * satellite/metainfo: rename Delete to UnsynchronizedDelete * fix deletes * make db private * fix typos * also verify on commit object	2019-11-06 18:02:14 +01:00
littleskunk	7eb6724c92	logging: unify logging around satellite ID, node ID and piece ID (#3491 ) * logging: unify logging around satellite ID, node ID and piece ID * unify segment index	2019-11-05 22:04:07 +01:00
Yingrong Zhao	bfa6699e2c	satellite/repair: add timeout for repair download from a single node (#3418 )	2019-10-30 16:31:08 -04:00
Natalie Villasana	5453886231	satellite/repair, uplink/ecclient: remove unused expiration arg from ec.Repair and ec.putPiece (#3416 )	2019-10-30 11:35:00 -04:00
Egon Elbre	3c438f31bd	satellite/satellitedb: remove sqlite support (#3296 )	2019-10-19 00:27:57 +03:00
littleskunk	2a5526fcc4	satellite/repair: reduce timeout (#3302 )	2019-10-18 13:43:24 +02:00
littleskunk	6e7607239c	satellite/repair: improve logging (#3287 ) * satellite/repair: improve logging * use Stringer wherever possible	2019-10-16 17:28:56 +02:00
Natalie Villasana	cf430d2d73	scripts: add check-monitoring script to detect changes to monkit calls (#3114 )	2019-10-15 13:00:14 -04:00
littleskunk	5b20c716e6	satellite/repair: don't error on deleted segments (#3252 )	2019-10-15 05:39:28 +02:00
Jennifer Li Johnson	b185dbbee2	satellite/discovery: remove discovery related code (#3175 )	2019-10-14 10:57:01 -04:00
Maximillian von Briesen	a75e3e6b81	satellite/repairer: fix segment_time_until_repair metric (#3199 )	2019-10-07 13:54:12 -04:00
littleskunk	b908a09c8e	satellite/repair: remove deprecated error message (#3193 )	2019-10-06 20:54:20 +02:00
Cameron	eb5413ae5e	defer close piecestore in downloadAndVerifyPiece (#3192 )	2019-10-06 13:41:53 -04:00
Egon Elbre	394a9c82c3	satellite/{accounting/tally,repair/checker}: create a valid test pointer (#3167 )	2019-10-04 13:05:25 +03:00
Stefan Benten	1db4251234	Satellite/repair: Add Repair Threshold Override to allow earlier repair (#3151 )	2019-10-02 14:58:37 +02:00
Jeff Wendling	098cbc9c67	all: use pkg/rpc instead of pkg/transport all of the packages and tests work with both grpc and drpc. we'll probably need to do some jenkins pipelines to run the tests with drpc as well. most of the changes are really due to a bit of cleanup of the pkg/transport.Client api into an rpc.Dialer in the spirit of a net.Dialer. now that we don't need observers, we can pass around stateless configuration to everything rather than stateful things that issue observations. it also adds a DialAddressID for the case where we don't have a pb.Node, but we do have an address and want to assert some ID. this happened pretty frequently, and now there's no more weird contortions creating custom tls options, etc. a lot of the other changes are being consistent/using the abstractions in the rpc package to do rpc style things like finding peer information, or checking status codes. Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412	2019-09-25 15:37:06 -06:00
Egon Elbre	ab3e3f827c	satellite/repair/checker: set erasure share size in tests (#3101 )	2019-09-24 10:01:48 +03:00
Jennifer Li Johnson	724bb44723	Remove Kademlia dependencies from Satellite and Storagenode (#2966 ) What: cmd/inspector/main.go: removes kad commands internal/testplanet/planet.go: Waits for contact chore to finish satellite/contact/nodesservice.go: creates an empty nodes service implementation satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover() satellite/peer.go: sets up contact service and endpoints storagenode/console/service.go: replaces nodeID with contact.Local() storagenode/contact/chore.go: replaces routing table with contact service storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method storagenode/contact/service.go: creates a service to return the local node and update its own capacity storagenode/monitor/monitor.go: uses contact service in place of routing table storagenode/operator.go: moves operatorconfig from kad into its own setup storagenode/peer.go: sets up contact service, chore, pingstats and endpoints satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection Removes kademlia setups in: cmd/storagenode/main.go cmd/storj-sim/network.go internal/testplanet/planet.go internal/testplanet/satellite.go internal/testplanet/storagenode.go satellite/peer.go scripts/test-sim-backwards.sh scripts/testdata/satellite-config.yaml.lock storagenode/inspector/inspector.go storagenode/peer.go storagenode/storagenodedb/database.go Why: Replacing Kademlia Please describe the tests: • internal/testplanet/planet_test.go: TestBasic: assert that the storagenode can check in with the satellite without any errors TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup • satellite/contact/contact_test.go: TestFetchInfo: Tests that the FetchInfo method returns the correct info • storagenode/contact/contact_test.go: TestNodeInfoUpdated: tests that the contact chore updates the node information TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).	2019-09-19 15:56:34 -04:00
Michal Niewrzal	1c72e80e40	uplink/satellite: fix for case when inline segment is last one (#3062 ) * uplink/satellite: fix when inline seg is last one * review comments	2019-09-19 01:18:14 +02:00
Jess G	7c203b4884	add satelliteSystem to testplanet and update tests (#3066 )	2019-09-17 13:14:49 -07:00
Natalie Villasana	cc70cd2329	satellite/repair: add metric trackers for segment age before repair (#3056 )	2019-09-17 15:18:48 -04:00
Yingrong Zhao	b37ea864b1	satellite/repair: delete pieces that failed piece hashes verification from pointer (#3051 ) * add test * add implementation * remove todo comments * modifies cooment * fix linting * typo oops	2019-09-16 13:13:24 -04:00
Maximillian von Briesen	7ada5d4152	satellite/metainfo: make piece hashes nil before storing pointer in metainfo.UpdatePieces() (#3050 )	2019-09-16 12:11:12 -04:00
Yingrong Zhao	95aa33c964	satellite/repair/repairer: update audit status as failed after failing piece hash verification (#2997 ) * update audit status as failed for nodes that failed piece hash verification * remove comment * fix lint error * add test * fix format * use named return value for Get * add comments * add more better comment * format	2019-09-13 12:21:20 -04:00
Egon Elbre	7240e6cbb2	satellite: remove remote/inline file from BucketTally (#3041 )	2019-09-13 16:51:41 +03:00
Natalie Villasana	a085b05ec5	satellitePeer -> satellite rename consistency in repair test (#3032 )	2019-09-12 13:16:39 -04:00
Egon Elbre	8b668ab1f8	satellite/metainfo.Loop: use a parsed path for observers (#3003 )	2019-09-12 13:38:49 +03:00
Natalie Villasana	aa3567187e	satellite/audit: worker now verifies and reverifies (#2965 )	2019-09-11 18:37:01 -04:00
Maximillian von Briesen	289cfe8ff2	satellite/repair: do not log "retrieved segment" if repair queue empty (#2995 )	2019-09-11 16:06:36 +03:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964 )	2019-09-10 16:24:16 +03:00
Maximillian von Briesen	64602c3007	fix flaky repair tests (#2973 )	2019-09-06 15:02:01 -07:00
Maximillian von Briesen	fb10815229	Repair with hashes (#2925 ) * add outline for ECRepairer * add description of process in TODO comments * begin download/getting hash for a single piece * verify piece hash and order limit during download * fix download piece * begin filling out ESREpair. Get * wip move ecclient.Repair to ecrepairer.Repair * pass satellite signee into repairer * reconstruct original stripe from pieces * move rebuildStripe() * calculate piece size differently, increment successful count * fix shares slices initialization * rename stripeData to segment * do not pad reader in Repair() * temp debug * create unsafeRSScheme * use decode reader * rename file name to be all lowercase * make repair downloader async * declare condition variable inside Get method * set downloadAndVerifyPiece's in-memory buffer to be share size * update unusedLimits var * address comments * remove unnecessary comments * move initialization of segmentRepaire to be outside of repairer service * use ReadAll during download * remove dots and move hashing to after validating for order limit signature * wip test * make sure files exactly at min threshold are repaired * remove unused code * use corrput data and write back to storagenode * only create corrupted node and piece ids once * add comment * address nat's comment * fix linting and checker_test * update comment * add comments * remove "copied from ecclient" comments * add clarification comments in ec.Repair	2019-09-06 15:20:36 -04:00
Egon Elbre	6ff94caf22	satellite/satellitedb: move tests near the interface (#2863 )	2019-08-26 13:19:02 +03:00
Egon Elbre	00b2e1a7d7	all: enable staticcheck (#2849 ) * by having megacheck in disable it also disabled staticcheck * fix closing body * keep interfacer disabled * hide bodies * don't use deprecated func * fix dead code * fix potential overrun * keep stylecheck disabled * don't pass nil as context * fix infinite recursion * remove extraneous return * fix data race * use correct func * ignore unused var * remove unused consts	2019-08-22 13:40:15 +02:00
Egon Elbre	95080643b1	satellite/repair: fix data race (#2833 )	2019-08-20 17:46:39 +03:00
Egon Elbre	c8edeb0257	satellite/overlay: rename overlay.Cache to overlay.Service (#2717 )	2019-08-06 19:35:59 +03:00
Bill Thorp	fcbc9d71da	satellite/repair: add shouldDelete (#2702 ) * add shouldDelete to repair	2019-08-05 11:09:16 -04:00
Maximillian von Briesen	4547084f26	Implement checker observer (#2620 ) Integrate checker into metainfo loop	2019-08-01 14:44:32 -04:00
ethanadams	c9b46f2fe2	V3-1987: Optimize audits stats persistence (#2632 ) * Added batch update stats for recordAuditSuccessStatus * Added batch update stats to recordAuditFailStatus * added configurable batch size * build individual update/delete statements so the statements can be batched into 1 call to the DB * notified #config-changes channel and ran make update-satellite-config-lock * updated tests to use batch update stats	2019-07-31 13:21:06 -04:00
Alexander Leitner	4632ab0a67	Delete irreparable segments (#2642 ) * Delete irreparable segments	2019-07-30 11:38:25 -04:00
Alexander Leitner	159ad439b1	Add count to repair queue (#2661 ) * Add count to repair queue	2019-07-30 11:21:40 -04:00
Egon Elbre	e75813d094	satellite/repair: move segment repairer to satellite and simplify (#2651 )	2019-07-29 13:24:56 +02:00
Egon Elbre	dd7c8610bb	satellite/repair: move test files (#2649 )	2019-07-28 12:15:34 +03:00
Egon Elbre	5d0816430f	rename all the things (#2531 ) * rename pkg/linksharing to linksharing * rename pkg/httpserver to linksharing/httpserver * rename pkg/eestream to uplink/eestream * rename pkg/stream to uplink/stream * rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo * rename pkg/auth/signing to pkg/signing * rename pkg/storage to uplink/storage * rename pkg/accounting to satellite/accounting * rename pkg/audit to satellite/audit * rename pkg/certdb to satellite/certdb * rename pkg/discovery to satellite/discovery * rename pkg/overlay to satellite/overlay * rename pkg/datarepair to satellite/repair	2019-07-28 08:55:36 +03:00

214 Commits