Commit Graph

41 Commits

Author SHA1 Message Date
Ivan Fraixedes
f420b29d35
[V3-1927] Repairer uploads to max threshold instead of success… (#2423)
* pkg/datarepair: Add test to check num upload pieces
  Add a new test that checks the number of pieces the repair process
  uploads when a segment is injured.
* satellite/orders: Don't create "put order limits" over total
  Repair must not create more "put order limits" than the total count.
* pkg/datarepair: Update upload repair pieces test
  Update the test that checks the number of pieces uploaded during a
  repair so that it uses the same excess over the success threshold
  value as the implementation.
* satellite/orders: Limit repair put orders below the total
  Limit the number of put orders used by repair so that it only uploads
  pieces up to a % excess over the success threshold.
* pkg/datarepair: Change DataRepair test to pass again
  Make some changes to the DataRepair test so that it passes again now
  that repair uploads repaired pieces only up to a % excess over the
  success threshold.
  Also update the description of the DataRepair test steps to match the
  current behavior, and keep it more generic so that minor future
  refactorings don't require updating it.
* satellite: Make repair excess optimal threshold configurable
  Add a new satellite configuration parameter for the percentage excess
  over the optimal threshold, which determines how many pieces should be
  repaired/uploaded, instead of having the value hard-coded.
* repairer: Add configurable param to segments/repairer
  Add a new parameter to segments/repairer for calculating the maximum
  number of excess nodes, based on the optimal threshold, to which
  repaired pieces can be uploaded.
  This new parameter was added so that the repairer does not return more
  nodes than the number of upload orders that the data repair satellite
  service calculates for repairing pieces.
* pkg/storage/ec: Update log message in client.Repair
* satellite: Update configuration lock file
2019-07-12 00:44:47 +02:00
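The squashed commit above caps repair uploads at a percentage excess over the optimal (success) threshold and never above the total piece count. A minimal Go sketch of that arithmetic follows; the function names, the integer-percent parameter, and rounding up are assumptions for illustration, not the satellite's actual code or configuration key.

```go
package main

import "fmt"

// maxRepairPieces returns how many pieces a repaired segment should end up
// with: the optimal threshold plus excessPercent percent, rounded up, and
// never more than the total number of pieces in the redundancy scheme.
// Hypothetical helper: the name and integer-percent parameter are assumptions.
func maxRepairPieces(optimal, total, excessPercent int) int {
	// Integer ceiling of optimal*(100+excessPercent)/100 avoids float rounding.
	n := (optimal*(100+excessPercent) + 99) / 100
	if n > total {
		n = total
	}
	return n
}

// putOrderCount returns how many "put order limits" repair would create for
// a segment that still has `healthy` pieces (hypothetical helper).
func putOrderCount(optimal, total, healthy, excessPercent int) int {
	need := maxRepairPieces(optimal, total, excessPercent) - healthy
	if need < 0 {
		return 0
	}
	return need
}

func main() {
	fmt.Println(maxRepairPieces(80, 95, 5))   // 84
	fmt.Println(putOrderCount(80, 95, 60, 5)) // 24
	fmt.Println(maxRepairPieces(90, 95, 10))  // 95 (capped at total)
}
```

Capping at the total mirrors the "don't create put order limits over total" bullet; clamping at zero keeps an already-healthy segment from producing a negative order count.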
Michal Niewrzal
268c629ba8
Replace base64 encoding for path segments (#2345)
2019-07-11 13:26:07 -04:00
Maximillian von Briesen
de85d17069
Add checker metrics (#2487)
checker_segment_total_count - Number of total segments in pointer during checker iteration
checker_segment_healthy_count - Number of healthy segments in pointer during checker iteration
time_since_checker_queue - Seconds elapsed between checker queue and beginning repair
time_for_repair - Seconds elapsed between beginning repair and ending repair/dequeueing
2019-07-10 17:27:46 -04:00
Maximillian von Briesen
52e5a4eee3
pass logger into repairer and ecclient (#2365)
2019-07-02 13:08:02 +03:00
JT Olio
e58a06bd0c
config: update release values to match prod (#2192)
2019-06-15 18:19:19 +02:00
JT Olio
3fe8343b6c
repairer: fix config comments (#2105)
2019-06-04 14:13:31 +02:00
JT Olio
9c5708da32
pkg/*: add monkit task to missing places (#2109)
2019-06-04 13:36:27 +02:00
Bill Thorp
6522579ecb
better repairer logging (#2006)
* logging and delete only repairs with no errors

* removing delete logic
2019-05-21 00:05:28 +02:00
aligeti
60cf1dafb0
repair segment: reassess its missing pieces just before repair (#1939)
* repair segment: reassess its missing pieces just before repair to see if it actually needs repair
2019-05-16 09:49:10 -04:00
Bill Thorp
ea978dd674
hopefully sensible satellite defaults (#1888)
* hopefully sensible satellite defaults
2019-05-07 10:44:47 -04:00
Bill Thorp
2c9ef5b107
longer repair window (#1866)
2019-04-30 11:20:18 -04:00
Michal Niewrzal
fe3dfc1587
Move pointerdb.Service to satellite (#1826)
2019-04-25 10:46:32 +02:00
Egon Elbre
f49b0acc5b
repair until queue is empty (#1716)
* repair until queue is empty
2019-04-22 11:16:21 -04:00
Bill Thorp
17a227e6e9
refactor injuredsegments db so that we can't have duplicates (#1717)
made repairqueue not use a true queue, forbid duplicates
2019-04-16 14:14:09 -04:00
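The commit above turns the repair queue into a store keyed by segment, so duplicates are rejected and there is no strict FIFO order. An in-memory sketch of those semantics follows; the `injuredSegments` type and its method names are hypothetical, and the real change targets a database table rather than a Go map.

```go
package main

import "fmt"

// injuredSegments is a hypothetical in-memory stand-in for the injured
// segments table. It is keyed by segment path, so the same segment can't be
// queued twice, and it has no ordering guarantee (not a true FIFO queue).
type injuredSegments struct {
	byPath map[string]struct{}
}

func newInjuredSegments() *injuredSegments {
	return &injuredSegments{byPath: make(map[string]struct{})}
}

// Insert adds a segment and reports whether it was new; duplicates are ignored.
func (s *injuredSegments) Insert(path string) bool {
	if _, exists := s.byPath[path]; exists {
		return false
	}
	s.byPath[path] = struct{}{}
	return true
}

// Select returns an arbitrary queued segment without removing it.
func (s *injuredSegments) Select() (string, bool) {
	for path := range s.byPath {
		return path, true
	}
	return "", false
}

// Delete removes a segment, for example once it has been repaired.
func (s *injuredSegments) Delete(path string) { delete(s.byPath, path) }

// Count returns the number of queued segments.
func (s *injuredSegments) Count() int { return len(s.byPath) }

func main() {
	s := newInjuredSegments()
	fmt.Println(s.Insert("seg/a")) // true
	fmt.Println(s.Insert("seg/a")) // false: duplicate
	fmt.Println(s.Count())         // 1
	if path, ok := s.Select(); ok {
		s.Delete(path)
	}
	fmt.Println(s.Count()) // 0
}
```

Separating `Select` from `Delete` lets a caller remove a segment only after its repair succeeds, which is one way to read the later "delete only repairs with no errors" bullet.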
Maximillian von Briesen
bb3b4e4816
Data repair integration test (#1582)
2019-04-08 13:33:47 -04:00
Egon Elbre
be06fdfd6c
Create orders.Service (#1593)
2019-03-28 22:09:23 +02:00
Egon Elbre
94e79eda6d
remove overlay endpoint (#1521)
2019-03-23 10:06:11 +02:00
Egon Elbre
05d148aeb5
Storage node and upload/download protocol refactor (#1422)
refactor storage node server
refactor upload and download protocol
2019-03-18 12:55:06 +02:00
JT Olio
2a59679766
pkg/transport: require tls configuration for dialing (#1286)
* separate TLS options from server options (because we need them for dialing too)
* stop creating transports in multiple places
* ensure that we actually check revocation, whitelists, certificate signing, etc, for all connections.
2019-02-11 13:17:32 +02:00
Egon Elbre
5a63c00442
Fix issues with blocking during startup (#1212)
2019-02-01 19:28:40 +02:00
Jennifer Li Johnson
856b98997c
updates copyright 2018 to 2019 (#1133)
2019-01-24 15:15:10 -05:00
Egon Elbre
78dc02b758
Satellite Peer (#1034)
* add satellite peer

* Add overlay

* reorganize kademlia

* add RunRefresh

* add refresh to storagenode.Peer

* add discovery

* add agreements and metainfo

* rename

* add datarepair checker

* add repair

* add todo notes for audit

* add testing interface

* add into testplanet

* fixes

* fix compilation errors

* fix compilation errors

* make testplanet run

* remove audit references

* ensure that audit tests run

* dev

* checker tests compilable

* fix discovery

* fix compilation

* fix

* fix

* dev

* fix

* disable auth

* fixes

* revert go.mod/sum

* fix linter errors

* fix

* fix copyright

* Add address param for SN dashboard (#1076)

* Rename storj-sdk to storj-sim (#1078)

* Storagenode logs and config improvements (#1075)

* Add more info to SN logs

* remove config-dir from user config

* add output where config was stored

* add message for successful connection

* fix linter

* remove storage.path from user config

* resolve config path

* move success message to info

* log improvements

* Remove captplanet (#1070)

* pkg/server: include production cert (#1082)

Change-Id: Ie8e6fe78550be83c3bd797db7a1e58d37c684792

* Generate Payments Report (#1079)

* memory.Size: autoformat sizes based on value entropy (#1081)

* Jj/bytes (#1085)

* run tally and rollup

* sets dev default tally and rollup intervals

* nonessential storj-sim edits (#1086)

* Closing context doesn't stop storage node (#1084)

* Print when cancelled

* Close properly

* Don't log nil

* Don't print error when closing dashboard

* Fix panic in inspector if ping fails (#1088)

* Consolidate identity management to identity cli commands (#1083)

* Consolidate identity management:

Move identity creation/signing out of storagenode setup command.

* fixes

* linters

* Consolidate identity management:

Move identity creation/signing out of storagenode setup command.

* fixes

* save backups before saving signed certs

* add "-prebuilt-test-cmds" test flag

* linters

* prepare cli tests for travis

* linter fixes

* more fixes

* linter gods

* sp/sdk/sim

* remove ca.difficulty

* remove unused difficulty

* return setup to its rightful place

* wip travis

* Revert "wip travis"

This reverts commit 56834849dcf066d3cc0a4f139033fc3f6d7188ca.

* typo in travis.yaml

* remove tests

* remove more

* make it only create one identity at a time for consistency

* add config-dir for consistency

* add identity creation to storj-sim

* add flags

* simplify

* fix nolint and compile

* prevent overwrite and pass difficulty, concurrency, and parent creds

* goimports
2019-01-18 08:54:08 -05:00
Michal Niewrzal
b712fbcbb0
Fix 'empty queue' error when satellite starts (#939)
2019-01-02 17:00:32 +01:00
Egon Elbre
4346cd060f
Implement mutex around satellitedb (#932)
2018-12-27 11:56:25 +02:00
Kaloyan Raev
2a8fef8062
Data repairer should use the redundancy strategy from segment's pointer (#838)
2018-12-13 09:12:36 +02:00
Michal Niewrzal
376bd74bed
Stops logging repairer errors for empty queue (#717)
2018-11-27 16:57:51 +01:00
Jennifer Li Johnson
93c5f385a8
Enable checker in captplanet and staging (#643)
* enable checker

* add option to use mock overlay in checker

* adds logs to checker

* appease linter
2018-11-20 10:54:22 -05:00
Jennifer Li Johnson
e678e52229
Creates Accounting Pkg to tally at rest node storage (#568)
* creates accounting package with tally service

* adds cancel on context

* test online nodes
2018-11-08 11:18:28 -05:00
Cameron
de46a999bc
Integrate SegmentStore Repair method with repair service (#582)
* add storeConfig struct and getSegmentStore helper for creating a segment store

* implement segment store in repairer, remove unnecessary repairer Repair method

* change repair method parameter from int to int32 to match type being passed in

* implement repairer service in captplanet

* rework Config, set Config defaults in captplanet/setup
2018-11-06 09:52:11 -05:00
Egon Elbre
2a8b681c4d
Run repairer and checker early (#565)
* Run repairers, checker, auditors first time they run to detect potential setup problems.
* Fix error handling in audit.Service
2018-11-01 16:03:45 +02:00
Jennifer Li Johnson
7ae2fa3575
moves bulk of code from ticker case to outside for indentation's sake (#559)
* moves bulk of code from ticker case to outside for indentation's sake

* adds whitespace

* removes break
2018-10-30 16:14:15 -04:00
Jennifer Li Johnson
1fb96689b8
creates run loop for data repair checker (#490)
* creates run loop for data repair checker

* moves actual checking and repairing under ticker case

* fixes mismatched queueaddrs
2018-10-30 15:16:40 -04:00
Cameron
f7828e73ea
remove ctx from repairer struct (#535)
2018-10-25 14:59:36 -04:00
Egon Elbre
8efb4f0e89
Fix repairing run (#523)
* Fix repairing run
* Fix concurrency bugs
* Add sync2.Limiter concurrency primitive
2018-10-24 15:35:59 +03:00
Alexander Leitner
3e1b16ea99
Start redis (#470)
* Start miniredis, repairer, and checker with captplanet
2018-10-12 14:04:16 -04:00
Jennifer Li Johnson
0e7f6358fb
creates configs for data repair package (#463)
* creates configs
2018-10-12 13:49:49 -04:00
Jennifer Li Johnson
ea6fc3c532
use utils.combinederrors (#461)
2018-10-11 14:42:32 -04:00
Jennifer Li Johnson
6fb13896fb
Method to identify injured segments to repair (#398)
* creates checker

* tests offline nodes

* test id injured segs:

* Adds healthy pieces to injured segment struct

* changes inequality

* creates common files

* adds checker benchmarking

* creates more common files

* Replaces pointerdb direct db with api call to a new iterate method on pointerdb

* move monkit

* removes identifyrequest proto

* remove healthypieces

* adds benchmarking

creates common file for datarepair

* recreates proto file

* api key on ctx
2018-10-09 12:09:33 -04:00
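The checker commit above identifies injured segments by testing which piece holders are offline. A minimal sketch of that check follows; the simplified `pointer` and `injured` types are hypothetical, and because the commit only says "changes inequality", the rule used here (injured when healthy pieces are at or below the repair threshold) is an assumption.

```go
package main

import "fmt"

// pointer is a hypothetical, simplified view of a segment pointer: the node
// holding each piece plus the segment's repair threshold.
type pointer struct {
	path            string
	pieceNodes      []string // node ID per stored piece
	repairThreshold int
}

// injured is what the checker would queue for repair.
type injured struct {
	path       string
	lostPieces []int // indexes of pieces stored on offline nodes
}

// identifyInjured returns the segments whose pieces on online nodes have
// dropped to the repair threshold or below (assumed comparison).
func identifyInjured(pointers []pointer, offline map[string]bool) []injured {
	var out []injured
	for _, p := range pointers {
		var lost []int
		for i, node := range p.pieceNodes {
			if offline[node] {
				lost = append(lost, i)
			}
		}
		healthy := len(p.pieceNodes) - len(lost)
		if healthy <= p.repairThreshold {
			out = append(out, injured{path: p.path, lostPieces: lost})
		}
	}
	return out
}

func main() {
	offline := map[string]bool{"node-2": true, "node-3": true}
	pointers := []pointer{
		{path: "seg/a", pieceNodes: []string{"node-1", "node-2", "node-3", "node-4"}, repairThreshold: 2},
		{path: "seg/b", pieceNodes: []string{"node-1", "node-4", "node-5", "node-6"}, repairThreshold: 2},
	}
	for _, seg := range identifyInjured(pointers, offline) {
		fmt.Println(seg.path, seg.lostPieces) // seg/a [1 2]
	}
}
```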
Alexander Leitner
dc8bea2cd1
Repairer points to redis server (#427)
* Let's do it right this time

* Oh travis...

* Handle redis URL

* Travis... why u gotta be like this?

* Handle when address does not use redis scheme

* Start repairer

* Match provider.Responsibility interface

* Simplify if statement

* Config doesn't need to be a pointer

* Initialize doesn't need to be exported

* Don't run checker or repairer on startup

* Fix travis complaints
2018-10-05 11:58:07 -04:00
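The "Repairer points to redis server" commit above mentions handling an address that does not use the redis scheme. A hedged sketch of that validation with the standard `net/url` package follows; `parseRedisURL`, `redisAddr`, and the optional `db` query parameter are assumptions for illustration, not the repairer's actual configuration format.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// redisAddr is a hypothetical parsed form of a queue address such as
// "redis://127.0.0.1:6379?db=1".
type redisAddr struct {
	host string
	db   int
}

// parseRedisURL accepts only the redis:// scheme and reads an optional db
// query parameter. The parameter name and default (0) are assumptions.
func parseRedisURL(raw string) (redisAddr, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return redisAddr{}, err
	}
	if u.Scheme != "redis" {
		return redisAddr{}, fmt.Errorf("invalid redis URL scheme %q in %q", u.Scheme, raw)
	}
	addr := redisAddr{host: u.Host}
	if v := u.Query().Get("db"); v != "" {
		db, err := strconv.Atoi(v)
		if err != nil {
			return redisAddr{}, fmt.Errorf("invalid db %q: %v", v, err)
		}
		addr.db = db
	}
	return addr, nil
}

func main() {
	a, err := parseRedisURL("redis://127.0.0.1:6379?db=1")
	fmt.Println(a.host, a.db, err) // 127.0.0.1:6379 1 <nil>
	_, err = parseRedisURL("http://127.0.0.1:6379")
	fmt.Println(err != nil) // true
}
```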
Alexander Leitner
f80ec62e9d
Reorganize repair (#419)
* Reorganize repair

* Don't run the repair code yet

* Pass max repair from config to repairer initialize

* Add repairer Interface

* fix comment
2018-10-03 14:35:56 -04:00
Cameron
027e4045c6
setup repairer loop (#378)
* setup repairer loop

* added read from queue

* Refactor to make things easier to import

* add more control flow to repairer

* add comment

* basic interval structure for running check/repair

* change function name GetNext to Dequeue

* better increment/decrement syntax

* export Repairer struct

* delete 'unreachable code'

* add mon.Task() to Repairer.Repair

* remove 24 hour interval

* set maxRepair on Config as well as Repairer

* add comment for Repairer struct, check err

* comment out runCfg.Repair in cmd/satellite/main.go because it is not implemented yet
2018-10-02 15:46:29 -04:00