Commit Graph

83 Commits

Author SHA1 Message Date
Ivan Fraixedes
3c8f1370d2
[v3 2137] - Add more info to find out repair failures (#2623)
* pkg/datarepair/repairer: Track always time for repair
  Make a minor change in the worker function of the repairer, that when
  successful, always track the metric time for repair independently if the
  time since checker queue metric can be tracked.

* storage/postgreskv: Wrap error in Get func
  Wrap the returned error of the Get function as it is done when the
  query doesn't return any row.

* satellite/metainfo: Move debug msg to the right place
  NewStore function was writing a debug log message when the DB was
  connected, however it was always writing it out despite if an error
  happened when getting the connection.

* pkg/datarepair/repairer: Wrap error before logging it
  Wrap the error returned by process which is executed by the Run method
  of the repairer service to add context to the error log message.

* pkg/datarepair/repairer: Make errors more specific in worker
  Make the error messages of the "worker" method of the Service more
  specific and the logged message for such errors.

* pkg/storage/repair: Improve error reporting Repair
  In order of improving the error reporting by the
  pkg/storage/repair.Repair method, several errors of this method and
  functions/methods which this one relies one have been updated to be
  wrapper into their corresponding classes.

* pkg/storage/segments: Track path param of Repair method
  Track in monkit the path parameter passed to the Repair method.

* satellite/satellitedb: Wrap Error returned by Delete
  Wrap the error returned by repairQueue.Delete method to enhance the
  error with a class and stack and the
  pkg/storage/segments.Repairer.Repair method get a more contextualized
  error from it.
2019-07-23 16:28:06 +02:00
Ivan Fraixedes
f420b29d35
[V3-1927] Repairer uploads to max threshold instead of success… (#2423)
* pkg/datarepair: Add test to check num upload pieces
  Add a new test for ensuring the number of pieces that the repair process
  upload when a segment is injured.
* satellite/orders: Don't create "put order limits" over total
  Repair must not create "put order limits" more than the total count.
* pkg/datarepair: Update upload repair pieces test
  Update the test which checks the number of pieces which are uploaded
  during a repair for using the same excess over the success threshold
  value than the implementation.
* satellites/orders: Limit repair put order for not being total
  Limit the number of put orders to be used by repair for only uploading
  pieces to a % excess over the successful threshold.
* pkg/datarepair: Change DataRepair test to pass again
  Make some changes in the DataRepair test to make pass again after the
  repair upload repaired pieces only until a % excess over success
  threshold.
  Also update the steps description of the DataRepair test after it has been
  changed, to match on what's now, besides to leave it more generic for
  avoiding having to update it on minimal future refactorings.
* satellite: Make repair excess optimal threshold configurable
  Add a new configuration parameter to the satellite for being able to
  configure the percentage excess over the optimal threshold, used for
  determining how many pieces should be repaired/uploaded, rather than
  having the value hard coded.
* repairer: Add configurable param to segments/repairer
  Add a new parameters to the segment/repairer to calculate the maximum
  number of excess nodes, based on the optimal threshold, that repaired
  pieces can be uploaded.
  This new parameter has been added for not returning more nodes than the
  number of upload orders for data repair satellite service calculate for
  repairing pieces.
* pkg/storage/ec: Update log message in clien.Repair
* satellite: Update configuration lock file
2019-07-12 00:44:47 +02:00
Egon Elbre
d52f764e54
protocol: implement new piece signing and verification (#2525) 2019-07-11 16:51:40 -04:00
Alexander Leitner
1c5db71faf
Change protobuf expirations to use time.Time (#2509)
* Change protobuf expirations to use time.Time instead of timestamp.Timestamp
2019-07-09 17:54:00 -04:00
Ivan Fraixedes
e0a8937c6c
pkg/storage/ec: Make minor improvements in client.Repair (#2454)
* Fix some log message to actually report the number of pieces needed to
  repaired for reaching the successful/optimal threshold.
* Remove some unneeded `nil` check conditional.
2019-07-05 12:39:10 +02:00
Maximillian von Briesen
2f6e193c32
Remove long tail timer in ecclient (#2433)
* remove long tail timer in ecclient
2019-07-03 11:00:24 -04:00
Jeff Wendling
42e124cd08
monitor optimal wait fraction (#2435)
* monitor optimal wait fraction

Change-Id: I1c76da5e8031237cf78ce5a0774732dd5e558ea1

* monitor other times about the upload

Change-Id: Iae81c80fb1446fbf4b3dd04fc6b238f2ede96545
2019-07-02 20:49:35 +00:00
Alexander Leitner
6d55bbdb57
OrderLimit creation date time limit (#2412)
* Limit by order creation
2019-07-02 12:06:12 -04:00
Maximillian von Briesen
52e5a4eee3 pass logger into repairer and ecclient (#2365) 2019-07-02 13:08:02 +03:00
Egon Elbre
385c046723
pkg/pb: rename Order2 to Order, OrderLimit2 to OrderLimit (#2406) 2019-07-01 18:54:11 +03:00
Egon Elbre
7b66e0cd7c Use dial to clarify that it's internally closing the connection. (#2347) 2019-06-26 15:14:48 +03:00
Egon Elbre
b6ad3e9c9f
internal/testrand: new package for random data (#2282) 2019-06-26 13:38:51 +03:00
Philip Hutchins
11016b5067 Moving verbose info logging to debug (#2346) 2019-06-26 12:14:25 +03:00
Egon Elbre
c7679b9b30
Fix some leaks and add notes about close handling (#2334) 2019-06-25 23:00:51 +03:00
Kaloyan Raev
964c87c476 Fix checks around repair threshold (#2246) 2019-06-19 22:13:11 +02:00
Kaloyan Raev
ebd9b375fc
Repair should not corrupt files (#2194) 2019-06-14 12:16:31 +03:00
Simon Guindon
3a7ebea76e
Moved monkit tracing after the limit nil check & fix node id formatting. (#2156)
* Moved monkit tracing after the limit nil check & fix node id formatting.

* Fixed to keep function traced even if limits is nil.
2019-06-07 18:34:16 -04:00
Simon Guindon
a55df84bf7
Adding RS values to the info log and NodeID to tracing. (#2130) 2019-06-06 15:51:00 -04:00
JT Olio
ccb158c99b
pkg/auth: add monkit task to missing places (#2123)
What: add monkit.Task to a bunch of functions that are missing it

Why: this will significantly help our instrumentation, data collection, and tracing about what's going on in the network
2019-06-05 07:47:01 -06:00
JT Olio
9c5708da32 pkg/*: add monkit task to missing places (#2109) 2019-06-04 13:36:27 +02:00
littleskunk
878e79dd79 Bugfix: Repair increase success counter (#2015) 2019-05-21 14:23:00 +02:00
littleskunk
8e023b8bbf improve logging (#2004) 2019-05-20 16:18:16 +02:00
littleskunk
c974e0ce8a
Store repaired Segments and improve Repair Condition (#2000)
* repair no cutoff longtail

* commit repair pieces even if not hitting success threshold

* commit repair pieces even if not hitting success threshold

* remove useless condition

* better error message
2019-05-20 12:50:13 +02:00
littleskunk
d2c95c1d62 improve repair logs (#1999) 2019-05-20 10:37:46 +02:00
Kaloyan Raev
8fc5fe1d6f
Refactor pb.Node protobuf (#1785) 2019-04-22 12:07:50 +03:00
Philip Hutchins
a840948671
Moving failure for putting pieces to Debug as this is expected behavior (#1777) 2019-04-18 08:35:44 -04:00
paul cannon
0ae0de75bc use SerializableMeta to store bucket attributes (#1658) 2019-04-10 18:27:04 -04:00
Maximillian von Briesen
ed8fc126aa
add error in ec client Repair() if not enough successful nodes by end (#1722) 2019-04-09 18:46:43 -04:00
Michal Niewrzal
d7feafe56b Move psserver tests (#1522) 2019-03-20 23:12:00 +02:00
Kaloyan Raev
d057efb05e
Add Repair method to ECClient (#1509) 2019-03-19 15:14:59 +02:00
Egon Elbre
117edec54c
Add serial number type (#1508) 2019-03-18 15:08:24 +02:00
Egon Elbre
05d148aeb5
Storage node and upload/download protocol refactor (#1422)
refactor storage node server
refactor upload and download protocol
2019-03-18 12:55:06 +02:00
Kaloyan Raev
5fa7a4a7c6 Ensure ECClient upload timer is stopped when no more status is expected (#1397)
This change ensures that the upload timer of ECClient is always stopped after no more status is expected from uploaded pieces. It also ensures that the "Timer expired" message will be logged only if the context is not already cancelled.

This is to avoid confusing logs where a "Timer expired" message is logged significantly later and mixes with similar messages logged from the upload of the next file segments.
2019-03-04 14:48:13 +01:00
Michal Niewrzal
81408a3c9e
Use SignedHash on client/uplink side (#1354)
* psclient receives storage node hash and compare it to own hash for verification
* uplink sends delete request when hashes don't match
* valid hashes are propagated up to segments.Store for future sending to satellite
2019-02-25 16:57:54 +01:00
Bill Thorp
373b301736
BWA aliases (#1333)
aliased RBAs and PBAs
2019-02-22 16:17:35 -05:00
Natalie Villasana
c3d3f41d30 removes some SignedMessage use (#1258)
Removes most instances of pb.SignedMessage (there's more to take out but they shouldn't hurt anyone as is).

There used to be places in psserver where a PieceID was hmac'd with the SatelliteID, which was gotten from a SignedMessage. This PR makes it so some functions access the SatelliteID from the Payer Bandwidth Allocation instead.

This requires passing a SatelliteID into psserver functions where they weren't before, so the following proto messages have been changed:

 * PieceId - satellite_id field added
   This is so the psserver.Piece function has access to the SatelliteID when it needs to get the namespaced pieceID.
   This proto message should probably be renamed to PieceRequest, or a new PieceRequest message should be created so this isn't misnamed.

 * PieceDelete - satellite_id field added
   This is so the psserver.Delete function has access to the SatelliteID when receiving a request to Delete.
2019-02-19 23:36:08 -06:00
JT Olio
2a59679766 pkg/transport: require tls configuration for dialing (#1286)
* separate TLS options from server options (because we need them for dialing too)
* stop creating transports in multiple places
* ensure that we actually check revocation, whitelists, certificate signing, etc, for all connections.
2019-02-11 13:17:32 +02:00
paul cannon
c35b93766d
Unite all cryptographic signing and verifying (#1244)
this change removes the cryptopasta dependency.

a couple possible sources of problem with this change:

 * the encoding used for ECDSA signatures on SignedMessage has changed.
   the encoding employed by cryptopasta was workable, but not the same
   as the encoding used for such signatures in the rest of the world
   (most particularly, on ECDSA signatures in X.509 certificates). I
   think we'll be best served by using one ECDSA signature encoding from
   here on, but if we need to use the old encoding for backwards
   compatibility with existing nodes, that can be arranged.

 * since there's already a breaking change in SignedMessage, I changed
   it to send and receive public keys in raw PKIX format, instead of
   PEM. PEM just adds unhelpful overhead for this case.
2019-02-07 14:39:20 -06:00
Kaloyan Raev
3c73d3a33c Fixes intermitent failures in storage/ec TestPut (#1239) 2019-02-05 18:49:52 +02:00
Kaloyan Raev
dd76829d10
Improve logic for cutting the long tail during upload (#909) 2019-02-05 12:54:25 +02:00
Egon Elbre
d5346982c2
Delete provider package (#1177) 2019-01-30 22:47:21 +02:00
Egon Elbre
6132ce86b7
Remove utils.LogClose (#1169) 2019-01-29 22:42:27 +02:00
Jennifer Li Johnson
856b98997c
updates copyright 2018 to 2019 (#1133) 2019-01-24 15:15:10 -05:00
Egon Elbre
99d3b7a3c8
Fix import grouping (#1111) 2019-01-22 17:48:23 +02:00
Matt Robinson
c0e6b62708
Test all-in-one (#900)
* Add test for aio

* Don't trust the user to have images built for a version

* Make travis run the aio test

* Add missing values to docker-compose, sort some things, consider the gateway image

* today's changes

* config changed, again

* more fixes

* Expose satellite port on localhost:7778

* Add retries and a timeout around the big-testfile test in AIO

* Another config value changed

* Make this error message a little more useful

* Fix nil condition
2019-01-03 14:54:27 -05:00
Jennifer Li Johnson
a2fa5c4c5a Proper NodeType Handling (#873)
* adds enums to nodetype

* updating nodetype todos

* ran pb updates

* reorder nodetypes

* adding checks

* wip

* wip

* wip

* bug in test-captplanet

* wip

* add values to storagenode, satellite, captplanet binaries

* Cleanup

* more cleanup

* wip

* lint

* lint

* wip

* fixes bug

* regenerate protos

Change-Id: Id270212e8c7479e52641058042cf23b5317ab773

* limit node type changes to kademlia

Change-Id: I9c1a6cc4a79e05086627f0fdeb5028c62ce754f4

* dpanic

Change-Id: Id952a2ad13c807ebaea0ec0a875405e267d81c3e

* review comments

Change-Id: I7f9b77ef22779dd012fd490375b136014f51f834
2019-01-02 11:47:34 -07:00
Kaloyan Raev
252da15f0d
Randomize node selection during GETs (#827) 2018-12-11 18:05:14 +02:00
Egon Elbre
1e4556f88a
Fix import groupings (#739) 2018-11-30 15:40:13 +02:00
Bryan White
2a0c4e60d2
preparing for use of customtype gogo extension with NodeID type (#693)
* preparing for use of `customtype` gogo extension with `NodeID` type

* review changes

* preparing for use of `customtype` gogo extension with `NodeID` type

* review changes

* wip

* tests passing

* wip fixing tests

* more wip test fixing

* remove NodeIDList from proto files

* linter fixes

* linter fixes

* linter/review fixes

* more freaking linter fixes

* omg just kill me - linterrrrrrrr

* travis linter, i will muder you and your family in your sleep

* goimports everything - burn in hell travis

* goimports update

* go mod tidy
2018-11-29 19:39:27 +01:00
Bryan White
ee62e2a9d8
Use transport client and cleanup all the clients (#574)
* wip

* linter fixes

* linter fixes

* test fixes

* linter fixes

* fix merge + restructure piecestore packages

* review feedback

* linter fixes

* linter fixes

* remove unnecessary aliases to piecestore

* more merge fixing
2018-11-06 18:49:17 +01:00