Commit Graph

99 Commits

Author SHA1 Message Date
Maximillian von Briesen
6df4d7bc73
storagenode/gracefulexit + satellite/gracefulexit: add storagenode-side transfer validation (#3371)
* Make the exiting node check piece hashes, piece IDs, and piece hash signatures before relaying successful transfer data to the satellite.
* Enable immediate graceful exit failure for "successful" transfers that fail satellite-side validation.
* Move transfer piece logic in storagenode worker to separate function (to make the worker easier to understand)
2019-10-25 13:16:20 -04:00
Ivan Fraixedes
d9d82b0336 uplink: Reduce satellite request using Batch when possible (#3351)
* uplink/metainfo: Return classified Not Found error

Metainfo client Batch method must return the Storj Not Found error class
when the RCP server response with a not found status code as any other
metainfo Client method does.

Also if the error isn't Not Found one, it must wrap the error.

* uplink/storage/streams: Use Batch request in Delete

Change the 2 individual metainfo Client calls that streamStore Delete
method does by a single Batch one.
2019-10-24 14:18:48 -07:00
Michal Niewrzal
521c39bda0
uplink/metainfo: cleanup method names (#3315) 2019-10-22 23:59:56 -07:00
JT Olio
2c6fa3c5f8
pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335)
libuplink was incorrectly setting timeouts to 10 seconds still, but
should have been at least 10 minutes. the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic as it establishes
a minimum rate on tcp streams.

instead of all of this, just use tcp keep alive. tcp keep alive packets are
sent every 15 seconds and if the peer stops responding the connection
dies. this is enabled by default with go. this will kill tcp connections
when they stop working.

Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
2019-10-22 17:57:24 -06:00
Ethan Adams
3e0d12354a
storagenode/gracefulexit: Implement storage node graceful exit worker - part 1 (#3322) 2019-10-22 16:42:21 -04:00
Ivan Fraixedes
071d1c4313
upload: Add more info to returned error response & to logs (#3218)
* uplink/storage/segments: return error no optimal threshold
  Return an error if the store get less uploaded pieces than the indicated
  by the optimal threshold.

* satellite/metainfo: Fix gRPC status error & add reason
  This commit fix the CommitSegment endpoint method to return an
  "Invalid Argument" status code when uplink submits invalid data which is
  detected when filtering invalid pieces by filterInvalidPieces endpoint
  method.

  Because filterInvalidPieces is also used by CommitSegmentOld, such
  method part has been changed accordingly.

  * An initial check in CommitSegment to detect earlier if uplink sends an
    invalid number of upload pieces.
  * Add more information to some log messages.
  * Return more information to uplink when it sends a number of invalid
    pieces which make impossible to finish the operation successfully.

* satellite/metainfo: Swap some "sugar" loggers to normal ones
  Swap "sugar" loggers to normal ones because they impact the performance
  in production systems and they should only be used under specific
  circumstances which were none of the ones changed.
2019-10-17 20:01:40 +02:00
Ivan Fraixedes
21c6737b10
uplink/ecclient: clarify defer logic in putPiece (#3247)
Refactoring the 'defer' function logic to just only have what's important to not forget before returning but simplifying its logic for making easy to understand the overall function logic.
2019-10-16 18:05:22 +02:00
littleskunk
2301a8287f Satellite/PieceHashValidation: Increase time window from 2h to 24h to avoid timezone issues (#3291) 2019-10-16 06:47:08 -06:00
Ivan Fraixedes
9caa3181d3
uplink/piecestore: Check SN piece hash timestamp (#3246)
Uplink must verify that every piece upload to a storage node return a
hash whose timestamp isn't older than the maximum elapsed time allowed
by the Satellite.

We cannot leave this check only to the Satellite site, because if there
is no error reported by this matter, the uplink cuts down the long tail.
When uplink submits the result uploads including these invalid ones, the
Satellite filters out the invalid ones and that can provoke that it gets
less than the optimal threshold amount of valid upload results, so it
rejects the request.

Detecting the error at this stage will allow the uplink to detect these
uploads as invalid and avoid to cut down the long tail prematurely.
2019-10-15 16:07:18 +02:00
JT Olio
6ede140df1
pkg/rpc: defeat MITM attacks in most cases (#3215)
This change adds a trusted registry (via the source code) of node address to node id mappings (currently only for well known Satellites) to defeat MITM attacks to Satellites. It also extends the uplink UI such that when entering a satellite address by hand, a node id prefix can also be added to defeat MITM attacks with unknown satellites.

When running uplink setup, satellite addresses can now be of the form 12EayRS2V1k@us-central-1.tardigrade.io (not even using a full node id) to ensure that the peer contacted is the peer that was expected. When using a known satellite address, the known node ids are used if no override is provided.
2019-10-12 14:34:41 -06:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Michal Niewrzal
607da4ab4a
metainfo: move FinishDeleteSegment logic to BeginDeleteSegment (#3104) 2019-09-23 14:41:58 -07:00
JT Olio
946ec201e2
metainfo: move api keys to part of the request (#3069)
What: we move api keys out of the grpc connection-level metadata on the client side and into the request protobufs directly. the server side still supports both mechanisms for backwards compatibility.

Why: dRPC won't support connection-level metadata. the only thing we currently use connection-level metadata for is api keys. we need to move all information needed by a request into the request protobuf itself for drpc support. check out the .proto changes for the main details.

One fun side-fact: Did you know that protobuf fields 1-15 are special and only use one byte for both the field number and type? Additionally did you know we don't use field 15 anywhere yet? So the new request header will use field 15, and should use field 15 on all protobufs going forward.

Please describe the tests: all existing tests should pass

Please describe the performance impact: none
2019-09-19 10:19:29 -06:00
Michal Niewrzal
1c72e80e40 uplink/satellite: fix for case when inline segment is last one (#3062)
* uplink/satellite: fix when inline seg is last one

* review comments
2019-09-19 01:18:14 +02:00
Jess G
7c203b4884
add satelliteSystem to testplanet and update tests (#3066) 2019-09-17 13:14:49 -07:00
Isaac Hess
5a50042c77
uplink/storage/streams: Add test for interrupted deletes (#3040)
* uplink/storage/streams: Add test for interrupted deletes

* uplink/storage/streams: Fix linting errors
2019-09-13 13:08:15 -06:00
Ivan Fraixedes
ccbf73ecc7
uplink/ecclient: Remove unneeded atomic operation (#3036)
Atomic operations are only needed when a variable can be accessed
concurrently, so when it isn't the case there is no need to use them.
2019-09-13 12:47:35 +02:00
Ivan Fraixedes
8a48500ba4
uplink/ecclient: Report success in debug level (#3037)
Packages shouldn't be chatty when the things go as expected unless the
DEBUG log level is set.
2019-09-13 12:04:12 +02:00
Michal Niewrzal
64c467ffe7
uplink: integrate new Metainfo calls (#2640) 2019-09-10 08:39:47 -07:00
Maximillian von Briesen
fb10815229 Repair with hashes (#2925)
* add outline for ECRepairer

* add description of process in TODO comments

* begin download/getting hash for a single piece

* verify piece hash and order limit during download

* fix download piece

* begin filling out ESREpair. Get

* wip move ecclient.Repair to ecrepairer.Repair

* pass satellite signee into repairer

* reconstruct original stripe from pieces

* move rebuildStripe()

* calculate piece size differently, increment successful count

* fix shares slices initialization

* rename stripeData to segment

* do not pad reader in Repair()

* temp debug

* create unsafeRSScheme

* use decode reader

* rename file name to be all lowercase

* make repair downloader async

* declare condition variable inside Get method

* set downloadAndVerifyPiece's in-memory buffer to be share size

* update unusedLimits var

* address comments

* remove unnecessary comments

* move initialization of segmentRepaire to be outside of repairer service

* use ReadAll during download

* remove dots and move hashing to after validating for order limit signature

* wip test

* make sure files exactly at min threshold are repaired

* remove unused code

* use corrput data and write back to storagenode

* only create corrupted node and piece ids once

* add comment

* address nat's comment

* fix linting and checker_test

* update comment

* add comments

* remove "copied from ecclient" comments

* add clarification comments in ec.Repair
2019-09-06 15:20:36 -04:00
Michal Niewrzal
61168493dc
uplink: don't stop deleting segments on first error (#2943) 2019-09-05 14:25:30 +02:00
Michal Niewrzal
a6721ba92f
satellite/metainfo: Improve metainfo ListSegments (#2882) 2019-08-30 23:30:18 +02:00
Natalie Villasana
9a1b9f8431
uplink/ecclient: change delete logs from err to debug level (#2917) 2019-08-30 17:00:34 -04:00
Egon Elbre
c309bd3fec
lint: add linting for errs package (#2881) 2019-08-27 19:07:12 +03:00
Bill Thorp
a250551b6d storagenode/piecestore + uplink/piecestore: return PieceHash and original OrderLimit during GET_REPAIR (#2775) 2019-08-26 14:57:41 -04:00
JT Olio
12d50ebb99
streams: don't encrypt segment count (#2859)
What: this change makes sure the count of segments is not encrypted.

Why: having the segment count encrypted just makes things hard for no reason - a satellite operator can figure out how many segments an object has by looking at the other segments in the database. but if a user has access but has lost their encryption key, they now can't clean up or delete old segments because they can't know how many there are without just guessing until they get errors. :(

Backwards compatibility: clients will still understand old pointers and will still write old pointers. at some point in the future perhaps we can do a migration for remaining old pointers so we can delete the old code.

Please describe the tests: covered by existing tests

Please describe the performance impact: none
2019-08-22 15:15:58 -06:00
Jeff Wendling
057d30152c
uplink/storage/segments: seed download permuatation with timestamp (#2809) 2019-08-16 11:14:02 -06:00
Maximillian von Briesen
189b268892
uplink/piecestore: Change where ignore cancel happens for closing downloads (#2786) 2019-08-15 10:32:05 -04:00
Bryan White
1915b59af3 satellite/repair: monkit improvements (#2773) 2019-08-14 15:40:26 -04:00
Maximillian von Briesen
3a82b63974
uplink/ecclient: performance - close connections faster (#2757) 2019-08-14 10:03:51 -04:00
Egon Elbre
48211daa9d
uplink/piecestore: handle Download errors better (#2771) 2019-08-14 12:02:58 +03:00
Egon Elbre
9eba5ac631
lib/uplink: remove Seek method (#2768) 2019-08-13 20:29:02 +03:00
Cameron
1f837c53eb
uplink/ecclient: read concurrently with dials during download (#2711)
* do dialing in read

* remove unused type clientCloser

* add mutex to lazyPieceReader

* add nodeID to Download.Read trace
2019-08-09 11:01:40 -04:00
Egon Elbre
c8edeb0257
satellite/overlay: rename overlay.Cache to overlay.Service (#2717) 2019-08-06 19:35:59 +03:00
Michal Niewrzal
de7dddbe59
metainfo: Batch request (#2694) 2019-08-06 16:56:23 +02:00
Jeff Wendling
21a3bf89ee cmd/uplink: use scopes to open (#2501)
What: Change cmd/uplink to use scopes

It moves the fields that will be subsumed by scopes into an explicit legacy section and hides their configuration flags.

Why: So that it can read scopes in from files and stuff
2019-08-05 11:01:20 -06:00
Bryan White
e4c10f3311 uplink/ecclient: add more monkit for segment piece info (#2701) 2019-08-05 17:46:32 +03:00
Michal Niewrzal
688d932d93
Make one implementation for SetAttribution/SetBucketAttribution (#2683) 2019-08-05 09:07:40 +02:00
Egon Elbre
ebbf0e1462
uplink/storage: don't import mock in production code (#2687) 2019-08-02 11:01:36 +03:00
Michal Niewrzal
287fdf9936
Integrate new Metainfo calls (server side) (#2682) 2019-08-01 11:04:31 +02:00
Egon Elbre
4f0d39cc64
don't use global loggers (#2675) 2019-07-31 17:38:44 +03:00
Egon Elbre
9ba8b53ed5 pkg/auth: use grpc.WithPerRPCCredentials (#2670) 2019-07-31 13:57:13 +02:00
Ivan Fraixedes
abef20930f
storagenode: Report gRPC error when satellite is untrusted (#2658)
* storagenode/piecestore: Unexport endpoint method
  Make an exported endpoint method to be unexported because it's only used
  by the same package and makes easy to change without thinking in
  breaking changes.
* uplink/ecclient: Use structured logger
  Swap sugared logger by the normal structured logger for having the full
  stack traces of the error in the debug message.
* storagenode/piecestore: Send gRPC error codes upload
  Refactoring in the storagenode/piecestore to send gRPC status error codes
  when some of the methods involved by upload return an error.
  
  The uplink related to uploads has also been modified to retrieve the
  gRPC status code when an error is returned by the server.
2019-07-30 18:58:08 +02:00
Egon Elbre
e75813d094 satellite/repair: move segment repairer to satellite and simplify (#2651) 2019-07-29 13:24:56 +02:00
Egon Elbre
dd7c8610bb
satellite/repair: move test files (#2649) 2019-07-28 12:15:34 +03:00
Egon Elbre
5d0816430f
rename all the things (#2531)
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
2019-07-28 08:55:36 +03:00
Michal Niewrzal
5710dc3a32
Metainfo RPC segment methods (part 2) (#2616) 2019-07-24 13:33:23 +02:00
Michal Niewrzal
cba008d7df
Add GetObject method to Metainfo (#2611) 2019-07-23 13:09:12 +02:00
aligeti
29b576961f
value attribution merge fix and more test cases (#2588)
* value attribution merge fix and more test cases
2019-07-19 11:17:34 -04:00
Simon Guindon
91f0adef10
Add the ability to set dial and request timeouts from the cmd/uplink CLI to libuplink. (#2439)
* Added the ability to pass timeout settings from cmd/uplink to libuplink.

* Removed commented out code.

* Updated 2min timeouts for the uplink CLI.

* Removed comment.

* Made transport defaultDialTimeout and defaultRequestTimeout public

* Added comments to describe where these defaults apply.

* Added a new defaults to libuplink and added tests.

* Added a new defaults to libuplink and added tests.
2019-07-18 11:13:59 -04:00