For improving the deletion performance we are shifting the
responsibility to delete the pieces of the object from Uplink to the
Satellite.
BeginDeleteObject was the first call to return the stream ID which was
used for after retrieving the list of segments and then get addressed
order limits for deleting the pieces (of each segment) from the storage
nodes.
Now we want the Satellite deletes the pieces of all the object segments
from the storage nodes hence we don't need anymore to have several
network round trips between the Uplink and the Satellite because the
Satellite can delete all of them in the initial BegingDeleteObject
request.
satellite/metainfo.ListSegments has been changed to return 0 items if
the pointer of the last segment of an object is not found because we
need to preserve the backward compatibility with Uplinks that won't be
updated to the last release and they rely on listing the segments after
calling BeginDeleteObject for retrieving the addressed order limits
to contact the storage nodes to delete the pieces.
Change-Id: I5f99ecf27d62d65b0a062936b9b17581ef692af0
- non-blocking, memory only operations
- have a negative impact on performance
- make monkit SVGs for downloads all but unreadable
Change-Id: I2a1397249a7aaaa2de26b70cd65128663278a424
planet.Start starts a testplanet system, whereas planet.Run starts a testplanet
and runs a test against it with each DB backend (cockroach compat).
Change-Id: I39c9da26d9619ee69a2b718d24ab00271f9e9bc2
Fix some documentation comments in a few methods and apply our code
style conventions to all of them in this source file of the package.
Change-Id: I78186cb1e7d45020d6cf7bbea8a39d146ada1cb6
* pkg/pg: Add new service function storage node
Add a new service function to the storage node piece store for deleting
pieces when satellites request them.
* storagenode/piecestore: Add endpoint to delete piece
Add a new endpoint to receive from trusted satellites to delete a piece.
* private/testplanet: Fix storagenode mock
Add to the storagenode mock the new endpoint method.
* proto.lock: Update it with the last protbuff changes
* storagenode/piecestore: Reuse test piece upload
Extract the repeated logic from several tests functions for uploading a
test piece to a test helper function.
* uplink/piecestore: Implement client side method
Implement the client side method of the new piecestore RPC function.
* storagenode/piecestore: Add test DeletePiece endpoint
Implement a test for the DeletePiece new endpoint method.
* satellite/nodeselection: dont select nodes that havent checked in for a while
* change testplanet online window to one minute
* remove satellite reconfigure online window = 0 in repair tests
* pass timestamp into UpdateCheckIn
* change timestamp to timestamptz
* edit tests to set last_contact_success to 4 hours ago
* fix syntax error
* remove check for last_contact_success > last_contact_failure in IsOnline
The number of the last segment uploaded returned by upload wasn't
correct in all the return statements. The Put method calls upload but
with the returned values wasn't even certain of cleaning up correctly
the segments uploaded when an error happens.
This commit moves the logic of cleaning up inside of the upload method
because it's easier to understand than only doing inside of it the clean
up when a context cancellation happens. It also fixes the number of
segments to be deleted by cancelHandler when an error happens.
These changes should avoid trying to delete segments which haven't been
uploaded because of the wrong index value or because of being inline.
We don't use reverse listing in any of our code, outside of tests, and
it is only exposed through libuplink in the
lib/uplink.(*Project).ListBuckets() API. We also don't know of any users
who might have a need for reverse listing through ListBuckets().
Since one of our prospective pointerdb backends can not support
backwards iteration, and because of the above considerations, we are
going to remove the reverse listing feature.
Change-Id: I8d2a1f33d01ee70b79918d584b8c671f57eef2a0
* uplink/storage/streams: Upload loop ops reorganization
Reorganize the operations of the loop run by streamsStore.upload method
for not doing unneeded computations on each iteration.
* uplink/storage/streams: Move out returns values declaration
Move out return values declarations for those which aren't strictly
needed due to defer statements nor documentation purpose.
Change signature of metainfo DeleteObject to get rid of an extra call to
kvmetainfo GetBucket method and eliminate one round trip to the
satellite when deleting objects.
we used to do something similar for puts, but that ended up hurting
more than it helped. since deletes are best effort, we can do it
here to kill long tails or unresponsive nodes.
Change-Id: I89fd2d9dcf519d76c78ddad70bc419d1868d2df1
* uplink/storage/streams: Remove unused field of struct
Remove an unused field from a struct type because it isn't used.
* uplink/storage/streams: Add end period in func doc
Add missing end period in some functions documentation comments for
following conventions.
* uplink/storage/segments: Replace switch by if
Replace the switch statement of 2 branches one with a condition which
returns and the second is the default by a if conditional because it's
easy to read.
Refactoring of the segments Store interface Get method signature to
force the implementations to not use metainfo Client and be able for the
callers to batch requests.
* Make the exiting node check piece hashes, piece IDs, and piece hash signatures before relaying successful transfer data to the satellite.
* Enable immediate graceful exit failure for "successful" transfers that fail satellite-side validation.
* Move transfer piece logic in storagenode worker to separate function (to make the worker easier to understand)
* uplink/metainfo: Return classified Not Found error
Metainfo client Batch method must return the Storj Not Found error class
when the RCP server response with a not found status code as any other
metainfo Client method does.
Also if the error isn't Not Found one, it must wrap the error.
* uplink/storage/streams: Use Batch request in Delete
Change the 2 individual metainfo Client calls that streamStore Delete
method does by a single Batch one.
libuplink was incorrectly setting timeouts to 10 seconds still, but
should have been at least 10 minutes. the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic as it establishes
a minimum rate on tcp streams.
instead of all of this, just use tcp keep alive. tcp keep alive packets are
sent every 15 seconds and if the peer stops responding the connection
dies. this is enabled by default with go. this will kill tcp connections
when they stop working.
Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
* uplink/storage/segments: return error no optimal threshold
Return an error if the store get less uploaded pieces than the indicated
by the optimal threshold.
* satellite/metainfo: Fix gRPC status error & add reason
This commit fix the CommitSegment endpoint method to return an
"Invalid Argument" status code when uplink submits invalid data which is
detected when filtering invalid pieces by filterInvalidPieces endpoint
method.
Because filterInvalidPieces is also used by CommitSegmentOld, such
method part has been changed accordingly.
* An initial check in CommitSegment to detect earlier if uplink sends an
invalid number of upload pieces.
* Add more information to some log messages.
* Return more information to uplink when it sends a number of invalid
pieces which make impossible to finish the operation successfully.
* satellite/metainfo: Swap some "sugar" loggers to normal ones
Swap "sugar" loggers to normal ones because they impact the performance
in production systems and they should only be used under specific
circumstances which were none of the ones changed.
Refactoring the 'defer' function logic to just only have what's important to not forget before returning but simplifying its logic for making easy to understand the overall function logic.
Uplink must verify that every piece upload to a storage node return a
hash whose timestamp isn't older than the maximum elapsed time allowed
by the Satellite.
We cannot leave this check only to the Satellite site, because if there
is no error reported by this matter, the uplink cuts down the long tail.
When uplink submits the result uploads including these invalid ones, the
Satellite filters out the invalid ones and that can provoke that it gets
less than the optimal threshold amount of valid upload results, so it
rejects the request.
Detecting the error at this stage will allow the uplink to detect these
uploads as invalid and avoid to cut down the long tail prematurely.
This change adds a trusted registry (via the source code) of node address to node id mappings (currently only for well known Satellites) to defeat MITM attacks to Satellites. It also extends the uplink UI such that when entering a satellite address by hand, a node id prefix can also be added to defeat MITM attacks with unknown satellites.
When running uplink setup, satellite addresses can now be of the form 12EayRS2V1k@us-central-1.tardigrade.io (not even using a full node id) to ensure that the peer contacted is the peer that was expected. When using a known satellite address, the known node ids are used if no override is provided.
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
What: we move api keys out of the grpc connection-level metadata on the client side and into the request protobufs directly. the server side still supports both mechanisms for backwards compatibility.
Why: dRPC won't support connection-level metadata. the only thing we currently use connection-level metadata for is api keys. we need to move all information needed by a request into the request protobuf itself for drpc support. check out the .proto changes for the main details.
One fun side-fact: Did you know that protobuf fields 1-15 are special and only use one byte for both the field number and type? Additionally did you know we don't use field 15 anywhere yet? So the new request header will use field 15, and should use field 15 on all protobufs going forward.
Please describe the tests: all existing tests should pass
Please describe the performance impact: none