potential encryption overhead.
This is the same approach we have for validating remote segment size.
https://storjlabs.atlassian.net/browse/USR-619
Change-Id: I2597ee734313a3068fd986001680bbedbf1bed2a
size with BeginObject
Such solution will add one round trip to satellite during upload so for
now we are reverting this until we will have solution for this.
Change-Id: Ic2d826448ab7b0318cd6922df05deee9167cf2f0
BeginObject response
We want to control inline segment size and segment size on satellite
side. We need to return such information to uplink like with redundancy
scheme.
Change-Id: If04b0a45a2757a01c0cc046432c115f475e9323c
all permissions
Without read and list permissions BeginObjectDelete won't return error
if occurs. This was breaking Batch processing because there was
assumption that without error response will be always not nil.
https://storjlabs.atlassian.net/browse/SM-590
Change-Id: I0fc9539e429110a660eb28725b266d5e4771d198
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.
Replace uses of dbutil.BytesToUUID with uuid.FromBytes.
Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
Instead of providing the database from outside to testplanet create it
inside and then allow wrapping and modifying it. This is more convenient
to use.
Change-Id: I9b8f69e6e0a19ff984b4e2bfe927c9100c77bc6c
This implements a service for pointer verification. This makes the
slightly clearer, because it's not part of metainfo.
It also adds a peer identity cache which reduces database calls and peer
identity decoding.
Change-Id: I45da40460d579c6f5fd74c69bccea215157aafda
step 1 in https://review.dev.storj.io/c/storj/uplink/+/1236
Now the old libuplink uses the temporary DeleteBucketReturnDeleted and
DeleteObjectReturnDeleted methods. This way, in the next step, we will
be able to change the DeleteBucket and DeleteObject methods to return
the deleted bucket/object.
Change-Id: I2e638be1960bca6ce1456c92849fcdd6d93e5252
Initial change for checking bucket existence on satellite side for
requests like BeginObject and ListObjects. This is simple implementation
that is just checking bucket in DB but should be improved in future to
avoid DB calls as much as possible.
Part of https://storjlabs.atlassian.net/browse/USR-365
Change-Id: I9076acddc44d7dbfa7612a1c24a007de01621583
This adds a piece deletion handler that has debounce for failed dialing
and batching multiple jobs into a single request.
Change-Id: If64021bebb2faae7f3e6bdcceef705aed41e7d7b
To handle concurrent deletion requests we need to combine them into a
single request.
To implement this we introduces few concurrency ideas:
* Combiner, which takes a node id and a Job and handles combining
multiple requests to a single batch.
* Job, which represents deleting of multiple piece ids with a
notification mechanism to the caller.
* Queue, which provides communication from Combiner to Handler.
It can limit the number of requests per work queue.
* Handler, which takes an active Queue and processes it until it has
consumed all the jobs.
It can provide limits to handling concurrency.
Change-Id: I3299325534abad4bae66969ffa16c6ed95d5574f
Metainfo method validateAuth checks things like API key, user permission
and rate limit but at the end all errors were returned as
rpcstatus.Unauthenticated.
Old Metainfo is not touched to avoid backward compatibility issues.
Change-Id: I78eb276210fc50151da58a5c84e13ecd0961da29
New API has limited number of options to configure at the moment. We
should remove unused flags from Uplink CLI and add if needed in the
future.
Change-Id: Icf3f3dadd43cb61a3b408b02d0762aef34425dbf
uplinks
New libuplink is not storing encryption values in with bucket but old
uplinks are using those values for configuration. If bucket was created
with new libuplink we will send back satellite defaults.
Change-Id: Ie1bf3682847e07b302270b4c4bf1a7219f4bf011
On satellite, remove all references to free_bandwidth column in nodes table.
On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated.
Protobuf message, NodeCapacity, is left intact for backwards compatibility.
Once this is released to all satellites, we can drop the column from the DB.
Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838
This change is a special case for batch processing. If in batch request
CommitSegment and CommitObject are one after another we can execute
these requests as one. This will avoid current logic where we are saving
pointer for CommitSegment and later we are deleting this pointer and
saving it once again as under last segment path for CommitObject.
This change should handle issue we have in older uplinks with incorrect
order of storing pointers.
Change-Id: I86514c95df169e6fbc91b52e5117472cae70cb8b
we want to return back to the user as quick as possible but also keep
deleting remaining pieces on the storagenodes
Change-Id: I04e9e7a80b17a8c474c841cceae02bb21d2e796f
rationale: if GC kills the satellite, it would be nice to make
it through a repair checker sweep first
Change-Id: Id56171dc8e13940cfb6481e36a910bad077a01ed
Trace the calls to DeletePiecesService.DeletePieces method and add
metrics for having statistics about the rate that specific storage node
is dialed and duration time spent on dialing storage nodes.
These statistics will help us to find out if we should implement
connections queues to storage node for reducing the deletion time in cae
that we see that we're spending too much time dialing frequent storage
nodes.
Ticket: https://storjlabs.atlassian.net/browse/SM-85
Change-Id: I9601676c3a8ad96c73c93833145929e4817755e2
Fixes https://storjlabs.atlassian.net/browse/USER-240
- Adds UnsynchronizedPut method to metainfo service that overwrites any
existing pointer under the same path
- Uses UnsynchronizedPut in the metainfo endpoint for committing the
segments
Change-Id: Icb43f31ea33f14066ca9dfdcf226eb3079b90948
Currently SNs report their free disk space once per hour. If a node
becomes full, it has to wait until the next contact cycle begins to
report; all the while receiving and failing upload requests. By increasing
the minimum required disk space, we can give the storage nodes more time
to report their space before the completely fill up. This change goes
hand-in-hand with another change we want to implement: trigger capacity
report on SN immediately upon falling below threshold.
Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27
Add monkit metric for the rate-limit when the rate limit is hit
Logs warning with projectID
https://storjlabs.atlassian.net/browse/SM-165
Change-Id: I352dc40006021990d1bc66a999f62bbf8deb54db
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
Allow rate limit project cache to expire so we can make project level rate limit changes without restarting the satellite process.
Change-Id: I159ea22edff5de7cbfcd13bfe70898dcef770e42
Satellite now is keeping RS values for uplink but old uplinks were using
default bucket settings. Because of that we need to override buckets
settings with satellite settings to avoid breaking older uplinks.
Change-Id: Ia1068db70e4adbf741c5e81d27d9e39799049c22
This reverts commit 8772867855.
for uplink versions v0.25.0 through v0.30.7, there's a bug with multiplesegment upload
where the last segment is inline caused by this commit.
Change-Id: If375e186b23265586caf08991c25980e99f3cc1a
Change is adding object deletion to BeginObject request (before upload).
Now when satellite controls deletion we can move deletion before upload
to satellite. This change improves two things:
* no need for additional request to delete object before upload (need
one more change to storj/uplink)
* fix an issue with lack of permissions to upload if caveat allows only
for writing (e.g. disallow deletes but allows to write)
https://storjlabs.atlassian.net/browse/V3-3362
Change-Id: Ic453146298cdd302df290c532123731a3f99e38e
Change DeleteObjectPieces for deleting the segments' pointers of an
object in a reverse order.
Last segment: L
N: total number of segments
Deleting in reverse order is: L, n-1 to 0
Deleting in reverse order makes BeginDeleteObject usable to delete
partially uploaded objects that were interrupted (e.g. upload
cancellation).
With this change, the uplink upload cancellation, can be changed to use
BeginDeleteObject to cleanup already uploaded segments without having to
retrieve orders and dial every single node which stored a piece.
Ticket: https://storjlabs.atlassian.net/browse/V3-3525
Change-Id: Ieca6fd3801c4b71671811cb5f08a99d5146928a6
A few variables were not renamed to the new standard piecesTotal and
piecesContentSize, so it was unclear which value was being used. These
have been updated, and some comments made more thorough.
Change-Id: I363bad4dec2a8e5c54d22c3c4cd85fc3d2b3096c
This change updates the storagenode piecestore apis to expose access to
the full piece size stored on disk. Previously we only had access to
(and only kept a cache of) the content size used for all pieces. This
was inaccurate when reporting the amount of disk space used by nodes.
We now have access to the total content size, as well as the total disk
usage, of all pieces. The pieces cache also keeps a cache of the total
piece size along with the content size.
Change-Id: I4fffe7e1257e04c46021a2e37c5adc6fe69bee55
With this change RS configuration will be set on satellite. Uplink with
get RS values with BeginObject request and will use it. For backward
compatibility and to avoid super large change redundancy scheme stored
with bucket is not touched. This can be done in future.
Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff
Limits how many times metainfo APIs can be called per second by project ID. If limit is exceeded, the API will return Unauthorized/Too Many requests.
Limit per second and the size of the limiter cache per project are configurable, as well as whether the limiter is enabled.
Tests added/updated for the new rate_limit field in projects table.
Tests added for exceeding limits and disableing limiter.
Change-Id: Ic8ad102de3b690a475809d4f684156d5715f20fa
This change is a special case for batch processing. If in batch request
CommitSegment and CommitObject are one after another we can execute
these request as one. This will avoid current logic where we are saving
pointer for CommitSegment and later we are deleting this pointer and
saving it once again as under last segment path for CommitObject.
Change-Id: If170e78c8410f5ba5916cbff6a29b9221db9ce2e
live accounting used to be a cache to store writes before they are picked up during
the tally iteration, after which the cache is cleared. This created a window in which
users could potentially exceed the storage limit. This PR refactors live accounting to
hold current estimations of space used per project. This should also reduce DB load
since we no longer need to query the satellite DB when checking space used for limiting.
The mechanism by which the new live accounting system works is as follows:
During the upload of any segment, the size of that segment is added to its respective
project total in live accounting. At the beginning of the tally iteration we record
the current values in live accounting as `initialLiveTotals`. At the end of the tally
iteration we again record the current totals in live accounting as `latestLiveTotals`.
The metainfo loop observer in tally allows us to get the project totals from what it
observed in metainfo DB which are stored in `tallyProjectTotals`. However, for any
particular segment uploaded during the metainfo loop, the observer may or may not
have seen it. Thus, we take half of the difference between `latestLiveTotals` and
`initialLiveTotals`, and add that to the total that was found during tally and set that
as the new live accounting total.
Initially, live accounting was storing the total stored amount across all nodes rather than
the segment size, which is inconsistent with how we record amounts stored in the project
accounting DB, so we have refactored live accounting to record segment size
Change-Id: Ie48bfdef453428fcdc180b2d781a69d58fd927fb
Create a service for deleting pieces of storage nodes.
Currently the DeletePieces method returns after a success threshold,
completion or a timeout.
The end goal is to return when reaching the success threshold and
leaving the remaining goroutines running after DeletePieces method
returns and add a life cycle to the service that it waits for them when
it closes.
This is the first commit for ticket:
https://storjlabs.atlassian.net/browse/V3-3476
Change-Id: If740bbf57c741f880449980b8176b036dd956c7b
this allows for setting $STORJ_METAINFO_POSTGRESQL_USE_ALT=yes if you
want to use the cockroachkv implementation for metainfo against postgres
Change-Id: I0c9458c83fd67ee63ef4a78351e64a80a0647408
Add a back-pressure mechanism to the satellite metainfo
DeleteObjectPieces method for returning once the 75% of successful
deleted pieces is reached.
Change-Id: Ia38df49fba5838f0605c40a77cfff8e3442cb5b0
The DeleteObjectPieces should print out the warning on closing the
connections only if there was an error.
Change-Id: If3d7ab256d8508c08388c1f22c7dd1eb819d2509
The DeleteObjectPieces must close the storage node client once it has
finished deleting its pieces.
Change-Id: I08eb8af8e4215d77d59b52f5055211b918374ab4
DeleteObjectPieces must not call overlay cache KnownReliable method with
an empty list of node IDs for avoiding to log a useless noisy warning.
Change-Id: Ibe2a34f2913f003d3ba020f9764c1369fa63123b
Move tests for old Metainfo API to separate file. Metainfo tests file is
large enough and in future it will be easier to remove old tests.
Change-Id: I9421907ef015a6dfa65f4de6ef01b2d2c8baa7df
Use the helper function IsRPC of the err2 package rather than checking
if an error is of a specific RPC status code with an 'if' conditional.
Change-Id: Ibe89d6c2d836307c3112a6d7cc6bf95f0f985fd2
When the context was being cancelled the error was being discarded within the rate limiting error handling which caused tests to fail.
Change-Id: I5c6458c16da09a11531233ea0ee80d914969cb3f
deletePointer must return an ErrObjectNotFound rather than a rpc status
error NotFound because the callers must distinguish such error if it
comes from the getPointer or from the UnsynchronizedDelete.
Change-Id: I68b4e45a2765e63b73bf85c2c39a5fc0198373f6
As per discussed we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting and burst limiting if need be in the future.
The default for this rate limiting is still no limits so it stays the
same as our previous functionality.
Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
We are missing some tests for new Metainfo API that we have for old API.
This is first change to adjust old tests to new API.
Change-Id: Ie2b16bf85de8633662f952e863dbf3d409d801d9
For improving the deletion performance we are shifting the
responsibility to delete the pieces of the object from Uplink to the
Satellite.
BeginDeleteObject was the first call to return the stream ID which was
used for after retrieving the list of segments and then get addressed
order limits for deleting the pieces (of each segment) from the storage
nodes.
Now we want the Satellite deletes the pieces of all the object segments
from the storage nodes hence we don't need anymore to have several
network round trips between the Uplink and the Satellite because the
Satellite can delete all of them in the initial BegingDeleteObject
request.
satellite/metainfo.ListSegments has been changed to return 0 items if
the pointer of the last segment of an object is not found because we
need to preserve the backward compatibility with Uplinks that won't be
updated to the last release and they rely on listing the segments after
calling BeginDeleteObject for retrieving the addressed order limits
to contact the storage nodes to delete the pieces.
Change-Id: I5f99ecf27d62d65b0a062936b9b17581ef692af0
Remove direct dependency on uplink.RSConfig, this simplifies
moving the config file without introducing weird dependencies.
Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec
The endpoint listSegmentsManually method misses a check for the limit
parameter, otherwise it can return inconsistent results when it's 0 or
negative.
When 0 or negative, without the check, it returns no segments but also
that there isn't more segments and that isn't correct.
The function is only called from the Endpoint.ListSegments method and
the function cares to ensure that limit is always greater than 0, but if
the method doesn't check that a new future caller could misuse it and
provoke a bug.
Additionally:
* Documentation for the modified function has been written
* The part of the function that repeated the logic of the
Endpoint.getPointer method has been removed for using that method.
* Added logging before returning an internal error in
Endpoint.getPointer.
Change-Id: I5c4f0db2292da0162db6b7d63553895808d0925a
Do some cleanup for adding new identified TODOs (associated with ticket
https://storjlabs.atlassian.net/browse/V3-3406) and remove an old one.
Change-Id: I5d20dbe1c4dee0a8279e08b05b907f4cc9dba278
* Use unexported existent method in logic that was duplicated in some
exported methods.
* Log a forgotten internal error.
* Improve the documentation adding more and fixing some to fit to our
code style conventions.
Change-Id: Ie6f8bc59f9089f92b8b0d1b4c09c2142c3f273f5
The Endpoint.getPointer method lacked of tracing.
Also add a dot at the end of documentation comment for following our
code style conventions.
Change-Id: I9b63ad297f04e31825648aae43aa8f9ebba2b4e2
Return an error when misusing the endpoint method
'listSegmentsFromNumberOfSegments' because there is the method
'listSegmentsManually' for being used when the number of segments is
less or equal than 0.
If we don't return an error on `listSegmentsFromNumberOfSegments` we
would realize that we have a bug much more later than returning an error
because the clients wouldn't receive an error and would receive an empty
list, making them to wonder what they are doing wrong to receive 0
results before they realize that they could be in front of a bug.
This commit also renames the function to be plural as "numberOfSegments"
parameter and the test function which missed also the end 's'.
Change-Id: I02318685bf36aa3af26545731a1711621a5e2e39
planet.Start starts a testplanet system, whereas planet.Run starts a testplanet
and runs a test against it with each DB backend (cockroach compat).
Change-Id: I39c9da26d9619ee69a2b718d24ab00271f9e9bc2
Fix a documentation comment for one method and apply our code
conventions to some that I stumbled.
Change-Id: I3baf5d004a128dcd561c3e27c080aab345c64461
Improve the piece hash validation filtering out a piece when an order
limit is not found for it.
The commit also improves the documentation of an internal metainfo
method and rename the parameters of 2 methods for clarifying what they
are.
* satellite/metainfo: Rollback path parts check in loop
We have to rollback the changes applied in checking the rawPath parts
from 4 to 3 because the production prointerDB is still storing buckets.
* satellite/metainfo: Don't return path parts less 4
Don't return an error in the metainfo loop iterator when a path doesn't
have 4 parts because it belongs to bucket metadata, not an actual
object.
Large conditional blocks are hard to read.
When the conditional block only has one branch it's easy to understand
the logic of the function to early return switching the condition.
We don't use reverse listing in any of our code, outside of tests, and
it is only exposed through libuplink in the
lib/uplink.(*Project).ListBuckets() API. We also don't know of any users
who might have a need for reverse listing through ListBuckets().
Since one of our prospective pointerdb backends can not support
backwards iteration, and because of the above considerations, we are
going to remove the reverse listing feature.
Change-Id: I8d2a1f33d01ee70b79918d584b8c671f57eef2a0
* uplink/storage/segments: return error no optimal threshold
Return an error if the store get less uploaded pieces than the indicated
by the optimal threshold.
* satellite/metainfo: Fix gRPC status error & add reason
This commit fix the CommitSegment endpoint method to return an
"Invalid Argument" status code when uplink submits invalid data which is
detected when filtering invalid pieces by filterInvalidPieces endpoint
method.
Because filterInvalidPieces is also used by CommitSegmentOld, such
method part has been changed accordingly.
* An initial check in CommitSegment to detect earlier if uplink sends an
invalid number of upload pieces.
* Add more information to some log messages.
* Return more information to uplink when it sends a number of invalid
pieces which make impossible to finish the operation successfully.
* satellite/metainfo: Swap some "sugar" loggers to normal ones
Swap "sugar" loggers to normal ones because they impact the performance
in production systems and they should only be used under specific
circumstances which were none of the ones changed.
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
What: we move api keys out of the grpc connection-level metadata on the client side and into the request protobufs directly. the server side still supports both mechanisms for backwards compatibility.
Why: dRPC won't support connection-level metadata. the only thing we currently use connection-level metadata for is api keys. we need to move all information needed by a request into the request protobuf itself for drpc support. check out the .proto changes for the main details.
One fun side-fact: Did you know that protobuf fields 1-15 are special and only use one byte for both the field number and type? Additionally did you know we don't use field 15 anywhere yet? So the new request header will use field 15, and should use field 15 on all protobufs going forward.
Please describe the tests: all existing tests should pass
Please describe the performance impact: none