Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: I660540885a0784fae844cf99376d1537e208fa69
overlay.GetOfflineNodesLimited
We only care about node ID, address, and last contact success/failure
from the downtime service, so the overlay should only return these
values for the downtime-specific queries.
Change-Id: I08a6ecfdd2a12b82cae62e87d6adeab53975bfce
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: Icd3da71448a84c582c6afdc6b52d1f345fe9469f
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: Ibaadd2c8540ba5c8cccd6ecbf529017ab98b78ca
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: Id24906f5f3ae83245dabb218e1f70e0bcb3b417a
Remove starting up messages from peers. We expect all of them to start,
if they don't, then they should return an error why they don't start.
The only informative message is when a service is disabled.
When doing initial database setup then each migration step isn't
informative, hence print only a single line with the final version.
Also use shorter log scopes.
Change-Id: Ic8b61411df2eeae2a36d600a0c2fbc97a84a5b93
When the context was being cancelled the error was being discarded within the rate limiting error handling which caused tests to fail.
Change-Id: I5c6458c16da09a11531233ea0ee80d914969cb3f
deletePointer must return an ErrObjectNotFound rather than a rpc status
error NotFound because the callers must distinguish such error if it
comes from the getPointer or from the UnsynchronizedDelete.
Change-Id: I68b4e45a2765e63b73bf85c2c39a5fc0198373f6
We don't want slowloris nodes to be able to indefinitely block
up the satellite, so add a timeout. Some monitoring inspection
showed the largest success times being on the order of 30s, so
a 1min timeout should be sufficient to kill the misbehaving nodes.
Change-Id: I5e2c3480a15f6304e37262d0a4d30d07eae99bb3
As per discussed we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting and burst limiting if need be in the future.
The default for this rate limiting is still no limits so it stays the
same as our previous functionality.
Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
We are missing some tests for new Metainfo API that we have for old API.
This is first change to adjust old tests to new API.
Change-Id: Ie2b16bf85de8633662f952e863dbf3d409d801d9
For improving the deletion performance we are shifting the
responsibility to delete the pieces of the object from Uplink to the
Satellite.
BeginDeleteObject was the first call to return the stream ID which was
used for after retrieving the list of segments and then get addressed
order limits for deleting the pieces (of each segment) from the storage
nodes.
Now we want the Satellite deletes the pieces of all the object segments
from the storage nodes hence we don't need anymore to have several
network round trips between the Uplink and the Satellite because the
Satellite can delete all of them in the initial BegingDeleteObject
request.
satellite/metainfo.ListSegments has been changed to return 0 items if
the pointer of the last segment of an object is not found because we
need to preserve the backward compatibility with Uplinks that won't be
updated to the last release and they rely on listing the segments after
calling BeginDeleteObject for retrieving the addressed order limits
to contact the storage nodes to delete the pieces.
Change-Id: I5f99ecf27d62d65b0a062936b9b17581ef692af0
Remove direct dependency on uplink.RSConfig, this simplifies
moving the config file without introducing weird dependencies.
Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.
https://storjlabs.atlassian.net/browse/V3-3357
Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
The endpoint listSegmentsManually method misses a check for the limit
parameter, otherwise it can return inconsistent results when it's 0 or
negative.
When 0 or negative, without the check, it returns no segments but also
that there isn't more segments and that isn't correct.
The function is only called from the Endpoint.ListSegments method and
the function cares to ensure that limit is always greater than 0, but if
the method doesn't check that a new future caller could misuse it and
provoke a bug.
Additionally:
* Documentation for the modified function has been written
* The part of the function that repeated the logic of the
Endpoint.getPointer method has been removed for using that method.
* Added logging before returning an internal error in
Endpoint.getPointer.
Change-Id: I5c4f0db2292da0162db6b7d63553895808d0925a
Do some cleanup for adding new identified TODOs (associated with ticket
https://storjlabs.atlassian.net/browse/V3-3406) and remove an old one.
Change-Id: I5d20dbe1c4dee0a8279e08b05b907f4cc9dba278
In satellite/accounting/rollup Service.RollupStorage we have a few
potential error scenarios that return time.Now(). Especially in the case
where we exit early because we have received 0 tallies since the *last*
rollup, this creates a potential race condition.
Between the time we call GetTalliesSince and realize it is empty, it's
possible a tally was inserted in that interval. As currently written we
are returning a latestTally time that excludes that tally.
We are currently protected because in Service.Rollup we don't save the
rollup unless we have populated the rollupStats. However, this change is
more correct and future-proof, because Service.RollupStorage should
always return a correct latestTally time, which in case of errors and
empty tallies, is the last successful tally.
Change-Id: I2521a2cc9802c8f06e512dde4422803a272e2a0a
Adds the KnownReliable method to Overlay Service that filters all nodes
from the given list to be only reliable nodes (online and qualified).
The method return []*pb.Node of reliable nodes. The pb.Node values are
ready for dialing.
The first use case is when deleting an object to efficiently dial all
reliable nodes holding a piece of that object and send them a delete
request.
Change-Id: I13e0a8666f3807c5c31ef1a1087476018a5d3acb
Fixes a data race caused by not waiting for workers to finish
before shutting down. Currently this ended up failing logging
because it was closed when test tried to write to it.
Change-Id: I074045cd83bbf49e658f51353aa7901e9a5d074b
this will allow us to inspect the type of `db.Driver()` on *sql.DB
connections to correctly differentiate between pg and crdb conns.
as a bonus, this moves all concerns about when to replace "cockroach://"
with "postgres://" out of view, letting the thin shim "driver" take care
of that.
Change-Id: Ib24103ab7c508231e681f89a7321b623e4e125e9
Backstory: I needed a better way to pass around information about the
underlying driver and implementation to all the various db-using things
in satellitedb (at least until some new "cockroach driver" support makes
it to DBX). After hitting a few dead ends, I decided I wanted to have a
type that could act like a *dbx.DB but which would also carry
information about the implementation, etc. Then I could pass around that
type to all the things in satellitedb that previously wanted *dbx.DB.
But then I realized that *satellitedb.DB was, essentially, exactly that
already.
One thing that might have kept *satellitedb.DB from being directly
usable was that embedding a *dbx.DB inside it would make a lot of dbx
methods publicly available on a *satellitedb.DB instance that previously
were nicely encapsulated and hidden. But after a quick look, I realized
that _nothing_ outside of satellite/satellitedb even needs to use
satellitedb.DB at all. It didn't even need to be exported, except for
some trivially-replaceable code in migrate_postgres_test.go. And once
I made it unexported, any concerns about exposing new methods on it were
entirely moot.
So I have here changed the exported *satellitedb.DB type into the
unexported *satellitedb.satelliteDB type, and I have changed all the
places here that wanted raw dbx.DB handles to use this new type instead.
Now they can just take a gander at the implementation member on it and
know all they need to know about the underlying database.
This will make it possible for some other pending code here to
differentiate between postgres and cockroach backends.
Change-Id: I27af99f8ae23b50782333da5277b553b34634edc
* Use unexported existent method in logic that was duplicated in some
exported methods.
* Log a forgotten internal error.
* Improve the documentation adding more and fixing some to fit to our
code style conventions.
Change-Id: Ie6f8bc59f9089f92b8b0d1b4c09c2142c3f273f5
The Endpoint.getPointer method lacked of tracing.
Also add a dot at the end of documentation comment for following our
code style conventions.
Change-Id: I9b63ad297f04e31825648aae43aa8f9ebba2b4e2