Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
I also fleshed out consoledb_test.go to do actual inserts and gets to
make sure things were working correctly.
Change-Id: I089bf4c776d15dc8578080e26760bd6dff4beec9
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: I22b850ce5859fa07d13bf475be5140e6bde95b8a
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The WithTx
helper functions in dbutil and dbx do that automatically. I am changing
this code to use those helpers instead.
Change-Id: Iaf492af35471931125f2b7365aa4338f44154881
DeleteObjectPieces must not call overlay cache KnownReliable method with
an empty list of node IDs for avoiding to log a useless noisy warning.
Change-Id: Ibe2a34f2913f003d3ba020f9764c1369fa63123b
Move tests for old Metainfo API to separate file. Metainfo tests file is
large enough and in future it will be easier to remove old tests.
Change-Id: I9421907ef015a6dfa65f4de6ef01b2d2c8baa7df
Use the helper function IsRPC of the err2 package rather than checking
if an error is of a specific RPC status code with an 'if' conditional.
Change-Id: Ibe89d6c2d836307c3112a6d7cc6bf95f0f985fd2
Disqualifies a node when the node fails to complete a graceful
exit.
Adds a new DisqualifyNode method to the overlay cache, since there
wasn't an existing method to disqualify a node but do nothing else
to its stats.
Adds checks to existing tests to make sure that a storage node that
fails a graceful exit is marked as disqualified in the overlay
cache.
https: //storjlabs.atlassian.net/browse/V3-3342
Change-Id: I4d554a519ab59db31ad3b8e28764c8683a6e3888
crdb.ExecuteTx is great, but I don't think it will work right with
PostgreSQL. It works by way of cockroach savepoints, which allows
it to react to retryable errors, whereas tx.Commit() doesn't. But
I don't think PostgreSQL savepoints work exactly the same way. I'm not
100% sure, but it doesn't seem worth the risk.
So, I'm switching one case here to use the new dbutil.WithTx instead,
which will use crdb.ExecuteTx if appropriate. The other case doesn't
need a transaction at all.
Change-Id: I39283f3b5d8d47596db7aff5048bb74597e5918f
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: I660540885a0784fae844cf99376d1537e208fa69
overlay.GetOfflineNodesLimited
We only care about node ID, address, and last contact success/failure
from the downtime service, so the overlay should only return these
values for the downtime-specific queries.
Change-Id: I08a6ecfdd2a12b82cae62e87d6adeab53975bfce
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: Icd3da71448a84c582c6afdc6b52d1f345fe9469f
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: Ibaadd2c8540ba5c8cccd6ecbf529017ab98b78ca
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: Id24906f5f3ae83245dabb218e1f70e0bcb3b417a
Remove starting up messages from peers. We expect all of them to start,
if they don't, then they should return an error why they don't start.
The only informative message is when a service is disabled.
When doing initial database setup then each migration step isn't
informative, hence print only a single line with the final version.
Also use shorter log scopes.
Change-Id: Ic8b61411df2eeae2a36d600a0c2fbc97a84a5b93
When the context was being cancelled the error was being discarded within the rate limiting error handling which caused tests to fail.
Change-Id: I5c6458c16da09a11531233ea0ee80d914969cb3f
deletePointer must return an ErrObjectNotFound rather than a rpc status
error NotFound because the callers must distinguish such error if it
comes from the getPointer or from the UnsynchronizedDelete.
Change-Id: I68b4e45a2765e63b73bf85c2c39a5fc0198373f6
We don't want slowloris nodes to be able to indefinitely block
up the satellite, so add a timeout. Some monitoring inspection
showed the largest success times being on the order of 30s, so
a 1min timeout should be sufficient to kill the misbehaving nodes.
Change-Id: I5e2c3480a15f6304e37262d0a4d30d07eae99bb3
As per discussed we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting and burst limiting if need be in the future.
The default for this rate limiting is still no limits so it stays the
same as our previous functionality.
Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
We are missing some tests for new Metainfo API that we have for old API.
This is first change to adjust old tests to new API.
Change-Id: Ie2b16bf85de8633662f952e863dbf3d409d801d9
For improving the deletion performance we are shifting the
responsibility to delete the pieces of the object from Uplink to the
Satellite.
BeginDeleteObject was the first call to return the stream ID which was
used for after retrieving the list of segments and then get addressed
order limits for deleting the pieces (of each segment) from the storage
nodes.
Now we want the Satellite deletes the pieces of all the object segments
from the storage nodes hence we don't need anymore to have several
network round trips between the Uplink and the Satellite because the
Satellite can delete all of them in the initial BegingDeleteObject
request.
satellite/metainfo.ListSegments has been changed to return 0 items if
the pointer of the last segment of an object is not found because we
need to preserve the backward compatibility with Uplinks that won't be
updated to the last release and they rely on listing the segments after
calling BeginDeleteObject for retrieving the addressed order limits
to contact the storage nodes to delete the pieces.
Change-Id: I5f99ecf27d62d65b0a062936b9b17581ef692af0
Remove direct dependency on uplink.RSConfig, this simplifies
moving the config file without introducing weird dependencies.
Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.
https://storjlabs.atlassian.net/browse/V3-3357
Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
The endpoint listSegmentsManually method misses a check for the limit
parameter, otherwise it can return inconsistent results when it's 0 or
negative.
When 0 or negative, without the check, it returns no segments but also
that there isn't more segments and that isn't correct.
The function is only called from the Endpoint.ListSegments method and
the function cares to ensure that limit is always greater than 0, but if
the method doesn't check that a new future caller could misuse it and
provoke a bug.
Additionally:
* Documentation for the modified function has been written
* The part of the function that repeated the logic of the
Endpoint.getPointer method has been removed for using that method.
* Added logging before returning an internal error in
Endpoint.getPointer.
Change-Id: I5c4f0db2292da0162db6b7d63553895808d0925a
Do some cleanup for adding new identified TODOs (associated with ticket
https://storjlabs.atlassian.net/browse/V3-3406) and remove an old one.
Change-Id: I5d20dbe1c4dee0a8279e08b05b907f4cc9dba278
In satellite/accounting/rollup Service.RollupStorage we have a few
potential error scenarios that return time.Now(). Especially in the case
where we exit early because we have received 0 tallies since the *last*
rollup, this creates a potential race condition.
Between the time we call GetTalliesSince and realize it is empty, it's
possible a tally was inserted in that interval. As currently written we
are returning a latestTally time that excludes that tally.
We are currently protected because in Service.Rollup we don't save the
rollup unless we have populated the rollupStats. However, this change is
more correct and future-proof, because Service.RollupStorage should
always return a correct latestTally time, which in case of errors and
empty tallies, is the last successful tally.
Change-Id: I2521a2cc9802c8f06e512dde4422803a272e2a0a
Adds the KnownReliable method to Overlay Service that filters all nodes
from the given list to be only reliable nodes (online and qualified).
The method return []*pb.Node of reliable nodes. The pb.Node values are
ready for dialing.
The first use case is when deleting an object to efficiently dial all
reliable nodes holding a piece of that object and send them a delete
request.
Change-Id: I13e0a8666f3807c5c31ef1a1087476018a5d3acb
Fixes a data race caused by not waiting for workers to finish
before shutting down. Currently this ended up failing logging
because it was closed when test tried to write to it.
Change-Id: I074045cd83bbf49e658f51353aa7901e9a5d074b
this will allow us to inspect the type of `db.Driver()` on *sql.DB
connections to correctly differentiate between pg and crdb conns.
as a bonus, this moves all concerns about when to replace "cockroach://"
with "postgres://" out of view, letting the thin shim "driver" take care
of that.
Change-Id: Ib24103ab7c508231e681f89a7321b623e4e125e9
Backstory: I needed a better way to pass around information about the
underlying driver and implementation to all the various db-using things
in satellitedb (at least until some new "cockroach driver" support makes
it to DBX). After hitting a few dead ends, I decided I wanted to have a
type that could act like a *dbx.DB but which would also carry
information about the implementation, etc. Then I could pass around that
type to all the things in satellitedb that previously wanted *dbx.DB.
But then I realized that *satellitedb.DB was, essentially, exactly that
already.
One thing that might have kept *satellitedb.DB from being directly
usable was that embedding a *dbx.DB inside it would make a lot of dbx
methods publicly available on a *satellitedb.DB instance that previously
were nicely encapsulated and hidden. But after a quick look, I realized
that _nothing_ outside of satellite/satellitedb even needs to use
satellitedb.DB at all. It didn't even need to be exported, except for
some trivially-replaceable code in migrate_postgres_test.go. And once
I made it unexported, any concerns about exposing new methods on it were
entirely moot.
So I have here changed the exported *satellitedb.DB type into the
unexported *satellitedb.satelliteDB type, and I have changed all the
places here that wanted raw dbx.DB handles to use this new type instead.
Now they can just take a gander at the implementation member on it and
know all they need to know about the underlying database.
This will make it possible for some other pending code here to
differentiate between postgres and cockroach backends.
Change-Id: I27af99f8ae23b50782333da5277b553b34634edc
* Use unexported existent method in logic that was duplicated in some
exported methods.
* Log a forgotten internal error.
* Improve the documentation adding more and fixing some to fit to our
code style conventions.
Change-Id: Ie6f8bc59f9089f92b8b0d1b4c09c2142c3f273f5
The Endpoint.getPointer method lacked of tracing.
Also add a dot at the end of documentation comment for following our
code style conventions.
Change-Id: I9b63ad297f04e31825648aae43aa8f9ebba2b4e2
Return an error when misusing the endpoint method
'listSegmentsFromNumberOfSegments' because there is the method
'listSegmentsManually' for being used when the number of segments is
less or equal than 0.
If we don't return an error on `listSegmentsFromNumberOfSegments` we
would realize that we have a bug much more later than returning an error
because the clients wouldn't receive an error and would receive an empty
list, making them to wonder what they are doing wrong to receive 0
results before they realize that they could be in front of a bug.
This commit also renames the function to be plural as "numberOfSegments"
parameter and the test function which missed also the end 's'.
Change-Id: I02318685bf36aa3af26545731a1711621a5e2e39
planet.Start starts a testplanet system, whereas planet.Run starts a testplanet
and runs a test against it with each DB backend (cockroach compat).
Change-Id: I39c9da26d9619ee69a2b718d24ab00271f9e9bc2
for storj-sim to work, we need to avoid schemas in cockroach urls
so we have storj-sim create namespaced databases instead of schemas
and we have the migrate command create the database in the same way
that it would create a schema for postgres. then it works!
a follow up commit will move the creation of the database/schemas
into storj-sim's setup step so that we can avoid doing these icky
creations during normal migration calls. it will also make the
pointerdb have an explicit call to migrate instead of just doing
it every time it's opened.
Change-Id: If69ef5cb96b6866b0438c761bd445afb3597ae5f
satellitedb migration tests ran against multiple base versions, however after the merging all the steps the base versions didn't exists anymore - which meant none of the migration tests were actually running.
Fix a documentation comment for one method and apply our code
conventions to some that I stumbled.
Change-Id: I3baf5d004a128dcd561c3e27c080aab345c64461
first, so that they all work the same way, because it's getting
complicated, and second, so that we can do the appropriate thing
instead of CREATE SCHEMA for cockroachdb.
Change-Id: I27fbaeeb6223a3e06d97bcf692a2d014b31465f7
it doesn't necessarily _have_ to be UTC; the time is correct as returned
either way, but this will make it a little less prone to variance.
also, there is a test that depends on the time being returned in UTC.
Change-Id: Ia71e24ecd9973ba70a1cfb5621a3030a5c82d004
Improve the piece hash validation filtering out a piece when an order
limit is not found for it.
The commit also improves the documentation of an internal metainfo
method and rename the parameters of 2 methods for clarifying what they
are.
This will make it so we don't need to comment out those lines every time
we want to enable the cockroachdb tests during development.
Once it's ready this flag can go away.