Commit Graph

808 Commits

Author SHA1 Message Date
Ivan Fraixedes
5cd6058318 satellite/metainfo: Add back-pressure mechanism DeleteObjectPieces
Add a back-pressure mechanism to the satellite metainfo
DeleteObjectPieces method, so that it returns once 75% of the pieces
have been successfully deleted.

Change-Id: Ia38df49fba5838f0605c40a77cfff8e3442cb5b0
2020-01-09 08:22:11 +00:00
Yingrong Zhao
76ee8a1b4c satellite: remove UptimeReputation configs from codebase
With the new storage node downtime tracking feature, we need to remove
the current uptime reputation configs: UptimeReputationAlpha,
UptimeReputationBeta, and UptimeReputationDQ. This is the first step
toward removing the uptime reputation columns from satellitedb.

Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36
2020-01-08 18:54:15 +00:00
littleskunk
05e4a86654 satellite/metainfo: Close client DeleteObjectPieces
DeleteObjectPieces should print the warning when closing the
connections only if there was an error.

Change-Id: If3d7ab256d8508c08388c1f22c7dd1eb819d2509
2020-01-08 15:54:11 +01:00
Ivan Fraixedes
922c43f921 satellite/metainfo: Close client DeleteObjectPieces
DeleteObjectPieces must close the storage node client once it has
finished deleting its pieces.

Change-Id: I08eb8af8e4215d77d59b52f5055211b918374ab4
2020-01-08 15:01:34 +01:00
Egon Elbre
082ec81714 uplink: move to storj.io/uplink (#3746)
2020-01-08 15:40:19 +02:00
Jeff Wendling
c740b82e66 satellitedb/dbx: remove sed usage for bash script
turns out portable sed is hard: it has to work with both
linux and bsd sed, etc. instead, use a really really basic
bash script and a temporary file. this should be much less
likely to cause issues on a wide range of machines.

Change-Id: Ia759789fb52aa1ee3361426bb6c02ed4eac3d23a
2020-01-08 01:40:24 +00:00
paul cannon
6b21334c47 satellite/satellitedb: use txutil.ExecuteInTx in dbx WithTx()
Change-Id: I42ec21fdf117c661b3e1687a04014650c3a6ab97
2020-01-07 17:00:08 -06:00
paul cannon
0c5e381434 satellite/console: use transaction helpers in consoledb
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.

I also fleshed out consoledb_test.go to do actual inserts and gets to
make sure things were working correctly.

Change-Id: I089bf4c776d15dc8578080e26760bd6dff4beec9
2020-01-07 17:59:10 +00:00
paul cannon
22b6e9220a satellite/satellitedb: use transaction helpers in irreparabledb
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.

Change-Id: I22b850ce5859fa07d13bf475be5140e6bde95b8a
2020-01-07 17:40:09 +00:00
paul cannon
723ed23298 satellite/satellitedb: use transaction helpers in orders
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The WithTx
helper functions in dbutil and dbx do that automatically. I am changing
this code to use those helpers instead.

Change-Id: Iaf492af35471931125f2b7365aa4338f44154881
2020-01-07 16:31:47 +00:00
Ivan Fraixedes
027e3d4f62 satellite/metainfo: Avoid a noisy warning
DeleteObjectPieces must not call the overlay cache KnownReliable method
with an empty list of node IDs, to avoid logging a useless, noisy warning.

Change-Id: Ibe2a34f2913f003d3ba020f9764c1369fa63123b
2020-01-07 14:15:04 +01:00
Michal Niewrzal
e232042e85 satellite/metainfo: move old API tests to separate file
Move tests for the old Metainfo API to a separate file. The Metainfo
test file is already large, and this will make it easier to remove the
old tests in the future.

Change-Id: I9421907ef015a6dfa65f4de6ef01b2d2c8baa7df
2020-01-07 11:34:33 +00:00
Ivan Fraixedes
9618959f1d satellite/metainfo: Use errs2 IsRPC helper function
Use the IsRPC helper function of the errs2 package rather than checking
with an 'if' conditional whether an error has a specific RPC status code.

Change-Id: Ibe89d6c2d836307c3112a6d7cc6bf95f0f985fd2
2020-01-07 08:26:45 +01:00
Natalie Ventura Villasana
1cb0f80a8d satellite/gracefulexit: dq node on exit fail
Disqualifies a node when the node fails to complete a graceful
exit.

Adds a new DisqualifyNode method to the overlay cache, since there
wasn't an existing method to disqualify a node but do nothing else
to its stats.

Adds checks to existing tests to make sure that a storage node that
fails a graceful exit is marked as disqualified in the overlay
cache.

https://storjlabs.atlassian.net/browse/V3-3342
Change-Id: I4d554a519ab59db31ad3b8e28764c8683a6e3888
2020-01-06 19:16:26 -05:00
paul cannon
4a26fb5bd5 satellite/satellitedb: don't use crdb.ExecuteTx with postgres
crdb.ExecuteTx is great, but I don't think it will work right with
PostgreSQL. It works by way of cockroach savepoints, which allows
it to react to retryable errors, whereas tx.Commit() doesn't. But
I don't think PostgreSQL savepoints work exactly the same way. I'm not
100% sure, but it doesn't seem worth the risk.

So, I'm switching one case here to use the new dbutil.WithTx instead,
which will use crdb.ExecuteTx if appropriate. The other case doesn't
need a transaction at all.

Change-Id: I39283f3b5d8d47596db7aff5048bb74597e5918f
2020-01-06 23:51:35 +00:00
paul cannon
f3aee1b758 satellite/satellitedb: use transaction helpers in containment
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.

Change-Id: I660540885a0784fae844cf99376d1537e208fa69
2020-01-06 23:07:38 +00:00
Moby von Briesen
6c2e4cc0cd satellite/overlay: Return NodeLastContact instead of a node dossier from overlay.GetOfflineNodesLimited
The downtime service only cares about node ID, address, and last contact
success/failure, so the overlay should return only these values for the
downtime-specific queries.

Change-Id: I08a6ecfdd2a12b82cae62e87d6adeab53975bfce
2020-01-06 17:12:30 -05:00
paul cannon
4203e25c54 satellite/satellitedb: use transaction helpers in overlaycache
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.

Change-Id: Icd3da71448a84c582c6afdc6b52d1f345fe9469f
2020-01-06 21:42:57 +00:00
paul cannon
b072e16ff7 satellite/satellitedb: use transaction helpers in peeridentities
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.

Change-Id: Ibaadd2c8540ba5c8cccd6ecbf529017ab98b78ca
2020-01-06 20:42:15 +00:00
paul cannon
eb81879d47 satellite/satellitedb: use transaction helpers in usercredits
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.

Change-Id: Id24906f5f3ae83245dabb218e1f70e0bcb3b417a
2020-01-06 20:40:45 +00:00
Egon Elbre
f41d440944 all: reduce number of log messages
Remove start-up messages from peers. We expect all of them to start;
if they don't, they should return an error explaining why.
The only informative message is when a service is disabled.

During initial database setup the individual migration steps aren't
informative, so print only a single line with the final version.

Also use shorter log scopes.

Change-Id: Ic8b61411df2eeae2a36d600a0c2fbc97a84a5b93
2020-01-06 19:03:46 +00:00
paul cannon
a33734bee7 satellite/satellitedb/dbx: add cockroach driver type
Change-Id: I7a0da6e066c67a521fc1b23b085ab8554eee0d4c
2020-01-06 18:01:03 +00:00
Simon Guindon
80b41af8f1 satellite/metainfo: Fixed bug that discarded context cancellation errors
When the context was cancelled, the error was discarded within the rate
limiting error handling, which caused tests to fail.

Change-Id: I5c6458c16da09a11531233ea0ee80d914969cb3f
2020-01-03 22:48:12 -05:00
Ivan Fraixedes
fc4ea28695 satellite/metainfo: Return ErrObjectNotFound
deletePointer must return ErrObjectNotFound rather than an RPC status
NotFound error, because callers must be able to tell whether the error
comes from getPointer or from UnsynchronizedDelete.

Change-Id: I68b4e45a2765e63b73bf85c2c39a5fc0198373f6
2020-01-03 22:01:29 +00:00
Jeff Wendling
29fe206b9a satellite/gc: add timeout to retain requests
We don't want slowloris nodes to be able to indefinitely block
up the satellite, so add a timeout. Some monitoring inspection
showed the largest success times being on the order of 30s, so
a 1min timeout should be sufficient to kill the misbehaving nodes.

Change-Id: I5e2c3480a15f6304e37262d0a4d30d07eae99bb3
2020-01-03 21:46:46 +00:00
Simon Guindon
e1e7cebe49 satellite/metainfo: added rate limiting support to the metainfo loop.
As discussed, we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting, and for burst limiting if needed in the
future.

By default no limits are applied, so the behavior stays the same as
before.

Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
2020-01-03 15:00:29 -05:00
Ethan
05b406e992 satellite/{downtime,overlay}: Implement offline node detection chore
https://storjlabs.atlassian.net/browse/V3-3398

Change-Id: I598c3bad819026377d1d113c099dc9bba8b02742
2020-01-03 17:10:03 +00:00
Yaroslav
389567fc9e satellite/console: add credit card charges to billing history
Change-Id: I82a08c42c01086dc7fb9508da5c6c0baa2438124
2020-01-03 17:34:59 +02:00
Yaroslav
0cc7056a9a satellite/console: convert dates to UTC in advanced usage reports
Change-Id: I5c72c869533a7613bffdb8077fdedff2a4e203d0
2020-01-03 14:17:37 +02:00
Michal Niewrzal
38eff60698 satellite/metainfo: adjust old API test to new API
We are missing some tests for the new Metainfo API that we have for the
old API. This is the first change to adjust the old tests to the new API.

Change-Id: Ie2b16bf85de8633662f952e863dbf3d409d801d9
2020-01-03 11:05:14 +01:00
Ethan
8859c36234 satellite/{downtime,contact}: Add CheckNodeAvailability for use within the downtime tracking chores.
https://storjlabs.atlassian.net/browse/V3-2545

Change-Id: I1dd54a0c77cb4905bb1f350beeb82c6f7700ee70
2020-01-02 18:24:11 +00:00
Moby von Briesen
ff74b44c5f satellite/overlay: Add ability for overlay to get offline nodes ordered by last checked time
This is required for the downtime tracking service: https://storjlabs.atlassian.net/browse/V3-2545

Change-Id: I286cdc07d802393948eb10c25c45ba78cc3ceafc
2020-01-02 16:39:38 +00:00
Ivan Fraixedes
c3b58f1656 satellite/metainfo: Make BeginDeleteObject delete pieces
To improve deletion performance, we are shifting the responsibility for
deleting the pieces of an object from the Uplink to the Satellite.

BeginDeleteObject was the first call; it returned the stream ID, which
was then used to retrieve the list of segments and afterwards to get the
addressed order limits for deleting the pieces (of each segment) from
the storage nodes.

Now we want the Satellite to delete the pieces of all the object's
segments from the storage nodes, so we no longer need several network
round trips between the Uplink and the Satellite: the Satellite can
delete all of them in the initial BeginDeleteObject request.

satellite/metainfo.ListSegments has been changed to return 0 items if
the pointer of the last segment of an object is not found. This
preserves backward compatibility with Uplinks that won't be updated to
the latest release and that rely on listing the segments after calling
BeginDeleteObject to retrieve the addressed order limits for contacting
the storage nodes to delete the pieces.

Change-Id: I5f99ecf27d62d65b0a062936b9b17581ef692af0
2020-01-02 15:53:59 +00:00
Egon Elbre
3528c56c6f satellite/satellitedb/satellitedbtest: skip unconfigured db
Change-Id: Ib6ea58208ef19410146845e82eb08724888e85ae
2020-01-02 10:51:59 +00:00
Egon Elbre
e03d3fb577 uplink: move configs to cmd/uplink/cmd
Change-Id: Ifc1d3440dcef429c2a6142c16f3e991abf49f1d2
2020-01-02 09:40:57 +00:00
Egon Elbre
2680bae88c private/testplanet: remove dependency to uplink
Remove direct dependency on uplink.RSConfig, this simplifies
moving the config file without introducing weird dependencies.

Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec
2020-01-02 09:40:46 +00:00
Natalie Ventura Villasana
aa3e183c2e satellite/gracefulexit: add ge eligibility check
Adds a check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value,
NodeMinAgeInMonths. The default for this value is 6 months for now.

https://storjlabs.atlassian.net/browse/V3-3357

Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
2019-12-31 09:31:58 -05:00
Ivan Fraixedes
7266155375 satellite/metainfo: List segments manually limit check
The endpoint's listSegmentsManually method is missing a check on the
limit parameter; without it, the method can return inconsistent results
when limit is 0 or negative.

In that case it returns no segments, but also reports that there are no
more segments, which isn't correct.

The function is only called from the Endpoint.ListSegments method, which
ensures that limit is always greater than 0, but if the function itself
doesn't check it, a future caller could misuse it and cause a bug.

Additionally:

* Documentation for the modified function has been written
* The part of the function that repeated the logic of the
  Endpoint.getPointer method has been replaced with a call to that method
* Added logging before returning an internal error in
  Endpoint.getPointer.

Change-Id: I5c4f0db2292da0162db6b7d63553895808d0925a
2019-12-31 07:44:46 +00:00
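A simplified sketch of a paginated listing that validates its limit before computing the "more" flag. The `Segment` type and `listSegments` function are hypothetical, and rejecting a non-positive limit with an error is an assumption; the actual fix may handle it differently.

```go
package main

import "fmt"

// Segment is a hypothetical, simplified segment listing entry.
type Segment struct{ Index int }

// listSegments returns up to limit segments starting at cursor, plus whether
// more remain. A non-positive limit is rejected instead of producing an empty
// page that wrongly claims there are no more segments.
func listSegments(all []Segment, cursor, limit int) ([]Segment, bool, error) {
	if limit <= 0 {
		return nil, false, fmt.Errorf("limit must be greater than 0, got %d", limit)
	}
	if cursor < 0 || cursor > len(all) {
		return nil, false, fmt.Errorf("cursor %d out of range", cursor)
	}
	end := len(all)
	// Compare against the remaining count to avoid overflowing cursor+limit.
	if remaining := len(all) - cursor; limit < remaining {
		end = cursor + limit
	}
	return all[cursor:end], end < len(all), nil
}
```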
Moby von Briesen
bb3baf5a4e satellite/satellitedb: Add nodes_offline_times table for downtime tracking
Change-Id: If6b80fe0a20d88cedacaf4b76b75aa21d0af2465
2019-12-30 15:45:02 -05:00
Ivan Fraixedes
059537c16e satellite/metainfo: Add new TODOs & remove old ones
Clean up by adding newly identified TODOs (associated with ticket
https://storjlabs.atlassian.net/browse/V3-3406) and removing an old one.

Change-Id: I5d20dbe1c4dee0a8279e08b05b907f4cc9dba278
2019-12-27 13:16:09 +00:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Egon Elbre
d55288cf68 pkg/rpc: replace methods with direct calls to pb
Change-Id: I8bd015d8d316a2c12c1daceca1d9fd257f6f57bc
2019-12-22 17:12:43 +02:00
Egon Elbre
006baa9ca6 pkg/rpc: remove drpc aliases
We need to split up the pb package, which means we cannot have a core package
that depends on them.

Change-Id: I7f4f6fd82f89a51a9b2ad08bf2b1207253b8a215
2019-12-22 16:58:08 +02:00
Egon Elbre
acb4435a67 satellite/satellitedb: improve Cockroach migrate test
Load schemas in parallel instead of one-by-one.

This reduces the test time from 2m30s to 1m15s.

Change-Id: I0bf6381a0ae99b44271fe55d4ee658683064c097
2019-12-21 10:58:43 +00:00
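Loading schemas in parallel instead of one by one can be sketched with a `sync.WaitGroup`, keeping results in input order. `loadSchemasParallel` and the `load` callback are hypothetical; the callback stands in for whatever loads a single migration schema in the test.

```go
package main

import "sync"

// loadSchemasParallel runs load for every name concurrently and returns the
// results in input order. If any load fails, the error with the lowest index
// is returned.
func loadSchemasParallel(names []string, load func(string) (string, error)) ([]string, error) {
	results := make([]string, len(names))
	errs := make([]error, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no mutex is needed.
			results[i], errs[i] = load(name)
		}(i, name)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}
```

Total time then approaches that of the slowest schema rather than the sum of all of them, which is consistent with the speed-up reported above.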
Yingrong Zhao
6e71591b9b satellitedb;storagenodedb: remove unnecessary use of DB transactions in graceful exit
Change-Id: Ief0a28c6750c130896b48bfebfbea7fb3caa810f
2019-12-20 21:24:38 +00:00
Ethan
b959ccbae6 satellite/gracefulexit: Use proper rpc status codes for disqualified nodes and too many connections
Change-Id: I41380026175e7678c7cd3d44211de8eb86ce4d0f
2019-12-20 19:05:28 +00:00
Jeff Wendling
ffbc43d170 satellite/satellitedb/dbx: monitor the database calls
Change-Id: I9fbdc4a35fedfc7a1b4e1b630d2a3664b3218e67
2019-12-18 21:57:36 +00:00
Isaac Hess
d5d0c442ac satellite/accounting/rollup: Use lastRollup as zero-value
In satellite/accounting/rollup Service.RollupStorage we have a few
potential error scenarios that return time.Now(). Especially in the case
where we exit early because we have received 0 tallies since the *last*
rollup, this creates a potential race condition.

Between the time we call GetTalliesSince and realize it is empty, it's
possible a tally was inserted in that interval. As currently written we
are returning a latestTally time that excludes that tally.

We are currently protected because in Service.Rollup we don't save the
rollup unless we have populated the rollupStats. However, this change is
more correct and future-proof, because Service.RollupStorage should
always return a correct latestTally time, which, in the case of errors
and empty tallies, is the last successful tally.

Change-Id: I2521a2cc9802c8f06e512dde4422803a272e2a0a
2019-12-18 21:33:33 +00:00
littleskunk
d5c5b57fac satellite/db: enable DeleteTallies
Change-Id: I1e2a6873b3e6398260e053592d676993272b960d
2019-12-18 13:16:06 +00:00
Kaloyan Raev
f8d0864630 satellite/metainfo: use KnownReliable in DeleteObjectPieces
This reduces the number of queries to the overlay when deleting objects.

Change-Id: I28ed2c2d225e0c5eb1a8d952235fa7e5837a48d1
2019-12-18 12:38:22 +00:00