Commit Graph

41 Commits

Author SHA1 Message Date
Moby von Briesen
273eb66fae cmd/storagenode,storagenode/preflight: add config flag to disable
storagenode database preflight check.

Disable preflight database check by default, and have the option to
enable it. This will allow us to enable it once it is definitely
working.

Also change the name of the config flag for preflight  time sync.

Change-Id: Ie2e20f9e25dcb38794eafa7e1505e7c6ff287c99
2020-01-17 17:53:17 +00:00
Cameron Ayer
4424697d7f satellite/accounting: refactor live accounting to hold current estimated totals
live accounting used to be a cache to store writes before they are picked up during
the tally iteration, after which the cache is cleared. This created a window in which
users could potentially exceed the storage limit. This PR refactors live accounting to
hold current estimations of space used per project. This should also reduce DB load
since we no longer need to query the satellite DB when checking space used for limiting.

The mechanism by which the new live accounting system works is as follows:

During the upload of any segment, the size of that segment is added to its respective
project total in live accounting. At the beginning of the tally iteration we record
the current values in live accounting as `initialLiveTotals`. At the end of the tally
iteration we again record the current totals in live accounting as `latestLiveTotals`.
The metainfo loop observer in tally allows us to get the project totals from what it
observed in metainfo DB which are stored in `tallyProjectTotals`. However, for any
particular segment uploaded during the metainfo loop, the observer may or may not
have seen it. Thus, we take half of the difference between `latestLiveTotals` and
`initialLiveTotals`, and add that to the total that was found during tally and set that
as the new live accounting total.

Initially, live accounting was storing the total stored amount across all nodes rather than
the segment size, which is inconsistent with how we record amounts stored in the project
accounting DB, so we have refactored live accounting to record segment size

Change-Id: Ie48bfdef453428fcdc180b2d781a69d58fd927fb
2020-01-16 10:26:49 -05:00
Jeff Wendling
78c6d5bb32 satellite/satellitedb: reported_serials table for processing orders
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.

the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.

after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.

since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.

Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
2020-01-15 19:21:21 -07:00
Yingrong Zhao
db8aee0806 satellite/contact; storagenode/preflight: add clock check on startup for storagenode
add config preflight.enabled-local-time

Change-Id: I7b942c9bee063aae409ee6721ae9d079dff0144f
2020-01-15 15:35:26 +00:00
Egon Elbre
cd4ff0722e private/testplanet: use defaultInterval
Change-Id: Ife2810be46faaaf8cd51b193a859a88fff894a0e
2020-01-14 16:07:36 +00:00
Isaac Hess
4950d7106a satellite/orders: Add write cache for bw rollups
Change-Id: I8ba454cb2ab4742cafd6ed09120e4240874831fc
2020-01-13 22:40:51 +00:00
Egon Elbre
24958bd7d3 satellite: add ctx to DB.CreateTables
Change-Id: I9ecad624cf5a7fc9c86bb91c68f96a3a4efd2e92
2020-01-13 15:31:09 +02:00
Egon Elbre
0835b9024c private/dbutil/pgutil: add ctx argument
Change-Id: Icfd56ca8c1f831ad56c0195a0b883e8f0618daaf
2020-01-13 15:27:06 +02:00
Michal Niewrzal
b579c260ab cmd: rename "scope" flag to "access"
We decided that better name for "scope" will be "access". This change
refactors cmd part of code but don't touch libuplink. For backward
compatibility old configs with "scope" field will be loaded without any
issue. Old flag "scope" won't be supported directly from command line.

https://storjlabs.atlassian.net/browse/V3-3488

Change-Id: I349d6971c798380d147937c91e887edb5e9ae4aa
2020-01-10 15:27:53 +00:00
Natalie Ventura Villasana
6b1829f3c3
satellite/downtime: new chore estimates downtime
Adds EstimationChore to the downtime package, which is an
independent chore that finds offline nodes given a configurable
limit, then uptime checks those nodes, and sets a last contact
success or failure given a response. For failed nodes, the chore
updates the amount of downtime the node has been offline in the
DowntimeTracking table.

Design doc section: https://github.com/storj/storj/blob/master/docs/blueprints/storage-node-downtime-tracking.md#estimating-offline-time
Jira: https://storjlabs.atlassian.net/browse/V3-2545

Change-Id: I60af95803930bf9b33232b248bb20cca6f0e0b5f
2020-01-09 15:05:13 -05:00
Yingrong Zhao
76ee8a1b4c satellite: remove UptimeReputation configs from codebase
With the new storage node downtime tracking feature, we need remove current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and
UptimeReputationDQ. This is the first step of removing the uptime
reputation columns from satellitedb

Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36
2020-01-08 18:54:15 +00:00
Egon Elbre
082ec81714
uplink: move to storj.io/uplink (#3746) 2020-01-08 15:40:19 +02:00
Cameron Ayer
0038abb51b private/testplanet: use redis for live accounting
storing live accounting in memory will not work, as the core and api each create
their own instance. Using redis will allow each to access the same store

Change-Id: I4c8250b579d7b6b6d8991bc890894573626effe6
2020-01-03 21:04:50 +00:00
Ethan
05b406e992 satellite:{downtime,overlay}: Implement offline node detection chore
https://storjlabs.atlassian.net/browse/V3-3398

Change-Id: I598c3bad819026377d1d113c099dc9bba8b02742
2020-01-03 17:10:03 +00:00
Ethan
8859c36234 satellite/{downtime,contact}: Add CheckNodeAvailability for use within the downtime tracking chores.
https://storjlabs.atlassian.net/browse/V3-2545

Change-Id: I1dd54a0c77cb4905bb1f350beeb82c6f7700ee70
2020-01-02 18:24:11 +00:00
Ivan Fraixedes
c3b58f1656 satellte/metainfo: Make BeginDeleteObject to delete pieces
For improving the deletion performance we are shifting the
responsibility to delete the pieces of the object from Uplink to the
Satellite.

BeginDeleteObject was the first call to return the stream ID which was
used for after retrieving the list of segments and then get addressed
order limits for deleting the pieces (of each segment) from the storage
nodes.

Now we want the Satellite deletes the pieces of all the object segments
from the storage nodes hence we don't need anymore to have several
network round trips between the Uplink and the Satellite because the
Satellite can delete all of them in the initial BegingDeleteObject
request.

satellite/metainfo.ListSegments has been changed to return 0 items if
the pointer of the last segment of an object is not found because we
need to preserve the backward compatibility with Uplinks that won't be
updated to the last release and they rely on listing the segments after
calling BeginDeleteObject for retrieving the addressed order limits
to contact the storage nodes to delete the pieces.

Change-Id: I5f99ecf27d62d65b0a062936b9b17581ef692af0
2020-01-02 15:53:59 +00:00
Egon Elbre
e03d3fb577 uplink: move configs to cmd/uplink/cmd
Change-Id: Ifc1d3440dcef429c2a6142c16f3e991abf49f1d2
2020-01-02 09:40:57 +00:00
Egon Elbre
2680bae88c private/testplanet: remove dependency to uplink
Remove direct dependency on uplink.RSConfig, this simplifies
moving the config file without introducing weird dependencies.

Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec
2020-01-02 09:40:46 +00:00
Natalie Ventura Villasana
aa3e183c2e
satellite/gracefulexit: add ge eligibility check
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.

https://storjlabs.atlassian.net/browse/V3-3357

Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
2019-12-31 09:31:58 -05:00
Stefan Benten
758fe35aba
storagenode/orders: adding jitter to sending (#3725) 2019-12-30 21:35:26 +01:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Fadila
115b8b0fc8 storagenode/piecestore: delete several pieces in a single request
This is part of the deletion performance improvement.
See https://storjlabs.atlassian.net/browse/V3-3349

Change-Id: Idcd83a302f2bd5cc3299e1a4195a7e177f452599
2019-12-27 10:58:04 +00:00
Egon Elbre
d55288cf68 pkg/rpc: replace methods with direct calls to pb
Change-Id: I8bd015d8d316a2c12c1daceca1d9fd257f6f57bc
2019-12-22 17:12:43 +02:00
Egon Elbre
9e4d833170 private/testplanet: use default interval
The default interval tries to balance:
1. ensure that most things run at least once during tests
2. ensure that they won't run over 10 times

Change-Id: I911b57b595ffbef1963654bf4a42efad1534b058
2019-12-20 17:01:30 +00:00
Ivan Fraixedes
46c8d2e9c7 private/testplanet: Wait until peer ends when closing it
Close a peer didn't guarantee that the peer ended its services and we
want that when a StopPeer method returns the peer service is actually
finished.

Change-Id: If97f41b7e404990555640c71e097ebc719678ae7
2019-12-20 14:23:25 +00:00
Egon Elbre
7455ab771b pkg/peertls/tlsopts: move test that requires testplanet
For splitting core repository we need it not to pull in testplanet
even in tests.

Change-Id: I04d46b418e6e908185a4da694cf47dc3c5cc65f0
2019-12-17 13:45:51 +00:00
Egon Elbre
b04f9996c5 pkg/rpc: move test that needs testplanet
Move rpc test that uses testplanet into private/testplanet.

This ensures that rpc doesn't have the whole system as a dependency
making it easier to separate.

This unfortunately leaves pkg/rpc without specific tests, but
we would need to write new tests that only use the core packages.

Change-Id: I402ab3c2d50282af159c2ef3371d23b0997fef0a
2019-12-17 13:31:12 +00:00
Cameron Ayer
a4f9865b47 satellite: adds and enables cockroachdb compatibility for tests
Change-Id: I85a3ad8c3b9d7e15ea8675b6c55af0002933db57
2019-12-16 22:29:25 +00:00
Andrew Harding
cb89496569 storagenode/trust: wire up list into pool
- also updated ping chore to pick up trust changes
- fixed small typo in blueprint
- fixed flags for storj-sim
- wired up changes to testplanet

Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d
2019-12-13 20:32:50 +00:00
Cameron Ayer
6fae361c31 replace planet.Start in tests with planet.Run
planet.Start starts a testplanet system, whereas planet.Run starts a testplanet
and runs a test against it with each DB backend (cockroach compat).

Change-Id: I39c9da26d9619ee69a2b718d24ab00271f9e9bc2
2019-12-10 16:55:54 +00:00
paul cannon
378b863b2b private,satellite: unite all the "temp db schema" things
first, so that they all work the same way, because it's getting
complicated, and second, so that we can do the appropriate thing
instead of CREATE SCHEMA for cockroachdb.

Change-Id: I27fbaeeb6223a3e06d97bcf692a2d014b31465f7
2019-12-05 15:36:59 +00:00
Ivan Fraixedes
bf97ef06fc
storagenode: Add new endpoint to receive satellite requests for… (#3590)
* pkg/pg: Add new service function storage node

  Add a new service function to the storage node piece store for deleting
  pieces when satellites request them.

* storagenode/piecestore: Add endpoint to delete piece

  Add a new endpoint to receive from trusted satellites to delete a piece.

* private/testplanet: Fix storagenode mock

  Add to the storagenode mock the new endpoint method.

* proto.lock: Update it with the last protbuff changes

* storagenode/piecestore: Reuse test piece upload

  Extract the repeated logic from several tests functions for uploading a
  test piece to a test helper function.

* uplink/piecestore: Implement client side method

  Implement the client side method of the new piecestore RPC function.

* storagenode/piecestore: Add test DeletePiece endpoint

  Implement a test for the DeletePiece new endpoint method.
2019-11-26 18:47:19 +01:00
Jess G
854e5507ab
crdb uses namespaced db for each test (#3646)
* crdb uses namespaced db for each test

* add test for me test

* fix lint and tests

* updates per cr comments

* rm all replaceall
2019-11-26 08:39:57 -08:00
Egon Elbre
36fead0093 satellite/metainfo: add UserAgent support to endpoints (#3548) 2019-11-26 03:12:37 -08:00
Yingrong Zhao
79a4fff6c7
satellite/referrals: set up referrals service and http endpoints (#3566) 2019-11-25 16:36:36 -05:00
Jess G
388f33b84d
satellitedb: add support to testplanet for cockroachdb (#3634)
* update migration steps, add crdb support to testplanet

* add crdb support

* have jenkins run a bares bones crdb compat test

* skip crdb tests

* skip crdb tests

* fix root_piece_id column

* write crdb store to tmp dir

* escape
2019-11-22 11:59:46 -08:00
Yingrong Zhao
63e51df9a6
private/testplanet: add a mock referral manager server into testplanet (#3631) 2019-11-21 17:34:49 -05:00
Isaac Hess
6aeddf2f53
storagenode/pieces: Add Trash and RestoreTrash to piecestore (#3575)
* storagenode/pieces: Add Trash and RestoreTrash to piecestore

* Add index for expiration trash
2019-11-20 09:28:49 -07:00
littleskunk
8b3444e088
satellite/nodeselection: don't select nodes that haven't checked in for a while (#3567)
* satellite/nodeselection: dont select nodes that havent checked in for a while

* change testplanet online window to one minute

* remove satellite reconfigure online window = 0 in repair tests

* pass timestamp into UpdateCheckIn

* change timestamp to timestamptz

* edit tests to set last_contact_success to 4 hours ago

* fix syntax error

* remove check for last_contact_success > last_contact_failure in IsOnline
2019-11-15 23:43:06 +01:00
Yehor Butko
a8e4e9cb03
satellite/payments: project usage charges (#3512) 2019-11-15 16:27:44 +02:00
Egon Elbre
ee6c1cac8a
private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00