# Sparse Order Storage Take 2

This blueprint describes a way to enormously cut down on the database traffic
caused by bandwidth measurement and tracking. It reduces used serial storage
from terabytes to a single boolean per node per hour.

## Background

Storj needs to measure bandwidth usage in a secure, cryptographic, and
correctly incentivized way so that it can pay storage node operators for the
bandwidth that the system uses from them.

Currently we use the system described in section 4.17 of the Storj V3
whitepaper, though "bandwidth allocations" have since been renamed "Order
Limits" and "restricted bandwidth allocations" have been renamed "Orders."

We track Order Limit serial numbers in the database, created at the time a
segment begins upload, and then use that database to prevent storage nodes
from submitting the same Order more than once.

The current naive system of keeping track of serial numbers, used or unused,
puts a massive load on our central database, to the point that it currently
accounts for 99% of all of the data in the Satellite DB. The plan below seeks
to replace the terabytes of used serial tracking data in Postgres with a bit
per node per hour (we have a couple thousand nodes total right now).

## Order Limit creation

The only reason we currently need to talk to the database when creating order
limits is to store which bucket the order limit belongs to. Otherwise, we sign
the order limit in a way that lets the Satellite later validate that it signed
the order limit. So, instead, we should encrypt the bucket id in the order
limit.

For future proofing, we should make a Satellite-eyes-only generic metadata
envelope (with support for future key/value pairs) in the order limit.

The Satellite should keep a set of shared symmetric keys. We always support
decrypting with any key we know of, but we only encrypt with the approved keys
that are currently being rotated in. Keys should have an id so the envelope can
identify which key it was encrypted with.

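The key set and envelope described above could look roughly like the following Go sketch. It is only an illustration under stated assumptions: the `KeySet` and `KeyID` names, the one-byte key id, the choice of AES-GCM, and the envelope layout are all hypothetical and not taken from the Satellite code. The metadata is treated as opaque bytes; how key/value pairs are serialized is left open.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// KeyID names one Satellite key (hypothetical one-byte id).
type KeyID byte

// KeySet holds every key the Satellite can still decrypt with, plus the id
// of the key currently approved for encryption.
type KeySet struct {
	Keys    map[KeyID][]byte // AES keys of 16, 24, or 32 bytes
	Current KeyID
}

// aead builds an AES-GCM instance for the key with the given id.
func (ks *KeySet) aead(id KeyID) (cipher.AEAD, error) {
	key, ok := ks.Keys[id]
	if !ok {
		return nil, errors.New("unknown key id")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts metadata under the current key.
// Assumed envelope layout: key id (1 byte) || nonce || ciphertext and tag.
func (ks *KeySet) Seal(metadata []byte) ([]byte, error) {
	aead, err := ks.aead(ks.Current)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	header := []byte{byte(ks.Current)}
	out := append(append([]byte{}, header...), nonce...)
	// The key id is bound to the ciphertext as additional authenticated data.
	return aead.Seal(out, nonce, metadata, header), nil
}

// Open decrypts an envelope with whichever known key its id refers to.
func (ks *KeySet) Open(envelope []byte) ([]byte, error) {
	if len(envelope) < 1 {
		return nil, errors.New("envelope too short")
	}
	header, rest := envelope[:1], envelope[1:]
	aead, err := ks.aead(KeyID(header[0]))
	if err != nil {
		return nil, err
	}
	if len(rest) < aead.NonceSize() {
		return nil, errors.New("envelope too short")
	}
	nonce, ciphertext := rest[:aead.NonceSize()], rest[aead.NonceSize():]
	return aead.Open(nil, nonce, ciphertext, header)
}
```

Rotating a key in then amounts to adding it to `Keys` and pointing `Current` at it; envelopes sealed under older keys keep opening for as long as those keys stay in the set. Random GCM nonces put a practical ceiling on how many envelopes should be sealed under one key, which is one more reason to rotate.
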
We should use authenticated encryption no matter what. Authenticated
encryption could additionally remove the need for key ids, since decryption
fails verification under the wrong key (we would have to cycle through and try
keys until one worked).

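A minimal sketch of that id-less alternative, again assuming AES-GCM and hypothetical helper names (`SealNoID`, `OpenAny`): the envelope carries no key id, and the Satellite simply tries each known key until the authentication tag verifies.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newGCM builds an AES-GCM instance for a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// SealNoID encrypts metadata without recording which key was used.
// Assumed envelope layout: nonce || ciphertext and tag.
func SealNoID(key, metadata []byte) ([]byte, error) {
	aead, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Appending to the nonce slice yields nonce || ciphertext || tag.
	return aead.Seal(nonce, nonce, metadata, nil), nil
}

// OpenAny tries every known key in order and returns the first plaintext
// whose authentication tag verifies.
func OpenAny(keys [][]byte, envelope []byte) ([]byte, error) {
	for _, key := range keys {
		aead, err := newGCM(key)
		if err != nil {
			return nil, err
		}
		if len(envelope) < aead.NonceSize() {
			return nil, errors.New("envelope too short")
		}
		nonce, ciphertext := envelope[:aead.NonceSize()], envelope[aead.NonceSize():]
		if plaintext, err := aead.Open(nil, nonce, ciphertext, nil); err == nil {
			return plaintext, nil
		}
	}
	return nil, errors.New("no known key opens this envelope")
}
```

The trade-off is one failed decryption attempt per wrong key, so the cost grows with the number of keys still in rotation.
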
## Order submission

The main problem we're trying to solve is double spends. We want to make sure
that a Satellite, an Uplink, and a Storage Node all agree on the amount of
bandwidth used. The way we do this, as described in whitepaper section 4.17, is
to have the Satellite and Uplink work together to sign what they intend to use,
and then the Storage Node is bound to not submit more than that. We want the
Storage Node to submit the bandwidth it has used, but once it "claims"
bandwidth from an Uplink, we need to make sure it can't claim the same
bandwidth twice.

Originally, this section described a rather ingenious plan that involved reusing
tools from the efficient certificate revocation literature (sparse merkle trees),
but then we realized that we don't need any of it. Here's the new plan:

The Satellite will keep Order windows, one per hour. Each Order Limit specifies
the time it was issued, and the Satellite will keep a flag per node per window
representing whether or not Orders for that window were submitted.

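To make the window bookkeeping concrete, here is a small Go sketch. The `OrderLimit`, `NodeID`, and `WindowFlags` types are stand-ins invented for this example, and an in-memory map takes the place of whatever table the Satellite would actually use.

```go
package main

import "time"

// NodeID stands in for a storage node identifier (hypothetical type).
type NodeID string

// OrderLimit carries only the field this sketch needs.
type OrderLimit struct {
	IssuedAt time.Time
}

// Window returns the start of the hour-long window the Order Limit was
// issued in, normalized to UTC.
func (ol OrderLimit) Window() time.Time {
	return ol.IssuedAt.UTC().Truncate(time.Hour)
}

// windowKey identifies one flag: a node and a window start in Unix seconds.
type windowKey struct {
	node   NodeID
	window int64
}

// WindowFlags records, per node and per window, whether Orders were submitted.
type WindowFlags struct {
	submitted map[windowKey]bool
}

// NewWindowFlags returns an empty set of flags.
func NewWindowFlags() *WindowFlags {
	return &WindowFlags{submitted: make(map[windowKey]bool)}
}

// Submitted reports whether the node already submitted for the window.
func (f *WindowFlags) Submitted(node NodeID, window time.Time) bool {
	return f.submitted[windowKey{node, window.Unix()}]
}

// MarkSubmitted sets the flag for the node and window.
func (f *WindowFlags) MarkSubmitted(node NodeID, window time.Time) {
	f.submitted[windowKey{node, window.Unix()}] = true
}
```

Every Order Limit issued between 13:00:00 and 13:59:59 UTC maps to the same window, so a single flag covers all of a node's Orders for that hour.
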
Storage Nodes will keep track of Orders by creation hour. Storage Nodes
currently reject Orders that are older than a couple of hours. We will make sure
this is exactly an hour, so that Storage Nodes have the property that once an
hour has expired, no new orders for that hour will come in.

As pointed out by Pentium100 on the forum: we’ll need to specifically make sure
that requests must start within the hour, and that Storage Nodes do not submit
orders for a window until all of the requests that started within that hour
have finished.

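One way a Storage Node could enforce that rule is to count in-flight requests per window and only treat a window as ready once the hour is over and the count has dropped to zero. The sketch below assumes hypothetical `Request` and `WindowTracker` types; it is not the storage node's actual implementation.

```go
package main

import (
	"sync"
	"time"
)

// Request carries only the field this sketch needs.
type Request struct {
	StartedAt time.Time
}

// windowOf returns the Unix-seconds start of the hour containing t.
func windowOf(t time.Time) int64 {
	return t.UTC().Truncate(time.Hour).Unix()
}

// WindowTracker counts requests that are still running, grouped by the
// window in which they started.
type WindowTracker struct {
	mu     sync.Mutex
	active map[int64]int
}

// NewWindowTracker returns a tracker with no requests in flight.
func NewWindowTracker() *WindowTracker {
	return &WindowTracker{active: make(map[int64]int)}
}

// Begin registers a request under its start window. The returned function
// must be called exactly once when the request finishes.
func (w *WindowTracker) Begin(req Request) (done func()) {
	key := windowOf(req.StartedAt)
	w.mu.Lock()
	w.active[key]++
	w.mu.Unlock()
	return func() {
		w.mu.Lock()
		defer w.mu.Unlock()
		w.active[key]--
		if w.active[key] == 0 {
			delete(w.active, key)
		}
	}
}

// Ready reports whether the window containing `window` may be submitted at
// time `now`: the hour must be over and no request from it may still run.
func (w *WindowTracker) Ready(window, now time.Time) bool {
	key := windowOf(window)
	end := time.Unix(key, 0).Add(time.Hour)
	w.mu.Lock()
	defer w.mu.Unlock()
	return !now.Before(end) && w.active[key] == 0
}
```

A long download that starts at 13:59 and finishes at 14:10 keeps the 13:00 window closed to submission until 14:10, even though the hour itself ended earlier.
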
Storage Nodes will then have 48 hours to submit their Orders, one hour batch at
a time. Submitting Orders for an hour is an all-or-nothing process. The Storage
Node will stream all of its Orders for that hour to the Satellite, and the
Satellite will respond with one of three outcomes: the orders for that window
are accepted, the window was already submitted for, or an unexpected error
occurred and the storage node should try again.

The Satellite will keep track, per window per node, of whether or not it has
accepted orders for that hour. If the window has already had a submission, no
further submissions for that hour can be made.

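The three outcomes and the all-or-nothing rule can be sketched as follows. All names here (`Outcome`, `Satellite`, `SubmitWindow`, the `verify` callback) are hypothetical, in-memory maps stand in for database tables, and a real implementation would update the flag and the rollups in one transaction. The blueprint does not say how invalid Orders are handled; this sketch assumes, purely for illustration, that any failure lands in the retry outcome.

```go
package main

import "errors"

// Outcome is the Satellite's answer to a window submission.
type Outcome int

const (
	Accepted         Outcome = iota // the orders for the window were accepted
	AlreadySubmitted                // the window was already submitted for
	RetryLater                      // unexpected error, the node should retry
)

// ErrBadOrder is a placeholder error for the verification callback.
var ErrBadOrder = errors.New("order failed verification")

// Order carries only the fields this sketch needs.
type Order struct {
	Serial string
	Amount int64
}

type windowKey struct {
	node   string
	window int64 // window start, Unix seconds
}

// Satellite keeps the per-node, per-window flags and a bandwidth total.
type Satellite struct {
	verify    func(Order) error // stands in for signature checks
	submitted map[windowKey]bool
	rollup    map[string]int64
}

// NewSatellite returns a Satellite that checks Orders with verify.
func NewSatellite(verify func(Order) error) *Satellite {
	return &Satellite{
		verify:    verify,
		submitted: make(map[windowKey]bool),
		rollup:    make(map[string]int64),
	}
}

// Rollup returns the bandwidth settled so far for a node.
func (s *Satellite) Rollup(node string) int64 { return s.rollup[node] }

// SubmitWindow settles all of a node's Orders for one window, or none.
func (s *Satellite) SubmitWindow(node string, window int64, orders []Order) Outcome {
	key := windowKey{node, window}
	if s.submitted[key] {
		return AlreadySubmitted
	}
	var total int64
	for _, o := range orders {
		if err := s.verify(o); err != nil {
			return RetryLater // nothing has been recorded yet
		}
		total += o.Amount
	}
	// Only after every Order passed do the flag and the rollup change.
	s.submitted[key] = true
	s.rollup[node] += total
	return Accepted
}
```

Because the flag is only set once the whole batch has been processed, a node that receives `RetryLater` can resend the same window without any risk of being paid twice.
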
When orders are submitted, the Satellite will update the bandwidth rollup
tables in the usual way.

## Other options

See the previous version of this document at
https://review.dev.storj.io/c/storj/storj/+/1732/3 (patchset 3).

## Other concerns

### Can storage nodes submit window batches that make the Satellite consume large amounts of memory?

Depending on the storage node, the number of orders submitted within an hour
may be substantial. Care will need to be taken with the order streaming protocol
to make sure the Satellite is able to appropriately stage its own internal updates
so that it does not try to store all of the orders in memory while processing.

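One possible shape for this, sketched in Go under the assumption that the Satellite only needs per-bucket totals from a batch: consume the stream one Order at a time and keep running sums instead of the Orders themselves. The `Order`, `OrderStream`, `SliceStream`, and `SumStream` names are made up for this example.

```go
package main

// Order carries only the fields this sketch needs.
type Order struct {
	Bucket string
	Amount int64
}

// OrderStream yields Orders one at a time; ok is false once the stream ends.
type OrderStream func() (order Order, ok bool, err error)

// SliceStream adapts a slice to an OrderStream (handy for trying this out).
func SliceStream(orders []Order) OrderStream {
	i := 0
	return func() (Order, bool, error) {
		if i >= len(orders) {
			return Order{}, false, nil
		}
		o := orders[i]
		i++
		return o, true, nil
	}
}

// SumStream drains the stream while holding only per-bucket totals, so
// memory grows with the number of buckets rather than the number of Orders.
func SumStream(next OrderStream) (map[string]int64, error) {
	totals := make(map[string]int64)
	for {
		o, ok, err := next()
		if err != nil {
			return nil, err // discard partial totals, the batch is all-or-nothing
		}
		if !ok {
			return totals, nil
		}
		totals[o.Bucket] += o.Amount
	}
}
```

The totals are only handed back once the stream ends cleanly, which keeps this compatible with the all-or-nothing submission rule.
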
### Usage limits

We think usage limits are an orthogonal problem and should be solved with a
separate system to keep track of per-bandwidth-window allocations and usage.

## References

* https://review.dev.storj.io/c/storj/storj/+/1732/3
* https://www.links.org/files/RevocationTransparency.pdf
* https://ethresear.ch/t/optimizing-sparse-merkle-trees/3751
* https://eprint.iacr.org/2018/955.pdf
* https://www.cs.purdue.edu/homes/ninghui/papers/accumulator_acns07.pdf
* https://en.wikipedia.org/wiki/Benaloh_cryptosystem
* https://en.wikipedia.org/wiki/Paillier_cryptosystem#Electronic_voting
* https://en.wikipedia.org/wiki/Homomorphic_encryption#Partially_homomorphic_cryptosystems
* http://kodu.ut.ee/~lipmaa/papers/lip12b/cl-accum.pdf
* https://blog.goodaudience.com/deep-dive-on-rsa-accumulators-230bc84144d9