# Sparse Order Storage Take 2

This blueprint describes a way to enormously cut down on database traffic
caused by bandwidth measurement and tracking. It reduces used-serial storage
from terabytes to a single boolean per node per hour.

## Background

Storj needs to measure bandwidth usage in a secure, cryptographically
verifiable, and correctly incentivized way in order to pay storage node
operators for the bandwidth that the system uses from them.

Currently we use the system described in section 4.17 of the Storj V3
whitepaper, though "bandwidth allocations" have since been renamed "Order
Limits" and "restricted bandwidth allocations" have been renamed "Orders."

We track Order Limit serial numbers in the database, creating them when a
segment begins upload, and then we use that database to prevent storage nodes
from submitting the same Order more than once.

The current naive system of keeping track of serial numbers, used or unused,
places a massive amount of load on our central database, to the point that it
currently accounts for 99% of all of the data in the Satellite DB. The plan
below seeks to replace terabytes of used-serial tracking data in Postgres
with a single bit per node per hour (we have a couple thousand nodes total
right now).

## Order Limit creation

The only reason we currently need to talk to the database when creating an
Order Limit is to store which bucket the Order Limit belongs to. Otherwise,
we sign the Order Limit in a way that lets the Satellite validate later that
it signed it. So, instead, we should encrypt the bucket id inside the Order
Limit.

For future proofing, we should add a Satellite-eyes-only generic metadata
envelope (with support for future key/value pairs) to the Order Limit.

The Satellite should keep a set of shared symmetric keys: we always support
decrypting with any key we know of, but we only encrypt with the currently
rotating-in approved keys. Keys should have an id so the envelope can
identify which key it was encrypted with.

We should use authenticated encryption no matter what, but authenticated
encryption could additionally let us avoid giving keys ids (we would have to
cycle through our keys and try each one until one authenticated
successfully).

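A minimal sketch of what such an envelope could look like, assuming AES-GCM
for the authenticated encryption and a one-byte key id prefix; the `seal` and
`open` names and the wire layout are illustrative, not the actual design:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// Hypothetical envelope layout: [1-byte key id][nonce][ciphertext+tag].
func seal(keyID byte, key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := []byte{keyID}
	out = append(out, nonce...)
	// Seal appends ciphertext and auth tag to out.
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

// open looks up the key by id, so any known key (current or rotated-out)
// can still decrypt.
func open(keys map[byte][]byte, envelope []byte) ([]byte, error) {
	key, ok := keys[envelope[0]]
	if !ok {
		return nil, fmt.Errorf("unknown key id %d", envelope[0])
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := envelope[1 : 1+gcm.NonceSize()]
	return gcm.Open(nil, nonce, envelope[1+gcm.NonceSize():], nil)
}

func main() {
	key := make([]byte, 32) // demo key; real keys come from the rotation set
	env, _ := seal(7, key, []byte("bucket-id"))
	pt, _ := open(map[byte][]byte{7: key}, env)
	fmt.Println(string(pt))
}
```

Because GCM authenticates the ciphertext, a decryption attempt with the wrong
key fails cleanly, which is what would make the try-every-key variant
(without key ids) workable.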
## Order submission

The main problem we're trying to solve is double spends. We want to make sure
that a Satellite, an Uplink, and a Storage Node all agree on the amount of
bandwidth being used. The way we do this, as described in whitepaper section
4.17, is to have the Satellite and Uplink work together to sign what they
intend to use; the Storage Node is then bound to submit no more than that. We
want the Storage Node to submit the bandwidth it has used, but once it
"claims" bandwidth from an Uplink, we need to make sure it can't claim the
same bandwidth twice.

Originally, this section described a rather ingenious plan that involved
reusing tools from the efficient certificate revocation literature (sparse
Merkle trees), but then we realized that we don't need any of it. Here's the
new plan:

The Satellite will keep Order windows, one per hour. Each Order Limit
specifies the time it was issued, and the Satellite will keep a flag per node
per window representing whether or not the Orders for that window have been
submitted.

Storage Nodes will keep track of Orders by creation hour. Storage Nodes
currently reject Orders that are older than a couple of hours; we will make
sure this expiry is exactly one hour. Storage Nodes therefore have the
property that once an hour has expired, no new Orders for that hour will come
in.

As pointed out by Pentium100 on the forum: we'll need to specifically make
sure we consider that requests must merely start within the hour (they may
finish later), and then not submit Orders for a window until all of the
requests that started within that hour have finished.

Storage Nodes will then have 48 hours to submit their Orders, one hour-long
batch at a time. Submitting the Orders for an hour is an all-or-nothing
process: the Storage Node will stream all of its Orders for that hour to the
Satellite, and the Satellite will respond with one of three outcomes: the
Orders for that window are accepted, the window was already submitted, or an
unexpected error occurred and the Storage Node should try again.

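The three outcomes and the all-or-nothing, submit-at-most-once behavior can
be sketched as follows; the type and method names here are assumptions for
illustration, not the real Satellite API:

```go
package main

import "fmt"

// SubmitResult enumerates the three possible responses.
type SubmitResult int

const (
	Accepted SubmitResult = iota
	AlreadySubmitted
	UnexpectedError
)

// satellite tracks, per (node, window), whether a submission was accepted.
type satellite struct {
	submitted map[string]bool // key: nodeID + "/" + window, simplified
}

// submitWindow accepts a whole window of Orders at once, or reports that the
// window was already claimed. Double spends across submissions become
// impossible because the window flag flips exactly once.
func (s *satellite) submitWindow(nodeID, window string, orders []int64) SubmitResult {
	key := nodeID + "/" + window
	if s.submitted[key] {
		return AlreadySubmitted
	}
	// ...process all orders and update bandwidth rollups atomically...
	s.submitted[key] = true
	return Accepted
}

func main() {
	s := &satellite{submitted: map[string]bool{}}
	fmt.Println(s.submitWindow("node-1", "2020-04-07T13", []int64{100, 200}) == Accepted)
	fmt.Println(s.submitWindow("node-1", "2020-04-07T13", []int64{100, 200}) == AlreadySubmitted)
}
```
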
The Satellite will keep track, per window per node, of whether or not it has
accepted Orders for that hour. If the window has already had a submission, no
further submissions for that hour can be made.

When Orders are submitted, the Satellite will update the bandwidth rollup
tables in the usual way.

## Other options

See the previous version of this document at
https://review.dev.storj.io/c/storj/storj/+/1732/3 (patchset 3).

## Other concerns

### Can storage nodes submit window batches that make the Satellite consume large amounts of memory?

Depending on the storage node, the number of Orders submitted within an hour
may be substantial. Care will need to be taken with the Order streaming
protocol to make sure the Satellite can appropriately stage its own internal
updates so that it does not try to hold all of the Orders in memory while
processing.

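One plausible shape for this, sketched below with assumed names
(`processStream`, `flush`): fold the streamed Orders into rollup updates in
fixed-size batches, so memory usage stays bounded regardless of window size.
This is an illustration of the staging idea, not the actual protocol:

```go
package main

import "fmt"

// processStream folds a streamed window of order amounts into batched flushes
// so the whole window is never materialized in memory at once.
func processStream(next func() (amount int64, ok bool), flush func(batchTotal int64)) {
	const batchSize = 3 // tiny for the demo; a real value would be much larger
	var total int64
	count := 0
	for {
		amount, ok := next()
		if !ok {
			break
		}
		total += amount
		count++
		if count == batchSize {
			flush(total)
			total, count = 0, 0
		}
	}
	if count > 0 {
		flush(total) // flush the final partial batch
	}
}

func main() {
	orders := []int64{10, 20, 30, 40}
	i := 0
	next := func() (int64, bool) {
		if i >= len(orders) {
			return 0, false
		}
		v := orders[i]
		i++
		return v, true
	}
	processStream(next, func(t int64) { fmt.Println("flush", t) })
	// flush 60
	// flush 40
}
```

Since the window is all-or-nothing, these staged flushes would need to happen
inside one transaction (or be otherwise revocable) so a mid-stream failure
leaves the window unclaimed.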
### Usage limits

We think usage limits are an orthogonal problem and should be solved with a
separate system that keeps track of per-bandwidth-window allocations and
usage.

## References

* https://review.dev.storj.io/c/storj/storj/+/1732/3
* https://www.links.org/files/RevocationTransparency.pdf
* https://ethresear.ch/t/optimizing-sparse-merkle-trees/3751
* https://eprint.iacr.org/2018/955.pdf
* https://www.cs.purdue.edu/homes/ninghui/papers/accumulator_acns07.pdf
* https://en.wikipedia.org/wiki/Benaloh_cryptosystem
* https://en.wikipedia.org/wiki/Paillier_cryptosystem#Electronic_voting
* https://en.wikipedia.org/wiki/Homomorphic_encryption#Partially_homomorphic_cryptosystems
* http://kodu.ut.ee/~lipmaa/papers/lip12b/cl-accum.pdf
* https://blog.goodaudience.com/deep-dive-on-rsa-accumulators-230bc84144d9