docs/blueprints: certified nodes
Change-Id: I7c670d740e4c3d1035dee145ed65aaed4cbaba0c
This commit is contained in:
parent
a4d68b9b7e
commit
a85c080509
273
docs/blueprints/certified-nodes.md
Normal file
273
docs/blueprints/certified-nodes.md
Normal file
@ -0,0 +1,273 @@
|
|||||||
|
# Node and operator certification
|
||||||
|
|
||||||
|
## Abstract
|
||||||
|
|
||||||
|
This is a proposal for a small feature and service that allows for nodes and
|
||||||
|
operators to have signed tags of certain kinds for use in project-specific or
|
||||||
|
Satellite-specific node selection.
|
||||||
|
|
||||||
|
## Background/context
|
||||||
|
|
||||||
|
We have a couple of ongoing needs:
|
||||||
|
|
||||||
|
* 1099 KYC
|
||||||
|
* Private storage node networks
|
||||||
|
* SOC2/HIPAA/etc node certification
|
||||||
|
* Voting and operator signaling
|
||||||
|
|
||||||
|
### 1099 KYC
|
||||||
|
|
||||||
|
The United States has a rule that if node operators earn more than $600/year,
|
||||||
|
we need to file a 1099 for each of them. Our current way of dealing with this
|
||||||
|
is manual and time consuming, and so it would be nice to automate it.
|
||||||
|
|
||||||
|
Ultimately, we should be able to automatically:
|
||||||
|
|
||||||
|
1) keep track of which nodes are run by operators under or over the $600
|
||||||
|
threshold.
|
||||||
|
2) keep track of if an automated KYC service has signed off that we have the
|
||||||
|
necessary information to file a 1099.
|
||||||
|
3) automatically suspend nodes that have earned more than $600 but have not
|
||||||
|
provided legally required information.
|
||||||
|
|
||||||
|
### Private storage node networks
|
||||||
|
|
||||||
|
We have seen growing interest from customers that want to bring their own
|
||||||
|
hard drives, or be extremely choosy about the nodes they are willing to work
|
||||||
|
with. The current way we are solving this is spinning up private Satellites
|
||||||
|
that are configured to only work with the nodes those customers provide, but
|
||||||
|
it would be better if we didn't have to start custom Satellites for this.
|
||||||
|
|
||||||
|
Instead, it would be nice to have a per-project configuration on an existing
|
||||||
|
Satellite that allowed that project to specify a specific subset of verified
|
||||||
|
or validated nodes, e.g., Project A should be able to say only nodes from
|
||||||
|
node providers B and C should be selected. Symmetrically, Nodes from providers
|
||||||
|
B and C may only want to accept data from certain projects, like Project A.
|
||||||
|
|
||||||
|
When nodes from providers B and C are added to the Satellite, they should be
|
||||||
|
able to provide a provider-specific signature, and requirements about
|
||||||
|
customer-specific requirements, if any.
|
||||||
|
|
||||||
|
### SOC2/HIPAA/etc node certification
|
||||||
|
|
||||||
|
This is actually just a slightly different shape of the private storage node
|
||||||
|
network problem, but instead of being provider-specific, it is property
|
||||||
|
specific.
|
||||||
|
|
||||||
|
Perhaps Project D has a compliance requirement. They can only store data
|
||||||
|
on nodes that meet specific requirements.
|
||||||
|
|
||||||
|
Node operators E and F are willing to conform and attest to these compliance
|
||||||
|
requirements, but don't know about project D. It would be nice if Node
|
||||||
|
operators E and F could navigate to a compliance portal and see a list of
|
||||||
|
potential compliance attestations available. For possible compliance
|
||||||
|
attestations, node operators could sign agreements for these, and then receive
|
||||||
|
a verified signature that shows their selected compliance options.
|
||||||
|
|
||||||
|
Then, Project D's node selection process would filter by nodes that had been
|
||||||
|
approved for the necessary compliance requirements.
|
||||||
|
|
||||||
|
### Voting and operator signaling
|
||||||
|
|
||||||
|
As Satellite operators ourselves, we are currently engaged in a discussion about
|
||||||
|
pricing changes with storage node operators. Future Satellite operators may find
|
||||||
|
themselves in similar situations. It would be nice if storage node operators
|
||||||
|
could indicate votes for values. This would potentially be more representative
|
||||||
|
of network sentiment than posts on a forum.
|
||||||
|
|
||||||
|
Note that this isn't a transparent voting scheme, where other voters can see
|
||||||
|
the votes made, so this may not be a great voting solution in general.
|
||||||
|
|
||||||
|
## Design and implementation
|
||||||
|
|
||||||
|
I believe there are two basic building blocks that solves all of the above
|
||||||
|
issues:
|
||||||
|
|
||||||
|
* Signed node tags (with potential values)
|
||||||
|
* A document signing service
|
||||||
|
|
||||||
|
### Signed node tags
|
||||||
|
|
||||||
|
The network representation:
|
||||||
|
|
||||||
|
```
|
||||||
|
message Tag {
|
||||||
|
// Note that there is a signal flat namespace of all names per
|
||||||
|
// signer node id. Signers should be careful to make sure that
|
||||||
|
// there are no name collisions. For self-signed content-hash
|
||||||
|
// based values, the name should have the prefix of the content
|
||||||
|
// hash.
|
||||||
|
string name = 1;
|
||||||
|
bytes value = 2; // optional, representation dependent on name.
|
||||||
|
}
|
||||||
|
|
||||||
|
message TagSet {
|
||||||
|
// must always be set. this is the node the signer is signing for.
|
||||||
|
bytes node_id = 1;
|
||||||
|
|
||||||
|
repeated Tag tags = 2;
|
||||||
|
|
||||||
|
// must always be set. this makes sure the signature is signing the
|
||||||
|
// timestamp inside.
|
||||||
|
int64 timestamp = 3;
|
||||||
|
}
|
||||||
|
|
||||||
|
message SignedTagSet {
|
||||||
|
// this is the seralized form of TagSet, serialized so that
|
||||||
|
// the signature process has something stable to work with.
|
||||||
|
bytes serialized_tag = 1;
|
||||||
|
|
||||||
|
// this is who signed (could be self signed, could be well known).
|
||||||
|
bytes signer_node_id = 3;
|
||||||
|
bytes signature = 4;
|
||||||
|
}
|
||||||
|
|
||||||
|
message SignedTagSets {
|
||||||
|
repeated SignedTagSet tags = 1;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Note that every tag is signing a name/value pair (value optional) against
|
||||||
|
a specific node id.
|
||||||
|
|
||||||
|
Note also that names are only unique within the namespace of a given signer.
|
||||||
|
|
||||||
|
The database representation on the Satellite. N.B.: nothing should be entered
|
||||||
|
into this database without validation:
|
||||||
|
|
||||||
|
```
|
||||||
|
model signed_tags (
|
||||||
|
field node_id blob
|
||||||
|
field name text
|
||||||
|
field value blob
|
||||||
|
field timestamp int64
|
||||||
|
field signer_node_id blob
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
The "signer_node_id" is worth more explanation. Every signer should have a
|
||||||
|
stable node id. Satellites and storage nodes already have one, but any other
|
||||||
|
service that validates node tags would also need one.
|
||||||
|
In particular, the document signing service (below) would have its own unique
|
||||||
|
node id for signing tags, whereas for voting-style tags or tags based on a
|
||||||
|
content-addressed identifier (e.g. a hash of a document), the nodes would
|
||||||
|
self-sign.
|
||||||
|
|
||||||
|
### Document signing service
|
||||||
|
|
||||||
|
We would start a small web service, where users can log in and sign and fill
|
||||||
|
out documents. This web service would then create a unique activation code
|
||||||
|
that storage node operators could run on their storage nodes for activation and
|
||||||
|
signing. They could run `storagenode activate <code>` and then the node would
|
||||||
|
reach out to the signing service and get a `SignedTag` related to that node
|
||||||
|
given the information the user provided. The node could then present these
|
||||||
|
to the satellite.
|
||||||
|
|
||||||
|
Ultimately, the document signing service will require a separate design doc,
|
||||||
|
but here are some considerations for it:
|
||||||
|
|
||||||
|
Activation codes must expire shortly. Even Netflix has two hours of validity
|
||||||
|
for their service code - for a significantly less critical use case. What would
|
||||||
|
be a usable validity time for our use case? 15 minutes? 1 hour? Should we make
|
||||||
|
it configurable?
|
||||||
|
|
||||||
|
We want to still keep usability in mind for a SNO who needs to activate 500
|
||||||
|
nodes.
|
||||||
|
|
||||||
|
It would be even better if the SNO could force invalidating the activation code
|
||||||
|
when they are done with it.
|
||||||
|
|
||||||
|
As activation codes expire, the SNO should be able to generate a new activation
|
||||||
|
code if they want to associate a new node to an already signed document.
|
||||||
|
|
||||||
|
It should be hard to brute-force activation codes. They shouldn't be simple
|
||||||
|
numbers (4-digit or 6-digit) but something as complex as UUID.
|
||||||
|
|
||||||
|
It's also possible that SNO uses some signature mechanism during signing service
|
||||||
|
authentication, and the same signature is used for activation. If the same
|
||||||
|
signature mechanism is used during activation then no token is necessary.
|
||||||
|
|
||||||
|
### Update node selection
|
||||||
|
|
||||||
|
Once the above two building blocks exist, many problems become much more easily
|
||||||
|
solvable.
|
||||||
|
|
||||||
|
We would want to extend node selection to be able to do queries,
|
||||||
|
given project-specific configuration, based on these signed_tag values.
|
||||||
|
|
||||||
|
Because node selection mostly happens in memory from cached node table data,
|
||||||
|
it should be easy to add some denormalized data for certain selected cases,
|
||||||
|
such as:
|
||||||
|
|
||||||
|
* Document hashes nodes have self signed.
|
||||||
|
* Approval states based on well known third party signer nodes (a KYC service).
|
||||||
|
|
||||||
|
Once these fields exist, then node selection can happen as before, filtering
|
||||||
|
for the appropriate value given project settings.
|
||||||
|
|
||||||
|
## How these building blocks work for the example use cases
|
||||||
|
|
||||||
|
### 1099 KYC
|
||||||
|
|
||||||
|
The document signing service would have a KYC (Know Your Customer) form. Once
|
||||||
|
filled out, the document signing service would make a `TagSet` that includes all
|
||||||
|
of the answers to the KYC questions, for the given node id, signed by the
|
||||||
|
document signing service's node id.
|
||||||
|
|
||||||
|
The node would hang on to this `SignedTagSet` and submit it along with others
|
||||||
|
in a `SignedTagSets` to Satellites occasionally (maybe once a month during
|
||||||
|
node CheckIn).
|
||||||
|
|
||||||
|
### Private storage node networks
|
||||||
|
|
||||||
|
Storage node provisioning would provide nodes with a signed `SignedTagSet`
|
||||||
|
from a provisioning service that had its own node id. Then a private Satellite
|
||||||
|
could be configured to require that all nodes present a `SignedTagSet` signed
|
||||||
|
by the configured provisioning service that has that node's id in it.
|
||||||
|
|
||||||
|
Notably - this functionality could also be solved by the older waitlist node
|
||||||
|
identity signing certificate process, but we are slowly removing what remains
|
||||||
|
of that feature over time.
|
||||||
|
|
||||||
|
This functionality could also be solved by setting the Satellite's minimum
|
||||||
|
allowable node id difficulty to the maximum possible difficulty, thus preventing
|
||||||
|
any automatic node registration, and manually inserting node ids into the
|
||||||
|
database. This is what we are currently doing for private network trials, but
|
||||||
|
if `SignedTagSet`s existed, that would be easier.
|
||||||
|
|
||||||
|
### SOC2/HIPAA/etc node certification
|
||||||
|
|
||||||
|
For any type of document that doesn't require any third party service
|
||||||
|
(such as government id validation, etc), the document and its fields can be
|
||||||
|
filled out and self signed by the node, along with a content hash of the
|
||||||
|
document in question.
|
||||||
|
|
||||||
|
The node would create a `TagSet`, where one field is the hash of the legal
|
||||||
|
document that was agreed upon, and the remaining fields (with names prefixed
|
||||||
|
by the document's content hash) would be form fields
|
||||||
|
that the node operator filled in and ascribed to the document. Then, the
|
||||||
|
`TagSet` would be signed by the node itself. The cryptographic nature of the
|
||||||
|
content hash inside the `TagSet` would validate what the node operator had
|
||||||
|
agreed to.
|
||||||
|
|
||||||
|
### Voting and operator signaling
|
||||||
|
|
||||||
|
Node operators could self sign additional `Tag`s inside of a miscellaneous
|
||||||
|
`TagSet`, including `Tag`s such as
|
||||||
|
|
||||||
|
```
|
||||||
|
"storage-node-vote-20230611-network-change": "yes"
|
||||||
|
```
|
||||||
|
|
||||||
|
Or similar.
|
||||||
|
|
||||||
|
## Open problems
|
||||||
|
|
||||||
|
* Revocation? - `TagSets` have a timestamp inside that must be filled out. In
|
||||||
|
The future, certain tags could have an expiry or updated values or similar.
|
||||||
|
|
||||||
|
## Other options
|
||||||
|
|
||||||
|
## Wrapup
|
||||||
|
|
||||||
|
## Related work
|
Loading…
Reference in New Issue
Block a user