# Node and operator certification

## Abstract

This is a proposal for a small feature and service that allows nodes and
operators to carry signed tags of certain kinds, for use in project-specific or
Satellite-specific node selection.

## Background/context

We have several ongoing needs:

* 1099 KYC
* Private storage node networks
* SOC2/HIPAA/etc node certification
* Voting and operator signaling

### 1099 KYC

The United States requires us to file a 1099 for each node operator who earns
more than $600 in a year. Our current process for this is manual and
time-consuming, so we would like to automate it.

Ultimately, we should be able to automatically:

1) keep track of which nodes are run by operators under or over the $600
   threshold.
2) keep track of whether an automated KYC service has signed off that we have
   the necessary information to file a 1099.
3) suspend nodes that have earned more than $600 but have not provided the
   legally required information.

### Private storage node networks

We have seen growing interest from customers who want to bring their own
hard drives, or who want to be extremely choosy about the nodes they work
with. We currently handle this by spinning up private Satellites configured
to work only with the nodes those customers provide, but it would be better
not to have to start custom Satellites for this.

Instead, it would be nice to have a per-project configuration on an existing
Satellite that allows a project to restrict itself to a subset of verified
or validated nodes. For example, Project A should be able to say that only
nodes from node providers B and C should be selected. Symmetrically, nodes
from providers B and C may only want to accept data from certain projects,
such as Project A.

When nodes from providers B and C are added to the Satellite, they should be
able to present a provider-specific signature, along with any
customer-specific requirements.

### SOC2/HIPAA/etc node certification

This is really just a slightly different shape of the private storage node
network problem: instead of being provider-specific, it is
property-specific.

For example, Project D may have a compliance requirement: it can only store
data on nodes that meet specific criteria.

Node operators E and F are willing to conform to and attest to these
compliance requirements, but don't know about Project D. It would be nice if
node operators E and F could navigate to a compliance portal and see a list of
the available compliance attestations. Node operators could sign agreements
for the attestations they choose, and then receive a verified signature that
shows their selected compliance options.

Project D's node selection process would then filter for nodes that had been
approved for the necessary compliance requirements.

### Voting and operator signaling

As Satellite operators ourselves, we are currently discussing pricing changes
with storage node operators. Future Satellite operators may find themselves in
similar situations. It would be nice if storage node operators could cast votes
on proposed values. This would potentially be more representative of network
sentiment than posts on a forum.

Note that this is not a transparent voting scheme (other voters cannot see the
votes cast), so it may not be a good voting solution in general.

## Design and implementation

I believe there are two basic building blocks that solve all of the above
issues:

* Signed node tags (with optional values)
* A document signing service

### Signed node tags

The network representation:

```
syntax = "proto3";

message Tag {
  // Note that there is a single flat namespace of all names per
  // signer node id. Signers should be careful to make sure that
  // there are no name collisions. For self-signed content-hash
  // based values, the name should be prefixed with the content
  // hash.
  string name = 1;
  bytes value = 2; // optional, representation dependent on name.
}

message TagSet {
  // must always be set. this is the node the signer is signing for.
  bytes node_id = 1;

  repeated Tag tags = 2;

  // must always be set. this makes sure the signature is signing the
  // timestamp inside.
  int64 timestamp = 3;
}

message SignedTagSet {
  // this is the serialized form of TagSet, serialized so that
  // the signature process has something stable to work with.
  bytes serialized_tag = 1;

  // this is who signed (could be self signed, could be well known).
  bytes signer_node_id = 3;
  bytes signature = 4;
}

message SignedTagSets {
  repeated SignedTagSet tags = 1;
}
```

Note that every tag is a name/value pair (value optional) signed against a
specific node id.

Note also that names are only unique within the namespace of a given signer.

The database representation on the Satellite is as follows (N.B.: nothing
should be entered into this database without validation):

```
model signed_tags (
    // names are only unique per signer, per node
    key node_id signer_node_id name

    field node_id        blob
    field name           text
    field value          blob ( nullable )
    field timestamp      int64
    field signer_node_id blob
)
```

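Once a `SignedTagSet` has been verified, each `Tag` becomes one `signed_tags` row. The Go sketch below uses a map as a hypothetical in-memory stand-in for the table, keyed by node id, signer node id, and name, since names are only unique per signer. The rule that an existing row is only replaced by a `TagSet` with a newer timestamp is an assumption for illustration (compare the revocation question under Open problems), and `Table`, `InsertVerified`, and `SignedTagRow` are made-up names.

```go
package main

import "fmt"

// Hypothetical Go mirrors of the protobuf TagSet and a signed_tags row.
type Tag struct {
	Name  string
	Value []byte
}

type TagSet struct {
	NodeID    []byte
	Tags      []Tag
	Timestamp int64
}

type SignedTagRow struct {
	NodeID       []byte
	Name         string
	Value        []byte
	Timestamp    int64
	SignerNodeID []byte
}

// rowKey: names are only unique per signer, per node.
type rowKey struct{ node, signer, name string }

// Table is an in-memory stand-in for the signed_tags table.
type Table map[rowKey]SignedTagRow

// InsertVerified must only be called with a TagSet whose signature has already
// been verified. It writes one row per tag; an existing row is only replaced by
// one with a newer timestamp (an assumption of this sketch).
func (tbl Table) InsertVerified(ts TagSet, signerID []byte) {
	for _, tag := range ts.Tags {
		k := rowKey{string(ts.NodeID), string(signerID), tag.Name}
		if old, ok := tbl[k]; ok && old.Timestamp >= ts.Timestamp {
			continue // keep the existing, newer (or equally new) value
		}
		tbl[k] = SignedTagRow{
			NodeID:       ts.NodeID,
			Name:         tag.Name,
			Value:        tag.Value,
			Timestamp:    ts.Timestamp,
			SignerNodeID: signerID,
		}
	}
}

func main() {
	tbl := Table{}
	tier := func(v string, at int64) TagSet {
		return TagSet{NodeID: []byte("n1"), Tags: []Tag{{Name: "tier", Value: []byte(v)}}, Timestamp: at}
	}
	tbl.InsertVerified(tier("gold", 2), []byte("s1"))
	tbl.InsertVerified(tier("silver", 1), []byte("s1")) // older, ignored
	fmt.Println(len(tbl), string(tbl[rowKey{"n1", "s1", "tier"}].Value)) // 1 gold
}
```
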
The "signer_node_id" is worth more explanation. Every signer should have a
stable node id. Satellites and storage nodes already have one, but any other
service that validates and signs node tags would also need one. In particular,
the document signing service (below) would have its own unique node id for
signing tags, whereas for voting-style tags or tags based on a
content-addressed identifier (e.g. a hash of a document), the nodes would
self-sign.

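The distinction between self-signed tags and tags from a well-known signer could be enforced with a small policy check that runs after signature verification. This Go sketch is one hypothetical way to do it: `ClassifySigner`, `SignerKind`, and the `trusted` set (which would hold, for example, the document signing service's node id) are assumptions, not an existing API.

```go
package main

import (
	"bytes"
	"fmt"
)

// SignerKind describes why a signature over a TagSet is acceptable.
type SignerKind int

const (
	Untrusted  SignerKind = iota
	SelfSigned            // the node vouches for itself (votes, content-hash tags)
	WellKnown             // e.g. the document signing service or a KYC provider
)

// ClassifySigner compares the node id inside the signed TagSet with the
// signer's node id, falling back to a configured set of trusted signers.
func ClassifySigner(nodeID, signerID []byte, trusted map[string]bool) SignerKind {
	switch {
	case bytes.Equal(nodeID, signerID):
		return SelfSigned
	case trusted[string(signerID)]:
		return WellKnown
	default:
		return Untrusted
	}
}

func main() {
	trusted := map[string]bool{"doc-signing-service": true}
	fmt.Println(ClassifySigner([]byte("n1"), []byte("n1"), trusted) == SelfSigned)                  // true
	fmt.Println(ClassifySigner([]byte("n1"), []byte("doc-signing-service"), trusted) == WellKnown) // true
	fmt.Println(ClassifySigner([]byte("n1"), []byte("n2"), trusted) == Untrusted)                   // true
}
```

Node selection could then decide, per project, which kinds of signers it honors for which tag names.
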
### Document signing service

We would start a small web service where users can log in, fill out, and sign
documents. This web service would then create a unique activation code that
storage node operators could use on their storage nodes for activation and
signing. They could run `storagenode activate <code>`, and the node would then
reach out to the signing service and get a `SignedTagSet` for that node, based
on the information the user provided. The node could then present these to the
Satellite.

Ultimately, the document signing service will require a separate design doc,
but here are some considerations for it:

Activation codes must expire quickly. Even Netflix limits its service codes to
two hours of validity, for a significantly less critical use case. What would be
a usable validity period for our use case? 15 minutes? 1 hour? Should we make it
configurable?

We still want to keep usability in mind for a storage node operator (SNO) who
needs to activate 500 nodes.

It would be even better if the SNO could force the activation code to be
invalidated once they are done with it.

Since activation codes expire, the SNO should be able to generate a new
activation code if they want to associate a new node with an already signed
document.

It should be hard to brute-force activation codes. They shouldn't be simple
numbers (4-digit or 6-digit) but something as complex as a UUID.

It's also possible that the SNO uses some signature mechanism when
authenticating to the signing service, in which case the same signature could
be used for activation. If the same signature mechanism is used during
activation, then no token is necessary.

### Update node selection

Once the above two building blocks exist, many problems become much easier to
solve.

We would want to extend node selection to support queries based on these
`signed_tags` values, given project-specific configuration.

Because node selection mostly happens in memory from cached node table data,
it should be easy to add some denormalized data for certain selected cases,
such as:

* Document hashes that nodes have self-signed.
* Approval states based on well-known third-party signer nodes (e.g. a KYC
  service).

Once these fields exist, node selection can happen as before, filtering for
the appropriate values given project settings.

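A hypothetical Go sketch of the in-memory filtering step: each cached node carries denormalized tags keyed by signer and name, and a project's configuration is a list of requirements that must all hold. `CachedNode`, `TagKey`, `Requirement`, and `FilterNodes` are illustrative names, not existing Satellite types. A requirement with several accepted values covers cases like "only nodes from node providers B and C".

```go
package main

import "fmt"

// TagKey identifies a denormalized tag: names are only unique per signer.
type TagKey struct{ Signer, Name string }

// CachedNode is a hypothetical cached node-table entry with denormalized tags.
type CachedNode struct {
	ID   string
	Tags map[TagKey]string
}

// Requirement is a hypothetical per-project rule: the node must carry this tag
// from this signer, with one of the accepted values (any value if AnyOf is empty).
type Requirement struct {
	Signer, Name string
	AnyOf        []string
}

func (r Requirement) matches(n CachedNode) bool {
	v, ok := n.Tags[TagKey{r.Signer, r.Name}]
	if !ok {
		return false
	}
	if len(r.AnyOf) == 0 {
		return true
	}
	for _, want := range r.AnyOf {
		if v == want {
			return true
		}
	}
	return false
}

// FilterNodes keeps only the nodes that meet every requirement; regular node
// selection can then run over the result as before.
func FilterNodes(nodes []CachedNode, reqs []Requirement) []CachedNode {
	var out []CachedNode
next:
	for _, n := range nodes {
		for _, r := range reqs {
			if !r.matches(n) {
				continue next
			}
		}
		out = append(out, n)
	}
	return out
}

func main() {
	provider := TagKey{Signer: "provisioning-service", Name: "provider"}
	nodes := []CachedNode{
		{ID: "n1", Tags: map[TagKey]string{provider: "B"}},
		{ID: "n2", Tags: map[TagKey]string{provider: "C"}},
		{ID: "n3", Tags: map[TagKey]string{provider: "X"}},
		{ID: "n4"},
	}
	projectA := []Requirement{{Signer: "provisioning-service", Name: "provider", AnyOf: []string{"B", "C"}}}
	for _, n := range FilterNodes(nodes, projectA) {
		fmt.Println(n.ID) // n1, then n2
	}
}
```
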
## How these building blocks work for the example use cases

### 1099 KYC

The document signing service would have a KYC (Know Your Customer) form. Once
it is filled out, the document signing service would create a `TagSet` that
includes all of the answers to the KYC questions for the given node id, signed
by the document signing service's node id.

The node would hold on to this `SignedTagSet` and submit it, along with
others, in a `SignedTagSets` message to Satellites occasionally (perhaps once a
month during node CheckIn).

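The suspension rule from the 1099 background could then be a simple check over verified tags. In this hypothetical Go sketch, earnings are tracked in cents per operator per year, verified tags are grouped by signer node id and then tag name, and the approval tag name `kyc-approved` issued by the document signing service is an assumption; the real tag names would be defined by the KYC form.

```go
package main

import "fmt"

// thresholdCents is the $600/year 1099 threshold, in cents.
const thresholdCents = 600_00

// NodeStatus is a hypothetical view the Satellite could assemble per node: the
// operator's earnings this year and the node's verified tags, grouped by
// signer node id and then tag name.
type NodeStatus struct {
	OperatorYearlyCents int64
	VerifiedTags        map[string]map[string]string
}

// ShouldSuspend reports whether the node's operator is over the threshold
// without an approval tag from the configured KYC signer.
func ShouldSuspend(n NodeStatus, kycSigner string) bool {
	if n.OperatorYearlyCents <= thresholdCents {
		return false
	}
	_, approved := n.VerifiedTags[kycSigner]["kyc-approved"] // hypothetical tag name
	return !approved
}

func main() {
	const kyc = "document-signing-service"
	fmt.Println(ShouldSuspend(NodeStatus{OperatorYearlyCents: 500_00}, kyc)) // false
	fmt.Println(ShouldSuspend(NodeStatus{OperatorYearlyCents: 700_00}, kyc)) // true
	approved := NodeStatus{
		OperatorYearlyCents: 700_00,
		VerifiedTags:        map[string]map[string]string{kyc: {"kyc-approved": "yes"}},
	}
	fmt.Println(ShouldSuspend(approved, kyc)) // false
}
```

Keying the approval by signer matters: a node cannot approve itself by self-signing a `kyc-approved` tag.
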
### Private storage node networks

Storage node provisioning would give nodes a `SignedTagSet` from a
provisioning service that has its own node id. A private Satellite could then
be configured to require that every node present a `SignedTagSet`, signed by
the configured provisioning service, that contains that node's id.

Notably, this functionality could also be provided by the older waitlist node
identity signing certificate process, but we are slowly removing what remains
of that feature.

This functionality could also be achieved by setting the Satellite's minimum
allowable node id difficulty to the maximum possible difficulty, thus
preventing any automatic node registration, and manually inserting node ids
into the database. This is what we are currently doing for private network
trials, but if `SignedTagSet`s existed, this would be easier.

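For the private Satellite case, the admission check reduces to finding one verified tag set from the configured provisioning service that names the node. This Go sketch assumes signatures were already checked; `VerifiedTagSet` and `Admit` are hypothetical names. Requiring the node id inside the signed `TagSet` to match is what keeps a node from reusing a tag set that was issued to a different node.

```go
package main

import (
	"bytes"
	"fmt"
)

// VerifiedTagSet is a hypothetical record of a SignedTagSet whose signature has
// already been checked; NodeID is taken from inside the signed TagSet.
type VerifiedTagSet struct {
	SignerNodeID []byte
	NodeID       []byte
}

// Admit reports whether a node may use a private Satellite: one of the tag sets
// it presented must be signed by the configured provisioning service AND name
// this node's id.
func Admit(nodeID, provisioner []byte, sets []VerifiedTagSet) bool {
	for _, s := range sets {
		if bytes.Equal(s.SignerNodeID, provisioner) && bytes.Equal(s.NodeID, nodeID) {
			return true
		}
	}
	return false
}

func main() {
	prov := []byte("provisioning-service")
	sets := []VerifiedTagSet{{SignerNodeID: prov, NodeID: []byte("node-a")}}
	fmt.Println(Admit([]byte("node-a"), prov, sets)) // true
	fmt.Println(Admit([]byte("node-b"), prov, sets)) // false: the set was issued to node-a
}
```
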
### SOC2/HIPAA/etc node certification

For any type of document that doesn't require a third-party service (such as
government ID validation), the document and its fields can be filled out by the
node operator and self-signed by the node, along with a content hash of the
document in question.

The node would create a `TagSet` in which one tag is the hash of the legal
document that was agreed to, and the remaining tags (with names prefixed by the
document's content hash) are the form fields that the node operator filled in
and ascribed to the document. The `TagSet` would then be signed by the node
itself. Because the content hash is inside the signed `TagSet`, the signature
is cryptographically bound to the exact document the node operator agreed to.

### Voting and operator signaling

Node operators could self-sign additional `Tag`s inside a miscellaneous
`TagSet`, including `Tag`s such as:

```
"storage-node-vote-20230611-network-change": "yes"
```

Or similar.

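Tallying a vote could then be a pass over verified `signed_tags` rows. In this hypothetical Go sketch (`VoteRow` and `Tally` are made-up names), only self-signed tags with the vote's name count, and each node counts once even if it appears in several rows. Counting per node rather than per operator is itself an assumption of the sketch; it gives operators with more nodes more weight.

```go
package main

import (
	"bytes"
	"fmt"
)

// VoteRow is a hypothetical verified signed_tags row.
type VoteRow struct {
	NodeID, SignerNodeID []byte
	Name                 string
	Value                []byte
}

// Tally counts the values cast for one vote tag name.
func Tally(rows []VoteRow, voteName string) map[string]int {
	byNode := make(map[string]string) // node id -> vote; a later row overrides an earlier one
	for _, r := range rows {
		if r.Name != voteName || !bytes.Equal(r.NodeID, r.SignerNodeID) {
			continue // a different vote, or not self-signed
		}
		byNode[string(r.NodeID)] = string(r.Value)
	}
	counts := make(map[string]int)
	for _, v := range byNode {
		counts[v]++
	}
	return counts
}

func main() {
	const vote = "storage-node-vote-20230611-network-change"
	rows := []VoteRow{
		{NodeID: []byte("a"), SignerNodeID: []byte("a"), Name: vote, Value: []byte("yes")},
		{NodeID: []byte("b"), SignerNodeID: []byte("b"), Name: vote, Value: []byte("no")},
		{NodeID: []byte("c"), SignerNodeID: []byte("kyc"), Name: vote, Value: []byte("yes")}, // not self-signed
	}
	fmt.Println(Tally(rows, vote)) // map[no:1 yes:1]
}
```
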
## Open problems

* Revocation? `TagSet`s have a timestamp inside that must be filled out. In
  the future, certain tags could have an expiry, updated values, or similar.

## Other options

## Wrapup

## Related work
|