docs/blueprints: certified nodes

Change-Id: I7c670d740e4c3d1035dee145ed65aaed4cbaba0c
2023-03-23 15:11:51 -04:00 · 2023-03-23 15:11:51 -04:00 · a85c080509
commit a85c080509
parent a4d68b9b7e
1 changed files with 273 additions and 0 deletions
--- a/docs/blueprints/certified-nodes.md
+++ b/docs/blueprints/certified-nodes.md
@ -0,0 +1,273 @@
 # Node and operator certification
 ## Abstract
 This is a proposal for a small feature and service that allows for nodes and
 operators to have signed tags of certain kinds for use in project-specific or
 Satellite-specific node selection.
 ## Background/context
 We have a couple of ongoing needs:
 * 1099 KYC
 * Private storage node networks
 * SOC2/HIPAA/etc node certification
 * Voting and operator signaling
 ### 1099 KYC
 The United States has a rule that if node operators earn more than $600/year,
 we need to file a 1099 for each of them. Our current way of dealing with this
 is manual and time consuming, and so it would be nice to automate it.
 Ultimately, we should be able to automatically:
 1) keep track of which nodes are run by operators under or over the $600
   threshold.
 2) keep track of if an automated KYC service has signed off that we have the
   necessary information to file a 1099.
 3) automatically suspend nodes that have earned more than $600 but have not
   provided legally required information.
 ### Private storage node networks
 We have seen growing interest from customers that want to bring their own
 hard drives, or be extremely choosy about the nodes they are willing to work
 with. The current way we are solving this is spinning up private Satellites
 that are configured to only work with the nodes those customers provide, but
 it would be better if we didn't have to start custom Satellites for this.
 Instead, it would be nice to have a per-project configuration on an existing
 Satellite that allowed that project to specify a specific subset of verified
 or validated nodes, e.g., Project A should be able to say only nodes from
 node providers B and C should be selected. Symmetrically, Nodes from providers
 B and C may only want to accept data from certain projects, like Project A.
 When nodes from providers B and C are added to the Satellite, they should be
 able to provide a provider-specific signature, and requirements about
 customer-specific requirements, if any.
 ### SOC2/HIPAA/etc node certification
 This is actually just a slightly different shape of the private storage node
 network problem, but instead of being provider-specific, it is property
 specific.
 Perhaps Project D has a compliance requirement. They can only store data
 on nodes that meet specific requirements.
 Node operators E and F are willing to conform and attest to these compliance
 requirements, but don't know about project D. It would be nice if Node
 operators E and F could navigate to a compliance portal and see a list of
 potential compliance attestations available. For possible compliance
 attestations, node operators could sign agreements for these, and then receive
 a verified signature that shows their selected compliance options.
 Then, Project D's node selection process would filter by nodes that had been
 approved for the necessary compliance requirements.
 ### Voting and operator signaling
 As Satellite operators ourselves, we are currently engaged in a discussion about
 pricing changes with storage node operators. Future Satellite operators may find
 themselves in similar situations. It would be nice if storage node operators
 could indicate votes for values. This would potentially be more representative
 of network sentiment than posts on a forum.
 Note that this isn't a transparent voting scheme, where other voters can see
 the votes made, so this may not be a great voting solution in general.
 ## Design and implementation
 I believe there are two basic building blocks that solves all of the above
 issues:
 * Signed node tags (with potential values)
 * A document signing service
 ### Signed node tags
 The network representation:
 ```
 message Tag {
    // Note that there is a signal flat namespace of all names per
    // signer node id. Signers should be careful to make sure that
    // there are no name collisions. For self-signed content-hash
    // based values, the name should have the prefix of the content
    // hash.
    string name = 1;
    bytes value = 2; // optional, representation dependent on name.
 }
 message TagSet {
    // must always be set. this is the node the signer is signing for.
    bytes node_id = 1;
    repeated Tag tags = 2;
    // must always be set. this makes sure the signature is signing the
    // timestamp inside.
    int64 timestamp = 3;
 }
 message SignedTagSet {
    // this is the seralized form of TagSet, serialized so that
    // the signature process has something stable to work with.
    bytes serialized_tag = 1;
    // this is who signed (could be self signed, could be well known).
    bytes signer_node_id = 3;
    bytes signature = 4;
 }
 message SignedTagSets {
    repeated SignedTagSet tags = 1;
 }
 ```
 Note that every tag is signing a name/value pair (value optional) against
 a specific node id.
 Note also that names are only unique within the namespace of a given signer.
 The database representation on the Satellite. N.B.: nothing should be entered
 into this database without validation:
 ```
 model signed_tags (
    field node_id            blob
    field name               text
    field value              blob
    field timestamp          int64
    field signer_node_id     blob
 )
 ```
 The "signer_node_id" is worth more explanation. Every signer should have a
 stable node id. Satellites and storage nodes already have one, but any other
 service that validates node tags would also need one.
 In particular, the document signing service (below) would have its own unique
 node id for signing tags, whereas for voting-style tags or tags based on a
 content-addressed identifier (e.g. a hash of a document), the nodes would
 self-sign.
 ### Document signing service
 We would start a small web service, where users can log in and sign and fill
 out documents. This web service would then create a unique activation code
 that storage node operators could run on their storage nodes for activation and
 signing. They could run `storagenode activate <code>` and then the node would
 reach out to the signing service and get a `SignedTag` related to that node
 given the information the user provided. The node could then present these
 to the satellite.
 Ultimately, the document signing service will require a separate design doc,
 but here are some considerations for it:
 Activation codes must expire shortly. Even Netflix has two hours of validity
 for their service code - for a significantly less critical use case. What would
 be a usable validity time for our use case? 15 minutes? 1 hour? Should we make
 it configurable?
 We want to still keep usability in mind for a SNO who needs to activate 500
 nodes.
 It would be even better if the SNO could force invalidating the activation code
 when they are done with it.
 As activation codes expire, the SNO should be able to generate a new activation
 code if they want to associate a new node to an already signed document.
 It should be hard to brute-force activation codes. They shouldn't be simple
 numbers (4-digit or 6-digit) but something as complex as UUID.
 It's also possible that SNO uses some signature mechanism during signing service
 authentication, and the same signature is used for activation. If the same
 signature mechanism is used during activation then no token is necessary.
 ### Update node selection
 Once the above two building blocks exist, many problems become much more easily
 solvable.
 We would want to extend node selection to be able to do queries,
 given project-specific configuration, based on these signed_tag values.
 Because node selection mostly happens in memory from cached node table data,
 it should be easy to add some denormalized data for certain selected cases,
 such as:
 * Document hashes nodes have self signed.
 * Approval states based on well known third party signer nodes (a KYC service).
 Once these fields exist, then node selection can happen as before, filtering
 for the appropriate value given project settings.
 ## How these building blocks work for the example use cases
 ### 1099 KYC
 The document signing service would have a KYC (Know Your Customer) form. Once
 filled out, the document signing service would make a `TagSet` that includes all
 of the answers to the KYC questions, for the given node id, signed by the
 document signing service's node id.
 The node would hang on to this `SignedTagSet` and submit it along with others
 in a `SignedTagSets` to Satellites occasionally (maybe once a month during
 node CheckIn).
 ### Private storage node networks
 Storage node provisioning would provide nodes with a signed `SignedTagSet`
 from a provisioning service that had its own node id. Then a private Satellite
 could be configured to require that all nodes present a `SignedTagSet` signed
 by the configured provisioning service that has that node's id in it.
 Notably - this functionality could also be solved by the older waitlist node
 identity signing certificate process, but we are slowly removing what remains
 of that feature over time.
 This functionality could also be solved by setting the Satellite's minimum
 allowable node id difficulty to the maximum possible difficulty, thus preventing
 any automatic node registration, and manually inserting node ids into the
 database. This is what we are currently doing for private network trials, but
 if `SignedTagSet`s existed, that would be easier.
 ### SOC2/HIPAA/etc node certification
 For any type of document that doesn't require any third party service
 (such as government id validation, etc), the document and its fields can be
 filled out and self signed by the node, along with a content hash of the
 document in question.
 The node would create a `TagSet`, where one field is the hash of the legal
 document that was agreed upon, and the remaining fields (with names prefixed
 by the document's content hash) would be form fields
 that the node operator filled in and ascribed to the document. Then, the
 `TagSet` would be signed by the node itself. The cryptographic nature of the
 content hash inside the `TagSet` would validate what the node operator had
 agreed to.
 ### Voting and operator signaling
 Node operators could self sign additional `Tag`s inside of a miscellaneous
 `TagSet`, including `Tag`s such as
 ```
 "storage-node-vote-20230611-network-change": "yes"
 ```
 Or similar.
 ## Open problems
 * Revocation? - `TagSets` have a timestamp inside that must be filled out. In
  The future, certain tags could have an expiry or updated values or similar.
 ## Other options
 ## Wrapup
 ## Related work