# Storage Node Satellite Selection

## Abstract

This document details an enhanced method of Satellite selection and maintenance for Storage Node operators.

## Background

With the removal of Kademlia, Storage Nodes need a way to identify and select the Satellites they interact with. Satellite selection is currently implemented via a list of whitelisted Satellite URLs in the configuration file. The list defaults to well-known Satellites hard-coded into the Storage Node binary.

This method is simple and easy to configure at first-time setup, but it requires manual maintenance of the list going forward. The ideal solution would be just as easy to set up in the common case while removing the burden of future maintenance.

## Design

The proposed design discovers trusted Satellites from externally maintained lists published by trusted sources, with the ability to manually trust or block individual Satellites.

### Concepts

#### Satellite URL

A Satellite URL holds all the information needed to contact and identify a Satellite. It consists of an optional scheme (e.g. `storj://`), an ID, and an address. The address **MUST** contain both a host and a port for the purposes of this feature.

The following are all examples of valid Satellite URLs:

```
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777
storj://12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777
```

The following are invalid Satellite URLs (missing or partial IDs):

```
us-central-1.tardigrade.io:7777
12EayRS2@us-central-1.tardigrade.io:7777
storj://us-central-1.tardigrade.io:7777
storj://12EayRS2@us-central-1.tardigrade.io:7777
```

#### Trusted Satellite List

The Trusted Satellite List is a text document where each line is the Satellite URL of a trusted Satellite.
An example Trusted Satellite List:

```
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777
121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@asia-east-1.tardigrade.io:7777
```

### Storage Node Configuration

The Storage Node configuration for Satellite selection is a list of one or more entries, where each entry is one of the following:

* Trusted Satellite List URL
* Trusted Satellite URL
* Untrusted Satellite (identified by ID, host, or Satellite URL)

#### Trusted Satellite List URL Entry

This entry contains a URL from which a Trusted Satellite List can be downloaded. Supported schemes are `file://`, `http://` and `https://`. When using HTTP, an `https://` URL should be preferred over an `http://` URL to ensure transport security and prevent a person-in-the-middle from tampering with the list.

Examples:

```
https://www.tardigrade.io/trusted-satellites
file:///some/path/to/trusted-satellites.txt
```

#### Trusted Satellite URL Entry

This entry contains the URL of an explicitly trusted Satellite. The format of the entry is a Satellite URL.

#### Untrusted Satellite Entry

This entry identifies an explicitly untrusted Satellite. The format of the entry is a `!` followed by one of the following:

* A Satellite ID followed by an `@` (to distinguish it from a host):

  ```
  !121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@
  ```

* A Satellite host. If the host is a domain name, its subdomains are also untrusted (e.g. `!tardigrade.io` also blocks `us-central-1.tardigrade.io`):

  ```
  !tardigrade.io
  !us-central-1.tardigrade.io
  !us-east-1.tardigrade.io
  ```

* A Satellite URL:

  ```
  !121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@us-central-1.tardigrade.io:7777
  ```

### Building the List of Trusted Satellite URLs

To build the list of trusted Satellite URLs, each entry in the configuration is traversed in order, parsed, and processed accordingly:

1. If the entry begins with a `!`, it is an Untrusted Satellite entry. It is added to the `untrusted` list, which is used later.
1. If the entry begins with `file://`, `http://`, or `https://`, it is a Trusted Satellite List URL. The URL is used to fetch a list of trusted Satellite URLs, which are added to the `trusted` list.
1. If the entry begins with `storj://`, or otherwise does not use a scheme, it is a Trusted Satellite URL entry. It is added to the `trusted` list.
1. If an entry does not match any of the above (for example, it uses an unsupported scheme), it is a configuration error.

If a Trusted Satellite List cannot be fetched, a warning should be logged. If available, the last known copy of the list from that URL should be used instead. Storage Nodes should attempt to persist downloaded lists; if they cannot, a warning should be logged.

After all configuration entries have been processed, each URL in the `trusted` list is checked against the `untrusted` list and removed if it matches any entry. An `untrusted` entry matches a URL using the following criteria:

* When the `untrusted` entry is just a Satellite ID, it matches any URL with that ID.
* When the `untrusted` entry is just a host, it matches any URL with the same host. If the host is a domain name, the entry also matches URLs whose host is a subdomain of it.
* When the `untrusted` entry is a full Satellite URL, it matches any URL equal to it.

After the `trusted` list has been pruned, the remaining URLs are aggregated according to the following rules:

* A Satellite URL is considered _authoritative_ if it meets any of the following criteria:
  * It was configured via a Trusted Satellite URL entry.
  * It was configured via a `file://` Trusted Satellite List URL.
  * It was configured via an `https://` or `http://` Trusted Satellite List URL AND its host is the domain name in that Trusted Satellite List URL or a subdomain of it.
* Satellite URLs are equivalent if their address portions are equal.
* When aggregating equivalent Satellite URLs (i.e. matching addresses) with differing IDs, the _authoritative_ Satellite URL wins. If neither or both are _authoritative_, the one aggregated first wins.
#### Example

Consider the following Trusted Satellite List URLs and their contents. For brevity's sake, each ID is shortened to a single digit (real configurations **MUST** specify the full ID).

* `file:///path/to/some/trusted-satellites.txt`

  ```
  1@bar.test:7777
  ```

* `https://foo.test/trusted-satellites`

  ```
  2@f.foo.test:7777
  2@buz.test:7777
  2@qiz.test:7777
  5@ohno.test:7777
  ```

* `https://bar.test/trusted-satellites`

  ```
  3@f.foo.test:7777
  3@bar.test:7777
  3@baz.test:7777
  3@buz.test:7777
  3@quz.test:7777
  ```

* `https://baz.test/trusted-satellites`

  ```
  4@baz.test:7777
  4@qiz.test:7777
  4@subdomain.quz.test:7777
  ```

Now consider the following configuration:

```
- !quz.test
- file:///path/to/some/trusted-satellites.txt
- https://foo.test/trusted-satellites
- https://bar.test/trusted-satellites
- https://baz.test/trusted-satellites
- 0@f.foo.test:7777
- !2@qiz.test:7777
- !5@
```

After expanding each entry, we have the following unaggregated `trusted` list:

```
1@bar.test:7777   (authoritative due to file:// URL)
2@f.foo.test:7777 (authoritative due to foo.test domain)
2@buz.test:7777
2@qiz.test:7777
5@ohno.test:7777
3@f.foo.test:7777
3@bar.test:7777   (authoritative due to bar.test domain)
3@baz.test:7777
3@buz.test:7777
3@quz.test:7777
4@baz.test:7777   (authoritative due to baz.test domain)
4@qiz.test:7777
4@subdomain.quz.test:7777
0@f.foo.test:7777 (authoritative due to explicit configuration)
```

And the following `untrusted` list:

```
quz.test
2@qiz.test:7777
5@
```

Pruning the `trusted` list with the `untrusted` list removes `3@quz.test:7777` and `4@subdomain.quz.test:7777` (host and subdomain match), `2@qiz.test:7777` (URL match), and `5@ohno.test:7777` (ID match), leaving the following `trusted` list:

```
1@bar.test:7777   (authoritative due to file:// URL)
2@f.foo.test:7777 (authoritative due to foo.test domain)
2@buz.test:7777
3@f.foo.test:7777
3@bar.test:7777   (authoritative due to bar.test domain)
3@baz.test:7777
3@buz.test:7777
4@baz.test:7777   (authoritative due to baz.test domain)
4@qiz.test:7777
0@f.foo.test:7777 (authoritative due to explicit configuration)
```

We aggregate from top to bottom (i.e. in the order the URLs were specified/fetched) and are left with the following:

```
1@bar.test:7777
2@f.foo.test:7777
2@buz.test:7777
4@baz.test:7777
4@qiz.test:7777
```

* `1@bar.test:7777` was selected because, even though `3@bar.test:7777` was also authoritative, `1@bar.test:7777` came first.
* `2@f.foo.test:7777` was selected because it was authoritative over `3@f.foo.test:7777` and came before `0@f.foo.test:7777`, which was also authoritative.
* `2@buz.test:7777` was selected because it came before `3@buz.test:7777` (neither was authoritative).
* `4@baz.test:7777` was selected because it was authoritative over `3@baz.test:7777`, even though the latter came first.
* `4@qiz.test:7777` was selected because it was the only remaining URL for `qiz.test:7777`.

### Rebuilding the List of Trusted Satellite URLs

The list of trusted Satellite URLs should be recalculated daily (with some jitter).

### Backwards Compatibility

The old piecestore configuration (i.e. `piecestore.OldConfig`) currently contains a comma-separated list of trusted Satellite URLs (`WhitelistedSatellites`). It defaults to the current list of known good Satellites. On startup, if the new configuration is unset, the old configuration should be used to form a fixed set of trusted Satellite URLs.

## Open Issues

* How long should Storage Nodes use cached/persisted lists? Should lists be persisted at all?
* If aggregation yields no URLs (e.g. because the list URL is unreachable), should we default to anything? How should this be reported?
* If block listing removes all URLs, how should this be reported?
* Can we safely auto-migrate Storage Nodes to this new method of management?
* How long does the Storage Node wait before garbage collecting pieces from Satellites it no longer trusts? Should this be a manual operation?

## To Do

* Implement an endpoint at `https://www.tardigrade.io/trusted-satellites` to return the default list of trusted Satellites.
* Implement a `trust.ListConfig` configuration struct that:
  * Contains the list of entries (with a release default of a single list URL, `https://www.tardigrade.io/trusted-satellites`)
  * Contains a refresh interval
  * Maintains backwards compatibility with `WhitelistedSatellites` in `piecestore.OldConfig`
* Implement `storj.io/storj/storagenode/trust.List` that:
  * Consumes `trust.ListConfig` for configuration
  * Performs the initial fetching and building of trusted Satellite URLs
  * Updates according to the refresh interval (with jitter)
* Refactor `storj.io/storj/storagenode/trust.Pool` to use `trust.List`