# Storage Node Satellite Selection
## Abstract
This document details an enhanced method of Satellite selection and maintenance
for Storage Node operators.
## Background
With the removal of Kademlia, Storage Nodes need a way to identify and select
Satellites with whom to interact. The implementation of Satellite selection is
currently accomplished via a list of whitelisted Satellite URLs in the
configuration file. The list defaults to well-known satellites hard-coded into
the storage node binary. This method is simple and easy to configure at first
time setup, but unfortunately requires manual maintenance of the list going
forward.
The ideal solution would be just as easy to set up in the common case while
removing the burden of future maintenance.
## Design
The proposed design is to discover trusted Satellites from externally
maintained lists from trusted sources with the ability to manually trust/block
Satellites.
### Concepts
#### Satellite URL
A Satellite URL holds all the information needed to contact and identify a
Satellite. It is comprised of an optional scheme (i.e. `storj://`), an ID, and
an address.
The address **MUST** contain both a host and port for the purposes of this
feature.
The following are all examples of valid Satellite URLs:
```
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777
storj://12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777
```
The following are invalid Satellite URLs (missing or partial IDs):
```
us-central-1.tardigrade.io:7777
12EayRS2@us-central-1.tardigrade.io:7777
storj://us-central-1.tardigrade.io:7777
storj://12EayRS2@us-central-1.tardigrade.io:7777
```
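The validation rules above can be sketched in Go. This is a minimal illustration using only the standard library, not the storage node's actual parser: the `SatelliteURL` type, the `ParseSatelliteURL` function, and the `minIDLength` threshold used to reject partial IDs are all assumptions made for this example. A real implementation would decode and verify the ID rather than check its length.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// SatelliteURL is a hypothetical type holding the parts of a Satellite URL.
type SatelliteURL struct {
	ID   string
	Host string
	Port string
}

// minIDLength is an assumed lower bound used only to reject obviously
// partial IDs. Real code would decode and verify the ID instead.
const minIDLength = 50

// ParseSatelliteURL parses "[storj://]<id>@<host>:<port>".
func ParseSatelliteURL(s string) (SatelliteURL, error) {
	s = strings.TrimPrefix(s, "storj://") // the scheme is optional
	id, addr, found := strings.Cut(s, "@")
	if !found || id == "" {
		return SatelliteURL{}, errors.New("satellite URL is missing an ID")
	}
	if len(id) < minIDLength {
		return SatelliteURL{}, fmt.Errorf("satellite ID %q looks partial", id)
	}
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return SatelliteURL{}, fmt.Errorf("invalid address %q: %w", addr, err)
	}
	if host == "" || port == "" {
		return SatelliteURL{}, fmt.Errorf("address %q needs both host and port", addr)
	}
	return SatelliteURL{ID: id, Host: host, Port: port}, nil
}

func main() {
	inputs := []string{
		"storj://12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777",
		"12EayRS2@us-central-1.tardigrade.io:7777",
		"us-central-1.tardigrade.io:7777",
	}
	for _, in := range inputs {
		u, err := ParseSatelliteURL(in)
		fmt.Printf("%q -> %+v, err=%v\n", in, u, err)
	}
}
```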
#### Trusted Satellite List
The Trusted Satellite List is a text document where each line represents the
Satellite URL of a trusted Satellite.
```
12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777
121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@asia-east-1.tardigrade.io:7777
```
### Storage Node Configuration
The Storage Node configuration for Satellite selection is a list of one or more
entries, where each entry is one of the following:
* Trusted Satellite List URL
* Trusted Satellite URL
* Untrusted Satellite URL
#### Trusted Satellite List URL Entry
This entry contains a URL where a Trusted Satellite List can be downloaded.
Supported schemes are `file://`, `http://`, and `https://`. When fetching the
list over HTTP, an `https://` URL should be preferred over an `http://` URL to
ensure transport security and prevent a person-in-the-middle from tampering
with the list.
Examples:
```
https://www.tardigrade.io/trusted-satellites
file:///some/path/to/trusted-satellites.txt
```
#### Trusted Satellite URL Entry
This entry contains the URL to an explicitly trusted Satellite. The format of
the entry is a Satellite URL.
#### Untrusted Satellite Entry
This entry contains the URL to an explicitly untrusted Satellite. The format of
the entry is a `!` followed by one of the following:
* Satellite ID followed by an `@` (to distinguish it from a host)
```
!121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@
```
* Satellite host. If the host is a domain, then subdomains are also untrusted
(e.g. `!tardigrade.io` will block `us-central-1.tardigrade.io`)
```
!tardigrade.io
!us-central-1.tardigrade.io
!us-east-1.tardigrade.io
```
* Satellite URL
```
!121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@us-central-1.tardigrade.io:7777
```
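The three untrusted entry forms can be told apart by the position of the `@`. The sketch below is one possible way to do that. The `UntrustedEntry` type and `ParseUntrusted` function are hypothetical names introduced for illustration, and the sketch does not validate the ID or address any further.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// UntrustedEntry is a hypothetical parsed form of a "!"-prefixed entry.
// ID-only entries set ID, host-only entries set Host, and full Satellite
// URL entries set both ID and Addr (host:port).
type UntrustedEntry struct {
	ID   string
	Host string
	Addr string
}

// ParseUntrusted splits an untrusted entry into one of its three forms.
func ParseUntrusted(entry string) (UntrustedEntry, error) {
	if !strings.HasPrefix(entry, "!") {
		return UntrustedEntry{}, fmt.Errorf("entry %q does not start with '!'", entry)
	}
	rest := strings.TrimPrefix(entry, "!")
	if rest == "" {
		return UntrustedEntry{}, errors.New("untrusted entry is empty")
	}
	id, addr, hasAt := strings.Cut(rest, "@")
	switch {
	case !hasAt:
		// No "@": the whole entry is a host.
		return UntrustedEntry{Host: rest}, nil
	case id == "":
		return UntrustedEntry{}, fmt.Errorf("entry %q has an empty ID", entry)
	case addr == "":
		// Trailing "@": the entry is a bare Satellite ID.
		return UntrustedEntry{ID: id}, nil
	default:
		return UntrustedEntry{ID: id, Addr: addr}, nil
	}
}

func main() {
	for _, e := range []string{
		"!121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@",
		"!tardigrade.io",
		"!121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@us-central-1.tardigrade.io:7777",
	} {
		parsed, err := ParseUntrusted(e)
		fmt.Printf("%+v err=%v\n", parsed, err)
	}
}
```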
### Building the List of Trusted Satellite URLs
To build the list of trusted Satellite URLs, each entry in the configuration is
traversed in order, parsed, and processed accordingly:
1. If the entry begins with a `!`, it represents an untrusted Satellite entry. It is added to the `untrusted` list, which is used later.
1. If the entry begins with `file://`, `http://`, or `https://`, it is a Trusted Satellite List URL. The URL is used to fetch a list of Trusted Satellite URLs that are added into the `trusted` list.
1. If the entry begins with `storj://`, or otherwise does not use a scheme, it is a trusted Satellite URL entry. It is added into the `trusted` list.
1. If an entry does not match any of the above, it is a configuration error.
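The classification steps above amount to a prefix check applied in order. A minimal sketch follows. The `EntryKind` type and `Classify` function are hypothetical names, and treating an empty entry as an error is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// EntryKind is a hypothetical enumeration of configuration entry types.
type EntryKind int

const (
	UntrustedSatellite EntryKind = iota
	TrustedList
	TrustedSatellite
)

// Classify applies the rules in order: "!" first, then list URL schemes,
// then "storj://" or no scheme at all. Anything else is a configuration error.
func Classify(entry string) (EntryKind, error) {
	switch {
	case entry == "":
		return 0, fmt.Errorf("empty configuration entry")
	case strings.HasPrefix(entry, "!"):
		return UntrustedSatellite, nil
	case strings.HasPrefix(entry, "file://"),
		strings.HasPrefix(entry, "http://"),
		strings.HasPrefix(entry, "https://"):
		return TrustedList, nil
	case strings.HasPrefix(entry, "storj://"), !strings.Contains(entry, "://"):
		return TrustedSatellite, nil
	default:
		return 0, fmt.Errorf("unrecognized configuration entry %q", entry)
	}
}

func main() {
	for _, e := range []string{
		"!quz.test",
		"https://foo.test/trusted-satellites",
		"0@f.foo.test:7777",
		"ftp://foo.test/list",
	} {
		kind, err := Classify(e)
		fmt.Printf("%-40s kind=%d err=%v\n", e, kind, err)
	}
}
```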
If a Trusted Satellite List cannot be fetched a warning should be logged. If
available, the last known copy from the Trusted Satellite List URL should be
used. Storage Nodes should attempt to persist the downloaded lists. If they
cannot, a warning should be logged.
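The fetch-with-fallback behavior described above could look roughly like the following. Everything here is an illustrative assumption: the `Fetcher` and `Cache` abstractions, the in-memory `memCache`, and the `staticFetch` and `failingFetch` helpers exist only to keep the sketch self-contained and are not part of the storage node code.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Fetcher retrieves the raw contents of a Trusted Satellite List URL.
type Fetcher func(listURL string) ([]byte, error)

// Cache is a hypothetical persistence layer keyed by list URL.
type Cache interface {
	Load(listURL string) ([]byte, bool)
	Save(listURL string, data []byte) error
}

// memCache is an in-memory Cache used only for demonstration.
type memCache map[string][]byte

func (c memCache) Load(u string) ([]byte, bool) { d, ok := c[u]; return d, ok }
func (c memCache) Save(u string, d []byte) error { c[u] = d; return nil }

// FetchWithFallback fetches a list, persists it on success, and falls back
// to the last known copy (with a logged warning) when the fetch fails.
func FetchWithFallback(listURL string, fetch Fetcher, cache Cache) ([]byte, error) {
	data, err := fetch(listURL)
	if err != nil {
		log.Printf("warning: cannot fetch %s: %v", listURL, err)
		if cached, ok := cache.Load(listURL); ok {
			return cached, nil
		}
		return nil, fmt.Errorf("no cached copy of %s: %w", listURL, err)
	}
	if err := cache.Save(listURL, data); err != nil {
		log.Printf("warning: cannot persist %s: %v", listURL, err)
	}
	return data, nil
}

// staticFetch returns a Fetcher that always succeeds with the given contents.
func staticFetch(contents string) Fetcher {
	return func(string) ([]byte, error) { return []byte(contents), nil }
}

// failingFetch simulates an unreachable list URL.
func failingFetch(string) ([]byte, error) {
	return nil, errors.New("simulated network failure")
}

func main() {
	cache := memCache{}
	listURL := "https://foo.test/trusted-satellites"

	first, _ := FetchWithFallback(listURL, staticFetch("2@f.foo.test:7777\n"), cache)
	fmt.Printf("fresh:  %q\n", first)

	second, _ := FetchWithFallback(listURL, failingFetch, cache)
	fmt.Printf("cached: %q\n", second)
}
```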
After all configuration entries have been processed, each URL in the `trusted`
list is checked against the `untrusted` list and removed if it matches an
entry. An `untrusted` entry matches a URL using the following criteria:
* When the `untrusted` entry is just a Satellite ID, it matches any URL with
that ID.
* When the `untrusted` entry is just a host, it matches any URL with the same
host. If the host is a domain name, then the entry also matches URLs within a
subdomain of that host.
* When the `untrusted` entry is a full Satellite URL, it matches any URL that
is equal.
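The matching criteria above can be sketched as a single method. The `Satellite` and `Untrusted` types and the `Prune` helper are hypothetical. Using `net.ParseIP` to decide whether a host is a domain name (and so eligible for subdomain matching) is an assumption of this example, as is the case-sensitive host comparison.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// Satellite is a hypothetical parsed Satellite URL.
type Satellite struct {
	ID, Host, Port string
}

// Untrusted is a hypothetical parsed untrusted entry. Only ID is set for an
// ID-only entry, only Host for a host-only entry, and all three fields for a
// full Satellite URL.
type Untrusted struct {
	ID, Host, Port string
}

// Matches reports whether the untrusted entry blocks the given Satellite.
func (u Untrusted) Matches(s Satellite) bool {
	switch {
	case u.ID != "" && u.Host == "":
		return s.ID == u.ID
	case u.ID == "" && u.Host != "":
		if s.Host == u.Host {
			return true
		}
		// Subdomain matching applies to domain names only, not IP addresses.
		isDomain := net.ParseIP(u.Host) == nil
		return isDomain && strings.HasSuffix(s.Host, "."+u.Host)
	default:
		return s.ID == u.ID && s.Host == u.Host && s.Port == u.Port
	}
}

// Prune returns the trusted Satellites not matched by any untrusted entry,
// preserving their original order.
func Prune(trusted []Satellite, untrusted []Untrusted) []Satellite {
	var kept []Satellite
	for _, s := range trusted {
		blocked := false
		for _, u := range untrusted {
			if u.Matches(s) {
				blocked = true
				break
			}
		}
		if !blocked {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	trusted := []Satellite{
		{ID: "2", Host: "qiz.test", Port: "7777"},
		{ID: "4", Host: "qiz.test", Port: "7777"},
		{ID: "4", Host: "subdomain.quz.test", Port: "7777"},
	}
	untrusted := []Untrusted{
		{Host: "quz.test"},
		{ID: "2", Host: "qiz.test", Port: "7777"},
	}
	fmt.Printf("%+v\n", Prune(trusted, untrusted))
}
```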
After the `trusted` list has been pruned, the remaining URLs are aggregated
according to the following rules:
* A Satellite URL is considered _authoritative_ if it matches any of the
following criteria:
  * Configured via a Trusted Satellite URL entry
  * Configured via a `file://` URL
  * Configured via an `https://` or `http://` Trusted Satellite List URL AND its host matches the domain, or is a subdomain of the domain, in the Trusted Satellite List URL.
* Satellite URLs are equivalent if the address portions are equal.
* When aggregating equivalent Satellite URLs (i.e. address matches) with
differing IDs, the _authoritative_ Satellite URL wins. If neither or both are
_authoritative_, the one aggregated first wins.
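The aggregation rules above can be sketched as a single ordered pass. The `Candidate` type and the `AuthoritativeFor` and `Aggregate` functions are hypothetical names. Comparing the Satellite host against the full hostname of the list URL is one reading of "the domain name in the Trusted Satellite List URL" and is an assumption of this example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Candidate is a hypothetical trusted Satellite URL plus its authority flag.
type Candidate struct {
	ID, Host, Port string
	Authoritative  bool
}

// AuthoritativeFor reports whether a Satellite host listed at listURL is
// authoritative: always for file:// lists, and for http(s):// lists only when
// the host equals, or is a subdomain of, the list URL's hostname.
func AuthoritativeFor(listURL, host string) bool {
	u, err := url.Parse(listURL)
	if err != nil {
		return false
	}
	switch u.Scheme {
	case "file":
		return true
	case "http", "https":
		domain := u.Hostname()
		return domain != "" && (host == domain || strings.HasSuffix(host, "."+domain))
	default:
		return false
	}
}

// Aggregate keeps one Candidate per address. An authoritative candidate
// replaces a non-authoritative one; otherwise the first one seen wins.
func Aggregate(cands []Candidate) []Candidate {
	index := map[string]int{} // address -> position in out
	var out []Candidate
	for _, c := range cands {
		addr := c.Host + ":" + c.Port
		i, seen := index[addr]
		if !seen {
			index[addr] = len(out)
			out = append(out, c)
			continue
		}
		if c.Authoritative && !out[i].Authoritative {
			out[i] = c
		}
	}
	return out
}

func main() {
	list := "https://baz.test/trusted-satellites"
	cands := []Candidate{
		{ID: "3", Host: "baz.test", Port: "7777"},
		{ID: "4", Host: "baz.test", Port: "7777", Authoritative: AuthoritativeFor(list, "baz.test")},
		{ID: "4", Host: "qiz.test", Port: "7777", Authoritative: AuthoritativeFor(list, "qiz.test")},
	}
	for _, c := range Aggregate(cands) {
		fmt.Printf("%s@%s:%s\n", c.ID, c.Host, c.Port)
	}
}
```

In this small run, the authoritative `4@baz.test:7777` displaces `3@baz.test:7777` even though the latter was seen first, and `4@qiz.test:7777` is kept because nothing else claims that address.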
#### Example
Consider the following Trusted Satellite List URLs and their contents. For
brevity, the full ID of each URL is shortened (real configurations
**MUST** specify the full ID).
* `file:///path/to/some/trusted-satellites.txt`
```
1@bar.test:7777
```
* `https://foo.test/trusted-satellites`
```
2@f.foo.test:7777
2@buz.test:7777
2@qiz.test:7777
5@ohno.test:7777
```
* `https://bar.test/trusted-satellites`
```
3@f.foo.test:7777
3@bar.test:7777
3@baz.test:7777
3@buz.test:7777
3@quz.test:7777
```
* `https://baz.test/trusted-satellites`
```
4@baz.test:7777
4@qiz.test:7777
4@subdomain.quz.test:7777
```
Now consider the following configuration:
```
- !quz.test
- file:///path/to/some/trusted-satellites.txt
- https://foo.test/trusted-satellites
- https://bar.test/trusted-satellites
- https://baz.test/trusted-satellites
- 0@f.foo.test:7777
- !2@qiz.test:7777
- !5@
```
After expanding each entry, we have the following unaggregated `trusted` list:
```
1@bar.test:7777 (authoritative due to file:// URL)
2@f.foo.test:7777 (authoritative due to foo.test domain)
2@buz.test:7777
2@qiz.test:7777
5@ohno.test:7777
3@f.foo.test:7777
3@bar.test:7777 (authoritative due to bar.test domain)
3@baz.test:7777
3@buz.test:7777
3@quz.test:7777
4@baz.test:7777 (authoritative due to baz.test domain)
4@qiz.test:7777
4@subdomain.quz.test:7777
0@f.foo.test:7777 (authoritative due to explicit configuration)
```
And the following `untrusted` list:
```
quz.test
2@qiz.test:7777
5@
```
The `trusted` list is pruned with the `untrusted` list, leaving the following `trusted` list:
```
1@bar.test:7777 (authoritative due to file:// URL)
2@f.foo.test:7777 (authoritative due to foo.test domain)
2@buz.test:7777
3@f.foo.test:7777
3@bar.test:7777 (authoritative due to bar.test domain)
3@baz.test:7777
3@buz.test:7777
4@baz.test:7777 (authoritative due to baz.test domain)
4@qiz.test:7777
0@f.foo.test:7777 (authoritative due to explicit configuration)
```
We aggregate from top to bottom (i.e. in the order they were specified/fetched)
and are left with the following:
```
1@bar.test:7777
2@f.foo.test:7777
2@buz.test:7777
4@baz.test:7777
4@qiz.test:7777
```
* `1@bar.test:7777` was selected because even though `3@bar.test:7777` was also
authoritative, `1@bar.test:7777` came first.
* `2@f.foo.test:7777` was selected because it was authoritative over
`3@f.foo.test:7777` and came before `0@f.foo.test:7777`.
* `2@buz.test:7777` was selected because it came before `3@buz.test:7777` (neither was authoritative).
* `4@baz.test:7777` was selected because it was authoritative over `3@baz.test:7777`, even though the latter came first.
* `4@qiz.test:7777` was selected because it was the only URL remaining for `qiz.test:7777` after pruning.
### Rebuilding the List of Trusted Satellite URLs
The list of trusted Satellite URLs should be recalculated daily (with some jitter).
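One way to add jitter to the daily rebuild is to shift the interval by a random fraction of its length. The blueprint does not specify how much jitter to apply, so the ±10% used below is purely an assumption, as are the `Day` constant and the `Jittered` function name.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Day is the base rebuild interval described in the text.
const Day = 24 * time.Hour

// Jittered shifts interval by up to ±fraction of its length.
// sample must be in [0, 1); a sample of 0.5 yields no shift.
func Jittered(interval time.Duration, fraction, sample float64) time.Duration {
	offset := (2*sample - 1) * fraction * float64(interval)
	return interval + time.Duration(offset)
}

func main() {
	// Assumed jitter of ±10%; the blueprint leaves the amount unspecified.
	next := Jittered(Day, 0.1, rand.Float64())
	fmt.Println("next rebuild in", next)
}
```

Spreading rebuilds this way avoids every storage node fetching the same list URL at the same moment.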
### Backwards Compatibility
The old piecestore configuration (i.e. `piecestore.OldConfig`) currently contains a
comma separated list of trusted Satellite URLs (`WhitelistedSatellites`). It
defaults to the current list of known good satellites. On startup, if the new
configuration is unset, then the old configuration should be used to form
a fixed set of trusted Satellite URLs.
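The fallback described above could be sketched as follows. The `EntriesFromConfig` function is a hypothetical helper. It takes the new list of entries and the old comma-separated value as plain arguments rather than reading the real configuration structs, and trimming whitespace around each URL is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// EntriesFromConfig returns the new-style entries when any are configured.
// Otherwise it splits the old comma-separated whitelist into trusted
// Satellite URL entries.
func EntriesFromConfig(newEntries []string, whitelisted string) []string {
	if len(newEntries) > 0 {
		return newEntries
	}
	var entries []string
	for _, s := range strings.Split(whitelisted, ",") {
		if s = strings.TrimSpace(s); s != "" {
			entries = append(entries, s)
		}
	}
	return entries
}

func main() {
	old := "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us-central-1.tardigrade.io:7777," +
		"121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6@asia-east-1.tardigrade.io:7777"
	for _, e := range EntriesFromConfig(nil, old) {
		fmt.Println(e)
	}
}
```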
## Open Issues
* How long should storage nodes use cached/persisted lists for? Should lists be persisted at all?
* If aggregation yields no URLs (list URL unreachable) should we default to anything? How should this be reported?
* If block listing removes all URLs, how should this be reported?
* Can we safely auto-migrate storage nodes into this new method of management?
* How long does the storage node wait before garbage collecting pieces from Satellites it no longer trusts? Should this be a manual operation?
## To Do
* Implement an endpoint at `https://www.tardigrade.io/trusted-satellites` to return the default list of trusted Satellites.
* Implement a `trust.ListConfig` configuration struct which:
  * Contains the list of entries (with a release default of a single list containing `https://www.tardigrade.io/trusted-satellites`)
  * Contains a refresh interval
  * Maintains backwards compatibility with `WhitelistedSatellites` in `piecestore.OldConfig`
* Implement `storj.io/storj/storagenode/trust.List` that:
  * Consumes `trust.ListConfig` for configuration
  * Performs the initial fetching and building of trusted Satellite URLs
  * Updates according to the refresh interval (with jitter)
* Refactor `storj.io/storj/storagenode/trust.Pool` to use `trust.List`