storj/docs/blueprints/password-key-derivation.md

# Title: Password Key Derivation

## Abstract

This design doc describes the way cryptographic encryption keys will be derived from user provided passwords in a way that resists offline attacks and low entropy passwords.

## Background

A unique encryption key exists for every path inside of a bucket, and they are designed in a way that allows for deriving a key from a parent and some path component. For example, if you had the key for `sj://bucket/path1/path2` then you could derive the key for `sj://bucket/path1/path2/path3`.

That leaves open the question for how a root key is created. Some requirements on that root key creation include

1. It's based on the user's choice (for example, a password).
2. If the user inputs the same password on different machines, the same root key should be created.

These requirements allow users to be in full control of their encryption, and don't require users safely transporting high entropy (hard to remember) secrets to bootstrap new uplinks.

This design accommodates more requirements that allow for additional features:

3. A root key can be created for any encrypted path in a bucket, not just the bucket.
4. A table of root keys for low entropy passwords should not be possible. In other words, an attacker with knowledge of the algorithm should not be able to use a dictionary of common passwords and pre-compute what keys to check in the event of a data breach.
5. Users do not have to enter a password for every bucket they create: they can have a default password that is used for every bucket unless otherwise specified.

The third requirement allows having multiple encrypted domains to exist within a single bucket, allowing delegation of encryption. The fourth requirement enhances security if a satellite breach happens. This is usually accomplished with [salting](https://en.wikipedia.org/wiki/Salt_(cryptography)). The fifth requirement allows providing a finite amount of information to a third party that allows them access to the keys for any bucket created with the default password in the future.

## Design

First, a **root key** is defined to be, in the terminology of section 4.11 in the [whitepaper](https://storj.io/storjv3.pdf), the same as s<sub>0</sub>. Any number of root keys can be created for any bucket and (possibly empty) encrypted path. Subsequent secrets are derived using the unencrypted path components and will end up encrypted as described in the whitepaper and appended to the encrypted path used when the root key was created. This means that the root key for a bucket and encrypted path may not match the derived key from a root key for a bucket and empty encrypted path, and then deriving a key using path segments to match that encrypted path.

In other words, `rootKey(bucket, encryptedPath) != deriveKey(rootKey(bucket, ""), encryptedPath)`.

Satellites will allow clients to query for a salt for a project. The result of this query must be stable.

The following algorithm is used to create a root key from an api key, password, and optional encrypted path:

```
projectSalt = getProjectSalt(apiKey)
mixedSalt   = hmac(hash=sha256, secret=userPassword, data=projectSalt)
pathSalt    = hmac(hash=sha256, secret=mixedSalt, data=encryptedPath or "")
rootKey     = argon2id(salt=pathSalt, password=password)
```

Uplink always stores and operates on outputs of the previous algorithms (the root keys), never the input material (passwords, salts, etc).

During the folding of path components into the root key as defined by the whitepaper, if the root key was created with an empty encrypted path, the bucket name is used as the first path component in the fold. This ensures that different keys are output for the same input paths in two different buckets. A default root key is created by using an empty encrypted path.

## Rationale

This design accomplishes all of the requirements listed above.

- Each root key requires some secret that necessarily comes from the user in some way.
- The algorithm is deterministic, so the user gets the same keys on any machine where she inputs the same passwords.
- Root keys for a sub-path in a bucket are accomplished by providing a non-empty encrypted path when creating the root key.
- Because entropy is added as the salt in the `argon2id` step, dictionary attacks on low-entropy passwords cannot be precomputed.
- One can create a default root key by using an empty encrypted path and using that if no other root key applies.

Some other points of consideration include

- Clients can ignore the salt when they know they have a strong password if they wish. Indeed, they can ignore all of this and do their own key management. At a minimum, we allow the user to specify an already-formed root key in their configuration if they don't want one created from a password.
- The salt is not blindly trusted from the satellite, and is mixed with the password in an HMAC step that ensures a hostile satellite cannot predict what the actual salt will be.
- HMAC provides security against forgeries without knowing the secret, which implies that sharing the output of a HMAC does not leak information about the secret, even under attacker provided data, so I believe the usage of HMAC is safe.
- Users will already lose data if they do not back up their data pointers in the satellite (because the encryption key for the data is only stored there), so adding the salts does not add a requirement that users need to back up data in a satellite, it just adds a small amount of additional information they need to back up.

## Implementation

First, the satellite must allow salt information to be queried for projects, and it may use the project UUID to back it.

Next, the clients must be changed to run the above algorithms to create root keys.
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00			`# Title: Password Key Derivation`

			`## Abstract`

			`This design doc describes the way cryptographic encryption keys will be derived from user provided passwords in a way that resists offline attacks and low entropy passwords.`

			`## Background`

			A unique encryption key exists for every path inside of a bucket, and they are designed in a way that allows for deriving a key from a parent and some path component. For example, if you had the key for `sj://bucket/path1/path2` then you could derive the key for `sj://bucket/path1/path2/path3`.

			`That leaves open the question for how a root key is created. Some requirements on that root key creation include`

			`1. It's based on the user's choice (for example, a password).`
			`2. If the user inputs the same password on different machines, the same root key should be created.`

			`These requirements allow users to be in full control of their encryption, and don't require users safely transporting high entropy (hard to remember) secrets to bootstrap new uplinks.`

chore: fix typos in the documentation (#3959) 2020-11-09 20:00:34 +00:00			`This design accommodates more requirements that allow for additional features:`
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00
			`3. A root key can be created for any encrypted path in a bucket, not just the bucket.`
Update password based key derivation design doc (#2367) 2019-07-09 05:07:03 +01:00			`4. A table of root keys for low entropy passwords should not be possible. In other words, an attacker with knowledge of the algorithm should not be able to use a dictionary of common passwords and pre-compute what keys to check in the event of a data breach.`
			`5. Users do not have to enter a password for every bucket they create: they can have a default password that is used for every bucket unless otherwise specified.`
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00
Update password based key derivation design doc (#2367) 2019-07-09 05:07:03 +01:00			`The third requirement allows having multiple encrypted domains to exist within a single bucket, allowing delegation of encryption. The fourth requirement enhances security if a satellite breach happens. This is usually accomplished with [salting](https://en.wikipedia.org/wiki/Salt_(cryptography)). The fifth requirement allows providing a finite amount of information to a third party that allows them access to the keys for any bucket created with the default password in the future.`
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00
			`## Design`

			First, a root key is defined to be, in the terminology of section 4.11 in the [whitepaper](https://storj.io/storjv3.pdf), the same as s<sub>0</sub>. Any number of root keys can be created for any bucket and (possibly empty) encrypted path. Subsequent secrets are derived using the unencrypted path components and will end up encrypted as described in the whitepaper and appended to the encrypted path used when the root key was created. This means that the root key for a bucket and encrypted path may not match the derived key from a root key for a bucket and empty encrypted path, and then deriving a key using path segments to match that encrypted path.

			In other words, `rootKey(bucket, encryptedPath) != deriveKey(rootKey(bucket, ""), encryptedPath)`.

Update password based key derivation design doc (#2367) 2019-07-09 05:07:03 +01:00			`Satellites will allow clients to query for a salt for a project. The result of this query must be stable.`
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00
Update password based key derivation design doc (#2367) 2019-07-09 05:07:03 +01:00			`The following algorithm is used to create a root key from an api key, password, and optional encrypted path:`
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00
			```
Update password based key derivation design doc (#2367) 2019-07-09 05:07:03 +01:00			`projectSalt = getProjectSalt(apiKey)`
			`mixedSalt = hmac(hash=sha256, secret=userPassword, data=projectSalt)`
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00			`pathSalt = hmac(hash=sha256, secret=mixedSalt, data=encryptedPath or "")`
			`rootKey = argon2id(salt=pathSalt, password=password)`
			```

			`Uplink always stores and operates on outputs of the previous algorithms (the root keys), never the input material (passwords, salts, etc).`

Update password based key derivation design doc (#2367) 2019-07-09 05:07:03 +01:00			`During the folding of path components into the root key as defined by the whitepaper, if the root key was created with an empty encrypted path, the bucket name is used as the first path component in the fold. This ensures that different keys are output for the same input paths in two different buckets. A default root key is created by using an empty encrypted path.`

Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00			`## Rationale`

			`This design accomplishes all of the requirements listed above.`

			`- Each root key requires some secret that necessarily comes from the user in some way.`
			`- The algorithm is deterministic, so the user gets the same keys on any machine where she inputs the same passwords.`
			`- Root keys for a sub-path in a bucket are accomplished by providing a non-empty encrypted path when creating the root key.`
			- Because entropy is added as the salt in the `argon2id` step, dictionary attacks on low-entropy passwords cannot be precomputed.
Update password based key derivation design doc (#2367) 2019-07-09 05:07:03 +01:00			`- One can create a default root key by using an empty encrypted path and using that if no other root key applies.`
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00
			`Some other points of consideration include`

			`- Clients can ignore the salt when they know they have a strong password if they wish. Indeed, they can ignore all of this and do their own key management. At a minimum, we allow the user to specify an already-formed root key in their configuration if they don't want one created from a password.`
			`- The salt is not blindly trusted from the satellite, and is mixed with the password in an HMAC step that ensures a hostile satellite cannot predict what the actual salt will be.`
			`- HMAC provides security against forgeries without knowing the secret, which implies that sharing the output of a HMAC does not leak information about the secret, even under attacker provided data, so I believe the usage of HMAC is safe.`
			`- Users will already lose data if they do not back up their data pointers in the satellite (because the encryption key for the data is only stored there), so adding the salts does not add a requirement that users need to back up data in a satellite, it just adds a small amount of additional information they need to back up.`

			`## Implementation`

Update password based key derivation design doc (#2367) 2019-07-09 05:07:03 +01:00			`First, the satellite must allow salt information to be queried for projects, and it may use the project UUID to back it.`
Password based key derivation design (#2020) * initial password key derivation design * some typos/clarifications * point out that clients can ignore bucket entropy * clarification about offline table generation * change proposal to only do network requests when inputting passwords * make algorithm have shorter lines * fix algorithm example * be explicit about allowing already-formed root keys * go back to having a salt in the project/bucket metadata * some proofreading/clarifications * feedback and clarifications * clarify definition of root key 2019-05-23 18:05:16 +01:00
			`Next, the clients must be changed to run the above algorithms to create root keys.`