Wednesday, October 14, 2020

[oobsumlc] Encrypted data on a server

Consider a server storing client data encrypted.

Server stores Cmain = Encrypt(key=K2, plaintext=Pmain).

where Pmain is the main client data, probably large.  K2 is a securely randomly chosen key.  Also:

Server stores C1 = Encrypt(key=K1, plaintext=K2).
Server stores H = Hash(K1).

K1 and K2 are raw keys for symmetric encryption.  They are not passwords.  The server does not do any computationally expensive password-based key derivation.  (As will be described below, the client, not server, will compute a PBKDF.)  The Hash function is cryptographically secure but not computationally expensive.

Server also stores, unencrypted, some associated data (probably not the correct technical term) describing a concrete password-based key derivation function (PBKDF), including its parameters such as salt, number of rounds, and memory cost, e.g., for Argon2i.  (The algorithm does not need to be PBKDF2.)  The PBKDF algorithm and parameters were chosen by the user.

When the server needs access to the plaintext, server establishes a secure channel to the client then tells client the associated data.  Server then asks client, please provide K1.

Client parses the associated data and runs, on the client's side, the specified PBKDF algorithm with the specified parameters, deriving the key K1 from a password that the user knows.  The client UI can smoothly invoke the PBKDF, prompting the user for a password.  Client transmits K1 to the server over the secure channel.

Perhaps the associated data includes a cryptographic signature so that the client can verify that the associated data hasn't been modified since the client deposited it.  Managing signing keys is the client's responsibility.

The key-stretching PBKDF algorithm can be as computationally expensive as user wants.  The algorithm and parameters were originally chosen by the user.  In this way, the user gets a say in the security of the user's data stored in the server.  The computational cost of the key stretching is borne by the client, so the security versus convenience tradeoff is entirely within the client.  This system also avoids denial-of-service attacks against the server which could have happened if the server had to do the expensive key-stretching calculation to verify a password.

Note that the PBKDF associated data that the server stores and tells the client were only hints for the client.  If the client knows K1 directly (which is not good security practice by the client, but there's nothing stopping the client from doing this), then the client can supply K1, skipping the PBKDF computation.

Picking up where we left off, the rest of straightforward: client has told the server the key K1 over a secure channel.  (If we are used to protocols in which the client merely proves knowledge of a pre-shared key without actually transmitting the key, it might seem weird that a key gets transmitted, but because the channel is encrypted and the server is authenticated, I think transmitting the key is OK.  We leave unspecified how security for the channel is achieved.)  Server verifies that Hash(K1) == H.  The point of H is for the server to be able to quickly validate a guess K1.  Server also limits the number or rate of guesses, e.g., with CAPTCHA.

If Hash(K1) == H, then the server decrypts (with K1) C1 to get K2, then decrypts (with K2) Cmain to get Pmain.  Whatever work that needs to happen with the main data Pmain now happens.  After the work is done, server forgets K1, K2, and Pmain.  Client needs to trust that the server will forget these things and not abuse the access to the client's data.  Maybe auditing?

Inspiration was for a cloud email service.  Server keeps data at rest encrypted, but, with the user's permission, sometimes decrypts the whole thing to, for example, do a search requested by the client.

If H is compromised and the password or the PBKDF is weak, an attacker can determine K1 by brute force, guessing passwords, running the PBKDF, and checking its output against H.  It is important that the user choose a strong PBKDF.  The server can advise a minimum standard and perhaps refuse to store associated data describing too weak a PBKDF, but because the PBKDF computation is done by the client, there's ultimately no way for the server to force the client to do anything.  And it's probably best to have an option for the associated data to be free-form text so that the user can specify arbitrarily complex or new PBKDF algorithms.

(Authenticated encryption is an alternative to the hash check.  The same advice applies regarding the client needing to use a good PBKDF.)

An authenticated client can also tell the server, "please destroy the main data".  Server securely deletes (shreds) C1 (which is small), which destroys K2.  The point of this extra wrapping of the key is to allow quickly wiping the main data.  Server deallocates Cmain.  (It does not need to securely delete Cmain, because securely destroying K2 is sufficient.)

All this seems fairly straightforward, though it might violate the adage "don't roll your own crypto".

No comments :