## Friday, September 24, 2021

### [osjepxqx] format-preserving encryption with Keccak

suppose you want a random permutation.

Fisher-Yates shuffle (Knuth shuffle) requires O(n) space and O(n) time.  this is acceptable for about n < 2^30.

Fisher-Yates bounces around memory in a way very unfriendly the caches (memory hierarchy).  creating a random permutation by comparison-sorting random keys requires O(n log n) time but is typically (e.g., quicksort, mergesort) more friendly to caches.  some cleverness is needed to avoid ties between keys: for example, generate sort keys by hashing values with SHAKE256, squeezing as many output bits as needed to break ties.

if you have exactly n = 2^128 items, a 128-bit block cipher such as AES provides a random permutation.  it requires O(log(n)) space and O(log(n)) time, a tremendous savings.  (maybe O(log(n) * log(log(n))) time if we guess at the number of rounds in a generalized block cipher.)  the catch is, using (say) a 128-bit key gives you access to only 2^128 out of the (2^128)! possible permutations.  the difference between and n and n! is huge.  using a 256- bit key, AES-256 (still a 128-bit block cipher), does not improve things very much: 2^128 = 2^2^7 = exp(89), 2^256 = 2^2^8 = exp(177), (2^128)! = 2^2^135.0 = exp(3.0e+40) (via the lngamma function).

when a cipher provides a random permutation, it does not return a list as Fisher-Yates does.  instead, it provides two functions, answering the questions "what item is in slot X?" and "to which slot did item Y move"?  (imagine we started with items numbered 0 through 2^128-1 in order in slots 0 through 2^128-1.)  these functions are called decryption and encryption.

if you have between (about) 2^126+1 and 2^128-1 items, cycle walking is a clever technique that can quickly construct a random permutation among that many items from a 128-bit cipher.

if the number of items is much smaller than the block size, cycle walking takes an unreasonable amount of time.

if you have 2^(2n) items, you can build your own block cipher using a Feistel network.  (previously.)  this combined with cycle walking allows constructing random permutations of arbitrary size.  (the integerLog2 function may be useful in determining the next even power of 2 greater than or equal to a given number.)

constructing a random permutation over an arbitrary number of items is known as format-preserving encryption.

NIST Special Publication 800-38G specifies format-preserving encryption with a Feistel network and cycle walking but with many extra bells and whistles.  some of them might be for cryptographic security.  we will ignore cryptographic security; we just want a random-ish permutation in less time and space than Fisher-Yates.  we use cryptographic primitives because they are convenient.

restricting the block size to 2^(2n), we can create a binary block cipher using a balanced Feistel network.  (NIST SP 800-38G allows the radix to be different from 2 and allows odd exponents, constructing a slightly unbalanced Feistel network for odd exponents.)

Keccak, in particular its Extensible Output Function (XOF) SHAKE256, allows easy construction of Feistel's non-linear round functions F.  (NIST SP 800-38G uses AES.)

the famous Luby and Rackoff result showed that 4 rounds are sufficient if your F functions are perfect (random oracle).  let's do 10 rounds because NIST SP 800-38G does so, and because SHAKE256 isn't perfect.

we need 10 different F functions F[i], one for each round, all different.  each F must produce output of width n bits, easily satisfied by SHAKE256.  to construct the 10 different functions, prepend the input of each call to SHAKE256 with different initialization bits.  we are using Keccak as a Message Authentication Code (MAC) where the initialization bits are the key.  unlike hash functions based on the Merkle-Damgard construction (such as MD5, SHA1, and the SHA-2 family), it is safe to use simple concatenation to transform Keccak into a MAC.

call the 10 different initialization bitstrings subkeys.  Keccak has 1600 bits of internal state, so we want to supply at least 1600 bits of initialization to maximize the chance that all F functions are different.  SHAKE256 reads input in blocks of 1088 bits (called the "rate"), so let's use 2 blocks, 2176 bits of initialization.  all together, we need 21760 bits (2720 bytes) of key material for all the subkeys.

to get 21760 bits of subkeys, use Keccak's Extensible Output Function TupleHash256 on a tuple of 2 items: the number of items in the permutation, and the user-supplied random seed.  this random seed is an string supplied by the user.  using a cryptographically secure hash as the key derivation function and key expansion function thwarts the user from specifying bad subkeys, e.g., inducing the same F function in two different rounds.  (incidentally, internally the AES key schedule similarly expands AES's user-supplied keys: 128 bit key to 1408 bits, 192 bit key to 1664 bits, 256 bit key to 1920 bits.)

we use TupleHash256, not SHAKE256 or cSHAKE, because incorporating the size of the permutation, i.e., the number of items, into the randomness is important.  if the same random number seed is used to create random permutations of similar (but unequal) sizes, incorporating the size prevents the resulting random permutations from being "similar", by some measure of similarity between permutations of different sizes.

these additional ways of using Keccak are handy.

despite producing output of arbitrary length (extensible output), TupleHash256 can only produce 2^256 distinct outputs, which is between 57! and 58! .  if you have only 57 items (which incidentally is small enough for Fisher-Yates), how close does this Luby-Rackoff construction come to generating all possible permutations?

if doing many queries to the random permutation, one can save time by saving the subkeys after the key expansion (key derivation) step, or better yet, by saving the SHAKE256 internal states after the subkeys have been input (this is why we chose the subkey size to be a multiple of the SHAKE256 input block size).  the former requires 2720 bytes of storage, the latter 2000 bytes.  when permuting less than 1000 items, Fisher-Yates requires less storage (2 bytes per item to store numbers 0 through 999 representing items).  the break-even threshold is 500 items if you precompute and store both a permutation and inverse permutation.