Monday, July 06, 2015

[dcqhszdf] ChaCha cipher example

The ChaCha cipher seems not to get as much love as Salsa20. Here is a step-by-step example of the ChaCha round function operating on a matrix. The format of the example is loosely based on the analogous example in section 4.1 of this Salsa20 paper: D. J. Bernstein. The Salsa20 family of stream ciphers. Document ID: 31364286077dcdff8e4509f9ff3139ad. URL: http://cr.yp.to/papers.html#salsafamily. Date: 2007.12.25.

original column [a;b;c;d]
61707865
04030201
14131211
00000007

after first line of round function
65737a66
04030201
14131211
7a616573

after second line of round function
65737a66
775858a7
8e747784
7a616573

after third line of round function
dccbd30d
775858a7
8e747784
aab67ea6

after all 4 lines of round function, i.e., quarter round
dccbd30d
395746a7
392af62a
aab67ea6

original matrix, with the same original column above
61707865 3320646e 79622d32 6b206574
04030201 08070605 0c0b0a09 100f0e0d
14131211 18171615 1c1b1a19 201f1e1d
00000007 00000000 01040103 06020905

one round (4 quarter rounds on columns)
dccbd30d 109b031b 0eb5ed20 4483ec2b
395746a7 d88a8f5f 7a292fab b06c9135
392af62a 6ac28db6 dfbce7ba a234a188
aab67ea6 e8383c7a 8d694938 0791063e

after shift rows
dccbd30d 109b031b 0eb5ed20 4483ec2b
d88a8f5f 7a292fab b06c9135 395746a7
dfbce7ba a234a188 392af62a 6ac28db6
0791063e aab67ea6 e8383c7a 8d694938

after another 4 quarter rounds on columns
06b44c34 69a94c11 2ce99b08 216830d1
29b215bd 721e2a33 f0a18097 708e1ee5
2b0e8de3 b801251f 42265fb2 696de1c2
e6fef362 c96c6325 c6cc126e 82c0635a

unshifting rows (concludes 1 double round)
06b44c34 69a94c11 2ce99b08 216830d1
708e1ee5 29b215bd 721e2a33 f0a18097
42265fb2 696de1c2 2b0e8de3 b801251f
c96c6325 c6cc126e 82c0635a e6fef362

after 8 rounds (4 double rounds)
f6093fbb efaf11c6 8bd2c9a4 bf1ff3da
bf543ce8 c46c6b5e c717fe59 863195b1
2775d1a0 babe2495 1b5c653e df7dc23c
5f3e08d7 041df75f f6e58623 abc0ab7e

Adding the original input to the output of 8 rounds
5779b820 22cf7634 0534f6d6 2a40594e
c3573ee9 cc737163 d3230862 9640a3be
3b88e3b1 d2d53aaa 37777f57 ff9ce059
5f3e08de 041df75f f7e98726 b1c2b483

reading the above as bytes, little endian
20 b8 79 57 34 76 cf 22 d6 f6 34 05 4e 59 40 2a
e9 3e 57 c3 63 71 73 cc 62 08 23 d3 be a3 40 96
b1 e3 88 3b aa 3a d5 d2 57 7f 77 37 59 e0 9c ff
de 08 3e 5f 5f f7 1d 04 26 87 e9 f7 83 b4 c2 b1

same as above but with 20000 rounds (10000 double rounds)
11 a3 0a d7 30 d2 a3 dc d8 ad c8 d4 b6 e6 63 32
72 c0 44 51 e2 4c ed 68 9d 8d ff 27 99 93 70 d4
30 2e 83 09 d8 41 70 49 2c 32 fd d9 38 cc c9 ae
27 97 53 88 ec 09 65 e4 88 ff 66 7e be 7e 5d 65

The example was calculated using an implementation of ChaCha in Haskell, whose end results agree with Bernstein's C reference implementation. The Haskell implementation is polymorphic, allowing as matrix elements any data type (of any word width) implementing Bits, and parametrizable to matrices of any size 4xN. (Security is probably bad for N not equal to 4. For word width different from 32, you probably want different rotation amounts.) The flexibility comes at a cost: the implemention is 3000 times slower than Bernstein's reference C implementation (which is in turn is slower than SIMD optimized assembly-language implementations).

Also included in the same project is a similar Haskell implementation of Salsa20, parametrized to matrices of any size MxN because of the more regular structure of the Salsa20 quarter round function compared to ChaCha. We demonstrate taking advantage of polymorphism to use the same code both to evaluate Salsa20 on Word32 and to generate C code for the round function.

No comments :