Wednesday, October 12, 2016

[ugawzrcp] Entropy loss in repeated hashing

How much entropy is lost when iterating a hash function?  We expand on this answer:

The probability a given bucket is occupied (gets hit), or equivalently the proportion of buckets occupied, when there are n buckets and f[i]*n trials is

f[i+1] = 1- (1 - 1/n)^(f[i] *n)
= 1 - z^f[i]
where z = (1 - 1/n)^n
n = 2^b where b is the width of the hash, e.g., b=128 for MD5.
f[0] = 1

The quantity z could be approximated as exp(-1) for large n, and the whole expression could be approximated as f[i+1] = 1-exp(-f[i]), but we do not have to make these approximations.  Evaluate z directly with arbitrary precision arithmetic to avoid round-off error.

Similarly, the recurrence could be approximately solved as a closed-form function of its index, but we do not have to approximate.  Instead, explicitly iterate the recurrence however many times you were planning to iterate the hash function.  This computation will take roughly the same amount of time that the iterated hash function iterations would have taken.  The recurrence computation can be done with double-precision floating point, or with arbitrary precision arithmetic if you are worried about round-off error.

(Aside: the first iteration loses approximately 0.6617 bits of entropy.  If we naively assume the amount of entropy lost each iteration is the same, then SHA-512 would converge to a fixed point after 774 iterations.  This intuition is false, because as the number of trials decreases, we are less likely to lose filled buckets due to collisions.)

Finally, evaluate the bits of entropy lost as -log(f[i])/log(2).  It may be instructive to choose the number of iterations to be a power of 2: i = 2^j.  Experimentally, I confirm that the answer about j-1 bits.

Note well, this is still an approximation because the it does not account for the hash function being the same each iteration, so that values can go in a loop.

Aside: run Floyd's cycle-finding algorithm on various hash functions and other cryptographic primitives to see if short cycles can be found.

No comments :