Saturday, November 09, 2019

[wzojzneg] Standard library default random number generator

The default random number generator of a standard library ought to prioritize quality over performance: good but slow.  Below are some desired properties of a good pseudo random number generator (PRNG).  The general idea is that programmers should not need to be careful in using it.  They should not need to be careful to avoid pitfalls and mistakes.  The PRNG should be idiot proof.

  1. Period should be much greater than 2^32 so that it is practically impossible for a single thread to consume the entire period of the PRNG.  If the PRNG is implemented so that different seeds start at different points of just one stream, it should be practically impossible for one thread to consume so many samples from one seed that it overlaps with the start point of another seed.  Nowadays, it seems within the realm of imagination for a single thread to consume 2^64 random numbers.  If the PRNG offers the capability of fast forwarding the stream, then different seeds must produce independent streams; it cannot be implemented so that different seeds start at different points of just one stream.
  2. The number of distinct seeds should be large enough to avoid a birthday coincidence (a.k.a. birthday attack) even with a parallel large number of threads choosing different seeds at random.  With massively parallel processing, it seems within the realm of imagination to be initializing a PRNG 2^80 times, so it needs to support at least 160 bits of seed.  Different seeds should always produce different random streams, unlike say the glibc random number generator which produces the same stream when initialized with either of the seeds 0 or 1.
  3. All bits of the random samples should be equally random, avoiding the difficulty that many PRNGs have in which their low-order bits are not very random.  Every bit should have period at least 2^64.
  4. The initial (early) samples should be random; the user should not need to manually discard some initial number of samples, a problem seen in some RC4 based generators.  Also previously noted in Haskell.
  5. The PRNG should pass the various batteries of statistical tests devised for testing PRNGs.
  6. The default random number generator should be a cryptographically secure PRNG: csPRNG.  This would avoid mistakes that programmers sometimes make.  For example, in a non-cryptographic PRNG, with enough old samples, one can decode its internal state, with which one can then correctly predict all future samples.
  7. The default random number generator should not be vulnerable to side channel attacks.  This is an onerous requirement because it probably forces any program that uses the default random number generator to be setuid root in order to use secure memory, which is a limited resource (cannot be swapped to disk).  Of course, also provide non-default alternatives that aren't so hardened.
  8. The default behavior in the absence of the user specifying a seed is to seed it from a high quality entropy source.
  9. If the behavior of the default random number generator changes (or is even suspected to have changed due to a tweak in implementation), perhaps replaced with a better one, the old one should get a distinct name so that old results that used the old generator can be replicated, though there might need to be done some research determining which old generator it was, which may have been known only as the default generator at the time.

If the development of the program has reached a stage where optimizing the default PRNG is appropriate, not premature optimization, then the programmer can look into faster but harder-to-use (more gotchas) PRNGs also included in the library.

Some potentially useful primitives for csPRNG: ChaCha (SIMD optimized), AES-NI, SipHash.  Random number generation is also very amenable to multicore.

No comments :