Monday, December 28, 2009

[ilqixood] Generating random numbers faster

Consider the benchmark task of generating 2,000,000,000 random integers each between 0 and 3. This was the inner loop of a random walk on an integer lattice.

We use rand(), the system library random number generator, which on Linux GNU libc (glibc), happens to be a lagged Fibonacci generator. All programs were compiled with g++ -Wall -DNDEBUG -O99 with gcc 4.2.4.

If we scale the output of rand() the "right" way, i.e., first converting to double (float only has 23 bits of precision), dividing through by RAND_MAX+1.0, multiplying by 4, and casting to int, we get this program. We do not sample the lower bits because the are supposedly less random than the upper bits. The run time is 159.174 seconds.

Noting that RAND_MAX = 231-1, we can scale for ranges that are powers of two by simply a right shift. This gets us this program. The run time is 75.257 seconds, or a speedup of 2.1 times over the base "floating point" implementation.

Examing the open-source source code to rand(), we see that it has considerable overhead due trying to handle various types of random number generator state sizes. Also, because it is in a separately compiled library, it cannot be inlined. Let us manually reimplement the same random number generator algorithm in an inlineable C++ class. This gets us this program. The run time is 12.457 seconds, or a speedup of 6.0 times over the "pure integer" implementation, and a speedup of 12.8 times over the "floating point" implementation.

These are some spectacular speedups. Note that all three programs produce exactly the same result (namely, 2999980260); that is, they are all implementing the exact same random number algorithm (and scaling), resulting in exactly the same stream of random integers between 0 and 3.

While researching the internals of rand(), I encountered Bug libc/3662 "Implementation bugs in random_r and friends".

No comments :