Consider the benchmark task of generating 2,000,000,000 random integers, each between 0 and 3. This was the inner loop of a random walk on an integer lattice.
We use rand(), the system library random number generator, which in Linux's GNU libc (glibc) happens to be a lagged Fibonacci (additive feedback) generator. All programs were compiled with g++ -Wall -DNDEBUG -O99 using gcc 4.2.4.
If we scale the output of rand() the "right" way, i.e., first converting to double (a float's 23-bit mantissa cannot hold all 31 bits of the output), dividing by RAND_MAX+1.0, multiplying by 4, and casting to int, we get this program. We do not sample the lower bits because they are supposedly less random than the upper bits. The run time is 159.174 seconds.
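For concreteness, here is a minimal sketch of this scaling (the draw() helper is illustrative; the actual benchmark is the program linked above):

    #include <cstdlib>

    // "Floating point" scaling: convert to double first, since a
    // float's mantissa cannot hold all 31 bits of rand()'s output.
    int draw(void)
    {
        return (int)((double)rand() / (RAND_MAX + 1.0) * 4);
    }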
Noting that RAND_MAX = 2^31-1, we can scale for ranges that are powers of two with a simple right shift. This gets us this program. The run time is 75.257 seconds, a speedup of 2.1 times over the base "floating point" implementation.
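The corresponding sketch of this scaling (again, draw() is illustrative, not the linked program):

    #include <cstdlib>

    // rand() yields 31 random bits; keeping the top 2 gives a value
    // in [0, 4) without touching the suspect low-order bits.
    int draw(void)
    {
        return rand() >> 29;    // 31 bits - 2 wanted bits = shift by 29
    }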
Examining the open-source code of rand(), we see that it has considerable overhead from trying to handle various sizes of random number generator state. Also, because it lives in a separately compiled library, it cannot be inlined. Let us manually reimplement the same random number generator algorithm as an inlineable C++ class. This gets us this program. The run time is 12.457 seconds, a speedup of 6.0 times over the "pure integer" implementation and 12.8 times over the "floating point" implementation.
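As a sketch of what such a class can look like, the following mirrors glibc's published TYPE_3 logic (degree 31, separation 3: *fptr += *rptr, output *fptr >> 1), including its seeding and warm-up; the class name InlineRand is illustrative, and this is not the author's actual program:

    #include <cstdint>

    // Inlineable reimplementation (sketch) of glibc's default rand()
    // algorithm: an additive feedback generator over 31 words.
    class InlineRand
    {
        uint32_t r[31];
        int f, s;                           // "front" and "rear" indices
    public:
        explicit InlineRand(uint32_t seed) : f(3), s(0)
        {
            // Reduce into (0, 2^31-1) so Schrage's method stays in range,
            // then seed with word = 16807 * word mod (2^31 - 1) as glibc does.
            int32_t word = (int32_t)(seed % 2147483647u);
            if (word == 0) word = 1;
            r[0] = (uint32_t)word;
            for (int i = 1; i < 31; ++i) {
                int32_t hi = word / 127773, lo = word % 127773;
                word = 16807 * lo - 2836 * hi;
                if (word < 0)
                    word += 2147483647;
                r[i] = (uint32_t)word;
            }
            for (int i = 0; i < 310; ++i)   // glibc discards 10*31 outputs
                next();
        }
        int next()                          // inlineable stand-in for rand()
        {
            r[f] += r[s];
            int val = (int)((r[f] >> 1) & 0x7fffffff);
            if (++f == 31) f = 0;
            if (++s == 31) s = 0;
            return val;
        }
    };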
These are some spectacular speedups. Note that all three programs produce exactly the same result (namely, 2999980260); that is, they all implement the exact same random number algorithm (and scaling), producing exactly the same stream of random integers between 0 and 3.
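The driver is not reproduced here, but a hypothetical one in the same spirit would accumulate a checksum over the two billion draws so the compiler cannot discard the loop; assuming the quoted 2999980260 is such a sum, it would look roughly like:

    #include <cstdio>
    #include <cstdlib>

    int main()
    {
        srand(1);                           // illustrative fixed seed
        unsigned long long sum = 0;
        for (long i = 0; i < 2000000000L; ++i)
            sum += (unsigned)(rand() >> 29);
        std::printf("%llu\n", sum);         // checksum compared across variants
        return 0;
    }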
While researching the internals of rand(), I
encountered Bug
libc/3662 "Implementation bugs in random_r and friends".