Consider the benchmark task of generating 2,000,000,000 random integers, each between 0 and 3. This was the inner loop of a random walk on an integer lattice. We use rand(), the system library random number generator, which on Linux GNU libc (glibc) happens to be a lagged Fibonacci generator. All programs were compiled with g++ -Wall -DNDEBUG -O99 using gcc 4.2.4.
If we scale the output of rand() the "right" way, i.e., first converting to double (float has only 23 bits of precision), dividing through by RAND_MAX + 1.0, multiplying by 4, and casting to int, we get the first program. We do not sample the lower bits because they are supposedly less random than the upper bits. The run time is 159.174 seconds.
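A minimal sketch of this "floating point" approach (the summing checksum and the output at the end are my assumptions about the benchmark harness, not the original program):

    #include <cstdio>
    #include <cstdlib>

    int main() {
        unsigned long long sum = 0;  // assumed checksum over all draws
        for (long long i = 0; i < 2000000000LL; ++i) {
            // Convert to double, divide by RAND_MAX + 1.0, scale by 4, truncate.
            int r = static_cast<int>(rand() / (RAND_MAX + 1.0) * 4);
            sum += r;
        }
        printf("%llu\n", sum);
        return 0;
    }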
Noting that RAND_MAX = 2^31 - 1, we can scale for ranges that are powers of two simply by a right shift. This gets us the second program. The run time is 75.257 seconds, a speedup of 2.1 times over the base "floating point" implementation.
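A sketch of the "pure integer" variant, under the same assumed harness; only the scaling in the inner loop changes:

    #include <cstdio>
    #include <cstdlib>

    int main() {
        unsigned long long sum = 0;  // assumed checksum over all draws
        for (long long i = 0; i < 2000000000LL; ++i) {
            // rand() supplies 31 random bits (RAND_MAX = 2^31 - 1);
            // keep only the two highest, giving an integer in [0, 4).
            int r = rand() >> 29;
            sum += r;
        }
        printf("%llu\n", sum);
        return 0;
    }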
Examining the open-source code to rand(), we see that it has considerable overhead due to trying to handle various random number generator state sizes. Also, because it is in a separately compiled library, it cannot be inlined. Let us manually reimplement the same random number generator algorithm in an inlineable C++ class. This gets us the third program. The run time is 12.457 seconds, a speedup of 6.0 times over the "pure integer" implementation, and a speedup of 12.8 times over the "floating point" implementation.
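A minimal sketch of the idea: an inlineable additive lagged Fibonacci generator with lags 3 and 31, in the style of glibc's default rand(). The seeding here is a placeholder (glibc's initialization is more elaborate), so this sketch illustrates the structure rather than reproducing the identical stream:

    #include <cstdio>

    class LaggedFib {
        unsigned int state[31];
        int f, r;  // front and rear indices, kept 3 apart
    public:
        explicit LaggedFib(unsigned int seed) : f(3), r(0) {
            // Placeholder seeding via a linear congruential recurrence;
            // glibc seeds and warms up its state differently.
            for (int i = 0; i < 31; ++i)
                state[i] = seed = seed * 1103515245u + 12345u;
        }
        int next() {
            state[f] += state[r];                       // x[i] = x[i-31] + x[i-3]
            int result = (state[f] >> 1) & 0x7fffffff;  // drop the low bit, as glibc does
            if (++f == 31) f = 0;
            if (++r == 31) r = 0;
            return result;
        }
    };

    int main() {
        LaggedFib gen(1);
        unsigned long long sum = 0;  // assumed checksum over all draws
        for (long long i = 0; i < 2000000000LL; ++i)
            sum += gen.next() >> 29;  // top two bits: an integer in [0, 4)
        printf("%llu\n", sum);
        return 0;
    }

Because next() can be inlined at the call site, both the per-call function overhead and the generality of glibc's multi-size state dispatch disappear, which is where the speedup comes from.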
These are some spectacular speedups. Note that all three programs produce exactly the same result (namely, 2999980260); that is, they all implement the exact same random number algorithm (and scaling), producing exactly the same stream of random integers between 0 and 3.
While researching the internals of rand(), I encountered Bug libc/3662, "Implementation bugs in random_r and friends".