The integral of 1/x is log(x). For u=x+delta, the area under the curve between 1/u and 1/(u+1) is exactly log(u)-log(u-1) and approximately 1/x for some delta. Define f(delta,x)=log(x+delta)-log(x+delta-1). We experiment with different values of the delta offset, seeking to approximate 1/x well.

The value delta=0.5 is optimal for large x in the following sense. Define the error function Y(delta,x)=1/f(delta,x) - x. Then (limit (x tends to infinity) of Y(0.5,x)) = 0. However, approximations for small x are not so good: f(0.5,1) = 1.0986 = 1/0.9102. f(0.5,2) = 1/1.9576.

The value delta1=1/(e-1) = 0.581976706869326424385 causes the approximation to be exact at (only) x=1: f(delta1,1)=1. f(delta1,2)=1/2.0413.

For general delta, we claim without proof that (limit (x tends to infinity) of Y(delta,x)) = delta-0.5. For the above value delta1, f(delta1,x) ~= 1/(x+0.081976706869326424385) for large x.

Motivation is to sample from the Zipf distribution without having to compute and store partial sums of the harmonic series. Could also do a hybrid approach of computing and storing the probabilities of the more frequent items and using the above continuous approximation for the rest.

## No comments :

Post a Comment