Consider the Zipf distribution with exponent 2. Sampling over an infinite number of items is easy. To approximately sample over a finite number of items, first sample from infinity. Then use mod to wrap it around to a finite range. How much distortion does this introduce?
Even easier: anything beyond the last item, choose the first instead. The first item gets oversampled by the area of the infinite tail. Because the first item already has pretty high probability, adding a little bit more probability won't affect it, in a relative sense, very much.
No comments :
Post a Comment