Wednesday, March 04, 2026

[ibrwpoia] full coverage by uniform sampling

B different balls in an urn.  uniformly draw with replacement N times.  what is the probability that all B balls are seen during the N draws?

given B, for what value of N is the probability (about) 50%?

simulation in Pari/GP:

u(trials, B, N)= my(success=0); for(c=1, trials, my(a=vector(B)); for(i=1, N, r=1+random(B); a[r]=1); good=1; for(i=1, B, if(0==a[i], good=0; break)); success+=good); 1.0*success/trials;

B N
2 2
4 7
8 19-20
10 26-27
16 50
20 67
30 113
32 122
40 162
50 213-214
60 267-268
64 289-290

B=64 balls, N=290 draws:
u(10^5, 64, 290)
0.50303

cpu time = 21,920 ms, real time = 21,924 ms.

exact answer: 1 - probability of failure by the inclusion-exclusion principle, simplifies to:

v(B, N)= sum(i=0, B, (-1)^i * (B-i)^N * binomial(B, i)) /B^N

1.*v(64,290)
0.504257282

exact bounds:

10 [26, 27]
20 [66, 67]
30 [112, 113]
40 [161, 162]
50 [213, 214]
60 [267, 268]
70 [322, 323]
80 [379, 380]
90 [437, 438]
100 [496, 497]
110 [556, 557]
120 [618, 619]
130 [679, 680]
140 [742, 743]
150 [806, 807]
160 [870, 871]
170 [934, 935]
180 [1000, 1001]
190 [1066, 1067]
200 [1132, 1133]
210 [1199, 1200]
220 [1266, 1267]
230 [1334, 1335]
240 [1402, 1403]
250 [1471, 1472]
260 [1540, 1541]

2 [2, 2]
4 [6, 7]
8 [19, 20]
16 [50, 51]
32 [122, 123]
64 [289, 290]
128 [667, 668]
256 [1512, 1513]
512 [3381, 3382]
1024 [7472, 7473]
2048 [16364, 16365]
4096 [35569, 35570]

the growth rate of N seems to be O(B*log B).  let's use N = B*log B exactly and calculate the probability of full coverage:

v(B, round(B*log(B)))

B N v
4 6 0.380859375
8 17 0.365562211
16 44 0.347014424
32 111 0.363466498
64 266 0.364065310
128 621 0.366295109
256 1420 0.367804277
512 3194 0.367507854
1024 7098 0.367763404
2048 15615 0.367750482
4096 34070 0.367873303

the probability seems to be converging to exp(-1) = 0.367879441 .

how many draws N are necessary to achieve at least a given probability of full coverage?

previously similar: Euler's Game of Coincidence

No comments :