Thursday, November 23, 2023

[cmbruxok] geometric distribution

let a Bernoulli trial have probability of success p = 1/n .

the number of repetitions needed to achieve 1 success has the geometric distribution.

formulae in terms of p are from Wikipedia.  we have reformulated them in terms of n.

mean = 1/p = n

variance = (1-p)/p^2 = n*(n-1)

standard deviation = sqrt(variance) = sqrt((1-p)/p^2) = sqrt(n*(n-1))

series forms of the standard deviation, assuming small p or large n:

sqrt((1-p)/p^2) = p^-1 - 1/2 - p/8 - p^2/16 - (5*p^3)/128 - (7*p^4)/256 - ...

sqrt(n*(n - 1)) = n - 1/2 - (1/8)*n^-1 - (1/16)*n^-2 - (5/128)*n^-3 - (7/256)*n^-4 - ...

via Mathematica:

Series[Sqrt[n*(n-1)],{n,Infinity,10}]

Series[Sqrt[(1-p)/p^2],{p,0,10}, Assumptions -> (p>0)]

what's up with those coefficients?  try p <- 4p to get rid of some of the denominators:

Series[Sqrt[(1-(4*p))/(4*p)^2] , {p,0,3}, Assumptions -> (p>0) ]

(1/4)*p^-1 - 1/2 - p/2 - p^2 - (5/2)*p^3

Series[Sqrt[(1-(4*p))/(4*p)^2] , {p,0,63}, Assumptions -> (p>0) ] //InputForm

SeriesData[p, 0, {1/4, -1/2, -1/2, -1, -5/2, -7, -21, -66, -429/2, -715, -2431, -8398, -29393, -104006, -371450, -1337220, -9694845/2, -17678835, -64822395, -238819350, -883631595, -3282060210, -12233133510, -45741281820, -171529806825, -644952073662, -2430973200726, -9183676536076, -34766775458002, -131873975875180, -501121108325684, -1907493251046152, -14544636039226909/2, -27767032438524099, -106168065206121555, -406472021074865382, -1558142747453650631, -5979899192930226746, -22975402162310871182, -88366931393503350700, -340212685864987900195, -1311063521138246054410, -5056959295818949067010, -19522214955952221979620, -75426739602542675830350, -291650059796498346544020, -1128558927038624036626860, -4370164355766586695023160, -16934386878595523443214745, -65663949121084682738995950, -254776122589808569027304286, -989130828878080326811887228, -3842392835257158192615408078, -14934583472886312975071208756, -58078935727891217125276922940, -225979859013976735723804754712, -879707308304409435496239937986, -3426228463922436748774829232156, -13350476428387425952122610456332, -52044230144561152016749159406040, -202972497563788492865321721683556, -791925482298060021343386389519448, -3091063979292427825243540423608168, -12069868871522813412855729273136656, -94295850558771979787935384946380125/2}, -1, 64, 1]

the denominator 2 occurs on terms in which the exponent of p is of the form 2^k-1.

they are related to Catalan numbers, after searching OEIS.

cumulative distribution function:

cdf(x) = 1-(1-p)^x = 1 - (1 - 1/n)^x

how many trials x are necessary to have a (1-failurerate) probability of at least 1 success?

1-failurerate = 1-(1-p)^x

solve for x:

failurerate = (1-p)^x
log(failurerate) = x*log(1-p)
x = log(failurerate)/log(1-p)
= log(failurerate) / (log(n-1)-log(n))

the last line uses 1-p = (n-1)/n .

1 / (log(n-1)-log(n)) = -n + 1/2 + (1/12)*n^-1 + (1/24)*n^-2 + (19/720)*n^-3 + ...

so x ~= -log(failurerate) * (n - 0.5)

No comments:

Post a Comment