Monday, March 22, 2021

[obrhagiy] counting thoroughbred names

thoroughbred horse names must be 18 characters or less, including spaces.  how many possible names are there?  (tl;dr: 54894134287950926521040514 .)

assume the following restrictions:

  1. at least 1 character.
  2. first character cannot be space; nor can last character.
  3. cannot have 2 or more spaces in a row.

in reality, there are many more rules (e.g., punctuation permitted and counted, no vulgarities, cannot be similar to famous horses), but we ignore them.

this feels like counting sentences composed of words separated by spaces.

let x be the number of letters in the alphabet.  x=26.  parameterizing by x allows trying small values for x and verifying answers by hand.

ignore the first and last characters for now; they will just contribute a factor of x^2.

let f(n) be the number of possibilities of middle characters in a (middle character)-string of length n.

if the first character of the string is a letter, there are x possibilities for that letter.  the rest of the string is the same problem just 1 character shorter, so x*f(n-1) possibilities.

if the first character is a space (1 possibility), the next character must be a letter (x possibilities) (because cannot have two spaces in a row), and then the next character after that can be anything, the same problem 2 characters shorter.  product of possibilities is 1*x*f(n-2).

total: f(n) = x * (f(n-1) + f(n-2)) , foreshadowing Fibonacci.

base cases: f(0)=1 (empty string is the 1 possibility); f(1)=x+1 (all letters plus space).

in Pari/GP:

f(n)=if(n==0,1,if(n==1,x+1,x*(f(n-1)+f(n-2))))

if the alphabet has only 1 letter, then the number of possibilities of middle characters is the Fibonacci sequence:

? x=1
1

? for(i=0,10,print(i," ",f(i)))
0 1
1 2
2 3
3 5
4 8
5 13
6 21
7 34
8 55
9 89
10 144

we explicitly list the possibilities for short strings, verifying the numbers above.  period is the space character.

f(1)={a .}
f(2)={a. .a aa}
f(3)={aaa aa. a.a .aa .a.}
f(4)={aaaa aaa. aa.a a.aa .aaa a.a. .a.a .aa.}

alphabet of 2 characters, and lists:

? x=2
2
? f(2)
8

{aa ab ba bb a. b. .a .b}

? f(3)
22

{aaa aab aba abb baa bab bba bbb
aa. ab. ba. bb
a.a a.b b.a b.b
.aa .ab .ba .bb
.a. .b.}

? kill(x)
? for(i=0,10,print(f(i)))
1
x + 1
x^2 + 2*x
x^3 + 3*x^2 + x
x^4 + 4*x^3 + 3*x^2
x^5 + 5*x^4 + 6*x^3 + x^2
x^6 + 6*x^5 + 10*x^4 + 4*x^3
x^7 + 7*x^6 + 15*x^5 + 10*x^4 + x^3
x^8 + 8*x^7 + 21*x^6 + 20*x^5 + 5*x^4
x^9 + 9*x^8 + 28*x^7 + 35*x^6 + 15*x^5 + x^4
x^10 + 10*x^9 + 36*x^8 + 56*x^7 + 35*x^6 + 6*x^5

we recognize Pascal's triangle in the polynomial coefficients, so conjecture the following faster formula to be equivalent to the recursive definition.

ff(n)=sum(i=0,n,x^i*binomial(i+1,n-i))

we numerically verify the conjecture (this is not a proof), comparing algebraic expressions in x, for string lengths up to 32:

? for(i=0,32,if(f(i)!=ff(i),break);print(i))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

time = 7,941 ms.

alternatively, memoization could have achieved efficient computation of f.

the total number of possible thoroughbred names of lengths up to n is the number of 1-character names, namely x, plus the number of the 2-character through n-character names.  here we incorporate the factor of x^2 for the first and last characters of names of length at least 2.  length 0 is a special case: the empty string is not a valid name.

sf(n)=if(n<=0,0, x + x^2*sum(i=0,n-2,ff(i)))

? sf(0)
0
? sf(1)
x
? sf(2)
x^2 + x
? sf(3)
x^3 + 2*x^2 + x
? sf(18)
x^18 + 17*x^17 + 121*x^16 + 470*x^15 + 1093*x^14 + 1586*x^13 + 1486*x^12 + 968*x^11 + 511*x^10 + 256*x^9 + 128*x^8 + 64*x^7 + 32*x^6 + 16*x^5 + 8*x^4 + 4*x^3 + 2*x^2 + x

? x=26
26
? sf(18)
54894134287950926521040514

thus, there are about 5 * 10^25 possible thoroughbred horse names using the letters A-Z and space.  86 computer bits.

we next compare with simpler versions of the problem.

let y1 be the number of strings without spaces, exactly 18 characters:
? y1=26^18
29479510200013918864408576

let y2 be the number of non-empty strings without spaces, up to 18 characters:
? y2=sum(i=1,18,26^i)
30658690608014475618984918

let y3 be the number we calculated above:
? y3=sf(18)
54894134287950926521040514

then, we have the following ratios:
y2/y1 = 1.04
y3/y1 = 1.86
y3/y2 = 1.79

let y1s be the number of 18-character strings permitting spaces anywhere in the middle 16 characters, including permitting 2 or more spaces in a row:
? y1s=26^2 * 27^(18-2)
53922115519965816667632036

let y2s be the number of strings, from 1 to 18 characters, permitting spaces anywhere in the middle, including permitting 2 or more spaces in a row:
? y2s=26+sum(i=2, 18, 26^2 * 27^(i-2))
55996043039964501924079422

we have the following ratios:
y1s/y1 = (27/26)^16 = 1.829
y2s/y2 = 1.826
y2s/y3 = 1.020

next, we explore primality and factorization.

below is the prime factorization of the number of thoroughbred names.  it is a multiple of 26, as expected.

? print(factor(sf(18)))
[2, 1; 3, 7; 13, 1; 19, 1; 43, 1; 79, 1; 14957350184566129, 1]

we can also factor the polynomial.  is it interesting that (x+1)^2 is a factor?

? factor(sf(18))

[x 1]

[x + 1 2]

[x^15 + 15*x^14 + 90*x^13 + 275*x^12 + 453*x^11 + 405*x^10 + 223*x^9 + 117*x^8 + 54*x^7 + 31*x^6 + 12*x^5 + 9*x^4 + 2*x^3 + 3*x^2 + 1 1]

next, we do calculations including the empty string, because that is the computer-science right thing to do.

prime factorization of the integer:

? x=26;
? print(factorint(1+sf(18)))
[5, 1; 17, 1; 199, 1; 3245293188764465061841, 1]

the polynomial is irreducible:

? kill(x)
? factor(1+sf(18))

[x^18 + 17*x^17 + 121*x^16 + 470*x^15 + 1093*x^14 + 1586*x^13 + 1486*x^12 + 968*x^11 + 511*x^10 + 256*x^9 + 128*x^8 + 64*x^7 + 32*x^6 + 16*x^5 + 8*x^4 + 4*x^3 + 2*x^2 + x + 1 1]

irreducibility is common.  string lengths which result in an irreducible polynomial:

? for(i=0, 100, if(polisirreducible(1+sf(i)), print(i)))

1 2 3 5 6 8 9 11 12 14 15 17 18 20 21 23 24 26 27 29 30 32 33 35 36 38 39 41 42 44 45 47 48 50 51 53 54 56 57 59 60 62 63 65 66 68 69 71 72 74 75 77 78 80 81 83 84 86 87 89 90 92 93 95 96 98 99

complement, i.e., the polynomial can be factored:

? for(i=0, 100, if(!polisirreducible(1+sf(i)), print(i)))

0 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73 76 79 82 85 88 91 94 97 100

returning to 18-character strings, alphabet sizes which result in a prime number:

? for(x=0, 2000, y=eval(1+sf(18)); if(ispseudoprime(y), print(x)))

69 110 194 330 444 560 714 890 1605 1940

future work: permit balanced parentheses in a sentence.

No comments :