Tuesday, January 31, 2006

Simple speech recognition

Suppose you want to assign names to commands for simple (look up the word) speech commands. You have a bunch of candidate spoken names for each command.

Consider a set of set of nodes, with a distance function defined between all pairs. Choose one exactly node from each set, maximizing the minimum distance between two chosen nodes, and also maximizing the second smallest distance after that, and so forth.

It becomes more interesting if arbitrary constraints across sets (these nodes must all be chosen together) are possible; multiple languages for the purpose of disambiguation.

Talk to your phone

When your smart phone requires a speech recognition, it could off-load the speech recognition computation task to a more powerful computer (than the phone) elsewhere.

Friday, January 27, 2006

bzip2 is worse backwards

Reversing English text causes bzip2 to perform slightly worse. Does this indicate the English is better predicted by the following string than the preceding string?

2014913 1999.08.wm.a 2036979 1999.08.wm.b 780095 2000.01.wm.a 788875 2000.01.wm.b 808188 2000.09.wm.a 816975 2000.09.wm.b 873623 2001.02.wm.a 884963 2001.02.wm.b 691899 2001.09.wm.a 699738 2001.09.wm.b 519555 2002.02.wm.a 525283 2002.02.wm.b 878585 2002.09.wm.a 888467 2002.09.wm.b 564562 2003.02.wm.a 570720 2003.02.wm.b 801120 2003.09.wm.a 809459 2003.09.wm.b 755611 2004.02.wm.a 763328 2004.02.wm.b 1172502 2004.09.wm.a 1185160 2004.09.wm.b 1066360 2005.02.wm.a 1076694 2005.02.wm.b 1300432 2005.09.wm.a 1314282 2005.09.wm.b 509209 now.wm.a 514840 now.wm.b

bw cipher

How can the predictive properties of Burrows-Wheeler transform be used to make a substitution cipher that foils frequency analysis?

Probably right after Move-To-Front (MTF). Arithmetic coding or partial squares for growth ratios less than 2. (For 2, use Poe's method of randomly choosing among many codes for "1", etc.) Order-0 coding is sufficient? (Is there alternation after BWT?)

quoted regexp bash

escaped (and doubly escaped) symbols for sh for inside a perl regular expression

\! \" \# \\\$ \\% \\\& \' \\\( \\\) \\\* \\+ , - \\. \\/ : \; \< = \> \\\? \\@ \\\[ \\\\ \\\] \\^ _ \` \{ \\\| \} \~

Factor -> Discrete Log?

Given an efficient factoring oracle, can one compute discrete logarithms quickly?

sqashing bigrams

perl -pwe 's/the/A/g; s/in/B/g; s/re/C/g; s/that/D/g; s/on/E/g; s/an/F/g; s/er/G/g; s/ou/H/g; s/it/I/g; s/en/J/g; s/st/K/g; s/or/L/g; s/Bg/M/g; s/al/N/g; s/at/O/g; s/to/P/g; s/is/Q/g; s/as/R/g; s/ar/S/g; s/le/T/g; s/ed/U/g; s/th/V/g; s/es/W/g; s/of/X/g; s/ic/Y/g; s/ly/Z/g'

#! perl -pw s/\Q th/A/g; s/\Qe /B/g; s/\Qt /C/g; s/\Qs /D/g; s/\Qin/E/g; s/\Qer/F/g; s/\Q a/G/g; s/\Qon/H/g; s/\Qd /I/g; s/\QAB/J/g; s/\Qto /K/g; s/\Qy /L/g; s/\Qen/M/g; s/\Qou/N/g; s/\QEg /O/g; s/\Qor/P/g; s/\Q. /Q/g; s/\Qan/R/g; s/\Qre/S/g; s/\Q, /T/g; s/\Qat/U/g; s/\Qit/V/g; s/\Qal/W/g; s/\Qar/X/g; s/\QAaC/Y/g; s/\Qst/Z/g;

Is greedy suboptimal?

Thursday, January 26, 2006

Punctuation Frequences

Punctuation Frequences

perl -nlwe 'next if /^Auth/ or /^From:/;print' wm| sort | uniq -c | perl -nlwe 'die unless /^\s*(\d+) (.*)/;print $2 if $1==1' | perl -plwe 's/[A-Za-z0-9 ]//g' | perl -nlwe '@F=split//;print for @F'| sort | uniq -c | sort -nr > www/one/punctuation-frequencies

There really are 27 letters in the alphabet (symbols used to form words), including apostrophe.

Word frequencies:
perl -nlwe 'next if /^Auth/ or /^From:/;print' wm| sort | uniq -c | perl -nlwe 'die unless /^\s*(\d+) (.*)/;print $2 if $1==1' | perl -plwe 's/http\S+//g'|perl -plwe '$_=lc;s/[^a-z0-9\x27]/ /g' | perl -nlwae 'for(@F){s/^\x27*//;s/\x27*$//;print}' | sort|uniq -c | sort -nr |

Wednesday, January 25, 2006

large prime groups

http://www.ietf.org/internet-drafts/draft-ietf-tls-srp-10.txt

RFC 3526

alphabet simplification

Attempts to make an alphabet simpler deprives words of their etymological, as well as other, meaning, hidden in the weird spellings. For example, replacing "ph" with "f" removes the Greek origin, potentially causing ambiguity (collision) between the "f" and (formerly) "ph" versions of the word. Korean suffers from this, where they still need to occasionally use Chinese characters to disambiguate words (usually names) with the same pronunciation.

Huffman, aithmetic, etc. coding is instead the way.

Monday, January 23, 2006

JVM extension

It occurs to me that the JVM specification might be able to add registers while still maintaining backward compatibility, i.e., the new JVM can still run old class files. Older programs simply need not use the registers. The main problem might a completely full opcode space (abuse opcode 186?).

Of course, since a class file has version numbers in it, one could also simply scrap everything and define a new JVM, ( What would it be like, from lessons learned from the current JVMs?

And on a slightly unrelated note, what JITs use escape analysis to allocate objects on the stack instead of the heap? Are stack allocation and registers the main impediments to Java runtime performance?

(Hmm, Java explicitly forbids self-modifying code.)

Saturday, January 21, 2006

prime multipliers small subgroup

a(n)=2*p*a(n-1)+1 where p is unity or odd prime.

q=3; for(j=1,1000, forprime(ii=1,500000, if(ii==2,i=2,i=2*ii); p=q*i+1; if(isprime(p)==1, print(j," ",i," ",log(p)/log(2));break)); q=p)

1 3 5 1 3 17 13 3 17 277 157 3 11 19 67 307 3 11 673 37 43 181 211 409 127 877 397 13 31 19 337 103 199 1 433 3631 643 1597 4561 337 4861 571 709 283 1861 457 907 271 2617 6529 3847 61 127 883 3121 4813 2017 4831 1699 109 3793 823 739 1933 2659 577 4567 271 73 1597 2371 5737 4651 601 631 4057 8329 4273 787 5527 1237 2203 8389 307 3 1811 6607 1783 103 3907 15643 223 61 6271 67 2377 5527 2677 2131 8011 16267 1747 1201 3307 1033 6823 151 1327 14557 6361 79 3943 5431 8929 3037 8707 1933 7177 3469 24007 6397 1459 109 5827 157 379 9433 24631 4423 181 283 6967 4261 3019 9949 3109 3469 859 5407 4999 2347 3253 18289 1627 5347 2131 823 1303 151 157 7 3697 7867 2887 409 1123 11887 18457 10723 8317 4987 577 1129 487 541 6691 16057 3313 2467 13417 8287 3511 97 23719 13 12721 1579 2137 3877 733 1489 16333 1 2311 331 79 8461 1063 38653 8431 10477 5059 21379 1153 23131 13 1741 57037 13177 30253 9697 2671 829 8863 5113 12781 8269 31327 457 12421 1567 3319 6121 13933 2551 12253 42751 3499 139 13879 52807 13327 8599 16879 5077 7699 58153 12277 2797 24229 23773 10687 8089 4423 5701 3559 8893 15307 4567 17467 193 15139 26371 11083 43 23017 12919 14827 27409 17923 10477 16699 1879 787 11491 4957 34273 9967 5923 2719 421 20287 3559 10093 30181 43201 11941 6991 3259 8863 17137 6163 11953 54121 2239 7507 613 17929 4561 46819 10399 35089 34147 15139 30223 5557 7489 30427 661 709 2143 21187 86869 157 4861 11527 41113 907 1951 13513 2131 12853 997 55663 9649 67843 12781 12967 7369 29383 15823 22291 1249 6373 3331 9931 19231 42577 32479 19087 8287 18493 6397 4093 57457 1543 21871 27397 457 45013 7057 1747 7603 14197 28393 11551 10453 10831 27457 421 56599 9001 2089 11827 5689 14143 22639 8011 17749 3727 11839 643 53197 20323 6961 20287 739 30577 7717 25537 17011 619 499 54583 34603 7621 229 2161 61057 35869 4327

Ending at 4548 bit prime.

Thursday, January 19, 2006

factoring Sylvester's sequence

Sylvester's Sequence -- From MathWorld

? a=2;for(i=0,10,print(i," ",factor(a));a=a*a-a+1)
0 Mat([2, 1])
1 Mat([3, 1])
2 Mat([7, 1])
3 Mat([43, 1])
4 [13, 1; 139, 1]
5 Mat([3263443, 1])
6 [547, 1; 607, 1; 1033, 1; 31051, 1]
7 [29881, 1; 67003, 1; 9119521, 1; 6212157481, 1]
8 [5295435634831, 1; 31401519357481261, 1; 77366930214021991992277, 1]
9 [181, 1; 1987, 1; 112374829138729, 1; 114152531605972711, 1; 35874380272246624152764569191134894955972560447869169859142453622851, 1]

10 does not look promising: small factors 2287 and 2271427 and large cofactor taken out to B1=457000 in ecm -one -I 1 -c 0 1. The algebraicity might help: it's one plus the product of all the factors seen so far. At 199-digits, or 661 bits, it's just barely within GNFS.

Update: Sylvester 10th factored

A091335

Wednesday, January 18, 2006

Proof that 78557 is a Sierpinski Number

Proof that 78557 is a Sierpinski Number

One can probably make a pretty picture that suggests 78557*2^k repeats with period 36. I wonder if a repeat can be detected visually on any of the unsolved smaller numbers.

two power prime sequence exponents

A084435 may not be infinite, if it hits a Sierpinski number. The exponents I have are: 0,1,2,1,5,1,1,29,3,37,31,227,835,115, and then it gets stuck. So the 1290-bit 389 digit prime 19668389864872500557560203790497858277784492418873798407026612403944566893365351322116868514901310836920700982678919663050434757651618869728107413733749722845223292398631963057602012174833030924955644631702803467805757838961272217158156264530315202755743516127681944466901139382567960961717373223867237148005010940960999146246286142747282353242557953406503238201229315676231109030713491457 might be special.

Update: the next term is 7615.

The term after that is greater than 4165 (giving up).

A113767

prime multiple

Base number p_0. p_i+1=n*p_i+1 s.t. p is prime, n minimized.

for p_0=3, sequence continues 3 7 29 59 709 2837 22697 590123 1180247 9441977 169955587 2719289393 5438578787 32631472723 391577672677 1566310690709 50121942102689 1503658263080671 9021949578484027 360877983139361081 21652678988361664861 476358937743956626943 5716307252927479523317

Multipliers: 2 4 2 12 4 8 26 2 8 18 16 2 6 12 4 32 30 6 40 60 22 12 208 18 48 168 18 76 6 232 48 4 50 78 28 236 68 1026 10 162 6 138 162 48 120 330 82 414 130 188 164 6 6 100 126 10 36 40 48 94 14 218 954 132 18 174 18 796 2 750 132 114 36 328 144 34 572 416 396 360 180 400 30 382 534 406 348 250 270 522 76 168 256 14 956 974 570 700 308 122 1082 14 138 1230 588 478 2292 24 120 888 820 236 854 554 1824 520 92 810 1300 98 390 1042 644 680 170 12 366 918 202 56 542 408 258 118 90 342 910 192 1074 270 1546 1568 686 182 128 1550 656 620 1440 106 1086 636 352 300 90 432 70 36 2634 42 480 838 434 3230 1098 430 398 210 316 50 1380 1108 300 28 1422 30 2116 1130 48 624 28 218 308 1190 1158 18 42 816 2344 314 696 42 712 618 1606 3012 1014 876 810 2470 458 290 2022 496 1950 958 2300 798 1330 1772 230 2240 3036 3478 2378 1670 140 1032 366 484 2066 372 496 24 5766 3180 906 6 300 1500 346 264 4378 1302 394 242 3558 2772 448 558 3376 182 288 300 4086 1414 932 1212 2062 3668 890 2702 686 2208 1074 1084 404 30 58 1692 3942 390 612 2076 2692 540 142 1640 590 306 1974 534 1348 44 2642 330 1158 364 2738 1110 418 2400 1768 4142 3774 1468 1328 566 1542 1026 580 1238 1064 494 1454 894 184 1070 3800 4476 3096 192 756 2626 2730 4552 2532 216 4392 1420 1742 1184 174 2872 428 14336 620 1748 818 302 1998 2898 2094 1594 1310 80 758 180 952 2964 706 1898 806 1190 2528 1040 138 1420 3242 786 378 2472 1978 6878 1178 8844 1090 2768 5220 90 1030 1370 542 1524 1344 700 1656 1374 2344 5888 320 3416 1806 1564 5400 1510 1476 4420 3798 7680 1242 2014 1770 112 596 152 1430 7652 404 1766 738 2946 4294 1814 1482 6090 210 6762 4140 366 1786 2286 1498 66 862 2442 5940 1722 892 1046 4650 1110 1270 312 8260 3032 2396 8648 2304 484 2930 1260 1956 5110 308 2670 4348 3272 428 3486 5478 96 90 2286 3040 530 10688 734 2460 526 3450 1248 58 776 578 432 3354 2244 1456 2940 190 2522 36 8370 5086 4440 5452 4650

A061092, A113335

Tuesday, January 17, 2006

brightness by spectrum

For each frequency across the EM spectrum, what is the brightest object in the sky? (Outside of solar system, with special note if things outside the solar system outshine the sun.)

Crab nebula, Markarian 501, Scorpius X-1, Puppis A, Vela supernova remnant, Sirius, Betelgeuse, R Doradus, Eta Carinae, Cassiopeia A, (maybe Cygnus A?), inner galaxy

GRB

I hadn't realized astronomers call the constellation Scorpius, but astrologers Scorpio.

Black Holes

Milky Way black hole estimated 5e+36 kg

M87 black hole estimated 4e+39 kg

(millions vs billions of "sunmass")

M87 black hole is probably one of the largest single things in the universe.

Monday, January 16, 2006

atom smasher fusion

Why not use particle accelerator technology to drive fusion: e.g., beams of highly accelerated protons into a hydrogen target, or beams in opposite directions? Seems simple, but more importantly, a mature, well-understood technology.

No supernovae in M87

M87 is a large galaxy with old stars. Why aren't supernovae nearly constantly popping? (Last observed 1919.)

Friday, January 13, 2006

Mailing list to blog syndication

A mailing list (of, say, announcements) can easily become an RSS feed via blogger's email-to-post gateway and site feed capability.

Wednesday, January 11, 2006

three-factor authentication

Consider a ATM card with a (preferably large) photo.

1. Something you have: the card

2. Something you know: the PIN

3. Something you are: the photo

prime binary approximant sqrt 2

echo 'scale=1000; obase=2; sqrt(2)/2' | bc | perl -pwe 's/\.//; s/\\\n//' | perl -nlwe 'BEGIN {print "ibase=2"} for($i=0; $i<length; $i++) { $x=substr $_,$i,1; if ($x eq "1") {print substr $_,0,($i+1)}}' | bc | perl -pwe 's/\\\n//' | perl -nlwe 'print "isprime($_)"' | athrun gp | grep '1$'

%2 = 1
%3 = 1
%5 = 1
%7 = 1
%8 = 1
%11 = 1
%41 = 1
%63 = 1
%67 = 1
%73 = 1
%642 = 1
%892 = 1
%1594 = 1

Monday, January 09, 2006

Black hole split

Under what conditions, or modifications to laws of physics, can a black hole be split in two? Neutron stars?

--

Mobile Email from a Cingular Wireless Customer http://www.cingular.com

increment

javascript:(
function(){ 
   var e,s; 
   IB=1; 
   function isDigit(c) { 
      return ("0" <= c && c <= "9") 
   } 
   L = location.href; 
   LL = L.length; 
   for (e=LL-1; e>=0; --e) 
     if (isDigit(L.charAt(e))) { 
       for(s=e-1; s>=0; --s) 
         if (!isDigit(L.charAt(s))) 
           break; 
       break; 
     } 
   ++s; 
   if (e<0) return; 
   oldNum = L.substring(s,e+1); 
   newNum = "" + (parseInt(oldNum,10) + IB); 
   while (newNum.length < oldNum.length) 
     newNum = "0" + newNum; 
   location.href = L.substring(0,s) + newNum + L.slice(e+1); 
}
)();

Saturday, January 07, 2006

safe prime 3072-bit

The largest "safe prime" less than 2^3072 is 2^3072-1103717. 2 is the minimum primitive root.

Semantic Selection in web documents

Use semantic markers such as quotation marks, punctiation, tag boundaries, and natural language processing to allow smart selection of a phrase in text. Possibly via right-click context menu, e.g., click on the word "semantic" in the first sentence, one of the choices is "Web search for the noun-phrase ``semantic markers''".

UI problem of multiple possible selections around a point.

Friday, January 06, 2006

New Mersenne prime

2^30402457-1 is prime. The factors of the predecessor of the exponent are 30402456 = 2 * 2 * 2 * 3 * 7 * 37 * 67 * 73

Relatively smooth, again.

Thursday, January 05, 2006

binary constants

Some constants related to the number 2, in binary:

2 = 10

1/2 = 0.1

sqrt(2) = 1.01101010000010011110011001100111111100111011110011001001

sqrt(1/2) = 0.101101010000010011110011001100111111100111011110011001001

2^sqrt(2) = 10.1010101001000110111000101111001111111011000000000110001011

sqrt(2)^sqrt(2) = 1.1010000111101101010010001100000011010011100101010000111001

log(2) = .10110001011100100001011111110111110100011100111101111001101

1/log(2) = 1.011100010101010001110110010100101011100000101111111

log(log(2)) = -0.010111011101001111001010011011110111010110101110011110101

(log 2)^2 = 0.01111010111111101111011111111110000010110001011000111010101

phi = (sqrt(2^2+1)-1)/2 = .100111100011011101111001101110010111111101001010011111

wmf, per-program permissions

The WMF exploit makes it more clear why we need to have per-program permissions. This, say, web browser has read permissions to the preferences directory, read-write only to the cache directory, and insert (no read, no overwrite) to the downloads directory. And no permissions, read or write, to anywhere the user (usually a Windows superuser) has access to, unless specific case-by-case authorization by the user by dialog box.

At the same time, more compartmentalized (package-ized) software installs.

Monday, January 02, 2006

Countdown to 2038

#! perl -wl
$t=time;
$_=unpack("B*",pack "N",$t);
s/^0//;$d=2147483_647-$t;
$d++;
print "Unix time in binary is $_.";
print "There are $d seconds remaining."

On Sat Feb 25 06:58:08 2006, the counter will hit 1000100000000000000000000000000, only two bits set. On Sat May 13 01:27:28 2006, there will be only 1 billion seconds remaining.

I would like to see, as a work of art, a mechanical binary clock which counts forwards, with a way of a indicating a connection being made when the clock hits 31 ones, perhaps an electrical connection to a bomb. Lights for the consecutive low-order bits that are positive.