Wednesday, August 20, 2008

Hash of words

Take a word and generate its primary Double Metaphone code (in uppercase). Take the MD5 hash of that code. Drop the least significant 8 bits, then keep the 10 least significant bits of what remains. The result is a 10-bit value for the word.
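
As a rough illustration of the bit selection, here is a minimal Python sketch. It assumes the Double Metaphone code has already been computed by some other library (Python's standard library has none), and it assumes the 16-byte MD5 digest is read as a big-endian 128-bit integer; the function name word_hash and the sample code "HX" are made up for this example.

```python
import hashlib


def word_hash(metaphone_code: str) -> int:
    """Map an already computed Double Metaphone code to a 10-bit value."""
    # Uppercase the code, then hash it with MD5.
    digest = hashlib.md5(metaphone_code.upper().encode("ascii")).digest()
    # Assumption: treat the 16-byte digest as one big-endian 128-bit integer.
    n = int.from_bytes(digest, "big")
    # Drop the 8 least significant bits, then keep the next 10.
    return (n >> 8) & 0x3FF


if __name__ == "__main__":
    # "HX" is only a placeholder code, not taken from the dictionary.
    print(word_hash("HX"))
```

Counting bits from the most significant end, the shift and mask pick out positions 110 to 119 of the 128-bit hash.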

In reverse, here's a dictionary generated by this Perl script run over Linux's /usr/share/dict/words. The script avoids one-syllable words, similar words, and words with short codes. With such a limited source word list, we must use bits 110-119 (0-indexed from the most significant bit of the 128-bit hash, i.e. the window left after dropping the lowest 8 bits) to achieve full coverage of all 10-bit-wide bitstrings. Other bit positions don't quite cover them all.
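
The coverage check could be sketched along these lines in Python. This is not the Perl script mentioned above, only an illustration of the idea: for each possible 10-bit window of the MD5 hash, collect the values produced by a list of codes and see whether all 1024 appear. The function names and the sample codes are hypothetical, and the digest is again assumed to be read as a big-endian integer.

```python
import hashlib


def window_values(codes, msb_start, width=10):
    """Return the set of values seen in the bit window starting at
    msb_start (0-indexed from the most significant bit)."""
    shift = 128 - (msb_start + width)
    mask = (1 << width) - 1
    seen = set()
    for code in codes:
        n = int.from_bytes(hashlib.md5(code.encode("ascii")).digest(), "big")
        seen.add((n >> shift) & mask)
    return seen


def covering_windows(codes, width=10):
    """List every window start whose values cover all 2**width bitstrings."""
    full = 1 << width
    return [start for start in range(128 - width + 1)
            if len(window_values(codes, start, width)) == full]


if __name__ == "__main__":
    # Placeholder codes; a real run would use the codes of the filtered word list.
    sample = ["HX", "TFL", "KMPT"]
    print(len(window_values(sample, 110)), covering_windows(sample))
```

With only a handful of codes no window can cover all 1024 values, so the second result is an empty list; a word list needs well over 1024 distinct codes before any window has a chance.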

Sadly, speaking 100 words per minute at 10 bits per word works out to only about 16.7 bits per second, and even less if an error-correcting code is used.
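
The arithmetic is easy to check with a few lines of Python. The function name and the code_rate parameter are made up for this sketch, and the rate of 4/7 (a Hamming(7,4) code) is just an illustrative choice of error-correcting code, not one used in the post.

```python
def bits_per_second(words_per_minute, bits_per_word=10, code_rate=1.0):
    """Payload bit rate of spoken words, each carrying bits_per_word bits.

    code_rate is the fraction of bits that carry data once an
    error-correcting code is applied (1.0 means no coding).
    """
    return words_per_minute * bits_per_word * code_rate / 60


if __name__ == "__main__":
    print(round(bits_per_second(100), 2))                 # no error correction
    print(round(bits_per_second(100, code_rate=4 / 7), 2))  # illustrative rate
```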
