Monday, November 07, 2016

[wisnmyni] Data as a number

Convert a number with possibly many leading zeroes in base M to base N, prefixing the base N output representation with leading zeroes in a way that unambigiously specifies the number of leading zeroes in base M input.  I think this is possible when M > N.  Some potentially useful conversions:

(M=16, N=10); (M=256, N=10); (M=256, N=100); (M=10; N=2)

The deeper idea is, whenever a string of characters represents raw data and not a word in a natural language, it should be encoded so that it is clear that the string is not meant to be interpreted as a word in natural language.  Unannotated hexadecimal fails this rule: if you see the string "deadbeef", is it a word, with a meaning perhaps related to food, or is it a number?  (Annotated hexadecimal, e.g., "0xdeadbeef" is clearly not a word.)  Simpler example: what does "a" mean, indefinite article or ten in hexadecimal?  English, and other orthographies which use Hindu-Arabic numerals, already have a character set which unambiguously state that a string encodes data and not words: the numerals.  (One could argue that numbers -- strings of Hindu-Arabic numerals -- have more meaning than strictly data: they have ordinality, they obey group and field axioms, etc.  However, we frequently see numbers which aren't that, e.g., serial numbers, phone numbers, ZIP codes, ID and credit card numbers.)

The inspiration was, expressing hashes in hexadecimal is sillyRadix conversion is quick and easy for a computer; it should be in decimal with leading zeroes if necessary.  If making the hash compact is a goal, then base 26 or base 95, with some annotation signifying it is encoded data, is better than hexadecimal.

Some Haskell code demonstrating some of the conversions.

No comments :