Wednesday, April 20, 2016

[pcpzbfob] Lexicographic numbers

Given a set of numbers, strictly speaking, digit sequences because 01 and 1 are distinct, sort them lexicographically.  Unfortunately, in certain places, one cannot insert a new number.

It is impossible to insert anything before 0.  It is impossible to insert anything between 1 and 10.

We assume these numbers are keys to a map-like data structure, and we would always like to be able to insert anywhere.  Real numbers would work, except concrete implementations like IEEE 754 do have values which are minimum and also have pairs of numbers with nothing in between.

Fix this by considering digit sequences contrained to end in 5.

Decreasing sequence: 5 > 25 > 15 > 05 > 025 > 015 > 005 ...
Increasing sequence: 5 < 85 < 95 < 985 < 995 < 9985 ...

Equivalently, assume that every number invisibly has a 5 appended to it.  We can easily insert things before 0:  000 < 001 < 002 < 00 < 01 < 02 < 0.  This part resembles women's clothing sizes, or machined part sizes, with longer strings of zeroes meaning smaller.

Confusingly, 10 < 1.  Numbers sort like this: 10, 11, 12, 13, 14, 1, 15, 16.  One can see there are plenty of things that can be inserted between 10 and 1.  In the close neighborhood of 1 is 14 < 145 < 149 < 1 < 1502 < 150 < 152 < 15.  We can also insert things before 10.  0 < 05 < 06 < 09 < 100 < 104 < 10.

The empty string curiously sorts between 4 and 5, though perhaps it should be forbidden.

The invisible 5 at the end sometimes acts like a digit whose value is in between 4 and 5, because X4999999... < X < X5000000...

Compute the shortest number between two given numbers, or smaller or larger than a given number.  If there are many such shortest numbers, determine the middle one.  1 "" 7 8 9 97 98 99 997 998 999 ...  But when counting up, it makes more sense to compute the least shortest number: 1 "" 5 6 7 8 9 95 96 97 98 99 995 996 997 998 999 ...

I had a few false starts before discovering this scheme and am not completely convinced the scheme doesn't have problems.

The idea could be extended to sorting words with letters, even compound words with spaces in which the space character lexicographically sorts before a.  Let M be the invisibly appended letter, analogous to 5.

For binary, we need at least two invisible appended digits, 01 or 10.  For balanced ternary, I think the invisible digit could elegantly be 0.

For the purpose of creating numbers which always have another number in between, and larger, and smaller, arbitrary precision real numbers, including negative numbers and arbitrary precision after the decimal point, would work just as well and be less confusing.  Though sorting seems harder, needing to first locate the decimal point.

No comments :