Unary can encode arbitrarily large non-negative integers. Adding a trailing 0 allows encoding a sequence of numbers by concatenating their representations. Let's call the following encoding Encoding 1.
x | Encoding 1 | Bit length |
---|---|---|
0 | 0 | 1 |
1 | 10 | 2 |
2 | 110 | 3 |
3 | 1110 | 4 |
4 | 11110 | 5 |
5 | 111110 | 6 |
6 | 1111110 | 7 |
7 | 11111110 | 8 |
8 | 111111110 | 9 |
9 | 1111111110 | 10 |
Unary takes up a lot of space; binary is better. For arbitrary sized integers (and especially, to encode a collection of integers), we need a way of encoding the length of the binary portion. This can be done by prefixing a header which denotes the length of the following binary portion, with the header encoded in Encoding 1. The header is highlighted in bold below. We form all possible bit strings of this format and line them up with the natural numbers to yield a bijection. Let's call this combined scheme Encoding 2.
x | Encoding 2 | Bit length |
---|---|---|
0 | 0 | 1 |
1 | 100 | 3 |
2 | 101 | 3 |
3 | 11000 | 5 |
4 | 11001 | 5 |
5 | 11010 | 5 |
6 | 11011 | 5 |
7 | 1110000 | 7 |
8 | 1110001 | 7 |
9 | 1110010 | 7 |
10 | 1110011 | 7 |
11 | 1110100 | 7 |
12 | 1110101 | 7 |
13 | 1110110 | 7 |
14 | 1110111 | 7 |
15 | 111100000 | 9 |
30 | 111101111 | 9 |
31 | 11111000000 | 11 |
63 | 1111110000000 | 13 |
127 | 111111100000000 | 15 |
255 | 11111111000000000 | 17 |
Because we reset the binary portion back to 000...0 for each new length, converting back and forth from a number x to its encoding is a little bit tricky.
Encoding 2 still takes up quite a bit of space, with the header consuming over half the length of an encoded number. We can improve this by recursion: encode the header (highlighted in bold below) in Encoding 2 (instead of Encoding 1), then follow it with a binary portion as before. Let's call this combined scheme Encoding 3.
x | Encoding 3 | Bit length |
---|---|---|
0 | 0 | 1 |
1 | 1000 | 4 |
2 | 1001 | 4 |
3 | 10100 | 5 |
4 | 10101 | 5 |
5 | 10110 | 5 |
6 | 10111 | 5 |
7 | 11000000 | 8 |
8 | 11000001 | 8 |
9 | 11000010 | 8 |
10 | 11000011 | 8 |
11 | 11000100 | 8 |
12 | 11000101 | 8 |
13 | 11000110 | 8 |
14 | 11000111 | 8 |
15 | 110010000 | 9 |
31 | 1101000000 | 10 |
63 | 11011000000 | 11 |
127 | 11100000000000 | 14 |
255 | 111000100000000 | 15 |
We can recurse until infinity, with Encoding N having its header encoded in Encoding N-1. There is always an integer large enough such that Encoding N saves space over N-1. (Practically, there is no benefit beyond Encoding 4.). Translating back and forth efficiently from a (large) number and its encoding continues to be a little tricky.
This scheme is similar to encoding the length of the length of a number in decimal.
Next, we can recurse "beyond infinity". For each bit string in Encoding N, prefix it with N encoded in Encoding 1. This prefix is highlighted in bold below, demonstrating prefixing on Encoding 1, 2, and 3. (We had arbitrarily chosen for the Encoding indices to start from 1 not 0 to preserve the elegance of encoding 0 as just 0.)
[100 1010 10110 101110 1011110 10111110 ...]
[1100 110100 110101 11011000 11011001 11011010 ...]
[11100 11101000 11101001 111010100 111010101 111010110 ...]
...
Sort the elements of this infinite list of infinite lists of bit strings first by bit length then lexicographically. Call this Encoding Omega (inspired by ordinal numbers, where omega is the first number beyond all the finite numbers).
x | Encoding Omega | Bit length |
---|---|---|
0 | 0 | 1 |
1 | 100 | 3 |
2 | 1010 | 4 |
3 | 1100 | 4 |
4 | 10110 | 5 |
5 | 11100 | 5 |
6 | 101110 | 6 |
7 | 110100 | 6 |
8 | 110101 | 6 |
9 | 111100 | 6 |
10 | 1011110 | 7 |
11 | 1111100 | 7 |
12 | 10111110 | 8 |
13 | 11011000 | 8 |
14 | 11011001 | 8 |
15 | 11011010 | 8 |
16 | 11011011 | 8 |
17 | 11101000 | 8 |
18 | 11101001 | 8 |
19 | 11111100 | 8 |
20 | 101111110 | 9 |
21 | 111010100 | 9 |
22 | 111010101 | 9 |
23 | 111010110 | 9 |
24 | 111010111 | 9 |
25 | 111111100 | 9 |
26 | 1011111110 | 10 |
27 | 1101110000 | 10 |
28 | 1101110001 | 10 |
29 | 1101110010 | 10 |
30 | 1101110011 | 10 |
31 | 1101110100 | 10 |
32 | 1101110101 | 10 |
33 | 1101110110 | 10 |
34 | 1101110111 | 10 |
35 | 1111010000 | 10 |
36 | 1111010001 | 10 |
37 | 1111111100 | 10 |
38 | 10111111110 | 11 |
39 | 11110100100 | 11 |
40 | 11110100101 | 11 |
41 | 11110100110 | 11 |
42 | 11110100111 | 11 |
43 | 11111111100 | 11 |
44 | 101111111110 | 12 |
45 | 110111100000 | 12 |
46 | 110111100001 | 12 |
47 | 110111100010 | 12 |
48 | 110111100011 | 12 |
49 | 110111100100 | 12 |
50 | 110111100101 | 12 |
51 | 110111100110 | 12 |
52 | 110111100111 | 12 |
53 | 110111101000 | 12 |
54 | 110111101001 | 12 |
55 | 110111101010 | 12 |
56 | 110111101011 | 12 |
57 | 110111101100 | 12 |
58 | 110111101101 | 12 |
59 | 110111101110 | 12 |
60 | 110111101111 | 12 |
61 | 111011000000 | 12 |
62 | 111011000001 | 12 |
63 | 111011000010 | 12 |
64 | 111011000011 | 12 |
65 | 111011000100 | 12 |
66 | 111011000101 | 12 |
67 | 111011000110 | 12 |
68 | 111011000111 | 12 |
69 | 111110100000 | 12 |
70 | 111110100001 | 12 |
71 | 111111111100 | 12 |
72 | 1011111111110 | 13 |
73 | 1110110010000 | 13 |
74 | 1110110010001 | 13 |
75 | 1110110010010 | 13 |
76 | 1110110010011 | 13 |
77 | 1110110010100 | 13 |
78 | 1110110010101 | 13 |
79 | 1110110010110 | 13 |
80 | 1110110010111 | 13 |
81 | 1110110011000 | 13 |
82 | 1110110011001 | 13 |
83 | 1110110011010 | 13 |
84 | 1110110011011 | 13 |
85 | 1110110011100 | 13 |
86 | 1110110011101 | 13 |
87 | 1110110011110 | 13 |
88 | 1110110011111 | 13 |
89 | 1111010100000 | 13 |
90 | 1111010100001 | 13 |
91 | 1111010100010 | 13 |
92 | 1111010100011 | 13 |
93 | 1111010100100 | 13 |
94 | 1111010100101 | 13 |
95 | 1111010100110 | 13 |
96 | 1111010100111 | 13 |
97 | 1111101000100 | 13 |
98 | 1111101000101 | 13 |
99 | 1111101000110 | 13 |
100 | 1111101000111 | 13 |
101 | 1111111111100 | 13 |
102 | 10111111111110 | 14 |
103 | 11011111000000 | 14 |
104 | 11011111000001 | 14 |
184 | 11111101000001 | 14 |
185 | 11111111111100 | 14 |
186 | 101111111111110 | 15 |
187 | 111011011000000 | 15 |
294 | 111111010000111 | 15 |
295 | 111111111111100 | 15 |
296 | 1011111111111110 | 16 |
444 | 10111111111111110 | 17 |
490 | 101111111111111110 | 18 |
830 | 1011111111111111110 | 19 |
1132 | 10111111111111111110 | 20 |
2112 | 101111111111111111110 | 21 |
3566 | 1011111111111111111110 | 22 |
6978 | 10111111111111111111110 | 23 |
12784 | 101111111111111111111110 | 24 |
25412 | 1011111111111111111111110 | 25 |
48626 | 10111111111111111111111110 | 26 |
64326 | 101111111111111111111111110 | 27 |
91636 | 1011111111111111111111111110 | 28 |
150344 | 10111111111111111111111111110 | 29 |
259574 | 101111111111111111111111111110 | 30 |
518986 | 1011111111111111111111111111110 | 31 |
988664 | 10111111111111111111111111111110 | 32 |
1977164 | 101111111111111111111111111111110 | 33 |
3888634 | 1011111111111111111111111111111110 | 34 |
7777102 | 10111111111111111111111111111111110 | 35 |
15455740 | 101111111111111111111111111111111110 | 36 |
Efficiently translating between a number x and its Encoding Omega seems so difficult that Encoding Omega is probably practically useless.
Going beyond Encoding Omega is possible, choosing a different encoding (instead of Encoding 1) to express the header. Encoding Omega Plus One uses Encoding 2 as the header, and Encoding Omega Plus Omega, i.e., Encoding Two-Omega, uses Encoding Omega for the header.
Here is some Haskell code to (inefficiently) compute these encodings up to and including Encoding Omega. Encoding Omega required merge sorting an infinite list of infinite lists, which would be normally be impossible (when scanning the heads, you never know if you have found the smallest item). However, all bit strings originating from Encoding N (and above) have length at least N+2, so finding all bit strings of a given length can be done in finite time.
No comments :
Post a Comment