Wednesday, December 21, 2022

[duartbli] Digit strings easy to count their length

It is easy to count the length of the following digit strings:

1
12
123
...
123456789
1234567890
12345678901
123456789012
...
1234567890123456789

Strings like these can serve as example input when it is useful to communicate the length of the string but the content of the string is not important, for example, an example block of data encoded in base 10 (future post vkhdrcsg).  Each string documents its own length.

For strings of lengths 20-29, we write the length-10 string above, then ten 2s, then up to 9 digits:

12345678902222222222
123456789022222222221
1234567890222222222212
...
12345678902222222222123456789

You have to trust that there are ten 2s.  Strings of length 30-39 follow the pattern:

123456789022222222223333333333
1234567890222222222233333333331
12345678902222222222333333333312
...
123456789022222222223333333333123456789

For expository purposes only, we introduce the shorthand (3x10) to indicate ten 3s.  So the above strings of length 30-39 we express as

1234567890(2x10)(3x10)
1234567890(2x10)(3x10)1
1234567890(2x10)(3x10)12
...
1234567890(2x10)(3x10)123456789

This shorthand will become especially useful when we explain longer strings.

The pattern easily extends up to strings of length 99.

Strings of length 10-19 are not (1x10)12345... because 10 and 11 would be difficult to distinguish:

1111111111 11111111111

Replacing long strings of 1s with ascending digits will also become useful later in creating a "ladder".

The first idea for 100 is (1x100), but that has same difficulty as 10-11 for strings of length 101.  Next we try (1x10)(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10).  We then replace the initial (1x10) with 1234567890 as we did for 10-19, for reasons we will explain shortly, yielding

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)

For strings of length 101-199, we simply append the strings of length 1-99 to the right of the above string.

The strings of length 200-299 has the string of length 100 from above, then (2x100), then the string of length 1-99.  You have to trust that there are 100 2s.  For example, the string of length 200:

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)

Then we can similarly construct strings of length 300-999 as we did for 30-99.

The string of length 1000 is

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)(4x100)(5x100)(6x100)(7x100)(8x100)(9x100)(0x100)

One can see the recursive structure. The string of length 1000 has the string of length 100 at its beginning, which in turn has the string of length 10 at its beginning.  This "ladder" of more and more stretched out versions of 1234567890 lets you determine what power of 10 you are dealing with by counting the number of recursions.  If we were not to do this, it would be difficult to distinguish between (say) one thousand and ten thousand 1s.  Once we know what power of 10 we are at -- what rung of the ladder we are at -- we can assume that the following long strings of 2s, 3s, etc. each have the same length.

Here is the string of length 1023, illustrating how nothing interesting happens when the length has an internal zero digit:

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)(4x100)(5x100)(6x100)(7x100)(8x100)(9x100)(0x100)1234567890(2x10)123

The string of length 2000 is

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)(4x100)(5x100)(6x100)(7x100)(8x100)(9x100)(0x100)(2x1000)

There will be multiple ladders, one for each nonzero digit in the length of a string.  The string 1234567890 indicates that the previous digit in the length is complete and a new ladder is beginning.  Here is the string of length 2345:

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)(4x100)(5x100)(6x100)(7x100)(8x100)(9x100)(0x100)(2x1000)1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)1234567890(2x10)(3x10)(4x10)12345

If possible, text figures (lowercase numerals) should be used to make it easy to spot the beginning of ladders and where digits change.

Haskell source to construct these strings in any base.  Here are the two key routines:

-- expand by 10
dogrow :: Integer -> Char -> String;
dogrow base '1' = List.genericTake (base-1) positive_digits ++ "0";
dogrow base c = List.genericReplicate base c;

-- process little-endian input one digit at a time
growasdigits :: Integer -> [Integer] -> String;
growasdigits _base [] = "";
growasdigits base (h:t) = (growasdigits base t & concatMap (dogrow base)) ++ List.genericTake h positive_digits;

Future work: parser to verify correctness of a string.

Below is the construction in base 4 instead of base 10, showing strings of length 1 through 70.  The first column is string lengths expressed in base 4.

11
212
3123
101230
1112301
12123012
131230123
2012302222
21123022221
221230222212
2312302222123
30123022223333
311230222233331
3212302222333312
33123022223333123
1001230222233330000
10112302222333300001
102123022223333000012
1031230222233330000123
11012302222333300001230
111123022223333000012301
1121230222233330000123012
11312302222333300001230123
120123022223333000012302222
1211230222233330000123022221
12212302222333300001230222212
123123022223333000012302222123
1301230222233330000123022223333
13112302222333300001230222233331
132123022223333000012302222333312
1331230222233330000123022223333123
20012302222333300002222222222222222
201123022223333000022222222222222221
2021230222233330000222222222222222212
20312302222333300002222222222222222123
210123022223333000022222222222222221230
2111230222233330000222222222222222212301
21212302222333300002222222222222222123012
213123022223333000022222222222222221230123
2201230222233330000222222222222222212302222
22112302222333300002222222222222222123022221
222123022223333000022222222222222221230222212
2231230222233330000222222222222222212302222123
23012302222333300002222222222222222123022223333
231123022223333000022222222222222221230222233331
2321230222233330000222222222222222212302222333312
23312302222333300002222222222222222123022223333123
300123022223333000022222222222222223333333333333333
3011230222233330000222222222222222233333333333333331
30212302222333300002222222222222222333333333333333312
303123022223333000022222222222222223333333333333333123
3101230222233330000222222222222222233333333333333331230
31112302222333300002222222222222222333333333333333312301
312123022223333000022222222222222223333333333333333123012
3131230222233330000222222222222222233333333333333331230123
32012302222333300002222222222222222333333333333333312302222
321123022223333000022222222222222223333333333333333123022221
3221230222233330000222222222222222233333333333333331230222212
32312302222333300002222222222222222333333333333333312302222123
330123022223333000022222222222222223333333333333333123022223333
3311230222233330000222222222222222233333333333333331230222233331
33212302222333300002222222222222222333333333333333312302222333312
333123022223333000022222222222222223333333333333333123022223333123
10001230222233330000222222222222222233333333333333330000000000000000
100112302222333300002222222222222222333333333333333300000000000000001
1002123022223333000022222222222222223333333333333333000000000000000012
10031230222233330000222222222222222233333333333333330000000000000000123
101012302222333300002222222222222222333333333333333300000000000000001230
1011123022223333000022222222222222223333333333333333000000000000000012301
10121230222233330000222222222222222233333333333333330000000000000000123012

No comments :