## Wednesday, December 21, 2022

### [duartbli] Digit strings easy to count their length

It is easy to count the length of the following digit strings:

1
12
123
...
123456789
1234567890
12345678901
123456789012
...
1234567890123456789

Strings like these can serve as example input when it is useful to communicate the length of the string but the content of the string is not important, for example, an example block of data encoded in base 10 (future post vkhdrcsg).  Each string documents its own length.

For strings of lengths 20-29, we write the length-10 string above, then ten 2s, then up to 9 digits:

12345678902222222222
123456789022222222221
1234567890222222222212
...
12345678902222222222123456789

You have to trust that there are ten 2s.  Strings of length 30-39 follow the pattern:

123456789022222222223333333333
1234567890222222222233333333331
12345678902222222222333333333312
...
123456789022222222223333333333123456789

For expository purposes only, we introduce the shorthand (3x10) to indicate ten 3s.  So the above strings of length 30-39 we express as

1234567890(2x10)(3x10)
1234567890(2x10)(3x10)1
1234567890(2x10)(3x10)12
...
1234567890(2x10)(3x10)123456789

This shorthand will become especially useful when we explain longer strings.

The pattern easily extends up to strings of length 99.

Strings of length 10-19 are not (1x10)12345... because 10 and 11 would be difficult to distinguish:

1111111111 11111111111

Replacing long strings of 1s with ascending digits will also become useful later in creating a "ladder".

The first idea for 100 is (1x100), but that has same difficulty as 10-11 for strings of length 101.  Next we try (1x10)(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10).  We then replace the initial (1x10) with 1234567890 as we did for 10-19, for reasons we will explain shortly, yielding

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)

For strings of length 101-199, we simply append the strings of length 1-99 to the right of the above string.

The strings of length 200-299 has the string of length 100 from above, then (2x100), then the string of length 1-99.  You have to trust that there are 100 2s.  For example, the string of length 200:

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)

Then we can similarly construct strings of length 300-999 as we did for 30-99.

The string of length 1000 is

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)(4x100)(5x100)(6x100)(7x100)(8x100)(9x100)(0x100)

One can see the recursive structure. The string of length 1000 has the string of length 100 at its beginning, which in turn has the string of length 10 at its beginning.  This "ladder" of more and more stretched out versions of 1234567890 lets you determine what power of 10 you are dealing with by counting the number of recursions.  If we were not to do this, it would be difficult to distinguish between (say) one thousand and ten thousand 1s.  Once we know what power of 10 we are at -- what rung of the ladder we are at -- we can assume that the following long strings of 2s, 3s, etc. each have the same length.

Here is the string of length 1023, illustrating how nothing interesting happens when the length has an internal zero digit:

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)(4x100)(5x100)(6x100)(7x100)(8x100)(9x100)(0x100)1234567890(2x10)123

The string of length 2000 is

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)(4x100)(5x100)(6x100)(7x100)(8x100)(9x100)(0x100)(2x1000)

There will be multiple ladders, one for each nonzero digit in the length of a string.  The string 1234567890 indicates that the previous digit in the length is complete and a new ladder is beginning.  Here is the string of length 2345:

1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)(4x100)(5x100)(6x100)(7x100)(8x100)(9x100)(0x100)(2x1000)1234567890(2x10)(3x10)(4x10)(5x10)(6x10)(7x10)(8x10)(9x10)(0x10)(2x100)(3x100)1234567890(2x10)(3x10)(4x10)12345

If possible, text figures (lowercase numerals) should be used to make it easy to spot the beginning of ladders and where digits change.

Haskell source to construct these strings in any base.  Here are the two key routines:

`-- expand by 10 dogrow :: Integer -> Char -> String; dogrow base '1' = List.genericTake (base-1) positive_digits ++ "0"; dogrow base c = List.genericReplicate base c;`

` `

`-- process little-endian input one digit at a time growasdigits :: Integer -> [Integer] -> String; growasdigits _base [] = ""; growasdigits base (h:t) = (growasdigits base t & concatMap (dogrow base)) ++ List.genericTake h positive_digits;`

Future work: parser to verify correctness of a string.

Below is the construction in base 4 instead of base 10, showing strings of length 1 through 70.  The first column is string lengths expressed in base 4.

 1 1 2 12 3 123 10 1230 11 12301 12 123012 13 1230123 20 12302222 21 123022221 22 1230222212 23 12302222123 30 123022223333 31 1230222233331 32 12302222333312 33 123022223333123 100 1230222233330000 101 12302222333300001 102 123022223333000012 103 1230222233330000123 110 12302222333300001230 111 123022223333000012301 112 1230222233330000123012 113 12302222333300001230123 120 123022223333000012302222 121 1230222233330000123022221 122 12302222333300001230222212 123 123022223333000012302222123 130 1230222233330000123022223333 131 12302222333300001230222233331 132 123022223333000012302222333312 133 1230222233330000123022223333123 200 12302222333300002222222222222222 201 123022223333000022222222222222221 202 1230222233330000222222222222222212 203 12302222333300002222222222222222123 210 123022223333000022222222222222221230 211 1230222233330000222222222222222212301 212 12302222333300002222222222222222123012 213 123022223333000022222222222222221230123 220 1230222233330000222222222222222212302222 221 12302222333300002222222222222222123022221 222 123022223333000022222222222222221230222212 223 1230222233330000222222222222222212302222123 230 12302222333300002222222222222222123022223333 231 123022223333000022222222222222221230222233331 232 1230222233330000222222222222222212302222333312 233 12302222333300002222222222222222123022223333123 300 123022223333000022222222222222223333333333333333 301 1230222233330000222222222222222233333333333333331 302 12302222333300002222222222222222333333333333333312 303 123022223333000022222222222222223333333333333333123 310 1230222233330000222222222222222233333333333333331230 311 12302222333300002222222222222222333333333333333312301 312 123022223333000022222222222222223333333333333333123012 313 1230222233330000222222222222222233333333333333331230123 320 12302222333300002222222222222222333333333333333312302222 321 123022223333000022222222222222223333333333333333123022221 322 1230222233330000222222222222222233333333333333331230222212 323 12302222333300002222222222222222333333333333333312302222123 330 123022223333000022222222222222223333333333333333123022223333 331 1230222233330000222222222222222233333333333333331230222233331 332 12302222333300002222222222222222333333333333333312302222333312 333 123022223333000022222222222222223333333333333333123022223333123 1000 1230222233330000222222222222222233333333333333330000000000000000 1001 12302222333300002222222222222222333333333333333300000000000000001 1002 123022223333000022222222222222223333333333333333000000000000000012 1003 1230222233330000222222222222222233333333333333330000000000000000123 1010 12302222333300002222222222222222333333333333333300000000000000001230 1011 123022223333000022222222222222223333333333333333000000000000000012301 1012 1230222233330000222222222222222233333333333333330000000000000000123012