Sunday, August 14, 2022

[nmkzefbd] exponentially sized sizes

first byte E: 0 .. 255.  values 0, 1, 2, 3 (length fields of 1, 2, 4, 8 bytes) are common.

next 2^E bytes: an integer denoting a length S.

next S bytes: payload of length S.
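the three steps above can be sketched as an encoder/decoder pair.  this is a minimal sketch, not the author's implementation: it assumes a big-endian length integer (the post leaves endianness open), and the names encode and decode are hypothetical.  the encoder picks the smallest E whose 2^E-byte field can hold S.

```python
from __future__ import annotations

# sketch of the "exponentially sized sizes" framing.
# ASSUMPTION: the length integer S is big-endian; the post leaves endianness open.

def encode(payload: bytes) -> bytes:
    """frame payload using the smallest E whose 2**E-byte field holds len(payload)."""
    size = len(payload)
    nbytes = max(1, (size.bit_length() + 7) // 8)  # bytes needed for S (at least 1)
    e = (nbytes - 1).bit_length()                   # round the width up to a power of 2
    if e > 255:
        raise ValueError("payload too large for a one-byte E")
    return bytes([e]) + size.to_bytes(1 << e, "big") + payload


def decode(data: bytes) -> tuple[bytes, bytes]:
    """parse one frame from the front of data; return (payload, rest)."""
    if not data:
        raise ValueError("missing E byte")
    width = 1 << data[0]                 # 2**E bytes of length field
    start = 1 + width
    if len(data) < start:
        raise ValueError("truncated length field")
    size = int.from_bytes(data[1:start], "big")
    if len(data) - start < size:
        raise ValueError("truncated payload")
    return data[start:start + size], data[start + size:]
```

decode returns the unconsumed bytes, so frames can be parsed back to back.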

benefits:

  • no need to worry about parsing (say) 3-byte integers.  the length of the length will always be a power of 2.
  • the first byte can realistically vary (it is not a constant prefix), forcing input parsers to consider the various possibilities, including overflow.  can you handle E >= 64 (a length field of 2^64 or more bytes, mostly leading zeroes)?  just counting the number of bytes read will require multiple precision, and reading more than 2^64 bytes through a pipe is not unrealistic.
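a parser that survives large E has to read the length field in pieces and keep its counters in multiple precision.  the following is a minimal sketch under stated assumptions: the length is big-endian, read_size and CHUNK are hypothetical names, and python's arbitrary-precision int stands in for a bignum library.  it never buffers the whole 2^E-byte field, so long runs of leading zeroes cost time but not memory.

```python
from __future__ import annotations

import io
from typing import BinaryIO

CHUNK = 4096  # hypothetical read size; any positive value works


def read_size(stream: BinaryIO) -> tuple[int, int]:
    """read E and the 2**E-byte length S from stream; return (S, bytes consumed).

    ASSUMPTION: big-endian length field.  python ints are arbitrary precision,
    so the field width 2**E, the value S, and the byte counter all stay exact
    when E >= 64, where a 64-bit counter would overflow.
    """
    head = stream.read(1)
    if not head:
        raise EOFError("missing E byte")
    remaining = 1 << head[0]   # 2**E; exceeds a 64-bit unsigned integer once E >= 64
    size = 0
    consumed = 1               # multiple-precision count of bytes read
    while remaining:
        chunk = stream.read(min(remaining, CHUNK))  # never buffer the whole field
        if not chunk:
            raise EOFError("truncated length field")
        size = (size << (8 * len(chunk))) | int.from_bytes(chunk, "big")
        remaining -= len(chunk)
        consumed += len(chunk)
    return size, consumed


print(read_size(io.BytesIO(bytes([1, 0x01, 0x00]))))  # (256, 3)
```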

limit of 256^(2^255) - 1 bytes (at E = 255 the length field is 2^255 bytes), which should be enough for everyone (~ 2.07 * 10^139427568484130471719462862803721671986013937475814382342177386385362323556246 ~= 10^10^77.14 bytes).
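the size of that limit can be checked with logarithms, since 256^(2^255) itself is far too large to compute.  log10(256^(2^255)) = 2^255 * 8 * log10(2) = 2^258 * log10(2); the "- 1" does not affect the leading digits.  a sketch using python's standard decimal module, with the precision of 120 digits being an assumed choice that leaves roughly 40 digits after the 78-digit integer part:

```python
from decimal import Decimal, getcontext

getcontext().prec = 120  # 78 integer digits plus ~40 fractional digits of headroom

# log10 of 256**(2**255) = 2**255 * 8 * log10(2) = 2**258 * log10(2)
log10_limit = Decimal(2) ** 258 * Decimal(2).log10()

exponent = int(log10_limit)                          # integer part: the power of ten
mantissa = Decimal(10) ** (log10_limit - exponent)   # leading digits, in [1, 10)
double_log = log10_limit.log10()                     # for the 10^10^x form

print(f"~ {mantissa:.2f} * 10^{exponent}")
print(f"~= 10^10^{double_log:.2f}")
```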

a size width larger than necessary is permitted: e.g., E=3 uses 2^3 = 8 bytes to express a length of 255, even though E=0, 2^0 = 1 byte, suffices.  "larger than necessary" is required to express 0, since even the smallest length field is one byte.  the byte sequences [0,0] and [1,0,0] both encode the empty string (and other encodings are possible as well), so the prefix of a given string (payload) is not unique.
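the non-uniqueness is easy to see by encoding with an explicit E.  a minimal sketch, assuming a big-endian length field; the function name frame is hypothetical.  int.to_bytes raises OverflowError when 2^E bytes are too few, which is the only restriction on choosing a wider E.

```python
def frame(payload: bytes, e: int) -> bytes:
    """encode payload with an explicit E, which may be wider than necessary.

    ASSUMPTION: big-endian length field.  to_bytes raises OverflowError if
    2**E bytes are too few to hold len(payload).
    """
    if not 0 <= e <= 255:
        raise ValueError("E must fit in one byte")
    return bytes([e]) + len(payload).to_bytes(1 << e, "big") + payload


# each E works for the empty string: [0,0], [1,0,0], [2,0,0,0,0], ...
for e in range(4):
    print(list(frame(b"", e)))

# length 255 with the minimal E=0 versus an oversized E=3
print(list(frame(b"x" * 255, 0)[:2]))
print(list(frame(b"x" * 255, 3)[:9]))
```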

need to specify the endianness of the length integer.
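why the byte order matters: the same length bytes give different sizes.  a small illustration using the header [1, 0x01, 0x00] (E=1, so a 2-byte length field); the variable names are illustrative only.

```python
# one E=1 frame header, [1, 0x01, 0x00], read under each byte order.
header = bytes([1, 0x01, 0x00])
field = header[1:1 + (1 << header[0])]         # the 2**E = 2 length bytes

size_big = int.from_bytes(field, "big")        # 256
size_little = int.from_bytes(field, "little")  # 1
print(size_big, size_little)
```

with big-endian, the zero padding of an oversized length field sits at the front; with little-endian, it sits at the end.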

previously, base 10.
