Wednesday, July 01, 2020

[drkapfvr] Binary approximation in arithmetic coding

Consider data compression using arithmetic coding of a message over an input alphabet of 3 characters, each with frequency 1/3.  If we do exact arithmetic with fractions, the interval endpoints acquire denominators of 3^n after n symbols, so encoding requires linear space and probably cubic time (n interval updates, each a quadratic-time arbitrary-precision multiplication).
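To see the linear-space claim concretely, here is a minimal sketch in Python (the 8-symbol message is made up for illustration) that narrows the coding interval with exact fractions; the denominator of the low endpoint grows like 3^n, so merely storing it takes linearly many bits:

    from fractions import Fraction

    def exact_interval(message):
        """Narrow [low, low + width) exactly; 3-symbol alphabet, p = 1/3 each."""
        low, width = Fraction(0), Fraction(1)
        p = Fraction(1, 3)
        for s in message:          # s is 0, 1, or 2
            low += width * p * s   # move to the s-th subinterval
            width *= p             # each symbol shrinks the interval by 1/3
        return low, width

    low, width = exact_interval([0, 2, 1, 1, 2, 0, 1, 2])
    print(low.denominator)   # 6561 = 3**8: the denominator grows exponentially,
                             # so its bit length (the space needed) grows linearly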

Linear space is unacceptable because the message size can be unbounded.

The key trick is to approximate 1/3 by a binary fraction (a fraction whose denominator is a power of 2).  (We assume the output alphabet is binary bits.)  This reduces the computational complexity to constant space and linear time, a considerable savings.
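As a minimal sketch of what this buys (the 11/32, 11/32, 10/32 approximation and the 16-bit register width are illustrative choices, not canonical ones), here is the standard fixed-precision integer arithmetic encoder in Python; all state fits in a few machine words, so space is constant and time is linear in the message length:

    PREC = 16
    FULL, HALF, QTR = 1 << PREC, 1 << (PREC - 1), 1 << (PREC - 2)
    CUM, TOTAL = [0, 11, 22, 32], 32   # 1/3 approximated as 11/32, 11/32, 10/32

    def encode(message):
        low, high, pending, out = 0, FULL - 1, 0, []

        def emit(bit):
            nonlocal pending
            out.append(bit)
            out.extend([1 - bit] * pending)   # flush deferred underflow bits
            pending = 0

        for s in message:                     # s is 0, 1, or 2
            span = high - low + 1
            high = low + span * CUM[s + 1] // TOTAL - 1
            low  = low + span * CUM[s] // TOTAL
            while True:                       # renormalize: keep PREC bits of state
                if high < HALF:               # interval entirely in lower half: emit 0
                    emit(0)
                elif low >= HALF:             # entirely in upper half: emit 1
                    emit(1)
                    low, high = low - HALF, high - HALF
                elif low >= QTR and high < HALF + QTR:
                    pending += 1              # straddles the midpoint: defer the bit
                    low, high = low - QTR, high - QTR
                else:
                    break
                low, high = 2 * low, 2 * high + 1
        pending += 1                          # final bits pin down a point in the interval
        emit(0 if low < QTR else 1)
        return out

    # a short bit list: roughly log2(3) ~ 1.585 bits per symbol plus termination overhead
    print(encode([0, 2, 1, 1, 2, 0, 1, 2]))

Because TOTAL is a power of 2, the division in the interval update is just a bit shift, and nothing in the coder ever needs more than a couple of machine words.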

The precision to which the frequencies are approximated in binary sets up a tradeoff between coding efficiency and speed: more bits of precision match the true frequencies more closely and waste fewer output bits, but make each step's arithmetic wider and slower.
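The efficiency side of the tradeoff is easy to quantify: if the true frequencies p_i are coded as if they were the approximations q_i, the expected overhead is the relative entropy, sum of p_i * log2(p_i / q_i) bits per symbol.  A quick sketch, using the same rounding scheme as above:

    from math import log2

    p = [1/3, 1/3, 1/3]
    for bits in (5, 8, 12):                    # bits of precision in the denominator
        total = 1 << bits
        counts = [round(pi * total) for pi in p]
        counts[-1] = total - sum(counts[:-1])  # force the counts to sum to total
        q = [c / total for c in counts]
        overhead = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))
        print(f"{bits} bits: {overhead:.6f} bits/symbol of overhead")

Already at 5 bits the overhead is only about 0.0014 bits per symbol; each extra bit of precision shrinks it further while widening the integer arithmetic.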

Because everything is done in binary, arithmetic coding this way vaguely resembles Huffman coding.  It's probably possible to describe arithmetic coding as walking among nodes of a large collection of binary trees.

If the output alphabet is not binary bits but digits in some other base, the frequencies should instead be approximated by fractions whose denominators are powers of the output base.
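For example, a hypothetical helper that does the rounding in an arbitrary output base (powers of 10 for decimal digits, and so on):

    def approx_freqs(p, base, digits):
        """Approximate frequencies p by fractions over the denominator base**digits."""
        total = base ** digits
        counts = [round(pi * total) for pi in p]
        counts[-1] = total - sum(counts[:-1])  # keep the counts summing to total
        return counts, total

    print(approx_freqs([1/3, 1/3, 1/3], base=2, digits=5))   # ([11, 11, 10], 32)
    print(approx_freqs([1/3, 1/3, 1/3], base=10, digits=3))  # ([333, 333, 334], 1000)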

The remaining details are well described in standard references on arithmetic coding.
