Monday, August 09, 2004

Bit compression by a byte compressor

I wonder how well various compression programs, e.g., gzip, bzip2, do on strings that are repetitive at the bit level, but not so obviously repetitive when the bits are packed into bytes. For example consider repeating a 33-bit pattern 32*N times (for a total of of 132*N bytes). A good compressor should asymptotically get a compression factor of 32*N, while a bad one might only get a factor of N (or 8*N?). It's also an interesting problem to prevent repeative occurrence of bytes for this problem. One wants to select a bitstring such that windows of size 8 are all unique.

No comments :