Wednesday, December 25, 2013

[jywdiiaw] Random data generator via compression

Modify a data decompressor to produce random output that matches the structural assumptions that allowed data compression to work well in the first place.

Wherever the decompression (uncompression) program reads input (normally compressed data), let it sample random bits.  Disable data integrity checks, or resample until the checks pass.
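As a rough illustration of the idea, here is a minimal sketch that feeds random bytes to an unmodified decompressor, Python's `zlib` in raw DEFLATE mode (no header and no checksum, so there is no integrity check to disable). The function names `random_inflate` and `sample_until_long`, and the parameters `max_input`, `min_len` and `tries`, are made up for this example. Malformed streams still raise `zlib.error`, so the sketch keeps whatever was produced before the error and resamples with a new seed when the result is too short.

```python
import random
import zlib

def random_inflate(seed, max_input=4096):
    """Feed seeded random bytes to a raw DEFLATE decompressor, one byte at a time."""
    rng = random.Random(seed)
    d = zlib.decompressobj(wbits=-15)  # negative wbits: raw DEFLATE, no header or checksum
    out = bytearray()
    try:
        for _ in range(max_input):
            if d.eof:  # the random bits happened to encode "end of stream"
                break
            out += d.decompress(bytes([rng.getrandbits(8)]))
    except zlib.error:
        pass  # invalid stream: keep the output produced before the error
    return bytes(out)

def sample_until_long(min_len=16, tries=10000):
    """Resample (try successive seeds) until the output has at least min_len bytes."""
    for seed in range(tries):
        data = random_inflate(seed)
        if len(data) >= min_len:
            return seed, data
    return None  # no seed in range produced enough output

if __name__ == "__main__":
    print(sample_until_long())
```

DEFLATE's only structural assumption is repeated substrings, so the output tends to look like noise with occasional repeats; a decompressor built on a richer model should give more interesting results.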

The motivation: text compression algorithms have become very sophisticated (Hutter Prize, PAQ family), whereas random sentence generation generally still relies on simple Markov chaining.  To generate text, first have the decompressor decompress a (compressed) training corpus so that its model adapts to it, then switch its input to random bits and generate random text starting from that context state.
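A minimal sketch of that procedure, assuming a toy compressor rather than anything from the PAQ family: an adaptive bitwise context model drives a binary arithmetic decoder (in the style of simple carryless range coders), and the decoder reads random bytes instead of compressed data. All names here (`BitModel`, `Decoder`, `train`, `generate`) are hypothetical. As a shortcut, `train` updates the model directly from the plain training text, which leaves the model in the same state it would reach by decompressing the compressed corpus.

```python
import random
from collections import defaultdict

class BitModel:
    """Adaptive model: predicts each bit from the last `order` bytes (order >= 1)."""
    def __init__(self, order=3):
        self.order = order
        self.counts = defaultdict(lambda: [0, 0])  # context -> [zeros seen, ones seen]
        self.history = b""

    def p1(self, partial):
        """12-bit probability that the next bit is 1, kept inside [1, 4095]."""
        n0, n1 = self.counts[(self.history, partial)]
        return min(4095, max(1, 4096 * (16 * n1 + 1) // (16 * (n0 + n1) + 2)))

    def update(self, partial, bit):
        self.counts[(self.history, partial)][bit] += 1

    def end_byte(self, byte):
        self.history = (self.history + bytes([byte]))[-self.order:]

class Decoder:
    """32-bit binary arithmetic decoder; `next_byte` supplies its input."""
    def __init__(self, next_byte):
        self.next_byte = next_byte
        self.x1, self.x2, self.x = 0, 0xFFFFFFFF, 0
        for _ in range(4):
            self.x = (self.x << 8) | next_byte()

    def decode_bit(self, p1):
        xmid = self.x1 + ((self.x2 - self.x1) >> 12) * p1
        if self.x <= xmid:
            bit, self.x2 = 1, xmid
        else:
            bit, self.x1 = 0, xmid + 1
        # Shift out leading bytes that x1 and x2 agree on, pulling in new input.
        while ((self.x1 ^ self.x2) & 0xFF000000) == 0:
            self.x1 = (self.x1 << 8) & 0xFFFFFFFF
            self.x2 = ((self.x2 << 8) & 0xFFFFFFFF) | 0xFF
            self.x = ((self.x << 8) & 0xFFFFFFFF) | self.next_byte()
        return bit

def train(model, data):
    """Shortcut for 'decompress the training corpus': adapt the model to plain text."""
    for byte in data:
        partial = 1  # leading 1 marks how many bits of the byte are known
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            model.update(partial, bit)
            partial = (partial << 1) | bit
        model.end_byte(byte)

def generate(model, n_bytes, rng):
    """Decode n_bytes with random bytes standing in for the compressed stream."""
    dec = Decoder(lambda: rng.getrandbits(8))
    out = bytearray()
    for _ in range(n_bytes):
        partial = 1
        for _bit_index in range(8):
            bit = dec.decode_bit(model.p1(partial))
            model.update(partial, bit)  # like a real decompressor, keep adapting
            partial = (partial << 1) | bit
        model.end_byte(partial & 0xFF)
        out.append(partial & 0xFF)
    return bytes(out)

if __name__ == "__main__":
    corpus = b"the quick brown fox jumps over the lazy dog. " * 50
    model = BitModel(order=3)
    train(model, corpus)
    print(generate(model, 80, random.Random(2013)))
```

Because the decoder maps uniformly random input onto symbols in proportion to the model's predicted probabilities, this amounts to sampling from the model. With a fixed-order count model like this one, the result is essentially Markov chaining again; the point of the exercise would be to swap in a stronger model.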

I'm curious what Burrows-Wheeler and bzip2 will do, since they do not have a probabilistic foundation.

Randomly generated output may be used for art, or more practically, steganography.

We would also like to take any function that can distinguish (perhaps probabilistically) between real and randomly generated text and feed it back into making a better compression algorithm. This seems difficult. If it could be done, data compression would improve as a side effect of the arms race between eavesdroppers (like the NSA) and steganographers.
