Monday, July 26, 2010

[gckxujcr] LZMA compression of Fibonacci sequences

Let a0=0, a1=1, an=an-1 + an-2 mod 232.  We generate 224 32-bit wide values of this sequence starting at a2, and output in raw binary.  Gzip and bzip2 cannot compress this data, while lzma version 4.43-14ubuntu2 can.

This is a very simple, rather poor, pseudorandom number generator.

Doing a search over a0=0, 0<=a1<=600, here are the number of bytes after lzma compression for the best and worst.  The uncompressed data is 67108864 bytes.

a1Bytes after lzma compression
09555
51217485560
16032856845
36833114453
1633120288
11233164622
4833170719
22433193840
52833276552
14433383935
...
10766020589
2566072601
21566212294
22366293165
39166464701
26367270175
367281769
1167509002
1767586479
19767598950

If we do everything in 64 bits, lzma seems always able to compress.  Here are some results, 0<=a0<=1, 0<=a1<=300.  The uncompressed data is 134217728 bytes.

a0a1Bytes after lzma compression
0019022
016092625248
028898440144
0128101506051
064102243681
024102401107
0192102548146
0256102636570
0224102641413
066102650889
1224102668158
048102733136
...
0231114737440
0167115103524
165115373650
091116047368
1107116077496
0219116341874
1256116428799
160117401543
142117854697
1127120676882

Complete results here.

The challenge remains to produce as rapidly as possible data "random" enough to fool lzma.

SSE is probably involved.

1 comment :

Ken said...

http://kenta.blogspot.com/2010/07/upuplxco-much-faster-than-urandom.html