Monday, July 26, 2010

[rbqbheir] Bloom filter of the entire internet

Bloom filters with 99% confidence require -log(0.01)/log(2)^2/8 = 1.2 bytes per element.

Bloom filters with forgery-proof false positive rate 1/2^128 require 128*log(2)/log(2)^2/8 = 23 bytes per element.

Assuming the world wide web is a trillion objects, it's big but not too big. For text, to avoid losing on dynamically generated content, we could also do substrings.

It might be useful for digital timestamping.

3 comments :

Arvind Subramanian said...

Hello Ken,

I am interested in Bloom filter variants, and their applications.

Are you using an off-the-shelf implementation? Are you developing one yourself?What problem are you working on?

I had written an article related to the same, and I would love to know more about your insights.

http://appolo85.wordpress.com/2010/08/03/bloom-filter/

Regards,
Arvind

Ken said...

I am currently not working on Bloom filters; this was just an offhand idea I had.

Arvind Subramanian said...

Oh, alright... :-)

Would love to know more on what /how you are using BF in your work, when you do!

Regards,
Arvind