Wednesday, March 05, 2008

Voluntary Distributed Storage

Write access requires a private key.

The backend consists of many computers whose owners have donated storage and bandwidth to hold data that is too large, or otherwise too unwieldy, to store in one central location.

The model assumes these backend "servers" are untrusted and unreliable, so cryptographic signatures are used to ensure data integrity, and aggressive redundancy is maintained with replication or error-correcting codes. The backend servers constantly communicate with each other to populate new servers, maintain data redundancy, and verify integrity.
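To make the redundancy idea concrete, here is a minimal sketch of the simplest possible erasure code: the data is split into k chunks plus one XOR parity chunk, so any single lost chunk can be rebuilt from the others. The function names (`encode`, `decode`) and the choice of XOR parity are assumptions made for illustration; a real deployment would likely want a stronger code that survives several simultaneous losses, and signature checking is left out here entirely.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def decode(shares: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data; at most one share may be missing (None)."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can recover only one lost share")
    shares = list(shares)
    if missing:
        present = [s for s in shares if s is not None]
        # The XOR of all surviving shares equals the lost one.
        shares[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity chunk, then strip the zero padding.
    return b"".join(shares[:-1])[:length]
```

Each of the k + 1 shares would live on a different backend server, so the loss of any one server costs nothing but a rebuild.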

As the writer changes data, we encounter the standard problem of keeping the replicated backend servers coherent with the most recent version. One possible solution is to track versions over a separate, "faster" metadata distribution channel and decide on the fly whether the data being read from a given backend server is stale, that is, whether it is marked as having changed between the version held on that backend server and the current version.
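The staleness check might look something like the sketch below. It assumes the metadata channel delivers a plain mapping from data key to current version number, and that each backend server reports the version it holds; `Replica`, `is_stale`, and `read_fresh` are hypothetical names, not part of any existing system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Replica:
    server: str     # which backend server answered
    version: int    # version of the data that server holds
    payload: bytes  # the data itself


def is_stale(replica: Replica, key: str, current: Dict[str, int]) -> bool:
    """A replica is stale if the metadata channel knows a newer version."""
    return replica.version < current[key]


def read_fresh(key: str, replicas: Iterable[Replica],
               current: Dict[str, int]) -> Optional[Replica]:
    """Return the first replica that is up to date, or None if all are stale."""
    for replica in replicas:
        if not is_stale(replica, key, current):
            return replica
    return None
```

A reader that gets `None` back could wait for the backend servers to catch up, or fall back to the newest stale copy if slightly old data is acceptable.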

Data cannot be reliably deleted, as nothing stops a backend server administrator from snapshotting the data store or simply ignoring deletion commands. Let's call this a feature, not a bug, as version control systems do.

One may bundle the software for reading this distributed data store with the backend server software, suggesting a BitTorrent-like social contract of "You have to help store the data if you want access to it." The user/backend server administrator may control how much storage is granted to a particular writing private key, as well as upload and download bandwidth limits. (Bandwidth ought to be controllable from the operating system, though.) A transparent read user interface would be nice, perhaps modeled on an NFS mount or a database. We would also like to download simultaneously from multiple "near" sources for high bandwidth.
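As a rough illustration of the per-key storage grant, the sketch below tracks how many bytes each writing key has used on one backend server and refuses writes that would exceed the administrator's quota. The class name `DonatedStore`, the method `try_store`, and the idea of identifying a writer by a string key ID are all assumptions for the example; bandwidth limits are not modeled.

```python
from typing import Dict


class DonatedStore:
    """Per-writer storage accounting for a single backend server."""

    def __init__(self, quotas: Dict[str, int]) -> None:
        # quotas maps a writer's key ID to the bytes the administrator grants it.
        self.quotas = dict(quotas)
        self.used: Dict[str, int] = {key_id: 0 for key_id in quotas}

    def try_store(self, writer_key_id: str, nbytes: int) -> bool:
        """Reserve nbytes for the writer; return False if over quota."""
        quota = self.quotas.get(writer_key_id, 0)  # unknown writers get nothing
        used = self.used.get(writer_key_id, 0)
        if nbytes < 0 or used + nbytes > quota:
            return False
        self.used[writer_key_id] = used + nbytes
        return True
```

In a fuller version, the server would verify the writer's signature on the data before calling `try_store`, so that a quota can only be spent by the holder of the matching private key.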

This is inspired by, and may be a partial solution to, several previous posts:

Distributed Pi
NFSNET large factorization
P2P Filesystem
Peer to peer distributed application
