Thursday, November 29, 2012

[haxsglch] Protocol for modifying remote files

Create a protocol for modifying remote files.

To conserve bandwidth, edits are submitted as a tuple of a hash of the original file and a binary diff.
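As a rough sketch of what such a tuple might look like, the Python below pairs a SHA-256 hash of the original with a list of byte-range replacements.  The names make_edit and apply_diff, the choice of SHA-256, and the (start, end, replacement) diff format are all assumptions made for illustration; a real protocol would likely use a more compact binary diff encoding.

```python
import hashlib
from difflib import SequenceMatcher

def make_edit(old: bytes, new: bytes):
    """Return (hash of the original, diff) for turning old into new.

    The diff is a list of (start, end, replacement) triples, meaning
    "replace old[start:end] with replacement". Hypothetical format.
    """
    base_hash = hashlib.sha256(old).hexdigest()
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    ops = [(i1, i2, new[j1:j2])
           for tag, i1, i2, j1, j2 in matcher.get_opcodes()
           if tag != "equal"]
    return base_hash, ops

def apply_diff(old: bytes, ops) -> bytes:
    """Apply the triples produced by make_edit to the original bytes."""
    out = bytearray()
    pos = 0
    for start, end, replacement in ops:
        out += old[pos:start]   # copy the unchanged run before this edit
        out += replacement      # then the new bytes
        pos = end               # skip the bytes being replaced
    out += old[pos:]
    return bytes(out)
```

Only the changed ranges travel over the wire; the receiver reconstructs the new file from its own copy of the original.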

One possible optimization is to compute hashes with a Merkle hash tree instead of a linear hash, allowing the file hash to be quickly recomputed for edits in the middle of a file.  Both sides independently maintain the hash tree.  Unfortunately, this does not work if an edit changes the length of a file: with fixed-size blocks, an insertion or deletion shifts every later block boundary and so changes every later leaf hash.  Are there better ways?  Perhaps it is possible if we forgo cryptographic security.
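A minimal sketch of the hash tree idea, assuming fixed-size blocks and SHA-256 (the class name MerkleTree, the block size, and the leaf/parent prefix bytes are illustrative choices, not part of any existing protocol).  Replacing one block recomputes only the hashes on the path from that leaf to the root.

```python
import hashlib

def _leaf(block: bytes) -> bytes:
    # Prefix byte distinguishes leaf hashes from interior hashes.
    return hashlib.sha256(b"\x00" + block).digest()

def _parent(level: list, i: int) -> bytes:
    # Hash of the i-th node one level up; the last node may lack a right child.
    left = level[2 * i]
    right = level[2 * i + 1] if 2 * i + 1 < len(level) else b""
    return hashlib.sha256(b"\x01" + left + right).digest()

class MerkleTree:
    def __init__(self, data: bytes, block_size: int = 4096):
        self.block_size = block_size
        blocks = [data[i:i + block_size]
                  for i in range(0, len(data), block_size)] or [b""]
        # levels[0] holds the leaf hashes, levels[-1] holds only the root.
        self.levels = [[_leaf(b) for b in blocks]]
        while len(self.levels[-1]) > 1:
            below = self.levels[-1]
            self.levels.append([_parent(below, i)
                                for i in range((len(below) + 1) // 2)])

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def update_block(self, index: int, block: bytes) -> int:
        """Replace one block; return how many hashes were recomputed."""
        self.levels[0][index] = _leaf(block)
        recomputed = 1
        for depth in range(1, len(self.levels)):
            index //= 2
            self.levels[depth][index] = _parent(self.levels[depth - 1], index)
            recomputed += 1
        return recomputed
```

For a file of n blocks, a same-length edit inside one block costs about log2(n) hash computations instead of rehashing the whole file.  Inserting even a single byte, however, changes the contents of every later block, so all of those leaves would have to be rehashed.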

We need the hash to detect merge conflicts, i.e., when two separate clients edit the same original file in different ways.  The server rejects a diff whose hash does not match its current copy of the file.
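The conflict check amounts to a compare-and-swap keyed on the file hash.  Here is a small sketch, assuming a hypothetical in-memory Server class and an edit simplified to a single splice (offset, number of bytes to delete, bytes to insert):

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Server:
    """Hypothetical in-memory server holding a single file."""

    def __init__(self, content: bytes):
        self.content = content

    def submit(self, base_hash: str, offset: int,
               delete_len: int, insert: bytes) -> bool:
        # Accept the edit only if the client edited the version we hold now.
        if digest(self.content) != base_hash:
            return False  # another client changed the file first
        self.content = (self.content[:offset] + insert
                        + self.content[offset + delete_len:])
        return True
```

If two clients both start from the same original, whichever edit arrives first is applied and the second is rejected, so the losing client knows it must fetch the new version and redo or merge its change.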

A client sends a diff but does not receive an acknowledgment.  Maybe the server received the diff but the ack got dropped on the way back, or maybe the server never received it.  Prevent things from going wrong in either case, so that the client can safely resend.  Avoid the Two Generals' Problem.
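One way this might be handled is to make resending harmless.  In the sketch below the client also sends the hash of the expected result (an assumption added here, not something described above), so the server can tell a retry of an already-applied edit apart from a real conflict.  The Server class, the status strings, and send_with_retry are hypothetical names.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Server:
    """Hypothetical in-memory server holding a single file."""

    def __init__(self, content: bytes):
        self.content = content

    def submit(self, base_hash: str, result_hash: str, offset: int,
               delete_len: int, insert: bytes) -> str:
        current = digest(self.content)
        if current == base_hash:
            self.content = (self.content[:offset] + insert
                            + self.content[offset + delete_len:])
            return "applied"
        if current == result_hash:
            # The edit already landed; only the earlier ack was lost.
            return "already-applied"
        return "conflict"

def send_with_retry(send, edit, attempts: int = 3) -> str:
    """Call send(*edit) until it returns; a TimeoutError means no ack arrived."""
    for _ in range(attempts):
        try:
            return send(*edit)
        except TimeoutError:
            continue  # we cannot know whether the server saw the edit; resend
    raise TimeoutError("no acknowledgment after retries")
```

The client never needs to know which failure occurred: it resends the same edit, and the server's reply is correct either way.  One limitation of this sketch is that if a third party edits the file between the lost ack and the retry, the retry is reported as a conflict even though the original edit was applied.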

Protocols like this almost certainly already exist.  What are they?  (Sshfs could use this, if it does not already.)

Inspired by Google Drive, or the way Google Drive should work.  Online "cloud" storage should be a commodity product, with anyone able to implement the protocol, including setting up a server yourself.  (But Google does not want it to become a commodity product.)

Goal is to mix and match any editor with any backend remote storage of your choice.

Git, or any version control system, does almost what I want.  We need a way to push, declaring that this version should become the head regardless of any merge conflicts.  We also need a way to prune old versions.
