Sunday, November 17, 2019

[pfzuviov] Merkle Tree state

Consider a Merkle Hash Tree computation which arranges the computation as a complete binary tree with all the data at the lowest level.  We leave unstated the chunk size for each leaf.

Assume the data arrives online: we don't know how much data there is in total, and we don't get it all at once.  How do we keep track of the state of a partial computation?  Roughly, if the subtree below a node is a full binary tree, then the hashes of its children and further descendants can be forgotten: only the hash at the node is needed.  Amount of space required to store state is probably O(log n) where n is the amount of data so far.

We would also like to keep and report the first few levels of the tree as final output.  If a subsequent computation of the same data gets a different result, comparing the first few levels of the tree gives some indication of where things went wrong.  As more data arrives, a node which was in the first few levels of the tree may stop being so, so should be forgotten at that point.

A parallel computation requires even more bookkeeping.

Motivation was to design a large, parallel computation task (for testing a system) in which a single error anywhere will propagate and be visible in the final output.  (Inspiration was computation of digits of pi.)  Input data for such a test can be integers in increasing order.  (Update: modular exponentiation is a better way.)

No comments :