Consider a Merkle Hash Tree computation which arranges the computation as a complete binary tree with all the data at the lowest level. We leave unstated the chunk size for each leaf.
Assume the data arrives online: we don't know how much data there is in total, and we don't get it all at once. How do we keep track of the state of a partial computation? Roughly, if the subtree below a node is a full binary tree, then the hashes of its children and further descendants can be forgotten: only the hash at the node is needed. Amount of space required to store state is probably O(log n) where n is the amount of data so far.
We would also like to keep and report the first few levels of the tree as final output. If a subsequent computation of the same data gets a different result, comparing the first few levels of the tree gives some indication of where things went wrong. As more data arrives, a node which was in the first few levels of the tree may stop being so, so should be forgotten at that point.
A parallel computation requires even more bookkeeping.
Motivation was to design a large, parallel computation task (for testing a system) in which a single error anywhere will propagate and be visible in the final output. (Inspiration was computation of digits of pi.) Input data for such a test can be integers in increasing order. (Update: modular exponentiation is a better way.)
No comments :
Post a Comment