Merkle tree is a tree in which every non-leaf node is labeled with the hash of the labels or values (in case of leaves) of its child nodes. Hash trees allow efficient and secure verification of the contents of large data structures. Hash trees are a generalization of hash lists and hash chains.
- Check if the data belongs in the merkle tree
- To concisely prove the validity of data being part of a dataset without storing the whole data set
- To ensure the validity of a certain data set being inclusive in a larger data set without revealing either the complete data set or its subset.
Before downloading a file on a p2p network, in most cases the top hash is acquired from a trusted source, for instance a friend or a web site that is known to have good recommendations of files to download. When the top hash is available, the hash tree can be received from any non-trusted source, like any peer in the p2p network. Then, the received hash tree is checked against the trusted top hash, and if the hash tree is damaged or fake, another hash tree from another source will be tried until the program finds one that matches the top hash.
Suppose we want to prove that transaction C was indeed in the block that formed the header shown above.
In addition to the transaction hash C , we also need D, S(A,B), and S(S(E,F),S(G,H)) to form the proof. The verification itself performs the following steps on the proof:
- Hash C and D to produce S(C,D).
- Hash S(A,B) and S(C,D) to produce S(S(A,B),S(C,D)).
- Hash S(S(A,B),S(C,D)) and S(S(E,F),S(G,H)) to produce the root.
- Check that the root is the same as what has been stored previously.
The efficiency here is that we proved a transaction belonged in a block with only 3 accompanying pieces of information (instead of the 7 other transactions that were stored in the block). This efficiency becomes exponentially more pronounced with larger trees.
Different Merkle tree implementations handle "orphan" leaves in different ways. Our trees conform to the diagrams below; orphan leaves are not duplicated or hashed multiple times. They are promoted to higher level of the tree.
┌───┴──┐ ┌────┴───┐ ┌─────┴─────┐
┌──┴──┐ │ ┌──┴──┐ │ ┌──┴──┐ ┌──┴──┐
┌─┴─┐ ┌─┴─┐ │ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ │
(5-leaf) (6-leaf) (7-leaf)
Handling odd number
For the case of Bitcoin, where Merkle Tree is used, in case the number of transactions is odd at some level of the tree then you just copy the element onto the right to form a pair.
Create a new Merkle Tree from the list of Content
tree = MerkleTree(['a', '3', '你'])
Get the Merkle Root of the tree
tree.root.val
Verify the entire tree is valid
tree.validate()
Verify a specific content in in the tree
tree.contains('你', 2)