Merkle Tree in Blockchain: What it is and How it Works
What Is a Merkle Tree?
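A Merkle tree is a data structure used in computer science applications in which each piece of data is hashed and the resulting hashes are combined and hashed together, level by level, until a single hash, the root, remains. In bitcoin and other cryptocurrencies, Merkle trees serve to encode blockchain data more efficiently and securely.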
They are also referred to as “binary hash trees.”
Breaking Down the Merkle Tree
In bitcoin’s blockchain, a block of transactions is run through an algorithm to generate a hash, a string of numbers and letters that can be used to verify that a given set of data is the same as the original set of transactions, but not to recover the original transactions. Bitcoin’s software does not, however, run the entire block of transaction data (representing 10 minutes’ worth of transactions on average) through the hash function at once. Instead, each transaction is hashed, then each pair of transaction hashes is concatenated and hashed together, and so on up the tree until a single hash remains for the entire block. (If any level has an odd number of hashes, the last hash is duplicated and concatenated with itself.)
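The pairing procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming bitcoin’s convention of hashing with SHA-256 applied twice and duplicating the last hash on any odd-sized level. The function names and the transaction byte strings are hypothetical placeholders; real bitcoin software hashes serialized transactions and handles byte order in ways this sketch omits.

```python
from __future__ import annotations

import hashlib


def double_sha256(data: bytes) -> bytes:
    """Bitcoin-style hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Hash each transaction, then hash pairs level by level until one hash remains."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    # Leaves: hash each transaction individually.
    level = [double_sha256(tx) for tx in transactions]
    # Combine pairs until only the root remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd count: duplicate the last hash so it pairs with itself.
            level.append(level[-1])
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    # Hypothetical stand-ins for eight serialized transactions.
    txs = [f"tx-{letter}".encode() for letter in "ABCDEFGH"]
    print(merkle_root(txs).hex())
```

The function returns the root in internal byte order. Block explorers display hashes byte-reversed, so `merkle_root(txs)[::-1].hex()` would be the form comparable to what they show.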
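Visualized, this structure resembles a tree. In the diagram below, “T” designates a transaction and “H” a hash; the letters that follow show which transactions each item covers, so HAB is the hash of HA and HB concatenated together. Note that the image is highly simplified: it shows eight transactions, whereas an average block contains over 500.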
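Written out for the eight-transaction example, the relationships in the tree look like this. This sketch assumes the transactions are labeled TA through TH, matching the labels used in this article, with H_{AB} here corresponding to HAB in the prose; hash(·) stands for bitcoin’s hash function (SHA-256 applied twice) and \| for concatenation.

```latex
\begin{aligned}
H_X &= \operatorname{hash}(T_X), \quad X \in \{A, B, \dots, H\} \\
H_{AB} &= \operatorname{hash}(H_A \,\|\, H_B), \quad H_{CD} = \operatorname{hash}(H_C \,\|\, H_D), \quad \dots, \quad H_{GH} = \operatorname{hash}(H_G \,\|\, H_H) \\
H_{ABCD} &= \operatorname{hash}(H_{AB} \,\|\, H_{CD}), \quad H_{EFGH} = \operatorname{hash}(H_{EF} \,\|\, H_{GH}) \\
H_{ABCDEFGH} &= \operatorname{hash}(H_{ABCD} \,\|\, H_{EFGH})
\end{aligned}
```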
The hashes on the bottom row are referred to as “leaves,” the intermediate hashes as “branches,” and the hash at the top as the “root.” The Merkle root of a given block is stored in the block header: for example, the Merkle root of block #482819 is e045b18e7a3d708d686717b4f44db2099aabcad9bebf968de5f7271b458f71c8. The root is combined with other information (the block version, the previous block’s hash, the timestamp, the difficulty target, and the nonce) and then run through a hash function to produce the block’s unique hash: 000000000000000000bfc767ef8bf28c42cbd4bdbafd9aa1b5c3c33c2b089594 in the case of block #482819. This block hash is not stored in the block it identifies; instead, it appears in the header of the next block as that block’s “previous block hash.” It is distinct from the Merkle root.
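To make the header step concrete, here is a minimal Python sketch of serializing and hashing an 80-byte bitcoin block header. It assumes the standard header layout (4-byte version, 32-byte previous block hash, 32-byte Merkle root, 4-byte timestamp, 4-byte difficulty bits, 4-byte nonce, integers little-endian) and bitcoin’s double SHA-256. The function name `block_hash` is hypothetical. In the usage example only the Merkle root comes from block #482819; every other field value is a placeholder, so the printed hash will not match that block’s real hash.

```python
from __future__ import annotations

import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Bitcoin-style hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(version: int, prev_hash_hex: str, merkle_root_hex: str,
               timestamp: int, bits: int, nonce: int) -> str:
    """Serialize an 80-byte header, hash it, and return the display-order hex."""
    # Explorers show hashes byte-reversed; flip them back to internal order.
    prev_hash = bytes.fromhex(prev_hash_hex)[::-1]
    merkle_root = bytes.fromhex(merkle_root_hex)[::-1]
    header = (
        struct.pack("<i", version)      # block version, 4 bytes
        + prev_hash                     # previous block hash, 32 bytes
        + merkle_root                   # Merkle root, 32 bytes
        + struct.pack("<I", timestamp)  # Unix timestamp, 4 bytes
        + struct.pack("<I", bits)       # compact difficulty target, 4 bytes
        + struct.pack("<I", nonce)      # nonce, 4 bytes
    )
    # Double SHA-256, then reverse for display like the hashes quoted above.
    return double_sha256(header)[::-1].hex()


if __name__ == "__main__":
    # Placeholder values for everything except the Merkle root.
    print(block_hash(
        version=0x20000000,
        prev_hash_hex="00" * 32,
        merkle_root_hex="e045b18e7a3d708d686717b4f44db2099aabcad9bebf968de5f7271b458f71c8",
        timestamp=1_500_000_000,
        bits=0x1D00FFFF,
        nonce=0,
    ))
```

Changing any field, the nonce included, changes the resulting hash completely, which is why the Merkle root in the header commits the block hash to the exact set of transactions.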
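The Merkle tree is useful because it allows users to verify that a specific transaction is included in a block without downloading the whole blockchain (over 350 gigabytes at the end of June 2021). For example, say that you wanted to verify that transaction TD is included in the block in the diagram above. If you already have the root hash (HABCDEFGH), the process is a bit like a game of sudoku: you query the network about TD, and it returns just three hashes, HC, HAB, and HEFGH. You hash TD to get HD, combine HC with HD to compute HCD, combine HAB with HCD to compute HABCD, and finally combine HABCD with HEFGH to compute HABCDEFGH. If the result matches the root you already hold, TD must be in the block, because producing a matching root from a transaction that is not in the block would require finding a collision in the hash function. In general, a block with n transactions needs only about log2(n) hashes for such a proof, which is why three suffice for eight transactions.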
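The verification walk for TD can be sketched in Python. This is a minimal illustration under stated assumptions: double SHA-256 as the hash, the same odd-level duplication rule bitcoin uses, hypothetical helper names (`build_levels`, `merkle_proof`, `verify_merkle_proof`), and placeholder transactions. Real bitcoin lightweight clients receive proofs in their own network message format, which this sketch does not reproduce.

```python
from __future__ import annotations

import hashlib


def double_sha256(data: bytes) -> bytes:
    """Bitcoin-style hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, from the leaf hashes up to the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # odd level: pair the last hash with itself
        levels.append([double_sha256(level[i] + level[i + 1])
                       for i in range(0, len(level), 2)])
    return levels


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Collect the sibling hash at each level, as (hash, sibling_is_left)."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    proof = []
    for level in build_levels(leaves)[:-1]:  # skip the root level
        sibling = index ^ 1  # the other member of this node's pair
        if sibling >= len(level):
            sibling = index  # odd level: the last hash is its own sibling
        proof.append((level[sibling], sibling < index))
        index //= 2  # move to the parent's position
    return proof


def verify_merkle_proof(tx: bytes, proof: list[tuple[bytes, bool]],
                        root: bytes) -> bool:
    """Rehash from the transaction up to the root and compare."""
    current = double_sha256(tx)
    for sibling, sibling_is_left in proof:
        pair = sibling + current if sibling_is_left else current + sibling
        current = double_sha256(pair)
    return current == root


if __name__ == "__main__":
    # Hypothetical stand-ins for transactions TA..TH.
    txs = [f"T{c}".encode() for c in "ABCDEFGH"]
    leaves = [double_sha256(tx) for tx in txs]
    root = build_levels(leaves)[-1][0]  # plays the role of HABCDEFGH
    proof = merkle_proof(leaves, 3)     # TD sits at index 3
    print(len(proof))                                   # 3 (HC, HAB, HEFGH)
    print(verify_merkle_proof(txs[3], proof, root))     # True
    print(verify_merkle_proof(b"forged", proof, root))  # False
```

The proof stores a left/right flag with each sibling because concatenation order matters: HC must go before HD, while HEFGH must go after HABCD.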
Merkle trees are named after Ralph Merkle, who proposed them in a 1987 paper titled “A Digital Signature Based on a Conventional Encryption Function.” Merkle also did pioneering work on cryptographic hash functions.