Merkle root of a bitcoin block calculated in C#
Mục lục bài viết
Merkle root of a bitcoin block calculated in C#
(this article assumes a basic knowledge of how blockchains work and the intent is to shed a little insight into how transactions are included in a block)
What is a Merkle Tree?
Merkle tree aka binary hash tree is a data structure used for efficiently summarising and verifying the integrity of large data sets
Merkle tree is a kind of inverted tree structure with the root (known as the Merkle root) at the top and the leaves at the bottom. See the example below.
This data structure is used in the bitcoin and blockchain technologies to summarise all the transactions in a block, providing a very efficient process to verify whether a transaction is included in a block.
A bit of history
Merkle was also one of the inventors of public key cryptography and he appears to have long moved on from cryptography to solving more interesting problems. You can check out his webpage here.
While learning about Merkle tree, I had an idea to quickly write some code to download a block from the bitcoin chain and calculate the Merkle root of the transactions to see if I would arrive at the same Merkle root calculated by a block miner and verified by the network because …… well why not?
The wikipedia definition of a merkle tree is
A Merkle tree is recursively defined as a binary tree of hash lists where the parent node is the hash of its children, and the leaf nodes are hashes of the original data blocks.
This is consistent with the way merkle trees are implemented in the bitcoin block chain .
The algorithm is roughly as follows
- The transaction represents the original data blocks which are hashed to produce transaction hashes (transaction id) which form the leaf nodes.
- The leaf nodes have to be even in number for a binary hash tree to work so if the number of leaf nodes is an odd number, then the last leaf node is duplicated to even the count.
- Each pair of leaf nodes is concatenated and hashed to form the second row of hashes.
- The process is repeated until a row is obtained with only two hashes
- These last two hashes are concatenated to form the Merkle root.
Looking at the algorithm, it looked quite straightforward and I was very confident I could write the code in and verify the Merkle root for a block in the bitcoin blockchain https://blockchain.info in no time.
Hold my (metaphorical) beer…
Recursion … Really?
well … you could just use a while loop but the point here is blockchain, not algorithms and recursion is always fun. 🙂
A couple of hours later I was feeling quite annoyed; the algorithm was straightforward enough, however, my Merkle root did not match the root in the block and I really didn’t have a clue why.
The devil is always in the detail
While looking at the bitcoin developer reference I came across the following
If a block has three or more transactions, intermediate merkle tree rows are formed. The TXIDs are placed in order and paired, starting with the coinbase transaction’s TXID. Each pair is concatenated together as 64 raw bytes and SHA256(SHA256()) hashed to form a second row of hashes.
It appeared that I needed to double hash the concatenated pair as opposed to just a single hash so I happily made the change, compiled and ran the code. Disappointingly still no joy and I was seriously considering just leaving it there as what had started as a random thought was now costing me time while leaving me dissatisfied. But I was hooked and I had to know why my code wasn’t working. After some time trawling google and talking to a fellow blockchain enthusiast I came across a concept I wasn’t really too familiar with …
Little and Big Endian
Essentially this refers to how computers store multibyte numbers.
“Little Endian” machines store the low-order byte of the number in memory at the lowest address, and the high-order byte at the highest address.
“Big Endian” machines store the high-order byte of the number in memory at the lowest address, and the low-order byte at the highest address.
e.g lets say you have a 4 byte longInt
Byte3 Byte2 Byte1 Byte0
In little endian machines , it will be arranged in memory as
Address0 Byte0
Address1 Byte1
Address2 Byte2
Address3 Byte3In big endian machines , it will be arranged in memory as
Address0 Byte3
Address1 Byte2
Address2 Byte1
Address3 Byte0
In Bitcoin, the majority of the fields in transaction data are in little-endian format. When converting from little-endian to big-endian, swap each pair of characters first, then reverse the string.
So the fix was simply
- convert the transaction hashes from little-endian to big-endian before. running the BuildMerkleRoot algorithm.
- convert the Merkle root back to little-endian
Voila!, finally I got my merkle root, now where’s my beer ? :).
Summary
So what is the point of the Merkle tree in blockchain?
Without going into too much detail in this article, the Merkle root is ultimately included in a block header and provides a way to verify transactions in a block.
The block header contains a hash of the previous block header which includes the previous blocks Merkle root etc.., chaining transactions up until the very first block (genesis block) meaning any slight change in transaction data would result in a different Merkle root which would invalidate the block/s
Another key use case of the Merkle tree is in simple payment verification where you can prove that a transaction exists in a block, by showing the Merkle branch of the transaction.
I have included a GitHub link to the full code https://github.com/chidionyema/merkleroot.
You can replace the block hash in the code with any block hash from https://www.blockchain.com/ and the Merkle root should be the same.
disclaimer
- this is just get it working code 🙂