There are several conceptual pieces that are put together to make the working Bitcoin digital currency. The data which defines Bitcoin transactions is stored in a data structure called a blockchain. A key feature of blockchains involves cryptographic “hashing”. That is the focus of today’s post.
A hash function is any function that can be used to map initial data of arbitrary size to fixed-size values. The initial data may be called the key or the message. The values returned by a hash function are called hash values, hash codes, digests, or simply “hashes”. A common use for hashing in the past has been to do large-scale data storage and retrieval more efficiently, as described in Wikipedia. That link also discusses how some actual hashing calculations are done.
Here we will focus on cryptographic applications of hashing. For this purpose, hash functions are chosen which are for all practical purposes one-way. It is straightforward to start with the “message” and compute the hash. But it is not feasible to start with the hash and back-calculate the initial message, even if you know the algorithm used for the hash function. Typically the only way to find the message is to run a brute-force search of all possible inputs until you find a match to the output hash.
A good cryptographic hash function has the following main properties:
- It is deterministic, meaning that the same message always results in the same hash
- It is quick to compute the hash value for any given message
- It is infeasible to generate a message that yields a given hash value (i.e. to reverse the process that generated the given hash value)
- It is infeasible to find two different messages with the same hash value
- A small change to a message should change the hash value so extensively that a new hash value appears uncorrelated with the old hash value
The figure below shows the hash or “Digest” for several different text strings, using the SHA-1 hash function. Note that all these strings generate a hash of the same length. Also, small changes in the input text (e.g. “ouer” or “over” versus “over”) lead to big changes in the output hash. These features help make it essentially impossible to deduce anything about the input string by examination of the output hash.
Cryptographic hashing is used in a number of ways, including digital signatures and message authentication. Here we will focus on how it applies in cryptocurrencies like Bitcoin.
In Bitcoin it is desired to have multiple “miners” working on multiple nodes to log and confirm the actual financial transactions made with Bitcoin. The cryptography comes in as a way for a given miner to prove to everyone else on the network that they have indeed processed the transactional data correctly, and that they have put significant cost or effort into it. Ideally, the initial calculations by the miner are somewhat costly, but it is very easy and quick for anyone else to verify that that miner really has solved the puzzle.
The mining task consists of starting with a defined set of input data plus an arbitrary “nonce” number, and running a hash function on that (i.e. on the input data along with the nonce). The goal (in order to claim validation for that block of data) is to generate a hash below a certain value, i.e. that begins with a certain number of zeros. That is the “puzzle” to be solved. The only way to do that is to run through a huge number of possible nonce values, till an output hash value is found which has the desired number of zeroes as its first several digits. Part of the operation of Bitcoin is that the system software automatically varies the number of zeroes required in order to keep the average transaction time for solving the puzzle to about ten minutes. The relative difficulty of getting the solution has increased steadily with time, as miners employ ever more powerful, special-purpose computing equipment. As of August, 2020, the odds of getting an acceptable hash for a given nonce value were less than 1 in 16 trillion.
Here is a figure showing how this works in more detail:
The input to the hash function includes several header-type numbers (in yellow above, including the nonce). One of the header items is the hash of the previous valid block of data, which plays a crucial role in ensuring the integrity of the whole data chain. Another header item is a hash of the actual transactional data (in gray), which is represented as a “Merkle root”. (In practice, there are some additional complexities in the way this hash function is computed, involving breaking the problem down into smaller sub-problems that are hashed and then combined).
If a miner succeeds in finding a suitable nonce, he sends out his solution to the rest of the network, where it can easily be verified. For any single nonce value, it is very quick for others to run the input data plus the nonce through the hash function, and test whether the resulting hash begins with the requisite number of zeroes. If all goes well, the miner is rewarded with some Bitcoin.