Hash

From CryptoWiki

Basics

  • A digital fingerprint of some binary input.
  • Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms.
  • What are hash functions used for? From Decrypt (10-7-2023):
  1. "Creating Private and public keys: To send and receive crypto, or carry out a transaction on blockchain, you need a pair of keys. Keys typically come in pairs, private and public. The private key is connected to the public key via a hash function to keep your details safe. You could send your public key to anyone, or have it on a public profile, and no one will be able to work out the private key thanks to the hash function.  
  2. Bitcoin mining: In order to keep the network running fairly, Bitcoin maintains a level playing field for participants to earn themselves the right to create the next block. It uses a computational race that starts again every time a block is published. To win the race, each miner collects a set of transactions, includes a reference to the previous block and uses this as a piece of data to put into a hash function. To win the race, the hash function that results must start with a certain number of zeros depending on the current difficulty of the network. The higher the zeros, the harder it is. We wrote a whole piece on bitcoin mining to help make things clearer. Essentially, hash functions are used to help crack the puzzle at the heart of blockchain.
  3. Linking blocks in the blockchain: If blocks in a blockchain weren’t linked, it would be easy to insert a fake one. Bitcoin avoids this by linking each block to the previous block. It does so by using a hash pointer. A hash pointer is the result worked out from hashing the previous block in the chain. This means that anyone can check that the transactions in the block follow on from the previous one. This actually allows miners to ensure the whole chain is legitimate and that not a single block has been added by a malicious miner. This also means that every coin’s history can be traced back to when it was mined."

"Hashes and hash functions perform a vital role in both DLTs and wider computing security. A hash is often represented by a long alphanumeric value. It is generated through the act of ‘hashing’, by which an input (e.g. some text) is turned into an output. No matter the size of the input, the output will always be a fixed length as stipulated by the hash function. There are many different mathematical algorithms, known as hash functions, used to achieve this and they can be used for a variety of purposes.

Hash functions are utilized by billions of users daily, as they form the basis of user security for website passwords. Owing to the risk of data breaches, companies do not (or should not) store your password as plain text. If they do and then suffer a hack then all of your other accounts with the same or similar passwords would also be compromised.

Companies therefore use your password as the input for the hash function and instead store the hashed output. When you log in they need only run the same hash function to ensure the outputs match. Please note that the process to correctly hash passwords is somewhat more complicated than is shown here in order to prevent dictionary and other lookup attacks, but not overly so.

Hash functions are also widely used by crypto assets. For example, Bitcoin uses a hash function known as SHA-256 for both mining and creating addresses. When creating new blocks, a Bitcoin miner solves a complex mathematical puzzle in order to find a block which is then appended to all prior blocks. This puzzle Is known as the Proof of Work and involves arranging a series of required data before running it through the hash function.

One part of the required data is the hash of the previous block. In this manner all hashes link back to the Genesis block. If I change block 1000, then the hash of blocks 1001 onwards would be incorrect (because the input would have changed). As such, to change an older block a malicious actor would have to have enough power to also change the blocks after it as otherwise the hashes wouldn’t match.

Input X will always give Output Y‍

The whole point of a hash function is that the hash of any given input will equal a fixed output. This cannot change because otherwise hashes could not be used for verification - we would be unable to compare to the input with any confidence.‍

Pre-image resistance‍

Pre-image resistance means that you cannot use the known output to determine the associated input. If we could work out an input through an output then we would be able to easily break the hash function, rendering the security useless. Instead of needing to expend large amounts of time and resources to solve a block for Bitcoin for example, it would be trivial for anyone to do so (and therefore also alter all old blocks). Hash functions protect against this, by making it hard (not impossible) to work out an input when you know the output.

The reason it is not impossible is because a malicious actor could brute force the function to discover the output of every input. However, this is extraordinarily difficult and realistically this computation (or series of computations) is impractical for any existing hardware).  Although a malicious actor could get lucky and generate the output they were seeking earlier, the likelihood is that it would take them many, many lifetimes.

Avalanche Effect‍

Hash functions also protect against deducing inputs based on marginal changes. Changing an input by one character does not change the output in a discoverable way and rather the whole output changes. This is known as the Avalanche Effect. For example, the SHA-256 hash of ‘Radix’ and ‘radix’ look very different:

Radix: 1f56d0be7dd66870efd22fb5073f8c1ae650dc6bad6e764d65327ad32c937c4f

radix: da7f85eaf3d0452479031da124d28778aaf15cc756a6c909d7dc708fade343f0

Collision resistance‍

Collision resistance means two different inputs are extremely unlikely to give the same output (known as a cryptographic hash collision). If two inputs regularly gave the same output then we would not be able to trust the veracity of network information. Again, we cannot fully eradicate the chance of these collisions but instead seek to make it exceedingly rare and as difficult as possible for malicious actors to take advantage of. It is worth noting, however, that certain hash functions like MD5 (not used in Radix) have been proven to suffer from a susceptibility to such collision attacks.‍

Second pre-image resistance‍

As noted, there is a chance that two inputs can give the same output, even if that is rare. Second pre-image resistance means that even in this event we should not be able to discover the second shared input."

  • The main features of a hashing algorithm are that they are a one way function – or in other words you can get the output from the input but you can’t get the input from the output – just like elliptic curve cryptography where you can’t get the private key from the public key. The other property is that the same input creates the same output.
  • Most hashing algorithms, including the SHA and RIPEMD are all descended from the MD4 family. The MD4 hashing algorithm was developed by Ronald Rivest specifically to allow very easy software implementation. The MD4 algorithm and subsequent SHA algorithms use 32 bit variables with bitwise Boolean functions such as the logical AND, OR and XOR operators to work through from the  input to the output hash.
  • This is the basic process behind hashing – simply convert a number into binary then perform a set of simple functions that operate through basic standard transistor and bus processes such as AND, XOR, NOT, Rotate & OR. This is part of the reason that ASIC, or application specific chips can be designed that optimise hashing. In the case of SHA-256 chips have been specifically designed to optimise the iterations throughout the steps to increase the speed of creating a hash from an input. In the case of mining this means you can calculate more hashes per second by iterating through the nonce and extra nonce parameters and have a higher probability of winning the block reward.