Hash

From CryptoWiki

Basics

  • A digital fingerprint of some binary input.
  • Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms.
  • From this post by FlatOutCrypto (5-7-2019):

"Hashes and hash functions perform a vital role in both DLTs and wider computing security. A hash is often represented by a long alphanumeric value. It is generated through the act of ‘hashing’, by which an input (e.g. some text) is turned into an output. No matter the size of the input, the output will always be a fixed length as stipulated by the hash function. There are many different mathematical algorithms, known as hash functions, used to achieve this and they can be used for a variety of purposes.

Hash functions are utilized by billions of users daily, as they form the basis of user security for website passwords. Owing to the risk of data breaches, companies do not (or should not) store your password as plain text. If they do and then suffer a hack then all of your other accounts with the same or similar passwords would also be compromised.

Companies therefore use your password as the input for the hash function and instead store the hashed output. When you log in they need only run the same hash function to ensure the outputs match. Please note that the process to correctly hash passwords is somewhat more complicated than is shown here in order to prevent dictionary and other lookup attacks, but not overly so.

Hash functions are also widely used by crypto assets. For example, Bitcoin uses a hash function known as SHA-256 for both mining and creating addresses. When creating new blocks, a Bitcoin miner solves a complex mathematical puzzle in order to find a block which is then appended to all prior blocks. This puzzle Is known as the Proof of Work and involves arranging a series of required data before running it through the hash function.

One part of the required data is the hash of the previous block. In this manner all hashes link back to the Genesis block. If I change block 1000, then the hash of blocks 1001 onwards would be incorrect (because the input would have changed). As such, to change an older block a malicious actor would have to have enough power to also change the blocks after it as otherwise the hashes wouldn’t match.

Input X will always give Output Y‍

The whole point of a hash function is that the hash of any given input will equal a fixed output. This cannot change because otherwise hashes could not be used for verification - we would be unable to compare to the input with any confidence.‍

Pre-image resistance‍

Pre-image resistance means that you cannot use the known output to determine the associated input. If we could work out an input through an output then we would be able to easily break the hash function, rendering the security useless. Instead of needing to expend large amounts of time and resources to solve a block for Bitcoin for example, it would be trivial for anyone to do so (and therefore also alter all old blocks). Hash functions protect against this, by making it hard (not impossible) to work out an input when you know the output.

The reason it is not impossible is because a malicious actor could brute force the function to discover the output of every input. However, this is extraordinarily difficult and realistically this computation (or series of computations) is impractical for any existing hardware).  Although a malicious actor could get lucky and generate the output they were seeking earlier, the likelihood is that it would take them many, many lifetimes.

Avalanche Effect‍

Hash functions also protect against deducing inputs based on marginal changes. Changing an input by one character does not change the output in a discoverable way and rather the whole output changes. This is known as the Avalanche Effect. For example, the SHA-256 hash of ‘Radix’ and ‘radix’ look very different:

Radix: 1f56d0be7dd66870efd22fb5073f8c1ae650dc6bad6e764d65327ad32c937c4f

radix: da7f85eaf3d0452479031da124d28778aaf15cc756a6c909d7dc708fade343f0

Collision resistance‍

Collision resistance means two different inputs are extremely unlikely to give the same output (known as a cryptographic hash collision). If two inputs regularly gave the same output then we would not be able to trust the veracity of network information. Again, we cannot fully eradicate the chance of these collisions but instead seek to make it exceedingly rare and as difficult as possible for malicious actors to take advantage of. It is worth noting, however, that certain hash functions like MD5 (not used in Radix) have been proven to suffer from a susceptibility to such collision attacks.‍

Second pre-image resistance‍

As noted, there is a chance that two inputs can give the same output, even if that is rare. Second pre-image resistance means that even in this event we should not be able to discover the second shared input."

  • The main features of a hashing algorithm are that they are a one way function – or in other words you can get the output from the input but you can’t get the input from the output – just like elliptic curve cryptography where you can’t get the private key from the public key. The other property is that the same input creates the same output.
  • Most hashing algorithms, including the SHA and RIPEMD are all descended from the MD4 family. The MD4 hashing algorithm was developed by Ronald Rivest specifically to allow very easy software implementation. The MD4 algorithm and subsequent SHA algorithms use 32 bit variables with bitwise Boolean functions such as the logical AND, OR and XOR operators to work through from the  input to the output hash.
  • This is the basic process behind hashing – simply convert a number into binary then perform a set of simple functions that operate through basic standard transistor and bus processes such as AND, XOR, NOT, Rotate & OR. This is part of the reason that ASIC, or application specific chips can be designed that optimise hashing. In the case of SHA-256 chips have been specifically designed to optimise the iterations throughout the steps to increase the speed of creating a hash from an input. In the case of mining this means you can calculate more hashes per second by iterating through the nonce and extra nonce parameters and have a higher probability of winning the block reward.