What are cryptographic hashes?

Bruce Blackshaw 6 December 2019

Cryptographic hash algorithms are important mathematical functions used widely in software, particularly in secure protocols such as SSL/TLS and SSH.

A hash algorithm is supplied a block of data, known as the message, and produces a much smaller hash value, known as the message digest, or simply the digest. The same message will always result in the same digest. Different messages produce different digests.

An important feature of hash algorithms is that given a particular digest, it is extremely difficult to generate a message that will produce it. They are "one way" algorithms - the digest of a message is easy to calculate, but a message can't be deduced from the digest. It is mathematically possible to have two different messages produce the same digest - known as a collision - but for good hash algorithms this is computationally infeasible.

Popular hash algorithms include MD5 and SHA-1, although these are now being phased out in favour of stronger algorithms such as SHA-2.

Hash algorithms are used for many purposes, such as verifying the integrity of data or files, password verification, and building other cryptographic functions such as message authentication codes (MACs) and digital signatures.

Digital signatures

A written signature demonstrates that a document was created by a known author and accurately represents them. A digital signature is similar - it guarantees that the message was created by a known sender (authentication) and that the message was not tampered with in transit (integrity).

To sign a message requires two stages. Firstly, the message digest is calculated, producing a unique hash that is typically much smaller than the message. Next, the digest is encrypted using the message signer's private key. This is the digital signature of the message.

To verify the signer of a message also requires two stages. Firstly, the signer's public key (which is widely available) is used to decrypt the digital signature, yielding the message digest. Then the message digest of the message is calculated and compared to the decrypted digest. If the message has not been tampered with, the digests should be identical. And because the signer's public key was used to decrypt the signature, the signer's private key must have been used to encrypt it.

Why use the message's digest at all? Why not just encrypt the message with the signer's private key and use the encrypted message as the signature? While that would certainly work, it is impractical - it would double the size of the message when the signature is included. The digest is very small and of a fixed size, so encrypting the digest produces a signature that is much smaller.

Message authentication codes (MAC)

A message authentication code , or MAC , is a small piece of information attached to a message that can verify that the message has not been tampered with, and authenticate who created it.

A special type of MAC is the HMAC , which is constructed using a cryptographic hash and a secret key. The secret key is padded and concatenated with the message, and the digest, or hash, is calculated. This digest is then concatenated again with the padded secret key to yield the HMAC value. It is impossible for an attacker to produce the same HMAC without having the secret key.

The sender and receiver both share the secret key. When the receiver gets a message, they calculate the HMAC and compare it to the HMAC provided with the message. If the HMACs match, only someone possessing the secret key could have produced the message. The secret key itself is never transmitted.

Password verification

Cryptographic hashes are extremely useful for systems that require password verification. It is an unjustifiable security risk to store user's passwords, even if they are encrypted. Instead, the digest of each password is stored. When the user supplies the password, it is hashed and compared with the digest that is stored.

One drawback with this method is that if users have the same password, they will have the same hash value. Tables of pre-calculated digests for common passwords can be used to attack a system if the file containing the digests can be obtained. These tables are known as rainbow tables.

For this reason a salt - a random, non-secret value - is concatenated with the password before the digest is calculated. Because every user has a different salt, it is not feasible to use pre-calculated tables - there would need to be a table for every possible salt value. For salts to be effective, they must be as random as possible, and of adequate size - preferably at least 32 bits.