Cryptographic Concepts - Cryptographic Hashing and Salting
Oct 06, 2017
A hashing function is a function that is easy to calculate one way but extremely difficult to calculate in reverse that takes in data as an input and produces a fixed length output where it is exceptionally unlikely that 2 different inputs will result in the same output. It is best for the function to produce an output that appears as random as possible while also being deterministic. This allows for hashes to be useful for proving file integrity (any changes in a file will result in a completely different hash), showing files are different (files with different contents will have different hashes), and verifying that content is correct (If the hash matches, the content should be the same).
For an example of how a hashing function would work, an input of "test" will always produce the hash "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08" and an input of "Test" will always produce the hash "532eaabd9574880dbf76b9b8cc00832c20a6ec113d682299550d7a6e0f345e25". Any change in the input should result in a completely different hash output.
An important feature of cryptographic hashes is being collision resistant. It should be very difficult to get two different inputs that result in the same hash output. If a hash collision happens, two different file contents would have the same hash value, meaning they would be considered the same content if you are using hashing to determine if contents are the same or different.
Hashes are quite often used by software creators and distributors to show that the code or binary executable you downloaded is the correct one and that it hasn't been tampered with. A hash collision could allow an attacker to create a malicious version of the code or executable that would be considered valid due to it having the same hash.
Broken Hashing Algorithms
There are a few hashing algorithms that were very commonly used for quite a while that have been shown to be broen and should not be used. These include **MD5** and **SHA1**. A collision attack actually happened to MD5 in 2004 and to SHA1 in 2017 (which I talked about in my blog post "Understanding the Recent SHA1 Collision".
Modern Hashing Algorithms
MD5 produced hashes of 128 bit lengths and SHA1 produced hashes of 160 bit lengths. Modern hashing algorithms produce hashes of longer lengths that MD5 and SHA1, while also being stronger in general.
SHA2 was a set of cryptographic hash functions created by the National Security Agency in 2001 as a replacement for SHA1. There were 6 different hashing functions in SHA2, which were SHA-224, SHA-256, SHA-284, SHA-512, SHA-512/224, and SHA-512/256, although SHA-256 and SHA-512 tend to be the hashing functions I tend to see used the most. FreeBSD actually uses SHA-512/256 on 64-bit platforms because SHA-512 is faster than SHA-256 on 64-bit processors.
In 2015, the National Institute of Standards and Technology (NIST) released the standard that would be called SHA3 after a competition was announced to create the standard back in 2006. SHA3 was not meant to be a replacement for SHA2, but simply an alternative that used a different method for hashing than SHA2, so we wouldn't be essentially putting all of our eggs in one basket in terms of security.
Let's say you have a secret word "Test" that you want to store a hash of. If you use a word as an input, it would be hashed and compared to the stored hash. If that input is the word "Test", it will match the hash. The problem is that someone else might have the secret word "Test" that they are storing the hash of, so both of your stored hashes will be the same. Even worse, someone might hash a large list of words, which is called a Rainbow Table, so they just need to look up the stored hash to get the original word from, an attacker would know your secret word. There is a way to protect against this, which is called "Salting".
A salt is some random data added to the input to make the output hash unique, even if someone else used the same input. For instance, with a salt, the hash would not be of the secret word "Test" anymore, but instead be something like "TestBT1zrfAg9mORZTgc", where "BT1zrfAg9mORZTgc" is a randomly generated base64 encoded string of 96 bits of data. This means there are 2^96 possible hashes for the secret word "Test" or any other possible secret word. This makes Rainbow Tables mathematically impractical to create, as the size would increase by a factor of 2^96.
You need to store the salt with the hash output though in order to make sure the input you provide matches the secret word. This is because you need to take your input and combine it with the salt before hashing it in order to compare it to the stored hash, so in this case, you would store "BT1zrfAg9mORZTgc" with the hash.
This post discusses cryptographic hashes, a set of functions that can take in an input and provide a seemingly random and deterministic output. Along with hashes, a concept called salting was also discussed, where random data is added to the input to make the hash stronger.
Come back for the next blog post which will be discussing Cryptographic Signatures, which are an extension of Asymmetric Cryptography and can be used to prove that data came from a specific person.