GitHub: ViGrey Twitter: @ViGreyTech LinkedIn: Vi Grey

Understanding the Recent SHA1 Collision

Feb 24, 2017

Update (2017/02/26): Reworded the "What can't this collision technique do?" section to make it a little easier to understand.

Researchers at Google and the CWI Institute in Amsterdam managed to produce the first public SHA1 collision [4]. SHA1 [1] is a deterministic one way cryptographic hashing algorithm that is often used for data integrity checks. For any input 𝑥, a deterministic output 𝑦 is produced. If you have an output 𝑦, you should not be able to figure out the input 𝑥. For example, an input of "test" will always produce the output "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3" and an input of "Test" will always produce the output "640ab2bae07bedc4c163f679a746f7ab7fb5d1fa". Any change in the input should result in a completely different output. A SHA1 collision occurs when two different inputs provide the same output, which is bad for security.

Are we all doomed?

Not quite. It's important to clarify what was actually accomplished. While the researchers found a SHA1 collision, they were trying to create any hash collision, not a specific hash collision. This is described in general as a Birthday Attack, based on the Birthday Problem in Mathematics.

The Birthday Problem?

Imagine you are guarding a room with 365 chairs in it, a chair for each day of the year (we're going to ignore leap days to make this easier). You let one person into the room and have them sit on the chair that matches their birthday. Then, you let another person into the room and have them sit on the chair that matches their birthday. You continue letting people in one at a time until someone can't sit on a chair because someone else is already sitting on it. At this point, 2 people in the room will have the same birthday. How many people would need to be in the room before you have a greater than 50% chance of 2 people having the same birthday? Remember, we aren't asking for how many people need to be in the room for a greater than 50% chance of someone having a specific birthday, just that any 2 people in the room have the same birthday. Turns out, only 23 people would need to be in the room!

But why only 23 people?

If we were looking for the amount of people needed to have a greater than 50% chance of someone matching a specific birthday, we'd take the total number of days and divide it by 2 and round it up to the nearest whole number (Friends don't let friends split people in half, so whole numbers are needed) to get an answer of 183 people. In the Birthday problem, we only want any 2 people to have the same birthday. When the first person comes in, they have a 365 out of 365 chance that no one will have their birthday. The second person has a 364 out of 365 chance that no one will have their birthday. If the birthdays don't match, the third person has a 363 out of 365 chance that no one will have their birthday. We continue on until 2 people have the same birthday, so for 23 people, you can take 1 ‐ (365/365 x 364/365 x 363/365 x . . . x 343/365), which would be about 0.5073, or a little over 50%. As you can see, because we only care about any 2 matching birthdays, we only need 23 people and not 183 people.

How does this work for SHA1 collisions?

Imagine instead of 365 possible birthdays, we have 1461501637330902918203684832716283019655932542976 (2160) possible birthdays. The researchers used an attack called SHAttered [5], which is a Birthday Attack with modifications to bring the amount of hash collision attempts down to 9223372036854775808 (263).

What can this collision technique do?

Let's say we have 2 users, Alice and Bob, and server owned by Eve that hosts some files. If Eve used SHA1 checksums for the files, it would be possible for Eve to make two different files have the same SHA1 checksum. If this file was a program, Eve could give Alice the legitimate version of the file while giving Bob a backdoored version of the file.

What can't this collision technique do?

Many programs still use SHA1, including Git for commit checksums, OpenPGP for signatures, and Tor for Hidden Services. Because SHA1 is being used as an integrity check, any data that has a matching SHA1 hash is considered legitimate. If someone can find a specific SHA1 collision, they would be able to create data that is considered legitimate. Let's say Eve wants to make a backdoored version of Alice's file that has the same SHA1 hash. If Eve can find a collision to Alice's file, that backdoored version would be considered legitimate. This attack is still impractical to perform though. You're not trying to find just any collision, you're trying to find a specific collision, so the Birthday Attack wouldn't work here.

What should we do?

Development teams should move away from SHA1 and use either SHA256 [1] or SHA3 [2] instead. The earlier we switch away from SHA1, the better. Chrome has already removed support for SHA1 certificates in version 56 [6]. TLS 1.3 is attempting to remove as much support for SHA1 as possible [3].

References

[1] D. Eastlake, T. Hansen, "US Secure Hash Algorithms(SHA and SHA-based HMAC and HKDF)", IETF RFC6234, 2011.

[2] NIST, "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions", FIPS Publication 202, 2015.

[3] E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.3 draft-ietf-tls-tls13-18", Network Working Group, https://tools.ietf.org/pdf/draft-ietf-tls-tls13-18.pdf, Accessed February 23, 2017.

[4] M. Stevens , E. Bursztein, P. Karpman, A. Albertini, Y. Markov, A. P. Bianco, C. Baisse, "Announcing the first SHA1 collision", Google Security Blog, https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html, Accessed February 23, 2017.

[5] M. Stevens, E. Bursztein, P. Karpman, A. Al-bertini, Y. Markov, "The first collision for full SHA-1", https://shattered.io/static/shattered.pdf, Accessed February 23, 2017.

[6] A. Whalley, "SHA-1 Certificates in Chrome", Google Security Blog, https://security.googleblog.com/2016/11/sha-1-certificates-in-chrome.html, Accessed February 23, 2017.