
Sunday, July 3, 2011

What Are MD5 and SHA-1 Hashes?



You may have seen MD5 hashes listed next to downloads during your internet travels, but what exactly are they? Let’s take a look at what these cryptic strings are and how you can use them to verify your downloads.

What Are Hashes and What Are They Used For?

Figure: cryptographic hash function chart
(Image credit: Wikimedia Commons)

Hashes, also called “digests,” are the products of cryptographic hash algorithms (an algorithm, in short, is a set of instructions a computer uses to manipulate data). Hash functions such as MD5 and SHA-1 produce a fixed-length digest regardless of the size of the input data. Take a look at the above chart and you’ll see that both “Fox” and “The red fox jumps over the blue dog” yield output of the same length.
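If you’d like to see the fixed-length property for yourself, here is a minimal sketch using Python’s standard hashlib module. The helper name hex_digest is our own invention for illustration, not something from the chart.

```python
import hashlib


def hex_digest(text: str, algorithm: str = "sha1") -> str:
    """Return the hexadecimal digest of a text string (hypothetical helper)."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


# The two inputs mentioned in the article: one short, one long.
inputs = ["Fox", "The red fox jumps over the blue dog"]

for algorithm in ("md5", "sha1"):
    for text in inputs:
        digest = hex_digest(text, algorithm)
        # The digest length depends only on the algorithm, never on the input.
        print(f"{algorithm:5} {len(digest)} hex chars  {digest}  <- {text!r}")
```

Whatever you feed in, MD5 should always give back 32 hexadecimal characters (128 bits) and SHA-1 should give back 40 (160 bits).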

Another factor is complexity. Compare the second example in the above chart to the third, fourth, and fifth. You’ll see that despite a very minor change in the input data, the resulting hashes are all very different from one another. This behavior, often called the avalanche effect, is a sign of the algorithm’s complexity (at least to our non-programmer eyes), and it helps make working backwards from the hash to the data very difficult. Passwords are often stored as hashes for this reason: it’s easy to hash the password entered during a login attempt and compare the result to the stored hash. On the other hand, if someone has only the hash, it’s very difficult to work backwards to the original input. When people try to crack passwords, they usually don’t work backwards; instead, they hash a dictionary of likely inputs (usually common passwords and key patterns) and compare the stolen hashes against the results.
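Both ideas in that paragraph can be sketched in a few lines of Python. Treat this as a rough illustration under our own assumptions: the function names, the tiny word list, and the use of plain MD5 as the stored password hash are all made up for the example.

```python
import hashlib
from typing import Iterable, Optional


def md5_hex(text: str) -> str:
    """Hash a text string with MD5 and return the hex digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def differing_bits(a: str, b: str) -> int:
    """Count how many of the 128 digest bits differ between two inputs."""
    xor = int(md5_hex(a), 16) ^ int(md5_hex(b), 16)
    return bin(xor).count("1")


def dictionary_lookup(stolen_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash every candidate and look the stolen hash up in the result.

    Nothing is worked backwards: we only hash forwards and compare.
    """
    table = {md5_hex(word): word for word in candidates}
    return table.get(stolen_hash.strip().lower())


# A one-letter change in the input flips a large share of the output bits.
print(differing_bits("The red fox jumps over the blue dog",
                     "The red fox jumps ouer the blue dog"))

# A hypothetical stolen hash is recovered only because its password is in the list.
common_passwords = ["123456", "password", "letmein"]
print(dictionary_lookup(md5_hex("letmein"), common_passwords))
```

The lookup finds a password only when it appears in the candidate list, which is why the attack works so well against common passwords and poorly against unusual ones.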

Data Verification


MD5, the Message-Digest Algorithm, has been used in many kinds of security software in the past, but it’s also widely employed for another purpose: data verification. Hash algorithms like this work great for verifying your downloads. Imagine you’re online trying to grab the latest Ubuntu release from BitTorrent. Some horrible troublemaker starts distributing a version of the .iso you need, but with malicious code embedded in it. Not only that, he’s clever, so he makes sure the files are exactly the same size. You wouldn’t know you had the bad file until you tried to boot the CD, and by then, permanent damage could have already occurred!


Thankfully for us, Canonical posts the MD5 checksums for its images online. You can run a hash check yourself with any number of tools, and then compare the result against the posted checksum. If there are any differences at all, you know that the file you have was tampered with, did not download completely, or was otherwise altered along the way. This way you prevent any damage to your system before you run anything, and you can just re-download the appropriate file.
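If you’d rather not install a separate tool, the check described above can be sketched with Python’s standard library. The function names here are hypothetical, and the file is read in chunks so that a large .iso doesn’t have to fit in memory.

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_posted_checksum(path: str, posted: str, algorithm: str = "md5") -> bool:
    """Compare the file's digest with a checksum copied from the download page."""
    # Posted checksums may differ in letter case or carry stray whitespace.
    return file_digest(path, algorithm) == posted.strip().lower()
```

To use it, you would call matches_posted_checksum with the path to your downloaded file and the checksum string copied from the publisher’s page. A result of False means the file should be downloaded again before you run or burn it.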

This comes in handy not just for Linux distros, but also for things like BIOS files, third-party Android ROMs, and router firmware, all of which could potentially “brick” your devices if the data is tainted. In general, larger files carry a greater risk of data corruption, so you may want to run your own checksums if your archives are important.

MD5 is no longer considered secure against deliberate tampering, because researchers have shown how to construct different inputs that share the same MD5 hash (known as collisions). As a result, people have started to migrate to other commonly used hash algorithms such as SHA-1. SHA-1 in particular is used for data verification more and more often, so most tools work with both algorithms.
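One way a tool might handle both algorithms is to look at the length of the posted checksum: an MD5 digest is 32 hexadecimal characters and a SHA-1 digest is 40. The sketch below is our own guess at that approach, with hypothetical function names, and not a description of how any particular tool does it.

```python
import hashlib

# Hex digest length for each algorithm this sketch knows about.
HEX_LENGTH_TO_ALGORITHM = {32: "md5", 40: "sha1"}


def guess_algorithm(posted_checksum: str) -> str:
    """Guess the hash algorithm from the length of a posted hex checksum."""
    length = len(posted_checksum.strip())
    if length not in HEX_LENGTH_TO_ALGORITHM:
        raise ValueError(f"unrecognized checksum length: {length}")
    return HEX_LENGTH_TO_ALGORITHM[length]


def verify_bytes(data: bytes, posted_checksum: str) -> bool:
    """Hash the data with the guessed algorithm and compare with the posted value."""
    algorithm = guess_algorithm(posted_checksum)
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == posted_checksum.strip().lower()
```

Guessing by length works only for algorithms with distinct digest sizes, so a real tool would more likely let you name the algorithm explicitly.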
