Hash Functions and Cryptography
A term like 'hash function' can mean different things to different people depending on the context. For hash functions in cryptography, the definition is a bit more straightforward. A hash function is a process that takes input data of any size and converts it into a fixed-length output called a hash value, or hash digest. That digest works like a fingerprint for the content. It isn't ciphertext, and it can't be converted back into the original data.
The fingerprint idea tells you that, for a strong hash function, it's computationally infeasible to find two pieces of content with the same hash digest, and that if the content changes, the hash digest changes as well. (Because digests have a fixed length, collisions must exist in theory; a good hash function just makes them impractical to find.) Basically, hashing is a way to check that any data you send reaches your recipient in the same condition that it left you, completely intact and unaltered.
But wait, doesn’t that sound a lot like encryption? Sure. They’re similar, but encryption and hashing are not the same thing. They’re two separate cryptographic functions that help facilitate secure, legitimate communications. Encryption is designed to be reversed by whoever holds the right key; hashing isn’t designed to be reversed at all. So, if you hear someone talking about “decrypting” a hash value, then you know they don’t know what they’re talking about because, well, hashes aren’t encrypted in the first place.
We’ll speak more about the difference between these two processes a little later. But for now, let’s stick with the topic of hashing. So, what does hashing look like?
Here's a basic illustration of how the hashing process works:
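As a rough sketch of that process, the Python snippet below hashes inputs of very different sizes and shows that each digest comes out the same length. It assumes SHA-256 from the standard library's hashlib module purely as an example algorithm (this article doesn't single one out), and the helper name digest_of is made up for illustration.

```python
import hashlib


def digest_of(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different sizes (illustrative values only).
samples = [b"", b"hello", b"x" * 1_000_000]

for sample in samples:
    digest = digest_of(sample)
    # SHA-256 always yields 256 bits, which is 64 hex characters,
    # no matter how large or small the input is.
    print(f"{len(sample):>9} bytes -> {digest} ({len(digest)} hex chars)")
```

Running the same input through the function again gives the same digest every time, which is what lets a recipient re-generate a hash and compare it.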
One purpose of a hash function in cryptography is to take a plaintext input and generate a hashed value output of a specific size in a way that can’t feasibly be reversed. But hash functions do more than that. They tend to wear a few hats in the world of cryptography. In a nutshell, strong hash functions:
Ensure data integrity
Secure against unauthorized modifications
Protect stored passwords
Operate at different speeds to suit different purposes
Hash functions are a way to check data integrity in public key cryptography. They serve as a checksum, or a way for someone to identify whether data has been tampered with after it’s been signed. As part of a digital signature, the hash also serves as a means of identity verification.
For example, let’s say you’ve logged on to public Wi-Fi to send an email. (Don’t do that, by the way. It can be very insecure.) So, you write out the message, sign it using the private key that goes with your digital certificate, and send it on its way across the internet. This is what you might call prime man-in-the-middle (MitM) attack territory, meaning that someone could easily intercept your message (again, because public wireless networks are notoriously insecure) and modify it to suit their purposes.
In that scenario, a digitally signed email gets manipulated in transit via a MitM attack. The hash digest changes completely when any of the email content is modified after being digitally signed, signaling that the message can’t be trusted.
So, now someone receives the message and they want to know it’s legitimate. What they can do is take the hash value your digital signature provides (along with the algorithm it tells them you used), re-generate the hash themselves, and check whether the value they create matches the one you sent. If it matches, great: it means that no one has messed with it. But if it doesn’t… well, red flags should go up, and they should know not to trust it.
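The recipient's check can be sketched in a few lines of Python. This is a simplified illustration, not a real signature scheme. It assumes SHA-256 as the agreed algorithm and leaves out the signing step entirely. In an actual digital signature, the digest is protected with the sender's private key so an attacker can't simply swap in a new message and a matching digest. The function names and the sample messages are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(message: bytes) -> str:
    """Hash a message with SHA-256 (assumed here as the agreed algorithm)."""
    return hashlib.sha256(message).hexdigest()


def is_unmodified(received_message: bytes, expected_digest: str) -> bool:
    """Re-generate the digest and compare it with the one the sender supplied."""
    recomputed = sha256_hex(received_message)
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(recomputed, expected_digest)


# Sender side: hash the message before sending (signing step omitted).
original = b"Please pay Alice $100."
sent_digest = sha256_hex(original)

# An attacker alters the message in transit.
tampered = b"Please pay Mallory $100."

print(is_unmodified(original, sent_digest))  # True: digests match
print(is_unmodified(tampered, sent_digest))  # False: content was changed
```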
Even if something tiny changes in a message — you capitalize a letter instead of using one that’s lowercase, or you swap an exclamation mark where there was a period — it’s going to result in the generation of an entirely new hash value. But that’s the whole idea here — no matter how big or small a change, the difference in hash values will tell you that it isn’t legitimate.
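To see how sensitive a hash is to a tiny edit, the sketch below hashes two messages that differ only in their final punctuation mark and counts how many of the 256 output bits differ. It assumes SHA-256 again, and the messages and the bit_difference helper are invented for the example. For a well-designed hash function, roughly half of the bits are expected to flip.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of `a` and `b`."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(digest_a ^ digest_b).count("1")


before = b"See you at noon."
after = b"See you at noon!"  # only the last character changed

print(hashlib.sha256(before).hexdigest())
print(hashlib.sha256(after).hexdigest())
print(f"{bit_difference(before, after)} of 256 bits differ")
```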
One of the best aspects of a cryptographic hash function is that it helps you to ensure data integrity. But if you apply a hash to data, does it mean that the message can’t be altered? No. But what it does is inform the message recipient that the message has been changed. That’s because even the smallest of changes to a message will result in the creation of an entirely new hash value.
Think of hashing kind of like you would a smoke alarm. While a smoke alarm doesn’t stop a fire from starting, it does let you know that there’s danger before it’s too late.
Nowadays, websites need a way to check your password every time you log in, which means they have to keep some record of it. But storing plaintext passwords on a public-facing server would be dangerous because it leaves that information vulnerable to cybercriminals. So, what websites typically do is hash passwords to generate hash values, which is what they store instead.
But password hashes on their own aren't enough to protect you against certain types of attacks, such as attacks that rely on precomputed tables of hashes for common passwords. This is why you first need to add a salt. A salt is a unique, random value that’s added to each plaintext password before it’s hashed, so two users with the same password end up with different hashes. This provides an additional layer of security and can protect passwords from password cracking methods like rainbow table attacks.
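Here is a minimal sketch of salting in Python, using only the standard library. It assumes PBKDF2-HMAC-SHA256 (hashlib.pbkdf2_hmac) as the password hashing function, a 16-byte salt, and an iteration count picked for illustration rather than as a recommendation. The function names and the sample password are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)  # unique, random value per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the login attempt with the stored salt and compare the results."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Two users choose the same password but get different salts,
# so the values stored for them do not match.
salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")

print(hash_a.hex())
print(hash_b.hex())
print(verify_password("hunter2", salt_a, hash_a))  # correct password
print(verify_password("wrong-guess", salt_a, hash_a))  # incorrect password
```

The salt is stored next to the hash. It doesn't need to be secret; its job is to make precomputed tables useless, because an attacker would need a separate table for every salt.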
It’s also important to note that hash functions aren’t one-size-fits-all tools. As we mentioned earlier, different hash functions serve different purposes depending on their design and speed: some are fast, while others are deliberately much slower. That speed can aid or impede security depending on how you’re using the algorithm.
An example of where you’d want to use a fast hashing algorithm is when establishing secure connections to websites. In this example, having a faster speed matters because it helps to provide a better user experience. However, if you were trying to enable your websites to store passwords for your customers, then you’d definitely want to use a slow hashing algorithm. A slow hash costs a legitimate user only a brief delay per login, but at scale it forces a password-cracking attack (such as brute force) to take up far more time and computing resources. You don’t want to make it easy for cybercriminals, right?
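The speed gap can be made concrete with a rough timing sketch. It assumes a single SHA-256 call as the "fast" hash and PBKDF2-HMAC-SHA256 with 200,000 iterations as the "slow" one. Both choices, the sample password, and the time_call helper are illustrative assumptions, and the measured times will vary from machine to machine.

```python
import hashlib
import time

password = b"correct horse battery staple"
# Fixed salt only to keep the demo repeatable; use a random salt in practice.
salt = b"fixed-demo-salt!"


def time_call(func) -> float:
    """Return how many seconds a single call to `func` takes."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


# Fast: one pass of SHA-256 over the salted password.
fast_seconds = time_call(lambda: hashlib.sha256(salt + password).digest())

# Slow: PBKDF2 repeats the underlying HMAC-SHA256 step 200,000 times.
slow_seconds = time_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
)

print(f"fast hash: {fast_seconds:.6f} s per guess")
print(f"slow hash: {slow_seconds:.6f} s per guess")
```

One login pays the slow cost once. An attacker trying millions of guesses pays it on every single guess, which is where the extra time and computing resources come from.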