How To Securely Store User Passwords
When plain text is obviously not ideal.
At some point, you might need to store user passwords. If you're anything like me, you know that you shouldn't store them as plain text, but you might not know how to store them properly.
How can you safely store passwords?
To store passwords safely, we want to store them in such a way that no one, not even ourselves, can read them.
Furthermore, if the database is stolen, it should be infeasible to crack the passwords in any reasonable amount of time.
Below, we go through different storage schemes starting from something obviously bad in order to motivate the various security measures.
However, if you just want the answer, it’s storing an iterative hash of the password seasoned with salt.
Storing passwords as plain text is obviously a bad idea as it accomplishes none of our goals. Anyone with access to the database can just directly read the passwords.
Encrypt the passwords
This is better than plain text as you can’t immediately read the passwords if you get access to the database. However, it doesn’t prevent employees with access to the encryption key from reading the passwords.
It’s also not at all useful if an attacker gets access to the encryption key along with the database.
Hash the passwords
A better way to store the passwords is to not store them at all.
For this, you can use a cryptographic hash function such as one from the SHA family. These are one-way functions, meaning you can’t recover the password from the hash.
There are other properties that make cryptographic hash functions special compared to ordinary hash functions. For a more complete overview, I encourage you to check out the Wikipedia page.
However, an attacker can still brute-force the passwords by hashing guesses until one matches a stored hash. This is problematic with fast hash functions such as SHA-1, where an attacker with a reasonably fast computer might be able to compute 10 billion hashes per second.
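To get a feel for what 10 billion guesses per second means, here is a rough back-of-the-envelope calculation. Only the guess rate comes from the text above; the alphabet (lowercase letters plus digits) and the password lengths are assumptions picked for illustration.

```python
# Rough worst-case time to try every password of a given length.
# The guess rate is the figure quoted above; the alphabet is an assumption.
GUESSES_PER_SECOND = 10_000_000_000  # 10 billion
ALPHABET_SIZE = 26 + 10              # lowercase letters + digits (assumed)


def seconds_to_exhaust(length, alphabet_size=ALPHABET_SIZE, rate=GUESSES_PER_SECOND):
    """Seconds needed to try every password of exactly `length` characters."""
    return alphabet_size ** length / rate


for length in (6, 8, 10):
    secs = seconds_to_exhaust(length)
    print(f"length {length}: {secs:,.1f} s (about {secs / 3600:,.2f} hours)")
```

Under these assumptions, every 8-character password can be tried in under five minutes, which is why a single fast hash is not enough on its own.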
To avoid this, you can use iterative hashing where you apply the hash function to the output of the hash function a number of times to make the process slower.
The idea is to make it fast enough so that it doesn’t cause a noticeable latency for a legitimate user, but is too slow for an attacker with direct access to your database. To keep up with newer and faster computers, you will want to increase the number of iterations every few years.
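One way to pick an iteration count is simply to measure it on your own hardware. The sketch below is only an illustration of that idea: the function names, the test password, and the candidate iteration counts are all made up, and it covers nothing beyond the iteration itself.

```python
import hashlib
import time


def iterated_sha3(password: bytes, iterations: int) -> bytes:
    """Hash once, then re-hash the digest together with the password `iterations` times."""
    digest = hashlib.sha3_512(password).digest()
    for _ in range(iterations):
        digest = hashlib.sha3_512(digest + password).digest()
    return digest


def time_iterations(iterations: int) -> float:
    """Return the wall-clock seconds one iterated hash takes on this machine."""
    start = time.perf_counter()
    iterated_sha3(b"correct horse", iterations)
    return time.perf_counter() - start


# Candidate counts are arbitrary; pick the largest one your users won't notice.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} iterations: {time_iterations(n) * 1000:.1f} ms")
```

The timings depend entirely on the machine, so no expected output is shown. The cost scales roughly linearly with the iteration count, for the attacker as well as for you.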
However, this approach isn’t robust against rainbow tables found on the internet where hashes of common passwords are precomputed.
This approach is also weak against frequency analysis attacks where an attacker uses the number of times the same password occurs to figure out what the password is.
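The frequency analysis problem is easy to see with a toy example. The users and passwords below are made up; the point is only that identical passwords produce identical unsalted hashes, so the most frequent hash points straight at the users with the most popular password.

```python
import hashlib
from collections import Counter

# Hypothetical user table; in a real leak the attacker only sees the hashes.
passwords = {"alice": "123456", "bob": "hunter2", "carol": "123456", "dave": "123456"}
leaked = {
    user: hashlib.sha3_512(pw.encode("utf-8")).hexdigest()
    for user, pw in passwords.items()
}

# Count how often each hash occurs, without knowing any password.
counts = Counter(leaked.values())
most_common_hash, occurrences = counts.most_common(1)[0]
likely_weak_users = sorted(u for u, h in leaked.items() if h == most_common_hash)

print(occurrences, likely_weak_users)  # 3 ['alice', 'carol', 'dave']
```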
Season with salt
By adding a few random bytes to the password before hashing, you make it impossible to just precompute hash values for common passwords, and since the same passwords will no longer have the same hashes, you eliminate the possibility of frequency analysis attacks.
The salt can be stored as plain text in the database, and must be different for every entry.
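As a small illustration of the effect of a per-entry salt, the sketch below hashes the same password twice with two random salts. The `salted_hash` helper and the 16-byte salt length are assumptions for this example, and it deliberately uses a single hash round to keep the focus on the salt.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: bytes = None):
    """Return (hex digest, salt); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)  # 16 random bytes (assumed length)
    digest = hashlib.sha3_512(password.encode("utf-8") + salt).hexdigest()
    return digest, salt


hash_a, salt_a = salted_hash("123456")
hash_b, salt_b = salted_hash("123456")

# Same password, different salts: the stored hashes no longer match each other.
print(hash_a == hash_b)
# Re-using the stored salt reproduces the stored hash, which is how login checks work.
print(salted_hash("123456", salt_a)[0] == hash_a)
```

Because the salt is needed to recompute the hash at login, it has to be stored next to the hash, which is why keeping it in plain text is fine.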
Now, this won’t help against users who have chosen a poor password, but if implemented properly, it will make it pretty much impossible to crack strong passwords in any reasonable amount of time.
Below is a sample Python implementation of a password interface.
import hashlib
import random
import string


def hash_password(password, salt=None, iterations=100000):
    """Return (hash, salt) for password, generating a salt if none is given."""
    if not isinstance(password, str):
        raise TypeError("Password should be a string")

    # If no salt is given, generate a 16-character alphanumeric salt
    # using the system's random source.
    if not salt:
        alphabet = string.ascii_uppercase + string.digits
        rng = random.SystemRandom()
        salt = ''.join(rng.choice(alphabet) for _ in range(16))

    # Encode to bytes to make the inputs compatible with hashlib.
    encoded_password = bytes(password, encoding='utf-8')
    encoded_salt = bytes(salt, encoding='utf-8')

    pass_hash = hashlib.sha3_512(encoded_password + encoded_salt).hexdigest()

    # Iterative hashing: re-hash the previous digest together with the password.
    for _ in range(iterations):
        pass_hash = hashlib.sha3_512(
            bytes(pass_hash, encoding='utf-8') + encoded_password).hexdigest()

    return (pass_hash, salt)


def validate_password(password, pw_hash_tuple):
    """Check password against a stored (hash, salt) tuple."""
    if len(pw_hash_tuple) != 2:
        raise ValueError("pw_hash_tuple should have length 2")
    if not isinstance(password, str):
        raise TypeError("password should be a string")
    for item in pw_hash_tuple:
        if not isinstance(item, str):
            raise TypeError("items in pw_hash_tuple should be strings")

    stored_pw_hash, stored_pw_salt = pw_hash_tuple

    # Compute the hash of the guessed password using the same salt.
    user_pw_hash, _ = hash_password(password, salt=stored_pw_salt)

    # Compare the two hashes.
    return user_pw_hash == stored_pw_hash


pw_hash_tuple = hash_password("123456")
guess_password = "123456"
print(validate_password(guess_password, pw_hash_tuple))  # True
Edit: Thank you Scott Arciszewski for pointing out that when using iterative hashing, we need to append the password at each step to avoid collapsing the search space. He also suggested using bcrypt as demonstrated in his article.
Here we use the SHA-3 algorithm, which is still considered strong as of the end of 2017. If you're reading this a year or two in the future, you should do your own research to find an acceptable number of iterations and which algorithms are considered strong. The Wikipedia page might be helpful.
You can also use the following command to see which algorithms hashlib supports.
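The original command is not reproduced here. One way to check, using two attributes that hashlib does provide, is:

```python
import hashlib

# Algorithms guaranteed to be present on every platform.
print(sorted(hashlib.algorithms_guaranteed))

# Algorithms available in this particular interpreter; the list can vary
# between installations.
print(sorted(hashlib.algorithms_available))
```

The output depends on your Python version and build, so it is not shown.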
Unless you're doing it as a learning experience, it's usually not recommended to implement a hash function yourself, much less try to invent one. Instead, it's recommended to use a library implementation of a hash function that is still considered strong.
In the implementation above, I use Python's built-in hashlib, which implements many common hash functions.
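As one example of leaning on a library rather than hand-rolling the loop, hashlib also ships `pbkdf2_hmac`, a standard salted and iterated key derivation function. The sketch below is not the implementation used in this article. The `derive` and `verify` names, the 16-byte salt, and the iteration count (mirroring the sample above) are assumptions, and `hmac.compare_digest` is used here to compare the results in constant time.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed value, mirrors the sample above; tune for your hardware


def derive(password: str, salt: bytes = None, iterations: int = ITERATIONS):
    """Return (derived key, salt) using PBKDF2-HMAC-SHA512."""
    if salt is None:
        salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, iterations)
    return key, salt


def verify(password: str, stored_key: bytes, salt: bytes, iterations: int = ITERATIONS) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate, _ = derive(password, salt, iterations)
    return hmac.compare_digest(candidate, stored_key)


key, salt = derive("123456")
print(verify("123456", key, salt))   # True
print(verify("letmein", key, salt))  # False
```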
You can find the code on github.