June 25, 2014

Storing passwords is nowadays one of the most basic things any website needs to do. Well, you don’t strictly need to do it, as there are a lot of third-party login providers, but sadly it’s the de facto authentication method. That would be fine if most websites and applications didn’t do it wrong. Very, very wrong.

This post aims to serve as a basic guide to, and explanation of, what to do when faced with this. I’m no security expert, but I’ve done my homework over the years, and I know the basics, which should be enough to at least do things competently.

To be clear, this is not intended as an exhaustive list; it’s just the basics of passwords. When I see any of this not being applied, I worry. Any tech-savvy person does.

## Don’t place limits on passwords

Except maybe a minimum length of 6-8 characters. Apart from that, let me screw up. Don’t make me add uppercase letters, numbers, etc. to a password over 30 letters long. That won’t make it stronger, but harder to remember. There are certain websites that add a maximum length to the password, usually too short. NEVER do that. That is effectively forcing me to use a weak password. Length is better than character diversity. If your concern is about database storage, keep reading.

If you really need to, use some kind of quality test. Quality tests are another concern, as they are sometimes coded to favor character diversity rather than length.

There is however a valid concern here: character encoding. I don’t really know much about how browsers handle encoding in input fields, but I can see how it could be a problem if someone used characters outside extended ASCII. In particular, Unicode allows some redundancy: certain characters can be represented by different sequences of code points. The glyph é, for example, can be written as an e followed by a combining acute accent (U+0301), but also as the single precomposed character é (U+00E9). That is not a problem for the English-speaking world, but it might be for pretty much every other language. I really don’t know if other writing systems have the same problem, but it is a concern because the same password might be encoded using different code points and therefore won’t match if it is entered differently from one time to the next.
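One common way to deal with this redundancy is to normalize the text before hashing it. The following is only a minimal sketch, assuming Python and its standard `unicodedata` module; the helper name `normalize_password` and the choice of the NFC form are assumptions made for illustration, not something this post prescribes.

```python
import unicodedata


def normalize_password(password: str) -> bytes:
    """Return a canonical byte form of the password (NFC, then UTF-8).

    Hypothetical helper: NFC is one reasonable choice of normalization form.
    """
    return unicodedata.normalize("NFC", password).encode("utf-8")


precomposed = "caf\u00e9"   # é as a single code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

# The raw strings differ even though they look identical on screen...
print(precomposed == decomposed)
# ...but they produce the same bytes once normalized.
print(normalize_password(precomposed) == normalize_password(decomposed))
```

Whatever form is chosen, it has to be applied both when the password is first stored and every time it is checked, otherwise the two sides will not match.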

Summarizing: Don’t place restrictions other than minimum length and potentially minimum quality. In the latter case, disallow very bad passwords but don’t be very restrictive. Limit passwords to extended-ASCII only if you think using Unicode might be a problem, but don’t be too eager to do this if you can avoid it.

Never ever for any reason store a password in any way that allows it to be recovered directly (that is, other than by guessing). This includes storing a password in plain text or using some kind of encryption. If it can be reversed, it will be reversed. The attacker will somehow get the encryption key and screw you up.

Instead, use one-way algorithms. That is, use hash functions. You don’t need the password for anything other than checking that it is correct, and you don’t need to recover the original password for that. You can hash both the stored and the incoming password and check whether the results are the same.

Note that this justifies a point made before. A hash function always outputs a result of a fixed length. It doesn’t matter if the input is empty or two million bytes. This means that your database doesn’t care about password size: you always store the same number of bytes. Adding a (large) limit to a password field can still be OK, to avoid people entering The Odyssey as their password, or to avoid denial-of-service attacks based on password length.
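As a quick illustration of the fixed-length property and of checking by comparing hashes, here is a small sketch in Python. SHA-256 from the standard `hashlib` module is used purely as a stand-in to show the idea; it is an assumption for the example, not a recommendation for password storage.

```python
import hashlib
import hmac

# The digest size does not depend on the input size.
empty_digest = hashlib.sha256(b"").digest()
huge_digest = hashlib.sha256(b"a" * 2_000_000).digest()
print(len(empty_digest), len(huge_digest))


def same_hash(candidate: bytes, stored_digest: bytes) -> bool:
    """Hash the candidate and compare it with the stored digest.

    hmac.compare_digest compares in constant time, which avoids leaking
    how many leading bytes matched.
    """
    return hmac.compare_digest(hashlib.sha256(candidate).digest(), stored_digest)


stored = hashlib.sha256(b"my password").digest()
print(same_hash(b"my password", stored))
print(same_hash(b"not my password", stored))
```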

## Use individual, random and unique salts

A security salt is a random string used to give password hashes diversity. It is added (concatenated) to the user password, changing the input to the hash function. The idea is that two passwords that are the same will not be hashed in the same way. Consider the following examples:

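| Password | Salt | Hash |
|----------|-------|------------------------------------------|
| password | none | c8fed00eb2e87f1cee8e90ebbe870c190ac3848c |
| password | 1b30c | 5271db38d75fcdfc24078af6a301136630a9e2ba |
| password | 41af5 | bed0576de70e21cbdf73451728394e2639842ae0 |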

The same password, three different salts. Well, two salts and no salt.

If you don’t use a salt, or always use the same salt, every user that has the same password will have the same hash stored in your database. That’s the greatest gift you can give an attacker: they brute-force one password and suddenly they have brute-forced a hundred.

Of course, nothing prevents an attacker from trying already cracked passwords first, but reusing a salt would give clues about which ones are repeated, and that’s bad.

You might have noticed that the salt is random. Well, in that case it was me pressing keys as I saw fit, but a real salt must be properly random. It must also be longer: at least as long as the hash output, maybe more. As a side effect of this randomness, you must store the salt along with the hash, so that incoming passwords can be hashed with the same salt and compared. This is OK: salts are not meant to be secret.
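The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `salted_hash` is hypothetical, SHA-256 is a stand-in hash, and the 32-byte salt is chosen to match its output length. Simple concatenation with a fast hash is shown to explain salting, not as a complete storage scheme.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: bytes) -> str:
    """Concatenate salt and password, then hash (illustration only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Each stored password gets its own salt from a cryptographic RNG.
salt_a = secrets.token_bytes(32)
salt_b = secrets.token_bytes(32)

hash_a = salted_hash("password", salt_a)
hash_b = salted_hash("password", salt_b)

# Same password, different salts: the stored hashes do not match each other.
print(hash_a == hash_b)
# Checking a login re-uses the salt stored next to the hash.
print(salted_hash("password", salt_a) == hash_a)
```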

As an extra: Remember to create a new salt for every password you store. Even when users change their passwords. Even if the new one is the same, just generate a new salt. However, don’t go too far! Changing the salt without changing the password doesn’t improve security.

## Use a hashing function designed for passwords

I must admit that the last two points were superfluous: they can just be ignored by following this one. However, they provide the why behind it, and ensure that if this point cannot be followed, you can still do it right.

The hash functions typically chosen for password storage are cryptographic hash functions. MD5, SHA-1, SHA-2, etc. are cryptographic hash functions. Cryptographic hash functions are designed to be secure and fast, but fast is something we don’t want. We want password hashing to be slow, so an attacker has to spend more time to brute-force a password.

But how slow? If the password takes about 300ms to be hashed or checked on your server, I would consider it perfect. 300ms is not really noticeable for the user, and it only happens on sign up or login. An attacker with very good hardware might get that down to 1ms per guess. That is A LOT of time per guess: 1000 guesses per second is a really low rate. Current hardware can make millions of guesses per second on cryptographic hash functions like SHA-1.
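To put those rates in perspective, here is a rough back-of-the-envelope calculation in Python. The password space (exactly 8 lowercase letters) and the figure of one million guesses per second for a fast hash are assumptions picked for the example; real attacks use dictionaries and smarter strategies, so treat the numbers as an order-of-magnitude sketch.

```python
SECONDS_PER_DAY = 86_400

# Assumed search space: every password made of exactly 8 lowercase letters.
candidates = 26 ** 8


def days_to_try_all(total: int, guesses_per_second: int) -> float:
    """Days needed to try every candidate at a constant guess rate."""
    return total / guesses_per_second / SECONDS_PER_DAY


slow_days = days_to_try_all(candidates, 1_000)       # 1 ms per guess
fast_days = days_to_try_all(candidates, 1_000_000)   # assumed fast-hash rate

print(f"slow hash: about {slow_days:.0f} days")
print(f"fast hash: about {fast_days:.1f} days")
```

With these assumptions the slow hash turns a job of a couple of days into one of several years.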

If you don’t have a password-specific hashing function at hand, you can apply the same algorithm a number of times to slow down the hashing (see footnote 1). You can also store the number of iterations with the password, so you can increase it at some point in time and still have old passwords work unaffected.
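A standardized form of this "apply it many times" idea is PBKDF2, which Python exposes as `hashlib.pbkdf2_hmac`. The sketch below uses it instead of a hand-written loop; the `iterations$salt$hash` storage format, the helper names and the default iteration count are assumptions made for illustration only.

```python
import hashlib
import hmac
import secrets


def hash_password(password: str, iterations: int = 200_000) -> str:
    """Return 'iterations$salt$hash' (hypothetical storage format)."""
    salt = secrets.token_bytes(32)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Re-hash with the salt and iteration count stored with the hash."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


# An old record keeps verifying after the default count is raised,
# because its own iteration count travels with it.
old_record = hash_password("hunter2", iterations=1_000)
new_record = hash_password("hunter2", iterations=2_000)
print(verify_password("hunter2", old_record), verify_password("hunter2", new_record))
```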

Password hash functions do everything for us: they use a salt, they apply an algorithm a configurable number of times, and everything is stored with the resulting hash. bcrypt, scrypt and similar are good choices, and are a better option than manually performing all the steps. They are also well-known, so you don’t have to document them if you migrate your codebase.
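As a small sketch of what calling such a function looks like, Python’s standard library exposes scrypt as `hashlib.scrypt` (on builds linked against a recent enough OpenSSL). The cost parameters below are assumed values for the example, not tuned recommendations, and unlike typical bcrypt libraries this low-level call does not bundle the salt and parameters into one string, so they would need to be stored alongside the result.

```python
import hashlib
import hmac
import secrets

# Assumed cost parameters: n is the CPU/memory cost, r the block size,
# p the parallelization factor. Tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def scrypt_hash(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte hash from the password with scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS
    )


salt = secrets.token_bytes(16)
stored = scrypt_hash("correct horse battery staple", salt)

# Verification repeats the derivation with the stored salt and parameters.
attempt = scrypt_hash("correct horse battery staple", salt)
print(hmac.compare_digest(attempt, stored))
```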

Never let a plaintext version of a password leave your server for any reason. The only places where a password should be in plaintext are in memory, for as short a time as possible (using some secure-memory storage if you are super-paranoid and your language supports it), and in the input field on the user’s browser (where it’s even hidden from sight!). This does not include the HTTP request at all, so be sure to use HTTPS!

Emailing a password is a very bad sign. There are three different situations in which you might be tempted to send the password to the user’s inbox:

• Upon registering: You have the password right there, so you send it to the user in order for them to remember it. Wrong! The password is now present in the mail server, the user’s inbox, a few hundred internet routers, the user’s browser cache…

• When the user requests a password reset: So the user does not remember their password! Let’s generate a new one and send it to them…

Wrong! The password is now present in the same places as in the previous point, plus the user won’t remember it.

Instead, send a link. The link contains a one-time code that allows a password change. This is essentially the same as forcing the user to change the password after generating one for them, but a bit faster and less bothersome.

• When the user presses “password forgotten”: The user doesn’t remember their password, so we send it to them!

Wrong! VERY WRONG! I’ve saved this for the end because by now you should realize that you can’t do it. You shouldn’t be able to do it. If you can, please, start reading this article again!

You can’t do this because you don’t have the password! You have no way of recovering the password, so your only possible action is to let the user set a new one.

Everything you just read also applies to showing the password in some webpage. Don’t generate a password then show it, don’t show a user his password after signing up, and of course don’t show him his password later, because if you do it means you can do it and you haven’t learned anything today.

The correct line of action when the user doesn’t remember her password is to send her a link to a previously verified e-mail address that allows a password change. The link should be usable only once and for a very short timespan (15 minutes, maybe as much as a day, but that’s it).
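A reset link of this kind could be sketched as follows in Python. Everything here is an assumption for illustration: the in-memory `_pending_resets` dictionary stands in for a database table, the function names are hypothetical, the 15-minute lifetime is just the lower bound mentioned above, and storing only a hash of the token is an extra precaution not discussed in this post.

```python
import hashlib
import hmac
import secrets
import time

TOKEN_LIFETIME_SECONDS = 15 * 60

# Hypothetical stand-in for a database table: user_id -> (token_hash, expires_at)
_pending_resets = {}


def create_reset_token(user_id, now=None):
    """Create a random one-time token to embed in the emailed link."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(token.encode("utf-8")).digest()
    _pending_resets[user_id] = (token_hash, now + TOKEN_LIFETIME_SECONDS)
    return token


def redeem_reset_token(user_id, token, now=None):
    """Return True once for a valid, unexpired token; False otherwise."""
    now = time.time() if now is None else now
    entry = _pending_resets.get(user_id)
    if entry is None:
        return False
    token_hash, expires_at = entry
    if now > expires_at:
        del _pending_resets[user_id]  # expired tokens are discarded
        return False
    candidate = hashlib.sha256(token.encode("utf-8")).digest()
    if not hmac.compare_digest(candidate, token_hash):
        return False
    del _pending_resets[user_id]  # single use: consumed on success
    return True
```

Only after `redeem_reset_token` returns `True` would the site show the form where the user types a new password, which is then hashed and stored like any other.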

## Conclusion

These were the most fundamental things I could remember that I see not being followed time after time. For a few months now, I’ve been reluctant to sign up anywhere using a password, and I greatly favor Google, GitHub or Twitter logins. They allow me to log in without thinking, I don’t have yet another place where my password can be stolen, and I can revoke access at any time if the site is compromised. The reasons are, among others, that most sites do NOT follow these practices and are therefore insecure. And because I just can’t know whether a given site does, I simply refuse to give it a password unless it is absolutely necessary.

1. Never use a sleep function to slow down hashing! That would slow it down in your server, but not on the attacker’s machine!