How do websites store passwords long term?

So how do websites keep passwords long term?
I mean really important websites, say a government or a big ecommerce or social networking website.
Sure, they store a hash (or salted hash) of the password in the webserver-connected datastore that is used for authentication, but is that it?
NOTE: I am not asking about hashing or salting. I'm asking where they store the metadata (e.g., the hash or salted hash) such that it's always available.
In fact, how do websites like Facebook store passwords? I'm guessing they would have multiple copies of the hash spread out over the world? And backed up to tape once in a while?

You only need the hash of the user's password for most applications. Usually, the actual password isn't stored, for security reasons. If the datastore is compromised, you wouldn't want the hacker to be able to gain the actual passwords for the users.
That's usually why, actually, the hashes are salted in the first place. Salting makes it much harder to use a rainbow table (a precomputed table of all possible combos for passwords going through a certain type of hash) to regain the original password, which the user may be using on other sites.
This was answered in more depth here: Best way to store password in database

The question is too broad to answer. It isn't just relevant for government web pages; storing clear-text passwords would be a real security issue on any site. Depending on security needs, password hashes are used in most cases. If users need a certificate (e.g. stored on a card, or obtained using another process), the server might store the user's public key instead of a hash.
Your question also touches on completely different topics. A web backend database certainly needs backups (not only for passwords), and there are several load balancing techniques, some of which also take geolocation into account.

Related

Should API Secrets Be Hashed?

It might sound like a silly question, because passwords of course need to be hashed and the original should never be stored.
However, for API secrets, I generally see them displayed in the clear when signing up for them.
For example, if I go to the Google API console and look at my credentials page, I can view my client secret. The same goes for Twitter.
Surely API keys are just as sensitive as passwords?
Is it just because from the provider side, you can be confident that a sufficiently strong password is being generated?
If that's the case, then that doesn't provide any protection if your database is compromised.
Or is it perhaps because if you are using token based authentication, you're either doing password grant type, which requires you to send your credentials along with the client id and secret, or a refresh token, so a user would have already had to have been compromised?
I can imagine a few possible answers to this:
In some cases, it may be required for the server to have persistent storage of the plaintext API key in order to satisfy usability requirements (Google and Twitter being examples).
In some cases, the API key alone is not enough to do much at all -- additionally one needs to have an authenticated account -- and therefore the API key by itself is of limited value (hence less value than a password).
In a number of cases, the API key is hardcoded in a client application (especially mobile applications, which almost always do this) and therefore it does not make sense to add the extra protection on the server side when the same token can be trivially extracted from the client.
The security industry is just not that mature yet. Maybe once hackers start dumping API keys, ideas like this may be taken more seriously.
BTW, I am very serious about the last point. The truth is that a lot of good ideas don't become a reality until there is a critical mass of support behind them. As an example, I once blogged about a related topic -- protecting user confidential information by hashing it in the database but in a way that it could be recovered when the legitimate user logs in. I used Ashley Madison as an example -- in that case, the hackers were more after email addresses, phone numbers, and physical addresses than passwords. So when the hackers snatched the database, they immediately had what they wanted, and they couldn't care less about the bcrypt encoded passwords (in fact, some older passwords were encoded with only MD5!). Unfortunately, concepts like this do not have enough of a push to make them a reality. Even zero-knowledge web designs are very few in the real world.
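If a provider did decide to hash API secrets, one way it might look is sketched below. This is only an illustration: the function names are hypothetical, it assumes the secret is shown to the client once at creation time (unlike the consoles described in the question), and it assumes a single SHA-256 pass is acceptable because the secret is long and randomly generated rather than chosen by a person. None of that is established by the answers above.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_api_secret() -> Tuple[str, str]:
    """Return (secret shown once to the client, digest kept by the server)."""
    secret = secrets.token_urlsafe(32)  # 32 random bytes rendered as URL-safe text
    digest = hashlib.sha256(secret.encode("ascii")).hexdigest()
    return secret, digest


def check_api_secret(presented: str, stored_digest: str) -> bool:
    """Hash the presented secret and compare the digests in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate.encode("ascii"), stored_digest.encode("ascii"))


secret, stored_digest = issue_api_secret()
print(check_api_secret(secret, stored_digest))            # True
print(check_api_secret("not-the-secret", stored_digest))  # False
```

The trade-off is the usability point from the first answer: once only the digest is stored, the console can no longer display the secret again.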

Now that I know how to salt & hash passwords, a few more questions

So, let's assume I have read every article/post about appropriately salting and hashing passwords in order to secure user credentials.
This means I am not wondering what hashing algorithm to use (SHA1 vs. SHA2 vs. PBKDF2), how to generate the salt, how to store the salt, how to append the salt, or whether I should be writing the code myself vs. leveraging well-established libraries like bcrypt. Please avoid rambling about these issues here, as I have read 50+ other pages of that already.
Just assume the following is my approach (also note I understand this is not flawless or likely sufficient for applications like financial service, I am really just wondering if this is an acceptable min bar to claim that I "do the right thing").
User comes to my amazing website (www.myamazingwebsite.com) and logs in with email and pass.
I pull her salt and hash from my database. Assume the salt is lengthy enough, unique per-user, and created using a CSPRNG upon user registration.
I prepend the salt to her input password, hash it using SHA-512, run 1,000 iterations, then compare it to the hashed value pulled from the db:
var hash = sha512(salt + password);
for (var i = 0; i < 1000; i++) {
    hash = sha512(salt + password + hash);
}
If they match, the user is authenticated. Otherwise, they are not.
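For readers who want to run the scheme described above, here is a minimal Python sketch of it. The function names (`iterated_sha512`, `verify_login`) are hypothetical, the previous digest is assumed to be appended as hex text (the question does not say how `hash` is concatenated), and the constant-time comparison via `hmac.compare_digest` is an addition that is not part of the question.

```python
import hashlib
import hmac
import secrets


def iterated_sha512(salt: bytes, password: str, rounds: int = 1000) -> str:
    """Mirror the loop from the question: one initial hash, then `rounds` more."""
    pw = password.encode("utf-8")
    digest = hashlib.sha512(salt + pw).hexdigest()
    for _ in range(rounds):
        # Assumption: the previous digest is fed back in as hex text.
        digest = hashlib.sha512(salt + pw + digest.encode("ascii")).hexdigest()
    return digest


def verify_login(salt: bytes, stored_hash: str, attempt: str) -> bool:
    """Recompute the hash for the attempt and compare in constant time."""
    return hmac.compare_digest(iterated_sha512(salt, attempt), stored_hash)


# Registration: per-user salt from a CSPRNG, stored next to the hash.
salt = secrets.token_bytes(16)
stored = iterated_sha512(salt, "correct horse battery staple")

print(verify_login(salt, stored, "correct horse battery staple"))  # True
print(verify_login(salt, stored, "wrong guess"))                   # False
```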
Now, my question is how secure is my above approach. The questions I would like help answering:
Do I need to change the salt periodically? For example, perhaps I could re-compute and store a new hash using a newly created random salt after every successful login. This seems like it would be more secure but I am not sure what standard practice is here.
The request to the server will be done via https. Does that mean I can assume that I can process all of the hashing and validation logic server side? Would most folks consider this sufficient, or do I need to consider some hybrid both on client and server side?
Anything else I am overlooking or need to consider?
Thanks in advance, I appreciate the help.
1) Assuming you've done the right thing and do not store their password, you can't change the salt unless they are logging in. I suppose you could change their salt every time they do log in, but it doesn't really help (and might hurt).
Here's why: Having a unique salt for everyone simply makes it harder for an attacker who has access to your database to guess the passwords. If you've done things correctly, he would have to use a different salt for each person. He can't just start guessing passwords using a site-wide salt and see if it matches anyone. As long as you have a unique salt for each user, you are doing the best you can.
In fact, changing the salt does nothing but give an attacker with access to your database over time MORE information. Now he knows what their password looks like salted two different ways. That could (theoretically) help crack it. For this reason, it would actually be ill advised to change the salt.
2) Https is sufficient. If someone can compromise https, then any additional client-side hashing or the like will not help. The client's computer is compromised.
3) I think you have a fair understanding of best password practices. Don't overlook other security issues like sql-injection and cross-site scripting.
Do I need to change the salt periodically?
No. The salt is a per-user public parameter that serves two purposes. First, it ensures that an attacker cannot build an offline dictionary of passwords to hashes. Second, it ensures two users with the same password have different hashed password entries in the database.
See the Secure Password Storage Cheat Sheet and Secure Password Storage paper by John Steven of OWASP. It takes you through the entire threat model, and explains why things are done in particular ways.
The request to the server will be done via https. Does that mean I can assume that I can process all of the hashing and validation logic server side?
This is standard practice, but it's a bad idea. It's a bad idea because of all the problems with SSL/TLS and PKI in practice. Though this is common, here's how it fails: the SSL/TLS channel is set up with any server that presents a certificate. The web application then puts the {username, password} on the wire in plain text using a basic_auth scheme. Now the bad guy has the username and password.
There are lots of other problems with doing things this way. Peter Gutmann talks about this problem (and more) in his Engineering Security book. He's got a witty sense of humor, so the book is cleverly funny at times, too, even though it's a technical book.
Would most folks consider this sufficient, or do I need to consider some hybrid both on client and server side?
If possible, use TLS-PSK (Preshared Key) or TLS-SRP (Secure Remote Password). Both overcome the problems of basic_auth schemes, both properly bind the channel, and both provide mutual authentication. There are 80 cipher suites available for TLS-PSK and TLS-SRP, so there's no shortage of algorithms.
Anything else I am overlooking or need to consider?
Cracking is not the only threat here. More than likely, the guy trying to break into your organization is going to be using one of the top passwords from the millions of passwords gathered from the Adobe breach, the LinkedIn breach, the Last.fm breach, the <favorite here> breach.... For example:
25 most-used passwords revealed: Is yours one of them?
The 30 Most Popular Passwords Stolen From LinkedIn
Top 100 Adobe Passwords with Count
Why bother brute forcing when you have a list of thousands of top rated passwords to use?
So your FIRST best defense is to use a word list that filters a user's bad password choices. That is, don't allow users to pick weak or known passwords in the first place.
If someone gets away with your password database, then he or she is going to use those same password lists to try and guess your users' passwords. He or she is probably not even going to bother brute forcing because he or she will have recovered so many passwords using a password list.
As I understand it, these word lists are quite small when implemented as a Bloom Filter. They are only KB in size even though there are millions of passwords. See Peter Gutmann's Engineering Security for an in depth discussion.
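To make the Bloom filter idea concrete, here is a small self-contained sketch. It is not taken from Gutmann's book; the class, the sizing parameters, and the four-entry deny list are all hypothetical stand-ins chosen so the example stays short, and a real deployment would load its list from a breach corpus and size the filter accordingly.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash positions per item."""

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive the positions by hashing the item with a counter prefix.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical deny list; a real one would hold far more entries.
common_passwords = ["123456", "password", "qwerty", "letmein"]

deny_list = BloomFilter(size_bits=8192, num_hashes=5)
for pw in common_passwords:
    deny_list.add(pw)


def is_allowed(candidate: str) -> bool:
    """Reject anything the filter reports as (probably) on the deny list."""
    return candidate not in deny_list


print(is_allowed("password"))  # False: on the deny list
```

A Bloom filter never misses an entry that was added, but it can occasionally report a password as listed when it is not, so a rare good password may be rejected. For a deny list that is usually an acceptable trade.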

Salting the Password with the Password

I am developing my first web app that requires a login, and it has come to the point where I must decide how to store the passwords. I have been doing a lot of reading on the proper way to hash the password and add a salt. It occurred to me that most of the recommended ways rely on some piece of information that is stored in the database with the password hash, be it all or part of the username used as a salt, or some other random value.
Instead I was thinking of using the user's own password as a salt on the password: using an algorithm to jumble the password and adding it to itself in some way as the salt. Of course this too would be compromised if an attacker got access to both the stored hashes and the source code of the algorithm, but any salt would be compromised in such a situation. My application really probably does not need this level of security, but it was just something that I started to think about when reading.
I just wanted to get some feedback from some more experienced developers. Any feedback is appreciated.
If you derive the salt from the password itself, you will lose the whole benefit of salting. You can then build a single rainbow table to get all passwords, and equal passwords will result in equal hash values.
The main reason to use a salt is that an attacker then cannot build one single rainbow table and get all the passwords stored in your database. That's why you should add a random, unique salt for each password: an attacker would then have to build a rainbow table for each password separately. Building a rainbow table for a single password makes no sense, because brute forcing is faster (you can simply stop once the password is found).
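The difference can be seen in a few lines. In this sketch the "jumble" is simply the password reversed, which is a hypothetical stand-in for whatever deterministic scheme the question has in mind, and a single SHA-256 pass is used only to keep the example short, not as a recommendation.

```python
import hashlib
import secrets


def hash_with_derived_salt(password: str) -> str:
    """Salt derived from the password itself (here: the password reversed)."""
    salt = password[::-1]  # any deterministic derivation behaves the same way
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


def hash_with_random_salt(password: str) -> tuple:
    """Fresh random salt per call, returned with the hash so it can be stored."""
    salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


# Two users who happen to choose the same password.
alice = hash_with_derived_salt("hunter2")
bob = hash_with_derived_salt("hunter2")
print(alice == bob)  # True: identical entries, one precomputed table covers both

_, alice_r = hash_with_random_salt("hunter2")
_, bob_r = hash_with_random_salt("hunter2")
print(alice_r == bob_r)  # False: each entry has to be attacked separately
```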
Don't be afraid to do it right. Programming environments often have support for creating safe hashes and will handle salting for you (e.g. password_hash() for PHP). The salt is often combined with the hash for storage, which makes it easy to keep both in a single database field.
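As an illustration of keeping the salt and hash in a single field, here is a sketch in Python. The record layout (`algorithm$iterations$salt$hash`) is a made-up format for this example and is not the format PHP's password_hash() produces; PBKDF2 is used only because it ships with the Python standard library, and the iteration count is an assumed value.

```python
import base64
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor for this sketch


def make_record(password: str) -> str:
    """Return 'algorithm$iterations$salt$hash' as one storable string."""
    salt = secrets.token_bytes(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return "$".join([
        "pbkdf2_sha256",
        str(ITERATIONS),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(dk).decode("ascii"),
    ])


def check_record(password: str, record: str) -> bool:
    """Parse the salt and parameters back out of the record and re-derive."""
    algorithm, iterations, salt_b64, hash_b64 = record.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(hash_b64)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, int(iterations))
    return hmac.compare_digest(dk, expected)


record = make_record("correct horse")
print(check_record("correct horse", record))  # True
print(check_record("wrong horse", record))    # False
```

Because the parameters travel with the hash, the work factor can be raised later without breaking older records.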
I wrote a small tutorial about securely storing passwords, maybe you want to have a look at it.
Simply duplicating the password may still be vulnerable to dictionary attacks, e.g. the password "hello" becomes "hellohello", and thus might be part of a dictionary.
Using a scrambled password as the salt enables the attacker to take a dictionary and generate a rainbow table for all entries by adding the scrambled password to every entry.
Why change a proven algorithm which can be understood by any developer? Just do it the default way and your code will be maintainable by anyone else.
"My application really probably does not need this level of security" - until the moment it is hacked. Use a salt, it takes almost no additional effort. Do it now.
"eliminate the need of storing the password salt at all": the salt can be very small (6 bytes). It will hardly affect performance.
I just wanted to get some feedback from some more experienced developers. Any feedback is appreciated.
John Steven of OWASP performed an analysis, including threat models, of password storage systems. It explains the components and their purpose, like the hash, the iteration count, the salt, the HMACs, the HSMs, etc. See the Secure Password Storage Cheat Sheet and Secure Password Storage paper.
Cracking is not the only threat here. More than likely, the guy trying to break into your organization is going to be using one of the top passwords from the millions of passwords gathered from the Adobe breach, the LinkedIn breach, the Last.fm breach, the eHarmony breach, the <favorite here> breach.... For example:
25 most-used passwords revealed: Is yours one of them?
The 30 Most Popular Passwords Stolen From LinkedIn
Top 100 Adobe Passwords with Count
Why bother brute forcing when you have a list of thousands of top rated passwords to use?
So your FIRST best defense is to use a word list that filters a user's bad password choices. That is, don't allow users to pick weak or known passwords in the first place.
If someone gets away with your password database, then he or she is going to use those same password lists to try and guess your users' passwords. He or she is probably not even going to bother brute forcing because he or she will have recovered so many passwords using a password list.
As I understand it, these word lists are quite small when implemented as a Bloom Filter. They are only KB in size even though there are millions of passwords. See Peter Gutmann's Engineering Security for an in depth discussion.

The proper way of implementing user login system

I want to make a user login system for the purpose of learning. I have several questions.
I did some research and found that the proper way of implementing a user login system is to store the user name/id and the encrypted/hashed version of the password in the database. When a user logs in, the password is encrypted client side (MD5, SHA-1 etc.) and sent to the server where it is compared with the one in the database. If they match, the user logs in successfully.
This implementation prevents DBAs or programmers seeing the cleartext of the password in the database. It can also prevent hackers intercepting the real password in transit.
Here is where I'm confused:
1) What if the hackers know the hashed/encrypted version of the password (by hacking the database), or DBAs and programmers get the hashed version of the password by simply reading the text in the database? They could then easily make a program that sends this hashed version of the password to the server, allowing them to successfully log in. If they can do that, encrypting the password doesn't seem very useful. I think I am misunderstanding something here.
2) Is this (the way I described above) the most popular way to implement user login functionality? Does it follow current best practices? Do I have to do everything manually or do some databases have the built-in ability to do the same thing? Is there a most common way/method of doing this for a website or a web app? If so, please provide me with details.
3) My former company used CouchDB to store user login info including passwords. They did not do too much with the encryption side of things. They said CouchDB will automatically encrypt the password and store it in the documents. I am not sure if this is a safe way. If so, then it is pretty convenient for programmers because it saves lots of work.
4) Is this way (point 3) secure enough for normal use? Do other database systems such as MySQL have this kind of ability? If so, does it mean that using the MySQL built-in method is secure enough?
I am not looking for a very super secure way of implementing user login functionality. I am rather looking for a way that is popular, easy-to-implement, proper, secure enough for most web applications. Please give me some advice. Details provided will be really appreciated.
When a user login, client side code will encrypt the password by MD5 or SHA-1 or something like that, and then send this encrypted password to server side and then compare it with the one in database. If they are matched, the user log in successfully.
No, no, the client needs to send the unhashed password over. If you hash the password on the client side then that hash is effectively the password. This would nullify the security of the cryptographic hashing. The hashing has to be done on the server side.
To secure the plaintext password in transit it needs to be sent over a secure channel, such as an encrypted TLS (SSL) connection.
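The point that a client-side hash "is effectively the password" can be shown with a toy model. Both login functions below are hypothetical, and the unsalted single SHA-256 pass is there only to keep the demonstration short; it is not a recommended storage scheme.

```python
import hashlib
import hmac

# What the database holds in this toy model: a hash of the user's password.
stored_hash = hashlib.sha256(b"s3cret-passphrase").hexdigest()


def login_client_hashed(received_hash: str) -> bool:
    """Flawed: the server compares whatever the client sends with the stored hash."""
    return hmac.compare_digest(received_hash.encode("utf-8"),
                               stored_hash.encode("ascii"))


def login_server_hashed(received_password: str) -> bool:
    """The server hashes the received password itself before comparing."""
    candidate = hashlib.sha256(received_password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate.encode("ascii"),
                               stored_hash.encode("ascii"))


# An attacker who has read the database knows stored_hash but not the password.
leaked = stored_hash
print(login_client_hashed(leaked))  # True: the leaked hash works as the password
print(login_server_hashed(leaked))  # False: the hash of the hash does not match
print(login_server_hashed("s3cret-passphrase"))  # True: the real password still works
```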
Passwords should be salted with a piece of extra data that is different for each account. Salting inhibits rainbow table attacks by eliminating the direct correlation between plaintext and hash. Salts do not need to be secret, nor do they need to be extremely large. Even 4 random bytes of salt will increase the complexity of a rainbow table attack by a factor of 4 billion.
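The "4 billion" figure is just a count of the possible 4-byte salt values, since a precomputed table is only valid for the one salt it was built with. A quick check of the arithmetic (the variable names are illustrative):

```python
SALT_BYTES = 4
distinct_salts = 256 ** SALT_BYTES  # each byte can take 256 values
print(distinct_salts)  # 4294967296, i.e. roughly 4.3 billion
```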
The industry gold standard right now is Bcrypt. In addition to salting, bcrypt adds further security by designing in a slowdown factor.
Besides incorporating a salt to protect against rainbow table attacks, bcrypt is an adaptive function: over time, the iteration count can be increased to make it slower, so it remains resistant to brute-force search attacks even with increasing computation power.... Cryptotheoretically, this is no stronger than the standard Blowfish key schedule, but the number of rekeying rounds is configurable; this process can therefore be made arbitrarily slow, which helps deter brute-force attacks upon the hash or salt.
A few clarifications:
Don't use MD5. It's considered broken. Use SHA but I'd recommend something a little better than SHA1. - https://en.wikipedia.org/wiki/MD5
You don't mention anything about salting the password. This is essential to protect against Rainbow tables. - https://en.wikipedia.org/wiki/Rainbow_tables
The idea of salting/hashing passwords isn't really to protect your own application. It's because most users have a few passwords that they use for a multitude of sites. Hashing/salting prevents anyone who gains access to your database from learning what these passwords are and using them to log into their banking application or something similar. Once someone gains direct access to the database your application's security has already been fully compromised. - http://nakedsecurity.sophos.com/2013/04/23/users-same-password-most-websites/
Don't use the database's built in security to handle your logins. It's hacky and gives them way more application access than they should have. Use a table.
You don't mention anything about SSL. Even a well designed authentication system is useless if the passwords are sent across the wire in plain text. There are other approaches like Challenge/Response but unfortunately the password still has to be sent in plain text to the server when the user registers or changes their password. SSL is the best way to prevent this.

Multiple Password Hash

I'm currently working on a Web app that requires a high level of security and I've been thinking about the password handling. That I should use a hashed password, with a large enough salt is a given, but would it be a benefit to hash the password multiple times with different salts or different algorithms?
I'm not referring to the fact that you should hash the password multiple times to generate your password hash like Hash(Hash(Hash(salt + psw)))=pswhash, but instead I'm thinking about using Hash(Hash(Hash(salt1 + psw)))=pswhash1 and Hash(Hash(Hash(salt2 + psw)))=pswhash2, and then comparing to both upon login.
Using this routine, an attacker must find not just a password that generates pswhash1, but one that generates both hashes correctly. This way the possibility of a collision is virtually nil, but the attacker can use the second hash to determine whether a password found from the first hash is correct.
Additional information about the application:
The application is primarily an internal application for our company. All connections are handled with https, all usernames are unique for this application (ergo you can't choose your username) and all passwords are unique for this application (randomly generated, and you can't choose them). We are primarily concerned that someone gains unauthorized access to the system before we can react. If we have time to react, the fact that "they" can find the exact password isn't that big a deal.
Use a tried-and-tested technique -- for example, PBKDF2 -- rather than trying to roll your own.
You should hash a password multiple times, enough times to take up a good fraction of a second. This makes it far more expensive for a hacker with access to your database to crack the passwords, because the work needed for every single guess grows in proportion to the number of iterations.
See this question for related information
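A rough sketch of that advice using PBKDF2 from the Python standard library follows. The helper names, the starting iteration count, and the doubling strategy are assumptions made for this example; the demo also calibrates to a much shorter target than "a good fraction of a second" so that it finishes quickly.

```python
import hashlib
import hmac
import secrets
import time


def derive(password: str, salt: bytes, iterations: int) -> bytes:
    """PBKDF2-HMAC-SHA256 from the standard library."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def calibrate(target_seconds: float = 0.25, start: int = 10_000) -> int:
    """Double the iteration count until one derivation takes about target_seconds."""
    iterations = start
    salt = secrets.token_bytes(16)
    while True:
        began = time.perf_counter()
        derive("benchmark-password", salt, iterations)
        if time.perf_counter() - began >= target_seconds:
            return iterations
        iterations *= 2


iterations = calibrate(target_seconds=0.05)  # short target so the demo runs quickly
salt = secrets.token_bytes(16)
stored = derive("example password", salt, iterations)

# Login check: same salt and iteration count, constant-time comparison.
print(hmac.compare_digest(derive("example password", salt, iterations), stored))  # True
```

The chosen iteration count has to be stored alongside the salt and hash, otherwise the check cannot be repeated after the count is raised on faster hardware.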
I think hashing with multiple salts is basically another way of rehashing, and is security through obscurity. Instead of doing that, I would just stick to rehashing.