I'm creating a script to detect weak passwords within a MySQL database. Which method would work the best?
I've been researching a few methods, but can't seem to decide which one would offer the best results with the best performance. I currently have the following methods in mind:
Extract passwords, and perform a dictionary attack on each.
Still extracting passwords, but to a file and use a tool like Hydra.
Perform a regex matching, that hits on basic passwords.
Please note that all passwords in the database are hashed with MD5.
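For what it's worth, the first option (a dictionary check against the stored MD5 hashes) can be sketched in a few lines. This is only a rough illustration: the `stored` dict stands in for rows pulled from the database, and the function name, usernames, and the tiny wordlist are all made up for the example.

```python
import hashlib

def find_weak_passwords(stored_hashes, wordlist):
    """Return {username: plaintext} for every stored MD5 hash that matches a wordlist entry."""
    # Hash each candidate word once, then look the stored hashes up in that table.
    lookup = {hashlib.md5(word.encode("utf-8")).hexdigest(): word for word in wordlist}
    return {
        user: lookup[digest.lower()]
        for user, digest in stored_hashes.items()
        if digest.lower() in lookup
    }

# Hypothetical data: in practice these would come from the users table.
stored = {
    "alice": hashlib.md5(b"password").hexdigest(),
    "bob": hashlib.md5(b"x9!Lq#72fTz").hexdigest(),
}
weak = find_weak_passwords(stored, ["123456", "password", "qwerty"])
print(weak)
```

Hashing the wordlist once and doing dictionary lookups keeps the cost proportional to the wordlist size plus the number of accounts, which only works this neatly because the hashes are unsalted.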
Every now and then I come across applications that force you to change passwords once in a while. Almost universally, they have this strange requirement for the new password: it has to be "significantly" different from your previous password(s).
While at first this sounds logical, next thing I think is: how do they do that? Do they store my passwords in plain text? I would have accepted the answer that they do, if it wasn't for the fact that these are kinds of applications that pretend to care about security so much they force you to change your password if it is expired! Microsoft Exchange is one example of this.
I'm not very good at cryptography and hash functions, so my question is this: Is it possible to enforce this kind of policy without storing passwords in plain text?
Do you know how this policy is implemented in real world applications?
UPDATE: An Example.
I was recently changing my Microsoft Exchange password. I only use Web Access, so it might be different a little -- I have no idea.
So, it forces me to change my password. What I do sometimes is change it to something new and then change it back almost immediately. The freaky part is that it did not even allow me to change it back because of this. I tried changing it a little, by adding a letter in front of it or changing one symbol -- no luck, it was complaining.
With a typical hash, the best you can do is see if the new password is exactly equal to previous ones. You can break the password into multiple hashes in order to get more flexible with comparison, for example 3 hashes:
Alpha characters only
Numeric characters only
All other characters
You could for example require all the hashes to change to be accepted, to prevent users from just changing their password from SecretPassword01 to SecretPassword02.
A cryptographic expert may weigh in here on if this could be made as secure as a single hash.
NOTE that this is not as secure as a single hash, so before you go implementing this, make sure you have really done your research.
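To make the three-hash idea concrete, here is a rough sketch. It assumes a salted SHA-256 per character class (my choice, not something stated above), and the function names are invented for the example. As already noted, splitting the password like this leaks more than a single hash would.

```python
import hashlib
import os

def class_hashes(password, salt):
    """Return one salted SHA-256 digest per character class: letters, digits, everything else."""
    letters = "".join(c for c in password if c.isalpha())
    digits = "".join(c for c in password if c.isdigit())
    others = "".join(c for c in password if not (c.isalpha() or c.isdigit()))
    return tuple(
        hashlib.sha256(salt + label + part.encode("utf-8")).hexdigest()
        for label, part in ((b"L:", letters), (b"D:", digits), (b"O:", others))
    )

def all_parts_changed(old_hashes, new_hashes):
    """Accept the new password only if every per-class hash differs from the old one."""
    return all(old != new for old, new in zip(old_hashes, new_hashes))

salt = os.urandom(16)  # would be stored next to the hashes and reused for the comparison
old = class_hashes("SecretPassword01", salt)
print(all_parts_changed(old, class_hashes("SecretPassword02", salt)))  # False
print(all_parts_changed(old, class_hashes("Other#Phrase77", salt)))    # True
```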
When changing password you're usually asked for the old one to confirm your identity. It's then trivial to compare the old one and the new one to see how much they differ. TBH I don't know how to compare to several previous passwords without storing them, but that's getting into the territory of ridiculous policies anyway.
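A minimal sketch of that comparison, assuming both plaintexts are available only for the duration of the change request. The similarity measure (`difflib.SequenceMatcher`) and the 0.8 threshold are my own picks for illustration, not what Exchange or any real product is known to use.

```python
from difflib import SequenceMatcher

def too_similar(old_password, new_password, threshold=0.8):
    """True when the two plaintext passwords are nearly the same string."""
    ratio = SequenceMatcher(None, old_password.lower(), new_password.lower()).ratio()
    return ratio >= threshold

print(too_similar("SecretPassword01", "SecretPassword02"))
print(too_similar("SecretPassword01", "7hq!Zw#p"))
```

Nothing here needs to be stored: the old password is typed in by the user, checked against its hash, compared, and then discarded.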
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I was reading a few articles on salts and password hashes and a few people were mentioning rainbow attacks. What exactly is a rainbow attack and what are the best methods to prevent it?
The wikipedia article is a bit difficult to understand. In a nutshell, you can think of a Rainbow Table as a large dictionary with pre-calculated hashes and the passwords from which they were calculated.
The difference between Rainbow Tables and other dictionaries is simply in how the entries are stored. The Rainbow Table is optimized for hashes and passwords, and thus achieves great space optimization while still maintaining good look-up speed. But in essence, it's just a dictionary.
When an attacker steals a long list of password hashes from you, he can quickly check if any of them are in the Rainbow Table. For those that are, the Rainbow Table will also contain what string they were hashed from.
Of course, there are just too many hashes to store them all in a Rainbow Table. So if a hash is not in the particular table, the hacker is out of luck. But if your users use simple English words and you have hashed them just once, there is a large possibility that a good Rainbow Table will contain the password.
It's when somebody uses a Rainbow table to crack passwords.
If you are worried about this, you should use Salt. There is also a Stack Overflow question that might help you understand salt a little better than Wikipedia...
This is a useful article on Rainbow Tables for the lay person. (Not suggesting you are a layperson, but it's well written and concise.)
Broadly speaking, you hash a vast number of possible short plaintext strings (i.e. candidate passwords), and store the hashed values alongside the plaintext. This makes it (relatively) straightforward to simply look up the plaintext when you have the hashed value.
This is most useful for weak and/or unsalted password hashes. A popular example is the LAN Manager hash, used by versions of Windows up to XP to store user passwords.
Note that a pre-computed rainbow table for even something as simple as the LM hash takes a lot of CPU time to generate and occupies a fair amount of space (on the order of 10s of gigabytes IIRC).
Rainbow Tables basically allow someone to store a large number of precomputed hashes feasibly.
This makes it easy to crack your hashed passwords, since instead of performing a whole heap of hashing functions, the work has already been done and they virtually just have to do a database lookup.
The best protection against this kind of attack is to use a salt (random characters) in your password, i.e. instead of storing md5(password), store md5(password + salt), or even better md5(salt + md5(password)). Even with rainbow tables, it is going to be near impossible to store all possible salted hashes.
BTW, obviously you have to store your salt with your hash so that you can authenticate the user.
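Here is a small sketch of per-user salting. Note the substitution: instead of the md5(password + salt) form above, it uses PBKDF2-HMAC-SHA256 from Python's standard library, and the function names and the iteration count are my own assumptions. The point it illustrates is the same, though: identical passwords end up with different stored values, so one precomputed table cannot cover them.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")
print(digest_a != digest_b)                           # same password, different stored values
print(verify_password("hunter2", salt_a, digest_a))
```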
Late to the party, but I was also aware of Rainbow Tables being a method of attack on hashed, unsalted passwords. However, http://codahale.com/how-to-safely-store-a-password/ was shared on Twitter recently, and depending on your needs and concerns, you may not be able to salt your way to safe password storage.
I hope this is informative to you.
Wikipedia is your friend:
http://en.wikipedia.org/wiki/Rainbow_table
Alice & Bob are both secret quadruple agents who could be working for the US, Russia or China. They want to come up with a scheme that would:
If they are both working for the same side, prove this to each other so they can talk freely.
If they are working for different sides, not expose any additional information about which side they are on.
Oh, and because of the sensitive nature of what they do, there is no trusted third party who can do the comparison for both of them.
What protocol would be able to satisfy both of these needs?
Ideally, any protocol would also be able to generalize to multiple participants and multiples states but that's not essential.
I've puzzled over it for a while and I can't find a satisfactory solution, mainly owing to condition 2.
edit: Here's the original problem that motivated me to look for a solution. "Charlie" had some personal photos that he shared with me and I later discovered that he had also shared them with "Bob". We both wanted to know if we had the same set of photos but, at the same time, if Charlie hadn't shared a certain photo with either of us, he probably had a good reason not to and we didn't want to leak information.
My first thought would be for each of us to concatenate all the photos and provide the MD5 sum. If they matched, then we had the same photos but if they didn't, neither party would know which photos the other had. However, I realized soon after that this scheme would still leak information because Bob could generate an MD5 for each subset of photos he had and if any of them matched my sum, he would know which photos I didn't have. I've yet to find a satisfactory solution to this particular problem but I thought I would generalize it to avoid people focusing on the particulars of my situation.
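The subset leak can be shown with a short sketch. The byte strings stand in for real image data, the function names are invented for the example, and sorting the photos before hashing is an assumption so that both sides hash in the same order.

```python
import hashlib
from itertools import combinations

def digest_of(photos):
    """MD5 over the photos concatenated in sorted order."""
    return hashlib.md5(b"".join(sorted(photos))).hexdigest()

def recover_subset(my_photos, their_digest):
    """Try every non-empty subset of my_photos; return the one matching their_digest, if any."""
    for size in range(len(my_photos), 0, -1):
        for subset in combinations(my_photos, size):
            if digest_of(subset) == their_digest:
                return set(subset)
    return None

bob_photos = [b"photo-A", b"photo-B", b"photo-C"]
my_photos = [b"photo-A", b"photo-C"]
leaked = recover_subset(bob_photos, digest_of(my_photos))
print(leaked)
```

The search is exponential in the number of photos Bob holds, so it is only practical for small collections, but that is exactly the personal-photo case described here.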
For both problems, you could use a Secure two-party computation equality-algorithm. There are many schemes, for example this by Damgard, Fitzi, Kiltz, Nielsen and Toft: Unconditionally Secure Constant Round Multi-Party Computation for Equality, Comparison, Bits and Exponentiation.
Of course an agent could try to pose as an agent from another side to get a 1/3 chance to discover the true side of another agent, but that seems unavoidable.
A much simpler scheme for the photo-problem, which should be almost as good as the secure multiparty computation, is the following:
Alice and Bob each sort their pictures and generate a SHA-512 hash of the result.
Alice sends the first bit of her hash to Bob.
Bob compares the bit to the first bit of his hash. If it is different, they know that they have received different photos. Otherwise they continue.
Bob sends the second bit of his hash to Alice.
Alice checks this bit and decides whether to continue.
Continue until the protocol aborts or all bits have been checked.
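The steps above can be simulated in one process to see how many bits get revealed before the protocol stops. This is only a sketch: both "parties" live in the same function, the photos are placeholder byte strings, and the function names are made up.

```python
import hashlib

def collection_bits(photos):
    """SHA-512 over the sorted photos, returned as a list of 512 bits."""
    digest = hashlib.sha512(b"".join(sorted(photos))).digest()
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]

def compare_bit_by_bit(alice_photos, bob_photos):
    """Return (same, bits_revealed), alternating which side reveals the next bit."""
    alice, bob = collection_bits(alice_photos), collection_bits(bob_photos)
    for i in range(len(alice)):
        # Even positions: Alice sends and Bob checks; odd positions: the reverse.
        sent, held = (alice[i], bob[i]) if i % 2 == 0 else (bob[i], alice[i])
        if sent != held:
            return False, i + 1  # abort as soon as a bit differs
    return True, len(alice)

print(compare_bit_by_bit([b"p1", b"p2"], [b"p2", b"p1"]))
print(compare_bit_by_bit([b"p1", b"p2"], [b"p1"]))
```

When the collections differ, each bit has roughly an even chance of mismatching, so the exchange usually aborts after only a handful of revealed bits.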
So they are guaranteed to be quadruple agents? That is they are guaranteed to be secretly working for one faction while pretending to work for a second while pretending to work for a third while pretending to work for a fourth? They are limited to just the US, Russia or China? If so then that means that there will always be at least one faction they are both pretending to work for and are simultaneously actually working for. That seems to negate their ability to be quadruple agents, because surely one of them can't be working for the Americans while secretly working for the Americans, while secretly working for the Americans, while secretly working for the Americans.
You say that the ideal solution would generalize to arbitrary numbers of states and spy-stacks. Can the degree of secret agent-ness be either higher, equal or lower than the number of states? This might be important. Also, is Alice always guaranteed to have the same degree of agent-ness as Bob? i.e. They will ALWAYS both be triple agents, or ALWAYS both be quintuple agents? The modulo operator springs to mind...
More details please.
As a potential answer, you can enumerate the states into a bitfield. US=1 Russia=2, China=4, Madagascar=8, Tuva=16 etc. Construct a device that is essentially an AND gate. Alice builds and brings one half and Bob builds and brings the other. Separated by a cloth, they each press the button of the state they're really working for. If the output of the AND gate is high, then they're on the same side. If not, then they quietly take down the cloth, and depart with the respective halves of their machine so that the button can't be determined by fingerprint.
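In software terms the machine is just a bitwise AND over one-hot values. A toy sketch, with the class and function names invented for the example (and, unlike the cloth, offering no secrecy at all, since both inputs are visible to the code):

```python
from enum import IntFlag

class Side(IntFlag):
    US = 1
    RUSSIA = 2
    CHINA = 4
    MADAGASCAR = 8
    TUVA = 16

def same_side(alice_button, bob_button):
    """The AND gate: the output is high only when both pressed the same button."""
    return bool(alice_button & bob_button)

print(same_side(Side.US, Side.US))     # True
print(same_side(Side.US, Side.CHINA))  # False
```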
This is not theoretical or rigorous, but practical.
For your photos problem, create hashes for all subsets of your photos; randomly select a subset of these, and shuffle in an agreed quantity of randomly generated hash values. Bob does the same, and you exchange these sets. If the proportion of hashes in what Bob has sent you that matches ones you can generate by hashing subsets of your photos significantly differs from what you expect, it is likely you have a significantly different corpus of photos from him. If the proportion of random hashes you agree on is high, you risk being unable to detect small differences in your collections of photos; if the proportion is low, you risk exposing information about missing photos; you will have to select a suitable point for the tradeoff.
Interesting.
I think, no matter what the scheme, it'll need to involve a component of random failure. This is because of the conflicting requirements. You would need a scheme that, occasionally, even when they are on the same side, doesn't work. Because if it always worked, they would immediately be able to determine they aren't on the same side.
Your point 'B' is also vague. You say you don't want to expose what side they are on. Does that mean that the info can't point to specifically one of the sides? Is it okay if Alice thinks Bob is from either one of the others?
Also, have you tried emailing this to the cryptography mailing list? May get a better response there. It's an interesting one to think about :)
Here's the closest I've come to a solution:
Assume there is a function doubleHash such that
doubleHash(a+doubleHash(b)) == doubleHash(b+doubleHash(a))
Alice generates a 62 bit secret and appends the 2 bit country code to the end of it, hashes it and gives Bob doubleHash(a).
Bob does the same thing and gives Alice doubleHash(b).
Alice appends the original secret to the hash that Bob gave her, hashes it and publishes it as doubleHash(a+doubleHash(b)).
Bob does the same thing and publishes doubleHash(b+doubleHash(a)).
If both the hashes match, then they are from the same country. On the other hand, if they don't match, then Bob can't decipher the hash because he doesn't know Alice's secret and vice versa.
However, such a scheme relies on the existence of a doubleHash function and I'm not sure if such a thing is possible.
The simplest thing I can think of with the photos that could possibly work is as follows:
Hash all the photos with a 4096 bit hash.
Sort the photos by hash value. (Hashes are, after all, just a string representation of a large number.)
Using that sort order, use a streaming system to pipe and hash those photos as if they were a single file.
Share your hashes.
If the hashes match, you have the same files. (There is a very low risk of a false positive match, but with 4K hashing it's quite unlikely.)
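A rough sketch of those steps. SHA-512 from hashlib stands in for the 4096-bit hash (that substitution is mine), the photos are placeholder byte strings, and the function name is made up.

```python
import hashlib

def collection_digest(photos):
    """Hash each photo, order the photos by that hash, then hash them as one stream."""
    ordered = sorted(photos, key=lambda data: hashlib.sha512(data).digest())
    stream = hashlib.sha512()
    for data in ordered:
        stream.update(data)  # feed the photos in as if they were a single file
    return stream.hexdigest()

mine = [b"photo-1", b"photo-2", b"photo-3"]
theirs = [b"photo-3", b"photo-1", b"photo-2"]
print(collection_digest(mine) == collection_digest(theirs))
```

Sorting by hash rather than by filename means the result does not depend on how either party happens to name or order their files.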
There are of course, a few weaknesses here:
Don't share how many photos you have. Doing so could permit the party with the greater number of photos to intelligently permute the data, removing photos they suspect you don't have from the hash set, using the number as a guide, and find (at great computational expense, mind) a set of images that matches your hash.
They can do 1 without the number, but it's harder, and they're out of luck if they actually have fewer photos.
They could create a fake hash, simply with a random number generator, and send it to you, giving you the impression you had different datasets when you really had the same.
The above weaknesses are also present in your country code identification system, except of course you have far less entropy to get in the way, and it's far easier to defraud the system (and thus far, far easier to work out who they are by sheer brute force, or have yourself worked out by brute force, regardless of how fancy your hash algorithm is).
If this were not the case, you would have already been found out by the very agencies you work for, because something that reliable and secure would be a sure fire way to do a secure background check.
The Photo Scenario is Impossible to Achieve:
Your scheme is impossible for the reasons that you name.
Consider a function f, which takes two sets of photos, s1 and s2.
f(s1, s2) returns true if s1=s2 and false if s1!=s2.
That is, this function implements the scheme you want.
Bob can always supply a subset of the photos he has, and learn which photos Charlie doesn't have.
There is no way around this: any function which has the property you want cannot have the security you want.
The Spy Scenario is Even More Impossible:
As Kent Fredric pointed out, the spy scenario has even greater inherent weaknesses.
It has all problems of the photo scenario, plus the additional weakness of having only four secrets.
In the photo scenario it would be highly unlikely that Bob would randomly guess one of Charlie's photographs.
It is trivial in the spy scenario for Bob to guess Alice's choice (1/4).
The spies only have four countries they can belong to; as they are both quadruple agents, they both know all the secret code words for each country.
Thus, Bob could pretend to be working for the Chinese to test Alice.
A Different Type of Solution:
As some posters have noted, the security can be increased if you weaken the accuracy of f.
Of course, if it is not accurate, what is the point? I propose a different type of solution.
Do not let them compare the same photographs more than one time.
The party which wishes to initiate the comparison must first show that this is a new comparison and does not use any of the pictures from before.
EDIT: Problems with Double Hash
I am making some assumptions about the doubleHash protocol, but...
For the photograph scheme, the doubleHash protocol is no better than f, because the 62 bit secret must be constructed from a set of photographs for the comparison to be meaningful. The subset attack mentioned in the original question still applies here. Try all subsets of photographs to brute force the secrets you can generate, thus Bob can see if he can generate the same secret as Alice.
Using the doublehash property Bob can still brute force the secret.
doubleHash(s1+doubleHash(b)) != doubleHash(aliceSecret+doubleHash(a))
doubleHash(s2+doubleHash(b)) != doubleHash(aliceSecret+doubleHash(a))
doubleHash(s3+doubleHash(b)) == doubleHash(aliceSecret+doubleHash(a))
Bingo, aliceSecret == s3.
DoubleHash is only as strong as it is hard to bruteforce either a or b
Implementing DoubleHash
Instead of doubleHash(a + doubleHash(b)), try doubleHash(a, md5(b)).
DoubleHash(a + doubleHash(b)) is bad because Bob could generate colliding hashes like so:
doubleHash((12 + doubleHash(34)) + doubleHash(5678))
= doubleHash((34 + doubleHash(12)) + doubleHash(5678))
= doubleHash(5678 + doubleHash(12 + doubleHash(34)))
= doubleHash(5678 + doubleHash(34 + doubleHash(12)))
Here is an implementation of doubleHash using the new formulation,
doubleHash(a, hashOfB) {
    hashOfA = md5(a)
    combinedHash = hashOfA xor hashOfB
    return md5(combinedHash)
}
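For anyone who wants to try it, the pseudocode above translates to Python roughly as follows. The snake_case names and the sample secrets are mine. The sketch only checks the symmetry property doubleHash(a, md5(b)) == doubleHash(b, md5(a)); it says nothing about whether the surrounding protocol is secure.

```python
import hashlib

def md5_digest(data: bytes) -> bytes:
    """Raw 16-byte MD5 digest of data."""
    return hashlib.md5(data).digest()

def double_hash(a: bytes, hash_of_b: bytes) -> bytes:
    """md5(md5(a) XOR hash_of_b), where hash_of_b is the other party's md5 digest."""
    hash_of_a = md5_digest(a)
    combined = bytes(x ^ y for x, y in zip(hash_of_a, hash_of_b))
    return md5_digest(combined)

secret_a = b"secret-a"  # placeholder secrets for the two parties
secret_b = b"secret-b"
result_a = double_hash(secret_a, md5_digest(secret_b))
result_b = double_hash(secret_b, md5_digest(secret_a))
print(result_a == result_b)  # True, because XOR is commutative
```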
One could also use the math behind blind signatures to implement a version of doubleHash.
Wouldn't RSA work here? Each nation knows its private key, you publish your public key, and only nations that are the same can decrypt the info. I guess the second person would know that the first isn't on the same side as they are, however.
Hmm.
How about Public Key Cryptography?
Without thinking about it at all I just want to say I should allow every character. It gets hashed in any case, and I don't want to limit people who want to create strong passwords.
However, thinking about it more, there are plenty of characters that I have no idea what effect they'd have on things. Foreign characters, ascii symbols, etc. to name a couple.
I tried to Google but I can't find any definitive standard for what people do. Even most professional organizations don't seem to know. It seems to be a common practice for many sites to disallow special characters altogether, which is just silly and not what I want to do.
Anyway, are there any standard recommendations for length, allowed characters, and so forth?
I'm not sure if it matters, but I'll be using ASP.NET w/ C#
Any printable, non-whitespace ASCII character (between 33 and 126 inclusive) is typically allowed in passwords. Many security professionals (and SO commenters) are advising the use of a passphrase in place of a password, so you'd have to allow spaces. The argument is that due to their length, and since phrases aren't in a dictionary, passphrases are more difficult to crack than passwords. (A passphrase can also be easier to remember, so a legitimate user doesn't have to keep it written down on a sticky-note right on their monitor.)
Some strong password generators use a hash, so I'd put a very high limit on the length (512 or 1024) just to be inclusive. Password generators today often yield strings of 32-128 characters, but who knows what hashes will be used in the next few years.
Non-ASCII characters certainly make things harder when it comes to entering the password on limited devices (mobiles, consoles etc) - but usually not impossible. Arguably if the user wants to do that, you should let them. It's easy enough to do a reasonable and consistent thing - encode in UTF-8 before hashing, for example. You'd only get into difficulties if some input device sent the characters as a composition (e.g. e + acute accent instead of "e acute") - but I suspect that wouldn't happen in real life. (You could decompose everything yourself, but that would be a lot of trouble to go to for an edge case.)
I'd restrict it to printable characters, however. Putting tabs, form feeds etc in a password really is asking for trouble.
Not an expert, but I hate it when characters I choose that aren't that bizarre are rejected. So, I think I agree with your gut.
Short answer: allow as much as the system backing it can support. Nowadays there's really no excuse not to use full unicode support for text entry, and that includes passwords. I don't think you need to worry about problems with characters as long as they're handled literally (but I'm not a pro in this field--beware of sql injection).
I have a pet peeve against sites that impose restrictions on passwords... any kind of restriction. I like sites that will tell you how strong your password is and recommend you make it stronger, but forcing a user to type at least 8 characters, or to require both letters and numbers, etc. is just plain frustrating.
If you need to have a maximum field size (for example for storing in a database) try to make it large enough for anything that people would type out by hand. There's really no such thing as a too-large password field since there's always the potential to use an automated, generated strong password, but 64 to 128 characters would certainly suffice.
Fundamentally, most of the unicode class of characters should be allowed. However, do skip control characters (e.g. 0-31 besides space) and the byte order mark (0xfffe and 0xfeff). Further, you want to first canonicalize the representation to get rid of problems caused by differing representations. You might issue warnings though for characters that seem to be too hard to enter, but users will guard against that themselves.
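The question mentions C#, but to keep it short here is the idea sketched in Python. The choice of NFC as the canonical form, the function name, and the exact set of rejected characters are assumptions for illustration; the same steps carry over to any language with Unicode normalization support.

```python
import unicodedata

# Byte order mark and its byte-swapped form, as mentioned above.
FORBIDDEN = {"\ufeff", "\ufffe"}

def canonical_password_bytes(password):
    """Normalize to NFC, reject control characters, and return UTF-8 bytes ready for hashing."""
    normalized = unicodedata.normalize("NFC", password)
    for ch in normalized:
        # Category "Cc" covers the C0/C1 control characters (tab, form feed, and so on).
        if unicodedata.category(ch) == "Cc" or ch in FORBIDDEN:
            raise ValueError(f"disallowed character U+{ord(ch):04X}")
    return normalized.encode("utf-8")

# "e" followed by a combining acute accent and the precomposed "e acute" give the same bytes.
print(canonical_password_bytes("e\u0301") == canonical_password_bytes("\u00e9"))
```

Normalizing before hashing means a user who types the same visible password on two devices gets the same hash, even if the devices compose accented characters differently.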
Remember: When you are storing passwords, all passwords should be hashed with a one-way algorithm like md5 or sha1. Since the hex digests these algorithms produce contain only the characters 0-9 and a-f, you don't need to worry about SQL injections or anything like that in the stored value.
So, as long as you can md5 or sha1 a character, it should be accepted.
If you are talking about preventing SQL-injection type of attacks, it is probably a better idea to make sure your code does what it is supposed to do, rather than relying on restricting the input so the problem becomes easier.
For non-ascii characters, I don't see that as a more difficult problem if your input can be correctly represented as a binary string (and not as text), which is then passed to your hash function or key generator, etc.
Add another vote for "let the user include any and all characters that their interface allows them to enter". I wouldn't even disallow tab or control characters. Your software has the capability to accept arbitrary byte strings and hash them, so accept arbitrary byte strings as passwords. To do otherwise reduces the space which an attacker must search in a brute-force or dictionary attack.
(Of course, even if you do allow everything, 99% of users will still use their pet's name as their password...)
Eventually you may have to print out the clear password in a confirmation email sent to your users.
PS: Might consider also encoding problems in the email, if it's not standard ascii (eg. Japanese characters), it's possible that a user will not receive the email in the proper format or simply can't read it on another system due to fonts not being installed.
All this weighs in the "printable" ascii characters range.