Difference between password entropy and min-entropy

What is the difference between password entropy and min-entropy?
Is there a recommended amount of min-entropy, or a standard, to ensure strong passwords?
How do you convert min-entropy into a number of days/months/years?
For example: a min-entropy of 30 bits corresponds to a computational time of 2 years in a brute-force attack.
Thanks

More information is needed about the attack that is being defended against.
The problem is not brute-forcing the whole password space, but rather running through a list of frequently used passwords ordered by popularity, say the top 10,000,000; that will not take 2 years. Then there is cracking hardware consisting of arrays of GPUs and fuzzing software.
There is also the difference between cracking a particular password and cracking 90% of a million passwords for sale on the dark web.
What is necessary is to use an iterated key derivation function such as PBKDF2 with 100,000 iterations (about 100 ms per hash), as sketched below.
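As a rough illustration of the "min-entropy to time" conversion asked about above, and of an iterated KDF: if every guess is forced through a KDF costing about 100 ms, then 2^30 guesses take roughly 2^30 × 0.1 s, which is about 3.4 years. Here is a minimal Python sketch using the standard library's hashlib.pbkdf2_hmac; the iteration count, salt size and timing figure are illustrative assumptions, not a vetted policy.

```python
import hashlib
import os

def hash_password(password, salt=None, iterations=100_000):
    """Derive a password hash with PBKDF2-HMAC-SHA256.

    iterations is the tunable work factor; 100,000 was very roughly
    100 ms on commodity hardware when this was written (an assumption).
    """
    if salt is None:
        salt = os.urandom(16)  # 128-bit random salt, unique per credential
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

# Rough min-entropy -> worst-case cracking time, assuming the attacker
# must pay the full KDF cost (0.1 s) for every single guess.
bits = 30
seconds = (2 ** bits) * 0.1
print(f"{bits} bits at 0.1 s/guess ~ {seconds / 86_400 / 365:.1f} years")
```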
For more information see:
Password list at SecLists.
Infosec password-cracking-tools
Arstechnica How I became a password cracker
Advanced Password Recovery hashcat
[DRAFT NIST Special Publication 800-63B Digital Authentication Guideline](https://pages.nist.gov/800-63-3/sp800-63b.html)

Related

Is password hashing using MD5 or SHA1 still valid?

Right now I'm working on a financial project. Here, the team is thinking of using MD5 for password hashing.
But today it is easy to take a SHA1 or MD5 password hash and "decrypt" it, even for a complex password like
My$uper$ecur3PAS$word+448: you can use an online lookup page and there it is.
Small and mid-range developers (including me) use those hashing methods, but I think that is not enough to provide security over the database
(excluding firewalls, network security, iptables, etc.).
Can someone give me a clue about the better approach to solving this vulnerability?
As per the OWASP Password Storage Cheat Sheet, the recommendation is:
Argon2 is the winner of the password hashing competition and should be considered as your first choice for new applications;
PBKDF2 when FIPS certification or enterprise support on many platforms is required;
scrypt where resisting any/all hardware accelerated attacks is necessary but support isn’t.
bcrypt where PBKDF2 or scrypt support is not available.
MD5 and SHA1 are not secure for most security-related use cases, because it is possible to find collisions with these algorithms; in other words, it is possible to find two different inputs that produce the same hash value.
The SHA-2 family of hashing algorithms is secure for many security use cases, but not for password hashing, because these hashes are blazingly fast compared with the algorithms above. And speed is something we don't want for password hashing, because it makes it easier for an attacker to perform a brute-force attack, trying a wide range of passwords in a short period of time.
The above four algorithms are therefore deliberately expensive in terms of memory, computing power and time. These costs are usually parameterized so that they can be tuned upward as computing power improves with passing time. Therefore, while using these algorithms, it is important to choose the work-factor values correctly; setting a very low value may defeat the purpose.
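As a concrete illustration of those tunable work factors, here is a minimal sketch assuming Python and the third-party argon2-cffi package; the specific cost values are examples to be tuned on your own hardware, not a recommendation.

```python
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

# Illustrative work factors; tune them so hashing takes a noticeable
# fraction of a second on the hardware you actually deploy on.
ph = PasswordHasher(
    time_cost=3,        # number of iterations
    memory_cost=65536,  # memory used, in KiB (64 MiB)
    parallelism=4,      # number of parallel lanes
)

stored = ph.hash("correct horse battery staple")  # salt is generated for you

try:
    ph.verify(stored, "correct horse battery staple")  # OK
    ph.verify(stored, "wrong password")                # raises
except VerifyMismatchError:
    print("password mismatch")
```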
In addition to that a salt should also be used.
Again from the same OWASP source:
Generate a unique salt upon creation of each stored credential (not just per user or system wide);
Use cryptographically-strong random data;
As storage permits, use a 32 byte or 64 byte salt (actual size dependent on protection function);
Scheme security does not depend on hiding, splitting, or otherwise obscuring the salt.
Salts serve two purposes:
prevent the protected form from revealing two identical credentials and
augment entropy fed to protecting function without relying on credential complexity.
The second aims to make pre-computed lookup attacks on an individual credential and time-based attacks on a population intractable.
Your thinking is correct: MD5 and SHA1 should never be used for password hashing. I would recommend the following, in order of preference:
argon2
bcrypt
scrypt
PBKDF2
If you tag your question with the language/framework you are using, I can recommend specific libraries or methods.
Also be aware that encryption is not the right word to use here. These are password hashing algorithms, not encryption algorithms.
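For example, assuming Python, a minimal sketch with the third-party bcrypt package might look like this; the cost factor of 12 is just an illustrative value.

```python
import bcrypt

password = b"My$uper$ecur3PAS$word+448"

# gensalt() embeds a random salt and the cost factor in the output string.
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

# Later, at login time:
if bcrypt.checkpw(password, hashed):
    print("password ok")
```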

Why do salts have fixed lengths?

I thought the reason to use variable, random strings as salts is to force the attacker to look up every salt before using his rainbow table on the hash. This takes a long time. But most developers seem to use fixed sizes for their salts. If you look up the size of one salt, which you know because most passwords are stored alongside the salt, couldn't you just cut the salt off every password digest and then use your rainbow table? I don't see how this would be too much of an effort, and it makes me question why I should even use a variable salt, because the only information the attacker needs is the size of one salt, and then he doesn't have to look up any more salts. Wouldn't a randomized size be much more secure? The variability and randomization really seem like a 'security through obscurity' thing to me, because it's so easy to work around if you know the method, and the only reason to implement it is to cause some moments of confusion. Or am I wrong?
Salts can be of any length. The only requirement of a good salt is that it is 'sufficiently unique', as it is this property that makes rainbow-table attacks infeasible against salted hashes: it would take more time to generate a rainbow table that would only work for a given salt than to just attempt a brute-force attack.¹
A public salt makes it more time-consuming to crack a list of passwords. However, it does not make dictionary attacks harder when cracking a single password. - Salt (cryptography).
An easy way to generate a good salt is therefore to generate a large (say, 128 random bits) value from a cryptographic random number generator. This will trivially make all salts the same length - but there is an insane number of unique values that can be represented by 128 bits.
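A minimal sketch of that in Python, using the standard library's secrets module:

```python
import secrets

salt = secrets.token_bytes(16)  # 128 bits from a cryptographic RNG
print(salt.hex())               # e.g. store the hex form next to the hash
```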
Since the hash is generated using the original 'long' salt, using a trimmed salt later would just yield an invalid hash when specifying the original password - it doesn't make it any easier to find a hash collision. (Choosing a 'slow' hash like bcrypt is another requirement for a good hashed password design.)
If an attacker was able to trick a system to use a small range of salt values (or even a fixed salt value) then the salts lose the property of 'sufficiently unique'.
¹ Sadly, many people choose poor passwords - mainly passwords that are too short, based on common words or sequences, or shared between sites. While a slow hash will help a good bit, and the salt prevents the application of rainbow tables, given a large enough pool of accounts (and less time than might be expected) a brute force is still likely to recover some passwords.

bcrypt or progressive passes of sha256?

It seems that the current best practice for storing passwords on the web is to use bcrypt as opposed to sha256 or any other hashing algorithm. Bcrypt seems fantastic, with one flaw as I see it: if I have a database filled with passwords using a work factor of 10 and I want to increase that work factor to 12 because computational power has increased, then I have no way of doing this without knowing the user's password, meaning I have to wait until they log in again. This causes problems for users who have abandoned their accounts.
It seems to me, then, that an alternative solution would be to use sha256 and do a number of passes equal to 2^(work factor). If I do this, then when I want to increase the work factor I can just apply the difference in the number of passes to every stored password.
I've written a bit of code to do exactly that, and I'd like to get feedback from everyone on whether this is a good idea or not.
https://github.com/rbrcurtis/pcrypt
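For reference, here is a minimal Python sketch of the scheme the question describes - iterated SHA-256 with a pass count that can be extended later without knowing the password. This is purely to illustrate the idea being asked about, not to endorse it over bcrypt or scrypt.

```python
import hashlib
import os

def iterate_sha256(data: bytes, passes: int) -> bytes:
    """Apply SHA-256 'passes' times to data."""
    for _ in range(passes):
        data = hashlib.sha256(data).digest()
    return data

def hash_password(password: str, salt: bytes, work_factor: int) -> bytes:
    return iterate_sha256(salt + password.encode(), 2 ** work_factor)

# Upgrading a stored hash from work factor 10 to 12 without the password:
salt = os.urandom(16)
stored = hash_password("hunter2", salt, 10)
upgraded = iterate_sha256(stored, 2 ** 12 - 2 ** 10)   # apply the difference
assert upgraded == hash_password("hunter2", salt, 12)
```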
I did a lot of digging and read papers on these various hashing algorithms. What finally gave me a sort-of answer was this question on crypto.stackexchange.com. My algorithm is somewhat similar to sha-crypt, which I hadn't heard of previously, but it is still not as good as bcrypt. The reason is that bcrypt, in addition to the work factor, also requires more memory to process than the SHA-2 family. This means it cannot be parallelized on GPUs as effectively (although to some extent it can be, and more easily on an FPGA), while SHA-2 can be, and easily. As such, no matter how many passes of SHA-2 one does, it will still not be as effective as bcrypt.
As an aside, scrypt is significantly better still, because it has both a CPU work factor and a memory factor (and as such is essentially impossible to parallelize on a GPU or FPGA). The only issue is that the Node.js library for scrypt is essentially unusable at present, so that might be something worth putting some effort into.
A potential solution for upping the number of bcrypt passes (or work factor; I don't actually use bcrypt, but this is an algorithm-agnostic answer):
For each entry in the table where your passwords are stored, also store the number of passes it was hashed with. When you move up to more passes, save all new passwords with that number of passes, and set all passwords with fewer passes than that to expire in 7 days. When users set a new password, hash it with the right number of passes.
Alternatively, you can skip the reset: the next time a user logs in, rehash their password with the new pass count and store it in the table (a sketch of this follows below). This does mean that if people haven't logged in for a long time, their passwords are more susceptible to breach in the event of a DB compromise. That being said, it's more worthwhile for the attacker to attack the mass of people with more passes than the few with fewer passes (never mind - because of salts, this last sentence is wrong).
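A sketch of that rehash-on-login upgrade, assuming Python and the bcrypt package; the target cost and the way the old cost is parsed back out of the stored hash are illustrative.

```python
import bcrypt

TARGET_ROUNDS = 12  # the new work factor we want everyone to end up on

def stored_rounds(hashed: bytes) -> int:
    # bcrypt hashes look like b"$2b$12$<salt+digest>"; field 2 is the cost.
    return int(hashed.split(b"$")[2].decode())

def login(password: bytes, hashed: bytes):
    if not bcrypt.checkpw(password, hashed):
        return None  # wrong password
    if stored_rounds(hashed) < TARGET_ROUNDS:
        # We have the plaintext right now, so upgrade the stored hash.
        hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=TARGET_ROUNDS))
        # ... persist the new hash for this user ...
    return hashed
```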

Can encrypting (MD5) multiple times improve security?

I saw someone who encrypts users' passwords multiple times with MD5 to improve security. I'm not sure if this works, but it doesn't look good. So, does it make sense?
Let's assume the hash function you use is a perfect one-way function. Then you can view its output like that of a "random oracle": its output values lie in a finite range (2^128 possible values for MD5).
Now what happens if you apply the hash multiple times? The output will still stay in the same range (2^128). It's like saying "Guess my random number!" twenty times, each time thinking of a new number - that doesn't make it harder or easier to guess. There isn't anything "more random" than random. That's not a perfect analogy, but I think it helps to illustrate the problem.
Considering brute-forcing a password, your scheme doesn't add any security at all. Even worse, the only thing you could "accomplish" is to weaken security by introducing some possibility of exploiting the repeated application of the hash function. It's unlikely, but at least it's guaranteed that you won't gain anything.
So why is all not lost with this approach? It's because of the point the others made about having thousands of iterations instead of just twenty. Why is slowing the algorithm down a good thing? Because most attackers will try to gain access using a dictionary (or a rainbow table built from often-used passwords), hoping that one of your users was negligent enough to use one of those (I'm guilty; at least Ubuntu told me so upon installation). But on the other hand, it's inhumane to require your users to remember, let's say, 30 random characters.
That's why we need some form of trade-off between passwords that are easy to remember and making it as hard as possible for attackers to guess them. There are two common practices: salts, and slowing the process down by applying lots of iterations of some function instead of a single one. PKCS#5 is a good example to look into.
In your case, applying MD5 20,000 times instead of 20 would slow attackers using a dictionary down significantly, because each of their candidate passwords would have to go through the ordinary procedure of being hashed 20,000 times in order to still be useful in an attack. Note that this procedure does not affect brute-forcing, as illustrated above.
But why is using a salt still better? Because even if you apply the hash 20,000 times, a resourceful attacker could pre-compute a large database of passwords, hashing each of them 20,000 times, effectively generating a customized rainbow table specifically targeted at your application. Having done this, they could quite easily attack your application or any other application using your scheme. That's why you also need a unique salt per password, to make such precomputed rainbow tables impractical to use.
If you want to be on the really safe side, use something like PBKDF2 illustrated in PKCS#5.
Hashing a password is not encryption. It is a one-way process.
Check out security.stackexchange.com, and the password related questions. They are so popular we put together this blog post specifically to help individuals find useful questions and answers.
This question specifically discusses using md5 20 times in a row - check out Thomas Pornin's answer. Key points in his answer:
20 is too low, it should be 20000 or more - password processing is still too fast
There is no salt: an attacker may attack passwords with very low per-password cost, e.g. rainbow tables - which can be created for any number of md5 cycles
Since there is no sure test for knowing whether a given algorithm is secure or not, inventing your own cryptography is often a recipe for disaster. Don't do it
There is such a question on crypto.SE but it is NOT public now. The answer by Paŭlo Ebermann is:
For password-hashing, you should not use a normal cryptographic hash, but something made specially to protect passwords, like bcrypt.
See How to safely store a password for details.
The important point is that password crackers don't have to brute-force the hash output space (2^160 for SHA-1), but only the password space, which is much, much smaller (depending on your password rules - and often dictionaries help). Thus we don't want a fast hash function, but a slow one. Bcrypt and friends are designed for this.
And similar question has these answers:
The question is "Guarding against cryptanalytic breakthroughs: combining multiple hash functions"
Answer by Thomas Pornin:
Combining is what SSL/TLS does with MD5 and SHA-1, in the definition of its internal "PRF" (which is actually a Key Derivation Function). For a given hash function, TLS defines a KDF which relies on HMAC, which relies on the hash function. Then the KDF is invoked twice, once with MD5 and once with SHA-1, and the results are XORed together. The idea was to resist cryptanalytic breaks in either MD5 or SHA-1. Note that XORing the outputs of two hash functions relies on subtle assumptions. For instance, if I define SHB-256(m) = SHA-256(m) XOR C, for a fixed constant C, then SHB-256 is as good a hash function as SHA-256; but the XOR of both always yields C, which is not good at all for hashing purposes. Hence, the construction in TLS is not really sanctioned by the authority of science (it just happens not to have been broken). TLS 1.2 does not use that combination anymore; it relies on the KDF with a single, configurable hash function, often SHA-256 (which is, in 2011, a smart choice).
As @PulpSpy points out, concatenation is not a good generic way of building hash functions. This was published by Joux in 2004 and then generalized by Hoch and Shamir in 2006, for a large class of constructions involving iterations and concatenations. But mind the fine print: this is not really about surviving weaknesses in hash functions, but about getting your money's worth. Namely, if you take a hash function with a 128-bit output and another with a 160-bit output, and concatenate the results, then collision resistance will be no worse than the strongest of the two; what Joux showed is that it will not be much better either. With 128+160 = 288 bits of output, you could aim at 2^144 resistance, but Joux's result implies that you will not go beyond about 2^87.
So the question becomes: is there a way, if possible an efficient way, to combine two hash functions such that the result is as collision-resistant as the strongest of the two, but without incurring the output enlargement of concatenation? In 2006, Boneh and Boyen published a result which simply states that the answer is no, subject to the condition of evaluating each hash function only once. Edit: Pietrzak lifted the latter condition in 2007 (i.e. invoking each hash function several times does not help).
And by PulpSpy:
I'm sure @Thomas will give a thorough answer. In the interim, I'll just point out that the collision resistance of your first construction, H1(m)||H2(m), is surprisingly not that much better than just H1(m). See section 4 of this paper:
http://web.cecs.pdx.edu/~teshrim/spring06/papers/general-attacks/multi-joux.pdf
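To make the SHB-256 counterexample from the quoted answer concrete, here is a tiny Python sketch (standard library only): a hash defined as SHA-256 XOR a fixed constant is just as strong on its own, yet the XOR of the two functions is always that constant, which is useless as a hash.

```python
import hashlib
import os

C = os.urandom(32)  # fixed constant (32 bytes = 256 bits)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def sha256(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()

def shb256(m: bytes) -> bytes:
    # Just as collision-resistant as SHA-256 on its own...
    return xor(sha256(m), C)

for m in (b"hello", b"world", b"another message"):
    # ...but the XOR combination of the two is the constant C for every input.
    assert xor(sha256(m), shb256(m)) == C
print("XOR of SHA-256 and SHB-256 is always C")
```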
No, it's not good practice. You must use a salt with your hashing, because otherwise the password can be cracked with those rainbow tables.

What is the best way to determine duplicate credit card numbers without storing them?

I run a website where we mark certain accounts as scammers, and "flag" their account and all credit cards used as being bad. We don't store actual credit card numbers, but store an MD5 hash of each one instead.
We are hitting collisions all the time now. What is the best way to store these values - non-reversible, but able to be compared against future values?
I thought MD5 would be the best, but we've got a debate going on here...
A cryptographically secure hash would work. (SHA512 or SHA256 would be OK)
However, I would use a fairly secret salt that is not stored along with the cards (to prevent any sort of rainbow table attack).
PS:
Rainbow-table attacks against credit cards could be particularly effective, since the total size of the plaintext space is quite small due to the limited character set, the fixed size, and the check digits.
PPS:
You can't use a random salt for each entry, because then you would never be able to feasibly check for duplicates. Salts are normally used to prevent identical inputs from producing identical hashes, whereas here we are specifically looking for that kind of match.
It isn't sufficiently safe to just use a good Hash algorithm. If your list is stolen, your stored hashes can be used to retrieve working card information. The actual schema-space for credit card numbers is small enough that a determined attacker can pre-calculate many of the possible hashes ahead of time as well, and this may have other implications for your system if there is an intrusion or an inside-job.
I recommend you use a salt and also calculate a 2nd value to be added to the salt based on a formula involving each digit of the card number and the first salt value. This assures that if you lose control of either part, you still have reasonable uniqueness that renders ownership of the list useless. The formula should not be heavily weighted toward the first 6 digits of the card (BIN number), though, and no trace of the formula should be stored in the same location as either the salt or the final hash.
Consider the anatomy of a 16-digit credit card number:
6 digit BIN (Bank Identification Number)
9 digit Account Number
1 digit Luhn Checksum
BIN lists are well known within the processing industry and are not too difficult to assemble for those with access to an illicit list of card numbers. The number of valid BINs is further diminished by the assigned space for each issuer.
Visa - Starts with 4
American Express - Starts with 34 / 37
MasterCard - Starts with 5
Discover/CUP - Starts with 6
Diner's Club - Starts with 35
etc.
Note that some of the assigned BIN information within each issuer category is also sparse. If an attacker is aware of where most of your customers are located, then that will cut down the uniqueness considerably, as BIN information is assigned on a per-bank basis. An attacker that already has an account issued by a small bank in a wealthy neighborhood could just get an account and use the BIN as a starting point on his own card.
The checksum digit is calculated with a well-known formula (the Luhn algorithm, sketched below), so it is immediately discardable as a source of unique data.
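A short Python sketch of that Luhn check-digit calculation, using a widely published test number as a hypothetical example, shows how cheaply the final digit is recovered:

```python
def luhn_check_digit(partial: str) -> str:
    """Compute the final Luhn check digit for the first 15 digits of a PAN."""
    total = 0
    # Walk right-to-left over the 15 known digits; doubling starts with the
    # rightmost of them, because the check digit will sit to its right.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:          # positions that get doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

# Hypothetical example based on a well-known test card prefix:
print(luhn_check_digit("411111111111111"))  # -> "1", giving 4111111111111111
```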
Armed with a handful of BINs worth targeting, an attacker has to check 9 digits at a time for each BIN set. This is 1 Billion Checksums and Hash Operations per set. I don't have any benchmarks handy, but I'm pretty sure 1 Million Hash operations per minute is not unreasonable for MD5 or any flavor of SHA on a suitably powerful machine. This amounts to less than a day to crack all matches under a given BIN.
Finally, you might consider storing a timestamp or visitor token (IP/subnet) with your hashes as well. It is nice to catch duplicate card numbers, but also consider the ramifications of someone stuffing your system with bogus card numbers. At some point you need to decide on a trade-off between blocking card numbers that you know are invalid, and also give yourself a mechanism to identify and repair misuse.
For example, a disgruntled employee could be stealing card information on his own and then use your hash mechanism against you by inserting valid hashes into your card number blacklist to block repeat business. It is quite expensive to undo this if you are just storing a hash- everything is opaque once it has been converted to a hash. With this in mind, give yourself a method to identify the source of the hash as well.
Perhaps you can store two different hashes of the card number. The chances that both hashes will collide simultaneously are practically zero.
Use SHA1; hash collisions are yet to be found.
People pointing out that a hash is "broken" are missing the point, perhaps regurgitating something they've heard without understanding what it means. When people talk about hashes being "broken", they typically mean that it is possible to easily generate an alternate payload that computes to the same hash.
This "breaks" the hash, but only for the specific purpose of using a hash to verify that data is what it's supposed to be.
That isn't what is important here, i.e. someone managing to create an alternate data stream that happens to hash down to the same value as one of the credit cards doesn't achieve anything meaningful or useful as an attack vector.
The risk with hashes here is that the problem space for credit card numbers is pretty low and rainbow tables for them would be pretty cheap and easy to generate.
Adding a salt would add a bit of protection against already generated rainbow tables for pure card numbers but the extent to which it offers any real protection depends on how 'secret' the salt would remain in the case you are compromised. If the salt is exposed then new rainbow tables can then be cheaply generated and it's all over.
Given that the salt needs to be available to the application for it to perform checks against the blacklist, there's a good chance someone compromising the blacklist data will also be able to get to the salt. If you have multiple servers, you can mitigate that to some degree by ensuring the salt and the data aren't in the same 'place', so an exposure of one server won't give someone all of the parts they need. (Similarly for backups: don't store the data and the salt on the same media, where someone can walk away with one tape and get everything.) The salt only adds some protection while it is secret (in this type of use).
If you have the resources to do it securely then I think that is the route to go. If you are getting a significant number of collisions on any reasonable hash function you must be doing a lot of volume. (In fact I'm highly surprised collisions would be a problem even then, any reasonable hash function should provide diverse results over a small problem space like this).
As others have said, HMAC should be the way to go.
HMAC-SHA-256 with a proper key should:
Avoid collisions.
Avoid retrieval of the credit card number from the stored value.
Prevent an attacker from performing the same computation (on all possible credit card numbers, to find a matching value).
But there is one more very important thing:
It is with good reason that you are not storing the credit card numbers. Even if you could be 100% sure that you are using proper encryption, you probably still would not store credit card numbers. Why? For one thing, because the key could be leaked.
So you store hashes, so that the credit card number cannot be retrieved. ...Right?
Well, if you use a plain hash, a simple rainbow table with hashes of all possible credit card numbers gives away all the original data that you presumably did not store. Oops. But this you knew by now.
So we try to do better. Let's say using individual salts is better, and using HMAC is the best approach we know.
Consider the following scenario:
Take a 16-digit card number.
First 6 digits (Bank Identification Number) are guessed by trying a few common BINs.
Last 4 digits are visible in masked card number, which you are allowed to store. (You might not have this stored, which helps.)
1 digit is calculated (Luhn).
This leaves 5 digits to be brute-forced. That is a meager 100'000 attempts.
If we have used the individual salts, it's game over. We can simply brute-force each individual card number at an average of 50'000 attempts.
If we have used HMAC, we appear to be safe. But remember... we choose not to store encrypted card numbers, because even with perfect encryption, the key could be leaked. Guess what. Our HMAC key can be leaked just the same. With the key, again, we can brute-force each individual card number at an average of 50'000 attempts. So a leaked key gives us the credit card numbers, just as it would if we had stored encrypted card numbers.
As such, because of the low entropy of credit card numbers, storing hashes does not add much security compared to encrypted values (yet PCI limits the key rotation requirement to encryption).
A bit of perspective:
Ok, we're assuming a leaked key here. Extreme. But then again, so does PCI as part of their reasoning to forbid you from storing credit card numbers, so we should at least consider it.
True, I did not take into account the multiple guesses to find the BIN. It should be a small constant, though. Or we could limit ourselves to one BIN.
Definitely, a PCI auditor may be more forgiving than I am.
Yes, if you do not store the masked card number, you are a factor 10'000 safer. This helps a lot. Use it to your advantage. Still, if 50K attempts are doable, 500M may be doable, too. It's not enough to make me consider the data secure, in the context of a compromised key.
Conclusion:
Use HMAC-SHA-256. Understand the risk. Store as little as possible. Protect your keys vigilantly. Spend a fortune on a Hardware Security Module :-)
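A minimal sketch of that HMAC approach in Python, using the standard library's hmac and hashlib modules; the key below is a placeholder, and real key management (HSM, separate secrets store) is the part that actually matters.

```python
import hmac
import hashlib

# Placeholder 256-bit key for illustration only; in real life this lives in
# an HSM or separate secrets store, never alongside the stored tags.
SECRET_KEY = bytes.fromhex("00" * 32)

def card_tag(pan: str) -> str:
    """Deterministic tag for duplicate detection; same PAN -> same tag."""
    return hmac.new(SECRET_KEY, pan.encode(), hashlib.sha256).hexdigest()

# Hypothetical blacklist built from previously flagged (test) numbers:
blacklist = {card_tag("4111111111111111")}

def is_flagged(pan: str) -> bool:
    return card_tag(pan) in blacklist

print(is_flagged("4111111111111111"))  # True
print(is_flagged("5500005555555559"))  # False
```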
If you are finding collisions with MD5, why not use a better algorithm such as SHA1 or SHA256?
MD5 is NOT the way to go since it's broken. Quote Bruce Schneier: "[w]e already knew that MD5 is a broken hash function" and that "no one should be using MD5 anymore."
I.e. use SHA512 or SHA256 as someone already proposed.
As Henri already mentioned above (+1), the right solution is to use a Message Authentication Code such as HMAC with a secret key. This is exactly the "secret salt" someone mentioned before. (BTW, salts are always public.)
Use a standard construction such as HMAC-SHA-256 (RFC 2104, FIPS 198a), keep the key secret, and store the results (authentication tags) in a database.
The larger digest size (256 bits) of SHA-256 should prevent any collisions from happening; SHA-256 is a fairly good hash function, and the probability of a random collision is about 2^-128, so if you ever encounter a collision in your system, please let me know! :)
Using the strongest hash possible is usually good. Speed is not of the essence and slowness actually works against anyone trying a brute force reversal of your hashed values.
I like Whirlpool, personally - if you're using PHP, check out the supported algorithms in the hash function docs.
Whirlpool returns a string 128 characters long, but you don't have to store all of it necessarily. The first 32 or 64 chars would suffice. You could also consider sha512 or sha384.
Don't bother with salts; just use HMACs. I know it's kind of an abuse, but then you get a decent keyed hash, so you can prevent collisions and rainbow-table attacks.
The nice thing here is that even if the key leaks, nobody can reverse the values directly; the best attack that works against HMACs is brute force. Effectively, the key here acts as the salt mentioned earlier. The nice thing is that this is a little better than the usual ad-hoc salting done by most non-security programmers.