I understand from a recent publication that there are significant risks with using any of the standard 1024 bit dhparam values. Sites are being encouraged to use longer dhparam values or generate their own.
Here's my question:
Why would web servers today use any standard DH param values? I created a 4096 bit DH param value on my laptop just now, using openssl dhparam 4096. It finished in about 40 seconds.
Why isn't that done during web server first run, or configuration? 40 seconds of compute time is not any real burden on a server. I can understand that an embedded device has a lot more constraints, but a general purpose web server generates a 4096 bit value so fast, is there a reason to ever use a standard value?
And adding to the question...
Why not generate new dh param values on a regular basis, like once a day? Doing so would greatly reduce the value of a successful attack on any particular set of parameters, and who knows what kind of attacks they may have.
You are thinking in the right direction. 1024-bit DH params are too weak nowadays. See the research on the so-called Logjam attack (https://weakdh.org/).
Generating a new parameter set on a weekly basis is implemented in the mail server Dovecot, for example. However, generating your own params could lead to weak prime numbers. It is therefore a good idea to use well-known primes that have been audited many times, for example MODP group 14 (Oakley group 14) from RFC 3526.
Edit:
Did you try generating several dhparams consecutively? You should observe a significant increase in the time the generation takes.
Using a C implementation of bigint without any assembly, SSE etc., running on a 2 GHz dual-core Pentium laptop: what is the average time one should expect it to take to generate a prime number?
Is it normal for primes larger than 512 bits to take 30 seconds?
What about 2048, 4096 bits etc.?
From security stackexchange question 56214
I recently generated some custom Diffie-Hellman parameters which are basically just long (in the below case 4096 bit) primes.
As the generation took roughly 2 hours it cannot be something that is generated on the fly...
Is this typical ? - 2 hours to generate a 4096 bit key ...
No, 2 hours is definitely not typical.
Generation of large random primes depends on the following:
the speed and entropy within the random number generator
the used algorithm to test the candidates for primality
the implementation
and luck
The random number generator used is very important. Especially for long-term keys, you may require a random bit generator that draws on a large amount of entropy. On Linux, for instance, this can be achieved by reading from /dev/random. There is one unfortunate problem: /dev/random may block until sufficient entropy has been gathered. Depending on the system, that can be very fast or very, very slow.
The second factor is the algorithm. When generating new DH parameters, a method that generates a so-called safe prime is usually used. Generating safe primes is much, much harder than generating a number that is merely a probable prime. However, that prime is only used for the DH parameters, not for the key pair itself. So generating your own safe prime is generally not needed; you can simply use a set of pre-calculated or even named parameters.
The implementation can make a difference as well. Although it won't change the order of complexity, it still matters if one implementation is a thousand times slower than another. These kinds of differences are not unheard of within cryptography; a slow, interpreted language may be much slower than a hardware-accelerated version, or a version that uses the vector instructions of the CPU or indeed a GPU.
Furthermore, the only way to see if a number is prime is to test the number. There is no deterministic method of just generating primes. The problem is that although there are many, many primes available, it can still take a long time to find one. This is where the luck factor comes in: it could be that the first number you test is prime, but it could also be that you run through oodles of numbers before finding one. So in the end the runtime of the procedure is nondeterministic.
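To make the luck factor concrete, here is a minimal sketch of the "draw a candidate, test it, repeat" loop described above. It uses a textbook Miller-Rabin test and Python's non-cryptographic `random` module purely for illustration; the function names are my own and this is not how OpenSSL implements it.

```python
import random

def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def random_prime(bits, rng):
    """Draw random odd candidates of the given size until one passes the test."""
    tries = 0
    while True:
        tries += 1
        # Force the top bit (exact size) and the low bit (odd).
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, rng=rng):
            return candidate, tries

if __name__ == "__main__":
    rng = random.Random()  # NOT a cryptographic RNG; illustration only
    for _ in range(5):
        prime, tries = random_prime(512, rng)
        print(f"{prime.bit_length()}-bit prime after {tries} candidates")
```

Running it a few times should show the number of candidates varying widely from run to run, which is exactly why the generation time is hard to predict.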
For a C program, taking over 2 hours to generate a 4096-bit safe prime seems a bit much. However, if it runs on a very old CPU without any SSE, that would not necessarily mean anything is fundamentally wrong. Taking 30 seconds for a 512-bit prime, on the other hand, is very long. The OpenSSL command line takes only between 0.015 (lucky) and 1.5 (unlucky) seconds on my laptop (but that's a Core i7).
Notes:
RSA generally requires two primes that are half the key size, and these are usually not safe primes. So generating a DH key pair (with new parameters) will take much longer than generating an RSA key pair of the same size.
If possible try to use predefined DH parameters. Unfortunately the openssl command line doesn't seem to support named DH parameters; this is only supported for DSA key pairs.
If you want speed, try Elliptic Curve DH with predefined parameters. Generating a key is almost as fast as just generating 256 random bits (for the P-256 curve, for instance). On top of that, until quantum cryptanalysis comes of age, those keys will be much stronger than DH keys.
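The first note (safe primes are much rarer than ordinary primes) can be illustrated with a small experiment. This is only a sketch at toy sizes (48 bits, so it finishes quickly in pure Python); the helper names are hypothetical and the fixed-base Miller-Rabin variant is chosen because it is exact for numbers this small.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_prime(n):
    """Miller-Rabin with fixed bases; exact for n below about 3.3e24."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in SMALL_PRIMES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def count_tries(bits, rng, safe):
    """Count random odd candidates drawn until a prime (or safe prime) turns up."""
    tries = 0
    while True:
        tries += 1
        p = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        # A safe prime p additionally requires (p - 1) / 2 to be prime.
        if is_prime(p) and (not safe or is_prime((p - 1) // 2)):
            return p, tries

if __name__ == "__main__":
    rng = random.Random()  # illustration only, not a cryptographic RNG
    for safe in (False, True):
        runs = [count_tries(48, rng, safe)[1] for _ in range(20)]
        label = "safe prime" if safe else "prime"
        print(f"{label}: average {sum(runs) / len(runs):.0f} candidates")
```

The safe-prime search should need far more candidates on average, and the gap widens as the bit size grows.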
It seems that the current best practice for storing passwords on the web is to use bcrypt as opposed to sha256 or any other hashing algorithm. Bcrypt seems fantastic, with one flaw as I see it: if I have a database filled with passwords using a work factor of 10 and I want to increase that work factor to 12 because computational power has increased, then I have no way of doing this without knowing the user's password, meaning waiting until they log in again. This causes problems for users who have abandoned their account.
It seems to me then that an alternate solution would be to use sha256 and do a number of passes equal to 2^(work factor). If I do this, then when I want to increase the work factor I can just do the difference in the number of passes for every stored password.
I've written a bit of code to do exactly that, and I'd like to get feedback from everyone on whether this is a good idea or not.
https://github.com/rbrcurtis/pcrypt
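For readers who don't want to follow the link, the idea can be sketched in a few lines. This is my own minimal illustration of "2^(work factor) passes of sha256, extendable later", not the code from the repository above; the function names and the salt handling are assumptions.

```python
import hashlib

def iterate_sha256(data: bytes, passes: int) -> bytes:
    """Apply SHA-256 to its own output `passes` times."""
    for _ in range(passes):
        data = hashlib.sha256(data).digest()
    return data

def hash_password(password: str, salt: bytes, work_factor: int) -> bytes:
    """Hash salt + password with 2**work_factor passes."""
    return iterate_sha256(salt + password.encode("utf-8"), 2 ** work_factor)

def upgrade(stored: bytes, old_factor: int, new_factor: int) -> bytes:
    """Raise the work factor of a stored hash without knowing the password."""
    return iterate_sha256(stored, 2 ** new_factor - 2 ** old_factor)

if __name__ == "__main__":
    salt = bytes(16)  # fixed salt only so the demo is repeatable
    old = hash_password("hunter2", salt, 10)
    # Upgrading the stored hash matches hashing from scratch at the new factor.
    print(upgrade(old, 10, 12) == hash_password("hunter2", salt, 12))
```

The upgrade works because iterating a hash composes: 2^10 passes followed by 2^12 - 2^10 more passes is the same computation as 2^12 passes.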
Did a lot of digging and reading papers on these various encryption algorithms. What finally gave me a sort-of answer was this question on crypto.stackexchange.com. My algorithm is somewhat similar to shacrypt, which I hadn't heard of previously, but is still not as good as bcrypt. Reason being that bcrypt, in addition to the work factor, also requires more memory to process than the sha2 family. This means that it cannot as effectively be parallelized in GPUs (although to some extent it can be, and more easily in an FPGA) while sha2 can (and easily). As such, no matter how many passes of sha2 one does, it will still not be as effective as bcrypt.
As an aside, scrypt is significantly better still because it has both a work factor for CPU and a memory factor (and as such is essentially impossible to parallelize in a GPU or FPGA). The only issue is that the nodejs library for scrypt is essentially unusable at present so that might be something to put some effort into.
A potential solution for upping the number of bcrypt passes (or the work factor; I don't actually use bcrypt, but this is an algorithm-agnostic answer):
For each entry in the table where your passwords are stored, also store the number of passes it was hashed with. When you move to more passes, save all new passwords with that number of passes, and set all passwords with fewer passes to expire in 7 days. When users set a new password, hash it with the right number of passes.
Alternatively, you can skip the password reset and instead, the next time a user logs in, rehash their password and store it in the table. This does mean that if people haven't logged in for a long time, their passwords are more susceptible to breach in the event of a DB compromise. That being said, it's more worthwhile for the attacker to attack the mass of people with more passes than the few with fewer passes (never mind: because of salts, this last sentence is wrong).
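The rehash-on-login variant could look roughly like this. Since the answer is algorithm-agnostic, the sketch uses PBKDF2 from Python's standard library as a stand-in for bcrypt; the record layout, the function names and the `CURRENT_ITERATIONS` value are all made up for illustration.

```python
import hashlib
import hmac
import os

CURRENT_ITERATIONS = 200_000  # hypothetical target cost, tune for your hardware

def make_record(password, iterations=CURRENT_ITERATIONS):
    """Create a stored record that remembers the cost it was hashed with."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return {"salt": salt, "iterations": iterations, "hash": digest}

def login(record, password, target=CURRENT_ITERATIONS):
    """Return (ok, record); an outdated record is rehashed on successful login."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), record["salt"], record["iterations"]
    )
    if not hmac.compare_digest(digest, record["hash"]):
        return False, record
    if record["iterations"] < target:
        # The plaintext is available right now, so upgrade the stored hash.
        record = make_record(password, target)
    return True, record

if __name__ == "__main__":
    old = make_record("correct horse", iterations=1_000)
    ok, new = login(old, "correct horse", target=5_000)
    print(ok, old["iterations"], "->", new["iterations"])
```

The caller would write the returned record back to the table whenever it differs from the one it loaded.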
I saw someone who encrypts users' passwords multiple times with MD5 to improve security. I'm not sure if this works, but it doesn't look good. So, does it make sense?
Let's assume the hash function you use would be a perfect one-way function. Then you can view its output like that of a "random oracle", its output values are in a finite range of values (2^128 for MD5).
Now what happens if you apply the hash multiple times? The output will still stay in the same range (2^128). It's like you saying "Guess my random number!" twenty times, each time thinking of a new number - that doesn't make it harder or easier to guess. There isn't any "more random" than random. That's not a perfect analogy, but I think it helps to illustrate the problem.
Considering brute-forcing a password, your scheme doesn't add any security at all. Even worse, the only thing you could "accomplish" is to weaken the security by introducing some possibility to exploit the repeated application of the hash function. It's unlikely, but at least it's guaranteed that you for sure won't win anything.
So why is not all lost with this approach? Because of the point the others made about using thousands of iterations instead of just twenty. Why is slowing the algorithm down a good thing? Because most attackers will try to gain access using a dictionary (or a rainbow table) of often-used passwords, hoping that one of your users was negligent enough to use one of those (I'm guilty; at least Ubuntu told me so upon installation). On the other hand, it's inhumane to require your users to remember, let's say, 30 random characters.
That's why we need some form of trade-off between easy to remember passwords but at the same time making it as hard as possible for attackers to guess them. There are two common practices, salts and slowing the process down by applying lots of iterations of some function instead of a single iteration. PKCS#5 is a good example to look into.
In your case, applying MD5 20000 times instead of 20 would significantly slow down attackers using a dictionary, because each of their candidate passwords would have to be hashed 20000 times to be useful in an attack. Note that this does not affect brute-forcing, as illustrated above.
But why is using a salt still better? Because even if you apply the hash 20000 times, a resourceful attacker could pre-compute a large database of passwords, hashing each of them 20000 times, effectively generating a customized rainbow table specifically targeted at your application. Having done this they could quite easily attack your application or any other application using your scheme. That's why you also need to generate a high cost per password, to make such rainbow tables impractical to use.
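A toy version of that precomputation attack shows the difference a salt makes. The round count is kept small so it runs instantly, and the word list and variable names are invented for the example.

```python
import hashlib
import os

def iterated_md5(data: bytes, rounds: int) -> bytes:
    """Apply MD5 to its own output `rounds` times."""
    for _ in range(rounds):
        data = hashlib.md5(data).digest()
    return data

ROUNDS = 2000  # kept small so the demo runs quickly
COMMON = ["123456", "password", "letmein", "qwerty"]

# The attacker computes this table once and reuses it against every
# unsalted database that uses the same scheme.
table = {iterated_md5(p.encode(), ROUNDS): p for p in COMMON}

unsalted = iterated_md5(b"letmein", ROUNDS)
salt = os.urandom(16)  # random per-user salt, stored next to the hash
salted = iterated_md5(salt + b"letmein", ROUNDS)

print(table.get(unsalted))  # letmein
print(table.get(salted))    # None
```

With a salt the attacker has to redo the 2000 rounds per candidate and per user, which is the per-password cost the paragraph above asks for.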
If you want to be on the really safe side, use something like PBKDF2 illustrated in PKCS#5.
Hashing a password is not encryption. It is a one-way process.
Check out security.stackexchange.com, and the password related questions. They are so popular we put together this blog post specifically to help individuals find useful questions and answers.
This question specifically discusses using md5 20 times in a row - check out Thomas Pornin's answer. Key points in his answer:
20 is too low, it should be 20000 or more - password processing is still too fast
There is no salt: an attacker may attack passwords with very low per-password cost, e.g. rainbow tables - which can be created for any number of md5 cycles
Since there is no sure test for knowing whether a given algorithm is secure or not, inventing your own cryptography is often a recipe for disaster. Don't do it
There is such a question on crypto.SE, but it is NOT public now. The answer by Paŭlo Ebermann is:
For password-hashing, you should not use a normal cryptographic hash,
but something made specially to protect passwords, like bcrypt.
See How to safely store a password for details.
The important point is that password crackers don't have to bruteforce
the hash output space (2^160 for SHA-1), but only the
password space, which is much much smaller (depending on your password
rules - and often dictionaries help). Thus we don't want a fast
hash function, but a slow one. Bcrypt and friends are designed for
this.
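To put rough numbers on "much much smaller", the size of a password space can be expressed in bits and compared with the 2^160 output space. The two password policies below are arbitrary examples of mine, not taken from the quoted answer.

```python
import math

def bits(alphabet_size: int, length: int) -> float:
    """Size of a password space, as an exponent of 2."""
    return length * math.log2(alphabet_size)

if __name__ == "__main__":
    print(f"8 lowercase letters:      2^{bits(26, 8):.1f}")
    print(f"10 printable ASCII chars: 2^{bits(95, 10):.1f}")
    print("SHA-1 output space:       2^160")
```

Even the stronger of these two policies is astronomically smaller than the hash output space, and real users pick from a far smaller set than the policy allows.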
And similar question has these answers:
The question is "Guarding against cryptanalytic breakthroughs: combining multiple hash functions"
Answer by Thomas Pornin:
Combining is what SSL/TLS does with MD5 and SHA-1, in its
definition of its internal "PRF" (which is actually a Key Derivation
Function). For a given hash function, TLS defines a KDF which
relies on HMAC which relies on the hash function. Then the KDF is
invoked twice, once with MD5 and once with SHA-1, and the results are
XORed together. The idea was to resist cryptanalytic breaks in either
MD5 or SHA-1. Note that XORing the outputs of two hash functions
relies on subtle assumptions. For instance, if I define SHB-256(m) =
SHA-256(m) XOR C, for a fixed constant C, then SHB-256 is as
good a hash function as SHA-256; but the XOR of both always yields
C, which is not good at all for hashing purposes. Hence, the
construction in TLS is not really sanctioned by the authority of
science (it just happens not to have been broken). TLS-1.2 does
not use that combination anymore; it relies on the KDF with a single,
configurable hash function, often SHA-256 (which is, in 2011, a smart
choice).
As @PulpSpy points out, concatenation is not a good generic way of
building hash functions. This was published by Joux in 2004 and then
generalized by Hoch and Shamir in 2006, for a large class of
construction involving iterations and concatenations. But mind the
fine print: this is not really about surviving weaknesses in hash
functions, but about getting your money worth. Namely, if you take a
hash function with a 128-bit output and another with a 160-bit output,
and concatenate the results, then collision resistance will be no
worse than the strongest of the two; what Joux showed is that it will
not be much better either. With 128+160 = 288 bits of output, you
could aim at 2^144 resistance, but Joux's result implies
that you will not go beyond about 2^87.
So the question becomes: is there a way, if possible an efficient
way, to combine two hash functions such that the result is as
collision-resistant as the strongest of the two, but without incurring
the output enlargement of concatenation ? In 2006, Boneh and
Boyen have published a result which simply states that the answer
is no, subject to the condition of evaluating each hash function only
once. Edit: Pietrzak lifted the latter condition in 2007
(i.e. invoking each hash function several times does not help).
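The SHB-256 remark in the quoted answer is easy to check. A short sketch, where the constant C is an arbitrary value picked for the demo:

```python
import hashlib

C = bytes.fromhex("5a" * 32)  # arbitrary fixed 256-bit constant

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def sha256(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()

def shb256(m: bytes) -> bytes:
    """SHA-256 XOR C: individually just as good a hash as SHA-256."""
    return xor(sha256(m), C)

if __name__ == "__main__":
    for m in (b"", b"hello", b"another message"):
        combined = xor(sha256(m), shb256(m))
        print(combined == C)  # True for every message
```

Each function on its own behaves like a fine hash, yet the XOR of the two outputs is the same constant for every input.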
And by PulpSpy:
I'm sure @Thomas will give a thorough answer. In the interim, I'll just
point out that the collision resistance of your first construction,
H1(M)||H2(M) is surprisingly not that much better than just H1(M). See
section 4 of this paper:
http://web.cecs.pdx.edu/~teshrim/spring06/papers/general-attacks/multi-joux.pdf
No, it's not good practice. You must use a salt when hashing, because otherwise the password can be cracked with rainbow tables.
I was wondering if I could get reasons or links to resources explaining why SHA512 is a superior hashing algorithm to MD5.
It depends on your use case. You can't broadly claim "superiority". (I mean, yes you can, in some cases, but to be strict about it, you can't really).
But there are areas where MD5 has been broken:
For starters: MD5 is old, and common. There are tons of rainbow tables against it, and they're easy to find. So if you're hashing passwords (without a salt - shame on you!) - using md5 - you might as well not be hashing them, they're so easy to find. Even if you're hashing with simple salts really.
Second off, MD5 is no longer secure as a cryptographic hash function (indeed it is not even considered a cryptographic hash function anymore as the Forked One points out). You can generate different messages that hash to the same value. So if you've got a SSL Certificate with a MD5 hash on it, I can generate a duplicate Certificate that says what I want, that produces the same hash. This is generally what people mean when they say MD5 is 'broken' - things like this.
Thirdly, similar to messages, you can also generate different files that hash to the same value so using MD5 as a file checksum is 'broken'.
Now, SHA-512 is a SHA-2 family hash algorithm. SHA-1 is kind of considered 'eh' these days, so I'll ignore it. SHA-2, however, has relatively few attacks against it. The major one Wikipedia talks about is a reduced-round preimage attack, which means that if you use SHA-512 in a horribly wrong way, I can break it. Obviously you're not likely to be using it that way, but attacks only get better, and it's a good springboard into more research to break SHA-512 in the same way MD5 is broken.
However, out of all the hash functions available, the SHA-2 family is currently among the strongest, and the best choice considering commonness, analysis, and security. (But not necessarily speed. If you're in embedded systems, you need to perform a whole other analysis.)
MD5 has been cryptographically broken for quite some time now. This basically means that some of the properties usually guaranteed by hash algorithms, do not hold anymore. For example it is possible to find hash collisions in much less time than potentially necessary for the output length.
SHA-512 (one of the SHA-2 family of hash functions) is, for now, secure enough, but possibly not for much longer. That's why NIST started a contest for SHA-3.
Generally, you want hash algorithms to be one-way functions. They map some input to some output. Usually the output is of a fixed length, thereby providing a "digest" of the original input. Common properties are, for example, that small changes in the input yield large changes in the output (which helps detect tampering) and that the function is not easily reversible. For the latter property the length of the output greatly helps because it provides a theoretical upper bound for the complexity of a collision attack. However, flaws in design or implementation often result in reduced complexity for attacks. Once those are known, it's time to evaluate whether to keep using the hash function. If the attack complexity drops far enough, practical attacks easily come within reach of people without specialized computing equipment.
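The "small change in, large change out" property is easy to observe. A minimal sketch that flips a single input bit and counts how many of the 512 output bits of SHA-512 change (the two messages are arbitrary):

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions in which SHA-512(a) and SHA-512(b) differ."""
    ha = int.from_bytes(hashlib.sha512(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha512(b).digest(), "big")
    return bin(ha ^ hb).count("1")

if __name__ == "__main__":
    # 'd' (0x64) and 'e' (0x65) differ in exactly one bit.
    print(differing_bits(b"hello world", b"hello worle"), "of 512 bits differ")
```

For a well-behaved hash you would expect roughly half of the output bits to flip.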
Note: I've been talking only about one kind of attack here. The reality is much more nuanced but also much harder to grasp. Since hash functions are very commonly used for verifying file/message integrity, the collision thing is probably the easiest one to understand and follow.
There are a couple of points not being addressed here, and I feel it is from a lack of understanding about what a hash is, how it works, and how long it takes to successfully attack them, using rainbow or any other method currently known to man...
Mathematically speaking, MD5 is not "broken" if you salt the hash and throttle attempts (even by 1 second). Your security would be just as "broken" as it would be by an attacker slowly pelting away at your 1 ft solid steel wall with a wooden spoon:
It will take thousands of years, and by then everyone involved will be dead; there are more important things to worry about.
If you lock their account by the 20th attempt... problem solved. 20 hits on your wall = 0.0000000001% chance they got through. There is literally a better statistical chance you are in fact Jesus.
It's also important to note that absolutely any hash function is going to be vulnerable to collisions by virtue of what a hash is: "a (small) unique id of something else".
When you increase the bit space you decrease collision rates, but you also increase the size of the id and the time it takes to compute it.
Let's do a tiny thought experiment...
A 2-bit SHA, if it existed, would have 4 total possible unique IDs for something else... 00, 01, 10 & 11. It will produce collisions, obviously. Do you see the issue here? A hash is just a generated ID of what you're trying to identify.
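You can actually run this thought experiment by throwing away all but two bits of a real hash. A quick sketch (truncating SHA-256 is just a convenient way to get a 2-bit ID; the messages are arbitrary):

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Keep only the top 2 bits of SHA-256: four possible IDs (0..3)."""
    return hashlib.sha256(data).digest()[0] >> 6

seen = {}
collision = None
for i in range(5):  # five inputs, four IDs: a collision is guaranteed
    msg = f"message {i}".encode()
    h = tiny_hash(msg)
    if h in seen:
        collision = (seen[h], msg, h)
        break
    seen[h] = msg

print(collision)
```

With only four IDs, five distinct inputs must collide (pigeonhole), no matter how good the underlying function is.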
MD5 is actually really, really good at randomly choosing a number based on an input. SHA is actually not that much better at it; SHA just has massively more space for IDs.
The method used is about 0.1% of the reason the collisions are less likely. The real reason is the larger bit space.
This is literally the only reason SHA-256 and SHA-512 are less vulnerable to collisions; because they use a larger space for a unique id.
The actual methods SHA-256 and SHA-512 use to generate the hash are in fact better, but not by much; the same rainbow attacks would work on them if they had fewer bits in their IDs, and files and even passwords can have identical IDs using SHA-256 and SHA-512, it's just a lot less likely because it uses more bits.
The REAL ISSUE is how you implement your security
If you allow automated attacks to hit your authentication endpoint 1,000 times per second, you're going to get broken into. If you throttle to 1 attempt per 3 seconds and lock the account for 24 hours after the 10th attempt, you're not.
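As a rough idea of what such throttling could look like, here is a sketch of a per-account limiter with the numbers from the paragraph above as defaults. The class and method names are invented, the state lives in memory only, and the caller passes in the current time (for example from `time.monotonic()`).

```python
class LoginThrottle:
    """One attempt per `interval` seconds; lock after `max_failures` failures."""

    def __init__(self, interval=3.0, max_failures=10, lock_seconds=24 * 3600):
        self.interval = interval
        self.max_failures = max_failures
        self.lock_seconds = lock_seconds
        self.failures = {}      # account -> consecutive failed attempts
        self.next_allowed = {}  # account -> earliest time of the next attempt

    def allowed(self, account, now):
        """True if the account may attempt a login at time `now`."""
        return now >= self.next_allowed.get(account, 0.0)

    def record(self, account, success, now):
        """Record the outcome of an attempt made at time `now`."""
        if success:
            self.failures.pop(account, None)
            self.next_allowed.pop(account, None)
            return
        count = self.failures.get(account, 0) + 1
        if count >= self.max_failures:
            # Too many failures in a row: lock the account and start over.
            self.failures[account] = 0
            self.next_allowed[account] = now + self.lock_seconds
        else:
            self.failures[account] = count
            self.next_allowed[account] = now + self.interval
```

A real deployment would also need persistent storage and probably per-IP limits; this only shows the bookkeeping.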
If you store the passwords without salt (a salt is just an added secret to the generator, making it harder to identify bad passwords like "31337" or "password") and have a lot of users, you're going to get hacked. If you salt them, even if you use MD5, you're not.
Considering MD5 uses 128 bits (32 hex characters, 16 bytes in binary), and SHA-512 is only 4x the size but virtually eliminates collisions by giving you 2^384 times as many possible IDs... Go with SHA-512, every time.
But if you're worried about what is really going to happen if you use MD5, and you don't understand the real, actual differences, you're still probably going to get hacked, make sense?
reading this
However, it has been shown that MD5 is not collision resistant
more information about collision here
MD5 has a chance of collision (http://www.mscs.dal.ca/~selinger/md5collision/) and there are numerous MD5 rainbow tables for reverse password look-up on the web and available for download.
It needs a much larger dictionary to map backwards, and has a lower chance of collision.
It is simple, MD5 is broken ;) (see Wikipedia)
Bruce Schneier wrote of the attack that "[w]e already knew that MD5 is a broken hash function" and that "no one should be using MD5 anymore."
Already understanding that AES is the encryption method of choice, should existing code that uses DES be re-written if the likely threat is on the level of script kiddies? (e.g. pkzip passwords can be cracked with free utilities by non-computer professionals, so is DES like that?) A quick google search seems to imply that even deprecated DES still requires a super computer and large quantity of time--or have times changed?
In particular, this CAPTCHA library uses DES to encrypt the challenge string which is sent to the user in viewstate.
DES is broken so far as storing sensitive data is concerned, so I would certainly not use it in anything new, and would replace it in anything used for long-term storage of any information of interest (data that someone would have a profit or national security interest in stealing).
At the moment a DES message can be broken by brute force in a couple of days (or less) using under $100,000 worth of custom hardware.
But there are some key factors in that:
The hardware is custom - the chips used to quickly brute a DES key are not the general purpose processor you'd find in a PC. That being said there is probably room today for using a cluster of Playstation 3s or current generation graphics cards with a GPGPU to crack a DES message in a reasonable amount of time, perhaps bringing down the cost to maybe $15,000.
The other factor is time - a DES message can be cracked in a day, but if your CAPTCHA library has a timestamp that specifies a 30 minute timeout for any given CAPTCHA response, it would still be effective (you could scale up your hardware, but then you're talking millions).
Overall I'd say that for non-long term storage, DES is still secure against "script kiddies".
No, DES cracking is not feasible for script kiddies and probably won't be in the foreseeable future.
It requires enormous processing power; we're talking about a load of FPGA processors.
For example, the COPACOBANA in the CHES 2006 secret key challenge took 21 hours, 26 mins, 29 secs using 108 of its 128 processors, at a throughput of 43.1852 billion keys per second, and found the key after searching through 4.73507% of the keyspace.
Now, if we look at Moore's law, we see that if we built a similar machine today, it would take 1/4 of the time for the same amount of money, or 1/4 of the money for the same amount of time.
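As a sanity check on those COPACOBANA figures, the arithmetic can be redone in a few lines. The only inputs are the numbers quoted above plus the 56-bit DES key size; the variable names are mine.

```python
KEYSPACE = 2 ** 56                  # DES has a 56-bit key
rate = 43.1852e9                    # keys per second, from the run above
elapsed = 21 * 3600 + 26 * 60 + 29  # 21 h 26 min 29 s, in seconds

searched = rate * elapsed
fraction = searched / KEYSPACE
full_days = KEYSPACE / rate / 86400

print(f"fraction of keyspace searched: {fraction:.2%}")
print(f"exhaustive search: {full_days:.1f} days, {full_days / 2:.1f} days on average")
```

The fraction comes out at roughly 4.6%, close to the quoted figure, and at that rate a full sweep of the keyspace would take a bit under three weeks.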
DES is broken by the standards of the crypto community, but the time required to break it is generally large enough that it would be 'safe' to use for this kind of application. On one assumption: the DES key changes from session to session. If the key doesn't change, then it is open to attack by a very, very dedicated individual. Now, the question is: is your website subjected to people who will spend 10+ days cracking DES, rather than applying lessons learned by the rest of the spam industry in the way of image recognition?
DES is probably still good enough for most use cases. But the point is, there is normally no reason to use an algorithm (or in this case rather: a key strength) which is known to be rather weak.
Wikipedia points out that even with special hardware around 9 days are needed for an exhaustive key search. I don't think script kiddies are likely to spend that much CPU time (even if they have a botnet) only to crack a captcha. (Actually, cracking captchas is normally A LOT easier with sufficiently intelligent picture recognition...)