How long does it take to generate large prime numbers? [closed] - cryptography

Using a C implementation of bigint without any assembly, SSE, etc., running on a 2 GHz dual-core Pentium laptop: what is the average time one should expect it to take to generate a prime number?
Is it normal for primes greater than 512 bits to take 30 seconds?
What about 2048, 4096 bits, etc.?
From Security Stack Exchange question 56214:
I recently generated some custom Diffie-Hellman parameters which are basically just long (in the below case 4096 bit) primes.
As the generation took roughly 2 hours, it cannot be something that is generated on the fly...
Is this typical? 2 hours to generate a 4096-bit key...

No, 2 hours is definitely not typical.
Generation of large random primes depends on the following:
the speed and entropy within the random number generator
the algorithm used to test the candidates for primality
the implementation
and luck
The random number generator used is very important. Especially for long-term keys you may require a random bit generator that contains a large amount of entropy. This can be achieved by accessing /dev/random on Linux, for instance. There is one unfortunate problem: /dev/random may block until sufficient entropy has been gathered. Depending on the system, that can be very fast or very, very slow.
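As a side note, here is a minimal sketch of how candidate numbers might be drawn in Python. It is only an illustration, not the approach any particular library uses: the standard-library `secrets` module reads from the operating system's non-blocking CSPRNG, so it avoids the /dev/random blocking problem on modern systems.

```python
# Minimal sketch: drawing prime candidates from the OS CSPRNG via `secrets`
# (backed by getrandom()/dev/urandom on Linux, so it does not block once the
# kernel entropy pool has been initialised).
import secrets

def random_odd_candidate(bits: int) -> int:
    """Return a random odd integer of exactly `bits` bits."""
    n = secrets.randbits(bits)
    n |= 1 << (bits - 1)   # force the top bit so the candidate has full length
    n |= 1                 # force the low bit so the candidate is odd
    return n

print(random_odd_candidate(512).bit_length())   # -> 512
```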
The second factor is the algorithm used. When generating new DH parameters, a method that produces a so-called safe prime is usually used. Generating safe primes is much, much harder than generating a number that is merely a probable prime. However, that prime is only used for the DH parameters, not for the key pair itself, so generating a safe prime is generally not needed; you can simply use a set of pre-calculated or even named parameters.
The implementation can make a bit of a difference as well. Although it won't change the order of complexity, it can still influence the result if one implementation is a thousand times slower than a fast one. These kinds of differences are not unheard of within cryptography; a slow, interpreted language may be much slower than a hardware-accelerated version, or one that runs directly on the vector instructions of the CPU or indeed the GPU.
Furthermore, the only way to know whether a number is prime is to test it; in practice there is no method that just produces primes directly. The problem is that although there are many, many primes available, it can still take a long time to find one. This is where the luck factor comes in: the first number you test could be prime, but you may also run through oodles of candidates before finding one. So in the end the runtime of the procedure is nondeterministic.
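To make this generate-and-test loop concrete, here is a hedged sketch using only the Python standard library (an illustration of the general technique, not the code any specific library uses). The Miller-Rabin routine is the usual probabilistic primality check, and the safe-prime variant shows why DH parameter generation is so much slower: both q and 2q + 1 have to turn out prime.

```python
# Sketch of the candidate-and-test loop with a standard Miller-Rabin test.
# 40 rounds push the error probability for a random candidate far below 2^-80.
import secrets

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:          # write n - 1 = d * 2^r with d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2      # random witness in [2, n-2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # definitely composite
    return True                # probably prime

def generate_probable_prime(bits: int) -> int:
    # Expected number of tries is about bits * ln(2) / 2 (~1420 for 4096 bits),
    # which is where the "luck" factor comes from.
    while True:
        c = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(c):
            return c

def generate_safe_prime(bits: int) -> int:
    # A safe prime is p = 2q + 1 with q prime as well, so two numbers must
    # pass the test at once -- hence DH parameter generation is far slower.
    while True:
        q = generate_probable_prime(bits - 1)
        p = 2 * q + 1
        if is_probable_prime(p):
            return p

print(generate_probable_prime(512).bit_length())   # 512, usually well under a second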
For a C program, taking over 2 hours to generate a 4096-bit safe prime seems a bit much. However, if it runs on a very old CPU without any SSE, it would not necessarily mean that anything is fundamentally wrong. Taking 30 seconds for a 512-bit prime, on the other hand, is very long: the OpenSSL command line takes only between 0.015 (lucky) and 1.5 (unlucky) seconds on my laptop (but that's a Core i7).
Notes:
RSA generally requires two primes that are half the key size, and these are usually not safe primes. So generating a DH key pair (with new parameters) will take much longer than generating an RSA key pair of the same size.
If possible try to use predefined DH parameters. Unfortunately the openssl command line doesn't seem to support named DH parameters; this is only supported for DSA key pairs.
If you want speed, try Elliptic Curve DH with predefined parameters. Generating a key is almost as fast as just generating 256 random bits (for the P-256 curve, for instance). And until quantum computers come of age, those keys will be much stronger than DH keys on top of it.
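For illustration, a sketch of the ECDH suggestion using the third-party pyca/cryptography package (assumed to be installed; recent versions no longer need a backend argument):

```python
# Sketch: P-256 ECDH with the `cryptography` package. Key generation is
# essentially one 256-bit random draw plus a point multiplication, so it is
# effectively instantaneous compared to DH parameter generation.
from cryptography.hazmat.primitives.asymmetric import ec

private_key = ec.generate_private_key(ec.SECP256R1())   # NIST P-256
peer_key = ec.generate_private_key(ec.SECP256R1())       # the other party

shared_secret = private_key.exchange(ec.ECDH(), peer_key.public_key())
print(len(shared_secret))   # 32 bytes, to be fed into a KDF
```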

Related

When generating an SSL key pair for SSH what is the largest number of bits (-b) you can use?

I have recently done some work to upgrade the SSL keys for some web services we consume. I did not initiate the work, but it was to go from 1024 to 2048 bits.
When generating SSH keys I can specify the bit length (rate/depth?) with ssh-keygen -b 2048. But what are the benefits/deficits of a higher bit value? Are there any technical limits?
Why are we not all generating SSL keys with a bit depth of 1 billion?
I'm going to assume the keys are RSA since 2048 is a common size for RSA (but non-existent for ECDSA or EdDSA).
But what are the benefits/deficits of a higher bit value?
The benefits are the "strength" of the key, to put it simply. Larger keys take longer to "crack". More specifically, in RSA, breaking a key requires factoring a very large number. The larger the number is, the harder it is to factor. This the the extent of what we know about factoring numbers, which is that it cannot be done in polynomial time using technology that is readily available.
Larger keys can perform slower, and require more memory to use. However, 2048 is considered the lowest "safe" size for RSA.
Are there any technical limits?
It depends on what is using the key. Speaking from experience, keys bigger than 4096 bits start running into software problems because the key is too large.
why are we not all generating ssl keys with a bit depth of 1 billion?
Well, a 100 MB-ish key would take a lot of memory to use. Secondly, RSA keys are not completely random numbers: they are made up of two prime numbers, p and q, whose product n is the modulus. Generating primes this large is quite a difficult task.
Finally, there is little security benefit once you go beyond a certain key size.
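As an illustration of the p/q/n relationship, here is a small sketch using the third-party pyca/cryptography package (assumed installed); the prime sizes are a property of RSA itself, not of this particular library:

```python
# Sketch: a 2048-bit RSA key is built from two secret ~1024-bit primes p and q
# whose product n is the public modulus.
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
priv = key.private_numbers()
pub = key.public_key().public_numbers()

print(priv.p.bit_length(), priv.q.bit_length())   # ~1024 bits each
print(pub.n.bit_length())                         # 2048 bits
```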

SFV/CRC32 checksum good and fast enough to check for common backup files?

I have 3 terabytes of data, more than 300,000 reference files of all sizes (20, 30, 40, 200 MB each), and I back them up regularly (not zipped). A few months ago, I lost some files, probably due to data degradation (I had backed up damaged files without noticing).
I do not care about security, so I do not need MD5, SHA, etc. I just want to be assured that the files I'm copying are good (the same bits and bytes) and to verify that backups are intact after a few months, before making backups again.
Therefore, my needs are basic, because the files are not very important and there is no need for security (no sensitive information).
My question: is the SFV/CRC32 format/method good and fast enough for my needs? Is there anything better and faster than that? I'm using the program ExactFile.
Is there any checksum faster than SFV/CRC32 that is not flawed? I tried MD5, but it is slow, and since I do not need data security I preferred SFV/CRC32. Still, it's painful, because there are more than 300,000 files and it takes hours to checksum all of them, even with an 8-core Hyper-Threaded Xeon CPU and a fast HDD.
From the point of view of data integrity, is there any advantage in joining all the files into one .ZIP or .RAR instead of leaving them "loose" in folders and files?
Some tips?
Thanks!
If you could quantify "few" and "some" in "A few months ago, I lost some files" (where "few" would be considered to be replaced with "every few" in order to get a rate), then you could calculate the probability of a false positive. However just from those words, I would say, yes, a 32-bit CRC should be fine for your application.
As for speed, if you have a recent Intel processor, you likely have a CRC-32C instruction, which can make the calculation much faster, by about a factor of 15. (See this answer for some code.) That could be made faster still by running it over multiple cores. If done right, you should be limited by the I/O, not the calculation.
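If you want to experiment without extra tools, here is a rough sketch of an SFV-style check in plain Python. `zlib.crc32` is the ordinary CRC-32 rather than the hardware-accelerated CRC-32C mentioned above, but the streaming pattern is the same:

```python
# Minimal sketch of an SFV-style checksum: stream the file in chunks and fold
# everything into a single 32-bit CRC value.
import zlib

def file_crc32(path: str, chunk_size: int = 1 << 20) -> str:
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)    # running CRC over the stream
    return f"{crc & 0xFFFFFFFF:08X}"        # SFV files store 8 hex digits

# usage: compare against the value stored in the .sfv file
# print(file_crc32("backup/file0001.dat"))
```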
There is no advantage in this case to bundling them in a zip or rar. In fact it may be worse, if a corruption of that one file causes you to lose everything.
If you aren't getting a throughput of at least 250 MB per second per core then you're probably I/O or memory-speed bound. The raw hashing speed of CRC32 and MD5 is higher than that, even on decades-old hardware, assuming a non-sucky reasonably optimised implementation.
Have a look at the Crypto++ benchmark, which includes a wealth of other hash algorithms as well.
The Castagnoli CRC (CRC-32C) can be faster than standard CRC32 or MD5 because newer CPUs have a special instruction for it; with that instruction and oodles of supporting code (for hashing three streams in parallel, stitching together partial results with a bit of linear algebra, and so on) you can speed up the hashing to about 1 cycle/dword. AES-based hashes are also lightning fast on recent CPUs, thanks to the special AES instructions.
However, in the end it doesn't matter how fast the hash function is if it spends its time waiting for data to be read; especially on a multicore machine you're almost always I/O bound in applications like this, unless you're getting sabotaged by small caches and the latencies of a deep memory cache hierarchy.
I'd stick with MD5, which is no slower than CRC32 and is universally available, even on the oldest of machines, in pretty much every programming system/language ever invented. Don't think of it as a 'cryptographically secure hash' (which it isn't, not anymore) but as a kind of CRC128 that's just as fast as CRC32 but requires some 2^64 hashings for a collision to become likely, instead of only a few tens of thousands as in the case of CRC32.
If you want to roll some custom code then CRCs do have some merit: the CRC of a file can be computed by combining the CRCs of sub blocks with a bit of linear algebra. With general hashes like MD5 that's not possible (but you can always process multiple files in parallel instead).
There are oodles of ready-made programs for computing MD5 hashes for files and directories fast. I'd recommend the 'deep' versions of md5sum + cousins: md5deep and hashdeep which you can find on SourceForge and on GitHub.
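For a quick home-grown alternative to those tools, here is a rough sketch of an md5deep-style directory walk in plain Python (the path at the bottom is hypothetical):

```python
# Sketch: hash every file under a directory tree with hashlib.md5 and print
# "digest  path" lines that can be stored and re-checked later.
import hashlib
import os

def md5_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def hash_tree(root: str) -> None:
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            print(f"{md5_file(path)}  {path}")

# usage (hypothetical path): hash_tree("/mnt/backup")
```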
Darth Gizka, thanks for the tips. Now I'm using the 64-bit md5deep you indicated. It's very good. I used to use ExactFile, which stopped being updated in 2010 and is still 32-bit (no 64-bit version). I did a quick comparison between the two: ExactFile was faster at creating the MD5 digest, but at comparing against the digest, md5deep64 was much faster.
My problem is the HDD, as you said. For backup and storage, I use three Seagates of 2 TB each (7200 rpm, 64 MB cache). With an SSD the procedure would be much faster, but with terabytes of files it is very difficult to use SSDs.
A few days ago, I ran the procedure on part of the archive: 1 TB (about 170,000 files). ExactFile took about six hours to create the SFV/CRC32 digest. I used one of my newer machines, equipped with an i7 4770K (with CRC32 instructions built in, 8 cores: four real and four virtual), a Gigabyte Z87X-UD4H motherboard and 16 GB of RAM.
Throughout the calculations, the CPU cores were almost idle (3% to 4%, maximum 20%). The HDD was at 100% usage, yet only a fraction of its speed was achieved (SATA 3): most of the time 70 MB/s, sometimes dropping to 30 MB/s depending on the number of files being calculated and the antivirus running in the background (which I disabled later, as I often do when copying large numbers of files).
Now I am testing a copy program that uses binary file comparison. Anyway, I will continue using MD5 digests. Grateful for the information, and any tip is welcome.

Why use standard dhparam values for TLS

I understand from a recent publication that there are significant risks with using any of the standard 1024 bit dhparam values. Sites are being encouraged to use longer dhparam values or generate their own.
Here's my question:
Why would web servers today use any standard DH param values? I created a 4096 bit DH param value on my laptop just now, using openssl dhparam 4096. It finished in about 40 seconds.
Why isn't that done during the web server's first run, or at configuration time? 40 seconds of compute time is not any real burden on a server. I can understand that an embedded device has a lot more constraints, but if a general-purpose web server can generate a 4096-bit value that quickly, is there a reason to ever use a standard value?
And adding to the question...
Why not generate new DH param values on a regular basis, like once a day? Doing so would greatly reduce the value of a successful attack on any particular set of parameters, and who knows what kinds of attacks they may have.
You are thinking in the right direction. 1024-bit DH params are too weak nowadays. See the research behind the so-called Logjam attack (https://weakdh.org/).
Generating a parameter set on a weekly basis is implemented in the Dovecot mail server, for example. However, generating your own params could lead to weak prime numbers, so it is a good idea to use well-known primes which have been audited many times, for example MODP group 14 (Oakley group 14) from RFC 3526. (A sketch of generating parameters programmatically follows below.)
Edit:
Did you try generating several of the dhparams consecutively? You should observe significant growth in the amount of time the generation takes.
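For completeness, parameters can also be generated programmatically. Here is a hedged sketch using the third-party pyca/cryptography package (assumed installed), roughly equivalent to running openssl dhparam from a scheduled job:

```python
# Sketch: programmatic DH parameter generation with the `cryptography`
# package (the rough equivalent of `openssl dhparam 2048`). Expect this to
# take anywhere from seconds to minutes, since it searches for a safe prime.
from cryptography.hazmat.primitives.asymmetric import dh
from cryptography.hazmat.primitives import serialization

params = dh.generate_parameters(generator=2, key_size=2048)
pem = params.parameter_bytes(
    serialization.Encoding.PEM,
    serialization.ParameterFormat.PKCS3,
)
print(pem.decode())   # -----BEGIN DH PARAMETERS----- ...
```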

Tips to generate strong RSA keys [closed]

Is there any documentation with tips on generating a strong RSA key?
I mean not just 'use XXX utility with the -X flag'.
I mean some rules in theory. For example, the modulus n should be no less than 1024 bits, etc.
Can anybody tell me?
In answer to your question, there is such documentation:
Strong primes are required by the ANSI X9.31 standard for use in generating RSA keys for digital signatures. This makes the factorization of n = pq using Pollard's p − 1 algorithm computationally infeasible. However, strong primes do not protect against modulus factorisation using newer algorithms such as Lenstra's elliptic curve factorization and the Number Field Sieve.
Version 4 of RSA Laboratories' Frequently Asked Questions About Today's Cryptography was published in 1998 and can be found here: ftp://ftp.rsa.com/pub/labsfaq/labsfaq4.pdf
Please pay attention to the following questions:
Question 3.1.4. What are strong primes and are they necessary for RSA?
In the literature pertaining to RSA, it has often been suggested that in choosing a key pair, one should use so-called "strong" primes p and q to generate the modulus n. Strong primes have certain properties that make the product n hard to factor by specific factoring methods; such properties have included, for example, the existence of a large prime factor of p-1 and a large prime factor of p+1. The reason for these concerns is that some factoring methods (for instance, the Pollard p-1 and p+1 methods, see Question 2.3.4) are especially suited to primes p such that p-1 or p+1 has only small factors; strong primes are resistant to these attacks.
However, advances in factoring over the last ten years appear to have obviated the advantage of strong primes; the elliptic curve factoring algorithm is one such advance. The new factoring methods have as good a chance of success on strong primes as on "weak" primes. Therefore, choosing traditional "strong" primes alone does not significantly increase security. Choosing large enough primes is what matters. However, there is no danger in using strong, large primes, though it may take slightly longer to generate a strong prime than an arbitrary prime. It is possible new factoring algorithms may be developed in the future which once again target primes with certain properties. If this happens, choosing strong primes may once again help to increase security.
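To make the FAQ's point about the Pollard p-1 method concrete, here is a toy sketch in plain Python with deliberately tiny, illustrative numbers: when a prime factor p of n has a smooth p-1, the method recovers p almost immediately.

```python
# Toy Pollard p-1 factorisation. It succeeds quickly exactly when some prime
# factor p of n has a smooth p - 1 (only small prime factors), which is the
# weakness "strong primes" were designed to avoid.
from math import gcd

def pollard_p_minus_1(n, bound=10000):
    a = 2
    for k in range(2, bound + 1):
        a = pow(a, k, n)            # after this step, a == 2^(k!) mod n
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d                # non-trivial factor found
    return None

p = 487        # p - 1 = 486 = 2 * 3^5: very smooth
q = 1000003    # q - 1 = 2 * 3 * 166667, with 166667 prime: not smooth
print(pollard_p_minus_1(p * q))     # -> 487, found after only a dozen steps
```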
Question 3.1.5. How large a key should be used in RSA?
The size of an RSA key typically refers to the size of the modulus n. The two primes, p and q, which compose the modulus, should be of roughly equal length; this makes the modulus harder to factor than if one of the primes is much smaller than the other. If one chooses to use a 768-bit modulus, the primes should each have length approximately 384 bits. If the two primes are extremely close (identical except for, say, 100-200 bits), or more generally, if their difference is close to any predetermined amount, then there is a potential security risk, but the probability that two randomly chosen primes are so close is negligible.
The best size for an RSA modulus depends on one's security needs. The larger the modulus, the greater the security, but also the slower the RSA operations. One should choose a modulus length upon consideration, first, of the value of the protected data and how long it needs to be protected, and, second, of how powerful one's potential threats might be.
As of 2010, the largest factored RSA number was 768 bits long (232 decimal digits). Its factorization, by a state-of-the-art distributed implementation, took around fifteen hundred CPU years (two years of real time, on many hundreds of computers). This means that, at this date, no larger RSA key has been factored. In practice, RSA keys are typically 1024 to 2048 bits long. Some experts believe that 1024-bit keys may become breakable in the near future; few see any way that 4096-bit keys could be broken in the foreseeable future. Therefore, it is generally presumed that RSA is secure if n is sufficiently large.
Key strength generally follows current state of the art computing power. Key size is only part of a security plan. You also need to consider secure storage of your keys and how often you change keys.
Basically, you need to pick the widest key width that is compatible with the software you'll be using.
As of 2014, a good rule of thumb is to go with a minimum of 2048-bit RSA. It does depend on:
Speed and frequency of use
What you are protecting
Max width supported by your software
If having your key cracked is just an inconvenience that doesn't impact your finances or health, then you can err on the side of convenience. But if you really care about privacy, use the strongest key you can stand (no less than 2048).
A good doc is the OpenPGP Best Practices
https://we.riseup.net/riseuplabs+paow/openpgp-best-practices

Can creators of RSA read all encoded messages? [closed]

According to this page http://en.wikipedia.org/wiki/RSA_numbers each RSA version uses one single constant long number which is hard to factor.
Is this right?
For example, RSA-100 uses number
1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
which was factored in 1991.
Meanwhile RSA-210 uses number
245246644900278211976517663573088018467026787678332759743414451715061600830038587216952208399332071549103626827191679864079776723243005600592035631246561218465817904100131859299619933817012149335034875870551067
which has not been factored yet.
My question is: doesn't this mean that the CREATORS of any specific RSA version KNOW the factors and can consequently READ all encoded messages? If they don't know the factorization, then how could they have generated the number?
Those numbers are just sample challenge numbers, which RSA Laboratories used to judge the practical strength of the algorithm. The RSA asymmetric-key algorithm itself relies for its security on the difficulty of factoring numbers of a large size.
The approximate time or difficulty of factoring these numbers is an indicator of how other numbers of that size, as used in actual keys, will fare against the computational power we have.
These numbers, which were challenges, are described as follows.
(Quoting from Reference)
The RSA challenge numbers were generated using a secure process that guarantees that the factors of each number cannot be obtained by any method other than factoring the published value. No one, not even RSA Laboratories, knows the factors of any of the challenge numbers. The generation took place on a Compaq laptop PC with no network connection of any kind. The process proceeded as follows:
First, 30,000 random bytes were generated using a ComScire QNG hardware random number generator, attached to the laptop's parallel port.
The random bytes were used as the seed values for the B_GenerateKeyPair function, in version 4.0 of the RSA BSAFE library.
The private portion of the generated keypair was discarded. The public portion was exported, in DER format, to a disk file.
The moduli were extracted from the DER files and converted to decimal for posting on the Web page.
The laptop's hard drive was destroyed.
When it becomes fairly trivial and quick to reliably factorize numbers of a particular size, it usually implies that it is time to move to a longer number.
Look at Ron was wrong, Whit is right. It is a detailed analysis of duplicate RSA key use and the use of RSA keys using common factors (the problem you describe). There is a lot in the article but, to quote from its conclusion:
We checked the computational properties of millions of public keys that we collected on the web. The majority does not seem to suffer from obvious weaknesses and can be expected to provide the expected level of security. We found that on the order of 0.003% of public keys is incorrect, which does not seem to be unacceptable.
Yes, it is a problem and the problem will continue to grow but the sheer number of possible keys means the problem is not too serious, at least not yet. Note that the article does not cover the increasing ease of brute forcing shorter RSA keys, either.
Note that this is not an issue with the RSA algorithm or the random number generators used to generate keys (although the paper does mention seeding may still be an issue). It is the difficulty of checking a newly generated key against an ever expanding list of existing keys from an arbitrary, sometimes disconnected device. This differs from the known weak keys for DES, for example, where the weak keys are known upfront.
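As a small illustration of the common-factor problem the paper studied (toy numbers, chosen only for the example): if two moduli accidentally share a prime, a single GCD computation factors both.

```python
# Toy illustration of the shared-factor problem: two moduli built from a
# common prime (well-known small primes here, purely for the example) are
# both broken by a single gcd computation -- no factoring required.
from math import gcd

p_shared = 2147483647               # 2^31 - 1, a Mersenne prime
q1, q2 = 1000000007, 1000000009     # two other primes
n1, n2 = p_shared * q1, p_shared * q2

common = gcd(n1, n2)
print(common)                       # 2147483647: the shared prime leaks out
print(n1 // common, n2 // common)   # recovers q1 and q2; both keys are broken
```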