Am I mistaken in thinking that the security of RSA encryption, in general, is limited by the number of known prime numbers?
To crack (or create) a private key, one has to combine the right pair of prime numbers.
Wouldn't it be possible to publish a list of all the prime numbers in the range used by RSA? Or is that list sufficiently large to make this brute-force attack impractical? Wouldn't there be "commonly used" prime numbers?
RSA doesn't pick from a list of known primes: it generates a new very large number, then applies an algorithm to find a nearby number that is almost certainly prime. See this useful description of large prime generation:
The standard way to generate big prime numbers is to take a preselected random number of the desired length, apply a Fermat test (best with the base 2 as it can be optimized for speed) and then to apply a certain number of Miller-Rabin tests (depending on the length and the allowed error rate like 2^(-100)) to get a number which is very probably a prime number.
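To make that concrete, here is a minimal Python sketch of the generate-and-test loop described above (the helper names and the 40-round default are my own choices, not part of the quoted procedure):

    import random

    def is_probable_prime(n, rounds=40):
        """Miller-Rabin test: returns False if n is definitely composite,
        True if n passes all rounds (so n is very probably prime)."""
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
            if n % p == 0:
                return n == p
        # Write n - 1 as d * 2^s with d odd.
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False  # a is a witness: n is composite
        return True

    def random_prime(bits=512):
        """Pick random odd numbers of the desired length until one passes."""
        while True:
            candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
            # Cheap Fermat test base 2 first, as the quote suggests.
            if pow(2, candidate - 1, candidate) != 1:
                continue
            if is_probable_prime(candidate):
                return candidate

Each candidate is first screened with the cheap base-2 Fermat test; only survivors get the full battery of Miller-Rabin rounds.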
(You might ask why, in that case, we're not using this approach when we try to find larger and larger primes. The answer is that the largest known prime has over 17 million digits: far beyond even the very large numbers typically used in cryptography.)
As for whether collisions are possible: modern key sizes (depending on your desired security) range from 1024 to 4096 bits, which means the prime numbers range from 512 to 2048 bits. So your primes are on the order of 2^512: over 150 digits long.
We can very roughly estimate the density of primes using 1/ln(n) (see here). That means that among these 10^150 numbers, there are approximately 10^150/ln(10^150) primes, which works out to about 2.9x10^147 primes to choose from: certainly more than you could fit into any list!
So yes: the number of primes in that range is staggeringly enormous, and collisions are effectively impossible. (Even if you generated a trillion candidate primes, forming a septillion pairs, the chance of any two of them being the same prime would be about 10^-123.)
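If you want to check these estimates yourself, the arithmetic fits in a few lines of Python (this just re-derives the figures above):

    import math

    # Density of primes near 10^150 is roughly 1/ln(n).
    n_digits = 150
    ln_n = n_digits * math.log(10)          # ln(10^150) ≈ 345.4
    primes_est = 10**n_digits / ln_n        # primes available to choose from
    print(f"{primes_est:.1e}")              # -> 2.9e+147

    # Birthday-style collision estimate: a trillion random primes
    # form about 5e23 pairs; each pair collides with chance 1/primes_est.
    pairs = (10**12) ** 2 / 2
    print(f"{pairs / primes_est:.0e}")      # -> ~2e-124, on the order of 10^-123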
As new research comes out, the answer to your question becomes more interesting. In the paper "Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice" by David Adrian et al. (https://weakdh.org/imperfect-forward-secrecy-ccs15.pdf, accessed 10/16/2015), the researchers show that although there are probably enough prime numbers available to RSA's 1024-bit key set, certain groups of primes inside the whole set are far more likely to be used in practice because of implementation choices.
We estimate that even in the 1024-bit case, the computations are
plausible given nation-state resources. A small number of fixed or
standardized groups are used by millions of servers; performing
precomputation for a single 1024-bit group would allow passive
eavesdropping on 18% of popular HTTPS sites, and a second group would
allow decryption of traffic to 66% of IPsec VPNs and 26% of SSH
servers. A close reading of published NSA leaks shows that the
agency’s attacks on VPNs are consistent with having achieved such a
break. We conclude that moving to stronger key exchange methods should
be a priority for the Internet community.
The research also shows a flaw in TLS that could allow a man-in-the-middle attacker to downgrade the connection to 512-bit export-grade cryptography.
So, in answer to your question: on paper there is probably a sufficient quantity of prime numbers for RSA encryption, but in practice there is a security issue if you're hiding from a nation state.
Related
A Bitcoin mnemonic private key consists of 12 words. These words represent the private key.
I was wondering if it is possible to make it shorter than 12 words by increasing the set of words from which to choose. It would be easier to remember. Let's say we have a set of 10,000 words. How many words would be enough to represent one private key then? Does somebody know the exact calculation? Or any suggestion as to why this is not a good idea?
Thank you.
The seed phrase can contain any number of words that you want it to. Their function is simply to allow recreation of the private key for the Bitcoin wallet.
Regarding the length of 12 words, taken from here:
The English-language wordlist for the BIP39 standard has 2048 words,
so if the phrase contained only 12 random words, the number of
possible combinations would be 2048^12 = 2^132 and the phrase would
have 132 bits of security. However, some of the data in a BIP39 phrase
is not random, so the actual security of a 12-word BIP39 seed
phrase is only 128 bits. This is approximately the same strength as
all Bitcoin private keys, so most experts consider it to be
sufficiently secure.
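To answer the 10,000-word part of the question directly, the calculation goes like this (a sketch; the 128-bit target comes from the quote above):

    import math

    target_bits = 128                   # strength of a Bitcoin private key
    for wordlist_size in (2048, 10_000):
        bits_per_word = math.log2(wordlist_size)
        words_needed = math.ceil(target_bits / bits_per_word)
        print(wordlist_size, round(bits_per_word, 2), words_needed)
    # 2048 words  -> 11.0  bits/word -> 12 words
    # 10000 words -> 13.29 bits/word -> 10 words

So a 10,000-word list would only shorten the phrase from 12 words to 10; the saving is small, and the extra words would be rarer and easier to confuse.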
So although 12 would seem the correct amount, the important thing about the words is that they must be stored securely: preferably in memory, but at least written down and locked somewhere safe, and never stored in plaintext online or on your PC/devices.
One afterthought: by increasing the number of words used in the phrase, you could actually make it less secure, because the longer it is, the more likely it is that it will have to be recorded somewhere physical rather than held in your own memory.
According to the books that I have read, SHA (Secure Hash Algorithm) is collision resistant. But if the input space is 1024-bit numbers and the output space is a 512-bit message digest, shouldn't there be about (2^1024)/(2^512) = 2^512 inputs colliding on each output? Since the range is smaller than the domain being mapped, there must be collisions. Please explain where I am going wrong.
The chance of a collision does not depend on the input size. For a 512-bit hash, you would need on the order of 1.4×10^77 hash computations before a collision becomes likely; see the probability table in Wikipedia's birthday attack article.
Maybe your book has also mentioned the definition of collision resistance? It does not mean that no collisions exist (which is clearly not the case), but that you are not able, with feasible effort, to actually find two messages that produce the same hash.
a hash function H is collision resistant if it is hard to find two
inputs that hash to the same output; that is, two inputs a and b such
that H(a) = H(b), and a ≠ b
From Wikipedia
As you describe: since the input space (arbitrary size) is larger than the output space (e.g. 512 bits for SHA-512), collisions always exist.
"Collision resistant" means it is adequately unlikely for a collision to be found.
Your confusion is answered when considering how large the output space "512 bits" really is:
2^512 (the number of possible configurations of a 512 bit array) is of the order 10^154.
For comparison: The number of atoms in the visible universe is somewhere in the range of 10^80.
A million is 10^6.
So a million of our 'visible universes' has 10^86 atoms.
A million times a million universes has 10^92 atoms.
If you could store a single 512-bit value on a single atom, how many universes would you need to store all possible 512-bit hash values?
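If you want the answer to that rhetorical question, the division is quick (a one-line sketch):

    values = 2 ** 512              # ~1.34e154 possible 512-bit values
    atoms_per_universe = 10 ** 80  # rough count for the visible universe
    print(f"{values / atoms_per_universe:.1e}")   # -> 1.3e+74 universes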
Assuming the hash function is not broken, if you can produce new hashes at a rate R and have a total time t to do this, the probability p of obtaining a collision is approximately:
p = R*t / 2^(512/2)
(The exponent is halved; see "birthday attack". The expected search space for finding a collision on an n-bit hash is 2^(n/2).)
Let's plug in some example numbers:
The hash rate of the Bitcoin network is currently about R = 2×10^20 hashes per second (200 million terahashes per second).
Consider the situation that, since the beginning of the universe, the Bitcoin network's current hashing capacity had been available for the sole purpose of finding a collision, i.e. for an available time of t = 13.787×10^9 years;
then the probability that a collision would have been found by now is about 7.5 × 10^-38 %.
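Here is that estimate re-derived in Python (a sketch using the formula and figures above):

    # Rough birthday-bound estimate: p ≈ R*t / 2^(n/2) for an n-bit hash.
    SECONDS_PER_YEAR = 3.156e7

    R = 2e20                          # ~200 million TH/s, Bitcoin network
    t = 13.787e9 * SECONDS_PER_YEAR   # age of the universe, in seconds
    n = 512

    p = R * t / 2 ** (n // 2)
    print(f"{p:.1e}  ({p * 100:.1e} %)")   # -> 7.5e-40  (7.5e-38 %)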
Again, it is hard to appreciate how small this number is.
Edit: A similar question with a good answer is found here: https://crypto.stackexchange.com/questions/89558/are-sha-256-and-sha-512-collision-resistant
I am generating a 2048-bit safe prime for a Diffie-Hellman-type key, p such that p and (p-1)/2 are both prime.
How few iterations of Rabin-Miller can I use on both p and (p-1)/2 and still be confident of a cryptographically strong key? In the research I've done I've heard everything from 6 to 64 iterations for 1024-bit ordinary primes, so I'm a little confused at this point. And once that's established, does the number change if you are generating a safe prime rather than an ordinary one?
Computation time is at a premium, so this is a practical question - I'm basically wondering how to find out the lowest possible number of tests I can get away with while at the same time maintaining pretty much guaranteed security.
Let's assume that you select a prime p by selecting random values until you hit one for which Miller-Rabin says: that one looks like a prime. You use n rounds at most for the Miller-Rabin test. (For a so-called "safe prime", things are not changed, except that you run two nested tests.)
The probability that a random 1024-bit integer is prime is about 1/900. Now, you do not want to do anything stupid so you generate only odd values (an even 1024-bit integer is guaranteed non-prime), and, more generally, you run the Miller-Rabin test only if the value is not "obviously" non-prime, i.e. can be divided by a small prime. So you end up with trying about 300 values with Miller-Rabin before hitting a prime (on average). When the value is non-prime, Miller-Rabin will detect it with probability 3/4 at each round, so the number of Miller-Rabin rounds you will run on average for a single non-prime value is 1+(1/4)+(1/16)+... = 4/3. For the 300 values, this means about 400 rounds of Miller-Rabin, regardless of what you choose for n.
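Here is that arithmetic as a quick sanity check (a sketch using the answer's rough figures):

    p_detect = 3 / 4                     # chance one MR round exposes a composite
    rounds_per_composite = 1 / p_detect  # geometric distribution: 4/3 on average
    candidates = 300                     # composites tried before a prime shows up
    print(candidates * rounds_per_composite)   # -> 400.0 rounds, whatever n you pick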
So if you select n to be, e.g., 40, then the cost implied by n is less than 10% of the total computational cost. The random prime selection process is dominated by the test on non-primes, which are not impacted by the value of n you choose. I talked here about 1024-bit integers; for bigger numbers the choice of n is even less important since primes become sparser as size increases (for 2048-bit integers, the "10%" above become "5%").
Hence you can choose n=40 and be happy with it (or at least know that reducing n will not buy you much anyway). On the other hand, using an n greater than 40 is meaningless, because this would get you to probabilities lower than the risk of a simple miscomputation. Computers are hardware, they can have random failures. For instance, a primality test function could return "true" for a non-prime value because a cosmic ray (a high-energy particle hurtling through the Universe at high speed) happens to hit just the right transistor at the right time, flipping the return value from 0 ("false") to 1 ("true"). This is very unlikely -- but no less likely than probability 2^(-80). See this stackoverflow answer for a few more details. The bottom line is that regardless of how you make sure that an integer is prime, you still have an unavoidable probabilistic element, and 40 rounds of Miller-Rabin already give you the best that you can hope for.
To sum up, use 40 rounds.
The paper Average case error estimates for the strong probable prime test by Damgard-Landrock-Pomerance points out that, if you randomly select k-bit odd number n and apply t independent Rabin-Miller tests in succession, the probability that n is a composite has much stronger bounds.
In fact, for 3 <= t <= k/9 and k >= 21, the paper gives the bound p(k,t) < k^(3/2) * 2^t * t^(-1/2) * 4^(2 - sqrt(t*k)).
For a k=1024 bit prime, t=6 iterations give you an error rate less than 10^(-40).
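You can verify that figure by evaluating the bound directly (a sketch; the formula is the one quoted above):

    import math

    def dlp_bound(k, t):
        """Upper bound on the error probability p(k,t) from
        Damgard-Landrock-Pomerance, valid for 3 <= t <= k/9 and k >= 21."""
        return k ** 1.5 * 2 ** t * t ** -0.5 * 4 ** (2 - math.sqrt(t * k))

    print(dlp_bound(1024, 6))   # -> ~9e-41, below the 10^(-40) quoted above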
Each iteration of Rabin-Miller reduces the odds that the number is composite by a factor of 1/4.
So after 64 iterations, there is only 1 chance in 2^128 that the number is composite.
Assuming you are using these for a public key algorithm (e.g. RSA), and assuming you are combining that with a symmetric algorithm using (say) 128-bit keys, an adversary can guess your key with that probability.
The bottom line is to choose the number of iterations to put that probability within the ballpark of the other sizes you are choosing for your algorithm.
[update, to elaborate]
The answer depends entirely on what algorithms you are going to use the numbers for, and what the best known attacks are against those algorithms.
For example, according to Wikipedia:
As of 2003 RSA Security claims that 1024-bit RSA keys are equivalent in strength to 80-bit symmetric keys, 2048-bit RSA keys to 112-bit symmetric keys and 3072-bit RSA keys to 128-bit symmetric keys.
So, if you are planning to use these primes to generate (say) a 1024-bit RSA key, then there is no reason to run more than 40 iterations or so of Rabin-Miller. Why? Because by the time you hit a failure, an attacker could crack one of your keys anyway.
Of course, there is no reason not to perform more iterations, time permitting. There just isn't much point to doing so.
On the other hand, if you are generating 2048-bit RSA keys, then 56 (or so) iterations of Rabin-Miller is more appropriate.
Cryptography is typically built as a composition of primitives, like prime generation, RSA, SHA-2, and AES. If you want to make one of those primitives 2^900 times stronger than the others, you can, but it is a little like putting a 10-foot-steel vault door on a log cabin.
There is no fixed answer to your question. It depends on the strength of the other pieces going into your cryptographic system.
All that said, 2^-128 is a ludicrously tiny probability, so I would probably just use 64 iterations :-).
From the libgcrypt source:
/* We use 64 Rabin-Miller rounds which is better and thus
sufficient. We do not have a Lucas test implementaion thus we
can't do it in the X9.31 preferred way of running a few
Rabin-Miller followed by one Lucas test. */
cipher/primegen.c line# 1295
I would run two or three iterations of Miller-Rabin (i.e., strong Fermat probable prime) tests, making sure that one of the bases is 2.
Then I would run a strong Lucas probable prime test, choosing D, P, and Q with the method described here:
https://en.wikipedia.org/wiki/Baillie%E2%80%93PSW_primality_test
There are no known composites that pass this combination of Fermat and Lucas tests.
This is much faster than doing 40 Rabin-Miller iterations. In addition, as was pointed out by Pomerance, Selfridge, and Wagstaff in https://math.dartmouth.edu/~carlp/PDF/paper25.pdf, there are diminishing returns with multiple Fermat tests: if N is a pseudoprime to one base, then it is more likely than the average number to be a pseudoprime to other bases. That's why, for example, we see so many psp's base 2 are also psp's base 3.
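As a sketch of that test order in Python, using sympy's number-theory helpers (I am assuming the mr and is_strong_lucas_prp functions from sympy.ntheory.primetest; check the names against your sympy version):

    from sympy.ntheory.primetest import is_strong_lucas_prp, mr

    def baillie_psw(n):
        # Strong Fermat (Miller-Rabin) test base 2, then a strong Lucas test.
        # No composite is known to pass both.
        if n < 2:
            return False
        if n % 2 == 0:
            return n == 2
        return mr(n, [2]) and is_strong_lucas_prp(n)

    print(baillie_psw(2 ** 127 - 1))  # True: a Mersenne prime
    print(baillie_psw(3215031751))    # False: composite, even though it is a
                                      # strong pseudoprime to bases 2, 3, 5, 7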
A smaller probability is usually better, but I would take the actual probability value with a grain of salt. Albrecht et al., in "Prime and Prejudice: Primality Testing Under Adversarial Conditions", break a number of prime-testing routines in cryptographic libraries. In one example, the published probability is 1/2^80, but the number they construct is declared prime 1 time out of 16.
In several other examples, their number passes 100% of the time.
Only 2 iterations, assuming 2^-80 as a negligible probability.
See Alfred J. Menezes et al., Handbook of Applied Cryptography (1996), §4.4, p. 148.
Does it matter? Why not run for 1000 iterations? When searching for primes, you stop applying the Rabin-Miller test anyway the first time it fails, so for the time it takes to find a prime it doesn't really matter what the upper bound on the number of iterations is. You could even run a deterministic primality checking algorithm after those 1000 iterations to be completely sure.
That said, the probability that a composite number survives n iterations is at most 4^(-n).
I am currently working on a parallel computing project where I am trying to crack passwords using rainbow tables.
The first step I have thought of is to implement a very small version that cracks passwords of length 5 or 6 (only numeric passwords to begin with). To begin with, I have some questions about the configuration settings.
1. What table size should I start with? My first guess is a table with 1000 (initial, final) pairs. Is this a good size to start with?
2. Chain length: I really found no information online about how long a chain should be.
3. Reduction function: can someone give me any information about how I should go about building one?
Also, if anyone has any information or any example, it will be really helpful.
There is already a wealth of rainbow tables available online. Calculating rainbow tables simply moves the computational burden from attack time to precomputation time.
http://www.freerainbowtables.com/en/tables/
http://www.renderlab.net/projects/WPA-tables/
http://ophcrack.sourceforge.net/tables.php
http://www.codinghorror.com/blog/2007/09/rainbow-hash-cracking.html
It's a time-space tradeoff. The longer the chains are, the fewer of them you need, so the less space the table will take up, but the longer cracking each password will take.
So, the answer is always to build the biggest table you can in the space that you have available. This will determine your chain length and number of chains.
As for choosing the reduction function, it should be fast and behave pseudo-randomly. For your proposed plaintext set, you could just pick 20 bits from the hash and interpret them as a decimal number (choosing a different set of 20 bits at each step in the chain).
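For your numeric-password case, such a reduction function (plus one chain walk) might look like this; a sketch, with MD5 and the byte-window details as my own illustrative picks:

    import hashlib

    def reduce_hash(digest: bytes, step: int) -> str:
        """Map a hash back into the plaintext space (6-digit numeric passwords).
        Folding in the step index gives a different function per chain link."""
        offset = step % (len(digest) - 4)
        value = int.from_bytes(digest[offset:offset + 4], "big") + step
        return f"{value % 1_000_000:06d}"

    def build_chain(start: str, length: int) -> str:
        """Walk one chain: hash -> reduce -> hash -> ...; store only endpoints."""
        password = start
        for step in range(length):
            digest = hashlib.md5(password.encode()).digest()
            password = reduce_hash(digest, step)
        return password

    print(build_chain("123456", 1000))   # endpoint, stored next to "123456"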
Given two different strings S1 and S2 (S1 != S2) is it possible that:
SHA1(S1) == SHA1(S2)
is True?
If yes - with what probability?
If not - why not?
Is there an upper bound on the length of an input string for which the probability of getting duplicates is 0? Or is the calculation of SHA1 (and hence the probability of duplicates) independent of the length of the string?
The goal I am trying to achieve is to hash some sensitive ID string (possibly joined together with some other fields like parent ID), so that I can use the hash value as an ID instead (for example in the database).
Example:
Resource ID: X123
Parent ID: P123
I don't want to expose the nature of my resource identifiers by letting the client see "X123-P123".
Instead, I want to create a new column hash("X123-P123"); let's say it's AAAZZZ. Then the client can request the resource with ID AAAZZZ and not know about my internal IDs.
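For what it's worth, the kind of thing I have in mind is just (Python's hashlib):

    import hashlib

    def opaque_id(resource_id: str, parent_id: str) -> str:
        # SHA-1 of the joined internal IDs; the hex digest is 40 characters.
        return hashlib.sha1(f"{resource_id}-{parent_id}".encode()).hexdigest()

    print(opaque_id("X123", "P123"))

(One caveat I'm aware of: if the underlying IDs are guessable, an unkeyed hash can be reversed by brute-force enumeration; a keyed hash such as an HMAC avoids that.)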
What you describe is called a collision. Collisions necessarily exist, since SHA-1 accepts many more distinct messages as input than it can produce distinct outputs (SHA-1 may eat any string of bits up to 2^64 bits, but outputs only 160 bits; thus, at least one output value must pop up several times). This observation is valid for any function with an output smaller than its input, regardless of whether the function is a "good" hash function or not.
Assuming that SHA-1 behaves like a "random oracle" (a conceptual object which basically returns random values, with the sole restriction that once it has returned output v on input m, it must always thereafter return v on input m), then the probability of collision, for any two distinct strings S1 and S2, should be 2^(-160). Still under the assumption of SHA-1 behaving like a random oracle, if you collect many input strings, then you shall begin to observe collisions after having collected about 2^80 such strings.
(That's 2^80 and not 2^160 because, with 2^80 strings you can make about 2^159 pairs of strings. This is often called the "birthday paradox" because it comes as a surprise to most people when applied to collisions on birthdays. See the Wikipedia page on the subject.)
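Under the random-oracle assumption you can estimate this directly; a sketch of the standard birthday approximation:

    import math

    def collision_probability(k, bits=160):
        # P(at least one collision among k random hashes)
        # ≈ 1 - exp(-k^2 / 2^(bits + 1))
        return -math.expm1(-k * k / 2 ** (bits + 1))

    print(collision_probability(2 ** 80))  # ≈ 0.39: collisions start appearing
    print(collision_probability(2 ** 70))  # ≈ 5e-7: still very unlikely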
Now we strongly suspect that SHA-1 does not really behave like a random oracle, because the birthday-paradox approach is the optimal collision searching algorithm for a random oracle. Yet there is a published attack which should find a collision in about 2^63 steps, hence 2^17 = 131072 times faster than the birthday-paradox algorithm. Such an attack should not be doable on a true random oracle. Mind you, this attack has not been actually completed, it remains theoretical (some people tried but apparently could not find enough CPU power). (Update: as of early 2017, somebody did compute a SHA-1 collision with the above-mentioned method, and it worked exactly as predicted.) Yet, the theory looks sound and it really seems that SHA-1 is not a random oracle. Correspondingly, as for the probability of collision, well, all bets are off.
As for your third question: for a function with an n-bit output, there necessarily are collisions if you can input more than 2^n distinct messages, i.e. if the maximum input message length is greater than n. With a bound m lower than n, the answer is not as easy. If the function behaves as a random oracle, then the probability of the existence of a collision decreases with m, and not linearly, but rather with a steep cutoff around m = n/2. This is the same analysis as the birthday paradox. With SHA-1, this means that if m < 80 then chances are that there is no collision, while m > 80 makes the existence of at least one collision very probable (with m > 160 this becomes a certainty).
Note that there is a difference between "there exists a collision" and "you find a collision". Even when a collision must exist, you still have your 2^(-160) probability every time you try. What the previous paragraph means is that such a probability is rather meaningless if you cannot (conceptually) try 2^160 pairs of strings, e.g. because you restrict yourself to strings of less than 80 bits.
Yes, it is possible, because of the pigeonhole principle.
Most hashes (SHA-1 included) have a fixed output length, while the input is of arbitrary size. So if you try long enough, you can find a collision.
However, cryptographic hash functions (like the SHA family, the MD family, etc.) are designed to make such collisions hard to find. The best known attack takes 2^63 attempts to find a collision, so the chance of stumbling on one is about 2^(-63), which is 0 in practice.
git uses SHA1 hashes as IDs and there are still no known SHA1 collisions in 2014. Obviously, the SHA1 algorithm is magic. I think it's a good bet that collisions don't exist for strings of your length, as they would have been discovered by now. However, if you don't trust magic and are not a betting man, you could generate random strings and associate them with your IDs in your DB. But if you do use SHA1 hashes and become the first to discover a collision, you can just change your system to use random strings at that time, retaining the SHA1 hashes as the "random" strings for legacy IDs.
A collision is almost always possible in a hashing function. SHA-1, to date, has held up well: no one can predictably generate collisions for it. The danger comes when collisions can be predicted: then it is not necessary to know the original input to generate the same hash output.
For example, attacks against MD5 were mounted against SSL server certificate signing last year, as discussed on the Security Now podcast, episode 179. This allowed sophisticated attackers to generate a fake SSL server certificate for a rogue web site that appeared to be the real thing. For this reason, it is highly recommended to avoid purchasing MD5-signed certs.
What you are talking about is called a collision. Here is an article about SHA1 collisions:
http://www.rsa.com/rsalabs/node.asp?id=2927
Edit: So another answerer beat me to mentioning the pigeonhole principle, LOL, but to clarify, this is why it's called the pigeonhole principle: if you have some holes cut out for carrier pigeons to nest in, but more pigeons than holes, then some of the pigeons (an input value) must share a hole (the output value).