How can I crack two ciphertexts that were encrypted with the same key? For example, plaintext1 is encrypted with the key "abcdefg", and plaintext2 is encrypted with the same key "abcdefg".
I know that ciphertext1 ^ ciphertext2 is equal to plaintext1 ^ plaintext2, and that the method for cracking plaintext1 ^ plaintext2 is the same as the method for cracking a "book cipher" (also sometimes called a "running key cipher", although a running key cipher isn't quite the same thing as a book cipher, right?).
I know that I'm supposed to use a dictionary attack, but I'm not sure which dictionary/wordlist I should use, or what algorithm is used to do the cracking. Can anyone provide me with a link, or some code that shows how to crack it?
I'm new to cryptography, and I just wanted to do this for fun. Can anyone help me out? Thanks.
The most common attack is to "slide" a common (but not too short) word along and XOR it against successive positions in the combined stream. Where the word was used in one stream, the XOR will (usually) produce readable text for the other stream.
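A minimal sketch of that attack in Python, assuming ciphertext1 and ciphertext2 are byte strings of equal length; the crib word and the "looks like text" test here are deliberately crude:

    # Crib dragging: XOR the crib against each offset of c1 XOR c2; where
    # the crib lines up with one plaintext, the other plaintext appears.
    import string

    def crib_drag(c1: bytes, c2: bytes, crib: bytes):
        combined = bytes(a ^ b for a, b in zip(c1, c2))   # = p1 XOR p2
        printable = set(string.printable.encode())
        for i in range(len(combined) - len(crib) + 1):
            fragment = bytes(x ^ k for x, k in zip(combined[i:], crib))
            if all(b in printable for b in fragment):
                print(i, fragment)   # candidate fragment of the other plaintext

    # Illustrative call, with a common English word as the crib:
    # crib_drag(ciphertext1, ciphertext2, b" the ")

Each readable fragment can then be extended by guessing the surrounding words and re-dragging, recovering both plaintexts a few characters at a time.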
Is it a security problem if the key and IV are kept secret but are the same for every message?
Thanks.
UPDATE: I did some research and learned why an IV is used. As mentioned in the answer below, it is a protection against frequency attacks. And there are two requirements when creating an IV: uniqueness and unpredictability.
The algorithm I thought of for creating the AES key and IV is:
Get two values (x, y) from a certain elliptic-curve point agreed on beforehand by using asymmetric encryption.
Aes.Key <- HASH(x)
(string)unique <- timestamp()
(string)unpre <- y
Aes.IV <- HASH(unique + unpre)
And share data as {encryptedmessage, timestamp}
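In Python terms, a sketch of that derivation (assuming SHA-256 for HASH and placeholder bytes for the agreed point's coordinates; the IV is truncated to 16 bytes, the AES block size):

    # Key from x; IV from a unique part (timestamp) plus an unpredictable
    # part (y), as described above.
    import hashlib
    import time

    x = b"...x coordinate..."    # placeholder for the agreed point's x value
    y = b"...y coordinate..."    # placeholder for the agreed point's y value

    key = hashlib.sha256(x).digest()                  # 32 bytes: an AES-256 key
    timestamp = str(int(time.time())).encode()        # the "unique" input
    iv = hashlib.sha256(timestamp + y).digest()[:16]  # AES block size is 16 bytes
    # Share {encryptedmessage, timestamp} so the receiver can rebuild the IV.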
Is this logic alright?
You need a random (or at least pseudo-random) initialization vector (IV) for block cipher modes such as CBC with AES.
Ripped straight from Wikipedia:
Randomization is crucial for encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between segments of the encrypted message.
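To see this concretely, a sketch with the pyca/cryptography package (assuming AES in CBC mode and a plaintext of exactly one block): same key, same plaintext, yet a different ciphertext each run because the IV is fresh:

    # A fresh random IV makes repeated encryptions of one plaintext differ.
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(32)
    plaintext = b"exactly sixteen!"          # one AES block, to skip padding

    def encrypt_cbc(key: bytes, pt: bytes):
        iv = os.urandom(16)                  # new random IV for every message
        enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
        return iv, enc.update(pt) + enc.finalize()

    print(encrypt_cbc(key, plaintext))       # different output...
    print(encrypt_cbc(key, plaintext))       # ...every time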
I've been reading this article on elliptic-curve crypto and how it works:
http://arstechnica.com/security/2013/10/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/
In the article, they state:
It turns out that if you have two points [on an elliptic curve], an initial point "dotted" with itself n times to arrive at a final point [on the curve], finding out n when you only know the final point and the first point is hard.
It goes on to state that the only way to find n (if you only have the first and final points, and you know the curve equation) is to repeatedly dot the initial point until you finally reach the matching final point.
I think I understand all this, but what confuses me is this: if n is the private key, and the final point corresponds to the public key (which I think is the case), then doesn't it take exactly the same amount of work to compute the public key from the private key as it does the private from the public (both just have to repeatedly dot a point on the curve)? Am I misunderstanding something about what the article is saying?
The one-way attribute of ECC and RSA is due to the Chinese Remainder Theorem (CRT): a series of arithmetic divisions where only the remainder is kept (aka the modulo operation, %), which results in information loss in the output. As a result, the person with the keys takes one direct path to generate the output, while any would-be attacker has to exhaust a massive number of possible paths in order to determine what key was used to create the output. If simple division were used instead of a modulo, key data would be present in the output and it couldn't be used for cryptography.
If you lived in a world where you had a computer powerful enough to exhaust all possibilities, then this wouldn't be useful as a cryptographic primitive. The computers we have now are fairly powerful, so we balance the power of our modern machines with a key size that introduces a large enough range of possibilities that it cannot be exhausted in a timeframe that matters.
This problem is a subset of the P vs NP problem set, so perhaps proving P=NP may lead to a way of undermining the one-way aspect of asymmetric cryptography. We know that there is a way to break it with a quantum computer running Shor's algorithm. Shor's algorithm has shown that we can defeat the so-called "trapdoor", or one-way, attribute; it is still, however, an expensive attack to conduct.
The following lecture is my favorite description of the underlying math. It shows that there are many possible solutions in one direction, forcing an attacker to exhaust them all, and only one solution in the other:
https://www.youtube.com/watch?v=ru7mWZJlRQg
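As a toy sketch of that asymmetry, with deliberately tiny, illustrative numbers: the forward direction (a modular power) is one cheap operation, while going backwards means exhausting candidate exponents one by one:

    # Forward: one fast modular exponentiation. Backward: exhaust exponents.
    p, g = 2_147_483_647, 5      # toy modulus (a small prime) and base
    secret = 1_234_567

    public = pow(g, secret, p)   # fast, even for huge exponents

    def brute_force_exponent(g, target, p):
        # Try g^1, g^2, g^3, ... until one matches the target.
        acc = 1
        for n in range(1, p):
            acc = acc * g % p
            if acc == target:
                return n

    n = brute_force_exponent(g, public, p)
    assert pow(g, n, p) == public   # recovered the secret (or an equivalent exponent)

Real key sizes push the search space far beyond what this loop could ever cover.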
EDIT: I previously stated that n is not the private key. In your example, n is either the server's or the client's private key.
How it works is that there is a starting point known to everybody.
You select a random integer k and do the "dotting" operation k times, then send the resulting point to the server. (k is your private key.)
The server does the same with the starting point, but q times, and sends its result to you. (q is the server's private key.)
You take the point you got from the server and "dot" it k times. The final point is the starting point "dotted" k*q times.
The server does the same with the point it got from you. Again, its final point is the starting point "dotted" q*k times.
That means the final point (the starting point "dotted" k*q times) is the shared secret, since all an attacker knows is the starting point, the starting point dotted k times, and the starting point dotted q times. Given only those values, it is practically impossible to find the final point without knowing k or q.
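A toy sketch of that round trip, using modular exponentiation in place of elliptic-curve "dotting" (illustrative numbers only; real ECDH does the same dance with point multiplication):

    # Both sides end up at the starting point "dotted" k*q times.
    p, G = 997, 5                     # public modulus and starting value (toy)
    k = 123                           # client's private key
    q = 456                           # server's private key

    client_sends = pow(G, k, p)       # G "dotted" k times
    server_sends = pow(G, q, p)       # G "dotted" q times

    client_secret = pow(server_sends, k, p)   # (G^q)^k = G^(k*q)
    server_secret = pow(client_sends, q, p)   # (G^k)^q = G^(k*q)
    assert client_secret == server_secret     # same shared secret on both sides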
EDIT: No, it doesn't take the same time to compute k from G = kP given the known values of G (the sent point) and P (the starting point). There is more in the comment section, and:
For raising to a power, see exponentiation by squaring.
For ECC point multiplication, see point multiplication.
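For instance, a minimal Python version of exponentiation by squaring; double-and-add, the elliptic-curve analogue, has exactly the same loop structure with point doubling and point addition in place of squaring and multiplying:

    # Computes g^n mod p in O(log n) multiplications instead of n of them.
    def power_mod(g, n, p):
        result = 1
        base = g % p
        while n:
            if n & 1:                  # current low bit of n is set
                result = result * base % p
            base = base * base % p     # square for the next bit
            n >>= 1
        return result

    assert power_mod(5, 1_234_567, 2_147_483_647) == pow(5, 1_234_567, 2_147_483_647)

This is why computing the public key from the private key is cheap (logarithmic in n), while recovering n from the result is not.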
I'm trying to test my RSA implementation's correctness against the RSACryptoPAD example here: http://www.codeproject.com/Articles/10877/Public-Key-RSA-Encryption-in-C-NET
But it always produces different encryption results. Isn't RSA just a modular exponentiation? Yet the program can correctly decrypt all the different encrypted texts. My results match those of the http://nmichaels.org/rsa.py site. I think RSACryptoPAD is doing something else?
The code uses RSACryptoServiceProvider. The line
byte[] encryptedBytes = rsaCryptoServiceProvider.Encrypt( tempBytes, true );
tells it to encrypt using OAEP padding, which introduces randomness into the padding. The point of doing this is that encrypting the same plaintext twice will yield different results (as you are seeing). This is a good thing, as it stops an information leak where an attacker sees that you sent the same message multiple times.
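The same behavior is easy to reproduce outside .NET; for example, a sketch with the pyca/cryptography package (Python), assuming a freshly generated 2048-bit key:

    # OAEP's randomized padding: identical plaintexts, different ciphertexts,
    # yet both decrypt back to the original message.
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    c1 = public_key.encrypt(b"same message", oaep)
    c2 = public_key.encrypt(b"same message", oaep)
    print(c1 != c2)                                  # True: ciphertexts differ
    print(private_key.decrypt(c1, oaep) == private_key.decrypt(c2, oaep))  # True

Your "bare" mod-and-power results match rsa.py because both are textbook RSA with no padding; RSACryptoPAD layers OAEP on top.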
There's a great historical example of why this is important: "AF is short of water", the intercepted message that confirmed for US cryptanalysts that "AF" meant Midway before the Battle of Midway.
I've been doing some preliminary research in the area of message digests, specifically collision attacks on cryptographic hash functions such as MD5 and SHA-1, such as the PostScript example and the X.509 certificate duplicates.
From what I can tell, in the case of the PostScript attack, specific data was generated and embedded within the header of the PostScript file (which is ignored during rendering) to bring the internal state of the MD5 computation to a point where the modified wording of the document leads to a final MD value equal to that of the original PostScript file.
The X.509 attack took a similar approach, whereby data was injected within the comment/whitespace sections of the certificate.
OK, so here is my question, which I can't seem to find anyone else asking:
Why isn't the length of ONLY the data being consumed added as a final block to the MD calculation?
In the case of X.509 - Why is the whitespace and comments being taken into account as part of the MD?
Wouldn't a simple process such as one of the following be enough to resolve the proposed collision attacks?
MD(M + |M|) = xyz
MD(M + |M| + |M| * magicseed_0 +...+ |M| * magicseed_n) = xyz
where:
M: the message
|M|: the size of the message
MD: the message digest function (e.g. MD5, SHA, Whirlpool, etc.)
xyz: the pairing of the actual message digest value for the message M and |M|, i.e. <M, |M|>
magicseed_i: a set of random values generated with a seed based on the internal state prior to the size being added
This technique should work, as to date all such collision attacks rely on adding more data to the original message.
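For concreteness, a minimal sketch of the first variant, with SHA-256 standing in for MD and |M| encoded as a fixed-width field. (One caveat worth noting: Merkle-Damgard hashes such as MD5 and SHA-1 already append the message length in their final padding block, and this does not stop the known collision attacks, because both colliding messages end up the same length.)

    # MD(M + |M|): append the message length, then hash.
    import hashlib

    def length_bound_digest(message: bytes) -> bytes:
        encoded_len = len(message).to_bytes(8, "big")   # |M| as 8 big-endian bytes
        return hashlib.sha256(message + encoded_len).digest()

    print(length_bound_digest(b"hello").hex())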
In short, generating a collision message such that:
it not only produces the same MD,
but is also comprehensible/parsable/compliant,
and is also the same size as the original message,
is immensely difficult, if not near impossible. Has this approach ever been discussed? Any links to papers etc. would be nice.
Further question: what is the lower bound for collisions of messages of common length for a hash function H chosen randomly from U, where U is the set of universal hash functions?
Is it 1/N (where N is 2^|M|) or is it greater? If it is greater, that implies there is more than one message of length |M| that will map to the same MD value for a given H.
If that is the case, how practical is it to find these other messages? Brute force would be O(2^|M|); is there a method with time complexity less than brute force?
I can't speak for the rest of the questions, but the first one is fairly simple: adding length data to the input of the MD5, at any stage of the hashing process (1st block, Nth block, final block), just changes the output hash. You couldn't retrieve that length from the output hash string afterwards. It's also not inconceivable that a collision could be produced by another string of the exact same length in the first place, so saying "the original string was 17 bytes" is meaningless, because the colliding string could also be 17 bytes.
e.g.
md5("abce(17bytes)fghi") = md5("abdefghi<long sequence of text to produce collision>")
is still possible.
In the case of X.509 certificates specifically, the "comments" are not comments in the programming language sense: they are simply additional attributes with an OID that indicates they are to be interpreted as comments. The signature on a certificate is defined to be over the DER representation of the entire tbsCertificate ('to be signed' certificate) structure which includes all the additional attributes.
Hash function design is pretty deep theory, though, and might be better served on the Theoretical CS Stack Exchange.
As @Marc points out, though, as long as more bits can be modified than the output of the hash function contains, then by the pigeonhole principle a collision must exist for some pair of inputs. Because cryptographic hash functions are in general designed to behave pseudo-randomly over their inputs, collisions will tend to be uniformly distributed over the possible inputs.
EDIT: Incorporating the message length into the final block of the hash function would be equivalent to appending the length of everything that has gone before to the input message, so there's no real need to modify the hash function to do this itself; rather, specify it as part of the usage in a given context. I can see where this would make some types of collision attacks harder to pull off, since if you change the message length there's a changed field "downstream" of the area modified by the attack. However, this wouldn't necessarily impede the X.509 intermediate CA forgery attack since the length of the tbsCertificate is not modified.
What is the difference between a multi-collision in a hash function and a first or second preimage?
First preimage attacks: given a hash h, find a message m such that
hash(m) = h.
Second preimage attacks: given a fixed message m1, find a different message m2 such that
hash(m2) = hash(m1).
Multi-collision attacks: generate a series of messages m1, m2, ... mN, such that
hash(m1) = hash(m2) = ... = hash(mN).
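To make the definitions concrete, here is a toy multi-collision search against a deliberately weakened hash (MD5 truncated to 16 bits); real attacks target the full-width function, but the goal has the same shape:

    # Collect messages by their truncated digest until one bucket holds
    # three messages: a 3-way multi-collision on the weakened hash.
    import hashlib
    import itertools
    from collections import defaultdict

    def weak_hash(m: bytes) -> bytes:
        return hashlib.md5(m).digest()[:2]   # keep only 16 bits of the digest

    buckets = defaultdict(list)
    for i in itertools.count():
        m = str(i).encode()
        h = weak_hash(m)
        buckets[h].append(m)
        if len(buckets[h]) == 3:
            print(h.hex(), buckets[h])       # three messages, one digest
            break

Note that the attacker here gets to choose every colliding message; a preimage or second-preimage attack would have to hit a digest or message fixed in advance.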
Wikipedia tells us that a preimage attack differs from a collision attack in that there is a fixed hash or message that is being attacked.
I am confused by papers which make statements like:
The techniques are not only efficient to search for collisions, but also applicable to explore the second-preimage of MD4. About the second-preimage attack, they showed that a random message was a weak message with probability 2^-122 and it only needed a one-time MD4 computation to find the second-preimage corresponding to the weak message.
The Second-Preimage Attack on MD4
If I understand correctly, the authors seem to be saying that they have developed a multi-collision attack encompassing a large enough set of messages that, given a random message, there is a real though extremely small chance it will overlap with one of their multi-collisions.
I have seen similar arguments in many papers. My question is: when does an attack stop being a multi-collision attack and become a second-preimage attack?
If a multi-collision collides with 2^300 other messages, does that count as a second preimage, since the multi-collision could be used to find a "preimage" of one of the messages it collides with? Where is the dividing line: 2^60, 2^100, 2^1000?
What if you can generate a preimage of all hash digests that begin with 23? Certainly it doesn't meet the strict definition of a preimage, but it is also very certainly a serious flaw in the cryptographic hash function.
If someone has a large multi-collision, then they could always produce a second preimage for any message whose hash collided with the multi-collision. For instance,
hash(m1) = hash(m2) = hash(m3) = h
Someone requests the preimage of h, and they respond with m2. When does this stop being silly and become a real attack?
Rules of thumb? Know of any good resources on evaluating hash function attacks?
Related Links:
HASH COLLISION Q&A
Cryptographic Hashes
The eHash Main Page
It is about the attack scenario. The difference lies in the choice of input: in a multi-collision there is free choice of all the inputs, while a second preimage is about finding a second input which has the same output as a specified input.
When a function doesn't have multi-collision resistance, it may be possible to find collisions for some kinds of messages, but not all of them. So this doesn't imply a second-preimage weakness.
You did a lot of research before posting the question. I cannot answer much aside from the resources question, which is: I use the Handbook of Applied Cryptography by Menezes/van Oorschot/Vanstone for almost everything I ever wanted to know on topics of cryptography, including hashes.
Maybe you'll find a copy at your university's library. Good luck.