Isn't it difficult to recognize a successful decryption? - cryptography

When I hear about methods for breaking encryption algorithms, I notice the focus is often on how to decrypt very rapidly and how to reduce the search space. However, I always wonder how you can recognize a successful decryption, and why this doesn't become a bottleneck. Or is it usually assumed that an encrypted/decrypted pair is known?

From Cryptonomicon:
There is a compromise between the two
extremes of, on the one hand, not
knowing any of the plaintext at all,
and, on the other, knowing all of it.
In the Cryptonomicon that falls under
the heading of cribs. A crib is an
educated guess as to what words or
phrases might be present in the
message. For example if you were
decrypting German messages from World
War II, you might guess that the
plaintext included the phrase "HEIL
HITLER" or "SIEG HEIL." You might pick
out a sequence of ten characters at
random and say, "Let's assume that
this represented HEIL HITLER. If that
is the case, then what would it imply
about the remainder of the message?"
...
Sitting down in his office with the
fresh Arethusa intercepts, he went to
work, using FUNERAL as a crib: if this
group of seven letters decrypts to
FUNERAL, then what does the rest of
the message look like? Gibberish?
Okay, how about this group of seven
letters?

Generally, you have some idea of the format of the file you expect the decryption to produce, and most formats are easy to identify. For example, nearly all binary formats such as images, documents, and zip files have easily identifiable headers, while text files will contain only ASCII or only valid UTF-8 sequences.
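As a rough illustration, here is a minimal Python sketch of such a recogniser; the magic-byte list and the "mostly printable" threshold are my own illustrative choices, not a standard:

    def looks_like_plaintext(data: bytes) -> bool:
        # Well-known file headers ("magic bytes") for a few common formats.
        magics = [b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"PK\x03\x04", b"%PDF-"]
        if any(data.startswith(m) for m in magics):
            return True
        # Otherwise accept data that decodes as UTF-8 and is mostly printable text.
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            return False
        printable = sum(ch.isprintable() or ch in "\r\n\t" for ch in text)
        return printable / max(len(text), 1) > 0.95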

In asymmetric cryptography you usually have access to the public key. Therefore, any decryption of a ciphertext can be re-encrypted using the public key and compared to the original ciphertext, revealing whether the decryption was successful.
The same is true for symmetric encryption. If you think you have decrypted a ciphertext, you must also think you have found the key. You can therefore use that key to encrypt your presumably correct decrypted text and check whether the result is identical to the original ciphertext.
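A minimal sketch of that check, assuming a deterministic toy cipher (repeating-key XOR); with a real cipher mode such as AES-CBC or AES-GCM you would also need the exact IV/nonce that was used:

    def xor_crypt(key: bytes, data: bytes) -> bytes:
        # Toy deterministic cipher: repeating-key XOR (the same operation
        # encrypts and decrypts).
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    def guess_is_consistent(key_guess: bytes, plaintext_guess: bytes, ciphertext: bytes) -> bool:
        # Re-encrypt the guessed plaintext under the guessed key and compare
        # it with the observed ciphertext.
        return xor_crypt(key_guess, plaintext_guess) == ciphertext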

For symmetric encryption where the key length is shorter than the ciphertext length, you're guaranteed not to be able to produce every possible plaintext. You can probably guess, to some degree, what form your plaintext will take -- you probably know whether it's an image or XML, and even if you don't know that much you can assume you'll be able to run file on it and not get 'data'. You have to hope that there are only a few keys which give you even a vaguely sensible decryption, and only one which matches the form you are looking for.
If you have a sample plaintext (or partial plaintext), this gets a lot easier.
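To make the recognition step concrete, here is a hedged sketch of a brute-force search over a deliberately tiny key space that keeps only candidates containing a crib or decoding to mostly printable ASCII (again using a toy repeating-key XOR purely for illustration):

    import itertools
    import string

    def xor_decrypt(key: bytes, data: bytes) -> bytes:
        # Toy cipher: repeating-key XOR.
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    def brute_force(ciphertext: bytes, crib: bytes, key_len: int = 2):
        # Try every short all-uppercase key; yield candidates that contain the
        # crib or consist only of printable ASCII.
        for key_tuple in itertools.product(string.ascii_uppercase.encode(), repeat=key_len):
            key = bytes(key_tuple)
            candidate = xor_decrypt(key, ciphertext)
            if crib in candidate or all(32 <= b < 127 for b in candidate):
                yield key, candidate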

Related

Length extension attack doubts

So I've been studying this concept of length extension attacks, and there are a few things I noticed during my study that are not very clear to me.
1. Research papers explain how you can append some data to the end of a message and form new data. For example:
Desired New Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo&waffle=liege
(notice the two waffles). My question is: if a parser function on the server side can track duplicate attributes, wouldn't that make the entire length extension attack pointless? The server would notice the duplicate attributes. Is a proper parser that checks for duplicates a good defence against length extension attacks? I'm aware of the HMAC approach and other protections, but here I'm asking specifically about parsers.
2. Research says that only H(key|message) is vulnerable. They claim that H(message|key) won't work for the attacker because we would have to append a new key (which we obviously don't know). My question is: why would we have to append a new key? We don't do that when attacking H(key|message). Why can't we rely on the fact that we will pass the verification test (we would create the correct hash), and that if the parser tries to extract the key it would take the only key in the block we send and resume from there? Why would we have to send two keys? Why doesn't the attack against H(message|key) work?
My question is: if a parser function on the server side can track duplicate attributes, wouldn't that make the entire length extension attack pointless?
You are talking about a well-written parser. Writing software is hard and writing correct software is very hard.
In that example you see an overridden attribute. Can you say whether a good parser must take the first occurrence or the last one? What is the rule? There can be situations where the last one must be taken! So the attack may or may not apply; it depends on the situation. Considering that knowledge of the length extension attack goes back to the 1990s, it is remarkable that applicable targets could still be found; it was applied in the wild against the Flickr API in 2009, almost 20 years later:
Flickr's API Signature Forgery by Thai Duong and Juliano Rizzo, published Sep. 28, 2009.
My question is: why would we have to append a new key? We don't do that when attacking H(key|message). Why can't we rely on the fact that we will pass the verification test (we would create the correct hash), and that if the parser tries to extract the key it would take the only key in the block we send and resume from there? Why would we have to send two keys? Why doesn't the attack against H(message|key) work?
The attack is a signature forgery. The key is not known to the attacker, but they can still forge new signatures. The new message and its signature (the extended hash) are sent to the server; the server then takes the key, combines it with the received message as H(key|message), and performs the canonical verification. If the result matches, the signature is considered valid.
The parser doesn't extract the key; it already knows the key. The point is whether you can tell that the data really has been extended. The padding rule is simple: append a 1 bit, then fill with zeroes so that the last 64 (or 128) bits encode the message length (very simplified; for SHA-256, for example, the final padded length must be a multiple of 512 bits). To see that there is another padding embedded inside, you would have to check every block, and then you could claim there is an extension attack. Yes, you can do this; however, one of the aims of cryptography is also to reduce such dependencies. If we can design a signature scheme that eliminates the need for this checking, we prefer it over the others. That lets software developers write secure implementations more easily.
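For concreteness, here is a small sketch of that padding rule for SHA-256, byte-oriented and with illustrative naming:

    def sha256_padding(message_len: int) -> bytes:
        # SHA-256 appends a 0x80 byte, then zero bytes, then the message length
        # in bits as a 64-bit big-endian integer, so that the padded total is a
        # multiple of 64 bytes.
        zeros = (55 - message_len) % 64
        return b"\x80" + b"\x00" * zeros + (message_len * 8).to_bytes(8, "big")

    # In a length extension attack, the forger inserts exactly this "glue padding",
    # computed for len(key) + len(message), before the extension data.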
Why doesn't attack against H(message|key) work?
Simple: you take the extended message message|extended and send the extended hash H(message|key|extended) to the server. The server then takes the message message|extended, appends the key to get message|extended|key, and hashes it as H(message|extended|key), which is clearly not equal to the extended hash H(message|key|extended).
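A small sketch of the two verification routines (hashlib and hmac are standard library modules; the helper names are mine) makes the asymmetry visible: in the first construction the attacker's extension lands after the key, exactly where the compression function can be resumed, while in the second the server hashes the key after whatever the attacker sent:

    import hashlib
    import hmac

    def verify_key_prefix(key: bytes, message: bytes, tag: str) -> bool:
        # Vulnerable construction: H(key || message), extendable.
        return hmac.compare_digest(hashlib.sha256(key + message).hexdigest(), tag)

    def verify_key_suffix(key: bytes, message: bytes, tag: str) -> bool:
        # H(message || key): the key is hashed *after* any attacker-controlled
        # extension, so a resumed hash of the form H(message || key || extension)
        # never matches the server's H(message || extension || key).
        return hmac.compare_digest(hashlib.sha256(message + key).hexdigest(), tag)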
Note that the truncated versions of the SHA-2 series, like SHA-512/256, are resistant to length extension attacks. SHA-3 is immune by design, and that enables the simple KMAC signature scheme. BLAKE2 is also immune, since it is designed with the HAIFA construction.

Comparing Blob fields

My example is that I am using a fingerprint scanner; the fingerprint data is stored in a blob field, and I want to make sure that the same fingerprint does not get inserted twice. So what's the best way to compare these fields?
This does not seem to be about Delphi or blob fields at all, since "the same fingerprint" will rarely (if ever) happen. Even the same person will produce slightly different images every time they put a finger on the scanner. Therefore the real problem is not checking for equality but checking for close matches, which is a nontrivial problem in and of itself. You should consult specialized literature.

Writing a kama sutra cipher

I took up cryptography recently, and one of my tasks was to create a Kama Sutra cipher. Up to the point of generating the keys I have no problems. However, due to the nature of the Kama Sutra cipher, I believe the keys are not supposed to be hard-coded into the program, but rather generated for each plaintext it takes in.
What I understand is that the ciphertext's length should be the same as the length of the plaintext. The thing is: where do I place the key, so that as long as a ciphertext was generated by my program, the program can decipher it even after it has been closed and reopened? Given that this is an algorithm, I am sure I should not be storing the key in a separate flat file or database.
There is not much information about this cipher online. What I saw are implementations that let you randomise a key set and generate a ciphertext based on that key set. When decrypting, you also need to provide the same key set. Is this the correct way to implement it?
For those who have knowledge about this, please guide me along.
If you want to be able to decrypt the ciphertext, then you need to be able to recover the key whenever you need it. For a classical cipher, this was usually done by using the same key for multiple messages; see the Caesar cipher for an example. Caesar used a constant key, a -3/+3 shift, while Augustus used a +1/-1 shift.
You may want to consult your instructor as to whether a fixed key or a varying key is required.
It will be simpler to develop a fixed key version, and then to add varying key functionality on top. That way you can get the rest of the program working correctly.
You may also want to look at classical techniques for using a keyphrase to mix an alphabet.
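If a fixed-key version is acceptable, here is a minimal sketch assuming the usual description of the Kama Sutra (Vatsyayana) cipher: the alphabet is split into pairs and each letter is swapped with its partner. The seed-based key derivation and the helper names are illustrative choices, not part of any standard:

    import random

    def make_key(seed: int) -> dict:
        # Pair up the 26 letters; each letter maps to its partner and back.
        letters = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
        random.Random(seed).shuffle(letters)
        pairs = dict(zip(letters[:13], letters[13:]))
        pairs.update({v: k for k, v in pairs.items()})
        return pairs

    def crypt(text: str, key: dict) -> str:
        # The cipher is an involution: the same call encrypts and decrypts.
        return "".join(key.get(c, c) for c in text.upper())

    key = make_key(seed=42)  # fixed key, as suggested above
    assert crypt(crypt("ATTACK AT DAWN", key), key) == "ATTACK AT DAWN"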

Password strength check: comparing to previous passwords

Every now and then I come across applications that force you to change passwords once in a while. Almost universally, they have this strange requirement for the new password: it has to be "significantly" different from your previous password(s).
While at first this sounds logical, next thing I think is: how do they do that? Do they store my passwords in plain text? I would have accepted the answer that they do, if it wasn't for the fact that these are kinds of applications that pretend to care about security so much they force you to change your password if it is expired! Microsoft Exchange is one example of this.
I'm not very good at cryptography and hash functions, so my question is this: Is it possible to enforce this kind of policy without storing passwords in plain text?
Do you know how this policy is implemented in real world applications?
UPDATE: An Example.
I was recently changing my Microsoft Exchange password. I only use Web Access, so it might be a little different -- I have no idea.
So, it forces me to change my password. What I sometimes do is change it to something new and then change it back almost immediately. The freaky part is that it did not allow me to even change it back because of this. I tried changing it a little, by adding a letter in front or changing one symbol -- no luck, it kept complaining.
With a typical hash, the best you can do is see if the new password is exactly equal to previous ones. You can break the password into multiple hashes to allow a more flexible comparison, for example three hashes:
Alpha characters only
Numeric characters only
All other characters
You could, for example, require all the hashes to change for the new password to be accepted, to prevent users from just changing their password from SecretPassword01 to SecretPassword02.
A cryptography expert may weigh in here on whether this could be made as secure as a single hash.
NOTE that this is not as secure as a single hash, so before you go implementing it, make sure you have really done your research.
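For illustration only, here is a rough sketch of that multi-hash idea; as the note above says, it is weaker than a single hash (it leaks structure), and the salt handling is deliberately simplified:

    import hashlib

    def component_hashes(password: str, salt: bytes) -> tuple:
        # Hash the alphabetic, numeric, and remaining characters separately.
        alpha = "".join(c for c in password if c.isalpha())
        digits = "".join(c for c in password if c.isdigit())
        other = "".join(c for c in password if not c.isalnum())
        digest = lambda part: hashlib.sha256(salt + part.encode("utf-8")).hexdigest()
        return digest(alpha), digest(digits), digest(other)

    def all_components_changed(stored: tuple, new_password: str, salt: bytes) -> bool:
        # "SecretPassword01" -> "SecretPassword02" is rejected because the
        # alphabetic component hash does not change.
        return all(old != new for old, new in zip(stored, component_hashes(new_password, salt)))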
When changing a password you're usually asked for the old one to confirm your identity. It's then trivial to compare the old one and the new one to see how much they differ. TBH I don't know how to compare against several previous passwords without storing them, but that's getting into the territory of ridiculous policies anyway.
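A hedged sketch of that comparison at password-change time; the distance threshold is arbitrary, and nothing beyond the usual hash ever needs to be stored:

    def edit_distance(a: str, b: str) -> int:
        # Classic dynamic-programming Levenshtein distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def significantly_different(old_password: str, new_password: str, min_distance: int = 3) -> bool:
        # Both passwords are available in plaintext during the change request.
        return edit_distance(old_password, new_password) >= min_distance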

What are the best rules to follow for what characters to allow in a password?

Without thinking about it at all I just want to say I should allow every character. It gets hashed in any case, and I don't want to limit people who want to create strong passwords.
However, thinking about it more, there are plenty of characters whose effect I can't predict: foreign characters, ASCII symbols, and so on, to name a couple.
I tried to Google but I can't find any definitive standard for what people do. Even most professional organizations don't seem to know. It seems to be a common practice for many sites to disallow special characters altogether, which is just silly and not what I want to do.
Anyway, are there any standard recommendations for length, allowed characters, and so forth?
I'm not sure if it matters, but I'll be using ASP.NET w/ C#
Any printable, non-whitespace ASCII character (between 33 and 126 inclusive) is typically allowed in passwords. Many security professionals (and SO commenters) advise the use of a passphrase in place of a password, so you'd have to allow spaces. The argument is that due to their length, and since phrases aren't in a dictionary, passphrases are more difficult to crack than passwords. (A passphrase can also be easier to remember, so a legitimate user doesn't have to keep it written down on a sticky note right on their monitor.)
Some strong password generators use a hash, so I'd put a very high limit on the length (512 or 1024) just to be inclusive. Password generators today often yield strings of 32-128 characters, but who knows what hashes will be used in the next few years.
Non-ASCII characters certainly make things harder when it comes to entering the password on limited devices (mobiles, consoles etc.) - but usually not impossible. Arguably, if the user wants to do that, you should let them. It's easy enough to do a reasonable and consistent thing - encode in UTF-8 before hashing, for example. You'd only get into difficulties if some input device sent the characters as a composition (e.g. e + acute accent instead of "e acute") - but I suspect that wouldn't happen in real life. (You could decompose everything yourself, but that would be a lot of trouble to go to for an edge case.)
I'd restrict it to printable characters, however. Putting tabs, form feeds etc in a password really is asking for trouble.
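A small sketch of that "reasonable and consistent thing", using only the Python standard library (the normalization form, KDF, and iteration count are illustrative choices):

    import hashlib
    import os
    import unicodedata

    def prepare(password: str) -> bytes:
        # Normalize to NFC so composed and decomposed forms ("e acute" vs
        # "e" + combining accent) become the same byte sequence, then encode as UTF-8.
        return unicodedata.normalize("NFC", password).encode("utf-8")

    def hash_password(password: str, salt: bytes = None) -> tuple:
        # Salted, slow hash via PBKDF2-HMAC-SHA256 from the standard library.
        salt = salt or os.urandom(16)
        return salt, hashlib.pbkdf2_hmac("sha256", prepare(password), salt, 200_000)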
Not an expert, but I hate it when characters I choose, which are not that bizarre, get rejected. So I think I agree with your gut.
Short answer: allow as much as the system backing it can support. Nowadays there's really no excuse not to offer full Unicode support for text entry, and that includes passwords. I don't think you need to worry about problems with characters as long as they're handled literally (but I'm not a pro in this field -- beware of SQL injection).
I have a pet peeve against sites that impose restrictions on passwords... any kind of restriction. I like sites that will tell you how strong your password is and recommend you make it stronger, but forcing a user to type at least 8 characters, or to require both letters and numbers, etc. is just plain frustrating.
If you need to have a maximum field size (for example for storing in a database) try to make it large enough for anything that people would type out by hand. There's really no such thing as a too-large password field since there's always the potential to use an automated, generated strong password, but 64 to 128 characters would certainly suffice.
Fundamentally, most of the Unicode classes of characters should be allowed. Do skip, however, control characters (e.g. code points 0-31) and the byte order mark (0xFFFE and 0xFEFF). Further, you want to first canonicalize the representation to get rid of problems caused by differing representations. You might issue warnings for characters that seem too hard to enter, but users will guard against that themselves.
Remember: when you are storing passwords, all passwords should be hashed with a one-way algorithm like MD5 or SHA-1. Since these algorithms yield hexadecimal strings, you don't need to worry about SQL injection or anything like that.
So, as long as you can feed a character into MD5 or SHA-1, it should be accepted.
If you are talking about preventing SQL-injection type of attacks, it is probably a better idea to make sure your code does what it is supposed to do, rather than relying on restricting the input so the problem becomes easier.
For non-ascii characters, I don't see that as a more difficult problem if your input can be correctly represented as a binary string (and not as text), which is then passed to your hash function or key generator, etc.
Add another vote for "let the user include any and all characters that their interface allows them to enter". I wouldn't even disallow tab or control characters. Your software has the capability to accept arbitrary byte strings and hash them, so accept arbitrary byte strings as passwords. To do otherwise reduces the space which an attacker must search in a brute-force or dictionary attack.
(Of course, even if you do allow everything, 99% of users will still use their pet's name as their password...)
Eventually you may have to print out the clear password in a confirmation email sent to your users.
PS: Also consider encoding problems in the email: if it's not standard ASCII (e.g. Japanese characters), a user might not receive the email in the proper format, or simply might not be able to read it on another system because the fonts are not installed.
All this weighs in favour of the "printable" ASCII character range.