How to create a hash function to mask confidential informations? - cryptography

In the current project I would like to create my own hash function but so far haven't gained much theoretical background on hashing principle.
I would be very thankful if anyone of you could suggest any useful resource about the theory of hashing, cryptography and practical implementations of hash functions.
Thank you!
P.S. As hashing blocks of informations in this case is a part of larger research project I would like to create a hash function on my own and this way learn the principle rather than use the existing libraries. The informations I am working on will stay in house so there is no need to worry about the possible attacks.

Don't. Existing encryption and hashing algorithms (as pointed out in the comments above, they have little to do with each other) have been designed by experts and extensively peer-reviewed. Anything you write from scratch will suck in comparison. Guaranteed. Really. The only thing you'll gain is a false sense of security -- your algorithm won't be peer-reviewed, so you'll think it's more secure than it actually is.
But if you do want to know more about the theory (and gain an appreciation for why you shouldn't do it yourself), read "Applied Cryptography" by Bruce Schneier. You won't find a better resource.
Brush up on your math first.

First of all, if you use the right terminology, you'll be better able to find helpful resources.
"Encryption" is performed with ciphers, not cryptographic hash functions. You'll never find a reliable reference that mentions a hash as an "encryption function". So, if you are trying to learn about hashes, leave "encryption" out.
Another term for "cryptographic hash" is "message digest," so keep that in mind as you search.
Many chapters of an excellent book, The Handbook of Applied Cryptography are available for free online. Especially check out Chapter 9, "Hash Functions and Data Integrity."

Instead of writing your own hashing function have you considered using a standard hashing function from a library and then salting the data you're hashing? That is common practice and ensures that anyone with software that decrypts data with standard encryption functions doesn't intercept your data and decipher it.

Like the others said, do not make a new kind of hash (the code will get complicated and you might as well reinvent SHA1 or MD5.) Study cryptography first. But if you are willing to, look at existing hashes (most are based on another). Or you can look at the hash model. The hash model looks like:
A mixing stage (mix up the contents and modify)
A combining stage (combine the data in the mixing stage with the initial state [the original hash])
Or maybe start with something simple and build up from it (to make a secure hash).

Related

Storing PBKDF2 Settings Alongside Password

I'm experimenting with PBKDF2 for my passwords right now, and it dawned on me that if I were to ever upgrade to a faster machine in the future, I would want to increase the number of PBKDF2 iterations. However, this would invalidate all the current passwords that I have stored. One idea I've seen was to store the PBKDF2 settings along with the password (similar to how you store the salt) such as the iteration count and the PRF used (SHA-256, SHA-512) at the time of the hash creation. It sounds like a good idea in terms of backwards compatibility, but I wanted to know if there are any drawbacks to doing this. Any insight into this would be appreciated.
You are definitely taking the right direction here. Many systems store just the salt but where is the rest of the parameters required to perform PBKDF2? Hardcoded! And hardcoding parameters of cryptographic functions is almost never a good idea.
Only drawback I see is that when you store all the parameters your database will probably take a little more space but your future upgrades will be much easier and straightforward.
BTW RFC 2898 defines structure called PBKDF2-params which was designed as a data holder for all the public parameters of PBKDF2 algorithm. Use it at least as an inspiration so you won't forget any important parameter.

AES encryption/decryption for a beginner

I am trying to encrypt an NSString to both NSString and NSData in Objective-C and so I began a search.
I started off here, but that went way over my head, unfortunately.
I then found myself at this post and it came across to be very easy to follow, so I went along and tried to figure out the implementation. After looking over the implementation, I saw the second answer in the post and saw he had more adaptable implementations, which brought me to his gist. As per the gist readme, he "took down this Gist due to concerns about the security of the encryption/decryption". That leads me to believe that the security of the implementation from above has security flaws as well.
From that gist, however, he mentioned another alternative that I could use for encryption. After taking a look at the code, I noticed that it generates NSData with "a header, encryption salt, HMAC salt, IV, ciphertext, and HMAC". I know how to handle that to decode using the same library again, but how would I pass this off to a server guy, given that I don't quite know what I'm sending to him?
At the root of it all, I'm in over my head. Given what I said above and knowing that I don't have the time to take on a lot of learning for this, unless if it is absolutely necessary, how should I best handle going about this encoding/decoding process, given a private key with the end goal of shipping it off to a server that is not designed by me? (How's that for a run on sentence!)
Maybe you should ask the server guy? When ever you have encryption between too parties you have to have some kind of agreement on the format of that data, the raw primitives don't handle that alone, not to mention it's easy to mess things up security wise dealing with just the primitives and the desire to just send the aes ciphertext alone is going to cause mistakes.
RNCryptor, which you mention, is a high level encryption library it defines a simple format that others would have to conform too, it's simple thus helps going cross platform, but it has that extra that you need to do AES properly. There are other libraries like that too (NaCL, GPGME, and Keyczar), that are not as simple in format, but simple in usage, so you'd need to be able to use the library on both ends, but I'd highly recommend that you uses something like that, if you can, rather than rolling your own.
Keyczar specifically exists for java, python, c++, c# and go, so if you can use the c++ version on the iOS (or Mac, which ever you are targeting on the client) you might be good on the server as there are several choices.

Is SHA1 still secure for use as hash function in PBKDF2?

As there have been significant advances in the cryptoanalysis of SHA1 it's supposed to be phased out in favor of SHA2 (wikipedia).
For use as underlying hash function in PBKDF2, however, it's basically used as a PRNG. As such it should be still secure to use SHA1 as hash for PBKDF2, right?
None of the currently known weaknesses on SHA-1 has any impact on its security when used in HMAC, a fortiori when used in PBKDF2. For that matter, MD5 would be fine too (but not MD4).
However, SHA-1 is not good for public relations: if, in 2011, you use SHA-1, then you must prepare yourself to have to justify that choice. On the other hand, SHA-256 is a fine "default function" and nobody will question it.
There is no performance issue in PBKDF2 (PBKDF2 includes an "iteration count" meant to make it exactly as slow as needed) so there is very little reason to prefer SHA-1 over SHA-256 here. However, if you have an existing, deployed system which uses PBKDF2-with-SHA-1, then there is no immediate need to "fix" it.
Sure. SHA-256, or larger, might be more efficient if you want to generate more key material.
But PBKDF2-HMAC-SHA1 is fine. Also standard HMAC use has not been compromised, but again, longer hashes are in principle more secure in that scenario.
The attacks on SHA1 which caused a lot of public turmoil make it possible to construct a message which has the same hash as a different message. This is of course always possible (in principle) for every hash function, since a hash function has fewer output bits than input bits. However, it is normally not likely to happen by accident, and doing it on purpose should be computationally not feasible.
From a "ensure message integrity" point of view, this can be seen as a disaster.
On the other hand, for the purpose of generating random numbers, this has absolutely no bearing.

How does password-based encryption technically work?

Say I have some data and a password, and I want to encrypt the data in such a way that it can only be recovered with the right password.
How does this technically work (i.e. how to implement this)? I often hear people use bitshifting for encryption, but how do you base that on a password? How does password-based encryption work?
An example is Mac OS X FileVault
Thanks.
If you give sample code, preferably in C, Objective-C or pseudocode.
For (symmetric) encryption you need a secret key for encryption and decryption.
Usually, the password you supply is used as the source of this key. For various security reasons, the password is not (and often cannot, due to requirements of the cipher used) directly used as the key. Instead, a key derivation function is used to generate the key from the password.
This is why passwords for encryption must be long and fairly random: Otherwise the resulting key will only come from a very small subset of possible keys, and these can then simply all be tried, thus brute-forcing the encryption.
As to code examples, there are several possibilities:
look at the source code of a crypto library, such as OpenSSL
look at the source code of a program that implements encryption, such as GnuPG
google some sample source code for a simple encryption algorithm, or a key derivation function, and try to understand it
This depends on what you want to learn.
You'll need to look to other resources for a deep explanation, as this question is extremely broad.
Speaking generally: you use a password as a "seed" for an encryption key, as sleske pointed out. Then you use this key to apply a two-way encryption algorithm (i.e. one that can be applied once to encrypt and again to decrypt). When you apply the algorithm to a piece of data, it becomes encrypted in such a way that you could never get the data back out again without using the same key, and you can't practically produce the same key without having the same password as a seed.
If you're interested in crypto, read Applied Cryptography by Bruce Schneier. Excellent read, lots of examples. It goes through many different cryptography types.
An easy way, but not exactly secure, is to rotate each byte by a number determined by the password. You can use a hash code from a string, or count the number of characters, or whatever for the number.
What you are probably thinking of, though, is public key encryption. Here is a link to a document that will tell you the math for it - you'll have to work out the implementation details yourself, but it's not that hard once you understand the math.
http://mathaware.org/mam/06/Kaliski.pdf
The basic building block of most block ciphers is a construction called a Feistel Network. It's reasonably easy to understand.
Stream ciphers are even simpler - they're essentially just pseudo-random number generators, albeit with some important security properties, where the initial internal state is derived from the key.
Password based encryption IS symmetric. The input usually consists of a salt in addition to the password. FooBabel has a cool app where you can play around with this... currently they hard code the Salt to an array of eight bytes (zero to seven) for simplicity. I put in a request to see that they let users input the salt. Anyway, here it is - PBECrypto

Does partial known plaintext weaken a hash?

This is a question about an authentication scheme.
Say I have a shared secret string S, and two computers, C1 and C2
Computer one (C1) sends a random string (R) to computer two (C2)
C2 hashes (say SHA256) the concatenation of S and R (SR)
C2 sends the hash of SR to C1, along with some instructions
C1 compares the received hash of SR with it's own hash of SR and executes the instructions if they match
Wash, rinse, repeat with different values of R
Now, what I want to know is if someone intercepts a whole bunch of R values, and a whole bunch of SR hashes, can they use that as a "crib" to work out what S is, thus allowing them to forge instructions?
I'm already aware of the potential for a MITM attack here (attacker intercepts response, changes the instructions and forwards it on).
I honestly don't know what I'm dealing with here, I only have a bit of historical knowledge about encryption but that included the use of cribs to break them. I'm not a theorist, so anything you can definitively tell me about specific strong hashes would be great.
Alternate authentication schemes are also welcome, assuming the constraints of an existing shared secret string like in this example. Would I be better off just using S as a key for AES? If I do that, can I still use this in the encrypted message to prevent replay attacks?
Any and all advice welcome, I sort of deviated from my question at the end, so feel free to deviate in your answers!
What you're talking about is called a message authentication code - a MAC. If the secret is sufficiently large (such that it cannot be brute forced in reasonable time) and the MAC is properly implemented, then no, knowing the plaintext doesn't help the attacker.
The key, however, is that it has to be properly implemented. The problem is that crypto is hard. Really hard. Unless you're an expert or have an expert to review your work in context, it's extremely easy to make a mistake. Even worse, it's very easy for people to write crypto that they don't know how to break, but which can be broken quite easily by someone in the know.
The advice you got in the comments is the correct advice: use a proven scheme like SSL or TLS instead of creating your own.
Answering your question:
No, the only way to break a hash is brute force, as small diferences in the origin mean big differences in the output of the hashing algorithm (given that the algorithm has been proben to be unbroken). You must to know S to perform a MITM here.
But, Byron Withlock is correct:
Using a homemade encryption scheme when there are sooo many better schemes available is crazy. Leave encryption to the experts. – Byron Whitlock 4 mins ago
I'm with Byron. Just use something off-the-shelf and tested by people with a clue. How about SSL? – Steven Sudit 57 secs ago
Many cryptographic hash functions are vulnerable to a lengt extension attack. That means if an attacker knows hash(S) but not S, then he may still be able to compute hash(S || M) for some messages M. For example, the attacker might try to get hash(S), by sending the challenge string "" to one of the parties. Your scheme does not have a detailed description. So it is not clear if such a length extension attack is possible. To avoid these kind of attacks you might consider to use for example HMAC instead of the more simple hashing scheme that you propose.
This scheme is weak because the instructions themselves aren't authenticated. You want to send the MAC of R + instructions - and ensure that R is fixed length so that an attacker can't shuffle about between R and instructions.
I take it the purpose of the random value is to ensure the "freshness" of the instructions sent?
You could also look into using gpg, if SSL doesn't meet your needs. That's likely to be a lot better than homegrown crypto.