Sha1 hash of multiple sha1-hashes -> Secure to identify file? - cryptography

Let's say I split a 1 GB file into 1024 chunks of 1 MB in the browser, compute the SHA1 of every chunk, and store these hashes temporarily. After hashing all chunks, I compute the SHA1 of all the collected SHA1 hashes (a hash of hashes), then send this "final" hash to my server.
Would this hash be secure enough to identify my file on the server? (Assuming we have a secure connection and SHA1 is collision free.)
Is it a bad idea to compute a hash of multiple hashes?

I guess your objective is to check the integrity of the uploaded file by comparing a checksum calculated on the client side with one calculated on the server side after the upload completes. In that case, hashing each chunk, concatenating the hashes, and hashing the result should be enough.
//pseudocode
SHA1.digest (
SHA1.digest(chunk 1) + SHA1.digest(chunk 2) + ... + SHA1.digest(chunk n))
But note that you can instead compute an incremental SHA1 hash over the complete file, adding each chunk to the calculation as you go. The result is then the same as hashing the complete file in one step, and you do not need to keep or combine temporary data.
SHA1.update(chunk 1)
SHA1.update(chunk 2)
...
SHA1.update(chunk n)
SHA1.digest ()
Also consider moving to SHA-256, as suggested in the comments, although SHA1 is probably adequate for this purpose.
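The two pseudocode variants above can be sketched in Python roughly as follows. This is only an illustration using the standard `hashlib` module; the helper names (`split_chunks`, `hash_of_hashes`, `incremental_hash`) are made up for this example and are not part of any library.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB, mirroring the chunk size in the question


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield consecutive chunks of at most `size` bytes."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def hash_of_hashes(chunks) -> str:
    """SHA-1 over the concatenated SHA-1 digests of the individual chunks."""
    outer = hashlib.sha1()
    for chunk in chunks:
        outer.update(hashlib.sha1(chunk).digest())
    return outer.hexdigest()


def incremental_hash(chunks) -> str:
    """SHA-1 of the whole file, fed one chunk at a time."""
    h = hashlib.sha1()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()
```

Note that the hash of hashes depends on the chunk boundaries, so client and server would have to split the file identically; the incremental hash does not have that constraint.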

This should work. Assuming SHA-1 is collision free, for two different files at least one of the chunk hashes will differ, so the "final" hashes will also differ.
In general, hashing hashes does not improve security. If you want more security, use SHA-256.

Related

Initialization vector - best practices (symmetric cryptography)

I would like to ask about best practices regarding the use of an initialization vector (IV) and a key for symmetric cryptography algorithms.
I want to accept messages from a client, encrypt them and store them in a backend. This will happen over time, and requests will come in later to pull the messages out and return them in readable form.
As far as I know, the key can stay the same across the encryption of multiple separate messages, but the IV should change with every new encryption. This, however, will cause problems, because every message will need a different IV for decryption at a later time.
I’d like to know if this is the best way of doing it. Is there any way to avoid storing the IV with every message, which would simplify the entire process of encryption and decryption?
IV selection is a bit complicated because the exact requirements depend on the mode of operation. There are some general rules, however:
You can't go wrong¹ with a random IV, except when using shorter IVs in modes that allow this.
Never use the same IV with the same key.
If you only ever encrypt a single message with a given key, the choice of IV doesn't matter².
Choose the IV independently of the data to encrypt.
Never use ECB.
Of the most common specific modes of operation:
CBC requires the IV to be generated uniformly at random. Do not use a counter as IV for CBC. Furthermore, if you're encrypting some data that contains parts that you receive from a third party, don't reveal the IV until you've fully received the data.
CTR uses the IV as the initial value of a counter which is incremented for every block, not for every message, and the counter value needs to be unique for every block. A block is 16 bytes for all modern symmetric ciphers (including AES, regardless of the key size). So for CTR, if you encrypt a 3-block message (33 to 48 bytes) with 0 as the IV, the next message must start with IV=3 (or larger), not IV=1.
Modern modes such as Chacha20, GCM, CCM, SIV, etc. use a nonce as their IV. When a mode is described as using a nonce rather than an IV, this means that the only requirement is that the IV is never reused with the same key. It doesn't have to be random.
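The CTR bookkeeping described above (the counter advances per block, not per message) can be sketched like this. It is a minimal illustration of the arithmetic only, not an encryption routine, and the function names are hypothetical.

```python
BLOCK_SIZE = 16  # bytes per block, as stated above for AES and similar ciphers


def blocks_needed(message_len: int) -> int:
    """Number of 16-byte blocks a message of `message_len` bytes occupies."""
    return (message_len + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division


def next_ctr_start(current_start: int, message_len: int) -> int:
    """Smallest counter value that is safe to use for the next message."""
    return current_start + blocks_needed(message_len)
```

For instance, `next_ctr_start(0, 40)` gives 3, matching the 3-block example in the text.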
When encrypting data in a database, it is in general not safe to use the row ID (or a value derived from it) as IV. Using the row ID is safe only if the row is never updated or removed, because otherwise the second time data is stored using the same ID, it would repeat the IV. An adversary who sees two different messages encrypted with the same key and IV may well be able to decrypt both messages (the details depend on the mode and on how much the attacker can guess about the message content; note that even weak guesses such as “it's printable UTF-8” may suffice).
Unless you have a very good reason to do otherwise (just saving a few bytes per row does not count as a very good reason) and a cryptographer has reviewed the specific way in which you are storing and retrieving the data:
Use an authenticated encryption mode such as GCM, CCM, SIV or Chacha20+Poly1305.
If you can store a counter somewhere and make sure that it's never reset as long as you keep using the same encryption key, then each time you encrypt a message:
Increment the counter.
Use the new value of the counter as the nonce for the authenticated encryption.
The reason to increment the counter first is that if the process is interrupted, it will lead to a skipped counter value, which is not a problem. If step 2 was done without step 1, it would lead to repeating a nonce, which is bad. With this scheme, you can shave a few bytes off the nonce length if the mode allows it, as long as the length is large enough for the number of messages that you'll ever encrypt.
If you don't have such a counter, then use the maximum nonce length and generate a random nonce. The reason to use the maximum nonce length is that due to the birthday paradox, a random n-bit nonce is expected to repeat when the number of messages approaches 2^(n/2).
In either case, you need to store the nonce in the row.
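The two ways of producing a nonce described above might look roughly like this in Python. This is a sketch under assumptions: the 12-byte length is chosen here only because 96-bit nonces are common for GCM, `NonceCounter` is a hypothetical class, and a real counter would have to be persisted durably before the nonce is used, which this in-memory version does not do.

```python
import secrets

NONCE_LEN = 12  # bytes; an assumption for this sketch, use what your mode allows


class NonceCounter:
    """Hypothetical in-memory counter; real code must persist `value`."""

    def __init__(self, start: int = 0):
        self.value = start

    def next_nonce(self) -> bytes:
        # Increment first: an interruption then skips a value instead of
        # reusing one.
        self.value += 1
        # to_bytes raises OverflowError if the counter outgrows the nonce.
        return self.value.to_bytes(NONCE_LEN, "big")


def random_nonce(length: int = NONCE_LEN) -> bytes:
    """Random nonce from the OS CSPRNG, for when no counter is available."""
    return secrets.token_bytes(length)
```

Either way, the returned bytes are what would be stored in the row next to the ciphertext.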
¹ Assuming that everything is implemented correctly, e.g. random values need to be generated with a random generator that is appropriate for cryptography.
² As long as it isn't chosen in a way that depends on the key.

Is it safe to store extremely complicated Password in SHA1?

Is it safe to hash an extremely complicated password (longer than 25 chars, any ASCII chars, even binary) with SHA1?
Actually, the password represents a token ID, but I don't want to store it as-is in the database; I prefer to hash it for more security.
The password (token) is valid for only 14 days and I need to hash it as fast as possible (so there is no way to use something like bcrypt).
What would be the ideal length of the password (token)?
In the general case, no. "Complicated" it may be, but cryptographically random it probably is not.
A bare minimum would be applying an RFC2104 HMAC with a secret key (pepper); however, a more appropriate alternative that can, if you absolutely insist, still be quite fast would be to use PBKDF2-HMAC-SHA-256 and ignore all rules of security regarding a sufficiently high iteration count, i.e. choose an iteration count of 10, instead of 10,000.
For password/token hashing, of course, never request more bytes of PBKDF2 output than the native hash function provides - 20 for SHA-1, 32 for SHA-256, 64 for SHA-512.
I have several example implementations of PBKDF2 at my Github repository that may help, and there are others in other languages, of course.
Use a cryptographically random per-password (per-token) salt.
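As a rough sketch of the suggestion above, Python's standard library exposes PBKDF2 as `hashlib.pbkdf2_hmac`. The function names, the 16-byte salt length and the default of 10 iterations (taken from the "fast" example count above) are assumptions of this illustration, not recommendations.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_token(token: bytes, salt: Optional[bytes] = None,
               iterations: int = 10) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA-256."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt per token
    # Request 32 bytes, the native output size of SHA-256, and no more.
    digest = hashlib.pbkdf2_hmac("sha256", token, salt, iterations, dklen=32)
    return salt, digest


def verify_token(token: bytes, salt: bytes, expected: bytes,
                 iterations: int = 10) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_token(token, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Both the salt and the digest would be stored with the token record; the iteration count must be the same at hashing and verification time.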

Why in some cases are used only the first x chars of a md5 hash instead of using all of them?

For example, the commit list on GitHub shows only the first 10 characters, and this line from tornadoweb uses only 5:
return static_url_prefix + path + "?v=" + hashes[abs_path][:5]
Are only the first 5 chars enough to make sure that 2 different hashes for 2 different files won't collide?
LE: The example above from tornadoweb uses an md5 hash to generate a query string for static file caching.
In general, No.
In fact, even if a full MD5 hash were given, it wouldn't be enough to prevent malicious users from generating collisions---MD5 is broken. Even with a better hash function, five characters is not enough.
But sometimes you can get away with it.
I'm not sure exactly what the context of the specific example you provided is. However, to answer your more general question, if there aren't bad guys actively trying to cause collisions, then using part of the hash is probably okay. In particular, given 5 hex characters (20 bits), you won't expect collisions before around 2^(20/2) = 2^10 ~ one thousand values are hashed. This is a consequence of the birthday paradox.
The previous paragraph assumes the hash function is essentially random. This is not an assumption anyone trying to make a cryptographically secure system should make. But as long as no one is intentionally trying to create collisions, it's a reasonable heuristic.
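The birthday estimate above can be checked numerically with the usual approximation p ≈ 1 - exp(-n(n-1)/(2N)), where N is the number of possible truncated values. The sketch below assumes the hash behaves like a random function, as the previous paragraph does; `short_tag` and `collision_probability` are names invented for this example.

```python
import hashlib
import math


def short_tag(data: bytes, length: int = 5) -> str:
    """First `length` hex characters of the MD5 digest, as in the example."""
    return hashlib.md5(data).hexdigest()[:length]


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that any two of `n_items` random tags collide."""
    space = 2 ** bits
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2 * space))
```

With 20 bits and 1024 items this comes out at roughly 0.39, so collisions are already quite likely around a thousand files, while for a handful of files the probability stays tiny.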

SHA1 Decryption in VB.Net

Is it possible to decrypt a SHA1 string in VB.Net, knowing the key?
I have seen "decryption" of credentials before, however - in Java: http://pastebin.com/P0LuN00P
The entire point of SHA1 is to make this impossible.
However, SHA1 has weaknesses which make this less impossible.
You should use SHA512 to make it more impossible.
You might be looking for Rijndael, a (good) symmetric encryption algorithm.
I think you got SHA1 wrong.
SHA1 is not an encryption algorithm, it is a hash function.
A hash function takes an arbitrarily long argument string and transforms it into a much smaller, fixed-size string, called the hash. It is very hard to get from a hash back to the string used to generate it. In fact, since the inputs can be arbitrarily long, there are multiple inputs that give the same hash; two such inputs are called a collision. Therefore you really can't "decrypt" a hash, although you can try to find an input which gives the same hash.
Commonly, hash functions are used to hash a user's password and store the result in a database on the server. When the server is supplied a password by a user, it checks whether the password is correct by hashing it and comparing the result with what is stored in the database.
If a malicious user grabs what is stored in the database, he is unable to know the actual password since it is very hard to go from hash to the string used to generate the hash.
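To illustrate the check described above: the server never reverses the hash, it hashes the submitted password again and compares. The sketch below uses plain, unsalted SHA-1 only to mirror the explanation; the function names are hypothetical.

```python
import hashlib
import hmac


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest; always 40 hex characters, whatever the input size."""
    return hashlib.sha1(data).hexdigest()


def check_password(candidate: str, stored_hash: str) -> bool:
    """Hash the candidate and compare it with the stored hash."""
    computed = sha1_hex(candidate.encode("utf-8"))
    return hmac.compare_digest(computed, stored_hash)
```

There is no inverse of `sha1_hex` here or anywhere else; the only way "back" is guessing inputs until one produces the same digest.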
SHA1 isn't encryption, it's hashing. So no, it's not possible to decrypt it. You might try rainbow tables: http://www.freerainbowtables.com/

How do I convert password hashing from MD5 to SHA?

I've got an old application that has user passwords stored in the database with an MD5 hash. I'd like to replace this with something in the SHA-2 family.
I've thought of two possible ways to accomplish this, but both seem rather clunky.
1) Add a boolean "flag" field. The first time the user authenticates after this, replace the MD5 password hash with the SHA password hash, and set the flag. I can then check the flag to see whether the password hash has been converted.
2) Add a second password field to store the SHA hash. The first time the user authenticates after this, hash the password with SHA and store it in the new field (probably delete their MD5 hash at the same time). Then I can check whether the SHA field has a value; this essentially becomes my flag.
In either case, the MD5 authentication would have to remain in place for some time for any users who log in infrequently. And any users who are no longer active will never be switched to SHA.
Is there a better way to do this?
Essentially the same, but maybe more elegant than adding extra fields: in Django's default authentication framework, the password hashes are stored as strings constructed like this:
hashtype$salt$hash
Hashtype is either sha1 or md5, salt is a random string used to salt the raw password and at last comes the hash itself. Example value:
sha1$a1976$a36cc8cbf81742a8fb52e221aaeab48ed7f58ab4
You can convert all your MD5 strings to SHA1 by rehashing them in your DB, provided that from now on you create new password hashes by MD5ing the password first. Checking a password then also requires MD5ing it first, but I don't think that's a big hit.
php-code (login):
prev:
// hash_equals avoids the loose-comparison pitfalls of == on hash strings
$login = hash_equals($storedMd5PasswordHash, md5($password));
after:
$login = hash_equals($storedSha1PasswordHash, sha1(md5($password)));
Works also with salting, got the initial idea from here.
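The same idea, sketched in Python for comparison: existing rows are converted once by hashing the stored MD5 hex string with SHA-1, and logins compute SHA-1 over MD5 of the submitted password. The function names are invented for this sketch, and it is unsalted, like the PHP snippet above.

```python
import hashlib
import hmac


def md5_hex(password: str) -> str:
    """Legacy hash: hex MD5 of the password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def wrap_stored_md5(stored_md5_hex: str) -> str:
    """One-off DB conversion: SHA-1 over the existing MD5 hex string."""
    return hashlib.sha1(stored_md5_hex.encode("ascii")).hexdigest()


def login(password: str, stored_wrapped: str) -> bool:
    """Check a password against a converted (SHA-1 of MD5) hash."""
    candidate = wrap_stored_md5(md5_hex(password))
    return hmac.compare_digest(candidate, stored_wrapped)
```

The conversion needs only the stored MD5 values, so every row can be migrated without waiting for users to log in.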
I think you've already got the best possibilities. I like #1 more than #2, since there's no use for the md5 once the sha is set.
There's no way to reverse the MD5, so you have to wait for the user to authenticate again to create a new hash.
No - basically you'll have to keep the MD5 in place until all the users you care about have been converted. That's just the nature of hashing - you don't have enough information to perform the conversion again.
Another option in-keeping with the others would be to make the password field effectively self-describing, e.g.
MD5:(md5 hash)
SHA:(sha hash)
You could then easily detect which algorithm to use for comparison, and avoid having two fields. Again, you'd overwrite the MD5 with SHA as you went along.
You'd want to do an initial update to make all current passwords declare themselves as MD5.
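A self-describing field combined with upgrade-on-login could be sketched as follows. The `ALGORITHM:hexdigest` layout follows the example above, but the exact labels (`MD5`, `SHA256`), the choice of SHA-256 as the target, and the function names are assumptions of this sketch; salting is left out to keep it short.

```python
import hashlib
import hmac
from typing import Tuple


def make_record(algorithm: str, password: str) -> str:
    """Build a self-describing record such as 'MD5:<hex digest>'."""
    digest = hashlib.new(algorithm, password.encode("utf-8")).hexdigest()
    return f"{algorithm.upper()}:{digest}"


def verify_and_upgrade(password: str, record: str,
                       target: str = "sha256") -> Tuple[bool, str]:
    """Return (ok, record_to_store); upgrades old records on success."""
    label, _, stored = record.partition(":")
    computed = hashlib.new(label.lower(), password.encode("utf-8")).hexdigest()
    if not hmac.compare_digest(computed, stored):
        return False, record  # wrong password: leave the record untouched
    if label.lower() != target:
        # Correct password under the old algorithm: rehash with the new one.
        record = make_record(target, password)
    return True, record
```

The caller would write the returned record back to the database whenever it differs from the one that was read.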
Your second suggestion sounds the best to me. That way frequent users will have a more secure experience in the future.
The first effectively "quirks-mode"'s your codebase and only makes sure that new users have the better SHA experience.
If the MD5's aren't salted you can always use a decryption site/rainbow tables such as: http://passcracking.com/index.php to get the passwords. Probably easier to just use the re-encode method though.
Yes, you need to know the real password before you can convert it to SHA-1.
If you want to recover the real password from an MD5 hash, you can try md5pass.com