I have lots of small secrets that I want to store encrypted in a database. The database client will have the keys, and the database server will not deal with encryption and decryption. All of my secrets are 16 bytes or less, which means just one block when using AES. I'm using a constant IV (and key) to make the encryption deterministic. My reason for doing deterministic encryption is to be able to easily query the database using ciphertext, and to make sure the same secret is not stored twice (by making the column UNIQUE). As far as I can see there should be no problem doing this, as long as the key is secret. But I want to be sure: Am I right or wrong? In case I'm wrong, what attacks could be done?
BTW: Hashes are quite useless here, because of a relatively small number of possible plaintexts. With a hash it would be trivial to obtain the original plaintext.
An ideal cipher, for messages of length n bits, is a permutation of the 2^n sequences of n bits, chosen at random among the (2^n)! such permutations. The "key" is the description of which permutation was chosen.
A secure block cipher is supposed to be indistinguishable from an ideal cipher, with n being the block size. For AES, n=128 (i.e. 16 bytes). AES is supposed to be a secure block cipher.
If all your secrets have length exactly 16 bytes (or less than 16 bytes, with some padding convention to unambiguously extend them to 16 bytes), then an ideal cipher is what you want, and AES "as itself" should be fine. With common AES implementations, which want to apply padding and process arbitrarily long streams, you can get a single-block encryption by asking for ECB mode, or CBC mode with an all-zero IV.
All the issues about IV, and why chaining modes such as CBC were needed in the first place, come from multi-block messages. AES encrypts 16-byte messages (no more, no less): chaining modes are about emulating an ideal cipher for longer messages. If, in your application, all messages have length exactly 16 bytes (or are shorter, but you add padding), then you just need the "raw" AES; and a fixed IV is a close enough emulation of raw AES.
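A minimal sketch of this deterministic single-block encryption in Python, using the `cryptography` package (the zero-byte padding convention here is my own assumption; any unambiguous padding works):

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_secret(key: bytes, secret: bytes) -> bytes:
    # Deterministically encrypt a secret of at most 16 bytes as one raw AES block.
    # key must be 16, 24, or 32 bytes long.
    assert len(secret) <= 16
    # Zero-padding is only unambiguous if secrets never end in 0x00;
    # pick whatever convention fits your data.
    block = secret.ljust(16, b"\x00")
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()
```

The same key and secret always yield the same ciphertext, which is exactly what makes the UNIQUE column and lookups by ciphertext work.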
Note, though, the following:
If you are storing encrypted elements in a database, and require uniqueness for the whole lifetime of your application, then your secret key is long-lived. Keeping a secret key secret for a long time can be a hard problem. For instance, long-lived secret keys need some kind of storage (which survives reboots). How do you manage dead hard disks? Do you destroy them in an acid-filled cauldron?
Encryption ensures confidentiality, not integrity. In most security models, attackers can be active (i.e., if the attacker can read the database, he can probably write into it too). Active attacks open up a whole host of issues: for instance, what could happen if the attacker swaps some of your secrets within the database? Or alters some randomly? Encryption is, as always, the easy part (not that it is really "easy", but it is much easier than the rest of the job).
If the assembly is publicly available, or can become so, your key and IV can be discovered by using Reflector to expose the source code that uses it. That would be the main problem with this, if the data really were secret. It is possible to obfuscate MSIL, but that just makes it harder to trace through; it still has to be computer-consumable, so you can't truly encrypt it.
The static IV would make your implementation vulnerable to frequency attacks. See For AES CBC encryption, what's the importance of the IV?
I am using AES128 in CBC mode, and I need a 16-byte key, so I was wondering if using sha2 or sha3 and then truncating it to 16 bytes (take first 16 bytes from the left) would make sha2/sha3 weaker than crc32 which gives me 16 bytes out of the box.
Each bit of a cryptographically secure hash is effectively random (i.e. independent of all the other bits). This is not true of non-cryptographic hashes. This property is critical for a secure key. You should always use a cryptographic hash for key derivation.
Truncating a long secure hash is a perfectly acceptable way to create a secure hash of shorter length. You may also select any subset of bits rather than just the most significant or least significant. If this weren't true, then the original hash would not itself be secure, because it would suggest some non-randomness in the output.
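For instance, a hedged sketch of deriving a 16-byte key by truncating SHA-256 (the variable `ikm` is a placeholder for whatever your input key material is):

```python
import hashlib

ikm = b"input key material"              # placeholder; use your real input
key = hashlib.sha256(ikm).digest()[:16]  # keep the first 16 of 32 bytes
```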
SHA-2 and SHA-3 intend to be cryptographically secure hashes (and at this point, we believe they are). CRC does not even intend to be cryptographically secure.
If the input key material is not itself random, then a fast hash like the SHA series may be subject to brute force. If so, then you need to use key stretching as well as hashing, for example with PBKDF2.
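A minimal sketch with Python's standard library (the salt handling and iteration count are illustrative, not prescriptive):

```python
import hashlib, os

salt = os.urandom(16)  # store alongside the ciphertext so the key is reproducible
key = hashlib.pbkdf2_hmac("sha256", b"low-entropy password", salt,
                          600_000,   # illustrative work factor
                          dklen=16)  # 16-byte key for AES-128
```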
But you should never use CRC for any of this. It is not intended to be a secure hash.
For more discussion, see Should I use the first or last bits from a SHA-256 hash? and “SHA-256” vs “any 256 bits of SHA-512”, which is more secure?
I am using AES128 in CBC mode, and I need a 16-byte key, so I was wondering if using sha2 or sha3 and then truncating it to 16 bytes (take first 16 bytes from the left) would make sha2/sha3 weaker than crc32 which gives me 16 bytes out of the box.
The question was not clear about how the input to the CRC or SHAx is generated; the OP has since clarified. So I've split the answer into the parts below.
I mean regardless of the input (say the input was even abcd), would truncating sha2/3 to 16 bytes be more secure than using crc32?
First of all, forget CRC: it is not a cryptographic hash function.
When the input space is small, there is a special case of the pre-image attack on hash functions: the attacker can simply try all possible inputs to recover the key. You can read more details in these Cryptography.SE Q/As:
Secure hashing when the input comes from a small space
Is it easy to crack a hashed phone number?.
Don't forget about the small input space! Entities like Bitcoin miners, or supercomputers like Summit, can reach 2^64 trials very easily, and 2^64 corresponds to just 8 bytes.
One should generate a strong password with a scheme like Diceware or BIP-39. These give you passwords that are both easy to remember and strong. See also XKCD.
Once you have generated a good password, you can pass it to a KDF, even the poor man's KDF1, though HKDF is the better choice. Since your input material is good, you can skip the extract part of HKDF. You can also use password-based key derivation functions like scrypt, PBKDF2, or Argon2. In that case, choose Argon2, since it won the Password Hashing Competition in July 2015.
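As an illustrative sketch (the salt and info label are my own choices, not from the question), deriving a 16-byte AES key from good input material with HKDF via the Python `cryptography` package:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

ikm = os.urandom(32)  # assumed high-entropy input keying material

key = HKDF(
    algorithm=hashes.SHA256(),
    length=16,            # 16-byte key for AES-128
    salt=None,            # optional; a random salt strengthens the extract step
    info=b"aes-128-key",  # illustrative context label
).derive(ikm)
```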
I was just trying to encrypt data, like sounds for a game, with AES-128, and was wondering whether using only 16 bytes of the password-like key hashed with sha2/3 was a more secure solution than using a whole crc32 output instead. I've never worked with any of the other methods you mentioned...
For proper use of CBC mode, you also need an IV. You can use HKDF, PBKDF2, Argon2, etc. with a different info/salt input to derive the IV, too; this is very common.
Note the following about CBC:
The IV must be unique under the same key, i.e. a (key, IV) pair must be used only once.
The CBC IV must be unpredictable; as far as I can see, though, this is not an issue in your case.
CBC is vulnerable to padding oracle attacks on the server side. This is not possible in your case either.
CBC mode can only provide CPA security; there is no integrity or authentication. To get integrity and authentication, either use HMAC with a different key, or use a combined mode.
Better yet, use an Authenticated Encryption with Associated Data (AEAD) mode of encryption like AES-GCM or ChaCha20-Poly1305. Using GCM correctly can be hard, so prefer ChaCha20-Poly1305 or XChaCha20-Poly1305, which are more forgiving about nonce generation.
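A minimal AEAD sketch with the Python `cryptography` package (the message and associated data are placeholders):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

key = ChaCha20Poly1305.generate_key()
aead = ChaCha20Poly1305(key)

nonce = os.urandom(12)                   # 96-bit nonce, unique per message
ct = aead.encrypt(nonce, b"attack at dawn", b"header")  # ciphertext plus tag
pt = aead.decrypt(nonce, ct, b"header")  # raises InvalidTag if tampered with
```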
Why can DES be used only with a 56-bit key? What happens if we use a longer key? Also, why must the plaintext be 64 bits in length?
US regulations at the time required users of keys stronger than 56 bits to submit to "key recovery", to enable law-enforcement back-door access.
Thus DES, as a standard, was specified at the maximum allowed key length of 56 bits. If you used a longer key, you would not be compatible with other DES systems.
See: http://en.wikipedia.org/wiki/56-bit_encryption
If you are implementing a system and have a choice of encryption, more modern and stronger ciphers are absolutely recommended. The current standard is AES (the Advanced Encryption Standard), which is widely available, strong, and allows key sizes of 128, 192, or 256 bits.
For desktop or server applications, AES-256 would be a good default choice.
See: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard
When encrypting data, plaintexts must often be "padded" to a minimum size. Ciphers rely on jumbling and interactions between many bits to preserve the secrecy of the plaintext and avoid potentially revealing the key. For a short plaintext without padding, that jumbling and interaction is removed as a factor, and the mathematical complexity drops vastly.
Encrypting just a single character without padding, for example a 'y' or 'n' response, could reduce a 2^256 keyspace down to possibly 2^24, which could be cracked in minutes. This would enable an attacker to guess large parts of the key, rapidly break it, and then (worst of all) decrypt all other traffic on the channel.
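To illustrate the padding, here is a sketch that pads a one-character answer to a full block before AES-CBC encryption, using the Python `cryptography` package (the key and IV handling is simplified):

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = os.urandom(32), os.urandom(16)         # AES-256 key, random IV

padder = padding.PKCS7(128).padder()             # pad to the 128-bit block size
block = padder.update(b"y") + padder.finalize()  # 1 byte becomes 16 bytes

enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ciphertext = enc.update(block) + enc.finalize()
```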
I'm just working on an algorithm, and so far I can encrypt and decrypt a number, which works fine. My question now is: how do I go about encrypting an image? How does the UIdata look, and should I convert the image to that before I start? I've never done anything on this level in terms of encryption, and any input would be great! Thanks!
You'll probably want to encrypt in small chunks - perhaps a byte, a word/int (4 bytes), or maybe even a long (8 bytes) at a time, depending on how your algorithm is implemented.
I don't know the signature of your algorithm (i.e. what types of input it takes and what types of output it gives), but the most common ciphers are block ciphers: algorithms which take an input of some block size (nowadays 128 bits = 16 bytes is a common size) and produce a same-sized output, in addition to a key input (which should also be at least 128 bits).
To encrypt longer pieces of data (and actually, also for short pieces, if you send multiple such pieces with the same key), you use a mode of operation (and probably a padding scheme in addition). This gives you an algorithm (or a pair of them) with an arbitrary-length plaintext input and a slightly bigger ciphertext output (which the decryption algorithm then undoes).
Some hints:
Don't use ECB mode (i.e. simply encrypting each block independently of the others).
You should probably also apply a MAC, to protect your data against malicious modification (and against breaking of the encryption scheme by chosen-ciphertext attacks). Some modes of operation already include a MAC.
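For instance, AES-GCM bundles a mode of operation with a built-in MAC. A hedged sketch of encrypting a whole file (an image, a sound, whatever) with the Python `cryptography` package; the file names are placeholders:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_file(key: bytes, path_in: str, path_out: str) -> None:
    aesgcm = AESGCM(key)
    nonce = os.urandom(12)    # never reuse a nonce with the same key
    with open(path_in, "rb") as f:
        plaintext = f.read()  # e.g. raw image bytes
    with open(path_out, "wb") as f:
        f.write(nonce + aesgcm.encrypt(nonce, plaintext, None))

key = AESGCM.generate_key(bit_length=128)
encrypt_file(key, "picture.png", "picture.png.enc")
```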
When using AES encryption, plaintext must be padded to the cipher block size. Most libraries and standards use padding where the padding bytes can be determined from the unpadded plaintext length. Is there a benefit to using random padding bytes when possible?
I'm implementing a scheme for storing sensitive per-user and per-session data. The data will usually be JSON-encoded key-value pairs, and can be potentially short and repetitive. I'm looking to PKCS#5 for guidance, but I planned on using AES for the encryption algorithm rather than DES3. I was planning on a random IV per data item, and a key determined by the user ID and password or a session ID, as appropriate.
One thing that surprised me is the PKCS#5 padding scheme for the plaintext. To pad the plaintext to 8-byte blocks, 1 to 8 bytes are added at the end, with each padding byte's value reflecting the number of padding bytes (i.e. 01, 0202, 030303, up to 0808080808080808). My own padding scheme was to use random bytes at the front of the plaintext, with the last byte of the plaintext giving the number of padding bytes added.
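In code, the deterministic PKCS#5 scheme is just this (a sketch with 8-byte blocks, as in the standard):

```python
def pkcs5_pad(data: bytes, block: int = 8) -> bytes:
    n = block - len(data) % block  # 1..8 bytes are always added
    return data + bytes([n]) * n

def pkcs5_unpad(padded: bytes) -> bytes:
    return padded[:-padded[-1]]    # the last byte says how much to strip
```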
My reasoning was that in AES-CBC mode, each block is a function of the ciphertext of the preceding block. This way, each plaintext would have an element of randomness, giving me another layer of protection from known plaintext attacks, as well as IV and key issues. Since my plaintext is expected to be short, I don't mind holding the whole decrypted string in memory, and slicing padding off the front and back.
One drawback would be that the same unpadded plaintext, IV, and key would result in different ciphertext, making unit testing difficult (but not impossible: I can use a pseudo-random padding generator for testing and a cryptographically strong one for production).
Another would be that, to enforce random padding, I'd have to add a minimum of two bytes - a count and one random byte. For deterministic padding, the minimum is one byte, either stored with the plaintext or in the ciphertext wrapper.
Since a well-regarded standard like PKCS#5 decided to use deterministic padding, I'm wondering if there is something else I missed, or I'm judging the benefits too high.
Both, I suspect. The benefit is fairly minimal.
You have forgotten about the runtime cost of acquiring or generating cryptographic-quality random numbers. At one extreme, when a finite supply of randomness is available (/dev/random on some systems, for instance), your code may have to wait a long time for more random bytes.
At the other extreme, when you are getting your random bytes from a PRNG, you could expose yourself to problems if you're using the same random source that generates your keys. If you're sending encrypted data to multiple recipients one after another, you have given the previous recipient a whole bunch of information about the state of the PRNG that will be used to pick the key for your next comms session. If your PRNG algorithm is ever broken (which is, IMO, more likely than a good plaintext attack on full AES), you're much worse off than if you had used deliberately deterministic padding.
In either case, however you get the padding, it's more computationally intensive than PKCS#5 padding.
As an aside, it is fairly standard to compress potentially-repetitive data with e.g. deflate before encrypting it; this reduces the redundancy in the data, which can make certain attacks more difficult to perform.
One last recommendation: deriving the key with a mechanism in which only the username and password vary is very dangerous. If you are going to use it, make sure you use a hash algorithm with no known flaws (not SHA-1, not MD5). Cf. this Slashdot story.
Hope this helps.
I was wondering if I could get reasons, or links to resources, explaining why SHA-512 is a superior hashing algorithm to MD5.
It depends on your use case. You can't broadly claim "superiority". (I mean, yes you can, in some cases, but to be strict about it, you can't really).
But there are areas where MD5 has been broken:
For starters: MD5 is old and common. There are tons of rainbow tables against it, and they're easy to find. So if you're hashing passwords (without a salt - shame on you!) using MD5, you might as well not be hashing them; they're that easy to recover. The same goes, really, even if you're hashing with simple salts.
Second, MD5 is no longer secure as a cryptographic hash function (indeed, it is not even considered a cryptographic hash function anymore, as the Forked One points out). You can generate different messages that hash to the same value. So if you've got an SSL certificate with an MD5 hash on it, I can generate a duplicate certificate that says what I want and produces the same hash. This is generally what people mean when they say MD5 is 'broken' - things like this.
Thirdly, similar to messages, you can also generate different files that hash to the same value, so using MD5 as a file checksum is 'broken'.
Now, SHA-512 is a SHA-2-family hash algorithm. SHA-1 is kind of considered 'eh' these days; I'll ignore it. SHA-2, however, has relatively few attacks against it. The major one Wikipedia talks about is a reduced-round preimage attack, which means that if you use SHA-512 in a horribly wrong way, I can break it. Obviously you're not likely to be using it that way, but attacks only get better, and it's a good springboard into more research toward breaking SHA-512 the same way MD5 is broken.
However, out of all the hash functions available, the SHA-2 family is currently among the strongest, and the best choice considering commonness, analysis, and security. (But not necessarily speed; if you're on embedded systems, you need to perform a whole other analysis.)
MD5 has been cryptographically broken for quite some time now. This basically means that some of the properties usually guaranteed by hash algorithms no longer hold. For example, it is possible to find hash collisions in much less time than the output length would theoretically require.
SHA-512 (one of the SHA-2 family of hash functions) is secure enough for now, but possibly not for much longer. That's why NIST started a contest for SHA-3.
Generally, you want hash algorithms to be one-way functions. They map some input to some output. Usually the output is of a fixed length, thereby providing a "digest" of the original input. Common properties are, for example, that small changes in input yield large changes in the output (which helps in detecting tampering) and that the function is not easily reversible. For the latter property, the length of the output greatly helps, because it provides a theoretical upper bound for the complexity of a collision attack. However, flaws in design or implementation often result in reduced complexity for attacks. Once those are known, it's time to evaluate whether to keep using the hash function. If the attack complexity drops far enough, practical attacks easily get within the range of people without specialized computing equipment.
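The "small changes in input yield large changes in output" property is easy to see in a quick sketch:

```python
import hashlib

# Changing a single character flips roughly half of the output bits.
print(hashlib.sha512(b"message").hexdigest())
print(hashlib.sha512(b"messagE").hexdigest())
```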
Note: I've been talking about only one kind of attack here. The reality is much more nuanced, but also much harder to grasp. Since hash functions are very commonly used for verifying file/message integrity, the collision issue is probably the easiest one to understand and follow.
There are a couple of points not being addressed here, and I feel they come from a lack of understanding about what a hash is, how it works, and how long it takes to successfully attack one using rainbow tables or any other method currently known to man...
Mathematically speaking, MD5 is not "broken" here: if you salt the hash and throttle attempts (even by 1 second), your security is about as "broken" as a 1 ft solid steel wall being slowly pelted by an attacker with a wooden spoon:
It will take thousands of years, and by then everyone involved will be dead; there are more important things to worry about.
If you lock their account by the 20th attempt... problem solved. 20 hits on your wall = a 0.0000000001% chance they got through. There is literally a better statistical chance that you are in fact Jesus.
It's also important to note that absolutely any hash function is going to be vulnerable to collisions by virtue of what a hash is: "a (small) unique id of something else".
When you increase the bit space you decrease collision rates, but you also increase the size of the id and the time it takes to compute it.
Let's do a tiny thought experiment...
A "SHA-2", if that meant a 2-bit hash, would have 4 total possible unique IDs for something else: 00, 01, 10, and 11. It would produce collisions, obviously. Do you see the issue here? A hash is just a generated ID of what you're trying to identify.
MD5 is actually really, really good at randomly choosing a number based on an input. SHA is actually not that much better at it; SHA just has massively more space for IDs.
The method used accounts for maybe 0.1% of the reason collisions are less likely. The real reason is the larger bit space.
This is literally the only reason SHA-256 and SHA-512 are less vulnerable to collisions; because they use a larger space for a unique id.
The actual methods SHA-256 and SHA-512 use to generate the hash are in fact better, but not by much; the same rainbow attacks would work on them if they had fewer bits in their IDs, and files (and even passwords) can have identical IDs under SHA-256 and SHA-512 too. It's just a lot less likely, because they use more bits.
The REAL ISSUE is how you implement your security
If you allow automated attacks to hit your authentication endpoint 1,000 times per second, you're going to get broken into. If you throttle to 1 attempt per 3 seconds and lock the account for 24 hours after the 10th attempt, you're not.
If you store the passwords without salt (a salt is just extra random data fed into the generator, making it harder to pick out bad passwords like "31337" or "password") and have a lot of users, you're going to get hacked. If you salt them, even if you use MD5, you're not.
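A sketch of what salting looks like in practice (I'm using scrypt from Python's standard library rather than a bare hash, in line with the throttling advice above; the parameters are illustrative):

```python
import hashlib, os

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # per-user random salt, stored next to the hash
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest
```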
Considering MD5 uses 128 bits (32 hex characters, 16 bytes in binary), and SHA-512 is only 4x that size but virtually eliminates collisions by giving you 2^384 times more possible IDs... go with SHA-512, every time.
But if you're worried about what is really going to happen if you use MD5, and you don't understand the real, actual differences, you're still probably going to get hacked. Make sense?
Reading this:
"However, it has been shown that MD5 is not collision resistant"
More information about collisions here.
MD5 has a chance of collision (http://www.mscs.dal.ca/~selinger/md5collision/) and there are numerous MD5 rainbow tables for reverse password look-up on the web and available for download.
SHA-512 needs a much larger dictionary to map backwards, and has a lower chance of collision.
It is simple, MD5 is broken ;) (see Wikipedia)
Bruce Schneier wrote of the attack that "[w]e already knew that MD5 is a broken hash function" and that "no one should be using MD5 anymore."