In order to integrity protect a byte stream one can conceptually either use symmetric cryptography (e.g. an HMAC with SHA-1) or asymmetric cryptography (e.g. digital signature with RSA).
It is common sense that asymmetric cryptography is much more expensive than using symmetric cryptography. However, I would like to have hard numbers and would like to know whether there exist benchmark suites for existing crypto libraries (e.g. openssl) in order to gain some measurement results for symmetric and asymmetric cryptography algorithms.
The numbers I get from the built-in "openssl speed" app can, unfortunately, not be compared to each other.
Perhaps somebody already implemented a small benchmarking suite for this purpose?
Thanks,
Martin
I don't think a benchmark is useful here, because the two things you're comparing are built for different use-cases. An HMAC is designed for situations in which you have a shared secret you can use to authenticate the message, whilst signatures are designed for situations in which you don't have a shared secret, but rather want anyone with your public key be able to verify your signature. There are very few situations in which either primitive would be equally appropriate, and when there is, there's likely to be a clear favorite on security, rather than performance grounds.
It's fairly trivial to demonstrate that an HMAC is going to be faster, however: Signing a message requires first hashing it, then computing the signature over the hash, whilst computing an HMAC requries first hashing it, then computing the HMAC (which is merely two additional one-block hash computations). For the same reason, though, for any reasonable assumption as to message length and speed of your cryptographic primitives, the speed difference is going to be negligible, since the largest part of the cost is shared between both operations.
In short, you shouldn't choose the structure of your cryptosystem based on insignificant differences in performance.
All digital signature algorithms (RSA, DSA, ECDSA...) begin by hashing the source stream with a hash function; only the hash output is used afterwards. So the asymptotic cost of signing a long stream of data is the same as the asymptotic cost of hashing the same stream. HMAC is similar in that respect: first you input in the hash function a small fixed-size header, then the data stream; and you have an extra hash operation at the end which operates on a small fixed-size input. So the asymptotic cost of HMACing a long stream of data is the same as the asymptotic cost of hashing the same stream.
To sum up, for a suitably long data stream, a digital signature and HMAC will have the same CPU cost. Speed difference will not be noticeable (the complex part at the end of a digital signature is more expensive than what HMAC does, but a simple PC will still be able to do it in less than a millisecond).
The hash function itself can make a difference, though, at least if you can obtain the data with a high bandwidth. On a typical PC, you can hope hashing data at up to about 300 MB/s with SHA-1, but "only" 150 MB/s with SHA-256. On the other hand, a good mechanical harddisk or gigabit ethernet will hardly go beyond 100 MB/s read speed, so SHA-256 would not be the bottleneck here.
Related
I need to hash identifiers before storing in a database. There will be up to 1 million values overall. I need to pseudonymise these values to comply with GDPR.
I am using .Net core and I want to stay with the core hashing functionality. I dont want to risk using external hashing implementations. The intention is to add a salt phrase to each value before hashing. These values have already been hashed by the supplier but I will be hashing again before storing in db.
I was going to use SHA256 but I have read that PBKDF2 is more secure. However, I have read that PBKDF2 is prone to collisions. It is of the utmost importance that the hashing implementation I use has a low collision chance. Has PBKDF2 a higher collision rate than simple SHA256? Does using a key-derivation of HMACSHA512 with PBKDF2 as opposed to HMACSHA1 reduce the possibility of collisions?
Would like recommendations for a secure, low-collision one-way hash for Net core.
There will be up to 1 million values overall. I need to pseudonymise these values to comply with GDPR
I was going to use SHA256 but I have read that PBKDF2 is more secure.
For this use case a proper cryptographic hash is imho the best option.
PBKDF2 is a key derivation function intended to derive higher entropy keys from relatively weak passwords. It uses a hash under the hood so if the hash has certain hash collision probability , pbkdf will have the same.
pbkdf2 is intended to be slow (using iterations) to mitigate feasibility of brute-forcing the input password. You don't need that property, even it may be bad for your use case.
So -you can boldly use sha256 to anonymize your data, imho it may be the best option you have today. Indeed principially you cannot prevent the hash collision, but the probability should be negligible
Just now I'm working in a financial project. Here, the team is thinking to use MD5 for password hashing.
But, today is easy copy a SHA1 or MD5 password to decrypt, inclusive if they are complex password like:
My$uper$ecur3PAS$word+448, you might use a online page to decrypt it and there is it.
Small and mid-range developers (including me) uses those hashing methods, but I think is not enough to provide security over the database.
(Excluding firewalls, network security, iptables, etc.).
Can someone give me a clue about what is the better approach to solve this vulnerability?
As per OWASP Password Storage Cheat Sheet, the recommendation is:
Argon2 is the winner of the password hashing competition and should be considered as your first choice for new applications;
PBKDF2 when FIPS certification or enterprise support on many platforms is required;
scrypt where resisting any/all hardware accelerated attacks is necessary but support isn’t.
bcrypt where PBKDF2 or scrypt support is not available.
MD5 and SHA1 are not secured for most security related use cases, because it is possible to find collisions with these algorithms. In other words, given an input and its hash value, it is possible to derive another input with the same hash value.
SHA-2 group of hashing algorithms are secured for many security use cases, but not for password hashing because they are blazingly fast when compared with the above algorithms. And performance is something we don't want for password hashing because that would make it easier for an attacker to perform a brute-force attack by trying a wide range of passwords in a short period of time.
The above 4 algorithms are therefore meant to be expensive in terms of memory, computation power and time. These values are usually parameterized so that they can be tuned to a high value as new technologies improve the computation power with passsing time. Therefore while using these algorithms, it is important to choose the work factor values correctly. Setting a very low valur may defeat the purpose.
In addition to that a salt should also be used.
Again from the same OWASP source:
Generate a unique salt upon creation of each stored credential (not just per user or system wide);
Use cryptographically-strong random data;
As storage permits, use a 32 byte or 64 byte salt (actual size dependent on protection function);
Scheme security does not depend on hiding, splitting, or otherwise obscuring the salt.
Salts serve two purposes:
prevent the protected form from revealing two identical credentials and
augment entropy fed to protecting function without relying on credential complexity.
The second aims to make pre-computed lookup attacks on an individual credential and time-based attacks on a population intractable
Your thinking is correct, MD5 and SHA1 should never be used for password hashing. I would recommend the following, in order of preference:
argon2
bcrypt
scrypt
PBKDF2
If you tag your question with the language/framework you are using, I can recommend specific libraries or methods.
Also be aware that encryption is not the right word to use here. These are password hashing algorithms, not encryption algorithms.
Title says it all.
Since Quantum Computers are said to be the next big thing, I figured the speed at which these systems operate on should be enough to decrypt files/applications in a 'Brute Force' manner.
Is it possible? When will it be possible?
Quantum computers operate differently from classical computers, rather than faster or slower. For some problems they're much faster than the best known algorithms, while for others they'd be slower if they would work at all.
For decrypting, there are quantum algorithms for attacking some specific ciphers. Probably the best known is Shor's Algorithm, which on a large enough quantum computer would allow you to factor large numbers efficiently, thus breaking RSA. Breaking RSA would require many thousands of high-quality qubits, and so is not something that's going to be available in the next few years. Longer term, I myself wouldn't try to guess when such a quantum computer will be available, although others may have more confidence.
There are quantum attacks on other ciphers as well, including elliptic curve cryptography. The good news is that post-quantum cryptography is an active field of research, and there are some promising developments already. Also, most symmetric ciphers in use today are quantum-resistant; while brute force search time on a quantum computer would in theory scale with the square root of the number of possible keys, doubling the key size addresses this neatly.
There are good resources for this on Wikipedia: https://en.wikipedia.org/wiki/Shor%27s_algorithm and https://en.wikipedia.org/wiki/Post-quantum_cryptography. The Microsoft Quantum samples repository includes a Q# implementation of Shor's algorithm.
Threat of Quantum Computers to today's encrypted data is real. Please refer to "Harvest now Decrypt later attack".
You can implement Shor's algorithm in Q#. Shor's algorithm is the threat to
Asymmetric key cryptography.
You can also implement Grover's Algorithm in Q#. Grover's algorithm uses brute-force / unsorted search to search symmetric keys.
Much progress has been made with Post Quantum Cryptography (PQC) since this question was originally answered in 2018. NIST is driving a standardization process to identify PQC algorithms. These PQC Algorithms will replace RSA / ECC based DHE/DSA/LEM algorithms.
So more than "when is it possible?" - we have to act now because the threat is real and the encrypted data as we know today is perhaps being passively snapped ( we wont know about it). Data elements such as social security id (in USA) and similar information have a shelf life that exceeds 7 to 10 years.
I have lots of small secrets that I want to store encrypted in a database. The database client will have the keys, and the database server will not deal with encryption and decryption. All of my secrets are 16 bytes or less, which means just one block when using AES. I'm using a constant IV (and key) to make the encryption deterministic and my reason for doing deterministic encryption is to be able to easily query the database using ciphertext and making sure the same secret is not stored twice (by making the column UNIQUE). As far as I can see there should be no problem doing this, as long as the key is secret. But I want to be sure: Am I right or wrong? In case I'm wrong, what attacks could be done?
BTW: Hashes are quite useless here, because of a relatively small number of possible plaintexts. With a hash it would be trivial to obtain the original plaintext.
An ideal cipher, for messages of length n bits, is a permutation of the 2n sequences of n bits, chosen at random in the 2n! such permutations. The "key" is the description of which permutation was chosen.
A secure block cipher is supposed to be indistinguishable from an ideal cipher, with n being the block size. For AES, n=128 (i.e. 16 bytes). AES is supposed to be a secure block cipher.
If all your secrets have length exactly 16 bytes (or less than 16 bytes, with some padding convention to unambiguously extend them to 16 bytes), then an ideal cipher is what you want, and AES "as itself" should be fine. With common AES implementations, which want to apply padding and process arbitrarily long streams, you can get a single-block encryption by asking for ECB mode, or CBC mode with an all-zero IV.
All the issues about IV, and why chaining modes such as CBC were needed in the first place, come from multi-block messages. AES encrypts 16-byte messages (no more, no less): chaining modes are about emulating an ideal cipher for longer messages. If, in your application, all messages have length exactly 16 bytes (or are shorter, but you add padding), then you just need the "raw" AES; and a fixed IV is a close enough emulation of raw AES.
Note, though, the following:
If you are storing encrypted elements in a database, and require uniqueness for the whole lifetime of your application, then your secret key is long-lived. Keeping a secret key secret for a long time can be a hard problem. For instance, long-lived secret keys need some kind of storage (which resists to reboots). How do you manage dead hard disks ? Do you destroy them in an acid-filled cauldron ?
Encryption ensures confidentiality, not integrity. In most security models, attackers can be active (i.e., if the attacker can read the database, he can probably write into it too). Active attacks open up a full host of issues: for instance, what could happen if the attacker swaps some of your secrets within the database ? Or alters some randomly ? Encryption is, as always, the easy part (not that it is really "easy", but it is much easier than the rest of the job).
If the assembly is publicly available, or can become so, your key and IV can be discovered by using Reflector to expose the source code that uses it. That would be the main problem with this, if the data really were secret. It is possible to obfuscate MSIL, but that just makes it harder to trace through; it still has to be computer-consumable, so you can't truly encrypt it.
The static IV would make your implementation vulnerable to frequency attacks. See For AES CBC encryption, whats the importance of the IV?
I was wondering if I could reasons or links to resources explaining why SHA512 is a superior hashing algorithm to MD5.
It depends on your use case. You can't broadly claim "superiority". (I mean, yes you can, in some cases, but to be strict about it, you can't really).
But there are areas where MD5 has been broken:
For starters: MD5 is old, and common. There are tons of rainbow tables against it, and they're easy to find. So if you're hashing passwords (without a salt - shame on you!) - using md5 - you might as well not be hashing them, they're so easy to find. Even if you're hashing with simple salts really.
Second off, MD5 is no longer secure as a cryptographic hash function (indeed it is not even considered a cryptographic hash function anymore as the Forked One points out). You can generate different messages that hash to the same value. So if you've got a SSL Certificate with a MD5 hash on it, I can generate a duplicate Certificate that says what I want, that produces the same hash. This is generally what people mean when they say MD5 is 'broken' - things like this.
Thirdly, similar to messages, you can also generate different files that hash to the same value so using MD5 as a file checksum is 'broken'.
Now, SHA-512 is a SHA-2 Family hash algorithm. SHA-1 is kind of considered 'eh' these days, I'll ignore it. SHA-2 however, has relatively few attacks against it. The major one wikipedia talks about is a reduced-round preimage attack which means if you use SHA-512 in a horribly wrong way, I can break it. Obivously you're not likely to be using it that way, but attacks only get better, and it's a good springboard into more research to break SHA-512 in the same way MD5 is broken.
However, out of all the Hash functions available, the SHA-2 family is currently amoung the strongest, and the best choice considering commonness, analysis, and security. (But not necessarily speed. If you're in embedded systems, you need to perform a whole other analysis.)
MD5 has been cryptographically broken for quite some time now. This basically means that some of the properties usually guaranteed by hash algorithms, do not hold anymore. For example it is possible to find hash collisions in much less time than potentially necessary for the output length.
SHA-512 (one of the SHA-2 family of hash functions) is, for now, secure enough but possibly not much longer for the foreseeable future. That's why the NIST started a contest for SHA-3.
Generally, you want hash algorithms to be one-way functions. They map some input to some output. Usually the output is of a fixed length, thereby providing a "digest" of the original input. Common properties are for example that small changes in input yield large changes in the output (which helps detecting tampering) and that the function is not easily reversible. For the latter property the length of the output greatly helps because it provides a theoretical upper bound for the complexity of a collision attack. However, flaws in design or implementation often result in reduced complexity for attacks. Once those are known it's time to evaluate whether still using a hash function. If the attack complexity drops far enough practical attacks easily get in the range of people without specialized computing equipment.
Note: I've been talking only about one kind of attack here. The reality if much more nuanced but also much harder to grasp. Since hash functions are very commonly used for verifying file/message integrity the collision thing is probably the easiest one to understand and follow.
There are a couple of points not being addressed here, and I feel it is from a lack of understanding about what a hash is, how it works, and how long it takes to successfully attack them, using rainbow or any other method currently known to man...
Mathematically speaking, MD5 is not "broken" if you salt the hash and throttle attempts (even by 1 second), your security would be just as "broken" as it would by an attacker slowly pelting away at your 1ft solid steel wall with a wooden spoon:
It will take thousands of years, and by then everyone involved will be dead; there are more important things to worry about.
If you lock their account by the 20th attempt... problem solved. 20 hits on your wall = 0.0000000001% chance they got through. There is literally a better statistical chance you are in fact Jesus.
It's also important to note that absolutely any hash function is going to be vulnerable to collisions by virtue of what a hash is: "a (small) unique id of something else".
When you increase the bit space you decrease collision rates, but you also increase the size of the id and the time it takes to compute it.
Let's do a tiny thought experiment...
SHA-2, if it existed, would have 4 total possible unique IDs for something else... 00, 01, 10 & 11. It will produce collisions, obviously. Do you see the issue here? A hash is just a generated ID of what you're trying to identify.
MD5 is actually really, really good at randomly choosing a number based on an input. SHA is actually not that much better at it; SHA just has massive more space for IDs.
The method used is about 0.1% of the reason the collisions are less likely. The real reason is the larger bit space.
This is literally the only reason SHA-256 and SHA-512 are less vulnerable to collisions; because they use a larger space for a unique id.
The actual methods SHA-256 and SHA-512 use to generate the hash are in fact better, but not by much; the same rainbow attacks would work on them if they had fewer bits in their IDs, and files and even passwords can have identical IDs using SHA-256 and SHA-512, it's just a lot less likely because it uses more bits.
The REAL ISSUE is how you implement your security
If you allow automated attacks to hit your authentication endpoint 1,000 times per second, you're going to get broken into. If you throttle to 1 attempt per 3 seconds and lock the account for 24 hours after the 10th attempt, you're not.
If you store the passwords without salt (a salt is just an added secret to the generator, making it harder to identify bad passwords like "31337" or "password") and have a lot of users, you're going to get hacked. If you salt them, even if you use MD5, you're not.
Considering MD5 uses 128 bits (32 bytes in HEX, 16 bytes in binary), and SHA 512 is only 4x the space but virtually eliminates the collision ratio by giving you 2^384 more possible IDs... Go with SHA-512, every time.
But if you're worried about what is really going to happen if you use MD5, and you don't understand the real, actual differences, you're still probably going to get hacked, make sense?
reading this
However, it has been shown that MD5 is not collision resistant
more information about collision here
MD5 has a chance of collision (http://www.mscs.dal.ca/~selinger/md5collision/) and there are numerous MD5 rainbow tables for reverse password look-up on the web and available for download.
It needs a much larger dictionary to map backwards, and has a lower chance of collision.
It is simple, MD5 is broken ;) (see Wikipedia)
Bruce Schneier wrote of the attack that "[w]e already knew that MD5 is a broken hash function" and that "no one should be using MD5 anymore."