Can two bitcoin addresses have the same checksum? - bitcoin

I know that the checksum is extracted from the double SHA-256 of the RIPEMD-160 hash. So it must be possible for two bitcoin addresses to have the same checksum, even if it is statistically improbable for any given pair.
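For reference, a minimal sketch of how that 4-byte checksum is derived, assuming the standard Base58Check layout (version byte plus RIPEMD-160 digest):

    import hashlib

    def base58check_checksum(version: bytes, ripemd160_digest: bytes) -> bytes:
        # Checksum = first 4 bytes of SHA-256(SHA-256(version || hash160))
        payload = version + ripemd160_digest
        return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

    # Only 2^32 possible checksums exist, so by the birthday bound a shared
    # checksum is expected among roughly 2^16 randomly generated addresses.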

Yes, you can get the same bitcoin address from different public/private keys. This is called a "collision", and you can't have hash functions without collisions (see the pigeonhole principle). See also the related question on whether a 128-bit hash can be guaranteed collision-free (it can't).

Related

Is truncating sha2/sha3 to 16 bytes worse than using crc32 which itself gives 16 bytes to begin with? [closed]

I am using AES-128 in CBC mode, and I need a 16-byte key, so I was wondering if using SHA-2 or SHA-3 and then truncating it to 16 bytes (taking the first 16 bytes from the left) would make SHA-2/SHA-3 weaker than CRC32, which gives me 16 bytes out of the box.
Each bit of a cryptographically secure hash is effectively random (i.e. independent of all the other bits). This is not true of non-cryptographic hashes. This property is critical for a secure key. You should always use a cryptographic hash for key derivation.
Truncating a long secure hash is a perfectly acceptable way to create a secure hash of shorter length. You may also select any subset of bits rather than just the most significant or least significant. If this weren't true, then the original hash would not itself be secure, because it would suggest some non-randomness in the output.
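A minimal sketch, assuming the input keying material is already high-entropy (see the brute-force caveat below):

    import hashlib

    ikm = b"32 bytes of high-entropy keying material"  # hypothetical input
    digest = hashlib.sha256(ikm).digest()              # 32-byte SHA-256 output
    aes128_key = digest[:16]                           # truncate to 16 bytes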
SHA-2 and SHA-3 are designed to be cryptographically secure hashes (and at this point, we believe they are). CRC is not even designed to be cryptographically secure.
If the input key material is not itself random, then a fast hash like the SHA series may be subject to brute force. If so, then you need to use key stretching as well as hashing, for example with PBKDF2.
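For example, with Python's standard library (the iteration count here is an illustrative assumption; tune it to your hardware):

    import hashlib, os

    password = b"correct horse battery staple"  # low-entropy, human-chosen
    salt = os.urandom(16)                       # public, stored with the output
    # PBKDF2-HMAC-SHA256, stretched down to a 16-byte AES-128 key
    aes128_key = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000, dklen=16)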
But you should never use CRC for any of this. It is not intended to be a secure hash.
For more discussion, see Should I use the first or last bits from a SHA-256 hash? and “SHA-256” vs “any 256 bits of SHA-512”, which is more secure?
The question was not clear about how the input to the CRC or SHA-x is generated. The OP later clarified in a comment:
I mean regardless of the input (say the input was even abcd), would truncating SHA-2/3 to 16 bytes be more secure than using CRC32?
First of all, forget CRC: it is not a cryptographic hash function.
When the input space is small, hash functions are subject to a special case of the pre-image attack: the attacker can simply try all possible inputs to find the key. You can read more details in these Cryptography.SE Q&As:
Secure hashing when the input comes from a small space
Is it easy to crack a hashed phone number?
And don't forget how small the input space can be: entities like Bitcoin miners, or supercomputers like Summit, can reach 2^64 evaluations very easily, and 2^64 is only 8 bytes of input.
One should generate a strong password with a scheme like Diceware or BIP-39. These give you passwords that are both strong and easy to remember. See also the XKCD on password strength.
Once you have generated a good password, you can pass it to the poor man's KDF1, or better, use HKDF. Since your input keying material is already good, you can skip the extract step of HKDF (RFC 5869 permits this when the input is already a strong key). You can also use password-based key derivation functions like scrypt, PBKDF2, or Argon2. In that case, choose Argon2, since it won the Password Hashing Competition in July 2015.
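As an illustration, here is a minimal HKDF (RFC 5869) sketch in Python; the salt and info labels are hypothetical:

    import hmac, hashlib

    def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
        # Concentrate the input keying material into a pseudorandom key
        return hmac.new(salt, ikm, hashlib.sha256).digest()

    def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
        # T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until long enough
        okm, t, i = b"", b"", 1
        while len(okm) < length:
            t = hmac.new(prk, t + info + bytes([i]), hashlib.sha256).digest()
            okm += t
            i += 1
        return okm[:length]

    prk = hkdf_extract(b"app-salt", b"a good Diceware/BIP-39 passphrase")
    key = hkdf_expand(prk, b"aes-128-key", 16)
    iv  = hkdf_expand(prk, b"cbc-iv", 16)  # distinct info gives an independent IV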
I was just trying to encrypt data like sounds for a game with AES-128, and was wondering whether using only 16 bytes of a password-like key hashed with SHA-2/3 was a more secure solution than using a whole CRC32 output instead. I've never worked with any of the other methods you mentioned...
For proper use of CBC mode, you need an IV, too. You can use HKDF or PBKDF2, Argon2, etc. with a different info/context input to derive the IV as well; this is very common.
Some notes about CBC:
The IV must be unique under the same key, i.e. a (key, IV) pair must be used only once.
The CBC IV must be unpredictable to the attacker; however, as far as I can see, that is not a concern in your case.
CBC is vulnerable to padding oracle attacks on the server side. This does not apply in your case either.
CBC mode can only provide CPA security; it gives no integrity or authentication. To get integrity and authentication, either use HMAC with a different key, or use a combined mode.
Prefer an Authenticated Encryption with Associated Data (AEAD) mode such as AES-GCM or ChaCha20-Poly1305. Using GCM correctly can be hard; ChaCha20-Poly1305, or XChaCha20-Poly1305 if you want random nonces, is the safer choice.
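As a sketch of the AEAD recommendation, using the pyca/cryptography package (the plaintext and associated-data labels are made-up examples):

    # pip install cryptography
    import os
    from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

    key = ChaCha20Poly1305.generate_key()      # 32-byte key
    aead = ChaCha20Poly1305(key)
    nonce = os.urandom(12)                     # must never repeat under this key
    ct = aead.encrypt(nonce, b"game sound data", b"track-01")
    pt = aead.decrypt(nonce, ct, b"track-01")  # raises InvalidTag if tampered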

Encrypt / Decrypt uidata with "homemade" algorithm

I'm just working on an algorithm, and so far I can encrypt and decrypt a number, which works fine. My question now is: how do I go about encrypting an image? How does the UIdata look, and should I convert the image to that before I start? I've never done anything on this level in terms of encryption, and any input would be great! Thanks!
You'll probably want to encrypt in small chunks - perhaps a byte, a word/int (4 bytes), or maybe even a long (8 bytes) at a time, depending on how your algorithm is implemented.
I don't know the signature of your algorithm (i.e. what types of input it takes and what types of output it gives), but the most common ciphers are block ciphers, i.e. algorithms which take an input of some block size (nowadays 128 bits = 16 bytes is a common size) and produce a same-sized output, in addition to a key input (which should also be at least 128 bits).
To encrypt longer pieces of data (and actually also for short pieces, if you send multiple such pieces with the same key), you use a mode of operation (and usually a padding scheme as well). This gives you an algorithm (or a pair of them) with an arbitrary-length plaintext input and a slightly bigger ciphertext output (which the decryption algorithm then undoes).
Some hints:
Don't use ECB mode (i.e. simply encrypting each block independently of the others).
You probably should also apply a MAC, to protect your data against malicious modifications (and also against breaking of the encryption scheme by chosen-ciphertext attacks). Some modes of operation already include a MAC.
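To make these hints concrete, here is a minimal sketch using a standard cipher instead of a homemade one, assuming the pyca/cryptography package: AES-CBC with PKCS7 padding over arbitrary bytes (such as image data). Per the second hint, you would additionally MAC the output:

    # pip install cryptography
    import os
    from cryptography.hazmat.primitives import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt_bytes(key: bytes, data: bytes) -> bytes:
        iv = os.urandom(16)                               # fresh random IV per message
        padder = padding.PKCS7(128).padder()              # pad to 16-byte blocks
        padded = padder.update(data) + padder.finalize()
        enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
        return iv + enc.update(padded) + enc.finalize()   # prepend IV for decryption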

Does an internal hash digest in a message strengthen an outer digest?

A message digest is being used to verify that a message is the intended one.
By how much would bundling a hash digest with contents to form the message increase the difficulty of collision and preimage attacks against the message?
For example, to encode:
message = data . hash1(data)
message_hash = hash2(message)
To verify message using message_hash:
check(hash2(message) == message_hash)
data = message[:-digest_size]
check(hash1(data) == message[-digest_size:])
hash1 and hash2 could be completely different types of hash functions.
My reasoning for this was that any attack would have to break both hash functions - faking the outer digest would require constructing a message with a valid inner hash.
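For concreteness, a runnable version of the scheme, using MD5 as hash1 and SHA-256 as hash2 (stand-ins for two different hash functions):

    import hashlib

    DIGEST_SIZE = 16  # MD5's digest size, used as the inner hash

    def encode(data: bytes) -> bytes:
        return data + hashlib.md5(data).digest()                  # data . hash1(data)

    def verify(message: bytes, message_hash: bytes) -> bytes:
        assert hashlib.sha256(message).digest() == message_hash   # outer check
        data, inner = message[:-DIGEST_SIZE], message[-DIGEST_SIZE:]
        assert hashlib.md5(data).digest() == inner                # inner check
        return data

    msg = encode(b"payload")
    assert verify(msg, hashlib.sha256(msg).digest()) == b"payload"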
If the outer hash algorithm is broken, the inner hash could help, but you have to consider how likely that scenario is with a well-respected algorithm.
If the outer hash is so small that a brute force attack is feasible, the inner hash wouldn't help much at all. Instead of finding a message with the same hash, the attacker would have to find a message plus inner hash with the same outer hash, which pretty much amounts to the same thing.
So make the hash as large as you can, and concentrate on making sure there are no back doors in the rest of your system. 64 bits is probably just about OK unless you are anticipating a government or major corporation taking an interest in breaking your hash.
Your proposal reminds me somewhat of HMAC. This is a construction that allows one to create message authentication codes - keyed hashes, if you wish.
However, I don't see the point of using two hash functions. Pick one of the standard ones that have resisted attacks so far and go with it. If you assume one of them will get broken, why use it in the first place? SHA-2 or any of the finalists of the SHA-3 competition should be fine if you want strong security; more info here: http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo.
In some situations, the inner hash may make the task more difficult for the attacker, but not necessarily. For instance, if you use MD5 for both hash functions, then a collision for the inner hash would also imply a collision for the outer hash, given the iterated structure of MD5.
So adding the inner hash function will not necessarily increase resistance to collisions and preimages. From a purely theoretical point of view, it may actually decrease resistance, although this is quite improbable, especially if the hash functions are secure (but if the functions are secure, then the construction is pointless). On a more practical level, this double hashing increases the computational workload (more CPU, and possibly bigger code - hence more L1 cache usage - if the two functions are not the same). So my advice would be not to do that. Instead, use a single "believed secure" hash function such as SHA-256. The hash function will not be the biggest weakness in your application (or, more precisely, if the hash function is the biggest weakness in your application, then you are a programming god and/or Donald Knuth).
As an illustration, SSL/TLS uses MD5 and SHA-1 simultaneously as an attempt to resist weaknesses on either of the two functions. But the newer TLS 1.2 version switches to SHA-256 only.

Reasons why SHA512 is superior to MD5

I was wondering if I could get reasons, or links to resources explaining, why SHA-512 is a superior hashing algorithm to MD5.
It depends on your use case. You can't broadly claim "superiority". (I mean, yes you can in some cases, but to be strict about it, you can't really.)
But there are areas where MD5 has been broken:
For starters: MD5 is old and common, and there are tons of rainbow tables against it that are easy to find. So if you're hashing passwords with MD5 (without a salt - shame on you!), you might as well not be hashing them at all; they're that easy to look up. Even simple salts don't help much, really.
Second, MD5 is no longer secure as a cryptographic hash function (indeed, it is not even considered a cryptographic hash function anymore, as the Forked One points out). You can generate different messages that hash to the same value. So if you've got an SSL certificate with an MD5 hash on it, I can generate a duplicate certificate that says what I want and produces the same hash. This is generally what people mean when they say MD5 is 'broken' - things like this.
Third, similarly to messages, you can also generate different files that hash to the same value, so using MD5 as a file checksum is 'broken' as well.
Now, SHA-512 is a SHA-2 family hash algorithm. SHA-1 is kind of considered 'eh' these days; I'll ignore it. SHA-2, however, has relatively few attacks against it. The major one Wikipedia talks about is a reduced-round preimage attack, which means that if you use SHA-512 in a horribly wrong way, it can be broken. Obviously you're not likely to be using it that way, but attacks only get better, and it's a good springboard into more research on breaking SHA-512 the same way MD5 is broken.
However, out of all the hash functions available, the SHA-2 family is currently among the strongest, and the best choice considering commonness, analysis, and security. (But not necessarily speed. If you're in embedded systems, you need to perform a whole other analysis.)
MD5 has been cryptographically broken for quite some time now. This basically means that some of the properties usually guaranteed by hash algorithms no longer hold. For example, it is possible to find hash collisions in much less time than the output length would theoretically require.
SHA-512 (one of the SHA-2 family of hash functions) is secure enough for now, but possibly not for much longer. That's why NIST started a competition for SHA-3.
Generally, you want hash algorithms to be one-way functions. They map some input to some output. Usually the output is of a fixed length, thereby providing a "digest" of the original input. Common properties are, for example, that small changes in input yield large changes in the output (which helps detect tampering) and that the function is not easily reversible. For the latter property, the length of the output greatly helps, because it provides a theoretical upper bound on the complexity of a collision attack. However, flaws in design or implementation often result in reduced complexity for attacks. Once those are known, it's time to re-evaluate whether to keep using that hash function. If the attack complexity drops far enough, practical attacks easily get within range of people without specialized computing equipment.
Note: I've been talking only about one kind of attack here. The reality is much more nuanced, but also much harder to grasp. Since hash functions are very commonly used for verifying file/message integrity, the collision attack is probably the easiest one to understand and follow.
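To illustrate the tamper-detection property mentioned above, flipping a single character changes roughly half of the output bits (the avalanche effect):

    import hashlib

    def bit_diff(x: bytes, y: bytes) -> int:
        return sum(bin(a ^ b).count("1") for a, b in zip(x, y))

    d1 = hashlib.sha512(b"hello world").digest()
    d2 = hashlib.sha512(b"hello worle").digest()     # single character changed
    print(bit_diff(d1, d2), "of 512 bits differ")    # roughly 256, by design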
There are a couple of points not being addressed here, and I feel they stem from a lack of understanding about what a hash is, how it works, and how long it takes to successfully attack one, using rainbow tables or any other method currently known to man...
Mathematically speaking, MD5 is not "broken" if you salt the hash and throttle attempts (even by 1 second); in that case, your security is "broken" only in the sense that an attacker slowly pelting away at your 1 ft solid steel wall with a wooden spoon will eventually get through:
It will take thousands of years, and by then everyone involved will be dead; there are more important things to worry about.
If you lock their account by the 20th attempt... problem solved. 20 hits on your wall = 0.0000000001% chance they got through. There is literally a better statistical chance you are in fact Jesus.
It's also important to note that absolutely any hash function is going to be vulnerable to collisions by virtue of what a hash is: "a (small) unique id of something else".
When you increase the bit space you decrease collision rates, but you also increase the size of the id and the time it takes to compute it.
Let's do a tiny thought experiment...
A hypothetical 2-bit SHA, if it existed, would have 4 total possible unique IDs: 00, 01, 10 and 11. It would obviously produce collisions. Do you see the issue here? A hash is just a generated ID of whatever you're trying to identify.
MD5 is actually really, really good at randomly choosing a number based on an input. SHA is actually not that much better at it; SHA just has massively more space for IDs.
The method used is about 0.1% of the reason the collisions are less likely. The real reason is the larger bit space.
This is literally the only reason SHA-256 and SHA-512 are less vulnerable to collisions; because they use a larger space for a unique id.
The actual methods SHA-256 and SHA-512 use to generate the hash are in fact better, but not by much; the same rainbow attacks would work on them if they had fewer bits in their IDs, and files and even passwords can have identical IDs under SHA-256 and SHA-512 too. It's just a lot less likely, because they use more bits.
The REAL ISSUE is how you implement your security
If you allow automated attacks to hit your authentication endpoint 1,000 times per second, you're going to get broken into. If you throttle to 1 attempt per 3 seconds and lock the account for 24 hours after the 10th attempt, you're not.
If you store the passwords without salt (a salt is just extra input to the generator, making it harder to recognize bad passwords like "31337" or "password") and have a lot of users, you're going to get hacked. If you salt them, even if you use MD5, you're not.
Considering MD5 uses 128 bits (32 characters in hex, 16 bytes in binary), and SHA-512 takes only 4x the space but virtually eliminates collisions by giving you 2^384 times more possible IDs... go with SHA-512, every time.
But if you're worried about what is really going to happen if you use MD5, and you don't understand the real, actual differences, you're still probably going to get hacked. Make sense?
Try reading this:
However, it has been shown that MD5 is not collision resistant.
More information about collisions here.
MD5 has a chance of collision (http://www.mscs.dal.ca/~selinger/md5collision/) and there are numerous MD5 rainbow tables for reverse password look-up on the web and available for download.
SHA-512 needs a much larger dictionary to map backwards, and has a lower chance of collision.
It is simple: MD5 is broken ;) (see Wikipedia)
Bruce Schneier wrote of the attack that "[w]e already knew that MD5 is a broken hash function" and that "no one should be using MD5 anymore."

What is the best way to determine duplicate credit card numbers without storing them?

I run a website where we mark certain accounts as scammers, and "flag" their account and all credit cards used as being bad. We don't store actual credit card numbers, but store an MD5 hash of each instead.
We are hitting collisions all the time now. What is the best way to store these values - non-reversible, but able to be compared against future values?
I thought MD5 would be the best, but we've got a debate going on here...
A cryptographically secure hash would work (SHA-512 or SHA-256 would be OK).
However, I would use a fairly secret salt that is not stored along with the cards (to prevent any sort of rainbow table attack).
PS:
Rainbow table attacks against credit cards could be particularly effective, since the total size of the plaintext space is quite small due to the limited character set, the fixed size, and the check digits.
PPS:
You can't use a random salt for each entry, because you would never be able to feasibly check for duplicates. Per-entry salts make identical inputs hash to different values, whereas here we specifically want identical inputs to produce a match.
It isn't sufficiently safe to just use a good hash algorithm. If your list is stolen, your stored hashes can be used to retrieve working card information. The actual schema space for credit card numbers is small enough that a determined attacker can pre-calculate many of the possible hashes ahead of time, and this may have other implications for your system if there is an intrusion or an inside job.
I recommend you use a salt and also calculate a 2nd value to be added to the salt based on a formula involving each digit of the card number and the first salt value. This assures that if you lose control of either part, you still have reasonable uniqueness that renders ownership of the list useless. The formula should not be heavily weighted toward the first 6 digits of the card (BIN number), though, and no trace of the formula should be stored in the same location as either the salt or the final hash.
Consider the anatomy of a 16-digit credit card number:
6 digit BIN (Bank Identification Number)
9 digit Account Number
1 digit Luhn Checksum
BIN lists are well known within the processing industry and are not too difficult to assemble for those with access to an illicit list of card numbers. The number of valid BINs is further diminished by the assigned space for each issuer.
Visa - Starts with 4
American Express - Starts with 34 / 37
MasterCard - Starts with 5
Discover/CUP - Starts with 6
Diners Club - Starts with 36 / 38
etc.
Note that some of the assigned BIN information within each issuer category is also sparse. If an attacker is aware of where most of your customers are located, then that will cut down the uniqueness considerably, as BIN information is assigned on a per-bank basis. An attacker that already has an account issued by a small bank in a wealthy neighborhood could just get an account and use the BIN as a starting point on his own card.
The checksum digit is calculated with a well-known formula, so that is immediately discardable as a source of unique data.
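That well-known formula is the Luhn algorithm; a minimal sketch:

    def luhn_check_digit(partial: str) -> int:
        """Check digit to append to the leading digits of a card number."""
        total = 0
        for i, ch in enumerate(reversed(partial)):
            d = int(ch)
            if i % 2 == 0:  # double every second digit from the right
                d = d * 2 - 9 if d * 2 > 9 else d * 2
            total += d
        return (10 - total % 10) % 10

    assert luhn_check_digit("7992739871") == 3  # classic worked example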
Armed with a handful of BINs worth targeting, an attacker has to check 9 digits at a time for each BIN set. This is 1 billion checksum-and-hash operations per set. I don't have any benchmarks handy, but I'm pretty sure 1 million hash operations per minute is not unreasonable for MD5 or any flavor of SHA on a suitably powerful machine. This amounts to less than a day to crack all matches under a given BIN.
Finally, you might consider storing a timestamp or visitor token (IP/subnet) with your hashes as well. It is nice to catch duplicate card numbers, but also consider the ramifications of someone stuffing your system with bogus card numbers. At some point you need to decide on a trade-off between blocking card numbers that you know are invalid, and also give yourself a mechanism to identify and repair misuse.
For example, a disgruntled employee could be stealing card information on his own and then use your hash mechanism against you by inserting valid hashes into your card number blacklist to block repeat business. It is quite expensive to undo this if you are just storing a hash - everything is opaque once it has been converted to a hash. With this in mind, give yourself a method to identify the source of the hash as well.
Perhaps you can store two different hashes of the card number. The chances that both hashes will result in collisions are practically zero.
Use SHA-1; hash collisions are yet to be found.
People pointing out that a hash is "broken" are missing the point, perhaps regurgitating something they've heard without understanding what it means. When people talk about hashes being 'broken', they typically mean that it is possible to easily generate an alternate payload that computes to the same hash.
This 'breaks' the hash but only for the specific purpose of using a hash to verify data is what it's supposed to be.
That isn't what's important here: someone managing to create an alternate data stream that happens to hash to the same value as one of the credit cards doesn't achieve anything meaningful or useful as an attack vector.
The risk with hashes here is that the problem space for credit card numbers is pretty low and rainbow tables for them would be pretty cheap and easy to generate.
Adding a salt would add a bit of protection against already-generated rainbow tables for pure card numbers, but the extent to which it offers any real protection depends on how 'secret' the salt remains if you are compromised. If the salt is exposed, then new rainbow tables can be cheaply generated, and it's all over.
Given that the salt needs to be available to the application for it to perform checks against the blacklist, there's a good chance that someone compromising the blacklist data will also be able to get to the salt. If you have multiple servers, you can mitigate that to some degree by ensuring the salt and the data aren't in the same 'place', so that exposure of one server won't give someone all of the parts they need. (Similarly, for backups, don't store the data and the salt on the same media, where someone can walk away with one tape and get everything.) The salt only adds some protection while it is secret (in this type of use).
If you have the resources to do it securely, then I think that is the route to go. If you are getting a significant number of collisions on any reasonable hash function, you must be doing a lot of volume. (In fact, I'm highly surprised collisions would be a problem even then; any reasonable hash function should provide diverse results over a small problem space like this.)
As others have said, HMAC should be the way to go.
HMAC-SHA-256 with a proper key (see the sketch after this list) should:
Avoid collisions.
Avoid retrieval of the credit card number from the stored value.
Prevent an attacker from performing the same computation (on all possible credit card numbers, to find a matching value).
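A minimal sketch of that construction (the key here is generated inline for the example; in practice it would come from an HSM/KMS and never be stored next to the tags):

    import hmac, hashlib, os

    def card_fingerprint(key: bytes, pan: str) -> str:
        # Same key + same PAN -> same tag, so duplicates can be matched
        # without storing a recoverable card number.
        return hmac.new(key, pan.encode(), hashlib.sha256).hexdigest()

    key = os.urandom(32)                             # hypothetical: from an HSM/KMS
    tag = card_fingerprint(key, "4111111111111111")  # well-known test PAN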
But there is one more very important thing:
It is with good reason that you are not storing the credit card numbers. Even if you could be 100% sure that you are using proper encryption, you probably still would not store credit card numbers. Why? For one thing, because the key could be leaked.
So you store hashes, so that the credit card number cannot be retrieved. ...Right?
Well, if you use a plain hash, a simple rainbow table with hashes of all possible credit card numbers gives away all the original data that you presumably did not store. Oops. But this you knew by now.
So we try to do better. Let's say using individual salts is better, and using HMAC is the best approach we know.
Consider the following scenario:
Take a 16-digit card number.
First 6 digits (Bank Identification Number) are guessed by trying a few common BINs.
Last 4 digits are visible in masked card number, which you are allowed to store. (You might not have this stored, which helps.)
1 digit is calculated (Luhn).
This leaves 5 digits to be brute-forced. That is a meager 100'000 attempts.
If we have used the individual salts, it's game over. We can simply brute-force each individual card number at an average of 50'000 attempts.
If we have used HMAC, we appear to be safe. But remember... we choose not to store encrypted card numbers, because even with perfect encryption, the key could be leaked. Guess what. Our HMAC key can be leaked just the same. With the key, again, we can brute-force each individual card number at an average of 50'000 attempts. So a leaked key gives us the credit card numbers, just as it would if we had stored encrypted card numbers.
As such, because of the low entropy of credit card numbers, storing hashes does not add much security compared to encrypted values (yet PCI limits the key rotation requirement to encryption).
A bit of perspective:
Ok, we're assuming a leaked key here. Extreme. But then again, so does PCI as part of their reasoning to forbid you from storing credit card numbers, so we should at least consider it.
True, I did not take into account the multiple guesses to find the BIN. It should be a small constant, though. Or we could limit ourselves to one BIN.
Definitely, a PCI auditor may be more forgiving than I am.
Yes, if you do not store the masked card number, you are a factor 10'000 safer. This helps a lot. Use it to your advantage. Still, if 50K attempts are doable, 500M may be doable, too. It's not enough to make me consider the data secure, in the context of a compromised key.
Conclusion:
Use HMAC-SHA-256. Understand the risk. Store as little as possible. Protect your keys vigilantly. Spend a fortune on a Hardware Security Module :-)
If you are finding collisions with MD5, why not use a better algorithm such as SHA1 or SHA256?
MD5 is NOT the way to go, since it's broken. To quote Bruce Schneier: "[w]e already knew that MD5 is a broken hash function" and "no one should be using MD5 anymore."
That is, use SHA-512 or SHA-256, as someone already proposed.
As Henri already mentioned above (+1), the right solution is to use a message authentication code such as HMAC with a secret key. This is exactly the "secret salt" someone mentioned before. (BTW, salts proper are always public; a secret "salt" is really a key.)
Use standard construction such as HMAC-SHA-256 (RFC2104, FIPS-198a), keep the key secret and store the results (authentication tags) in a database.
The larger digest size (256 bits) of SHA-256 should prevent any collisions from happening. SHA-256 is a fairly good hash function, and the probability of a random collision is about 2^-128, so if you ever encounter a collision in your system, please let me know! :)
Using the strongest hash possible is usually good. Speed is not of the essence and slowness actually works against anyone trying a brute force reversal of your hashed values.
I like Whirlpool, personally - if you're using PHP, check out the supported algorithms in the hash function docs.
Whirlpool returns a string 128 characters long, but you don't necessarily have to store all of it. The first 32 or 64 chars would suffice. You could also consider SHA-512 or SHA-384.
Don't bother with salts, just use HMAC. I know it's kind of an abuse, but then you get a decent keyed hash, so you can prevent collisions and rainbow table attacks.
The nice thing here is that even if the key leaks, nobody can directly reverse the stored values; the best attack against HMAC is brute force. Actually, the key here plays the role of the salt mentioned earlier. The nice thing is that this construction is a little better than the usual salting schemes rolled by most non-security programmers.