Since breaking password hashes has become a new pastime for script kiddies, I thought about the problem and came up with a novel(?) idea:
Store the password as offset + number instead of a hash.
The number is a product of two large primes.
The password is converted into a number, the offset is added to it, and the result is used to divide the number. If it divides evenly AND the divisor is the larger of the two primes, the password is correct.
By construction, each stored value is unique, and each password can be encoded in many different ways depending on the offset. Breaking one entry means you have to factor the number (which is hard), then find a word that maps to the larger prime minus the offset (which is trivial).
To generate: use a function f() to turn the password into a password-number (the details are not important), then generate two random primes larger than 2^4096, or however much is enough. Take the larger prime and calculate offset = prime - password-number. Multiply the primes to get "number". Store number and offset.
To check: use f() to turn the password into a password-number, add the offset to get a prime, and divide number by that prime to get the other prime. Check that the first prime is the larger of the two. If so, the password was correct.
f() might be for example the UTF-8 encoding of the password read as a large binary integer.
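A sketch of what I mean, in Python. It assumes the sympy library for random-prime generation, and the bound is scaled down from 2^4096 to 2^256 purely so the example runs quickly; the helper names are only for illustration:

    from sympy import randprime

    BOUND = 2 ** 256  # scaled down from the 2^4096 "or however much is enough"

    def f(password):
        # f(): interpret the UTF-8 encoding of the password as a big integer
        return int.from_bytes(password.encode("utf-8"), "big")

    def store(password):
        bn = f(password)
        lo = max(bn + 1, BOUND)
        q = randprime(lo, 2 * lo)      # the smaller prime, q > bn
        p = randprime(2 * lo, 4 * lo)  # the larger prime, p > q
        return p * q, p - bn           # store (number, offset)

    def check(password, number, offset):
        candidate = f(password) + offset          # should reconstruct the larger prime
        if candidate <= 1 or number % candidate:  # must divide the stored product
            return False
        return candidate > number // candidate    # and be the larger of the two factors

    number, offset = store("hunter2")
    print(check("hunter2", number, offset))  # True
    print(check("hunter3", number, offset))  # False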
Your procedure doesn't really gain you anything over using a hash function. Reversing your function is difficult, yes, since it requires factoring large numbers, but reversing regular hash functions is also difficult. An attacker can still use the same procedure they would against a regular hash algorithm: a brute-force attack that tests every possible password.
This, of course, is inevitable with any scheme that stores sufficient data to validate the password. The only solution is to make it computationally expensive for the attacker to do so, by making the hash function expensive to compute, and by adding a salt to make sure they can't precompute.
In general, trying to invent your own crypto system is very hard to do correctly. There are many little things that you have to consider, and it's easy to miss something that an attacker can exploit. You'd still be much better off and safer if you used an established cryptography or hashing library. Bcrypt for hashing will probably be much more secure than the solution you posted.
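For comparison, here's roughly what the recommended route looks like with the third-party bcrypt package for Python; the cost factor of 12 is just a common choice, not a prescription:

    import bcrypt

    # bcrypt generates a random salt and embeds it in the resulting hash string;
    # the cost factor (rounds) makes every guess expensive for an attacker.
    password = b"correct horse battery staple"
    hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

    # Verification re-derives the hash from the salt stored inside `hashed`.
    print(bcrypt.checkpw(password, hashed))        # True
    print(bcrypt.checkpw(b"wrong guess", hashed))  # False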
To formalize your scheme:
To create the hash:
User enters password pw
Convert pw to a byte array ba with an encoding function e
Convert ba to a large integer bn
Find prime numbers p and q, p > q > max(bn, 2^2048)
Store n = pq and o = p - bn
To verify the hash:
User enters password pw
Convert pw to a byte array ba with an encoding function e
Convert ba to a large integer bn
Verify that bn + o divides n (and, to match the original scheme, that bn + o is the larger of the two factors)
This being a secure hash requires that given n and o, it's not feasible to deduce pw, i.e. there is no algorithm that gives an advantage over guessing and checking. I believe it.
As I see it, the main benefit of your scheme is the randomness injected into the hashing process by selecting the random numbers. That they are primes and factoring should be hard is more of an implementation detail (it's your one-way function). Presumably it should also slow down checks, though I really don't know how slow division is on numbers that large.
It is interesting that the hash creation and password verification processes are so different. As you point out, this makes the technique of rainbow table hash chaining inapplicable. This may be something of an advantage, but per-user salting gets you similar protection from rainbow tables.
Related
Is it safe to hash an extremely complicated password (longer than 25 chars, any ASCII chars, even binary) with SHA1?
Actually, the password represents a token ID, but I don't want to store it like this in the database; I prefer to hash it for more security.
The password (token) is valid for only 14 days, and I need to hash it as fast as possible (so there is no way to use something like bcrypt).
What would be the ideal length of the password (token)?
In the general case, no. "Complicated" it may be, but cryptographically random it probably is not.
A bare minimum would be applying an RFC2104 HMAC with a secret key (pepper); however, a more appropriate alternative that can, if you absolutely insist, still be quite fast would be to use PBKDF2-HMAC-SHA-256 and ignore all rules of security regarding a sufficiently high iteration count, i.e. choose an iteration count of 10, instead of 10,000.
For password/token hashing, of course, never request more bytes of PBKDF2 output than the native hash function provides - 20 for SHA-1, 32 for SHA-256, 64 for SHA-512.
I have several example implementations of PBKDF2 at my Github repository that may help, and there are others in other languages, of course.
Use a cryptographically random per-password (per-token) salt.
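A sketch of that fast-but-salted approach using Python's standard hashlib.pbkdf2_hmac; the 16-byte salt length and the example token value are my assumptions, and the deliberately low iteration count of 10 follows the short-lived-token reasoning above:

    import hashlib
    import hmac
    import os

    def hash_token(token, salt=None, iterations=10):
        # Cryptographically random per-token salt, as recommended above.
        if salt is None:
            salt = os.urandom(16)
        # Never request more output than the native hash provides: 32 bytes for SHA-256.
        digest = hashlib.pbkdf2_hmac("sha256", token.encode("utf-8"), salt,
                                     iterations, dklen=32)
        return salt, digest

    def verify_token(token, salt, expected, iterations=10):
        _, digest = hash_token(token, salt, iterations)
        return hmac.compare_digest(digest, expected)  # constant-time comparison

    salt, stored = hash_token("example-token-value")
    print(verify_token("example-token-value", salt, stored))  # True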
Is it possible to write code that can crack an SHA-256 hash when you know the form of the password? For example, the password form is *-**********, which is 12-13 characters long, and:
The first part is a number from 1 to 25
The second part is a hyphen
Each character from the third to the end can be a-z, A-Z, or 0-9
After guessing each password, the code converts it with SHA-256, checks whether the resulting hash equals our hash, and prints the correct password.
I know the number of possibilities is huge, (26+26+10)^10, but I want to know:
Is it possible to write such code?
If yes, is it possible to run the whole search in less than one day (because I think it would take a lot of time to complete)?
Since I can't ask you to write the code for me, how and where can I ask for it?
You cannot "crack" a SHA256 hash no matter how much information you know about the plaintext (assuming by crack you mean derive the plaintext from the hash). Even if you knew the password you could not determine any procedure for reversing the hash. In technical terms, there is no known way to perform a preimage attack on a SHA256 hash.
That means you have to resort to guessing or brute forcing the password:
You have a prefix, which can be any value in [1-25]- and 10 additional characters in [a-zA-Z0-9]. That means the total number of possible passwords is: 25 * 62^10 or 20,982,484,146,708,505,600.
If you were able to compute and check a billion passwords per second it would take you 20,982,484,146 seconds to generate every possible hash. If you start now you'll be finished in about 665 years.
If you are able to leverage some more computing power and generate a trillion hashes per second it would only take a bit more than half a year. The good news is that computing hashes can be done in parallel, so it is easy to utilize multiple machines. The bad news is that kind of computing power isn't going to be cheap.
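The arithmetic above is easy to reproduce; a quick back-of-the-envelope check in Python, using the same assumed guess rates:

    # Reproduce the search-space and timing estimates above.
    search_space = 25 * 62 ** 10            # 20,982,484,146,708,505,600 candidates
    seconds_per_year = 3600 * 24 * 365.25

    for rate, label in [(10 ** 9, "a billion"), (10 ** 12, "a trillion")]:
        years = search_space / rate / seconds_per_year
        print(f"At {label} hashes/second: about {years:,.1f} years for the full space")
    # -> about 664.9 years and 0.7 years, matching the estimates above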
To answer your questions:
Is it possible to write such code? It is possible to write a program that will iterate over the entire range of possible passwords and check each one against the hash(es) you want to determine the plaintext for (see the sketch after this list).
If yes, is it possible to run the whole search in less than one day? Yes, if you can compute and check around 2.5*10^14 hashes per second.
How and where can I ask for this code? This is the least of your problems.
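To make the first answer concrete, here's a sketch of the loop such a program would run (itertools.product over the allowed alphabet). The demo shortens the random part to 3 characters so it finishes in seconds; the real 10-character search is the multi-century job estimated above:

    import hashlib
    import itertools
    import string

    ALPHABET = string.ascii_letters + string.digits  # a-z, A-Z, 0-9

    def crack(target_hex, suffix_len=10):
        # Try every '<1-25>-<suffix>' candidate until one hashes to target_hex.
        for prefix in range(1, 26):
            for combo in itertools.product(ALPHABET, repeat=suffix_len):
                candidate = f"{prefix}-{''.join(combo)}"
                if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
                    return candidate
        return None

    # Demo with a 3-character suffix only; the real search would use suffix_len=10.
    target = hashlib.sha256(b"7-xY9").hexdigest()
    print(crack(target, suffix_len=3))  # prints "7-xY9"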
Fortunately, since bitcoin uses sha256, it is pretty easy to find rough numbers on the amount of computing power it takes to generate the number of hashes you need.
If the numbers in this article are correct, a Raspberry Pi can generate 2*10^5 hashes per second. I believe the newer Raspberry Pis are more powerful than that, so I'm going to double that to 4*10^5. You need to generate about 2.5*10^14 hashes per second to be done in less than a day.
You're going to need over 600,000,000 Raspberry Pis.
I'm not a fan of complex passwords as I have a hard time remembering them. Because of that I like the message of this comic.
However typing the sentence "correct horse battery staple" into this calculator yields "12.41 trillion trillion trillion centuries" as opposed to the comic's "550 years".
How can they differ so much, which one is correct if any and how would I know?
How do I create a strong enough password without making it difficult to remember?
The reason for this difference is basically given on the linked site itself:
IMPORTANT!!! What this calculator is NOT . . .
It is NOT a “Password Strength Meter.”
Since it could be easily confused for one, it is very important for you to understand what it is, and what it isn't:
The #1 most commonly used password is “123456”, and the 4th most common is “Password.” So any password attacker and cracker would try those two passwords immediately. Yet the Search Space Calculator above shows the time to search for those two passwords online (assuming a very fast online rate of 1,000 guesses per second) as 18.52 minutes and 17.33 centuries respectively! If “123456” is the first password that's guessed, that wouldn't take 18.52 minutes. And no password cracker would wait 17.33 centuries before checking to see whether “Password” is the magic phrase.
The calculator basically only considers brute-force attempts, while an actual attack would probably be a dictionary attack. Since most combinations of letters are not actual words, a dictionary attack will try far fewer combinations, thus getting a result much faster.
The calculator assumes that the cracker uses exhaustive search. xkcd assumes that the cracker may know (or guess) your method of generating passwords and needs to check only the passwords you could have chosen. The xkcd method is far safer.
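Both headline numbers can be roughly reconstructed. The sketch below assumes the xkcd model (four words drawn at random from a 2048-word list, about 2^44 possibilities) and, for the calculator, a plain character-space search; the 59-symbol alphabet is my guess at how the calculator classifies a lowercase-plus-space passphrase, and both use its 1,000-guesses-per-second "online attack" rate:

    rate = 1000                    # guesses per second (the calculator's online-attack scenario)
    seconds_per_year = 3600 * 24 * 365.25

    # xkcd model: 4 words from a 2048-word list, all equally likely.
    xkcd_guesses = 2048 ** 4
    print(xkcd_guesses / rate / seconds_per_year)   # ~557 years, the comic's "550 years"

    # Character-space model: every string up to 28 chars over an assumed 59-symbol alphabet.
    brute_guesses = sum(59 ** n for n in range(1, 29))
    centuries = brute_guesses / rate / seconds_per_year / 100
    print(f"{centuries:.2e} centuries")             # on the order of 10^37 centuries

With these assumptions the second figure lands around 1.2*10^37 centuries, the same ballpark as the calculator's "12.41 trillion trillion trillion centuries".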
No password, however strong, is 100% safe; few websites can really protect users' passwords. You'd better not use only one password everywhere. What you do is keep the straw that catches fire away from the rest.
What I do is:
an unforgettable password: A;
the website asking for a password, "www.example.com", as B;
compute C = md5(A) + md5(B), and use the leading 8 characters of C as the password;
write a simple script for this; of course, you may adjust the algorithm, and do keep the script in the cloud.
The browser will save the password for us; if it asks you to re-enter the password, you can get it back at once.
The operator '+' is not string concatenation. It means:
Take both MD5 digests in lowercase hex, assign each character a value ('0' is 0, ..., 'a' is 10, ..., 'A' is 36, and so on), then add the two strings character by character, discarding the carry, and take each result mod 62.
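Roughly, in Python; the value table above is only loosely specified, so the base-62 alphabet and mapping here are just one reading of it, for illustration:

    import hashlib

    BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    VALUE = {c: i for i, c in enumerate(BASE62)}

    def site_password(master, site, length=8):
        a = hashlib.md5(master.encode("utf-8")).hexdigest()   # lowercase hex digest of A
        b = hashlib.md5(site.encode("utf-8")).hexdigest()     # lowercase hex digest of B
        # Character-wise addition mod 62, discarding the carry, as described above.
        summed = [(VALUE[x] + VALUE[y]) % 62 for x, y in zip(a, b)]
        return "".join(BASE62[v] for v in summed)[:length]

    print(site_password("my unforgettable password", "www.example.com"))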
I need to know the cost of succeeding with a Preimage attack ("In cryptography, a preimage attack on a cryptographic hash is an attempt to find a message that has a specific hash value.", Wikipedia).
The message I want to hash consists of six digits (the date of birth), then four random digits. This is a social security number.
Is there also a possibility to hash something using a specific password? This would introduce another layer of security, as one would have to know the password in order to produce the same hash value for a message.
I am thinking about using SHA-2.
If you want to know how expensive it is to find a preimage for the string you're describing, you need to figure out how many possible strings there are. Since the first 6 digits are a date of birth, their value is even more restricted than the naive assumption of 10^6: we have an upper bound of 366*100 (every possible day of the year, times the 100 possible two-digit years).
The remaining 4 'random' digits permit another 10^4 possibilities, giving a total number of distinct hashes of 366 * 100 * 10^4 = 366,000,000 hashes.
With that few possibilities, it ought to be possible to find a preimage in a fraction of a second on a modern computer - or, for that matter, to build a lookup table for every possible hash.
Using a salt, as Tom suggests, will make a lookup table impractical, but with such a restricted range of valid values, a brute force attack is still eminently practical, so it alone is not sufficient to make the attack impractical.
One way to make things more expensive is to use iterative hashing - that is, hash the hash, and hash that, repeatedly. You have to do a lot less hashing than your attacker does, so increases in cost affect them more than they do you. This is still likely to be only a stopgap given the small search space, however.
As far as "using a password" goes, it sounds like you're looking for an HMAC - a construction that uses a hash, but can only be verified if you have the key. If you can keep the key secret - no easy task if you're assuming the hashes can only be obtained if your system is compromised in the first place - this is a practical system.
Edit: Okay, so 'fractions of a second' may have been a slight exaggeration, at least with my trivial Python test. It's still perfectly tractable to bruteforce on a single computer in a short timeframe, however.
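A sketch of just how small that search space is, assuming an unsalted SHA-256 over a ten-digit string of the form YYMMDD plus four extra digits (the exact digit layout is an assumption); in plain Python the full ~366 million candidates take minutes rather than fractions of a second, which is the point of the edit above:

    import hashlib
    from datetime import date, timedelta

    def brute_force(target_hex):
        # Enumerate every YYMMDD date in a 100-year window plus a 4-digit suffix.
        day, end = date(1900, 1, 1), date(2000, 1, 1)
        while day < end:
            prefix = day.strftime("%y%m%d")
            for suffix in range(10000):
                candidate = f"{prefix}{suffix:04d}"
                if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
                    return candidate
            day += timedelta(days=1)
        return None

    # Worst case is ~365 million SHA-256 calls; this target is found much sooner.
    target = hashlib.sha256(b"0101014321").hexdigest()
    print(brute_force(target))  # prints "0101014321"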
SHA-2, salts, preimage attacks, brute forcing a restricted 10-digit number - man, it would be awesome if we had a dial we could turn that would let us adjust the security. Something like this:
Time to compute a hash of an input:

      SHA-2, salted                            Better security!
            |                                         |
           \|/                                       \|/
      |-----------------------------------------------------|
      .01 seconds                                   3 seconds
If we could do this, your application, when verifying that the user entered data matches what you have hashed, would in fact be a few seconds slower.
But imagine being the attacker!
Awesome, he's hashing stuff using a salt, but there's only 366,000,000 possible hashes, I'm gonna blaze through this at 10,000 a second and finish in ~10 hours!
Wait, what's going on! I can only do 1 every 2.5 seconds?! This is going to take me 29 years!!
That would be awesome, wouldn't it?
Sure would.
I present unto you: scrypt and bcrypt. They give you that dial. Want to spend a whole minute hashing a password? They can do that. (Just make sure you remember the salt!)
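That dial is just the cost parameters. A sketch with Python's built-in hashlib.scrypt; the specific n/r/p values below are illustrative settings chosen to show the scaling, not recommendations:

    import hashlib
    import os
    import time

    password = b"850321-4321"   # the kind of low-entropy secret discussed above
    salt = os.urandom(16)

    # Turning the dial: doubling n roughly doubles the time (and memory) per hash.
    for n in (2 ** 14, 2 ** 16, 2 ** 18):
        start = time.perf_counter()
        hashlib.scrypt(password, salt=salt, n=n, r=8, p=1, maxmem=2 ** 29)
        print(f"n = 2^{n.bit_length() - 1}: {time.perf_counter() - start:.2f} s per hash")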
I'm unsure what your question is exactly, but to make your hashed value more secure, use salt values.
Edit: I think you are sort of describing salt values in your question.
Given two different strings S1 and S2 (S1 != S2) is it possible that:
SHA1(S1) == SHA1(S2)
is True?
If yes - with what probability?
If not - why not?
Is there an upper bound on the length of an input string for which the probability of getting duplicates is 0? Or is the calculation of SHA1 (and hence the probability of duplicates) independent of the length of the string?
The goal I am trying to achieve is to hash some sensitive ID string (possibly joined together with some other fields like parent ID), so that I can use the hash value as an ID instead (for example in the database).
Example:
Resource ID: X123
Parent ID: P123
I don't want to expose the nature of my resource identifiers by letting the client see "X123-P123".
Instead, I want to create a new column containing hash("X123-P123"), let's say it's AAAZZZ. Then the client can request the resource with id AAAZZZ and not know about my internal IDs, etc.
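Something along these lines is what I have in mind, sketched in Python; the helper name is made up, and whether a plain hash or a keyed HMAC is appropriate depends on whether clients must be unable to recompute the IDs themselves:

    import hashlib

    def opaque_id(resource_id, parent_id):
        # Derive a stable, non-reversible public identifier from the internal IDs.
        return hashlib.sha1(f"{resource_id}-{parent_id}".encode("utf-8")).hexdigest()

    print(opaque_id("X123", "P123"))  # expose this value instead of "X123-P123"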
What you describe is called a collision. Collisions necessarily exist, since SHA-1 accepts many more distinct messages as input than it can produce distinct outputs (SHA-1 may eat any string of bits up to 2^64 bits, but outputs only 160 bits; thus, at least one output value must pop up several times). This observation is valid for any function with an output smaller than its input, regardless of whether the function is a "good" hash function or not.
Assuming that SHA-1 behaves like a "random oracle" (a conceptual object which basically returns random values, with the sole restriction that once it has returned output v on input m, it must always thereafter return v on input m), then the probability of collision, for any two distinct strings S1 and S2, should be 2^(-160). Still under the assumption of SHA-1 behaving like a random oracle, if you collect many input strings, then you shall begin to observe collisions after having collected about 2^80 such strings.
(That's 2^80 and not 2^160 because, with 2^80 strings you can make about 2^159 pairs of strings. This is often called the "birthday paradox" because it comes as a surprise to most people when applied to collisions on birthdays. See the Wikipedia page on the subject.)
Now we strongly suspect that SHA-1 does not really behave like a random oracle, because the birthday-paradox approach is the optimal collision searching algorithm for a random oracle. Yet there is a published attack which should find a collision in about 2^63 steps, hence 2^17 = 131072 times faster than the birthday-paradox algorithm. Such an attack should not be doable on a true random oracle. Mind you, this attack has not been actually completed; it remains theoretical (some people tried but apparently could not find enough CPU power). (Update: as of early 2017, somebody did compute a SHA-1 collision with the above-mentioned method, and it worked exactly as predicted.) Yet the theory looks sound and it really seems that SHA-1 is not a random oracle. Correspondingly, as for the probability of collision, well, all bets are off.
As for your third question: for a function with an n-bit output, there necessarily are collisions if you can input more than 2^n distinct messages, i.e. if the maximum input message length is greater than n. With a bound m lower than n, the answer is not as easy. If the function behaves as a random oracle, then the probability that a collision exists decreases as m decreases, not linearly but with a steep cutoff around m = n/2. This is the same analysis as the birthday paradox. With SHA-1, this means that if m < 80 then chances are that there is no collision, while m > 80 makes the existence of at least one collision very probable (with m > 160 this becomes a certainty).
Note that there is a difference between "there exists a collision" and "you find a collision". Even when a collision must exist, you still have your 2^(-160) probability every time you try. What the previous paragraph means is that such a probability is rather meaningless if you cannot (conceptually) try 2^160 pairs of strings, e.g. because you restrict yourself to strings of less than 80 bits.
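The birthday-paradox figures above are easy to check numerically with the standard approximation P ~ 1 - exp(-k^2 / 2^(n+1)) for k random n-bit outputs (the approximation itself is the only assumption here):

    import math

    def collision_probability(k, bits=160):
        # Birthday-paradox approximation: P ~ 1 - exp(-k^2 / 2^(bits + 1))
        return 1 - math.exp(-(k * k) / 2 ** (bits + 1))

    print(collision_probability(2 ** 80))  # ~0.39: collisions become likely around 2^80 inputs
    print(collision_probability(2 ** 60))  # ~4.5e-13: still essentially negligible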
Yes, it is possible, because of the pigeonhole principle.
Most hashes (SHA1 included) have a fixed output length, while the input can be of arbitrary size, so if you try long enough you can find collisions.
However, cryptographic hash functions (like the SHA family, the MD family, etc.) are designed to make such collisions hard to find. The best known attack takes 2^63 attempts to find a collision, so the chance of stumbling on one by accident is effectively zero in practice.
git uses SHA1 hashes as IDs and there are still no known SHA1 collisions in 2014. Obviously, the SHA1 algorithm is magic. I think it's a good bet that collisions don't exist for strings of your length, as they would have been discovered by now. However, if you don't trust magic and are not a betting man, you could generate random strings and associate them with your IDs in your DB. But if you do use SHA1 hashes and become the first to discover a collision, you can just change your system to use random strings at that time, retaining the SHA1 hashes as the "random" strings for legacy IDs.
A collision is almost always possible in a hashing function. SHA1, to date, has held up well: nobody has shown a way to generate collisions on demand. The danger is when collisions can be engineered; then it is not necessary to know the original hash input to generate the same hash output.
For example, attacks against MD5 were used against SSL server certificate signing last year, as discussed on Security Now podcast episode 179. This allowed sophisticated attackers to generate a fake SSL server certificate for a rogue web site that appeared to be the real thing. For this reason, it is highly recommended to avoid purchasing MD5-signed certs.
What you are talking about is called a collision. Here is an article about SHA1 collisions:
http://www.rsa.com/rsalabs/node.asp?id=2927
Edit: Another answerer beat me to mentioning the pigeonhole principle, LOL, but to clarify why it has that name: if you have some holes cut out for carrier pigeons to nest in, but more pigeons than holes, then some of the pigeons (the input values) must share a hole (the output value).