In my program, we store a user's IP address in a record. When we display a list of records to a user, we don't want to give away the other user's IP, so we SHA1 hash it. Then, when the user clicks on a record, it goes to a URL like this:
http://www.example.com/allrecordsbyipaddress.php?ipaddress=SHA1HASHOFTHEIPADDRESS
Now, I need to list all the records by the IP address specified in the SHA1 hash. I tried this:
SELECT * FROM records
WHERE SHA1(IPADDRESS)="da39a3ee5e6b4b0d3255bfef95601890afd80709"
but this does not work. How would I do this?
Thanks,
Isaac Waller
Don't know if it matters, but your SHA1 hash da39a3ee5e6b4b0d3255bfef95601890afd80709 is a well-known hash of an empty string.
Is it just an example or you forgot to provide an actual IP address to the hash calculation function?
Update:
Does your webpage code generate SHA1 hashes in lowercase?
This check will fail in MySQL:
SELECT SHA1('') = 'DA39A3EE5E6B4B0D3255BFEF95601890AFD80709'
In this case, use this:
SELECT SHA1('') = LOWER('DA39A3EE5E6B4B0D3255BFEF95601890AFD80709')
, which will succeed.
Also, you can precalculate the SHA1 hash when you insert the records into the table:
INSERT
INTO ip_records (ip, ip_sha)
VALUES (#ip, SHA1(CONCAT('my_secret_salt', #ip)))
SELECT *
FROM ip_records
WHERE ip_sha = #my_salted_sha1_from_webpage
This will return you the original IP and allow indexing of ip_sha, so that this query will work fast.
I'd store the SHA1 of the IP in the database along with the raw IP, so that the query would become
SELECT * FROM records WHERE ip_sha1 = "..."
Then I'd make sure that the SHA1 calculation happens in exactly one place in code, so that there's no opportunity for it to be done slightly differently in multiple places. That also gives you the opportunity to mix a salt into the calculation, so that someone can't simply compute the SHA1 of an IP address they're interested in and pass that in by hand.
Storing the SHA1 hash in the database also gives you the opportunity to add a secondary index on ip_sha1 to speed up that SELECT. If you have a very large data set, doing the SHA1 in the WHERE clause forces the database to do a complete table scan, along with redoing the calculation for every record on every scan.
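Here is a minimal sketch of that idea in Python, using an in-memory SQLite table in place of MySQL. The names ip_token, records_for_token and the salt value are made up for illustration. The token is computed in a single helper, stored next to the raw IP, and looked up through an index.

```python
import hashlib
import sqlite3

SALT = "my_secret_salt"  # placeholder; keep the real salt out of source control


def ip_token(ip: str) -> str:
    """The one place where the salted SHA1 of an IP address is computed."""
    return hashlib.sha1((SALT + ip.strip()).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, ip TEXT, ip_sha1 TEXT)")
conn.execute("CREATE INDEX idx_records_ip_sha1 ON records (ip_sha1)")

# Store the token alongside the raw IP at insert time.
for ip in ("1.2.3.4", "5.6.7.8", "1.2.3.4"):
    conn.execute(
        "INSERT INTO records (ip, ip_sha1) VALUES (?, ?)", (ip, ip_token(ip))
    )


def records_for_token(token: str):
    """Look records up by the token taken from the URL; no per-row hashing."""
    return conn.execute(
        "SELECT id, ip FROM records WHERE ip_sha1 = ? ORDER BY id",
        (token.lower(),),
    ).fetchall()


print(records_for_token(ip_token("1.2.3.4")))
```

For ASCII input, this helper should produce the same value as SHA1(CONCAT('my_secret_salt', ip)) in MySQL, since both hash the same bytes and return lowercase hex.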
Every time I've had an unexpected hashing mismatch, it was because I accidentally hashed a string that included some whitespace, such as "\n".
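A quick way to check for both problems mentioned in this thread (hashing an empty string, and hashing a string with stray whitespace) is to compare digests directly. This is only a small Python illustration; sha1_hex is a helper name made up here.

```python
import hashlib

# SHA1 of the empty string, as quoted earlier in the thread.
EMPTY_SHA1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"


def sha1_hex(text: str) -> str:
    """Return the lowercase hex SHA1 digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# If the value in the URL equals EMPTY_SHA1, no IP was passed to the hash.
print(sha1_hex("") == EMPTY_SHA1)

# A trailing newline changes the digest completely.
print(sha1_hex("1.2.3.4") == sha1_hex("1.2.3.4\n"))

# Stripping whitespace before hashing makes the two agree again.
print(sha1_hex("1.2.3.4") == sha1_hex("1.2.3.4\n".strip()))
```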
Just a quick thought: that's a very simple obfuscation. There are only 2^32 possible IPv4 addresses, so if somebody with technical knowledge wanted to figure it out, they could do that by calculating all 4 billion hashes, which wouldn't take very long. Depending on the sensitivity of those IP addresses, you may want to consider a private lookup table.
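To make the brute-force point concrete, here is a small Python sketch that recovers an address from its unsalted SHA1 by walking a network range. It only scans a /16 so that it finishes quickly; the full IPv4 space is the same loop over more addresses. The function name find_ip is made up for this example.

```python
import hashlib
import ipaddress
from typing import Optional


def find_ip(target_hash: str, network: str) -> Optional[str]:
    """Try every address in `network` until one hashes to `target_hash`."""
    target = target_hash.lower()
    for addr in ipaddress.ip_network(network):
        if hashlib.sha1(str(addr).encode("ascii")).hexdigest() == target:
            return str(addr)
    return None


# Pretend this hash was copied from a URL.
leaked = hashlib.sha1(b"10.0.37.200").hexdigest()
print(find_ip(leaked, "10.0.0.0/16"))
```

A secret salt (or an opaque lookup key) defeats this loop only as long as the secret stays private.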
Did you compare the output of your hash algorithm with the output of MySQL's SHA1()? For example for IP address 1.2.3.4?
I ended up encrypting the IP addresses, and decrypting them on the other page. Then I can just use the raw IP address in the SQL query. Also, it protects against brute force attacks, like Autocracy said.
Related
We have phone number fields that we need to obfuscate in a UAT environment, the problem is that the number needs to be unique, and should match other data processes using other databases that are also obfuscated. I'm trying to create a function that will reliably scramble a number, and each number passed in produces the same scrambled number every time, using some kind of encryption key that we'll store safely. I haven't found a way to reliably reproduce numbers in the same 10 digit format. Any ideas?
Why not use any hash function that will give you a guid?
E.g.
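hashlib.sha256(b'012345677899').hexdigest()
in Python (the built-in hash() is randomized per process for strings, so it will not give the same value every time)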
or
SELECT HASHBYTES('SHA2_256', '0103203803') in t-sql
https://learn.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver15
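A plain hash does not keep the 10-digit format, and truncating it to 10 digits can produce duplicates. One alternative, which is not from the answers in this thread, is a keyed permutation over the 10-digit space: every input maps to exactly one 10-digit output, and the same key always gives the same mapping. The Python sketch below uses a small Feistel construction with HMAC-SHA256 as the round function. The function names and the number of rounds are assumptions of this example, and it is not a vetted format-preserving encryption scheme.

```python
import hashlib
import hmac

HALF = 10 ** 5  # a 10-digit number is split into two 5-digit halves


def _round_value(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: HMAC of the round number and one half."""
    msg = f"{round_no}:{half}".encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % HALF


def scramble_phone(number: str, key: bytes, rounds: int = 4) -> str:
    """Map a 10-digit string to another 10-digit string, one-to-one."""
    if len(number) != 10 or not (number.isascii() and number.isdigit()):
        raise ValueError("expected exactly 10 digits")
    left, right = int(number[:5]), int(number[5:])
    for r in range(rounds):
        left, right = right, (left + _round_value(key, r, right)) % HALF
    return f"{left:05d}{right:05d}"


def unscramble_phone(scrambled: str, key: bytes, rounds: int = 4) -> str:
    """Inverse of scramble_phone for the same key and round count."""
    left, right = int(scrambled[:5]), int(scrambled[5:])
    for r in reversed(range(rounds)):
        left, right = (right - _round_value(key, r, left)) % HALF, left
    return f"{left:05d}{right:05d}"


key = b"example-key-stored-somewhere-safe"  # placeholder key
masked = scramble_phone("0103203803", key)
print(masked, unscramble_phone(masked, key))
```

Because the mapping is a permutation, two different phone numbers can never collapse into the same masked value, so row counts before and after masking stay the same. Other databases using the same key get the same masked numbers.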
I believe Column Encryption is what you're looking for. You can encrypt the column, then pass the encrypted value.
SQLShack did a good write up as well.
Column Encryption is not what Steve is looking for. The phone number fields need to be obfuscated in the lower environment after a refresh from production, in 2 separate tables, and the number of matching rows must be guaranteed to be the same before and after the process completes.
The process below seems to have worked but the before count did not match the after count.
SET [somePhone] = BINARY_CHECKSUM([somePhone])
Microsoft dynamic-data-masking may be a better option.
https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking?view=sql-server-ver15
I have a project at hand which uses SQL Server to store fingerprints bitmap from a terminal hand held fingerprint reader.
My question: is there a way of comparing the fingerprint match in the database instead of bringing back all the fingerprints from the database for authentication?
Something like a query eg
SELECT *
FROM table
WHERE fingerprintcolumn = fingerprint_template
You cannot do the comparison with a simple equality (=) operator. In reality, two scans of the same fingerprint never produce identical images: the position, the angle, and the quality of the scan all differ slightly. So a string comparison is not possible.
You have to implement your own automated fingerprint identification system or use a third-party fingerprint comparison service, such as the Cams Fingerprint Comparison API.
Long story short I want to be able to read passwords stored in our database to be able to query weak passwords for our employees as there are currently no restrictions. What I've been doing in the past is changing it from the front end to see what it looks like on the backend. For instance this is what "password" looks like on the back end JXm7CJyoCBnURIrneTtflA== .
I'm not sure if this is possible, or what type of encryption is used. Any help would be great!!
Thanks
This particular field is Base64 encoded and decodes to 16 bytes (for example with Convert.FromBase64String). That smells like an MD5 hash, especially if the other values also decode to 16 bytes. There is no way to decrypt a hash (there are some options like rainbow tables, but you can't be 100% sure of recovering the password). The algorithm works like this: the database stores the hash of the password; when the user logs in, you hash whatever they entered as the password, and if the two hashes match, the user has entered the correct password.
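The 16-byte observation is easy to check, and the same compare-the-hashes flow can be used to test candidate weak passwords against stored values. In the Python sketch below, md5_b64 and is_password are illustrative names, and the exact scheme used by the application (text encoding, salt, number of iterations) is unknown, so the hashing function is only a guess that would need adjusting until it reproduces a known value.

```python
import base64
import hashlib
import hmac

stored = "JXm7CJyoCBnURIrneTtflA=="  # value quoted in the question
raw = base64.b64decode(stored)
print(len(raw))  # number of bytes after decoding


def md5_b64(password: str, encoding: str = "utf-8") -> str:
    """Hypothetical scheme: Base64 of the MD5 of the encoded password."""
    digest = hashlib.md5(password.encode(encoding)).digest()
    return base64.b64encode(digest).decode("ascii")


def is_password(candidate: str, stored_b64: str, encoding: str = "utf-8") -> bool:
    """Hash the candidate the same way and compare with the stored value."""
    return hmac.compare_digest(md5_b64(candidate, encoding), stored_b64)


# Self-check against a value produced by the same hypothetical scheme.
print(is_password("letmein", md5_b64("letmein")))
```

Once a scheme that reproduces the stored value for a known password is found, running a list of common passwords through is_password flags the weak ones without ever decrypting anything.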
In our DB, every Person has an ID, which is the DB generated, auto-incremented integer. Now, we want to generate a more user-friendly alpha-numeric ID which can be publicly exposed. Something like the Passport number. We obviously don't want to expose the DB ID to the users. For the purpose of this question, I will call what we need to generate, the UID.
Note: The UID is not meant to replace the DB ID. You can think of the UID as a prettier version of the DB ID, which we can give out to the users.
I was wondering if this UID can be a function of the DB ID. That is, we should be able to re-generate the same UID for a given DB ID.
Obviously, the function will take a 'salt' or key, in addition to the DB ID.
The UID should not be sequential. That is, two neighboring DB IDs should generate visually different-looking UIDs.
It is not strictly required for the UID to be irreversible. That is, it is okay if somebody studies the UID for a few days and is able to reverse-engineer and find the DB ID. I don't think it will do us any harm.
The UID should contain only A-Z (uppercase only) and 0-9. Nothing else. And it should not contain characters which can be confused with other alphabets or digits, like 0 and O, l and 1 and so on. I guess Crockford's Base32 encoding takes care of this.
The UID should be of a fixed length (10 characters), regardless of the size of the DB ID. We could pad the UID with some constant string, to bring it to the required fixed length. The DB ID could grow to any size. So, the algorithm should not have any such input limitations.
I think the way to go about this is:
Step 1: Hashing.
I have read about the following hash functions:
SHA-1
MD5
Jenkin's
The hash returns a long string. I read here about something called XOR folding to bring the string down to a shorter length. But I couldn't find much info about that.
Step 2: Encoding.
I read about the following encoding methods:
Crockford Base 32 Encoding
Z-Base32
Base36
I am guessing that the output of the encoding will be the UID string that I am looking for.
Step 3: Working around collisions.
To work around collisions, I was wondering if I could generate a random key at the time of UID generation and use this random key in the function.
I can store this random key in a column, so that we know what key was used to generate that particular UID.
Before inserting a newly generated UID into the table, I would check for uniqueness and if the check fails, I can generate a new random key and use it to generate a new UID. This step can be repeated till a unique UID is found for a particular DB ID.
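Steps 1 and 2 above can be sketched roughly as follows. This is Python rather than Ruby, purely for illustration. It swaps in HMAC-SHA256 as the keyed hash and simply truncates the digest to 50 bits instead of XOR folding, then encodes those bits with Crockford's Base32 alphabet. uid_for and the key value are made-up names, and collisions remain possible, so the uniqueness check from step 3 is still needed.

```python
import hashlib
import hmac

# Crockford's Base32 alphabet: digits plus letters without I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def uid_for(db_id: int, key: bytes, length: int = 10) -> str:
    """Derive a fixed-length UID from a DB ID and a secret key."""
    digest = hmac.new(key, str(db_id).encode("ascii"), hashlib.sha256).digest()
    # Keep the top 5 * length bits of the 256-bit digest (50 bits for 10 chars).
    value = int.from_bytes(digest, "big") >> (256 - 5 * length)
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD[value & 31])  # lowest 5 bits pick one symbol
        value >>= 5
    return "".join(reversed(chars))


key = b"example-secret-key"  # placeholder
print(uid_for(1, key), uid_for(2, key))
```

The output length does not depend on the size of the DB ID, and the same ID with the same key always gives the same UID. Recovering the DB ID from a UID would require trying IDs one by one with the key.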
I would love to get some expert advice on whether I am going along the correct lines and how I go about actually implementing this.
I am going to be implementing this in a Ruby On Rails app. So, please take that into consideration in your suggestions.
Thanks.
Update
The comments and answer made me re-think and question one of the requirements I had: the need for us to be able to regenerate the UID for a user after assigning it once. I guess I was just trying to be safe: if we lose a user's UID, we will be able to get it back if it is a function of an existing property of the user. But we can get around that problem just by using backups, I guess.
So, if I remove that requirement, the UID then essentially becomes a totally random 10 character alphanumeric string. I am adding an answer containing my proposed plan of implementation. If somebody else comes with a better plan, I'll mark that as the answer.
As I mentioned in the update to the question, I think what we are going to do is:
Pre-generate a sufficiently large number of random and unique ten character alphanumeric strings. No hashing or encoding.
Store them in a table in a random order.
When creating a user, pick the first of these strings and assign it to the user.
Delete this picked ID from the pool of IDs after assigning it to a user.
When the pool reduces to a low number, replenish the pool with new strings, with uniqueness checks, obviously. This can be done in a Delayed Job, initiated by an observer.
The reason for pre-generating is that we are offloading all the expensive uniqueness checking to a one-time pre-generation operation.
When picking an ID from this pool for a new user, uniqueness is guaranteed. So, the operation of creating user (which is very frequent) becomes fast.
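A rough sketch of the pool idea, written in Python for illustration (in the Rails app the pool would live in a table rather than in a list). The alphabet is Crockford's Base32 set, which drops I, L, O and U. generate_pool and take_id are names invented for this example.

```python
import secrets

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O or U


def generate_pool(size: int, length: int = 10, existing=frozenset()) -> list:
    """Pre-generate `size` unique random IDs, skipping any already in use."""
    pool = set()
    while len(pool) < size:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            pool.add(candidate)  # the set silently drops duplicates
    return list(pool)


def take_id(pool: list) -> str:
    """Remove one ID from the pool and return it for a new user."""
    return pool.pop()


pool = generate_pool(1000)
new_user_uid = take_id(pool)
print(new_user_uid, len(pool))
```

When replenishing, the IDs already assigned to users and those still in the pool would be passed as existing, so the uniqueness check stays in the background job.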
Would db_id.chr work for you? It would take the integers and generate a character string from them. You could then append their initials or last name or whatever to it. Example:
user = {:id => 123456, :f_name => "Scott", :l_name => "Shea"}
user[:id].to_s.chars.map { |d| (d.to_i + 64).chr }.join.downcase + user[:l_name].downcase
# result: "abcdefshea"
I've got an old application that has user passwords stored in the database with an MD5 hash. I'd like to replace this with something in the SHA-2 family.
I've thought of two possible ways to accomplish this, but both seem rather clunky.
1) Add a boolean "flag" field. The first time the user authenticates after this, replace the MD5 password hash with the SHA password hash, and set the flag. I can then check the flag to see whether the password hash has been converted.
2) Add a second password field to store the SHA hash. The first time the user authenticates after this, hash the password with SHA and store it in the new field (probably delete their MD5 hash at the same time). Then I can check whether the SHA field has a value; this essentially becomes my flag.
In either case, the MD5 authentication would have to remain in place for some time for any users who log in infrequently. And any users who are no longer active will never be switched to SHA.
Is there a better way to do this?
Essentially the same, but maybe more elegant than adding extra fields: In the default authentication framework in Django, the password hashes are stored as strings constructed like this:
hashtype$salt$hash
Hashtype is either sha1 or md5, salt is a random string used to salt the raw password and at last comes the hash itself. Example value:
sha1$a1976$a36cc8cbf81742a8fb52e221aaeab48ed7f58ab4
You can convert all your MD5 strings to SHA1 by rehashing them in your DB, provided that from then on you create new password hashes by MD5ing the password first. Checking a password also requires MD5ing it first, but I don't think that's a big hit.
php-code (login):
prev:
$login = (md5($password) == $storedMd5PasswordHash);
after:
$login = (sha1(md5($password)) == $storedSha1PasswordHash);
Works also with salting, got the initial idea from here.
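The same wrapping idea, sketched in Python with SHA-256 substituted for SHA1 because the question asks for something in the SHA-2 family. The function names are made up. The sketch assumes the stored MD5 values are lowercase hex strings, as PHP's md5() produces.

```python
import hashlib
import hmac


def md5_hex(password: str) -> str:
    """Legacy hash: lowercase hex MD5 of the password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def wrap_legacy_hash(stored_md5_hex: str) -> str:
    """One-off conversion: hash the stored MD5 hex string with SHA-256."""
    return hashlib.sha256(stored_md5_hex.encode("ascii")).hexdigest()


def check_password(password: str, stored_wrapped: str) -> bool:
    """Login check after conversion: SHA-256 of the MD5 of the password."""
    candidate = wrap_legacy_hash(md5_hex(password))
    return hmac.compare_digest(candidate, stored_wrapped)


# Convert an existing row without knowing the password...
old_row = md5_hex("correct horse")
new_row = wrap_legacy_hash(old_row)
# ...and verify a later login against the converted value.
print(check_password("correct horse", new_row))
```

The conversion runs over every row at once, so inactive users are covered too.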
I think you've already got the best possibilities. I like #1 more than #2, since there's no use for the md5 once the sha is set.
There's no way to reverse the MD5, so you have to wait for the user to authenticate again to create a new hash.
No - basically you'll have to keep the MD5 in place until all the users you care about have been converted. That's just the nature of hashing - you don't have enough information to perform the conversion again.
Another option in-keeping with the others would be to make the password field effectively self-describing, e.g.
MD5:(md5 hash)
SHA:(sha hash)
You could then easily detect which algorithm to use for comparison, and avoid having two fields. Again, you'd overwrite the MD5 with SHA as you went along.
You'd want to do an initial update to make all current passwords declare themselves as MD5.
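A minimal sketch of the self-describing field with upgrade-on-login, in Python. The "MD5:" and "SHA:" prefixes follow the example above. SHA-256 is assumed for the SHA variant, no salt is used, and verify_and_upgrade is an invented name, so treat it as an outline rather than a finished scheme.

```python
import hashlib
import hmac

ALGORITHMS = {"MD5": hashlib.md5, "SHA": hashlib.sha256}


def _digest(scheme: str, password: str) -> str:
    """Hex digest of the password under the named scheme."""
    return ALGORITHMS[scheme](password.encode("utf-8")).hexdigest()


def verify_and_upgrade(password: str, stored: str):
    """Return (ok, value_to_store); MD5 entries are upgraded on success."""
    scheme, sep, digest = stored.partition(":")
    if not sep or scheme not in ALGORITHMS:
        return False, stored  # unknown or unprefixed format
    if not hmac.compare_digest(_digest(scheme, password), digest.lower()):
        return False, stored  # wrong password: leave the field alone
    if scheme == "MD5":
        return True, "SHA:" + _digest("SHA", password)  # overwrite with SHA
    return True, stored


legacy = "MD5:" + hashlib.md5(b"hunter2").hexdigest()
ok, upgraded = verify_and_upgrade("hunter2", legacy)
print(ok, upgraded.split(":")[0])
```

The caller writes the second return value back to the password field whenever it differs from what was stored.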
Your second suggestion sounds the best to me. That way frequent users will have a more secure experience in the future.
The first effectively "quirks-mode"'s your codebase and only makes sure that new users have the better SHA experience.
If the MD5's aren't salted you can always use a decryption site/rainbow tables such as: http://passcracking.com/index.php to get the passwords. Probably easier to just use the re-encode method though.
Yes, you need to know the real password before you can convert it to SHA-1.
If you want to find the real password from an MD5 hash, you can try md5pass.com