Based on my research, the purpose of a salt is to defeat the use of a rainbow table. This works because rainbow tables are built only from hashes of bare passwords (without a salt). I am having trouble understanding why we can't use rainbow tables once salts are introduced. Suppose we have the following scenario:
I am a malicious hacker and I want to gain access to a rich person's bank account. I am able to gain access to the bank's database, which has the salt and the hashed string in plain sight; the hash is a function of the user's password and salt (f(password + salt)). The salt is fsd88. Next I get a rainbow table from some hacker on the web. Great, so I am all ready to become rich and move to Switzerland.
What I do next is take the hashed string and look it up in the rainbow table (according to a tutorial online this takes about an hour). The rainbow table lookup then returns passwfsd88. Since I know the salt is fsd88, I now know what the password is! It's passw!
What is wrong with my mental model of a salt? Thanks for reading.
The salt is added before the hash is calculated:
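$password = 'secret';
$salt = 'kU832hNWQ2122093uiue';
// '.' concatenates strings in PHP ('+' would attempt numeric addition); hash() takes the algorithm name first
$passwordHash = hash('sha256', $password . $salt);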
In this example, the hash is not calculated for the password 'secret' but for 'secretkU832hNWQ2122093uiue'. Nobody will ever create a rainbow table containing such passwords; any precalculated rainbow table you find would contain the hash of 'secret'.
Of course you can build a rainbow table from a list of possible passwords combined with this salt (the salt is not secret), but if each password has its own unique salt, you would have to build a separate rainbow table for each password. In other words, the salt prevents you from using a single rainbow table to recover all passwords at once.
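To make this concrete, here is a small Python sketch that mirrors the PHP snippet (password with the salt appended, hashed with SHA-256). A plain dict of precomputed hashes stands in for a rainbow table; a real rainbow table stores hash chains to save space, but it answers the same lookup question. The password list is an illustrative assumption.

```python
import hashlib
import secrets


def salted_hash(password: str, salt: str) -> str:
    # Same idea as the PHP snippet: hash the password with the salt appended.
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


# Hypothetical precomputed table mapping unsalted SHA-256 hashes of common
# passwords back to the passwords (a stand-in for a downloaded rainbow table).
common_passwords = ["secret", "password", "123456", "passw"]
precomputed = {hashlib.sha256(p.encode("utf-8")).hexdigest(): p for p in common_passwords}

salt = "kU832hNWQ2122093uiue"
stored = salted_hash("secret", salt)

# The lookup misses: the table knows hash('secret'), not hash('secretkU832hNWQ2122093uiue').
print(precomputed.get(stored))  # None

# An attacker can precompute for one known salt, but the work only covers that salt.
per_salt_table = {salted_hash(p, salt): p for p in common_passwords}
print(per_salt_table.get(stored))  # secret

# With a random salt per user, the same password yields unrelated hashes,
# so a table built for one user's salt is useless for the next user.
salt_a, salt_b = secrets.token_hex(16), secrets.token_hex(16)
print(salted_hash("secret", salt_a) == salted_hash("secret", salt_b))  # False
```

This is also why the lookup in the question would fail: a downloaded table only returns passwfsd88 if whoever built it hashed candidate passwords with fsd88 appended, i.e. built it for that one salt.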
Good to know that I'm rich now, regards from Switzerland :-)
I want to create an anonymized version of a table where one of the id fields (long) needs to be anonymized.
The table is queried by a huge number of different business stakeholders, so I would prefer not to change the field type, to minimize SQL changes for consumers.
I guess it requires some sort of HMAC-like hash algorithm with a secret, so that the mapping becomes fully one-way once the secret is deleted/forgotten.
This sounds like something one should not roll oneself.
It has to be secure and have very few collisions.
Is there something recommended by GDPR specialists?
Or is this not really possible? (Would we then need to change the field to a larger "string" field?)
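One way to keep the column a long is to run each ID through HMAC-SHA256 with a secret key and truncate the MAC to 64 bits. The Python sketch below uses only the standard library's hmac module; the key handling and the sample IDs are hypothetical, and it illustrates the technique rather than a vetted GDPR solution.

```python
import hashlib
import hmac
import secrets


def pseudonymize_id(original_id: int, key: bytes) -> int:
    """Map a 64-bit ID to another 64-bit value with HMAC-SHA256 under a secret key."""
    # Encode the ID as 8 signed big-endian bytes (the same width as a BIGINT / Java long).
    message = original_id.to_bytes(8, "big", signed=True)
    mac = hmac.new(key, message, hashlib.sha256).digest()
    # Keep the first 8 bytes of the 32-byte MAC and read them as a signed 64-bit
    # integer, so the result still fits the existing column type.
    return int.from_bytes(mac[:8], "big", signed=True)


# Hypothetical secret: generate it once, use it for the whole export, then destroy it.
key = secrets.token_bytes(32)

anonymized = {}
seen = set()
for original_id in (1, 2, 3, 42):
    value = pseudonymize_id(original_id, key)
    # Truncating to 64 bits makes collisions possible, so check while building.
    if value in seen:
        raise ValueError(f"collision for ID {original_id}")
    seen.add(value)
    anonymized[original_id] = value

print(anonymized)
```

Because the mapping is deterministic for a given key, the same original ID gets the same replacement in every table anonymized with that key, so joins keep working. While the key exists, anyone holding it can hash every candidate ID and invert the mapping; once it is destroyed, that is no longer possible. With 64-bit outputs, the birthday bound makes a collision likely only around 2^32 (roughly four billion) distinct IDs, which is why the loop checks for them.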
I need to store some user data in a server database. It is not sensitive data, but neither I nor other users should be able to access this content. I want the owner to be confident that even I or other developers will not be able to read his data by querying the database.
The table structure will be like this: (ID | Data), where Data contains an encrypted JSON string.
I would like to do encryption/decryption on the client, using a secret key that would not be persisted anywhere (the user must keep this password, otherwise he will not be able to get his data back).
Do you see any pitfall in this approach?
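A sketch of the client-side key handling in Python, under stated assumptions: the passphrase is stretched with PBKDF2-HMAC-SHA256 from the standard library, and the per-record KDF salt and iteration count (which are not secret) are stored in the Data column so the key can be re-derived later. The header field names and the iteration count are hypothetical, and the actual encryption step, which would use an authenticated cipher such as AES-GCM from a third-party library, is left out.

```python
import hashlib
import json
import secrets

ITERATIONS = 600_000  # assumption: tune to current guidance and client hardware


def derive_key(passphrase: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    # Stretch the user's passphrase into a 32-byte key, entirely on the client.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations)


def new_header() -> dict:
    # Not secret: without these values the same key cannot be re-derived later,
    # so they are stored alongside the ciphertext in the Data column.
    return {"kdf": "pbkdf2-sha256", "iterations": ITERATIONS, "salt": secrets.token_hex(16)}


header = new_header()
key = derive_key("correct horse battery staple", bytes.fromhex(header["salt"]), header["iterations"])
# Encrypting json.dumps(payload) with `key` would need an AEAD cipher such as
# AES-GCM from a third-party library; that step is deliberately omitted here.
print(json.dumps(header))
print(len(key))  # 32
```

Since the server never sees the passphrase or the derived key, a forgotten passphrase makes the data unrecoverable, and changing the passphrase means re-encrypting the data under the new key.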
I understand that if we just hash a password, a hacker could use a precalculated table of hashed passwords and compare it against the actual hashed passwords. From what I understand, if we add a random string as a salt before hashing, the precalculated hash table won't work.
Now my question is: suppose a user's password is "password"; I add 999 to it and hash the string "password999" before saving it. When the user returns to my site, how do I know that I need to add 999 to his password before comparing it to the hashed value in the database? Do I maintain a separate table with the salts for every username?
You need to store each user's salt in a separate column next to the password hash.
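A minimal Python sketch of that layout, assuming a dict as a stand-in for the users table and PBKDF2-HMAC-SHA256 (a deliberately slow, salted hash from the standard library) in place of a single fast hash; the iteration count is an assumption. The salt is generated at registration, stored in the same row as the hash, and read back at login:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to current guidance and your hardware

# Hypothetical stand-in for a users table: username -> (salt column, hash column).
users = {}


def hash_password(password: str, salt: bytes) -> bytes:
    # Salted, deliberately slow hash (PBKDF2-HMAC-SHA256, 32-byte output).
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)  # fresh random salt per user
    # Store the salt in the same row as the hash.
    users[username] = (salt, hash_password(password, salt))


def verify(username: str, password: str) -> bool:
    row = users.get(username)
    if row is None:
        return False
    salt, stored_hash = row  # read this user's salt back from their row
    # Recompute with the stored salt and compare in constant time.
    return hmac.compare_digest(hash_password(password, salt), stored_hash)


register("bob", "password")
print(verify("bob", "password"))     # True
print(verify("bob", "password999"))  # False
```

The salt does not need to be secret, only unique per user. Formats such as the string returned by PHP's password_hash() embed the algorithm, cost and salt in the hash string itself, which keeps the salt next to the hash without a separate column.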
Given this example:
user:1 email bob#bob.com
user:1 name bob
Based on my research, all the examples create an "index" similar to the following:
user:bob#bob.com 1
My question is: wouldn't it be better to store it as "user:1"? That would eliminate the need to concatenate the string in code. Is there some other reason not to store the whole string? Memory maybe?
The question was specifically about whether to store the full key in the index or just the numeric ID that is part of this key.
Redis has a number of memory optimizations that you may want to leverage to decrease general memory consumption. One of these optimizations is the intset (an efficient structure to represent sets of integers).
Very often, sets are used as index entries, and in that case, it is much better to store a numeric ID rather than an alphanumeric key, to benefit from the intset optimization.
Your example is slightly different because a given email address should be associated with only one user. A single hash object is fine for storing the whole index. I would still use a numeric ID here, since it is more compact and may benefit from future Redis optimizations.
Based on what you've conveyed so far, I'd use Redis hashes. For example, I'd denormalize the data a bit and store it as 'hmset users:1 email bob#bob.com name Bob' and 'hset users:lookup:email bob#bob.com 1'.
This way, I can retrieve the user by either email address or user ID. You could create more lookup hashes depending on your needs.
For more useful patterns, look at The Little Redis Book by Karl Seguin.
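The lookup-hash pattern from this answer can be sketched without a Redis server: below, a Python dict of dicts is a hypothetical in-memory stand-in for Redis hashes, and the helper names mirror the HSET, HGET and HGETALL commands. With a real server and the redis-py client, the corresponding calls would be hset(key, mapping=...), hget(key, field) and hgetall(key).

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for Redis: key -> hash (field -> value).
store: Dict[str, Dict[str, str]] = {}


def hset(key: str, mapping: Dict[str, str]) -> None:
    # Like HSET key field value [field value ...]: create or update fields.
    store.setdefault(key, {}).update(mapping)


def hget(key: str, field: str) -> Optional[str]:
    # Like HGET key field: None when the key or field does not exist.
    return store.get(key, {}).get(field)


def hgetall(key: str) -> Dict[str, str]:
    # Like HGETALL key: every field and value of the hash.
    return dict(store.get(key, {}))


# The user record lives under its numeric ID...
hset("users:1", {"email": "bob#bob.com", "name": "Bob"})
# ...and a lookup hash maps each email address to that ID.
hset("users:lookup:email", {"bob#bob.com": "1"})


def user_by_email(email: str) -> Optional[Dict[str, str]]:
    # Two lookups: email -> ID, then ID -> record.
    user_id = hget("users:lookup:email", email)
    return None if user_id is None else hgetall(f"users:{user_id}")


print(user_by_email("bob#bob.com"))  # {'email': 'bob#bob.com', 'name': 'Bob'}
print(hgetall("users:1")["name"])    # Bob
```

The lookup hash stores only the numeric ID, so the full "users:<id>" key is rebuilt in code, which is the trade-off the question asks about.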
For a database I'm building, I've decided to use natural numbers as the primary key. I'm aware of the advantages GUIDs offer, but looking at the data, the bulk of each row's data consisted of GUID keys.
I want to generate XML records from the database data, and one problem with natural numbers is that I don't want to expose my database keys to the outside world and allow users to guess "keys." I believe GUIDs solve this problem.
So I think the solution is either to generate a sparse, unique ID derived from the natural ID (ideally two-way), or to add an extra column to the database and store a GUID (or some other multibyte ID).
The derived value is nicer because there is no storage penalty, but it would be easier to reverse and guess than a GUID.
I'm curious what others on SO have done, and what insights they have.
What you can do to compute a "GUID" is calculate an MD5 hash of the ID with some salt (the table name, for instance), load the 16-byte digest into a GUID, and set a few bits so that it is a valid version 3 (MD5) GUID.
This is almost two-way: you can have a SQL computed column (which can also be indexed in certain cases) holding the GUID without persisting it in the table, so a row can be found by its GUID, and you can always recompute a GUID from the ID and salt. Users cannot easily do the same, since they know neither the salt nor the actual ID.
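A Python sketch of that computation, using only the standard library. The salt value and the ':' separator are hypothetical choices; the separator keeps, for example, salt 'Orders1' plus ID 2 distinct from salt 'Orders' plus ID 12. Passing version=3 to uuid.UUID overwrites the version and variant bits of the 16-byte MD5 digest, which is the same step Python's own uuid.uuid3() applies to MD5(namespace + name).

```python
import hashlib
import uuid

SALT = "Orders"  # hypothetical salt, e.g. the table name as suggested above


def public_guid(row_id: int, salt: str = SALT) -> uuid.UUID:
    """Derive a stable, non-sequential GUID from a numeric primary key."""
    # MD5 yields exactly 16 bytes, the size of a GUID.
    digest = hashlib.md5(f"{salt}:{row_id}".encode("utf-8")).digest()
    # version=3 sets the version nibble to 3 and the RFC 4122 variant bits.
    return uuid.UUID(bytes=digest, version=3)


for row_id in (1, 2, 3):
    guid = public_guid(row_id)
    print(row_id, guid, guid.version)  # the last column prints 3
```

The GUID cannot be turned back into the ID directly, since MD5 is not invertible; the "two-way" part comes from looking the GUID up in the computed, indexed column. If the salt is meant to stay secret, a keyed construction such as HMAC is the more conventional choice than prefixing a salt to MD5.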