How to look up encrypted data with the SQL keyword 'like'

I know this is a strange question. Some columns in my database are encrypted with RSA, but the front end uses instant search, so I need to match the encrypted data against part of the plaintext.
For example: the original text is "stackoverflow", and it is stored encrypted in the database. When the front end receives the plaintext input "stack", the encrypted "stackoverflow" should match.
I know one solution: load all the data, decrypt it and match in the application, but that uses a huge amount of memory. So how can I handle this within the database? What should I do if I want to use the keyword 'like'?

If there are a limited number of keywords that can be searched for, and if showing them in plaintext does not compromise security, then I suggest the following...
As you insert the text (encrypted by the client) into the table, also insert the relevant plaintext words into another column. Then use FULLTEXT on that column.
FULLTEXT would be much faster, but has limitations.
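A minimal sketch of the extra-column idea, assuming an in-memory SQLite database and a plain LIKE prefix match as a stand-in for a real FULLTEXT index. The table name, column names and ciphertext bytes are hypothetical placeholders.

```python
import sqlite3


def build_demo_db():
    """Create a table holding ciphertext plus a plaintext keyword column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, secret BLOB, keywords TEXT)"
    )
    # The 'secret' bytes stand in for data already encrypted by the client.
    rows = [
        (1, b"\x8f\x02\x11", "stackoverflow"),
        (2, b"\x41\x9c\x7e", "serverfault"),
    ]
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?)", rows)
    return conn


def search_prefix(conn, prefix):
    """Return ids of rows whose keyword column starts with the given prefix."""
    # Escape LIKE wildcards in user input so they are matched literally.
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    cur = conn.execute(
        "SELECT id FROM docs WHERE keywords LIKE ? ESCAPE '\\' ORDER BY id",
        (escaped + "%",),
    )
    return [row[0] for row in cur]
```

Only the ids come back from the search; the matching ciphertext can then be fetched and decrypted row by row instead of decrypting the whole table.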

Related

Masking/Hashing data

As a SQL DBA, I need to export data that contains personal/sensitive information such as the national identification number (NiN). This field is a unique 10-digit number, and our company's policy does not allow it to be exported. Is there any way I can generate a new field from the NiN with a different value but the same length? I need this value to be consistent across all tables so that we can use the new field to JOIN data instead of using the NiN.
I am thinking of the HashBytes function, but its output has a different length from the 10 digits I need.
Data is huge, so it's important to avoid collision. What's the best way to do this?
Thanks
First, I would change the format of the produced value to be different from the internal version. That will make it much simpler to see right away if there is an issue.
Second, you can use a hashing algorithm such as SHA-256, which is very unlikely to produce collisions. That might be good enough.
Third, you need to think through the security requirements better. My preferred solution is to have a look-up table that matches internal numbers to external values. Then, this table is used for all exports and imports to translate between the two. A suggestion here would be to use newid() to generate the value and to use GUIDs for external data.
However, this may not be sufficient for your requirements. Why? The same number has the same value over time. So, although you might be able to hide the internal value and even forget it, a given external value still maps to a single number, which ties external records together.
The solution to this is something called "salt" in the hashing function. This allows the external value to change over time, while still mapping to the same internal number.
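As a rough sketch of the hashing approach, the function below derives an external token from a NiN with a keyed hash (HMAC-SHA256). The secret key plays the role of the salt here, which is a substitution on my part rather than something the answer spells out. The function name, the "X" prefix and the key handling are assumptions for illustration. A key matters because an unkeyed hash of a 10-digit number can be reversed by simply hashing every possible value.

```python
import hashlib
import hmac


def pseudonymize(nin, secret_key):
    """Map a national ID to a stable external token using HMAC-SHA256."""
    digest = hmac.new(secret_key, nin.encode("utf-8"), hashlib.sha256).hexdigest()
    # The prefix makes the token visibly different from an internal 10-digit NiN.
    return "X" + digest


# Hypothetical usage: the same key gives the same token in every table,
# so exported tables can still be joined on the token.
export_key = b"rotate-me-per-export"  # assumed to be stored outside the database
token = pseudonymize("1234567890", export_key)
```

Switching to a new key for a later export changes every token, so records from different exports can no longer be linked. The full digest is kept on purpose: truncating it to 10 digits would make collisions far more likely on a large data set.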

Get original sql query in postgres extension in C

I am creating an extension for Postgres in C (C++). It is a new data type that behaves like text but is encrypted by an HSM device. My problem is using more than one key to protect the data. My idea is to get the original SQL query and process it to choose which key to use, but I don't know how to do that, or whether it is even possible.
My goal is to change some existing text fields in the database to encrypted ones. That's why I can't pass a key number to my type directly: the type must be seen by external applications as text.
Normally there is a userID field, and a single query always uses that ID to get or set encrypted data. I want to choose the key based on that field. The HSM can hold billions of keys, which means every user can have their own key. It's not a problem if I need to parse the string myself; I am more than capable of doing that. Performance is not an issue either: the HSM is so slow that I can encode or decode only a couple of fields per second.
In most parts of the planner and executor the current (sub)query is available in a passed PlannerInfo struct, usually:
PlannerInfo *root
This has a parse member containing the Query object.
Earlier in the system, in the rewriter, it's passed as Query *root directly.
In both cases, if there's evaluation of a nested subquery going on, you get the subquery. There's no easy way to access the parent Query node.
The query tree isn't always available deeper in execution paths, such as in expression evaluation. You're not supposed to be referring to it there; expressions are self contained, and don't need to refer to the rest of the query.
So you're going to have a problem doing what you want. Frankly, that's because it sounds like a pretty bad design. What you should consider instead is:
Using a function to encode/decode the type to/from cleartext, allowing you to pass parameters; or possibly
Using the typmod of the type to store the desired information (but be aware that the typmod is not preserved across casts, subqueries, etc).
There's also the debug_query_string global, but really don't use that. It's unparsed query text so it won't help you anyway. If you (ab)use this in your code, I will cry. I'm only telling you it exists so I can tell you not to use it.
By far and away your best option is going to be to use a function-based interface for this.

Optimising range/wildcard search on encrypted columns.

I have a couple of requirements that don't really play well with each other:
Encrypt the first name, last name and DOB, along with a few other columns in a table (the database is SQL Server).
Perform range/wildcard searches on some of those encrypted columns, e.g. select * from table where first_name like '%jo%' and last_name like '%exceptional%'.
I know that I need to decrypt the whole table and then perform the search, which is painfully slow. But somehow I need to optimise the search.
I can think of doing the search either in the database or inside the application using DataSet/LINQ etc.
So, which approach will be relatively faster? Is there any other way of optimising this?
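You should look into data hashing. Storing a hash of each value next to its ciphertext lets you search without decrypting every row. Note that a hash only supports exact-match lookups; it will not help with wildcard patterns such as '%jo%'.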
http://blogs.msdn.com/b/sqlsecurity/archive/2011/08/26/data-hashing.aspx
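To illustrate the idea, here is a minimal sketch assuming SQLite in place of SQL Server and a keyed hash (HMAC-SHA256) as the search hash. The table, column names, key and ciphertext placeholder are all hypothetical.

```python
import hashlib
import hmac
import sqlite3

INDEX_KEY = b"demo-only-key"  # hypothetical; keep the real key outside the database


def search_hash(value):
    """Keyed hash of a normalized value, stored next to the ciphertext."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(INDEX_KEY, normalized, hashlib.sha256).hexdigest()


def build_people_db():
    """Create a table with an encrypted column and an indexed hash column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, first_name_enc BLOB, first_name_hash TEXT)"
    )
    conn.execute("CREATE INDEX idx_people_hash ON people (first_name_hash)")
    # The *_enc bytes are placeholders for real ciphertext.
    for pid, name in [(1, "John"), (2, "Joanna"), (3, "john")]:
        conn.execute(
            "INSERT INTO people VALUES (?, ?, ?)",
            (pid, b"<ciphertext>", search_hash(name)),
        )
    return conn


def find_by_first_name(conn, name):
    """Exact-match lookup on the hash column; no decryption is needed."""
    cur = conn.execute(
        "SELECT id FROM people WHERE first_name_hash = ? ORDER BY id",
        (search_hash(name),),
    )
    return [row[0] for row in cur]
```

Because the value is lower-cased before hashing, "John" and "john" hash to the same value. A partial term such as "jo" matches nothing, which is the limitation of this approach for wildcard search.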

How to store sensitive information in SQL Server 2008?

I need to store some sensitive information in a table in SQL Server 2008.
The data is a string and I do not want it to be in human readable format to anyone accessing the database.
What I mean by sensitive information is, a database of dirty/foul words. I need to make sure that they are not floating around in tables and SQL files.
At the same time, I should be able to perform operations like "=" and "like" on the strings.
So far I can think of two options; will these work or what is a better option?
Store string (varchar) as binary data (BLOB)
Store in some encrypted format, like we usually do with passwords.
A third option, which may be most appropriate, is to simply not store these values in the particular database at all. I would argue that it is probably more appropriate to store them elsewhere, since you're probably not going to JOIN against the table of sensitive words.
Otherwise, you probably want to use Conrad Frix's suggestion of SQL Server's built-in encryption support.
The reason I say this is because you say both = and LIKE must work across your data. When you hash a string using a hash algo such as SHA/MD5/etc., the results won't obey human language LIKE semantics.
If exact equality (=) is sufficient (i.e. you don't really need to be able to do LIKE queries), you can use a cryptographic function to secure the text. But keep in mind that a one-way hash function would prohibit you from getting a list of strings "un-hashed" - if you need to do that, you need to use an encryption algo where decryption is possible, such as AES.
If you use rot13, then you can still use = and LIKE. This also applies to any storage method other than an SQL database, if preventing casual/accidental views (including search engine indexing, if the list is public) is that important.
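A small sketch of why this works, assuming SQLite and a hypothetical table named blocked. rot13 only changes letters, so the % and _ wildcards in a LIKE pattern pass through unchanged when the pattern is encoded the same way as the stored words.

```python
import codecs
import sqlite3


def rot13(text):
    """Apply rot13; applying it twice returns the original text."""
    return codecs.encode(text, "rot_13")


def build_word_db(words):
    """Store only the rot13-encoded form of each word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blocked (word TEXT)")
    conn.executemany(
        "INSERT INTO blocked VALUES (?)", [(rot13(w),) for w in words]
    )
    return conn


def matches(conn, like_pattern):
    """Run a LIKE query by encoding the pattern, then decode the results."""
    cur = conn.execute(
        "SELECT word FROM blocked WHERE word LIKE ? ORDER BY word",
        (rot13(like_pattern),),
    )
    return [rot13(row[0]) for row in cur]
```

This only obscures the words from casual viewing; rot13 is not encryption and anyone can reverse it.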

Sphinx question: Structuring database

I'm developing a job service that has features like radial search, full-text search, and the ability to combine full-text search with filters that exclude certain job listings (such as un-checking a checkbox so that full-time jobs are no longer returned).
The developer who is working on Sphinx wants all the database information to be stored as integers with a key (so the table "Job Type" would store values such as 1="part-time" and 2="full-time"), whereas the other developers want to keep the database as strings (so the table "Job Type" says "part-time" or "full-time").
Is there a reason to keep the database as ints? Or should strings be fine?
Thanks!
Walker
Choosing your key can have a dramatic performance impact. Whenever possible, use ints instead of strings. This is called using a "surrogate key", where the key provides a unique and quick way to find the data, rather than the data standing on its own.
String comparisons are resource intensive, potentially orders of magnitude slower than comparing numbers.
You can drive your UI off the surrogate key, but show another column (such as job_type). This way, when you hit the database you pass the int in, and avoid scanning the table for a row with a matching string.
When it comes to joining tables in the database, joins will run much faster if you have ints or another numeric type as your primary keys.
Edit: In the specific case you have mentioned, if you only have two options for what your field may be, and it's unlikely to change, you may want to look into something like a bit field, and you could name it IsFullTime. A bit or boolean field holds a 1 or a 0, and nothing else, and typically isn't related to another field.
If you are normalizing your structure (I hope you are), then numeric keys will be the most efficient.
Aside from the usual reasons to use integer primary keys, the use of integers with Sphinx is essential, as the result set returned by a successful Sphinx search is a list of document IDs associated with the matched items. These IDs are then used to extract the relevant data from the database. Sphinx does not return rows from the database directly.
For more details, see the Sphinx manual, especially 3.5. Restrictions on the source data.
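The two-step flow described above can be sketched as follows, assuming SQLite in place of the real database and a hard-coded list of document IDs standing in for a Sphinx result. The schema and function names are hypothetical, and no Sphinx client API is used here.

```python
import sqlite3


def build_jobs_db():
    """Create a job table that refers to job types by integer surrogate key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job_type (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, title TEXT, "
        "job_type_id INTEGER REFERENCES job_type(id))"
    )
    conn.executemany(
        "INSERT INTO job_type VALUES (?, ?)",
        [(1, "part-time"), (2, "full-time")],
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?)",
        [(10, "Barista", 1), (11, "Engineer", 2), (12, "Courier", 1)],
    )
    return conn


def fetch_jobs(conn, doc_ids):
    """Load the rows for the IDs returned by the search, keeping their order."""
    if not doc_ids:
        return []
    placeholders = ",".join("?" for _ in doc_ids)
    sql = (
        "SELECT job.id, job.title, job_type.name "
        "FROM job JOIN job_type ON job_type.id = job.job_type_id "
        "WHERE job.id IN (" + placeholders + ")"
    )
    by_id = {row[0]: row for row in conn.execute(sql, list(doc_ids))}
    # IN (...) does not guarantee order, so restore the search ranking here.
    return [by_id[i] for i in doc_ids if i in by_id]
```

The human-readable string ("part-time") is looked up through the join only when rows are displayed, while both the search result and the join work on integers.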