Sanitize search string for Dynamic SQL Queries - sql

I know i can use stored procedures to eliminate sql injection attacks but it would bloat the code by more than I'm willing to accept and making it costly to maintain.
In my dynamic sql query, I would like to search a string of text in 2 columns in one of my tables but before that happens, I would like my business layer, which is written in c#, to sanitize sanitize the input. I would like the input to have special characters (ie: #,!, $, etc.) What is the minimal character set that i have to strip out in my search string to sanitize it? I'm thinking that stripping out single and double quotes is sufficient. Is that correct?
Thanks

You don't need to use stored procedures to be safe. (As a matter a fact, stored procedures don't necessarily guarantee safety against injection attacks if the stored procedures themselves construct dynamic queries.) And manual escaping is difficult to do 100% safely, and not recommended.
Instead, use parameterized queries, which nearly all databases support.

If you use stored procedures or parameterized statements, you shouldn't need to sanitize anything, unless you are building strings blindly in dynamic SQL within the procedure. If that is the case, please read Erland's excellent article on dynamic SQL:
http://sommarskog.se/dynamic_sql.html

Also semicolons to stop subsequent statements begin defined (necessary, but not sufficient)

You're approaching this from an unsafe direction. You want to define the set of characters that should be allowed (checking that they're not special, etc), and then strip everything not in that set.
You should probably also look into SqlCommands as a safer way to build the string sent to the DB.

Related

Should you use quotes on system identifiers in your query?

It may depend on the database type, but is there a preference (by the database and not the coder) or is it better to use quotes? Is it:
Faster?
Less error prone?
Helps prevent injection (if using a PDO or not)?
Assume there is nothing requiring the use (spaces, reserved words, etc.).
MySQL:
SELECT `id` FROM `table` WHERE `name` = '$name';
ANSI:
SELECT "id" FROM "table" WHERE "name" = '$name';
vs:
SELECT id FROM table WHERE name = $name;
This answer talks about the requirement to use quotes in MySQL, but I'm interested in when it's not required by the db, but it might be preferred/better for the aspects (and perhaps more) that I listed above.
Quotes around identifiers are used only during the query parsing stage - a stage that goes through the SQL statement, and figures out its syntactic elements. Compared to other stages (query optimization, query execution, and passing the results back to the caller) the parsing stage is relatively short. Therefore, you should not expect any measurable speedup or slowdown from using quotes around your identifiers, regardless of your particular RDBMS.
As far as being more or less error prone goes, missing quotes around multipart identifiers become apparent very quickly during the development stage, so the practice of placing quotes everywhere it is not worth the trouble, because the readability to humans suffers significantly.
Finally, adding quotes around identifiers would not help you prevent injection attacks; same goes for not placing quotes around all identifiers. Many SQL script generators take this route to avoid if statements all over the script testing if the identifier is multipart or not.
The only situation where quoting all identifiers is a good idea is when you generate SQL programmatically, and the results are not intended for human readers.
I am not a fan of using quoted identifiers. They take too long to type and they clutter the code, making it harder for humans to read.
Also, I prefer to discourage the use of reserved words as identifiers in SQL. To me, this is just good practice. I mean, who wants to read something like:
select `select`, `from` as `as`
from <some really messed up table>
I use the the bottome version because I find it easier to read but it depends on the way you echoed your query. There is no difference in the performance as it reads past the quotes unless it is need so you really don't know it unless writing a WHERE clause in your query that contains more than one variable.
I'd say it's faster NOT to use quotes for numeric identifiers; why add the potential overhead of a data type conversion to a query you want (presumably) to be as performant as possible?

One function building SQL-strings by parameters. A security risk?

I try to avoid writing the SQL-code for each and every update or insert. Instead a PHP-function takes some parameters (add or change, tablename, fields-list) and builds the SQL from it. This is similar, but not identical, to the preferred method of using parameterized statements (mysqli).
Is my way of creating SQL putting a big smile in the face of hackers?
It's only a security risk if you allow any user input to get added to the SQL queries. You are fine if you are using constant parameters defined in your source code, but the actual values coming from users of your website should always be using bound parameters and never added directly to the SQL query.
Oh yes - see Wikipedia, and the obligatory xkcd reference.
Yes it is a security risk unless you can devote significant resources to testing your function for abuse/misuse.
There are other factors that you should consider. Building dynamic SQL reduces the ability of the database to cache queries - each time you call your function, your query will be parsed as a new one. This might have a significant impact on performance. Using bind parameters doesn't impose the same problem as the database can recognize the parameters as the variable part of the query.

How to differentiate SQL statements that modify data from those that are read only?

I need to write a jdbc compliant driver wrapper whose purpose is to log all SQL statements that modify data. What is the easiest way to differentiate those statements that modify data from those that only read data? I guess that only way is to parse SQL code on the fly, any libraries that can do that?
You could probably find some full blown sql parsers such as this one, but if your wrapper will intercept only single statements then you might (not enough detail) consider SELECT statements as read only and everything else as statements that modify data.
I think a pretty good start is looking for queries starting with INSERT, UPDATE or DELETE. Make sure to trim leading whitespace before testing the strings.
If you want to include schema altering statements, include commands such as CREATE, ALTER, DROP, and TRUNCATE.
The specifics of the above approach will depend on the database you are using. E.g., there may be batch commands in one string, separated by a semicolon. You will need to parse the string to look for instances such as this.
If this is not a general purpose wrapper I would just log every call to the executeXXX() methods, except calls to executeQuery(), and then just call the appropiate method in client code.
If you want it to be general purpose and would like to avoid parsing SQL, you can investigate the getUpdateCount()method, and the return values of the executeXXX() methods. That would imply logging after the statement was executed, and I do not think it would be 100% correct.

Are input sanitization and parameterized queries mutually exclusive?

I'm working updating some legacy code that does not properly handle user input. The code does do a minimal amount of sanitization, but does not cover all known threats.
Our newer code uses parameterized queries. As I understand it, the queries are precompiled, and the input is treated simply as data which cannot be executed. In that case, sanitization is not necessary. Is that right?
To put it another way, if I parameterize the queries in this legacy code, is it OK to eliminate the sanitization that it currently does? Or am I missing some additional benefit of sanitization on top of parameterization?
It's true that SQL query parameters are a good defense against SQL injection. Embedded quotes or other special characters can't make mischief.
But some components of SQL queries can't be parameterized. E.g. table names, column names, SQL keywords.
$sql = "SELECT * FROM MyTable ORDER BY {$columnname} {$ASC_or_DESC}";
So there are some examples of dynamic content you may need to validate before interpolating into an SQL query. Whitelisting values is also a good technique.
Also you could have values that are permitted by the data type of a column but would be nonsensical. For these cases, it's often easier to use application code to validate than to try to validate in SQL constraints.
Suppose you store a credit card number. There are valid patterns for credit card numbers, and libraries to recognize a valid one from an invalid one.
Or how about when a user defines her password? You may want to ensure sufficient password strength, or validate that the user entered the same string in two password-entry fields.
Or if they order a quantity of merchandise, you may need to store the quantity as an integer but you'd want to make sure it's greater than zero and perhaps if it's greater than 1000 you'd want to double-check with the user that they entered it correctly.
Parameterized queries will help prevent SQL injection, but they won't do diddly against cross-site scripting. You need other measures, like HTML encoding or HTML detection/validation, to prevent that. If all you care about is SQL injection, parameterized queries is probably sufficient.
There are many different reasons to sanitize and validate, including preventing cross-site scripting, and simply wanting the correct content for a field (no names in phone numbers). Parameterized queries eliminate the need to manually sanitize or escape against SQL injection.
See one of my previous answers on this.
You are correct, SQL parameters are not executable code so you don't need to worry about that.
However, you should still do a bit of validation. For example, if you expect a varchar(10) and the user inputs something longer than that, you will end up with an exception.
In short no. Input sanitization and the use of parameterized queries are not mutually exclusive, they are independent: you can use neither, either one alone, or both. They prevent different types of attacks. Using both is the best course.
It is important to note, as a minor point, that sometimes it is useful to write stored procedures which contain dynamic SQL. In this case, the fact that the inputs are parameterized is no automatic defense against SQL injection. This may seem a fairly obvious point, but I often run into people who think that because their inputs are parameterized they can just stop worrying about SQL Injection.

What do we consider as a dynamic sql statement?

a) What do we consider as a dynamic sql statement?
Any sql statement that dynamically adds a clause(s) or even just a part of a clause to a SQL string?
b)Aren’t then parameterized strings that use placeholders for dynamically supplied values also considered dynamic sql statements?
thanx
A dynamic SQL statement is a statement that is built at execution time. The emphasis lies on statement. So, it isn't dynamic SQL if you just supply a value at execution time.
Dynamic SQL statements generally refers those that are constructed using string concatenation.
"SELECT name FROM names WHERE id=" + this.id;
"SELECT name FROM names WHERE id=" + this.id + " AND age=" this.age;
Parameterized queries are also dynamic but not in terms of construct. You can only change parameters but you can't change the structure of the statement i.e add WHERE clauses.
Parameterized queries are often at the database level so the database can cache the execution plan of the query and use it over and over. Not quite possible in the first case since a simple change in the text or the order of the where clauses can cause the database to not recognize the previously cached execution plan and start over.
The first construct is also vulnerable to SQL injection since it is hard to validate input values for attempts to inject rogue SQL.
Certainly anything involving EXEC (#sql) or EXEC sp_ExecuteSQL #sql, ... (i.e. dynamic at the database itself) would qualify, but I guess you could argue that any SQL generated at runtime (rather than fixed at build / install) would qualify.
So yes, you could argue that a runtime-generated, yet correctly parameterized query is "dynamic" (for example, LINQ-to-SQL generated queries), but to be honest as long as it doesn't expose you to injection attacks I don't care about the name ;-p
Dynamic sql is basically just any sql that is not fully constructed until runtime. Its generated on-the-fly by concatenating runtime values into a statement. Could be any part of an sql statement
A. Anything that will cause the DB server to evaluate strings as SQL.
B. No, as they still go through the DB driver/provider and get cleaned up.
For point b) You already know the statement and you pass in known parameters (hopefully type safe and not string literals).
I consider a dynamic SQL statement to be one that accepts new values at runtime in order to return a different result. "New values", by my reckoning, can be a different ORDER BY, new WHERE criteria, different field selections, etc.
a) What do we consider as a dynamic sql statement?
Any sql statement that dynamically adds a clause(s) or even just a part of a clause to a SQL string?
Both - any query altered/tailored prior to execution.
b)Aren’t then parameterized strings that use placeholders for dynamically supplied values also considered dynamic sql statements?
Parameterized queries, AKA using bind variables, are supplying different filter criteria to the query. You can use (my_variable IS NULL OR ...), but OR performance is generally terrible on any db & the approach destroys sargability.
Dynamic SQL generally deals with tailoring the query to include other logic, like JOINs that only need to be included if a specific parameter is set. However, there are limitations like the IN clause not supporting converting a comma delimited string as it's list of options - for this you would have to use dynamic SQL, or handle the comma delimited list in another fashion (CLR, pipelined into a temp table, etc).
I see where you're going with this, but one somewhat objective criteria which defines a particular situation as Dynamic SQL vs. say a prepared statement is...
...the fact that dynamic statements cause the SQL server to fully evaluate the query, to define a query plan etc.
With prepared statements, SQL can (and does unless explicitly asked) cache the query plan (and in some cases, even gather statistics about the returns etc.).
[Edit: effectively, even dynamic SQL statements are cached, but such cached plans have a much smaller chance of being reused because the exact same query would need to be received anew for this to happen (unlike with parametrized queries where the plan is reused even with distinct parameter values and of course, unless "WITH RECOMPILE")]
So in the cases from the question, both a) and b) would be considered dynamic SQL, because, in the case of b), the substitution takes place outside of SQL. This is not a prepared statement. SQL will see it as a totally novel statement, as it doesn't know that you are merely changing the search values.
Now... beyond the SQL-centric definition of dynamic SQL, it may be useful to distinguish between various forms of dynamic SQL, such as say your a) and b) cases. Dynamic SQL has had a bad rep for some time, some of it related to SQL injection awareness. I've seen applications where dynamic SQL is produced, and executed, within a Stored Procedure, and because it is technically "SQL-side" some people tended to accept it (even though SQL-injecting in particular may not have been addressed)...
The point I'm trying to make here is that building a query, dynamically, from various contextual elements is often needed, and one may be better advised to implement this in "application" layers (well, call it application-side layers, for indeed, it can and should be separate from application per-se), where the programming language and associated data structures are typically easier and more expressive than say T-SQL and such. So jamming this into SQL for the sake of calling it "Data-side" isn't a good thing, in my opinion.