I'm working on updating some legacy code that does not properly handle user input. The code does do a minimal amount of sanitization, but does not cover all known threats.
Our newer code uses parameterized queries. As I understand it, the queries are precompiled, and the input is treated simply as data which cannot be executed. In that case, sanitization is not necessary. Is that right?
To put it another way, if I parameterize the queries in this legacy code, is it OK to eliminate the sanitization that it currently does? Or am I missing some additional benefit of sanitization on top of parameterization?
It's true that SQL query parameters are a good defense against SQL injection. Embedded quotes or other special characters can't make mischief.
But some components of SQL queries can't be parameterized. E.g. table names, column names, SQL keywords.
$sql = "SELECT * FROM MyTable ORDER BY {$columnname} {$ASC_or_DESC}";
So there are some examples of dynamic content you may need to validate before interpolating into an SQL query. Whitelisting values is also a good technique.
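As a rough sketch of the whitelisting idea (in Python rather than PHP, and with made-up column names that are only assumptions for illustration), the dynamic parts of the ORDER BY clause can be checked against fixed lists before they are interpolated:

```python
# Hypothetical whitelists: the only values this query may be built from.
ALLOWED_SORT_COLUMNS = {"id", "name", "created"}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def build_order_by_query(column, direction):
    """Return a SELECT whose ORDER BY parts were matched against whitelists."""
    direction = direction.upper()
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError("unexpected sort column: %r" % (column,))
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError("unexpected sort direction: %r" % (direction,))
    # Safe to interpolate: both values can only be one of the fixed strings.
    return "SELECT * FROM MyTable ORDER BY {} {}".format(column, direction)
```

Anything not on the lists is rejected outright instead of being cleaned up, so there is no escaping logic to get wrong.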
Also you could have values that are permitted by the data type of a column but would be nonsensical. For these cases, it's often easier to use application code to validate than to try to validate in SQL constraints.
Suppose you store a credit card number. There are valid patterns for credit card numbers, and libraries to recognize a valid one from an invalid one.
Or how about when a user defines her password? You may want to ensure sufficient password strength, or validate that the user entered the same string in two password-entry fields.
Or if they order a quantity of merchandise, you may need to store the quantity as an integer, but you'd want to make sure it's greater than zero. And if it's greater than 1000, you'd perhaps want to double-check with the user that they entered it correctly.
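The quantity check could look something like this in application code. It is only a sketch: the function name is invented, and the 1000 threshold is just the figure from the example above.

```python
def check_quantity(raw, confirm_above=1000):
    """Return (quantity, needs_confirmation), or raise ValueError.

    `raw` is the text the user typed. The name and the default threshold
    are assumptions made for this illustration.
    """
    quantity = int(raw)  # raises ValueError if the text is not an integer
    if quantity <= 0:
        raise ValueError("quantity must be greater than zero")
    # Large orders are still valid, but flagged so the UI can ask again.
    return quantity, quantity > confirm_above
```

None of this has anything to do with SQL injection. It is ordinary validation that a parameterized query will not do for you.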
Parameterized queries will help prevent SQL injection, but they won't do diddly against cross-site scripting. You need other measures, like HTML encoding or HTML detection/validation, to prevent that. If all you care about is SQL injection, parameterized queries are probably sufficient.
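To illustrate the HTML-encoding half of that, here is a minimal Python sketch using the standard library's html.escape. The render_comment helper is made up for the example, and a real application would more likely rely on its template engine's auto-escaping:

```python
import html


def render_comment(user_text):
    """Wrap user-supplied text in a paragraph, encoding HTML metacharacters."""
    # html.escape turns <, >, & and quotes into entities, so the browser
    # displays the text instead of interpreting it as markup.
    return "<p>{}</p>".format(html.escape(user_text))
```

The text can be stored in the database exactly as entered. The encoding happens at the point where it is written into a page.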
There are many different reasons to sanitize and validate, including preventing cross-site scripting, and simply wanting the correct content for a field (no names in phone numbers). Parameterized queries eliminate the need to manually sanitize or escape against SQL injection.
You are correct, SQL parameters are not executable code so you don't need to worry about that.
However, you should still do a bit of validation. For example, if you expect a varchar(10) and the user inputs something longer than that, you will end up with an exception.
In short, no. Input sanitization and the use of parameterized queries are not mutually exclusive. They are independent: you can use neither, either one alone, or both. They prevent different types of attacks. Using both is the best course.
It is important to note, as a minor point, that sometimes it is useful to write stored procedures which contain dynamic SQL. In this case, the fact that the inputs are parameterized is no automatic defense against SQL injection. This may seem a fairly obvious point, but I often run into people who think that because their inputs are parameterized they can just stop worrying about SQL Injection.
If we take a database table, we can query all the rows or we can choose to apply a filter to it. The filter can vary depending on the user input. When there are only a few options, we can write a separate query for each specific condition. But if there are lots and lots of options that the user might or might not specify, that method does not come in handy. I know I can compose the filter based on the user input, send it as a string parameter to the corresponding stored procedure, build the query with that filter, and finally execute the query string with EXECUTE IMMEDIATE (in Oracle's case). I don't know why, but I really don't like this way of building queries. I think it leaves the door open to SQL injection. Besides that, I always have trouble with the query itself, as everything is just a string and I need to handle dates and numbers carefully. What is the best and most widely used method of forming the WHERE clause of a query against a database table?
Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.
A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ2SQL, which both support expression trees; Java has Hibernate and JPA, and so on. I have seen several different frameworks used to construct customizable queries, with a great deal of success. In situations when these frameworks are not available, you can roll your own, although it requires a lot more work.
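A hand-rolled version does not have to be a full expression tree to get the main benefit. The sketch below is a much simpler stand-in, written in Python against an in-memory SQLite database: it turns a dictionary of optional filters into parameterized SQL plus a list of bound values. The table, the column names, and the build_filter_query helper are all invented for the example.

```python
import sqlite3

# Hypothetical mapping from filter names a user may send to real columns.
FILTERABLE_COLUMNS = {"author": "author", "year": "year", "tag": "tag"}


def build_filter_query(filters):
    """Build a parameterized SELECT from a dict of optional filters."""
    clauses, params = [], []
    for name, value in filters.items():
        if name not in FILTERABLE_COLUMNS:
            raise ValueError("unknown filter: %r" % (name,))
        # Column names come from our own mapping; values become parameters.
        clauses.append("{} = ?".format(FILTERABLE_COLUMNS[name]))
        params.append(value)
    sql = "SELECT title FROM posts"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, author TEXT, year INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [("First", "ann", 2008, "sql"), ("Second", "bob", 2009, "sql")],
)

sql, params = build_filter_query({"author": "ann", "tag": "sql"})
rows = conn.execute(sql, params).fetchall()
```

The SQL text is assembled only from strings the program itself defines, and whatever the user typed travels separately as a bound value.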
My question is basically this: If I use a parameterized statement/prepared statement to insert a user input string into a table, then get that value later and use it for dynamically constructing a table's column names, does that leave me open to SQL injection?
Specific example:
If I store a user's input string into a table using a parameterized statement, then select that TEXT from that table and store it in a local variable (String localVariable) in my program and CREATE a table with something like:
"CREATE TABLE InjectFree (" + localVariable + " TEXT)"
would my localVariable be free of injectable sql code? I know there are alternatives (and will probably use an alternative just to be on the safe side), but I guess I'm just wondering what parameterizing a value actually does and what effect it has on the data being stored in the table.
You will be in danger.
The parameterized insert will protect you from injection on that original insert statement, but not on the next use.
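Here is a small Python/SQLite sketch of why. The table name and the hostile string are invented for the demonstration. The parameterized insert stores the text byte for byte, so whatever the attacker typed comes back out unchanged and lands in the concatenated CREATE TABLE:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_input (value TEXT)")

# The parameterized insert stores the text verbatim; it does not alter it.
hostile = "x TEXT); DROP TABLE user_input; --"
conn.execute("INSERT INTO user_input (value) VALUES (?)", (hostile,))
stored = conn.execute("SELECT value FROM user_input").fetchone()[0]

# Concatenating the stored value reproduces the attacker's SQL unchanged.
# (The string is only built here, never executed.)
unsafe_sql = "CREATE TABLE InjectFree (" + stored + " TEXT)"

# One possible guard, assuming column names are limited to plain identifiers.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_plain_identifier(name):
    """True only if the whole string is letters, digits and underscores."""
    return IDENTIFIER.fullmatch(name) is not None
```

The identifier check is only one possible guard. It still lets reserved words through, which would make the CREATE TABLE fail, though without injecting anything.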
If you use parameterised queries the query and the data are supplied to the database separately. This has several effects...
It allows the RDBMS to see that the query is identical to previous instances of that query. (If you embed the data as static values in the query string, the RDBMS will not see that the query is the same and only the data has changed.) This allows execution plan re-use and other beneficial characteristics of the RDBMS.
It simplifies the data validation. This is relevant to injection attacks. No matter what values are substituted into the parameter, the data is always just data. It will never be treated as part of the query.
This latter point, however, also means that you can't do this...
INSERT INTO #tableName(#fieldName) VALUES (#dataValue)
Each parameter is treated as a data item. It isn't a loosely bound script, the value in #tableName won't be substituted into the script. The query must be hard-coded with the table and field names. Only true data items can be passed as parameters.
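This is easy to check for yourself. The snippet below uses Python's sqlite3 module, which writes placeholders as ?. The notes table is made up, and other drivers will report the failure differently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# A placeholder in a value position is accepted.
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

# A placeholder in a table-name position is a syntax error: the statement
# cannot even be prepared, so nothing is substituted and nothing runs.
try:
    conn.execute("INSERT INTO ? (body) VALUES (?)", ("notes", "hello"))
    table_placeholder_ok = True
except sqlite3.OperationalError:
    table_placeholder_ok = False

row_count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```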
This often feels like a limitation to users of JavaScript, etc. It is, however, the mechanism that protects you from SQL injection attacks. It's a good thing :)
This means that to allow user-defined Data Definition Language (such as a CREATE TABLE) you need to concatenate the different parts of the string together yourself. And virtually no matter what you do to protect yourself from a SQL injection attack here, someone will find a way through.
As soon as you allow a user to specify table names, field names, etc, you become immediately open to attack. The only safe way is to have a white-list of allowable strings.
I try to avoid writing the SQL-code for each and every update or insert. Instead a PHP-function takes some parameters (add or change, tablename, fields-list) and builds the SQL from it. This is similar, but not identical, to the preferred method of using parameterized statements (mysqli).
Is my way of creating SQL putting a big smile in the face of hackers?
It's only a security risk if you allow any user input to get added to the SQL queries. You are fine if you are using constant parameters defined in your source code, but the actual values coming from users of your website should always be using bound parameters and never added directly to the SQL query.
Oh yes - see Wikipedia, and the obligatory xkcd reference.
Yes it is a security risk unless you can devote significant resources to testing your function for abuse/misuse.
There are other factors that you should consider. Building dynamic SQL reduces the ability of the database to cache queries - each time you call your function, your query will be parsed as a new one. This might have a significant impact on performance. Using bind parameters doesn't impose the same problem as the database can recognize the parameters as the variable part of the query.
I know I can use stored procedures to eliminate SQL injection attacks, but it would bloat the code by more than I'm willing to accept and make it costly to maintain.
In my dynamic SQL query, I would like to search for a string of text in 2 columns in one of my tables, but before that happens, I would like my business layer, which is written in C#, to sanitize the input. I would like the input to be allowed to contain special characters (e.g. #, !, $, etc.). What is the minimal character set that I have to strip out of my search string to sanitize it? I'm thinking that stripping out single and double quotes is sufficient. Is that correct?
Thanks
You don't need to use stored procedures to be safe. (As a matter of fact, stored procedures don't necessarily guarantee safety against injection attacks if the stored procedures themselves construct dynamic queries.) And manual escaping is difficult to do 100% safely, and not recommended.
Instead, use parameterized queries, which nearly all databases support.
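For the two-column search in the question, a parameterized version might look like the following. This is a Python/SQLite sketch rather than C#, and the articles table and search_articles helper are assumptions made up for the example. Quotes, #, ! and $ in the input need no stripping at all. The only extra step is escaping the LIKE wildcards so that they match literally:

```python
import sqlite3


def search_articles(conn, text):
    """Search two columns for `text`, passing it as a bound parameter."""
    # Escape LIKE wildcards so % and _ in the input are matched literally.
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = "%" + escaped + "%"
    sql = (
        "SELECT title FROM articles "
        "WHERE title LIKE ? ESCAPE '\\' OR body LIKE ? ESCAPE '\\' "
        "ORDER BY title"
    )
    return [row[0] for row in conn.execute(sql, (pattern, pattern))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("O'Reilly #1!", "costs $5"), ("Plain", "nothing special")],
)
```

A search for O'Reilly finds the first row, and a search string full of quotes is simply looked for as text.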
If you use stored procedures or parameterized statements, you shouldn't need to sanitize anything, unless you are building strings blindly in dynamic SQL within the procedure. If that is the case, please read Erland's excellent article on dynamic SQL:
http://sommarskog.se/dynamic_sql.html
Also strip semicolons, to stop subsequent statements being defined (necessary, but not sufficient).
You're approaching this from an unsafe direction. You want to define the set of characters that should be allowed (checking that they're not special, etc), and then strip everything not in that set.
You should probably also look into SqlCommands as a safer way to build the string sent to the DB.
A "static" query is one that remains the same at all times. For example, the "Tags" button on Stackoverflow, or the "7 days" button on Digg. In short, they always map to a specific database query, so you can create them at design time.
But I am trying to figure out how to do "dynamic" queries where the user basically dictates how the database query will be created at runtime. For example, on Stackoverflow, you can combine tags and filter the posts in ways you choose. That's a dynamic query albeit a very simple one since what you can combine is within the world of tags. A more complicated example is if you could combine tags and users.
First of all, when you have a dynamic query, it sounds like you can no longer use the substitution API to avoid SQL injection, since the query elements will depend on what the user decided to include in the query. I can't see how else to build this query other than by appending strings.
Secondly, the query could potentially span multiple tables. For example, if SO allows users to filter based on Users and Tags, and these probably live in two different tables, building the query gets a bit more complicated than just appending columns and WHERE clauses.
How do I go about implementing something like this?
The first rule is that users are allowed to specify values in SQL expressions, but not SQL syntax. All query syntax should be literally specified by your code, not user input. The values that the user specifies can be provided to the SQL as query parameters. This is the most effective way to limit the risk of SQL injection.
Many applications need to "build" SQL queries through code, because as you point out, some expressions, table joins, order by criteria, and so on depend on the user's choices. When you build a SQL query piece by piece, it's sometimes difficult to ensure that the result is valid SQL syntax.
I worked on a PHP class called Zend_Db_Select that provides an API to help with this. If you like PHP, you could look at that code for ideas. It doesn't handle any query imaginable, but it does a lot.
Some other PHP database frameworks have similar solutions.
Though not a general solution, here are some steps that you can take to mitigate the dynamic yet safe query issue.
A criterion in which a column value must belong to a set of values of arbitrary size does not need to be dynamic. Consider using either the instr function or a special filtering table that you join against. This approach can be easily extended to multiple columns as long as the number of columns is known. Filtering on users and tags could easily be handled with this approach.
When the number of columns in the filtering criteria is arbitrary yet small, consider using different static queries for each possibility.
Only when the number of columns in the filtering criteria is arbitrary and potentially large should you consider using dynamic queries. In which case...
To be safe from SQL injection, either build or obtain a library that defends against that attack. Though more difficult, this is not an impossible task. This is mostly about escaping SQL string delimiters in the values to filter for.
To be safe from expensive queries, consider using views that are specially crafted for this purpose and some up front logic to limit how those views will get invoked. This is the most challenging in terms of developer time and effort.
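The filtering-table idea from the first step above can be sketched as follows, using Python and SQLite with invented table names. The user's chosen tags are inserted into a temporary table with bound parameters, and the query that joins against it never changes, however many tags are selected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("A", "sql"), ("B", "python"), ("C", "php")],
)
# Hypothetical per-connection filtering table.
conn.execute("CREATE TEMP TABLE wanted_tags (tag TEXT)")

# The query text is static: no string building at request time.
STATIC_QUERY = (
    "SELECT DISTINCT p.title FROM posts p "
    "JOIN wanted_tags w ON w.tag = p.tag ORDER BY p.title"
)


def posts_with_tags(conn, tags):
    """Return titles of posts whose tag is in `tags` (any number of tags)."""
    conn.execute("DELETE FROM wanted_tags")
    conn.executemany(
        "INSERT INTO wanted_tags (tag) VALUES (?)", [(t,) for t in tags]
    )
    return [row[0] for row in conn.execute(STATIC_QUERY)]
```

The trade-off is an extra write per request, which may or may not be acceptable depending on the database and the load.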
If you were using Python to access your database, I would suggest you use the Django model system. There are many similar APIs, both for Python and for other languages (notably Ruby on Rails). I am saving so much time by avoiding the need to talk directly to the database with SQL.
From the example link:
# Model definition
from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

    def __unicode__(self):
        return self.name
Model usage (this is effectively an insert statement)
from mysite.blog.models import Blog
b = Blog(name='Beatles Blog', tagline='All the latest Beatles news.')
b.save()
The queries get much more complex - you pass around a query object and you can add filters / sort elements to it. When you are finally ready to use the query, Django creates an SQL statement that reflects all the ways you adjusted the query object. I think that it is very cute.
Other advantages of this abstraction
Your models can be created as database tables with foreign keys and constraints by Django
Many databases are supported (PostgreSQL, MySQL, SQLite, etc.)
Django analyzes your models and creates an automatic admin site out of them.
Well the options have to map to something.
Concatenating a SQL query string isn't a problem if you still use parameters for the options.