One function building SQL-strings by parameters. A security risk? - sql

I try to avoid writing the SQL-code for each and every update or insert. Instead a PHP-function takes some parameters (add or change, tablename, fields-list) and builds the SQL from it. This is similar, but not identical, to the preferred method of using parameterized statements (mysqli).
Is my way of creating SQL putting a big smile in the face of hackers?

It's only a security risk if you allow any user input to get added to the SQL queries. You are fine if you are using constant parameters defined in your source code, but the actual values coming from users of your website should always be using bound parameters and never added directly to the SQL query.

Oh yes - see Wikipedia, and the obligatory xkcd reference.

Yes it is a security risk unless you can devote significant resources to testing your function for abuse/misuse.
There are other factors that you should consider. Building dynamic SQL reduces the ability of the database to cache queries - each time you call your function, your query will be parsed as a new one. This might have a significant impact on performance. Using bind parameters doesn't impose the same problem as the database can recognize the parameters as the variable part of the query.

Related

When to use function and when to use stored procedure in SQL Server [duplicate]

This question already has answers here:
Function vs. Stored Procedure in SQL Server
(19 answers)
Closed 7 years ago.
When to use function and when to use stored procedure in SQL Server?
I would like to know about people's thoughts and experience on it. Also, would like to know when to use the views. I am not looking for the definition of these db objects. A practical scenario discussion would be good
In my eyes it is a very bad habit to use SPs just to read data.
You must distinguish between
SP: A batch, normally multi statement, your are allowed to do (almost) everything. The biggest flaw is, that you cannot continue with the SPs result easily and that - if you want to use a SPs return in further queries - you always have to write this into a correctly declared table (real, temp or variable). This can be a lot of error prone typing. And furthermore, the optimizer will not be able to deal with this performantly.
TV-UDF (Table valued User Defined Function): One must be aware of the fact that there exist two flavours: single-statement (ad-hoc) or multi-statment. The first is good, the second (in almost all cases) very bad! One advantage over the VIEW is, that parameters and their handling is pre-compiled.
VIEW: This is as good as an ad-hoc TV-UDF. You can declare it with schema binding and deal with it almost as if it was a table (indexes and so on)...
Fazit: Use SPs for UPDATE, DELETE, any kind of data or structure manipulation but not for sole reading.
In my daily work for example, I use sp's to retrieve, store, update and delete records, I execute those sps from web application most of the time, but sometimes when some sp needs some complex or repetitive operation then I use a function to reuse that operation if needed or to leave a more clean code in my sp.
We do not normally use Views, but you can use them for simplicity, you don't need to make the same long Select command with joins if you just call the view.
that's my experience, sry for my english :).

Keeping dynamic out of SQL while using specifications with stored procedures

A specification essentially is a text string representing a "where" clause created by an end user.
I have stored procedures that copy a set of related tables and records to other places. The operation is always the same, but dependent on some crazy user requirements like "products that are frozen and blue and on sale on Tuesday".
What if we fed the user specification (or string parameter) to a scalar function that returned true/false which executed the specification as dynamic SQL or just exec (#variable).
It could tell us whether those records exist. We could add the result of the function to our copy products where clause.
It would keep us from recompiling the copy script each time our where clauses changed. Plus it would isolate the product selection in to a single function.
Anyone ever do anything like this or have examples? What bad things could come of it?
EDIT:
This is the specification I simply added to the end of each insert/select statement:
and exists (
select null as nothing
from SameTableAsOutsideTable inside
where inside.ID = outside.id and -- Join operations to outside table
inside.page in (6, 7) and -- Criteria 1
inside.dept in (7, 6, 2, 4) -- Criteria 2
)
It would be great to feed a parameter into a function that produces records based on the user criteria, so all that above could be something like:
and dbo.UserCriteria( #page="6,7", #dept="7,6,2,4")
Dynamic Search Conditions in T-SQL
When optimizing SQL the important thing is optimizing the access path to data (ie. index usage). This trumps code reuse, maintainability, nice formatting and just about every other development perk you can think of. This is because a bad access path will cause the query to perform hundreds of times slower than it should. The article linked sums up very well all the options you have, and your envisioned function is nowhere on the radar. Your options will gravitate around dynamic SQL or very complicated static queries. I'm afraid there is no free lunch on this topic.
It doesn't sound like a very good idea to me. Even supposing that you had proper defensive coding to avoid SQL injection attacks it's not going to really buy you anything. The code still needs to be "compiled" each time.
Also, it's pretty much always a bad idea to let users create free-form WHERE clauses. Users are pretty good at finding new and innovative ways to bring a server to a grinding halt.
If you or your users or someone else in the business can't come up with some concrete search requirements then it's likely that someone isn't thinking about it hard enough and doesn't really know what they want. You can have pretty versatile search capabilities without letting the users completely loose on the system. Alternatively, look at some of the BI tools out there and consider creating a data mart where they can do these kinds of ad hoc searches.
How about this:
You create another store procedure (instead of function) and pass the right condition to it.
Based on that condition it dumps the record ids to a temp table.
Next you move procedure will read ids from that table and do the needful things?
Or you could create a user function that returns a table which is nothing but the ids of the records that matches your criteria (dynamic)
If I am totally off, then please clarify me.
Hope this helps.
If you are forced to use dynamic queries and you don't have any solid and predefined search requirements, it is strongly recommended to use sp_executesql instead of EXEC . It provides parametrized queries to prevent SQL Injection attacks (to some extent) and It makes use of execution plans to speed up performance. (More info)

Are input sanitization and parameterized queries mutually exclusive?

I'm working updating some legacy code that does not properly handle user input. The code does do a minimal amount of sanitization, but does not cover all known threats.
Our newer code uses parameterized queries. As I understand it, the queries are precompiled, and the input is treated simply as data which cannot be executed. In that case, sanitization is not necessary. Is that right?
To put it another way, if I parameterize the queries in this legacy code, is it OK to eliminate the sanitization that it currently does? Or am I missing some additional benefit of sanitization on top of parameterization?
It's true that SQL query parameters are a good defense against SQL injection. Embedded quotes or other special characters can't make mischief.
But some components of SQL queries can't be parameterized. E.g. table names, column names, SQL keywords.
$sql = "SELECT * FROM MyTable ORDER BY {$columnname} {$ASC_or_DESC}";
So there are some examples of dynamic content you may need to validate before interpolating into an SQL query. Whitelisting values is also a good technique.
Also you could have values that are permitted by the data type of a column but would be nonsensical. For these cases, it's often easier to use application code to validate than to try to validate in SQL constraints.
Suppose you store a credit card number. There are valid patterns for credit card numbers, and libraries to recognize a valid one from an invalid one.
Or how about when a user defines her password? You may want to ensure sufficient password strength, or validate that the user entered the same string in two password-entry fields.
Or if they order a quantity of merchandise, you may need to store the quantity as an integer but you'd want to make sure it's greater than zero and perhaps if it's greater than 1000 you'd want to double-check with the user that they entered it correctly.
Parameterized queries will help prevent SQL injection, but they won't do diddly against cross-site scripting. You need other measures, like HTML encoding or HTML detection/validation, to prevent that. If all you care about is SQL injection, parameterized queries is probably sufficient.
There are many different reasons to sanitize and validate, including preventing cross-site scripting, and simply wanting the correct content for a field (no names in phone numbers). Parameterized queries eliminate the need to manually sanitize or escape against SQL injection.
See one of my previous answers on this.
You are correct, SQL parameters are not executable code so you don't need to worry about that.
However, you should still do a bit of validation. For example, if you expect a varchar(10) and the user inputs something longer than that, you will end up with an exception.
In short no. Input sanitization and the use of parameterized queries are not mutually exclusive, they are independent: you can use neither, either one alone, or both. They prevent different types of attacks. Using both is the best course.
It is important to note, as a minor point, that sometimes it is useful to write stored procedures which contain dynamic SQL. In this case, the fact that the inputs are parameterized is no automatic defense against SQL injection. This may seem a fairly obvious point, but I often run into people who think that because their inputs are parameterized they can just stop worrying about SQL Injection.

What do we consider as a dynamic sql statement?

a) What do we consider as a dynamic sql statement?
Any sql statement that dynamically adds a clause(s) or even just a part of a clause to a SQL string?
b)Aren’t then parameterized strings that use placeholders for dynamically supplied values also considered dynamic sql statements?
thanx
A dynamic SQL statement is a statement that is built at execution time. The emphasis lies on statement. So, it isn't dynamic SQL if you just supply a value at execution time.
Dynamic SQL statements generally refers those that are constructed using string concatenation.
"SELECT name FROM names WHERE id=" + this.id;
"SELECT name FROM names WHERE id=" + this.id + " AND age=" this.age;
Parameterized queries are also dynamic but not in terms of construct. You can only change parameters but you can't change the structure of the statement i.e add WHERE clauses.
Parameterized queries are often at the database level so the database can cache the execution plan of the query and use it over and over. Not quite possible in the first case since a simple change in the text or the order of the where clauses can cause the database to not recognize the previously cached execution plan and start over.
The first construct is also vulnerable to SQL injection since it is hard to validate input values for attempts to inject rogue SQL.
Certainly anything involving EXEC (#sql) or EXEC sp_ExecuteSQL #sql, ... (i.e. dynamic at the database itself) would qualify, but I guess you could argue that any SQL generated at runtime (rather than fixed at build / install) would qualify.
So yes, you could argue that a runtime-generated, yet correctly parameterized query is "dynamic" (for example, LINQ-to-SQL generated queries), but to be honest as long as it doesn't expose you to injection attacks I don't care about the name ;-p
Dynamic sql is basically just any sql that is not fully constructed until runtime. Its generated on-the-fly by concatenating runtime values into a statement. Could be any part of an sql statement
A. Anything that will cause the DB server to evaluate strings as SQL.
B. No, as they still go through the DB driver/provider and get cleaned up.
For point b) You already know the statement and you pass in known parameters (hopefully type safe and not string literals).
I consider a dynamic SQL statement to be one that accepts new values at runtime in order to return a different result. "New values", by my reckoning, can be a different ORDER BY, new WHERE criteria, different field selections, etc.
a) What do we consider as a dynamic sql statement?
Any sql statement that dynamically adds a clause(s) or even just a part of a clause to a SQL string?
Both - any query altered/tailored prior to execution.
b)Aren’t then parameterized strings that use placeholders for dynamically supplied values also considered dynamic sql statements?
Parameterized queries, AKA using bind variables, are supplying different filter criteria to the query. You can use (my_variable IS NULL OR ...), but OR performance is generally terrible on any db & the approach destroys sargability.
Dynamic SQL generally deals with tailoring the query to include other logic, like JOINs that only need to be included if a specific parameter is set. However, there are limitations like the IN clause not supporting converting a comma delimited string as it's list of options - for this you would have to use dynamic SQL, or handle the comma delimited list in another fashion (CLR, pipelined into a temp table, etc).
I see where you're going with this, but one somewhat objective criteria which defines a particular situation as Dynamic SQL vs. say a prepared statement is...
...the fact that dynamic statements cause the SQL server to fully evaluate the query, to define a query plan etc.
With prepared statements, SQL can (and does unless explicitly asked) cache the query plan (and in some cases, even gather statistics about the returns etc.).
[Edit: effectively, even dynamic SQL statements are cached, but such cached plans have a much smaller chance of being reused because the exact same query would need to be received anew for this to happen (unlike with parametrized queries where the plan is reused even with distinct parameter values and of course, unless "WITH RECOMPILE")]
So in the cases from the question, both a) and b) would be considered dynamic SQL, because, in the case of b), the substitution takes place outside of SQL. This is not a prepared statement. SQL will see it as a totally novel statement, as it doesn't know that you are merely changing the search values.
Now... beyond the SQL-centric definition of dynamic SQL, it may be useful to distinguish between various forms of dynamic SQL, such as say your a) and b) cases. Dynamic SQL has had a bad rep for some time, some of it related to SQL injection awareness. I've seen applications where dynamic SQL is produced, and executed, within a Stored Procedure, and because it is technically "SQL-side" some people tended to accept it (even though SQL-injecting in particular may not have been addressed)...
The point I'm trying to make here is that building a query, dynamically, from various contextual elements is often needed, and one may be better advised to implement this in "application" layers (well, call it application-side layers, for indeed, it can and should be separate from application per-se), where the programming language and associated data structures are typically easier and more expressive than say T-SQL and such. So jamming this into SQL for the sake of calling it "Data-side" isn't a good thing, in my opinion.

Benefits of using a Case statement over an If statement in a stored procedure?

Hi are there any pros / cons relating to the speed that a stored procedure executes when using an IF statement or choosing to use a CASE statement instead?
I try not to use IFs if I can avoid it, because they are non transactional, i.e. you have generally no guarantee that the condition of the IF will still be valid inside the BEGIN... END block of the IF.
I assume that you mean using CASE statements inside a SQL SELECT, for example. If the usage is meant to avoid an IF then, by all means, do it.
The golden rule of SQL is: let the server figure out HOW to do things, tell the server what you WANT.
Don't concern yourself with any speed difference between a case and an IF statement. Because you're dealing with a database, the amount of time it takes to write or read from disk is going to dwarf the time it takes the CPU to branch via an IF or Case. You should focus instead on which makes your code more readable and maintainable.