Avoid string concatenation to create queries

Martin Fowler, in his book Patterns of Enterprise Application Architecture, says:
"A good rule of thumb is to avoid string concatenation to put together SQL queries."
String concatenation is a practice I use quite often, to abstract the syntax of my SQL queries from the real data of the query.
Can you explain to me why this is considered a bad practice?

While there might be use cases where you build a prepared statement by string concatenation before compiling it, it is always bad practice to insert query parameters using string concatenation, for two reasons:
Performance: When using a prepared statement, the query syntax has to be parsed only once and the access path has to be calculated only once for each distinct query type. When building statements by string concatenation, parsing and optimizing typically have to be repeated for each execution of the query, because every statement text is new.
Security: Using string concatenation with data provided by the user is always prone to SQL injection attacks. Suppose you have a statement:
query = "select secret_data from users where userid = '" + userid_param + "'";
And imagine someone sends a userid_param containing "' OR '1'='1"...
In this situation the only defense is 100% correct input sanitization, which can be quite hard to get right depending on the language used. When you use prepared statements with a properly implemented driver, the driver keeps the statement separate from the query parameters, so nothing gets mixed up.
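A small sketch makes the difference visible. It uses Python's built-in sqlite3 module only as a convenient stand-in for any database driver. The users table, its rows, and the function names are invented for illustration. The first function splices userid_param into the SQL text, as the statement above does. The second binds it through a ? placeholder. The attack string is the classic "' OR '1'='1" payload.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (userid TEXT, secret_data TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice-secret"), ("bob", "bob-secret")],
    )
    return conn


def lookup_concatenated(conn, userid_param):
    # UNSAFE: the parameter becomes part of the SQL text itself.
    query = "select secret_data from users where userid = '" + userid_param + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_parameterized(conn, userid_param):
    # SAFE: the statement text is fixed; the value is passed separately.
    query = "select secret_data from users where userid = ?"
    return [row[0] for row in conn.execute(query, (userid_param,))]


if __name__ == "__main__":
    conn = make_db()
    attack = "' OR '1'='1"
    print(lookup_concatenated(conn, attack))   # both secrets leak
    print(lookup_parameterized(conn, attack))  # no user has that literal id
```

With the placeholder, the attacker's string is compared as an ordinary value, so it simply matches no user.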

Related

What's the common practice in constituting the WHERE clause based on the user input

If we take a database table, we can query all the rows or we can choose to apply a filter on it. The filter can vary depending on the user input. When there are only a few options, we can write separate queries for those few specific conditions. But if there are lots of options that the user might or might not specify, that approach is not practical. I know I can compose the filter based on the user input, send it as a string to the corresponding stored procedure as a parameter, build the query with that filter, and finally execute the query string with the help of EXECUTE IMMEDIATE (in Oracle's case). I don't know why, but I really don't like this way of building queries. I think it leaves the door open for SQL injection. Besides that, I always have trouble with the query itself, as everything is just a string and I need to handle dates and numbers carefully.
What is the best and most commonly used method of forming the WHERE clause of a query against a database table?
Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.
A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ2SQL, which both support expression trees; Java has Hibernate and JPA, and so on. I have seen several different frameworks used to construct customizable queries, with a great deal of success. When these frameworks are not available, you can roll your own, although it requires a lot more work.
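A hand-rolled version of the idea can be quite small. The following is a minimal sketch in Python with sqlite3, not any particular framework's API. Each criterion is a (column, operator, value) tuple. Column names and operators are checked against whitelists, because identifiers cannot be bound as parameters. Only the values travel as bound parameters. The orders table, its columns, and the helper names build_where and find_orders are assumptions made up for the example.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple

# Identifiers and operators cannot be passed as bind parameters,
# so they are restricted to known-good values (hypothetical schema).
ALLOWED_COLUMNS = {"customer", "status", "total"}
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}

Criterion = Tuple[str, str, Any]


def build_where(criteria: Sequence[Criterion]) -> Tuple[str, List[Any]]:
    """Turn optional user filters into a parameterized WHERE clause."""
    clauses: List[str] = []
    params: List[Any] = []
    for column, op, value in criteria:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column!r}")
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append(f"{column} {op} ?")  # the value stays out of the SQL text
        params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


def find_orders(conn: sqlite3.Connection, criteria: Sequence[Criterion]) -> List[int]:
    where, params = build_where(criteria)
    sql = "SELECT id FROM orders" + where + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, status TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "acme", "open", 120.0), (2, "acme", "closed", 80.0), (3, "globex", "open", 45.5)],
    )
    print(find_orders(conn, []))                                             # [1, 2, 3]
    print(find_orders(conn, [("status", "=", "open"), ("total", ">", 50)]))  # [1]
```

Dates and numbers are passed as ordinary Python values here, so no quoting or formatting of literals is needed.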

SQL injection and solution

What would be the solution?
You should use prepared SQL statements; the way to do it differs from one programming language to another. As for what an attacker could do with SQL injection: in the worst case, he could dump the DB content. For how to prevent it, check the following link: http://www.marcofolio.net/features/how_you_can_prevent_an_sql_injection.html
I guess your best approach is to keep your SQL statements as far as possible from your UI. Overall, you need to understand how SQL injection happens. For example, suppose you have a query like this:
select name from users where password='ValueFromTxtPassword' --not good approach
An attacker could write this into your TxtPassword field:
0' or 1=1; select * from creditcards --
This effectively makes a query like this:
select name from users where password='0' or 1=1; ======> valid query, the condition is always true
select * from creditcards -- ===> runs a second query, and the -- comments out the rest of your SQL statement, giving you all the records from your creditcards table
So, to avoid that, you can have a stored procedure like
authenticate(username, password) --Stored procedures force you to pass only the needed info
Another good approach is to use a complex 'query' object that builds up your SQL statement according to your needs:
using System.Collections.Generic;
// Column and Condition are placeholder types describing a column and a filter/sort condition
public class Query
{
    public List<Column> Projections { get; set; } // use this to build your SELECT
    public List<Condition> Filters { get; set; }  // use this to build your WHERE
    public List<Condition> Sorting { get; set; }  // use this to build your ORDER BY
}
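A rough Python analogue of such a query object might look like the following sketch. It is an illustration under stated assumptions, not Hibernate's or any other ORM's API. The Query class, the KNOWN_COLUMNS whitelist, and the users columns are hypothetical. Column names in SELECT and ORDER BY cannot be bound as parameters, so they are checked against the whitelist. Filter values are bound with ? placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# Hypothetical whitelist: identifiers cannot be bound as parameters.
KNOWN_COLUMNS = {"id", "name", "email", "created_at"}


@dataclass
class Query:
    table: str  # assumed to come from code, never from user input
    projections: List[str] = field(default_factory=list)           # SELECT
    filters: List[Tuple[str, Any]] = field(default_factory=list)   # WHERE col = ?
    sorting: List[Tuple[str, bool]] = field(default_factory=list)  # ORDER BY (col, descending?)

    @staticmethod
    def _check(column: str) -> str:
        if column not in KNOWN_COLUMNS:
            raise ValueError(f"unknown column: {column!r}")
        return column

    def to_sql(self) -> Tuple[str, List[Any]]:
        """Build the SQL text plus the list of values to bind."""
        cols = ", ".join(self._check(c) for c in self.projections) or "*"
        sql = f"SELECT {cols} FROM {self.table}"
        params: List[Any] = []
        if self.filters:
            sql += " WHERE " + " AND ".join(f"{self._check(c)} = ?" for c, _ in self.filters)
            params = [value for _, value in self.filters]
        if self.sorting:
            sql += " ORDER BY " + ", ".join(
                f"{self._check(c)} {'DESC' if desc else 'ASC'}" for c, desc in self.sorting
            )
        return sql, params


if __name__ == "__main__":
    q = Query("users", projections=["name", "email"],
              filters=[("name", "O'Brien")], sorting=[("created_at", True)])
    print(q.to_sql())
    # ('SELECT name, email FROM users WHERE name = ? ORDER BY created_at DESC', ["O'Brien"])
```

The (sql, params) pair is then handed to the driver's execute call, so the apostrophe in O'Brien never needs escaping.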
Using ORMs like Hibernate usually forces you to build these complex query objects, because you do not work with the mapped tables directly; you only get domain objects back.
It's not good. You need to protect your SQL queries from client requests. If that's too hard, the best way is to use a framework. For example, if you are familiar with PHP, you can use Symfony or Zend.
1) Filter input; stop trusting your users: The biggest threat to the application comes from its users. Users are not necessarily as well-mannered and obedient as you expect. Some users have really bad intentions and some simply try to test their hacking skills. Whatever code you are going to write, write it using best practices and consider its security aspects. Validate every field in the form.
2) Use database wrapper classes or PDO:
Database wrappers or PDO (in PHP) can reduce the risk of input values reaching the database directly.
Prepared statements can be used along with PDO, as described here:
http://www.itechnicalblog.com/what-is-a-sql-injection-and-how-to-fix-it/
A server-side solution would be to have the database server reject unexpected SQL queries (queries whose parse-tree hash is not in a set of known hashes).
Basically, the idea is fairly simple and others have had similar thoughts.
https://www.researchgate.net/profile/Debabrata_Kar/publication/261318459_Prevention_of_SQL_Injection_attack_using_query_transformation_and_hashing/links/545cf9180cf295b5615e6452.pdf
My idea is just slightly different in that I suggest moving the parsing from the client to the server where there is already a parse step.
Algorithm:
1) On the database server, after parsing the SQL, walk the parse tree and produce a string of opcode tokens. Compute an MD5 hash of the string.
2) This hash then represents the shape of the parse-tree. For a web-application, the number of unique SQL queries will be fairly small.
3) The database server keeps a per-user list of valid hashes and warns/fails on new hashes.
4) Add a function to the API to pre-load this list of hashes and call this function with a list of known hashes at the web application startup.
The way SQL Injection works is to insert characters resulting in a different parse-tree than the one envisioned by the programmer.
The parse-tree hashes can detect unexpected parse-trees.
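The hashing idea can be sketched in Python. A real implementation would hash the server's actual parse tree. As a simplifying assumption, this sketch uses a crude regular-expression tokenizer instead. It keeps keywords, identifiers and operators, replaces string and numeric literals with a placeholder, and MD5-hashes the resulting token string. The regex and the names query_shape and ShapeAllowlist are made up for illustration, and the tokenizer will not handle every SQL dialect. One allowlist per database user would match step 3.

```python
import hashlib
import re

# Crude SQL tokenizer (an assumption, not a real parser). Alternatives are tried in order:
# string literal, numeric literal, keyword/identifier, multi-char operator, single symbol.
TOKEN_RE = re.compile(
    r"'(?:[^']|'')*'"
    r"|\d+(?:\.\d+)?"
    r"|[A-Za-z_][A-Za-z_0-9]*"
    r"|<>|<=|>=|!=|--|[^\s\w]"
)


def query_shape(sql: str) -> str:
    """Return an MD5 hash of the token stream, with literal values replaced by '?'."""
    tokens = []
    for tok in TOKEN_RE.findall(sql):
        is_string = tok.startswith("'") and len(tok) > 1  # a lone ' is kept as a token
        if is_string or tok[0].isdigit():
            tokens.append("?")  # different values, same shape
        else:
            tokens.append(tok.upper())  # case-fold keywords and identifiers
    return hashlib.md5(" ".join(tokens).encode("utf-8")).hexdigest()


class ShapeAllowlist:
    """Set of known query shapes, meant to be preloaded at application startup."""

    def __init__(self, known_queries):
        self.known = {query_shape(q) for q in known_queries}

    def check(self, sql: str) -> bool:
        """True if the query has a known shape; False means warn or reject."""
        return query_shape(sql) in self.known


if __name__ == "__main__":
    allow = ShapeAllowlist(["SELECT name FROM users WHERE id = 42"])
    print(allow.check("select name from users where id = 7"))        # True
    print(allow.check("SELECT name FROM users WHERE id = 0 OR 1=1"))  # False
```

The injected "OR 1=1" adds tokens, so the hash changes even though every individual value was replaced by a placeholder.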
The downside of this proposal is that no current database system does it. The closest I have seen so far is PostgreSQL, which keeps a hash of the parse tree in pg_stat_statements for informational purposes.

Best practice between these two queries

I was in a user group meeting yesterday and they pointed out that using parameterized queries is better than hardcoding the query. That got me thinking: does this do anything beneficial (obviously on a much bigger scale than this, though)?
DECLARE @Client1 UNIQUEIDENTIFIER,
@Client2 UNIQUEIDENTIFIER;
SET @Client1 = '41234532-2342-3456-3456-123434543212';
SET @Client2 = '12323454-3432-3234-5334-265456787654';
SELECT ClientName
FROM dbo.tblclient
WHERE id IN (@Client1, @Client2)
As opposed to:
SELECT ClientName
FROM dbo.tblclient
WHERE id IN ('41234532-2342-3456-3456-123434543212','12323454-3432-3234-5334-265456787654')
Parameterized queries and the IN clause are not trivial to combine if your IN list changes from time to time.
Read this SO question and its answers: Parameterize an SQL IN clause
Each parameter, by design, holds exactly one value. Anything beyond that must be implemented manually, keeping security issues such as SQL injection in mind.
From a performance perspective, parameterized queries will perform better, especially if the same query is run repeatedly with different parameter values. However, if you have a dynamic IN list (sometimes 2 items, sometimes 3), you might not get that advantage.
Do not lose hope, though. Some people have managed to implement that (parameterized queries with an IN clause). Again, though, it's not trivial.
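One common workaround generates one placeholder per list element and binds each value separately. The sketch below shows it in Python with sqlite3. The helper name in_clause is made up for this example, and the tblclient rows reuse the question's GUIDs as plain text.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def in_clause(column: str, values: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Build 'column IN (?, ?, ...)' with one bound parameter per value.

    column is assumed to come from code, never from user input.
    """
    if not values:
        # "IN ()" is invalid SQL; an always-false predicate is a common fallback.
        return "1 = 0", []
    placeholders = ", ".join("?" for _ in values)
    return f"{column} IN ({placeholders})", list(values)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblclient (id TEXT, ClientName TEXT)")
    conn.executemany(
        "INSERT INTO tblclient VALUES (?, ?)",
        [
            ("41234532-2342-3456-3456-123434543212", "Client A"),
            ("12323454-3432-3234-5334-265456787654", "Client B"),
            ("00000000-0000-0000-0000-000000000000", "Client C"),
        ],
    )
    ids = ["41234532-2342-3456-3456-123434543212", "12323454-3432-3234-5334-265456787654"]
    condition, params = in_clause("id", ids)
    sql = "SELECT ClientName FROM tblclient WHERE " + condition + " ORDER BY ClientName"
    print(sql)  # SELECT ClientName FROM tblclient WHERE id IN (?, ?) ORDER BY ClientName
    print([row[0] for row in conn.execute(sql, params)])  # ['Client A', 'Client B']
```

Each distinct list length still produces a distinct statement text, so plan reuse only happens per list length. This matches the performance caveat above.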
On huge databases and complex queries with many joins, the database can spend considerable time building an execution plan. With parameterized queries, the execution plan stays in the database cache for some time, so calling the query multiple times with different parameters can reuse it.
It shouldn't hurt, but you're going to get the most benefit from prepared statements for queries built from user input. If users are clicking a button to "show all", it's not a big deal; however, if you're prompting users to enter their name, you seriously need to parameterize the input before inserting/updating/selecting/etc.
For example, if I entered my name as "Mike'); DROP TABLE MASTER; --" (or whatever a big table name is in your DB), it could get really ugly for you. Better safe than sorry, right?
EDIT: OP commented here and asked a question. Updated with a code example.
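using System.Data;
using System.Data.SqlClient;
// userInput (string) and command (SqlCommand) are assumed to exist already
int myNum = int.Parse(userInput); // throws FormatException if the input is not an integer
SqlParameter spNum = new SqlParameter("@myNum", SqlDbType.Int);
spNum.Value = myNum; // bind the typed value to the parameter
//you can also check for null here (but not really relevant in this case)
command.Parameters.Add(spNum);
string sql = "INSERT INTO [Table](myNum)";
sql += " VALUES(@myNum)";
command.CommandText = sql;
int resultsCt = command.ExecuteNonQuery();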
See how the code is forcing the input to be an integer BEFORE it does any work with the database? That way if anybody tries any shenanigans it's rejected before it can do harm to the DB.

HQL vs. SQL / Hibernate netbeans HQL editor

I am teaching myself Hibernate, and I am quite confused about why I cannot just write simple SQL queries.
I find it more confusing to use than plain SQL (which I am used to).
PLUS: I found the NetBeans HQL editor pretty annoying. It is harder for me to produce a correct query in HQL than in SQL, and why does the SQL it shows differ from the actual SQL statements?
So why use it? Hibernate is known to be very resource-intensive, and I believe Hibernate is the reason our app runs out of memory so often, e.g. during redeployment...
I am very interested in knowing why I should use Hibernate rather than plain SQL (MySQL) statements!?
A good link for Hibernate queries would also be nice ;). I am using this one at the moment:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/queryhql.html
I am also interested in any good link explaining the setup of the queries, the mapping, the underlying construction, etc.
Best Regards
Alex
HQL is object oriented and it's done with the purpose of working on the Java objects representing your DB tables.
A basic advantage is that you can put placeholders like :orderNumber (using the colon symbol) in the HQL query and replace them with the value of a variable. For example:
int orderNumber = 685412;
List<Order> l=
session.createQuery("from Order where orderNumber = :orderNumber")
.setParameter("orderNumber",orderNumber).list();
This way you can change orderNumber easily, avoiding the classic
String query = "select * from Order where orderNumber = " + orderNumber + "...";
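Plain database drivers offer the same named-placeholder style. Python's sqlite3 module is used here only as an illustration. It accepts :name placeholders together with a dict of values. The orders table and its columns are hypothetical. The table is not called Order because ORDER is a reserved SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderNumber INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (:orderNumber, :customer)",
    [{"orderNumber": 685412, "customer": "acme"}, {"orderNumber": 1, "customer": "globex"}],
)

order_number = 685412
rows = conn.execute(
    "SELECT customer FROM orders WHERE orderNumber = :orderNumber",
    {"orderNumber": order_number},  # bound by name, never spliced into the SQL text
).fetchall()
print(rows)  # [('acme',)]
```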
Moreover, using MySQL-specific syntax can sometimes make your code non-reusable if you migrate your DB to another DBMS.
Anyway I'm still not so convinced about the preference on HQL.
Here you can find the full grammar definition.

What do we consider as a dynamic sql statement?

a) What do we consider as a dynamic SQL statement?
Any SQL statement that dynamically adds a clause(s) or even just a part of a clause to a SQL string?
b) Then aren't parameterized strings that use placeholders for dynamically supplied values also considered dynamic SQL statements?
Thanks
A dynamic SQL statement is a statement that is built at execution time. The emphasis lies on statement. So, it isn't dynamic SQL if you just supply a value at execution time.
Dynamic SQL generally refers to statements that are constructed using string concatenation:
"SELECT name FROM names WHERE id=" + this.id;
"SELECT name FROM names WHERE id=" + this.id + " AND age=" + this.age;
Parameterized queries are also dynamic, but not in terms of construction. You can only change the parameters; you can't change the structure of the statement, i.e. add WHERE clauses.
Parameterized queries are often prepared at the database level, so the database can cache the execution plan of the query and reuse it over and over. That is not really possible in the first case, since a simple change in the text or in the order of the WHERE clauses can cause the database not to recognize the previously cached execution plan and start over.
The first construct is also vulnerable to SQL injection since it is hard to validate input values for attempts to inject rogue SQL.
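The caching argument can be made concrete by counting the statement texts a database would receive. This small Python sketch involves no database. The names table comes from the examples above, and the ids are arbitrary. Concatenation yields a new text for every value. The parameterized form sends one text that a statement or plan cache can reuse.

```python
from typing import Iterable, Tuple


def count_statement_texts(ids: Iterable[int]) -> Tuple[int, int]:
    """Return (distinct concatenated texts, distinct parameterized texts)."""
    ids = list(ids)
    # Concatenation: each value becomes part of the text, so each id is a "new" statement.
    concatenated = {"SELECT name FROM names WHERE id=" + str(i) for i in ids}
    # Parameterization: the text stays fixed; the id is sent separately as a bound value.
    parameterized = {"SELECT name FROM names WHERE id=?" for _ in ids}
    return len(concatenated), len(parameterized)


print(count_statement_texts([1, 2, 3, 42]))  # (4, 1)
```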
Certainly anything involving EXEC (@sql) or EXEC sp_ExecuteSQL @sql, ... (i.e. dynamic at the database itself) would qualify, but I guess you could argue that any SQL generated at runtime (rather than fixed at build / install) would qualify.
So yes, you could argue that a runtime-generated, yet correctly parameterized query is "dynamic" (for example, LINQ-to-SQL generated queries), but to be honest as long as it doesn't expose you to injection attacks I don't care about the name ;-p
Dynamic SQL is basically just any SQL that is not fully constructed until runtime. It's generated on the fly by concatenating runtime values into a statement. It could be any part of an SQL statement.
A. Anything that will cause the DB server to evaluate strings as SQL.
B. No, as the values still go through the DB driver/provider and are never evaluated as SQL.
For point b) You already know the statement and you pass in known parameters (hopefully type safe and not string literals).
I consider a dynamic SQL statement to be one that accepts new values at runtime in order to return a different result. "New values", by my reckoning, can be a different ORDER BY, new WHERE criteria, different field selections, etc.
a) What do we consider as a dynamic SQL statement?
Any SQL statement that dynamically adds a clause(s) or even just a part of a clause to a SQL string?
Both - any query altered/tailored prior to execution.
b) Then aren't parameterized strings that use placeholders for dynamically supplied values also considered dynamic SQL statements?
Parameterized queries, AKA using bind variables, supply different filter criteria to the query. You can use (my_variable IS NULL OR ...), but OR performance is generally terrible on any DB, and the approach destroys sargability.
Dynamic SQL generally deals with tailoring the query to include other logic, like JOINs that only need to be included if a specific parameter is set. However, there are limitations, like the IN clause not accepting a comma-delimited string as its list of options; for this you would have to use dynamic SQL, or handle the comma-delimited list in another fashion (CLR, pipelined into a temp table, etc.).
I see where you're going with this, but one somewhat objective criterion that defines a particular situation as dynamic SQL vs., say, a prepared statement is...
...the fact that dynamic statements cause the SQL server to fully evaluate the query, to define a query plan, etc.
With prepared statements, SQL can (and does, unless explicitly asked not to) cache the query plan (and in some cases, even gather statistics about the returns, etc.).
[Edit: effectively, even dynamic SQL statements are cached, but such cached plans have a much smaller chance of being reused, because the exact same query would need to be received anew for this to happen (unlike with parameterized queries, where the plan is reused even with distinct parameter values, unless of course "WITH RECOMPILE" is used)]
So in the cases from the question, both a) and b) would be considered dynamic SQL, because, in the case of b), the substitution takes place outside of SQL. This is not a prepared statement. SQL will see it as a totally novel statement, as it doesn't know that you are merely changing the search values.
Now... beyond the SQL-centric definition of dynamic SQL, it may be useful to distinguish between various forms of dynamic SQL, such as your a) and b) cases. Dynamic SQL has had a bad reputation for some time, some of it related to SQL injection awareness. I've seen applications where dynamic SQL is produced, and executed, within a stored procedure, and because it is technically "SQL-side" some people tended to accept it (even though SQL injection in particular may not have been addressed)...
The point I'm trying to make here is that building a query dynamically from various contextual elements is often needed. One may be better advised to implement this in "application" layers, where the programming language and associated data structures are typically easier and more expressive than, say, T-SQL. (Call them application-side layers, rather, since they can and should be separate from the application per se.) So jamming this into SQL for the sake of calling it "data-side" isn't a good thing, in my opinion.