I've been reading a lot about prepared statements and in everything I've read, no one talks about the downsides of using them. Therefore, I'm wondering if there are any "there be dragons" spots that people tend to overlook?
A prepared statement is just a parsed and precompiled SQL statement that waits for bound variables to be supplied before it is executed.
Any executed statement becomes prepared sooner or later (it needs to be parsed, optimized, and compiled before it can be executed).
A prepared statement just reuses the results of parsing, optimization and compilation.
Usually database systems use some kind of optimization to save some time on query preparation even if you don't use prepared queries yourself.
Oracle, for instance, when parsing a query first checks the library cache, and if the same statement had already been parsed, it uses the cached execution plan instead.
If you use a statement only once, or if you automatically generate dynamic SQL statements (and either properly escape everything or know for certain that your parameters contain only safe characters), then you should not use prepared statements.
There is one other small issue with prepared statements vs dynamic sql, and that is that it can be harder to debug them. With dynamic sql, you can always just write out a problem query to a log file and run it directly on the server exactly as your program sees it. With prepared statements it can take a little more work to test your query with a specific set of parameters determined from crash data. But not that much more, and the extra security definitely justifies the cost.
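One way to narrow that debugging gap is to log the SQL template together with its bind values every time a parameterized query runs. A minimal sketch (using Python's sqlite3 here purely for illustration; `run_logged` is a hypothetical helper, not part of any driver API):

```python
import sqlite3

def run_logged(conn, sql, params):
    # Log the template and its bind values side by side; the interpolated
    # form exists only in the log, never in what is sent to the database.
    print(f"SQL: {sql} -- params: {params!r}")
    return conn.execute(sql, params)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
row = run_logged(conn, "SELECT name FROM users WHERE id = ?", (1,)).fetchone()
print(row[0])  # alice
```

With the template and parameters captured from crash data, reproducing the exact query on the server is a copy-and-paste exercise rather than guesswork.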
In some situations, the database engine might come up with an inferior query plan when using a prepared statement, because it can't make the right assumptions without having the actual bind values for a search.
See, e.g., the "Notes" section at
http://www.postgresql.org/docs/current/static/sql-prepare.html
So it might be worth testing your queries with and without prepared statements to find out which is faster. Ideally, you would then decide on a per-statement basis whether to use prepared statements, although not all ORMs will allow you to do that.
The only downside that I can think of is that they take up memory on the server. It's not much, and while there are probably some edge cases where it would be a problem, I'm hard pressed to think of any.
Related
I am using SQL Server and PostgreSQL. Usually when I need to run multiple commands, I open a new connection, and for each query task I run a command (inserting 1000 rows, for example).
Is it bad if I run multiple queries in a single command (queries separated by semicolons) vs. the previous behavior?
Running multiple statements inside a single call (and being able to do so) makes you extremely vulnerable to SQL injection. Any query, even a simple SELECT, becomes dangerous if somebody can append an UPDATE or DELETE statement afterwards. That's why many implementations (especially in PHP) simply disable the ability to submit multiple statements in one call.
On the other hand, as far as I know, there's almost no valid reason to do so. One usually keeps the connection open, so the overhead implied by the call itself is negligible.
If what you actually seek is atomicity, then you want transactions instead;
if you're worried about the complexity of your queries and don't want them re-parsed each time, take a look at prepared statements and stored procedures (or functions, if you're using a recent version of PostgreSQL).
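The atomicity point can be sketched in a few lines. This uses Python's sqlite3 for brevity, but the pattern is identical with SQL Server or PostgreSQL drivers: both updates commit together or not at all, with no need to pack several statements into one call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

# One transaction: commits on success, rolls back automatically on error.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")

balances = [r[0] for r in conn.execute("SELECT balance FROM accounts ORDER BY id")]
print(balances)  # [50, 50]
```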
If we have a prepared statement like:
SELECT my_func($1::text, $2::int)
Is there any gain in speed if I prepare a statement with this call and execute the call via the prepared statement?
Let me quote the docs here:
Prepared statements have the largest performance advantage when a single session is being used to execute a large number of similar statements. The performance difference will be particularly significant if the statements are complex to plan or rewrite, for example, if the query involves a join of many tables or requires the application of several rules. If the statement is relatively simple to plan and rewrite but relatively expensive to execute, the performance advantage of prepared statements will be less noticeable.
Emphasis mine. I think it clearly states the conditions under which PREPARE can be beneficial.
Still, most languages provide a native way to prepare statements (PHP, for example), so the overall machinery is executed for you behind the scenes.
To make it short:
if it is a one-timer from the client, execute it directly;
if it comes from the application and involves user input, use your platform and its functionality to prepare, for security reasons;
if the statement is executed many times within a session, use any means (either PREPARE or your platform's functionality) to prepare, for performance reasons.
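The "executed many times within a session" case can be sketched as follows (Python's sqlite3 here, which reuses the compiled statement when the same SQL text is submitted repeatedly, much like PREPARE/EXECUTE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x INTEGER, y INTEGER)")

# One SQL template, many parameter sets: the statement is parsed once
# and re-executed with different bind values each time.
sql = "INSERT INTO points VALUES (?, ?)"
conn.executemany(sql, [(i, i * i) for i in range(1000)])

count = conn.execute("SELECT COUNT(*) FROM points").fetchone()[0]
print(count)  # 1000
```

The per-execution cost is then just binding and running, with parsing and planning paid only once.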
My understanding is that a prepared statement is compiled on the server once, thus saving the overhead of repeated parsing, optimization, etc. Apparently, I should always prefer prepared statements for queries that run more than once.
Are there any cons to this approach?
I am using ODBC (libodbc++) from C++ to MySQL.
Prepared Statements:
Why use prepared statements?
There are numerous advantages to using prepared statements in your applications, both for security and performance reasons.

Prepared statements can help increase security by separating SQL logic from the data being supplied. This separation of logic and data can help prevent a very common type of vulnerability called an SQL injection attack. Normally when you are dealing with an ad hoc query, you need to be very careful when handling the data that you received from the user. This entails using functions that escape all of the necessary trouble characters, such as the single quote, double quote, and backslash characters. This is unnecessary when dealing with prepared statements. The separation of the data allows MySQL to automatically take into account these characters and they do not need to be escaped using any special function.

The increase in performance in prepared statements can come from a few different features. First is the need to only parse the query a single time. When you initially prepare the statement, MySQL will parse the statement to check the syntax and set up the query to be run. Then if you execute the query many times, it will no longer have that overhead. This pre-parsing can lead to a speed increase if you need to run the same query many times, such as when doing many INSERT statements.

(Note: While it will not happen with MySQL 4.1, future versions will also cache the execution plan for prepared statements, eliminating another bit of overhead you currently pay for each query execution.)

The second place where performance may increase is through the use of the new binary protocol that prepared statements can use. The traditional protocol in MySQL always converts everything into strings before sending them across the network. This means that the client converts the data into strings, which are often larger than the original data, sends it over the network (or other transport) to the server, which finally decodes the string into the correct datatype. The binary protocol removes this conversion overhead. All types are sent in a native binary form, which saves the conversion CPU usage, and can also cut down on network usage.

When should you use prepared statements? Prepared statements can be useful for all of the above reasons, however they should not (and can not) be used for everything in your application. First off, the type of queries that they work on is limited to DML (INSERT, REPLACE, UPDATE, and DELETE), CREATE TABLE, and SELECT queries. Support for additional query types will be added in further versions, to make the prepared statements API more general.

Sometimes prepared statements can actually be slower than regular queries. The reason for this is that there are two round-trips to the server, which can slow down simple queries that are only executed a single time. In cases like that, one has to decide if it is worth trading off the performance impact of this extra round-trip in order to gain the security benefits of using prepared statements.
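The separation of logic and data described above can be demonstrated in a few lines (a sketch using Python's sqlite3; the same idea applies to MySQL's prepared-statement API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Hostile input full of quote characters is passed as a bound value,
# so the database treats it strictly as data, never as SQL text.
hostile = "alice' OR '1'='1"
rows = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()
print(rows)  # [] -- the injection attempt matches nothing
```

No escaping function is involved anywhere; the placeholder mechanism makes the quote characters inert.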
Almost always.
http://use-the-index-luke.com/sql/where-clause/bind-parameters
Larger numbers of active prepared statements consume additional server memory. This can be an issue on embedded platforms, for example (e.g., an SQLite database on an iPhone).
You should always prefer working with prepared statements for the security benefits. They all but eliminate vulnerability to SQL injection, without you having to worry about SQL-escaping values.
If you have a query that doesn't run often, though (less than once per request), a prepared statement can take longer to run. It takes two calls to use a prepared statement: once to prepare it, and once to execute it. With an ad-hoc statement, those two steps are done in one fell swoop, and there's no waiting for the server to say "ok, done compiling".
The upshot of all that being, if you're worried about performance, and your query only runs once, an ad-hoc query might be a little faster. But the security benefits almost always outweigh the extra little bit of time it takes to prepare a statement.
What is considered best practice for executing recurring SQL queries? My understanding is to use a parameterized query and turn it into a prepared statement upon first execution. What if this query needs to be executed by multiple threads? Will I need to create a prepared statement for each type of query for each thread?
Or is the parsing of SQL statements so efficient nowadays that prepared statements are no longer necessary?
Good question - answered one bit at a time.
What is considered best practice for executing recurring SQL queries?
If the query will be repeated apart from differences in the parameters, then use prepared statements.
My understanding is to use a parameterized query and turn it into a prepared statement upon first execution.
That is my opinion on what should be done. Classically, the advice was to prepare all queries when the program started. In my view, this was always nonsense; it overloads the server with queries, many of which will not be used in any given run, wasting memory in both client and DBMS. It was always most sensible to prepare statements on demand: when first needed, and not unless needed. I'd allow an exception for statements that will 'always' be executed, but I'd have to be convinced that 'always' was really close to 100% of the time.
What if this query needs to be executed by multiple threads? Will I need to create a prepared statement for each type of query for each thread?
That depends on how the different threads communicate with the DBMS. In the DBMS with which I'm familiar, if there is a single connection that the threads all share, then you only need to prepare it once for the single connection. If each thread has its own separate connection, then you need to prepare the statement for each thread separately.
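The one-connection-per-thread case can be sketched like this (Python's sqlite3, which by default actually requires each connection to stay within the thread that created it, so each thread prepares its statement independently):

```python
import sqlite3
import threading

results = {}

def worker(thread_id):
    # Each thread opens its own connection; the parameterized statement
    # is therefore prepared separately per connection, as described above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
    results[thread_id] = conn.execute("SELECT SUM(n) FROM t").fetchone()[0]
    conn.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # each thread computed 45 on its own connection
```

With a shared-connection DBMS driver, the prepare step would instead happen once and the threads would coordinate access to the single handle.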
Or is the parsing of SQL statements so efficient nowadays that prepared statements are no longer necessary?
Machines are fast - yes. And for non-repeated statements, it is not worth worrying about the overhead. But if you are going to execute the query a few million times, then the cost of preparing it a few million times begins to add up. Also, database server machines are usually shared resources, but the statement is likely to be prepared separately for each user, so if you have multiple users hammering the system with repeated queries that are discarded, the server will be too busy preparing queries to execute any of them fast.
So, my answer is "No". Prepared statements are still beneficial when the queries will be repeated often enough - where 'often enough' is probably not all that often. Hundreds of times - use prepared statements. Tens of times - probably use prepared statements. Less than that - maybe do not use prepared statements.
Well, you didn't mention the environment you're using, but in general you can also consider stored procedures (if your DB engine supports them). They have the benefit of building an additional abstraction layer in the database itself, thus making the exact database schema less relevant to client applications.
Using parameterized queries is encouraged most of the time, not only for the sake of performance, but also for security against SQL injection and to prevent data type conversion issues (e.g., localized date-time formats).
I understand that the WITH RECOMPILE option forces the optimizer to rebuild the query plan for stored procs but when would you want that to happen?
What are some rules of thumb on when to use the WITH RECOMPILE option and when not to?
What's the effective overhead associated with just putting it on every sproc?
As others have said, you don't want to simply include WITH RECOMPILE in every stored proc as a matter of habit. By doing so, you'd be eliminating one of the primary benefits of stored procedures: the fact that it saves the query plan.
Why is that potentially a big deal? Computing a query plan is a lot more intensive than compiling regular procedural code. Because the syntax of a SQL statement only specifies what you want, and not (generally) how to get it, that allows the database a wide degree of flexibility when creating the physical plan (that is, the step-by-step instructions to actually gather and modify data). There are lots of "tricks" the database query pre-processor can do and choices it can make - what order to join the tables, which indexes to use, whether to apply WHERE clauses before or after joins, etc.
For a simple SELECT statement, it might not make a difference, but for any non-trivial query, the database is going to spend some serious time (measured in milliseconds, as opposed to the usual microseconds) to come up with an optimal plan. For really complex queries, it can't even guarantee an optimal plan, it has to just use heuristics to come up with a pretty good plan. So by forcing it to recompile every time, you're telling it that it has to go through that process over and over again, even if the plan it got before was perfectly good.
Depending on the vendor, there should be automatic triggers for recompiling query plans - for example, if the statistics on a table change significantly (like, the histogram of values in a certain column starts out evenly distributed but over time becomes highly skewed), then the DB should notice that and recompile the plan. But generally speaking, the implementers of a database are going to be smarter about that on the whole than you are.
As with anything performance related, don't take shots in the dark; figure out where the bottlenecks are that are costing 90% of your performance, and solve them first.
Putting it on every stored procedure is NOT a good idea, because compiling a query plan is a relatively expensive operation and you will not see any benefit from the query plans being cached and re-used.
The case of a dynamic where clause built up inside a stored procedure can be handled using sp_executesql to execute the TSQL rather than adding WITH RECOMPILE to the stored procedure.
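The same idea, keeping a dynamically built WHERE clause fully parameterized, can be sketched client-side (Python's sqlite3 here for illustration; `find_products` is a hypothetical helper, and sp_executesql plays the analogous role inside T-SQL):

```python
import sqlite3

def find_products(conn, name=None, max_price=None):
    # Only the SQL *structure* is built dynamically; every value stays
    # a bind parameter, so each clause shape can get its own cached plan.
    clauses, params = [], []
    if name is not None:
        clauses.append("name = ?")
        params.append(name)
    if max_price is not None:
        clauses.append("price <= ?")
        params.append(max_price)
    sql = "SELECT name, price FROM products"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY price"
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("pen", 2), ("book", 15), ("lamp", 30)])
print(find_products(conn, max_price=20))  # [('pen', 2), ('book', 15)]
```

The key point is that the dynamic part never contains user-supplied values, only placeholders.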
Another solution (SQL Server 2005 onwards) is to hint with specific parameter values using the OPTIMIZE FOR hint. This works well if the values in the rows are static.
SQL Server 2008 introduced a little-known feature called OPTIMIZE FOR UNKNOWN:
This hint directs the query optimizer to use the standard algorithms it has always used if no parameter values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the query plan should be, instead of looking at the specific parameter values that were passed to the query by the application.
Generally, a much better alternative to WITH RECOMPILE is OPTION (RECOMPILE), as you can see in the explanation below, taken from the answer to this question:
When a parameter-sensitivity problem is encountered, a common piece of advice on forums and Q&A sites is to "use recompile" (assuming the other tuning options presented earlier are unsuitable). Unfortunately, that advice is often misinterpreted to mean adding the WITH RECOMPILE option to the stored procedure.

Using WITH RECOMPILE effectively returns us to SQL Server 2000 behaviour, where the entire stored procedure is recompiled on every execution. A better alternative, on SQL Server 2005 and later, is to use the OPTION (RECOMPILE) query hint on just the statement that suffers from the parameter-sniffing problem. This query hint results in a recompilation of the problematic statement only; execution plans for other statements within the stored procedure are cached and reused as normal.

Using WITH RECOMPILE also means the compiled plan for the stored procedure is not cached. As a result, no performance information is maintained in DMVs such as sys.dm_exec_query_stats. Using the query hint instead means that a compiled plan can be cached, and performance information is available in the DMVs (though it is limited to the most recent execution, for the affected statement only).

For instances running at least SQL Server 2008 build 2746 (Service Pack 1 with Cumulative Update 5), using OPTION (RECOMPILE) has another significant advantage over WITH RECOMPILE: only OPTION (RECOMPILE) enables the Parameter Embedding Optimization.
The most common use is when you might have a dynamic WHERE clause in a procedure...you wouldn't want that particular query plan to get compiled and saved for subsequent executions because it very well might not be the exact same clause the next time the procedure is called.
It should only be used when testing with representative data and context demonstrates that doing without it produces invalid query plans (whatever the possible reasons might be). Don't assume beforehand (without testing) that an SP won't optimize properly.
Sole exception, for manual invocation only (i.e., don't code it into the SP): when you know that you've substantially altered the character of the target tables, e.g. TRUNCATE, bulk loads, etc.
It's yet another opportunity for premature optimization.