I am using SQL Server and PostgreSQL. Usually, when I need to run multiple commands, I open a new connection and run one command for each query task (inserting 1000 rows, for example).
Is it bad to run multiple queries in a single command (queries separated by semicolons) instead of the approach above?
Being able to run multiple statements inside a single call makes an SQL injection far more damaging. Any query built from untrusted input, even a simple SELECT, becomes dangerous if somebody can append an UPDATE or DELETE statement to it. That's why many client libraries (especially in PHP) simply disable the ability to submit several statements in one call.
On the other hand, as far as I know, there's almost no valid reason to do so. One usually keeps the connection open, so the overhead of each call is negligible.
If what you seek is actually atomicity, then you want to use "transactions" instead.
If you are worried about the complexity of your queries and don't want them to be re-parsed each time, take a look at "prepared statements" and stored procedures (or "functions" if you're using a recent version of PostgreSQL).
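A minimal sketch of those two suggestions, assuming Python's built-in sqlite3 module as a stand-in for SQL Server or PostgreSQL: one open connection, one parameterized statement executed for every row, and a transaction so the batch is atomic. The `items` table and the `insert_rows` helper are hypothetical names used only for illustration.

```python
import sqlite3

def insert_rows(conn, rows):
    """Insert all rows atomically using one parameterized statement."""
    # 'with conn' commits on success and rolls back if an exception is raised.
    with conn:
        conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")  # assumption: in-memory demo database
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
insert_rows(conn, [(i, f"item-{i}") for i in range(1000)])
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

If any row fails (for example on a duplicate key), the whole batch is rolled back, which is the atomicity that concatenating statements into one string does not give you by itself.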
I'm making a simple web interface to allow execution of queries against a database (yeah, I know it's a really bad practice, but it's a private website used only by a few trusted users who currently use a DB manager directly to execute these queries, so the web interface is only there to automate the process).
The thing is that, for safety, whenever an UPDATE query is detected I want to first execute a SELECT statement "equivalent" to the update (keeping the WHERE clause) to find out how many records are going to be affected before executing the UPDATE.
The idea is to replace "UPDATE" with "SELECT * FROM" and remove the whole "SET" clause without removing the "WHERE".
I'm trying to replace UPDATE\s*(.*?)\s*SET.*(\s*WHERE\s*.*) with SELECT * FROM \1 \2 and similar patterns, but I'm having trouble when there is no "WHERE" clause (uncommon, but possible).
edit: It's pretty hard to explain why I need this to be done like this, but I do. I know about stored procedures, query builders, transactions, etc., but they are not what I need in my case.
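One way to make the WHERE clause optional is to put it in an optional group and match the SET clause lazily. The sketch below is deliberately naive: it assumes a single-table UPDATE without an alias and no literal " WHERE " inside the SET clause, and it emits SELECT COUNT(*) rather than SELECT * because the goal stated above is a row count. The name `update_to_count` is made up for this example.

```python
import re

# Naive pattern: assumes a single-table UPDATE with no alias, and no
# literal " WHERE " inside the SET clause (e.g. inside a quoted string).
_UPDATE_RE = re.compile(
    r"^UPDATE\s+(\S+)\s+SET\s+.*?(?:\s+(WHERE\s+.*))?$",
    re.IGNORECASE | re.DOTALL,
)

def update_to_count(sql):
    """Return a SELECT COUNT(*) with the same table and WHERE as the UPDATE."""
    m = _UPDATE_RE.match(sql.strip().rstrip(";").rstrip())
    if m is None:
        raise ValueError("not a simple UPDATE statement")
    table, where = m.group(1), m.group(2)
    return f"SELECT COUNT(*) FROM {table}" + (f" {where}" if where else "")
```

A real SQL parser would be more robust; a regex like this will misread quoted strings, subqueries and vendor-specific syntax such as UPDATE ... FROM.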
You should fix your design. There is nothing wrong with users updating data in a database. The question is how they do it. My strong suggestion is to wrap the update statements in stored procedures. Then only allow updates through the stored procedures.
There are several main reasons why I prefer this approach:
I think a well-designed API produces more stable and maintainable code.
The stored procedures control security.
The stored procedures allow better logging of what is happening in the database.
The stored procedures provide control over what users can do.
In your case, though, they offer another advantage. Because all the update code is on the database-side, you know what the update statements look like. So, you can then decide how you want to get the "pre-counts" (which is what I assume you are looking for).
EDIT:
There is also an important flaw in your design (as you describe it). The data might change between the select and the update. If you use stored procedures, there are ways to address this. For instance, you can wrap the operations in a transaction. Or you can use a SELECT to get the rows to be updated, lock those rows (depending on the database), and only do the update on those rows.
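A rough sketch of the transaction idea, using Python's sqlite3 module as a stand-in for the real database. Instead of a separate SELECT beforehand, this variant runs the UPDATE inside a transaction, reads the affected-row count, and only commits if the count is acceptable, so there is no gap in which the data can change. The `guarded_update` helper, the `users` table and the `max_rows` limit are assumptions made for the example.

```python
import sqlite3

def guarded_update(conn, sql, params, max_rows):
    """Run an UPDATE, but keep it only if it touched at most max_rows rows."""
    cur = conn.execute(sql, params)   # sqlite3 opens a transaction implicitly
    affected = cur.rowcount           # rows changed by this statement
    if affected > max_rows:
        conn.rollback()               # undo the change before anyone sees it
        return affected, False
    conn.commit()
    return affected, True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, 1)", [(1,), (2,), (3,)])
conn.commit()

result = guarded_update(conn, "UPDATE users SET active = 0 WHERE id > ?", (0,), 2)
still_active = conn.execute(
    "SELECT COUNT(*) FROM users WHERE active = 1"
).fetchone()[0]
```

In an interactive tool the rollback/commit decision could be handed to the user instead of a fixed limit, at the cost of holding the transaction open while they decide.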
My understanding is that a prepared statement is compiled on the server once, thus saving the overhead of repeating parsing, optimization etc. Apparently, I should always prefer using prepared statements for queries that run more than once.
Are there any cons to this approach?
I am using ODBC (libodbc++) from C++ to MySQL.
Prepared Statements:
Why use prepared statements?
There are numerous advantages to using prepared statements in your applications, both for security and performance reasons.
Prepared statements can help increase security by separating SQL logic from the data being supplied. This separation of logic and data can help prevent a very common type of vulnerability called an SQL injection attack. Normally when you are dealing with an ad hoc query, you need to be very careful when handling the data that you received from the user. This entails using functions that escape all of the necessary trouble characters, such as the single quote, double quote, and backslash characters. This is unnecessary when dealing with prepared statements. The separation of the data allows MySQL to automatically take into account these characters and they do not need to be escaped using any special function.
The increase in performance in prepared statements can come from a few different features. First is the need to only parse the query a single time. When you initially prepare the statement, MySQL will parse the statement to check the syntax and set up the query to be run. Then if you execute the query many times, it will no longer have that overhead. This pre-parsing can lead to a speed increase if you need to run the same query many times, such as when doing many INSERT statements.
(Note: While it will not happen with MySQL 4.1, future versions will also cache the execution plan for prepared statements, eliminating another bit of overhead you currently pay for each query execution.)
The second place where performance may increase is through the use of the new binary protocol that prepared statements can use. The traditional protocol in MySQL always converts everything into strings before sending them across the network. This means that the client converts the data into strings, which are often larger than the original data, sends it over the network (or other transport) to the server, which finally decodes the string into the correct datatype. The binary protocol removes this conversion overhead. All types are sent in a native binary form, which saves the conversion CPU usage, and can also cut down on network usage.
When should you use prepared statements?
Prepared statements can be useful for all of the above reasons, however they should not (and can not) be used for everything in your application. First off, the type of queries that they work on is limited to DML (INSERT, REPLACE, UPDATE, and DELETE), CREATE TABLE, and SELECT queries. Support for additional query types will be added in further versions, to make the prepared statements API more general.
-> Sometimes prepared statements can actually be slower than regular queries. The reason for this is that there are two round-trips to the server, which can slow down simple queries that are only executed a single time. In cases like that, one has to decide if it is worth trading off the performance impact of this extra round-trip in order to gain the security benefits of using prepared statements.
almost always.
http://use-the-index-luke.com/sql/where-clause/bind-parameters
Large numbers of active prepared statements consume additional server memory. This can be an issue on embedded platforms, for example (e.g. an SQLite database on an iPhone).
You should always prefer working with prepared statements for the security benefits. They all but eliminate vulnerability to SQL injection, without you having to worry about SQL-escaping values.
If you have a query that doesn't run often, though (less than once per request), a prepared statement can take longer to run. It takes two calls to use a prepared statement: once to prepare it, and once to execute it. With an ad-hoc statement, those two steps are done in one fell swoop, and there's no waiting for the server to say "ok, done compiling".
The upshot of all that being, if you're worried about performance, and your query only runs once, an ad-hoc query might be a little faster. But the security benefits almost always outweigh the extra little bit of time it takes to prepare a statement.
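To illustrate the security point, here is a small sketch using Python's sqlite3 module (an assumption; the same idea applies to ODBC parameter binding). The hostile string is a made-up example of attacker-controlled input. Because the value travels separately from the SQL text, it ends up stored as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

hostile = "x'); DROP TABLE users; --"   # made-up attacker-controlled input

# The value is sent separately from the SQL text, so it is stored as data
# and never parsed as part of the statement.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]
```

No escaping function is involved anywhere; the placeholder does the work.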
What is considered best practice for executing recurring SQL queries? My understanding is to use a parameterized query and turn it into a prepared statement upon first execution. What if this query needs to be executed by multiple threads? Will I need to create a prepared statement for each type of query for each thread?
Or is the parsing of SQL statements so efficient nowadays that prepared statements are no longer necessary?
Good question - answered one bit at a time.
What is considered best practice for executing recurring SQL queries?
If the query will be repeated apart from differences in the parameters, then use prepared statements.
My understanding is to use a parameterized query and turn it into a prepared statement upon first execution.
That is my opinion on what should be done. Classically, the advice was to prepare all queries when the program started. In my view, this was always nonsense; it overloads the server with queries, many of which will not be used in any given run, wasting memory in both client and DBMS. It was always most sensible to prepare a statement on demand: when it is first needed, and not unless it is needed. I'd allow an exception for statements that will 'always' be executed - but I'd have to be convinced that 'always' was really close to 100% of the time.
What if this query needs to be executed by multiple threads? Will I need to create a prepared statement for each type of query for each thread?
That depends on how the different threads communicate with the DBMS. In the DBMS with which I'm familiar, if there is a single connection that the threads all share, then you only need to prepare it once for the single connection. If each thread has its own separate connection, then you need to prepare the statement for each thread separately.
Or is the parsing of SQL statements so efficient nowadays that prepared statements are no longer necessary?
Machines are fast - yes. And for non-repeated statements, it is not worth worrying about the overhead. But if you are going to execute the query a few million times, then the cost of preparing it a few million times begins to add up. Also, database server machines are usually shared resources, but the statement is likely to be prepared separately for each user, so if you have multiple users hammering the system with repeated queries that are discarded, the server will be too busy preparing queries to execute any of them fast.
So, my answer is "No". Prepared statements are still beneficial when the queries will be repeated often enough - where 'often enough' is probably not all that often. Hundreds of times - use prepared statements. Tens of times - probably use prepared statements. Less than that - maybe do not use prepared statements.
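The "prepare on demand, once per connection" idea can be sketched as a small cache. This is only an illustration: `LazyStatements` is a made-up class, and `prepare` is a hypothetical callable standing in for whatever your driver uses to prepare a statement on a given connection. The demo uses a fake prepare function and string placeholders for connections so that the number of prepare calls can be counted.

```python
class LazyStatements:
    """Prepare each statement on first use, once per connection."""

    def __init__(self, prepare):
        # 'prepare' is a hypothetical callable: (connection, sql) -> handle.
        self._prepare = prepare
        self._cache = {}   # (connection, sql) -> prepared handle

    def get(self, connection, sql):
        key = (connection, sql)
        if key not in self._cache:
            self._cache[key] = self._prepare(connection, sql)
        return self._cache[key]

calls = []

def fake_prepare(connection, sql):
    # Stand-in for a real driver call; records how often it is invoked.
    calls.append((connection, sql))
    return f"handle:{connection}:{sql}"

stmts = LazyStatements(fake_prepare)
for _ in range(100):
    stmts.get("conn-A", "SELECT 1")   # prepared once for conn-A
stmts.get("conn-B", "SELECT 1")       # separate connection: prepared again
```

The cache is not thread-safe as written and it keeps connections alive for as long as it holds them, so a real version would need locking (if shared between threads) and eviction when a connection closes.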
Well, you didn't mention the environment you're using but in general, you can also consider stored procedures (if your DB engine supports it). It has the benefit of building an additional abstraction layer in the database itself thus making the exact database schema less relevant to client applications.
Using parameterized queries is encouraged most of the time, not only for the sake of performance, but for the security against SQL injection and preventing data type conversion issues (localized date time).
I've been reading a lot about prepared statements and in everything I've read, no one talks about the downsides of using them. Therefore, I'm wondering if there are any "there be dragons" spots that people tend to overlook?
A prepared statement is just a parsed and precompiled SQL statement that waits for its bound variables to be provided before it is executed.
Any executed statement becomes prepared sooner or later (it needs to be parsed, optimized, compiled and then executed).
A prepared statement just reuses the results of parsing, optimization and compilation.
Usually database systems use some kind of optimization to save some time on query preparation even if you don't use prepared queries yourself.
Oracle, for instance, when parsing a query first checks the library cache, and if the same statement had already been parsed, it uses the cached execution plan instead.
If you use a statement only once, or if you automatically generate dynamic SQL statements (and either properly escape everything or know for certain your parameters contain only safe characters), then you should not use prepared statements.
There is one other small issue with prepared statements vs dynamic sql, and that is that it can be harder to debug them. With dynamic sql, you can always just write out a problem query to a log file and run it directly on the server exactly as your program sees it. With prepared statements it can take a little more work to test your query with a specific set of parameters determined from crash data. But not that much more, and the extra security definitely justifies the cost.
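One low-tech way to close that debugging gap is to log the SQL template together with the bound values at the point of execution. The sketch below assumes Python's sqlite3 module; `execute_logged` and the `orders` table are hypothetical names for illustration.

```python
import sqlite3

def execute_logged(conn, sql, params, log):
    """Record the SQL template and its bound values, then execute."""
    log.append((sql, tuple(params)))
    return conn.execute(sql, params)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")

log = []
execute_logged(conn, "INSERT INTO orders VALUES (?, ?)", (7, "open"), log)
rows = execute_logged(
    conn, "SELECT status FROM orders WHERE id = ?", (7,), log
).fetchall()
```

With the template and values side by side in the log, a failing statement can be replayed by hand with the exact parameters the program used.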
In some situations, the database engine might come up with an inferior query plan when using a prepared statement (because it can't make the right assumptions without having the actual bind values for a search).
See e.g. the "Notes" section at
http://www.postgresql.org/docs/current/static/sql-prepare.html
So it might be worth testing your queries with and without prepared statements to find out which is faster. Ideally, you would then decide on a per-statement basis whether to use prepared statements or not, although not all ORMs will allow you to do that.
The only downside that I can think of is that they take up memory on the server. It's not much, and while there are probably some edge cases where it would be a problem, I'm hard pressed to think of any.
Here's an argument for SPs that I haven't heard. Flamers, be gentle with the down tick,
Since there is overhead associated with each trip to the database server, I would suggest that a POSSIBLE reason for placing your SQL in SPs over embedded code is that you are more insulated to change without taking a performance hit.
For example. Let's say you need to perform Query A that returns a scalar integer.
Then, later, the requirements change and you decide that if the result of the scalar query is > x, then, and only then, you need to perform another query. If you performed the first query in a SP, you could easily check the result of the first query and conditionally execute the second query in the same SP.
How would you do this efficiently in embedded SQL without performing a separate query or an unnecessary query?
Here's an example:
-- This SP may run one or two queries.
DECLARE @CustCount INT;
SELECT @CustCount = COUNT(*) FROM CUSTOMER;
IF @CustCount > 10
    SELECT * FROM PRODUCT;
Can this be done in embedded SQL, and if so, what is the best way to do it?
A very persuasive article
SQL and stored procedures will be there for the duration of your data.
Client languages come and go, and you'll have to re-implement your embedded SQL every time.
In the example you provide, the time saved is sending a single scalar value and a single follow-up query over the wire. This is insignificant in any reasonable scenario. That's not to say there might not be other valid performance reasons to use SPs; just that this isn't such a reason.
I would generally never put business logic in SPs; I like it to be in my native language of choice, outside the database. The only time I agree SPs are better is when there is a lot of data movement that doesn't need to come out of the DB.
So to answer your question, I'd rather have two queries in my code than embed that in a SP. In my view, I am trading a small performance hit for something a lot clearer.
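For comparison, the "two queries in my code" version might look like the sketch below. It assumes Python's sqlite3 module and made-up `customer` and `product` tables mirroring the example in the question; `products_if_many_customers` is a hypothetical helper name.

```python
import sqlite3

def products_if_many_customers(conn, threshold=10):
    """Run the second query only when the first one exceeds the threshold."""
    (cust_count,) = conn.execute("SELECT COUNT(*) FROM customer").fetchone()
    if cust_count > threshold:
        return conn.execute("SELECT * FROM product").fetchall()
    return []

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER)")
conn.execute("CREATE TABLE product (name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)", [(i,) for i in range(11)])
conn.executemany("INSERT INTO product VALUES (?)", [("widget",), ("gadget",)])

products = products_if_many_customers(conn)          # 11 customers > 10
none_found = products_if_many_customers(conn, 11)    # 11 is not > 11
```

The cost is one extra round trip when the threshold is exceeded; the conditional logic stays in application code where it is easy to read and test.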
How would you do this efficiently in embedded SQL w/o perform a separate query or an unnecessary query?
Depends on the database you are using. In SQL Server, this is a simple CASE statement.
Perhaps include the WHERE clause in that sproc:
WHERE (all your regular conditions)
AND myScalar > myThreshold
Lately I prefer not to use SPs (except when extreme complexity arises where a proc would just be better, or CLR would be better). I have been using the Repository pattern with LINQ to SQL, where my query is written in my data layer as a strongly typed LINQ expression. The key here is that the query is strongly typed, which means that when I refactor I am refactoring properties of a class that is directly generated from the database table (which makes carrying changes forward from the DB super easy and accurate). While my SQL is generated for me and sent to the server, I still have the option of sticking to DRY principles, as the repository pattern allows me to break things down into their smallest components. I do have the issue that I might make a trip to the server and, based on the results of the query, find that I need to make another trip to the server. I don't worry about this up front. If I find later that it becomes an issue, then I may refactor that code into something more performant. The overall key here is that there is no one magic bullet. I tend to work on greenfield applications, which allows this method of development to be most efficient for me.
Benefits of SPs:
Performance (they are precompiled)
Easy to change (without recompiling the application)
SQL's set-based features make really difficult data tasks very easy
Drawbacks:
Depend heavily on the database engine used
Makes deployment of upgrades a little harder (you have to deploy the App + the scripts)
My 2 cents...
About your example, it can be done like this:
select * from products where (select count(*) from customers) > 10
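A quick way to check the single-query form is to run it against a throwaway database. The sketch below assumes Python's sqlite3 module and made-up `customers` and `products` tables; the threshold is passed as a bound parameter so both outcomes can be shown with the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER)")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [(i,) for i in range(11)])
conn.execute("INSERT INTO products VALUES ('widget')")

# The scalar subquery is compared against the threshold; when the comparison
# is false, the outer query returns no rows at all.
QUERY = "SELECT * FROM products WHERE (SELECT COUNT(*) FROM customers) > ?"
rows_over_10 = conn.execute(QUERY, (10,)).fetchall()   # 11 customers > 10
rows_over_20 = conn.execute(QUERY, (20,)).fetchall()   # 11 customers <= 20
```

This keeps everything in one round trip, at the price of hiding the condition inside the WHERE clause.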