Dynamic SQL in production systems - sql

SO has plenty of Q&As about how to accomplish various tasks using dynamic SQL, frequently the responses are accompanied with warnings and disclaimers about the advisability of actually using the approach provided. I've worked in environments ranging from "knowing how to use a cursor is a bad sign" to "isn't sp_executesql neat!".
When it comes to production systems should dynamic sql always be avoided or should it have a valid place in the programming toolbox. If not/so, Why?

Answers to some of your questions can be found here The Curse and Blessings of Dynamic SQL
Sometimes you have to use dynamic SQL because it will perform better

It depends on what you mean by dynamic sql. Queries built where parameter values substituted directly into the sql string are dangerous and should be avoided except for the very rare case where there is no other option.
Dynamic SQL using the appropriate parameterized query mechanism supported by your platform can be very powerful.

There are several considerations when evaluating the use of dynamic SQL:
performance can vary widely, and specific to which database engine is targeted
maintenance can vary widely, in both directions (e.g., dynamic SQL can employ
modularization techniques that static SQL cannot)
dynamic SQL is vulnerable to SQL injection attacks, but static SQL is (mostly) not
sometimes your needs make static SQL (nearly) impossible, but that should be rare
Remember, SQL is code, and an RDBMS is another API. It is NOT special, or at least not much; deal with it like you would any other code and API. In particular, don't just code directly against the API: modularize your code and write some helper methods to make it easier and reusable.

The cost of 'dynamic SQL' varies between DBMS.
In IBM DB2, where query plans can be pre-compiled down to machine code, the cost of Dynamic SQL is significant.
In IBM Informix (IDS - Informix Dynamic Server), most queries are effectively 'dynamic SQL' (the exception is queries in stored procedures) in that the SQL is interpreted at run-time, even though the client-side notation uses static SQL.
I believe - though a DB2 expert might be able to contradict me - that the CLI (ODBC, aka CCC) and JCC (JDBC) systems for DB2 do all SQL as dynamic SQL.
I don't know what Oracle, Sybase, MS SQL Server do - my suspicion is that they hew closer to the line adopted by IDS than the line adopted by DB2. For MySQL and PostgreSQL, I'd be surprised if they did not behave more like IDS than DB2.
As a result, with IDS, there is no particular overhead to using Dynamic SQL; at the server level, your SQL is dynamic anyway. Other DBMS may have other factors at play.
One issue for all servers is 'how do you identify the pre-compiled query from the SQL sent over the wire'. With DB2, the pre-compilers identify the package which is used, and the communications protocol between application and server identifies that. My understanding is that the DB2 clients such as ODBC and JDBC do not use a pre-compiled package - hence I think that they effectively do dynamic SQL all the time.
Beware SQL injection with Dynamic SQL!

My previous shop would never allow such things to execute against the database (SQL Server).
It was prohibited as practice, and the DB was locked down to prevent it.
All work goes through objects (SPs, etc).
This is the right way to do it always, IMHO.

There are edge cases where dynamic SQL is both easier and faster than the alternative. As long as you keep them few and far between and they are dynamically generated prepared parameterized SQL I see no big problem with them.

Dynamic SQL has a place - even in production code. Executing arbitrary code with security holes does not.
In general, I would avoid using dynamic SQL if there is not a good reason to use it. Dynamic SQL does not get checked until it runs, so you obviously have a bigger testing burden. However, there are plenty of good times to use it when dealing with administrative tasks, static code-generation, accommodating changing systems without excessive maintenance based on querying metadata, using DRY to avoid redundancy, etc.

Our system has what I think is a lot of dynamic SQL in it, mostly because we have a lot of dynamically created objects (tables, indexes, views mostly). A lot of this is legacy; things like partitioned tables in 2k5 help somewhat in some of our use cases. But as mentioned you need to make a good case for it; on-the-fly SQL inside the procs usually has a better (static) solution.

Related

Are there significant performance benefits from changing involved queries to stored procedures?

Obvious security benefits aside, are there significant performance boosts yielded from modifying involved queries to stored procedures in a SAP HANA database?
If so, are there metrics I can use to gauge perceived benefits?
SQL Server 2005 onward, all SQL statements, irrespective of whether it’s a SQL coming from inline code or stored procedure or from anywhere else, they are compiled and cached. So, stored procedures won't give you performance boosts. They do give you better abstraction, security and ease of maintenance.
Read more about it here.
As for SAP HANA, I tried comparing it with Microsoft SQL Server(This article sheds some light on it.) and I cannot definitively say that it does compile and cache inline queries but it most probably should if you're using a recent version.
Read the other solution before reading this one.
Essentially, I ran some tests and swapped out my project's dynamic queries to stored procedures and saw massive performance bumps.
I was reading the SAP HANA Best Practices reference and came across this passage under Performance:
Executing dynamic SQL is slow because compile time checks and query
optimization must be done for every invocation of the procedure.
Another related problem is security because constructing SQL
statements without proper checks of the variables used may harm
security.

Make stored procedures generic

I have to make my stored procedures written in SQL to be generic, so that they can be used in different versions of SQL (also MySQL if possible). I think it can be done if the scripts are written according to ANSI standards. But I have to convert large procedures. There is no tool for direct conversion. Is there any set of rules which can be followed to perform this conversion?
I have found a tool to validate scripts # http://developer.mimer.com
But this will be very time consuming as I have large SP's and I think by using some rule book, this task can be done in a shorter time.
There is no generic stored procedure language.
If you need something to work across database platforms, you would be better to implement the SP functionality in code, using ANSI standard SQL for the database access.

What is the difference between PL/SQL and T-SQL?

All that I know is that the former is Oracle and the latter is SQL Server. I assume some things might be easier in one versus the other but are there certain things I can do in PL that I can't in T?
Are there fundamental differences that I should be aware of? If so, what are they?
T-SQL and PL/SQL are two completely different programming languages with different syntax, type system, variable declarations, built-in functions and procedures, and programming capabilities.
The only thing they have in common is direct embedding of SQL statements, and storage and execution inside a database.
(In Oracle Forms, PL/SQL is even used for client-side code, though integration with database-stored PL/SQL is (almost) seemless)
The only fundamental difference is that PL/SQL is a procedural programming language with embedded SQL, and T-SQL is a set of procedural extensions for SQL used by MS SQL Server. The syntax differences are major. Here are a couple good articles on converting between the two:
http://www.dba-oracle.com/t_convent_sql_server_tsql_oracle_plsql.htm
http://jopinblog.wordpress.com/2007/04/24/oracle-plsql-equivalents-for-ms-sql-server-t-sql-constructs/
They're not necessarily easier, just different - ANSI-SQL is the standard code that's shared between them - the SELECT, WHERE and other query syntax, and T-SQL or PL/SQL is the control flow and advanced coding that's added on top by the database vendor to let you write stored procedures and string queries together.
It's possible to run multiple queries using just ANSI-SQL statements, but if you want to do any IF statements or handle any errors, you're using more than what's in the ANSI spec, so it's either T-SQL or PL/SQL (or whatever the vendor calls it if you're not using Microsoft or Oracle).
One tid bit I can add is take what you know in one and while using the other forget what you know(except for using set based logic when ever possible).
One example of the differences is cursor's are typically considered a less ideal solution in T-SQL unless there is a really good reason to use them which there is often not. In Oracle the cursor's are much more optimized for example they have bulk abilities, that is the ability to work on a set of data much like a normal SQL statement can. So in Oracle using a cursor isn't an instant failed code review where it might in a TSQL code review.
Overall T-SQL is much easier to learn as there's not much to it as far as languages are concerned. PL/SQL is a richer language, and therefore more complicated. It is not a hard language to pick up if have a good book. Overall I really like PLSQL for it's depth and I really like TSQL for it's simplicity.
PL/SQL and T-SQL are extensions for SQL. PL/SQL is used for Oracle databases and T-SQL is used for Microsoft databases.
Here are more useful informations:
http://techdifferences.com/difference-between-t-sql-and-pl-sql.html

Getting data from multiple databases

I am working on an application that will need to communicate with many different applications running on different database platforms. I will know the table schema before runtime but I won't know the database platform (MS SQL 200X, Oracle 9i, 10g, etc, MySQL 4.0.1, 5.x, etc, sybase, etc) until runtime.
It's my understanding that each of these systems have a slightly different dialect. Do I need to use nhibernate to handle the differences when connecting to these systems or can I use ADO.NET and pass raw SQL strings (select * from table)?
If you only need to use ANSI SQL statements, which should be implemented by all of the databases then yes, you can just use ADO.NET.
In my experience the main problem with database-agnostic code is the use of surrogate keys, like sequences or autonumber fields, as all databases implement these differently.
If you do need to use features that differ across databases then I don't think that it is reason enough to go to an object relational mapper like NHibernate - only do that if you have other reasons to do so. You can implement your own handling of syntax differences by generating different SQL for different databases easily enough.
SQL should be standardized for all dbs but they don't all use the same syntax so it really depends on what SQL you're calling. For example, SQL Server uses TOP while Oracle uses rownum. Even if they're all DDL, some syntactically differences between DBMSes can be an issue.
If select * from table is all you want, then there shouldn't be a problem, other than performance hits.

Are Stored Procedures more efficient, in general, than inline statements on modern RDBMS's? [duplicate]

This question already has answers here:
Which is better: Ad hoc queries or stored procedures? [closed]
(22 answers)
Closed 10 years ago.
Conventional wisdom states that stored procedures are always faster. So, since they're always faster, use them ALL THE TIME.
I am pretty sure this is grounded in some historical context where this was once the case. Now, I'm not advocating that Stored Procs are not needed, but I want to know in what cases stored procedures are necessary in modern databases such as MySQL, SQL Server, Oracle, or <Insert_your_DB_here>. Is it overkill to have ALL access through stored procedures?
NOTE that this is a general look at stored procedures not regulated to a specific
DBMS. Some DBMS (and even, different
versions of the same DBMS!) may operate
contrary to this, so you'll want to
double-check with your target DBMS
before assuming all of this still holds.
I've been a Sybase ASE, MySQL, and SQL Server DBA on-and off since for almost a decade (along with application development in C, PHP, PL/SQL, C#.NET, and Ruby). So, I have no particular axe to grind in this (sometimes) holy war.
The historical performance benefit of stored procs have generally been from the following (in no particular order):
Pre-parsed SQL
Pre-generated query execution plan
Reduced network latency
Potential cache benefits
Pre-parsed SQL -- similar benefits to compiled vs. interpreted code, except on a very micro level.
Still an advantage?
Not very noticeable at all on the modern CPU, but if you are sending a single SQL statement that is VERY large eleventy-billion times a second, the parsing overhead can add up.
Pre-generated query execution plan.
If you have many JOINs the permutations can grow quite unmanageable (modern optimizers have limits and cut-offs for performance reasons). It is not unknown for very complicated SQL to have distinct, measurable (I've seen a complicated query take 10+ seconds just to generate a plan, before we tweaked the DBMS) latencies due to the optimizer trying to figure out the "near best" execution plan. Stored procedures will, generally, store this in memory so you can avoid this overhead.
Still an advantage?
Most DBMS' (the latest editions) will cache the query plans for INDIVIDUAL SQL statements, greatly reducing the performance differential between stored procs and ad hoc SQL. There are some caveats and cases in which this isn't the case, so you'll need to test on your target DBMS.
Also, more and more DBMS allow you to provide optimizer path plans (abstract query plans) to significantly reduce optimization time (for both ad hoc and stored procedure SQL!!).
WARNING Cached query plans are not a performance panacea. Occasionally the query plan that is generated is sub-optimal.
For example, if you send SELECT *
FROM table WHERE id BETWEEN 1 AND
99999999, the DBMS may select a
full-table scan instead of an index
scan because you're grabbing every row
in the table (so sayeth the
statistics). If this is the cached
version, then you can get poor
performance when you later send
SELECT * FROM table WHERE id BETWEEN
1 AND 2. The reasoning behind this is
outside the scope of this posting, but
for further reading see:
http://www.microsoft.com/technet/prodtechnol/sql/2005/frcqupln.mspx
and
http://msdn.microsoft.com/en-us/library/ms181055.aspx
and http://www.simple-talk.com/sql/performance/execution-plan-basics/
"In summary, they determined that
supplying anything other than the
common values when a compile or
recompile was performed resulted in
the optimizer compiling and caching
the query plan for that particular
value. Yet, when that query plan was
reused for subsequent executions of
the same query for the common values
(‘M’, ‘R’, or ‘T’), it resulted in
sub-optimal performance. This
sub-optimal performance problem
existed until the query was
recompiled. At that point, based on
the #P1 parameter value supplied, the
query might or might not have a
performance problem."
Reduced network latency
A) If you are running the same SQL over and over -- and the SQL adds up to many KB of code -- replacing that with a simple "exec foobar" can really add up.
B) Stored procs can be used to move procedural code into the DBMS. This saves shuffling large amounts of data off to the client only to have it send a trickle of info back (or none at all!). Analogous to doing a JOIN in the DBMS vs. in your code (everyone's favorite WTF!)
Still an advantage?
A) Modern 1Gb (and 10Gb and up!) Ethernet really make this negligible.
B) Depends on how saturated your network is -- why shove several megabytes of data back and forth for no good reason?
Potential cache benefits
Performing server-side transforms of data can potentially be faster if you have sufficient memory on the DBMS and the data you need is in memory of the server.
Still an advantage?
Unless your app has shared memory access to DBMS data, the edge will always be to stored procs.
Of course, no discussion of Stored Procedure optimization would be complete without a discussion of parameterized and ad hoc SQL.
Parameterized / Prepared SQL
Kind of a cross between stored procedures and ad hoc SQL, they are embedded SQL statements in a host language that uses "parameters" for query values, e.g.:
SELECT .. FROM yourtable WHERE foo = ? AND bar = ?
These provide a more generalized version of a query that modern-day optimizers can use to cache (and re-use) the query execution plan, resulting in much of the performance benefit of stored procedures.
Ad Hoc SQL
Just open a console window to your DBMS and type in a SQL statement. In the past, these were the "worst" performers (on average) since the DBMS had no way of pre-optimizing the queries as in the parameterized/stored proc method.
Still a disadvantage?
Not necessarily. Most DBMS have the ability to "abstract" ad hoc SQL into parameterized versions -- thus more or less negating the difference between the two. Some do this implicitly or must be enabled with a command setting (SQL server: http://msdn.microsoft.com/en-us/library/ms175037.aspx , Oracle: http://www.praetoriate.com/oracle_tips_cursor_sharing.htm).
Lessons learned?
Moore's law continues to march on and DBMS optimizers, with every release, get more sophisticated. Sure, you can place every single silly teeny SQL statement inside a stored proc, but just know that the programmers working on optimizers are very smart and are continually looking for ways to improve performance. Eventually (if it's not here already) ad hoc SQL performance will become indistinguishable (on average!) from stored procedure performance, so any sort of massive stored procedure use ** solely for "performance reasons"** sure sounds like premature optimization to me.
Anyway, I think if you avoid the edge cases and have fairly vanilla SQL, you won't notice a difference between ad hoc and stored procedures.
Reasons for using stored procedures:
Reduce network traffic -- you have to send the SQL statement across the network. With sprocs, you can execute SQL in batches, which is also more efficient.
Caching query plan -- the first time the sproc is executed, SQL Server creates an execution plan, which is cached for reuse. This is particularly performant for small queries run frequently.
Ability to use output parameters -- if you send inline SQL that returns one row, you can only get back a recordset. With sprocs you can get them back as output parameters, which is considerably faster.
Permissions -- when you send inline SQL, you have to grant permissions on the table(s) to the user, which is granting much more access than merely granting permission to execute a sproc
Separation of logic -- remove the SQL-generating code and segregate it in the database.
Ability to edit without recompiling -- this can be controversial. You can edit the SQL in a sproc without having to recompile the application.
Find where a table is used -- with sprocs, if you want to find all SQL statements referencing a particular table, you can export the sproc code and search it. This is much easier than trying to find it in code.
Optimization -- It's easier for a DBA to optimize the SQL and tune the database when sprocs are used. It's easier to find missing indexes and such.
SQL injection attacks -- properly written inline SQL can defend against attacks, but sprocs are better for this protection.
In many cases, stored procedures are actually slower because they're more genaralized. While stored procedures can be highly tuned, in my experience there's enough development and institutional friction that they're left in place once they work, so stored procedures often tend to return a lot of columns "just in case" - because you don't want to deploy a new stored procedure every time you change your application. An OR/M, on the other hand, only requests the columns the application is using, which cuts down on network traffic, unnecessary joins, etc.
It's a debate that rages on and on (for instance, here).
It's as easy to write bad stored procedures as it is to write bad data access logic in your app.
My preference is for Stored Procs, but that's because I'm typically working with very large and complex apps in an enterprise environment where there are dedicated DBAs who are responsible for keeping the database servers running sweetly.
In other situations, I'm happy enough for data access technologies such as LINQ to take care of the optimisation.
Pure performance isn't the only consideration, though. Aspects such as security and configuration management are typically at least as important.
Edit: While Frans Bouma's article is indeed verbose, it misses the point with regard to security by a mile. The fact that it's 5 years old doesn't help its relevance, either.
There is no noticeable speed difference for stored procedures vs parameterized or prepared queries on most modern databases, because the database will also cache execution plans for those queries.
Note that a parameterized query is not the same as ad hoc sql.
The main reason imo to still favor stored procedures today has more to do with security. If you use stored procedures exclusively, you can disable INSERT, SELECT, UPDATE, DELETE, ALTER, DROP, and CREATE etc permissions for your application's user, only leaving it with EXECUTE.
This provides a little extra protection against 2nd order sql injection. Parameterized queries only protect against 1st order injection.
Obviously, actual performance ought to be measured in individual cases, not assumed. But even in cases where performance is hampered by a stored procedure, there are good reasons to use them:
Application developers aren't always the best SQL coders. Stored procedures hides SQL from the application.
Stored procedures automatically use bind variables. Application developers often avoid bind variables because they seem like unneeded code and show little benefit in small test systems. Later on, the failure to use bind variables can throttle RDBMS performance.
Stored procedures create a layer of indirection that might be useful later on. It's possible to change implementation details (including table structure) on the database side without touching application code.
The exercise of creating stored procedures can be useful for documenting all database interactions for a system. And it's easier to update the documentation when things change.
That said, I usually stick raw SQL in my applications so that I can control it myself. It depends on your development team and philosophy.
The one topic that no one has yet mentioned as a benefit of stored procedures is security. If you build the application exclusively with data access via stored procedures, you can lockdown the database so the ONLY access is via those stored procedures. Therefor, even if someone gets a database ID and password, they will be limited in what they can see or do against that database.
In 2007 I was on a project, where we used MS SQL Server via an ORM. We had 2 big, growing tables which took up to 7-8 seconds of load time on the SQL Server. After making 2 large, stored SQL procedures, and optimizing them from the query planner, each DB load time got down to less than 20 milliseconds, so clearly there are still efficiency reasons to use stored SQL procedures.
Having said that, we found out that the most important benefit of stored procedures was the added maintaince-ease, security, data-integrity, and decoupling business-logic from the middleware-logic, benefitting all middleware-logic from reuse of the 2 procedures.
Our ORM vendor made the usual claim that firing off many small SQL queries were going to be more efficient than fetching large, joined data sets. Our experience (to our surprise) showed something else.
This may of course vary between machines, networks, operating systems, SQL servers, application frameworks, ORM frameworks, and language implementations, so measure any benefit, you THINK you may get from doing something else.
It wasn't until we benchmarked that we discovered the problem was between the ORM and the database taking all the load.
I prefer to use SP's when it makes sense to use them. In SQL Server anyway there is no performance advantage to SP's over a parametrized query.
However, at my current job my boss mentioned that we are forced to use SP's because our customer's require them. They feel that they are more secure. I have not been here long enough to see if we are implementing role based security but I have a feeling we do.
So the customer's feelings trump all other arguments in this case.
Read Frans Bouma's excellent post (if a bit biased) on that.
To me one advantage of stored procedures is to be host language agnostic: you can switch from a C, Python, PHP or whatever application to another programming language without rewriting your code. In addition, some features like bulk operations improve really performance and are not easily available (not at all?) in host languages.
I don't know that they are faster. I like using ORM for data access (to not re-invent the wheel) but I realize that's not always a viable option.
Frans Bouma has a good article on this subject : http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx
All I can speak to is SQL server. In that platform, stored procedures are lovely because the server stores the execution plan, which in most cases speeds up performance a good bit. I say "in most cases", because if the SP has widely varying paths of execution you might get suboptimal performance. However, even in those cases, some enlightened refactoring of the SPs can speed things up.
Using stored procedures for CRUD operations is probably overkill, but it will depend on the tools be used and your own preferences (or requirements). I prefer inline SQL, but I make sure to use parameterized queries to prevent SQL injection attacks. I keep a print out of this xkcd comic as a reminder of what can go wrong if you are not careful.
Stored procedures can have real performance benefits when you are working with multiple sets of data to return a single set of data. It's usually more efficient to process sets of data in the stored procedure than sending them over the wire to be processed at the client end.
Realising this is a bit off-topic to the question, but if you are using a lot of stored procedures, make sure there is a consistent way to put them under some sort of source control (e.g., subversion or git) and be able to migrate updates from your development system to the test system to the production system.
When this is done by hand, with no way to easily audit what code is where, this quickly becomes a nightmare.
Stored procs are great for cases where the SQL code is run frequently because the database stores it tokenized in memory. If you repeatedly ran the same code outside of a stored proc, you will likey incur a performance hit from the database reparsing the same code over and over.
I typically frequently called code as a stored proc or as a SqlCommand (.NET) object and execute as many times as needed.
Yes, they are faster most of time. SQL composition is a huge performance tuning area too. If I am doing a back office type app I may skip them but anything production facing I use them for sure for all the reasons others spoke too...namely security.
IMHO...
Restricting "C_UD" operations to stored procedures can keep the data integrity logic in one place. This can also be done by restricting"C_UD" operations to a single middle ware layer.
Read operations can be provided to the application so they can join only the tables / columns they need.
Stored procedures can also be used instead of parameterized queries (or ad-hoc queries) for some other advantages too :
If you need to correct something (a sort order etc.) you don't need to recompile your app
You could deny access to all tables for that user account, grant access only to stored procedures and route all access through stored procedures. This way you can have custom validation of all input much more flexible than table constraints.
Reduced network traffic -- SP are generally worse then Dynamic SQL. Because people don't create a new SP for every select, if you need just one column you are told use the SP that has the columns they need and ignore the rest. Get an extra column and any less network usage you had just went away. Also you tend to have a lot of client filtering when SP are used.
caching -- MS-SQL does not treat them any differently, not since MS-SQL 2000 may of been 7 but I don't remember.
permissions -- Not a problem since almost everything I do is web or have some middle application tier that does all the database access. The only software I work with that have direct client to database access are 3rd party products that are designed for users to have direct access and are based around giving users permissions. And yes MS-SQL permission security model SUCKS!!! (have not spent time on 2008 yet) As a final part to this would like to see a survey of how many people are still doing direct client/server programming vs web and middle application server programming; and if they are doing large projects why no ORM.
Separation -- people would question why you are putting business logic outside of middle tier. Also if you are looking to separate data handling code there are ways of doing that without putting it in the database.
Ability to edit -- What you have no testing and version control you have to worry about? Also only a problem with client/server, in the web world not problem.
Find the table -- Only if you can identify the SP that use it, will stick with the tools of the version control system, agent ransack or visual studio to find.
Optimization -- Your DBA should be using the tools of the database to find the queries that need optimization. Database can tell the DBA what statements are talking up the most time and resources and they can fix from there. For complex SQL statements the programmers should be told to talk to the DBA if simple selects don't worry about it.
SQL injection attacks -- SP offer no better protection. The only thing they get the nod is that most of them teach using parameters vs dynamic SQL most examples ignore parameters.