I have read many strong views (both for and against) stored procedures (SPs) or dynamic SQL (DS).
I am writing a query engine in C++ (MySQL backend for now, though I may decide to go with a C++ ORM). I can't decide whether to write an SP, or to dynamically create the SQL and send the query to the db engine.
Any tips on how to decide?
Here's the simple answer:
If your programmers do both database and coding work, keep the SQL with the app. It's easier to maintain that way. Otherwise, let the DB guys handle it in SPs.
You have more control over the mechanisms outside the database. The biggest win for taking care of this outside the database is simply maintenance (in my mind). It'd be slightly hard to version control the SP vs the code you generate outside the database. One more thing to keep track of.
While we're on the topic, it's similar to handling data/schema migrations. It's annoyingly complex to version/handle schema migrations; if you don't already have a mechanism for this, you will have yet another thing you'll need to manage. It comes down to simply being easier to manage/version these things outside the database.
Consider the scenario where you have a bug in your SP. Now it needs to be changed, but then you hop over to another developer's database/sandbox. What version is the sandbox and the SP? Now you have to track multiple versions.
One of the main differentiators is whether you are writing the "one true front end" or whether the database is the central piece of your application.
If you are going to have multiple front ends, stored procedures make a lot of sense because you reduce your maintenance overhead. If you are writing only one interface, stored procedures are a pain, because you lose a lot of flexibility in changing your data set as your front end needs change, plus you now have to do code maintenance, version control, etc. in two places. Databases are a real pain to keep in sync with code repositories.
Finally, if you are coding for multiple databases (Oracle and SQL compatible code, for example), I'd avoid stored procedures completely.
You may in certain rare circumstances, after profiling, determine that some limited stored procedures are useful to you. This situation comes up way less than people think it does.
The main scenarios when you MUST have the SP are:
1) When you have a very complex set of queries with heavy compile overhead and data drift low enough that recompiling is not needed on a regular basis.
2) When the "Only True" logic for accessing the specific data set is VERY complicated and needs to be accessed from several different codebases on different platforms (so writing multiple APIs in code is much more expensive).
Any other scenario, it's debatable, and can be decided one way or another.
I must also say that the other posters' arguments about versioning are not really such a big deal in my experience - having your SPs in version control is as easy as creating a "sql/db_name" directory structure and having a basic "database release" script which releases the SP code from the version control location to the database. Every company I worked for had some kind of setup like this, either a central one run by DBAs or a departmental one run by developers.
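For what it's worth, a "database release" script of that kind can be very small. Here is a rough sketch, not anyone's actual setup: the function name release_sql and the numbered file names are made up for illustration, and SQLite stands in for the real server (it has no stored procedures, so in practice the files would hold CREATE PROCEDURE statements and the connection would be to MySQL, SQL Server, etc.).

```python
import sqlite3
from pathlib import Path


def release_sql(conn, sql_dir):
    """Apply every .sql file under sql_dir to the database, in name order.

    Returns the file names that were applied so a release log can record them.
    """
    applied = []
    for path in sorted(Path(sql_dir).glob("*.sql")):
        # Each file should be re-runnable (drop-then-create), so releasing
        # the same checkout twice does no harm.
        conn.executescript(path.read_text(encoding="utf-8"))
        applied.append(path.name)
    return applied
```

Run against a checkout of the "sql/db_name" directory, this pushes whatever is in version control to the target database, so the repository rather than the server stays the source of truth.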
The one thing you want to avoid is to have your business logic spread across multiple tiers of your application. Database DDL and DML are difficult enough to keep in sync with an application code base as it is.
My recommendation is to create a good relational schema and put in all your constraints and triggers, so that the data retains integrity even if somebody goes to the database and tries to do something through some command line SQL.
Put all your business logic in an application or service that calls (static/dynamic) SQL and then wraps the business functionality you are trying to expose.
Stored-procedures have two purposes that I can think of.
An aid to simplifying data access. The Stored Procedure does not have any business logic in it, it just knows about the structure of the data and exposes an interface to isolate accessing three tables and a view just to get a single piece of information.
Mapping the Domain Model to the Data Model. Stored Procedures can assist in making the Data Model look like a given Domain Model.
After the program has been completed and has been profiled there are often performance issues with the pre 1.0 release. Stored procedures do offer batching of SQL without traffic needing to go back and forth between the DBMS and the Application. That being said in rare and extreme cases due to performance a few business rules might need to be migrated to the Stored-Procedure side. Make sure to document any exceptions to the architectural philosophy in multiple prominent places.
Stored Procedures are ideal for:
Creating reusable abstractions over complex queries;
Enforcing specific types of insertions/updates to tables (if you also deny permissions to the table);
Performing privileged operations that the logged-in user wouldn't normally be allowed to do;
Guaranteeing a consistent execution plan;
Extending the capabilities of an ORM (batch updates, hierarchy queries, etc.)
Dynamic SQL is ideal for:
Variable search arguments or output columns:
Optional search conditions
Pivot tables
IN clauses with user-specified values
ORM implementations (most can use SPs, but can't be built entirely on them);
DDL and administrative scripts.
They solve different problems, really. Use whichever one is more appropriate to the task at hand, and don't restrict yourself to just one or the other. After you work on database code for a while you'll start to get a more intuitive feel for these things; you'll find yourself banging together some rat's nest of strings for a query and think, "this should really go in a stored procedure."
Final note: Because this question implies a certain level of inexperience with SQL, I feel obliged to say, don't forget that you still need to parameterize your queries when you write dynamic SQL. Parameters aren't just for stored procedures.
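To make that last point concrete, here is a minimal sketch of dynamic SQL that stays parameterized even though the statement text changes with the optional search conditions and the IN list. The people table, its columns, and the build_search helper are hypothetical, and the '?' placeholder style is the one Python's sqlite3 module uses; other drivers use different markers.

```python
import sqlite3


def build_search(name=None, min_age=None, cities=()):
    """Build a SELECT containing only the filters the caller supplied.

    The SQL text varies, but user values never enter it: each value
    becomes a '?' placeholder and travels separately in the params list.
    """
    clauses, params = [], []
    if name is not None:
        clauses.append("name = ?")
        params.append(name)
    if min_age is not None:
        clauses.append("age >= ?")
        params.append(min_age)
    if cities:
        # One placeholder per value in the IN list.
        marks = ", ".join("?" for _ in cities)
        clauses.append(f"city IN ({marks})")
        params.extend(cities)
    sql = "SELECT name FROM people"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY name", params
```

Usage would be along the lines of sql, params = build_search(min_age=30, cities=["Oslo", "Rome"]) followed by conn.execute(sql, params). Only the shape of the query is assembled from strings; the values are always bound.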
DS is more flexible. SP approach makes your system more manageable.
I'm pretty new to SQL world. Here are my questions:
What are the benefits of stored procedures over normal SQL statements in applications?
Do stored procedures help eliminate SQL injection?
In Microsoft SQL Server it is called stored procedure. How about in Oracle, MySQL, DB2, etc.?
Thanks for your explanation.
Stored procedures only directly prevent SQL injection if you call them in a parameterized way. If you still have a string in your app with the procedure name and concatenate parameters from user input to that string in your code, you'll still have trouble.
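The difference is easy to demonstrate. The sketch below is illustrative only: SQLite has no stored procedures, so a plain SELECT stands in for the procedure call, and the users table and both function names are invented. The failure mode is the same for a concatenated "EXEC proc ..." string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])


def lookup_concatenated(name):
    # Unsafe: the input becomes part of the SQL text itself.
    return conn.execute(
        "SELECT secret FROM users WHERE name = '" + name + "'").fetchall()


def lookup_parameterized(name):
    # Safe: the input is sent as a value and is never parsed as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


# Input crafted to turn the WHERE clause into "... OR '1'='1'".
hostile = "nobody' OR '1'='1"
```

With the hostile input, the concatenated version matches every row in the table, while the parameterized version simply looks for a user with that odd literal name and finds none.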
However, when used exclusively, stored procedures let you add some additional protection by making it possible for you to disable permissions to everything but the EXEC command. Aside from this, parameterized queries/prepared statements are normally cached by the server, and so are just like a stored procedure in nearly every respect.
In spite of this, stored procedures have two big advantages for larger enterprises:
They allow you to define an application interface for the database, so that the system can be shared between multiple applications without having to duplicate logic in those applications.
They move the sql code to the db, where you can easily have an experienced DBA tune, update, and otherwise maintain it, rather than application developers who often don't know exactly what they're doing with database code.
Of course, these advantages aren't without cost:
It's harder to track changes in source control
The database code is far separated from the code that uses it
Developer tools for managing many stored procedures are less than ideal (if you've ever opened the stored procedures folder in Management Studio to find 200 procedures for a database, you know what I'm talking about here).
Some of the benefits that I consider when using stored procedures
Stored procedures encapsulate query code at the server, rather than inside your application. This allows you to make changes to queries without having to recompile your application.
Stored procedures can be used for more well defined application security. You can Deny all rights on the base tables, grant execute only on the procs. This gives you a much smaller security footprint to manage.
Stored procedures are compiled code. With the latest versions of MSSQL the server does a better job of storing execution plans - so this isn't as big of an issue as it used to be, but still something to consider
Stored procedures eliminate SQL injection risk ONLY when used correctly. Make sure to use the parameters the right way inside the stored proc - stored procs that are just executing concatenated dynamic SQL inside them aren't doing anyone any good.
For the most part yes, SQL injection is far less likely with a stored procedure. Though there are times when you want to pass a stored procedure some data that requires you to use dynamic SQL inside the stored procedure and then you're right back where you started. In this sense I don't see any advantage to them over using parameterized queries in programming languages that support them.
Personally I hate stored procedures. Having code in two disjointed places is a pain in the ass and it makes deploys that much more complicated. I don't advocate littering your code with SQL statements either, however, as this leads to its own set of headaches.
I recommend a DAL layer implemented one of two ways.
My favorite, use an object-relational mapper (ORM). I've been working with NHibernate and I absolutely love it. The learning curve is steep but definitely worth the payoff in my opinion.
Some kind of mechanism for keeping all your SQL code in one place. Either some sort of query library you select from or a really structured set of classes that design the SQL for you. I don't recommend this way since it's basically like building your own ORM and odds are you don't have the time to do it correctly.
Forget stored procedures. Use an ORM.
One way in which stored procedures (ones which do not use dynamic SQL) can make the whole application more secure is that you can now set the permissions at the stored procedure level and not at the table level. If you do all of your data access this way (and forbid dynamic SQL!) this means users cannot under any circumstances do anything to the database that is not in a stored proc. Developers always want to say that their application code can protect against outside threats, but they seem to forget that inside threats are often far more serious, and by allowing permissions at the table level, they are at the mercy of any user who can find a way to directly query the database outside the application (another reason why in large shops only two or three people at most have production rights to anything in the database: it limits who can steal information).
Any financial system that uses anything except stored procs for instance is completely open to internal fraud which is a violation of internal controls that should prevent fraud and would not pass a good audit.
Stored procedures allow you to store your SQL code in a location outside of the application. This gives you the ability to:
Change the SQL code without recompiling/redistributing the application
Have multiple applications use the same stored procedure to access the same data.
Restrict users from having access to read/write to tables directly in the database.
From a development perspective it also allows the DBAs/database programmers to work on sql code without having to go through application code to work on it. (separation of responsibilities essentially).
Do stored procedures protect against injection attacks? For the most part yes. In SQL Server you can create stored procedures which are not effective against this, mainly by using sp_executesql to run SQL strings built from concatenated input. Now this doesn't mean that sp_executesql is a security hole, it just means that more precaution needs to be taken when using it.
This also does not mean that stored procedures are the only way to protect against this. You can use parameterized SQL to accomplish the same task of protecting against SQL injection.
I do agree with other people stored procedures can be cumbersome, but they have their advantages too. Where I work, we have probably 20 different production databases for various reasons (don't ask). I work on a subset of maybe three, and my teammate and I know those three really really well. How do stored procedures help us? People come to us and when they need to grab that information out of those databases, we can get it for them. We don't have to spend hours explaining the schemas and what data is de-normalized. It's a layer of abstraction which allows us to program the most efficient code against the databases we know. If this isn't the case for you, then maybe stored procedures aren't the way to go, but in some instances they can add a lot of value.
When should I be using stored procedures instead of just writing the logic directly in my application? I'd like to reap the benefits of stored procedures, but I'd also like to not have my application logic spread out over the database and the application.
Are there any rules of thumb that you can think of in reference to this?
Wow... I'm going to swim directly against the current here and say, "almost always". There is a laundry list of reasons - some/many of which I'm sure others would argue. But I've developed apps both with and without the use of stored procs as a data access layer, and it has been my experience that well written stored procedures make it so much easier to write your application. Then there are the well-documented performance and security benefits.
This depends entirely on your environment. The answer to the question really isn't a coding problem, or even an analysis issue, but a business decision.
If your database supports just one application, and is reasonably tightly integrated with it, then it's better, for reasons of flexibility, to place your logic inside your application program. Under these circumstances handling the database simply as a plain data repository using common functionality loses you little and gains flexibility - with vendors, implementation, deployment and much else - and many of the purist arguments that the 'databases are for data' crowd make are demonstrably true.
On the other hand if you are handling a corporate database, which can generally be identified by having multiple access paths into it, then it is highly advisable to screw down the security as far as you can. At the very least all appropriate constraints should be enabled, and if possible access to the data should be through views and procedures only. Whining programmers should be ignored in these cases as...
With a corporate database the asset is valuable and invalid data or actions can have business-threatening consequences. Your primary concern is safeguarding the business, not how convenient access is for your coders.
Such databases are by definition accessed by more than one application. You need to use the abstraction that stored procedures offer so the database can be changed when application A is upgraded and you don't have the resource to upgrade application B.
Similarly the encapsulation of business logic in SPs rather than in application code allows changes to such logic to be implemented across the business more easily and reliably than if such logic is embedded in application code. For example if a tax calculation changes it's less work, and more robust, if the calculation has to be changed in one SP than multiple applications. The rule of thumb here is that the business rule should be implemented at the closest point to the data where it is unique - so if you have a specialist application then the logic for that app can be implemented in that app, but logic more widely applicable to the business should be implemented in SPs.
Coders who dive into religious wars over the use or not of SPs generally have worked in only one environment or the other, so they extrapolate their limited experience into a cast-iron position - which indeed will be perfectly defensible and correct in the context from which they come but misses the big picture. As always, you should make your decision based on the needs of the business/customers/users and not on which type of coding methodology you prefer.
I tend to avoid stored procedures. The debugging tools tend to be more primitive. Error reporting can be harder (vs your server's log file) and, to me at least, it just seems to add another language for no real gain.
There are cases where it can be useful, particularly when processing large amounts of data on the server and of course for database triggers that you can't do in code.
Other than that though, I tend to do everything in code and treat the database as a big dump of data rather than something I run code on.
Consider Who Needs Stored Procedures, Anyways?:
For modern databases and real world usage scenarios, I believe a Stored Procedure architecture has serious downsides and little practical benefit. Stored Procedures should be considered database assembly language: for use in only the most performance critical situations.
and Why I do not use Stored Procedures:
The absolute worst thing you can do, and it's horrifyingly common in the Microsoft development world, is to split related functionality between sproc's and middle tier code. Grrrrrrrr. You just make the code brittle and you increase the intellectual overhead of understanding a system.
I said this in a comment, but I'm going to say it again here.
Security, Security, SECURITY.
When sql code is embedded in your application, you have to expose the underlying tables to direct access. This might sound okay at first. Until you get hit with some sql injection that scrambles all the varchar fields in your database.
Some people might say that they get around this by using magic quotes or some other way of properly escaping their embedded sql. The problem, though, is the one query a dev didn't escape correctly. Or, the dev that forgot to not allow code to be uploaded. Or, the web server that was cracked which allowed the attacker to upload code. Or,... you get the point. It's hard to cover all your bases.
My point is, all modern databases have security built in. You can simply deny direct table access (select, insert, update, and deletes) and force everything to go through your s'procs. By doing so generic attacks will no longer work. Instead the attacker would have to take the time to learn the intimate details of your system. This increases their "cost" in terms of time spent and stops drive by and worm attacks.
I know we can't secure ourselves against everything, but if you take the time to architect your apps so that the cost to crack them far outweighs the benefits, then you are going to seriously reduce your potential for data loss. That means taking advantage of all the security tools available to you.
Finally, as to the idea of not using s'procs because you might have to port to a different rdbms: First, most apps don't change database servers. Second, in the event that it's a real possibility, you have to code using ANSI sql anyway; which you can do in your procs. Third, you would have to reevaluate all of your sql code no matter what and it's a whole lot easier when that code is in one place. Fourth, all modern databases now support s'procs. Fifth, when using s'proc's you can custom tune your sql for the database it's running under to take advantage of that particular database's sql extensions.
Basically when you have to perform operations involving data that do not need to get out of the database. For example, you want to update one table with data from another, it makes little sense to get the data out and then back in if you can do it all in one single shot to the db.
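As a rough illustration of "one single shot", the sketch below copies prices from one table into another with a single set-based UPDATE, so no rows travel to the client and back. The products and price_updates tables and the apply_price_updates name are made up for the example, and SQLite is only a stand-in; on a server with stored procedures the same statement could live inside one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE price_updates (product_id INTEGER, new_price REAL);
    INSERT INTO products VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO price_updates VALUES (1, 11.5), (3, 29.0);
""")


def apply_price_updates(conn):
    """Copy new prices into products in one statement; return rows changed."""
    cur = conn.execute("""
        UPDATE products
           SET price = (SELECT new_price FROM price_updates
                         WHERE product_id = products.id)
         WHERE id IN (SELECT product_id FROM price_updates)
    """)
    conn.commit()
    return cur.rowcount
```

The alternative, selecting every pending update into the application and issuing one UPDATE per row, does the same work with many more round trips.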
Another situation where it may be acceptable to use stored procedures is when you are 100% sure you will never deploy your application to another database vendor. If you are an Oracle shop and you have lots of applications talking to the same database it may make sense to have stored procedures to make sure all of them talk to the db in a consistent manner.
Complicated database queries for me tend to end up as stored procs. Another thought to consider is that your database might be completely separate and distinct from the application. Let's say you run an Oracle DB and you essentially are building an API for other application developers at your organization to call into. You can hide the complicated stuff from them and provide a stored proc in its place.
A very simple example:
registerUser(username, password)
might end up running a few different queries (check if it exists, create entries in a preference table, etc) and you might want to encapsulate them.
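To show what such a routine bundles together, here is a sketch of the same steps written as one application-side function; a stored procedure would wrap the equivalent statements on the server. The schema, the column names, and the choice to pass an already-hashed password are all assumptions for the example, with SQLite standing in for the real database.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY,
                    username TEXT UNIQUE,
                    password_hash TEXT);
CREATE TABLE preferences (user_id INTEGER REFERENCES users(id), theme TEXT);
"""


def register_user(conn, username, password_hash):
    """Create a user plus default preferences; return False if the name is taken."""
    with conn:  # commits on success, rolls back if anything raises
        taken = conn.execute(
            "SELECT 1 FROM users WHERE username = ?", (username,)).fetchone()
        if taken:
            return False
        cur = conn.execute(
            "INSERT INTO users (username, password_hash) VALUES (?, ?)",
            (username, password_hash))
        # Second table touched by the same logical operation.
        conn.execute(
            "INSERT INTO preferences (user_id, theme) VALUES (?, 'default')",
            (cur.lastrowid,))
    return True
```

Callers see one operation and never need to know that two tables are involved, which is the encapsulation being described.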
Of course, different people will have different perspectives (a DBA versus a Programmer).
I used stored procs in 1 of 3 scenarios:
Speed
When speed is of the utmost importance, stored procedures provide an excellent method.
Complexity
When I'm updating several tables and the code logic might change down the road, I can update the stored proc and avoid a recompile. Stored procedures are an excellent black box method for updating lots of data in a single stroke.
Transactions
When I'm working an insert, delete or update that spans multiple tables. I wrap the whole thing in a transaction. If there is an error, it's very easy to roll back the transaction and throw an error to avoid data corruption.
The bottom 2 are very do-able in code. However, stored procedures provide a black-box method of working when complex and transaction level operations are important. Otherwise, stick with code level database operations.
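For comparison, this is roughly what the transaction case looks like when it is done in code. It is a sketch under assumptions: the orders and stock tables are invented, SQLite stands in for the real server, and a CHECK constraint plays the role of the error that triggers the rollback.

```python
import sqlite3

# Hypothetical schema: stock may never go negative.
SCHEMA = """
CREATE TABLE stock (product_id INTEGER PRIMARY KEY,
                    qty INTEGER CHECK (qty >= 0));
CREATE TABLE orders (product_id INTEGER, qty INTEGER);
"""


def place_order(conn, product_id, qty):
    """Insert an order and decrement stock as one unit of work."""
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.execute(
                "INSERT INTO orders (product_id, qty) VALUES (?, ?)",
                (product_id, qty))
            # If this would make qty negative, the CHECK constraint raises
            # IntegrityError and the insert above is undone as well.
            conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE product_id = ?",
                (qty, product_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

Either both tables change or neither does, which is the same guarantee the stored procedure version gives, just with the transaction boundary drawn in the application.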
Security used to be one of the reasons. However, with LINQ and other ORMs out there, code level DAL operations are much more secure than they've been in the past. Stored procs ARE secure but so are ORMs like LINQ.
We use stored procedures for all of our reporting needs. They can usually retrieve the data faster and in a way that the report can just spit out directly instead of having to do any kind of calculations or similar.
We also will use stored procedures for complex or complicated queries we need to do that would be difficult to read if they were otherwise inside of our codebase.
It can also be very useful as a matter of encapsulation and in the philosophy of DRY. For instance I use stored functions for calculations inside a table that I need for several queries inside the code. This way I use the better performance as well as the ensuring that the calculation is always done the same way.
I would not use it for higher functionality or logic that should be in the business logic layer of an architecture, but focused on the model layer, where the functionality is clearly focused on the database design and possible flexibility of changing the database design without breaking the API to the other layers.
I tend to always use stored procedures. Personally, I find it makes everything easier to maintain. Then there are the security and performance considerations.
Just make sure you write clean, well laid out and well documented stored procedures.
When all the code is in a stored proc, it is far easier to refactor the database when needed. Changes to logic are far easier to push as well. It is also far far easier to performance tune and sooner or later performance tuning becomes necessary for most database applications.
From my experience, stored procedures can be very useful for building reporting databases/pipelines, however, I'd argue that you should avoid using stored procedures within applications as they can impede a team's velocity and any security risks of building queries within an application can be mitigated by the use of modern tooling/frameworks.
Why might we avoid it?
To avoid tight-coupling between applications and databases. If we use stored procedures, we won't be able to easily change our underlying database in the future because we'd have to either:
Migrate stored procedures from one database (e.g. DB2) to another (e.g. SQL Server) which could be painstakingly time-consuming or...
Migrate all the queries to the applications themselves (or potentially in a shared library)
Because code-first is a thing. There are several ORMs which can enable us to target any database and even manage the table schemas without ever needing to touch the database. ORMs such as Entity Framework or Dapper allow developers to focus on building features instead of writing stored procedures and wiring them up in the application.
It's yet another thing that developers need to learn in order to be productive. Instead, they can write the queries as part of the applications which makes the queries far simpler to understand, maintain, and modify by the developers who are building new features and/or fixing bugs.
Ultimately, it depends on what developers are most comfortable with.
If a developer has a heavy SQL background, they might go with Stored Procs.
If a developer has lots of app development experience, they might prefer queries in code. Personally, I think having queries in code can enable developers to move much faster and security concerns can be mitigated by ensuring teams are following best practices (e.g. parameterized queries, ORM). Stored procs aren't a "silver bullet" for system security.
Does the use of procedures still make sense in 202X?
Maybe in low-level and rare scenarios, or if we write code for legacy companies with unfounded restrictions, stored procedures could be an option.
If entire logic is in the database, should I need a dba to change it?
No. In modern platforms, the requirement of a DBA to change the business logic is not an option.
Hot modification of stored procedures without dev or staging phases is a crazy idea.
How easy is it to maintain a procedure with dozens of lines, cursors and other low-level database features vs. OOP objects in any modern language, which a junior developer is able to maintain?
This answers itself
Hiding tables from my development team for security reasons sounds very crazy to me, in these times in which agility and good documentation are everything.
Modern development team with a modern database, should not worry about security. What's more, they need access to sandbox version of database to reduce the time of its deliverables.
With modern ORMs, ESBs, ETLs and the constant increase of cpu power, stored procedures are not an option anymore. Should I invest time and money in these tools, to create at final: one big stored procedure?
Of course, not.
On top of the speed and security considerations, I tend to stick as much in Stored Procedures as possible for ease of maintenance and alterations. If you put the logic in your application, and find later that sql logic has an error or needs to work differently in some manner, you have to recompile and redeploy the whole app in many cases (especially if it's a client side app such as WPF, Win-Forms, etc). If you keep the logic in the stored proc, all you have to do is update the proc and you never have to touch the application.
I agree that they should be used often and well.
The use case I think is extremely compelling and extremely useful is if you are taking in a lot of raw information that should be separated out into several tables, where some of the data may have records that already exist and need to be connected by foreign key id. Then you can just use IF EXISTS checks and insert if it doesn't exist or return the key if it does, which makes everything more uniform, succinct, and maintainable in the long run.
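The check-then-insert pattern looks something like the sketch below. It is written application-side against SQLite purely so it can be run as-is; inside a stored procedure the lookup would be the IF EXISTS test. The customers/orders schema and both function names are hypothetical.

```python
import sqlite3

# Hypothetical schema: raw (email, item) rows are split across two tables.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     item TEXT);
"""


def get_or_create_customer(conn, email):
    """Return the id for email, inserting the row only if it is missing."""
    row = conn.execute(
        "SELECT id FROM customers WHERE email = ?", (email,)).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO customers (email) VALUES (?)", (email,)).lastrowid


def load_raw_rows(conn, raw_rows):
    """Split (email, item) pairs across the two tables in one transaction."""
    with conn:
        for email, item in raw_rows:
            customer_id = get_or_create_customer(conn, email)
            conn.execute(
                "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
                (customer_id, item))
```

Done this way from the application, each raw row costs two or three round trips, which is part of why this kind of load is a natural fit for a procedure running next to the data.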
The only case where I would suggest against using them is if you are doing a lot of logic or number crunching between queries which is best done in the app server OR if you are working for a company where keeping all of the logic in the code is important for maintainability/understanding what is happening. If you have a git repository full of everything anyone would need and is easily understandable, that can be very valuable.
The stored procedures are a method of collecting operations that should be done together on database side, while still keeping them on database side.
This includes:
Populating several tables from one rowsource
Checking several tables against different business rules
Performing operations that cannot be efficiently performed using set-based approach
etc.
The main problem with stored procedures is that they are hard to maintain.
You, therefore, should make stored procedures as easy to maintain as all your other code.
I have an article on this in my blog:
Schema junk
I've had some very bad experiences with this.
I'm not opposed to stored procedures in their place, but gratuitous use of stored procedures can be very expensive.
First, stored procedures run on the database server. That means that if you have a multi-server environment with 50 webservers and one database server, instead of spreading workloads over 50 cheap machines, you load up one expensive one (since the database server is commonly built as a heavyweight server). And you're risking creating a single-point-of-failure.
Secondly, it's not very easy to write an application solely in stored procedures, although I ran into one that made a superhuman effort to try to. So you end up with something that's expensive to maintain: It's implemented in 2 different programming languages, and the source code is often not all in one place either, since stored procedures are definitively stored in the DBMS and not in a source archive. Assuming that someone ever managed/bothered to pull them out of the database server and source-archive them at all.
So aside from a fairly messy app architecture, you also limit the set of qualified chimpanzees who can maintain it, as multiple skills are required.
On the other hand, stored procedures are extremely useful, IF:
You need to maintain some sort of data integrity across multiple systems. That is, the stored logic doesn't belong to any single app, but you need consistent behavior from all participating apps. A certain amount of this is almost inevitable in modern-day apps in the form of foreign keys and triggers, but occasionally, major editing and validation may be warranted as well.
You need performance that can only be achieved by running logic on the database server itself and not as a client. But, as I said, when you do that, you're eating into the total system resources of the DBMS server. So it behooves you to ensure that if there are significant bits of the offending operation that CAN be offloaded onto clients, you can separate them out and leave the most critical stuff for the DBMS server.
A particular scenario where you're likely to benefit is the "(n+1)" scalability problem. Any kind of multidimensional/hierarchical situation is likely to involve this scenario.
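For anyone who hasn't met the (n+1) problem, here is a small sketch of it. The authors/books tables and function names are invented, and SQLite is used only so the example runs on its own; with a real server every statement counted below is a network round trip, which is the cost that server-side logic (or a single joined query) avoids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")


def titles_n_plus_one(conn):
    """One query for the parents, then one more per parent: n + 1 statements."""
    statements = 1
    result = {}
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY title",
            (author_id,)).fetchall()
        statements += 1
        result[name] = [title for (title,) in rows]
    return result, statements


def titles_single_query(conn):
    """The same data from a single JOIN, so the statement count stays at 1."""
    result = {}
    rows = conn.execute("""
        SELECT a.name, b.title
          FROM authors a LEFT JOIN books b ON b.author_id = a.id
         ORDER BY a.name, b.title
    """)
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without books still get an entry
            result[name].append(title)
    return result, 1
```

Both functions return the same dictionary, but the first issues one statement per author on top of the initial query, so its cost grows with the number of parent rows.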
Another scenario involves use cases that follow some protocol when handling the tables (hint: defined steps in which transactions are likely to be involved). This could benefit from locality of reference: being in the server, queries might benefit. OTOH, you could supply a batch of statements directly to the server, especially when you're in an XA environment and you have to access federated databases.
If you are talking business logic rather than just "Should I use sprocs in general" I would say you should put business logic in sprocs when you are carrying out large set based operations or any other time executing the logic would require a large number of calls to the db from the app.
It also depends on your audience. Is ease of installation and portability across DBMSs important to you?
If your program should be easy to install and easy to run on different database systems then you should stay away from stored procedures and also look out for non-portable SQL in your code.
Conventional wisdom states that stored procedures are always faster. So, since they're always faster, use them ALL THE TIME.
I am pretty sure this is grounded in some historical context where this was once the case. Now, I'm not advocating that Stored Procs are not needed, but I want to know in what cases stored procedures are necessary in modern databases such as MySQL, SQL Server, Oracle, or <Insert_your_DB_here>. Is it overkill to have ALL access through stored procedures?
NOTE that this is a general look at stored procedures, not specific to any one DBMS. Some DBMSs (and even different versions of the same DBMS!) may operate contrary to this, so you'll want to double-check with your target DBMS before assuming all of this still holds.
I've been a Sybase ASE, MySQL, and SQL Server DBA on and off for almost a decade (along with application development in C, PHP, PL/SQL, C#.NET, and Ruby). So, I have no particular axe to grind in this (sometimes) holy war.
The historical performance benefits of stored procs have generally come from the following (in no particular order):
Pre-parsed SQL
Pre-generated query execution plan
Reduced network latency
Potential cache benefits
Pre-parsed SQL -- similar benefits to compiled vs. interpreted code, except on a very micro level.
Still an advantage?
Not very noticeable at all on the modern CPU, but if you are sending a single SQL statement that is VERY large eleventy-billion times a second, the parsing overhead can add up.
Pre-generated query execution plan.
If you have many JOINs the permutations can grow quite unmanageable (modern optimizers have limits and cut-offs for performance reasons). It is not unknown for very complicated SQL to have distinct, measurable (I've seen a complicated query take 10+ seconds just to generate a plan, before we tweaked the DBMS) latencies due to the optimizer trying to figure out the "near best" execution plan. Stored procedures will, generally, store this in memory so you can avoid this overhead.
Still an advantage?
Most DBMS' (the latest editions) will cache the query plans for INDIVIDUAL SQL statements, greatly reducing the performance differential between stored procs and ad hoc SQL. There are some caveats and cases in which this isn't the case, so you'll need to test on your target DBMS.
Also, more and more DBMS allow you to provide optimizer path plans (abstract query plans) to significantly reduce optimization time (for both ad hoc and stored procedure SQL!!).
WARNING Cached query plans are not a performance panacea. Occasionally the query plan that is generated is sub-optimal.
For example, if you send SELECT * FROM table WHERE id BETWEEN 1 AND 99999999, the DBMS may select a full-table scan instead of an index scan because you're grabbing every row in the table (so sayeth the statistics). If this is the cached version, then you can get poor performance when you later send SELECT * FROM table WHERE id BETWEEN 1 AND 2. The reasoning behind this is outside the scope of this posting, but for further reading see:
http://www.microsoft.com/technet/prodtechnol/sql/2005/frcqupln.mspx
and
http://msdn.microsoft.com/en-us/library/ms181055.aspx
and http://www.simple-talk.com/sql/performance/execution-plan-basics/
"In summary, they determined that
supplying anything other than the
common values when a compile or
recompile was performed resulted in
the optimizer compiling and caching
the query plan for that particular
value. Yet, when that query plan was
reused for subsequent executions of
the same query for the common values
(‘M’, ‘R’, or ‘T’), it resulted in
sub-optimal performance. This
sub-optimal performance problem
existed until the query was
recompiled. At that point, based on
the @P1 parameter value supplied, the
query might or might not have a
performance problem."
Reduced network latency
A) If you are running the same SQL over and over -- and the SQL adds up to many KB of code -- replacing that with a simple "exec foobar" can really add up.
B) Stored procs can be used to move procedural code into the DBMS. This saves shuffling large amounts of data off to the client only to have it send a trickle of info back (or none at all!). Analogous to doing a JOIN in the DBMS vs. in your code (everyone's favorite WTF!)
Still an advantage?
A) Modern 1Gb (and 10Gb and up!) Ethernet really makes this negligible.
B) Depends on how saturated your network is -- why shove several megabytes of data back and forth for no good reason?
Potential cache benefits
Performing server-side transforms of data can potentially be faster if you have sufficient memory on the DBMS and the data you need is in memory of the server.
Still an advantage?
Unless your app has shared memory access to DBMS data, the edge will always be to stored procs.
Of course, no discussion of Stored Procedure optimization would be complete without a discussion of parameterized and ad hoc SQL.
Parameterized / Prepared SQL
Kind of a cross between stored procedures and ad hoc SQL, they are embedded SQL statements in a host language that uses "parameters" for query values, e.g.:
SELECT .. FROM yourtable WHERE foo = ? AND bar = ?
These provide a more generalized version of a query that modern-day optimizers can use to cache (and re-use) the query execution plan, resulting in much of the performance benefit of stored procedures.
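A minimal sketch of the parameterized form, using Python's built-in sqlite3 module. The table contents and the payload column are assumptions added for illustration; only the shape of the WHERE clause comes from the example above. The SQL text stays constant across calls and only the bound values change, which is what lets a driver or server reuse its prepared statement or plan (how much reuse you actually get depends on your DBMS).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table matching the query shape above.
conn.execute("CREATE TABLE yourtable (foo TEXT, bar INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [("a", 1, "first"), ("a", 2, "second"), ("b", 1, "third")],
)

# One constant SQL string; values are bound separately at execution time.
QUERY = "SELECT payload FROM yourtable WHERE foo = ? AND bar = ?"

def lookup(foo, bar):
    """Run the same statement text with different bound parameters."""
    return [payload for (payload,) in conn.execute(QUERY, (foo, bar))]
```

Calling lookup("a", 2) and lookup("b", 1) sends identical SQL text both times, in contrast to building a new string per call.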
Ad Hoc SQL
Just open a console window to your DBMS and type in a SQL statement. In the past, these were the "worst" performers (on average) since the DBMS had no way of pre-optimizing the queries as in the parameterized/stored proc method.
Still a disadvantage?
Not necessarily. Most DBMS have the ability to "abstract" ad hoc SQL into parameterized versions -- thus more or less negating the difference between the two. Some do this implicitly; in others it must be enabled with a command setting (SQL Server: http://msdn.microsoft.com/en-us/library/ms175037.aspx , Oracle: http://www.praetoriate.com/oracle_tips_cursor_sharing.htm).
Lessons learned?
Moore's law continues to march on and DBMS optimizers, with every release, get more sophisticated. Sure, you can place every single silly teeny SQL statement inside a stored proc, but just know that the programmers working on optimizers are very smart and are continually looking for ways to improve performance. Eventually (if it's not here already) ad hoc SQL performance will become indistinguishable (on average!) from stored procedure performance, so any sort of massive stored procedure use solely for "performance reasons" sure sounds like premature optimization to me.
Anyway, I think if you avoid the edge cases and have fairly vanilla SQL, you won't notice a difference between ad hoc and stored procedures.
Reasons for using stored procedures:
Reduce network traffic -- you have to send the SQL statement across the network. With sprocs, you can execute SQL in batches, which is also more efficient.
Caching query plan -- the first time the sproc is executed, SQL Server creates an execution plan, which is cached for reuse. This is particularly performant for small queries run frequently.
Ability to use output parameters -- if you send inline SQL that returns one row, you can only get back a recordset. With sprocs you can get them back as output parameters, which is considerably faster.
Permissions -- when you send inline SQL, you have to grant permissions on the table(s) to the user, which is granting much more access than merely granting permission to execute a sproc
Separation of logic -- remove the SQL-generating code and segregate it in the database.
Ability to edit without recompiling -- this can be controversial. You can edit the SQL in a sproc without having to recompile the application.
Find where a table is used -- with sprocs, if you want to find all SQL statements referencing a particular table, you can export the sproc code and search it. This is much easier than trying to find it in code.
Optimization -- It's easier for a DBA to optimize the SQL and tune the database when sprocs are used. It's easier to find missing indexes and such.
SQL injection attacks -- properly written inline SQL can defend against attacks, but sprocs are better for this protection.
In many cases, stored procedures are actually slower because they're more generalized. While stored procedures can be highly tuned, in my experience there's enough development and institutional friction that they're left in place once they work, so stored procedures often tend to return a lot of columns "just in case" - because you don't want to deploy a new stored procedure every time you change your application. An OR/M, on the other hand, only requests the columns the application is using, which cuts down on network traffic, unnecessary joins, etc.
It's a debate that rages on and on (for instance, here).
It's as easy to write bad stored procedures as it is to write bad data access logic in your app.
My preference is for Stored Procs, but that's because I'm typically working with very large and complex apps in an enterprise environment where there are dedicated DBAs who are responsible for keeping the database servers running sweetly.
In other situations, I'm happy enough for data access technologies such as LINQ to take care of the optimisation.
Pure performance isn't the only consideration, though. Aspects such as security and configuration management are typically at least as important.
Edit: While Frans Bouma's article is indeed verbose, it misses the point with regard to security by a mile. The fact that it's 5 years old doesn't help its relevance, either.
There is no noticeable speed difference for stored procedures vs parameterized or prepared queries on most modern databases, because the database will also cache execution plans for those queries.
Note that a parameterized query is not the same as ad hoc sql.
The main reason imo to still favor stored procedures today has more to do with security. If you use stored procedures exclusively, you can disable INSERT, SELECT, UPDATE, DELETE, ALTER, DROP, and CREATE etc permissions for your application's user, only leaving it with EXECUTE.
This provides a little extra protection against 2nd order sql injection. Parameterized queries only protect against 1st order injection.
Obviously, actual performance ought to be measured in individual cases, not assumed. But even in cases where performance is hampered by a stored procedure, there are good reasons to use them:
Application developers aren't always the best SQL coders. Stored procedures hide SQL from the application.
Stored procedures automatically use bind variables. Application developers often avoid bind variables because they seem like unneeded code and show little benefit in small test systems. Later on, the failure to use bind variables can throttle RDBMS performance.
Stored procedures create a layer of indirection that might be useful later on. It's possible to change implementation details (including table structure) on the database side without touching application code.
The exercise of creating stored procedures can be useful for documenting all database interactions for a system. And it's easier to update the documentation when things change.
That said, I usually stick raw SQL in my applications so that I can control it myself. It depends on your development team and philosophy.
The one topic that no one has yet mentioned as a benefit of stored procedures is security. If you build the application exclusively with data access via stored procedures, you can lock down the database so the ONLY access is via those stored procedures. Therefore, even if someone gets a database ID and password, they will be limited in what they can see or do against that database.
In 2007 I was on a project, where we used MS SQL Server via an ORM. We had 2 big, growing tables which took up to 7-8 seconds of load time on the SQL Server. After making 2 large, stored SQL procedures, and optimizing them from the query planner, each DB load time got down to less than 20 milliseconds, so clearly there are still efficiency reasons to use stored SQL procedures.
Having said that, we found out that the most important benefits of stored procedures were the added ease of maintenance, security, data integrity, and decoupling of business logic from middleware logic, with all middleware logic benefiting from reuse of the 2 procedures.
Our ORM vendor made the usual claim that firing off many small SQL queries was going to be more efficient than fetching large, joined data sets. Our experience (to our surprise) showed something else.
This may of course vary between machines, networks, operating systems, SQL servers, application frameworks, ORM frameworks, and language implementations, so measure any benefit you THINK you may get from doing something else.
It wasn't until we benchmarked that we discovered the problem was between the ORM and the database taking all the load.
I prefer to use SP's when it makes sense to use them. In SQL Server anyway there is no performance advantage to SP's over a parametrized query.
However, at my current job my boss mentioned that we are forced to use SP's because our customers require them. They feel that they are more secure. I have not been here long enough to see if we are implementing role based security but I have a feeling we do.
So the customer's feelings trump all other arguments in this case.
Read Frans Bouma's excellent post (if a bit biased) on that.
To me one advantage of stored procedures is to be host language agnostic: you can switch from a C, Python, PHP or whatever application to another programming language without rewriting your code. In addition, some features like bulk operations really improve performance and are not easily available (not at all?) in host languages.
I don't know that they are faster. I like using ORM for data access (to not re-invent the wheel) but I realize that's not always a viable option.
Frans Bouma has a good article on this subject : http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx
All I can speak to is SQL server. In that platform, stored procedures are lovely because the server stores the execution plan, which in most cases speeds up performance a good bit. I say "in most cases", because if the SP has widely varying paths of execution you might get suboptimal performance. However, even in those cases, some enlightened refactoring of the SPs can speed things up.
Using stored procedures for CRUD operations is probably overkill, but it will depend on the tools being used and your own preferences (or requirements). I prefer inline SQL, but I make sure to use parameterized queries to prevent SQL injection attacks. I keep a print out of this xkcd comic as a reminder of what can go wrong if you are not careful.
Stored procedures can have real performance benefits when you are working with multiple sets of data to return a single set of data. It's usually more efficient to process sets of data in the stored procedure than sending them over the wire to be processed at the client end.
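As a rough sketch of why processing sets next to the data can win, the two functions below compute the same per-customer totals. The orders table is a made-up example and sqlite3 is used only because it ships with Python; a GROUP BY stands in for whatever set processing a stored procedure would do. The point is how many rows have to cross the wire in each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data, for illustration only.
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 10), ("acme", 15), ("bolt", 7), ("acme", 5), ("bolt", 3)],
)

def totals_client_side():
    """Pull every row to the client, then aggregate in application code."""
    rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals, len(rows)  # second value: rows transferred

def totals_server_side():
    """Let the database aggregate and return only the summary rows."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    return dict(rows), len(rows)  # second value: rows transferred
```

Both return the same totals, but the client-side version transfers one row per order while the server-side version transfers one row per customer.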
Realising this is a bit off-topic to the question, but if you are using a lot of stored procedures, make sure there is a consistent way to put them under some sort of source control (e.g., subversion or git) and be able to migrate updates from your development system to the test system to the production system.
When this is done by hand, with no way to easily audit what code is where, this quickly becomes a nightmare.
Stored procs are great for cases where the SQL code is run frequently because the database stores it tokenized in memory. If you repeatedly run the same code outside of a stored proc, you will likely incur a performance hit from the database reparsing the same code over and over.
I typically write frequently called code as a stored proc or as a SqlCommand (.NET) object and execute it as many times as needed.
Yes, they are faster most of the time. SQL composition is a huge performance tuning area too. If I am doing a back office type app I may skip them, but for anything production facing I use them for sure, for all the reasons others spoke to... namely security.
IMHO...
Restricting "C_UD" operations to stored procedures can keep the data integrity logic in one place. This can also be done by restricting "C_UD" operations to a single middleware layer.
Read operations can be provided to the application so they can join only the tables / columns they need.
Stored procedures can also be used instead of parameterized queries (or ad-hoc queries) for some other advantages too :
If you need to correct something (a sort order etc.) you don't need to recompile your app
You could deny access to all tables for that user account, grant access only to stored procedures and route all access through stored procedures. This way you can have custom validation of all input much more flexible than table constraints.
Reduced network traffic -- SPs are generally worse than dynamic SQL. Because people don't create a new SP for every select, if you need just one column you are told to use the SP that has the columns you need and ignore the rest. Get one extra column and any savings in network usage just went away. Also, you tend to have a lot of client-side filtering when SPs are used.
caching -- MS-SQL does not treat them any differently, and has not since MS-SQL 2000 (it may have been 7, but I don't remember).
permissions -- Not a problem since almost everything I do is web or have some middle application tier that does all the database access. The only software I work with that have direct client to database access are 3rd party products that are designed for users to have direct access and are based around giving users permissions. And yes MS-SQL permission security model SUCKS!!! (have not spent time on 2008 yet) As a final part to this would like to see a survey of how many people are still doing direct client/server programming vs web and middle application server programming; and if they are doing large projects why no ORM.
Separation -- people would question why you are putting business logic outside of middle tier. Also if you are looking to separate data handling code there are ways of doing that without putting it in the database.
Ability to edit -- What, you have no testing and version control to worry about? Also, this is only a problem with client/server; in the web world it's not a problem.
Find the table -- Only if you can identify the SPs that use it. I will stick with the tools of the version control system, Agent Ransack, or Visual Studio to find it.
Optimization -- Your DBA should be using the tools of the database to find the queries that need optimization. The database can tell the DBA which statements are taking up the most time and resources, and they can fix things from there. For complex SQL statements the programmers should be told to talk to the DBA; for simple selects, don't worry about it.
SQL injection attacks -- SPs offer no better protection. The only reason they get the nod is that most SP examples teach using parameters, whereas most dynamic SQL examples ignore parameters.
Assuming you can't use LINQ for whatever reason, is it a better practice to place your queries in stored procedures, or is it just as good a practice to execute ad hoc queries against the database (say, SQL Server for argument's sake)?
In my experience writing mostly WinForms Client/Server apps these are the simple conclusions I've come to:
Use Stored Procedures:
For any complex data work. If you're going to be doing something truly requiring a cursor or temp tables it's usually fastest to do it within SQL Server.
When you need to lock down access to the data. If you don't give table access to users (or role or whatever) you can be sure that the only way to interact with the data is through the SP's you create.
Use ad-hoc queries:
For CRUD when you don't need to restrict data access (or are doing so in another manner).
For simple searches. Creating SP's for a bunch of search criteria is a pain and difficult to maintain. If you can generate a reasonably fast search query use that.
In most of my applications I've used both SP's and ad-hoc sql, though I find I'm using SP's less and less as they end up being code just like C#, only harder to version control, test, and maintain. I would recommend using ad-hoc sql unless you can find a specific reason not to.
I can't speak to anything other than SQL Server, but the performance argument is not significantly valid there unless you're on 6.5 or earlier. SQL Server has been caching ad-hoc execution plans for roughly a decade now.
I think this is a basic conflict between people who must maintain the database and people who develop the user interfaces.
As a data person, I would not consider working with a database that is accessed through ad hoc queries, because they are difficult to effectively tune or manage. How can I know what effect a change to the schema will have? Additionally, I do not think users should ever be granted direct access to the database tables, for security reasons. I do not just mean SQL injection attacks: it is also a basic internal control to not allow direct rights and to require all users to use only the procs designed for the app. This is to prevent possible fraud. Any financial system which allows direct insert, update or delete rights to tables has a huge risk of fraud. This is a bad thing.
Databases are not object-oriented, and code which seems good from an object-oriented perspective can be extremely bad from a database perspective.
Our developers tell us they are glad that all our database access is through procs because it makes it much faster to fix a data-centered bug and then simply run the proc on the production environment rather than create a new branch of the code and recompile and reload to production. We require all our procs to be in Subversion, so source control is not an issue at all. If it isn't in Subversion, it will periodically get dropped by the DBAs, so there is no resistance to using source control.
Stored procedures represent a software contract that encapsulates the actions taken against the database. The code in the procedures, and even the schema of the database itself can be changed without affecting compiled, deployed code, just so the inputs and outputs of the procedure remain the same.
By embedding queries in your application, you are tightly coupling yourself to your data model.
For the same reason, it is also not good practice to simply create stored procedures that are just CRUD queries against every table in your database, since this is still tight coupling. The procedures should instead be bulky, coarse grained operations.
From a security perspective, it is good practice to disallow db_datareader and db_datawriter from your application and only allow access to stored procedures.
Stored procedures are definitely the way to go... they are compiled, have an execution plan beforehand, and you can do rights management on them.
I do not understand this whole source control issue with stored procedures. You definitely can source control them, if only you are a little disciplined.
Always start with a .sql file that is the source of your stored procedure. Put it in version control once you have written your code. The next time you want to edit your stored procedure, get it from your source control rather than from your database. If you follow this, you will have source control as good as for your code.
I would like to quote Tom Kyte from Oracle here...Here's his rule on where to write code...though a bit unrelated but good to know I guess.
Start with stored procedures in PL/SQL...
If you think something can't be done using stored procedure in PL/SQL, use Java stored procedure.
If you think something can't be done using Java stored procedure, consider Pro*C.
If you think you can't achieve something using Pro*C, you might want to rethink what you need to get done.
My answer from a different post:
Stored Procedures are MORE maintainable because:
You don't have to recompile your C# app whenever you want to change some SQL
You end up reusing SQL code.
Code repetition is the worst thing you can do when you're trying to build a maintainable application!
What happens when you find a logic error that needs to be corrected in multiple places? You're more apt to forget to change that last spot where you copy & pasted your code.
In my opinion, the performance & security gains are an added plus. You can still write insecure/inefficient SQL stored procedures.
Easier to port to another DB - no procs to port
It's not very hard to script out all your stored procedures for creation in another DB. In fact - it's easier than exporting your tables because there are no primary/foreign keys to worry about.
In our application, there is a layer of code that provides the content of the query (and is sometimes a call to a stored procedure). This allows us to:
easily have all the queries under version control
to make what ever changes are required to each query for different database servers
eliminates repetition of the same query code through out our code
Access control is implemented in the middle layer, rather than in the database, so we don't need stored procedures there. This is in some ways a middle road between ad hoc queries and stored procs.
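A layer like this could look roughly like the sketch below. Everything in it is hypothetical (the catalog layout, the query name, the dialect keys, the orders table); it only illustrates the idea of application code asking for a query by name while the SQL text lives in one version-controlled place with per-database variants. Only the "sqlite" variant is executed here; the "mssql" string is shown purely to illustrate a dialect difference.

```python
import sqlite3

# Hypothetical catalog: logical query name -> SQL text per database dialect.
QUERIES = {
    "recent_orders": {
        "sqlite": "SELECT id, total FROM orders ORDER BY id DESC LIMIT ?",
        "mssql": "SELECT TOP (?) id, total FROM orders ORDER BY id DESC",
    },
}

def sql_for(name, dialect):
    """Look up the SQL text for a named query; raises KeyError if unknown."""
    return QUERIES[name][dialect]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

def run_named(name, params=(), dialect="sqlite"):
    """Application code refers to queries by name only, never by SQL text."""
    return conn.execute(sql_for(name, dialect), params).fetchall()
```

Swapping database servers then means adding or changing entries in the catalog, and an entry could just as well hold a stored procedure call as a plain SELECT.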
There are persuasive arguments for both - stored procedures are all located in a central repository, but are (potentially) hard to migrate and ad hoc queries are easier to debug as they are with your code, but they can also be harder to find in the code.
The argument that stored procedures are more efficient doesn't hold water anymore.
Doing a google for Stored Procedure vs Dynamic Query will show decent arguments either way and probably best for you to make your own decision...
Stored procedures should be used as much as possible; if you're writing SQL into code, you're already setting yourself up for headaches in the future. It takes about the same time to write a SPROC as it does to write it in code.
Consider a query that runs great under a medium load, but once it goes into full-time production your badly optimized query hammers the system and brings it to a crawl. In most SQL servers you are not the only application/service that is using it. Your application has now brought a bunch of angry people to your door.
If you have your queries in SPROCs you also allow your friendly DBA to manage and optimize without recompiling or breaking your app. Remember, DBAs are experts in this field; they know what to do and what not to do. It makes sense to utilise their greater knowledge!
EDIT: someone said that recompile is a lazy excuse! yeah lets see how lazy you feel when you have to recompile and deploy your app to 1000's of desktops, all because the DBA has told you that your ad-hoc Query is eating up too much Server time!
someone said that recompile is a lazy excuse! yeah lets see how lazy you feel when you have to recompile and deploy your app to 1000's of desktops, all because the DBA has told you that your ad-hoc Query is eating up too much Server time!
Is it good system architecture if you let 1000 desktops connect directly to the database?
Some things to think about here: Who Needs Stored Procedures, Anyways?
Clearly it's a matter of your own needs and preferences, but one very important thing to think about when using ad hoc queries in a public-facing environment is security. Always parameterize them and watch out for the typical vulnerabilities like SQL-injection attacks.
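To make the injection risk concrete, here is a small sketch comparing string-built SQL with a parameterized query. The users table and the hostile input are invented for the example, and sqlite3 is used only because it is in Python's standard library; the same contrast applies to other drivers that support bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

def find_unsafe(name):
    """DON'T: user input is spliced into the SQL text itself."""
    sql = "SELECT name FROM users WHERE name = '" + name + "'"
    return [n for (n,) in conn.execute(sql)]

def find_safe(name):
    """DO: the input is bound as a value and never parsed as SQL."""
    sql = "SELECT name FROM users WHERE name = ?"
    return [n for (n,) in conn.execute(sql, (name,))]

# Input crafted to turn the WHERE clause into: name = 'x' OR '1'='1'
hostile = "x' OR '1'='1"
```

With the hostile input, find_unsafe returns every user because the quote characters change the structure of the statement, while find_safe returns nothing because no user has that literal name.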
Stored Procedures are great because they can be changed without a recompile. I would try to use them as often as possible.
I only use ad-hoc for queries that are dynamically generated based on user input.
Procs for the reasons mentioned by others and also it is easier to tune a proc with profiler or parts of a proc. This way you don't have to tell someone to run his app to find out what is being sent to SQL server
If you do use ad-hoc queries make sure that they are parameterized
Parameterized SQL or SPROC... it doesn't matter from a performance standpoint... you can query optimize either one.
For me the last remaining benefit of a SPROC is that I can eliminate a lot of SQL rights management by only granting my login rights to execute sprocs... if you use parameterized SQL, the login within your connection string has a lot more rights (writing ANY kind of select statement on one of the tables they have access to, for example).
I still prefer parameterized SQL though...
I haven't found any compelling argument for using ad-hoc queries. Especially those mixed up with your C#/Java/PHP code.
The sproc performance argument is moot - the 3 top RDBMSs use query plan caching and have for a while. It's been documented... Or is it still 1995?
However, embedding SQL in your app is a terrible design too - code maintenance seems to be a missing concept for many.
If an application can start from scratch with an ORM (greenfield applications are few and far between!) it's a great choice, as your class model drives your DB model - and saves LOTS of time.
If an ORM framework is not available, we have taken a hybrid approach of creating an SQL resource XML file to look up SQL strings as we need them (they are then cached by the resource framework). If the SQL needs any minor manipulation it's done in code - if major SQL string manipulation is needed we rethink the approach.
This hybrid approach lends itself to easy management by the developers (maybe we are the minority, as my team is bright enough to read a query plan) and deployment is a simple checkout from SVN. Also, it makes switching RDBMSs easier - just swap out the SQL resource file (not as easy as an ORM tool of course, but for connecting to legacy systems or non-supported databases this works).
Depends what your goal is. If you want to retrieve a list of items and it happens once during your application's entire run for example, it's probably not worth the effort of using a stored procedure. On the other hand, a query that runs repeatedly and takes a (relatively) long time to execute is an excellent candidate for database storage, since the performance will be better.
If your application lives almost entirely within the database, stored procedures are a no-brainer. If you're writing a desktop application to which the database is only tangentially important, ad-hoc queries may be a better option, as it keeps all of your code in one place.
@Terrapin: I think your assertion that the fact that you don't have to recompile your app to make modifications makes stored procedures a better option is a non-starter. There may be reasons to choose stored procedures over ad-hoc queries, but in the absence of anything else compelling, the compile issue seems like laziness rather than a real reason.
My experience is that 90% of queries and/or stored procedures should not be written at all (at least by hand).
Data access should be generated automatically somehow. You can decide whether you'd like to generate procedures statically at compile time or dynamically at run time, but when you want to add a column to the table (a property to the object) you should have to modify only one file.
I prefer keeping all data access logic in the program code, in which the data access layer executes straight SQL queries. On the other hand, data management logic I put in the database in the form of triggers, stored procedures, custom functions and whatnot. An example of something I deem worthy of database-ifying is data generation - assume our customer has a FirstName and a LastName. Now, the user interface needs a DisplayName, which is derived from some nontrivial logic. For this generation, I create a stored procedure which is then executed by a trigger whenever the row (or other source data) is updated.
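The DisplayName idea can be sketched with database triggers. This is only an approximation of what is described above: SQLite (used here through Python's sqlite3 module so the example is runnable) has no stored procedures, so the derivation lives directly in the trigger bodies, and the "LastName, FirstName" rule is a placeholder assumption for the nontrivial logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        FirstName TEXT NOT NULL,
        LastName TEXT NOT NULL,
        DisplayName TEXT
    );
    -- Placeholder rule: the real derivation would be more involved.
    CREATE TRIGGER customer_display_ins AFTER INSERT ON customer
    BEGIN
        UPDATE customer
        SET DisplayName = NEW.LastName || ', ' || NEW.FirstName
        WHERE id = NEW.id;
    END;
    CREATE TRIGGER customer_display_upd
    AFTER UPDATE OF FirstName, LastName ON customer
    BEGIN
        UPDATE customer
        SET DisplayName = NEW.LastName || ', ' || NEW.FirstName
        WHERE id = NEW.id;
    END;
""")

def add_customer(first, last):
    """Insert a customer; the application never sets DisplayName itself."""
    cur = conn.execute(
        "INSERT INTO customer (FirstName, LastName) VALUES (?, ?)", (first, last)
    )
    return cur.lastrowid

def display_name(customer_id):
    """Read back the value maintained by the triggers."""
    row = conn.execute(
        "SELECT DisplayName FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0]
```

Every client that writes to the table gets the same derived value, whichever application or language performed the write, which is the kind of data management logic the answer argues belongs in the database.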
There appears to be this somewhat common misunderstanding that the data access layer IS the database and everything about data and data access goes in there "just because". This is simply wrong but I see a lot of designs which derive from this idea. Perhaps this is a local phenomenon, though.
I may just be turned off the idea of SPs after seeing so many badly designed ones. For example, one project I participated in used a set of CRUD stored procedures for every table and every possible query they encountered. In doing so they simply added another completely pointless layer. It is painful to even think about such things.
These days I hardly ever use stored procedures. I only use them for complicated sql queries that can't easily be done in code.
One of the main reasons is because stored procedures do not work as well with OR mappers.
These days I think you need a very good reason to write a business application / information system that does not use some sort of OR mapper.
A stored procedure works as a block of code, so it runs faster than the equivalent ad hoc query.
Another thing is that stored procedures give you a recompile option, which is the best part of SQL; you can use it only with stored procedures, and there is nothing like it for ad hoc queries.
In my personal experience, some results differ between an ad hoc query and a stored procedure. Use the CAST and CONVERT functions to check this.
You must use stored procedures for big projects to improve performance.
I had 420 procedures in my project and it works fine for me. I have worked on this project for the last 3 years.
So use only procedures for any transaction.
Is it good system architecture if you let 1000 desktops connect directly to the database?
No, it's obviously not. It's maybe a poor example, but I think the point I was trying to make is clear: your DBA looks after your database infrastructure, and this is where their expertise is. Stuffing SQL in code locks the door to them and their expertise.