I'm a software developer, not a T-SQL or DBA expert; that's just for background. One of my applications uses a lot of SQL views for reporting purposes. At this stage (it might change) the Windows application executes the view and I display the data in a grid/table. The views are becoming more and more complex and slower; that's one problem. I'm also in the process of redesigning the application to use a web front end for reporting. My question is: what is the best approach for reports in terms of SQL? Should my reports be based on stored procedures or views? Any other comments or advice on SQL reporting are welcome. As I mentioned, I'm a software developer and I try to steer clear of SQL work, but this has become an issue and I thought this would be a good time to sharpen my SQL knowledge.
Thank you for reading.
Stored procedures (SPs) are a better choice than views, but views are much better than SQL queries embedded in reports. I know you didn't mention embedded SQL but I'm going to discuss it briefly to give a more rounded answer.
When you embed a SQL query in a report (or an application or anything outside of the database) you are assuming that all of the objects referenced are not going to change in any way. This is firstly a big assumption (and assumptions are bad) and secondly a crippling restriction on the database owner - they can't change anything because it might break something somewhere.
When you use an SP or a view to access a database, you make the reasonable assumption that the name of the object you are calling (the SP or view) won't change and that any parameter set will remain constant, or at least stay compatible. Both approaches hide the logic of the query from the caller: the logic can be corrected and improved over time without affecting the caller. The entire database can be refactored or even redesigned, and as long as the name of the exposed object (and any parameters) remains the same, the caller will never know.
The advantage of using an SP over a view is that you can do far more. For example, it's a good idea to validate that parameter values are within expected ranges. If you have a particularly complex query, you can break it down into smaller steps, using temp tables for example. For very heavy queries you could even do interim maintenance steps in the SP, such as updating statistics.
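To make that concrete, here is a minimal sketch of the kind of reporting SP I mean; all table, column, and procedure names below are invented for illustration:

    -- Minimal sketch of a reporting SP (all object names are hypothetical).
    CREATE PROCEDURE dbo.rpt_SalesSummary
        @StartDate datetime,
        @EndDate   datetime
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Validate parameters before doing any work.
        IF @StartDate IS NULL OR @EndDate IS NULL OR @StartDate > @EndDate
        BEGIN
            RAISERROR('Invalid date range.', 16, 1);
            RETURN;
        END;

        -- Break a complex query into steps with a temp table.
        SELECT o.CustomerID, SUM(o.Total) AS OrderTotal
        INTO #totals
        FROM dbo.Orders AS o
        WHERE o.OrderDate >= @StartDate
          AND o.OrderDate <  DATEADD(DAY, 1, @EndDate)
        GROUP BY o.CustomerID;

        SELECT c.Name, t.OrderTotal
        FROM #totals AS t
        JOIN dbo.Customers AS c ON c.CustomerID = t.CustomerID
        ORDER BY t.OrderTotal DESC;
    END;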
I would recommend using SPs for all database access. You might not need to now, but it will give you much more scope to change things in the future if you need to.
I'm writing many reporting queries for my current employer using Oracle's WITH clause, which lets me build a complex task out of simple steps, each a data-oriented transformation that builds on the previous ones.
It was brought to my attention today that overuse of the WITH clause could have negative side effects on the Oracle server's resources.
Can anyone explain why overuse of the Oracle WITH clause may cause a server to crash? Or point me to some articles where I can research appropriate use cases? I started using the WITH clause heavily to add structure to my code and make it easier to understand. I hope that with some informative responses here I can continue to use it efficiently.
If an example query would be helpful I'll try to post one later today.
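In the meantime, here is a simplified example of the style I mean (all table and column names are made up):

    -- Simplified illustration only; table and column names are invented.
    WITH raw_orders AS (
        SELECT customer_id, order_date, total
        FROM   orders
        WHERE  order_date >= DATE '2012-01-01'
    ),
    monthly_totals AS (
        SELECT customer_id,
               TRUNC(order_date, 'MM') AS order_month,
               SUM(total)              AS month_total
        FROM   raw_orders
        GROUP  BY customer_id, TRUNC(order_date, 'MM')
    )
    SELECT m.customer_id, m.order_month, m.month_total
    FROM   monthly_totals m
    ORDER  BY m.customer_id, m.order_month;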
Thanks!
Based on this: http://www.dba-oracle.com/t_with_clause.htm it looks like this is a way to avoid using temporary tables. However, as others will note, this may actually mean heavier, more expensive queries that will put an additional drain on the database server.
It may not 'crash'; that's a bit dramatic. More likely it will just be slower, use more memory, and so on. How much that affects your company will depend on the amount of data, the number of processors, and the amount of processing (with or without the WITH clause).
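If a particular factored subquery does turn out to be the expensive part, Oracle also has the /*+ MATERIALIZE */ and /*+ INLINE */ hints to steer whether it is written to temp space or expanded in place. Note that these hints are undocumented, so verify the behavior on your own version before relying on them:

    -- /*+ INLINE */ asks Oracle to expand the factored subquery in place
    -- instead of materializing it to temp space. Undocumented hint; verify
    -- on your own version. Table names are invented.
    WITH big_step AS (
        SELECT /*+ INLINE */ customer_id, SUM(total) AS t
        FROM   orders
        GROUP  BY customer_id
    )
    SELECT * FROM big_step;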
In our report generation application, there are some pretty hefty queries that take a considerable amount of time to run. User feedback up to this point has been basically zip while the server chugs away at the request. I noticed that there's a tab on the ADA Management Utility that shows progress on the query, both as percent complete and as estimated seconds remaining. I tried digging through the tables to see if I could find any of this information exposed, as well as picking through the limited documentation available for ADBS, and couldn't find anything useful.
Does anyone know if there's a way I can pull this information outside of ADA to provide some needed user feedback?
ADA is getting that information from the sp_GetSQLStatements system procedure.
However, the traditional way of providing progress information for any operation is through a callback function.
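Calling that procedure from your own client is straightforward, though the exact columns in its result set (statement text, percent complete, and so on) vary by server version, so treat this as a sketch to verify:

    -- Advantage SQL; the result set's exact columns depend on the
    -- server version, so check them on your installation.
    EXECUTE PROCEDURE sp_GetSQLStatements();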
This isn't an answer to the question but might be useful in helping reduce the time it takes to run the queries in the report. You may have already done this and made it as optimized as it gets. But if not, you might look at the query plan within Advantage Data Architect to check for optimization issues. In the query window where you run a query, you can choose Show Plan from the SQL menu (or click the button in the toolbar). This will display the execution plan with optimization information that might help identify missing indexes.
Another tool that might be helpful in identifying unoptimized queries is query logging.
This is going to be both a direct question and a discussion point. I'll ask the direct question first:
Can a stored procedure create another stored procedure dynamically? (Personally I'm interested in SQL Server 2008, but in the interests of wider discussion will leave it open)
Now to the reason I'm asking. Briefly (you can read more elsewhere), User Defined Scalar Functions in SQL Server are performance bottlenecks, at best. I've seen uses in our code base that slow the total query down by 3-4x, but from what I've read the local impact of the S-UDF can be 10x+
However, UDFs are, potentially, great for raising abstraction levels, reducing a lot of tedious boilerplate, centralising logic rules, etc. In most cases they boil down to simple expressions that could easily be expanded inline - but they're not (I'm really only thinking of non-querying functions, e.g. string manipulations). I've seen a bug report asking for this to be addressed in a future release - with some buy-in from MS. But for now we have to live with the (IMHO) broken implementation.
One workaround is to use a table-valued UDF instead - however these complicate the client code in ways you don't always want to deal with (especially when the UDF just computes the result of an expression). A sketch of what I mean follows.
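To illustrate (all names invented): a scalar UDF, its inline table-valued equivalent, and the noisier call site the latter forces on you:

    -- Scalar UDF: a simple expression, but costly when applied per row.
    CREATE FUNCTION dbo.CleanName (@s nvarchar(100))
    RETURNS nvarchar(100)
    WITH SCHEMABINDING
    AS
    BEGIN
        RETURN LTRIM(RTRIM(UPPER(@s)));
    END;
    GO

    -- Inline table-valued equivalent, which the optimizer can expand.
    CREATE FUNCTION dbo.CleanName_TVF (@s nvarchar(100))
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN (SELECT LTRIM(RTRIM(UPPER(@s))) AS CleanedName);
    GO

    -- But the call site gets noisier:
    SELECT c.CustomerID, f.CleanedName
    FROM dbo.Customers AS c
    CROSS APPLY dbo.CleanName_TVF(c.Name) AS f;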
So my crazy idea, at first, was to write the procs with C Preprocessor directives, then pass it through a preprocessor before submitting to the RDBMS. This could work, but has its own problems.
That led me to my next crazy idea, which was to define the "macros" in the DB itself, and have a master proc that accepts a string containing an unprocessed SP with macros, expands the macros inline, then submits it on to the RDBMS. This is not what SPs are good at, but I think it could work - assuming you can do this in the first place - hence my original question. A sketch of the mechanism I have in mind is below.
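For the sake of discussion, the core mechanism would look something like this (names invented, and the "macro expansion" is just a stand-in REPLACE). Since CREATE PROCEDURE must be the first statement in its batch, the expanded text has to go through dynamic SQL rather than being executed directly:

    CREATE PROCEDURE dbo.DeployExpandedProc
        @ProcSource nvarchar(max)   -- SP source text containing "macros"
    AS
    BEGIN
        DECLARE @Expanded nvarchar(max) = @ProcSource;

        -- Stand-in for real macro expansion: inline one "macro" token
        -- as the expression it represents.
        SET @Expanded = REPLACE(@Expanded,
                                '$CLEAN_NAME(x)$',
                                'LTRIM(RTRIM(UPPER(x)))');

        -- Submits the expanded CREATE PROCEDURE text as its own batch.
        EXEC sys.sp_executesql @Expanded;
    END;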
However, now I have explained my path to the question, I'd also like to leave it open for other ideas. I'm sure I'm not the only one who has been thinking along these lines. Perhaps there are third-party solutions already out there? My googling has not turned up much yet.
Also I thought it would be a fun discussion topic.
[edit]
This blog post I found in my research describes the same issue I'm seeing. I'd be happy if anyone could point out something that I, or the blog poster, might be doing wrong that leads to the overhead.
I should also add that I am using WITH SCHEMABINDING on my S-UDF, although it doesn't seem to be giving me any advantage.
Your string-processing UDF won't be a perf problem. Scalar UDFs are a problem only when they perform selects, and those selects are done for every row; this in turn spikes the I/O.
String manipulation, on the other hand, is done in memory and is fast.
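For example (names made up for illustration), this is the pattern that hurts:

    -- The problem case: a scalar UDF that queries a table, so the select
    -- inside runs once for every outer row.
    CREATE FUNCTION dbo.GetCustomerOrderCount (@CustomerID int)
    RETURNS int
    AS
    BEGIN
        DECLARE @n int;
        SELECT @n = COUNT(*) FROM dbo.Orders WHERE CustomerID = @CustomerID;
        RETURN @n;
    END;
    GO

    -- Called like this, the lookup executes once per customer row:
    SELECT c.CustomerID, dbo.GetCustomerOrderCount(c.CustomerID) AS OrderCount
    FROM dbo.Customers AS c;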
As for your idea, I can't really see any benefit to it. Creating and dropping objects like that can be an expensive operation and may lead to schema locking.
One of my co-workers claims that even though the execution path is cached, there is no way parameterized SQL generated from an ORM is as quick as a stored procedure. Any help with this stubborn developer?
I would start by reading this article:
http://decipherinfosys.wordpress.com/2007/03/27/using-stored-procedures-vs-dynamic-sql-generated-by-orm/
Here is a speed test between the two:
http://www.blackwasp.co.uk/SpeedTestSqlSproc.aspx
Round 1 - You can start a profiler trace and compare the execution times.
For most people, the best way to convince them is to "show them the proof." In this case, I would create a couple of basic test cases that retrieve the same set of data, and then time how long each takes using stored procedures versus NHibernate. Once you have the results, hand them over; most skeptical people should yield to the evidence.
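For the stored procedure side, server-side timings are easy to capture (the procedure name and parameter here are hypothetical); time the NHibernate side with a stopwatch in the application code:

    -- Reports CPU and elapsed time for the proc half of the comparison
    -- in the Messages output. Procedure name and parameter are invented.
    SET STATISTICS TIME ON;
    EXEC dbo.GetCustomerOrders @CustomerID = 42;
    SET STATISTICS TIME OFF;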
I would only add a couple things to Rob's answer:
First, make sure the amount of data involved in the test cases is similar to production values. In other words, if your queries normally run against tables with hundreds of thousands of rows, then create such a test environment.
Second, make everything else equal except for the use of an NHibernate-generated query versus a stored procedure call. Hopefully you can execute the test by simply swapping out a provider.
Finally, realize that there is usually a lot more at stake than just stored procedures vs. ORM. With that in mind the test should look at all of the factors: execution time, memory consumption, scalability, debugging ability, etc.
The problem here is that you've accepted the burden of proof. You're unlikely to change someone's mind like that. Like it or not, people, even programmers, are just too emotional to be easily swayed by logic. You need to put the burden of proof back on him: get him to convince you otherwise, and that will force him to do the research and discover the answer for himself.
A better argument for stored procedures is security. If you use only stored procedures, with no dynamic SQL, you can disable SELECT, INSERT, UPDATE, DELETE, ALTER, and CREATE permissions for the application database user. This will protect you against most second-order SQL injection, whereas parameterized queries are only effective against first-order injection.
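As a sketch (the schema and user names are hypothetical), the lockdown looks like this:

    -- With all access going through procs, the app login needs only EXECUTE.
    DENY  SELECT, INSERT, UPDATE, DELETE ON SCHEMA::dbo TO app_user;
    GRANT EXECUTE ON SCHEMA::dbo TO app_user;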
Measure it, but in a non-micro-benchmark, i.e. something that represents real operations in your system. Even if there would be a tiny performance benefit for a stored procedure it will be insignificant against the other costs your code is incurring: actually retrieving data, converting it, displaying it, etc. Not to mention that using stored procedures amounts to spreading your logic out over your app and your database with no significant version control, unit tests or refactoring support in the latter.
Benchmark it yourself. Write a testbed class that executes a sampled stored procedure a few hundred times, and run the NHibernate code the same number of times. Compare the average and median execution times of each method.
It is just as fast if the query is the same each time. SQL Server 2005 caches query plans at the level of each statement in a batch, regardless of where the SQL comes from.
The long-term difference might be that stored procedures are many, many times easier for a DBA to manage and tune, whereas hundreds of different queries that have to be gleaned from profiler traces are a nightmare.
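You can verify the caching yourself in the plan cache, where both procedure statements and parameterized ad hoc statements show up with reuse counts:

    -- Shows cached plans and how often each has been reused, for both
    -- stored procedures and parameterized ad hoc SQL.
    SELECT cp.objtype, cp.usecounts, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    ORDER BY cp.usecounts DESC;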
I've had this argument many times over.
Almost always I end up grabbing a really good DBA, running a proc and a piece of code with the profiler running, and getting the DBA to show that the results are so close it's negligible.
Measure it.
Really, any discussion on this topic is probably futile until you've measured it.
He may be correct for the specific use case he is thinking of. A stored procedure will probably execute faster for some complex piece of SQL that can be arbitrarily tuned. However, something you get from tools like Hibernate is caching, which may prove much faster over the lifetime of your actual application.
The additional layer of abstraction will cause it to be slower than a pure call to a sproc. Simply because you have additional allocations on the managed heap and additional pushes and pops on the call stack, it is more efficient to call a sproc than to have an ORM build the query, regardless of how good the ORM is.
How much slower, if it's even measurable, is debatable. This is also helped by the fact that most ORMs have a caching mechanism to avoid executing the query at all.
Even if the stored procedure is 10% faster (it probably isn't), you may want to ask yourself how much it really matters. What really matters in the end, is how easy it is to write and maintain code for your system. If you are coding a web app, and your pages all return in 0.25 seconds, then the extra time saved by using stored procedures is negligible. However, there can be many added advantages of using an ORM like NHibernate, which would be extremely hard to duplicate using only stored procedures.