Concurrent queries in T-SQL

I have a long stored procedure that does a lot of querying and generates a report. Since it's a summary report, it calls a lot of other procs to get data. I am wondering if it's possible to execute concurrent SQL batches from within a proc in SQL Server ...
Many thanks

No, SQL Server does not do concurrency in the sense I think you mean.
How long does the code run for? Is it a problem?
Edit, based on comment.
11-20 seconds for a big summary report isn't bad at all.
If you submit the calls in parallel from the client, it may take the same time or longer: if each query is fairly intense and resource-heavy, you may max out the server by running them together and affect other processes. And then you still have to assemble the data into its final form on the client.

Your best bet is to report stuff like this from another database: either based on transformations from your production database into an OLAP database (which will make the time delay go away), or at least a periodic (say, nightly) static snapshot (which will make the delay not matter - you can turn off locking because nothing will change).
Side benefit: Report readers will be a lot happier with reports run five minutes apart that give the same answers. Or 12 hours apart.
The benefit you will appreciate the most is that life will be simpler.
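To make the nightly-snapshot idea concrete, here is a minimal sketch of the kind of batch that could refresh a reporting copy once a day, scheduled however you normally schedule jobs (SQL Agent, for example). The database and table names (Reporting, Production, ProjectSummarySnapshot, Task) are invented for illustration.

    -- Hypothetical nightly refresh of a static reporting snapshot.
    -- Nothing touches this table during the day, so report readers
    -- always see consistent numbers and no locking is needed.
    TRUNCATE TABLE Reporting.dbo.ProjectSummarySnapshot;

    INSERT INTO Reporting.dbo.ProjectSummarySnapshot (ProjectId, TaskCount, SnapshotDate)
    SELECT ProjectId, COUNT(*), GETDATE()
    FROM   Production.dbo.Task
    GROUP  BY ProjectId;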

I think you'll have to do the concurrency in client code (multithreaded SQL calls) and concatenate the results at the end.

Related

powershell multi-threaded script for sql. Where to start?

So right now we have a sub script that basically iterates through a collection of SQL DBs and runs a given upgrade script against each DB. This works fine, but sometimes our scripts can be a bit intensive (adding columns, populating new columns, changing column lengths, etc.) and end up taking forever. The bottleneck is that it applies the same script to one DB at a time.
Is there a way to multithread the script so it applies the script to every DB at the same time? All answers or links to blogs/documentation are welcome!
Here's another take on it, FWIW:
Invoke-ScriptAsynch
There is a good article on the subject of PowerShell multithreading at http://www.get-blog.com/?p=189
There is a lot of example code there for maximum multithreading pleasure.
Edit:
However, I would like to add a small warning about running major DDL queries on many databases on the same production server at the same time.
Depending on how much data there is, what recovery models you have, and things like that, you might send the server into free fall and end up in really crazy situations where half the databases are unavailable and refuse to respond. So start slow with a few of the databases at a time, and test that first.
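If you would rather stay inside SQL Server than manage PowerShell runspaces, one alternative is to fire off one SQL Agent job per database; each job runs asynchronously, so the upgrades proceed in parallel. This is only a sketch under stated assumptions: the @upgradeSql text, the Upgrade_ naming convention, and the database-name filter are made up, and it assumes SQL Agent is running and you have rights in msdb.

    -- Sketch: queue the same upgrade script as one self-deleting SQL Agent job
    -- per database, so the upgrades run in parallel instead of one at a time.
    DECLARE @db         sysname,
            @jobName    sysname,
            @jobId      uniqueidentifier,
            @upgradeSql nvarchar(max);

    SET @upgradeSql = N'/* your upgrade script here */';

    DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
        SELECT name FROM sys.databases WHERE name LIKE N'Customer%';  -- hypothetical filter

    OPEN dbs;
    FETCH NEXT FROM dbs INTO @db;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @jobName = N'Upgrade_' + @db;
        SET @jobId   = NULL;

        EXEC msdb.dbo.sp_add_job       @job_name = @jobName,
                                       @delete_level = 3,      -- delete the job when it finishes
                                       @job_id = @jobId OUTPUT;
        EXEC msdb.dbo.sp_add_jobstep   @job_id = @jobId,
                                       @step_name = N'Run upgrade',
                                       @subsystem = N'TSQL',
                                       @database_name = @db,
                                       @command = @upgradeSql;
        EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId;        -- local server
        EXEC msdb.dbo.sp_start_job     @job_id = @jobId;        -- returns immediately

        FETCH NEXT FROM dbs INTO @db;
    END
    CLOSE dbs;
    DEALLOCATE dbs;

The warning above still applies: starting all of these at once can saturate the server, so consider throttling how many you queue per pass.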

Does the size of a stored procedure affect its execution performance?

Does the size of a stored procedure affect its execution performance?
Is it better, with regard to performance, to have one large SP that does all the processing, or to split it into multiple SPs?
Let me paraphrase: "Does the size of my function affect its execution performance?"
The obvious answer is: No. The function will run as fast as it possibly can on the hardware it happens to run on. (To be clear: a longer function will take longer to execute, of course. But it will not run slower. Therefore, the performance is unaffected.)
The right question is: "Does the way I write my function affect its execution performance?"
The answer is: Yes, by all means.
If you are in fact asking the second question, you should add a code sample and be more specific.
No, not really - or not much, anyway. The stored proc is precompiled on the server, and it's not being sent back and forth between server and client, so its size is really not all that relevant.
It's more important to have it set up in a maintainable and easy to read way.
Marc
Not sure what you mean by the SIZE of a stored procedure (lines of code? complexity? number of tables? number of joins?), but execution performance depends entirely upon the execution plan of the SQL defined and compiled within the stored procedure. This can be monitored quite well through SQL Profiler if you are using SQL Server. Performance is most heavily taxed by things like joins and table scans, and a good tool can help you figure out where to place your indexes, or think of better ways to define the SQL. Hope this helps.
You could possibly cause worse performance by coding multiple stored procedures, if the execution plans need to be flushed to reclaim local memory and a single procedure would not.
We have hit situations where a flushed stored procedure is needed again and must be recompiled. When querying a view accessing hundreds of partition tables, this can be costly and has caused timeouts in our production system. Combining eight procedures into two solved this problem.
On the other hand, we had one stored procedure that was so complex that breaking it up into multiples allowed the execution plans for the individual chunks to be simpler, and it performed better.
The other answers that are basically "it depends" are dead on. No matter how fast a DB you have, a bad query can bring it to its knees. And each situation is unique. In most places, coding in a modular and easily understandable way performs better and is cheaper to maintain. SQL Server has to "understand" it too, as it builds the query plans.
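One way to see plan caching and recompilation, rather than argue about it, is the procedure-level execution statistics DMV (SQL Server 2008 and later). This is only a sketch; it assumes you have VIEW SERVER STATE permission.

    -- When a proc's plan is flushed and recompiled, cached_time resets and
    -- execution_count starts over, so churn in the plan cache is visible here.
    SELECT OBJECT_NAME(ps.object_id, ps.database_id)    AS proc_name,
           ps.cached_time,
           ps.execution_count,
           ps.total_worker_time   / ps.execution_count  AS avg_cpu_microseconds,
           ps.total_logical_reads / ps.execution_count  AS avg_logical_reads
    FROM   sys.dm_exec_procedure_stats AS ps
    ORDER  BY avg_logical_reads DESC;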

Refactoring "extreme" SQL queries

I have a business user who tried his hand at writing his own SQL query for a report of project statistics (e.g. number of tasks, milestones, etc.). The query starts off declaring a temp table of 80+ columns. There are then almost 70 UPDATE statements to the temp table over almost 500 lines of code that each contain their own little set of business rules. It finishes with a SELECT * from the temp table.
Due to time constraints and 'other factors', this was rushed into production and now my team is stuck with supporting it. Performance is appalling, although thanks to some tidy up it's fairly easy to read and understand (although the code smell is nasty).
What are some key areas we should be looking at to make this faster and follow good practice?
First off, if this is not causing a business problem, leave it alone until it becomes one. Then fix everything.
When you do decide to fix it, check whether there is one statement causing most of your speed issues ... isolate and fix it.
If the speed issue is over all the statements, and you can combine it all into a single SELECT, this will probably save you time. I once converted a proc like this (not as many updates) to a SELECT and the time to run it went from over 3 minutes to under 3 seconds (no shit ... I couldn't believe it). By the way, don't attempt this if some of the data is coming from a linked server.
If you don't want to or can't do that for whatever reason, then you might want to adjust the existing proc. Here are some of the things I would look at:
If you are creating indexes on the temp table, wait until after your initial INSERT to populate it.
Adjust your initial INSERT to insert as many of the columns as possible. There are probably some updates you can eliminate by doing this.
Index the temp table before running your updates. Do not create indexes on any of the columns targeted by the update statements until after they are updated.
Group your updates if your table(s) and groupings allow for it. 70 updates is quite a few for only 80 columns, and it sounds like there may be an opportunity to do this (see the sketch after this answer).
Good luck
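As a rough illustration of grouping several single-purpose updates into one, here is a hedged sketch; the temp table, source table, and column names (#ProjectStats, dbo.Task, TaskCount, and so on) are invented, since the original proc isn't shown.

    -- Three hypothetical "one rule per UPDATE" statements collapsed into one,
    -- using an aggregating derived table and CASE expressions.
    UPDATE t
    SET    TaskCount      = src.TaskCount,
           MilestoneCount = src.MilestoneCount,
           Status         = CASE WHEN src.OpenTasks = 0 THEN 'Complete' ELSE 'Active' END
    FROM   #ProjectStats AS t
    JOIN  (SELECT ProjectId,
                  COUNT(*)                                            AS TaskCount,
                  SUM(CASE WHEN IsMilestone = 1 THEN 1 ELSE 0 END)    AS MilestoneCount,
                  SUM(CASE WHEN ClosedDate IS NULL THEN 1 ELSE 0 END) AS OpenTasks
           FROM   dbo.Task
           GROUP  BY ProjectId) AS src
           ON src.ProjectId = t.ProjectId;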
First thing I would do is check to make sure there is an active index maintenance job being run periodically. If not, get all existing indexes rebuilt or, if that's not possible, at least get statistics updated.
Second thing I would do is set up a trace (as described here) and find out which statements are causing the highest number of reads.
Then I would run in SSMS with 'show actual execution plan' and tally the results with the trace. From this you should be able to work out whether there are missing indexes that could improve performance.
EDIT: If you are going to downvote, please leave a comment as to why.
Just like any refactoring, make sure you have an automated way to verify your refactorings after each change (you can write this yourself using queries which check the development output against a known good baseline). That way, you are always matching the known good data. This will give you a high degree of confidence in the correctness of your approach when you enter the phase where you are deciding whether to switch over to your new version of the process and want to run side by side for a few iterations to ensure correctness.
I also like to log all the test batches and the run times of the processes within the batch, so I can tell if some particular process within the batch was adversely affected at some point in time. I can get average times for processes and see trends of improvement or spot potential problems. This also lets me identify the low-hanging fruit within the batch where I can make the most improvement.
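A cheap way to build that automated check, assuming the report output is captured into tables (the names ReportBaseline and ReportNewVersion below are hypothetical), is a pair of EXCEPT queries: if both return zero rows, the two versions agree.

    -- Rows the baseline produces that the new version no longer produces
    SELECT * FROM dbo.ReportBaseline
    EXCEPT
    SELECT * FROM dbo.ReportNewVersion;

    -- Rows the new version produces that were not in the baseline
    SELECT * FROM dbo.ReportNewVersion
    EXCEPT
    SELECT * FROM dbo.ReportBaseline;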
"There are then almost 70 UPDATE statements to the temp table over almost 500 lines of code that each contain their own little set of business rules. It finishes with a SELECT * from the temp table."
Actually this sounds like it can be followed and understood quite well; each update statement does one thing to the table with a specific purpose and set of business rules. I think that maintaining procedures of 500 lines of code with one or a couple of select statements that do "everything", built with 15 or so joins and case statements scattered all over the place, is a lot harder to maintain. Although it would make for better performance.
It's a bit of a dilemma with SQL, that writing clear and concise code (using multiple updates, creating functions etc) always seems to have a big negative impact on performance. Trying to do everything at once, which is considered bad practice in other programming languages, seems to be the very core of set oriented languages.
If this is a report-generating stored procedure, how often is it being run? If it only needs to run once a day and runs during the night, how much of an issue is the performance?
If it's not, I'd recommend being careful in your choice to rewrite it, because there is a chance that you could muck up your figures.
Also it sounds like the sort of thing that should be pulled out into an SSIS package building up a new permanent table with the results so it only has to be run once.
Hope this makes sense
One thing you could try is to replace the temp table with a table variable. There are times when this is faster and times when it is not; you will just have to try it and see.
Look at the 70 update statements. Is it possible to combine any of them? If the person writing it did not use CASE statements, it might be possible to get by with fewer statements.
Other obvious things to look at - eliminate any cursors, change any subqueries to joins to tables or derived tables.
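For what it's worth, swapping the two in a test is mechanical; a minimal sketch with invented columns:

    -- Temp table version
    CREATE TABLE #ProjectStats (ProjectId int PRIMARY KEY, TaskCount int, MilestoneCount int);

    -- Table variable version: same shape, no statistics, scoped to the batch
    DECLARE @ProjectStats TABLE (ProjectId int PRIMARY KEY, TaskCount int, MilestoneCount int);

Because table variables carry no statistics, the optimizer assumes very few rows; that is exactly why they are sometimes faster and sometimes much slower, so measure with production-sized data.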
Rewrite perhaps. One hardware solution would be to make sure your database temp table goes on a 'fast' drive, perhaps a solid state disk (SSD), or can be managed all in memory.
My guess is this 'solution' was developed by someone with a grasp of, and a dependency upon, spreadsheets; someone who may not be very savvy about 'normalized' databases: how to construct and populate tables to retain data for reporting purposes, something which Business Intelligence (BI) software can handle with sophistication while remaining adaptable.
You didn't say 'where' the update process is being run. Is the update process being run as a SQL script from a separate computer (desktop) against the server where the data is? There can be significant bottlenecks and overhead created by that approach. If so, consider running the entire update process directly on the server as a local job, as a compiled stored procedure, bypassing the network and (multiple) cursor management overhead. It could have a scheduled time to run and a controlled priority, completing in off peak business data usage hours.
Evaluate how often 'commit' statements are really needed for the sequence of update statements...saving on a bunch of commit lines could notably improve the overall update time. There may be a couple of settings in the database client driver software which can make a notable difference.
Can the queries used for update conditions be factored out as static 'views' which in turn can be shared across multiple update statements? Views can keep in memory data/query rows frequently accessed. There may be performance tuning in determining how much update data can be pended before a commit is optimal.
It might be worth evaluating whether triggers could be used to replace the batch update sequence. You don't say how many tables the data comes from; that might help with decision making. I don't know if you have the option of adding triggers to the database tables from which the data is gathered, but if so, adding a few triggers to a number of tables wouldn't really degrade overall system performance much, and might save a big wad of time on that update process. You could try replacing the update statements one at a time with triggers and see if the results are the same as before: create a similar temp table, based on the same update process, then carefully test whether triggers feeding updates to the temp table could replace individual update statements. Perhaps you have a sort of 'data warehouse' application; it might be useful to review how to set up a 'star' schema of tables to retain summarized business data for reporting.
Creating a comprehensive and cached 'view' which updates via the queries once per day, reflecting the updates might be another approach to explore.
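A hedged sketch of what one such trigger might look like; the tables and columns (dbo.Task, dbo.ProjectSummary) are hypothetical, and the real rules would come from the existing update statements.

    -- Keeps one summary column current as the base table changes,
    -- instead of recalculating it in the batch job.
    CREATE TRIGGER trg_Task_UpdateProjectSummary
    ON dbo.Task
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        UPDATE s
        SET    TaskCount = (SELECT COUNT(*) FROM dbo.Task AS t WHERE t.ProjectId = s.ProjectId)
        FROM   dbo.ProjectSummary AS s
        WHERE  s.ProjectId IN (SELECT ProjectId FROM inserted
                               UNION
                               SELECT ProjectId FROM deleted);
    END;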
Well, since the only thing you've told us about this stored procedure is that it has an 80+ column temp table, the only thing I can recommend is to remove that table and rewrite the rest to remove the need for it.
You should get a tool that allows you to get an explain plan of all queries your app will run. It is the best bang for the buck on a SQL-heavy app for performance increases, provided you read and react to what the explain plan is telling you. If you are on Oracle, what we used to use was TOAD by Quest, I think. It was a great tool.
I would recommend looking at the tables involved, the end result, and starting from scratch to see if the query can be done in a more efficient manner. Keep the query to verify that the new one is working exactly the same as the old one, but try to forget all methods used to obtain the end result.
I would rewrite it from scratch.
You say that you understand what it is supposed to do, so it should not be that difficult. And I bet that the requirements for that piece of code will keep changing, so if you do not rewrite it now you may end up maintaining some ugly monster.

At some point in your career with SQL Server does parameter sniffing just jump out and attack?

Today again, I have a MAJOR issue with what appears to be parameter sniffing in SQL Server 2005.
I have a query comparing some results with known good results. I added a column to the results and the known good results, so that each month, I can load a new months results in both sides and compare only the current month. The new column is first in the clustered index, so new months will add to the end.
I add a criteria to my WHERE clause - this is code-generated, so it's a literal constant:
WHERE DATA_DT_ID = 20081231 -- Which is redundant because all DATA_DT_ID are 20081231 right now.
Performance goes to pot. From 7 seconds to compare about 1.5m rows to 2 hours and nothing completing. Running the generated SQL right in SSMS - no SPs.
I've been using SQL Server for going on 12 years now and I have never had so many problems with parameter sniffing as I have had on this production server since October (build 9.00.3068.00). And in every case, it's not because it was run the first time with a different parameter or the table changed. This is a new table and it's only run with this parameter or no WHERE clause at all.
And, no, I don't have DBA access, and they haven't given me enough rights to see the execution plans.
It's to the point where I'm not sure I'm going to be able to hand this system off to SQL Server users with only a couple years' experience.
UPDATE Turns out that although statistics claim to be up to date, running UPDATE STATISTICS WITH FULLSCAN clears up the problem.
FINAL UPDATE Even with recreating the SP, using WITH RECOMPILE and UPDATE STATISTICS, it turned out the query had to be rewritten in a different way to use a NOT IN instead of a LEFT JOIN with NULL check.
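For readers hitting the same wall, here is a hedged sketch of the two query shapes described above. The DATA_DT_ID column and the literal come from the question; the table and key names (Results, KnownGood, KeyCol) are invented.

    -- Original shape: LEFT JOIN with a NULL check to find rows missing a match
    SELECT r.*
    FROM   dbo.Results AS r
    LEFT JOIN dbo.KnownGood AS k
           ON k.DATA_DT_ID = r.DATA_DT_ID
          AND k.KeyCol     = r.KeyCol
    WHERE  r.DATA_DT_ID = 20081231
      AND  k.KeyCol IS NULL;

    -- Rewritten shape that ended up performing acceptably: NOT IN
    SELECT r.*
    FROM   dbo.Results AS r
    WHERE  r.DATA_DT_ID = 20081231
      AND  r.KeyCol NOT IN (SELECT k.KeyCol
                            FROM   dbo.KnownGood AS k
                            WHERE  k.DATA_DT_ID = 20081231);

Note that NOT IN behaves differently if the subquery can return NULLs, so the two forms are only equivalent when the key column is non-nullable.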
Not quite an answer, but I'll share my experience.
Parameter sniffing took a few years of SQL Server to come and bite me, when I went back to Developer DBA after moving away to mostly prod DBA work. I understood more about the engine, how SQL works, what was best left to the client etc and I was a better SQL coder.
For example, dynamic SQL or CURSORs or just plain bad SQL code probably won't ever suffer from parameter sniffing. But better set-based programming, avoiding dynamic SQL, or more elegant SQL more likely will.
I noticed it for complex search code (plenty of conditionals) and complex reports where parameter defaults affected the plan. When I see how less experienced developers would write this code, then it won't suffer parameter sniffing.
In any event, I prefer parameter masking to WITH RECOMPILE. Updating stats or indexes forces a recompile anyway. But why recompile all the time? I've answered elsewhere to one of your questions with a link that mentions parameters are sniffed during compilation, so I don't have faith in it either.
Parameter masking is an overhead, yes, but it allows the optimiser to evaluate the query case by case, rather than blanket recompiling, especially with the statement-level recompilation of SQL Server 2005.
OPTIMIZE FOR UNKNOWN in SQL Server 2008 also appears to do exactly the same thing as masking. My SQL Server MVP colleague and I spent some time investigating and came to this conclusion.
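For anyone unfamiliar with the term, "parameter masking" here means copying the parameter into a local variable so the optimizer cannot sniff the caller's value. A minimal sketch, with a hypothetical procedure name and the DATA_DT_ID column from the question:

    CREATE PROCEDURE dbo.usp_CompareResults
        @DataDtId int
    AS
    BEGIN
        -- The optimizer cannot see the runtime value of @MaskedDataDtId,
        -- so it builds a plan for an "average" value instead of sniffing.
        DECLARE @MaskedDataDtId int;
        SET @MaskedDataDtId = @DataDtId;

        SELECT *
        FROM   dbo.Results
        WHERE  DATA_DT_ID = @MaskedDataDtId;
        -- On SQL Server 2008+, OPTION (OPTIMIZE FOR UNKNOWN) has a similar effect.
    END;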
I suspect your problem is caused by out-of-date statistics. Since you do not have DBA access to the server, I would encourage you to ask the DBA when the statistics were last updated. This can have a huge impact on performance. It also sounds like your tables are not indexed very well.
Basically, this does not "feel" like a parameter sniffing issue, but more of a "healthy" database issue.
This article describes how you can determine the last time statistics were updated:
Statistics Update Time
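If you can at least read the system catalog, a quick way to check this yourself is STATS_DATE; the table name below is hypothetical.

    -- Last time each statistics object on the table was updated
    SELECT name AS stats_name,
           STATS_DATE(object_id, stats_id) AS last_updated
    FROM   sys.stats
    WHERE  object_id = OBJECT_ID('dbo.Results');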
I second the comment about checking the statistics - I have seen several instances where a query's performance has fallen off a cliff specifically because the statistics are out of date.
Specifically, if you have a date in your PK, and SQL Server thinks there are only 10 or 100 records after a specific date when in fact there are thousands, it may choose terribly inefficient query plans because it thinks the dataset is much smaller than it really is.
HTH,
Andrew
I had a production issue exactly like this. A tab in the application which called a stored proc would not show. I ran a trace for the specific proc and saw the call. The application times out in 30 secs and the proc would take close to 40 - 50 secs to complete (ran the proc exactly as called from the trace).
The next step was to figure out which statement was causing the scans I noticed in the execution of the procedure. So I scripted out the proc, removed the procedure syntax, declared the variables, and ran it in Query Analyzer. It RAN in 3 secs!!!
I'm writing this to let anyone out there looking for answers know that this can happen in SQL Server. It stems from the parameter sniffing issue. I was able to find this thread because I pinpointed the cause as a faulty cached query plan! I've read posts where they said it happens to one specific user or value, but it can happen for any value, and once it starts, it can be a continuous thing.
The solution for me was to script out the proc and run it again. Yeah, that simple. An ALTER works fine; no need to drop and re-create. This causes SQL Server to refresh the cached plan, and things were fine. I have not figured out how to disable this at a server level, and it is too cumbersome to clean up all the procs. Hope this helps.
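A smaller hammer than re-running the ALTER script, if it helps anyone, is sp_recompile, which just marks the object so its plan is rebuilt on the next execution (the proc name below is hypothetical):

    -- Invalidates the cached plan for this proc (or, if you pass a table name,
    -- for everything that references the table); the next call recompiles.
    EXEC sp_recompile N'dbo.usp_CompareResults';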

Why is parameterized SQL generated by NHibernate just as fast as a stored procedure?

One of my co-workers claims that even though the execution path is cached, there is no way parameterized SQL generated from an ORM is as quick as a stored procedure. Any help with this stubborn developer?
I would start by reading this article:
http://decipherinfosys.wordpress.com/2007/03/27/using-stored-procedures-vs-dynamic-sql-generated-by-orm/
Here is a speed test between the two:
http://www.blackwasp.co.uk/SpeedTestSqlSproc.aspx
Round 1 - You can start a profiler trace and compare the execution times.
For most people, the best way to convince them is to "show them the proof." In this case, I would create a couple basic test cases to retrieve the same set of data, and then time how long it takes using stored procedures versus NHibernate. Once you have the results, hand it over to them and most skeptical people should yield to the evidence.
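A hedged sketch of such a test in T-SQL, timing a proc call against the kind of parameterized statement an ORM emits; the table, proc, and parameter names are invented:

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- Stored procedure version
    EXEC dbo.usp_GetOrdersByCustomer @CustomerId = 42;

    -- Parameterized SQL, roughly what NHibernate sends via sp_executesql
    EXEC sp_executesql
         N'SELECT * FROM dbo.Orders WHERE CustomerId = @CustomerId',
         N'@CustomerId int',
         @CustomerId = 42;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;

Run each version many times and compare averages rather than a single execution, since the first call pays the compilation cost.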
I would only add a couple things to Rob's answer:
First, make sure the amount of data involved in the test cases is similar to production values. In other words, if your queries normally run against tables with hundreds of thousands of rows, then create such a test environment.
Second, make everything else equal except for the use of an NHibernate-generated query versus a stored procedure call. Hopefully you can execute the test by simply swapping out a provider.
Finally, realize that there is usually a lot more at stake than just stored procedures vs. ORM. With that in mind the test should look at all of the factors: execution time, memory consumption, scalability, debugging ability, etc.
The problem here is that you've accepted the burden of proof. You're unlikely to change someone's mind like that. Like it or not, people, even programmers, are just too emotional to be easily swayed by logic. You need to put the burden of proof back on him: get him to convince you otherwise, and that will force him to do the research and discover the answer for himself.
A better argument to use stored procedures is security. If you use only stored procedures, with no dynamic sql, you can disable SELECT, INSERT, UPDATE, DELETE, ALTER, and CREATE permissions for the application database user. This will protect you against most 2nd order SQL Injection, whereas parameterized queries are only effective against first order injection.
Measure it, but in a non-micro-benchmark, i.e. something that represents real operations in your system. Even if there would be a tiny performance benefit for a stored procedure it will be insignificant against the other costs your code is incurring: actually retrieving data, converting it, displaying it, etc. Not to mention that using stored procedures amounts to spreading your logic out over your app and your database with no significant version control, unit tests or refactoring support in the latter.
Benchmark it yourself. Write a testbed class that executes a sampled stored procedure a few hundred times, and run the NHibernate code the same amount of times. Compare the average and median execution time of each method.
It is just as fast if the query is the same each time. Sql Server 2005 caches query plans at the level of each statement in a batch, regardless of where the SQL comes from.
The long-term difference might be that stored procedures are many, many times easier for a DBA to manage and tune, whereas hundreds of different queries that have to be gleaned from profiler traces are a nightmare.
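If you want to show, rather than assert, that the parameterized statement's plan is being reused, the plan cache DMVs make it visible; the LIKE filter below is just a placeholder.

    -- usecounts climbing on a single 'Prepared' plan means the parameterized
    -- query is being reused just like a stored procedure's plan.
    SELECT cp.objtype, cp.usecounts, st.text
    FROM   sys.dm_exec_cached_plans AS cp
    CROSS  APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE  st.text LIKE N'%Orders%'
    ORDER  BY cp.usecounts DESC;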
I've had this argument many times over.
Almost always I end up grabbing a really good DBA, running a proc and a piece of code with the profiler running, and getting the DBA to show that the results are so close the difference is negligible.
Measure it.
Really, any discussion on this topic is probably futile until you've measured it.
He may be correct for the specific use case he is thinking of. A stored procedure will probably execute faster for some complex set of SQL that can be arbitrarily tuned. However, something you get from things like Hibernate is caching. This may prove much faster over the lifetime of your actual application.
The additional layer of abstraction will cause it to be slower than a pure call to a sproc. Just by the fact that you have additional allocations on the managed heap and additional pushes and pops on the call stack, the truth of the matter is that it is more efficient to call a sproc than to have an ORM build the query, regardless of how good the ORM is.
How slow, if it's even measurable, is debatable. This is also helped by the fact that most ORMs have a caching mechanism to avoid doing the query at all.
Even if the stored procedure is 10% faster (it probably isn't), you may want to ask yourself how much it really matters. What really matters in the end, is how easy it is to write and maintain code for your system. If you are coding a web app, and your pages all return in 0.25 seconds, then the extra time saved by using stored procedures is negligible. However, there can be many added advantages of using an ORM like NHibernate, which would be extremely hard to duplicate using only stored procedures.