How to run an unknown number of stored procs in parallel - sql

I am looking for something like this. Although the syntax is wrong, it demonstrates the principle
foreach(int i in myIntArray)
{
execute mystoredProc i;//this should kick off the proc and go onto next one without waiting for a return value
}
These stored procs are called from a Windows application. I am a little skeptical of creating numerous threads at the application end. I would rather do the threading at the SQL server end. I am open to using SSIS.

You can't do what you're asking for directly.
What you can do is to fire up n threads, and then each thread open up it's own connection, and each connection run it's own SQL query. Each thread will then wait for it's query to return. You can't do this in just one thread.
This also means that you can't do it natively within T-SQL.
You could write a CLR routine that launches multiple threads, and repeat the above process. So enabling your T-SQL to call your CLR code and the CLR deal with the concurrency problem.
But the standard practice for this really is to have multiple client side threads.

Related

Run SQL statements asynchronously

I have six SQL UPDATE statements (on the same database) as an SQL Agent job that I run every night to ensure that two systems are in sync with each other. Each update takes about 10 minutes to run.
As a test today I opened SQL Studio Manager and opened five windows and ran the five updates concurrently (I can guarantee that a row can only ever be updated by one SQL statement). The five queries ran in 15 minutes.
Therefore instead of using a single SQL Agent Job I am thinking about calling the SQL statements from a VB.NET program, so that I can either:
1) Use asynchronous calls to ensure the queries are running concurrently.
2) Use multiple threads to ensure the queries are running concurrently
I read an article recently that says that asynchronous calls should not be used to speed up processing performance. Therefore I believe that multiple threads is the answer. Is that correct?
I read an article recently that says that asynchronous calls should not be used to speed up processing performance. Therefore I believe that multiple threads is the answer. Is that correct?
I think either what you read is wrong, or you've misinterpreted it. Running things concurrently will not speed that thing up, but will allow more things to happen in parallel by freeing up threads (on Windows threads are expensive to create: creating and destroying them over short periods should be avoided).
Concurrency (eg. using .NET 4.5.1's async support) allows the activity, including starting other asynchronous actions, to continue while the thread is used for something else.
The details of how to do this depend on how you are accessing the database: Entity Framework (EF), ADO.NET, or something else?
With EF you can use the extension methods in QueryableExtensions like ToListAsync on queries.
With ADO.NET you can use SqlCommand methods like ExecuteNonQueryAsync and ExecuteReaderAsync.
since you are dealing with a sql statement the choice you make in vb.net will not affect the performances or the time required to complete the 5 tasks on sql.
if you make 5 async calls then you will sit waiting for 5 answers; if you spawn 5 threads these threads will sit waiting for their syncronous calls to finish. the net result will be the same.
me too i'm pushing for the 5 agent jobs: is a solution that leverages existing sql tools, does not requires additional coding (more coding = more maintenance) and is available out of the box on almost any sql instance.

Does SqlCommand.ExecuteNonQuery waits till completion of stored procedure?

I am working on a Windows desktop application built in VB.NET that interfaces with a web-based database (Microsoft SQL Server 2012). It connects to database when required and performs some calculations locally.
Now, there is a new requirement to execute a sproc from this Windows application. I need to know that when I call SqlCommand.ExecuteNonQuery, is the SqlCommand freed up immediately or it waits till the sproc has finished all work?
I cannot test right now because the stored procedure is yet to be designed and it is going to take minimum 2 min to complete.
I am running the SqlCommand.ExecuteNonQuery on a separate thread than the UI.
Concerns if SqlCommand is not freed up immediately:
1. If user exits the application before completion, will the sproc finish work?
2. If timeout occurs, will the sproc finish work?
Sorry if my doubts are basic in nature, I am a novice.
Yes, ExecuteNonQuery will wait until the SP is complete or errors.
If you are running the ExecuteNonQuery from another thread, only that thread is waiting. Other threads are not so the user can close the application. If you cleanup the thread before closing, as you should, it will send a cancel to the SP. (This does not mean it will actually cancel. If it is almost done, it may finish anyway.) What happens then is up to your SQL developer. If they run the SP inside a transaction, they should rollback the changes and return the error. If not, get a new SQL developer.
The method ExecuteNonQuery waits until the stored procedure finishes. The return will be an integer with the number of affected rows.

How to execute sub-tasks in a SQL Procedure in parallel

I have a process that analyzes audit data from one system to build reporting data for another system. There is a managing procedure that loops for each day to be analyzed and calls a entity specific procedure with the current iteration's day. Some entities take less than a second to process while others can take minutes. Running serially as it does in t-sql the cpu utilization never crests above 8% on the 16-core server. Each of the entity specific procedures are not dependent on the others, just that all of the entities for that day are complete before the next day is started.
My idea is to have a CLR managing procedure and start the longer running procedures for the day running on their own threads, then once the quick ones are done, Thread.Join() the long running threads to wait for all Entities to complete for that day before moving on to the next.
Below is my try as the simplest thing that could work for just one worker thread, and calling Start on that thread does not result in the static method being called. I have set a break point in the HelloWorld method and it is never hit.
I have tried something very much like this in a console application and had it work as does calling it on the same thread in the commented out line at the start of AsyncHelloWorld. Is there something about threading within SQL CLR Procedures that is different?
using System.Threading;
using Microsoft.SqlServer.Server;
public partial class StoredProcedures
{
[SqlProcedure]
public static void AsyncHelloWorld()
{
// HelloWorld(SqlContext.Pipe);
var worker = new Thread(HelloWorld);
worker.Start(SqlContext.Pipe);
worker.Join();
}
public static void HelloWorld(object o)
{
var pipe = o as SqlPipe;
if (pipe != null)
pipe.Send("Hello World!");
}
}
You absolutely cannot do that. A SqlPipe is very strongly tied to the context of the thread you were invoked on. While you can, technically, launch threads from SQLCRL, these threads must do all interaction with the caller from the original thread. But even so, launching CLR threads inside the SQL hosted environment is a very bad idea (and I won't enter into details why).
Instead, separate your logic into procedures than can be invoked in parallel and invoke these procedures in parallel from the client. You can use Asynchronous procedure execution as a pattern of scheduling procedures to be launched in asynchronously and queue based activation has built-in support for parallelism via MAX_QUEUE_READERS setting.
But most likely your procedures do not need explicit parallelism. T-SQL loads than can benefit from explicit user controlled parallelism are so rare that is not worth mentioning (not to mention that pulling transactional semantics right across parallel tasks is beyond mere mortals). T-SQL can leverage internal statement parallelism for processing data in parallel, so there is never a need for explicit parallelism.
So better you explain what is that you're really trying to solve and perhaps we can help.

Can Sql Server 2008 Stored Procedures (or Triggers) manually parallel or background some logic?

If i have a stored procedure or a trigger in Sql Server 2008, can it do some sql calculations 'in another non-blocking thread'? ie. something in the background
also, can two sql code blocks be ran in parallel? or two stored procs be ran in parallel?
for example. Imagine we are given the job calculating the scores for each Stack Overflow user (and please leave all 'do that elsehwere/service/batch/overnight/etc, elswhere) after a user does some 'action'.
so we have a trigger on the Post table, so when a new post is INSERTED, the trigger fires off and part of that logic, it calculates the user's latest score. Instead of waiting for the stored proc to finish and block the current sql thread / executire, can we ask it to calc the score in the background OR parallel.
cheers!
SQL Server does not have parallel or deferred execution: each block of running code in a connection is serial, one line after the other.
To decouple processing, you usually have to use SQL Server Agent jobs or use Service broker. These start executing in a new connection, new session etc
This makes sense:
What if you want to rollback your changes? What does the background thread do and how does it know?
What data does it use? New, Old, lock wait, snapshot?
What if it gets ahead of the main thread and uses stale data?
No, but you could write the request to a queue. Service Broker, a SQL Server component, provides support for this kind of thing. It's probably the best option available for asynchronous processing.

Start stored procedures sequentially or in parallel

We have a stored procedure that runs nightly that in turn kicks off a number of other procedures. Some of those procedures could logically be run in parallel with some of the others.
How can I indicate to SQL Server whether a procedure should be run in parallel or serial — ie: kicked off of asynchronously or blocking?
What would be the implications of running them in parallel, keeping in mind that I've already determined that the processes won't be competing for table access or locks- just total disk io and memory. For the most part they don't even use the same tables.
Does it matter if some of those procedures are the same procedure, just with different parameters?
If I start a pair or procedures asynchronously, is there a good system in SQL Server to then wait for both of them to finish, or do I need to have each of them set a flag somewhere and check and poll the flag periodically using WAITFOR DELAY?
At the moment we're still on SQL Server 2000.
As a side note, this matters because the main procedure is kicked off in response to the completion of a data dump into the server from a mainframe system. The mainframe dump takes all but about 2 hours each night, and we have no control over it. As a result, we're constantly trying to find ways to reduce processing times.
I had to research this recently, so found this old question that was begging for a more complete answer. Just to be totally explicit: TSQL does not (by itself) have the ability to launch other TSQL operations asynchronously.
That doesn't mean you don't still have a lot of options (some of them mentioned in other answers):
Custom application: Write a simple custom app in the language of your choice, using asynchronous methods. Call a SQL stored proc on each application thread.
SQL Agent jobs: Create multiple SQL jobs, and start them asynchronously from your proc using sp_start_job. You can check to see if they have finished yet using the undocumented function xp_sqlagent_enum_jobs as described in this excellent article by Gregory A. Larsen. (Or have the jobs themselves update your own JOB_PROGRESS table as Chris suggests.) You would literally have to create separate job for each parallel process you anticipate running, even if they are running the same stored proc with different parameters.
OLE Automation: Use sp_oacreate and sp_oamethod to launch a new process calling the other stored proc as described in this article, also by Gregory A. Larsen.
DTS Package: Create a DTS or SSIS package with a simple branching task flow. DTS will launch tasks in individual spids.
Service Broker: If you are on SQL2005+, look into using Service Broker
CLR Parallel Execution: Use the CLR commands Parallel_AddSql and Parallel_Execute as described in this article by Alan Kaplan (SQL2005+ only).
Scheduled Windows Tasks: Listed for completeness, but I'm not a fan of this option.
I don't have much experience with Service Broker or CLR, so I can't comment on those options. If it were me, I'd probably use multiple Jobs in simpler scenarios, and a DTS/SSIS package in more complex scenarios.
One final comment: SQL already attempts to parallelize individual operations whenever it can*. This means that running 2 tasks at the same time instead of after each other is no guarantee that it will finish sooner. Test carefully to see whether it actually improves anything or not.
We had a developer that created a DTS package to run 8 tasks at the same time. Unfortunately, it was only a 4-CPU server :)
*Assuming default settings. This can be modified by altering the server's Maximum Degree of Parallelism or Affinity Mask, or by using the MAXDOP query hint.
Create a couple of SQL Server agent jobs where each one runs a particular proc.
Then from within your master proc kick off the jobs.
The only way of waiting that I can think of is if you have a status table that each proc updates when it's finished.
Then yet another job could poll that table for total completion and kick off a final proc. Alternatively, you could have a trigger on this table.
The memory implications are completely up to your environment..
UPDATE:
If you have access to the task system.. then you could take the same approach. Just have windows execute multiple tasks, each responsible for one proc. Then use a trigger on the status table to kick off something when all of the tasks have completed.
UPDATE2:
Also, if you're willing to create a new app, you could house all of the logic in a single exe...
You do need to move your overnight sprocs to jobs. SQL Server job control will let you do all of the scheduling you are asking for.
You might want to look into using DTS (which can be run from the SQL Agent as a job). It will allow you pretty fine control over which stored procedures need to wait for others to finish and what can run in parallel. You can also run the DTS package as an EXE from your own scheduling software if needed.
NOTE: You will need to create multiple copies of your connection objects to allow calls to run in parallel. Two calls using the same connection object will still block each other even if you don't explicitly put in a dependency.