I have a process that analyzes audit data from one system to build reporting data for another system. A managing procedure loops over each day to be analyzed and calls an entity-specific procedure with the current iteration's day. Some entities take less than a second to process, while others can take minutes. Running serially, as it does in T-SQL, CPU utilization never crests above 8% on the 16-core server. The entity-specific procedures do not depend on one another; the only requirement is that all of the entities for a day are complete before the next day is started.
My idea is to have a CLR managing procedure that starts the longer-running procedures for the day on their own threads. Once the quick ones are done, it would Thread.Join() the long-running threads to wait for all entities to complete for that day before moving on to the next.
Below is my attempt at the simplest thing that could work, using just one worker thread. Calling Start on that thread does not result in the static method being called: I have set a breakpoint in the HelloWorld method and it is never hit.
I have tried something very much like this in a console application and it worked, as does calling the method on the same thread (the commented-out line at the start of AsyncHelloWorld). Is there something about threading within SQL CLR procedures that is different?
using System.Threading;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [SqlProcedure]
    public static void AsyncHelloWorld()
    {
        // HelloWorld(SqlContext.Pipe);
        var worker = new Thread(HelloWorld);
        worker.Start(SqlContext.Pipe);
        worker.Join();
    }

    public static void HelloWorld(object o)
    {
        var pipe = o as SqlPipe;
        if (pipe != null)
            pipe.Send("Hello World!");
    }
}
You absolutely cannot do that. A SqlPipe is very strongly tied to the context of the thread you were invoked on. While you can, technically, launch threads from SQLCLR, all interaction with the caller must happen on the original thread. But even so, launching CLR threads inside the SQL hosted environment is a very bad idea (I won't go into the details of why).
Instead, separate your logic into procedures that can be invoked in parallel, and invoke these procedures in parallel from the client. You can use Asynchronous procedure execution as a pattern for scheduling procedures to be launched asynchronously, and queue-based activation has built-in support for parallelism via the MAX_QUEUE_READERS setting.
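A minimal sketch of the client-side approach, written in Python for brevity: for each day, every entity procedure is submitted to a thread pool, and the loop waits for all of them before moving to the next day. Here run_entity_procedure is a hypothetical stand-in for the real database call (each worker would open its own connection), and the entity names and sleep times are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait
import time

# Hypothetical entities and simulated run times in seconds (not from the original post).
ENTITIES = {"customers": 0.05, "orders": 0.2, "invoices": 0.01}

def run_entity_procedure(entity, day):
    """Stand-in for opening a connection and executing one entity-specific procedure."""
    time.sleep(ENTITIES[entity])
    return (day, entity)

def process_days(days, max_workers=4):
    """Run all entities for a day in parallel; start the next day only when all are done."""
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for day in days:
            futures = [pool.submit(run_entity_procedure, e, day) for e in ENTITIES]
            wait(futures)  # barrier: every entity for this day finishes first
            # result() re-raises any exception raised inside a worker
            completed.append(sorted(f.result()[1] for f in futures))
    return completed

if __name__ == "__main__":
    print(process_days(["2020-01-01", "2020-01-02"]))
```

The quick entities simply finish early and free their workers; the barrier per day plays the role that Thread.Join() had in the original idea, but on the client rather than inside SQL Server.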
But most likely your procedures do not need explicit parallelism. T-SQL loads that can benefit from explicit user-controlled parallelism are so rare that they are not worth mentioning (not to mention that getting transactional semantics right across parallel tasks is beyond mere mortals). T-SQL can leverage internal statement parallelism for processing data in parallel, so there is almost never a need for explicit parallelism.
So it would be better to explain what it is that you're really trying to solve, and perhaps we can help.
I'm new to Apache Ignite (using 2.7) and I'm looking to create a set of compute tasks that also query data from a cache. I see in the docs the concept of collocated processing but I don't see any examples in the repo. Couple of things I'm unclear on:
1) I want to query the cache from within the task. Do I need to create another instance of Ignite using Ignition.start or client mode from within this task, or is there some implicit variable I can use from the context to query the cache?
2) Specifically, I'd like to execute this task as the result of a Continuous Query callback. Are there any examples detailing that?
thanks
1) You should inject an instance of Ignite into your task (for example, with the @IgniteInstanceResource annotation); this is the preferred approach.
2) Running the task from a Continuous Query callback may be tricky. Do not run the task synchronously, since you should not acquire any locks from a Continuous Query callback. The Async() methods may be OK. The preferred approach is to schedule a task on your own thread pool to handle the processing later, and return from the callback. Make sure the callback does not end up waiting on that thread pool when it is exhausted (a common strategy is to run the task synchronously on the caller if the pool is full).
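The hand-off described in the second point can be sketched generically. This is a Python illustration of the pattern only, not Ignite API: CallbackOffloader and its methods are hypothetical names. The callback enqueues the event without blocking and returns at once; a separate worker thread does the processing. When the queue is full the event is rejected rather than run inline on the callback thread.

```python
import queue
import threading

class CallbackOffloader:
    """Hands events from a notification callback to a worker thread (hypothetical helper)."""

    def __init__(self, handler, capacity=100):
        self._events = queue.Queue(maxsize=capacity)
        self._handler = handler
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def on_event(self, event):
        """Called from the callback: never blocks and never runs the handler inline."""
        try:
            self._events.put_nowait(event)
            return True
        except queue.Full:
            return False  # caller decides whether to drop, log, or retry later

    def _drain(self):
        # Worker loop: processes events one at a time, in arrival order.
        while True:
            event = self._events.get()
            try:
                self._handler(event)
            finally:
                self._events.task_done()

    def wait_idle(self):
        """For tests and shutdown only; do not call this from the callback."""
        self._events.join()
```

A usage example: create `CallbackOffloader(process_update)` once, then call `on_event(update)` from inside the callback and return immediately.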
I have six SQL UPDATE statements (on the same database) as an SQL Agent job that I run every night to ensure that two systems are in sync with each other. Each update takes about 10 minutes to run.
As a test today I opened SQL Server Management Studio, opened five windows, and ran the five updates concurrently (I can guarantee that a row can only ever be updated by one SQL statement). The five queries ran in 15 minutes.
Therefore instead of using a single SQL Agent Job I am thinking about calling the SQL statements from a VB.NET program, so that I can either:
1) Use asynchronous calls to ensure the queries are running concurrently.
2) Use multiple threads to ensure the queries are running concurrently
I read an article recently that says that asynchronous calls should not be used to speed up processing performance. Therefore I believe that multiple threads is the answer. Is that correct?
I think either what you read is wrong, or you've misinterpreted it. Running things concurrently will not speed that thing up, but will allow more things to happen in parallel by freeing up threads (on Windows threads are expensive to create: creating and destroying them over short periods should be avoided).
Concurrency (e.g., using the async support in .NET 4.5.1) allows the activity, including starting other asynchronous actions, to continue while the thread is used for something else.
The details of how to do this depend on how you are accessing the database: Entity Framework (EF), ADO.NET, or something else?
With EF you can use the extension methods in QueryableExtensions like ToListAsync on queries.
With ADO.NET you can use SqlCommand methods like ExecuteNonQueryAsync and ExecuteReaderAsync.
Since you are dealing with SQL statements, the choice you make in VB.NET will not affect the performance or the time required to complete the 5 tasks on the SQL Server.
If you make 5 async calls, you will sit waiting for 5 answers; if you spawn 5 threads, those threads will sit waiting for their synchronous calls to finish. The net result will be the same.
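A small sketch of that point, in Python rather than VB.NET: five simulated statements are run once with blocking threads and once with async calls. The sleeps stand in for server-side work (the 0.2 second duration is an arbitrary assumption), so both variants finish in roughly the time of one statement rather than five.

```python
import asyncio
import threading
import time

QUERY_SECONDS = 0.2  # simulated server-side run time of each statement (assumed)
N = 5

def run_with_threads():
    """Five threads, each blocked on a simulated synchronous call."""
    start = time.perf_counter()
    threads = [threading.Thread(target=time.sleep, args=(QUERY_SECONDS,))
               for _ in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

async def _run_async():
    # Five awaitable "calls" in flight at once on a single thread.
    await asyncio.gather(*(asyncio.sleep(QUERY_SECONDS) for _ in range(N)))

def run_with_async():
    """One thread waiting on five simulated asynchronous calls."""
    start = time.perf_counter()
    asyncio.run(_run_async())
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"threads: {run_with_threads():.2f}s, async: {run_with_async():.2f}s")
```

The difference between the two is in how many client threads sit idle while waiting, not in how long the server takes.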
I too am pushing for the 5 Agent jobs: it is a solution that leverages existing SQL Server tools, does not require additional coding (more coding = more maintenance), and is available out of the box on almost any SQL Server instance.
I am looking for something like this. Although the syntax is wrong, it demonstrates the principle
foreach(int i in myIntArray)
{
    execute mystoredProc i; // this should kick off the proc and go on to the next one without waiting for a return value
}
These stored procs are called from a Windows application. I am a little skeptical of creating numerous threads at the application end. I would rather do the threading at the SQL server end. I am open to using SSIS.
You can't do what you're asking for directly.
What you can do is fire up n threads, have each thread open its own connection, and have each connection run its own SQL query. Each thread will then wait for its query to return. You can't do this in just one thread.
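The thread-per-connection pattern can be sketched as follows. This is a Python illustration that uses a temporary SQLite file as a stand-in for the real server, so that it runs anywhere; the items table and the query are invented for the example. The point is only that each thread opens and closes its own connection.

```python
import os
import sqlite3
import tempfile
import threading

def make_demo_db(path):
    """Create a small demo table: 3 groups, each with amounts 1..4."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (grp INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(g, a) for g in range(3) for a in range(1, 5)])
    conn.commit()
    conn.close()

def run_query(path, grp, results):
    # Each thread opens its own connection; connections are never shared.
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT SUM(amount) FROM items WHERE grp = ?", (grp,)).fetchone()
        results[grp] = row[0]
    finally:
        conn.close()

def run_all(path, groups):
    results = {}
    threads = [threading.Thread(target=run_query, args=(path, g, results))
               for g in groups]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # each thread waits on its own query; we wait on all threads
    return results

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_demo_db(path)
        return run_all(path, [0, 1, 2])
```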
This also means that you can't do it natively within T-SQL.
You could write a CLR routine that launches multiple threads and repeats the above process, so that your T-SQL calls your CLR code and the CLR code deals with the concurrency problem.
But the standard practice for this really is to have multiple client side threads.
I would like to know how you would run a stored procedure from a page and just "let it finish" even if the page is closed. It doesn't need to return any data.
A database-centric option would be:
Create a table that will contain a list (or queue) of long-running jobs to be performed.
Have the application add an entry to the queue if, when, and as desired. That's all it does; once logged and entered, no web session or state data need be maintained.
Have a SQL Agent job configured to check every 1, 2, 5, whatever minutes to see if there are any jobs to run.
If there are as-yet unstarted items, mark the most recent one as started, and start it.
When it's completed, mark it as completed, or just delete it.
Check if there are any other items to run. If there are, repeat; if not, exit the job.
Depending on capacity, you could have several (differently named) copies of this job running, concurrently processing items from the list.
(I've used this method for very long-running methods. It's more an admin-type trick, but it may be appropriate for your situation.)
Prepare the command first, then queue it in the thread pool. Just make sure the thread does not depend on the HTTP context or any other HTTP intrinsic object: if your request finishes before the thread does, the context might be gone.
See Asynchronous procedure execution. This is the only method that guarantees the execution even if the ASP process crashes. It is also self-tuning and can handle spikes of load: requests are queued up and processed as resources become available.
The gist of the solution is leveraging the SQL Server Activation concept, which allows you to run a stored procedure in a background thread in SQL Server without a client connection.
Solutions based on SqlClient async methods or on the CLR thread pool are unreliable: the calls are lost when the ASP process is recycled, and the in-memory queues of requests they build up can themselves trigger a process recycle due to memory consumption.
Solutions based on tables and Agent jobs are better, as they are reliable, but they lack the self-tuning of Activation-based solutions.
We have a stored procedure that runs nightly that in turn kicks off a number of other procedures. Some of those procedures could logically be run in parallel with some of the others.
How can I indicate to SQL Server whether a procedure should be run in parallel or serially, i.e., kicked off asynchronously or blocking?
What would be the implications of running them in parallel, keeping in mind that I've already determined that the processes won't be competing for table access or locks, just total disk I/O and memory? For the most part they don't even use the same tables.
Does it matter if some of those procedures are the same procedure, just with different parameters?
If I start a pair of procedures asynchronously, is there a good system in SQL Server to then wait for both of them to finish, or do I need to have each of them set a flag somewhere and poll the flag periodically using WAITFOR DELAY?
At the moment we're still on SQL Server 2000.
As a side note, this matters because the main procedure is kicked off in response to the completion of a data dump into the server from a mainframe system. The mainframe dump takes all but about 2 hours each night, and we have no control over it. As a result, we're constantly trying to find ways to reduce processing times.
I had to research this recently, so found this old question that was begging for a more complete answer. Just to be totally explicit: TSQL does not (by itself) have the ability to launch other TSQL operations asynchronously.
That doesn't mean you don't still have a lot of options (some of them mentioned in other answers):
Custom application: Write a simple custom app in the language of your choice, using asynchronous methods. Call a SQL stored proc on each application thread.
SQL Agent jobs: Create multiple SQL jobs, and start them asynchronously from your proc using sp_start_job. You can check whether they have finished using the undocumented extended stored procedure xp_sqlagent_enum_jobs, as described in this excellent article by Gregory A. Larsen. (Or have the jobs themselves update your own JOB_PROGRESS table, as Chris suggests.) You would literally have to create a separate job for each parallel process you anticipate running, even if they are running the same stored proc with different parameters.
OLE Automation: Use sp_oacreate and sp_oamethod to launch a new process calling the other stored proc as described in this article, also by Gregory A. Larsen.
DTS Package: Create a DTS or SSIS package with a simple branching task flow. DTS will launch tasks in individual spids.
Service Broker: If you are on SQL2005+, look into using Service Broker.
CLR Parallel Execution: Use the CLR commands Parallel_AddSql and Parallel_Execute as described in this article by Alan Kaplan (SQL2005+ only).
Scheduled Windows Tasks: Listed for completeness, but I'm not a fan of this option.
I don't have much experience with Service Broker or CLR, so I can't comment on those options. If it were me, I'd probably use multiple Jobs in simpler scenarios, and a DTS/SSIS package in more complex scenarios.
One final comment: SQL Server already attempts to parallelize individual operations whenever it can*. This means that running two tasks at the same time instead of one after the other is no guarantee that they will finish sooner. Test carefully to see whether it actually improves anything or not.
We had a developer that created a DTS package to run 8 tasks at the same time. Unfortunately, it was only a 4-CPU server :)
*Assuming default settings. This can be modified by altering the server's Maximum Degree of Parallelism or Affinity Mask, or by using the MAXDOP query hint.
Create a couple of SQL Server agent jobs where each one runs a particular proc.
Then from within your master proc kick off the jobs.
The only way of waiting that I can think of is if you have a status table that each proc updates when it's finished.
Then yet another job could poll that table for total completion and kick off a final proc. Alternatively, you could have a trigger on this table.
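The status-table idea can be sketched like this, again in Python with a temporary SQLite file standing in for the real database. The proc_status table and the function names are hypothetical: each worker marks its row as finished as its last step, and a poller checks the table until no unfinished rows remain.

```python
import os
import sqlite3
import tempfile
import threading
import time

def mark_finished(path, proc_name):
    """Last step of each worker proc: record completion in the status table."""
    conn = sqlite3.connect(path, timeout=30)
    try:
        conn.execute("UPDATE proc_status SET finished = 1 WHERE name = ?", (proc_name,))
        conn.commit()
    finally:
        conn.close()

def wait_for_all(path, poll_seconds=0.05, timeout_seconds=10):
    """Poller: returns True once no unfinished rows remain, False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    conn = sqlite3.connect(path, timeout=30)
    try:
        while time.monotonic() < deadline:
            (remaining,) = conn.execute(
                "SELECT COUNT(*) FROM proc_status WHERE finished = 0").fetchone()
            if remaining == 0:
                return True  # the final proc could be kicked off here
            time.sleep(poll_seconds)
        return False
    finally:
        conn.close()

def demo(names=("load_a", "load_b", "load_c")):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "status.db")
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE proc_status (name TEXT PRIMARY KEY, finished INTEGER DEFAULT 0)")
        conn.executemany("INSERT INTO proc_status (name) VALUES (?)", [(n,) for n in names])
        conn.commit()
        conn.close()
        # Threads stand in for the parallel jobs; each one only marks itself done.
        workers = [threading.Thread(target=mark_finished, args=(path, n)) for n in names]
        for w in workers:
            w.start()
        done = wait_for_all(path)
        for w in workers:
            w.join()
        return done
```

The polling interval and timeout are arbitrary; in T-SQL the equivalent loop would sleep with WAITFOR DELAY between checks.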
The memory implications are completely up to your environment.
UPDATE:
If you have access to the task system, then you could take the same approach. Just have Windows execute multiple tasks, each responsible for one proc. Then use a trigger on the status table to kick off something when all of the tasks have completed.
UPDATE2:
Also, if you're willing to create a new app, you could house all of the logic in a single exe...
You do need to move your overnight sprocs to jobs. SQL Server job control will let you do all of the scheduling you are asking for.
You might want to look into using DTS (which can be run from the SQL Agent as a job). It will allow you pretty fine control over which stored procedures need to wait for others to finish and what can run in parallel. You can also run the DTS package as an EXE from your own scheduling software if needed.
NOTE: You will need to create multiple copies of your connection objects to allow calls to run in parallel. Two calls using the same connection object will still block each other even if you don't explicitly put in a dependency.