Start stored procedures sequentially or in parallel - sql

We have a stored procedure that runs nightly that in turn kicks off a number of other procedures. Some of those procedures could logically be run in parallel with some of the others.
How can I indicate to SQL Server whether a procedure should be run in parallel or serial — ie: kicked off of asynchronously or blocking?
What would be the implications of running them in parallel, keeping in mind that I've already determined that the processes won't be competing for table access or locks- just total disk io and memory. For the most part they don't even use the same tables.
Does it matter if some of those procedures are the same procedure, just with different parameters?
If I start a pair or procedures asynchronously, is there a good system in SQL Server to then wait for both of them to finish, or do I need to have each of them set a flag somewhere and check and poll the flag periodically using WAITFOR DELAY?
At the moment we're still on SQL Server 2000.
As a side note, this matters because the main procedure is kicked off in response to the completion of a data dump into the server from a mainframe system. The mainframe dump takes all but about 2 hours each night, and we have no control over it. As a result, we're constantly trying to find ways to reduce processing times.

I had to research this recently, so found this old question that was begging for a more complete answer. Just to be totally explicit: TSQL does not (by itself) have the ability to launch other TSQL operations asynchronously.
That doesn't mean you don't still have a lot of options (some of them mentioned in other answers):
Custom application: Write a simple custom app in the language of your choice, using asynchronous methods. Call a SQL stored proc on each application thread.
SQL Agent jobs: Create multiple SQL jobs, and start them asynchronously from your proc using sp_start_job. You can check to see if they have finished yet using the undocumented function xp_sqlagent_enum_jobs as described in this excellent article by Gregory A. Larsen. (Or have the jobs themselves update your own JOB_PROGRESS table as Chris suggests.) You would literally have to create separate job for each parallel process you anticipate running, even if they are running the same stored proc with different parameters.
OLE Automation: Use sp_oacreate and sp_oamethod to launch a new process calling the other stored proc as described in this article, also by Gregory A. Larsen.
DTS Package: Create a DTS or SSIS package with a simple branching task flow. DTS will launch tasks in individual spids.
Service Broker: If you are on SQL2005+, look into using Service Broker
CLR Parallel Execution: Use the CLR commands Parallel_AddSql and Parallel_Execute as described in this article by Alan Kaplan (SQL2005+ only).
Scheduled Windows Tasks: Listed for completeness, but I'm not a fan of this option.
I don't have much experience with Service Broker or CLR, so I can't comment on those options. If it were me, I'd probably use multiple Jobs in simpler scenarios, and a DTS/SSIS package in more complex scenarios.
One final comment: SQL already attempts to parallelize individual operations whenever it can*. This means that running 2 tasks at the same time instead of after each other is no guarantee that it will finish sooner. Test carefully to see whether it actually improves anything or not.
We had a developer that created a DTS package to run 8 tasks at the same time. Unfortunately, it was only a 4-CPU server :)
*Assuming default settings. This can be modified by altering the server's Maximum Degree of Parallelism or Affinity Mask, or by using the MAXDOP query hint.

Create a couple of SQL Server agent jobs where each one runs a particular proc.
Then from within your master proc kick off the jobs.
The only way of waiting that I can think of is if you have a status table that each proc updates when it's finished.
Then yet another job could poll that table for total completion and kick off a final proc. Alternatively, you could have a trigger on this table.
The memory implications are completely up to your environment..
UPDATE:
If you have access to the task system.. then you could take the same approach. Just have windows execute multiple tasks, each responsible for one proc. Then use a trigger on the status table to kick off something when all of the tasks have completed.
UPDATE2:
Also, if you're willing to create a new app, you could house all of the logic in a single exe...

You do need to move your overnight sprocs to jobs. SQL Server job control will let you do all of the scheduling you are asking for.

You might want to look into using DTS (which can be run from the SQL Agent as a job). It will allow you pretty fine control over which stored procedures need to wait for others to finish and what can run in parallel. You can also run the DTS package as an EXE from your own scheduling software if needed.
NOTE: You will need to create multiple copies of your connection objects to allow calls to run in parallel. Two calls using the same connection object will still block each other even if you don't explicitly put in a dependency.

Related

Run SQL statements asynchronously

I have six SQL UPDATE statements (on the same database) as an SQL Agent job that I run every night to ensure that two systems are in sync with each other. Each update takes about 10 minutes to run.
As a test today I opened SQL Studio Manager and opened five windows and ran the five updates concurrently (I can guarantee that a row can only ever be updated by one SQL statement). The five queries ran in 15 minutes.
Therefore instead of using a single SQL Agent Job I am thinking about calling the SQL statements from a VB.NET program, so that I can either:
1) Use asynchronous calls to ensure the queries are running concurrently.
2) Use multiple threads to ensure the queries are running concurrently
I read an article recently that says that asynchronous calls should not be used to speed up processing performance. Therefore I believe that multiple threads is the answer. Is that correct?
I read an article recently that says that asynchronous calls should not be used to speed up processing performance. Therefore I believe that multiple threads is the answer. Is that correct?
I think either what you read is wrong, or you've misinterpreted it. Running things concurrently will not speed that thing up, but will allow more things to happen in parallel by freeing up threads (on Windows threads are expensive to create: creating and destroying them over short periods should be avoided).
Concurrency (eg. using .NET 4.5.1's async support) allows the activity, including starting other asynchronous actions, to continue while the thread is used for something else.
The details of how to do this depend on how you are accessing the database: Entity Framework (EF), ADO.NET, or something else?
With EF you can use the extension methods in QueryableExtensions like ToListAsync on queries.
With ADO.NET you can use SqlCommand methods like ExecuteNonQueryAsync and ExecuteReaderAsync.
since you are dealing with a sql statement the choice you make in vb.net will not affect the performances or the time required to complete the 5 tasks on sql.
if you make 5 async calls then you will sit waiting for 5 answers; if you spawn 5 threads these threads will sit waiting for their syncronous calls to finish. the net result will be the same.
me too i'm pushing for the 5 agent jobs: is a solution that leverages existing sql tools, does not requires additional coding (more coding = more maintenance) and is available out of the box on almost any sql instance.

SSIS - Connection Management Within a Loop

I have the following SSIS package:
alt text http://www.freeimagehosting.net/uploads/5161bb571d.jpg
The problem is that within the Foreach loop a connection is opened and closed for each iteration.
On running SQL Profiler I see a series of:
Audit Login
RPC:Completed
Audit Logout
The duration for the login and the RPC that actually does the work is minimal. However, the duration for the logout is significant, running into several seconds each. This causes the JOB to run very slowly - taking many hours. I get the same problem when running either on a test server or stand-alone laptop.
Could anyone please suggest how I may change the package to improve performance?
Also, I have noticed that when running the package from Visual Studio, it looks as though it continues to run with the component blocks going amber then green but actually all the processing has been completed and SQL profiler has dropped silent?
Thanks,
Rob.
Have you tried running your data flow task in parallel vs serial? You can most likely break up your for loops to enable you to run each 'set' in parallel, so while it might still be expensive to login/out, you will be doing it N times simultaneously.
SQL Server is most performant when running a batch of operations in a single query. Is it possible to redesign your package so that it batches updates in a single call, rather than having a procedural workflow with for-loops, as you have it here?
If the design of your application and the RPC permits (or can be refactored to permit it), this might be the best solution for performance.
For example, instead of something like:
for each Facility
for each Stock
update Qty
See if you can create a structure (using SQL, or a bulk update RPC with a single connection) like:
update Qty
from Qty join Stock join Facility
...
If you control the implementation of the RPC, the RPC could maintain the same API (if needed) by delegating to another which does the batch operation, but specifies a single-record restriction (where record=someRecord).
Have you tried doing the following?
In your connection managers for the connection that is used within the loop, right click and choose properties. In the properties for the connection, find "RetainSameConnection" and change it to True from the default of False. This will let your package maintain the connection throughout your package run. Your profiler would then probably look like:
Audit Login
RPC:Completed
RPC:Completed
RPC:Completed
RPC:Completed
RPC:Completed
RPC:Completed
...
Audit Logout
With the final Audit Logout happening at the end of package execution.

Can Sql Server 2008 Stored Procedures (or Triggers) manually parallel or background some logic?

If i have a stored procedure or a trigger in Sql Server 2008, can it do some sql calculations 'in another non-blocking thread'? ie. something in the background
also, can two sql code blocks be ran in parallel? or two stored procs be ran in parallel?
for example. Imagine we are given the job calculating the scores for each Stack Overflow user (and please leave all 'do that elsehwere/service/batch/overnight/etc, elswhere) after a user does some 'action'.
so we have a trigger on the Post table, so when a new post is INSERTED, the trigger fires off and part of that logic, it calculates the user's latest score. Instead of waiting for the stored proc to finish and block the current sql thread / executire, can we ask it to calc the score in the background OR parallel.
cheers!
SQL Server does not have parallel or deferred execution: each block of running code in a connection is serial, one line after the other.
To decouple processing, you usually have to use SQL Server Agent jobs or use Service broker. These start executing in a new connection, new session etc
This makes sense:
What if you want to rollback your changes? What does the background thread do and how does it know?
What data does it use? New, Old, lock wait, snapshot?
What if it gets ahead of the main thread and uses stale data?
No, but you could write the request to a queue. Service Broker, a SQL Server component, provides support for this kind of thing. It's probably the best option available for asynchronous processing.

Database Job Scheduling

I have a procedure written in PLJava that sends out updates over JMS in my postgres database.
What I would like to do is have that function called on an interval (every 15 seconds) internally in the database (preferably not from an outside process). Is this possible? Any ideas?
If you need no external access, you are presumably able to modify the database design so that you don't need the update at all. Can you explain more about what the update is doing?
As depesz said, you could use either cron or pgAgent, but they are only able to go down to a one minute granularity, not 15 seconds. Considering sleeping inside the stored procedure until the next iteration is not a good idea, because you will have an open transaction for all that time which is a really bad idea.
Strict answer: it is not possible. Since you don't want outside process, and PostgreSQL doesn't support jobs - you are out of luck.
If you'll reconsider using outside processes, then you're most likely want something like cron, or better yet pgagent.
On absolutely other hand - what do you need to do that has to happen every 30 seconds? this seems like a problem with design.
First, you'll spend the least amount of effort if you just go with a cron job.
However, if you were starting from scracth: You are trying to periodically replicate rows from your database. I think you are looking at a replication queue.
The PGQ project (used for Londiste replication, both from Skype's SkyTools) has a queue that you can use independently. When configuring it, you set a maximum event count, and a loop delay, before batched events are generated. You can get batches spaced by no more than 15 seconds that way. You now have to produce the events that will be batched, using a trigger that calls pgq.insert_event; and consume the queues. The consumer can call your PL/Java stored proc; you'll have to rewrite the procedure to send everything in the batch instead of scanning the base table for new events.
As far as I know postgresql doesn't support scheduled tasks. You'll need to use a script with cron or at (depending on your operating system.)
Sounds like you're doing sort of replication? Every 15s sounds like a lot of updates. Could you setup a trigger (or a number of triggers) instead of polling?
If you are using JMS why not just have th task wait for input on the queue?
Per your depesz comment, you have a PL/Java stored procedure that "flushes out database tables (updates) as java objects". Since you want it to run in 15 second intervals, it must be processing a batch of updates each time. Rather than processing a batch of updates in a stored procedure every 15 seconds, why not process them one at a time when they happen via an after update trigger and eliminate the need for a timed interval. If you are aggregrating data from multiple tables to build your objects than add the triggers to you upper most tables only.
In my case the problem was that agent couldn't authorize to database so after I've made all connections trusted from localhost the service started successfully and job works fine
for more information about error you should see into windows event viewer or eq in unix based system. see my config file C:\Program Files\PostgreSQL\10\data\pg_hba.conf

Spawning multiple SQL tasks in SQL Server 2005

I have a number of stored procs which I would like to all run simultaneously on the server. Ideally all on the server without reliance on connections to an external client.
What options are there to launch all these and have them run simultaneously (I don't even need to wait until all the processes are done to do additional work)?
I have thought of:
Launching multiple connections from
a client, having each start the
appropriate SP.
Setting up jobs for
each SP and starting the jobs from a
SQL Server connection or SP.
Using
xp_cmdshell to start additional runs
equivalent to osql or whetever
SSIS - I need to see if the package can be dynamically written to handle more SPs, because I'm not sure how much access my clients are going to get to production
In the job and cmdshell cases, I'm probably going to run into permissions level problems from the DBA...
SSIS could be a good option - if I can table-drive the SP list.
This is a datawarehouse situation, and the work is largely independent and NOLOCK is universally used on the stars. The system is an 8-way 32GB machine, so I'm going to load it down and scale it back if I see problems.
I basically have three layers, Layer 1 has a small number of processes and depends on basically all the facts/dimensions already being loaded (effective, the stars are a Layer 0 - and yes, unfortunately they will all need to be loaded), Layer 2 has a number of processes which depend on some or all of Layer 1, and Layer 3 has a number of processes which depend on some or all of Layer 2. I have the dependencies in a table already, and would only initially launch all the procs in a particular layer at the same time, since they are orthogonal within a layer.
Is SSIS an option for you? You can create a simple package with parallel Execute SQL tasks to execute the stored procs simultaneously. However, depending on what your stored procs do, you may or may not get benefit from starting this in parallel (e.g. if they all access the same table records, one may have to wait for locks to be released etc.)
At one point I did some architectural work on a product known as Acumen Advantage that has a warehouse manager that does this.
The basic strategy for this is to have a control DB with a list of the sprocs and their dependencies. Based on the dependencies you can do a Topological Sort to give them an order to run in. If you do this, you need to manage the dependencies - all of the predecessors of a stored procedure must complete before it executes. Just starting the sprocs in order on multiple threads will not accomplish this by itself.
Implementing this meant knocking much of the SSIS functionality on the head and implementing another scheduler. This is OK for a product but probably overkill for a bespoke system. A simpler solution is thus:
You can manage the dependencies at a more coarse-grained level by organising the ETL vertically by dimension (sometimes known as Subject Oriented ETL) where a single SSIS package and set of sprocs takes the data from extraction through to producing dimensions or fact tables. Typically the dimensions will mostly be siloed, so they will have minimal interdependency. Where there is interdependency, make one dimension (or fact table) load process dependent on whatever it needs upstream.
Each loader becomes relatively modular and you still get a useful degree of parallelism by kicking off the load processes in parallel and letting the SSIS scheduler work it out. The dependencies will contain some redundancy. For example an ODS table may not be dependent on a dimension load being completed but the upstream package itself takes the components right through to the dimensional schema before it completes. However this is not likely to be an issue in practice for the following reasons:
The load process probably has plenty of other tasks that can execute in the meantime
The most resource-hungry tasks will almost certainly be the fact table loads, which will mostly not be dependent on each other. Where there is a dependency (e.g. a rollup table based on the contents of another table) this cannot be avoided anyway.
You can construct the SSIS packages so they pick up all of their configuration from an XML file and the location can be supplied exernally in an environment variable. This sort of thing can be fairly easily implemented with scheduling systems like Control-M.
This means that a modified SSIS package can be deployed with relatively little manual intervention. The production staff can be handed the packages to deploy along with the stored procedures and can mainain the config files on a per-environment basis without having to manually fiddle configuration in the SSIS packages.
you might want to look at the service broker and it's activation stored procedures... might be an option...
In the end, I created a C# management console program which launches the processes Async as they are able to be run and keeps track of the connections.