How can we avoid Stored Procedures being executed in parallel?

We have the following situation:
A stored procedure is invoked by a middleware and is given an XML file as a parameter. The procedure then parses the XML file and inserts values into temporary tables inside a loop. After looping, the values in the temporary tables are inserted into physical tables.
The problem is that the stored procedure has a relatively long run time (about 5 minutes). During this period, it is likely to be invoked a second time, which causes both processes to become suspended.
Now my question:
How can we avoid a second execution of a Stored Procedure if it is already running?
Best regards

I would recommend designing your application layer to prevent multiple instances of this process from running at once. For example, you could move the logic into a queue that is processed one message at a time. Another option would be locking at the application level to prevent the database call from being executed.
SQL Server does have a locking mechanism to ensure a block of code is not run multiple times: an "app lock". This is similar in concept to the lock statement in C# or semaphores in other languages.
To acquire an application lock, call sp_getapplock. For example:
begin tran
exec sp_getapplock @Resource = 'MyExpensiveProcess', @LockMode = 'Exclusive', @LockOwner = 'Transaction'
This call will block if another process has acquired the lock. If a second RPC call tries to run this process, and you would rather have the process return a helpful error message, you can pass in a @LockTimeout of 0 and check the return code.
For example, the code below raises an error if it could not acquire the lock. Your code could return something else that the application interprets as "process is already running, try again later":
begin tran
declare @result int
exec @result = sp_getapplock @Resource = 'MyExpensiveProcess', @LockMode = 'Exclusive', @LockOwner = 'Transaction', @LockTimeout = 0
if @result < 0
begin
    rollback
    raiserror (N'Could not acquire application lock', 16, 1)
end
To release the lock, call sp_releaseapplock.
exec sp_releaseapplock @Resource = 'MyExpensiveProcess'
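Putting those pieces together, a minimal sketch of a guarded procedure might look like the following (the procedure name and the placeholder body are illustrative, not from the original question):

CREATE PROCEDURE dbo.MyExpensiveProcess
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    DECLARE @result int;
    -- Fail fast instead of queueing behind a running instance
    EXEC @result = sp_getapplock
        @Resource = 'MyExpensiveProcess',
        @LockMode = 'Exclusive',
        @LockOwner = 'Transaction',
        @LockTimeout = 0;

    IF @result < 0
    BEGIN
        ROLLBACK;
        RAISERROR (N'Process is already running, try again later', 16, 1);
        RETURN;
    END;

    -- ... parse the XML and load the temporary/physical tables here ...

    COMMIT TRANSACTION; -- the transaction-owned lock is released on commit
END;

Because the lock owner is the transaction, the commit (or any rollback) releases the lock automatically, so no explicit sp_releaseapplock call is needed in this variant.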

Stored procedures are meant to be run multiple times, including in parallel; the whole idea is to reuse the code.
If you want to avoid multiple runs for the same input, you need to handle it manually, either by implementing a condition check on the input or by using some locking mechanism.
If you don't want your procedure to run in parallel at all (regardless of input), the best strategy is to acquire a lock using an entry in a database table, or using global variables, depending on the DBMS you are using.
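As a hedged sketch of the table-based variant (the table, column, and procedure names here are hypothetical, for illustration only):

-- A one-row-per-procedure lock table; the PRIMARY KEY makes a second
-- concurrent insert for the same name fail immediately.
CREATE TABLE dbo.ProcLock (LockName sysname PRIMARY KEY, AcquiredAt datetime2 NOT NULL);
GO
CREATE PROCEDURE dbo.MyGuardedProc
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        INSERT INTO dbo.ProcLock (LockName, AcquiredAt)
        VALUES (N'MyGuardedProc', SYSUTCDATETIME());
    END TRY
    BEGIN CATCH
        RETURN; -- another instance holds the "lock"; exit quietly
    END CATCH;

    -- ... actual work here ...

    DELETE FROM dbo.ProcLock WHERE LockName = N'MyGuardedProc'; -- release
END;

One drawback: if the session dies before the DELETE runs, the row has to be cleaned up manually, which is one reason application locks are often preferred.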

You can check whether the stored procedure is already running using exec sp_who2. This may be an approach to consider: in your SP, check this first and simply exit if it is already running. It will run again the next time the job executes.
You would need to filter out the current session, make sure the count for that SP is 1 (1 is the current process; 2 means it is already running elsewhere), or have a helper SP that is called first.
Here are other ideas: Check if stored procedure is running
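As a hedged sketch of that check using the DMVs behind sp_who2 (the procedure name is illustrative, and this assumes the DMVs report the procedure as the currently executing module):

-- Exit early if another session is already executing this procedure
IF EXISTS (
    SELECT 1
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE t.objectid = OBJECT_ID(N'dbo.MyExpensiveProcess')
      AND r.session_id <> @@SPID -- ignore the current session
)
BEGIN
    RETURN; -- already running elsewhere; try again on the next invocation
END;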

Related

Should I use sp_getapplock to prevent multiple instances of a stored procedure that conditionally inserts?

Hear me out! I know this use case sounds suspect, but...
I have a stored procedure which checks a table (effectively a cache) for data for a given requested ID. If it doesn't find any data for that ID, or deems it out of date, it executes a second stored procedure which pulls data from a separate database (using dynamic SQL; the source DB name is based on the requested ID) and inserts it into the local table. It then selects from this table.
If the data is in the table, everything returns quickly (ms), but if it needs to be brought in from the other database, it takes about 10 seconds. We're seeing race conditions where two concurrent instances check the local cache, see something is missing, and queue up sequential ingestions of the remote data into the cache. To avoid double-insertion, the cache-populating procedure clears whatever is already there for this ID, but this just means the first instance of the procedure can end up selecting no rows, because the second instance deleted the just-inserted records before re-inserting them itself.
I think I want to put a lock around the entire procedure (checking the cache, potentially populating the cache, selecting from the cache), although I'm open to other solutions. I think the overall caching approach has to remain on-demand, though; the remote databases come and go by the hundreds, and we only want to cache the ones actually requested by reporting, as needed.
BEGIN TRANSACTION;
BEGIN TRY
    -- Take out a lock intended to prevent anyone else modifying the cache while we're reading and potentially modifying it
    EXEC sp_getapplock @Resource = '[private].[cache_entries]', @LockOwner = 'Transaction', @LockMode = 'Exclusive', @LockTimeout = 120000;
    -- Invoke a stored procedure that ingests any required data that is not already cached
    EXEC [private].populate_cache @required_dbs;
    -- CALCULATIONS
    -- ... SELECT FROM [private].cache_entries
    COMMIT TRANSACTION; -- Free the lock
END TRY
BEGIN CATCH -- Ensure we release our lock on failure
    ROLLBACK TRANSACTION;
    THROW;
END CATCH;
The alternative to sp_getapplock is to use locking hints within your transaction. Both are reasonable approaches. Locking hints can be complex, but they protect the target object itself rather than a single code path, so they are sometimes necessary. sp_getapplock (with Transaction as the owner) is simple and reliable.
You can do this without sp_getapplock, which tends to inhibit concurrency a lot.
The way to do this is to continue to do your checks within a transaction, but to apply a HOLDLOCK hint as well as an UPDLOCK hint.
HOLDLOCK, aka the SERIALIZABLE isolation level, will place a lock not only on the ID you specify but also on the absence of such data; in other words, it will prevent anyone else from inserting that ID.
You must use both of these hints, and you must have an index that matches that SELECT; otherwise you could run into major blocking and deadlocking problems due to full table scans.
Also, you don't need a CATCH and ROLLBACK. Just use SET XACT_ABORT ON;, which ensures a rollback on any error.
SET XACT_ABORT ON; -- always have this set
BEGIN TRANSACTION;

DECLARE @SomeData nvarchar(100) = (
    SELECT ce.SomeColumn
    FROM [private].cache_entries ce WITH (HOLDLOCK, UPDLOCK)
    WHERE ce.SomeCondition = 1
);

IF @SomeData IS NULL
BEGIN
    -- Invoke a stored procedure that ingests any required data that is not already cached
    EXEC [private].populate_cache @required_dbs;
END;

-- CALCULATIONS
-- ... SELECT FROM [private].cache_entries

COMMIT TRANSACTION; -- Free the lock

How to lock and unlock a table exclusively outside of a transaction

In SQL Server, how can a table be locked and unlocked (exclusively) outside of a transaction?
Reason: multiple instances of my app need to interact with one table without stepping on each other's toes, while at the same time executing a statement that cannot run within a transaction
One option might be to look into sp_getapplock and sp_releaseapplock.
This would not lock the table, but an arbitrary named resource. You could then configure your application to only work with the table in question after the lock has been acquired, e.g., through stored procedures.
An example of this would be something like:
EXEC sp_getapplock @Resource = 'ResourceName', @LockMode = 'Exclusive', @LockOwner = 'Session'
-- UPDATE table, etc.
EXEC sp_releaseapplock @Resource = 'ResourceName', @LockOwner = 'Session'
Specifying @LockOwner = 'Session' means you can use this locking mechanism outside of a transaction.
There's also the option of extracting the locking and releasing statements into their own stored procedures, so the logic is only stated once; these stored procedures could return a value to the calling procedure with a result specifying whether or not the lock has been acquired/released.
At that point it's just a question of ensuring that this mechanism is put into place for every procedure/table/etc. where there may be contention.
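As a sketch of that idea (all names here are hypothetical, not from the original answer):

CREATE PROCEDURE dbo.usp_AcquireAppLock
    @Resource nvarchar(255),
    @Acquired bit OUTPUT
AS
BEGIN
    DECLARE @result int;
    EXEC @result = sp_getapplock
        @Resource = @Resource,
        @LockMode = 'Exclusive',
        @LockOwner = 'Session',
        @LockTimeout = 0;
    -- sp_getapplock returns 0 or higher on success, negative on failure
    SET @Acquired = CASE WHEN @result >= 0 THEN 1 ELSE 0 END;
END;
GO
CREATE PROCEDURE dbo.usp_ReleaseAppLock
    @Resource nvarchar(255)
AS
BEGIN
    EXEC sp_releaseapplock @Resource = @Resource, @LockOwner = 'Session';
END;

Callers would check @Acquired and only touch the table when it comes back as 1.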

How to check if a proc is already running when called?

As a follow up to my previous question where I ask about storedproc_Task1 calling storedproc_Task2, I want to know if SQL (SQL Server 2012) has a way to check if a proc is currently running, before calling it.
For example, if storedproc_Task2 can be called by both storedproc_Task1 and storedproc_Task3, I don't want storedproc_Task1 to call storedproc_Task2 only 20 seconds after storedproc_Task3. I want the code to look something like the following:
declare @MyRetCode_Recd_In_Task1 int

if storedproc_Task2 is running then
    -- wait for storedproc_Task2 to finish
else
    execute @MyRetCode_Recd_In_Task1 = storedproc_Task2 (with calling parameters if any)
end
The question is how do I handle the if storedproc_Task2 is running boolean check?
UPDATE: I initially posed the question using general names for my stored procedures (i.e., sp_Task1), but have updated the question to use names like storedproc_Task1 instead. Per srutzky's reminder, the prefix sp_ is reserved for system procs in the [master] database.
Given that the desire is to have any process calling sp_Task2 wait until sp_Task2 completes if it is already running, that is essentially making sp_Task2 single-threaded.
This can be accomplished through the use of Application Locks (see sp_getapplock and sp_releaseapplock). Application Locks let you create locks around arbitrary concepts. Meaning, you can define the @Resource as "Task2", which will force each caller to wait their turn. It would follow this structure:
BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'Task2', @LockMode = 'Exclusive';
...single-threaded code...
EXEC sp_releaseapplock @Resource = 'Task2';
COMMIT TRANSACTION;
You need to manage errors / ROLLBACK yourself (as stated in the linked MSDN documentation), so put in the usual TRY / CATCH. But this does allow you to manage the situation.
This code can be placed either in sp_Task2 at the beginning and end, as follows:
CREATE PROCEDURE dbo.Task2
AS
SET NOCOUNT ON;
BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'Task2', @LockMode = 'Exclusive';
{current logic for Task2 proc}
EXEC sp_releaseapplock @Resource = 'Task2';
COMMIT TRANSACTION;
Or it can be placed in all of the locations that calls sp_Task2, as follows:
CREATE PROCEDURE dbo.Task1
AS
SET NOCOUNT ON;
BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'Task2', @LockMode = 'Exclusive';
EXEC dbo.Task2 (with calling parameters if any);
EXEC sp_releaseapplock @Resource = 'Task2';
COMMIT TRANSACTION;
I would think that the first choice (placing the logic in sp_Task2) would be the cleanest since a) it is in a single location and b) it cannot be avoided by someone calling sp_Task2 outside of the currently defined paths (an ad hoc query, or a new proc that doesn't take this precaution).
Please see my answer to your initial question regarding not using the sp_ prefix for stored procedure names and not needing the return value.
Please note: sp_getapplock / sp_releaseapplock should be used sparingly; Application Locks can definitely be very handy (such as in cases like this one) but they should only be used when absolutely necessary.
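As a hedged sketch of the TRY / CATCH handling mentioned above, applied to the first option (the body placeholder is illustrative):

CREATE PROCEDURE dbo.Task2
AS
SET NOCOUNT ON;

BEGIN TRY
    BEGIN TRANSACTION;
    EXEC sp_getapplock @Resource = 'Task2', @LockMode = 'Exclusive';

    -- {current logic for Task2 proc}

    EXEC sp_releaseapplock @Resource = 'Task2';
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION; -- also releases the transaction-owned lock
    THROW;
END CATCH;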
If you are using a global table, as stated in the answer to your previous question, then just drop the global table at the end of the procedure. Then, to check whether the procedure is still running, simply check for the existence of the table:
IF Object_ID('tempdb..##temptable') IS NULL -- procedure is not running
BEGIN
    -- do something
END
ELSE
BEGIN
    -- do something else
END

sp_getApplock lockOwner values transaction vs session

I have a stored procedure that needs to run one at a time (it should not run concurrently).
For this I am using the call below:
exec @result = sys.sp_getapplock
    @Resource = 'Employee', @LockMode = 'Exclusive', @LockOwner = 'session', @LockTimeout = 0
In that call, LockOwner is set to 'session', and there is also another possible value, 'transaction'.
I do know that if I choose transaction, I need to write the stored proc body within a transaction.
I searched for the differences between the two but did not have any luck.
Can anybody please help me out: what is the difference between them, apart from having to call sp_getapplock inside a transaction?
I would also appreciate any best practices for solving my problem (preventing the stored procedure from concurrent execution).
I think the SEQUENCE object might work better for you than the applock solution. It acts a little like an identity column without a table and will allow you to generate the sequential part of your random number in a thread-safe way.
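For example, a minimal sketch (SEQUENCE requires SQL Server 2012 or later; the names here are illustrative):

-- A SEQUENCE hands out values atomically, so concurrent callers
-- never receive the same number.
CREATE SEQUENCE dbo.EmployeeSeq AS int START WITH 1 INCREMENT BY 1;
GO
DECLARE @sequential int;
SET @sequential = NEXT VALUE FOR dbo.EmployeeSeq;
SELECT @sequential;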

Transactions within loop within stored procedure

I'm working on a procedure that will update a large number of items on a remote server, using records from a local database. Here's the pseudocode.
CREATE PROCEDURE UpdateRemoteServer
    pre-processing
    get cursor with IDs of records to be updated
    while on cursor
        process the item
No matter how much we optimize it, the routine is going to take a while, so we don't want the whole thing to be processed as a single transaction. The items are flagged after being processed, so it should be possible to pick up where we left off if the process is interrupted.
Wrapping the contents of the loop ("process the item") in a begin/commit tran does not do the trick... it seems that the whole statement
EXEC UpdateRemoteServer
is treated as a single transaction. How can I make each item process as a complete, separate transaction?
Note that I would love to run these as "non-transacted updates", but that option is only available (so far as I know) in 2008.
EXEC procedure does not create a transaction. A very simple test will show this:
create procedure usp_foo
as
begin
    select @@trancount;
end
go

exec usp_foo;
The @@trancount inside usp_foo is 0, so the EXEC statement does not start an implicit transaction. If you have a transaction already started when entering UpdateRemoteServer, it means somebody started that transaction; I can't say who.
That being said, using remote servers and DTC to update items is going to perform quite badly. Is the other server also at least SQL Server 2005? Maybe you can queue the update requests and use messaging between the local and remote servers, having the remote server perform the updates based on the information in the message. That would perform significantly better because both servers would only have to deal with local transactions, and you would get much better availability due to the loose coupling of queued messaging.
Updated
Cursors actually don't start transactions. Typical cursor-based batch processing groups updates into transactions of a certain size. This is fairly common for overnight jobs, as it allows for better performance (log-flush throughput due to larger transaction sizes), and jobs can be interrupted and resumed without losing everything. A simplified version of such a batch-processing loop looks like this:
create procedure usp_UpdateRemoteServer
as
begin
    declare @id int, @batch int;
    set nocount on;
    set @batch = 0;

    declare crsFoo cursor
        forward_only static read_only
        for select object_id
            from sys.objects;
    open crsFoo;

    begin transaction;
    fetch next from crsFoo into @id;
    while @@fetch_status = 0
    begin
        -- process here
        declare @transactionId bigint;
        select @transactionId = transaction_id
        from sys.dm_tran_current_transaction;
        print @transactionId;

        set @batch = @batch + 1;
        if @batch > 10
        begin
            commit;
            print @@trancount;
            set @batch = 0;
            begin transaction;
        end
        fetch next from crsFoo into @id;
    end
    commit;
    close crsFoo;
    deallocate crsFoo;
end
go

exec usp_UpdateRemoteServer;
I omitted the error-handling part (begin try / begin catch) and the fancy @@fetch_status checks (static cursors actually don't need them anyway). This demo code shows that during the run there are several different transactions started (different transaction IDs). Many batches also deploy transaction savepoints at each item processed so they can safely skip an item that causes an exception, using a pattern similar to the one in my link, but this does not apply to distributed transactions, since savepoints and DTC don't mix.
EDIT: as pointed out by Remus below, cursors do NOT open a transaction by default; thus, this is not the answer to the question posed by the OP. I still think there are better options than a cursor, but that doesn't answer the question.
Stu
ORIGINAL ANSWER:
The specific symptom you describe is due to the fact that a cursor opens a transaction by default, therefore no matter how you work it, you're gonna have a long-running transaction as long as you are using a cursor (unless you avoid locks altogether, which is another bad idea).
As others are pointing out, cursors SUCK. You don't need them 99.9999% of the time.
You really have two options if you want to do this at the database level with SQL Server:
Use SSIS to perform your operation; very fast, but may not be available to you in your particular flavor of SQL Server.
Because you're dealing with remote servers, and you're worried about connectivity, you may have to use a looping mechanism, so use WHILE instead and commit batches at a time. Although WHILE has many of the same issues as a cursor (looping still sucks in SQL), you avoid creating the outer transaction.
Stu
Are you running this only from within SQL Server, or from an app? If the latter, get the list to be processed, then loop in the app to process only the subsets required.
Then the transactions would be handled by your app, and would only lock the items being updated (or the pages they are in).
NEVER process one item at a time in a loop when you are doing transactional work. You can loop through records processing groups of them, but never ever do one record at a time. Do set-based inserts instead, and your performance will change from hours to minutes or even seconds. If you are using a cursor to insert, update, or delete, and it isn't handling at least 1000 rows in each statement (not one at a time), you are doing the wrong thing. Cursors are an extremely poor practice for such things.
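A hedged sketch of that set-based, batched pattern (the table and column names are hypothetical):

SET NOCOUNT ON;

-- Process rows in chunks of 1000 instead of one at a time; each UPDATE
-- is its own short transaction under the default autocommit behavior.
WHILE 1 = 1
BEGIN
    UPDATE TOP (1000) t
    SET    t.Processed = 1 -- plus whatever real per-row work applies
    FROM   dbo.ItemsToUpdate AS t
    WHERE  t.Processed = 0;

    IF @@ROWCOUNT = 0 BREAK; -- no rows left to process
END;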
Just an idea ..
Only process a few items when the procedure is called (e.g. only get the TOP 10 items to process)
Process those
Hopefully, this will be the end of the transaction.
Then write a wrapper that calls the procedure as long as there is more work to do (either use a simple count(..) to see if there are items, or have the procedure return true to indicate that there is more work to do).
Don't know if this works, but maybe the idea is helpful.
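For what it's worth, a hedged sketch of that wrapper (the table and column names are hypothetical, and the procedure is assumed to process and flag only a few items per call):

DECLARE @more int = 1;
WHILE @more > 0
BEGIN
    EXEC dbo.UpdateRemoteServer; -- processes and flags the TOP 10 items

    -- Simple count to see whether any unflagged work remains
    SELECT @more = COUNT(*)
    FROM dbo.RemoteItems
    WHERE Processed = 0;
END;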