SQL Server 2005 Deadlock Problem

I'm running into a deadlock when I try to lock some records so that no other process (a Windows service) picks up the same items to service them, then update their status, and then return a recordset.
Can you please tell me why I am getting the deadlock when this proc is invoked?
CREATE PROCEDURE [dbo].[sp_LoadEventsTemp]
(
@RequestKey varchar(20),
@RequestType varchar(20),
@Status varchar(20),
@ScheduledDate smalldatetime = null
)
AS
BEGIN
declare @LoadEvents table
(
id int
)
BEGIN TRANSACTION
if (@ScheduledDate is null)
Begin
insert into @LoadEvents (id)
(
Select eventqueueid FROM eventqueue
WITH (HOLDLOCK, ROWLOCK)
WHERE requestkey = @RequestKey
and requesttype = @RequestType
and [status] = @Status
)
END
else
BEGIN
insert into @LoadEvents (id)
(
Select eventqueueid FROM eventqueue
WITH (HOLDLOCK, ROWLOCK)
WHERE requestkey = @RequestKey
and requesttype = @RequestType
and [status] = @Status
and (convert(smalldatetime,scheduleddate) <= @ScheduledDate)
)
END
update eventqueue set [status] = 'InProgress'
where eventqueueid in (select id from @LoadEvents)
IF @@ERROR <> 0
BEGIN
ROLLBACK TRANSACTION
END
ELSE
BEGIN
COMMIT TRANSACTION
select * from eventqueue
where eventqueueid in (select id from @LoadEvents)
END
END
Thanks in advance.

Do you have a non-clustered index defined as:
CREATE NONCLUSTERED INDEX NC_eventqueue_requestkey_requesttype_status
ON eventqueue(requestkey, requesttype, status)
INCLUDE (eventqueueid)
and another on eventqueueid?
BTW, wrapping the scheduleddate column in a conversion to smalldatetime in the WHERE clause prevents the optimizer from seeking on any index on that column.
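A quick way to see the effect of wrapping a column in a function is to compare query plans. The sketch below uses SQLite (Python's built-in sqlite3) purely as a stand-in, since the principle is the same but SQL Server's plan output looks different; the table mirrors the question's eventqueue, and the index name ix_eventqueue_scheduleddate is hypothetical. The bare column comparison gets an index SEARCH, while the function-wrapped version falls back to a SCAN.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the plan detail text SQLite reports for `sql` (the last column of each plan row)."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eventqueue (eventqueueid INTEGER PRIMARY KEY, scheduleddate TEXT)")
# Hypothetical index name; the column matches the question's table.
conn.execute("CREATE INDEX ix_eventqueue_scheduleddate ON eventqueue (scheduleddate)")

# Bare column compared with a parameter: the planner can seek into the index.
sargable = plan(conn, "SELECT eventqueueid FROM eventqueue WHERE scheduleddate <= ?",
                ("2024-01-01",))
# Column wrapped in a function: every entry has to be read and converted first.
wrapped = plan(conn, "SELECT eventqueueid FROM eventqueue WHERE datetime(scheduleddate) <= ?",
               ("2024-01-01",))

print(sargable)
print(wrapped)
```

If the conversion is really needed, converting the parameter once instead of the column on every row keeps the predicate seekable.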

First of all, as you're running SQL Server, I'd recommend installing the Performance Dashboard, a very handy tool for identifying which locks are currently held on the server.
Second, take a trace of your server with SQL Server Profiler (already installed) and make sure that, on the Events Selection tab, you select Locks > Deadlock graph, which will show what is causing the deadlock.
You need to have a very clear picture of what a deadlock is before you start troubleshooting one.
Whenever a table or row in the DB is accessed, a lock is taken.
Let's call the two sessions SPID 51 and SPID 52 (SPID = server process ID):
SPID 51 locks Cell A
SPID 52 locks Cell B
If, later in the same transaction, SPID 51 requests Cell B, it waits until SPID 52 releases it.
If, later in its own transaction, SPID 52 requests Cell A, you have a deadlock, because this situation can never finish (51 is waiting for 52 and 52 for 51).
I have to tell you that it isn't easy to troubleshoot, but you have to dig deeper to find the resolution.
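One likely way the question's procedure produces exactly this cycle, hedged since we haven't seen the deadlock graph: HOLDLOCK makes the SELECT keep its shared (S) locks until the transaction ends, so two concurrent calls can both hold S on the same rows and then both need an exclusive (X) lock for the UPDATE. The toy model below is a sketch, not SQL Server's lock manager: one row, only the S, U and X modes (compatibilities taken from SQL Server's documented matrix), and key-range locks ignored. It shows that reading with S deadlocks, while reading with U (the UPDLOCK hint) makes the second session wait at the read and both finish.

```python
# Subset of SQL Server's lock compatibility matrix: COMPATIBLE[(requested, held)].
COMPATIBLE = {
    ("S", "S"): True,  ("S", "U"): True,  ("S", "X"): False,
    ("U", "S"): True,  ("U", "U"): False, ("U", "X"): False,
    ("X", "S"): False, ("X", "U"): False, ("X", "X"): False,
}


def simulate(read_mode):
    """Sessions A and B each read the same row with `read_mode`, then update it (X).

    Returns "deadlock" if both sessions end up waiting on each other,
    otherwise the order in which the sessions committed.
    """
    held = {}        # session -> lock mode currently granted on the row
    waiting = {}     # session -> lock mode it is blocked on
    committed = []

    def grantable(session, mode):
        # Granted only if compatible with every *other* session's lock.
        return all(COMPATIBLE[(mode, other)] for s, other in held.items() if s != session)

    def request(session, mode):
        if not grantable(session, mode):
            waiting[session] = mode      # blocked; any lock it already holds is kept
            return
        held[session] = mode             # grant, or convert the existing lock
        waiting.pop(session, None)
        if mode == "X":                  # the UPDATE ran: commit and release
            committed.append(session)
            del held[session]
            for s, m in list(waiting.items()):
                if s in waiting and grantable(s, m):
                    request(s, m)

    scripts = {"A": [read_mode, "X"], "B": [read_mode, "X"]}
    progressed = True
    while progressed:
        progressed = False
        for session in ("A", "B"):       # interleave one statement at a time
            if session not in waiting and scripts[session]:
                request(session, scripts[session].pop(0))
                progressed = True
    return "deadlock" if waiting else committed
```

simulate("S") models the HOLDLOCK read and ends in "deadlock"; simulate("U") models an UPDLOCK read and both sessions commit in turn.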

Deadlocks happen most often (in my experience) when different resources are locked within different transactions in different orders.
Imagine 2 processes using resource A and B, but locking them in different orders.
- Process 1 locks resource A, then resource B
- Process 2 locks resource B, then resource A
The following then becomes possible:
- Process 1 locks resource A
- Process 2 locks resource B
- Process 1 tries to lock resource B, then stops and waits as Process 2 has it
- Process 2 tries to lock resource A, then stops and waits as Process 1 has it
- Both processes are waiting for each other: deadlock
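The scenario in the list above can be reproduced with two threads and two locks. This is a sketch, assuming Python's threading.Lock as a stand-in for database locks; the barriers only force the bad interleaving to happen every time, and the lock timeouts stand in for SQL Server's deadlock monitor, which would instead pick one victim (error 1205) and let the other process continue. Locking in opposite orders leaves both processes stuck; locking in the same order lets both finish.

```python
import threading


def run(order_1, order_2, lock_timeout=0.5, sync_timeout=1.0):
    """Two processes each lock resources A and B in the given order (e.g. "AB" or "BA").

    Returns {process name: True if it obtained both locks, False if it gave up waiting}.
    """
    locks = {"A": threading.Lock(), "B": threading.Lock()}
    start = threading.Barrier(2, timeout=sync_timeout)
    done = threading.Barrier(2, timeout=sync_timeout)
    results = {}

    def sync(barrier):
        try:
            barrier.wait()
        except threading.BrokenBarrierError:
            pass  # the other process is blocked on our first lock; carry on alone

    def process(name, order):
        first, second = locks[order[0]], locks[order[1]]
        with first:
            sync(start)                  # both now hold their first lock, if they could get it
            got = second.acquire(timeout=lock_timeout)
            if got:
                second.release()
            results[name] = got
            sync(done)                   # keep the first lock until both have tried

    threads = [threading.Thread(target=process, args=("process1", order_1)),
               threading.Thread(target=process, args=("process2", order_2))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

run("AB", "BA") reproduces the deadlock (neither process gets its second lock); run("AB", "AB") shows that a consistent lock order removes it.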
In your case we would need to see exactly where the SP falls over due to a deadlock (the update, I'd guess?) and any other processes that reference that table. It could be a trigger or something, which then gets deadlocked on a different table, not the table you're updating.
What I would do is use SQL Server 2005's OUTPUT syntax to avoid having to use the transaction...
UPDATE
eventqueue
SET
status = 'InProgress'
OUTPUT
inserted.*
WHERE
requestkey = @RequestKey
AND requesttype = @RequestType
AND status = @Status
AND (convert(smalldatetime,scheduleddate) <= @ScheduledDate OR @ScheduledDate IS NULL)

Related

Ensure data consistency when same stored procedure is called by windows service in interval of few seconds

We have a stored procedure that returns the list of pending items that need to be processed. A Windows service calls this stored procedure every 20 seconds to get the pending items for further processing.
There is a QueryTimestamp column in the Pending table. For pending items, QueryTimestamp is null. Once the stored procedure selects an item, its QueryTimestamp column is updated with the current date and time.
The body is as follows. No explicit transaction is used, so SQL Server's default isolation level applies.
DECLARE @workerPending TABLE
(
RowNum INT IDENTITY PRIMARY KEY,
[PendingId] BIGINT,
[CreatedDate] DATETIME
)
INSERT INTO @workerPending ([PendingId], [CreatedDate])
SELECT
[p].[PendingId] AS [PendingId],
[p].CreatedDate
FROM
[pending] [p]
WHERE
[p].QueryTimestamp IS NULL
ORDER BY
[p].[PendingId]
--Update pending table with current date time
UPDATE Pnd
SET QueryTimestamp = GETDATE()
FROM [Pending] Pnd
INNER JOIN @workerPending [wp] ON [wp].[PendingId] = Pnd.[PendingId]
If the stored procedure can't finish processing the first request within 20 seconds because of the data volume, the Windows service sends another call to the stored procedure, and both requests are processed at the same time.
My concern is: can both requests return some of the same pending records?
Do we need to implement a lock on the Pending table?
Please suggest how we can ensure data consistency, so that if another request comes to the stored procedure while the previous request is still in progress, no duplicate record is returned.
EDIT: There is another Windows service that calls a different SP, which inserts records into the Pending table with QueryTimestamp set to null.
A simple but effective solution: when the service wants to call the SP, do this:
Read in table Settings a value that tells you if the thread is already running
If not already running then
begin
Write to table Settings that the thread has started
Commit this update
Call your SP
Write to table Settings that the thread has finished
Commit this update
end
You can do the UPDATE and SELECT in a single step with an OUTPUT clause, e.g.:
UPDATE pending
SET QueryTimestamp = GETDATE()
output inserted.PendingId, inserted.CreatedDate
into @workerPending(PendingId,CreatedDate)
WHERE QueryTimestamp IS NULL
Or, a more robust pattern that lets you limit and order the results while several callers retrieve rows concurrently is to use a transaction with lock hints on the SELECT, e.g.:
begin transaction
INSERT INTO @workerPending ([PendingId], [CreatedDate])
SELECT top 100
[p].[PendingId] AS [PendingId],
[p].CreatedDate
FROM
[pending] [p] with (updlock, rowlock, readpast)
WHERE
[p].QueryTimestamp IS NULL
ORDER BY
[p].[PendingId];
UPDATE pending
SET QueryTimestamp = GETDATE()
where PendingId in (select PendingId from @workerPending);
commit transaction
See Using tables as Queues
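The duplicate-row risk the question asks about, and why a single UPDATE avoids it, can be sketched with SQLite (Python's built-in sqlite3) as a stand-in. Assumptions: the ClaimedBy column and the request tokens are hypothetical additions that play the role of SQL Server's OUTPUT clause (older SQLite builds lack RETURNING), and the "concurrency" is simulated by interleaving the two requests' statements by hand. In SQL Server, the UPDATE's own row locks are what stop two concurrent UPDATEs from claiming the same row.

```python
import sqlite3


def make_db():
    # Autocommit mode: each statement is its own transaction, as in the question's SP.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE pending (PendingId INTEGER PRIMARY KEY, "
                 "QueryTimestamp TEXT, ClaimedBy TEXT)")
    conn.executemany("INSERT INTO pending (PendingId) VALUES (?)",
                     [(i,) for i in range(1, 6)])
    return conn


def two_step_interleaved(conn):
    """The question's SELECT-then-UPDATE, with request 2 reading before request 1 updates."""
    select = "SELECT PendingId FROM pending WHERE QueryTimestamp IS NULL ORDER BY PendingId"
    request1 = [r[0] for r in conn.execute(select)]
    request2 = [r[0] for r in conn.execute(select)]
    for ids in (request1, request2):
        conn.executemany("UPDATE pending SET QueryTimestamp = datetime('now') WHERE PendingId = ?",
                         [(i,) for i in ids])
    return request1, request2


def claim_batch(conn, token):
    """Mark and tag rows in ONE statement, then read back only this caller's rows."""
    conn.execute("UPDATE pending SET QueryTimestamp = datetime('now'), ClaimedBy = ? "
                 "WHERE QueryTimestamp IS NULL", (token,))
    return [r[0] for r in conn.execute(
        "SELECT PendingId FROM pending WHERE ClaimedBy = ? ORDER BY PendingId", (token,))]
```

With the two-step version both requests get rows 1 to 5; with claim_batch the second request only sees rows inserted after the first claim.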

How to handle concurrency when Creating an Queue with SQL table as backend

I want to create a simple queue with a sql database as backend.
The table has the fields id, taskname, runat (datetime) and hidden (datetime).
I want to ensure a queue item is run once and only once.
The idea is that when a client wants to dequeue, a stored procedure selects the first item (sorted by runat, with hidden < now), sets the hidden field to the current time + 2 min and returns the item.
How does MS SQL (Azure, to be precise) work? Will two clients be able to run at the same time, both set the same item to hidden and return it? Or can I be sure that they run one by one, and the second one will not return the same item because the first one changed the hidden field?
The key is to get a lock (row or table) on the queue item you are receiving. There are a couple of ways to do this; my favorite is the UPDATE with an OUTPUT clause. Either will produce serialized access to the table.
Example:
CREATE PROCEDURE spGetNextItem_output
AS
BEGIN
BEGIN TRAN
UPDATE TOP(1) Messages
SET [Status] = 1
OUTPUT INSERTED.MessageID, INSERTED.Data
WHERE [Status] = 0
COMMIT TRAN
END
GO
CREATE PROCEDURE spGetNextItem_tablockx
AS
BEGIN
BEGIN TRAN
DECLARE @MessageID int, @Data xml
SELECT TOP(1) @MessageID = MessageID, @Data = Data
FROM Messages WITH (ROWLOCK, XLOCK, READPAST) --lock the row, skip other locked rows
WHERE [Status] = 0
UPDATE Messages
SET [Status] = 1
WHERE MessageID = @MessageID
SELECT @MessageID AS MessageID, @Data as Data
COMMIT TRAN
END
Table definition:
CREATE TABLE [dbo].[Messages](
[MessageID] [int] IDENTITY(1,1) NOT NULL,
[Status] [int] NOT NULL,
[Data] [xml] NOT NULL,
CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED
(
[MessageID] ASC
)
)
Windows Azure SQL Database is going to behave just like a SQL Server database in terms of concurrency. This is a database problem, not a Windows Azure problem.
Now: If you went with Windows Azure Queues (or Service Bus Queues) rather than implementing your own, then the behavior is well documented. For instance: with Azure Queues, first-in gets the queue item, and then the item is marked as invisible until it's either deleted or a timeout period has been reached.
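The same visibility-timeout idea can be built on the SQL table the question describes. The sketch below uses SQLite (Python's built-in sqlite3) as a stand-in; dequeue, complete and LEASE_SECONDS are hypothetical names, and runat/hidden are plain numbers of seconds instead of datetimes to keep it short. The guard in the UPDATE (hidden < now) means that if two clients pick the same candidate, only one UPDATE changes a row; the loser sees rowcount 0 and looks again. Note that this gives at-least-once rather than exactly-once delivery: an item whose client dies before calling complete reappears when its lease expires.

```python
import sqlite3

LEASE_SECONDS = 120  # the question's "current time + 2 min"


def make_queue():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, taskname TEXT NOT NULL, "
                 "runat REAL NOT NULL, hidden REAL NOT NULL DEFAULT 0)")
    return conn


def dequeue(conn, now):
    """Claim the first visible item by pushing its `hidden` time into the future."""
    while True:
        row = conn.execute(
            "SELECT id, taskname FROM queue WHERE runat <= ? AND hidden < ? "
            "ORDER BY runat LIMIT 1", (now, now)).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE queue SET hidden = ? WHERE id = ? AND hidden < ?",
            (now + LEASE_SECONDS, row[0], now))
        if cur.rowcount == 1:
            return row
        # rowcount 0: another client claimed this row after our SELECT; look again


def complete(conn, item_id):
    """Delete a finished item so it never becomes visible again."""
    conn.execute("DELETE FROM queue WHERE id = ?", (item_id,))
```

For example, with two tasks queued, two dequeue calls at the same time get different items, and an item that was never completed becomes visible again once its lease has run out.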

SQL Delete Statement Didnt Delete

Just want to get some views/possible leads on an issue I have.
I have a stored procedure that updates/deletes a record from a table in my database. The table it deletes from is a live table that temporarily holds the data, and the procedure also updates records in an archive table (for reporting etc.). It works normally and I haven't had any issues.
However, I recently worked on a Windows service to monitor our system (running 24/7). It uses an HTTP call to start a program, and once this program has finished, it runs the stored procedure mentioned above to delete the redundant data. Basically, the service just runs the program quickly to make sure it's functioning correctly.
I have noticed recently that the data isn't always being deleted. Looking through the logs, I see no errors being reported, and I can even see that the record in the database has been updated correctly. It just doesn't get deleted.
This unfortunately has a knock-on effect on the monitoring service: because it runs continuously, it sends out alerts, since the data can't be duplicated in the live table, which is why the data needs to be deleted.
Currently I have a procedure in place to clear out any old data (3 hours).
Result has the value 'Rejected'.
Below is the stored procedure:
DECLARE @PostponeUntil DATETIME;
DECLARE @Attempts INT;
DECLARE @InitialTarget VARCHAR(8);
DECLARE @MaxAttempts INT;
DECLARE @APIDate DATETIME;
--UPDATE tCallbacks SET Result = @Result WHERE CallbackID = @CallbackID AND UPPER(Result) = 'PENDING';
UPDATE tCallbacks SET Result = @Result WHERE ID = (SELECT TOP 1 ID FROM tCallbacks WHERE CallbackID = @CallbackID ORDER BY ID DESC)
SELECT @InitialTarget = C.InitialTarget, @Attempts = LCB.Attempts, @MaxAttempts = C.CallAttempts
FROM tConfigurations C WITH (NOLOCK)
LEFT JOIN tLiveCallbacks LCB ON LCB.ID = @CallbackID
WHERE C.ID = LCB.ConfigurationID;
IF ((UPPER(@Result) <> 'SUCCESSFUL') AND (UPPER(@Result) <> 'MAXATTEMPTS') AND (UPPER(@Result) <> 'DESTBAR') AND (UPPER(@Result) <> 'REJECTED')) BEGIN
--INSERT A NEW RECORD FOR RTNR/BUSY/UNSUCCESSFUL/REJECT
--Create Callback Archive Record
SELECT @APIDate = CallbackRequestDate FROM tCallbacks WHERE Attempts = 0 AND CallbackID = @CallbackID;
BEGIN TRANSACTION
INSERT INTO tCallbacks (CallbackID, ConfigurationID, InitialTarget, Agent, AgentPresentedCLI, Callee, CalleePresentedCLI, CallbackRequestDate, Attempts, Result, CBRType, ExternalID, ASR, SessionID)
SELECT ID, ConfigurationID, @InitialTarget, Agent, AgentPresentedCLI, Callee, CalleePresentedCLI, @APIDate, @Attempts + 1, 'PENDING', CBRType, ExternalID, ASR, SessionID
FROM tLiveCallbacks
WHERE ID = @CallbackID;
UPDATE LCB
SET PostponeUntil = DATEADD(second, C.CallRetryPeriod, GETDATE()),
Pending = 0,
Attempts = @Attempts + 1
FROM tLiveCallbacks LCB
LEFT JOIN tConfigurations C ON C.ID = LCB.ConfigurationID
WHERE LCB.ID = @CallbackID;
COMMIT TRANSACTION
END
ELSE BEGIN
-- Update the Callbacks archive, when Successful or Max Attempts or DestBar.
IF EXISTS (SELECT ID FROM tLiveCallbacks WHERE ID = @CallbackID) BEGIN
BEGIN TRANSACTION
UPDATE tCallbacks
SET Attempts = @Attempts
WHERE ID IN (SELECT TOP (1) ID
FROM tCallbacks
WHERE CallbackID = @CallbackID
ORDER BY Attempts DESC);
-- The live callback should no longer be active now. As its either been answered or reach the max attempts.
DELETE FROM tLiveCallbacks WHERE ID = @CallbackID;
COMMIT
END
END
You need to fix your transaction processing. What is happening is that one statement is failing, but since you don't have a TRY...CATCH block, only the statement that failed is rolled back; the rest of the changes are not.
You should never have a BEGIN TRAN without a TRY...CATCH block and a rollback on error. In something like this, I personally also prefer to put the errors and associated data into a table variable (which is not affected by the rollback) and then insert them into an exception table after the rollback. This way the data retains its integrity and you can look up what the problem was.
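The all-or-nothing pattern described above can be sketched with SQLite (Python's built-in sqlite3) as a stand-in; in T-SQL the same shape is BEGIN TRY / BEGIN TRAN ... COMMIT / END TRY with a ROLLBACK in BEGIN CATCH (SET XACT_ABORT ON is a common companion). The tables are simplified, hypothetical versions of tCallbacks and tLiveCallbacks, and the simulate_failure flag just injects a failing statement between the archive UPDATE and the DELETE. Like SQL Server without XACT_ABORT, SQLite does not end the transaction when one statement fails, so the explicit ROLLBACK is what undoes the earlier UPDATE.

```python
import sqlite3


def make_db():
    # isolation_level=None: sqlite3 opens no implicit transactions; we issue BEGIN ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE callbacks (callback_id INTEGER PRIMARY KEY, attempts INTEGER NOT NULL)")
    conn.execute("CREATE TABLE live_callbacks (id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO callbacks VALUES (7, 1)")
    conn.execute("INSERT INTO live_callbacks VALUES (7)")
    return conn


def close_callback(conn, callback_id, attempts, simulate_failure=False):
    """Update the archive row and delete the live row as one all-or-nothing unit."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE callbacks SET attempts = ? WHERE callback_id = ?",
                     (attempts, callback_id))
        if simulate_failure:
            conn.execute("DELETE FROM missing_table")  # fails: no such table
        conn.execute("DELETE FROM live_callbacks WHERE id = ?", (callback_id,))
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undoes the UPDATE too, not just the failed statement
        raise                     # re-raise so the caller still sees (and can log) the error
```

After a failed call, the archive row still has its old attempts value and the live row is still there; a successful call applies both changes.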

How to detect a record is locked?

With SQL Server 2008, how can I detect if a record is locked?
EDIT:
I need to know this, so I can notify the user that the record is not accessible because the record is blocked.
In most circumstances with SQL 2008 you can do something like:
if exists(select 0 from YourTable with (nolock) where id = @id)
and not exists(select 0 from YourTable with (readpast) where id = @id)
begin
-- Record is locked! Do something.
end
If that is not enough (that is, you need to ignore table-level locks as well), use the NOWAIT hint that throws an error if there's a lock.

The best way to use a DB table as a job queue (a.k.a batch queue or message queue)

I have a database table with ~50K rows in it; each row represents a job that needs to be done. I have a program that extracts a job from the DB, does the job and puts the result back in the DB. (This system is running right now.)
Now I want to allow more than one processing task to do jobs, but be sure that no job is done twice (as a performance concern, not because it would cause other problems). Because the access goes through a stored procedure, my current thought is to replace that stored procedure with something like this:
update tbl
set owner = connection_id()
where available and owner is null limit 1;
select stuff
from tbl
where owner = connection_id();
BTW: workers might drop their connection between getting a job and submitting the results. Also, I don't expect the DB to even come close to being the bottleneck unless I mess that part up (~5 jobs per minute).
Are there any issues with this? Is there a better way to do this?
Note: the "Database as an IPC anti-pattern" is only slightly apropos here because
I'm not doing IPC (there is no process generating the rows, they all already exist right now) and
the primary gripe described for that anti-pattern is that it results in unneeded load on the DB as processes wait for messages (in my case, if there are no messages, everything can shutdown as everything is done)
The best way to implement a job queue in a relational database system is to use SKIP LOCKED.
SKIP LOCKED is a lock acquisition option that applies to both read/share (FOR SHARE) and write/exclusive (FOR UPDATE) locks, and it is widely supported nowadays:
Oracle 10g and later
PostgreSQL 9.5 and later
SQL Server 2005 and later (through the READPAST table hint rather than a SKIP LOCKED keyword)
MySQL 8.0 and later
Now, consider a post table with the columns id, body, status and title.
The status column is used as an Enum, having the values of:
PENDING (0),
APPROVED (1),
SPAM (2).
If we have multiple concurrent users trying to moderate the post records, we need a way to coordinate their efforts to avoid having two moderators review the same post row.
So, SKIP LOCKED is exactly what we need. If two concurrent users, Alice and Bob, execute the following SELECT queries which lock the post records exclusively while also adding the SKIP LOCKED option:
[Alice]:
SELECT
p.id AS id1_0_,
p.body AS body2_0_,
p.status AS status3_0_,
p.title AS title4_0_
FROM
post p
WHERE
p.status = 0
ORDER BY
p.id
LIMIT 2
FOR UPDATE OF p SKIP LOCKED
[Bob]:
SELECT
p.id AS id1_0_,
p.body AS body2_0_,
p.status AS status3_0_,
p.title AS title4_0_
FROM
post p
WHERE
p.status = 0
ORDER BY
p.id
LIMIT 2
FOR UPDATE OF p SKIP LOCKED
We can see that Alice can select the first two entries while Bob selects the next two records. Without SKIP LOCKED, Bob's lock acquisition request would block until Alice releases the lock on the first two records.
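The Alice/Bob behaviour can be mimicked without a database to show what SKIP LOCKED does at the row level: try to lock each candidate row without waiting, and move on if someone else holds it. This is a sketch only; lock_pending is a hypothetical helper, one threading.Lock per row stands in for the database's row locks, and both "sessions" run in a single thread so the result is deterministic (a Lock is not owned by a thread, so the second non-blocking acquire simply fails).

```python
import threading


def lock_pending(rows, row_locks, limit):
    """Mimic SELECT ... WHERE status = 0 ORDER BY id LIMIT n FOR UPDATE SKIP LOCKED.

    Walks PENDING rows in id order and try-locks each one without waiting;
    rows that another session already holds are skipped instead of blocking.
    """
    taken = []
    for row in sorted(rows, key=lambda r: r["id"]):
        if row["status"] != 0:            # only PENDING (0) posts
            continue
        if row_locks[row["id"]].acquire(blocking=False):
            taken.append(row["id"])
            if len(taken) == limit:
                break
    return taken


posts = [{"id": i, "status": 0} for i in range(1, 6)]
locks = {p["id"]: threading.Lock() for p in posts}

alice = lock_pending(posts, locks, 2)     # Alice locks the first two pending posts
bob = lock_pending(posts, locks, 2)       # Bob skips them and gets the next two
```

A plain FOR UPDATE corresponds to a blocking acquire on the first row, which is where Bob would sit waiting for Alice's transaction to end.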
Here's what I've used successfully in the past:
MsgQueue table schema
MsgId identity -- NOT NULL
MsgTypeCode varchar(20) -- NOT NULL
SourceCode varchar(20) -- process inserting the message -- NULLable
State char(1) -- 'N'ew if queued, 'A'(ctive) if processing, 'C'ompleted, default 'N' -- NOT NULL
CreateTime datetime -- default GETDATE() -- NOT NULL
Msg varchar(255) -- NULLable
Your message types are what you'd expect - messages that conform to a contract between the process(es) inserting and the process(es) reading, structured with XML or your other choice of representation (JSON would be handy in some cases, for instance).
Then 0-to-n processes can be inserting, and 0-to-n processes can be reading and processing the messages. Each reading process typically handles a single message type. Multiple instances of a process type can be running for load-balancing.
The reader pulls one message and changes the state to "A"ctive while it works on it. When it's done it changes the state to "C"omplete. It can delete the message or not depending on whether you want to keep the audit trail. Messages of State = 'N' are pulled in MsgType/Timestamp order, so there's an index on MsgType + State + CreateTime.
Variations:
State for "E"rror.
Column for Reader process code.
Timestamps for state transitions.
This has provided a nice, scalable, visible, simple mechanism for doing a number of things like you are describing. If you have a basic understanding of databases, it's pretty foolproof and extensible.
Code from comments:
CREATE PROCEDURE GetMessage (@MsgType VARCHAR(20))
AS
DECLARE @MsgId INT
BEGIN TRAN
-- UPDLOCK stops a second reader from picking the same row; READPAST skips rows other readers have locked
SELECT TOP 1 @MsgId = MsgId
FROM MsgQueue WITH (UPDLOCK, READPAST)
WHERE MsgTypeCode = @MsgType AND State = 'N'
ORDER BY CreateTime
IF @MsgId IS NOT NULL
BEGIN
UPDATE MsgQueue
SET State = 'A'
WHERE MsgId = @MsgId
SELECT MsgId, Msg
FROM MsgQueue
WHERE MsgId = @MsgId
END
ELSE
BEGIN
SELECT MsgId = NULL, Msg = NULL
END
COMMIT TRAN
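Instead of having owner = null when it isn't owned, you should set it to a fake "nobody" record. In Oracle, a single-column B-tree index doesn't store entries for rows whose indexed value is NULL, so a search for owner IS NULL can't use that index and you might end up with a table scan. (This is for Oracle; SQL Server does include NULLs in its indexes.)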
Just as a possible technology change, you might consider using MSMQ or something similar.
Each of your jobs / threads could query the message queue to see if a new job is available. Because the act of reading a message removes it from the queue, you are ensured that only one job / thread gets the message.
Of course, this is assuming you are working with a Microsoft platform.
See Vlad's answer for context; I'm just adding the equivalent in Oracle because there are a few "gotchas" to be aware of.
The
SELECT * FROM t order by x limit 2 FOR UPDATE OF t SKIP LOCKED
will not translate directly to Oracle in the way you might expect. If we look at a few options of translation, we might try any of the following:
SQL> create table t as
2 select rownum x
3 from dual
4 connect by level <= 100;
Table created.
SQL> declare
2 rc sys_refcursor;
3 begin
4 open rc for select * from t order by x for update skip locked fetch first 2 rows only;
5 end;
6 /
open rc for select * from t order by x for update skip locked fetch first 2 rows only;
*
ERROR at line 4:
ORA-06550: line 4, column 65:
PL/SQL: ORA-00933: SQL command not properly ended
ORA-06550: line 4, column 15:
PL/SQL: SQL Statement ignored
SQL> declare
2 rc sys_refcursor;
3 begin
4 open rc for select * from t order by x fetch first 2 rows only for update skip locked ;
5 end;
6 /
declare
*
ERROR at line 1:
ORA-02014: cannot select FOR UPDATE from view with DISTINCT, GROUP BY, etc.
ORA-06512: at line 4
or perhaps try falling back to the ROWNUM option
SQL> declare
2 rc sys_refcursor;
3 begin
4 open rc for select * from ( select * from t order by x ) where rownum <= 10 for update skip locked;
5 end;
6 /
declare
*
ERROR at line 1:
ORA-02014: cannot select FOR UPDATE from view with DISTINCT, GROUP BY, etc.
ORA-06512: at line 4
And you won't get any joy. You need to control the fetching of the "n" rows yourself, so you can code up something like:
SQL> declare
2 rc sys_refcursor;
3 res1 sys.odcinumberlist := sys.odcinumberlist();
4 begin
5 open rc for select * from t order by x for update skip locked;
6 fetch rc bulk collect into res1 limit 10;
7 end;
8 /
PL/SQL procedure successfully completed.
You are trying to implement the "Database as IPC" antipattern. Look it up to understand why you should consider redesigning your software properly.