Avoid deadlocking within a serializable isolation-level transaction?

I am trying to implement an event source in SQL Server, and have been experiencing deadlocking issues.
In my design, events are grouped by DatasetId and each event is written with a SequenceId, with the requirement that, for a given DatasetId, SequenceIds are serial, beginning at 1 and increasing one at a time with each event, never missing a value and never repeating one either.
My Events table looks something like:
CREATE TABLE [Events]
(
[Id] [BIGINT] IDENTITY(1,1) NOT NULL,
[DatasetId] [BIGINT] NOT NULL,
[SequenceId] [BIGINT] NOT NULL,
[Value] [NVARCHAR](MAX) NOT NULL
)
I also have a non-unique, non-clustered index on the DatasetId column.
In order to insert into this table with the above restrictions on SequenceId, I have been inserting rows under a transaction using Serializable isolation level, and calculating the required SequenceId manually within this transaction as the max of all existing SequenceIds plus one:
DECLARE @DatasetId BIGINT = 1, @Value NVARCHAR(MAX) = N'I am an event.';

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
BEGIN TRY
    DECLARE @SequenceId BIGINT;

    SELECT @SequenceId = ISNULL(MAX([SequenceId]), 0) + 1
    FROM [Events]
    WHERE [DatasetId] = @DatasetId;

    INSERT INTO [Events] ([DatasetId], [SequenceId], [Value])
    VALUES (@DatasetId, @SequenceId, @Value);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
END CATCH
This has worked fine in terms of the guarantees that I require on the SequenceId column. However, I have been experiencing deadlocks when trying to insert multiple rows in parallel, even when those rows are for different DatasetIds.
The behaviour seems to be that the query that generates a SequenceId in the first connection blocks the same query in the second connection, and the second connection's attempt in turn blocks the first connection's insert, so neither transaction can complete; hence the deadlock.
Is there a means of avoiding this whilst still gaining the benefits of a Serializable transaction isolation level?
Another technique I have been considering is reducing the isolation level and instead using sp_getapplock to manually acquire a lock on the table for a given DatasetId, meaning I can then ensure I can generate a consistent SequenceId. Once the transaction has been committed/rolled back, the lock would automatically be released. Is this approach reasonable, or is manually managing locks like this considered an anti-pattern?
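A minimal sketch of that sp_getapplock variant, reusing the variables from the snippet above (the resource-name format and the timeout are my own illustrative choices):

```sql
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
BEGIN TRY
    -- Serialise writers per DatasetId only; different datasets don't block each other.
    DECLARE @Resource NVARCHAR(255) = N'Events_Dataset_' + CAST(@DatasetId AS NVARCHAR(20));
    EXEC sp_getapplock
        @Resource = @Resource,
        @LockMode = 'Exclusive',
        @LockOwner = 'Transaction', -- released automatically at COMMIT/ROLLBACK
        @LockTimeout = 5000;

    DECLARE @SequenceId BIGINT;
    SELECT @SequenceId = ISNULL(MAX([SequenceId]), 0) + 1
    FROM [Events]
    WHERE [DatasetId] = @DatasetId;

    INSERT INTO [Events] ([DatasetId], [SequenceId], [Value])
    VALUES (@DatasetId, @SequenceId, @Value);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
END CATCH
```

Because the lock name embeds the DatasetId, two writers only serialize when they target the same dataset.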

Related

How to minimize primary key conflicts on non-identity column when several sources can concurrently insert into a table?

There is a table in our SQL Server 2012 to generate and send emails. Its simplified structure is as follows:
CREATE TABLE [dbo].[EmailRequest]
(
[EmailRequestID] [int] NOT NULL,
[EmailAddress] [varchar](1024) NULL,
[CCEmailAddress] [varchar](1024) NULL,
[EmailReplyToAddress] [varchar](128) NULL,
[EmailReplyToName] [varchar](128) NULL,
[EmailSubject] [varchar](max) NULL,
[EmailBody] [varchar](max) NULL,
[Attachments] [varchar](max) NULL,
[CreateDateTime] [datetime] NULL,
[_EmailSent] [varchar](1) NULL,
[_EmailSentDateTime] [datetime] NULL,
CONSTRAINT [PK_EmailRequest]
PRIMARY KEY CLUSTERED ([EmailRequestID] ASC)
)
I don't have any control over that table or the database where it sits; it is provided "as is".
Different programs and scripts insert records into the table at random intervals. I suspect most of them do this with queries like this:
INSERT INTO [dbo].[EmailRequest] ([EmailRequestID], ... <other affected columns>)
SELECT MAX([EmailRequestID]) + 1, <constants somehow generated in advance>
FROM [dbo].[EmailRequest];
I run a big SQL script which at some conditions must send emails as well. In my case the part responsible for emails looks like this:
INSERT INTO [dbo].[EmailRequest] ([EmailRequestID], ... <other affected columns>)
SELECT MAX([EmailRequestID]) + 1, <values collected from elsewhere>
FROM [dbo].[EmailRequest]
JOIN db1.dbo.table1 ON ...
JOIN db1.dbo.table2 ON ... and so on;
The "select" part takes its time, so by the time the data is actually inserted, the calculated MAX([EmailRequestID]) + 1 value may be stale and cause a primary key violation (a rare event, but an annoying one nevertheless).
The question: is there a way to design the query so it calculates MAX([EmailRequestID])+1 later, just before insert?
One of the options might be:
INSERT INTO [dbo].[EmailRequest] ([EmailRequestID], ... <other affected columns>)
SELECT
(SELECT MAX([EmailRequestID]) + 1
FROM [dbo].[EmailRequest]), <values collected from elsewhere>
FROM db1.dbo.table1
JOIN db1.dbo.table2 ON ... and so on;
but I am not sure if it brings any advantages.
So there may be another question: is there a way to see "time-lapse" of query execution?
Testing is a challenge, because no one sends requests to the test database, so I will never get a PK violation there.
Thank you.
Some amazing results from testing the accepted answer.
The elapsed time for original (real) query - 2000...2800 ms;
same query without "insert" part - 1200...1800 ms.
Note: the "select" statement collects information from three databases.
The test query retains real "select" statement (removed below):
DECLARE @mailTable TABLE
(
    mt_ID int,
    mt_EmailAddress varchar(1024),
    mt_CCEmailAddress varchar(1024),
    mt_EmailSubject varchar(max),
    mt_EmailBody varchar(max)
);

INSERT INTO @mailTable
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       am.ul_EMail, ... -- EmailAddress - the rest is removed
FROM <real live tables>;

INSERT INTO dbo.EmailRequest
    (EmailRequestID, _MessageID, EmailType, EmailAddress, CCEmailAddress,
     BulkFlag, EmailSubject, EmailBody, EmailReplyToAddress,
     CreateDateTime, SQLServerUpdated, SQLServerDateTime, _EmailSent)
SELECT (SELECT MAX(EmailRequestID) + 1 FROM dbo.EmailRequest),
       0, '*TEXT',        -- _MessageID, EmailType
       mt_EmailAddress,
       mt_CCEmailAddress,
       'N',               -- BulkFlag
       mt_EmailSubject,   -- EmailSubject
       mt_EmailBody,      -- EmailBody
       '', GETDATE(), '0', GETDATE(), '0'
FROM @mailTable;
Elapsed time on 10 runs for first part - 48 ms (worst), 8 (best);
elapsed time for second part, where collision may occur - 85 ms (worst), 1 ms (best)
If you cannot fix the table, you don't have any good options. The table should be defined as:
CREATE TABLE [dbo].[EmailRequest](
[EmailRequestID] [int] identity(1, 1) NOT NULL PRIMARY KEY,
. . .
Then the database will generate a unique id for each row.
If you didn't care about performance, you could lock the table to prevent other threads from writing to it. That's a lousy idea.
Your best bet is to capture the error and try again. No guarantee of when things will finish, and you could end up with different threads all deadlocking.
Wait, there is one thing you could do. You could use a sequence instead of the max id. If you control all the inserts into the table, then you could create a sequence and insert from that value rather than from the table. This would solve the performance problem and the need for a unique id. To really effect this, you would want to take the database down, bring it back up, set up all the code using the sequence, and then let'er rip.
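A minimal sketch of the sequence approach (the sequence name and start value are illustrative; SQL Server 2012 or later is required):

```sql
-- One-time setup: seed the sequence above the current MAX(EmailRequestID).
CREATE SEQUENCE dbo.EmailRequestSeq
    AS int
    START WITH 100000
    INCREMENT BY 1;

-- Every inserter then draws ids from the sequence instead of computing MAX()+1:
INSERT INTO dbo.EmailRequest (EmailRequestID, EmailAddress)
SELECT NEXT VALUE FOR dbo.EmailRequestSeq, v.EmailAddress
FROM (VALUES ('a@example.com'), ('b@example.com')) AS v (EmailAddress);
```

This only works if every writer is switched over: any program still computing MAX()+1 can collide with sequence-generated ids.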
That said, much the better solution is an identity primary key.
I know this might not be the most ideal solution, but I wanted to add it for completeness sake. Unfortunately, sometimes we don't have much of a choice in how we deal with certain problems.
Let me preface this with a disclaimer:
This may not work well in extremely high concurrency scenarios since it will hold an exclusive lock on the table. In practice, I've used this approach with up to 32 concurrent threads interacting with the table across 4 different machines and this was not the bottleneck. Make sure that the transaction here runs separately if at all possible.
The basic idea is that you perform your complex query first and store the results somewhere temporarily (a table variable in this example). You then take a lock on the table while locating the max ID, insert your records based on that ID, and then release the lock.
Assuming your table is structured like this:
CREATE TABLE EmailRequest (
EmailRequestID INT,
Field1 INT,
Field2 VARCHAR(20)
);
You could try something like this to push your inserts:
-- Define a table variable to hold the data to be inserted into the main table:
DECLARE @Emails TABLE(
    RowID INT IDENTITY(1, 1),
    Field1 INT,
    Field2 VARCHAR(20)
);

-- Run the complex query and store the results in the table variable:
INSERT INTO @Emails (Field1, Field2)
SELECT Field1, Field2
FROM (VALUES
    (10, 'DATA 1'),
    (11, 'DATA 2'),
    (15, 'DATA 3')
) AS a (Field1, Field2);

BEGIN TRANSACTION;

-- Determine the current max ID, and lock the table:
DECLARE @MaxEmailRequestID INT = (
    SELECT ISNULL(MAX(EmailRequestID), 0)
    FROM [dbo].[EmailRequest] WITH(TABLOCKX, HOLDLOCK)
);

-- Insert the records into the main table:
INSERT INTO EmailRequest (EmailRequestID, Field1, Field2)
SELECT
    @MaxEmailRequestID + RowID,
    Field1,
    Field2
FROM @Emails;

-- Commit to release the lock:
COMMIT;
If your complex query returns a large number of rows (thousands), you might want to consider using a temp table instead of a table variable.
Honestly, even if you remove the BEGIN TRANSACTION, COMMIT, and locking hints (WITH(TABLOCKX, HOLDLOCK)), this still has the potential to dramatically reduce the frequency of the issue you described. In that case, the disclaimer above would no longer apply.

Row concurrency problems with multithreading

I am having trouble with concurrency. A 3rd party software is executing my stored procedure, which needs to capture a unique list of IDs in a table. The code works until multithreading is brought into the mix (gasp).
I have tried various transaction features including isolation levels to seemingly no avail.
Essentially given the following, I need table 'IDList' to contain only the unique IDs that have ever been sent.
When other threads from the 3rd party software execute the example calling code, I consistently end up with duplicates in 'IDList'. My estimate of what is happening follows, but I am unable to resolve it:
Thread #1 runs the SELECT (with JOIN) in insertMissingIDs
Thread #2 runs the SELECT (with JOIN) in insertMissingIDs
Thread #1 runs the INSERT in insertMissingIDs
Thread #2 runs the INSERT in insertMissingIDs
Result: Duplicates
I realize the example may seem silly, I have boiled it down as not to reveal confidential code.
Calling code:
DECLARE @ids IdType

INSERT INTO @ids
SELECT '123'

EXEC insertMissingIDs @ids
User defined Type:
CREATE TYPE [dbo].[IdType] AS TABLE(
[ID] [nvarchar](250) NULL
)
Procedure:
ALTER PROCEDURE [dbo].[insertMissingIDs]
    @ids IdType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO IDList (ID)
    SELECT i.ID
    FROM @ids i
    LEFT JOIN IDList ON i.ID = IDList.ID
    WHERE IDList.ID IS NULL
END
Thanks in advance!
I think you basically have two choices. You can set the appropriate transaction isolation level (documentation here). I think using SET TRANSACTION ISOLATION LEVEL SERIALIZABLE would do the trick. This could introduce a big overhead on your transactions: you would be locking the table for both reads and writes, so one call will have to wait for the previous one to finish, and in more complicated situations you might end up with a deadlock.
Another option is to define the primary key on the IDList table using the IGNORE_DUP_KEY option. This still allows inserts into the table; rows that would duplicate an existing key are simply ignored.
Here is a blog post about creative ways to use this option.
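A minimal sketch of the IGNORE_DUP_KEY option, assuming IDList has a single non-nullable ID column:

```sql
-- Recreate the primary key with IGNORE_DUP_KEY: rows that would duplicate an
-- existing key are discarded with a warning instead of failing the statement.
ALTER TABLE IDList
ADD CONSTRAINT PK_IDList PRIMARY KEY CLUSTERED (ID)
WITH (IGNORE_DUP_KEY = ON);

-- Concurrent callers can then insert the whole batch without pre-filtering:
INSERT INTO IDList (ID)
SELECT DISTINCT ID FROM @ids;
```

With this in place, the LEFT JOIN pre-check in the procedure becomes unnecessary, which removes the read-then-write race entirely.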
First, you should probably put a unique key on IDList(ID) in any event. That will guarantee that there are no duplicates, though in the concurrent case above one of the processes would get an error instead.
If you want to ensure that both processes can execute concurrently without error, then change the stored procedure's isolation level to serializable and add transaction handling.
Something like this should work:
ALTER PROCEDURE [dbo].[insertMissingIDs]
    @ids IdType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

    BEGIN TRANSACTION

    INSERT INTO IDList (ID)
    SELECT i.ID
    FROM @ids i
    LEFT JOIN IDList ON i.ID = IDList.ID
    WHERE IDList.ID IS NULL

    COMMIT TRANSACTION
END
Of course you might get some blocking also.

Singleton pattern in a stored procedure

How can you implement the Singleton pattern in a SQL Server 2005/2008 stored procedure?
We want the stored procedure to return the next value from a table to a caller, and then update the value, so the next caller gets a different value ...
BUT there will be time when there are lots of callers!
AND we don't want blocking/time-out issues
PS. maybe singleton isn't the answer ... if not, how could you handle this?
By definition, SINGLETON IS A LOCKING pattern.
Talking about databases: many DB professionals get scared when you mention the word "lock", but locks per se are not a problem; they are a fundamental mechanism of relational databases.
You must learn how locks work and what kinds of locks exist, and treat them with respect.
Always work with short transactions, lock as few rows as you can, and work with sets, not individual rows.
Locks become a problem when they are massive, when they last too long, and of course when you build a DEADLOCK.
So, the golden rule: when you must change data in a transaction, first take an exclusive lock (UPDATE), never a shared lock (SELECT). That sometimes means starting with a fake UPDATE, as in:
BEGIN TRAN
    UPDATE table
    SET col1 = col1
    WHERE [Key] = @Key
    .......
COMMIT TRAN
Prior to SQL Server 2012, when I needed a serial I did it in one of two ways:
The first is to create an IDENTITY column, so after inserting you can get the value with the built-in function SCOPE_IDENTITY(). There is also @@IDENTITY, but if someone creates a trigger that inserts into another table with an identity column, the nightmare starts.
CREATE TABLE [table]
(
Id int IDENTITY(1,1) NOT NULL,
col2 ....
col3 ....
)
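Usage is then a one-liner after the insert (column values elided, as in the table sketch above):

```sql
INSERT INTO [table] (col2, col3)
VALUES ('...', '...');

-- SCOPE_IDENTITY() returns the identity generated in the current scope,
-- so a trigger that inserts into some other identity table cannot pollute it
-- the way it would pollute @@IDENTITY.
DECLARE @NewId int = SCOPE_IDENTITY();
```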
The second option is to add a serial column, usually in the parent table or in a table made for the purpose, plus a procedure (you can use client code) to get the serial:
--IF YOU CREATE A SERIAL HERE YOU'LL SPEND SOME SPACE,
--BUT IT WILL KEEP YOUR BLOCKING VERY LOW
CREATE TABLE Parent
(
    Id int not null,
    ChildSerial int not null,
    col2 ...
    col3 ...
    CONSTRAINT PK_Parent PRIMARY KEY (Id)
)
GO

--NAMED CONSTRAINT. Auto names are random (avoid them)
ALTER TABLE Parent
ADD CONSTRAINT Parent_DF_ChildSerial DEFAULT(0) FOR ChildSerial;
GO

CREATE TABLE Child
(
    Id int not null,
    col2..
    colN..
    --PLUS PRIMARY KEY... INDEXES, etc.
)
GO

CREATE PROC GetChildId
(
    @ParentId int,
    @ChildSerial int output --To use the proc from another proc
)
AS
BEGIN
    BEGIN TRAN
    --YOU START WITH A LOCK, SO YOU'LL NEVER GET A DEADLOCK
    --NOR A FAKE SERIAL (Two clients with the same number)
    UPDATE Parent
    SET ChildSerial = ChildSerial + 1
    WHERE Id = @ParentId

    IF @@ERROR != 0
    BEGIN
        SELECT @ChildSerial = -1
        SELECT @ChildSerial
        ROLLBACK TRAN
        RETURN
    END

    SELECT @ChildSerial = ChildSerial
    FROM Parent
    WHERE Id = @ParentId

    COMMIT TRAN

    SELECT @ChildSerial --To use the proc easily from a program
END
GO

How to handle concurrency when creating a queue with a SQL table as backend

I want to create a simple queue with a SQL database as backend.
The table has the fields id, taskname, runat (datetime) and hidden (datetime).
I want to ensure a queue item is run once and only once.
The idea is that when a client wants to dequeue, a stored procedure selects the first item (sorted by runat, where hidden < now), sets the hidden field to the current time + 2 minutes, and returns the item.
How does MS SQL (Azure, to be precise) work here? Will two clients be able to run at the same time, both set the same item to hidden, and both return it? Or can I be sure that they run one by one, so the second one will not return the same item because the hidden field was already changed by the first?
The key is to take a lock (row or table) on the queue item you are retrieving. There are a couple of ways to do this; my favorite is UPDATE with the OUTPUT clause. Either approach will produce serialized access to the table.
Example:
CREATE PROCEDURE spGetNextItem_output
AS
BEGIN
    BEGIN TRAN

    UPDATE TOP(1) Messages
    SET [Status] = 1
    OUTPUT INSERTED.MessageID, INSERTED.Data
    WHERE [Status] = 0

    COMMIT TRAN
END
GO

CREATE PROCEDURE spGetNextItem_tablockx
AS
BEGIN
    BEGIN TRAN

    DECLARE @MessageID int, @Data xml

    SELECT TOP(1) @MessageID = MessageID, @Data = Data
    FROM Messages WITH (ROWLOCK, XLOCK, READPAST) --lock the row, skip other locked rows
    WHERE [Status] = 0

    UPDATE Messages
    SET [Status] = 1
    WHERE MessageID = @MessageID

    SELECT @MessageID AS MessageID, @Data AS Data

    COMMIT TRAN
END
Table definition:
CREATE TABLE [dbo].[Messages](
[MessageID] [int] IDENTITY(1,1) NOT NULL,
[Status] [int] NOT NULL,
[Data] [xml] NOT NULL,
CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED
(
[MessageID] ASC
)
)
Windows Azure SQL Database is going to behave just like a SQL Server database in terms of concurrency. This is a database problem, not a Windows Azure problem.
Now: If you went with Windows Azure Queues (or Service Bus Queues) rather than implementing your own, then the behavior is well documented. For instance: with Azure Queues, first-in gets the queue item, and then the item is marked as invisible until it's either deleted or a timeout period has been reached.

There is no row but (XLOCK,ROWLOCK) locked it?

Consider this simple table :
table create statement is :
CREATE TABLE [dbo].[Test_Serializable](
[Id] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL
)
So there is no primary key or index.
Consider it empty, with no rows. I want to insert the row (1, 'nima'), but first I want to check whether a row with Id = 1 already exists: if yes, call RAISERROR; if not, insert the row. I wrote this script:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRY
    BEGIN TRAN ins

    IF EXISTS(SELECT * FROM Test_Serializable ts WITH (XLOCK, ROWLOCK) WHERE ts.Id=1)
        RAISERROR(N'Row Exists',16,1);

    INSERT INTO Test_Serializable
    (
        Id,
        [Name]
    )
    VALUES
    (
        1,
        'nima'
    )

    COMMIT TRAN ins
END TRY
BEGIN CATCH
    DECLARE @a NVARCHAR(1000);
    SET @a=ERROR_MESSAGE();
    ROLLBACK TRAN ins
    RAISERROR(@a,16,1);
END CATCH
This script works fine, but there is an interesting point.
I ran this script from 2 SSMS sessions, stepping through both in debug mode. The interesting point is that, although my table has no rows, one of the scripts locks the table as soon as it reaches the IF EXISTS statement.
My question: does (XLOCK, ROWLOCK) lock the entire table because there is no row? Or does it lock the phantom row :) !!???
Edit 1)
This is my scenario:
I have a table with, for example, 6 fields.
These are the uniqueness rules:
1) City_Code + F1_Code are unique
2) City_Code + F2_Code are unique
3) City_Code + F3_Code + F4_Code are unique
The problem is that a user may fill only City_Code and F1_Code; when the row is inserted, the other fields must contain an empty string or 0 (for numeric fields).
If the user fills City_Code + F3_Code + F4_Code, then F1_Code and F2_Code must contain empty strings.
How can I check this better? I can't create a unique index for every rule.
To answer your question: the SERIALIZABLE isolation level takes range locks, which cover even non-existent rows within the range.
http://msdn.microsoft.com/en-us/library/ms191272.aspx
Key-range locking ensures that the following operations are
serializable:
Range scan query
Singleton fetch of nonexistent row
Delete operation
Insert operation
XLOCK is exclusive lock: so as WHERE traverses rows, rows are locked.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE isn't about duplicates or locking of rows, it simply removes the chance of "Phantom reads". From a locking perspective, it takes range locks (eg all rows between A and B)
So with XLOCK and SERIALIZABLE you lock the table. You want UPDLOCK, which isn't exclusive.
However, even with UPDLOCK this pattern is not safe. Under high load you will still get duplicate errors, because two concurrent EXISTS checks can both find no row, both try to INSERT, and one gets a duplicate key error.
So just try to INSERT and trap the error:
BEGIN TRY
    BEGIN TRAN ins

    INSERT INTO Test_Serializable
    (
        Id,
        [Name]
    )
    VALUES
    (
        1,
        'nima'
    )

    COMMIT TRAN ins
END TRY
BEGIN CATCH
    DECLARE @a NVARCHAR(1000);
    IF ERROR_NUMBER() = 2627
        RAISERROR(N'Row Exists',16,1);
    ELSE
    BEGIN
        SET @a=ERROR_MESSAGE();
        RAISERROR(@a,16,1);
    END
    ROLLBACK TRAN ins
END CATCH
I've mentioned this before
Edit: to force the various uniqueness rules on SQL Server 2008, use filtered indexes (YourTable stands in for the real table name):
CREATE UNIQUE NONCLUSTERED INDEX IX_UniqueF1 ON YourTable (City_Code, F1_Code)
WHERE F2_Code = '' AND F3_Code = '' AND F4_Code = 0;

CREATE UNIQUE NONCLUSTERED INDEX IX_UniqueF2 ON YourTable (City_Code, F2_Code)
WHERE F1_Code = '' AND F3_Code = '' AND F4_Code = 0;

CREATE UNIQUE NONCLUSTERED INDEX IX_UniqueF3F4 ON YourTable (City_Code, F3_Code, F4_Code)
WHERE F1_Code = '' AND F2_Code = '';
You can do the same with indexed views on earlier versions
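For those earlier versions, an indexed-view equivalent of the first rule might look like this (view and table names are illustrative):

```sql
-- The view selects only the rows the uniqueness rule applies to...
CREATE VIEW dbo.vUniqueF1
WITH SCHEMABINDING
AS
SELECT City_Code, F1_Code
FROM dbo.YourTable
WHERE F2_Code = '' AND F3_Code = '' AND F4_Code = 0;
GO

-- ...and a unique clustered index on the view enforces the rule.
CREATE UNIQUE CLUSTERED INDEX IX_vUniqueF1
ON dbo.vUniqueF1 (City_Code, F1_Code);
```

One such view per rule gives the same effect as the three filtered indexes above.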