DELETE all with ORDER BY - SQL
I'm building a poor man's FIFO message queue based on tables and trying to implement a receiveAll operation that retrieves all messages currently in the queue.
To receive a single message, I do:
WITH receiveCte AS (
SELECT TOP 1 body
FROM MyQueue WITH (ROWLOCK, READPAST)
ORDER BY id
)
DELETE FROM receiveCte
OUTPUT deleted.body;
From what I understand, the ORDER BY clause is necessary to guarantee the delete order, even if id is an identity primary key with a clustered index.
Now, to perform the receiveAll operation I need to delete all rows ORDER BY id and obviously that doesn't work without a TOP clause.
Therefore, I was thinking of selecting the rows that are not locked and locking them for the entire transaction, then going on with the DELETE. However, I can't seem to find a way to keep the rows affected by the SELECT locked for the entire transaction.
BEGIN TRAN
DECLARE @msgCount int;
SELECT @msgCount = COUNT(*)
FROM MyQueue WITH (UPDLOCK, ROWLOCK, READPAST);
...
COMMIT TRAN
If I execute the above without the COMMIT TRAN and then execute the following statement in another connection, it still returns all rows, whereas I expected it to return 0 because of READPAST and the fact that there's an ongoing transaction holding UPDLOCK on the rows.
SELECT COUNT(*)
FROM MyQueue WITH (READPAST)
Obviously, I must be doing something wrong...
EDIT #1:
@king.code already gave the perfect answer in this case; however, I found out what was going on.
It turns out that COUNT(*) seems to ignore the lock hints, so it wasn't adequate for testing.
Also, it seems that you need an XLOCK to make sure that READPAST does its job.
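For example, a minimal sketch of such a test (two sessions; I'm assuming the MyQueue table from the answer below):
-- Session A: take exclusive row locks and leave the transaction open
BEGIN TRAN
SELECT ID, Body
FROM MyQueue WITH (XLOCK, ROWLOCK);
-- no COMMIT yet

-- Session B: READPAST now skips the exclusively locked rows and returns 0 rows
SELECT ID, Body
FROM MyQueue WITH (READPAST);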
EDIT #2:
WARNING: SELECT TOP 100 PERCENT ... ORDER BY doesn't work, because SQL Server seems to disregard the ORDER BY clause in that case. However, it seems that we can trick the optimizer by using a variable, e.g. SELECT TOP (@hundred) PERCENT, but I'm not sure how reliable that is.
I tried this and it worked for me:
EDIT:
Updated based upon this TechNet article, which states:
"If you have to use TOP to delete rows in a meaningful chronological order, you must use TOP together with ORDER BY in a subselect statement."
Setup
CREATE TABLE MyQueue
(
ID INT Identity Primary Key,
Body VARCHAR(MAX)
)
INSERT INTO MyQueue
VALUES ('Message 1'),
('Message 2'),
('Message 3'),
('Message 4'),
('Message 5'),
('Message 6'),
('Message 7'),
('Message 8'),
('Message 9'),
('Message 10'),
('Message 11')
In one query analyser window I performed this:
Query session 1
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO
Begin Tran
;WITH receiveCte AS (
SELECT TOP 1 body
FROM MyQueue WITH (READPAST)
ORDER BY id
)
DELETE FROM receiveCte WITH (ROWLOCK)
OUTPUT deleted.body;
-- note no COMMIT
Query Session 2
--Second window
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO
BEGIN TRAN
DELETE FROM MyQueue WITH (ROWLOCK)
OUTPUT deleted.Body
WHERE ID IN
(
SELECT TOP 100 PERCENT ID
FROM MyQueue WITH (READPAST)
ORDER BY Id
)
-- Note again no COMMIT
and then in the third window:
Query Session 3
SELECT COUNT(*)
FROM MyQueue WITH (READPAST)
which correctly returned a result of 0
From the DB point of view, there is not much sense in deleting rows in any particular order if you are going to delete them all; a simple DELETE without ordering is just fine.
If you are going to process rows one by one from the application side, then start a serializable transaction, lock the entire table, and process/delete row by row based on ID (see the sketch below); no ordering is required.
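A rough sketch of that approach (my illustration against the MyQueue table from the question; the processing step is a placeholder):
BEGIN TRAN
-- Take an exclusive table lock for the duration of the transaction
SELECT COUNT(*) FROM MyQueue WITH (TABLOCKX, HOLDLOCK);

-- Process/delete row by row; ordering comes from walking the IDs ourselves
DECLARE @id INT, @body VARCHAR(MAX);
SELECT @id = MIN(ID) FROM MyQueue;
WHILE @id IS NOT NULL
BEGIN
    SELECT @body = Body FROM MyQueue WHERE ID = @id;
    -- ... hand @body to the application here ...
    DELETE FROM MyQueue WHERE ID = @id;
    SELECT @id = MIN(ID) FROM MyQueue WHERE ID > @id;
END
COMMIT TRAN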
From what I understand, the ORDER BY clause is necessary to guarantee the delete order, even if id is an identity primary key with a clustered index.
You're right.
Now, to perform the receiveAll operation I need to delete all rows ORDER BY id and obviously that doesn't work without a TOP clause.
Remember that you can use PERCENT in TOP:
DECLARE @Hundred FLOAT = 100;
SELECT TOP (@Hundred) PERCENT body
FROM MyQueue WITH (ROWLOCK, READPAST)
ORDER BY id;
UPDATE:
I've just made some tests, and it looks like if I ORDER BY the clustered index, I get the same execution plan with and without TOP (100) PERCENT.
If I ORDER BY another column, I see the Sort operation in the execution plan even if I place TOP (100) PERCENT, so it looks like it is not ignored.
Anyway, since the @Hundred variable and the TOP expression are FLOAT, you can try setting it to something like 99.99999 and see what happens.
Try this using Service Broker queues:
USE MASTER
CREATE DATABASE SBTest
GO
ALTER DATABASE SBTest SET ENABLE_BROKER;
GO
USE SBTest
GO
CREATE MESSAGE TYPE MyMessage
VALIDATION = NONE
GO
CREATE CONTRACT MyContract (MyMessage SENT BY INITIATOR)
GO
CREATE QUEUE MYSendQueue
GO
CREATE QUEUE MyReceiveQueue
GO
CREATE SERVICE MySendService
ON QUEUE MySendQueue (MyContract)
GO
CREATE SERVICE MyReceiveService
ON QUEUE MyReceiveQueue (MyContract)
GO
-- Send Messages
DECLARE @MyDialog uniqueidentifier
DECLARE @MyMessage NVARCHAR(128)
BEGIN DIALOG CONVERSATION @MyDialog
FROM SERVICE MySendService
TO SERVICE 'MyReceiveService'
ON CONTRACT MyContract
WITH ENCRYPTION = OFF
-- Send messages on Dialog
SET @MyMessage = N'My First Message';
SEND ON CONVERSATION @MyDialog
MESSAGE TYPE MyMessage (@MyMessage)
SET @MyMessage = N'My Second Message';
SEND ON CONVERSATION @MyDialog
MESSAGE TYPE MyMessage (@MyMessage)
SET @MyMessage = N'My Third Message';
SEND ON CONVERSATION @MyDialog
MESSAGE TYPE MyMessage (@MyMessage)
GO
-- View messages from Receive Queue
SELECT CONVERT(NVARCHAR(MAX), message_body) AS Message
FROM MyReceiveQueue
GO
-- Receive 1 message from Queue
RECEIVE TOP(1) CONVERT(NVARCHAR(MAX), message_body) AS Message
FROM MyReceiveQueue
GO
-- Receive All messages from Receive Queue
RECEIVE CONVERT(NVARCHAR(MAX), message_body) AS Message
FROM MyReceiveQueue
GO
-- Clean Up
USE master
GO
DROP DATABASE SBTest
GO
I believe SQL Server is determining that the action within the transaction does not require it to lock the rows that are "in use". I attempted to lock the table with something more obvious:
BEGIN TRANSACTION
UPDATE MyQueue SET body = body
But even with that, it would still return the rows with the READPAST hint. However, if I actually change the rows:
BEGIN TRANSACTION
UPDATE MyQueue SET body = body + '.'
Only then would my SELECT statement with READPAST return 0 rows. It seems that SQL Server is intelligent enough to minimize the locks on your table!
I suggest that if you want to hide rows from a READPAST hint, you add a new column that can be edited without fear of changing important data, and then, in the process that locks the rows, update that column with an actual data change:
ALTER TABLE MyQueue ADD LockRow bit DEFAULT(0)
...
BEGIN TRANSACTION
UPDATE MyQueue SET LockRow = 1
If you do the above, your READPAST query should return 0 rows.
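For example, the full receive cycle might then look something like this (a sketch extrapolating the idea above; the batch table variable and the final DELETE are my additions):
DECLARE @myBatch TABLE (id INT NOT NULL);

BEGIN TRANSACTION
-- An actual data change, so concurrent READPAST readers skip these rows
UPDATE q
SET LockRow = 1
OUTPUT inserted.ID INTO @myBatch(id)
FROM MyQueue q WITH (ROWLOCK, READPAST);

-- Hand the bodies to the application, in id order
SELECT m.Body
FROM MyQueue m
JOIN @myBatch b ON m.ID = b.id
ORDER BY m.ID;

DELETE FROM MyQueue WHERE ID IN (SELECT id FROM @myBatch);
COMMIT TRANSACTION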
If you simply need to delete all of the records as of now without blocking inserts and deletes that happen concurrently, you should simply issue this command:
DELETE FROM MyQueue WITH (ROWLOCK, READPAST)
OUTPUT deleted.id, deleted.body;
This won't block inserts into the MyQueue table, nor concurrent executions of the same statement. Concurrent executions will only pick up the records that were inserted after the previous DELETE transaction's start time. Similarly, there is no need for any ORDER BY, since the subject of deletion will be all records that existed in the table at the transaction start time.
I must also mention that I strongly recommend not using the ROWLOCK hint; let SQL Server decide which lock level to use for the best efficiency.
Have you tried issuing a HOLDLOCK hint, when selecting the rows? This should ensure that no other query can select the rows, until after the transaction is finished:
BEGIN TRAN
DECLARE @msgCount int;
SELECT @msgCount = COUNT(*)
FROM MyQueue WITH (HOLDLOCK, ROWLOCK, READPAST);
...
COMMIT TRAN
According to the MSDN documentation on OUTPUT:
There is no guarantee that the order in which the changes are applied
to the table and the order in which the rows are inserted into the
output table or table variable will correspond.
You can, however, insert the results of OUTPUT into a table variable and then select the results back while keeping them ordered. Please note that using ROWLOCK will require more resources if there are many records in the queue.
DECLARE @processQueue TABLE
( id int NOT NULL
, body nvarchar(max) /*<- set to appropriate data type!*/
);
DELETE q
OUTPUT deleted.id, deleted.body
INTO @processQueue(id, body)
FROM MyQueue q WITH (ROWLOCK, READPAST)
;
SELECT body
FROM @processQueue q
ORDER BY q.id --ensure output ordering here.
;
Also since you have mentioned you do not want table locks, you can disable lock escalation on the MyQueue table.
--Disable lock escalation, to ensure that locks do not get escalated to table locks.
ALTER TABLE MyQueue SET ( LOCK_ESCALATION = DISABLE );
go
Related
How can I serialize multiple executions of a stored procedure with the same arguments?
I have a couple-hundred-line stored procedure that takes a single parameter (@id) and is heavily simplified to something like:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
INSERT INTO #new_result EXEC pr_do_a_ton_of_calculations
DELETE FROM result WHERE id = @id
INSERT INTO result SELECT * FROM #new_result

Multiple processes may invoke this procedure concurrently, with the same parameters. I'm experiencing that both executions delete the rows one after the other, and then try to insert the same data one after the other. The result is that one errors out, because it's inserting duplicate data and violating a unique constraint. Ideally, I'd like to ensure that two connections executing the procedure with the same @id parameter will execute both the DELETE and the INSERT serially, without locking the entire table. It's also fine if the two procedures are completely serialized, as long as they aren't preventing the execution of other invocations with a different parameter. Is there any way I can achieve this?
Add this to the beginning of your stored procedure:

DECLARE @lid INT
SELECT @lid = id
FROM result WITH (UPDLOCK, ROWLOCK)
WHERE id = @id

and get rid of the READ UNCOMMITTED above. Make sure your id is indexed. If it's a reference to another table where it is a PRIMARY KEY, use the lock on that table instead. Better yet, use application locks (sp_getapplock).
You can use application locks, for example:

DECLARE @ResourceName VARCHAR(200) = 'MyResource' + CONVERT(VARCHAR(20), @id)
EXEC sp_getapplock @Resource = @ResourceName, @LockMode = 'Exclusive'

---- Do your thing ----

EXEC sp_releaseapplock @Resource = @ResourceName
If you need these things to happen in a guaranteed order based on receipt of request, Service Broker will manage that for you and throw in a bunch of other benefits, too. Setting it up takes some doing, but "An Introduction to Asynchronous Processing with Service Broker" by Jonathan Kehayias is the best intro I've found. You would set your "pr_do_a_ton_of_calculations" as the activation procedure for the queue and add the additional commands for handling Service Broker conversations. This WILL make the stored procedures operate asynchronously from the caller, so if the call is being made from another stored procedure, the processing would happen off of that thread. This can actually be to your advantage if waiting for this processing is slowing things down.
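For a rough idea, wiring up activation might look like this (the queue and procedure names here are hypothetical placeholders; the article covers the conversation handling itself):

-- Assumes the queue and an activation wrapper procedure already exist
ALTER QUEUE dbo.CalcQueue
WITH ACTIVATION
(
    STATUS = ON,
    PROCEDURE_NAME = dbo.pr_process_calc_messages, -- hypothetical wrapper around pr_do_a_ton_of_calculations
    MAX_QUEUE_READERS = 1, -- a single reader serializes processing
    EXECUTE AS OWNER
);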
This may be just getting rid of the error and not fixing the problem, but:

INSERT INTO result
SELECT #new_result.*
FROM #new_result
LEFT JOIN result ON #new_result.PK = result.PK
WHERE result.PK IS NULL

Even without concurrency issues, you would probably be better off breaking that up into delete, update, insert, as an update is more efficient than a delete plus insert and does not fragment the index as much:

DELETE d
FROM result d
LEFT JOIN #new_result ON #new_result.PK = d.PK
WHERE #new_result.PK IS NULL AND d.ID = @ID;

UPDATE result
SET result.colx = #new_result.colx,
    result.coly = #new_result.coly
FROM #new_result
JOIN result ON #new_result.PK = result.PK;

INSERT INTO result
SELECT #new_result.*
FROM #new_result
LEFT JOIN result ON #new_result.PK = result.PK
WHERE result.PK IS NULL;
Row concurrency problems with multithreading
I am having trouble with concurrency. A 3rd-party software is executing my stored procedure, which needs to capture a unique list of IDs in a table. The code works until multithreading is brought into the mix (gasp). I have tried various transaction features, including isolation levels, seemingly to no avail. Essentially, given the following, I need the table 'IDList' to contain only the unique IDs that have ever been sent. When other threads from the 3rd-party software execute the example calling code, I consistently end up with duplicates in 'IDList'. It is my estimation that the following is happening, but I am unable to resolve it:

Thread #1 runs the SELECT (with JOIN) in insertMissingIDs
Thread #2 runs the SELECT (with JOIN) in insertMissingIDs
Thread #1 runs the INSERT in insertMissingIDs
Thread #2 runs the INSERT in insertMissingIDs
Result: duplicates

I realize the example may seem silly; I have boiled it down so as not to reveal confidential code.

Calling code:

DECLARE @ids IdType
INSERT INTO @ids SELECT '123'
EXEC insertMissingIDs @ids

User-defined type:

CREATE TYPE [dbo].[IdType] AS TABLE(
    [ID] [nvarchar](250) NULL
)

Procedure:

ALTER PROCEDURE [dbo].[insertMissingIDs]
    @ids IdType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO IDList (ID)
    SELECT i.ID
    FROM @ids i
    LEFT JOIN IDList ON i.ID = IDList.ID
    WHERE IDList.ID IS NULL
END

Thanks in advance!
I think you basically have two choices.

You can set the appropriate transaction isolation level (documentation here). I think using SET TRANSACTION ISOLATION LEVEL SERIALIZABLE would do the trick. This could introduce a big overhead on your transactions: you would be locking the table for both reads and writes, one call would have to wait for the previous one to finish, and in more complicated situations you might end up with a deadlock.

Another option is to define the primary key of table IDList using the IGNORE_DUP_KEY option. This allows inserts into the table; if duplicates are in the data being inserted, they are ignored. Here is a blog post about creative ways to use this option.
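A sketch of the second option (the index name is my invention; the column comes from the question):

-- Duplicate keys in an INSERT are silently discarded (with a warning) instead of raising an error
CREATE UNIQUE INDEX UX_IDList_ID
ON IDList (ID)
WITH (IGNORE_DUP_KEY = ON);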
First, you should probably put a unique key on IDList(ID) in any event. That will guarantee that there are no duplicates, though in the case of the concurrent processes above, one of the processes would get an error instead. If you want to ensure that both processes can execute concurrently without error, then change the stored procedure's isolation level to serializable and add transaction handling. Something like this should work:

ALTER PROCEDURE [dbo].[insertMissingIDs]
    @ids IdType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
    BEGIN TRANSACTION
    INSERT INTO IDList (ID)
    SELECT i.ID
    FROM @ids i
    LEFT JOIN IDList ON i.ID = IDList.ID
    WHERE IDList.ID IS NULL
    COMMIT TRANSACTION
END

Of course, you might also get some blocking.
Getting deadlocks on MS SQL stored procedure performing a read/update (put code to handle deadlocks)
I have to admit I'm just learning about properly handling deadlocks, but based on suggestions I read, I thought this was the proper way to handle it. Basically, I have many processes trying to 'reserve' a row in the database for an update, so I first read for an available row, then write to it. Is this not the right way? If so, how do I need to fix this SP?

CREATE PROCEDURE [dbo].[reserveAccount]
    -- Add the parameters for the stored procedure here
    @machineId varchar(MAX)
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    BEGIN TRANSACTION;

    declare @id BIGINT;
    set @id = (select min(id) from Account_Data
               where passfail is null and reservedby is null);

    update Account_data set reservedby = @machineId where ID = @id;

    COMMIT TRANSACTION;
END
You can write this as a single statement, which may fix the update problem:

update Account_data
set reservedby = @machineId
where ID = (select min(id) from Account_Data
            where passfail is null and reservedby is null);
Well, your problem is that you have two statements: a select and an update. If those run concurrently, the select will take a read lock and the update will demand a write lock; with two machines doing this at the same time, you deadlock. A simple solution is to make the initial select demand an update lock (WITH (ROWLOCK, UPDLOCK) as a hint). That may or may not work (it depends on what else goes on), but it is a good start. The second step, if that fails, is to use an application-level lock (sp_getapplock) that makes sure a critical section always has only one owner and thus executes transactions serially.
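For illustration, the question's procedure with that hint applied might look like this (a sketch, untested):

ALTER PROCEDURE [dbo].[reserveAccount]
    @machineId varchar(MAX)
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    BEGIN TRANSACTION;

    DECLARE @id BIGINT;
    -- UPDLOCK makes the read take an update lock, so two callers cannot
    -- both read the same row and then deadlock on the subsequent update
    SET @id = (SELECT MIN(id)
               FROM Account_Data WITH (ROWLOCK, UPDLOCK)
               WHERE passfail IS NULL AND reservedby IS NULL);

    UPDATE Account_Data SET reservedby = @machineId WHERE ID = @id;

    COMMIT TRANSACTION;
END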
Deadlock on query that is executed simultaneously
I've got a stored procedure that does the following (simplified):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION

DECLARE @intNo int
SET @intNo = (SELECT MAX(intNo) + 1 FROM tbl)

INSERT INTO tbl(intNo) VALUES (@intNo)

SELECT intNo
FROM tbl
WHERE (intBatchNumber = @intNo - 1)

COMMIT TRANSACTION

My issue is that when two or more users execute this at the same time, I am getting deadlocks. Now, as I understand it, the moment I do my first select in the proc, this should create a lock on tbl. If the second procedure is then called while the first procedure is still executing, it should wait for it to complete, right? At the moment this is causing a deadlock; any ideas?
The insert query requires a different lock than the select. The lock for the select blocks a second insert, but it does not block a second select. So both queries can start with the select, and then they both block on the other's insert. You can solve this by asking the first query to lock the entire table:

SET @intNo = (SELECT MAX(intNo) + 1 FROM tbl WITH (TABLOCKX))

This will make the second transaction's select wait for the complete first transaction to finish.
Make it simpler, so you have one statement and no transaction:

--BEGIN TRANSACTION not needed
INSERT INTO tbl(intNo)
OUTPUT INSERTED.intNo
SELECT MAX(intNo) + 1 FROM tbl WITH (TABLOCK)
--COMMIT TRANSACTION not needed

Although, why aren't you using IDENTITY...?
Solutions for INSERT OR UPDATE on SQL Server
Assume a table structure of MyTable(KEY, datafield1, datafield2...). Often I want to either update an existing record, or insert a new record if it doesn't exist. Essentially:

IF (key exists)
    run update command
ELSE
    run insert command

What's the best-performing way to write this?
Don't forget about transactions. Performance is good, but the simple (IF EXISTS..) approach is very dangerous: when multiple threads try to perform the insert-or-update, you can easily get a primary key violation. The solutions provided by @Beau Crawford & @Esteban show the general idea, but are error-prone. To avoid deadlocks and PK violations you can use something like this:

begin tran
if exists (select * from table with (updlock,serializable) where key = @key)
begin
    update table set ... where key = @key
end
else
begin
    insert into table (key, ...) values (@key, ...)
end
commit tran

or

begin tran
update table with (serializable) set ... where key = @key
if @@rowcount = 0
begin
    insert into table (key, ...) values (@key, ...)
end
commit tran
See my detailed answer to a very similar previous question. @Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to post it on SO. The only problem is that for inserts it's still two IO operations.

MS SQL 2008 introduces MERGE from the SQL:2003 standard:

merge tablename with(HOLDLOCK) as target
using (values ('new value', 'different value'))
    as source (field1, field2)
    on target.idfield = 7
when matched then
    update set field1 = source.field1, field2 = source.field2, ...
when not matched then
    insert ( idfield, field1, field2, ... )
    values ( 7, source.field1, source.field2, ... )

Now it's really just one IO operation, but awful code :-(
Do an UPSERT:

UPDATE MyTable SET FieldA=@FieldA WHERE Key=@Key
IF @@ROWCOUNT = 0
    INSERT INTO MyTable (FieldA) VALUES (@FieldA)

http://en.wikipedia.org/wiki/Upsert
Many people will suggest you use MERGE, but I caution you against it. By default, it doesn't protect you from concurrency and race conditions any more than multiple statements do, and it introduces other dangers:

Use Caution with SQL Server's MERGE Statement
So, you want to use MERGE, eh?

Even with this "simpler" syntax available, I still prefer this approach (error handling omitted for brevity):

BEGIN TRANSACTION;

UPDATE dbo.table WITH (UPDLOCK, SERIALIZABLE)
SET ... WHERE PK = @PK;

IF @@ROWCOUNT = 0
BEGIN
    INSERT dbo.table(PK, ...) SELECT @PK, ...;
END

COMMIT TRANSACTION;

Please stop using this UPSERT anti-pattern. A lot of folks will suggest this way:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

IF EXISTS (SELECT 1 FROM dbo.table WHERE PK = @PK)
BEGIN
    UPDATE ...
END
ELSE
BEGIN
    INSERT ...
END

COMMIT TRANSACTION;

But all this accomplishes is ensuring you may need to read the table twice to locate the row(s) to be updated. In the first sample, you will only ever need to locate the row(s) once. (In both cases, if no rows are found from the initial read, an insert occurs.)

Others will suggest this way:

BEGIN TRY
    INSERT ...
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 2627
        UPDATE ...
END CATCH

However, this is problematic if for no other reason than letting SQL Server catch exceptions that you could have prevented in the first place is much more expensive, except in the rare scenario where almost every insert fails. I prove as much here:

Checking for potential constraint violations before entering TRY/CATCH
Performance impact of different error handling techniques
IF EXISTS (SELECT * FROM [Table] WHERE ID = rowID)
    UPDATE [Table] SET propertyOne = propOne, property2 . . .
ELSE
    INSERT INTO [Table] (propOne, propTwo . . .)

Edit: Alas, even to my own detriment, I must admit the solutions that do this without a select seem to be better, since they accomplish the task with one less step.
If you want to UPSERT more than one record at a time, you can use the ANSI SQL:2003 DML statement MERGE:

MERGE INTO table_name WITH (HOLDLOCK)
USING table_name
ON (condition)
WHEN MATCHED THEN
    UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN
    INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])

Check out Mimicking MERGE Statement in SQL Server 2005.
Although it's pretty late to comment on this, I want to add a more complete example using MERGE. Such insert+update statements are usually called "upsert" statements and can be implemented using MERGE in SQL Server. A very good example, which explains the locking and concurrency scenarios as well, is given here: http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx. I will be quoting the same for reference:

ALTER PROCEDURE dbo.Merge_Foo2
    @ID int
AS
SET NOCOUNT, XACT_ABORT ON;

MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT @ID AS ID) AS new_foo
    ON f.ID = new_foo.ID
WHEN MATCHED THEN
    UPDATE SET f.UpdateSpid = @@SPID,
        UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
    INSERT ( ID, InsertSpid, InsertTime )
    VALUES ( new_foo.ID, @@SPID, SYSDATETIME() );

RETURN @@ERROR;
/*
CREATE TABLE ApplicationsDesSocietes
(
    id INT IDENTITY(0,1) NOT NULL,
    applicationId INT NOT NULL,
    societeId INT NOT NULL,
    suppression BIT NULL,
    CONSTRAINT PK_APPLICATIONSDESSOCIETES PRIMARY KEY (id)
)
GO
--*/

DECLARE @applicationId INT = 81, @societeId INT = 43, @suppression BIT = 0

MERGE dbo.ApplicationsDesSocietes WITH (HOLDLOCK) AS target
--set the SOURCE table one row
USING (VALUES (@applicationId, @societeId, @suppression))
    AS source (applicationId, societeId, suppression)
--here goes the ON join condition
ON target.applicationId = source.applicationId AND target.societeId = source.societeId
WHEN MATCHED THEN
    UPDATE
    --place your list of SET here
    SET target.suppression = source.suppression
WHEN NOT MATCHED THEN
    --insert a new line with the SOURCE table one row
    INSERT (applicationId, societeId, suppression)
    VALUES (source.applicationId, source.societeId, source.suppression);
GO

Replace table and field names with whatever you need. Take care with the USING ON condition, then set the appropriate value (and type) for the variables on the DECLARE line. Cheers.
That depends on the usage pattern. One has to look at the usage big picture without getting lost in the details. For example, if the usage pattern is 99% updates after the record has been created, then the 'UPSERT' is the best solution. After the first insert (hit), it will be all single-statement updates, no ifs or buts. The 'where' condition on the insert is necessary; otherwise it will insert duplicates, and you don't want to deal with locking.

UPDATE <tableName> SET <field>=@field WHERE key=@key;

IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO <tableName> (field)
    SELECT @field
    WHERE NOT EXISTS (select * from tableName where key = @key);
END
You can use the MERGE statement: it inserts the data if it does not exist, or updates it if it does.

MERGE INTO Employee AS e
USING EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID
If going the UPDATE if-no-rows-updated then INSERT route, consider doing the INSERT first to prevent a race condition (assuming no intervening DELETE):

INSERT INTO MyTable (Key, FieldA)
SELECT @Key, @FieldA
WHERE NOT EXISTS (SELECT * FROM MyTable WHERE Key = @Key)

IF @@ROWCOUNT = 0
BEGIN
    UPDATE MyTable SET FieldA=@FieldA WHERE Key=@Key
    IF @@ROWCOUNT = 0
        ... record was deleted, consider looping to re-run the INSERT, or RAISERROR ...
END

Apart from avoiding a race condition, if in most cases the record will already exist, then this will cause the INSERT to fail, wasting CPU. Using MERGE is probably preferable from SQL 2008 onwards.
MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown, it is not a big deal to handle single-row cases, but when dealing with large datasets one otherwise needs a cursor, with all the performance problems that come along. The MERGE statement is a much-welcomed addition when dealing with large datasets.
Before everyone jumps to HOLDLOCKs out of fear of these nefarious users running your sprocs directly :-) let me point out that you have to guarantee uniqueness of new PKs by design (identity keys, sequence generators in Oracle, unique indexes for external IDs, queries covered by indexes). That's the alpha and omega of the issue. If you don't have that, no HOLDLOCKs of the universe are going to save you, and if you do have it, then you don't need anything beyond UPDLOCK on the first select (or using the update first).

Sprocs normally run under very controlled conditions and with the assumption of a trusted caller (mid tier). Meaning that if a simple upsert pattern (update+insert or merge) ever sees a duplicate PK, that means a bug in your mid-tier or table design, and it's good that SQL will yell a fault in such a case and reject the record. Placing a HOLDLOCK in this case equals eating exceptions and taking in potentially faulty data, besides reducing your perf. Having said that, using MERGE, or UPDATE then INSERT, is easier on your server and less error-prone, since you don't have to remember to add (UPDLOCK) to the first select. Also, if you are doing inserts/updates in small batches, you need to know your data in order to decide whether a transaction is appropriate or not. If it's just a collection of unrelated records, then an additional "enveloping" transaction will be detrimental.
Do the race conditions really matter if you first try an update followed by an insert? Let's say you have two threads that want to set a value for key key:

Thread 1: value = 1
Thread 2: value = 2

Example race-condition scenario:

key is not defined
Thread 1 fails with update
Thread 2 fails with update
Exactly one of thread 1 or thread 2 succeeds with insert, e.g. thread 1
The other thread fails with insert (with error duplicate key) - thread 2

Result: the "first" of the two threads to insert decides the value.
Wanted result: the last of the two threads to write data (update or insert) should decide the value.

But in a multithreaded environment, the OS scheduler decides on the order of thread execution. In the above scenario, where we have this race condition, it was the OS that decided on the sequence of execution; i.e., it is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint. When the times of execution are so close for thread 1 and thread 2, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads defines the resulting value.

For the implementation: if update followed by insert results in the error "duplicate key", this should be treated as success. Also, one should of course never assume that the value in the database is the same as the value you wrote last.
I tried the solution below, and it works for me when concurrent requests for the insert statement occur:

begin tran
if exists (select * from table with (updlock,serializable) where key = @key)
begin
    update table set ... where key = @key
end
else
begin
    insert into table (key, ...) values (@key, ...)
end
commit tran
You can use this query; it works in all SQL Server editions. It's simple and clear, but it needs two queries. You can use it if you can't use MERGE:

BEGIN TRAN

UPDATE table
SET Id = @ID, Description = @Description
WHERE Id = @Id

INSERT INTO table(Id, Description)
SELECT @Id, @Description
WHERE NOT EXISTS (SELECT NULL FROM table WHERE Id = @Id)

COMMIT TRAN

NOTE: Please explain downvotes.
Assuming that you want to insert/update a single row, the most optimal approach is to use SQL Server's REPEATABLE READ transaction isolation level:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION

IF (EXISTS (SELECT * FROM myTable WHERE key=@key))
    UPDATE myTable SET ... WHERE key=@key
ELSE
    INSERT INTO myTable (key, ...) VALUES (@key, ...)

COMMIT TRANSACTION

This isolation level will prevent/block subsequent repeatable-read transactions from accessing the same row (WHERE key=@key) while the currently running transaction is open. On the other hand, operations on another row won't be blocked (WHERE key=@key2).
You can use:

INSERT INTO tableName (...) VALUES (...)
ON DUPLICATE KEY UPDATE ...

Using this, if there is already an entry for the particular key, it will UPDATE; otherwise it will INSERT. (Note that ON DUPLICATE KEY UPDATE is MySQL syntax; SQL Server does not support it.)
In SQL Server 2008 you can use the MERGE statement
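For completeness, a minimal sketch against the question's MyTable (the HOLDLOCK hint follows the concurrency caveats raised in other answers; the variable and column names are assumptions):

DECLARE @Key INT = 1, @FieldA VARCHAR(100) = 'value';

MERGE MyTable WITH (HOLDLOCK) AS target
USING (SELECT @Key AS [Key], @FieldA AS FieldA) AS source
    ON target.[Key] = source.[Key]
WHEN MATCHED THEN
    UPDATE SET FieldA = source.FieldA
WHEN NOT MATCHED THEN
    INSERT ([Key], FieldA) VALUES (source.[Key], source.FieldA);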
If you use ADO.NET, the DataAdapter handles this. If you want to handle it yourself, this is the way:

Make sure there is a primary key constraint on your key column. Then:

1. Do the update.
2. If the update fails because no record with the key exists, do the insert.
3. If the update does not fail, you are finished.

You can also do it the other way round, i.e. do the insert first and do the update if the insert fails. Normally the first way is better, because updates are done more often than inserts.
Doing an IF EXISTS ... ELSE ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:

DECLARE @RowExists bit
SET @RowExists = 0

UPDATE MyTable SET DataField1 = 'xxx', @RowExists = 1 WHERE Key = 123

IF @RowExists = 0
    INSERT INTO MyTable (Key, DataField1) VALUES (123, 'xxx')
I usually do what several of the other posters have said with regard to checking whether it exists first and then taking the correct path. One thing to remember when doing this is that the execution plan cached by SQL could be non-optimal for one path or the other. I believe the best way to do this is to call two different stored procedures:

FirstSP:
    If Exists
        Call SecondSP (UpdateProc)
    Else
        Call ThirdSP (InsertProc)

Now, I don't follow my own advice very often, so take it with a grain of salt.
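For illustration, the dispatcher might look like this (all procedure and column names here are hypothetical):

CREATE PROCEDURE dbo.UpsertMyTable -- the "FirstSP"
    @Key INT,
    @FieldA VARCHAR(100)
AS
BEGIN
    IF EXISTS (SELECT * FROM MyTable WHERE [Key] = @Key)
        EXEC dbo.UpdateMyTable @Key, @FieldA; -- the "SecondSP"
    ELSE
        EXEC dbo.InsertMyTable @Key, @FieldA; -- the "ThirdSP"
END

Each sub-procedure then gets its own cached plan, which is the point of splitting them.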
Do a select; if you get a result, update it, and if not, create it.