Conditionally inserting records into a table in multithreaded environment based on a count

Conditionally inserting records into a table in multithreaded environment based on a count - sql

I am writing a T-SQL stored procedure that conditionally adds a record to a table only if the number of similar records is below a certain threshold, 10 in the example below. The problem is this will be run from a web application, so it will run on multiple threads, and I need to ensure that the table never has more than 10 similar records.
The basic gist of the procedure is:
BEGIN
DECLARE #c INT
SELECT #c = count(*)
FROM foo
WHERE bar = #a_param
IF #c < 10 THEN
INSERT INTO foo
(bar)
VALUES (#a_param)
END IF
END
I think I could solve any potential concurrency problems by replacing the select statement with:
SELECT #c = count(*) WITH (TABLOCKX, HOLDLOCK)
But I am curious if there any methods other than lock hints for managing concurrency problems in T-SQL

One option would be to use the sp_getapplock system stored procedure. You can place your critical section logic in a transaction and use the built in locking of sql server to ensure synchronized access.
Example:
CREATE PROC MyCriticalWork(#MyParam INT)
AS
DECLARE #LockRequestResult INT
SET #LockRequestResult=0
DECLARE #MyTimeoutMiliseconds INT
SET #MyTimeoutMiliseconds=5000--Wait only five seconds max then timeouit
BEGIN TRAN
EXEC #LockRequestResult=SP_GETAPPLOCK 'MyCriticalWork','Exclusive','Transaction',#MyTimeoutMiliseconds
IF(#LockRequestResult>=0)BEGIN
/*
DO YOUR CRITICAL READS AND WRITES HERE
*/
--Release the lock
COMMIT TRAN
END ELSE
ROLLBACK TRAN

Use SERIALIZABLE. By definition it provides you the illusion that your transaction is the only transaction running. Be aware that this might result in blocking and deadlocking. In fact this SQL code is a classic candidate for deadlocking: Two transactions might first read a set of rows, then both will try to modify that set of rows. Locking hints are the classic way of solving that problem. Retry also works.

As stated in the comment. Why are you trying to insert on multiple threads? You cannot write to a table faster on multiple threads.
But you don't need a declare
insert into [Table_1] (ID, fname, lname)
select 3, 'fname', 'lname'
from [Table_1]
where ID = 3
having COUNT(*) <= 10
If you need to take a lock then do so
The data is not 3NF
Should start any design with a proper data model
Why rule out table lock?
That could very well be the best approach
Really, what are the chances?
Even without a lock you would have to have two at a count of 9 submit at exactly the same time. Even then it would stop at 11. Is the 10 an absolute hard number?

Related

SQL number generation in concurrent environment (Transation isolation level)

I am working with an application that generates invoice numbers (sequentially based on few parameters) and so far it has been using a trigger with serialized transaction. Because the trigger is rather "heavy" it manages to timeout execution of the insert query.
I'm now working on a solution to that problem and so far I came to the point where I have a stored procedure that do the insert and after the insert I have a transaction with isolation level serializable (which by the way applies to that transaction only or should i set it back after the transaction has been commited?) that:
gets the number
if not found do the insert into that table and if found updates the number (increment)
commits the transaction
I'm wondering whether there's a better way to ensure the number is used once and gets incrementer with the table locked (only the number tables gets locked, right?).
I read about sp_getapplock, would that be somewhat a better way to achieve my goal?

I would optimize the routine for update (and handle "insert if not there" separately), at which point it would be:
declare #number int;
update tbl
set #number = number, number += 1
where year = #year and month = #month and office = #office and type = #type;
You don't need any specific locking hints or isolation levels, SQL Server will ensure no two transactions read the same value before incrementing.
If you'd like to avoid handling the insert separately, you can:
merge into tbl
using (values (#year, #month, #office, #type)) as v(y,m,o,t)
on tbl.year = v.year and tbl.month = v.month and tbl.office = v.office and tbl.type = v.type
when not matched by target then
insert (year, month, office, type, number) values(#year, #month, #office, #type, 1)
when matched then
update set #number = tbl.number, tbl.number += 1
;
Logically this should provide the same guard against race condition as update, but for some reason I don't remember where is the proof.

If you first insert and then update you have a time window where an invalid number is set and can be observed. Further, if the 2nd transaction fails which can always happen you have inconsistent data.
Try this:
Take a fresh number in tran 1.
Insert in tran 2 with the number that was taken already
That way you might burn a number but there will never be inconsistent data.

Update if a key, or combination of keys, exists, otherwise INSERT [duplicate]

Assume a table structure of MyTable(KEY, datafield1, datafield2...).
Often I want to either update an existing record, or insert a new record if it doesn't exist.
Essentially:
IF (key exists)
run update command
ELSE
run insert command
What's the best performing way to write this?

don't forget about transactions. Performance is good, but simple (IF EXISTS..) approach is very dangerous.
When multiple threads will try to perform Insert-or-update you can easily
get primary key violation.
Solutions provided by #Beau Crawford & #Esteban show general idea but error-prone.
To avoid deadlocks and PK violations you can use something like this:
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert into table (key, ...)
values (#key, ...)
end
commit tran
or
begin tran
update table with (serializable) set ...
where key = #key
if ##rowcount = 0
begin
insert into table (key, ...) values (#key,..)
end
commit tran

See my detailed answer to a very similar previous question
#Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to SO it. The only problem is that for inserts it's still two IO operations.
MS Sql2008 introduces merge from the SQL:2003 standard:
merge tablename with(HOLDLOCK) as target
using (values ('new value', 'different value'))
as source (field1, field2)
on target.idfield = 7
when matched then
update
set field1 = source.field1,
field2 = source.field2,
...
when not matched then
insert ( idfield, field1, field2, ... )
values ( 7, source.field1, source.field2, ... )
Now it's really just one IO operation, but awful code :-(

Do an UPSERT:
UPDATE MyTable SET FieldA=#FieldA WHERE Key=#Key
IF ##ROWCOUNT = 0
INSERT INTO MyTable (FieldA) VALUES (#FieldA)
http://en.wikipedia.org/wiki/Upsert

Many people will suggest you use MERGE, but I caution you against it. By default, it doesn't protect you from concurrency and race conditions any more than multiple statements, and it introduces other dangers:
Use Caution with SQL Server's MERGE Statement
So, you want to use MERGE, eh?
Even with this "simpler" syntax available, I still prefer this approach (error handling omitted for brevity):
BEGIN TRANSACTION;
UPDATE dbo.table WITH (UPDLOCK, SERIALIZABLE)
SET ... WHERE PK = #PK;
IF ##ROWCOUNT = 0
BEGIN
INSERT dbo.table(PK, ...) SELECT #PK, ...;
END
COMMIT TRANSACTION;
Please stop using this UPSERT anti-pattern
A lot of folks will suggest this way:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
IF EXISTS (SELECT 1 FROM dbo.table WHERE PK = #PK)
BEGIN
UPDATE ...
END
ELSE
BEGIN
INSERT ...
END
COMMIT TRANSACTION;
But all this accomplishes is ensuring you may need to read the table twice to locate the row(s) to be updated. In the first sample, you will only ever need to locate the row(s) once. (In both cases, if no rows are found from the initial read, an insert occurs.)
Others will suggest this way:
BEGIN TRY
INSERT ...
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE ...
END CATCH
However, this is problematic if for no other reason than letting SQL Server catch exceptions that you could have prevented in the first place is much more expensive, except in the rare scenario where almost every insert fails. I prove as much here:
Checking for potential constraint violations before entering TRY/CATCH
Performance impact of different error handling techniques

IF EXISTS (SELECT * FROM [Table] WHERE ID = rowID)
UPDATE [Table] SET propertyOne = propOne, property2 . . .
ELSE
INSERT INTO [Table] (propOne, propTwo . . .)
Edit:
Alas, even to my own detriment, I must admit the solutions that do this without a select seem to be better since they accomplish the task with one less step.

If you want to UPSERT more than one record at a time you can use the ANSI SQL:2003 DML statement MERGE.
MERGE INTO table_name WITH (HOLDLOCK) USING table_name ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
Check out Mimicking MERGE Statement in SQL Server 2005.

Although its pretty late to comment on this I want to add a more complete example using MERGE.
Such Insert+Update statements are usually called "Upsert" statements and can be implemented using MERGE in SQL Server.
A very good example is given here:
http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx
The above explains locking and concurrency scenarios as well.
I will be quoting the same for reference:
ALTER PROCEDURE dbo.Merge_Foo2
#ID int
AS
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT #ID AS ID) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
UPDATE
SET f.UpdateSpid = ##SPID,
UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
INSERT
(
ID,
InsertSpid,
InsertTime
)
VALUES
(
new_foo.ID,
##SPID,
SYSDATETIME()
);
RETURN ##ERROR;

/*
CREATE TABLE ApplicationsDesSocietes (
id INT IDENTITY(0,1) NOT NULL,
applicationId INT NOT NULL,
societeId INT NOT NULL,
suppression BIT NULL,
CONSTRAINT PK_APPLICATIONSDESSOCIETES PRIMARY KEY (id)
)
GO
--*/
DECLARE #applicationId INT = 81, #societeId INT = 43, #suppression BIT = 0
MERGE dbo.ApplicationsDesSocietes WITH (HOLDLOCK) AS target
--set the SOURCE table one row
USING (VALUES (#applicationId, #societeId, #suppression))
AS source (applicationId, societeId, suppression)
--here goes the ON join condition
ON target.applicationId = source.applicationId and target.societeId = source.societeId
WHEN MATCHED THEN
UPDATE
--place your list of SET here
SET target.suppression = source.suppression
WHEN NOT MATCHED THEN
--insert a new line with the SOURCE table one row
INSERT (applicationId, societeId, suppression)
VALUES (source.applicationId, source.societeId, source.suppression);
GO
Replace table and field names by whatever you need.
Take care of the using ON condition.
Then set the appropriate value (and type) for the variables on the DECLARE line.
Cheers.

That depends on the usage pattern. One has to look at the usage big picture without getting lost in the details. For example, if the usage pattern is 99% updates after the record has been created, then the 'UPSERT' is the best solution.
After the first insert (hit), it will be all single statement updates, no ifs or buts. The 'where' condition on the insert is necessary otherwise it will insert duplicates, and you don't want to deal with locking.
UPDATE <tableName> SET <field>=#field WHERE key=#key;
IF ##ROWCOUNT = 0
BEGIN
INSERT INTO <tableName> (field)
SELECT #field
WHERE NOT EXISTS (select * from tableName where key = #key);
END

You can use MERGE Statement, This statement is used to insert data if not exist or update if does exist.
MERGE INTO Employee AS e
using EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID`

If going the UPDATE if-no-rows-updated then INSERT route, consider doing the INSERT first to prevent a race condition (assuming no intervening DELETE)
INSERT INTO MyTable (Key, FieldA)
SELECT #Key, #FieldA
WHERE NOT EXISTS
(
SELECT *
FROM MyTable
WHERE Key = #Key
)
IF ##ROWCOUNT = 0
BEGIN
UPDATE MyTable
SET FieldA=#FieldA
WHERE Key=#Key
IF ##ROWCOUNT = 0
... record was deleted, consider looping to re-run the INSERT, or RAISERROR ...
END
Apart from avoiding a race condition, if in most cases the record will already exist then this will cause the INSERT to fail, wasting CPU.
Using MERGE probably preferable for SQL2008 onwards.

MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown it is not a big deal to handle one row cases, but when dealing with large datasets, one needs a cursor, with all the performance problems that come along. The MERGE statement will be much welcomed addition when dealing with large datasets.

Before everyone jumps to HOLDLOCK-s out of fear from these nafarious users running your sprocs directly :-) let me point out that you have to guarantee uniqueness of new PK-s by design (identity keys, sequence generators in Oracle, unique indexes for external ID-s, queries covered by indexes). That's the alpha and omega of the issue. If you don't have that, no HOLDLOCK-s of the universe are going to save you and if you do have that then you don't need anything beyond UPDLOCK on the first select (or to use update first).
Sprocs normally run under very controlled conditions and with the assumption of a trusted caller (mid tier). Meaning that if a simple upsert pattern (update+insert or merge) ever sees duplicate PK that means a bug in your mid-tier or table design and it's good that SQL will yell a fault in such case and reject the record. Placing a HOLDLOCK in this case equals eating exceptions and taking in potentially faulty data, besides reducing your perf.
Having said that, Using MERGE, or UPDATE then INSERT is easier on your server and less error prone since you don't have to remember to add (UPDLOCK) to first select. Also, if you are doing inserts/updates in small batches you need to know your data in order to decide whether a transaction is appropriate or not. It it's just a collection of unrelated records then additional "enveloping" transaction will be detrimental.

Does the race conditions really matter if you first try an update followed by an insert?
Lets say you have two threads that want to set a value for key key:
Thread 1: value = 1
Thread 2: value = 2
Example race condition scenario
key is not defined
Thread 1 fails with update
Thread 2 fails with update
Exactly one of thread 1 or thread 2 succeeds with insert. E.g. thread 1
The other thread fails with insert (with error duplicate key) - thread 2.
Result: The "first" of the two treads to insert, decides value.
Wanted result: The last of the 2 threads to write data (update or insert) should decide value
But; in a multithreaded environment, the OS scheduler decides on the order of the thread execution - in the above scenario, where we have this race condition, it was the OS that decided on the sequence of execution. Ie: It is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint.
When the time of execution is so close for thread 1 and thread 2, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads should define the resulting value.
For the implementation: If update followed by insert results in error "duplicate key", this should be treated as success.
Also, one should of course never assume that value in the database is the same as the value you wrote last.

I had tried below solution and it works for me, when concurrent request for insert statement occurs.
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert table (key, ...)
values (#key, ...)
end
commit tran

You can use this query. Work in all SQL Server editions. It's simple, and clear. But you need use 2 queries. You can use if you can't use MERGE
BEGIN TRAN
UPDATE table
SET Id = #ID, Description = #Description
WHERE Id = #Id
INSERT INTO table(Id, Description)
SELECT #Id, #Description
WHERE NOT EXISTS (SELECT NULL FROM table WHERE Id = #Id)
COMMIT TRAN
NOTE: Please explain answer negatives

Assuming that you want to insert/update single row, most optimal approach is to use SQL Server's REPEATABLE READ transaction isolation level:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION
IF (EXISTS (SELECT * FROM myTable WHERE key=#key)
UPDATE myTable SET ...
WHERE key=#key
ELSE
INSERT INTO myTable (key, ...)
VALUES (#key, ...)
COMMIT TRANSACTION
This isolation level will prevent/block subsequent repeatable read transactions from accessing same row (WHERE key=#key) while currently running transaction is open.
On the other hand, operations on another row won't be blocked (WHERE key=#key2).

You can use:
INSERT INTO tableName (...) VALUES (...)
ON DUPLICATE KEY
UPDATE ...
Using this, if there is already an entry for the particular key, then it will UPDATE, else, it will INSERT.

In SQL Server 2008 you can use the MERGE statement

If you use ADO.NET, the DataAdapter handles this.
If you want to handle it yourself, this is the way:
Make sure there is a primary key constraint on your key column.
Then you:
Do the update
If the update fails because a record with the key already exists, do the insert. If the update does not fail, you are finished.
You can also do it the other way round, i.e. do the insert first, and do the update if the insert fails. Normally the first way is better, because updates are done more often than inserts.

Doing an if exists ... else ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:
DECLARE #RowExists bit
SET #RowExists = 0
UPDATE MyTable SET DataField1 = 'xxx', #RowExists = 1 WHERE Key = 123
IF #RowExists = 0
INSERT INTO MyTable (Key, DataField1) VALUES (123, 'xxx')

I usually do what several of the other posters have said with regard to checking for it existing first and then doing whatever the correct path is. One thing you should remember when doing this is that the execution plan cached by sql could be nonoptimal for one path or the other. I believe the best way to do this is to call two different stored procedures.
FirstSP:
If Exists
Call SecondSP (UpdateProc)
Else
Call ThirdSP (InsertProc)
Now, I don't follow my own advice very often, so take it with a grain of salt.

Do a select, if you get a result, update it, if not, create it.

Getting deadlocks on MS SQL stored procedure performing a read/update (put code to handle deadlocks)

I have to admit I'm just learning about properly handling deadlocks but based on suggestions I read, I thought this was the proper way to handle it. Basically I have many processes trying to 'reserve' a row in the database for an update. So I first read for an available row, then write to it. Is this not the right way? If so, how do I need to fix this SP?
CREATE PROCEDURE [dbo].[reserveAccount]
-- Add the parameters for the stored procedure here
#machineId varchar(MAX)
AS
BEGIN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
declare #id BIGINT;
set #id = (select min(id) from Account_Data where passfail is null and reservedby is null);
update Account_data set reservedby = #machineId where ID = #id;
COMMIT TRANSACTION;
END

You can write this as a single statement. That will may fix the update problem:
update Account_data
set reservedby = #machineId
where ID = (select min(id) from Account_Data where passfail is null and reservedby is null);

Well, yur problem is 2that you have 2 statements - a select and an update. if those run concurrent, then the select will..... make a read lock and the update will demand a write lock. At the same time 2 machins deadlock.
Simple solution is to make the initial select demand an uddate lock (WITH (ROWLOCK, UPDLOCK) as hint). That may or may not work (depends on what else goes on) but it is a good start.
Second step - if that fails - is to use an application elvel lock (sp_getapplock) that makes sure a critical system always has only one owner and htus only exeutes transactions serially.

Having TRANSACTION In All Queries

Do you think always having a transaction around the SQL statements in a stored procedure is a good practice? I'm just about to optimize this legacy application in my company, and one thing I found is that every stored procedure has BEGIN TRANSACTION. Even a procedure with a single select or update statement has one. I thought it would be nice to have BEGIN TRANSACTION if performing multiple actions, but not just one action. I may be wrong, which is why I need someone else to advise me. Thanks for your time, guys.

It is entirely unnecessary as each SQL statement executes atomically, ie. as if it were already running in its own transaction. In fact, opening unnecessary transactions can lead to increased locking, even deadlocks. Forgetting to match COMMITs with BEGINs can leave a transaction open for as long as the connection to the database is open and interfere with other transactions in the same connection.
Such coding almost certainly means that whoever wrote the code was not very experienced in database programming and is a sure smell that there may be other problems as well.

The only possible reason I could see for this is if you have the possibility of needing to roll-back the transaction for a reason other than a SQL failure.
However, if the code is literally
begin transaction
statement
commit
Then I see absolutely no reason to use an explicit transaction, and it's probably being done because it's always been done that way.

I don't know of any benefit of not just using auto commit transactions for these statements.
Possible disadvantages of using explicit transactions everywhere might be that it just adds clutter to the code and so makes it less easy to see when an explicit transaction is being used to ensure correctness over multiple statements.
Additionally it increases the risk that a transaction is left open holding locks unless care is taken (e.g. with SET XACT_ABORT ON).
Also there is a minor performance implication as shown in #8kb's answer. This illustrates it another way using the visual studio profiler.
Setup
(Testing against an empty table)
CREATE TABLE T (X INT)
Explicit
SET NOCOUNT ON
DECLARE #X INT
WHILE ( 1 = 1 )
BEGIN
BEGIN TRAN
SELECT #X = X
FROM T
COMMIT
END
Auto Commit
SET NOCOUNT ON
DECLARE #X INT
WHILE ( 1 = 1 )
BEGIN
SELECT #X = X
FROM T
END
Both of them end up spending time in CMsqlXactImp::Begin and CMsqlXactImp::Commit but for the explicit transactions case it spends a significantly greater proportion of the execution time in these methods and hence less time doing useful work.
+--------------------------------+----------+----------+
| | Auto | Explicit |
+--------------------------------+----------+----------+
| CXStmtQuery::ErsqExecuteQuery | 35.16% | 25.06% |
| CXStmtQuery::XretSchemaChanged | 20.71% | 14.89% |
| CMsqlXactImp::Begin | 5.06% | 13% |
| CMsqlXactImp::Commit | 12.41% | 24.03% |
+--------------------------------+----------+----------+

When performing multiple insert/update/delete, it is better to have a transaction to insure atomicity on operation and it insure that all the tasks of operation are executed or none.
For single insert/update/delete statement, it depends upon what kind of operation (from business layer perspective) you are performing and how important it is. If you are performing some calculation before single insert/update/delete, then better use transaction, may be some data changed after you retrieve data for insert/update/delete.

One plus point is you can add another INSERT (for example) and it's already safe.
Then again, you also have the problem of nested transactions if a stored procedure calls another one. An inner rollback will cause error 266.
If every call is simple CRUD with no nesting then it's pointless: but if you nest or have multiple writes pre TXN then it's good to have a consistent template.

You mentioned that you'll be optimizing this legacy app.
One of the first, and easiest, things you can do to improve performance is remove all the BEGIN TRAN and COMMIT TRAN for the stored procedures that only do SELECTs.
Here is a simple test to demonstrate:
/* Compare basic SELECT times with and without a transaction */
DECLARE #date DATETIME2
DECLARE #noTran INT
DECLARE #withTran INT
SET #noTran = 0
SET #withTran = 0
DECLARE #t TABLE (ColA INT)
INSERT #t VALUES (1)
DECLARE
#count INT,
#value INT
SET #count = 1
WHILE #count < 1000000
BEGIN
SET #date = GETDATE()
SELECT #value = ColA FROM #t WHERE ColA = 1
SET #noTran = #noTran + DATEDIFF(MICROSECOND, #date, GETDATE())
SET #date = GETDATE()
BEGIN TRAN
SELECT #value = ColA FROM #t WHERE ColA = 1
COMMIT TRAN
SET #withTran = #withTran + DATEDIFF(MICROSECOND, #date, GETDATE())
SET #count = #count + 1
END
SELECT
#noTran / 1000000. AS Seconds_NoTransaction,
#withTran / 1000000. AS Seconds_WithTransaction
/** Results **/
Seconds_NoTransaction Seconds_WithTransaction
--------------------------------------- ---------------------------------------
14.23600000 18.08300000
You can see there is a definite overhead associated with the transactions.
Note: this is assuming your these stored procedures are not using any special isolation levels or locking hints (for something like handling pessimistic concurrency). In that case, obvously you would want to keep them.
So to answer the question, I would only leave in the transactions where you are actually attempting to preserve the integrity of the data modifications in case of an error in the code, SQL Server, or the hardware.

I can only say that placing a transaction block like this to every stored procedure might be a novice's work.
A transaction should be placed only in a block that has more than one insert/update statements, other than that, there is no need to place a transaction block in the stored procedure.

BEGIN TRANSACTION / COMMIT syntax shouldn't be used in every stored procedure by default unless you are trying to cover the following scenarios:
You include the WITH MARK option because you want to support restoring the database from a backup to a specific point in time.
You intend to port the code from SQL Server to another database platform like Oracle. Oracle does not commit transactions by default.

Solutions for INSERT OR UPDATE on SQL Server

Assume a table structure of MyTable(KEY, datafield1, datafield2...).
Often I want to either update an existing record, or insert a new record if it doesn't exist.
Essentially:
IF (key exists)
run update command
ELSE
run insert command
What's the best performing way to write this?

don't forget about transactions. Performance is good, but simple (IF EXISTS..) approach is very dangerous.
When multiple threads will try to perform Insert-or-update you can easily
get primary key violation.
Solutions provided by #Beau Crawford & #Esteban show general idea but error-prone.
To avoid deadlocks and PK violations you can use something like this:
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert into table (key, ...)
values (#key, ...)
end
commit tran
or
begin tran
update table with (serializable) set ...
where key = #key
if ##rowcount = 0
begin
insert into table (key, ...) values (#key,..)
end
commit tran

See my detailed answer to a very similar previous question
#Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to SO it. The only problem is that for inserts it's still two IO operations.
MS Sql2008 introduces merge from the SQL:2003 standard:
merge tablename with(HOLDLOCK) as target
using (values ('new value', 'different value'))
as source (field1, field2)
on target.idfield = 7
when matched then
update
set field1 = source.field1,
field2 = source.field2,
...
when not matched then
insert ( idfield, field1, field2, ... )
values ( 7, source.field1, source.field2, ... )
Now it's really just one IO operation, but awful code :-(

Do an UPSERT:
UPDATE MyTable SET FieldA=#FieldA WHERE Key=#Key
IF ##ROWCOUNT = 0
INSERT INTO MyTable (FieldA) VALUES (#FieldA)
http://en.wikipedia.org/wiki/Upsert

Many people will suggest you use MERGE, but I caution you against it. By default, it doesn't protect you from concurrency and race conditions any more than multiple statements, and it introduces other dangers:
Use Caution with SQL Server's MERGE Statement
So, you want to use MERGE, eh?
Even with this "simpler" syntax available, I still prefer this approach (error handling omitted for brevity):
BEGIN TRANSACTION;
UPDATE dbo.table WITH (UPDLOCK, SERIALIZABLE)
SET ... WHERE PK = #PK;
IF ##ROWCOUNT = 0
BEGIN
INSERT dbo.table(PK, ...) SELECT #PK, ...;
END
COMMIT TRANSACTION;
Please stop using this UPSERT anti-pattern
A lot of folks will suggest this way:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
IF EXISTS (SELECT 1 FROM dbo.table WHERE PK = #PK)
BEGIN
UPDATE ...
END
ELSE
BEGIN
INSERT ...
END
COMMIT TRANSACTION;
But all this accomplishes is ensuring you may need to read the table twice to locate the row(s) to be updated. In the first sample, you will only ever need to locate the row(s) once. (In both cases, if no rows are found from the initial read, an insert occurs.)
Others will suggest this way:
BEGIN TRY
INSERT ...
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE ...
END CATCH
However, this is problematic if for no other reason than letting SQL Server catch exceptions that you could have prevented in the first place is much more expensive, except in the rare scenario where almost every insert fails. I prove as much here:
Checking for potential constraint violations before entering TRY/CATCH
Performance impact of different error handling techniques

IF EXISTS (SELECT * FROM [Table] WHERE ID = rowID)
UPDATE [Table] SET propertyOne = propOne, property2 . . .
ELSE
INSERT INTO [Table] (propOne, propTwo . . .)
Edit:
Alas, even to my own detriment, I must admit the solutions that do this without a select seem to be better since they accomplish the task with one less step.

If you want to UPSERT more than one record at a time you can use the ANSI SQL:2003 DML statement MERGE.
MERGE INTO table_name WITH (HOLDLOCK) USING table_name ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
Check out Mimicking MERGE Statement in SQL Server 2005.

Although its pretty late to comment on this I want to add a more complete example using MERGE.
Such Insert+Update statements are usually called "Upsert" statements and can be implemented using MERGE in SQL Server.
A very good example is given here:
http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx
The above explains locking and concurrency scenarios as well.
I will be quoting the same for reference:
ALTER PROCEDURE dbo.Merge_Foo2
#ID int
AS
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT #ID AS ID) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
UPDATE
SET f.UpdateSpid = ##SPID,
UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
INSERT
(
ID,
InsertSpid,
InsertTime
)
VALUES
(
new_foo.ID,
##SPID,
SYSDATETIME()
);
RETURN ##ERROR;

/*
CREATE TABLE ApplicationsDesSocietes (
id INT IDENTITY(0,1) NOT NULL,
applicationId INT NOT NULL,
societeId INT NOT NULL,
suppression BIT NULL,
CONSTRAINT PK_APPLICATIONSDESSOCIETES PRIMARY KEY (id)
)
GO
--*/
DECLARE #applicationId INT = 81, #societeId INT = 43, #suppression BIT = 0
MERGE dbo.ApplicationsDesSocietes WITH (HOLDLOCK) AS target
--set the SOURCE table one row
USING (VALUES (#applicationId, #societeId, #suppression))
AS source (applicationId, societeId, suppression)
--here goes the ON join condition
ON target.applicationId = source.applicationId and target.societeId = source.societeId
WHEN MATCHED THEN
UPDATE
--place your list of SET here
SET target.suppression = source.suppression
WHEN NOT MATCHED THEN
--insert a new line with the SOURCE table one row
INSERT (applicationId, societeId, suppression)
VALUES (source.applicationId, source.societeId, source.suppression);
GO
Replace table and field names by whatever you need.
Take care of the using ON condition.
Then set the appropriate value (and type) for the variables on the DECLARE line.
Cheers.

That depends on the usage pattern. One has to look at the usage big picture without getting lost in the details. For example, if the usage pattern is 99% updates after the record has been created, then the 'UPSERT' is the best solution.
After the first insert (hit), it will be all single statement updates, no ifs or buts. The 'where' condition on the insert is necessary otherwise it will insert duplicates, and you don't want to deal with locking.
UPDATE <tableName> SET <field>=#field WHERE key=#key;
IF ##ROWCOUNT = 0
BEGIN
INSERT INTO <tableName> (field)
SELECT #field
WHERE NOT EXISTS (select * from tableName where key = #key);
END

You can use MERGE Statement, This statement is used to insert data if not exist or update if does exist.
MERGE INTO Employee AS e
using EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID`

If going the UPDATE if-no-rows-updated then INSERT route, consider doing the INSERT first to prevent a race condition (assuming no intervening DELETE)
INSERT INTO MyTable (Key, FieldA)
SELECT #Key, #FieldA
WHERE NOT EXISTS
(
SELECT *
FROM MyTable
WHERE Key = #Key
)
IF ##ROWCOUNT = 0
BEGIN
UPDATE MyTable
SET FieldA=#FieldA
WHERE Key=#Key
IF ##ROWCOUNT = 0
... record was deleted, consider looping to re-run the INSERT, or RAISERROR ...
END
Apart from avoiding a race condition, if in most cases the record will already exist then this will cause the INSERT to fail, wasting CPU.
Using MERGE probably preferable for SQL2008 onwards.

MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown it is not a big deal to handle one row cases, but when dealing with large datasets, one needs a cursor, with all the performance problems that come along. The MERGE statement will be much welcomed addition when dealing with large datasets.

Before everyone jumps to HOLDLOCK-s out of fear from these nafarious users running your sprocs directly :-) let me point out that you have to guarantee uniqueness of new PK-s by design (identity keys, sequence generators in Oracle, unique indexes for external ID-s, queries covered by indexes). That's the alpha and omega of the issue. If you don't have that, no HOLDLOCK-s of the universe are going to save you and if you do have that then you don't need anything beyond UPDLOCK on the first select (or to use update first).
Sprocs normally run under very controlled conditions and with the assumption of a trusted caller (mid tier). Meaning that if a simple upsert pattern (update+insert or merge) ever sees duplicate PK that means a bug in your mid-tier or table design and it's good that SQL will yell a fault in such case and reject the record. Placing a HOLDLOCK in this case equals eating exceptions and taking in potentially faulty data, besides reducing your perf.
Having said that, Using MERGE, or UPDATE then INSERT is easier on your server and less error prone since you don't have to remember to add (UPDLOCK) to first select. Also, if you are doing inserts/updates in small batches you need to know your data in order to decide whether a transaction is appropriate or not. It it's just a collection of unrelated records then additional "enveloping" transaction will be detrimental.

Does the race conditions really matter if you first try an update followed by an insert?
Lets say you have two threads that want to set a value for key key:
Thread 1: value = 1
Thread 2: value = 2
Example race condition scenario
key is not defined
Thread 1 fails with update
Thread 2 fails with update
Exactly one of thread 1 or thread 2 succeeds with insert. E.g. thread 1
The other thread fails with insert (with error duplicate key) - thread 2.
Result: The "first" of the two treads to insert, decides value.
Wanted result: The last of the 2 threads to write data (update or insert) should decide value
But; in a multithreaded environment, the OS scheduler decides on the order of the thread execution - in the above scenario, where we have this race condition, it was the OS that decided on the sequence of execution. Ie: It is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint.
When the time of execution is so close for thread 1 and thread 2, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads should define the resulting value.
For the implementation: If update followed by insert results in error "duplicate key", this should be treated as success.
Also, one should of course never assume that value in the database is the same as the value you wrote last.

I had tried below solution and it works for me, when concurrent request for insert statement occurs.
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert table (key, ...)
values (#key, ...)
end
commit tran

You can use this query. Work in all SQL Server editions. It's simple, and clear. But you need use 2 queries. You can use if you can't use MERGE
BEGIN TRAN
UPDATE table
SET Id = #ID, Description = #Description
WHERE Id = #Id
INSERT INTO table(Id, Description)
SELECT #Id, #Description
WHERE NOT EXISTS (SELECT NULL FROM table WHERE Id = #Id)
COMMIT TRAN
NOTE: Please explain answer negatives

Assuming that you want to insert/update single row, most optimal approach is to use SQL Server's REPEATABLE READ transaction isolation level:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION
IF (EXISTS (SELECT * FROM myTable WHERE key=#key)
UPDATE myTable SET ...
WHERE key=#key
ELSE
INSERT INTO myTable (key, ...)
VALUES (#key, ...)
COMMIT TRANSACTION
This isolation level will prevent/block subsequent repeatable read transactions from accessing same row (WHERE key=#key) while currently running transaction is open.
On the other hand, operations on another row won't be blocked (WHERE key=#key2).

You can use:
INSERT INTO tableName (...) VALUES (...)
ON DUPLICATE KEY
UPDATE ...
Using this, if there is already an entry for the particular key, then it will UPDATE, else, it will INSERT.

In SQL Server 2008 you can use the MERGE statement

If you use ADO.NET, the DataAdapter handles this.
If you want to handle it yourself, this is the way:
Make sure there is a primary key constraint on your key column.
Then you:
Do the update
If the update fails because a record with the key already exists, do the insert. If the update does not fail, you are finished.
You can also do it the other way round, i.e. do the insert first, and do the update if the insert fails. Normally the first way is better, because updates are done more often than inserts.

Doing an if exists ... else ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:
DECLARE #RowExists bit
SET #RowExists = 0
UPDATE MyTable SET DataField1 = 'xxx', #RowExists = 1 WHERE Key = 123
IF #RowExists = 0
INSERT INTO MyTable (Key, DataField1) VALUES (123, 'xxx')

I usually do what several of the other posters have said with regard to checking for it existing first and then doing whatever the correct path is. One thing you should remember when doing this is that the execution plan cached by sql could be nonoptimal for one path or the other. I believe the best way to do this is to call two different stored procedures.
FirstSP:
If Exists
Call SecondSP (UpdateProc)
Else
Call ThirdSP (InsertProc)
Now, I don't follow my own advice very often, so take it with a grain of salt.

Do a select, if you get a result, update it, if not, create it.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Conditionally inserting records into a table in multithreaded environment based on a count - sql

Related

SQL number generation in concurrent environment (Transation isolation level)

Update if a key, or combination of keys, exists, otherwise INSERT [duplicate]

Getting deadlocks on MS SQL stored procedure performing a read/update (put code to handle deadlocks)

Having TRANSACTION In All Queries

Solutions for INSERT OR UPDATE on SQL Server

Categories

Resources