Update if a key, or combination of keys, exists, otherwise INSERT [duplicate] - sql

Assume a table structure of MyTable(KEY, datafield1, datafield2...).
Often I want to either update an existing record, or insert a new record if it doesn't exist.
Essentially:
IF (key exists)
run update command
ELSE
run insert command
What's the best performing way to write this?

Don't forget about transactions. Performance is good, but the simple (IF EXISTS ...) approach is very dangerous.
When multiple threads try to perform an insert-or-update, you can easily get a primary key violation.
The solutions provided by @Beau Crawford and @Esteban show the general idea but are error-prone.
To avoid deadlocks and PK violations you can use something like this:
begin tran
if exists (select * from table with (updlock, serializable) where key = @key)
begin
    update table set ...
    where key = @key
end
else
begin
    insert into table (key, ...)
    values (@key, ...)
end
commit tran
or
begin tran
    update table with (serializable) set ...
    where key = @key
    if @@rowcount = 0
    begin
        insert into table (key, ...) values (@key, ...)
    end
commit tran

See my detailed answer to a very similar previous question
@Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to SO it. The only problem is that for inserts it's still two IO operations.
SQL Server 2008 introduces MERGE from the SQL:2003 standard:
merge tablename with(HOLDLOCK) as target
using (values ('new value', 'different value'))
as source (field1, field2)
on target.idfield = 7
when matched then
update
set field1 = source.field1,
field2 = source.field2,
...
when not matched then
insert ( idfield, field1, field2, ... )
values ( 7, source.field1, source.field2, ... );
Now it's really just one IO operation, but awful code :-(

Do an UPSERT:
UPDATE MyTable SET FieldA = @FieldA WHERE Key = @Key
IF @@ROWCOUNT = 0
    INSERT INTO MyTable (Key, FieldA) VALUES (@Key, @FieldA)
http://en.wikipedia.org/wiki/Upsert

Many people will suggest you use MERGE, but I caution you against it. By default, it doesn't protect you from concurrency and race conditions any more than multiple statements, and it introduces other dangers:
Use Caution with SQL Server's MERGE Statement
So, you want to use MERGE, eh?
Even with this "simpler" syntax available, I still prefer this approach (error handling omitted for brevity):
BEGIN TRANSACTION;
UPDATE dbo.table WITH (UPDLOCK, SERIALIZABLE)
    SET ... WHERE PK = @PK;
IF @@ROWCOUNT = 0
BEGIN
    INSERT dbo.table(PK, ...) SELECT @PK, ...;
END
COMMIT TRANSACTION;
Please stop using this UPSERT anti-pattern
A lot of folks will suggest this way:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
IF EXISTS (SELECT 1 FROM dbo.table WHERE PK = @PK)
BEGIN
UPDATE ...
END
ELSE
BEGIN
INSERT ...
END
COMMIT TRANSACTION;
But all this accomplishes is ensuring you may need to read the table twice to locate the row(s) to be updated. In the first sample, you will only ever need to locate the row(s) once. (In both cases, if no rows are found from the initial read, an insert occurs.)
Others will suggest this way:
BEGIN TRY
INSERT ...
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE ...
END CATCH
However, this is problematic if for no other reason than letting SQL Server catch exceptions that you could have prevented in the first place is much more expensive, except in the rare scenario where almost every insert fails. I prove as much here:
Checking for potential constraint violations before entering TRY/CATCH
Performance impact of different error handling techniques

IF EXISTS (SELECT * FROM [Table] WHERE ID = rowID)
UPDATE [Table] SET propertyOne = propOne, property2 . . .
ELSE
INSERT INTO [Table] (propOne, propTwo . . .)
Edit:
Alas, even to my own detriment, I must admit the solutions that do this without a select seem to be better since they accomplish the task with one less step.

If you want to UPSERT more than one record at a time you can use the ANSI SQL:2003 DML statement MERGE.
MERGE INTO table_name WITH (HOLDLOCK) USING table_name ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
Check out Mimicking MERGE Statement in SQL Server 2005.

Although it's pretty late to comment on this, I want to add a more complete example using MERGE.
Such Insert+Update statements are usually called "Upsert" statements and can be implemented using MERGE in SQL Server.
A very good example is given here:
http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx
The above explains locking and concurrency scenarios as well.
I will be quoting the same for reference:
ALTER PROCEDURE dbo.Merge_Foo2
    @ID int
AS
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT @ID AS ID) AS new_foo
    ON f.ID = new_foo.ID
WHEN MATCHED THEN
    UPDATE
        SET f.UpdateSpid = @@SPID,
            UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
    INSERT
    (
        ID,
        InsertSpid,
        InsertTime
    )
    VALUES
    (
        new_foo.ID,
        @@SPID,
        SYSDATETIME()
    );
RETURN @@ERROR;

/*
CREATE TABLE ApplicationsDesSocietes (
id INT IDENTITY(0,1) NOT NULL,
applicationId INT NOT NULL,
societeId INT NOT NULL,
suppression BIT NULL,
CONSTRAINT PK_APPLICATIONSDESSOCIETES PRIMARY KEY (id)
)
GO
--*/
DECLARE @applicationId INT = 81, @societeId INT = 43, @suppression BIT = 0
MERGE dbo.ApplicationsDesSocietes WITH (HOLDLOCK) AS target
--set the SOURCE table one row
USING (VALUES (@applicationId, @societeId, @suppression))
AS source (applicationId, societeId, suppression)
--here goes the ON join condition
ON target.applicationId = source.applicationId and target.societeId = source.societeId
WHEN MATCHED THEN
UPDATE
--place your list of SET here
SET target.suppression = source.suppression
WHEN NOT MATCHED THEN
--insert a new line with the SOURCE table one row
INSERT (applicationId, societeId, suppression)
VALUES (source.applicationId, source.societeId, source.suppression);
GO
Replace the table and field names with whatever you need.
Pay attention to the ON condition.
Then set the appropriate value (and type) for the variables on the DECLARE line.
Cheers.

That depends on the usage pattern. One has to look at the usage big picture without getting lost in the details. For example, if the usage pattern is 99% updates after the record has been created, then the 'UPSERT' is the best solution.
After the first insert (hit), it will be all single statement updates, no ifs or buts. The 'where' condition on the insert is necessary otherwise it will insert duplicates, and you don't want to deal with locking.
UPDATE <tableName> SET <field> = @field WHERE key = @key;
IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO <tableName> (field)
    SELECT @field
    WHERE NOT EXISTS (SELECT * FROM <tableName> WHERE key = @key);
END

You can use the MERGE statement, which inserts data if it does not exist and updates it if it does.
MERGE INTO Employee AS e
USING EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT (...) VALUES (...);

If going the UPDATE, then INSERT if no rows were updated route, consider doing the INSERT first to prevent a race condition (assuming no intervening DELETE):
INSERT INTO MyTable (Key, FieldA)
SELECT @Key, @FieldA
WHERE NOT EXISTS
(
    SELECT *
    FROM MyTable
    WHERE Key = @Key
)
IF @@ROWCOUNT = 0
BEGIN
    UPDATE MyTable
    SET FieldA = @FieldA
    WHERE Key = @Key
    IF @@ROWCOUNT = 0
        ... record was deleted, consider looping to re-run the INSERT, or RAISERROR ...
END
This avoids the race condition, but if in most cases the record already exists, the INSERT attempt will usually insert nothing, wasting CPU.
Using MERGE is probably preferable for SQL Server 2008 onwards.

MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown it is not a big deal to handle one row cases, but when dealing with large datasets, one needs a cursor, with all the performance problems that come along. The MERGE statement will be much welcomed addition when dealing with large datasets.
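To make the set-based point concrete, here is a minimal sketch that upserts every row of a staging table in one statement instead of looping over rows with a cursor. The table names dbo.Target and dbo.Staging and their columns are made up for illustration, and the HOLDLOCK hint is borrowed from the other answers on this page rather than being a requirement of MERGE itself.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.Target  (id INT NOT NULL PRIMARY KEY, val VARCHAR(50) NOT NULL);
CREATE TABLE dbo.Staging (id INT NOT NULL PRIMARY KEY, val VARCHAR(50) NOT NULL);

INSERT INTO dbo.Target  (id, val) VALUES (1, 'old'), (2, 'old');
INSERT INTO dbo.Staging (id, val) VALUES (2, 'new'), (3, 'new');

-- One set-based statement: rows that match on id are updated,
-- rows present only in the staging table are inserted.
MERGE dbo.Target WITH (HOLDLOCK) AS t
USING dbo.Staging AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.val = s.val
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, val) VALUES (s.id, s.val);

-- Row 1 is untouched, row 2 is updated, row 3 is inserted.
SELECT id, val FROM dbo.Target ORDER BY id;
```

Whether this beats a two-statement UPDATE plus INSERT on a large dataset depends on indexing and workload, so it is worth measuring rather than assuming.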

Before everyone jumps to HOLDLOCK-s out of fear of these nefarious users running your sprocs directly :-) let me point out that you have to guarantee uniqueness of new PK-s by design (identity keys, sequence generators in Oracle, unique indexes for external ID-s, queries covered by indexes). That's the alpha and omega of the issue. If you don't have that, no HOLDLOCK-s of the universe are going to save you, and if you do have that then you don't need anything beyond UPDLOCK on the first select (or to use update first).
Sprocs normally run under very controlled conditions and with the assumption of a trusted caller (mid tier). Meaning that if a simple upsert pattern (update+insert or merge) ever sees duplicate PK that means a bug in your mid-tier or table design and it's good that SQL will yell a fault in such case and reject the record. Placing a HOLDLOCK in this case equals eating exceptions and taking in potentially faulty data, besides reducing your perf.
Having said that, using MERGE, or UPDATE then INSERT, is easier on your server and less error-prone since you don't have to remember to add (UPDLOCK) to the first select. Also, if you are doing inserts/updates in small batches you need to know your data in order to decide whether a transaction is appropriate or not. If it's just a collection of unrelated records then an additional "enveloping" transaction will be detrimental.
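As a rough illustration of "uniqueness by design", the sketch below puts a unique index on an external identifier next to an identity primary key. The table dbo.Customer, its columns and the index name are hypothetical; the point is only that the index, not the locking hints, is what rejects a duplicate.

```sql
-- Hypothetical table: a surrogate identity key plus an external identifier.
CREATE TABLE dbo.Customer
(
    CustomerId  INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    ExternalRef VARCHAR(40)   NOT NULL,
    Name        NVARCHAR(100) NOT NULL
);

-- The unique index guarantees no duplicate ExternalRef,
-- whatever locking the upsert code does or does not use.
CREATE UNIQUE INDEX UX_Customer_ExternalRef ON dbo.Customer (ExternalRef);

INSERT INTO dbo.Customer (ExternalRef, Name) VALUES ('A-1', N'First');

-- A second insert with the same ExternalRef is rejected with error 2601
-- (duplicate key row in a unique index), so a mid-tier bug surfaces loudly.
INSERT INTO dbo.Customer (ExternalRef, Name) VALUES ('A-1', N'Duplicate');
```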

Do the race conditions really matter if you first try an update followed by an insert?
Let's say you have two threads that want to set a value for key key:
Thread 1: value = 1
Thread 2: value = 2
Example race condition scenario
key is not defined
Thread 1 fails with update
Thread 2 fails with update
Exactly one of thread 1 or thread 2 succeeds with insert, e.g. thread 1.
The other thread fails with insert (with a duplicate key error): thread 2.
Result: The "first" of the two threads to insert decides the value.
Wanted result: The last of the two threads to write data (update or insert) should decide the value.
But in a multithreaded environment, the OS scheduler decides the order of thread execution; in the above scenario, where we have this race condition, it was the OS that decided the sequence of execution. I.e., it is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint.
When the execution times of thread 1 and thread 2 are this close, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads defines the resulting value.
For the implementation: if update followed by insert results in a "duplicate key" error, this should be treated as success.
Also, one should of course never assume that value in the database is the same as the value you wrote last.
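A minimal sketch of that implementation idea, assuming a made-up table dbo.Setting and made-up values: the UPDATE runs first, and if the follow-up INSERT loses the race, the duplicate key error is swallowed and treated as success. THROW requires SQL Server 2012 or later; on older versions RAISERROR would have to stand in.

```sql
-- Hypothetical table and values, for illustration only.
CREATE TABLE dbo.Setting ([key] VARCHAR(50) NOT NULL PRIMARY KEY, val INT NOT NULL);

DECLARE @key VARCHAR(50) = 'answer', @val INT = 1;

UPDATE dbo.Setting SET val = @val WHERE [key] = @key;

IF @@ROWCOUNT = 0
BEGIN
    BEGIN TRY
        INSERT INTO dbo.Setting ([key], val) VALUES (@key, @val);
    END TRY
    BEGIN CATCH
        -- 2627 = PRIMARY KEY/UNIQUE constraint violation, 2601 = unique index violation.
        -- Another session inserted the key first; its value is accepted as the result.
        -- Any other error is re-raised.
        IF ERROR_NUMBER() NOT IN (2601, 2627)
            THROW;
    END CATCH
END
```

Note that the losing session's value is simply dropped here, which is exactly the trade-off argued for above.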

I tried the solution below and it works for me when concurrent requests for the insert statement occur.
begin tran
if exists (select * from table with (updlock, serializable) where key = @key)
begin
    update table set ...
    where key = @key
end
else
begin
    insert table (key, ...)
    values (@key, ...)
end
commit tran

You can use this query. It works in all SQL Server editions. It's simple and clear, but you need two queries. You can use it if you can't use MERGE.
BEGIN TRAN
UPDATE table
SET Id = @Id, Description = @Description
WHERE Id = @Id
INSERT INTO table(Id, Description)
SELECT @Id, @Description
WHERE NOT EXISTS (SELECT NULL FROM table WHERE Id = @Id)
COMMIT TRAN
NOTE: If you downvote, please explain why.

Assuming that you want to insert/update a single row, the most optimal approach is to use SQL Server's REPEATABLE READ transaction isolation level:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION
IF EXISTS (SELECT * FROM myTable WHERE key = @key)
    UPDATE myTable SET ...
    WHERE key = @key
ELSE
    INSERT INTO myTable (key, ...)
    VALUES (@key, ...)
COMMIT TRANSACTION
This isolation level will prevent/block subsequent repeatable read transactions from accessing the same row (WHERE key=@key) while the currently running transaction is open.
On the other hand, operations on another row won't be blocked (WHERE key=@key2).

You can use:
INSERT INTO tableName (...) VALUES (...)
ON DUPLICATE KEY
UPDATE ...
Using this, if there is already an entry for the particular key, then it will UPDATE, else, it will INSERT.

In SQL Server 2008 you can use the MERGE statement

If you use ADO.NET, the DataAdapter handles this.
If you want to handle it yourself, this is the way:
Make sure there is a primary key constraint on your key column.
Then you:
Do the update
If the update affects no rows because no record with the key exists yet, do the insert. If the update affects a row, you are finished.
You can also do it the other way round, i.e. do the insert first, and do the update if the insert fails. Normally the first way is better, because updates are done more often than inserts.

Doing an if exists ... else ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:
DECLARE @RowExists bit
SET @RowExists = 0
UPDATE MyTable SET DataField1 = 'xxx', @RowExists = 1 WHERE Key = 123
IF @RowExists = 0
    INSERT INTO MyTable (Key, DataField1) VALUES (123, 'xxx')

I usually do what several of the other posters have said with regard to checking for it existing first and then doing whatever the correct path is. One thing you should remember when doing this is that the execution plan cached by sql could be nonoptimal for one path or the other. I believe the best way to do this is to call two different stored procedures.
FirstSP:
If Exists
Call SecondSP (UpdateProc)
Else
Call ThirdSP (InsertProc)
Now, I don't follow my own advice very often, so take it with a grain of salt.
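For what it's worth, the three-procedure layout could be sketched roughly as follows. The table, the procedure names and the parameters are all made up, and the UPDLOCK/SERIALIZABLE hints and the transaction are borrowed from the other answers here rather than being part of the original suggestion. GO is the batch separator used by SSMS and sqlcmd, needed because CREATE PROCEDURE must start a batch.

```sql
-- Hypothetical table and procedure names, for illustration only.
CREATE TABLE dbo.MyTable ([Key] INT NOT NULL PRIMARY KEY, DataField1 VARCHAR(50) NOT NULL);
GO
CREATE PROCEDURE dbo.MyTable_Update @Key INT, @DataField1 VARCHAR(50)
AS
    UPDATE dbo.MyTable SET DataField1 = @DataField1 WHERE [Key] = @Key;
GO
CREATE PROCEDURE dbo.MyTable_Insert @Key INT, @DataField1 VARCHAR(50)
AS
    INSERT INTO dbo.MyTable ([Key], DataField1) VALUES (@Key, @DataField1);
GO
-- Dispatcher: each branch calls its own procedure, so each path gets its own cached plan.
CREATE PROCEDURE dbo.MyTable_Upsert @Key INT, @DataField1 VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;
    IF EXISTS (SELECT 1 FROM dbo.MyTable WITH (UPDLOCK, SERIALIZABLE) WHERE [Key] = @Key)
        EXEC dbo.MyTable_Update @Key = @Key, @DataField1 = @DataField1;
    ELSE
        EXEC dbo.MyTable_Insert @Key = @Key, @DataField1 = @DataField1;
    COMMIT TRANSACTION;
END
GO
```

A call would then look like EXEC dbo.MyTable_Upsert @Key = 123, @DataField1 = 'xxx';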

Do a select, if you get a result, update it, if not, create it.

Related

Microsoft SQL Server - best way to 'Update if exists, or Insert'

I've been searching around for the answers to this question, and there's some conflicting or ambiguous information out there, finding it hard to find a for-sure answer.
My context: I'm in node.js using the 'mssql' npm package. My SQL server is Microsoft SQL Server 2014.
I have a record that may or may not exist in a table already -- if it exists I want to update it, otherwise I want to insert it. I'm not sure what the optimal SQL is, or if there's some kind of 'transaction' I should be running in mssql. I've found some options that seem good, but I'm not sure about any of them:
Option 1:
how to update if exists or insert
Problem with this is I'm not even sure this is valid syntax in MSSQL. I do like it though, and it seems to support doing multiple rows at once too which I like.
INSERT INTO table (id, user, date, points)
VALUES (1, 1, '2017-03-03', 25),
(2, 1, '2017-03-04', 25),
(3, 2, '2017-03-03', 100),
(4, 2, '2017-03-04', 150)
ON DUPLICATE KEY UPDATE points = VALUES(points)
Option 2:
don't know if there's any problem with this one, just not sure if it's optimal. Doesn't seem to support multiple simultaneous rows
update test set name='john' where id=3012
IF @@ROWCOUNT=0
    insert into test(name) values('john');
Option 3: Merge, https://dba.stackexchange.com/questions/89696/how-to-insert-or-update-using-single-query
Some people say this is a bit buggy or something? This also apparently supports multiple at once which I like.
MERGE dbo.Test WITH (SERIALIZABLE) AS T
USING (VALUES (3012, 'john')) AS U (id, name)
ON U.id = T.id
WHEN MATCHED THEN
UPDATE SET T.name = U.name
WHEN NOT MATCHED THEN
INSERT (id, name)
VALUES (U.id, U.name);
Each of them has a different purpose, with its own pros and cons.
Option 1 is good for multi-row inserts/updates. However, it only checks primary key constraints.
Option 2 is good for small sets of data: single-record insertion/update. It is more like a script.
Option 3 is best for big queries, say, reading from one table and inserting/updating into another accordingly. You can define which conditions must be satisfied for insertion and/or update. You are not limited to a primary key/unique constraint.
If your system is highly concurrent and performance is important, you can try the following pattern if updates are more common than inserts:
BEGIN TRANSACTION;
UPDATE dbo.t WITH (UPDLOCK, SERIALIZABLE) SET val = @val WHERE [key] = @key;
IF @@ROWCOUNT = 0
BEGIN
    INSERT dbo.t([key], val) VALUES(@key, @val);
END
COMMIT TRANSACTION;
Reference: https://sqlperformance.com/2020/09/locking/upsert-anti-pattern
Also read: https://michaeljswart.com/2017/07/sql-server-upsert-patterns-and-antipatterns/
If inserts are more common:
BEGIN TRY
    INSERT INTO dbo.AccountDetails (Email, Etc) VALUES (@Email, @Etc);
END TRY
BEGIN CATCH
    -- ignore duplicate key errors, throw the rest.
    IF ERROR_NUMBER() IN (2601, 2627)
        UPDATE dbo.AccountDetails
        SET Etc = @Etc
        WHERE Email = @Email;
    ELSE
        THROW;
END CATCH
I wouldn't use MERGE: while most of the bugs are apparently fixed, we have had major issues with it in production before.
EDIT ---
Yes, the above answers were for single rows. For multiple rows, you'd do something like this (the idea behind the locking is the same, though):
BEGIN TRANSACTION;
UPDATE t WITH (UPDLOCK, SERIALIZABLE)
    SET val = tvp.val
FROM dbo.t AS t
INNER JOIN @tvp AS tvp
    ON t.[key] = tvp.[key];
INSERT dbo.t([key], val)
    SELECT [key], val FROM @tvp AS tvp
    WHERE NOT EXISTS (SELECT 1 FROM dbo.t WHERE [key] = tvp.[key]);
COMMIT TRANSACTION;
Extending my comment here. There are known problems with MERGE in SQL Server, however, for what you're doing here you will likely be ok. Aaron Bertrand has an article on the subject which you can find here: Use Caution with SQL Server's MERGE Statement.
An alternative for what you're doing here would be an "UPSERT": UPDATE the existing rows, and then INSERT the ones that don't exist. This involves two separate statements, but it was the method used prior to MERGE:
UPDATE T
SET T.Name = U.Name
FROM dbo.Test T
JOIN (VALUES (3012, 'john')) AS U (id, name) ON T.id = U.id;
INSERT INTO dbo.Test (Name) --I'm assuming ID is an `IDENTITY` here
SELECT U.name
FROM (VALUES (3012, 'john')) AS U (id, name)
WHERE NOT EXISTS (SELECT 1
FROM dbo.Test T
WHERE T.ID = U.ID);
Note I have not declared any locking or transactions in this example, but you should in any implemented solution.

Insert trigger preventing duplicates

I have a table with an AutoIdentity column as its PK and an nvarchar column called "IdentificationCode". All I want is that, when inserting a new row, it searches the table for any preexisting IdentificationCode and, if any is found, rolls back the transaction.
I have written the following trigger:
ALTER trigger [dbo].[Disallow_Duplicate_Ids]
on [dbo].[tbl1]
for insert
as
if ((select COUNT(*) from dbo.tbl1 e , inserted i where e.IdentificationNo = i.IdentificationNo ) > 0)
begin
RAISERROR('Multiple Ids detected',16,1)
ROLLBACK TRANSACTION
end
But when inserting new rows, it always triggers the rollback even if there is no such IdentificationCode.
Can any one help me please?
thanks
As @Qpirate mentions, you should probably put some sort of UNIQUE constraint on the column. This is probably 'stronger' than using a trigger, as there are ways to disable those.
Also, the implicit-join syntax (comma-separated FROM clause) is considered an SQL anti-pattern - if possible, please always explicitly declare your joins.
I suspect that your error is because your trigger seems to be an AFTER trigger, and you check to see if there are any (non-zero) rows in the table; in other words, the trigger is (possibly) 'failing' the INSERT because it was INSERTed. Changing it to a BEFORE (or INSTEAD OF) trigger, or changing the count to >= 2 may solve the problem.
Without seeing your insert statement, it's impossible to know for sure, but (especially if you're using a SP), you may be able to check for existence in the INSERT statement itself, and throw an error (or do something else) if the row isn't inserted.
For example, the following:
INSERT INTO tbl1 (identificationCode, *otherColumns*)
SELECT @identificationCode, *otherColumns*
WHERE NOT EXISTS (SELECT '1'
                  FROM tbl1
                  WHERE identificationCode = @identificationCode)
Will return a code indicating 'row not found' (inserted, etc; on pretty much every system this is SQLCODE = 100) if identificationCode is already present.
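For the UNIQUE constraint route mentioned at the top of this answer, a minimal sketch might look like this. It assumes the table and column names from the question (dbo.tbl1, IdentificationCode), that the column has a bounded length (an nvarchar(max) column cannot be a key column), and that every other column is an identity, nullable, or has a default; the constraint name and the sample value are made up.

```sql
-- Existing duplicates must be cleaned up first, otherwise this statement itself fails.
ALTER TABLE dbo.tbl1
    ADD CONSTRAINT UQ_tbl1_IdentificationCode UNIQUE (IdentificationCode);

DECLARE @IdentificationCode NVARCHAR(50) = N'ABC-123';  -- made-up value and length

BEGIN TRY
    INSERT INTO dbo.tbl1 (IdentificationCode) VALUES (@IdentificationCode);
END TRY
BEGIN CATCH
    -- 2627 = violation of a PRIMARY KEY or UNIQUE constraint.
    IF ERROR_NUMBER() = 2627
        PRINT 'IdentificationCode already exists; row not inserted.';
    ELSE
        THROW;
END CATCH
```

With the constraint in place the trigger is no longer needed for correctness, since the engine rejects the duplicate itself.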
Use EXISTS to check if the IdentificationCode already exists.
IF EXISTS (SELECT * FROM tbl1 WHERE IdentificationCode = @IdentificationCode)
BEGIN
    -- do something
END
ELSE
BEGIN
    -- do something
END

Possible to implement a manual increment with just simple SQL INSERT?

I have a primary key that I don't want to auto increment (for various reasons) and so I'm looking for a way to simply increment that field when I INSERT. By simply, I mean without stored procedures and without triggers, so just a series of SQL commands (preferably one command).
Here is what I have tried thus far:
BEGIN TRAN
INSERT INTO Table1(id, data_field)
VALUES ( (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]');
COMMIT TRAN;
* Data abstracted to use generic names and identifiers
However, when executed, the command errors, saying that
"Subqueries are not allowed in this
context. only scalar expressions are
allowed"
So, how can I do this/what am I doing wrong?
EDIT: Since it was pointed out as a consideration, the table to be inserted into is guaranteed to have at least 1 row already.
You understand that you will have collisions, right?
You need to do something like this, and it might cause deadlocks, so be very sure what you are trying to accomplish here:
DECLARE @id int
BEGIN TRAN
    SELECT @id = MAX(id) + 1 FROM Table1 WITH (UPDLOCK, HOLDLOCK)
    INSERT INTO Table1(id, data_field)
    VALUES (@id, '[blob of data]')
COMMIT TRAN
To explain the collision thing, I have provided some code
first create this table and insert one row
CREATE TABLE Table1(id int primary key not null, data_field char(100))
GO
Insert Table1 values(1,'[blob of data]')
Go
Now open up two query windows and run this at the same time
declare @i int
set @i = 1
while @i < 10000
begin
    BEGIN TRAN
    INSERT INTO Table1(id, data_field)
    SELECT MAX(id) + 1, '[blob of data]' FROM Table1
    COMMIT TRAN;
    set @i = @i + 1
end
You will see a bunch of these
Server: Msg 2627, Level 14, State 1, Line 7
Violation of PRIMARY KEY constraint 'PK__Table1__3213E83F2962141D'. Cannot insert duplicate key in object 'dbo.Table1'.
The statement has been terminated.
Try this instead:
INSERT INTO Table1 (id, data_field)
SELECT id, '[blob of data]' FROM (SELECT MAX(id) + 1 as id FROM Table1) tbl
I wouldn't recommend doing it that way for any number of reasons though (performance, transaction safety, etc)
It could be because there are no records so the sub query is returning NULL...try
INSERT INTO tblTest(RecordID, Text)
VALUES ((SELECT ISNULL(MAX(RecordID), 0) + 1 FROM tblTest), 'asdf')
I don't know if somebody is still looking for an answer but here is a solution that seems to work:
-- Preparation: execute only once
CREATE TABLE Test (Value int)
CREATE TABLE Lock (LockID uniqueidentifier)
INSERT INTO Lock SELECT NEWID()
-- Real insert
BEGIN TRAN LockTran
-- Lock an object to block simultaneous calls.
UPDATE Lock WITH(TABLOCK)
SET LockID = LockID
INSERT INTO Test
SELECT ISNULL(MAX(T.Value), 0) + 1
FROM Test T
COMMIT TRAN LockTran
We have a similar situation where we needed to increment and could not have gaps in the numbers. (If you use an identity value and a transaction is rolled back, that number will not be inserted and you will have gaps because the identity value does not roll back.)
We created a separate table for last number used and seeded it with 0.
Our insert takes a few steps.
--increment the number
Update dbo.NumberTable
set number = number + 1
--find out what the incremented number is
select @number = number
from dbo.NumberTable
--use the number
insert into dbo.MyTable using the @number
commit or rollback
This causes simultaneous transactions to process in a single line as each concurrent transaction will wait because the NumberTable is locked. As soon as the waiting transaction gets the lock, it increments the current value and locks it from others. That current value is the last number used and if a transaction is rolled back, the NumberTable update is also rolled back so there are no gaps.
Hope that helps.
Another way to cause single file execution is to use a SQL application lock. We have used that approach for longer running processes like synchronizing data between systems so only one synchronizing process can run at a time.
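A rough sketch of the application-lock idea using sp_getapplock, applied to the Table1(id, data_field) table from the question. The resource name 'Table1_NextId' and the timeout are arbitrary choices for illustration; the lock only protects against other sessions that take the same named lock, not against code that inserts into the table directly.

```sql
BEGIN TRANSACTION;

DECLARE @rc INT;

-- 'Table1_NextId' is a made-up resource name; every caller must use the same one.
EXEC @rc = sp_getapplock
     @Resource    = 'Table1_NextId',
     @LockMode    = 'Exclusive',
     @LockOwner   = 'Transaction',   -- released automatically at COMMIT/ROLLBACK
     @LockTimeout = 5000;            -- milliseconds to wait for the lock

IF @rc < 0
BEGIN
    -- Negative return codes mean the lock was not granted (timeout, deadlock victim, ...).
    ROLLBACK TRANSACTION;
    RAISERROR('Could not acquire application lock.', 16, 1);
    RETURN;
END

-- Only one session at a time gets here, so MAX(id) + 1 cannot collide
-- with another session running this same code path.
INSERT INTO Table1 (id, data_field)
SELECT ISNULL(MAX(id), 0) + 1, '[blob of data]'
FROM Table1;

COMMIT TRANSACTION;
```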
If you're doing it in a trigger, you could make sure it's an "INSTEAD OF" trigger and do it in a couple of statements:
DECLARE @next INT
SET @next = (SELECT (MAX(id) + 1) FROM Table1)
INSERT INTO Table1
SELECT @next, datablob FROM inserted
The only thing you'd have to be careful about is concurrency - if two rows are inserted at the same time, they could attempt to use the same value for #next, causing a conflict.
Does this accomplish what you want?
It seems very odd to do this sort of thing w/o an IDENTITY (auto-increment) column, making me question the architecture itself. I mean, seriously, this is the perfect situation for an IDENTITY column. It might help us answer your question if you'd explain the reasoning behind this decision. =)
Having said that, some options are:
using an INSTEAD OF trigger for this purpose. So, you'd do your INSERT (the INSERT statement would not need to pass in an ID). The trigger code would handle inserting the appropriate ID. You'd need to use the WITH (UPDLOCK, HOLDLOCK) syntax used by another answerer to hold the lock for the duration of the trigger (which is implicitly wrapped in a transaction) & to elevate the lock type from "shared" to "update" lock (IIRC).
you can use the idea above, but have a table whose purpose is to store the last, max value inserted into the table. So, once the table is set up, you would no longer have to do a SELECT MAX(ID) every time. You'd simply increment the value in the table. This is safe provided that you use appropriate locking (as discussed). Again, that avoids repeated table scans every time you INSERT.
use GUIDs instead of IDs. It's much easier to merge tables across databases, since the GUIDs will always be unique (whereas records across databases will have conflicting integer IDs). To avoid page splitting, sequential GUIDs can be used. This is only beneficial if you might need to do database merging.
Use a stored proc in lieu of the trigger approach (since triggers are to be avoided, for some reason). You'd still have the locking issue (and the performance problems that can arise). But sprocs are preferred over dynamic SQL (in the context of applications), and are often much more performant.
Sorry about rambling. Hope that helps.
How about creating a separate table to maintain the counter? It has better performance than MAX(id), as it will be O(1). MAX(id) is at best O(lgn) depending on the implementation.
And then when you need to insert, simply lock the counter table for reading the counter and increment the counter. Then you can release the lock and insert to your table with the incremented counter value.
Have a separate table where you keep your latest ID and for every transaction get a new one.
It may be a bit slower but it should work.
DECLARE @NEWID INT
BEGIN TRAN
    UPDATE TABLE SET ID=ID+1
    SELECT @NEWID=ID FROM TABLE
COMMIT TRAN
PRINT @NEWID -- Do what you want with your new ID
Code without any transaction scope (I use it in my engineering course as an exercise):
-- Preparation: execute only once
CREATE TABLE increment (val int);
INSERT INTO increment VALUES (1);
-- Real insert
DECLARE @newIncrement INT;
UPDATE increment
SET @newIncrement = val,
    val = val + 1;
INSERT INTO Table1 (id, data_field)
SELECT @newIncrement, 'some data';
begin tran;
declare @nextId int
set @nextId = (select MAX(id)+1 from Table1)
insert into Table1(id, data_field) values (@nextId, '[blob of data]')
commit;
But perhaps a better approach would be using a scalar function getNextId('table1')
Any critiques of this? Works for me.
DECLARE @m_NewRequestID INT
      , @m_IsError BIT = 1
      , @m_CatchEndless INT = 0
WHILE @m_IsError = 1
BEGIN
    BEGIN TRY
        SELECT @m_NewRequestID = (SELECT ISNULL(MAX(RequestID), 0) + 1 FROM Requests)
        INSERT INTO Requests ( RequestID
                             , RequestName
                             , Customer
                             , Comment
                             , CreatedFromApplication)
        SELECT RequestID = @m_NewRequestID
             , RequestName = dbo.ufGetNextAvailableRequestName(PatternName)
             , Customer = @Customer
             , Comment = [Description]
             , CreatedFromApplication = @CreatedFromApplication
        FROM RequestPatterns
        WHERE PatternID = @PatternID
        SET @m_IsError = 0
    END TRY
    BEGIN CATCH
        SET @m_IsError = 1
        SET @m_CatchEndless = @m_CatchEndless + 1
        -- give up after 1000 failed attempts instead of looping forever
        IF @m_CatchEndless > 1000
            THROW 51000, '[upCreateRequestFromPattern]: Unable to get new RequestID', 1;
    END CATCH
END
This should work:
INSERT INTO Table1 (id, data_field)
SELECT (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]';
Or this (substitute LIMIT for other platforms):
INSERT INTO Table1 (id, data_field)
SELECT TOP 1
    id + 1, '[blob of data]'
FROM
    Table1
ORDER BY
    [id] DESC;

Solutions for INSERT OR UPDATE on SQL Server

Assume a table structure of MyTable(KEY, datafield1, datafield2...).
Often I want to either update an existing record, or insert a new record if it doesn't exist.
Essentially:
IF (key exists)
run update command
ELSE
run insert command
What's the best performing way to write this?
don't forget about transactions. Performance is good, but simple (IF EXISTS..) approach is very dangerous.
When multiple threads will try to perform Insert-or-update you can easily
get primary key violation.
Solutions provided by #Beau Crawford & #Esteban show general idea but error-prone.
To avoid deadlocks and PK violations you can use something like this:
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert into table (key, ...)
values (#key, ...)
end
commit tran
or
begin tran
update table with (serializable) set ...
where key = #key
if ##rowcount = 0
begin
insert into table (key, ...) values (#key,..)
end
commit tran
See my detailed answer to a very similar previous question
#Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to SO it. The only problem is that for inserts it's still two IO operations.
MS Sql2008 introduces merge from the SQL:2003 standard:
merge tablename with(HOLDLOCK) as target
using (values ('new value', 'different value'))
as source (field1, field2)
on target.idfield = 7
when matched then
update
set field1 = source.field1,
field2 = source.field2,
...
when not matched then
insert ( idfield, field1, field2, ... )
values ( 7, source.field1, source.field2, ... )
Now it's really just one IO operation, but awful code :-(
Do an UPSERT:
UPDATE MyTable SET FieldA=#FieldA WHERE Key=#Key
IF ##ROWCOUNT = 0
INSERT INTO MyTable (FieldA) VALUES (#FieldA)
http://en.wikipedia.org/wiki/Upsert
Many people will suggest you use MERGE, but I caution you against it. By default, it doesn't protect you from concurrency and race conditions any more than multiple statements, and it introduces other dangers:
Use Caution with SQL Server's MERGE Statement
So, you want to use MERGE, eh?
Even with this "simpler" syntax available, I still prefer this approach (error handling omitted for brevity):
BEGIN TRANSACTION;
UPDATE dbo.table WITH (UPDLOCK, SERIALIZABLE)
SET ... WHERE PK = #PK;
IF ##ROWCOUNT = 0
BEGIN
INSERT dbo.table(PK, ...) SELECT #PK, ...;
END
COMMIT TRANSACTION;
Please stop using this UPSERT anti-pattern
A lot of folks will suggest this way:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
IF EXISTS (SELECT 1 FROM dbo.table WHERE PK = #PK)
BEGIN
UPDATE ...
END
ELSE
BEGIN
INSERT ...
END
COMMIT TRANSACTION;
But all this accomplishes is ensuring you may need to read the table twice to locate the row(s) to be updated. In the first sample, you will only ever need to locate the row(s) once. (In both cases, if no rows are found from the initial read, an insert occurs.)
Others will suggest this way:
BEGIN TRY
INSERT ...
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE ...
END CATCH
However, this is problematic if for no other reason than letting SQL Server catch exceptions that you could have prevented in the first place is much more expensive, except in the rare scenario where almost every insert fails. I prove as much here:
Checking for potential constraint violations before entering TRY/CATCH
Performance impact of different error handling techniques
IF EXISTS (SELECT * FROM [Table] WHERE ID = rowID)
UPDATE [Table] SET propertyOne = propOne, property2 . . .
ELSE
INSERT INTO [Table] (propOne, propTwo . . .)
Edit:
Alas, even to my own detriment, I must admit the solutions that do this without a select seem to be better since they accomplish the task with one less step.
If you want to UPSERT more than one record at a time you can use the ANSI SQL:2003 DML statement MERGE.
MERGE INTO table_name WITH (HOLDLOCK) USING table_name ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
Check out Mimicking MERGE Statement in SQL Server 2005.
Although it's pretty late to comment on this, I want to add a more complete example using MERGE.
Such Insert+Update statements are usually called "Upsert" statements and can be implemented using MERGE in SQL Server.
A very good example is given here:
http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx
The above explains locking and concurrency scenarios as well.
I will be quoting the same for reference:
ALTER PROCEDURE dbo.Merge_Foo2
    @ID int
AS
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT @ID AS ID) AS new_foo
    ON f.ID = new_foo.ID
WHEN MATCHED THEN
    UPDATE
    SET f.UpdateSpid = @@SPID,
        UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
    INSERT
    (
        ID,
        InsertSpid,
        InsertTime
    )
    VALUES
    (
        new_foo.ID,
        @@SPID,
        SYSDATETIME()
    );
RETURN @@ERROR;
/*
CREATE TABLE ApplicationsDesSocietes (
id INT IDENTITY(0,1) NOT NULL,
applicationId INT NOT NULL,
societeId INT NOT NULL,
suppression BIT NULL,
CONSTRAINT PK_APPLICATIONSDESSOCIETES PRIMARY KEY (id)
)
GO
--*/
DECLARE @applicationId INT = 81, @societeId INT = 43, @suppression BIT = 0
MERGE dbo.ApplicationsDesSocietes WITH (HOLDLOCK) AS target
--set the SOURCE table one row
USING (VALUES (@applicationId, @societeId, @suppression))
AS source (applicationId, societeId, suppression)
--here goes the ON join condition
ON target.applicationId = source.applicationId and target.societeId = source.societeId
WHEN MATCHED THEN
UPDATE
--place your list of SET here
SET target.suppression = source.suppression
WHEN NOT MATCHED THEN
--insert a new line with the SOURCE table one row
INSERT (applicationId, societeId, suppression)
VALUES (source.applicationId, source.societeId, source.suppression);
GO
Replace table and field names by whatever you need.
Take care of the using ON condition.
Then set the appropriate value (and type) for the variables on the DECLARE line.
Cheers.
That depends on the usage pattern. One has to look at the usage big picture without getting lost in the details. For example, if the usage pattern is 99% updates after the record has been created, then the 'UPSERT' is the best solution.
After the first insert (hit), it will be all single statement updates, no ifs or buts. The 'where' condition on the insert is necessary otherwise it will insert duplicates, and you don't want to deal with locking.
UPDATE <tableName> SET <field> = @field WHERE key = @key;
IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO <tableName> (field)
    SELECT @field
    WHERE NOT EXISTS (select * from tableName where key = @key);
END
You can use the MERGE statement. It inserts data if it does not exist, or updates it if it does.
MERGE INTO Employee AS e
using EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID
If going the UPDATE if-no-rows-updated then INSERT route, consider doing the INSERT first to prevent a race condition (assuming no intervening DELETE)
INSERT INTO MyTable (Key, FieldA)
SELECT @Key, @FieldA
WHERE NOT EXISTS
(
    SELECT *
    FROM MyTable
    WHERE Key = @Key
)
IF @@ROWCOUNT = 0
BEGIN
    UPDATE MyTable
    SET FieldA = @FieldA
    WHERE Key = @Key
    IF @@ROWCOUNT = 0
        ... record was deleted, consider looping to re-run the INSERT, or RAISERROR ...
END
Apart from avoiding a race condition, if in most cases the record will already exist then this will cause the INSERT to fail, wasting CPU.
Using MERGE is probably preferable from SQL Server 2008 onwards.
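For anyone who wants to try the INSERT-first flow without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module as a stand-in. The table `items` and the helper `insert_then_update` are hypothetical names, and `cursor.rowcount` stands in for @@ROWCOUNT. It only illustrates the control flow. It says nothing about SQL Server's locking behaviour.

```python
import sqlite3


def insert_then_update(conn, key, value):
    """Guarded INSERT first; fall back to UPDATE when the row already exists."""
    with conn:  # single transaction around both statements
        cur = conn.execute(
            "INSERT INTO items (id, val) "
            "SELECT ?, ? WHERE NOT EXISTS (SELECT 1 FROM items WHERE id = ?)",
            (key, value, key),
        )
        if cur.rowcount == 1:
            return "inserted"
        cur = conn.execute("UPDATE items SET val = ? WHERE id = ?", (value, key))
        if cur.rowcount == 0:
            # The row vanished between the two statements (an intervening DELETE).
            raise LookupError(f"row {key} was deleted during the upsert")
        return "updated"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
first = insert_then_update(conn, 7, "x")   # row is missing, guarded INSERT adds it
second = insert_then_update(conn, 7, "y")  # guard blocks the INSERT, UPDATE runs
rows = conn.execute("SELECT id, val FROM items").fetchall()
```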
MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown it is not a big deal to handle one row cases, but when dealing with large datasets, one needs a cursor, with all the performance problems that come along. The MERGE statement will be much welcomed addition when dealing with large datasets.
Before everyone jumps to HOLDLOCK-s out of fear of these nefarious users running your sprocs directly :-) let me point out that you have to guarantee uniqueness of new PK-s by design (identity keys, sequence generators in Oracle, unique indexes for external ID-s, queries covered by indexes). That's the alpha and omega of the issue. If you don't have that, no HOLDLOCK-s of the universe are going to save you and if you do have that then you don't need anything beyond UPDLOCK on the first select (or to use update first).
Sprocs normally run under very controlled conditions and with the assumption of a trusted caller (mid tier). Meaning that if a simple upsert pattern (update+insert or merge) ever sees duplicate PK that means a bug in your mid-tier or table design and it's good that SQL will yell a fault in such case and reject the record. Placing a HOLDLOCK in this case equals eating exceptions and taking in potentially faulty data, besides reducing your perf.
Having said that, using MERGE, or UPDATE then INSERT, is easier on your server and less error-prone since you don't have to remember to add (UPDLOCK) to the first select. Also, if you are doing inserts/updates in small batches you need to know your data in order to decide whether a transaction is appropriate or not. If it's just a collection of unrelated records then an additional "enveloping" transaction will be detrimental.
Do the race conditions really matter if you first try an update followed by an insert?
Let's say you have two threads that want to set a value for key key:
Thread 1: value = 1
Thread 2: value = 2
Example race condition scenario
key is not defined
Thread 1 fails with update
Thread 2 fails with update
Exactly one of thread 1 or thread 2 succeeds with insert. E.g. thread 1
The other thread fails with insert (with error duplicate key) - thread 2.
Result: The "first" of the two threads to insert decides the value.
Wanted result: The last of the 2 threads to write data (update or insert) should decide the value.
But in a multithreaded environment, the OS scheduler decides on the order of the thread execution. In the above scenario, where we have this race condition, it was the OS that decided on the sequence of execution. I.e., it is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint.
When the time of execution is so close for thread 1 and thread 2, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads should define the resulting value.
For the implementation: If update followed by insert results in error "duplicate key", this should be treated as success.
Also, one should of course never assume that value in the database is the same as the value you wrote last.
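The scenario above can be replayed by hand. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in and steps through the interleaving on a single connection instead of using real threads. The table `items` and the helpers `try_update` and `try_insert` are hypothetical names. SQLite raises `sqlite3.IntegrityError` on a duplicate key, and that is the case treated as success here.

```python
import sqlite3

# Autocommit mode, so each statement stands on its own like the steps above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val INTEGER)")


def try_update(val):
    """Return the number of rows the UPDATE touched."""
    return conn.execute("UPDATE items SET val = ? WHERE id = 1", (val,)).rowcount


def try_insert(val):
    """Insert; a duplicate-key error means another writer won, which counts as success."""
    try:
        conn.execute("INSERT INTO items (id, val) VALUES (1, ?)", (val,))
        return "inserted"
    except sqlite3.IntegrityError:
        return "duplicate key, treated as success"


# Replay the interleaving from the scenario (no real threads involved).
t1_updated = try_update(1)  # thread 1: update touches 0 rows
t2_updated = try_update(2)  # thread 2: update touches 0 rows
t1_result = try_insert(1)   # thread 1: insert succeeds
t2_result = try_insert(2)   # thread 2: insert hits the primary key
final = conn.execute("SELECT val FROM items WHERE id = 1").fetchone()[0]
```

The stored value comes from whichever writer inserted first, which is the outcome the answer argues is acceptable.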
I tried the solution below and it works for me when concurrent requests for the insert statement occur.
begin tran
if exists (select * from table with (updlock, serializable) where key = @key)
begin
    update table set ...
    where key = @key
end
else
begin
    insert table (key, ...)
    values (@key, ...)
end
commit tran
You can use this query. It works in all SQL Server editions. It's simple and clear, but you need to use 2 queries. You can use it if you can't use MERGE.
BEGIN TRAN
UPDATE table
SET Id = @Id, Description = @Description
WHERE Id = @Id
INSERT INTO table (Id, Description)
SELECT @Id, @Description
WHERE NOT EXISTS (SELECT NULL FROM table WHERE Id = @Id)
COMMIT TRAN
NOTE: Please explain answer negatives
Assuming that you want to insert/update single row, most optimal approach is to use SQL Server's REPEATABLE READ transaction isolation level:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION
IF EXISTS (SELECT * FROM myTable WHERE key = @key)
    UPDATE myTable SET ...
    WHERE key = @key
ELSE
    INSERT INTO myTable (key, ...)
    VALUES (@key, ...)
COMMIT TRANSACTION
This isolation level will prevent/block subsequent repeatable read transactions from accessing the same row (WHERE key=@key) while the currently running transaction is open.
On the other hand, operations on another row won't be blocked (WHERE key=@key2).
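You can use the following (note that ON DUPLICATE KEY UPDATE is MySQL syntax and is not supported by SQL Server):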
INSERT INTO tableName (...) VALUES (...)
ON DUPLICATE KEY
UPDATE ...
Using this, if there is already an entry for the particular key, then it will UPDATE, else, it will INSERT.
In SQL Server 2008 you can use the MERGE statement
If you use ADO.NET, the DataAdapter handles this.
If you want to handle it yourself, this is the way:
Make sure there is a primary key constraint on your key column.
Then you:
Do the update
If the update affects no rows because no record with the key exists yet, do the insert. If the update does affect a row, you are finished.
You can also do it the other way round, i.e. do the insert first, and do the update if the insert fails. Normally the first way is better, because updates are done more often than inserts.
Doing an if exists ... else ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:
DECLARE @RowExists bit
SET @RowExists = 0
UPDATE MyTable SET DataField1 = 'xxx', @RowExists = 1 WHERE Key = 123
IF @RowExists = 0
    INSERT INTO MyTable (Key, DataField1) VALUES (123, 'xxx')
I usually do what several of the other posters have said with regard to checking for it existing first and then doing whatever the correct path is. One thing you should remember when doing this is that the execution plan cached by sql could be nonoptimal for one path or the other. I believe the best way to do this is to call two different stored procedures.
FirstSP:
If Exists
Call SecondSP (UpdateProc)
Else
Call ThirdSP (InsertProc)
Now, I don't follow my own advice very often, so take it with a grain of salt.
Do a select, if you get a result, update it, if not, create it.

Insert Update stored proc on SQL Server

I've written a stored proc that will do an update if a record exists, otherwise it will do an insert. It looks something like this:
update myTable set Col1=@col1, Col2=@col2 where ID=@ID
if @@rowcount = 0
    insert into myTable (Col1, Col2) values (@col1, @col2)
My logic behind writing it in this way is that the update will perform an implicit select using the where clause and if that returns 0 then the insert will take place.
The alternative to doing it this way would be to do a select and then based on the number of rows returned either do an update or insert. This I considered inefficient because if you are to do an update it will cause 2 selects (the first explicit select call and the second implicit in the where of the update). If the proc were to do an insert then there'd be no difference in efficiency.
Is my logic sound here?
Is this how you would combine an insert and update into a stored proc?
Your assumption is right, this is the optimal way to do it and it's called upsert/merge.
Importance of UPSERT - from sqlservercentral.com:
For every update in the case mentioned above we are removing one
additional read from the table if we
use the UPSERT instead of EXISTS.
Unfortunately for an Insert, both the
UPSERT and IF EXISTS methods use the
same number of reads on the table.
Therefore the check for existence
should only be done when there is a
very valid reason to justify the
additional I/O. The optimized way to
do things is to make sure that you
have little reads as possible on the
DB.
The best strategy is to attempt the
update. If no rows are affected by the
update then insert. In most
circumstances, the row will already
exist and only one I/O will be
required.
Edit:
Please check out this answer and the linked blog post to learn about the problems with this pattern and how to make it work safe.
Please read the post on my blog for a good, safe pattern you can use. There are a lot of considerations, and the accepted answer on this question is far from safe.
For a quick answer try the following pattern. It will work fine on SQL 2000 and above. SQL 2005 gives you error handling which opens up other options and SQL 2008 gives you a MERGE command.
begin tran
update t with (serializable)
set hitCount = hitCount + 1
where pk = @id
if @@rowcount = 0
begin
    insert t (pk, hitCount)
    values (@id, 1)
end
commit tran
If used with SQL Server 2000/2005, the original code needs to be enclosed in a transaction to make sure that data remains consistent in a concurrent scenario.
BEGIN TRANSACTION Upsert
update myTable set Col1=@col1, Col2=@col2 where ID=@ID
if @@rowcount = 0
    insert into myTable (Col1, Col2) values (@col1, @col2)
COMMIT TRANSACTION Upsert
This will incur additional performance cost, but will ensure data integrity.
And, as already suggested, MERGE should be used where available.
MERGE is one of the new features in SQL Server 2008, by the way.
You not only need to run it in a transaction, it also needs a high isolation level. In fact, the default isolation level is Read Committed and this code needs Serializable.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION Upsert
UPDATE myTable SET Col1 = @col1, Col2 = @col2 WHERE ID = @ID
IF @@rowcount = 0
BEGIN
    INSERT INTO myTable (ID, Col1, Col2) VALUES (@ID, @col1, @col2)
END
COMMIT TRANSACTION Upsert
Maybe adding the @@error check and a rollback would also be a good idea.
If you are not doing a merge in SQL 2008 you must change it to:
if @@rowcount = 0 and @@error = 0
otherwise, if the update fails for some reason, it will try to do an insert afterwards, because the rowcount on a failed statement is 0.
Big fan of the UPSERT, really cuts down on the code to manage. Here is another way I do it: One of the input parameters is ID, if the ID is NULL or 0, you know it's an INSERT, otherwise it's an update. Assumes the application knows if there is an ID, so won't work in all situations, but will cut the executes in half if you do.
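A rough sketch of that ID-is-NULL-or-0 convention, using Python's built-in sqlite3 module as a stand-in so it can be run anywhere. The table `items` and the function `save` are hypothetical names. Raising an error when the named row is missing is an assumption of this sketch, not part of the description above. It avoids silently recreating a row that someone else deleted.

```python
import sqlite3


def save(conn, row_id, val):
    """Insert when no ID is supplied, otherwise update exactly that row."""
    with conn:  # one transaction per call
        if not row_id:  # None or 0 means "new record"
            cur = conn.execute("INSERT INTO items (val) VALUES (?)", (val,))
            return cur.lastrowid  # key generated by the database
        cur = conn.execute("UPDATE items SET val = ? WHERE id = ?", (val, row_id))
        if cur.rowcount == 0:
            # The caller named a row that no longer exists; do not recreate it.
            raise LookupError(f"no row with id {row_id}")
        return row_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
new_id = save(conn, None, "first")     # no ID: one INSERT, key comes back
same_id = save(conn, new_id, "second")  # ID known: one UPDATE
rows = conn.execute("SELECT id, val FROM items").fetchall()
```

Each call runs a single statement, which is where the "cut the executes in half" saving comes from.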
Modified Dima Malenko post:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION UPSERT
UPDATE MYTABLE
SET COL1 = @col1,
    COL2 = @col2
WHERE ID = @ID
IF @@rowcount = 0
BEGIN
    INSERT INTO MYTABLE
        (ID,
        COL1,
        COL2)
    VALUES (@ID,
        @col1,
        @col2)
END
IF @@Error > 0
BEGIN
    INSERT INTO MYERRORTABLE
        (ID,
        COL1,
        COL2)
    VALUES (@ID,
        @col1,
        @col2)
END
COMMIT TRANSACTION UPSERT
You can trap the error and send the record to a failed insert table.
I needed to do this because we are taking whatever data is sent via WSDL and, if possible, fixing it internally.
Your logic seems sound, but you might want to consider adding some code to prevent the insert if you had passed in a specific primary key.
Otherwise, if you're always doing an insert if the update didn't affect any records, what happens when someone deletes the record before your "UPSERT" runs? Now the record you were trying to update doesn't exist, so it'll create a record instead. That probably isn't the behavior you were looking for.