I'm working on a project in VB.net which takes large text files containing T-SQL and executes them against a local SQL database, but I've hit a problem in regards to error handling.
I'm using the following technologies :
VB.net
Framework 3.5
SQL Express 2005
The SQL I'm trying to execute is mostly straight-forwards but my app is completely unaware of the schema or the data contained within. For example :
UPDATE mytable SET mycol2='data' WHERE mycol1=1
INSERT INTO mytable (mycol1, mycol2) VALUES (1,'data')
UPDATE mytable SET mycol2='data' WHERE mycol1=2
INSERT INTO mytable (mycol1, mycol2) VALUES (1,'data')
UPDATE mytable SET mycol2='data' WHERE mycol1=3
The above is a sample of the sort of thing I'm executing, but these files will contain around 10,000 to 20,000 statements each.
My problem is that when using sqlCommand.ExecuteNonQuery(), I get an exception raised because the second INSERT statement will hit the Primary Key constraint on the table.
I need to know that this error happened and log it, but also process any subsequent statements. I've tried wrapping these statements in TRY/CATCH blocks but I can't work out a way to handle the error then continue to process the other statements.
The Query Analyser seems to behave in this way, but not when using sqlCommand.ExecuteNonQuery().
So is there a T-SQL equivalent of 'Resume Next' or some other way I can do this without introducing massive amounts of string handling on my part?
Any help greatly appreciated.
SQL Server does have a Try/Catch syntax. See:
http://msdn.microsoft.com/en-us/library/ms175976.aspx
To use this with your file, you would either have to rewrite the files themselves to wrap each line with the try/catch syntax, or else your code would have to programatically modify the file contents to wrap each line.
There is no T-SQL equivalent of "On Error Resume Next", and thank Cthulhu for that.
Actually your batch executed until the end since key violations are not intrerupting batch execution. If you run the same SQL file from Management Studio you'll see that the result is that all the valid statements were executed and the messages panel contains an error for each key violation. The SqlClient of ADO.NEt behaves much the same way, but at the end of the batch (when SqlCommand.ExecuteNonQuery returns) it parses the messages returned and throws an exception. The exception is one single SqlException but it's Errors collection contains a SqlError for each key violation that occured.
Unfortunately there is no silver bullet. Ideally the SQL files should not cause errors. You can choose to iterate through the SqlErrors of the exception and decide, on individual basis, if the error was serious or you can ignore it, knowing that the SQL files have data quality problems. Some errors may be serious and cannot be ignored. See Database Engine Error Severities.
Another alternative is to explictily tell the SqlClient not to throw. If you set the FireInfoMessageEventOnUserErrors property of the connection to true it will raise an SqlConnection.InfoMessage event instead of twroing an exception.
At the risk of slowing down the process (by making thousands of trips to SQL server rwther than one), you could handle this issue by splitting the file into multiple individual queries each for either INSERT or UPDATE. Then you can catch each individual error as it take place and log it or deal with it as your business logic would require.
begin try
--your critical commands
end try
begin catch
-- is necessary write somethink like this
select ''
end cath
One technique I use is to use a try/catch and within the catch raise an event with the exception information. The caller can then hook up an event handler to do whatever she pleases with the information (log it, collect it for reporting in the UI, whatever).
You can also include the technique (used in many .NET framework areas, e.g. Windows.Forms events and XSD validation) of passing a CancelableEventArgs object as the second event argument, with a Boolean field that the event handler can set to indicate that processing should abort after all.
Another thing I urge you to do is to prepare your INSERTs and UPDATEs, then call them many times with varying argments.
I'm not aware of a way to support resume next, but one approach would be to use a local table variable to prevent the errors in the first place e.g.
Declare #Table table(id int, value varchar(100))
UPDATE mytable SET mycol2='data' WHERE mycol1=1
--Insert values for later usage
INSERT INTO #Table (id, value) VALUES (1,'data')
--Insert only if data does not already exist.
INSERT INTO mytable (mycol1, mycol2)
Select id, Value from #Table t left join
mytable t2 on t.id = t2.MyCol1
where t2.MyCol is null and t.id = 1
EDIT
Ok, I don't know that I suggest this per se, but you could achieve a sort of resume next by wrapping the try catch in a while loop if you set an exit condition at the end of all the steps and keep track of what step you performed last.
Declare #Table table(id int, value varchar(100))
Declare #Step int
set #Step = 0
While (1=1)
Begin
Begin Try
if #Step < 1
Begin
Insert into #Table (id, value) values ('s', 1)
Set #Step = #Step + 1
End
if #Step < 2
Begin
Insert into #Table values ( 1, 's')
Set #Step = #Step + 1
End
Break
End Try
Begin Catch
Set #Step = #Step + 1
End Catch
End
Select * from #Table
Unfortunately, I don't think there's a way to force the SqlCommand to keep processing once an error has been returned.
If you're unsure whether any of the commands will cause an error or not (and at some performance cost), you should split the commands in the text file into individual SqlCommands...which would enable you to use a try/catch block and find the offending statements.
...this, of course, depends on the T-SQL commands in the text file to each be on a separate line (or otherwise delineated).
You need to check whether the PK value already exists. Additionally, one large transaction is always faster than many little transactions; and the rows and pages don't need to be locked for as long overall that way.
-- load a temp/import table
create table #importables (mycol1 int, mycol2 varchar(50))
insert #importables (mycol1, mycol2) values (1, 'data for 1')
insert #importables (mycol1, mycol2) values (2, 'data for 2')
insert #importables (mycol1, mycol2) values (3, 'data for 3')
-- update the base table rows that are already there
update mt set MyCol2 = i.MyCol2
from #importables i (nolock)
inner join MyTable mt on mt.MyCol1 = i.MyCol1
where mt.MyCol2 <> i.MyCol2 -- no need to fire triggers and logs if nothing changed
-- insert new rows AFTER the update, so we don't update the rows we just inserted.
insert MyTable (mycol1, mycol2)
select mycol1, mycol2
from #importables i (nolock)
left join MyTable mt (nolock) on mt.MyCol1 = i.MyCol1
where mt.MyCol1 is null;
You could improve this further by opening a SqlConnection, creating a #temp table, using SqlBulkCopy to do a bulk insert to that temp table, and doing the delta from there (as opposed to my #importables in this example). As long as you use the same SqlConnection, a #temp table will remain accessible to subsequent queries on that connection, until you drop it or disconnect.
Related
When building a stored procedure that inserts into a simple link table, should it check if the FK's exist and gracefully return an error or just let SQL throw an exception?
What is best practice?
What is most efficient?
Just in case someone doesn't understand my question:
Table A
Table B
Table AB
Should I do:
IF EXISTS(SELECT 1 FROM A WHERE Id = #A) AND EXISTS(SELECT 1 FROM B WHERE Id = #B)
BEGIN
INSERT AB (AId,BId) VALUES (#A, #B)
END
ELSE
--handle gracefully, return error code or something
or
INSERT AB (AId,BId) VALUES (#A, #B)
and let SQL throw an exception
Thanks
If the tables are under your control, there is no reason to perform an extra check. Just assume they are set up correctly, and let SQL handle any error. Constantly checking that you have indeed done what you intended to do is overly defensive programming that adds unnecessary complexity to your code.
For example, you wouldn't write code like this:
i = 1;
if (i != 1)
{
print "Error: i is not 1!";
}
And I view this situation as similar.
If the tables are not under your control it may be useful to handle the error gracefully. For example, if this procedure can run on an arbitrary set of tables created by the user, or if it will be distributed to external users who are required to set up the tables in their own database, you may want to add some custom error handling. The purpose of this would be to give the user a clearer description of what went wrong.
As a basic concept, validating values before a potentially error raising code is a good thing. However, in this case, there could be (at least theoretically) a change in table a or table b between the exists checks and the insert statement, that would raise the fk violation error.
I would do something like this:
BEGIN TRY
INSERT AB (AId,BId) VALUES (#A, #B)
SELECT NULL As ErrorMessage
END TRY
BEGIN CATCH
SELECT ERROR_MESSAGE() AS ErrorMessage
END CATCH
The ERROR_MESSAGE() function returns the error that was rasied in the try block.
Then in the executing code you can simply check if the returned error message is null. If it is, you know the insert was successful. If not, you can handle this exception how ever you see fit.
Usually i will check Fk. If the Fk isn't exist, then throw a error friendly, and the insert statement will not be execute, database will not lock the table too.
Assume a table structure of MyTable(KEY, datafield1, datafield2...).
Often I want to either update an existing record, or insert a new record if it doesn't exist.
Essentially:
IF (key exists)
run update command
ELSE
run insert command
What's the best performing way to write this?
don't forget about transactions. Performance is good, but simple (IF EXISTS..) approach is very dangerous.
When multiple threads will try to perform Insert-or-update you can easily
get primary key violation.
Solutions provided by #Beau Crawford & #Esteban show general idea but error-prone.
To avoid deadlocks and PK violations you can use something like this:
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert into table (key, ...)
values (#key, ...)
end
commit tran
or
begin tran
update table with (serializable) set ...
where key = #key
if ##rowcount = 0
begin
insert into table (key, ...) values (#key,..)
end
commit tran
See my detailed answer to a very similar previous question
#Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to SO it. The only problem is that for inserts it's still two IO operations.
MS Sql2008 introduces merge from the SQL:2003 standard:
merge tablename with(HOLDLOCK) as target
using (values ('new value', 'different value'))
as source (field1, field2)
on target.idfield = 7
when matched then
update
set field1 = source.field1,
field2 = source.field2,
...
when not matched then
insert ( idfield, field1, field2, ... )
values ( 7, source.field1, source.field2, ... )
Now it's really just one IO operation, but awful code :-(
Do an UPSERT:
UPDATE MyTable SET FieldA=#FieldA WHERE Key=#Key
IF ##ROWCOUNT = 0
INSERT INTO MyTable (FieldA) VALUES (#FieldA)
http://en.wikipedia.org/wiki/Upsert
Many people will suggest you use MERGE, but I caution you against it. By default, it doesn't protect you from concurrency and race conditions any more than multiple statements, and it introduces other dangers:
Use Caution with SQL Server's MERGE Statement
So, you want to use MERGE, eh?
Even with this "simpler" syntax available, I still prefer this approach (error handling omitted for brevity):
BEGIN TRANSACTION;
UPDATE dbo.table WITH (UPDLOCK, SERIALIZABLE)
SET ... WHERE PK = #PK;
IF ##ROWCOUNT = 0
BEGIN
INSERT dbo.table(PK, ...) SELECT #PK, ...;
END
COMMIT TRANSACTION;
Please stop using this UPSERT anti-pattern
A lot of folks will suggest this way:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
IF EXISTS (SELECT 1 FROM dbo.table WHERE PK = #PK)
BEGIN
UPDATE ...
END
ELSE
BEGIN
INSERT ...
END
COMMIT TRANSACTION;
But all this accomplishes is ensuring you may need to read the table twice to locate the row(s) to be updated. In the first sample, you will only ever need to locate the row(s) once. (In both cases, if no rows are found from the initial read, an insert occurs.)
Others will suggest this way:
BEGIN TRY
INSERT ...
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE ...
END CATCH
However, this is problematic if for no other reason than letting SQL Server catch exceptions that you could have prevented in the first place is much more expensive, except in the rare scenario where almost every insert fails. I prove as much here:
Checking for potential constraint violations before entering TRY/CATCH
Performance impact of different error handling techniques
IF EXISTS (SELECT * FROM [Table] WHERE ID = rowID)
UPDATE [Table] SET propertyOne = propOne, property2 . . .
ELSE
INSERT INTO [Table] (propOne, propTwo . . .)
Edit:
Alas, even to my own detriment, I must admit the solutions that do this without a select seem to be better since they accomplish the task with one less step.
If you want to UPSERT more than one record at a time you can use the ANSI SQL:2003 DML statement MERGE.
MERGE INTO table_name WITH (HOLDLOCK) USING table_name ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
Check out Mimicking MERGE Statement in SQL Server 2005.
Although its pretty late to comment on this I want to add a more complete example using MERGE.
Such Insert+Update statements are usually called "Upsert" statements and can be implemented using MERGE in SQL Server.
A very good example is given here:
http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx
The above explains locking and concurrency scenarios as well.
I will be quoting the same for reference:
ALTER PROCEDURE dbo.Merge_Foo2
#ID int
AS
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT #ID AS ID) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
UPDATE
SET f.UpdateSpid = ##SPID,
UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
INSERT
(
ID,
InsertSpid,
InsertTime
)
VALUES
(
new_foo.ID,
##SPID,
SYSDATETIME()
);
RETURN ##ERROR;
/*
CREATE TABLE ApplicationsDesSocietes (
id INT IDENTITY(0,1) NOT NULL,
applicationId INT NOT NULL,
societeId INT NOT NULL,
suppression BIT NULL,
CONSTRAINT PK_APPLICATIONSDESSOCIETES PRIMARY KEY (id)
)
GO
--*/
DECLARE #applicationId INT = 81, #societeId INT = 43, #suppression BIT = 0
MERGE dbo.ApplicationsDesSocietes WITH (HOLDLOCK) AS target
--set the SOURCE table one row
USING (VALUES (#applicationId, #societeId, #suppression))
AS source (applicationId, societeId, suppression)
--here goes the ON join condition
ON target.applicationId = source.applicationId and target.societeId = source.societeId
WHEN MATCHED THEN
UPDATE
--place your list of SET here
SET target.suppression = source.suppression
WHEN NOT MATCHED THEN
--insert a new line with the SOURCE table one row
INSERT (applicationId, societeId, suppression)
VALUES (source.applicationId, source.societeId, source.suppression);
GO
Replace table and field names by whatever you need.
Take care of the using ON condition.
Then set the appropriate value (and type) for the variables on the DECLARE line.
Cheers.
That depends on the usage pattern. One has to look at the usage big picture without getting lost in the details. For example, if the usage pattern is 99% updates after the record has been created, then the 'UPSERT' is the best solution.
After the first insert (hit), it will be all single statement updates, no ifs or buts. The 'where' condition on the insert is necessary otherwise it will insert duplicates, and you don't want to deal with locking.
UPDATE <tableName> SET <field>=#field WHERE key=#key;
IF ##ROWCOUNT = 0
BEGIN
INSERT INTO <tableName> (field)
SELECT #field
WHERE NOT EXISTS (select * from tableName where key = #key);
END
You can use MERGE Statement, This statement is used to insert data if not exist or update if does exist.
MERGE INTO Employee AS e
using EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID`
If going the UPDATE if-no-rows-updated then INSERT route, consider doing the INSERT first to prevent a race condition (assuming no intervening DELETE)
INSERT INTO MyTable (Key, FieldA)
SELECT #Key, #FieldA
WHERE NOT EXISTS
(
SELECT *
FROM MyTable
WHERE Key = #Key
)
IF ##ROWCOUNT = 0
BEGIN
UPDATE MyTable
SET FieldA=#FieldA
WHERE Key=#Key
IF ##ROWCOUNT = 0
... record was deleted, consider looping to re-run the INSERT, or RAISERROR ...
END
Apart from avoiding a race condition, if in most cases the record will already exist then this will cause the INSERT to fail, wasting CPU.
Using MERGE probably preferable for SQL2008 onwards.
MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown it is not a big deal to handle one row cases, but when dealing with large datasets, one needs a cursor, with all the performance problems that come along. The MERGE statement will be much welcomed addition when dealing with large datasets.
Before everyone jumps to HOLDLOCK-s out of fear from these nafarious users running your sprocs directly :-) let me point out that you have to guarantee uniqueness of new PK-s by design (identity keys, sequence generators in Oracle, unique indexes for external ID-s, queries covered by indexes). That's the alpha and omega of the issue. If you don't have that, no HOLDLOCK-s of the universe are going to save you and if you do have that then you don't need anything beyond UPDLOCK on the first select (or to use update first).
Sprocs normally run under very controlled conditions and with the assumption of a trusted caller (mid tier). Meaning that if a simple upsert pattern (update+insert or merge) ever sees duplicate PK that means a bug in your mid-tier or table design and it's good that SQL will yell a fault in such case and reject the record. Placing a HOLDLOCK in this case equals eating exceptions and taking in potentially faulty data, besides reducing your perf.
Having said that, Using MERGE, or UPDATE then INSERT is easier on your server and less error prone since you don't have to remember to add (UPDLOCK) to first select. Also, if you are doing inserts/updates in small batches you need to know your data in order to decide whether a transaction is appropriate or not. It it's just a collection of unrelated records then additional "enveloping" transaction will be detrimental.
Does the race conditions really matter if you first try an update followed by an insert?
Lets say you have two threads that want to set a value for key key:
Thread 1: value = 1
Thread 2: value = 2
Example race condition scenario
key is not defined
Thread 1 fails with update
Thread 2 fails with update
Exactly one of thread 1 or thread 2 succeeds with insert. E.g. thread 1
The other thread fails with insert (with error duplicate key) - thread 2.
Result: The "first" of the two treads to insert, decides value.
Wanted result: The last of the 2 threads to write data (update or insert) should decide value
But; in a multithreaded environment, the OS scheduler decides on the order of the thread execution - in the above scenario, where we have this race condition, it was the OS that decided on the sequence of execution. Ie: It is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint.
When the time of execution is so close for thread 1 and thread 2, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads should define the resulting value.
For the implementation: If update followed by insert results in error "duplicate key", this should be treated as success.
Also, one should of course never assume that value in the database is the same as the value you wrote last.
I had tried below solution and it works for me, when concurrent request for insert statement occurs.
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert table (key, ...)
values (#key, ...)
end
commit tran
You can use this query. Work in all SQL Server editions. It's simple, and clear. But you need use 2 queries. You can use if you can't use MERGE
BEGIN TRAN
UPDATE table
SET Id = #ID, Description = #Description
WHERE Id = #Id
INSERT INTO table(Id, Description)
SELECT #Id, #Description
WHERE NOT EXISTS (SELECT NULL FROM table WHERE Id = #Id)
COMMIT TRAN
NOTE: Please explain answer negatives
Assuming that you want to insert/update single row, most optimal approach is to use SQL Server's REPEATABLE READ transaction isolation level:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION
IF (EXISTS (SELECT * FROM myTable WHERE key=#key)
UPDATE myTable SET ...
WHERE key=#key
ELSE
INSERT INTO myTable (key, ...)
VALUES (#key, ...)
COMMIT TRANSACTION
This isolation level will prevent/block subsequent repeatable read transactions from accessing same row (WHERE key=#key) while currently running transaction is open.
On the other hand, operations on another row won't be blocked (WHERE key=#key2).
You can use:
INSERT INTO tableName (...) VALUES (...)
ON DUPLICATE KEY
UPDATE ...
Using this, if there is already an entry for the particular key, then it will UPDATE, else, it will INSERT.
In SQL Server 2008 you can use the MERGE statement
If you use ADO.NET, the DataAdapter handles this.
If you want to handle it yourself, this is the way:
Make sure there is a primary key constraint on your key column.
Then you:
Do the update
If the update fails because a record with the key already exists, do the insert. If the update does not fail, you are finished.
You can also do it the other way round, i.e. do the insert first, and do the update if the insert fails. Normally the first way is better, because updates are done more often than inserts.
Doing an if exists ... else ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:
DECLARE #RowExists bit
SET #RowExists = 0
UPDATE MyTable SET DataField1 = 'xxx', #RowExists = 1 WHERE Key = 123
IF #RowExists = 0
INSERT INTO MyTable (Key, DataField1) VALUES (123, 'xxx')
I usually do what several of the other posters have said with regard to checking for it existing first and then doing whatever the correct path is. One thing you should remember when doing this is that the execution plan cached by sql could be nonoptimal for one path or the other. I believe the best way to do this is to call two different stored procedures.
FirstSP:
If Exists
Call SecondSP (UpdateProc)
Else
Call ThirdSP (InsertProc)
Now, I don't follow my own advice very often, so take it with a grain of salt.
Do a select, if you get a result, update it, if not, create it.
We consume a web service that decided to alter the max length of a field from 255. We have a legacy vendor table on our end that is still capped at 255. We are hoping to use a trigger to address this issue temporarily until we can implement a more business-friendly solution in our next iteration.
Here's what I started with:
CREATE TRIGGER [mySchema].[TruncDescription]
ON [mySchema].[myTable]
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO [mySchema].[myTable]
SELECT SubType, type, substring(description, 1, 255)
FROM inserted
END
However, when I try to insert on myTable, I get the error:
String or binary data would be
truncated. The statement has been
terminated.
I tried experimenting with SET ANSI_WARNINGS OFF which allowed the query to work but then simply didn't insert any data into the description column.
Is there any way to use a trigger to truncate the too-long data or is there another alternative that I can use until a more eloquent solution can be designed? We are fairly limited in table modifications (i.e. we can't) because it's a vendor table, and we don't control the web service we're consuming so we can't ask them to fix it either. Any help would be appreciated.
The error cannot be avoided because the error is happening when the inserted table is populated.
From the documentation:
http://msdn.microsoft.com/en-us/library/ms191300.aspx
"The format of the inserted and deleted tables is the same as the format of the table on which the INSTEAD OF trigger is defined. Each column in the inserted and deleted tables maps directly to a column in the base table."
The only really "clever" idea I can think of is to take advantage of schemas and the default schema used by a login. If you can get the login that the web service is using to reference another table, you can increase the column size on that table and use the INSTEAD OF INSERT trigger to perform the INSERT into the vendor table. A variation of this is to create the table in a different database and set the default database for the web service login.
CREATE TRIGGER [myDB].[mySchema].[TruncDescription]
ON [myDB].[mySchema].[myTable]
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO [VendorDB].[VendorSchema].[VendorTable]
SELECT SubType, type, substring(description, 1, 255)
FROM inserted
END
With this setup everything works OK for me.
Not to state the obvious but are you sure there is data in the description field when you are testing? It is possible they change one of the other fields you are inserting as well and maybe one of those is throwing the error?
CREATE TABLE [dbo].[DataPlay](
[Data] [nvarchar](255) NULL
) ON [PRIMARY]
GO
and a trigger like this
Create TRIGGER updT ON DataPlay
Instead of Insert
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO [tempdb].[dbo].[DataPlay]
([Data])
(Select substring(Data, 1, 255) from inserted)
END
GO
then inserting with
Declare #d as nvarchar(max)
Select #d = REPLICATE('a', 500)
SET ANSI_WARNINGS OFF
INSERT INTO [tempdb].[dbo].[DataPlay]
([Data])
VALUES
(#d)
GO
I am unable to reproduce this issue on SQL 2008 R2 using:
Declare #table table ( fielda varchar(10) )
Insert Into #table ( fielda )
Values ( Substring('12345678901234567890', 1, 10) )
Please make sure that your field is really defined as varchar(255).
I also strongly suggest you use an Insert statement with an explicit field list. While your Insert is syntactically correct, you really should be using an explicit field list (like in my sample). The problem is when you don't specify a field list you are at the mercy of SQL and the table definition for the field order. When you do use a field list you can change the order of the fields in the table (or add new fields in the middle) and not care about your insert statements.
Assume a table structure of MyTable(KEY, datafield1, datafield2...).
Often I want to either update an existing record, or insert a new record if it doesn't exist.
Essentially:
IF (key exists)
run update command
ELSE
run insert command
What's the best performing way to write this?
don't forget about transactions. Performance is good, but simple (IF EXISTS..) approach is very dangerous.
When multiple threads will try to perform Insert-or-update you can easily
get primary key violation.
Solutions provided by #Beau Crawford & #Esteban show general idea but error-prone.
To avoid deadlocks and PK violations you can use something like this:
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert into table (key, ...)
values (#key, ...)
end
commit tran
or
begin tran
update table with (serializable) set ...
where key = #key
if ##rowcount = 0
begin
insert into table (key, ...) values (#key,..)
end
commit tran
See my detailed answer to a very similar previous question
#Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to SO it. The only problem is that for inserts it's still two IO operations.
MS Sql2008 introduces merge from the SQL:2003 standard:
merge tablename with(HOLDLOCK) as target
using (values ('new value', 'different value'))
as source (field1, field2)
on target.idfield = 7
when matched then
update
set field1 = source.field1,
field2 = source.field2,
...
when not matched then
insert ( idfield, field1, field2, ... )
values ( 7, source.field1, source.field2, ... )
Now it's really just one IO operation, but awful code :-(
Do an UPSERT:
UPDATE MyTable SET FieldA=#FieldA WHERE Key=#Key
IF ##ROWCOUNT = 0
INSERT INTO MyTable (FieldA) VALUES (#FieldA)
http://en.wikipedia.org/wiki/Upsert
Many people will suggest you use MERGE, but I caution you against it. By default, it doesn't protect you from concurrency and race conditions any more than multiple statements, and it introduces other dangers:
Use Caution with SQL Server's MERGE Statement
So, you want to use MERGE, eh?
Even with this "simpler" syntax available, I still prefer this approach (error handling omitted for brevity):
BEGIN TRANSACTION;
UPDATE dbo.table WITH (UPDLOCK, SERIALIZABLE)
SET ... WHERE PK = #PK;
IF ##ROWCOUNT = 0
BEGIN
INSERT dbo.table(PK, ...) SELECT #PK, ...;
END
COMMIT TRANSACTION;
Please stop using this UPSERT anti-pattern
A lot of folks will suggest this way:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
IF EXISTS (SELECT 1 FROM dbo.table WHERE PK = #PK)
BEGIN
UPDATE ...
END
ELSE
BEGIN
INSERT ...
END
COMMIT TRANSACTION;
But all this accomplishes is ensuring you may need to read the table twice to locate the row(s) to be updated. In the first sample, you will only ever need to locate the row(s) once. (In both cases, if no rows are found from the initial read, an insert occurs.)
Others will suggest this way:
BEGIN TRY
INSERT ...
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE ...
END CATCH
However, this is problematic if for no other reason than letting SQL Server catch exceptions that you could have prevented in the first place is much more expensive, except in the rare scenario where almost every insert fails. I prove as much here:
Checking for potential constraint violations before entering TRY/CATCH
Performance impact of different error handling techniques
IF EXISTS (SELECT * FROM [Table] WHERE ID = rowID)
UPDATE [Table] SET propertyOne = propOne, property2 . . .
ELSE
INSERT INTO [Table] (propOne, propTwo . . .)
Edit:
Alas, even to my own detriment, I must admit the solutions that do this without a select seem to be better since they accomplish the task with one less step.
If you want to UPSERT more than one record at a time you can use the ANSI SQL:2003 DML statement MERGE.
MERGE INTO table_name WITH (HOLDLOCK) USING table_name ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ...]
WHEN NOT MATCHED THEN INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
Check out Mimicking MERGE Statement in SQL Server 2005.
Although its pretty late to comment on this I want to add a more complete example using MERGE.
Such Insert+Update statements are usually called "Upsert" statements and can be implemented using MERGE in SQL Server.
A very good example is given here:
http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx
The above explains locking and concurrency scenarios as well.
I will be quoting the same for reference:
ALTER PROCEDURE dbo.Merge_Foo2
#ID int
AS
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Foo2 WITH (HOLDLOCK) AS f
USING (SELECT #ID AS ID) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
UPDATE
SET f.UpdateSpid = ##SPID,
UpdateTime = SYSDATETIME()
WHEN NOT MATCHED THEN
INSERT
(
ID,
InsertSpid,
InsertTime
)
VALUES
(
new_foo.ID,
##SPID,
SYSDATETIME()
);
RETURN ##ERROR;
/*
CREATE TABLE ApplicationsDesSocietes (
id INT IDENTITY(0,1) NOT NULL,
applicationId INT NOT NULL,
societeId INT NOT NULL,
suppression BIT NULL,
CONSTRAINT PK_APPLICATIONSDESSOCIETES PRIMARY KEY (id)
)
GO
--*/
DECLARE #applicationId INT = 81, #societeId INT = 43, #suppression BIT = 0
MERGE dbo.ApplicationsDesSocietes WITH (HOLDLOCK) AS target
--set the SOURCE table one row
USING (VALUES (#applicationId, #societeId, #suppression))
AS source (applicationId, societeId, suppression)
--here goes the ON join condition
ON target.applicationId = source.applicationId and target.societeId = source.societeId
WHEN MATCHED THEN
UPDATE
--place your list of SET here
SET target.suppression = source.suppression
WHEN NOT MATCHED THEN
--insert a new line with the SOURCE table one row
INSERT (applicationId, societeId, suppression)
VALUES (source.applicationId, source.societeId, source.suppression);
GO
Replace table and field names by whatever you need.
Take care of the using ON condition.
Then set the appropriate value (and type) for the variables on the DECLARE line.
Cheers.
That depends on the usage pattern. One has to look at the usage big picture without getting lost in the details. For example, if the usage pattern is 99% updates after the record has been created, then the 'UPSERT' is the best solution.
After the first insert (hit), it will be all single statement updates, no ifs or buts. The 'where' condition on the insert is necessary otherwise it will insert duplicates, and you don't want to deal with locking.
UPDATE <tableName> SET <field>=#field WHERE key=#key;
IF ##ROWCOUNT = 0
BEGIN
INSERT INTO <tableName> (field)
SELECT #field
WHERE NOT EXISTS (select * from tableName where key = #key);
END
You can use MERGE Statement, This statement is used to insert data if not exist or update if does exist.
MERGE INTO Employee AS e
using EmployeeUpdate AS eu
ON e.EmployeeID = eu.EmployeeID`
If going the UPDATE if-no-rows-updated then INSERT route, consider doing the INSERT first to prevent a race condition (assuming no intervening DELETE)
INSERT INTO MyTable (Key, FieldA)
SELECT #Key, #FieldA
WHERE NOT EXISTS
(
SELECT *
FROM MyTable
WHERE Key = #Key
)
IF ##ROWCOUNT = 0
BEGIN
UPDATE MyTable
SET FieldA=#FieldA
WHERE Key=#Key
IF ##ROWCOUNT = 0
... record was deleted, consider looping to re-run the INSERT, or RAISERROR ...
END
Apart from avoiding a race condition, if in most cases the record will already exist then this will cause the INSERT to fail, wasting CPU.
Using MERGE probably preferable for SQL2008 onwards.
MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown it is not a big deal to handle one row cases, but when dealing with large datasets, one needs a cursor, with all the performance problems that come along. The MERGE statement will be much welcomed addition when dealing with large datasets.
Before everyone jumps to HOLDLOCK-s out of fear from these nafarious users running your sprocs directly :-) let me point out that you have to guarantee uniqueness of new PK-s by design (identity keys, sequence generators in Oracle, unique indexes for external ID-s, queries covered by indexes). That's the alpha and omega of the issue. If you don't have that, no HOLDLOCK-s of the universe are going to save you and if you do have that then you don't need anything beyond UPDLOCK on the first select (or to use update first).
Sprocs normally run under very controlled conditions and with the assumption of a trusted caller (mid tier). Meaning that if a simple upsert pattern (update+insert or merge) ever sees duplicate PK that means a bug in your mid-tier or table design and it's good that SQL will yell a fault in such case and reject the record. Placing a HOLDLOCK in this case equals eating exceptions and taking in potentially faulty data, besides reducing your perf.
Having said that, Using MERGE, or UPDATE then INSERT is easier on your server and less error prone since you don't have to remember to add (UPDLOCK) to first select. Also, if you are doing inserts/updates in small batches you need to know your data in order to decide whether a transaction is appropriate or not. It it's just a collection of unrelated records then additional "enveloping" transaction will be detrimental.
Does the race conditions really matter if you first try an update followed by an insert?
Lets say you have two threads that want to set a value for key key:
Thread 1: value = 1
Thread 2: value = 2
Example race condition scenario
key is not defined
Thread 1 fails with update
Thread 2 fails with update
Exactly one of thread 1 or thread 2 succeeds with insert. E.g. thread 1
The other thread fails with insert (with error duplicate key) - thread 2.
Result: The "first" of the two treads to insert, decides value.
Wanted result: The last of the 2 threads to write data (update or insert) should decide value
But; in a multithreaded environment, the OS scheduler decides on the order of the thread execution - in the above scenario, where we have this race condition, it was the OS that decided on the sequence of execution. Ie: It is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint.
When the time of execution is so close for thread 1 and thread 2, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads should define the resulting value.
For the implementation: If update followed by insert results in error "duplicate key", this should be treated as success.
Also, one should of course never assume that value in the database is the same as the value you wrote last.
I had tried below solution and it works for me, when concurrent request for insert statement occurs.
begin tran
if exists (select * from table with (updlock,serializable) where key = #key)
begin
update table set ...
where key = #key
end
else
begin
insert table (key, ...)
values (#key, ...)
end
commit tran
You can use this query. Work in all SQL Server editions. It's simple, and clear. But you need use 2 queries. You can use if you can't use MERGE
BEGIN TRAN
UPDATE table
SET Id = #ID, Description = #Description
WHERE Id = #Id
INSERT INTO table(Id, Description)
SELECT #Id, #Description
WHERE NOT EXISTS (SELECT NULL FROM table WHERE Id = #Id)
COMMIT TRAN
NOTE: Please explain answer negatives
Assuming that you want to insert/update single row, most optimal approach is to use SQL Server's REPEATABLE READ transaction isolation level:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION
IF (EXISTS (SELECT * FROM myTable WHERE key=#key)
UPDATE myTable SET ...
WHERE key=#key
ELSE
INSERT INTO myTable (key, ...)
VALUES (#key, ...)
COMMIT TRANSACTION
This isolation level will prevent/block subsequent repeatable read transactions from accessing same row (WHERE key=#key) while currently running transaction is open.
On the other hand, operations on another row won't be blocked (WHERE key=#key2).
You can use:
INSERT INTO tableName (...) VALUES (...)
ON DUPLICATE KEY
UPDATE ...
Using this, if there is already an entry for the particular key, then it will UPDATE, else, it will INSERT.
In SQL Server 2008 you can use the MERGE statement
If you use ADO.NET, the DataAdapter handles this.
If you want to handle it yourself, this is the way:
Make sure there is a primary key constraint on your key column.
Then you:
Do the update
If the update fails because a record with the key already exists, do the insert. If the update does not fail, you are finished.
You can also do it the other way round, i.e. do the insert first, and do the update if the insert fails. Normally the first way is better, because updates are done more often than inserts.
Doing an if exists ... else ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:
DECLARE #RowExists bit
SET #RowExists = 0
UPDATE MyTable SET DataField1 = 'xxx', #RowExists = 1 WHERE Key = 123
IF #RowExists = 0
INSERT INTO MyTable (Key, DataField1) VALUES (123, 'xxx')
I usually do what several of the other posters have said with regard to checking for it existing first and then doing whatever the correct path is. One thing you should remember when doing this is that the execution plan cached by sql could be nonoptimal for one path or the other. I believe the best way to do this is to call two different stored procedures.
FirstSP:
If Exists
Call SecondSP (UpdateProc)
Else
Call ThirdSP (InsertProc)
Now, I don't follow my own advice very often, so take it with a grain of salt.
Do a select, if you get a result, update it, if not, create it.
I've written a stored proc that will do an update if a record exists, otherwise it will do an insert. It looks something like this:
update myTable set Col1=#col1, Col2=#col2 where ID=#ID
if ##rowcount = 0
insert into myTable (Col1, Col2) values (#col1, #col2)
My logic behind writing it in this way is that the update will perform an implicit select using the where clause and if that returns 0 then the insert will take place.
The alternative to doing it this way would be to do a select and then based on the number of rows returned either do an update or insert. This I considered inefficient because if you are to do an update it will cause 2 selects (the first explicit select call and the second implicit in the where of the update). If the proc were to do an insert then there'd be no difference in efficiency.
Is my logic sound here?
Is this how you would combine an insert and update into a stored proc?
Your assumption is right, this is the optimal way to do it and it's called upsert/merge.
Importance of UPSERT - from sqlservercentral.com:
For every update in the case mentioned above we are removing one
additional read from the table if we
use the UPSERT instead of EXISTS.
Unfortunately for an Insert, both the
UPSERT and IF EXISTS methods use the
same number of reads on the table.
Therefore the check for existence
should only be done when there is a
very valid reason to justify the
additional I/O. The optimized way to
do things is to make sure that you
have little reads as possible on the
DB.
The best strategy is to attempt the
update. If no rows are affected by the
update then insert. In most
circumstances, the row will already
exist and only one I/O will be
required.
Edit:
Please check out this answer and the linked blog post to learn about the problems with this pattern and how to make it work safe.
Please read the post on my blog for a good, safe pattern you can use. There are a lot of considerations, and the accepted answer on this question is far from safe.
For a quick answer try the following pattern. It will work fine on SQL 2000 and above. SQL 2005 gives you error handling which opens up other options and SQL 2008 gives you a MERGE command.
begin tran
update t with (serializable)
set hitCount = hitCount + 1
where pk = #id
if ##rowcount = 0
begin
insert t (pk, hitCount)
values (#id,1)
end
commit tran
If to be used with SQL Server 2000/2005 the original code needs to be enclosed in transaction to make sure that data remain consistent in concurrent scenario.
BEGIN TRANSACTION Upsert
update myTable set Col1=#col1, Col2=#col2 where ID=#ID
if ##rowcount = 0
insert into myTable (Col1, Col2) values (#col1, #col2)
COMMIT TRANSACTION Upsert
This will incur additional performance cost, but will ensure data integrity.
Add, as already suggested, MERGE should be used where available.
MERGE is one of the new features in SQL Server 2008, by the way.
You not only need to run it in transaction, it also needs high isolation level. I fact default isolation level is Read Commited and this code need Serializable.
SET transaction isolation level SERIALIZABLE
BEGIN TRANSACTION Upsert
UPDATE myTable set Col1=#col1, Col2=#col2 where ID=#ID
if ##rowcount = 0
begin
INSERT into myTable (ID, Col1, Col2) values (#ID #col1, #col2)
end
COMMIT TRANSACTION Upsert
Maybe adding also the ##error check and rollback could be good idea.
If you are not doing a merge in SQL 2008 you must change it to:
if ##rowcount = 0 and ##error=0
otherwise if the update fails for some reason then it will try and to an insert afterwards because the rowcount on a failed statement is 0
Big fan of the UPSERT, really cuts down on the code to manage. Here is another way I do it: One of the input parameters is ID, if the ID is NULL or 0, you know it's an INSERT, otherwise it's an update. Assumes the application knows if there is an ID, so wont work in all situations, but will cut the executes in half if you do.
Modified Dima Malenko post:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION UPSERT
UPDATE MYTABLE
SET COL1 = #col1,
COL2 = #col2
WHERE ID = #ID
IF ##rowcount = 0
BEGIN
INSERT INTO MYTABLE
(ID,
COL1,
COL2)
VALUES (#ID,
#col1,
#col2)
END
IF ##Error > 0
BEGIN
INSERT INTO MYERRORTABLE
(ID,
COL1,
COL2)
VALUES (#ID,
#col1,
#col2)
END
COMMIT TRANSACTION UPSERT
You can trap the error and send the record to a failed insert table.
I needed to do this because we are taking whatever data is send via WSDL and if possible fixing it internally.
Your logic seems sound, but you might want to consider adding some code to prevent the insert if you had passed in a specific primary key.
Otherwise, if you're always doing an insert if the update didn't affect any records, what happens when someone deletes the record before you "UPSERT" runs? Now the record you were trying to update doesn't exist, so it'll create a record instead. That probably isn't the behavior you were looking for.