We have a status table. When the status changes, we currently delete the old record and insert a new one.
We are wondering if it would be faster to do a select to check if it exists followed by an insert or update.
Although similar to the following question, it is not the same, since we are changing individual records and the other question was doing a total table refresh.
DELETE, INSERT vs UPDATE || INSERT
Since you're talking SQL Server 2008, have you considered MERGE? It's a single statement that allows you to do an update or insert:
create table T1 (
ID int not null,
Val1 varchar(10) not null
)
go
insert into T1 (ID,Val1)
select 1,'abc'
go
merge into T1
using (select 1 as ID,'def' as Val1) upd on T1.ID = upd.ID --<-- These identify the row you want to update/insert and the new value you want to set. They could be @parameters
when matched then update set Val1 = upd.Val1
when not matched then insert (ID,Val1) values (upd.ID,upd.Val1);
What about INSERT ... ON DUPLICATE KEY? First doing a select to check if a record exists and checking in your program the result of that creates a race condition. That might not be important in your case if there is only a single instance of the program however.
INSERT INTO users (username, email) VALUES ('Jo', 'jo@email.com')
ON DUPLICATE KEY UPDATE email = 'jo@email.com'
You can perform the UPDATE first and then check @@ROWCOUNT. If 0 rows were affected, perform the INSERT; otherwise there is nothing more to do.
Your suggestion would mean always two instructions for each status change. The usual way is to do an UPDATE and then check if the operation changed any rows (Most databases have a variable like ROWCOUNT which should be greater than 0 if something changed). If it didn't, do an INSERT.
Search for UPSERT to find patterns for your specific DBMS.
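A minimal T-SQL sketch of the UPDATE-first pattern described above. The table dbo.StatusTable and its columns ItemId and Status are hypothetical names, since the question does not show its schema.

```sql
-- Hypothetical schema: dbo.StatusTable(ItemId INT PRIMARY KEY, Status VARCHAR(20))
DECLARE @ItemId INT, @Status VARCHAR(20);
SET @ItemId = 1;
SET @Status = 'shipped';

-- Try the UPDATE first; most status changes hit an existing row.
UPDATE dbo.StatusTable
SET Status = @Status
WHERE ItemId = @ItemId;

-- @@ROWCOUNT holds the number of rows affected by the previous statement.
-- Zero means the row did not exist, so insert it.
IF @@ROWCOUNT = 0
    INSERT INTO dbo.StatusTable (ItemId, Status)
    VALUES (@ItemId, @Status);
```

As written, this is only safe with a single writer. With concurrent sessions, two of them could both see zero rows updated and both attempt the INSERT, in which case the primary key would reject one of them.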
Personally, I think the UPDATE method is the best. Instead of doing a SELECT first to check whether a record already exists, you can attempt an UPDATE first, and if no rows are affected (check @@ROWCOUNT) you can do an INSERT.
The reason for this is that sooner or later you might want to track status changes, and the best way to do this would be to keep an audit trail of all changes using a trigger on the status table.
I have a situation where I very frequently need to get a row from a table with a unique constraint, and if none exists then create it and return.
For example my table might be:
CREATE TABLE names(
id SERIAL PRIMARY KEY,
name TEXT,
CONSTRAINT names_name_key UNIQUE (name)
);
And it contains:
id | name
1 | bob
2 | alice
Then I'd like to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT DO NOTHING RETURNING id;
Or perhaps:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT (name) DO NOTHING RETURNING id
and have it return bob's id 1. However, RETURNING only returns either inserted or updated rows. So, in the above example, it wouldn't return anything. In order to have it function as desired I would actually need to:
INSERT INTO names(name) VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = 'bob'
RETURNING id;
which seems kind of cumbersome. I guess my questions are:
What is the reasoning for not allowing the (my) desired behaviour?
Is there a more elegant way to do this?
It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = NULL
WHERE FALSE -- never executed, but locks the row
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;
This way you do not actually write a new row version without need.
I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.
However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.
Proper solution for single-row UPSERT:
Is SELECT or INSERT in a function prone to race conditions?
General solutions for bulk UPSERT:
How to use RETURNING with ON CONFLICT in PostgreSQL?
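One way to close the remaining race window for a single row is to retry in a loop inside a PL/pgSQL function. The following is a sketch under the assumption of the names table from the question and the default READ COMMITTED isolation level; the function name f_name_id and its parameter names are made up for illustration, and the linked answers may differ in detail.

```sql
CREATE OR REPLACE FUNCTION f_name_id(_name text, OUT _id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      -- Common case first: the row already exists.
      SELECT id INTO _id FROM names WHERE name = _name;
      EXIT WHEN FOUND;

      -- Otherwise try to insert it. If a concurrent transaction wins,
      -- DO NOTHING inserts no row and FOUND stays false.
      INSERT INTO names (name)
      VALUES (_name)
      ON CONFLICT (name) DO NOTHING
      RETURNING id INTO _id;
      EXIT WHEN FOUND;

      -- Neither statement produced a row: loop and SELECT again,
      -- this time with a fresh snapshot that can see the other row.
   END LOOP;
END
$func$;
```

It would be called as SELECT f_name_id('bob');. The loop normally runs once and only repeats in the rare case where a concurrent insert committed between the two statements.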
Without concurrent write load
If concurrent writes (from a different session) are not possible you don't need to lock the row and can simplify:
WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO NOTHING -- no lock needed
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;
Assume a table structure of MyTable(MyTableId NVARCHAR(MAX) PRIMARY KEY, NumberOfInserts INTEGER).
I often need to either update i.e. increment a counter of an existing record, or insert a new record if it doesn't exist with a value of 0 for NumberOfInserts.
Essentially:
IF (MyTableId exists)
run UPDATE command
ELSE
run INSERT command
My concern is losing data due to race conditions, etc.
What's the safest way to write this?
I need it to be 100% accurate if possible, and I am willing to sacrifice speed where necessary.
MERGE statement can perform both UPDATE and INSERT (and DELETE if needed).
Even though it is a single atomic statement, it is important to use the HOLDLOCK query hint to prevent a race condition. There is a blog post “UPSERT” Race Condition With MERGE by Dan Guzman where he explains in great detail how it works and provides a test script to verify it.
The query itself is straightforward:
DECLARE @NewKey NVARCHAR(MAX) = ...;
MERGE INTO dbo.MyTable WITH (HOLDLOCK) AS Dst
USING
(
SELECT @NewKey AS NewKey
) AS Src
ON Src.NewKey = Dst.MyTableId
WHEN MATCHED THEN
UPDATE
SET NumberOfInserts = NumberOfInserts + 1
WHEN NOT MATCHED THEN
INSERT
(
MyTableId
,NumberOfInserts
)
VALUES
(
@NewKey
,0
);
Of course, you can also use an explicit two-step approach with a separate check for whether the row exists and separate UPDATE and INSERT statements. Just make sure to wrap them all in a transaction with appropriate table locking hints.
See Conditional INSERT/UPDATE Race Condition by Dan Guzman for details.
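A sketch of what such a two-step version might look like for the table in the question. It is not taken from the linked post: the sample key value and the choice of UPDLOCK with HOLDLOCK on the UPDATE are assumptions.

```sql
DECLARE @NewKey NVARCHAR(MAX);
SET @NewKey = N'some-key';  -- illustrative value

BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK keep the key (or the gap where it would be) locked
-- until COMMIT, so no other session can insert the same key in between.
UPDATE dbo.MyTable WITH (UPDLOCK, HOLDLOCK)
SET NumberOfInserts = NumberOfInserts + 1
WHERE MyTableId = @NewKey;

-- No row was updated, so the key is new: start its counter at 0.
IF @@ROWCOUNT = 0
    INSERT INTO dbo.MyTable (MyTableId, NumberOfInserts)
    VALUES (@NewKey, 0);

COMMIT TRANSACTION;
```

Doing the UPDATE first avoids a separate existence check, so an existing key costs one statement instead of two.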
I have a table "INSERTIF" which looks like this -
id value
S1 s1rocks
S2 s2rocks
S3 s3rocks
Before inserting a row into this table, I want to check whether the given id exists. If it does not exist, insert it; otherwise, just update the value. I want to do this in a thread-safe way. Can you tell me whether my code is correct? I tried it and it worked, but I want to be sure that I am not missing anything, such as performance issues.
EDIT 1- I want to use this code to insert millions of rows, one at a time. Each insert statement is wrapped around the code I have shown.
EDIT 2 - I do not want to use the UPDATE part of my code, only inserting is enough.
I do NOT want to use MERGE because it only works with SQL Server 2008 and above.
Thanks.
Code -
-- no check insert
INSERT INTO INSERTIF(ID,VALUE)
VALUES('S1', 's1doesNOTrock')
--insert with checking
begin tran /* default read committed isolation level is fine */
if not exists
(select * from INSERTIF with (updlock, rowlock, holdlock)
where ID = 'S1')
BEGIN
INSERT INTO INSERTIF(ID,VALUE)
VALUES('S1', 's1doesNOTrock')
END
else
/* update */
UPDATE INSERTIF
SET VALUE = 's1doesNOTrock'
WHERE ID = 'S1'
commit /* locks are released here */
Code to create table -
CREATE TABLE [dbo].[INSERTIF](
[id] [varchar](50) NULL,
[value] [varchar](50) NULL
)
INSERT [dbo].[INSERTIF] ([id], [value]) VALUES (N'S1', N's1rocks')
INSERT [dbo].[INSERTIF] ([id], [value]) VALUES (N'S2', N's2rocks')
INSERT [dbo].[INSERTIF] ([id], [value]) VALUES (N'S3', N's3rocks')
Your question is about the thread-safety of your code. Succinctly, no — it is not thread-safe. (But see below where isolation is discussed.)
You have a (smallish) window of vulnerability because of the TOCTOU (Time of Check, Time of Use) issue between your 'not exists' SELECT and the corresponding action. Assuming you have a unique (primary) key constraint on the id column, you should use the 'Easier to Ask Forgiveness than Permission' paradigm rather than the 'Look Before You Leap' paradigm (see EAFP vs LBYL).
That means you should determine which of two operation sequences you're going to use:
1. INSERT, but UPDATE if it fails.
2. UPDATE, but INSERT if no rows are updated.
Either works. If the work will be mostly inserts with the occasional update, then 1 is better than 2; if it will be mostly updates with the occasional insert, then 2 is better than 1. You might even work adaptively: keep track of what happened in the last N rows (where N might be as few as 5 or as many as 500) and use a heuristic to decide which to try on the new row. There could still be a problem if the INSERT fails (because the row existed) but the UPDATE then updates nothing (because someone deleted the row after the insert failed). Similarly, there could still be a problem with UPDATE followed by INSERT (no row existed at the time of the UPDATE, but one was inserted before your INSERT ran).
Note that the INSERT option depends wholly on a unique constraint to ensure that duplicate rows are not inserted; the UPDATE option is more reliable.
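As an illustration of the INSERT-first sequence in T-SQL, here is a sketch using TRY/CATCH. It assumes a unique constraint on id, which the CREATE TABLE in the question does not have; the constraint name in the comment is hypothetical.

```sql
-- Assumed to exist (not in the question's DDL):
-- ALTER TABLE dbo.INSERTIF ADD CONSTRAINT UQ_INSERTIF_id UNIQUE (id);

BEGIN TRY
    -- Optimistic path: just insert and let the constraint catch duplicates.
    INSERT INTO dbo.INSERTIF (id, value)
    VALUES ('S1', 's1doesNOTrock');
END TRY
BEGIN CATCH
    -- 2627: PRIMARY KEY / UNIQUE constraint violation
    -- 2601: duplicate key in a unique index
    IF ERROR_NUMBER() IN (2601, 2627)
        UPDATE dbo.INSERTIF
        SET value = 's1doesNOTrock'
        WHERE id = 'S1';
    ELSE
    BEGIN
        -- Anything else is unexpected: re-raise it to the caller.
        DECLARE @msg NVARCHAR(2048);
        SET @msg = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END
END CATCH;
```

If only inserting is needed, as in EDIT 2, the UPDATE branch can be left empty so that duplicates are ignored.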
You also need to consider your isolation level — which might change the original answer. If your isolation is high enough to ensure that after the 'not exists' SELECT is executed, no-one else will be able to insert the row that you determined did not exist, then you may be OK. That gets into some nitty-gritty understanding of your DBMS (and I'm not an SQL Server expert).
You'll need to think about transaction boundaries, too; how big a transaction would be appropriate, especially if the source data has a million entries.
This technique is typically called UPSERT. It can be done in SQL Server using MERGE.
It works like this:
MERGE INTO A_Table
USING
(SELECT 'data_searched' AS Search_Col) AS SRC
-- Search predicates
--
ON A_Table.Data = SRC.Search_Col
WHEN MATCHED THEN
-- Update part of the 'UPSERT'
--
UPDATE SET
Data = 'data_searched_updated'
WHEN NOT MATCHED THEN
-- INSERT part of the 'UPSERT'
--
INSERT (Data)
VALUES (SRC.Search_Col);
Also see http://www.sergeyv.com/blog/archive/2010/09/10/sql-server-upsert-equivalent.aspx
EDIT: I see you're using an older SQL Server. In that case you must use multiple statements.
If only one row with a distinct field is to be updated, I can use:
insert into tab(..) value(..) on duplicate key update ...
But that's not the case here: I need to update 4 rows in the same table, all of which have their "accountId" field equal to $_SESSION['accountId'].
All I can think of at the moment is:
delete from tab where accountId = $_SESSION['accountId'],
then insert the new rows.
Which obviously is not the best solution.
Has someone a better idea about this?
Use the update just like that!
update tab set col1 = 'value' where accountId = $_SESSION['accountId']
Moreover, MySQL allows you to do an update with a join, if that makes your life a bit easier:
update
tab t
inner join accounts a on
t.accountid = a.accountid
set
t.col1 = 'value'
where
a.accountname = 'Tom'
Based on your question, it seems like you should review the Update Statement.
Insert is used to put new rows in - not update them. Delete is used to remove. And Update is used to modify existing rows. Using "Insert On Duplicate Key Update" is a hackish way to modify rows, and is poor form to use when you know the row is already there.
Load all of the values into a temporary table.
UPDATE all of the values using a JOIN.
INSERT all of the values from the temp table that don't exist in the target table.
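The three steps above could look roughly like this in MySQL. The question does not show the columns of tab, so accountId, slot (some value that tells the account's rows apart) and col1 are hypothetical, as are the staging table name and the sample values.

```sql
-- 1. Load the new values into a temporary table.
CREATE TEMPORARY TABLE tab_new (
    accountId INT NOT NULL,
    slot      INT NOT NULL,
    col1      VARCHAR(100) NOT NULL
);

INSERT INTO tab_new (accountId, slot, col1) VALUES
    (42, 1, 'a'), (42, 2, 'b'), (42, 3, 'c'), (42, 4, 'd');

-- 2. Update the rows that already exist in the target table.
UPDATE tab t
INNER JOIN tab_new n
        ON n.accountId = t.accountId AND n.slot = t.slot
SET t.col1 = n.col1;

-- 3. Insert the rows that are missing from the target table.
INSERT INTO tab (accountId, slot, col1)
SELECT n.accountId, n.slot, n.col1
FROM tab_new n
LEFT JOIN tab t
       ON t.accountId = n.accountId AND t.slot = n.slot
WHERE t.accountId IS NULL;

DROP TEMPORARY TABLE tab_new;
```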
You can use the REPLACE statement. It works as a DELETE followed by an INSERT.
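For example, in MySQL (column names and values are hypothetical, since the question does not list them):

```sql
-- REPLACE relies on a PRIMARY KEY or UNIQUE index to find the existing row.
-- If a row with id = 1 exists, it is deleted and the new row is inserted;
-- otherwise this behaves like a plain INSERT.
REPLACE INTO tab (id, accountId, col1)
VALUES (1, 42, 'new value');
```

Because the old row is deleted, any column not listed in the statement falls back to its default value instead of keeping its previous content.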
I wish to make a trigger, but I'm not sure how to grab the data for whatever caused the trigger.
I have a simple table.
FooId INT PK NOT NULL IDENTITY
Name VARCHAR(100) NOT NULL
I wish to have a trigger so that when an UPDATE, INSERT or DELETE occurs, I then do the following.
Pseudocode
IF INSERT
Print 'Insert' & Name
ELSE IF UPDATE
Print 'Update' & FooId & Name
ELSE IF DELETE
Print 'Delete' & FooId & Name
Now, I know how to make a trigger for a table.
What I don't know is how to figure out the values based on what the trigger type is.
Can anyone help?
Edit: Not sure if it helps, but the database is SQL Server 2008.
The pseudo table "inserted" contains the new data, and the "deleted" table contains the old data.
You can do something like
create trigger mytrigger on mytable for insert, update, delete
as
if ( select count(*) from inserted ) > 0
-- insert or update
select FooId, Name from inserted
else
-- delete
select FooId, Name from deleted
To clarify all the comments made by others, on an insert, the inserted table contains data and deleted is empty. On a delete, the situation is reversed. On an update, deleted and inserted contain the "before" and "after" copy of any updated rows.
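Putting that together, one way to tell the three cases apart inside a single trigger is to check which of the two pseudo tables has rows. This is a sketch for SQL Server using the mytable name from the answer above; the trigger name is made up.

```sql
CREATE TRIGGER trg_mytable_changes ON mytable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted)
        -- UPDATE: "deleted" has the old rows, "inserted" the new ones.
        SELECT 'Update' AS Action, FooId, Name FROM inserted;
    ELSE IF EXISTS (SELECT 1 FROM inserted)
        -- INSERT: only "inserted" has rows.
        SELECT 'Insert' AS Action, FooId, Name FROM inserted;
    ELSE IF EXISTS (SELECT 1 FROM deleted)
        -- DELETE: only "deleted" has rows.
        SELECT 'Delete' AS Action, FooId, Name FROM deleted;
    -- Both empty: the statement affected no rows, so there is nothing to report.
END;
```

Each branch selects every affected row, so the trigger also behaves sensibly for statements that touch more than one row.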
When you are writing a trigger, you have to account for the fact that your trigger may be called by a statement that affects more than one row at a time.
As others have pointed out, you reference the inserted table to get the new values of updated or inserted rows, and you reference the deleted table to get the values of deleted rows.
SQL triggers provide an implicitly-defined table called "inserted" which returns the affected rows, allowing you to do things like
UPDATE mytable SET mytimestamp = GETDATE() WHERE id IN (SELECT id FROM inserted)
Regarding your code sample, you'll want to create separate INSERT, UPDATE and DELETE triggers if you are performing separate actions for each.
(At least, this is the case in SQL Server... you didn't specify a platform.)
On 2008, there is also the MERGE command. How do you want to handle it?
Starting from 2008, there are four commands you can modify a table with:
INSERT, UPDATE, DELETE, and MERGE:
http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
http://sqlblogcasts.com/blogs/grumpyolddba/archive/2009/03/11/reasons-to-move-to-sql-2008-merge.aspx
What do you want your trigger to do when someone issues a MERGE command against your table?