Copy data from table to table using INSERT or UPDATE - SQL

I need to copy a lot of data from one table to another. If the data already exists, I need to update it; otherwise I need to insert it. The data to be copied is selected using a WHERE condition. The data has a primary key (a string of up to 12 characters).
If I was just inserting the data, I would do
INSERT INTO T2 SELECT COL1, COL2 FROM T1 WHERE T1.ID ='I'
but I cannot figure out how to do the INSERT / UPDATE. I keep seeing references to upserts and MERGE, but MERGE appears to have issues, and I cannot figure out how to do the upsert for multiple records.
What is the best solution for this?

If you want to avoid MERGE (though you should not be afraid of it), you can do something like this:
update t2
set col1 = t1.col1,
    col2 = t1.col2
from t2
join t1
    on t2.[joinkey] = t1.[joinkey]
where [where clause]
And afterwards, for the rows that do not exist yet:
insert into t2(col1,col2)
select col1,col2 from t1
where not exists (select * from t2 where t1.[joinkey] = t2.[joinkey])
This way you first update the rows that match and then insert the ones that do not. Also, if you want it done in one go, you can wrap both statements in a transaction.

This is commonly known as an UPSERT operation. Yes, you are correct in saying MERGE has some issues, so it is reasonable to stay away from it.
A simple approach, assuming there is a primary key column called PK_Col in both tables, would be something like this:
BEGIN TRANSACTION;

-- Update already existing records
UPDATE T2
SET T2.Col1 = T1.Col1,
    T2.Col2 = T1.Col2
FROM T2
INNER JOIN T1 ON T2.PK_Col = T1.PK_Col;

-- Insert missing records
INSERT INTO T2 (Col1, Col2)
SELECT Col1, Col2
FROM T1
WHERE T1.ID = 'I'
  AND NOT EXISTS (SELECT 1
                  FROM T2
                  WHERE T2.PK_Col = T1.PK_Col);

COMMIT TRANSACTION;
Wrap the whole UPSERT operation in one transaction.

You can use IF EXISTS, something like this:
if exists (select * from table with (updlock, serializable) where key = @key)
begin
    update table set ...
    where key = @key
end
else
begin
    insert table (key, ...)
    values (@key, ...)
end
Another solution is to check @@ROWCOUNT:
UPDATE MyTable SET FieldA = @FieldA WHERE Key = @Key
IF @@ROWCOUNT = 0
    INSERT INTO MyTable (FieldA) VALUES (@FieldA)
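Note that the UPDATE-then-INSERT pattern can race under concurrent callers: two sessions can both see zero rows updated and then both try to insert the same key. A minimal sketch of the usual mitigation, reusing the MyTable/Key/FieldA names above, is to run the pair inside a transaction with locking hints on the UPDATE:

BEGIN TRANSACTION;

-- UPDLOCK + SERIALIZABLE keeps the key (or the gap where it would go) locked
-- until commit, so a concurrent session cannot insert the same key in between.
UPDATE MyTable WITH (UPDLOCK, SERIALIZABLE)
SET FieldA = @FieldA
WHERE [Key] = @Key;

IF @@ROWCOUNT = 0
    INSERT INTO MyTable ([Key], FieldA) VALUES (@Key, @FieldA);

COMMIT TRANSACTION;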

Related

Trigger to insert one record to new rows from another table

I have a small app that inserts data into the database.
I have something like this:
dbo.system.table1(col1, col2, col3) and table2(col1) (table2.col1 is just one single row).
What I want to do is insert the table2.col1 into table1.col3 when a new order is made.
Also, table2.col1 updates twice a day, so every time a new order is made in table1 I need to keep the old table1.col3 values.
I've tried
CREATE TRIGGER dbo.TR_CHANGES
ON dbo.SYSTEM
AFTER INSERT
AS
UPDATE table1
SET table1.col3 = (SELECT col1
FROM table2
WHERE table1.col3 = table2.col1)
But it ends up updating col3 for all rows.
You need to include the inserted pseudo table in your statement to find the rows that were actually inserted - and I would recommend using proper JOIN syntax instead of those nested subqueries, which seems a lot easier to read and understand to me.
So try this:
CREATE TRIGGER dbo.TR_CHANGES
ON dbo.SYSTEM
AFTER INSERT
AS
UPDATE t1
SET col3 = t2.col1
FROM table1 t1
INNER JOIN table2 t2 ON t1.col3 = t2.col1
INNER JOIN inserted i ON t1.primarykeycol = i.primarykeycol
You need to replace .primarykeycol for the inserted and t1 tables with the actual primary key column of your table - this is needed to link the inserted rows to the actual data table.

Hive - cannot recognize input 'insert' in select clause

Say I've already created table3, and try to insert data into it using the following code
WITH table1
AS
(SELECT 1 AS key, 'One' AS value),
table2
AS
(SELECT 1 AS key, 'I' AS value)
INSERT TABLE table3
SELECT t1.key, t1.value, t2.value
FROM table1 t1
JOIN table2 t2
ON (t1.key = t2.key)
However, I got an error: cannot recognize input 'insert' in select clause. If I simply delete the INSERT statement, the query runs just fine.
Is this a syntax problem? Or can I not use a WITH clause to insert?
Use INTO or OVERWRITE depending on what you need:
INSERT INTO TABLE table3 --this will append data, keeping the existing data intact
or
INSERT OVERWRITE TABLE table3 --will overwrite any existing data
Read manual: Inserting data into Hive Tables from queries
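For the query in the question, the full statement would then look something like this (a sketch reusing the question's CTEs; INSERT INTO appends to table3):

WITH table1 AS
(SELECT 1 AS key, 'One' AS value),
table2 AS
(SELECT 1 AS key, 'I' AS value)
INSERT INTO TABLE table3
SELECT t1.key, t1.value, t2.value
FROM table1 t1
JOIN table2 t2
ON (t1.key = t2.key)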

Inserting new rows in a table if not already existing

I have a table that can get bigger over time, and I want to insert some of its rows into another table, but I also want to make sure I am not duplicating the rows that I inserted before.
So here is the type of condition for my insert:
INSERT INTO SecondTable(Col1,Col2)
SELECT Col5,Col6
FROM
FirstTable ft
WHERE ft.RecType = 'ABC'
So if I keep running this, it will keep inserting the same rows again and again. How can I tell it to insert a row only if it is not already there?
You can use not exists:
INSERT INTO SecondTable(Col1,Col2)
SELECT Col5,Col6
FROM FirstTable ft
WHERE ft.RecType = 'ABC' AND
NOT EXISTS (SELECT 1 FROM SecondTable t2 WHERE t2.col1 = ft.col5 AND t2.col2 = ft.col6);
Create a unique constraint on the table with the columns that identify uniqueness. This will also help you preserve the integrity of your table: when you try to insert duplicate records, the RDBMS will give you an error.
ALTER TABLE SecondTable
ADD UNIQUE (col1, col2, col3);
INSERT INTO SecondTable (Col1, Col2)
SELECT Col5, Col6
FROM FirstTable ft
LEFT JOIN SecondTable st ON st.Col1 = ft.Col5
WHERE st.Col1 IS NULL AND ft.RecType = 'ABC'

Duplicating rows through multiple tables (new id/foreign key) with some column modifications

The basic concept is to duplicate rows in table1 where id is between, for example, 100 and 10000,
modify some of the column data, then insert the rows with new ids.
Table2 references table1.id with a foreign key, table3 references table2.id with a foreign key,
... and tableX references tableX-1.id with a foreign key.
I also have to modify some of the table2..tableX data.
I started to think about writing nested loops; for the first 3 tables it looks like this (in PL/SQL), and maybe it should work:
declare
    table1_row table1%rowtype;
    table2_row table2%rowtype;
    table3_row table3%rowtype;
begin
    for t1 in (select * from table1
               where id between 100 and 10000)
    loop
        table1_row := t1;
        table1_row.id := tableseq.nextval;
        table1_row.col1 := 'asdf';
        table1_row.col4 := 'xxx';
        insert into table1 values table1_row;

        for t2 in (select * from table2
                   where foreign_key_id = t1.id)
        loop
            table2_row := t2;
            table2_row.id := tableseq.nextval;
            table2_row.foreign_key_id := table1_row.id;
            table2_row.col3 := 'gfdgf';
            insert into table2 values table2_row;

            for t3 in (select * from table3
                       where foreign_key_id = t2.id)
            loop
                table3_row := t3;
                table3_row.id := tableseq.nextval;
                table3_row.foreign_key_id := table2_row.id;
                table3_row.col1 := 'gdfgdg';
                insert into table3 values table3_row;
            end loop;
        end loop;
    end loop;
end;
Any better solutions? With about 10-20 nested loops, it looks awful :(
Thanks in advance.
I believe you can use an insert statement with several subqueries to clean this up. Here is a simpler example, but I believe you can extrapolate it to your specific case:
insert into table1
    (col1, col2, col3, col4, col5)
select 'asdf',
       (select table2_data --whatever data from this table you want
        from table2
        where foreign_key_id = table1.id),
       (select table3_data --whatever data from this table you want
        from table3
        where foreign_key_id = table1.id),
       'xxx',
       table1.col5
from table1
where table1.id between 100 and 10000
Note: your id column should be set up as an auto-increment primary key, so you shouldn't need it as part of your insert statement. Also, I added "table1.col5" as an example of how to reuse data from the existing row in your duplicated row (since I'm assuming there is some data you want duplicated).
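Another way to avoid the nested loops altogether (a sketch only, not part of the answer above: it assumes Oracle, the tableseq sequence from the question, and a helper table id_map that you create yourself) is to copy each level set-based while keeping a mapping from old ids to new ids:

-- Hypothetical helper table remembering the new id assigned to each copied table1 row
create table id_map (old_id number, new_id number);

insert into id_map (old_id, new_id)
select id, tableseq.nextval
from table1
where id between 100 and 10000;

-- Copy the parent rows with their new ids and modified columns
insert into table1 (id, col1, col4 /* plus the other columns, copied as-is */)
select m.new_id, 'asdf', 'xxx'
from table1 t1
join id_map m on m.old_id = t1.id;

-- Copy the child rows, pointing their foreign keys at the new parent ids
insert into table2 (id, foreign_key_id, col3 /* plus the other columns */)
select tableseq.nextval, m.new_id, 'gfdgf'
from table2 t2
join id_map m on m.old_id = t2.foreign_key_id;

For table3 and deeper levels, first record a mapping of old table2 ids to new table2 ids (as done for table1 above), then repeat the same join-based insert one level down.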

Avoid duplicates in INSERT INTO SELECT query in SQL Server

I have the following two tables:
Table1
----------
ID Name
1 A
2 B
3 C
Table2
----------
ID Name
1 Z
I need to insert data from Table1 to Table2. I can use the following syntax:
INSERT INTO Table2(Id, Name) SELECT Id, Name FROM Table1
However, in my case, duplicate IDs might exist in Table2 (in my case, it's just "1") and I don't want to copy that again as that would throw an error.
I can write something like this:
IF NOT EXISTS(SELECT 1 FROM Table2 WHERE Id=1)
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1
ELSE
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1 WHERE Table1.Id<>1
Is there a better way to do this without using IF - ELSE? I want to avoid two INSERT INTO-SELECT statements based on some condition.
Using NOT EXISTS:
INSERT INTO TABLE_2 (id, name)
SELECT t1.id, t1.name
FROM TABLE_1 t1
WHERE NOT EXISTS (SELECT id
                  FROM TABLE_2 t2
                  WHERE t2.id = t1.id)
Using NOT IN:
INSERT INTO TABLE_2 (id, name)
SELECT t1.id, t1.name
FROM TABLE_1 t1
WHERE t1.id NOT IN (SELECT id FROM TABLE_2)
Using LEFT JOIN/IS NULL:
INSERT INTO TABLE_2 (id, name)
SELECT t1.id, t1.name
FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
Of the three options, the LEFT JOIN/IS NULL approach is the least efficient. See this link for more details.
In MySQL you can do this:
INSERT IGNORE INTO Table2(Id, Name) SELECT Id, Name FROM Table1
Does SQL Server have anything similar?
I just had a similar problem; the DISTINCT keyword works magic:
INSERT INTO Table2(Id, Name) SELECT DISTINCT Id, Name FROM Table1
I was facing the same problem recently...
Here's what worked for me in MS SQL Server 2017...
The primary key should be set on ID in table 2...
The columns and column properties should be the same of course between both tables. This will work the first time you run the below script. The duplicate ID in table 1, will not insert...
If you run it the second time, you will get a
Violation of PRIMARY KEY constraint error
This is the code:
Insert into Table_2
Select distinct *
from Table_1
where table_1.ID >1
Using ignore duplicates on the unique index, as suggested by IanC here, was my solution for a similar issue: create the index with the option WITH IGNORE_DUP_KEY.
In backward compatible syntax, WITH IGNORE_DUP_KEY is equivalent to WITH IGNORE_DUP_KEY = ON.
Ref.: index_option
In SQL Server you can set a unique key index on the table for the columns that need to be unique.
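A minimal sketch of that approach, assuming the Table1/Table2 example from the question and that duplicate Ids should simply be discarded (the index name is illustrative):

CREATE UNIQUE INDEX UX_Table2_Id
    ON Table2 (Id)
    WITH (IGNORE_DUP_KEY = ON);

-- Rows whose Id already exists in Table2 are skipped with a warning
-- instead of failing the whole statement.
INSERT INTO Table2 (Id, Name)
SELECT Id, Name FROM Table1;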
A little off topic, but if you want to migrate the data to a new table, and the possible duplicates are in the original table, and the column possibly duplicated is not an id, a GROUP BY will do:
INSERT INTO TABLE_2
(name)
SELECT t1.name
FROM TABLE_1 t1
GROUP BY t1.name
In my case, I had duplicate IDs in the source table, so none of the proposals worked. I don't care about performance, it's just done once.
To solve this I took the records one by one with a cursor to ignore the duplicates.
So here's the code example:
DECLARE #c1 AS VARCHAR(12);
DECLARE #c2 AS VARCHAR(250);
DECLARE #c3 AS VARCHAR(250);
DECLARE MY_cursor CURSOR STATIC FOR
Select
c1,
c2,
c3
from T2
where ....;
OPEN MY_cursor
FETCH NEXT FROM MY_cursor INTO #c1, #c2, #c3
WHILE ##FETCH_STATUS = 0
BEGIN
if (select count(1)
from T1
where a1 = #c1
and a2 = #c2
) = 0
INSERT INTO T1
values (#c1, #c2, #c3)
FETCH NEXT FROM MY_cursor INTO #c1, #c2, #c3
END
CLOSE MY_cursor
DEALLOCATE MY_cursor
I used a MERGE query to fill a table without duplications.
The problem I had was a double key in the tables (Code, Value), and the EXISTS query was very slow.
The MERGE executed very fast (more than 100x faster).
See: examples for MERGE query
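For reference, a minimal insert-only MERGE sketch for the Table1/Table2 example from the question (only missing Ids are inserted; extend it with a WHEN MATCHED clause if you also need updates):

MERGE INTO Table2 AS target
USING Table1 AS source
    ON target.Id = source.Id
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Name) VALUES (source.Id, source.Name);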
For one table it worked perfectly after creating one unique index over multiple fields. Then a simple MySQL "INSERT IGNORE" will skip duplicates if ALL of the 7 fields (in this case) have the SAME values.
Select the fields in the PMA (phpMyAdmin) Structure view and click Unique; a new combined index will be created.
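A rough SQL equivalent of those clicks (MySQL; the index and column names here are illustrative), followed by the insert:

ALTER TABLE Table2
    ADD UNIQUE KEY ux_table2_combined (col1, col2, col3);

-- Rows that would violate the unique key are skipped instead of raising an error
INSERT IGNORE INTO Table2 (col1, col2, col3)
SELECT col1, col2, col3 FROM Table1;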
A simple DELETE before the INSERT would suffice:
DELETE FROM Table2 WHERE Id IN (SELECT Id FROM Table1)
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1
Swap Table1 and Table2 depending on which table's Id and name pairing you want to preserve.