SQL migration script - insert into select with output ID - sql

I am using PostgreSQL and Flyway to perform a data migration in an application. The idea is to move rows from one table to another and keep the link between the old and new table in the old table. So, let's say we have Table_1 with columns (id, name, user_id) and a new Table_2 with similar columns (id2, name2, user_id2).
Now, the first step will be to add a column to Table_1 that will store the id of its counterpart in new Table_2. So:
alter table Table_1 add column if not exists migrated_table_2_id int;
And now I would like to write a SQL statement that will migrate the data from Table_1 to Table_2 and at the same time fill in the id values in the migrated_table_2_id column. So something like:
insert into Table_2 (name2, user_id2) select name, user_id from Table_1;
but also filling in migrated_table_2_id with the id of the newly created row in Table_2.

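You can use a CTE, assuming that name2, user_id2 or both in combination are unique. The UPDATE has to read the new ids from the CTE's RETURNING output rather than from Table_2 itself, because rows inserted by the CTE are not visible to the rest of the same statement:
with i as (
insert into Table_2 (name2, user_id2)
select name, user_id
from Table_1
returning *
)
update Table_1 t1
set migrated_table_2_id = i.id2
from i
where i.name2 = t1.name and i.user_id2 = t1.user_id;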
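To try the uniqueness assumption without a PostgreSQL instance, here is a minimal sketch that uses Python's built-in sqlite3 module. SQLite has no data-modifying CTEs, so the insert and the back-fill run as two statements inside one transaction; the `migrate` function and the sample rows are made up for illustration and are not part of the original answer.

```python
import sqlite3

def migrate(conn):
    """Copy Table_1 rows into Table_2, then back-fill the link column."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("""
            INSERT INTO Table_2 (name2, user_id2)
            SELECT name, user_id FROM Table_1
        """)
        # Match on (name, user_id); only safe if that pair is unique.
        conn.execute("""
            UPDATE Table_1
            SET migrated_table_2_id = (
                SELECT t2.id2 FROM Table_2 t2
                WHERE t2.name2 = Table_1.name
                  AND t2.user_id2 = Table_1.user_id)
        """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_1 (id INTEGER PRIMARY KEY, name TEXT, user_id INT,
                          migrated_table_2_id INT);
    CREATE TABLE Table_2 (id2 INTEGER PRIMARY KEY, name2 TEXT, user_id2 INT);
    INSERT INTO Table_1 (id, name, user_id) VALUES (10, 'a', 1), (20, 'b', 2);
""")
migrate(conn)

# Follow the stored link from each old row to its new counterpart.
rows = conn.execute("""
    SELECT t1.id, t1.name, t2.name2
    FROM Table_1 t1
    JOIN Table_2 t2 ON t2.id2 = t1.migrated_table_2_id
    ORDER BY t1.id
""").fetchall()
```

If (name, user_id) is not unique, the scalar subquery can match several Table_2 rows and the link becomes ambiguous, which is the same caveat the answer states.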

Related

How can I effectively check uniqueness of an item in database?

I have a spreadsheet that needs to be uploaded, but each row needs to be checked for uniqueness against the database. One option that I can think of is to check whether each row exists in the database, which means doing N SQL queries for N rows. Are there any alternatives for checking uniqueness efficiently instead of checking row by row?
We can do that by using NOT EXISTS or NOT IN. To do that, first insert the spreadsheet data into a temp table.
Let's take your main table as TABLE_1 and your spreadsheet temp table as TABLE_2.
Using NOT IN -
INSERT INTO TABLE_1 (id, name)
SELECT t2.id, t2.name FROM TABLE_2 t2
WHERE t2.id NOT IN (SELECT id FROM TABLE_1)
Using NOT EXISTS -
INSERT INTO TABLE_1 (id, name)
SELECT t2.id, t2.name FROM TABLE_2 t2
WHERE NOT EXISTS(SELECT id FROM TABLE_1 t1 WHERE t1.id = t2.id)
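The staging-table approach can be sketched end to end as follows. This is only an illustration using Python's built-in sqlite3 module; the `spreadsheet_rows` list stands in for the parsed spreadsheet and is an assumption, as are the sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_1 (id INTEGER PRIMARY KEY, name TEXT);  -- main table
    CREATE TABLE TABLE_2 (id INTEGER, name TEXT);              -- staging table
    INSERT INTO TABLE_1 VALUES (1, 'existing');
""")

# Hypothetical parsed spreadsheet; load it into staging in one batch.
spreadsheet_rows = [(1, 'duplicate'), (2, 'new'), (3, 'also new')]
conn.executemany("INSERT INTO TABLE_2 VALUES (?, ?)", spreadsheet_rows)

# One set-based statement replaces N per-row existence checks.
conn.execute("""
    INSERT INTO TABLE_1 (id, name)
    SELECT t2.id, t2.name FROM TABLE_2 t2
    WHERE NOT EXISTS (SELECT 1 FROM TABLE_1 t1 WHERE t1.id = t2.id)
""")

final_rows = conn.execute(
    "SELECT id, name FROM TABLE_1 ORDER BY id"
).fetchall()
```

The row with id 1 is skipped because it already exists, while ids 2 and 3 are inserted. Duplicates inside the spreadsheet itself would still need separate handling, for example with DISTINCT or GROUP BY.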

move all values from column in one table to new table and update third table with relation between those tables, PostgreSQL

I am using SQL after a long time and I have following:
I have existing table1 with columns id, name and a lot of other columns, it already contains rows.
I created empty table2 which only has columns id and name.
I created empty table3 which only has reference columns table1_id and table2_id.
Now I want to:
1. take all the values from column name in table1 (can be NULL, discard them in that case),
2. insert them as new rows into table2,
3. insert ids of the corresponding table1 and table2 rows into table3,
4. remove the column name from table1.
=> probably ALTER TABLE table1 DROP COLUMN name;, but I guess there may be a neater way to cut the result from step 1, transform it and paste as rows in step 2.
EDIT: I came up with something like (not tested yet):
SELECT table1.id, table1.name INTO results FROM table1;
FOR result1 IN
results
LOOP
WITH result2 AS (
INSERT INTO table2 (name) VALUES (result1.name) RETURNING id
)
INSERT INTO table3 (table2_id, table1_id) VALUES (result2.id, result1.id);
END LOOP;
ALTER TABLE table1 DROP COLUMN name;
EDIT:
I forgot to mention that if the name already exists in table2, I don't want to add it again (it should be unique in table2), but I still want to add the relation between the id from table1 and the inserted or existing id from table2 into table3.
EDIT:
I found we have source scripts for creating the database and I changed it there. Now I don't know how to get rid of this open question :(
For steps 1) & 2):
--Since you already have a table2
DROP TABLE table2;
--Create new table2 with data. Unless you are going to replace NULL with something
--discarding them would just end up with NULL again.
CREATE TABLE table2 AS SELECT id, name FROM table1;
Step 3). Not sure of the purpose of table3 as you would have matching id values between table1 and table2. In fact you could use that to set up a FOREIGN KEY relationship between them.
Step 4) Your solution: ALTER TABLE table1 DROP COLUMN name;
Not sure how you want to use it. If you want to run it as a one-time bulk transformation, this could help (you can try the code on sqlfiddle):
CREATE TABLE table1 (
id int,
name varchar(9)
);
INSERT INTO table1 (
id,
name
)
VALUES
(1, 'A'),
(2, null),
(3, 'C'),
(4, null),
(5, 'E'),
(6, 'C')
;
CREATE TABLE table2 (
id SERIAL,
name varchar(9) UNIQUE
);
INSERT INTO table2 (name)
SELECT DISTINCT name
FROM table1
WHERE name IS NOT NULL
;
/*
-- This would be a better option, but I was not able to test the upsert feature of PostgreSQL
INSERT INTO table2 (name)
SELECT name
FROM table1
WHERE name IS NOT NULL
ON CONFLICT (name) DO NOTHING --upsert, supported in PostgreSQL 9.5 and newer
;
*/
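-- CREATE TABLE ... AS takes its column types from the query,
-- so the columns are named with aliases instead of a typed column list.
CREATE TABLE table3 AS
SELECT
t1.id AS id_table1,
t2.id AS id_table2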
FROM table1 t1
INNER JOIN table2 t2
ON t1.name = t2.name
;
--ALTER TABLE table1 DROP COLUMN name;
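The question's later edit adds a requirement: names that already exist in table2 must be reused rather than inserted again. A minimal sketch of that variant follows, using Python's built-in sqlite3 module with portable SQL. The sample data mirrors the answer above, the pre-existing 'A' row is an assumption added to show the reuse, and the table3 column names follow the question (table1_id, table2_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE table3 (table1_id INT, table2_id INT);
    INSERT INTO table1 VALUES
        (1, 'A'), (2, NULL), (3, 'C'), (4, NULL), (5, 'E'), (6, 'C');
    INSERT INTO table2 (name) VALUES ('A');  -- a name that already exists

    -- Step 2: add only the names that table2 does not have yet.
    INSERT INTO table2 (name)
    SELECT DISTINCT t1.name FROM table1 t1
    WHERE t1.name IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.name = t1.name);

    -- Step 3: link every non-NULL table1 row to its table2 row by name.
    INSERT INTO table3 (table1_id, table2_id)
    SELECT t1.id, t2.id FROM table1 t1 JOIN table2 t2 ON t2.name = t1.name;
""")

names = conn.execute("SELECT name FROM table2 ORDER BY name").fetchall()
links = conn.execute("""
    SELECT t3.table1_id, t2.name
    FROM table3 t3 JOIN table2 t2 ON t2.id = t3.table2_id
    ORDER BY t3.table1_id
""").fetchall()
```

Rows 3 and 6 share the name 'C' and end up linked to the same table2 row, and the NULL names are dropped by the inner join without any extra filter.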
This could also be useful:
stackoverflow_1
postgresqltutorial
stackoverflow_2
PostgreSQL documentation with PL/pgSQL code - the approach you sketched in the question goes much more in this direction

update table column using values from another table column

I've got two tables that look like this
TABLE_1
option_id PK,
condition_id FK,
And I have another table that looks like this
TABLE_2
option_id PK, FK -> TABLE_1
condition_id PK, FK
I want to set condition_id in TABLE_1 with corresponding values for condition_id from TABLE_2.
My script looks like this
UPDATE TABLE_1
SET
condition_id = t2.condition_id
FROM TABLE_1 t1
INNER JOIN TABLE_2 t2
ON t1.option_id = t2.option_id
But it seems to be wrong - after the execution all the values of condition_id in TABLE_1 are the same.
What is wrong?
The problem is: you are using two instances of TABLE_1.
UPDATE TABLE_1 <-- first instance
FROM TABLE_1 t1 <-- second instance
Thus, while the FROM allows you to refer to a combined structure that relates matching entries, this forms a full cross join with the instance of TABLE_1 that is being updated. To avoid this you would need to add a further condition like WHERE TU.option_id=t1.option_id. (I introduced TU as an alias for the update target table to avoid ambiguity.)
Or, likely, you might simply use:
UPDATE TABLE_1 t1
SET
condition_id = t2.condition_id
FROM TABLE_2 t2
WHERE t1.option_id = t2.option_id
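Where UPDATE ... FROM is unavailable or its semantics differ between databases, a correlated subquery is a portable way to keep a single instance of the target table. The sketch below runs on Python's built-in sqlite3 module; the sample rows are invented, and the EXISTS guard is an addition that leaves rows without a match untouched instead of setting them to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_1 (option_id INTEGER PRIMARY KEY, condition_id INT);
    CREATE TABLE TABLE_2 (option_id INT, condition_id INT);
    INSERT INTO TABLE_1 VALUES (1, NULL), (2, NULL), (3, 7);
    INSERT INTO TABLE_2 VALUES (1, 100), (2, 200);

    -- The target table appears once; the subquery is correlated to it.
    UPDATE TABLE_1
    SET condition_id = (SELECT t2.condition_id FROM TABLE_2 t2
                        WHERE t2.option_id = TABLE_1.option_id)
    WHERE EXISTS (SELECT 1 FROM TABLE_2 t2
                  WHERE t2.option_id = TABLE_1.option_id);
""")

result = conn.execute(
    "SELECT option_id, condition_id FROM TABLE_1 ORDER BY option_id"
).fetchall()
```

Each row receives the value from its own matching TABLE_2 row, and option 3, which has no match, keeps its original value.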
Something like this should do it :
UPDATE table1
SET table1.condition_id= table2.condition_id
FROM table1 INNER JOIN table2 ON table1.option_id = table2.option_id

Updating a key on table from another table in Oracle

I am trying to update a key on a table (t1) when the key value is (abc) by getting the value from table (t2).
It is working as expected when I am limiting it to a specific person
update table_a t1
set t1.u_key = (select t2.u_key
from table_b t2
where t2.name_f=t1.name_f
and t2.name_l=t1.name_l
and rownum<=1
and t2='NEVADA')
where t1.u_key = 'abc'
and e.name_f='Lori'
and e.name_l='U'
;
I initially tried without rownum and it failed with a "too many rows returned" error.
To run it on all the rows with t1.u_key='abc', I removed the specific name and tried the following, which keeps running until it times out.
update table_a t1
set t1.u_key = (select t2.u_key
from table_b t2
where t2.name_f=t1.name_f
and t2.name_l=t1.name_l
and rownum<=1
and t2='NEVADA')
where t1.u_key = 'abc'
;
Can you please look at it and suggest what I am missing?
You should first take a look what is returned when you run the inner SELECT statement alone:
SELECT t2.u_key FROM table_b t2
WHERE t2.name_f IN (SELECT name_f FROM table_a WHERE u_key = 'abc')
AND t2.name_l IN (SELECT name_l FROM table_a WHERE u_key = 'abc')
AND t2='NEVADA'
Examine the results and you will see that more than one row is returned.
If there should be only one matching row per key, you would need to add the key to the inner SELECT as well, but I can't tell you what it should look like without additional table descriptions and possibly some sample entries from table_a and table_b.
Use this:
update (
SELECT t2.u_key t2key,
t1.u_key t1key
FROM table_b t2,
table_a t1
where t2.name_f=t1.name_f
and t2.name_l=t1.name_l
and t2='NEVADA'
and rownum<=1 )
SET t1key = t2key
where t1key = 'abc';
merge into table_a t1
using(
select name_f, name_l, max(u_key) as new_key
from table_b t2
where t2='NEVADA'
group by name_f, name_l
) t2
on (t1.name_f=t2.name_f and t1.name_l=t2.name_l)
when matched then
update set t1.u_key=t2.new_key
where t1.u_key='abc';
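The idea behind the MERGE, collapsing table_b to one deterministic key per name before updating, can be tried outside Oracle with a correlated subquery. This is a rough sketch on Python's built-in sqlite3 module, not Oracle syntax; the sample rows are invented, and the 'NEVADA' filter is left out because the question does not show which column it applies to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (name_f TEXT, name_l TEXT, u_key TEXT);
    CREATE TABLE table_b (name_f TEXT, name_l TEXT, u_key TEXT);
    INSERT INTO table_a VALUES
        ('Lori', 'U', 'abc'), ('Sam', 'K', 'abc'), ('Ann', 'B', 'k9');
    INSERT INTO table_b VALUES
        ('Lori', 'U', 'k1'), ('Lori', 'U', 'k2'), ('Ann', 'B', 'k7');

    -- MAX() picks one key when several table_b rows share a name;
    -- EXISTS keeps unmatched rows from being overwritten with NULL.
    UPDATE table_a
    SET u_key = (SELECT MAX(b.u_key) FROM table_b b
                 WHERE b.name_f = table_a.name_f
                   AND b.name_l = table_a.name_l)
    WHERE u_key = 'abc'
      AND EXISTS (SELECT 1 FROM table_b b
                  WHERE b.name_f = table_a.name_f
                    AND b.name_l = table_a.name_l);
""")

result = conn.execute(
    "SELECT name_f, u_key FROM table_a ORDER BY name_f"
).fetchall()
```

Lori gets 'k2' (the larger of two candidates), Sam keeps 'abc' because table_b has no row for that name, and Ann is untouched because her key was never 'abc'.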

Avoid duplicates in INSERT INTO SELECT query in SQL Server

I have the following two tables:
Table1
----------
ID Name
1 A
2 B
3 C
Table2
----------
ID Name
1 Z
I need to insert data from Table1 to Table2. I can use the following syntax:
INSERT INTO Table2(Id, Name) SELECT Id, Name FROM Table1
However, in my case, duplicate IDs might exist in Table2 (in my case, it's just "1") and I don't want to copy that again as that would throw an error.
I can write something like this:
IF NOT EXISTS(SELECT 1 FROM Table2 WHERE Id=1)
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1
ELSE
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1 WHERE Table1.Id<>1
Is there a better way to do this without using IF - ELSE? I want to avoid two INSERT INTO-SELECT statements based on some condition.
Using NOT EXISTS:
INSERT INTO TABLE_2
(id, name)
SELECT t1.id,
t1.name
FROM TABLE_1 t1
WHERE NOT EXISTS(SELECT id
FROM TABLE_2 t2
WHERE t2.id = t1.id)
Using NOT IN:
INSERT INTO TABLE_2
(id, name)
SELECT t1.id,
t1.name
FROM TABLE_1 t1
WHERE t1.id NOT IN (SELECT id
FROM TABLE_2)
Using LEFT JOIN/IS NULL:
INSERT INTO TABLE_2
(id, name)
SELECT t1.id,
t1.name
FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
Of the three options, LEFT JOIN/IS NULL is the least efficient.
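Apart from performance, NOT IN and NOT EXISTS also differ in behaviour when the subquery column can contain NULL: a single NULL makes every NOT IN comparison unknown, so no rows qualify. The sketch below shows this with Python's built-in sqlite3 module; the nullable id column and the sample rows are assumptions, and the issue does not arise when Table2.Id is a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_1 (id INT, name TEXT);
    CREATE TABLE TABLE_2 (id INT, name TEXT);  -- id is nullable here
    INSERT INTO TABLE_1 VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO TABLE_2 VALUES (1, 'Z'), (NULL, 'no id');
""")

# NOT IN: the NULL in TABLE_2.id turns every non-matching test into unknown.
not_in = conn.execute("""
    SELECT t1.id FROM TABLE_1 t1
    WHERE t1.id NOT IN (SELECT id FROM TABLE_2)
    ORDER BY t1.id
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so ids 2 and 3 qualify.
not_exists = conn.execute("""
    SELECT t1.id FROM TABLE_1 t1
    WHERE NOT EXISTS (SELECT 1 FROM TABLE_2 t2 WHERE t2.id = t1.id)
    ORDER BY t1.id
""").fetchall()
```

Here `not_in` is empty while `not_exists` contains ids 2 and 3, which is one reason NOT EXISTS is often the safer default for this kind of insert.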
In MySQL you can do this:
INSERT IGNORE INTO Table2(Id, Name) SELECT Id, Name FROM Table1
Does SQL Server have anything similar?
I just had a similar problem, the DISTINCT keyword works magic:
INSERT INTO Table2(Id, Name) SELECT DISTINCT Id, Name FROM Table1
I was facing the same problem recently...
Here's what worked for me in MS SQL Server 2017...
The primary key should be set on ID in table 2...
The columns and column properties should be the same of course between both tables. This will work the first time you run the below script. The duplicate ID in table 1, will not insert...
If you run it the second time, you will get a
Violation of PRIMARY KEY constraint error
This is the code:
Insert into Table_2
Select distinct *
from Table_1
where table_1.ID >1
Ignoring duplicates on the unique index, as suggested by IanC here, was my solution for a similar issue: create the index with the option WITH IGNORE_DUP_KEY.
In backward-compatible syntax, WITH IGNORE_DUP_KEY is equivalent to WITH IGNORE_DUP_KEY = ON.
Ref.: index_option
In SQL Server you can create a unique index on the table for the columns that need to be unique.
A little off topic, but if you want to migrate the data to a new table, and the possible duplicates are in the original table, and the column possibly duplicated is not an id, a GROUP BY will do:
INSERT INTO TABLE_2
(name)
SELECT t1.name
FROM TABLE_1 t1
GROUP BY t1.name
In my case, I had duplicate IDs in the source table, so none of the proposals worked. I don't care about performance, it's just done once.
To solve this I took the records one by one with a cursor to ignore the duplicates.
So here's the code example:
DECLARE @c1 AS VARCHAR(12);
DECLARE @c2 AS VARCHAR(250);
DECLARE @c3 AS VARCHAR(250);
DECLARE MY_cursor CURSOR STATIC FOR
Select
c1,
c2,
c3
from T2
where ....;
OPEN MY_cursor
FETCH NEXT FROM MY_cursor INTO @c1, @c2, @c3
WHILE @@FETCH_STATUS = 0
BEGIN
if (select count(1)
from T1
where a1 = @c1
and a2 = @c2
) = 0
INSERT INTO T1
values (@c1, @c2, @c3)
FETCH NEXT FROM MY_cursor INTO @c1, @c2, @c3
END
CLOSE MY_cursor
DEALLOCATE MY_cursor
I used a MERGE query to fill a table without duplicates.
The problem I had was a two-column key in the tables (Code, Value), and the EXISTS query was very slow.
The MERGE executed much faster (more than 100 times faster).
examples for MERGE query
For one table it works perfectly when creating one unique index from multiple fields. Then a simple "INSERT IGNORE" will ignore duplicates if all 7 fields (in this case) have the same values.
Select the fields in the phpMyAdmin (PMA) Structure view and click Unique; a new combined index will be created.
A simple DELETE before the INSERT would suffice:
DELETE FROM Table2 WHERE Id IN (SELECT Id FROM Table1)
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1
Swap Table1 and Table2 depending on which table's Id and name pairing you want to preserve.