PostgreSQL insert into from select but ignore existing rows - sql

I want to insert data into a table from a select. This works fine so far...
INSERT INTO table_2
SELECT t.id, 1
FROM table_1 t
WHERE t.title LIKE '%search%';
But when I run this a second time, the statement raises an exception because some of the rows already exist.
What can I do to get around this?
Thanks for your help,
Urkman

You can insert only the rows that don't already exist by adding a NOT EXISTS condition to the WHERE clause:
insert into table_2
select t.id, 1
from table_1 t
where t.title like '%search%'
and not exists (select t2.id from table_2 t2 where t2.id = t.id);
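A minimal sketch of that answer, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL (assumption: the NOT EXISTS form behaves the same way in both). The sample titles and the second column name `flag` are hypothetical. Running the statement twice shows that the second run inserts nothing instead of failing:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE table_2 (id INTEGER PRIMARY KEY, flag INTEGER);  -- "flag" is hypothetical
    INSERT INTO table_1 VALUES (1, 'a search hit'), (2, 'unrelated'), (3, 'search again');
""")

INSERT_NEW = """
    INSERT INTO table_2
    SELECT t.id, 1
    FROM table_1 t
    WHERE t.title LIKE '%search%'
      AND NOT EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.id = t.id)
"""

# The first run inserts the matching ids; the second finds them all present
# and inserts nothing instead of violating the primary key.
first = conn.execute(INSERT_NEW).rowcount
second = conn.execute(INSERT_NEW).rowcount
ids = [row[0] for row in conn.execute("SELECT id FROM table_2 ORDER BY id")]
print(first, second, ids)  # 2 0 [1, 3]
```

The existence check and the insert are not atomic with respect to other sessions, so two concurrent writers can still collide on the key. On PostgreSQL 9.5 or later, INSERT ... ON CONFLICT DO NOTHING is an alternative that relies on the unique constraint itself.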

Related

How can I effectively check uniqueness of an item in database?

I have a spreadsheet that needs to be uploaded, but each row needs to be checked for uniqueness against the database. One option I can think of is to check whether each row already exists in the database, but that means running N SQL queries for N rows. Are there alternatives for checking uniqueness efficiently, instead of checking row by row?
We can do that with NOT EXISTS or NOT IN. First, insert the spreadsheet data into a temporary table.
Let's call your main table TABLE_1 and the temporary spreadsheet table TABLE_2.
Using NOT IN -
INSERT INTO TABLE_1 (id, name)
SELECT t2.id, t2.name FROM TABLE_2 t2
WHERE t2.id NOT IN (SELECT id FROM TABLE_1)
Using NOT EXISTS -
INSERT INTO TABLE_1 (id, name)
SELECT t2.id, t2.name FROM TABLE_2 t2
WHERE NOT EXISTS(SELECT id FROM TABLE_1 t1 WHERE t1.id = t2.id)
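The point of the staging table is that one set-based statement checks all N rows at once. A sketch with Python's sqlite3 module (SQLite stands in for the asker's database; the sample rows and the 'existing' marker are hypothetical) compares the two answers. They agree until TABLE_1.id contains a NULL: `x NOT IN (..., NULL)` is never true, so the NOT IN version silently inserts nothing.

```python
import sqlite3

NOT_IN = """
    INSERT INTO TABLE_1 (id, name)
    SELECT t2.id, t2.name FROM TABLE_2 t2
    WHERE t2.id NOT IN (SELECT id FROM TABLE_1)
"""
NOT_EXISTS = """
    INSERT INTO TABLE_1 (id, name)
    SELECT t2.id, t2.name FROM TABLE_2 t2
    WHERE NOT EXISTS (SELECT id FROM TABLE_1 t1 WHERE t1.id = t2.id)
"""


def load(statement, existing_ids):
    """Load three hypothetical spreadsheet rows (TABLE_2) into TABLE_1 in one statement.

    Returns the ids that were newly inserted.
    """
    conn = sqlite3.connect(":memory:")
    # id is left nullable on purpose, to show how NOT IN reacts to a NULL.
    conn.execute("CREATE TABLE TABLE_1 (id INTEGER, name TEXT)")
    conn.execute("CREATE TEMP TABLE TABLE_2 (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO TABLE_1 VALUES (?, 'existing')", [(i,) for i in existing_ids])
    conn.executemany("INSERT INTO TABLE_2 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    conn.execute(statement)
    return sorted(row[0] for row in conn.execute(
        "SELECT id FROM TABLE_1 WHERE name <> 'existing'"))


print(load(NOT_IN, [1]), load(NOT_EXISTS, [1]))              # [2, 3] [2, 3]
print(load(NOT_IN, [1, None]), load(NOT_EXISTS, [1, None]))  # [] [2, 3]
```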

Hive - cannot recognize input 'insert' in select clause

Say I've already created table3 and try to insert data into it using the following code:
WITH table1
AS
(SELECT 1 AS key, 'One' AS value),
table2
AS
(SELECT 1 AS key, 'I' AS value)
INSERT TABLE table3
SELECT t1.key, t1.value, t2.value
FROM table1 t1
JOIN table2 t2
ON (t1.key = t2.key)
However, I got the error "cannot recognize input 'insert' in select clause". If I simply delete the INSERT line, the query runs just fine.
Is this a syntax problem, or can I not use a WITH clause with INSERT?
Use INTO or OVERWRITE depending on what you need:
INSERT INTO TABLE table3 --this will append data, keeping the existing data intact
or
INSERT OVERWRITE TABLE table3 --will overwrite any existing data
Read the manual: Inserting data into Hive Tables from queries.

In SQL Server, how can I insert values into table EXCEPT repeat rows?

I wrote a Python script that reads a text file and generates SQL code to insert the data into a table.
It looked like this:
insert into table1 (date, locid, personid, itemid, amounts)
values (val11,val12,val13,val14,val15)
,(val21,val22,val23,val24,val25)
The data is structured so that for a particular set of values the first four columns (date, locid, personid, itemid), there will be at most one row.
At the moment, I have to check by hand if an entry already exists in the table, then remove it from the insert statement.
How can I enter this data into the DB without manually checking for repeats?
Create a second table, table2, with the DDL for just the columns involved in your insert.
Do your inserts into table2.
Then run:
insert into table1 (date, locid, personid, itemid, amounts)
select t2.* from table2 t2
join (
select date, locid, personid, itemid from table2
except
select date, locid, personid, itemid from table1) x
on t2.date = x.date
and t2.locid = x.locid
and t2.personid = x.personid
and t2.itemid = x.itemid
Then you can delete table2.
This way, table1 receives only the rows whose values in the 4 key columns do not all match the 4 key columns of an existing row in table1.
This assumes you want to block the INSERT only when all 4 columns match. In other words, the query above still inserts a row if an existing row matches on 3 of the 4 columns; it only skips the INSERT when an existing row is a perfect match on all 4 columns.
If your script also generates duplicate INSERT statements, simply add the DISTINCT operator to the query: "select DISTINCT t2.* from table2 t2".
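A runnable sketch of the staging-table approach, using SQLite through Python's sqlite3 module (an assumption: the question is about SQL Server, but SQLite also supports EXCEPT). It compares on the question's key columns (date, locid, personid, itemid), and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (date TEXT, locid INT, personid INT, itemid INT, amounts INT);
    -- staging table: same columns in the same order, so t2.* lines up with table1
    CREATE TABLE table2 (date TEXT, locid INT, personid INT, itemid INT, amounts INT);
    INSERT INTO table1 VALUES ('2020-01-01', 1, 1, 1, 10);
    INSERT INTO table2 VALUES
        ('2020-01-01', 1, 1, 1, 99),  -- key already in table1: skipped
        ('2020-01-01', 1, 1, 2, 20),  -- 3 of 4 key columns match: inserted
        ('2020-01-02', 1, 1, 1, 30);  -- new date: inserted
""")

# EXCEPT keeps the staging keys not yet present in table1;
# joining back to table2 recovers the full rows, including amounts.
conn.execute("""
    INSERT INTO table1 (date, locid, personid, itemid, amounts)
    SELECT t2.* FROM table2 t2
    JOIN (SELECT date, locid, personid, itemid FROM table2
          EXCEPT
          SELECT date, locid, personid, itemid FROM table1) x
      ON t2.date = x.date AND t2.locid = x.locid
     AND t2.personid = x.personid AND t2.itemid = x.itemid
""")
conn.execute("DROP TABLE table2")  # "Then you can delete table2."
rows = conn.execute("SELECT * FROM table1 ORDER BY date, itemid").fetchall()
print(rows)
```

The existing row keeps its original amounts value (10), because the staging row with the same key never passes the EXCEPT.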
As ludwigmace pointed out in the comments, you could also try the following and compare the performance. It should be functionally equivalent (if the inserts don't contain duplicates, you can drop the GROUP BY):
insert into table1 (date, locid, personid, itemid, amounts)
SELECT t2.date, t2.locid, t2.personid, t2.itemid, t2.amounts
FROM table2 t2
LEFT JOIN table1 t1
ON t2.date = t1.date
AND t2.locid = t1.locid
AND t2.personid = t1.personid
AND t2.itemid = t1.itemid
WHERE t1.date is null
GROUP BY t2.date, t2.locid, t2.personid, t2.itemid, t2.amounts
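The LEFT JOIN variant can be checked the same way. In this SQLite sketch (via Python's sqlite3; the data is hypothetical) the staging table contains one row that already exists and one new row that the script generated twice; the anti-join drops the first and GROUP BY collapses the second:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (date TEXT, locid INT, personid INT, itemid INT, amounts INT);
    CREATE TABLE table2 (date TEXT, locid INT, personid INT, itemid INT, amounts INT);
    INSERT INTO table1 VALUES ('2020-01-01', 1, 1, 1, 10);
    INSERT INTO table2 VALUES
        ('2020-01-01', 1, 1, 1, 10),  -- already in table1
        ('2020-01-03', 2, 2, 2, 50),  -- new ...
        ('2020-01-03', 2, 2, 2, 50);  -- ... and generated twice by the script
""")
conn.execute("""
    INSERT INTO table1 (date, locid, personid, itemid, amounts)
    SELECT t2.date, t2.locid, t2.personid, t2.itemid, t2.amounts
    FROM table2 t2
    LEFT JOIN table1 t1
      ON t2.date = t1.date AND t2.locid = t1.locid
     AND t2.personid = t1.personid AND t2.itemid = t1.itemid
    WHERE t1.date IS NULL  -- no table1 row with this key
    GROUP BY t2.date, t2.locid, t2.personid, t2.itemid, t2.amounts  -- collapse repeats
""")
total = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
print(total)  # 2
```

Testing `t1.date IS NULL` only works as the "no match" check if table1.date is never NULL. Also, GROUP BY only collapses staging rows that are identical in all five columns; two staging rows with the same key but different amounts would both be inserted.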
This should work:
INSERT INTO [table1] ([date], [locid], [personid], [itemid], [amounts])
SELECT val1, val2, val3, val4, val5
WHERE NOT EXISTS
(
SELECT * FROM [table1] WHERE [date]=val1 AND [locid]=val2 AND [personid]=val3 AND [itemid]=val4
)
Instead of inserting the values directly, you insert the result of a SELECT statement.
The SELECT is crafted to return the values you specify only if they don't already exist.
You can control the scope of uniqueness (which combination of columns must be unique) by modifying the WHERE clause of the inner SELECT, i.e. adding or removing comparisons as needed.
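Since the rows come from a Python script anyway, the script can pass each row as parameters instead of building the SQL text. A sketch using sqlite3 (SQLite stands in for SQL Server, so the square-bracket quoting is dropped; the sample rows are hypothetical). Each row is checked as it is inserted, so a key repeated inside the batch is also skipped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (date TEXT, locid INT, personid INT, itemid INT, amounts INT)")

# Named parameters may appear more than once, so each value is bound once per row.
INSERT_IF_ABSENT = """
    INSERT INTO table1 (date, locid, personid, itemid, amounts)
    SELECT :date, :locid, :personid, :itemid, :amounts
    WHERE NOT EXISTS (
        SELECT * FROM table1
        WHERE date = :date AND locid = :locid
          AND personid = :personid AND itemid = :itemid)
"""

# Hypothetical rows parsed from the text file; the second repeats the first key.
parsed = [
    {"date": "2020-01-01", "locid": 1, "personid": 1, "itemid": 1, "amounts": 10},
    {"date": "2020-01-01", "locid": 1, "personid": 1, "itemid": 1, "amounts": 99},
    {"date": "2020-01-01", "locid": 1, "personid": 1, "itemid": 2, "amounts": 20},
]
with conn:  # commit the whole batch at once
    conn.executemany(INSERT_IF_ABSENT, parsed)
rows = conn.execute("SELECT itemid, amounts FROM table1 ORDER BY itemid").fetchall()
print(rows)  # [(1, 10), (2, 20)]
```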

sql query distinct join

I am trying to select the distinct values from table1 that are new and store them in table2. The problem is that table1 has duplicates in the column "name". It does have a key column "id", but different ids can of course map to the same name.
My idea for the query is:
INSERT INTO TABLE2
(NAME, UniqueID)
SELECT DISTINCT TABLE1.NAME, TABLE1.ID
FROM TABLE1
LEFT JOIN TABLE2 ON TABLE1.ID=TABLE2.UniqueID
WHERE TABLE2.NAME IS NULL
I need help getting the query to return the desired results: right now it still produces duplicates in table2 (on the name column), which I don't want. It should only append new records, even if I run the query multiple times. For example, if two new records were added to table1 but one has a name that is already in table2, the query should add only 1 new record to table2.
Just a note: I am using MS Access, so it has strict syntax for single queries.
EDIT:
Following the input, I came up with this query:
INSERT INTO TABLE2
(NAME, UniqueID)
SELECT TABLE1.NAME, Min(TABLE1.ID)
FROM TABLE1
LEFT JOIN TABLE2 ON TABLE1.NAME=TABLE2.NAME
WHERE TABLE2.UniqueID IS NULL
Group By TABLE1.NAME;
However, in Access this actually had to be split into two separate queries to run without a "Reserved error" flag, and now I have run into an additional problem. When I run the two queries, it works fine the first time. But when I run it again, to test whether any new records have been added to table1, it appends 1 record even though there are no new records in table1: a blank name value with a duplicate UniqueID. It keeps doing that every time I run it.
Since you're pulling both Name and ID, the DISTINCT keyword only removes duplicate combinations of the two. Two records with the same Name and different IDs are still distinct combinations.
For a Name that appears with different IDs, which ID would you like to insert? For example, to take the lowest one:
insert into table2 (Name, UniqueID)
select t1.Name, MIN(t1.ID)
from table1 t1
left join table2 t2 on t1.ID = t2.UniqueID
where t2.Name is null
group by t1.Name
In response to comments: I realize the join should be on the Name field, to avoid inserting names that already exist.
insert into table2 (Name, UniqueID)
select t1.Name, MIN(t1.ID)
from table1 t1
left join table2 t2 on t1.Name = t2.Name
where t2.UniqueID is null
group by t1.Name
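One possible explanation for the repeated blank row (an assumption; the thread does not confirm it): if table1 contains a row whose Name is NULL, the join condition t1.Name = t2.Name is never true for it, so the NULL-name group passes the IS NULL test and is inserted again on every run, with the same MIN(ID) each time. A sketch in SQLite via Python's sqlite3 (hypothetical data; SQLite stands in for Access) reproduces that and shows one fix, excluding NULL names:

```python
import sqlite3

QUERY = """
    INSERT INTO table2 (Name, UniqueID)
    SELECT t1.Name, MIN(t1.ID)
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.Name = t2.Name
    WHERE t2.UniqueID IS NULL {extra_filter}
    GROUP BY t1.Name
"""
ORIGINAL = QUERY.format(extra_filter="")
FIXED = QUERY.format(extra_filter="AND t1.Name IS NOT NULL")  # skip NULL names


def row_counts(query, runs=3):
    """Run the append query several times; return table2's row count after each run."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE table2 (Name TEXT, UniqueID INTEGER);
        INSERT INTO table1 VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob'), (4, NULL);
    """)
    counts = []
    for _ in range(runs):
        conn.execute(query)
        counts.append(conn.execute("SELECT COUNT(*) FROM table2").fetchone()[0])
    return counts


print(row_counts(ORIGINAL))  # [3, 4, 5]: the NULL-name group comes back every run
print(row_counts(FIXED))     # [2, 2, 2]
```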
INSERT INTO TABLE2 (UniqueID, NAME)
SELECT min(t1.ID) as UniqueID, t1.NAME
FROM TABLE1 t1
LEFT JOIN TABLE2 t2 ON t1.ID=t2.UniqueID
WHERE t2.NAME IS NULL
group by t1.NAME

Avoid duplicates in INSERT INTO SELECT query in SQL Server

I have the following two tables:
Table1
----------
ID Name
1 A
2 B
3 C
Table2
----------
ID Name
1 Z
I need to insert data from Table1 to Table2. I can use the following syntax:
INSERT INTO Table2(Id, Name) SELECT Id, Name FROM Table1
However, in my case, duplicate IDs might exist in Table2 (in my case, it's just "1") and I don't want to copy that again as that would throw an error.
I can write something like this:
IF NOT EXISTS(SELECT 1 FROM Table2 WHERE Id=1)
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1
ELSE
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1 WHERE Table1.Id<>1
Is there a better way to do this without using IF - ELSE? I want to avoid two INSERT INTO-SELECT statements based on some condition.
Using NOT EXISTS:
INSERT INTO TABLE_2 (id, name)
SELECT t1.id, t1.name
FROM TABLE_1 t1
WHERE NOT EXISTS (SELECT id
                  FROM TABLE_2 t2
                  WHERE t2.id = t1.id)
Using NOT IN:
INSERT INTO TABLE_2 (id, name)
SELECT t1.id, t1.name
FROM TABLE_1 t1
WHERE t1.id NOT IN (SELECT id
                    FROM TABLE_2)
Using LEFT JOIN/IS NULL:
INSERT INTO TABLE_2 (id, name)
SELECT t1.id, t1.name
FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.id = t1.id
WHERE t2.id IS NULL
Of the three options, LEFT JOIN/IS NULL is the least efficient. See this link for more details.
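A quick SQLite check (via Python's sqlite3; SQLite stands in for SQL Server) that the three statements insert the same rows for the question's sample data. TABLE_2.id is a primary key here, so it never contains NULL, which is the case where NOT IN would behave differently:

```python
import sqlite3

PREFIX = "INSERT INTO TABLE_2 (id, name) SELECT t1.id, t1.name FROM TABLE_1 t1 "
FORMS = {
    "NOT EXISTS": PREFIX + "WHERE NOT EXISTS (SELECT id FROM TABLE_2 t2 WHERE t2.id = t1.id)",
    "NOT IN": PREFIX + "WHERE t1.id NOT IN (SELECT id FROM TABLE_2)",
    "LEFT JOIN/IS NULL": PREFIX + "LEFT JOIN TABLE_2 t2 ON t2.id = t1.id WHERE t2.id IS NULL",
}


def run(statement):
    """Apply one form to the question's sample data and return TABLE_2's rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TABLE_1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE TABLE_2 (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO TABLE_1 VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO TABLE_2 VALUES (1, 'Z');
    """)
    conn.execute(statement)
    return conn.execute("SELECT id, name FROM TABLE_2 ORDER BY id").fetchall()


results = {label: run(sql) for label, sql in FORMS.items()}
for label, rows in results.items():
    print(f"{label}: {rows}")  # each: [(1, 'Z'), (2, 'B'), (3, 'C')]
```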
In MySQL you can do this:
INSERT IGNORE INTO Table2(Id, Name) SELECT Id, Name FROM Table1
Does SQL Server have anything similar?
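I just had a similar problem, and the DISTINCT keyword solved it. Note that DISTINCT only removes duplicates within Table1; it does not skip IDs that already exist in Table2:
INSERT INTO Table2(Id, Name) SELECT DISTINCT Id, Name FROM Table1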
I was facing the same problem recently.
Here's what worked for me in MS SQL Server 2017.
The primary key should be set on ID in Table2, and of course the columns and column properties should be the same in both tables. The script below works the first time you run it: the duplicate ID from Table1 will not be inserted.
If you run it a second time, you will get a
Violation of PRIMARY KEY constraint error.
This is the code:
Insert into Table_2
Select distinct *
from Table_1
where table_1.ID >1
Using IGNORE_DUP_KEY on the unique index, as suggested by IanC, was my solution for a similar issue: create the index with the option WITH IGNORE_DUP_KEY.
In the backward-compatible syntax, WITH IGNORE_DUP_KEY is equivalent to WITH IGNORE_DUP_KEY = ON.
Ref.: index_option
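SQLite has no IGNORE_DUP_KEY option, but INSERT OR IGNORE against a unique index has a similar effect: rows that would violate the index are skipped and the rest are inserted. A sketch of that analog via Python's sqlite3 (not the SQL Server feature itself; the index name IX_Table2_Id is hypothetical), with the corresponding SQL Server index definition shown as a comment:

```python
import sqlite3

# SQL Server form (not run here; the index name is hypothetical):
#   CREATE UNIQUE INDEX IX_Table2_Id ON Table2 (Id) WITH (IGNORE_DUP_KEY = ON);
# SQLite analog: a plain unique index plus INSERT OR IGNORE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, Name TEXT);
    CREATE TABLE Table2 (Id INTEGER, Name TEXT);
    CREATE UNIQUE INDEX IX_Table2_Id ON Table2 (Id);
    INSERT INTO Table1 VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO Table2 VALUES (1, 'Z');
""")
for _ in range(2):  # re-running the same statement is harmless
    conn.execute("INSERT OR IGNORE INTO Table2 (Id, Name) SELECT Id, Name FROM Table1")
rows = conn.execute("SELECT Id, Name FROM Table2 ORDER BY Id").fetchall()
print(rows)  # [(1, 'Z'), (2, 'B'), (3, 'C')]
```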
In SQL Server you can put a unique index on the column(s) that need to be unique.
A little off topic, but if you want to migrate the data to a new table, and the possible duplicates are in the original table, and the column possibly duplicated is not an id, a GROUP BY will do:
INSERT INTO TABLE_2
(name)
SELECT t1.name
FROM TABLE_1 t1
GROUP BY t1.name
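A sketch of the GROUP BY migration in SQLite via Python's sqlite3 (hypothetical data). A UNIQUE constraint on TABLE_2.name, added here only for illustration, makes the difference visible: the plain INSERT ... SELECT fails on the repeated name, while the GROUP BY version inserts each name once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_1 (id INTEGER PRIMARY KEY, name TEXT);
    -- UNIQUE makes a duplicated name fail loudly instead of slipping through
    CREATE TABLE TABLE_2 (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO TABLE_1 VALUES (1, 'x'), (2, 'y'), (3, 'x');
""")

try:
    conn.execute("INSERT INTO TABLE_2 (name) SELECT t1.name FROM TABLE_1 t1")
    plain_failed = False
except sqlite3.IntegrityError:
    plain_failed = True  # 'x' appears twice; SQLite backs out the whole statement

conn.execute("INSERT INTO TABLE_2 (name) SELECT t1.name FROM TABLE_1 t1 GROUP BY t1.name")
names = [row[0] for row in conn.execute("SELECT name FROM TABLE_2 ORDER BY name")]
print(plain_failed, names)  # True ['x', 'y']
```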
In my case, I had duplicate IDs in the source table, so none of the proposals worked. I don't care about performance; this is a one-off job.
To solve it, I processed the records one by one with a cursor, skipping the duplicates.
Here's the code:
DECLARE @c1 AS VARCHAR(12);
DECLARE @c2 AS VARCHAR(250);
DECLARE @c3 AS VARCHAR(250);
DECLARE MY_cursor CURSOR STATIC FOR
    Select c1, c2, c3
    from T2
    where ....;
OPEN MY_cursor
FETCH NEXT FROM MY_cursor INTO @c1, @c2, @c3
WHILE @@FETCH_STATUS = 0
BEGIN
    -- insert the row only if T1 has no row with the same (a1, a2) yet
    if (select count(1)
        from T1
        where a1 = @c1
          and a2 = @c2
       ) = 0
        INSERT INTO T1
        values (@c1, @c2, @c3)
    FETCH NEXT FROM MY_cursor INTO @c1, @c2, @c3
END
CLOSE MY_cursor
DEALLOCATE MY_cursor
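The cursor loop translated into a Python sketch against SQLite (the table and column names follow the answer; the data is hypothetical, and the answer's elided WHERE filter is omitted). Fetching every source row first mirrors the STATIC cursor, and because each insert is visible to the next existence check, the first of several source rows with the same (a1, a2) wins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (a1 TEXT, a2 TEXT, a3 TEXT);
    CREATE TABLE T2 (c1 TEXT, c2 TEXT, c3 TEXT);
    INSERT INTO T2 VALUES ('k1', 'x', 'first'), ('k1', 'x', 'second'), ('k2', 'y', 'only');
""")

with conn:
    # Snapshot the source rows, like a STATIC cursor.
    source = conn.execute("SELECT c1, c2, c3 FROM T2 ORDER BY rowid").fetchall()
    for c1, c2, c3 in source:
        (count,) = conn.execute(
            "SELECT COUNT(1) FROM T1 WHERE a1 = ? AND a2 = ?", (c1, c2)).fetchone()
        if count == 0:  # no row with this (a1, a2) yet
            conn.execute("INSERT INTO T1 VALUES (?, ?, ?)", (c1, c2, c3))

rows = conn.execute("SELECT * FROM T1 ORDER BY a1").fetchall()
print(rows)  # [('k1', 'x', 'first'), ('k2', 'y', 'only')]
```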
I used a MERGE query to fill a table without duplicates.
The problem I had was a composite key in the tables (Code, Value),
and the EXISTS query was very slow.
The MERGE executed much faster (more than 100x).
Examples for MERGE query
For a single table this works perfectly once you create one unique index over multiple fields. A simple INSERT IGNORE then skips a row only if ALL 7 fields (in this case) have the SAME values as an existing row.
To create it, select the fields in the phpMyAdmin (PMA) Structure view and click Unique; a new combined index will be created.
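A sketch of the combined-index idea in SQLite via Python's sqlite3, where INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE (an analog for illustration; the table, column, and index names are hypothetical, and three columns stand in for the answer's seven). Only the row that matches on every indexed column is skipped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (code TEXT, value TEXT, note TEXT);
    -- one unique index spanning all the columns that define a duplicate
    CREATE UNIQUE INDEX ux_items_all ON items (code, value, note);
    INSERT INTO items VALUES ('a', '1', 'n');
""")
conn.executemany(
    "INSERT OR IGNORE INTO items VALUES (?, ?, ?)",
    [("a", "1", "n"),       # same in every indexed column: ignored
     ("a", "1", "other"),   # differs in one column: inserted
     ("b", "1", "n")])      # differs in one column: inserted
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 3
```

Two caveats: in both MySQL and SQLite, NULLs in a unique index are treated as distinct, so a row with a NULL in an indexed column is never considered a duplicate; and MySQL's INSERT IGNORE also turns some other errors into warnings.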
A simple DELETE before the INSERT would suffice:
DELETE FROM Table2 WHERE Id IN (SELECT Id FROM Table1)
INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1
Note that this replaces the existing Table2 rows with Table1's values rather than ignoring them. Switch Table1 and Table2 depending on which table's Id and name pairing you want to preserve.
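A sketch of the delete-then-insert approach in SQLite via Python's sqlite3 (hypothetical data), run inside one transaction so both statements succeed or fail together. In the result, Id 1 carries Table1's name, so existing rows are replaced rather than ignored, while rows that exist only in Table2 are untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Table1 VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO Table2 VALUES (1, 'Z'), (7, 'only in Table2');
""")
with conn:  # commit both statements together, or roll both back on error
    # IN (not =) because the subquery returns more than one Id.
    conn.execute("DELETE FROM Table2 WHERE Id IN (SELECT Id FROM Table1)")
    conn.execute("INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1")
rows = conn.execute("SELECT Id, name FROM Table2 ORDER BY Id").fetchall()
print(rows)  # [(1, 'A'), (2, 'B'), (3, 'C'), (7, 'only in Table2')]
```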