Oracle SQL Subquery - Usage of NOT EXISTS - sql

I used a query to find a list of Primary Keys. One Primary key per each ForiegnKey in a table by using below query.
select foreignKey, min(primaryKey)
from t
group by foreignKey;
Let us say this is the result : 1,4,5
NOw I have another table - Table B that has list of all Primary keys. It has 1,2,3,6,7,8,9
I want a write a query using the above query So that I get a subset of the original query(above) that does not exist in Table B. I want 4 and 5 back with the new query.

Use a having clause:
select foreignKey, min(primaryKey)
from t
group by foreignKey
having min(primarykey) not in (select pk from b);
You should also be able to express this as not exists:
having not exists (select 1
from b
where b.pk = min(t.primaryKey)
)

Related

Optimisation of sql query for deleting duplicate items from large table

Could anyone please help me optimise one of the queries which is taking more than 20 minutes to run against 3 Million data.
Table Structure
-----------------------------------------------------------------------------------------
|id [INT Auto Inc]| name_id (uuid) | name (varchar)| city (varchar) | name_type(varchar)|
-----------------------------------------------------------------------------------------
Query
The purpose of the query is to eliminate the duplicate, here duplicate means having same name_id and name.
DELETE
FROM records
WHERE id NOT IN
(SELECT DISTINCT
ON (name_id, name) id
FROM records);
I would write your delete using exists logic:
DELETE
FROM records r1
WHERE EXISTS (SELECT 1 FROM records r2
WHERE r2.name_id = r1.name_id AND r2.name = r2.name AND
r2.id < r1.id);
This delete query will spare the duplicate having the smallest id value. To speed this up, you may try adding the following index:
CREATE INDEX idx ON records (name_id, name, id);
You probably already have a primary key on the identity column, then you can use it to exclude redundant rows by id in the following way:
WITH cte AS (
SELECT MIN(id) AS id FROM records GROUP BY name_id, name)
DELETE FROM records
WHERE NOT EXISTS (SELECT id FROM cte WHERE id=records.id)
Even without the index, this should work relatively fast, probably because of merge join strategy.

Insert data from one table to other using select statement and avoid duplicate data

Database: Oracle
I want to insert data from table 1 to table 2 but the catch is, primary key of table 2 is the combination of first 4 letters and last 4 numbers of the primary key of table 1.
For example:
Table 1 - primary key : abcd12349887/abcd22339887/abcder019987
In this case even if the primary key of table 1 is different, but when I extract the 1st 4 and last 4 chars, the output will be same abcd9887
So, when I use select to insert data, I get error of duplicate PK in table 2.
What I want is if the data of the PK is already present then don't add that record.
Here's my complete stored procedure:
INSERT INTO CPIPRODUCTFAMILIE
(productfamilieid, rapport, mesh, mesh_uitbreiding, productlabelid)
(SELECT DISTINCT (CONCAT(SUBSTR(p.productnummer,1,4),SUBSTR(p.productnummer,8,4)))
productnummer,
ps.rapport, ps.mesh, ps.mesh_uitbreiding, ps.productlabelid
FROM productspecificatie ps, productgroep pg,
product p left join cpiproductfamilie cpf
on (CONCAT(SUBSTR(p.productnummer,1,4),SUBSTR(p.productnummer,8,4))) = cpf.productfamilieid
WHERE p.productnummer = ps.productnummer
AND p.productgroepid = pg.productgroepid
AND cpf.productfamilieid IS NULL
AND pg.productietype = 'P'
**AND p.ROWID IN (SELECT MAX(ROWID) FROM product
GROUP BY (CONCAT(SUBSTR(productnummer,1,4),SUBSTR(productnummer,8,4))))**
AND (CONCAT(SUBSTR(p.productnummer,1,2),SUBSTR(p.productnummer,8,4))) not in
(select productfamilieid from cpiproductfamilie));
The highlighted section seems to be wrong, and because of this the data is not picking up.
Please help
Try using this.
p.productnummer IN (SELECT MAX(productnummer) FROM product
GROUP BY (CONCAT(SUBSTR(productnummer,1,4),SUBSTR(productnummer,8,4))))

Trying to delete when not exists is not working. Multiple columns in primary key

I am currently trying to delete from Table A where a corresponding record is not being used in Table B. Table A has Section, SubSection, Code, Text as fields, where the first three are the Primary Key. Table B has ID, Section, SubSection, Code as fields, where all four are the Primary Key. There are more columns, but they are irrelevant to this question...just wanted to point that out before I get questioned on why all columns are part of the Primary Key for Table B. Pretty much Table A is a repository of all possible data that can be assigned to a entity, Table B is where they are assigned. I want to delete all records from table A that are not in use in Table B. I have tried the following with no success:
DELETE FROM Table A
WHERE NOT EXISTS (SELECT * from Table B
WHERE A.section = B.section
AND A.subsection = B.subsection
AND A.code = b.code)
If I do a Select instead of a delete, I get the subset I am looking for, but when I do a delete, I get an error saying that there is a syntax error at Table A. I would use a NOT IN statement, but with multiple columns being part of the Primary Key, I just don't see how that would work. Any help would be greatly appreciated.
In sql server,when using not exists, you need to set an alias for the table to be connected, and in the delete statement, to specify the table to delete rows from.
DELETE a FROM Table_A a
WHERE NOT EXISTS (SELECT * from Table_B b
WHERE a.section = b.section
AND a.subsection = b.subsection
AND a.code = b.code)
Please try :
DELETE FROM Table A
WHERE NOT EXISTS (SELECT 1 from Table B
WHERE A.section = B.section
AND A.subsection = B.subsection
AND A.code = b.code)
1 is just a placeholder, any constant/single non-null column will work.
Try something like this:
delete from Table_A
where (section, subsection, code) not in (select section,
subsection,
code
from Table_B)

how to delete duplicates from a database table based on a certain field

i have a table that somehow got duplicated. i basically want to delete all records that are duplicates, which is defined by a field in my table called SourceId. There should only be one record for each source ID.
is there any SQL that i can write that will delete every duplicate so i only have one record per Sourceid ?
Assuming you have a column ID that can tie-break the duplicate sourceid's, you can use this. Using min(id) causes it to keep just the min(id) per sourceid batch.
delete from tbl
where id NOT in
(
select min(id)
from tbl
group by sourceid
)
delete from table
where pk in (
select i2.pk
from table i1
inner join table i2
on i1.SourceId = i2.SourceId
)
good practice is to start with
select * from … and only later replace to delete from …

How to delete duplicate rows with SQL?

I have a table with some rows in. Every row has a date-field. Right now, it may be duplicates of a date. I need to delete all the duplicates and only store the row with the highest id. How is this possible using a SQL query?
Now:
date id
'07/07' 1
'07/07' 2
'07/07' 3
'07/05' 4
'07/05' 5
What I want:
date id
'07/07' 3
'07/05' 5
DELETE FROM table WHERE id NOT IN
(SELECT MAX(id) FROM table GROUP BY date);
I don't have comment rights, so here's my comment as an answer in case anyone comes across the same problem:
In SQLite3, there is an implicit numerical primary key called "rowid", so the same query would look like this:
DELETE FROM table WHERE rowid NOT IN
(SELECT MAX(rowid) FROM table GROUP BY date);
this will work with any table even if it does not contain a primary key column called "id".
For mysql,postgresql,oracle better way is SELF JOIN.
Postgresql:
DELETE FROM table t1 USING table t2 WHERE t1.date=t2.date AND t1.id<t2.id;
MySQL
DELETE FROM table
USING table, table as vtable
WHERE (table.id < vtable.id)
AND (table.date=vtable.date)
SQL aggregate (max,group by) functions almost always are very slow.