Delete ALL rows that have a duplicate ID - sql

There are plenty of posts on SO that show how to remove rows that duplicate other rows in one way or another, leaving only one.
What I am looking for is how I can delete all rows from my temp-table that do not have a unique ID:
ID other_values
-----------------------------
1 foo bar
2 bar baz
2 null
2 something
3 else
I don't care about the other values; once the ID is not unique, I want all rows out, the result being:
ID other_values
-----------------------------
1 foo bar
3 else
How can I do this?

Try this:
--delete all rows from my temp-table that do not have a unique ID
DELETE from MYTABLE
WHERE ID IN (SELECT ID FROM MYTABLE GROUP BY ID HAVING COUNT(*) > 1)
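To see the effect on the sample data from the question, here is a minimal sketch that runs the same DELETE against a throwaway SQLite database from Python. The function name and the in-memory setup are assumptions for illustration only; the question does not say which database is in use.

```python
import sqlite3


def delete_non_unique_ids():
    """Delete every row whose ID occurs more than once and return what is left."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE mytable (id INTEGER, other_values TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [(1, "foo bar"), (2, "bar baz"), (2, None), (2, "something"), (3, "else")],
    )
    # Same statement as in the answer: the subquery lists the IDs that repeat.
    conn.execute(
        """
        DELETE FROM mytable
        WHERE id IN (SELECT id FROM mytable GROUP BY id HAVING COUNT(*) > 1)
        """
    )
    rows = conn.execute(
        "SELECT id, other_values FROM mytable ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


print(delete_non_unique_ids())
```

Only the rows with ID 1 and 3 should remain, which matches the result asked for in the question.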

I would use a DELETE command in conjunction with a subquery to detect duplicates
DELETE
FROM mytable
WHERE ID IN (SELECT ID FROM mytable GROUP BY ID HAVING COUNT(*) > 1)

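Use a CTE to delete the rows (deleting through a CTE like this works in SQL Server). Counting the rows per ID, rather than numbering them, removes every row whose ID occurs more than once and not just the extra copies:
WITH cte
AS (
SELECT id
,Other_values
,COUNT(*) OVER (
PARTITION BY id
) cnt
FROM mytable
)
DELETE
FROM cte
WHERE cnt > 1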

Related

Removing rows from result set where column only has one value against a user

I have a result set
name stage value
---- ----- -----
jim 1 4
jim 1 8
paul 1 8
paul 1 8
I want to remove the rows for any person where 8 is the only value,
i.e. keep the 2 jim rows and lose the 2 paul rows.
You can use exists. For a select query:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and t2.value <> 8
);
Similar logic (except using not exists rather than exists) can be used for a delete -- if you really want to delete the rows from the table.
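A quick way to check an exists filter against the sample rows is a throwaway SQLite database driven from Python. This is only a sketch: the function name and the in-memory setup are assumptions, not part of the question. It keeps a row when the same name has at least one value other than 8.

```python
import sqlite3


def rows_for_names_with_other_values():
    """Return rows for names that have at least one value other than 8."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE t (name TEXT, stage INTEGER, value INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [("jim", 1, 4), ("jim", 1, 8), ("paul", 1, 8), ("paul", 1, 8)],
    )
    rows = conn.execute(
        """
        SELECT t.name, t.stage, t.value
        FROM t
        WHERE EXISTS (SELECT 1
                      FROM t t2
                      WHERE t2.name = t.name AND t2.value <> 8)
        ORDER BY t.name, t.value
        """
    ).fetchall()
    conn.close()
    return rows


print(rows_for_names_with_other_values())
```

Both jim rows come back because jim also has the value 4; paul only ever has 8, so both paul rows are dropped.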
If you have a complex query that you don't want to repeat, then window functions are helpful:
select t.*
from (select t.*,
sum(case when value <> 8 then 1 else 0 end) over (partition by name) as cnt_not_8
from t
) t
where cnt_not_8 > 0;
If your database supports analytic functions, you can use count as follows:
Select * from
(Select t.*,
Count(case when value <> 8 then 1 end) over (partition by name) as cnt
From your_table t) t
Where cnt > 0
Assuming your table also has an ID column (an auto-increment integer), this query selects the highest id for each unique combination of name, stage and value:
select max(id) from t group by name,stage,value
In your example, the two paul,1,8 rows collapse to a single id (the latest one), while each jim row keeps its own id.
You can then use that query in a where clause to filter out any duplicates:
select * from t
where id in (select max(id) from t group by name,stage,value)
Finally, you can also delete the duplicates, keeping only the latest row of each combination, if that's your goal:
delete from t
where id not in (select max(id) from t group by name,stage,value)

SQL - Selecting all rows with non matching null rows

How can I select all rows which have a non matching null row? Given the following table, any row with the foreign key 1 should not be returned since a corresponding row with a NULL exists. How could I only select rows with the foreign keys 2 and 3?
foreign_key | created_at
1 12345...
1 12345...
2 12345...
3 12345...
1 NULL
You can use not exists:
select *
from mytable t
where not exists (
select 1
from mytable t1
where t1.foreign_key = t.foreign_key and t1.created_at is null
)
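For anyone who wants to try the not exists query on the table from the question, here is a small sketch using an in-memory SQLite database from Python. The function name is an assumption, and '12345' is just a placeholder for the truncated created_at values shown in the question.

```python
import sqlite3


def rows_without_null_sibling():
    """Return rows whose foreign_key has no row with a NULL created_at."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE mytable (foreign_key INTEGER, created_at TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        # '12345' is a placeholder timestamp; None is stored as NULL.
        [(1, "12345"), (1, "12345"), (2, "12345"), (3, "12345"), (1, None)],
    )
    rows = conn.execute(
        """
        SELECT t.foreign_key, t.created_at
        FROM mytable t
        WHERE NOT EXISTS (
            SELECT 1
            FROM mytable t1
            WHERE t1.foreign_key = t.foreign_key AND t1.created_at IS NULL
        )
        ORDER BY t.foreign_key
        """
    ).fetchall()
    conn.close()
    return rows


print(rows_without_null_sibling())
```

Every row with foreign key 1 is filtered out because one of them has a NULL created_at, leaving only the rows for keys 2 and 3.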
Another option is to use window functions; here is one approach using boolean windowing (bool_or is available in PostgreSQL):
select *
from (
select t.*, bool_or(created_at is null) over(partition by foreign_key) has_null
from mytable t
) t
where not has_null

SQL remove duplicate row depend on certain value

I spent a day trying to figure out how to write this query.
I have the following table
ID Name Pregnancy Gender
1 Raghad Yes Female
1 Raghad No Female
2 Ohoud no Male
What I need is to remove the duplicate (in this case 1,1) and keep the one row that has a pregnancy status of yes.
To clarify, I can't use delete since it's a restricted database. I can only retrieve data.
Using an exists clause:
DELETE
FROM yourTable t1
WHERE
pregnancy = 'no' AND
EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.ID = t1.ID AND t2.pregnancy = 'yes');
There are other ways to go about doing this, e.g. using ROW_NUMBER, but as you did not tag your database, I offer the above solution which should work on basically any database.
If you want to just view your data with the "duplicates" removed, then use:
SELECT *
FROM yourTable t1
WHERE
pregnancy = 'yes' OR
NOT EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.ID = t1.ID AND t2.pregnancy = 'yes');
If the Pregnancy column has just two values, "Yes" and "No", you can also use ROW_NUMBER() to get the result.
;WITH CTE
AS (
SELECT *,ROW_NUMBER() OVER (PARTITION BY id ORDER BY Pregnancy DESC) RN
FROM TABLE_NAME
)
SELECT *
FROM CTE
WHERE RN = 1
If there are more values and you want to give the highest priority to "Yes", you can write the query as follows:
;WITH CTE
AS (
SELECT *,ROW_NUMBER() OVER
(PARTITION BY id ORDER BY CASE WHEN Pregnancy = 'Yes' then 0 else 1 end) RN
FROM TABLE_NAME
)
SELECT *
FROM CTE
WHERE RN= 1
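As a read-only check of the ROW_NUMBER() approach with the CASE priority, here is a sketch that runs it on the sample rows in an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer (needed for window functions); the function name is hypothetical, and string comparison here is case sensitive, so 'Yes' must match the stored value exactly.

```python
import sqlite3


def one_row_per_id_preferring_yes():
    """Return one row per id, preferring the row where pregnancy is 'Yes'."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE table_name (id INTEGER, name TEXT, pregnancy TEXT, gender TEXT)"
    )
    conn.executemany(
        "INSERT INTO table_name VALUES (?, ?, ?, ?)",
        [
            (1, "Raghad", "Yes", "Female"),
            (1, "Raghad", "No", "Female"),
            (2, "Ohoud", "no", "Male"),
        ],
    )
    rows = conn.execute(
        """
        WITH cte AS (
            SELECT id, name, pregnancy, gender,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       ORDER BY CASE WHEN pregnancy = 'Yes' THEN 0 ELSE 1 END
                   ) AS rn
            FROM table_name
        )
        SELECT id, name, pregnancy, gender
        FROM cte
        WHERE rn = 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


print(one_row_per_id_preferring_yes())
```

Nothing is deleted, so this also fits a database where only SELECT is allowed.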
For this sample data you can group by ID, Name, Gender and return the maximum value of the column Pregnancy for each group since Yes is greater compared to No:
SELECT ID, Name, MAX(Pregnancy) Pregnancy, Gender
FROM tablename
GROUP BY ID, Name, Gender
Results:
> ID | Name | Pregnancy | Gender
> -: | :----- | :-------- | :-----
> 1 | Raghad | Yes | Female
> 2 | Ohoud | No | Male
Here is how you could do it in MySQL 8.
Similar common table expressions exist in SQL Server and Oracle.
A comma is only needed there to separate several CTE definitions,
not after the closing parenthesis that ends the last one.
with dups as (
Select id from test
group by id
Having count(1) > 1
)
select * from test
where id in (select id from dups)
and Pregnancy = 'Yes'
union all
select * from test where id not in (select id from dups);
Note this does it without deleting the original.
But it gives you a result set to work with that has what you want.
If you wanted to delete, then you could use this instead, after the dups CTE definition:
delete from test
where id in (select id from dups) and Pregnancy = 'No'
Or distill this into:
delete from test
where id in (Select id from test
group by id
Having count(1) > 1) and Pregnancy = 'No'
1) First of all, fix the design of your table: ID should be the primary key, which automatically prevents duplicate rows with the same ID.
2) You can use group by and having to find the duplicated IDs and remove their 'no' rows:
delete from tablename where pregnancy='no' and id in (SELECT
id
FROM tablename
GROUP BY id
HAVING count(id)>1)

how to get duplicates when I group rows?

I have this table:
MyTable(ID, FK, ...)
I am using this query:
select ID from Mytable where FK <> 1
group by ID, FK
order by ID
This gives me the result that I want:
255
255
267
268
790
...
The 255 is duplicated because it has two different FKs. The rest of the IDs have the same FK. I would like to get the IDs which have more than one FK with different values.
If an ID has two rows with FK = 2 and FK = 3 then get this ID, but if the ID has FK = 2, FK = 2, FK = 2 I don't want this ID because it has the same FK.
How could I get these IDs?
Thank you so much.
You should count distinct FKs
select ID from Mytable where FK <> 1
group by ID
having count(distinct FK) > 1
order by ID
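Here is a small sketch of the count(distinct FK) query in action, using an in-memory SQLite database from Python. The sample rows and the function name are made up for illustration (the question only shows the IDs): 255 has FKs 2 and 3, 267 has FK 2 three times, 268 has a single FK, and 790 has FK 1 (filtered out) plus one other FK.

```python
import sqlite3


def ids_with_several_distinct_fks():
    """Return the IDs that have more than one distinct FK, ignoring FK = 1."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE Mytable (ID INTEGER, FK INTEGER)")
    conn.executemany(
        "INSERT INTO Mytable VALUES (?, ?)",
        # Hypothetical sample data, not taken from the question.
        [(255, 2), (255, 3), (267, 2), (267, 2), (267, 2), (268, 4), (790, 1), (790, 5)],
    )
    rows = conn.execute(
        """
        SELECT ID
        FROM Mytable
        WHERE FK <> 1
        GROUP BY ID
        HAVING COUNT(DISTINCT FK) > 1
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(ids_with_several_distinct_fks())
```

Only 255 qualifies: 267 repeats the same FK, and 790 is left with a single FK once FK = 1 is excluded.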
Try this:
SELECT
ID, COUNT(DISTINCT FK)
FROM
Mytable
WHERE
FK <> 1
GROUP BY
ID
HAVING
COUNT(DISTINCT FK) > 1
ORDER BY ID
Use HAVING to keep only the IDs that occur with more than one distinct FK:
select ID
from Mytable
where FK <> 1
group by ID
having min(FK) <> max(FK)
order by ID
You can use the DENSE_RANK window function: an ID whose FKs reach rank 2 has at least two different FKs.
SELECT DISTINCT ID FROM
(
SELECT DENSE_RANK() OVER (PARTITION BY ID ORDER BY FK) RN, ID
from Mytable WHERE FK <> 1
) TMP
WHERE RN = 2

delete duplicate records, keep one

I have a temp table created by copying from a CSV file, and the result includes some duplicate ids. I need to delete any duplicates. I have tried the following:
delete from my_table where id in
(select id from (select count(*) as count, id
from my_table group by id) as counts where count>1);
However this deletes both the duplicate records and I must keep one.
How can I delete only the 2nd record with a duplicated Id?
Thanks.
Your query deletes all IDs that have a count greater than 1, so it removes everything that is duplicated. What you need to do is isolate one record from the list of duplicates and preserve that:
delete
from my_table
where id in (select id
from my_table
where some_field in (select some_field
from my_table
group by some_field
having count(id) > 1))
and id not in (select min(id)
from my_table
where some_field in (select some_field
from my_table
group by some_field
having count(id) > 1)
group by some_field);
Assuming you don't have foreign key relations...
CREATE TABLE "temp"(*column definitions*);
insert into "temp" (*column definitions*)
select *column definitions*
from (
select *,row_number() over(PARTITION BY id) as rn from "yourtable"
) tm
where rn=1;
drop table "yourtable";
alter table "temp" rename to "yourtable";
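If the engine exposes a per-row identifier, the keep-one delete can be written in place without a second table. A minimal sketch using SQLite's implicit rowid, run from Python: this swaps in a different technique from the answers above, the question does not say which database is in use, and other engines need their own equivalent of rowid. The payload column, the sample rows and the function name are hypothetical.

```python
import sqlite3


def delete_duplicates_keep_first():
    """Keep the first-inserted row for each id and delete the other copies."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # id is deliberately NOT a primary key, so it is not an alias for rowid.
    conn.execute("CREATE TABLE my_table (id INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?)",
        [(1, "a"), (2, "b"), (2, "b2"), (3, "c"), (3, "c2"), (3, "c3")],
    )
    # MIN(rowid) per id identifies the one row to preserve in each group.
    conn.execute(
        """
        DELETE FROM my_table
        WHERE rowid NOT IN (SELECT MIN(rowid) FROM my_table GROUP BY id)
        """
    )
    rows = conn.execute("SELECT id, payload FROM my_table ORDER BY id").fetchall()
    conn.close()
    return rows


print(delete_duplicates_keep_first())
```

Each id ends up with exactly one row, the one that was inserted first.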