SQL: Deleting Duplicates using Not in and Group by

SQL: Deleting Duplicates using Not in and Group by - sql

I have the following SQL Syntax to delete duplicate rows, but never are any rows affected.
DELETE FROM content_stacks WHERE id NOT IN (
SELECT id
FROM content_stacks
GROUP BY user_id, content_id
);
The subquery itself is returning the id list of first entries correctly.
SELECT id
FROM content_stacks
GROUP BY user_id, content_id
When I'm inserting the results list as a string it is working, too:
DELETE FROM content_stacks WHERE id NOT IN (239,231,217,218,219,232,233,220,230,226,234,235,224,225,221,223,222,227,228,229,236,237,238,216,208,209,210,204,211,212,242,203,240,201,241,205,206,207,213,214,215);
I checked many similar examples and this should be working in my opinion. What am I missing?

First find first rows using ROW_NUMBER Then delete record with row number greater than 1:
WITH CTE AS (
SELECT id , ROW_NUMBER() OVER(PARTITION BY user_id, content_id, ORDER BY id) rn
FROM content_stacks
)
DELETE cs
FROM content_stacks cs
INNER JOIN CTE ON CTE.id = cs.id
WHERE rn > 1

Am sorry to ask but if your deleting why would u need to group the records.
Are not just increasing the runtime.

The code from Meyssam Toluie is not working as it is but I made a similar solution with the same idea with rownumbers:
DELETE FROM content_stacks WHERE id IN
(SELECT id FROM (
SELECT id, ROW_NUMBER() OVER(PARTITION BY user_id, content_id)row_num
FROM content_stacks
) sub
WHERE row_num > 1)
This is working for me now.
My first command did not work because: The group by command does not show all ids in the output, but they are still there, so in fact all ids were returned in the NOT IN id-list. The row number seems to be the easiest way for this problem.

Related

Delete duplicate records on SQL Server

I have a table with duplicate records, where I've already created a script to summarize the duplicate records with the original ones, but I'm not able to delete the duplicate records.
I'm trying this way:
DELETE FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE COD_PLANO_PAGAMENTO IN (SELECT MAX(COD_PLANO_PAGAMENTO) COD_PLANO_PAGAMENTO
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
GROUP BY COD_PLANO_PAGAMENTO)
The idea was to take the last record of each COD_PLANO_PAGAMENTO and delete it, but this way all the records are being deleted, what am I doing wrong?
The table is structured as follows:
I need to delete, for example, the second record of COD_MOVIMENTO = 405 with COD_PLANO_PAGAMENTO = 9, there should only be one record of COD_PLANO_PAGAMENTO different in each COD_MOVIMENTO

You can use an updatable CTE with row-numbering to calculate which rows to delete.
You may need to adjust the partitioning and ordering clauses, it's not clear exactly what you need.
WITH cte AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY COD_MOVIMENTO, COD_PLANO_PAGAMENTO ORDER BY (SELECT 1)
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO mp
)
DELETE FROM cte
WHERE rn > 1;

Your delete statement will take the max() but even if you have only one record, it'll return a value.
Also note that your group by should be on COD_MOVIMENTO.
As a fix, make sure there are at least two items:
DELETE FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE COD_PLANO_PAGAMENTO IN
(SELECT MAX(COD_PLANO_PAGAMENTO)COD_PLANO_PAGAMENTO
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE cod_plano_pagamento in
(select cod_plano_pagamento
from TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
group by COD_PLANO_PAGAMENTO
having count(*) > 1)
GROUP BY COD_MOVIMENTO )

In your comment you want remove duplicate rows with same COD_MOVIMENTO, COD_PLANO_PAGAMENTO and VAL_TOTAL_APURADO, try this:
delete f1 from
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY COD_MOVIMENTO, COD_PLANO_PAGAMENTO, VAL_TOTAL_APURADO ORDER BY COD_MOVIMENTO) rang
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
) f1
where f1.rang>1

Remove duplicate records except the first record in SQL

I want to remove all duplicate records except the first one.
Like :
NAME
R
R
rajesh
YOGESH
YOGESH
Now in the above I want to remove the second "R" and the second "YOGESH".
I have only one column whose name is "NAME".

Use a CTE (I have several of these in production).
;WITH duplicateRemoval as (
SELECT
[name]
,ROW_NUMBER() OVER(PARTITION BY [name] ORDER BY [name]) ranked
from #myTable
ORDER BY name
)
DELETE
FROM duplicateRemoval
WHERE ranked > 1;
Explanation: The CTE will grab all of your records and apply a row number for each unique entry. Each additional entry will get an incrementing number. Replace the DELETE with a SELECT * in order to see what it does.

Seems like a simple distinct modifier would do the trick:
SELECT DISTINCT name
FROM mytable

This is bigger code but it works perfectly where you don't take the original row but find all the duplicate Rows
select majorTable.RowID,majorTable.Name,majorTable.Value from
(select outerTable.Name, outerTable.Value, RowID, ROW_NUMBER()
over(partition by outerTable.Name,outerTable.Value order by RowID)
as RowNo from #Your_Table outerTable inner join
(select Name, Value,COUNT(*) as duplicateRows FROM #Your_Table group by Name, Value
having COUNT(*)>1)innerTable on innerTable.Name = outerTable.Name
and innerTable.Value = outerTable.Value)majorTable where MajorTable.ROwNo <>1

Remove multiple postings but keep first

I have a table which has had numerous postings over the course of the week that I need to remove, the timestamp is different so i need to keep the first entry but then remove all the others which came after that.
What techniques would be advised.
SQL Server 2008
Many thanks
J

You can use a CTE with delete. The result is something like this:
with todelete as (
select p.*,
row_number() over (partition by post_id order by datetimecol asc) as seqnum
from posts p
)
delete from todelete
where seqnum > 1;
You can just run the subquery to see what is happening.

Delete all the posts except the oldest
DELETE FROM tbl
WHERE ID NOT IN
(
select top 1 id
from tbl
order by TimeStampColumn
)

Select last duplicate row with different id Oracle 11g

I have a table that look like this:
The problem is I need to get the last record with duplicates in the column "NRODENUNCIA".

You can use MAX(DENUNCIAID), along with GROUP BY... HAVING to find the duplicates and select the row with the largest DENUNCIAID:
SELECT MAX(DENUNCIAID), NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
HAVING COUNT(1) > 1
This will only show rows that have at least one duplicate. If you want to see non-duplicate rows too, just remove the HAVING COUNT(1) > 1

There are a number of solutions for your problem. One is to use row_number.
Note that I've ordered by DENUNCIID in the OVER clause. This defines the "Last Record" as the one that has the largest DENUNCIID. If you want to define it differently you'd need to change the field that is being ordered.
with dupes as (
SELECT
ROW_NUMBER() OVER (Partition by NRODENUNCIA ORDER BY DENUNCIID DESC) RN,
*
FROM
YourTable
)
SELECT * FROM dupes where rn = 1
This only get's the last record per dupe.
If you want to only include records that have dupes then you change the where clause to
WHERE rn =1
and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)

Deleting rows where the Primary key is duplicated - SQL

My issue is how do we delete a primary key row in case it is duplicated. The other fields may/may not be duplicates. I am interested only in the primary key being duplicated and would like to retain the first instance while deleting the other duplicate entries.
For example,
I have 2 tables with the following data:
Table1:- Portfolio
Columns:- PortfolioID(PK), PortfolioName
Sample data :-
1, North America
2, Europe
3, Asia
Table2:- Account
Columns:- AccountID(PK), PortfolioID(FK), AccountName
Sample data :-
1,1,Quake
1,1,Wind
2,1,Fire
3,1,Quake
4,2,Flood
5,2,Wind
Lets say for PortfolioID = 1,
I am trying to delete row number 2 from the Account table where the AccountID 1 is repeated for PortfolioID =1. I have tried using the CTE expression where I use the ROW_NUMBER statement and try to delete ROWNUMBER <> 1. But this query doesn't work as it deletes all the rows in the table.
The query I tried:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [Account].[AccountID] ORDER BY [Account].[AccountID]) AS [ROWNUMBER],
[Account].[AccountID]
FROM [Account]
INNER JOIN [Portfolio] ON [Portfolio].[PortfolioID] = [Account]. [PortfolioID]
WHERE [Portfolio].[PortfolioID] = 1
)
DELETE [Account]
FROM [CTE]
WHERE [ROWNUMBER] <> 1
Am I doing something wrong in the query? Thanks in advance for the help.

Firstly, if you define the AccountID column as the primary key in your database, this going forward will help solve having these kinds of problems.
Secondly, are you using Sql Server? Which version?
Assuming you are using Sql Server and a recent version which allows you to use windowing, you can try something like this to delete any duplicates that you have.
This will delete ALL copies of ALL duplicates:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY AccountID,PortfolioID)
FROM Account)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
This alternative script will keep one of the duplicates if that is what you prefer:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
)
DELETE FROM CTE WHERE RN<>1
Finally, if you want to only delete duplicates for Portfolio Id 1:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
Where PortfolioID = 1
)
DELETE FROM CTE WHERE RN<>1

Primary key column never ever support duplicate entries.
Try with the below query for the desired result based on the given data/inputs.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY a.[AccountID],a.PortfolioID ORDER BY a.[AccountID]) AS [ROWNUMBER],*
FROM [Account] a
WHERE a.[PortfolioID] = 1
)
DELETE
FROM [CTE]
WHERE [ROWNUMBER] > 1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL: Deleting Duplicates using Not in and Group by - sql

First find first rows using ROW_NUMBER Then delete record with row number greater than 1: WITH CTE AS ( SELECT id , ROW_NUMBER() OVER(PARTITION BY user_id, content_id, ORDER BY id) rn FROM content_stacks ) DELETE cs FROM content_stacks cs INNER JOIN CTE ON CTE.id = cs.id WHERE rn > 1

Am sorry to ask but if your deleting why would u need to group the records. Are not just increasing the runtime.

Related

Delete duplicate records on SQL Server

Remove duplicate records except the first record in SQL

Remove multiple postings but keep first

Select last duplicate row with different id Oracle 11g

Deleting rows where the Primary key is duplicated - SQL

Categories

Resources