Delete specific record from multiple duplicates in the table

How do I delete a specific record from among multiple duplicates?
An example table is below.
This is just one example; we have many cases like this. From this table I need to delete the rows with rank 2 and 3.
Please suggest the best way to identify duplicate records and delete the specific rows.

This should work (SQL Server syntax):
delete t
from <your table> t
where t.[rank] <> (select top (1) tt.[rank]
from <your table> tt
where tt.emp_id = t.emp_id
order by tt.[rank] asc -- keeps the lowest rank; use desc to keep the highest
)

I do not encourage deleting records, but this approach works for either expiring or deleting them:
The table should have a unique ID and a field that lets you mark a record as expired. If it does not, I recommend adding them. You can create a composite ID in your query, but down the road you will wish you had these attributes.
Create a query that identifies every record where RANK <> 1. This will be your subquery.
Write your UPDATE query:
UPDATE A
SET [EXPIRE_DTTM] = GETDATE()
FROM *TableNameWithTheRecords* A
INNER JOIN (*SubQuery*) B ON A.UniqueID = B.UniqueID
If you truly want to delete the records, use this (note IN rather than =, because the subquery returns more than one row):
DELETE FROM *TableNameWithTheRecords*
WHERE *UniqueID* IN (SELECT *UniqueID* FROM *TableNameWithTheRecords* WHERE RANK <> 1)

;WITH tbl_alias AS
(
SELECT emp_ID,
RN = ROW_NUMBER() OVER(PARTITION BY emp_ID ORDER BY [rank]) -- rank 1 gets RN = 1 and is kept
FROM tblName
)
DELETE FROM tbl_alias WHERE RN > 1
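To try the row-numbering idea end to end, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for SQL Server. The table emp, the columns id and rnk, and the sample rows are all made up for illustration. SQLite cannot delete through a CTE the way SQL Server can, so the sketch deletes by primary key instead, and it assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id is a surrogate key, rnk stands in for the question's rank column.
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, emp_id INTEGER, rnk INTEGER)")
conn.executemany(
    "INSERT INTO emp (emp_id, rnk) VALUES (?, ?)",
    [(100, 1), (100, 2), (100, 3), (200, 1)],
)

# Number the rows inside each emp_id group by rnk, then delete everything
# except the first row of each group.
conn.execute("""
    DELETE FROM emp
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY rnk) AS rn
            FROM emp
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute("SELECT emp_id, rnk FROM emp ORDER BY emp_id").fetchall()
print(remaining)
```

The ORDER BY inside OVER() is what decides which row survives, so it is worth running the inner SELECT on its own before turning it into a DELETE.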

Related

SQL query to combine Select duplicates with count and grouping with delete based on Top but not the top 1 of each duplicate

I am looking to combine these two statements into one, to run as a stored procedure if possible.
I have not used temp tables in queries before and may need to here; I am not sure, so I am asking for advice.
I did not write the original queries. I manually run the first one, which returns a table listing IDs with duplicate data and how many records each has. Then each ID is put into the second query to remove all but the TOP 1 row based on additional filtering criteria.
I have looked at using a CTE (from "SQL select into delete DIRECTLY") but am still at a loss on how to pass each result row's ID value into the delete query.
The queries, edited for public consumption, are:
SELECT id, count(*) FROM [DEV].[dbo].[7dtest] where FileVer = 1 and CALC_DATE > FORMAT(DATEADD(DD,-7,GETDATE()), 'yyyy-MM-dd') group by id having count(*) > 1 order by count(*) desc
This returns a table with each id and its number of duplicate rows.
I then take the id of each row and put it into this delete statement:
delete from [DEV].[dbo].[7dtest] where AutoID not in (
SELECT TOP 1 AutoID FROM [DEV].[dbo].[7dtest] where FileVer = 1 and id = '123' and CALC_DATE > FORMAT(DATEADD(DD,-7,GETDATE()), 'yyyy-MM-dd')
order by COMPLETED_DATE_CHECK_3 desc, COMPLETED_DATE_CHECK_2 desc, COMPLETED_DATE_CHECK_1 desc)
and FileVer = 1 and id = '123' and CALC_DATE > FORMAT(DATEADD(DD,-7,GETDATE()), 'yyyy-MM-dd')
Can this be done with CTE or do I need to create a temp table and some looping to get the ID one row at a time? Is there a better way I should be doing this?
TIA

Delete duplicate records on SQL Server

I have a table with duplicate records, where I've already created a script to summarize the duplicate records with the original ones, but I'm not able to delete the duplicate records.
I'm trying this way:
DELETE FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE COD_PLANO_PAGAMENTO IN (SELECT MAX(COD_PLANO_PAGAMENTO) COD_PLANO_PAGAMENTO
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
GROUP BY COD_PLANO_PAGAMENTO)
The idea was to take the last record of each COD_PLANO_PAGAMENTO and delete it, but this way all the records are being deleted. What am I doing wrong?
The table is structured as follows:
I need to delete, for example, the second record of COD_MOVIMENTO = 405 with COD_PLANO_PAGAMENTO = 9; each COD_MOVIMENTO should have only one record per distinct COD_PLANO_PAGAMENTO.

You can use an updatable CTE with row-numbering to calculate which rows to delete.
You may need to adjust the partitioning and ordering clauses, it's not clear exactly what you need.
WITH cte AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY COD_MOVIMENTO, COD_PLANO_PAGAMENTO ORDER BY (SELECT 1))
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO mp
)
DELETE FROM cte
WHERE rn > 1;

Your subquery groups by COD_PLANO_PAGAMENTO and takes MAX() of that same column, so it returns every distinct value, even for a plan that has only one record. That is why every row matches.
Also note that your group by should be on COD_MOVIMENTO.
As a fix, make sure there are at least two items:
DELETE FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE COD_PLANO_PAGAMENTO IN
(SELECT MAX(COD_PLANO_PAGAMENTO)COD_PLANO_PAGAMENTO
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
WHERE cod_plano_pagamento in
(select cod_plano_pagamento
from TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
group by COD_PLANO_PAGAMENTO
having count(*) > 1)
GROUP BY COD_MOVIMENTO )
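A toy table makes it easy to check why the DELETE in the question matches every row. The sketch below uses Python's standard-library sqlite3 module as a stand-in for SQL Server; the short table name pagamento and the sample rows are assumptions made up for the demonstration. It only runs SELECTs, so nothing is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, shortened version of the question's table.
conn.execute(
    "CREATE TABLE pagamento (cod_movimento INTEGER, cod_plano_pagamento INTEGER)"
)
conn.executemany(
    "INSERT INTO pagamento VALUES (?, ?)",
    [(405, 9), (405, 9), (405, 3), (406, 9)],
)

# Grouping by a column and taking MAX() of that same column...
subquery_values = {
    row[0]
    for row in conn.execute(
        "SELECT MAX(cod_plano_pagamento) FROM pagamento GROUP BY cod_plano_pagamento"
    )
}
# ...returns exactly the set of distinct values in the column.
all_values = {
    row[0] for row in conn.execute("SELECT DISTINCT cod_plano_pagamento FROM pagamento")
}

# Count how many rows the question's WHERE ... IN (...) filter would match.
matched = conn.execute("""
    SELECT COUNT(*) FROM pagamento
    WHERE cod_plano_pagamento IN (
        SELECT MAX(cod_plano_pagamento) FROM pagamento GROUP BY cod_plano_pagamento
    )
""").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM pagamento").fetchone()[0]

print(subquery_values == all_values, matched, total)
```

Swapping the DELETE for a SELECT COUNT(*) with the same WHERE clause, as done here, is a cheap way to see how many rows a delete would touch before running it.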

Per your comment, you want to remove duplicate rows with the same COD_MOVIMENTO, COD_PLANO_PAGAMENTO and VAL_TOTAL_APURADO. Try this:
delete f1 from
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY COD_MOVIMENTO, COD_PLANO_PAGAMENTO, VAL_TOTAL_APURADO ORDER BY COD_MOVIMENTO) rang
FROM TB_MOVIMENTO_PDV_DETALHE_PLANO_PAGAMENTO
) f1
where f1.rang>1

How do you create a delete query that deletes rows with criteria?

I have an insert query that goes into a table linked to QuickBooks. The table Test_InvoiceLine has a lot of Part_ID's and Descriptions that are exactly the same.
INSERT INTO InvoiceLine (Part_ID, More_Info )
SELECT Part_ID, Description
FROM Test_InvoiceLine;
How can I write a query that goes into the InvoiceLine table and deletes duplicates with the same Part_ID and Description that are already there?

Use a CTE as a temporary result set. You essentially query for duplicate records that are partitioned by your criteria, and delete the records that are not the first record (keeps the original record).
Double-check me before you blow anything away, because it has been a while since I've done this, but this should work.
WITH CTE AS(
SELECT Part_ID, Description,
RowNum = ROW_NUMBER()OVER(PARTITION BY Part_ID, Description ORDER BY Part_ID, Description)
FROM Test_InvoiceLine
)
DELETE FROM CTE WHERE RowNum > 1
Source: How to delete duplicate records in SQL Server

Is there a reason why you need to do an insert then delete? If you're trying to insert -> delete, it sounds like you're actually trying to update the data by Part_ID.
update InvoiceLine a
set More_Info = b.Description
from ( select Part_ID,
Description
from Test_InvoiceLine ) b
where b.Part_ID = a.Part_ID
This will update More_Info in the InvoiceLine table with Description.

This is SQL Server syntax; however, ROW_NUMBER is widely supported in other RDBMSs.
Here is the query you need.
;WITH Data AS
(
SELECT
Part_ID,Description,
RowNumber = ROW_NUMBER() OVER(PARTITION BY Part_Id,Description ORDER BY Part_Id,Description)
FROM Test_InvoiceLine
)
DELETE FROM Data WHERE RowNumber > 1
I don't know how More_Info will make a difference here as the duplicate key does not include it, according to your post, however, if you need to inspect the more_info values in the delete statement then perhaps you could use something similar to the query below.
;WITH Data AS
(
SELECT
More_Info,
Part_ID,Description,
RowNumber = ROW_NUMBER() OVER(PARTITION BY Part_Id,Description ORDER BY Part_Id,Description)
FROM Test_InvoiceLine
)
DELETE T
FROM Test_InvoiceLine T
INNER JOIN Data D ON D.RowNumber > 1 AND D.More_Info = 'Y' AND D.Part_Id = T.Part_ID

How can I overwrite a column from a column in another table without a join?

I want to simply overwrite values in a column of a table with values from a column in another table.
I have a table derived from another table, with no unique identifier in any of the columns, so I don't want to use joins; I just want to update the values, since the rows are in the same order. How do I do that? So far I have tried two approaches: Approach A only puts the value from the first row into every row of the updated table, whereas Approach B does not work at all.
Approach A:
Update Transactions
SET Transactions.Amount = Transactions_raw.Amount
FROM Transactions_raw
Approach B:
UPDATE Transactions
SET Amount = (SELECT Amount FROM Transactions_raw)

You need some kind of join to match the tables -- even if on an artificial key:
update t
set Amount = tr.Amount
from (select t.*, row_number() over (order by (select null)) as seqnum
from Transactions t
) t join
(select tr.*, row_number() over (order by (select null)) as seqnum
from Transactions_raw tr
) tr
on t.seqnum = tr.seqnum;

Your assumption that the rows are in the same order may mislead you. If you run a select statement without order by and see the same order in both tables, that is not something you want to rely on. This so-called order is not guaranteed. Instead, you need some rule for ordering. Once you have a rule that you can place in order by, you can add an ID column to both tables according to that order.
You can calculate the ID value with a statement like this (a window function cannot appear directly in SET, so it goes through a CTE):
with numbered as (select Id, row_number() over(order by ...) as rn from Transactions)
update numbered
set Id = rn
Then you can use a regular inner join.
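As a rough illustration of pairing rows by an explicit ordering rule, here is a sketch in Python with the standard-library sqlite3 module standing in for SQL Server. The columns booked_at and line_no are hypothetical ordering rules invented for the example, as is the sample data; the real tables would need their own rule. It assumes SQLite 3.25 or later for window functions and uses SQLite's built-in rowid to address the rows being updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas: each table has its own ordering column.
    CREATE TABLE transactions     (booked_at TEXT, amount INTEGER);
    CREATE TABLE transactions_raw (line_no INTEGER, amount INTEGER);
    INSERT INTO transactions     VALUES ('2024-01-02', 0), ('2024-01-01', 0), ('2024-01-03', 0);
    INSERT INTO transactions_raw VALUES (1, 10), (2, 20), (3, 30);
""")

# Number both tables by their ordering rule and pair rows with equal numbers.
pairs = conn.execute("""
    WITH t AS (
        SELECT rowid AS rid, ROW_NUMBER() OVER (ORDER BY booked_at) AS seqnum
        FROM transactions
    ),
    tr AS (
        SELECT amount, ROW_NUMBER() OVER (ORDER BY line_no) AS seqnum
        FROM transactions_raw
    )
    SELECT t.rid, tr.amount
    FROM t JOIN tr ON t.seqnum = tr.seqnum
""").fetchall()

# Write each paired amount back to the matching row.
conn.executemany(
    "UPDATE transactions SET amount = ? WHERE rowid = ?",
    [(amount, rid) for rid, amount in pairs],
)

result = conn.execute(
    "SELECT booked_at, amount FROM transactions ORDER BY booked_at"
).fetchall()
print(result)
```

Note that the rows were inserted out of date order, yet the earliest booked_at still receives the first raw amount, because the pairing depends only on the ORDER BY rules and not on physical row order.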

How to keep only one row of a table, removing duplicate rows?

I have a table that has a lot of duplicates in the Name column. I'd
like to only keep one row for each.
The following lists the duplicates, but I don't know how to delete the
duplicates and just keep one:
SELECT name FROM members GROUP BY name HAVING COUNT(*) > 1;
Thank you.

See the following question: Deleting duplicate rows from a table.
The adapted accepted answer from there (which is my answer, so no "theft" here...):
You can do it in a simple way assuming you have a unique ID field: you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their name.
Example query:
DELETE FROM members
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM members
GROUP BY name
)
In case you don't have a unique index, my recommendation is to simply add an auto-incremental unique index. Mainly because it's good design, but also because it will allow you to run the query above.
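For a quick check of the MIN(ID) approach, here is a small sketch using Python's standard-library sqlite3 module. The sample names are made up, and SQLite is only a stand-in for whichever database you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
# Made-up sample data: two names are duplicated, one is not.
conn.executemany(
    "INSERT INTO members (name) VALUES (?)",
    [("amit",), ("akhil",), ("amit",), ("ravi",), ("akhil",)],
)

# Keep the lowest id for every name and delete all other rows.
deleted = conn.execute(
    "DELETE FROM members WHERE id NOT IN (SELECT MIN(id) FROM members GROUP BY name)"
).rowcount

survivors = conn.execute("SELECT id, name FROM members ORDER BY id").fetchall()
print(deleted, survivors)
```

One caveat when porting this: MySQL rejects a DELETE whose subquery reads directly from the table being deleted from (error 1093), which is why MySQL versions of this query wrap the subquery in a derived table.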

It would probably be easier to select the unique ones into a new table, drop the old table, then rename the temp table to replace it.
#create a table with same schema as members
CREATE TABLE tmp (...);
#insert the unique records
INSERT INTO tmp SELECT * FROM members GROUP BY name;
#swap it in
RENAME TABLE members TO members_old, tmp TO members;
#drop the old one
DROP TABLE members_old;

We have a huge database where deleting duplicates is part of the regular maintenance process. We use DISTINCT to select the unique records then write them into a TEMPORARY TABLE. After TRUNCATE we write back the TEMPORARY data into the TABLE.
That is one way of doing it and works as a STORED PROCEDURE.
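The DISTINCT, clear, and write-back routine could look roughly like the sketch below, written in Python with the standard-library sqlite3 module. The table layout, the temporary table name members_dedup, and the sample rows are assumptions for illustration. SQLite has no TRUNCATE, so an unqualified DELETE takes its place here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT, city TEXT)")
# Made-up sample data with one row repeated three times.
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("amit", "Pune"), ("amit", "Pune"), ("akhil", "Delhi"), ("amit", "Pune")],
)
conn.commit()

with conn:  # commits on success, rolls back the DELETE/INSERT on error
    # 1. Copy the distinct rows into a temporary table.
    conn.execute("CREATE TEMPORARY TABLE members_dedup AS SELECT DISTINCT * FROM members")
    # 2. Empty the original table (stand-in for TRUNCATE).
    conn.execute("DELETE FROM members")
    # 3. Write the distinct rows back and clean up.
    conn.execute("INSERT INTO members SELECT * FROM members_dedup")
    conn.execute("DROP TABLE members_dedup")

rows = conn.execute("SELECT name, city FROM members ORDER BY name").fetchall()
print(rows)
```

This only removes rows that are identical in every column; if the table has a unique ID column, DISTINCT will see every row as different and one of the ID-based approaches is a better fit.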

To first see which rows you are about to delete, and then delete them:
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
Full example at http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/

You can join the table with itself on the matching field and delete every row that has a lower-id twin, which keeps the lowest id in each group:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2 ON t1.match_field = t2.match_field
WHERE t1.id > t2.id;
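With a self-join, the comparison operator decides whether one copy of each duplicate survives. The sketch below, in Python with the standard-library sqlite3 module, lists the ids a self-join delete would hit for two operators. The table t and its rows are made up, the helper ids_selected is hypothetical, and the code uses a SELECT because SQLite does not support the DELETE ... JOIN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, match_field TEXT)")
# Made-up data: ids 1, 2 and 4 share the value 'a'; id 3 is unique.
conn.executemany(
    "INSERT INTO t (match_field) VALUES (?)", [("a",), ("a",), ("b",), ("a",)]
)

def ids_selected(op):
    """Return the ids a self-join delete would hit for `t1.id <op> t2.id`."""
    if op not in ("<>", ">"):  # fixed whitelist, so formatting into SQL is safe
        raise ValueError(op)
    sql = f"""
        SELECT DISTINCT t1.id
        FROM t t1
        JOIN t t2 ON t1.match_field = t2.match_field
        WHERE t1.id {op} t2.id
        ORDER BY t1.id
    """
    return [row[0] for row in conn.execute(sql)]

not_equal = ids_selected("<>")   # every copy of a duplicated value
greater = ids_selected(">")      # every copy except the lowest id
print(not_equal, greater)
```

Comparing with <> selects all three 'a' rows, so nothing of that group would be left, while > leaves id 1 in place.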

Delete duplicate rows and keep one.
Some names in the table have duplicate rows and some do not; this query keeps exactly one row per name in either case.
Assume the table has two columns, id and name, and we need to remove duplicate names while keeping one row of each. This query works fine on my end:
DELETE FROM tablename
WHERE id NOT IN(
SELECT id FROM
(
SELECT MIN(id)AS id
FROM tablename
GROUP BY name HAVING
COUNT(*) > 1
)AS a )
AND id NOT IN(
(SELECT ids FROM
(
SELECT MIN(id)AS ids
FROM tablename
GROUP BY name HAVING
COUNT(*) =1
)AS a1
)
)
The original post included before and after screenshots of the table. After the delete, the duplicate amit and akhil rows are gone and one record of each is kept.

If you want to remove duplicate records from a table:
CREATE TABLE tmp SELECT lastname, firstname, sex
FROM user_tbl
GROUP BY lastname, firstname;
DROP TABLE user_tbl;
ALTER TABLE tmp RENAME TO user_tbl;

Show the duplicate records:
SELECT `page_url`,count(*) FROM wl_meta_tags GROUP BY page_url HAVING count(*) > 1
Delete the duplicates, keeping the lowest meta_id for each page_url:
DELETE FROM wl_meta_tags
WHERE meta_id NOT IN( SELECT meta_id
FROM ( SELECT MIN(meta_id)AS meta_id FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) > 1 )AS a )
AND meta_id NOT IN( (SELECT ids FROM (
SELECT MIN(meta_id)AS ids FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) =1 )AS a1 ) )

WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [emp_id] ORDER BY [emp_id]) AS Row, * FROM employee_salary
)
DELETE FROM CTE
WHERE ROW <> 1
