Deletion of duplicate records using one query only - sql

I am using SQL Server 2005.
I have a table like this -
ID Name
1 a
1 a
1 a
2 b
2 b
3 c
4 d
4 d
In this, I want to delete all duplicate entries and retain only one instance as -
ID Name
1 a
2 b
3 c
4 d
I can do this easily by adding an identity column to the table, filling it with unique numbers, and then deleting the duplicate records. However, I want to know whether I can delete the duplicates without adding that extra column.
Additionally, can this be done with a single query statement, i.e. without stored procedures or temp tables?

Using a ROW_NUMBER in a CTE allows you to delete duplicate values while retaining unique rows.
WITH q AS (
SELECT RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID )
, ID
, Name
FROM ATable
)
DELETE FROM q WHERE RN > 1

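Lieven is right. If you want to remove the duplicates one row at a time, you can tweak Lieven's code by adding a TOP clause to the DELETE statement. Note that each execution then deletes only a single row, so it has to be repeated until no row with RN > 1 remains:
delete top(1) from q where RN > 1;
Hope this helps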

You may use this query:
delete a from
(select id,name, ROW_NUMBER() over (partition by id,name order by id) row_Count
from dup_table) a
where a.row_Count >1

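-- Multi-table DELETE ... USING syntax (MySQL style); SQL Server does not accept it.
-- It removes rows whose Name also appears under a different ID, so exact duplicates stay.
delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.Name=vtable.Name)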

DELETE FROM tbl
WHERE ID NOT IN (
SELECT MIN(ID)
FROM tbl
GROUP BY Name
)
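
The NOT IN / MIN(ID) form depends on ID being unique per row. A minimal sketch of that caveat follows, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption made so the example is runnable). The table name tbl comes from the answer above and the rows from the first question. On that data it appears to delete nothing, because every duplicate shares its ID with the row that should be kept:

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INTEGER, Name TEXT)")
rows = [(1, "a"), (1, "a"), (1, "a"), (2, "b"), (2, "b"),
        (3, "c"), (4, "d"), (4, "d")]
conn.executemany("INSERT INTO tbl (ID, Name) VALUES (?, ?)", rows)

# Each duplicate has the same ID as the row to keep, so every ID equals
# MIN(ID) of its Name group and NOT IN matches no row at all.
cur = conn.execute(
    "DELETE FROM tbl WHERE ID NOT IN (SELECT MIN(ID) FROM tbl GROUP BY Name)"
)
deleted = cur.rowcount
remaining = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
print(deleted, remaining)
```

The statement only helps when some column distinguishes the copies, which is exactly the extra identity column the question wants to avoid.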

Related

Delete specific record from multiple duplicates in the table

How do I delete a specific record from a set of duplicates?
Below is an example table.
This is just one example; we have many cases like this. From this table I need to delete the rows with rank 2 and 3.
Please suggest the best way to identify duplicate records and delete specific rows.
This should work:
delete t
from <your table> t
where t.[rank] != (select top (1) tt.[rank]
from <your table> tt
where tt.emp_id = t.emp_id
order by tt.[rank] desc --put asc if you want to keep the lowest rank
)
I do not encourage deleting records, but this approach works for either expiring or deleting them:
The table should have a unique ID and a field that lets you mark a record as expired. If it does not, I recommend adding them. You can create a composite ID in your query, but down the road you will wish you had these attributes.
Create a query that identifies every record where RANK <> 1. This will be your subquery.
Then write your UPDATE query:
UPDATE A
SET [EXPIRE_DTTM] = GETDATE()
FROM *TableNameWithTheRecords* A
INNER JOIN (*SubQuery*) B ON A.UniqueID = B.UniqueID
If you truly want to delete the records, use this (IN rather than =, because the subquery can return more than one row):
DELETE FROM *TableNameWithTheRecords*
WHERE *UniqueID* IN (SELECT *UniqueID* FROM *TableNameWithTheRecords* WHERE RANK <> 1)
WITH tbl_alias AS
(
SELECT emp_ID,
RN = ROW_NUMBER() OVER(PARTITION BY emp_ID ORDER BY emp_ID)
FROM tblName
)
DELETE FROM tbl_alias WHERE RN > 1

Deleting rows where the Primary key is duplicated - SQL

My issue is how do we delete a primary key row in case it is duplicated. The other fields may/may not be duplicates. I am interested only in the primary key being duplicated and would like to retain the first instance while deleting the other duplicate entries.
For example,
I have 2 tables with the following data:
Table1:- Portfolio
Columns:- PortfolioID(PK), PortfolioName
Sample data :-
1, North America
2, Europe
3, Asia
Table2:- Account
Columns:- AccountID(PK), PortfolioID(FK), AccountName
Sample data :-
1,1,Quake
1,1,Wind
2,1,Fire
3,1,Quake
4,2,Flood
5,2,Wind
Let's say, for PortfolioID = 1,
I am trying to delete row number 2 from the Account table, where AccountID 1 is repeated for PortfolioID = 1. I tried a CTE that uses ROW_NUMBER and then deleted the rows where ROWNUMBER <> 1, but the query doesn't work: it deletes all the rows in the table.
The query I tried:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [Account].[AccountID] ORDER BY [Account].[AccountID]) AS [ROWNUMBER],
[Account].[AccountID]
FROM [Account]
INNER JOIN [Portfolio] ON [Portfolio].[PortfolioID] = [Account]. [PortfolioID]
WHERE [Portfolio].[PortfolioID] = 1
)
DELETE [Account]
FROM [CTE]
WHERE [ROWNUMBER] <> 1
Am I doing something wrong in the query? Thanks in advance for the help.
Firstly, defining the AccountID column as the primary key in your database will prevent this kind of problem going forward.
Secondly, are you using SQL Server? Which version?
Assuming you are using SQL Server in a version recent enough to support window functions, you can try something like this to delete any duplicates that you have.
This will delete ALL copies of ALL duplicates:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY AccountID,PortfolioID)
FROM Account)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
This alternative script will keep one of the duplicates if that is what you prefer:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
)
DELETE FROM CTE WHERE RN<>1
Finally, if you want to only delete duplicates for Portfolio Id 1:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
Where PortfolioID = 1
)
DELETE FROM CTE WHERE RN<>1
A primary key column never allows duplicate entries.
Try with the below query for the desired result based on the given data/inputs.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY a.[AccountID],a.PortfolioID ORDER BY a.[AccountID]) AS [ROWNUMBER],*
FROM [Account] a
WHERE a.[PortfolioID] = 1
)
DELETE
FROM [CTE]
WHERE [ROWNUMBER] > 1
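
A likely reason the query in the question removes every row is that DELETE [Account] FROM [CTE] never ties Account to CTE, so the two behave like a cross join and a single CTE row with ROWNUMBER <> 1 qualifies all Account rows. The sketch below illustrates that reading with Python's built-in sqlite3 module and an in-memory SQLite database (3.25 or later for window functions) standing in for SQL Server. SQLite's rowid is used here as the row identity, which is an assumption of the sketch and not something SQL Server offers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (AccountID INTEGER, PortfolioID INTEGER, AccountName TEXT)"
)
conn.executemany("INSERT INTO Account VALUES (?, ?, ?)", [
    (1, 1, "Quake"), (1, 1, "Wind"), (2, 1, "Fire"),
    (3, 1, "Quake"), (4, 2, "Flood"), (5, 2, "Wind"),
])

# Same numbering idea as the question's CTE; rowid is SQLite-specific.
cte = """
    WITH CTE AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY rowid) AS rn
        FROM Account
        WHERE PortfolioID = 1
    )
"""

# No join condition: every Account row pairs with the one CTE row that
# has rn <> 1, so all six Account rows qualify.
unjoined = conn.execute(
    cte + "SELECT COUNT(DISTINCT a.rowid) FROM Account a, CTE WHERE CTE.rn <> 1"
).fetchone()[0]

# Tying each Account row to its own CTE row leaves only the real duplicate.
joined = conn.execute(
    cte + "SELECT a.AccountName FROM Account a "
          "JOIN CTE ON CTE.rid = a.rowid WHERE CTE.rn <> 1"
).fetchall()
print(unjoined, joined)
```

Deleting from the CTE itself, as the answers above do, avoids the problem because the numbered rows and the deleted rows are then the same rows.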

Writing a query for deleting duplicate records without deleting all duplicate rows

Suppose Table A has a column B like:
tbl A
------------------------
column B
10
10
20
20
50
50
40
Then how do I delete the duplicate records? E.g. the column above contains two "10"s, so only one of them should be deleted. That is the condition; how do I implement it?
If you use SQL Server (or another RDBMS that supports window functions), you can use ROW_NUMBER.
This works in SQL Server 2005 and later:
WITH CTE AS -- a common table expression, similar to a subquery, improves readability
(
SELECT ColumnB, RN = ROW_NUMBER() OVER (PARTITION BY ColumnB ORDER BY ColumnB)
FROM dbo.TblA
)
DELETE FROM CTE WHERE RN > 1
I like this approach because it's easy to use a SELECT instead to see what I'm going to delete.
This works with Oracle RDBMS
DELETE FROM table_A
WHERE ROWID IN (
SELECT rid
FROM (SELECT ROWID rid,
ROW_NUMBER () OVER (PARTITION BY B ORDER BY ROWID) rn
FROM table_A)
WHERE rn <> 1);
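
The ROWID pattern can be tried without an Oracle instance. The sketch below is an analogue rather than the Oracle statement itself: it uses Python's built-in sqlite3 module, SQLite's own rowid, and the column B values from the question, and it assumes SQLite 3.25 or later for window functions. It also previews the rows with a SELECT before deleting them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_A (B INTEGER)")
conn.executemany(
    "INSERT INTO table_A (B) VALUES (?)",
    [(10,), (10,), (20,), (20,), (50,), (50,), (40,)],
)

# Rows numbered 2, 3, ... within each group of equal B values are the extras.
extras = """
    SELECT rid FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY B ORDER BY rowid) AS rn
        FROM table_A
    ) WHERE rn <> 1
"""

# Preview first: the same subquery shows what the DELETE would remove.
preview = [b for (b,) in conn.execute(
    f"SELECT B FROM table_A WHERE rowid IN ({extras}) ORDER BY B")]

conn.execute(f"DELETE FROM table_A WHERE rowid IN ({extras})")
remaining = [b for (b,) in conn.execute("SELECT B FROM table_A ORDER BY B")]
print(preview, remaining)
```

Reusing one subquery for both the preview and the DELETE keeps the two from drifting apart.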

How to find first duplicate row in a table sql server

I am working on SQL Server. I have a table that contains around 75000 records, several of which are duplicates. So I wrote a query to find out how many times each record is repeated:
SELECT [RETAILERNAME],COUNT([RETAILERNAME]) as Repeated FROM [Stores] GROUP BY [RETAILERNAME]
It gives me a result like this:
RETAILERNAME | Repeated
---------------------------
X            | 4
Y            | 6
Z            | 10
Of the 4 records for X, I need to take only the first one.
In other words, I want to retrieve all fields from the first row of each group of duplicates: taking all records whose RETAILERNAME = 'X' returns several duplicate rows, and I need only the first of them.
Please guide me.
You could try using ROW_NUMBER.
Something like
;WITH Vals AS (
SELECT [RETAILERNAME],
ROW_NUMBER() OVER(PARTITION BY [RETAILERNAME] ORDER BY [RETAILERNAME]) RowID
FROM [Stores]
)
SELECT *
FROM Vals
WHERE RowID = 1
You can then also remove the duplicates if need be (BUT BE CAREFUL THIS IS PERMANENT)
;WITH Vals AS (
SELECT [RETAILERNAME],
ROW_NUMBER() OVER(PARTITION BY [RETAILERNAME] ORDER BY [RETAILERNAME]) RowID
FROM Stores
)
DELETE
FROM Vals
WHERE RowID > 1;
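You can write the query as below. SELECT * cannot be combined with GROUP BY, so filter on the retailer with WHERE instead:
SELECT TOP 1 * FROM [Stores]
WHERE [RETAILERNAME] = 'X' -- add ORDER BY to control which row counts as first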
WITH cte
AS (SELECT [retailername],
ROW_NUMBER()
OVER(
PARTITION BY [retailername]
ORDER BY [retailername]) AS RowRank
FROM [Stores])
SELECT *
FROM cte
WHERE RowRank = 1
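
To return every field of the first row per retailer, the CTE can select all columns and be filtered to row number 1. A minimal sketch follows, using Python's built-in sqlite3 module with an in-memory SQLite database (3.25 or later) in place of SQL Server. The City column and its values are hypothetical, since the question does not list the real columns, and "first" is taken to mean insertion order via SQLite's rowid, which is also an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# City is a hypothetical extra column; the question does not show the real ones.
conn.execute("CREATE TABLE Stores (RETAILERNAME TEXT, City TEXT)")
conn.executemany("INSERT INTO Stores VALUES (?, ?)", [
    ("X", "Austin"), ("Y", "Boston"), ("X", "Chicago"),
    ("Y", "Denver"), ("Z", "El Paso"),
])

# Number the rows inside each retailer group and keep only number 1.
first_rows = conn.execute("""
    WITH Vals AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY RETAILERNAME ORDER BY rowid) AS rn
        FROM Stores
    )
    SELECT RETAILERNAME, City
    FROM Vals
    WHERE rn = 1
    ORDER BY RETAILERNAME
""").fetchall()
print(first_rows)
```

In SQL Server the ORDER BY inside OVER would need a real column that defines which row is first, for example a created date or an identity column.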

Remove duplicates if you have only one column with value

This is easy if you have an Id column alongside a Value column that contains duplicates. But in an interview I was asked how to remove the duplicates if you have only the Value column. For example:
table_a input:
Value
A
A
B
A
C
D
D
E
F
F
E
table_a output:
Value
A
B
C
D
E
F
Question: You have a table with only one column, Value, and you have to delete the duplicate rows so that each value remains exactly once (as in the output above).
If you are allowed to use a CTE:
with cte as (
select
row_number() over(partition by Value order by Value) as row_num,
Value
from Table1
)
delete from cte where row_num > 1
As t-clausen.dk suggested in the comments, you don't even need Value inside the CTE:
with cte as (
select
row_number() over(partition by Value order by Value) as row_num
from Table1
)
delete from cte where row_num > 1;
Well, how about using a CTE?
A common table expression (CTE) can be thought of as a temporary
result set that is defined within the execution scope of a single
SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is
similar to a derived table in that it is not stored as an object and
lasts only for the duration of the query. Unlike a derived table, a
CTE can be self-referencing and can be referenced multiple times in
the same query.
and ROW_NUMBER.
Returns the sequential number of a row within a partition of a result
set, starting at 1 for the first row in each partition.
Something like
;WITH Vals AS (
SELECT [Value],
ROW_NUMBER() OVER(PARTITION BY [Value] ORDER BY [Value]) RowID
FROM MyTable
)
DELETE
FROM Vals
WHERE RowID > 1
select distinct value
from your_table
(not relevant anymore as the question has been updated by the user, see Roman Pekar's answer)