T-SQL Delete half of duplicates with no primary key

In a T-SQL stored procedure I have a complex process that compares data using temp tables, but when I return a single table at the end I end up with duplicate rows. In these rows every column is EXACTLY the same, and the table has no primary key. I need to delete only half of them, based on the number of times each row occurs. For example, if there are eight identical rows, I want to delete four of them.
There is no way to get rid of them through filtering in my SP, because the data is literally duplicate information entered by the user, but I do require half of that information.
I've done some research and testing, but it seems it's not possible to delete half of the duplicated rows. Is it really not possible, or is there a way?

Here is one way, using a great feature of SQL Server, updatable CTEs:
with todelete as (
select t.*,
row_number() over (partition by col1, col2, col3, . . . order by newid()) as seqnum
from table t
)
delete from todelete
where seqnum % 2 = 0;
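This deletes every other row within each group of identical rows, so a group of eight keeps four. A group with an odd count keeps the extra row.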
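To see the effect on a small case, here is a minimal T-SQL sketch. The temp table #Dupes and its columns col1 and col2 are made up for illustration and are not from the question; the delete itself is the same updatable-CTE pattern.

```sql
-- Hypothetical temp table with no primary key.
CREATE TABLE #Dupes (col1 INT, col2 VARCHAR(10));

-- Eight identical (1, 'a') rows and three identical (2, 'b') rows.
INSERT INTO #Dupes (col1, col2)
VALUES (1, 'a'), (1, 'a'), (1, 'a'), (1, 'a'),
       (1, 'a'), (1, 'a'), (1, 'a'), (1, 'a'),
       (2, 'b'), (2, 'b'), (2, 'b');

-- Number the rows inside each group of identical rows,
-- then delete the even-numbered ones.
WITH todelete AS (
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY col1, col2
                              ORDER BY (SELECT NULL)) AS seqnum
    FROM #Dupes d
)
DELETE FROM todelete
WHERE seqnum % 2 = 0;

-- Remaining counts: (1, 'a') has 4 rows, (2, 'b') has 2 rows.
SELECT col1, col2, COUNT(*) AS remaining
FROM #Dupes
GROUP BY col1, col2;

DROP TABLE #Dupes;
```

Since the rows are identical, it does not matter which ones are numbered even, so ORDER BY (SELECT NULL) is used here in place of newid().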

Assuming SQL Server 2005+:
;WITH CTE AS
(
SELECT *,
RN=ROW_NUMBER() OVER(PARTITION BY Col1, Col2,...Coln ORDER BY Col1)
FROM YourTempTableHere
)
DELETE FROM CTE
WHERE RN % 2 = 0 -- every second row of each duplicate group

Related

How to delete duplicate records (SQL) when there is no identification column?

It is a data warehouse project in which I load a table where each column refers to another table. The problem is that, due to an error in the process, many duplicate records were loaded (approximately 13,000). They do not have a unique identifier, so they are exactly the same. Is there a way to delete only one of the duplicate records so that I don't have to delete everything and repeat the table loading process?
You can use row_number() and a cte:
with cte as (
select row_number() over(
partition by col1, col2, ...
order by (select null)) rn
from mytable
)
delete from cte where rn > 1
The window function guarantees that the same number will not be assigned twice within a partition. You need to list all of the table's columns in the partition by clause.
If you are going to delete a significant part of the rows, then it might be simpler to empty and reload the table (SQL Server syntax shown; other DBMS use create table ... as select):
select distinct * into tmptable from mytable;
truncate table mytable; -- back it up first!
insert into mytable select * from tmptable;
drop table tmptable;
You can use row_number() to delete the duplicate rows by first partitioning them and then ordering by one of the columns within that partition.
You have to list all your columns in partition by if records are completely identical.
WITH CTE1 AS (
SELECT A.*
, ROW_NUMBER() OVER (PARTITION BY CODDIMALUMNO, (OTHER COLUMNS) ORDER BY CODDIMALUMNO) RN
FROM TABLE1 A
)
DELETE FROM CTE1
WHERE RN > 1;
You can use row_number() and an updatable CTE:
with todelete as (
select t.*, row_number() over (partition by . . . order by (select null)) as seqnum
from t
)
delete from todelete
where seqnum > 1;
The . . . is for the columns that define duplicates.
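Before running a delete like this against a real table, it may help to try the pattern inside a transaction and check that no duplicate groups remain. A minimal T-SQL sketch, assuming a made-up temp table #Fact with columns col1 and col2 (neither appears in the question):

```sql
-- Hypothetical table with fully duplicated rows and no identifier.
CREATE TABLE #Fact (col1 INT, col2 INT);

INSERT INTO #Fact (col1, col2)
VALUES (10, 1), (10, 1), (10, 1), (20, 2), (30, 3), (30, 3);

BEGIN TRANSACTION;

-- Keep the first row of every group and delete the rest.
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2
                              ORDER BY (SELECT NULL)) AS rn
    FROM #Fact
)
DELETE FROM cte
WHERE rn > 1;

-- Returns no rows if every duplicate group was reduced to one row.
SELECT col1, col2, COUNT(*) AS cnt
FROM #Fact
GROUP BY col1, col2
HAVING COUNT(*) > 1;

-- Inspect the result, then COMMIT to keep the change
-- or use ROLLBACK TRANSACTION instead to undo it.
COMMIT TRANSACTION;

DROP TABLE #Fact;
```

With the sample data above, three rows are left: one each for (10, 1), (20, 2) and (30, 3).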

Delete registers with rownumber greater than specified for each group got from sql

I have a table of people with their comments on a blog. I need to keep the last 10 comments for each person in the table and delete the older ones. Let's say the columns are:
personId
commentId
dateFromComment
I know how to do it with several queries, but not with just one query (subqueries are allowed) that works on any database.
With:
select personId from PeopleComments
group by personId
having count(*) >10
I would get the IDs of the people who have more than 10 comments, but I don't know how to get the comment IDs from there and delete them.
Thanks!
In my other answer the DBMS must find and count rows for every row in the table. This can be slow. It would be better to find all rows we want to keep once and then delete the others. Hence this additional answer.
The following works for Oracle as of version 12c:
delete from peoplecomments
where rowid not in
(
select rowid
from peoplecomments
order by case when row_number() over (partition by personid order by datefromcomment desc) <= 10 then 0 else 1 end
fetch first 1 row with ties -- keeps every row ranked 1 to 10 for its person
);
Apart from ROWID this is standard SQL.
In other DBMS that support window functions and FETCH WITH TIES:
If your table has a single-column primary key, you'd replace ROWID with it.
If your table has a composite primary key, you'd use where (col1, col2) not in (select col1, col2 ...) provided your DBMS supports this syntax.
You need a correlated subquery that counts the newer comments for the same person:
delete from peoplecomments pc
where
(
select count(*)
from peoplecomments pc2
where pc2.personid = pc.personid
and pc2.datefromcomment > pc.datefromcomment
) >= 10; -- at least 10 newer comments for the person
BTW: While it seems we could simply number our rows and delete accordingly via
delete from
(
select
pc.*, row_number() over (partition by personid order by datefromcomment desc) as rn
from peoplecomments pc
)
where rn > 10;
Oracle doesn't allow this and gives us ORA-01732: data manipulation operation not legal on this view.
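For comparison, SQL Server does accept the numbered-rows approach when it is written as an updatable CTE. A minimal T-SQL sketch, assuming a temp table #PeopleComments that mirrors the columns from the question and made-up sample data:

```sql
-- Hypothetical copy of the table from the question.
CREATE TABLE #PeopleComments (
    personId        INT,
    commentId       INT,
    dateFromComment DATETIME
);

-- 12 comments for person 1 and 3 comments for person 2 (made-up data).
INSERT INTO #PeopleComments (personId, commentId, dateFromComment)
SELECT 1, n, DATEADD(DAY, n, '20200101')
FROM (VALUES (1), (2), (3), (4), (5), (6),
             (7), (8), (9), (10), (11), (12)) AS nums(n)
UNION ALL
SELECT 2, 100 + n, DATEADD(DAY, n, '20200101')
FROM (VALUES (1), (2), (3)) AS nums(n);

-- Number each person's comments from newest to oldest
-- and delete everything after the 10th.
WITH ranked AS (
    SELECT pc.*,
           ROW_NUMBER() OVER (PARTITION BY personId
                              ORDER BY dateFromComment DESC) AS rn
    FROM #PeopleComments pc
)
DELETE FROM ranked
WHERE rn > 10;

-- Person 1 keeps 10 comments (commentId 3 to 12); person 2 keeps all 3.
SELECT personId, COUNT(*) AS remaining
FROM #PeopleComments
GROUP BY personId;

DROP TABLE #PeopleComments;
```

This is T-SQL only, so it does not meet the "any database" requirement by itself.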

How to find duplicates from Unique code column and delete the rows they're attached to, while still keeping the original row?

I have a table in my Azure SQL Server named dbo.SQL_Transactional, with columns named code, saledate, and branchcode.
code is my primary key, so if there are ever 2 or more rows with the same code, they are duplicates and need to be deleted. How can I do so?
I don't need to worry about whether saledate or branchcode are duplicated, because a duplicated code is all I need to identify and delete the entire duplicate row.
If you just want to delete the duplicate rows, then try an updatable CTE:
with todelete as (
select t.*, row_number() over (partition by code order by code) as seqnum
from t
)
delete from todelete
where seqnum > 1;
If you just wanted to select one row, then you would use where seqnum = 1.

How to retrieve specific rows from SQL Server table?

I was wondering whether there is a way to retrieve, for example, the 2nd and 5th rows from a SQL table that contains 100 rows.
I saw some solutions with a WHERE clause, but they all assume that the column on which the WHERE clause is applied is linear, starting at 1.
Is there another way to query a SQL Server table for specific rows when the table doesn't have a column whose values start at 1?
P.S. I know of a solution with temporary tables, where you copy your SELECT output and add a linear column to the table. I am using T-SQL.
Try this,
SELECT * FROM (
SELECT
*, ROW_NUMBER() OVER (ORDER BY ColumnName ASC) AS rownumber
FROM TableName
) as temptablename
WHERE rownumber IN (2,5)
With SQL Server:
; WITH Base AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY id) RN FROM YourTable
)
SELECT *
FROM Base WHERE RN IN (2, 5)
Replace id with your primary key or whatever column defines your ordering, and YourTable with your table name.
It's a CTE (Common Table Expression) so it isn't a temporary table. It's something that will be expanded together with your query.
There is no 2nd or 5th row in the table.
There is only the 2nd or 5th result in a resultset that you return, as determined by the order you specify in that query.
If you are on SQL Server 2005 or above, you could use the Row_Number() function. For example:
;With CTE as (
select col1, ..., row_number() over (order by yourOrderingCol) rn
from yourTable
)
select col1,...
from cte
where rn in (2,5)
Please note that yourOrderingCol will decide the value of row number (i.e. rn).
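A small worked case may make the point about ordering concrete. This is a minimal T-SQL sketch; the temp table #Items and its name column are made up for illustration:

```sql
-- Hypothetical table with no numeric or sequential column.
CREATE TABLE #Items (name VARCHAR(20));

INSERT INTO #Items (name)
VALUES ('cherry'), ('apple'), ('fig'),
       ('banana'), ('elderberry'), ('date');

-- The row numbers exist only in the result set and follow the
-- ORDER BY inside OVER(), not the order the rows were inserted in.
WITH Base AS (
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY name) AS RN
    FROM #Items
)
SELECT name, RN
FROM Base
WHERE RN IN (2, 5)
ORDER BY RN;
-- Returns banana (RN = 2) and elderberry (RN = 5).

DROP TABLE #Items;
```

Changing the column in ORDER BY changes which rows count as the 2nd and 5th.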

How to delete duplicate record where PK is uniqueidentifier field

I want to know how to remove duplicate records when the PK is a uniqueidentifier.
I have to delete records on the basis of duplicate values in a set of fields. One option is to load a temp table using Row_Number() and delete every record except row number one.
But I wanted to build a one-line query. Any suggestions?
You can use a CTE to do this. Without seeing your table structure, here is the basic SQL:
;with cte as
(
select *, row_number() over(partition by yourfields order by yourfields) rn
from yourTable
)
delete
from cte
where rn > 1
delete from table t using table ta where ta.dup_field = t.dup_field and t.pk > ta.pk;