delete duplicate sqlite duplicates using temp table - sql

As you can see below, I'm able to select all the row_numbers that are duplicates. I identified them using a window function ROW_NUMBER()
Although I want to delete them from the database.
How can I change my code to remove the duplicates identified, as I'm currently getting an error
WITH RowNumCTE AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY ParcelID,
PropertyAddress,
SalePrice,
SaleDate,
LegalReference
ORDER BY
UniqueID
) row_num
FROM housing_data
)
SELECT *
FROM RowNumCTE
WHERE row_num > 1
Duplicates are identified as having a row_number greater than 1.
Thanks

I found the solution. I used
DELETE FROM housing_data
WHERE ROWID NOT IN (
SELECT MIN(ROWID)
FROM housing_data
GROUP BY ParcelID, PropertyAddress, SalePrice, SaleDate, LegalReference
);

Related

Distinct rows in a table in sql

I have a table with multiple rows of the same member id. I need only distinct rows based on 2 unique columns
Ex: there are 100 different customers, the table has 1000 rows because every customer has multiple cities and segments assigned to him.
I need 100 distinct rows for these customers depending on a unique segment and city combination. There is no specific requirement for this combination, just the first from the table is fine.
So, currently the table is somewhat like this,
Hope this helps.
use row_number()
select * from (select *,row_number() over(partition by memberid order by sales) rn
from table_name
) a where a.rn=1
Handy sql-server top(1) with ties syntax for that
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by sales)
As you have no paticular requirement for which exactly row to select, any column will do at order by, it can be null as well
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by (select null))
The simplest way to do this is to use the ROW_NUMBER() OVER(GROUP BY...) syntax. You have no need to use an order by, since you want an arbitrary row, but only one, for each member.
Since you need only the expected data, and not the Row_Number value, make sure that you detail the fields returned, like below:
SELECT
MemberId,
city,
segment,
sales
FROM (
SELECT *
ROW_NUMBER() OVER (GROUP BY MemberId) as Seq
FROM [Status]
) src
WHERE Seq = 1

Select an id from a group sql which returns duplicate records?

I have to select CompanyId column only from the following SQL;
select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Data_DB].[dbo].[Company] where CompanyCode='ASAAA'
In the SQL, I try to figure out duplicate records, and from another table i want to delete some records based on the CompanyId from above query.
that is;
delete from [[dbo].ObservationData
where CompanyId in (select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Data_DB].[dbo].[Company] where CompanyCode='ASAAA')
How can I modify above query?
Assuming you don't care which duplicate gets retained or deleted, you may try using a deletable CTE here:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [GradeName] ORDER BY [TankNumber]) rn
FROM [Data_DB].[dbo].[Company]
WHERE CompanyCode = 'ASAAA'
)
DELETE
FROM cte
WHERE rn > 1;
This answer arbitrarily retains the "first" duplicate, with first being defined as the record with the earliest row number.
delete from [[dbo].ObservationData
where CompanyId in (select CompanyId from (select CompanyId,
row_number() over (partition by [GradeName] order by [TankNumber] ) rn
from [Datat_DB].[dbo].[Company] where CompanyCode='ASAAA') a where rn > 1 ;

Delete duplicates but keep 1 with multiple column key

I have the following SQL select. How can I convert it to a delete statement so it keeps 1 of the rows but deletes the duplicate?
select s.ForsNr, t.*
from [testDeleteDublicates] s
join (
select ForsNr, period, count(*) as qty
from [testDeleteDublicates]
group by ForsNr, period
having count(*) > 1
) t on s.ForsNr = t.ForsNr and s.Period = t.Period
Try using following:
Method 1:
DELETE FROM Mytable WHERE RowID NOT IN (SELECT MIN(RowID) FROM Mytable GROUP BY Col1,Col2,Col3)
Method 2:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY ForsNr, period
ORDER BY ( SELECT 0)) RN
FROM testDeleteDublicates)
DELETE FROM cte
WHERE RN > 1
Hope this helps!
NOTE:
Please change the table & column names according to your need!
This is easy as long as you have a generated primary key column (which is a good idea). You can simply select the min(id) of each duplicate group and delete everything else - Note that I have removed the having clause so that the ids of non-duplicate rows are also excluded from the delete.
delete from [testDeleteDublicates]
where id not in (
select Min(Id) as Id
from [testDeleteDublicates]
group by ForsNr, period
)
If you don't have an artificial primary key you may have to achieve the same effect using row numbers, which will be a bit more fiddly as their implementation varies from vendor to vendor.
You can do with 2 option.
Add primary-key and delete accordingly
http://www.mssqltips.com/sqlservertip/1103/delete-duplicate-rows-with-no-primary-key-on-a-sql-server-table/
'2. Use row_number() with partition option, runtime add row to each row and then delete duplicate row.
Removing duplicates using partition by SQL Server
--give group by field in partition.
;with cte(
select ROW_NUMBER() over( order by ForsNr, period partition ForsNr, period) RowNo , * from [testDeleteDublicates]
group by ForsNr, period
having count(*) > 1
)
select RowNo from cte
group by ForsNr, period

Delete field Duplicates from the Same table

I am writing this query to display a bunch of Names from a table filled automatically from an outside source:
select MAX(UN_ID) as [ID] , MAX(UN_Name) from UnavailableNames group by (UN_Name)
I have a lot of name duplicates, so I used "Group by"
I want to delete all the duplicates right after I do this select query..
(Delete where the field UN_Name is available twice, leave it once)
Any way to do this?
Something likes this should work:
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY UN_Name
ORDER BY UN_ID ASC), *
FROM dbo.UnavailableNames
)
DELETE FROM cte
WHERE rn > 1
You basically assign an increasing "row number" within each group that shares the same "un_name".
Then you just delete all rows which have a "row number" higher than 1 and keep all the ones that appeared first.
With CTE As
(
Select uid,ROW_NUMBER() OVER( PARTITION BY uname order by uid) as rownum
From yourTable
)
Delete
From yourTable
where uid in (select uid from CTE where rownum> 1 )

How do I delete duplicate rows in SQL Server using the OVER clause?

Here are the columns in my table:
Id
EmployeeId
IncidentRecordedById
DateOfIncident
Comments
TypeId
Description
IsAttenIncident
I would like to delete duplicate rows where EmployeeId, DateOfIncident, TypeId and Description are the same - just to clarify - I do want to keep one of them. I think I should be using the OVER clause with PARTITION, but I am not sure.
Thanks
If you want to keep one row of the duplicate-groups you can use ROW_NUMBER. In this example i keep the row with the lowest Id:
WITH CTE AS
(
SELECT rn = ROW_NUMBER()
OVER(
PARTITION BY employeeid, dateofincident, typeid, description
ORDER BY Id ASC), *
FROM dbo.TableName
)
DELETE FROM cte
WHERE rn > 1
use this query without using CTE....
delete a from
(select id,name,place, ROW_NUMBER() over (partition by id,name,place order by id) row_Count
from dup_table) a
where a.row_Count >1
You can use the following query. This has an assumption that you want to keep the latest row and delete the other duplicates.
DELETE [YourTable]
FROM [YourTable]
LEFT OUTER JOIN (
SELECT MAX(ID) as RowId
FROM [YourTable]
GROUP BY EmployeeId, DateOfIncident, TypeId, Description
) as KeepRows ON
[YourTable].ID = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL