Delete duplicate rows not based on primary key - sql

I have this table in my database:
tblAgencies
----------------------
AgencyID (PK)
VendorID
RegionID
Name
Zip
Long story short, I accidentally copied my entire table into itself - so every row in my table has a duplicate.
But with my AgencyID field being the identity, and automatically incrementing, I need to find duplicates based on all the other fields, since AgencyID is unique.
Does anyone know how I can do this?

This keeps the row with the lowest (oldest) AgencyID in each group of identical rows and deletes the other duplicates.
;WITH x AS
(
SELECT *, rn = ROW_NUMBER() OVER
(PARTITION BY VendorID, RegionID, Name, Zip
ORDER BY AgencyID) FROM dbo.tblAgencies
)
DELETE x WHERE rn > 1;
Be careful, though; this may not work if other tables reference AgencyID and they've obtained any of your newer, erroneous values.
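Before deleting, it may be worth checking whether any other table already points at one of the newer AgencyID values. Here is a rough sketch of such a check, run against an in-memory SQLite database from Python rather than SQL Server. The child table tblContacts and the sample rows are hypothetical; they are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblAgencies (
    AgencyID INTEGER PRIMARY KEY AUTOINCREMENT,
    VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT
);
-- tblContacts is a hypothetical child table, not part of the question
CREATE TABLE tblContacts (
    ContactID INTEGER PRIMARY KEY,
    AgencyID INTEGER REFERENCES tblAgencies(AgencyID)
);
INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES
    (1, 10, 'Acme', '12345'),
    (2, 20, 'Globex', '67890');
-- simulate the accidental copy of the table into itself
INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip)
    SELECT VendorID, RegionID, Name, Zip FROM tblAgencies;
-- a child row that points at one of the newer, duplicate AgencyID values
INSERT INTO tblContacts (ContactID, AgencyID) VALUES (1, 3);
""")

# Child rows whose AgencyID is not the lowest AgencyID of its duplicate group,
# i.e. rows that would be orphaned by deleting the newer duplicates
referenced = conn.execute("""
    SELECT c.ContactID, c.AgencyID
    FROM tblContacts c
    WHERE c.AgencyID NOT IN (
        SELECT MIN(AgencyID)
        FROM tblAgencies
        GROUP BY VendorID, RegionID, Name, Zip)
""").fetchall()
print(referenced)
```

If this returns any rows, those references would need to be repointed at the surviving AgencyID before the delete.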

The simplest solution: SELECT DISTINCT the non-key columns (everything except AgencyID) into a temp table, then empty the original table and reload it from the temp table.
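A minimal sketch of that temp-table idea, using an in-memory SQLite database from Python instead of SQL Server. The sample rows and the temp table name distinct_agencies are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblAgencies (
    AgencyID INTEGER PRIMARY KEY AUTOINCREMENT,
    VendorID INTEGER, RegionID INTEGER, Name TEXT, Zip TEXT
);
INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip) VALUES
    (1, 10, 'Acme', '12345'),
    (2, 20, 'Globex', '67890');
-- simulate the accidental copy of the table into itself
INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip)
    SELECT VendorID, RegionID, Name, Zip FROM tblAgencies;
""")

with conn:  # commit all steps together, or roll back if one of them fails
    # 1. distinct combinations of the non-key columns (AgencyID is left out)
    conn.execute("""
        CREATE TEMP TABLE distinct_agencies AS
        SELECT DISTINCT VendorID, RegionID, Name, Zip FROM tblAgencies
    """)
    # 2. empty the original table
    conn.execute("DELETE FROM tblAgencies")
    # 3. reload it from the temp table
    conn.execute("""
        INSERT INTO tblAgencies (VendorID, RegionID, Name, Zip)
        SELECT VendorID, RegionID, Name, Zip FROM distinct_agencies
    """)
    conn.execute("DROP TABLE distinct_agencies")

rows = conn.execute(
    "SELECT VendorID, RegionID, Name, Zip FROM tblAgencies ORDER BY VendorID"
).fetchall()
print(rows)
```

Note that the reload hands out fresh AgencyID values, so this route only fits when nothing else refers to the old ones.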

This query returns the duplicates, i.e. every row for which an otherwise identical row with a lower AgencyID exists:
select * from mytable t1
where exists
(select * from mytable t2
where t1.VendorID = t2.VendorID
and t1.RegionID = t2.RegionID
and t1.Name = t2.Name
and t1.Zip = t2.Zip
and t1.AgencyID > t2.AgencyID)

This should give you all the rows that have duplicate values except for the minimum agencyid row.
select *
from tblAgencies
where AgencyID not in (select min(AgencyID)
from tblAgencies
group by VendorID, RegionID, Name, Zip)

;with CTE
AS
(
SELECT ID_Column, rn = ROW_NUMBER() OVER (PARTITION BY Column1, Column2, Column3... ORDER BY ID_Column ASC)
FROM T
)
DELETE FROM CTE
WHERE rn >= 2

;with CTE
AS
(SELECT MAX(AgencyID) AgentID,VendorID ,
RegionID ,
Name ,
Zip FROM tblAgencies
GROUP BY VendorID ,
RegionID ,
Name ,
Zip
HAVING COUNT(*) > 1)
DELETE FROM tblAgencies WHERE EXISTS (SELECT 1 FROM CTE
WHERE AgentID = tblAgencies.AgencyID)

There are lots of answers here that will give you what you want, but there's no need to use a CTE or do any grouping. The simplest way is just:
delete t1
from tblAgencies t1
join tblAgencies t2
on t1.VendorId = t2.VendorId
and t1.RegionId = t2.RegionId
and t1.Name = t2.Name
and t1.Zip = t2.Zip
and t1.AgencyId > t2.AgencyId

Maybe this will help: How to delete duplicates in the presence of a primary key?

Related

PostgreSQL how to delete duplicated values

I have a table in my Postgres database on which I forgot to create a unique index. Because of the missing index, I now have duplicated values. How do I remove the duplicated values? I want to add a unique index on the fields translationset_id and key.
I think you are asking for this:
DELETE FROM tablename
WHERE id IN (SELECT id
FROM (SELECT id,
ROW_NUMBER() OVER (partition BY column1, column2, column3 ORDER BY id) AS rnum
FROM tablename) t
WHERE t.rnum > 1);
It appears that you want to delete records which are duplicates with regard to the translationset_id and key columns. In this case, we can use Postgres' row number functionality to tell the duplicate rows apart, and then delete all but one row per group. The question does not name a unique id column, so the rows are matched on the system column ctid:
WITH cte AS
(
SELECT t.ctid AS row_ctid, ROW_NUMBER() OVER (PARTITION BY translationset_id, key) AS rnum
FROM yourTable t
)
DELETE FROM yourTable
WHERE ctid IN (SELECT row_ctid FROM cte WHERE rnum > 1)
I think the most efficient way to do this is below.
DELETE FROM
table_name a
USING table_name b
WHERE
a.id < b.id and
a.same_column = b.same_column;
delete from mytable
where exists (select 1
from mytable t2
where t2.name = mytable.name and
t2.address = mytable.address and
t2.zip = mytable.zip and
t2.ctid > mytable.ctid
);
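Once the duplicates are gone, the unique index from the question can be created so they cannot come back. A small sketch of the whole sequence, run on an in-memory SQLite database from Python rather than Postgres. The table name translations, the id and value columns, the index name and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE translations (
    id INTEGER PRIMARY KEY,
    translationset_id INTEGER,
    "key" TEXT,
    value TEXT
);
INSERT INTO translations (translationset_id, "key", value) VALUES
    (1, 'greeting', 'hello'),
    (1, 'greeting', 'hello'),
    (1, 'farewell', 'bye'),
    (2, 'greeting', 'hola');
-- keep the lowest id in each (translationset_id, key) group
DELETE FROM translations
WHERE id NOT IN (
    SELECT MIN(id) FROM translations GROUP BY translationset_id, "key");
-- this would fail if any duplicates were still present
CREATE UNIQUE INDEX ux_translations_set_key
    ON translations (translationset_id, "key");
''')

remaining = conn.execute("SELECT COUNT(*) FROM translations").fetchone()[0]

# With the index in place, inserting a duplicate pair is rejected
try:
    conn.execute(
        'INSERT INTO translations (translationset_id, "key", value) '
        "VALUES (1, 'greeting', 'hi')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(remaining, duplicate_rejected)
```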

How to get duplicate text values from SQL query

I have to get a result set containing only the rows with duplicate text values, using a SQL query. I have used HAVING COUNT(columnname) > 1, but instead of only the duplicate values I'm getting all values.
Can anyone suggest whether I have to add anything to my query?
Thanks.
Use the query below. Put the column that contains the duplicates in the PARTITION BY clause.
with CTE_1
AS
(SELECT *,COUNT(1) OVER(PARTITION BY LTRIM(RTRIM(REPLACE(yourDuplicateColumn,' ','')))) cnt
FROM YourTable
)
SELECT *
FROM CTE_1
WHERE cnt>1
Assuming id is a primary key
select *
from myTable t1
where exists (select 1
from myTable t2
where t2.text = t1.text and t2.id != t1.id)
You can use a query similar to the following:
SELECT
column1, COUNT(*)
FROM table
GROUP BY column1
HAVING COUNT(*) > 1
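The GROUP BY / HAVING query returns one line per duplicated value. If the full rows are wanted, the result can be joined back to the table. A quick sketch on an in-memory SQLite database from Python; the table and column names follow the myTable example above, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (id INTEGER PRIMARY KEY, text TEXT);
INSERT INTO myTable (id, text) VALUES
    (1, 'apple'), (2, 'pear'), (3, 'apple'), (4, 'plum');
""")

# One line per value that occurs more than once
dup_values = conn.execute("""
    SELECT text, COUNT(*)
    FROM myTable
    GROUP BY text
    HAVING COUNT(*) > 1
""").fetchall()

# Full rows for those values, obtained by joining back to the table
dup_rows = conn.execute("""
    SELECT t.id, t.text
    FROM myTable t
    JOIN (SELECT text FROM myTable GROUP BY text HAVING COUNT(*) > 1) d
      ON d.text = t.text
    ORDER BY t.id
""").fetchall()

print(dup_values)
print(dup_rows)
```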

TSQL merge 2 datasets with an equal number of rows next to each other

What I am trying to accomplish:
Dataset 1
Name1
Name2
Name3
Dataset 2
Number1
Number2
Number3
will become 2 columns:
dataset1 dataset2
Name1 Number1
Name2 Number2
Name3 Number3
My datasets 1 & 2 will always have equal rows.
Which name linked to which number I don't care as long as two names are not linked to the same number and vice versa.
How can I solve this with SQL / SQL Server ?
If you don't want to add an identity column to the tables, you can use the ROW_NUMBER() function like this:
SELECT
T1.Col1,
T2.Col1
FROM
(SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table1) T1
INNER JOIN
(SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table2) T2
ON T1.N = T2.N
Here, replace Table1 and Table2 with the name of your tables, and replace Col1 with the name of the column (or columns) that you want to output from the two tables.
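To try the ROW_NUMBER() pairing without a SQL Server instance, here is a small sketch that runs the same query on an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer (needed for window functions); the sample rows mirror the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Col1 TEXT);
CREATE TABLE Table2 (Col1 TEXT);
INSERT INTO Table1 VALUES ('Name1'), ('Name2'), ('Name3');
INSERT INTO Table2 VALUES ('Number1'), ('Number2'), ('Number3');
""")

# Number the rows of each table independently, then join on that number
pairs = conn.execute("""
    SELECT T1.Col1, T2.Col1
    FROM (SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table1) T1
    INNER JOIN
         (SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS N FROM Table2) T2
      ON T1.N = T2.N
    ORDER BY T1.N
""").fetchall()

for name, number in pairs:
    print(name, number)
```

Because both sides are numbered 1..n, every name is paired with exactly one number as long as the two tables have the same row count.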
Add an identity column to both tables and join on those columns:
ALTER TABLE Table1
ADD ID INT IDENTITY(1,1) NOT NULL
ALTER TABLE Table2
ADD ID INT IDENTITY(1,1) NOT NULL
SELECT Table1.dataset1col , Table2.dataset2Col
From Table1 INNER JOIN Table2
ON Table1.ID = Table2.ID
This may work for you :
;WITH cte1 (name, rn)
AS (SELECT Name,
row_number()
OVER(
ORDER BY Name) rn
FROM Dataset1),
cte2 (Number, rn)
AS (SELECT Number,
row_number()
OVER(
ORDER BY Number) rn
FROM Dataset2)
SELECT name,
Number
FROM cte1
JOIN cte2
ON cte1.rn = cte2.rn
WITH Table1 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Dataset1) as Rnk, Dataset1
FROM TA1
),
Table2 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Dataset2) as Rnk, Dataset2
FROM TA2
)
Select Table1.Dataset1 as 'DataSet1', Table2.DataSet2 as 'DataSet2'
From Table1
inner join Table2 on Table1.Rnk = Table2.Rnk
You didn't give the table names, so I assumed TA1 and TA2.
Another way of writing the query is:
select row_number() over (order by Names asc) as rownum,
Names
into #Temp1
from NameTable
select row_number() over (order by Numbers asc) as rownum,
Numbers
into #Temp2
from NumberTable
select Names, Numbers
from #Temp1
inner join #Temp2 on #Temp1.rownum = #Temp2.rownum
There are 3 possible solutions to this.
First: Use the following trick (warning: use this only for small datasets)
SELECT DISTINCT tbl1.col1, tbl2.col2
FROM
(SELECT FirstName AS col1, ROW_NUMBER() OVER (ORDER BY FirstName) Number FROM dbo.[User]) tbl1
INNER JOIN
(SELECT LastName AS col2, ROW_NUMBER() OVER (ORDER BY LastName) Number FROM dbo.[User]) tbl2
ON tbl1.Number = tbl2.Number
Second: Use table variables to store the results temporarily. This solution is for relatively large datasets (on the order of hundreds of records).
Third:
Use an identity field in both tables, as already mentioned by mmhasannn. But I prefer this method least, as it requires modifying the DB structure.
RECOMMENDED: Use the table variables approach

SQL query: how to distinct count of a column group by another column

In my table I need to know if each ID has one and only one ID_name. How can I write such query?
I tried:
select ID, count(distinct ID_name) as count_name
from table
group by ID
having count_name > 1
But it takes forever to run.
Any thoughts?
select ID
from YourTable
group by
ID
having count(distinct ID_name) > 1
or
select *
from YourTable yt1
where exists
(
select *
from YourTable yt2
where yt1.ID = yt2.ID
and yt1.ID_Name <> yt2.ID_Name
)
Now, most ID columns are defined as primary key and are unique. So in a regular database you'd expect both queries to return an empty set.
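Since the complaint was run time, two things that might be worth trying are an index on (ID, ID_name) and a MIN/MAX comparison in place of COUNT(DISTINCT ...). For non-NULL names, a group has more than one distinct name exactly when its MIN and MAX differ. Whether either helps depends on the engine and the data, so treat this as a sketch to measure, not a guaranteed fix. It runs on an in-memory SQLite database from Python; the index name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (ID INTEGER, ID_name TEXT);
INSERT INTO YourTable VALUES
    (1, 'a'), (1, 'a'), (2, 'b'), (2, 'c'), (3, 'd');
-- hypothetical index covering both columns used by the queries
CREATE INDEX ix_yourtable_id_name ON YourTable (ID, ID_name);
""")

# IDs with more than one distinct name, via COUNT(DISTINCT ...)
ids_count_distinct = [row[0] for row in conn.execute("""
    SELECT ID FROM YourTable
    GROUP BY ID
    HAVING COUNT(DISTINCT ID_name) > 1
    ORDER BY ID
""")]

# Same answer for non-NULL names: the smallest and largest name differ
ids_min_max = [row[0] for row in conn.execute("""
    SELECT ID FROM YourTable
    GROUP BY ID
    HAVING MIN(ID_name) <> MAX(ID_name)
    ORDER BY ID
""")]

print(ids_count_distinct, ids_min_max)
```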
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
group by tt.ID
This gives you every ID with its total number of distinct ID_name values.
If you want only those IDs which have more than one name associated, just add a where clause,
e.g.
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
where tt.myRank > 1
group by tt.ID

Send faulty rows to other table

I have a table with many columns in which I have to find duplicates based on one column.
I.e. if I find duplicate values in the Customer_name column, then:
I have to remove all the repeated rows from the source table.
Send all those rows to another table with the same structure.
If you have two tables like this:
CREATE TABLE t1 (ID int, customerName varchar(64))
CREATE TABLE t2 (ID int, customerName varchar(64))
You can do something like this (the ID column is only there as a basis for deciding which row to keep; you can change it as you need):
--First Copy
WITH CTE_T1
AS
(
SELECT
ID,
customerName,
ROW_NUMBER() OVER(PARTITION BY customerName ORDER BY ID) as OrderOfCustomer
FROM
t1
)
INSERT INTO t2
SELECT ID, customerName FROM cte_T1
WHERE OrderOfCustomer > 1;
--Then Delete
WITH CTE_T1
AS
(
SELECT
ID,
customerName,
ROW_NUMBER() OVER(PARTITION BY customerName ORDER BY ID) as OrderOfCustomer
FROM
t1
)
DELETE FROM CTE_T1
WHERE OrderOfCustomer > 1
I assume each row has a unique id primary key.
This inserts the duplicated rows into your duplicates table:
Insert into duplicatesTable
select * from myTable t1
where (select count(*) from myTable t2 where t1.customerId = t2.customerId) > 1
Then you delete the good rows from duplicatesTable (assuming the row to keep is the one with the lowest id for each customerId):
delete from duplicatesTable
where id in (select min(id) from duplicatesTable group by customerId)
Finally, you delete from your first table:
delete from myTable
where id IN (select id from duplicatesTable)
Try this:
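For moving duplicates:
INSERT Into DuplicatesTable
SELECT *
FROM
(SELECT *, ROW_NUMBER() OVER(PARTITION BY Customer_name ORDER BY Customer_name) As RowID
FROM SourceTable) as temp
WHERE RowID > 1
-- SELECT * also returns RowID; list the source columns explicitly if DuplicatesTable has no such column
For deleting: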
WITH TableCTE
AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Customer_name ORDER BY Customer_name) AS RowID
FROM SourceTable
)
DELETE
FROM TableCTE
WHERE RowID> 1
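The copy-then-delete pattern from the answers above can be tried end to end with a small sketch on an in-memory SQLite database from Python. Two things differ from the T-SQL versions and are assumptions of this sketch: the row to keep is picked with MIN(ID) per customerName instead of ROW_NUMBER(), and both steps run in one transaction so a failure cannot leave rows copied but not deleted. Table names follow the t1/t2 example; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ID INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE t2 (ID INTEGER, customerName TEXT);
INSERT INTO t1 (ID, customerName) VALUES
    (1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (4, 'Ann');
""")

# IDs of the rows to keep: the lowest ID for each customerName
keep_first = "SELECT MIN(ID) FROM t1 GROUP BY customerName"

with conn:  # both statements commit together, or neither does
    # first copy the repeated rows to the second table
    conn.execute(f"""
        INSERT INTO t2 (ID, customerName)
        SELECT ID, customerName FROM t1
        WHERE ID NOT IN ({keep_first})
    """)
    # then delete the same rows from the source table
    conn.execute(f"DELETE FROM t1 WHERE ID NOT IN ({keep_first})")

kept = conn.execute("SELECT ID, customerName FROM t1 ORDER BY ID").fetchall()
moved = conn.execute("SELECT ID, customerName FROM t2 ORDER BY ID").fetchall()
print(kept)
print(moved)
```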