Finding duplicates with two similar columns and one distinct - sql

I am in a situation where I need to select rows that have the same content in two specific columns, AND distinct content in a third one. So far I got this for the two similar columns:
SELECT id, Title,
COUNT(*) AS NumOccurrences
FROM Table
GROUP BY id, Title
HAVING ( COUNT(*) > 1 )
I now need to specify a third distinct column in this query. Let's call it Ralph. This obviously does not work:
SELECT id, Title, DISTINCT Ralph,
COUNT(*) AS NumOccurrences
FROM Table
GROUP BY id, Title
HAVING ( COUNT(*) > 1 )
So what will?

select * from (
SELECT id, Title, COUNT(*) AS NumOccurrences
FROM Table t
GROUP BY id, Title
HAVING ( COUNT(*) > 1 )
) t
cross apply (
select distinct Ralph
from Table
where id = t.id and Title = t.Title
) t2

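You can use COUNT(*) with an OVER() clause. Grouping by all three columns first collapses repeated Ralph values, so the window count is the number of distinct Ralph values per id/Title pair: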
;WITH cte AS
(
SELECT id, Title, Ralph, COUNT(*) OVER (PARTITION BY id, Title) AS cnt
FROM dbo.test11
GROUP BY id, Title, Ralph
)
SELECT *
FROM cte
WHERE cnt > 1
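To see why grouping by Ralph before counting matters, here is a minimal sketch that runs the CTE query against an in-memory SQLite database from Python. The sample rows and the unqualified table name `test11` (no `dbo.` schema) are assumptions for illustration, and SQLite 3.25 or later is assumed for window-function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test11 (id INTEGER, Title TEXT, Ralph TEXT)")
conn.executemany(
    "INSERT INTO test11 VALUES (?, ?, ?)",
    [
        (1, "A", "x"),
        (1, "A", "y"),
        (1, "A", "y"),  # exact duplicate row; GROUP BY collapses it
        (2, "B", "z"),
        (2, "B", "z"),  # same Ralph twice: only one distinct value
    ],
)

query = """
WITH cte AS (
    SELECT id, Title, Ralph,
           COUNT(*) OVER (PARTITION BY id, Title) AS cnt
    FROM test11
    GROUP BY id, Title, Ralph
)
SELECT id, Title, Ralph, cnt
FROM cte
WHERE cnt > 1
ORDER BY id, Title, Ralph
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data, only the (1, A) pair has more than one distinct Ralph value, so the (2, B) rows are filtered out even though that pair occurs twice.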

Related

SQL Return only duplicate records

I want to return rows that have duplicate values in both the Full Name and Address columns in SQL. So in the example, I would just want the first two rows returned. How do I code this?
Why return duplicate values? Just aggregate and return the count:
select fullname, address, count(*) as cnt
from t
group by fullname, address
having count(*) >= 2;
One option uses window functions:
select *
from (
select t.*, count(*) over(partition by fullname, address) cnt
from mytable t
) t
where cnt > 1
If your table has a primary key, say id, you can also use exists:
select t.*
from mytable t
where exists (
select 1
from mytable t1
where t1.fullname = t.fullname and t1.address = t.address and t1.id <> t.id
)
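As a rough check that the window-function and EXISTS approaches agree, the following sketch runs both against an in-memory SQLite database from Python. The sample rows are invented for illustration (the question's example table is not shown here), and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, fullname TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "Ann Lee", "1 Main St"),
        (2, "Ann Lee", "1 Main St"),  # duplicate of row 1
        (3, "Ann Lee", "9 Oak Ave"),  # same name, different address
        (4, "Bob Ray", "1 Main St"),  # same address, different name
    ],
)

window_sql = """
SELECT id, fullname, address
FROM (
    SELECT t.*, COUNT(*) OVER (PARTITION BY fullname, address) AS cnt
    FROM mytable t
) t
WHERE cnt > 1
ORDER BY id
"""

exists_sql = """
SELECT t.id, t.fullname, t.address
FROM mytable t
WHERE EXISTS (
    SELECT 1
    FROM mytable t1
    WHERE t1.fullname = t.fullname
      AND t1.address = t.address
      AND t1.id <> t.id
)
ORDER BY t.id
"""

window_rows = conn.execute(window_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
print(window_rows)
print(exists_rows)
```

One difference to keep in mind: PARTITION BY places NULL values in the same group, while the `=` comparisons in the EXISTS version never match NULLs, so the two queries can disagree when fullname or address is NULL.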

How do I use MIN on a union column SQL

I'm having problems with using the MIN function in sql. I want to get a list of all the rows with the minimum value from my count function.
Here is my code:
SELECT land, MIN(count) as lowest
FROM
(
SELECT temp.land, count(*)
FROM
(
SELECT grans.land FROM Grans
UNION ALL
SELECT grans.aland FROM Grans
) as temp
GROUP BY land
ORDER BY land
) as subQuery
GROUP BY land
ORDER BY land
At the moment I just get a table listing land and count, although count is renamed to lowest.
I would use window functions:
SELECT land, cnt
FROM (SELECT temp.land, count(*) as cnt,
MIN(count(*)) OVER () as min_cnt
FROM (SELECT grans.land FROM Grans
UNION ALL
SELECT grans.aland FROM Grans
) temp
GROUP BY land
) l
WHERE cnt = min_cnt;
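The window-function idea can be sketched in two explicit steps: count per land first, then attach the overall minimum to every row with MIN(cnt) OVER (). This is an equivalent rewrite using a CTE rather than the nested MIN(count(*)) form, run from Python against in-memory SQLite (3.25 or later assumed). The contents of `Grans` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Grans (land TEXT, aland TEXT)")
conn.executemany(
    "INSERT INTO Grans VALUES (?, ?)",
    [("SE", "NO"), ("SE", "FI"), ("DK", "SE")],
)

counts_cte = """
WITH counts AS (
    SELECT temp.land AS land, COUNT(*) AS cnt
    FROM (SELECT land FROM Grans
          UNION ALL
          SELECT aland FROM Grans) AS temp
    GROUP BY temp.land
)
"""

# Every land whose count equals the overall minimum (ties included).
all_min = conn.execute(
    counts_cte
    + """
    SELECT land, cnt
    FROM (SELECT land, cnt, MIN(cnt) OVER () AS min_cnt FROM counts) AS l
    WHERE cnt = min_cnt
    ORDER BY land
    """
).fetchall()

# For comparison: ORDER BY ... LIMIT 1 keeps a single row even when lands tie.
one_min = conn.execute(
    counts_cte + "SELECT land, cnt FROM counts ORDER BY cnt, land LIMIT 1"
).fetchall()

print(all_min)
print(one_min)
```

Here SE occurs three times and the other three lands once each, so the window version returns all three tied lands while the LIMIT 1 version returns only one of them.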
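Remove the outer GROUP BY if you just want the minimum. The subquery already groups by land, so each land appears only once, and grouping again simply returns every land with its count. Instead, sort by the count and keep the first row (note that LIMIT 1 returns a single row even when several lands tie for the minimum):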
SELECT *
FROM
(
SELECT temp.land, count(*) as cnt
FROM
(
SELECT grans.land FROM Grans
UNION ALL
SELECT grans.aland FROM Grans
) as temp
GROUP BY land
ORDER BY land
) as subQuery
order by cnt asc
Limit 1
Another way, which returns every land tied for the minimum, is:
SELECT temp.land, count(*) as cnt
FROM
(
SELECT grans.land FROM Grans
UNION ALL
SELECT grans.aland FROM Grans
) as temp
GROUP BY land
having cnt in(
SELECT min(cnt)
FROM
(
SELECT temp.land, count(*) as cnt
FROM
(
SELECT grans.land FROM Grans
UNION ALL
SELECT grans.aland FROM Grans
) as temp
GROUP BY land
ORDER BY land
) as subQuery
)
A ranking window function also works. Rank over the whole result rather than per land, since each land appears only once after grouping:
select * from
(
SELECT * ,rank() over(order by cnt) as rn
FROM
(
SELECT temp.land, count(*) as cnt
FROM
(
SELECT grans.land FROM Grans
UNION ALL
SELECT grans.aland FROM Grans
) as temp
GROUP BY land
ORDER BY land
) as subQuery
) t where t.rn=1

Identify duplicates rows based on multiple columns

#SQL Experts,
I am trying to fetch duplicate records from a SQL table where the 1st and 2nd column values are the same but the 3rd column values are different.
Below is my table
ID NAME DEPT
--------------------
1 VRK CSE
1 VRK ECE
2 AME MEC
3 BMS CVL
From the above table, I am trying to fetch the first 2 rows. Below is my query; please suggest why it doesn't give the correct results.
SELECT A.ID, A.NAME, A.DEPT
FROM TBL A
INNER JOIN TBL B ON A.ID = B.ID
AND A.NAME = B.NAME
AND A.DEPT <> B.DEPT
Somehow I am not getting the expected results.
Your sample data does not make it completely clear what you want here. Assuming you want to target groups of records that share the first and second columns but differ in the third column, we may try:
SELECT ID, NAME, DEPT
FROM
(
SELECT ID, NAME, DEPT,
COUNT(*) OVER (PARTITION BY ID, NAME) cnt,
MIN(DEPT) OVER (PARTITION BY ID, NAME) min_dept,
MAX(DEPT) OVER (PARTITION BY ID, NAME) max_dept
FROM yourTable
) t
WHERE cnt > 1 AND min_dept <> max_dept;
UPDATE
select *
from
(
select *,
COUNT(*) over (partition by id, [name]) cnt1,
COUNT(*) over (partition by id, [name], dept) cnt2
from dbo.T
) x
where x.cnt1 > 1 and x.cnt2 < x.cnt1;
To find the id/name pairs that have more than one distinct dept:
select x.id, x.name, count(*)
from
(select distinct a.id, a.name, a.dept
from tab a) x
group by x.id, x.name
having count(*) > 1
If you want the original rows, I would just go for exists:
select t.*
from tbl t
where exists (select 1
from tbl t2
where t2.id = t.id and t2.name = t.name and
t2.dept <> t.dept
);
If you just want the id/name pairs:
select t.id, t.name
from tbl t
group by t.id, t.name
having min(t.dept) <> max(t.dept);
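Both variants can be tried on the sample table from the question. The sketch below loads those four rows into an in-memory SQLite database from Python; the column types are an assumption, and the inner table is aliased `t2` so that the correlated references resolve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, "VRK", "CSE"), (1, "VRK", "ECE"), (2, "AME", "MEC"), (3, "BMS", "CVL")],
)

# Original rows that have a sibling with the same id/name but another dept.
original_rows = conn.execute(
    """
    SELECT t.id, t.name, t.dept
    FROM tbl t
    WHERE EXISTS (SELECT 1
                  FROM tbl t2
                  WHERE t2.id = t.id AND t2.name = t.name
                    AND t2.dept <> t.dept)
    ORDER BY t.id, t.dept
    """
).fetchall()

# Only the id/name pairs: MIN and MAX differ when at least two depts exist.
pairs = conn.execute(
    """
    SELECT t.id, t.name
    FROM tbl t
    GROUP BY t.id, t.name
    HAVING MIN(t.dept) <> MAX(t.dept)
    """
).fetchall()

print(original_rows)
print(pairs)
```

The MIN/MAX comparison avoids COUNT(DISTINCT ...) while still detecting that a group holds at least two different dept values.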

Group by and Count to select repeated rows

I wrote this query but it does not work as I expected.
1st goal: select the rows whose values repeat in certain columns, and return all of their columns.
2nd goal: update a flag (a column) to mark which records are repeated.
Could you please help me?
SELECT
*
FROM AvvalV2NS AS M
WHERE EXISTS
(SELECT
M.Astate,
M.Acity,
M.Azone,
M.Abvillage,
M.Avillage,
COUNT(*)
FROM AvvalV2NS AS M
GROUP BY M.Astate,
M.Acity,
M.Azone,
M.Abvillage,
M.Avillage
HAVING COUNT(*) > 1)
If you want to get the rows that are duplicated, window functions are probably the easiest way:
select a.*
from (select a.*,
count(*) over (partition by a.Astate, a.Acity, a.Azone, a.Abvillage, a.Avillage) as cnt
from AvvalV2NS a
) a
where cnt > 1;
You can update a flag by doing something like this:
with toupdate as (
select a.*
from (select a.*,
count(*) over (partition by a.Astate, a.Acity, a.Azone, a.Abvillage, a.Avillage) as cnt
from AvvalV2NS a
) a
)
update toupdate
set isduplicate = (case when cnt > 1 then 1 else 0 end);
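Suppose your table has an id column (REPEATING_COLUMNS is a placeholder for the columns that repeat):
SELECT * FROM THE_TABLE WHERE ID IN (
SELECT ID FROM
(SELECT t.ID FROM THE_TABLE t
JOIN (SELECT REPEATING_COLUMNS FROM THE_TABLE GROUP BY REPEATING_COLUMNS HAVING COUNT(*) > 1) d
ON t.REPEATING_COLUMNS = d.REPEATING_COLUMNS) x
)
UPDATE THE_TABLE SET THE_FLAG = 'HERE WE GO' WHERE ID IN (
SELECT ID FROM
(SELECT t.ID FROM THE_TABLE t
JOIN (SELECT REPEATING_COLUMNS FROM THE_TABLE GROUP BY REPEATING_COLUMNS HAVING COUNT(*) > 1) d
ON t.REPEATING_COLUMNS = d.REPEATING_COLUMNS) x
)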
Hope this helps.

SQL query: how to distinct count of a column group by another column

In my table I need to know if each ID has one and only one ID_name. How can I write such query?
I tried:
select ID, count(distinct ID_name) as count_name
from table
group by ID
having count_name > 1
But it takes forever to run.
Any thoughts?
select ID
from YourTable
group by
ID
having count(distinct ID_name) > 1
or
select *
from YourTable yt1
where exists
(
select *
from YourTable yt2
where yt1.ID = yt2.ID
and yt1.ID_Name <> yt2.ID_Name
)
Now, most ID columns are defined as primary key and are unique. So in a regular database you'd expect both queries to return an empty set.
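For a quick illustration of the COUNT(DISTINCT ...) check, here is a small sketch run from Python against in-memory SQLite. It assumes a table where ID is not unique (otherwise, as noted, the result is empty), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, ID_name TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "c"), (3, "d")],
)

# IDs associated with more than one distinct name.
violating_ids = conn.execute(
    """
    SELECT ID
    FROM YourTable
    GROUP BY ID
    HAVING COUNT(DISTINCT ID_name) > 1
    ORDER BY ID
    """
).fetchall()

# IDs with exactly one distinct name (the "one and only one" case).
single_name_ids = conn.execute(
    """
    SELECT ID
    FROM YourTable
    GROUP BY ID
    HAVING COUNT(DISTINCT ID_name) = 1
    ORDER BY ID
    """
).fetchall()

print(violating_ids)
print(single_name_ids)
```

ID 1 appears twice with the same name, so it still counts as having a single name; only ID 2 violates the rule. COUNT(DISTINCT ...) ignores NULL names, which may or may not be what you want.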
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
group by tt.ID
This gives you every ID with its number of distinct ID_name values.
If you want only those IDs which have more than one name associated, just add a where clause, e.g.
select tt.ID,max(tt.myRank)
from
(
select
ip.ID,ip.ID_name,
DENSE_RANK() over (partition by ip.ID order by ip.ID_name) as myRank
from YourTable ip
) tt
where tt.myRank > 1
group by tt.ID