Find duplicate ID's with different fields - sql

I have a table that contain UserID's and Departments. UserID's can belong to several departments so their combo makes it unique.
However I have been trying to query trying to find where the UserID belongs to either one of two departments (hr or customer).
SELECT UserId, Dept, COUNT(*) Total
FROM MyTable
GROUP BY UserID
HAVING COUNT(*) = 1
However this still brings back duplicates if a UserId has both departments I guess because the combo makes it a unique record.
What I get back is this
UserID | Department | Total
1 hr 1
2 customer 1
3 customer 1
1 customer 1
3 hr 1
But what I am trying to get back is this
UserID | Department | Total
2 customer 1
Where any instances of UserId belonging to both departments are not included only if they belong to one or the other.

This should do the job
select t1.UserId, Dept, t2.Total
FROM MyTable t1
INNER JOIN
(
SELECT UserId, COUNT(*) Total
FROM Table1
GROUP BY UserID
HAVING COUNT(*) = 1
) t2 on t1.UserId = t2.UserId

try
SELECT UserId, COUNT(distinct Dept) Total
FROM MyTable
GROUP BY UserID
HAVING COUNT(distinct Dept) = 1

You can try something like this
select UserId, Dept
FROM Mytable where UserId in
(
SELECT UserId, COUNT(*) Total
FROM MyTable
GROUP BY UserID
HAVING COUNT(*) = 1
)
If you add the Dept column in your Group by you will always retrieve all the combinations.
So you have to select all users with only one Dept and then retrieve the additional informations
There is maybe syntax errors because I am usually working with oracle but I think the concept is correct

This will select users in 'hr' that do not have id's in 'customer' and then users in 'customer' that do not have ids in 'hr'.
I didn't include count on purpose as it can always only be one.
SELECT * FROM [MyTable] T WHERE Department = 'hr' AND NOT EXISTS (SELECT 1 FROM [MyTable] WHERE [MyTable].UserID =T.UserID AND Department = 'customer') UNION
SELECT * FROM [MyTable] T WHERE Department = 'customer' AND NOT EXISTS (SELECT 1 FROM [MyTable] WHERE [MyTable].UserID =T.UserID AND Department = 'hr')

I do not think the above query will run as Dept is not in group by clause and also count comparsion cannot be in that way.
Coming to your issue:
Select userid, Dept
from #temp
where userid in (Select userid from (Select userid,count(*) 'C'
from #temp group by userid ) u where u.c=1)
I do not think you require count as it is always 1

It's a fake question because you can't use Dept in this query.
I'm don't understand why author doesn't explain us about any compiler warnings and tells us some results of this query.
but you can yse this way if you really want to get some result:
SELECT UserId, min(Dept) as Dept
FROM MyTable
GROUP BY UserID
HAVING COUNT(*) = 1

The problem is the COUNT (*) at the end.
Try this:
SELECT UserId, Dept, COUNT(*) Total
FROM MyTable
GROUP BY UserID
HAVING COUNT(UserId) = 1

Related

Select Person with only one value in the column

I can't correctly write a query.
I need to select a person who has only one value in the column. For example,
select * (
select PersonID, sum(TotalAmount)
from Table1
group by PersonID
HAVING sum(TotalAmount) = 0 )
where Group = A
It means that I would select all customers that belong to ONLY 'A' group...
Could someone help me?
If you want the persons with only one value, then having count(*) = 1 comes to mind:
select personid
from table1
group by personid
having count(*) = 1;

Display multiple count values on joined tables

I am trying to get a count of the userID's that exist in 2 tables,
I also have an inner join for the tables on the userID so that I can search for the count of one table, based off the BadgeID from the other table.
So far it returns the user count for one table, but I can't seem to figure out how to return a count for userID on both tables, Any suggestions?
SELECT DISTINCT live_event.usid, COUNT(1) AS membercount
FROM live_event
INNER JOIN member_badges ON live_event.usid = member_badges.usid
WHERE bdgid = 14
Not sure what you are looking for; but based on what I could understand, are sub-queries perhaps the route you are wanting to go?
select
(select count(*) from firsttable where userid = {userid})
as count1,
(select count(*) from secondtable where userid = {userid})
as count2
Where {userid} would be your bdgid.
Try to use union select to connect results from two tables
select useid, count(*) from (
select useid from firsttable
union select useid from second table ) group by useid
Try this:
SELECT
user_id,
COALESCE(MAX(
CASE
when tbl_name='live_event' then cnt_records
else null
END), 0) as cnt_live_event,
COALESCE(MAX(CASE
when tbl_name='member_badges' then cnt_records
else null
END), 0) as cnt_member_badges
FROM
(
SELECT
'live_event' AS tbl_name,
usid as user_id,
count(*) AS cnt_records
FROM live_event
GROUP BY 'live_event', usid
UNION
SELECT
'member_badges',
usid as user_id,
count(*)
FROM member_badges
GROUP BY 'member_badges', usid
) grouped_counts
GROUP BY user_id
ORDER BY user_id;
The above query would get you the user counts even if the usid is present in only one of the tables.

grouping and aggregates with subqueries

I have a query that is designed to find the number of people who went to a hospital more than once. What I have works, but is there a way to do it without the subquery?
SELECT count(*) as counts, hospitals.hospitalname
FROM Patient INNER JOIN
hospitals ON Patient.hospitalnpi = hospitals.npi
WHERE (hospitals.hospitalname = 'X')
group by patientid, hospitalname
having count(patient.patientid) >1
order by count(*) desc
This will always return the number of correct rows (30), but not the number 30. If I remove the group by patientid then I get the entire result set returned.
I solved this problem by doing
select COUNT(*),hospitalname
from
(
SELECT count(*) as counts,hospitals.hospitalname
FROM hospitals INNER JOIN
Patient ON hospitals.npi = Patient.hospitalnpi
group by patientid, hospitals.hospitalname
having count(patient.patientid) >1
) t
group by t.hospitalname
order by t.hospitalname desc
I feel that there has to be a more elegant solution than using subqueries all the time. How could this be improved?
sample data from first query
row # revisits
1 2
2 2
3 2
4 2
same data from second, working query
row# hosp. name revisitAggregate
1 x 30
2 y 15
3 z 5
Simple one-to-many relationship between patient and hospitals
It's super hacky, but here you are:
SELECT TOP 1
ROW_NUMBER() OVER (order by patient.patientid) as Count
FROM
Patient
INNER JOIN hospitals
ON Patient.hospitalnpi = hospitals.npi
WHERE
(hospitals.hospitalname = 'X')
GROUP BY
patientid,
hospitalname
HAVING
count(patient.patientid) >1
ORDER BY
Count desc
select distinct hospitalname, count(*) over (partition by hospitalname) from (
SELECT hospitalname, count(*) over (partition by patientid,
hospitals.hospitalname) as counter
FROM hospitals INNER JOIN
Patient ON hospitals.npi = Patient.hospitalnpi
WHERE (hospitals.hospitalname = 'X')
) Z
where counter > 1

Finding duplicate rows in SQL Server

I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement to grab all of these and the amount of dupes, but also return the ids that are associated with each organization.
A statement like:
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
Will return something like
orgName | dupes
ABC Corp | 7
Foo Federation | 5
Widget Company | 2
But I'd also like to grab the IDs of them. Is there any way to do this? Maybe like a
orgName | dupeCount | id
ABC Corp | 1 | 34
ABC Corp | 2 | 5
...
Widget Company | 1 | 10
Widget Company | 2 | 2
The reason being that there is also a separate table of users that link to these organizations, and I would like to unify them (therefore remove dupes so the users link to the same organization instead of dupe orgs). But I would like part manually so I don't screw anything up, but I would still need a statement returning the IDs of all the dupe orgs so I can go through the list of users.
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
SELECT orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
You can run the following query and find the duplicates with max(id) and delete those rows.
SELECT orgName, COUNT(*), Max(ID) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
But you'll have to run this query a few times.
You can do it like this:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
If you want to return just the records that can be deleted (leaving one of each), you can use:
SELECT
id, orgName
FROM (
SELECT
orgName, id,
ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
FROM organizations
) AS d
WHERE intRow != 1
Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
You can try this , it is best for you
WITH CTE AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations
)
select * from CTE where RN>1
go
The solution marked as correct didn't work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql
SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id
If you want to delete duplicates:
WITH CTE AS(
SELECT orgName,id,
RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
FROM organizations
)
DELETE FROM CTE WHERE RN > 1
select * from [Employees]
For finding duplicate Record
1)Using CTE
with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte
2)By Using GroupBy
select Name,EmailId,COUNT(name) as Duplicate from [Employees] group by Name,EmailId
Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
From organizations )tbl Where Rownum>1
So the records with rowum> 1 will be the duplicate records in your table. ‘Partition by’ first group by the records and then serialize them by giving them serial nos.
So rownum> 1 will be the duplicate records which could be deleted as such.
select column_name, count(column_name)
from table_name
group by column_name
having count (column_name) > 1;
Src : https://stackoverflow.com/a/59242/1465252
select a.orgName,b.duplicate, a.id
from organizations a
inner join (
SELECT orgName, COUNT(*) AS duplicate
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) b on o.orgName = oc.orgName
group by a.orgName,a.id
select orgname, count(*) as dupes, id
from organizations
where orgname in (
select orgname
from organizations
group by orgname
having (count(*) > 1)
)
group by orgname, id
You have several way for Select duplicate rows.
for my solutions , first consider this table for example
CREATE TABLE #Employee
(
ID INT,
FIRST_NAME NVARCHAR(100),
LAST_NAME NVARCHAR(300)
)
INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );
First solution :
SELECT DISTINCT *
FROM #Employee;
WITH #DeleteEmployee AS (
SELECT ROW_NUMBER()
OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
RNUM
FROM #Employee
)
SELECT *
FROM #DeleteEmployee
WHERE RNUM > 1
SELECT DISTINCT *
FROM #Employee
Secound solution : Use identity field
SELECT DISTINCT *
FROM #Employee;
ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)
SELECT *
FROM #Employee
WHERE UNIQ_ID < (
SELECT MAX(UNIQ_ID)
FROM #Employee a2
WHERE #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)
ALTER TABLE #Employee DROP COLUMN UNIQ_ID
SELECT DISTINCT *
FROM #Employee
and end of all solution use this command
DROP TABLE #Employee
i think i know what you need
i needed to mix between the answers and i think i got the solution he wanted:
select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
having the max id will give you the id of the dublicate and the one of the original which is what he asked for:
id org name , dublicate count (missing out in this case)
id doublicate org name , doub count (missing out again because does not help in this case)
only sad thing you get it put out in this form
id , name , dubid , name
hope it still helps
Suppose we have table the table 'Student' with 2 columns:
student_id int
student_name varchar
Records:
+------------+---------------------+
| student_id | student_name |
+------------+---------------------+
| 101 | usman |
| 101 | usman |
| 101 | usman |
| 102 | usmanyaqoob |
| 103 | muhammadusmanyaqoob |
| 103 | muhammadusmanyaqoob |
+------------+---------------------+
Now we want to see duplicate records
Use this query:
select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;
+---------------------+------------+---+
| student_name | student_id | c |
+---------------------+------------+---+
| usman | 101 | 3 |
| muhammadusmanyaqoob | 103 | 2 |
+---------------------+------------+---+
I got a better option to get the duplicate records in a table
SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname
Result of the above query shows all the duplicate names with unique student ids and number of duplicate occurances
Click here to see the result of the sql
/*To get duplicate data in table */
SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1
GROUP BY EmpCode HAVING COUNT(EmpCode) > 1
I use two methods to find duplicate rows.
1st method is the most famous one using group by and having.
2nd method is using CTE - Common Table Expression.
As mentioned by #RedFilter this way is also right. Many times I find CTE method is also useful for me.
WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName)
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1
In the example above we collected the result by finding repeat occurrence using ROW_NUMBER and PARTITION BY. Then we applied where clause to select only rows which are on repeat count more than 1. All the result is collected CTE table and joined with Organizations table.
Source : CodoBee
Try
SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;

SQL query to return only 1 record per group ID

I'm looking for a way to handle the following scenario. I have a database table that I need to return only one record for each "group id" that is contained within the table, furthermore the record that is selected within each group should be the oldest person in the household.
ID Group ID Name Age
1 134 John Bowers 37
2 134 Kerri Bowers 33
3 135 John Bowers 44
4 135 Shannon Bowers 42
So in the sample data provided above I would need ID 1 and 3 returned, as they are the oldest people within each group id.
This is being queried against a SQL Server 2005 database.
SELECT t.*
FROM (
SELECT DISTINCT groupid
FROM mytable
) mo
CROSS APPLY
(
SELECT TOP 1 *
FROM mytable mi
WHERE mi.groupid = mo.groupid
ORDER BY
age DESC
) t
or this:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY groupid ORDER BY age DESC) rn
FROM mytable
) x
WHERE x.rn = 1
This will return at most one record per group even in case of ties.
See this article in my blog for performance comparisons of both methods:
SQL Server: Selecting records holding group-wise maximum
Use:
SELECT DISTINCT
t.groupid,
t.name
FROM TABLE t
JOIN (SELECT t.groupid,
MAX(t.age) 'max_age'
FROM TABLE t
GROUP BY t.groupid) x ON x.groupid = t.groupid
AND x.max_age = t.age
So what if there's 2+ people with the same age for a group? It'd be better to store the birthdate rather than age - you can always calculate the age for presentation.
Try this (assuming Group is synonym for Household)
Select * From Table t
Where Age = (Select Max(Age)
From Table
Where GroupId = t.GroupId)
If there are two or more "oldest" people in some household (They all are the same age and there is noone else older), then this will return all of them, not just one at random.
If this is an issue, then you need to add another subquery to return an arbitrary key value for one person in that set.
Select * From Table t
Where Id =
(Select Max(Id) Fom Table
Where GroupId = t.GroupId
And Age =
(Select(Max(Age) From Table
Where GroupId = t.GroupId))
SELECT GroupID, Name, Age
FROM table
INNER JOIN
(
SELECT GroupID, MAX(Age) AS OLDEST
FROM table
) AS OLDESTPEOPLE
ON
table.GroupID = OLDESTPEOPLE.GroupID
AND
table.Age = OLDESTPEOPLE.OLDEST
SELECT GroupID, Name, Age
FROM table
INNER JOIN
(
SELECT GroupID, MAX(Age) AS OLDEST
FROM table
**GROUP BY GroupID**
) AS OLDESTPEOPLE
ON
table.GroupID = OLDESTPEOPLE.GroupID
AND
table.Age = OLDESTPEOPLE.OLDEST