How to return all names that appear multiple times in table [duplicate] - sql

This question already has answers here:
What's the SQL query to list all rows that have 2 column sub-rows as duplicates?
(10 answers)
Closed last year.
Suppose I have the following schema:
student(name, siblings)
The related table has names and siblings. Note the number of rows of the same name will appear the same number of times as the number of siblings an individual has. For instance, a table could be as follows:
Jack, Lucy
Jack, Tim
Meaning that Jack has Lucy and Tim as his siblings.
I want to identify an SQL query that reports the names of all students who have 2 or more siblings. My attempt is the following:
select name
from student
where count(name) >= 1;
I'm not sure I'm using count correctly in this SQL query. Can someone please help with identifying the correct SQL query for this?

You're almost there:
select name
from student
group by name
having count(*) > 1;
HAVING is a where clause that runs after grouping is done. In it you can use things that a grouping would make available (like counts and aggregations). By grouping on the name and counting (filtering for >1, if you want two or more, not >=1 because that would include 1) you get the names you want..
This will just deliver "Jack" as a single result (in the example data from the question). If you then want all the detail, like who Jack's siblings are, you can join your grouped, filtered list of names back to the table:
select *
from
student
INNER JOIN
(
select name
from student
group by name
having count(*) > 1
) morethanone ON morethanone.name = student.name
You can't avoid doing this "joining back" because the grouping has thrown the detail away in order to create the group. The only way to get the detail back is to take the name list the group gave you and use it to filter the original detail data again
Full disclosure; it's a bit of a lie to say "can't avoid doing this": SQL Server supports something called a window function, which will effectively perform a grouping in the background and join it back to the detail. Such a query would look like:
select student.*, count(*) over(partition by name) n
from student
And for a table like this:
jack, lucy
jack, tim
jane, bill
jane, fred
jane, tom
john, dave
It would produce:
jack, lucy, 2
jack, tim, 2
jane, bill, 3
jane, fred, 3
jane, tom, 3
john, dave, 1
The rows with jack would have 2 on because there are two jack rows. There are 3 janes, there is 1 john. You could then wrap all that in a subquery and filter for n > 1 which would remove john
select *
from
(
select student.*, count(*) over(partition by name) n
from student
) x
where x.n > 1
If SQL Server didn't have window functions, it would look more like:
select *
from
student
INNER JOIN
(
select name, count(*) as n
from student
group by name
) x ON x.name = student.name
The COUNT(*) OVER(PARTITION BY name) is like a mini "group by name and return the count, then auto join back to the main detail using the name as key" i.e. a short form of the latter query

You can do:
select name
from student as s1
where exists (
select s2
from student as s2
where s1.name = s2.name and s1.siblings != s2.siblings
)

I think the best approach is what 'Caius Jard' mentioned. However, additional way if you want to get how many siblings each name has .
SELECT name, COUNT(*) AS Occurrences
FROM student
GROUP BY name
HAVING (COUNT(*) > 1)

I wanted to share another solution I came up with:
select s1.name
from student s1, student s2
where s1.name = s2.name and s1.sibling != s2.sibling;

Related

SQL - Removing Duplicate without 'hard' coding?

Heres my scenario.
I have a table with 3 rows I want to return within a stored procedure, rows are email, name and id. id must = 3 or 4 and email must only be per user as some have multiple entries.
I have a Select statement as follows
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
Ok fairly simple but there are some users whose have entries that are both 3 and 4 so they appear twice, if they appear twice I want only those with ids of 4 remaining. I'll give another example below as its hard to explain.
Table -
Email Name Id
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3
jimmy#domain.com jimmy 3
So in the above scenario I would want to ignore the jimmy with the id of 3, any way of doing this without hard coding?
Thanks
SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
Is this what you want to achieve?
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email;
Sometimes using Having Count(*) > 1 may be useful to find duplicated records.
select * from table group by Email having count(*) > 1
or
select * from table group by Email having count(*) > 1 and id > 3.
The solution provided before with the select MAX(ID) from table sounds good for this case.
This maybe an alternative solution.
What RDMS are you using? This will return only one "Jimmy", using RANK():
SELECT A.email, A.name,A.id
FROM SO_Table A
INNER JOIN(
SELECT
email, name,id,RANK() OVER (Partition BY name ORDER BY ID DESC) AS COUNTER
FROM SO_Table B
) X ON X.ID = A.ID AND X.NAME = A.NAME
WHERE X.COUNTER = 1
Returns:
email name id
------------------------------
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3

SQL Server - Possible Pivot Solution?

I have a simple enough issue that has been surprisingly difficult to locate online. Perhaps I am searching on improper keywords so I wanted to stop in and ask you guys because your site has been a blessing with my studies. See below scenario:
Select student, count(*) as Total, (the unknown variable: book1, book2, book3, book4, ect...) from mystudies.
Essentially all I would like to do is list out all books for a unique student id that matches the Total count. Could someone point me in the right direction, a good read or anything, so I can get a step going in the correct direction? I am assuming it would be done via a left join (not sure how to do the x1, x2, x3 part) and then just link the two by the unique student id number (no duplicates) but everyone online points to pivot but pivot appears to put all the rows into columns instead of one single column. SQL server 2005 is the platform of choice.
Thanks!
Sorry
The following query produces my unique id (the student) and the student's count for all duplicate entries in the table:
select student, count(*) as Total
from mystudies
group by student order by total desc
the part I don't know is how to create the left join on the table unique id (boookid)
select mystudies1.student, mystudies1.total, mystudies2.bookid
from ( select student, count(*) as Total
from mystudies
group by student
) mystudies1
left join
( select student, bookid
from mystudies
) mystudies2
on mystudies1.student=mystudies2.student
order by mystudies1.total desc, mystudies1.student asc
Obviously the above row will produce results similar to the following:
Student Total BookID
000001 3 100001
000001 3 100002
000001 3 100003
000002 2 200001
000002 2 200002
000003 1 300001
But what I actually want is something similar to the following:
Student Total BookID
000001 3 100001, 100002, 100003
000002 2 200001, 200002
000003 1 300001
I assumed it had to be done in a left join so that it didn't alter the actual count being performed on the student. thanks!
In SQL-Server use the FOR XML Path Method:
SELECT Student,
Total,
STUFF(( SELECT ', ' + BookID
FROM MyStudies books
WHERE Books.Student = MyStudies.Student
FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(MAX)'), 1, 2, '') AS Books
FROM ( SELECT Student, COUNT(*) AS Total
FROM myStudies
GROUP BY Student
) MyStudies
I have previously given a full explanation of how the XML PATH Method works here. With a further improvement to my answer pointed out here
SQL Server Fiddle
In MySQL AND SQLite you can use the GROUP_CONCAT function:
SELECT Student,
COUNT(*) AS Total,
GROUP_CONCAT(BookID) AS Books
FROM myStudies
GROUP BY Student
MySQL Fiddle
SQLite Fiddle
In Postgresql you can use the ARRAY_AGG Function:
SELECT Student,
COUNT(*) AS Total,
ARRAY_AGG(BookID) AS Books
FROM myStudies
GROUP BY Student
Postgresql Fiddle
In oracle you can use the LISTAGG Function
SELECT Student,
COUNT(*) AS Total,
LISTAGG(BookID, ', ') WITHIN GROUP (ORDER BY BookID) AS Books
FROM myStudies
GROUP BY Student
Oracle SQL Fiddle

how to avoid duplicate on Joining two tables

Student Table
SID Name
1 A
2 B
3 C
Marks Table
id mark subject
1 50 physics
2 40 biology
1 50 chemistry
3 30 mathematics
SELECT distinct(std.id),std.name,m.mark, row_number() over() as rownum FROM
student std JOIN marks m ON std.id=m.id AND m.mark=50
This result is 2 times A even after using disticnt . My expected result will have only one A. if i remove row_number() over() as rownum its working fine. Why this is happening ? how to resolve. AM using DB2!!
There are two rows in marks Table with id = 1 and mark = 50.. So you will get two rows in the output for each row in student table...
If you only want one, you have to do a group By
SELECT std.id, std.name, m.mark, row_number()
over() as rownum
FROM student std
JOIN marks m
ON m.id=std.id AND m.mark=50
Group By std.id, std.name, m.mark
Now that you've clarified your question as:
I want to find all students with a mark of 50 in at least one subject. I would use the query:
SELECT student.id, '50'
FROM student
WHERE EXISTS (SELECT 1 FROM marks WHERE marks.id = student.id AND marks.mark = 50)
This also gives you flexibility to change the criteria, e.g. at least one mark of 50 or less.
Similar to Charles answer, but you always want to put the predicate (mark=50) in the WHERE clause, so you're filtering before joining. If this is just homework it might not matter but you'll want to remember this if you ever hit any real data.
SELECT std.sid,
std.name,
m.mark,
row_number() over() AS rownum
FROM student std
JOIN marks m
ON std.sid=m.id
WHERE m.mark=50
GROUP BY std.sid, std.name, m.mark

write a query to identify discrepancy

I have a table with Student ID's and Student Names. There has been issues with assigning unique Student Id's to students and Hence I want to find the duplicates
Here is the sample Table:
Student ID Student Name
1 Jack
1 John
1 Bill
2 Amanda
2 Molly
3 Ron
4 Matt
5 James
6 Kathy
6 Will
Here I want a third column "Duplicate_Count" to display count of duplicate records.
For e.g. "Duplicate_Count" would display "3" for Student ID = 1 and so on. How can I do this?
Thanks in advance
Select StudentId, Count(*) DupCount
From Table
Group By StudentId
Having Count(*) > 1
Order By Count(*) desc,
Select
aa.StudentId, aa.StudentName, bb.DupCount
from
Table as aa
join
(
Select StudentId, Count(*) as DupCount from Table group by StudentId
) as bb
on aa.StudentId = bb.StudentId
The virtual table gives the count for each StudentId, this is joined back to the original table to add the count to each student record.
If you want to add a column to the table to hold dupcount, this query can be used in an update statement to update that column in the table
This should work:
update mytable
set duplicate_count = (select count(*) from mytable t where t.id = mytable.id)
UPDATE:
As mentioned by #HansUp, adding a new column with the duplicate count probably doesn't make sense, but that really depends on what the OP originally thought of using it for. I'm leaving the answer in case it is of help for someone else.

Sql COALESCE entire rows?

I just learned about COALESCE and I'm wondering if it's possible to COALESCE an entire row of data between two tables? If not, what's the best approach to the following ramblings?
For instance, I have these two tables and assuming that all columns match:
tbl_Employees
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
tbl_Customers
Id Name Email Etc
-----------------------------------
1 Bob ... ...
2 Dan ... ...
3 Mary ... ...
And a table with id's:
tbl_PeopleInCompany
Id CompanyId
-----------------
1 1
2 1
3 1
And I want to query the data in a way that gets rows from the first table with matching id's, but gets from second table if no id is found.
So the resulting query would look like:
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
3 Mary ... ...
Where Sue and Rick was taken from the first table, and Mary from the second.
SELECT Id, Name, Email, Etc FROM tbl_Employees
WHERE Id IN (SELECT ID From tbl_PeopleInID)
UNION ALL
SELECT Id, Name, Email, Etc FROM tbl_Customers
WHERE Id IN (SELECT ID From tbl_PeopleInID) AND
Id NOT IN (SELECT Id FROM tbl_Employees)
Depending on the number of rows, there are several different ways to write these queries (with JOIN and EXISTS), but try this first.
This query first selects all the people from tbl_Employees that have an Id value in your target list (the table tbl_PeopleInID). It then adds to the "bottom" of this bunch of rows the results of the second query. The second query gets all tbl_Customer rows with Ids in your target list but excluding any with Ids that appear in tbl_Employees.
The total list contains the people you want — all Ids from tbl_PeopleInID with preference given to Employees but missing records pulled from Customers.
You can also do this:
1) Outer Join the two tables on tbl_Employees.Id = tbl_Customers.Id. This will give you all the rows from tbl_Employees and leave the tbl_Customers columns null if there is no matching row.
2) Use CASE WHEN to select either the tbl_Employees column or tbl_Customers column, based on whether tbl_Customers.Id IS NULL, like this:
CASE WHEN tbl_Customers.Id IS NULL THEN tbl_Employees.Name ELSE tbl_Customers.Name END AS Name
(My syntax might not be perfect there, but the technique is sound).
This should be pretty performant. It uses a CTE to basically build a small table of Customers that have no matching Employee records, and then it simply UNIONs that result with the Employee records
;WITH FilteredCustomers (Id, Name, Email, Etc)
AS
(
SELECT Id, Name, Email, Etc
FROM tbl_Customers C
INNER JOIN tbl_PeopleInCompany PIC
ON C.Id = PIC.Id
LEFT JOIN tbl_Employees E
ON C.Id = E.Id
WHERE E.Id IS NULL
)
SELECT Id, Name, Email, Etc
FROM tbl_Employees E
INNER JOIN tbl_PeopleInCompany PIC
ON C.Id = PIC.Id
UNION
SELECT Id, Name, Email, Etc
FROM FilteredCustomers
Using the IN Operator can be rather taxing on large queries as it might have to evaluate the subquery for each record being processed.
I don't think the COALESCE function can be used for what you're thinking. COALESCE is similar to ISNULL, except it allows you to pass in multiple columns, and will return the first non-null value:
SELECT Name, Class, Color, ProductNumber,
COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Production.Product
This article should explain it's application:
http://msdn.microsoft.com/en-us/library/ms190349.aspx
It sounds like Larry Lustig's answer is more along the lines of what you need though.