SQL: how to find duplicate rows?

I have a table users (id, first_name, last_name, ...).
I want to find duplicate users in that table (users who have the same first_name AND the same last_name).
Let's say my data is:
1;bill;campton
2;sarah;connor
3;bill;campton
I need to get:
1;bill;campton;3;bill;campton
I don't want to get both of these:
1;bill;campton;3;bill;campton
3;bill;campton;1;bill;campton
How can I do that?
I use SQL Server 2005.
Thank you.

One way:
select first_name, last_name
from users
group by first_name, last_name
having count(*) > 1
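Here is a minimal sketch of the GROUP BY / HAVING query run against the sample rows from the question. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server 2005 (an assumption; the query itself is plain standard SQL).

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server 2005 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "bill", "campton"), (2, "sarah", "connor"), (3, "bill", "campton")],
)

# Every (first_name, last_name) group with more than one row is a duplicated name.
dupes = conn.execute(
    """
    SELECT first_name, last_name, COUNT(*) AS n
    FROM users
    GROUP BY first_name, last_name
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)
```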
If you also want the IDs, you can do this:
SELECT t1.*
FROM users t1
join
(select first_name, last_name
from users
group by first_name, last_name
having count(*) > 1) x ON t1.last_name = x.last_name
AND t1.first_name = x.first_name
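A minimal sketch of the join-back query on the same sample rows, again with in-memory SQLite standing in for SQL Server (an assumption). The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "bill", "campton"), (2, "sarah", "connor"), (3, "bill", "campton")],
)

# Join every row back to the set of duplicated names to recover the ids.
rows = conn.execute(
    """
    SELECT t1.id, t1.first_name, t1.last_name
    FROM users t1
    JOIN (SELECT first_name, last_name
          FROM users
          GROUP BY first_name, last_name
          HAVING COUNT(*) > 1) x
      ON t1.last_name = x.last_name
     AND t1.first_name = x.first_name
    ORDER BY t1.id
    """
).fetchall()
print(rows)
```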

You could use:
select u1.id, u2.id, u1.first_name, u1.last_name
from users u1
inner join users u2
on u1.first_name = u2.first_name
and u1.last_name = u2.last_name
where u2.id > u1.id
Or, to get all six columns in the output format you showed, use
select u1.id, u1.first_name, u1.last_name, u2.id, u2.first_name, u2.last_name
with the same from, join, and where clauses as above.
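The self-join can be checked the same way. This sketch (SQLite as a stand-in for SQL Server, an assumption) selects all six columns; the u2.id > u1.id condition is what keeps the pair (1, 3) while discarding its mirror image (3, 1).

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "bill", "campton"), (2, "sarah", "connor"), (3, "bill", "campton")],
)

pairs = conn.execute(
    """
    SELECT u1.id, u1.first_name, u1.last_name,
           u2.id, u2.first_name, u2.last_name
    FROM users u1
    INNER JOIN users u2
      ON u1.first_name = u2.first_name
     AND u1.last_name = u2.last_name
    WHERE u2.id > u1.id  -- keeps (1, 3) but drops its mirror image (3, 1)
    """
).fetchall()
print(pairs)
```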

I just figured this out, and it's very simple: you can use a Common Table Expression and a window partition.
This example finds all students with the same name and DOB. The fields you want to check for duplication go in the partition, and you can include whatever other fields you want in the projection.
with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
Order By DOB
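The same windowed-count idea, adapted from the tblStudent example to the question's users table, is sketched below. It assumes an SQLite build with window-function support (3.25 or newer, which recent Python releases bundle) standing in for SQL Server; the answer's SUM(1) OVER (...) is kept, though COUNT(*) OVER (...) would give the same number.

```python
import sqlite3

# In-memory SQLite (3.25+ for window functions) standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "bill", "campton"), (2, "sarah", "connor"), (3, "bill", "campton")],
)

# The partition columns are the ones checked for duplication; row_cnt is the
# size of each row's partition, so row_cnt > 1 marks duplicated rows.
rows = conn.execute(
    """
    WITH cte (id, first_name, last_name, row_cnt) AS (
        SELECT id, first_name, last_name,
               SUM(1) OVER (PARTITION BY first_name, last_name)
        FROM users
    )
    SELECT * FROM cte WHERE row_cnt > 1 ORDER BY id
    """
).fetchall()
print(rows)
```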

Given the output format you said you wanted, this works:
select
o.id,
o.first_name,
o.last_name,
d.id,
d.first_name,
d.last_name
from
users o
join users d on d.first_name = o.first_name and d.last_name = o.last_name and o.id < d.id
Note that if a name appears more than twice, you will get one row for every pair of duplicates (three copies of a name produce three rows), which you probably don't want, so SQLMenace's solution is probably better overall.
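To see that caveat, this sketch (SQLite as a stand-in for SQL Server, an assumption) adds a hypothetical fourth row so that "bill campton" appears three times. The o.id < d.id join then returns every pair of those three rows.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "bill", "campton"),
        (2, "sarah", "connor"),
        (3, "bill", "campton"),
        (4, "bill", "campton"),  # hypothetical third copy
    ],
)

# With three copies, each unordered pair of ids appears once: (1,3), (1,4), (3,4).
pairs = conn.execute(
    """
    SELECT o.id, d.id
    FROM users o
    JOIN users d
      ON d.first_name = o.first_name
     AND d.last_name = o.last_name
     AND o.id < d.id
    ORDER BY o.id, d.id
    """
).fetchall()
print(pairs)
```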

Related

How to find columns that only have one value - Postgresql

I have 2 tables, person(email, first_name, last_name, postcode, place_name) and location(postcode, place_name). I am trying to find people that live in places where only one person lives. I tried using SELECT COUNT() but failed because I couldn't figure out what to count in this situation.
SELECT DISTINCT email,
first_name,
last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
WHERE 1 <=
(SELECT COUNT(?))
Filters on aggregate functions go in HAVING, not WHERE:
SELECT min(email) AS email,
min(first_name) AS first_name,
min(last_name) AS last_name,
count(*)
FROM person
INNER JOIN location USING(postcode,
place_name)
GROUP BY postcode, place_name
HAVING count(*) = 1
Because each remaining group holds exactly one person, min() simply returns that person's values. A window function such as first_value(email) over (partition by place_name) can't be used here as-is, because email is neither grouped nor aggregated. For more about window functions (like first_value), check out this tutorial.
I would do this as follows. I find it plain and simple.
select p1.* from
person p1
join
(
select p.postcode, p.place_name, count(*) cnt from
person p
group by p.postcode, p.place_name
) t on p1.postcode = t.postcode and p1.place_name = t.place_name and t.cnt = 1
How does it work?
In the inner query (aliased t) we just count how many people live in each location.
Then we join the result of it (t) with the table person (aliased p1) and in the join we require t.cnt = 1. This is probably the most natural way of doing it, I think.
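A minimal sketch of the count-then-join approach using in-memory SQLite in place of PostgreSQL (an assumption); the person rows are made-up sample data. Two people share postcode 1000 in Springfield, so only the person in Shelbyville should come back.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (email TEXT, first_name TEXT, last_name TEXT,"
    " postcode TEXT, place_name TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical sample rows
        ("ann@example.com", "Ann", "Lee", "1000", "Springfield"),
        ("bob@example.com", "Bob", "Ray", "1000", "Springfield"),
        ("cy@example.com", "Cy", "Moe", "2000", "Shelbyville"),
    ],
)

# t counts residents per location; the join keeps people whose location has cnt = 1.
rows = conn.execute(
    """
    SELECT p1.email, p1.first_name, p1.last_name
    FROM person p1
    JOIN (SELECT p.postcode, p.place_name, COUNT(*) AS cnt
          FROM person p
          GROUP BY p.postcode, p.place_name) t
      ON p1.postcode = t.postcode
     AND p1.place_name = t.place_name
     AND t.cnt = 1
    """
).fetchall()
print(rows)
```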
Thanks to the help of people here, I found this answer:
SELECT first_name,
last_name,
email
FROM person
WHERE (postcode, place_name) IN
(SELECT postcode, place_name
FROM person
GROUP BY postcode,
place_name
HAVING COUNT(*) = 1)
Comparing (postcode, place_name) as a pair matters: testing postcode IN (...) and place_name IN (...) separately could match a postcode from one single-resident place and a place name from a different one.
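Why the pair comparison matters can be shown with a small hypothetical data set, run in SQLite rather than PostgreSQL (an assumption; SQLite supports row-value IN since version 3.15). P3 and P4 live in the same place, yet a version with two separate IN tests still returns them, because postcode 1000 and place name Beta each belong to some single-resident place.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (email TEXT, first_name TEXT, last_name TEXT,"
    " postcode TEXT, place_name TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical rows: (1000, Beta) has two residents
        ("p1@example.com", "P1", "One", "1000", "Alpha"),
        ("p2@example.com", "P2", "Two", "2000", "Beta"),
        ("p3@example.com", "P3", "Three", "1000", "Beta"),
        ("p4@example.com", "P4", "Four", "1000", "Beta"),
    ],
)

# Each column tested on its own: can mix values from different places.
separate = """
SELECT first_name FROM person
WHERE postcode IN (SELECT postcode FROM person
                   GROUP BY postcode, place_name HAVING COUNT(*) = 1)
  AND place_name IN (SELECT place_name FROM person
                     GROUP BY postcode, place_name HAVING COUNT(*) = 1)
ORDER BY first_name
"""

# Both columns tested together as one row value.
paired = """
SELECT first_name FROM person
WHERE (postcode, place_name) IN (SELECT postcode, place_name FROM person
                                 GROUP BY postcode, place_name
                                 HAVING COUNT(*) = 1)
ORDER BY first_name
"""

separate_names = [row[0] for row in conn.execute(separate)]
paired_names = [row[0] for row in conn.execute(paired)]
print(separate_names, paired_names)
```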

PostgreSQL: how to get max() of GROUP BY?

I have a table called personshistory with the fields id, lastname, event and date.
It saves events for the people who work in my office, like "Entering office", "Exiting office", "Illness", "Holidays", etc.
I need to get the last event for every person.
This is what I tried, but it doesn't work (it returns one row per person and event rather than one row per person):
SELECT lastname, event, max(date)
FROM personshistory
GROUP BY lastname, event;
You can use distinct on:
select distinct on (lastname) ph.*
from personshistory ph
order by lastname, date desc;
distinct on is a very convenient Postgres extension. It keeps one row for each distinct value of the expressions in parentheses. Which row is kept is determined by the order by clause, using the keys that follow the distinct on keys.
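As a rough illustration of those semantics, the Python sketch below mimics select distinct on (lastname) ... order by lastname, date desc on in-memory rows: sort by the order by keys, then keep the first row of each lastname group. The rows and the distinct_on_latest helper are hypothetical, and this only models the result, not how PostgreSQL executes the query.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical history rows.
rows = [
    {"lastname": "Smith", "event": "Entering office", "date": date(2024, 1, 2)},
    {"lastname": "Smith", "event": "Exiting office", "date": date(2024, 1, 3)},
    {"lastname": "Jones", "event": "Illness", "date": date(2024, 1, 1)},
]


def distinct_on_latest(rows):
    # ORDER BY lastname, date DESC: sort by the secondary key first,
    # then stable-sort by the primary key.
    ordered = sorted(rows, key=itemgetter("date"), reverse=True)
    ordered = sorted(ordered, key=itemgetter("lastname"))
    # DISTINCT ON (lastname): keep only the first row of each lastname group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter("lastname"))]


latest = distinct_on_latest(rows)
for row in latest:
    print(row["lastname"], row["event"], row["date"])
```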
With NOT EXISTS:
SELECT p.lastname, p.event, p.date
FROM personshistory p
WHERE NOT EXISTS (
SELECT 1 FROM personshistory
WHERE lastname = p.lastname and date > p.date
)
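A minimal sketch of the NOT EXISTS query, run on hypothetical rows in SQLite instead of PostgreSQL (an assumption). Dates are stored as ISO-8601 strings here, so comparing them as text also compares them chronologically.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personshistory (id INTEGER, lastname TEXT, event TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO personshistory VALUES (?, ?, ?, ?)",
    [  # hypothetical rows
        (1, "Smith", "Entering office", "2024-01-02"),
        (2, "Smith", "Exiting office", "2024-01-03"),
        (3, "Jones", "Illness", "2024-01-01"),
    ],
)

# Keep a row only if no later row exists for the same person.
latest = conn.execute(
    """
    SELECT p.lastname, p.event, p.date
    FROM personshistory p
    WHERE NOT EXISTS (
        SELECT 1 FROM personshistory
        WHERE lastname = p.lastname AND date > p.date
    )
    ORDER BY p.lastname
    """
).fetchall()
print(latest)
```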
or join your query (without the event column) to the table:
SELECT p.lastname, p.event, p.date
FROM personshistory p INNER JOIN (
SELECT lastname, MAX(date) maxdate
FROM personshistory
GROUP BY lastname
) g on g.lastname = p.lastname and g.maxdate = p.date
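One behavior of the join-to-MAX version worth knowing, shown here with a hypothetical tie for Smith and SQLite standing in for PostgreSQL (an assumption): if a person has two events on the latest date, both rows come back. The NOT EXISTS query behaves the same way, since neither row has a strictly later date.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personshistory (id INTEGER, lastname TEXT, event TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO personshistory VALUES (?, ?, ?, ?)",
    [  # hypothetical rows; Smith has two events on 2024-01-03
        (1, "Smith", "Entering office", "2024-01-02"),
        (2, "Smith", "Exiting office", "2024-01-03"),
        (3, "Jones", "Illness", "2024-01-01"),
        (4, "Smith", "Illness", "2024-01-03"),
    ],
)

# g holds each person's latest date; joining on it returns every row at that date.
latest = conn.execute(
    """
    SELECT p.lastname, p.event, p.date
    FROM personshistory p
    INNER JOIN (
        SELECT lastname, MAX(date) maxdate
        FROM personshistory
        GROUP BY lastname
    ) g ON g.lastname = p.lastname AND g.maxdate = p.date
    ORDER BY p.lastname, p.event
    """
).fetchall()
print(latest)
```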

Using SQL Group By while keeping same varchar values

I have a query that returns two rows. I want only the one with the largest value, so I do a GROUP BY and then MAX. However, I have three other columns (varchar) whose values should come from the same row as the OId selected by MAX.
Example.
OId CId FName LName BName
18477 110 Hubba Bubba whoa
158 110 Test2 Person2 leee
What I want is
OId CId FName LName BName
18477 110 Hubba Bubba whoa
So I want to group them by CId, and for OId I want to keep the largest number. I can't use MIN or MAX for FName, LName, or BName, because I want the values from the row whose OId is selected. I don't even want or need the FName, LName, and BName from the other row.
I tried using SELECT TOP, but that pulls in literally one row, and I need one row per CId.
SQL
INSERT INTO #CustomerInfoAll(FName, LName, BName, OwnerId, CustomerId)
SELECT
-- what goes here --(o.FirstName) AS FName,
-- what goes here --(o.LastName) AS LName,
-- what goes here --(o.BusinessName) AS BName,
MAX(o.OId) AS OId,
(r.CId) AS CId
FROM Owner o
INNER JOIN Report r
ON o.ReportId = r.ReportId
WHERE r.CId IN (SELECT CId FROM #ThisReportAll)
AND r.Completed IS NOT NULL
GROUP BY r.CId
ORDER BY OId DESC;
Assuming you have SQL Server 2005 or higher:
INSERT INTO #CustomerInfoAll (FName, LName, BName, OwnerId, CustomerId)
SELECT
FirstName,
LastName,
BusinessName,
OId,
CId
FROM
(
SELECT
Seq = ROW_NUMBER() OVER (PARTITION BY r.CId ORDER BY o.OId DESC),
o.OId,
r.CId,
o.FirstName,
o.LastName,
o.BusinessName
FROM
dbo.Owner o
INNER JOIN dbo.Report r
ON o.ReportId = r.ReportId
WHERE
EXISTS ( -- can be INNER JOIN instead if `CId` is unique in temp table
SELECT *
FROM #ThisReportAll tra
WHERE r.CId = tra.CId
)
AND r.Completed IS NOT NULL
GROUP BY
o.OId,
r.CId,
o.FirstName,
o.LastName,
o.BusinessName
) x
WHERE
x.Seq = 1;
DO use full schema names on all your objects (dbo.Owner and dbo.Report).
DO use a semi-join (an EXISTS clause) or INNER JOIN instead of IN when possible.
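The ROW_NUMBER() pattern can be tried on the two rows from the question plus a made-up second CId group. This sketch flattens Owner and Report into a single hypothetical owner_report table and runs on SQLite 3.25+ instead of SQL Server (an assumption); note that SQLite needs the ROW_NUMBER() ... AS Seq form rather than T-SQL's Seq = ROW_NUMBER() ... alias syntax.

```python
import sqlite3

# In-memory SQLite (3.25+ for window functions) standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical flattened table in place of the Owner/Report join.
conn.execute(
    "CREATE TABLE owner_report (OId INTEGER, CId INTEGER, FName TEXT, LName TEXT, BName TEXT)"
)
conn.executemany(
    "INSERT INTO owner_report VALUES (?, ?, ?, ?, ?)",
    [
        (18477, 110, "Hubba", "Bubba", "whoa"),  # from the question
        (158, 110, "Test2", "Person2", "leee"),  # from the question
        (200, 120, "Ann", "Lee", "acme"),  # hypothetical
        (150, 120, "Bob", "Ray", "corp"),  # hypothetical
    ],
)

# Number rows within each CId by descending OId, then keep row 1 of each group,
# so the varchar columns always come from the row with the largest OId.
rows = conn.execute(
    """
    SELECT OId, CId, FName, LName, BName
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY CId ORDER BY OId DESC) AS Seq,
               OId, CId, FName, LName, BName
        FROM owner_report
    ) x
    WHERE x.Seq = 1
    ORDER BY CId
    """
).fetchall()
print(rows)
```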

SQL query to find students who received an A for every course taken

I have the following schema:
Students(sid, firstname, lastname, status, gpa, email)
Courses(dept_code, course#, title)
Classes(classid, dept_code, course#, sect#, year, semester, limit, class_size)
Enrollments(sid, classid, lgrade)
I need some help finding all the students who received an A in every course they took.
I might suggest doing this with an aggregation:
select e.sid
from enrollments e
group by e.sid
having min(lgrade) = max(lgrade) and min(lgrade) = 'A';
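A quick check of the min/max aggregation on hypothetical enrollment rows, using SQLite as a stand-in (an assumption). Student 1 has only A grades, student 2 has an A and a B, and student 3 has a single A.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollments (sid INTEGER, classid INTEGER, lgrade TEXT)")
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(1, 10, "A"), (1, 11, "A"), (2, 10, "A"), (2, 11, "B"), (3, 10, "A")],
)

# min = max means every grade in the group is the same; min = 'A' says which one.
all_a = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.sid
        FROM enrollments e
        GROUP BY e.sid
        HAVING MIN(lgrade) = MAX(lgrade) AND MIN(lgrade) = 'A'
        ORDER BY e.sid
        """
    )
]
print(all_a)
```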
Try this:
select * from students
where sid not in (select distinct sid from enrollments where coalesce(lgrade, 'X') <> 'A')
It means: take all students none of whose grades is anything other than A. Students with no enrollments at all also qualify, since they have no non-A grade.
If you also want the class and course names, you have to join those tables as well.
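The NOT IN version, sketched on hypothetical data in SQLite (an assumption). Student 3 has one NULL grade, which COALESCE turns into 'X' so that it counts as a non-A grade, and student 4 has no enrollments at all, so it is returned along with student 1.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (sid INTEGER PRIMARY KEY, firstname TEXT)")
conn.execute("CREATE TABLE enrollments (sid INTEGER, classid INTEGER, lgrade TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Dee")],  # hypothetical
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(1, 10, "A"), (1, 11, "A"), (2, 10, "A"), (2, 11, "B"), (3, 10, "A"), (3, 11, None)],
)

# Exclude anyone with at least one grade that is not A (NULL treated as 'X').
result = [
    row[0]
    for row in conn.execute(
        """
        SELECT sid FROM students
        WHERE sid NOT IN (SELECT DISTINCT sid FROM enrollments
                          WHERE COALESCE(lgrade, 'X') <> 'A')
        ORDER BY sid
        """
    )
]
print(result)
```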
Try not to overthink this:
SELECT s.*
FROM STUDENTS s
WHERE s.GPA = 4.0
That'll work for the A=4, B=3, C=2, D=1, F=0 case (standard American grading system).
For non-standard systems (such as my kids' high school, where A=5 for honors and advanced placement classes), we can't trust the GPA:
SELECT s.*
FROM STUDENTS s
INNER JOIN (SELECT SID, COUNT(*) AS CLASS_COUNT
FROM ENROLLMENTS
GROUP BY SID) cc
ON cc.SID = s.SID
INNER JOIN (SELECT SID, COUNT(*) AS A_GRADE_COUNT
FROM ENROLLMENTS
WHERE LGRADE = 'A'
GROUP BY SID) ag
ON ag.SID = s.SID
WHERE CLASS_COUNT = A_GRADE_COUNT
Best of luck.
I think this is the clearest:
select e.sid
from enrollments e
group by e.sid
having sum(case when lgrade = 'A' then 1 else 0 end) = count(*) and count(*) > 0
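The CASE expression needs sum() rather than count(): count() counts every non-NULL value, and the ELSE 0 branch is not NULL, so a count() version matches every student. This SQLite sketch (a stand-in, with hypothetical rows) compares the two.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollments (sid INTEGER, classid INTEGER, lgrade TEXT)")
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?)",
    [(1, 10, "A"), (1, 11, "A"), (2, 10, "A"), (2, 11, "B")],  # hypothetical
)


def sids(having):
    # Run the per-student aggregation with the given HAVING condition.
    sql = f"SELECT e.sid FROM enrollments e GROUP BY e.sid HAVING {having} ORDER BY e.sid"
    return [row[0] for row in conn.execute(sql)]


# count() also counts the 0s, so it always equals count(*).
buggy_ids = sids("COUNT(CASE WHEN lgrade = 'A' THEN 1 ELSE 0 END) = COUNT(*)")
# sum() adds 1 only for A grades.
fixed_ids = sids("SUM(CASE WHEN lgrade = 'A' THEN 1 ELSE 0 END) = COUNT(*)")
print(buggy_ids, fixed_ids)
```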

How to select 'exceptions' in SQL?

I have a table comments which contains a field student_id (a foreign key to the students table).
I have another table, students.
What I would like to do is run a query that displays all students who have not made any comments. The SQL I have only shows students who have made comments:
SELECT studentID, email, first_name, last_name FROM "students" JOIN comments ON students.id = comments.student_id
How do I 'reverse' this SQL to show students who have NOT commented?
One method uses not exists:
select s.*
from students s
where not exists (select 1
from comments c
where s.id = c.student_id
);
You could do this:
SELECT studentID, email, first_name, last_name
FROM students
LEFT JOIN comments ON students.id = comments.student_id
WHERE comments.student_id IS NULL
Or with NOT IN (beware: if comments.student_id can contain NULL, NOT IN returns no rows at all):
select s.studentID, s.email, s.first_name, s.last_name from students s where s.id not in (select student_id from comments);
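One caveat with NOT IN: if comments.student_id can be NULL, it returns no rows at all. In this SQLite sketch (a stand-in; the tables are cut down and the data is hypothetical), one comment has a NULL student_id. NOT EXISTS and the LEFT JOIN still return the two students without comments, while NOT IN returns nothing, because x NOT IN (1, NULL) evaluates to unknown rather than true.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, student_id INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")]
)
# Hypothetical: comment 2 has no student (NULL student_id).
conn.executemany("INSERT INTO comments VALUES (?, ?)", [(1, 1), (2, None)])


def ids(sql):
    return [row[0] for row in conn.execute(sql)]


not_exists = ids(
    """
    SELECT s.id FROM students s
    WHERE NOT EXISTS (SELECT 1 FROM comments c WHERE s.id = c.student_id)
    ORDER BY s.id
    """
)
left_join = ids(
    """
    SELECT s.id FROM students s
    LEFT JOIN comments c ON s.id = c.student_id
    WHERE c.student_id IS NULL
    ORDER BY s.id
    """
)
# The NULL in the subquery makes every NOT IN test unknown, so nothing is returned.
not_in = ids(
    "SELECT s.id FROM students s WHERE s.id NOT IN (SELECT student_id FROM comments)"
)
print(not_exists, left_join, not_in)
```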