Finding pairs of repeating entries - sql

Could you please help me with one SQL query?
Table: Students
Id | Name | Date of birth
1 | Will | 1991-02-10
2 | James | 1981-01-20
3 | Sam | 1991-02-10
I need to find pairs of students who have the same date of birth. However, we are not allowed to use GROUP BY, so simply grouping and counting records is not a solution.
I have been trying to do it with JOIN, however with no success.
Your help is greatly appreciated!

You can use a self join on the table, joining on the date_of_birth column:
select s1.name,
s2.name
from students s1
join students s2
on s1.date_of_birth = s2.date_of_birth
and s1.name < s2.name;
As wildplasser and dasblinkenlight pointed out, the < operator (or >) is better than <> because with <> in the join condition the combination Will/Sam is reported twice, once in each order.
Another way of removing those duplicates is to use a distinct query:
select distinct greatest(s1.name, s2.name), least(s1.name, s2.name)
from students s1
join students s2
on s1.date_of_birth = s2.date_of_birth
and s1.name <> s2.name;
(although eliminating the duplicates in the join condition is almost certainly more efficient)
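To see the difference between < and <> on the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the Python variable names are only for illustration; the SQL is the same self join as above.

```python
import sqlite3

# In-memory database holding the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, date_of_birth TEXT)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Will", "1991-02-10"), (2, "James", "1981-01-20"), (3, "Sam", "1991-02-10")],
)

# '<' keeps exactly one ordering of each pair.
pairs_lt = conn.execute(
    """
    SELECT s1.name, s2.name
    FROM students s1
    JOIN students s2
      ON s1.date_of_birth = s2.date_of_birth
     AND s1.name < s2.name
    """
).fetchall()

# '<>' reports every pair in both orders.
pairs_ne = conn.execute(
    """
    SELECT s1.name, s2.name
    FROM students s1
    JOIN students s2
      ON s1.date_of_birth = s2.date_of_birth
     AND s1.name <> s2.name
    ORDER BY s1.name
    """
).fetchall()

print(pairs_lt)  # [('Sam', 'Will')]
print(pairs_ne)  # [('Sam', 'Will'), ('Will', 'Sam')]
```

If two students can share a name, comparing on the key instead (s1.id < s2.id) is probably the safer way to break the tie.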

select st.name, stu.name
from students st, students stu
where st.date_of_birth = stu.date_of_birth and st.name <> stu.name;

This query reports all students who have a non-unique birthdate.
SELECT *
FROM students s
WHERE EXISTS (
SELECT *
FROM students ex
WHERE ex.date_of_birth = s.date_of_birth
AND ex.name <> s.name
)
ORDER BY s.date_of_birth
;

Related

SQL - Querying for names that occur at least twice

So I'm trying to find a way to query for a table "people" that has attribute "name", and I would like to query for names that occur at least twice, while the results should be distinct.
I was thinking of creating two alias tables, and joining on name but I can't figure it out.
Here is what I tried:
SELECT DISTINCT name
FROM people AS S1
INNER JOIN people AS S2 USING name
WHERE S2.lastname <> S2.surname
The surname part I did to remove cases of names appearing because of the two tables being equal (not even sure if this is correct).
But either way, this already failed as the syntax is wrong.
Would appreciate some help! Thanks in advance.
Aggregation is a simple method if you want just the names:
select name
from people
group by name
having count(*) > 1;
If you want the original rows, use window functions:
select p.*
from (select p.*, count(*) over (partition by name) as cnt
from people p
) p
where cnt >= 2;
Simple: use EXISTS() [ you only need to select from the people table once, and you don't have to use DISTINCT ] :
SELECT *
FROM people s1
WHERE EXISTS (SELECT *
FROM people s2
WHERE s2.name = s1.name
AND S2.lastname <> S1.lastname
);
BTW: assuming lastname <--> surname was a typo?
select p.name, count(*) as cnt
from people p
group by p.name
having count(*) >= 2
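The aggregate and EXISTS answers above are not quite interchangeable. A rough sketch with Python's sqlite3 and some made-up rows (the sample data is an assumption, not from the question) shows where they differ: two rows that are identical in both name and lastname count as a repeat for HAVING, but the lastname <> test in the EXISTS version skips them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, lastname TEXT)")
# Hypothetical rows: 'Ann' repeats with different lastnames,
# 'Cy' repeats as an exact duplicate, 'Bob' occurs once.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Lee"), ("Ann", "Ray"), ("Bob", "Kay"), ("Cy", "Orr"), ("Cy", "Orr")],
)

# Aggregation: every name that occurs at least twice.
having_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM people GROUP BY name HAVING COUNT(*) >= 2 ORDER BY name"
    )
]

# EXISTS with lastname <>: only names shared by rows with different lastnames.
exists_names = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT s1.name
        FROM people s1
        WHERE EXISTS (SELECT 1
                      FROM people s2
                      WHERE s2.name = s1.name
                        AND s2.lastname <> s1.lastname)
        ORDER BY s1.name
        """
    )
]

print(having_names)  # ['Ann', 'Cy']
print(exists_names)  # ['Ann']
```

If the table has a unique key, comparing on that key rather than on lastname should make the EXISTS form agree with the aggregate.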

Using BETWEEN with a subquery postgres

I need to create a query to get all attendance records of an employee within a time limit, but the time limit comes from a different table. I need a query like the one below, but I don't know how to write it.
SELECT * FROM attendance WHERE employeeid = 25 AND attendance_date BETWEEN (SELECT bill_fromdate,bill_todate FROM bill WHERE bill_id = 21487)
I am using PostgreSQL 8.4.
You could use a join instead of a subquery:
SELECT *
FROM attendance a
JOIN bill b ON
a.attendance_date BETWEEN b.bill_fromdate AND b.bill_todate
WHERE a.employeeid = 25 AND b.bill_id = 21487
Either use a JOIN (as in Mureinik's answer) or use a sub-select with an exists condition:
SELECT a.*
FROM attendance a
WHERE a.employeeid = 25
AND exists (select 1
from bill b
where b.bill_id = 21487
and a.attendance_date BETWEEN b.bill_fromdate AND b.bill_todate)
Given your example query, most probably there isn't a difference between using the join or the sub-select.
But they have different meanings and a join could return a different result (i.e. more rows) than the sub-select (but again I doubt it in this situation).
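To make the "more rows" remark concrete, here is a small sketch with Python's sqlite3. It assumes, purely for illustration, that bill_id is not unique, so two bill rows with overlapping date ranges share the id 21487; the dates are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE attendance (employeeid INTEGER, attendance_date TEXT);
    CREATE TABLE bill (bill_id INTEGER, bill_fromdate TEXT, bill_todate TEXT);

    INSERT INTO attendance VALUES
        (25, '2014-01-05'), (25, '2014-01-20'), (25, '2014-02-15');

    -- Assumption: two rows share bill_id 21487 and their ranges overlap.
    INSERT INTO bill VALUES
        (21487, '2014-01-01', '2014-01-31'),
        (21487, '2014-01-15', '2014-02-01');
    """
)

# JOIN: an attendance row is repeated once per matching bill row.
join_rows = conn.execute(
    """
    SELECT a.employeeid, a.attendance_date
    FROM attendance a
    JOIN bill b
      ON a.attendance_date BETWEEN b.bill_fromdate AND b.bill_todate
    WHERE a.employeeid = 25 AND b.bill_id = 21487
    ORDER BY a.attendance_date
    """
).fetchall()

# EXISTS: each attendance row appears at most once.
exists_rows = conn.execute(
    """
    SELECT a.employeeid, a.attendance_date
    FROM attendance a
    WHERE a.employeeid = 25
      AND EXISTS (SELECT 1
                  FROM bill b
                  WHERE b.bill_id = 21487
                    AND a.attendance_date BETWEEN b.bill_fromdate AND b.bill_todate)
    ORDER BY a.attendance_date
    """
).fetchall()

print(len(join_rows))    # 3 ('2014-01-20' falls in both ranges)
print(len(exists_rows))  # 2
```

When bill_id is a primary key, at most one bill row can match and the two forms return the same rows.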

SQL Comparing COUNT values within same table

I'm trying to solve a seemingly simple problem, but I think I'm tripping over my understanding of how the EXISTS keyword works. The problem is simple (this is a dumbed down version of the actual problem): I have a table of students and a table of hobbies. The students table has their student ID and name. Return only the students that share the same number of hobbies with at least one other student (i.e. those students who have a unique number of hobbies would not be shown).
So the difficulty I run into is working out how to compare the count of hobbies. What I have tried is this.
SELECT sa.studentnum, COUNT(ha.hobbynum)
FROM student sa, hobby ha
WHERE sa.studentnum = ha.studentnum
AND EXISTS (SELECT *
FROM student sb, hobby hb
WHERE sb.studentnum = hb.studentnum
AND sa.studentnum != sb.studentnum
HAVING COUNT(ha.hobbynum) = COUNT(hb.hobbynum)
)
GROUP BY sa.studentnum
ORDER BY sa.studentnum;
So what appears to be happening is that the count of hobbynums is identical each test, resulting in all of the original table being returned, instead of just those that match the same number of hobbies.
Not tested, but maybe something like this (if I understand the problem correctly):
WITH h AS (
SELECT DISTINCT studentnum, COUNT(hobbynum) OVER (PARTITION BY studentnum) AS student_hobby_ct
FROM hobby)
SELECT DISTINCT h1.studentnum, h1.student_hobby_ct
FROM h h1 JOIN h h2 ON h1.student_hobby_ct = h2.student_hobby_ct AND
h1.studentnum <> h2.studentnum;
I think your query would only return students who have at least one other student with the same number of hobbies, but you're not returning anything about the students with whom they match. Is that intentional? I'd treat both queries as sub-queries and aggregate before joining on the counts. You could do several things: here it returns the number of students that have matching hobby counts, but you could add a condition such as HAVING COUNT(DISTINCT sb.studentnum) = 0 to get the result your query seemed to return.
with xx as
(SELECT sa.studentnum, count(ha.hobbynum) hobbycount
FROM student sa inner join hobby ha
on sa.studentnum = ha.studentnum
group by sa.studentnum
)
select sa.studentnum, sa.hobbycount, count(distinct sb.studentnum) as matchcount
from
xx sa inner join xx sb on
sa.hobbycount = sb.hobbycount
where
sa.studentnum != sb.studentnum
GROUP by sa.studentnum, sa.hobbycount
ORDER BY sa.studentnum;
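The "aggregate first, then compare the counts" idea can be checked on a toy data set. This is only a sketch in Python's sqlite3 with invented rows; it computes the per-student count once in a CTE and then uses EXISTS so each student is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (studentnum INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hobby (studentnum INTEGER, hobbynum INTEGER);

    -- Hypothetical data: students 1 and 2 have two hobbies each,
    -- student 3 has one, student 4 has three.
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee');
    INSERT INTO hobby VALUES
        (1, 10), (1, 11),
        (2, 10), (2, 12),
        (3, 13),
        (4, 10), (4, 11), (4, 12);
    """
)

shared_counts = conn.execute(
    """
    WITH counts AS (
        SELECT studentnum, COUNT(hobbynum) AS hobbycount
        FROM hobby
        GROUP BY studentnum
    )
    SELECT c1.studentnum, c1.hobbycount
    FROM counts c1
    WHERE EXISTS (SELECT 1
                  FROM counts c2
                  WHERE c2.hobbycount = c1.hobbycount
                    AND c2.studentnum <> c1.studentnum)
    ORDER BY c1.studentnum
    """
).fetchall()

print(shared_counts)  # [(1, 2), (2, 2)]
```

Students with no rows in hobby never appear in the CTE, so two students with zero hobbies would not be matched by this sketch; a LEFT JOIN from student would be needed to cover that case.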

SQL Query for finding values that do not exist in one table, with WHERE clause

I'm struggling to compile a query for the following and wonder if anyone can please help (I'm a SQL newbie).
I have two tables:
(1) student_details, which contains the columns: student_id (PK), firstname, surname (and others, but not relevant to this query)
(2) membership_fee_payments, which contains details of monthly membership payments for each student and contains the columns: membership_fee_payments_id (PK), student_id (FK), payment_month, payment_year, amount_paid
I need to create the following query:
which students have not paid fees for March 2012?
The query could be for any month/year, March is just an example. I want to return in the query firstname, surname from student_details.
I can query successfully who has paid for a certain month and year, but I can't work out how to query who has not paid!
Here is my query for finding out who has paid:
SELECT student_details.firstname, student_details.surname
FROM student_details
INNER JOIN membership_fee_payments
ON student_details.student_id = membership_fee_payments.student_id
WHERE membership_fee_payments.payment_month = "March"
AND membership_fee_payments.payment_year = "2012"
ORDER BY student_details.firstname
I have tried a left join and left outer join but get the same result. I think perhaps I need to use NOT EXISTS or IS NULL but I haven't had much luck writing the right query yet.
Any help much appreciated.
I'm partial to using WHERE NOT EXISTS. Typically that would look something like this:
SELECT D.firstname, D.surname
FROM student_details D
WHERE NOT EXISTS (SELECT * FROM membership_fee_payments P
WHERE P.student_id = D.student_id
AND P.payment_year = '2012'
AND P.payment_month = 'March'
)
This is known as a correlated subquery because it contains references to the outer query. This allows you to include your join criteria in the subquery without necessarily writing a JOIN. Also, most RDBMS query optimizers will implement a NOT EXISTS like this as an anti semi join, which does not typically do as much 'work' as a complete join.
You could use a left join. When the payment is missing, all the columns in the left join table will be null:
SELECT student_details.firstname, student_details.surname
FROM student_details
LEFT JOIN membership_fee_payments
ON student_details.student_id = membership_fee_payments.student_id
AND membership_fee_payments.payment_month = "March"
AND membership_fee_payments.payment_year = "2012"
WHERE membership_fee_payments.student_id is null
ORDER BY student_details.firstname
You can also write the following query. This will give your expected output.
SELECT student_details.firstname,
student_details.surname
FROM student_details
Where
student_details.student_id Not in
(SELECT membership_fee_payments.student_id
from membership_fee_payments
WHERE
membership_fee_payments.payment_year = '2012'
AND membership_fee_payments.payment_month = 'March'
)
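The three approaches above (NOT EXISTS, LEFT JOIN ... IS NULL, NOT IN) can be compared side by side. Here is a sketch in Python's sqlite3 with invented students and payments. It also assumes, for the last step only, that the student_id column of membership_fee_payments can contain NULL, which is where NOT IN behaves differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_details (
        student_id INTEGER PRIMARY KEY, firstname TEXT, surname TEXT);
    CREATE TABLE membership_fee_payments (
        membership_fee_payments_id INTEGER PRIMARY KEY,
        student_id INTEGER, payment_month TEXT, payment_year TEXT, amount_paid REAL);

    -- Hypothetical data: only Ann has paid for March 2012.
    INSERT INTO student_details VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kay'), (3, 'Cy', 'Orr');
    INSERT INTO membership_fee_payments VALUES
        (1, 1, 'March', '2012', 10.0),
        (2, 2, 'April', '2012', 10.0);
    """
)

NOT_EXISTS = """
    SELECT d.firstname, d.surname
    FROM student_details d
    WHERE NOT EXISTS (SELECT 1 FROM membership_fee_payments p
                      WHERE p.student_id = d.student_id
                        AND p.payment_year = '2012'
                        AND p.payment_month = 'March')
    ORDER BY d.firstname
"""
LEFT_JOIN = """
    SELECT d.firstname, d.surname
    FROM student_details d
    LEFT JOIN membership_fee_payments p
      ON d.student_id = p.student_id
     AND p.payment_month = 'March'
     AND p.payment_year = '2012'
    WHERE p.student_id IS NULL
    ORDER BY d.firstname
"""
NOT_IN = """
    SELECT d.firstname, d.surname
    FROM student_details d
    WHERE d.student_id NOT IN (SELECT p.student_id
                               FROM membership_fee_payments p
                               WHERE p.payment_year = '2012'
                                 AND p.payment_month = 'March')
    ORDER BY d.firstname
"""


def unpaid(sql):
    """Run one of the queries above and return its rows as a list."""
    return conn.execute(sql).fetchall()


not_exists_rows = unpaid(NOT_EXISTS)
left_join_rows = unpaid(LEFT_JOIN)
not_in_rows = unpaid(NOT_IN)
print(not_exists_rows)  # [('Bob', 'Kay'), ('Cy', 'Orr')]

# Assumption: a March 2012 payment row with a NULL student_id exists.
conn.execute(
    "INSERT INTO membership_fee_payments VALUES (3, NULL, 'March', '2012', 10.0)"
)
not_in_after_null = unpaid(NOT_IN)          # NOT IN now filters out every row
not_exists_after_null = unpaid(NOT_EXISTS)  # unchanged
print(not_in_after_null)  # []
```

If student_id is declared NOT NULL the three queries should agree; otherwise NOT EXISTS or the LEFT JOIN form is the more predictable choice.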

SQL Query Help - Return row in table which relates to another table row with max(column)

I have two tables:
Table1 = Schools
Columns: id(PK), state(nvarchar(100)), schoolname
Table2 = Grades
Columns: id(PK), id_schools(FK), Year, Reading, Writing...
I would like to develop a query to find the schoolname which has the highest grade for Reading.
So far I have the following and need help to fill in the blanks:
SELECT Schools.schoolname, Grades.Reading
FROM Schools, Grades
WHERE Schools.id = (* need id_schools for max(Grades.Reading)*)
SELECT
Schools.schoolname,
Grades.Reading
FROM
Schools INNER JOIN Grades on Schools.id = Grades.id_schools
WHERE
Grades.Reading = (SELECT MAX(Reading) from Grades)
Here's how I solve this sort of problem without using a subquery:
SELECT s.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
LEFT OUTER JOIN Grades AS g2 ON g2.id_schools <> s.id
AND g1.Reading < g2.Reading
WHERE g2.id_schools IS NULL
Note that you can get more than one row back, if more than one school ties for highest Reading score. In that case, you need to decide how to resolve the tie and build that into the LEFT OUTER JOIN condition.
Re your comment: The left outer join looks for a row with a higher Reading grade that belongs to a different school, and if none is found, all of the g2.* columns will be null. In that case, we know that no other school has a grade higher than the grade in the row g1 points to, which means g1 belongs to the school with the highest grade. It can also be written this way, which is logically the same but might be easier to understand:
SELECT s.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
WHERE NOT EXISTS (
SELECT * FROM Grades g2
WHERE g2.id_schools <> s.id AND g2.Reading > g1.Reading)
You say it's not working. Can you be more specific? What is the answer you expect, and what's actually happening, and how do they differ?
edit: Changed = to <> as per suggestion in comment by @potatopeelings. Thanks!
This should do it
select * from Schools as s
where s.id=(
select top(1) id_schools from grades as g
order by g.reading desc)
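For anyone who wants to try these out, here is a sketch in Python's sqlite3 that runs the MAX subquery answer and the NOT EXISTS form on a few invented rows, including a tie for the top Reading score. SQLite has no TOP, so the last query is not included; the data and variable names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Schools (id INTEGER PRIMARY KEY, state TEXT, schoolname TEXT);
    CREATE TABLE Grades (
        id INTEGER PRIMARY KEY, id_schools INTEGER, Year INTEGER, Reading INTEGER);

    -- Hypothetical data: Beta and Gamma tie at 95; Beta also has a weaker year.
    INSERT INTO Schools VALUES (1, 'NY', 'Alpha'), (2, 'CA', 'Beta'), (3, 'TX', 'Gamma');
    INSERT INTO Grades VALUES
        (1, 1, 2010, 80),
        (2, 2, 2010, 95),
        (3, 3, 2010, 95),
        (4, 2, 2011, 70);
    """
)

# Subquery form: keep the grade rows whose Reading equals the overall maximum.
max_rows = conn.execute(
    """
    SELECT s.schoolname, g.Reading
    FROM Schools s
    INNER JOIN Grades g ON s.id = g.id_schools
    WHERE g.Reading = (SELECT MAX(Reading) FROM Grades)
    ORDER BY s.schoolname
    """
).fetchall()

# NOT EXISTS form: keep grade rows that no other school beats.
not_exists_rows = conn.execute(
    """
    SELECT s.schoolname, g1.Reading
    FROM Schools s
    JOIN Grades g1 ON g1.id_schools = s.id
    WHERE NOT EXISTS (SELECT 1 FROM Grades g2
                      WHERE g2.id_schools <> s.id
                        AND g2.Reading > g1.Reading)
    ORDER BY s.schoolname
    """
).fetchall()

print(max_rows)         # [('Beta', 95), ('Gamma', 95)]
print(not_exists_rows)  # [('Beta', 95), ('Gamma', 95)]
```

Both forms return every school tied for the top score, so any tie-breaking rule still has to be added on top.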