PostgreSQL: How do I get data from table `A` filtered by a column in table `B` - sql

I want to fetch all parents that have kids in a specific grade only in a school.
Below are trimmed down version of the tables.
TABLE students
id,
last_name,
grade_id,
school_id
TABLE parents_students
parent_id,
student_id
TABLE parents
id,
last_name,
school_id
I tried the below query but it doesn't really work as expected. It rather fetches all parents in a school disregarding the grade. Any help is appreciated. Thank you.
SELECT DISTINCT
p.id,
p.last_name,
p.school_id,
st.school_id,
st.grade_id,
FROM parents p
INNER JOIN students st ON st.school_id = p.school_id
WHERE st.grade_id = 118
AND st.school_id = 6
GROUP BY p.id,st.grade_id,st.school_id;

I would think:
select p.*
from parents p
where exists (select 1
from parents_students ps join
students s
on ps.student_id = s.id
where ps.parent_id = p.id and
s.grade_id = 118 and
s.school_id = 6
);
Your question says that you want information about the parents. If so, I don't see why you are including redundant information about the school and grade (it is redundant because the where clause specifies exactly what those values are).

Related

How to make sure result pairs are unique - without using distinct?

I have three tables I want to iterate over. The tables are pretty big so I will show a small snippet of the tables. First table is Students:
id
name
address
1
John Smith
New York
2
Rebeka Jens
Miami
3
Amira Sarty
Boston
Second one is TakingCourse. This is the course the students are taking, so student_id is the id of the one in Students.
id
student_id
course_id
20
1
26
19
2
27
18
3
28
Last table is Courses. The id is the same as the course_id in the previous table. These are the courses the students are following and looks like this:
id
type
26
History
27
Maths
28
Science
I want to return a table with the location (address) and the type of courses that are taken there. So the results table should look like this:
address
type
The pairs should be unique, and that is what's going wrong. I tried this:
select S.address, C.type
from Students S, Courses C, TakingCourse TC
where TC.course_id = C.id
and S.id = TC.student_id
And this does work, but the pairs are not all unique. I tried select distinct and it's still the same.
Multiple students can (and will) reside at the same address. So don't expect unique results from this query.
Only an overview is needed, so that's why I don''t want duplicates
So fold duplicates. Simple way with DISTINCT:
SELECT DISTINCT s.address, c.type
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id;
Or to avoid DISTINCT (why would you for this task?) and, optionally, get counts, too:
SELECT c.type, s.address, count(*) AS ct
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id
GROUP BY c.type, s.address
ORDER BY c.type, s.address;
A missing UNIQUE constraint on takingcourse(student_id, course_id) could be an additional source of duplicates. See:
How to implement a many-to-many relationship in PostgreSQL?

SQL statement : average

My question: What is the average age to become the first grandpa. The solution should be given out as average_age. The day a person becomes grandpa is where his first grandchild was born.
Relations:
human (name, gender, age)
parent (ParentName, ChildName) -> is subset of human(name).
Table:
I do know that grandpa is the person which has a parentname and a child in childname which is also a person(father) in parentname which has children in childname (grandchildren). The problem is now how do I get the average age to become grandpa.
What I got so far:
SELECT AVG(age) as average_age
FROM human h JOIN
parent p
ON h.name = p.parentname
WHERE h.gender = 'm' AND p.parentname = p.childname AND h.name = p.parentname
Expected outcome:
average_age : 52
It is extremely unusually to be storing the AGE of people in a table, because that changes -- every day. The data should be stored with a date of birth.
This is an aggregation query, but you have to join the tables multiple times. To get grandparents, you need a join on the parents table. Then you need to bring in humans for filtering:
select avg(min_age * 1.0)
from (select min(h_grandparent.age - h.grandchild.age) as min_age
from parent p join -- p.parentname is the grandparent
parent pchild
on p.childname = pchild.parentname join
human h_grandparent
on p.parentname = h_grandparent.name join
human h_grandchild
on pchild.childname = h_grandchild.name
where h_grandparent.gender = 'm'
group by h_grandparent.name
) a
I would address this with an exists condition that filters on humans that have grandchilds:
select avg(age) avg_age_of_grandpas
from human h
where
gender = 'm'
and exists (
select 1
from parent p1
inner join parent p2 on p2.parentName = p1.childName
where p1.parentName = h.name
)
The exists condition ensures that the person has at least one child and one grand child. The the outer query computes the average of such humans. Given the information available in your table structure, this seems to me like the most logical approach. Unlike joins, using exists avoids duplicating the records (and getting wrong results in the average) when a person has more than one line of descendants.
If you want the age of the grand parent at the date when their first grand child was born, then it is a bit complicated. This should get you close to what you expect:
select avg(h.age - g.maxGrandChildAge) avg_age_of_grandpas
from human h
inner join (
select
p1.parentName grandParentName,
max(h1.age) maxGrandChildAge
from parent p1
inner join parent p2 on p2.parentName = p1.childName
inner join human h1 on h1.name = p2.childName
) g
on g.grandParentName = h.name
You can do this with 2 more joins and subtraction the age of biggest grandchild:
SELECT AVG(p_age) average_age
FROM
(
SELECT h.name, h.age-MAX(h2.age) as p_age
FROM parent p1
LEFT JOIN parent p2 ON P1.childname = P2.parentname
INNER JOIN human h ON P1.parentname = h.name
INNER JOIN human h2 ON P2.ChildName = h2.name
WHERE h.gender = 'm' AND p2.childname IS NOT NULL
GROUP BY h.name, h.age
)pAges
Please consider that the name is not appropriate data for doing this task.

Which Join for SQL plus query

I have 4 tables, I would like to select one column from each table, but only if the department has both 'Mick' and 'Dave working in it (must have both names, not one or the other). But it does not seem to be working properly:
SELECT SCHOOL_NAME, TOWN, COUNTY
FROM STUDENTS
NATURAL JOIN SCHOOLS NATURAL JOIN TOWNS NATURAL JOIN
COUNTIES
WHERE FIRST_NAME IN ('Mick','Dave)
/
I'm going wrong somewhere (probably lots of places :( ). Any help would be great
Don't use NATURAL JOIN. It is an abomination, because it does not take properly declared foreign key relationships into account. It only looks at the names of columns. This can introduce really hard to find errors.
Second, what you want is aggregation:
select sc.SCHOOL_NAME, t.TOWN, c.COUNTY
from STUDENTS st join
SCHOOLS sc
on st.? = sc.? join
TOWNS t
on t.? = ? join
COUNTIES c
on c.? = t.?
where FIRST_NAME in ('Mick', 'Dave')
group by sc.SCHOOL_NAME, t.TOWN, c.COUNTY
having count(distinct st.first_name) = 2;
The ? are placeholders for table and column names. If you are learning SQL, it is all the more important that you understand how columns line up for joins in different tables.
A where clause can only check the values in a single row. There is a separate row for each student, so there is no way -- with just a where -- to find both students. That is where the aggregation comes in.
You need at least three Join conditions, and properly end the string Dave with quote :
SELECT SCHOOL_NAME, TOWN, COUNTY
FROM SCHOOLS h
JOIN TOWNS t ON (t.id=h.town_id)
JOIN COUNTIES c ON (t.county_id=c.id)
WHERE EXISTS ( SELECT school_id
FROM STUDENTS s
WHERE s.first_name in ('Mick','Dave')
AND school_id = h.id
GROUP BY school_id
HAVING count(1)>1
);
SQL Fiddle Demo
You can use an analytic function in a sub-query to count the students who have the name Mick or Dave for each school_id (assuming that is your identifier for a school):
SELECT SCHOOL_NAME, TOWN, COUNTY
FROM ( SELECT *
FROM (
SELECT d.*,
COUNT(
DISTINCT
CASE WHEN FIRST_NAME IN ( 'Mick', 'Dave' ) THEN FIRST_NAME END
) OVER( PARTITION BY school_id )
AS num_matched
FROM STUDENTS d
)
WHERE num_matched = 2
)
NATURAL JOIN SCHOOLS
NATURAL JOIN TOWNS
NATURAL JOIN COUNTIES;
SQLFiddle
You would also be better to use an INNER JOIN and explicitly specify the join condition rather than relying on NATURAL JOIN.

SQL query - list of items in one table not in another

I need some help with a SQL query. I have a table of courses and a table that contains user id and course id, denoting courses that user has taken (might not have taken any; no entry in that table for that user id).
I need a query to return the list of courses not taken.
Course Category table
CategoryID
Caegory
Courses table
CourseID
CategoryID
CourseName
...
UserCourse table
UserID
CourseID
you can use not exists
Select *
From Courses c
Where Not Exists (Select 1 From UserCourse uc Where uc.CourseID = c.CourseID)
This will just list the course
select *
from Courses C
Left join CourseCategory cc on
cc.CategoryID = c.CategoryID
where CourseID not in (Select CourseID from UserCourse where UserID = 14)
what i need is for a given user id and course category, what courses within that category have not been taken by this user
(This should have been in the request by the way.)
So:
Select from courses.
Limit to the desired category.
Limit to courses not in the set of courses taken by the user.
The query:
select *
from courses
where categoryid = 123
and courseid not in (select courseid from usercourse where userid = 456);
Another way of writing same query, which will perform faster.
select C.CourseID,C.CategoryID
from Courses C
Left join CourseCategory cc on
cc.CategoryID = c.CategoryID
left join UserCourse uc
on C.CourseID=uc.CourseID
where uc.CourseID is null

SQL query for finding row with same column values that was created most recently

If I have three columns in my MySQL table people, say id, name, created where name is a string and created is a timestamp.. what's the appropriate query for a scenario where I have 10 rows and each row has a record with a name. The names could have a unique id, but a similar name none the less. So you can have three Bob's, two Mary's, one Jack and 4 Phil's.
There is also a hobbies table with the columns id, hobby, person_id.
Basically I want a query that will do the following:
Return all of the people with zero hobbies, but only check by the latest distinct person created, if that makes sense. Meaning if there is a Bob person that was created yesterday, and one created today.. I only want to know if the Bob created today has zero hobbies. The one from yesterday is no longer relevant.
select pp.id
from people pp, (select name, max(created) from people group by name) p
where pp.name = p.name
and pp.created = p.created
and id not in ( select person_id from hobbies )
SELECT latest_person.* FROM (
SELECT p1.* FROM people p1
WHERE NOT EXISTS (
SELECT * FROM people p2
WHERE p1.name = p2.name AND p1.created < p2.created
)
) AS latest_person
LEFT OUTER JOIN hobbies h ON h.person_id = latest_person.id
WHERE h.id IS NULL;
Try This:
Select *
From people p
Where timeStamp =
(Select Max(timestamp)
From people
Where name = p.Name
And not exists
(Select * From hobbies
Where person_id = p.id))