SQL MAX() grouping - sql

Tables:
student(sid, sname, sex, age, year, gpa)
major(dname, sid)
Question:
For each department with more than 15 students majoring in the department, we want to print information about the student(s) with the highest GPA within the department. In particular, for each such student, we want to print the student id, student name and GPA, and the department name the student is major in.
So far I have:
SELECT student.sid, student.sname, student.gpa, major.dname
FROM student
RIGHT JOIN major ON student.sid = major.sid
WHERE student.gpa IN (
SELECT MAX(gpa)
FROM student JOIN major ON student.sid = major.sid
GROUP BY dname
HAVING COUNT(dname) > 15
)
But it doesn't give me the accurate query. The clause inside the IN works but when put together in this way it doesn't actually match student.gpa to dname max GPA. What am I doing wrong here?
This Query gives:
enter image description here
I need:
enter image description here

Your inner query gives you the maximum GPA for each department. Then the outer query returns all students who have a GPA equal to any of the maximum GPA's, regardless of the department. The quickest fix of your code is to use a correlated subquery, that will find the maximum GPA for the student's specific department.
SELECT s.sid, s.sname, s.gpa, sm.dname
FROM student s
RIGHT JOIN major sm ON s.sid = sm.sid
WHERE student.gpa IN (
SELECT MAX(ds.gpa)
FROM student ds JOIN major dm ON ds.sid=dm.sid
WHERE dm.dname = sm.dname
GROUP BY dm.dname
HAVING COUNT(dm.dname)>15
)

This query:
select dname, max(s.gpa) maxgpa
from major m inner join student s
on s.sid = m.sid
group by dname
having count(s.sid) > 15
returns all the departments with more than 15 students and the highest gpa in that department.
Join it to the 2 tables like this:
select s.sid, s.sname, s.gpa, t.dname
from (
select dname, max(s.gpa) maxgpa
from major m inner join student s
on s.sid = m.sid
group by dname
having count(s.sid) > 15
) t
inner join major m on m.dname = t.dname
inner join student s on s.sid = m.sid and s.gpa = t.maxgpa
Or with window functions:
select t.sid, t.sname, t.gpa, t.dname
from (
select m.dname, s.*,
rank() over (partition by m.dname order by s.gpa desc) rn,
count(s.sid) over (partition by m.dname) counter
from major m inner join student s
on s.sid = m.sid
) t
where t.counter > 15 and t.rn = 1

If you change your WHERE ... IN statement to be an INNER JOIN, you can connect more fields.
SELECT student.sid, student.sname, student.gpa, major.dname
FROM student
RIGHT JOIN major
ON student.sid = major.sid
INNER JOIN (
SELECT MAX(gpa) as max_gpa, dname
FROM student JOIN major ON student.sid=major.sid
GROUP BY dname
HAVING COUNT(dname)>15
) as dept_gpa
ON student.gpa = dept_gpa.max_gpa
AND major.dname = dept_gpa.dname

Related

SQL: Need to check columns for values that exist in another column

Using SQL, my job is to fetch the SSN of students who enrolled in a course without enrolling in that course’s prerequisite(s). I'm using Access. The tables I need are as follows:
STUDENT (SSN, SNAME, MAJOR, DOB, ADDRESS)
ENROLLED (SSN, CID, GRADE)
PREQ (CID, PREQCID, PASSINGGRADE, NOTE)
Here's what I've done so far.
select *
from
(select SSN, CID from ENROLLED) AS enrolled
left join
(select CID, PREQCID FROM PREQ) AS prereq ON enrolled.CID = prereq.CID;
What I'm missing is how to check each row of the same student on the condition WHERE enrolled.CID = prereq.CID for a PREQCID that's NOT in enrolled.CID.
Is what I'm asking for here a loop? Am I on the right track? Please keep in mind this I'm in an introductory course so the simplest of solutions is preferable.
Here's another way using not exists:
select e1.ssn
from enrolled e1 left join preq p on e1.cid = p.cid
where
p.preqcid is not null and
not exists (select 1 from enrolled e2 where e2.ssn = e1.ssn and e2.cid = p.preqcid)
This is essentially stating:
"Select the ssn for all enrollments where there is a prerequisite course and the prerequisite course ID does not exist in the table of enrollments for that ssn."
I use select 1 purely for optimisation - we don't care about the values held by the nested query, only whether or not the nested query returns one or more records.
You could also write this using joins as:
select e1.ssn
from
(
enrolled e1 left join preq p on e1.cid = p.cid
)
left join enrolled e2 on
p.preqcid = e2.cid and e1.ssn = e2.ssn
where
e2.cid is null
Here, the enrolled table is referenced twice: the first is left joined on the table of prerequisite courses for each course, and the second is left joined on the prerequisite course ID and the ssn from the original enrollment.
The where clause then causes records to be selected for which the link on the prerequisite course is null for the given ssn.
I'm sure there is a cleaner way to do this, but this gets you your result:
Select S.SSN, E.CID From Student S
Inner Join Enrolled E on E.SSN = S.SSN
Where E.CID IN (Select CID From PREQ Where Preqcid NOT IN
(Select CID From Enbrolled Where SSN = S.SSN))
This might work if analytic functions and CTEs are available:
with d as (
select ssn, cid,
count(distinct p.cid) over (partition by e.ssn, e.cid) ecnt, ,
count(distinct e.cid) over (partition by e.ssn, p.cid) pcnt
from enrolled e left outer join preq p on p.cid = e.cid
)
select distinct ssn, cid from d
where ecnt = pcnt;

SQL not group by expression, Find student who has taken at least 5 courses

select s.name, s.id
from student s join takes t on t.id = s.id
where s.name like 'D%'
group by s.name, s.id
having (
select count(distinct c.course_id)
from course c
where c.dept_name = 'History' and c.course_id = t.course_id)>4
order by s.name
I am confused about how GROUP BY works. I am trying to find the students who has taken at least 5 courses from history department and name start with D.
Not sure with the nested subqueries...
course(course id, title, dept name, credits)
student(ID, name, dept name, tot_cred)
takes(ID, course_id, sec_id, semester, year, grade)
You have to additionally JOIN with course table:
select s.name, s.id
from student s
inner join takes t on t.id = s.id
inner join course c on c.course_id = t.course_id
where s.name like 'D%' and c.dept_name = 'History'
group by s.name, s.id
having count(distinct c.course_id) >= 5
The WHERE clause returns all students whose names start with a 'D' and have taken at least one course in history department. The HAVING clause filters out any students with 4 or less distinct courses in history department.

Using max() in SQL

One part of my homework assignment is to find the student with the highest average from each department.
QUERY:
SELECT g.sid as studentID, s.sfirstname, s.dcode, AVG(grade) as average
FROM studentgrades g, student s
WHERE g.sid = s.sid
GROUP BY s.sid
RESULT:
1 Robert ger 80.0000
2 Julie sta 77.0000
3 Michael csc 84.0000
4 Julia csc 100.0000
5 Patric csc 86.0000
6 Jill sta 74.5000
To answer The question, I ran the query
SELECT dcode, averages.sfirstName, MAX(averages.average)
FROM (
SELECT g.sid as studentID, s.sfirstname, s.dcode, AVG(grade) as average
FROM studentgrades g, student s
WHERE g.sid = s.sid
GROUP BY s.sid) averages
GROUP BY dcode
RESULT:
csc Michael 100.0000
ger Robert 80.0000
sta Julie 77.0000
Even though the averages are correct, the names are not!
Julia is the one who has the average 100 in csc, so why does Michael show up?
Here's an example:
a student takes courses and gets grades for these courses. EG:
student1 from dept1 took course A and got grade 80
student1 from dept1 took course B and got grade 90
student2 from dept1 took course C and got grade 100
student3 from dept2 took course X and got grade 90
AFTER RUNNING THE FIRST QUERY we get the averages for each student
student 1 from dept1 has average 85
student 2 from dept1 has average 100
student 3 from dept2 has average 90
Now we find the student with the highest average from each department
dept1, student2, 100
dept2, student3, 90
This should do it (and it uses the GROUP BY according to the SQL standard, not the way MySQL implements it)
select s.sid,
s.sfirstname,
s.dcode,
ag.avg_grade
from students s
join (select sid, avg(grade) as avg_grade
from studentgrades
group by sid) ag on ag.sid = s.sid
join (select s.dcode,
max(avg_grade) max_avg_grade
from students s
join (select sid, avg(grade) as avg_grade
from studentgrades
group by sid) ag on ag.sid = s.sid
group by s.dcode) mag on mag.dcode = s.dcode and mag.max_avg_grade = ag.avg_grade
order by mag.avg_grade;
How this works
This builds up the result in several steps. First it calculates the average grade for each student:
select sid, avg(grade) as avg_grade
from studentgrades
group by sid
Based on the result of this statement, we can calculate the max. average grade:
select s.dcode,
max(avg_grade) max_avg_grade
from students s
join (select sid, avg(grade) as avg_grade
from studentgrades
group by sid) ag on ag.sid = s.sid
group by s.dcode
Now these two results are joined to the students table. For easier reading assume there is a view called average_grades (the first statement) and max_average_grades (the second one).
The final statement basically does this then:
select s.sid,
s.sfirstname,
s.dcode,
ag.avg_grade
from students s
join avg_grades ag on ag.sid = s.sid
join max_avg_grades mag
on mag.dcode = s.dcode
and mag.max_avg_grade = ag.avg_grade;
The real one (the very first in my answer) simply replaces the names avg_grades and max_avg_grades with the selects I have shown. That's why it looks so complicated.
A solution in standard SQL that is a bit more readable
In standard SQL, this could be expressed using a common table expression which makes it a bit more readable (but is essentially the same thing)
with avg_grades (sid, avg_grade) as (
select sid, avg(grade) as avg_grade
from studentgrades
group by sid
),
max_avg_grades (dcode, max_avg_grade) as (
select s.dcode, max(avg_grade) max_avg_grade
from students s
join avg_grades ag on ag.sid = s.sid
group by s.dcode
)
select s.sid,
s.sfirstname,
s.dcode,
ag.avg_grade
from students s
join avg_grades ag on ag.sid = s.sid
join max_avg_grades mag on mag.dcode = s.dcode and mag.max_avg_grade = ag.avg_grade;
But MySQL is one of the very few DBMS to not support this, so you will need to stick with the initial statement.
A standard SQL solution requiring less derived tables
In standard SQL it could be written even a bit shorter using windowing functions to calculate the rank inside a department (again this does not work in MySQL)
with avg_grades (sid, avg_grade) as (
select sid, avg(grade) as avg_grade
from studentgrades
group by sid
)
select sid,
sfirstname,
dcode,
avg_grade
from (
select s.sid,
s.sfirstname,
s.dcode,
ag.avg_grade,
rank() over (partition by s.dcode order by ag.avg_grade desc) as rnk
from students s
join avg_grades ag on ag.sid = s.sid
) t
where rnk = 1;
Update the query to use a HAVING clause as below:
SELECT dcode, averages.sfirstName, averages.average
FROM (
SELECT g.sid as studentID, s.sfirstname, s.dcode, AVG(grade) as average
FROM studentgrades g, student s
WHERE g.sid = s.sid
GROUP BY s.sid) averages
GROUP BY dcode
HAVING MAX(averages.average) = averages.average
there are many different solutions.
Maybe this one is simpler to understand:
/* create a new temporariy table of student performance. to keep the code clean and the performance better */
INSERT INTO studentperformance (studentID, sfirstname, dcode, average)
SELECT g.sid as studentID
, s.sfirstname
, s.dcode
, AVG(grade) as average
FROM studentgrades g, student s
WHERE g.sid = s.sid
GROUP BY s.sid;
/* best grades for each department */
INSERT INTO bestgrades (best_average_per_department)
SELECT (dcode + '|' + MAX(average)) as best_average_per_department /* important string. maybe one has to cast the max(average) to string for this to work */
FROM studentperformance
GROUP BY dcode; /* important groub by ! */
/* get all the students who are best in each department */
SELECT a.studentID
, a.sfirstname
, a.dcode
, a.average
FROM studentperformance as a
JOIN bestgrades as b on (a.dcode + '|' + a.average) = b.best_average_per_department;

Select one record from two tables in Oracle

There are three tables:
A table about students: s41071030(sno, sname, ssex, sage, sdept)
A table about course: c41071030(cno, cname, cpno, credit)
A table about selecting courses: sc41071030(sno, cno, grade)
Now, I want select the details about a student whose sdept='CS' and he or she has selected the most courses in department 'CS'.
As with any modestly complex SQL statement, it is best to do 'TDQD' — Test Driven Query Design. Start off with simple parts of the question and build them into a more complex answer.
To find out how many courses each student in the CS department is taking, we write:
SELECT S.Sno, COUNT(*) NumCourses
FROM s41071030 S
JOIN sc41071030 SC ON S.Sno = SC.Sno
WHERE S.Sdept = 'CS'
GROUP BY S.Sno;
We now need to find the largest value of NumCourses:
SELECT MAX(NumCourses) MaxCourses
FROM (SELECT S.Sno, COUNT(*) NumCourses
FROM s41071030 S
JOIN sc41071030 SC ON S.Sno = SC.Sno
WHERE S.Sdept = 'CS'
GROUP BY S.Sno
)
Now we need to join that result with the sub-query, so it is time for a CTE (Common Table Expression):
WITH N AS
(SELECT S.Sno, COUNT(*) NumCourses
FROM s41071030 S
JOIN sc41071030 SC ON S.Sno = SC.Sno
WHERE S.Sdept = 'CS'
GROUP BY S.Sno
)
SELECT N.Sno
FROM N
JOIN (SELECT MAX(NumCourses) MaxCourses FROM N) M
ON M.MaxCourses = N.NumCourses;
And we need to get the student details, so we join that with the student table:
WITH N AS
(SELECT S.Sno, COUNT(*) NumCourses
FROM s41071030 S
JOIN sc41071030 SC ON S.Sno = SC.Sno
WHERE S.Sdept = 'CS'
GROUP BY S.Sno
)
SELECT S.*
FROM s41071030 S
JOIN N ON N.Sno = S.Sno
JOIN (SELECT MAX(NumCourses) MaxCourses FROM N) M
ON M.MaxCourses = N.NumCourses;
Lightly tested SQL: you were warned. To test, run the component queries, making sure you get reasonable results each time. Don't move on to the next query until the previous one is working correctly.
Note that the courses table turns out to be immaterial to the query you are solving.
Also note that this may return several rows if it turns out there are several students all taking the same number of courses and that number is the largest number that any student is taking. (So, if there are 3 students taking 7 courses each, and no student taking more than 7 courses, then you will see 3 rows in the result set.)
Aggregate sc41071030 rows to get the counts.
Join the results to s41071030 to:
filter rows on sdept;
get student details;
RANK() the joined rows on the count values.
Select rows with the ranking of 1.
WITH
aggregated AS (
SELECT
sno,
COUNT(*) AS coursecount
FROM
sc41071030
GROUP BY
sno
),
ranked AS (
SELECT
s.*,
RANK() OVER (ORDER BY agg.coursecount DESC) AS rnk
FROM
s41071030 s
INNER JOIN aggregated agg ON s.sno = agg.sno
WHERE
s.sdept = 'CS'
)
SELECT
sno,
sname,
ssex,
sage,
sdept
FROM
ranked
WHERE
rnk = 1
;

Finding total participation in sql

This is a question I could not answer in oracle lab exam.
Given the schema:
(Courses: cid(int), deptid(int)...);
(Students: sid(int), sname (string), deptid(int)...);
(Participation: cid(int), sid(int), ...);
A student can attend courses outside his department.
Need to get the names of the students who take all the courses offered by his department.
How to do this in sqlplus?
SELECT s.sid, s.sname, s.deptid
FROM Students s
INNER JOIN Participation p
ON s.sid = p.sid
INNER JOIN Courses c
ON p.cid = c.cid
AND s.deptid = c.deptid
GROUP BY s.sid, s.sname, s.deptid
HAVING COUNT(DISTINCT c.cid) = (SELECT COUNT(*)
FROM Courses c2
WHERE c2.deptid = s.deptid)
I cannot tesst the query right now, so I don't know if I have a syntactic error, anyway, you can try this idea to achieve your requirements.
SELECT studentName
FROM
(SELECT stu.sname AS studentName,
cour.deptid AS dept,
COUNT(*) AS assistedCoursesByDpt,
Max(cour.total) AS total
FROM students stu,
participation part,
(SELECT cour.deptid,COUNT(*) AS total FROM courses cour GROUP BY cour.deptid
) AS cour
WHERE stu.sid=part.sid
AND part.cid =cour.cid
GROUP BY stu.sid,
cour.deptid
)
WHERE total=assistedCoursesByDpt
Th idea is to create a subquery (cour) that has a new calculated column, the total courses by debt. Then you can compare this total with the agrouped student courses by dept.