cant understand how this view in SQL works - sql

I am having a hard time understand how the create view, TRANSCRIPTVIEW, manages to set the grade of 0 for those who did not take a course. An explanation would help, the solution and question is below. Thanks.
Student(Id,Name)
Transcript(StudId,CourseName,Semester,Grade)
Formulate the following query in SQL:
Create a list of all students (Id, Name) and, for each student, list the average grade for the courses taken in the S2002 semester.
Note that there can be students who did not take any courses in S2002. For these, the average grade should be listed as 0.
Solution:
We first create a view which augments TRANSCRIPT with rows that enroll every student into a NULL course with the grade of 0. Therefore, students who did not take anything in semester ’S2002’ will have the average grade of 0 for that semester.
Below is what confuses me, how does this work and why does it work?
CREATE VIEW TRANSCRIPTVIEW AS (
( SELECT * FROM Transcipt)
UNION
(
SELECT S.Id,NULL,’S2002’,0
FROM Student S)
WHERE S.Id NOT IN (
SELECT T.StudId
FROM Transcript T
WHERE T.Semester = ’S2002’) )
)
Remaining solution:
SELECT S.Id, S.Name, AVG(T.Grade)
FROM Student S, TRANSCRIPTVIEW T
WHERE S.Id = T.StudId AND T.Semester = ’S2002’ GROUP BY S.Id

how the create view, TRANSCRIPTVIEW, manages to set the grade of 0 for those who did not take a course
The set of students who did not take a course in semester S2002 have no record in the transcript table for that semester. Those who did take a course in that semester do have a record in the table for that semester. The query supplies values NULL, 'S2002',0 for students if they are not in the Transcript table for semester S2002:
SELECT S.Id,NULL,’S2002’,0 FROM Student S) -- this parenthesis is wrong
-- this following where conditions looks for students NOT IN the 2002 subset:
WHERE S.Id NOT IN
-- this next part gets a list of studentids for semester 2002
(
SELECT T.StudId FROM Transcript T
WHERE T.Semester = ’S2002’
)

The solution in your qusetion is ridiculous. The better solution is:
SELECT S.Id, S.Name, AVG(case when T.Semester = ’S2002’ then T.Grade end) as AvgS2002Grade
FROM Student S left outer join
TRANSCRIPTVIEW T
on S.Id = T.StudId AND T.Semester = ’S2002’
GROUP BY S.Id
The query in your question is overly complicated. It is using a union (which should really be union all for performance reasons) to be sure all students are included. Gosh, this is what left outer join is for. It is doing filtering in the where clause, when a conditional aggregation is more suitable. It uses archaic join syntax, instead of the ANSI standard.
I hope you are not learning SQL with those shortcomings.

Related

'ALL' concept in SQL queries

Relational Schema:
Students (**sid**, name, age, major)
Courses (**cid**, name)
Enrollment (**sid**, **cid**, year, term, grade)
Write a SQL query that returns the name of the students who took all courses.I'm not sure how I capture the concept of 'ALL' in a SQL query.
EDIT:
I want to be able write it without aggregation as I want to use the same logic for writing the query in relational algebra as well.
Thanks for the help!
One way of writing such queries is to count the number of course and number of courses each student took, and compare them:
SELECT s.*
FROM students s
JOIN (SELECT sid, COUNT(DISTINCT cid) AS student_courses
FROM enrollment
GROUP BY sid) e ON s.sid = e.sid
JOIN (SELECT COUNT(*) AS cnt
FROM courses) c ON cnt = student_cursed
This gives course combinations that are possible but haven't been taken...
SELECT s.sid, c.cid FROM students CROSS JOIN courses
EXCEPT
SELECT sid, cid FROM enrollment
So, you can then do the same with the student list...
SELECT sid FROM students
EXCEPT
(
SELECT DISTINCT
sid
FROM
(
SELECT s.sid, c.cid FROM students CROSS JOIN courses
EXCEPT
SELECT sid, cid FROM enrollment
)
AS not_enrolled
)
AS slacker_students
I don't like it, but it avoids aggregation...
SELECT *
FROM Students
WHERE NOT EXISTS (
SELECT 1 FROM Courses
LEFT OUTER JOIN Enrollment ON Courses.cid = Enrollment.cid
AND Enrollment.sid = Students.sid
WHERE Enrollment.sid IS NULL
)
btw. names of tables should be in singular form, not plural

I don't know why this query means this sentence

select distinct S.ID, S.name
from student as S
where not exists (
(select course_id from course where dept_name = ’Biology’)
except
(select T.course_id from takes as T where S.ID = T.ID)
);
This query says in my book that it means
"Find all students who have taken all courses offered in the Biology
department"
The table is as follows:
student(ID, name, dept_name,tot_cred)
takes(ID, course_id, sec_id, semester, year, grade)
course(course_id, title, dept_name, credits)
I thought the query didn't work because student table didn't have a courseID.
Why is the execution of this query the same as described above?
I thought the query didn't work because student table didn't have a courseID.
Note that course_id is never referred to except from the table course (select course_id from course) or the table takes (select T.course_id from takes as T). The query doesn't ever refer to student.course_id so your question belies your misunderstanding of what the query is doing.
The query is a bit confusing because it effectively employs a double negative. The where not exists clause more or less says "find all students where, if we take the set of biology courses offered and remove all of the courses the student has taken, the result is the empty set."
-- Where the following set is empty...
where not exists (
-- All of the biology courses offered...
(select course_id from course where dept_name = ’Biology’)
-- EXCEPT those that the student has taken.
except
(select T.course_id from takes as T where S.ID = T.ID)
)
If we take all offered biology courses and remove the courses that the student has taken, and the result is an empty set, the only possible explanation is that the student has taken all of the biology courses offered. (Aside: It's also possible that there are no biology courses offered, in which case the student still has taken all of the biology courses offered -- this is called vacuous truth.)
The query is saying: "There does not exist a course in the biology department that the student did not take".
The second subquery (after the except) is getting all the courses that the student takes. These are removed from all courses in the biology department. So, if a student took all courses in that department, the result would be no rows. Otherwise, the result are the rows the student did not take.
The select distinct is highly misleading. The student table should not have duplicates.
I prefer using aggregation for these types of queries:
select t.student_id
from takes t join
course c
on t.course_id = c.course_id
where c.dept_name = 'Biology'
group by t.student_id
having count(distinct t.course_id) = (select count(*) from course c2 where c2.dept_name = 'Biology');
For me (at least), the logic here more clearly matches "students would took all courses in the Biology department".

confusion with empty relations test using 'not exitst'

I'm reading database systems concepts by Hank Korth.The book states the following gives list of all students who have taken all courses offered
in the Biology department.
select distinct S.ID, S.name
from student as S
where not exists (
(
select course id
from course
where dept name = 'Biology'
)
except
(
select T.course id
from takes as T
where S.ID = T.ID
)
);
student(ID, name, dept name, tot cred)
course(course id, title, dept name, credits)
takes(ID, course id, sec id, semester, year, grade)
However, to my understanding, the last subquery finds all students who are taking at least one course, and by doing minus from the first subquery, we are subtracting all biology courses that are currently taken by students, so we will be left with biology courses that are NOT taken by any students(if any).Then when we do 'not exists' with all student ID's, we are looking for student ID's who are not taking those found biology courses plus they can take any course other than biology.But that does not give list of students who are taking all courses offered in biology department.Can someone please explain?
Please Note: I do understand the use if 'not exist' from this
site but I'm not getting the book example.
I think what you are missing is that the NOT EXISTS subquery gets executed independently, for each row returned by the outer query.
Notice that its a correlated subquery. There's a reference to a column from the outer query S.ID.
A single value of S.ID is passed in for each execution of the subquery. So, if a student is taking all of the courses from the Biology department, the EXCEPT operation in the subquery is going to result in an empty set. And the NOT EXISTS will evaluate to TRUE. But if the subquery returns a row, then there's at least on Biology course which a student isn't taking.

Get distinct result

I was asked this trick question:
Table: Student
ID NAME
1 JOHN
2 MARY
3 ROBERT
4 DENNIS
Table: Grade
ID GRADE
1 A
1 A
1 F
2 B
3 A
How do you write SQL query to return DISTINCT name of all students who has never received grade 'F' OR who has never taken a course (meaning, their ID not present in Grade table)?
Trick part is, you're not allowed to use OUTER JOIN, UNION or DISTINCT. Also, why this is a big deal?
Expected result is MARY, ROBERT, DENNIS (3 rows).
SELECT name FROM Student
WHERE
NOT EXISTS (SELECT * FROM Grade WHERE Grade.id = Student.id AND grade = 'F')
OR
NOT EXISTS (SELECT * FROM Grade WHERE Grade.id = Student.id);
You may use GROUP BY in order to fake a distinct.
SELECT name FROM student
WHERE (SELECT COUNT(*) FROM grade WHERE grade = 'F'
AND id = student.id) = 0
at least this is the shortest answer so far ...
Something like this could work, if you're allowed to use subqueries.
SELECT `NAME`
FROM Student
WHERE 'F' NOT IN
(SELECT GRADE FROM Grade WHERE ID = Student.ID)
Hmm, my homework sense is tingling... Not that my questions have never related to homework though...
You could use the GROUP BY and aggregate functions in order to fake a distinct.
You want to exclude everyone who has both taken a course and received a grade of 'F'. Something like this might work:
SELECT NAME
FROM Student
WHERE 0 = (SELECT COUNT(*)
FROM Student
LEFT JOIN Grade
USING (ID)
WHERE GRADE='F')
GROUP BY NAME
SELECT name
FROM grade G, student S
WHERE (S.id = G.id AND 'F' NOT IN (SELECT G1.grade
FROM grade G1
WHERE G1.id = G.id))
OR
S.id NOT IN (SELECT id
FROM grade)
GROUP BY name
A reason why they might not want you to use UNION, JOIN or DISTINCT is that some of those queries might be slow if you try to force an "optimized" solution.
I'm not too familiar with query optimization techniques but usually if you use some of those aggregators and JOINs, you might slow down your query rather than just letting the query optimizations run through and organize your SQL based on your table structure and contents.

Getting single records back from joined tables that may produce multiple records

I've got a student table and an enrollment table; a student could have multiple enrollment records that can be active or inactive.
I want to get a select that has a single student record and an indicator as to whether that student has active enrollments.
I thought about doing this in an inline UDF that uses the student ID in a join to the enrollment table, but I wonder if there's a better way to do it in a single select statement.
The UDF call might look something like:
Select Student_Name,Student_Email,isEnrolled(Student_ID) from Student
What might the alternative - with one SQL statement - look like?
select Student_Name,
Student_Email,
(select count(*)
from Enrollment e
where e.student_id = s.student_id
) Number_Of_Enrollments
from Student e
will get the number of enrollments, which should help.
Why not join to a secondary select? Unlike other solutions this isn't firing a subquery for every row returned, but gathers the enrollment data for everyone all at once. The syntax may not be quite correct, but you should get the idea.
SELECT
s.student_name,
s.student_email,
IsNull( e.enrollment_count, 0 )
FROM
Students s
LEFT OUTER JOIN (
SELECT
student_id,
count(*) as enrollment_count
FROM
enrollments
WHERE
active = 1
GROUP BY
student_id
) e
ON s.student_id = e.student_id
The select from enrollments could also be redone as a function which returns a table for you to join on.
CREATE FUNCTION getAllEnrollmentsGroupedByStudent()
RETURNS #enrollments TABLE
(
student_id int,
enrollment_count int
) AS BEGIN
INSERT INTO
#enrollments
(
student_id,
enrollment_count
) SELECT
student_id,
count(*) as enrollment_count
FROM
enrollments
WHERE
active = 1
GROUP BY
student_id
RETURN
END
SELECT
s.student_name,
s.student_email,
e.enrollment_count
FROM
Students s
JOIN
dbo.getAllEnrollmentsGroupedByStudent() e
ON s.student_id = e.student_id
Edit:
Renze de Waal corrected my bad SQL!
Try someting like this:
SELECT Student_Name, Student_Email, CAST((SELECT TOP 1 1 FROM Enrollments e WHERE e.student_id=s.student_id) as bit) as enrolled FROM Student s
I think you can also use the exists statement in the select but not positive
try to avoid using udfs or subqueries, they are performance killers. banjolity seems to havea good solution otherwise because it uses a derivd table instead of a UDF or subselect.
select students.name,
decode(count(1), 0, "no enrollments", "has enrollments")
from students, enrollments
where
students.id = enrollments.sutdent_id and
enrollments.is_active = 1 group by students.name
Of course, replace the decode with a function your database uses (or, a case statement).