confusion with empty relations test using 'not exists' - sql

I'm reading Database System Concepts by Hank Korth. The book states that the following query lists all students who have taken all courses offered
in the Biology department.
select distinct S.ID, S.name
from student as S
where not exists (
(
select course_id
from course
where dept_name = 'Biology'
)
except
(
select T.course_id
from takes as T
where S.ID = T.ID
)
);
student(ID, name, dept_name, tot_cred)
course(course_id, title, dept_name, credits)
takes(ID, course_id, sec_id, semester, year, grade)
However, to my understanding, the last subquery finds all students who are taking at least one course, and by subtracting it from the first subquery we remove all Biology courses that are currently taken by students, so we are left with the Biology courses that are NOT taken by any student (if any). Then, when we apply 'not exists' across all student IDs, we are looking for students who are not taking those remaining Biology courses and who may be taking any course outside Biology. But that does not give the list of students who are taking all courses offered in the Biology department. Can someone please explain?
Please note: I do understand the use of 'not exists' from this
site, but I'm not getting the book example.

I think what you are missing is that the NOT EXISTS subquery gets executed independently, for each row returned by the outer query.
Notice that it's a correlated subquery: it references a column from the outer query, S.ID.
A single value of S.ID is passed in for each execution of the subquery. So, if a student is taking all of the courses from the Biology department, the EXCEPT operation in the subquery results in an empty set, and NOT EXISTS evaluates to TRUE. But if the subquery returns a row, then there's at least one Biology course which that student isn't taking, and NOT EXISTS evaluates to FALSE.
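To make the "executed once per outer row" idea concrete, here is a minimal sketch using Python's sqlite3 module. It runs the inner EXCEPT by hand for each student, with S.ID replaced by a parameter. The sample rows and names (Ann, Bob, BIO-101 and so on) are made up for illustration and are not from the book, and a real database engine is free to evaluate the query differently as long as the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE takes   (ID TEXT, course_id TEXT);
    -- Hypothetical sample data, not taken from the book.
    INSERT INTO student VALUES ('1', 'Ann'), ('2', 'Bob');
    INSERT INTO course  VALUES ('BIO-101', 'Biology'), ('BIO-301', 'Biology'),
                               ('CS-101', 'Comp. Sci.');
    INSERT INTO takes   VALUES ('1', 'BIO-101'), ('1', 'BIO-301'), ('1', 'CS-101'),
                               ('2', 'BIO-101');
""")

# The inner query with S.ID turned into a parameter: all Biology courses
# minus the courses taken by one specific student.
missing_sql = """
    SELECT course_id FROM course WHERE dept_name = 'Biology'
    EXCEPT
    SELECT T.course_id FROM takes AS T WHERE T.ID = ?
"""

students = conn.execute("SELECT ID, name FROM student ORDER BY ID").fetchall()
missing = {}
for sid, name in students:
    rows = conn.execute(missing_sql, (sid,)).fetchall()
    missing[name] = [r[0] for r in rows]
    # NOT EXISTS is true exactly when this per-student result is empty.
    print(name, "is missing", missing[name], "-> NOT EXISTS is", not rows)
```

Ann has nothing left over, so she passes the NOT EXISTS test; Bob still has one Biology course in his result, so he does not.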

Related

I don't know why this query means this sentence

select distinct S.ID, S.name
from student as S
where not exists (
(select course_id from course where dept_name = 'Biology')
except
(select T.course_id from takes as T where S.ID = T.ID)
);
My book says that this query means
"Find all students who have taken all courses offered in the Biology
department"
The table is as follows:
student(ID, name, dept_name,tot_cred)
takes(ID, course_id, sec_id, semester, year, grade)
course(course_id, title, dept_name, credits)
I thought the query didn't work because student table didn't have a courseID.
Why is the execution of this query the same as described above?
"I thought the query didn't work because student table didn't have a courseID."
Note that course_id is only ever referred to from the table course (select course_id from course) or the table takes (select T.course_id from takes as T). The query never refers to student.course_id, so the question rests on a misunderstanding of what the query is doing.
The query is a bit confusing because it effectively employs a double negative. The where not exists clause more or less says "find all students where, if we take the set of biology courses offered and remove all of the courses the student has taken, the result is the empty set."
-- Where the following set is empty...
where not exists (
-- All of the biology courses offered...
(select course_id from course where dept_name = 'Biology')
-- EXCEPT those that the student has taken.
except
(select T.course_id from takes as T where S.ID = T.ID)
)
If we take all offered biology courses and remove the courses that the student has taken, and the result is an empty set, the only possible explanation is that the student has taken all of the biology courses offered. (Aside: It's also possible that there are no biology courses offered, in which case the student still has taken all of the biology courses offered -- this is called vacuous truth.)
The query is saying: "There does not exist a course in the biology department that the student did not take".
The second subquery (after the except) gets all the courses that the student takes. These are removed from the set of all courses in the biology department. So, if a student took all courses in that department, the result would be no rows. Otherwise, the result is the set of biology courses the student did not take.
The select distinct is highly misleading. The student table should not have duplicates.
I prefer using aggregation for these types of queries:
select t.ID
from takes t join
course c
on t.course_id = c.course_id
where c.dept_name = 'Biology'
group by t.ID
having count(distinct t.course_id) = (select count(*) from course c2 where c2.dept_name = 'Biology');
For me (at least), the logic here more clearly matches "students who took all courses in the Biology department".
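One difference between the two formulations is worth checking: the vacuous-truth case. The sketch below (Python's sqlite3, with made-up sample rows and the student column written as takes.ID per the schema in the question) runs both queries on the same data. It assumes SQLite semantics; the parentheses around the EXCEPT operands are dropped because SQLite does not accept them there.

```python
import sqlite3

NOT_EXISTS_SQL = """
    SELECT S.ID FROM student AS S
    WHERE NOT EXISTS (
        SELECT course_id FROM course WHERE dept_name = 'Biology'
        EXCEPT
        SELECT T.course_id FROM takes AS T WHERE S.ID = T.ID
    )
    ORDER BY S.ID
"""

AGGREGATE_SQL = """
    SELECT t.ID
    FROM takes t JOIN course c ON t.course_id = c.course_id
    WHERE c.dept_name = 'Biology'
    GROUP BY t.ID
    HAVING COUNT(DISTINCT t.course_id) =
           (SELECT COUNT(*) FROM course c2 WHERE c2.dept_name = 'Biology')
    ORDER BY t.ID
"""

def run_both(courses, takes):
    """Load hypothetical data and return (not_exists_ids, aggregate_ids)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE course  (course_id TEXT PRIMARY KEY, dept_name TEXT);
        CREATE TABLE takes   (ID TEXT, course_id TEXT);
        INSERT INTO student VALUES ('1', 'Ann'), ('2', 'Bob');
    """)
    conn.executemany("INSERT INTO course VALUES (?, ?)", courses)
    conn.executemany("INSERT INTO takes VALUES (?, ?)", takes)
    ids = lambda sql: [row[0] for row in conn.execute(sql)]
    return ids(NOT_EXISTS_SQL), ids(AGGREGATE_SQL)

# Two Biology courses; Ann took both, Bob only one.
normal = run_both(
    [("BIO-101", "Biology"), ("BIO-301", "Biology"), ("CS-101", "Comp. Sci.")],
    [("1", "BIO-101"), ("1", "BIO-301"), ("2", "BIO-101")],
)
# No Biology courses at all: the vacuous-truth case.
vacuous = run_both([("CS-101", "Comp. Sci.")], [("1", "CS-101")])
print(normal, vacuous)
```

With Biology courses present the two queries agree. With none, the NOT EXISTS version returns every student, while the aggregation version returns no one, because there are no Biology rows to group.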

Querying from two tables in Oracle Database

I have two tables where I have students details and other table consists of the details of TAs. Tables are as follows:
Students(B#, first_name, last_name, status, GPA, email, bdate, dept)
TAs(B#, ta_level, office)
Now, for each TA from the CS department, I need to find his/her B#, first name, last name, and birth date. I have tried the following query:
select Students.B#, Students.FIRST_NAME, Students.LAST_NAME, Students.BDATE
from Students INNER JOIN TAs ON Students.B# = TAs.B#;
but I need only those TAs who are studying Computer Science. I am using Oracle DB. How do I add another condition after the inner join?
For each TA from the CS department
Is there a table or a column that specifies whether a student is studying Computer Science? From your question, it seems the dept column tells you that.
You can do the below:
select Students.B#, Students.FIRST_NAME, Students.LAST_NAME, Students.BDATE
from Students INNER JOIN TAs ON Students.B# = TAs.B#
where Students.dept='CS' -- or computer science depending on the value.
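As a quick check of the join-plus-filter idea, here is a small sketch using Python's sqlite3 rather than Oracle. The rows are invented, the value 'CS' for dept is an assumption (as the comment in the query above already notes), and "B#" is written as a quoted identifier so that the # character is safe in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; "B#" is quoted so the # character is safe in SQLite.
    CREATE TABLE Students ("B#" TEXT PRIMARY KEY, first_name TEXT, last_name TEXT,
                           bdate TEXT, dept TEXT);
    CREATE TABLE TAs ("B#" TEXT PRIMARY KEY, ta_level TEXT, office TEXT);
    INSERT INTO Students VALUES
        ('B001', 'Ann', 'Lee',  '1995-01-02', 'CS'),
        ('B002', 'Bob', 'Ray',  '1994-03-04', 'Math'),
        ('B003', 'Cat', 'Diaz', '1996-05-06', 'CS');
    INSERT INTO TAs VALUES ('B001', 'MS', 'R101'), ('B002', 'PhD', 'R102');
""")

# The join keeps only students who are TAs; the WHERE keeps only the CS ones.
cs_tas = conn.execute("""
    SELECT s."B#", s.first_name, s.last_name, s.bdate
    FROM Students s
    INNER JOIN TAs t ON s."B#" = t."B#"
    WHERE s.dept = 'CS'
""").fetchall()
print(cs_tas)
```

Bob is a TA but not in CS, and Cat is in CS but not a TA, so only Ann comes back.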

Get courses chosen by all students

There are two tables, student(sid, name) and course(cid, name), in a many-to-many relationship. A third table, student_course(sid, cid), stores which courses have been chosen by whom.
How do I write the SQL that gets the courses that have been chosen by all the students?
Standard solution: use the NOT EXISTS ( ... NOT EXISTS (...)) construct:
find all courses in which all students take part
==>> there must not exist a student that does not take part in this course
SELECT * FROM course c
WHERE NOT EXISTS (
SELECT * from student s
WHERE NOT EXISTS (
SELECT * from student_course cs
WHERE cs.sid = s.sid
AND cs.cid = c.cid
)
)
;
This query is often faster (given appropriate indexes) than the count() = count() variant. The reason for this: you do not have to count all the (distinct) records; once you have found one student that does not take this course, you can omit this course from your list of suspects. Also: ANTI-joins can often make use of indexes [so can a count(), but that still has to count all the (distinct) key values in the index]
Select c.cid, c.name
From course c where
(select count(1) from student) = (select count(1) from student_course sc where sc.cid = c.cid);
See SQL Fiddle
It finds all courses where the count of entries for that course in the student_course table matches the number of students.
CID NAME
1 Test Course1
4 Test Course4
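For comparison, here is a sketch that runs both answers side by side with Python's sqlite3. The data is made up (it is not the SQL Fiddle data), and the primary key on (sid, cid) is an assumption: the count variant is only correct if student_course has no duplicate pairs and no rows for students missing from student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (cid INTEGER PRIMARY KEY, name TEXT);
    -- Assumed key: one row per (student, course) pair.
    CREATE TABLE student_course (sid INTEGER, cid INTEGER, PRIMARY KEY (sid, cid));
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course  VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
    INSERT INTO student_course VALUES (1, 1), (2, 1), (1, 2), (2, 3);
""")

# "There is no student who has not chosen this course."
double_not_exists = conn.execute("""
    SELECT c.cid, c.name FROM course c
    WHERE NOT EXISTS (
        SELECT * FROM student s
        WHERE NOT EXISTS (
            SELECT * FROM student_course cs
            WHERE cs.sid = s.sid AND cs.cid = c.cid
        )
    )
    ORDER BY c.cid
""").fetchall()

# "The course has as many enrollment rows as there are students."
count_variant = conn.execute("""
    SELECT c.cid, c.name FROM course c
    WHERE (SELECT COUNT(1) FROM student) =
          (SELECT COUNT(1) FROM student_course sc WHERE sc.cid = c.cid)
    ORDER BY c.cid
""").fetchall()

print(double_not_exists, count_variant)
```

Only Algebra was chosen by both students, and under the stated assumptions both queries return it.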

can't understand how this view in SQL works

I am having a hard time understanding how the create view, TRANSCRIPTVIEW, manages to set the grade of 0 for those who did not take a course. An explanation would help; the question and solution are below. Thanks.
Student(Id,Name)
Transcript(StudId,CourseName,Semester,Grade)
Formulate the following query in SQL:
Create a list of all students (Id, Name) and, for each student, list the average grade for the courses taken in the S2002 semester.
Note that there can be students who did not take any courses in S2002. For these, the average grade should be listed as 0.
Solution:
We first create a view which augments TRANSCRIPT with rows that enroll every student into a NULL course with the grade of 0. Therefore, students who did not take anything in semester 'S2002' will have the average grade of 0 for that semester.
Below is what confuses me, how does this work and why does it work?
CREATE VIEW TRANSCRIPTVIEW AS (
( SELECT * FROM Transcript)
UNION
(
SELECT S.Id,NULL,'S2002',0
FROM Student S)
WHERE S.Id NOT IN (
SELECT T.StudId
FROM Transcript T
WHERE T.Semester = 'S2002') )
)
Remaining solution:
SELECT S.Id, S.Name, AVG(T.Grade)
FROM Student S, TRANSCRIPTVIEW T
WHERE S.Id = T.StudId AND T.Semester = 'S2002'
GROUP BY S.Id, S.Name
how the create view, TRANSCRIPTVIEW, manages to set the grade of 0 for those who did not take a course
The set of students who did not take a course in semester S2002 have no record in the transcript table for that semester. Those who did take a course in that semester do have a record in the table for that semester. The query supplies values NULL, 'S2002',0 for students if they are not in the Transcript table for semester S2002:
SELECT S.Id,NULL,'S2002',0 FROM Student S) -- this parenthesis is wrong
-- the following where condition looks for students NOT IN the S2002 subset:
WHERE S.Id NOT IN
-- this next part gets the list of student ids for semester S2002
(
SELECT T.StudId FROM Transcript T
WHERE T.Semester = 'S2002'
)
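To see the view doing its job, here is a sketch in Python's sqlite3 with the misplaced parenthesis removed and some invented rows. Ann has two S2002 grades; Bob only has a grade from another semester, so the second branch of the UNION adds a padding row for him. The table contents and the REAL type for Grade are assumptions for the sake of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Transcript (StudId INTEGER, CourseName TEXT,
                             Semester TEXT, Grade REAL);
    -- Hypothetical data: Bob has no S2002 rows.
    INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Transcript VALUES (1, 'DB', 'S2002', 4), (1, 'OS', 'S2002', 3),
                                  (2, 'DB', 'F2001', 2);

    CREATE VIEW TranscriptView AS
        SELECT * FROM Transcript
        UNION
        SELECT S.Id, NULL, 'S2002', 0
        FROM Student S
        WHERE S.Id NOT IN (SELECT T.StudId FROM Transcript T
                           WHERE T.Semester = 'S2002');
""")

# Rows contributed by the second branch of the UNION (the NULL course).
padding_rows = conn.execute(
    "SELECT StudId, Grade FROM TranscriptView WHERE CourseName IS NULL"
).fetchall()

# The remaining part of the book's solution, run against the view.
averages = conn.execute("""
    SELECT S.Id, S.Name, AVG(T.Grade)
    FROM Student S, TranscriptView T
    WHERE S.Id = T.StudId AND T.Semester = 'S2002'
    GROUP BY S.Id, S.Name
    ORDER BY S.Id
""").fetchall()
print(padding_rows, averages)
```

Only Bob gets a padding row, and that single 0 is what his S2002 average is computed from.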
The solution in your question is ridiculous. The better solution queries Transcript directly and needs no view:
SELECT S.Id, S.Name, COALESCE(AVG(case when T.Semester = 'S2002' then T.Grade end), 0) as AvgS2002Grade
FROM Student S left outer join
Transcript T
on S.Id = T.StudId AND T.Semester = 'S2002'
GROUP BY S.Id, S.Name
The query in your question is overly complicated. It is using a union (which should really be union all for performance reasons) to be sure all students are included. Gosh, this is what left outer join is for. It is doing filtering in the where clause, when a conditional aggregation is more suitable. It uses archaic join syntax, instead of the ANSI standard.
I hope you are not learning SQL with those shortcomings.
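One detail to watch with the left outer join approach when it is run against the base Transcript table: AVG over a student with no matching rows is NULL, not 0, so the "list as 0" requirement needs a COALESCE. A small sketch with Python's sqlite3 and invented rows shows both values side by side (the RawAvg column is only there for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Transcript (StudId INTEGER, CourseName TEXT,
                             Semester TEXT, Grade REAL);
    -- Hypothetical data: Bob has no S2002 rows.
    INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Transcript VALUES (1, 'DB', 'S2002', 4), (1, 'OS', 'S2002', 3),
                                  (2, 'DB', 'F2001', 2);
""")

rows = conn.execute("""
    SELECT S.Id, S.Name,
           AVG(T.Grade)              AS RawAvg,        -- NULL when nothing matched
           COALESCE(AVG(T.Grade), 0) AS AvgS2002Grade  -- 0 as the task requires
    FROM Student S
    LEFT OUTER JOIN Transcript T
         ON S.Id = T.StudId AND T.Semester = 'S2002'
    GROUP BY S.Id, S.Name
    ORDER BY S.Id
""").fetchall()
print(rows)
```

The left join keeps Bob in the result even though he has no S2002 grades; COALESCE then turns his NULL average into 0.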

Getting single records back from joined tables that may produce multiple records

I've got a student table and an enrollment table; a student could have multiple enrollment records that can be active or inactive.
I want to get a select that has a single student record and an indicator as to whether that student has active enrollments.
I thought about doing this in an inline UDF that uses the student ID in a join to the enrollment table, but I wonder if there's a better way to do it in a single select statement.
The UDF call might look something like:
Select Student_Name,Student_Email,isEnrolled(Student_ID) from Student
What might the alternative - with one SQL statement - look like?
select Student_Name,
Student_Email,
(select count(*)
from Enrollment e
where e.student_id = s.student_id
) Number_Of_Enrollments
from Student s
will get the number of enrollments, which should help.
Why not join to a secondary select? Unlike other solutions this isn't firing a subquery for every row returned, but gathers the enrollment data for everyone all at once. The syntax may not be quite correct, but you should get the idea.
SELECT
s.student_name,
s.student_email,
IsNull( e.enrollment_count, 0 )
FROM
Students s
LEFT OUTER JOIN (
SELECT
student_id,
count(*) as enrollment_count
FROM
enrollments
WHERE
active = 1
GROUP BY
student_id
) e
ON s.student_id = e.student_id
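Here is a runnable sketch of the same derived-table join using Python's sqlite3. The rows are invented, and COALESCE stands in for IsNull because SQLite has no IsNull function with that signature. An extra CASE column (my addition, not part of the answer) turns the count into the yes/no indicator the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY,
                           student_name TEXT, student_email TEXT);
    CREATE TABLE enrollments (student_id INTEGER, active INTEGER);
    -- Hypothetical data: Ann has 2 active rows, Bob only an inactive one,
    -- Cat has no enrollment rows at all.
    INSERT INTO students VALUES (1, 'Ann', 'ann@example.com'),
                                (2, 'Bob', 'bob@example.com'),
                                (3, 'Cat', 'cat@example.com');
    INSERT INTO enrollments VALUES (1, 1), (1, 1), (1, 0), (2, 0);
""")

rows = conn.execute("""
    SELECT s.student_name,
           COALESCE(e.enrollment_count, 0) AS active_count,
           CASE WHEN e.enrollment_count IS NULL THEN 0 ELSE 1 END AS has_active
    FROM students s
    LEFT OUTER JOIN (
        -- One row per student that has at least one active enrollment.
        SELECT student_id, COUNT(*) AS enrollment_count
        FROM enrollments
        WHERE active = 1
        GROUP BY student_id
    ) e ON s.student_id = e.student_id
    ORDER BY s.student_id
""").fetchall()
print(rows)
```

Every student appears exactly once, whether they have several active enrollments, only inactive ones, or none.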
The select from enrollments could also be redone as a function which returns a table for you to join on.
CREATE FUNCTION getAllEnrollmentsGroupedByStudent()
RETURNS @enrollments TABLE
(
student_id int,
enrollment_count int
) AS BEGIN
INSERT INTO
@enrollments
(
student_id,
enrollment_count
) SELECT
student_id,
count(*) as enrollment_count
FROM
enrollments
WHERE
active = 1
GROUP BY
student_id
RETURN
END
SELECT
s.student_name,
s.student_email,
e.enrollment_count
FROM
Students s
JOIN
dbo.getAllEnrollmentsGroupedByStudent() e
ON s.student_id = e.student_id
Edit:
Renze de Waal corrected my bad SQL!
Try something like this:
SELECT Student_Name, Student_Email, CAST((SELECT TOP 1 1 FROM Enrollments e WHERE e.student_id=s.student_id) as bit) as enrolled FROM Student s
I think you can also use EXISTS in the select list, but I'm not positive.
Try to avoid using UDFs or subqueries; they are performance killers. banjolity seems to have a good solution because it uses a derived table instead of a UDF or subselect.
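For what it's worth, EXISTS does work inside a select list when wrapped in a CASE expression, at least in SQLite, which is what this sketch uses via Python's sqlite3. The table and column names and the rows are hypothetical, and the active = 1 filter is an assumption about how "active" is stored. Unlike the TOP 1 cast, which yields NULL for a student with no rows, this gives an explicit 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE enrollments (student_id INTEGER, active INTEGER);
    -- Hypothetical data: only Ann has an active enrollment.
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO enrollments VALUES (1, 1), (1, 1), (1, 0), (2, 0);
""")

rows = conn.execute("""
    SELECT s.student_name,
           CASE WHEN EXISTS (SELECT 1 FROM enrollments e
                             WHERE e.student_id = s.student_id
                               AND e.active = 1)
                THEN 1 ELSE 0 END AS enrolled
    FROM students s
    ORDER BY s.student_id
""").fetchall()
print(rows)
```

EXISTS can stop at the first matching enrollment row, so nothing has to be counted.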
select students.name,
decode(count(enrollments.student_id), 0, 'no enrollments', 'has enrollments')
from students left outer join enrollments
on students.id = enrollments.student_id and
enrollments.is_active = 1
group by students.name
Of course, replace the decode with a function your database uses (or, a case statement).