I am having a few issues with a DB query.
I have two tables, students (Fields: FirstName, LastName, StdSSN), and teachers (TFirstName, TLastName, TSSN) which I've stripped down for this example. I need to perform a query that will return all the students except for the students that are teachers themselves.
I have the query
SELECT student.FirstName, student.LastName
FROM `student`,`teachers`
WHERE student.StdSSN=teachers.TSSN
Which gives me a list of all the teachers who are also students. It does not provide me with a list of students who are not teachers, so I tried changing it to:
SELECT student.FirstName, student.LastName
FROM `student`,`teachers`
WHERE student.StdSSN!=teachers.TSSN
Which gives me a list of all the students with many duplicate values, so I am a little stuck here. How can I change things to return a list of all students who are not teachers? I was thinking of an INNER/OUTER/SELF JOIN and played with that for a few hours, but things became complicated and I did not accomplish anything, so I've pretty much given up.
Can anyone give me any advice? I did see the query before and it was pretty simple, but I've failed somewhere.
Using NOT IN
SELECT s.*
FROM STUDENTS s
WHERE s.stdssn NOT IN (SELECT t.tssn
FROM TEACHERS t)
Using NOT EXISTS
SELECT s.*
FROM STUDENTS s
WHERE NOT EXISTS (SELECT NULL
FROM TEACHERS t
WHERE t.tssn = s.stdssn)
Using LEFT JOIN / IS NULL
SELECT s.*
FROM STUDENTS s
LEFT JOIN TEACHERS t ON t.tssn = s.stdssn
WHERE t.column IS NULL
I used "column" as a placeholder for any column in TEACHERS other than the one joined on, ideally one that cannot itself be NULL.
Comparison:
If the column(s) compared are nullable (value can be NULL), NOT EXISTS is the best choice. Otherwise, LEFT JOIN/IS NULL is the best choice (for MySQL).
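The nullable-column caveat can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, and the sample rows are made up for illustration; the table and column names follow the question. One teacher has a NULL SSN, which is enough to make NOT IN return nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (FirstName TEXT, LastName TEXT, StdSSN TEXT);
    CREATE TABLE teachers (TFirstName TEXT, TLastName TEXT, TSSN TEXT);
    -- Hypothetical sample data: Bob is both a student and a teacher.
    INSERT INTO students VALUES ('Ann', 'Lee', '111'), ('Bob', 'Ray', '222'), ('Cy', 'Poe', '333');
    -- The second teacher has no SSN on record (NULL).
    INSERT INTO teachers VALUES ('Bob', 'Ray', '222'), ('Dee', 'Fox', NULL);
""")

# NOT IN: comparing against a list that contains NULL never yields TRUE.
not_in = conn.execute("""
    SELECT s.FirstName FROM students s
    WHERE s.StdSSN NOT IN (SELECT t.TSSN FROM teachers t)
    ORDER BY s.FirstName
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so it is harmless.
not_exists = conn.execute("""
    SELECT s.FirstName FROM students s
    WHERE NOT EXISTS (SELECT NULL FROM teachers t WHERE t.TSSN = s.StdSSN)
    ORDER BY s.FirstName
""").fetchall()

# LEFT JOIN / IS NULL: test a teacher column that is never NULL in real rows.
left_join = conn.execute("""
    SELECT s.FirstName FROM students s
    LEFT JOIN teachers t ON t.TSSN = s.StdSSN
    WHERE t.TFirstName IS NULL
    ORDER BY s.FirstName
""").fetchall()

print(not_in)      # empty because of the NULL TSSN
print(not_exists)  # Ann and Cy
print(left_join)   # Ann and Cy
```

With no NULL in teachers.TSSN, all three queries return the same rows; which one runs fastest depends on the database engine.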
Related
In class, my professor said that the SQL language does not provide a 'for all' operator.
In order to express 'for all' you have to use 'not exists (X except Y)'.
At this point, I can't figure out why 'for all' has the same meaning as 'not exists (X except Y)'.
I give you example relation:
course (cID,title,deptName,credit),
teaches (pID,cID,semester,year,classroom),
student (sID,name,gender,deptName)
Q: Find all student names who have taken all courses offered in 'CS' department
The answer is:
Select distinct
S.sid, S.name
from
student as S
where
not exists (
(select cID from course where deptName = 'CS')
except
(select T.cID from takes as T where S.sID = T.sID)
);
Can you give me a specific explanation of that?
ps. Sorry for my English skill
Your professor is right. SQL has no direct way to query all records that have all possible relations of a certain type.
It's easy to query which relations of a certain type a record has. Just INNER JOIN the two tables and you are done.
But in an M:N relationship like "students" to "taken courses" it's not that simple.
To answer the question "which student has taken all possible courses" you must find out which relations could possibly exist and then make sure that all of them do actually exist.
select distinct
S.sid, S.name
from
student as S
where
not exists (
(select cID from course where deptName = 'CS')
except
(select T.cID from takes as T where S.sID = T.sID)
);
can be translated as
give me all students                  SELECT
for whom it is true                   WHERE
that the following set is empty       NOT EXISTS
(any course in 'CS')                  "all relations that can possibly exist"
minus                                 EXCEPT
(all courses the student has taken)   "the ones that do actually exist"
In other words: Of all possible relations there is no relation that does not exist.
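The translation above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The rows are invented for illustration, and the takes table is assumed to have just sID and cID, since the question does not list its columns. SQLite does not accept parentheses around the two halves of an EXCEPT, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (cID INTEGER, title TEXT, deptName TEXT, credit INTEGER);
    CREATE TABLE student (sID INTEGER, name TEXT, gender TEXT, deptName TEXT);
    CREATE TABLE takes (sID INTEGER, cID INTEGER);  -- assumed shape
    INSERT INTO course VALUES (1, 'Algorithms', 'CS', 3), (2, 'Databases', 'CS', 3),
                              (3, 'Calculus', 'Math', 4);
    INSERT INTO student VALUES (10, 'Ann', 'F', 'CS'), (20, 'Bob', 'M', 'Math'),
                               (30, 'Cy', 'M', 'CS');
    -- Ann took both CS courses, Bob only one of them, Cy none at all.
    INSERT INTO takes VALUES (10, 1), (10, 2), (10, 3), (20, 1), (20, 3);
""")

# Keep a student only if "CS courses minus courses taken" is empty.
rows = conn.execute("""
    SELECT S.sID, S.name
    FROM student AS S
    WHERE NOT EXISTS (
        SELECT cID FROM course WHERE deptName = 'CS'
        EXCEPT
        SELECT T.cID FROM takes AS T WHERE T.sID = S.sID
    )
    ORDER BY S.sID
""").fetchall()

# The inner set on its own: the CS courses a given student is still missing.
missing_for_bob = conn.execute("""
    SELECT cID FROM course WHERE deptName = 'CS'
    EXCEPT
    SELECT cID FROM takes WHERE sID = 20
""").fetchall()

print(rows)             # only Ann qualifies
print(missing_for_bob)  # Bob is missing course 2
```

Running the inner EXCEPT for one student, as in missing_for_bob, is a handy way to see why that student is or is not in the result.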
There are other ways of expressing the same thought that can be used in database systems without support for EXCEPT.
For example
select
S.sid,
S.name
from
student as S
inner join takes as T on T.sID = S.sID
inner join course as C on C.cID = T.cID
where
C.deptName = 'CS'
group by
S.sid,
S.name
having
count(*) = (select count(*) from course where deptName = 'CS');
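One caveat about the counting variant, sketched with Python's built-in sqlite3 module: if takes can hold the same course more than once for a student (assumed here via a hypothetical semester column for retakes), count(*) counts the repeats and can let a student through who is still missing a course. Counting distinct course IDs avoids that. The rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (cID INTEGER, deptName TEXT);
    CREATE TABLE student (sID INTEGER, name TEXT);
    CREATE TABLE takes (sID INTEGER, cID INTEGER, semester TEXT);  -- semester is an assumption
    INSERT INTO course VALUES (1, 'CS'), (2, 'CS');
    INSERT INTO student VALUES (10, 'Ann'), (20, 'Bob');
    -- Ann took both CS courses; Bob took course 1 twice and never course 2.
    INSERT INTO takes VALUES (10, 1, 'fall'), (10, 2, 'spring'),
                             (20, 1, 'fall'), (20, 1, 'spring');
""")

QUERY = """
    SELECT S.name
    FROM student AS S
    INNER JOIN takes AS T ON T.sID = S.sID
    INNER JOIN course AS C ON C.cID = T.cID
    WHERE C.deptName = 'CS'
    GROUP BY S.sID, S.name
    HAVING {counter} = (SELECT COUNT(*) FROM course WHERE deptName = 'CS')
    ORDER BY S.name
"""

# COUNT(*) counts Bob's retake as a second course.
with_star = conn.execute(QUERY.format(counter="COUNT(*)")).fetchall()
# COUNT(DISTINCT ...) counts each course once per student.
with_distinct = conn.execute(QUERY.format(counter="COUNT(DISTINCT C.cID)")).fetchall()

print(with_star)      # Ann and Bob
print(with_distinct)  # Ann only
```

If takes has a uniqueness constraint on (sID, cID), the two counters agree and count(*) is fine.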
From your table definition and requirement it's not clear what the teaches table is used for. You want the list of names of students who have taken all courses offered by the 'CS' department. For this, the student and course tables are enough.
SELECT name
FROM
(
SELECT B.name, A.cid
FROM course A
INNER JOIN student B ON A.deptName = B.deptName
WHERE A.deptName = 'CS'
GROUP BY A.cid, B.name
) A
GROUP BY name
HAVING COUNT(name) >= (SELECT COUNT(cid) FROM course WHERE deptName = 'CS')
The inner query just selects all students who have taken any course offered by the 'CS' dept, and with the group by I make sure that if a student takes the same course twice it is counted as one row. Next I select only those students who took all courses offered by the 'CS' dept.
I think there is a gap in how you understand your requirement. In your requirement no relation with the teaches table is specified.
Q: Find all student names who have taken all courses offered in 'CS' department
NOT EXISTS returns true if the query passed to it contains 0 records.
In this case, the sub-query inside NOT EXISTS selects all the courses offered in 'CS' and subtracts from this result set all the courses taken by the specific student.
If the student has taken all the courses, then EXCEPT will remove all of them and the sub-query will return 0 records, which together with NOT EXISTS gives true for that student, so the student is displayed in the final result set.
Brief history: Codd invented the Relational Model (RM), some people created a DBMS loosely based on RM to prove an RM product could be performant, and the SQL language emerged based on that DBMS (i.e. not directly based on the RM).
Codd came up with a set of primitive operators to define a database as being relationally complete. His algebra included product, where two relations are 'multiplied' together to give a combined relation; this made it into SQL as CROSS JOIN. [Side note: people refer to this operator as 'Cartesian product', which results in a set of ordered pairs. However, product in RM results in a relation (as do all relational operators), and CROSS JOIN results in a table expression (loosely speaking).]
Codd's algebra also included a division operator. I guess the thinking is that we should be able to take the result of product and one of the relations and use an operator to get back the other relation. But it does have some practical use too, of course. It is commonly expressed as 'the supplier who supplies all products', after Chris Date's parts and suppliers database found in his books. SQL lacks an explicit division operator, so we need to use other operators to get the desired result.
Note there are two flavours of division, being exact division ("suppliers who supply all the parts we are interested in and no more") and division with remainder ("suppliers who supply at least all the parts we are interested in and possibly more"). I tend to be wary of the answers here that do not mention either the name 'division' or that you need to decide whether you need to deal with remainders.
The thinking behind your professor's answer is that of a double negative (in mathematics and English): if the statement "there is no part I don't supply" is true for a given supplier, then that supplier will be in the result.
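The double negative can also be written literally as two nested NOT EXISTS clauses. Here is a minimal sketch using Python's built-in sqlite3 module; the supplier, part and supplies tables and their rows are hypothetical stand-ins for a parts-and-suppliers database, not Date's actual schema. It implements division with remainder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified parts-and-suppliers tables.
    CREATE TABLE supplier (sno TEXT, sname TEXT);
    CREATE TABLE part (pno TEXT);
    CREATE TABLE supplies (sno TEXT, pno TEXT);
    INSERT INTO supplier VALUES ('S1', 'Smith'), ('S2', 'Jones');
    INSERT INTO part VALUES ('P1'), ('P2');
    -- Smith supplies every part; Jones supplies only P1.
    INSERT INTO supplies VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1');
""")

# "There is no part that this supplier does not supply."
rows = conn.execute("""
    SELECT s.sname
    FROM supplier s
    WHERE NOT EXISTS (
        SELECT 1 FROM part p
        WHERE NOT EXISTS (
            SELECT 1 FROM supplies sp
            WHERE sp.sno = s.sno AND sp.pno = p.pno
        )
    )
    ORDER BY s.sname
""").fetchall()

print(rows)  # only Smith
```

The outer NOT EXISTS reads "there is no part", the inner one "that I don't supply".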
Note there are operators that Codd omitted (e.g. rename and summarize) that can now be found in SQL, so it's a shame we are still waiting for division!
I have an ER Diagram as shown below
For every student I want to show all the courses that he attends.
So I use the query
select studentId,course.courseCode
from student natural left outer join attends
natural left outer join course
which gives me all the results correctly.
Now I want to show the total number of courses that each student attends,
and I am using this query
select studentId,
(select count(attends.courseCode)
from attends natural left outer join student
)as 'amount'
from student
but I am getting this result
How am I supposed to show the real number of courses for every student, whether he is in Attends or not? That is, a 0 for studentId 6, 7, 8 and a 2 for studentId 17, etc.
Thank you in advance
PS1: If you want more of my tables, please let me know.
PS2: I was not sure about the title. If you find that another title fits better, please suggest
First, don't use natural join. It is entirely dependent on the data structure -- and if that changes, then the semantics of the query change too. In other words, you cannot read a query and really understand what it is doing.
Then, for this query, first generate a list of all students and courses using cross join, then bring in the attendance information:
select s.studentId, c.courseCode, count(a.CourseCode)
from student s cross join
course c left join
attends a
on s.studentId = a.studentId and a.courseCode = c.courseCode
group by s.studentId, c.courseCode;
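If only the total per student is wanted (a 0 for students with no rows in attends), a plain LEFT JOIN grouped by student may be enough; this is a different shape from the cross join query above. The sketch below uses Python's built-in sqlite3 module with made-up rows, and it assumes attends has studentId and courseCode columns. Counting a column from attends, rather than count(*), is what produces the zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (studentId INTEGER);
    CREATE TABLE attends (studentId INTEGER, courseCode TEXT);  -- assumed columns
    INSERT INTO student VALUES (6), (17);
    -- Student 17 attends two courses, student 6 attends none.
    INSERT INTO attends VALUES (17, 'DB'), (17, 'OS');
""")

# COUNT(a.courseCode) ignores the NULLs produced by the outer join.
totals = conn.execute("""
    SELECT s.studentId, COUNT(a.courseCode) AS amount
    FROM student s
    LEFT JOIN attends a ON a.studentId = s.studentId
    GROUP BY s.studentId
    ORDER BY s.studentId
""").fetchall()

# COUNT(*) counts the NULL-padded row too, so nobody ever gets 0.
totals_star = conn.execute("""
    SELECT s.studentId, COUNT(*) AS amount
    FROM student s
    LEFT JOIN attends a ON a.studentId = s.studentId
    GROUP BY s.studentId
    ORDER BY s.studentId
""").fetchall()

print(totals)       # student 6 has 0, student 17 has 2
print(totals_star)  # student 6 wrongly shows 1
```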
I have three tables - Assignment, Grades, Student and I am trying to make this query work so that it returns all assignments even if there is no grade entered for a student yet.
The tables are set up like this
Assignment
AssignmentId, AssignmentName, PointsPossible
Grades (Junction table)
StudentId, AssignmentId, PointsReceived
Student
StudentId, StudentName
My query:
select
s.StudentName, a.AssignmentName, g.PointsReceived, a.PointsPossible
from
student s
cross join
assignment a
left outer join
grades g on s.StudentId = g.StudentId and g.AssignmentId = a.AssignmentId
order by
s.StudentName;
When I run the query I get all the names I need, but I don't get all the assignments back. I should be getting all the names, all the assignments, and if the assignment hasn't been graded yet, there should be a null value returned.
I need a little direction; maybe my tables are set up incorrectly.
You need to get all assignments even if there isn't a grade? The obvious question is: without a junction table, how do you know which assignments to provide for each student?
So, let me guess that you want to get a cross product of all students and all assignments, along with grades (if any). If so, you want to structure your query like this:
select s.StudentName, a.AssignmentName, a.PointsPossible, g.PointsReceived
from student s cross join
assignment a left outer join
grades g
on g.StudentId = s.StudentId and g.AssignmentId = a.AssignmentId
order by s.StudentName;
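A quick way to see the expected shape of the result is to run the cross join plus left join against a couple of invented rows. This sketch uses Python's built-in sqlite3 module; the table and column names follow the question, the data is hypothetical, and a second sort key is added so the row order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (StudentId INTEGER, StudentName TEXT);
    CREATE TABLE assignment (AssignmentId INTEGER, AssignmentName TEXT, PointsPossible INTEGER);
    CREATE TABLE grades (StudentId INTEGER, AssignmentId INTEGER, PointsReceived INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO assignment VALUES (1, 'HW1', 10), (2, 'HW2', 20);
    -- Only one grade has been entered so far.
    INSERT INTO grades VALUES (1, 1, 9);
""")

# Every student paired with every assignment; grades filled in where present.
rows = conn.execute("""
    SELECT s.StudentName, a.AssignmentName, g.PointsReceived, a.PointsPossible
    FROM student s
    CROSS JOIN assignment a
    LEFT OUTER JOIN grades g
        ON g.StudentId = s.StudentId AND g.AssignmentId = a.AssignmentId
    ORDER BY s.StudentName, a.AssignmentName
""").fetchall()

for row in rows:
    print(row)  # ungraded pairs show None for PointsReceived
```

Two students and two assignments give four rows, one per pair, whether or not a grade exists.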
I have 3 tables. Below is the structure:
student (id int, name varchar(20))
course (course_id int, subject varchar(10))
student_course (st_id int, course_id int) -> contains name of students who enrolled for a course
Now, I want to write a query to find the students who did not enroll in any course. As far as I can tell, there are multiple ways of fetching this information. Could you please let me know which of these is the most efficient, and why? Also, if there is a better way of doing the same thing, please let me know.
db2 => select distinct name from student inner join student_course on id not in (select st_id from student_course)
db2 => select name from student minus (select name from student inner join student_course on id=st_id)
db2 => select name from student where id not in (select st_id from student_course)
Thanks in advance!!
The subqueries you use, whether it is not in, minus or whatever, are generally inefficient. Common way to do this is left join:
select name
from student
left join student_course on id = st_id
where st_id is NULL
Using a join is the "normal" and preferred solution.
The canonical (maybe even synoptic) idiom is (IMHO) to use NOT EXISTS :
SELECT *
FROM student st
WHERE NOT EXISTS (
SELECT *
FROM student_course nx
WHERE nx.st_id = st.id
);
Advantages:
NOT EXISTS(...) is very old, so it will probably be present on all platforms, and most optimisers will know how to handle it
the nx. correlation name is not leaked into the outer query: the select * in the outer query will only yield fields from the student table, and not the (null) rows from the student_course table, like in the LEFT JOIN ... WHERE ... IS NULL case. This is especially useful in queries with a large number of range table entries.
(NOT) IN is error prone (NULLs), and it might perform badly on some implementations (duplicates and NULLs have to be removed from the result of the uncorrelated subquery)
Using "not in" is generally slow. That makes your second query the most efficient. You probably don't need the brackets though.
Just as a comment: I would suggest to select student Id (which are unique) and not names.
As another query option you might want to left join the two tables, group by student id, and keep the groups having count(course_id) = 0.
Also, I agree that indexes will be more important.
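The group-and-count option can be sketched as follows, using Python's built-in sqlite3 module and invented rows; the table and column names follow the question. The join has to be an outer join: with an inner join, students without enrollments are already gone before the HAVING clause is evaluated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, name TEXT);
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    -- Only Ann is enrolled in anything.
    INSERT INTO student_course VALUES (1, 100);
""")

QUERY = """
    SELECT s.id, s.name
    FROM student s
    {join} student_course sc ON sc.st_id = s.id
    GROUP BY s.id, s.name
    HAVING COUNT(sc.course_id) = 0
    ORDER BY s.id
"""

# LEFT JOIN keeps Bob with a NULL course_id, which COUNT ignores.
with_left_join = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
# INNER JOIN drops Bob entirely, so no group can have a count of 0.
with_inner_join = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()

print(with_left_join)   # Bob
print(with_inner_join)  # nothing
```

This says nothing about which form is fastest on DB2; comparing the execution plans on real data is the way to find out.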
I have a query which checks a table for a number of fields. The two tables I am interested in are: PERSON & PERSON_ALTERNATE_ID.
I want to modify my query to also return the value stored in person_alternate_id (if the particular person indeed has one)
select distinct person.person_id, person_name, person_address
from person join person_alternate_id
on
person.person_id=person_alternate_id.person_id
where person.person_id
in (10001,10002,10003);
Can anyone suggest how I could do that? I was looking at nested select examples, but I wasn't able to implement a suitable change to my query that achieved what I require. At the moment, the query only returns fields I need from the PERSON table.
Because the person may or may not have an alternate id, you should use a left join:
select person.person_id, person_name, person_address, person_alternate_id.*
from person
left join person_alternate_id
on person.person_id=person_alternate_id.person_id
where person.person_id
in (10001,10002,10003);
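The difference between the two joins shows up as soon as one person lacks an alternate id. A minimal sketch with Python's built-in sqlite3 module follows; the alternate_id column name and the rows are assumptions made for illustration, since the question does not show the PERSON_ALTERNATE_ID columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER, person_name TEXT, person_address TEXT);
    -- alternate_id is a hypothetical column name.
    CREATE TABLE person_alternate_id (person_id INTEGER, alternate_id TEXT);
    INSERT INTO person VALUES (10001, 'Ann', '1 Main St'), (10002, 'Bob', '2 Oak Ave');
    -- Only Ann has an alternate id.
    INSERT INTO person_alternate_id VALUES (10001, 'A-77');
""")

QUERY = """
    SELECT person.person_id, person_name, person_alternate_id.alternate_id
    FROM person
    {join} person_alternate_id
        ON person.person_id = person_alternate_id.person_id
    WHERE person.person_id IN (10001, 10002, 10003)
    ORDER BY person.person_id
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # Bob is missing
print(left_rows)   # Bob appears with None as his alternate id
```

A person with several alternate ids will appear once per id in either version.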