Efficient way to select records missing in another table - sql

I have 3 tables. Below is the structure:
student (id int, name varchar(20))
course (course_id int, subject varchar(10))
student_course (st_id int, course_id int) -> contains the IDs of students who enrolled in a course
Now, I want to write a query to find the students who did not enroll in any course. As far as I can tell, there are multiple ways of fetching this information. Could you please let me know which of these is the most efficient, and why? Also, if there is a better way of doing the same thing, please let me know.
db2 => select distinct name from student inner join student_course on id not in (select st_id from student_course)
db2 => select name from student minus (select name from student inner join student_course on id=st_id)
db2 => select name from student where id not in (select st_id from student_course)
Thanks in advance!!

The subqueries you use, whether NOT IN, MINUS or anything else, are often less efficient. A common way to do this is a left join:
select name
from student
left join student_course on id = st_id
where st_id is NULL
Using a join is the "normal" and preferred solution.
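To make the left join approach concrete, here is a minimal sketch that runs the same anti-join against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the aliases s and sc are made up for illustration; the question used DB2, so treat this only as a check of the logic, not of performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INT, name VARCHAR(20));
    CREATE TABLE student_course (st_id INT, course_id INT);
    -- Hypothetical sample data: Bob (id 2) has no enrollment rows.
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO student_course VALUES (1, 10), (1, 11), (3, 10);
""")

# Anti-join: keep only the students for which the LEFT JOIN found no match.
not_enrolled = conn.execute("""
    SELECT s.name
    FROM student s
    LEFT JOIN student_course sc ON s.id = sc.st_id
    WHERE sc.st_id IS NULL
""").fetchall()

print(not_enrolled)
```

Ann has two enrollment rows, but she is filtered out entirely because none of her joined rows has a NULL st_id.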

The canonical (maybe even synoptic) idiom is (IMHO) to use NOT EXISTS :
SELECT *
FROM student st
WHERE NOT EXISTS (
SELECT *
FROM student_course nx
WHERE st.id = nx.st_id
);
Advantages:
NOT EXISTS(...) is very old, so most optimisers know how to handle it and it will probably be available on all platforms.
The nx. correlation name is not leaked into the outer query: the select * in the outer query will only yield fields from the student table, and not the (null) columns from the student_course table, as in the LEFT JOIN ... WHERE ... IS NULL case. This is especially useful in queries with a large number of range table entries.
(NOT) IN is error-prone (NULLs), and it might perform badly on some implementations (duplicates and NULLs have to be removed from the result of the uncorrelated subquery).
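The NULL problem with NOT IN is easy to reproduce. The sketch below uses an in-memory SQLite database through Python's sqlite3 module, with made-up rows in which one student_course row has a NULL st_id. The data is hypothetical and the original question was about DB2, but this three-valued-logic behaviour of NOT IN is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INT, name VARCHAR(20));
    CREATE TABLE student_course (st_id INT, course_id INT);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    -- Hypothetical data: one enrollment row has a NULL st_id.
    INSERT INTO student_course VALUES (1, 10), (NULL, 11);
""")

# "2 NOT IN (1, NULL)" evaluates to NULL, not TRUE, so Bob is filtered out.
not_in_rows = conn.execute("""
    SELECT name FROM student
    WHERE id NOT IN (SELECT st_id FROM student_course)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists_rows = conn.execute("""
    SELECT name FROM student st
    WHERE NOT EXISTS (
        SELECT 1 FROM student_course nx WHERE nx.st_id = st.id
    )
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

With the NULL present, the NOT IN version returns no rows at all, while NOT EXISTS still reports Bob.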

Using "not in" is generally slow. That makes your second query the most efficient. You probably don't need the brackets though.

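Just as a comment: I would suggest selecting the student IDs (which are unique) rather than the names.
As another query option, you might left join the two tables, group by the student id, and keep only the groups having count(course_id) = 0. It has to be an outer join, because an inner join would already have dropped the students without courses.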
Also, I agree that indexes will be more important.
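A rough sketch of the group by / count idea, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, and two students deliberately share the name Ann to show why returning the id is safer than returning only the name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INT, name VARCHAR(20));
    CREATE TABLE student_course (st_id INT, course_id INT);
    -- Hypothetical data: two different students are both called Ann.
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Ann');
    INSERT INTO student_course VALUES (1, 10), (1, 11);
""")

# COUNT(sc.course_id) ignores the NULLs produced by the LEFT JOIN,
# so students without any enrollment end up with a count of 0.
no_courses = conn.execute("""
    SELECT s.id, s.name
    FROM student s
    LEFT JOIN student_course sc ON sc.st_id = s.id
    GROUP BY s.id, s.name
    HAVING COUNT(sc.course_id) = 0
    ORDER BY s.id
""").fetchall()

print(no_courses)
```

Selecting only the name here would show an Ann in the result even though the Ann with id 1 is enrolled, which is the ambiguity the comment above warns about.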

Related

SQL-Find data that appears in one table and not another BUT data appears in more than one column

Literally that is my question. How would one go about querying data that appears in one table and not another, but that data can exist in 2 or more columns?
create table Highschooler(ID int, name text, grade int);
create table Likes(ID1 int, ID2 int);
In a theoretical social network you're given two tables, one named highschooler, one named likes. In likes, id1 likes id2, but it's not necessarily mutual.
Question: Find all students who do not appear in the Likes table (as a student who likes or is liked) and return their names and grades.
I've tried left joins, not in, not exists, is null and am not getting it. I have a feeling there are a few joins in this one but I'm not extremely experienced with SQL.
I'm using SQLite, so I can't use select * in a subquery. I tried that too.
I'd use the anti-join pattern.
SELECT h.*
FROM highschooler h
LEFT JOIN Likes lo
ON lo.id1 = h.id
LEFT JOIN Likes lt
ON lt.id2 = h.id
WHERE lo.id1 IS NULL
AND lt.id2 IS NULL
That would work on MySQL. I'm not sure about restrictions in SQLite.
That basically says... get me all rows from highschooler, along with matching rows from Likes. Any rows in highschooler that don't have a matching row in Likes will be returned, with NULL values as placeholders for the columns from the Likes table. The "trick" is the predicate in the WHERE clause, that excludes any rows where a match was found.
Also found this worked:
SELECT name, grade
FROM Highschooler
WHERE ID NOT IN (SELECT ID1 FROM Likes) AND ID NOT IN (SELECT ID2 FROM Likes)
I didn't know that you could separate select statements in the where clause with and/or. I thought they always had to be nested. Evidently you can. I'm new.
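For comparison, the same question can also be answered with a single NOT EXISTS that checks both columns at once. This is only a sketch against an in-memory SQLite database via Python's sqlite3 module; the sample rows and the aliases h and l are assumptions, not part of the original exercise data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Highschooler (ID INT, name TEXT, grade INT);
    CREATE TABLE Likes (ID1 INT, ID2 INT);
    -- Hypothetical data: Dee (ID 4) neither likes anyone nor is liked.
    INSERT INTO Highschooler VALUES
        (1, 'Ann', 9), (2, 'Bob', 10), (3, 'Cid', 11), (4, 'Dee', 12);
    INSERT INTO Likes VALUES (1, 2), (3, 1);
""")

# A student is excluded as soon as their ID shows up in either column.
unmentioned = conn.execute("""
    SELECT h.name, h.grade
    FROM Highschooler h
    WHERE NOT EXISTS (
        SELECT 1 FROM Likes l
        WHERE l.ID1 = h.ID OR l.ID2 = h.ID
    )
    ORDER BY h.ID
""").fetchall()

print(unmentioned)
```

Bob never likes anyone in this data, but he is still excluded because he appears in ID2.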

Cannot find correct number of values in a table that are not in another table, though I can do otherwise

I want to retrieve the course_id in table course that is not in the table takes. Table takes only contains course_id of courses taken by students. The problem is that if I have:
select count (distinct course.course_id)
from course, takes
where course.course_id = (takes.course_id);
the result is 85 which is smaller than the total number of course_id in table course, which is 200. The result is correct.
But I want to find the number of course_id that are not in the table takes, and I have:
select count (distinct course.course_id)
from course, takes
where course.course_id != (takes.course_id);
The result is 200, which is equal the number of course_id in table course. What is wrong with my code?
This SQL will give you the count of course_id in table course that aren't in the table takes:
select count (*)
from course c
where not exists (select *
from takes t
where c.course_id = t.course_id);
You didn't specify your DBMS, however, this SQL is pretty standard so it should work in the popular DBMSs.
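As for why the != version counts every course: the comma join pairs each course with every row of takes, and a course survives the filter as soon as takes contains any row for a different course. A small sketch with invented data (three courses, two of them taken), using an in-memory SQLite database through Python's sqlite3 module, shows the effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id INT);
    CREATE TABLE takes (course_id INT);
    -- Hypothetical data: courses 1 and 2 are taken, course 3 is not.
    INSERT INTO course VALUES (1), (2), (3);
    INSERT INTO takes VALUES (1), (1), (2);
""")

# Every course finds at least one takes row with a different course_id,
# so the != join keeps all of them.
neq_count = conn.execute("""
    SELECT COUNT(DISTINCT c.course_id)
    FROM course c, takes t
    WHERE c.course_id != t.course_id
""").fetchone()[0]

# NOT EXISTS asks the intended question: is there no takes row at all
# for this course?
not_exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM course c
    WHERE NOT EXISTS (
        SELECT 1 FROM takes t WHERE t.course_id = c.course_id
    )
""").fetchone()[0]

print(neq_count, not_exists_count)
```

The != query reports all three courses, while NOT EXISTS reports only the one course nobody takes.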
There are a few different ways to accomplish what you're looking for. My personal favorite is the LEFT JOIN condition. Let me walk you through it:
Fact One: You want to return a list of courses.
Fact Two: You want to filter that list to not include anything in the Takes table.
I'd go about this by first mentally selecting a list of courses:
SELECT c.Course_ID
FROM Course c
and then filtering out the ones I don't want. One way to do this is to use a LEFT JOIN to get all the rows from the first table, along with any that happen to match in the second table, and then filter out the rows that actually do match, like so:
SELECT c.Course_ID
FROM
Course c
LEFT JOIN -- note the syntax: 'comma joins' are a bad idea.
Takes t ON
c.Course_ID = t.Course_ID -- at this point, you have all courses
WHERE t.Course_ID IS NULL -- this phrase means that none of the matching records will be returned.
Another note: as mentioned above, comma joins should be avoided. Instead, use the syntax I demonstrated above (INNER JOIN or LEFT JOIN, followed by the table name and an ON condition).

MySQL 5.5 Database Query Help

I am having a few issues with a DB query.
I have two tables, students (Fields: FirstName, LastName, StdSSN), and teachers (TFirstName, TLastName, TSSN) which I've stripped down for this example. I need to perform a query that will return all the students except for the students that are teachers themselves.
I have the query
SELECT student.FirstName, student.LastName
FROM `student`,`teachers`
WHERE student.StdSSN=teachers.TSSN
This gives me a list of all the teachers who are also students, but it does not give me a list of students who are not teachers, so I tried changing it to:
SELECT student.FirstName, student.LastName
FROM `student`,`teachers`
WHERE student.StdSSN!=teachers.TSSN
Which gives me a list of all the students with many duplicate values so I am a little stuck here. How can I change things to return a list of all students who are not teachers? I was thinking INNER/OUTER/SELF-JOIN and was playing with that for a few hours but things became complicated and I did not accomplish anything so I've pretty much given up.
Can anyone give me any advice? I did see the query before and it was pretty simple, but I've failed somewhere.
Using NOT IN
SELECT s.*
FROM STUDENTS s
WHERE s.stdssn NOT IN (SELECT t.tssn
FROM TEACHERS t)
Using NOT EXISTS
SELECT s.*
FROM STUDENTS s
WHERE NOT EXISTS (SELECT NULL
FROM TEACHERS t
WHERE t.tssn = s.stdssn)
Using LEFT JOIN / IS NULL
SELECT s.*
FROM STUDENTS s
LEFT JOIN TEACHERS t ON t.tssn = s.stdssn
WHERE t.column IS NULL
I used "column" for any column in TEACHERS other than what is joined on.
Comparison:
If the column(s) compared are nullable (value can be NULL), NOT EXISTS is the best choice. Otherwise, LEFT JOIN/IS NULL is the best choice (for MySQL).

SQL question: Show me students who have NOT taken a certain course?

So I have these tables:
STUDENTS:
Student ID - First name - Last name - Email
COURSES:
Catalog ID - Course Name - Description
TERMS:
Term ID - Start Date - End Date
COURSEINSTANCES:
CourseInstance ID - Catalog ID - Term ID
STUDENTCOURSES:
StudentCourse ID - CourseInstance ID - Student ID - Date added to database
This makes it easy to see which students have taken which courses. I'm not sure how to go about finding out which students have NOT taken a particular course.
Doing something like this:
WHERE ((CourseInstances.CatalogLookup)<>504)
will just give me a list of courses taken by students that do not equal catalog number 504 like this:
Tara - 501
Tara - 502
Tara - 505
John - 503
So for example I've taken 504. Therefore I do not want me to show up on this list. The SQL above will just show all of my courses that are not 504, but it will not exclude me from the list.
Any ideas? Is this possible?
I prefer this syntax over outer joins, IMO it's easier to read:
select *
from STUDENTS
where StudentID not in
(
select StudentID
from STUDENTCOURSES s
inner join COURSEINSTANCES c on s.CourseInstanceID = c.CourseInstanceID
where c.CatalogID = 504
)
In the nested query, you select the StudentIDs of all students who HAVE taken course 504.
Then, you select all the students whose StudentIDs are not included in the nested query.
EDIT:
As ChrisJ already said, the c and the s are aliases for the table names.
Without them, the query would look like this:
select *
from STUDENTS
where StudentID not in
(
select StudentID
from STUDENTCOURSES
inner join COURSEINSTANCES on STUDENTCOURSES.CourseInstanceID = COURSEINSTANCES.CourseInstanceID
where CatalogID = 504
)
I always use aliases because:
a) I'm too lazy to type the table names more often than necessary.
b) In my opinion it's easier to read, especially when you join tables with long names.
Try something like this:
SELECT *
FROM Users
WHERE UserID NOT IN
( SELECT Users.UserID
FROM
Users
INNER JOIN
ClassesTaken ON Users.UserID = ClassesTaken.UserID AND ClassesTaken.ClassNumber = 504)
Another way occurred to me the other day:
SELECT *
FROM
Users
LEFT OUTER JOIN ClassesTaken ON Users.UserID = ClassesTaken.UserID AND ClassesTaken.ClassNumber = 504
WHERE ClassesTaken.UserID IS NULL
You should read about outer joins.
SELECT * FROM students
WHERE studentId not in
(SELECT distinct studentID FROM studentCourses WHERE courseInstanceID = 504)
Three main ways in Access
NOT IN (Be careful to exclude any NULLs if there is any possibility of them appearing in the sub query)
OUTER JOIN and filter on NULL (may need DISTINCT added)
NOT EXISTS
Other RDBMSs also have EXCEPT or MINUS
homework? use set operators.
select all students MINUS select any student who has taken this course...
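A sketch of the set-operator approach, using EXCEPT (the standard spelling of MINUS) in an in-memory SQLite database through Python's sqlite3 module. The column names without spaces (StudentID, CourseInstanceID, CatalogID and so on) and all sample rows are assumptions based on the table descriptions in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENTS (StudentID INT, FirstName TEXT);
    CREATE TABLE COURSEINSTANCES (CourseInstanceID INT, CatalogID INT, TermID INT);
    CREATE TABLE STUDENTCOURSES (StudentCourseID INT, CourseInstanceID INT, StudentID INT);
    -- Hypothetical data: only Sam (StudentID 3) has taken catalog course 504.
    INSERT INTO STUDENTS VALUES (1, 'Tara'), (2, 'John'), (3, 'Sam');
    INSERT INTO COURSEINSTANCES VALUES
        (1, 501, 1), (2, 502, 1), (3, 503, 1), (4, 504, 1), (5, 505, 1), (6, 504, 2);
    INSERT INTO STUDENTCOURSES VALUES
        (1, 1, 1), (2, 2, 1), (3, 5, 1), (4, 3, 2), (5, 6, 3), (6, 1, 3);
""")

# All students, minus the students who have taken any instance of course 504.
not_taken_504 = conn.execute("""
    SELECT StudentID FROM STUDENTS
    EXCEPT
    SELECT sc.StudentID
    FROM STUDENTCOURSES sc
    JOIN COURSEINSTANCES ci ON ci.CourseInstanceID = sc.CourseInstanceID
    WHERE ci.CatalogID = 504
    ORDER BY 1
""").fetchall()

print(not_taken_504)
```

Sam also took 501, but because the subtraction works on whole students rather than on course rows, he is removed from the result completely.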

T-SQL: Comparing Two Tables - Records that don't exist in second table

If UNION ALL is addition in T-SQL, what is the equivalent of subtraction?
For example, say I have a table PEOPLE and a table EMPLOYEES, and I know that if I remove the EMPLOYEES records from PEOPLE I will be left with my company's CONTRACTORS.
Is there a way of doing this that is similar to UNION ALL, one where I don't have to specify any field names? I ask because this is just one hypothetical example, and I need to do this several times for many different tables. Assume that the schemas of EMPLOYEES and PEOPLE are the same.
You can use the EXCEPT operator to subtract one set from another. Here's a sample of code using EMPLOYEES and PEOPLE temporary tables. I've listed the field names here, but as long as both sides return the same number of columns with compatible types, SELECT * works with EXCEPT too.
CREATE TABLE #PEOPLE
(ID INTEGER,
Name NVARCHAR(50))
CREATE TABLE #EMPLOYEE
(ID INTEGER,
Name NVARCHAR(50))
GO
INSERT #PEOPLE VALUES (1, 'Bob')
INSERT #PEOPLE VALUES (2, 'Steve')
INSERT #PEOPLE VALUES (3, 'Jim')
INSERT #EMPLOYEE VALUES (1, 'Bob')
GO
SELECT ID, Name
FROM #PEOPLE
EXCEPT
SELECT ID, Name
FROM #EMPLOYEE
GO
The final query will return the two rows in the PEOPLE table which do not exist in the EMPLOYEE table.
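Since the question asked for a form that needs no field names, here is a sketch of the same subtraction written with SELECT *. It runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, so the temporary-table syntax is replaced by ordinary tables named people and employee (an assumption for this sketch); the rows are the ones from the sample above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (ID INTEGER, Name TEXT);
    CREATE TABLE employee (ID INTEGER, Name TEXT);
    INSERT INTO people VALUES (1, 'Bob'), (2, 'Steve'), (3, 'Jim');
    INSERT INTO employee VALUES (1, 'Bob');
""")

# EXCEPT compares whole rows, so no column has to be named as long as
# both tables have the same number of columns in the same order.
contractors = conn.execute("""
    SELECT * FROM people
    EXCEPT
    SELECT * FROM employee
    ORDER BY 1
""").fetchall()

print(contractors)
```

Because whole rows are compared, an employee whose Name differed between the two tables would not be subtracted; joining on the key, as in the LEFT JOIN form, avoids that.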
Instead of using UNION, use EXCEPT (or INTERSECT to get only the records in both), as described in the MSDN documentation for EXCEPT (SQL Server 2008 and SQL Server 2005).
SELECT
P.*
FROM
People P
LEFT OUTER JOIN Employees E ON
E.ID = P.ID -- Or whatever your PK-FK relationship is
WHERE
E.ID IS NULL
For SQL Server this will probably be the most performant way that you can do it.
SELECT * FROM Table1
WHERE Table1.Key NOT IN (SELECT Table2.Key FROM Table2 WHERE Table2.Key IS NOT NULL)
Added IS NOT NULL to make people happy.
I would agree with Tom. His version is most likely more efficient. The only possible reason to use mine, might be that it's prettier.
Unfortunately, there is a problem in your design.
Instead of having two tables, PEOPLE and CONTRACTOR, you should have a table PEOPLE and another table TYPE (if some people can have several roles, a further table may be needed).
In your PEOPLE table you make a reference to the TYPE table.
Then your requests become:
SELECT * from PEOPLE, TYPE
WHERE PEOPLE.type_id = TYPE.id
AND TYPE.name = 'CONTRACTOR'
SELECT * from PEOPLE, TYPE
WHERE PEOPLE.type_id = TYPE.id
AND TYPE.name = 'EMPLOYEE'
(untested)
When I compare tables looking for data that isn't in one that is in the other I typically use SQL Division.
select * -- or the selected matching field
from tableA as A
where not exists
(select * -- or the selected matching field
from tableB as B
where A.key = B.key)
This query will return the rows that are in tableA but not in tableB.
select * -- or the selected matching field
from tableA as A
where exists
(select * -- or the selected matching field
from tableB as B
where A.key = B.key)
This query will return all the rows of tableA that have a match in tableB, so a row that is in tableA but not in tableB will not be retrieved.
I found it is a lot easier to use a tool like SQLMerger to do this for you. The results are displayed in a nicer way and you can go on with whatever you need to do with the data thereafter easily.
www.auisoft.com/SQLMerger <= the tool that makes it easy to compare data
example on comparing two tables: http://auisoft.com/SQLMerger/How-to/visualize-differences-in-2-databases/