Query to find missing teacher-student combination - sql

I have a single table with two columns, TEACHER_ID and STUDENT_ID, which holds the data of all teachers teaching the students. One teacher can teach multiple students and one student can be taught by many teachers.
TEACHER_ID STUDENT_ID
100 123
100 124
100 125
100 126
101 123
101 124
101 125
102 123
102 124
102 125
102 126
103 123
103 127
I need to find which teacher is not teaching which student, i.e. the output should have the same 2 columns, but contain only the pairs that do not exist in the table.
For example, student_id 127 is taught only by teacher_id 103, so the pairs (100, 127), (101, 127) and (102, 127) are missing; the same goes for every other missing pair.
We can build a cross join to get all possible combinations and then use the MINUS operator to discard the actual data, leaving only the missing pairs.
But is there a better, more efficient way to do this?

To answer your question: you need to generate all combinations first; there isn't another way to find the missing pairs.
With teachers as (
SELECT DISTINCT "TEACHER_ID"
FROM Table1
), students as (
SELECT DISTINCT "STUDENT_ID"
FROM Table1
)
SELECT teachers."TEACHER_ID" , students."STUDENT_ID"
FROM teachers
CROSS JOIN students
LEFT JOIN Table1 t
ON teachers."TEACHER_ID" = t."TEACHER_ID"
AND students."STUDENT_ID" = t."STUDENT_ID"
WHERE t."TEACHER_ID" IS NULL
ORDER BY 2, 1

You can do this by generating all combinations of teacher and student using cross join. Then filter out the ones that exist:
select t.teacher_id, s.student_id
from (select distinct teacher_id from t) t cross join
(select distinct student_id from t) s
minus
select teacher_id, student_id
from t;
For the filtering part, you can also use not in, not exists, or left join/where.
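As a rough illustration of the not exists variant mentioned above, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. SQLite is only a stand-in for the asker's database (which uses MINUS, so is probably Oracle), and the aliases te and st are made up for this example.

```python
import sqlite3

# In-memory SQLite stand-in; the table name "t" follows the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (teacher_id INTEGER, student_id INTEGER)")
rows = [
    (100, 123), (100, 124), (100, 125), (100, 126),
    (101, 123), (101, 124), (101, 125),
    (102, 123), (102, 124), (102, 125), (102, 126),
    (103, 123), (103, 127),
]
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Build every teacher/student combination, then keep only the pairs
# that have no matching row in t (te and st are hypothetical aliases).
missing_pairs = conn.execute("""
    SELECT te.teacher_id, st.student_id
    FROM (SELECT DISTINCT teacher_id FROM t) AS te
    CROSS JOIN (SELECT DISTINCT student_id FROM t) AS st
    WHERE NOT EXISTS (
        SELECT 1
        FROM t
        WHERE t.teacher_id = te.teacher_id
          AND t.student_id = st.student_id
    )
    ORDER BY st.student_id, te.teacher_id
""").fetchall()

for pair in missing_pairs:
    print(pair)
```

With the sample data this should list the 7 pairs that are absent from the 4 x 5 = 20 possible combinations.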

Related

How to combine two tables in SQL Server

I have these two tables:
Table 1
Date Name StudentID TotalDuration
------------------------------------------------------------
2019-09-30 aA 11111 100
2019-09-30 bB 22222 40
2019-09-30 cC 33333 60
2019-10-07 aA 11111 50
2019-10-07 bB 22222 10
2019-10-07 cC 33333 12
2019-10-07 dD 44444 90
It contains data of students who ATTENDED a lecture (hence, it does not include those who did not attend).
Table 2
StudentID Surname FirstName Group
------------------------------------------------------------
11111 A a 1
22222 B b 1
33333 C c 1
44444 D d 2
55555 E e 2
66666 F f 2
77777 G g 3
Table2 contains data of ALL students in the class.
The Name attribute in Table 1 is a combination of surname + first name, and TotalDuration is the duration of the student's participation in minutes.
I want to combine these two tables so that the result lists all the students in the class and their TotalDuration.
I tried OUTER JOIN and UNION ALL, but I can't figure out how to list all the students while showing a NULL value for those who did not attend the lecture on a particular date.
How can I achieve this?
You could left join and group by. But since you just need one aggregate computation, a correlated subquery is probably a simpler and more efficient approach (if you have the right index in place - see below):
select
t2.*,
(
select sum(totalduration)
from table1 t1
where t1.studentid = t2.studentid
) totalduration
from table2 t2
For performance, consider an index on table1(studentid, totalduration).
Side note: from a database design perspective, column Name in table1 is not needed; it is redundant information that complicates maintenance (what if someone changes the first name of a student in the other table?). You should remove that column and rely on the foreign key on studentid only.
You probably want to aggregate the result by StudentID and use a LEFT JOIN, as in:
select
a.StudentId,
a.FirstName,
a.Surname,
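a.[Group],
sum(b.TotalDuration) as total_duration
from table2 a
left join table1 b on b.StudentId = a.StudentId
group by a.StudentId, a.FirstName, a.Surname, a.[Group]
Note that the SQL standard allows grouping by a.StudentId alone when it is the primary key, since the other columns are functionally dependent on it. SQL Server does not implement that rule, so every non-aggregated column in the select list has to appear in the group by clause. Group is also a reserved word in SQL Server, hence the brackets.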
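Both answers above return one total per student. If the goal is one row per student per lecture date, with NULL where the student did not attend, one possible approach is to cross join the distinct dates with the student list and then left join the attendance rows. The sketch below tries this on an in-memory SQLite database from Python as a stand-in for SQL Server; it leaves out the Name and Group columns for brevity, and the aliases d, s and a are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified copies of the question's tables (Name and Group omitted).
conn.execute("CREATE TABLE table1 (Date TEXT, StudentID INTEGER, TotalDuration INTEGER)")
conn.execute("CREATE TABLE table2 (StudentID INTEGER, Surname TEXT, FirstName TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("2019-09-30", 11111, 100), ("2019-09-30", 22222, 40), ("2019-09-30", 33333, 60),
    ("2019-10-07", 11111, 50), ("2019-10-07", 22222, 10), ("2019-10-07", 33333, 12),
    ("2019-10-07", 44444, 90),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (11111, "A", "a"), (22222, "B", "b"), (33333, "C", "c"), (44444, "D", "d"),
    (55555, "E", "e"), (66666, "F", "f"), (77777, "G", "g"),
])

# Every (date, student) combination, with the attendance row attached when it exists.
attendance = conn.execute("""
    SELECT d.Date, s.StudentID, a.TotalDuration
    FROM (SELECT DISTINCT Date FROM table1) AS d
    CROSS JOIN table2 AS s
    LEFT JOIN table1 AS a
           ON a.Date = d.Date
          AND a.StudentID = s.StudentID
    ORDER BY d.Date, s.StudentID
""").fetchall()

for row in attendance:
    print(row)  # TotalDuration is None (NULL) for students absent on that date
```

This assumes every lecture date has at least one attendee in table1; a date on which nobody attended would need a separate list of dates.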

SQL select from 2 tables where values in the second table might be absent

I have 2 tables:
Students with columns (student id, firstname, lastname) and
grades (gradeid, studentid, grade).
How can I select ALL students including those that do not have grades (hence, they are not in the grades tables).
For example:
Students
100 StudentFN1 StudentLN1
101 StudentFN2 StudentLN2
102 StudentFN3 StudentLN3
Grades
1 101 90
2 102 70
So I want all students to be selected, even student 100, which is not in Grades. Student 100's grade should be empty.
You are looking for an outer join:
select *
from students
left outer join grades on grades.studentid = students.studentid
order by students.studentid;
All you need to do is a left join:
select
students.*, grades.gradeid, grades.grade
from students
left join grades
on students.studentid = grades.studentid

Selecting Records Matching Two or More Related Tables

I have a 'persons' table:
person_id name
100 jack
125 jill
201 jane
And many sub-tables, that the person_id could be in:
'rowing'
id person_id
1 100
2 201
'swimming'
id person_id
1 125
2 201
'running'
id person_id
1 201
'throwing'
id person_id
1 125
2 201
I would like to be able to select all people who are involved in two activities, regardless of which two.
As the great @TimSchmelter (great first name) mentioned, you should really have a single PersonActivities table with an id corresponding to the particular activity.
That being said, if you must work with your current schema, one option would be to UNION together the activity tables, and then count which persons have two or more records, meaning that they participated in two or more activities.
SELECT t1.person_id, t1.name
FROM persons t1
INNER JOIN
(
SELECT t.person_id, COUNT(t.person_id) AS activityCount
FROM
(
SELECT person_id FROM rowing
UNION ALL
SELECT person_id FROM swimming
UNION ALL
SELECT person_id FROM running
UNION ALL
SELECT person_id FROM throwing
) AS t
GROUP BY t.person_id
HAVING COUNT(t.person_id) > 1
) t2
ON t1.person_id = t2.person_id
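To check the UNION ALL approach against the sample data, here is a minimal sketch using an in-memory SQLite database from Python. SQLite is an assumption here, since the question does not name a database; the query itself mirrors the one above, with the persons table aliased as t1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons  (person_id INTEGER, name TEXT);
CREATE TABLE rowing   (id INTEGER, person_id INTEGER);
CREATE TABLE swimming (id INTEGER, person_id INTEGER);
CREATE TABLE running  (id INTEGER, person_id INTEGER);
CREATE TABLE throwing (id INTEGER, person_id INTEGER);
INSERT INTO persons  VALUES (100, 'jack'), (125, 'jill'), (201, 'jane');
INSERT INTO rowing   VALUES (1, 100), (2, 201);
INSERT INTO swimming VALUES (1, 125), (2, 201);
INSERT INTO running  VALUES (1, 201);
INSERT INTO throwing VALUES (1, 125), (2, 201);
""")

# Stack all activity tables, count rows per person, keep those with 2 or more.
multi_activity = conn.execute("""
    SELECT t1.person_id, t1.name
    FROM persons t1
    INNER JOIN (
        SELECT t.person_id, COUNT(t.person_id) AS activityCount
        FROM (
            SELECT person_id FROM rowing
            UNION ALL
            SELECT person_id FROM swimming
            UNION ALL
            SELECT person_id FROM running
            UNION ALL
            SELECT person_id FROM throwing
        ) AS t
        GROUP BY t.person_id
        HAVING COUNT(t.person_id) > 1
    ) t2
    ON t1.person_id = t2.person_id
    ORDER BY t1.person_id
""").fetchall()

print(multi_activity)
```

Note that this counts rows, so it relies on a person appearing at most once per activity table.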

Hive query equivalent of sql

Hi I have a table student as follows:
student_id course_id
1111 100
2222 101
3333 101
4444 102
5555 103
And a courses table as follows:
course_id course_desc
100 Electronics
101 Computer
102 Mechanical
If I join these two tables, the student table contains a course_id (103) that is not listed in the courses table. So every time I do the join, I need to compare the course_id values against the courses table to find out whether a new course_id has appeared in the student table.
I believe in SQL we can use something like:
select DISTINCT course_id from students WHERE course_id NOT IN ( select course_id FROM courses);
How can this be done in Hive? Any help or suggestion is much appreciated.
This should work:
select students.course_id from students students LEFT OUTER JOIN courses courses ON (students.course_id = courses.course_id) where courses.course_id is null;
I don't think NOT IN is supported in Hive, hence this workaround.
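As a sanity check that the left join workaround returns the same rows as the NOT IN query from the question, here is a small sketch that runs both on an in-memory SQLite database from Python. SQLite is only a stand-in and nothing here is tested against Hive itself; DISTINCT is added to the join version to match the original query, and the aliases s and c are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER, course_id INTEGER);
CREATE TABLE courses  (course_id INTEGER, course_desc TEXT);
INSERT INTO students VALUES (1111, 100), (2222, 101), (3333, 101), (4444, 102), (5555, 103);
INSERT INTO courses  VALUES (100, 'Electronics'), (101, 'Computer'), (102, 'Mechanical');
""")

# The question's NOT IN form.
via_not_in = conn.execute("""
    SELECT DISTINCT course_id
    FROM students
    WHERE course_id NOT IN (SELECT course_id FROM courses)
""").fetchall()

# The answer's anti-join form: keep student rows with no matching course.
via_left_join = conn.execute("""
    SELECT DISTINCT s.course_id
    FROM students s
    LEFT OUTER JOIN courses c ON s.course_id = c.course_id
    WHERE c.course_id IS NULL
""").fetchall()

print(via_not_in, via_left_join)
```

The two forms agree here because courses.course_id contains no NULLs; with a NULL in that column, NOT IN would return no rows at all, while the join form would still work.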

Selecting only rows with the highest value of one field, grouped by another field

I have a table that has information structured like this:
ID Points Name School
1 123 James A
2 534 Henry B
3 56 Henry B
4 153 Chris B
5 95 Chris B
6 83 Chris B
7 421 James A
I need a query that returns, for each name, only the row with the highest points, like this:
ID Points Name School
7 421 James A
2 534 Henry B
4 153 Chris B
Any ideas on how this could be accomplished with a query? I've spent way too much time trying to figure this out.
select name,school,max(points) from table group by name,school
That will give you the max points per name/school combination. Join it to itself if you want the ID:
select table.* from table inner join
(select name,school,max(points) as points from table group by name,school) a
on a.name = table.name and a.school = table.school and a.points = table.points
edit: sorry, this is a generic SQL solution... I just saw the MS Access tag. The logic is right, but you'll need to convert it to Access syntax.
edit: corrected the second query; I missed a column in my join.
SELECT
(SELECT TOP 1 ID FROM Table
WHERE
Name = t.Name AND
School=t.School AND
Points=t.Points
) as Id, t.Name, t.Points, t.School
FROM
(SELECT Name, School, max(Points) as Points
FROM Table
GROUP BY Name, School) t
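For reference, here is a minimal sketch of the join-to-the-grouped-maximum approach run against the sample data, using an in-memory SQLite database from Python as a stand-in for Access. The table name scores and the aliases s and m are made up for the example, since table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "scores" is a hypothetical name for the question's table.
conn.execute("CREATE TABLE scores (ID INTEGER, Points INTEGER, Name TEXT, School TEXT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", [
    (1, 123, "James", "A"), (2, 534, "Henry", "B"), (3, 56, "Henry", "B"),
    (4, 153, "Chris", "B"), (5, 95, "Chris", "B"), (6, 83, "Chris", "B"),
    (7, 421, "James", "A"),
])

# Find the max points per name/school, then join back to recover the full row.
top_rows = conn.execute("""
    SELECT s.ID, s.Points, s.Name, s.School
    FROM scores AS s
    INNER JOIN (
        SELECT Name, School, MAX(Points) AS Points
        FROM scores
        GROUP BY Name, School
    ) AS m
      ON m.Name = s.Name
     AND m.School = s.School
     AND m.Points = s.Points
    ORDER BY s.ID
""").fetchall()

for row in top_rows:
    print(row)
```

One difference between the two answers: if two rows tie for the highest points within a name/school group, the join returns both of them, while the TOP 1 subquery returns a single ID per group.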