SQL Server - Combine two select queries - sql

I have two tables, Semester1 and Semester2.
Semester1:
StudentId
SubjectId
abc
sub1
def
sub1
ghi
sub1
Semester2:
StudentId
SubjectId
abc
changedSub1
def
sub1
ghi
changedSub2
newStudent1
sub2
newStudent2
sub3
I am trying to write a single Select statement such that it selects rows from Semester2 that have:
New StudentIds - i.e., StudentIds in Semester2 that are not in Semester1. So the result from this requirement should be Semester2's newStudent1 and newStudent2 rows.
AND
Changed SubjectIds - i.e., SubjectId are different for the same StudentId between Semester1 and Semester2. So the result from this requirement should be Semester2's changedSub1 and changedSub2 rows.
I have been able to write two separate queries to select the 2 requirements separately:
-- Part 1
SELECT * FROM Semester2
WHERE StudentId NOT IN ( SELECT StudentId from Semester1 );
-- Part 2
SELECT Semester2.StudentId, Semester2.SubjectId
FROM Semester2
JOIN Semester1
ON (Semester1.StudentId = Semester2.StudentId)
WHERE Semester1.SubjectId <> Semester2.SubjectId;
How can I combine the two queries? Or if there is a better/easier/clearer way to write both requirements as a single query (without combining my above queries), how do I do that?

It looks like a single query with an outer join should suffice
select s2.*
from semester2 s2
left join semester1 s1 on s1.studentId = s2.studentId
where s1.studentId is null or s2.SubjectId != s1.SubjectId;

You could also do it in a single query using a join if UNION doesn't count for "single query":
SELECT s2.*
FROM Semester2 s2
LEFT OUTER JOIN Semester1 s1
ON s2.StudentId = s1.StudentId
AND s2.SubjectId = s1.SubjectId
WHERE s1.StudentId IS NULL;
The WHERE clause will make it so only results where there isn't a perfect match in Semester1 appear.

You might just need to extend your "Part 1" query a little. Right now it excludes all students from semester 1, but you only want to exclude students from semester 1 that do not have changed subjects in semester 2.
Something like this:
SELECT * FROM Semester2
WHERE StudentId NOT IN (-- Student ids with same subject from semester 1
SELECT StudentId FROM Semester1
WHERE Semester1.SubjectId = Semester2.SubjectId);
But I haven't tested it. Please let me know if I made some terrible mistake.

One option that requires no UNION (which requires scanning the table twice) and no OR condition (which can be slow) and no LEFT JOIN (which confuses the optimizer into thinking there will be multiple joined rows)
SELECT s2.*
FROM semester2 s
WHERE NOT EXISTS (SELECT 1
FROM semester1 s1
WHERE s1.studentId = s2.studentId
AND s2.SubjectId = s1.SubjectId
);

Here's the simplest thing I can think of.
select Semester2.*
from Semester2
left outer join Semester1
on Semester1.StudentId = Semester2.StudentId
where NULLIF(Semester2.SubjectId,Semester1.SubjectId) is NOT NULL
NULLIF will return NULL if the two things are equal (same student had the same subject both semesters). Otherwise it returns Semester2.SubjectId. This excludes exactly what you want to exclude - students from Semester1 who didn't have a different subject in Semester2.

Related

How to Limit Results Per Match on a Left Join - SQL Server

I have a table with student info [STU] and a table with parent info [PAR]. I want to return an email address for each student, but just one. So I run this query:
SELECT [STU].[ID], [PAR].[EM]
FROM (SELECT [STU].* FROM DB1.STU)
STU LEFT JOIN (SELECT [PAR].* FROM DB1.PAR) PAR ON [STU].[ID] = [PAR].[ID]
This gives me the below table:
Student ID ParentEmail
1 jim#email.com
1 sarah#email.com
2 paul#email.com
2 tim#email.com
3 bill#email.com
3 frank#email.com
3 joyce#email.com
4 greg#email.com
5 tony#email.com
5 sam#email.com
Each student has multiple parent emails, but I only want one. In other words, I want the output to look like this:
Student ID ParentEmail
1 jim#email.com
2 paul#email.com
3 frank#email.com
4 greg#email.com
5 sam#email.com
I've tried so many things. I've tried using GROUP BY and MIN/MAX and I've tried complex CASE statements, and I've tried COALESCE but I just can't seem to figure it out.
I think OUTER APPLY is the simplest method:
SELECT [STU].[ID], [PAR].[EM]
FROM DB1.STU OUTER APPLY
(SELECT TOP (1) [PAR].*
FROM DB1.PAR
WHERE [STU].[ID] = [PAR].[ID]
) PAR;
Normally, there would be an ORDER BY in the subquery, to give you control over which email you want -- the longest, shortest, oldest, or whatever. Without an ORDER BY it returns just one email, which is what you are asking for.
If you just want one column from the parent table, a simple approach is a correlated subquery:
select
s.id student_id,
(select max(p.em) from db1.par p where p.id = s.id) parent_email
from db1.stu s
This gives you the greatest parent email per student.

Will this left join on same table ever return data?

In SQL Server, on a re-engineering project, I'm walking through some old sprocs, and I've come across this bit. I've hopefully captured the essence in this example:
Example Table
SELECT * FROM People
Id | Name
-------------------------
1 | Bob Slydell
2 | Jim Halpert
3 | Pamela Landy
4 | Bob Wiley
5 | Jim Hawkins
Example Query
SELECT a.*
FROM (
SELECT DISTINCT Id, Name
FROM People
WHERE Id > 3
) a
LEFT JOIN People b
ON a.Name = b.Name
WHERE b.Name IS NULL
Please disregard formatting, style, and query efficiency issues here. This example is merely an attempt to capture the exact essence of the real query I'm working with.
After looking over the real, more complex version of the query, I burned it down to this above, and I cannot for the life of me see how it would ever return any data. The LEFT JOIN should always exclude everything that was just selected because of the b.Name IS NULL check, right? (and it being the same table). If a row from People was found where b.Name IS NULL evals to true, then shouldn't that mean that data found in People a was never found? (impossible?)
Just to be very clear, I'm not looking for a "solution". The code is what it is. I'm merely trying to understand its behavior for the purpose of re-engineering it.
If this code indeed never returns results, then I'll conclude it was written incorrectly and use that knowledge during the re-engineering.
If there is a valid data scenario where it would/could return results, then that will be news to me and I'll have to go back to the books on SQL Joins! #DrivenCrazy
Yes. There are circumstances where this query will retrieve rows.
The query
SELECT a.*
FROM (
SELECT DISTINCT Id, PName
FROM People
WHERE Id > 3
) a
LEFT JOIN People b
ON a.PName = b.PName
WHERE b.PName IS NULL;
is roughly (maybe even exactly) equivalent to...
select distinct Id, PName
from People
where Id > 3 and PName is null;
Why?
Tested it using this code (mysql).
create table People (Id int, PName varchar(50));
insert into People (Id, Pname)
values (1, 'Bob Slydell'),
(2, 'Jim Halpert'),
(3,'Pamela Landy'),
(4,'Bob Wiley'),
(5,'Jim Hawkins');
insert into People (Id, PName) values (6,null);
Now run the query. You get
6, Null
I don't know if your schema allows null Name.
What value can P.Name have such that a.PName = b.PName finds no match and b.PName is Null?
Well it's written right there. b.PName is Null.
Can we prove that there is no other case where a row is returned?
Suppose there is a value for (Id,PName) such that PName is not null and a row is returned.
In order to satisfy the condition...
where b.PName is null
...such a value must include a PName that does not match any PName in the People table.
All a.PName and all b.PName values are drawn from People.PName ...
So a.PName may not match itself.
The only scalar value in SQL that does not equal itself is Null.
Therefore if there are no rows with Null PName this query will not return a row.
That's my proposed casual proof.
This is very confusing code. So #DrivenCrazy is appropriate.
The meaning of the query is exactly "return people with id > 3 and a null as name", i.e. it may return data but only if there are null-values in the name:
SELECT DISTINCT Id, PName
FROM People
WHERE Id > 3 and PName is null
The proof for this is rather simple, if we consider the meaning of the left join condition ... LEFT JOIN People b ON a.PName = b.PName together with the (overall) condition where p.pname is null:
Generally, a condition where PName = PName is true if and only if PName is not null, and it has exactly the same meaning as where PName is not null. Hence, the left join will match only tuples where pname is not null, but any matching row will subsequently be filtered out by the overall condition where pname is null.
Hence, the left join cannot introduce any new rows in the query, and it cannot reduce the set of rows of the left hand side (as a left join never does). So the left join is superfluous, and the only effective condition is where PName is null.
LEFT JOIN ON returns the rows that INNER JOIN ON returns plus unmatched rows of the left table extended by NULL for the right table columns. If the ON condition does not allow a matched row to have NULL in some column (like b.NAME here being equal to something) then the only NULLs in that column in the result are from unmatched left hand rows. So keeping rows with NULL for that column as the result gives exactly the rows unmatched by the INNER JOIN ON. (This is an idiom. In some cases it can also be expressed via NOT IN or EXCEPT.)
In your case the left table has distinct People rows with a.Id > 3 and the right table has all People rows. So the only a rows unmatched in a.Name = b.Name are those where a.Name IS NULL. So the WHERE returns those rows extended by NULLs.
SELECT * FROM
(SELECT DISTINCT * FROM People WHERE Id > 3 AND Name IS NULL) a
LEFT JOIN People b ON 1=0;
But then you SELECT a.*. So the entire query is just
SELECT DISTINCT * FROM People WHERE Id > 3 AND Name IS NULL;
sure.left join will return data even if the join is done on the same table.
according to your query
"SELECT a.*
FROM (
SELECT DISTINCT Id, Name
FROM People
WHERE Id > 3
) a
LEFT JOIN People b
ON a.Name = b.Name
WHERE b.Name IS NULL"
it returns null because of the final filtering "b.Name IS NULL".without that filtering it will return 2 records with id > 3

Taking Unique Records From Two Tables

i am stuck at a silly problem. And it has to be one of the most cliche solution.
Table student_selection:
Columns
=======
student_id
subject_id
faculty_id
Another table: sub_group
Columns
=======
subject_id
sub_group
They are Joint on subject_id.
I want to find those subject_id from table sub_group where, the subject_id is not present in the student_selection table.
Eg:
sub_group(subject_id)
2
3
4
student_selection(subject_id)
2
3
2
3
2
3
Output
4
You can run a simple not in query to get the data.
Like:
SELECT
subject_id
FROM
sub_group
WHERE
subject_id not in (
SELECT
DISTINCT subject_id
FROM
student_selection
)
Standard SQL IN clause:
select subject_id from sub_group
where subject_id not in (select subject_id from student_selection);
Standard SQL EXISTS clause:
select subject_id from sub_group sg
where not exists
(select * from student_selection ss where ss.subject_id = sg.subject_id);
Set based query. Some dbms support set based operations. They use the word EXCEPT or MINUS to subtract one set from another. (MySQL doesn't support this.)
select subject_id from sub_group
except
select subject_id from student_selection;
At last you can use a trick where you outer join the second table and only stay with those results where there was a record outer-joined (i.e. there is no macthing record in the second table). This is also standard SQL. It is less readable for the unexperienced reader but happens to be faster on some dbms.
select sg.subject_id
from sub_group sg
left join student_selection ss on ss.subject_id = sg.subject_id
where ss.subject_id is null;

How to fetch the non matching rows in Oracle

Can anyone help me fetch the non matching rows from two tables in Oracle?
Table: Names
Class_id Stud_name
S001 JAMES
S001 PETER
S002 MARK
Table: Course
Course_id Stud_name
S001 JAMES
S001 KEITH
S002 MARK
Output
I need the rows to display as
CLASS ID STUD_NAME_FROM_NAME_TABLE STUD_NAME_FROM_COURSE_TABLE
---------------------------------------------------------------------
S001 PETER KEITH
I have used Oracle joins to fetch the non matching names:
SELECT *
FROM Names, Course
WHERE Names.Class_id=Course.Course_id
AND Names.Stud_name<>Course.Stud_name
This query is returning duplicate rows.
If you insist on Join you can use this one:
SELECT *
FROM Names
FULL OUTER JOIN Course ON Names.Class_id=Course.Course_id
AND Names.Stud_name = Course.Stud_name
WHERE Names.Stud_name IS NULL or Course.Stud_name IS NULL
Fetches unmatched rows in Names table
SELECT * FROM Names
WHERE
NOT EXISTS
(SELECT 'x' from Course
WHERE
Names.Class_id = Course.Course_id AND
Names.Stud_name = Course.Stud_name)
Fetches unmatched rows in Names and Course too!
SELECT Names.Class_id,Names.Stud_name,C1.Stud_name
FROM Names , Course C1
WHERE Names.Class_id = C1.Course_id AND
NOT EXISTS
(SELECT 'x' from Course C2
WHERE
Names.Class_id = C2.Course_id AND
Names.Stud_name = C2.Stud_name);
When you ask for unmatching rows I assume that you want rows that exist in names but not in course.
If this is the case you're probably after
select * from names
where (class_id, stud_name ) not in
(select course_id, stud_name from course);
Your query returned duplicate rows beacuse for each row in names it selected all rows in course that satisfied the where condition.
So, for the row S001, PETER in names it faound that S001, JAMES and S001, KEITH matched that condition, thus, that row was "returned" twice.
EDIT Since it is not clear if stud_name is a primary key, or unique (and on second sight I think it's not), you'd probably want a
select * from names
where not exists (
select 1 from course where
names.class_id = course.course_id and
names.stud_name <> course.stud_name
)
Edit II if you insist on using a join (as per your comment) you might want to try a
select distinct names.* from...
Hope it helps you
with not_in_class as
(select a.*
from Names a
where not exists ( select 'x'
from course b
where b.Course_id = a.class_id
and a.Stud_name = b.Stud_name)),
not_in_course as
(select b.*
from course b
where not exists ( select 'x'
from Names a
where b.Course_id = a.class_id
and a.Stud_name = b.Stud_name))
select x.class_id,
x.Stud_name NOT_IN_CLASS,
y.stud_name NOT_IN_COURSE
from not_in_class x, not_in_course y
where x.class_id = y.course_id
Output
| CLASS_ID | NOT_IN_CLASS | NOT_IN_COURSE |
|----------|--------------|---------------|
| S001 | PETER | KEITH |
Only problem is that if multiple mismatches are there in both the tables for a given id, it works for single mismatch for a particular id. You need to rework if multiple mismatches are there for the same id.
Well, I am not sure if I understand correctly what you are asking. I think you want a list of all IDs where the student list in class table and course table differs. Then you want to show the id and the students that are in class but not in course and the students that are in course but not in class.
To do so you would full outer join the tables. That gives you students that are both in class and course, students that are in class and not in course, and students that are in course and not in class. Filter your results where either class_id or course_id is null then to get the students missing in course or class. At last group by id and list the students.
select coalesce(class.class_id, course.course_id) as id
, listagg(class.stud_name, ',') within group (order by class.stud_name) as missing_in_course
, listagg(course.stud_name, ',') within group (order by course.stud_name) as missing_in_class
from class
full outer join course
on (class.class_id = course.course_id and class.stud_name = course.stud_name)
where class.class_id is null or course.course_id is null
group by coalesce(class.class_id, course.course_id);
Here is the SQL fiddle showing how it works: http://sqlfiddle.com/#!4/8aaaa/2
EDIT: In Oracle 9i there is no listagg. You can use the inofficial function wm_concat instead:
select coalesce(class.class_id, course.course_id) as id
, wm_concat(class.stud_name) as missing_in_course
, wm_concat(course.stud_name) as missing_in_class
from class
full outer join course
on (class.class_id = course.course_id and class.stud_name = course.stud_name)
where class.class_id is null or course.course_id is null
group by coalesce(class.class_id, course.course_id);

Get distinct result

I was asked this trick question:
Table: Student
ID NAME
1 JOHN
2 MARY
3 ROBERT
4 DENNIS
Table: Grade
ID GRADE
1 A
1 A
1 F
2 B
3 A
How do you write SQL query to return DISTINCT name of all students who has never received grade 'F' OR who has never taken a course (meaning, their ID not present in Grade table)?
Trick part is, you're not allowed to use OUTER JOIN, UNION or DISTINCT. Also, why this is a big deal?
Expected result is MARY, ROBERT, DENNIS (3 rows).
SELECT name FROM Student
WHERE
NOT EXISTS (SELECT * FROM Grade WHERE Grade.id = Student.id AND grade = 'F')
OR
NOT EXISTS (SELECT * FROM Grade WHERE Grade.id = Student.id);
You may use GROUP BY in order to fake a distinct.
SELECT name FROM student
WHERE (SELECT COUNT(*) FROM grade WHERE grade = 'F'
AND id = student.id) = 0
at least this is the shortest answer so far ...
Something like this could work, if you're allowed to use subqueries.
SELECT `NAME`
FROM Student
WHERE 'F' NOT IN
(SELECT GRADE FROM Grade WHERE ID = Student.ID)
Hmm, my homework sense is tingling... Not that my questions have never related to homework though...
You could use the GROUP BY and aggregate functions in order to fake a distinct.
You want to exclude everyone who has both taken a course and received a grade of 'F'. Something like this might work:
SELECT NAME
FROM Student
WHERE 0 = (SELECT COUNT(*)
FROM Student
LEFT JOIN Grade
USING (ID)
WHERE GRADE='F')
GROUP BY NAME
SELECT name
FROM grade G, student S
WHERE (S.id = G.id AND 'F' NOT IN (SELECT G1.grade
FROM grade G1
WHERE G1.id = G.id))
OR
S.id NOT IN (SELECT id
FROM grade)
GROUP BY name
A reason why they might not want you to use UNION, JOIN or DISTINCT is that some of those queries might be slow if you try to force an "optimized" solution.
I'm not too familiar with query optimization techniques but usually if you use some of those aggregators and JOINs, you might slow down your query rather than just letting the query optimizations run through and organize your SQL based on your table structure and contents.