Taking Unique Records From Two Tables - sql

I am stuck on a silly problem, and it must have one of the most clichéd solutions.
Table student_selection:
Columns
=======
student_id
subject_id
faculty_id
Another table: sub_group
Columns
=======
subject_id
sub_group
They are joined on subject_id.
I want to find those subject_id values in table sub_group that are not present in the student_selection table.
Eg:
sub_group(subject_id)
2
3
4
student_selection(subject_id)
2
3
2
3
2
3
Output
4

You can run a simple NOT IN query to get the data, like this:
SELECT
subject_id
FROM
sub_group
WHERE
subject_id not in (
SELECT
DISTINCT subject_id
FROM
student_selection
)
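To check the query above against the example data from the question, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for whatever DBMS the asker uses; only the subject_id values come from the question. It also shows a caveat worth knowing: a single NULL in student_selection.subject_id makes NOT IN return no rows at all.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the asker's DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sub_group (subject_id INTEGER, sub_group TEXT);
    CREATE TABLE student_selection (
        student_id INTEGER, subject_id INTEGER, faculty_id INTEGER);
    INSERT INTO sub_group (subject_id) VALUES (2), (3), (4);
    INSERT INTO student_selection (subject_id)
        VALUES (2), (3), (2), (3), (2), (3);
""")

NOT_IN_QUERY = """
    SELECT subject_id
    FROM sub_group
    WHERE subject_id NOT IN (
        SELECT DISTINCT subject_id FROM student_selection
    )
"""

# With the example data, only subject 4 is never selected.
unused = [row[0] for row in conn.execute(NOT_IN_QUERY)]
print(unused)  # [4]

# Caveat: one NULL in the subquery result makes every NOT IN test
# evaluate to NULL or false, so no row passes the filter.
conn.execute("INSERT INTO student_selection (subject_id) VALUES (NULL)")
with_null = [row[0] for row in conn.execute(NOT_IN_QUERY)]
print(with_null)  # []
```

If subject_id can be NULL in student_selection, adding WHERE subject_id IS NOT NULL to the subquery avoids the surprise.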

Standard SQL IN clause:
select subject_id from sub_group
where subject_id not in (select subject_id from student_selection);
Standard SQL EXISTS clause:
select subject_id from sub_group sg
where not exists
(select * from student_selection ss where ss.subject_id = sg.subject_id);
Set-based query. Some DBMSs support set operations, using the keyword EXCEPT or MINUS to subtract one set from another. (MySQL did not support this before version 8.0.31.)
select subject_id from sub_group
except
select subject_id from student_selection;
Finally, you can use a trick: outer join the second table and keep only those results where no record was actually joined (i.e. there is no matching record in the second table). This is also standard SQL. It is less readable for the inexperienced reader but happens to be faster on some DBMSs.
select sg.subject_id
from sub_group sg
left join student_selection ss on ss.subject_id = sg.subject_id
where ss.subject_id is null;
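The four formulations above can be compared side by side. This is only a sketch, assuming SQLite (which happens to support EXCEPT) as the test engine and the example rows from the question; the dictionary keys are names made up for the illustration.

```python
import sqlite3

# SQLite (in memory) stands in for the real DBMS; the rows follow the
# example in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sub_group (subject_id INTEGER);
    CREATE TABLE student_selection (subject_id INTEGER);
    INSERT INTO sub_group VALUES (2), (3), (4);
    INSERT INTO student_selection VALUES (2), (3), (2), (3), (2), (3);
""")

QUERIES = {
    "not_in": """
        SELECT subject_id FROM sub_group
        WHERE subject_id NOT IN (SELECT subject_id FROM student_selection)""",
    "not_exists": """
        SELECT subject_id FROM sub_group sg
        WHERE NOT EXISTS (SELECT * FROM student_selection ss
                          WHERE ss.subject_id = sg.subject_id)""",
    "except": """
        SELECT subject_id FROM sub_group
        EXCEPT
        SELECT subject_id FROM student_selection""",
    "left_join": """
        SELECT sg.subject_id
        FROM sub_group sg
        LEFT JOIN student_selection ss ON ss.subject_id = sg.subject_id
        WHERE ss.subject_id IS NULL""",
}

# Run every variant and collect its result as a sorted list of ids.
results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

On this data all four return the single value 4. One difference to keep in mind: EXCEPT removes duplicates from its result, while the other three would return a subject_id twice if sub_group listed it twice.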

Related

SQL Server - Combine two select queries

I have two tables, Semester1 and Semester2.
Semester1:
StudentId      SubjectId
abc            sub1
def            sub1
ghi            sub1
Semester2:
StudentId      SubjectId
abc            changedSub1
def            sub1
ghi            changedSub2
newStudent1    sub2
newStudent2    sub3
I am trying to write a single Select statement such that it selects rows from Semester2 that have:
New StudentIds - i.e., StudentIds in Semester2 that are not in Semester1. So the result from this requirement should be Semester2's newStudent1 and newStudent2 rows.
AND
Changed SubjectIds - i.e., SubjectId are different for the same StudentId between Semester1 and Semester2. So the result from this requirement should be Semester2's changedSub1 and changedSub2 rows.
I have been able to write two separate queries to select the 2 requirements separately:
-- Part 1
SELECT * FROM Semester2
WHERE StudentId NOT IN ( SELECT StudentId from Semester1 );
-- Part 2
SELECT Semester2.StudentId, Semester2.SubjectId
FROM Semester2
JOIN Semester1
ON (Semester1.StudentId = Semester2.StudentId)
WHERE Semester1.SubjectId <> Semester2.SubjectId;
How can I combine the two queries? Or if there is a better/easier/clearer way to write both requirements as a single query (without combining my above queries), how do I do that?
It looks like a single query with an outer join should suffice
select s2.*
from semester2 s2
left join semester1 s1 on s1.studentId = s2.studentId
where s1.studentId is null or s2.SubjectId != s1.SubjectId;
You could also do it in a single query using a join if UNION doesn't count for "single query":
SELECT s2.*
FROM Semester2 s2
LEFT OUTER JOIN Semester1 s1
ON s2.StudentId = s1.StudentId
AND s2.SubjectId = s1.SubjectId
WHERE s1.StudentId IS NULL;
The WHERE clause will make it so only results where there isn't a perfect match in Semester1 appear.
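Both outer-join variants can be tried on the sample rows from the question. A minimal sketch, assuming SQLite as the engine instead of SQL Server (the two queries use nothing SQL Server specific):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Semester1 (StudentId TEXT, SubjectId TEXT);
    CREATE TABLE Semester2 (StudentId TEXT, SubjectId TEXT);
    INSERT INTO Semester1 VALUES
        ('abc', 'sub1'), ('def', 'sub1'), ('ghi', 'sub1');
    INSERT INTO Semester2 VALUES
        ('abc', 'changedSub1'), ('def', 'sub1'), ('ghi', 'changedSub2'),
        ('newStudent1', 'sub2'), ('newStudent2', 'sub3');
""")

# Variant 1: join on StudentId only, then keep new students
# (no joined row) or students whose subject differs.
OR_FILTER = """
    SELECT s2.*
    FROM Semester2 s2
    LEFT JOIN Semester1 s1 ON s1.StudentId = s2.StudentId
    WHERE s1.StudentId IS NULL OR s2.SubjectId != s1.SubjectId
"""

# Variant 2: join on both columns and keep rows without an exact match.
ANTI_JOIN = """
    SELECT s2.*
    FROM Semester2 s2
    LEFT OUTER JOIN Semester1 s1
        ON s2.StudentId = s1.StudentId
       AND s2.SubjectId = s1.SubjectId
    WHERE s1.StudentId IS NULL
"""

or_rows = sorted(conn.execute(OR_FILTER).fetchall())
anti_rows = sorted(conn.execute(ANTI_JOIN).fetchall())
print(or_rows)
print(anti_rows)
```

On the sample data both return the four expected rows: the two changed subjects and the two new students.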
You might just need to extend your "Part 1" query a little. Right now it excludes all students from semester 1, but you only want to exclude students from semester 1 that do not have changed subjects in semester 2.
Something like this:
SELECT * FROM Semester2
WHERE StudentId NOT IN (-- Student ids with same subject from semester 1
SELECT StudentId FROM Semester1
WHERE Semester1.SubjectId = Semester2.SubjectId);
But I haven't tested it. Please let me know if I made some terrible mistake.
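Since the query above is described as untested, here is a quick check. It is a sketch that assumes SQLite in place of SQL Server; the row for newStudent3 is a hypothetical addition (not in the question) that covers a new student taking a subject that already existed in semester 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Semester1 (StudentId TEXT, SubjectId TEXT);
    CREATE TABLE Semester2 (StudentId TEXT, SubjectId TEXT);
    INSERT INTO Semester1 VALUES
        ('abc', 'sub1'), ('def', 'sub1'), ('ghi', 'sub1');
    INSERT INTO Semester2 VALUES
        ('abc', 'changedSub1'), ('def', 'sub1'), ('ghi', 'changedSub2'),
        ('newStudent1', 'sub2'), ('newStudent2', 'sub3'),
        ('newStudent3', 'sub1');  -- hypothetical extra row
""")

# The subquery is correlated: for each Semester2 row it lists the
# semester 1 students who had that same subject.
CORRELATED_NOT_IN = """
    SELECT * FROM Semester2
    WHERE StudentId NOT IN (
        SELECT StudentId FROM Semester1
        WHERE Semester1.SubjectId = Semester2.SubjectId)
"""

rows = sorted(conn.execute(CORRELATED_NOT_IN).fetchall())
print(rows)
```

Only the unchanged row ('def', 'sub1') is filtered out, so the query behaves as intended on this data.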
One option that requires no UNION (which requires scanning the table twice) and no OR condition (which can be slow) and no LEFT JOIN (which confuses the optimizer into thinking there will be multiple joined rows)
SELECT s2.*
FROM semester2 s2
WHERE NOT EXISTS (SELECT 1
FROM semester1 s1
WHERE s1.studentId = s2.studentId
AND s2.SubjectId = s1.SubjectId
);
Here's the simplest thing I can think of.
select Semester2.*
from Semester2
left outer join Semester1
on Semester1.StudentId = Semester2.StudentId
where NULLIF(Semester2.SubjectId,Semester1.SubjectId) is NOT NULL
NULLIF will return NULL if the two things are equal (same student had the same subject both semesters). Otherwise it returns Semester2.SubjectId. This excludes exactly what you want to exclude - students from Semester1 who didn't have a different subject in Semester2.
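The three cases that NULLIF has to handle here (same subject, changed subject, no semester 1 row at all) can be looked at in isolation. A small sketch, assuming SQLite as the engine; NULLIF behaves the same way there for these inputs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Semester1 (StudentId TEXT, SubjectId TEXT);
    CREATE TABLE Semester2 (StudentId TEXT, SubjectId TEXT);
    INSERT INTO Semester1 VALUES
        ('abc', 'sub1'), ('def', 'sub1'), ('ghi', 'sub1');
    INSERT INTO Semester2 VALUES
        ('abc', 'changedSub1'), ('def', 'sub1'), ('ghi', 'changedSub2'),
        ('newStudent1', 'sub2'), ('newStudent2', 'sub3');
""")

# Same subject -> NULL; changed subject -> first argument;
# no semester 1 row (second argument NULL) -> first argument.
nullif_cases = conn.execute("""
    SELECT NULLIF('sub1', 'sub1'),
           NULLIF('changedSub1', 'sub1'),
           NULLIF('sub2', NULL)
""").fetchone()
print(nullif_cases)  # (None, 'changedSub1', 'sub2')

NULLIF_QUERY = """
    SELECT Semester2.*
    FROM Semester2
    LEFT OUTER JOIN Semester1
        ON Semester1.StudentId = Semester2.StudentId
    WHERE NULLIF(Semester2.SubjectId, Semester1.SubjectId) IS NOT NULL
"""
rows = sorted(conn.execute(NULLIF_QUERY).fetchall())
print(rows)
```

The third case is what lets new students through: comparing against the NULL from the outer join is not an equality, so NULLIF returns the semester 2 subject.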

Select columns based on count of many-to-many association

I have a Postgres database with 3 tables that looks a little something like this:
table categories
id
type
table games
id
table game_category
id
game_id
category_id
I want to select all games which have more than x categories where type is something
I have gotten this far:
SELECT * FROM games WHERE id IN (
SELECT game_id FROM game_category GROUP BY game_id HAVING COUNT(*) >= 5
)
This works to select all games with 5 or more categories, but doesn't narrow down the categories by their type. How could I expand on this to add the additional check for the type?
You have to join your categories table with the subquery. Then you can add a WHERE clause for the type. Replace '?' with your actual type, of course.
SELECT * FROM games WHERE id IN (
SELECT game_id FROM game_category
INNER JOIN categories ON (categories.id=game_category.category_id)
WHERE categories.type='?'
GROUP BY game_id HAVING COUNT(*) >= 5
)
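To see the effect of the added type filter, here is a sketch with made-up data. It assumes SQLite instead of Postgres, and the category types 'genre' and 'platform' as well as the helper name games_with_min_categories are invented for the example. The type and the threshold are passed as bound parameters rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE games (id INTEGER PRIMARY KEY);
    CREATE TABLE game_category (
        id INTEGER PRIMARY KEY, game_id INTEGER, category_id INTEGER);
    INSERT INTO categories VALUES
        (1, 'genre'), (2, 'genre'), (3, 'genre'), (4, 'genre'), (5, 'genre'),
        (6, 'platform'), (7, 'platform');
    INSERT INTO games VALUES (1), (2), (3);
    -- game 1: five 'genre' categories
    -- game 2: five categories, only three of them 'genre'
    -- game 3: a single category
    INSERT INTO game_category (game_id, category_id) VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
        (2, 1), (2, 2), (2, 3), (2, 6), (2, 7),
        (3, 1);
""")


def games_with_min_categories(conn, category_type, min_count):
    """Return ids of games having at least min_count categories of a type."""
    sql = """
        SELECT * FROM games WHERE id IN (
            SELECT game_id FROM game_category
            INNER JOIN categories
                ON categories.id = game_category.category_id
            WHERE categories.type = ?
            GROUP BY game_id HAVING COUNT(*) >= ?
        )
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (category_type, min_count))]


print(games_with_min_categories(conn, "genre", 5))  # [1]
print(games_with_min_categories(conn, "genre", 3))  # [1, 2]
```

Game 2 has five categories in total, so the original unfiltered query would return it; with the type filter it only qualifies once the threshold drops to three.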
If query response time is a concern, you can avoid the IN clause. Mitchel's answer would also work if written as follows (note that this returns only game_id, not the full games rows):
SELECT game_id
FROM game_category gc
inner join categories c on c.id = gc.category_id
WHERE type = 'X'
GROUP BY game_id
HAVING COUNT(game_id) >= 5
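Notice that I used COUNT(game_id) instead of COUNT(*). This is sometimes suggested as a query optimization, but whether it helps depends on the DBMS; in PostgreSQL, COUNT(*) is generally no slower than COUNT(column).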
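As for COUNT(*) versus COUNT(game_id): the two only give different numbers when the counted column contains NULLs, which game_id in a link table normally does not. A small sketch to illustrate, assuming SQLite as the engine and made-up rows (the type 'X' is taken from the query above, everything else is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE game_category (
        id INTEGER PRIMARY KEY, game_id INTEGER, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'X'), (2, 'X'), (3, 'Y');
    -- game 10 has two 'X' categories, game 20 only one
    INSERT INTO game_category (game_id, category_id) VALUES
        (10, 1), (10, 2), (20, 1), (20, 3);
""")

TEMPLATE = """
    SELECT game_id
    FROM game_category gc
    INNER JOIN categories c ON c.id = gc.category_id
    WHERE c.type = 'X'
    GROUP BY game_id
    HAVING {count_expr} >= 2
"""

# Same query, once with COUNT(*) and once with COUNT(game_id).
with_star = [r[0] for r in conn.execute(TEMPLATE.format(count_expr="COUNT(*)"))]
with_col = [r[0] for r in conn.execute(TEMPLATE.format(count_expr="COUNT(game_id)"))]
print(with_star, with_col)  # [10] [10]

# COUNT(x) skips NULLs, COUNT(*) counts every row.
null_demo = conn.execute("""
    SELECT COUNT(*), COUNT(x)
    FROM (SELECT 1 AS x UNION ALL SELECT NULL UNION ALL SELECT 3)
""").fetchone()
print(null_demo)  # (3, 2)
```

So the choice between the two is about NULL handling first; any speed difference is engine specific and worth measuring rather than assuming.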

How to use INTERSECT together with COUNT in SQLite?

I have a table called customer_transactions and a table called blacklist.
The customer_transactions table has a column called atm_name.
Both tables share a unique key called id.
How can I intersect the two tables in such a way that the query shows me:
1. customers that appear in both tables, and
2. a corresponding column that displays the number of times they used a certain ATM, alongside the ATM's name (for instance: id_1 -- bank of america -- 2; id_1 -- citibank -- 3; id_2 -- bank of america -- 1; id_2 -- citibank -- 4, etcetera).
I have something like this
SELECT id,
atm_name,
count(atm_name) as atm_count
FROM customer_transactions
GROUP BY id, atm_name
How can I INTERSECT this table with the blacklist table and maintain what I currently have as output?
Thanks in advance.
You seem to want a join. Assuming that column id relates the two tables, and that it is a unique key in blacklist, you can do:
select ct.id, ct.atm_name, count(*) as atm_count
from customer_transactions ct
inner join blacklist b on b.id = ct.id
group by ct.id, ct.atm_name
You can also express this logic with exists and a correlated subquery:
select ct.id, ct.atm_name, count(*) as atm_count
from customer_transactions ct
where exists (select 1 from blacklist b where b.id = ct.id)
group by ct.id, ct.atm_name
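The two queries above give the same result as long as id really is unique in blacklist. The sketch below uses made-up rows that reproduce the counts from the question (plus a hypothetical customer id_3 who is not blacklisted) and then shows what happens when that assumption is broken. An ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_transactions (id TEXT, atm_name TEXT);
    CREATE TABLE blacklist (id TEXT);
    INSERT INTO customer_transactions VALUES
        ('id_1', 'bank of america'), ('id_1', 'bank of america'),
        ('id_1', 'citibank'), ('id_1', 'citibank'), ('id_1', 'citibank'),
        ('id_2', 'bank of america'),
        ('id_2', 'citibank'), ('id_2', 'citibank'),
        ('id_2', 'citibank'), ('id_2', 'citibank'),
        ('id_3', 'citibank');  -- hypothetical customer, not blacklisted
    INSERT INTO blacklist VALUES ('id_1'), ('id_2');
""")

JOIN_QUERY = """
    SELECT ct.id, ct.atm_name, COUNT(*) AS atm_count
    FROM customer_transactions ct
    INNER JOIN blacklist b ON b.id = ct.id
    GROUP BY ct.id, ct.atm_name
    ORDER BY ct.id, ct.atm_name
"""
EXISTS_QUERY = """
    SELECT ct.id, ct.atm_name, COUNT(*) AS atm_count
    FROM customer_transactions ct
    WHERE EXISTS (SELECT 1 FROM blacklist b WHERE b.id = ct.id)
    GROUP BY ct.id, ct.atm_name
    ORDER BY ct.id, ct.atm_name
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
exists_rows = conn.execute(EXISTS_QUERY).fetchall()
print(join_rows)

# Break the uniqueness assumption: id_1 is blacklisted twice.
conn.execute("INSERT INTO blacklist VALUES ('id_1')")
join_dup = conn.execute(JOIN_QUERY).fetchall()
exists_dup = conn.execute(EXISTS_QUERY).fetchall()
print(join_dup[0], exists_dup[0])
```

With the duplicate blacklist row, the join doubles the counts for id_1 while the EXISTS version stays unchanged, so EXISTS is the safer choice when uniqueness is not guaranteed.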

How to Limit Results Per Match on a Left Join - SQL Server

I have a table with student info [STU] and a table with parent info [PAR]. I want to return an email address for each student, but just one. So I run this query:
SELECT [STU].[ID], [PAR].[EM]
FROM (SELECT [STU].* FROM DB1.STU)
STU LEFT JOIN (SELECT [PAR].* FROM DB1.PAR) PAR ON [STU].[ID] = [PAR].[ID]
This gives me the below table:
Student ID ParentEmail
1 jim#email.com
1 sarah#email.com
2 paul#email.com
2 tim#email.com
3 bill#email.com
3 frank#email.com
3 joyce#email.com
4 greg#email.com
5 tony#email.com
5 sam#email.com
Each student has multiple parent emails, but I only want one. In other words, I want the output to look like this:
Student ID ParentEmail
1 jim#email.com
2 paul#email.com
3 frank#email.com
4 greg#email.com
5 sam#email.com
I've tried so many things. I've tried using GROUP BY and MIN/MAX and I've tried complex CASE statements, and I've tried COALESCE but I just can't seem to figure it out.
I think OUTER APPLY is the simplest method:
SELECT [STU].[ID], [PAR].[EM]
FROM DB1.STU OUTER APPLY
(SELECT TOP (1) [PAR].*
FROM DB1.PAR
WHERE [STU].[ID] = [PAR].[ID]
) PAR;
Normally, there would be an ORDER BY in the subquery, to give you control over which email you want -- the longest, shortest, oldest, or whatever. Without an ORDER BY it returns just one email, which is what you are asking for.
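To illustrate what the ORDER BY buys you, here is a sketch that picks the shortest email per student. Note the swap: SQLite has neither OUTER APPLY nor TOP, so this uses a correlated subquery with ORDER BY ... LIMIT 1 as the closest equivalent; it is not the SQL Server syntax from the answer. The table contents are the rows shown in the question, and "shortest, then alphabetical" is an arbitrary rule chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STU (ID INTEGER);
    CREATE TABLE PAR (ID INTEGER, EM TEXT);
    INSERT INTO STU VALUES (1), (2), (3), (4), (5);
    INSERT INTO PAR VALUES
        (1, 'jim#email.com'), (1, 'sarah#email.com'),
        (2, 'paul#email.com'), (2, 'tim#email.com'),
        (3, 'bill#email.com'), (3, 'frank#email.com'), (3, 'joyce#email.com'),
        (4, 'greg#email.com'),
        (5, 'tony#email.com'), (5, 'sam#email.com');
""")

# One email per student; the ORDER BY decides which one wins.
ONE_EMAIL = """
    SELECT s.ID,
           (SELECT p.EM
            FROM PAR p
            WHERE p.ID = s.ID
            ORDER BY length(p.EM), p.EM
            LIMIT 1) AS EM
    FROM STU s
    ORDER BY s.ID
"""

one_per_student = conn.execute(ONE_EMAIL).fetchall()
for row in one_per_student:
    print(row)
```

In SQL Server the same rule would go into the OUTER APPLY subquery as ORDER BY LEN([PAR].[EM]), [PAR].[EM].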
If you just want one column from the parent table, a simple approach is a correlated subquery:
select
s.id student_id,
(select max(p.em) from db1.par p where p.id = s.id) parent_email
from db1.stu s
This gives you the greatest parent email per student.
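Here is what "greatest" means on the sample rows, as a sketch that assumes SQLite as the engine. Student 6 is a hypothetical student without any parent row, added to show that the correlated subquery keeps such students and returns NULL for them, just like the LEFT JOIN in the question would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stu (id INTEGER);
    CREATE TABLE par (id INTEGER, em TEXT);
    INSERT INTO stu VALUES (1), (2), (3), (4), (5), (6);  -- 6 is hypothetical
    INSERT INTO par VALUES
        (1, 'jim#email.com'), (1, 'sarah#email.com'),
        (2, 'paul#email.com'), (2, 'tim#email.com'),
        (3, 'bill#email.com'), (3, 'frank#email.com'), (3, 'joyce#email.com'),
        (4, 'greg#email.com'),
        (5, 'tony#email.com'), (5, 'sam#email.com');
""")

# MAX on text picks the last value in the column's sort order.
MAX_EMAIL = """
    SELECT s.id AS student_id,
           (SELECT MAX(p.em) FROM par p WHERE p.id = s.id) AS parent_email
    FROM stu s
    ORDER BY s.id
"""

max_per_student = conn.execute(MAX_EMAIL).fetchall()
for row in max_per_student:
    print(row)
```

The exact winner can depend on the column's collation in other engines; with SQLite's default binary comparison, student 1 gets the address starting with "sarah" and student 5 the one starting with "tony".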

How to fetch the non matching rows in Oracle

Can anyone help me fetch the non matching rows from two tables in Oracle?
Table: Names
Class_id Stud_name
S001 JAMES
S001 PETER
S002 MARK
Table: Course
Course_id Stud_name
S001 JAMES
S001 KEITH
S002 MARK
Output
I need the rows to display as
CLASS ID STUD_NAME_FROM_NAME_TABLE STUD_NAME_FROM_COURSE_TABLE
---------------------------------------------------------------------
S001 PETER KEITH
I have used Oracle joins to fetch the non matching names:
SELECT *
FROM Names, Course
WHERE Names.Class_id=Course.Course_id
AND Names.Stud_name<>Course.Stud_name
This query is returning duplicate rows.
If you insist on Join you can use this one:
SELECT *
FROM Names
FULL OUTER JOIN Course ON Names.Class_id=Course.Course_id
AND Names.Stud_name = Course.Stud_name
WHERE Names.Stud_name IS NULL or Course.Stud_name IS NULL
Fetches unmatched rows in Names table
SELECT * FROM Names
WHERE
NOT EXISTS
(SELECT 'x' from Course
WHERE
Names.Class_id = Course.Course_id AND
Names.Stud_name = Course.Stud_name)
Fetches unmatched rows in Names and Course too!
SELECT Names.Class_id,Names.Stud_name,C1.Stud_name
FROM Names , Course C1
WHERE Names.Class_id = C1.Course_id AND
NOT EXISTS
(SELECT 'x' from Course C2
WHERE
Names.Class_id = C2.Course_id AND
Names.Stud_name = C2.Stud_name);
When you ask for unmatching rows I assume that you want rows that exist in names but not in course.
If this is the case you're probably after
select * from names
where (class_id, stud_name ) not in
(select course_id, stud_name from course);
Your query returned duplicate rows because for each row in names it selected all rows in course that satisfied the where condition.
So, for the row S001, PETER in names it found that S001, JAMES and S001, KEITH matched that condition; thus, that row was "returned" twice.
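The duplication can be reproduced on the sample data. A sketch assuming SQLite rather than Oracle; because multi-column NOT IN is not available in every SQLite version, the second query uses NOT EXISTS, which is equivalent for these rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Names (Class_id TEXT, Stud_name TEXT);
    CREATE TABLE Course (Course_id TEXT, Stud_name TEXT);
    INSERT INTO Names VALUES
        ('S001', 'JAMES'), ('S001', 'PETER'), ('S002', 'MARK');
    INSERT INTO Course VALUES
        ('S001', 'JAMES'), ('S001', 'KEITH'), ('S002', 'MARK');
""")

# The query from the question: every Names row is paired with every
# Course row of the same class that has a different name.
ORIGINAL = """
    SELECT Names.Class_id, Names.Stud_name, Course.Stud_name
    FROM Names, Course
    WHERE Names.Class_id = Course.Course_id
      AND Names.Stud_name <> Course.Stud_name
"""
pairs = sorted(conn.execute(ORIGINAL).fetchall())
for pair in pairs:
    print(pair)

# Rows in Names that have no exact counterpart in Course.
UNMATCHED = """
    SELECT * FROM Names
    WHERE NOT EXISTS (
        SELECT 1 FROM Course
        WHERE Course.Course_id = Names.Class_id
          AND Course.Stud_name = Names.Stud_name)
"""
unmatched = conn.execute(UNMATCHED).fetchall()
print(unmatched)  # [('S001', 'PETER')]
```

The first query returns three rows: PETER twice (paired with JAMES and with KEITH) and also JAMES paired with KEITH, even though JAMES does have a match.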
EDIT Since it is not clear if stud_name is a primary key, or unique (and on second sight I think it's not), you'd probably want a
select * from names
where not exists (
select 1 from course where
names.class_id = course.course_id and
names.stud_name = course.stud_name
)
Edit II if you insist on using a join (as per your comment) you might want to try a
select distinct names.* from...
Hope it helps you
with not_in_class as
(select a.*
from Names a
where not exists ( select 'x'
from course b
where b.Course_id = a.class_id
and a.Stud_name = b.Stud_name)),
not_in_course as
(select b.*
from course b
where not exists ( select 'x'
from Names a
where b.Course_id = a.class_id
and a.Stud_name = b.Stud_name))
select x.class_id,
x.Stud_name NOT_IN_CLASS,
y.stud_name NOT_IN_COURSE
from not_in_class x, not_in_course y
where x.class_id = y.course_id
Output
| CLASS_ID | NOT_IN_CLASS | NOT_IN_COURSE |
|----------|--------------|---------------|
| S001 | PETER | KEITH |
The only problem is that this works when there is a single mismatch for a given id. If there are multiple mismatches in both tables for the same id, every combination of them is returned, so the query needs to be reworked for that case.
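To make the limitation concrete, here is the same CTE query run against hypothetical data with two mismatches on each side for S001 (PAUL and KARL are invented names, not from the question). The sketch assumes SQLite instead of Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Names (Class_id TEXT, Stud_name TEXT);
    CREATE TABLE Course (Course_id TEXT, Stud_name TEXT);
    -- PETER and PAUL are missing from Course; KEITH and KARL from Names.
    INSERT INTO Names VALUES
        ('S001', 'JAMES'), ('S001', 'PETER'), ('S001', 'PAUL');
    INSERT INTO Course VALUES
        ('S001', 'JAMES'), ('S001', 'KEITH'), ('S001', 'KARL');
""")

CTE_QUERY = """
    WITH not_in_class AS
      (SELECT a.*
       FROM Names a
       WHERE NOT EXISTS (SELECT 'x' FROM Course b
                         WHERE b.Course_id = a.Class_id
                           AND a.Stud_name = b.Stud_name)),
    not_in_course AS
      (SELECT b.*
       FROM Course b
       WHERE NOT EXISTS (SELECT 'x' FROM Names a
                         WHERE b.Course_id = a.Class_id
                           AND a.Stud_name = b.Stud_name))
    SELECT x.Class_id,
           x.Stud_name NOT_IN_CLASS,
           y.Stud_name NOT_IN_COURSE
    FROM not_in_class x, not_in_course y
    WHERE x.Class_id = y.Course_id
"""

rows = sorted(conn.execute(CTE_QUERY).fetchall())
for row in rows:
    print(row)
```

Two mismatches on each side produce four rows, one per pairing, because the final join matches on the id alone.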
Well, I am not sure if I understand correctly what you are asking. I think you want a list of all IDs where the student list in class table and course table differs. Then you want to show the id and the students that are in class but not in course and the students that are in course but not in class.
To do so you would full outer join the tables. That gives you students that are in both class and course, students that are in class and not in course, and students that are in course and not in class. Then filter the results to rows where either class_id or course_id is null to get the students missing in course or class. Finally, group by id and list the students.
select coalesce(class.class_id, course.course_id) as id
, listagg(class.stud_name, ',') within group (order by class.stud_name) as missing_in_course
, listagg(course.stud_name, ',') within group (order by course.stud_name) as missing_in_class
from class
full outer join course
on (class.class_id = course.course_id and class.stud_name = course.stud_name)
where class.class_id is null or course.course_id is null
group by coalesce(class.class_id, course.course_id);
Here is the SQL fiddle showing how it works: http://sqlfiddle.com/#!4/8aaaa/2
EDIT: In Oracle 9i there is no listagg. You can use the undocumented function wm_concat instead:
select coalesce(class.class_id, course.course_id) as id
, wm_concat(class.stud_name) as missing_in_course
, wm_concat(course.stud_name) as missing_in_class
from class
full outer join course
on (class.class_id = course.course_id and class.stud_name = course.stud_name)
where class.class_id is null or course.course_id is null
group by coalesce(class.class_id, course.course_id);
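The idea of one summary row per id can also be tried outside Oracle. The sketch below is an emulation, not the queries above: it assumes SQLite, replaces listagg/wm_concat with group_concat, and replaces the full outer join with two NOT EXISTS anti-joins combined by UNION ALL, since older SQLite versions lack FULL OUTER JOIN. The table name class follows the answer's queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (class_id TEXT, stud_name TEXT);
    CREATE TABLE course (course_id TEXT, stud_name TEXT);
    INSERT INTO class VALUES
        ('S001', 'JAMES'), ('S001', 'PETER'), ('S002', 'MARK');
    INSERT INTO course VALUES
        ('S001', 'JAMES'), ('S001', 'KEITH'), ('S002', 'MARK');
""")

# First branch: students in class without a course row.
# Second branch: students in course without a class row.
# group_concat ignores the NULL placeholders from the other branch.
SUMMARY = """
    SELECT id,
           group_concat(missing_in_course) AS missing_in_course,
           group_concat(missing_in_class) AS missing_in_class
    FROM (
        SELECT cl.class_id AS id,
               cl.stud_name AS missing_in_course,
               NULL AS missing_in_class
        FROM class cl
        WHERE NOT EXISTS (SELECT 1 FROM course co
                          WHERE co.course_id = cl.class_id
                            AND co.stud_name = cl.stud_name)
        UNION ALL
        SELECT co.course_id, NULL, co.stud_name
        FROM course co
        WHERE NOT EXISTS (SELECT 1 FROM class cl
                          WHERE cl.class_id = co.course_id
                            AND cl.stud_name = co.stud_name)
    )
    GROUP BY id
"""

summary = conn.execute(SUMMARY).fetchall()
print(summary)  # [('S001', 'PETER', 'KEITH')]
```

Unlike listagg with WITHIN GROUP, group_concat makes no promise about the order of names inside a list when an id has several mismatches.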