I'm in a situation where I have several sub-queries, joined by UNION, each nested with an inner sub-query. The outer sub-queries are fully identical to each other, while the inner queries differ and are unique.
Reuse of the entire outer sub-queries is cumbersome for reading and making changes, and it would be greatly beneficial if they could be defined once and reused. So this is a question about creating reusable SQL-queries, but with distinct inner sub-queries that are passed on as arguments.
For my examples, I will present a simplified case that has the same issue as my real code.
We're using Oracle SQL for our project.
Say that we have a database for a school or university, with the tables PERSON, STUDENT, GRADE and COURSE, and all are connected by FK-relationships.
I need to run a query that produces a list with one count per criterion:
Number of students whose last name begins with the letter 'E'
Number of students older than 20 years
Number of people (including but not limited to students) who are female
Number of students who passed the course "Intermediate Norwegian" with grade B or higher
The expected outcome:
| Description | Number_of_students
1 | Last names beginning with letter "E" | 32
2 | Older than 20 years | 154
3 | All female persons | 356
4 | Passed "Intermediate Norwegian" with grade >= B | 12
Below is a query which should satisfy what I need.
It consists of several sub-queries joined by UNION, and all have their own distinct inner query.
The code is far from brilliant, but that is beside the point. The real question is about drastically improving readability. The outer sub-queries have the same structure that could be re-used, but the inner ones are different.
SELECT * FROM
-- 1st entry: Number of students on the last name 'E'
(SELECT 'Last names beginning with letter "E"' AS Description, count(*) AS Number_of_students
FROM PERSON p, STUDENT s, GRADE g, COURSE c
WHERE p.ID = s.PERSON_ID
AND s.ID = g.STUDENT_ID
AND g.COURSE_ID = c.ID
-- ... other complex code here
AND p.ID IN(
SELECT DISTINCT p1.ID FROM PERSON p1, STUDENT s1
WHERE p1.ID = s1.PERSON_ID AND p1.LASTNAME LIKE 'E%'
)
)
UNION
-- 2nd entry: Number of students older than 20 years
(SELECT 'Older than 20 years' AS Description, count(*) AS Number_of_students
FROM PERSON p, STUDENT s, GRADE g, COURSE c
WHERE p.ID = s.PERSON_ID
AND s.ID = g.STUDENT_ID
AND g.COURSE_ID = c.ID
-- ... other complex code here
AND p.ID IN(
SELECT DISTINCT p2.ID FROM PERSON p2, STUDENT s2
WHERE p2.ID = s2.PERSON_ID AND p2.AGE > 20
)
)
UNION
-- 3rd entry: Number of female persons, including but not limited to students
(SELECT 'All female persons' AS Description, count(*) AS Number_of_students
FROM PERSON p, STUDENT s, GRADE g, COURSE c
WHERE p.ID = s.PERSON_ID
AND s.ID = g.STUDENT_ID
AND g.COURSE_ID = c.ID
-- ... other complex code here
AND p.ID IN(
SELECT DISTINCT p3.ID FROM PERSON p3 WHERE p3.GENDER = 'Female'
)
)
UNION
-- 4th entry: Students who passed the course "Intermediate Norwegian" with grade B or higher
(SELECT 'Passed "Intermediate Norwegian" with grade >= B' AS Description, count(*) AS Number_of_students
FROM PERSON p, STUDENT s, GRADE g, COURSE c
WHERE p.ID = s.PERSON_ID
AND s.ID = g.STUDENT_ID
AND g.COURSE_ID = c.ID
-- ... other complex code here
AND p.ID IN(
SELECT DISTINCT p4.ID FROM PERSON p4, STUDENT s4, GRADE g4, COURSE c4
WHERE p4.ID = s4.PERSON_ID
AND s4.ID = g4.STUDENT_ID
AND g4.COURSE_ID = c4.ID
AND (g4.GRADE = 'A' OR g4.GRADE = 'B')
AND c4.COURSE_NAME = 'Intermediate Norwegian'
)
)
Like I said, the code is far from brilliant. I won't be surprised if some of you cringed at what you just read.
For instance, the entire fourth one could easily be simplified by replacing the inner query with the conditions g.GRADE IN ('A', 'B') and c.COURSE_NAME = 'Intermediate Norwegian'.
But like I said, that is not the point here.
Every outer sub-query has the same structure:
(SELECT 'Passed "Intermediate Norwegian" with grade >= B' AS Description, count(*) AS Number_of_students
FROM PERSON p, STUDENT s, GRADE g, COURSE c
WHERE p.ID = s.PERSON_ID
AND s.ID = g.STUDENT_ID
AND g.COURSE_ID = c.ID
-- ... other complex code here
AND p.ID IN(
-- Inner Sub-query here
)
)
Every sub-query, however, has an inner query that differs from the others, such as the 1st and the 3rd:
SELECT DISTINCT p1.ID FROM PERSON p1, STUDENT s1 WHERE p1.ID = s1.PERSON_ID AND p1.LASTNAME LIKE 'E%'
and
SELECT DISTINCT p3.ID FROM PERSON p3 WHERE p3.GENDER = 'Female'
What I need:
The real code I'm working with is far more complex, but has the same following issues as presented in the example above.
The result must be a list with several numbers, each categorized by their own distinct criteria (preferably described in the first column).
It consists of several sub-queries, joined by UNION
Each of these sub-queries is completely identical, except for an inner sub-query that is unique and different from the others.
The resulting code is a huge beast, but could in theory be made far more readable if the outer code had been written only once, and reused with different inner code passed on as arguments.
I have recently come across the WITH-clause in Oracle SQL.
Something similar to this following change would be very beneficial:
WITH outer_sub_query AS (
SELECT 'DESCRIPTION HERE' AS Description, count(*) AS Number_of_students
FROM PERSON p, STUDENT s, GRADE g, COURSE c
WHERE p.ID = s.PERSON_ID
AND s.ID = g.STUDENT_ID
AND g.COURSE_ID = c.ID
-- ... other complex code here
AND p.ID IN(
-- INSERT INNER SUB-QUERY HERE
)
)
SELECT * FROM (
outer_sub_query -- Last Names beginning with letter 'E'
UNION
outer_sub_query -- Age > 20
UNION
outer_sub_query -- All female
UNION
outer_sub_query -- Passed that course with grade >= B
)
Unfortunately, my needs are not yet satisfied. I still need to pass on the inner sub-queries, as well as descriptions. Something similar to this:
SELECT * FROM (
outer_sub_query(
'Last names beginning with letter "E"',
SELECT DISTINCT p1.ID FROM PERSON p1, STUDENT s1
WHERE p1.ID = s1.PERSON_ID AND p1.LASTNAME LIKE 'E%'
)
UNION
outer_sub_query(
'Older than 20 years',
SELECT DISTINCT p2.ID FROM PERSON p2, STUDENT s2
WHERE p2.ID = s2.PERSON_ID AND p2.AGE > 20
)
UNION
outer_sub_query(
'All female persons',
SELECT DISTINCT p3.ID FROM PERSON p3 WHERE p3.GENDER = 'Female'
)
UNION
outer_sub_query(
'Passed "Intermediate Norwegian" with grade >= B',
SELECT DISTINCT p4.ID FROM PERSON p4, STUDENT s4, GRADE g4, COURSE c4
WHERE p4.ID = s4.PERSON_ID
AND s4.ID = g4.STUDENT_ID
AND g4.COURSE_ID = c4.ID
AND (g4.GRADE = 'A' OR g4.GRADE = 'B')
AND c4.COURSE_NAME = 'Intermediate Norwegian'
)
)
The questions:
Now, defining a FUNCTION easily comes to mind. But it still brings me some questions:
At first glance, it seems that the WITH-clause does not take parameters that can be passed on. Are there any other pre-existing clauses or functions in SQL or Oracle SQL that handle this?
Is it possible to extract the inner sub-query out from the outer one, and still achieve the same result? (Remember: No changes in the outer sub-query itself).
If I am to define a FUNCTION that handles this, is it possible to pass on pure SQL-codes like I have done above?
Are there any other smart solutions that I am missing?
Thank you for your advice(s).
A common table expression (as you already suggested) seems a likely approach for reducing code duplication in your case, but you're trying to get it to do too much for you. CTEs cannot be parameterized in the way you hope; if they could, each use you envision would be a different query, and the expression would no longer be common.
Yes, you could write a table-valued function, but that seems way overkill, and it could well be difficult for the query planner to analyze. Here's about as far as you can go with a CTE:
WITH student_grades AS (
SELECT
p.id AS id,
p.lastname AS lastname,
p.age AS age,
p.gender AS gender,
c.course_name AS course_name,
g.grade AS grade
FROM
-- You really, really should use ANSI JOIN syntax:
PERSON p
JOIN STUDENT s ON p.ID = s.PERSON_ID
JOIN GRADE g ON s.ID = g.STUDENT_ID
JOIN COURSE c ON g.COURSE_ID = c.ID
-- WHERE ... other complex code here
)
You might then continue your query with ...
-- 1st entry: Number of students on the last name 'E'
SELECT
'Last names beginning with letter "E"' AS Description,
count(distinct sg1.id) AS Number_of_students
FROM student_grades sg1
WHERE sg1.lastname LIKE 'E%'
UNION
-- 2nd entry: Number of students older than 20 years
SELECT
'Older than 20 years' AS Description,
count(distinct sg2.id) AS Number_of_students
FROM student_grades sg2
WHERE sg2.AGE > 20
UNION
-- 3rd entry: Number of female persons, including but not limited to students
-- NOTE: THIS ONE MATCHES YOUR ORIGINAL, WHICH IS INCORRECT
SELECT
'All female persons' AS Description,
count(distinct sg3.id) AS Number_of_students
FROM student_grades sg3
WHERE sg3.GENDER = 'Female'
UNION
-- 4th entry: Students who passed the course "Intermediate Norwegian" with grade B or higher
SELECT
'Passed "Intermediate Norwegian" with grade >= B' AS Description,
count(distinct sg4.id) AS Number_of_students
FROM student_grades sg4
WHERE
sg4.COURSE_NAME = 'Intermediate Norwegian'
AND sg4.grade IN ('A', 'B')
And that's actually a significant improvement. Note in particular that you don't need to pass conditions, subquery or not, into the CTE; instead, you query the CTE (which you could also join to other tables, etc.). Of course, in part that's because your "inner" subqueries were a pretty horrible way of doing things; instead, I use count(distinct sg.id), which achieves the same thing as those subqueries as long as person.id is non-null, which I presume it is on account of being a PK.
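To see why count(distinct ...) matters here, the idea can be tried on a toy data set. This is only a sketch: it uses Python's built-in sqlite3 rather than Oracle, and the table contents and the trimmed-down column list are made up for illustration.

```python
import sqlite3

# Toy schema and rows, invented for illustration (not the asker's real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, lastname TEXT, age INTEGER);
CREATE TABLE student (id INTEGER PRIMARY KEY, person_id INTEGER);
CREATE TABLE course  (id INTEGER PRIMARY KEY, course_name TEXT);
CREATE TABLE grade   (student_id INTEGER, course_id INTEGER, grade TEXT);

INSERT INTO person  VALUES (1, 'Eriksen', 22), (2, 'Olsen', 19), (3, 'Eide', 25);
INSERT INTO student VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO course  VALUES (100, 'Intermediate Norwegian'), (200, 'Algebra');
-- Eriksen has two grade rows, so the join returns her twice.
INSERT INTO grade   VALUES (10, 100, 'A'), (10, 200, 'C'), (20, 100, 'D'), (30, 200, 'B');
""")

query = """
WITH student_grades AS (
    SELECT p.id, p.lastname, p.age, c.course_name, g.grade
    FROM person p
    JOIN student s ON p.id = s.person_id
    JOIN grade   g ON s.id = g.student_id
    JOIN course  c ON g.course_id = c.id
)
SELECT 'plain count' AS description, count(*) AS n
FROM student_grades WHERE lastname LIKE 'E%'
UNION ALL
SELECT 'distinct count', count(DISTINCT id)
FROM student_grades WHERE lastname LIKE 'E%'
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

With this data the plain count comes out at 3 (one row per grade), while the distinct count is 2 (one per person), which is the figure the question is after.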
But note also that even the need for a distinct count (and the bugginess of the third part of the query) arise from trying to do all four parts with the same common intermediate results in the first place. You don't need to join course or grade information in order to query information related strictly to personal characteristics, and as long as student has a 0,1:1 relationship with person, leaving out the course and grade information would give you a distinct count for free.
And as for the third part, joining the student table restricts your results to students, which you didn't want. The fact that you don't put that restriction in the "inner" subquery is irrelevant; you're using that subquery to filter results that only include people who are students in the first place. Thus, **your approach cannot produce the results you want in this case.**
Maybe your desire to factor out a big chunk of common query arises from the mysterious "other complex code". I don't see how such a thing applies to the question as you've presented it, but I'm inclined to suspect that you would be better off finding a way -- or maybe separate ways per item -- to simplify or eliminate that code. If it were the case that that code could be ignored then I might write your query like so:
WITH student_person AS (
SELECT
p.lastname AS lastname,
p.age AS age,
p.gender AS gender,
s.id AS student_id
FROM
PERSON p
JOIN STUDENT s ON p.ID = s.PERSON_ID
)
-- 1st entry: Number of students on the last name 'E'
SELECT
'Last names beginning with letter "E"' AS Description,
count(*) AS Number_of_students
FROM student_person sp1
WHERE sp1.lastname LIKE 'E%'
UNION ALL
-- 2nd entry: Number of students older than 20 years
SELECT
'Older than 20 years' AS Description,
count(*) AS Number_of_students
FROM student_person sp2
WHERE sp2.AGE > 20
UNION ALL
-- 3rd entry: Number of female persons, including but not limited to students
-- NOTE: unlike your original, this one counts all persons, not only students
SELECT
'All female persons' AS Description,
count(*) AS Number_of_students
-- must select from PERSON, not STUDENT_PERSON:
FROM person p2
WHERE p2.GENDER = 'Female'
UNION ALL
-- 4th entry: Students who passed the course "Intermediate Norwegian" with grade B or higher
SELECT
'Passed "Intermediate Norwegian" with grade >= B' AS Description,
count(distinct sp3.student_id) AS Number_of_students
FROM student_person sp3
JOIN grade g ON sp3.student_id = g.student_id
JOIN course c ON g.course_id = c.id
WHERE
c.COURSE_NAME = 'Intermediate Norwegian'
AND g.grade IN ('A', 'B')
Take it from here, I'm sure you'll manage.
select count(case when lastname like 'E%' then 1 end) as lastname_starts_with_e
,count(case when age > 20 then 1 end) as age_greater_than_20
,count(case when gender = 'Female' then 1 end) as is_female
from person
;
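The conditional-aggregation query above can be tried out quickly. This is a minimal sketch using Python's built-in sqlite3 in place of Oracle; the four person rows are invented for illustration.

```python
import sqlite3

# Invented sample rows; only the PERSON table is needed for these three counts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, lastname TEXT, age INTEGER, gender TEXT);
INSERT INTO person VALUES
    (1, 'Eriksen', 22, 'Female'),
    (2, 'Olsen',   19, 'Male'),
    (3, 'Eide',    25, 'Male'),
    (4, 'Berg',    30, 'Female');
""")

# count() skips NULLs, and CASE without ELSE yields NULL when the test fails,
# so each column counts only the rows that satisfy its own condition.
row = conn.execute("""
SELECT count(CASE WHEN lastname LIKE 'E%' THEN 1 END) AS lastname_starts_with_e,
       count(CASE WHEN age > 20 THEN 1 END)           AS age_greater_than_20,
       count(CASE WHEN gender = 'Female' THEN 1 END)  AS is_female
FROM person
""").fetchone()
print(row)
```

Note that this returns one row with three columns in a single pass over the table, rather than one row per criterion as in the UNION versions.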
Related
I have three tables I want to iterate over. The tables are pretty big so I will show a small snippet of the tables. First table is Students:
id | name | address
1 | John Smith | New York
2 | Rebeka Jens | Miami
3 | Amira Sarty | Boston
Second one is TakingCourse. This is the course the students are taking, so student_id is the id of the one in Students.
id | student_id | course_id
20 | 1 | 26
19 | 2 | 27
18 | 3 | 28
Last table is Courses. The id is the same as the course_id in the previous table. These are the courses the students are following and looks like this:
id | type
26 | History
27 | Maths
28 | Science
I want to return a table with the location (address) and the type of courses that are taken there. So the results table should look like this:
address | type
The pairs should be unique, and that is what's going wrong. I tried this:
select S.address, C.type
from Students S, Courses C, TakingCourse TC
where TC.course_id = C.id
and S.id = TC.student_id
And this does work, but the pairs are not all unique. I tried select distinct and it's still the same.
Multiple students can (and will) reside at the same address. So don't expect unique results from this query.
Only an overview is needed, so that's why I don't want duplicates
So fold duplicates. Simple way with DISTINCT:
SELECT DISTINCT s.address, c.type
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id;
Or to avoid DISTINCT (why would you for this task?) and, optionally, get counts, too:
SELECT c.type, s.address, count(*) AS ct
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id
GROUP BY c.type, s.address
ORDER BY c.type, s.address;
A missing UNIQUE constraint on takingcourse(student_id, course_id) could be an additional source of duplicates. See:
How to implement a many-to-many relationship in PostgreSQL?
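Both variants above can be checked against the sample rows from the question. The sketch below uses Python's built-in sqlite3; the fourth student (id 4, also in New York and also taking History) and the TakingCourse row with id 17 are hypothetical additions so that there is a duplicate pair to fold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students     (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE courses      (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE takingcourse (id INTEGER PRIMARY KEY, student_id INTEGER, course_id INTEGER);

INSERT INTO students VALUES
    (1, 'John Smith', 'New York'), (2, 'Rebeka Jens', 'Miami'),
    (3, 'Amira Sarty', 'Boston'),
    (4, 'Hypothetical Student', 'New York');  -- made-up row to create a duplicate
INSERT INTO courses VALUES (26, 'History'), (27, 'Maths'), (28, 'Science');
INSERT INTO takingcourse VALUES (20, 1, 26), (19, 2, 27), (18, 3, 28), (17, 4, 26);
""")

# Variant 1: DISTINCT folds identical (address, type) pairs.
pairs = conn.execute("""
SELECT DISTINCT s.address, c.type
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c      ON t.course_id = c.id
ORDER BY s.address, c.type
""").fetchall()

# Variant 2: GROUP BY folds the same pairs and also reports how many rows each had.
pair_counts = conn.execute("""
SELECT s.address, c.type, count(*) AS ct
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c      ON t.course_id = c.id
GROUP BY s.address, c.type
ORDER BY s.address, c.type
""").fetchall()

print(pairs)
print(pair_counts)
```

The New York / History pair appears once in both results, with a count of 2 in the GROUP BY version.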
My end goal is to create a list of honor roll students. Each student has multiple rows, one for each grade. I want to say, look at their grades across these rows; only show 1 student name if none of their grades are <80%.
I've started just with this, but I'm stuck, I don't know how to assess across the multiple rows as a criterion for selecting a unique list.
SELECT students.first_name, students.last_name, storedgrades.storecode, storedgrades.percent,storedgrades.course_name
FROM storedgrades join
students
on students.ID = storedgrades.StudentID
where students.enroll_status=0 AND
storedgrades.termid>2799 AND
storedgrades.storecode = 'Q4'
Example of grades table:
BOB A 95
BOB D 65
ANDREA B 85
ANDREA A 95
EXAMPLE RESULT:
ANDREA
Use aggregation. I think this is what you want:
select s.id, s.first_name, s.last_name
from students s join
storedgrades sg
on s.ID = sg.StudentID
where s.enroll_status = 0 and
sg.termid > 2799 AND
sg.storecode = 'Q4'
group by s.id, s.first_name, s.last_name
having count(*) = 5 and -- you want all five courses
min(sg.percent) >= 80;
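The aggregation idea can be tested on the sample grades from the question. This is a simplified sketch in Python's built-in sqlite3: the schema is cut down to the columns that matter, the numeric score is assumed to live in the percent column, and the enroll_status, termid, storecode and count(*) = 5 conditions are left out because the sample data does not cover them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down, assumed schema: only the columns needed for the honor-roll test.
CREATE TABLE students     (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE storedgrades (studentid INTEGER, grade TEXT, percent INTEGER);

INSERT INTO students     VALUES (1, 'BOB'), (2, 'ANDREA');
INSERT INTO storedgrades VALUES (1, 'A', 95), (1, 'D', 65), (2, 'B', 85), (2, 'A', 95);
""")

# A student qualifies only if even their lowest percentage is at least 80.
honor_roll = conn.execute("""
SELECT s.first_name
FROM students s
JOIN storedgrades sg ON s.id = sg.studentid
GROUP BY s.id, s.first_name
HAVING min(sg.percent) >= 80
""").fetchall()
print(honor_roll)
```

BOB is excluded because his lowest grade is 65, so only ANDREA remains, matching the example result in the question.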
Select a distinct list of student names, then compare with a NOT EXISTS to the grades table where the grade is less than 80%
SELECT DISTINCT st.ID, st.first_name, st.last_name
FROM storedgrades sg join
students st
on st.ID = sg.StudentID
where st.enroll_status = 0 AND
sg.termid > 2799 AND
sg.storecode = 'Q4' and not exists
(select 'x' from storedgrades sg2
where sg2.StudentID = sg.StudentID and sg2.percent < 80)
EDIT: added in studentID to the join
I want to fetch all parents that have kids in a specific grade only in a school.
Below are trimmed down version of the tables.
TABLE students
id,
last_name,
grade_id,
school_id
TABLE parents_students
parent_id,
student_id
TABLE parents
id,
last_name,
school_id
I tried the below query but it doesn't really work as expected. It rather fetches all parents in a school disregarding the grade. Any help is appreciated. Thank you.
SELECT DISTINCT
p.id,
p.last_name,
p.school_id,
st.school_id,
st.grade_id
FROM parents p
INNER JOIN students st ON st.school_id = p.school_id
WHERE st.grade_id = 118
AND st.school_id = 6
GROUP BY p.id,st.grade_id,st.school_id;
I would think:
select p.*
from parents p
where exists (select 1
from parents_students ps join
students s
on ps.student_id = s.id
where ps.parent_id = p.id and
s.grade_id = 118 and
s.school_id = 6
);
Your question says that you want information about the parents. If so, I don't see why you are including redundant information about the school and grade (it is redundant because the where clause specifies exactly what those values are).
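A quick way to convince yourself that the EXISTS version returns each parent once is to run it on a few rows. The sketch below uses Python's built-in sqlite3; all names and ids are invented, and parent 3 deliberately has two children in grade 118.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students         (id INTEGER PRIMARY KEY, last_name TEXT, grade_id INTEGER, school_id INTEGER);
CREATE TABLE parents          (id INTEGER PRIMARY KEY, last_name TEXT, school_id INTEGER);
CREATE TABLE parents_students (parent_id INTEGER, student_id INTEGER);

-- Invented rows: parent 2's child is in another grade, parent 3 has two children in grade 118.
INSERT INTO parents  VALUES (1, 'Hansen', 6), (2, 'Lie', 6), (3, 'Dahl', 6);
INSERT INTO students VALUES (11, 'Hansen', 118, 6), (12, 'Lie', 119, 6),
                            (13, 'Dahl', 118, 6), (14, 'Dahl', 118, 6);
INSERT INTO parents_students VALUES (1, 11), (2, 12), (3, 13), (3, 14);
""")

# EXISTS only asks whether at least one matching child exists,
# so a parent with several matching children still appears once.
matching_parents = conn.execute("""
SELECT p.id, p.last_name
FROM parents p
WHERE EXISTS (SELECT 1
              FROM parents_students ps
              JOIN students s ON ps.student_id = s.id
              WHERE ps.parent_id = p.id
                AND s.grade_id = 118
                AND s.school_id = 6)
ORDER BY p.id
""").fetchall()
print(matching_parents)
```

Unlike the query in the question, this goes through parents_students, so a parent is matched to their own children rather than to every student in the same school.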
I am having a hard time understanding how the CREATE VIEW statement for TRANSCRIPTVIEW manages to set the grade of 0 for those who did not take a course. An explanation would help; the question and solution are below. Thanks.
Student(Id,Name)
Transcript(StudId,CourseName,Semester,Grade)
Formulate the following query in SQL:
Create a list of all students (Id, Name) and, for each student, list the average grade for the courses taken in the S2002 semester.
Note that there can be students who did not take any courses in S2002. For these, the average grade should be listed as 0.
Solution:
We first create a view which augments TRANSCRIPT with rows that enroll every student into a NULL course with the grade of 0. Therefore, students who did not take anything in semester ’S2002’ will have the average grade of 0 for that semester.
Below is what confuses me, how does this work and why does it work?
CREATE VIEW TRANSCRIPTVIEW AS (
( SELECT * FROM Transcript)
UNION
(
SELECT S.Id,NULL,'S2002',0
FROM Student S)
WHERE S.Id NOT IN (
SELECT T.StudId
FROM Transcript T
WHERE T.Semester = 'S2002') )
)
Remaining solution:
SELECT S.Id, S.Name, AVG(T.Grade)
FROM Student S, TRANSCRIPTVIEW T
WHERE S.Id = T.StudId AND T.Semester = 'S2002' GROUP BY S.Id
how the create view, TRANSCRIPTVIEW, manages to set the grade of 0 for those who did not take a course
The set of students who did not take a course in semester S2002 have no record in the transcript table for that semester. Those who did take a course in that semester do have a record in the table for that semester. The query supplies values NULL, 'S2002',0 for students if they are not in the Transcript table for semester S2002:
SELECT S.Id,NULL,'S2002',0 FROM Student S) -- this parenthesis is wrong
-- this following where conditions looks for students NOT IN the 2002 subset:
WHERE S.Id NOT IN
-- this next part gets a list of studentids for semester 2002
(
SELECT T.StudId FROM Transcript T
WHERE T.Semester = 'S2002'
)
The solution in your question is ridiculous. The better solution is:
SELECT S.Id, S.Name, AVG(case when T.Semester = 'S2002' then T.Grade end) as AvgS2002Grade
FROM Student S left outer join
TRANSCRIPTVIEW T
on S.Id = T.StudId AND T.Semester = 'S2002'
GROUP BY S.Id, S.Name
The query in your question is overly complicated. It is using a union (which should really be union all for performance reasons) to be sure all students are included. Gosh, this is what left outer join is for. It is doing filtering in the where clause, when a conditional aggregation is more suitable. It uses archaic join syntax, instead of the ANSI standard.
I hope you are not learning SQL with those shortcomings.
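For what it's worth, the left outer join idea also works directly against the Transcript table, without any view. The variant below is a sketch and not the textbook's solution: it assumes that wrapping the average in COALESCE is an acceptable way to turn "no courses" into 0, and it uses Python's built-in sqlite3 with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student    (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Transcript (StudId INTEGER, CourseName TEXT, Semester TEXT, Grade REAL);

-- Invented rows: Ann took two S2002 courses, Bob only an older one, Cy none at all.
INSERT INTO Student    VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Transcript VALUES (1, 'Math', 'S2002', 80), (1, 'Art', 'S2002', 90),
                              (2, 'Math', 'F2001', 70);
""")

# The semester filter sits in the ON clause, so students without S2002 rows
# survive the join with NULL grades; AVG of only NULLs is NULL, and COALESCE maps it to 0.
averages = conn.execute("""
SELECT S.Id, S.Name, COALESCE(AVG(T.Grade), 0) AS AvgS2002Grade
FROM Student S
LEFT OUTER JOIN Transcript T
       ON S.Id = T.StudId AND T.Semester = 'S2002'
GROUP BY S.Id, S.Name
ORDER BY S.Id
""").fetchall()
print(averages)
```

If the semester condition were moved to the WHERE clause instead, Bob and Cy would be filtered out entirely, which is exactly the problem the view in the question works around.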
I was asked this trick question:
Table: Student
ID NAME
1 JOHN
2 MARY
3 ROBERT
4 DENNIS
Table: Grade
ID GRADE
1 A
1 A
1 F
2 B
3 A
How do you write a SQL query to return the DISTINCT names of all students who have never received grade 'F' OR who have never taken a course (meaning their ID is not present in the Grade table)?
The tricky part is that you're not allowed to use OUTER JOIN, UNION or DISTINCT. Also, why is this a big deal?
Expected result is MARY, ROBERT, DENNIS (3 rows).
SELECT name FROM Student
WHERE
NOT EXISTS (SELECT * FROM Grade WHERE Grade.id = Student.id AND grade = 'F')
OR
NOT EXISTS (SELECT * FROM Grade WHERE Grade.id = Student.id);
You may use GROUP BY in order to fake a distinct.
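The NOT EXISTS approach can be run against the sample tables from the question. This sketch uses Python's built-in sqlite3 and keeps only the first NOT EXISTS branch: a student with no rows in Grade trivially has no 'F' row either, so the second branch does not change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Grade   (id INTEGER, grade TEXT);

INSERT INTO Student VALUES (1, 'JOHN'), (2, 'MARY'), (3, 'ROBERT'), (4, 'DENNIS');
INSERT INTO Grade   VALUES (1, 'A'), (1, 'A'), (1, 'F'), (2, 'B'), (3, 'A');
""")

# One row per student is read from Student, so no DISTINCT is needed;
# the correlated subquery only decides whether that row is kept.
names = [r[0] for r in conn.execute("""
SELECT s.name
FROM Student s
WHERE NOT EXISTS (SELECT 1 FROM Grade g
                  WHERE g.id = s.id AND g.grade = 'F')
ORDER BY s.id
""")]
print(names)
```

JOHN is the only student with an 'F', and DENNIS, who has no grades at all, is kept.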
SELECT name FROM student
WHERE (SELECT COUNT(*) FROM grade WHERE grade = 'F'
AND id = student.id) = 0
at least this is the shortest answer so far ...
Something like this could work, if you're allowed to use subqueries.
SELECT `NAME`
FROM Student
WHERE 'F' NOT IN
(SELECT GRADE FROM Grade WHERE ID = Student.ID)
Hmm, my homework sense is tingling... Not that my questions have never related to homework though...
You could use the GROUP BY and aggregate functions in order to fake a distinct.
You want to exclude everyone who has both taken a course and received a grade of 'F'. Something like this might work:
SELECT NAME
FROM Student S
WHERE 0 = (SELECT COUNT(*)
FROM Grade G
WHERE G.ID = S.ID
AND G.GRADE = 'F')
GROUP BY NAME
SELECT name
FROM grade G, student S
WHERE (S.id = G.id AND 'F' NOT IN (SELECT G1.grade
FROM grade G1
WHERE G1.id = G.id))
OR
S.id NOT IN (SELECT id
FROM grade)
GROUP BY name
A reason why they might not want you to use UNION, JOIN or DISTINCT is that some of those queries might be slow if you try to force an "optimized" solution.
I'm not too familiar with query optimization techniques but usually if you use some of those aggregators and JOINs, you might slow down your query rather than just letting the query optimizations run through and organize your SQL based on your table structure and contents.