Questions about SQL query - sql

I need to select exam results of students in class 7A, but I need to look in another table (student_profile) to identify the students in 7A (by student_id).
I wonder which of the following methods will be faster, assuming an index on student_id exists in both tables:
Method 1:
select * from exam_results r
where exists
(select 1
from student_profile p
where p.student_id = r.student_id
and p.class = '7A')
Method 2:
select * from exam_results
where student_id in
(select student_id
from student_profile
where class = '7A')
Thanks in Advance,
Jonathan

Short answer: it doesn't matter. The query engine will treat them the same.
Personally, I'd consider this syntax.
select
r.*
from
exam_results r
join
student_profile p
on p.student_id = r.student_id
where
p.class = '7A'
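The INNER keyword is implied if omitted: JOIN on its own means INNER JOIN.
You'll get the same performance because modern query engines are well developed, but I think this standard syntax is more extensible and easier to read. One difference to keep in mind: unlike EXISTS and IN, the join returns an exam result more than once if student_profile has several '7A' rows for the same student_id.
If you extend this query in the future, multiple join conditions will be easier to optimize than multiple EXISTS or IN clauses.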

Of the two queries, the one with EXISTS is faster. However, the right (and usually faster) approach to such problems is a JOIN:
select r.student_id, r.other_columns
from exam_results r
inner join student_profile s
on r.student_id = s.student_id
where s.class = '7A'
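To check the equivalence claims above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (not necessarily the asker's database; the rows and the score column are invented). It suggests that EXISTS, IN and the JOIN return the same rows while student_id is unique in student_profile, and that the JOIN starts returning duplicates once it is not.

```python
import sqlite3

# Hypothetical sample data: table and column names follow the question,
# but the rows and the extra "score" column are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_profile (student_id INTEGER, class TEXT);
    CREATE TABLE exam_results (student_id INTEGER, score INTEGER);
    CREATE INDEX idx_profile_student ON student_profile (student_id);
    CREATE INDEX idx_results_student ON exam_results (student_id);
    INSERT INTO student_profile VALUES (1, '7A'), (2, '7A'), (3, '7B');
    INSERT INTO exam_results VALUES (1, 80), (2, 65), (3, 90), (1, 72);
""")

EXISTS_SQL = """
    SELECT r.* FROM exam_results r
    WHERE EXISTS (SELECT 1 FROM student_profile p
                  WHERE p.student_id = r.student_id AND p.class = '7A')"""
IN_SQL = """
    SELECT * FROM exam_results
    WHERE student_id IN (SELECT student_id FROM student_profile WHERE class = '7A')"""
JOIN_SQL = """
    SELECT r.* FROM exam_results r
    JOIN student_profile p ON p.student_id = r.student_id
    WHERE p.class = '7A'"""


def rows(sql):
    # Sort so the comparison does not depend on the order the plan emits rows in.
    return sorted(conn.execute(sql).fetchall())


exists_rows, in_rows, join_rows = rows(EXISTS_SQL), rows(IN_SQL), rows(JOIN_SQL)
print(exists_rows == in_rows == join_rows)  # True

# Give student 1 a second '7A' profile row: EXISTS and IN are unaffected,
# but the JOIN now returns each of student 1's results twice.
conn.execute("INSERT INTO student_profile VALUES (1, '7A')")
dup_counts = (len(rows(EXISTS_SQL)), len(rows(IN_SQL)), len(rows(JOIN_SQL)))
print(dup_counts)  # (3, 3, 5)
```

This only checks results, not speed; timing claims depend on the engine, its version and the actual data.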

Related

How to display the desired rows of two tables using a subquery?

My subquery:
select studentName, Course.dataStart
from Student,
Course
where Student.id in (select Course.id from Course);
I need a working version of the subquery above (not a join).
Why does the SQL subquery display one date for each name? (task: display the names of students from the Student table and the course start date from the Course table using a subquery)
With a JOIN I get the expected result, but I need to do it with a subquery.
You seem to be using implicit join syntax, but really you should be using an explicit inner join:
SELECT s.studentName, c.dataStart
FROM Student s
INNER JOIN Course c
ON c.id = s.course_id;
If you really wanted to use the implicit join syntax, it should be something like this:
SELECT s.studentName, c.dataStart
FROM Student s, Course c
WHERE c.id = s.course_id;
But again, please use the first version, since explicit join syntax is considered best practice.
You can use a join:
SELECT S.studentName, C.dataStart
FROM Student S
INNER JOIN Course C
ON C.id = S.course_id;
With a subquery:
Select studentName, (Select Course.dataStart from Course
Where Course.id = Student.course_id) AS dataStart
From Student
Assuming that Course.Id corresponds to Student.Id (although that seems strange to me), I think the only way to get the results you want with a subquery is to use it in the SELECT clause:
select studentName, (SELECT Course.dataStart FROM Course WHERE Course.Id = Student.Id)
from Student
This would fail if Course has more than one row per student. In that case you could use (SELECT DISTINCT Course.dataStart...), which only helps if those rows share the same start date.
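A minimal sketch of the correlated scalar subquery in the SELECT list, run with Python's sqlite3 as a stand-in engine. It assumes a Student.course_id column pointing at Course.id, as the join answers do; that column and all rows are hypothetical, since the question does not show its table definitions.

```python
import sqlite3

# Hypothetical schema and rows: Student.course_id -> Course.id is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Course (id INTEGER PRIMARY KEY, dataStart TEXT);
    CREATE TABLE Student (id INTEGER PRIMARY KEY, studentName TEXT, course_id INTEGER);
    INSERT INTO Course VALUES (10, '2024-01-15'), (20, '2024-03-01');
    INSERT INTO Student VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cid', 10);
""")

# Correlated scalar subquery in the SELECT list: evaluated for each Student row.
subquery_rows = conn.execute("""
    SELECT studentName,
           (SELECT c.dataStart FROM Course c WHERE c.id = Student.course_id) AS dataStart
    FROM Student
    ORDER BY studentName
""").fetchall()

# The equivalent explicit inner join, for comparison.
join_rows = conn.execute("""
    SELECT s.studentName, c.dataStart
    FROM Student s
    INNER JOIN Course c ON c.id = s.course_id
    ORDER BY s.studentName
""").fetchall()

print(subquery_rows == join_rows)  # True
for name, start in subquery_rows:
    print(name, start)
```

Two differences worth knowing: a student whose course_id matches no course gets NULL from the scalar subquery but is dropped by the inner join, and SQLite silently uses the first row when a scalar subquery returns several, so this sketch cannot reproduce the "more than one row" failure that engines such as Oracle report as an error.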

Query to find students that failed all given subjects

I'm trying to find the students that have failed every subject in a set of subjects via PostgreSQL queries.
Students fail a subject if they have a not null mark < 50 for at least one course offering of the subject. And I want to find the students that have failed all subjects in the set of subjects Relevant_subjects.
NOTE: students can have several records per course.
SELECT People.name
FROM
Relevant_subjects
JOIN Courses on (Courses.subject = Relevant_subjects.id)
JOIN Course_enrolments on (Course_enrolments.course = Courses.id)
JOIN Students on (Students.id = Course_enrolments.student)
JOIN People on (People.id = Students.id)
WHERE
Course_enrolments.mark is not null AND
Course_enrolments.mark < 50
;
With the code above, I get the students who have failed any of the Relevant_subjects, but my desired result is the students who have failed all Relevant_subjects. How can I do that?
Students fail a subject if they have a not null mark < 50 for at least one course offering of the subject.
One of many possible ways:
SELECT id, p.name
FROM (
SELECT s.id
FROM students s
CROSS JOIN relevant_subjects rs
GROUP BY s.id
HAVING bool_and( EXISTS(
SELECT -- empty list
FROM course_enrolments ce
JOIN courses c ON c.id = ce.course
WHERE ce.mark < 50 -- also implies NOT NULL
AND ce.student = s.id
AND c.subject = rs.id
)
) -- all failed
) sub
JOIN people p USING (id);
Form a Cartesian product of students and relevant subjects.
Aggregate by student (s.id) and, in the HAVING clause, keep only those who failed all subjects, using bool_and() over a correlated EXISTS subquery that tests for at least one failed course for each student-subject combination.
Join to people as final cosmetic step to get student names. I added id to get unique results (as names probably are not guaranteed to be unique).
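A minimal sketch of the steps above, run with Python's sqlite3 instead of PostgreSQL. SQLite has no bool_and() and does not accept an empty SELECT list, so this swaps in MIN() over the 0/1 results of EXISTS (1 only if every student-subject pair has a failure) and SELECT 1; the sample rows are invented.

```python
import sqlite3

# Invented sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, subject INTEGER);
    CREATE TABLE course_enrolments (student INTEGER, course INTEGER, mark INTEGER);
    CREATE TABLE relevant_subjects (id INTEGER PRIMARY KEY);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO students VALUES (1), (2), (3);
    INSERT INTO courses VALUES (1, 100), (2, 100), (3, 200);  -- two offerings of subject 100
    INSERT INTO relevant_subjects VALUES (100), (200);
    INSERT INTO course_enrolments VALUES
        (1, 1, 40), (1, 3, 30),              -- Ann fails 100 and 200
        (2, 1, 45), (2, 3, 70),              -- Bob fails 100, passes 200
        (3, 2, 20), (3, 3, NULL);            -- Cid has no mark for 200
""")

failed_all = conn.execute("""
    SELECT sub.id, p.name
    FROM (
        SELECT s.id
        FROM students s
        CROSS JOIN relevant_subjects rs     -- every student-subject pair
        GROUP BY s.id
        HAVING MIN(EXISTS (                 -- stand-in for bool_and()
            SELECT 1
            FROM course_enrolments ce
            JOIN courses c ON c.id = ce.course
            WHERE ce.mark < 50              -- NULL < 50 is not true, so NULL marks drop out
              AND ce.student = s.id
              AND c.subject = rs.id
        )) = 1
    ) sub
    JOIN people p ON p.id = sub.id
    ORDER BY sub.id
""").fetchall()

print(failed_all)  # [(1, 'Ann')]
```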
Depending on actual table definition, your version of Postgres, cardinalities and value distribution, there may be (much) more efficient queries.
At its core, this is a case of relational division. See:
How to filter SQL results in a has-many-through relation
And the most efficient strategy is to eliminate as many students as possible as early in the query as possible - like by checking the subject with the fewest failing students first. Then proceed with only the remaining students etc.
Your case adds the specific difficulty that the number and identities of subjects to be tested are unknown / dynamic. Typically, a recursive CTE or similar offers best performance for this kind of problem:
SQL query to find a row with a specific number of associations
I would use aggregation:
SELECT p.name
FROM Relevant_subjects rs JOIN
Courses c
ON c.subject = rs.id JOIN
Course_enrolments ce
ON ce.course = c.id JOIN
Students s
ON s.id = ce.student JOIN
People p
ON p.id = s.id
WHERE ce.mark < 50
GROUP BY p.id, p.name
HAVING COUNT(*) = (SELECT COUNT(*) FROM relevant_subjects);
Note: This version assumes that each student has at most one failing record per subject (one record per course and no more than one failed offering per subject) and that relevant_subjects has no duplicates. Both can easily be handled using COUNT(DISTINCT) if necessary.
To handle duplicates, this would look like:
SELECT p.name
FROM Relevant_subjects rs JOIN
Courses c
ON c.subject = rs.id JOIN
Course_enrolments ce
ON ce.course = c.id JOIN
Students s
ON s.id = ce.student JOIN
People p
ON p.id = s.id
WHERE ce.mark < 50
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT rs.id) = (SELECT COUNT(DISTINCT rs2.id) FROM relevant_subjects rs2);
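To see why COUNT(DISTINCT) matters, here is a minimal sketch with Python's sqlite3 as a stand-in engine and invented rows. A student who fails two offerings of the same subject satisfies the COUNT(*) version without having failed every subject, while the COUNT(DISTINCT rs.id) version counts failed subjects rather than failing rows.

```python
import sqlite3

# Invented sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (id INTEGER PRIMARY KEY);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, subject INTEGER);
    CREATE TABLE course_enrolments (student INTEGER, course INTEGER, mark INTEGER);
    CREATE TABLE relevant_subjects (id INTEGER PRIMARY KEY);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO students VALUES (1), (2), (3);
    INSERT INTO courses VALUES (1, 100), (2, 100), (3, 200);  -- two offerings of subject 100
    INSERT INTO relevant_subjects VALUES (100), (200);
    INSERT INTO course_enrolments VALUES
        (1, 1, 40), (1, 3, 30),              -- Ann fails 100 and 200
        (2, 1, 45), (2, 2, 35), (2, 3, 70),  -- Bob fails 100 twice, passes 200
        (3, 2, 20);                          -- Cid fails only 100
""")

BASE_SQL = """
    SELECT p.name
    FROM relevant_subjects rs
    JOIN courses c ON c.subject = rs.id
    JOIN course_enrolments ce ON ce.course = c.id
    JOIN students s ON s.id = ce.student
    JOIN people p ON p.id = s.id
    WHERE ce.mark < 50
    GROUP BY p.id, p.name
"""

# COUNT(*) counts failing enrolment rows: Bob's two failures in subject 100 look like two subjects.
count_star = conn.execute(BASE_SQL + """
    HAVING COUNT(*) = (SELECT COUNT(*) FROM relevant_subjects)
    ORDER BY p.name""").fetchall()

# COUNT(DISTINCT rs.id) counts distinct failed subjects.
count_distinct = conn.execute(BASE_SQL + """
    HAVING COUNT(DISTINCT rs.id) = (SELECT COUNT(DISTINCT rs2.id) FROM relevant_subjects rs2)
    ORDER BY p.name""").fetchall()

print(count_star)      # [('Ann',), ('Bob',)]
print(count_distinct)  # [('Ann',)]
```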

Use result of multiple rows to do arithmetic operation

I'm writing a query that multiplies the count I get from a subquery by the fees amount, but I don't know how to do that. Any help or suggestions?
Oracle query is:
select courseid,coursename,fees*tmp
from course c join registration r on
r.courseid=c.courseid
and tmp IN (select count(*)
from course c join registration r on
r.courseid=c.courseid group by coursename);
I tried to use tmp like a variable, but I don't think that works in an Oracle query. Is there an alternative way to do this?
You can't do that, because you can only select data from the tables that appear between FROM and WHERE. The IN operator is a shortcut that saves you from writing a series of OR conditions; it cannot establish a variable in the outer query.
Instead do something like:
select c.courseid, c.coursename, c.fees * COUNT(r.courseid) OVER (PARTITION BY c.coursename)
from course c join registration r on
r.courseid=c.courseid
Edit/update: you noted that this query produces too many rows and you only want to see distinct course names. In that case it would be better to just use the registration table to count the number of people on the course and then multiply the fees:
SELECT
c.courseid, c.coursename, c.fees * COALESCE(r.numberOfstudents, 0) as courseWorth
FROM
course c
LEFT OUTER JOIN
(select courseid, COUNT(*) as numberofstudents FROM registration GROUP BY courseid) r
ON c.courseID = r.courseid
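A minimal sketch of the pre-aggregate-then-join approach, run with Python's sqlite3 rather than Oracle. The regid column and all rows are hypothetical. Counting registrations per courseid first and LEFT JOINing that result gives one row per course, and COALESCE turns "no registrations" into a worth of 0.

```python
import sqlite3

# Hypothetical data; regid and all values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (courseid INTEGER PRIMARY KEY, coursename TEXT, fees INTEGER);
    CREATE TABLE registration (regid INTEGER PRIMARY KEY, courseid INTEGER);
    INSERT INTO course VALUES (1, 'SQL Basics', 100), (2, 'Advanced SQL', 250), (3, 'PL/SQL', 300);
    INSERT INTO registration (courseid) VALUES (1), (1), (1), (2);  -- course 3 has no registrations
""")

worth = conn.execute("""
    SELECT c.courseid, c.coursename,
           c.fees * COALESCE(r.numberofstudents, 0) AS courseWorth
    FROM course c
    LEFT OUTER JOIN (
        -- one row per course that has registrations
        SELECT courseid, COUNT(*) AS numberofstudents
        FROM registration
        GROUP BY courseid
    ) r ON c.courseid = r.courseid
    ORDER BY c.courseid
""").fetchall()

for row in worth:
    print(row)
```

Aggregating before the join keeps the row count at one per course, which is why this avoids the "too many rows" problem of joining registration directly.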
You can use a window function, as Caius did, or you can use a join like this:
select c.courseid, c.coursename, c.fees * COALESCE(sub.cnt, 0)
from course c
join registration r on r.courseid = c.courseid
left join (
select c2.coursename, count(*) as cnt
from course c2
join registration r2 on r2.courseid = c2.courseid
group by c2.coursename
) sub on sub.coursename = c.coursename;
Note: I make no claim that your joins are correct; I'm basing this query on your example, not on any knowledge of your data model.

EXISTS Syntax in sql?

I have two tables:
students(id, name, school_id)
schools(id, name)
I'm trying to use EXISTS in order to learn if there are any students that go to specific school, say "Harvard" for example. I know that EXISTS is used after WHERE but I'm wondering if I can do this:
SELECT EXISTS
(SELECT *
FROM students st, schools sch
WHERE st.school_id=sch.id AND sch.name="Harvard");
Is this query correct? I am working on MySQL Workbench and I don't get an error. But I don't know if it does what it's supposed to do.
If it's not, then what should I change? I just want to know if it's correct and if I can use this syntax in the future.
Note that the desired result is either yes or no (1 or 0).
How do I get this result?
Sorry if my question was unclear, I can edit it again if you still don't understand.
You could just get the count, then you would know if any exist - and if so, how many...
SELECT COUNT(*)
FROM students st, schools sch
WHERE st.school_id = sch.id AND sch.name = 'Harvard';
The EXISTS keyword is normally used as a precondition, e.g. "if not exists, then add this..."
This is a simple Select with a Join.
SELECT *
FROM Students S
JOIN Schools SC
ON S.School_id = SC.id
Where SC.Name = 'Harvard'
This will give you all the rows for the students who go to Harvard, if any. If you want a count, you can use SELECT COUNT(*) instead, or limit which columns are returned by listing specific columns in the SELECT statement.
The queries below give you the names of students who go to Harvard. If you want to see how many students attend Harvard, replace st.name with COUNT(*).
Note that the SELECT list inside EXISTS does not matter, because the subquery only tests whether any row exists (MySQL, for example, ignores that list). Selecting a constant such as 1 therefore performs the same as selecting columns; it simply makes the intent clear.
SELECT
st.name
FROM
students st
WHERE
EXISTS(
SELECT 1 FROM schools s WHERE s.name= 'Harvard' AND s.id = st.school_id)
OR
SELECT
st.name
FROM
students st
INNER JOIN schools s ON
st.school_id = s.id
WHERE
s.name = 'Harvard'
Additional note: the following form yields only a true or false result (1 or 0 in MySQL):
SELECT EXISTS ( <query> )
This means that the query below returns true if any students attend Harvard.
SELECT EXISTS (
SELECT 1
FROM students st
INNER JOIN schools s ON st.school_id = s.id
WHERE s.name = 'Harvard' )
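A minimal sketch of SELECT EXISTS returning a single 1/0 value, run with Python's sqlite3 as a stand-in for MySQL (both report EXISTS as an integer 1 or 0). The rows are invented, and the school name is passed as a bound parameter rather than spliced into the SQL string.

```python
import sqlite3

# Invented sample data; the schema follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, school_id INTEGER);
    INSERT INTO schools VALUES (1, 'Harvard'), (2, 'Yale');
    INSERT INTO students VALUES (1, 'Ann', 1), (2, 'Bob', 1);
""")


def attends(school_name):
    # SELECT EXISTS(...) always yields exactly one row holding 1 or 0.
    (flag,) = conn.execute("""
        SELECT EXISTS (
            SELECT 1
            FROM students st
            INNER JOIN schools s ON st.school_id = s.id
            WHERE s.name = ?
        )
    """, (school_name,)).fetchone()
    return flag


print(attends("Harvard"), attends("Yale"))  # 1 0
```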

SQL: Speed Improvement - Left Join on cond1 or cond2

SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
)
Two tables that are basically the same
I don't have access to the table structure or data input (thus no cleaning up primary keys)
Sometimes the user_id is populated in one and not the other
Sometimes names are equal, sometimes they are not
I've found that I can get most of the data by matching on user_id or on the first/last names. I'm using the ' ' between the names to avoid cases where one user's first name equals another user's last name and both are missing the other field (unlikely, but plausible).
This query runs in 33,000 ms, whereas run individually, each join condition takes about 200 ms.
I've been up late and can't think straight right now
I'm thinking that I could do a UNION and only query by name where a user_id does not exist (the default join is the user_id, if a user_id doesn't exist then I want to join by the name)
Here are some free points for anyone who wants to help.
Please don't ask for the execution plan.
Looks like you can easily avoid the string concatenation:
OR ( a.f_name||' '||a.l_name = b.f_name||' '||b.l_name)
Change it to:
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
Rather than concatenating the first and last names and comparing the results, try comparing them individually. Assuming you have indexes on the first name and last name columns (and you should create them if you don't), this should improve your chances of those indexes being used.
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR (a.f_name = b.f_name and a.l_name = b.l_name)
)
If people's suggestions don't provide a major speed increase, there is a possibility that your real problem is that the best query plan for your two possible join conditions is different. For that situation you would want to do two queries and merge results in some way. This is likely to make your query much, much uglier.
One obscure trick that I have used for that kind of situation is to do a GROUP BY off of a UNION ALL query. The idea looks like this:
SELECT a_field1, a_field2, ...,
MAX(b_field1) as b_field1, MAX(b_field2) as b_field2, ...
FROM (
SELECT a.field1 as a_field1, ..., b.field1 as b_field1, ...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
UNION ALL
SELECT a.field1 as a_field1, ..., b.field1 as b_field1, ...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.f_name = b.f_name AND a.l_name = b.l_name
) u
GROUP BY a_field1, a_field2, ...
And now the database can do each of the two joins using the most efficient plan.
(Warning of a drawback in this approach. If a row in current_tbl joins to multiple rows in import_tbl, then you'll wind up merging data in a very odd way.)
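A concrete, runnable version of the UNION ALL plus GROUP BY trick, sketched with Python's sqlite3. Only user_id, f_name and l_name come from the question; the email column and all rows are hypothetical. With at most one match per row, it returns the same result as the single OR join. Grouping on the current_tbl columns would also merge genuinely duplicate current_tbl rows, which is one more reason to check this against real data.

```python
import sqlite3

# Hypothetical tables: the email column and all rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_tbl (user_id INTEGER, f_name TEXT, l_name TEXT);
    CREATE TABLE import_tbl (user_id INTEGER, f_name TEXT, l_name TEXT, email TEXT);
    INSERT INTO current_tbl VALUES (1, 'Ann', 'Lee'), (NULL, 'Bob', 'Ray'), (3, 'Cid', 'Poe');
    INSERT INTO import_tbl VALUES
        (1, 'Anne', 'Lee', 'ann@example.com'),    -- matches Ann by id only
        (NULL, 'Bob', 'Ray', 'bob@example.com');  -- matches Bob by name only
""")

# Original form: one join with an OR condition.
or_rows = conn.execute("""
    SELECT a.user_id, a.f_name, a.l_name, b.email
    FROM current_tbl a
    LEFT JOIN import_tbl b
      ON a.user_id = b.user_id OR (a.f_name = b.f_name AND a.l_name = b.l_name)
    ORDER BY a.f_name""").fetchall()

# Split form: each branch has a single join condition; GROUP BY folds the two
# branch rows for each current_tbl row together, and MAX() ignores the NULL
# coming from the branch that found no match.
union_rows = conn.execute("""
    SELECT a_user_id, a_f_name, a_l_name, MAX(b_email) AS b_email
    FROM (
        SELECT a.user_id AS a_user_id, a.f_name AS a_f_name, a.l_name AS a_l_name,
               b.email AS b_email
        FROM current_tbl a
        LEFT JOIN import_tbl b ON a.user_id = b.user_id
        UNION ALL
        SELECT a.user_id, a.f_name, a.l_name, b.email
        FROM current_tbl a
        LEFT JOIN import_tbl b ON a.f_name = b.f_name AND a.l_name = b.l_name
    ) u
    GROUP BY a_user_id, a_f_name, a_l_name
    ORDER BY a_f_name""").fetchall()

print(union_rows == or_rows)  # True
```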
Incidental random performance tip. Unless you have reason to believe that there are potential duplicate rows, avoid DISTINCT. It forces an implicit GROUP BY, which can be expensive.
I don't really understand why you're concatenating those strings. Seems like that's where your slowdown would be. Does this work instead?
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name AND a.l_name = b.l_name)
)
Here is Yet Another Ugly Way To Do It.
SELECT a.*
, CASE WHEN b.user_id IS NULL THEN c.field1 ELSE b.field1 END as b_field1
, CASE WHEN b.user_id IS NULL THEN c.field2 ELSE b.field2 END as b_field2
...
FROM current_tbl a
LEFT JOIN import_tbl b
ON a.user_id = b.user_id
LEFT JOIN import_tbl c
ON a.f_name = c.f_name AND a.l_name = c.l_name;
This avoids any GROUP BY, and also handles conflicting matches in a somewhat reasonable way.
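A minimal sketch of the two-LEFT-JOIN CASE approach, run with Python's sqlite3. The email column and all rows are hypothetical; import row 9 is invented to create a conflict where Ann matches one import row by id and a different one by name. The CASE prefers the id match, whereas the OR join returns Ann once per matching row. If the name join (c) finds several rows, this form also multiplies rows.

```python
import sqlite3

# Hypothetical tables and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_tbl (user_id INTEGER, f_name TEXT, l_name TEXT);
    CREATE TABLE import_tbl (user_id INTEGER, f_name TEXT, l_name TEXT, email TEXT);
    INSERT INTO current_tbl VALUES (1, 'Ann', 'Lee'), (NULL, 'Bob', 'Ray'), (3, 'Cid', 'Poe');
    INSERT INTO import_tbl VALUES
        (1, 'Anne', 'Lee', 'ann@example.com'),    -- matches Ann by id
        (9, 'Ann', 'Lee', 'other@example.com'),   -- matches Ann by name (conflict)
        (NULL, 'Bob', 'Ray', 'bob@example.com');  -- matches Bob by name only
""")

# b = match by user_id, c = match by name; the CASE uses b whenever b found a row.
case_rows = conn.execute("""
    SELECT a.user_id, a.f_name, a.l_name,
           CASE WHEN b.user_id IS NULL THEN c.email ELSE b.email END AS b_email
    FROM current_tbl a
    LEFT JOIN import_tbl b ON a.user_id = b.user_id
    LEFT JOIN import_tbl c ON a.f_name = c.f_name AND a.l_name = c.l_name
    ORDER BY a.f_name""").fetchall()

# For comparison: the OR join matches Ann against both conflicting import rows.
(or_matches_for_ann,) = conn.execute("""
    SELECT COUNT(*)
    FROM current_tbl a
    JOIN import_tbl b
      ON a.user_id = b.user_id OR (a.f_name = b.f_name AND a.l_name = b.l_name)
    WHERE a.f_name = 'Ann'""").fetchone()

for row in case_rows:
    print(row)
print(or_matches_for_ann)  # 2
```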
Try using JOIN hints:
http://msdn.microsoft.com/en-us/library/ms173815.aspx
We were encountering the same type of behavior with one of our queries. As a last resort we added the LOOP hint, and the query ran much, much faster.
It's important to note that Microsoft says this about JOIN hints:
Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that hints, including <join_hint>, be used only as a last resort by experienced developers and database administrators.
My boss at my last job... I swear, he thought that using UNION was ALWAYS FASTER THAN OR.
For example, instead of writing
Select * from employees Where Employee_id = 12 or employee_id = 47
he would write (and have me write)
Select * from employees Where employee_id = 12
UNION
Select * from employees Where employee_id = 47
The SQL Server optimizer said that this was the right thing to do in SOME situations. I have a friend who works on the SQL Server team at Microsoft; I emailed him about this and he told me that my statistics were out of date, or something along those lines.
I never really got a good answer on WHY the unions are faster, it seems REALLY counter-intuitive.
I'm not recommending you DO this, but in some situations it can help.
Also, two more things: GET RID OF THE DISTINCT CLAUSE unless you absolutely need it.
More importantly, you can easily get rid of the concatenation in your join, for example like this (pardon my lack of MySQL knowledge):
SELECT DISTINCT a.*, b.*
FROM current_tbl a
LEFT JOIN import_tbl b
ON ( a.user_id = b.user_id
OR ( a.f_name = b.f_name and a.l_name = b.l_name)
)
I've run tests at work in a similar situation that showed a 10x performance improvement from getting rid of the simple concatenation in the join.