Is there a more elegant way of writing this SQL query?

I'm doing Stanford's Introduction to DB course, and this is one of the homework assignments. My code does the job, but I don't like how I reused the same SELECT-FROM-JOIN part twice:
SELECT name, grade
FROM Highschooler
WHERE
ID IN (
SELECT H1.ID
FROM Friend
JOIN Highschooler AS H1
ON Friend.ID1 = H1.ID
JOIN Highschooler AS H2
ON Friend.ID2 = H2.ID
WHERE H1.grade = H2.grade
) AND
ID NOT IN (
SELECT H1.ID
FROM Friend
JOIN Highschooler AS H1
ON Friend.ID1 = H1.ID
JOIN Highschooler AS H2
ON Friend.ID2 = H2.ID
WHERE H1.grade <> H2.grade
)
ORDER BY grade, name
This is the SQL schema for the two tables used in the code:
Highschooler(ID int, name text, grade int);
Friend(ID1 int, ID2 int);
I had to query all the Highschoolers who only have friends in the same grade, and not in any other grade. Is there a way to write the code below only once and reuse it for the two different WHERE clauses (= and <>)?
SELECT H1.ID
FROM Friend
JOIN Highschooler AS H1
ON Friend.ID1 = H1.ID
JOIN Highschooler AS H2
ON Friend.ID2 = H2.ID
EDIT: We are required to provide SQLite code.

This is a "poster child" example for the WHERE EXISTS query:
SELECT name, grade
FROM Highschooler ME
WHERE EXISTS (
SELECT 1
FROM Friend F
JOIN Highschooler OTHER on F.ID2=OTHER.ID
WHERE F.ID1=ME.ID AND OTHER.Grade = ME.GRADE
)
AND NOT EXISTS (
SELECT 1
FROM Friend F
JOIN Highschooler OTHER on F.ID2=OTHER.ID
WHERE F.ID1=ME.ID AND OTHER.Grade <> ME.GRADE
)
An EXISTS condition is true if its SELECT returns one or more rows; otherwise, it is false. All you need to do is correlate the inner subquery with the outer one (the F.ID1=ME.ID part) and add the remaining constraints you need (OTHER.Grade = ME.GRADE or OTHER.Grade <> ME.GRADE).
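To see the EXISTS / NOT EXISTS pattern in action, here is a minimal sketch that runs it in SQLite through Python's built-in `sqlite3` module. The four students and their friendships are hypothetical sample data, and, like the Stanford dataset, the sketch assumes each friendship is stored in both directions. Cal's only friend is in a different grade and Ben has a friend in grade 10, so only Amy and Dee should qualify.

```python
import sqlite3

# Hypothetical sample data; Friend stores every friendship in both directions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Highschooler(ID int, name text, grade int);
CREATE TABLE Friend(ID1 int, ID2 int);
INSERT INTO Highschooler VALUES (1, 'Amy', 9), (2, 'Ben', 9), (3, 'Cal', 10), (4, 'Dee', 9);
INSERT INTO Friend VALUES (1, 2), (2, 1), (2, 3), (3, 2), (1, 4), (4, 1);
""")

QUERY = """
SELECT name, grade
FROM Highschooler ME
WHERE EXISTS (            -- at least one friend in my grade
    SELECT 1
    FROM Friend F
    JOIN Highschooler OTHER ON F.ID2 = OTHER.ID
    WHERE F.ID1 = ME.ID AND OTHER.grade = ME.grade
)
AND NOT EXISTS (          -- no friend in any other grade
    SELECT 1
    FROM Friend F
    JOIN Highschooler OTHER ON F.ID2 = OTHER.ID
    WHERE F.ID1 = ME.ID AND OTHER.grade <> ME.grade
)
ORDER BY grade, name
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Amy', 9), ('Dee', 9)]
```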

This is a typical type of question about groups related to an individual. When you are faced with such a question, one approach is to use joins (looking at things in pairs). Often a better approach is to use aggregation to look at the entire group at once.
The insight here is that if you have a group of friends and all are in the same grade, then the minimum and maximum grades will be the same.
That hint might be enough for you to write the query. If so, stop here.
The query that returns what you want is much simpler than what you were doing. You just need to look at the friends' grades:
SELECT f.id1
FROM Friend f JOIN
Highschooler fh
ON f.ID2 = fh.ID
GROUP BY f.id1
HAVING max(fh.grade) = min(fh.grade)
The HAVING clause ensures that the friends' grades are all the same (ignoring NULL values).
EDIT:
This version answers the question "Which highschoolers have friends who are all in the same grade?" Your question is ambiguous. Perhaps you mean that the friends and the original person are all in the same grade. If so, a small modification handles that. One way is to change the HAVING clause to:
having max(fh.grade) = min(fh.grade) and
max(fh.grade) = (select grade from Highschooler h where h.id = f.id1)
This checks that the friends and original person are all in the same grade.
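The difference between the two readings of the question is easiest to see on data. This is a minimal sketch using Python's `sqlite3` and the same hypothetical students as before: Cal (grade 10) has a single friend in grade 9, so his friends are "all in the same grade" but not in *his* grade. The first query keeps him; the version with the extra HAVING condition drops him.

```python
import sqlite3

# Hypothetical sample data; Friend stores every friendship in both directions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Highschooler(ID int, name text, grade int);
CREATE TABLE Friend(ID1 int, ID2 int);
INSERT INTO Highschooler VALUES (1, 'Amy', 9), (2, 'Ben', 9), (3, 'Cal', 10), (4, 'Dee', 9);
INSERT INTO Friend VALUES (1, 2), (2, 1), (2, 3), (3, 2), (1, 4), (4, 1);
""")

# Friends all share one grade (min == max over each person's friends).
FRIENDS_ALL_SAME = """
SELECT f.ID1
FROM Friend f
JOIN Highschooler fh ON f.ID2 = fh.ID      -- fh is the friend
GROUP BY f.ID1
HAVING max(fh.grade) = min(fh.grade)
ORDER BY f.ID1
"""

# Same, plus that shared grade must equal the person's own grade.
ALSO_OWN_GRADE = """
SELECT f.ID1
FROM Friend f
JOIN Highschooler fh ON f.ID2 = fh.ID
GROUP BY f.ID1
HAVING max(fh.grade) = min(fh.grade)
   AND max(fh.grade) = (SELECT h.grade FROM Highschooler h WHERE h.ID = f.ID1)
ORDER BY f.ID1
"""

friends_only = [row[0] for row in conn.execute(FRIENDS_ALL_SAME)]
with_self = [row[0] for row in conn.execute(ALSO_OWN_GRADE)]
print(friends_only)  # [1, 3, 4]
print(with_self)     # [1, 4]
```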

Sometimes you can get a more natural query shape by turning some filtering joins into set operations like UNION or MINUS/EXCEPT. Your query could, for example, be written as:
SELECT H.ID
FROM Highschooler H
JOIN Friend F ON F.ID1 = H.ID          -- has a friend
JOIN Highschooler O ON F.ID2 = O.ID
WHERE O.grade = H.grade                -- in the SAME grade
EXCEPT
SELECT H.ID
FROM Highschooler H
JOIN Friend F ON F.ID1 = H.ID          -- has a friend
JOIN Highschooler O ON F.ID2 = O.ID
WHERE O.grade <> H.grade               -- in an OTHER grade
Some SQL engines use the keyword MINUS, others use EXCEPT (SQLite uses EXCEPT).
Note that, much like UNION, this executes both queries and then filters their results. Its performance can differ from a single do-it-all query, but it is not necessarily worse. I often find it performs even better, because EXCEPT over a single column, especially a sorted one, is very fast.
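As a minimal sketch of the EXCEPT approach, the following runs it in SQLite via Python's `sqlite3` on hypothetical sample data. The first half collects everyone with at least one same-grade friend, the second half everyone with at least one other-grade friend, and EXCEPT keeps only the IDs in the first set but not the second (removing duplicates along the way).

```python
import sqlite3

# Hypothetical sample data; Friend stores every friendship in both directions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Highschooler(ID int, name text, grade int);
CREATE TABLE Friend(ID1 int, ID2 int);
INSERT INTO Highschooler VALUES (1, 'Amy', 9), (2, 'Ben', 9), (3, 'Cal', 10), (4, 'Dee', 9);
INSERT INTO Friend VALUES (1, 2), (2, 1), (2, 3), (3, 2), (1, 4), (4, 1);
""")

QUERY = """
SELECT H.ID
FROM Highschooler H
JOIN Friend F ON F.ID1 = H.ID
JOIN Highschooler O ON F.ID2 = O.ID
WHERE O.grade = H.grade          -- has a friend in the SAME grade
EXCEPT
SELECT H.ID
FROM Highschooler H
JOIN Friend F ON F.ID1 = H.ID
JOIN Highschooler O ON F.ID2 = O.ID
WHERE O.grade <> H.grade         -- has a friend in an OTHER grade
ORDER BY 1                       -- ORDER BY applies to the whole compound
"""

ids = [row[0] for row in conn.execute(QUERY)]
print(ids)  # [1, 4]
```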
Also, if your DB engine permits, you might try using a view or a CTE to shorten your original query, but I don't see much point in doing so, except for aesthetics.
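Since the original question asks how to write the SELECT-FROM-JOIN part only once, here is a minimal sketch of the CTE variant, run in SQLite (which supports WITH) through Python's `sqlite3`. The CTE name `Pairs` and its column names are hypothetical; the pair join is defined once and then filtered twice, with `=` and `<>`. The sample data is also hypothetical.

```python
import sqlite3

# Hypothetical sample data; Friend stores every friendship in both directions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Highschooler(ID int, name text, grade int);
CREATE TABLE Friend(ID1 int, ID2 int);
INSERT INTO Highschooler VALUES (1, 'Amy', 9), (2, 'Ben', 9), (3, 'Cal', 10), (4, 'Dee', 9);
INSERT INTO Friend VALUES (1, 2), (2, 1), (2, 3), (3, 2), (1, 4), (4, 1);
""")

QUERY = """
WITH Pairs AS (               -- the shared join, written once
    SELECT H1.ID AS id, H1.grade AS grade1, H2.grade AS grade2
    FROM Friend
    JOIN Highschooler AS H1 ON Friend.ID1 = H1.ID
    JOIN Highschooler AS H2 ON Friend.ID2 = H2.ID
)
SELECT name, grade
FROM Highschooler
WHERE ID IN (SELECT id FROM Pairs WHERE grade1 = grade2)
  AND ID NOT IN (SELECT id FROM Pairs WHERE grade1 <> grade2)
ORDER BY grade, name
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Amy', 9), ('Dee', 9)]
```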

Some databases support the MINUS keyword.
select whatever
from wherever
where id in
(select id
from somewhere
where something
minus
select id
from somewhere
where something else
)
Other databases support the same concept, but with the keyword EXCEPT instead of MINUS.

Related

Use result of multiple rows to do arithmetic operation

I'm writing a query to multiply the count I get from a subquery by the fees amount, but I don't know how to do that. Any help or suggestions?
Oracle query is:
select courseid,coursename,fees*tmp
from course c join registration r on
r.courseid=c.courseid
and tmp IN (select count(*)
from course c join registration r on
r.courseid=c.courseid group by coursename);
I tried to use tmp like a variable, but I don't think that works in an Oracle query. Is there an alternative way to do this?
You can't do that, because you can only select data from tables that appear between FROM and WHERE. The IN operator is a shortcut that saves you from writing a bunch of OR conditions; it cannot establish a variable in the outer query.
Instead do something like:
select c.courseid, c.coursename, c.fees * COUNT(r.courseid) OVER (PARTITION BY c.coursename)
from course c join registration r on
r.courseid=c.courseid
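The window-function version can be tried without an Oracle instance. This is a minimal sketch run in SQLite via Python's `sqlite3` (window functions need SQLite 3.25 or later); the table layouts and rows are hypothetical. It shows that each course is multiplied by its own registration count, but also that the result has one row per registration, and that a course with no registrations disappears because of the inner join.

```python
import sqlite3

# Hypothetical schema and data; course 3 has no registrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course(courseid int, coursename text, fees int);
CREATE TABLE registration(courseid int, studentid int);
INSERT INTO course VALUES (1, 'SQL', 100), (2, 'Python', 150), (3, 'Rust', 300);
INSERT INTO registration VALUES (1, 10), (1, 11), (2, 12);
""")

QUERY = """
SELECT c.courseid, c.coursename,
       c.fees * COUNT(r.courseid) OVER (PARTITION BY c.coursename) AS worth
FROM course c
JOIN registration r ON r.courseid = c.courseid
ORDER BY c.courseid
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'SQL', 200), (1, 'SQL', 200), (2, 'Python', 150)]
```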
Edit/update: you noted that this query produces too many rows and that you only want to see distinct course names. In that case, it would be better to use just the registration table to count the number of people on each course and then multiply by the fees:
SELECT
c.courseid, c.coursename, c.fees * COALESCE(r.numberOfstudents, 0) as courseWorth
FROM
course c
LEFT OUTER JOIN
(select courseid, COUNT(*) as numberofstudents FROM registration GROUP BY courseid) r
ON c.courseID = r.courseid
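Here is a minimal sketch of the pre-aggregated LEFT JOIN, again in SQLite through Python's `sqlite3` with hypothetical data rather than Oracle. Counting in the derived table first yields exactly one row per course, and COALESCE turns the missing count for a course without registrations into 0 instead of NULL.

```python
import sqlite3

# Hypothetical schema and data; course 3 has no registrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course(courseid int, coursename text, fees int);
CREATE TABLE registration(courseid int, studentid int);
INSERT INTO course VALUES (1, 'SQL', 100), (2, 'Python', 150), (3, 'Rust', 300);
INSERT INTO registration VALUES (1, 10), (1, 11), (2, 12);
""")

QUERY = """
SELECT c.courseid, c.coursename,
       c.fees * COALESCE(r.numberofstudents, 0) AS courseWorth
FROM course c
LEFT OUTER JOIN (
    -- one row per course that has registrations
    SELECT courseid, COUNT(*) AS numberofstudents
    FROM registration
    GROUP BY courseid
) r ON c.courseid = r.courseid
ORDER BY c.courseid
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'SQL', 200), (2, 'Python', 150), (3, 'Rust', 0)]
```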
You can use a window function as Caius did, or you can use a join like this:
select c.courseid, c.coursename, c.fees * COALESCE(sub.cnt, 0)
from course c
join registration r on r.courseid = c.courseid
left join (
select c2.coursename, count(*) as cnt
from course c2
join registration r2 on r2.courseid = c2.courseid
group by c2.coursename
) sub on sub.coursename = c.coursename;
Note: I make no claim that your joins are correct. I'm basing this query on your example, not on any knowledge of your data model.

Self join using an additional table in a subquery not selected in the outer select statement

I'm new to SQL, which might explain why I can't figure this out at all.
I have two tables from a theoretical social network, one named 'highschooler' and one named 'likes'. 'highschooler' has info like id (PK), name, and grade, whereas 'likes' has id1 and id2, which indicate which IDs from highschooler have 'liked' each other, with id1 'liking' id2. Likes are not necessarily mutual.
Query: For every student who likes someone 2 or more grades younger than themselves, return that student's name and grade, and the name and grade of the student they like.
This:
select h1.name, h1.grade, h2.name, h2.grade
from highschooler h1 inner join highschooler h2
where h1.id in (select id1 from likes where h1.id=id1 and id1<>id2 and h1.grade>h2.grade+2);
is not returning the correct result. In fact the correct result isn't even in the result I get from this query. I'm still trying to wrap my head around "join on" which I feel would be useful here but I don't know how that would work in a self join.
As much as I want to know the answer, more than anything I'd like to know how to do something like this in the future. I can't find much information about queries like this, because self-join examples only demonstrate a single self join and nothing more complicated.
Any help would be appreciated.
P.S. This is being run in SQLite.
Give this a try:
select h1.name, h1.grade, h2.name, h2.grade from highschooler h1
join likes l on (h1.id = l.id1)
join highschooler h2 on (l.id2 = h2.id)
where h2.grade <= h1.grade-2
Let me know if anything fails :)
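Here is a minimal sketch that runs this join in SQLite via Python's `sqlite3` on hypothetical data. Likes acts as the bridge between the two copies of highschooler: h1 is the liker (id1) and h2 is the liked student (id2). The sketch also shows why the strict comparison from the question's attempt, `h1.grade > h2.grade + 2`, misses pairs that are exactly two grades apart.

```python
import sqlite3

# Hypothetical data: Ann (12) likes Bob (10), Cy (11) likes Di (9), etc.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE highschooler(id int, name text, grade int);
CREATE TABLE likes(id1 int, id2 int);
INSERT INTO highschooler VALUES (1, 'Ann', 12), (2, 'Bob', 10), (3, 'Cy', 11), (4, 'Di', 9);
INSERT INTO likes VALUES (1, 2), (1, 3), (3, 4), (4, 1), (2, 4);
""")

BASE = """
SELECT h1.name, h1.grade, h2.name, h2.grade
FROM highschooler h1
JOIN likes l ON h1.id = l.id1          -- h1 is the one who likes
JOIN highschooler h2 ON l.id2 = h2.id  -- h2 is the one who is liked
WHERE {condition}
ORDER BY h1.name, h2.name
"""

pairs = conn.execute(BASE.format(condition="h2.grade <= h1.grade - 2")).fetchall()
strict = conn.execute(BASE.format(condition="h1.grade > h2.grade + 2")).fetchall()
print(pairs)   # [('Ann', 12, 'Bob', 10), ('Cy', 11, 'Di', 9)]
print(strict)  # []  (requires 3 or more grades of difference)
```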
Try the query below:
SELECT h1.name, h1.grade, h2.name, h2.grade
FROM highschooler h1
INNER JOIN likes l ON h1.id=l.id1
INNER JOIN highschooler h2 ON h2.id=l.id2
WHERE l.id1 != l.id2 AND h1.grade >= h2.grade + 2
Not 10 damn minutes after posting this, I figured it out by screwing around with some inner joins. I'd been working on this for about a day:
select h1.name, h1.grade, h2.name, h2.grade
from highschooler h1
inner join likes l on h1.id = l.id1 and id1<>id2
inner join highschooler h2 on h2.id=id2
where h1.grade >= h2.grade+2

Questions about my SQL queries in SQL exercise (Unable to find the column)

I am doing some SQL exercises and the requirement is as follows:
For every situation where student A likes student B, but we have no information about whom B likes (that is, B does not appear as an ID1 in the Likes table), return A and B's names and grades.
The schema is described here: https://lagunita.stanford.edu/c4x/DB/SQL/asset/socialdata.html
My query is as follows:
select
h1.name, h1.grade, h2.name, h2.grade
from
highschooler h1, highschooler h2, likes l1, likes l2
where
h1.id in (select id1
from l1
where not exists (select id2 from l1 where id2 = id1))
and h2.id in (select id2
from l2
where not exists (select id2 from l2 where id2 = id1))
The SQL runs on an HTML-based server, and it keeps telling me it is unable to find column l1. Is there a logical error in my code, and can someone tell me what's wrong with it? Thanks!
You would be better off working with joins. Give something like this a shot:
Select HS1.ID, HS1.Name, HS1.grade,
LI1.ID1, LI1.ID2,
HS2.ID, HS2.Name, HS2.Grade
from Highschooler HS1
join Likes LI1 on HS1.ID = LI1.ID1
JOIN Highschooler HS2 on LI1.ID2 = HS2.id
LEFT JOIN Likes LI2 on HS2.ID = LI2.ID1
where LI2.ID1 IS NULL
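Basically, you just join these tables, and the last two lines (the LEFT JOIN on Likes and the IS NULL check) give you what you asked for: "but we have no information about whom B likes (that is, B does not appear as an ID1 in the Likes table)". Your original error comes from "from l1": l1 is an alias defined in the outer query, not a table, so a subquery cannot select FROM it. Take a look at joins and adapt this query; I only glanced at the problem to point you in the right direction.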
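The LEFT JOIN ... IS NULL trick is an anti-join: it keeps only the rows for which no matching Likes row exists for B. Here is a minimal sketch in SQLite via Python's `sqlite3` with hypothetical data, checked against the equivalent NOT EXISTS formulation (which is what the question was reaching for, with a correlated subquery over the Likes table rather than the alias l1).

```python
import sqlite3

# Hypothetical data: Ann likes Bob, Bob likes Cy, Cy likes nobody.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Highschooler(ID int, name text, grade int);
CREATE TABLE Likes(ID1 int, ID2 int);
INSERT INTO Highschooler VALUES (1, 'Ann', 9), (2, 'Bob', 10), (3, 'Cy', 11);
INSERT INTO Likes VALUES (1, 2), (2, 3);
""")

LEFT_JOIN = """
SELECT HS1.name, HS1.grade, HS2.name, HS2.grade
FROM Highschooler HS1
JOIN Likes LI1 ON HS1.ID = LI1.ID1
JOIN Highschooler HS2 ON LI1.ID2 = HS2.ID
LEFT JOIN Likes LI2 ON HS2.ID = LI2.ID1   -- whom does B like?
WHERE LI2.ID1 IS NULL                     -- nobody: B never appears as ID1
"""

NOT_EXISTS = """
SELECT A.name, A.grade, B.name, B.grade
FROM Likes L
JOIN Highschooler A ON A.ID = L.ID1
JOIN Highschooler B ON B.ID = L.ID2
WHERE NOT EXISTS (SELECT 1 FROM Likes L2 WHERE L2.ID1 = L.ID2)
"""

left_join_rows = conn.execute(LEFT_JOIN).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()
print(left_join_rows)  # [('Bob', 10, 'Cy', 11)]
```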

Questions about SQL query

I need to select the exam results of students in class 7A, but I need to look into another table (student_profile) to identify the students in 7A (identified by student_id).
I wonder which of the following methods will be faster, assuming an index on student_id exists in both tables:
Method 1:
select * from exam_results r
where exists
(select 1
from student_profile p
where p.student_id = r.student_id
and p.class = '7A')
Method 2:
select * from exam_results
where student_id in
(select student_id
from student_profile
where class = '7A')
Thanks in Advance,
Jonathan
Short answer: it doesn't matter. The query engine will treat them the same.
Personally, I'd consider this syntax:
select
r.*
from
exam_results r
join
student_profile p
on p.student_id = r.student_id
where
p.class = '7A'
The INNER keyword is implicit if omitted.
You'll get the same performance because modern query engines are well developed, but I think this standard syntax is more extensible and easier to read.
If you extend this query in the future, multiple join conditions will be easier to optimize than multiple EXISTS or IN clauses.
If you compare the two queries, the one with EXISTS is faster. However, the right (and usually faster) approach to such problems is a JOIN.
select r.student_id, r.other_columns
from exam_results r
inner join student_profile s
on r.student_id = s.student_id
where s.class = '7A'
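Whichever is fastest on a given engine, the three forms should return the same rows. Here is a minimal sketch in SQLite via Python's `sqlite3` that checks this on hypothetical data. Note the assumption that student_id is unique in student_profile: if a student had several profile rows, the JOIN would repeat that student's exam results, while EXISTS and IN would not.

```python
import sqlite3

# Hypothetical data; student_id is assumed unique in student_profile.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_profile(student_id int, class text);
CREATE TABLE exam_results(student_id int, subject text, score int);
CREATE INDEX idx_profile ON student_profile(student_id);
CREATE INDEX idx_results ON exam_results(student_id);
INSERT INTO student_profile VALUES (1, '7A'), (2, '7B'), (3, '7A');
INSERT INTO exam_results VALUES (1, 'math', 90), (2, 'math', 80), (3, 'math', 70), (1, 'art', 85);
""")

VIA_EXISTS = """
SELECT * FROM exam_results r
WHERE EXISTS (SELECT 1 FROM student_profile p
              WHERE p.student_id = r.student_id AND p.class = '7A')
ORDER BY r.student_id, r.subject
"""
VIA_IN = """
SELECT * FROM exam_results
WHERE student_id IN (SELECT student_id FROM student_profile WHERE class = '7A')
ORDER BY student_id, subject
"""
VIA_JOIN = """
SELECT r.* FROM exam_results r
JOIN student_profile p ON p.student_id = r.student_id
WHERE p.class = '7A'
ORDER BY r.student_id, r.subject
"""

via_exists = conn.execute(VIA_EXISTS).fetchall()
via_in = conn.execute(VIA_IN).fetchall()
via_join = conn.execute(VIA_JOIN).fetchall()
print(via_join)  # [(1, 'art', 85), (1, 'math', 90), (3, 'math', 70)]
```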

Simple SQL question with subqueries

I'm refreshing my SQL with the online Stanford database class exercises, found here. Here is the problem:
"Find names and grades of students who only have friends in the same
grade. Return the result sorted by grade, then by name within each
grade."
We have a highschooler table, with the attributes name, grade, id. Also, the likes table has attributes id1 and id2. id1 and id2 in likes correspond to id in highschooler.
Based on the problem section this comes from, I can tell that I'll need to use subqueries, but I'm not sure where. How should I approach this problem? None of the currently suggested solutions work.
Here is my current SQL statement, that is not working correctly (ignoring sorting):
select distinct
student1.id,
student1.name,
student1.grade
from
highschooler student1,
highschooler student2,
friend
where not exists (select *
from friend
where student1.id = id1
and student2.id = id2
and student1.grade = student2.grade
and student1.id <> student2.id);
I assumed that if A is B's friend, then B is also A's friend.
CREATE VIEW Temp
AS
SELECT id,name,grade,id2,[grd2] FROM highschooler
INNER JOIN Likes ON highschooler.id = Likes.id1
INNER JOIN (SELECT id as [id2t], grade as [grd2] from highschooler) a ON a.id2t = Likes.id2
UNION ALL
SELECT id,name,grade,[id1] as [id2],[grd2] FROM highschooler
INNER JOIN Likes ON highschooler.id = Likes.id2
INNER JOIN (SELECT id as [id2t], grade as [grd2] from highschooler) a ON a.id2t = Likes.id1
The Temp view gives me all the info I need.
CREATE VIEW PlayWithClassMate
AS
SELECT distinct id FROM Temp WHERE grade = grd2
The PlayWithClassMate view gives me all the students who play with at least one classmate (I think a person can also play with friends who are not their classmates).
CREATE VIEW IDResult
AS
SELECT id FROM (
SELECT id, COUNT(GRD2) as c FROM TEMP
WHERE id in (SELECT id FROM PlayWithClassMate)
GROUP BY ID) A
WHERE C>1
The IDResult view has all the IDs the question asks for.
Now select whatever you need, where the ID is in IDResult.
I don't think it's the best approach, and it may even be the worst, but it works.
(Sorry about the terrible grammar.)
This is harder than it looks, because it requires preparing sets sequentially. But there are a few ways to solve it. Here's what quickly comes to mind:
First, find the friend-of-friend for everybody, by grade, producing something like:
[ID], [FoF ID], [Grade of FoF]
You really don't need [FoF ID], but it might help when debugging.
Then, as a second-order operation, you'll need to produce a list of [ID]s where [Grade of FoF] is equal to both the MAX() and MIN():
SELECT [ID], MAX([Grade of FoF]) AS A, MIN([Grade of FoF]) AS B FROM [the above] GROUP BY [ID] HAVING MAX([Grade of FoF]) = MIN([Grade of FoF])
UPDATE:
I realized I should also add A=Grade to the final query, so the condition becomes A=B AND A=Grade. With that, this solution works. Keep in mind that it only answers the question "Find names and grades of students who only have friends in the same grade," and it assumes friendship is one-directional. (Sorry, I had to leave something undone.)
For those who need to see some SQL, here you are. It's written for MS Access, but can easily be ported to MySQL, PGSQL, or Oracle (start by removing the "()" in the innermost query). Better still, it needs no procedural extensions and no temp tables.
SELECT
name
FROM
(
SELECT
ID
,name
,grade
,min( friend_grade) as min_friend_grade
,max( friend_grade) as max_friend_grade
FROM
(
SELECT
hs1.ID
,hs1.name
,hs1.grade
,l.ID2 as friend_id
,hs2.name as friend_name
,hs2.grade as friend_grade
FROM
( highschooler hs1
INNER JOIN likes l ON (hs1.ID = l.ID1) )
INNER JOIN highschooler hs2 ON (l.ID2 = hs2.ID)
)FoF
GROUP BY
ID
,name
,grade
)FoF_max_min
WHERE
grade=min_friend_grade
AND min_friend_grade=max_friend_grade
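As an illustration of the port, here is a minimal sketch of the same query running in SQLite via Python's `sqlite3`, with the Access-style parentheses around the first join removed and an ORDER BY added to match the assignment. The data is hypothetical, with each friendship stored in both directions in likes. Cal's friends are all in grade 9 but he is in grade 10, so the `grade = min_friend_grade` condition excludes him.

```python
import sqlite3

# Hypothetical data; likes stores each friendship in both directions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE highschooler(ID int, name text, grade int);
CREATE TABLE likes(ID1 int, ID2 int);
INSERT INTO highschooler VALUES (1, 'Amy', 9), (2, 'Ben', 9), (3, 'Cal', 10), (4, 'Dee', 9);
INSERT INTO likes VALUES (1, 2), (2, 1), (2, 3), (3, 2), (1, 4), (4, 1);
""")

QUERY = """
SELECT name
FROM (
    SELECT ID, name, grade,
           min(friend_grade) AS min_friend_grade,
           max(friend_grade) AS max_friend_grade
    FROM (
        SELECT hs1.ID, hs1.name, hs1.grade,
               l.ID2 AS friend_id,
               hs2.name AS friend_name,
               hs2.grade AS friend_grade
        FROM highschooler hs1
        INNER JOIN likes l ON hs1.ID = l.ID1          -- Access parentheses removed
        INNER JOIN highschooler hs2 ON l.ID2 = hs2.ID
    ) FoF
    GROUP BY ID, name, grade
) FoF_max_min
WHERE grade = min_friend_grade
  AND min_friend_grade = max_friend_grade
ORDER BY grade, name
"""

names = [row[0] for row in conn.execute(QUERY)]
print(names)  # ['Amy', 'Dee']
```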