I'm trying to solve a seemingly simple problem, but I think I'm tripping over my understanding of how the EXISTS keyword works. The problem is simple (this is a dumbed-down version of the actual problem): I have a table of students and a table of hobbies. The students table has each student's ID and name. Return only the students who share their number of hobbies with at least one other student (i.e. students with a unique number of hobbies would not be shown).
So the difficulty I run into is working out how to compare the hobby counts. Here is what I have tried:
SELECT sa.studentnum, COUNT(ha.hobbynum)
FROM student sa, hobby ha
WHERE sa.studentnum = ha.studentnum
AND EXISTS (SELECT *
FROM student sb, hobby hb
WHERE sb.studentnum = hb.studentnum
AND sa.studentnum != sb.studentnum
HAVING COUNT(ha.hobbynum) = COUNT(hb.hobbynum)
)
GROUP BY sa.studentnum
ORDER BY sa.studentnum;
What appears to be happening is that the hobby counts compare as equal on every test, so the entire original table is returned instead of only the students who have the same number of hobbies as someone else.
Not tested, but maybe something like this (if I understand the problem correctly):
WITH h AS (
SELECT studentnum, COUNT(hobbynum) OVER (PARTITION BY studentnum) student_hobby_ct
FROM hobby)
SELECT DISTINCT h1.studentnum, h1.student_hobby_ct
FROM h h1 JOIN h h2 ON h1.student_hobby_ct = h2.student_hobby_ct AND
h1.studentnum <> h2.studentnum;
I think your query would return only students who have at least one other student with the same number of hobbies, but you're not returning anything about the students they match. Is that intentional? I'd treat both queries as subqueries and aggregate before joining on the counts. You could do several things; here it returns the number of students with a matching hobby count, but you could add HAVING COUNT(DISTINCT sb.studentnum) = 0 to get the result your query seemed to return...
with xx as
(SELECT sa.studentnum, count(ha.hobbynum) hobbycount
FROM student sa inner join hobby ha
on sa.studentnum = ha.studentnum
group by sa.studentnum
)
select sa.studentnum, sa.hobbycount, count(distinct sb.studentnum) as matchcount
from
xx sa inner join xx sb on
sa.hobbycount = sb.hobbycount
where
sa.studentnum != sb.studentnum
GROUP by sa.studentnum, sa.hobbycount
ORDER BY sa.studentnum;
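For anyone who wants to try the aggregate-then-join idea without a server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The sample rows are hypothetical, the hobby table is assumed to have (hobbynum, studentnum) columns, and the SQL is essentially the query above. Students 3 and 4 have unique hobby counts, so only students 1 and 2 should come back.

```python
import sqlite3

# Stand-in database: SQLite in memory (hypothetical data, not the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (studentnum INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hobby (hobbynum INTEGER, studentnum INTEGER);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di")])
# Hobby counts: Ann 2, Bob 2, Cy 3, Di 1
conn.executemany("INSERT INTO hobby VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2), (13, 2),
                  (14, 3), (15, 3), (16, 3), (17, 4)])

# Aggregate per student first, then self-join the aggregates on the count.
QUERY = """
WITH xx AS (
    SELECT sa.studentnum, COUNT(ha.hobbynum) AS hobbycount
    FROM student sa INNER JOIN hobby ha ON sa.studentnum = ha.studentnum
    GROUP BY sa.studentnum
)
SELECT sa.studentnum, sa.hobbycount, COUNT(DISTINCT sb.studentnum) AS matchcount
FROM xx sa INNER JOIN xx sb ON sa.hobbycount = sb.hobbycount
WHERE sa.studentnum != sb.studentnum
GROUP BY sa.studentnum, sa.hobbycount
ORDER BY sa.studentnum
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 2, 1), (2, 2, 1)]
```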
The caveat is that I must complete this using only the following tools:
1. The basic SQL construct: SELECT ... FROM ... AS ... WHERE ... (DISTINCT is OK).
2. Set operators: UNION, INTERSECT, EXCEPT.
3. Temporary relations: CREATE VIEW ... AS ...
4. Comparison operators such as <, >, <=, = etc.
5. Subqueries, but only in the context of NOT IN or a subtraction operation, i.e. (SELECT ... FROM ... WHERE ... NOT IN (SELECT ...)).
I can NOT use any join, LIMIT, MAX, MIN, COUNT, SUM, HAVING, GROUP BY, NOT EXISTS, EXISTS, aggregate functions, or anything else not listed in 1-5 above.
Schema:
People (id, name, age, address)
Courses (cid, name, department)
Grades (pid, cid, grade)
I can write the query, but only with NOT EXISTS (which I can't use). The SQL below returns only the people who took every course in the Courses table:
select People.name from People
where not exists
(select Courses.cid from Courses
where not exists
(select grades.cid from grades
where grades.cid = courses.cid and grades.pid = people.id))
Is there a way to solve this using NOT IN or some other method I am allowed to use? I've struggled with this for hours. If anyone can help with this goofy obstacle, I'll gladly upvote and accept your answer.
As Nick.McDermaid said, you can use EXCEPT to identify students who are missing classes and NOT IN to exclude them.
1. Get the complete list with a Cartesian product of people x courses. This is what Grades would look like if every student had taken every course.
create view complete_view as
select people.id as pid, courses.cid as cid
from people, courses
2. Use EXCEPT to identify students who are missing at least one class
create view missing_view as select distinct pid from (
select pid, cid from complete_view
except
select pid, cid from grades
) t
3. Use NOT IN to select students who aren't missing any classes
select * from people where id not in (select pid from missing_view)
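Here is a small sketch of the three steps run end to end, using Python's sqlite3 module as a stand-in database with hypothetical sample rows. SQLite supports views, EXCEPT and NOT IN, so the SQL carries over directly; the view uses Courses.cid, the course key in the schema. Ann took both courses, Bob one and Cy none, so only Ann should be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT);
CREATE TABLE Courses (cid INTEGER PRIMARY KEY, name TEXT, department TEXT);
CREATE TABLE Grades (pid INTEGER, cid INTEGER, grade TEXT);

INSERT INTO People VALUES (1, 'Ann', 20, 'x'), (2, 'Bob', 21, 'y'), (3, 'Cy', 22, 'z');
INSERT INTO Courses VALUES (100, 'SQL', 'CS'), (200, 'Logic', 'Math');
-- Hypothetical grades: Ann took both courses, Bob took one, Cy took none
INSERT INTO Grades VALUES (1, 100, 'A'), (1, 200, 'B'), (2, 100, 'C');

-- Step 1: every (person, course) pair
CREATE VIEW complete_view AS
SELECT People.id AS pid, Courses.cid AS cid
FROM People, Courses;

-- Step 2: people missing at least one course
CREATE VIEW missing_view AS
SELECT DISTINCT pid FROM (
    SELECT pid, cid FROM complete_view
    EXCEPT
    SELECT pid, cid FROM Grades
) t;
""")

def people_with_all_courses(conn):
    # Step 3: keep only people who are not missing any course
    sql = "SELECT name FROM People WHERE id NOT IN (SELECT pid FROM missing_view) ORDER BY id"
    return [name for (name,) in conn.execute(sql)]

print(people_with_all_courses(conn))  # ['Ann']
```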
As Nick suggests, you can use EXCEPT in this case. Here is a sample:
select People.name from People
EXCEPT
select p.name from People AS p
join Grades AS g on g.pid = p.id
join Courses as c on c.cid = g.cid
You can turn the first NOT EXISTS into NOT IN by using a constant value.
select *
from People a
where 1 not in (
select 1
from courses b
...
The database used for this question is structured as follows, with primary keys bolded and foreign keys in ' '.
Countries (Name, Country_ID, area_sqkm, population)
Teams (team_id, name, 'country_id', description, manager)
Stages (stage_id, took_place, start_loc, end_loc, distance, description)
Riders (rider_id, name, 'team_id', year_born, height_cms, weight_kgs, 'country_id', bmi)
Results ('stage_id', 'rider_id', time_seconds)
I am stuck on this question:
Q: Bradley Wiggins won the tour. Write a query to find the riders who beat him in at least 4 stages, i.e., riders who had a better time than Wiggins in at least 4 of the 21 stages.
So far I have:
SELECT ri.name
from riders ri
INNER JOIN results re ON ri.name = re.name
WHERE ri.name = 'BRADLEY Wiggins' IN ...
I am unsure how to compare the two time_seconds values.
How can I go about getting the solution?
Thank you
The task is indeed a little complicated, as it involves several concepts.
The first of these is a self join, i.e. you'll have to select from the same table twice. You want Bradley's results and the others' results, so as to be able to compare them.
select ...
from results bradley
join results other on ...
Or:
select ...
from (select * from results where ...) bradley
join (select * from results where ...) other on ...
Let's use the first option. We add a WHERE clause to get Bradley, and we add the ON clause to get non-Bradleys at the same stage with a better result:
select ...
from results bradley
join results other on other.rider_id <> bradley.rider_id
and other.stage_id = bradley.stage_id
and other.time_seconds < bradley.time_seconds
where bradley.rider_id = (select rider_id from riders where name = 'BRADLEY Wiggins')
The last part is to find riders with at least four better results. This is called aggregation. You want to see riders, so you group by rider_id. And you want to count, so you use COUNT. Moreover you want to restrict results based on COUNT, so you put this in the HAVING clause:
select other.rider_id
from results bradley
join results other on other.rider_id <> bradley.rider_id
and other.stage_id = bradley.stage_id
and other.time_seconds < bradley.time_seconds
where bradley.rider_id = (select rider_id from riders where name = 'BRADLEY Wiggins')
group by other.rider_id
having count(*) >= 4;
As to getting the riders' data, e.g. their names, there are a couple of options:
Join the table and put the columns both in your SELECT clause and your GROUP BY clause. You would do this if you wanted data from both sets, i.e. riders' data plus the result count.
Subselect the value if you only want one value (e.g. the name). That's simple, but it really only makes sense when you want only one value from the riders table.
You'd change your SELECT clause thus:
select (select name from riders where rider_id = other.rider_id) as name
Write an outer query around the query you already have.
This would be:
select *
from riders
where rider_id in
(
select other.rider_id
from results bradley
join results other on other.rider_id <> bradley.rider_id
and other.stage_id = bradley.stage_id
and other.time_seconds < bradley.time_seconds
where bradley.rider_id = (select rider_id from riders where name = 'BRADLEY Wiggins')
group by other.rider_id
having count(*) >= 4
);
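A minimal sketch to check the outer-query version, with Python's sqlite3 module standing in for the real database. Only the columns the query needs are created, rider_id is assumed to be the riders key as in the schema, and the stage times are made up: Rider A beats Wiggins on four of five stages, Rider B on only two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE riders (rider_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE results (stage_id INTEGER, rider_id INTEGER, time_seconds INTEGER);
INSERT INTO riders VALUES (1, 'BRADLEY Wiggins'), (2, 'Rider A'), (3, 'Rider B');
""")

# Hypothetical times per stage for riders 1, 2 and 3
wiggins = [100, 100, 100, 100, 100]
rider_a = [90, 90, 90, 90, 110]     # faster than Wiggins on 4 stages
rider_b = [90, 90, 110, 110, 110]   # faster than Wiggins on 2 stages
for stage, times in enumerate(zip(wiggins, rider_a, rider_b), start=1):
    for rider_id, t in zip((1, 2, 3), times):
        conn.execute("INSERT INTO results VALUES (?, ?, ?)", (stage, rider_id, t))

QUERY = """
SELECT name
FROM riders
WHERE rider_id IN (
    SELECT other.rider_id
    FROM results bradley
    JOIN results other ON other.rider_id <> bradley.rider_id
                      AND other.stage_id = bradley.stage_id
                      AND other.time_seconds < bradley.time_seconds
    WHERE bradley.rider_id = (SELECT rider_id FROM riders WHERE name = 'BRADLEY Wiggins')
    GROUP BY other.rider_id
    HAVING COUNT(*) >= 4
)
"""

def riders_beating_wiggins(conn):
    return [name for (name,) in conn.execute(QUERY)]

print(riders_beating_wiggins(conn))  # ['Rider A']
```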
I'm trying to create a view that takes a base table and joins it to information from multiple other tables and returns one row per row in the original table. For the sake of example, let's say I'm matching college graduates to employment and graduate school data... because that is, in fact, what I'm doing. Now, the issue here is that I can get multiple matches in the employment and graduate school data. People could work for more than one employer, or they could go to one grad school and then decide to transfer to another. This creates duplicate rows when I join, which then need to be eliminated through aggregation (or some other means).
My current solution is to do nested joins/queries something like this:
select I.ID, I.GradYear, max(I.Salary) as Salary,
case when sum(case when S.Year=I.GradYear+1 then 1 else 0 end)>0 then 1 else 0 end as HasGradSchool
from
(
select G.ID, G.GradYear, sum(case when W.Year=G.GradYear+1 then W.Wages else null end) as Salary
from
(
select ID, GradYear
from dbo.Students
where Graduated=1
) as G
left join dbo.Wages as W
on G.ID=W.ID
group by G.ID, G.GradYear
) as I
left join dbo.GradSchool as S
on I.ID=S.ID
group by I.ID, I.GradYear
This seems a bit ugly to me, especially if I want to bring in more data (say, I now want to look for them in the military too). Is there a better way of accomplishing the joining? If I just straight up join the three tables together, I'll end up double counting people's wages if they have 2 grad school records, for example... Let me know if you've got a solution!
SELECT
U.ID,
U.GradYear,
W.Salary,
S.HasGradSchool
FROM dbo.Students U
OUTER APPLY (
SELECT
SUM(Wages) AS Salary
FROM dbo.Wages
WHERE ID = U.ID
AND Year = U.GradYear+1
) W
OUTER APPLY (
SELECT TOP 1
1 AS HasGradSchool
FROM dbo.GradSchool
WHERE ID = U.ID
AND Year = U.GradYear+1
) S
where U.Graduated=1
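SQLite (used here through Python's sqlite3 module) has no OUTER APPLY, so this sketch swaps in the closest equivalent it has: a correlated scalar subquery for the salary and an EXISTS expression for the grad-school flag. The idea is the same: each table is aggregated per student before it meets the others, so two grad-school rows can't double a student's wages. The tables and rows are hypothetical. One difference from the OUTER APPLY version: EXISTS yields 0 rather than NULL when there is no grad-school record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (ID INTEGER PRIMARY KEY, GradYear INTEGER, Graduated INTEGER);
CREATE TABLE Wages (ID INTEGER, Year INTEGER, Wages INTEGER);
CREATE TABLE GradSchool (ID INTEGER, Year INTEGER);

INSERT INTO Students VALUES (1, 2010, 1), (2, 2010, 1), (3, 2010, 0);
-- Student 1: two employers in 2011 and two grad-school records in 2011
INSERT INTO Wages VALUES (1, 2011, 30000), (1, 2011, 20000), (2, 2012, 40000);
INSERT INTO GradSchool VALUES (1, 2011), (1, 2011);
""")

# Each correlated subquery aggregates its own table per student, so the
# duplicate GradSchool rows never multiply the Wages rows.
QUERY = """
SELECT U.ID,
       U.GradYear,
       (SELECT SUM(W.Wages) FROM Wages W
         WHERE W.ID = U.ID AND W.Year = U.GradYear + 1) AS Salary,
       EXISTS (SELECT 1 FROM GradSchool S
         WHERE S.ID = U.ID AND S.Year = U.GradYear + 1) AS HasGradSchool
FROM Students U
WHERE U.Graduated = 1
ORDER BY U.ID
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 2010, 50000, 1), (2, 2010, None, 0)]
```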
Here's what I have:
Person
name = varchar
Helmet
person = foreignkey -> Person
is_safe = boolean
Now, for a batch job, I need to query (no ORM, just raw SQL) for all Person rows that have zero safe Helmets. I could obviously just loop through each Person in the database, but I need to do this in a single query, limited to 100 at a time (there are novemdecillions of these suckers in the database), and then remove each Person. I don't need the Helmet records attached to the result, only the Person records (deleting will naturally cascade), but I can't simply issue a DELETE in place of my SELECT because there are things I need to do elsewhere before deleting them.
I'm using Postgres, but I'd prefer to use a query that's more or less DB agnostic, if possible.
Here's what I've abstractly come up with:
SELECT * FROM person
WHERE (SELECT COUNT(*) FROM helmet
WHERE person_id = person.id AND is_safe = true) = 0
LIMIT 100
This is clearly not valid SQL, but I'm hoping there is a functionally equivalent, but valid version.
select *
from person
where id not in
(
select person_id
from helmet
where is_safe = true
)
SELECT *
FROM (
SELECT p.*
FROM Person p
INNER JOIN Helmet h ON p.id = h.person
GROUP BY p.id
HAVING SUM(CASE WHEN h.is_safe THEN 1 ELSE 0 END) = 0
) inner_select
LIMIT 100
So, this query consists of two parts:
The workhorse of the query is the inner query. It joins each person with all their helmets, then uses GROUP BY to relate all rows for a specific person together. Once we have a group, we can use aggregate functions on it; here SUM counts the number of safe helmets. The HAVING clause then keeps only the groups whose count of safe helmets is zero.
The outer query ensures that the LIMIT is applied to the result of the inner query, and not the rows of the tables needed to calculate an accurate result.
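To see the grouping and HAVING step in action, here is a sketch against SQLite through Python's sqlite3 module, with hypothetical rows. SQLite stores booleans as 0/1 integers, and Postgres has no SUM over boolean, so the safe-helmet count is written as a conditional SUM with CASE, which should work in both. The helmet foreign key is named person, as in the schema in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE helmet (id INTEGER PRIMARY KEY, person INTEGER, is_safe INTEGER);
INSERT INTO person VALUES (1, 'all unsafe'), (2, 'has a safe one'), (3, 'no helmets');
INSERT INTO helmet (person, is_safe) VALUES (1, 0), (1, 0), (2, 0), (2, 1);
""")

QUERY = """
SELECT *
FROM (
    SELECT p.id, p.name
    FROM person p
    INNER JOIN helmet h ON p.id = h.person
    GROUP BY p.id, p.name
    -- count the safe helmets in each person's group
    HAVING SUM(CASE WHEN h.is_safe THEN 1 ELSE 0 END) = 0
) inner_select
LIMIT 100
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'all unsafe')]
```

Note that person 3 has no helmets at all, so the INNER JOIN drops them; if such people should count as having zero safe helmets, an outer join or NOT EXISTS is needed.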
SELECT person.*
FROM person
LEFT JOIN (SELECT DISTINCT person_id FROM helmet WHERE is_safe = true) AS T2 ON person.id = T2.person_id
WHERE T2.person_id IS NULL
LIMIT 100
I ended up using:
SELECT *
FROM person p
WHERE NOT EXISTS
(
SELECT h.person_id
FROM helmet h
WHERE h.person_id = p.id
AND is_safe = true
)
LIMIT 100
which turns out to stop scanning the table once it finds 100 results that match.
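A quick sketch of the NOT EXISTS query against SQLite via Python's sqlite3 module, with hypothetical rows; is_safe = true is written as is_safe = 1 because SQLite stores booleans as integers. Unlike a grouped INNER JOIN, it also returns people who have no helmets at all. The rows are sorted in Python only to make the printed output deterministic, since the query itself has no ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE helmet (id INTEGER PRIMARY KEY, person_id INTEGER, is_safe INTEGER);
INSERT INTO person VALUES (1, 'all unsafe'), (2, 'has a safe one'), (3, 'no helmets');
INSERT INTO helmet (person_id, is_safe) VALUES (1, 0), (1, 0), (2, 0), (2, 1);
""")

QUERY = """
SELECT *
FROM person p
WHERE NOT EXISTS (
    -- any safe helmet disqualifies the person
    SELECT h.person_id
    FROM helmet h
    WHERE h.person_id = p.id
      AND h.is_safe = 1
)
LIMIT 100
"""

rows = sorted(conn.execute(QUERY).fetchall())
print(rows)  # [(1, 'all unsafe'), (3, 'no helmets')]
```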
I have two tables:
Table1 = Schools
Columns: id(PK), state(nvarchar(100)), schoolname
Table2 = Grades
Columns: id(PK), id_schools(FK), Year, Reading, Writing...
I would like to develop a query to find the schoolname which has the highest grade for Reading.
So far I have the following and need help to fill in the blanks:
SELECT Schools.schoolname, Grades.Reading
FROM Schools, Grades
WHERE Schools.id = (* need id_schools for max(Grades.Reading)*)
SELECT
Schools.schoolname,
Grades.Reading
FROM
Schools INNER JOIN Grades on Schools.id = Grades.id_schools
WHERE
Grades.Reading = (SELECT MAX(Reading) from Grades)
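A minimal sketch of the MAX subquery using Python's sqlite3 module as a stand-in, with made-up schools and scores (nvarchar becomes TEXT in SQLite). If two schools tied for the top Reading score, both rows would be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schools (id INTEGER PRIMARY KEY, state TEXT, schoolname TEXT);
CREATE TABLE Grades (id INTEGER PRIMARY KEY, id_schools INTEGER, Year INTEGER,
                     Reading INTEGER, Writing INTEGER);
INSERT INTO Schools VALUES (1, 'WA', 'North High'), (2, 'OR', 'South High');
INSERT INTO Grades VALUES (1, 1, 2014, 90, 80), (2, 1, 2015, 70, 75),
                          (3, 2, 2014, 85, 95);
""")

# Keep only the grade rows whose Reading equals the overall maximum.
QUERY = """
SELECT Schools.schoolname, Grades.Reading
FROM Schools INNER JOIN Grades ON Schools.id = Grades.id_schools
WHERE Grades.Reading = (SELECT MAX(Reading) FROM Grades)
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('North High', 90)]
```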
Here's how I solve this sort of problem without using a subquery:
SELECT s.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
LEFT OUTER JOIN Grades AS g2 ON g2.id_schools <> s.id
AND g1.Reading < g2.Reading
WHERE g2.id_schools IS NULL
Note that you can get more than one row back, if more than one school ties for highest Reading score. In that case, you need to decide how to resolve the tie and build that into the LEFT OUTER JOIN condition.
Re your comment: the left outer join looks for a row with a higher grade at a different school, and if none is found, all of the g2.* columns will be null. In that case we know that no other school has a grade higher than the one in the row g1 points to, which means g1's school has the highest Reading grade. It can also be written this way, which is logically the same but might be easier to understand:
SELECT s.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
WHERE NOT EXISTS (
SELECT * FROM Grades g2
WHERE g2.id_schools <> s.id AND g2.Reading > g1.Reading)
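To check that the LEFT OUTER JOIN and NOT EXISTS forms agree, here is a sketch running both against SQLite via Python's sqlite3 module with hypothetical data: North High has Reading scores of 90 and 70, South High has 85. Both queries should return North High once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schools (id INTEGER PRIMARY KEY, state TEXT, schoolname TEXT);
CREATE TABLE Grades (id INTEGER PRIMARY KEY, id_schools INTEGER, Year INTEGER,
                     Reading INTEGER, Writing INTEGER);
INSERT INTO Schools VALUES (1, 'WA', 'North High'), (2, 'OR', 'South High');
INSERT INTO Grades VALUES (1, 1, 2014, 90, 80), (2, 1, 2015, 70, 75),
                          (3, 2, 2014, 85, 95);
""")

# Anti-join: keep g1 rows for which no other school has a higher Reading.
ANTI_JOIN = """
SELECT s.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
LEFT OUTER JOIN Grades AS g2 ON g2.id_schools <> s.id
                            AND g1.Reading < g2.Reading
WHERE g2.id_schools IS NULL
"""

# Same condition expressed with NOT EXISTS.
NOT_EXISTS = """
SELECT s.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
WHERE NOT EXISTS (
    SELECT * FROM Grades g2
    WHERE g2.id_schools <> s.id AND g2.Reading > g1.Reading)
"""

anti = conn.execute(ANTI_JOIN).fetchall()
not_exists = conn.execute(NOT_EXISTS).fetchall()
print(anti, not_exists)  # [(1, 'WA', 'North High')] [(1, 'WA', 'North High')]
```

With <> in the condition, a school could still appear more than once if several of its rows beat every other school's rows; SELECT DISTINCT would collapse those.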
You say it's not working. Can you be more specific? What is the answer you expect, and what's actually happening, and how do they differ?
edit: Changed = to <> as per suggestion in comment by #potatopeelings. Thanks!
This should do it
select * from Schools as s
where s.id=(
select top(1) id_schools from grades as g
order by g.reading desc)