Join statement and comparison - sql

The database being used for this question is structured as follows with Primary Keys bolded, and Foreign Keys ' '.
Countries (Name, Country_ID, area_sqkm, population)
Teams (team_id, name, 'country_id', description, manager)
Stages (stage_id, took_place, start_loc, end_loc, distance, description)
Riders (rider_id, name, 'team_id', year_born, height_cms, weight_kgs, 'country_id', bmi)
Results ('stage_id', 'rider_id', time_seconds)
I am stuck at the question of:
Q: Bradley Wiggins won the tour. Write a query to find the riders who beat him in at least 4 stages, i.e., riders who had a better time than Wiggins in at least 4 of the 21 stages.
I am currently at :
SELECT ri.name
from riders ri
INNER JOIN results re ON ri.name = re.name
WHERE ri.name = 'BRADLEY Wiggins' IN ...
I am unsure how to move on to comparing two time_seconds values.
How can I go about getting to the solution?
Thank you

The task is indeed a little complicated, as it involves several concepts.
The first of these is a self join, i.e. you'll have to select from the same table twice. You want Bradley's results and the others' results, so as to be able to compare them.
select ...
from results bradley
join results other on ...
Or:
select ...
from (select * from results where ...) bradley
join (select * from results where ...) other on ...
Let's use the first option. We add a WHERE clause to restrict the first copy of the table to Bradley's rows, and an ON clause to pick up the other riders' rows for the same stage with a better time:
select ...
from results bradley
join results other on other.rider_id <> bradley.rider_id
and other.stage_id = bradley.stage_id
and other.time_seconds < bradley.time_seconds
where bradley.rider_id = (select rider_id from riders where name = 'BRADLEY Wiggins')
The last part is to find riders with at least four better results. This is called aggregation. You want to see riders, so you group by rider_id. And you want to count, so you use COUNT. Moreover you want to restrict results based on COUNT, so you put this in the HAVING clause:
select other.rider_id
from results bradley
join results other on other.rider_id <> bradley.rider_id
and other.stage_id = bradley.stage_id
and other.time_seconds < bradley.time_seconds
where bradley.rider_id = (select rider_id from riders where name = 'BRADLEY Wiggins')
group by other.rider_id
having count(*) >= 4;
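To see the self join and the HAVING filter at work, here is a minimal sketch that runs the query against an in-memory SQLite database. The three riders, the five stages and their times are made-up test data (not from the original question), and the key column is assumed to be rider_id as in the schema above.

```python
import sqlite3

# Hypothetical miniature data set: rider 1 is Wiggins, only five stages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE riders  (rider_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE results (stage_id INTEGER, rider_id INTEGER, time_seconds INTEGER);
    INSERT INTO riders VALUES (1, 'BRADLEY Wiggins'), (2, 'Rider A'), (3, 'Rider B');
""")
rows = []
for stage in range(1, 6):
    rows.append((stage, 1, 1000))                         # Wiggins: 1000 s every stage
    rows.append((stage, 2, 990 if stage <= 4 else 1010))  # Rider A is faster in stages 1-4
    rows.append((stage, 3, 990 if stage <= 3 else 1010))  # Rider B is faster in stages 1-3
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

query = """
    SELECT other.rider_id
    FROM results bradley
    JOIN results other
      ON other.stage_id = bradley.stage_id
     AND other.rider_id <> bradley.rider_id
     AND other.time_seconds < bradley.time_seconds
    WHERE bradley.rider_id = (SELECT rider_id FROM riders
                              WHERE name = 'BRADLEY Wiggins')
    GROUP BY other.rider_id
    HAVING COUNT(*) >= 4
"""
# Each joined row is one stage in which "other" beat Wiggins,
# so COUNT(*) per rider is the number of stages won against him.
beaten_by = [row[0] for row in conn.execute(query)]
print(beaten_by)
```

Only Rider A beats Wiggins in four stages here, so Rider B (three stages) is filtered out by the HAVING clause.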
As to getting the riders' data, e.g. their names, there are a couple of options:
Join the table and put the columns both in your SELECT clause and your GROUP BY clause. You would do this, if you wanted data from both sets, i.e. riders' data plus the result count.
Subselect the value if you only want one value (e.g. the name). That's simple but really only makes sense when you want only one value from riders table.
You'd change your SELECT clause thus:
select (select name from riders where rider_id = other.rider_id) as name
Write an outer query around the query you already have.
This would be:
select *
from riders
where rider_id in
(
select other.rider_id
from results bradley
join results other on other.rider_id <> bradley.rider_id
and other.stage_id = bradley.stage_id
and other.time_seconds < bradley.time_seconds
where bradley.rider_id = (select rider_id from riders where name = 'BRADLEY Wiggins')
group by other.rider_id
having count(*) >= 4
);
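The first option (joining riders and repeating its columns in GROUP BY) is not spelled out above, so here is a rough sketch of what it could look like, again on an in-memory SQLite database. The sample riders and times are invented for illustration, and the alias stages_ahead is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE riders  (rider_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE results (stage_id INTEGER, rider_id INTEGER, time_seconds INTEGER);
    INSERT INTO riders VALUES (1, 'BRADLEY Wiggins'), (2, 'Rider A'), (3, 'Rider B');
""")
# Made-up times: Rider A beats Wiggins in stages 1-4, Rider B in stages 1-3.
for stage in range(1, 6):
    conn.execute("INSERT INTO results VALUES (?, 1, 1000)", (stage,))
    conn.execute("INSERT INTO results VALUES (?, 2, ?)",
                 (stage, 990 if stage <= 4 else 1010))
    conn.execute("INSERT INTO results VALUES (?, 3, ?)",
                 (stage, 990 if stage <= 3 else 1010))

query = """
    SELECT ri.rider_id, ri.name, COUNT(*) AS stages_ahead
    FROM results bradley
    JOIN results other
      ON other.stage_id = bradley.stage_id
     AND other.rider_id <> bradley.rider_id
     AND other.time_seconds < bradley.time_seconds
    JOIN riders ri ON ri.rider_id = other.rider_id
    WHERE bradley.rider_id = (SELECT rider_id FROM riders
                              WHERE name = 'BRADLEY Wiggins')
    GROUP BY ri.rider_id, ri.name      -- rider columns appear in SELECT and GROUP BY
    HAVING COUNT(*) >= 4
"""
riders_with_counts = conn.execute(query).fetchall()
print(riders_with_counts)
```

This variant is the one to reach for when the count itself should appear in the output next to the rider's data.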

Related

Get row from one table, plus COUNT from a related table

I'm trying to build an SQL query where I grab one table's information (WHERE shops.shop_domain = X) along with the COUNT of the customers table WHERE customers.product_id = 4242451.
The shops table DOES NOT have product.id in it, but the customers table DOES HAVE the shop_domain in it, hence my attempt to do some sort of join.
I essentially want to return the following:
shops.id
shops.name
shops.shop_domain
COUNT OF CUSTOMERS WHERE customers.product_id = '4242451'
Here is my not so lovely attempt at the query.
I think I have the idea right (maybe...) but I can't wrap my head around building this query.
SELECT shops.id, shops.name, shops.shop_domain, COUNT(customers.customer_id)
FROM shops
LEFT JOIN customers ON shops.shop_domain = customers.shop_domain
WHERE shops.shop_domain = 'myshop.com' AND
customers.product_id = '4242451'
GROUP BY shops.shop_id
Relevant database schemas:
shops:
id, name, shop_domain
customers:
id, name, product_id, shop_domain
You are close. The condition on customers needs to go in the ON clause, because this is a LEFT JOIN and customers is the second table:
SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
FROM shops s LEFT JOIN
customers c
ON s.shop_domain = c.shop_domain AND c.product_id = '4242451'
WHERE s.shop_domain = 'myshop.com'
GROUP BY s.id, s.name, s.shop_domain;
I am also inclined to include all three columns in the GROUP BY, although Postgres (and ANSI/ISO standards) are happy with just id if it is declared as the primary key in the table.
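The difference between filtering in WHERE and filtering in ON is easiest to see on a shop that has no matching customers. The following is a small sketch on an in-memory SQLite database; the shop and customer rows are invented, and the customers key is assumed to be id as in the posted schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops     (id INTEGER PRIMARY KEY, name TEXT, shop_domain TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            product_id INTEGER, shop_domain TEXT);
    INSERT INTO shops     VALUES (1, 'My Shop', 'myshop.com');
    -- The only customer bought a different product.
    INSERT INTO customers VALUES (1, 'Ann', 111, 'myshop.com');
""")

# Filter in WHERE: the NULL-extended or non-matching row is discarded after
# the join, so the LEFT JOIN effectively becomes an INNER JOIN.
where_version = conn.execute("""
    SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
    FROM shops s
    LEFT JOIN customers c ON s.shop_domain = c.shop_domain
    WHERE s.shop_domain = 'myshop.com' AND c.product_id = 4242451
    GROUP BY s.id, s.name, s.shop_domain
""").fetchall()

# Filter in ON: the shop row survives and the count is 0.
on_version = conn.execute("""
    SELECT s.id, s.name, s.shop_domain, COUNT(c.id)
    FROM shops s
    LEFT JOIN customers c
      ON s.shop_domain = c.shop_domain AND c.product_id = 4242451
    WHERE s.shop_domain = 'myshop.com'
    GROUP BY s.id, s.name, s.shop_domain
""").fetchall()

print(where_version)
print(on_version)
```

With the condition in WHERE the shop disappears from the result entirely; with the condition in ON it is returned with a count of zero.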
A correlated subquery should be substantially cheaper (and simpler) for the purpose:
SELECT id, name, shop_domain
, (SELECT count(*)
FROM customers
WHERE shop_domain = s.shop_domain
AND product_id = 4242451) AS special_count
FROM shops s
WHERE shop_domain = 'myshop.com';
This way you only need to aggregate in the subquery, and need not worry about undesired effects on the outer query.
I am assuming product_id is a numeric data type, so I use a numeric literal (4242451) instead of the string literal '4242451', which might otherwise cause problems.
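As a quick check of the correlated-subquery form, here is a sketch on an in-memory SQLite database. The two shops and four customers are made-up rows, and product_id is assumed to be an integer column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops     (id INTEGER PRIMARY KEY, name TEXT, shop_domain TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            product_id INTEGER, shop_domain TEXT);
    INSERT INTO shops VALUES (1, 'My Shop', 'myshop.com'), (2, 'Other', 'other.com');
    INSERT INTO customers VALUES
        (1, 'Ann', 4242451, 'myshop.com'),
        (2, 'Bob', 4242451, 'myshop.com'),
        (3, 'Cy',  999,     'myshop.com'),
        (4, 'Dee', 4242451, 'other.com');
""")

# The subquery is evaluated once per shop row; no GROUP BY is needed outside.
result = conn.execute("""
    SELECT id, name, shop_domain,
           (SELECT COUNT(*)
            FROM customers
            WHERE shop_domain = s.shop_domain
              AND product_id = 4242451) AS special_count
    FROM shops s
    WHERE shop_domain = 'myshop.com'
""").fetchall()
print(result)
```

Because the aggregation lives entirely inside the subquery, adding more columns from shops to the outer SELECT needs no change to any GROUP BY clause.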

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code would you mind a small explanation as to why you did it that way (so I can actually learn something.)
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused on how to pull in the EmailAddress column from the Clients table, as in order to join it I have to bring in other tables that aren't being used. I am assuming there is an easier way to do this using sub Queries as that is what we are working on at the time.
Think of SQL as working with sets of data as opposed to just tables. Tables are merely a set of data. So when you view data this way you immediately see that the query below returns a set of data consisting of the entirety of another set, being a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
Now you can treat this second set, a subset of data as a "table" as well and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
) AS t -- SQL Server requires an alias for a derived table
This returns every column of the inner set, i.e. just the two columns selected by the inner query.
So, your inner query, which I won't write for you since you appear to be a student, and that wouldn't be right for me to give you the entire answer, would be a query consisting of a GROUP BY clause and a SUM of the order value field. But the key thing you need to understand is this set thinking: you can just wrap the ENTIRE query inside brackets and treat it as a table the way I have done above. Hopefully this helps.
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Client Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItem SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by Cl.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) as q -- Close subquery (a derived table needs an alias)
group by emailaddress -- group for the max()
For the first query you can join the Clients to Shipments (on ClientId).
And Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is useful, especially when you select fields from joined tables that share the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query in a subquery, you can get the Maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
Note that an ORDER BY is mostly meaningless inside a subquery if you don't use TOP.
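To illustrate the two-level aggregation (SUM per shipment inside, MAX per client outside), here is a minimal sketch on an in-memory SQLite database. The table layout follows the column names used in the answer above, which are themselves guesses at the assignment's schema; the sample rows are invented and use whole-number prices to keep the arithmetic exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients   (ClientId INTEGER PRIMARY KEY, EmailAddress TEXT);
    CREATE TABLE Shipments (ShipmentId INTEGER PRIMARY KEY, ClientId INTEGER);
    CREATE TABLE ShipItems (ShipmentId INTEGER, ShipItemPrice INTEGER,
                            ShipItemDiscountAmount INTEGER, Quantity INTEGER);
    INSERT INTO Clients   VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO Shipments VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO ShipItems VALUES
        (1, 10, 0, 2), (1, 5, 1, 1),   -- shipment 1 totals 20 + 4 = 24
        (2, 100, 10, 1),               -- shipment 2 totals 90
        (3, 7, 0, 3);                  -- shipment 3 totals 21
""")

largest_orders = conn.execute("""
    SELECT EmailAddress, MAX(OrderTotal) AS LargestOrder
    FROM (
        -- Inner set: one row per client and shipment with its order total.
        SELECT c.EmailAddress,
               s.ShipmentId,
               SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) AS OrderTotal
        FROM ShipItems i
        JOIN Shipments s ON s.ShipmentId = i.ShipmentId
        JOIN Clients   c ON c.ClientId   = s.ClientId
        GROUP BY c.EmailAddress, s.ShipmentId
    ) AS q
    GROUP BY EmailAddress
    ORDER BY EmailAddress
""").fetchall()
print(largest_orders)
```

The outer query never touches the base tables; it only sees the three-column set produced by the inner query.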

SQL: Find all rows in a table when the rows are a foreign key in another table

The caveat here is I must complete this with only the following tools:
1. The basic SQL construct: SELECT FROM .. AS WHERE... Distinct is ok.
2. Set operators: UNION, INTERSECT, EXCEPT
3. Create temporary relations: CREATE VIEW... AS ...
4. Arithmetic operators like <, >, <=, == etc.
5. Subquery can be used only in the context of NOT IN or a subtraction operation. I.e. (select ... from... where not in (select...)
I can NOT use any join, limit, max, min, count, sum, having, group by, not exists, any exists, count, aggregate functions or anything else not listed in 1-5 above.
Schema:
People (id, name, age, address)
Courses (cid, name, department)
Grades (pid, cid, grade)
I satisfied the query but I used not exists (which I can't use). The sql below shows only people who took every class in the Courses table:
select People.name from People
where not exists
(select Courses.cid from Courses
where not exists
(select grades.cid from grades
where grades.cid = courses.cid and grades.pid = people.id))
Is there way to solve this by using not in or some other method that I am allowed to use? I've struggled with this for hours. If anyone can help with this goofy obstacle, I'll gladly upvote your answer and select your answer.
As Nick.McDermaid said you can use except to identify students that are missing classes and not in to exclude them.
1 Get the complete list with a cartesian product of people x courses. This is what grades would look like if every student has taken every course.
create view complete_view as
select people.id as pid, courses.cid as cid
from people, courses
2 Use except to identify students that are missing at least one class
create view missing_view as select distinct pid from (
select pid, cid from complete_view
except
select pid, cid from grades
) t
3 Use not in to select students that aren't missing any classes
select * from people where id not in (select pid from missing_view)
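The three steps can be tried out on an in-memory SQLite database, as in the sketch below. The two people, two courses and three grades are made-up data. As a small variation on step 2, missing_view here holds the missing (pid, cid) pairs directly, so no subquery in FROM is needed; that change is my own assumption about what the restrictions allow, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People  (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT);
    CREATE TABLE Courses (cid INTEGER PRIMARY KEY, name TEXT, department TEXT);
    CREATE TABLE Grades  (pid INTEGER, cid INTEGER, grade TEXT);

    INSERT INTO People  VALUES (1, 'Ann', 20, 'A St'), (2, 'Bob', 21, 'B St');
    INSERT INTO Courses VALUES (10, 'Algebra', 'Math'), (20, 'Logic', 'Philosophy');
    -- Ann took both courses, Bob only one.
    INSERT INTO Grades  VALUES (1, 10, 'A'), (1, 20, 'B'), (2, 10, 'C');

    -- Step 1: every (person, course) pair that would exist if everyone took everything.
    CREATE VIEW complete_view AS
        SELECT People.id AS pid, Courses.cid AS cid
        FROM People, Courses;

    -- Step 2: pairs that are expected but absent from Grades.
    CREATE VIEW missing_view AS
        SELECT pid, cid FROM complete_view
        EXCEPT
        SELECT pid, cid FROM Grades;
""")

# Step 3: keep only the people who have no missing pair.
took_everything = conn.execute("""
    SELECT name FROM People
    WHERE id NOT IN (SELECT pid FROM missing_view)
""").fetchall()
print(took_everything)
```

Bob is missing the pair (2, 20), so only Ann is returned.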
As Nick suggests, you can use EXCEPT in this case. Here is the sample:
select People.name from People
EXCEPT
select p.name from People AS p
join Grades AS g on g.pid = p.id
join Courses as c on c.cid = g.cid
You can turn the first NOT EXISTS into NOT IN by using a constant value.
select *
from People a
where 1 not in (
select 1
from courses b
...

SQL Comparing COUNT values within same table

I'm trying to solve a seemingly simple problem, but I think I'm tripping over my understanding of how the EXISTS keyword works. The problem is simple (this is a dumbed-down version of the actual problem): I have a table of students and a table of hobbies. The students table has the student ID and name. Return only the students who have the same number of hobbies as at least one other student (i.e. students who have a unique number of hobbies would not be shown).
So the difficulty I run into is working out how to compare the count of hobbies. What I have tried is this.
SELECT sa.studentnum, COUNT(ha.hobbynum)
FROM student sa, hobby ha
WHERE sa.studentnum = ha.studentnum
AND EXISTS (SELECT *
FROM student sb, hobby hb
WHERE sb.studentnum = hb.studentnum
AND sa.studentnum != sb.studentnum
HAVING COUNT(ha.hobbynum) = COUNT(hb.hobbynum)
)
GROUP BY sa.studentnum
ORDER BY sa.studentnum;
So what appears to be happening is that the count of hobbynums is identical each test, resulting in all of the original table being returned, instead of just those that match the same number of hobbies.
Not tested, but maybe something like this (if I understand the problem correctly):
WITH h AS (
SELECT studentnum, COUNT(hobbynum) OVER (PARTITION BY studentnum) student_hobby_ct
FROM hobby)
SELECT DISTINCT h1.studentnum, h1.student_hobby_ct
FROM h h1 JOIN h h2 ON h1.student_hobby_ct = h2.student_hobby_ct AND
h1.studentnum <> h2.studentnum;
I think your query would only return students who have at least one other student with the same number of hobbies, but it returns nothing about the students they match. Is that intentional? I'd treat both queries as sub-queries and aggregate before joining on the counts. You could do several things: here it returns the number of students that have a matching hobby count, but you could restrict it with HAVING COUNT(DISTINCT sb.studentnum) = 0 to get the result your query seemed to return.
with xx as
(SELECT sa.studentnum, count(ha.hobbynum) hobbycount
FROM student sa inner join hobby ha
on sa.studentnum = ha.studentnum
group by sa.studentnum
)
select sa.studentnum, sa.hobbycount, count(distinct sb.studentnum) as matchcount
from
xx sa inner join xx sb on
sa.hobbycount = sb.hobbycount
where
sa.studentnum != sb.studentnum
GROUP by sa.studentnum, sa.hobbycount
ORDER BY sa.studentnum;
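Here is a minimal sketch of the aggregate-then-self-join approach on an in-memory SQLite database. The three students and their hobbies are invented test data; the CTE name xx is kept from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (studentnum INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hobby   (studentnum INTEGER, hobbynum INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann and Bob have two hobbies each, Cy has one.
    INSERT INTO hobby VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 1);
""")

matches = conn.execute("""
    WITH xx AS (
        -- One row per student with that student's hobby count.
        SELECT sa.studentnum, COUNT(ha.hobbynum) AS hobbycount
        FROM student sa
        JOIN hobby ha ON sa.studentnum = ha.studentnum
        GROUP BY sa.studentnum
    )
    SELECT sa.studentnum, sa.hobbycount,
           COUNT(DISTINCT sb.studentnum) AS matchcount
    FROM xx sa
    JOIN xx sb ON sa.hobbycount = sb.hobbycount
    WHERE sa.studentnum <> sb.studentnum
    GROUP BY sa.studentnum, sa.hobbycount
    ORDER BY sa.studentnum
""").fetchall()
print(matches)
```

Cy, whose hobby count is unique, finds no partner in the self join and so does not appear in the result.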

SQL - How can I query for all objects who have COUNT(0) of related objects in a reverse relationship?

Here's what I have:
Person
name = varchar
Helmet
person = foreignkey -> Person
is_safe = boolean
Now, for a batch job, I need to query (no ORM, just raw SQL) for all Person that have 0 Helmet that are safe. I could obviously just loop through each Person in the database, etc., but I need to do this in a single query and limit it to 100 at a time (there are novemdecillions of these suckers in the database), and remove each Person. I don't need the Helmet records for each to be attached in the result. I only need the Person records (naturally deleting will cascade), but I can't simply issue a DELETE in place of my SELECT because there are things I need to do elsewhere before deleting them.
I'm using Postgres, but I'd prefer to use a query that's more or less DB agnostic, if possible.
Here's what I've abstractly come up with:
SELECT * FROM person
WHERE (SELECT COUNT(*) FROM helmet
WHERE person_id = person.id AND is_safe = false) = 0
LIMIT 100
This is clearly not valid SQL, but I'm hoping there is a functionally equivalent, but valid version.
select *
from person
where id not in
(
select person_id
from helmet
where is_safe = true
)
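SELECT *
FROM (
SELECT p.*
FROM Person p
LEFT JOIN Helmet h ON p.id = h.person_id -- LEFT JOIN keeps people without any helmet
GROUP BY p.id
HAVING SUM(CASE WHEN h.is_safe THEN 1 ELSE 0 END) = 0 -- booleans cannot be summed directly in Postgres
) inner_select
LIMIT 100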
So, this query consists of two parts:
The workhorse of the query is the inner query. This query joins together each person with all his helmets. It then uses GROUP BY to relate all rows for a specific person together. Once we have a group, we can use aggregate functions on each group, and in this case we use SUM to count the number of helmets that are safe. The SUM is used by the HAVING clause to select only the groups whose number of safe helmets is zero.
The outer query ensures that the LIMIT is applied to the result of the inner query, and not the rows of the tables needed to calculate an accurate result.
SELECT person.*
FROM person
LEFT JOIN (SELECT DISTINCT person_id FROM helmet WHERE is_safe = true) AS T2 ON person.id = T2.person_id
WHERE T2.person_id IS NULL
LIMIT 100
I ended up using:
SELECT *
FROM person p
WHERE NOT EXISTS
(
SELECT h.person_id
FROM helmet h
WHERE h.person_id = p.id
AND is_safe = true
)
LIMIT 100
which turns out to stop scanning the table once it finds 100 results that match.
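The NOT EXISTS form can be checked on an in-memory SQLite database as sketched below. The three people and two helmets are made-up rows; SQLite stores booleans as 0/1, so the comparison is written as is_safe = 1 here, where Postgres would use is_safe = true. An ORDER BY is added only to make the sample output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE helmet (id INTEGER PRIMARY KEY,
                         person_id INTEGER REFERENCES person(id),
                         is_safe INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann has a safe helmet, Bob only an unsafe one, Cy has none at all.
    INSERT INTO helmet VALUES (1, 1, 1), (2, 2, 0);
""")

no_safe_helmet = conn.execute("""
    SELECT p.id, p.name
    FROM person p
    WHERE NOT EXISTS (
        SELECT 1
        FROM helmet h
        WHERE h.person_id = p.id
          AND h.is_safe = 1      -- is_safe = true in Postgres
    )
    ORDER BY p.id
    LIMIT 100
""").fetchall()
print(no_safe_helmet)
```

Both Bob (only an unsafe helmet) and Cy (no helmet) qualify, which is the case an INNER JOIN based query would miss for Cy.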