SQL: MS Access query results change when new inner join added

Three tables:
Activities
Matches
Ratings
Each match is associated with one activity (AID as a foreign key). And each activity has multiple reviews.
I'm trying to count the number of matches that each activity is associated with:
SELECT MATCHES.AID, Count(MATCHES.AID) AS CountOfAID
FROM ACTIVITIES INNER JOIN MATCHES ON ACTIVITIES.AID = MATCHES.AID
GROUP BY MATCHES.AID;
Returns this just fine:
But as soon as I add the inner join to also include the average rating of each activity:
SELECT ACTIVITIES.[Activity Name], Count(MATCHES.AID) AS CountOfAID,
Avg(RATINGS.Rating) AS AvgOfRating
FROM (ACTIVITIES INNER JOIN MATCHES ON ACTIVITIES.AID = MATCHES.AID) INNER
JOIN RATINGS ON ACTIVITIES.AID = RATINGS.AID
GROUP BY ACTIVITIES.[Activity Name];
This happens:
How can I work around this?

Since both reviews and matches are joined to activity but there's no relationship between them, you're essentially creating a product of review-match per activity. One approach to get the result you want is to perform the grouping on subqueries for the matches and reviews (independently), and only then join them on the activities:
SELECT a.[Activity Name], CountOfAID, AvgOfRating
FROM activities a
JOIN (SELECT aid, COUNT(*) AS CountOfAID
FROM matches
GROUP BY aid) m ON a.aid = m.aid
JOIN (SELECT aid, AVG(rating) AS AvgOfRating
FROM ratings
GROUP BY aid) r ON a.aid = r.aid
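The row multiplication described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module rather than Access, and the sample rows, the `activity_name` column, and the `mid`/`rid` keys are made up for illustration:

```python
import sqlite3

# Hypothetical sample data: one activity with 2 matches and 3 ratings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (aid INTEGER PRIMARY KEY, activity_name TEXT);
    CREATE TABLE matches (mid INTEGER PRIMARY KEY, aid INTEGER);
    CREATE TABLE ratings (rid INTEGER PRIMARY KEY, aid INTEGER, rating REAL);
    INSERT INTO activities VALUES (1, 'Chess');
    INSERT INTO matches (aid) VALUES (1), (1);
    INSERT INTO ratings (aid, rating) VALUES (1, 3), (1, 4), (1, 5);
""")

# Joining both child tables directly pairs every match with every rating,
# so the match count is inflated to 2 * 3 = 6.
naive = conn.execute("""
    SELECT a.activity_name, COUNT(m.aid), AVG(r.rating)
    FROM activities a
    JOIN matches m ON a.aid = m.aid
    JOIN ratings r ON a.aid = r.aid
    GROUP BY a.activity_name
""").fetchall()

# Aggregating each child table first leaves one row per activity on each side.
fixed = conn.execute("""
    SELECT a.activity_name, m.CountOfAID, r.AvgOfRating
    FROM activities a
    JOIN (SELECT aid, COUNT(*) AS CountOfAID FROM matches GROUP BY aid) m
      ON a.aid = m.aid
    JOIN (SELECT aid, AVG(rating) AS AvgOfRating FROM ratings GROUP BY aid) r
      ON a.aid = r.aid
""").fetchall()

print(naive)  # [('Chess', 6, 4.0)]
print(fixed)  # [('Chess', 2, 4.0)]
```

SQLite is used here only to make the effect visible. In Access the joins still need to be nested in parentheses, as in the query from the question.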

Related

Join with count

I need to write SQL query like:
Show all countries with more than 1000 users, sorted by user count.
The country with the most users should be at the top.
I have tables:
● Table users (id, email, citizenship_country_id)
● Table countries (id, name, iso)
Users with columns: id, email, citizenship_country_id
Countries with columns: id, name, iso
SELECT countries.name,
Count(users.citizenship_country_id) AS W1
FROM countries
LEFT JOIN users ON countries.id = users.citizenship_country_id
GROUP BY users.citizenship_country_id, countries.name
HAVING ((([users].[citizenship_country_id])>2));
But this does not work - I get an empty result set.
Could you please tell me what I'm doing wrong?
A LEFT JOIN is superfluous for this purpose. To have 1000 users, you need at least one match:
SELECT c.name, COUNT(*) AS W1
FROM countries c JOIN
users u
ON c.id = u.citizenship_country_id
GROUP BY c.name
HAVING COUNT(*) > 1000
ORDER BY COUNT(*) DESC;
Notice that table aliases also make the query easier to write and to read.
Group by country name and use HAVING Count(u.citizenship_country_id) > 1000, which filters rows after aggregation:
SELECT c.name,
Count(u.citizenship_country_id) AS W1
FROM countries c
INNER JOIN users u ON c.id = u.citizenship_country_id
GROUP BY c.name
HAVING Count(u.citizenship_country_id)>1000
ORDER BY W1 desc --Order top counts first
;
As @GordonLinoff pointed out, you can use INNER JOIN instead of LEFT JOIN: the query never returns countries without users anyway, and INNER JOIN tends to perform better because unmatched rows are not passed on to the aggregation.
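Note that the HAVING clause in the question compares the country id column with 2 instead of comparing the user count with a threshold. A minimal sketch of the corrected filter, run through Python's sqlite3 module with invented sample rows and a threshold of 1 standing in for 1000:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT, iso TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT,
                        citizenship_country_id INTEGER);
    -- Hypothetical rows: 3 users in Norway, 2 in Peru, 1 in Chad.
    INSERT INTO countries VALUES (1, 'Norway', 'NO'), (2, 'Peru', 'PE'), (3, 'Chad', 'TD');
    INSERT INTO users (email, citizenship_country_id) VALUES
        ('a@example.com', 1), ('b@example.com', 1), ('c@example.com', 1),
        ('d@example.com', 2), ('e@example.com', 2),
        ('f@example.com', 3);
""")

MIN_USERS = 1  # assumption: small stand-in for the 1000 in the question

# HAVING filters the groups after COUNT(*) has been computed per country.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS W1
    FROM countries c
    INNER JOIN users u ON c.id = u.citizenship_country_id
    GROUP BY c.name
    HAVING COUNT(*) > ?
    ORDER BY W1 DESC
""", (MIN_USERS,)).fetchall()

print(rows)  # [('Norway', 3), ('Peru', 2)]
```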

how can I get the selected columns fully and the sum column separately

SELECT f_name,l_name,teachers.first_name,teachers.t_id,p_id,paid_amount,family_id,date,sum(payments.paid_amount)
FROM payments
LEFT JOIN family ON family.id = payments.family_id
LEFT JOIN teachers ON family.teacher_id = teachers.t_id
How can I get the selected columns in full, with the sum as a separate column?
The SUM function collapses the whole result into a single row.
SELECT f_name,l_name,teachers.first_name,teachers.t_id,p_id,paid_amount,family_id,date
FROM payments
LEFT JOIN family ON family.id = payments.family_id
LEFT JOIN teachers ON family.teacher_id = teachers.t_id
This query works fine without the sum column.
You didn't tell the database which columns to group by when aggregating the data. I don't know which database you are using, but some of them raise an error when an aggregate appears next to plain columns and the SQL text has no GROUP BY clause.
Please try with the following query:
SELECT f_name,l_name,teachers.first_name,teachers.t_id,p_id,paid_amount,family_id,date,sum(payments.paid_amount)
FROM payments
LEFT JOIN family ON family.id = payments.family_id
LEFT JOIN teachers ON family.teacher_id = teachers.t_id
GROUP BY f_name,l_name,teachers.first_name,teachers.t_id,p_id,paid_amount,family_id,date
GROUP BY tells the database which columns are the keys of the aggregation.
If you want all the payments, use a subquery or join:
SELECT f_name, l_name, t.first_name, t.t_id, p.p_id, p.paid_amount, p.family_id, date,
(SELECT SUM(p2.paid_amount) FROM payments p2) AS all_paid
FROM payments p LEFT JOIN
family f
ON f.id = p.family_id LEFT JOIN
teachers t
ON f.teacher_id = t.t_id;
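To illustrate the subquery approach, here is a minimal sketch using Python's sqlite3 module. It is reduced to the payments table with invented rows, and `p2` is just an illustrative alias for the inner copy of the table. Every detail row is kept, and the grand total is repeated next to each one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (p_id INTEGER PRIMARY KEY, family_id INTEGER,
                           paid_amount REAL);
    -- Hypothetical payments: 10 + 20 + 30 = 60 in total.
    INSERT INTO payments (family_id, paid_amount) VALUES (1, 10), (1, 20), (2, 30);
""")

# The scalar subquery is uncorrelated: it sums the whole table once,
# so the outer query is not collapsed into a single row.
rows = conn.execute("""
    SELECT p.p_id, p.family_id, p.paid_amount,
           (SELECT SUM(p2.paid_amount) FROM payments p2) AS all_paid
    FROM payments p
    ORDER BY p.p_id
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 10.0, 60.0)
# (2, 1, 20.0, 60.0)
# (3, 2, 30.0, 60.0)
```

The inner query needs its own alias. Writing `p.paid_amount` inside it would refer back to the current outer row instead of the whole table.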
SELECT f_name,l_name,t.first_name,t.t_id,p_id,paid_amount,family_id,date,sum(p.paid_amount)
FROM payments p,family f,teachers t where f.id = p.family_id and f.teacher_id = t.t_id
Group by f_name,l_name,t.first_name,t.t_id,p_id,paid_amount,family_id
You can also add the date column to the GROUP BY expression; most databases require this as long as date stays in the SELECT list. Example:
f_name,l_name,t.first_name,t.t_id,p_id,paid_amount,family_id,date

SQL Access: how to obtain output involving multiple tables without running 2 queries?

I would like to find out the most popular genre of film for a certain age group, for example 20-30 year-olds. I'm quite new to SQL and would appreciate any help I can get, apologies if this is too minor.
The relevant tables for this query are:
FILM {FID (PK), ..., Film_Title}
MEMBER {MID (PK), ..., Date_of_Birth}
LIST {MID (FK), FID (FK)}
GENRE {GID (PK), Genre}
FILM_ACTOR_DIRECTOR_GENRE {FID (FK), ..., GID (FK)}
FILM and MEMBER table should be quite self-explanatory, while a LIST is a selection of films a MEMBER wishes to rent. It's like a shopping basket. Each member only has one list and each list can contain many films. FILM_ACTOR_DIRECTOR_GENRE contains Genre belonging to each film. Each film can only have one genre.
So far I have managed to get an output which shows:
Genre     # People Aged 20-30
-------   -------------------
Action    5
Comedy    4
Horror    2
etc.      etc.
However it involves creating a table and then running another query. Is there a way to obtain the most popular genre within a particular age group without having to run 2 separate queries?
The 2 queries I've used are:
SELECT DISTINCT Genre.Genre_Name, Member.Date_of_Birth
INTO Genre_by_Age
FROM
((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#));
for creating the new table with information I want, and:
SELECT Genre_Name, COUNT(*) as Number_of_People_aged_20_to_30
FROM Genre_by_Age
GROUP BY Genre_Name
ORDER BY COUNT(*) DESC;
to obtain the output shown above.
Is there a way to obtain the above result without running 2 separate queries? Thanks for your time!
How about using a subquery?
SELECT Genre_Name, COUNT(*) as Number_of_People_aged_20_to_30
FROM (SELECT DISTINCT Genre.Genre_Name, Member.Date_of_Birth
FROM ((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#))
) as t
GROUP BY Genre_Name
ORDER BY COUNT(*) DESC;
I think this should work:
SELECT Genre.Genre_Name, count(Member.MID) as Number_of_People_aged_20_to_30
FROM
((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#))
GROUP BY Genre.Genre_Name
ORDER BY count(Member.MID) DESC;
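The two answers do not count the same thing, which a small sketch makes visible. It uses Python's sqlite3 module with a simplified, hypothetical schema: the genre is stored directly on `film`, dates are ISO strings, and the DISTINCT is taken over the member id rather than the date of birth, so each person is counted once per genre:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre (gid INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE film (fid INTEGER PRIMARY KEY, gid INTEGER);
    CREATE TABLE member (mid INTEGER PRIMARY KEY, date_of_birth TEXT);
    CREATE TABLE list (mid INTEGER, fid INTEGER);
    INSERT INTO genre VALUES (1, 'Action'), (2, 'Comedy');
    INSERT INTO film VALUES (1, 1), (2, 1), (3, 2);
    -- Members 1 and 2 fall inside the date range, member 3 does not.
    INSERT INTO member VALUES (1, '1990-01-01'), (2, '1992-05-05'), (3, '1970-01-01');
    INSERT INTO list VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 3);
""")

JOINS = """
    FROM genre g
    INNER JOIN film f ON f.gid = g.gid
    INNER JOIN list l ON l.fid = f.fid
    INNER JOIN member m ON m.mid = l.mid
    WHERE m.date_of_birth BETWEEN '1985-04-16' AND '1995-04-16'
"""

# Derived table with DISTINCT: each member counts once per genre.
people = conn.execute(f"""
    SELECT t.genre_name, COUNT(*) AS n
    FROM (SELECT DISTINCT g.genre_name, m.mid {JOINS}) AS t
    GROUP BY t.genre_name
    ORDER BY n DESC
""").fetchall()

# Plain COUNT over the join: counts list entries, so member 1 with two
# Action films on the list is counted twice.
entries = conn.execute(f"""
    SELECT g.genre_name, COUNT(m.mid) AS n {JOINS}
    GROUP BY g.genre_name
    ORDER BY n DESC
""").fetchall()

print(people)   # [('Action', 2), ('Comedy', 1)]
print(entries)  # [('Action', 3), ('Comedy', 1)]
```

Which figure is wanted depends on whether "popularity" means people or list entries. The derived-table form is the one that matches the two-step result in the question.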

Left outer join two levels deep in Postgres results in cartesian product

Given the following 4 tables:
CREATE TABLE events ( id, name )
CREATE TABLE profiles ( id, event_id )
CREATE TABLE donations ( amount, profile_id )
CREATE TABLE event_members( id, event_id, user_id )
I'm attempting to get a list of all events, along with a count of any members, and a sum of any donations. The issue is the sum of donations is coming back wrong (appears to be a cartesian result of donations * # of event_members).
Here is the SQL query (Postgres)
SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name
The sum(donations.amount) comes back equal to the actual sum of donations multiplied by the number of rows in event_members. If I comment out the count(distinct event_members.id) and the event_members left outer join, the sum is correct.
As I explained in an answer to the referenced question, you need to aggregate before joining to avoid a proxy CROSS JOIN. Like:
SELECT e.name, e.sum_donations, m.ct_members
FROM (
SELECT e.id AS event_id, e.name, SUM(d.amount) AS sum_donations
FROM events e
LEFT JOIN profiles p ON p.event_id = e.id
LEFT JOIN donations d ON d.profile_id = p.id
GROUP BY 1, 2
) e
LEFT JOIN (
SELECT m.event_id, count(DISTINCT m.id) AS ct_members
FROM event_members m
GROUP BY 1
) m USING (event_id);
If event_members.id is the primary key, then id is guaranteed to be unique in the table and you can drop DISTINCT from the count:
count(*) AS ct_members
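The effect of aggregating before joining can be checked with a small sketch. It runs on SQLite through Python's sqlite3 module rather than Postgres, and the column types and sample rows are assumptions added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profiles (id INTEGER PRIMARY KEY, event_id INTEGER);
    CREATE TABLE donations (amount INTEGER, profile_id INTEGER);
    CREATE TABLE event_members (id INTEGER PRIMARY KEY, event_id INTEGER,
                                user_id INTEGER);
    -- Hypothetical data: one event, donations of 100 + 50, three members.
    INSERT INTO events VALUES (1, 'Alpha');
    INSERT INTO profiles VALUES (1, 1);
    INSERT INTO donations VALUES (100, 1), (50, 1);
    INSERT INTO event_members (event_id, user_id) VALUES (1, 10), (1, 11), (1, 12);
""")

# Joining everything first: each donation is repeated once per member,
# so the sum becomes 150 * 3 = 450.
naive = conn.execute("""
    SELECT e.name, COUNT(DISTINCT m.id), SUM(d.amount)
    FROM events e
    LEFT JOIN profiles p ON p.event_id = e.id
    LEFT JOIN donations d ON d.profile_id = p.id
    LEFT JOIN event_members m ON m.event_id = e.id
    GROUP BY e.name
""").fetchall()

# Aggregating each branch separately, then joining one row per event.
fixed = conn.execute("""
    SELECT e.name, e.sum_donations, m.ct_members
    FROM (
        SELECT e.id AS event_id, e.name, SUM(d.amount) AS sum_donations
        FROM events e
        LEFT JOIN profiles p ON p.event_id = e.id
        LEFT JOIN donations d ON d.profile_id = p.id
        GROUP BY e.id, e.name
    ) e
    LEFT JOIN (
        SELECT event_id, COUNT(*) AS ct_members
        FROM event_members
        GROUP BY event_id
    ) m USING (event_id)
""").fetchall()

print(naive)  # [('Alpha', 3, 450)]
print(fixed)  # [('Alpha', 150, 3)]
```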
You seem to have these two independent structures (-[ means 1-N association):
events -[ profiles -[ donations
events -[ event members
I wrapped the second one into a subquery:
SELECT events.name,
member_count.the_member_count,
SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN (
SELECT
event_id,
COUNT(*) AS the_member_count
FROM event_members
GROUP BY event_id
) AS member_count
ON member_count.event_id = events.id
GROUP BY events.name, member_count.the_member_count
Of course you get a cartesian product between donations and event_members for every event: both are bound only to the event, and there is no join relation between donations and event_members other than the event id, which means that every member matches every donation.
When you do your query, you ask for all events - let's say there are two, event Alpha and event Beta - and then JOIN with the members. Let's say that there is a member Alice who participates in both events.
SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name
On each row you asked the total for Alice's donations. If Alice donated 100 USD, then you asked for:
Alpha Alice 100USD
Beta Alice 100USD
So it's not surprising that when asking for the sum total Alice comes out as having donated 200 USD.
If you want the sum of all donations, you would be better off using two separate queries. Trying to do everything in a single query, while possible, is a classic SQL Antipattern (the one in chapter 18, "Spaghetti Query"):
Unintended Products
One common consequence of producing all your results in one query is a Cartesian product. This happens when two of the tables in the query have no condition restricting their relationship. Without such a restriction, the join of two tables pairs each row in the first table to every row in the other table. Each such pairing becomes a row of the result set, and you end up with many more rows than you expect.

Getting individual counts of a tables column after joining other tables

I'm having problems getting an accurate count of a column after joining others. When a column is joined I would still like to have a DISTINCT count of the table that it is being joined on.
A restaurant has multiple meals, meals have multiple food groups, food groups have multiple ingredients.
Through the restaurant's id I want to be able to calculate how many meals, food groups, and ingredients the restaurant has.
When I join the food_groups, the count for meals increases as well. (I understand this is natural behavior; I just don't understand how to get what I need because of it.) I have tried DISTINCT and other things I have found, but nothing seems to do the trick. I would like to keep this to one query rather than splitting it up into multiple ones.
SELECT
COUNT(meals.id) AS countMeals,
COUNT(food_groups.id) AS countGroups,
COUNT(ingrediants.id) AS countIngrediants
FROM
restaurants
INNER JOIN
meals ON restaurants.id = meals.restaurant_id
INNER JOIN
food_groups ON meals.id = food_groups.meal_id
INNER JOIN
ingrediants ON food_groups.id = ingrediants.food_group_id
WHERE
restaurants.id='43'
GROUP BY
restaurants.id
Thanks!
The DISTINCT goes inside the count:
SELECT
COUNT(DISTINCT meals.id) AS countMeals,
COUNT(DISTINCT food_groups.id) AS countGroups,
COUNT(DISTINCT ingrediants.id) AS countIngrediants
FROM
restaurants
INNER JOIN
meals ON restaurants.id = meals.restaurant_id
INNER JOIN
food_groups ON meals.id = food_groups.meal_id
INNER JOIN
ingrediants ON food_groups.id = ingrediants.food_group_id
WHERE
restaurants.id='43'
GROUP BY
restaurants.id
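You're going to have to do subqueries, I think. Each subquery has to join back to the restaurant itself, because aliases from one subquery are not visible in another. Something like:
SELECT
(SELECT COUNT(1) FROM meals m WHERE m.restaurant_id = r.id) AS countMeals,
(SELECT COUNT(1) FROM food_groups fg INNER JOIN meals m ON fg.meal_id = m.id WHERE m.restaurant_id = r.id) AS countGroups,
(SELECT COUNT(1) FROM ingrediants i INNER JOIN food_groups fg ON i.food_group_id = fg.id INNER JOIN meals m ON fg.meal_id = m.id WHERE m.restaurant_id = r.id) AS countIngrediants
FROM restaurants r
WHERE r.id = '43'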
Where were you putting your DISTINCT and on which columns? When using COUNT() you need to do the distinct inside the parentheses and you need to do it over a single column that is distinct for what you're trying to count. For example:
SELECT
COUNT(DISTINCT M.id) AS count_meals,
COUNT(DISTINCT FG.id) AS count_food_groups,
COUNT(DISTINCT I.id) AS count_ingredients
FROM
Restaurants R
INNER JOIN Meals M ON M.restaurant_id = R.id
INNER JOIN Food_Groups FG ON FG.meal_id = M.id
INNER JOIN Ingredients I ON I.food_group_id = FG.id
WHERE
R.id='43'
Since you're selecting for a single restaurant, you shouldn't need the GROUP BY. Also, unless this is in a non-English language, I think you misspelled ingredients.
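The difference between the plain and the DISTINCT counts can be seen in a small sketch. It uses Python's sqlite3 module with invented rows, and the table is spelled `ingredients` here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurants (id INTEGER PRIMARY KEY);
    CREATE TABLE meals (id INTEGER PRIMARY KEY, restaurant_id INTEGER);
    CREATE TABLE food_groups (id INTEGER PRIMARY KEY, meal_id INTEGER);
    CREATE TABLE ingredients (id INTEGER PRIMARY KEY, food_group_id INTEGER);
    -- Hypothetical data: 2 meals, 3 food groups, 4 ingredients.
    INSERT INTO restaurants VALUES (43);
    INSERT INTO meals VALUES (1, 43), (2, 43);
    INSERT INTO food_groups VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO ingredients VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

JOINS = """
    FROM restaurants r
    INNER JOIN meals m ON m.restaurant_id = r.id
    INNER JOIN food_groups fg ON fg.meal_id = m.id
    INNER JOIN ingredients i ON i.food_group_id = fg.id
    WHERE r.id = ?
"""

# The join yields one row per ingredient, so every plain COUNT returns 4.
plain = conn.execute(
    f"SELECT COUNT(m.id), COUNT(fg.id), COUNT(i.id) {JOINS}", (43,)
).fetchone()

# DISTINCT inside COUNT collapses the repeated parent ids.
distinct = conn.execute(
    f"SELECT COUNT(DISTINCT m.id), COUNT(DISTINCT fg.id), COUNT(DISTINCT i.id) {JOINS}",
    (43,),
).fetchone()

print(plain)     # (4, 4, 4)
print(distinct)  # (2, 3, 4)
```

One caveat with the join-based form: because the joins are INNER, a meal without food groups (or a food group without ingredients) drops out of the result and is not counted.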