How do i get multiple values in sql? - sql

I'm stuck in loop.
I have a postgres database and i have to filter the top-5, unique co-author pairs by number of jointly published papers.
I want to return pairs of author names and paper count
What i have so far:
select persons.name as person, count(papers.pkey) as amount
from persons
inner join authpapers on authpapers.akey = persons.akey
inner join papers on authpapers.pkey = papers.pkey
group by persons.name
order by amount desc limit 5;
As a result i get the first 5 authors ba name and paper count but i want papers where an author had a co author.

You can self-join the persons table to generate authors combinations, then bring each person's papers by joining authpapers once per user, and finally filter on matching papers. The final step is aggregation and sorting:
select p1.name as person1, p2.name person2, count(*) as amount
from persons p1
inner join persons p2 on p2.akey > p1.akey
inner join authpapers ap1 on ap1.akey = p1.akey
inner join authpapers ap2 on ap2.akey = p2.akey
where ap1.pkey = ap2.pkey
group by p1.akey, p2.akey
order by amount desc limit 5;
Note that you don't need the papers table to get the results that you want.

Related

Not getting 0 value in SQL count aggregate by inner join

I am using the basic chinook database and I am trying to get a query that will display the worst selling genres. I am mostly getting the answer, however there is one genre 'Opera' that has 0 sales, but the query result is ignoring that and moving on to the next lowest non-zero value.
I tried using left join instead of inner join but that returns different values.
This is my query currently:
create view max
as
select distinct
t1.name as genre,
count(*) as Sales
from
tracks t2
inner join
invoice_items t3 on t2.trackid == t3.trackid
left join
genres as t1 on t1.genreid == t2.genreid
group by
t1.genreid
order by
2
limit 10;
The result however skips past the opera value which is 0 sales. How can I include that? I tried using left join but it yields different results.
Any help is appreciated.
If you want to include genres with no sales then you should start the joins from genres and then do LEFT joins to the other tables.
Also, you should not use count(*) which counts any row in the resultset.
SELECT g.name Genre,
COUNT(i.trackid) Sales
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
ORDER BY Sales LIMIT 10;
There is no need for the keyword DISTINCT, since the query returns 1 row for each genre.
When asking for the top n one must always state how to deal with ties. If I am looking for the top 1, but there are three rows in the table, all with the same value, shall I select 3 rows? Zero rows? One row arbitrarily chosen? Most often we don't want arbitrary results, which excludes the last option. This excludes LIMIT, too, because LIMIT has no clause for ties in SQLite.
Here is an example with DENSE_RANK instead. You are looking for the worst selling genres, so we must probably look at the revenue per genre, which is the sum of price x quantity sold. In order to include genres without invoices (and maybe even without tracks?) we outer join this data to the genre table.
select total, genre_name
from
(
select
g.name as genre_name,
coalesce(sum(ii.unit_price * ii.quantity), 0) as total
dense_rank() over (order by coalesce(sum(ii.unit_price * ii.quantity), 0)) as rnk
from genres g
left join tracks t on t.genreid = g.genreid
left join invoice_items ii on ii.trackid = t.trackid
group by g.name
) aggregated
where rnk <= 10
order by total, genre_name;

Query to find students that failed all given subjects

I'm trying to find the students that have failed every subject in a set of subjects via PostgreSQL queries.
Students fail a subject if they have a not null mark < 50 for at least one course offering of the subject. And I want to find the students that have failed all subjects in the set of subjects Relevant_subjects.
NOTE: students can have several records per course.
SELECT People.name
FROM
Relevant_subjects
JOIN Courses on (Courses.subject = Relevant_subjects.id)
JOIN Course_enrolments on (Course_enrolments.course = Courses.id)
JOIN Students on (Students.id = Course_enrolments.student)
JOIN People on (People.id = Students.id)
WHERE
Course_enrolments.mark is not null AND
Course_enrolments.mark < 50 AND
;
With the code above, I get the students that has failed any of the Relevant_subjects but I my desired result is to get the students that has failed all Relevant_subjects. How can I do that?
Students fail a subject if they have a not null mark < 50 for at least one course offering of the subject.
One of many possible ways:
SELECT id, p.name
FROM (
SELECT s.id
FROM students s
CROSS JOIN relevant_subjects rs
GROUP BY s.id
HAVING bool_and( EXISTS(
SELECT -- empty list
FROM course_enrolments ce
JOIN courses c ON c.id = ce.course
WHERE ce.mark < 50 -- also implies NOT NULL
AND ce.student = s.id
AND c.subject = rs.id
)
) -- all failed
) sub
JOIN people p USING (id);
Form a Carthesian Product of students and relevant subjects.
Aggregate by student (s.id) and filter those who failed all subjects in the HAVING clause with bool_and()over a correlated EXISTS subquery testing for at least one such failed course for each student-subject combination.
Join to people as final cosmetic step to get student names. I added id to get unique results (as names probably are not guaranteed to be unique).
Depending on actual table definition, your version of Postgres, cardinalities and value distribution, there may be (much) more efficient queries.
It's a case of relational-division at its core. See:
How to filter SQL results in a has-many-through relation
And the most efficient strategy is to eliminate as many students as possible as early in the query as possible - like by checking the subject with the fewest failing students first. Then proceed with only the remaining students etc.
Your case adds the specific difficulty that the number and identities of subjects to be tested are unknown / dynamic. Typically, a recursive CTE or similar offers best performance for this kind of problem:
SQL query to find a row with a specific number of associations
I would use aggregation:
SELECT p.name
FROM Relevant_subjects rs JOIN
Courses c
ON c.subject = rs.id JOIN
Course_enrolments ce
ON ce.course = c.id JOIN
Students s
ON s.id = ce.student JOIN
People p
ON p.id = s.id
WHERE ce.mark < 50
GROUP BY p.id, p.name
HAVING COUNT(*) = (SELECT COUNT(*) FROM relevant_subjects);
Note: This version assumes that students only have one record per course and relevant_subjects has no duplicates. These can easily be handling using COUNT(DISTINCT) if necessary.
To handle duplicates, this would look like:
SELECT p.name
FROM Relevant_subjects rs JOIN
Courses c
ON c.subject = rs.id JOIN
Course_enrolments ce
ON ce.course = c.id JOIN
Students s
ON s.id = ce.student JOIN
People p
ON p.id = s.id
WHERE ce.mark < 50
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT rs.id) = (SELECT COUNT(DISTINCT rs2.id) FROM relevant_subjects rs2);

SQL Access: how to obtain output involving multiple tables without running 2 queries?

I would like to find out the most popular genre of film for a certain age group, for example 20-30 year-olds. I'm quite new to SQL and would appreciate any help I can get, apologies if this is too minor.
The relevant tables for this query are:
FILM {FID (PK), ..., Film_Title}
MEMBER {MID (PK), ..., Date_of_Birth}
LIST {MID (FK), FID (FK)}
GENRE {GID (PK), Genre}
FILM_ACTOR_DIRECTOR_GENRE {FID (FK), ..., GID (FK)}
FILM and MEMBER table should be quite self-explanatory, while a LIST is a selection of films a MEMBER wishes to rent. It's like a shopping basket. Each member only has one list and each list can contain many films. FILM_ACTOR_DIRECTOR_GENRE contains Genre belonging to each film. Each film can only have one genre.
So far I have managed to get an output which shows:
Genre # People Aged 20-30
------- -------------------
Action 5
Comedy 4
Horror 2
etc. etc.
However it involves creating a table and then running another query. Is there a way to obtain the most popular genre within a particular age group without having to run 2 separate queries?
The 2 queries I've used are:
SELECT DISTINCT Genre.Genre_Name, Member.Date_of_Birth
INTO Genre_by_Age
FROM
((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#));
for creating the new table with information I want, and:
SELECT Genre_Name, COUNT(*) as Number_of_People_aged_20_to_30
FROM Genre_by_Age
GROUP BY Genre_Name
ORDER BY COUNT(*) DESC;
to obtain the output shown above.
Is there a way to obtain the above result without running 2 separate queries? Thanks for your time!
How about using a subquery?
SELECT Genre_Name, COUNT(*) as Number_of_People_aged_20_to_30
FROM (SELECT DISTINCT Genre.Genre_Name, Member.Date_of_Birth
FROM ((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#))
) as t
GROUP BY Genre_Name
ORDER BY COUNT(*) DESC;
I think this should work:
SELECT Genre.Genre_Name, count(Member.MID) as Number_of_People_aged_20_to_30
FROM
((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#))
GROUP BY Genre.Genre_Name
ORDER BY count(Member.MID) DESC;

Getting individual counts of a tables column after joining other tables

I'm having problems getting an accurate count of a column after joining others. When a column is joined I would still like to have a DISTINCT count of the table that it is being joined on.
A restaurant has multiple meals, meals have multiple food groups, food groups have multiple ingredients.
Through the restaurants id I want to be able to calculate how many of meals, food groups, and ingrediants the restaurant has.
When I join the food_groups the count for meals increases as well (I understand this is natural behavior I just don't understand how to get what I need due to it.) I have tried DISTINCT and other things I have found, but nothing seems to do the trick. I would like to keep this to one query rather than splitting it up into multiple ones.
SELECT
COUNT(meals.id) AS countMeals,
COUNT(food_groups.id) AS countGroups,
COUNT(ingrediants.id) AS countIngrediants
FROM
restaurants
INNER JOIN
meals ON restaurants.id = meals.restaurant_id
INNER JOIN
food_groups ON meals.id = food_groups.meal_id
INNER JOIN
ingrediants ON food_groups.id = ingrediants.food_group_id
WHERE
restaurants.id='43'
GROUP BY
restaurants.id
Thanks!
The DISTINCT goes inside the count
SELECT
COUNT(DISTINCT meals.id) AS countMeals,
COUNT(DISTINCT food_groups.id) AS countGroups,
COUNT(DISTINCT ingrediants.id) AS countIngrediants
FROM
restaurants
INNER JOIN
meals ON restaurants.id = meals.restaurant_id
INNER JOIN
food_groups ON meals.id = food_groups.meal_id
INNER JOIN
ingrediants ON food_groups.id = ingrediants.food_group_id
WHERE
restaurants.id='43'
GROUP BY
restaurants.id
You're going to have to do subqueries, I think. Something like:
SELECT
(SELECT COUNT(1) FROM meals m WHERE m.restaurant_id = r.id) AS countMeals,
(SELECT COUNT(1) FROM food_groups fg WHERE fg.meal_id = m.id) AS countGroups,
(SELECT COUNT(1) FROM ingrediants i WHERE i.food_group_id = fg.id) AS countGroups
FROM restaurants r
Where were you putting your DISTINCT and on which columns? When using COUNT() you need to do the distinct inside the parentheses and you need to do it over a single column that is distinct for what you're trying to count. For example:
SELECT
COUNT(DISTINCT M.id) AS count_meals,
COUNT(DISTINCT FG.id) AS count_food_groups,
COUNT(DISTINCT I.id) AS count_ingredients
FROM
Restaurants R
INNER JOIN Meals M ON M.restaurant_id = R.id
INNER JOIN Food_Groups FG ON FG.meal_id = M.id
INNER JOIN Ingredients I ON I.food_group_id = FG.id
WHERE
R.id='43'
Since you're selecting for a single restaurant, you shouldn't need the GROUP BY. Also, unless this is in a non-English language, I think you misspelled ingredients.

SQL Query Help - Return row in table which relates to another table row with max(column)

I have two tables:
Table1 = Schools
Columns: id(PK), state(nvchar(100)), schoolname
Table2 = Grades
Columns: id(PK), id_schools(FK), Year, Reading, Writing...
I would like to develop a query to find the schoolname which has the highest grade for Reading.
So far I have the following and need help to fill in the blanks:
SELECT Schools.schoolname, Grades.Reading
FROM Schools, Grades
WHERE Schools.id = (* need id_schools for max(Grades.Reading)*)
SELECT
Schools.schoolname,
Grades.Reading
FROM
Schools INNER JOIN Grades on Schools.id = Grades.id_schools
WHERE
Grades.Reading = (SELECT MAX(Reading) from Grades)
Here's how I solve this sort of problem without using a subquery:
SELECT s.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
LEFT OUTER JOIN Grades AS g2 ON g2.id_schools <> s.id
AND g1.Reading < g2.Reading
WHERE g2.id_schools IS NULL
Note that you can get more than one row back, if more than one school ties for highest Reading score. In that case, you need to decide how to resolve the tie and build that into the LEFT OUTER JOIN condition.
Re your comment: The left outer join looks for a row with a higher grade for the same school, and if none is found, all of g2.* columns will be null. In that case, we know that no grade is higher than the grade in the row g1 points to, which means g1 is the highest grade for that school. It can also be written this way, which is logically the same but might be easier to understand:
SELECT s.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
WHERE NOT EXISTS (
SELECT * FROM Grades g2
WHERE g2.id_schools <> s.id AND g2.Reading > g1.Reading)
You say it's not working. Can you be more specific? What is the answer you expect, and what's actually happening, and how do they differ?
edit: Changed = to <> as per suggestion in comment by #potatopeelings. Thanks!
This should do it
select * from Schools as s
where s.id=(
select top(1) id_schools from grades as g
order by g.reading desc)