Getting a single row from a JOIN given an additional condition

I'm writing a SELECT in which I pass a year (hardcoded as 1981 below) and expect one row per band formed that year. The main problem is getting the oldest living member of each band:
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
(SELECT name FROM MUSICIAN WHERE year_death IS NULL ORDER BY birth LIMIT 1)
FROM BAND b
LEFT JOIN ALBUM a ON(b.id_band = a.id_band)
LEFT JOIN SONG s ON(a.id_album = s.id_album)
JOIN MEMBER m ON(b.id_band= m.id_band)
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
/*LEFT JOIN(SELECT name FROM MUSICIAN WHERE year_death IS NULL
ORDER BY(birth) LIMIT 1) AS alive FROM mu*/ -- ??
WHERE b.year_formed = 1981
GROUP BY b.id_band;
I would like to get the oldest living member (from mu) of each band, but I only get the oldest living musician overall from the MUSICIAN table.

Well, I think you can follow the structure that you have, but the subquery needs a JOIN to MEMBER plus a condition that ties it to the band in the outer query (mem.id_band = b.id_band).
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT mem.id_musician),
(SELECT m.name
FROM MUSICIAN m JOIN
MEMBER mem
ON mem.id_musician = m.id_musician
WHERE m.year_death IS NULL AND mem.id_band = b.id_band
ORDER BY m.birth
LIMIT 1
) as oldest_member
FROM BAND b LEFT JOIN
ALBUM a
ON b.id_band = a.id_band LEFT JOIN
SONG s
ON a.id_album = s.id_album LEFT JOIN
MEMBER mem
ON mem.id_band = b.id_band
WHERE b.year_formed = 1981
GROUP BY b.id_band
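To see why the correlation matters, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question, but the rows are invented and birth is simplified to a year. It runs the scalar subquery with and without the mem.id_band = b.id_band condition.

```python
import sqlite3

# Hypothetical miniature version of the question's schema; birth is a year here.
SCHEMA = """
CREATE TABLE band     (id_band INTEGER PRIMARY KEY, year_formed INTEGER);
CREATE TABLE musician (id_musician INTEGER PRIMARY KEY, name TEXT,
                       birth INTEGER, year_death INTEGER);
CREATE TABLE member   (id_band INTEGER, id_musician INTEGER);
"""


def oldest_living_per_band(correlated: bool = True) -> list:
    """Return (id_band, oldest living member) for bands formed in 1981."""
    # The correlation predicate ties the subquery to the current outer band row.
    correlation = "AND mem.id_band = b.id_band" if correlated else ""
    query = f"""
        SELECT b.id_band,
               (SELECT mu.name
                  FROM musician mu
                  JOIN member mem ON mem.id_musician = mu.id_musician
                 WHERE mu.year_death IS NULL
                   {correlation}
                 ORDER BY mu.birth
                 LIMIT 1) AS oldest_living_member
          FROM band b
         WHERE b.year_formed = 1981
         ORDER BY b.id_band
    """
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO band VALUES (?, ?)",
                        [(1, 1981), (2, 1981), (3, 1990)])
        con.executemany("INSERT INTO musician VALUES (?, ?, ?, ?)", [
            (10, "Ann", 1950, None),
            (11, "Bob", 1945, 2000),  # older than Ann, but deceased
            (12, "Cid", 1960, None),
            (13, "Dee", 1940, None),  # oldest living overall, only in band 3
        ])
        con.executemany("INSERT INTO member VALUES (?, ?)",
                        [(1, 10), (1, 11), (2, 12), (3, 13)])
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(oldest_living_per_band(correlated=True))   # one name per band
    print(oldest_living_per_band(correlated=False))  # same name for every band
```

Without the correlation, the subquery ignores the outer band and returns Dee, the oldest living musician in the whole table, for every band, which is the symptom described in the question.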

The following query returns the oldest living member of each band. DENSE_RANK() numbers the living members of each band by birth, and the join keeps only rank 1, so members tied on birth each produce a row. The year_formed = 1981 filter is included; remove it to see every band.
SELECT
t.id_band,
t.total_albums,
t.total_songs,
t.total_musicians,
o.name AS oldest_living_member
FROM
(
  SELECT b.id_band,
  COUNT(DISTINCT a.id_album) as total_albums,
  COUNT(DISTINCT s.id_song) as total_songs,
  COUNT(DISTINCT m.id_musician) as total_musicians
  FROM BAND b
  LEFT JOIN ALBUM a ON b.id_band = a.id_band
  LEFT JOIN SONG s ON a.id_album = s.id_album
  LEFT JOIN MEMBER m ON b.id_band = m.id_band
  WHERE b.year_formed = 1981
  GROUP BY b.id_band
) t
LEFT JOIN
(
  SELECT m.id_band,
  mu.name,
  dense_rank() over (partition by m.id_band order by mu.birth) as rnk
  FROM MEMBER m
  JOIN MUSICIAN mu ON m.id_musician = mu.id_musician
  WHERE mu.year_death IS NULL
) o ON o.id_band = t.id_band AND o.rnk = 1;

You can reference a table from the outer query inside the nested SELECT (a correlated subquery), like so:
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
(SELECT mu2.name
 FROM MUSICIAN mu2
 JOIN MEMBER m2 ON m2.id_musician = mu2.id_musician
 WHERE mu2.year_death IS NULL
   AND m2.id_band = b.id_band -- refers to the outer BAND row
 ORDER BY mu2.birth
 LIMIT 1) AS oldest_living_member
FROM BAND b
LEFT JOIN ALBUM a ON b.id_band = a.id_band
LEFT JOIN SONG s ON a.id_album = s.id_album
JOIN MEMBER m ON b.id_band = m.id_band
WHERE b.year_formed = 1981
GROUP BY b.id_band;

For queries where you want to find the "oldest person per group", you can use ROW_NUMBER() partitioned by band:
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
oldest_living_members.name AS oldest_living_member
FROM
band b
LEFT JOIN album a ON b.id_band = a.id_band
LEFT JOIN song s ON a.id_album = s.id_album
LEFT JOIN member m ON b.id_band = m.id_band
LEFT JOIN
(
  SELECT
  m.id_band,
  mu.name,
  ROW_NUMBER() OVER (PARTITION BY m.id_band ORDER BY mu.birth ASC) AS rown
  FROM
  member m
  JOIN musician mu ON m.id_musician = mu.id_musician
  WHERE mu.year_death IS NULL
) oldest_living_members
ON
b.id_band = oldest_living_members.id_band AND
oldest_living_members.rown = 1
WHERE b.year_formed = 1981
GROUP BY b.id_band, oldest_living_members.name;
If you run just the subquery, you'll see how it works: musicians are joined to MEMBER to get the band id, and each band id forms a partition. ROW_NUMBER() numbers the rows from 1 in order of birth date (the question's birth-date column appears to be birth; adjust the name if yours differs), so the oldest living person (earliest birth date) gets 1. Every time the band id changes, the numbering restarts at 1 with the oldest living person in that band. The join then keeps only the rows numbered 1.
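The "run just the subquery" step can be tried with a minimal sketch on invented data, using Python's sqlite3 module (window functions need SQLite 3.25 or later). Birth dates are stored as ISO text, so sorting them as strings is also chronological; all rows are hypothetical.

```python
import sqlite3


def numbered_living_members() -> list:
    """Run the ROW_NUMBER() subquery on hypothetical data and return its rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript("""
            CREATE TABLE musician (id_musician INTEGER PRIMARY KEY, name TEXT,
                                   birth TEXT, year_death INTEGER);
            CREATE TABLE member (id_band INTEGER, id_musician INTEGER);
        """)
        con.executemany("INSERT INTO musician VALUES (?, ?, ?, ?)", [
            (1, "Ann", "1950-03-01", None),
            (2, "Bob", "1948-07-12", None),
            (3, "Cid", "1955-01-30", None),
            (4, "Dee", "1946-11-02", 1999),  # deceased: filtered out before numbering
            (5, "Eve", "1952-05-05", None),
        ])
        con.executemany("INSERT INTO member VALUES (?, ?)",
                        [(1, 1), (1, 2), (1, 4), (2, 3), (2, 5)])
        return con.execute("""
            SELECT m.id_band,
                   mu.name,
                   ROW_NUMBER() OVER (PARTITION BY m.id_band
                                      ORDER BY mu.birth ASC) AS rown
              FROM member m
              JOIN musician mu ON m.id_musician = mu.id_musician
             WHERE mu.year_death IS NULL
             ORDER BY m.id_band, rown
        """).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in numbered_living_members():
        print(row)  # (id_band, name, rown); numbering restarts for each band
```

Dee is older than everyone in band 1 but never receives a number, because the WHERE filter runs before the window function.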

I think this should be considerably faster (while also solving your problem):
SELECT b.id_band, a.*, m.*
FROM band b
LEFT JOIN LATERAL (
SELECT count(*) AS ct_albums, sum(ct_songs) AS ct_songs
FROM (
SELECT id_album, count(s.id_song) AS ct_songs -- albums without songs count 0
FROM album a
LEFT JOIN song s USING (id_album)
WHERE a.id_band = b.id_band
GROUP BY 1
) ab
) a ON true
LEFT JOIN LATERAL (
SELECT count(*) OVER () AS ct_musicians
, name AS senior_member -- any other columns you need?
FROM member m
JOIN musician mu USING (id_musician)
WHERE m.id_band = b.id_band
ORDER BY year_death IS NOT NULL -- sorts the living first
, birth
, name -- as tiebreaker (my optional addition)
LIMIT 1
) m ON true
WHERE b.year_formed = 1981;
Getting the senior band member is solved in the LATERAL subquery m, without multiplying the cost for the base query. It works because the window function count(*) OVER () is computed before ORDER BY and LIMIT are applied. Since bands typically have only a few members, this should be the fastest possible way. See:
Best way to get result count before LIMIT was applied
What is the difference between LATERAL and a subquery in PostgreSQL?
Prevent duplicate values in LEFT JOIN
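SQLite has no LATERAL, so the following sketch (Python's sqlite3 module, invented rows, SQLite 3.25+ for window functions) runs only the body of the LATERAL subquery m, once per band. It shows that count(*) OVER () sees every member row even though LIMIT 1 returns just one. It also shows a consequence of the ORDER BY: a band with no living member gets its oldest deceased member.

```python
import sqlite3


def senior_member(con: sqlite3.Connection, band_id: int):
    """Body of the answer's LATERAL subquery, run for a single band."""
    return con.execute("""
        SELECT count(*) OVER () AS ct_musicians,  -- evaluated before LIMIT
               name AS senior_member
          FROM member m
          JOIN musician mu USING (id_musician)
         WHERE m.id_band = ?
         ORDER BY year_death IS NOT NULL,  -- living members sort first
                  birth,
                  name                     -- tiebreaker
         LIMIT 1
    """, (band_id,)).fetchone()


def demo() -> dict:
    con = sqlite3.connect(":memory:")
    try:
        con.executescript("""
            CREATE TABLE musician (id_musician INTEGER PRIMARY KEY, name TEXT,
                                   birth INTEGER, year_death INTEGER);
            CREATE TABLE member (id_band INTEGER, id_musician INTEGER);
            INSERT INTO musician VALUES (1, 'Ann', 1950, NULL),
                                        (2, 'Bob', 1945, 2000),
                                        (3, 'Cid', 1960, NULL),
                                        (4, 'Dee', 1930, 1990);
            -- band 1: two living members and one deceased; band 2: only deceased
            INSERT INTO member VALUES (1, 1), (1, 2), (1, 3), (2, 4);
        """)
        # Band 3 has no members at all, so the query returns no row (None).
        return {band: senior_member(con, band) for band in (1, 2, 3)}
    finally:
        con.close()


if __name__ == "__main__":
    print(demo())
```

In PostgreSQL, LEFT JOIN LATERAL ... ON true turns the missing row for a band without members into NULL columns instead of dropping the band.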
The other optimization, for counting albums and songs, builds on the assumption that the same id_song is never included in multiple albums of the same band. Otherwise, such songs are counted multiple times. (Easily fixed, and unrelated to the task of getting the senior band member.)
The point is to eliminate the need for DISTINCT at the top level after multiplying rows at the N-side repeatedly (I like to call that a "proxy cross join"). That would needlessly produce a possibly huge number of rows in the derived table.
Plus, it's much more convenient to retrieve additional columns (like more columns for the senior band member) than with some other query styles.
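The row multiplication that pre-aggregation avoids is easy to measure. This sketch uses Python's sqlite3 module with invented rows. It counts the rows produced by joining albums, songs and members to one band, then computes the totals by aggregating each N-side before joining. SQLite lacks LATERAL, so the pre-aggregation is written as uncorrelated derived tables grouped by id_band; that is a substitute for the answer's LATERAL form, not a copy of it.

```python
import sqlite3

SETUP = """
CREATE TABLE band   (id_band INTEGER PRIMARY KEY);
CREATE TABLE album  (id_album INTEGER PRIMARY KEY, id_band INTEGER);
CREATE TABLE song   (id_song INTEGER PRIMARY KEY, id_album INTEGER);
CREATE TABLE member (id_band INTEGER, id_musician INTEGER);
INSERT INTO band   VALUES (1);
INSERT INTO album  VALUES (100, 1), (101, 1);
INSERT INTO song   VALUES (1, 100), (2, 100), (3, 100), (4, 101), (5, 101);
INSERT INTO member VALUES (1, 10), (1, 11), (1, 12), (1, 13);
"""


def compare() -> tuple:
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        # Joining every N-side table multiplies rows: songs x members per band.
        joined_rows = con.execute("""
            SELECT count(*)
              FROM band b
              LEFT JOIN album a  ON a.id_band = b.id_band
              LEFT JOIN song s   ON s.id_album = a.id_album
              LEFT JOIN member m ON m.id_band = b.id_band
        """).fetchone()[0]
        # Aggregating each N-side first keeps one row per band; no DISTINCT needed.
        pre_aggregated = con.execute("""
            SELECT b.id_band, al.ct_albums, al.ct_songs, mb.ct_musicians
              FROM band b
              LEFT JOIN (SELECT id_band, count(*) AS ct_albums,
                                sum(ct_songs) AS ct_songs
                           FROM (SELECT a.id_band, a.id_album,
                                        count(s.id_song) AS ct_songs
                                   FROM album a
                                   LEFT JOIN song s ON s.id_album = a.id_album
                                  GROUP BY a.id_band, a.id_album) AS per_album
                          GROUP BY id_band) AS al ON al.id_band = b.id_band
              LEFT JOIN (SELECT id_band, count(*) AS ct_musicians
                           FROM member
                          GROUP BY id_band) AS mb ON mb.id_band = b.id_band
        """).fetchall()
        return joined_rows, pre_aggregated
    finally:
        con.close()


if __name__ == "__main__":
    print(compare())
```

With only 5 songs and 4 members, the joined form already builds 20 rows for one band; the count grows with the product of the N-sides, while the pre-aggregated form stays at one row per band.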

Related

Not getting 0 value in SQL count aggregate by inner join

I am using the Chinook sample database and am trying to write a query that displays the worst-selling genres. I mostly get the right answer; however, one genre, 'Opera', has 0 sales, and the query skips it and moves on to the next-lowest non-zero value.
I tried using left join instead of inner join but that returns different values.
This is my query currently:
create view max
as
select distinct
t1.name as genre,
count(*) as Sales
from
tracks t2
inner join
invoice_items t3 on t2.trackid == t3.trackid
left join
genres as t1 on t1.genreid == t2.genreid
group by
t1.genreid
order by
2
limit 10;
The result however skips past the opera value which is 0 sales. How can I include that? I tried using left join but it yields different results.
Any help is appreciated.
If you want to include genres with no sales then you should start the joins from genres and then do LEFT joins to the other tables.
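Also, you should not use COUNT(*), which counts every row in the result set, including the single NULL-filled row that a LEFT JOIN produces for a genre without sales. Count a column of invoice_items instead, such as COUNT(i.trackid): COUNT(column) ignores NULLs, so genres without sales get 0.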
SELECT g.name Genre,
COUNT(i.trackid) Sales
FROM genres g
LEFT JOIN tracks t ON t.genreid = g.genreid
LEFT JOIN invoice_items i ON i.trackid = t.trackid
GROUP BY g.genreid
ORDER BY Sales LIMIT 10;
There is no need for the keyword DISTINCT, since the query returns 1 row for each genre.
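A minimal sketch of the COUNT(*) versus COUNT(i.trackid) difference, using Python's sqlite3 module and a hypothetical three-genre subset of the Chinook tables (the real database has more columns and rows): Rock has a track sold twice, Opera has a track that never sold, and Polka has no tracks.

```python
import sqlite3


def sales_per_genre(count_expr: str) -> dict:
    """Run the answer's LEFT JOIN query with the given COUNT expression.

    count_expr is interpolated into the SQL, so pass only trusted literals.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.executescript("""
            CREATE TABLE genres (genreid INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE tracks (trackid INTEGER PRIMARY KEY, genreid INTEGER);
            CREATE TABLE invoice_items (invoicelineid INTEGER PRIMARY KEY,
                                        trackid INTEGER);
            INSERT INTO genres VALUES (1, 'Rock'), (2, 'Opera'), (3, 'Polka');
            INSERT INTO tracks VALUES (10, 1), (20, 2);         -- Polka has no tracks
            INSERT INTO invoice_items VALUES (1, 10), (2, 10);  -- track 20 never sold
        """)
        rows = con.execute(f"""
            SELECT g.name, {count_expr} AS sales
              FROM genres g
              LEFT JOIN tracks t ON t.genreid = g.genreid
              LEFT JOIN invoice_items i ON i.trackid = t.trackid
             GROUP BY g.genreid
        """).fetchall()
        return dict(rows)
    finally:
        con.close()


if __name__ == "__main__":
    print(sales_per_genre("COUNT(*)"))          # unsold genres show 1
    print(sales_per_genre("COUNT(i.trackid)"))  # unsold genres show 0
```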
When asking for the top n one must always state how to deal with ties. If I am looking for the top 1, but there are three rows in the table, all with the same value, shall I select 3 rows? Zero rows? One row arbitrarily chosen? Most often we don't want arbitrary results, which excludes the last option. This excludes LIMIT, too, because LIMIT has no clause for ties in SQLite.
Here is an example with DENSE_RANK instead. You are looking for the worst-selling genres, so we should probably look at the revenue per genre, which is the sum of unit price times quantity sold. To include genres without invoices (and maybe even without tracks), we outer join this data to the genres table.
select total, genre_name
from
(
select
g.name as genre_name,
coalesce(sum(ii.unitprice * ii.quantity), 0) as total,
dense_rank() over (order by coalesce(sum(ii.unitprice * ii.quantity), 0)) as rnk
from genres g
left join tracks t on t.genreid = g.genreid
left join invoice_items ii on ii.trackid = t.trackid
group by g.name
) aggregated
where rnk <= 10
order by total, genre_name;
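To make the tie handling concrete, this sketch (Python's sqlite3 module, SQLite 3.25+ for window functions, invented rows and prices) keeps every genre whose DENSE_RANK is within n. It computes the totals in a CTE first and ranks them in a second step, which should be equivalent to the single-level form above while keeping each step separate. Two genres with zero revenue tie for the lowest rank, so asking for the bottom 1 returns both.

```python
import sqlite3


def worst_genres(n: int) -> list:
    """Genres whose revenue is among the n lowest distinct totals (ties kept)."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript("""
            CREATE TABLE genres (genreid INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE tracks (trackid INTEGER PRIMARY KEY, genreid INTEGER);
            CREATE TABLE invoice_items (trackid INTEGER, unitprice REAL,
                                        quantity INTEGER);
            INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz'),
                                      (3, 'Opera'), (4, 'Polka');
            INSERT INTO tracks VALUES (10, 1), (20, 2), (30, 3);  -- Polka: no tracks
            INSERT INTO invoice_items VALUES (10, 0.99, 2), (20, 0.99, 1);
        """)
        return con.execute("""
            WITH totals AS (
                SELECT g.name AS genre_name,
                       coalesce(sum(ii.unitprice * ii.quantity), 0) AS total
                  FROM genres g
                  LEFT JOIN tracks t ON t.genreid = g.genreid
                  LEFT JOIN invoice_items ii ON ii.trackid = t.trackid
                 GROUP BY g.genreid, g.name
            ), ranked AS (
                SELECT genre_name, total,
                       dense_rank() OVER (ORDER BY total) AS rnk
                  FROM totals
            )
            SELECT genre_name, total
              FROM ranked
             WHERE rnk <= ?
             ORDER BY total, genre_name
        """, (n,)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(worst_genres(1))  # both zero-revenue genres
    print(worst_genres(2))  # plus the next distinct total
```

Note that rnk <= n selects the n lowest distinct totals, which can be more than n rows; ORDER BY total LIMIT 1 would instead return one of the two tied genres arbitrarily.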

How to find the single highest value? How to show everything if there is a tie?

I want to output the movie(s) that have the most awards. The problem I'm having is how to show a single movie. I tried a PIVOT with the MAX() function instead of COUNT(), but then I only get an output of 1 for almost all of the rows. I would nevertheless like to use MAX() for this. I also want to know how to show all movies if there is a tie. In my data there isn't going to be a tie, but if there were one, I would like all of the tied movies to be shown.
Expected output:
MOVIE Awards Won
----------------------------------- ----------
Saving Private Ryan 6
1 rows selected
Output with my query:
MOVIE Awards Won
----------------------------------- ----------
A Lonely Place to Die 5
Act of Valor 0
Captain America: The First Avenger 2
Date Night 1
Drive Angry 0
Saving Private Ryan 6
Taken 1
7 rows selected
Here is my query:
SELECT * FROM
(
SELECT MovieTitle AS "MOVIE",
TBLAWARDRESULT.AWARDRESULTDESC AS "Result Type",
TBLAWARDRESULT.AWARDRESULTID AS "Rating"
FROM TBLMOVIE
INNER JOIN TBLAWARDDETAIL
ON TBLMOVIE.MOVIEID = TBLAWARDDETAIL.MOVIEID
INNER JOIN TBLAWARDRESULT
ON TBLAWARDDETAIL.AWARDRESULTID = TBLAWARDRESULT.AWARDRESULTID
ORDER BY Movietitle
)
PIVOT
(
COUNT("Rating") FOR "Result Type"
IN ('Won' AS "Awards Won")
)
ORDER BY Movie;
Use the RANK function to order the results by award count, descending; in case of ties it returns multiple rows as well.
SELECT MOVIE,Awards_Won
FROM (
SELECT
MovieTitle AS "MOVIE",
COUNT(TBLAWARDRESULT.AWARDRESULTID) AS Awards_Won,
RANK() OVER(ORDER BY COUNT(TBLAWARDRESULT.AWARDRESULTID) DESC) RNK
FROM TBLMOVIE
INNER JOIN TBLAWARDDETAIL ON TBLMOVIE.MOVIEID = TBLAWARDDETAIL.MOVIEID
INNER JOIN TBLAWARDRESULT ON TBLAWARDDETAIL.AWARDRESULTID = TBLAWARDRESULT.AWARDRESULTID
WHERE TBLAWARDRESULT.AWARDRESULTDESC = 'Won'
GROUP BY MovieTitle
) t
WHERE RNK = 1
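As a check on the tie behavior, here is a sketch that runs the same RANK() = 1 idea in SQLite through Python's sqlite3 module (SQLite 3.25+), with lowercase versions of the question's table names and invented movies. It counts wins in a CTE, then ranks the counts, so the aggregate and the window function are in separate steps.

```python
import sqlite3


def top_movies(details: list) -> list:
    """Movies with the most 'Won' results; details holds (movieid, awardresultid)."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript("""
            CREATE TABLE tblmovie (movieid INTEGER PRIMARY KEY, movietitle TEXT);
            CREATE TABLE tblawardresult (awardresultid INTEGER PRIMARY KEY,
                                         awardresultdesc TEXT);
            CREATE TABLE tblawarddetail (movieid INTEGER, awardresultid INTEGER);
            INSERT INTO tblmovie VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
            INSERT INTO tblawardresult VALUES (1, 'Won'), (2, 'Nominated');
        """)
        con.executemany("INSERT INTO tblawarddetail VALUES (?, ?)", details)
        return con.execute("""
            WITH wins AS (
                SELECT m.movietitle, COUNT(*) AS awards_won
                  FROM tblmovie m
                  JOIN tblawarddetail ad ON ad.movieid = m.movieid
                  JOIN tblawardresult ar ON ar.awardresultid = ad.awardresultid
                 WHERE ar.awardresultdesc = 'Won'
                 GROUP BY m.movietitle
            )
            SELECT movietitle, awards_won
              FROM (SELECT movietitle, awards_won,
                           RANK() OVER (ORDER BY awards_won DESC) AS rnk
                      FROM wins)
             WHERE rnk = 1
             ORDER BY movietitle
        """).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    tie = [(1, 1)] * 3 + [(2, 1)] * 3 + [(3, 1), (3, 2), (3, 2)]
    print(top_movies(tie))  # both movies with 3 wins
```

The window has no PARTITION BY on purpose: partitioning by title would put each movie alone in its own partition, and every movie would get rank 1.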
Don't use pivot. Use window functions:
SELECT "MOVIE", AWARDS_WON
FROM (SELECT m.MovieTitle AS "MOVIE", COUNT(*) as AWARDS_WON,
RANK() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM TBLMOVIE m INNER JOIN
TBLAWARDDETAIL ad
ON m.MOVIEID = ad.MOVIEID INNER JOIN
TBLAWARDRESULT ar
ON ad.AWARDRESULTID = ar.AWARDRESULTID
WHERE ar.AWARDRESULTDESC = 'Won'
GROUP BY m.MovieTitle
) m
WHERE seqnum = 1;
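If you're on Oracle 12c or later, the FETCH clause is a slightly simpler option than window functions. FETCH FIRST ROW ONLY returns a single row; FETCH FIRST 1 ROW WITH TIES also returns every movie tied for the highest count.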
SELECT m.MovieTitle MOVIE, COUNT(1) AS "Awards Won"
FROM
TBLMOVIE m
INNER JOIN
TBLAWARDDETAIL ad ON m.MovieID = ad.MovieID
INNER JOIN
TBLAWARDRESULT ar ON ad.AwardResultID = ar.AwardResultID
WHERE ar.AwardResultDesc = 'Won'
GROUP BY m.MovieTitle
ORDER BY "Awards Won" DESC
FETCH FIRST 1 ROW WITH TIES;

Sort teams by average vote for a given jury

I have the following schema :
teams(id, name)
jury(id, name)
criteria(id, name, coefficient, jury_id)
vote(id, team_id, jury_id, value, criterion_id)
I would like to get every team and order them by average vote for a given jury.
Here is my current SQL:
SELECT teams.*,
SUM(votes.value * criteria.coefficient) / SUM(criteria.coefficient) AS rating
FROM "teams"
LEFT JOIN "votes" ON "teams"."id" = "votes"."team_id"
LEFT JOIN "criteria" ON "votes"."criterion_id" = "criteria"."id"
WHERE (votes.jury_id = 3510 OR votes.jury_id IS NULL)
GROUP BY teams.id
ORDER BY rating DESC NULLS LAST, teams.id
This works well in the following cases:
The team has votes from the selected jury
The team has votes from the selected jury and from other juries (the votes from other juries are not taken into account)
The team has no votes at all (the team appears at the end of the list)
It DOES NOT work in the following case:
The team has votes from another jury but not from the selected jury (in this case, the team does not appear in the list)
How can I make this work?
I finally came up with the following SQL:
SELECT "teams".*
FROM (
SELECT teams.*, SUM(votes.value * criteria.coefficient) / SUM(criteria.coefficient) AS rating, teams.id
FROM "teams"
LEFT JOIN "votes" ON "teams"."id" = "votes"."team_id"
LEFT JOIN "criteria" ON "votes"."criterion_id" = "criteria"."id"
WHERE (votes.jury_id = 3613 OR votes.jury_id IS NULL)
GROUP BY teams.id
UNION
SELECT teams.*, NULL AS rating, teams.id
FROM "teams"
INNER JOIN "votes" ON "votes"."team_id" = "teams"."id"
INNER JOIN "criteria" ON "criteria"."id" = "votes"."criterion_id"
GROUP BY teams.id HAVING EVERY(votes.jury_id != 3613)
) AS teams
ORDER BY rating DESC NULLS LAST, teams.created_at
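However, I could not sort on teams.id because it says the column is ambiguous. I tried replacing 'AS teams' with another alias without success, so I used another column, created_at, instead. (The ambiguity comes from selecting both teams.* and teams.id in each branch of the UNION, which gives the derived table two columns named id; dropping the extra teams.id should make ORDER BY teams.id work.)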
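For comparison, a common fix for the original problem is to move the jury condition from WHERE into the ON clause of the LEFT JOIN, so a team whose votes all come from other juries keeps its row with a NULL rating. The sketch below tries both forms with Python's sqlite3 module and invented rows. It writes NULLS LAST as ORDER BY rating IS NULL (PostgreSQL accepts NULLS LAST directly) and multiplies by 1.0 because SQLite, like PostgreSQL with integer columns, would otherwise use integer division.

```python
import sqlite3

SETUP = """
CREATE TABLE teams    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE criteria (id INTEGER PRIMARY KEY, name TEXT,
                       coefficient INTEGER, jury_id INTEGER);
CREATE TABLE votes    (id INTEGER PRIMARY KEY, team_id INTEGER, jury_id INTEGER,
                       value INTEGER, criterion_id INTEGER);
INSERT INTO teams VALUES (1, 'T1'), (2, 'T2'), (3, 'T3'), (4, 'T4');
INSERT INTO criteria VALUES (1, 'c1', 2, 1), (2, 'c2', 1, 1), (3, 'c3', 1, 2);
INSERT INTO votes (team_id, jury_id, value, criterion_id) VALUES
    (1, 1, 4, 1), (1, 1, 1, 2),   -- T1: jury 1 only
    (2, 1, 5, 1), (2, 1, 5, 2),   -- T2: jury 1 ...
    (2, 2, 1, 3),                 -- ... and jury 2
    (3, 2, 5, 3);                 -- T3: jury 2 only; T4 has no votes
"""


def ratings(jury_id: int, filter_in_on: bool = True) -> list:
    """(team id, weighted rating) per team, best first, unrated teams last."""
    if filter_in_on:
        # Jury filter inside the LEFT JOIN: other juries' votes are not joined,
        # but the team row itself survives with NULL vote columns.
        join_filter, where = "AND v.jury_id = :jury", ""
    else:
        # The question's original form: the filter runs after the join and
        # drops teams whose only votes come from other juries.
        join_filter, where = "", "WHERE (v.jury_id = :jury OR v.jury_id IS NULL)"
    query = f"""
        SELECT t.id,
               -- * 1.0 avoids integer division
               SUM(v.value * c.coefficient) * 1.0 / SUM(c.coefficient) AS rating
          FROM teams t
          LEFT JOIN votes v ON v.team_id = t.id {join_filter}
          LEFT JOIN criteria c ON c.id = v.criterion_id
          {where}
         GROUP BY t.id
         ORDER BY rating IS NULL, rating DESC, t.id  -- like NULLS LAST
    """
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(query, {"jury": jury_id}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(ratings(1))                      # T3 kept with a NULL rating
    print(ratings(1, filter_in_on=False))  # T3 missing
```

Because the filter lives in the join condition, no UNION or EVERY() is needed, and the result can be ordered by teams.id directly.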

Find MAX with JOIN where Field also shows up in another Table

I have 3 tables: Master, Paper and iCodes. For a given set of Master.Ref values, I need to find Max(Paper.Date), where Paper.Code is also in the iCodes table (i.e., Paper.Code is a type of iCode). Master is joined to Paper by the File field.
EDIT:
I only need the Max(Paper.Date) and its corresponding Code; I do not need all of the Codes.
I wrote the following, but it is very slow. I have a few hundred Ref values to look up. What is a better way to do this?
SELECT Master.Ref,
Paper.Code,
mp.MaxDate
FROM ( SELECT p.File ,
MAX(p.Date) AS MaxDate
FROM Paper AS p
LEFT JOIN Master AS m ON p.File = m.File
WHERE m.Ref IN ('ref1', 'ref2', 'ref3', 'ref4', 'ref5', 'ref6'... )
AND p.Code IN ( SELECT DISTINCT i.iCode
FROM iCodes AS i
)
GROUP BY p.File
) AS mp
LEFT JOIN Master ON mp.File = Master.File
LEFT JOIN Paper ON Master.File = Paper.File
AND mp.MaxDate = Paper.Date
WHERE Paper.Code IN ( SELECT DISTINCT iCodes.iCode
FROM iCodes
)
Does this do what you want?
SELECT m.Ref, p.Code, max(p.date)
FROM Master m LEFT JOIN
Paper p
ON m.File = p.File
WHERE p.Code IN (SELECT DISTINCT iCodes.iCode FROM iCodes) and
m.Ref IN ('ref1','ref2','ref3','ref4','ref5','ref6'...)
GROUP BY m.Ref, p.Code;
EDIT:
To get the code on the max date, use window functions:
select ref, code, date
from (SELECT m.Ref, p.Code, p.date,
row_number() over (partition by m.Ref order by p.date desc) as seqnum
FROM Master m LEFT JOIN
Paper p
ON m.File = p.File
WHERE p.Code IN (SELECT DISTINCT iCodes.iCode FROM iCodes) and
m.Ref IN ('ref1','ref2','ref3','ref4','ref5','ref6'...)
) mp
where seqnum = 1;
The function row_number() assigns sequential numbers, starting at 1, to a group of rows. The groups are defined by the partition by clause, so in this case all rows with the same m.Ref value form a single group. Within each group, rows are numbered according to the order by clause, so the row with the most recent date gets 1. That is the row you want. (If two papers share the latest date, row_number() picks one of them arbitrarily.)
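The difference between the two versions of the answer can be checked with a sketch on invented data (Python's sqlite3 module, SQLite 3.25+). It uses an inner join because the WHERE condition on p.Code already discards any row that a LEFT JOIN would have NULL-extended. Code X is deliberately the most recent paper but is not an iCode.

```python
import sqlite3

SETUP = """
CREATE TABLE master (ref TEXT, file TEXT);
CREATE TABLE paper  (file TEXT, code TEXT, date TEXT);
CREATE TABLE icodes (icode TEXT);
INSERT INTO master VALUES ('ref1', 'F1'), ('ref2', 'F2');
INSERT INTO paper VALUES ('F1', 'A', '2020-01-01'),
                         ('F1', 'B', '2021-06-01'),
                         ('F1', 'X', '2022-01-01'),  -- X is not an iCode
                         ('F2', 'A', '2019-05-05');
INSERT INTO icodes VALUES ('A'), ('B');
"""

# Shared FROM/WHERE part of both queries.
FROM_AND_WHERE = """
    FROM master m
    JOIN paper p ON m.file = p.file
   WHERE p.code IN (SELECT icode FROM icodes)
     AND m.ref IN ('ref1', 'ref2')
"""


def latest_per_ref() -> tuple:
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        # GROUP BY ref, code: one row per (ref, code) pair.
        grouped = con.execute(
            "SELECT m.ref, p.code, max(p.date)" + FROM_AND_WHERE
            + "GROUP BY m.ref, p.code ORDER BY m.ref, p.code").fetchall()
        # ROW_NUMBER(): one row per ref, the one with the latest date.
        latest = con.execute(
            "SELECT ref, code, date FROM (SELECT m.ref, p.code, p.date,"
            " row_number() OVER (PARTITION BY m.ref ORDER BY p.date DESC) AS seqnum"
            + FROM_AND_WHERE + ") AS mp WHERE seqnum = 1 ORDER BY ref").fetchall()
        return grouped, latest
    finally:
        con.close()


if __name__ == "__main__":
    grouped, latest = latest_per_ref()
    print(grouped)
    print(latest)
```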

Help in a Join query

SELECT game_ratingstblx245v.game_id,avg( game_ratingstblx245v.rating )
as avg_rating,
count(DISTINCT game_ratingstblx245v.userid)
as count,
game_data.name,
game_data.id ,
avg(game_ratings.critic_rating),count(DISTINCT game_ratings.critic)
as cr_count
FROM game_data
LEFT JOIN game_ratingstblx245v ON game_ratingstblx245v.game_id = game_data.id
LEFT JOIN game_ratings ON game_ratings.game_id = game_data.id
WHERE game_data.release_date < NOW()
GROUP BY game_ratingstblx245v.game_id
ORDER BY game_data.release_date DESC,
game_data.name
I am currently using this query to extract values from 3 tables:
game_data: id, name, release_date (game info; id is referenced by the other two tables)
game_ratings: game_id (foreign key), critic, rating (critic ratings)
game_ratingstblx245v: game_id (foreign key), rating, userid (user ratings)
What I want this query to do is select all ids from game_data ordered by release_date descending, then get the average ratings from game_ratings and game_ratingstblx245v for each id (if a game has not been rated, the fields from those two tables should be NULL). The problem is that the result is not what I expected: some games that have not been rated show up while others do not. Can you check my query and tell me where I went wrong? Thanks.
You shouldn't use the game_ratingstblx245v.game_id column in your GROUP BY, since it is NULL when a game has no user ratings. All unrated games then fall into a single NULL group, so only one of them shows up. Use game_data.id instead.
Here's how I would write the query:
SELECT g.id, g.name,
AVG( x.rating ) AS avg_user_rating,
COUNT( DISTINCT x.userid ) AS user_count,
AVG( r.critic_rating ) AS avg_critic_rating,
COUNT( DISTINCT r.critic ) AS critic_count
FROM game_data g
LEFT JOIN game_ratingstblx245v x ON (x.game_id = g.id)
LEFT JOIN game_ratings r ON (r.game_id = g.id)
WHERE g.release_date < NOW()
GROUP BY g.id
ORDER BY g.release_date DESC, g.name;
Note that although this query produces a Cartesian product between the x and r rows of each game, that doesn't affect the average ratings or the COUNT(DISTINCT ...) results. Just be aware that if you were using SUM() or a plain COUNT(), the results could be inflated by an unintended Cartesian product.
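Both points can be checked with a sketch in Python's sqlite3 module on invented data (the original is MySQL, but NULL grouping and join multiplication behave the same way here). Game 1 has 2 user ratings and 3 critic ratings; games 2 and 3 are unrated.

```python
import sqlite3

SETUP = """
CREATE TABLE game_data (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE game_ratingstblx245v (game_id INTEGER, rating INTEGER, userid INTEGER);
CREATE TABLE game_ratings (game_id INTEGER, critic TEXT, critic_rating INTEGER);
INSERT INTO game_data VALUES (1, 'G1'), (2, 'G2'), (3, 'G3');  -- G2, G3 unrated
INSERT INTO game_ratingstblx245v VALUES (1, 4, 100), (1, 2, 101);
INSERT INTO game_ratings VALUES (1, 'c1', 80), (1, 'c2', 90), (1, 'c3', 100);
"""

JOINS = """
  FROM game_data g
  LEFT JOIN game_ratingstblx245v x ON x.game_id = g.id
  LEFT JOIN game_ratings r ON r.game_id = g.id
"""


def demo() -> dict:
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        # Grouping by the nullable x.game_id lumps all unrated games together.
        by_x = con.execute("SELECT count(*) FROM (SELECT x.game_id" + JOINS
                           + "GROUP BY x.game_id)").fetchone()[0]
        by_g = con.execute("SELECT count(*) FROM (SELECT g.id" + JOINS
                           + "GROUP BY g.id)").fetchone()[0]
        # Game 1 yields 2 x 3 = 6 joined rows: AVG is unchanged, SUM/COUNT are not.
        game1 = con.execute(
            "SELECT AVG(x.rating), SUM(x.rating), COUNT(x.rating),"
            " COUNT(DISTINCT x.userid), AVG(r.critic_rating)" + JOINS
            + "WHERE g.id = 1").fetchone()
        return {"groups_by_x_game_id": by_x, "groups_by_g_id": by_g, "game1": game1}
    finally:
        con.close()


if __name__ == "__main__":
    print(demo())
```

For game 1 the true user-rating sum is 6 and the true count is 2; the joined rows report 18 and 6, inflated by the 3 critic rows, while both averages (3.0 and 90.0) come out right.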