SQL Query not getting the corresponding column

SELECT t.title, Max(t.st)
FROM (SELECT title,
Avg(stars) AS st
FROM movie
JOIN rating USING(mid)
GROUP BY title)t;
This is the query I wrote to get the maximum of AVG(stars) and its corresponding title. The maximum value is correct, but the title is not: I am not getting the title that belongs to it.
Output of subquery is:
Avatar 4.0
E.T. 2.5
Gone with the Wind 3.0
Raiders of the Lost Ark 3.33333333333
Snow White 4.5
The Sound of Music 2.5
Output of the whole query is wrong:
The Sound of Music 4.5
Expected/correct output is:
Snow White 4.5

Try this
select t.title,
t.st
from
(select title,
avg(stars) as st,rank() over(order by avg(stars) desc) as rSt
from movie
join rating using(mID)
group by title)t where t.rSt=1 ;
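The inner query ranks the movies by average rating in descending order; the outer query's WHERE condition then keeps only the top-ranked movie. If several movies tie for the highest average, rank() gives all of them rank 1, so all of them are returned. Hope this helps :-)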
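A minimal sketch of the ranking approach, run against an in-memory SQLite database from Python so that it is self-contained. The individual star values are invented so that the averages match the subquery output in the question, and the ranking is applied in a separate nesting level over the already computed averages. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Illustrative data only: the individual star values are made up so that the
# per-title averages match the subquery output quoted in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (mID INTEGER, stars INTEGER);
    INSERT INTO movie VALUES
        (1, 'Avatar'), (2, 'E.T.'), (3, 'Gone with the Wind'),
        (4, 'Raiders of the Lost Ark'), (5, 'Snow White'),
        (6, 'The Sound of Music');
    INSERT INTO rating VALUES
        (1, 3), (1, 5), (2, 2), (2, 3), (3, 2), (3, 4),
        (4, 4), (4, 2), (4, 4), (5, 4), (5, 5), (6, 2), (6, 3);
""")

# Innermost query: average per title. Middle query: rank those averages.
# Outer query: keep only the rows ranked first.
top_rated = conn.execute("""
    SELECT title, st
    FROM (SELECT title, st, RANK() OVER (ORDER BY st DESC) AS rSt
          FROM (SELECT title, AVG(stars) AS st
                FROM movie JOIN rating USING (mID)
                GROUP BY title) AS averages) AS ranked
    WHERE rSt = 1
""").fetchall()

print(top_rated)  # [('Snow White', 4.5)]
```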

You can also filter on the maximum with a subquery:
select *
from
(select title,
avg(stars) as st
from movie a
join rating b on a.mID=b.mID
group by title
)t where st =
(select max(st) from (select title,
avg(stars) as st
from movie a
join rating b on a.mID=b.mID
group by title
)t1)
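The per-title averages are written out twice in the subquery approach. As a sketch of one way to avoid the repetition, a common table expression (here given the hypothetical name averages) can be defined once and referenced both in the main query and in the MAX subquery. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (mID INTEGER, stars INTEGER);
    -- Made-up sample rows, for illustration only.
    INSERT INTO movie VALUES (1, 'Avatar'), (2, 'Snow White'), (3, 'E.T.');
    INSERT INTO rating VALUES (1, 3), (1, 5), (2, 4), (2, 5), (3, 2), (3, 3);
""")

# The CTE computes the averages once by name; the WHERE clause keeps the
# rows whose average equals the largest average.
best = conn.execute("""
    WITH averages AS (
        SELECT title, AVG(stars) AS st
        FROM movie JOIN rating USING (mID)
        GROUP BY title
    )
    SELECT title, st
    FROM averages
    WHERE st = (SELECT MAX(st) FROM averages)
""").fetchall()

print(best)  # [('Snow White', 4.5)]
```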

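You can use a WHERE condition in your query, either with a join or with a subquery. For a single table that already holds one stars value per title (your_table is a placeholder name):
Select title, stars
from your_table
where stars = (select max(stars) from your_table)
or, in SQL Server syntax:
SELECT top 1 title
FROM your_table
ORDER BY stars DESC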

Related

Get Average in SQL Through Join

I'm just playing around with SQL and I'm trying to do the following.
I have 2 tables and here is their structure:
Movies_metadata Movies
ratings table:
Ratings
As there are many ratings for one movie, what I'd like to do is get the avg rating per movie and have it display next to the title which is only available in the Metadata table.
This is as far as I got but obviously the issue with my SELECT statement is that it'll return the average of all movies and display it for each record:
SELECT
(SELECT
AVG(rating)
FROM
`movies-dataset.movies_data.ratings`) AS rating_avg,
metadata.title,
metadata.budget,
metadata.revenue,
metadata.genres,
metadata.original_language,
metadata.release_date
FROM
`movies-dataset.movies_data.Movies_metadata` AS metadata
INNER JOIN `movies-dataset.movies_data.ratings` AS ratings
ON metadata.id = ratings.movieId
LIMIT 10
Here is an example of the result:
Result
I'm thinking I can potentially use a GROUP BY but when I try, I get an error
Appreciate the help!
The following should work:
SELECT movies_metadata.title, AVG(ratings.rating)
FROM movies_metadata
LEFT JOIN ratings ON movies_metadata.id = ratings.movieID
GROUP BY movies_metadata.title
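If titles are not unique, group by movies_metadata.id as well as movies_metadata.title.
LIMIT and GROUP BY are clauses rather than functions, and they can be combined; a GROUP BY error more often means that a selected column is neither grouped nor aggregated. One way around it is to compute the average rating in a subquery that is part of the inner join, like this: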
SELECT
ratings.averagerating,
metadata.title,
metadata.budget,
metadata.revenue,
metadata.genres,
metadata.original_language,
metadata.release_date
FROM `movies-dataset.movies_data.Movies_metadata` AS metadata
INNER JOIN (SELECT movieId, AVG(rating) averagerating FROM `movies-dataset.movies_data.ratings` GROUP by movieId) AS ratings
ON metadata.id = ratings.movieId
ORDER BY ratings.averagerating
LIMIT 5
Maybe try something like:
Select m.movie_id, (r.rate_sum / r.num_rate) as avg_rating
From your_movies_table m
Left Join (select movie_id, sum(rating) as rate_sum, count(rating) as num_rate
From your_ratings_table
Group by movie_id) r
On m.movie_id = r.movie_id
I'm using a left join because I'm not sure if all movies have been rated at least once.
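To illustrate the point about unrated movies, here is a small sketch comparing LEFT JOIN and INNER JOIN against a per-movie average computed in a subquery. The table names, column names, and rows are simplified stand-ins rather than the asker's real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, made-up stand-ins for the metadata and ratings tables.
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ratings (movieId INTEGER, rating REAL);
    INSERT INTO movies VALUES (1, 'Rated Twice'), (2, 'Never Rated');
    INSERT INTO ratings VALUES (1, 3.0), (1, 4.0);
""")

# The same query is run twice, once per join type.
query = """
    SELECT m.title, r.avg_rating
    FROM movies m
    {join} JOIN (SELECT movieId, AVG(rating) AS avg_rating
                 FROM ratings
                 GROUP BY movieId) r ON m.id = r.movieId
    ORDER BY m.id
"""
left_rows = conn.execute(query.format(join="LEFT")).fetchall()
inner_rows = conn.execute(query.format(join="INNER")).fetchall()

print(left_rows)   # the unrated movie is kept, with a NULL (None) average
print(inner_rows)  # the unrated movie is dropped
```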

Getting single row from JOIN given an additional condition

I'm making a select in which I give a year (hardcoded as 1981 below) and I expect to get one row per qualifying band. The main problem is to get the oldest living member for each band:
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
(SELECT name FROM MUSICIAN WHERE year_death IS NULL ORDER BY(birth)LIMIT 1)
FROM BAND b
LEFT JOIN ALBUM a ON(b.id_band = a.id_band)
LEFT JOIN SONG s ON(a.id_album = s.id_album)
JOIN MEMBER m ON(b.id_band= m.id_band)
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
/*LEFT JOIN(SELECT name FROM MUSICIAN WHERE year_death IS NULL
ORDER BY(birth) LIMIT 1) AS alive FROM mu*/ -- ??
WHERE b.year_formed = 1981
GROUP BY b.id_band;
I would like to obtain the oldest living member from mu for each band. But I just get the oldest musician overall from the relation MUSICIAN.
Here is screenshot showing output for my current query:
Well, I think you can follow the structure that you have, but you need JOINs in the subquery.
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT mem.id_musician),
(SELECT m.name
FROM MUSICIAN m JOIN
MEMBER mem
ON mem.id_musician = m.id_musician
WHERE m.year_death IS NULL AND mem.id_band = b.id_band
ORDER BY m.birth
LIMIT 1
) as oldest_member
FROM BAND b LEFT JOIN
ALBUM a
ON b.id_band = a.id_band LEFT JOIN
SONG s
ON a.id_album = s.id_album LEFT JOIN
MEMBER mem
ON mem.id_band = b.id_band
WHERE b.year_formed = 1981
GROUP BY b.id_band
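As a sketch of why the correlation matters, the snippet below runs a cut-down version of the query in SQLite. The musicians, birth years, and band memberships are invented, and birth is assumed to be a plain year. Without the mem.id_band = b.id_band condition the subquery returns the same musician for every band.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE band (id_band INTEGER PRIMARY KEY, year_formed INTEGER);
    CREATE TABLE musician (id_musician INTEGER PRIMARY KEY, name TEXT,
                           birth INTEGER, year_death INTEGER);
    CREATE TABLE member (id_band INTEGER, id_musician INTEGER);
    INSERT INTO band VALUES (1, 1981), (2, 1981);
    INSERT INTO musician VALUES
        (1, 'Ann', 1950, 2001),   -- oldest in band 1, but no longer living
        (2, 'Bob', 1955, NULL),
        (3, 'Cy', 1960, NULL),
        (4, 'Dee', 1940, NULL);
    INSERT INTO member VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# Uncorrelated: the oldest living musician overall, regardless of band.
overall = conn.execute("""
    SELECT name FROM musician
    WHERE year_death IS NULL
    ORDER BY birth LIMIT 1
""").fetchone()

# Correlated: the subquery is re-evaluated for each band row.
per_band = conn.execute("""
    SELECT b.id_band,
           (SELECT mu.name
            FROM musician mu
            JOIN member mem ON mem.id_musician = mu.id_musician
            WHERE mu.year_death IS NULL
              AND mem.id_band = b.id_band      -- correlation with the outer row
            ORDER BY mu.birth
            LIMIT 1) AS oldest_member
    FROM band b
    WHERE b.year_formed = 1981
    ORDER BY b.id_band
""").fetchall()

print(overall)   # ('Dee',)
print(per_band)  # [(1, 'Bob'), (2, 'Dee')]
```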
The following query gives the oldest living member of each band. Add a filter on year_formed = 1981 if you need it.
SELECT
t.id_band,
t.total_albums,
t.total_songs,
t.total_musicians,
o.name AS oldest_member
FROM
(
SELECT b.id_band,
COUNT(DISTINCT a.id_album) as total_albums,
COUNT(DISTINCT s.id_song) as total_songs,
COUNT(DISTINCT m.id_musician) as total_musicians
FROM BAND b
LEFT JOIN ALBUM a ON(b.id_band = a.id_band)
LEFT JOIN SONG s ON(a.id_album = s.id_album)
JOIN MEMBER m ON(b.id_band= m.id_band)
GROUP BY b.id_band
) t
LEFT JOIN
(
SELECT m.id_band, mu.name,
row_number() over (partition by m.id_band order by mu.birth) as rnk
FROM MEMBER m
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
WHERE mu.year_death is NULL
) o ON o.id_band = t.id_band AND o.rnk = 1
You can reference a table that is out of this nested select, like so
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
(SELECT name FROM MUSICIAN WHERE year_death IS NULL AND
id_musician IN (SELECT id_musician FROM MEMBER WHERE id_band = b.id_band)
ORDER BY birth LIMIT 1)
FROM BAND b
LEFT JOIN ALBUM a ON(b.id_band = a.id_band)
LEFT JOIN SONG s ON(a.id_album = s.id_album)
JOIN MEMBER m ON(b.id_band= m.id_band)
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
/*LEFT JOIN(SELECT name FROM MUSICIAN WHERE year_death IS NULL ORDER
BY(birth)LIMIT 1) AS alive FROM mu*/
WHERE b.year_formed= 1981
GROUP BY b.id_band
For queries where you want to find the "max person by age" you can use ROW_NUMBER() grouped by the band
SELECT b.id_band,
COUNT(DISTINCT a.id_album),
COUNT(DISTINCT s.id_song),
COUNT(DISTINCT m.id_musician),
oldest_living_members.*
FROM
band b
LEFT JOIN album a ON(b.id_band = a.id_band)
LEFT JOIN song s ON(a.id_album = s.id_album)
LEFT JOIN
(
SELECT
m.id_band,
mu.*,
ROW_NUMBER() OVER(PARTITION BY m.id_band ORDER BY mu.birthdate ASC) rown
FROM
MEMBER m
JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)
WHERE year_death IS NULL
) oldest_living_members
ON
b.id_band = oldest_living_members.id_band AND
oldest_living_members.rown = 1
WHERE b.year_formed= 1981
GROUP BY b.id_band
If you run just the subquery you'll see how it works: musicians are joined to member to get the band id, and the band id forms the partition. ROW_NUMBER() numbers the rows from 1 in order of birthdate (I didn't know what your column name for birthday was; you'll have to edit it), so the oldest person (earliest birthday) gets a 1. Every time the band id changes, the numbering restarts from 1 with the oldest person in that band. Then, when we join it, we just pick the 1s.
I think this should be considerably faster (while also solving your problem):
SELECT b.id_band, a.*, m.*
FROM band b
LEFT JOIN LATERAL (
SELECT count(*) AS ct_albums, sum(ct_songs) AS ct_songs
FROM (
SELECT id_album, count(s.id_album) AS ct_songs -- counts 0 for albums without songs
FROM album a
LEFT JOIN song s USING (id_album)
WHERE a.id_band = b.id_band
GROUP BY 1
) ab
) a ON true
LEFT JOIN LATERAL (
SELECT count(*) OVER () AS ct_musicians
, name AS senior_member -- any other columns you need?
FROM member m
JOIN musician mu USING (id_musician)
WHERE m.id_band = b.id_band
ORDER BY year_death IS NOT NULL -- sorts the living first
, birth
, name -- as tiebreaker (my optional addition)
LIMIT 1
) m ON true
WHERE b.year_formed = 1981;
Getting the senior band member is solved in the LATERAL subquery m - without multiplying the cost for the base query. It works because the window function count(*) OVER () is computed before ORDER BY and LIMIT are applied. Since bands naturally only have few members, this should be the fastest possible way. See:
Best way to get result count before LIMIT was applied
What is the difference between LATERAL and a subquery in PostgreSQL?
Prevent duplicate values in LEFT JOIN
The other optimization for counting albums and songs builds on the assumption that the same id_song is never included in multiple albums of the same band. Else, those are counted multiple times. (Easily fixed, and uncorrelated to the task of getting the senior band member.)
The point is to eliminate the need for DISTINCT at the top level after multiplying rows at the N-side repeatedly (I like to call that "proxy cross join"). That would produce a possibly huge number of rows in the derived table without need.
Plus, it's much more convenient to retrieve additional columns (such as more columns for the senior band member) than with some other query styles.
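The row multiplication described above can be made visible with a small sketch. The data is invented: one band with 2 albums of 2 songs each and 3 members. Joining both "many" sides to the band produces every combination, which is why the top-level query needs DISTINCT inside each count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id_album INTEGER PRIMARY KEY, id_band INTEGER);
    CREATE TABLE song (id_song INTEGER PRIMARY KEY, id_album INTEGER);
    CREATE TABLE member (id_band INTEGER, id_musician INTEGER);
    -- Made-up data: 2 albums, 4 songs, 3 members, all for band 1.
    INSERT INTO album VALUES (1, 1), (2, 1);
    INSERT INTO song VALUES (1, 1), (2, 1), (3, 2), (4, 2);
    INSERT INTO member VALUES (1, 1), (1, 2), (1, 3);
""")

# COUNT(*) shows how many rows the joins really produce before DISTINCT
# collapses them again.
joined_rows, albums, songs, musicians = conn.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT a.id_album),
           COUNT(DISTINCT s.id_song),
           COUNT(DISTINCT m.id_musician)
    FROM album a
    LEFT JOIN song s ON s.id_album = a.id_album
    JOIN member m ON m.id_band = a.id_band
    WHERE a.id_band = 1
""").fetchone()

print(joined_rows, albums, songs, musicians)  # 12 2 4 3
```

Four album/song rows times three members gives twelve intermediate rows for a band with only nine underlying records; with larger bands and catalogues the gap grows multiplicatively.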

What makes this difference when using group by or join?

Can anybody explain the difference between these queries? The 1st and 3rd queries execute successfully and produce the same output, while the 2nd and 4th cannot be executed; both raise this error:
column "movie.title" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT title, (MAX(stars)-MIN(stars)) AS ratingspread.
My questions are:
Why does grouping by m.mid behave differently from grouping by r.mid (query 1 vs query 2)?
Why is A inner join B not equivalent to B inner join A (query 3 vs query 4)?
SELECT title, (MAX(stars)-MIN(stars)) AS ratingspread
FROM rating r JOIN movie m
ON r.mid = m.mid
GROUP BY m.mid
ORDER BY ratingspread DESC, title;
SELECT title, (MAX(stars)-MIN(stars)) AS ratingspread
FROM rating r JOIN movie m
ON r.mid = m.mid
GROUP BY r.mid
ORDER BY ratingspread DESC, title;
SELECT title, (MAX(stars) - MIN(stars)) AS ratingspread
FROM movie
INNER JOIN rating USING(mId)
GROUP BY mId
ORDER BY ratingspread DESC, title;
SELECT title, (MAX(stars)-MIN(stars)) AS ratingspread
FROM rating
INNER JOIN movie USING(mId)
GROUP BY mId
ORDER BY ratingspread DESC, title
FYI the schema goes like this:
Movie ( mID, title, year, director )There is a movie with ID number mID, a title, a release year, and a director.
Reviewer ( rID, name ) The reviewer with ID number rID has a certain name.
Rating ( rID, mID, stars, ratingDate ) The reviewer rID gave the movie mID a number of stars rating (1-5) on a certain ratingDate.
Your queries are not producing what you think they are producing for a number of reasons.
Here is what you are really looking for I think:
SELECT
movie.mid,
movie.title,
(MAX(rating.stars)-MIN(rating.stars)) AS ratingspread
FROM
movie
INNER JOIN rating on movie.mid = rating.mid
GROUP BY
movie.mid,
title
ORDER BY
(MAX(stars)-MIN(stars)) DESC,
title
There are a few things to point out: Firstly, you need to join on the matching column - in some of your queries you are joining mid to rid - these are unrelated fields. The movie ID is what joins the rating to the movie. Secondly, you are not getting the GROUP BY concept. What you are trying to do is get the spread of ratings for a given movie and display its title, so to display its title (and any other non-summarised data), you have to include the field in the group by. For further illustration, imagine you wanted to get the spread of all reviews by each reviewer, to see if they had any bias towards going hard or going soft on the movies they were reviewing. Here is how you would get the spread of reviews for each reviewer:
SELECT
reviewer.rid,
reviewer.name,
(MAX(rating.stars)-MIN(rating.stars)) AS ratingspread
FROM
reviewer
INNER JOIN rating on reviewer.rid = rating.rid
GROUP BY
reviewer.rid,
reviewer.name
ORDER BY
(MAX(stars)-MIN(stars)) DESC,
reviewer.name
By the way, the reason you want to include the ID as well as the title or reviewer name is to ensure you eliminate problems where two movies share the same title, or two reviewers have the same name.
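A short sketch of the duplicate-title problem, using invented data in which two different movies are both called 'Titanic'. Grouping by title alone merges their ratings into one spread; grouping by mID and title keeps them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (rID INTEGER, mID INTEGER, stars INTEGER);
    -- Two different movies share a title (made-up data).
    INSERT INTO movie VALUES (1, 'Titanic'), (2, 'Titanic');
    INSERT INTO rating VALUES (10, 1, 2), (11, 1, 3), (10, 2, 5), (11, 2, 5);
""")

# Grouping by title only: both movies collapse into a single group.
by_title = conn.execute("""
    SELECT title, MAX(stars) - MIN(stars) AS ratingspread
    FROM movie JOIN rating USING (mID)
    GROUP BY title
""").fetchall()

# Grouping by id and title: one group per actual movie.
by_id_and_title = conn.execute("""
    SELECT mID, title, MAX(stars) - MIN(stars) AS ratingspread
    FROM movie JOIN rating USING (mID)
    GROUP BY mID, title
    ORDER BY mID
""").fetchall()

print(by_title)         # [('Titanic', 3)]
print(by_id_and_title)  # [(1, 'Titanic', 1), (2, 'Titanic', 0)]
```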

SQL top 10 results in nested query

hope someone can help me produce a nested query. Does not have to be efficient, just simple to follow.
I am looking to "Find the top 10 albums that have the greatest number of played tracks"
I have the following tables:
Album which has GRid and Title
AlbumTrack which has GRid and ISRC
Track which has ISRC and PlayCount
I've currently got:
SELECT TOP 10 Album.Title
FROM Album
WHERE Grid IN
(SELECT AlbumTrack.GRid
FROM AlbumTrack
WHERE ISRC IN
(SELECT Track,ISRC
FROM TRACK
WHERE Track.ISRC = SUM(Track.PlayCount)
ORDER BY Track.PlayCount));
Any thoughts?
Using a join, and reading "greatest number of played tracks" as the highest total play count per album:
SELECT Title
FROM (
SELECT Album.Title,
DENSE_RANK() OVER (ORDER BY SUM(Track.PlayCount) DESC) AS RN
FROM Album
INNER JOIN AlbumTrack
ON Album.GRid = AlbumTrack.GRid
INNER JOIN Track
ON AlbumTrack.ISRC = Track.ISRC
GROUP BY Album.GRid, Album.Title
) AS T
WHERE RN<=10
Try this query:
SELECT TOP 10 Album.Title FROM Album
JOIN AlbumTrack ON Album.GRid = AlbumTrack.GRid
JOIN (SELECT SUM(PlayCount) AS NumberOfPlay, ISRC FROM Track
GROUP BY ISRC
) AS Track ON AlbumTrack.ISRC = Track.ISRC
GROUP BY Album.GRid, Album.Title
ORDER BY SUM(Track.NumberOfPlay) DESC
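A minimal sketch of a "top N albums by total play count" query, run in SQLite with invented rows. It assumes that the goal is the highest summed PlayCount per album. SQLite has no TOP, so LIMIT is used; in SQL Server the equivalent is SELECT TOP n with the same ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Album (GRid TEXT PRIMARY KEY, Title TEXT);
    CREATE TABLE Track (ISRC TEXT PRIMARY KEY, PlayCount INTEGER);
    CREATE TABLE AlbumTrack (GRid TEXT, ISRC TEXT);
    -- Made-up sample rows, for illustration only.
    INSERT INTO Album VALUES ('A1', 'First'), ('A2', 'Second'), ('A3', 'Third');
    INSERT INTO Track VALUES ('T1', 10), ('T2', 5), ('T3', 40), ('T4', 1);
    INSERT INTO AlbumTrack VALUES
        ('A1', 'T1'), ('A1', 'T2'), ('A2', 'T3'), ('A3', 'T4');
""")

# Sum the play counts per album, sort highest first, keep the top two.
top_albums = conn.execute("""
    SELECT Album.Title, SUM(Track.PlayCount) AS TotalPlays
    FROM Album
    JOIN AlbumTrack ON Album.GRid = AlbumTrack.GRid
    JOIN Track ON AlbumTrack.ISRC = Track.ISRC
    GROUP BY Album.GRid, Album.Title
    ORDER BY TotalPlays DESC
    LIMIT 2   -- SQLite spelling; SQL Server would use SELECT TOP 2
""").fetchall()

print(top_albums)  # [('Second', 40), ('First', 15)]
```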

SQL Server : multiple transactions

select
STUDIO.NAME, MOVIE.TITLE, MOVIE.YEAR
from
STUDIO
join
MOVIE on STUDIO.NAME = MOVIE.STUDIONAME
where
MOVIE.YEAR >= ALL (select MOVIE.YEAR from MOVIE)
I have this code, which gives me the year of the latest film, its title, and the name of the studio that made it.
How can I rewrite it to get the latest movie produced by each studio, not just the single latest movie overall?
Try:
SELECT a.NAME,
a.TITLE,
a.YEAR
FROM (select s.NAME,
m.TITLE,
m.YEAR,
ROW_NUMBER()OVER(PARTITION BY s.NAME ORDER BY m.YEAR DESC ) as rnk
from STUDIO s
join MOVIE m on s.NAME = m.STUDIONAME) a
WHERE a.rnk = 1
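Alternatively, with a correlated subquery (an ORDER BY inside a derived table followed by GROUP BY on the outside does not work in SQL Server):
SELECT STUDIO.NAME, MOVIE.TITLE, MOVIE.YEAR
from
STUDIO
join
MOVIE on STUDIO.NAME = MOVIE.STUDIONAME
WHERE MOVIE.YEAR = (select MAX(m2.YEAR)
from MOVIE m2
where m2.STUDIONAME = STUDIO.NAME)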
You might try something like
select
studio.name,
movie.title,
movie.year
from
studio
inner join movie on
studio.name = movie.studioname
inner join
(
select
movie.studioname,
max(movie.year) year
from
movie
group by
movie.studioname
) t1 on
t1.studioname = studio.name and
t1.year = movie.year
Note that there are limitations to the way you have written your query:
There could be multiple movies in the same year, so which one is the latest? You would need a date column for that, or something to that effect.
Also, you are joining on varchar/char columns, which is bad practice because every match requires a string comparison. It would be better to join on integer ids, which perform better.
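The first limitation, several movies in a studio's latest year, can be sketched as follows with invented rows. RANK() returns every movie tied for the latest year, while ROW_NUMBER() returns exactly one per studio, picked arbitrarily among the ties unless the ORDER BY gets a tiebreaker. The snippet uses SQLite (3.25 or later for window functions); both functions also exist in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (title TEXT, year INTEGER, studioname TEXT);
    -- Made-up rows: Acme released two movies in its latest year.
    INSERT INTO movie VALUES
        ('Old One', 1999, 'Acme'),
        ('New A', 2005, 'Acme'),
        ('New B', 2005, 'Acme'),
        ('Solo', 2001, 'Zenith');
""")

# The same query with the ranking function swapped in.
template = """
    SELECT studioname, title, year
    FROM (SELECT studioname, title, year,
                 {fn}() OVER (PARTITION BY studioname
                              ORDER BY year DESC) AS rnk
          FROM movie) AS ranked
    WHERE rnk = 1
    ORDER BY studioname, title
"""
with_rank = conn.execute(template.format(fn="RANK")).fetchall()
with_row_number = conn.execute(template.format(fn="ROW_NUMBER")).fetchall()

print(with_rank)             # both 2005 Acme movies, plus Zenith's movie
print(len(with_row_number))  # 2: one row per studio
```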
The answer above is fine for finding the most recent year a studio published a movie, however if you need the title too, using that method you'll need to use some form of subselect to get the largest year (and so the most recent) by studio name, and then bring in the title based on this.
The below does both in a fairly succinct fashion using a CROSS APPLY:
SELECT
S.NAME,
MM.TITLE,
MM.YEAR
FROM STUDIO S
CROSS APPLY
(SELECT TOP 1 TITLE, YEAR FROM MOVIE M
WHERE M.STUDIONAME = S.NAME
ORDER BY M.YEAR DESC) MM
What do you mean "not only by one"?
Normally you do such queries using aggregate functions:
select STUDIO.NAME, MOVIE.TITLE, MAX( MOVIE.YEAR )
from STUDIO
join MOVIE on STUDIO.NAME = MOVIE.STUDIONAME
Group by STUDIO.NAME, MOVIE.TITLE
Is this what you need?