What makes this difference when using group by or join? - sql

Can anybody explain the difference between these queries? The 1st and 3rd queries execute successfully and produce the same output, while the 2nd and 4th cannot be executed and both raise this error:
column "movie.title" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT title, (MAX(stars)-MIN(stars)) AS ratingspread.
My questions are:
Why does grouping by m.mid behave differently from grouping by r.mid (query 1 vs query 2)?
Why is A inner join B not equivalent to B inner join A (query 3 vs query 4)?
SELECT title, (MAX(stars)-MIN(stars)) AS ratingspread
FROM rating r JOIN movie m
ON r.mid = m.mid
GROUP BY m.mid
ORDER BY ratingspread DESC, title;
SELECT title, (MAX(stars)-MIN(stars)) AS ratingspread
FROM rating r JOIN movie m
ON r.mid = m.mid
GROUP BY r.mid
ORDER BY ratingspread DESC, title;
SELECT title, (MAX(stars) - MIN(stars)) AS ratingspread
FROM movie
INNER JOIN rating USING(mId)
GROUP BY mId
ORDER BY ratingspread DESC, title;
SELECT title, (MAX(stars)-MIN(stars)) AS ratingspread
FROM rating
INNER JOIN movie USING(mId)
GROUP BY mId
ORDER BY ratingspread DESC, title
FYI the schema goes like this:
Movie ( mID, title, year, director ) There is a movie with ID number mID, a title, a release year, and a director.
Reviewer ( rID, name ) The reviewer with ID number rID has a certain name.
Rating ( rID, mID, stars, ratingDate ) The reviewer rID gave the movie mID a number of stars rating (1-5) on a certain ratingDate.

Your queries are not producing what you think they are producing for a number of reasons.
Here is what you are really looking for I think:
SELECT
movie.mid,
movie.title,
(MAX(rating.stars)-MIN(rating.stars)) AS ratingspread
FROM
movie
INNER JOIN rating on movie.mid = rating.mid
GROUP BY
movie.mid,
title
ORDER BY
(MAX(stars)-MIN(stars)) DESC,
title
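There are a few things to point out.
Firstly, you need to join on the matching column. The movie ID is what joins the rating to the movie; joining mid to rid would pair unrelated fields.
Secondly, you are not getting the GROUP BY concept. What you are trying to do is get the spread of ratings for a given movie and display its title, so to display its title (and any other non-summarised data), you have to include the field in the GROUP BY. There is one exception, which would explain why your 1st and 3rd queries run: your error message looks like PostgreSQL's, and PostgreSQL lets you omit columns that are functionally dependent on a grouped primary key. If mid is declared as the primary key of movie, grouping by m.mid covers title; r.mid is not a key of rating, so grouping by it does not. With USING(mId), the unqualified mId appears to resolve to the column of the first table listed, which would be why the order of the join matters.
For further illustration, imagine you wanted to get the spread of all reviews by each reviewer, to see if they had any bias towards going hard or going soft on the movies they were reviewing. Here is how you would get the spread of reviews for each reviewer: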
SELECT
reviewer.rid,
reviewer.name,
(MAX(rating.stars)-MIN(rating.stars)) AS ratingspread
FROM
reviewer
INNER JOIN rating on reviewer.rid = rating.rid
GROUP BY
reviewer.rid,
reviewer.name
ORDER BY
(MAX(stars)-MIN(stars)) DESC,
reviewer.name
By the way, the reason you want to include the ID as well as the title or reviewer name is to ensure you eliminate problems where two movies share the same title, or two reviewers have the same name.
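To see why grouping by the ID matters, here is a minimal sketch that runs the query through Python's built-in sqlite3 module. The sample rows are made up, and SQLite is only a stand-in: it is more permissive than PostgreSQL about ungrouped columns, so it will not reproduce your error, only the effect of grouping by title alone.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (rid INTEGER, mid INTEGER, stars INTEGER);
    INSERT INTO movie VALUES (1, 'Avatar'), (2, 'Remake'), (3, 'Remake');
    INSERT INTO rating VALUES
        (10, 1, 3), (11, 1, 5),
        (10, 2, 2), (11, 2, 3),
        (10, 3, 4), (12, 3, 4);
""")

# Grouping by ID and title keeps the two movies called 'Remake' apart.
spread_by_movie = conn.execute("""
    SELECT movie.mid, movie.title,
           MAX(rating.stars) - MIN(rating.stars) AS ratingspread
    FROM movie
    INNER JOIN rating ON movie.mid = rating.mid
    GROUP BY movie.mid, movie.title
    ORDER BY ratingspread DESC, movie.title, movie.mid
""").fetchall()

# Grouping by title alone merges them into one group with a misleading spread.
spread_by_title = conn.execute("""
    SELECT movie.title,
           MAX(rating.stars) - MIN(rating.stars) AS ratingspread
    FROM movie
    INNER JOIN rating ON movie.mid = rating.mid
    GROUP BY movie.title
    ORDER BY ratingspread DESC, movie.title
""").fetchall()

print(spread_by_movie)  # one row per movie ID
print(spread_by_title)  # one row per distinct title
```

With this data the two 'Remake' movies have spreads of 1 and 0 on their own, but the title-only grouping reports a single 'Remake' row with a spread of 2.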

Related

Which actor has the highest difference in ratings?

I need further help with my SQL problem.
In this database on movies, ratings and actors: https://i.stack.imgur.com/qFIbC.jpg
I am required to find the actor who has the largest difference between their best and their worst rated movie.
The condition is that the ratings cannot be lower than 3! (>3)
My current SQL looks as follows:
SELECT * FROM stars
JOIN ratings ON stars.movie_id = ratings.movie_id
WHERE ratings.movie_id = (
SELECT MAX(rating) - MIN(rating) FROM ratings
WHERE rating > 3);
My expectation was that I would at least get some result in my GitHub terminal that I could use to adjust my SQL query.
But I seem to have reached a dead end and I'm not sure how to solve this problem.
You need to GROUP BY actor to calculate everyone's rating range. Then, take the actor with the largest range. Something like this:
SELECT
person_id,
MAX(rating) - MIN(rating) AS rating_range
FROM
stars
JOIN ratings ON stars.movie_id = ratings.movie_id
WHERE
rating > 3
GROUP BY
person_id
ORDER BY
2 DESC
LIMIT
1
;
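If you want to try that out on a small scale first, here is a rough sketch using Python's sqlite3 module. The stars and ratings tables are reduced to the columns the query touches, and the rows are invented for illustration, so treat it as a check of the idea rather than of your real data.

```python
import sqlite3

# Hypothetical rows; only the columns used by the query are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stars (movie_id INTEGER, person_id INTEGER);
    CREATE TABLE ratings (movie_id INTEGER, rating REAL);
    INSERT INTO stars VALUES (1, 100), (2, 100), (3, 100), (1, 200), (4, 200);
    INSERT INTO ratings VALUES (1, 8.0), (2, 4.5), (3, 2.0), (4, 7.0);
""")

# One group per actor, ignoring ratings of 3 or below, widest range first.
top_actor = conn.execute("""
    SELECT person_id,
           MAX(rating) - MIN(rating) AS rating_range
    FROM stars
    JOIN ratings ON stars.movie_id = ratings.movie_id
    WHERE rating > 3
    GROUP BY person_id
    ORDER BY rating_range DESC
    LIMIT 1
""").fetchone()

print(top_actor)
```

Note that LIMIT 1 returns a single actor even if several share the widest range, and that the WHERE filter runs before grouping, so the 2.0 rating never enters actor 100's range.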

Get Average in SQL Through Join

I'm just playing around with SQL and I'm trying to do the following.
I have 2 tables and here is their structure:
Movies_metadata table: [image: Movies]
ratings table: [image: Ratings]
As there are many ratings for one movie, what I'd like to do is get the avg rating per movie and have it display next to the title which is only available in the Metadata table.
This is as far as I got but obviously the issue with my SELECT statement is that it'll return the average of all movies and display it for each record:
SELECT
(SELECT
AVG(rating)
FROM
`movies-dataset.movies_data.ratings`) AS rating_avg,
metadata.title,
metadata.budget,
metadata.revenue,
metadata.genres,
metadata.original_language,
metadata.release_date
FROM
`movies-dataset.movies_data.Movies_metadata` AS metadata
INNER JOIN `movies-dataset.movies_data.ratings` AS ratings
ON metadata.id = ratings.movieId
LIMIT 10
Here is an example of the result: [image: Result]
I'm thinking I can potentially use a GROUP BY but when I try, I get an error
Appreciate the help!
The following should work:
SELECT movies_metadata.title, AVG(ratings.rating)
FROM movies_metadata
LEFT JOIN ratings ON movies_metadata.id = ratings.movieID
GROUP BY movies_metadata.title
If titles are not unique, group by movies_metadata.id instead of movies_metadata.title.
The GROUP BY error most likely comes from selecting columns that are neither grouped nor aggregated, rather than from LIMIT. Try getting the average rating as part of the inner join like this:
SELECT
ratings.averagerating,
metadata.title,
metadata.budget,
metadata.revenue,
metadata.genres,
metadata.original_language,
metadata.release_date
FROM `movies-dataset.movies_data.Movies_metadata` AS metadata
INNER JOIN (
SELECT movieId, AVG(rating) AS averagerating
FROM `movies-dataset.movies_data.ratings`
GROUP BY movieId
) AS ratings
ON metadata.id = ratings.movieId
ORDER BY ratings.averagerating
LIMIT 5
Maybe try something like:
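-- If rating is an integer column, some databases will do integer division here;
-- multiply rate_sum by 1.0 to avoid that.
Select m.movie_id, (r.rate_sum / r.num_rate) as avg_rating
From your_movies_table m
Left Join (select movie_id, sum(rating) as rate_sum, count(rating) as num_rate
From your_ratings_table
Group by movie_id) r
On m.movie_id = r.movie_id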
I'm using a left join because I'm not sure if all movies have been rated at least once.
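For what it's worth, here is a small sketch of the difference between the inner-join and left-join versions, run through Python's sqlite3 module rather than BigQuery. The table names are shortened and the rows are invented, so it only illustrates the shape of the result.

```python
import sqlite3

# Hypothetical data: movie 3 has no ratings at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies_metadata (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ratings (movieId INTEGER, rating REAL);
    INSERT INTO movies_metadata VALUES (1, 'Toy Story'), (2, 'Heat'), (3, 'Unrated');
    INSERT INTO ratings VALUES (1, 4.0), (1, 5.0), (2, 3.0);
""")

# Average per movie computed once in a derived table, then joined to the titles.
per_movie_avg = """
    SELECT movieId, AVG(rating) AS averagerating
    FROM ratings
    GROUP BY movieId
"""

# INNER JOIN drops movies that have never been rated.
inner_rows = conn.execute(f"""
    SELECT r.averagerating, m.title
    FROM movies_metadata AS m
    INNER JOIN ({per_movie_avg}) AS r ON m.id = r.movieId
    ORDER BY r.averagerating
""").fetchall()

# LEFT JOIN keeps them, with a NULL (None in Python) average.
left_rows = conn.execute(f"""
    SELECT m.title, r.averagerating
    FROM movies_metadata AS m
    LEFT JOIN ({per_movie_avg}) AS r ON m.id = r.movieId
    ORDER BY m.title
""").fetchall()

print(inner_rows)
print(left_rows)
```

Because the aggregation happens inside the derived table, the outer query can select any metadata columns without needing its own GROUP BY.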

Find the max average for each country - SQL SERVER

For each country, report the book with the highest average rating. If two or more books have the same average rating, report the one with the highest number of ratings. If two or more books are still tied (same highest average rating and same number of ratings), report them all sorted alphabetically by book title. If a country is not associated with any ratings, it should not show up in the query output.
[image: myTablesSample]
SELECT CountryName, Title, AVG(CAST(Rate AS DECIMAL)) as AverageRatePerCountry, count(Rate) as NumberOfRates
FROM bda.booksRatings as BR2
left JOIN bda.books as B2 on B2.ISBN = BR2.ISBN
left JOIN bda.users as U2 ON U2.UserID = BR2.UserID
left JOIN bda.countries as C2 on U2.CountryCode = C2.CountryCode
GROUP BY CountryName, Title
ORDER BY CountryName desc, AverageRatePerCountry desc, NumberOfRates desc, Title asc
I tried adding the clause below, but it clearly does not work:
having AVG(CAST(Rate AS DECIMAL)) >=all and count(Rate) >= all
This is homework, and I am not 100% sure whether it is allowed to ask.
But please, I do not want the solution, just some pointers on what I am missing here.
Thank you in advance

SQL Query not getting the corresponding column

SELECT t.title, Max(t.st)
FROM (SELECT title,
Avg(stars) AS st
FROM movie
JOIN rating USING(mid)
GROUP BY title)t;
This is the query I wrote to get the maximum of AVG(stars) and its corresponding title. The max value comes out fine, but I am having trouble with the title: I am not getting the title that corresponds to it.
Output of subquery is:
Avatar 4.0
E.T. 2.5
Gone with the Wind 3.0
Raiders of the Lost Ark 3.33333333333
Snow White 4.5
The Sound of Music 2.5
Output of the whole query is wrong:
The Sound of Music 4.5
Expected (correct) output is:
Snow White 4.5
Try this
select t.title,
t.st
from
(select title,
avg(stars) as st,
rank() over(order by avg(stars) desc) as rSt
from movie
join rating using(mID)
group by title) t
where t.rSt = 1;
This ranks the movies by average stars in descending order first; the outer query's WHERE condition then selects the movie with the top rank. Hope this helps :-)
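Here is a rough sketch of the ranking idea that you can run with Python's sqlite3 module, assuming SQLite 3.25 or later for window function support. The rows are made up, and the averaging and ranking are split into two named steps (a variation on the query above, not the same text) so each step is easy to inspect.

```python
import sqlite3

# Hypothetical ratings; 'Snow White' and 'Up' tie on the highest average.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (mID INTEGER, stars INTEGER);
    INSERT INTO movie VALUES (1, 'Avatar'), (2, 'E.T.'), (3, 'Snow White'), (4, 'Up');
    INSERT INTO rating VALUES
        (1, 3), (1, 5), (2, 2), (2, 3), (3, 4), (3, 5), (4, 4), (4, 5);
""")

top_rated = conn.execute("""
    WITH avg_by_title AS (
        -- step 1: one average per title
        SELECT title, AVG(stars) AS st
        FROM movie
        JOIN rating USING (mID)
        GROUP BY title
    ),
    ranked AS (
        -- step 2: rank the averages, highest first; ties share a rank
        SELECT title, st, RANK() OVER (ORDER BY st DESC) AS rSt
        FROM avg_by_title
    )
    SELECT title, st
    FROM ranked
    WHERE rSt = 1
    ORDER BY title
""").fetchall()

print(top_rated)
```

One thing worth knowing about RANK(): tied rows all get rank 1, so the filter can return more than one title.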
You can try using a subquery that computes the maximum average:
select *
from
(select title,
avg(stars) as st
from movie a
join rating b on a.mID=b.mID
group by title
)t where st in
(select max(st) from (select title,
avg(stars) as st
from movie a
join rating b on a.mID=b.mID
group by title
)t1)
You can use a WHERE condition in your query. You can experiment with joins or just use a subquery.
Select title, stars
from table
where stars = (select max(stars) from table)
or, in SQL Server syntax:
SELECT top 1 title
FROM table
ORDER BY Stars DESC
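As a sketch of those two options applied to the per-title averages, here is a version you can run with Python's sqlite3 module. The data is invented, the avg_by_title name is my own, and since SQLite has no TOP, the second query uses LIMIT 1 as the nearest equivalent.

```python
import sqlite3

# Hypothetical ratings chosen so that 'Snow White' has the highest average.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating (mID INTEGER, stars INTEGER);
    INSERT INTO movie VALUES (1, 'Avatar'), (2, 'E.T.'), (3, 'Snow White');
    INSERT INTO rating VALUES (1, 3), (1, 5), (2, 2), (2, 3), (3, 4), (3, 5);
""")

# Shared first step: one average per title (hypothetical CTE name).
averages = """
    WITH avg_by_title AS (
        SELECT title, AVG(stars) AS st
        FROM movie
        JOIN rating USING (mID)
        GROUP BY title
    )
"""

# Option 1: keep the rows whose average equals the overall maximum.
by_subquery = conn.execute(averages + """
    SELECT title, st
    FROM avg_by_title
    WHERE st = (SELECT MAX(st) FROM avg_by_title)
""").fetchall()

# Option 2: sort and take the first row (LIMIT 1 standing in for TOP 1).
by_limit = conn.execute(averages + """
    SELECT title, st
    FROM avg_by_title
    ORDER BY st DESC
    LIMIT 1
""").fetchone()

print(by_subquery)
print(by_limit)
```

The first option returns every title that ties for the maximum, while the second always returns exactly one row.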

SQL displaying only max value

I am trying to write a SQL query that returns each movie title with only its highest rating, sorted by title, dropping the lower ratings of the same movie. Only 1 SELECT statement is allowed.
I tried this:
Select distinct m.title, r.stars
from Movie as m inner join Rating as r on m.mid = r.mid
order by m.title
but I can't figure out how to choose only the highest rating. If anyone has a good resource for the nuances, it would help.
Use MAX(), an aggregate function that returns the greatest value of a field within each group.
Select m.title, MAX(r.stars) stars
from Movie as m inner join Rating as r on m.mid = r.mid
GROUP BY m.title
order by m.title
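A quick way to check the behaviour is to run the query on a few made-up rows, for example with Python's sqlite3 module. The data below is purely illustrative.

```python
import sqlite3

# Hypothetical rows: each movie has several ratings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (mid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Rating (mid INTEGER, stars INTEGER);
    INSERT INTO Movie VALUES (1, 'Avatar'), (2, 'E.T.');
    INSERT INTO Rating VALUES (1, 3), (1, 5), (2, 2), (2, 4), (2, 3);
""")

# A single SELECT: one group per title, keeping only the highest stars value.
best_per_title = conn.execute("""
    SELECT m.title, MAX(r.stars) AS stars
    FROM Movie AS m
    INNER JOIN Rating AS r ON m.mid = r.mid
    GROUP BY m.title
    ORDER BY m.title
""").fetchall()

print(best_per_title)
```

Each title appears once, paired with its highest rating, which is what DISTINCT alone could not give you.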