Problems with SQL Joins on an assignment - sql

I've got a question for my assignment
Data
Question: For all cases where the same reviewer rated the same movie twice and gave it a higher rating the second time, return the reviewer's name and the title of the movie.
Here's what I've tried. I joined all the tables.
select *
from Rating
join Reviewer on Rating.rID = Reviewer.rID
join Movie on Rating.mID = Movie.mID
But how do I continue? If a reviewer rated the same movie twice and the later rating is higher than the earlier one, I need to show this reviewer. How do I do that in SQL?

Join what you already had with Rating again, so that you get all records where both the reviewer and the movie are the same, then keep only the rows where a record with a later ratingDate has more stars.
In case the same reviewer reviewed the movie 3 or more times, use select distinct to remove duplicates:
select distinct rev.name, m.title
from Rating r1
join Reviewer rev on rev.rID = r1.rID
join Movie m on m.mID = r1.mID
join Rating r2 on r1.rID = r2.rID and r1.mID = r2.mID
where r1.ratingDate < r2.ratingDate and r1.stars < r2.stars
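To try the self-join query above without the course database, here is a minimal sketch using Python's sqlite3 module with an in-memory SQLite database. The tables are trimmed to the columns the query uses, and the sample rows (reviewers Alice and Bob, titles Movie A and Movie B) are hypothetical.

```python
import sqlite3

def rated_higher_second_time(conn):
    # Self-join Rating: r1 is an earlier rating and r2 a later one by the same
    # reviewer for the same movie. ratingDate is ISO-8601 text, so "<" on the
    # strings compares the dates chronologically.
    return conn.execute("""
        SELECT DISTINCT rev.name, m.title
        FROM Rating r1
        JOIN Reviewer rev ON rev.rID = r1.rID
        JOIN Movie m ON m.mID = r1.mID
        JOIN Rating r2 ON r1.rID = r2.rID AND r1.mID = r2.mID
        WHERE r1.ratingDate < r2.ratingDate AND r1.stars < r2.stars
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (mID INTEGER, title TEXT);
    CREATE TABLE Reviewer (rID INTEGER, name TEXT);
    CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
    INSERT INTO Movie VALUES (101, 'Movie A'), (102, 'Movie B');
    INSERT INTO Reviewer VALUES (201, 'Alice'), (202, 'Bob');
    -- Alice rates Movie A higher the second time; Bob rates Movie B lower.
    INSERT INTO Rating VALUES
        (201, 101, 2, '2011-01-10'), (201, 101, 4, '2011-01-22'),
        (202, 102, 4, '2011-01-09'), (202, 102, 3, '2011-01-27');
""")
print(rated_higher_second_time(conn))  # [('Alice', 'Movie A')]
```

Only Alice qualifies: Bob's later rating has fewer stars, so neither ordering of his two rows passes both conditions.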

Here is a way to do this.
I find the count of ratings for each (rid, mid) combination and keep those with exactly 2 (i.e. two reviews by the same reviewer of the same movie); this shows up as the column cnt.
After that, I find the latest rating by ranking the ratingdate in descending order, so row_number = 1 gets you the latest rating, and lead() over the same ordering gives the rating just before it, so the two can be compared.
with data
as (
select count(*) over(partition by rt.rid,rt.mid) as cnt
,row_number() over(partition by rt.rid,rt.mid order by rt.ratingdate desc) as rnk
,rt.stars
,lead(rt.stars) over(partition by rt.rid,rt.mid order by rt.ratingdate desc) as prev_stars
,rw.name
,mov.title
from rating rt
join reviewer rw
on rt.rid=rw.rid
join movie mov
on mov.mid=rt.mid
)
select name, title
from data
where rnk=1
and cnt=2
and stars > prev_stars

For the cases where a reviewer rated the same movie multiple times, you are interested in their first and second rating. (Any further ratings, i.e. a reviewer rating a movie a third or fourth time, must be ignored.) So, number each reviewer's ratings of each movie by date (with ROW_NUMBER). Then check whether the second rating is higher than the first (by grouping by reviewer and movie and comparing both ratings). For the matches, look up the reviewer name and movie title. You would normally use where (rid, mid) in ( subquery ) for that, but SQL Server does not support IN clauses with tuples, so you inner join instead. MAX picks out the single rn = 1 and rn = 2 star values (each group has at most one row of each), and unlike ANY_VALUE it is available in every major database, including SQL Server.
select r.name, m.title
from reviewer r
cross join movie m
join
(
select rid, mid
from
(
select *, row_number() over(partition by rid, mid order by ratingdate) as rn
from Rating
) numbered
group by rid, mid
having max(rn) > 1
and max(case when rn = 1 then stars end) <
max(case when rn = 2 then stars end)
) matches on matches.rid = r.rid and matches.mid = m.mid
order by r.name, m.title;
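A sketch of the "compare rating #1 with rating #2" idea, run on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). It joins the matches to Reviewer and Movie directly instead of through the cross join, and drops the redundant max(rn) > 1 test, since a missing rn = 2 rating yields NULL and fails the comparison anyway. The sample rows are hypothetical; Bob rates Movie B a third time to show that later ratings are ignored.

```python
import sqlite3

# Number each reviewer's ratings of a movie by date, then compare rating #1
# with rating #2 per (rID, mID); a third or later rating is ignored.
QUERY = """
SELECT rev.name, m.title
FROM (
    SELECT rID, mID
    FROM (
        SELECT rID, mID, stars,
               ROW_NUMBER() OVER (PARTITION BY rID, mID ORDER BY ratingDate) AS rn
        FROM Rating
    ) numbered
    GROUP BY rID, mID
    HAVING MAX(CASE WHEN rn = 1 THEN stars END)
         < MAX(CASE WHEN rn = 2 THEN stars END)
) matches
JOIN Reviewer rev ON rev.rID = matches.rID
JOIN Movie m ON m.mID = matches.mID
ORDER BY rev.name, m.title
"""

# The self-join from the first answer, for comparison: it accepts any later
# rating that beats any earlier one.
SELF_JOIN = """
SELECT DISTINCT rev.name, m.title
FROM Rating r1
JOIN Reviewer rev ON rev.rID = r1.rID
JOIN Movie m ON m.mID = r1.mID
JOIN Rating r2 ON r1.rID = r2.rID AND r1.mID = r2.mID
WHERE r1.ratingDate < r2.ratingDate AND r1.stars < r2.stars
ORDER BY rev.name, m.title
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (mID INTEGER, title TEXT);
    CREATE TABLE Reviewer (rID INTEGER, name TEXT);
    CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
    INSERT INTO Movie VALUES (101, 'Movie A'), (102, 'Movie B');
    INSERT INTO Reviewer VALUES (201, 'Alice'), (202, 'Bob');
    INSERT INTO Rating VALUES
        (201, 101, 2, '2011-01-10'), (201, 101, 4, '2011-01-22'),
        -- Bob: 1st = 4, 2nd = 3 (lower), 3rd = 5 should be ignored
        (202, 102, 4, '2011-01-09'), (202, 102, 3, '2011-01-20'),
        (202, 102, 5, '2011-01-27');
""")
rows = conn.execute(QUERY).fetchall()
print(rows)                                 # [('Alice', 'Movie A')]
print(conn.execute(SELF_JOIN).fetchall())   # [('Alice', 'Movie A'), ('Bob', 'Movie B')]
```

The two queries differ only when someone rates a movie three or more times: the self-join also reports Bob, because his third rating beats an earlier one, while the ROW_NUMBER version looks only at the first two ratings.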

Related

For all pairs of reviewers such that both reviewers gave a rating to the same movie, return the names of both reviewers

Stanford Self Paced Course for SQL question:
For all pairs of reviewers such that both reviewers gave a rating to
the same movie, return the names of both reviewers. Eliminate
duplicates, don't pair reviewers with themselves, and include each
pair only once. For each pair, return the names in the pair in
alphabetical order.
The schema:
Movie ( mID, title, year, director )
There is a movie with ID number mID, a title, a release year, and a director.
Reviewer ( rID, name )
The reviewer with ID number rID has a certain name.
Rating ( rID, mID, stars, ratingDate )
The reviewer rID gave the movie mID a number of stars rating (1-5) on a certain ratingDate.
My attempt:
Select R.Name, R2.Name
From Reviewer R
Join Reviewer R2 on (R.rID = R2.rID)
Join Rating Rt on (Rt.rID = R2.rID)
Join Rating Rt2 on (Rt2.rID = R.rID)
Where Rt.MID = Rt2.mID and R.rID < r2.rID
I know I need a table with 2 reviewer name columns and 2 movie columns. I apply the condition that the movies must equal each other, and the condition that the IDs cannot be the same, as the question says "Don't pair reviewers with themselves, and include each pair only once".
My result is empty (incorrect). What am I doing wrong?
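You must join the table rating twice to the table movie, and for each of those joins, join the table reviewer.
Then filter the result so that reviewers are not paired with themselves, and use distinct together with the min() and max() functions to make sure that each pair is not repeated. (The two-argument scalar min() and max() used here are SQLite functions; in most other databases use LEAST() and GREATEST() instead.)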
select distinct
min(v1.name, v2.name) reviewer1,
max(v1.name, v2.name) reviewer2
from movie m
inner join rating r1 on r1.mid = m.mid
inner join rating r2 on r2.mid = m.mid
inner join reviewer v1 on v1.rid = r1.rid
inner join reviewer v2 on v2.rid = r2.rid
where v1.rid <> v2.rid
order by reviewer1, reviewer2
I would start with the self join on rating and then bring in the names:
select distinct rv1.name, rv2.name
from rating r1 join
rating r2
on r1.mid = r2.mid join
reviewer rv1
on rv1.rid = r1.rid join
reviewer rv2
on rv2.rid = r2.rid and rv1.name < rv2.name;
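Your query is actually very similar. The main issue is the join condition R.rID = R2.rID: combined with R.rID < r2.rID in the WHERE clause, no row can ever match, so the result is empty. You also need select distinct, and you should compare names instead of ids so that each pair comes out in alphabetical order.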
SELECT DISTINCT MIN(NAME),MAX(NAME)
FROM (SELECT MID,NAME
FROM RATING R,REVIEWER R1
WHERE R.RID=R1.RID
GROUP BY MID,R.RID
ORDER BY MID,NAME)
GROUP BY MID
ORDER BY MIN(NAME)
Correct Answer
select distinct rv1.name,rv2.name
from rating r1
join rating r2 on r1.mid = r2.mid
join reviewer rv1 on rv1.rID= r1.rID
join reviewer rv2 on rv2.rID = r2.rID and rv1.name < rv2.name
order by rv1.name,rv2.name;
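Here is a small check of the accepted query, sketched with Python's sqlite3 module and hypothetical rows. The reviewer IDs are deliberately out of alphabetical order (Carol has the lowest rID) to show why the pair is filtered on names rather than ids, and Carol rates the shared movie twice to show what DISTINCT removes. The sketch assumes reviewer names are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reviewer (rID INTEGER, name TEXT);
    CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
    INSERT INTO Reviewer VALUES (201, 'Carol'), (202, 'Alice'), (203, 'Bob');
    -- Carol (twice) and Alice both rated movie 101; Bob only rated 102.
    INSERT INTO Rating VALUES
        (201, 101, 3, '2011-01-01'), (201, 101, 4, '2011-01-05'),
        (202, 101, 5, '2011-01-02'), (203, 102, 2, '2011-01-03');
""")

# Self-join Rating on the movie, attach both names, and keep only pairs whose
# first name sorts before the second: this drops self-pairs and mirrored
# pairs, and DISTINCT collapses repeats from multiple ratings.
pairs = conn.execute("""
    SELECT DISTINCT rv1.name, rv2.name
    FROM Rating r1
    JOIN Rating r2 ON r1.mID = r2.mID
    JOIN Reviewer rv1 ON rv1.rID = r1.rID
    JOIN Reviewer rv2 ON rv2.rID = r2.rID AND rv1.name < rv2.name
    ORDER BY rv1.name, rv2.name
""").fetchall()
print(pairs)  # [('Alice', 'Carol')]
```

Filtering on r1.rID < r2.rID instead would return the pair as ('Carol', 'Alice'), which is not in alphabetical order.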
Here is my solution:
select rev1.*, rev2.*
from Rating rat1
JOIN Rating rat2 on rat1.mID = rat2.mID AND rat1.rID < rat2.rID
JOIN Reviewer rev1 ON rev1.rID = rat1.rID
JOIN Reviewer rev2 ON rev2.rID = rat2.rID
order by rev1.name, rev2.name
It's good to avoid DISTINCT where you can.
It's also good to join tables using ids and then order by some criterion, like name.
SELECT rev.name, rev2.name
FROM ( SELECT distinct rID, mID
FROM RATING as r ) as t1
INNER JOIN ( SELECT distinct rID, mID
FROM RATING as r ) as t2 On t1.MID = t2.MID
INNER JOIN Reviewer as rev ON t1.RID = rev.RID
INNER JOIN Reviewer as rev2 ON t2.RID = rev2.RID
WHERE t1.RID <> t2.RID
AND t1.RID < t2.RID
GROUP by rev.name, rev2.name
ORDER by 1, 2
Select Distinct Min(name) as Mn, Max(name) as Mx
From Rating Join Reviewer Using(rID)
Group by mID
Order by Mn
select r1.name n1, r2.name n2 from (
select distinct a1.rid rid1, a2.rid rid2 from rating a1 inner join rating a2 on a1.mid = a2.mid
and not a1.rid = a2.rid
)r
inner join reviewer r1 on r.rid1 = r1.rid
inner join reviewer r2 on r.rid2 = r2.rid
where r1.name < r2.name order by r1.name
select DISTINCT min(name) as rv1, max(name) rv2
from Rating r1, Reviewer
WHERE (SELECT COUNT(*) from Rating r2 WHERE r1.mID = r2.mID) > 1 and r1.rID = Reviewer.rID
GROUP by r1.mID
ORDER by rv1

expanding sql query for selecting top two rows based on rating criteria?

I am doing a few exercises to brush up on my SQL basics. I am stuck here and unable to make any further progress. I would really appreciate tips on how to break down a complex query such as the following:
There are three tables:
Movie ( mID, title, year, director ) -- There is a movie with ID number mID, a title, a release year, and a director.
Reviewer ( rID, name ) -- The reviewer with ID number rID has a certain name.
Rating ( rID, mID, stars, ratingDate ) -- The reviewer rID gave the movie mID a number of stars rating (1-5) on a certain ratingDate.
The problem is :
For all cases where the same reviewer rated the same movie twice and gave it a higher rating the second time, return the reviewer's name and the title of the movie.
Here is my attempt:
select distinct temp1.ID FROM (select * FROM
(select rID, name,title twos FROM
(select r.rID , rev.name, m.title, count(*) as twos from reviewer rev
JOIN rating r on r.rID=rev.rID
JOIN movie m on m.mID=r.mID
GROUP BY rev.rID) counts where counts.twos=2) result, rating r
where result.rID=r.rID ORDER BY ratingDate DESC) TEMP temp1
INNER JOIN TEMP temp2
ON temp1.rId = temp2.rId AND temp1.ratingDate > temp2.ratingDate
WHERE temp1.stars > temp2.stars;
I built this query iteratively, but it did not give the right solution, so I would like to know how to approach this kind of problem.
This is NOT homework. I am doing an online tutorial from here.
Thank you
In SQL, it helps to think in sets. For example you could select the set of reviews for which an earlier review with a lower rating exists:
select Reviewer.name
, Movie.title
from Rating
join Reviewer
on Reviewer.rID = Rating.rID
join Movie
on Movie.mID = Rating.mID
where exists
(
select *
from Rating prev
where prev.mID = Rating.mID
and prev.rID = Rating.rID
and prev.ratingDate < Rating.ratingDate
and prev.stars < Rating.stars
)
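The EXISTS version can be tried the same way. This sketch uses Python's sqlite3 module with hypothetical rows in which Alice rates Movie A three times with rising stars; EXISTS then matches each of her two later ratings, so the plain query returns her twice and DISTINCT is needed to get one row per reviewer and movie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (mID INTEGER, title TEXT);
    CREATE TABLE Reviewer (rID INTEGER, name TEXT);
    CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
    INSERT INTO Movie VALUES (101, 'Movie A'), (102, 'Movie B');
    INSERT INTO Reviewer VALUES (201, 'Alice'), (202, 'Bob');
    INSERT INTO Rating VALUES
        (201, 101, 2, '2011-01-10'), (201, 101, 3, '2011-01-15'),
        (201, 101, 4, '2011-01-22'),
        (202, 102, 4, '2011-01-09'), (202, 102, 3, '2011-01-27');
""")

def later_higher(conn, distinct=False):
    # Keep each rating for which an earlier, lower rating by the same
    # reviewer of the same movie exists (a semi-join via EXISTS).
    select = "SELECT DISTINCT" if distinct else "SELECT"
    return conn.execute(select + """ Reviewer.name, Movie.title
        FROM Rating
        JOIN Reviewer ON Reviewer.rID = Rating.rID
        JOIN Movie ON Movie.mID = Rating.mID
        WHERE EXISTS (
            SELECT *
            FROM Rating prev
            WHERE prev.mID = Rating.mID
              AND prev.rID = Rating.rID
              AND prev.ratingDate < Rating.ratingDate
              AND prev.stars < Rating.stars
        )
    """).fetchall()

print(later_higher(conn))                 # [('Alice', 'Movie A'), ('Alice', 'Movie A')]
print(later_higher(conn, distinct=True))  # [('Alice', 'Movie A')]
```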
That's a really nice course btw!
First, get the (rID, mID) combinations that occur exactly twice. Then JOIN those to Rating twice to see whether the later rating has more stars than the earlier one. Finally, JOIN to Movie and Reviewer for the reviewer name and movie title.
SELECT
rv.name,
m.title
FROM (
SELECT
rID, mID
FROM Rating
GROUP BY rID, mID
HAVING COUNT(*) = 2
)t
INNER JOIN Rating r
ON r.rID = t.rID
AND r.mID = t.mID
INNER JOIN Rating r2
ON r2.rID = r.rID
AND r2.mID = r.mID
AND r2.ratingDate > r.ratingDate
INNER JOIN Movie m
ON m.mID = r.mID
INNER JOIN Reviewer rv
ON rv.rID = r.rID
WHERE
r2.stars > r.stars

Movie with highest rating?

Find the movie(s) with the highest average rating. Return the movie title(s) and average rating.
I tried this and got stuck because I'm not able to retrieve mid: if I add mid, max(avg_stars) gives the max for every mid, but I want only one max value.
http://sqlfiddle.com/#!3/e3ee1/13
select max(avg_stars) from
(
select top 1 mid, avg(stars) as avg_stars
from rating
group by mid
order by avg_stars desc
) z
Expected output: Snow White 4.5. Also, how can I handle two movies having the same max(avg_stars)?
This should serve your purpose and perform well:
SELECT
Title
,AVG_RATING
FROM
(
SELECT
M.Title
,M.mID
,CAST(ROUND(AVG(R.stars),2) AS DECIMAL(10,2)) AS AVG_RATING
,RANK() OVER (ORDER BY AVG(R.stars) DESC) RATING_RANK
FROM Movie M
INNER JOIN Rating R
ON M.mID = R.mID
GROUP BY M.Title,M.mID
)RANKED_RATING
WHERE RATING_RANK = 1
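To see the tie handling, here is the same RANK() pattern sketched on SQLite through Python's sqlite3 module (RANK() needs SQLite 3.25 or later), with hypothetical rows in which two movies share the top average. The CAST/ROUND is left out: SQLite's AVG always returns a floating-point value, whereas SQL Server's AVG of an int column returns an int, which is one reason the casting may need adjusting there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (mID INTEGER, title TEXT);
    CREATE TABLE Rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
    INSERT INTO Movie VALUES (101, 'Movie A'), (102, 'Movie B'), (103, 'Movie C');
    -- Movie A and Movie B both average 4.5; Movie C averages 3.0.
    INSERT INTO Rating VALUES
        (201, 101, 4, NULL), (202, 101, 5, NULL),
        (201, 102, 5, NULL), (203, 102, 4, NULL),
        (202, 103, 3, NULL);
""")

# RANK() is evaluated over the grouped rows, so every movie tied for the
# highest average gets rank 1 and survives the outer filter.
top = conn.execute("""
    SELECT title, avg_rating
    FROM (
        SELECT m.title,
               AVG(r.stars) AS avg_rating,
               RANK() OVER (ORDER BY AVG(r.stars) DESC) AS rating_rank
        FROM Movie m
        JOIN Rating r ON m.mID = r.mID
        GROUP BY m.mID, m.title
    ) ranked
    WHERE rating_rank = 1
    ORDER BY title
""").fetchall()
print(top)  # [('Movie A', 4.5), ('Movie B', 4.5)]
```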
You may have to play around with the casting a little to suit your table definitions.
Note: if 2 or more movies have the highest average rating, all of them are ranked 1 and all get selected. If you still want only one, you'll need to define a rule for which one should be selected.
Try this http://sqlfiddle.com/#!3/e3ee1/143:
;WITH CTE as
(
select r.mid, avg(r.stars) as avg_stars, m.title
from rating r
INNER JOIN Movie m ON m.mid=r.mid
group by r.mid, m.title
--order by avg_stars desc
)
select TOP 1 mid, title,avg_stars from CTE
Group by avg_stars,mid,title
--having avg_stars=Max(avg_stars)
Order By avg_stars desc
Output:
MID TITLE AVG_STARS
106 Snow White 4.5
SELECT TOP 1 MAX(m.title) AS title, AVG(stars) AS averageStars
FROM rating r
JOIN movie m
ON r.mId = m.mId
GROUP BY r.mId
ORDER BY AVG(stars) DESC,
--Order by a second column of your
--choice to break ties for AVG(stars)
MAX(m.title)
You can probably optimize this or come up with something cleaner, but this works:
SELECT m.title, AVG(r.stars) AS AverageStars
FROM Rating AS r (NOLOCK)
INNER JOIN Movie AS m (NOLOCK) ON m.mID = r.mID
GROUP BY r.mID, m.Title
HAVING AVG(r.stars) =
(
SELECT TOP 1 AVG(stars) AS AverageStars
FROM Rating (NOLOCK)
GROUP BY mID
ORDER BY AverageStars DESC
)

I don't understand this SQL Server MIN() resultset

I have this SQL Server query, which I wrote to find the movie title that has the fewest records in the RENTAL table.
When run, it returns a resultset that is identical to the resultset I get from executing the sub-query by itself.
In other words, rather than returning the single movie with the minimum RentalCount, it returns all movie titles and their corresponding RentalCount.
SELECT B.Title, MIN(B.RentalCount) AS RentalCount
FROM (
SELECT Movie.Title, Count(*) AS RentalCount
FROM Rental
JOIN Dvd ON Rental.RentalID=Dvd.DvdID
JOIN Movie ON Dvd.Movieid=movie.MovieID
GROUP BY Movie.Title
) B
GROUP BY B.Title
The result is correct. Your subquery returns the total count for each title in the rental table, and the outer query returns the same result because it also groups by title.
Follow-up question: what result do you want to achieve?
find the Movie title that has the least amount of records in the RENTAL table
SELECT Movie.Title, Count(*) AS RentalCount
FROM Rental
JOIN Dvd ON Rental.RentalID=Dvd.DvdID
JOIN Movie ON Dvd.Movieid=movie.MovieID
GROUP BY Movie.Title
HAVING Count(*) =
(
SELECT MIN(t_count)
FROM
(
SELECT Count(*) t_count
FROM Rental
JOIN Dvd ON Rental.RentalID=Dvd.DvdID
JOIN Movie ON Dvd.Movieid=movie.MovieID
GROUP BY Movie.Title
) a
)
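Here is a sketch of the HAVING Count(*) = (SELECT MIN(...)) pattern running on SQLite through Python's sqlite3 module. SQLite has no TOP ... WITH TIES, but this pattern returns ties as well. The schema is hypothetical: it assumes Rental references Dvd through a DvdID column, and the rows are made up so that two titles tie for the fewest rentals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (MovieID INTEGER, Title TEXT);
    CREATE TABLE Dvd (DvdID INTEGER, MovieID INTEGER);
    CREATE TABLE Rental (RentalID INTEGER, DvdID INTEGER);
    INSERT INTO Movie VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
    INSERT INTO Dvd VALUES (10, 1), (20, 2), (30, 3);
    -- Movie A is rented 3 times; Movie B and Movie C once each (a tie).
    INSERT INTO Rental VALUES (100, 10), (101, 10), (102, 10), (103, 20), (104, 30);
""")

# Count rentals per title, then keep every title whose count equals the
# smallest per-title count, so all tied titles are returned.
fewest = conn.execute("""
    SELECT Movie.Title, COUNT(*) AS RentalCount
    FROM Rental
    JOIN Dvd ON Rental.DvdID = Dvd.DvdID
    JOIN Movie ON Dvd.MovieID = Movie.MovieID
    GROUP BY Movie.Title
    HAVING COUNT(*) = (
        SELECT MIN(t_count)
        FROM (
            SELECT COUNT(*) AS t_count
            FROM Rental
            JOIN Dvd ON Rental.DvdID = Dvd.DvdID
            JOIN Movie ON Dvd.MovieID = Movie.MovieID
            GROUP BY Movie.Title
        ) a
    )
    ORDER BY Movie.Title
""").fetchall()
print(fewest)  # [('Movie B', 1), ('Movie C', 1)]
```

Because of the inner joins, a title that was never rented does not appear at all; counting it as zero would need a LEFT JOIN from Movie.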
UPDATE 1
Thanks to Martin Smith for introducing me to TOP ... WITH TIES:
SELECT TOP 1 WITH TIES Movie.Title, Count(*) AS RentalCount
FROM Rental
JOIN Dvd ON Rental.RentalID=Dvd.DvdID
JOIN Movie ON Dvd.Movieid=movie.MovieID
GROUP BY Movie.Title
ORDER BY RentalCount
SQLFiddle Demo
You could have done this without a subquery
SELECT TOP 1 Movie.Title, Count(*) AS RentalCount
FROM Rental
JOIN Dvd ON Rental.RentalID=Dvd.DvdID
JOIN Movie ON Dvd.Movieid=movie.MovieID
GROUP BY Movie.Title
ORDER BY Count(*)
If you are looking for a specific movie title, do it like this:
SELECT Movie.Title, Count(*) AS RentalCount
FROM Rental
JOIN Dvd ON Rental.RentalID=Dvd.DvdID
JOIN Movie ON Dvd.Movieid=movie.MovieID
where Movie.Title='xyz'
GROUP BY Movie.Title

How should I join these 3 SQL queries in Oracle?

I have these 3 queries:
SELECT
title, year, MovieGenres(m.mid) genres,
MovieDirectors(m.mid) directors, MovieWriters(m.mid) writers,
synopsis, poster_url
FROM movies m
WHERE m.mid = 1;
SELECT AVG(rating) FROM movie_ratings WHERE mid = 1;
SELECT COUNT(rating) FROM movie_ratings WHERE mid = 1;
And I need to join them into a single query. I was able to do it like this:
SELECT
title, year, MovieGenres(m.mid) genres,
MovieDirectors(m.mid) directors, MovieWriters(m.mid) writers,
synopsis, poster_url, AVG(rating) average, COUNT(rating) count
FROM movies m INNER JOIN movie_ratings mr
ON m.mid = mr.mid
WHERE m.mid = 1
GROUP BY
title, year, MovieGenres(m.mid), MovieDirectors(m.mid),
MovieWriters(m.mid), synopsis, poster_url;
But I don't really like that "huge" GROUP BY. Is there a simpler way to do it?
You could do something like this:
SELECT title
,year
,MovieGenres(m.mid) genres
,MovieDirectors(m.mid) directors
,MovieWriters(m.mid) writers
,synopsis
,poster_url
,(select avg(mr.rating)
from movie_ratings mr
where mr.mid = m.mid) as avg_rating
,(select count(rating)
from movie_ratings mr
where mr.mid = m.mid) as num_ratings
FROM movies m
WHERE m.mid = 1;
or even
with grouped as(
select avg(rating) as avg_rating
,count(rating) as num_ratings
from movie_ratings
where mid = 1
)
select title
,year
,MovieGenres(m.mid) genres
,MovieDirectors(m.mid) directors
,MovieWriters(m.mid) writers
,synopsis
,poster_url
,avg_rating
,num_ratings
from movies m cross join grouped
where m.mid = 1;
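The scalar-subquery form and the original JOIN + GROUP BY form can be compared with a small sketch. It runs on SQLite through Python's sqlite3 module rather than Oracle, leaves out the MovieGenres/MovieDirectors/MovieWriters functions, and uses hypothetical rows; Movie B has no ratings, which is where the two forms differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (mid INTEGER, title TEXT);
    CREATE TABLE movie_ratings (mid INTEGER, rating INTEGER);
    INSERT INTO movies VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO movie_ratings VALUES (1, 3), (1, 4), (1, 5);  -- Movie B: no ratings
""")

# Correlated scalar subqueries: no GROUP BY over the movie columns is needed.
SCALAR = """
    SELECT m.title,
           (SELECT AVG(mr.rating) FROM movie_ratings mr WHERE mr.mid = m.mid) AS avg_rating,
           (SELECT COUNT(mr.rating) FROM movie_ratings mr WHERE mr.mid = m.mid) AS num_ratings
    FROM movies m
    WHERE m.mid = ?
"""

# The question's INNER JOIN + GROUP BY form, reduced to one movie column.
GROUPED = """
    SELECT m.title, AVG(mr.rating), COUNT(mr.rating)
    FROM movies m JOIN movie_ratings mr ON m.mid = mr.mid
    WHERE m.mid = ?
    GROUP BY m.title
"""

print(conn.execute(SCALAR, (1,)).fetchall())   # [('Movie A', 4.0, 3)]
print(conn.execute(GROUPED, (1,)).fetchall())  # [('Movie A', 4.0, 3)]
print(conn.execute(SCALAR, (2,)).fetchall())   # [('Movie B', None, 0)]
print(conn.execute(GROUPED, (2,)).fetchall())  # []
```

For a rated movie both forms agree, but for a movie with no ratings the inner join leaves nothing to group, so the GROUP BY version returns no row at all, while the scalar subqueries still return the movie with a NULL average and a count of 0.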
I guess I don't see the problem with having several GROUP BY columns. That's a very common pattern in SQL. Of course, code clarity is often in the eye of the beholder.
Check the explain plans for the two approaches; my guess is you'll get better performance with your original version since it only needs to process the movie_ratings table once. But I haven't checked, and that will be somewhat data and installation dependent.
How about:
SELECT
title, year, MovieGenres(m.mid) genres,
MovieDirectors(m.mid) directors, MovieWriters(m.mid) writers,
synopsis, poster_url,
(SELECT AVG(rating) FROM movie_ratings WHERE mid = 1) av,
(SELECT COUNT(rating) FROM movie_ratings WHERE mid = 1) cnt
FROM movies m
WHERE m.mid = 1;
or
SELECT
title, year, MovieGenres(m.mid) genres,
MovieDirectors(m.mid) directors, MovieWriters(m.mid) writers,
synopsis, poster_url,
av.av,
cnt.cnt
FROM movies m,
(SELECT AVG(rating) av FROM movie_ratings WHERE mid = 1) av,
(SELECT COUNT(rating) cnt FROM movie_ratings WHERE mid = 1) cnt
WHERE m.mid = 1;