SQL - Get highest rated movie by genre

I have three tables:
CREATE TABLE Movie
(
movieId INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
title VARCHAR(255) NOT NULL,
moviePath VARCHAR(500) NOT NULL
);
CREATE TABLE Rating
(
rid INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
mid INTEGER FOREIGN KEY REFERENCES Movie(movieId) ON DELETE CASCADE,
uid INTEGER FOREIGN KEY REFERENCES User(id) ON DELETE CASCADE,
rating INTEGER NOT NULL,
);
CREATE TABLE Genre(
id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
movieId INTEGER NOT NULL FOREIGN KEY REFERENCES Movie(movieId) ON DELETE CASCADE,
genre VARCHAR(255) NOT NULL
);
I want to create an SQL query that returns the most seen movie (with moviePath and title) from the most seen genre.
Any ideas?
UPDATE
Results:
| MID | TITLE | MOVIEPATH |
--------------------------------
| 4 | Happy days | a |
| 4 | Happy days | a |

It would have been great if you had provided some sample data. Try this out; I drafted this answer based on your earlier question.
select t.mid, t.sum_rating,
m.title, m.moviepath, g.genres
from (
select mid,
sum(rating) as sum_rating,
dense_rank() over (order by
sum(rating) desc) as rnk
from rating
group by mid
) t
left join movie m
on m.movieid = t.mid
left join genre g
on g.movieid = m.movieid
where t.rnk = 1;
Results:
| MID | SUM_RATING | TITLE | MOVIEPATH | GENRES |
------------------------------------------------------
| 4 | 37 | Happy days | a | comedy |
| 4 | 37 | Happy days | a | RomCom |
You may use this alternative as HSQL doesn't support dense_rank:
Query with ORDER BY ... DESC and TOP 1:
-- alternatively
select t.mid, t.sum_rating,
m.title, m.moviepath, g.genres
from (
select top 1 mid,
sum(rating) as sum_rating
from rating
group by mid
order by sum_rating desc
) t
left join movie m
on m.movieid = t.mid
left join genre g
on g.movieid = m.movieid
;
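The two queries above differ when several movies share the highest total: a ranking keeps every tied movie, while TOP 1 returns a single row. The sketch below illustrates that difference. It is an assumption-laden demo rather than the answer's own code: it runs on SQLite through Python's sqlite3 module, uses LIMIT 1 in place of TOP 1, replaces dense_rank with a comparison against the maximum total, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rating (rid INTEGER PRIMARY KEY, mid INTEGER, rating INTEGER);
-- hypothetical data: movies 1 and 2 both total 9, movie 3 totals 4
INSERT INTO rating (mid, rating) VALUES (1, 5), (1, 4), (2, 9), (3, 4);
""")

# TOP 1 / LIMIT 1 style: exactly one row, so one of the tied movies is dropped
top_one = conn.execute("""
    SELECT mid, SUM(rating) AS sum_rating
    FROM rating
    GROUP BY mid
    ORDER BY sum_rating DESC
    LIMIT 1
""").fetchall()

# Tie-preserving style: keep every movie whose total equals the maximum total
with_ties = conn.execute("""
    SELECT mid, SUM(rating) AS sum_rating
    FROM rating
    GROUP BY mid
    HAVING SUM(rating) = (
        SELECT MAX(total)
        FROM (SELECT SUM(rating) AS total FROM rating GROUP BY mid) AS t
    )
    ORDER BY mid
""").fetchall()

print(top_one)
print(with_ties)
```

If ties matter for your data, prefer the tie-preserving form or a ranking function where the database supports one.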

You could calculate the rating by summing all ratings for a movie in a subquery. Another subquery could calculate the highest rating per genre. By joining them together, you'd filter only the top movies per genre:
select *
from Movie m
join Genre g
on g.movieId = m.movieId
join (
select r.mid
, sum(Rating) as SumRating
from Rating r
group by
r.mid
) r
on r.mid = m.movieId
join (
select g.genre
, max(SumRating) as MaxGenreRating
from (
select r.mid
, sum(r.rating) as SumRating
from Rating r
group by
r.mid
) r
join Genre g
on g.movieId = r.mid
group by
g.genre
) filter
on filter.genre = g.genre
and filter.MaxGenreRating = r.SumRating
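A minimal sketch of the "total per movie, then maximum total per genre" idea, grouping by the genre name. Assumptions: it runs on SQLite through Python's sqlite3 module rather than the asker's database, it uses common table expressions (totals, best) instead of repeated subqueries, and the movies and ratings are made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (movieId INTEGER PRIMARY KEY, title TEXT, moviePath TEXT);
CREATE TABLE Rating (rid INTEGER PRIMARY KEY, mid INTEGER, uid INTEGER, rating INTEGER);
CREATE TABLE Genre (id INTEGER PRIMARY KEY, movieId INTEGER, genre TEXT);
-- hypothetical sample data: totals are Alpha = 9, Beta = 3, Gamma = 4
INSERT INTO Movie VALUES (1, 'Alpha', 'a'), (2, 'Beta', 'b'), (3, 'Gamma', 'c');
INSERT INTO Rating (mid, uid, rating) VALUES (1, 1, 5), (1, 2, 4), (2, 1, 3), (3, 1, 2), (3, 2, 2);
INSERT INTO Genre (movieId, genre) VALUES (1, 'comedy'), (2, 'comedy'), (2, 'drama'), (3, 'drama');
""")

rows = conn.execute("""
    WITH totals AS (              -- summed rating per movie
        SELECT mid, SUM(rating) AS sum_rating
        FROM Rating
        GROUP BY mid
    ),
    best AS (                     -- highest summed rating per genre name
        SELECT g.genre, MAX(t.sum_rating) AS max_rating
        FROM Genre g
        JOIN totals t ON t.mid = g.movieId
        GROUP BY g.genre
    )
    SELECT g.genre, m.title, m.moviePath, t.sum_rating
    FROM Genre g
    JOIN totals t ON t.mid = g.movieId
    JOIN best b ON b.genre = g.genre AND b.max_rating = t.sum_rating
    JOIN Movie m ON m.movieId = g.movieId
    ORDER BY g.genre
""").fetchall()

print(rows)
```

Movies tied for the top total in a genre all appear, because the final join matches on the maximum value rather than picking one row.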

Where can we find the view counts? With these tables you can only find the highest rated movie, using a query like this:
select movie.movieId, movie.title, movie.moviePath
from movie, rating, genre
where
movie.movieId = rating.mid and
movie.movieId = genre.movieId
order by rating.rating desc
limit 1; -- LIMIT works in MySQL; other databases use TOP 1 or FETCH FIRST 1 ROW ONLY.
If you are looking for the most seen movie from the most seen genre, you need to store a view count for each movie (and so for each genre) in your tables.

I recommend using the aggregate MAX on the rating together with GROUP BY on the genre:
select max(Rating.rating) as max_rating, Genre.genre from Movie
inner join Rating on Movie.movieId = Rating.mid
inner join Genre on Movie.movieId = Genre.movieId
group by Genre.genre;
I haven't tried it, so I'm not sure it works as is, but the idea is to use GROUP BY. It is meant to be used with aggregates such as COUNT, MAX, MIN and AVG. Note that Movie.movieId cannot appear in the select list unless it is also grouped or aggregated.
I hope that helps.

Related

is there a more efficient alternative for this SQL query?

I'm working on a movie data set that has tables for movies and genres, plus a bridge table in_genre.
The following query finds the genres that two movies have in common. I'm doing two joins to get each genre list and an INTERSECT to find the common genres.
Is there a more efficient way?
Table schema:
movie : movie_id(PK)(int)
in_genre(bridge_table): movie_id(FK)(int), genre_id(int)
SELECT count(*) as common_genre
FROM(
-- getting genres of first movie
SELECT in_genre.genre_id
FROM movie INNER JOIN in_genre ON movie.movie_id = in_genre.movie_id
WHERE movie.movie_id = 0109830
INTERSECT
-- getting genres of second movie
SELECT in_genre.genre_id
FROM movie INNER JOIN in_genre ON movie.movie_id = in_genre.movie_id
WHERE movie.movie_id = 1375666
) as genres

If it only needs the data from in_genre then there's no need to join the movie table.
And you can use an EXISTS to find the common genres.
SELECT COUNT(DISTINCT genre_id) as common_genre
FROM in_genre ig
WHERE movie_id = 0109830
AND EXISTS
(
SELECT 1
FROM in_genre ig2
WHERE ig2.movie_id = 1375666
AND ig2.genre_id = ig.genre_id
)

If you want the genres, I would simply do:
SELECT genre_id as common_genre
FROM in_genre ig
WHERE movie_id IN (0109830, 1375666)
GROUP BY genre_id
HAVING COUNT(*) = 2;
If you want the count, a subquery is simple enough:
SELECT COUNT(*)
FROM (SELECT genre_id as common_genre
FROM in_genre ig
WHERE movie_id IN (0109830, 1375666)
GROUP BY genre_id
HAVING COUNT(*) = 2
) g;
If you want full information about the genres, then I would suggest exists:
select g.*
from genres g
where exists (select 1
from in_genre ig
where ig.genre_id = g.genre_id and ig.movie_id = 0109830
) and
exists (select 1
from in_genre ig
where ig.genre_id = g.genre_id and ig.movie_id = 1375666
);
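To check that the INTERSECT, EXISTS, and GROUP BY ... HAVING COUNT(*) = 2 formulations agree, here is a small sketch run on SQLite through Python's sqlite3 module. The rows are hypothetical, and the composite primary key is an assumption: the HAVING COUNT(*) = 2 form relies on each (movie_id, genre_id) pair appearing only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE in_genre (
    movie_id INTEGER,
    genre_id INTEGER,
    PRIMARY KEY (movie_id, genre_id)   -- assumed: no duplicate pairs
);
-- hypothetical rows: the two movies share genres 2 and 3
INSERT INTO in_genre VALUES (109830, 1), (109830, 2), (109830, 3),
                            (1375666, 2), (1375666, 3), (1375666, 4);
""")

intersect_sql = """
    SELECT COUNT(*) FROM (
        SELECT genre_id FROM in_genre WHERE movie_id = 109830
        INTERSECT
        SELECT genre_id FROM in_genre WHERE movie_id = 1375666
    ) AS genres
"""
exists_sql = """
    SELECT COUNT(DISTINCT genre_id) FROM in_genre ig
    WHERE movie_id = 109830
      AND EXISTS (SELECT 1 FROM in_genre ig2
                  WHERE ig2.movie_id = 1375666
                    AND ig2.genre_id = ig.genre_id)
"""
group_sql = """
    SELECT COUNT(*) FROM (
        SELECT genre_id FROM in_genre
        WHERE movie_id IN (109830, 1375666)
        GROUP BY genre_id
        HAVING COUNT(*) = 2
    ) AS g
"""

# Each formulation should report the same number of common genres
counts = [conn.execute(sql).fetchone()[0]
          for sql in (intersect_sql, exists_sql, group_sql)]
print(counts)
```

Which form is fastest depends on the database and its indexes, so compare execution plans on real data before choosing.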

SQL - Selecting highest scores for different categories

Let's say I've got a database with 3 tables:
Players (PK id_player, name...),
Tournaments (PK id_tournament, name...),
Game (PK id_turn, FK id_tournament, FK id_player and score)
Players participate in tournaments. The Game table keeps track of each player's score in each tournament.
I want to create a view that looks like this:
| tournament_name | Winner | highest_score |
| Tournament_1 | Jones | 300 |
| Tournament_2 | White | 250 |
I tried different approaches, but I'm fairly new to SQL (and also to this forum).
I tried using a UNION ALL clause like:
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '1' group by "Id_player" order by
"Score" desc) where rownum <= 1
union all
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '2' group by "Id_player" order by
"Score" desc) where rownum <= 1;
Of course it works, but whenever a new tournament happens I would have to manually add another SELECT statement with Id_tournament set to the next value.
EDIT:
So let's say that player 1 scored 50 points in tournament A and player 2 scored 40 points. Player 1 wins, so the view should show only player 1 as the winner of this tournament (or, if possible, two or more players if it's a tie). The next row shows the winner of the second tournament. I don't think I'm going to record multiple games for one player in the same tournament, but if I did, it would probably use the average of all that player's scores.
EDIT2:
Create table scripts:
create table players
(id_player numeric(5) constraint pk_id_player primary key, name
varchar2(50));
create table tournaments
(id_tournament numeric(5) constraint pk_id_tournament primary key,
name varchar2(50));
create table game
(id_game numeric(5) constraint pk_game primary key, id_player
numeric(5) constraint fk_id_player references players(id_player),
id_tournament numeric(5) constraint fk_id_tournament references
tournaments(id_tournament), score numeric(3));
FINAL EDIT:
OK, in case anyone is wondering, I used Jorge Campos's script, changed it a bit, and it works. Thank you all for helping. Unfortunately I cannot upvote comments yet, so I can only thank you by posting. Here's the final script:
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select g.id_tournament, g.id_player,
row_number() over (partition by t.name order by
score desc) as rd from game g join tournaments t on
g.id_tournament = t.id_tournament
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc;

This query could be simplified depending on the RDBMS you are using.
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select id_tournament,
id_player,
row_number() over (partition by id_tournament order by score desc) as rd
from game
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc

Assuming what you want is to display the high score of each player in each tournament,
your query would look like this in MS SQL Server:
select
t.name as tournament_name,
p.name as Winner,
Max(g.score) as [Highest_Score]
from Tournaments t
Inner join Game g on t.id_tournament=g.id_tournament
inner join Players p on p.id_player=g.id_player
group by
g.id_tournament,
g.id_player,
t.name,
p.name

Please check whether this works for you:
SELECT tournemntData.id_tournament ,
tournemntData.name ,
dbo.Players.name ,
tournemntData.Score
FROM dbo.Game
INNER JOIN ( SELECT dbo.Tournaments.id_tournament ,
dbo.Tournaments.name ,
MAX(dbo.Game.score) AS Score
FROM dbo.Game
INNER JOIN dbo.Tournaments ON Tournaments.id_tournament = Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
GROUP BY dbo.Tournaments.id_tournament ,
dbo.Tournaments.name
) tournemntData ON tournemntData.id_tournament =Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
WHERE tournemntData.Score = dbo.Game.score
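The "maximum score per tournament, joined back to the games" approach keeps every player who reached the top score, so ties show up as extra rows. A minimal sketch follows, assuming SQLite through Python's sqlite3 module instead of SQL Server or Oracle. The derived-table alias best and the sample players and scores are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id_player INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tournaments (id_tournament INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE game (id_game INTEGER PRIMARY KEY, id_player INTEGER,
                   id_tournament INTEGER, score INTEGER);
INSERT INTO players VALUES (1, 'Jones'), (2, 'White'), (3, 'Smith');
INSERT INTO tournaments VALUES (1, 'Tournament_1'), (2, 'Tournament_2');
-- hypothetical scores: Jones wins Tournament_1, White and Smith tie in Tournament_2
INSERT INTO game (id_player, id_tournament, score) VALUES
    (1, 1, 300), (2, 1, 200),
    (2, 2, 250), (3, 2, 250), (1, 2, 90);
""")

winners = conn.execute("""
    SELECT t.name AS tournament_name, p.name AS winner, g.score AS highest_score
    FROM game g
    JOIN (SELECT id_tournament, MAX(score) AS top_score
          FROM game
          GROUP BY id_tournament) best      -- one row per tournament
      ON best.id_tournament = g.id_tournament
     AND best.top_score = g.score           -- keep only games at the top score
    JOIN tournaments t ON t.id_tournament = g.id_tournament
    JOIN players p ON p.id_player = g.id_player
    ORDER BY t.name, p.name
""").fetchall()

for row in winners:
    print(row)
```

Wrapping the SELECT in CREATE VIEW would give the asker's view without hard-coding tournament ids.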

aggregate functions are not allowed in WHERE - when joining PostgreSQL tables

In a game using PostgreSQL 9.3.10, some players have paid for a "VIP status", which is indicated by the vip column containing a date in the future:
# \d pref_users
Column | Type | Modifiers
------------+-----------------------------+--------------------
id | character varying(32) | not null
first_name | character varying(64) | not null
last_name | character varying(64) |
vip | timestamp without time zone |
Also players can rate other players by setting nice column to true, false or leaving it at null:
# \d pref_rep
Column | Type | Modifiers
-----------+-----------------------------+-----------------------------------------------------------
id | character varying(32) | not null
author | character varying(32) | not null
nice | boolean |
I calculate a "reputation" of VIP-players by issuing this SQL JOIN statement:
# select u.id, u.first_name, u.last_name,
count(nullif(r.nice, false))-count(nullif(r.nice, true)) as rep
from pref_users u, pref_rep r
where u.vip>now()and u.id=r.id group by u.id order by rep asc;
id | first_name | last_name | rep
-------------------------+--------------------------------+--------------------
OK413274501330 | ali | salimov | -193
OK357353924092 | viktor | litovka | -137
DE20287 | sergej warapow |
My question is the following:
How do I find all negatively rated players who have rated other players?
(The background is that I have given all VIP players the ability to rate others. Until then, only positively rated players could rate others.)
I have tried the following, but get the error below:
# select count(*) from pref_rep r, pref_users u
where r.author = u.id and u.vip > now() and
u.id in (select id from pref_rep
where (count(nullif(nice, false)) -count(nullif(nice, true))) < 0);
ERROR: aggregate functions are not allowed in WHERE
LINE 1: ...now() and u.id in (select id from pref_rep where (count(null...
^
UPDATE:
I am now trying it with a temporary table.
First I fill it with all negatively rated VIP-users and this works well:
# create temp table my_temp as select u.id, u.first_name, u.last_name,
count(nullif(r.nice, false))-count(nullif(r.nice, true)) as rep
from pref_users u, pref_rep r
where u.vip>now() and u.id=r.id group by u.id;
SELECT 362
But then my SQL JOIN returns too many identical rows, and I cannot find which condition is missing:
# select u.id, u.first_name, u.last_name
from pref_rep r, pref_users u, my_temp t
where r.author=u.id and u.vip>now()
and u.id=t.id and t.rep<0;
id | first_name | last_name
-------------------------+--------------------------------+----------------------------
OK400153108439 | Vladimir | Pelix
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
Same problem (multiple rows with same data) I get for the statement:
# select u.id, u.first_name, u.last_name
from pref_rep r, pref_users u
where r.author = u.id and u.vip>now()
and u.id in (select id from my_temp where rep < 0);
I wonder what condition could be missing here?

First of all, I would write your first query as this:
select
u.id, u.first_name, u.last_name,
sum(case
when r.nice=true then 1
when r.nice=false then -1
end) as rep
from
pref_users u inner join pref_rep r on u.id=r.id
where
u.vip>now()
group by
u.id, u.first_name, u.last_name;
(it's the same as yours, but I find it clearer).
To find negatively rated players, you can use the same query as before and just add a HAVING clause:
having
sum(case
when r.nice=true then 1
when r.nice=false then -1
end)<0
To find negatively rated players who have rated other players, one solution is this:
select
s.id, s.first_name, s.last_name, s.rep
from (
select
u.id, u.first_name, u.last_name,
sum(case
when r.nice=true then 1
when r.nice=false then -1
end) as rep
from
pref_users u inner join pref_rep r on u.id=r.id
where
u.vip>now()
group by
u.id, u.first_name, u.last_name
having
sum(case
when r.nice=true then 1
when r.nice=false then -1
end)<0
) s
where
exists (select * from pref_rep p where p.author = s.id)
Alternatively, the HAVING clause can be removed from the inner query, and you can just use this WHERE clause on the outer query:
where
rep<0
and exists (select * from pref_rep p where p.author = s.id)
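The HAVING plus EXISTS combination can be tried out with the sketch below. It also shows why joining pref_rep on author repeats a user: the join yields one row per rating that user has given. Assumptions: SQLite through Python's sqlite3 module stands in for PostgreSQL, nice is stored as 1/0 instead of a boolean, vip is an ISO date string compared with datetime('now'), and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pref_users (id TEXT PRIMARY KEY, first_name TEXT, vip TEXT);
CREATE TABLE pref_rep (id TEXT, author TEXT, nice INTEGER);  -- id = rated player
INSERT INTO pref_users VALUES
    ('u1', 'Ann', '2999-01-01'), ('u2', 'Bob', '2999-01-01'),
    ('u3', 'Cid', '2999-01-01'), ('u4', 'Dee', '2999-01-01');
INSERT INTO pref_rep VALUES
    ('u1', 'u2', 0), ('u1', 'u3', 0),                    -- u1: rep -2
    ('u2', 'u1', 0),                                     -- u2: rep -1
    ('u3', 'u1', 1), ('u3', 'u1', 0), ('u3', 'u1', 1),   -- u3: rep +1
    ('u4', 'u1', 0);                                     -- u4: rep -1, never rated anyone
""")

# Joining on author yields one row per rating given, hence the repeated rows
dup_rows = conn.execute("""
    SELECT COUNT(*)
    FROM pref_rep r JOIN pref_users u ON r.author = u.id
    WHERE u.id = 'u1'
""").fetchone()[0]

# EXISTS only asks whether at least one such rating exists, so no duplicates
negative_raters = conn.execute("""
    SELECT u.id,
           SUM(CASE r.nice WHEN 1 THEN 1 WHEN 0 THEN -1 END) AS rep
    FROM pref_users u
    JOIN pref_rep r ON r.id = u.id
    WHERE u.vip > datetime('now')
      AND EXISTS (SELECT 1 FROM pref_rep p WHERE p.author = u.id)
    GROUP BY u.id
    HAVING SUM(CASE r.nice WHEN 1 THEN 1 WHEN 0 THEN -1 END) < 0
    ORDER BY u.id
""").fetchall()

print(dup_rows)
print(negative_raters)
```

User u4 is negatively rated but has never rated anyone, so EXISTS filters that row out, while u3 has rated others but has a positive total.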

You forgot to mention that pref_users.id is defined as PRIMARY KEY - else your first query would not work. It also means that id is already indexed.
The best query largely depends on typical data distribution.
Assuming that:
... most users don't get any negative ratings.
... most users don't vote at all.
... some or many of those who vote do it often.
It would pay to identify the few possible candidates and only calculate the total rating for those to arrive at the final selection - instead of calculating the total for every user and then filtering only few.
SELECT *
FROM ( -- filter candidates in a subquery
SELECT *
FROM pref_users u
WHERE u.vip > now()
AND EXISTS (
SELECT 1
FROM pref_rep
WHERE author = u.id -- at least one rating given
)
AND EXISTS (
SELECT 1
FROM pref_rep
WHERE id = u.id
AND NOT nice -- at least one neg. rating received
)
) u
JOIN LATERAL ( -- calculate total only for identified candidates
SELECT sum(CASE nice WHEN true THEN 1 WHEN false THEN -1 END) AS rep
FROM pref_rep
WHERE id = u.id
) r ON r.rep < 0;
Indexes
Obviously, you need an index on pref_rep.author besides the (also assumed!) PRIMARY KEY indexes on both id columns.
If your tables are big some more advanced indexes will pay.
For one, you only seem to be interested in current VIP users (u.vip > now()). A plain index on vip would go a long way. Or even a partial multicolumn index that includes the id and truncates older tuples from the index:
CREATE INDEX pref_users_index_name ON pref_users (vip, id)
WHERE vip > '2015-04-21 18:00';
Consider details:
Add datetime constraint to a PostgreSQL multi-column partial index
If (and only if) negative votes are a minority, a partial index on pref_rep might also pay:
CREATE INDEX pref_rep_downvote_idx ON pref_rep (id)
WHERE NOT nice;
Test performance with EXPLAIN ANALYZE, and repeat a couple of times to rule out caching effects.

How to list movie actors who acted in every single film released in any given year

I have two tables heading into this: acting_gigs with columns actor_name and movie_title, and movies with columns movie_title and release_year. I would like to write a SQL query that lists the names of all the actors who have participated in every single movie in a given release_year, and displays two columns: the actor's name (actor_name) and the year in which they participated in every movie (release_year).
For example:
movie_title | release_year
------------------------------------------
'The Green Mile' | 2000
'Titanic' | 1997
'Cast Aways' | 2000
'Independence Day' | 1997
actor_name | movie_title
-------------------------------------------------
'Leonardo DiCaprio' | 'Titanic'
'Tom Hanks' | 'The Green Mile'
'Will Smith' | 'Independence Day'
'Tom Hanks' | 'Cast Aways'
Which means that the table I would like to return is
actor_name | release_year
---------------------------
'Tom Hanks' | 2000
I have been trying to use subqueries and outer joining, but I have not been able to quite arrive at a solution. I know that I have to use count, but I'm unsure how to apply it multiple times in a manner such as this.

Here's one way:
SELECT y.actor_name, y.release_year
FROM (SELECT release_year, COUNT(*) AS cnt
FROM movies
GROUP BY release_year) AS x
INNER JOIN (SELECT actor_name, release_year, COUNT(*) AS cnt
FROM acting_gigs AS t1
INNER JOIN movies AS t2 ON t1.movie_title = t2.movie_title
GROUP BY actor_name, release_year) AS y
ON x.release_year = y.release_year AND x.cnt = y.cnt
Derived table x contains the count of movies per year, whereas derived table y contains the count of movies per year / per actor.
The JOIN predicates:
x.release_year = y.release_year and
x.cnt = y.cnt
guarantee that, for a specific year, only actors that participated in all movies of that year are returned.
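The count-matching join can be run against the question's sample data with the sketch below. It assumes SQLite through Python's sqlite3 module, movie titles that are unique, and at most one acting_gigs row per actor and movie; otherwise the counts would need DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (movie_title TEXT PRIMARY KEY, release_year INTEGER);
CREATE TABLE acting_gigs (actor_name TEXT, movie_title TEXT);
INSERT INTO movies VALUES ('The Green Mile', 2000), ('Titanic', 1997),
                          ('Cast Aways', 2000), ('Independence Day', 1997);
INSERT INTO acting_gigs VALUES
    ('Leonardo DiCaprio', 'Titanic'), ('Tom Hanks', 'The Green Mile'),
    ('Will Smith', 'Independence Day'), ('Tom Hanks', 'Cast Aways');
""")

rows = conn.execute("""
    SELECT y.actor_name, y.release_year
    FROM (SELECT release_year, COUNT(*) AS cnt      -- movies per year
          FROM movies
          GROUP BY release_year) AS x
    INNER JOIN (SELECT t1.actor_name, t2.release_year, COUNT(*) AS cnt
                FROM acting_gigs AS t1              -- movies per actor per year
                INNER JOIN movies AS t2 ON t1.movie_title = t2.movie_title
                GROUP BY t1.actor_name, t2.release_year) AS y
        ON x.release_year = y.release_year AND x.cnt = y.cnt
""").fetchall()

print(rows)
```

Only Tom Hanks appears, since both 2000 movies list him, while each 1997 actor covers just one of that year's two movies.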
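Here's another, probably more efficient, way using window functions. Be aware that cntPerYear counts joined acting rows rather than movies, so this version only matches an actor who is the sole actor recorded for that year, as in the sample data: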
SELECT DISTINCT actor_name, release_year
FROM (
SELECT actor_name, release_year,
COUNT(*) OVER (PARTITION BY actor_name, release_year) AS cntPerActorYear,
COUNT(*) OVER (PARTITION BY release_year) AS cntPerYear
FROM acting_gigs AS t1
INNER JOIN movies AS t2 ON t1.movie_title = t2.movie_title ) AS t
WHERE cntPerActorYear = cntPerYear

This should do the trick:
select m.release_year
, a.actor_name
, count(1) total_movies
from movies m
join acting_gigs a on a.movie_title = m.movie_title
group by m.release_year, a.actor_name
order by m.release_year, a.actor_name -- or however you want to order it

Here's how you do it in MS SQL - http://sqlfiddle.com/#!6/492ac/3
SELECT
A.ActorName,
M.ReleaseYear
FROM
Movies AS M
INNER JOIN
ActorsMovies AS A
ON
M.MovieTitle = A.MovieTitle

Using SELECT with UNION

CREATE TABLE members
(
name varchar(60),
ID char(6) PRIMARY KEY
);
CREATE TABLE ratings
(
memberID char(6) REFERENCES members(ID),
rating SMALLINT CHECK(rating >= 1 AND rating <= 8),
gameID integer REFERENCES games(ID),
PRIMARY KEY (memberID, gameID)
);
Hi guys, I'm trying to list all the members, whether or not they have rated a game, showing each member's max rating, min rating, average rating, and how many times they have rated.
I tried :
(SELECT MAX(rating), MIN(rating), AVG(rating), COUNT(rating), name
FROM ratings, members
WHERE ratings.memberID = members.ID
GROUP BY name)
UNION
(SELECT MAX(rating), MIN(rating), AVG(rating),COUNT(distinct rating), name
FROM ratings, members
WHERE
members.ID NOT IN (SELECT memberID
FROM ratings, members
WHERE ratings.memberID = members.ID)
GROUP BY name);
The first part gives correct values: the correct names with max, min, count, and average. But the second part gives correct names with wrong max, min, and average values. It gives a max of 9 and a min of 2 for all members who didn't rate any game, which is not true. How can I fix the second part so that it gives zero instead of 9 and 2?

I think you can get the result set you are looking for by using either a LEFT or RIGHT join:
SELECT
M.name,
MAX(rating),
MIN(rating),
AVG(rating),
COUNT(rating)
FROM
[members] M
LEFT OUTER JOIN
[ratings] R
ON
M.ID = R.memberID
GROUP BY
M.name
This would then give results like:
| name | max | min | average | count |
---------------------------------------
| name1 | 8 | 2 | 5 | 3 |
| name2 | NULL | NULL | NULL | 0 |
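Since the question asked for zero rather than NULL for members without ratings, the LEFT JOIN can be combined with COALESCE. A minimal sketch, assuming SQLite through Python's sqlite3 module. The member IDs and ratings are made up, and the games table is omitted because the query does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (name TEXT, ID TEXT PRIMARY KEY);
CREATE TABLE ratings (memberID TEXT, rating INTEGER, gameID INTEGER,
                      PRIMARY KEY (memberID, gameID));
-- hypothetical data: name1 rated three games, name2 rated none
INSERT INTO members VALUES ('name1', 'A00001'), ('name2', 'A00002');
INSERT INTO ratings VALUES ('A00001', 8, 1), ('A00001', 2, 2), ('A00001', 5, 3);
""")

rows = conn.execute("""
    SELECT m.name,
           COALESCE(MAX(r.rating), 0) AS max_rating,   -- NULL becomes 0
           COALESCE(MIN(r.rating), 0) AS min_rating,
           COALESCE(AVG(r.rating), 0) AS avg_rating,
           COUNT(r.rating)            AS rating_count  -- ignores NULLs, so 0
    FROM members m
    LEFT JOIN ratings r ON r.memberID = m.ID
    GROUP BY m.ID, m.name
    ORDER BY m.name
""").fetchall()

for row in rows:
    print(row)
```

Grouping by the member ID as well as the name keeps two members who share a name from being merged into one row.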

I'd rewrite that query using INNER JOIN syntax, as it makes clearer why the joins bring back unexpected results. As you'd written it, the second query has no join condition between ratings and members, so it pairs every rating with every member who has no ratings:
SELECT MAX(rating), MIN(rating), AVG(rating),COUNT(distinct rating), name
FROM ratings, members
WHERE
members.ID NOT IN (SELECT memberID
FROM ratings, members
WHERE ratings.memberID = members.ID)
I suspect you were after something more like:
SELECT MAX(rating),
MIN(rating),
AVG(rating),
COUNT(rating),
name
FROM ratings
INNER JOIN members
ON ratings.memberID = members.ID
GROUP BY name
UNION
SELECT NULL,
NULL,
NULL,
0,
name
FROM members
WHERE members.ID NOT IN (SELECT memberID FROM ratings)
