Hadoop hive query - sql

I have these kinds of rows in the table: the 1st column is the movie id, the 2nd is the movie title, and the 3rd is the rating given by a person. There are different movies; not all of them are Toy Story, for example. The sample shown is just limited.
The question I have is this:
Give the name of the movie with the highest ratings
So for example: if 6 persons each give a 1-star rating to a movie, the sum is 6. For another movie, 2 persons give ratings: one gives 5 stars and the other gives 1 star, so the sum is also 6. Then the 2nd one is the highest rated movie.
I need to find this answer working with hadoop hive.
This is what I was able to do until now.
I don't know if I need a function or something else.

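Use this. It sums the ratings per movie and, when two sums tie, prefers the movie with fewer ratings, as in your example:
select a.movie_name from (
select movie_name, sum(rating) as r, count(*) as cnt
from tableMovieDetail
group by movie_name ) a
order by a.r desc, a.cnt asc
limit 1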
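A minimal sketch that checks the sum-then-tie-break idea on the example from the question. It runs against SQLite from Python rather than Hive, on the assumption that this particular SQL behaves the same way in both; the sample rows and the movie_id column are hypothetical. Ties on the sum are broken in favor of the movie with fewer ratings, which is what the question's example implies.

```python
import sqlite3

# Hypothetical sample data: six 1-star ratings for one movie,
# a 5-star and a 1-star rating for another, and one unrelated rating.
rows = (
    [(1, "Toy Story", 1)] * 6
    + [(2, "GoldenEye", 5), (2, "GoldenEye", 1)]
    + [(3, "Four Rooms", 2)]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableMovieDetail (movie_id INTEGER, movie_name TEXT, rating INTEGER)"
)
conn.executemany("INSERT INTO tableMovieDetail VALUES (?, ?, ?)", rows)

query = """
SELECT a.movie_name
FROM (
    SELECT movie_name, SUM(rating) AS r, COUNT(*) AS cnt
    FROM tableMovieDetail
    GROUP BY movie_name
) a
ORDER BY a.r DESC, a.cnt ASC   -- highest sum first, fewer ratings wins a tie
LIMIT 1
"""
top_movie = conn.execute(query).fetchone()[0]
print(top_movie)
```

Both "Toy Story" and "GoldenEye" sum to 6 here, so the second sort key decides the result.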

Related

Can = be followed by a variable in SQL?

I have two tables: movie, with movie id, movie title, and director of the movie; and rating, with rating id, movie id, and rating.
The question is to select director's name together with the title(s) of the movie(s) they directed that received the highest rating among all of their movies, and the value of that rating.
I am trying to understand the following solution
select distinct director, title, stars
from (movie join rating using (mid)) m
where stars in (select max(stars)
from rating join movie using (mid)
where m.director = director)
I am in particular confused with the last subquery
select max(stars)
from rating join movie using (mid)
where m.director = director
From all I know, '=' can only be followed by a fixed value, but here it seems to suggest 'looping' through all distinct directors. Which table is the latter director referring to? And how does the looping concept work in SQL?
This code works, but it is not the simplest solution; I include a simpler one below. The answers to your questions are:
1) The second director (in the where clause of the subquery) refers to the table created in that subquery, while m.director uses the m alias of the table created in the main query.
2) This isn't really a loop in the traditional sense of the word. The subquery is correlated: it references m.director from the outer query, so conceptually it is evaluated for each outer row using that row's director. Basically, the query says: 'Give me the distinct director name, movie name, and rating from the table created by joining rating to movie, where the number of stars equals the largest number of stars returned by this subquery.' Loops in SQL Server use the WHILE keyword, but they are pretty rare in SQL since other functions (or clauses) can fulfill the same purpose without iteration.
The query posted in your comment only returns a single line with the data for the highest rated movie of all movies in the database, not the highest rated movie for each director. The following is a simpler way of writing the query, which gives the highest rating achieved across all movies for each director:
SELECT director, title, MAX(stars)
FROM movie
JOIN rating
ON movie.mid = rating.mid
GROUP BY director
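To see how the outer alias and the inner column resolve in a correlated subquery, here is a small sketch run on SQLite from Python. It is a rewritten equivalent of the query in the question, with explicit aliases (m, r, m2, r2) instead of USING so that each director reference is unambiguous; the sample rows and the rid column name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (mid INTEGER, title TEXT, director TEXT);
CREATE TABLE rating (rid INTEGER, mid INTEGER, stars INTEGER);
-- Hypothetical sample rows.
INSERT INTO movie VALUES (1, 'A', 'Ann'), (2, 'B', 'Ann'), (3, 'C', 'Bob');
INSERT INTO rating VALUES (1, 1, 3), (2, 2, 5), (3, 3, 2), (4, 3, 4);
""")

query = """
SELECT DISTINCT m.director, m.title, r.stars
FROM movie m JOIN rating r ON m.mid = r.mid
WHERE r.stars IN (
    SELECT MAX(r2.stars)
    FROM rating r2 JOIN movie m2 ON r2.mid = m2.mid
    WHERE m2.director = m.director   -- m.director is taken from the outer row
)
ORDER BY m.director
"""
result = conn.execute(query).fetchall()
print(result)
```

For each outer row, the subquery computes the maximum stars among the movies of that row's director, so the comparison value changes from row to row without any explicit loop.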

How do I select rows with 5 distinct values for each value in other columns?

For reference, this is the schema of the table: casts (pid, mid, role)
What I want to do is find the pid(s) that have exactly 5 distinct roles in a given mid. This is a table for actors, where pid is the actor id, mid is the movie id, and role is the role they play. I want to find all the actor ids that have exactly 5 distinct roles in a movie (there can be more than one such movie per actor), and I also want the corresponding movie ids.
I'm not exactly sure how to do this without say like 5 self-joins but I'd rather not do that since that would be resource heavy.
Sample table data(casts table)
Sample result from query
Thank you in advance.
Is this what you want?
select pid, mid
from casts
group by pid, mid
having count(distinct role) = 5;
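A quick way to check the grouping behavior is to run it on a few rows. The sketch below uses SQLite from Python with hypothetical data: one (pid, mid) pair with exactly 5 distinct roles (plus a duplicated row), one with 4, and one with 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE casts (pid INTEGER, mid INTEGER, role TEXT)")

# Hypothetical sample rows.
rows = (
    [(1, 10, f"role{i}") for i in range(5)]   # 5 distinct roles
    + [(1, 10, "role0")]                      # duplicate row, still 5 distinct
    + [(2, 10, f"role{i}") for i in range(4)] # only 4 distinct roles
    + [(1, 11, f"role{i}") for i in range(6)] # 6 distinct roles
)
conn.executemany("INSERT INTO casts VALUES (?, ?, ?)", rows)

pairs = conn.execute("""
    SELECT pid, mid
    FROM casts
    GROUP BY pid, mid
    HAVING COUNT(DISTINCT role) = 5
""").fetchall()
print(pairs)
```

Because the count is over distinct roles, the duplicated row does not push the first pair past 5, and no self-join is needed.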

Consult records for certain values of an attribute [duplicate]

This question already has answers here:
Grouped LIMIT in PostgreSQL: show the first N rows for each group?
I have a table with the following schema: (idMovie, genre, title, rating).
How can I make a query that returns the ten films with the best rating for each genre?
I think it could possibly be solved using 'ORDER BY' and also 'LIMIT' to get the top 10 of a genre but I do not know how to do it for each genre.
Disclaimer: I'm a newbie in SQL.
This is a typical problem called greatest-N-per-group. It normally isn't solved using order by + limit (unless you use LATERAL, which is more complicated in my opinion), since, as you've mentioned, that answers the greatest-N problem but not per group. In your case the movie genre is the group.
You could use dense_rank window function to generate ranks based on rating for each genre in a subquery and then select those which are top 10:
select title, rating
from (
select title, rating, dense_rank() over (partition by genre order by rating desc) as rn
from yourtable
) t
where rn <= 10
This may return more than 10 titles for each genre, because there may be ties (the same rating for different movies belonging to one genre). If you only want top 10 without looking at ties, use row_number instead of dense_rank.
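The difference between the two ranking functions is easiest to see with a tie. The following sketch runs both variants on SQLite from Python, assuming a SQLite build with window function support (3.25 or later). It takes the top 2 per genre instead of the top 10 to keep the hypothetical data small, and the top_n helper is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (idMovie INTEGER, genre TEXT, title TEXT, rating INTEGER)"
)
# Hypothetical rows: 'a' and 'b' tie for the best drama rating.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, "drama", "a", 9),
        (2, "drama", "b", 9),
        (3, "drama", "c", 8),
        (4, "drama", "d", 7),
        (5, "comedy", "e", 6),
    ],
)

def top_n(rank_fn, n):
    """Return the top-n rows per genre using the given ranking function name."""
    sql = f"""
    SELECT genre, title, rating
    FROM (
        SELECT genre, title, rating,
               {rank_fn}() OVER (PARTITION BY genre ORDER BY rating DESC) AS rn
        FROM yourtable
    ) t
    WHERE rn <= ?
    ORDER BY genre, rating DESC, title
    """
    return conn.execute(sql, (n,)).fetchall()

with_ties = top_n("dense_rank", 2)     # ranks 1, 1, 2 in drama: three rows
without_ties = top_n("row_number", 2)  # exactly two rows in drama
print(with_ties)
print(without_ties)
```

With dense_rank the tied titles share rank 1, so the next rating still gets rank 2 and is included; row_number numbers the rows 1, 2, 3 and cuts off strictly after two.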

query which returns 10 values with a complex condition

I am writing a pretty complex query and I am facing a problem I am not able to solve.
I have a table called tbl with 2 columns:
movie_id (INTEGER), Rank (LIKE\DISLIKE\NULL)
I need to write a query that returns the top 10 movies with the most LIKEs.
(If movies have the same number of likes, they need to be ordered by ascending movie_id.)
Edge cases:
If there are fewer than 10 movies that have Rank = 'LIKE' (let's say there are only 7), then I need to return those 7 movie_ids ordered by the number of likes, plus another 3 movie_ids ordered by movie_id (it doesn't matter whether they have 'DISLIKE' or NULL in the Rank value).
If there aren't 10 movies in the table, then I need to return the movies that are in the table (in the same way as explained before, that is, first the movies ordered by the number of 'LIKES' and then the rest ordered by movie_id).
Can someone please help me with this?
Thank you!
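I think this does what you describe. It counts the likes per movie, so movies without any likes sort last, by movie_id:
select movie_id, count(*) filter (where rank = 'LIKE') as likes
from tbl
group by movie_id
order by likes desc, movie_id asc
fetch first 10 rows only;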
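One way to read the requirement is: count the LIKE rows per movie, then sort by that count descending and by movie_id ascending, so movies with zero likes fill the remaining slots in movie_id order. The sketch below tries this on SQLite from Python with hypothetical rows and a limit of 4 instead of 10; it uses SUM over a CASE expression and LIMIT as portable stand-ins, which is an assumption about the target database rather than the answer's exact syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl (movie_id INTEGER, "rank" TEXT)')
# Hypothetical rows: movies 1 and 3 have two likes, movie 5 has one,
# movies 2 and 4 have none (a DISLIKE and a NULL).
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        (1, "LIKE"), (1, "LIKE"),
        (2, "DISLIKE"),
        (3, "LIKE"), (3, "LIKE"),
        (4, None),
        (5, "LIKE"),
    ],
)

query = """
SELECT movie_id,
       SUM(CASE WHEN "rank" = 'LIKE' THEN 1 ELSE 0 END) AS likes
FROM tbl
GROUP BY movie_id
ORDER BY likes DESC, movie_id ASC
LIMIT ?
"""
top_ids = [movie_id for movie_id, _ in conn.execute(query, (4,))]
print(top_ids)
```

Only three movies have likes here, so the fourth slot goes to the lowest movie_id among the rest, which matches the edge case described in the question.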

sqlite Joins with MAX

I have 2 tables. One records a game played (date, where, result, opponent, etc.) and the other one the details of a batting innings (runs scored, etc.). Both tables have a key that relates the batting back to a specific game.
I am trying to return the OPPONENT column from GAMES for the game in which the MAX (highest) score is recorded in the table BATTING, but currently I am unsure how to do this.
The 2 tables can be found here
http://i.imgur.com/bqiyD3X.png
The example from these tables would be: the max score is 101 in RUNSSCORED, so return the linked OPPONENT from GAMEINDEX, which is "Ferndale".
Any help would be great. Thanks.
Is this what you are looking for?
select OPPONENT
from GAMES
where GAMESINDEX in
(select GAMESINDEX from BATTING order by RUNSSCORED desc limit 1);
If there isn't a unique max RUNSSCORED value, then the answer might not be deterministic.
If you want multiple winners in that case, you could use
select OPPONENT
from GAMES natural join BATTING
WHERE RUNSSCORED in (select MAX(RUNSSCORED) from BATTING);
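The difference between the two queries shows up when the top score is tied. Here is a small sketch using Python's sqlite3 module; the table layout is reduced to the columns the queries touch, and all rows other than the Ferndale score of 101 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GAMES (GAMESINDEX INTEGER PRIMARY KEY, OPPONENT TEXT);
CREATE TABLE BATTING (GAMESINDEX INTEGER, RUNSSCORED INTEGER);
-- Hypothetical rows; two innings tie on the top score of 101.
INSERT INTO GAMES VALUES (1, 'Ferndale'), (2, 'Lynden'), (3, 'Sumas');
INSERT INTO BATTING VALUES (1, 101), (2, 45), (3, 101);
""")

# First form: picks one row with the highest score, so one opponent.
single = conn.execute("""
    SELECT OPPONENT
    FROM GAMES
    WHERE GAMESINDEX IN
        (SELECT GAMESINDEX FROM BATTING ORDER BY RUNSSCORED DESC LIMIT 1)
""").fetchall()

# Second form: returns every opponent whose innings equals the maximum.
all_top = [row[0] for row in conn.execute("""
    SELECT OPPONENT
    FROM GAMES NATURAL JOIN BATTING
    WHERE RUNSSCORED IN (SELECT MAX(RUNSSCORED) FROM BATTING)
    ORDER BY OPPONENT
""")]
print(single, all_top)
```

Which of the tied opponents the first form returns is not specified, while the second form lists both. The NATURAL JOIN works here only because GAMESINDEX is the single column name the two tables share.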
SELECT G.OPPONENT, MAX(B.RUNSSCORED)
FROM GAMES AS G
INNER JOIN BATTING AS B
ON G.GAMESINDEX = B.GAMESINDEX