Can = be followed by a variable in SQL? - sql

I have two tables: movie, with movie id, movie title, and director of the movie; and rating, with rating id, movie id, and rating.
The question is to select director's name together with the title(s) of the movie(s) they directed that received the highest rating among all of their movies, and the value of that rating.
I am trying to understand the following solution
select distinct director, title, stars
from (movie join rating using (mid)) m
where stars in (select max(stars)
from rating join movie using (mid)
where m.director = director)
I am in particular confused with the last subquery
select max(stars)
from rating join movie using (mid)
where m.director = director
From what I know, '=' can only be followed by a fixed value, but here it seems to suggest 'looping' through all distinct directors. Which table does the latter director refer to? And how does the looping concept work in SQL?

The query does work: for each director it returns the movie(s) that received that director's highest rating. It is not the shortest way to get the per-director maximum, though, so a simpler query is included below. The answers to your questions are:
1) The unqualified director in the subquery's WHERE clause refers to the movie table joined inside that subquery, while m.director uses the m alias of the joined table in the main query.
2) This isn't a loop in the traditional sense of the word. The subquery is correlated: because it mentions m.director, it is logically evaluated once for each row of the outer query, using that row's director. In effect the query says: 'Give me the distinct director name, movie title, and rating from the table created by joining rating to movie, where the number of stars equals the largest number of stars given to any movie by the same director.' Explicit loops in SQL Server use the WHILE keyword, but they are rare in SQL because set-based clauses can usually do the same job without iteration.
The query posted in your comment returns only a single row, with the data for the highest-rated movie in the whole database, not the highest-rated movie for each director. The following is a simpler query that gives the highest rating achieved by each director:
SELECT director, MAX(stars) AS max_stars
FROM movie
JOIN rating
ON movie.mid = rating.mid -- join on the shared movie id, not on the title
GROUP BY director -- title is left out because it is not part of the GROUP BY
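To see how the correlated subquery from the question behaves, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the asker's database. The sample rows are invented, and the aliases m, r, m2 and r2 are names chosen here for illustration; the query is also written with plain table aliases rather than an alias on a parenthesized join, since not every engine accepts that form.

```python
import sqlite3

# In-memory database with made-up sample data (assumption, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mid INTEGER PRIMARY KEY, title TEXT, director TEXT);
    CREATE TABLE rating (rid INTEGER PRIMARY KEY, mid INTEGER, stars INTEGER);
    INSERT INTO movie VALUES (1, 'A1', 'Ann'), (2, 'A2', 'Ann'), (3, 'B1', 'Bob');
    INSERT INTO rating VALUES (1, 1, 3), (2, 2, 5), (3, 3, 4), (4, 3, 2);
""")

# The inner query refers to m.director from the outer query, so it is
# logically re-evaluated for each outer row with that row's director.
per_director_best = conn.execute("""
    SELECT DISTINCT m.director, m.title, r.stars
    FROM movie m JOIN rating r ON r.mid = m.mid
    WHERE r.stars = (SELECT MAX(r2.stars)
                     FROM movie m2 JOIN rating r2 ON r2.mid = m2.mid
                     WHERE m2.director = m.director)
    ORDER BY m.director
""").fetchall()

print(per_director_best)
```

With this sample data each director gets one row: their best-rated movie and its rating, which is what the '=' comparison against the outer row's director is doing.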

Related

SQL logic needed help possible subquerying [duplicate]

Question from an exercise that I am not able to solve. I do not have enough knowledge.
query = For all cases where the same reviewer rated the same movie twice and gave it a higher rating the second time, return the reviewer's name and the title of the movie
If I try this:
SELECT title, max (stars)
FROM Movie
JOIN Rating on Movie.mID = Rating.mID
I get only one movie. I want to see all movies from table "Movie", along with the highest rating in case they have more than one rating in table "Rating".
From what I understand of the question, this query should solve your problem:
SELECT
mov.`title`,
MAX(rat.`rating`)
FROM
rating rat
JOIN movie mov
ON rat.`movie_id` = mov.`id`
WHERE mov.`id` IN
(SELECT
r.`movie_id`
FROM
rating r
GROUP BY r.`movie_id`
HAVING COUNT(*) > 1)
GROUP BY mov.`id`
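Use a LEFT OUTER JOIN to bring in every row of the 'Movie' table, including movies that have no rating.
Add a GROUP BY clause so that each title appears only once, with its maximum stars.
Handle the case where the stars value is NULL (a movie with no ratings), here by treating it as 0.
MaxRating is the alias for the maximum value.
An example script: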
SELECT Movie.title, max(IFNULL(Rating.stars,0)) MaxRating
FROM Movie
LEFT JOIN Rating
on Movie.mID = Rating.mID
group by Movie.title
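As a quick check of the LEFT JOIN approach, here is a small sketch run through Python's sqlite3 module (SQLite also provides IFNULL). The three movies and their ratings are invented for illustration, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Made-up sample data: 'Gamma' has no ratings at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (mID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Rating (mID INTEGER, stars INTEGER);
    INSERT INTO Movie VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Rating VALUES (1, 2), (1, 4), (2, 3);
""")

# LEFT JOIN keeps unrated movies; IFNULL turns their missing stars into 0.
max_ratings = conn.execute("""
    SELECT Movie.title, MAX(IFNULL(Rating.stars, 0)) AS MaxRating
    FROM Movie
    LEFT JOIN Rating ON Movie.mID = Rating.mID
    GROUP BY Movie.title
    ORDER BY Movie.title
""").fetchall()

print(max_ratings)
```

Every movie appears once, and the unrated one shows 0 instead of being dropped, which is the difference from the inner JOIN in the question.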

How to achieve the correct joins in my PostgreSQL query?

I am facing troubles with generating a query, as I am new to this.
I have 4 tables: movies, ratings, principals and actors
Movies table consists of: tconst(a key), titles, start year
ratings table consists of: tconst(a key), average rating, number of voters
actors table consists of: actors name, birth year, death year, nconst(a key)
principals table is a junction table : nconst, tconst
I would like to create a table that will be able to have one column being the year of a movie, the second column being all the movies released in that year, and the third column having all actors born after that year
I used the query
SELECT
movies.start_year,
array_agg(DISTINCT(actors.name)) AS actors_born_after,
array_agg(movies.title) AS movies_title,
array_agg(DISTINCT(actors.birth_year))
FROM movies
INNER JOIN principals on movies.tconst=principals.tconst
INNER JOIN actors on principals.nconst=actors.nconst
WHERE actors.birth_year > movies.start_year
GROUP BY movies.start_year
limit 5;
but this doesn't seem to work.
Give this a try:
SELECT
movies.start_year,
movies_agg.movies_title,
array_agg(DISTINCT(actors.name)) actors_born_after -- Get list of actors
FROM movies
LEFT JOIN (
SELECT start_year, array_agg(DISTINCT(title)) AS movies_title -- Get all movies per each year
FROM movies a
GROUP BY start_year
) AS movies_agg ON movies.start_year = movies_agg.start_year -- Get list of movies from same year
LEFT JOIN actors ON actors.birth_year > movies.start_year -- Get all actors born after movie start_year
GROUP BY movies.start_year, movies_agg.movies_title
This should give you:
year of the movie
all movies released in that year
all actors born after that year
I haven't tested it, so you may have some syntax errors. Give it a try and let me know.

Hadoop hive query

I have these kinds of rows in the table: the 1st column is the movie id, the 2nd is the movie title, and the 3rd is the rating given by a person. There are different movies; NOT all of them are Toy Story, for example. The sample is just limited.
The question I have is this:
Give the name of the movie with the highest ratings
So for example: if 6 persons give a 1 star rating for a movie the sum is 6. Now to another movie, another 2 persons give ratings, 1 give 5 star and the other one 1 star rating. Then the 2nd one is the highest rated movie.
I need to find this answer working with hadoop hive.
This is what i was able to do until now.
Don't know if I need a function or something else.
Use this:
select a.movie_name from (
select movie_name, sum(rating) as r, count(*) as cnt
from tableMovieDetail
group by movie_name ) a
-- highest total first; on a tie, prefer the movie that reached it with fewer ratings
order by a.r desc, a.cnt asc
limit 1
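The asker's example (six 1-star ratings versus one 5-star plus one 1-star rating) can be checked with a small sketch. SQLite via Python is used here only as a stand-in for Hive, the movie names are invented, and the tie-break on fewer ratings is an assumption drawn from the example in the question, where both movies have the same sum of 6.

```python
import sqlite3

# Stand-in for the Hive table; names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableMovieDetail (movie_id INTEGER, movie_name TEXT, rating INTEGER)"
)
rows = [(1, "Movie A", 1)] * 6 + [(2, "Movie B", 5), (2, "Movie B", 1)]
conn.executemany("INSERT INTO tableMovieDetail VALUES (?, ?, ?)", rows)

# Sort by total rating (highest first); on equal totals, fewer ratings wins.
top_movie = conn.execute("""
    SELECT a.movie_name, a.r, a.cnt
    FROM (SELECT movie_name, SUM(rating) AS r, COUNT(*) AS cnt
          FROM tableMovieDetail
          GROUP BY movie_name) a
    ORDER BY a.r DESC, a.cnt ASC
    LIMIT 1
""").fetchone()

print(top_movie)
```

If "highest rated" is instead meant as the best average, ordering by AVG(rating) descending would be the simpler choice; the question does not say which is intended.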

SQL: How do I find which movie genre a user watched the most? (IMDb personal project)

I'm currently working on a personal project and I could use a little help. Here's the scenario:
I'm creating a database (MS Access) for all of the movies myself and some friends have ever watched. We rated all of our movies on IMDb and used the export feature to get all of the movie data and our movie ratings. I plan on doing some summary analysis on Excel. One thing I am interested in is the most common movie genre that each person watched. Below is my current scenario. Note that the column "const" is the movies' unique IDs. I also have individual tables for each person's ratings and the following tables are the summary tables that make up the combination of all the movies we have watched.
Here's the table I had: http://imgur.com/v5x9Dhg
I assigned each genre an ID, like this: http://imgur.com/aXdr9XI
And here is a table where I have separate instances for each movie ID and a unique genre: http://imgur.com/N0wULo8
I want to find a way to count up all of the genres that each person watches. Any advice? I would love to provide any additional information that you need!
Thank you!
You need to have at least one table which has one row per user and const (movie watched). In the 3 example tables you posted nothing shows who watched which movies, which is information you need to solve your problem. You mention having "individual tables for each person's ratings," so I assume you have that information. You will want to combine all of them though, into a table called PERSON_MOVIE or something of the like.
So let's say your second table is called GENRE and its columns are ID, Genre.
Let's say your third table is called GENRE_MOVIE and its columns are Const and ID (ID corresponds to ID on the GENRE table)
Let's say the fourth table, which you did not post, but which is required, is called PERSON_MOVIE and its columns are person, Const, rating.
You could then write a query like this:
select vw1.*, ge.genre
from (select pm.person, gm.id as genre_id, count(*) as num_of_genre
      from person_movie pm
      inner join genre_movie gm
      on pm.const = gm.const
      group by pm.person, gm.id) vw1
inner join (select person, max(num_of_genre) as high_count
            from (select pm.person, gm.id, count(*) as num_of_genre
                  from person_movie pm
                  inner join genre_movie gm
                  on pm.const = gm.const
                  group by pm.person, gm.id) x
            group by person) vw2
on vw1.person = vw2.person
and vw1.num_of_genre = vw2.high_count
inner join genre ge
on vw1.genre_id = ge.id
Edit re: your comment:
So right now you have multiple tables reflecting people's ratings of movies. You need to combine those into a table called PERSON_MOVIE or something similar (as in example above).
There will be 3 columns on the table: person, const, rating
I'm not sure whether Access supports the traditional create table as select query, but ordinarily you would be able to construct such a table in the following way:
create table person_movie as
select 'Bob' as person, const, [You rated] as rating
from ratings_by_bob
union all
select 'Sally', const, [You rated]
from ratings_by_sally
union all
select 'Jack', const, [You rated]
from ratings_by_jack
....
If not, just combine the tables manually and add a third column as shown indicating what users are reflected by each row. Then you can run my initial query.
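Here is a minimal sketch of the most-watched-genre query, run against SQLite through Python rather than Access. The table and column names follow the ones proposed in this answer (genre, genre_movie, person_movie); the rows themselves are invented, and the shared counting subquery is kept in a Python string (counts_sql, a name chosen here) only to avoid typing it twice.

```python
import sqlite3

# Made-up sample data using the table layout described in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre (id INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE genre_movie (const TEXT, id INTEGER);
    CREATE TABLE person_movie (person TEXT, const TEXT, rating INTEGER);
    INSERT INTO genre VALUES (1, 'Drama'), (2, 'Comedy');
    INSERT INTO genre_movie VALUES ('tt1', 1), ('tt2', 1), ('tt3', 2);
    INSERT INTO person_movie VALUES
        ('Bob', 'tt1', 8), ('Bob', 'tt2', 7), ('Bob', 'tt3', 6),
        ('Sally', 'tt3', 9);
""")

# Number of movies each person watched in each genre.
counts_sql = """
    SELECT pm.person, gm.id AS genre_id, COUNT(*) AS num_of_genre
    FROM person_movie pm
    JOIN genre_movie gm ON pm.const = gm.const
    GROUP BY pm.person, gm.id
"""

# Keep only the genre(s) whose count equals that person's highest count.
top_genres = conn.execute(f"""
    SELECT vw1.person, ge.genre, vw1.num_of_genre
    FROM ({counts_sql}) vw1
    JOIN (SELECT person, MAX(num_of_genre) AS high_count
          FROM ({counts_sql}) x
          GROUP BY person) vw2
      ON vw1.person = vw2.person AND vw1.num_of_genre = vw2.high_count
    JOIN genre ge ON vw1.genre_id = ge.id
    ORDER BY vw1.person
""").fetchall()

print(top_genres)
```

Note that a person with two equally common genres would get two rows, since both match the maximum count.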

Why does my query return an empty result?

Here's the dataset:
Movie (mID, title, year, director)
English: There is a movie with ID number mID, a title, a release year, and a director.
Reviewer (rID, name)
English: The reviewer with ID number rID has a certain name.
Rating (rID, mID, stars, ratingDate)
English: The reviewer rID gave the movie mID a number of stars rating (1-5) on a certain ratingDate.
Here's the question: find the titles of all movies that have no ratings.
My answer: (returns an empty set)
select m.title
from movie m
join rating r on m.mid = r.mid
where stars is null
Correct answer:
select title
from movie
left join rating using (mID)
where stars is null
I'm not sure what's wrong with my join? Thanks in advance!
Your query finds each movie record that does have a corresponding rating record — this is the meaning of movie JOIN rating — provided that said rating record has stars IS NULL.
The correct query finds each movie record that either lacks a rating record or has a rating record with stars IS NULL.
The key difference is that your query will, as a first step, filter out any movie records without matching rating records (since these will fail the join), whereas the correct query uses a LEFT JOIN to prevent this filtering.
(Note that, even with a LEFT JOIN, this filtering can still happen if the WHERE clause is constructed poorly. For example, WHERE stars = 'X' would also filter out movie records without corresponding rating records, because only an existent rating record could satisfy the WHERE-clause. But with WHERE stars IS NULL, this problem does not arise, because NULL is the default value of stars when the join failed.)
You can use this query to find the titles that have no ratings:
select title from movie left join rating on rating.mid = movie.mid where rating.rid is null
or you can use
select title from movie where mid not in (select distinct mid from rating )
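The difference between the approaches can be seen side by side in a small sketch using Python's sqlite3 module. The two movies and the single rating are invented sample data; the queries are the ones discussed in this thread.

```python
import sqlite3

# Made-up sample data: one rated movie and one movie with no ratings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (mID INTEGER PRIMARY KEY, title TEXT, year INTEGER, director TEXT);
    CREATE TABLE rating (rID INTEGER, mID INTEGER, stars INTEGER, ratingDate TEXT);
    INSERT INTO movie VALUES (1, 'Rated', 2000, 'X'), (2, 'Unrated', 2001, 'Y');
    INSERT INTO rating VALUES (10, 1, 4, '2011-01-01');
""")

# Inner join: the unrated movie is dropped before WHERE is even applied.
inner_result = conn.execute("""
    SELECT m.title FROM movie m
    JOIN rating r ON m.mID = r.mID
    WHERE r.stars IS NULL
""").fetchall()

# Left join: the unrated movie survives with NULL in the rating columns.
left_result = conn.execute("""
    SELECT title FROM movie
    LEFT JOIN rating USING (mID)
    WHERE stars IS NULL
""").fetchall()

# NOT IN: no join at all, just exclude every movie id that appears in rating.
not_in_result = conn.execute("""
    SELECT title FROM movie
    WHERE mID NOT IN (SELECT DISTINCT mID FROM rating)
""").fetchall()

print(inner_result, left_result, not_in_result)
```

One caveat for the NOT IN form: if rating.mID could ever be NULL, NOT IN would return no rows, so the LEFT JOIN version is the safer default.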