Novice SQL query question for a movie ratings database - sql

I have a database with one table, like so:
UserID (int), MovieID (int), Rating (real)
The UserIDs and MovieIDs are large numbers, but my database only has a sample of the many possible values (4000 unique users and 3000 unique movies).
I am going to do a matrix SVD (singular value decomposition) on it, so I want to return this database as an ordered array. Basically, I want to return each user in order and, for each user, each movie in order, along with the rating for that user/movie pair, or null if that user did not rate that particular movie. Example:
USERID | MOVIEID | RATING
-------------------------
99835 8847874 4
99835 8994385 3
99835 9001934 null
99835 3235524 2
.
.
.
109834 8847874 null
109834 8994385 1
109834 9001934 null
etc
This way, I can simply read these results into a two dimensional array, suitable for my SVD algorithm. (Any other suggestions for getting a database of info into a simple two dimensional array of floats would be appreciated)
It is important that this be returned in order so that when I get my two dimensional array back, I will be able to re-map the values to the respective users and movies to do my analysis.

SELECT m.UserID, m.MovieID, r.Rating
  FROM (SELECT a.UserID, b.MovieID
          FROM (SELECT DISTINCT UserID FROM Ratings) AS a,
               (SELECT DISTINCT MovieID FROM Ratings) AS b
       ) AS m LEFT OUTER JOIN Ratings AS r
       ON (m.MovieID = r.MovieID AND m.UserID = r.UserID)
 ORDER BY m.UserID, m.MovieID;
Now tested and it seems to work!
The concept is to create the cartesian product of the list of UserID values in the Ratings table with the list of MovieID values in the Ratings table (ouch!), and then do an outer join of that complete matrix with the Ratings table (again) to collect the ratings values.
This is NOT efficient.
It might be effective.
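To see the shape of the result, here is a minimal sketch that runs the same cartesian-product-plus-outer-join idea against an in-memory SQLite database from Python. The three sample rows and the helper name full_matrix_rows are made up for illustration, and the comma join is written as an explicit CROSS JOIN; the real table would have far more rows.

```python
import sqlite3

# Hypothetical sample data; the real table has about 4000 users and 3000 movies.
SAMPLE_RATINGS = [
    (99835, 8847874, 4.0),
    (99835, 8994385, 3.0),
    (109834, 8994385, 1.0),
]

FULL_MATRIX_QUERY = """
SELECT m.UserID, m.MovieID, r.Rating
  FROM (SELECT a.UserID, b.MovieID
          FROM (SELECT DISTINCT UserID FROM Ratings) AS a
         CROSS JOIN (SELECT DISTINCT MovieID FROM Ratings) AS b
       ) AS m
  LEFT OUTER JOIN Ratings AS r
    ON m.MovieID = r.MovieID AND m.UserID = r.UserID
 ORDER BY m.UserID, m.MovieID
"""


def full_matrix_rows(ratings):
    """Return one row per (user, movie) pair, with None where no rating exists."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Ratings (UserID INTEGER, MovieID INTEGER, Rating REAL)"
        )
        conn.executemany("INSERT INTO Ratings VALUES (?, ?, ?)", ratings)
        return conn.execute(FULL_MATRIX_QUERY).fetchall()
    finally:
        conn.close()


rows = full_matrix_rows(SAMPLE_RATINGS)
for row in rows:
    print(row)
```

With 2 users and 2 movies this yields 4 rows, one of which has a null rating. With 4000 users and 3000 movies the same query would yield 12 million rows.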
You might do better though to just run the plain simple select of the data, and arrange to populate the arrays as the data arrives. If you have many thousands of users and movies, you are going to be returning many millions of rows, but most of them are going to have nulls. You should treat the incoming data as a description of a sparse matrix, and first set the matrix in the program to all zeroes (or other default value), and then read the stream from the database and set just the rows that were actually present.
That query is basically trivial:
SELECT UserID, MovieID, Rating
FROM Ratings
ORDER BY UserID, MovieID;
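As a rough sketch of that sparse-matrix approach, the Python below pre-fills a matrix with a default value and then overwrites only the cells that appear in the plain query. The function name build_rating_matrix, the default of 0.0, and the sample rows are assumptions for illustration. The two ordered ID lists are what let you map row and column positions back to users and movies afterwards.

```python
import sqlite3


def build_rating_matrix(conn, default=0.0):
    """Fill a dense users-by-movies matrix from the sparse list of ratings."""
    user_ids = [r[0] for r in conn.execute(
        "SELECT DISTINCT UserID FROM Ratings ORDER BY UserID")]
    movie_ids = [r[0] for r in conn.execute(
        "SELECT DISTINCT MovieID FROM Ratings ORDER BY MovieID")]

    # Map each large ID to a compact row or column position.
    user_index = {uid: i for i, uid in enumerate(user_ids)}
    movie_index = {mid: j for j, mid in enumerate(movie_ids)}

    # Start with every cell at the default, then set only the rated cells.
    matrix = [[default] * len(movie_ids) for _ in user_ids]
    query = "SELECT UserID, MovieID, Rating FROM Ratings ORDER BY UserID, MovieID"
    for uid, mid, rating in conn.execute(query):
        matrix[user_index[uid]][movie_index[mid]] = rating
    return user_ids, movie_ids, matrix


# Hypothetical sample data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ratings (UserID INTEGER, MovieID INTEGER, Rating REAL)")
conn.executemany(
    "INSERT INTO Ratings VALUES (?, ?, ?)",
    [(99835, 8847874, 4.0), (99835, 8994385, 3.0), (109834, 8994385, 1.0)],
)

user_ids, movie_ids, matrix = build_rating_matrix(conn)
print(user_ids)   # row i of the matrix belongs to user_ids[i]
print(movie_ids)  # column j of the matrix belongs to movie_ids[j]
print(matrix)
```

Only the rows that exist cross the wire, so the database returns thousands of rows rather than millions.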

Sometimes the best thing to do is refactor the table/normalize your data (if that is an option).
Normalize the data structure:
Users Table: (all distinct users)
UserId, FirstName, LastName
Movies Table: (all distinct movies)
MovieId, Name
UserMovieRatings: (ratings that users have given to movies)
UserId, MovieId, Rating
You can do a Cartesian join if you want every combination of users and movies and then use the UserMovieRatings table as needed.
It's probably best to do the refactoring now, before your system gets any more complicated. Take this time upfront and I'm positive any queries you need to make will come naturally. Hope that helps.
Sample Query:
select UserId, FirstName, LastName, MovieId, Name, cast(null as real) as Rating
into #FinalResults
from Users
cross join Movies;
update #FinalResults
set Rating = UMR.Rating
from #FinalResults FR
inner join UserMovieRatings UMR
on FR.UserId = UMR.UserId and FR.MovieId = UMR.MovieId;
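The temp-table version above is SQL Server syntax. As a hedged alternative, the same result can be sketched in a single statement: cross join Users and Movies, then left join UserMovieRatings. The example below runs it in SQLite from Python; the sample names and rows are invented. Note that a movie nobody has rated (Casablanca here) still shows up, which the single-table design could not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users  (UserId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Movies (MovieId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE UserMovieRatings (UserId INTEGER, MovieId INTEGER, Rating REAL);

    -- Hypothetical sample rows.
    INSERT INTO Users  VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO Movies VALUES (10, 'Alien'), (20, 'Brazil'), (30, 'Casablanca');
    INSERT INTO UserMovieRatings VALUES (1, 10, 4.5), (2, 20, 3.0);
""")

# Every (user, movie) combination, with the rating attached where one exists.
ALL_PAIRS_QUERY = """
SELECT u.UserId, m.MovieId, umr.Rating
  FROM Users AS u
 CROSS JOIN Movies AS m
  LEFT JOIN UserMovieRatings AS umr
    ON umr.UserId = u.UserId AND umr.MovieId = m.MovieId
 ORDER BY u.UserId, m.MovieId
"""

all_pairs = conn.execute(ALL_PAIRS_QUERY).fetchall()
for pair in all_pairs:
    print(pair)
```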

If I understand your question correctly, you have all the data in your table, and you just want to extract it in the right order. Is that correct? If so, it should just be a matter of:
select userid, movieid, rating
from ratings
order by userid, movieid

Related

SQL query for finding the movies that users haven't watched

These are the two tables:
I've used the except keyword to get the desired output.
Now, my case is that there are two tables:
Table 1 holds all the user-related data (user_id, email, contact...). Only user_id matters here.
Table 2 holds the user_id and the name of each movie that user has watched (there can be multiple records per user). A row is added to this table whenever a user watches an available movie.
I don't have a list of available movies, so let us assume that every movie has been watched by at least one user in table 2. Selecting the distinct movie names from table 2 then gives all the available movies.
I need a query that outputs each user id together with the movies that user hasn't watched. Is there a way to get this output without using PL/SQL, "except", an anti join, or the "exists" keyword in SQL?
SELECT DISTINCT
"tabl1"."type",
"tabl2"."user_id"
FROM
"tabl2"
RIGHT JOIN
"tabl1" ON "tabl1"."userid" = "tabl2"."user_id"
WHERE
"tabl1"."type" NOT IN (SELECT DISTINCT "type"
FROM "tabl1"
LEFT JOIN "tabl2" ON "tabl1"."userid" = "tabl2"."user_id"
WHERE "tabl2"."user_id" IN (SELECT DISTINCT "user_id"
FROM "tabl2"))
I've tried using a join, but it doesn't give the expected result; I end up with only NULLs.
I'm stuck on how to get the required output.
Is there a way to get similar output without using the features listed above?
This looks like the inverse of a many-to-many relationship: one user may not have watched many movies, and one movie may not have been watched by many users.
Why not retrieve the movies that a particular user has not watched?
select movie_name from Movie_table where movie_name not in (select movie_name from userMovieTable where user_id = :user_id)
You want to join user and movie on the condition that the pair is not in the watched table:
with movies as (select distinct movie from watched)
select *
from users u
join movies m on (u.userid, m.movie) not in (select userid, movie from watched)
order by u.userid, m.movie;
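Here is a small runnable sketch of that idea in SQLite, driven from Python. It assumes a SQLite version with row-value support (3.15 or later), writes the join as a CROSS JOIN with the condition in WHERE, and uses invented table contents. One caveat worth knowing: if watched contains NULLs in either column, NOT IN can silently return no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (userid INTEGER PRIMARY KEY);
    CREATE TABLE watched (userid INTEGER NOT NULL, movie TEXT NOT NULL);

    -- Hypothetical sample rows; user 3 has watched nothing.
    INSERT INTO users   VALUES (1), (2), (3);
    INSERT INTO watched VALUES (1, 'Alien'), (1, 'Brazil'), (2, 'Alien');
""")

# Pair every user with every known movie, then keep only the pairs
# that do not appear in the watched table.
UNWATCHED_QUERY = """
WITH movies AS (SELECT DISTINCT movie FROM watched)
SELECT u.userid, m.movie
  FROM users AS u
 CROSS JOIN movies AS m
 WHERE (u.userid, m.movie) NOT IN (SELECT userid, movie FROM watched)
 ORDER BY u.userid, m.movie
"""

unwatched = conn.execute(UNWATCHED_QUERY).fetchall()
for pair in unwatched:
    print(pair)
```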

How do I select rows with 5 distinct values for each value in other columns?

For reference, this is the schema of the table: casts (pid, mid, role)
What I want to do is find the pid(s) that have exactly 5 distinct roles in a given mid. This is a table for actors, where pid is the actor id, mid is the movie id, and role is the role they play. I want to find all the actor ids that have exactly 5 distinct roles in a single movie (there can be more than one such movie per actor), and I also want the corresponding movie ids.
I'm not exactly sure how to do this without something like 5 self-joins, which I'd rather avoid since that would be resource heavy.
Sample table data(casts table)
Sample result from query
Thank you in advance.
Is this what you want?
select pid, mid
from casts
group by pid, mid
having count(distinct role) = 5;
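A quick way to check this is to run it on a tiny made-up casts table. The sketch below uses SQLite from Python; the ids and role names are invented, and a duplicate role row is included to show that count(distinct role) ignores repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE casts (pid INTEGER, mid INTEGER, role TEXT)")

# Hypothetical sample rows.
sample_casts = (
    [(1, 100, f"role{i}") for i in range(1, 6)]    # actor 1, movie 100: 5 roles
    + [(1, 100, "role1")]                          # duplicate row, not a new role
    + [(2, 100, f"role{i}") for i in range(1, 5)]  # actor 2, movie 100: 4 roles
    + [(1, 200, f"role{i}") for i in range(1, 7)]  # actor 1, movie 200: 6 roles
)
conn.executemany("INSERT INTO casts VALUES (?, ?, ?)", sample_casts)

FIVE_ROLES_QUERY = """
SELECT pid, mid
  FROM casts
 GROUP BY pid, mid
HAVING COUNT(DISTINCT role) = 5
 ORDER BY pid, mid
"""

five_role_pairs = conn.execute(FIVE_ROLES_QUERY).fetchall()
print(five_role_pairs)
```

Only the (actor 1, movie 100) pair qualifies: the pair with 4 roles and the pair with 6 roles are both filtered out by the HAVING clause.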

Select query from three tables SQL

The database is Oracle XE.
Let me explain the scenario first.
Two tables, Movie and UserInfo, are in a many-to-many relationship through the junction table Rating.
Rating ( MovieID (FK) , UserName(FK) , Rating)
MovieID and UserName are the primary keys of their respective tables.
What I am trying to do is write a select statement that returns the MovieNames from the Movie table that the given user has not yet rated. Rating only holds MovieID as a foreign key, but I need to retrieve MovieName, so I guess I may need a rather complex join, which I can't figure out, or maybe a combination of two or more queries using where.
Thanks in advance and please if possible give an explanation about the solution.
This seems like a classic use case for the not exists operator:
SELECT *
FROM movie m
WHERE NOT EXISTS (SELECT *
                  FROM rating r
                  WHERE r.movieid = m.movieid AND
                        r.username = 'given username here')
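To illustrate how the correlated subquery behaves, here is a small sketch using SQLite from Python rather than Oracle. The helper name unrated_movies and the sample rows are made up, and the user name is passed as a bound parameter instead of a literal. For each movie, the subquery looks for a rating by that user; the movie is kept only when no such row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie  (movieid INTEGER PRIMARY KEY, moviename TEXT);
    CREATE TABLE rating (movieid INTEGER, username TEXT, rating INTEGER);

    -- Hypothetical sample rows.
    INSERT INTO movie  VALUES (1, 'Alien'), (2, 'Brazil'), (3, 'Casablanca');
    INSERT INTO rating VALUES (1, 'alice', 5), (2, 'bob', 3), (3, 'alice', 4);
""")

UNRATED_QUERY = """
SELECT m.moviename
  FROM movie AS m
 WHERE NOT EXISTS (SELECT 1
                     FROM rating AS r
                    WHERE r.movieid = m.movieid
                      AND r.username = ?)
 ORDER BY m.moviename
"""


def unrated_movies(conn, username):
    """Return the names of movies the given user has not rated."""
    return [row[0] for row in conn.execute(UNRATED_QUERY, (username,))]


print(unrated_movies(conn, "alice"))
print(unrated_movies(conn, "carol"))  # a user with no ratings gets every movie
```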

SQL: How do I find which movie genre a user watched the most? (IMDb personal project)

I'm currently working on a personal project and I could use a little help. Here's the scenario:
I'm creating a database (MS Access) for all of the movies my friends and I have ever watched. We rated all of our movies on IMDb and used the export feature to get all of the movie data and our movie ratings. I plan on doing some summary analysis in Excel. One thing I am interested in is the most common movie genre that each person watched. Below is my current setup. Note that the column "const" is the movies' unique IDs. I also have individual tables for each person's ratings, and the following tables are the summary tables that combine all the movies we have watched.
Here's the table I had: http://imgur.com/v5x9Dhg
I assigned each genre an ID, like this: http://imgur.com/aXdr9XI
And here is a table where I have separate instances for each movie ID and a unique genre: http://imgur.com/N0wULo8
I want to find a way to count up all of the genres that each person watches. Any advice? I would love to provide any additional information that you need!
Thank you!
You need to have at least one table which has one row per user and const (movie watched). In the 3 example tables you posted nothing shows who watched which movies, which is information you need to solve your problem. You mention having "individual tables for each person's ratings," so I assume you have that information. You will want to combine all of them though, into a table called PERSON_MOVIE or something of the like.
So let's say your second table is called GENRE and its columns are ID, Genre.
Let's say your third table is called GENRE_MOVIE and its columns are Const and ID (ID corresponds to ID on the GENRE table)
Let's say the fourth table, which you did not post, but which is required, is called PERSON_MOVIE and its columns are person, Const, rating.
You could then write a query like this:
select vw1.*, ge.genre
from (select pm.person, gm.id as genre_id, count(*) as num_of_genre
      from person_movie pm
      inner join genre_movie gm
      on pm.const = gm.const
      group by pm.person, gm.id) vw1
inner join (select person, max(num_of_genre) as high_count
            from (select pm.person, gm.id, count(*) as num_of_genre
                  from person_movie pm
                  inner join genre_movie gm
                  on pm.const = gm.const
                  group by pm.person, gm.id) x
            group by person) vw2
on vw1.person = vw2.person
and vw1.num_of_genre = vw2.high_count
inner join genre ge
on vw1.genre_id = ge.id
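The logic is: count movies per person and genre, find each person's highest count, then keep the genres that match it. The sketch below runs that logic in SQLite from Python with invented data. It uses a common table expression to avoid repeating the counting subquery, which is an SQLite convenience and not something to assume Access supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre        (id INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE genre_movie  (const TEXT, id INTEGER);
    CREATE TABLE person_movie (person TEXT, const TEXT, rating INTEGER);

    -- Hypothetical sample rows.
    INSERT INTO genre        VALUES (1, 'Drama'), (2, 'Comedy');
    INSERT INTO genre_movie  VALUES ('tt1', 1), ('tt2', 1), ('tt3', 2);
    INSERT INTO person_movie VALUES ('Bob', 'tt1', 8), ('Bob', 'tt2', 7),
                                    ('Bob', 'tt3', 6), ('Sally', 'tt3', 9);
""")

TOP_GENRE_QUERY = """
WITH counts AS (
    -- Number of movies each person watched in each genre.
    SELECT pm.person, gm.id AS genre_id, COUNT(*) AS num_of_genre
      FROM person_movie AS pm
      JOIN genre_movie AS gm ON pm.const = gm.const
     GROUP BY pm.person, gm.id
)
SELECT c.person, ge.genre, c.num_of_genre
  FROM counts AS c
  JOIN (SELECT person, MAX(num_of_genre) AS high_count
          FROM counts
         GROUP BY person) AS best
    ON c.person = best.person AND c.num_of_genre = best.high_count
  JOIN genre AS ge ON c.genre_id = ge.id
 ORDER BY c.person, ge.genre
"""

top_genres = conn.execute(TOP_GENRE_QUERY).fetchall()
for row in top_genres:
    print(row)
```

If a person has two genres tied for the highest count, both rows come back for that person.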
Edit re: your comment:
So right now you have multiple tables reflecting people's ratings of movies. You need to combine those into a table called PERSON_MOVIE or something similar (as in example above).
There will be 3 columns on the table: person, const, rating
I'm not sure if Access supports the traditional "create table as select" query, but ordinarily you would be able to construct such a table in the following way:
create table person_movie as
select 'Bob' as person, const, [You rated] as rating
from ratings_by_bob
union all
select 'Sally', const, [You rated]
from ratings_by_sally
union all
select 'Jack', const, [You rated]
from ratings_by_jack
....
If not, just combine the tables manually and add a third column as shown indicating what users are reflected by each row. Then you can run my initial query.
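For reference, this is roughly what the combining step looks like when run in SQLite from Python, which does support "create table as select" and also accepts the square-bracket column quoting. Only two per-person tables are shown, and their contents are invented. The column names of the new table come from the first select in the union.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings_by_bob   (const TEXT, [You rated] INTEGER);
    CREATE TABLE ratings_by_sally (const TEXT, [You rated] INTEGER);

    -- Hypothetical sample rows.
    INSERT INTO ratings_by_bob   VALUES ('tt1', 8), ('tt2', 7);
    INSERT INTO ratings_by_sally VALUES ('tt1', 9);

    -- Stack the per-person tables and tag each row with its owner.
    CREATE TABLE person_movie AS
    SELECT 'Bob' AS person, const, [You rated] AS rating
      FROM ratings_by_bob
    UNION ALL
    SELECT 'Sally', const, [You rated]
      FROM ratings_by_sally;
""")

combined = conn.execute(
    "SELECT person, const, rating FROM person_movie ORDER BY person, const"
).fetchall()
for row in combined:
    print(row)
```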

sqlite Joins with MAX

I have 2 tables. One describes a game played (date, where, result, opponent, etc.) and the other holds the details of a batting innings (runs scored, etc.). Both tables have a primary key that relates the batting back to a specific game.
I am trying to return the OPPONENT column from GAMES for the row where the MAX (highest) score is recorded in the BATTING table, but currently I am unsure how to do this.
The 2 tables can be found here
http://i.imgur.com/bqiyD3X.png
For example, with these tables the max score is 101 in RUNSSCORED, so the query should return the linked OPPONENT from GAMEINDEX, which is "Ferndale".
Any help would be great. Thanks.
Is this what you are looking for?
select OPPONENT
from GAMES
where GAMESINDEX in
(select GAMESINDEX from BATTING order by RUNSSCORED desc limit 1);
If there isn't a unique max RUNSSCORED value, then the answer might not be deterministic.
If you want multiple winners in that case, you could use
select OPPONENT
from GAMES natural join BATTING
WHERE RUNSSCORED in (select MAX(RUNSSCORED) from BATTING);
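The difference between the two queries shows up when the top score is shared. The sketch below sets up a hypothetical tie at 101 runs in an in-memory SQLite database from Python (all rows other than Ferndale's 101 are invented) and runs both versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GAMES   (GAMESINDEX INTEGER PRIMARY KEY, OPPONENT TEXT);
    CREATE TABLE BATTING (GAMESINDEX INTEGER, RUNSSCORED INTEGER);

    -- Hypothetical sample rows with a tie for the highest score.
    INSERT INTO GAMES   VALUES (1, 'Ferndale'), (2, 'Riverside'), (3, 'Hilltop');
    INSERT INTO BATTING VALUES (1, 101), (2, 45), (3, 101);
""")

# Version 1: LIMIT 1 picks a single game, even when several share the top score.
single = [row[0] for row in conn.execute("""
    SELECT OPPONENT
      FROM GAMES
     WHERE GAMESINDEX IN
           (SELECT GAMESINDEX FROM BATTING ORDER BY RUNSSCORED DESC LIMIT 1)
""")]

# Version 2: matching on MAX(RUNSSCORED) returns every game with the top score.
tied = [row[0] for row in conn.execute("""
    SELECT OPPONENT
      FROM GAMES NATURAL JOIN BATTING
     WHERE RUNSSCORED IN (SELECT MAX(RUNSSCORED) FROM BATTING)
     ORDER BY OPPONENT
""")]

print(single)  # one opponent; which one is not guaranteed
print(tied)
```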
SELECT G.OPPONENT, MAX(B.RUNSSCORED)
FROM GAMES AS G
INNER JOIN BATTING AS B
ON G.GAMESINDEX = B.GAMESINDEX