SQL: How do I find which movie genre a user watched the most? (IMDb personal project) - sql

I'm currently working on a personal project and I could use a little help. Here's the scenario:
I'm creating a database (MS Access) of all the movies my friends and I have ever watched. We rated all of our movies on IMDb and used the export feature to get all of the movie data and our movie ratings. I plan on doing some summary analysis in Excel. One thing I am interested in is the most common movie genre that each person watched. Below is my current setup. Note that the column "const" holds each movie's unique ID. I also have individual tables for each person's ratings; the following are the summary tables that combine all of the movies we have watched.
Here's the table I had: http://imgur.com/v5x9Dhg
I assigned each genre an ID, like this: http://imgur.com/aXdr9XI
And here is a table where I have separate instances for each movie ID and a unique genre: http://imgur.com/N0wULo8
I want to find a way to count up all of the genres that each person watches. Any advice? I would love to provide any additional information that you need!
Thank you!

You need at least one table that has one row per user and const (movie watched). Nothing in the 3 example tables you posted shows who watched which movies, and that is information you need to solve your problem. You mention having "individual tables for each person's ratings," so I assume you have that information. You will want to combine all of them, though, into a table called PERSON_MOVIE or something similar.
So let's say your second table is called GENRE and its columns are ID, Genre.
Let's say your third table is called GENRE_MOVIE and its columns are Const and ID (ID corresponds to ID on the GENRE table)
Let's say the fourth table, which you did not post, but which is required, is called PERSON_MOVIE and its columns are person, Const, rating.
You could then write a query like this:
select vw1.*, ge.genre
from (select pm.person, gm.id as genre_id, count(*) as num_of_genre
      from person_movie pm
      inner join genre_movie gm
        on pm.const = gm.const
      group by pm.person, gm.id) vw1
inner join (select person, max(num_of_genre) as high_count
            from (select pm.person, gm.id, count(*) as num_of_genre
                  from person_movie pm
                  inner join genre_movie gm
                    on pm.const = gm.const
                  group by pm.person, gm.id) x
            group by person) vw2
  on vw1.person = vw2.person
 and vw1.num_of_genre = vw2.high_count
inner join genre ge
  on vw1.genre_id = ge.id
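To see the shape of the result, here is a minimal sketch that runs the same idea against SQLite through Python's sqlite3 module rather than Access. The sample rows are made up, and the table and column names are the assumed ones from above (GENRE, GENRE_MOVIE, PERSON_MOVIE). It uses a common table expression to avoid repeating the counting subquery, which Access SQL does not support, so treat it as an illustration of the logic rather than something to paste into Access.

```python
import sqlite3

# Hypothetical sample data; names follow the assumptions in the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (id INTEGER PRIMARY KEY, genre TEXT);
CREATE TABLE genre_movie (const TEXT, id INTEGER);
CREATE TABLE person_movie (person TEXT, const TEXT, rating INTEGER);
INSERT INTO genre VALUES (1, 'Action'), (2, 'Comedy'), (3, 'Drama');
INSERT INTO genre_movie VALUES
  ('tt1', 1), ('tt1', 3), ('tt2', 1), ('tt3', 2), ('tt4', 2), ('tt4', 3);
INSERT INTO person_movie VALUES
  ('Bob', 'tt1', 8), ('Bob', 'tt2', 7), ('Bob', 'tt3', 6),
  ('Sally', 'tt3', 9), ('Sally', 'tt4', 8), ('Sally', 'tt1', 5);
""")

# Count genres per person, then keep the rows that match each person's maximum.
rows = conn.execute("""
WITH counts AS (
  SELECT pm.person, gm.id AS genre_id, COUNT(*) AS num_of_genre
  FROM person_movie pm
  JOIN genre_movie gm ON pm.const = gm.const
  GROUP BY pm.person, gm.id
)
SELECT c.person, g.genre, c.num_of_genre
FROM counts c
JOIN (SELECT person, MAX(num_of_genre) AS high_count
      FROM counts GROUP BY person) m
  ON c.person = m.person AND c.num_of_genre = m.high_count
JOIN genre g ON c.genre_id = g.id
ORDER BY c.person, g.genre
""").fetchall()

for row in rows:
    print(row)
```

Note that a person whose top genres are tied (Sally in this sample) gets one row per tied genre, which is also how the query above behaves.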
Edit re: your comment:
So right now you have multiple tables reflecting people's ratings of movies. You need to combine those into a table called PERSON_MOVIE or something similar (as in example above).
There will be 3 columns on the table: person, const, rating
Access does not support the traditional create table as select statement (its equivalent is a make-table query, SELECT ... INTO), but in a database that does, you would construct such a table in the following way:
create table person_movie as
select 'Bob' as person, const, [You rated] as rating
from ratings_by_bob
union all
select 'Sally', const, [You rated]
from ratings_by_sally
union all
select 'Jack', const, [You rated]
from ratings_by_jack
....
If that doesn't work, just combine the tables manually and add a column, as shown, indicating which user each row belongs to. Then you can run my initial query.
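As a rough illustration of the combining step, here is a sketch in SQLite (via Python's sqlite3), where create table as select is available. The per-person table names and the "You rated" column are the assumed ones from above, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical per-person export tables with made-up rows.
CREATE TABLE ratings_by_bob (const TEXT, "You rated" INTEGER);
CREATE TABLE ratings_by_sally (const TEXT, "You rated" INTEGER);
INSERT INTO ratings_by_bob VALUES ('tt1', 8), ('tt2', 7);
INSERT INTO ratings_by_sally VALUES ('tt1', 5);

-- Stack the tables and tag each row with the person it came from.
-- Column names are taken from the first SELECT of the UNION.
CREATE TABLE person_movie AS
SELECT 'Bob' AS person, const, "You rated" AS rating FROM ratings_by_bob
UNION ALL
SELECT 'Sally', const, "You rated" FROM ratings_by_sally;
""")

rows = conn.execute(
    "SELECT person, const, rating FROM person_movie ORDER BY person, const"
).fetchall()
print(rows)
```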

Related

How to design a query in WHERE clause of all column that contain same data value?

I have a table, the columns are:
Respondent_ID, classical, gospel, pop, kpop, country, folk, rock, metal ... (all genre of music)
there are 16 columns of different type of genre of music,
and data value is Never, Rarely, Sometimes or Very frequently
SELECT *
FROM genre_frequency
WHERE
I want to design a query that shows, across all columns in the table, the entries that have the value 'Very Frequently'. Can anyone lend me a hand here? I'm still new to this, so any help is appreciated.
Could put the same criteria under every genre field with OR operator - very messy. Or could use a VBA custom function.
Or could normalize data structure so you have fields: RespondentID, Genre, Frequency. A UNION query can rearrange data to this normalized structure (unpivot). There is a limit of 50 SELECT lines and there is no builder or wizard for UNION - must type or copy/paste in SQL View.
SELECT Respondent_ID, "classical" AS Genre, classical AS Frequency FROM genre_frequency
UNION SELECT Respondent_ID, "gospel", gospel FROM genre_frequency
... {continue for additional genre columns};
Now use that query like a table in subsequent queries. Just cannot edit data.
SELECT * FROM qryUNION WHERE Frequency="Very frequently";
UNION query can perform slowly with very large dataset. Probably would be best to redesign table. Could save this rearranged data to a table. If you want to utilize lookup tables for Genre and Frequency in order to save ID keys instead of full descriptive text, that can also be accommodated in redesign.
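Here is a small sketch of the unpivot approach, run in SQLite through Python's sqlite3 instead of Access. Only three of the sixteen genre columns are included, the respondent rows are made up, and the saved UNION query is modelled as a view named qryUNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical cut-down version of the wide table (3 of the 16 genres).
CREATE TABLE genre_frequency (
  Respondent_ID INTEGER, classical TEXT, gospel TEXT, pop TEXT
);
INSERT INTO genre_frequency VALUES
  (1, 'Very frequently', 'Never', 'Rarely'),
  (2, 'Sometimes', 'Very frequently', 'Very frequently');

-- One SELECT per genre column turns columns into rows (unpivot).
CREATE VIEW qryUNION AS
SELECT Respondent_ID, 'classical' AS Genre, classical AS Frequency FROM genre_frequency
UNION SELECT Respondent_ID, 'gospel', gospel FROM genre_frequency
UNION SELECT Respondent_ID, 'pop', pop FROM genre_frequency;
""")

# A single criterion now covers every genre.
rows = conn.execute(
    "SELECT Respondent_ID, Genre FROM qryUNION "
    "WHERE Frequency = 'Very frequently' ORDER BY Respondent_ID, Genre"
).fetchall()
print(rows)
```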
You should normalize your schema. This one has the problem that it requires you to alter the table whenever you want to add or remove a genre.
You must have at least three tables:
Table Respondent: Respondent_ID (PK), Name, etc.
Table Genre: Genre_ID (PK), Name
Table Respondent_Genre: Respondent_ID (PK, FK), Genre_ID (PK, FK), Frequency
This also easily allows you to alter the name of a genre or to add additional attributes to a genre like sub-genre or an annotation like (1930–present).
Optionally, you could also have a lookup table for Frequencies and then store the Frequency_ID in Respondent_Genre instead of the Frequency as text.
Then you can write a query like this
SELECT r.Name, g.Name, rg.Frequency
FROM
(Respondent r
INNER JOIN Respondent_Genre rg
ON r.Respondent_ID = rg.Respondent_ID)
INNER JOIN Genre g
ON rg.Genre_ID = g.Genre_ID
WHERE
rg.Frequency = 'Very Frequently'
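A minimal sketch of the three-table design, again in SQLite via Python's sqlite3 rather than Access. The respondent names and rows are invented. SQLite compares strings case-sensitively by default, so the literal in the WHERE clause has to match the stored text exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Respondent (Respondent_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Genre (Genre_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Respondent_Genre (
  Respondent_ID INTEGER REFERENCES Respondent(Respondent_ID),
  Genre_ID INTEGER REFERENCES Genre(Genre_ID),
  Frequency TEXT,
  PRIMARY KEY (Respondent_ID, Genre_ID)
);
-- Hypothetical sample rows.
INSERT INTO Respondent VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO Genre VALUES (1, 'classical'), (2, 'pop');
INSERT INTO Respondent_Genre VALUES
  (1, 1, 'Very frequently'), (1, 2, 'Never'), (2, 2, 'Very frequently');
""")

# Adding a genre is now an INSERT into Genre, not an ALTER TABLE.
rows = conn.execute("""
SELECT r.Name, g.Name, rg.Frequency
FROM Respondent r
JOIN Respondent_Genre rg ON r.Respondent_ID = rg.Respondent_ID
JOIN Genre g ON rg.Genre_ID = g.Genre_ID
WHERE rg.Frequency = 'Very frequently'
ORDER BY r.Name, g.Name
""").fetchall()
print(rows)
```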

Insert columns from two tables to a new table in PostgreSQL

I am building an application to manage an inventory and I have a problem when creating my tables for the database (I am using PostgreSQL). My problem is the following:
I have two tables, one called 'products' and one called 'users'. Each one with its columns (See image). I want to create a third table called 'product_act_register' , which will keep a record of activity of the products and has with it the columns id, activity_type, quantity, date. But, I want to add other columns which are taken from the table 'users' and 'products'.
It should look like this (Image)
Where product_id, product_name, product_category, product_unit are taken from the table 'products' and the column 'user_id' is taken from the table 'users'.
How can I do this with PostgreSQL ?
Your description and your precise goals are very unclear. You didn't tell us what should happen in the different cases (only a product exists, only a user exists, neither exists, or both exist). You also didn't tell us how the other columns not coming from these tables should be filled. Furthermore, you didn't tell us how the tables users and products relate to each other. Basically, you can do something like this if you only want to insert when both tables have a matching entry:
INSERT INTO product_act_register
SELECT 1,'ActivityType1',p.id, p.name, p.category,
p.unit, u.id, 100, CURRENT_DATE
FROM products p JOIN users u ON p.id = u.id;
(Since you didn't tell us how or if to join them, I assumed to join on their id column)
If you don't care about this, but want to insert an entry for any possible combination of users and products, you can just select both tables without joining:
INSERT INTO product_act_register
SELECT 1,'ActivityType1',p.id, p.name, p.category,
p.unit, u.id, 100, CURRENT_DATE
FROM products p, users u;
You can replicate this here and try out other commands: db<>fiddle
Please be more precise and give us more information when asking the next question.
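For what it's worth, here is one more possible reading, sketched in SQLite via Python's sqlite3 rather than PostgreSQL: each activity row is logged for one specific product and one specific user, picked by id. Everything here is an assumption, including the column layout of product_act_register, the sample rows, and the 'restock' activity type. Listing the target columns explicitly means the insert does not depend on the table's column order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT, unit TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
-- Assumed layout for the activity log; the real one was only shown as an image.
CREATE TABLE product_act_register (
  id INTEGER PRIMARY KEY,
  activity_type TEXT,
  product_id INTEGER, product_name TEXT, product_category TEXT, product_unit TEXT,
  user_id INTEGER,
  quantity INTEGER,
  date TEXT
);
INSERT INTO products VALUES (1, 'Nails', 'Hardware', 'box'), (2, 'Paint', 'Decor', 'litre');
INSERT INTO users VALUES (10, 'alice'), (11, 'bob');
""")

# INSERT ... SELECT copies the product and user columns for one chosen pair.
conn.execute("""
INSERT INTO product_act_register
  (activity_type, product_id, product_name, product_category, product_unit,
   user_id, quantity, date)
SELECT 'restock', p.id, p.name, p.category, p.unit, u.id, 100, CURRENT_DATE
FROM products p CROSS JOIN users u
WHERE p.id = ? AND u.id = ?
""", (2, 11))

rows = conn.execute(
    "SELECT product_id, product_name, user_id, quantity FROM product_act_register"
).fetchall()
print(rows)
```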

SQL query for finding the movies that users haven't watched

Let, these are the two tables
I've used except keyword to get the desired output
Now, my case is that there are two tables having:
All the user-related data is available (user_id, email, contact...) User_id is of importance for us.
User_id and the movie name that a particular user watches ( multiple records can be there for each user ) Basically this table is created when any user watches a movie that is available.
I don't have the list of available movies, so let us assume that every movie has been watched by at least one user in table 2. Using the DISTINCT keyword on that table will then give all the available movies.
I need to get a query that gives the output like the user id and the movies that the particular user hasn't watched. Is there a way to get the output without using "PLSQL", "except", "anti join", or "exists" keyword on SQL
SELECT DISTINCT
"tabl1"."type",
"tabl2"."user_id"
FROM
"tabl2"
RIGHT JOIN
"tabl1" ON "tabl1"."userid" = "tabl2"."user_id"
WHERE
"tabl1"."type" NOT IN (SELECT DISTINCT "type"
FROM "tabl1"
LEFT JOIN "tabl2" ON "tabl1"."userid" = "tabl2"."user_id"
WHERE "tabl2"."user_id" IN (SELECT DISTINCT "user_id"
FROM "tabl2"))
I've tried using the join operation, but it doesn't give any useful result and ends up returning only NULLs.
I'm stuck on how to get the required output.
Is there a way to get a similar output like this without using the functions described above.
This looks like the inverse of a many-to-many relationship: a user may not have watched many of the movies, and a movie may not have been watched by many of the users.
Why not retrieve it as the movies not watched by one particular user?
select movie_name from Movie_table where movie_name not in (select movie_name from userMovieTable where user_id = :user_id)
You want to join user and movie on the condition that the pair is not in the watched table:
with movies as (select distinct movie from watched)
select *
from users u
join movies m on (u.userid, m.movie) not in (select userid, movie from watched)
order by u.userid, m.movie;
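A quick way to try this out is the sketch below, which runs the same pattern in SQLite through Python's sqlite3 (row-value comparisons need SQLite 3.15 or later). The table names users and watched follow the query above; the rows are invented. Be aware that NOT IN returns no rows if the subquery yields a NULL, so this assumes userid and movie are never NULL in watched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY);
CREATE TABLE watched (userid INTEGER, movie TEXT);
-- Hypothetical sample rows; user 3 has watched nothing.
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO watched VALUES (1, 'Alien'), (1, 'Brazil'), (2, 'Alien');
""")

# Pair every user with every known movie, then drop the pairs already watched.
rows = conn.execute("""
WITH movies AS (SELECT DISTINCT movie FROM watched)
SELECT u.userid, m.movie
FROM users u
CROSS JOIN movies m
WHERE (u.userid, m.movie) NOT IN (SELECT userid, movie FROM watched)
ORDER BY u.userid, m.movie
""").fetchall()
print(rows)
```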

SQL Query to return a table of specific matching values based on a criteria

I have 3 tables in PostgreSQL database:
person (id, first_name, last_name, age)
interest (id, title, person_id REFERENCES person)
location (id, city, state text NOT NULL, country, person_id REFERENCES person)
city can be null, but state and country cannot.
A person can have many interests but only one location. My challenge is to return a table of people who share the same interest and location.
All ID's are serialized and thus created automatically.
Let's say I have 4 people living in "TX", they each have two interests a piece, BUT only person 1 and 3 share a similar interest, lets say "Guns" (cause its Texas after all). I need to select all people from person table where the person's interest title (because the id is auto generated, two Guns interest would result in two different ID keys) equals that of another persons interest title AND the city or state is also equal.
I was looking at the answer to this question here Select Rows with matching columns from SQL Server and I feel like the logic is sort of similar to my question, the difference is he has two tables, to join together where I have three.
return a table of people who share the same interest and location.
I'll interpret this as "all rows from table person where another row exists that shares at least one matching row in interest and a matching row in location. No particular order."
A simple solution with a window function in a subquery:
SELECT p.*
FROM (
SELECT person_id AS id, i.title, l.city, l.state, l.country
, count(*) OVER (PARTITION BY i.title, l.city, l.state, l.country) AS ct
FROM interest i
JOIN location l USING (person_id)
) x
JOIN person p USING (id)
WHERE x.ct > 1;
This treats NULL values as "equal". (You did not specify clearly.)
Depending on undisclosed cardinalities, there may be faster query styles. (Like reducing to duplicative interests and / or locations first.)
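To illustrate with the Texas example from the question, here is a sketch in SQLite via Python's sqlite3 instead of PostgreSQL (window functions need SQLite 3.25 or later). The names and rows are made up, and the person table is cut down to two columns. It adds DISTINCT, which the query above does not have, so that a person sharing several interests is listed only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE interest (
  id INTEGER PRIMARY KEY, title TEXT, person_id INTEGER REFERENCES person(id)
);
CREATE TABLE location (
  id INTEGER PRIMARY KEY, city TEXT, state TEXT NOT NULL, country TEXT NOT NULL,
  person_id INTEGER REFERENCES person(id)
);
-- Hypothetical data: four people in TX, only persons 1 and 3 share an interest.
INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal'), (4, 'Dee');
INSERT INTO interest (title, person_id) VALUES
  ('Guns', 1), ('Chess', 1), ('Golf', 2), ('Jazz', 2),
  ('Guns', 3), ('Yoga', 3), ('Film', 4), ('Art', 4);
INSERT INTO location (city, state, country, person_id) VALUES
  ('Austin', 'TX', 'US', 1), ('Austin', 'TX', 'US', 2),
  ('Austin', 'TX', 'US', 3), ('Austin', 'TX', 'US', 4);
""")

# ct counts how many rows share the same interest title and location.
rows = conn.execute("""
SELECT DISTINCT p.id, p.first_name
FROM (
  SELECT i.person_id AS id,
         COUNT(*) OVER (PARTITION BY i.title, l.city, l.state, l.country) AS ct
  FROM interest i
  JOIN location l ON l.person_id = i.person_id
) x
JOIN person p ON p.id = x.id
WHERE x.ct > 1
ORDER BY p.id
""").fetchall()
print(rows)
```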
Asides 1:
It's almost always better to have a column birthday (or year_of_birth) than age, which starts to bit-rot immediately.
Asides 2:
A person can have [...] only one location.
You might at least add a UNIQUE constraint on location.person_id to enforce that. (If you cannot make it the PK or just append location columns to the person table.)

correlated query to update a table based on a select

I have two tables, Genre and Songs. There is a many-to-many relationship between them: one genre can have many songs, and one song may belong to many genres (say there is a song xyz that belongs to rap; it can also belong to hip-hop). I have a table GenreSongs that maps this many-to-many relationship; it contains GenreID and SongID columns. What I am supposed to do is add a column named SongsCount to the Genre table, which will contain the number of songs in each genre. I can alter the table to add the column, and I can also write a query that gives the count of songs:
SELECT GenreID, Count(SongID) FROM GenreSongs GROUP BY GenreID
Now, this gives us what we require, the number of songs per genre, but how can I use this query to update the column I made (SongsCount). One way is that run this query and see the results, and then manually update that column, but I am sure everyone will agree that's not a programmtic way to do it.
I came to think I would require to create a query with a subquery, that would get the value of GenreID from outer query and then count of its value from inner query (correlated query) but I can't make any. Can any one please help me make this?
The question of how to approach this depends on the size of your data and how frequently it is updated. Here are some scenarios.
If your songs are updated quite frequently and your tables are quite large, then you might want to have a column in Genre with the count, and update the column using a trigger on the GenreSongs table.
Alternatively, you could build an index on the GenreSongs table on GenreID. Then the following query:
select count(*)
from GenreSongs gs
where gs.GenreID = <whatever>
should run quite fast.
If your songs are updated infrequently or in a batch (say nightly or weekly), then you can update the song count as part of the batch. Your query might look like:
-- assumes the key column of Genre is also named GenreID
update Genre
set SongsCount = gc.cnt
from (select GenreID, count(*) as cnt from GenreSongs group by GenreID) gc
where Genre.GenreID = gc.GenreID
And yet another possibility is that you don't need to store the value at all. You can make it part of a view/query that does the calculation on the fly.
Relational databases are quite flexible, and there is often more than one way to do things. The right approach depends very much on what you are trying to accomplish.
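Since the question asked specifically for a correlated subquery, here is a minimal sketch of that form, run in SQLite through Python's sqlite3. It assumes the key column of Genre is named GenreID (the question does not say) and uses made-up rows. Unlike an update that joins to a grouped subquery, this version also sets genres without any songs to 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema; the key column name in Genre is a guess.
CREATE TABLE Genre (GenreID INTEGER PRIMARY KEY, Name TEXT, SongsCount INTEGER DEFAULT 0);
CREATE TABLE GenreSongs (GenreID INTEGER, SongID INTEGER);
INSERT INTO Genre (GenreID, Name) VALUES (1, 'rap'), (2, 'hip-hop'), (3, 'polka');
INSERT INTO GenreSongs VALUES (1, 100), (1, 101), (2, 100);

-- The inner query runs once per Genre row and refers to the outer row's GenreID.
UPDATE Genre
SET SongsCount = (SELECT COUNT(*)
                  FROM GenreSongs gs
                  WHERE gs.GenreID = Genre.GenreID);
""")

rows = conn.execute("SELECT Name, SongsCount FROM Genre ORDER BY GenreID").fetchall()
print(rows)
```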
Adding a column named SongsCount is just plainly bad design (redundant data and update overhead). Instead use this query for single results:
SELECT ID, ..., (SELECT Count(*) FROM GenreSongs WHERE GenreID = X) AS SongsCount FROM Genre WHERE ID = X
And this for multiple results (much more efficient):
SELECT ID, ..., SongsCount FROM (SELECT GenreID, Count(*) AS SongsCount FROM GenreSongs GROUP BY GenreID) AS sub RIGHT JOIN Genre AS g ON sub.GenreID = g.ID
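If you want the count available everywhere without storing it, one option is to wrap the multi-row query in a view. The sketch below does that in SQLite via Python's sqlite3. The view name GenreWithCount and the sample rows are made up. It uses a LEFT JOIN from Genre (equivalent to the RIGHT JOIN above, and supported by older SQLite versions) plus COALESCE so that genres without songs show 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Genre (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE GenreSongs (GenreID INTEGER, SongID INTEGER);
-- Hypothetical sample rows; 'polka' has no songs.
INSERT INTO Genre VALUES (1, 'rap'), (2, 'hip-hop'), (3, 'polka');
INSERT INTO GenreSongs VALUES (1, 100), (1, 101), (2, 100);

-- The count is computed on the fly each time the view is queried.
CREATE VIEW GenreWithCount AS
SELECT g.ID, g.Name, COALESCE(sub.SongsCount, 0) AS SongsCount
FROM Genre AS g
LEFT JOIN (SELECT GenreID, COUNT(*) AS SongsCount
           FROM GenreSongs GROUP BY GenreID) AS sub
  ON sub.GenreID = g.ID;
""")

rows = conn.execute("SELECT Name, SongsCount FROM GenreWithCount ORDER BY ID").fetchall()
print(rows)
```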