Complex SQL query for recommendations - sql

Suppose the following tables :
User
----
id
name
Rating
------
userid
movieid
value
Movie
-----
id
title
movie_genre
-----------
movieid
genreid
genre
-----
id
value
I think the foreign keys are obvious here. The query I am looking for is the following :
Movie A is a movie of genre X that both me and some other user have rated with value 5. I want Movie B also of genre X that was rated a five by one of those users mentioned above.
And I seriously can't find it... The amount of joins isn't necessarily a problem btw, there can be plenty.
EDIT: In case I was unclear. The idea here is that people may have a similar taste in one genre, but a very different taste in another genre. I might like the movies that other people like who have the same taste in that specific genre.

This is what I think you're looking for:
Given a user John, find all movies B such that there exists a movie A, a user Simon, and a genre G where:
John rated movie A a 5
Simon rated movie A a 5
Simon is not John
Simon rated movie B a 5
movie A is of genre G
movie B is of genre G
Phrased this way, I think it's pretty easy to come up with a query:
select B.*
from user John
join rating JohnA on JohnA.userid = John.id and JohnA.value = 5
join movie A on A.id = JohnA.movieid
join rating ASimon on ASimon.movieid = A.id and ASimon.value = 5
join user Simon on Simon.id = ASimon.userid and Simon.id <> John.id
join rating SimonB on SimonB.userid = Simon.id and SimonB.value = 5
join movie B on B.id = SimonB.movieid
join movie_genre Agenre on Agenre.movieid = A.id
join genre G on G.id = Agenre.genreid
join movie_genre Bgenre on Bgenre.genreid = G.id and Bgenre.movieid = B.id
Your database of choice would probably optimize this to remove some joins, but we only really need the relationships (ratings) and not the intermediate objects (movie A, user Simon, and genre G):
select B.*
from user John
join rating JohnA on JohnA.userid = John.id and JohnA.value = 5
join rating ASimon on ASimon.movieid = JohnA.movieid and ASimon.value = 5 and ASimon.userid <> John.id
join rating SimonB on SimonB.userid = ASimon.userid and SimonB.value = 5
join movie B on B.id = SimonB.movieid
join movie_genre Agenre on Agenre.movieid = JohnA.movieid
join movie_genre Bgenre on Bgenre.genreid = Agenre.genreid and Bgenre.movieid = B.id
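To try the shortened query against real rows, here is a minimal sketch using Python's built-in sqlite3 module. The sample data, the `recommend` helper, the `DISTINCT`, and the `NOT EXISTS` filter that hides movies John has already rated are my own assumptions, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE genre (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE movie_genre (movieid INTEGER, genreid INTEGER);
CREATE TABLE rating (userid INTEGER, movieid INTEGER, value INTEGER);

-- Hypothetical sample data
INSERT INTO user VALUES (1, 'John'), (2, 'Simon');
INSERT INTO genre VALUES (1, 'scifi'), (2, 'comedy');
INSERT INTO movie VALUES (1, 'A'), (2, 'B'), (3, 'C');
-- A and B are scifi, C is comedy
INSERT INTO movie_genre VALUES (1, 1), (2, 1), (3, 2);
-- John and Simon both gave A a 5; Simon also gave B and C a 5
INSERT INTO rating VALUES (1, 1, 5), (2, 1, 5), (2, 2, 5), (2, 3, 5);
""")

RECOMMEND = """
SELECT DISTINCT B.id, B.title
FROM rating JohnA
JOIN rating ASimon ON ASimon.movieid = JohnA.movieid
                  AND ASimon.value = 5
                  AND ASimon.userid <> JohnA.userid
JOIN rating SimonB ON SimonB.userid = ASimon.userid
                  AND SimonB.value = 5
JOIN movie B ON B.id = SimonB.movieid
JOIN movie_genre Agenre ON Agenre.movieid = JohnA.movieid
JOIN movie_genre Bgenre ON Bgenre.genreid = Agenre.genreid
                       AND Bgenre.movieid = B.id
WHERE JohnA.userid = ?
  AND JohnA.value = 5
  -- Assumption: do not recommend movies the user has already rated
  AND NOT EXISTS (SELECT 1 FROM rating r
                  WHERE r.userid = JohnA.userid AND r.movieid = B.id)
ORDER BY B.id
"""

def recommend(user_id):
    """Return (id, title) of same-genre movies that like-minded users rated 5."""
    return conn.execute(RECOMMEND, (user_id,)).fetchall()

print(recommend(1))  # [(2, 'B')]
```

C is left out because it is a comedy, and A is left out because John has already rated it.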

Related

How to find transitively related (1 degree of separation) IDs in PostgreSQL?

First time posting a question here, couldn't find the answer any other way.
I have a table for music bands
bands
id | name
and I have a musician table
musicians
id | first_name | last_name
I have created a third table, that links them via foreign keys
band_membership
band_id | musician_id
I have populated the bands table with a few bands and the musicians with a few musicians.
Then I linked musician John Doe (ID 1) to bands Foo (ID 1) and Bar (ID 2)
Then I linked musician Jane Doe (ID 2) to bands Foo (ID 1) and Rab (ID 3)
So these musicians share a band but also play in other bands separately.
The question is: How do I select all members of the band Foo, iterate through Foo's musicians, and SELECT all band names which are related/associated with Foo through its members? In this case I want the "input" band to be Foo and the SELECT result to be
1 | Bar
2 | Rab
Since these are the two bands which are directly (1 degree of separation) associated with Foo via the band members' other side projects/bands.
I know I can select all the IDs of Foos members (Jane and John) via the following query
SELECT m.id FROM musicians AS m
INNER JOIN band_membership AS bm ON m.id = bm.musician_id
INNER JOIN bands AS b ON bm.band_id = (SELECT id FROM bands WHERE name = 'Foo')
GROUP BY m.id
I also know I can find all the bands John Doe is a member of via the following query
SELECT b.name FROM bands AS b
INNER JOIN band_membership AS bm ON b.id = bm.band_id
INNER JOIN musicians as m ON bm.musician_id = 1
GROUP BY b.name
But for the life of me I cannot find a way to combine these.
The first query will return
1 | 1
2 | 2
which are John and Jane's IDs but how do I "plug" them into the second query, where the 1 is currently hardcoded.
Thank you for your help!
You can just use multiple joins:
select distinct b2.name
from bands b join
band_membership bm
on bm.band_id = b.id join
band_membership bm2
on bm2.musician_id = bm.musician_id join
bands b2
on bm2.band_id = b2.id
where b.name = 'Foo' and b2.id <> b.id
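Here is a quick way to check the double join against the data from the question, sketched with Python's sqlite3 module. The extra band "Unrelated", the musician Jim Roe, and the `related_bands` helper are made up for the test; the `b2.id <> b.id` condition assumes you do not want Foo itself in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bands (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE musicians (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE band_membership (
    band_id INTEGER REFERENCES bands(id),
    musician_id INTEGER REFERENCES musicians(id)
);
INSERT INTO bands VALUES (1, 'Foo'), (2, 'Bar'), (3, 'Rab'), (4, 'Unrelated');
INSERT INTO musicians VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Doe'), (3, 'Jim', 'Roe');
-- John plays in Foo and Bar, Jane in Foo and Rab, Jim only in Unrelated
INSERT INTO band_membership VALUES (1, 1), (2, 1), (1, 2), (3, 2), (4, 3);
""")

def related_bands(name):
    """Bands sharing at least one member with the named band (1 degree away)."""
    sql = """
    SELECT DISTINCT b2.id, b2.name
    FROM bands b
    JOIN band_membership bm  ON bm.band_id = b.id
    JOIN band_membership bm2 ON bm2.musician_id = bm.musician_id
    JOIN bands b2            ON b2.id = bm2.band_id
    WHERE b.name = ?
      AND b2.id <> b.id      -- leave out the input band itself
    ORDER BY b2.id
    """
    return conn.execute(sql, (name,)).fetchall()

print(related_bands("Foo"))  # [(2, 'Bar'), (3, 'Rab')]
```

The first `band_membership` alias walks from the band to its members, and the second walks from those members back out to all of their bands.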

Is there an easier way to figure the query out

I have a movie table which has the year and movie details like title and movie id (mid), and a table m_cast where I have all the actors in that movie.
I would like to get all the actors who have never been unemployed for more than 3 years (assuming actors are unemployed between two consecutive movies).
The code I came up with is
select a.yr1 y1 , b.yr2 y2 , a.yr1 - b.yr2 diff from
(select substr(substr(trim(year),-5),0,5) yr1 , * from movie m inner join m_cast p on m.mid = p.mid order by pid , yr1) a ,
(select substr(substr(trim(year),-5),0,5) yr2 , * from movie m inner join m_cast p on m.mid = p.mid order by pid, yr2) b on a.yr1 > b.yr2
where not exists
(select count(*) from movie m inner join m_cast p on m.mid = p.mid
and cast(substr(substr(trim(year),-5),0,5) as integer) < a.yr1 and cast(substr(substr(trim(year),-5),0,5) as integer) > b.yr2)
The self join itself takes a lot of time, and the lag and lead functions do not work in the SQLite version I am using.
I'm assuming the movie table has a column called year, and a column to identify the actor's name. Something like : year int, actorId int
A fast way to run your query is to filter your movie table down to the last 3 years, then group by actor and count the distinct years.
Example after filtering
ActorId | Year
1 | 2018
1 | 2018
1 | 2017
2 | 2016
2 | 2017
2 | 2018
Then group by and select distinct :
select actorId from movieTable group by actorId having count(distinct year) = 3
That will only return the actors who have worked in each of the last 3 years. Once you have your actor IDs filtered out in that column, do a join to the table that holds their names.
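The filter, group, and join steps described above can be sketched end to end with Python's sqlite3 module. The `actor` table, its `name` column, the sample rows, and the `active_every_year` helper are assumptions for illustration; the window of years is passed in rather than derived from the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: one row per (actor, movie year)
CREATE TABLE actor (actorId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movieTable (actorId INTEGER, year INTEGER);
INSERT INTO actor VALUES (1, 'Actor One'), (2, 'Actor Two');
INSERT INTO movieTable VALUES
    (1, 2018), (1, 2018), (1, 2017), (1, 2010),
    (2, 2016), (2, 2017), (2, 2018);
""")

def active_every_year(first_year, last_year):
    """Actors with at least one movie in every year of the window."""
    sql = """
    SELECT a.actorId, a.name
    FROM actor a
    JOIN (
        SELECT actorId
        FROM movieTable
        WHERE year BETWEEN ? AND ?          -- step 1: filter the window
        GROUP BY actorId                    -- step 2: one group per actor
        HAVING COUNT(DISTINCT year) = ?     -- step 3: every year is covered
    ) w ON w.actorId = a.actorId            -- step 4: join to get the names
    ORDER BY a.actorId
    """
    span = last_year - first_year + 1
    return conn.execute(sql, (first_year, last_year, span)).fetchall()

print(active_every_year(2016, 2018))  # [(2, 'Actor Two')]
```

Actor 1 drops out because the two 2018 rows count as a single distinct year, leaving only two of the three years covered.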
Sorry about the format of my writing - did it from my cellphone.
Regards,
Jorge D. Lopez

Nesting SELECT statements with duplicate entries and COUNT

I'm working with 3 tables: actors, films, and actor_film. Actors and films only have 2 fields: id (primary key) and name. Actor_film also has 2 fields, actor and film, which are both foreign keys representing actor and film ids, respectively. So if a film had 4 actors in it, there'd be 4 actor_film entries with the same film and 4 different actors.
My problem is that, given a certain actor's id, I'd like to return the actor id, actor name, film name, and the total number of actors in that film. However, the only actors that I want to show are ones that contain certain letters in their names.
Let me clear things up with an example. Say Tom Hanks is in only 2 movies, Forrest Gump and Saving Private Ryan, and I'm looking for actors in those 2 movies that have "Gary" or "Matt" in their names. Further suppose that there are 4 actors in Forrest Gump, and 5 in Saving Private Ryan. Then, the only thing I'd want to return would be (without the column names, of course)
actor id | actor name | film name | # actors
abcdefg | Gary Sinise | Forrest Gump | 4
hijklmn | Matt Damon | Saving Private Ryan | 5
opqrstu | Paul Giamatti | Saving Private Ryan | 5
Currently, I'm 75% of the way there by using:
SELECT actors.id, actors.name, films.name
FROM (
SELECT actor_film.film
FROM actor_film, actors
WHERE actor_film.actor = actors.id
) AS a, actor_film, actors, films
WHERE actor_film.film = a.film
AND actors.id = actor_film.actor
AND films.id = a.film;
This is returning stuff like:
arnie | Arnold Schwarzenegger | Around the World in 80 Days
arnie | Arnold Schwarzenegger | Around the World in 80 Days
for a film that has 2 actors in it. In other words, I can't pull out all the distinct actors in the movie, but get the proper count for it implicitly and not explicitly with COUNT.
Anyway, I think I'm looking for some kind of INNER JOIN or nested SELECT, but I'm new to SQLite3 and don't know how to bring these together. Any solutions would be great, and any explanations on top of that would be amazing as well.
You shouldn't use the old-style comma joins. The explicit JOIN syntax was standardized in SQL-92 and makes left joins much clearer.
I've noticed you also use plurals for your table names (e.g. "actors"). A common convention is to use the singular for the table name (e.g. "actor").
I use both of these suggestions below, and I also show you each step. I suggest you run the queries for each step and look at the output to understand how everything works, since you are new to SQL.
OK, let's take your problem step by step. First of all, to see each actor and the films they are in (your first 3 columns), do this:
SELECT a.id as actor_id, a.name as actor_name, f.name as film_name
FROM actor as a
JOIN actor_film af on a.id = af.actor
JOIN film as f on af.film = f.id
Your last column can be found with the following query:
SELECT af.film as film_id, count(*) as c
FROM actor_film as af
GROUP BY af.film
Now we just join them together
SELECT a.id as actor_id, a.name as actor_name, f.name as film_name, fc.c as num_actors
FROM actor as a
JOIN actor_film af on a.id = af.actor
JOIN film as f on af.film = f.id
JOIN (
SELECT af.film as film_id, count(*) as c
FROM actor_film as af
GROUP BY af.film
) as fc on af.film = fc.film_id
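Since you want names that contain "Gary" or "Matt", you can add a
WHERE a.name LIKE '%Gary%' OR a.name LIKE '%Matt%'
Depending on your platform, LIKE may be case-sensitive, in which case you might want
WHERE lower(a.name) LIKE '%gary%' OR lower(a.name) LIKE '%matt%'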
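Putting the steps together for the original question (co-stars of one given actor, filtered by name, with the cast size per film) might look like the sketch below, written with Python's sqlite3 module. The ids, the sample cast lists, the `co_stars` helper, and the extra self join on `actor_film` that restricts the result to the given actor's films are my assumptions. It also assumes each (actor, film) pair appears only once in actor_film.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE film (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE actor_film (actor TEXT REFERENCES actor(id),
                         film INTEGER REFERENCES film(id));
-- Hypothetical sample data (cast lists shortened)
INSERT INTO actor VALUES
    ('hanks', 'Tom Hanks'), ('sinise', 'Gary Sinise'), ('wright', 'Robin Wright'),
    ('damon', 'Matt Damon'), ('giamatti', 'Paul Giamatti'),
    ('arnie', 'Arnold Schwarzenegger');
INSERT INTO film VALUES
    (1, 'Forrest Gump'), (2, 'Saving Private Ryan'),
    (3, 'Around the World in 80 Days');
INSERT INTO actor_film VALUES
    ('hanks', 1), ('sinise', 1), ('wright', 1),
    ('hanks', 2), ('damon', 2), ('giamatti', 2),
    ('arnie', 3);
""")

def co_stars(actor_id):
    """Co-stars of actor_id whose names contain 'gary' or 'matt', with cast size."""
    sql = """
    SELECT a.id, a.name, f.name, fc.c
    FROM actor_film mine                        -- films of the given actor
    JOIN actor_film af ON af.film = mine.film   -- everyone cast in those films
    JOIN actor a ON a.id = af.actor
    JOIN film f ON f.id = af.film
    JOIN (
        SELECT film AS film_id, COUNT(*) AS c   -- total actors per film
        FROM actor_film
        GROUP BY film
    ) fc ON fc.film_id = af.film
    WHERE mine.actor = ?
      AND (lower(a.name) LIKE '%gary%' OR lower(a.name) LIKE '%matt%')
    ORDER BY f.name, a.name
    """
    return conn.execute(sql, (actor_id,)).fetchall()

for row in co_stars("hanks"):
    print(row)
```

Paul Giamatti shows up because "Giamatti" contains "matt" once the name is lowercased, which matches the expected output in the question.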

SQL select on a many-to-many table

I've got 3 tables: Movies, Actors, and MovieActors. MovieActors is a many-to-many relationship of Movies and Actors with columns MovieActorId, MovieId, and ActorId
How do I find movies that have a certain set of actors in it? For example I want to find all movies that have both Michael Fassbender (actor Id 1) and Brad Pitt (actor Id 2) in them. What would the query look like?
One way is to join the tables, filter for the actors, and then ensure the count matches the number of actors you want (2 in this case):
SELECT
m.MovieID
FROM
Movies m
INNER JOIN MovieActors ma
ON m.MovieID = ma.MovieID
WHERE
ma.ActorID IN (1,2)
GROUP BY
m.MovieID
HAVING COUNT(DISTINCT ma.ActorID) = 2
DEMO
Note
Thanks to user814064 for pointing out that since actors can have more than one role in a movie, we need to count the DISTINCT ma.ActorID, not just *. The SQL Fiddle demo demonstrates the difference.
select m.movieid
from movies m
inner join movieactors ma on ma.movieid = m.movieid
where ma.actorid in (1,2)
group by m.movieid
having count(distinct ma.actorid) = 2
To keep it simple, you can just do two in clauses:
select * from Movies m
where m.MovieId in (select MovieId from MovieActors where ActorId = 1)
and m.MovieId in (select MovieId from MovieActors where ActorId = 2)
Performance may not be as good as a single join, but it's clean and easy to read.
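To see why the DISTINCT matters and that the two-IN version gives the same answer, here is a small sketch with Python's sqlite3 module. The `Title` column, the sample rows, and both helper functions are assumptions; movie 20 has actor 1 in two roles, which is the case that trips up COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (MovieId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE MovieActors (
    MovieActorId INTEGER PRIMARY KEY,
    MovieId INTEGER REFERENCES Movies(MovieId),
    ActorId INTEGER
);
-- Hypothetical sample data
INSERT INTO Movies VALUES
    (10, 'Both actors'), (20, 'Actor 1 twice'), (30, 'Actor 2 only');
INSERT INTO MovieActors (MovieId, ActorId) VALUES
    (10, 1), (10, 2), (20, 1), (20, 1), (30, 2);
""")

def movies_by_count(count_distinct=True):
    """GROUP BY / HAVING version, with or without DISTINCT in the count."""
    counter = "COUNT(DISTINCT ma.ActorId)" if count_distinct else "COUNT(*)"
    sql = f"""
    SELECT m.MovieId
    FROM Movies m
    JOIN MovieActors ma ON ma.MovieId = m.MovieId
    WHERE ma.ActorId IN (1, 2)
    GROUP BY m.MovieId
    HAVING {counter} = 2
    ORDER BY m.MovieId
    """
    return [row[0] for row in conn.execute(sql)]

def movies_by_in_clauses():
    """One IN subquery per required actor."""
    sql = """
    SELECT m.MovieId
    FROM Movies m
    WHERE m.MovieId IN (SELECT MovieId FROM MovieActors WHERE ActorId = 1)
      AND m.MovieId IN (SELECT MovieId FROM MovieActors WHERE ActorId = 2)
    ORDER BY m.MovieId
    """
    return [row[0] for row in conn.execute(sql)]

print(movies_by_count(True))    # [10]
print(movies_by_count(False))   # [10, 20]  (movie 20 is a false positive)
print(movies_by_in_clauses())   # [10]
```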

help with a sql query joining 2 many-many tables

I'd like help solving this SQL problem.
Suppose I have 3 tables:
Movie
ID
Name
Genre
ID
Name
Movie_Genre (this one is the link for many to many)
FK_MovieID
FK_GenreID
I want to select all the movies that are of both genre 1 and genre 3.
How is this possible?
I can only select the movies of 1 genre, but not the movies that are of 2 genres, using
SELECT Movie.ID, Movie.Name
FROM Movie
INNER JOIN Movie_Genre ON Movie_Genre.FK_MovieID=Movie.ID
AND Movie_Genre.FK_GenreID = 1
Use:
SELECT m.id,
m.name
FROM MOVIE m
JOIN MOVIE_GENRE mg ON mg.fk_movieid = m.id
AND mg.fk_genreid IN (1, 3)
GROUP BY m.id, m.name
HAVING COUNT(DISTINCT mg.fk_genreid) = 2
The last line is key to getting rows that match both genres. The DISTINCT means duplicate associations (i.e. two instances of genre 1) will be ignored, because they would otherwise produce false positives. The count must equal the number of genres you are looking for.
But COUNT(DISTINCT ...) isn't supported by all databases. You should mention what you are using, if not by tag then in the question. If the primary key for the MOVIE_GENRE table is both fk_movieid and fk_genreid, then it's not an issue. The next best thing would be for both columns to be in a unique constraint/index.
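As a rough illustration of the composite-key point, the sketch below (Python's sqlite3 module) declares the link table with a primary key on both columns, so duplicate associations cannot exist and a plain COUNT(*) is safe. The genre names, movie names, and the two helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Genre (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Movie_Genre (
    FK_MovieID INTEGER NOT NULL REFERENCES Movie(ID),
    FK_GenreID INTEGER NOT NULL REFERENCES Genre(ID),
    PRIMARY KEY (FK_MovieID, FK_GenreID)   -- one row per (movie, genre) pair
);
-- Hypothetical sample data
INSERT INTO Genre VALUES (1, 'Action'), (2, 'Comedy'), (3, 'Drama');
INSERT INTO Movie VALUES (1, 'M1'), (2, 'M2'), (3, 'M3');
INSERT INTO Movie_Genre VALUES (1, 1), (1, 3), (2, 1), (2, 2), (3, 3);
""")

def duplicate_rejected():
    """Try to insert an association that already exists."""
    try:
        conn.execute("INSERT INTO Movie_Genre VALUES (1, 1)")
    except sqlite3.IntegrityError:
        return True
    return False

def movies_in_all(genre_ids):
    """Movies tagged with every genre in genre_ids, using a plain COUNT(*)."""
    ids = sorted(set(genre_ids))
    marks = ", ".join("?" for _ in ids)
    sql = f"""
    SELECT m.ID, m.Name
    FROM Movie m
    JOIN Movie_Genre mg ON mg.FK_MovieID = m.ID
                       AND mg.FK_GenreID IN ({marks})
    GROUP BY m.ID, m.Name
    HAVING COUNT(*) = ?
    ORDER BY m.ID
    """
    return conn.execute(sql, (*ids, len(ids))).fetchall()

print(duplicate_rejected())    # True
print(movies_in_all([1, 3]))   # [(1, 'M1')]
```

Without the composite key (or a unique index on the two columns), stick with the COUNT(DISTINCT ...) form.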