How to find transitively related (1 degree of separation) IDs in PostgreSQL? - sql

First time posting a question here, couldn't find the answer any other way.
I have a table for music bands
bands
id | name
and I have a musician table
musicians
id | first_name | last_name
I have created a third table, that links them via foreign keys
band_membership
band_id | musician_id
I have populated the bands table with a few bands and the musicians with a few musicians.
Then I linked musician John Doe (ID 1) to bands Foo (ID 1) and Bar (ID 2)
Then I linked musician Jane Doe (ID 2) to bands Foo (ID 1) and Rab (ID 3)
So these musicians share a band but also play in other bands separately.
The question is: How do I select all members of the band Foo, iterate through Foo's musicians, and SELECT all band names that are associated with Foo through its members? In this case I want the "input" band to be Foo and the SELECT result to be
1 | Bar
2 | Rab
Since these are the two bands that are directly (1 degree of separation) associated with Foo via the band members' other side projects/bands.
I know I can select all the IDs of Foo's members (Jane and John) via the following query
SELECT m.id FROM musicians AS m
INNER JOIN band_membership AS bm ON m.id = bm.musician_id
INNER JOIN bands AS b ON bm.band_id = (SELECT id FROM bands WHERE name = 'Foo')
GROUP BY m.id
I also know I can find all the bands John Doe is a member of via the following query
SELECT b.name FROM bands AS b
INNER JOIN band_membership AS bm ON b.id = bm.band_id
INNER JOIN musicians as m ON bm.musician_id = 1
GROUP BY b.name
But for the life of me I cannot find a way to combine these.
The first query will return
1 | 1
2 | 2
which are John and Jane's IDs, but how do I "plug" them into the second query, where the 1 is currently hardcoded?
Thank you for your help!

You can just use multiple joins. The last condition keeps Foo itself out of the result:
select distinct b2.name
from bands b
join band_membership bm on bm.band_id = b.id
join band_membership bm2 on bm2.musician_id = bm.musician_id
join bands b2 on b2.id = bm2.band_id
where b.name = 'Foo'
  and b2.id <> b.id
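To see the self-join at work, here is a minimal sketch that runs it against the sample data from the question. It uses Python's built-in sqlite3 module instead of PostgreSQL purely so it can be run without a server. The extra `b2.id <> b.id` condition is an assumption based on the desired output, which lists only Bar and Rab and not Foo itself.

```python
import sqlite3

# In-memory database holding the sample data described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bands (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE musicians (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE band_membership (
    band_id INTEGER REFERENCES bands(id),
    musician_id INTEGER REFERENCES musicians(id)
);
INSERT INTO bands VALUES (1, 'Foo'), (2, 'Bar'), (3, 'Rab');
INSERT INTO musicians VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Doe');
INSERT INTO band_membership VALUES (1, 1), (2, 1), (1, 2), (3, 2);
""")

# bm finds the members of the input band; bm2 finds every band those
# members play in; the last condition drops the input band itself.
query = """
SELECT DISTINCT b2.id, b2.name
FROM bands b
JOIN band_membership bm  ON bm.band_id = b.id
JOIN band_membership bm2 ON bm2.musician_id = bm.musician_id
JOIN bands b2            ON b2.id = bm2.band_id
WHERE b.name = ?
  AND b2.id <> b.id
ORDER BY b2.id
"""
related = conn.execute(query, ("Foo",)).fetchall()
print(related)
```

The band name is passed as a bound parameter, so the same query can be reused for any input band.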


Bad performance when joining two sets based on a

To better illustrate my problem picture the following data set that has Rooms that contain a "range" of animals. To represent the range, each animal is assigned a sequence number in a separate table. There are different animal types and the sequence is "reset" for each of them.
Table A
RoomId | StartAnimal | EndAnimal | GroupType
1 | Monkey | Bee | A
1 | Lion | Buffalo | A
2 | Ant | Frog | B
Table B
Animal | Sequence | Type
Monkey | 1 | A
Zebra | 2 | A
Bee | 3 | A
Turtle | 4 | A
Lion | 5 | A
Buffalo | 6 | A
Ant | 1 | B
Frog | 2 | B
Desired Output
Getting all the animals for each Room based on their Start-End entries, e.g.
RoomId | Animal
1 | Monkey
1 | Zebra
1 | Bee
1 | Lion
1 | Buffalo
2 | Ant
2 | Frog
I have been able to get the desired output by first creating a view where the rooms have their start and end sequence numbers, and then joining that view with the animal list by comparing the ranges.
The problem is that this is performing poorly in my real data set where there are around 10k rooms and around 340k animals. Is there a different (better) way to go about this that I'm not seeing?
Example fiddle I'm working with: https://dbfiddle.uk/RnagCTf0
The query I tried is
WITH fullAnimals AS (
SELECT DISTINCT(RoomId), a.[Animal], ta.[GroupType], a.[sequence] s1, ae.[sequence] s2
FROM [TableA] ta
LEFT JOIN [TableB] a ON a.[Animal] = ta.[StartAnimal] AND a.[Type] = ta.[GroupType]
LEFT JOIN [TableB] ae ON ae.[Animal] = ta.[EndAnimal] AND ae.[Type] = a.[Type]
)
SELECT DISTINCT(r.Id), Name, b.[Animal], b.[Type]
FROM [TableB] b
LEFT JOIN fullAnimals ON (b.[Sequence] >= s1 AND b.[Sequence] <= s2)
INNER JOIN [Rooms] r ON (r.[Id] = fullAnimals.[RoomId]) --this is a third table that has more data from the rooms
WHERE b.[Type] = fullAnimals.[GroupType]
Thanks!
One option, to remove the aggregations, is to use the following joins:
between TableA and TableB, to gather "a.StartAnimal" id
between TableA and TableB, to gather "a.EndAnimal" id
between TableB and the previous two TableBs, to gather only the rows that have b.Sequence between the sequence values of "a.StartAnimal" and "a.EndAnimal", on the matching "Type".
between Table A and Rooms, to gather room infos
SELECT r.*, b.Animal, b.Type
FROM TableA a
INNER JOIN TableB b1 ON a.StartAnimal = b1.Animal
INNER JOIN TableB b2 ON a.EndAnimal = b2.Animal
INNER JOIN TableB b ON b.Sequence BETWEEN b1.Sequence AND b2.Sequence
AND a.GroupType = b.Type
INNER JOIN Rooms r ON r.Id = a.roomId
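The range join can be checked against the sample data with a small sketch like the one below. It runs on SQLite through Python's sqlite3 module rather than on SQL Server, leaves out the Rooms table (its columns are not shown in the question), and matches the start and end animals on Type as well, as the original query did. The composite index on (Type, Sequence) is only an assumption about what might help the range lookup on a large table; it has not been measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (RoomId INTEGER, StartAnimal TEXT, EndAnimal TEXT, GroupType TEXT);
CREATE TABLE TableB (Animal TEXT, Sequence INTEGER, Type TEXT);
INSERT INTO TableA VALUES
    (1, 'Monkey', 'Bee', 'A'), (1, 'Lion', 'Buffalo', 'A'), (2, 'Ant', 'Frog', 'B');
INSERT INTO TableB VALUES
    ('Monkey', 1, 'A'), ('Zebra', 2, 'A'), ('Bee', 3, 'A'), ('Turtle', 4, 'A'),
    ('Lion', 5, 'A'), ('Buffalo', 6, 'A'), ('Ant', 1, 'B'), ('Frog', 2, 'B');
-- Hypothetical index: lets the range condition seek within one Type.
CREATE INDEX idx_tableb_type_seq ON TableB (Type, Sequence);
""")

# b1 and b2 resolve the start and end animals to sequence numbers;
# b then picks up every animal of the same type inside that range.
query = """
SELECT a.RoomId, b.Animal
FROM TableA a
JOIN TableB b1 ON b1.Animal = a.StartAnimal AND b1.Type = a.GroupType
JOIN TableB b2 ON b2.Animal = a.EndAnimal   AND b2.Type = a.GroupType
JOIN TableB b  ON b.Type = a.GroupType
              AND b.Sequence BETWEEN b1.Sequence AND b2.Sequence
ORDER BY a.RoomId, b.Sequence
"""
room_animals = conn.execute(query).fetchall()
for row in room_animals:
    print(row)
```

Turtle (sequence 4) falls between the two ranges of room 1, so it does not appear in the result.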

Would I need to use a Union here, a Join, or something else?

I can't figure out whether what I need here is a Join statement or a Union.
Pets
Id | name | color
1 | wiskers | grey
2 | midnight | black
3 | ralph | yellow
4 | Bob | brown
Shots table
Id | Rabbies | a123
2 | Yes | No
4 | No | No
Notes table
Id | Notes
4 | This pet is blind
2 | This pet has no owner
The result I'm looking for:
Id | Name | Color | Rabbies | A123 | Notes
1 | Wiskers | grey | Null | Null | Null
2 | midnight | black | Yes | No | This pet has no owner
......
I think you want left joins:
select p.*, s.rabbies, s.a123, n.notes
from pets p
left join shots s on s.id = p.id
left join notes n on n.id = p.id;
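A quick way to check the left joins is to load the sample rows and run the query. This sketch uses SQLite through Python's sqlite3 module and assumes, as in the question, that the Id in the shots and notes tables is the pet's Id. The column name `rabbies` is kept as spelled in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, color TEXT);
CREATE TABLE shots (id INTEGER, rabbies TEXT, a123 TEXT);
CREATE TABLE notes (id INTEGER, notes TEXT);
INSERT INTO pets VALUES
    (1, 'wiskers', 'grey'), (2, 'midnight', 'black'),
    (3, 'ralph', 'yellow'), (4, 'Bob', 'brown');
INSERT INTO shots VALUES (2, 'Yes', 'No'), (4, 'No', 'No');
INSERT INTO notes VALUES (4, 'This pet is blind'), (2, 'This pet has no owner');
""")

# LEFT JOIN keeps every pet; pets without shots or notes get NULL (None in Python).
pet_rows = conn.execute("""
SELECT p.id, p.name, p.color, s.rabbies, s.a123, n.notes
FROM pets p
LEFT JOIN shots s ON s.id = p.id
LEFT JOIN notes n ON n.id = p.id
ORDER BY p.id
""").fetchall()
for row in pet_rows:
    print(row)
```

With an inner join instead, pets 1 and 3 would disappear from the result because they have no matching rows in shots or notes.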
Joins and Unions are both used to combine data, and both could potentially be used here. However, I would recommend a Join: Joins combine columns from different tables, which seems to be what you want, while a Union stacks rows from queries that have the same columns. You want all columns included in a single row per pet ID.
https://www.codeproject.com/Articles/1068500/What-Is-the-Difference-Between-a-Join-and-a-UNION
Try that link for more information.
If the tables are set up like this, with a pet_id column pointing at the pet, join on pet_id rather than on each table's own id:
create table Shots (id serial primary key, Rabbies varchar, a123 varchar, pet_id int);
insert into shots (pet_id, Rabbies, a123) values (2, 'Yes','No'), (4, 'No','No');
create table notes (id serial primary key, notes varchar, pet_id int);
insert into notes (pet_id, notes) values (4, 'This pet is blind'), (2, 'This pet has no owner');
select p.id, p.name, p.color, s.rabbies, s.a123, n.notes
from pets p
left join shots s on p.id = s.pet_id
left join notes n on p.id = n.pet_id;

Nesting SELECT statements with duplicate entries and COUNT

I'm working with 3 tables: actors, films, and actor_film. Actors and films only have 2 fields: id (primary key) and name. Actor_film also has 2 fields, actor and film, which are both foreign keys representing actor and film ids, respectively. So if a film had 4 actors in it, there'd be 4 actor_film entries with the same film and 4 different actors.
My problem is that, given a certain actor's id, I'd like to return the actor id, actor name, film name, and the total number of actors in that film. However, the only actors that I want to show are ones that contain certain letters in their names.
Let me clear things up with an example. Say Tom Hanks is in only 2 movies, Forrest Gump and Saving Private Ryan, and I'm looking for actors in those 2 movies that have "Gary" or "Matt" in their names. Further suppose that there are 4 actors in Forrest Gump, and 5 in Saving Private Ryan. Then, the only thing I'd want to return would be (without the column names, of course)
actor id | actor name | film name | # actors
abcdefg | Gary Sinise | Forrest Gump | 4
hijklmn | Matt Damon | Saving Private Ryan | 5
opqrstu | Paul Giamatti | Saving Private Ryan | 5
Currently, I'm 75% of the way there by using:
SELECT actors.id, actors.name, films.name
FROM (
SELECT actor_film.film
FROM actor_film, actors
WHERE actor_film.actor = actors.id
) AS a, actor_film, actors, films
WHERE actor_film.film = a.film
AND actors.id = actor_film.actor
AND films.id = a.film;
This is returning stuff like:
arnie | Arnold Schwarzenegger | Around the World in 80 Days
arnie | Arnold Schwarzenegger | Around the World in 80 Days
for a film that has 2 actors in it. In other words, I can't pull out all the distinct actors in the movie, but get the proper count for it implicitly and not explicitly with COUNT.
Anyway, I think I'm looking for some kind of INNER JOIN or nested SELECT, but I'm new to SQLite3 and don't know how to bring these together. Any solutions would be great, and any explanations on top of that would be amazing as well.
You shouldn't use the old-style joins. They were already dated in '95; the explicit JOIN syntax, which makes left joins much clearer, has been standard since SQL-92.
I've noticed you also use plurals for your table names (e.g. "actors"). A common convention is to use the singular for the table name (e.g. "actor").
I use both of these suggestions below, and I also show you each step. Since you are new to SQL, I suggest you run the query for each step and look at the output to understand how everything works.
OK, let's take your problem step by step. First of all, to see each actor and the films they are in (your first 3 columns), do this:
SELECT a.id as actor_id, a.name as actor_name, f.name as film_name
FROM actor as a
JOIN actor_film af on a.id = af.actor
JOIN film as f on af.film = f.id
Your last column can be found with the following query:
SELECT af.film as film_id, count(*) as c
FROM actor_film as af
GROUP BY af.film
Now we just join them together
SELECT a.id as actor_id, a.name as actor_name, f.name as film_name, fc.c as num_actors
FROM actor as a
JOIN actor_film af on a.id = af.actor
JOIN film as f on af.film = f.id
JOIN (
SELECT af.film as film_id, count(*) as c
FROM actor_film as af
GROUP BY af.film
) as fc on af.film = fc.film_id
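If you want, you can add a filter for names that contain those strings:
WHERE a.name LIKE '%Gary%' OR a.name LIKE '%Matt%'
Depending on your platform, LIKE may be case-sensitive, in which case you might want
WHERE lower(a.name) LIKE '%gary%' OR lower(a.name) LIKE '%matt%'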
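Putting the steps together, a sketch along these lines also restricts the result to the films of one given actor, which the question asks for. It uses the table names from the question (actors, films, actor_film) and runs on SQLite via Python's sqlite3 module. The integer ids and the cast lists are made-up sample data, and The Martian is there only to show that films without the given actor are ignored. SQLite's LIKE is case-insensitive for ASCII letters by default, which is why "Giamatti" matches "Matt".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE films (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE actor_film (actor INTEGER REFERENCES actors(id),
                         film INTEGER REFERENCES films(id));
INSERT INTO actors VALUES
    (1, 'Tom Hanks'), (2, 'Gary Sinise'), (3, 'Robin Wright'), (4, 'Sally Field'),
    (5, 'Matt Damon'), (6, 'Paul Giamatti'), (7, 'Tom Sizemore'), (8, 'Vin Diesel');
INSERT INTO films VALUES (1, 'Forrest Gump'), (2, 'Saving Private Ryan'), (3, 'The Martian');
INSERT INTO actor_film VALUES
    (1, 1), (2, 1), (3, 1), (4, 1),
    (1, 2), (5, 2), (6, 2), (7, 2), (8, 2),
    (5, 3);
""")

query = """
SELECT a.id, a.name, f.name, fc.c
FROM actor_film given                      -- films the given actor appears in
JOIN actor_film af ON af.film = given.film -- every cast member of those films
JOIN actors a ON a.id = af.actor
JOIN films f ON f.id = af.film
JOIN (SELECT film, COUNT(*) AS c           -- number of actors per film
      FROM actor_film
      GROUP BY film) fc ON fc.film = af.film
WHERE given.actor = ?
  AND (a.name LIKE '%Gary%' OR a.name LIKE '%Matt%')
ORDER BY f.name, a.name
"""
tom_hanks_id = 1
costars = conn.execute(query, (tom_hanks_id,)).fetchall()
for row in costars:
    print(row)
```

The count comes from a separate grouped subquery, so filtering the outer rows by name does not change the number of actors reported per film.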

SQL : get one particular book owner

I have a Student table with columns like this:
| email (PK) | name |
I have a book table with columns as such:
| bookid(PK) | title |
I have a copy table which holds the copies of books people own
| emailofOwner(FK to student.email) | bookid(FK to book.bookid) |
A student can of course own multiple books. My aim is to find the names of students who own exactly 1 book, and that book has bookid = 3.
My attempt to get people who own only 1 book:
select c.emailofOwner
from copy c
group by c.emailofOwner
having count(*) = 1 ;
SELECT t1.name
FROM student t1
INNER JOIN
(
SELECT emailofOwner
FROM copy
GROUP BY emailofOwner
HAVING COUNT(DISTINCT bookid) = 1 AND MAX(bookid) = 3
) t2
ON t1.email = t2.emailofOwner
The above query uses a subquery to restrict to students who own one and only one book whose ID is 3. The subquery is identical to your attempt except that it adds the restriction that the max book ID is 3. In this case, since there will only be one book per retained group, this is simply checking the value of the book ID.
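As a quick check of the HAVING condition, the sketch below runs the query on a few made-up rows (the student names, e-mail addresses and book titles are invented for illustration) using SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (email TEXT PRIMARY KEY, name TEXT);
CREATE TABLE book (bookid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE copy (emailofOwner TEXT REFERENCES student(email),
                   bookid INTEGER REFERENCES book(bookid));
INSERT INTO student VALUES
    ('ann@example.com', 'Ann'), ('bob@example.com', 'Bob'), ('cy@example.com', 'Cy');
INSERT INTO book VALUES (1, 'Book One'), (2, 'Book Two'), (3, 'Book Three');
-- Ann owns only book 3, Bob owns books 3 and 1, Cy owns only book 2.
INSERT INTO copy VALUES
    ('ann@example.com', 3), ('bob@example.com', 3),
    ('bob@example.com', 1), ('cy@example.com', 2);
""")

# Keep owners with exactly one distinct book whose id is 3.
query = """
SELECT t1.name
FROM student t1
INNER JOIN (
    SELECT emailofOwner
    FROM copy
    GROUP BY emailofOwner
    HAVING COUNT(DISTINCT bookid) = 1 AND MAX(bookid) = 3
) t2 ON t1.email = t2.emailofOwner
ORDER BY t1.name
"""
only_book_3 = [name for (name,) in conn.execute(query)]
print(only_book_3)
```

Bob is excluded because he owns two different books, and Cy because his single book is not book 3.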
To get students with only one book:
select s.name, s.email, count(*) as numBooks
from student s
join copy c on s.email = c.emailOfOwner
group by s.name, s.email
having count(*) = 1
And people with book 3 and ONLY book 3:
select s.name, s.email, count(*) as numBooks
from student s
join copy c on s.email = c.emailOfOwner
group by s.name, s.email
having count(*) = 1 and min(bookId) = 3;

Complex SQL query for recommendations

Suppose the following tables :
User
----
id
name
Rating
------
userid
movieid
value
Movie
-----
id
title
movie_genre
-----------
movieid
genreid
genre
-----
id
value
I think the foreign keys are obvious here. The query I am looking for is the following :
Movie A is a movie of genre X that both me and some other user have rated with value 5. I want Movie B also of genre X that was rated a five by one of those users mentioned above.
And I seriously can't find it... The amount of joins isn't necessarily a problem btw, there can be plenty.
EDIT: In case I was unclear. The idea here is that people may have a similar taste in one genre, but a very different taste in another genre. I might like the movies that other people like who have the same taste in that specific genre.
This is what I think you're looking for:
Given a user John, find all movies B such that there exists a movie A, a user Simon, and a genre G where:
John rated movie A a 5
Simon rated movie A a 5
Simon is not John
Simon rated movie B a 5
movie A is of genre G
movie B is of genre G
Phrased this way, I think it's pretty easy to come up with a query:
select B.*
from user John
join rating JohnA on JohnA.userid = John.id and JohnA.value = 5
join movie A on A.id = JohnA.movieid
join rating ASimon on ASimon.movieid = A.id and ASimon.value = 5
join user Simon on Simon.id = ASimon.userid and Simon.id <> John.id
join rating SimonB on SimonB.userid = Simon.id and SimonB.value =5
join movie B on B.id = SimonB.movieid
join movie_genre Agenre on Agenre.movieid = A.id
join genre G on G.id = Agenre.genreid
join movie_genre Bgenre on Bgenre.genreid = G.id and Bgenre.movieid = B.id
Your database of choice would probably optimize this to remove some joins, but we only really need the relationships (ratings) and not the intermediate objects (movie A, user Simon, and genre G):
select B.*
from user John
join rating JohnA on JohnA.userid = John.id and JohnA.value = 5
join rating ASimon on ASimon.movieid = JohnA.movieid and ASimon.value = 5 and ASimon.userid <> John.id
join rating SimonB on SimonB.userid = ASimon.userid and SimonB.value =5
join movie B on B.id = SimonB.movieid
join movie_genre Agenre on Agenre.movieid = JohnA.movieid
join movie_genre Bgenre on Bgenre.genreid = Agenre.genreid and Bgenre.movieid = B.id
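The shortened join chain can be tried out on a tiny data set. The sketch below uses SQLite through Python's sqlite3 module; the users, movies, genres and ratings are invented for illustration. Two things are assumptions added on top of the answer: a WHERE clause that picks John by name, and a `B.id <> JohnA.movieid` condition so that movie A is not recommended back to John. The table name is quoted as "user" because some databases treat that word as reserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE genre (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE rating (userid INTEGER, movieid INTEGER, value INTEGER);
CREATE TABLE movie_genre (movieid INTEGER, genreid INTEGER);
INSERT INTO "user" VALUES (1, 'John'), (2, 'Simon');
INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B'), (3, 'Movie C');
INSERT INTO genre VALUES (1, 'X'), (2, 'Y');
-- Movies A and B share genre X; movie C is genre Y.
INSERT INTO movie_genre VALUES (1, 1), (2, 1), (3, 2);
-- John and Simon both gave A a 5; Simon also gave B and C a 5.
INSERT INTO rating VALUES (1, 1, 5), (2, 1, 5), (2, 2, 5), (2, 3, 5);
""")

query = """
SELECT DISTINCT B.id, B.title
FROM "user" John
JOIN rating JohnA  ON JohnA.userid = John.id AND JohnA.value = 5
JOIN rating ASimon ON ASimon.movieid = JohnA.movieid AND ASimon.value = 5
                  AND ASimon.userid <> John.id
JOIN rating SimonB ON SimonB.userid = ASimon.userid AND SimonB.value = 5
JOIN movie B ON B.id = SimonB.movieid
JOIN movie_genre Agenre ON Agenre.movieid = JohnA.movieid
JOIN movie_genre Bgenre ON Bgenre.genreid = Agenre.genreid
                       AND Bgenre.movieid = B.id
WHERE John.name = ?
  AND B.id <> JohnA.movieid   -- assumption: do not recommend movie A itself
ORDER BY B.id
"""
recommendations = conn.execute(query, ("John",)).fetchall()
print(recommendations)
```

Movie C is left out even though Simon rated it 5, because it shares no genre with the movie that John and Simon both rated 5.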