Nesting SELECT statements with duplicate entries and COUNT - sql

I'm working with 3 tables: actors, films, and actor_film. Actors and films only have 2 fields: id (primary key) and name. Actor_film also has 2 fields, actor and film, which are both foreign keys representing actor and film ids, respectively. So if a film had 4 actors in it, there'd be 4 actor_film entries with the same film and 4 different actors.
My problem is that, given a certain actor's id, I'd like to return the actor id, actor name, film name, and the total number of actors in that film. However, the only actors that I want to show are ones that contain certain letters in their names.
Let me clear things up with an example. Say Tom Hanks is in only 2 movies, Forrest Gump and Saving Private Ryan, and I'm looking for actors in those 2 movies that have "Gary" or "Matt" in their names. Further suppose that there are 4 actors in Forrest Gump, and 5 in Saving Private Ryan. Then, the only thing I'd want to return would be (without the column names, of course)
actor id | actor name | film name | # actors
abcdefg | Gary Sinise | Forrest Gump | 4
hijklmn | Matt Damon | Saving Private Ryan | 5
opqrstu | Paul Giamatti | Saving Private Ryan | 5
Currently, I'm 75% of the way there by using:
SELECT actors.id, actors.name, films.name
FROM (
SELECT actor_film.film
FROM actor_film, actors
WHERE actor_film.actor = actors.id
) AS a, actor_film, actors, films
WHERE actor_film.film = a.film
AND actors.id = actor_film.actor
AND films.id = a.film;
This is returning stuff like:
arnie | Arnold Schwarzenegger | Around the World in 80 Days
arnie | Arnold Schwarzenegger | Around the World in 80 Days
for a film that has 2 actors in it. In other words, I can't pull out the distinct actors in the movie; I only get the proper count implicitly (as the number of duplicate rows), not explicitly with COUNT.
Anyway, I think I'm looking for some kind of INNER JOIN or nested SELECT, but I'm new to SQLite3 and don't know how to bring these together. Any solutions would be great, and any explanations on top of that would be amazing as well.

You shouldn't use the old-style (comma) joins. They were already dated in the mid-'90s, after the SQL-92 standard introduced explicit JOIN syntax, which also makes left joins clearer.
I also noticed that you use plurals for your table names (e.g. "actors"). A common convention is to use the singular for table names (e.g. "actor").
I use both of these suggestions below and show each step. Since you are new to SQL, I suggest you run the query for each step and look at the output to understand how everything works.
OK, let's take your problem step by step. First, to see each actor and the films they are in (your first 3 columns), do this:
SELECT a.id as actor_id, a.name as actor_name, f.name as film_name
FROM actor as a
JOIN actor_film af on a.id = af.actor
JOIN film as f on af.film = f.id
Your last column can be found with the following query:
SELECT af.film as film_id, count(*) as c
FROM actor_film as af
GROUP BY af.film
Now we just join them together
SELECT a.id as actor_id, a.name as actor_name, f.name as film_name, fc.c as num_actors
FROM actor as a
JOIN actor_film af on a.id = af.actor
JOIN film as f on af.film = f.id
JOIN (
SELECT af.film as film_id, count(*) as c
FROM actor_film as af
GROUP BY af.film
) as fc on af.film = fc.film_id
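If you want, you can add a filter on the name. Since you want names that contain the text, use LIKE rather than =:
WHERE a.name LIKE '%Gary%' OR a.name LIKE '%Matt%'
Depending on your platform, you might need to make the match case-insensitive explicitly:
WHERE lower(a.name) LIKE '%gary%' OR lower(a.name) LIKE '%matt%'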
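To try the steps above end to end, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the ids ('tom', 'fg', ...), and the extra film "The Martian" are made up for illustration. The IN subquery that restricts results to the given actor's films is my addition to cover the "given a certain actor's id" part of the question; it is not part of the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE film (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE actor_film (actor TEXT REFERENCES actor(id),
                         film  TEXT REFERENCES film(id));
-- Hypothetical sample data, not the real casts.
INSERT INTO actor VALUES ('tom', 'Tom Hanks'), ('gary', 'Gary Sinise'),
                         ('matt', 'Matt Damon'), ('sally', 'Sally Field');
INSERT INTO film VALUES ('fg', 'Forrest Gump'), ('spr', 'Saving Private Ryan'),
                        ('tm', 'The Martian');
INSERT INTO actor_film VALUES ('tom', 'fg'), ('gary', 'fg'), ('sally', 'fg'),
                              ('tom', 'spr'), ('matt', 'spr'), ('matt', 'tm');
""")

query = """
SELECT a.id, a.name, f.name, fc.c
FROM actor AS a
JOIN actor_film AS af ON a.id = af.actor
JOIN film AS f ON af.film = f.id
JOIN (
    -- one row per film with its number of actors
    SELECT film AS film_id, count(*) AS c
    FROM actor_film
    GROUP BY film
) AS fc ON af.film = fc.film_id
WHERE (a.name LIKE '%Gary%' OR a.name LIKE '%Matt%')
  -- assumption: only films the given actor appears in
  AND af.film IN (SELECT film FROM actor_film WHERE actor = ?)
ORDER BY a.name
"""
rows = conn.execute(query, ("tom",)).fetchall()
for row in rows:
    print(row)
```

With this sample data, Matt Damon's row for "The Martian" is filtered out because Tom Hanks is not in that film, and each remaining row carries the actor count of its film.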

Related

How to make sure result pairs are unique - without using distinct?

I have three tables I want to iterate over. The tables are pretty big so I will show a small snippet of the tables. First table is Students:
id | name | address
1 | John Smith | New York
2 | Rebeka Jens | Miami
3 | Amira Sarty | Boston
Second one is TakingCourse. This is the course the students are taking, so student_id is the id of the one in Students.
id | student_id | course_id
20 | 1 | 26
19 | 2 | 27
18 | 3 | 28
Last table is Courses. The id is the same as the course_id in the previous table. These are the courses the students are following and looks like this:
id | type
26 | History
27 | Maths
28 | Science
I want to return a table with the location (address) and the type of courses that are taken there. So the results table should look like this:
address | type
The pairs should be unique, and that is what's going wrong. I tried this:
select S.address, C.type
from Students S, Courses C, TakingCourse TC
where TC.course_id = C.id
and S.id = TC.student_id
And this does work, but the pairs are not all unique. I tried select distinct and it's still the same.
Multiple students can (and will) reside at the same address. So don't expect unique results from this query.
Only an overview is needed, so that's why I don't want duplicates.
So fold duplicates. Simple way with DISTINCT:
SELECT DISTINCT s.address, c.type
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id;
Or to avoid DISTINCT (why would you for this task?) and, optionally, get counts, too:
SELECT c.type, s.address, count(*) AS ct
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id
GROUP BY c.type, s.address
ORDER BY c.type, s.address;
A missing UNIQUE constraint on takingcourse(student_id, course_id) could be an additional source of duplicates. See:
How to implement a many-to-many relationship in PostgreSQL?
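A small sketch of the two approaches, run against SQLite through Python's sqlite3 module so it is easy to try. The three students and courses come from the question; the fourth student (a second New York resident taking History) is a hypothetical row added only to produce a duplicate pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE courses (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE takingcourse (id INTEGER PRIMARY KEY,
                           student_id INTEGER, course_id INTEGER);
INSERT INTO students VALUES (1, 'John Smith', 'New York'),
                            (2, 'Rebeka Jens', 'Miami'),
                            (3, 'Amira Sarty', 'Boston'),
                            (4, 'Extra Student', 'New York');  -- hypothetical
INSERT INTO courses VALUES (26, 'History'), (27, 'Maths'), (28, 'Science');
INSERT INTO takingcourse VALUES (20, 1, 26), (19, 2, 27), (18, 3, 28),
                                (17, 4, 26);                   -- hypothetical
""")

joins = """
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id
"""

# Plain join: one row per enrolment, so (New York, History) shows up twice.
plain = conn.execute("SELECT s.address, c.type " + joins).fetchall()

# DISTINCT folds identical (address, type) pairs into one row.
distinct = conn.execute(
    "SELECT DISTINCT s.address, c.type " + joins + " ORDER BY s.address"
).fetchall()

# GROUP BY also folds them and can report how many rows were folded.
grouped = conn.execute(
    "SELECT c.type, s.address, count(*) AS ct " + joins +
    " GROUP BY c.type, s.address ORDER BY c.type, s.address"
).fetchall()

print(plain)
print(distinct)
print(grouped)
```

If DISTINCT still appears to return duplicates on real data, the values probably differ in ways that are hard to see (trailing spaces, letter case), which is worth checking before anything else.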

How to find transitively related (1 degree of separation) IDs in PostgreSQL?

First time posting a question here, couldn't find the answer any other way.
I have a table for music bands
bands
id | name
and I have a musician table
musicians
id | first_name | last_name
I have created a third table, that links them via foreign keys
band_membership
band_id | musician_id
I have populated the bands table with a few bands and the musicians with a few musicians.
Then I linked musician John Doe (ID 1) to bands Foo (ID 1) and Bar (ID 2)
Then I linked musician Jane Doe (ID 2) to bands Foo (ID 1) and Rab (ID 3)
So these musicians share a band but also play in other bands separately.
The question is: How do I select all members of the band Foo, iterate through Foo's musicians, and SELECT all band names that are associated with Foo through its members? In this case I want the "input" band to be Foo and the SELECT result to be
1 | Bar
2 | Rab
since these are the two bands that are directly (1 degree of separation) associated with Foo via the band members' other side projects/bands.
I know I can select all the IDs of Foo's members (Jane and John) via the following query
SELECT m.id FROM musicians AS m
INNER JOIN band_membership AS bm ON m.id = bm.musician_id
INNER JOIN bands AS b ON bm.band_id = (SELECT id FROM bands WHERE name = 'Foo')
GROUP BY m.id
I also know I can find all the bands John Doe is a member of via the following query
SELECT b.name FROM bands AS b
INNER JOIN band_membership AS bm ON b.id = bm.band_id
INNER JOIN musicians as m ON bm.musician_id = 1
GROUP BY b.name
But for the life of me I cannot find a way to combine these.
The first query will return
1 | 1
2 | 2
which are John and Jane's IDs but how do I "plug" them into the second query, where the 1 is currently hardcoded.
Thank you for your help!
You can just use multiple joins:
select distinct b2.name
from bands b join
band_membership bm
on bm.band_id = b.id join
band_membership bm2
on bm2.musician_id = bm.musician_id join
bands b2
on bm2.band_id = b2.id
where b.name = 'Foo' and b2.id <> b.id
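Here is a runnable sketch of the double join through band_membership, using Python's sqlite3 module. Foo, Bar, Rab, John, and Jane come from the question; the band "Zed" and the musician "Sam Roe" are hypothetical rows added to show that unrelated bands stay out of the result. The condition b2.id <> b.id is an addition on my part that keeps Foo itself out of the output, as the expected result in the question suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bands (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE musicians (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE band_membership (band_id INTEGER REFERENCES bands(id),
                              musician_id INTEGER REFERENCES musicians(id));
INSERT INTO bands VALUES (1, 'Foo'), (2, 'Bar'), (3, 'Rab'),
                         (4, 'Zed');            -- hypothetical, unrelated band
INSERT INTO musicians VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Doe'),
                             (3, 'Sam', 'Roe');  -- hypothetical
INSERT INTO band_membership VALUES (1, 1), (2, 1),  -- John: Foo, Bar
                                   (1, 2), (3, 2),  -- Jane: Foo, Rab
                                   (4, 3);          -- Sam: Zed only
""")

query = """
SELECT DISTINCT b2.id, b2.name
FROM bands AS b
JOIN band_membership AS bm  ON bm.band_id = b.id               -- members of the input band
JOIN band_membership AS bm2 ON bm2.musician_id = bm.musician_id -- their memberships
JOIN bands AS b2            ON bm2.band_id = b2.id              -- the bands behind those
WHERE b.name = ?
  AND b2.id <> b.id   -- assumption: leave the input band out of the result
ORDER BY b2.id
"""
related = conn.execute(query, ("Foo",)).fetchall()
print(related)
```

The idea is that the membership table is joined to itself on musician_id: the first copy finds the members of the input band, the second copy finds every band those members belong to.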

Postgresql join tables to several columns

I have a list of students' name called table Names and I want to find their categories from another table called Categories as below:
Class_A Class_B Class_C Class_D Category
Sam Adam High
Sarah Medium
James High
Emma Simon Nick Low
My solution is to do a left join, but a student's name from the first table should match any one of the four class columns, so I am not sure how to write the query. At the moment my query only matches on Class_A, while I need to check all four columns and, if the student's name exists in any of them, return the category.
(Note: some rows have more than one student's name)
SELECT Names.name, Categories.Category
FROM Names
LEFT JOIN Categories ON Names.name = Categories.Class_A;
Table Names looks like this:
Name
----
Emma
Nick
James
Adam
Jack
Sarah
And I am expecting an output as below:
Name Category
---- ----
Emma Low
Nick Low
James High
Adam High
Jack -
Sarah Medium
I would be inclined to unpivot the first table. This looks like:
select n.name, c.category
from names n left join
(categories c cross join lateral
(values (c.class_a), (c.class_b), (c.class_c), (c.class_d)
) v(name)
)
on n.name = v.name;
Although you can also solve this using in (or or) in the on clause, that may produce a much less efficient execution plan.
Try this using OR in on clause:
SELECT Names.name, coalesce(Categories.Category,'-') as category
FROM Names
LEFT JOIN Categories ON Names.name = Categories.Class_A or Names.name = Categories.Class_B or Names.name = Categories.Class_C or Names.name = Categories.Class_D
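For anyone who wants to experiment without a PostgreSQL server, here is a sketch of the unpivot idea in SQLite via Python's sqlite3 module. SQLite has no LATERAL, so this swaps in a UNION ALL subquery to turn the four class columns into rows; that is a substitute technique, not the query above. Which class column each name sits in is an assumption, since the layout of the Categories table in the question is ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names (name TEXT);
CREATE TABLE categories (class_a TEXT, class_b TEXT, class_c TEXT,
                         class_d TEXT, category TEXT);
INSERT INTO names VALUES ('Emma'), ('Nick'), ('James'), ('Adam'),
                         ('Jack'), ('Sarah');
-- Column placement below is assumed for illustration.
INSERT INTO categories VALUES
    ('Sam',   'Adam',  NULL,   NULL, 'High'),
    ('Sarah', NULL,    NULL,   NULL, 'Medium'),
    (NULL,    'James', NULL,   NULL, 'High'),
    ('Emma',  'Simon', 'Nick', NULL, 'Low');
""")

query = """
SELECT n.name, coalesce(u.category, '-') AS category
FROM names AS n
LEFT JOIN (
    -- unpivot: one (name, category) row per class column
    SELECT class_a AS name, category FROM categories
    UNION ALL SELECT class_b, category FROM categories
    UNION ALL SELECT class_c, category FROM categories
    UNION ALL SELECT class_d, category FROM categories
) AS u ON n.name = u.name
ORDER BY n.name
"""
result = dict(conn.execute(query).fetchall())
print(result)
```

NULL cells in the unpivoted rows never satisfy n.name = u.name, so they need no separate filter, and the LEFT JOIN keeps students such as Jack who appear in no class.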

How to find all the pairs of tuples that agree on a certain attribute

I am trying to write a query in db2 for a database that has books and the customers who bought them and I am to find the pairs of customers who bought common books.
Say for example the DB is called "DB" and it looks like this
CustomerID Book Cost
1 Harry Potter 12
2 SOUE 6
3 Harry Potter 12
4 Harry Potter 12
5 SOUE 6
6 SOUE 6
I am basically trying to get the resulting table look like
Customer1 Customer2
1 3
1 4
2 5
2 6
I have tried using GROUP BY, but I can't seem to get the idea right.
I've tried
Select book
from DB
group by book
which uniquely gives me all the books but I don't know how I would go about getting the customer pairs. Any help would be greatly appreciated thank you.
I'd self-join according to the book column. In order to avoid conceptual duplicates (e.g., 1-3 and 3-1), you could make an arbitrary decision to always display the lower customer ID on the left:
SELECT DISTINCT a.customerid, b.customerid
FROM mytable a
JOIN mytable b ON a.book = b.book AND a.customerid < b.customerid
EDIT:
To answer the question in the comments, if you want to display customer names instead of ids, you'd need to join the customers table to this query, twice, once for each column:
SELECT DISTINCT ca.name AS customer1, cb.name AS customer2
FROM purchases pa
JOIN purchases pb ON pa.book = pb.book AND pa.customerid < pb.customerid
JOIN customers ca ON pa.customerid = ca.id
JOIN customers cb ON pb.customerid = cb.id
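A quick way to check the self-join is to load the sample rows from the question into SQLite with Python's sqlite3 module. The table name purchases and the lowercase column names are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (customerid INTEGER, book TEXT, cost INTEGER);
INSERT INTO purchases VALUES (1, 'Harry Potter', 12), (2, 'SOUE', 6),
                             (3, 'Harry Potter', 12), (4, 'Harry Potter', 12),
                             (5, 'SOUE', 6), (6, 'SOUE', 6);
""")

query = """
SELECT DISTINCT a.customerid AS customer1, b.customerid AS customer2
FROM purchases AS a
JOIN purchases AS b
  ON a.book = b.book
 AND a.customerid < b.customerid   -- keeps 1-3, drops 3-1 and 1-1
ORDER BY customer1, customer2
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

Note that this returns every pair that shares a book, so 3-4 and 5-6 are included as well, which is two more pairs than the expected table in the question lists.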

SQL can't group by... Options?

I have the next tables:
involved_in represents a relation between a movie and a person who worked in it:
FID AID JOB
---------- ---------- -----------------------------------
2387816 226673 actor
2146284 230306 actor
1814529 233362 actor
2146710 275818 actor
2033140 324419 actor
2387816 452297 actor
1749641 522815 actor
2379685 972581 actor
2384487 1001930 actor
2065098 1021573 actor
is_a represents a relation between two movies as in movie a is a prequel to movie b:
MOVID1 MOVID2 REL_ID
---------- ---------- ----------
2455766 1858631 2
2465356 716238 12
2465467 1005316 2
2465585 2046499 1
2465793 1992318 6
2465793 2144984 5
2467514 1984530 15
In other tables I can get titles and names for the id's used above.
I want to find the actor-director pairs that have worked together more than x times in movies that are not related, e.g. Johnny Depp has worked with Tim Burton in movies that are not related.
The problem comes with the x times and my really small database account which won't let me have big enough temp tables.
I can:
create view friends as
(select actor, director, film, count(*) over (PARTITION BY actor, director) as together
from
(select a.aid as actor, b.aid as director, a.fid as film
from involved_in a, involved_in b
where a.fid=b.fid AND (a.job='actor' or a.job='actress') AND b.job='director'));
And that will give me every actor-director pair, every film they've worked in together and how many times they have worked together.
The view is too big so I could start by removing all those pairs that have worked less than x times together. Using group by actor, director gets me an error in film (not a group by expression).
Is there any way to limit the rows that appear with count less than x? I've also tried
having count(...) > x
It would be perfect if I could count(actor, director) but that's not the syntax of course since it would be convenient.
After getting my friends view I'm using this query:
select f1.actor, f1.director
from friends f1, friends f2, is_a
where f1.actor = f2.director and f2.actor = f1.director and NOT (f1.film = movid1 and f2.film = movid2);
I don't use JOIN ON and such because my teacher said they were redundant though I do think it looks better so maybe I will use them eventually.
Any ideas?
I suggest the following query, with some major tweaks to yours and removed redundancy.
Use explicit JOINs for better readability, and IN clause to save some space in code. Planner will translate this clause anyways.
CREATE VIEW friends AS
SELECT
actor, director, film, num_together
FROM(
SELECT
a.aid AS actor,
b.aid AS director,
a.fid AS film,
COUNT(*) OVER (PARTITION BY a.aid, b.aid) AS num_together
FROM
involved_in a
INNER JOIN involved_in b ON
a.fid = b.fid
WHERE
a.job IN ('actor', 'actress')
AND b.job = 'director'
) foo
WHERE
num_together > x -- replace x with your threshold; keeps only pairs that worked together more than x times
Note that this view may be misleading: num_together is the total number of films the actor and director worked on together, yet it is repeated next to every film they share.
You already have nested select statements, just add another one:
create view friends as
(
SELECT actor, director, film, together
FROM (
select actor, director, film, count(*) over (PARTITION BY actor, director) as together
from
(
select a.aid as actor, b.aid as director, a.fid as film
from involved_in a
INNER JOIN involved_in b ON(a.fid=b.fid)
WHERE (a.job='actor' or a.job='actress')
AND b.job='director'
) InnerMostQuery
) MiddleQuery
WHERE together > x -- Replace x with whatever number that makes you happy :-)
);
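If only the pairs and their counts are needed, a plain GROUP BY with HAVING is another option: leaving film out of the select list avoids the "not a group by expression" error from the question. This is a different technique from the window-function views above, sketched here in SQLite through Python's sqlite3 module. All ids and rows are made up for illustration, and the column names follow the involved_in table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE involved_in (fid INTEGER, aid INTEGER, job TEXT);
-- Hypothetical data: actor 10 and director 20 share films 1, 2 and 3.
INSERT INTO involved_in VALUES
    (1, 10, 'actor'), (1, 11, 'actress'), (1, 20, 'director'),
    (2, 10, 'actor'), (2, 20, 'director'),
    (3, 10, 'actor'), (3, 20, 'director'),
    (4, 12, 'actor'), (4, 21, 'director');
""")

query = """
SELECT a.aid AS actor, b.aid AS director, count(*) AS together
FROM involved_in AS a
JOIN involved_in AS b ON a.fid = b.fid
WHERE a.job IN ('actor', 'actress')
  AND b.job = 'director'
GROUP BY a.aid, b.aid      -- one row per actor-director pair
HAVING count(*) > ?        -- keep pairs with more than x shared films
ORDER BY actor, director
"""
x = 2
frequent_pairs = conn.execute(query, (x,)).fetchall()
print(frequent_pairs)
```

The result has one row per pair, so it should be much smaller than a view with one row per pair and film. If the individual films are still needed for the relatedness check, the pairs can be joined back to involved_in afterwards.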