SQL can't group by... Options?

I have the next tables:
involved_in represents a relation between a movie and a person who worked in it:
FID AID JOB
---------- ---------- -----------------------------------
2387816 226673 actor
2146284 230306 actor
1814529 233362 actor
2146710 275818 actor
2033140 324419 actor
2387816 452297 actor
1749641 522815 actor
2379685 972581 actor
2384487 1001930 actor
2065098 1021573 actor
is_a represents a relation between two movies as in movie a is a prequel to movie b:
MOVID1 MOVID2 REL_ID
---------- ---------- ----------
2455766 1858631 2
2465356 716238 12
2465467 1005316 2
2465585 2046499 1
2465793 1992318 6
2465793 2144984 5
2467514 1984530 15
In other tables I can get titles and names for the id's used above.
I want to find the actor-director pairs that have worked together more than x times in movies that are not related to each other. For example, Johnny Depp has worked with Tim Burton in several movies that are not related.
The problem is the "x times" condition, combined with my really small database account, which won't let me have big enough temp tables.
I can:
create view friends as
(select actor, director, film, count(*) over (PARTITION BY actor, director) as together
from
(select a.aid as actor, b.aid as director, a.fid as film
from involved_in a, involved_in b
where a.fid=b.fid AND (a.job='actor' or a.job='actress') AND b.job='director'));
And that will give me every actor-director pair, every film they've worked in together and how many times they have worked together.
The view is too big, so I could start by removing all the pairs that have worked together fewer than x times. Using group by actor, director gives me an error on film (not a GROUP BY expression).
Is there any way to exclude the rows whose count is less than x? I've also tried
having count(...) > x
It would be perfect if I could write count(actor, director), but of course that's not valid syntax.
After getting my friends view I'm using this query:
select f1.actor, f1.director
from friends f1, friends f2, is_a
where f1.actor = f2.director and f2.actor = f1.director and NOT (f1.film = movid1 and f2.film = movid2);
I don't use JOIN ON and such because my teacher said they were redundant though I do think it looks better so maybe I will use them eventually.
Any ideas?

I suggest the following query, with some major tweaks to yours and removed redundancy.
Use explicit JOINs for better readability, and an IN clause to shorten the code. The planner will treat it the same as the OR conditions anyway.
CREATE VIEW friends AS
SELECT
actor, director, film, num_together
FROM(
SELECT
a.aid AS actor,
b.aid AS director,
a.fid AS film,
COUNT(*) OVER (PARTITION BY a.aid, b.aid) AS num_together
FROM
involved_in a
INNER JOIN involved_in b ON
a.fid = b.fid
WHERE
a.job IN ('actor', 'actress')
AND b.job = 'director'
) foo
WHERE
num_together > x -- placeholder: replace x with your threshold; this drops the pairs that worked together x times or fewer
Note that this view may be misleading: num_together is the number of times the actor and director worked together, but it is repeated next to every film they worked on together.

You already have nested select statements, just add another one:
create view friends as
(
SELECT actor, director, film, together
FROM (
select actor, director, film, count(*) over (PARTITION BY actor, director) as together
from
(
select a.aid as actor, b.aid as director, a.fid as film
from involved_in a
INNER JOIN involved_in b ON(a.fid=b.fid)
WHERE (a.job='actor' or a.job='actress')
AND b.job='director'
) InnerMostQuery
) MiddleQuery
WHERE together > x -- Replace x with whatever number that makes you happy :-)
);
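As an alternative to the window function, the threshold can also be applied with GROUP BY and HAVING in a subquery that is joined back to the film rows. This is only a minimal sketch, not the asker's setup: it runs against an in-memory SQLite database through Python's sqlite3 module, and the function name frequent_pairs and the sample rows are invented for illustration.

```python
import sqlite3

def frequent_pairs(rows, x):
    """Return (actor, director, film) rows for pairs with more than x shared films."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE involved_in (fid INTEGER, aid INTEGER, job TEXT)")
    con.executemany("INSERT INTO involved_in VALUES (?, ?, ?)", rows)
    query = """
        SELECT a.aid AS actor, b.aid AS director, a.fid AS film
        FROM involved_in a
        JOIN involved_in b ON a.fid = b.fid
        JOIN (
            -- one row per actor-director pair that passes the threshold
            SELECT a2.aid AS actor, b2.aid AS director
            FROM involved_in a2
            JOIN involved_in b2 ON a2.fid = b2.fid
            WHERE a2.job IN ('actor', 'actress') AND b2.job = 'director'
            GROUP BY a2.aid, b2.aid
            HAVING COUNT(*) > ?
        ) p ON p.actor = a.aid AND p.director = b.aid
        WHERE a.job IN ('actor', 'actress') AND b.job = 'director'
        ORDER BY actor, director, film
    """
    result = con.execute(query, (x,)).fetchall()
    con.close()
    return result

# Invented sample data: actor 10 worked twice with director 20, once with director 30.
sample = [
    (1, 10, 'actor'), (1, 20, 'director'),
    (2, 10, 'actor'), (2, 20, 'director'),
    (3, 10, 'actor'), (3, 11, 'actress'), (3, 30, 'director'),
]
print(frequent_pairs(sample, 1))
```

With the threshold x = 1, only the pair (10, 20) survives, once per film they share. Whether this is cheaper than the window function on a small account depends on the database, so treat it as something to try rather than a guaranteed improvement.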

Related

How to make sure result pairs are unique - without using distinct?

I have three tables I want to iterate over. The tables are pretty big so I will show a small snippet of the tables. First table is Students:
id  name         address
--  -----------  --------
1   John Smith   New York
2   Rebeka Jens  Miami
3   Amira Sarty  Boston
Second one is TakingCourse. This is the course the students are taking, so student_id is the id of the one in Students.
id  student_id  course_id
--  ----------  ---------
20  1           26
19  2           27
18  3           28
Last table is Courses. The id is the same as the course_id in the previous table. These are the courses the students are following and looks like this:
id  type
--  -------
26  History
27  Maths
28  Science
I want to return a table with the location (address) and the type of courses that are taken there. So the results table should look like this:
address  type
-------  ----
The pairs should be unique, and that is what's going wrong. I tried this:
select S.address, C.type
from Students S, Courses C, TakingCourse TC
where TC.course_id = C.id
and S.id = TC.student_id
And this does work, but the pairs are not all unique. I tried select distinct and it's still the same.
Multiple students can (and will) reside at the same address. So don't expect unique results from this query.
Only an overview is needed, so that's why I don't want duplicates.
So fold duplicates. Simple way with DISTINCT:
SELECT DISTINCT s.address, c.type
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id;
Or to avoid DISTINCT (why would you for this task?) and, optionally, get counts, too:
SELECT c.type, s.address, count(*) AS ct
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id
GROUP BY c.type, s.address
ORDER BY c.type, s.address;
A missing UNIQUE constraint on takingcourse(student_id, course_id) could be an additional source of duplicates. See:
How to implement a many-to-many relationship in PostgreSQL?
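To see how the duplicates arise and how both variants fold them, here is a small sketch using Python's sqlite3 module with an in-memory database. The first three students come from the question; the fourth student (a second New York resident taking History) and the corresponding takingcourse row are made up to force a duplicate pair.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE takingcourse (id INTEGER PRIMARY KEY, student_id INTEGER, course_id INTEGER);
    INSERT INTO students VALUES
        (1, 'John Smith', 'New York'), (2, 'Rebeka Jens', 'Miami'),
        (3, 'Amira Sarty', 'Boston'), (4, 'Hypothetical Student', 'New York');
    INSERT INTO courses VALUES (26, 'History'), (27, 'Maths'), (28, 'Science');
    INSERT INTO takingcourse VALUES (20, 1, 26), (19, 2, 27), (18, 3, 28), (17, 4, 26);
""")

joins = """
    FROM students s
    JOIN takingcourse t ON s.id = t.student_id
    JOIN courses c ON t.course_id = c.id
"""

# Plain join: one row per enrolment, so (New York, History) appears twice.
plain = con.execute(
    "SELECT s.address, c.type" + joins + "ORDER BY s.address, c.type"
).fetchall()

# DISTINCT folds identical (address, type) pairs into one row.
distinct_pairs = con.execute(
    "SELECT DISTINCT s.address, c.type" + joins + "ORDER BY s.address, c.type"
).fetchall()

# GROUP BY folds them too and can report how many enrolments each pair covers.
grouped = con.execute(
    "SELECT s.address, c.type, COUNT(*)" + joins
    + "GROUP BY s.address, c.type ORDER BY s.address, c.type"
).fetchall()

print(plain)
print(distinct_pairs)
print(grouped)
```

If DISTINCT still seems to return duplicates on real data, the values probably differ in something invisible such as trailing spaces or letter case, which is worth checking before changing the query.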

At most one query SQL

I'm doing some exercises in a basic SQL and I have the following problem:
Let us consider the following relational schema about actors and films:
Film: FilmCode 🔑, Title, Filmmaker, Year
Actor: ActorCode 🔑, Surname, Name, Sex, BirthDate, Nationality
Interpretation: Film, Actor, Role
Let us assume that more than one actor may act in a film, that the same actor may act in more than one film, and that in a given film each involved actor plays only one role.
Let us also assume that each filmmaker is uniquely identified by his/her surname and that each film is directed by a single filmmaker.
The query I have to write is:
the actors that acted together in at most one film directed by the filmmaker Quentin Tarantino.
How can I translate the "at most one film" into SQL language?
What I have written so far is:
SELECT DISTINCT A1.ActorCode, A2.ActorCode
FROM Actor A1 A2, Interpretation I1 I2
WHERE I1.Film=I2.Film and I1.Actor <> I2.Actor and A1.ActorCode= I1.Actor and A2.ActorCode=I2.ActorCode and exists unique (
Select *
From Film F
Where I1.Film=F.FilmCode and F.Filmmaker=’Tarantino’
)
But that's not the point
SELECT
I.Actor, I2.Actor
FROM Interpretation I
INNER JOIN Film F
ON F.Filmmaker='Tarantino'
and F.FilmCode = I.Film
INNER JOIN Interpretation I2
ON I.Film = I2.Film
and I2.Actor <> I.Actor
and I2.Actor not in (SELECT I4.Actor -- co-actors of I.Actor in the other Tarantino films
FROM Interpretation I3
INNER JOIN Film F1
ON F1.Filmmaker='Tarantino'
and F1.FilmCode = I3.Film
and F1.FilmCode <> I.Film -- all other Tarantino films
INNER JOIN Interpretation I4
ON I4.Film = I3.Film
WHERE I3.Actor = I.Actor)
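Another way to express "at most one film" is to count the shared films per pair and filter with HAVING. The sketch below is one possible reading, not the official solution of the exercise: it uses Python's sqlite3 module, invented sample rows, and it only considers pairs that share at least one Tarantino film, so "at most one" effectively means "exactly one" here. Including pairs with zero shared films would need an outer join over all actor pairs. Because each actor plays only one role per film, COUNT(*) per pair equals the number of shared films.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Film (FilmCode INTEGER PRIMARY KEY, Title TEXT, Filmmaker TEXT, Year INTEGER);
    CREATE TABLE Interpretation (Film INTEGER, Actor TEXT, Role TEXT);
    -- Invented data: films 1 and 2 are by Tarantino, film 3 is not.
    INSERT INTO Film VALUES
        (1, 'Film One', 'Tarantino', 1994),
        (2, 'Film Two', 'Tarantino', 1997),
        (3, 'Film Three', 'Other', 2000);
    INSERT INTO Interpretation VALUES
        (1, 'A', 'r1'), (1, 'B', 'r2'), (1, 'C', 'r3'),
        (2, 'A', 'r4'), (2, 'B', 'r5'),
        (3, 'A', 'r6'), (3, 'D', 'r7');
""")

pairs = con.execute("""
    SELECT i1.Actor, i2.Actor
    FROM Interpretation i1
    JOIN Interpretation i2
      ON i1.Film = i2.Film
     AND i1.Actor < i2.Actor          -- each unordered pair once
    JOIN Film f ON f.FilmCode = i1.Film
    WHERE f.Filmmaker = 'Tarantino'
    GROUP BY i1.Actor, i2.Actor
    HAVING COUNT(*) <= 1              -- at most one shared Tarantino film
    ORDER BY i1.Actor, i2.Actor
""").fetchall()

print(pairs)
```

In this data A and B share two Tarantino films and are filtered out, while A and D only meet in a film by another filmmaker and never enter the grouping at all.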

How to count number of items in a specific type in a table in sql?

I have a table named Movie, with actors attribute. actors_type is specific and looks like this:
GEORGE.ACTOR_TYPE('Clint Eastwood', 'Christopher Carley', 'Bee Vang', 'Ahney Her')
ACTOR_TYPE is implemented as a varray(5) of varchar(20)
the query I tried to count the number of movies for each actor is :
select m.title, a.column_value, count(m.title)
from movie m, table(m.actors) a
group by m.title, a.column_value
order by a.COLUMN_VALUE
which gives me a count for each row(?), not the count of movies for each actor. The output is as below:
What I am trying to get is a list of the actors that acted in multiple movies, showing the movie title and the actor.
But when I add m.title to the select statement, it counts each row.
This is the other query I wrote:
select a.column_value, count(m.title)
from movie m, table(m.actors) a
having count(m.title) > 1
group by a.column_value
order by a.COLUMN_VALUE
and the result is:
I need to add the title to the output too, but when I add it, all the counts will be one, as the first table.
Movie Table:
There is no table for Actors, we create table for it via table(m.actors) a to access its items
You have to remove m.title from the GROUP BY and from the select list, because you want the count per actor. column_value is the pseudocolumn exposed by table(m.actors), so the varray is unnested in the FROM clause rather than joined on a shared column.
Try something like this:
select a.column_value, count(m.title)
from movie m, table(m.actors) a
group by a.column_value
order by a.column_value
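To list the titles as well, one option is to compute the per-actor count in a subquery and join it back to the unnested rows. SQLite has no varrays, so the following sketch assumes the collection has already been unnested into a hypothetical movie_actor(title, actor) table; in Oracle, the unnested movie m, table(m.actors) a expression would take the place of that table. The movie titles and the second movie are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical stand-in for the unnested varray: one row per (movie, actor).
    CREATE TABLE movie_actor (title TEXT, actor TEXT);
    INSERT INTO movie_actor VALUES
        ('Movie A', 'Clint Eastwood'), ('Movie A', 'Bee Vang'),
        ('Movie A', 'Ahney Her'),
        ('Movie B', 'Clint Eastwood');
""")

rows = con.execute("""
    SELECT ma.actor, ma.title
    FROM movie_actor ma
    JOIN (
        -- actors that appear in more than one movie
        SELECT actor
        FROM movie_actor
        GROUP BY actor
        HAVING COUNT(*) > 1
    ) multi ON multi.actor = ma.actor
    ORDER BY ma.actor, ma.title
""").fetchall()

print(rows)
```

The count is taken per actor only, so adding the title to the output no longer splits the groups into single rows.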

Nesting SELECT statements with duplicate entries and COUNT

I'm working with 3 tables: actors, films, and actor_film. Actors and films only have 2 fields: id (primary key) and name. Actor_film also has 2 fields, actor and film, which are both foreign keys representing actor and film ids, respectively. So if a film had 4 actors in it, there'd be 4 actor_film entries with the same film and 4 different actors.
My problem is that, given a certain actor's id, I'd like to return the actor id, actor name, film name, and the total number of actors in that film. However, the only actors that I want to show are ones that contain certain letters in their names.
Let me clear things up with an example. Say Tom Hanks is in only 2 movies, Forrest Gump and Saving Private Ryan, and I'm looking for actors in those 2 movies that have "Gary" or "Matt" in their names. Further suppose that there are 4 actors in Forrest Gump, and 5 in Saving Private Ryan. Then, the only thing I'd want to return would be (without the column names, of course)
actor id | actor name | film name | # actors
abcdefg | Gary Sinise | Forrest Gump | 4
hijklmn | Matt Damon | Saving Private Ryan | 5
opqrstu | Paul Giamatti | Saving Private Ryan | 5
Currently, I'm 75% of the way there by using:
SELECT actor.id, actors.name, films.name,
FROM (
SELECT actor_film.film
FROM actor_film, actors
WHERE actor_film.actor = actors.id
) AS a, actor_film, actors, films
WHERE actor_film.film = a.film
AND actors.id = actor_film.actor
AND films.id = a.film;
This is returning stuff like:
arnie | Arnold Schwarzenegger | Around the World in 80 Days
arnie | Arnold Schwarzenegger | Around the World in 80 Days
for a film that has 2 actors in it. In other words, I can't pull out all the distinct actors in the movie, but get the proper count for it implicitly and not explicitly with COUNT.
Anyway, I think I'm looking for some kind of INNER JOIN or nested SELECT, but I'm new to SQLite3 and don't know how to bring these together. Any solutions would be great, and any explanations on top of that would be amazing as well.
You shouldn't use the old-style comma joins. They were already old in '95: the explicit JOIN syntax, which makes left joins much clearer, has been part of the standard since SQL-92.
I've noticed you also use plurals for your table names (e.g. "actors"). A common convention is to use the singular for the table name (e.g. "actor").
I use both of these suggestions below, and I also show you each step. I suggest you run the query for each step and look at the output to understand how everything works, since you are new to SQL.
OK, let's take your problem step by step. First of all, to see each actor and the films they are in (your first 3 columns), do this:
SELECT a.id as actor_id, a.name as actor_name, f.name as film_name
FROM actor as a
JOIN actor_film af on a.id = af.actor
JOIN film as f on af.film = f.id
Your last column can be found with the following query:
SELECT af.film as film_id, count(*) as c
FROM actor_film as af
GROUP BY af.film
Now we just join them together
SELECT a.id as actor_id, a.name as actor_name, f.name as film_name, fc.c as num_actors
FROM actor as a
JOIN actor_film af on a.id = af.actor
JOIN film as f on af.film = f.id
JOIN (
SELECT af.film as film_id, count(*) as c
FROM actor_film as af
GROUP BY af.film
) as fc on af.film = fc.film_id
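If you want, you can add a filter on the name. Since you are looking for names that contain the text, use LIKE rather than =:
WHERE a.name LIKE '%Gary%' OR a.name LIKE '%Matt%'
Depending on your platform, LIKE may be case-sensitive, in which case you might want
WHERE lower(a.name) LIKE '%gary%' OR lower(a.name) LIKE '%matt%'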
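Putting the steps together, including the restriction to the films of one given actor that the question asks for, could look roughly like this. It is a sketch under assumptions: it uses Python's sqlite3 module, the singular table names from this answer, text ids such as 'hanks', and invented cast lists that are smaller than in the question, so each film counts 3 actors rather than 4 and 5. The name filter uses LIKE because the question asks for names that contain "Gary" or "Matt"; SQLite's LIKE is case-insensitive for ASCII, which is why "Giamatti" matches.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE actor (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE film (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE actor_film (actor TEXT, film TEXT);
    INSERT INTO actor VALUES
        ('hanks', 'Tom Hanks'), ('sinise', 'Gary Sinise'),
        ('damon', 'Matt Damon'), ('giamatti', 'Paul Giamatti'),
        ('extra1', 'Hypothetical Extra'), ('gary2', 'Gary Elsewhere');
    INSERT INTO film VALUES
        ('gump', 'Forrest Gump'), ('ryan', 'Saving Private Ryan'),
        ('other', 'Hypothetical Other Film');
    INSERT INTO actor_film VALUES
        ('hanks', 'gump'), ('sinise', 'gump'), ('extra1', 'gump'),
        ('hanks', 'ryan'), ('damon', 'ryan'), ('giamatti', 'ryan'),
        ('gary2', 'other');
""")

rows = con.execute("""
    SELECT a.id, a.name, f.name, fc.c
    FROM actor_film mine                      -- films of the given actor
    JOIN actor_film af ON af.film = mine.film -- everyone in those films
    JOIN actor a ON a.id = af.actor
    JOIN film f ON f.id = af.film
    JOIN (
        SELECT film AS film_id, COUNT(*) AS c -- total actors per film
        FROM actor_film
        GROUP BY film
    ) fc ON fc.film_id = af.film
    WHERE mine.actor = ?
      AND (a.name LIKE '%Gary%' OR a.name LIKE '%Matt%')
    ORDER BY f.name, a.name
""", ("hanks",)).fetchall()

for row in rows:
    print(row)
```

The made-up actor 'Gary Elsewhere' matches the name filter but is not returned, because his only film does not include the given actor.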

sql select actors play two more distinct role in the same movie

The question is to select the actors that played 2 or more distinct roles in the same movie.
I have 3 tables: actor (id, name), movie (id, name) and casts (aid, mid, role), where aid is the actor id and mid is the movie id.
I wrote a query like this:
select a.name
from actor a, movie m, casts c
where a.id = c.aid and m.id = casts.mid
group by (m.name)
having count(distinct role) > 2;
This didn't print the right result, and I don't see the problem with it.
Thanks for the help!
How about this? Was there any error trying to execute your query?
select actor.name from actor, casts, movie
where casts.aid =actor.id
and casts.mid = movie.id
group by movie.name, actor.name
having count(distinct role) >= 2
As the casts table appears to contain one row per actor, movie, and role (unless you are leaving out other columns), any time a single pair of unique values of aid and mid appears on more than one row, it means the actor played more than one role in that movie. Thus, there is no reason to use distinct. Also because your desired result doesn't contain the movie names, your query doesn't need and shouldn't use the movie table.
If it is true that the casts table has only one row for each unique combination of (aid, mid, role), then the following should work:
select name
from actor
where id in ( select aid
from casts
group by aid, mid
having count(*) > 1)
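The uniqueness assumption matters, and a small experiment shows why. The sketch below uses Python's sqlite3 module with invented rows: actor 1 plays two roles in one movie, actor 2 plays one role in each of two movies, and actor 3 has the same role accidentally stored twice. Counting rows flags actor 3 as well, while counting distinct roles does not.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE casts (aid INTEGER, mid INTEGER, role TEXT);
    INSERT INTO actor VALUES (1, 'Actor One'), (2, 'Actor Two'), (3, 'Actor Three');
    INSERT INTO casts VALUES
        (1, 10, 'hero'), (1, 10, 'villain'),   -- two distinct roles, same movie
        (2, 10, 'extra'), (2, 11, 'extra'),    -- one role per movie
        (3, 12, 'cook'), (3, 12, 'cook');      -- duplicate row, same role
""")

# Group per (actor, movie) and count rows: correct only if rows are unique.
by_rows = con.execute("""
    SELECT name FROM actor
    WHERE id IN (SELECT aid FROM casts GROUP BY aid, mid HAVING COUNT(*) > 1)
    ORDER BY name
""").fetchall()

# Same grouping, but count distinct roles: robust against duplicate rows.
by_distinct_roles = con.execute("""
    SELECT name FROM actor
    WHERE id IN (SELECT aid FROM casts GROUP BY aid, mid
                 HAVING COUNT(DISTINCT role) > 1)
    ORDER BY name
""").fetchall()

print(by_rows)
print(by_distinct_roles)
```

Either way, the key point is the grouping by both aid and mid; grouping by the movie alone, as in the question, counts roles across all actors of a movie.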