SQL Count & Join - sql

I'm not sure what I need to do here. I have to count the movies produced by a particular production company. The question is:
How many movies in the database were produced by Pixar Animation Studios?
This is my SQL code so far (I'm working in Jupyter):
select movies.movie_id, movies.title, productioncompanies.production_company_id, productioncompanies.production_company_name
from movies, productioncompanies
where production_company_name = "Pixar Animation Studios"

A possible solution is the following:
select count(*)
from movies join productioncompanies
on movies.production_company_id = productioncompanies.production_company_id
where production_company_name = 'Pixar Animation Studios';

%%sql
select p.production_company_name, count(*)
from productioncompanymap as m
join productioncompanies as p
on m.production_company_id = p.production_company_id
where p.production_company_name = 'Pixar Animation Studios'
group by p.production_company_name
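For anyone who wants to try the join outside the original database, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the movie_id column in productioncompanymap are assumptions made for illustration; only the table names come from the posts above. It shows both the single-company count and the count per production company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE productioncompanies (
        production_company_id INTEGER PRIMARY KEY,
        production_company_name TEXT
    );
    -- Assumed layout of the mapping table: one row per (movie, company) pair.
    CREATE TABLE productioncompanymap (
        movie_id INTEGER,
        production_company_id INTEGER
    );

    -- Made-up sample rows.
    INSERT INTO movies VALUES (1, 'Monsters, Inc.'), (2, 'Toy Story'), (3, 'Shrek');
    INSERT INTO productioncompanies VALUES
        (10, 'Pixar Animation Studios'), (20, 'DreamWorks Animation');
    INSERT INTO productioncompanymap VALUES (1, 10), (2, 10), (3, 20);
""")

# Count the movies mapped to one named company.
pixar_count = conn.execute("""
    SELECT COUNT(DISTINCT m.movie_id)
    FROM productioncompanymap AS m
    JOIN productioncompanies AS p
      ON m.production_company_id = p.production_company_id
    WHERE p.production_company_name = ?
""", ("Pixar Animation Studios",)).fetchone()[0]

# Count the movies for every company; LEFT JOIN keeps companies with no movies.
per_company = conn.execute("""
    SELECT p.production_company_name, COUNT(DISTINCT m.movie_id) AS movie_count
    FROM productioncompanies AS p
    LEFT JOIN productioncompanymap AS m
      ON m.production_company_id = p.production_company_id
    GROUP BY p.production_company_name
    ORDER BY p.production_company_name
""").fetchall()

print(pixar_count)
print(per_company)
```

The GROUP BY version is the one to reach for when the count is needed for every company rather than a single name.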

I apologize for the lack of detail in my question. I am still new to Stack Overflow and still finding my feet.
I have to link two tables of movie data. The first table has a set of movie details (movie name, budget, rating, release date and reviews, as well as a movie_id); the second table has a genre_id and a listing of genres. I need to link each movie to its production company, e.g. Monsters, Inc. to Pixar Animation Studios, but the two tables do not share a key that links them to each other. Once I have the list of movies linked to their production companies, I have to count the total movies per production company.
I hope this gives more detail. I am a student and this is one of the tests I have to do. I am unsure how to join the tables and on which columns.

Related

BigQuery SQL Query for top products from the Merchant Center

I am running into an issue writing an SQL query in Google BigQuery. I want to load the top products, per country and per category, that are also in stock into a table.
So far I have pulled in the top products, per country, per category but the issue is with getting the 'in-stock' part added to the table. I can't find any similar keys in the schema to match them up.
Ideally the table would include:
Rank, Product Title, Country, Category, In-Stock
I would really appreciate any help on this! Thanks.
I have tried to add in a separate table that includes the 'availability' key for each product, but I could not match it up.
Start from your top_products table, which holds the rank, and join it to product_inventory on rank_id.
That join gives you the product_id, which you can then use to join to the products table.
From the products table you get the availability of each product, and then you have all the information you require.
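The chain of joins described above can be sketched as follows. This is only an illustration run against SQLite through Python's sqlite3 module, not against BigQuery: the table names top_products, product_inventory and products and the keys rank_id, product_id and availability come from the answer, while every other column name, the 'in stock' value and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical column layout; only rank_id, product_id and
    -- availability are named in the answer above.
    CREATE TABLE top_products (
        rank_id TEXT, product_rank INTEGER, country TEXT, category TEXT
    );
    CREATE TABLE product_inventory (rank_id TEXT, product_id TEXT);
    CREATE TABLE products (product_id TEXT, title TEXT, availability TEXT);

    -- Made-up sample rows.
    INSERT INTO top_products VALUES
        ('r1', 1, 'US', 'Toys'), ('r2', 2, 'US', 'Toys'), ('r3', 1, 'GB', 'Toys');
    INSERT INTO product_inventory VALUES ('r1', 'p1'), ('r2', 'p2'), ('r3', 'p3');
    INSERT INTO products VALUES
        ('p1', 'Robot', 'in stock'),
        ('p2', 'Kite', 'out of stock'),
        ('p3', 'Puzzle', 'in stock');
""")

# top_products -> product_inventory (on rank_id) -> products (on product_id),
# keeping only the rows whose product is in stock.
in_stock_top = conn.execute("""
    SELECT t.product_rank, p.title, t.country, t.category, p.availability
    FROM top_products AS t
    JOIN product_inventory AS i ON i.rank_id = t.rank_id
    JOIN products AS p ON p.product_id = i.product_id
    WHERE p.availability = 'in stock'
    ORDER BY t.country, t.category, t.product_rank
""").fetchall()

for row in in_stock_top:
    print(row)
```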

SQL count column rows that is dependent on a column in different table when group by

I have been trying to complete the following assignment:
Write a query that returns authors who've written books in more than one category and whose books are published by at least two different publishers. Display the author's Last Name and First Name, the Titles of Books they've written and the Book type. Sort the output by the author's last name.
These are the tables involved:
AUTHORS (Au_id, Au_lname, Au_fname, Phone, Address, City, State, Country, Postalcode)
PUBLISHERS (Pub_id, Pub_name, City, State)
TITLES (Title_id, Title, Type, Pub_id, Price, Advance, Total_sales, Notes, Pubdate, Contract)
TITLEAUTHOR (Au_id, Title_id)
I was able to identify the columns I need, but am not sure how to implement the filters (2 different publishers, 2 different types). I assume I need to group by author id and count the Pub_id and Type columns.
This is what I have written so far:
SELECT AUTHORS.Au_lname, AUTHORS.Au_fname, TITLE.Title_id, TITLE.Type
from AUTHORS
JOIN TITLEAUTHOR ON AUTHORS.Au_id = TITLEAUTHOR.Au_id
JOIN TITLE ON TITLEAUTHOR.Title_i = TITLE.Title_i
group by AUTHORS.Au_lname, AUTHORS.Au_fname, TITLE.Title_id, TITLE.Type
ORDER BY AUTHORS.Au_lname;
Welcome to Stack Overflow, Tomasz! Congratulations on your first post. You've done a couple of excellent things already:
You included details on your schema
You included a clear problem statement
You included an example of the SQL you have already tried.
There are a couple of ways you could improve your post, which would help your chances of getting a good answer while also boosting your ability to participate in the community:
Be very careful with formatting - your second edit introduced some redundant text.
Be particular with your code - the first set you gave had some incorrect column names, which likely caused some users to look at your post and then ignore it. There's a critical window after you first post a question where it shows up near the top of the list; that's your best shot at getting answers, so it pays to be very careful with your presentation. Here's a link to some good reading on how to ask.
Be upfront about it when you're working on an assignment! As you can imagine, this community gets a lot of requests for help on homework. Many of the regular users get frustrated by people looking to 'cheat' on their assignments by posting their homework online. There are ways to get help, though: read this post over carefully so you're ready for your next question. The keys are to show that you've already tried to solve it yourself, be specific about what's not working, and be clear about the fact that this is homework.
Now, about your question! What you are looking for is the HAVING clause, which allows you to filter a query to return only those members of a group who meet a condition at the aggregate level. Your instinct about a COUNT was right on, and the HAVING clause will let you do that. Here's an example:
SELECT AUTHORS.Au_id
FROM
AUTHORS
INNER JOIN
TITLEAUTHOR ON
AUTHORS.Au_id = TITLEAUTHOR.Au_id
INNER JOIN
TITLES ON
TITLEAUTHOR.Title_id = TITLES.Title_id
GROUP BY AUTHORS.Au_id
HAVING
COUNT(DISTINCT TITLES.Pub_id) > 1 AND
COUNT(DISTINCT TITLES.Type) > 1
Notice that this adds the word DISTINCT inside of the COUNT() function - this helps you meet the "two different publishers" requirement.
Once you understand how to use HAVING, then you can write a query to get the columns your assignment asked for, and filter the set to only return authors you want: the basic query structure is going to look like this:
SELECT MyColumns
FROM MyTables
WHERE Au_id IN
(
SELECT AUTHORS.Au_id
FROM
AUTHORS
INNER JOIN
TITLEAUTHOR ON
AUTHORS.Au_id = TITLEAUTHOR.Au_id
INNER JOIN
TITLES ON
TITLEAUTHOR.Title_id = TITLES.Title_id
GROUP BY AUTHORS.Au_id
HAVING
COUNT(DISTINCT TITLES.Pub_id) > 1 AND
COUNT(DISTINCT TITLES.Type) > 1
)
ORDER BY WhateverYouWant
Good luck, and welcome to SQL!
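To see the HAVING filter and the outer query working together, here is a small self-contained sketch using Python's sqlite3 module. The table and column names follow the schema in the question (trimmed to the columns the query touches); the authors, titles and publisher ids are made-up sample data, and the choice of output columns is one possible reading of the assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AUTHORS (Au_id TEXT PRIMARY KEY, Au_lname TEXT, Au_fname TEXT);
    CREATE TABLE TITLES (Title_id TEXT PRIMARY KEY, Title TEXT, Type TEXT, Pub_id TEXT);
    CREATE TABLE TITLEAUTHOR (Au_id TEXT, Title_id TEXT);

    -- Made-up rows: Adams has two types and two publishers, Baker has one of each.
    INSERT INTO AUTHORS VALUES ('A1', 'Adams', 'Ann'), ('A2', 'Baker', 'Bob');
    INSERT INTO TITLES VALUES
        ('T1', 'SQL Basics', 'tech', 'P1'),
        ('T2', 'Cooking Fast', 'cook', 'P2'),
        ('T3', 'Python Basics', 'tech', 'P1'),
        ('T4', 'More Python', 'tech', 'P1');
    INSERT INTO TITLEAUTHOR VALUES
        ('A1', 'T1'), ('A1', 'T2'), ('A2', 'T3'), ('A2', 'T4');
""")

rows = conn.execute("""
    SELECT a.Au_lname, a.Au_fname, t.Title, t.Type
    FROM AUTHORS AS a
    JOIN TITLEAUTHOR AS ta ON ta.Au_id = a.Au_id
    JOIN TITLES AS t ON t.Title_id = ta.Title_id
    WHERE a.Au_id IN (
        -- Authors with more than one publisher AND more than one book type.
        SELECT ta2.Au_id
        FROM TITLEAUTHOR AS ta2
        JOIN TITLES AS t2 ON t2.Title_id = ta2.Title_id
        GROUP BY ta2.Au_id
        HAVING COUNT(DISTINCT t2.Pub_id) > 1
           AND COUNT(DISTINCT t2.Type) > 1
    )
    ORDER BY a.Au_lname, t.Title
""").fetchall()

for row in rows:
    print(row)
```

Only the author who satisfies both aggregate conditions comes back, with one output row per title.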

SQL: How do I find which movie genre a user watched the most? (IMDb personal project)

I'm currently working on a personal project and I could use a little help. Here's the scenario:
I'm creating a database (MS Access) for all of the movies my friends and I have ever watched. We rated all of our movies on IMDb and used the export feature to get all of the movie data and our movie ratings. I plan on doing some summary analysis in Excel. One thing I am interested in is the most common movie genre that each person watched. Below is my current setup. Note that the column "const" is the movie's unique ID. I also have individual tables for each person's ratings, and the following tables are the summary tables that combine all the movies we have watched.
Here's the table I had: http://imgur.com/v5x9Dhg
I assigned each genre an ID, like this: http://imgur.com/aXdr9XI
And here is a table where I have separate instances for each movie ID and a unique genre: http://imgur.com/N0wULo8
I want to find a way to count up all of the genres that each person watches. Any advice? I would love to provide any additional information that you need!
Thank you!
You need to have at least one table which has one row per user and const (movie watched). In the 3 example tables you posted nothing shows who watched which movies, which is information you need to solve your problem. You mention having "individual tables for each person's ratings," so I assume you have that information. You will want to combine all of them though, into a table called PERSON_MOVIE or something of the like.
So let's say your second table is called GENRE and its columns are ID, Genre.
Let's say your third table is called GENRE_MOVIE and its columns are Const and ID (ID corresponds to ID on the GENRE table)
Let's say the fourth table, which you did not post, but which is required, is called PERSON_MOVIE and its columns are person, Const, rating.
You could then write a query like this:
select vw1.*, ge.genre
from (select pm.person, gm.id as genre_id, count(*) as num_of_genre
      from person_movie pm
      inner join genre_movie gm
      on pm.const = gm.const
      group by pm.person, gm.id) vw1
inner join (select person, max(num_of_genre) as high_count
            from (select pm.person, gm.id, count(*) as num_of_genre
                  from person_movie pm
                  inner join genre_movie gm
                  on pm.const = gm.const
                  group by pm.person, gm.id) x
            group by person) vw2
on vw1.person = vw2.person
and vw1.num_of_genre = vw2.high_count
inner join genre ge
on vw1.genre_id = ge.id
Edit re: your comment:
So right now you have multiple tables reflecting people's ratings of movies. You need to combine those into a table called PERSON_MOVIE or something similar (as in example above).
There will be 3 columns on the table: person, const, rating
I'm not sure if Access supports the traditional create table as select query, but ordinarily you would be able to construct such a table in the following way:
create table person_movie as
select 'Bob' as person, const, [You rated] as rating
from ratings_by_bob
union all
select 'Sally' as person, const, [You rated] as rating
from ratings_by_sally
union all
select 'Jack' as person, const, [You rated] as rating
from ratings_by_jack
....
If not, just combine the tables manually and add a third column as shown indicating what users are reflected by each row. Then you can run my initial query.
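Both steps, combining the per-person tables and then finding each person's top genre, can be tried end to end with the sketch below. It runs on SQLite through Python's sqlite3 module rather than on Access, uses a common table expression (a swapped-in convenience in place of the repeated subquery above), names the rating column rating instead of [You rated], and uses made-up movie ids and genres, so treat it as an illustration of the approach only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings_by_bob (const TEXT, rating INTEGER);
    CREATE TABLE ratings_by_sally (const TEXT, rating INTEGER);
    CREATE TABLE genre (id INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE genre_movie (const TEXT, id INTEGER);

    -- Made-up sample data; tt1 belongs to both genres.
    INSERT INTO ratings_by_bob VALUES ('tt1', 8), ('tt2', 7);
    INSERT INTO ratings_by_sally VALUES ('tt3', 9), ('tt1', 6);
    INSERT INTO genre VALUES (1, 'Action'), (2, 'Comedy');
    INSERT INTO genre_movie VALUES ('tt1', 1), ('tt2', 1), ('tt3', 2), ('tt1', 2);

    -- Combine the per-person tables into one table with a person column.
    CREATE TABLE person_movie AS
        SELECT 'Bob' AS person, const, rating FROM ratings_by_bob
        UNION ALL
        SELECT 'Sally' AS person, const, rating FROM ratings_by_sally;
""")

top_genres = conn.execute("""
    WITH counts AS (
        -- Movies watched per person and genre.
        SELECT pm.person, gm.id AS genre_id, COUNT(*) AS num_of_genre
        FROM person_movie AS pm
        JOIN genre_movie AS gm ON pm.const = gm.const
        GROUP BY pm.person, gm.id
    )
    SELECT c.person, g.genre, c.num_of_genre
    FROM counts AS c
    JOIN (SELECT person, MAX(num_of_genre) AS high_count
          FROM counts
          GROUP BY person) AS m
      ON m.person = c.person AND m.high_count = c.num_of_genre
    JOIN genre AS g ON g.id = c.genre_id
    ORDER BY c.person, g.genre
""").fetchall()

for row in top_genres:
    print(row)
```

If two genres tie for a person, both rows come back, which matches the behavior of the query above.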

SQL Two aggregate functions over three tables

I'm doing a database project and I've spent about 3 hours confusing myself so I thought I'd try to get some help on here.
I have tables for music genres, competitions and entries to the competitions.
I need to work out how many competitions there are for each genre and how many entries there are for each genre.
My aggregate functions for competitions and entries each work when only one of them is in the query. When I put both in the same query, the competitions column shows the same results as the entries column, and I have no idea what I'm doing wrong. It's probably something stupid and simple.
Here's my query:
SELECT Genre.Genre, count(Competition.Genre_ID)Competitions, count(Comp_Entry.Comp_ID)Bands
FROM Genre, Competition, Comp_Entry
WHERE Genre.Genre_ID = Competition.Genre_ID
AND Comp_Entry.Comp_ID = Competition.Comp_ID
GROUP BY Competition.Genre_ID, Genre.Genre
ORDER BY Genre.Genre;
Can anyone see what I'm doing wrong?
Thanks.
You should use proper join syntax. But the problem you are facing is that you need count(distinct) rather than count():
SELECT Genre.Genre, count(distinct Competition.Comp_ID) as Competitions,
count(Comp_Entry.Comp_ID) as Bands
FROM Genre join
Competition
on Genre.Genre_ID = Competition.Genre_ID join
Comp_Entry
on Comp_Entry.Comp_ID = Competition.Comp_ID
GROUP BY Competition.Genre_ID, Genre.Genre
ORDER BY Genre.Genre;
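The reason a plain count goes wrong is that the join to Comp_Entry repeats each competition once per entry, so counting competition rows really counts entries. The sketch below, using Python's sqlite3 module, reproduces that on a few made-up rows and compares it with counting distinct competition ids. The Entry_ID column and the sample data are assumptions; the other names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Genre (Genre_ID INTEGER PRIMARY KEY, Genre TEXT);
    CREATE TABLE Competition (Comp_ID INTEGER PRIMARY KEY, Genre_ID INTEGER);
    CREATE TABLE Comp_Entry (Entry_ID INTEGER PRIMARY KEY, Comp_ID INTEGER);

    -- Made-up data: Rock has 2 competitions with 3 + 1 entries,
    -- Jazz has 1 competition with 2 entries.
    INSERT INTO Genre VALUES (1, 'Rock'), (2, 'Jazz');
    INSERT INTO Competition VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO Comp_Entry VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3);
""")

joins = """
    FROM Genre
    JOIN Competition ON Genre.Genre_ID = Competition.Genre_ID
    JOIN Comp_Entry ON Comp_Entry.Comp_ID = Competition.Comp_ID
    GROUP BY Genre.Genre_ID, Genre.Genre
    ORDER BY Genre.Genre
"""

# Plain COUNT on both columns: each competition is repeated per entry,
# so both columns end up equal to the number of entries.
naive = conn.execute(
    "SELECT Genre.Genre, COUNT(Competition.Genre_ID), COUNT(Comp_Entry.Comp_ID)"
    + joins
).fetchall()

# COUNT(DISTINCT ...) on the competition id collapses the repeats.
fixed = conn.execute(
    "SELECT Genre.Genre, COUNT(DISTINCT Competition.Comp_ID), COUNT(Comp_Entry.Comp_ID)"
    + joins
).fetchall()

print(naive)
print(fixed)
```

Note that with inner joins a competition that has no entries drops out of both counts; a LEFT JOIN to Comp_Entry would keep it.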

correlated query to update a table based on a select

I have two tables, Genre and Songs. There is a many-to-many relationship between them: one genre can have many songs, and one song may belong to many genres (say a song xyz belongs to rap; it can also belong to hip-hop). A third table, GenreSongs, maps this relationship and contains the columns GenreID and SongID. What I am supposed to do is add a column named SongsCount to the Genre table that holds the number of songs in each genre. I can alter the table to add the column, and I can write a query that gives the count of songs:
SELECT GenreID, Count(SongID) FROM GenreSongs GROUP BY GenreID
This gives us what we require, the number of songs per genre, but how can I use this query to update the column I made (SongsCount)? One way is to run this query, look at the results, and then update the column manually, but I am sure everyone will agree that's not a programmatic way to do it.
I think I need a query with a subquery that takes the GenreID from the outer query and counts its rows in the inner query (a correlated query), but I can't work out how to write one. Can anyone please help me with this?
The question of how to approach this depends on the size of your data and how frequently it is updated. Here are some scenarios.
If your songs are updated quite frequently and your tables are quite large, then you might want to have a column in Genre with the count, and update the column using a trigger on the GenreSongs table.
Alternatively, you could build an index on the GenreID column of the GenreSongs table. Then the following query:
select count(*)
from GenreSongs gs
where gs.GenreID = <whatever>
should run quite fast.
If your songs are updated infrequently or in a batch (say nightly or weekly), then you can update the song count as part of the batch. Your query might look like this (assuming the key column in Genre is also named GenreID):
update Genre
set SongsCount = gc.cnt
from (select GenreID, count(*) as cnt from GenreSongs group by GenreID) gc
where Genre.GenreID = gc.GenreID
And yet another possibility is that you don't need to store the value at all. You can make it part of a view/query that does the calculation on the fly.
Relational databases are quite flexible, and there is often more than one way to do things. The right approach depends very much on what you are trying to accomplish.
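Since the question asks specifically for a correlated subquery, here is a minimal sketch of that form of the update, run on SQLite through Python's sqlite3 module. It assumes the key column in Genre is called GenreID and that SongsCount already exists; the Name column and the sample rows are made up. Unlike an update that joins to a grouped subquery, this form also sets genres with no songs to 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: GenreID as the key, SongsCount already added.
    CREATE TABLE Genre (GenreID INTEGER PRIMARY KEY, Name TEXT, SongsCount INTEGER);
    CREATE TABLE GenreSongs (GenreID INTEGER, SongID INTEGER);

    -- Made-up rows: song 100 belongs to two genres, Polka has no songs.
    INSERT INTO Genre (GenreID, Name) VALUES (1, 'Rap'), (2, 'Hip-Hop'), (3, 'Polka');
    INSERT INTO GenreSongs VALUES (1, 100), (1, 101), (2, 100);
""")

# Correlated subquery: for each Genre row, count the matching GenreSongs rows.
conn.execute("""
    UPDATE Genre
    SET SongsCount = (
        SELECT COUNT(*)
        FROM GenreSongs AS gs
        WHERE gs.GenreID = Genre.GenreID
    )
""")

counts = conn.execute(
    "SELECT GenreID, Name, SongsCount FROM Genre ORDER BY GenreID"
).fetchall()
print(counts)
```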
Adding a column named SongsCount is simply bad design (redundant data and update overhead). Instead, use this query for a single result:
SELECT ID, ..., (SELECT Count(*) FROM GenreSongs WHERE GenreID = X) AS SongsCount FROM Genre WHERE ID = X
And this for multiple results (much more efficient):
SELECT ID, ..., SongsCount FROM (SELECT GenreID, Count(*) AS SongsCount FROM GenreSongs GROUP BY GenreID) AS sub RIGHT JOIN Genre AS g ON sub.GenreID = g.ID
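If the count should never be stored, the grouped query can be wrapped in a view so it is always computed from the current rows. The following sketch uses Python's sqlite3 module; the view name GenreWithCount, the Name column and the sample rows are hypothetical, and it uses a LEFT JOIN with COALESCE (equivalent in effect to the RIGHT JOIN above, but reporting 0 instead of NULL for genres with no songs).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Genre (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE GenreSongs (GenreID INTEGER, SongID INTEGER);

    -- Made-up sample rows.
    INSERT INTO Genre VALUES (1, 'Rap'), (2, 'Hip-Hop'), (3, 'Polka');
    INSERT INTO GenreSongs VALUES (1, 100), (1, 101), (2, 100);

    -- Hypothetical view name; the count is computed on every read.
    CREATE VIEW GenreWithCount AS
    SELECT g.ID, g.Name, COALESCE(sub.SongsCount, 0) AS SongsCount
    FROM Genre AS g
    LEFT JOIN (
        SELECT GenreID, COUNT(*) AS SongsCount
        FROM GenreSongs
        GROUP BY GenreID
    ) AS sub ON sub.GenreID = g.ID;
""")

query = "SELECT ID, Name, SongsCount FROM GenreWithCount ORDER BY ID"
before = conn.execute(query).fetchall()

# Adding a mapping row is reflected immediately, with no stored count to update.
conn.execute("INSERT INTO GenreSongs VALUES (3, 102)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```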