SQL movie database querying

I'm having trouble solving the following SQL requests:
Give the names of the actors that have acted in more films than 'sara allgood' and who have acted in films that won the 'cannes film festival'. Also, give the filmname.
Get the percentage of movies who won awards out of all movies produced between the years 1970 and 1990.
There are several tables but I'm assuming that only 4 are needed:
'films','remakes','casts', 'awtypes'
'films' attributes: filmid, filmname, year, director, studio, award
'remakes' attributes: filmid, title, year, priorfilm, prioryear
'casts' attributes: filmid, filmname, actor, award(10)
'awtypes' attributes: award(10), org(100), country, colloquial(50), year
It's unclear to me how to match the award to the 'Cannes film festival' in the first query. The award field is only 10 characters long, so it must be a reference to the awtypes table, but I don't know which field in awtypes contains the name of the award. I don't have access to the database at the moment, so it's either org or colloquial.
As for the second, I don't know how to compute the percentage. It seems it should be solved with a UNION operator over the movies produced between 1970 and 1990 and the films that have won an award (I don't know how to write a condition for having at least one award).

A few hints, I hope they help you!
Give the names of the actors that have acted in more films than 'sara allgood' and who have acted in films that won the 'cannes film festival'. Also, give the filmname.
Based on the attributes you're stating, I would say that you can get to the right awtypes attribute via the casts table. They both contain the award(10) column. Given your data, I would expect the org(100) column to contain something on the organization that provides the prizes, so that would be my guess in this case for the cannes film festival content. But you would have to try it out and see what results you get. Unfortunately, as in this case, it is often quite hard to guess the contents of a column based only on column names.
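As a rough illustration of that idea, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. Everything about the data is an assumption: the rows are invented, the festival name is assumed to live in awtypes.org, and casts.award is assumed to be NULL when there is no award. If the name turns out to be in colloquial instead, only the LIKE condition changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE casts (filmid INTEGER, filmname TEXT, actor TEXT, award TEXT);
CREATE TABLE awtypes (award TEXT, org TEXT, country TEXT, colloquial TEXT, year INTEGER);
-- Hypothetical rows: the award codes and the contents of org are guesses.
INSERT INTO awtypes VALUES ('CF', 'cannes film festival', 'France', NULL, NULL);
INSERT INTO awtypes VALUES ('XX', 'some other organization', NULL, NULL, NULL);
INSERT INTO casts VALUES
  (1, 'film one',   'sara allgood', NULL),
  (2, 'film two',   'actor a',      'CF'),
  (3, 'film three', 'actor a',      NULL),
  (4, 'film four',  'actor b',      'CF'),
  (5, 'film five',  'actor c',      'XX'),
  (6, 'film six',   'actor c',      NULL);
""")

rows = conn.execute("""
SELECT c.actor, c.filmname
FROM casts AS c
JOIN awtypes AS a ON a.award = c.award          -- resolve the 10-character award code
WHERE a.org LIKE '%cannes%'                     -- assumption: the festival name is in org
  AND (SELECT COUNT(DISTINCT filmid) FROM casts WHERE actor = c.actor)
    > (SELECT COUNT(DISTINCT filmid) FROM casts WHERE actor = 'sara allgood')
ORDER BY c.actor, c.filmname
""").fetchall()
print(rows)
```

With this toy data only 'actor a' qualifies: 'actor b' has a Cannes film but no more films than 'sara allgood', and 'actor c' has enough films but a different award.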
Get the percentage of movies who won awards out of all movies produced between the years 1970 and 1990.
Based on the info stated in your question, I would go with a guess that the award column in the films table contains a boolean or something that states if the movie won an award or not. You'd have to try this out. If that's the case, you can use a COUNT(*) on all movies between 1970 and 1990 and a COUNT(*) on all movies WHERE award = 1 (or something) to get the total numbers.
You could indeed combine these in a computation query with a UNION. Example that might help you:
SELECT SUM(cnt1) / SUM(cnt2) -- adjust this to the computation you need
FROM ( SELECT COUNT(*) AS cnt1
            , 0 AS cnt2
       FROM table1
       UNION ALL
       SELECT 0 AS cnt1
            , COUNT(*) AS cnt2
       FROM table2 ) AS sub
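To make the pattern above concrete, here is a small sketch using Python's sqlite3 module and an in-memory table. The rows are invented, and it assumes that films.award is NULL when a film has no award; if the column is a flag instead, the condition becomes award = 1 or similar. It computes the percentage twice, once with the UNION ALL pattern and once with a single pass using CASE, and multiplies by 100.0 so that engines with integer division do not truncate the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE films (filmid INTEGER, filmname TEXT, year INTEGER, award TEXT);
-- Hypothetical rows; award is assumed to be NULL when the film won nothing.
INSERT INTO films VALUES
  (1, 'film one',   1969, 'AA'),
  (2, 'film two',   1975, 'AA'),
  (3, 'film three', 1980, NULL),
  (4, 'film four',  1985, NULL),
  (5, 'film five',  1990, 'CF'),
  (6, 'film six',   1995, NULL);
""")

# Variant 1: the UNION ALL pattern, one branch per count.
pct_union = conn.execute("""
SELECT 100.0 * SUM(cnt1) / SUM(cnt2)
FROM ( SELECT COUNT(*) AS cnt1, 0 AS cnt2
       FROM films
       WHERE year BETWEEN 1970 AND 1990 AND award IS NOT NULL
       UNION ALL
       SELECT 0 AS cnt1, COUNT(*) AS cnt2
       FROM films
       WHERE year BETWEEN 1970 AND 1990 ) AS sub
""").fetchone()[0]

# Variant 2: a single scan with conditional aggregation.
pct_case = conn.execute("""
SELECT 100.0 * SUM(CASE WHEN award IS NOT NULL THEN 1 ELSE 0 END) / COUNT(*)
FROM films
WHERE year BETWEEN 1970 AND 1990
""").fetchone()[0]

print(pct_union, pct_case)
```

Four of the toy films fall between 1970 and 1990 and two of them have an award, so both variants give 50.0.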

Related

Create a report with a query

I have a problem. Consider the following fact and dimension tables in a ROLAP system that collects values of harmful substances measured in foods that are sold in supermarkets.
Fact table:
• Contaminants (TimeID, ShopID, FoodID, substance, quantityPerOunce)
This describes which harmful substance in which quantity was found on a given
food in a given supermarket at a given time.
Dimension tables:
• Time (TimeID, dayNr, dayName, weekNr, monthNr, year)
• Food (FoodID, foodName, brand, foodType)
Example data: (43, egg, Bioland, animalProduct)
• Place (ShopID, name, street1, region, country)
Write one SQL statement to create a report that answers the following query:
List the minimum quantities of the substance "PCB" in animal products and
vegetables (both are foodTypes) that were measured per year in the regions Sachsen,
Thüringen, and Hessen in Germany.
The result should contain years, regions, and the minimum values.
With the same statement, also list
the minimum values per year (i.e. aggregating over all regions in each year)
as well as a grand total with the minimum quantity of PCB in the mentioned regions for animal products and vegetables over all years and all regions.
SQL query
SELECT years, regions, min(quantityPerOunce)
FROM Contaminants as c, Time as t, Food as f, Place as p
WHERE c.TimeID = t.TimeID
AND c.FoodID = f.FoodID
AND c.ShopdID = p.ShopID
AND substance = "PCB"
AND foodType = "vegetables"
AND foodType = "animalProducts"
GROUP BY regions;
I don't know how to solve this kind of exercise. I tried, but I'm stuck. The join should be an equi-join, even if this is not the best way.

You are close. First, remember that in GROUP BY queries, the non-aggregate fields in your SELECT must also appear on the GROUP BY line. So, you should have:
GROUP BY years, regions;
Further, if you use this:
foodType = 'vegetables' AND foodType = 'animalProducts'
the query will return nothing, because the foodType can't be both at the same time.
As such, you need this:
(foodType = 'vegetables' OR foodType = 'animalProducts')
or alternatively:
foodType IN ('vegetables','animalProducts')
Your query assumes that regions only contains the three listed regions. If you aren't 100% sure about that, it would be better to specify them explicitly with:
AND regions IN ('Sachsen', 'Thüringen', 'Hessen')
This alone also assumes that these regions are only in Germany. This may be true. It might not be though, so it would be safest to also add:
AND country = 'Germany'
So, something along these lines:
SELECT years, regions, MIN(quantityPerOunce) AS min_quantityPerOunce
FROM Contaminants as c, Time as t, Food as f, Place as p
WHERE c.TimeID = t.TimeID
AND c.FoodID = f.FoodID
AND c.ShopID = p.ShopID
AND substance = 'PCB'
AND foodType IN ('vegetables','animalProducts')
AND regions IN ('Sachsen', 'Thüringen', 'Hessen')
AND country = 'Germany'
GROUP BY years, regions;
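The difference between the AND and IN versions of the foodType filter can be checked on a toy dataset. The sketch below uses Python's sqlite3 module with invented rows, and it uses the column names from the schema in the question (year, region) rather than years and regions. The table name Time is double-quoted as a precaution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Time" (TimeID INTEGER, year INTEGER);
CREATE TABLE Food (FoodID INTEGER, foodName TEXT, foodType TEXT);
CREATE TABLE Place (ShopID INTEGER, region TEXT, country TEXT);
CREATE TABLE Contaminants (TimeID INTEGER, ShopID INTEGER, FoodID INTEGER,
                           substance TEXT, quantityPerOunce REAL);
-- Hypothetical rows, for illustration only.
INSERT INTO "Time" VALUES (1, 2020), (2, 2021);
INSERT INTO Food VALUES (1, 'egg', 'animalProducts'), (2, 'carrot', 'vegetables'),
                        (3, 'bread', 'grainProducts');
INSERT INTO Place VALUES (1, 'Sachsen', 'Germany'), (2, 'Hessen', 'Germany'),
                         (3, 'Bayern', 'Germany');
INSERT INTO Contaminants VALUES
  (1, 1, 1, 'PCB', 0.5), (1, 1, 2, 'PCB', 0.3), (1, 2, 1, 'PCB', 0.8),
  (2, 1, 2, 'PCB', 0.4), (2, 2, 1, 'PCB', 0.2), (2, 3, 1, 'PCB', 0.1),
  (1, 1, 3, 'PCB', 0.05), (1, 1, 1, 'Lead', 0.01);
""")

QUERY = """
SELECT t.year, p.region, MIN(c.quantityPerOunce) AS min_quantityPerOunce
FROM Contaminants AS c, "Time" AS t, Food AS f, Place AS p
WHERE c.TimeID = t.TimeID
  AND c.FoodID = f.FoodID
  AND c.ShopID = p.ShopID
  AND c.substance = 'PCB'
  AND {food_filter}
  AND p.region IN ('Sachsen', 'Thüringen', 'Hessen')
  AND p.country = 'Germany'
GROUP BY t.year, p.region
ORDER BY t.year, p.region
"""

# A single row can never have two different foodType values, so AND matches nothing.
with_and = conn.execute(QUERY.format(
    food_filter="f.foodType = 'vegetables' AND f.foodType = 'animalProducts'")).fetchall()
# IN matches a row when its foodType equals any of the listed values.
with_in = conn.execute(QUERY.format(
    food_filter="f.foodType IN ('vegetables', 'animalProducts')")).fetchall()

print(with_and)
print(with_in)
```

The first result is empty, while the second has one minimum per year and region, with the Bayern, bread and Lead rows filtered out.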
Forgive me if I'm mistaken, but it does seem like this might be a school assignment, so it may help to think about general principles in the future:
Identify ALL the nouns in the problem statement (the names of the regions, the name of the country, the names of the food types, the name of the substance) and make sure they are all represented in the query. They likely wouldn't be mentioned in the problem statement / client request if they weren't important. This is a good rule of thumb for professional settings as well as educational settings.
As a rule, fields in the SELECT which aren't aggregates must also be in the GROUP BY. You can have fields in the GROUP BY which are not in the SELECT, but this is far less common.
For parts of the request which list several items from the same field (regions, for example), use field IN (item1,item2,...,itemX), which is equivalent to OR-ing an equality test for each item.
As an addendum, if you have a dimension table called Time, you may want to enclose the name in double-quotes in some systems to avoid confusion with what is normally a system name of some kind.
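The problem statement also asks for the per-year minima and a grand total in the same statement, which the query above does not yet produce. In systems that support it (Oracle and PostgreSQL, for example), GROUP BY ROLLUP(year, region) is the usual tool for this. SQLite has no ROLLUP, so the sketch below emulates the three levels with UNION ALL instead. To stay short it starts from a hypothetical table pcb that stands in for the already joined and filtered rows; the table and its contents are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the joined, filtered fact rows (PCB, three regions, two food types).
CREATE TABLE pcb (year INTEGER, region TEXT, quantityPerOunce REAL);
INSERT INTO pcb VALUES
  (2020, 'Sachsen', 0.5), (2020, 'Sachsen', 0.3), (2020, 'Hessen', 0.8),
  (2021, 'Sachsen', 0.4), (2021, 'Hessen', 0.2);
""")

rows = conn.execute("""
SELECT year, region, MIN(quantityPerOunce) AS min_q FROM pcb GROUP BY year, region
UNION ALL
SELECT year, NULL, MIN(quantityPerOunce) FROM pcb GROUP BY year   -- per year, all regions
UNION ALL
SELECT NULL, NULL, MIN(quantityPerOunce) FROM pcb                 -- grand total
""").fetchall()

for row in rows:
    print(row)
```

A NULL region marks a per-year subtotal and a NULL year marks the grand total, which mirrors how ROLLUP labels its subtotal rows.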

Show only the First Row of a Query in Oracle

This query shows a list of actors and how many times they acted in an action movie between 1980 and 2000. I'd like to get only the first row, but I haven't found an answer to my question.
Also, please tell me if there is a better solution for my query.
FILM (CODFILM,TITLE,YEAR,GENRE)
ACTOR (CODACTOR,NAME,SURNAME)
CAST (CODFILM,CODACTOR)
YEAR is of type NUMBER and contains only the release year.
SELECT CODACTOR, GENRE,count(CODACTOR)
FROM FILM NATURAL JOIN CAST T
WHERE GENRE = 'Action' AND YEAR BETWEEN 1980 AND 2000
GROUP BY CODACTOR,GENRE
HAVING COUNT(CODACTOR) >= ALL(
SELECT COUNT(CODACTOR)
FROM FILM NATURAL JOIN CAST
WHERE GENRE = 'Action' AND CODACTOR = T.CODACTOR AND YEAR BETWEEN 1980 AND 2000)
ORDER BY COUNT(CODACTOR) DESC;

Add this to the end of your query:
FETCH FIRST 1 ROWS ONLY
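This is available in Oracle Database 12c and later.
In earlier versions, you could use the ROWNUM pseudocolumn instead, applied in an outer query that wraps the ordered one, because ROWNUM is assigned before ORDER BY is applied.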

How to select in SQL

I have 2 tables I am working with. One is movie_score, which contains an id, name, and score. The other is movie_cast, which contains mid, cid, and name. Mid is the movie id and cid is the cast id. The problem I must solve is as follows:
Find top 10 (distinct) cast members who have the highest average movie scores. The output list must be sorted by score (from high to low), and then, by cast name in alphabetical order, if/when they have the same average score. The search must NOT include: (a) movies with scores lower than 50 AND (b) cast members who have appeared in less than 3 movies (again, only counting the number of appearances in movies with scores of at least 50). (Expected Output: cid, cname, average score)
I have tried to put the command together but so far this is all I was able to get:
SELECT DISTINCT movie_cast.cid, movie_cast.cname, FROM movie_score INNER JOIN movie_cast ON movie_score.id=movie_cast.mid ORDER BY cname LIMIT 10;
movie-name-score.txt goes with movie_score:
Example of .txt file
9,"Star Wars: Episode III - Revenge of the Sith 3D",80
24214,"The Chronicles of Narnia: The Lion, The Witch and The Wardrobe",76
1789,"War of the Worlds",74
10009,"Star Wars: Episode II - Attack of the Clones 3D",67
771238285,"Warm Bodies",-1
770785616,"World War Z",-1
771303871,"War Witch",89
771323601,"War of the Worlds the True Story",-1
movie-cast.txt goes with movie_cast:
Example:
9,162652153,"Hayden Christensen"
9,162652152,"Ewan McGregor"
9,418638213,"Kenny Baker"
9,548155708,"Graeme Blundell"
9,358317901,"Jeremy Bulloch"
9,178810494,"Anthony Daniels"
9,770726713,"Oliver Ford Davies"
9,162652156,"Samuel L. Jackson"
9,162655731,"James Earl Jones"
I expect to have an output something like:
162655731,"James Earl Jones",average score of the movies they have been in
Does anyone know the best way to create this command?
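One possible shape for such a command, sketched with Python's sqlite3 module: filter out low scores in WHERE, group by cast member, keep only groups with at least 3 qualifying movies in HAVING, then sort and limit. The rows below are invented, and the cast name column is assumed to be called cname, as in the attempted query and the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_score (id INTEGER, name TEXT, score INTEGER);
CREATE TABLE movie_cast (mid INTEGER, cid INTEGER, cname TEXT);
-- Hypothetical rows, for illustration only.
INSERT INTO movie_score VALUES
  (1, 'm1', 80), (2, 'm2', 70), (3, 'm3', 60), (4, 'm4', 40), (5, 'm5', -1), (6, 'm6', 90);
INSERT INTO movie_cast VALUES
  (1, 100, 'Ann'), (2, 100, 'Ann'), (3, 100, 'Ann'),
  (1, 200, 'Bob'), (2, 200, 'Bob'), (4, 200, 'Bob'),
  (1, 300, 'Cy'),  (3, 300, 'Cy'),  (6, 300, 'Cy'), (5, 300, 'Cy'),
  (1, 400, 'Abe'), (2, 400, 'Abe'), (3, 400, 'Abe');
""")

rows = conn.execute("""
SELECT c.cid, c.cname, AVG(s.score) AS avg_score
FROM movie_cast AS c
JOIN movie_score AS s ON s.id = c.mid
WHERE s.score >= 50                      -- (a) ignore movies scoring below 50
GROUP BY c.cid, c.cname                  -- one row per cast member
HAVING COUNT(*) >= 3                     -- (b) at least 3 qualifying appearances
ORDER BY avg_score DESC, c.cname ASC
LIMIT 10
""").fetchall()

for row in rows:
    print(row)
```

Because the WHERE clause runs before grouping, the appearance count in HAVING only sees movies with a score of at least 50, which is what condition (b) asks for. In the toy data 'Bob' drops out with only two qualifying movies, and 'Abe' sorts before 'Ann' because their averages tie.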

Query complex in Oracle SQL

I have the following tables and their fields
I've been asked for a query that seems quite complex to me. I have been going around in circles for two days trying things. It says:
It is desired to obtain the average age of female athletes, medal winners (gold, silver or bronze), for the different modalities of 'Artistic Gymnastics'. Analyze the possible contents of the result field in order to return only the expected values, even when there is no data of any specific value for the set of records displayed by the query. Specifically, we want to show the gender indicator of the athletes, the medal obtained, and the average age of these athletes. The age will be calculated by subtracting from the system date (SYSDATE), the date of birth of the athlete, dividing said value by 365. In order to avoid showing decimals, truncate (TRUNC) the result of the calculation of age. Order the results by the average age of the athletes.
Well right now I have this:
select person.gender,score.score
from person,athlete,score,competition,sport
where person.idperson = athlete.idathlete and
athlete.idathlete= score.idathlete and
competition.idsport = sport.idsport and
person.gender='F' and competition.idsport=18 and score.score in
('Gold','Silver','Bronze')
group by
person.gender,
score.score;
And I got this out
When I add the person.birthdate field, instead of the 18 records for the 18 people who have a medal, I get many more records.
Apart from that, I still have to compute the average age with SYSDATE and TRUNC. I have tried many ways, but I can't get it to work.
I see it very complicated or I'm a bit saturated from so much spinning, I need some help.

Reading the task you got, it seems that you're quite close to the solution. Have a look at the following query and its explanation, note the differences from your query, see if it helps.
select p.gender,
((sysdate - p.birthday) / 365) age,
s.score
from person p join athlete a on a.idathlete = p.idperson
left join score s on s.idathlete = a.idathlete
left join competition c on c.idcompetition = s.idcompetition
where p.gender = 'F'
and s.score in ('Gold', 'Silver', 'Bronze')
and c.idsport = 18
order by age;
When two dates are subtracted, the result is the number of days. Dividing it by 365 gives, roughly, the number of years (assuming for simplicity that every year has 365 days, which ignores leap years). The result is usually a decimal number, e.g. 23.912874918724. You were told to remove the decimals, so use TRUNC to get 23 as the result.
Although the data model contains 5 tables, you don't have to use all of them in a query. Maybe the best approach is to go step by step. The first step would be to simply select all female athletes and calculate their age:
select p.gender,
((sysdate - p.birthday) / 365) age
from person p
where p.gender = 'F'
Note that I've used a table alias. I'd suggest you use them too, as they make queries easier to read (table names can be really long, which doesn't help readability). Also, always prefix columns with the table alias to avoid confusion about which column belongs to which table.
Once you're satisfied with that result, move on to another table, athlete. It is here just as a joining mechanism with the score table that contains ... well, scores. Note that I've used an outer join for the score table because not all athletes have won a medal. I presume that this is what the task you've been given means by:
... even when there is no data of any specific value for the set of records displayed by the query.
It is suggested that we, as developers, use explicit table joins, which let you see all joins separated from filters (which should be part of the WHERE clause). So:
NO : from person p, athlete a
where a.idathlete = p.idperson
and p.gender = 'F'
YES: from person p join athlete a on a.idathlete = p.idperson
where p.gender = 'F'
Then move to yet another table, and so forth.
Test frequently, all the time - don't skip steps. Move on to another one only when you're sure that the previous step's result is correct, as - in most cases - it won't automagically fix itself.
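Putting the steps together, the final statement would group by gender and medal and average the truncated ages. The sketch below shows that shape using Python's sqlite3 module, so it is an analogue rather than Oracle syntax: julianday() differences stand in for Oracle date subtraction, CAST(... AS INTEGER) stands in for TRUNC, and a fixed reference date replaces SYSDATE so the output is reproducible. The table layouts and rows are assumptions based on the columns used in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (idperson INTEGER, gender TEXT, birthday TEXT);
CREATE TABLE athlete (idathlete INTEGER);
CREATE TABLE competition (idcompetition INTEGER, idsport INTEGER);
CREATE TABLE score (idathlete INTEGER, idcompetition INTEGER, score TEXT);
-- Hypothetical rows, for illustration only.
INSERT INTO person VALUES (1, 'F', '2000-01-01'), (2, 'F', '1998-06-15'),
                          (3, 'F', '2002-03-10'), (4, 'M', '1999-09-09');
INSERT INTO athlete VALUES (1), (2), (3), (4);
INSERT INTO competition VALUES (10, 18), (11, 18), (12, 5);
INSERT INTO score VALUES (1, 10, 'Gold'), (2, 11, 'Gold'), (3, 10, 'Silver'),
                         (4, 10, 'Bronze'), (3, 12, 'Gold'), (1, 11, '4th');
""")

TODAY = "2024-01-01"  # fixed stand-in for SYSDATE, so the result does not change over time

rows = conn.execute("""
SELECT p.gender,
       s.score,
       AVG(CAST((julianday(:today) - julianday(p.birthday)) / 365 AS INTEGER)) AS avg_age
FROM person p
JOIN athlete a     ON a.idathlete = p.idperson
JOIN score s       ON s.idathlete = a.idathlete
JOIN competition c ON c.idcompetition = s.idcompetition
WHERE p.gender = 'F'
  AND s.score IN ('Gold', 'Silver', 'Bronze')
  AND c.idsport = 18
GROUP BY p.gender, s.score
ORDER BY avg_age
""", {"today": TODAY}).fetchall()

for row in rows:
    print(row)
```

In Oracle the age expression would be TRUNC((SYSDATE - p.birthday) / 365) instead. This sketch averages ages that have already been truncated; whether the assignment wants that or a truncated average is a judgment call. It also returns no Bronze row, since the toy data has no female Bronze medalist in sport 18.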

Help with SQL aggregate functions

I've been learning SQL for about a day now and I've run into a road bump. Please help me with the following questions:
STUDENT (**StudentNumber**, StudentName, TutorialNumber)
TUTORIAL (**TutorialNumber**, Day, Time, Room, TutorInCharge)
ASSESSMENT (**AssessmentNumber**, AssessmentTitle, MarkOutOf)
MARK (**AssessmentNumber**, **StudentNumber**, RawMark)
PK and FK are identified within "**". I need to generate queries that:
1) List of assessment tasks results showing: Assessment Number, Assessment Title, and average Raw Mark. I know how to use the avg function for a single column, but to display something for multiple columns... a little unsure here.
My attempt:
SELECT RawMark, AssessmentNumber, AsessmentTitle
FROM MARK, ASSESSMENT
WHERE RawMark = (SELECT (RawMark) FROM MARK)
AND MARK.AssessmentNumber = ASSESSMENT.AssessmentNumber;
2) Report on tutorial enrollment showing: Tutorial Number, Day, Room, Tutor in Charge and number of students enrolled. Same as the avg function, now for the count function. Would this require 2 queries?
3) List each student's Raw Mark in each of the assessment tasks showing: Assessment Number, Assessment Title, Student Number, Student Name, Raw Mark, Tutor in Charge and Time. Sort on Tutor in Charge, Day and Time.

Here is an example for the first one. Just take the logic and see if you can expand it to the other questions. I find that these things can be hard to learn if you can't find any solid examples, but once you get the hang of it you'll sort it out pretty quickly.
1)
SELECT a.AssessmentNumber, a.AssessmentTitle, AVG(RawMark)
FROM ASSESSMENT a LEFT JOIN MARK m ON a.AssessmentNumber = m.AssessmentNumber
GROUP BY a.AssessmentNumber, a.AssessmentTitle
Or, without a left join or table aliases (note that this inner-join form leaves out any assessment that has no marks yet):
SELECT ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle, AVG(RawMark)
FROM ASSESSMENT,MARK
WHERE ASSESSMENT.AssessmentNumber = MARK.AssessmentNumber
GROUP BY ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle
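The two versions above differ only for assessments that have no rows in MARK. The sketch below runs both against an in-memory SQLite database from Python, with invented rows, and adds a COUNT column to show that the count function follows the same GROUP BY pattern as AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ASSESSMENT (AssessmentNumber INTEGER, AssessmentTitle TEXT, MarkOutOf INTEGER);
CREATE TABLE MARK (AssessmentNumber INTEGER, StudentNumber INTEGER, RawMark REAL);
-- Hypothetical rows; assessment 3 has no marks yet.
INSERT INTO ASSESSMENT VALUES (1, 'Quiz', 10), (2, 'Essay', 20), (3, 'Exam', 100);
INSERT INTO MARK VALUES (1, 1, 6), (1, 2, 8), (2, 1, 15);
""")

left_rows = conn.execute("""
SELECT a.AssessmentNumber, a.AssessmentTitle, AVG(m.RawMark), COUNT(m.StudentNumber)
FROM ASSESSMENT a LEFT JOIN MARK m ON a.AssessmentNumber = m.AssessmentNumber
GROUP BY a.AssessmentNumber, a.AssessmentTitle
ORDER BY a.AssessmentNumber
""").fetchall()

inner_rows = conn.execute("""
SELECT ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle, AVG(RawMark), COUNT(StudentNumber)
FROM ASSESSMENT, MARK
WHERE ASSESSMENT.AssessmentNumber = MARK.AssessmentNumber
GROUP BY ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle
ORDER BY ASSESSMENT.AssessmentNumber
""").fetchall()

print(left_rows)   # includes the assessment without marks (NULL average, count 0)
print(inner_rows)  # only assessments that have at least one mark
```

COUNT(column) counts only non-NULL values, which is why the unmarked assessment gets a count of 0 in the left-join version. The tutorial enrollment report in question 2 can be built the same way, in a single query, by joining TUTORIAL to STUDENT and counting student numbers per tutorial.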
