How to find average of specific records - sql

I'm a little stuck on some work here for my final high school assessment.
The task is essentially to make a database handling movie information.
Tables:
Movie Table, Movie review table
My task is to display the movies along with their reviews then calculate and display an average overall rating.
Visually it should look like this:
Example:
"Movie Title", "Movie Review Rating", "Average Rating"
Titanic , 9 , 8.6
Titanic , 8 , 8.6
Titanic , 9 , 8.6
Star Wars , 4 , 5
Star Wars , 5 , 5
Star Wars , 6 , 5
Matrix , 6 , 8.3
Matrix , 10 , 8.3
Matrix , 9 , 8.3
How would I go about calculating the average for each individual movie?
SELECT
`Movie name`,
`Overall Rating`
AVG(`Average Rating`)
FROM
Movie
INNER JOIN `Movie review`
ON Movie.`Movie Ref` = `Movie review`.`Movie Ref`;

One option uses domain aggregate function:
SELECT MovieRatings.MovieTitle, MovieRatings.MovieReviewRating,
Round(DAvg("MovieReviewRating","MovieRatings","MovieTitle='" &
[MovieTitle] & "'"),1) AS AverageRating FROM MovieRatings;
Or build an aggregate query (open Access query builder and click Totals button on ribbon):
SELECT MovieRatings.MovieTitle, Avg(MovieRatings.MovieReviewRating) AS
AvgOfMovieReviewRating FROM MovieRatings GROUP BY
MovieRatings.MovieTitle;
Then join that query to the table:
SELECT MovieRatings.MovieTitle, MovieRatings.MovieReviewRating,
Round([AvgOfMovieReviewRating],1) AS AverageRating FROM Query1 INNER
JOIN MovieRatings ON Query1.MovieTitle = MovieRatings.MovieTitle;
Here are the 2 queries as all-in-one nested:
SELECT MovieRatings.MovieTitle, MovieRatings.MovieReviewRating,
Round([AvgOfMovieReviewRating],1) AS AverageRating FROM (SELECT
MovieRatings.MovieTitle, Avg(MovieRatings.MovieReviewRating) AS
AvgOfMovieReviewRating FROM MovieRatings GROUP BY
MovieRatings.MovieTitle) AS Query1 INNER JOIN MovieRatings ON
Query1.MovieTitle = MovieRatings.MovieTitle;
The nested query created by first building and saving the aggregate query. Then build second query that references Query1, switch to SQLView and copy/paste the SQL statement of Query1 into second query and save. Can now delete Query1 object.
Grouping and Joining on MovieTitle field but really should be using numeric primary and foreign keys.
Be aware Round() function uses even/odd rule: 9.45 rounds to 9.4 and 9.35 rounds to 9.4.
Advise not to use spaces in naming convention.

You may have misinterpreted the question or are guessing at the data structure. You wouldn't know the averages in advance. If your professor gave you the data with pre-aggregated averages then they didn't teach you correctly because that is not how data is put into a database. i.e. how would you know what to put into the average column if you put the first value in and there are 50,000 values to insert? You would have to wait until all 50,000 values get inserted- then take the average. Then update 50,000 rows of data with this average.
The Movie Review table most likely is a table with two columns- "Movie Title" and "Movie Review Rating". Where each row in the table represents a review. Taking an average would come later after inserting rows into the table with the movie review data.
After data is inserted, an average would be taken similar to:
SELECT MovieTitle, AVG(MovieReviewRating) AS AverageRating
FROM MovieReview
GROUP BY MovieTitle
After confirming the averages, you can join with "Movie Table" for it's columns.

Related

Two Tables/Databases error: Only one expression can be specified in the select list when the subquery is not introduced with EXISTS

Using SQL Server I'm trying to multiply two columns of a distinct part number using 2 tables and 2 databases but it gives this error:
Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.
This SQL joins two tables in different databases and show the Final Part Number (FinalPartNo) for a bumper and the ancillary parts that it needs to put it together (bolts, brackets etc.)
Query:
SELECT
tb_1.FinalPartNo,
tb_1.SubPartType,
tb_1.SubPart,
tb_1.FinalItemSubPartQuantity,
tb_2.PurchasedOrMfg,
tb_2.SalesWeek1,
tb_2.SalesWeek2
FROM [009Reports].[dbo].[ANC Parts] tb_1
JOIN [555].[cache].[PurchasingSupplyChainNeeds] tb_2 ON tb_1.FinalPartNo = tb_2.ItemNo
So it displays this:
If you look at the table and to siplify things, I highlighted 3 part numbers that all use the same SubPart. Two of them use 4 of the FinalSubPartQuantity and one uses 2 in the "install" step. For SalesWeek1 of the highlighted in the image, they sold two of the FinalPartNo which requires 4 FinalSubPartQuantity and had two sales so that totals 8 needed for that week. I don't need the FinalPartNo but added that to show that it's multiple FinalPartNo with the same subpart.
Trying to figure to sum them up with a totals column for each SubPart for that week (for 52 weeks, just showing 2).
In this example, 03CSFY-0500350 for SalesWeek1 could total 150 having it on
multiple FinalPartNo and multiple steps (Fabricate, Assembly, Install).
So, I tried a subquery to make the SubPart distinct and multiply the FinalSubPartQuantity x SalesWeek1 for TotalSalesWeek1 but getting error. Trying to figure out syntax.
SELECT
tb_1.SubPart,
tb_1.FinalItemSubPartQuantity,
TotalSalesWeek1 = (SELECT DISTINCT(tb_1.SubPart),
tb_1.FinalItemSubPartQuantity * tb_2.SalesWeek1),
TotalSalesWeek2 = (SELECT DISTINCT(tb_1.SubPart),
tb_1.FinalItemSubPartQuantity * tb_2.SalesWeek2)
FROM [009Reports].[dbo].[ANC Parts] tb_1
JOIN [555].[cache].[PurchasingSupplyChainNeeds] tb_2 ON tb_1.FinalPartNo = tb_2.ItemNo
I'm just trying to display:
SubPart/FinalSubPartQuantity/TotalSalesWk1/TotalSalesWk2/TotalSalesWk3/ to week 52. So it just shows the sub part, sum of all the FinalSubPartQuantity amounts for that part for all the different FinalPartNo's and the total sales FinalItemPartQuantity * SalesWeek1, 2, 3...
summarize: subpart and how many sold of that part that week.
You can't set the TotalSalesWeek1 to two columns (DISTINCT(tb_1.SubPart) and tb_1.FinalItemSubPartQuantity * tb_2.SalesWeek1).
I would suggest something like the following
SELECT
tb_1.SubPart,
SUM(tb_1.FinalItemSubPartQuantity) FinalItemSubPartQuantity,
SUM(tb_1.FinalItemSubPartQuantity * tb_2.SalesWeek1) TotalSalesWeek1
SUM(tb_1.FinalItemSubPartQuantity * tb_2.SalesWeek2) TotalSalesWeek2
FROM [009Reports].[dbo].[ANC Parts] tb_1
JOIN [555].[cache].[PurchasingSupplyChainNeeds] tb_2 ON tb_1.FinalPartNo = tb_2.ItemNo
GROUP BY tb_1.SubPart
The GROUP BY tb_1.SubPart at the end says you want each unique SubPart on a row, the SUMs in the SELECT explain that you want those values summed for each group.

SQL QUERY : Find for each year copies sold > 10000

I am practicing a bit with SQL and I came across this exercise:
Consider the following database relating to albums, singers and sales:
Album (Code, Singer, Title)
Sales (Album, Year, CopiesSold)
with a constraint of referential integrity between the Sales Album attribute and the key of the
Album report.
Formulate the following query in SQL :
Find the code and title of the albums that have sold 10,000 copies
every year since they came out.
I had thought of solving it like this:
SELECT CODE, TITLE, COUNT (*)
FROM ALBUM JOIN SALES ON ALBUM.Code = SALES.Album
WHERE CopiesSold > 10000
HAVING COUNT(*) = /* Select difference from current year and came out year.*/
Can you help me with this? Thanks.
You can do this with an INNER JOIN, GROUP BY, and HAVING.
SELECT A.Code, A.Title
FROM ALBUM A
INNER JOIN SALES S ON S.Album = A.Code
GROUP BY A.Code, A.Title
HAVING MIN(S.CopiesSold) >= 10000
The HAVING clause will filter out albums whose minimum Copies Sold are < 10000.
EDIT
There was also a question about gaps in the Sales data, there are a number of ways to modify the above query to solve for this as well. One solution would be to use an embedded query to identify the correct number of years.
SELECT A.Code, A.Title
FROM ALBUM A
INNER JOIN SALES S ON S.Album = A.Code
GROUP BY A.Code, A.Title
HAVING MIN(S.CopiesSold) >= 10000 AND
COUNT(*) = (SELECT COUNT(DISTINCT Year) FROM SALES WHERE Year >= MIN(s.Year))
This solution assumes that at least one album by some artist was sold each year (a fairly safe bet). If you had a Years table there are simpler solutions. If the data is current there are also solutions that utilize DATEDIFF.
You can use correlated subqueries with EXISTS or NOT EXISTS respectively.
In one check if the maximum year minus the minimum year plus one is equal to the count of records with a defined year of an album. That way you make sure you don't get albums where there are figures missing for a year and you therefore cannot tell whether they sold 10000 or more or not. Also check that the maximum year is the current year not to miss gaps between the maximum year and the current year. (In the example code I will use the literal 2020 but there are means to get that dynamically. They depend on the DBMS however and you didn't state which one you're using.)
In the second one check that there's no record with undefined sales figures or sales figures lower than 10000 for the album. If no such record exists, all of the existing one have to have figures of 10000 or greater.
SELECT a1.code,
a1.title
FROM album a1
WHERE EXISTS (SELECT ''
FROM sales s1
WHERE s1.album = a1.code
HAVING max(s1.year) - min(s1.year) + 1 = count(s1.year)
AND max(s1.year) = 2020)
AND NOT EXISTS (SELECT *
FROM sales s2
WHERE s2.album = a1.code
AND s2.copiessold IS NULL
OR s2.copiessold < 10000);
I think the ALL keyword should work nicely here. Something like this:
SELECT * FROM Album
WHERE 10000 <= ALL (
SELECT CopiesSold FROM Sales
WHERE Sales.Album = Album.Code)

SQL Nested Query with distinct count

I have a dilemma, and I'm hoping someone will be able to help me out. I am attempting to work on some made up problems from an old text book of mine, this isn't a question from the book, but the data is, I just wanted to see if I could still work in SQL, so here goes. When this code is executed,
SELECT COUNT(code_description) "Number of Different Crimes", last, first,
code_description
FROM
(
SELECT criminal_id, last, first, crime_code, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
ORDER BY criminal_id
)
WHERE criminal_id = 1020
GROUP BY last, first, code_description;
I am provided with these results:
Number of Different Crimes LAST FIRST CODE_DESCRIPTION
1 Phelps Sam Agg Assault
1 Phelps Sam Drug Offense
Inevitably, I would like the number of different crimes to be 2 for each line since this criminal has two unique crimes charged to him. I would like it to be displayed something like:
Number of Different Crimes LAST FIRST CODE_DESCRIPTION
2 Phelps Sam Agg Assault
2 Phelps Sam Drug Offense
Not to push my luck but I would also like to get rid of the follow line also:
WHERE criminal_id = 1020
to something a little more elegant to represent any criminal with more than 1 crime type associated with them, for this case, Sam Phelps is the only one in this data set.
As #sgeddes said in a comment, you can use an analytic count, which doesn't need a subquery if you're specifying the criminal ID:
SELECT COUNT(code_description) OVER (PARTITION BY first, last) AS "Number of Different Crimes",
last, first, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
WHERE criminal_id = 1020;
If you want to look for anyone with multiple crimes then you do need a subquery so you can filter on the analytic result:
SELECT charge_count AS "Number of Different Crimes",
last, first, code_description
FROM (
SELECT COUNT(DISTINCT code_description) OVER (PARTITION BY first, last) AS charge_count,
criminal_id, last, first, code_description
FROM criminals
JOIN crimes USING (criminal_id)
JOIN crime_charges USING (crime_id)
JOIN crime_codes USING (crime_code)
)
WHERE charge_count > 1
ORDER BY criminal_id, code_description;
SQL Fiddle demo.
If the charges are across multiple crimes, but duplicated, then the distinct count still works, but you might want to make add a distinct to the overall result set - unless you want to show other crime-specific info - otherwise you get something like this.

How do I use the MAX function over three tables?

So, I have a problem with a SQL Query.
It's about getting weather data for German cities. I have 4 tables: staedte (the cities with primary key loc_id), gehoert_zu (contains the city-key and the key of the weather station that is closest to this city (stations_id)), wettermessung (contains all the weather information and the station's key value) and wetterstation (contains the stations key and location). And I'm using PostgreSQL
Here is how the tables look like:
wetterstation
s_id[PK] standort lon lat hoehe
----------------------------------------
10224 Bremen 53.05 8.8 4
wettermessung
stations_id[PK] datum[PK] max_temp_2m ......
----------------------------------------------------
10224 2013-3-24 -0.4
staedte
loc_id[PK] name lat lon
-------------------------------
15 Asch 48.4 9.8
gehoert_zu
loc_id[PK] stations_id[PK]
-----------------------------
15 10224
What I'm trying to do is to get the name of the city with the (for example) highest temperature at a specified date (could be a whole month, or a day). Since the weather data is bound to a station, I actually need to get the station's ID and then just choose one of the corresponding to this station cities. A possible question would be: "In which city was it hottest in June ?" and, say, the highest measured temperature was in station number 10224. As a result I want to get the city Asch. What I got so far is this
SELECT name, MAX (max_temp_2m)
FROM wettermessung, staedte, gehoert_zu
WHERE wettermessung.stations_id = gehoert_zu.stations_id
AND gehoert_zu.loc_id = staedte.loc_id
AND wettermessung.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX (max_temp_2m) DESC
LIMIT 1
There are two problems with the results:
1) it's taking waaaay too long. The tables are not that big (cities has about 70k entries), but it needs between 1 and 7 minutes to get things done (depending on the time span)
2) it ALWAYS produces the same city and I'm pretty sure it's not the right one either.
I hope I managed to explain my problem clearly enough and I'd be happy for any kind of help. Thanks in advance ! :D
If you want to get the max temperature per city use this statement:
SELECT * FROM (
SELECT gz.loc_id, MAX(max_temp_2m) as temperature
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY gz.loc_id) as subselect
INNER JOIN staedte as std
ON std.loc_id = subselect.loc_id
ORDER BY subselect.temperature DESC
Use this statement to get the city with the highest temperature (only 1 city):
SELECT * FROM(
SELECT name, MAX(max_temp_2m) as temp
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
INNER JOIN staedte as std
ON gz.loc_id = std.loc_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX(max_temp_2m) DESC
LIMIT 1) as subselect
ORDER BY temp desc
LIMIT 1
For performance reasons always use explicit joins as LEFT, RIGHT, INNER JOIN and avoid to use joins with separated table name, so your sql serevr has not to guess your table references.
This is a general example of how to get the item with the highest, lowest, biggest, smallest, whatever value. You can adjust it to your particular situation.
select fred, barney, wilma
from bedrock join
(select fred, max(dino) maxdino
from bedrock
where whatever
group by fred ) flinstone on bedrock.fred = flinstone.fred
where dino = maxdino
and other conditions
I propose you use a consistent naming convention. Singular terms for tables holding a single item per row is a good convention. You only table breaking this is staedte. Should be stadt.
And I suggest to use station_id consistently instead of either s_id and stations_id.
Building on these premises, for your question:
... get the name of the city with the ... highest temperature at a specified date
SELECT s.name, w.max_temp_2m
FROM (
SELECT station_id, max_temp_2m
FROM wettermessung
WHERE datum >= '2012-8-1'::date
AND datum < '2012-12-1'::date -- exclude upper border
ORDER BY max_temp_2m DESC, station_id -- id as tie breaker
LIMIT 1
) w
JOIN gehoert_zu g USING (station_id) -- assuming normalized names
JOIN stadt s USING (loc_id)
Use explicit JOIN conditions for better readability and maintenance.
Use table aliases to simplify your query.
Use x >= a AND x < b to include the lower border and exclude the upper border, which is the common use case.
Aggregate first and pick your station with the highest temperature, before you join to the other tables to retrieve the city name. Much simpler and faster.
You did not specify what to do when multiple "wettermessungen" tie on max_temp_2m in the given time frame. I added station_id as tiebreaker, meaning the station with the lowest id will be picked consistently if there are multiple qualifying stations.

Using SUM() with multiple tables in sql

I am trying to write a query that will use the sum function to add up all values in 1 column then divide by the count of tuples in another table. For some reason when i run the sum query by itself i get the correct number back but when i use it in my query below the value is wrong.
this is what im trying to do but the numbers are coming out wrong.
select (sum(adonated) / count(p.pid)) as "Amount donated per Child"
from tsponsors s, player p;
I found out the issue is in the sum. below returns 650,000 when it should return 25000
select (sum(adonated)) as "Amount donated per Child"
from tsponsors s, player p;
if i remove the from player p it gets the correct amount. However i need the player table to get the number of players.
I have 3 tables that are related to this query.
player(pid, tid(fk))
team(tid)
tsponsors(tid(fk), adonated, sid(fk)) this is a joining table
what i want to get is the sum of all the amounts donated to each team sum(adonated) and divide this by the number of players in the database count(pid).
I guess your sponsors are giving amounts to teams. You then want to know the proportion of donations per child in the sponsored team.
You would then need something like this:
SELECT p.tid,(SUM(COALESCE(s.adonated,0)) / COUNT(p.pid)) AS "Amount donated per Child"
FROM player p
LEFT OUTER JOIN tsponsors s ON s.tid=p.tid
GROUP BY p.tid
I also used a LEFT OUTER JOIN in order to show 0$ if a team has no sponsors.
Try
select sum(s.adonated) / (SELECT count(p.pid) FROM player p)
as "Amount donated per Child"
from tsponsors s;
Your original query joins 2 tables without any condition, which results in cross join.
UPDATE
SELECT ts.tid, SUM(ts.adonated),num_plyr
FROM tsponsors ts
INNER JOIN
(
SELECT tid, COUNT(pid) as num_plyr
FROM player
GROUP BY tid
)a ON (a.tid = ts.tid)
GROUP BY ts.tid,num_plyr