I'm looking at movie ratings across a friend group and I'm trying to find the best way to measure how people rate movies compared to IMDb ratings.
Here you can see my table titled fRating, which is a fact table that contains the MovieID, the RaterID, and their UserRating:
This joins with dMovies, which is the dimension table for all of the movies. That table is shown below:
Now, how would I go about finding how much each rater varies from the IMDb ratings? Here is my attempt, using STDEV:
SELECT a.[RaterID]
,Rater
,STDEV(([UserRating] - ROUND(b.IMDb_Rating,0))) as StdDev
FROM [IMDbRatings].[dbo].[fRating] a
JOIN [IMDbRatings].dbo.dMovies b ON a.MovieID = b.MovieID
JOIN [IMDbRatings].dbo.dRaterID c ON a.RaterID = c.RaterID
GROUP BY a.RaterID, Rater
Here are the results that I get from that:
Am I on the right track? Is this doing what I want it to do?
Thank you for your wisdom!
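One thing worth checking: STDEV of (UserRating - IMDb rating) measures how *consistently* a rater deviates, not how *far*. A rater who always scores exactly 2 points above IMDb gets a standard deviation of 0. Below is a minimal Python sketch using made-up sample data that computes three per-rater numbers side by side: the mean difference (bias), the mean absolute difference, and the standard deviation of the difference. Unlike the query above, it does not round the IMDb rating first. In T-SQL the same three values would be AVG(diff), AVG(ABS(diff)) and STDEV(diff) under the same GROUP BY.

```python
from statistics import mean, stdev

# Hypothetical sample data: (rater, user_rating, imdb_rating).
ratings = [
    ("Alice", 9, 7.0), ("Alice", 8, 6.0), ("Alice", 10, 8.0),  # always exactly +2
    ("Bob", 5, 7.0), ("Bob", 9, 7.0), ("Bob", 7, 7.0),         # scattered around IMDb
]

def rater_stats(rows):
    """Per rater: bias (mean of user - imdb), mean absolute difference,
    and sample standard deviation of the difference."""
    diffs = {}
    for rater, user, imdb in rows:
        diffs.setdefault(rater, []).append(user - imdb)
    return {
        rater: {
            "bias": mean(d),
            "mean_abs_diff": mean(abs(x) for x in d),
            "stdev": stdev(d),  # sample stdev, like T-SQL STDEV; needs >= 2 ratings
        }
        for rater, d in diffs.items()
    }

for rater, s in rater_stats(ratings).items():
    print(rater, s)
```

Alice comes out with a bias of 2 and a standard deviation of 0, while Bob has a bias of 0 but a standard deviation of 2. So "varies from IMDb" can mean either number, depending on whether you care about the systematic offset or the spread.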
I'm building a simple project database that holds data about foods and recipes. Recipes live in a recipe header table, which stores one row per recipe with basic overview information, and a recipe detail table, which stores one row for each ingredient that makes up a recipe. What I want to achieve is a calculated column that shows the total number of calories per recipe.
I can get the total number of calories using the following query:
SELECT SUM(calories) FROM products pd
INNER JOIN recipe_detail rd ON pd.id = rd.product_id
WHERE rd.recipe_id = 1
This gets me the total calories for the first recipe. The problem is that I store the nutritional data in the products table for a specific number of grams (usually 100 g), whereas the recipe might need only 50 g of a product. How do I adjust this query so that I get only the fraction of the calories that is actually used?
I basically need to take the gram value of a product that belongs in a recipe, divide it by the gram value in the products table to get the ratio and then multiply that by the number of calories to get the answer.
Not sure about the names and types of your columns or how your information is stored exactly, but I think this might help:
-- Depending on the types of your columns you can omit the cast; I added it to make sure this is not integer division.
-- Selecting r.* while grouping by r.id works in PostgreSQL when id is the primary key of recipe.
SELECT r.*, SUM((p.calories * rd.weight) / p.weight::numeric) AS calories
FROM recipe r
INNER JOIN recipe_detail rd ON rd.recipe_id = r.id
INNER JOIN products p ON p.id = rd.product_id
GROUP BY r.id
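To see the scaling in action, here is a small runnable sketch using SQLite through Python. The table layout, the column names (`calories`, `weight` in both tables), and the sample data are all assumptions. SQLite has no `::numeric`, so `CAST(... AS REAL)` plays the same role of preventing integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema; nutritional data is stored per p.weight grams.
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, calories INTEGER, weight INTEGER);
CREATE TABLE recipe (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE recipe_detail (recipe_id INTEGER, product_id INTEGER, weight INTEGER);
INSERT INTO products VALUES (1, 'oats', 380, 100), (2, 'milk', 64, 100);
INSERT INTO recipe VALUES (1, 'porridge');
-- The recipe uses 50 g of oats and 200 g of milk.
INSERT INTO recipe_detail VALUES (1, 1, 50), (1, 2, 200);
""")

# Each product contributes calories * (grams used / grams the data refers to).
query = """
SELECT r.id, r.name,
       SUM(p.calories * rd.weight / CAST(p.weight AS REAL)) AS calories
FROM recipe r
JOIN recipe_detail rd ON rd.recipe_id = r.id
JOIN products p ON p.id = rd.product_id
GROUP BY r.id, r.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'porridge', 318.0)]
```

Here 50 g of oats contributes 380 * 50 / 100 = 190 kcal and 200 g of milk contributes 64 * 200 / 100 = 128 kcal.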
Hello, I've got three different tables that store different information about books. I'm looking for a way to write an SQL query that gives me the number of books sold in every subsidiary (branch), restricted to the romance genre and to branches whose total is greater than 2700. I figured out the query easily:
SELECT name, SUM(num_sell) as 'numSum'
FROM subsidiary,subsidiarybook,book
WHERE genre= "Romance"
group by name
Since you can't see all of the data behind this, I just want to ask why I always get the different branch names, but with the exact same total for each. I guess it takes one overall total rather than the actual sum belonging to each name.
The output looks like this:
Amazonhive 5163400
Celestial Library 5163400
Cloudcast 5163400
Cosmic Library 5163400
Globalworld 5163400
Imaginetworks 5163400
Leopardworks 5163400
Quick Rooster Media 5163400
Radiantelligence 5163400
Royal Library 5163400
Sphinx Brews 5163400
Spring Harbor ... 5163400
Surge Records 5163400
Tiny Mermaid Arts 5163400
Triumphoods 5163400
Tucan Productions 5163400
These values should differ, because the sums are definitely not the same for each branch. I'd appreciate the help. Also, how can I check whether the sum is > 2700? Appending "AND num_sell > 2700" to the WHERE clause, as I normally would, doesn't work in this case.
There is much to learn from this query.
FROM subsidiary, subsidiarybook, book
Do NOT use commas between table names in the FROM clause; use explicit JOINs with join conditions. With no condition linking the tables, the commas MULTIPLY every row of subsidiary by every row of subsidiarybook and then MULTIPLY that result by every row in book (a cross join). This is why sum(num_sell) is a constant 5,163,400 in your current result.
Your FROM clause should look more like this:
FROM subsidiary
INNER JOIN subsidiarybook ON subsidiary.id = subsidiarybook.subsidiary_id
INNER JOIN book ON subsidiarybook.book_id = book.id
Once you have corrected the joins (note: the joins shown above are written without knowing the actual column names of your tables!), you should start to see better results in sum(num_sell). To filter on the result of an aggregation, use a HAVING clause.
SELECT book.name, SUM(num_sell) as 'numSum'
FROM subsidiary
INNER JOIN subsidiarybook ON subsidiary.id = subsidiarybook.subsidiary_id
INNER JOIN book ON subsidiarybook.book_id = book.id
WHERE book.genre = 'Romance'
GROUP BY book.name
HAVING SUM(num_sell) > 2700
Now, there may be more to learn. For example, subsidiary does not appear to be necessary for the final result, so maybe the query can be simplified to:
SELECT book.name, SUM(num_sell) as 'numSum'
FROM subsidiarybook
INNER JOIN book ON subsidiarybook.book_id = book.id
WHERE book.genre = 'Romance'
GROUP BY book.name
HAVING SUM(num_sell) > 2700
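If what you want is the per-branch total from the question (grouped by the subsidiary's name, romance books only), a minimal runnable sketch using SQLite through Python might look like the following. Every table and column name here is an assumption, including the assumption that num_sell lives in subsidiarybook, and the data is made up. The SQL itself should carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; num_sell is assumed to live in subsidiarybook.
CREATE TABLE subsidiary (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, genre TEXT);
CREATE TABLE subsidiarybook (subsidiary_id INTEGER, book_id INTEGER, num_sell INTEGER);
INSERT INTO subsidiary VALUES (1, 'Royal Library'), (2, 'Cloudcast'), (3, 'Surge Records');
INSERT INTO book VALUES (1, 'Love Story', 'Romance'), (2, 'Space Saga', 'SciFi'),
                        (3, 'Paris Nights', 'Romance');
INSERT INTO subsidiarybook VALUES (1, 1, 2000), (1, 3, 1500), (1, 2, 9000),
                                  (2, 1, 1000), (2, 2, 5000),
                                  (3, 3, 2800);
""")

query = """
SELECT s.name, SUM(sb.num_sell) AS numSum
FROM subsidiary s
INNER JOIN subsidiarybook sb ON sb.subsidiary_id = s.id
INNER JOIN book b ON b.id = sb.book_id
WHERE b.genre = 'Romance'        -- row filter: applied before grouping
GROUP BY s.id, s.name
HAVING SUM(sb.num_sell) > 2700   -- group filter: applied to each branch total
ORDER BY s.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Royal Library', 3500), ('Surge Records', 2800)]
```

Cloudcast only sold 1000 romance books, so HAVING removes it. Royal Library's 9000 science-fiction sales never reach the sum because WHERE drops those rows first.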
But to help further, we need to know much more about your tables, e.g. which table num_sell actually comes from.
You are joining the tables with a cross join; that's why the sum is always the same.
Please learn about SQL joins.
Definition of a cross join:
The SQL CROSS JOIN produces a result set whose row count is the number of rows in the first table multiplied by the number of rows in the second table, because every row of the first table is paired with every row of the second.
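A tiny runnable demonstration of that effect, using SQLite through Python with hypothetical tables and made-up data: the comma-separated FROM clause produces 2 * 3 * 3 rows, and every branch ends up with the same "total".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables with a handful of rows.
CREATE TABLE subsidiary (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, genre TEXT);
CREATE TABLE subsidiarybook (subsidiary_id INTEGER, book_id INTEGER, num_sell INTEGER);
INSERT INTO subsidiary VALUES (1, 'A'), (2, 'B');
INSERT INTO book VALUES (1, 'Romance'), (2, 'Romance'), (3, 'Horror');
INSERT INTO subsidiarybook VALUES (1, 1, 10), (1, 2, 20), (2, 1, 5);
""")

# Commas with no join condition behave like CROSS JOIN: 2 * 3 * 3 rows.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM subsidiary, subsidiarybook, book").fetchone()[0]

# Each branch is paired with every sales row (once per romance book),
# so every branch gets the same total.
cross_sums = conn.execute("""
    SELECT name, SUM(num_sell) FROM subsidiary, subsidiarybook, book
    WHERE genre = 'Romance' GROUP BY name ORDER BY name""").fetchall()

# With join conditions each sales row is counted once, for its own branch.
joined_sums = conn.execute("""
    SELECT s.name, SUM(sb.num_sell)
    FROM subsidiary s
    JOIN subsidiarybook sb ON sb.subsidiary_id = s.id
    JOIN book b ON b.id = sb.book_id
    WHERE b.genre = 'Romance' GROUP BY s.name ORDER BY s.name""").fetchall()

print(cross_count)  # 18
print(cross_sums)   # [('A', 70), ('B', 70)]
print(joined_sums)  # [('A', 30), ('B', 5)]
```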
The HAVING clause filters the grouped rows after GROUP BY has been applied.
Add HAVING SUM(num_sell) > 2700 after the GROUP BY to achieve the required filtering.
An understanding of the logical processing order of a SELECT statement (FROM/JOIN, WHERE, GROUP BY, HAVING, SELECT, ORDER BY) is a valuable tool.
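The processing order explains why "AND num_sell > 2700" in the WHERE clause does not do what the question wants. Here is a small sketch using SQLite through Python, with a hypothetical single `sales` table and made-up data, that compares the three variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per sale record.
CREATE TABLE sales (branch TEXT, num_sell INTEGER);
INSERT INTO sales VALUES ('A', 2000), ('A', 1500), ('B', 3000), ('C', 1000);
""")

# WHERE runs before GROUP BY, so it filters individual rows: branch A
# disappears even though its total (3500) is above 2700.
where_rows = conn.execute("""
    SELECT branch, SUM(num_sell) FROM sales
    WHERE num_sell > 2700 GROUP BY branch ORDER BY branch""").fetchall()

# Aggregates do not exist yet when WHERE is evaluated, so this is rejected.
try:
    conn.execute("SELECT branch FROM sales WHERE SUM(num_sell) > 2700 GROUP BY branch")
    where_sum_error = None
except sqlite3.OperationalError as exc:
    where_sum_error = str(exc)

# HAVING runs after GROUP BY and filters on each group's total.
having_rows = conn.execute("""
    SELECT branch, SUM(num_sell) FROM sales
    GROUP BY branch HAVING SUM(num_sell) > 2700 ORDER BY branch""").fetchall()

print(where_rows)       # [('B', 3000)]
print(where_sum_error)  # SQLite's error message for the misplaced aggregate
print(having_rows)      # [('A', 3500), ('B', 3000)]
```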
I'm a little stuck on some work here for my final high school assessment.
The task is essentially to make a database handling movie information.
Tables:
Movie Table, Movie review table
My task is to display the movies along with their reviews then calculate and display an average overall rating.
Visually, it should look like this example:
"Movie Title", "Movie Review Rating", "Average Rating"
Titanic , 9 , 8.7
Titanic , 8 , 8.7
Titanic , 9 , 8.7
Star Wars , 4 , 5
Star Wars , 5 , 5
Star Wars , 6 , 5
Matrix , 6 , 8.3
Matrix , 10 , 8.3
Matrix , 9 , 8.3
How would I go about calculating the average for each individual movie?
SELECT
`Movie name`,
`Overall Rating`
AVG(`Average Rating`)
FROM
Movie
INNER JOIN `Movie review`
ON Movie.`Movie Ref` = `Movie review`.`Movie Ref`;
In Access, one option uses a domain aggregate function (DAvg):
SELECT MovieRatings.MovieTitle, MovieRatings.MovieReviewRating,
Round(DAvg("MovieReviewRating","MovieRatings","MovieTitle='" &
[MovieTitle] & "'"),1) AS AverageRating FROM MovieRatings;
Or build an aggregate query (open the Access query builder and click the Totals button on the ribbon):
SELECT MovieRatings.MovieTitle, Avg(MovieRatings.MovieReviewRating) AS
AvgOfMovieReviewRating FROM MovieRatings GROUP BY
MovieRatings.MovieTitle;
Then join that query, saved here as Query1, to the table:
SELECT MovieRatings.MovieTitle, MovieRatings.MovieReviewRating,
Round([AvgOfMovieReviewRating],1) AS AverageRating FROM Query1 INNER
JOIN MovieRatings ON Query1.MovieTitle = MovieRatings.MovieTitle;
Here are the two queries combined into a single nested query:
SELECT MovieRatings.MovieTitle, MovieRatings.MovieReviewRating,
Round([AvgOfMovieReviewRating],1) AS AverageRating FROM (SELECT
MovieRatings.MovieTitle, Avg(MovieRatings.MovieReviewRating) AS
AvgOfMovieReviewRating FROM MovieRatings GROUP BY
MovieRatings.MovieTitle) AS Query1 INNER JOIN MovieRatings ON
Query1.MovieTitle = MovieRatings.MovieTitle;
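The same "aggregate in a subquery, then join back to every row" shape can be tried outside Access. Below is a sketch using SQLite through Python, with the example data from the question loaded into a `MovieRatings` table (the table and column names follow the answer above). SQLite's ROUND() stands in for Access's Round(). Note that Titanic's ratings (9, 8, 9) average 26 / 3 ≈ 8.67, which rounds to 8.7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieRatings (MovieTitle TEXT, MovieReviewRating INTEGER)")
conn.executemany(
    "INSERT INTO MovieRatings VALUES (?, ?)",
    [("Titanic", 9), ("Titanic", 8), ("Titanic", 9),
     ("Star Wars", 4), ("Star Wars", 5), ("Star Wars", 6),
     ("Matrix", 6), ("Matrix", 10), ("Matrix", 9)],
)

# Subquery q computes one average per title; the join repeats it on each review row.
query = """
SELECT m.MovieTitle, m.MovieReviewRating,
       ROUND(q.AvgOfMovieReviewRating, 1) AS AverageRating
FROM (SELECT MovieTitle, AVG(MovieReviewRating) AS AvgOfMovieReviewRating
      FROM MovieRatings GROUP BY MovieTitle) AS q
JOIN MovieRatings AS m ON q.MovieTitle = m.MovieTitle
ORDER BY m.MovieTitle
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```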
The nested query was created by first building and saving the aggregate query, then building a second query that references Query1, switching to SQL View, pasting the SQL statement of Query1 into the second query in place of the reference, and saving. The Query1 object can then be deleted.
This groups and joins on the MovieTitle field, but you really should use numeric primary and foreign keys.
Be aware that the Round() function uses the round-half-to-even ("banker's") rule: 9.45 rounds to 9.4, and 9.35 also rounds to 9.4.
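The round-half-to-even rule itself can be checked with Python's `decimal` module. This illustrates the rule only and does not reproduce Access's implementation. The values are passed as strings because a binary float cannot represent 9.45 or 9.35 exactly, and that can make float rounding look inconsistent.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_1dp(value: str, mode: str) -> Decimal:
    """Round a decimal string to one decimal place using the given rounding mode."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=mode)

for v in ("9.45", "9.35"):
    # Columns: input, half-to-even result, "schoolbook" half-up result.
    print(v, round_1dp(v, ROUND_HALF_EVEN), round_1dp(v, ROUND_HALF_UP))
# 9.45 9.4 9.5
# 9.35 9.4 9.4
```

Under half-to-even, an exact tie goes to the neighbour whose last digit is even, which is why both values land on 9.4.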
I advise against using spaces in object names.
You may have misinterpreted the question, or you are guessing at the data structure. You wouldn't know the averages in advance. If your teacher gave you the data with pre-aggregated averages, then they didn't teach you correctly, because that is not how data is put into a database. For example, how would you know what to put into the average column when you insert the first value and there are 50,000 values to insert? You would have to wait until all 50,000 values were inserted, then take the average, and then update 50,000 rows of data with that average.
The Movie Review table is most likely a table with two columns, "Movie Title" and "Movie Review Rating", where each row in the table represents one review. Taking an average comes later, after the rows with the movie review data have been inserted.
After data is inserted, an average would be taken similar to:
SELECT MovieTitle, AVG(MovieReviewRating) AS AverageRating
FROM MovieReview
GROUP BY MovieTitle
After confirming the averages, you can join with the "Movie Table" to get its columns.
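A sketch of that two-step approach with SQLite through Python: aggregate the reviews, then join to the movie table. It uses a numeric key instead of the title, in line with the advice in the previous answer. The `Movie`/`MovieReview` table and column names are hypothetical. The second query shows the point made above: no average is stored anywhere, so a newly inserted review is reflected the next time the query runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: reviews reference movies by a numeric key.
CREATE TABLE Movie (MovieRef INTEGER PRIMARY KEY, MovieTitle TEXT);
CREATE TABLE MovieReview (MovieRef INTEGER REFERENCES Movie(MovieRef),
                          MovieReviewRating INTEGER);
INSERT INTO Movie VALUES (1, 'Titanic'), (2, 'Star Wars');
INSERT INTO MovieReview VALUES (1, 9), (1, 8), (1, 9), (2, 4), (2, 5), (2, 6);
""")

# Step 1: average per movie key. Step 2: join to Movie for its columns.
averages_sql = """
SELECT mv.MovieTitle, a.AverageRating
FROM (SELECT MovieRef, AVG(MovieReviewRating) AS AverageRating
      FROM MovieReview GROUP BY MovieRef) AS a
JOIN Movie AS mv ON mv.MovieRef = a.MovieRef
ORDER BY mv.MovieRef
"""
before = conn.execute(averages_sql).fetchall()

# A new review changes the result on the next query; nothing needs updating.
conn.execute("INSERT INTO MovieReview VALUES (2, 10)")
after = conn.execute(averages_sql).fetchall()

print(before)  # Star Wars averages 5.0 here
print(after)   # ...and (4 + 5 + 6 + 10) / 4 = 6.25 here
```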
I am struggling to add a sorting feature to my fake IMDb copycat website.
I have movies that have many reviews through an association. Unfortunately, I can't find how to order the @movies object by average review rating for the index page.
Obviously, this
includes(:reviews).order("reviews.rating_out_of_ten desc")
doesn't work, as it only orders by the first review of each movie.
How can I write it so that it orders by the average of all associated reviews?
Many thanks in advance
To sort by the average movie score, join to the reviews table and use the AVG function in the ORDER BY clause, i.e.:
Movie
.select('movie_id, movie_name, avg(reviews.rating_out_of_ten)')
.joins(:reviews)
.group('movie_id, movie_name')
.order('avg(reviews.rating_out_of_ten) desc')
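Roughly, a relation like the one above asks the database for a join, a GROUP BY per movie, and an ORDER BY on the average. Here is a runnable sketch of that SQL using SQLite through Python. The `movies`/`reviews` tables follow Rails naming conventions, while the `name` column and the data are hypothetical. Alpha has the single highest review (10), yet it sorts last because its average is lowest, which is exactly the difference from ordering by one review. An INNER JOIN leaves out movies with no reviews; a LEFT JOIN would keep them with a NULL average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, movie_id INTEGER, rating_out_of_ten INTEGER);
INSERT INTO movies VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO reviews (movie_id, rating_out_of_ten) VALUES
  (1, 10), (1, 2), (1, 3),   -- Alpha: average 5.0
  (2, 8), (2, 8),            -- Beta: average 8.0
  (3, 7), (3, 6);            -- Gamma: average 6.5
""")

query = """
SELECT movies.id, movies.name, AVG(reviews.rating_out_of_ten) AS average_rating
FROM movies
INNER JOIN reviews ON reviews.movie_id = movies.id
GROUP BY movies.id, movies.name
ORDER BY average_rating DESC
"""
rows = conn.execute(query).fetchall()
print([name for _, name, _ in rows])  # ['Beta', 'Gamma', 'Alpha']
```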
I've been learning SQL for about a day now and I've run into a road bump. Please help me with the following questions:
STUDENT (**StudentNumber**, StudentName, TutorialNumber)
TUTORIAL (**TutorialNumber**, Day, Time, Room, TutorInCharge)
ASSESSMENT (**AssessmentNumber**, AssessmentTitle, MarkOutOf)
MARK (**AssessmentNumber**, **StudentNumber**, RawMark)
PK and FK are identified within "**". I need to generate queries that:
1) List the assessment task results, showing Assessment Number, Assessment Title, and average Raw Mark. I know how to use the AVG function on a single column, but I'm a little unsure how to display it alongside multiple other columns.
My attempt:
SELECT RawMark, AssessmentNumber, AsessmentTitle
FROM MARK, ASSESSMENT
WHERE RawMark = (SELECT (RawMark) FROM MARK)
AND MARK.AssessmentNumber = ASSESSMENT.AssessmentNumber;
2) Report on tutorial enrollment, showing Tutorial Number, Day, Room, Tutor in Charge and the number of students enrolled. Same question as with AVG, but now for the COUNT function. Would this require two queries?
3) List each student's Raw Mark in each of the assessment tasks showing: Assessment Number, Assessment Title, Student Number, Student Name, Raw Mark, Tutor in Charge and Time. Sort on Tutor in Charge, Day and Time.
Here is an example for the first one; take the logic and see if you can extend it to the other questions. These things can be hard to learn if you can't find solid examples, but once you get the hang of it you'll sort it out pretty quickly.
1)
SELECT a.AssessmentNumber, a.AssessmentTitle, AVG(RawMark)
FROM ASSESSMENT a LEFT JOIN MARK m ON a.AssessmentNumber = m.AssessmentNumber
GROUP BY a.AssessmentNumber, a.AssessmentTitle
Or, without a LEFT JOIN or table aliases (note that this inner-join version leaves out any assessment that has no marks yet):
SELECT ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle, AVG(RawMark)
FROM ASSESSMENT,MARK
WHERE ASSESSMENT.AssessmentNumber = MARK.AssessmentNumber
GROUP BY ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle
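Applying the same logic to question 2 answers the "two queries?" part: one grouped query is enough. Below is a sketch using SQLite through Python, with the TUTORIAL and STUDENT tables from the question and made-up rows. LEFT JOIN together with COUNT(s.StudentNumber) reports 0 for a tutorial with no students. COUNT(*) would count that tutorial's single NULL-padded row as 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TUTORIAL (TutorialNumber INTEGER PRIMARY KEY, Day TEXT, Time TEXT,
                       Room TEXT, TutorInCharge TEXT);
CREATE TABLE STUDENT (StudentNumber INTEGER PRIMARY KEY, StudentName TEXT,
                      TutorialNumber INTEGER REFERENCES TUTORIAL(TutorialNumber));
-- Hypothetical sample rows.
INSERT INTO TUTORIAL VALUES (1, 'Mon', '09:00', 'R101', 'Smith'),
                            (2, 'Tue', '14:00', 'R202', 'Jones'),
                            (3, 'Wed', '11:00', 'R303', 'Lee');
INSERT INTO STUDENT VALUES (100, 'Ann', 1), (101, 'Ben', 1), (102, 'Cal', 2);
""")

# Group by every non-aggregated column that is selected, then COUNT per group.
query = """
SELECT t.TutorialNumber, t.Day, t.Room, t.TutorInCharge,
       COUNT(s.StudentNumber) AS StudentsEnrolled
FROM TUTORIAL t
LEFT JOIN STUDENT s ON s.TutorialNumber = t.TutorialNumber
GROUP BY t.TutorialNumber, t.Day, t.Room, t.TutorInCharge
ORDER BY t.TutorialNumber
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 'Mon', 'R101', 'Smith', 2)
# (2, 'Tue', 'R202', 'Jones', 1)
# (3, 'Wed', 'R303', 'Lee', 0)
```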