Rails - Order by the average of an association - sql

I am struggling to add a sorting feature to my fake IMDb copycat website.
I have movies that have many reviews through an association. Unfortunately, I can't find how to order the @movies object by review average for the index page.
Obviously, this
includes(:reviews).order("reviews.rating_out_of_ten desc")
doesn't work, as it only orders by the first review of each movie.
How can I write it so that it orders by the average of all associated reviews?
Many thanks in advance

In order to sort by the average movie score, join to the reviews table and use the avg function in the order by clause, i.e. (adjust the column names to match your schema):
Movie
.select('movie_id, movie_name, avg(reviews.rating_out_of_ten)')
.joins(:reviews)
.group('movie_id, movie_name')
.order('avg(reviews.rating_out_of_ten) desc')
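The SQL that this kind of join, group and order produces can be tried outside Rails. Below is a minimal Python sketch using the standard-library sqlite3 module; the table layout (movies.id, movies.name, reviews.movie_id) and the sample rows are assumptions made up for illustration, not the asker's real schema.

```python
import sqlite3

# In-memory database with a hypothetical movies/reviews schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER REFERENCES movies(id),
        rating_out_of_ten INTEGER
    );
    INSERT INTO movies (id, name) VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO reviews (movie_id, rating_out_of_ten) VALUES
        (1, 10), (1, 2), (2, 7), (2, 8), (3, 9), (3, 9);
""")

# Join, group by movie, and order by the per-movie average rating.
rows = conn.execute("""
    SELECT movies.id, movies.name, AVG(reviews.rating_out_of_ten) AS average_rating
    FROM movies
    INNER JOIN reviews ON reviews.movie_id = movies.id
    GROUP BY movies.id, movies.name
    ORDER BY average_rating DESC
""").fetchall()

for movie_id, name, average_rating in rows:
    print(movie_id, name, average_rating)
```

Alpha has the single highest review (10) but the lowest average, so it sorts last here. Note that an inner join drops movies without any reviews.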

Related

Find the movie with the largest cast, out of the list of movies that have a review

Can someone please assist me on how to write a query that will return the following:
"Find the movie with the largest cast, out of the list of movies that have a review."
OUTPUT: movie_title, number_of_cast_members
using this database https://neo4j.com/developer/movie-database/
This needs to be written in Cypher.
MATCH (a:Actor)-[:ACTS_IN]->(m:Movie)<-[:RATED]-()
WITH m, count(DISTINCT a) AS actor_count
ORDER BY actor_count DESC
RETURN m.title, actor_count LIMIT 1
MATCH the pattern you are looking for (actors who act in a movie that has at least one rating).
Use an aggregation function to count the actors and group them by movie (WITH m). The pattern yields one row per actor-rating pair, so use count(DISTINCT a) rather than count(*); otherwise a movie with many ratings would have its cast counted several times.
ORDER the result by the count, descending (DESC).
RETURN the title and the count of the first item (LIMIT 1). As the list is ordered with the largest cast first, limiting the result to the first item gives the largest cast.
Note: if two movies have the same cast size, only one is returned.
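A pattern that matches actors and ratings together produces one row per actor-rating pair, so counting rows and counting distinct cast members can give different winners. The following plain-Python sketch imitates that row expansion with made-up data; it does not talk to Neo4j, and the titles and names are purely hypothetical.

```python
from itertools import product

# Hypothetical data: cast members and raters per movie title.
cast = {"Movie A": ["a1", "a2", "a3"], "Movie B": ["b1", "b2"]}
raters = {"Movie A": ["u1"], "Movie B": ["u1", "u2", "u3"]}

# One row per (actor, rater) pair, mimicking what the graph pattern returns.
rows = [
    (title, actor, rater)
    for title in cast
    for actor, rater in product(cast[title], raters.get(title, []))
]

row_counts = {}       # what a plain row count per movie would see
distinct_actors = {}  # what a distinct-actor count per movie would see
for title, actor, _rater in rows:
    row_counts[title] = row_counts.get(title, 0) + 1
    distinct_actors.setdefault(title, set()).add(actor)

largest_by_rows = max(row_counts, key=row_counts.get)
largest_by_cast = max(distinct_actors, key=lambda t: len(distinct_actors[t]))

print(row_counts)
print({t: len(a) for t, a in distinct_actors.items()})
print(largest_by_rows, largest_by_cast)
```

In this data the movie with more ratings wins on row count even though it has the smaller cast.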

STDEV to measure movie ratings

I'm looking at movie ratings across a friend group and I'm trying to find the best way to measure how people rate movies compared to IMDb ratings.
Here you can see my table titled fRating, which is a fact table that contains the MovieID, the RaterID, and their UserRating:
This joins with dMovies, which is the dimension table for all of the movies. That table is shown below:
Now, how would I go about finding how much each rater varies from the IMDb ratings? Here is my attempt, using STDEV:
SELECT a.[RaterID]
,Rater
,STDEV(([UserRating] - ROUND(b.IMDb_Rating,0))) as StdDev
FROM [IMDbRatings].[dbo].[fRating] a
JOIN [IMDbRatings].dbo.dMovies b ON a.MovieID = b.MovieID
JOIN [IMDbRatings].dbo.dRaterID c ON a.RaterID = c.RaterID
GROUP BY a.RaterID, Rater
Here are the results that I get from that:
Am I on the right track? Is this doing what I want it to do?
Thank you for your wisdom!

How to find top-X highest values in column using Django Queryset without cutting off ties at the bottom?

I have the following Django Model:
class myModel(models.Model):
    name = models.CharField(max_length=255, unique=True)
    score = models.FloatField()
There are thousands of values in the DB for this model. I would like to efficiently and elegantly use QuerySets alone to get the top-ten highest scores and display the names with their scores in descending order of score. So far it is relatively easy.
Here is where the wrinkle is: if there are multiple myModels tied for tenth place, I want to show them all. I don't want to see only some of them. That would give some names an arbitrary advantage over others. If absolutely necessary, I can do some post-DB list processing outside of QuerySets. However, the main problem I see is that there is no way I can know a priori to limit my DB query to the top 10 elements, since for all I know there may be a million records all tied for tenth place.
Do I need to get all the myModels sorted by score and then do one pass over them to calculate the score-threshold? And then use that calculated score-threshold as a filter in another Queryset?
If I wanted to write this in straight-SQL could I even do it in a single query?
Of course you can do it in one SQL query. Generating this query using django ORM is also easily achievable.
top_scores = (myModel.objects
              .order_by('-score')
              .values_list('score', flat=True)
              .distinct())
top_records = (myModel.objects
               .order_by('-score')
               .filter(score__in=top_scores[:10]))
This should generate a single SQL query (with a subquery).
As an alternative, you can also do it with two SQL queries, which might be faster on some databases than the single-query approach (an IN operation is usually more expensive than a comparison). Note that this variant uses the score of the tenth record as the threshold, whereas the first one selects the ten highest distinct scores:
myModel.objects.filter(
    score__gte=myModel.objects.order_by('-score')[9].score
)
Also, while doing this, you should really have an index on the score field (especially when talking about millions of records):
class myModel(models.Model):
    name = models.CharField(max_length=255, unique=True)
    score = models.FloatField(db_index=True)
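The two-query threshold idea can be sketched without Django. The Python example below uses the standard-library sqlite3 module; the table name mymodel, the helper top_n_with_ties and the sample rows are hypothetical, chosen only to show that ties at the cutoff are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (name TEXT UNIQUE, score REAL)")
conn.execute("CREATE INDEX mymodel_score_idx ON mymodel (score)")
conn.executemany(
    "INSERT INTO mymodel (name, score) VALUES (?, ?)",
    [("a", 9.0), ("b", 8.0), ("c", 7.0), ("d", 7.0), ("e", 7.0), ("f", 1.0)],
)


def top_n_with_ties(conn, n):
    """Return the top-n rows by score, keeping every row tied with the n-th."""
    # Query 1: the score of the n-th row becomes the threshold.
    threshold_row = conn.execute(
        "SELECT score FROM mymodel ORDER BY score DESC LIMIT 1 OFFSET ?",
        (n - 1,),
    ).fetchone()
    if threshold_row is None:
        # Fewer than n rows exist, so every row qualifies.
        return conn.execute(
            "SELECT name, score FROM mymodel ORDER BY score DESC, name"
        ).fetchall()
    # Query 2: everything at or above the threshold, ties included.
    return conn.execute(
        "SELECT name, score FROM mymodel WHERE score >= ? "
        "ORDER BY score DESC, name",
        (threshold_row[0],),
    ).fetchall()


top3 = top_n_with_ties(conn, 3)
print(top3)
```

Asking for the top 3 returns five rows here, because three names share the third-place score. The explicit check for a missing n-th row avoids the failure that indexing past the end of the table would otherwise cause.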
As an alternative to @KrzysztofSzularz's answer, you can use raw SQL to do this too.
There are two SQL operations needed to get what you want:
SELECT score from my_application_mymodel order by score desc limit 10;
The SQL above returns the top 10 scores (the limit does this).
SELECT name, score from my_application_mymodel where score in (***) order by score desc;
That returns all the rows whose score is in the result of the first query. Combined into one statement:
SELECT name, score from my_application_mymodel where score in (SELECT score from my_application_mymodel order by score desc limit 10) order by score desc;
You can use raw queries, but you will probably get error messages when you try to run this, so executing custom SQL directly is the better option:
from django.db import connection
def top_scores_with_ties():
    # Hypothetical helper: run the combined query and return (name, score) tuples.
    with connection.cursor() as cursor:
        cursor.execute("SELECT name, score from my_application_mymodel where score in (SELECT score from my_application_mymodel order by score desc limit 10) order by score desc;")
        return cursor.fetchall()
That will return you a list:
[("somename", 100.9),
("someothername", 99.9)
...
]
Django names tables after the application in which the model lives and the model name (all lowercase), joined with an underscore, e.g. my_application_mymodel.

Using distinct and then an aggregate function in Postgresql?

This is a pretty basic problem and for whatever reason I can't find a reasonable solution. I'll do my best to explain.
Say you have an event ticket (section, row, seat #). Each ticket belongs to an attendee. Multiple tickets can belong to the same attendee. Each attendee has a worth (ex: Attendee #1 is worth $10,000). That said, here's what I want to do:
1. Group the tickets by their section
2. Get number of tickets (count)
3. Get total worth of the attendees in that section
Here's where I'm having problems: If Attendee #1 is worth $10,000 and is using 4 tickets, sum(attendees.worth) is returning $40,000. Which is not accurate. The worth should be $10,000. Yet when I make the result distinct on the attendee, the count is not accurate. In an ideal world it'd be nice to do something like
select
    tickets.section,
    count(tickets.*) as count,
    sum(DISTINCT ON (attendees.id) attendees.worth) as total_worth
from
    tickets
INNER JOIN
    attendees ON attendees.id = tickets.attendee_id
GROUP BY tickets.section
Obviously this query doesn't work. How can I accomplish this same thing in a single query? OR is it even possible? I'd prefer to stay away from sub queries too because this is part of a much larger solution where I would need to do this across multiple tables.
Also, the worth should follow the ticket, divided evenly. Ex: $10,000 / 4, so each ticket carries an attendee worth of $2,500. If the tickets are in different sections, they take their prorated worth with them.
Thanks for your help.
You need to aggregate the tickets before joining to the attendees:
select ta.section, sum(ta.numtickets) as count, sum(a.worth) as total_worth
from (select attendee_id, section, count(*) as numtickets
      from tickets
      group by attendee_id, section
     ) ta INNER JOIN
     attendees a
     ON a.id = ta.attendee_id
GROUP BY ta.section
You still have a problem of a single attendee having seats in multiple sections. However, you do not specify how to solve that (apportion the worth? randomly choose one section? attribute it to all sections? canonically choose a section?)
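To see the difference between these strategies on concrete numbers, here is a small Python sketch using the standard-library sqlite3 module. The schema mirrors the tables described in the question, but the rows are invented, and the third query is only one possible way to apportion worth per ticket (it uses an extra subquery, which the asker wanted to avoid).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attendees (id INTEGER PRIMARY KEY, worth INTEGER);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, section TEXT, attendee_id INTEGER);
    INSERT INTO attendees (id, worth) VALUES (1, 10000), (2, 500);
    INSERT INTO tickets (section, attendee_id) VALUES
        ('A', 1), ('A', 1), ('A', 1), ('B', 1), ('A', 2);
""")

# Naive join: the attendee's worth is repeated once per ticket.
naive = conn.execute("""
    SELECT t.section, COUNT(*), SUM(a.worth)
    FROM tickets t
    JOIN attendees a ON a.id = t.attendee_id
    GROUP BY t.section
    ORDER BY t.section
""").fetchall()

# Aggregate tickets per attendee and section first, then join.
pre_aggregated = conn.execute("""
    SELECT ta.section, SUM(ta.numtickets), SUM(a.worth)
    FROM (
        SELECT attendee_id, section, COUNT(*) AS numtickets
        FROM tickets
        GROUP BY attendee_id, section
    ) ta
    JOIN attendees a ON a.id = ta.attendee_id
    GROUP BY ta.section
    ORDER BY ta.section
""").fetchall()

# Assumed apportioning rule: each ticket carries worth / (attendee's ticket count).
prorated = conn.execute("""
    SELECT t.section, COUNT(*), SUM(a.worth * 1.0 / c.total)
    FROM tickets t
    JOIN attendees a ON a.id = t.attendee_id
    JOIN (
        SELECT attendee_id, COUNT(*) AS total
        FROM tickets
        GROUP BY attendee_id
    ) c ON c.attendee_id = t.attendee_id
    GROUP BY t.section
    ORDER BY t.section
""").fetchall()

print(naive)
print(pre_aggregated)
print(prorated)
```

The pre-aggregated form fixes the inflation within a section but still counts attendee 1 in full in both sections; only the prorated form makes the section totals add up to the combined worth of all attendees.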
Using jsonb_object_agg:
select
    tickets.section,
    count(tickets.*) as count,
    (
        SELECT SUM(value::int4)
        FROM jsonb_each_text(jsonb_object_agg(attendees.id, attendees.worth))
    ) as total_worth
from
    tickets
INNER JOIN
    attendees ON attendees.id = tickets.attendee_id
GROUP BY tickets.section

Help with SQL aggregate functions

I've been learning SQL for about a day now and I've run into a road bump. Please help me with the following questions:
STUDENT (**StudentNumber**, StudentName, TutorialNumber)
TUTORIAL (**TutorialNumber**, Day, Time, Room, TutorInCharge)
ASSESSMENT (**AssessmentNumber**, AssessmentTitle, MarkOutOf)
MARK (**AssessmentNumber**, **StudentNumber**, RawMark)
PK and FK are identified within "**". I need to generate queries that:
1) List of assessment tasks results showing: Assessment Number, Assessment Title, and average Raw Mark. I know how to use the avg function for a single column, but to display something for multiple columns... a little unsure here.
My attempt:
SELECT RawMark, AssessmentNumber, AsessmentTitle
FROM MARK, ASSESSMENT
WHERE RawMark = (SELECT (RawMark) FROM MARK)
AND MARK.AssessmentNumber = ASSESSMENT.AssessmentNumber;
2) Report on tutorial enrollment showing: Tutorial Number, Day, Room, Tutor in Charge and number of students enrolled. Same as the avg function, now for the count function. Would this require 2 queries?
3) List each student's Raw Mark in each of the assessment tasks showing: Assessment Number, Assessment Title, Student Number, Student Name, Raw Mark, Tutor in Charge and Time. Sort on Tutor in Charge, Day and Time.
Here is an example for the first one; take the logic and see if you can extend it to the other questions. These things can be hard to learn without solid examples, but once you get the hang of it you'll sort it out pretty quickly.
1)
SELECT a.AssessmentNumber, a.AssessmentTitle, AVG(RawMark)
FROM ASSESSMENT a LEFT JOIN MARK m ON a.AssessmentNumber = m.AssessmentNumber
GROUP BY a.AssessmentNumber, a.AssessmentTitle
Or, without a left join or table aliases (note that this form is an inner join, so assessments that have no marks yet are left out):
SELECT ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle, AVG(RawMark)
FROM ASSESSMENT,MARK
WHERE ASSESSMENT.AssessmentNumber = MARK.AssessmentNumber
GROUP BY ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle
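The two forms above can be compared on a tiny data set. This is a Python sketch using the standard-library sqlite3 module; the tables follow the schema given in the question, while the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ASSESSMENT (
        AssessmentNumber INTEGER PRIMARY KEY,
        AssessmentTitle TEXT,
        MarkOutOf INTEGER
    );
    CREATE TABLE MARK (
        AssessmentNumber INTEGER,
        StudentNumber INTEGER,
        RawMark REAL,
        PRIMARY KEY (AssessmentNumber, StudentNumber)
    );
    INSERT INTO ASSESSMENT VALUES (1, 'Quiz', 10), (2, 'Essay', 20), (3, 'Exam', 100);
    INSERT INTO MARK VALUES (1, 100, 6), (1, 101, 8), (2, 100, 15);
""")

# LEFT JOIN keeps assessments without marks; their average is NULL (None).
left_join = conn.execute("""
    SELECT a.AssessmentNumber, a.AssessmentTitle, AVG(m.RawMark)
    FROM ASSESSMENT a
    LEFT JOIN MARK m ON a.AssessmentNumber = m.AssessmentNumber
    GROUP BY a.AssessmentNumber, a.AssessmentTitle
    ORDER BY a.AssessmentNumber
""").fetchall()

# The comma-join with a WHERE clause behaves like an inner join.
inner_join = conn.execute("""
    SELECT ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle, AVG(RawMark)
    FROM ASSESSMENT, MARK
    WHERE ASSESSMENT.AssessmentNumber = MARK.AssessmentNumber
    GROUP BY ASSESSMENT.AssessmentNumber, ASSESSMENT.AssessmentTitle
    ORDER BY ASSESSMENT.AssessmentNumber
""").fetchall()

print(left_join)
print(inner_join)
```

The same pattern (join, then GROUP BY every non-aggregated column) carries over to the enrollment report in question 2, with COUNT in place of AVG.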