Combining Active Record group, join, maximum & minimum - sql

I'm trying to get to grips with the Active Record query interface. I have two models:
class Movie < ActiveRecord::Base
has_many :datapoints
attr_accessible :genre
end
class Datapoint < ActiveRecord::Base
belongs_to :movie
attr_accessible :cumulative_downloads, :timestamp
end
I want to find the incremental downloads per genre for a given time period.
So far I've managed to get the maximum and minimum downloads per movie within a time period, like so:
maximums = Datapoint.joins(:movie)
.where(["datapoints.timestamp > ?", Date.today - @timespan])
.group('datapoints.movie_id')
.maximum(:cumulative_downloads)
This then allows me to calculate the incremental per movie, before aggregating this into the incremental per genre.
Clearly this is a bit ham-fisted, and I'm sure it would be possible to do this in one step (and using hash conditions). I just can't get my head around how. Can you help?
Much appreciated!
Derek.

I think this will allow you to calculate maximum per genre:
Movie.joins(:datapoints).where(datapoints: {timestamp: (Time.now)..(Time.now+1.year)}).group(:genre).maximum(:cumulative_downloads)
Edit 1
You can get the diffs in a couple of steps:
rel = Movie.joins(:datapoints).where(datapoints: {timestamp: (Time.now)..(Time.now+1.year)}).group(:genre)
mins = rel.minimum(:cumulative_downloads)
maxs = rel.maximum(:cumulative_downloads)
res = {}
maxs.each{|k,v| res[k] = v-mins[k]}
Edit 2
Your initial direction was almost there. All you have to do is calculate the diff per movie in the SQL and stage the data so you can collect it with one pass. I'm sure there's a way to do it all in SQL, but I'm not sure it will be as simple.
# get the genre and diff per movie
result = Movie.select('movies.genre, MAX(datapoints.cumulative_downloads)-MIN(datapoints.cumulative_downloads) as diff').joins(:datapoints).group(:movie_id)
# sum the diffs per genre
per_genre = Hash.new(0)
result.each{|m| per_genre[m.genre] += m.diff}
Edit 3
Including the movie_id in the select and the genre in the group:
# get the genre and diff per movie
result = Movie
.select('movies.id, movies.genre, MAX(datapoints.cumulative_downloads) - MIN(datapoints.cumulative_downloads) AS diff')
.joins(:datapoints)
.group('movies.id, movies.genre')
# sum the diffs per genre
per_genre = Hash.new(0)
result.each{|m| per_genre[m.genre] += m.diff}
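To make the two steps concrete, here is a minimal plain-Ruby sketch of the same calculation on in-memory data. The sample rows are made up for illustration and stand in for the joined movies/datapoints tables; nothing here touches the database.

```ruby
# Hypothetical rows, as if produced by joining movies to datapoints.
rows = [
  { movie_id: 1, genre: "comedy", cumulative_downloads: 100 },
  { movie_id: 1, genre: "comedy", cumulative_downloads: 180 },
  { movie_id: 2, genre: "comedy", cumulative_downloads: 40 },
  { movie_id: 2, genre: "comedy", cumulative_downloads: 90 },
  { movie_id: 3, genre: "drama",  cumulative_downloads: 10 },
  { movie_id: 3, genre: "drama",  cumulative_downloads: 25 }
]

# Step 1: MAX - MIN per movie (what the SQL select computes for each group).
diffs = rows.group_by { |r| [r[:movie_id], r[:genre]] }.map do |(_id, genre), group|
  downloads = group.map { |r| r[:cumulative_downloads] }
  [genre, downloads.max - downloads.min]
end

# Step 2: sum the per-movie diffs per genre.
per_genre = Hash.new(0)
diffs.each { |genre, diff| per_genre[genre] += diff }
```

With this data, comedy comes out as 80 + 50 = 130. Taking MAX - MIN across the whole genre instead would give 180 - 40 = 140, which is why the diff has to be computed per movie first.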

Related

Select oldest HABTM record with group by clause

I want to show a line chart on the admin page (with chartkick) with the incremental number of scores related to their earliest export date.
I have the following models:
# score.rb
class Score < ApplicationRecord
has_and_belongs_to_many :export_orders, join_table: :scores_export_orders
end
# export_order.rb
class ExportOrder < ApplicationRecord
has_and_belongs_to_many :scores, join_table: :scores_export_orders
end
How do I select, for each Score having at least one ExportOrder, the corresponding ExportOrder with the earliest created_at (in date only format)?
I had a look at this, but my situation has a HABTM relationship instead of a simple has_many.
I tried this code, to get at least a mapping between oldest export date and number of scores:
sql = "
SELECT
COUNT(DISTINCT scores.id), MIN(export_orders.created_at::date)
FROM
scores
INNER JOIN
scores_export_orders
ON
scores.id = scores_export_orders.score_id
INNER JOIN
export_orders
ON
export_orders.id = scores_export_orders.export_order_id
GROUP BY
export_orders.created_at::date
".split("\n").join(' ')
query = ActiveRecord::Base.connection.execute(sql)
query.map { |v| [v['count'], v['min']] }
but the total number of scores is greater than all scores having an export date.
Any ideas?
Try:
class Score < ApplicationRecord
has_and_belongs_to_many :export_orders, join_table: :scores_export_orders
def earliest_export_date
export_orders.pluck(:created_at).min
end
end
This will let you call @score.earliest_export_date, which should return the value you want.
I also think it's the most performant way to do it in ruby, although someone may correct me on that.
The following has better performance than Mark's solution since it relies on pure SQL. Basically, the GROUP BY clause required grouping by scores_export_orders.score_id rather than export_orders.created_at:
sql = "
SELECT
COUNT(DISTINCT scores_export_orders.score_id), MIN(export_orders.created_at::date)
FROM
scores_export_orders
INNER JOIN
export_orders
ON
export_orders.id = scores_export_orders.export_order_id
GROUP BY
scores_export_orders.score_id
".split("\n").join(' ')
query = ActiveRecord::Base.connection.execute(sql)
query.map { |v| [v['count'], v['min']] }
I couldn't find an exact ActiveRecord equivalent (all of my attempts gave strange results), so I execute the raw SQL instead.
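As a rough illustration of why grouping by score matters, here is a plain-Ruby sketch on made-up join rows (the data is hypothetical, and `tally` assumes Ruby 2.7+). It first takes the earliest export date per score, then counts scores per date, so each score is counted exactly once.

```ruby
require "date"

# Hypothetical join rows: [score_id, date of the export order].
join_rows = [
  [1, Date.new(2020, 1, 5)],
  [1, Date.new(2020, 1, 2)],
  [2, Date.new(2020, 1, 2)],
  [3, Date.new(2020, 1, 7)]
]

# Earliest export date per score (the GROUP BY score_id / MIN(date) step).
earliest = join_rows.group_by(&:first)
                    .transform_values { |rows| rows.map(&:last).min }

# Number of scores per earliest export date, sorted by date for charting.
scores_per_date = earliest.values.tally.sort.to_h
```

Grouping by the export date instead would count score 1 under both 2 January and 5 January, which matches the inflated totals described in the question.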

rails return list of users with average rating above 5

I have 2 models User and Rating as follows:
class User < ActiveRecord::Base
has_many :ratings
end
class Rating < ActiveRecord::Base
belongs_to :user
end
Each user receives multiple ratings from 1 - 10. I want to return all users with an average rating of > 5. I've got this so far...
User.joins(:ratings).where('rating > ?', 5)
But that code returns all Users with any rating above 5. I want Users with an Average rating above 5.
I've seen other posts like this and that asking similar questions, but I'm having a brainfart today and can't translate their answers to my case.
If you're looking at all users, why join first?
@avg = Rating.group(:user_id).average("rating") # returns a hash of user_id => average
@avg.each do |avg| # avg[0] is the user_id, avg[1] is the average
puts avg[0] if avg[1] > 5
end
You need to define an average method for the user's ratings.
The question linked below is a good example of averaging floats:
How do I create an average from a Ruby array?
Hope this helps someone in the future. This will find the average rating of each user through the ratings table, and return all users with an average rating above 5.
User.joins(:ratings).merge(Rating.group(:user_id).having('AVG(rating) > 5'))
.having was my missing link. More examples of .having here and here
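For anyone unsure what the difference between where and having amounts to here, this is a small plain-Ruby sketch of the same filter on made-up data (the rows are hypothetical; no database involved). The condition is applied to each user's average, not to individual ratings.

```ruby
# Hypothetical ratings rows: [user_id, rating].
ratings = [[1, 8], [1, 6], [2, 9], [2, 1], [3, 4]]

# GROUP BY user_id, then AVG(rating) for each group.
averages = ratings.group_by(&:first).transform_values do |rows|
  rows.sum { |_user_id, rating| rating }.to_f / rows.size
end

# HAVING AVG(rating) > 5: filter on the aggregate.
user_ids_above_five = averages.select { |_user_id, avg| avg > 5 }.keys
```

User 2 has a single rating of 9, so a where('rating > ?', 5) filter would include them, but their average is exactly 5.0 and the having-style filter leaves them out.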

Issues with DISTINCT when used in conjunction with ORDER

I am trying to construct a site which ranks performances for a selection of athletes in a particular event. I previously posted a question which received a few good responses that helped me identify the key problem with my current code.
I have 2 models - Athlete and Result (Athlete HAS MANY Results)
Each athlete can have a number of recorded times for a particular event. I want to identify the quickest time for each athlete and rank these quickest times across all athletes.
I use the following code:
<% @filtered_names = Result.where(:event_name => params[:justevent]).joins(:athlete).order('performance_time_hours ASC').order('performance_time_mins ASC').order('performance_time_secs ASC').order('performance_time_msecs ASC') %>
This successfully ranks ALL the results across ALL athletes for the event (i.e. one athlete can appear a number of times in different places depending on the times they have recorded).
I now wish to just pull out the best result for each athlete and include them in the rankings. I can select the time corresponding to the best result using:
<% @currentathleteperformance = Result.where(:event_name => params[:justevent]).where(:athlete_id => filtered_name.athlete_id).order('performance_time_hours ASC').order('performance_time_mins ASC').order('performance_time_secs ASC').order('performance_time_msecs ASC').first() %>
However, my problem comes when I try to identify the distinct athlete names listed in @filtered_names. I tried using <% @filtered_names = @filtered_names.select('distinct athlete_id') %> but this doesn't behave how I expected it to, and on occasion it gets the rankings in the wrong order.
I have discovered that as it stands my code essentially looks for a difference between the distinct athlete results, starting with the hours time and progressing through to mins, secs and msec. As soon as it has found a difference between a result for each of the distinct athletes it orders them accordingly.
For example, if I have 2 athletes:
Time for Athlete 1 = 0:0:10:5
Time for Athlete 2 = 0:0:10:3
This will yield the order, Athlete 2, Athlete1
However, if i have:
Time for Athlete 1 = 0:0:10:5
Time for Athlete 2 = 0:0:10:3
Time for Athlete 2 = 0:1:11:5
Then the order is given as Athlete 1, Athlete 2 as the first difference is in the mins digit and Athlete 2 is slower...
Can anyone suggest a way to get around this problem and essentially go down the entries in @filtered_names, pulling out each name the first time it appears (i.e. keeping the names in the order they first appear in @filtered_names)?
Thanks for your time
If you're on Ruby 1.9.2+, you can use Array#uniq and pass a block specifying how to determine uniqueness. For example:
@unique_results = @filtered_names.uniq { |result| result.athlete_id }
That should return only one result per athlete, and that one result should be the first in the array, which in turn will be the quickest time since you've already ordered the results.
One caveat: @filtered_names might still be an ActiveRecord::Relation, which has its own #uniq method. You may first need to call #all to return an Array of the results:
@unique_results = @filtered_names.all.uniq { ... }
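Here is a minimal plain-Ruby sketch of that idea, using the three times from the question as made-up rows (plain arrays rather than Result records, so the layout is an assumption for illustration). It sorts by the full time tuple and then keeps the first row seen for each athlete.

```ruby
# Hypothetical results: [athlete_id, hours, mins, secs, msecs].
results = [
  [1, 0, 0, 10, 5],
  [2, 0, 0, 10, 3],
  [2, 0, 1, 11, 5]
]

# Sort by [hours, mins, secs, msecs]; arrays compare element by element.
ranked = results.sort_by { |row| row.drop(1) }

# Array#uniq with a block keeps the first element for each block value,
# so each athlete's quickest time survives and the order is preserved.
best_per_athlete = ranked.uniq { |row| row.first }
```

With this data, athlete 2 ranks first with 0:0:10:3 and athlete 1 second with 0:0:10:5; athlete 2's slower time is dropped.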
You should use the DB to find each athlete's best time, not Ruby code. Add a new column to the results table called total_time_in_msecs and set its value every time a Result is saved.
class Result < ActiveRecord::Base
before_save :init_data
def init_data
self.total_time_in_msecs = performance_time_hours * MSEC_IN_HOUR +
performance_time_mins * MSEC_IN_MIN +
performance_time_secs * MSEC_IN_SEC +
performance_time_msecs
end
MSEC_IN_SEC = 1000
MSEC_IN_MIN = 60 * MSEC_IN_SEC
MSEC_IN_HOUR = 60 * MSEC_IN_MIN
end
Now you can write your query as follows:
athletes = Athlete.joins(:results).
select("athletes.id, athletes.name, min(results.total_time_in_msecs) best_time").
where("results.event_name = ?", params[:justevent]).
group("athletes.id, athletes.name").
order("best_time ASC")
athletes.first.best_time # prints a number
Write a simple helper to break the number down into its time parts:
def human_time time_in_msecs
"%d:%02d:%02d:%03d" %
[Result::MSEC_IN_HOUR, Result::MSEC_IN_MIN,
Result::MSEC_IN_SEC, 1 ].map do |interval|
r = time_in_msecs/interval
time_in_msecs = time_in_msecs % interval
r
end
end
Use the helper in your views to display the broken down time.
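As a quick sanity check of the conversion in both directions, here is a self-contained sketch. The method names total_msecs and human_time and the use of divmod are my own choices for illustration, not the code above verbatim.

```ruby
# Collapse the four time parts into a single number of milliseconds.
def total_msecs(hours, mins, secs, msecs)
  hours * 3_600_000 + mins * 60_000 + secs * 1000 + msecs
end

# Break a millisecond total back into hours:mins:secs:msecs.
def human_time(time_in_msecs)
  hours, rest = time_in_msecs.divmod(3_600_000)
  mins, rest  = rest.divmod(60_000)
  secs, msecs = rest.divmod(1000)
  format("%d:%02d:%02d:%03d", hours, mins, secs, msecs)
end

slow = total_msecs(0, 1, 11, 5)
fast = total_msecs(0, 0, 10, 3)
```

Because each result is now a single integer, comparing or sorting times is an ordinary numeric comparison, which avoids the field-by-field ordering problem described in the question.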

Limiting user votes in a ruby on rails app

I have an app where users can vote for entries. They are limited to a total number of votes per 24 hours, based on a configuration stored in my Setting model. Here's the code I'm using in my Vote model to check and see if they've hit their limit.
def not_voted_too_much?
@votes_per_period = find_settings.votes_per_period # how many votes are allowed per period
@votes = Vote.find_all_by_user_id(user_id, :order => 'id DESC')
@index = @votes_per_period - 1
if @votes.nil?
true
else
if @votes.size < @votes_per_period
true
else
if @votes[@index].created_at + find_settings.voting_period_in_hours.hours > Time.now.utc
false
else
true
end
end
end
end
When that returns true, they're allowed to vote; if false, they can't. Right now, it relies on the records being retrieved in a certain order and on the one it selects being the oldest. This seems to work, but feels fragile to me.
I'd like to use :order => 'created_at DESC', but when I apply a limit to the query (I'd need to only get as many records as votes are allowed for that period), it seems to always pull the oldest records instead of the latest records and I'm not sure how to go about changing the query to pull the latest votes and not the oldest.
Any thoughts on the best way to go about this?
Can't you just count the user's votes which are newer than 24 hours old and check it against your limits? Am I missing something?
def not_voted_too_much?
votes_count = votes.where("created_at >= ?", 24.hours.ago).count
votes_count < find_settings.votes_per_period
end
(this is assuming you've got the votes association setup correctly in the user model)
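The same check can be sketched in plain Ruby without Rails. The can_vote? helper and its parameters are hypothetical names for illustration; vote_times stands in for the created_at values of one user's votes.

```ruby
# Hypothetical helper: counts votes inside the rolling window and
# compares the count against the allowed number of votes.
def can_vote?(vote_times, votes_per_period, period_in_hours, now = Time.now.utc)
  cutoff = now - period_in_hours * 3600
  vote_times.count { |t| t >= cutoff } < votes_per_period
end

now = Time.utc(2020, 1, 2, 12, 0, 0)
# Two votes inside the last 24 hours, one from 30 hours ago.
vote_times = [now - 3600, now - 2 * 3600, now - 30 * 3600]

allowed_with_limit_3 = can_vote?(vote_times, 3, 24, now)
allowed_with_limit_2 = can_vote?(vote_times, 2, 24, now)
```

Counting inside the window does not depend on the order in which records come back, which is what made the original version feel fragile.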

SQL: Get a selected row index from a query

I have an application that stores players' ratings for each tournament. So I have a many-to-many association:
Tournament
has_many :participations, :order => 'rating desc'
has_many :players, :through => :participations
Participation
belongs_to :tournament
belongs_to :player
Player
has_many :participations
has_many :tournaments, :through => :participations
The Participation model has a rating field (float) that stores rating value (it's like score points) for each player at each tournament.
What I want is the last 10 ranks of a player (rank is the player's position in a particular tournament based on his rating: the higher the rating, the higher the rank). For now, to get a player's rank in a tournament I load all participations for that tournament, sort them by the rating field, and get the index of the player's participation in Ruby code:
class Participation < ActiveRecord::Base
belongs_to :player
belongs_to :tournament
def rank
tournament.participations.index(self)
end
end
The rank method of a participation gets its parent tournament, loads all of the tournament's participations (ordered by rating desc) and finds its own index inside this collection,
and then something like:
player.participations.last.rank
The one thing I don't like - it need to load all participations for the tournament, and in case I need player ranks for last 10 tournaments it loads over 5.000 items (and its amount will grow when new players added).
I believe that there should be way to use SQL for it. Actually I tried to use SQL variables:
find_by_sql("select @row:=@row+1 `rank`, p.* from participations p, (SELECT @row:=0) r where(p.tournament_id = #{tournament_id}) order by rating desc limit 10;")
This query selects top-10 ratings from the given tournament. I've been trying to modify it to select last 10 participations for a given user and his rank.
Will appreciate any kind of help. (I think solution will be a SQL request, since it's pretty complex for ActiveRecord).
P.S. I'm using rails3.0.0.beta4
UPD:
Here is final sql request that gets last 10 ranks of the player (in addition it loads the participated tournaments as well)
SELECT *, (
SELECT COUNT(*) FROM participations AS p2
WHERE p2.tour_id = p1.tour_id AND p2.rating > p1.rating
) AS rank
FROM participations AS p1 LEFT JOIN tours ON tours.id = p1.tour_id WHERE p1.player_id = 68 ORDER BY tours.date desc LIMIT 10;
First of all, should this:
Participation
belongs_to :tournament
belongs_to :players
be this?
Participation
belongs_to :tournament
belongs_to :player
Ie, singular player after the belongs_to?
I'm struggling to get my head around what this is doing:
class Participation
def rank_at_tour(tour)
tour.participations.index(self)
end
end
You don't really explain enough about your schema to make it easy to reverse engineer. Is it doing the following...?
"Get all the participations for the given tour and return the position of this current participation in that list"? Is that how you calculate rank? If so i agree it seems like a very convoluted way of doing it.
Do you do the above for the ten participation objects you get back for the player and then take the average? What is rating? Does that have anything to do with rank? Basically, can you explain your schema a bit more and then restate what you want to do?
EDIT
I think you just need a more efficient way of finding the position. There's one way i could think of off the top of my head - get the record you want and then count how many are above it. Add 1 to that and you get the position. eg
class Participation
def rank_at_tour(tour)
tour.participations.where("rating > ?", self.rating).count + 1
end
end
You should see in your log file (e.g. while experimenting in the console) that this just makes a count query. If you have an index on the rating field (add one if you don't) then this will be a very fast query to execute.
Also - if tour and tournament are the same thing (as i said you seem to use them interchangeably) then you don't need to pass tour to participation since it belongs to a tour anyway. Just change the method to rank:
class Participation
def rank
self.tour.participations.where("rating > ?", self.rating).count + 1
end
end
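The "count how many are above it" idea can be tried out in plain Ruby on made-up data. The ratings hash and the rank_of helper are hypothetical and stand in for one tournament's participations.

```ruby
# Hypothetical participations for one tournament: player_id => rating.
ratings = { 68 => 1510.5, 12 => 1600.0, 7 => 1490.0, 3 => 1600.0 }

# Rank = number of participations with a strictly higher rating, plus one.
def rank_of(ratings, player_id)
  own = ratings.fetch(player_id)
  ratings.values.count { |r| r > own } + 1
end

rank_of_68 = rank_of(ratings, 68)
```

Players with equal ratings share a rank under this scheme. The correlated COUNT(*) subquery in the question's final SQL does the same counting but without the + 1, so its ranks start at 0.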