Issues with DISTINCT when used in conjunction with ORDER - sql

I am trying to construct a site which ranks performances for a selection of athletes in a particular event. I previously posted a question which received a few good responses that helped me identify the key problem with my current code.
I have 2 models - Athlete and Result (Athlete HAS MANY Results)
Each athlete can have a number of recorded times for a particular event. I want to identify the quickest time for each athlete and rank these quickest times across all athletes.
I use the following code:
<% @filtered_names = Result.where(:event_name => params[:justevent]).joins(:athlete).order('performance_time_hours ASC').order('performance_time_mins ASC').order('performance_time_secs ASC').order('performance_time_msecs ASC') %>
This successfully ranks ALL the results across ALL athletes for the event (i.e. one athlete can appear a number of times in different places depending on the times they have recorded).
I now wish to just pull out the best result for each athlete and include them in the rankings. I can select the time corresponding to the best result using:
<% @currentathleteperformance = Result.where(:event_name => params[:justevent]).where(:athlete_id => filtered_name.athlete_id).order('performance_time_hours ASC').order('performance_time_mins ASC').order('performance_time_secs ASC').order('performance_time_msecs ASC').first %>
However, my problem comes when I try to identify the distinct athlete names listed in @filtered_names. I tried using <% @filtered_names = @filtered_names.select('distinct athlete_id') %>, but this doesn't behave as I expected, and on occasion it puts the rankings in the wrong order.
I have discovered that as it stands my code essentially looks for a difference between the distinct athlete results, starting with the hours time and progressing through to mins, secs and msec. As soon as it has found a difference between a result for each of the distinct athletes it orders them accordingly.
For example, if I have 2 athletes:
Time for Athlete 1 = 0:0:10:5
Time for Athlete 2 = 0:0:10:3
This will yield the order Athlete 2, Athlete 1.
However, if I have:
Time for Athlete 1 = 0:0:10:5
Time for Athlete 2 = 0:0:10:3
Time for Athlete 2 = 0:1:11:5
Then the order is given as Athlete 1, Athlete 2 as the first difference is in the mins digit and Athlete 2 is slower...
Can anyone suggest a way to get around this problem and essentially go down the entries in @filtered_names, pulling out each name the first time it appears (i.e. keeping the names in the order they first appear in @filtered_names)?
Thanks for your time

If you're on Ruby 1.9.2+, you can use Array#uniq and pass a block specifying how to determine uniqueness. For example:
@unique_results = @filtered_names.uniq { |result| result.athlete_id }
That should return only one result per athlete, and that one result should be the first in the array, which in turn will be the quickest time since you've already ordered the results.
One caveat: @filtered_names might still be an ActiveRecord::Relation, which has its own #uniq method. You may first need to call #to_a to get an Array of the results (on Rails 3, #all also returns an Array):
@unique_results = @filtered_names.to_a.uniq { |result| result.athlete_id }

You should let the database find each athlete's best (i.e. minimum) time rather than doing it in Ruby. Add a new column called total_time_in_msecs to the results table and set its value every time a Result is saved.
class Result < ActiveRecord::Base
  before_save :init_data
  def init_data
    self.total_time_in_msecs = performance_time_hours * MSEC_IN_HOUR +
                               performance_time_mins * MSEC_IN_MIN +
                               performance_time_secs * MSEC_IN_SEC +
                               performance_time_msecs
  end
  MSEC_IN_SEC = 1000
  MSEC_IN_MIN = 60 * MSEC_IN_SEC
  MSEC_IN_HOUR = 60 * MSEC_IN_MIN
end
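Collapsing the four columns into one integer turns the four-column ordering into a single numeric comparison. Here is a small plain-Ruby check of the init_data arithmetic on the question's times. Like the answer's formula, it assumes performance_time_msecs holds milliseconds (0 to 999).

```ruby
# Same constants as the answer's Result model.
MSEC_IN_SEC  = 1000
MSEC_IN_MIN  = 60 * MSEC_IN_SEC
MSEC_IN_HOUR = 60 * MSEC_IN_MIN

# Mirrors Result#init_data: collapse the four time columns into one integer.
def total_time_in_msecs(hours, mins, secs, msecs)
  hours * MSEC_IN_HOUR + mins * MSEC_IN_MIN + secs * MSEC_IN_SEC + msecs
end

# The three results from the question as [athlete, total milliseconds].
totals = [
  [1, total_time_in_msecs(0, 0, 10, 5)],
  [2, total_time_in_msecs(0, 0, 10, 3)],
  [2, total_time_in_msecs(0, 1, 11, 5)]
]

# One integer comparison now replaces the four chained ORDER BY columns.
totals.sort_by { |_athlete, ms| ms }.each do |athlete, ms|
  puts "Athlete #{athlete}: #{ms} ms"
end
```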
Now you can write your query as follows (the best time is the smallest total, so use MIN and sort ascending):
athletes = Athlete.joins(:results).
  select("athletes.id, athletes.name, min(results.total_time_in_msecs) AS best_time").
  where("results.event_name = ?", params[:justevent]).
  group("athletes.id, athletes.name").
  order("best_time ASC")
athletes.first.best_time # returns a number (milliseconds)
Write a simple helper to break the number down into its time parts:
def human_time(time_in_msecs)
  parts = [Result::MSEC_IN_HOUR, Result::MSEC_IN_MIN,
           Result::MSEC_IN_SEC, 1].map do |interval|
    part = time_in_msecs / interval # whole units of this interval
    time_in_msecs %= interval       # carry the remainder to the next unit
    part
  end
  "%d:%02d:%02d:%03d" % parts
end
Use the helper in your views to display the broken down time.
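The same breakdown can also be written with Integer#divmod, which returns the quotient and remainder in one step. This self-contained sketch inlines the interval constants instead of reading them from Result, so it runs outside Rails.

```ruby
# Same interval constants as the answer's Result model (inlined here).
MSEC_IN_SEC  = 1000
MSEC_IN_MIN  = 60 * MSEC_IN_SEC
MSEC_IN_HOUR = 60 * MSEC_IN_MIN

# Break a millisecond total into h:mm:ss:mmm; divmod returns
# [quotient, remainder] in one step.
def human_time(time_in_msecs)
  hours, rest = time_in_msecs.divmod(MSEC_IN_HOUR)
  mins, rest  = rest.divmod(MSEC_IN_MIN)
  secs, msecs = rest.divmod(MSEC_IN_SEC)
  format("%d:%02d:%02d:%03d", hours, mins, secs, msecs)
end

puts human_time(71_005)    # 0:01:11:005
puts human_time(3_723_004) # 1:02:03:004
```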

Related

Improve query performance in Rails when creating a json

I'm working with a Rails 5 API. I have a simple model of a store, with:
order has_one checkout
checkout has_one transaction
checkout belongs_to order
transaction belongs_to checkout
checkout has_many items
order (1) -----> (1) checkout (1) -----> (1) transaction
checkout (1) -----> (*) item
I want an endpoint that, given a set of transactions, returns JSON with data from those transactions.
I have this code, which works but is slow: for example, a month's worth of transactions takes about 1 minute.
def get_all_transactions
  transactions = Transaction.where.not(status: 'error')
  data = transactions.map do |transaction|
    checkout = transaction.checkout
    order = Order.find(checkout.order_id)
    checkout.items.map do |item|
      {
        checkout_id: checkout.id,
        order_id: checkout.order_id,
        item_id: item.id,
        client_name: checkout.client.full_name,
        order_created_at: order.created_at
      }
    end
  end
  data.flatten # flatten! would return nil when there is nothing to flatten
end
How can I improve this code to have a better performance?
I have also noticed that removing, for example, checkout.client.full_name cuts about 20 seconds off.
With full_name being in the client model:
def full_name
  "#{first_name} #{last_name}".strip
end
Why would that take 20 seconds?
The problem here is that you have layers upon layers of N+1 queries. Every time you call an association that hasn't been eager loaded, you cause another round trip to the database. Even if you add includes or eager_load, the next issue is that you're loading lots of data from tables you're not using and creating model instances in memory just to read a single attribute from them.
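To make the N+1 cost concrete, here is a toy plain-Ruby sketch that counts round trips. `CountingStore` is a hypothetical stand-in for the database, not ActiveRecord. Looking up one order per checkout costs one query per row, while fetching the orders in one batch and indexing them in memory costs a single query.

```ruby
# Toy data store that counts round trips (hypothetical; not ActiveRecord).
class CountingStore
  attr_reader :queries

  def initialize(orders)
    @orders = orders # { order_id => created_at }
    @queries = 0
  end

  # One round trip per call, like Order.find(id) inside a loop.
  def find_order(id)
    @queries += 1
    @orders.fetch(id)
  end

  # One round trip for the whole batch, like a single WHERE id IN (...) query.
  def find_orders(ids)
    @queries += 1
    @orders.select { |id, _created_at| ids.include?(id) }
  end
end

checkouts = [{ id: 1, order_id: 10 }, { id: 2, order_id: 11 }, { id: 3, order_id: 12 }]
orders = { 10 => "t10", 11 => "t11", 12 => "t12" }

# N+1 style: one lookup per checkout
naive_store = CountingStore.new(orders)
naive = checkouts.map { |c| naive_store.find_order(c[:order_id]) }

# Batched: fetch all orders once, then look them up in memory
batched_store = CountingStore.new(orders)
by_id = batched_store.find_orders(checkouts.map { |c| c[:order_id] })
batched = checkouts.map { |c| by_id.fetch(c[:order_id]) }

puts "naive: #{naive_store.queries} queries, batched: #{batched_store.queries} query"
```

The same multiplication happens for every nested association (checkout, order, client, items), which is why removing a single call such as checkout.client.full_name saves so much time.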
The most efficient way to do this is most likely going to be to simply perform a join and just select the columns you're actually interested in:
sql = Item.joins(checkout: [:order, :client])
          .select(
            Item.arel_table[:id].as('item_id'),
            Order.arel_table[:id].as('order_id'),
            "TRIM(CONCAT(clients.first_name, ' ', clients.last_name)) AS client_name",
            Order.arel_table[:created_at].as('order_created_at')
          )
result = Item.connection.select_all(sql.arel).map(&:to_h)
This avoids creating entire model instances in memory at multiple levels when all you need is a single column.
However, it's very unclear what the actual expected result is here, or why you're basing the query on the Transaction model when you're actually getting an array of items in the result.

Rails, joining two tables with where clauses on each table

I'm new to web development and Rails, and I'm trying to construct a query object for the first time. I have a table Players, and a table DefensiveStats, which has a foreign key player_id, so each row in this table belongs to a player. Players have a field api_player_number, which is an id used by a 3rd party that I'm referencing. A DefensiveStats object has two fields that are relevant for this query: a season_number integer and a week_number integer. I'd like to build a single query that takes 3 parameters (an api_player_number, season_number, and week_number) and returns the DefensiveStats object with the corresponding season and week numbers that belongs to the player whose api_player_number equals the one passed in.
Here is what I have attempted:
class DefensiveStatsWeekInSeasonQuery
  def initialize(season_number, week_number, api_player_number)
    @season_number = season_number
    @week_number = week_number
    @api_player_number = api_player_number
  end
  # data method always returns an object or list of object, not a relation
  def data
    defensive_stats = Player.where(api_player_number: @api_player_number)
                            .joins(:defensive_stats)
                            .where(season_number: @season_number, week_number: @week_number)
    if defensive_stats.nil?
      defensive_stats = DefensiveStats.new
    end
    defensive_stats
  end
end
However, this does not work, as it performs the second where clause on the Player class, and not the DefensiveStats class -> specifically, "SQLite3::SQLException: no such column: players.season_number"
How can I construct this query? Thank you!!!
Player.joins(:defensive_stats).where(players: { api_player_number: @api_player_number }, defensive_stats: { season_number: @season_number, week_number: @week_number })
OR
Player.joins(:defensive_stats).where("players.api_player_number = ? AND defensive_stats.season_number = ? AND defensive_stats.week_number = ?", @api_player_number, @season_number, @week_number)
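One more detail from the question: `where` returns a relation even when it matches nothing, so `defensive_stats.nil?` is never true. The fallback needs to test a fetched record (for example via `.first`), and the query should start from DefensiveStats if that is the object you want back (in Rails this would assume `belongs_to :player` on DefensiveStats). Below is a plain-Ruby sketch of the join-with-conditions-on-both-sides logic plus the fallback. The structs and data are hypothetical stand-ins for the two tables.

```ruby
# Hypothetical in-memory stand-ins for the players and defensive_stats tables.
Player        = Struct.new(:id, :api_player_number)
DefensiveStat = Struct.new(:player_id, :season_number, :week_number)

PLAYERS = [Player.new(1, 9001), Player.new(2, 9002)]
STATS = [
  DefensiveStat.new(1, 2023, 1),
  DefensiveStat.new(1, 2023, 2),
  DefensiveStat.new(2, 2023, 2)
]

def stats_for(api_player_number, season_number, week_number)
  # Condition on the players side of the join
  player_ids = PLAYERS.select { |p| p.api_player_number == api_player_number }.map(&:id)
  # Join on player_id plus conditions on the defensive_stats side
  match = STATS.find do |s|
    player_ids.include?(s.player_id) &&
      s.season_number == season_number &&
      s.week_number == week_number
  end
  # Fall back to a blank record only when no row matched
  match || DefensiveStat.new
end

p stats_for(9001, 2023, 2)
```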

How to select each model which has the maximum value of an attribute for any given value of another attribute?

I have a Work model with a video_id, a user_id and some other simple fields. I need to display the last 12 works on the page, but only take 1 per user. Currently I'm trying to do it like this:
def self.latest_works_one_per_user(video_id=nil)
  scope = self.includes(:user, :video)
  scope = video_id ? scope.where(video_id: video_id) : scope.where.not(video_id: nil)
  scope = scope.order(created_at: :desc)
  user_ids = [] # two separate arrays: `user_ids = works = []` would alias one array
  works = []
  scope.each do |work|
    next if user_ids.include? work.user_id
    user_ids << work.user_id
    works << work
    break if works.size == 12
  end
  works
end
But I'm damn sure there is a more elegant and faster way of doing it especially when the number of works gets bigger.
Here's a solution that should work for any SQL database with minimal adjustment. Whether one thinks it's elegant or not depends on how much you enjoy SQL.
def self.latest_works_one_per_user(video_id=nil)
  scope = includes(:user, :video)
  scope = video_id ? scope.where(video_id: video_id) : scope.where.not(video_id: nil)
  scope.
    joins("join (select user_id, max(created_at) created_at
                 from works group by user_id) most_recent
           on works.user_id = most_recent.user_id and
              works.created_at = most_recent.created_at").
    order(created_at: :desc).limit(12)
end
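It only works if the combination of user_id and created_at is unique, however. If that combination isn't unique, the join can return more than one row for the same user, so some of the 12 rows may belong to one user. Note also that the subquery finds each user's latest work across all videos; when video_id is given, you may need the same video_id filter inside the subquery.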
It can be done more simply in MySQL. The MySQL solution doesn't work in Postgres, and I don't know a better solution in Postgres, although I'm sure there is one.
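The subquery-and-join approach can be traced in plain Ruby. The aggregate step computes each user's latest created_at, the join step keeps the rows that match it, and sort plus first(limit) play the role of ORDER BY and LIMIT. `Work` here is a hypothetical struct with integer timestamps, not the real model. The tie case at the end shows the uniqueness caveat.

```ruby
# Hypothetical stand-in for Work rows; created_at is an integer to keep it short.
Work = Struct.new(:id, :user_id, :created_at)

def latest_works_one_per_user(works, limit = 12)
  # Subquery: SELECT user_id, MAX(created_at) FROM works GROUP BY user_id
  most_recent = works.group_by(&:user_id)
                     .transform_values { |ws| ws.map(&:created_at).max }
  # Join back on (user_id, created_at) to recover the full rows,
  # then ORDER BY created_at DESC LIMIT limit
  works.select { |w| w.created_at == most_recent[w.user_id] }
       .sort_by { |w| -w.created_at }
       .first(limit)
end

works = [
  Work.new(1, 1, 1), Work.new(2, 1, 5),
  Work.new(3, 2, 3), Work.new(4, 3, 4)
]
p latest_works_one_per_user(works).map(&:id) # [2, 4, 3]

# A second work by user 1 with the same created_at survives the join too.
tied = works + [Work.new(5, 1, 5)]
p latest_works_one_per_user(tied).count { |w| w.user_id == 1 }
```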

Combining Active Record group, join, maximum & minimum

I'm trying to get to grips with the Active Record query interface. I have two models:
class Movie < ActiveRecord::Base
  has_many :datapoints
  attr_accessible :genre
end
class Datapoint < ActiveRecord::Base
  belongs_to :movie
  attr_accessible :cumulative_downloads, :timestamp
end
I want to find the incremental downloads per genre for a given time period.
So far I've managed to get the maximum and minimum downloads per movie within a time period, like so:
maximums = Datapoint.joins(:movie)
                    .where(["datapoints.timestamp > ?", Date.today - @timespan])
                    .group('datapoints.movie_id')
                    .maximum(:cumulative_downloads)
This then allows me to calculate the incremental per movie, before aggregating this into the incremental per genre.
Clearly this is a bit ham-fisted, and I'm sure it would be possible to do this in one step (and using hash conditions). I just can't get my head around how. Can you help?
Much appreciated!
Derek.
I think this will allow you to calculate maximum per genre:
Movie.joins(:datapoints).where(datapoints: {timestamp: (Time.now)..(Time.now+1.year)}).group(:genre).maximum(:cumulative_downloads)
Edit 1
You can get the diffs in a couple of steps:
rel = Movie.joins(:datapoints).where(datapoints: {timestamp: (Time.now)..(Time.now+1.year)}).group(:genre)
mins = rel.minimum(:cumulative_downloads)
maxs = rel.maximum(:cumulative_downloads)
res = {}
maxs.each{|k,v| res[k] = v-mins[k]}
Edit 2
Your initial direction was almost there. All you have to do is calculate the diff per movie in the SQL and stage the data so you can collect it with one pass. I'm sure there's a way to do it all in SQL, but I'm not sure it will be as simple.
# get the genre and diff per movie
result = Movie.select('movies.genre, MAX(datapoints.cumulative_downloads)-MIN(datapoints.cumulative_downloads) as diff').joins(:datapoints).group(:movie_id)
# sum the diffs per genre
per_genre = Hash.new(0)
result.each{|m| per_genre[m.genre] += m.diff}
Edit 3
Including the movie's id in the select and the genre in the group:
# get the genre and diff per movie
result = Movie
  .select('movies.id, movies.genre, MAX(datapoints.cumulative_downloads)-MIN(datapoints.cumulative_downloads) as diff')
  .joins(:datapoints)
  .group('movies.id, movies.genre')
# sum the diffs per genre
per_genre = Hash.new(0)
result.each{|m| per_genre[m.genre] += m.diff}
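The two-step idea (MAX - MIN per movie first, then sum per genre) can be checked in plain Ruby. The sample data below is hypothetical. The comparison at the end shows why Edit 1's genre-level MAX - MIN is not the same thing: it measures the spread across different movies rather than the sum of each movie's increment.

```ruby
# Hypothetical sample data: movie_id => genre, plus datapoints in the time window.
GENRES = { 1 => "drama", 2 => "drama", 3 => "comedy" }
Datapoint = Struct.new(:movie_id, :cumulative_downloads)
datapoints = [
  Datapoint.new(1, 100), Datapoint.new(1, 150),
  Datapoint.new(2, 40),  Datapoint.new(2, 70),
  Datapoint.new(3, 10),  Datapoint.new(3, 35)
]

# Step 1, GROUP BY movie: MAX - MIN of cumulative downloads per movie
diff_per_movie = datapoints.group_by(&:movie_id).transform_values { |dps|
  downloads = dps.map(&:cumulative_downloads)
  downloads.max - downloads.min
}

# Step 2: sum the per-movie increments per genre
per_genre = Hash.new(0)
diff_per_movie.each { |movie_id, diff| per_genre[GENRES.fetch(movie_id)] += diff }

# For comparison, Edit 1's approach: MAX - MIN over the whole genre
genre_span = datapoints.group_by { |dp| GENRES.fetch(dp.movie_id) }.transform_values { |dps|
  downloads = dps.map(&:cumulative_downloads)
  downloads.max - downloads.min
}

per_genre.each { |genre, diff| puts "#{genre}: #{diff} (per-movie sum) vs #{genre_span[genre]} (genre span)" }
```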

Limiting user votes in a ruby on rails app

I have an app where users can vote for entries. They are limited to a total number of votes per 24 hours, based on a configuration stored in my Setting model. Here's the code I'm using in my Vote model to check and see if they've hit their limit.
def not_voted_too_much?
  @votes_per_period = find_settings.votes_per_period # how many votes are allowed per period
  @votes = Vote.find_all_by_user_id(user_id, :order => 'id DESC')
  @index = @votes_per_period - 1
  if @votes.nil?
    true
  else
    if @votes.size < @votes_per_period
      true
    else
      if @votes[@index].created_at + find_settings.voting_period_in_hours.hours > Time.now.utc
        false
      else
        true
      end
    end
  end
end
When it returns true, they're allowed to vote; when it returns false, they can't. Right now it relies on the records being retrieved in a certain order and on the one it selects being the oldest. This seems to work, but feels fragile to me.
I'd like to use :order => 'created_at DESC', but when I apply a limit to the query (I'd need to only get as many records as votes are allowed for that period), it seems to always pull the oldest records instead of the latest records and I'm not sure how to go about changing the query to pull the latest votes and not the oldest.
Any thoughts on the best way to go about this?
Can't you just count the user's votes which are newer than 24 hours old and check it against your limits? Am I missing something?
def not_voted_too_much?
votes_count = votes.where("created_at >= ?", 24.hours.ago).count
votes_count < find_settings.votes_per_period
end
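Counting the votes inside the window removes the dependence on record order entirely. Here is a plain-Ruby sketch of the same check, with votes reduced to their created_at times. `can_vote?` and its keyword parameters are hypothetical names mirroring the votes_per_period and voting_period_in_hours settings from the question.

```ruby
# Hypothetical helper: votes are represented only by their created_at times.
def can_vote?(vote_times, now:, votes_per_period:, period_in_hours:)
  window_start = now - period_in_hours * 3600 # Time minus seconds
  recent = vote_times.count { |t| t >= window_start }
  recent < votes_per_period
end

now = Time.at(1_700_000_000)
votes = [now - 60, now - 25 * 3600] # one recent vote, one outside a 24 h window
puts can_vote?(votes, now: now, votes_per_period: 2, period_in_hours: 24)
```

Because count does not care about ordering, shuffling vote_times gives the same answer, unlike the index-based version in the question.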
(This assumes you've set up the votes association correctly in the User model.)