Limiting user votes in a ruby on rails app - sql

I have an app where users can vote for entries. They are limited to a total number of votes per 24 hours, based on a configuration stored in my Setting model. Here's the code I'm using in my Vote model to check and see if they've hit their limit.
def not_voted_too_much?
  @votes_per_period = find_settings.votes_per_period # how many votes are allowed per period
  @votes = Vote.find_all_by_user_id(user_id, :order => 'id DESC')
  @index = @votes_per_period - 1
  if @votes.nil?
    true
  else
    if @votes.size < @votes_per_period
      true
    else
      if @votes[@index].created_at + find_settings.voting_period_in_hours.hours > Time.now.utc
        false
      else
        true
      end
    end
  end
end
When that returns true, they're allowed to vote; if false, they can't. Right now it relies on the records being retrieved in a certain order and on the record it selects being the oldest. This seems to work, but feels fragile to me.
I'd like to use :order => 'created_at DESC', but when I apply a limit to the query (I'd only need as many records as votes are allowed for that period), it seems to always pull the oldest records instead of the latest ones, and I'm not sure how to change the query so it pulls the latest votes rather than the oldest.
Any thoughts on the best way to go about this?

Can't you just count the user's votes which are newer than 24 hours old and check it against your limits? Am I missing something?
def not_voted_too_much?
  votes_count = votes.where("created_at >= ?", 24.hours.ago).count
  votes_count < find_settings.votes_per_period
end
(this assumes you've got the votes association set up correctly in the user model)
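For reference, a minimal sketch of the associations that answer relies on (model names taken from the question, everything else assumed):
class User < ActiveRecord::Base
  has_many :votes   # so `votes.where(...)` works inside not_voted_too_much?
end

class Vote < ActiveRecord::Base
  belongs_to :user
end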

Related

Count total number of objects in list ordered by the number of associated objects

I have two models
class User
  has_many :subscriptions
end
and
class Subscription
  belongs_to :user
end
On one of my pages I would like to display a list of all users ordered by the number of subscriptions each user has. I am not too good with SQL queries, but I think that
list = User.joins(:subscriptions).group("users.id").order("count(subscriptions.id) DESC")
does the job. Now to my problem: when I try to count the total number of objects in list using list.count, I get a hash with user.id and subscription count, like this
{11 => 5,
8 => 7,
1 => 11,
...}
not the total number of users in list. .count works fine if I have a list sorted by, for example, user name (which is in the users table). I would really like to use .count since it's in a pagination module that's in a gem, but any ideas are great!
Thanks!
We can just use a single query to finish this:
User.joins("LEFT OUTER JOIN ( SELECT user_id, COUNT(*) as num_subscriptions
FROM subscriptions
GROUP BY user_id
) AS temp
ON temp.user_id = users.id")
.order("temp.num_subscriptions DESC")
Basically, my idea is to query the number of subscriptions for each user_id in the subquery, then join that with User. I used LEFT OUTER JOIN because there may be users who don't have any subscriptions.
As an improvement, you can define a scope inside User, which is nicer for later use:
user.rb
class User < ActiveRecord::Base
has_many :subscriptions
scope :sorted_by_num_subscriptions, -> {
joins("LEFT OUTER JOIN ( SELECT user_id, COUNT(*) as num_subscriptions
FROM subscriptions
GROUP BY user_id
) AS temp
ON temp.user_id = users.id")
.order("temp.num_subscriptions DESC")
}
end
Then just use it:
User.sorted_by_num_subscriptions
When grouping, the count method changes its behavior and indeed, instead of returning the total count of records, it returns a hash of the counts for each group (see the docs for more info). So what you get with list.count is simply a hash of the subscription counts for each user.
So, your query is correct and all you need is to sum up the individual counts in the groups. This can be done simply by:
total_count = list.count.values.sum
If the problem is that the pagination code calls a bare count, pagination libraries can usually accept the total count as a parameter. For example, will_paginate accepts the total_entries parameter, so you should be able to pass it the total count like this:
list.paginate(page: 2, total_entries: list.count.values.sum)

ActiveRecord find_each combined with limit and order

I'm trying to run a query of about 50,000 records using ActiveRecord's find_each method, but it seems to be ignoring my other parameters like so:
Thing.active.order("created_at DESC").limit(50000).find_each {|t| puts t.id }
Instead of stopping at 50,000 as I'd like, and sorting by created_at, here's the resulting query, which gets executed over the entire dataset:
Thing Load (198.8ms) SELECT "things".* FROM "things" WHERE "things"."active" = 't' AND ("things"."id" > 373343) ORDER BY "things"."id" ASC LIMIT 1000
Is there a way to get similar behavior to find_each but with a total max limit and respecting my sort criteria?
The documentation says that find_each and find_in_batches don't retain sort order and limit because:
Sorting ASC on the PK is used to make the batch ordering work.
Limit is used to control the batch sizes.
You could write your own version of this function like @rorra did. But you can get into trouble when mutating the objects. If, for example, you sort by created_at and save the object, it might come up again in one of the next batches. Similarly, you might skip objects because the order of results has changed while executing the query for the next batch. Only use that solution with read-only objects.
Now my primary concern was that I didn't want to load 30000+ objects into memory at once. My concern was not the execution time of the query itself. Therefore I used a solution that executes the original query but only caches the IDs. It then divides the array of IDs into chunks and queries/creates the objects per chunk. This way you can safely mutate the objects because the sort order is kept in memory.
Here is a minimal example similar to what I did:
batch_size = 512
ids = Thing.order('created_at DESC').pluck(:id) # Replace .order(:created_at) with your own scope
ids.each_slice(batch_size) do |chunk|
  Thing.find(chunk, :order => "field(id, #{chunk.join(',')})").each do |thing|
    # Do things with thing
  end
end
The trade-offs of this solution are:
- The complete query is executed to get the IDs
- An array of all the IDs is kept in memory
- It uses the MySQL-specific FIELD() function (a database-agnostic alternative is sketched below)
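If you're not on MySQL, a database-agnostic variant of the inner loop (a sketch, not part of the original answer) is to drop FIELD() and restore the order in Ruby instead:
ids.each_slice(batch_size) do |chunk|
  things_by_id = Thing.where(id: chunk).index_by(&:id)
  chunk.each do |id|
    thing = things_by_id[id]
    # Do things with thing
  end
end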
Hope this helps!
find_each uses find_in_batches under the hood.
It's not possible to select the order of the records; as described in find_in_batches, it is automatically set to ascending on the primary key (“id ASC”) to make the batch ordering work.
However, the other criteria are applied, so what you can do is:
Thing.active.find_each(batch_size: 50000) { |t| puts t.id }
Regarding the limit, it wasn't implemented yet: https://github.com/rails/rails/pull/5696
Answering your second question, you can create the logic yourself:
total_records = 50000
batch = 1000
(0..(total_records - batch)).step(batch) do |i|
puts Thing.active.order("created_at DESC").offset(i).limit(batch).to_sql
end
Retrieving the ids first and processing them with in_groups_of:
ordered_photo_ids = Photo.order(likes_count: :desc).pluck(:id)
ordered_photo_ids.in_groups_of(1000, false).each do |photo_ids|
photos = Photo.order(likes_count: :desc).where(id: photo_ids)
# ...
end
It's important to also add the ORDER BY query to the inner call.
Rails 6.1 adds support for descending order in find_each, find_in_batches and in_batches.
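A minimal example of that option (the batches still walk the primary key, just in descending order):
Thing.active.find_each(order: :desc) do |thing|
  # newest ids are processed first
end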
One option is to put an implementation tailored for your particular model into the model itself (speaking of which, id is usually a better choice for ordering records; created_at may have duplicates):
class Thing < ActiveRecord::Base
  def self.find_each_desc limit
    batch_size = 1000
    i = 1
    records = self.order(created_at: :desc).limit(batch_size)
    while records.any?
      records.each do |task|
        yield task, i
        i += 1
        return if i > limit
      end
      records = self.order(created_at: :desc).where('id < ?', records.last.id).limit(batch_size)
    end
  end
end
Or else you can generalize things a bit, and make it work for all the models:
lib/active_record_extensions.rb:
ActiveRecord::Batches.module_eval do
  def find_each_desc limit
    batch_size = 1000
    i = 1
    records = self.order(id: :desc).limit(batch_size)
    while records.any?
      records.each do |task|
        yield task, i
        i += 1
        return if i > limit
      end
      records = self.order(id: :desc).where('id < ?', records.last.id).limit(batch_size)
    end
  end
end

ActiveRecord::Querying.module_eval do
  delegate :find_each_desc, :to => :all
end
config/initializers/extensions.rb:
require "active_record_extensions"
P.S. I'm putting the code in files according to this answer.
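For reference, usage of the patched method would then look something like this (scope and block arguments as defined above):
Thing.active.find_each_desc(10_000) do |thing, i|
  puts "#{i}: #{thing.id}"
end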
You can iterate backwards using standard Ruby iterators:
Thing.last.id.step(0, -1000) do |i|
  Thing.where(id: (i - 1000 + 1)..i).order('id DESC').each do |thing|
    # ...
  end
end
Note: the +1 is there because the BETWEEN that ends up in the query includes both bounds, but we only want to include one of them.
Sure, with this approach a batch could fetch fewer than 1000 records because some of them have already been deleted, but that was fine in my case.
As remarked by @Kirk in one of the comments, find_each supports limit as of version 5.1.0.
Example from the changelog:
Post.limit(10_000).find_each do |post|
# ...
end
The documentation says:
Limits are honored, and if present there is no requirement for the batch size: it can be less than, equal to, or greater than the limit.
(setting a custom order is still not supported though)
I was looking for the same behaviour and thought up this solution. It DOES NOT order by created_at, but I thought I would post it anyway.
max_records_to_retrieve = 50000
last_index = Thing.count
start_index = [(last_index - max_records_to_retrieve), 0].max
Thing.active.find_each(:start => start_index) do |u|
# do stuff
end
Drawbacks of this approach:
- You need 2 queries (first one should be fast)
- This guarantees a max of 50K records but if ids are skipped you will get less.
You can try the ar-as-batches gem. From its documentation, you can do something like this:
Users.where(country_id: 44).order(:joined_at).offset(200).as_batches do |user|
user.party_all_night!
end
Using Kaminari or something similar, it is easy.
Create a batch loader class.
module BatchLoader
  extend ActiveSupport::Concern

  def batch_by_page(options = {})
    options = init_batch_options!(options)
    next_page = 1
    loop do
      next_page = yield(next_page, options[:batch_size])
      break next_page if next_page.nil?
    end
  end

  private

  def default_batch_options
    {
      batch_size: 50
    }
  end

  def init_batch_options!(options)
    options ||= {}
    default_batch_options.merge!(options)
  end
end
Create a repository.
class ThingRepository
  include BatchLoader

  # @param [Integer] per_page
  # @param [Proc] block
  def batch_changes(per_page = 100, &block)
    relation = Thing.active.order("created_at DESC")
    batch_by_page do |next_page|
      query = relation.page(next_page).per(per_page)
      yield query if block_given?
      query.next_page
    end
  end
end
Use the repository
repo = ThingRepository.new
repo.batch_changes(5000) do |g|
  g.each do |t|
    # ...
  end
end
Adding find_in_batches_with_order solved my use case, where I already had the ids but needed batching and ordering. It was inspired by @dirk-geurs' solution.
# Create the file config/initializers/find_in_batches_with_order.rb with the following code.
ActiveRecord::Batches.class_eval do
  ## Only a flat order structure is supported for now
  ## example: [:forename, :surname] is supported but [:forename, {surname: :asc}] is not
  def find_in_batches_with_order(ids: nil, order: [], batch_size: 1000)
    relation = self
    arrangement = order.dup
    index = order.find_index(:id)
    unless index
      arrangement.push(:id)
      index = arrangement.length - 1
    end
    ids ||= relation.order(*arrangement).pluck(*arrangement).map { |tuple| tuple[index] }
    ids.each_slice(batch_size) do |chunk_ids|
      chunk_relation = relation.where(id: chunk_ids).order(*order)
      yield(chunk_relation)
    end
  end
end
Leaving Gist here https://gist.github.com/the-spectator/28b1176f98cc2f66e870755bb2334545
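A usage sketch (the model, scope, and order column are assumed for illustration):
Article.where(published: true).find_in_batches_with_order(order: [:title], batch_size: 500) do |batch|
  batch.each { |article| puts article.title }
end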
I had the same problem with a query with DISTINCT ON where you need an ORDER BY with that field, so this is my approach with Postgres:
def filtered_model_ids
  Model.joins(:father_model)
       .select('DISTINCT ON (model.field) model.id')
       .order(:field)
       .map(&:id)
end

def processor
  filtered_model_ids.each_slice(BATCH_SIZE).lazy.each do |batch|
    Model.find(batch).each do |record|
      # Code
    end
  end
end
My code
batch_size = 100
total_count = klass.count
offset = 0
processed_count = 0
while processed_count < total_count
  relation = klass.order({ active_at: :asc, created_at: :desc }).offset(offset).limit(batch_size)
  relation.each do |record|
    record.process
  end
  offset += batch_size          # advance to the next page
  processed_count += batch_size
end
Do it in one query and avoid iterating:
User.offset(2).order('name DESC').last(3)
will produce a query like this:
SELECT "users".* FROM "users" ORDER BY name ASC LIMIT $1 OFFSET $2 [["LIMIT", 3], ["OFFSET", 2]]

Combining Active Record group, join, maximum & minimum

I'm trying to get to grips with the Active Record query interface. I have two models:
class Movie < ActiveRecord::Base
  has_many :datapoints
  attr_accessible :genre
end

class Datapoint < ActiveRecord::Base
  belongs_to :movie
  attr_accessible :cumulative_downloads, :timestamp
end
I want to find the incremental downloads per genre for a given time period.
So far I've managed to get the maximum and minimum downloads per movie within a time period, like so:
maximums = Datapoint.joins(:movie)
                    .where(["datapoints.timestamp > ?", Date.today - @timespan])
                    .group('datapoints.movie_id')
                    .maximum(:cumulative_downloads)
This then allows me to calculate the incremental per movie, before aggregating this into the incremental per genre.
Clearly this is a bit ham-fisted, and I'm sure it would be possible to do this in one step (and using hash conditions). I just can't get my head around how. Can you help?
Much appreciated!
Derek.
I think this will allow you to calculate maximum per genre:
Movie.joins(:datapoints).where(datapoints: {timestamp: (Time.now)..(Time.now+1.year)}).group(:genre).maximum(:cumulative_downloads)
Edit 1
You can get the diffs in a couple of steps:
rel = Movie.joins(:datapoints).where(datapoints: {timestamp: (Time.now)..(Time.now+1.year)}).group(:genre)
mins = rel.minimum(:cumulative_downloads)
maxs = rel.maximum(:cumulative_downloads)
res = {}
maxs.each{|k,v| res[k] = v-mins[k]}
Edit 2
Your initial direction was almost there. All you have to do is calculate the diff per movie in the SQL and stage the data so you can collect it with one pass. I'm sure there's a way to do it all in SQL, but I'm not sure it will be as simple.
# get the genre and diff per movie
result = Movie.select('movies.genre, MAX(datapoints.cumulative_downloads)-MIN(datapoints.cumulative_downloads) as diff').joins(:datapoints).group(:movie_id)
# sum the diffs per genre
per_genre = Hash.new(0)
result.each{|m| per_genre[m.genre] += m.diff}
Edit 3
Including the movie_id in the select and the genre in the group:
# get the genre and diff per movie
result = Movie
.select('movies.movie_id, movies.genre, MAX(datapoints.cumulative_downloads)-MIN(datapoints.cumulative_downloads) as diff')
.joins(:datapoints)
.group('movies.movie_id, movies.genre')
# sum the diffs per genre
per_genre = Hash.new(0)
result.each{|m| per_genre[m.genre] += m.diff}

Issues with DISTINCT when used in conjunction with ORDER

I am trying to construct a site which ranks performances for a selection of athletes in a particular event. I previously posted a question which received a few good responses that helped me identify the key problem with my current code.
I have 2 models - Athlete and Result (Athlete HAS MANY Results)
Each athlete can have a number of recorded times for a particular event. I want to identify the quickest time for each athlete and rank these quickest times across all athletes.
I use the following code:
<% @filtered_names = Result.where(:event_name => params[:justevent]).joins(:athlete).order('performance_time_hours ASC').order('performance_time_mins ASC').order('performance_time_secs ASC').order('performance_time_msecs ASC') %>
This successfully ranks ALL the results across ALL athletes for the event (i.e. one athlete can appear a number of times in different places depending on the times they have recorded).
I now wish to just pull out the best result for each athlete and include them in the rankings. I can select the time corresponding to the best result using:
<% @currentathleteperformance = Result.where(:event_name => params[:justevent]).where(:athlete_id => filtered_name.athlete_id).order('performance_time_hours ASC').order('performance_time_mins ASC').order('performance_time_secs ASC').order('performance_time_msecs ASC').first() %>
However, my problem comes when I try to identify the distinct athlete names listed in @filtered_names. I tried using <% @filtered_names = @filtered_names.select('distinct athlete_id') %>, but this doesn't behave how I expected it to, and on occasion it gets the rankings in the wrong order.
I have discovered that as it stands my code essentially looks for a difference between the distinct athlete results, starting with the hours time and progressing through to mins, secs and msec. As soon as it has found a difference between a result for each of the distinct athletes it orders them accordingly.
For example, if I have 2 athletes:
Time for Athlete 1 = 0:0:10:5
Time for Athlete 2 = 0:0:10:3
This will yield the order Athlete 2, Athlete 1.
However, if I have:
Time for Athlete 1 = 0:0:10:5
Time for Athlete 2 = 0:0:10:3
Time for Athlete 2 = 0:1:11:5
Then the order is given as Athlete 1, Athlete 2 as the first difference is in the mins digit and Athlete 2 is slower...
Can anyone suggest a way to get around this problem and essentially go down the entries in @filtered_names, pulling out each name the first time it appears (i.e. keeping the names in the order they first appear in @filtered_names)?
Thanks for your time
If you're on Ruby 1.9.2+, you can use Array#uniq and pass a block specifying how to determine uniqueness. For example:
@unique_results = @filtered_names.uniq { |result| result.athlete_id }
That should return only one result per athlete, and that one result should be the first in the array, which in turn will be the quickest time since you've already ordered the results.
One caveat: @filtered_names might still be an ActiveRecord::Relation, which has its own #uniq method. You may first need to call #all to return an Array of the results:
@unique_results = @filtered_names.all.uniq { ... }
You should use the DB to perform the max calculation, not the Ruby code. Add a new column to the results table called total_time_in_msecs and set its value every time you change the Results table.
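A migration for that column might look like this (the column type and index are assumptions):
class AddTotalTimeInMsecsToResults < ActiveRecord::Migration
  def change
    add_column :results, :total_time_in_msecs, :integer
    add_index  :results, :total_time_in_msecs
  end
end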
class Result < ActiveRecord::Base
  before_save :init_data

  def init_data
    self.total_time_in_msecs = performance_time_hours * MSEC_IN_HOUR +
                               performance_time_mins * MSEC_IN_MIN +
                               performance_time_secs * MSEC_IN_SEC +
                               performance_time_msecs
  end

  MSEC_IN_SEC  = 1000
  MSEC_IN_MIN  = 60 * MSEC_IN_SEC
  MSEC_IN_HOUR = 60 * MSEC_IN_MIN
end
Now you can write your query as follows:
athletes = Athlete.joins(:results).
  select("athletes.id, athletes.name, max(results.total_time_in_msecs) best_time").
  where("results.event_name = ?", params[:justevent]).
  group("athletes.id, athletes.name").
  order("best_time DESC")
athletes.first.best_time # prints a number
Write a simple helper to break the number down into time parts:
def human_time time_in_msecs
  "%d:%02d:%02d:%03d" %
    [Result::MSEC_IN_HOUR, Result::MSEC_IN_MIN,
     Result::MSEC_IN_SEC, 1].map do |interval|
      r = time_in_msecs / interval
      time_in_msecs = time_in_msecs % interval
      r
    end
end
Use the helper in your views to display the broken down time.
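For example, in a view (variable names assumed; best_time comes from the query above):
<% athletes.each do |athlete| %>
  <%= athlete.name %>: <%= human_time(athlete.best_time) %>
<% end %>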

Improving performance of Rails model

I have the following model that allows Users to cast Votes on Photos.
class Vote < ActiveRecord::Base
  attr_accessible :value
  belongs_to :photo
  belongs_to :user

  validates_associated :photo, :user
  validates_uniqueness_of :user_id, :scope => :photo_id
  validates_uniqueness_of :photo_id, :scope => :user_id
  validates_inclusion_of :value, :in => [-2, -1, 1, 2], :allow_nil => true

  after_save :write_photo_data

  def self.score
    dd = where(:value => -2).count
    d  = where(:value => -1).count
    u  = where(:value => 1).count
    uu = where(:value => 2).count
    self.compute_score(dd, d, u, uu)
  end

  def self.compute_score(dd, d, u, uu)
    tot = [dd, d, u, uu].sum.to_f
    score = [-5 * dd, -2 * d, 2 * u, 5 * uu].sum / [tot, 4].sum * 20.0
    score.round(2)
  end

  private

  def write_photo_data
    self.photo.score = self.photo.votes.score
    self.photo.save!
  end
end
This functions very well, however computing the score for a photo is pretty slow - it seems to take 7-12 seconds on average. I've tried adding indices for photo_id, user_id, and one combined for photo_id and value, but this hasn't really improved the performance as far as I can tell.
I'd be interested in feedback from any serious rails gurus (I'm totally an amateur) as to how this could be optimized / improved. How would you tally up votes for a particular photo and value?
Thanks!
--EDIT--
Note that the scores: -2,-1,1,2 represent "two-thumbs down, one-thumb down, thumb up, two-thumbs up", not specific values. I could match these to the values I've assigned to them in the compute score method, but I haven't done that so far because I may want to tweak the weightings over time after seeing more data accumulated.
Also, regardless of how I represent those four possible votes in the DB, I still need both the COUNT of each kind of vote as well as the weighted value of those votes for each photo to compute the score. Thanks!
You need an index on value by itself. Combined indexes only work when the query uses their components starting from the left. Since your where clause does not specify a photo_id, it's not using your combined index.
Update: see http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
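A migration for that single-column index might look like this (table name assumed to be votes):
class AddValueIndexToVotes < ActiveRecord::Migration
  def change
    # Index on value alone, so counts filtered by value can use it
    add_index :votes, :value
  end
end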
One thing you could do is ask the database once instead of four times for the score counts:
Vote.where(photo_id: photo.id).group(:value).count
would result in a single database query and give you a hash like
{-2 => 21, -1 => 48, 1 => 103, 2 => 84}
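That hash can then be fed straight into the existing compute_score; a sketch (keys missing from the hash simply count as zero):
counts = Vote.where(photo_id: photo.id).group(:value).count
score  = Vote.compute_score(counts[-2].to_i, counts[-1].to_i, counts[1].to_i, counts[2].to_i)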
Besides that, if you store the actual values of [-5, -2, 2, 5] instead of [-2, -1, 1, 2] in the database, you could just do
Vote.where(photo_id: photo.id).sum(:value)
and get your sum directly from the database (or even use average to get the average instead)
Why do you store -2, -1, 1, 2 instead of the actual grade? If you store the grade (-5, for example), you will be able to compute the score directly in the DB without having to run 4 count queries. This will be an improvement for sure.
Putting an index on the value column will speed up the SELECTs if you have lots of records in the DB.
The above posts also bring up some good points on direct optimization. However, as your DB scales, all of these approaches will eventually fall down. Since the score is a derived value, you could cache it in Memcached, Redis, or even SQL which will ensure that fetching the score scales in constant time as the app grows. You can allow the caches to get out of date and keep them updated using a background process. By doing so, your calculation function can take arbitrarily long without impacting the user experience.
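A minimal sketch of that caching idea using Rails.cache (the cache key and expiry are assumptions; in practice a background job would refresh the value rather than recomputing inline):
class Photo < ActiveRecord::Base
  has_many :votes

  def cached_score
    # Serve a possibly stale score; recompute at most once per hour
    Rails.cache.fetch("photos/#{id}/score", expires_in: 1.hour) do
      votes.score
    end
  end
end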