Improving performance of Rails model - sql

I have the following model that allows Users to cast Votes on Photos.
class Vote < ActiveRecord::Base
  attr_accessible :value
  belongs_to :photo
  belongs_to :user
  validates_associated :photo, :user
  validates_uniqueness_of :user_id, :scope => :photo_id
  validates_uniqueness_of :photo_id, :scope => :user_id
  validates_inclusion_of :value, :in => [-2,-1,1,2], :allow_nil => true
  after_save :write_photo_data
  def self.score
    dd = where( :value => -2 ).count
    d = where( :value => -1 ).count
    u = where( :value => 1 ).count
    uu = where( :value => 2 ).count
    self.compute_score(dd,d,u,uu)
  end
  def self.compute_score(dd, d, u, uu)
    tot = [dd,d,u,uu].sum.to_f
    score = [-5*dd, -2*d, 2*u, 5*uu].sum / [tot,4].sum*20.0
    score.round(2)
  end
  private
  def write_photo_data
    self.photo.score = self.photo.votes.score
    self.photo.save!
  end
end
This works well; however, computing the score for a photo is pretty slow: it seems to take 7-12 seconds on average. I've tried adding indices on photo_id, on user_id, and a combined one on photo_id and value, but as far as I can tell this hasn't really improved performance.
I'd be interested in feedback from any serious Rails gurus (I'm totally an amateur) on how this could be optimized or improved. How would you tally up the votes for a particular photo and value?
Thanks!
--EDIT--
Note that the values -2, -1, 1, 2 represent "two thumbs down, one thumb down, one thumb up, two thumbs up", not specific weights. I could map them to the weights I've assigned in the compute_score method, but I haven't done that so far because I may want to tweak the weightings over time as more data accumulates.
Also, regardless of how I represent those four possible votes in the DB, I still need both the COUNT of each kind of vote as well as the weighted value of those votes for each photo to compute the score. Thanks!

You need an index on value by itself. A combined (multiple-column) index can only be used when the query constrains its leftmost column(s). Since your where clause does not specify a photo id, it's not using your combined index.
Update: see http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

One thing you could do is ask the database once instead of four times for the score counts:
Vote.where(photo_id: photo.id).group(:value).count
would result in a single database query and give you a hash like
{-2 => 21, -1 => 48, 1 => 103, 2 => 84}
Besides that, if you store the actual values of [-5, -2, 2, 5] instead of [-2, -1, 1, 2] in the database, you could just do
Vote.where(photo_id: photo.id).sum(:value)
and get your sum directly from the database (or use average(:value) to get the average instead).
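If you keep storing -2, -1, 1, 2, the grouped-count hash can be turned into the score in Ruby with no further queries. Below is a minimal plain-Ruby sketch; WEIGHTS and score_from_counts are hypothetical names, while the weights and formula are copied from the question's compute_score. Missing vote values count as zero, and a nil bucket (possible because of :allow_nil) is ignored.

```ruby
# Weights copied from compute_score in the question.
WEIGHTS = { -2 => -5, -1 => -2, 1 => 2, 2 => 5 }.freeze

# counts: a hash shaped like the result of group(:value).count,
# e.g. {-2 => 21, -1 => 48, 1 => 103, 2 => 84}.
def score_from_counts(counts)
  # Only the four vote values count toward the total; absent ones count as 0.
  total = WEIGHTS.keys.sum { |value| counts.fetch(value, 0) }.to_f
  weighted = WEIGHTS.sum { |value, weight| weight * counts.fetch(value, 0) }
  # Same formula as compute_score: weighted sum / (total + 4) * 20, rounded.
  (weighted / (total + 4) * 20.0).round(2)
end

puts score_from_counts({ -2 => 21, -1 => 48, 1 => 103, 2 => 84 }) # => 32.69
```

This keeps the weights in Ruby, so they can still be tweaked later without migrating the stored votes.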

Why do you store -2, -1, 1, 2 instead of the actual grade? If you store the grade (-5, for example), you can compute the score in the database directly instead of running four count queries. That will certainly be an improvement.

Putting an index on the value column will speed up the SELECTs if you have lots of records in the DB.
The above posts also bring up some good points on direct optimization. However, as your DB scales, all of these approaches will eventually fall down. Since the score is a derived value, you could cache it in Memcached, Redis, or even SQL, so that fetching the score takes constant time as the app grows. You can allow the caches to get out of date and keep them updated using a background process. That way, your calculation function can take arbitrarily long without impacting the user experience.
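A closely related variant of the caching idea is to update per-photo tallies incrementally whenever a vote is saved, so reading a score costs the same no matter how many votes exist. This is a swapped-in technique (incremental counters rather than a background refresh), and VoteTally is a hypothetical in-memory stand-in for a Redis hash or counter columns, not any library's API.

```ruby
# Hypothetical in-memory stand-in for a score cache (Redis hash, counter columns, ...).
class VoteTally
  WEIGHTS = { -2 => -5, -1 => -2, 1 => 2, 2 => 5 }.freeze

  def initialize
    # photo_id => { vote value => number of votes }
    @counts = Hash.new { |hash, photo_id| hash[photo_id] = Hash.new(0) }
  end

  # Record a new vote, or a changed vote when previous_value is given.
  def record(photo_id, value, previous_value = nil)
    @counts[photo_id][previous_value] -= 1 unless previous_value.nil?
    @counts[photo_id][value] += 1
  end

  # A read touches at most four counters, regardless of how many votes exist.
  def score(photo_id)
    counts = @counts[photo_id]
    total = WEIGHTS.keys.sum { |value| counts[value] }.to_f
    weighted = WEIGHTS.sum { |value, weight| weight * counts[value] }
    (weighted / (total + 4) * 20.0).round(2)
  end
end

tally = VoteTally.new
tally.record(1, 2)
tally.record(1, -1)
tally.record(1, 1, -1) # the second voter changes -1 to 1
puts tally.score(1)    # => 23.33
```

The trade-off is that every write path (create, change, delete) must keep the counters in step; a periodic full recount is a reasonable safety net.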

Related

Improve query performance in Rails when creating a json

I'm working with a Rails 5 API. I have a simple model of a store, with:
order has_one checkout
checkout has_one transaction
checkout belongs_to order
transaction belongs_to checkout
checkout has_many items
order (1) -----> (1) checkout (1) ------> (1) transaction
                     checkout (1) ------> (*) item
I want an endpoint that, given a set of transactions, returns JSON with data from those transactions.
I have this code, which works but takes a lot of time: for example, a month's worth of transactions takes about a minute.
def get_all_transactions
  transactions = Transaction.where.not(status: 'error')
  data = transactions.map do |transaction|
    checkout = transaction.checkout
    order = Order.find(checkout.order_id)
    checkout.items.map do |item|
      {
        checkout_id: checkout.id,
        order_id: checkout.order_id,
        item_id: item.id,
        client_name: checkout.client.full_name,
        order_created_at: order.created_at
      }
    end
  end
  # flatten, not flatten!: flatten! returns nil when there is nothing to flatten
  data.flatten
end
How can I improve this code to have a better performance?
I have also noticed that removing, for example, checkout.client.full_name takes about 20 seconds off,
with full_name defined in the Client model:
def full_name
  "#{first_name} #{last_name}".strip
end
Why would that take 20 seconds?
The problem here is that you have layers upon layers of N+1 queries. Every time you call an association that hasn't been eager loaded, you cause another round trip to the database. Even if you add includes or eager_load, the next issue is that you're loading tons of data from the tables that you're not using and creating model instances in memory just to use a single attribute from them.
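To make the N+1 cost concrete, here is a toy model in plain Ruby. FakeDB is hypothetical and only counts round trips: calling find once per row (what Order.find(checkout.order_id) inside the loop does) costs one trip per transaction, while a single batched lookup (roughly what a join or a preload achieves) costs one trip in total. Real time per trip depends on your database and network.

```ruby
# Hypothetical stand-in for a database connection that only counts round trips.
class FakeDB
  attr_reader :round_trips

  def initialize(rows)
    @rows = rows # id => row data
    @round_trips = 0
  end

  # One query per id, like calling Order.find(id) inside a loop.
  def find(id)
    @round_trips += 1
    @rows.fetch(id)
  end

  # One query for many ids, like WHERE id IN (...) or a JOIN.
  def find_many(ids)
    @round_trips += 1
    ids.map { |id| @rows.fetch(id) }
  end
end

orders = (1..500).map { |id| [id, "order #{id}"] }.to_h
ids = orders.keys

per_row = FakeDB.new(orders)
ids.each { |id| per_row.find(id) }
puts per_row.round_trips # => 500

batched = FakeDB.new(orders)
batched.find_many(ids)
puts batched.round_trips # => 1
```

The same arithmetic explains the full_name observation: checkout.client adds one more lookup per transaction.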
The most efficient way to do this is most likely going to be to simply perform a join and just select the columns you're actually interested in:
# Assumes Item belongs_to :checkout, and Checkout belongs_to :order and :client.
sql = Item.joins(checkout: [:order, :client])
          .select(
            Checkout.arel_table[:id].as('checkout_id'),
            Item.arel_table[:id].as('item_id'),
            Order.arel_table[:id].as('order_id'),
            "TRIM(CONCAT(clients.first_name, ' ', clients.last_name)) AS client_name",
            Order.arel_table[:created_at].as('order_created_at')
          )
result = Item.connection.select_all(sql.arel).map(&:to_h)
This avoids creating entire model instances in memory at multiple levels when all you need is a single column.
However, it's very unclear what the actual expected result is here, or why you're basing the query on the Transaction model when you're actually getting an array of items in the result.

Rails: ActiveRecord query regarding size of association

I'm trying to figure out how to produce a certain query, using ActiveRecord.
I have the following models
class Activity < ActiveRecord::Base
  attr_accessible :limit, ...
  has_many :users
end
class User < ActiveRecord::Base
  belongs_to :activity
end
Each activity has a limit, that is to say, an integer attribute containing the maximum number of users who may belong to it.
I'm looking for a way to select all activities that have spots available, i.e. where the number of users is smaller than that limit.
Any ideas?
Thanks
I think that the SQL syntax to aim for would be:
select *
from activities
where activities.limit > (
select count(*)
from users
where users.activity_id = activities.id)
In Rails-speak ...
Activity.where("activities.limit > (select count(*) from users where users.activity_id = activities.id)")
Not sure whether the column name "limit" is going to give you problems as it's a reserved word. You might have to quote it in the SQL.
I'd also seriously consider a counter cache for users on the activities table, which would make this perform much better. Some databases would support a partial index only for those rows where the users counter cache < limit.
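The semantics of the correlated subquery can be sketched in plain Ruby: count users per activity, treat an activity with no users as having zero, and keep the activities below their limit. The zero default matters. An INNER JOIN ... GROUP BY ... HAVING would drop activities that have no users at all, whereas the subquery (and a counter cache that defaults to 0) keeps them. The Struct definitions are hypothetical stand-ins for the models.

```ruby
# Hypothetical stand-ins for the Activity and User models.
Activity = Struct.new(:id, :limit)
User = Struct.new(:id, :activity_id)

def activities_with_spots(activities, users)
  # activity_id => number of users; activities with no users read as 0,
  # just as the subquery's count(*) returns 0 for them.
  counts = users.each_with_object(Hash.new(0)) { |user, acc| acc[user.activity_id] += 1 }
  activities.select { |activity| counts[activity.id] < activity.limit }
end

activities = [Activity.new(1, 2), Activity.new(2, 1), Activity.new(3, 1)]
users = [User.new(1, 1), User.new(2, 1), User.new(3, 2)]
p activities_with_spots(activities, users).map(&:id) # => [3]
```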
Activity.all.select{|activity| activity.users.length < activity.limit }

Rails: Many to one (0-n), finding records

I've got tables items and cards, where a card belongs to a user and an item may or may not have any cards for a given user.
The basic associations are set up as follows:
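class User < ActiveRecord::Base
  has_many :cards
end
class Item < ActiveRecord::Base
  # Rails would look for "cards_items" by default, so name the join table explicitly.
  has_and_belongs_to_many :cards, :join_table => :items_cards
end
class Card < ActiveRecord::Base
  belongs_to :user
  has_and_belongs_to_many :items, :join_table => :items_cards
end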
I've also created a join table, items_cards with the columns item_id and card_id. I'd like to make a query that tells me if there's a card for a given user/item. In pure SQL I can accomplish this pretty easily:
SELECT count(id)
FROM cards
JOIN items_cards
ON items_cards.card_id = cards.id
WHERE cards.user_id = ?
AND items_cards.item_id = ?
I'm looking for some guidance as to how I'd go about doing this via ActiveRecord. Thanks!
Assuming you have an Item in @item and a User in @user, this will return true if a card exists for that user and that item:
Card.joins(:items).where('cards.user_id = :user_id and items.id = :item_id', :user_id => @user, :item_id => @item).exists?
Here's what's going on:
Card. - You're making a query about the Card model.
joins(:items) - Rails knows how to put together joins for the association types it supports (usually - at least). You're telling it to do whatever joins are required to allow you to query the associated items as well. This will, in this case, result in JOIN items_cards ON items_cards.card_id = cards.id JOIN items ON items_cards.item_id = items.id.
where('cards.user_id = :user_id and items.id = :item_id', :user_id => @user, :item_id => @item) - Your conditional, pretty much the same as in pure SQL. Rails will interpolate the values you specify with a colon (:user_id) using the values in the hash (:user_id => @user). If you give an ActiveRecord object as the value, Rails will automatically use the id of that object. Here, you're saying you only want results where the card belongs to the user you specify, and there is a row for the item you want.
.exists? - Loading ActiveRecord objects is inefficient, so if you only want to know whether something exists, Rails can save time by issuing a lightweight query that stops at the first matching row (SELECT 1 ... LIMIT 1) instead of instantiating records. There's also .count, which you could use instead if you wanted the query to return the number of results rather than true or false.
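For intuition, here is a plain-Ruby sketch of what the query checks, with the tables modelled as arrays. Card, card_exists? and the [item_id, card_id] row shape are hypothetical; the point is that the answer only needs one matching join row, which is why exists? is cheaper than loading records.

```ruby
require 'set'

# Hypothetical stand-in for the cards table.
Card = Struct.new(:id, :user_id)

# items_cards rows are modelled as [item_id, card_id] pairs.
def card_exists?(cards, items_cards, user_id:, item_id:)
  # WHERE cards.user_id = :user_id
  user_card_ids = cards.select { |card| card.user_id == user_id }.map(&:id).to_set
  # JOIN items_cards ON items_cards.card_id = cards.id AND items_cards.item_id = :item_id;
  # any? stops at the first matching row, much as exists? only needs one.
  items_cards.any? { |row_item_id, card_id| row_item_id == item_id && user_card_ids.include?(card_id) }
end

cards = [Card.new(1, 10), Card.new(2, 20)]
items_cards = [[100, 1], [200, 2]]
p card_exists?(cards, items_cards, user_id: 10, item_id: 100) # => true
p card_exists?(cards, items_cards, user_id: 10, item_id: 200) # => false
```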

Rails/Sql - order/group search results such that repetition of entities occurs only after appearance of others

In my application, say, animals have many photos. I'm querying photos of animals such that I want all photos of all animals to be displayed. However, I want each animal to appear as a photo before repetition occurs.
Example:
animal instance 1, 'cat', has four photos,
animal instance 2, 'dog', has two photos:
photos should appear ordered as so:
filename, animal
tiddles.jpg, cat
fido.jpg, dog
meow.jpg, cat
rover.jpg, dog
puss.jpg, cat
felix.jpg, cat (no more dogs, so two consecutive cats)
Pagination is required, so I can't order on an array.
Filename structure/convention provides no help, though the animal_id exists on each photo.
Though there are two types of animal in this example, this is an ActiveRecord model with hundreds of records.
Animals may be selectively queried.
If this isn't possible with ActiveRecord, I'll happily use SQL; I'm using PostgreSQL.
My brain is frazzled so if anyone can come up with a better title, please go ahead and edit it or suggest in comments.
Here is a PostgreSQL specific solution:
batch_id_sql = "RANK() OVER (PARTITION BY animal_id ORDER BY id ASC)"
Photo.paginate(
:select => "DISTINCT photos.*, (#{batch_id_sql}) batch_id",
:order => "batch_id ASC, photos.animal_id ASC",
:page => 1)
Here is a DB agnostic solution:
batch_id_sql = "
SELECT COUNT(*)
FROM photos bm
WHERE bm.animal_id = photos.animal_id AND
bm.id <= photos.id
"
Photo.paginate(
:select => "photos.*, (#{batch_id_sql}) batch_id",
:order => "batch_id ASC, photos.animal_id ASC",
:page => 1)
Both queries work even when you have a where condition. Benchmark the query against the expected data set to check whether it meets your throughput and latency requirements.
Reference
PostgreSQL Window function
I have no experience with ActiveRecord, but using plain PostgreSQL I would try something like this:
Define a window function over all previous rows that counts how many times the current animal has appeared, then order by this count.
SELECT
filename,
animal_id,
COUNT(*) OVER (PARTITION BY animal_id ORDER BY filename) AS cnt
FROM
photos
ORDER BY
cnt,
animal_id,
filename
Filtering on certain animal_id's will work. This will always order the same way. I don't know if you want something random in there, but it should be easily added.
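The ordering rule that both window-function answers compute can be checked against the question's example in plain Ruby. This only illustrates the rule (number each animal's photos 1, 2, 3, ..., then sort by that number and then by animal_id); in the real app the database should do it so pagination still works. Photo is a hypothetical Struct, and ids play the role of the ORDER BY inside the window.

```ruby
# Hypothetical stand-in for the photos table.
Photo = Struct.new(:id, :filename, :animal_id)

def round_robin_order(photos)
  seen = Hash.new(0) # animal_id => photos of that animal numbered so far
  ranked = photos.sort_by(&:id).map do |photo|
    # Like ROW_NUMBER() OVER (PARTITION BY animal_id ORDER BY id).
    seen[photo.animal_id] += 1
    [seen[photo.animal_id], photo]
  end
  # ORDER BY rank, animal_id, id
  ranked.sort_by { |rank, photo| [rank, photo.animal_id, photo.id] }.map(&:last)
end

cat, dog = 1, 2
photos = [
  Photo.new(1, "tiddles.jpg", cat), Photo.new(2, "fido.jpg", dog),
  Photo.new(3, "meow.jpg", cat),    Photo.new(4, "rover.jpg", dog),
  Photo.new(5, "puss.jpg", cat),    Photo.new(6, "felix.jpg", cat)
]
p round_robin_order(photos).map(&:filename)
# => ["tiddles.jpg", "fido.jpg", "meow.jpg", "rover.jpg", "puss.jpg", "felix.jpg"]
```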
New solution
Add an integer column called batch_id to the photos table.
class AddBatchIdToPhotos < ActiveRecord::Migration
  def self.up
    add_column :photos, :batch_id, :integer
    set_batch_id
    change_column :photos, :batch_id, :integer, :null => false
    add_index :photos, :batch_id
  end
  def self.down
    remove_column :photos, :batch_id
  end
  def self.set_batch_id
    # set the batch id to existing rows
    # implement this
  end
end
Now add a before_create on the Photo model to set the batch id.
class Photo < ActiveRecord::Base
  belongs_to :animal
  before_create :batch_photo_add
  # before_update (not after_update), so the recomputed batch_id is saved with the record
  before_update :batch_photo_update
  after_destroy :batch_photo_remove
  private
  def batch_photo_add
    self.batch_id = next_batch_id_for_animal(animal_id)
    true
  end
  def batch_photo_update
    return true unless animal_id_changed?
    batch_photo_remove(batch_id, animal_id_was)
    batch_photo_add
  end
  # Shift the later batches of the animal down by one to close the gap.
  def batch_photo_remove(b_id = batch_id, a_id = animal_id)
    Photo.where(["animal_id = ? AND batch_id > ?", a_id, b_id])
         .update_all("batch_id = batch_id - 1")
    true
  end
  def next_batch_id_for_animal(a_id)
    (Photo.where(:animal_id => a_id).maximum(:batch_id) || 0) + 1
  end
end
Now you can get the desired result with a simple paginate call:
@animal_photos = Photo.paginate(:page => 1, :per_page => 10,
                                :order => :batch_id)
How does this work?
Let's consider we have data set as given below:
id  Photo description  Batch id
1   Cat_photo_1        1
2   Cat_photo_2        2
3   Dog_photo_1        1
4   Cat_photo_3        3
5   Dog_photo_2        2
6   Lion_photo_1       1
7   Cat_photo_4        4
Now if we were to execute a query ordered by batch_id we get this
# batch 1 (cat, dog, lion)
Cat_photo_1
Dog_photo_1
Lion_photo_1
# batch 2 (cat, dog)
Cat_photo_2
Dog_photo_2
# batch 3,4 (cat)
Cat_photo_3
Cat_photo_4
The batch distribution is not random; the animals are filled from the top. The number of photos displayed on a page is governed by the per_page parameter passed to paginate (not by the batch size).
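To sanity-check the callbacks, here is an in-memory sketch of the same bookkeeping. BatchIndex and its methods are hypothetical: add mirrors batch_photo_add/next_batch_id_for_animal, and remove mirrors batch_photo_remove. It inserts seven photos in the same order as the example (Cat 1, Cat 2, Dog 1, Cat 3, Dog 2, Lion 1, Cat 4, using photo ids 1 to 7) and then deletes Cat_photo_2.

```ruby
# Hypothetical in-memory stand-in for the photos table: photo id => [animal_id, batch_id].
class BatchIndex
  def initialize
    @rows = {}
  end

  # Mirrors batch_photo_add: next batch id = max batch id for that animal + 1.
  def add(photo_id, animal_id)
    current_max = @rows.values.select { |a, _| a == animal_id }.map(&:last).max || 0
    @rows[photo_id] = [animal_id, current_max + 1]
  end

  # Mirrors batch_photo_remove: later batches of the same animal move down by one.
  def remove(photo_id)
    animal_id, batch_id = @rows.delete(photo_id)
    @rows.each_value { |row| row[1] -= 1 if row[0] == animal_id && row[1] > batch_id }
  end

  # Like ORDER BY batch_id, animal_id, id.
  def ordered_ids
    @rows.sort_by { |id, (animal_id, batch_id)| [batch_id, animal_id, id] }.map(&:first)
  end
end

cat, dog, lion = 1, 2, 3
index = BatchIndex.new
[[1, cat], [2, cat], [3, dog], [4, cat], [5, dog], [6, lion], [7, cat]].each { |id, animal| index.add(id, animal) }
p index.ordered_ids # => [1, 3, 6, 2, 5, 4, 7]
index.remove(2)     # delete Cat_photo_2
p index.ordered_ids # => [1, 3, 6, 4, 5, 7]
```

Note that remove touches every later photo of the same animal, which is the write cost discussed in the next answer.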
Old solution
Have you tried this?
If you are using the will_paginate gem:
# assuming you want to order by animal name
animal_photos = Photo.paginate(:include => :animal, :page => 1,
                               :order => "animals.name")
animal_photos.each do |animal_photo|
  puts animal_photo.file_name
  puts animal_photo.animal.name
end
I'd recommend something hybrid/corrected based on KandadaBoggu's input.
First off, the correct way to do it on paper is with row_number() over (partition by animal_id order by id). The suggested rank() will generate a global row number, but you want the one within its partition.
Using a window function is also the most flexible solution (in fact, the only solution) if you want to plan to change the sort order here and there.
Take note that this won't necessarily scale well, however, because in order to sort the results you'll need to:
fetch the whole result set that matches your criteria
sort the whole result set to create the partitions and obtain a rank_id
top-n sort/limit over the result set a second time to get them in their final order
The correct way to do this in practice, if your sort order is immutable, is to maintain a pre-calculated rank_id. KandadaBoggu's other suggestion points in the correct direction in this sense.
When it comes to deletes (and possibly updates, if you don't want them sorted by id), you may run into issues because you end up trading faster reads for slower writes. If deleting the cat with an index of 1 leads to updating the next 50k cats, you're going to be in trouble.
If you have very small sets, the overhead may be very acceptable (don't forget to index animal_id).
If not, there's a workaround if you find the order in which specific animals appear is irrelevant. It goes like this:
Start a transaction.
If the rank_id is going to change (i.e. insert or delete), obtain an advisory lock to ensure that two sessions can't impact the rank_id of the same animal class, e.g.:
SELECT pg_try_advisory_lock('the_table'::regclass, the_animal_id);
(Sleep for .05s if you don't obtain it.)
On insert, find max(rank_id) for that animal_id. Assign it rank_id + 1. Then insert it.
On delete, select the animal with the same animal_id and the largest rank_id. Delete your animal, and assign its old rank_id to the fetched animal (unless you were deleting the last one, of course).
Release the advisory lock.
Commit the work.
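The insert/delete bookkeeping from the steps above can be sketched in plain Ruby. RankIndex is hypothetical and single-threaded; in PostgreSQL each insert or delete body would run inside the transaction while holding the per-animal advisory lock. The key property is that a delete rewrites at most one other row, instead of shifting every later rank.

```ruby
# Hypothetical in-memory stand-in: photo id => [animal_id, rank_id].
# In the database, each method body would run while holding the advisory
# lock for that animal_id; this sketch does no locking.
class RankIndex
  def initialize
    @rows = {}
  end

  # On insert: max(rank_id) for the animal, plus one.
  def insert(photo_id, animal_id)
    max_rank = @rows.values.select { |a, _| a == animal_id }.map(&:last).max || 0
    @rows[photo_id] = [animal_id, max_rank + 1]
  end

  # On delete: the animal's photo with the largest rank_id takes over the freed rank
  # (nothing moves if the deleted photo already had the largest rank).
  def delete(photo_id)
    animal_id, rank_id = @rows.delete(photo_id)
    siblings = @rows.select { |_id, (a, _rank)| a == animal_id }
    _last_id, last_row = siblings.max_by { |_id, (_a, rank)| rank }
    last_row[1] = rank_id if last_row && last_row[1] > rank_id
  end

  def rank_of(photo_id)
    @rows.fetch(photo_id).last
  end
end

index = RankIndex.new
[[1, :cat], [2, :cat], [3, :cat], [4, :dog]].each { |id, animal| index.insert(id, animal) }
index.delete(1)
p [index.rank_of(2), index.rank_of(3), index.rank_of(4)] # => [2, 1, 1]
```

The price is that the order within one animal is no longer by id, which is why this only fits when that order is irrelevant.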
Note that the above will make good use of an index on (animal_id, rank_id) and can be done using plpgsql triggers:
create trigger "__animals_rank_id__ins"
before insert on animals
for each row execute procedure lock_animal_id_and_assign_rank_id();
create trigger "_00_animals_rank_id__ins"
after insert on animals
for each row execute procedure unlock_animal_id();
create trigger "__animals_rank_id__del"
before delete on animals
for each row execute procedure lock_animal_id();
create trigger "_00_animals_rank_id__del"
after delete on animals
for each row execute procedure reassign_rank_id_and_unlock_animal_id();
You can then create a multi-column index on your sort criteria if you're not joining all over the place, e.g. (rank_id, name), and you'll end up with a snappy site for both reads and writes.
You should be able to get the pictures (or filenames, anyway) using ActiveRecord, ordered by name.
Then you can use Enumerable#group_by and Enumerable#zip to zip all the arrays together.
If you give me more information about how your filenames are really arranged (i.e., are they all for sure with an underscore before the number and a constant name before the underscore for each "type"? etc.), then I can give you an example. I'll write one up momentarily showing how you'd do it for your current example.
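Here is a hedged sketch of the group_by/zip idea for the current example. Photos are plain hashes with hypothetical :file and :animal keys, already sorted by name. One caveat: Array#zip returns as many rows as its receiver has elements, so the groups are padded with nil to the longest length (and the nils removed afterwards) to avoid dropping photos. As the question points out, this works on an in-memory array, so it does not by itself solve pagination.

```ruby
def interleave_by_animal(photos)
  groups = photos.group_by { |photo| photo[:animal] }.values # one array per animal, first-seen order
  return [] if groups.empty?

  width = groups.map(&:size).max
  # zip returns as many rows as its receiver has elements, so pad every group to the longest.
  padded = groups.map { |group| group + [nil] * (width - group.size) }
  padded.first.zip(*padded.drop(1)).flatten(1).compact
end

photos = [
  { file: "felix.jpg", animal: "cat" }, { file: "fido.jpg", animal: "dog" },
  { file: "meow.jpg", animal: "cat" },  { file: "puss.jpg", animal: "cat" },
  { file: "rover.jpg", animal: "dog" }, { file: "tiddles.jpg", animal: "cat" }
]
p interleave_by_animal(photos).map { |photo| photo[:file] }
# => ["felix.jpg", "fido.jpg", "meow.jpg", "rover.jpg", "puss.jpg", "tiddles.jpg"]
```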
You could run two sorts and build one array as follows:
result1 = the first photo of each animal type only. Use Ruby's find method for this search.
result2 = all animals, sorted by group. Use find again to locate the first occurrence of each animal, then use drop to remove those first occurrences from result2.
Then:
markCustomResult = result1 + result2
Then:
You can use will_paginate on markCustomResult.

Limiting user votes in a ruby on rails app

I have an app where users can vote for entries. They are limited to a total number of votes per 24 hours, based on a configuration stored in my Setting model. Here's the code I'm using in my Vote model to check and see if they've hit their limit.
def not_voted_too_much?
  @votes_per_period = find_settings.votes_per_period # how many votes are allowed per period
  @votes = Vote.find_all_by_user_id(user_id, :order => 'id DESC')
  @index = @votes_per_period - 1
  if @votes.nil?
    true
  else
    if @votes.size < @votes_per_period
      true
    else
      if @votes[@index].created_at + find_settings.voting_period_in_hours.hours > Time.now.utc
        false
      else
        true
      end
    end
  end
end
When that returns true, they're allowed to vote; if false, they can't. Right now, it relies on the records being retrieved in a certain order and on the one it selects being the oldest. This seems to work, but feels fragile to me.
I'd like to use :order => 'created_at DESC', but when I apply a limit to the query (I'd need to only get as many records as votes are allowed for that period), it seems to always pull the oldest records instead of the latest records and I'm not sure how to go about changing the query to pull the latest votes and not the oldest.
Any thoughts on the best way to go about this?
Can't you just count the user's votes which are newer than 24 hours old and check it against your limits? Am I missing something?
def not_voted_too_much?
  votes_count = votes.where("created_at >= ?", 24.hours.ago).count
  votes_count < find_settings.votes_per_period
end
(This assumes you've set up the votes association correctly in the User model.)
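The counting approach can be sketched without ActiveRecord to show why it sidesteps the ordering problem: only votes inside the window are counted, so the order in which records come back no longer matters. The method signature and keyword names are hypothetical; votes_per_period and period_in_hours correspond to the question's Setting values, and the period is a parameter rather than a fixed 24 hours.

```ruby
SECONDS_PER_HOUR = 3600

# vote_times: the user's vote timestamps (Time objects), in any order.
def not_voted_too_much?(vote_times, votes_per_period:, period_in_hours:, now: Time.now.utc)
  window_start = now - period_in_hours * SECONDS_PER_HOUR # Time minus a number of seconds
  # Count only the votes inside the window; no reliance on ids or record order.
  vote_times.count { |time| time >= window_start } < votes_per_period
end

now = Time.utc(2024, 1, 2, 12)
votes = [now - 3600, now - 2 * 3600, now - 30 * 3600] # 1 h, 2 h and 30 h ago
p not_voted_too_much?(votes, votes_per_period: 2, period_in_hours: 24, now: now) # => false
p not_voted_too_much?(votes, votes_per_period: 3, period_in_hours: 24, now: now) # => true
```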