Help optimizing ActiveRecord query (voting system) - sql

I have a voting system with two models: Item(id, name) and Vote(id, item_id, user_id).
Here's the code I have so far:
class Item < ActiveRecord::Base
  has_many :votes

  def self.most_popular
    items = Item.all # where can I optimize here?
    items.sort { |x, y| x.votes.length <=> y.votes.length }.first # so I don't need to do anything here?
  end
end
There are a few things wrong with this, mainly that I retrieve all the Item records and THEN use Ruby to compute popularity. I am almost certain there is a simple solution to this, but I can't quite put my finger on it.
I'd much rather gather records and run the calculations in the initial query. This way, I can add a simple :limit => 1 (or LIMIT 1) to the query.
Any help would be great, either a rewrite entirely in ActiveRecord or even in raw SQL. The latter would actually give me a much clearer picture of the nature of the query I want to execute.

Group the votes by item_id, order them by count, and then take the item of the first one. In Rails 3 the code for this is:
Vote.group(:item_id).order("count(*) DESC").first.item
In Rails 2, this should work:
Vote.all(:order => "count(*) DESC", :group => :item_id).first.item
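If you need a top-N list rather than a single winner, the same idea extends naturally. A sketch in Rails 3 syntax (note that items with zero votes never appear, since the query starts from the votes table):
Vote.group(:item_id).order("count(*) DESC").limit(5).map(&:item)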

sepp2k has the right idea. In case you're not using Rails 3, the equivalent is:
Vote.first(:group => :item_id, :order => "count(*) DESC", :include => :item).item

There's probably a better way to do this in Ruby, but in SQL (MySQL at least) you could try something like this to get a top-10 ranking:
SELECT i.id, i.name, COUNT( v.id ) AS total_votes
FROM Item i
LEFT JOIN Vote v ON ( i.id = v.item_id )
GROUP BY i.id
ORDER BY total_votes DESC
LIMIT 10
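If you'd rather keep that in ActiveRecord, a rough Rails 3 equivalent could look like the sketch below (assuming the conventional items / votes table names; total_votes becomes a method on each returned Item):
# Sketch: top 10 items by vote count, computed entirely in the database
Item.select("items.*, COUNT(votes.id) AS total_votes")
    .joins("LEFT JOIN votes ON votes.item_id = items.id")
    .group("items.id")
    .order("total_votes DESC")
    .limit(10)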

One easy way of handling this is to add a vote count field to the Item and update it each time there is a vote. Rails used to do that automatically for you, but I'm not sure if that's still the case in 2.x and 3.0. It's easy enough for you to do it in any case, either using an Observer or just by putting an "after_save" callback in the Vote model.
Your query then becomes very easy: simply add a "VOTE_COUNT DESC" order to it.
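For reference, a minimal sketch of Rails' built-in counter cache (assuming you add a votes_count integer column to items; that column name is the Rails convention, not something from the question):
class Vote < ActiveRecord::Base
  belongs_to :item, :counter_cache => true # keeps items.votes_count in sync on create/destroy
end
# The most popular item is then a simple ordered lookup:
Item.order("votes_count DESC").first         # Rails 3
Item.first(:order => "votes_count DESC")     # Rails 2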

Related

Rails: Optimize querying maximum values from associated table

I need to show a list of partners and the maximum value of the reservation_limit column from the Klass table.
Partner has_many :klasses
Klass belongs_to :partner
# Partner controller
def index
  @partners = Partner.includes(:klasses)
end
# view
<% @partners.each do |partner| %>
  Up to <%= partner.klasses.maximum("reservation_limit") %> visits per month
<% end %>
Unfortunately the query below runs for every single Partner.
SELECT MAX("klasses"."reservation_limit") FROM "klasses" WHERE "klasses"."partner_id" = $1 [["partner_id", 1]]
If there are 40 partners then the query will run 40 times. How do I optimize this?
edit: Looks like there's a limit method in rails so I'm changing the limit in question to reservation_limit to prevent confusion.
You can use two forms of SQL to retrieve this information efficiently, and I'm assuming here that you want a result for a partner even where there is no klass record for it.
The first is:
select partners.*,
max(klasses.limit) as max_klasses_limit
from partners
left join klasses on klasses.partner_id = partners.id
group by partners.id
Some RDBMSs require that you use "group by partners.*", though, which is potentially expensive in terms of the required sort and the possibility of it spilling to disk.
On the other hand you can add a clause such as:
having("max(klasses.limit) > ?", 3)
... to filter the partners efficiently by their maximum klasses.limit value.
The other is:
select partners.*,
(Select max(klasses.limit)
from klasses
where klasses.partner_id = partners.id) as max_klasses_limit
from partners
The second one does not rely on a group by, and in some RDBMSs it may effectively be transformed internally to the first form, but it may execute less efficiently, with the subquery being executed once per row in the partners table (which would still be much faster than the raw Rails way of actually submitting a query per row).
The Rails ActiveRecord forms of these would be:
Partner.joins("left join klasses on klasses.partner_id = partners.id").
        select("partners.*, max(klasses.limit) as max_klasses_limit").
        group(:id)
... and ...
Partner.select("partners.*, (select max(klasses.limit)
                             from klasses
                             where klasses.partner_id = partners.id) as max_klasses_limit")
Which of these is actually the most efficient is probably going to depend on the RDBMS and even the RDBMS version.
If you don't need a result when there is no klass for the partner, or there is always guaranteed to be one, then:
Partner.joins(:klasses).
        select("partners.*, max(klasses.limit) as max_klasses_limit").
        group(:id)
Either way, you can then reference
partner.max_klasses_limit
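For example, back in the view (a sketch; this assumes @partners was built with one of the queries above rather than the original includes call):
<% @partners.each do |partner| %>
  Up to <%= partner.max_klasses_limit %> visits per month
<% end %>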
Your initial query brings all the information you need. You only need to work with it as you would work with a regular array of objects.
Change
Up to <%= partner.klasses.maximum("reservation_limit") %> visits per month
to
Up to <%= partner.klasses.empty? ? 0 : partner.klasses.max_by { |k| k.reservation_limit }.reservation_limit %> visits per month
What maximum("reservation_limit") does is trigger an Active Record query (SELECT MAX...). But you don't need this, as you already have all the information required to compute the maximum from the array.
Note
Using .count on an Active Record result will trigger an extra SELECT COUNT... query!
Using .length will not.
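A quick illustration of the difference, assuming klasses were eager-loaded with includes as in the question's controller:
partner.klasses.length   # uses the records already loaded in memory, no extra query
partner.klasses.count    # fires a SELECT COUNT(*) FROM klasses WHERE ... query again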
It generally helps if you start writing the query in pure SQL and then extract it into ActiveRecord or Arel code.
ActiveRecord is powerful, but it tends to force you into highly inefficient queries as soon as you stray from the standard CRUD operations.
Here's your query
Partner
.select('partners.*, (SELECT MAX(klasses.reservation_limit) FROM klasses WHERE klasses.partner_id = partners.id) AS maximum_limit')
.joins(:klasses).group('partners.id')
It is a single query with a subquery. However, the subquery is planned together with the outer query, so it runs as part of one statement rather than as N+1 separate queries.
The code above fetches all the partners, joins them with the klasses records, and thanks to the join it can compute the aggregate maximum. Since the join produces one row per matching klass record, you then need to group by partners.id (which is in any case required by the MAX aggregate function).
The key here is the AS maximum_limit, which assigns a new attribute to the returned Partner instances holding the value of the computed maximum.
partners = Partner.select ...
partners.each do |partner|
  puts partner.maximum_limit
end
This will return the max limits in one select for an array of partner_ids:
partner_ids = @partners.map { |p| p.id }
data = Klass.select('MAX("limit") as limit, partner_id').where(partner_id: partner_ids).group('partner_id')
@limits = data.to_a.index_by { |d| d.partner_id }
You can now integrate it into your view:
<% @partners.each do |partner| %>
  Up to <%= @limits[partner.id].limit %> visits per month
<% end %>

How to retrieve a list of records and the count of each one's children with condition in Active Record?

There are two models with our familiar one-to-many relationship:
class Custom < ActiveRecord::Base
  has_many :orders
end
class Order < ActiveRecord::Base
  belongs_to :custom
end
I want to do the following work:
get all the customs whose age is over 18, and how many big orders (over 1,000 dollars) they have?
UPDATE:
for the models:
rails g model custom name:string age:integer
rails g model orders amount:decimal custom_id:integer
I hope one LEFT JOIN SQL statement will do the whole job, without constructing unnecessary objects like this:
Custom.where('age > ?', '18').includes(:orders).where('orders.amount > ?', '1000')
It will construct a lot of Order objects which I don't need, and it will calculate the count with the Array#count method, which wastes time.
UPDATE 2:
My own solution is wrong; it will remove customs who don't have any big orders from the result.
Finding adult customers with big orders
This solution uses a single query, with the nested orders relation transformed into a sub-query.
big_customers = Custom.where("age > ?", "18").where(
  id: Order.where("amount > ?", "1000").select(:custom_id)
)
Grab all adults and their # of big orders (MySQL)
This can still be done in a single query. The count is grabbed via a join on orders and sticking the count of orders into a column in the result called big_orders_count, which ActiveRecord turns into a method. It involves a lot more "raw" SQL. I don't know any way to avoid this with ActiveRecord except with the great squeel gem.
adults = Custom.where("age > ?", "18").select([
  Custom.arel_table["*"],
  "count(orders.id) as big_orders_count"
]).joins(%{LEFT JOIN orders
           ON orders.custom_id = customs.id
           AND orders.amount > 1000}).group("customs.id")
# see count:
adults.first.big_orders_count
You might want to consider caching counters like this. This join will be expensive on the database, so it could be worth having a dedicated customs.big_order_count column that is either refreshed regularly or updated by an observer that watches for big Order records.
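For instance, a hand-rolled version could look like the sketch below (big_order_count is a hypothetical column you would add to customs; the 1,000 threshold comes from the question):
class Order < ActiveRecord::Base
  belongs_to :custom

  # Sketch: keep customs.big_order_count up to date at creation time, so the
  # listing never has to join and count orders at read time.
  after_create :bump_big_order_count

  private

  def bump_big_order_count
    Custom.update_counters(custom_id, :big_order_count => 1) if amount.to_d > 1000
  end
end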
Grab all adults and their # of big orders (PostgreSQL)
Solution 2 is MySQL-only. To get this to work in PostgreSQL I created a third solution that uses a sub-query. Still one call to the DB :-)
adults = Custom.where("age > ?", "18").select([
  %{"customs".*},
  %{(
    SELECT count(*)
    FROM orders
    WHERE orders.custom_id = customs.id
    AND orders.amount > 1000
  ) AS big_orders_count}
])
# see count:
adults.first.big_orders_count
I have tested this against PostgreSQL with real data. There may be a way to use more ActiveRecord and less SQL, but this works.
Edited.
@custom_over_18 = Custom.where("age > ?", "18").joins(:orders).where("orders.amount > ?", "1000").count

How do I create this ActiveRecord query

I have a user model and a story model. Users have many stories.
I want to create a scope that returns the twenty-five user records for the users who have created the most stories today, along with the number of stories that they have created.
I know that there are people on SO who are great with ActiveRecord queries. I also know that I am not one of those guys :-(. Help would be greatly appreciated and readily accepted!
UPDATE with the solution
I've been working with @MrYoshiji's suggestion, and here is what I came up with (note, I'm using this query in my active_admin dashboard):
panel "Today's Top Posters" do
time_range = Date.today.beginning_of_day..Date.today.end_of_day
table_for User.joins(:stories)
.select('users.username, count(stories.*) as story_count')
.group('users.id')
.where(:stories => {:created_at => time_range})
.order('story_count DESC')
.limit(25) do
column :username
column "story_count"
end
end
And lo and behold, it works!
Note, when I tried a simplified version of MrYoshiji's suggestion:
User.includes(:stories)
    .select('users.username, count(stories.*) as story_count')
    .order('story_count DESC') # with or without the group statement
    .limit(25)
I got the following error:
> User.includes(:stories).select('users.username, count(stories.*) as story_count').order('story_count DESC').limit(25)
User Load (1.6ms) SELECT DISTINCT "users".id, story_count AS alias_0 FROM "users" LEFT OUTER JOIN "stories" ON "stories"."user_id" = "users"."id" ORDER BY story_count DESC LIMIT 25
ActiveRecord::StatementInvalid: PG::Error: ERROR: column "story_count" does not exist
LINE 1: SELECT DISTINCT "users".id, story_count AS alias_0 FROM "us...
It seems like includes doesn't play well with aliasing.
I can't test this right now, running under Windows... Can you try it?
User.includes(:stories)
    .select('users.*, count(stories.*) as story_count')
    .group('users.id')
    .order('story_count DESC')
    .where('stories.created_at BETWEEN ? AND ?', Date.today.beginning_of_day, Date.today.end_of_day)
    .limit(25)
Using .includes, as suggested by MrYoshiji, did not work due to aliasing problems (see the original question for more info on this). Instead, I used .joins as follows to get the query results that I wanted (note, this query is inside my active_admin dashboard):
panel "Today's Top Posters" do
time_range = Date.today.beginning_of_day..Date.today.end_of_day
table_for User.joins(:stories)
.select('users.username, count(stories.*) as story_count')
.group('users.id')
.where(:stories => {:created_at => time_range})
.order('story_count DESC')
.limit(25) do
column :username
column "story_count"
end
end

Symfony - select from multiple tables

I've got a gallery, and to display it I need to get some information like the number of comments, the rating, favs, etc. I do something like the code below, but it doesn't seem good to me.
How can I do it in a better way? Is there maybe a way to do it in one query, without subqueries?
I could add columns like number of comments, favs, etc. to the Image table, but if something goes wrong the statistics won't be accurate. Counting them each time is more reliable.
$images = $this->getDoctrine()->getEntityManager()
->createQuery('SELECT img
FROM AcmeMainBundle:Image img
WHERE img.category = :category
ORDER BY img.order ASC, img.id DESC')
->setParameter('category', $category)
->getResult();
$comments = $this->getDoctrine()->getEntityManager()
->createQuery('SELECT i.id, COUNT(i.id) as c_count
FROM AcmeMainBundle:ImgComment c
JOIN c.image i
WHERE i.category = :category
GROUP BY c.image')
->setParameter('category', $category)
->getResult();
$ratings = $this->getDoctrine()->getEntityManager()
->createQuery('SELECT i.id, SUM(r.rating) as suma, COUNT(r.rating) as votes
FROM AcmeMainBundle:ImgRating r
JOIN r.image i
WHERE i.category = :category
GROUP BY r.image')
->setParameter('category', $category)
->getResult();
In most cases this is okay. If you don't have thousands of different categories, you are pretty lucky in that SQL caches your queries and results. It also depends on the average amount of ratings per image for the calculation. Try to copy the generated SQL query and benchmark it in SQL:
EXPLAIN $yourquery
Mind that you should have indexes on the fields used for the subquery calculations, provided they are not updated often (which should be the case for image ratings).
You may also want to look into your MySQL statistics to check whether your query_cache size fits. It really depends on the size of the project.
Hope that helps you out a bit.

SQL: how to find a complement to a set with a derived function/value

This one has me stumped, so I'm hoping someone who's smarter than me can help me out.
I'm working on a Rails project in which I've got a User model with an associated collection of clock_periods, having the following partial definition:
class User < ActiveRecord::Base
  has_many :clock_periods
  # clock_periods has the following properties:
  #   clock_in_time:  datetime
  #   clock_out_time: datetime

  named_scope :clocked_in, :select => "users.*",
    :joins => :clock_periods, :conditions => 'clock_periods.clock_out_time IS NULL'

  def clocked_in?
    # default scope on clock periods sorts by date
    clock_periods.last.clock_out_time.nil?
  end
end
The SQL query to retrieve all clocked in users is trivial:
SELECT users.* FROM users INNER JOIN clock_periods ON clock_periods.user_id = users.id
WHERE clock_periods.clock_out_time IS NULL
The converse, however, of finding all users who are currently clocked out, is deceptively difficult. I ended up using the following named scope definition, though it's hackish:
named_scope :clocked_out, lambda {{
  :conditions => ["users.id not in (?)", clocked_in.map(&:id) + [-1]]
}}
What bothers me about it is that it seems like there ought to be a way to do this in SQL without resorting to generating statements like
SELECT users.* FROM users WHERE users.id NOT IN (1,3,5)
Anybody got a better way, or is this really the only way to handle it?
Besides @Eric's suggestion, there's the issue (unless you've guaranteed against it in some other way you're not showing us) that a user might not have any clock period; in that case the inner join would fail to include that user, and he wouldn't show up either as clocked in or as clocked out. Assuming you also want to show those users as clocked out, the SQL should be something like:
SELECT users.*
FROM users
LEFT JOIN clock_periods ON clock_periods.user_id = users.id
WHERE (clock_periods.user_id IS NULL) OR
(getdate() BETWEEN clock_periods.clock_out_time AND
clock_periods.clock_in_time)
(this kind of thing is the main use of outer joins such as LEFT JOIN).
assuming getdate() is the function in your SQL implementation that returns a datetime representing right now.
SELECT users.* FROM users INNER JOIN clock_periods ON clock_periods.user_id = users.id
WHERE getdate() > clock_periods.clock_out_time and getdate() < clock_periods.clock_in_time
In Rails, Eric H's answer should look something like:
users = ClockPeriod.find(:all, :select => 'users.*', :include => :user,
  :conditions => ['? > clock_periods.clock_out_time AND ? < clock_periods.clock_in_time',
                  Time.now, Time.now])
At least, I think that would work...
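For completeness, the clocked_out scope from the question could also stay entirely in SQL by pushing the id list into a subquery (a sketch in the same Rails 2 named_scope style, assuming an "open" clock period is one whose clock_out_time is NULL):
named_scope :clocked_out, :conditions =>
  "users.id NOT IN (SELECT clock_periods.user_id
                      FROM clock_periods
                     WHERE clock_periods.clock_out_time IS NULL)"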