Rails: Optimize querying maximum values from associated table

I need to show a list of partners and the maximum value from the reservation_limit column from Klass table.
Partner has_many :klasses
Klass belongs_to :partner
# Partner controller
def index
@partners = Partner.includes(:klasses)
end
# view
<% @partners.each do |partner| %>
Up to <%= partner.klasses.maximum("reservation_limit") %> visits per month
<% end %>
Unfortunately the query below runs for every single Partner.
SELECT MAX("klasses"."reservation_limit") FROM "klasses" WHERE "klasses"."partner_id" = $1 [["partner_id", 1]]
If there are 40 partners then the query will run 40 times. How do I optimize this?
edit: Looks like there's a limit method in rails so I'm changing the limit in question to reservation_limit to prevent confusion.

You can use two forms of SQL to efficiently retrieve this information. I'm assuming here that you want a result for a partner even where there is no klass record for it.
The first is:
select partners.*,
max(klasses.limit) as max_klasses_limit
from partners
left join klasses on klasses.partner_id = partners.id
group by partners.id
Some RDBMSs require that you use "group by partners.*", though, which is potentially expensive in terms of the required sort and the possibility of it spilling to disk.
On the other hand you can add a clause such as:
having("max(klasses.limit) > ?", 3)
... to efficiently filter the partners by their value of maximum klass.limit
The other is:
select partners.*,
(Select max(klasses.limit)
from klasses
where klasses.partner_id = partners.id) as max_klasses_limit
from partners
The second one does not rely on a group by. Some RDBMSs may transform it internally into the first form; others may execute it less efficiently, running the subquery once per row in the partners table (which would still be much faster than the raw Rails way of actually submitting a query per row).
The Rails ActiveRecord forms of these would be:
Partner.joins("left join klasses on klasses.partner_id = partners.id").
select("partners.*, max(klasses.limit) as max_klasses_limit").
group(:id)
... and ...
Partner.select("partners.*, (select max(klasses.limit)
from klasses
where klasses.partner_id = partners.id) as max_klasses_limit")
Which of these is actually the most efficient is probably going to depend on the RDBMS and even the RDBMS version.
If you don't need a result when there is no klass for the partner, or there is always guaranteed to be one, then:
Partner.joins(:klasses).
select("partners.*, max(klasses.limit) as max_klasses_limit").
group(:id)
Either way, you can then reference
partner.max_klasses_limit

Your initial query brings all the information you need. You only need to work with it as you would work with a regular array of objects.
Change
Up to <%= partner.klasses.maximum("reservation_limit") %> visits per month
to
Up to <%= partner.klasses.empty? ? 0 : partner.klasses.max_by { |k| k.reservation_limit }.reservation_limit %> visits per month
What maximum("reservation_limit") does is trigger an Active Record query SELECT MAX.... But you don't need this, as you already have all the information you need to compute the maximum in your array.
Note
Using .count on an Active Record result will trigger an extra SELECT COUNT... query!
Using .length will not.
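As a rough illustration of the in-memory approach, here is a minimal plain-Ruby sketch. The Struct classes are hypothetical stand-ins for the records that Partner.includes(:klasses) would return, and max_reservation_limit is a made-up helper name, not part of Rails.

```ruby
# Hypothetical stand-ins for ActiveRecord objects; in the real app these
# would come from Partner.includes(:klasses).
Klass = Struct.new(:reservation_limit)
Partner = Struct.new(:name, :klasses)

# Largest reservation_limit among the already-loaded klasses.
# Falls back to 0 when the partner has no klasses (or only nil limits).
def max_reservation_limit(partner)
  partner.klasses.map(&:reservation_limit).compact.max || 0
end

PARTNERS = [
  Partner.new("Yoga Co", [Klass.new(4), Klass.new(10)]),
  Partner.new("Empty Gym", [])
].freeze

PARTNERS.each do |partner|
  puts "#{partner.name}: up to #{max_reservation_limit(partner)} visits per month"
end
```

Moving the expression into a helper like this keeps the view line short, and the `|| 0` fallback replaces the explicit empty? check.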

It generally helps if you start writing the query in pure SQL and then extract it into ActiveRecord or Arel code.
ActiveRecord is powerful, but it tends to force you to write highly inefficient queries as soon as you derail from the standard CRUD operations.
Here's your query
Partner
.select('partners.*, (SELECT MAX(klasses.reservation_limit) FROM klasses WHERE klasses.partner_id = partners.id) AS maximum_limit')
.joins(:klasses).group('partners.id')
It is a single query, with a subquery. However the subquery is optimized to run only once as it can be parsed ahead and it doesn't run N+1 times.
The code above fetches all the partners, joins them with the klasses records and thanks to the join it can compute the aggregate maximum. Since the join effectively creates a cartesian product of the records, you then need to group by the partners.id (which in fact is required in any case by the MAX aggregate function).
The key here is the AS maximum_limit, which assigns a new attribute to the returned Partner instances, holding the value of the maximum.
partners = Partner.select ...
partners.each do |partner|
puts partner.maximum_limit
end

This will return the maximum limits in one select for an array of partner_ids:
partner_ids = @partners.map { |p| p.id }
data = Klass.select('MAX("limit") as limit', 'partner_id').where(partner_id: partner_ids).group('partner_id')
@limits = data.index_by { |d| d.partner_id }
You can now integrate it into your view:
<% @partners.each do |partner| %>
Up to <%= @limits[partner.id].limit %> visits per month
<% end %>
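The idea of one grouped lookup keyed by partner id can be sketched in plain Ruby. KlassRow and max_limit_by_partner are hypothetical names used only for illustration. In Rails, a grouped calculation such as Klass.group(:partner_id).maximum(:reservation_limit) should, as far as I know, hand back a hash of the same shape directly.

```ruby
# Hypothetical rows standing in for klasses records.
KlassRow = Struct.new(:partner_id, :reservation_limit)

ROWS = [
  KlassRow.new(1, 4),
  KlassRow.new(1, 10),
  KlassRow.new(2, 6)
].freeze

# Build { partner_id => maximum limit }, the shape a grouped MAX query yields.
def max_limit_by_partner(rows)
  rows.group_by(&:partner_id)
      .transform_values { |group| group.map(&:reservation_limit).max }
end

LIMITS = max_limit_by_partner(ROWS)

# fetch with a default covers partners that have no klasses at all (id 3 here).
[1, 2, 3].each do |partner_id|
  puts "Partner #{partner_id}: up to #{LIMITS.fetch(partner_id, 0)} visits per month"
end
```

Looking values up with fetch and a default avoids a nil error in the view for partners that have no klasses.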

Related

How to connect ransacker query to ransack sort search parameter

Problem:
I am using the ransack gem to sort columns in a table. I have 2 models: Campaign and Course. A campaign has many courses, and a course belongs to one campaign. Each course has a number of total_attendees. My Campaigns table has a column for Total Attendees, and I want it to be sortable. So it would sum up the total_attendees field for each course that belongs to a single campaign, and sort based on that sum.
Ex. A campaign has 3 courses, each with 10 attendees. The Total Attendees column on the campaign table would show 30 and it would be sortable against total attendees for all the other campaigns.
I found ransackers:
https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers
and this SO question: Ransack sort by sum of relation
and from that put together a lot of what is below.
From Model - campaign.rb:
class Campaign < ApplicationRecord
has_many :courses
ransacker :sum_of_total_attendees do
query = "SELECT SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id"
Arel.sql(query)
end
end
From Model - course.rb:
class Course < ApplicationRecord
belongs_to :campaign, optional: true
end
View:
<th scope="col"><%= sort_link(@q, :sum_of_total_attendees, 'Total Attendees') %></th>
Controller - campaigns_controller.rb:
all_campaigns = Campaign.all
@q = all_campaigns.ransack(params[:q])
@campaigns = @q.result
Errors:
The ransacker query gives me the data I want when run on its own, but I don't know how to get the sort to use it correctly.
Originally, when I clicked on the th link to sort the data, I got this error:
PG::CardinalityViolation: ERROR: more than one row returned by a
subquery used as an expression
I don't know what changed, but now I'm getting this error:
PG::SyntaxError: ERROR: syntax error at or near "SELECT"
LINE 1: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT SUM(r....
^
: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT
SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id ASC
This error seems to say that the ransack search parameter, @q, and the ransacker query don't work together. There are two SELECTs in this request when there should definitely be only one, but the first one is coming from ransack, so I'm not sure how to address it.
How do I get my query to sort correctly with ransack?
Articles I've looked at but did not seem to apply to what I was looking to accomplish with this story:
Ransack Sort By Sum of Relation: This is the one I worked from a lot, but I'm not sure why it works for this user and not for me. They don't show what is changed, if anything, in the controller
Ransack Github Issue For Multiple Params: This doesn't cover the issue of summing table columns.
Rails Ransack Sorting Searching Based On A Definition In The Model: This didn't apply to my need to sort based on summed data.
Three Ways to Bend The Ransack Gem: This looks like what I was doing, but I'm not sure why theirs is working but mine isn't.

Count total number of objects in list ordered by the number of associated objects

I have two models
class User
has_many :subscriptions
end
and
class Subscription
belongs_to :user
end
On one of my pages I would like to display a list of all users ordered by the number of subscriptions each user has. I am not too good with SQL queries, but I think that
list = User.all.joins(:subscriptions).group("users.id").order("count(subscriptions.id) DESC")
does the job. Now to my problem: when I try to count the total number of objects in list, using list.count, I get a hash with user.id and subscription count, like this
{11 => 5,
8 => 7,
1 => 11,
...}
not the total number of users in list. .count works fine if I have a list sorted by, for example, user name (which is in the users table). I would really like to use .count since it is called in a pagination module that's in a gem, but any ideas are welcome!
Thanks!
We can just use a single query to finish this:
User.joins("LEFT OUTER JOIN ( SELECT user_id, COUNT(*) as num_subscriptions
FROM subscriptions
GROUP BY user_id
) AS temp
ON temp.user_id = users.id")
.order("temp.num_subscriptions DESC")
Basically, my idea is to query the number of subscriptions for each user_id in the subquery, then join it with User. I used LEFT OUTER JOIN because there may be users who don't have any subscriptions.
Improvement: you can define a scope inside User, which is cleaner for later usage:
user.rb
class User < ActiveRecord::Base
has_many :subscriptions
scope :sorted_by_num_subscriptions, -> {
joins("LEFT OUTER JOIN ( SELECT user_id, COUNT(*) as num_subscriptions
FROM subscriptions
GROUP BY user_id
) AS temp
ON temp.user_id = users.id")
.order("temp.num_subscriptions DESC")
}
end
Then just use it:
User.sorted_by_num_subscriptions
When grouping, the count method changes its behavior: instead of returning the total count of records, it returns a hash of the counts for each group (see the docs for more info). So what you get with list.count is simply a hash of the subscription counts for each user.
So, your query is correct. Each key in the hash is one user, so the total number of users in the list is the number of groups (summing the values would instead give the total number of subscriptions):
total_count = list.count.length
If the issue is that the pagination code calls a bare count, the pagination code can usually accept the total count as a parameter. For example, will_paginate accepts the total_entries parameter, so you should be able to pass it the total count like this:
list.paginate(page: 2, total_entries: list.count.length)
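To make the two possible totals concrete, here is a small plain-Ruby sketch that uses the hash from the question. The helper names are hypothetical, and the hash is simply typed in rather than produced by a grouped count.

```ruby
# The hash shown in the question: user id => number of subscriptions.
GROUPED_COUNT = { 11 => 5, 8 => 7, 1 => 11 }.freeze

# One key per group, so this is the number of users in the list.
def users_in_list(grouped_count)
  grouped_count.length
end

# Adding the per-group counts gives the number of joined subscription rows.
def subscriptions_in_list(grouped_count)
  grouped_count.values.sum
end

puts "users: #{users_in_list(GROUPED_COUNT)}"
puts "subscriptions: #{subscriptions_in_list(GROUPED_COUNT)}"
```

For pagination over users, the first number is the one that matches the rows being paged.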

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
has_many :publications
end
What I want to get is all press_releases that do not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods do not return an ActiveRecord::Relation as a scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
Explanation and response to comments
My initial thought about eager loading overhead was that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications, so that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
You can do the following in your PressRelease model:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
This will return all PressRelease records without publications.
There are a couple of ways to do this. The first one requires two db queries:
PressRelease.where.not(id: Publication.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use the counter_cache feature. You will need to add a publications_count column to your press_releases table.
class Publication < ActiveRecord::Base
belongs_to :press_release, counter_cache: true
end
Rails will keep this column in sync with the number of records associated with a given model, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

How to retrieve a list of records and the count of each one's children with condition in Active Record?

There are two models with our familiar one-to-many relationship:
class Custom
has_many :orders
end
class Order
belongs_to :custom
end
I want to do the following work:
get all the custom information whose age is over 18, and how many big orders(pay for 1,000 dollars) they have?
UPDATE:
for the models:
rails g model custom name:string age:integer
rails g model orders amount:decimal custom_id:integer
I hope one left join sql statement will do all my job, and don't construct unnecessary objects like this:
Custom.where('age > ?', '18').includes(:orders).where('orders.amount > ?', '1000')
It will construct a lot of order objects which I don't need, and it will calculate the count by Array#count function which will waste time.
UPDATE 2:
My own solution is wrong: it removes customs who don't have big orders from the result.
Finding adult customers with big orders
This solution uses a single query, with the nested orders relation transformed into a sub-query.
big_customers = Custom.where("age > ?", "18").where(
id: Order.where("amount > ?", "1000").select(:custom_id)
)
Grab all adults and their # of big orders (MySQL)
This can still be done in a single query. The count is grabbed via a join on orders and sticking the count of orders into a column in the result called big_orders_count, which ActiveRecord turns into a method. It involves a lot more "raw" SQL. I don't know any way to avoid this with ActiveRecord except with the great squeel gem.
adults = Custom.where("age > ?", "18").select([
Custom.arel_table["*"],
"count(orders.id) as big_orders_count"
]).joins(%{LEFT JOIN orders
ON orders.custom_id = customs.id
AND orders.amount > 1000})
# see count:
adults.first.big_orders_count
You might want to consider caching counters like this. This join will be expensive on the database, so a dedicated customs.big_orders_count column, either refreshed regularly or updated by an observer that watches for big Order records, would let you avoid it.
Grab all adults and their # of big orders (PostgreSQL)
Solution 2 is mysql only. To get this to work in postgresql I created a third solution that uses a sub-query. Still one call to the DB :-)
adults = Custom.where("age > ?", "18").select([
%{"customs".*},
%{(
SELECT count(*)
FROM orders
WHERE orders.custom_id = customs.id
AND orders.amount > 1000
) AS big_orders_count}
])
# see count:
adults.first.big_orders_count
I have tested this against postgresql with real data. There may be a way to use more ActiveRecord and less SQL, but this works.
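To show what result shape the sub-query version is meant to produce, here is a plain-Ruby sketch over made-up data. The Structs, sample rows and the big_order_counts helper are all hypothetical. The point is only that every adult appears in the result, with a count of zero when there are no big orders.

```ruby
# Hypothetical stand-ins for rows of the customs and orders tables.
Custom = Struct.new(:id, :name, :age)
Order  = Struct.new(:custom_id, :amount)

CUSTOMS = [
  Custom.new(1, "Ann", 30),
  Custom.new(2, "Bob", 45),
  Custom.new(3, "Cy", 16)
].freeze

ORDERS = [
  Order.new(1, 1500), Order.new(1, 2000), Order.new(1, 50),
  Order.new(2, 20),
  Order.new(3, 5000)
].freeze

# Mirrors the correlated sub-query: filter customs by age first, then count
# matching orders per custom, so adults without big orders still get a row.
def big_order_counts(customs, orders, min_age: 18, min_amount: 1000)
  customs.select { |c| c.age > min_age }
         .map { |c| [c.name, orders.count { |o| o.custom_id == c.id && o.amount > min_amount }] }
         .to_h
end

big_order_counts(CUSTOMS, ORDERS).each do |name, count|
  puts "#{name}: #{count} big orders"
end
```

Bob stays in the result with a count of 0, which is exactly what filtering on orders.amount in a WHERE clause would lose.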
Edited.
@custom_over_18 = Order.joins(:custom).where("customs.age > ?", 18).where("orders.amount > ?", 1000).count

Help optimizing ActiveRecord query (voting system)

I have a voting system with two models: Item(id, name) and Vote(id, item_id, user_id).
Here's the code I have so far:
class Item < ActiveRecord::Base
has_many :votes
def self.most_popular
items = Item.all #where can I optimize here?
items.sort {|x,y| x.votes.length <=> y.votes.length}.first #so I don't need to do anything here?
end
end
There's a few things wrong with this, mainly that I retrieve all the Item records, THEN use Ruby to compute popularity. I am almost certain there is a simple solution to this, but I can't quite put my finger on it.
I'd much rather gather records and run the calculations in the initial query. This way, I can add a simple :limit => 1 (or LIMIT 1) to the query.
Any help would be great, either a rewrite in pure ActiveRecord or even in raw SQL. The latter would actually give me a much clearer picture of the nature of the query I want to execute.
Group the votes by item id, order them by count and then take the item of the first one. In rails 3 the code for this is:
Vote.group(:item_id).order("count(*) DESC").first.item
In rails 2, this should work:
Vote.all(:order => "count(*) DESC", :group => :item_id).first.item
sepp2k has the right idea. In case you're not using Rails 3, the equivalent is:
Vote.first(:group => :item_id, :order => "count(*) DESC", :include => :item).item
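To see what the grouped query computes, the same steps (group by item id, order by count descending, take the first) can be traced in plain Ruby. This is only a sketch of the logic over made-up data, not a replacement for doing the work in the database. Vote here is a hypothetical Struct and top_item_ids is a made-up helper.

```ruby
# Hypothetical stand-in for rows of the votes table.
Vote = Struct.new(:item_id, :user_id)

VOTES = [
  Vote.new(1, 10),
  Vote.new(2, 10), Vote.new(2, 11), Vote.new(2, 13),
  Vote.new(3, 12)
].freeze

# GROUP BY item_id, ORDER BY count(*) DESC, LIMIT n, expressed on an array.
# Returns [item_id, vote_count] pairs, most voted first.
def top_item_ids(votes, limit = 1)
  votes.group_by(&:item_id)
       .map { |item_id, group| [item_id, group.size] }
       .sort_by { |_item_id, count| -count }
       .first(limit)
end

item_id, count = top_item_ids(VOTES).first
puts "most popular item: #{item_id} (#{count} votes)"
```

The order among items with equal vote counts is not defined here, much as it is not defined in the SQL version without a tie-breaking ORDER BY column.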
Probably there's a better way to do this in ruby, but in SQL (mysql at least) you could try something like this to get a top 10 ranking:
SELECT i.id, i.name, COUNT( v.id ) AS total_votes
FROM items i
LEFT JOIN votes v ON ( i.id = v.item_id )
GROUP BY i.id
ORDER BY total_votes DESC
LIMIT 10
One easy way of handling this is to add a vote count field to the Item, and update that each time there is a vote. Rails used to do that automatically for you, but not sure if it's still the case in 2.x and 3.0. It's easy enough for you to do it in any case using an Observer pattern or else just by putting in a "after_save" in the Vote model.
Then your query is very easy, by simply adding a "VOTE_COUNT DESC" order to your query.