How to retrieve a list of records and the count of each one's children with condition in Active Record? - sql

There are two models with our familiar one-to-many relationship:
class Custom
has_many :orders
end
class Order
belongs_to :custom
end
I want to do the following work:
get all the custom information whose age is over 18, and how many big orders(pay for 1,000 dollars) they have?
UPDATE:
for the models:
rails g model custom name:string age:integer
rails g model orders amount:decimal custom_id:integer
I hope one left join sql statement will do all my job, and don't construct unnecessary objects like this:
Custom.where('age > ?', '18').includes(:orders).where('orders.amount > ?', '1000')
It will construct a lot of order objects which I don't need, and it will calculate the count by Array#count function which will waste time.
UPDATE 2:
My own solution is wrong, it will remove customs who doesn't have big orders from the result.

Finding adult customers with big orders
This solution uses a single query, with the nested orders relation transformed into a sub-query.
big_customers = Custom.where("age > ?", "18").where(
id: Order.where("amount > ?", "1000").select(:custom_id)
)
Grab all adults and their # of big orders (MySQL)
This can still be done in a single query. The count is grabbed via a join on orders and sticking the count of orders into a column in the result called big_orders_count, which ActiveRecord turns into a method. It involves a lot more "raw" SQL. I don't know any way to avoid this with ActiveRecord except with the great squeel gem.
adults = Custom.where("age > ?", "18").select([
Custom.arel_table["*"],
"count(orders.id) as big_orders_count"
]).joins(%{LEFT JOIN orders
ON orders.custom_id = customs.id
AND orders.amount > 1000})
# see count:
adults.first.big_orders_count
You might want to consider caching counters like this. This join will be expensive on the database, so if you had a dedicated customs.big_order_count column that was either refreshed regularly or updated by an observer that watches for big Order records.
Grab all adults and their # of big orders (PostgreSQL)
Solution 2 is mysql only. To get this to work in postgresql I created a third solution that uses a sub-query. Still one call to the DB :-)
adults = Custom.where("age > ?", "18").select([
%{"customs".*},
%{(
SELECT count(*)
FROM orders
WHERE orders.custom_id = customs.id
AND orders.amount > 1000
) AS big_orders_count}
])
# see count:
adults.first.big_orders_count
I have tested this against postgresql with real data. There may be a way to use more ActiveRecord and less SQL, but this works.

Edited.
#custom_over_18 = Custom.where("age > ?", "18").orders.where("amount > ?", "1000").count

Related

Count total number of objects in list ordered by the number of associated objects

I have two models
class User
has_many :subscriptions
end
and
class Subscription
belongs_to :user
end
one one of my pages I would like to display a list of all users ordered by the number of subscriptions each user has. I am not to good with sql queries but I think that
list = Users.all.joins(:subscriptions).group("user.id").order("count(subscriptions.id) DESC")
dose the job. Now to my problem, when I try to count the total number of objects in list, using list.count, I get a hash with user.id and subscription count, like this
{11 => 5,
8 => 7,
1 => 11,
...}
not the total number of users in list.. .count works fine if I have a list sorted by for example user name (which is in the user table). I would really like to use .count since it in a module for pagination thats in a gem but any ideas is great!
Thanks!
We can just use a single query to finish this:
User.joins("LEFT OUTER JOIN ( SELECT user_id, COUNT(*) as num_subscriptions
FROM subscriptions
GROUP BY user_id
) AS temp
ON temp.user_id = users.id")
.order("temp.num_subscriptions DESC")
Basically, my idea is to try to query the number of subscription for each user_id in the subquery, then join with User. I used LEFT OUTER JOIN, because there will be several users which don't have any subscriptions
Improve option: You can define a scope inside User, it would be more beautiful for later usage:
user.rb
class User < ActiveRecord::Base
has_many :subscriptions
scope :sorted_by_num_subscriptions, -> {
joins("LEFT OUTER JOIN ( SELECT user_id, COUNT(*) as num_subscriptions
FROM subscriptions
GROUP BY user_id
) AS temp
ON temp.user_id = users.id")
.order("temp.num_subscriptions DESC")
}
end
Then just use it:
User.sorted_by_num_subscriptions
When grouping, the count method changes it's behavior and indeed, instead of returning the total count of records, it returns a hash of the counts for each group (see the docs for more info). So what you get with list.count is simply a hash of the subscription counts for each user.
So, your query is correct and all you need is to sum up the individual counts in the groups. This can be done simply by:
total_count = list.count.values.sum
If it is the pagination code that calls just a bare count that makes the issue, usually the pagination code is able to accept a parameter with total count. For example, will_paginate accepts the total_entries parameter, so you should be able to pass it the total count like this:
list.paginate(page: 2, total_entries: list.count.values.sum)

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
has_many :publications
end
What I want to get is all press_releases that does not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods does not return an ActiveRecord::Relation as scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
Explanation and response to comments
My initial thoughts about eager loading overhead is that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications. So that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" # WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
you can de the following in your PressRelease:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
this will return all PressRelease record without publications.
Couple ways to do this, first one requires two db queries:
PressRelease.where.not(id: Publications.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use counter_cache feature. You will need to add publication_count column to your press_releases table.
class Publications < ActiveRecord::Base
belongs_to :presss_release, counter_cache: true
end
Rails will keep this column in sync with a number of records associated to given mode, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

Rails: Optimize querying maximum values from associated table

I need to show a list of partners and the maximum value from the reservation_limit column from Klass table.
Partner has_many :klasses
Klass belongs_to :partner
# Partner controller
def index
#partners = Partner.includes(:klasses)
end
# view
<% #partners.each do |partner| %>
Up to <%= partner.klasses.maximum("reservation_limit") %> visits per month
<% end %>
Unfortunately the query below runs for every single Partner.
SELECT MAX("klasses"."reservation_limit") FROM "klasses" WHERE "klasses"."partner_id" = $1 [["partner_id", 1]]
If there are 40 partners then the query will run 40 times. How do I optimize this?
edit: Looks like there's a limit method in rails so I'm changing the limit in question to reservation_limit to prevent confusion.
You can use two forms of SQL to efficiently retrieve this information, and I'm assuming here that you want a result for a partner even where there is no klass record for it
The first is:
select partners.*,
max(klasses.limit) as max_klasses_limit
from partners
left join klasses on klasses.partner_id = partners.id
group by partner.id
Some RDBMSs require that you use "group by partner.*", though, which is potentially expensive in terms of the required sort and the possibility of it spilling to disk.
On the other hand you can add a clause such as:
having("max(klasses.limit) > ?", 3)
... to efficiently filter the partners by their value of maximum klass.limit
The other is:
select partners.*,
(Select max(klasses.limit)
from klasses
where klasses.partner_id = partners.id) as max_klasses_limit
from partners
The second one does not rely on a group by, and in some RDBMSs may be effectively transformed internally to the first form, but may execute less efficiently by the subquery being executed once per row in the partners table (which would stil be much faster than the raw Rails way of actually submitting a query per row).
The Rails ActiveRecord forms of these would be:
Partner.joins("left join klasses on klasses.partner_id = partners.id").
select("partners.*, max(klasses.limit) as max_klasses_limit").
group(:id)
... and ...
Partner.select("partners.*, (select max(klasses.limit)
from klasses
where klasses.partner_id = partners.id) as max_klasses_limit")
Which of these is actually the most efficient is probably going to depend on the RDBMS and even the RDBMS version.
If you don't need a result when there is no klass for the partner, or there is always guaranteed to be one, then:
Partner.joins(:klasses).
select("partners.*, max(klasses.limit) as max_klasses_limit").
group(:id)
Either way, you can then reference
partner.max_klasses_limit
Your initial query brings all the information you need. You only need to work with it as you would work with a regular array of objects.
Change
Up to <%= partner.klasses.maximum("reservation_limit") %> visits per month
to
Up to <%= partner.klasses.empty? ? 0 : partner.klasses.max_by { |k| k.reservation_limit }.reservation_limit %> visits per month
What maximum("reservation_limit") does it to trigger an Active Record query SELECT MAX.... But you don't need this, as you already have all the information you need to process the maximum in your array.
Note
Using .count on an Active Record result will trigger an extra SELECT COUNT... query!
Using .length will not.
It generally helps if you start writing the query in pure SQL and then extract it into ActiveRecord or Arel code.
ActiveRecord is powerful, but it tends to force you to write highly inefficient queries as soon as you derail from the standard CRUD operations.
Here's your query
Partner
.select('partners.*, (SELECT MAX(klasses.reservation_limit) FROM klasses WHERE klasses.partner_id = partners.id) AS maximum_limit')
.joins(:klasses).group('partners.id')
It is a single query, with a subquery. However the subquery is optimized to run only once as it can be parsed ahead and it doesn't run N+1 times.
The code above fetches all the partners, joins them with the klasses records and thanks to the join it can compute the aggregate maximum. Since the join effectively creates a cartesian product of the records, you then need to group by the partners.id (which in fact is required in any case by the MAX aggregate function).
The key here is the AS maximum_limit that will assign a new attribute to the Partner instances returned with the value of the count.
partners = Partner.select ...
partners.each do |partner|
puts partner.maximum_limit
end
This will return max. limits in one select for an array of parthner_ids:
parthner_ids = #partners.map{|p| p.id}
data = Klass.select('MAX("limit") as limit', 'partner_id').where(partner_id: parthner_ids).group('partner_id')
#limits = data.to_a.group_by{|d| d.id}
You can now integrate it into your view:
<% #partners.each do |partner| %>
Up to <%= #limits[partner.id].limit %> visits per month
<% end %>

Filtering model with HABTM relationship

I have 2 models - Restaurant and Feature. They are connected via has_and_belongs_to_many relationship. The gist of it is that you have restaurants with many features like delivery, pizza, sandwiches, salad bar, vegetarian option,… So now when the user wants to filter the restaurants and lets say he checks pizza and delivery, I want to display all the restaurants that have both features; pizza, delivery and maybe some more, but it HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]) I (of course) get the restaurants that have either - so or pizza or delivery or both - which is not at all what I want.
My SQL/Rails knowledge is kinda limited since I'm new to this but I asked a friend and now I have this huuuge SQL that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn FROM restaurants
JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
JOIN features ON features_restaurants.feature_id = features.id
WHERE features.id in (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better - more Rails even - way of doing this? How would you do it?
Oh BTW I'm using Rails 4 on Heroku so it's a Postgres DB.
This is an example of a set-iwthin-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr join
features f
on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause is counting for the presence of one of the features -- "pizza" and "delivery". If both features are present, then you get the restaurant_id.
How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on restaurant.
With this scheme your queries boil down to
select * from restaurants where restaurants.features #> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids #> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids #> (
select id from features where name in ('pizza', 'delivery')
) as matched_features
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.
This is not "copy and paste" solution but if you consider following steps you will have fast working query.
index feature_name column (I'm assuming that column feature_id is indexed on both tables)
place each feature_name param in exists():
select fr.restaurant_id
from
features_restaurants fr
where
exists(select true from features f where fr.feature_id = f.feature_id and f.feature_name = 'pizza')
and
exists(select true from features f where fr.feature_id = f.feature_id and f.feature_name = 'delivery')
group by
fr.restaurant_id
Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
# accepts an array of feature names
restaurants = Restaurant.all
features.each do |f|
feature_restaurants = Feature.find_by_name(f).restaurants
restaurants = feature_restaurants & restaurants
end
return restaurants
end
Since its an AND condition (the OR conditions get dicey with AREL). I reread your stated problem and ignoring the SQL. I think this is what you want.
# in Restaurant
has_many :features
# in Feature
has_many :restaurants
# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
where(pizza: true)
end
def self.delivery
where(delivery: true)
end
# query
Restaurant.features.pizza.delivery
Basically you call the association with ".features" and then you use the self methods defined on features. Hopefully I didn't misunderstand the original problem.
Cheers!
Restaurant
.joins(:features)
.where(features: {name: ['pizza','delivery']})
.group(:id)
.having('count(features.name) = ?', 2)
This seems to work for me. I tried it with SQLite though.

ActiveRecord: Adding condition to ON clause for includes

I have a model offers and another historical_offers, one offer has_many historical_offers.
Now I would like to eager load the historical_offers of one given day for a set of offers, if it exists. For this, I think I need to pass the day to the ON clause, not the WHERE clause, so that I get all offers, also when there is no historical_offer for the given day.
With
Offer.where(several_complex_conditions).includes(:historical_offers).where("historical_offers.day = ?", Date.today)
I would get
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id
WHERE day = '2012-11-09' AND ...
But I want to have the condition in the ON clause, not in the WHERE clause:
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id AND day = '2012-11-09'
WHERE ...
I guess I could alter the has_many definition with a lambda condition for a specific date, but how would I pass in a date then?
Alternatively I could write the joins mysqlf like this:
Offer.where(several_complex_conditions)
.joins(["historical_offers ON offers.id = historical_offers.offer_id AND day = ?", Date.today])
But how can I hook this up so that eager loading is done?
After a few hours headscratching and trying all sorts of ways to accomplish eager loading of a constrained set of associated records I came across #dbenhur's answer in this thread which works fine for me - however the condition isn't something I'm passing in (it's a date relative to Date.today). Basically it is creating an association with the conditions I wanted to put into the LEFT JOIN ON clause into the has_many condition.
has_many :prices, order: "rate_date"
has_many :future_valid_prices,
class_name: 'Price',
conditions: ['rate_date > ? and rate is not null', Date.today-7.days]
And then in my controller:
#property = current_agent.properties.includes(:future_valid_prices).find_by_id(params[:id])