ActiveRecord: Adding condition to ON clause for includes - ruby-on-rails-3

I have a model offers and another historical_offers, one offer has_many historical_offers.
Now I would like to eager load the historical_offers of one given day for a set of offers, if it exists. For this, I think I need to pass the day to the ON clause, not the WHERE clause, so that I get all offers, also when there is no historical_offer for the given day.
With
Offer.where(several_complex_conditions).includes(:historical_offers).where("historical_offers.day = ?", Date.today)
I would get
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id
WHERE day = '2012-11-09' AND ...
But I want to have the condition in the ON clause, not in the WHERE clause:
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id AND day = '2012-11-09'
WHERE ...
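To make the difference between the two queries concrete, here is a minimal plain-Ruby sketch with no database involved. The sample rows and the `left_join` helper are made up for illustration; they only mimic what a LEFT OUTER JOIN does when the day filter sits in the WHERE clause versus the ON clause.

```ruby
require "date"

# Hypothetical sample data standing in for the two tables.
offers = [{ id: 1 }, { id: 2 }]
historical_offers = [
  { offer_id: 1, day: Date.new(2012, 11, 9) },
  { offer_id: 1, day: Date.new(2012, 11, 8) },
  { offer_id: 2, day: Date.new(2012, 11, 8) }
]
day = Date.new(2012, 11, 9)

# Mimics LEFT OUTER JOIN: every left row survives; when no right row
# satisfies the ON condition (the block), it is paired with nil.
def left_join(left, right)
  left.flat_map do |l|
    matches = right.select { |r| yield(l, r) }
    matches.empty? ? [[l, nil]] : matches.map { |r| [l, r] }
  end
end

# Day filter in WHERE: applied after the join, so offer 2 disappears.
in_where = left_join(offers, historical_offers) { |o, h| o[:id] == h[:offer_id] }
  .select { |_, h| h && h[:day] == day }

# Day filter in ON: offer 2 is kept and paired with nil.
in_on = left_join(offers, historical_offers) do |o, h|
  o[:id] == h[:offer_id] && h[:day] == day
end

p in_where.map { |o, _| o[:id] }                    # => [1]
p in_on.map { |o, h| [o[:id], h && h[:day].to_s] }  # => [[1, "2012-11-09"], [2, nil]]
```

The second result is the shape I am after: every offer is present, with or without a historical offer for the given day.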
I guess I could alter the has_many definition with a lambda condition for a specific date, but how would I pass in a date then?
Alternatively I could write the joins myself like this:
Offer.where(several_complex_conditions)
.joins(["historical_offers ON offers.id = historical_offers.offer_id AND day = ?", Date.today])
But how can I hook this up so that eager loading is done?

After a few hours of head-scratching and trying all sorts of ways to eager load a constrained set of associated records, I came across @dbenhur's answer in this thread, which works fine for me. In my case, however, the condition isn't something I'm passing in (it's a date relative to Date.today). Basically, it creates a separate association whose has_many conditions are the ones I wanted to put into the LEFT JOIN's ON clause.
has_many :prices, order: "rate_date"
has_many :future_valid_prices,
class_name: 'Price',
conditions: ['rate_date > ? and rate is not null', Date.today-7.days]
And then in my controller:
@property = current_agent.properties.includes(:future_valid_prices).find_by_id(params[:id])

Related

Select oldest HABTM record with group by clause

I want to show a line chart on the admin page (with chartkick) with the incremental number of scores related to their earliest export date.
I have the following models:
# score.rb
class Score < ApplicationRecord
has_and_belongs_to_many :export_orders, join_table: :scores_export_orders
end
# export_order.rb
class ExportOrder < ApplicationRecord
has_and_belongs_to_many :scores, join_table: :scores_export_orders
end
How do I select, for each Score having at least one ExportOrder, the corresponding ExportOrder with the earliest created_at (in date only format)?
I had a look at this, but my situation has a HABTM relationship instead of a simple has_many.
I tried this code, to get at least a mapping between oldest export date and number of scores:
sql = "
SELECT
COUNT(DISTINCT scores.id), MIN(export_orders.created_at::date)
FROM
scores
INNER JOIN
scores_export_orders
ON
scores.id = scores_export_orders.score_id
INNER JOIN
export_orders
ON
export_orders.id = scores_export_orders.export_order_id
GROUP BY
export_orders.created_at::date
".split("\n").join(' ')
query = ActiveRecord::Base.connection.execute(sql)
query.map { |v| [v['count'], v['min']] }
but the total number of scores is greater than all scores having an export date.
Any ideas?
Try:
class Score < ApplicationRecord
has_and_belongs_to_many :export_orders, join_table: :scores_export_orders
def earliest_export_date
export_orders.pluck(:created_at).min
end
end
This will let you call @score.earliest_export_date, which should return the value you want.
I also think it's the most performant way to do it in ruby, although someone may correct me on that.
The following has better performance than Mark's solution since it relies on pure SQL. Basically, the GROUP BY clause required grouping by scores_export_orders.score_id rather than export_orders.created_at:
sql = "
SELECT
COUNT(DISTINCT scores_export_orders.score_id), MIN(export_orders.created_at::date)
FROM
scores_export_orders
INNER JOIN
export_orders
ON
export_orders.id = scores_export_orders.export_order_id
GROUP BY
scores_export_orders.score_id
".split("\n").join(' ')
query = ActiveRecord::Base.connection.execute(sql)
query.map { |v| [v['count'], v['min']] }
I couldn't find an exact equivalent in ActiveRecord instructions (all of such attempts were giving me strange results), so executing the SQL will also do the trick.
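Since the query yields one row per score, the rows still have to be rolled up by date to get the incremental numbers for the chart. A minimal sketch of that step in plain Ruby, assuming the rows come back as [count, date] pairs like the ones produced by the map above (the sample values are made up):

```ruby
# Hypothetical result of query.map { |v| [v['count'], v['min']] }:
# one row per score, each carrying its earliest export date.
rows = [[1, "2020-01-01"], [1, "2020-01-03"], [1, "2020-01-01"]]

# Number of scores whose earliest export falls on each date, in date order.
per_day = rows.group_by { |_, day| day }
              .map { |day, group| [day, group.sum { |count, _| count }] }
              .sort_by(&:first)

# Running total, so each point includes all scores exported up to that date.
running = 0
cumulative = per_day.map do |day, count|
  running += count
  [day, running]
end

p cumulative # => [["2020-01-01", 2], ["2020-01-03", 3]]
```

An array of [date, value] pairs like this should be usable as chart data as it stands.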

ActiveRecord query to sum over joins

I want to get the sum of the receipt items that belong to a particular budget (same title), but my current query returns too many records and therefore an obviously wrong sum of the receipt item amounts.
My current attempt in ActiveRecord (AR) looks like this:
ReceiptItem.includes(donation: [:budgets]).joins(:donation, :receipt).where(budgets: {title: "Some title 2015"}).sum(:amount)
and my SQL attempt looked like this (it's also wrong):
-- only meant to inspect the result set; it does not actually sum the amounts
SELECT "receipt_items"."amount"
FROM
"receipt_items" INNER JOIN "donations" ON "donations"."id" = "receipt_items"."donation_id"
RIGHT JOIN "receipts" ON "receipts"."receipt_id" = "receipt_items"."receipt_id"
LEFT OUTER JOIN "budgets" ON "budgets"."donation_id" = "donations"."id"
WHERE "budgets"."title" = 'Some title 2015';
Why am I getting duplicate records even though I've joined the tables and set the condition?
Here is the ER model to understand the problem.
And here are the AR associations:
class Budget < ActiveRecord::Base
belongs_to :donation
end
class Donation < ActiveRecord::Base
has_many :receipt_items
has_many :budgets
end
class ReceiptItem < ActiveRecord::Base
belongs_to :donation
end
Because a budget can be linked to a receipt item multiple times, via different donations, it appears in the big joined table multiple times and is thus counted several times.
Let's try to think this through a step at a time. If you wanted to do it without worrying about eager loading, you would do:
Budget.where(title: "some title").all.collect(&:donation).collect(&:receipt_items).flatten.uniq.collect(&:amount).sum
Is that right?
If so, you can tailor the eager loading to fit this chain of method calls:
Budget.where(title: "some title").includes(donation: [:receipt_items]).all.collect(&:donation).collect(&:receipt_items).flatten.uniq.collect(&:amount).sum
Does that work?
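To see why the joined query over-counts, here is a small plain-Ruby sketch with invented rows (no database). It assumes one donation that has two budgets with the same title, which is just one way the duplication can arise:

```ruby
# Hypothetical rows: two budgets with the same title on one donation.
budgets = [
  { id: 1, donation_id: 10, title: "Some title 2015" },
  { id: 2, donation_id: 10, title: "Some title 2015" }
]
receipt_items = [
  { id: 100, donation_id: 10, amount: 50 },
  { id: 101, donation_id: 10, amount: 25 }
]

matching = budgets.select { |b| b[:title] == "Some title 2015" }

# The join produces one row per (receipt item, matching budget) pair.
joined = receipt_items.flat_map do |item|
  matching.select { |b| b[:donation_id] == item[:donation_id] }
          .map { |b| [item, b] }
end

# Summing over the joined rows counts every item once per budget.
inflated_sum = joined.sum { |item, _| item[:amount] }

# Deduplicating the receipt items first gives the intended total.
distinct_sum = joined.map(&:first).uniq.sum { |item| item[:amount] }

p inflated_sum # => 150
p distinct_sum # => 75
```

This is the same idea as the .uniq in the chain above: collapse to distinct receipt items before summing.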

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
has_many :publications
end
What I want to get is all press_releases that do not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods do not return an ActiveRecord::Relation as a scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
Explanation and response to comments
My initial thought about eager loading overhead was that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications, so that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
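For readers less familiar with the pattern, the LEFT OUTER JOIN plus IS NULL filter used in the scope is an anti-join. A minimal plain-Ruby sketch of what the database does, using invented rows rather than real models:

```ruby
# Hypothetical rows standing in for the two tables.
press_releases = [{ id: 1 }, { id: 2 }, { id: 3 }]
publications = [
  { id: 10, press_release_id: 1 },
  { id: 11, press_release_id: 1 },
  { id: 12, press_release_id: 3 }
]

# LEFT OUTER JOIN: keep every press release, pairing it with nil
# when it has no publication at all.
joined = press_releases.flat_map do |pr|
  pubs = publications.select { |pub| pub[:press_release_id] == pr[:id] }
  pubs.empty? ? [[pr, nil]] : pubs.map { |pub| [pr, pub] }
end

# WHERE publications.id IS NULL: only the unmatched rows remain.
without_publications = joined.select { |_, pub| pub.nil? }.map(&:first)

p without_publications.map { |pr| pr[:id] } # => [2]
```

Because only unmatched rows pass the filter, each press release appears at most once in the result, so no DISTINCT is needed.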
You can do the following in your PressRelease model:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
This will return all PressRelease records without publications.
There are a couple of ways to do this. The first one requires two DB queries:
PressRelease.where.not(id: Publication.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use the counter_cache feature. You will need to add a publications_count column to your press_releases table.
class Publication < ActiveRecord::Base
belongs_to :press_release, counter_cache: true
end
Rails will keep this column in sync with the number of records associated with the given model, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

How to retrieve a list of records and the count of each one's children with condition in Active Record?

There are two models with our familiar one-to-many relationship:
class Custom
has_many :orders
end
class Order
belongs_to :custom
end
I want to do the following work:
get the information of all customs whose age is over 18, along with how many big orders (paying more than 1,000 dollars) each of them has.
UPDATE:
for the models:
rails g model custom name:string age:integer
rails g model orders amount:decimal custom_id:integer
I hope a single LEFT JOIN SQL statement can do the whole job, without constructing unnecessary objects the way this does:
Custom.where('age > ?', '18').includes(:orders).where('orders.amount > ?', '1000')
It will construct a lot of order objects which I don't need, and it will calculate the count by Array#count function which will waste time.
UPDATE 2:
My own solution is wrong: it removes customs who don't have big orders from the result.
Finding adult customers with big orders
This solution uses a single query, with the nested orders relation transformed into a sub-query.
big_customers = Custom.where("age > ?", "18").where(
id: Order.where("amount > ?", "1000").select(:custom_id)
)
Grab all adults and their # of big orders (MySQL)
This can still be done in a single query. The count is grabbed via a join on orders and sticking the count of orders into a column in the result called big_orders_count, which ActiveRecord turns into a method. It involves a lot more "raw" SQL. I don't know any way to avoid this with ActiveRecord except with the great squeel gem.
adults = Custom.where("age > ?", "18").select([
Custom.arel_table["*"],
"count(orders.id) as big_orders_count"
]).joins(%{LEFT JOIN orders
ON orders.custom_id = customs.id
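AND orders.amount > 1000}).group("customs.id") # one row per custom; without GROUP BY the aggregate collapses everything into a single row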
# see count:
adults.first.big_orders_count
You might want to consider caching counters like this. This join will be expensive on the database, so it may be worth having a dedicated customs.big_order_count column that is either refreshed regularly or updated by an observer that watches for big Order records.
Grab all adults and their # of big orders (PostgreSQL)
Solution 2 is MySQL only. To get this to work in PostgreSQL I created a third solution that uses a sub-query. Still one call to the DB :-)
adults = Custom.where("age > ?", "18").select([
%{"customs".*},
%{(
SELECT count(*)
FROM orders
WHERE orders.custom_id = customs.id
AND orders.amount > 1000
) AS big_orders_count}
])
# see count:
adults.first.big_orders_count
I have tested this against postgresql with real data. There may be a way to use more ActiveRecord and less SQL, but this works.
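The point of both counting queries is that adults without any big order still show up, with a count of 0. A minimal plain-Ruby sketch of that behaviour, using invented rows in place of the customs and orders tables:

```ruby
# Hypothetical rows; names, ages and amounts are made up.
customs = [
  { id: 1, name: "Ann", age: 30 },
  { id: 2, name: "Bob", age: 40 },
  { id: 3, name: "Cy",  age: 16 }
]
orders = [
  { custom_id: 1, amount: 1500 },
  { custom_id: 1, amount: 200 },
  { custom_id: 2, amount: 300 }
]

adults = customs.select { |c| c[:age] > 18 }

# Equivalent of the correlated sub-query: count the big orders of each
# adult, so an adult with no big order gets 0 instead of being dropped.
big_orders_count = adults.map do |c|
  count = orders.count { |o| o[:custom_id] == c[:id] && o[:amount] > 1000 }
  [c[:name], count]
end.to_h

p big_orders_count # => {"Ann"=>1, "Bob"=>0}
```

Filtering on orders.amount in a WHERE clause instead would drop Bob entirely, which is the problem described in UPDATE 2.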
Edited.
# counts all big orders that belong to customs over 18
@custom_over_18 = Order.joins(:custom).where("customs.age > ?", 18).where("orders.amount > ?", 1000).count

Rails 3 query matching attribute of has_one association that is a subset of has_many association

The title is confusing, but allow me to explain. I have a Car model that has multiple datapoints with different timestamps. We are almost always concerned with attributes of its latest status. So the model has_many statuses, along with a has_one to easily access its latest one:
class Car < ActiveRecord::Base
has_many :statuses, class_name: 'CarStatus', order: "timestamp DESC"
has_one :latest_status, class_name: 'CarStatus', order: "timestamp DESC"
delegate :location, :timestamp, to: 'latest_status', prefix: 'latest', allow_nil: true
# ...
end
To give you an idea of what the statuses hold:
loc = Car.first.latest_location # Location object (id = 1 for example)
loc.name # "Miami, FL"
Let's say I wanted to have a (chainable) scope to find all cars with a latest location id of 1. Currently I have a sort of complex method:
# car.rb
def self.by_location_id(id)
ids = []
find_each(include: :latest_status) do |car|
ids << car.id if car.latest_status.try(:location_id) == id.to_i
end
where("id in (?)", ids)
end
There may be a quicker way to do this using SQL, but not sure how to only get the latest status for each car. There may be many status records with a location_id of 1, but if that's not the latest location for its car, it should not be included.
To make it harder... let's add another level and be able to scope by location name. I have this method, preloading statuses along with their location objects to be able to access the name:
def self.by_location_name(loc)
ids = []
find_each(include: {latest_status: :location}) do |car|
ids << car.id if car.latest_location.try(:name) =~ /#{loc}/i
end
where("id in (?)", ids)
end
This will match the location above with "miami", "fl", "MIA", etc... Does anyone have any suggestions on how I can make this more succinct/efficient? Would it be better to define my associations differently? Or maybe it will take some SQL ninja skills, which I admittedly don't have.
Using Postgres 9.1 (hosted on Heroku cedar stack)
All right. Since you're using postgres 9.1 like I am, I'll take a shot at this. Tackling the first problem first (scope to filter by location of last status):
This solution takes advantage of PostgreSQL's support for analytic (window) functions, as described here: http://explainextended.com/2009/11/26/postgresql-selecting-records-holding-group-wise-maximum/
I think the following gives you part of what you need (replace/interpolate the location id you're interested in for the '?', naturally):
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
This query will return car_id, status_id, location_id, and a timestamp (called created_at by default, although you could alias it if some other name is easier to work with).
Now to convince Rails to return results based on this. Because you'll probably want to use eager loading with this, find_by_sql is pretty much out. There is a trick I discovered though, using .joins to join to a subquery. Here's approximately what it might look like:
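def self.by_location(loc)
# Join against a subquery that keeps only the newest status per car (rn = 1)
# and only when that status is at the requested location.
joins(
self.escape_sql('join (
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
) as subquery on subquery.car_id = cars.id', loc)
).order('subquery.created_at desc') # ordering kept outside the join so further where clauses can be chained
end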
Join will act as a filter, giving you only the Car objects that were involved in the subquery.
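The "latest status per car, then filter by location" logic that the window function is meant to express can be sketched in plain Ruby. The rows below are invented, and the integer created_at values are only stand-ins for real timestamps:

```ruby
# Hypothetical status rows; created_at is an integer stand-in for a timestamp.
statuses = [
  { id: 1, car_id: 1, location_id: 1, created_at: 1 },
  { id: 2, car_id: 1, location_id: 2, created_at: 2 },
  { id: 3, car_id: 2, location_id: 3, created_at: 1 },
  { id: 4, car_id: 2, location_id: 1, created_at: 4 }
]

# Same idea as row_number() = 1 within each car's partition when ordered
# newest first: keep only the most recent status of every car.
latest = statuses.group_by { |s| s[:car_id] }
                 .map { |_, group| group.max_by { |s| s[:created_at] } }

# Then keep only the cars whose latest status is at the wanted location.
car_ids = latest.select { |s| s[:location_id] == 1 }.map { |s| s[:car_id] }

p car_ids # => [2]
```

Car 1 has an older status at location 1, but its latest status is elsewhere, so it is correctly left out.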
Note: In order to refer to escape_sql as I do above, you'll need to modify ActiveRecord::Base slightly. I do this by adding the following to an initializer in the app (which I place in config/initializers/active_record.rb):
class ActiveRecord::Base
def self.escape_sql(clause, *rest)
self.send(:sanitize_sql_array, rest.empty? ? clause : ([clause] + rest))
end
end
This allows you to call .escape_sql on any of your models that are based on AR::B. I find this profoundly useful, but if you've got some other way to sanitize sql, feel free to use that instead.
For the second part of the question - unless there are multiple locations with the same name, I'd just do a Location.find_by_name to turn it into an id to pass into the above. Basically this:
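def self.by_location_name(name)
# Look up the location by its exact name and pass its id on
# (nil when no location is found, which matches no cars).
loc = Location.find_by_name(name)
by_location(loc.try(:id))
end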