Select oldest HABTM record with group by clause - sql

I want to show a line chart on the admin page (with chartkick) with the incremental number of scores related to their earliest export date.
I have the following models:
# score.rb
class Score < ApplicationRecord
has_and_belongs_to_many :export_orders, join_table: :scores_export_orders
end
# export_order.rb
class ExportOrder < ApplicationRecord
has_and_belongs_to_many :scores, join_table: :scores_export_orders
end
How do I select, for each Score having at least one ExportOrder, the corresponding ExportOrder with the earliest created_at (in date only format)?
I had a look at this, but my situation has a HABTM relationship instead of a simple has_many.
I tried this code, to get at least a mapping between oldest export date and number of scores:
sql = "
SELECT
COUNT(DISTINCT scores.id), MIN(export_orders.created_at::date)
FROM
scores
INNER JOIN
scores_export_orders
ON
scores.id = scores_export_orders.score_id
INNER JOIN
export_orders
ON
export_orders.id = scores_export_orders.export_order_id
GROUP BY
export_orders.created_at::date
".split("\n").join(' ')
query = ActiveRecord::Base.connection.execute(sql)
query.map { |v| [v['count'], v['min']] }
but the total number of scores is greater than all scores having an export date.
Any ideas?

Try:
class Score < ApplicationRecord
  has_and_belongs_to_many :export_orders, join_table: :scores_export_orders

  # Earliest created_at among this score's export orders (nil if there are none).
  def earliest_export_date
    export_orders.pluck(:created_at).min
  end
end
This will let you call @score.earliest_export_date, which should return the value you want.
I also think it's the most performant way to do it in ruby, although someone may correct me on that.

The following has better performance than Mark's solution since it relies on pure SQL. Basically, the GROUP BY clause needs to group by scores_export_orders.score_id rather than export_orders.created_at:
sql = "
SELECT
COUNT(DISTINCT scores_export_orders.score_id), MIN(export_orders.created_at::date)
FROM
scores_export_orders
INNER JOIN
export_orders
ON
export_orders.id = scores_export_orders.export_order_id
GROUP BY
scores_export_orders.score_id
".split("\n").join(' ')
query = ActiveRecord::Base.connection.execute(sql)
query.map { |v| [v['count'], v['min']] }
I couldn't find an exact ActiveRecord equivalent (all of my attempts gave strange results), so I execute the raw SQL instead.
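For the chart described in the question, the data needs two aggregation levels: first the earliest export date per score, then the number of scores per date, accumulated over time. The following is a minimal plain-Ruby sketch of that logic, not the answer's SQL. The `rows` array is hypothetical sample data standing in for the joined `scores_export_orders` and `export_orders` rows.

```ruby
require "date"

# Hypothetical sample data: one [score_id, export date] pair per joined row.
rows = [
  [1, Date.new(2020, 1, 3)],
  [1, Date.new(2020, 1, 1)],
  [2, Date.new(2020, 1, 1)],
  [3, Date.new(2020, 1, 2)]
]

# Step 1: earliest export date per score (GROUP BY score_id with MIN).
earliest_by_score = rows.group_by(&:first).transform_values do |pairs|
  pairs.map(&:last).min
end

# Step 2: number of scores per earliest date (GROUP BY date with COUNT).
counts_by_date = earliest_by_score.values.each_with_object(Hash.new(0)) do |date, counts|
  counts[date] += 1
end

# Step 3: running total in date order, i.e. the incremental number of scores.
running = 0
cumulative = counts_by_date.sort.map { |date, count| [date, running += count] }.to_h
```

Each score contributes exactly once, on its earliest date, so the final running total equals the number of scores that have at least one export order.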

Related

ActiveRecord query to sum over joins

I want to get the sum of the receipt items that are in a particular budget (same title), but my current query returns too many records and an obviously wrong sum of the receipt item amounts.
My current attempt in ActiveRecord (AR) looks like this:
ReceiptItem.includes(donation: [:budgets]).joins(:donation, :receipt).where(budgets: {title: "Some title 2015"}).sum(:amount)
and my SQL attempt looks like this (it's also wrong):
-- just testing the outcome; this does not actually sum the amounts
SELECT "receipt_items"."amount"
FROM
"receipt_items" INNER JOIN "donations" ON "donations"."id" = "receipt_items"."donation_id"
RIGHT JOIN "receipts" ON "receipts"."receipt_id" = "receipt_items"."receipt_id"
LEFT OUTER JOIN "budgets" ON "budgets"."donation_id" = "donations"."id"
WHERE "budgets"."title" = 'Some title 2015';
Why am I getting duplicate records even though I've joined the tables and set the condition?
Here is the ER model to help understand the problem.
And here are the AR associations:
class Budget < ActiveRecord::Base
  belongs_to :donation
end
class Donation < ActiveRecord::Base
  has_many :receipt_items
  has_many :budgets
end
class ReceiptItem < ActiveRecord::Base
  belongs_to :donation
end
Because a budget can be linked to a receipt item multiple times, via different donations, the receipt item appears in the big join table multiple times, and is thus counted several times.
Let's try to think this through a step at a time. If you wanted to do it without worrying about eager loading, you would do:
Budget.where(title: "some title").all.collect(&:donation).collect(&:receipt_items).flatten.uniq.collect(&:amount).sum
is that right?
If so, you can tailor the eager loading to fit this chain of method calls:
Budget.where(title: "some title").includes(:donation => [:receipt_items]).all.collect(&:donation).collect(&:receipt_items).flatten.uniq.collect(&:amount).sum
Try that.
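To see why the joined rows inflate the sum, here is a small plain-Ruby sketch. The `ReceiptItem` struct and the sample rows are hypothetical stand-ins for the rows a join would return, not the real models.

```ruby
# Hypothetical stand-in for a receipt item row (not the ActiveRecord model).
ReceiptItem = Struct.new(:id, :amount)

item_a = ReceiptItem.new(1, 100)
item_b = ReceiptItem.new(2, 50)

# After the join, item_a shows up once per matching budget row.
joined_rows = [item_a, item_a, item_b]

naive_sum   = joined_rows.sum(&:amount)             # item_a is counted twice
deduped_sum = joined_rows.uniq(&:id).sum(&:amount)  # each item is counted once
```

Deduplicating by id before summing is what the `flatten.uniq` step in the chain above does in Ruby.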

Rails: ActiveRecord query regarding size of association

I'm trying to figure out how to produce a certain query, using ActiveRecord.
I have the following models
class Activity < ActiveRecord::Base
attr_accessible :limit, ...
has_many :users
end
class User < ActiveRecord::Base
belongs_to :activity
end
Each activity has a limit, that is to say, an integer attribute containing the maximum number of users who may belong to it.
I'm looking for a way to select all activities that have spots available, i.e. where the number of users is smaller than that limit.
Any ideas?
Thanks
I think that the SQL syntax to aim for would be:
select *
from activities
where activities.limit > (
select count(*)
from users
where users.activity_id = activities.id)
In Rails-speak ...
Activity.where("activities.limit > (select count(*) from users where users.activity_id = activities.id)")
Not sure whether the column name "limit" is going to give you problems as it's a reserved word. You might have to quote it in the SQL.
I'd also seriously consider a counter cache for users on the activities table, which would make this perform much better. Some databases would support a partial index only for those rows where the users counter cache < limit.
Activity.all.select{|activity| activity.users.length < activity.limit }
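The comparison both answers rely on (user count per activity versus its limit) can be illustrated in plain Ruby. This is only a sketch: the structs and sample data are hypothetical, and the `user_counts` hash plays the role a counter cache column would play in the database.

```ruby
# Hypothetical stand-ins for the two tables (not the ActiveRecord models).
Activity = Struct.new(:id, :limit)
User = Struct.new(:id, :activity_id)

activities = [Activity.new(1, 3), Activity.new(2, 1), Activity.new(3, 5)]
users = [User.new(1, 1), User.new(2, 2), User.new(3, 1)]

# One pass over users builds the per-activity count.
user_counts = users.each_with_object(Hash.new(0)) do |user, counts|
  counts[user.activity_id] += 1
end

# Keep activities whose count is still below the limit; activities with no
# users default to a count of 0.
available = activities.select { |activity| user_counts[activity.id] < activity.limit }
available_ids = available.map(&:id)
```

Counting once and comparing avoids issuing one count query per activity, which is what the in-memory `Activity.all.select` version does.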

ActiveRecord: Adding condition to ON clause for includes

I have a model offers and another historical_offers, one offer has_many historical_offers.
Now I would like to eager load the historical_offers of one given day for a set of offers, if it exists. For this, I think I need to pass the day to the ON clause, not the WHERE clause, so that I get all offers, also when there is no historical_offer for the given day.
With
Offer.where(several_complex_conditions).includes(:historical_offers).where("historical_offers.day = ?", Date.today)
I would get
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id
WHERE day = '2012-11-09' AND ...
But I want to have the condition in the ON clause, not in the WHERE clause:
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id AND day = '2012-11-09'
WHERE ...
I guess I could alter the has_many definition with a lambda condition for a specific date, but how would I pass in a date then?
Alternatively I could write the joins myself like this:
Offer.where(several_complex_conditions)
.joins(["historical_offers ON offers.id = historical_offers.offer_id AND day = ?", Date.today])
But how can I hook this up so that eager loading is done?
After a few hours of head-scratching and trying all sorts of ways to eager load a constrained set of associated records, I came across @dbenhur's answer in this thread, which works fine for me. However, the condition isn't something I'm passing in (it's a date relative to Date.today). Basically, it creates a separate has_many association whose conditions are the ones I wanted to put into the LEFT JOIN ON clause.
has_many :prices, order: "rate_date"
has_many :future_valid_prices,
class_name: 'Price',
conditions: ['rate_date > ? and rate is not null', Date.today-7.days]
And then in my controller:
@property = current_agent.properties.includes(:future_valid_prices).find_by_id(params[:id])
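The difference between putting the day condition in the ON clause and in the WHERE clause of a LEFT OUTER JOIN can be shown with a small plain-Ruby sketch. The structs and sample rows are hypothetical; only the filtering order matters here.

```ruby
require "date"

# Hypothetical stand-ins for the two tables (not the ActiveRecord models).
Offer = Struct.new(:id)
HistoricalOffer = Struct.new(:offer_id, :day)

day = Date.new(2012, 11, 9)
offers = [Offer.new(1), Offer.new(2)]
historical = [
  HistoricalOffer.new(1, day),
  HistoricalOffer.new(1, day - 1),
  HistoricalOffer.new(2, day - 1)
]

# Left outer join helper: every offer is kept, paired with nil if no row matches.
left_join = lambda do |matcher|
  offers.flat_map do |offer|
    matches = historical.select { |h| matcher.call(offer, h) }
    matches.empty? ? [[offer, nil]] : matches.map { |h| [offer, h] }
  end
end

# Condition in ON: the day filter is part of the match, so offer 2 survives with nil.
on_rows = left_join.call(->(o, h) { h.offer_id == o.id && h.day == day })

# Condition in WHERE: join on the key only, then filter the joined rows,
# which drops offer 2 because none of its history is on that day.
joined = left_join.call(->(o, h) { h.offer_id == o.id })
where_rows = joined.select { |_offer, h| h && h.day == day }
```

Here `on_rows` still contains both offers, while `where_rows` contains only the offer that has a historical row for the given day.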

Rails 3 query matching attribute of has_one association that is a subset of has_many association

The title is confusing, but allow me to explain. I have a Car model that has multiple datapoints with different timestamps. We are almost always concerned with attributes of its latest status. So the model has_many statuses, along with a has_one to easily access its latest one:
class Car < ActiveRecord::Base
has_many :statuses, class_name: 'CarStatus', order: "timestamp DESC"
has_one :latest_status, class_name: 'CarStatus', order: "timestamp DESC"
delegate :location, :timestamp, to: 'latest_status', prefix: 'latest', allow_nil: true
# ...
end
To give you an idea of what the statuses hold:
loc = Car.first.latest_location # Location object (id = 1 for example)
loc.name # "Miami, FL"
Let's say I wanted to have a (chainable) scope to find all cars with a latest location id of 1. Currently I have a sort of complex method:
# car.rb
def self.by_location_id(id)
ids = []
find_each(include: :latest_status) do |car|
ids << car.id if car.latest_status.try(:location_id) == id.to_i
end
where("id in (?)", ids)
end
There may be a quicker way to do this using SQL, but not sure how to only get the latest status for each car. There may be many status records with a location_id of 1, but if that's not the latest location for its car, it should not be included.
To make it harder... let's add another level and be able to scope by location name. I have this method, preloading statuses along with their location objects to be able to access the name:
def self.by_location_name(loc)
ids = []
find_each(include: {latest_status: :location}) do |car|
ids << car.id if car.latest_location.try(:name) =~ /#{loc}/i
end
where("id in (?)", ids)
end
This will match the location above with "miami", "fl", "MIA", etc... Does anyone have any suggestions on how I can make this more succinct/efficient? Would it be better to define my associations differently? Or maybe it will take some SQL ninja skills, which I admittedly don't have.
Using Postgres 9.1 (hosted on Heroku cedar stack)
All right. Since you're using postgres 9.1 like I am, I'll take a shot at this. Tackling the first problem first (scope to filter by location of last status):
This solution takes advantage of Postgres's support for analytic functions, as described here: http://explainextended.com/2009/11/26/postgresql-selecting-records-holding-group-wise-maximum/
I think the following gives you part of what you need (replace/interpolate the location id you're interested in for the '?', naturally):
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
This query will return car_id, status_id, location_id, and a timestamp (called created_at by default, although you could alias it if some other name is easier to work with).
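The "latest row per group" idea behind the window function can be sketched in plain Ruby. This is only an illustration under stated assumptions: the `Status` struct and sample rows are hypothetical, `created_at` is an integer for brevity, and the grouping key is assumed to be the car, with the newest status ranked first.

```ruby
# Hypothetical stand-in for a status row (not the ActiveRecord model).
Status = Struct.new(:id, :car_id, :location_id, :created_at)

statuses = [
  Status.new(1, 10, 1, 1), # car 10 was at location 1 ...
  Status.new(2, 10, 2, 2), # ... but its newest status is at location 2
  Status.new(3, 11, 1, 5)  # car 11's newest status is at location 1
]

# Per car, keep only the newest status: the row that would get rn = 1 when
# partitioning by car and ordering by created_at descending.
latest_by_car = statuses.group_by(&:car_id).transform_values do |rows|
  rows.max_by(&:created_at)
end

# Filter on the latest status only, so car 10's old visit to location 1 is ignored.
car_ids_at_location_1 = latest_by_car.select { |_car_id, s| s.location_id == 1 }.keys
```

The filter on location has to run after the per-car ranking; filtering first would pick up cars whose older statuses happen to match.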
Now to convince Rails to return results based on this. Because you'll probably want to use eager loading with this, find_by_sql is pretty much out. There is a trick I discovered though, using .joins to join to a subquery. Here's approximately what it might look like:
def self.by_location(loc)
joins(
self.escape_sql('join (
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.car_id order by statuses.created_at desc) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
) as subquery on subquery.car_id = cars.id order by subquery.created_at desc', loc)
)
end
Join will act as a filter, giving you only the Car objects that were involved in the subquery.
Note: In order to refer to escape_sql as I do above, you'll need to modify ActiveRecord::Base slightly. I do this by adding this to an initializer in the app (which I place in app/config/initializers/active_record.rb):
class ActiveRecord::Base
def self.escape_sql(clause, *rest)
self.send(:sanitize_sql_array, rest.empty? ? clause : ([clause] + rest))
end
end
This allows you to call .escape_sql on any of your models that are based on AR::B. I find this profoundly useful, but if you've got some other way to sanitize sql, feel free to use that instead.
For the second part of the question - unless there are multiple locations with the same name, I'd just do a Location.find_by_name to turn it into an id to pass into the above. Basically this:
def self.by_location_name(name)
loc = Location.find_by_name(name)
by_location(loc)
end

Rails 3 Order Records By Grand-child Count

I'm trying to do some fairly complicated record sorting that I was having a bit of trouble with. I have three models:
class User < ActiveRecord::Base
has_many :registers
has_many :results, :through => :registers
#Find all the Users that exist as registrants for a tournament
scope :with_tournament_entrees, :include => :registers, :conditions => "registers.id IS NOT NULL"
end
Register
class Register < ActiveRecord::Base
belongs_to :user
has_many :results
end
Result
class Result < ActiveRecord::Base
belongs_to :register
end
Now on a Tournament result page I list all users by their total wins (wins are calculated through the results table). First, I find all users who have entered a tournament with the query:
User.with_tournament_entrees
With this I can simply loop through the returned users and query each individual record with the following to retrieve each user's "Total Wins":
user.results.where("win = true").count()
However I would also like to take this a step further and order all of the users by their "Total Wins", and this is the best I could come up with:
User.with_tournament_entrees.select('SELECT *,
(SELECT count(*)
FROM results
INNER JOIN "registers"
ON "results"."register_id" = "registers"."id"
WHERE "registers"."user_id" = "users.id"
AND (win = true)
) AS total_wins
FROM users ORDER BY total_wins DESC')
I think it's close, but it doesn't actually order by the total_wins in descending order as I instruct it to. I'm using a PostgreSQL database.
Edit:
There are actually three selects taking place; the first occurs in User.with_tournament_entrees, which just performs a quick filter on the User table. If I ignore that and try
SELECT *, (SELECT count(*) FROM results INNER JOIN "registers" ON "results"."register_id" = "registers"."id" WHERE "registers"."user_id" = "users.id" AND (win = true)) AS total_wins FROM users ORDER BY total_wins DESC;
it fails in both PSQL and the ERB console. I get the error message:
PGError: ERROR: column "users.id" does not exist
I think this happens because the inner select runs before the outer select, so it doesn't have access to the user id beforehand. I'm not sure how to give it access to all user ids before that inner select runs, but this isn't an issue when I do User.with_tournament_entrees followed by the query.
In your SQL, "users.id" is quoted wrong -- it's telling Postgres to look for a column named, literally, "users.id".
It should be "users"."id", or, just users.id (you only need to quote it if you have a table/column name that conflicts with a postgres keyword, or have punctuation or something else unusual).
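The quoting rule can be illustrated with a tiny plain-Ruby helper. `quote_column` is a hypothetical name used only for this sketch; in a Rails app the connection's own quote_table_name and quote_column_name methods would normally be used instead.

```ruby
# Hypothetical helper: quotes the table and the column as two separate
# identifiers, doubling any embedded double quote.
def quote_column(table, column)
  [table, column].map { |part| '"' + part.gsub('"', '""') + '"' }.join(".")
end

correct = quote_column("users", "id") # => "users"."id"
wrong   = '"users.id"'                # one identifier literally named users.id
```

The two strings differ, which is why Postgres reports that the column "users.id" does not exist.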