ActiveRecord/ARel: modify `ON` in a left outer join from includes

I'm wondering if it's possible to specify additional JOIN ON criteria using ActiveRecord includes?
ex: I'm fetching a record and including an association with some conditions
record.includes(:other_record).where(:other_record => {:something => :another})
This gives me (roughly):
select * from records
left outer join other_records on other_records.records_id = records.id
where other_records.something = another
Does anyone know how I can specify an extra join condition so I could achieve something like.
select * from records
left outer join other_records on other_records.records_id = records.id
and other_records.some_date > now()
where other_records.something = another
I want my includes to pull in the other_records, but I need additional criteria in my join. Anything using ARel would also be great; I've just never known how to plug a left outer join from ARel into an ActiveRecord::Relation.

I can get you close with ARel. NOTE: My code ends up calling two queries behind the scenes, which I'll explain.
I had to work out LEFT JOINs in ARel myself, recently. Best thing you can do when playing with ARel is to fire up a Rails console or IRB session and run the #to_sql method on your ARel objects to see what kind of SQL they represent. Do it early and often!
Here's your SQL, touched up a bit for consistency:
SELECT *
FROM records
LEFT OUTER JOIN other_records ON other_records.record_id = records.id
AND other_records.some_date > now()
WHERE other_records.something = 'another'
I'll assume your records model is Record and other_records is OtherRecord. Translated to ARel and ActiveRecord:
class Record < ActiveRecord::Base
  # Class method (usable like a named scope) that LEFT JOINs OtherRecords
  # with some_date in the future. It must be a class method so that
  # Record.left_join_other_in_future works.
  def self.left_join_other_in_future
    # Handy ARel aliases
    records = Record.arel_table
    other = OtherRecord.arel_table
    # Join criteria
    record_owns_other = other[:record_id].eq(records[:id])
    other_in_future = other[:some_date].gt(Time.now)
    # ARel's #join method lets you specify the join node type. Defaults to InnerJoin.
    # The #join_sources method extracts the ARel join node. You can plug that node
    # into ActiveRecord's #joins method. If you call #to_sql on the join node,
    # you'll get 'LEFT OUTER JOIN other_records ...'
    left_join_other = records.join(other, Arel::Nodes::OuterJoin).
      on(record_owns_other.and(other_in_future)).
      join_sources
    # Pull it together back in regular ActiveRecord and eager-load OtherRecords.
    joins(left_join_other).includes(:other_records)
  end
end
# MEANWHILE...
# Elsewhere in your app
Record.left_join_other_in_future.where(other_records: {something: 'another'})
I bottled the join in a named scope so you don't need to have all that ARel mixed in with your application logic.
My ARel ends up calling two queries behind the scenes: the first fetches Records using your JOIN and WHERE criteria, the second fetches all OtherRecords "WHERE other_records.record_id IN (...)" using a big list of all the Record IDs from the first query.
Record.includes() definitely gives you the LEFT JOIN you want, but I don't know of a way to inject your own criteria into the join. You could use Record.joins() instead of ARel if you wanted to write the SQL yourself:
Record.joins('LEFT OUTER JOIN other_records' +
' ON other_records.record_id = records.id' +
' AND other_records.some_date > NOW()')
I really, really prefer to let the database adapter write my SQL, so I used ARel.
If it were me, I'd consider putting the additional join criterion in the WHERE clause. I assume you're asking because putting the additional criterion on the join makes the query's EXPLAIN look better or because you don't want to deal with NULLs in the other_records.some_date column when there aren't any related other_records.
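Keep in mind that the two placements are not equivalent for a LEFT OUTER JOIN. Here is a minimal plain-Ruby sketch (no database involved; the RECORDS and OTHER_RECORDS arrays, the NOW value, and the helper methods are all made up for illustration) of how a condition in ON keeps parent rows that have no match, while the same condition in WHERE drops them:

```ruby
# Illustrative in-memory data; names and values are assumptions.
RECORDS = [{ id: 1 }, { id: 2 }].freeze
OTHER_RECORDS = [
  { record_id: 1, some_date: 2030 }, # "future" row for record 1
  { record_id: 2, some_date: 1990 }  # only a "past" row for record 2
].freeze
NOW = 2020

# LEFT OUTER JOIN: every left row survives; unmatched rows pair with nil.
def left_outer_join(left, right)
  left.flat_map do |l|
    matches = right.select { |r| yield(l, r) }
    matches.empty? ? [[l, nil]] : matches.map { |r| [l, r] }
  end
end

# Extra condition inside ON: record 2 is kept, paired with nil.
def condition_in_on
  left_outer_join(RECORDS, OTHER_RECORDS) do |l, r|
    r[:record_id] == l[:id] && r[:some_date] > NOW
  end
end

# Extra condition in WHERE: applied after the join, so rows whose
# other_record is missing or in the past are filtered out.
def condition_in_where
  joined = left_outer_join(RECORDS, OTHER_RECORDS) { |l, r| r[:record_id] == l[:id] }
  joined.select { |_l, r| r && r[:some_date] > NOW }
end

p condition_in_on.map { |l, r| [l[:id], r && r[:some_date]] }    # => [[1, 2030], [2, nil]]
p condition_in_where.map { |l, r| [l[:id], r && r[:some_date]] } # => [[1, 2030]]
```

So moving the criterion to WHERE is only a drop-in replacement if you don't need the Records that lack a future other_record.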

If you have a simple (equality) extra join condition it could simply be
record.includes(:other_record).where(:other_record => {:something => :another,
:some_date => Time.now})
But if you need the greater than comparison the following should do it.
record.includes(:other_record).where([
'other_records.something = ? and other_records.some_date > ?',
another, Time.now])
Hope that helps.

Related

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
  has_many :publications
end
What I want to get is all press_releases that do not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods do not return an ActiveRecord::Relation as a scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
Explanation and response to comments
My initial thought about eager-loading overhead was that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications, so that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
You can do the following in your PressRelease model:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
This will return all PressRelease records without publications.
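One caveat with NOT IN, offered as a hedged aside: under SQL's three-valued logic, `x NOT IN (...)` is never true when the list contains a NULL, so this scope could return nothing if any publications row has a NULL press_release_id. The plain-Ruby sketch below imitates that rule. `sql_not_in` and `ids_without_publications` are hypothetical helpers written for this illustration, not part of ActiveRecord.

```ruby
# Hypothetical helper imitating SQL's `value NOT IN (list)`.
# Returns true, false, or nil (nil stands in for SQL's UNKNOWN).
def sql_not_in(value, list)
  return true if list.empty?           # NOT IN over an empty set is true
  return nil if value.nil?             # comparing NULL to anything is unknown
  return false if list.include?(value) # definite match: NOT IN is false
  return nil if list.include?(nil)     # a NULL in the list: result is unknown
  true
end

# A WHERE clause keeps a row only when the condition is exactly true.
def ids_without_publications(press_release_ids, publication_fks)
  press_release_ids.select { |id| sql_not_in(id, publication_fks) == true }
end

p ids_without_publications([1, 2, 3], [1])      # => [2, 3]
p ids_without_publications([1, 2, 3], [1, nil]) # => []
```

If the foreign key column can be NULL, adding `WHERE press_release_id IS NOT NULL` to the subquery, or using NOT EXISTS instead, avoids the problem.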
Couple ways to do this, first one requires two db queries:
PressRelease.where.not(id: Publication.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use the counter_cache feature. You will need to add a publications_count column to your press_releases table.
class Publication < ActiveRecord::Base
  belongs_to :press_release, counter_cache: true
end
Rails will keep this column in sync with the number of records associated with the given model, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

Can I do a NATURAL JOIN in Slick v2?

The title is self-explanatory. Using 2.0.0-M3, I'd like to avoid unnecessary verbosity in the form of explicitly naming the columns to be joined on, since they are appropriately named, and since NATURAL JOIN is part of the SQL standard. Not to mention, Wikipedia itself even says that "The natural join is arguably one of the most important operators since it is the relational counterpart of logical AND."
I think the foregoing ought to be clear enough, but just in case it isn't, read on. Suppose I want to know the supplier-name and part-number of each part. Assuming appropriate case classes not shown:
class Suppliers(tag: Tag) extends Table[Supplier](tag, "suppliers") {
  def snum = column[String]("snum")
  def sname = column[String]("sname")
  def * = (snum, sname) <> (Supplier.tupled, Supplier.unapply _)
}
class Shipments(tag: Tag) extends Table[Shipment](tag, "shipments") {
  def snum = column[String]("snum")
  def pnum = column[String]("pnum")
  def * = (snum, pnum) <> (Shipment.tupled, Shipment.unapply _)
}
val suppliers = TableQuery[Suppliers]
val shipments = TableQuery[Shipments]
Given that both tables have the snum column I want to join on, seems as if
( suppliers join shipments ).run
ought to return a Vector with my desired data, but I get a failed attempt at an INNER JOIN, failing (at run-time) since it's missing any join condition.
I know I can do
suppliers.flatMap( s => shipments filter (sp => sp.snum === s.snum) map (sp => (s.sname, sp.pnum)) )
but, even without the names of all the columns I omitted for clarity of this question, it's still quite a lot more typing (and proofreading) than simply
suppliers join shipments
or, for that matter
SELECT * FROM suppliers NATURAL JOIN shipments;
If the Scala code is messier than the SQL code, then I really start questioning things. Is there no way simply to do a natural join in Slick?
Currently not supported by Slick. Please submit a ticket or pull request.
To improve readability of query code, you can put your join conditions into re-usable values. Or you can put the whole join in a function or method extension of Query[Suppliers,Supplier].
Alternatively you could look at the AutoJoin pattern (which basically makes your join conditions implicit) described here http://slick.typesafe.com/docs/#20130612_slick_vs_orm_scaladays_2013 and implemented here https://github.com/cvogt/play-slick/blob/scaladays2013/samples/computer-database/app/util/autojoin.scala

ActiveRecord: Adding condition to ON clause for includes

I have a model offers and another historical_offers, one offer has_many historical_offers.
Now I would like to eager load the historical_offers of one given day for a set of offers, if it exists. For this, I think I need to pass the day to the ON clause, not the WHERE clause, so that I get all offers, also when there is no historical_offer for the given day.
With
Offer.where(several_complex_conditions).includes(:historical_offers).where("historical_offers.day = ?", Date.today)
I would get
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id
WHERE day = '2012-11-09' AND ...
But I want to have the condition in the ON clause, not in the WHERE clause:
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id AND day = '2012-11-09'
WHERE ...
I guess I could alter the has_many definition with a lambda condition for a specific date, but how would I pass in a date then?
Alternatively I could write the joins myself like this:
Offer.where(several_complex_conditions)
.joins(["historical_offers ON offers.id = historical_offers.offer_id AND day = ?", Date.today])
But how can I hook this up so that eager loading is done?
After a few hours of head-scratching and trying all sorts of ways to eager load a constrained set of associated records, I came across @dbenhur's answer in this thread, which works fine for me. However, the condition isn't something I'm passing in (it's a date relative to Date.today). Basically, it creates an association whose has_many conditions are the ones I wanted to put into the LEFT JOIN ON clause.
has_many :prices, order: "rate_date"
has_many :future_valid_prices,
  class_name: 'Price',
  conditions: ['rate_date > ? and rate is not null', Date.today-7.days]
And then in my controller:
@property = current_agent.properties.includes(:future_valid_prices).find_by_id(params[:id])
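One thing to watch with a date-relative condition written this way: the conditions array is an ordinary Ruby expression, so it is evaluated once when the class body is loaded. In a long-running process that caches classes, the cutoff date would then presumably go stale. The plain-Ruby sketch below shows the difference between a value computed at load time and one wrapped in a lambda. The Clock module and Conditions class are hypothetical stand-ins (nothing here is Rails API), used so the example can move "today" forward without waiting.

```ruby
require "date"

# Hypothetical clock so the example can move "today" forward.
module Clock
  class << self
    attr_accessor :today
  end
  self.today = Date.new(2012, 11, 9)
end

class Conditions
  # Evaluated once, when the class body is loaded.
  EAGER = ["rate_date > ?", Clock.today - 7].freeze

  # Re-evaluated on every call.
  LAZY = -> { ["rate_date > ?", Clock.today - 7] }
end

Clock.today += 30 # a month passes while the process keeps running

puts Conditions::EAGER[1]     # => 2012-11-02 (cutoff frozen at load time)
puts Conditions::LAZY.call[1] # => 2012-12-02 (cutoff computed now)
```

In Rails 4 and later, has_many accepts a lambda scope as its second argument (for example `has_many :future_valid_prices, -> { where(...) }, class_name: 'Price'`), which is evaluated per query and avoids the stale date.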

Nested SQL queries in Rails with has_and_belongs_to_many

In my application I need to find the next task that has not already been done by a user. I have three models: a Book that has many Tasks, and a User that has and belongs to many Tasks. The tasks_users table contains all completed tasks, so I need to write a complex query to find the next task to perform.
I have come up with two solutions in pure SQL that work, but I can't translate them to Rails; that's what I need help with.
SELECT * FROM `tasks`
WHERE `tasks`.`book_id` = @book_id
AND `tasks`.`id` NOT IN (
  SELECT `tasks_users`.`task_id`
  FROM `tasks_users`
  WHERE `tasks_users`.`user_id` = @user_id)
ORDER BY `tasks`.`date` ASC
LIMIT 1;
and equally without nested select
SELECT *
FROM tasks
LEFT JOIN tasks_users
  ON tasks_users.task_id = tasks.id
  AND tasks_users.user_id = @user_id
WHERE tasks_users.task_id IS NULL
AND tasks.book_id = @book_id
LIMIT 1;
This is what I have done in Rails with the MetaWhere plugin:
book.tasks.joins(:users.outer).where(:users => {:id => nil})
but I can't figure out how to get the current user in there too.
Thanks for any help!
I think this will duplicate the second form with the LEFT JOIN:
class Task < ActiveRecord::Base
  # Takes Book and User objects; returns the earliest task in the book
  # that has no task_users row for the given user.
  scope :next_task, lambda { |book, user| book.tasks.
    joins("LEFT JOIN task_users ON task_users.task_id = tasks.id AND task_users.user_id = #{user.id.to_i}").
    where(:task_users => { :task_id => nil }).
    order("tasks.date ASC").limit(1) }
end
Note that instead of tasks_users this uses the table name task_users, which is more typical for a join model. Also, since the lambda calls book.tasks and user.id, it needs to be called with the Book and User objects rather than their IDs:
Task.next_task(@book, @user)
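Because the join fragment in the scope above interpolates user.id straight into the SQL string, it may be worth coercing the value to an integer before it reaches the query. A small sketch in plain Ruby; `left_join_task_users_sql` is a hypothetical helper name, and the table and column names are taken from the scope above.

```ruby
# Hypothetical helper: builds the LEFT JOIN fragment used by the scope.
# Integer() raises ArgumentError for strings that are not integer literals,
# so a crafted string cannot end up inside the SQL.
def left_join_task_users_sql(user_id)
  safe_id = Integer(user_id)
  "LEFT JOIN task_users ON task_users.task_id = tasks.id " \
    "AND task_users.user_id = #{safe_id}"
end

puts left_join_task_users_sql(42)
puts left_join_task_users_sql("42")

begin
  left_join_task_users_sql("42 OR 1=1")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

The resulting string can be passed to joins in place of the inline interpolation.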
book.tasks.where("tasks.id not in (select task_id from tasks_users where user_id=?)", @user_id).first
That would give you the first task that doesn't already have an entry in tasks_users for the current user.

How do I write a named scope to filter by all of an array passed in, and not just by matching one element (using IN)

I have two models, Project and Category, which have a many-to-many relationship between them. The Project model is very simple:
class Project < ActiveRecord::Base
  has_and_belongs_to_many :categories
  scope :in_categories, lambda { |categories|
    joins(:categories).
      where("categories.id in (?)", categories.collect(&:to_i))
  }
end
The :in_categories scope takes an array of Category IDs (as strings), so using this scope I can get back every project that belongs to at least one of the categories passed in.
But what I'm actually trying to do is filter (a better name would be :has_categories). I want to just get the projects that belong to all of the categories passed in. So if I pass in ["1", "3", "4"] I only want to get the projects that belong to all of the categories.
There are two common solutions in SQL to do what you're describing.
Self-join:
SELECT ...
FROM Projects p
JOIN Categories c1 ON c1.project_id = p.id
JOIN Categories c3 ON c3.project_id = p.id
JOIN Categories c4 ON c4.project_id = p.id
WHERE (c1.id, c3.id, c4.id) = (1, 3, 4);
Note I'm using syntax to compare tuples. This is equivalent to:
WHERE c1.id = 1 AND c3.id = 3 AND c4.id = 4;
In general, the self-join solution has very good performance if you have a covering index. Probably Categories.(project_id,id) would be the right index, but analyze the SQL with EXPLAIN to be sure.
The disadvantage of this method is that you need four joins if you're searching for projects that match four different categories. Five joins for five categories, etc.
Group-by:
SELECT ...
FROM Projects p
JOIN Categories c ON c.project_id = p.id
WHERE c.id IN (1, 3, 4)
GROUP BY p.id
HAVING COUNT(*) = 3;
If you're using MySQL (I assume you are), most GROUP BY queries invoke a temp table and this kills performance.
I'll leave it as an exercise for you to adapt one of these SQL solutions to equivalent Rails ActiveRecord API.
It seems like in ActiveRecord you would do it like so:
scope :has_categories, lambda { |categories|
  joins(:categories).
    where("categories.id in (?)", categories.collect(&:to_i)).
    group("projects.id").
    having("COUNT(projects.id) = ?", categories.count)
}
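The GROUP BY / HAVING approach works by counting matches per project, so it presumably needs the requested IDs to be de-duplicated and each (project, category) pair to appear only once in the join table. Here is a plain-Ruby sketch of the same counting logic with made-up in-memory data; CATEGORIES_PROJECTS and `project_ids_in_all_categories` are illustrative names, not part of the models above.

```ruby
# Illustrative join-table rows: [project_id, category_id].
CATEGORIES_PROJECTS = [
  [10, 1], [10, 3], [10, 4], # project 10 is in all three categories
  [11, 1], [11, 3],          # project 11 is missing category 4
  [12, 4]
].freeze

# Mirrors: WHERE category_id IN (...) GROUP BY project_id HAVING COUNT(*) = n
def project_ids_in_all_categories(rows, category_ids)
  wanted = category_ids.map(&:to_i).uniq # duplicate IDs would inflate n
  rows
    .select { |_project_id, category_id| wanted.include?(category_id) }
    .group_by { |project_id, _category_id| project_id }
    .select { |_project_id, matches| matches.size == wanted.size }
    .keys
end

p project_ids_in_all_categories(CATEGORIES_PROJECTS, %w[1 3 4]) # => [10]
p project_ids_in_all_categories(CATEGORIES_PROJECTS, %w[1 3 3]) # => [10, 11]
```

The second call shows why the count should come from the unique IDs: without `uniq`, the repeated "3" would demand three matches and no project would qualify.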