Matching nested model association attribute with includes - ruby-on-rails-3

Suppose I have the following models:
class Post < ActiveRecord::Base
  has_many :authors
end
class Author < ActiveRecord::Base
  belongs_to :post
end
And suppose the Author model has an attribute, name.
I want to search for all posts with a given author "alice", by that author's name. Say there is another author "bob" who co-authored a post with alice.
If I search for the first result using includes and where:
post = Post.includes(:authors).where("authors.name" => "alice").first
You'll see that the post only has one author now, even if in fact there are more:
post.authors #=> [#<Author id: 1, name: "alice", ...>]
post.reload
post.authors #=> [#<Author id: 1, name: "alice", ...>, #<Author id: 2, name: "bob", ...>]
The problem seems to be the combination of includes and where, which limits the scope correctly to the desired post, but at the same time hides all associations except for the one that is matched.
I want to end up with an ActiveRecord::Relation for chaining, so the reload solution above is not really satisfactory. Replacing includes by joins solves this, but does not eager load the associations:
Post.joins(:authors).where("authors.name" => "alice").first.authors
#=> [#<Author id: 1, name: "alice", ...>, #<Author id: 2, name: "bob", ...>]
Post.joins(:authors).where("authors.name" => "alice").first.authors.loaded?
#=> false
Any suggestions? Thanks in advance, I've been banging my head over this problem for a while.

I see what you're doing as expected behaviour, at least that's how SQL works... You're restricting the join on authors to where authors.id = 1, so why would it load any others? ActiveRecord just takes the rows that the database returned, it has no way of knowing if there are others, without doing another query based on the posts.id.
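To make that concrete, here is a rough plain-Ruby simulation of what happens to the joined rows. It does not use ActiveRecord at all; the row hashes and the authors_by_post helper are made up for illustration, and the real eager-loading code is more involved.

```ruby
# Hypothetical stand-in for the rows of posts LEFT OUTER JOIN authors.
joined_rows = [
  { post_id: 1, author_id: 1, author_name: "alice" },
  { post_id: 1, author_id: 2, author_name: "bob" },
  { post_id: 2, author_id: 3, author_name: "carol" }
]

# Build { post_id => [author names] } from whatever rows are handed in,
# which is roughly all a single-JOIN eager load has to work with.
def authors_by_post(rows)
  rows.group_by { |row| row[:post_id] }
      .transform_values { |group| group.map { |row| row[:author_name] } }
end

# WHERE authors.name = 'alice' drops bob's row before any objects are built.
filtered_rows = joined_rows.select { |row| row[:author_name] == "alice" }

with_filter    = authors_by_post(filtered_rows) # post 1 only knows about alice
without_filter = authors_by_post(joined_rows)   # post 1 has alice and bob

puts with_filter[1].join(", ")
puts without_filter[1].join(", ")
```

The first line printed lists only alice, the second lists both authors: the filter is applied to rows, not to posts.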
Here's one possible solution with a subquery, this will work as a chainable relation, and executes in one query:
relation = Post.where(id: Author.where(id: 1).select(:post_id))
If you add the includes, you will see the queries happen one of two ways:
relation = relation.includes(:authors)
relation.first
# 1. Post Load SELECT DISTINCT `posts`.`id`...
# 2. SQL SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, ...
relation.all.first
# 1. SQL SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, ...
So depending on the scenario, ActiveRecord decides whether to look up the id with a simpler query before loading all the associated authors. Sometimes it makes more sense to run the query in 2 steps.

Coming back to this question after a long long time, I realized there is a better way to do this. The key is to do not one but two joins, one with includes and one with Arel using a table alias:
posts = Post.arel_table
authors = Author.arel_table.alias("matching_authors")
join = posts.join(authors, Arel::Nodes::InnerJoin).
on(authors[:post_id].eq(posts[:id])).join_sources
post = Post.includes(:authors).joins(join).
where(matching_authors: { name: "alice" }).first
The SQL for this query is quite long since it has includes, but the key point is that it has not one but two joins, one (from includes) using a LEFT OUTER JOIN on the alias posts_authors, the other (from the Arel join) using an INNER JOIN on the alias matching_authors. The WHERE only applies to the latter alias, so results on the association in the returned results are not limited by this condition.

I ran into the same issue (which I describe as: where clause filters the associated model, rather than the primary model, when includes is used to prevent N+1 queries).
After flailing around trying various solutions, I found that using preload in conjunction with joins solves this for me. The Rails documentation is not super useful here. But apparently preload will explicitly use two separate queries, one to filter/select the primary model, and a second query to load the associated models. This blog post also has some insights that helped lead me to the solution.
Applying this to your models would be something like:
post = Post.preload(:authors).joins(:authors).where("authors.name" => "alice").first
I suspect that under the covers this is doing the same thing as your accepted answer, but at a nicer level of abstraction.
I wish the Rails docs were more explicit about how to do this. It's subtle enough that I wrote a bunch of tests around this precise situation in my code base.
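For what it's worth, the two-step behaviour described above can be sketched in plain Ruby. This is only a simulation with made-up data (no ActiveRecord, no database), assuming step 1 corresponds to the joins + where query and step 2 to the separate preload query:

```ruby
# Hypothetical authors table as an array of hashes.
authors = [
  { id: 1, post_id: 1, name: "alice" },
  { id: 2, post_id: 1, name: "bob" },
  { id: 3, post_id: 2, name: "carol" }
]

# Step 1 (joins + where): find the ids of posts that have a matching author.
matching_post_ids = authors.select { |a| a[:name] == "alice" }
                           .map { |a| a[:post_id] }
                           .uniq

# Step 2 (preload): load every author of those posts, with no name filter.
preloaded = authors.select { |a| matching_post_ids.include?(a[:post_id]) }
                   .group_by { |a| a[:post_id] }

names = preloaded.fetch(1).map { |a| a[:name] }
puts names.join(", ") # alice, bob
```

Because the name condition only takes part in step 1, the association loaded in step 2 is complete.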

Actually, it's because this code:
post = Post.includes(:authors).where("authors.name" => "alice").first
returns the first matched record because of the ".first". I think if you did this:
post = Post.includes(:authors).where("authors.name" => "alice")
you would get back all posts with "alice" and her other co-authors if I understand what you're asking correctly.

Related

Best way to order data by association value in rails

Trying to order all rows of a model for an ordering filter with react on the front, I have encountered this problem.
For example, if I have "rooms", each room having many products and each product having a different price, I came up with this way of ordering the rooms by their lowest- or highest-priced product:
scope :high_price, lambda { joins(:products).group('rooms.id').order('max(products.week_price) DESC') }
scope :low_price, lambda { joins(:products).group('rooms.id').order('min(products.week_price) ASC') }
The problem comes when, if I save this into an instance variable:
@ordered_rooms = Room.low_price
And then, when I try to manipulate that instance, I run into this issue:
ActiveRecord::StatementInvalid (PG::GroupingError: ERROR: column "products.id" must appear in the GROUP BY clause or be used in an aggregate function
I found some explanation of the problem here, but my question is how to write these queries so that this error does not come up.
I found that, if the metric I was looking for, would be the number of products by room, it would be easier to look up with:
scope :most_products, lambda { order('products_count DESC') }
adding this relation into product:
belongs_to :room, required: false, counter_cache: true
would it be possible to define this kind of cache for other metrics, or how should I go about these queries?

Query data from two associated tables

For my app:
user has_many images, image belongs_to user
image has_one location, location belongs_to image
Perhaps the location's fields should just be part of the image. But regardless, I'm trying to write this query in Rails:
SELECT image.caption, location.latitude, location.longitude
FROM image, location
WHERE location.image_id = image.id
AND image.user_id = 5
or alternatively, if it's easier:
SELECT image.*, location.*
FROM image, location
WHERE location.image_id = image.id
AND image.user_id = 5
How would I write this as an ActiveRecord query?
I think you want to read about Eager Loading Associations.
@images = Image.includes(:location).where("images.user_id = ?", 5)
This will find Image instances where user_id = 5. It then runs a second query that loads and builds the associated Location instances (that's what .includes(:location) does for you).
This more closely matches your alternative query, as it does select all columns from images and location tables.
You can build an Array based on this containing a hash with only the keys you're interested in through something like this.
@hash_object = @images.collect { |i| { caption: i.caption, latitude: i.location.latitude, longitude: i.location.longitude } }
If you want to build this with only a single query, you can use .joins(:location) in combination with .includes(:location)
Image.joins(:location).includes(:location).where("images.user_id = ?", 5)
Important: This will omit Image instances that have no associated Location. You can modify the joins() a bit to help with this, but the above will have this omission.
If you really want only specific columns to be selected, read up on Selecting Specific Columns though there are warnings for the use of this
If the select method is used, all the returning objects will be read only.
and
Be careful because this also means you’re initializing a model object with only the fields that you’ve selected.
In Rails master (not out in 3.2.11) you can pass multiple columns to .pluck() but this appears to only be restricted to a single table (you wouldn't be able to get the locations table's :latitude and :longitude when plucking from Image). It's good to know about though.

Rails ActiveRecord::Relation object using including models

I need to retrieve information from two separate models which are similar but not the same. I am trying to do things like
I have looked into a few methods however they return an array of active objects rather than an
ActiveRecord::Relation which is required for many of the features of my app to work.
Is there any way to return an ActiveRecord::Relation object containing a union of both tables?
I have tried things like
@group = Mymodel.find_by_sql("SELECT id FROM Mymodels
UNION SELECT id FROM AnotherModels")
and also explored using the Model.where method however cannot return an ActiveRecord::Relation
EDIT:
Just to be clear I need to return ActiveRecord::Relation that is a union or a merge of the two tables
Have you tried MyFirstModel.joins(:my_second_models)? Check out the details of joins in the API here.
EDIT: Single Table Inheritance is a better solution to this problem. See comments below.
Try something like this:
Model.joins(:other_model).where("attr1 = :attr1",
{ attr1: "example" }).group(:attr1)
Since you commented about where, I added the where method on the call. You can also group everything using :group in the end.

Django aggregate query

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).
Here is what I currently have:
Page.objects.annotate(most_recent_post=Max('post__post_time'))
This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?
Models:
class Page(models.Model):
    name = models.CharField(max_length=50)
    created = models.DateTimeField(auto_now_add = True)
    enabled = models.BooleanField(default = True)
class Post(models.Model):
    user = models.ForeignKey(User)
    page = models.ForeignKey(Page)
    post_time = models.DateTimeField(auto_now_add = True)
Depending on the relationship between the two, you should be able to follow the relationships quite easily, and increase performance by using select_related
Taking this:
class Page(models.Model):
    ...
class Post(models.Model):
    page = ForeignKey(Page, ...)
You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:
Post.objects.select_related('page').all()
This will result in only one (larger) query where all the page objects are prefetched.
In the reverse situation (like you have) where you want to get all pages and their associated posts, select_related won't work. See this, this and this question for more information about what you can do.
Probably your best bet is to use the techniques described in the django docs here: Following Links Backward.
After you do:
pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post) for page in pages]
And then posts[0] should have the most recent post for pages[0] etc. I don't know if this is the most efficient solution, but this was the solution mentioned in another post about the lack of left joins in django.
You can create a database view that will contain all Page columns alongside the necessary latest Post columns:
CREATE VIEW `testapp_pagewithrecentpost` AS
SELECT testapp_page.*, testapp_post.* -- I suggest as few post columns as possible here
FROM `testapp_page` LEFT JOIN `testapp_post`
ON testapp_page.id = testapp_post.page_id
AND testapp_post.post_time =
( SELECT MAX(testapp_post.post_time)
FROM testapp_post WHERE testapp_page.id = testapp_post.page_id );
Then you need to create a model with the flag managed = False (so that manage.py syncdb won't break). You can also use inheritance from an abstract Model to avoid column duplication:
class PageWithRecentPost(models.Model): # Or extend abstract BasePost ?
    # Page columns go here
    # Post columns go here
    # We use LEFT JOIN, so all columns from the
    # 'post' model will need blank=True, null=True
    class Meta:
        managed = False # Django will not handle creation/reset automatically
By doing that you can do what you initially wanted, so fetch from both tables in just one query:
pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
    print page.name # Page column
    print page.post_time # Post column
However this approach is not drawback free:
It's very DB engine-specific
You'll need to add VIEW creation SQL to your project
If your models are complex it's very likely that you'll need to resolve table column name clashes.
Model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
It adds complexity to your project. Allowing for multiple queries is a definitely simpler solution.
Changes in Page/Post will require re-creating the view.

rails scope filtering

Hey guys, I have the following scope:
scope :expires_within, lambda { |time|
where("created_at BETWEEN ? AND ?", 30.days.ago, time.from_now - 30.days) }
It's not all that important, it works.
This simply gives me all of the objects in my database which were created within a certain time frame. What I want to do is filter this scope such that it removes some of the objects.
The above scope is on a model named Post. I have another model named Lock which "belongs to" a Post, and each Post "has many" Locks. So this means that there is a foreign key on each lock with the id of its corresponding Post.
What I want to accomplish is the following: I want to keep only the posts from the above scope which do not have any locks. So from an abstract/high-level view: I want to get the posts returned from the above scope and remove any which have any associated locks (even if just one).
Is this possible? Would I have to use some form of compound query, using something like except? I'd appreciate any help.
I currently have something that works, but I have a nagging feeling that it isn't very efficient, perhaps it can be done on the database by modifying the above scope and be more efficient:
Post.expires_within(1.day) - Lock.all.collect { |lock| lock.post }
So this basically gets the collection of posts, then it fetches each of the locks' posts and dumps them all into an array which is then subtracted from the original set of posts.
Someone who has already experienced this problem was kind enough to help me out on IRC (Radar), and pointed me to this answer. Now my new scope is the following:
scope :not_locked, lambda { joins("LEFT JOIN locks on
(posts.id = locks.post_id)").where("locks.post_id IS NULL") }
scope :expires_within, lambda {|time| where("posts.created_at BETWEEN ? AND ?",
30.days.ago, time.from_now - 30.days).not_locked }
And it works very well. Hope that helps anyone else out there with the same problem.
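For anyone wondering why LEFT JOIN plus IS NULL leaves only the unlocked posts, here is a rough plain-Ruby simulation. There is no database involved and the posts and locks arrays are made-up data; it only mimics the row logic of the not_locked scope.

```ruby
# Hypothetical data: post 2 has two locks, posts 1 and 3 have none.
posts = [{ id: 1 }, { id: 2 }, { id: 3 }]
locks = [{ id: 10, post_id: 2 }, { id: 11, post_id: 2 }]

# LEFT JOIN: every post appears at least once; a post without locks
# is paired with nil instead of being dropped.
left_joined = posts.flat_map do |post|
  matches = locks.select { |lock| lock[:post_id] == post[:id] }
  matches.empty? ? [[post, nil]] : matches.map { |lock| [post, lock] }
end

# WHERE locks.post_id IS NULL: keep only the rows that found no lock.
unlocked_ids = left_joined.select { |_post, lock| lock.nil? }
                          .map { |post, _lock| post[:id] }

puts unlocked_ids.inspect # [1, 3]
```

Unlike the array subtraction, the real scope does this filtering inside the database and still returns a chainable relation.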
With plain ActiveRelation, string-based LEFT JOINs are unavoidable; however, you can greatly simplify the BETWEEN calculations using the Ruby Range class:
scope :expires_within, lambda { |time|
where(:created_at => 30.days.ago..(time.from_now - 30.days)) }
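An inclusive Range (two dots) includes both endpoints, which is why it lines up with BETWEEN. A small plain-Ruby check of that, assuming a hand-rolled day constant in place of ActiveSupport's 1.day:

```ruby
now = Time.now
day = 24 * 60 * 60 # seconds; hypothetical stand-in for ActiveSupport's 1.day

# Same shape as 30.days.ago..(1.day.from_now - 30.days)
window = (now - 30 * day)..(now - 29 * day)

lower_included = window.cover?(now - 30 * day) # lower bound itself
upper_included = window.cover?(now - 29 * day) # upper bound itself
outside        = window.cover?(now - 28 * day) # one day past the window

puts lower_included # true
puts upper_included # true
puts outside        # false
```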
You should be able to do it with a subquery, something like...
scope :without_locks, :conditions => "not exists(select * from locks where posts.id = locks.post_id)"