ActiveRecord: can't use `pluck` after `where` clause with eager-loaded associations

I have an app that has a number of Post models, each of which belongs_to a User model. When these posts are published, a PublishedPost model is created that belongs_to the relevant Post model.
I'm trying to build an ActiveRecord query to find published posts that match a user name, then get the ids of those published posts, but I'm getting an error when I try to use the pluck method after eager-loading my associations and searching them with the where method.
Here's (part of) my controller:
class PublishedPostsController < ApplicationController
  def index
    ar_query = PublishedPost.order("published_posts.created_at DESC")
    if params[:searchQuery].present?
      search_query = params[:searchQuery]
      ar_query = ar_query.includes(:post => :user)
                         .where("users.name like ?", "%#{search_query}%")
    end
    @found_ids = ar_query.pluck(:id)
    ...
  end
end
When the pluck method is called, I get this:
ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'users.name' in 'where clause': SELECT id FROM `published_posts` WHERE (users.name like '%Andrew%') ORDER BY published_posts.created_at DESC
I can get the results I'm looking for with
@found_ids = ar_query.select(:id).map{|r| r.id}
but I'd rather use pluck as it seems like the cleaner way to go. I can't figure out why it's not working, though. Any ideas?

You need to (and should) use joins instead of includes here.
The two methods are pretty similar, except that with joins the data from the joined tables is not returned in the result of the query, whereas with includes it is.
In that respect, includes and pluck are kind of antithetical. One says "return all the data you possibly can," whereas the other says "give me only this one little bit."
Since you only want a small amount of the data, you want to use joins. (Strangely, select, which also seems somewhat antithetical, still works, but you would need to remove the ambiguity over id in this case.)
Try it out in the console and you'll see that includes causes a query that looks kind of like this: SELECT "posts"."id" as t0_r0, "posts"."text" as t0_r1, "users"."id" as t1_r0, "users"."name" as t1_r1 ... When you tack on a pluck statement, all those crazy tx_ry columns go away and are replaced by whatever you specified.
I hope that helps, but if not, maybe this RailsCast can. It is explained around the 5-minute mark.
http://railscasts.com/episodes/181-include-vs-joins

If you got here by searching "rails pluck ambiguous column", you may want to know you can just replace query.pluck(:id) with:
query.pluck("table_name.id")

Your query wouldn't work as it is written, even without the pluck call.
The reason is that your WHERE clause contains literal SQL referencing the users table, which Rails doesn't notice, so it decides to use multiple queries and join in memory (.preload()) instead of joining at the database level (.eager_load()):
SELECT * from published_posts WHERE users.name like "pattern" ORDER BY published_posts.created_at DESC
SELECT * from posts WHERE id IN ( a_list_of_all_post_ids_in_published_posts_returned_above )
SELECT * from users WHERE id IN ( a_list_of_all_user_ids_in_posts_returned_above )
The first of the three queries fails, and that is the error you get.
To force Rails to use a JOIN here, you should either use the explicit .eager_load() instead of .includes(), or add a .references() clause.
Other than that, what @Geoff answered stands: you don't really need .includes() here, but rather .joins().

Related

Rails/ActiveRecord: Can I perform this query without passing the SQL string to #order?

I have two models Issue and Label. They have a many to many relationship.
I have a method that returns the ten labels that point to the most issues.
class Label < ApplicationRecord
  has_many :tags
  has_many :issues, through: :tags
  def self.top
    Label.joins(:issues)
         .group(:name)
         .order('count_id desc')
         .count(:id)
         .take(10)
  end
end
It does exactly what I expect it to but I want to know if it's possible to compose the query without the SQL string.
order('count_id DESC') is confusing me. Where does count_id come from? There isn’t a column named count_id.
Label.joins(:issues).group(:name).column_names
#=> ["id", "name", "created_at", "updated_at"]
I’ve found some SQL examples here. I think it’s basically the same as ORDER BY COUNT(Id):
SELECT COUNT(Id), Country
FROM Customer
GROUP BY Country
ORDER BY COUNT(Id) DESC
Is it possible to perform the same query without passing in the SQL string? Can it be done with the ActiveRecord querying interface alone?
If you look at your query log, you'll see something like:
select count(labels.id) as count_id ...
The combination of your group call (with any argument) and the count(:id) call gets ActiveRecord to add the count_id column alias to the query. I don't think this is documented or specified anywhere (at least that I can find) but you can see it happen if you're brave enough to walk through the Active Record source.
In general, if you add a GROUP BY and then count(:x), Active Record will add a count_x alias. There's no column for this so you can't say order(:count_id), order(count_id: :desc), or any of the other common non-String alternatives. AFAIK, you have to use a string but you can wrap it in an Arel.sql to prevent future deprecation issues:
Label.joins(:issues)
     .group(:name)
     .order(Arel.sql('count_id desc'))
     .count(:id)
     .take(10)
There's no guarantee about this so if you use it, you should include something in your test suite to catch any problems if the behavior changes in the future.

How to simulate ActiveRecord Model.count.to_sql

I want to display the SQL used in a count. However, Model.count.to_sql will not work because count returns a Fixnum, which doesn't have a to_sql method. I think the simplest solution is to do this:
Model.where(nil).to_sql.sub(/SELECT.*FROM/, "SELECT COUNT(*) FROM")
This creates the same SQL as is used in Model.count, but is it going to cause a problem further down the line? For example, if I add a complicated where clause and some joins.
Is there a better way of doing this?
You can try
Model.select("count(*) as model_count").to_sql
You may want to dip into Arel:
Model.select(Arel.star.count).to_sql
ASIDE:
I find I often want to find sub counts, so I embed the count(*) into another query:
child_counts = ChildModel.select(Arel.star.count)
                         .where(Model.arel_table[:id].eq(ChildModel.arel_table[:model_id]))
Model.select(Arel.star).select(child_counts.arel.as("child_count"))
     .order(:id).limit(10).to_sql
which then gives you all the child counts for each of the models:
SELECT *,
(
SELECT COUNT(*)
FROM "child_models"
WHERE "models"."id" = "child_models"."model_id"
) child_count
FROM "models"
ORDER BY "models"."id" ASC
LIMIT 10
Best of luck
UPDATE:
Not sure if you are trying to solve this in a generic way or not. Also not sure what kind of scopes you are using on your Model.
We do have a method that automatically calls a count for a query that is put into the UI layer. I found using count(:all) is more stable than the simple count, but it sounds like that does not overlap your use case. Maybe you can improve your solution using the except clause that we use:
scope.except(:select, :includes, :references, :offset, :limit, :order)
     .count(:all)
The where clause and the joins necessary for the where clause work just fine for us. We tend to want to keep the joins and where clause, since they need to be part of the count. You definitely want to remove the includes (which, in my opinion, Rails should remove automatically), but the references are much trickier, especially when they reference a has_many and require a distinct; that is what starts to throw a wrench in there. If you need to use references, you may be able to convert them to a left_joins.
You may want to double-check the parameters that these "join" methods take. Some of them take table names and others take relation names. Later Rails versions have gotten better and take relation names, so be sure you are looking at the docs for the right version of Rails.
Also, in our case we spend more time trying to get subselects working with more complicated relationships, where we have to do some munging. It looks like we are not dealing with where clauses as much.

Rails ActiveRecord finding questions by tag in named scope

I want the equivalent of SO search by tag, so I need an exists query but I also still need to left join on all tags. I've tried a couple of approaches and I'm out of ideas.
The Question-Tag relationship is has_and_belongs_to_many in both directions (i.e. I have a QuestionTags join table).
e.g.
Question.joins(:tags).where('tags.name = ?', tag_name).includes(:tags)
I would expect this to do what I need but actually it just mashes up the includes with the join and I just end up with basically an inner join.
Question.includes(:tags)
.where("exists (
select 1 from questions_tags
where question_id = questions.id
and tag_id = (select id
from tags
where tags.name = ?))", tag_name)
This fetches the correct results but a) is really ugly and b) gives a deprecation warning as again it seems to confuse the includes with the join:
DEPRECATION WARNING: It looks like you are eager loading table(s) (one of: questions, tags) that are referenced in a string SQL snippet. For example:
Post.includes(:comments).where("comments.title = 'foo'")
Note I'm trying to write these as named scopes.
Let me know if the question isn't clear. Thanks in advance.
OK, got it. I don't know of any built-in syntax for this, but I have used an alternative before. You can do it like this:
Question.includes(:tags).where("questions.id IN (
  #{Question.joins(:tags).where('tags.name = ?', tag_name).select('questions.id').to_sql})")
You can also join this subquery to your questions table instead of using IN. Alternatively, if you are not against adding gems and you are using Postgres, use this gem.
It provides really neat syntax for advanced queries.
Use preload instead of includes:
Question.preload(:tags).where("exists ....

Rails .joins doesn't load the association

Hello,
My query:
@county = County.joins(:state)
                .where("counties.slug = ? AND states.slug = ?", params[:county_slug])
                .select('states.*, counties.*')
                .first!
From the log, the SQL looks like this:
SELECT states.*, counties.* FROM "counties" INNER JOIN "states" ON "states"."id" = "counties"."state_id" LIMIT 1
My problem is that it doesn't eager load the data from the associated table (states): when I do, for example, @county.state.name, it runs another query, even though, as you can see from the log, it had already queried the database for the data in that table as well. It just doesn't pre-populate @county.state.
Any idea how I can get all the data from the database in just ONE query?
Thx
I think you need to use includes instead of joins to get the eager loading. There's a good RailsCasts episode about the differences: http://railscasts.com/episodes/181-include-vs-joins , in particular:
The question we need to ask is “are we using any of the related model’s attributes?” In our case the answer is “yes” as we’re showing the user’s name against each comment. This means that we want to get the users at the same time as we retrieve the comments and so we should be using include here.

Rails Query Issue

I have photos which have_many comments.
I want to select whatever photos have recent comments and display those photos in a kind of "timeline" where the most recently commented photo is at the top and other photos fall below.
I tried this, and it worked on SQLite:
@photos = Photo.select('DISTINCT photos.*').joins(:comments).order('comments.created_at DESC')
However, testing on PostgreSQL raises this error:
PGError: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
: SELECT DISTINCT photos.* FROM "photos" INNER JOIN "comments" ON ...
So the problem is that I'm selecting photos but ordering by the recency of comments, and PostgreSQL doesn't like that.
Can anyone suggest either:
A: How I can fix this query...
or
B: A different way to retrieve photos by the recency of their comments?
The important reason I'm doing it this way instead of through the comments model is I want to show each photo once with any recent comments beside it, not show each comment by itself with the same photos appearing multiple times.
Thanks!
Check out the :touch option of the belongs_to association:
:touch
If true, the associated object will be touched (the updated_at/on attributes set to now) when this record is either saved or destroyed. If you specify a symbol, that attribute will be updated with the current time instead of the updated_at/on attribute.
http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html#method-i-belongs_to
In your Comment model, therefore, you would have:
belongs_to :photo, :touch => :comments_updated_at
Now, in order to create a time line of photos with recently updated comments all you need to do is:
Photo.order('comments_updated_at DESC').all
Just be sure to add the "comments_updated_at" datetime field to your Photo model.
Make sense?
Just for future readers of this question: the real answer to your SQL issue in SQLite vs. PostgreSQL is that in the SQL "standard", every selected column needs to be in the GROUP BY or be an aggregate function.
https://www.techonthenet.com/sql/group_by.php (or whatever SQL reference you want to look at)
Your SQLite query used SELECT * instead of specific columns. That would have blown up with a similar error on most databases, such as PostgreSQL (and MySQL, MariaDB, probably SQL Server). It's definitely invalid SQL grammar, for a lot of good reasons.
Under the hood, I have no clue what SQLite does; maybe it expands the * into fields and adds them to the GROUP BY? But it's not a good SQL statement, which is why it threw the error.