Why is Rails ActiveRecord hitting the database twice?

@integration = Integration.first(:conditions=> {:integration_name => params[:integration_name]}, :joins => :broker, :select => ['`integrations`.*, `brokers`.*'])
$stderr.puts @integration.broker.id # This line causes Brokers to be queried again
Results in:
Integration Load (0.4ms) SELECT `integrations`.*, `brokers`.* FROM `integrations` INNER JOIN `brokers` ON `brokers`.id = `integrations`.broker_id WHERE (`integrations`.`integration_name` = 'chicke') LIMIT 1
Integration Columns (1.5ms) SHOW FIELDS FROM `integrations`
Broker Columns (1.6ms) SHOW FIELDS FROM `brokers`
Broker Load (0.3ms) SELECT * FROM `brokers` WHERE (`brokers`.`id` = 1)
Any ideas why Rails would hit the database again for brokers even though I already joined/selected them?
Here are the models (Broker -> Integration is a 1-to-many relationship). Note that this is incomplete, and I have only included the lines that establish their relationship
class Broker < ActiveRecord::Base
  # ActiveRecord Associations
  has_many :integrations
end
class Integration < ActiveRecord::Base
  belongs_to :broker
end
I'm using Rails/ActiveRecord 2.3.14, so keep that in mind.
When I do Integration.first(:conditions=> {:integration_name => params[:integration_name]}, :include => :broker), that line alone causes two SELECTs:
Integration Load (0.6ms) SELECT * FROM `integrations` WHERE (`integrations`.`integration_name` = 'chicke') LIMIT 1
Integration Columns (2.4ms) SHOW FIELDS FROM `integrations`
Broker Columns (1.9ms) SHOW FIELDS FROM `brokers`
Broker Load (0.3ms) SELECT * FROM `brokers` WHERE (`brokers`.`id` = 1)

Use :include instead of :joins to avoid reloading the Broker object.
Integration.first(:conditions=>{:integration_name => params[:integration_name]},
:include => :broker)
There is no need to give the select clause as you are not trying to normalize the brokers table columns.
Note 1:
While eager loading dependencies, AR executes one SQL query per dependency. In your case AR will execute the main query plus the broker query. Since you are fetching a single row, there isn't much gain. When you load N rows, eager loading the dependencies avoids the N+1 problem.
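To make the query counts concrete, here is a minimal plain-Ruby sketch. It does not use ActiveRecord at all: FakeDb, its methods, and the sample rows are hypothetical stand-ins that only count how many "queries" each approach would issue.

```ruby
# Toy in-memory "database" that counts how many queries are issued.
# FakeDb and its methods are hypothetical stand-ins, not ActiveRecord APIs.
class FakeDb
  attr_reader :query_count

  def initialize
    @brokers = { 1 => "Acme", 2 => "Globex" }
    @integrations = [
      { id: 10, broker_id: 1 },
      { id: 11, broker_id: 2 },
      { id: 12, broker_id: 1 }
    ]
    @query_count = 0
  end

  # Stands in for: SELECT * FROM integrations
  def all_integrations
    @query_count += 1
    @integrations
  end

  # Stands in for: SELECT * FROM brokers WHERE id = ?
  def broker(id)
    @query_count += 1
    @brokers.fetch(id)
  end

  # Stands in for: SELECT * FROM brokers WHERE id IN (...)
  def brokers(ids)
    @query_count += 1
    @brokers.select { |id, _name| ids.include?(id) }
  end
end

# Lazy loading: one query for the rows, then one more per row (N+1).
def lazy_broker_names(db)
  db.all_integrations.map { |row| db.broker(row[:broker_id]) }
end

# Eager loading: one query for the rows, one for all their brokers.
def eager_broker_names(db)
  rows = db.all_integrations
  by_id = db.brokers(rows.map { |row| row[:broker_id] }.uniq)
  rows.map { |row| by_id.fetch(row[:broker_id]) }
end

lazy_db = FakeDb.new
eager_db = FakeDb.new
lazy_broker_names(lazy_db)
eager_broker_names(eager_db)
puts "lazy:  #{lazy_db.query_count} queries"   # 1 + 3 rows = 4
puts "eager: #{eager_db.query_count} queries"  # 2, regardless of row count
```

With a single row both approaches issue two queries, which is why eager loading gains nothing in the question's case.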
Note 2:
In some cases it might be beneficial to use custom eager loading strategies. Let us assume that you just want to get the associated broker name for the integration. You can optimize your sql as follows:
integration = Integration.first(
:select => "integrations.*, brokers.name broker_name",
:conditions=>{:integration_name => params[:integration_name]},
:joins => :broker)
integration.broker_name # returns the broker name
The object returned by the query will have all the aliased columns in the select clause.
The solution above will not work when you want to return the Integration object even when there is no corresponding Broker object. For that you have to use an OUTER JOIN:
integration = Integration.first(
:select => "integrations.*, brokers.name broker_name",
:conditions=>{:integration_name => params[:integration_name]},
:joins => "LEFT OUTER JOIN brokers ON brokers.id = integrations.broker_id")

The :joins option just makes Active Record add a join clause to the query. It doesn't make Active Record do anything with the extra columns in the rows that come back. The association isn't loaded, so accessing it triggers a query.
The :include option is all about loading associations ahead of time. Active record has two strategies for doing this. One is via a big join query and one is by firing one query per association. The default is the latter, which is why you see two queries.
On Rails 3.x you can decide which of those strategies you want by doing Integration.preload(:broker) or Integration.eager_load(:broker).
There is no such facility in rails 2.x, so the only thing you can do is trick the heuristics used to determine the strategy. Whenever rails thinks that the order clause, select clause or conditions refer to columns on the included association it switches to the joins strategy (because it is the only one that works in that case).
For example doing something like
Integration.first(:conditions => {...}, :include => :broker, :select => 'brokers.id as ignored')
should force the alternate strategy (and active record actually ignores the select option in this case).
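As a rough illustration of that heuristic, here is a plain-Ruby sketch. It is a simplification written for this answer, not the code Rails 2.3 runs: tables_mentioned and loading_strategy are hypothetical helpers that look for "table.column" patterns in SQL fragments and pick the join strategy as soon as a table other than the base table shows up.

```ruby
# Simplified illustration only; this is not Rails's implementation.
# Guess which tables an SQL fragment mentions by looking for
# "table.column" or "table.*" patterns (backquotes optional).
def tables_mentioned(sql_fragment)
  sql_fragment.to_s
              .scan(/`?(\w+)`?\.(?:`?\w+`?|\*)/)
              .flatten
              .map(&:downcase)
              .uniq
end

# Pick a loading strategy from the fragments used in
# :conditions, :order and :select.
def loading_strategy(base_table, fragments)
  other_tables = fragments.flat_map { |f| tables_mentioned(f) } - [base_table]
  other_tables.empty? ? :separate_queries : :single_join
end

puts loading_strategy("integrations", ["integrations.integration_name = 'x'"])
puts loading_strategy("integrations", ["brokers.id as ignored"])
```

The first call only mentions the base table, so the sketch keeps the default of separate queries. The second mentions brokers, which is the situation the select trick above creates on purpose.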

Related

Rails/ActiveRecord: Can I perform this query without passing the SQL string to #order?

I have two models Issue and Label. They have a many to many relationship.
I have a method that returns the ten labels that point to the most issues.
class Label < ApplicationRecord
has_many :tags
has_many :issues, through: :tags
def self.top
Label.joins(:issues)
.group(:name)
.order('count_id desc')
.count(:id)
.take(10)
end
end
It does exactly what I expect it to but I want to know if it's possible to compose the query without the SQL string.
order('count_id DESC') is confusing me. Where does count_id come from? There isn’t a column named count_id.
Label.joins(:issues).group(:name).column_names
#=> ["id", "name", "created_at", "updated_at"]
I’ve found some SQL examples here. I think it’s basically the same as ORDER BY COUNT(Id):
SELECT COUNT(Id), Country
FROM Customer
GROUP BY Country
ORDER BY COUNT(Id) DESC
Is it possible to perform the same query without passing in the SQL string? Can it be done with the ActiveRecord querying interface alone?
If you look at your query log, you'll see something like:
select count(labels.id) as count_id ...
The combination of your group call (with any argument) and the count(:id) call gets ActiveRecord to add the count_id column alias to the query. I don't think this is documented or specified anywhere (at least that I can find) but you can see it happen if you're brave enough to walk through the Active Record source.
In general, if you add a GROUP BY and then count(:x), Active Record will add a count_x alias. There's no column for this so you can't say order(:count_id), order(count_id: :desc), or any of the other common non-String alternatives. AFAIK, you have to use a string but you can wrap it in an Arel.sql to prevent future deprecation issues:
Label.joins(:issues)
.group(:name)
.order(Arel.sql('count_id desc'))
.count(:id)
.take(10)
There's no guarantee about this so if you use it, you should include something in your test suite to catch any problems if the behavior changes in the future.
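For intuition about what the chain computes, here is a plain-Ruby equivalent over in-memory data. The sample label names and the helpers count_by_name and top_labels are made up for illustration. A grouped count returns a Hash of name => count, and calling take on a Hash yields an array of [name, count] pairs, which is the shape this sketch reproduces.

```ruby
# Hypothetical sample data: one entry per (label, issue) pair, i.e. the
# rows that joining labels to issues would produce.
joined_label_names = %w[bug bug feature bug docs feature]

# GROUP BY name + COUNT(id): how many joined rows each label has.
def count_by_name(names)
  names.each_with_object(Hash.new(0)) { |name, counts| counts[name] += 1 }
end

# ORDER BY count DESC, then keep the first `limit` [name, count] pairs.
def top_labels(counts, limit)
  counts.sort_by { |_name, count| -count }.first(limit)
end

counts = count_by_name(joined_label_names)
p top_labels(counts, 2)
```

Note that in the ActiveRecord version the ordering happens in SQL, while take(10) runs in Ruby on the Hash that count returns, so every group is fetched before the list is cut down to ten.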

Chaining scopes with joins

These two scopes don't seem to be chainable:
scope :approved, ->{ with_stage(:approved)}
which in sql is
WHERE (pages.stage & 4 <> 0)
and
scope :with_galleries, ->{ joins("LEFT OUTER JOIN galleries ON galleries.galleriable_type = 'Brand' AND galleries.galleriable_id = page.brand_id").where("galleries.id is NOT NULL") }
This scope should return only the pages that have galleries (each page has one brand, and each brand can have many galleries).
If I chain :with_galleries, the rest of the conditions on the pages table seem to be lost.
Am I doing the joins wrong?
You would get a more useful result if you let ActiveRecord do more of the heavy lifting for you. In particular, if you've set up associations properly, you should be able to write the following instead:
scope :with_galleries, -> { joins(brand: :galleries) }
... which would yield a properly chainable scope.
That would depend on two associations, one from your page model to the brand:
belongs_to :brand
and one from the brand to the galleries:
has_many :galleries, as: :galleriable
I'm inferring your model setup from the query that you've written, so I may have guessed wrong. But the basic principle here is to declare your associations and let ActiveRecord construct queries (unless your query is something very unusual, which yours is not -- you're just filtering depending on whether there are associated records, a common operation).
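To show why an inner join is enough for "only pages that have galleries", and how it composes with the approved filter, here is a plain-Ruby sketch over in-memory rows. The sample data and the helper names are hypothetical; the column names follow the SQL in the question.

```ruby
# Hypothetical sample rows; column names follow the question's SQL.
def sample_pages
  [
    { id: 1, brand_id: 100, stage: 4 }, # approved, brand has galleries
    { id: 2, brand_id: 100, stage: 1 }, # not approved
    { id: 3, brand_id: 200, stage: 4 }  # approved, brand has no galleries
  ]
end

def sample_galleries
  [
    { id: 7, galleriable_type: "Brand", galleriable_id: 100 },
    { id: 8, galleriable_type: "Brand", galleriable_id: 100 }
  ]
end

# Stands in for: WHERE (pages.stage & 4 <> 0)
def approved(pages)
  pages.select { |page| (page[:stage] & 4) != 0 }
end

# Stands in for an INNER JOIN to galleries: one output row per matching
# (page, gallery) pair, so pages without galleries drop out and pages
# with several galleries are repeated.
def with_galleries(pages, galleries)
  pages.flat_map do |page|
    galleries
      .select { |g| g[:galleriable_type] == "Brand" && g[:galleriable_id] == page[:brand_id] }
      .map { page }
  end
end

rows = with_galleries(approved(sample_pages), sample_galleries)
p rows.map { |page| page[:id] }      # one entry per joined row
p rows.map { |page| page[:id] }.uniq # distinct pages
```

Both filters apply to the same rows, so chaining them narrows the result rather than replacing one condition with the other. Because the join produces one row per gallery, a page can appear more than once, so you may want to deduplicate the result.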
You need to construct the second scope using Arel. The simplest approach is to craft the full SQL statement you want the second scope to represent, and then paste it into http://www.scuttle.io/

Get all data from all tables (Postgresql)

data = Program.joins(:program_schedules, :channel).
where(
"program_schedules.start" => options[:time_range],
"programs.ptype" => "movie",
"channels.country" => options[:country]).
where("programs.official_rating >= ?", options[:min_rating]).
group("programs.id").
order("programs.popularity DESC")
This query retrieves only the "programs" table (I think because of the "group by" clause).
How could I retrieve all data from all tables (programs, program_schedules, channels)?
class Program < ActiveRecord::Base
  belongs_to :channel
  has_many :program_schedules
end
Ruby on Rails 3.2
Postgresql 9.2
Are you looking for the eager loading of associations provided by the includes method ?
Eager loading is a way to find objects of a certain class and a number of named associations. This is one of the easiest ways to prevent the dreaded 1+N problem in which fetching 100 posts that each need to display their author triggers 101 database queries. Through the use of eager loading, the 101 queries can be reduced to 2.
Update:
You can get the SQL as a string by appending .to_sql to your query. .explain can also help.
If you want the data as hashes rather than as model instances, you can serialize each record returned by the query and include its associations using .serializable_hash:
data.map { |program| program.serializable_hash(include: [:program_schedules, :channel]) }
You can define which fields to include using :except or :only options in the same way as described here for .as_json (which acts similarly) :
http://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html#method-i-as_json

ActiveRecord: can't use `pluck` after `where` clause with eager-loaded associations

I have an app that has a number of Post models, each of which belongs_to a User model. When these posts are published, a PublishedPost model is created that belongs_to the relevant Post model.
I'm trying to build an ActiveRecord query to find published posts that match a user name, then get the ids of those published posts, but I'm getting an error when I try to use the pluck method after eager-loading my associations and searching them with the where method.
Here's (part of) my controller:
class PublishedPostsController < ApplicationController
def index
ar_query = PublishedPost.order("published_posts.created_at DESC")
if params[:searchQuery].present?
search_query = params[:searchQuery]
ar_query = ar_query.includes(:post => :user)
.where("users.name like ?", "%#{search_query}%")
end
@found_ids = ar_query.pluck(:id)
...
end
end
When the pluck method is called, I get this:
ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'users.name' in 'where clause': SELECT id FROM `published_posts` WHERE (users.name like '%Andrew%') ORDER BY published_posts.created_at DESC
I can get the results I'm looking for with
@found_ids = ar_query.select(:id).map{|r| r.id}
but I'd rather use pluck as it seems like the cleaner way to go. I can't figure out why it's not working, though. Any ideas?
You should use joins instead of includes here.
The two methods are similar, except that joins does not return the associated tables' data in the query result, whereas includes does.
In that respect, includes and pluck are kind of antithetical. One says to return all the data you possibly can, whereas the other says to give me only this one little bit.
Since you only want a small amount of the data, you want to use joins. (Strangely, select, which also seems somewhat antithetical, still works, but you would need to remove the ambiguity over id in this case.)
Try it out in the console and you'll see that includes causes a query that looks kind of like this: SELECT "posts"."id" as t0_r0, "posts"."text" as t0_r1, "users"."id" as t1_r0, "users"."name" as t1_r1 ... When you tack on a pluck call, all those crazy tx_ry columns go away and are replaced by whatever you specified.
I hope that helps, but if not maybe this RailsCast can. It is explained around the 5 minute mark.
http://railscasts.com/episodes/181-include-vs-joins
If you got here by searching "rails pluck ambiguous column", you may want to know you can just replace query.pluck(:id) with:
query.pluck("table_name.id")
Your query wouldn't work as it is written, even without the pluck call.
The reason is that your WHERE clause contains literal SQL referencing the users table. Rails doesn't notice this, so it decides to use multiple queries and stitch the records together in memory ( .preload() ) instead of joining at the database level ( .eager_load() ):
SELECT * from published_posts WHERE users.name like "pattern" ORDER BY published_posts.created_at DESC
SELECT * from posts WHERE id IN ( a_list_of_all_post_ids_in_published_posts_returned_above )
SELECT * from users WHERE id IN ( a_list_of_all_user_ids_in_posts_returned_above )
The first of the 3 queries fails, and that is the error you get.
To force Rails to use a JOIN here, you should either use the explicit .eager_load() instead of .includes(), or add a .references() clause.
Other than that, what @Geoff answered stands: you don't really need .includes() here, but rather .joins().
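The difference between the two strategies can be sketched in plain Ruby with in-memory rows. Everything here is hypothetical (the sample rows and both helper methods) and only mimics the shape of the queries: in the preload style the first query sees published_posts columns only, so a condition on the user's name has nothing to refer to, while the join style builds combined rows before filtering.

```ruby
# Hypothetical rows standing in for the three tables in the question.
def published_posts
  [{ id: 1, post_id: 10 }, { id: 2, post_id: 11 }]
end

def posts
  [{ id: 10, user_id: 100 }, { id: 11, user_id: 101 }]
end

def users
  [{ id: 100, name: "Andrew" }, { id: 101, name: "Beth" }]
end

# Preload style: the first query only has published_posts columns, so the
# name condition cannot be evaluated. KeyError plays the role of MySQL's
# "Unknown column 'users.name'" error.
def preload_style_ids(pattern)
  published_posts
    .select { |row| row.fetch(:user_name).include?(pattern) }
    .map { |row| row[:id] }
end

# Join style: combine the rows first, then filter on the joined column.
def join_style_ids(pattern)
  joined = published_posts.map do |published|
    post = posts.find { |candidate| candidate[:id] == published[:post_id] }
    user = users.find { |candidate| candidate[:id] == post[:user_id] }
    { published_post_id: published[:id], user_name: user[:name] }
  end
  joined.select { |row| row[:user_name].include?(pattern) }
        .map { |row| row[:published_post_id] }
end

begin
  preload_style_ids("Andrew")
rescue KeyError => e
  puts "preload-style query failed: #{e.message}"
end
p join_style_ids("Andrew")
```

This is also why plucking ids pairs naturally with a join: the filter needs the users columns, but the result only needs one column from published_posts.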

NHibernate Query parent and child objects eagerly without join

I have a simple domain with Order and OrderLines. Is it possible to load the Order and associated OrderLine objects without a join? I'm trying to use Future/FutureValue to perform two simple queries. I'm hoping that NHibernate knows how to combine these in the cache. I'm using NHibernate 3.2 with code-only mapping.
So far here is what I have:
// Get all the order lines for the order
var lineQuery = session.QueryOver<OrderLine>()
.Where(x => x.WebOrder.Id == id).Future<OrderLine>();
// Get the order
var orderQuery = session.QueryOver<WebOrder>()
.Where(x => x.Id == id)
.FutureValue<WebOrder>();
var order = orderQuery.Value;
This works as expected, sending two queries to the database. However, when I loop through order.OrderLines, NHibernate sends another query to get the order lines. My guess is that because I'm using constraints (Where(x => ...)), NHibernate doesn't know how to get the objects from the session cache.
Why do I want to do this without join
I know I can use Fetch(x => x.OrderLines).Eager, but sometimes the actual parent (in this case Order) is so large that I don't want to perform the join. After all, if I perform the join, the result set contains all the order columns for each order line. I don't have any raw numbers; I'm just wondering if this is possible.
It's quite possible. See NHibernate's fetching strategies.
You can choose either 'select' (if you're only dealing with one Order at a time) or 'subselect'.