Rails eager_load with conditions on association - sql

I have a Rails application which has Stations (weather stations) and Observations. The app shows many weather stations on a map with the current wind speed and direction.
I have a method, used by the stations#index action, which selects the stations and joins the latest observation per station.
class Station < ActiveRecord::Base
  has_many :observations

  def self.with_observations(limit = 1)
    eager_load(:observations).where(
      observations: { id: Observation.pluck_from_each_station(limit) }
    )
  end
end
Observation.pluck_from_each_station returns an array of ids. The observations table contains many thousands of rows, so this is necessary to keep Rails from eager loading thousands of records.
This method should return all the stations, whether they have any observations or not. However, this is currently not the case.
it "includes stations that have no observations" do
  new_station = create(:station)
  stations = Station.with_observations(2)
  expect(stations).to include new_station # fails
end
From my understanding a LEFT OUTER JOIN should return all rows whether there are any results in the joined table or not. Why is this not working as expected?
This is an example of the SQL generated:
SELECT "stations"."id" AS t0_r0,
"stations"."name" AS t0_r1,
"stations"."hw_id" AS t0_r2,
"stations"."latitude" AS t0_r3,
"stations"."longitude" AS t0_r4,
"stations"."balance" AS t0_r5,
"stations"."timezone" AS t0_r6,
"stations"."user_id" AS t0_r7,
"stations"."created_at" AS t0_r8,
"stations"."updated_at" AS t0_r9,
"stations"."slug" AS t0_r10,
"stations"."speed_calibration" AS t0_r11,
"stations"."firmware_version" AS t0_r12,
"stations"."gsm_software" AS t0_r13,
"stations"."description" AS t0_r14,
"stations"."sampling_rate" AS t0_r15,
"stations"."status" AS t0_r16,
"observations"."id" AS t1_r0,
"observations"."station_id" AS t1_r1,
"observations"."speed" AS t1_r2,
"observations"."direction" AS t1_r3,
"observations"."max_wind_speed" AS t1_r4,
"observations"."min_wind_speed" AS t1_r5,
"observations"."temperature" AS t1_r6,
"observations"."created_at" AS t1_r7,
"observations"."updated_at" AS t1_r8,
"observations"."speed_calibration" AS t1_r9
FROM "stations"
LEFT OUTER JOIN
"observations"
ON "observations"."station_id" = "stations"."id"
WHERE "observations"."id" IN (450, 500, 550, 600, 650, 700, 750, 800);

I think that happens because you are excluding the rows where "observations"."id" is NULL after the left join. A station without observations only has such a row, so the IN condition filters it out. Allow NULL ids explicitly:
eager_load(:observations).where(
  '"observations"."id" IS NULL OR "observations"."id" IN (?)', Observation.pluck_from_each_station(limit)
)
For this case it is logically the same as a left join on two conditions, but as Rails doesn't have this feature you can work around it using the where clause.
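To see why the original WHERE clause drops stations without observations, here is a minimal plain-Ruby sketch that mimics the LEFT OUTER JOIN followed by each filter. The sample data and the helper name left_join are made up for illustration; no database or Rails is involved.

```ruby
# Hypothetical sample data: station 3 has no observations.
stations = [{ id: 1 }, { id: 2 }, { id: 3 }]
observations = [
  { id: 450, station_id: 1 },
  { id: 500, station_id: 2 },
  { id: 100, station_id: 1 } # an older row that is not in the id list
]
latest_ids = [450, 500]

# Mimic LEFT OUTER JOIN: every station appears at least once;
# stations without observations are paired with nil (SQL NULL).
def left_join(stations, observations)
  stations.flat_map do |station|
    matches = observations.select { |o| o[:station_id] == station[:id] }
    matches = [nil] if matches.empty?
    matches.map { |obs| [station, obs] }
  end
end

rows = left_join(stations, observations)

# WHERE observations.id IN (...): a NULL id never matches, so station 3 is lost.
in_only = rows.select { |_, obs| obs && latest_ids.include?(obs[:id]) }

# WHERE observations.id IS NULL OR observations.id IN (...): station 3 survives.
null_or_in = rows.select { |_, obs| obs.nil? || latest_ids.include?(obs[:id]) }

in_only_station_ids = in_only.map { |station, _| station[:id] }.uniq
null_or_in_station_ids = null_or_in.map { |station, _| station[:id] }.uniq

puts in_only_station_ids.inspect    # [1, 2]
puts null_or_in_station_ids.inspect # [1, 2, 3]
```

One caveat: a station that has observations, none of which are in the id list, is still filtered out by the IS NULL OR IN form, because its joined rows have non-NULL ids. That should not come up here as long as pluck_from_each_station returns at least one id for every station that has observations.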

Related

Rails: How to load associations in a single query while customizing association columns

If I use eager_load to fetch the association I want in a single query, I get way too many columns:
scope = Product.eager_load(:account).to_a
scope.last.account.name
Then I get a query that looks like:
SQL (3.3ms) SELECT "products"."id" AS t0_r0, "products"."account_id" AS t0_r1, "products"."notes" AS t0_r2, "products"."created_at" AS t0_r3, "products"."updated_at" AS t0_r4, "products"."rep_id" AS t0_r5, "products"."senior_rep_id" AS t0_r6, "products"."name_id" AS t0_r7...
My goal is to just get the columns I want, such as I might expect by calling .select("products.id, products.account_id, accounts.id, accounts.name"). However, if I add that to my query, it simply prepends those columns to the front and basically ignores them:
SQL (3.3ms) SELECT products.id, products.account_id, accounts.id, accounts.name, "products"."id" AS t0_r0, "products"."account_id" AS t0_r1, "products"."notes" AS t0_r2, "products"."created_at" AS t0_r3, "products"."updated_at" AS t0_r4, "products"."rep_id" AS t0_r5, "products"."senior_rep_id" AS t0_r6, "products"."name_id" AS t0_r7...
So now my query is even longer, but I didn't get any benefit. I can also try with left_joins:
scope = Product.left_joins(:account).select("products.id, products.account_id, accounts.id, accounts.name").to_a
This will yield me N+1 queries, where the select is ignored for associations:
Products Load (0.9ms) SELECT products.id, products.account_id, accounts.id, accounts.name FROM "products" LEFT OUTER JOIN "accounts" ON "accounts"."id" = "products"."account_id"
Account Load (0.2ms) SELECT "accounts".* FROM "accounts" WHERE "accounts"."id" = $1 LIMIT $2 [["id", 16], ["LIMIT", 1]]
However I can avoid the N+1 if I recognize the selected columns DID go somewhere, in that accounts.name mapped to name on the Product:
scope.last.attributes["name"]
This technically gets me the information I want in a single query, but when building a real query with many associations, trying to rename and remap this data to lots of custom names suddenly makes me wonder why I'm using ActiveModel in the first place. Is there more of a "Rails Way" to do this where scope.last.account.name will still have its value set in the way it would had I used eager_load?
You can keep the query scope = Product.left_joins(:account).select("products.id, products.account_id, accounts.id, accounts.name").to_a; then your second line should be scope.last.name, which avoids the second database request. If the account and product tables have columns with the same name, give them different names in the select call (i.e. scope = Product.left_joins(:account).select("products.id, products.account_id, accounts.id, accounts.name as account_name").to_a) and then read the value with scope.last.account_name.
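The reason for the alias can be illustrated without Rails. The sketch below assumes that a result row ends up as attributes keyed by column name, so two selected columns with the same name collide. The values and the alias joined_account_id are invented, and the real ActiveRecord internals may differ in detail.

```ruby
# Hypothetical result row for:
#   SELECT products.id, products.account_id, accounts.id, accounts.name
columns = ["id", "account_id", "id", "name"]
values  = [7, 16, 16, "Acme Corp"]

# Keying by column name keeps one value per name; here the later "id"
# (from accounts) overwrites the earlier one (from products).
attributes = columns.zip(values).to_h

# With aliases (accounts.id AS joined_account_id, accounts.name AS account_name)
# every value stays reachable under its own key.
aliased_columns    = ["id", "account_id", "joined_account_id", "account_name"]
aliased_attributes = aliased_columns.zip(values).to_h

puts attributes["id"]                    # 16, the product id 7 is gone
puts aliased_attributes["id"]            # 7
puts aliased_attributes["account_name"]  # Acme Corp
```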

How to build an efficient select command across multiple tables in Rails 4?

Trying to figure out an efficient way to select records that have attributes across multiple tables. Here's the basic setup:
structure
Plants (fields: id, name_id, location_id, color) (1000 records)
Names (fields: id, Common_name) (50 records)
Location (fields: id, Bed_name) (125 records)
model
Plants - belongs_to Names, belongs_to Location
Names - has_many Plants
Location - has_many Plants
My goal is to output a list of every Rose in the side yard, and display the color, but I am stuck on the select command. If I get all plants (p = Plant.all) I know that I can easily create my output with a statement like <%= "#{p.name.common_name} in bed #{p.location.bed_name} has a color of #{p.color}" %>
If I do two joins I'm looking at way more records than I need and a MUCH longer search time. As an example: I have 67 roses in 16 different beds, however, I only have 3 roses in the side yard.
My gut tells me that I should be able to do something like:
select all plants with the name of Rose, then from this selection select all Roses that are in the side yard.
Can anybody help point me in the correct direction?
You can combine it all into a single query like this:
Plant.joins(:name, :location).where(names: { common_name: "rose" }, locations: { bed_name: "side" })
This results in a single SQL query like this:
SELECT "plants".* FROM "plants" INNER JOIN "names" ON "names"."id" = "plants"."name_id" INNER JOIN "locations" ON "locations"."id" = "plants"."location_id" WHERE "names"."common_name" = 'rose' AND "locations"."bed_name" = 'side'
Note that you have to use the plural table names in the where clause, but the singular association name in the joins clause.
This will run nearly instantaneously even with enormous tables, assuming your tables are properly indexed.
This is a simple example, but you can do fairly complex joins with conditions. Full details can be found in the ActiveRecord documentation.
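If it helps to picture what that query computes, here is a rough plain-Ruby equivalent over invented sample rows. The hashes keyed by id stand in for database indexes; none of this is Rails code, it only mirrors the two INNER JOINs plus the two WHERE conditions.

```ruby
# Invented sample rows standing in for the three tables.
names     = [{ id: 1, common_name: "rose" }, { id: 2, common_name: "tulip" }]
locations = [{ id: 1, bed_name: "side" }, { id: 2, bed_name: "front" }]
plants = [
  { id: 1, name_id: 1, location_id: 1, color: "red" },
  { id: 2, name_id: 1, location_id: 2, color: "white" },
  { id: 3, name_id: 2, location_id: 1, color: "yellow" }
]

# Index the lookup tables by primary key, roughly what a database index provides.
names_by_id     = names.to_h { |n| [n[:id], n] }
locations_by_id = locations.to_h { |l| [l[:id], l] }

# INNER JOIN both tables, then keep rows that satisfy both WHERE conditions.
side_yard_roses = plants.select do |plant|
  name     = names_by_id[plant[:name_id]]
  location = locations_by_id[plant[:location_id]]
  name && location &&
    name[:common_name] == "rose" && location[:bed_name] == "side"
end

side_yard_roses.each do |plant|
  puts "rose in bed side has a color of #{plant[:color]}"
end
```

The database does the same narrowing in one pass, which is why a single joined query beats loading every plant and filtering in the view.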
Edit
Per @Dan's comment, you can speed this up more by using includes to pre-fetch the association data in the join:
Plant.includes(:name, :location).where(names: { common_name: "rose" }, locations: { bed_name: "side" })
This will load the related records from names and locations at the same time. includes is handy for eliminating (or at least reducing) N+1 queries. It is also smart enough to know when it can retrieve all the data in a single query, and falls back to multiple queries when that makes more sense, so you mostly don't have to think about it (although it can sometimes be less efficient, so keep an eye on your logs if you suspect it is hurting performance).
Using includes in this case is very efficient, resulting in a single SQL query which includes association data:
SELECT "plants"."id" AS t0_r0, "plants"."color" AS t0_r1, "plants"."name_id" AS t0_r2, "plants"."location_id" AS t0_r3, "plants"."created_at" AS t0_r4, "plants"."updated_at" AS t0_r5, "names"."id" AS t1_r0, "names"."common_name" AS t1_r1, "names"."created_at" AS t1_r2, "names"."updated_at" AS t1_r3, "locations"."id" AS t2_r0, "locations"."bed_name" AS t2_r1, "locations"."created_at" AS t2_r2, "locations"."updated_at" AS t2_r3 FROM "plants" LEFT OUTER JOIN "names" ON "names"."id" = "plants"."name_id" LEFT OUTER JOIN "locations" ON "locations"."id" = "plants"."location_id" WHERE "names"."common_name" = 'rose' AND "locations"."bed_name" = 'side'

Wrong results on multiple inner joins with an OR condition

I have three models: Event, Comment and Photo. An Event has both has_many :comments and has_many :photos.
My goal is to find all Events which have received new comments and/or photos in the last 24 hours.
Assume that two events exist, one with a recent comment and another with a recent photo.
If I query them separately with a single join, everything works as expected:
Event.joins(:comments).where("comments.created_at >= :today", :today => Time.now.beginning_of_day)
returns: [#<Event id: 1>]
Event.joins(:photos).where("photos.created_at >= :today", :today => Time.now.beginning_of_day)
returns: [#<Event id: 2>]
Why is it when I combine both joins:
Event.joins(:comments, :photos)
.where("comments.created_at >= :today OR photos.created_at >= :today", :today => Time.now.beginning_of_day)
that I receive [#<Event id: 2>, #<Event id: 2>] — twice #2, but not #1?
The SQL generated by ARel is
SELECT "events".*
FROM "events"
INNER JOIN "comments"
ON "comments"."event_id" = "events"."id"
INNER JOIN "photos"
ON "photos"."event_id" = "events"."id"
WHERE ( comments.created_at >= '2013-03-25 23:00:00.000000'
OR photos.created_at >= '2013-03-25 23:00:00.000000' )
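I believe it is because of the INNER JOIN. This query returns ONLY the events that have both photos and comments associated with them.
You can try using LEFT JOIN instead of INNER JOIN on your RDBMS and see that both results are fetched.
It's an SQL problem: you have to specify LEFT JOIN instead of INNER JOIN. includes will do it for you:
Event.includes(:comments, :photos).where(...).uniq
On Rails 4 and later, a string condition like this one also needs .references(:comments, :photos) (or eager_load in place of includes) for the JOIN to be generated, and .distinct replaces .uniq on relations in newer versions.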
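Here is a small plain-Ruby sketch of why the INNER JOIN version returns event 2 twice and event 1 not at all. The data is invented to match the question (event 1 has one recent comment and no photos; event 2 has one recent photo and two older comments), and join_children is a hypothetical helper, not a Rails method.

```ruby
today = 100 # stand-in timestamp for "beginning of today"
events   = [{ id: 1 }, { id: 2 }]
comments = [
  { event_id: 1, created_at: 150 },
  { event_id: 2, created_at: 10 },
  { event_id: 2, created_at: 20 }
]
photos = [{ event_id: 2, created_at: 150 }]

# Join one child table onto the rows built so far. With outer: true, events
# without children are kept and paired with nil (SQL NULL).
def join_children(rows, children, outer:)
  rows.flat_map do |row|
    event = row.first
    matches = children.select { |c| c[:event_id] == event[:id] }
    matches = [nil] if matches.empty? && outer
    matches.map { |child| row + [child] }
  end
end

recent = ->(child) { !child.nil? && child[:created_at] >= today }

# WHERE comments.created_at >= today OR photos.created_at >= today
matching_ids = lambda do |rows|
  rows.select { |_, comment, photo| recent.(comment) || recent.(photo) }
      .map { |event, _, _| event[:id] }
end

base = events.map { |e| [e] }
inner_rows = join_children(join_children(base, comments, outer: false), photos, outer: false)
outer_rows = join_children(join_children(base, comments, outer: true), photos, outer: true)

inner_ids = matching_ids.(inner_rows)
outer_ids = matching_ids.(outer_rows)

puts inner_ids.inspect      # [2, 2]
puts outer_ids.uniq.inspect # [1, 2]
```

Event 1 disappears at the second INNER JOIN because it has no photos, and event 2 shows up once per comment/photo pair. The outer joins keep event 1, and uniq collapses the repeated rows for event 2.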

Rails Active Record Query Generation -- Strange behavior when using includes and dot('.')

I have a business model and an address model like this
class Business < ActiveRecord::Base
  has_one :address, as: :addressable
end
class Address < ActiveRecord::Base
  belongs_to :addressable, polymorphic: true
end
Now the strange part. If I run the following
Business.includes(:address).where(['email like ?', '%whatever%'])
Two Queries are generated
SELECT `businesses`.* FROM `businesses` WHERE (email like '%whatever%')
SELECT `addresses`.* FROM `addresses` WHERE `addresses`.`addressable_type` = 'Business' AND `addresses`.`addressable_id` IN (26)
But if the text inside the LIKE clause contains a dot ('.'), as below:
Business.includes(:address).where(['email like ?', '%what.ever%'])
This time a single query is generated using the JOIN
SELECT `businesses`.`id` AS t0_r0, `businesses`.`name` AS t0_r1, `businesses`.`primary_category_id` AS t0_r2, `businesses`.`secondary_category_id` AS t0_r3, `businesses`.`sub_primary_category_id` AS t0_r4, `businesses`.`sub_secondary_category_id` AS t0_r5, `businesses`.`website` AS t0_r6, `businesses`.`phone` AS t0_r7, `businesses`.`manager_name` AS t0_r8, `businesses`.`manager_phone` AS t0_r9, `businesses`.`email` AS t0_r10, `businesses`.`created_at` AS t0_r11, `businesses`.`updated_at` AS t0_r12, `businesses`.`encrypted_password` AS t0_r13, `businesses`.`reset_password_token` AS t0_r14, `businesses`.`reset_password_sent_at` AS t0_r15, `businesses`.`remember_created_at` AS t0_r16, `businesses`.`sign_in_count` AS t0_r17, `businesses`.`current_sign_in_at` AS t0_r18, `businesses`.`last_sign_in_at` AS t0_r19, `businesses`.`current_sign_in_ip` AS t0_r20, `businesses`.`last_sign_in_ip` AS t0_r21, `businesses`.`confirmation_token` AS t0_r22, `businesses`.`confirmed_at` AS t0_r23, `businesses`.`confirmation_sent_at` AS t0_r24, `businesses`.`failed_attempts` AS t0_r25, `businesses`.`unlock_token` AS t0_r26, `businesses`.`locked_at` AS t0_r27, `addresses`.`id` AS t1_r0, `addresses`.`line1` AS t1_r1, `addresses`.`line2` AS t1_r2, `addresses`.`city` AS t1_r3, `addresses`.`country` AS t1_r4, `addresses`.`zip` AS t1_r5, `addresses`.`neighbourhood` AS t1_r6, `addresses`.`addressable_id` AS t1_r7, `addresses`.`addressable_type` AS t1_r8, `addresses`.`created_at` AS t1_r9, `addresses`.`updated_at` AS t1_r10, `addresses`.`latitude` AS t1_r11, `addresses`.`longitude` AS t1_r12, `addresses`.`gmaps` AS t1_r13 FROM `businesses` LEFT OUTER JOIN `addresses` ON `addresses`.`addressable_id` = `businesses`.`id` AND `addresses`.`addressable_type` = 'Business' WHERE (email like '%what.ever%')
I have tried different values for the LIKE clause and noticed that the JOIN query is only generated when the input contains a dot ('.') with at least 2 characters before it.
A few examples:
'whatever' results in 2 Queries without JOIN
'w.hatever' results in 2 Queries without JOIN
'w.h' results in 2 Queries without JOIN
'w.ha' results in 2 Queries without JOIN
'w.' results in 2 Queries without JOIN
'xxxxxxxxw.hatever' results in 1 Query with JOIN
'what.ever' results in 1 Query with JOIN
'wh.' results in 1 Query with Join
My guess is that Rails assumes a dot in the where clause may refer to another table, so it applies a join. If there is no dot, or only one character before the dot (maybe because table names can't be a single character; that's just a guess!), it generates 2 queries.
Has anyone seen this kind of issue? I am using Rails 3.2.
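The pattern in the examples above is consistent with includes scanning the generated SQL for anything that looks like a "table." reference and switching to a JOIN when it finds a name other than the table already being queried. The sketch below is plain Ruby, not Rails code. The regular expression is an assumption, chosen because it reproduces the results listed above; it has not been checked against the Rails source, and tables_in_string and join_expected? are hypothetical helper names.

```ruby
# ASSUMPTION: a guess at the kind of heuristic involved: a letter or
# underscore, at least one more word character or dot, an optional single
# character (such as a closing quote), then a literal dot.
TABLE_REFERENCE = /([a-zA-Z_][.\w]+).?\./

# Collect every identifier that looks like "table." in an SQL string.
def tables_in_string(sql)
  sql.scan(TABLE_REFERENCE).flatten.map(&:downcase).uniq
end

# True when the SQL seems to mention a table other than the ones already
# being queried, which is when a JOIN would be chosen under this assumption.
def join_expected?(like_value, known_tables = ["businesses"])
  sql = "SELECT `businesses`.* FROM `businesses` WHERE (email like '%#{like_value}%')"
  (tables_in_string(sql) - known_tables).any?
end

%w[whatever w.hatever w.h w.ha w. xxxxxxxxw.hatever what.ever wh.].each do |value|
  outcome = join_expected?(value) ? "1 query with JOIN" : "2 queries without JOIN"
  puts "#{value.inspect} results in #{outcome}"
end
```

Under this reading, 'wh.' looks like a reference to a table called wh, while 'w.' does not because the pattern needs at least two characters before the dot. If the behaviour matters, preload(:address) and eager_load(:address) let you choose the loading strategy explicitly instead of leaving it to includes.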

ruby on rails: how does this sql query work exactly?

What does the @posts line do exactly?
def index
  if params[:user_id] && params[:artist_id]
    @id = params[:user_id]
    @name = Artist.find(params[:artist_id]).name
    @posts = Post.includes(:user).where('users.id' => @id).joins(:artists).where('artists.name' => @name)
  end
end
It seems to give me this really long query: confused as to why it needs all this.
Thanks
Started GET "/users/example4/artists/22/posts" for 127.0.0.1 at 2011-07-29 16:34:48 -0700
Processing by PostsController#index as HTML
Parameters: {"user_id"=>"example4", "artist_id"=>"22"}
Artist Load (0.2ms) SELECT "artists".* FROM "artists" WHERE "artists"."id" = 22 LIMIT 1
Post Load (0.5ms) SELECT "posts"."id" AS t0_r0, "posts"."title" AS t0_r1, "posts"."content" AS t0_r2, "posts"."user_id" AS t0_r3, "posts"."created_at" AS t0_r4, "posts"."updated_at" AS t0_r5, "posts"."item_name" AS t0_r6, "posts"."a_name" AS t0_r7, "posts"."image" AS t0_r8,
"posts"."collection_id" AS t0_r09, "posts"."featured_post" AS t0_r10, "users"."id" AS t1_r0,
"users"."email" AS t1_r1, "users"."encrypted_password" AS t1_r2, "users"."reset_password_token" AS
t1_r3, "users"."remember_token" AS t1_r4, "users"."remember_created_at" AS t1_r5,
"users"."sign_in_count" AS t1_r6, "users"."current_sign_in_at" AS t1_r7, "users"."last_sign_in_at"
AS t1_r8, "users"."current_sign_in_ip" AS t1_r9, "users"."last_sign_in_ip" AS t1_r10,
"users"."created_at" AS t1_r11, "users"."updated_at" AS t1_r12, "users"."name" AS t1_r13,
"users"."username" AS t1_r14, "users"."bio" AS t1_r15, "users"."avatar" AS t1_r16,
"users"."cached_slug" AS t1_r17, "users"."bg_image" AS t1_r18, "users"."bg_tile" AS t1_r19 FROM
"posts" INNER JOIN "artisanships" ON "posts"."id" = "artisanships"."post_id" INNER JOIN "artists"
ON "artists"."id" = "artisanships"."artist_id" LEFT OUTER JOIN "users" ON "users"."id" =
"posts"."user_id" WHERE "users"."id" = 0 AND "artists"."name" = 'bobby' ORDER BY posts.created_at
DESC
User Load (1.6ms) SELECT "users".* FROM "users" WHERE "users"."id" = 7 LIMIT 1
Rendered posts/artists.html.erb within layouts/application (275.3ms)
Completed 200 OK in 481ms (Views: 316.4ms | ActiveRecord: 3.4ms)
I'd recommend reading the AR querying guide if you haven't yet.
The code you've pasted is from a controller, and in that action you're asking for multiple things. That's why multiple queries are taking place.
This line:
@name = Artist.find(params[:artist_id]).name
queries the artists table and gets the name attribute from the result.
While this line:
@posts = Post.includes(:user).where('users.id' => @id).joins(:artists).where('artists.name' => @name)
has both an .includes and a .joins in it; that's why all that SQL is required. As long as you have proper indexes created on your tables, it shouldn't be a problem.
It's using a technique called eager loading. That's what includes does.
This means that when you call
@posts.first.user
Rails does not query the database again, because the users for all the posts were already loaded into memory (in the log above, as part of the Post Load query). That way you cut down on the number of queries when you do something like
@posts.each do |post|
  puts post.user
end
joins is the same as an SQL join.
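To make the query-count difference concrete, here is a toy plain-Ruby sketch. FakeDb is a made-up class that only counts how many lookups it receives; it stands in for the database and is not part of Rails.

```ruby
# A toy in-memory "database" that counts how many queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(users)
    @users = users
    @query_count = 0
  end

  # One query per call, like SELECT ... WHERE id = ? LIMIT 1
  def find_user(id)
    @query_count += 1
    @users.find { |u| u[:id] == id }
  end

  # One query for many ids, like SELECT ... WHERE id IN (...)
  def find_users(ids)
    @query_count += 1
    @users.select { |u| ids.include?(u[:id]) }
  end
end

users = [{ id: 1, name: "ann" }, { id: 2, name: "bob" }]
posts = [{ user_id: 1 }, { user_id: 2 }, { user_id: 1 }]

# Without eager loading: one user query per post (the N in "N+1").
lazy_db = FakeDb.new(users)
lazy_names = posts.map { |post| lazy_db.find_user(post[:user_id])[:name] }

# With eager loading: fetch all needed users once, then look them up in memory.
eager_db = FakeDb.new(users)
needed_ids = posts.map { |post| post[:user_id] }.uniq
users_by_id = eager_db.find_users(needed_ids).to_h { |u| [u[:id], u] }
eager_names = posts.map { |post| users_by_id[post[:user_id]][:name] }

puts lazy_db.query_count  # 3
puts eager_db.query_count # 1
```

In the log above Rails fetched the users through a JOIN inside the Post Load query instead of a separate IN query, but the effect on the loop is the same: post.user is answered from memory.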