Rails JOIN returns seemingly correct SQL, but returns the wrong result

The setup I have has publications, drafts, and live versions. Publication has a polymorphic belongs_to since many different types of objects can be drafted.
# Publication.all
Publication id: 1, publishable_id: 2, publishable_type: "Foo",
original_id: 1, original_type: "Foo"
# published scope on Foo
select('*, MAX(publications.created_at)').
joins(:publications).
group('publications.original_id')
# Foo.published.all
[<Foo id: 1, ...>]
Here is the published scope's to_sql:
SELECT *, MAX(publications.created_at)
FROM "foos"
INNER JOIN "publications"
ON "publications"."publishable_id" = "foos"."id"
AND "publications"."publishable_type" = 'Foo'
GROUP BY publications.original_id
Because there is only one publication with a publishable_id of 2, I expect this query to return the second Foo. But when I call the published scope on Foo, I instead get the first one. How is this possible? I thought that an INNER JOIN would limit the results to where the join condition is satisfied? How am I getting the complete opposite of what I'm looking for?
Something interesting: just performing the joins returns the correct result:
self.class.unscoped.joins(:publications)
However, the published scope (shown above) returns the incorrect result. Is something happening with the SELECT or GROUP BY parts of the query that is causing this?
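One possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: SELECT * over a join returns two columns named id (one from foos, one from publications). When a row is turned into an attribute hash keyed by column name, the later column overwrites the earlier one. The plain-Ruby snippet below mimics that collapse. The helper name, the column order, the name column and the sample values are assumptions based on the data shown above, and ActiveRecord's real row handling is more involved.

```ruby
# Map one result row to an attribute hash keyed by column name, the way a
# naive row mapper would. Duplicate column names overwrite earlier ones.
def row_to_attributes(columns, values)
  columns.zip(values).to_h
end

# Hypothetical row for: SELECT * FROM foos INNER JOIN publications ...
# Assumed column order: foos columns first, then publications columns.
COLUMNS = %w[id name id publishable_id publishable_type original_id original_type].freeze
VALUES  = [2, "second foo", 1, 2, "Foo", 1, "Foo"].freeze

attributes = row_to_attributes(COLUMNS, VALUES)
puts attributes["id"] # publications.id (1) has replaced foos.id (2)
```

If this is what is happening, selecting foos.* instead of * should keep the Foo's own id. That is an untested suggestion.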

Related

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
has_many :publications
end
What I want to get is all press_releases that do not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods do not return an ActiveRecord::Relation as a scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
Explanation and response to comments
My initial thought about eager loading overhead was that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications, so that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
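To make the LEFT OUTER JOIN ... IS NULL idea concrete, here is a plain-Ruby sketch that mimics it on in-memory data. The Struct names, helper names and sample rows are hypothetical and no database is involved. It only illustrates why filtering on a NULL joined id keeps exactly the parents without children.

```ruby
PressReleaseRow = Struct.new(:id, :title)
PublicationRow  = Struct.new(:id, :press_release_id)

# Emulate a LEFT OUTER JOIN: every parent appears at least once, paired with
# nil when it has no matching child row.
def left_outer_join(parents, children)
  parents.flat_map do |parent|
    matches = children.select { |child| child.press_release_id == parent.id }
    matches.empty? ? [[parent, nil]] : matches.map { |child| [parent, child] }
  end
end

# Emulate WHERE publications.id IS NULL: keep only the unmatched parents.
def without_children(parents, children)
  left_outer_join(parents, children)
    .select { |_parent, child| child.nil? }
    .map(&:first)
end

RELEASES = [
  PressReleaseRow.new(1, "a"),
  PressReleaseRow.new(2, "b"),
  PressReleaseRow.new(3, "c")
].freeze
PUBLICATIONS = [
  PublicationRow.new(10, 1),
  PublicationRow.new(11, 1),
  PublicationRow.new(12, 3)
].freeze

# Only press release 2 has no publications in this sample data.
p without_children(RELEASES, PUBLICATIONS).map(&:id)
```

A parent with several children appears several times in the join, but never with a nil child, so it is filtered out without needing DISTINCT.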
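You can do the following in your PressRelease model:
scope :your_scope, -> { where('press_releases.id NOT IN (SELECT press_release_id FROM publications WHERE press_release_id IS NOT NULL)') }
This will return all PressRelease records without publications. The IS NOT NULL filter matters because NOT IN matches nothing when the subquery returns a NULL.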
There are a couple of ways to do this. The first one requires two DB queries:
PressRelease.where.not(id: Publication.uniq.pluck(:press_release_id))
Or, if you don't want to hardcode the association's foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use the counter_cache feature. You will need to add a publications_count column to your press_releases table.
class Publication < ActiveRecord::Base
belongs_to :press_release, counter_cache: true
end
Rails will keep this column in sync with the number of records associated with the given model, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

Rails ActiveRecord Join Query With conditions

I have following SQL Query:
SELECT campaigns.* , campaign_countries.points, offers.image
FROM campaigns
JOIN campaign_countries ON campaigns.id = campaign_countries.campaign_id
JOIN countries ON campaign_countries.country_id = countries.id
JOIN offers ON campaigns.offer_id = offers.id
WHERE countries.code = 'US'
This works perfectly well. I want its rails active record version some thing like:
Campaign.includes(campaign_countries: :country).where(countries: {code: "US"})
The above code runs a more or less correct query (I did not try to include the offers table). The issue is that the returned result is a collection of Campaign objects, so obviously it does not include points.
My tables are:
campaigns --HAS_MANY--< campaign_countries --BELONGS_TO--> countries
campaigns --BELONGS_TO--> offers
Any suggestions to write AR version of this SQL? I don't want to use SQL statement in my code.
I somehow got this working without SQL, but surely it's a poor man's solution:
in my controller I have:
campaigns = Campaign.includes(campaign_countries: :country).where(countries: {code: country.to_s})
render :json => campaigns.to_json(:country => country)
in campaign model:
def points_for_country country
CampaignCountry.joins(:campaign, :country).where(countries: {code: country}, campaigns: {id: self.id}).first
end
def as_json options={}
json = {
id: id,
cid: cid,
name: name,
offer: offer,
points_details: options[:country] ? points_for_country(options[:country]) : ""
}
end
and in campaign_countries model:
def as_json options={}
json = {
face_value: face_value,
actual_value: actual_value,
points: points
}
end
Why is this not a good solution? Because it invokes too many queries:
1. It invokes a query when the first join is performed, to get the list of campaigns specific to the country.
2. For each campaign found in the first query, it invokes one more query on the campaign_countries table to get the points for that campaign and country.
This is a bad, Bad and BAD solution. Any suggestions to improve this?
If You have campaign, You can use campaign.campaign_countries to get associated campaign_countries and just get points from them.
> campaign.campaign_countries.map(&:points)
=> [1,2,3,4,5]
Similarly You will be able to get image from offers relation.
EDIT:
Ok, I guess now I know what's going on. You can use joins with select to get object with attached fields from join tables.
cs = Campaign.joins(campaign_countries: :country).joins(:offer).select('campaigns.*, campaign_countries.points, offers.image').where(countries: {code: "US"})
You can then reference the additional fields by their name on the Campaign object:
cs.first.points
cs.first.image
But be sure, that additional column names do not overlap with some primary table fields or object methods.
EDIT 2:
After some more research I came to conclusion that my first version was actually correct for this case. I will use my own console as example.
> u = User.includes(:orders => :cart).where(:carts => { :id => [5168, 5167] }).first
> u.orders.length # no query is performed
=> 2
> u.orders.count # count query is performed
=> 5
So when You use includes with a condition on country, the loaded campaign_countries association contains only the campaign_countries that fulfill Your condition.
Try this:
Campaign.joins([{ :campaign_countries => :country }, :offer]).where('countries.code = ?', "US")

Creating a Rails 3 scope that joins to a subquery

First off, I'm a Ruby/Rails newbie, so I apologize if this question is basic.
I've got a DB that (among other things) looks like this:
organizations { id, name, current_survey_id }
surveys { id, organization_id }
responses { id, survey_id, question_response_integer }
I'm trying to create a scope method that adds the average of the current survey answers to a passed-in Organization relation. In other words, the scope that's getting passed into the method would generate SQL that looks more or less like this:
select * from organizations
And I'd like the scope, after it gets processed by my lambda, to generate SQL that looks like this:
select o.id, o.name, cs.average_responses
from organizations o join
(select r.id, avg(r.question_response_integer) as average_responses
from responses r
group by r.id) cs on cs.id = o.current_survey_id
The best I've got is something like this:
current_survey_average: lambda do |scope, sort_direction|
average_answers = Responses.
select("survey_id, avg(question_response_integer) as average_responses").
group("survey_id")
scope.joins(average_answers).order("average_responses #{sort_direction}")
end
That's mostly just a stab in the dark - among other things, it doesn't specify how the scope could be expected to join to average_answers - but I haven't been able to find any documentation about how to do that sort of join, and I'm running out of things to try.
Any suggestions?
EDIT: Thanks to Sean Hill for the answer. Just to have it on record, here's the code I ended up going with:
current_survey_average: lambda do |scope, sort_direction|
scope_table = scope.arel.froms.first.name
query = <<-QUERY
inner join (
select r.survey_id, avg(r.question_response_integer) as average_responses
from responses r
group by r.survey_id
) cs
on cs.survey_id = #{scope_table}.current_survey_id
QUERY
scope.
joins(query).
order("cs.average_responses #{sort_direction}")
end
That said, I can see the benefit of putting the averaged_answers scope directly onto the Responses class - so I may end up doing that.
I have not been able to test this, but I think the following would work, either as-is or with a little tweaking.
class Response < ActiveRecord::Base
scope :averaged, -> { select('responses.survey_id, avg(responses.question_response_integer) as average_responses').group('responses.survey_id') }
scope :current_survey_average, ->(incoming_scope, sort_direction) do
scope_table = incoming_scope.arel.froms.first.name
query = <<-QUERY
INNER JOIN ( #{Arel.sql(averaged.to_sql)} ) cs
ON cs.survey_id = #{scope_table}.current_survey_id
QUERY
incoming_scope.joins(query).order("average_responses #{sort_direction}")
end
end
So what I've done here is that I have split out the inner query into another scope called averaged. Since you do not know which table the incoming scope in current_survey_average is coming from, I got the scope table name via scope.arel.froms.first.name. Then I created a query string that uses the averaged scope and joined it using the scope_table variable. The rest is pretty self-explanatory.
If you do know that the incoming scope will always be from the organizations table, then you don't need the extra scope_table variable. You can just hardcode it into the join query string.
I would make one suggestion. If you do not have control over sort_direction, then I would not directly input that into the order string.
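As a minimal sketch of that last suggestion, the direction can be checked against a whitelist before it is interpolated into the order string. The constant and helper name below are hypothetical, and falling back to "asc" is just one possible policy (raising an error would be another).

```ruby
ALLOWED_SORT_DIRECTIONS = %w[asc desc].freeze

# Return a known-safe direction, falling back to "asc" for anything else,
# so user input never reaches the ORDER BY string verbatim.
def safe_sort_direction(input)
  direction = input.to_s.strip.downcase
  ALLOWED_SORT_DIRECTIONS.include?(direction) ? direction : "asc"
end

puts "average_responses #{safe_sort_direction('DESC')}"
puts "average_responses #{safe_sort_direction('desc; DROP TABLE responses')}"
```

Inside the scope, the order call would then use safe_sort_direction(sort_direction) instead of the raw value.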

Rails .joins query over multiple associations

I have this query that works as expected:
@dog.listings.joins(:address_country).merge(Country.where(permalink: 'uk'))
This query gives me the Listings where the country matches 'uk' (Listing has_one :address_country, which is a country from the Country model).
But when I add another association to the chain in between cat and listing (litter), it doesn't work (a litter belongs to a listing, as well as to a cat):
@dog.litters.joins(:listing) & Listing.joins(:address_country) & Country.where(permalink: 'uk')
In this query I'd like it to fetch the Litters where the country (of the associated listing) matches. But it just returns an empty array. (The first query works, and I guess I just need to bolt that on to @cat.litters?)
In Rails C, I'm getting this:
d.litters.joins(:listing) & Listing.joins(:address_country).merge(Country.where(permalink: 'uk'))
Litter Load (0.6ms) SELECT "litters".* FROM "litters" INNER JOIN "listings" ON "listings"."id" = "litters"."listing_id" WHERE "litters"."litterable_id" = 11 AND "litters"."litterable_type" = 'Dog'
Listing Load (0.4ms) SELECT "listings".* FROM "listings" INNER JOIN "countries" ON "countries"."id" = "listings"."address_country_id" WHERE "countries"."permalink" = 'uk'
=> []
Any ideas what I'm doing wrong?
One thing that's definitely wrong is to assume that & is the same as merge. It used to be, but that behavior was removed in fbd917. Now it's just Ruby's array intersection, and that's not what you want.
I am not sure I follow the database schema from the brief description you gave, but just rewriting it to use merge is worth a shot:
@dog.litters.joins(:listing).merge(Listing.joins(:address_country)).merge(Country.where(permalink: 'uk'))
And again, without actually running the code, I would guess that this is equivalent:
@dog.litters.joins(listing: :address_country).where(countries: {permalink: "uk"})
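To see why & produced an empty array in the question, here is a plain-Ruby sketch with hypothetical Structs standing in for loaded Litter and Listing records. Array#& keeps only elements that are equal in both arrays, and a litter is never equal to a listing, so the result is empty no matter what the data looks like. The second helper shows, in memory, the filtering the question is actually after.

```ruby
LitterRow  = Struct.new(:id, :listing_id)
ListingRow = Struct.new(:id, :country_permalink)

LITTERS  = [LitterRow.new(1, 100), LitterRow.new(2, 101)].freeze
LISTINGS = [ListingRow.new(100, "uk"), ListingRow.new(101, "fr")].freeze

# Array#& is set intersection: it keeps elements present in both arrays.
def intersect(litters, listings)
  litters & listings
end

# What the question wants: litters whose associated listing is in a country.
def litters_in_country(litters, listings, permalink)
  listing_ids = listings.select { |l| l.country_permalink == permalink }.map(&:id)
  litters.select { |litter| listing_ids.include?(litter.listing_id) }
end

p intersect(LITTERS, LISTINGS)                          # always empty
p litters_in_country(LITTERS, LISTINGS, "uk").map(&:id) # ids of UK litters
```

This also matches the console output in the question: two separate queries are run, each relation is loaded into an array, and the arrays are then intersected.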

How do I write a named scope to filter by all of an array passed in, and not just by matching one element (using IN)

I have two models, Project and Category, which have a many-to-many relationship between them. The Project model is very simple:
class Project < ActiveRecord::Base
has_and_belongs_to_many :categories
scope :in_categories, lambda { |categories|
joins(:categories).
where("categories.id in (?)", categories.collect(&:to_i))
}
end
The :in_categories scope takes an array of Category IDs (as strings), so using this scope I can get back every project that belongs to at least one of the categories passed in.
But what I'm actually trying to do is filter (a better name would be :has_categories). I want to just get the projects that belong to all of the categories passed in. So if I pass in ["1", "3", "4"] I only want to get the projects that belong to all of the categories.
There are two common solutions in SQL to do what you're describing.
Self-join:
SELECT ...
FROM Projects p
JOIN Categories c1 ON c1.project_id = p.id
JOIN Categories c3 ON c3.project_id = p.id
JOIN Categories c4 ON c4.project_id = p.id
WHERE (c1.id, c3.id, c4.id) = (1, 3, 4);
Note I'm using syntax to compare tuples. This is equivalent to:
WHERE c1.id = 1 AND c3.id = 3 AND c4.id = 4;
In general, the self-join solution has very good performance if you have a covering index. Probably Categories.(project_id,id) would be the right index, but analyze the SQL with EXPLAIN to be sure.
The disadvantage of this method is that you need four joins if you're searching for projects that match four different categories. Five joins for five categories, etc.
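Because the number of joins depends on the number of categories, the self-join query has to be built dynamically. The following is only a sketch of that idea in plain Ruby: the helper name is hypothetical, the table and column names follow the SQL above, and SELECT p.* is an assumption since the original select list is elided. Ids are coerced to integers so that non-numeric input raises instead of being interpolated.

```ruby
# Build the self-join query for an arbitrary list of positive category ids.
def self_join_sql(category_ids)
  ids = category_ids.map { |id| Integer(id.to_s, 10) }.uniq
  raise ArgumentError, "at least one category id is required" if ids.empty?

  # One join and one condition per requested category.
  joins      = ids.map { |id| "JOIN Categories c#{id} ON c#{id}.project_id = p.id" }
  conditions = ids.map { |id| "c#{id}.id = #{id}" }

  (["SELECT p.*", "FROM Projects p"] + joins + ["WHERE #{conditions.join(' AND ')}"]).join("\n")
end

puts self_join_sql(%w[1 3 4])
```

The generated string could then be passed to something like find_by_sql, or the join fragments could be fed to joins one at a time.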
Group-by:
SELECT ...
FROM Projects p
JOIN Categories c ON c.project_id = p.id
WHERE c.id IN (1, 3, 4)
GROUP BY p.id
HAVING COUNT(*) = 3;
If you're using MySQL (I assume you are), most GROUP BY queries invoke a temp table and this kills performance.
I'll leave it as an exercise for you to adapt one of these SQL solutions to equivalent Rails ActiveRecord API.
It seems like in ActiveRecord you would do it like so:
scope :has_categories, lambda { |categories|
ids = categories.collect(&:to_i).uniq
joins(:categories).
where("categories.id in (?)", ids).
group("projects.id").
having("COUNT(DISTINCT categories.id) = ?", ids.size)
}
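The group-by approach can be sanity-checked on in-memory data. The plain-Ruby sketch below mimics WHERE ... IN, GROUP BY and HAVING on hypothetical rows of the projects/categories join table. The helper name and sample data are assumptions, and no database is involved.

```ruby
# Rows of the join table as [project_id, category_id] pairs (sample data).
JOIN_ROWS = [[1, 1], [1, 3], [1, 4], [2, 1], [2, 3], [3, 4]].freeze

# Emulate: WHERE category_id IN (...) GROUP BY project_id
#          HAVING COUNT(DISTINCT category_id) = <number of wanted categories>
def projects_with_all_categories(rows, wanted_ids)
  wanted = wanted_ids.map(&:to_i).uniq
  rows.select { |_project_id, category_id| wanted.include?(category_id) }
      .group_by { |project_id, _category_id| project_id }
      .select { |_project_id, group| group.map(&:last).uniq.size == wanted.size }
      .keys
end

# Only project 1 belongs to categories 1, 3 and 4 in the sample data.
p projects_with_all_categories(JOIN_ROWS, %w[1 3 4])
```

Counting distinct category ids, and de-duplicating the requested ids first, keeps the comparison correct even if the caller passes the same category twice.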