Rails - scope for records that are not in a join table alongside a specific association - sql

I have two models in a Rails app - Tournament and Player associated through a join table:
class Tournament < ApplicationRecord
has_many :tournament_players
has_many :players, through: :tournament_players
end
class Player < ApplicationRecord
has_many :tournament_players
has_many :tournaments, through: :tournament_players
scope :selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: tournament.id}) }
end
I have lots of Tournaments, and each one can have lots of Players. Players can play in lots of Tournaments. The scope
scope :selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: tournament.id}) }
successfully finds all the players already added to a tournament, given that tournament as an argument.
What I'd like is a scope that does the opposite - returns all the players not yet added to a given tournament. I've tried
scope :not_selected, -> (tournament) { includes(:tournaments).where.not(tournaments: {id: tournament.id}) }
but that returns many of the same players, I think because the players exist as part of other tournaments. The SQL for that looks something like:
SELECT "players".*, "tournaments".* FROM "players" LEFT OUTER JOIN
"tournament_players" ON "tournament_players"."player_id" =
"players"."id" LEFT OUTER JOIN "tournaments" ON "tournaments"."id" =
"tournament_players"."tournament_id" WHERE ("tournaments"."id" != $1)
ORDER BY "players"."name" ASC [["id", 22]]
I've also tried the suggestions on this question - using
scope :not_selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: nil}) }
but that doesn't seem to work - it just returns an empty array, again I think because the Players exist in the join table as part of a separate Tournament. The SQL for that looks something like:
SELECT "players".*, "tournaments".* FROM "players" LEFT OUTER JOIN
"tournament_players" ON "tournament_players"."player_id" =
"players"."id" LEFT OUTER JOIN "tournaments" ON "tournaments"."id" =
"tournament_players"."tournament_id" WHERE "tournaments"."id" IS NULL
ORDER BY "players"."name" ASC

What you need to do is:
Make a left join with the reference table, with an additional condition on the tournament ID matching the one that you want to find the not-selected players for
Apply a WHERE clause indicating that there was no JOIN made.
This code should do it:
# player.rb
scope :not_selected, -> (tournament) do
joins("LEFT JOIN tournament_players tp ON players.id = tp.player_id AND tp.tournament_id = #{tournament.id}").where(tp: {tournament_id: nil})
end
If only Rails had a nicer way to write LEFT JOIN queries with additional conditions...
A few notes:
Don't join the actual relation (i.e. Tournament). The extra join slows the query down and is unnecessary, because everything your condition needs is in the join table. Besides, all the rows you're interested in would return NULL data from the tournaments table.
Don't use eager_load. Besides the fact that, to the best of my knowledge, it doesn't support custom conditions, it would instantiate models for all related objects, which you don't need.
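To see why the extra condition belongs in the ON clause, here is a minimal plain-Ruby sketch. It assumes made-up sample data and uses hypothetical helper names (not_selected, naive_not_selected); it only imitates what the two SQL queries do and is not ActiveRecord code:

```ruby
# In-memory stand-ins for the players and tournament_players tables
# (hypothetical sample data, not from the question).
Player = Struct.new(:id, :name)
TournamentPlayer = Struct.new(:player_id, :tournament_id)

PLAYERS = [
  Player.new(1, "Ann"),
  Player.new(2, "Bob"),
  Player.new(3, "Cy")
].freeze

TOURNAMENT_PLAYERS = [
  TournamentPlayer.new(1, 22), # Ann plays in tournament 22
  TournamentPlayer.new(1, 30), # ... and in tournament 30
  TournamentPlayer.new(2, 30)  # Bob plays only in tournament 30
].freeze

# Imitates: LEFT JOIN tournament_players tp
#             ON players.id = tp.player_id AND tp.tournament_id = <id>
#           WHERE tp.tournament_id IS NULL
def not_selected(players, join_rows, tournament_id)
  players.select do |player|
    matches = join_rows.select do |tp|
      tp.player_id == player.id && tp.tournament_id == tournament_id
    end
    matches.empty? # no joined row means the player is not in the tournament
  end
end

# Imitates the question's where.not attempt: the condition is checked after
# every tournament row has been joined, so rows from other tournaments keep
# a player in the result. Players with no rows at all drop out, because
# NULL != 22 is not true in SQL.
def naive_not_selected(players, join_rows, tournament_id)
  players.flat_map do |player|
    join_rows
      .select { |tp| tp.player_id == player.id && tp.tournament_id != tournament_id }
      .map { player }
  end
end

puts not_selected(PLAYERS, TOURNAMENT_PLAYERS, 22).map(&:name).inspect
puts naive_not_selected(PLAYERS, TOURNAMENT_PLAYERS, 22).map(&:name).inspect
```

With this data the first call keeps Bob and Cy, while the naive version returns Ann (through her tournament 30 row) and Bob, and loses Cy entirely.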

ok try this:
includes(:tournaments).distinct.where.not(tournaments: {id: tournament.id})

Related

Rails scope for parent records that do not have particular child records

I have a parent model Effort that has_many split_times:
class Effort
has_many :split_times
end
class SplitTime
belongs_to :effort
belongs_to :split
end
class Split
has_many :split_times
enum kind: [:start, :finish, :intermediate]
end
I need a scope that will return efforts that do not have a start split_time. This seems like it should be possible, but so far I'm not able to do it.
I can return efforts with no split_times with this:
scope :without_split_times, -> { includes(:split_times).where(split_times: {:id => nil}) }
And I can return efforts that have at least one split_time with this:
scope :with_split_times, -> { joins(:split_times).uniq }
Here's my attempt at the scope I want:
scope :without_start_time, -> { joins(split_times: :split).where(split_times: {:id => nil}).where('splits.kind != ?', Split.kinds[:start]) }
But that doesn't work. I need something that will return all efforts that do not have a split_time that has a split with kind: :start even if the efforts have other split_times. I would prefer a Rails solution but can go to raw SQL if necessary. I'm using Postgres if it matters.
You can left join on your criteria (i.e. splits.kind = 'start') which will include nulls (i.e. there was no matching row to join). The difference is that Rails' join will by default give you an inner join (there are matching rows in both tables) but you want a left join as you need to check that there is no matching row on the right table.
With the results of that join you can group by effort and then count the number of matching splits - if it's 0 then there are no matching start splits for that effort!
This might do the trick for you:
scope :without_start_time, -> {
joins("LEFT JOIN split_times ON split_times.effort_id = efforts.id").
joins("LEFT OUTER JOIN splits ON split_times.split_id = splits.id " \
"AND splits.kind = 0").
group("efforts.id").
having("COUNT(splits.id) = 0")
}
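The grouping logic of that scope can be sketched in plain Ruby. This is only an illustration with made-up rows and a hypothetical method name (without_start_time as a free function), assuming start maps to 0 as in the enum above:

```ruby
# Hypothetical in-memory rows standing in for the three tables.
EFFORT_IDS = [1, 2, 3].freeze
START_KIND = 0 # Split.kinds[:start] for the question's enum

SPLITS = { 10 => 0, 11 => 1, 12 => 2 }.freeze # split id => kind
SPLIT_TIMES = [
  { effort_id: 1, split_id: 10 }, # effort 1 has a start time
  { effort_id: 1, split_id: 11 },
  { effort_id: 2, split_id: 11 }  # effort 2 has only a finish time
].freeze                          # effort 3 has no split times at all

# Imitates GROUP BY efforts.id HAVING COUNT(splits.id) = 0, where splits is
# left-joined with the extra condition splits.kind = 0.
def without_start_time(effort_ids, split_times, splits)
  effort_ids.select do |effort_id|
    start_count = split_times.count do |st|
      st[:effort_id] == effort_id && splits[st[:split_id]] == START_KIND
    end
    start_count.zero?
  end
end

puts without_start_time(EFFORT_IDS, SPLIT_TIMES, SPLITS).inspect
```

Efforts 2 and 3 come back: one has split times but no start, the other has no split times at all.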

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
has_many :publications
end
What I want to get is all press_releases that do not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods do not return an ActiveRecord::Relation as a scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
Explanation and response to comments
My initial thought about eager loading overhead was that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications. So that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" # WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
You can do the following in your PressRelease model:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
This will return all PressRelease records without publications.
There are a couple of ways to do this. The first one requires two db queries:
PressRelease.where.not(id: Publication.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use the counter_cache feature. You will need to add a publications_count column to your press_releases table.
class Publication < ActiveRecord::Base
  belongs_to :press_release, counter_cache: true
end
Rails will keep this column in sync with the number of records associated with a given model, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

SQL LEFT JOIN value NOT in either join column

I suspect this is a rather common scenario and may show my ineptitude as a DB developer, but here goes anyway ...
I have two tables: Profiles and HiddenProfiles and the HiddenProfiles table has two relevant foreign keys: profile_id and hidden_profile_id that store ids from the Profiles table.
As you can imagine, a user can hide another user (wherein his profile ID would be the profile_id in the HiddenProfiles table) or he can be hidden by another user (wherein his profile ID would be put in the hidden_profile_id column). Again, a pretty common scenario.
Desired Outcome:
I want to do a join (or to be honest, whatever would be the most efficient query) on the Profiles and HiddenProfiles table to find all the profiles that a given profile is both not hiding AND not hidden from.
In my head I thought it would be pretty straightforward, but the iterations I came up with kept seeming to miss one half of the problem. Finally, I ended up with something that looks like this:
SELECT "profiles".* FROM "profiles"
LEFT JOIN hidden_profiles hp1 on hp1.profile_id = profiles.id and (hp1.hidden_profile_id = 1)
LEFT JOIN hidden_profiles hp2 on hp2.hidden_profile_id = profiles.id and (hp2.profile_id = 1)
WHERE (hp1.hidden_profile_id is null) AND (hp2.profile_id is null)
Don't get me wrong, this "works" but in my heart of hearts I feel like there should be a better way. If in fact there is not, I'm more than happy to accept that answer from someone with more wisdom than myself on the matter. :)
And for what it's worth these are two RoR models sitting on a Postgres DB, so solutions tailored to those constraints are appreciated.
Models are as such:
class Profile < ActiveRecord::Base
...
has_many :hidden_profiles, dependent: :delete_all
scope :not_hidden_to_me, -> (profile) { joins("LEFT JOIN hidden_profiles hp1 on hp1.profile_id = profiles.id and (hp1.hidden_profile_id = #{profile.id})").where("hp1.hidden_profile_id is null") }
scope :not_hidden_by_me, -> (profile) { joins("LEFT JOIN hidden_profiles hp2 on hp2.hidden_profile_id = profiles.id and (hp2.profile_id = #{profile.id})").where("hp2.profile_id is null") }
scope :not_hidden, -> (profile) { self.not_hidden_to_me(profile).not_hidden_by_me(profile) }
...
end
class HiddenProfile < ActiveRecord::Base
belongs_to :profile
belongs_to :hidden_profile, class_name: "Profile"
end
So to get the profiles I want I'm doing the following:
Profile.not_hidden(given_profile)
And again, maybe this is fine, but if there's a better way I'll happily take it.
If you want to get this list just for a single profile, I would implement an instance method to perform effectively the same query in ActiveRecord. The only modification I made is to perform a single join onto a union of subqueries and to apply the conditions on the subqueries. This should reduce the columns that need to be loaded into memory, and hopefully be faster (you'd need to benchmark against your data to be sure):
class Profile < ActiveRecord::Base
def visible_profiles
Profile.joins("LEFT OUTER JOIN (
SELECT profile_id p_id FROM hidden_profiles WHERE hidden_profile_id = #{id}
UNION ALL
SELECT hidden_profile_id p_id FROM hidden_profiles WHERE profile_id = #{id}
) hp ON hp.p_id = profiles.id").where("hp.p_id IS NULL")
end
end
Since this method returns an ActiveRecord scope, you can chain additional conditions if desired:
Profile.find(1).visible_profiles.where("created_at > ?", Time.new(2015,1,1)).order(:name)
Personally I've never liked the join = null approach. I find it counter intuitive. You're asking for a join, and then limiting the results to records that don't match.
I'd approach it more as
SELECT id FROM profiles p
WHERE
  NOT EXISTS
    (SELECT * FROM hidden_profiles hp1
     WHERE hp1.hidden_profile_id = 1 and hp1.profile_id = p.id)
  AND
  NOT EXISTS (SELECT * FROM hidden_profiles hp2
     WHERE hp2.hidden_profile_id = p.id and hp2.profile_id = 1)
But you're going to need to run some EXPLAINs with realistic volumes to be sure which works best.
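Both the double LEFT JOIN and the NOT EXISTS form express the same rule: drop a profile if any hidden_profiles row links it to the given profile in either direction. A minimal plain-Ruby sketch of that rule, using made-up rows and a hypothetical method name (visible_profile_ids):

```ruby
# Hypothetical rows of hidden_profiles as [profile_id, hidden_profile_id].
HIDDEN_PAIRS = [
  [1, 2], # a row linking profile 1 and profile 2
  [3, 1], # a row linking profile 3 and profile 1
  [4, 5]  # unrelated to profile 1
].freeze

ALL_PROFILE_IDS = [1, 2, 3, 4, 5].freeze

# Imitates the two NOT EXISTS subqueries: keep a profile only when no row
# links it to `me` in either column order. Like the SQL in the question,
# this does not exclude `me` itself.
def visible_profile_ids(all_ids, hidden_pairs, me)
  all_ids.reject do |id|
    hidden_pairs.any? do |profile_id, hidden_id|
      (profile_id == id && hidden_id == me) ||
        (hidden_id == id && profile_id == me)
    end
  end
end

puts visible_profile_ids(ALL_PROFILE_IDS, HIDDEN_PAIRS, 1).inspect
```

For profile 1 this keeps 1, 4 and 5; profiles 2 and 3 are dropped because each shares a row with profile 1.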

How to select items from one table based on HABTM relation in another? Postgres "ALL" does not work,

I am trying to retrieve mangas (comics) that have certain categories. For example in the code below, I am trying to search for Adventure(id=29) and Comedy(id=25) mangas. I am using "ALL" operator because I want BOTH categories be in mangas. (i.e return all Manga that have both a category of 25 AND 29 through the relation table, but can also have other categories attached to them)
@search = Manga.find_by_sql("
SELECT m.*
FROM mangas m
JOIN categorizations c ON c.manga_id = m.id AND c.category_id = ALL (array[29,25])
")
Problems? The query is not working as I am expecting (maybe I misunderstand something about ALL operator). I am getting nothing back from the query.
So I tried to change it to
JOIN categorizations c ON c.manga_id = m.id AND c.category_id >= ALL (array[29,25])
I get back mangas whose IDs are GREATER than 29. I am not even getting category #29.
Is there something I am missing here?
Also the query is... VERY slow. I would appreciate it if someone comes with a query that return back what I want.
I am using Ruby on Rails 4.2 and postgresql
Thanks
Update: (posting models relationship)
class Manga < ActiveRecord::Base
has_many :categorizations, :dependent => :destroy
has_many :categories, through: :categorizations
end
class Category < ActiveRecord::Base
has_many :categorizations, :dependent => :destroy
has_many :mangas, through: :categorizations
end
class Categorization < ActiveRecord::Base
belongs_to :manga
belongs_to :category
end
My attempt based on @Beartech's answer:
wheres = categories_array.join(" = ANY (cat_ids) AND ")+" = ANY (cat_ids)"
@m = Manga.find_by_sql("
SELECT mangas.*
FROM
(SELECT manga_id, cat_ids
FROM
(
SELECT c.manga_id, array_agg(c.category_id) cat_ids
FROM categorizations c GROUP BY c.manga_id
)
AS sub_table1 WHERE #{wheres}
)
AS sub_table2
INNER JOIN mangas ON sub_table2.manga_id = mangas.id
")
I'm adding this as a different answer, because I like to have the other one for historic reasons. It gets the job done, but not efficiently, so maybe someone will see where it can be improved. That said...
THE ANSWER IS!!!
It all comes back around to the PostgreSQL functions: ALL is not what you want. You want the "contains" operator, which is @>. You also need some sort of aggregate function, because you want to match each Manga with all of its categories and select only the ones that contain both 25 and 29.
Here is the sql for that:
SELECT mangas.*
FROM
  (SELECT manga_id, cat_ids
   FROM
     (SELECT manga_id, array_agg(category_id) cat_ids
      FROM categorizations GROUP BY manga_id)
     AS sub_table1 WHERE cat_ids @> ARRAY[25,29] )
  AS sub_table2
INNER JOIN mangas
  ON sub_table2.manga_id = mangas.id
;
So you are running a subquery that groups the rows of the join table by manga id and puts each manga's category ids into an array, then keeps only the arrays that contain every requested category. Now you can join that against the mangas table to get the actual manga records.
The ruby looks like:
@search = Manga.find_by_sql("SELECT mangas.* FROM (SELECT manga_id, cat_ids FROM (SELECT manga_id, array_agg(category_id) cat_ids FROM categorizations GROUP BY manga_id) AS sub_table1 WHERE cat_ids @> ARRAY[25,29] ) AS sub_table2 INNER JOIN mangas ON sub_table2.manga_id = mangas.id")
It's fast and clean, doing it all in the native SQL.
You can interpolate variables into the .find_by_sql() text. This gives you an instant search function, since @> is asking if the array of categories contains all of the search terms.
terms = [25,29]
q = %Q(SELECT mangas.* FROM (SELECT manga_id, cat_ids FROM (SELECT manga_id, array_agg(category_id) cat_ids FROM categorizations GROUP BY manga_id) AS sub_table1 WHERE cat_ids @> ARRAY#{terms} ) AS sub_table2 INNER JOIN mangas ON sub_table2.manga_id = mangas.id)
Manga.find_by_sql(q)
Important
I am fairly certain that the above code is in some way insecure. I would assume that you are going to validate the input of the array in some way, i.e.
terms.all? {|term| term.is_a? Integer} ? terms : terms = []
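One way to act on that warning is to coerce every term with Integer() before it is interpolated, so that only integer literals can reach the SQL text. A minimal sketch, assuming the table is named mangas as in the question; the method name manga_containing_sql is made up for illustration, and the query is a slightly flattened variant of the one above:

```ruby
# Builds the "contains all categories" query text for a list of category ids.
# Integer() raises ArgumentError for strings such as "25; DROP TABLE mangas"
# and TypeError for nil, so whatever gets through is a real Integer.
def manga_containing_sql(terms)
  ids = Array(terms).map { |term| Integer(term) }
  raise ArgumentError, "at least one category id is required" if ids.empty?

  <<~SQL
    SELECT mangas.*
    FROM (SELECT manga_id, array_agg(category_id) AS cat_ids
          FROM categorizations GROUP BY manga_id) AS sub_table1
    INNER JOIN mangas ON sub_table1.manga_id = mangas.id
    WHERE sub_table1.cat_ids @> ARRAY[#{ids.join(', ')}]
  SQL
end

puts manga_containing_sql([25, "29"])
```

The resulting string could then be passed to Manga.find_by_sql. Raising on bad input, rather than silently falling back to an empty array, makes a malformed request visible to the caller.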
Third times the charm, right? LOL
OK, totally changing my answer because it seems like this should be SUPER EASY in Rails, but it has stumped the heck out of me...
I am heavily depending on This answer to come up with this. You should put a scope in your Manga model:
class Manga < ActiveRecord::Base
has_many :categorizations, :dependent => :destroy
has_many :categories, through: :categorizations
scope :in_categories, lambda { |*search_categories|
joins(:categories).where(:categorizations => { :category_id => search_categories } )
}
end
Then call it like:
@search = Manga.in_categories(25,29).group_by {|manga| ([25,29] & manga.category_ids) == [25,29]}
This iterates through all of the Manga that have at least one of the two categories, intersects the array [25,29] with the array from manga.category_ids, and checks whether that intersection equals your requested set. This separates out ALL Manga that only have one of the two keys.
@search will now be a hash with two keys, true and false:
{true => [#<Manga id: 9, name: 'Guardians of...
.... multiple manga objects that belong to at least
the two categories requested but not eliminated if
they also belong to a third or fourth category ... ]
false => [ ... any Manga that only belong to ONE of the two
categories requested ... ]
}
Now to get just the unique Mangas that belong to BOTH categories use .uniq:
@search[true].uniq
BOOM!! You have an array of your Manga objects that match BOTH of your categories.
OR
You can simplify it with:
@search = Manga.in_categories(25,29).keep_if {|manga| ([25,29] & manga.category_ids) == [25,29]}
@search.uniq!
I like that a little bit better, it looks cleaner.
AND NOW FOR YOU SQL JUNKIES
@search = Manga.find_by_sql("SELECT mangas.*
FROM categorizations
JOIN mangas ON categorizations.manga_id = mangas.id
WHERE categorizations.category_id IN (25,29)").keep_if {|manga| ([25,29] & manga.category_ids) == [25,29]}
@search.uniq!
* OK OK OK I'll stop after this one. :-) *
Roll it all into the scope in Manga.rb:
scope :in_categories, lambda { |*search_categories|
joins(:categories).where(:categorizations => { :category_id => search_categories } ).uniq!.keep_if {|manga| manga.category_ids.include? search_categories[0] and manga.category_ids.include? search_categories[1]} }
THERE HAS GOT TO BE AN EASIER WAY??? (actually that last one is pretty easy)
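The "has every requested category" check that the answers above implement with array_agg and @>, or with array intersection, can be sketched in plain Ruby. This is only an illustration with made-up rows and a hypothetical method name (manga_ids_in_all), not a replacement for the database query:

```ruby
# Hypothetical categorizations rows as [manga_id, category_id].
CATEGORIZATION_ROWS = [
  [1, 25], [1, 29], [1, 31], # manga 1 has both requested categories
  [2, 25],                   # manga 2 has only one of them
  [3, 29], [3, 25]           # manga 3 has both, stored in another order
].freeze

# Imitates GROUP BY manga_id plus array_agg(category_id) @> ARRAY[...]:
# collect each manga's category ids, then keep the mangas whose ids include
# every requested one.
def manga_ids_in_all(rows, wanted)
  rows
    .group_by { |manga_id, _category_id| manga_id }
    .select { |_manga_id, group| (wanted - group.map(&:last)).empty? }
    .keys
end

puts manga_ids_in_all(CATEGORIZATION_ROWS, [25, 29]).inspect
```

Using array difference instead of comparing an intersection with == keeps the check independent of the order in which the requested ids are listed.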

ActiveRecord: Adding condition to ON clause for includes

I have a model offers and another historical_offers, one offer has_many historical_offers.
Now I would like to eager load the historical_offers of one given day for a set of offers, if it exists. For this, I think I need to pass the day to the ON clause, not the WHERE clause, so that I get all offers, also when there is no historical_offer for the given day.
With
Offer.where(several_complex_conditions).includes(:historical_offers).where("historical_offers.day = ?", Date.today)
I would get
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id
WHERE day = '2012-11-09' AND ...
But I want to have the condition in the ON clause, not in the WHERE clause:
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id AND day = '2012-11-09'
WHERE ...
I guess I could alter the has_many definition with a lambda condition for a specific date, but how would I pass in a date then?
Alternatively I could write the joins myself like this:
Offer.where(several_complex_conditions)
.joins(["historical_offers ON offers.id = historical_offers.offer_id AND day = ?", Date.today])
But how can I hook this up so that eager loading is done?
After a few hours of head-scratching and trying all sorts of ways to accomplish eager loading of a constrained set of associated records, I came across @dbenhur's answer in this thread, which works fine for me - however, the condition isn't something I'm passing in (it's a date relative to Date.today). Basically, it creates an association whose has_many conditions are the ones I wanted to put into the LEFT JOIN's ON clause.
has_many :prices, order: "rate_date"
has_many :future_valid_prices,
class_name: 'Price',
conditions: ['rate_date > ? and rate is not null', Date.today-7.days]
And then in my controller:
@property = current_agent.properties.includes(:future_valid_prices).find_by_id(params[:id])
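The difference between putting the day condition in WHERE and putting it in ON, which is what the question turns on, can be sketched in plain Ruby. The rows below are made up and left_outer_join is a hypothetical helper that only imitates the SQL semantics:

```ruby
require "date"

# Hypothetical rows: offer ids, and historical_offers as [offer_id, day].
OFFER_IDS = [1, 2].freeze
HISTORICAL_OFFERS = [
  [1, Date.new(2012, 11, 9)],
  [1, Date.new(2012, 11, 8)],
  [2, Date.new(2012, 11, 8)] # offer 2 has nothing on the requested day
].freeze

# Left outer join of offers to historical offers. The block plays the role of
# any extra condition placed in the ON clause; an offer with no matching row
# still yields one row with a nil day.
def left_outer_join(offer_ids, history)
  offer_ids.flat_map do |offer_id|
    matches = history.select { |hist_offer_id, day| hist_offer_id == offer_id && yield(day) }
    matches.empty? ? [[offer_id, nil]] : matches.map { |_id, day| [offer_id, day] }
  end
end

day = Date.new(2012, 11, 9)

# Day condition in WHERE: join on the key only, then filter the joined rows.
joined = left_outer_join(OFFER_IDS, HISTORICAL_OFFERS) { true }
where_rows = joined.select { |_offer_id, joined_day| joined_day == day }

# Day condition in ON: offers without a match survive with a nil day.
on_rows = left_outer_join(OFFER_IDS, HISTORICAL_OFFERS) { |joined_day| joined_day == day }

puts where_rows.map(&:first).inspect
puts on_rows.map(&:first).inspect
```

With the condition in WHERE only offer 1 survives; with the condition in ON both offers are returned, and offer 2 simply has no historical offer attached.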