How to select items from one table based on a HABTM relation in another? Postgres "ALL" does not work

I am trying to retrieve mangas (comics) that have certain categories. For example, in the code below I am trying to search for Adventure (id=29) and Comedy (id=25) mangas. I am using the "ALL" operator because I want BOTH categories to be present on a manga (i.e. return all Manga that have both category 25 AND category 29 through the relation table, but that can also have other categories attached to them).
@search = Manga.find_by_sql("
SELECT m.*
FROM mangas m
JOIN categorizations c ON c.manga_id = m.id AND c.category_id = ALL (array[29,25])
")
Problem: the query does not work as I expect (maybe I misunderstand something about the ALL operator). I get nothing back from it.
So I tried to change it to
JOIN categorizations c ON c.manga_id = m.id AND c.category_id >= ALL (array[29,25])
I get back mangas whose IDs are GREATER than 29. I am not even getting category #29.
Is there something I am missing here?
Also, the query is... VERY slow. I would appreciate it if someone could come up with a query that returns what I want.
I am using Ruby on Rails 4.2 and PostgreSQL.
Thanks
Update: (posting models relationship)
class Manga < ActiveRecord::Base
has_many :categorizations, :dependent => :destroy
has_many :categories, through: :categorizations
end
class Category < ActiveRecord::Base
has_many :categorizations, :dependent => :destroy
has_many :mangas, through: :categorizations
end
class Categorization < ActiveRecord::Base
belongs_to :manga
belongs_to :category
end
My attempt based on @Beartech's answer:
wheres = categories_array.join(" = ANY (cat_ids) AND ")+" = ANY (cat_ids)"
@m = Manga.find_by_sql("
SELECT mangas.*
FROM
(SELECT manga_id, cat_ids
FROM
(
SELECT c.manga_id, array_agg(c.category_id) cat_ids
FROM categorizations c GROUP BY c.manga_id
)
AS sub_table1 WHERE #{wheres}
)
AS sub_table2
INNER JOIN mangas ON sub_table2.manga_id = mangas.id
")

I'm adding this as a different answer, because I like to have the other one for historic reasons. It gets the job done, but not efficiently, so maybe someone will see where it can be improved. That said...
THE ANSWER IS!!!
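It all comes back to the PostgreSQL array operators. ALL is not what you want: it is evaluated once per joined row, so c.category_id = ALL (array[29,25]) asks whether a single category_id equals both 29 and 25, which can never be true (likewise, >= ALL only keeps rows whose category_id is at least 29). You want the "contains" operator, which is @>. You also need an aggregate function (array_agg), because you want to match each Manga with all of its categories and then select only the ones that contain both 25 and 29.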
Here is the sql for that:
SELECT mangas.*
FROM
(SELECT manga_id, cat_ids
FROM
(SELECT manga_id, array_agg(category_id) cat_ids
FROM categorizations GROUP BY manga_id)
AS sub_table1 WHERE cat_ids @> ARRAY[25,29] )
AS sub_table2
INNER JOIN mangas
ON sub_table2.manga_id = mangas.id
;
The inner subquery groups the join table by manga_id and collects each manga's category ids into an array; the WHERE clause then keeps only the groups whose array contains both 25 and 29. Now you can join that against the mangas table to get the actual Manga records.
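To see why the per-row ALL check finds nothing while the aggregated array works, here is a plain-Ruby sketch that mimics both on a few made-up join-table rows. The data and the helper names row_matches_all? and manga_ids_containing are hypothetical and it never touches the database; it only models what GROUP BY, array_agg and PostgreSQL's array containment operator @> compute.

```ruby
require "set"

# Hypothetical rows of the categorizations join table: [manga_id, category_id].
CATEGORIZATIONS = [
  [1, 25], [1, 29], [1, 7], # manga 1: both requested categories plus one more
  [2, 25],                  # manga 2: Comedy only
  [3, 29], [3, 25]          # manga 3: exactly Adventure and Comedy
].freeze

# Row-by-row check, like `c.category_id = ALL (array[29,25])` in the JOIN:
# a single category_id would have to equal every element at once.
def row_matches_all?(category_id, wanted)
  wanted.all? { |w| category_id == w }
end

# Mirrors the subquery: GROUP BY manga_id, array_agg(category_id), then keep
# the groups whose array contains (@>) every wanted id.
def manga_ids_containing(rows, wanted)
  rows.group_by(&:first)
      .transform_values { |pairs| pairs.map(&:last) }
      .select { |_manga_id, cat_ids| Set.new(wanted).subset?(Set.new(cat_ids)) }
      .keys
end

wanted = [25, 29]
row_hits = CATEGORIZATIONS.count { |_manga_id, category_id| row_matches_all?(category_id, wanted) }
puts row_hits                                   # prints 0
p manga_ids_containing(CATEGORIZATIONS, wanted) # prints [1, 3]
```

Manga 1 still matches even though it has an extra category, which is the "can also have other categories" behaviour the question asks for.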
The Ruby looks like:
@search = Manga.find_by_sql("SELECT mangas.* FROM (SELECT manga_id, cat_ids FROM (SELECT manga_id, array_agg(category_id) cat_ids FROM categorizations GROUP BY manga_id) AS sub_table1 WHERE cat_ids @> ARRAY[25,29] ) AS sub_table2 INNER JOIN mangas ON sub_table2.manga_id = mangas.id")
It's fast and clean, doing it all in native SQL.
You can interpolate variables into the .find_by_sql() text. This gives you an instant search function, since @> asks whether the array of categories contains all of the search terms.
terms = [25,29]
q = %Q(SELECT mangas.* FROM (SELECT manga_id, cat_ids FROM (SELECT manga_id, array_agg(category_id) cat_ids FROM categorizations GROUP BY manga_id) AS sub_table1 WHERE cat_ids @> ARRAY#{terms} ) AS sub_table2 INNER JOIN mangas ON sub_table2.manga_id = mangas.id)
Manga.find_by_sql(q)
Important
Interpolating values straight into SQL like this is open to SQL injection, so I would assume you are going to validate the input array first, e.g.:
terms = [] unless terms.all? { |term| term.is_a?(Integer) }
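If the search terms arrive as request parameters they are usually strings, so a check for Integer alone would reject everything. Here is a minimal sketch of a stricter validation step, assuming the ids are plain base-10 integers; sanitize_category_ids and category_array_sql are hypothetical helpers, not Rails APIs. As far as I know, find_by_sql also accepts an array of a SQL string with ? placeholders followed by bind values, which ActiveRecord quotes for you, and that is another way to avoid interpolation.

```ruby
# Hypothetical helper (not part of Rails): coerce every search term to an Integer,
# and reject the whole list if any term is not a plain base-10 integer.
def sanitize_category_ids(terms)
  Array(terms).map { |term| term.is_a?(Integer) ? term : Integer(term, 10) }
rescue ArgumentError, TypeError
  []
end

# Hypothetical helper: only validated integers end up in the SQL text.
def category_array_sql(ids)
  "ARRAY[#{ids.join(',')}]"
end

ids = sanitize_category_ids(["25", 29])
puts category_array_sql(ids) unless ids.empty? # prints ARRAY[25,29]
```

Skip the query entirely when the list comes back empty: PostgreSQL cannot infer a type for an empty ARRAY[] literal and will raise an error.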

Third time's the charm, right? LOL
OK, totally changing my answer because it seems like this should be SUPER EASY in Rails, but it has stumped the heck out of me...
I relied heavily on this answer to come up with this. You should put a scope in your Manga model:
class Manga < ActiveRecord::Base
has_many :categorizations, :dependent => :destroy
has_many :categories, through: :categorizations
scope :in_categories, lambda { |*search_categories|
joins(:categories).where(:categorizations => { :category_id => search_categories } )
}
end
Then call it like:
@search = Manga.in_categories(25,29).group_by {|manga| ([25,29] & manga.category_ids) == [25,29]}
This iterates through all of the Manga that have at least one of the two categories, intersects [25,29] with manga.category_ids, and checks whether that intersection equals your requested set. This weeds out ALL Manga that only have one of the two keys.
@search will now be a hash with two keys, true and false:
{true => [#<Manga id: 9, name: 'Guardians of...
.... multiple manga objects that belong to both
requested categories, kept even if
they also belong to a third or fourth category ... ],
false => [ ... any Manga that only belong to ONE of the two
categories requested ... ]
}
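Now, to get just the unique Manga that belong to BOTH categories, use .uniq (the join returns one row per matching categorization, so a manga in both categories appears twice):
@search[true].uniq
BOOM!! You have an array of your Manga objects that match BOTH of your categories.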
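The ([25,29] & manga.category_ids) == [25,29] check works because Array#& returns elements in the order of its receiver. Here is a small plain-Ruby sketch with made-up category_ids showing that, plus an order-independent alternative using array difference; the candidates hash is hypothetical.

```ruby
# Hypothetical category_ids for manga returned by Manga.in_categories(25, 29).
requested = [25, 29]
candidates = {
  9  => [29, 3, 25], # both requested categories plus an extra one
  10 => [25],        # only one of the two
  11 => [25, 29]     # exactly the two
}

# Array#& keeps the order of its receiver, so `requested & ids` equals
# `requested` exactly when every requested id appears in ids.
both = candidates.select { |_id, ids| (requested & ids) == requested }.keys

# Order-independent alternative: nothing requested is missing from ids.
both_alt = candidates.select { |_id, ids| (requested - ids).empty? }.keys

p both                     # prints [9, 11]
p both_alt                 # prints [9, 11]
p([29, 3, 25] & requested) # prints [29, 25], so swapping the operands breaks the == check
```

The difference-based form does not depend on operand order and works unchanged for any number of requested categories.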
OR
You can simplify it with:
@search = Manga.in_categories(25,29).to_a.keep_if {|manga| ([25,29] & manga.category_ids) == [25,29]}
@search.uniq!
I like that a little bit better, it looks cleaner.
AND NOW FOR YOU SQL JUNKIES
@search = Manga.find_by_sql("SELECT mangas.*
FROM categorizations
JOIN mangas ON categorizations.manga_id = mangas.id
WHERE categorizations.category_id IN (25,29)").keep_if {|manga| ([25,29] & manga.category_ids) == [25,29]}
@search.uniq!
OK OK OK, I'll stop after this one. :-)
Roll it all into the scope in Manga.rb (note that it returns an Array rather than a chainable relation, and it works for any number of categories):
scope :in_categories, lambda { |*search_categories|
joins(:categories).where(:categorizations => { :category_id => search_categories }).distinct.to_a.keep_if { |manga| (search_categories - manga.category_ids).empty? } }
THERE HAS GOT TO BE AN EASIER WAY??? (actually that last one is pretty easy)

Related

Rails and SQL - get related by all elements from array, entries

I have something like this:
duplicates = ['a','b','c','d']
if duplicates.length > 4
Photo.includes(:tags).where('tags.name IN (?)',duplicates)
.references(:tags).limit(15).each do |f|
returned_array.push(f.id)
end
end
duplicates is an array of tags that also appear in other Photos' tags.
What I want is to get the Photos that include all tags from the duplicates array, but right now I get every Photo that includes at least one tag from the array.
THANKS FOR THE ANSWERS:
I tried them and some things started to work, but they weren't entirely clear to me and took some time to execute.
For now I do it by creating arrays, comparing them, taking the duplicates that appear in the array more than X times, and finally ending up with a unique array of photo ids.
If you want to find photos that have all the given tags, you just need to apply a GROUP BY and use HAVING to set a condition on the group:
class Photo < ApplicationRecord
def self.with_tags(*names)
t = Tag.arel_table
joins(:tags)
.where(tags: { name: names })
.group("photos.id")
.having(t[:id].count.eq(names.length)) # COUNT(tags.id) = ?
end
end
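This is somewhat like a WHERE clause, but it applies to the group. Because the WHERE clause has already discarded the tags that are not in the list, photos that carry additional tags still match with .eq; using .gteq (>=) instead only makes a difference if the same tag can be attached to a photo more than once.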
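Here is a plain-Ruby sketch of what WHERE ... IN, GROUP BY photos.id and HAVING COUNT(tags.id) = n compute, using made-up [photo_id, tag_name] rows; ROWS and photo_ids_with_tags are hypothetical. Photo 1, which carries an extra tag "e", still counts as a match because the WHERE step drops "e" before anything is counted.

```ruby
# Hypothetical [photo_id, tag_name] rows from photos JOIN taggings JOIN tags.
ROWS = [
  [1, "a"], [1, "b"], [1, "c"], [1, "d"], [1, "e"], # all four names plus "e"
  [2, "a"], [2, "b"],                               # only two of them
  [3, "a"], [3, "b"], [3, "c"], [3, "d"]            # exactly the four
].freeze

def photo_ids_with_tags(rows, names)
  # WHERE tags.name IN (...)
  matching = rows.select { |_photo_id, name| names.include?(name) }
  # GROUP BY photos.id
  groups = matching.group_by(&:first)
  # HAVING COUNT(tags.id) = names.length
  groups.select { |_photo_id, group| group.size == names.length }.keys
end

p photo_ids_with_tags(ROWS, %w[a b c d]) # prints [1, 3]
```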
A better way to solve this is to use a better domain model that doesn't allow duplicates in the first place:
class Photo < ApplicationRecord
has_many :taggings
has_many :tags, through: :taggings
end
class Tag < ApplicationRecord
has_many :taggings
has_many :photos, through: :taggings
validates :name,
uniqueness: true,
presence: true
end
class Tagging < ApplicationRecord
belongs_to :photo
belongs_to :tag
validates :tag_id,
uniqueness: { scope: :photo_id }
end
By adding a unique index on tags.name and a unique compound index on taggings.tag_id and taggings.photo_id, duplicates cannot be created in the database at all, even if the model validations are bypassed.
The issue as I see it is that you're only doing one join, so each joined row can only check that its tags.name is within the list of duplicates, not that every name in the list is present.
You could solve this in two places:
In the database query
In your application code
For your example the query is something like "find all records in the photos table which also have a relation to a specific set of records in the tags table". So we need to join the photos table to the tags table, and also specify that the only tags we join are those within the duplicate list.
We can use one inner join per tag for this:
select photos.* from photos
inner join tags as d1 on d1.name = 'a' and d1.photo_id = photos.id
inner join tags as d2 on d2.name = 'b' and d2.photo_id = photos.id
inner join tags as d3 on d3.name = 'c' and d3.photo_id = photos.id
inner join tags as d4 on d4.name = 'd' and d4.photo_id = photos.id
In ActiveRecord it seems we can't specify aliases for joins, but we can chain queries, so we can do something like this:
query = Photo
duplicates.each_with_index do |tag, index|
join_name = "d#{index}"
query = query.joins("inner join tags as #{join_name} on #{join_name}.name = #{Photo.connection.quote(tag)} and #{join_name}.photo_id = photos.id")
end
Ugly, but gets the job done. I'm sure there would be a better way using arel instead - but it demonstrates how to construct a SQL query to find all photos that have a relation to all of the duplicate tags.
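Why one join per tag gives "all of the tags": each inner join can only succeed for photos that have that particular tag, so chaining them intersects the per-tag photo sets. A plain-Ruby sketch of that reasoning, assuming (as the SQL above does) a tags table with a photo_id column; TAGS and photo_ids_via_joins are hypothetical.

```ruby
require "set"

# Hypothetical rows of a tags table with a photo_id column: [photo_id, name].
TAGS = [[1, "a"], [1, "b"], [2, "a"], [3, "a"], [3, "b"], [3, "c"]].freeze

# Each `inner join tags as dN on dN.name = ... and dN.photo_id = photos.id`
# keeps only the photos that have that one tag, so one join per name
# narrows the result to the intersection of the per-name photo sets.
def photo_ids_via_joins(tags, names)
  names.map { |name| tags.select { |_photo_id, n| n == name }.map(&:first).to_set }
       .reduce { |acc, ids| acc & ids }
       .to_a
       .sort
end

p photo_ids_via_joins(TAGS, %w[a b]) # prints [1, 3]
```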
The other method is to extend what you have and filter in the application. As you already have the photos that have at least one of the tags, you could just select those that have all of them.
Photo
.includes(:tags)
.joins(:tags)
.where('tags.name IN (?)',duplicates)
.select do |photo|
(duplicates - photo.tags.map(&:name)).empty?
end
(duplicates - photo.tags.map(&:name)).empty? takes the duplicates array and removes all occurrences of any item that is also in the photo tags. If this returns an empty array then we know that the tags in the photo had all the duplicate tags as well.
This could have performance issues if the duplicates array is large, since it could potentially return all photos from the database.

Rails - scope for records that are not in a join table alongside a specific association

I have two models in a Rails app - Tournament and Player associated through a join table:
class Tournament < ApplicationRecord
has_many :tournament_players
has_many :players, through: :tournament_players
end
class Player < ApplicationRecord
has_many :tournament_players
has_many :tournaments, through: :tournament_players
scope :selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: tournament.id}) }
end
I have lots of Tournaments, and each one can have lots of Players. Players can play in lots of Tournaments. The scope
scope :selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: tournament.id}) }
successfully finds all the players already added to a tournament, given that tournament as an argument.
What I'd like is a scope that does the opposite - returns all the players not yet added to a given tournament. I've tried
scope :not_selected, -> (tournament) { includes(:tournaments).where.not(tournaments: {id: tournament.id}) }
but that returns many of the same players, I think because the players exist as part of other tournaments. The SQL for that looks something like:
SELECT "players".*, "tournaments".* FROM "players" LEFT OUTER JOIN
"tournament_players" ON "tournament_players"."player_id" =
"players"."id" LEFT OUTER JOIN "tournaments" ON "tournaments"."id" =
"tournament_players"."tournament_id" WHERE ("tournaments"."id" != $1)
ORDER BY "players"."name" ASC [["id", 22]]
I've also tried the suggestions on this question - using
scope :not_selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: nil}) }
but that doesn't seem to work - it just returns an empty array, again I think because the Players exist in the join table as part of a separate Tournament. The SQL for that looks something like:
SELECT "players".*, "tournaments".* FROM "players" LEFT OUTER JOIN
"tournament_players" ON "tournament_players"."player_id" =
"players"."id" LEFT OUTER JOIN "tournaments" ON "tournaments"."id" =
"tournament_players"."tournament_id" WHERE "tournaments"."id" IS NULL
ORDER BY "players"."name" ASC
What you need to do is:
Make a left join with the reference table, with an additional condition on the tournament ID matching the one that you want to find the not-selected players for
Apply a WHERE clause indicating that there was no JOIN made.
This code should do it:
# player.rb
scope :not_selected, -> (tournament) do
joins("LEFT JOIN tournament_players tp ON players.id = tp.player_id AND tp.tournament_id = #{tournament.id}").where(tp: {tournament_id: nil})
end
If only Rails had a nicer way to write LEFT JOIN queries with additional conditions...
A few notes:
Don't join the actual related table (i.e. tournaments): it can dramatically decrease the performance of your query, and it's unnecessary, because all your conditions can be checked in the join table. Besides, all the rows you're interested in would return NULL data from the tournaments table.
Don't use eager_load. Besides the fact that, to the best of my knowledge, it doesn't support custom join conditions, it would instantiate models for all related objects, which you don't need.
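To make the difference concrete, here is a plain-Ruby sketch on made-up data contrasting the LEFT JOIN ... IS NULL scope above with the where.not attempt from the question. PLAYERS, ENTRIES, not_selected and where_not_players are hypothetical; the sketch only models which players each SQL shape keeps.

```ruby
# Hypothetical data: player ids and tournament_players rows [player_id, tournament_id].
PLAYERS = [1, 2, 3, 4].freeze
ENTRIES = [[1, 22], [1, 23], [2, 23], [3, 22]].freeze

# LEFT JOIN ... AND tp.tournament_id = t WHERE tp.tournament_id IS NULL:
# the tournament filter is part of the join condition, so a player is dropped
# only if a row for this particular tournament exists.
def not_selected(players, entries, tournament_id)
  players.reject do |player_id|
    entries.any? { |pid, tid| pid == player_id && tid == tournament_id }
  end
end

# where.not(tournaments: { id: t }) after a LEFT OUTER JOIN: a player survives
# if any joined row has a different, non-NULL tournament id.
def where_not_players(players, entries, tournament_id)
  players.select do |player_id|
    entries.any? { |pid, tid| pid == player_id && tid != tournament_id }
  end
end

p not_selected(PLAYERS, ENTRIES, 22)      # prints [2, 4]
p where_not_players(PLAYERS, ENTRIES, 22) # prints [1, 2]
```

Player 1 sneaks back in through tournament 23, and player 4, who has no entries at all, is lost because NULL != 22 is not true in SQL. The anti-join avoids both problems.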
OK, try this:
includes(:tournaments).distinct.where.not(tournaments: {id: tournament.id})

SQL LEFT JOIN value NOT in either join column

I suspect this is a rather common scenario and may show my ineptitude as a DB developer, but here goes anyway ...
I have two tables: Profiles and HiddenProfiles and the HiddenProfiles table has two relevant foreign keys: profile_id and hidden_profile_id that store ids from the Profiles table.
As you can imagine, a user can hide another user (wherein his profile ID would be the profile_id in the HiddenProfiles table) or he can be hidden by another user (wherein his profile ID would be put in the hidden_profile_id column). Again, a pretty common scenario.
Desired Outcome:
I want to do a join (or to be honest, whatever would be the most efficient query) on the Profiles and HiddenProfiles table to find all the profiles that a given profile is both not hiding AND not hidden from.
In my head I thought it would be pretty straightforward, but the iterations I came up with kept seeming to miss one half of the problem. Finally, I ended up with something that looks like this:
SELECT "profiles".* FROM "profiles"
LEFT JOIN hidden_profiles hp1 on hp1.profile_id = profiles.id and (hp1.hidden_profile_id = 1)
LEFT JOIN hidden_profiles hp2 on hp2.hidden_profile_id = profiles.id and (hp2.profile_id = 1)
WHERE (hp1.hidden_profile_id is null) AND (hp2.profile_id is null)
Don't get me wrong, this "works" but in my heart of hearts I feel like there should be a better way. If in fact there is not, I'm more than happy to accept that answer from someone with more wisdom than myself on the matter. :)
And for what it's worth these are two RoR models sitting on a Postgres DB, so solutions tailored to those constraints are appreciated.
Models are as such:
class Profile < ActiveRecord::Base
...
has_many :hidden_profiles, dependent: :delete_all
scope :not_hidden_to_me, -> (profile) { joins("LEFT JOIN hidden_profiles hp1 on hp1.profile_id = profiles.id and (hp1.hidden_profile_id = #{profile.id})").where("hp1.hidden_profile_id is null") }
scope :not_hidden_by_me, -> (profile) { joins("LEFT JOIN hidden_profiles hp2 on hp2.hidden_profile_id = profiles.id and (hp2.profile_id = #{profile.id})").where("hp2.profile_id is null") }
scope :not_hidden, -> (profile) { self.not_hidden_to_me(profile).not_hidden_by_me(profile) }
...
end
class HiddenProfile < ActiveRecord::Base
belongs_to :profile
belongs_to :hidden_profile, class_name: "Profile"
end
So to get the profiles I want I'm doing the following:
Profile.not_hidden(given_profile)
And again, maybe this is fine, but if there's a better way I'll happily take it.
If you want to get this list just for a single profile, I would implement an instance method to perform effectively the same query in ActiveRecord. The only modification I made is to perform a single join onto a union of subqueries and to apply the conditions on the subqueries. This should reduce the columns that need to be loaded into memory, and hopefully be faster (you'd need to benchmark against your data to be sure):
class Profile < ActiveRecord::Base
def visible_profiles
Profile.joins("LEFT OUTER JOIN (
SELECT profile_id p_id FROM hidden_profiles WHERE hidden_profile_id = #{id}
UNION ALL
SELECT hidden_profile_id p_id FROM hidden_profiles WHERE profile_id = #{id}
) hp ON hp.p_id = profiles.id").where("hp.p_id IS NULL")
end
end
Since this method returns an ActiveRecord scope, you can chain additional conditions if desired:
Profile.find(1).visible_profiles.where("created_at > ?", Time.new(2015,1,1)).order(:name)
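The union-of-subqueries idea can be modelled in a few lines of plain Ruby: collect the ids that hid the given profile, add the ids it hid, and keep every profile that is in neither list. HIDDEN, PROFILE_IDS and visible_profile_ids are hypothetical, and the sketch only mirrors the logic of visible_profiles, not its SQL.

```ruby
require "set"

# Hypothetical hidden_profiles rows: [profile_id, hidden_profile_id],
# meaning profile_id has hidden hidden_profile_id.
HIDDEN = [[1, 2], [3, 1], [4, 5]].freeze
PROFILE_IDS = [1, 2, 3, 4, 5].freeze

def visible_profile_ids(me, profile_ids, hidden)
  # SELECT profile_id ... WHERE hidden_profile_id = me (profiles that hid me)
  hid_me = hidden.select { |_owner, target| target == me }.map(&:first)
  # SELECT hidden_profile_id ... WHERE profile_id = me (profiles I hid)
  hidden_by_me = hidden.select { |owner, _target| owner == me }.map(&:last)
  # UNION ALL of both lists, then keep profiles with no match (hp.p_id IS NULL).
  excluded = (hid_me + hidden_by_me).to_set
  profile_ids.reject { |id| excluded.include?(id) }
end

p visible_profile_ids(1, PROFILE_IDS, HIDDEN) # prints [1, 4, 5]
```

Like the SQL, this keeps the profile itself (id 1) in the result; add a profiles.id <> id condition if that is not wanted.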
Personally, I've never liked the LEFT JOIN ... IS NULL approach. I find it counterintuitive: you're asking for a join, and then limiting the results to records that don't match.
I'd approach it more as
SELECT id FROM profiles p
WHERE
NOT EXISTS
(SELECT * FROM hidden_profiles hp1
WHERE hp1.hidden_profile_id = 1 and hp1.profile_id = p.id)
AND
NOT EXISTS (SELECT * FROM hidden_profiles hp2
WHERE hp2.hidden_profile_id = p.id and hp2.profile_id = 1)
But you're going to need to run some EXPLAINs on it with realistic data volumes to be sure which one works best.

ActiveRecord .merge not working on two relations

I have the following models in my app:
class Company < ActiveRecord::Base
has_many :gallery_cards, dependent: :destroy
has_many :photos, through: :gallery_cards
has_many :direct_photos, class_name: 'Photo'
end
class Photo < ActiveRecord::Base
belongs_to :gallery_card
belongs_to :company
end
class GalleryCard < ActiveRecord::Base
belongs_to :company
has_many :photos
end
As you can see, Company has_many :photos, through: :gallery_cards and also has_many :direct_photos (class_name: 'Photo'). Photo has both a gallery_card_id and a company_id column.
What I want to be able to do is write a query like @company.photos that returns an ActiveRecord::Relation of all the company's photos. In my Company model, I currently have the method below, but it returns an array of ActiveRecord objects rather than a relation.
def all_photos
photos + direct_photos
end
I've tried using the .merge() method (see below), but that returns an empty relation. I think the reason is that the conditions used to select @company.photos and @company.direct_photos are different. This SO post explains it in more detail.
@company = Company.find(params[:id])
photos = @company.photos
direct_photos = @company.direct_photos
direct_photos.merge(photos) # => []
photos.merge(direct_photos) # => []
I've also tried numerous combinations of .joins and .includes without success.
This might be a candidate for a raw SQL query, but my SQL skills are rather basic.
For what it's worth, I revisited this and came up (with help) with another query that grabs everything in one shot, rather than building an array of ids for a second query. It also includes the other join tables:
Photo.joins("
LEFT OUTER JOIN companies ON companies.id = photos.company_id
LEFT OUTER JOIN gallery_cards ON gallery_cards.id = photos.gallery_card_id
LEFT OUTER JOIN quote_cards ON quote_cards.id = photos.quote_card_id
LEFT OUTER JOIN team_cards ON team_cards.id = photos.team_card_id
LEFT OUTER JOIN who_cards ON who_cards.id = photos.who_card_id
LEFT OUTER JOIN wild_cards ON wild_cards.id = photos.wild_card_id"
).where("photos.company_id = #{id}
OR gallery_cards.company_id = #{id}
OR quote_cards.company_id = #{id}
OR team_cards.company_id = #{id}
OR who_cards.company_id = #{id}
OR wild_cards.company_id = #{id}").uniq
ActiveRecord's merge returns the intersection not the union of the two queries – counterintuitively IMO.
To find the union, you need to use OR, for which ActiveRecord has poor built-in support. So I think you're correct that it's best to write the conditions in SQL:
def all_photos
Photo.joins("LEFT OUTER JOIN gallery_cards ON gallery_cards.id = photos.gallery_card_id")
.where("photos.company_id = :id OR gallery_cards.company_id = :id", id: id)
end
ETA The query associates the gallery_cards to photos with a LEFT OUTER JOIN, which preserves those photo rows without associated gallery card rows. You can then query based on either photos columns or on associated gallery_cards columns – in this case, company_id from either table.
You can leverage ActiveRecord scope chaining to join and query from additional tables:
def all_photos
Photo.joins("LEFT OUTER JOIN gallery_cards ON gallery_cards.id = photos.gallery_card_id")
.joins("LEFT OUTER JOIN quote_cards ON quote_cards.id = photos.quote_card_id")
.where("photos.company_id = :id OR gallery_cards.company_id = :id OR quote_cards.company_id = :id", id: id)
end
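A plain-Ruby sketch of what the LEFT OUTER JOIN plus OR computes, next to an AND of the same two conditions (the intersection-style result the answer attributes to merge). The data and the helpers card_company_id, all_photo_ids and both_conditions_ids are hypothetical and only model the row logic.

```ruby
# Hypothetical data. photos: id => [company_id, gallery_card_id];
# gallery_cards: id => company_id.
PHOTOS = {
  1 => [7, nil],  # direct photo of company 7, no gallery card
  2 => [nil, 10], # photo on gallery card 10, which belongs to company 7
  3 => [8, nil],  # direct photo of company 8
  4 => [nil, 11]  # photo on gallery card 11, which belongs to company 8
}.freeze
GALLERY_CARDS = { 10 => 7, 11 => 8 }.freeze

# LEFT OUTER JOIN: every photo survives; photos without a card get a nil card company.
def card_company_id(card_id)
  card_id && GALLERY_CARDS[card_id]
end

# WHERE photos.company_id = :id OR gallery_cards.company_id = :id
def all_photo_ids(company_id)
  PHOTOS.select do |_id, attrs|
    photo_company_id, card_id = attrs
    photo_company_id == company_id || card_company_id(card_id) == company_id
  end.keys
end

# Both conditions at once (an AND): a photo would have to match both ways.
def both_conditions_ids(company_id)
  PHOTOS.select do |_id, attrs|
    photo_company_id, card_id = attrs
    photo_company_id == company_id && card_company_id(card_id) == company_id
  end.keys
end

p all_photo_ids(7)       # prints [1, 2]
p both_conditions_ids(7) # prints []
```

With an inner join instead, photo 1 (no gallery card) would disappear before the OR is even evaluated, which is why the outer join matters here.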

How to select only unique entries from a join table

I would like to make my block of code more efficient. I have two models and a join table for them. They both have a has_many :through relationship. Some parts belong to multiple groups, some only belong to one. I need to get the records that belong to only one group and in the most efficient manner as there are thousands of parts. Here are my models:
part.rb
class Part < ActiveRecord::Base
attr_accessible :name,
:group_ids
has_many :part_groups, dependent: :destroy
has_many :groups, through: :part_groups, select: 'groups.*, part_groups.*'
end
group.rb
class Group < ActiveRecord::Base
attr_accessible :name,
:part_ids
has_many :part_groups, dependent: :destroy
has_many :parts, through: :part_groups, select: 'parts.*, part_groups.*'
end
part_group.rb
class PartGroup < ActiveRecord::Base
attr_accessible :part_id,
:group_id
belongs_to :part
belongs_to :group
end
What I would like to be able to do is get all the parts that belong only to Group A, and all the parts that belong only to Group B, but not the ones that belong to both A & B. After struggling with this for hours and getting nowhere, I'm using this as a stopgap:
@groupA = []
@groupB = []
Part.all.each do |part|
if part.group_ids.length == 1
if part.group_ids.first == 1
@groupA.push(part)
elsif part.group_ids.first == 2
@groupB.push(part)
end
end
end
This obviously isn't scalable as there will be many groups. I've tried various methods of join and include that I've been googling but so far nothing has worked.
I am also new to Rails, so as far as I understand, this is the structure of your tables:
parts
Id | Name
groups
Id | Name
part_groups
Id | part_id | group_id
So you can do the following,
Group.find(1).parts # Parts that belong to group A
Group.find(2).parts # Parts that belong to group B
So this may also return parts that belong to other groups.
The objective is to get the parts that belong only to group A, and the parts that belong only to group B.
Try:
Group.find(1).parts.select { |row| row.groups.count == 1 }
I think this is a better approach than yours, because it only traverses the parts that belong to group 1.
The raw sql for this could look like
select parts.* from parts
inner join part_groups on parts.id = part_groups.part_id
left outer join part_groups as group_b on group_b.part_id = parts.id and group_b.group_id = 456
where group_b.id is null and part_groups.group_id = 123
Assuming that group A has id 123 and group B has id 456.
What this does is try to join the part_groups table twice (so an alias needs to be used the second time), once where group_id matches group A and once against group B. The use of the left join allows us to require that the second join (against B) produces no rows.
ActiveRecord doesn't provide much assistance for this, other than allowing you to pass arbitrary SQL fragments to joins, so you end up with something like
Part.select("parts.*")
.joins(:part_groups)
.joins("left outer join part_groups as group_b on group_b.group_id = #{groupb.id} and group_b.part_id = parts.id")
.where(:part_groups => {:group_id => groupa.id}).where("group_b.id is null")
Arel (which underpins the query generation part of Active Record) can generate this sort of query, but this isn't exposed directly.
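A plain-Ruby sketch of what the double join on part_groups computes, using the same ids as the SQL (123 for group A, 456 for group B) and made-up rows; PART_GROUPS and part_ids_in_a_not_b are hypothetical.

```ruby
# Hypothetical part_groups rows: [part_id, group_id]. Group A is 123 and
# group B is 456, as in the SQL above; 789 is some other group.
PART_GROUPS = [[1, 123], [2, 123], [2, 456], [3, 456], [4, 123], [4, 789]].freeze

def part_ids_in_a_not_b(rows, group_a, group_b)
  # Parts the LEFT OUTER JOIN against group B would find a row for.
  in_b = rows.select { |_part_id, group_id| group_id == group_b }.map(&:first)
  # INNER JOIN on group A, then WHERE group_b.id IS NULL.
  in_a = rows.select { |_part_id, group_id| group_id == group_a }.map(&:first)
  in_a.reject { |part_id| in_b.include?(part_id) }
end

p part_ids_in_a_not_b(PART_GROUPS, 123, 456) # prints [1, 4]
```

Part 4 is returned even though it also belongs to group 789, because this query only excludes group B. If "only in A" has to hold across many groups, you would instead count the groups per part and keep those with a count of 1.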