Group entries based on associated model fields in aggregate - sql

I have two models, Keyword and Company, associated using has_and_belongs_to_many.
On the Keyword model, there is a boolean field included. If this is set to true, any Company associated with that Keyword should be considered included, unless it has any associated Keywords with included set to false. included can also be set to nil which I consider a state of "empty".
To summarize:
Company A has 2 associated Keywords, 1 included = true and 1 included = nil. This company is INCLUDED.
Company B has 2 associated Keywords, 1 included = false and 1 included = true. This company is EXCLUDED.
Company C has 2 associated Keywords, both included = nil. This company is EMPTY.
What is the best way to 1) count the number of Included, Excluded, and Empty companies, and 2) query/scope the Company model to Included, Excluded, or Empty?
The current solution I have hacked together is causing expensive queries that are resulting (usually) in request timeouts. Models follow:
class Company < ApplicationRecord
has_and_belongs_to_many :keywords
scope :included, -> { joins(:keywords).merge(Keyword.included).group('id').reorder('') }
scope :excluded, -> { joins(:keywords).merge(Keyword.excluded).group('id').reorder('') }
scope :empty, -> { joins(:keywords).merge(Keyword.empty).group('id').reorder('') }
end
class Keyword < ApplicationRecord
has_and_belongs_to_many :companies
scope :excluded, -> { where(included: false) }
scope :included, -> { where(included: true) }
scope :empty, -> { where(included: nil) }
end
(reorder is included in Company model to resolve a quirk with pg_search gem and grouping)

I always try to avoid joins because they create duplicates that you'd then have to remove with group('id').
I'd rather use EXISTS.
class Company < ApplicationRecord
has_and_belongs_to_many :keywords
scope :included, -> do
where(
Keyword
.included
.joins("INNER JOIN companies_keywords ON companies_keywords.keyword_id = keywords.id AND company_id = companies.id")
.arel.exists
).where.not(
Keyword
.excluded
.joins("INNER JOIN companies_keywords ON companies_keywords.keyword_id = keywords.id AND company_id = companies.id")
.arel.exists
)
end
scope :excluded, -> do
where(
Keyword
.excluded
.joins("INNER JOIN companies_keywords ON companies_keywords.keyword_id = keywords.id AND company_id = companies.id")
.arel.exists
)
end
scope :empty, -> do
where.not(
Keyword
.joins("INNER JOIN companies_keywords ON companies_keywords.keyword_id = keywords.id AND company_id = companies.id")
.where(included: [true, false])
.arel.exists
)
end
end
Also, I recently discovered a gem that simplifies EXISTS queries. I find it particularly helpful.
With it you could just write:
scope :included, -> { where_exists(:keywords, &:included).where_not_exists(:keywords, &:excluded) }
I didn't test the performance of these, but it should be pretty solid (as long as you add the right indexes).
Hope this answers your questions.
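For reference, the classification rule from the question can be written out in plain Ruby, which is handy for checking whichever scopes you end up with against a small data set. This is only a sketch: `company_status` and `status_counts` are hypothetical helper names, and they work on in-memory arrays of `included` values rather than on the database.

```ruby
# Classify one company from the `included` values of its keywords.
# Any false wins; otherwise any true makes it included; otherwise empty.
def company_status(included_values)
  return :excluded if included_values.include?(false)
  return :included if included_values.include?(true)

  :empty
end

# Count companies per status. `companies` maps a company name to the
# `included` values of its keywords (hypothetical sample data shape).
def status_counts(companies)
  counts = Hash.new(0)
  companies.each_value { |values| counts[company_status(values)] += 1 }
  counts
end

# The three companies from the question.
companies = {
  "A" => [true, nil],
  "B" => [false, true],
  "C" => [nil, nil]
}

p status_counts(companies)
```

Note that a company with no keywords at all comes out as empty here, which matches the behaviour of the `empty` scope above.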

Related

Find an existing messaging group when given the potential members

I have an application where users can create messaging groups. MessageGroups have members through MessageMemberships. A MessageMembership belongs to a 'profile', which is polymorphic because there are different types of 'profiles' in the db.
MessageGroup
class MessageGroup < ApplicationRecord
has_many :message_memberships, dependent: :destroy
has_many :coach_profiles, through: :message_memberships, source: :profile, source_type: "CoachProfile"
has_many :parent_profiles, through: :message_memberships, source: :profile, source_type: "ParentProfile"
has_many :customers, through: :message_memberships, source: :profile, source_type: "Customer"
end
MessageMembership
class MessageMembership < ApplicationRecord
belongs_to :message_group
belongs_to :profile, polymorphic: true
end
In my UI, I'd like to be able to first query to see if a messaging group exists with exactly x members so I can use that, rather than creating an entirely new messaging group (similar to how Slack or iMessages will find you an existing thread).
How would you go about querying that?
The code (not tested) below assumes:
You have (or can add) a message_memberships_count counter_cache column to the message_groups table. (and maybe adding an index to the message_memberships_count column to speed up the query)
You have proper unique indexing in the message_memberships table that will prevent a profile from being added to the same message_group multiple times
How it works:
There is a loop that will do multiple inner joins on the same table to ensure that the association exists for each profile
The query will then check that the total number of members in the group is equal to the number of profiles
class MessageGroup < ApplicationRecord
...
def self.for_profiles(profiles)
query = "SELECT \"message_groups\".* FROM \"message_groups\""
profiles.each do |profile|
klass = profile.class.name
# Alias each joined copy of the table to prevent PG::DuplicateAlias errors
table_alias = "message_memberships_#{Digest::SHA1.hexdigest("#{klass}_#{profile.id}")[0..6]}"
query += " INNER JOIN \"message_memberships\" \"#{table_alias}\" ON \"#{table_alias}\".\"message_group_id\" = \"message_groups\".\"id\" AND \"#{table_alias}\".\"profile_type\" = '#{klass}' AND \"#{table_alias}\".\"profile_id\" = #{profile.id}"
end
query += " where \"message_groups\".\"message_memberships_count\" = #{profiles.length}"
self.find_by_sql(query)
end
end
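The matching rule that this query encodes can be checked in plain Ruby. The sketch below is not part of the answer's code: `Member` and `exact_members?` are hypothetical names, and the data is held in memory purely to show why both conditions (every profile joined, and the counts equal) are needed.

```ruby
require "set"

# A membership is identified by its polymorphic type and id.
Member = Struct.new(:profile_type, :profile_id)

# True when `group_members` contains exactly the wanted profiles:
# every wanted profile is present (the role of the INNER JOINs) and the
# sizes agree (the role of the message_memberships_count check).
def exact_members?(group_members, wanted)
  wanted_set = wanted.to_set
  group_set = group_members.to_set
  wanted_set.subset?(group_set) && group_set.size == wanted_set.size
end

coach = Member.new("CoachProfile", 1)
parent = Member.new("ParentProfile", 1)
customer = Member.new("Customer", 7)

p exact_members?([coach, parent], [parent, coach])
p exact_members?([coach, parent, customer], [coach, parent])
```

Without the size check, the second call would also match, because a larger group still contains both requested profiles.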
Based on @AbM's answer I arrived at the following. It makes the same assumptions as the previous answer: the counter cache and unique indexing should be in place.
def self.find_direct_with_profiles!(profiles)
# Not present, some authorization checks that may raise (hence the bang method name)
# Loop through the profiles and join them all together so we get a join that contains
# all the data we need in order to filter it down
join = ""
conditions = ""
profiles.each_with_index do |profile, index|
klass = profile.class.name
# Alias each joined copy of the table to prevent PG::DuplicateAlias errors
table_alias = "message_memberships_#{Digest::SHA1.hexdigest("#{klass}_#{profile.id}")[0..6]}"
join += " INNER JOIN \"message_memberships\" \"#{table_alias}\" ON \"#{table_alias}\".\"message_group_id\" = \"message_groups\".\"id\""
condition_join = index == 0 ? 'where' : ' and'
conditions += "#{condition_join} \"#{table_alias}\".\"profile_type\" = '#{klass}' and \"#{table_alias}\".\"profile_id\" = #{profile.id}"
end
# Also require the member count to match the number of profiles
size_conditional = " and \"message_groups\".\"message_memberships_count\" = #{profiles.size}"
# Add any other conditions you may need
conditions += "#{size_conditional}"
query = "SELECT \"message_groups\".* FROM \"message_groups\" #{join} #{conditions}"
# find_by_sql returns an array with hydrated models from the select statement. In this case I am just grabbing the first one to match other finder active record method conventions
self.find_by_sql(query).first
end

Rails scope for parent records that do not have particular child records

I have a parent model Effort that has_many split_times:
class Effort
has_many :split_times
end
class SplitTime
belongs_to :effort
belongs_to :split
end
class Split
has_many :split_times
enum kind: [:start, :finish, :intermediate]
end
I need a scope that will return efforts that do not have a start split_time. This seems like it should be possible, but so far I'm not able to do it.
I can return efforts with no split_times with this:
scope :without_split_times, -> { includes(:split_times).where(split_times: {:id => nil}) }
And I can return efforts that have at least one split_time with this:
scope :with_split_times, -> { joins(:split_times).uniq }
Here's my attempt at the scope I want:
scope :without_start_time, -> { joins(split_times: :split).where(split_times: {:id => nil}).where('splits.kind != ?', Split.kinds[:start]) }
But that doesn't work. I need something that will return all efforts that do not have a split_time that has a split with kind: :start even if the efforts have other split_times. I would prefer a Rails solution but can go to raw SQL if necessary. I'm using Postgres if it matters.
You can left join on your criteria (i.e. splits.kind = 'start') which will include nulls (i.e. there was no matching row to join). The difference is that Rails' join will by default give you an inner join (there are matching rows in both tables) but you want a left join as you need to check that there is no matching row on the right table.
With the results of that join you can group by effort and then count the number of matching splits. If the count is 0, there are no matching start splits for that effort.
This might do the trick for you:
scope :without_start_time, -> {
joins("LEFT JOIN split_times ON split_times.effort_id = efforts.id").
joins("LEFT OUTER JOIN splits ON split_times.split_id = splits.id " \
"AND splits.kind = 0").
group("efforts.id").
having("COUNT(splits.id) = 0")
}
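To see why counting the joined `splits.id` values works, here is a small in-memory imitation of the LEFT JOIN, GROUP BY and HAVING steps. It is a sketch with hypothetical sample data and a hypothetical helper name (`efforts_without_start`), assuming `:start` maps to 0 as in the enum above.

```ruby
# In-memory stand-ins for the three tables (hypothetical sample data).
START_KIND = 0 # enum value of :start in Split.kinds

efforts     = [{ id: 1 }, { id: 2 }, { id: 3 }]
splits      = [{ id: 10, kind: 0 }, { id: 11, kind: 1 }]
split_times = [
  { effort_id: 1, split_id: 10 }, # effort 1 has a start time
  { effort_id: 1, split_id: 11 },
  { effort_id: 2, split_id: 11 }  # effort 2 only has a finish time
]                                 # effort 3 has no split times at all

# Mirror LEFT JOIN + GROUP BY efforts.id + HAVING COUNT(splits.id) = 0:
# count, per effort, the joined splits whose kind is :start.
def efforts_without_start(efforts, split_times, splits)
  start_ids = splits.select { |s| s[:kind] == START_KIND }.map { |s| s[:id] }
  efforts.select do |effort|
    times = split_times.select { |st| st[:effort_id] == effort[:id] }
    times.count { |st| start_ids.include?(st[:split_id]) }.zero?
  end
end

p efforts_without_start(efforts, split_times, splits).map { |e| e[:id] }
```

Efforts 2 and 3 are returned: one has split times but no start, the other has no split times at all, which is the case an inner join would lose.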

ActiveRecord: Exclude group if at least one record within it doesn't meet condition

I have two models: an owner and a pet. An owner has_many :pets and a pet belongs_to :owner.
What I want to do is grab only those owners that have pets which ALL weigh over 30lbs.
#app/models/owner.rb
class Owner < ActiveRecord::Base
has_many :pets
#return only those owners that have heavy pets
end
#app/models/pet.rb
class Pet < ActiveRecord::Base
belongs_to :owner
scope :heavy, ->{ where(["weight > ?", 30])}
end
Here is what is in my database. I have three owners:
Neil, ALL of whose pets ARE heavy;
John, ALL of whose pets ARE NOT heavy;
Bob, SOME of whose pets ARE heavy and SOME of which ARE NOT heavy.
The query should return only Neil. Right now my attempts return Neil and Bob.
You can group the rows by owner_id and check whether every row in a group matches the condition (equivalently, that no row fails it). This can be done with GROUP BY and HAVING clauses:
scope :heavy, -> { group("owner_id").having(["count(case when weight <= ? then weight end) = 0", 30]) }
There is also another option, more of a Rails/ActiveRecord approach:
scope :heavy, -> { where.not(owner_id: Pet.where(["weight <= ?", 30]).distinct.pluck(:owner_id)).distinct }
Here you get all owner_ids that don't fit condition (searching by contradiction) and exclude them from the result of original query.
Isn't this simply a matter of finding the owners for whom the minimum pet weight is greater than some value:
scope :heavy, -> { group("owners.id").joins(:pets).having("min(pets.weight) > ?", 30)}
Or conversely,
scope :light, -> { group("owners.id").joins(:pets).having("max(pets.weight) <= ?", 30)}
These are scopes on the Owner, by the way, not the Pet
Another approach is to express this as a NOT EXISTS condition on Owner:
Owner.where.not(Pet.where("pets.owner_id = owners.id and pets.weight <= ?", 30).arel.exists)
Subtly different, as it checks for the non-existence of a pet weighing 30 or less, so an owner with no pets will also match this condition.
In database terms, this is going to be the most efficient query for large data sets.
Indexing of pets(owner_id, weight) is recommended for both these approaches.
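The difference between the grouping approach and the non-existence approach is easiest to see on sample data. The following is a plain-Ruby sketch, not ActiveRecord code: `heavy_by_grouping` and `heavy_by_not_exists` are hypothetical names, and "Ann" is an extra owner without pets added for illustration.

```ruby
Pet = Struct.new(:owner, :weight)

pets = [
  Pet.new("Neil", 45), Pet.new("Neil", 60),
  Pet.new("John", 10), Pet.new("John", 20),
  Pet.new("Bob", 50),  Pet.new("Bob", 12)
]
owners = %w[Neil John Bob Ann] # Ann owns no pets

LIMIT = 30

# GROUP BY / HAVING version: only owners that appear in the pets table,
# and none of whose pets weighs LIMIT or less.
def heavy_by_grouping(pets)
  pets.group_by(&:owner)
      .select { |_owner, group| group.count { |pet| pet.weight <= LIMIT }.zero? }
      .keys
end

# NOT EXISTS version: every owner for whom no light pet exists,
# which also matches owners without any pets.
def heavy_by_not_exists(owners, pets)
  owners.reject { |owner| pets.any? { |pet| pet.owner == owner && pet.weight <= LIMIT } }
end

p heavy_by_grouping(pets)
p heavy_by_not_exists(owners, pets)
```

Both versions drop John and Bob; only the second one keeps the pet-less owner, so pick the one whose handling of that case you actually want.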
What if you do it in two steps, first you get all owner_ids that have at least 1 heavy pet, then get all owner_ids that have at least 1 not-heavy pet and then grab the owners where id exists in the first array but not in the second?
Something like:
scope :not_heavy, -> { where('weight <= ?', 30) }
...
owner_ids = Pet.heavy.pluck(:owner_id) - Pet.not_heavy.pluck(:owner_id)
owners_with_all_pets_heavy = Owner.where(id: owner_ids)
You can just add uniq to your scope:
scope :heavy_pets, -> { uniq.joins(:pets).merge(Pet.heavy) }
It works on a database level, using the distinct query.

LEFT OUTER JOIN in Rails 4

I have 3 models:
class Student < ActiveRecord::Base
has_many :student_enrollments, dependent: :destroy
has_many :courses, through: :student_enrollments
end
class Course < ActiveRecord::Base
has_many :student_enrollments, dependent: :destroy
has_many :students, through: :student_enrollments
end
class StudentEnrollment < ActiveRecord::Base
belongs_to :student
belongs_to :course
end
I wish to query for a list of courses in the Courses table, that do not exist in the StudentEnrollments table that are associated with a certain student.
I found that perhaps a LEFT JOIN is the way to go, but it seems that joins() in Rails only accepts a table as an argument.
The SQL query that I think would do what I want is:
SELECT *
FROM Courses c LEFT JOIN StudentEnrollment se ON c.id = se.course_id
WHERE se.id IS NULL AND se.student_id = <SOME_STUDENT_ID_VALUE> and c.active = true
How do I execute this query the Rails 4 way?
Any input is appreciated.
You can pass a string that is the join-sql too. eg joins("LEFT JOIN StudentEnrollment se ON c.id = se.course_id")
Though I'd use rails-standard table naming for clarity:
joins("LEFT JOIN student_enrollments ON courses.id = student_enrollments.course_id")
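One caveat when combining a string LEFT JOIN with the question's conditions: a filter on student_enrollments.student_id placed in the WHERE clause discards the NULL-extended rows, so together with "id IS NULL" it can never match. The student filter has to move into the ON clause. Below is a minimal sketch of building such a fragment; `unenrolled_join_sql` is a hypothetical helper, and the id is coerced with `Integer()` because it is interpolated into SQL.

```ruby
# Build the LEFT JOIN fragment for one student. The student filter sits in
# the ON clause so that courses without a matching enrollment survive the
# join with NULLs on the student_enrollments side.
def unenrolled_join_sql(student_id)
  id = Integer(student_id) # raises on non-numeric input instead of interpolating it
  "LEFT JOIN student_enrollments " \
    "ON student_enrollments.course_id = courses.id " \
    "AND student_enrollments.student_id = #{id}"
end

puts unenrolled_join_sql(42)

# Assumed usage inside the Rails app from the question (not run here):
#   Course.joins(unenrolled_join_sql(student.id))
#         .where(active: true, student_enrollments: { id: nil })
```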
If anyone came here looking for a generic way to do a left outer join in Rails 5, you can use the #left_outer_joins function.
Multi-join example:
Ruby:
Source.
select('sources.id', 'count(metrics.id)').
left_outer_joins(:metrics).
joins(:port).
where('ports.auto_delete = ?', true).
group('sources.id').
having('count(metrics.id) = 0').
all
SQL:
SELECT sources.id, count(metrics.id)
FROM "sources"
INNER JOIN "ports" ON "ports"."id" = "sources"."port_id"
LEFT OUTER JOIN "metrics" ON "metrics"."source_id" = "sources"."id"
WHERE (ports.auto_delete = 't')
GROUP BY sources.id
HAVING (count(metrics.id) = 0)
ORDER BY "sources"."id" ASC
There is actually a "Rails Way" to do this.
You could use Arel, which is what Rails uses to construct queries for ActiveRecord.
I would wrap it in method so that you can call it nicely and pass in whatever argument you would like, something like:
class Course < ActiveRecord::Base
....
def self.left_join_student_enrollments(some_user)
courses = Course.arel_table
student_enrollments = StudentEnrollment.arel_table
enrollments = courses.join(student_enrollments, Arel::Nodes::OuterJoin).
on(courses[:id].eq(student_enrollments[:course_id])).
join_sources
joins(enrollments).where(
student_enrollments: {student_id: some_user.id, id: nil},
active: true
)
end
....
end
There is also the quick (and slightly dirty) way that many use
Course.eager_load(:students).where(
student_enrollments: {student_id: some_user.id, id: nil},
active: true
)
eager_load works great; it just has the "side effect" of loading models into memory that you might not need (as in your case).
Please see Rails ActiveRecord::QueryMethods .eager_load
It does exactly what you are asking in a neat way.
Combining includes and where results in ActiveRecord performing a LEFT OUTER JOIN behind the scenes (without the where this would generate the normal set of two queries).
So you could do something like:
Course.includes(:student_enrollments).where(student_enrollments: { course_id: nil })
Docs here: http://guides.rubyonrails.org/active_record_querying.html#specifying-conditions-on-eager-loaded-associations
Adding to the answer above, to use includes, if you want an OUTER JOIN without referencing the table in the where (like id being nil) or the reference is in a string you can use references. That would look like this:
Course.includes(:student_enrollments).references(:student_enrollments)
or
Course.includes(:student_enrollments).references(:student_enrollments).where('student_enrollments.id IS NULL')
http://api.rubyonrails.org/classes/ActiveRecord/QueryMethods.html#method-i-references
You'd execute the query as:
Course.joins('LEFT JOIN student_enrollments on courses.id = student_enrollments.course_id')
.where(active: true, student_enrollments: { student_id: SOME_VALUE, id: nil })
I know that this is an old question and an old thread but in Rails 5, you could simply do
Course.left_outer_joins(:student_enrollments)
You could use left_joins gem, which backports left_joins method from Rails 5 for Rails 4 and 3.
Course.left_joins(:student_enrollments)
.where('student_enrollments.id' => nil)
I've been struggling with this kind of problem for quite some while, and decided to do something to solve it once and for all. I published a Gist that addresses this issue: https://gist.github.com/nerde/b867cd87d580e97549f2
I created a little AR hack that uses Arel Table to dynamically build the left joins for you, without having to write raw SQL in your code:
class ActiveRecord::Base
# Does a left join through an association. Usage:
#
# Book.left_join(:category)
# # SELECT "books".* FROM "books"
# # LEFT OUTER JOIN "categories"
# # ON "books"."category_id" = "categories"."id"
#
# It also works through association's associations, like `joins` does:
#
# Book.left_join(category: :master_category)
def self.left_join(*columns)
_do_left_join columns.compact.flatten
end
private
def self._do_left_join(column, this = self) # :nodoc:
collection = self
if column.is_a? Array
column.each do |col|
collection = collection._do_left_join(col, this)
end
elsif column.is_a? Hash
column.each do |key, value|
assoc = this.reflect_on_association(key)
raise "#{this} has no association: #{key}." unless assoc
collection = collection._left_join(assoc)
collection = collection._do_left_join value, assoc.klass
end
else
assoc = this.reflect_on_association(column)
raise "#{this} has no association: #{column}." unless assoc
collection = collection._left_join(assoc)
end
collection
end
def self._left_join(assoc) # :nodoc:
source = assoc.active_record.arel_table
pk = assoc.association_primary_key.to_sym
joins source.join(assoc.klass.arel_table,
Arel::Nodes::OuterJoin).on(source[assoc.foreign_key].eq(
assoc.klass.arel_table[pk])).join_sources
end
end
Hope it helps.
See below my original post to this question.
Since then, I have implemented my own .left_joins() for ActiveRecord v4.0.x (sorry, my app is frozen at this version so I've had no need to port it to other versions):
In file app/models/concerns/active_record_extensions.rb, put the following:
module ActiveRecordBaseExtensions
extend ActiveSupport::Concern
def left_joins(*args)
self.class.left_joins(args)
end
module ClassMethods
def left_joins(*args)
all.left_joins(args)
end
end
end
module ActiveRecordRelationExtensions
extend ActiveSupport::Concern
# a #left_joins implementation for Rails 4.0 (WARNING: this uses Rails 4.0 internals
# and so probably only works for Rails 4.0; it'll probably need to be modified if
# upgrading to a new Rails version, and will be obsolete in Rails 5 since it has its
# own #left_joins implementation)
def left_joins(*args)
eager_load(args).construct_relation_for_association_calculations
end
end
ActiveRecord::Base.send(:include, ActiveRecordBaseExtensions)
ActiveRecord::Relation.send(:include, ActiveRecordRelationExtensions)
Now I can use .left_joins() everywhere I'd normally use .joins().
----------------- ORIGINAL POST BELOW -----------------
If you want OUTER JOINs without all the extra eagerly loaded ActiveRecord objects, use .pluck(:id) after .eager_load() to abort the eager load while preserving the OUTER JOIN. Using .pluck(:id) thwarts eager loading because the column name aliases (items.location AS t1_r9, for example) disappear from the generated query when used (these independently named fields are used to instantiate all the eagerly loaded ActiveRecord objects).
A disadvantage of this approach is that you then need to run a second query to pull in the desired ActiveRecord objects identified in the first query:
# first query
idents = Course
.eager_load(:students) # eager load for OUTER JOIN
.where(
student_enrollments: {student_id: some_user.id, id: nil},
active: true
)
.distinct
.pluck(:id) # abort eager loading but preserve OUTER JOIN
# second query
Course.where(id: idents)
It's a join query in Active Record.
@course = Course.joins("LEFT OUTER JOIN student_enrollments
ON student_enrollments.course_id = courses.id").
where("student_enrollments.id IS NULL AND student_enrollments.student_id =
<SOME_STUDENT_ID_VALUE> and courses.active = true")
Use Squeel:
Person.joins{articles.inner}
Person.joins{articles.outer}
If anyone out there still needs true left_outer_joins support in Rails 4.2 then if you install the gem "brick" on Rails 4.2.0 or later it automatically adds the Rails 5.0 implementation of left_outer_joins. You would probably want to turn off the rest of its functionality, that is unless you want an automatic "admin panel" kind of thing available in your app!

Scope with association and ActiveRecord

I have an app that records calls. Each call can have multiple units associated with it. Part of my app has a reports section which basically just does a query on the Call model for different criteria. I've figured out how to write some scopes that do what I want and chain them to the results of my reporting search functionality. But I can't figure out how to search by "unit". Below are relevant excerpts from my code:
Call.rb
has_many :call_units
has_many :units, through: :call_units
#Report search logic
def self.report(search)
search ||= { type: "all" }
# Determine which scope to search by
results = case search[:type]
when "open"
open_status
when "canceled"
cancel
when "closed"
closed
when "waitreturn"
waitreturn
when "wheelchair"
wheelchair
else
scoped
end
#Search results by unit name, this is what I need help with. Scope or express otherwise?
results = results. ??????
results = results.by_service_level(search[:service_level]) if search[:service_level].present?
results = results.from_facility(search[:transferred_from]) if search[:transferred_from].present?
results = results.to_facility(search[:transferred_to]) if search[:transferred_to].present?
# If searching with BOTH a start and end date
if search[:start_date].present? && search[:end_date].present?
results = results.search_between(Date.parse(search[:start_date]), Date.parse(search[:end_date]))
# If search with any other date parameters (including none)
else
results = results.search_by_start_date(Date.parse(search[:start_date])) if search[:start_date].present?
results = results.search_by_end_date(Date.parse(search[:end_date])) if search[:end_date].present?
end
results
end
Since I have an association for units already, I'm not sure if I need to make a scope for units somehow or express the results somehow in the results variable in my search logic.
Basically, you want a scope that uses a join so you can apply a where condition against the associated model? Is that correct?
So in SQL you're looking for something like
select * from calls c
inner join call_units cu on cu.call_id = c.id
inner join units u on u.id = cu.unit_id
where u.unit_name = ?
and the scope would be (from memory, I haven't debugged this) something like:
scope :by_unit_name, lambda {|unit_name|
joins(:units).where('units.unit_name = ?', unit_name)
}
units.name isn't a column in the db. Changing it to units.unit_name didn't raise an exception and seems to be what I want. Here's what I have in my results variable:
results = results.by_unit_name(search[:unit_name]) if search[:unit_name].present?
When I try to search by a different unit name no results show up. Here's the code I'm using to search:
<%= select_tag "search[unit_name]", options_from_collection_for_select(Unit.order(:unit_name), :unit_name, :unit_name, selected: params[:search].try(:[], :unit_name)), prompt: "Any Unit" %>