Rails 3 user matching-algorithm to SQL Query (COMPLICATED) - sql

I'm currently working on an app that matches users based on answered questions.
I realized my algorithm in normal RoR and ActiveRecord queries but it's waaay to slow to use it. To match one user with 100 other users takes
Completed 200 OK in 17741ms (Views: 106.1ms | ActiveRecord: 1078.6ms)
on my local machine. But still...
I now want to realize this in raw SQL in order to gain some more performance. But I'm really having trouble getting my head around SQL queries inside of SQL queries and stuff like this plus calculations etc. My head is about to explode and I don't even know where to start.
Here's my algorithm:
def match(user)
#a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
#b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
if self.common_questions(user) == []
0.to_f
else
match = Math.sqrt(#a_score * #b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end
def possible_score(user)
i = 0
self.user_questions.select("question_id, importance").find_each do |n|
if user.user_questions.select(:id).find_by_question_id(n.question_id)
i += Importance.find_by_id(n.importance).value
end
end
return i
end
def actual_score(user)
i = 0
self.user_questions.select("question_id, importance").includes(:accepted_answers).find_each do |n|
#user_answer = user.user_questions.select("answer_id").find_by_question_id(n.question_id)
unless #user_answer == nil
if n.accepted_answers.select(:answer_id).find_by_answer_id(#user_answer.answer_id)
i += Importance.find_by_id(n.importance).value
end
end
end
return i
end
So basically a user answers a questions, picks what answers he accepts and how important that question is to him. The algorithm then checks what questions 2 users have in common, if user1 gave an answer user2 accepts, if yes then the importance user2 gave for each question is added which makes up the score user1 made. Also the other way around for user2. Divided by the possible score gives the percentage and both percentages applied to the geometric mean gives me one total match percentage for both users. Fairly complicated I know. Tell if I didn't explain it good enough. I just hope I can express this in raw SQL. Performance is everything in this.
Here are my database tables:
CREATE TABLE "users" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "username" varchar(255) DEFAULT '' NOT NULL); (left some unimportant stuff out, it's all there in the databse dump i uploaded)
CREATE TABLE "user_questions" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_id" integer, "question_id" integer, "answer_id" integer(255), "importance" integer, "explanation" text, "private" boolean DEFAULT 'f', "created_at" datetime);
CREATE TABLE "accepted_answers" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_question_id" integer, "answer_id" integer);
I guess the top of the SQL query has to look something like this?
SELECT u1.id AS user1, u2.id AS user2, COALESCE(SQRT( (100.0*actual_score/possible_score) * (100.0*actual_score/possible_score) ), 0) AS match
FROM
But since I'm not an SQL master and can only do the usual stuff my head is about to explode.
I hope someone can help me figure this out. Or atleast improve my performance somehow! Thanks so much!
EDIT:
So based on Wizard's answer I've managed to get a nice SQL statement for "possible_score"
SELECT SUM(value) AS sum_id
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
WHERE uq1.user_id = 1
I've tried to get the "actual_score" with this but it didn't work. My database manager crashed when I executed this.
SELECT SUM(imp.value) AS sum_id
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
INNER JOIN accepted_answers as ON as.user_question_id = uq1.id AND as.answer_id = uq2.answer_id
WHERE uq1.user_id = 1
EDIT2
Okay I'm an idiot! I can't use "as" as an alias of course. Changed it to aa and it worked! W00T!

I know you were thinking about moving to a SQL solution, but there are some major performance improvements which can be made to your Ruby code which might eliminate the need to use hand-coded SQL. When optimizing your code it is often worth using a profiler to make sure you really know which parts are the problem. In your example I think some big improvements can be made by removing iterative code and database queries which are executed during each iteration!
Also, if you are using a recent version of ActiveRecord you can generate queries with subselects without the need to code any SQL. Of course it is important that you have proper indexes created for your database.
I'm making a lot of assumptions about your models and relationships based on what I can infer from your code. If I'm wrong let me know and I'll try to make some adjustments accordingly.
def match(user)
if self.common_questions(user) == []
0.to_f
else
# Move a_score and b_score calculation inside this conditional branch since it is otherwise not needed.
#a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
#b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
match = Math.sqrt(#a_score * #b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end
def possible_score(user)
# If user_questions.importance contains ID values of importances, then you should set up a relation between UserQuestion and Importance.
# I.e. UserQuestion belongs_to :importance, and Importance has_many :user_questions.
# I'm assuming that user_questions represents join models between users and questions.
# I.e. User has_many :user_questions, and User has_many :questions, :through => :user_questions.
# Question has_many :user_questions, and Question has_many :users, :through => :user_questions
# From your code this seems like the logical setup. Let me know if my assumption is wrong.
self.user_questions.
joins(:importance). # Requires the relation between UserQuestion and Importance I described above
where(:question_id => Question.joins(:user_questions).where(:user_id => user.id)). # This should create a where clause with a subselect with recent versions of ActiveRecord
sum(:value) # I'm also assuming that the importances table has a `value` column.
end
def actual_score(user)
user_questions.
joins(:importance, :accepted_answers). # It looks like accepted_answers indicates an answers table
where(:answer_id => Answer.joins(:user_questions).where(:user_id => user.id)).
sum(:value)
end
UserQuestion seems to be a super join model between User, Question, Answer and Importance. Here are the model relations relevant to the code (not including the has_many :through relations you could create). I think you probably have these already:
# User
has_many :user_questions
# UserQuestion
belongs_to :user
belongs_to :question
belongs_to :importance, :foreign_key => :importance # Maybe rename the column `importance` to `importance_id`
belongs_to :answer
# Question
has_many :user_questions
# Importance
has_many :user_questions
# Answer
has_many :user_questions

So here's my new match function. I couldn't put everything in one query yet, because SQLite doesn't support math functions. But as soon as I switch to MySQL I will put everything in just one query. All this already gave me a HUGE performance boost of:
Completed 200 OK in 528ms (Views: 116.5ms | ActiveRecord: 214.0ms)
to match one user with 100 other users. Quite good! I'll have to see how good it performs once I fill my database with 10k fake users. And extra cudos to "Wizard of Ogz" for pointing out my inefficient code!
EDIT:
tried it with only 1000 users, 10 to 100 UserQuestions each, and ...
Completed 200 OK in 104871ms (Views: 2146.0ms | ActiveRecord: 93780.5ms)
... boy did that take long! I will have to think of something to tackle this problem.
def match(user)
if self.common_questions(user) == []
0.to_f
else
#a_score = UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match
FROM (SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ?
INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id
WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ?
WHERE uq1.user_id = ?) AS ps1",user.id, self.id, user.id, self.id]).collect(&:match).first.to_f
#b_score = UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match
FROM (SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ?
INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id
WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ?
WHERE uq1.user_id = ?) AS ps1",self.id, user.id, self.id, user.id]).collect(&:match).first.to_f
match = Math.sqrt(#a_score * #b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end

Related

SQL LEFT JOIN value NOT in either join column

I suspect this is a rather common scenario and may show my ineptitude as a DB developer, but here goes anyway ...
I have two tables: Profiles and HiddenProfiles and the HiddenProfiles table has two relevant foreign keys: profile_id and hidden_profile_id that store ids from the Profiles table.
As you can imagine, a user can hide another user (wherein his profile ID would be the profile_id in the HiddenProfiles table) or he can be hidden by another user (wherein his profile ID would be put in the hidden_profile_id column). Again, a pretty common scenario.
Desired Outcome:
I want to do a join (or to be honest, whatever would be the most efficient query) on the Profiles and HiddenProfiles table to find all the profiles that a given profile is both not hiding AND not hidden from.
In my head I thought it would be pretty straightforward, but the iterations I came up with kept seeming to miss one half of the problem. Finally, I ended up with something that looks like this:
SELECT "profiles".* FROM "profiles"
LEFT JOIN hidden_profiles hp1 on hp1.profile_id = profiles.id and (hp1.hidden_profile_id = 1)
LEFT JOIN hidden_profiles hp2 on hp2.hidden_profile_id = profiles.id and (hp2.profile_id = 1)
WHERE (hp1.hidden_profile_id is null) AND (hp2.profile_id is null)
Don't get me wrong, this "works" but in my heart of hearts I feel like there should be a better way. If in fact there is not, I'm more than happy to accept that answer from someone with more wisdom than myself on the matter. :)
And for what it's worth these are two RoR models sitting on a Postgres DB, so solutions tailored to those constraints are appreciated.
Models are as such:
class Profile < ActiveRecord::Base
...
has_many :hidden_profiles, dependent: :delete_all
scope :not_hidden_to_me, -> (profile) { joins("LEFT JOIN hidden_profiles hp1 on hp1.profile_id = profiles.id and (hp1.hidden_profile_id = #{profile.id})").where("hp1.hidden_profile_id is null") }
scope :not_hidden_by_me, -> (profile) { joins("LEFT JOIN hidden_profiles hp2 on hp2.hidden_profile_id = profiles.id and (hp2.profile_id = #{profile.id})").where("hp2.profile_id is null") }
scope :not_hidden, -> (profile) { self.not_hidden_to_me(profile).not_hidden_by_me(profile) }
...
end
class HiddenProfile < ActiveRecord::Base
belongs_to :profile
belongs_to :hidden_profile, class_name: "Profile"
end
So to get the profiles I want I'm doing the following:
Profile.not_hidden(given_profile)
And again, maybe this is fine, but if there's a better way I'll happily take it.
If you want to get this list just for a single profile, I would implement an instance method to perform effectively the same query in ActiveRecord. The only modification I made is to perform a single join onto a union of subqueries and to apply the conditions on the subqueries. This should reduce the columns that need to be loaded into memory, and hopefully be faster (you'd need to benchmark against your data to be sure):
class Profile < ActiveRecord::Base
def visible_profiles
Profile.joins("LEFT OUTER JOIN (
SELECT profile_id p_id FROM hidden_profiles WHERE hidden_profile_id = #{id}
UNION ALL
SELECT hidden_profile_id p_id FROM hidden_profiles WHERE profile_id = #{id}
) hp ON hp.p_id = profiles.id").where("hp.p_id IS NULL")
end
end
Since this method returns an ActiveRecord scope, you can chain additional conditions if desired:
Profile.find(1).visible_profiles.where("created_at > ?", Time.new(2015,1,1)).order(:name)
Personally I've never liked the join = null approach. I find it counter intuitive. You're asking for a join, and then limiting the results to records that don't match.
I'd approach it more as
SELECT id FROM profiles p
WHERE
NOT EXISTS
(SELECT * FROM hidden_profiles hp1
WHERE hp1.hidden_profile_id = 1 and hp1.profile_id = p.profile_id)
AND
NOT EXISTS (SELECT * FROM hidden_profiles hp2
WHERE hp2.hidden_profile_id = p.profile_id and hp2.profile_id = 1)
But you're going to need to run it some EXPLAINs with realistic volumes to be sure of which works best.

LEFT OUTER JOIN in Rails 4

I have 3 models:
class Student < ActiveRecord::Base
has_many :student_enrollments, dependent: :destroy
has_many :courses, through: :student_enrollments
end
class Course < ActiveRecord::Base
has_many :student_enrollments, dependent: :destroy
has_many :students, through: :student_enrollments
end
class StudentEnrollment < ActiveRecord::Base
belongs_to :student
belongs_to :course
end
I wish to query for a list of courses in the Courses table, that do not exist in the StudentEnrollments table that are associated with a certain student.
I found that perhaps Left Join is the way to go, but it seems that joins() in rails only accept a table as argument.
The SQL query that I think would do what I want is:
SELECT *
FROM Courses c LEFT JOIN StudentEnrollment se ON c.id = se.course_id
WHERE se.id IS NULL AND se.student_id = <SOME_STUDENT_ID_VALUE> and c.active = true
How do I execute this query the Rails 4 way?
Any input is appreciated.
You can pass a string that is the join-sql too. eg joins("LEFT JOIN StudentEnrollment se ON c.id = se.course_id")
Though I'd use rails-standard table naming for clarity:
joins("LEFT JOIN student_enrollments ON courses.id = student_enrollments.course_id")
If anyone came here looking for a generic way to do a left outer join in Rails 5, you can use the #left_outer_joins function.
Multi-join example:
Ruby:
Source.
select('sources.id', 'count(metrics.id)').
left_outer_joins(:metrics).
joins(:port).
where('ports.auto_delete = ?', true).
group('sources.id').
having('count(metrics.id) = 0').
all
SQL:
SELECT sources.id, count(metrics.id)
FROM "sources"
INNER JOIN "ports" ON "ports"."id" = "sources"."port_id"
LEFT OUTER JOIN "metrics" ON "metrics"."source_id" = "sources"."id"
WHERE (ports.auto_delete = 't')
GROUP BY sources.id
HAVING (count(metrics.id) = 0)
ORDER BY "sources"."id" ASC
There is actually a "Rails Way" to do this.
You could use Arel, which is what Rails uses to construct queries for ActiveRecrods
I would wrap it in method so that you can call it nicely and pass in whatever argument you would like, something like:
class Course < ActiveRecord::Base
....
def left_join_student_enrollments(some_user)
courses = Course.arel_table
student_entrollments = StudentEnrollment.arel_table
enrollments = courses.join(student_enrollments, Arel::Nodes::OuterJoin).
on(courses[:id].eq(student_enrollments[:course_id])).
join_sources
joins(enrollments).where(
student_enrollments: {student_id: some_user.id, id: nil},
active: true
)
end
....
end
There is also the quick (and slightly dirty) way that many use
Course.eager_load(:students).where(
student_enrollments: {student_id: some_user.id, id: nil},
active: true
)
eager_load works great, it just has the "side effect" of loding models in memory that you might not need (like in your case)
Please see Rails ActiveRecord::QueryMethods .eager_load
It does exactly what you are asking in a neat way.
Combining includes and where results in ActiveRecord performing a LEFT OUTER JOIN behind the scenes (without the where this would generate the normal set of two queries).
So you could do something like:
Course.includes(:student_enrollments).where(student_enrollments: { course_id: nil })
Docs here: http://guides.rubyonrails.org/active_record_querying.html#specifying-conditions-on-eager-loaded-associations
Adding to the answer above, to use includes, if you want an OUTER JOIN without referencing the table in the where (like id being nil) or the reference is in a string you can use references. That would look like this:
Course.includes(:student_enrollments).references(:student_enrollments)
or
Course.includes(:student_enrollments).references(:student_enrollments).where('student_enrollments.id = ?', nil)
http://api.rubyonrails.org/classes/ActiveRecord/QueryMethods.html#method-i-references
You'd execute the query as:
Course.joins('LEFT JOIN student_enrollment on courses.id = student_enrollment.course_id')
.where(active: true, student_enrollments: { student_id: SOME_VALUE, id: nil })
I know that this is an old question and an old thread but in Rails 5, you could simply do
Course.left_outer_joins(:student_enrollments)
You could use left_joins gem, which backports left_joins method from Rails 5 for Rails 4 and 3.
Course.left_joins(:student_enrollments)
.where('student_enrollments.id' => nil)
I've been struggling with this kind of problem for quite some while, and decided to do something to solve it once and for all. I published a Gist that addresses this issue: https://gist.github.com/nerde/b867cd87d580e97549f2
I created a little AR hack that uses Arel Table to dynamically build the left joins for you, without having to write raw SQL in your code:
class ActiveRecord::Base
# Does a left join through an association. Usage:
#
# Book.left_join(:category)
# # SELECT "books".* FROM "books"
# # LEFT OUTER JOIN "categories"
# # ON "books"."category_id" = "categories"."id"
#
# It also works through association's associations, like `joins` does:
#
# Book.left_join(category: :master_category)
def self.left_join(*columns)
_do_left_join columns.compact.flatten
end
private
def self._do_left_join(column, this = self) # :nodoc:
collection = self
if column.is_a? Array
column.each do |col|
collection = collection._do_left_join(col, this)
end
elsif column.is_a? Hash
column.each do |key, value|
assoc = this.reflect_on_association(key)
raise "#{this} has no association: #{key}." unless assoc
collection = collection._left_join(assoc)
collection = collection._do_left_join value, assoc.klass
end
else
assoc = this.reflect_on_association(column)
raise "#{this} has no association: #{column}." unless assoc
collection = collection._left_join(assoc)
end
collection
end
def self._left_join(assoc) # :nodoc:
source = assoc.active_record.arel_table
pk = assoc.association_primary_key.to_sym
joins source.join(assoc.klass.arel_table,
Arel::Nodes::OuterJoin).on(source[assoc.foreign_key].eq(
assoc.klass.arel_table[pk])).join_sources
end
end
Hope it helps.
See below my original post to this question.
Since then, I have implemented my own .left_joins() for ActiveRecord v4.0.x (sorry, my app is frozen at this version so I've had no need to port it to other versions):
In file app/models/concerns/active_record_extensions.rb, put the following:
module ActiveRecordBaseExtensions
extend ActiveSupport::Concern
def left_joins(*args)
self.class.left_joins(args)
end
module ClassMethods
def left_joins(*args)
all.left_joins(args)
end
end
end
module ActiveRecordRelationExtensions
extend ActiveSupport::Concern
# a #left_joins implementation for Rails 4.0 (WARNING: this uses Rails 4.0 internals
# and so probably only works for Rails 4.0; it'll probably need to be modified if
# upgrading to a new Rails version, and will be obsolete in Rails 5 since it has its
# own #left_joins implementation)
def left_joins(*args)
eager_load(args).construct_relation_for_association_calculations
end
end
ActiveRecord::Base.send(:include, ActiveRecordBaseExtensions)
ActiveRecord::Relation.send(:include, ActiveRecordRelationExtensions)
Now I can use .left_joins() everywhere I'd normally use .joins().
----------------- ORIGINAL POST BELOW -----------------
If you want OUTER JOINs without all the extra eagerly loaded ActiveRecord objects, use .pluck(:id) after .eager_load() to abort the eager load while preserving the OUTER JOIN. Using .pluck(:id) thwarts eager loading because the column name aliases (items.location AS t1_r9, for example) disappear from the generated query when used (these independently named fields are used to instantiate all the eagerly loaded ActiveRecord objects).
A disadvantage of this approach is that you then need to run a second query to pull in the desired ActiveRecord objects identified in the first query:
# first query
idents = Course
.eager_load(:students) # eager load for OUTER JOIN
.where(
student_enrollments: {student_id: some_user.id, id: nil},
active: true
)
.distinct
.pluck(:id) # abort eager loading but preserve OUTER JOIN
# second query
Course.where(id: idents)
It'a join query in Active Model in Rails.
Please click here for More info about Active Model Query Format.
#course= Course.joins("LEFT OUTER JOIN StudentEnrollment
ON StudentEnrollment .id = Courses.user_id").
where("StudentEnrollment .id IS NULL AND StudentEnrollment .student_id =
<SOME_STUDENT_ID_VALUE> and Courses.active = true").select
Use Squeel:
Person.joins{articles.inner}
Person.joins{articles.outer}
If anyone out there still needs true left_outer_joins support in Rails 4.2 then if you install the gem "brick" on Rails 4.2.0 or later it automatically adds the Rails 5.0 implementation of left_outer_joins. You would probably want to turn off the rest of its functionality, that is unless you want an automatic "admin panel" kind of thing available in your app!

Filtering model with HABTM relationship

I have 2 models - Restaurant and Feature. They are connected via has_and_belongs_to_many relationship. The gist of it is that you have restaurants with many features like delivery, pizza, sandwiches, salad bar, vegetarian option,… So now when the user wants to filter the restaurants and lets say he checks pizza and delivery, I want to display all the restaurants that have both features; pizza, delivery and maybe some more, but it HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]) I (of course) get the restaurants that have either - so or pizza or delivery or both - which is not at all what I want.
My SQL/Rails knowledge is kinda limited since I'm new to this but I asked a friend and now I have this huuuge SQL that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn FROM restaurants
JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
JOIN features ON features_restaurants.feature_id = features.id
WHERE features.id in (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better - more Rails even - way of doing this? How would you do it?
Oh BTW I'm using Rails 4 on Heroku so it's a Postgres DB.
This is an example of a set-iwthin-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr join
features f
on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause is counting for the presence of one of the features -- "pizza" and "delivery". If both features are present, then you get the restaurant_id.
How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on restaurant.
With this scheme your queries boil down to
select * from restaurants where restaurants.features #> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids #> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids #> (
select id from features where name in ('pizza', 'delivery')
) as matched_features
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.
This is not "copy and paste" solution but if you consider following steps you will have fast working query.
index feature_name column (I'm assuming that column feature_id is indexed on both tables)
place each feature_name param in exists():
select fr.restaurant_id
from
features_restaurants fr
where
exists(select true from features f where fr.feature_id = f.feature_id and f.feature_name = 'pizza')
and
exists(select true from features f where fr.feature_id = f.feature_id and f.feature_name = 'delivery')
group by
fr.restaurant_id
Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
# accepts an array of feature names
restaurants = Restaurant.all
features.each do |f|
feature_restaurants = Feature.find_by_name(f).restaurants
restaurants = feature_restaurants & restaurants
end
return restaurants
end
Since its an AND condition (the OR conditions get dicey with AREL). I reread your stated problem and ignoring the SQL. I think this is what you want.
# in Restaurant
has_many :features
# in Feature
has_many :restaurants
# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
where(pizza: true)
end
def self.delivery
where(delivery: true)
end
# query
Restaurant.features.pizza.delivery
Basically you call the association with ".features" and then you use the self methods defined on features. Hopefully I didn't misunderstand the original problem.
Cheers!
Restaurant
.joins(:features)
.where(features: {name: ['pizza','delivery']})
.group(:id)
.having('count(features.name) = ?', 2)
This seems to work for me. I tried it with SQLite though.

Rails 3 query matching attribute of has_one association that is a subset of has_many association

The title is confusing, but allow me to explain. I have a Car model that has multiple datapoints with different timestamps. We are almost always concerned with attributes of its latest status. So the model has_many statuses, along with a has_one to easily access it's latest one:
class Car < ActiveRecord::Base
has_many :statuses, class_name: 'CarStatus', order: "timestamp DESC"
has_one :latest_status, class_name: 'CarStatus', order: "timestamp DESC"
delegate :location, :timestamp, to: 'latest_status', prefix: 'latest', allow_nil: true
# ...
end
To give you an idea of what the statuses hold:
loc = Car.first.latest_location # Location object (id = 1 for example)
loc.name # "Miami, FL"
Let's say I wanted to have a (chainable) scope to find all cars with a latest location id of 1. Currently I have a sort of complex method:
# car.rb
def self.by_location_id(id)
ids = []
find_each(include: :latest_status) do |car|
ids << car.id if car.latest_status.try(:location_id) == id.to_i
end
where("id in (?)", ids)
end
There may be a quicker way to do this using SQL, but not sure how to only get the latest status for each car. There may be many status records with a location_id of 1, but if that's not the latest location for its car, it should not be included.
To make it harder... let's add another level and be able to scope by location name. I have this method, preloading statuses along with their location objects to be able to access the name:
def by_location_name(loc)
ids = []
find_each(include: {latest_status: :location}) do |car|
ids << car.id if car.latest_location.try(:name) =~ /#{loc}/i
end
where("id in (?)", ids)
end
This will match the location above with "miami", "fl", "MIA", etc... Does anyone have any suggestions on how I can make this more succinct/efficient? Would it be better to define my associations differently? Or maybe it will take some SQL ninja skills, which I admittedly don't have.
Using Postgres 9.1 (hosted on Heroku cedar stack)
All right. Since you're using postgres 9.1 like I am, I'll take a shot at this. Tackling the first problem first (scope to filter by location of last status):
This solution takes advantage of PostGres's support for analytic functions, as described here: http://explainextended.com/2009/11/26/postgresql-selecting-records-holding-group-wise-maximum/
I think the following gives you part of what you need (replace/interpolate the location id you're interested in for the '?', naturally):
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.id order by statuses.created_at) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
This query will return car_id, status_id, location_id, and a timestamp (called created_at by default, although you could alias it if some other name is easier to work with).
Now to convince Rails to return results based on this. Because you'll probably want to use eager loading with this, find_by_sql is pretty much out. There is a trick I discovered though, using .joins to join to a subquery. Here's approximately what it might look like:
def self.by_location(loc)
joins(
self.escape_sql('join (
select *
from (
select cars.id as car_id, statuses.id as status_id, statuses.location_id, statuses.created_at, row_number() over (partition by statuses.id order by statuses.created_at) as rn
from cars join statuses on cars.id = statuses.car_id
) q
where rn = 1 and location_id = ?
) as subquery on subquery.car_id = cars.id order by subquery.created_at desc', loc)
)
end
Join will act as a filter, giving you only the Car objects that were involved in the subquery.
Note: In order to refer to escape_sql as I do above, you'll need to modify ActiveRecord::Base slightly. I do this by adding this to an initializer in the app (which I place in app/config/initializers/active_record.rb):
class ActiveRecord::Base
def self.escape_sql(clause, *rest)
self.send(:sanitize_sql_array, rest.empty? ? clause : ([clause] + rest))
end
end
This allows you to call .escape_sql on any of your models that are based on AR::B. I find this profoundly useful, but if you've got some other way to sanitize sql, feel free to use that instead.
For the second part of the question - unless there are multiple locations with the same name, I'd just do a Location.find_by_name to turn it into an id to pass into the above. Basically this:
def self.by_location_name(name)
loc = Location.find_by_name(name)
by_location(loc)
end

Nested sql queries in rails when :has_and_belongst_to_many

In my application I the next task that has not already been done by a user. I have Three models, A Book that has many Tasks and then I have a User that has has and belongs to many tasks. The table tasks_users table contains all completed tasks so I need to write a complex query to find the next task to perform.
I have came up with two solutions in pure SQL that works, but I cant translate them to rails, thats what I need help with
SELECT * FROM `tasks`
WHERE `tasks`.`book_id` = #book_id
AND `tasks`.`id` NOT IN (
SELECT `tasks_users`.`task_id`
FROM `tasks_users`
WHERE `tasks_users`.`user_id` = #user_id)
ORDER BY `task`.`date` ASC
LIMIT 1;
and equally without nested select
SELECT *
FROM tasks
LEFT JOIN tasks_users
ON tasks_users.tasks_id = task.id
AND tasks_users.user_id = #user_id
WHERE tasks_users.task_id IS NULL
AND tasks.book_id = #book_id
LIMIT 1;
This is what I Have done in rails with the MetaWhere plugin
book.tasks.joins(:users.outer).where(:users => {:id => nil})
but I cant figure out how to get the current user there too,
Thanks for any help!
I think this will duplicate the second form with the LEFT JOIN:
class Task < ActiveRecord::Base
scope :next_task, lambda { |book,user| book.tasks.\
joins("LEFT JOIN task_users ON task_users.task_id=tasks.id AND task_users.user_id=#{user.id}").\
where(:tasks=>{:task_users=>{:task_id=>nil}}).\
order("date DESC").limit(1) }
end
Note that instead of tasks_users this uses the table name task_user, which is more typical for a join model. Also, it needs to be called with:
Task.next_task(#book_id,#user_id)
book.tasks.where("tasks.id not in (select task_id from tasks_users where user_id=?)", #user_id).first
That would give you the first task that doesn't already have an entry in tasks_users for the current user.