SQL LEFT JOIN value NOT in either join column - sql

I suspect this is a rather common scenario and may show my ineptitude as a DB developer, but here goes anyway ...
I have two tables: Profiles and HiddenProfiles and the HiddenProfiles table has two relevant foreign keys: profile_id and hidden_profile_id that store ids from the Profiles table.
As you can imagine, a user can hide another user (wherein his profile ID would be the profile_id in the HiddenProfiles table) or he can be hidden by another user (wherein his profile ID would be put in the hidden_profile_id column). Again, a pretty common scenario.
Desired Outcome:
I want to do a join (or to be honest, whatever would be the most efficient query) on the Profiles and HiddenProfiles table to find all the profiles that a given profile is both not hiding AND not hidden from.
In my head I thought it would be pretty straightforward, but the iterations I came up with kept seeming to miss one half of the problem. Finally, I ended up with something that looks like this:
SELECT "profiles".* FROM "profiles"
LEFT JOIN hidden_profiles hp1 on hp1.profile_id = profiles.id and (hp1.hidden_profile_id = 1)
LEFT JOIN hidden_profiles hp2 on hp2.hidden_profile_id = profiles.id and (hp2.profile_id = 1)
WHERE (hp1.hidden_profile_id is null) AND (hp2.profile_id is null)
Don't get me wrong, this "works" but in my heart of hearts I feel like there should be a better way. If in fact there is not, I'm more than happy to accept that answer from someone with more wisdom than myself on the matter. :)
And for what it's worth these are two RoR models sitting on a Postgres DB, so solutions tailored to those constraints are appreciated.
Models are as such:
class Profile < ActiveRecord::Base
...
has_many :hidden_profiles, dependent: :delete_all
scope :not_hidden_to_me, -> (profile) { joins("LEFT JOIN hidden_profiles hp1 on hp1.profile_id = profiles.id and (hp1.hidden_profile_id = #{profile.id})").where("hp1.hidden_profile_id is null") }
scope :not_hidden_by_me, -> (profile) { joins("LEFT JOIN hidden_profiles hp2 on hp2.hidden_profile_id = profiles.id and (hp2.profile_id = #{profile.id})").where("hp2.profile_id is null") }
scope :not_hidden, -> (profile) { self.not_hidden_to_me(profile).not_hidden_by_me(profile) }
...
end
class HiddenProfile < ActiveRecord::Base
belongs_to :profile
belongs_to :hidden_profile, class_name: "Profile"
end
So to get the profiles I want I'm doing the following:
Profile.not_hidden(given_profile)
And again, maybe this is fine, but if there's a better way I'll happily take it.

If you want to get this list just for a single profile, I would implement an instance method to perform effectively the same query in ActiveRecord. The only modification I made is to perform a single join onto a union of subqueries and to apply the conditions on the subqueries. This should reduce the columns that need to be loaded into memory, and hopefully be faster (you'd need to benchmark against your data to be sure):
class Profile < ActiveRecord::Base
def visible_profiles
Profile.joins("LEFT OUTER JOIN (
SELECT profile_id p_id FROM hidden_profiles WHERE hidden_profile_id = #{id}
UNION ALL
SELECT hidden_profile_id p_id FROM hidden_profiles WHERE profile_id = #{id}
) hp ON hp.p_id = profiles.id").where("hp.p_id IS NULL")
end
end
Since this method returns an ActiveRecord scope, you can chain additional conditions if desired:
Profile.find(1).visible_profiles.where("created_at > ?", Time.new(2015,1,1)).order(:name)

Personally I've never liked the join = null approach. I find it counter intuitive. You're asking for a join, and then limiting the results to records that don't match.
I'd approach it more as
SELECT id FROM profiles p
WHERE
NOT EXISTS
(SELECT * FROM hidden_profiles hp1
WHERE hp1.hidden_profile_id = 1 and hp1.profile_id = p.profile_id)
AND
NOT EXISTS (SELECT * FROM hidden_profiles hp2
WHERE hp2.hidden_profile_id = p.profile_id and hp2.profile_id = 1)
But you're going to need to run it some EXPLAINs with realistic volumes to be sure of which works best.

Related

Rails - scope for records that are not in a join table alongside a specific association

I have two models in a Rails app - Tournament and Player associated through a join table:
class Tournament < ApplicationRecord
has_many :tournament_players
has_many :players, through: :tournament_players
end
class Player < ApplicationRecord
has_many :tournament_players
has_many :tournaments, through: :tournament_players
scope :selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: tournament.id}) }
end
I have lots of Tournaments, and each one can have lots of Players. Players can play in lots of Tournaments. The scope
scope :selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: tournament.id}) }
successfuly finds all the players already added to a tournament, given that tournament as an argument.
What I'd like is a scope that does the opposite - returns all the players not yet added to a given tournament. I've tried
scope :not_selected, -> (tournament) { includes(:tournaments).where.not(tournaments: {id: tournament.id}) }
but that returns many of the same players, I think because the players exist as part of other tournaments. The SQL for that looks something like:
SELECT "players".*, "tournaments”.* FROM "players" LEFT OUTER JOIN
"tournament_players" ON "tournament_players"."player_id" =
"players"."id" LEFT OUTER JOIN "tournaments" ON "tournaments"."id" =
"tournament_players"."tournament_id" WHERE ("tournaments"."id" != $1)
ORDER BY "players"."name" ASC [["id", 22]]
I've also tried the suggestions on this question - using
scope :not_selected, -> (tournament) { includes(:tournaments).where(tournaments: {id: nil}) }
but that doesn't seem to work - it just returns an empty array, again I think because the Players exist in the join table as part of a separate Tournament. The SQL for that looks something like:
SELECT "players”.*, "tournaments”.* FROM "players" LEFT OUTER JOIN
"tournament_players" ON "tournament_players"."player_id" =
"players"."id" LEFT OUTER JOIN "tournaments" ON "tournaments"."id" =
"tournament_players"."tournament_id" WHERE "tournaments"."id" IS NULL
ORDER BY "players"."name" ASC
What you need to do is:
Make a left join with the reference table, with an additional condition on the tournament ID matching the one that you want to find the not-selected players for
Apply a WHERE clause indicating that there was no JOIN made.
This code should do it:
# player.rb
scope :not_selected, -> (tournament) do
joins("LEFT JOIN tournament_players tp ON players.id = tp.player_id AND tp.tournament_id = #{tournament.id}").where(tp: {tournament_id: nil})
end
If only Rails had a nicer way to write LEFT JOIN queries with additional conditions...
A few notes:
Don't join the actual relation (i.e. Tournament), it dramatically decreases performance of your query, and it's unnecessary, because all your condition prerequisites are inside the reference table. Besides, all the rows you're interested in return NULL data from the tournaments table.
Don't use eager_load. Besides to my best knowledge its not supporting custom conditions, it would create models for all related objects, which you don't need.
ok try this:
includes(:tournaments).distinct.where.not(tournaments: {id: tournament.id})

Get records with no related data using activerecord and RoR3?

I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
has_many :publications
end
What I want to get is all press_releases that does not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods does not return an ActiveRecord::Relation as scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
Explanation and response to comments
My initial thoughts about eager loading overhead is that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications. So that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" # WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
you can de the following in your PressRelease:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
this will return all PressRelease record without publications.
Couple ways to do this, first one requires two db queries:
PressRelease.where.not(id: Publications.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another one is to use counter_cache feature. You will need to add publication_count column to your press_releases table.
class Publications < ActiveRecord::Base
belongs_to :presss_release, counter_cache: true
end
Rails will keep this column in sync with a number of records associated to given mode, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])

ActiveRecord: Adding condition to ON clause for includes

I have a model offers and another historical_offers, one offer has_many historical_offers.
Now I would like to eager load the historical_offers of one given day for a set of offers, if it exists. For this, I think I need to pass the day to the ON clause, not the WHERE clause, so that I get all offers, also when there is no historical_offer for the given day.
With
Offer.where(several_complex_conditions).includes(:historical_offers).where("historical_offers.day = ?", Date.today)
I would get
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id
WHERE day = '2012-11-09' AND ...
But I want to have the condition in the ON clause, not in the WHERE clause:
SELECT * FROM offers
LEFT OUTER JOIN historical_offers
ON offers.id = historical_offers.offer_id AND day = '2012-11-09'
WHERE ...
I guess I could alter the has_many definition with a lambda condition for a specific date, but how would I pass in a date then?
Alternatively I could write the joins mysqlf like this:
Offer.where(several_complex_conditions)
.joins(["historical_offers ON offers.id = historical_offers.offer_id AND day = ?", Date.today])
But how can I hook this up so that eager loading is done?
After a few hours headscratching and trying all sorts of ways to accomplish eager loading of a constrained set of associated records I came across #dbenhur's answer in this thread which works fine for me - however the condition isn't something I'm passing in (it's a date relative to Date.today). Basically it is creating an association with the conditions I wanted to put into the LEFT JOIN ON clause into the has_many condition.
has_many :prices, order: "rate_date"
has_many :future_valid_prices,
class_name: 'Price',
conditions: ['rate_date > ? and rate is not null', Date.today-7.days]
And then in my controller:
#property = current_agent.properties.includes(:future_valid_prices).find_by_id(params[:id])

Rails 3 user matching-algorithm to SQL Query (COMPLICATED)

I'm currently working on an app that matches users based on answered questions.
I realized my algorithm in normal RoR and ActiveRecord queries but it's waaay to slow to use it. To match one user with 100 other users takes
Completed 200 OK in 17741ms (Views: 106.1ms | ActiveRecord: 1078.6ms)
on my local machine. But still...
I now want to realize this in raw SQL in order to gain some more performance. But I'm really having trouble getting my head around SQL queries inside of SQL queries and stuff like this plus calculations etc. My head is about to explode and I don't even know where to start.
Here's my algorithm:
def match(user)
#a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
#b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
if self.common_questions(user) == []
0.to_f
else
match = Math.sqrt(#a_score * #b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end
def possible_score(user)
i = 0
self.user_questions.select("question_id, importance").find_each do |n|
if user.user_questions.select(:id).find_by_question_id(n.question_id)
i += Importance.find_by_id(n.importance).value
end
end
return i
end
def actual_score(user)
i = 0
self.user_questions.select("question_id, importance").includes(:accepted_answers).find_each do |n|
#user_answer = user.user_questions.select("answer_id").find_by_question_id(n.question_id)
unless #user_answer == nil
if n.accepted_answers.select(:answer_id).find_by_answer_id(#user_answer.answer_id)
i += Importance.find_by_id(n.importance).value
end
end
end
return i
end
So basically a user answers a questions, picks what answers he accepts and how important that question is to him. The algorithm then checks what questions 2 users have in common, if user1 gave an answer user2 accepts, if yes then the importance user2 gave for each question is added which makes up the score user1 made. Also the other way around for user2. Divided by the possible score gives the percentage and both percentages applied to the geometric mean gives me one total match percentage for both users. Fairly complicated I know. Tell if I didn't explain it good enough. I just hope I can express this in raw SQL. Performance is everything in this.
Here are my database tables:
CREATE TABLE "users" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "username" varchar(255) DEFAULT '' NOT NULL); (left some unimportant stuff out, it's all there in the databse dump i uploaded)
CREATE TABLE "user_questions" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_id" integer, "question_id" integer, "answer_id" integer(255), "importance" integer, "explanation" text, "private" boolean DEFAULT 'f', "created_at" datetime);
CREATE TABLE "accepted_answers" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_question_id" integer, "answer_id" integer);
I guess the top of the SQL query has to look something like this?
SELECT u1.id AS user1, u2.id AS user2, COALESCE(SQRT( (100.0*actual_score/possible_score) * (100.0*actual_score/possible_score) ), 0) AS match
FROM
But since I'm not an SQL master and can only do the usual stuff my head is about to explode.
I hope someone can help me figure this out. Or atleast improve my performance somehow! Thanks so much!
EDIT:
So based on Wizard's answer I've managed to get a nice SQL statement for "possible_score"
SELECT SUM(value) AS sum_id
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
WHERE uq1.user_id = 1
I've tried to get the "actual_score" with this but it didn't work. My database manager crashed when I executed this.
SELECT SUM(imp.value) AS sum_id
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
INNER JOIN accepted_answers as ON as.user_question_id = uq1.id AND as.answer_id = uq2.answer_id
WHERE uq1.user_id = 1
EDIT2
Okay I'm an idiot! I can't use "as" as an alias of course. Changed it to aa and it worked! W00T!
I know you were thinking about moving to a SQL solution, but there are some major performance improvements which can be made to your Ruby code which might eliminate the need to use hand-coded SQL. When optimizing your code it is often worth using a profiler to make sure you really know which parts are the problem. In your example I think some big improvements can be made by removing iterative code and database queries which are executed during each iteration!
Also, if you are using a recent version of ActiveRecord you can generate queries with subselects without the need to code any SQL. Of course it is important that you have proper indexes created for your database.
I'm making a lot of assumptions about your models and relationships based on what I can infer from your code. If I'm wrong let me know and I'll try to make some adjustments accordingly.
def match(user)
if self.common_questions(user) == []
0.to_f
else
# Move a_score and b_score calculation inside this conditional branch since it is otherwise not needed.
#a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100
#b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100
match = Math.sqrt(#a_score * #b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end
def possible_score(user)
# If user_questions.importance contains ID values of importances, then you should set up a relation between UserQuestion and Importance.
# I.e. UserQuestion belongs_to :importance, and Importance has_many :user_questions.
# I'm assuming that user_questions represents join models between users and questions.
# I.e. User has_many :user_questions, and User has_many :questions, :through => :user_questions.
# Question has_many :user_questions, and Question has_many :users, :through => :user_questions
# From your code this seems like the logical setup. Let me know if my assumption is wrong.
self.user_questions.
joins(:importance). # Requires the relation between UserQuestion and Importance I described above
where(:question_id => Question.joins(:user_questions).where(:user_id => user.id)). # This should create a where clause with a subselect with recent versions of ActiveRecord
sum(:value) # I'm also assuming that the importances table has a `value` column.
end
def actual_score(user)
user_questions.
joins(:importance, :accepted_answers). # It looks like accepted_answers indicates an answers table
where(:answer_id => Answer.joins(:user_questions).where(:user_id => user.id)).
sum(:value)
end
UserQuestion seems to be a super join model between User, Question, Answer and Importance. Here are the model relations relevant to the code (not including the has_many :through relations you could create). I think you probably have these already:
# User
has_many :user_questions
# UserQuestion
belongs_to :user
belongs_to :question
belongs_to :importance, :foreign_key => :importance # Maybe rename the column `importance` to `importance_id`
belongs_to :answer
# Question
has_many :user_questions
# Importance
has_many :user_questions
# Answer
has_many :user_questions
So here's my new match function. I couldn't put everything in one query yet, because SQLite doesn't support math functions. But as soon as I switch to MySQL I will put everything in just one query. All this already gave me a HUGE performance boost of:
Completed 200 OK in 528ms (Views: 116.5ms | ActiveRecord: 214.0ms)
to match one user with 100 other users. Quite good! I'll have to see how good it performs once I fill my database with 10k fake users. And extra cudos to "Wizard of Ogz" for pointing out my inefficient code!
EDIT:
tried it with only 1000 users, 10 to 100 UserQuestions each, and ...
Completed 200 OK in 104871ms (Views: 2146.0ms | ActiveRecord: 93780.5ms)
... boy did that take long! I will have to think of something to tackle this problem.
def match(user)
if self.common_questions(user) == []
0.to_f
else
#a_score = UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match
FROM (SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ?
INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id
WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ?
WHERE uq1.user_id = ?) AS ps1",user.id, self.id, user.id, self.id]).collect(&:match).first.to_f
#b_score = UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match
FROM (SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ?
INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id
WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ?
WHERE uq1.user_id = ?) AS ps1",self.id, user.id, self.id, user.id]).collect(&:match).first.to_f
match = Math.sqrt(#a_score * #b_score) - (100 / self.common_questions(user).count)
if match <= 0
0.to_f
else
match
end
end
end

Nested sql queries in rails when :has_and_belongst_to_many

In my application I the next task that has not already been done by a user. I have Three models, A Book that has many Tasks and then I have a User that has has and belongs to many tasks. The table tasks_users table contains all completed tasks so I need to write a complex query to find the next task to perform.
I have came up with two solutions in pure SQL that works, but I cant translate them to rails, thats what I need help with
SELECT * FROM `tasks`
WHERE `tasks`.`book_id` = #book_id
AND `tasks`.`id` NOT IN (
SELECT `tasks_users`.`task_id`
FROM `tasks_users`
WHERE `tasks_users`.`user_id` = #user_id)
ORDER BY `task`.`date` ASC
LIMIT 1;
and equally without nested select
SELECT *
FROM tasks
LEFT JOIN tasks_users
ON tasks_users.tasks_id = task.id
AND tasks_users.user_id = #user_id
WHERE tasks_users.task_id IS NULL
AND tasks.book_id = #book_id
LIMIT 1;
This is what I Have done in rails with the MetaWhere plugin
book.tasks.joins(:users.outer).where(:users => {:id => nil})
but I cant figure out how to get the current user there too,
Thanks for any help!
I think this will duplicate the second form with the LEFT JOIN:
class Task < ActiveRecord::Base
scope :next_task, lambda { |book,user| book.tasks.\
joins("LEFT JOIN task_users ON task_users.task_id=tasks.id AND task_users.user_id=#{user.id}").\
where(:tasks=>{:task_users=>{:task_id=>nil}}).\
order("date DESC").limit(1) }
end
Note that instead of tasks_users this uses the table name task_user, which is more typical for a join model. Also, it needs to be called with:
Task.next_task(#book_id,#user_id)
book.tasks.where("tasks.id not in (select task_id from tasks_users where user_id=?)", #user_id).first
That would give you the first task that doesn't already have an entry in tasks_users for the current user.