Ruby on Rails - How to join two tables? - ruby-on-rails-3

I have two tables (subjects and pages) in a one-to-many relation. I want to add criteria from both subjects and pages to build a SQL query, but progress has been very slow and I often run into problems. I'm brand new to Rails, please help.
class Subject < ActiveRecord::Base
  has_many :pages
end
class Page < ActiveRecord::Base
  belongs_to :subject
end
Sample data in subjects, showing three columns:
id name level
1 'Math' 1
6 'Math' 2
...
Sample data in pages, showing three columns:
id name subject_id
-- -------------------- ----------
2 Addition 1
4 Subtraction 1
5 Simple Multiplication 6
6 Simple Division 6
7 Hard Multiplication 6
8 Hard Division 6
9 Elementary Divsion 1
I don't know the subject id; I only know the subject name and level, and the page name. Here is the SQL I want to generate (or something similar that achieves the same result):
select subjects.id, subjects.name, pages.id, pages.name from subjects, pages
where subjects.id = pages.subject_id
and subjects.name = 'Math'
and subjects.level = '2'
and pages.name like '%Division' ;
I expect to get two rows in the result:
subjects.id subjects.name pages.id pages.name
----------- ------------- -------- -----------
6 Math 6 Simple Division
6 Math 8 Hard Division
This is a very simple SQL query, but I have not been able to get what I wanted in Rails.
Here is my rails console:
>> subject = Subject.where(:name => 'Math', :level => 2)
Subject Load (0.4ms) SELECT `subjects`.* FROM `subjects` WHERE `subjects`.`name` = 'Math' AND `subjects`.`level` = 2
[#<Subject id: 6, name: "Math", position: 1, visible: true, created_at: "2011-12-17 04:25:54", updated_at: "2011-12-17 04:25:54", level: 2>]
>>
>> subject.joins(:pages).where(['pages.name LIKE ?', '%Division'])
Subject Load (4.2ms) SELECT `subjects`.* FROM `subjects` INNER JOIN `pages` ON `pages`.`subject_id` = `subjects`.`id` WHERE `subjects`.`name` = 'Math' AND `subjects`.`level` = 2 AND (pages.name LIKE '%Division')
[#<Subject id: 6, name: "Math", position: 1, visible: true, created_at: "2011-12-17 04:25:54", updated_at: "2011-12-17 04:25:54", level: 2>, #<Subject id: 6, name: "Math", position: 1, visible: true, created_at: "2011-12-17 04:25:54", updated_at: "2011-12-17 04:25:54", level: 2>]
>>
>> subject.to_sql
"SELECT `subjects`.* FROM `subjects` WHERE `subjects`.`name` = 'Math' AND `subjects`.`level` = 2"
>> subject.size
1
>> subject.class
ActiveRecord::Relation
1st statement: subject = Subject.where(:name => 'Math', :level => 2)
2nd statement: subject.joins(:pages).where(['pages.name LIKE ?', '%Division'])
Questions:
Why does subject.size say only 1 when the chained query really returns two rows?
How do I tell it to return columns from :pages as well?
Why does subject.to_sql still show only the SQL from statement 1, without the chained SQL from statement 2?
Essentially, how do I need to write the statements differently to generate the SQL listed above (or achieve the same result)?
Many thanks.

1) ActiveRecord maps query results to objects of the class the query was built from, not to arbitrary rows. Because the query is based on the Subject class, each row of the join comes back as a Subject instance, which is why the console shows the same Subject (id 6) twice, once per matching page. subject.size still says 1 because subject holds only the relation from statement 1 (see point 3).
2) The column data is there, but you are working against what ActiveRecord wants to give you, which is objects. If you would rather have Pages returned, then you need to base the creation of the query on the Page class.
3) You didn't save the result of adding joins(:pages)... back into the subject variable. If you did:
subject = subject.joins(:pages).where(['pages.name LIKE ?', '%Division'])
You would get the full query when running subject.to_sql
4) To get page objects you can do something like this, notice that we are basing it off of the Page class:
pages = Page.joins(:subject).where(['subjects.name = ? AND subjects.level = ? AND pages.name LIKE ?', 'Math', 2, '%Division'])
Then to access the subject name from there for the first Page object returned:
pages[0].subject.name
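Note that joins alone does not load the associated subjects, so pages[0].subject runs one more SQL query. If you want to avoid that, use eager_load(:subject) in place of joins(:subject), which loads the pages and their subjects in a single query. Hope this helps!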
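To see why the join yields one row per matching page, here is a minimal plain-Ruby sketch of the same filter run over the sample data from the question. SubjectRow, PageRow and joined_rows are hypothetical names made up for this illustration; they are not ActiveRecord models or Rails API.

```ruby
# Plain-Ruby sketch of the join. SubjectRow and PageRow are hypothetical Structs
# standing in for table rows; they are not the ActiveRecord models.
SubjectRow = Struct.new(:id, :name, :level)
PageRow = Struct.new(:id, :name, :subject_id)

SUBJECTS = [
  SubjectRow.new(1, "Math", 1),
  SubjectRow.new(6, "Math", 2)
].freeze

PAGES = [
  PageRow.new(2, "Addition", 1),
  PageRow.new(4, "Subtraction", 1),
  PageRow.new(5, "Simple Multiplication", 6),
  PageRow.new(6, "Simple Division", 6),
  PageRow.new(7, "Hard Multiplication", 6),
  PageRow.new(8, "Hard Division", 6),
  PageRow.new(9, "Elementary Divsion", 1)
].freeze

# Emulates an inner join of subjects and pages, filtered by subject name,
# subject level, and a page name ending in page_suffix (LIKE '%suffix').
def joined_rows(subjects, pages, name:, level:, page_suffix:)
  subjects_by_id = subjects.map { |s| [s.id, s] }.to_h
  pages.each_with_object([]) do |page, rows|
    subject = subjects_by_id[page.subject_id]
    next if subject.nil? || subject.name != name || subject.level != level
    next unless page.name.end_with?(page_suffix)

    rows << [subject.id, subject.name, page.id, page.name]
  end
end

ROWS = joined_rows(SUBJECTS, PAGES, name: "Math", level: 2, page_suffix: "Division")
ROWS.each { |row| puts row.join(" | ") }
```

Each output row repeats subject 6, once per matching page, which mirrors what the joined query hands back to ActiveRecord.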

Related

How to query records that are scoped to the most recent record per association

I have a Mark model which has many mark_allocations
I need to find the number of mark_allocations that are correct, i.e.
MarkAllocation.where(status: :correct).count
However I only want to query from the most recent mark_allocations per mark.
I have already achieved this in ruby land like so:
MarkAllocation
  .order(created_at: :desc)
  .uniq(&:mark_id)
  .select { |m| m.correct? }
  .size
However this has become a performance bottleneck and I would like to do the selection at database level.
So far in my efforts I can get distinct records per mark no problem, but I am struggling to apply an order to get the most recent records per mark. I also have no idea how to go from that point, to further querying for only correct mark_allocations.
I have come up with this:
MarkAllocation
  .select(:mark_id, :state, :created_at)
  .order(created_at: :desc)
  .distinct(:mark_id)
  .where(state: :correct)
  .count(:mark_id)
But I know it is not correct and I can see the ORDER clause is missing from the raw sql it outputs.
EDIT:
Here is an example of how it currently works with the Ruby code.
mark_allocations = [
{mark_id: 1, status: :correct, created_at: 2.days.ago},
{mark_id: 1, status: :incorrect, created_at: 1.day.ago},
{mark_id: 2, status: :correct, created_at: 1.day.ago}
]
mark_allocations = mark_allocations.order(created_at: :desc).uniq(&:mark_id)
=> [
{mark_id: 1, status: :incorrect, created_at: 1.day.ago},
{mark_id: 2, status: :correct, created_at: 1.day.ago}
]
mark_allocations = mark_allocations.select { |m| m.correct? }
=> [{mark_id: 2, status: :correct, created_at: 1.day.ago}]
mark_allocations.size
=> 1
This would be one of those cases where you'd have to get your hands dirty with some SQL.
In PostgreSQL, you would need a query like this to get the records:
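SELECT *
FROM (
  -- DISTINCT ON keeps the first row per mark_id, so the ORDER BY
  -- has to sit in the same SELECT to make that row the newest one
  SELECT DISTINCT ON (mark_id) *
  FROM mark_allocations
  ORDER BY mark_id, created_at DESC
) latest
WHERE status = 1;
This is assuming 1 is the enum integer for your correct status.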
Or, this to get the count from those records:
SELECT COUNT(*)
FROM (
  SELECT DISTINCT ON (mark_id) *
  FROM mark_allocations
  ORDER BY mark_id, created_at DESC
) latest
WHERE status = 1;
I would paste that into my MarkAllocation model with something like:
def self.counts
  # assumes `enum status: ...` is declared on this model, which provides `statuses`
  query = <<-SQL.strip_heredoc
    SELECT COUNT(*)
    FROM (
      SELECT DISTINCT ON (mark_id) *
      FROM mark_allocations
      ORDER BY mark_id, created_at DESC
    ) latest
    WHERE status = #{statuses[:correct]};
  SQL
  ActiveRecord::Base.connection.exec_query(query)
end
Sure it doesn't look pretty but it'll be more performant.
If you want the actual unique records, you could do something like:
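def self.records
  # let ActiveRecord build the inner "latest row per mark" query
  nested_sql = MarkAllocation.select("DISTINCT ON (mark_id) *").order(:mark_id, created_at: :desc).to_sql
  # assumes `enum status: ...` is declared on this model, which provides `statuses`
  query = <<-SQL.strip_heredoc
    SELECT *
    FROM (
      #{nested_sql}
    ) latest
    WHERE status = #{statuses[:correct]};
  SQL
  find_by_sql(query)
end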
The records example above shows how you might have ActiveRecord generate the nested SQL and how you would use find_by_sql to transform the results into MarkAllocation ActiveRecord results.
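As a cross-check for the SQL, the intended logic (take the newest allocation per mark, then count the correct ones) can be sketched in plain Ruby over the example data from the question. AllocationRow, latest_correct_count and the fixed timestamps are assumptions made for this sketch, not part of the original models.

```ruby
# Hypothetical Struct standing in for a mark_allocations row.
AllocationRow = Struct.new(:mark_id, :status, :created_at)

# Picks the newest row per mark_id, then counts the ones whose status is :correct.
def latest_correct_count(allocations)
  allocations
    .group_by(&:mark_id)
    .values
    .map { |rows| rows.max_by(&:created_at) }
    .count { |row| row.status == :correct }
end

NOW = Time.utc(2020, 1, 3) # arbitrary fixed reference time for the sketch
DAY = 24 * 60 * 60

SAMPLE_ALLOCATIONS = [
  AllocationRow.new(1, :correct, NOW - 2 * DAY),
  AllocationRow.new(1, :incorrect, NOW - DAY),
  AllocationRow.new(2, :correct, NOW - DAY)
].freeze

puts latest_correct_count(SAMPLE_ALLOCATIONS)
```

Running both the database query and a sketch like this against a small fixture is one way to confirm they agree before dropping the Ruby-side filtering.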

Rails 4 ActiveRecord: Order records by attribute and association if it exists

I have four models that I am having trouble ordering:
User(:id, :name, :email)
Capsule(:id, :name)
Outfit(:id, :name, :capsule_id, :likes_count)
Like(:id, :outfit_id, :user_id)
I want to get all the Outfits that belong to a Capsule and order them by the likes_count.
This is fairly trivial and I can get them like this:
Outfit.where(capsule_id: capsule.id).includes(:likes).order(likes_count: :desc)
However, I then want to also order the outfits so that if a given user has liked it, it appears higher in the list.
Example if I have the following outfit records:
Outfit(id: 1, capsule_id: 2, likes_count: 1)
Outfit(id: 2, capsule_id: 2, likes_count: 2)
Outfit(id: 3, capsule_id: 2, likes_count: 2)
And the given user has only liked outfit with id 3, the returned order should be IDs: 3, 2, 1
I'm sure this is fairly easy, but I can't seem to get it. Any help would be greatly appreciated :)
Postgres SQL with a subquery
SELECT outfits.*
FROM outfits
LEFT OUTER JOIN (SELECT likes.outfit_id, 1 AS weight
                 FROM likes
                 WHERE likes.user_id = @user_id) AS user_likes
  ON user_likes.outfit_id = outfits.id
WHERE outfits.capsule_id = @capsule_id
ORDER BY user_likes.weight ASC, outfits.likes_count DESC;
In ascending order Postgres sorts NULL values last, so outfits the user has not liked come after the liked ones. I am not sure how this would look as an Arel query. You can try converting it using an Arel cheatsheet.
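The ordering rule in that query (liked outfits first, then by likes_count descending) can be sketched in plain Ruby to check it against the example from the question. OutfitRow and order_outfits are hypothetical names used only for this illustration.

```ruby
# Hypothetical Struct standing in for an outfits row.
OutfitRow = Struct.new(:id, :capsule_id, :likes_count)

# Sorts liked outfits ahead of the rest, then by likes_count descending.
def order_outfits(outfits, liked_outfit_ids)
  outfits.sort_by do |outfit|
    liked_rank = liked_outfit_ids.include?(outfit.id) ? 0 : 1
    [liked_rank, -outfit.likes_count]
  end
end

SAMPLE_OUTFITS = [
  OutfitRow.new(1, 2, 1),
  OutfitRow.new(2, 2, 2),
  OutfitRow.new(3, 2, 2)
].freeze

# The given user has liked only the outfit with id 3.
ORDERED_IDS = order_outfits(SAMPLE_OUTFITS, [3]).map(&:id)
p ORDERED_IDS
```

With the sample records this prints [3, 2, 1], the order the question asks for.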

How do I modify this where clause to check if a column has a nil value in SQL?

I have the following query:
user.all_memberships.where("user_id = ? OR invited_id = ?", user2.id, user2.id)
This is what my membership model looks like:
#<Membership id: 4, family_tree_id: 3, user_id: 1, created_at: "2015-10-23 20:33:41", updated_at: "2015-10-23 20:33:41", relation: nil, invited_id: nil>
What I would also like the query to check is that relation != nil, but I'm not sure how to represent that in the SQL string passed to the where clause.
In other words, I would like for it to check for the presence of user2.id in either the user_id or invited_id column. But...relation also has to be NOT nil.
How do I do that?
Edit 1
When I do the following query, as per Blindy's suggestion below:
user.all_memberships.where("relation is not null and user_id = ? OR invited_id = ?", user2.id, user2.id)
It generates this query, that seems to work but based on the SQL it generates I am nervous about it. Notice the second AND and the second part of that statement. It feels like it may return false positives occasionally.
(0.9ms) SELECT COUNT(*) FROM "memberships" WHERE (memberships.user_id = 1 OR memberships.invited_id = 1) AND (relation is not null and user_id = 2 OR invited_id = 2)
How do I customize the AR where query to be more true to what I want to do and not have any hidden gotchas?
This isn't completely clear, but I think you mean something like:
relation is not null and (user_id = ? OR invited_id = ?)
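The parentheses matter because AND binds more tightly than OR in SQL, so without them the invited_id check escapes the relation condition. Ruby's && and || follow the same precedence, so the difference can be sketched in plain Ruby. MembershipRow and both predicate methods are hypothetical names for this illustration.

```ruby
# Hypothetical Struct standing in for a memberships row.
MembershipRow = Struct.new(:user_id, :invited_id, :relation)

# Mirrors: relation is not null and user_id = ? OR invited_id = ?
# Parsed as (relation present AND user_id matches) OR invited_id matches.
def ungrouped_match?(membership, id)
  !membership.relation.nil? && membership.user_id == id || membership.invited_id == id
end

# Mirrors: relation is not null and (user_id = ? OR invited_id = ?)
def grouped_match?(membership, id)
  !membership.relation.nil? && (membership.user_id == id || membership.invited_id == id)
end

# A row whose relation is nil but whose invited_id matches the user we look for.
INVITED_WITHOUT_RELATION = MembershipRow.new(1, 2, nil)

puts ungrouped_match?(INVITED_WITHOUT_RELATION, 2) # false positive
puts grouped_match?(INVITED_WITHOUT_RELATION, 2)
```

The ungrouped version accepts the row even though relation is nil, which is exactly the false positive the question was worried about.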

Rails: group by the count of a column

I'm not sure how to best express this question.
Let's say I have a UserSkill, which belongs_to :user and belongs_to :skill. I have a collection of Skills, and from those I have an array of skill_ids, say with .map(&:id).
I can easily use this array to do an IN type query like UserSkill.where(skill_id: skill_ids).
But I want to find the users that have the most skills from my input.
I tried writing this naively as UserSkill.where(skill_id: skill_ids).group("user_skills.user_id").order("count(user_skills.user_id) desc"), but that has a syntax error.
To further clarify, let's say we have User id: 1 and User id: 2. Our result from UserSkill.where(skill_id: skill_ids) is the following:
UserSkill user_id: 1, skill_id: 1
UserSkill user_id: 1, skill_id: 2
UserSkill user_id: 2, skill_id: 2
The result I'd be looking for would be:
User id: 1
User id: 2
What's the right query for this? And how should I be phrasing this question to begin with?
Assuming a has_many association from User to UserSkill, you could try
User.joins(:user_skills).
group("users.id").
order("COUNT(users.id) DESC").
merge(UserSkill.where(skill_id: skill_ids))
In SQL I might write this:
select users.*
from users
join user_skills on users.id = user_skills.user_id
where user_skills.skill_id in (1,2,3)
group by users.id
order by count(*) desc, users.id asc
limit 5
Which might look like this:
User.joins("INNER JOIN user_skills ON users.id = user_skills.user_id").
  where("user_skills.skill_id" => skill_ids).
  group("users.id").
  order("count(*) desc").
  limit(5)
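For reference, the ranking those queries compute (count the matching skills per user, highest count first, ties broken by user id) can be sketched in plain Ruby with the rows from the question. UserSkillRow and rank_user_ids are hypothetical names for this sketch.

```ruby
# Hypothetical Struct standing in for a user_skills row.
UserSkillRow = Struct.new(:user_id, :skill_id)

# Counts matching skills per user, then ranks users by that count (descending),
# breaking ties by user id (ascending), and keeps at most `limit` ids.
def rank_user_ids(user_skills, skill_ids, limit: 5)
  counts = Hash.new(0)
  user_skills.each do |row|
    counts[row.user_id] += 1 if skill_ids.include?(row.skill_id)
  end
  counts.sort_by { |user_id, count| [-count, user_id] }.first(limit).map(&:first)
end

SAMPLE_USER_SKILLS = [
  UserSkillRow.new(1, 1),
  UserSkillRow.new(1, 2),
  UserSkillRow.new(2, 2)
].freeze

RANKED = rank_user_ids(SAMPLE_USER_SKILLS, [1, 2])
p RANKED
```

User 1 matches two of the requested skills and user 2 matches one, so user 1 ranks first.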

Remove duplicate records based on multiple columns?

I'm using Heroku to host my Ruby on Rails application and for one reason or another, I may have some duplicate rows.
Is there a way to delete duplicate records based on 2 or more criteria but keep just 1 record of that duplicate collection?
In my use case, I have a Make and Model relationship for cars in my database.
Make    Model
----    -----
Name    Name
        Year
        Trim
        MakeId
I'd like to delete all Model records that have the same Name, Year and Trim but keep 1 of those records (meaning, I need the record but only once). I'm using Heroku console so I can run some active record queries easily.
Any suggestions?
class Model
  def self.dedupe
    # find all models and group them on keys which should be common
    grouped = all.group_by { |model| [model.name, model.year, model.trim, model.make_id] }
    grouped.values.each do |duplicates|
      # the first one we want to keep right?
      first_one = duplicates.shift # or pop for last one
      # if there are any more left, they are duplicates
      # so delete all of them
      duplicates.each { |double| double.destroy } # duplicates can now be destroyed
    end
  end
end
Model.dedupe
Find All
Group them on keys which you need for uniqueness
Loop on the grouped model's values of the hash
remove the first value because you want to retain one copy
delete the rest
If your User table data looks like this:
User.all =>
[
  #<User id: 15, name: "a", email: "a@gmail.com", created_at: "2013-08-06 08:57:09", updated_at: "2013-08-06 08:57:09">,
  #<User id: 16, name: "a1", email: "a@gmail.com", created_at: "2013-08-06 08:57:20", updated_at: "2013-08-06 08:57:20">,
  #<User id: 17, name: "b", email: "b@gmail.com", created_at: "2013-08-06 08:57:28", updated_at: "2013-08-06 08:57:28">,
  #<User id: 18, name: "b1", email: "b1@gmail.com", created_at: "2013-08-06 08:57:35", updated_at: "2013-08-06 08:57:35">,
  #<User id: 19, name: "b11", email: "b1@gmail.com", created_at: "2013-08-06 09:01:30", updated_at: "2013-08-06 09:01:30">,
  #<User id: 20, name: "b11", email: "b1@gmail.com", created_at: "2013-08-06 09:07:58", updated_at: "2013-08-06 09:07:58">
]
Some email addresses are duplicated, so our aim is to remove the duplicate rows from the users table.
Step 1:
Get the lowest id for each distinct combination of email and name.
ids = User.select("MIN(id) as id").group(:email,:name).collect(&:id)
=> [15, 16, 18, 19, 17]
Step 2:
Remove every row from the users table whose id is not in that list.
The ids array holds the following ids:
[15, 16, 18, 19, 17]
User.where("id NOT IN (?)",ids) # To get all duplicate records
User.where("id NOT IN (?)",ids).destroy_all
** RAILS 4 **
ActiveRecord 4 introduces the .not method which allows you to write the following in Step 2:
User.where.not(id: ids).destroy_all
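Which rows count as duplicates depends entirely on the grouping columns. A plain-Ruby sketch over the sample rows above shows the difference between grouping by email and name and grouping by email alone. UserRow and duplicate_ids are hypothetical names made up for this illustration.

```ruby
# Hypothetical Struct standing in for a users row.
UserRow = Struct.new(:id, :name, :email)

SAMPLE_USERS = [
  UserRow.new(15, "a", "a@gmail.com"),
  UserRow.new(16, "a1", "a@gmail.com"),
  UserRow.new(17, "b", "b@gmail.com"),
  UserRow.new(18, "b1", "b1@gmail.com"),
  UserRow.new(19, "b11", "b1@gmail.com"),
  UserRow.new(20, "b11", "b1@gmail.com")
].freeze

# Keeps the lowest id per group of rows sharing the given keys and
# returns the ids of all other rows, i.e. the ones that would be destroyed.
def duplicate_ids(rows, *keys)
  keep_ids = rows.group_by { |row| keys.map { |key| row[key] } }
                 .values
                 .map { |group| group.map(&:id).min }
  rows.map(&:id) - keep_ids
end

p duplicate_ids(SAMPLE_USERS, :email, :name)
p duplicate_ids(SAMPLE_USERS, :email)
```

Grouping by email and name, as in Step 1, flags only id 20; grouping by email alone would flag 16, 19 and 20, so pick the columns that define uniqueness for your data before calling destroy_all.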
Similar to @Aditya Sanghi's answer, but this way will be more performant because you are only selecting the duplicates, rather than loading every Model object into memory and then iterating over all of them.
# returns only duplicates in the form of [[name1, year1, trim1], [name2, year2, trim2],...]
duplicate_row_values = Model.select('name, year, trim, count(*)').group('name, year, trim').having('count(*) > 1').pluck(:name, :year, :trim)
# load the duplicates, order them however you want, and then destroy all but one
duplicate_row_values.each do |name, year, trim|
  Model.where(name: name, year: year, trim: trim).order(id: :desc)[1..-1].map(&:destroy)
end
Also, if you truly don't want duplicate data in this table, you probably want to add a multi-column unique index to the table, something along the lines of:
add_index :models, [:name, :year, :trim], unique: true, name: 'index_unique_models'
You could try the following: (based on previous answers)
ids = Model.group('name, year, trim').pluck('MIN(id)')
to get all valid records. And then:
Model.where.not(id: ids).destroy_all
to remove the unneeded records. And certainly, you can make a migration that adds a unique index for the three columns so this is enforced at the DB level:
add_index :models, [:name, :year, :trim], unique: true
To run it in a migration, I ended up doing the following (based on the answer above by @aditya-sanghi):
class AddUniqueIndexToXYZ < ActiveRecord::Migration
  def change
    # delete duplicates
    dedupe(XYZ, 'name', 'type')
    add_index :xyz, [:name, :type], unique: true
  end

  def dedupe(model, *key_attrs)
    model.select(key_attrs).group(key_attrs).having('count(*) > 1').each { |duplicates|
      # slice takes the keys as separate arguments, so splat the array
      dup_rows = model.where(duplicates.attributes.slice(*key_attrs)).to_a
      # the first one we want to keep right?
      dup_rows.shift
      dup_rows.each { |double| double.destroy } # duplicates can now be destroyed
    }
  end
end
Based on @aditya-sanghi's answer, with a more efficient way to find duplicates using SQL.
Add this to your ApplicationRecord to be able to deduplicate any model:
class ApplicationRecord < ActiveRecord::Base
  # …
  def self.destroy_duplicates_by(*columns)
    groups = select(columns).group(columns).having(Arel.star.count.gt(1))
    groups.each do |duplicates|
      records = where(duplicates.attributes.symbolize_keys.slice(*columns))
      records.offset(1).destroy_all
    end
  end
end
You can then call destroy_duplicates_by to destroy all records (except the first) that have the same values for the given columns. For example:
Model.destroy_duplicates_by(:name, :year, :trim, :make_id)
I chose a slightly safer route (IMHO). I started by getting all the unique records.
ids = Model.where(other_model_id: 1).uniq(&:field).map(&:id)
Then I got all the ids
all_ids = Model.where(other_model_id: 1).map(&:id)
This allows me to do an array subtraction to find the duplicates
dups = all_ids - ids
I then map over the duplicate ids and fetch the model because I want to ensure I have the records I am interested in.
records = dups.map do |id| Model.find(id) end
When I am sure I want to delete, I iterate again to delete.
records.map do |record| record.delete end
When deleting duplicate records on a production system, you want to be very sure you are not deleting important live data, so in this process, I can double-check everything.
So in the case above:
all_ids = Model.all.map(&:id)
uniq_ids = Model.all.group_by do |model|
  [model.name, model.year, model.trim]
end.values.map do |duplicates|
  duplicates.first.id
end
dups = all_ids - uniq_ids
records = dups.map { |id| Model.find(id) }
records.map { |record| record.delete }
or something like this.
You can try this SQL query (PostgreSQL syntax) to remove all duplicate records except the latest one:
DELETE FROM models USING models newer WHERE models.name = newer.name AND models.year = newer.year AND models.trim = newer.trim AND models.id < newer.id;