Problems with Rails 3 Active Record Query Interface with join on ID - sql

I have been having a problem with the Rails 3 Active Record Query Interface. I have a lookup table (lookups), a main table (through_references), and a through/join table called through_tables. This is a many-to-many (HABTM-style) configuration that I have set up using has_many :through.
Update: Of special note here is that when I am doing these joins, I have been joining on IDs, to provide filtering of records. It seems that this does not work with Active Record Query Interface. If you do not want to see the gory details of my travails, you can skip down to see my workaround below.
We are also going to have a number of Main Items (through_references table). Each should be able to have any combination of lookup items, and it should be convenient to select the relevant lookup items, say through check boxes.
I have posted the code on GitHub, and there are many more explanations in the source there. To see the results, go to the lookups index page. Note that you will need to create the records using the scaffold code.
I also have the code up and running on heroku, with more explanations and examples.
class Lookup < ActiveRecord::Base
  has_many :fk_references
  has_many :through_tables
  has_many :through_references, :through => :through_tables
  attr_accessible :name, :value
end

class ThroughTable < ActiveRecord::Base
  belongs_to :through_reference
  belongs_to :lookup
  attr_accessible :description, :through_reference_id, :lookup_id
end

class ThroughReference < ActiveRecord::Base
  has_many :through_tables
  has_many :lookups, :through => :through_tables
  attr_accessible :description
end
If we want a listing of all the lookup items, together with the Main Items that correspond to them, we can LEFT JOIN the 'lookups' table with the Main Items (through_references) table.
Corresponding SQL:
SELECT * FROM lookups
LEFT OUTER JOIN through_tables ON (lookups.id = through_tables.lookup_id AND through_tables.through_reference_id = 1)
LEFT OUTER JOIN through_references ON through_references.id = through_tables.through_reference_id
ORDER BY lookups.id
Returned records:
1;“Lookup Item 1”;“1”;“2012-06-06 17:14:40.819791”;“2012-06-06 17:14:40.819791”;1;1;1;“Main Item 1 has Lookup item 1”;“2012-06-06 17:17:31.355425”;“2012-06-06 17:17:31.355425”;1;“Main Item 1”;“2012-06-06 17:16:30.004375”;“2012-06-06 17:16:30.004375”
2;“Lookup Item 2”;“2”;“2012-06-06 17:14:59.584756”;“2012-06-06 17:14:59.584756”;;;;“”;“”;“”;;“”;“”;“”
3;“Lookup Item 3”;“3”;“2012-06-06 17:15:14.700239”;“2012-06-06 17:15:14.700239”;2;1;3;“Main Item 1 has Lookup item 3”;“2012-06-06 17:17:53.169715”;“2012-06-06 17:17:53.169715”;1;“Main Item 1”;“2012-06-06 17:16:30.004375”;“2012-06-06 17:16:30.004375”
This is what I expected.
=== Active Record Query Interface using custom left join
Lookup.joins("LEFT OUTER JOIN through_tables ON (lookups.id = through_tables.lookup_id AND through_tables.through_reference_id = 1)").includes(:through_references).order('lookups.id')
What is returned from Active Record Query Interface (note I navigate down through the Active Record hierarchy):
Lookup ID | Lookup Name | Lookup Value | Through Table ID | Through Table Description | Main Item ID | Main Item Description
1 | Lookup Item 1 | 1 | 1 | Main Item 1 has Lookup item 1 | 1 | Main Item 1
1 | Lookup Item 1 | 1 | 3 | Main Item 2 has Lookup item 1 | 2 | Main Item 2
2 | Lookup Item 2 | 2 | 4 | Main Item 2 has Lookup item 2 | 2 | Main Item 2
3 | Lookup Item 3 | 3 | 2 | Main Item 1 has Lookup item 3 | 1 | Main Item 1
This is NOT what I expected.
What we have here is identical to the simple left join (without the AND clause). This tells me that the AND clause is being ignored in the Active Record Query Interface.
=== Active Record Query Interface using find_by_sql approach
Lookup.find_by_sql("SELECT * FROM lookups LEFT OUTER JOIN through_tables ON (through_tables.lookup_id = lookups.id AND through_tables.through_reference_id = 1) LEFT OUTER JOIN through_references ON through_references.id = through_tables.through_reference_id ORDER BY lookups.value, through_references.id" )
What is returned from Active Record Query Interface (note I navigate down through the Active Record hierarchy):
Lookup ID | Lookup Name | Lookup Value | Through Table ID | Through Table Description | Main Item ID | Main Item Description
1 | Lookup Item 1 | 1 | 3 | Main Item 2 has Lookup item 1 | 2 | Main Item 2
1 | Lookup Item 1 | 1 | 1 | Main Item 1 has Lookup item 1 | 1 | Main Item 1
(blank) | Lookup Item 2 | 2 | No through_tables entry
1 | Lookup Item 3 | 3 | 3 | Main Item 2 has Lookup item 1 | 2 | Main Item 2
1 | Lookup Item 3 | 3 | 1 | Main Item 1 has Lookup item 1 | 1 | Main Item 1
The results here are crazy!
Is this a bug, is this the intended behavior, or am I missing something?
I hope there is a clean way of doing this, without having to generate two result sets, and merge them by code.
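A likely explanation for the scrambled find_by_sql output, offered as an assumption rather than a confirmed diagnosis: SELECT * over three joined tables returns three columns all named id, and when a row is turned into an attribute hash keyed by column name, only one value can survive under "id". That would match the output above, where Lookup Item 3 reports Lookup ID 1 and Lookup Item 2 reports a blank ID. The plain-Ruby sketch below models that collision; COLUMNS, ROW and row_to_attributes are made-up names for illustration, not Rails internals.

```ruby
# Made-up stand-ins for one row of the SELECT * join: the column list contains
# "id" three times (lookups.id, through_tables.id, through_references.id).
COLUMNS = %w[id name value id lookup_id through_reference_id id description]
ROW     = [3, "Lookup Item 3", "3", 2, 3, 1, 1, "Main Item 1"]

# Build an attribute hash keyed by column name.
# A repeated key keeps only the value that was assigned last.
def row_to_attributes(columns, row)
  columns.zip(row).each_with_object({}) { |(col, val), attrs| attrs[col] = val }
end

attrs = row_to_attributes(COLUMNS, ROW)
puts attrs["id"]   # prints 1: the lookup's own id (3) has been overwritten
```

If this is what is happening, selecting lookups.* plus explicitly aliased columns (for example through_tables.id AS through_table_id) instead of * should keep each lookup's own id intact.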

I have found a work-around. The issue seems to be that Active Record will not recognize joins that filter on an ID (LEFT OUTER JOIN xyz ON xyz.id = ID).
My work-around involves creating a stored procedure or function that takes the ID in as a parameter, does the join in the Database, and returns a nice flat recordset.
see: Heroku demo page (skip to bottom)
Note: I am not marking this as a solution, because it is a work-around and has nothing to do with Active Record.

Well, reading the github project, I see this:
What I really want to do is have a list of all of the lookup items,
and if there are matching Main Items, have them appended on to the
returned record, and if not, I want nulls. This is a technique that I
have used for over 10 years.
I think the problem is exactly that you want to do it that way, when it would be more natural to let Rails eager loading handle it. You've gotten fixated on fetching everything in a single massive join.
What I would do is something like:
Lookup.where( .. insert any needed conditions here ...).includes(:through_tables)
Active Record will then fetch all the Lookups in one query, and use eager loading to fetch any associations named in the includes statement, one query per association.
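To make the "one query per association" idea concrete, here is a minimal plain-Ruby sketch that imitates it with in-memory arrays instead of a database. The rows mirror the through_tables entries shown in the question; the helper name preload_through_tables and the reference_id filter are hypothetical, and the filtering is done in Ruby purely for illustration.

```ruby
# In-memory stand-ins for the lookups and through_tables tables.
LOOKUPS = [
  { id: 1, name: "Lookup Item 1" },
  { id: 2, name: "Lookup Item 2" },
  { id: 3, name: "Lookup Item 3" }
]
THROUGH_TABLES = [
  { id: 1, lookup_id: 1, through_reference_id: 1 },
  { id: 2, lookup_id: 3, through_reference_id: 1 },
  { id: 3, lookup_id: 1, through_reference_id: 2 },
  { id: 4, lookup_id: 2, through_reference_id: 2 }
]

# "Query" 1 is the list of parents; "query" 2 fetches every child whose
# foreign key is in the parent id list, then children are grouped per parent.
def preload_through_tables(lookups, through_tables, reference_id: nil)
  ids = lookups.map { |l| l[:id] }
  children = through_tables.select { |t| ids.include?(t[:lookup_id]) }
  # Optional filter, applied to the children only; parents are never dropped.
  children = children.select { |t| t[:through_reference_id] == reference_id } if reference_id
  by_lookup = children.group_by { |t| t[:lookup_id] }
  lookups.map { |l| l.merge(through_tables: by_lookup.fetch(l[:id], [])) }
end

preload_through_tables(LOOKUPS, THROUGH_TABLES, reference_id: 1).each do |lookup|
  puts "#{lookup[:name]}: #{lookup[:through_tables].map { |t| t[:id] }.inspect}"
end
```

Every lookup is still listed, and a lookup with no matching entry simply gets an empty list, which is the eager-loading counterpart of the NULL columns in the LEFT JOIN result. It also suggests why a filter written into a hand-made join does not show up when the associations are loaded by a separate query.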
Note I'm not saying that joins are bad, just saying that this is a more natural way to do it in rails. I like to use the Preloader http://apidock.com/rails/ActiveRecord/Associations/Preloader to separate out the decision about what to eager load from the decision about which data to fetch. I find that helpful in controllers - let the model decide what the conditions are, but let the controller decide which objects it'll need to eager load.
HTH

Related

How to connect ransacker query to ransack sort search parameter

Problem:
I am using the ransack gem to sort columns in a table. I have 2 models: Campaign and Course. A campaign has many courses, and a course belongs to one campaign. Each course has a number of total_attendees. My Campaigns table has a column for Total Attendees, and I want it to be sortable. So it would sum up the total_attendees field for each course that belongs to a single campaign, and sort based on that sum.
Ex. A campaign has 3 courses, each with 10 attendees. The Total Attendees column on the campaign table would show 30 and it would be sortable against total attendees for all the other campaigns.
I found ransackers:
https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers
and this SO question: Ransack sort by sum of relation
and from that put together a lot of what is below.
From Model - campaign.rb:
class Campaign < ApplicationRecord
  has_many :courses

  ransacker :sum_of_total_attendees do
    query = "SELECT SUM(r.total_attendees)
             FROM campaigns c
             LEFT OUTER JOIN courses r
             ON r.campaign_id = c.id
             GROUP BY c.id"
    Arel.sql(query)
  end
end
From Model - course.rb:
class Course < ApplicationRecord
  belongs_to :campaign, optional: true
end
View:
<th scope="col"><%= sort_link(@q, :sum_of_total_attendees, 'Total Attendees') %></th>
Controller - campaigns_controller.rb:
all_campaigns = Campaign.all
@q = all_campaigns.ransack(params[:q])
@campaigns = @q.result
Errors:
The ransacker query gives me the data I want, but I don't know what to do to get the table to sort on it.
Originally, when I clicked on the th link to sort the data, I got this error:
PG::CardinalityViolation: ERROR: more than one row returned by a
subquery used as an expression
I don't know what changed, but now I'm getting this error:
PG::SyntaxError: ERROR: syntax error at or near "SELECT"
LINE 1: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT SUM(r....
^
: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT
SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id ASC
This error seems to say that the ransack search parameter, @q, and the ransacker query don't work together. There are two SELECTs in this request when there should definitely be only one, but the first one is coming from ransack, so I'm not sure how to address it.
How do I get my query to sort correctly with ransack?
Articles I've looked at but did not seem to apply to what I was looking to accomplish with this story:
Ransack Sort By Sum of Relation: This is the one I worked from a lot, but I'm not sure why it works for this user and not for me. They don't show what is changed, if anything, in the controller
Ransack Github Issue For Multiple Params: This doesn't cover the issue of summing table columns.
Rails Ransack Sorting Searching Based On A Definition In The Model: This didn't apply to my need to sort based on summed data.
Three Ways to Bend The Ransack Gem: This looks like what I was doing, but I'm not sure why theirs is working but mine isn't.

Rails 5 - ActiveRecord - Filtering has_many relation with includes and not exclude empty values

Let me explain the title.
I have model A that has_many model B.
I want to filter model B by month and year of the date while showing all model A's
so for example:
A1 -> 3 Bs
A2 -> 0 Bs
A3 -> 1 B
This is my query right now:
A.includes(:b_relation)
.where("extract(month from b.date) = #{month}").references(:b_relation)
.where("extract(year from b.date) = #{year}").references(:b_relation)
.all
It works! BUT it only gives me the A that has at least one B. The A's that have none don't show.
How can I make the query include the model A's that don't have any B's?
The query you're doing now is using an INNER JOIN, which will exclude records from A that have no associated Bs. What you want instead is a LEFT OUTER JOIN, aka a LEFT JOIN. Left joins include all rows from the parent table, whether or not there are any associated records from the associated table.
Rails 5 has a left_outer_joins method for this (alias: left_joins):
A.left_outer_joins(:b_relation)
In earlier versions of Rails, it's more manual (I'm just making up the table names here):
A.joins('LEFT OUTER JOIN "bs" ON "bs"."a_id" = "as"."id"')
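One caveat worth checking when combining a left join with the month/year filter from the question: a condition on b's columns that sits in the WHERE clause is evaluated after the join, so rows where b is NULL fail it and the A's without B's disappear again. Putting the condition in the join's ON clause keeps them. The sketch below models both placements in plain Ruby rather than SQL; A_ROWS, B_ROWS and the helper names are invented for illustration.

```ruby
# Made-up data: A1 has 3 Bs, A2 has none, A3 has 1 (as in the question).
A_ROWS = [{ id: 1 }, { id: 2 }, { id: 3 }]
B_ROWS = [
  { id: 10, a_id: 1, month: 5 },
  { id: 11, a_id: 1, month: 6 },
  { id: 13, a_id: 1, month: 5 },
  { id: 12, a_id: 3, month: 6 }
]

# LEFT OUTER JOIN: every left row appears at least once; a left row with no
# match is paired with nil. The block plays the role of extra ON conditions.
def left_join(as, bs)
  as.flat_map do |a|
    matches = bs.select { |b| b[:a_id] == a[:id] && yield(b) }
    matches.empty? ? [[a, nil]] : matches.map { |b| [a, b] }
  end
end

# Filter applied after the join (like WHERE): nil partners fail the test.
def a_ids_with_where_filter(month)
  left_join(A_ROWS, B_ROWS) { |_b| true }
    .select { |_a, b| b && b[:month] == month }
    .map { |a, _b| a[:id] }.uniq
end

# Filter applied while matching (like ON): unmatched A's are kept with nil.
def a_ids_with_on_filter(month)
  left_join(A_ROWS, B_ROWS) { |b| b[:month] == month }
    .map { |a, _b| a[:id] }.uniq
end

puts a_ids_with_where_filter(5).inspect  # [1]
puts a_ids_with_on_filter(5).inspect     # [1, 2, 3]
```

In SQL terms the second variant corresponds to appending the date condition to the ON clause of the hand-written LEFT OUTER JOIN string, with AND, instead of passing it to where.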

ActiveRecord query to sum over joins

I want to get the sum of the receipt items that are in a particular budget (same title), but my current query returns too many records and therefore an obviously wrong sum of the receipt item amounts.
My current attempt is looking like that in ActiveRecord (AR):
ReceiptItem.includes(donation: [:budgets]).joins(:donation, :receipt).where(budgets: {title: "Some title 2015"}).sum(:amount)
and my SQL attempt looked like this (it's also wrong):
-- only testing which rows come back; this does not sum the amounts yet
SELECT "receipt_items"."amount"
FROM
"receipt_items" INNER JOIN "donations" ON "donations"."id" = "receipt_items"."donation_id"
RIGHT JOIN "receipts" ON "receipts"."receipt_id" = "receipt_items"."receipt_id"
LEFT OUTER JOIN "budgets" ON "budgets"."donation_id" = "donations"."id"
WHERE "budgets"."title" = 'Some title 2015';
Why am I getting duplicate records even though I've joined the tables and also set the condition?
Here is the ER model to understand the problem.
And here are the AR associations:
class Budget < ActiveRecord::Base
  belongs_to :donation
end

class Donation < ActiveRecord::Base
  has_many :receipt_items
  has_many :budgets
end

class ReceiptItem < ActiveRecord::Base
  belongs_to :donation
end
Because a budget can be linked to a receipt item multiple times, via different donations, it appears in the big join table multiple times, and is therefore counted several times.
Let's try to think this through a step at a time. If you wanted to do it without worrying about eager loading, you would do:
Budget.where(title: "some title").all.collect(&:donation).collect(&:receipt_items).flatten.uniq.collect(&:amount).sum
Is that right?
If so, you can tailor the eager loading to fit this chain of method calls:
Budget.where(title: "some title").includes(:donation => [:receipt_items]).all.collect(&:donation).collect(&:receipt_items).flatten.uniq.collect(&:amount).sum
Try that?
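The double counting can be reproduced without a database. The sketch below, using invented rows and hypothetical helper names, joins receipt items to budgets through donation_id in plain Ruby: when one donation carries two budgets with the same title, each of its receipt items shows up twice in the joined rows, and de-duplicating by receipt item id before summing gives the expected total.

```ruby
# Made-up rows: donation 1 has two budgets with the same title.
RECEIPT_ITEMS = [
  { id: 1, donation_id: 1, amount: 100 },
  { id: 2, donation_id: 2, amount: 50 }
]
BUDGETS = [
  { id: 1, donation_id: 1, title: "Some title 2015" },
  { id: 2, donation_id: 1, title: "Some title 2015" },
  { id: 3, donation_id: 2, title: "Other" }
]

# One output row per (receipt item, matching budget) pair, like an inner join
# through donation_id filtered on the budget title.
def joined_rows(items, budgets, title)
  items.flat_map do |item|
    budgets
      .select { |b| b[:donation_id] == item[:donation_id] && b[:title] == title }
      .map { |b| { item: item, budget: b } }
  end
end

# Summing straight over the joined rows counts item 1 twice.
def naive_sum(items, budgets, title)
  joined_rows(items, budgets, title).sum { |row| row[:item][:amount] }
end

# Collapsing to distinct receipt items first counts each amount once.
def distinct_sum(items, budgets, title)
  joined_rows(items, budgets, title)
    .map { |row| row[:item] }
    .uniq { |item| item[:id] }
    .sum { |item| item[:amount] }
end

puts naive_sum(RECEIPT_ITEMS, BUDGETS, "Some title 2015")     # 200
puts distinct_sum(RECEIPT_ITEMS, BUDGETS, "Some title 2015")  # 100
```

The .flatten.uniq step in the chain above plays the same role as distinct_sum here.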

Rails: select unique values from 2 models with belongs_to and has_many relation

I have 2 models: Player and Item
Player Model:
class Player < ActiveRecord::Base
  has_many :items
end
Item Model:
class Item < ActiveRecord::Base
  belongs_to :player
end
I'm trying to store an array of unique players (only their id and names) that created the 5 most recent items.
It works if I make a loop and do the query one by one
Item.order('created_at DESC').limit(5).first.player.id
Item.order('created_at DESC').limit(5).first.player.name
But I'm wondering if there's a way to use pluck to do the query without a loop and also only take unique values of players?
Thanks!
Do you want to get an array of unique players that created the 5 most recent items or do you want to get an array of the 5 most recent unique players that created an item?
The latter will always return 5 players (when you've got 5 or more players that have created items), whereas the former might return only a single player (since they could have created all 5 most recent items).
If you want the players of the 5 most recent items, try this:
player_ids = Item.order(created_at: :desc).limit(5).pluck(:player_id).uniq
Player.where(id: player_ids).pluck(:id, :name)
If you want the 5 most recent unique players that've created an item, try this:
player_ids = Item.order(created_at: :desc).pluck(:player_id).uniq.first(5)
Player.where(id: player_ids).pluck(:id, :name)
You could even use a join with the player association. Note that you've got to use the table name in your pluck now:
Item.joins(:player).order(created_at: :desc).limit(5).pluck('players.id', 'players.name').uniq
or
Item.joins(:player).order(created_at: :desc).pluck('players.id', 'players.name').uniq.first(5)
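The difference between the two readings comes down to whether the limit is applied before or after de-duplicating. A small plain-Ruby sketch with an invented list of player ids (newest item first) shows this; the constant and method names are hypothetical.

```ruby
# player_id of each item, newest first (made-up data).
PLAYER_IDS_NEWEST_FIRST = [7, 7, 7, 3, 7, 3, 9, 4, 2, 8]

# Players behind the n most recent items: limit first, then de-duplicate.
def players_of_recent_items(ids, n)
  ids.first(n).uniq
end

# The n most recent distinct players: de-duplicate first, then limit.
def recent_unique_players(ids, n)
  ids.uniq.first(n)
end

puts players_of_recent_items(PLAYER_IDS_NEWEST_FIRST, 5).inspect  # [7, 3]
puts recent_unique_players(PLAYER_IDS_NEWEST_FIRST, 5).inspect    # [7, 3, 9, 4, 2]
```

Note that the second form has to read every item's player_id before trimming to five, so on a large items table it may be worth measuring.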

rails ancestry pagination

I've just followed the Railscast tutorial:
http://railscasts.com/episodes/262-trees-with-ancestry
Is it possible to paginate results from Ancestry which have been arranged?
eg: Given I have the following in my Message controller:
def index
  @messages = Message.arrange(:order => :name)
end
Then how would I paginate this as it's going to result in a hash?
Update
I found that if I use .keys then it will paginate, but only the top level not the children.
Message.scoped.arrange(:order => :name).keys
Update
Each message has a code and some content. Messages can be nested.
Suppose I have
code - name
1 - Test1
  1 - test1 sub1
  2 - test1 sub2
2 - Test2
  1 - test2 sub1
  2 - test2 sub2
  3 - test2 sub3
This is how I want to display the listing, but I also want to paginate this sorted tree.
It is possible, but I've only managed to do it using two database trips.
The main issue stems from not being able to set limits on a node's children, which leads to either a node's children being truncated or children being orphaned on subsequent pages.
An example:
id: 105, Ancestry: Null
id: 117, Ancestry: 105
id: 118, Ancestry: 105/117
id: 119, Ancestry: 105/117/118
A LIMIT 0,3 (for the sake of the example above) would return the first three records, which will render all but id:119. The subsequent LIMIT 3,3 will return id: 119 which will not render correctly as its parents are not present.
One solution I've employed is using two queries:
The first returns root nodes only. These can be sorted and it is this query that is paginated.
A second query is issued, based on the first, which returns all children of the paginated parents. You should be able to sort children per level.
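The two steps above can be sketched in plain Ruby with an in-memory list standing in for the table. The ancestry strings follow the example shown earlier (parent ids joined with "/"); the extra rows and the helper names paginate_roots, descendants_of and page_of_tree are assumptions made for illustration and are not part of the ancestry gem.

```ruby
# Made-up posts; ancestry is nil for roots, otherwise the path of parent ids.
POSTS = [
  { id: 105, ancestry: nil },
  { id: 117, ancestry: "105" },
  { id: 118, ancestry: "105/117" },
  { id: 119, ancestry: "105/117/118" },
  { id: 200, ancestry: nil },
  { id: 201, ancestry: "200" },
  { id: 300, ancestry: nil }
]

# Step 1: paginate root nodes only (page numbers start at 1).
def paginate_roots(posts, page:, per_page:)
  posts.select { |p| p[:ancestry].nil? }.each_slice(per_page).to_a.fetch(page - 1, [])
end

# Step 2: fetch every descendant of the roots on this page. A descendant's
# ancestry is either exactly the root id or starts with "<root id>/".
def descendants_of(posts, roots)
  posts.select do |p|
    roots.any? do |r|
      p[:ancestry] == r[:id].to_s || p[:ancestry].to_s.start_with?("#{r[:id]}/")
    end
  end
end

# Roots of the requested page followed by all of their descendants.
def page_of_tree(posts, page:, per_page:)
  roots = paginate_roots(posts, page: page, per_page: per_page)
  roots + descendants_of(posts, roots)
end

puts page_of_tree(POSTS, page: 1, per_page: 2).map { |p| p[:id] }.inspect
puts page_of_tree(POSTS, page: 2, per_page: 2).map { |p| p[:id] }.inspect
```

Because only roots are counted against the page size, a subtree is never split across pages. Matching on the root id followed by a slash keeps root 105 from also picking up the children of a hypothetical root 1050.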
In my case, I have a Post model (which has_ancestry). Each post can have any level of replies. A post object also has a replies count, which is a counter cache for its immediate children.
In the controller:
roots = @topic.posts.roots_only.paginate :page => params[:page]
@posts = Post.fetch_children_for_roots(@topic, roots)
In the Post model:
named_scope :roots_only, :conditions => 'posts.ancestry is null'

def self.fetch_children_for_roots(postable, roots)
  unless roots.blank?
    # One condition per root that has replies. Matching '<id>' or '<id>/%'
    # avoids also matching roots whose id merely starts with the same digits.
    condition = roots.select { |r| r.replies_count > 0 }.collect { |r| "(ancestry = '#{r.id}' or ancestry like '#{r.id}/%')" }.join(' or ')
    unless condition.blank?
      children = postable.posts.scoped(:from => 'posts FORCE INDEX (index_posts_on_ancestry)', :conditions => condition).all
      roots.concat children
    end
  end
  roots
end
Some notes:
MySQL will stop using the ancestry column index if multiple LIKE statements are used. The FORCE INDEX forces MySQL to use the index and prevents a full table scan.
LIKE statements are only built for nodes with direct children, so that replies_count column came in handy.
What the class method does is append the children to roots, which is a WillPaginate::Collection.
Finally, these can be managed in your view:
=will_paginate @posts
-Post.arrange_nodes(@posts).each do |post, replies|
  =do stuff here
The key method here is arrange_nodes which is mixed in from the ancestry plugin and into your model. This basically takes a sorted Array of nodes and returns a sorted and hierarchical Hash.
I appreciate that this method does not directly address your question but I hope that the same method, with tweaks, can be applied for your case.
There is probably a more elegant way of doing this but overall I'm happy with the solution (until a better one comes along).