I'm trying to figure out an efficient way to select records that have attributes across multiple tables. Here's the basic setup:
structure
Plants (fields: id, name_id, location_id, color) (1000 records)
Names (fields: id, Common_name) (50 records)
Location (fields: id, Bed_name) (125 records)
model
Plants - belongs_to Names, belongs_to Location
Names - has_many Plants
Location - has_many Plants
My goal is to output a list of every Rose in the side yard, and display the color, but I am stuck on the select command. If I get all plants (plants = Plant.all), I know that I can easily create my output for each plant p with a statement like <%= "#{p.name.common_name} in bed #{p.location.bed_name} has a color of #{p.color}" %>
If I do two joins I'm looking at way more records than I need and a MUCH longer search time. As an example: I have 67 roses in 16 different beds, but only 3 roses in the side yard.
My gut tells me that I should be able to do something like:
select all plants with the name of Rose, then from this selection select all Roses that are in the side yard.
Can anybody help point me in the correct direction?
You can combine it all into a single query like this:
Plant.joins(:name, :location).where(names: { common_name: "rose" }, locations: { bed_name: "side" })
This results in a single SQL query like this:
SELECT "plants".* FROM "plants" INNER JOIN "names" ON "names"."id" = "plants"."name_id" INNER JOIN "locations" ON "locations"."id" = "plants"."location_id" WHERE "names"."common_name" = 'rose' AND "locations"."bed_name" = 'side'
Note that the keys in the where clause are table names (plural), while joins takes association names (singular here, because both are belongs_to associations).
This should run quickly even on large tables, provided the foreign keys (plants.name_id, plants.location_id) and the filtered columns (names.common_name, locations.bed_name) are indexed.
This is a simple example, but you can do fairly complex joins with conditions. Full details can be found in the ActiveRecord documentation.
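As a rough, database-free illustration of what the joined query selects, here is a plain-Ruby sketch. The structs, the sample rows and the plants_matching helper are all invented for this example and are not part of ActiveRecord; the sketch only mirrors the filtering logic of the two inner joins plus the where clause.

```ruby
# In-memory stand-ins for the three tables (sample data is made up).
Name     = Struct.new(:id, :common_name)
Location = Struct.new(:id, :bed_name)
Plant    = Struct.new(:id, :name_id, :location_id, :color)

NAMES     = [Name.new(1, "rose"), Name.new(2, "tulip")].freeze
LOCATIONS = [Location.new(1, "side"), Location.new(2, "front")].freeze
PLANTS    = [
  Plant.new(1, 1, 1, "red"),
  Plant.new(2, 1, 2, "white"),
  Plant.new(3, 2, 1, "yellow"),
  Plant.new(4, 1, 1, "pink")
].freeze

# Mimics INNER JOIN names + INNER JOIN locations + WHERE on both joined tables.
def plants_matching(common_name, bed_name)
  names_by_id     = NAMES.to_h { |n| [n.id, n] }
  locations_by_id = LOCATIONS.to_h { |l| [l.id, l] }

  PLANTS.select do |plant|
    name     = names_by_id[plant.name_id]
    location = locations_by_id[plant.location_id]
    # An inner join drops plants whose foreign key has no matching row.
    name && location &&
      name.common_name == common_name && location.bed_name == bed_name
  end
end

plants_matching("rose", "side").each do |plant|
  puts "rose in bed side has a color of #{plant.color}"
end
```

Both conditions are applied together, which is the "select the roses, then narrow to the side yard" idea from the question expressed as one pass.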
Edit
Per @Dan's comment, you can speed this up more by using includes to pre-fetch the association data in the join:
Plant.includes(:name, :location).where(names: { common_name: "rose" }, locations: { bed_name: "side" })
This loads the related records from names and locations along with the plants. includes is handy for eliminating (or at least reducing) N+1 queries. It decides for itself whether to fetch everything in a single joined query or to fall back to separate queries, so you normally don't have to think about it. It can occasionally be slower than a plain join, though, so keep an eye on your logs if you suspect it is hurting performance.
Using includes in this case is very efficient, resulting in a single SQL query which includes association data:
SELECT "plants"."id" AS t0_r0, "plants"."color" AS t0_r1, "plants"."name_id" AS t0_r2, "plants"."location_id" AS t0_r3, "plants"."created_at" AS t0_r4, "plants"."updated_at" AS t0_r5, "names"."id" AS t1_r0, "names"."common_name" AS t1_r1, "names"."created_at" AS t1_r2, "names"."updated_at" AS t1_r3, "locations"."id" AS t2_r0, "locations"."bed_name" AS t2_r1, "locations"."created_at" AS t2_r2, "locations"."updated_at" AS t2_r3 FROM "plants" LEFT OUTER JOIN "names" ON "names"."id" = "plants"."name_id" LEFT OUTER JOIN "locations" ON "locations"."id" = "plants"."location_id" WHERE "names"."common_name" = 'rose' AND "locations"."bed_name" = 'side'
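To make the N+1 point concrete, here is a small plain-Ruby simulation. FakeNamesTable, PLANT_NAME_IDS and the two lookup helpers are hypothetical names made up for this sketch; nothing here talks to a real database or uses ActiveRecord, it only counts how many lookups each strategy would issue.

```ruby
# Counts simulated "queries" so the two loading strategies can be compared.
class FakeNamesTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # { id => common_name }
    @query_count = 0
  end

  # One simulated query per call: SELECT ... WHERE id = ?
  def find(id)
    @query_count += 1
    @rows[id]
  end

  # One simulated query for the whole batch: SELECT ... WHERE id IN (...)
  def find_all(ids)
    @query_count += 1
    @rows.slice(*ids.uniq)
  end
end

PLANT_NAME_IDS = [1, 1, 2, 1, 3].freeze # name_id of five hypothetical plants

# N+1 style: one name lookup per plant.
def lazy_lookup(table)
  PLANT_NAME_IDS.map { |id| table.find(id) }
end

# Preloaded style: fetch all needed names once, then read from memory.
def preloaded_lookup(table)
  names = table.find_all(PLANT_NAME_IDS)
  PLANT_NAME_IDS.map { |id| names[id] }
end

lazy = FakeNamesTable.new({ 1 => "rose", 2 => "tulip", 3 => "iris" })
lazy_lookup(lazy)
puts "lazy lookups issued #{lazy.query_count} name queries"

eager = FakeNamesTable.new({ 1 => "rose", 2 => "tulip", 3 => "iris" })
preloaded_lookup(eager)
puts "preloaded lookups issued #{eager.query_count} name query"
```

The lazy version issues one lookup per plant, while the preloaded version issues one lookup in total, which is the saving includes is after.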
Problem:
I am using the ransack gem to sort columns in a table. I have 2 models: Campaign and Course. A campaign has many courses, and a course belongs to one campaign. Each course has a number of total_attendees. My Campaigns table has a column for Total Attendees, and I want it to be sortable. So it would sum up the total_attendees field for each course that belongs to a single campaign, and sort based on that sum.
Ex. A campaign has 3 courses, each with 10 attendees. The Total Attendees column on the campaign table would show 30 and it would be sortable against total attendees for all the other campaigns.
I found ransackers:
https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers
and this SO question: Ransack sort by sum of relation
and from that put together a lot of what is below.
From Model - campaign.rb:
class Campaign < ApplicationRecord
has_many :courses
ransacker :sum_of_total_attendees do
query = "SELECT SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id"
Arel.sql(query)
end
end
From Model - course.rb:
class Course < ApplicationRecord
belongs_to :campaign, optional: true
end
View:
<th scope="col"><%= sort_link(@q, :sum_of_total_attendees, 'Total Attendees') %></th>
Controller - campaigns_controller.rb:
all_campaigns = Campaign.all
@q = all_campaigns.ransack(params[:q])
@campaigns = @q.result
Errors:
The ransacker query gives me the data I want when run on its own, but I don't know how to get the right result when it is used for sorting.
Originally, when I clicked on the th link to sort the data, I got this error:
PG::CardinalityViolation: ERROR: more than one row returned by a
subquery used as an expression
I don't know what changed, but now I'm getting this error:
PG::SyntaxError: ERROR: syntax error at or near "SELECT"
LINE 1: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT SUM(r....
^
: SELECT "campaigns".* FROM "campaigns" ORDER BY SELECT
SUM(r.total_attendees)
FROM campaigns c
LEFT OUTER JOIN courses r
ON r.campaign_id = c.id
GROUP BY c.id ASC
This error seems to say that the ransack search object, @q, and the ransacker query don't work together. There are two SELECTs in this request when there should definitely be only one, but the first one is coming from ransack, so I'm not sure how to address it.
How do I get my query to sort correctly with ransack?
Articles I've looked at but did not seem to apply to what I was looking to accomplish with this story:
Ransack Sort By Sum of Relation: This is the one I worked from a lot, but I'm not sure why it works for this user and not for me. They don't show what is changed, if anything, in the controller.
Ransack Github Issue For Multiple Params: This doesn't cover the issue of summing table columns.
Rails Ransack Sorting Searching Based On A Definition In The Model: This didn't apply to my need to sort based on summed data.
Three Ways to Bend The Ransack Gem: This looks like what I was doing, but I'm not sure why theirs is working but mine isn't.
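Both error messages above are consistent with how the expression ends up in the ORDER BY clause: a subquery used as an expression needs to be wrapped in parentheses (hence the syntax error), and it has to return a single value per campaigns row (hence the cardinality violation when it groups over every campaign). A minimal sketch of one possible fix, assuming a subquery correlated on campaigns.id, is below. The constant and helper names are invented for illustration, the COALESCE is an added assumption so campaigns without courses sort as 0, and this has not been run against ransack itself.

```ruby
# Scalar subquery: exactly one value per campaign row, because it is
# correlated on campaigns.id instead of grouping over every campaign.
SUM_OF_TOTAL_ATTENDEES_SQL = <<~SQL.strip
  (SELECT COALESCE(SUM(courses.total_attendees), 0)
     FROM courses
    WHERE courses.campaign_id = campaigns.id)
SQL

# Hypothetical helper that shows the ORDER BY clause this expression yields.
def campaigns_sorted_by_attendees_sql(direction = "ASC")
  requested = direction.to_s.upcase
  dir = %w[ASC DESC].include?(requested) ? requested : "ASC"
  %(SELECT "campaigns".* FROM "campaigns" ORDER BY #{SUM_OF_TOTAL_ATTENDEES_SQL} #{dir})
end

puts campaigns_sorted_by_attendees_sql("desc")

# In the model, the ransacker would then return the same expression:
#   ransacker :sum_of_total_attendees do
#     Arel.sql(SUM_OF_TOTAL_ATTENDEES_SQL)
#   end
```

With the parentheses in place, the generated statement contains one outer SELECT and one nested scalar subquery rather than two bare SELECTs.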
I am not the best at UML/database diagrams but the following hopefully shows my DB design (MsSQL)
I have a "Location" table with zero to many pallets associated with it (there are 0-many pallets IN a location). However, that location can be one of two types, location x or y. This diagram simplifies it, but there are many different types of location, and very different fields for each type.
I am using Sequelize as an ORM, and trying to figure out how to do a particular query. I think I am close but quite stuck.
What I need is:
Select a single LocationTypeX where "active" is true, and where its corresponding Location has less than 10 pallets in it.
Previously I fetched all LocationTypeX where "active" is true, included Location and Pallet (on location), and did it all in code to figure out which location is relevant. However, that is taking forever, as there are thousands of Locations and loads of pallets spread out through them.
All I am after is to show the Location Name. That is it. But the Location name of one that matches the above condition. Hopefully someone can help?
So far I have
models.Location.findAll({
group: ['Location.id', 'Pallets.id'],
attributes: ['Location.id', 'Pallets.id', [models.sequelize.fn('COUNT',models.sequelize.col('Pallets.id')), 'PalletCount']],
include: [{
model: models.Pallet,
attributes: []
}]
}).then((ret)=> {
console.log(ret);
});
But this doesn't do the "active" check, and it doesn't apply the where clause on the number of pallets either. Back to square one.
I have managed to answer my own question with raw SQL that I can use in Sequelize. However I would still be interested to know how to do it in Sequelize
models.sequelize.query(
`SELECT TOP 1 Locations.id, Locations.name FROM LocationTypeX
INNER JOIN Locations ON LocationTypeX.LocationId = Locations.id
LEFT JOIN Pallets on (Locations.id = Pallets.LocationId)
WHERE LocationTypeX.active = 1
GROUP BY Locations.id, Locations.name
HAVING COUNT(Pallets.id) < 10`
, { type: models.sequelize.QueryTypes.SELECT})
I have a mildly-complex ActiveRecord query in Rails 3.2 / Postgres that returns documents that are related and most relevant to all documents a user has favorited in the past.
The problem is that despite specifying uniq my query does not return distinct document records:
Document.joins("INNER JOIN related_documents ON
documents.docid = related_documents.docid_id")
.select("documents.*, related_documents.relevance_score")
.where("related_documents.document_id IN (?)",
some_user.favorited_documents)
.order("related_documents.relevance_score DESC")
.uniq
.limit(10)
I use a RelatedDocument join table, ranking each relation by a related_document.relevance_score which I use to order the query result before sampling the top 10. (See this question for schema description.)
The problem is that because I select("documents.*, related_documents.relevance_score"), the same document record returned multiple times with different relevance_scores is treated as a set of distinct results (i.e. when the document is a related_document for multiple favorited documents).
How do I return unique Documents regardless of the related_document.relevance_score?
I have tried splitting the select into two separate selects, and changing the position of uniq in the query, with no success.
Unfortunately I must select("related_documents.relevance_score") so as to order the results by this field.
Thanks!
UPDATE - SOLUTION
Thanks to Jethroo below, GROUP BY is the needed addition, giving me the following working query:
Document.joins("INNER JOIN related_documents ON
documents.docid = related_documents.docid_id")
.select("documents.*, max(related_documents.relevance_score)")
.where("related_documents.document_id IN (?)",
some_user.favorited_documents)
.order("max(related_documents.relevance_score) DESC")
.group("documents.id")
.uniq
.limit(10)
Have you tried grouping by documents.docid? See http://guides.rubyonrails.org/active_record_querying.html#group
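To illustrate what grouping buys here, the following plain-Ruby sketch collapses duplicate document rows to their best score, then sorts and limits. The row data and the top_documents helper are invented for this example, and ordering on the aggregated (maximum) score is an assumption about the intended behaviour.

```ruby
# Rows as the join would return them: [document_id, relevance_score].
# The sample data is invented.
JOINED_ROWS = [
  [101, 0.40], [102, 0.90], [101, 0.75], [103, 0.10], [102, 0.20]
].freeze

# Mimics: GROUP BY documents.id, SELECT max(relevance_score),
#         ORDER BY that max DESC, LIMIT n.
def top_documents(rows, limit)
  rows
    .group_by { |doc_id, _score| doc_id }
    .map { |doc_id, group| [doc_id, group.map { |_id, score| score }.max] }
    .sort_by { |_doc_id, best| -best }
    .first(limit)
end

top_documents(JOINED_ROWS, 10).each do |doc_id, best|
  puts "document #{doc_id}: best relevance #{best}"
end
```

Each document appears once, carrying the highest relevance score among its related rows, which is the effect of adding the GROUP BY with max.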
I am making scopes for a model that looks something like this:
class PressRelease < ActiveRecord::Base
has_many :publications
end
What I want to get is all press_releases that do not have publications, but from a scope method, so it can be chained with other scopes. Any ideas?
Thanks!
NOTE: I know that there are methods like present? or any? and so on, but these methods do not return an ActiveRecord::Relation as a scope does.
NOTE: I am using RoR 3
Avoid eager_loading if you do not need it (it adds overhead). Also, there is no need for subselect statements.
scope :without_publications, -> { joins("LEFT OUTER JOIN publications ON publications.press_release_id = press_releases.id").where(publications: { id: nil }) }
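The LEFT OUTER JOIN plus IS NULL pattern in that scope is an anti-join: keep the left-hand rows for which no right-hand row matched. A plain-Ruby sketch of the same idea follows; the structs, sample rows and the without_publications helper are made up for illustration and do not involve ActiveRecord.

```ruby
PressReleaseRow = Struct.new(:id, :title)
PublicationRow  = Struct.new(:id, :press_release_id)

SAMPLE_RELEASES = [
  PressReleaseRow.new(1, "Launch"),
  PressReleaseRow.new(2, "Funding"),
  PressReleaseRow.new(3, "Hiring")
].freeze

SAMPLE_PUBLICATIONS = [
  PublicationRow.new(10, 1),
  PublicationRow.new(11, 1),
  PublicationRow.new(12, 3)
].freeze

# Mimics LEFT OUTER JOIN publications ... WHERE publications.id IS NULL.
def without_publications(press_releases, publications)
  by_release = publications.group_by(&:press_release_id)
  press_releases.select do |release|
    # A release with no matching publication would get a NULL-filled
    # right-hand side from the outer join; here that is an absent key.
    !by_release.key?(release.id)
  end
end

without_publications(SAMPLE_RELEASES, SAMPLE_PUBLICATIONS).each do |release|
  puts "#{release.title} has no publications"
end
```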
Explanation and response to comments
My initial thought about eager loading overhead was that ActiveRecord would instantiate all the child records (publications) for each press release. Then I realized that the query will never return press release records with publications, so that is a moot point.
There are some points and observations to be made about the way ActiveRecord works. Some things I had previously learned from experience, and some things I learned exploring your question.
The query from includes(:publications).where(publications: {id: nil}) is actually different from my example. It will return all columns from the publications table in addition to the columns from press_releases. The publication columns are completely unnecessary because they will always be null. However, both queries ultimately result in the same set of PressRelease objects.
With the includes method, if you add any sort of limit, for example chaining .first, .last or .limit(), then ActiveRecord (4.2.4) will resort to executing two queries. The first query returns IDs, and the second query uses those IDs to get results. Using the SQL snippet method, ActiveRecord is able to use just one query. Here is an example of this from one of my applications:
Profile.includes(:positions).where(positions: { id: nil }).limit(5)
# SQL (0.8ms) SELECT DISTINCT "profiles"."id" FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL LIMIT 5
# SQL (0.8ms) SELECT "profiles"."id" AS t0_r0, ..., "positions"."end_year" AS t1_r11 FROM "profiles" LEFT OUTER JOIN "positions" ON "positions"."profile_id" = "profiles"."id" WHERE "positions"."id" IS NULL AND "profiles"."id" IN (107, 24, 7, 78, 89)
Profile.joins("LEFT OUTER JOIN positions ON positions.profile_id = profiles.id").where(positions: { id: nil }).limit(5)
# Profile Load (1.0ms) SELECT "profiles".* FROM "profiles" LEFT OUTER JOIN positions ON positions.profile_id = profiles.id WHERE "positions"."id" IS NULL LIMIT 5
Most importantly
eager_loading and includes were not intended to solve the problem at hand. And for this particular case I think you are much more aware of what is needed than ActiveRecord is. You can therefore make better decisions about how to structure the query.
You can do the following in your PressRelease model:
scope :your_scope, -> { where('id NOT IN(select press_release_id from publications)') }
This will return all PressRelease records without publications.
There are a couple of ways to do this. The first one requires two DB queries:
PressRelease.where.not(id: Publication.uniq.pluck(:press_release_id))
or if you don't want to hardcode association foreign key:
PressRelease.where.not(id: PressRelease.uniq.joins(:publications).pluck(:id))
Another one is to do a left join and pick those without associated elements - you get a relation object, but it will be tricky to work with it as it already has a join on it:
PressRelease.eager_load(:publications).where(publications: {id: nil})
Another option is the counter_cache feature. You will need to add a publications_count column to your press_releases table.
class Publication < ActiveRecord::Base
belongs_to :press_release, counter_cache: true
end
Rails will keep this column in sync with the number of records associated with a given model, so then you can simply do:
PressRelease.where(publications_count: [nil, 0])
There are two models with our familiar one-to-many relationship:
class Custom
has_many :orders
end
class Order
belongs_to :custom
end
I want to do the following work:
Get all customs whose age is over 18, along with how many big orders (over 1,000 dollars) each of them has.
UPDATE:
for the models:
rails g model custom name:string age:integer
rails g model orders amount:decimal custom_id:integer
I hope a single LEFT JOIN SQL statement can do the whole job without constructing unnecessary objects, unlike this:
Custom.where('age > ?', '18').includes(:orders).where('orders.amount > ?', '1000')
It constructs a lot of Order objects which I don't need, and it calculates the count with Array#count, which wastes time.
UPDATE 2:
My own solution is wrong: it removes customs who don't have big orders from the result.
Finding adult customers with big orders
This solution uses a single query, with the nested orders relation transformed into a sub-query.
big_customers = Custom.where("age > ?", "18").where(
id: Order.where("amount > ?", "1000").select(:custom_id)
)
Grab all adults and their # of big orders (MySQL)
This can still be done in a single query. The count is grabbed via a join on orders and sticking the count of orders into a column in the result called big_orders_count, which ActiveRecord turns into a method. It involves a lot more "raw" SQL. I don't know any way to avoid this with ActiveRecord except with the great squeel gem.
adults = Custom.where("age > ?", "18").select([
Custom.arel_table["*"],
"count(orders.id) as big_orders_count"
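]).joins(%{LEFT JOIN orders
ON orders.custom_id = customs.id
AND orders.amount > 1000})
# GROUP BY gives one row per custom; without it the aggregate collapses the result into a single row
.group("customs.id")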
# see count:
adults.first.big_orders_count
You might want to consider caching counters like this. This join will be expensive on the database, so a dedicated customs.big_order_count column, either refreshed regularly or updated by an observer that watches for big Order records, would let you skip it.
Grab all adults and their # of big orders (PostgreSQL)
Solution 2 is mysql only. To get this to work in postgresql I created a third solution that uses a sub-query. Still one call to the DB :-)
adults = Custom.where("age > ?", "18").select([
%{"customs".*},
%{(
SELECT count(*)
FROM orders
WHERE orders.custom_id = customs.id
AND orders.amount > 1000
) AS big_orders_count}
])
# see count:
adults.first.big_orders_count
I have tested this against postgresql with real data. There may be a way to use more ActiveRecord and less SQL, but this works.
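The point of counting through a LEFT JOIN or a correlated sub-query, rather than filtering orders in the WHERE clause, is that adults with no big orders stay in the result with a count of 0. A plain-Ruby sketch of that behaviour is below; the structs, sample rows and the big_order_counts helper are invented for illustration and use no ActiveRecord.

```ruby
CustomRow = Struct.new(:id, :name, :age)
OrderRow  = Struct.new(:id, :custom_id, :amount)

SAMPLE_CUSTOMS = [
  CustomRow.new(1, "Ann", 30),
  CustomRow.new(2, "Bob", 17),
  CustomRow.new(3, "Cy", 45)
].freeze

SAMPLE_ORDERS = [
  OrderRow.new(1, 1, 1500),
  OrderRow.new(2, 1, 200),
  OrderRow.new(3, 1, 3000),
  OrderRow.new(4, 2, 5000)
].freeze

# Mimics: one row per adult, with a per-row count of orders over big_amount.
def big_order_counts(customs, orders, min_age: 18, big_amount: 1000)
  big_by_custom = orders
    .select { |order| order.amount > big_amount }
    .group_by(&:custom_id)

  customs
    .select { |custom| custom.age > min_age }
    # fetch with a default keeps adults that have no big orders (count 0).
    .map { |custom| [custom.name, big_by_custom.fetch(custom.id, []).size] }
    .to_h
end

big_order_counts(SAMPLE_CUSTOMS, SAMPLE_ORDERS).each do |name, count|
  puts "#{name}: #{count} big orders"
end
```

Cy has no orders at all but still appears with a count of 0, which is exactly what an inner-join-style filter on orders.amount would lose.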
Edited.
@custom_over_18 = Order.joins(:custom).where("customs.age > ?", 18).where("orders.amount > ?", 1000).count