Limit the amount of results in associated queries with Rails 3.2 - sql

I have the following query:
#books = Books.includes(:author, :pages)
.find(:all,
:order => 'created_at DESC')
Let's assume my "Pages" table has fields "words, pictures". For blank pages, field "words" is NULL. There are many "Pages" records per book.
The problem with the above query, is that it retrieves ALL the pages for each book. I would like to retrieve only 1 page record for example with the condition "NOT NULL" on the "words" field. However, I don't want to exclude from the query results the Books that do not match the pages query (I have 10 books in my table and I want 10 books to be retrieved. The book.page association should be "nil" for the books where the condition does not match.)
I hope this makes sense.

Check this SO question:
Rails 3 - Eager loading with conditions
It looks like what you want
class Category
has_many :children, :class_name => "Category",
:foreign_key => "parent_id"
has_many :published_pages, :class_name => "Page",
:conditions => { :is_published => true }
end

If you only want a single blank page to be returned then you could add an association:
has_one :first_blank_page, -> {merge(Page.blanks).limit(1)}, :class_name => "Page"
... where in page.rb ...
def blanks
where(:words => nil)
end
Then you can:
#books = Books.includes(:author, :first_blank_page).order('created_at desc')
... and subsequently reading first_blank_page would be very efficient.
The limit will not be used if you eager load, though, as the SQL syntax for that sort of this would be very complex to execute as one query, so you'd want to consider whether you want to eager load all of the pages per book and then just use one per book. It's a tricky trade-off.

Related

Rails sort collection by association count

I'm working in Rails 4, and have two relevant models:
Account Model
has_many :agent_recalls, primary_key: "id", :foreign_key => "pickup_agent_id", class_name: "Booking"
Hence, queries like Account.find(10).agent_recalls would work.
What I want to do is sort the entire Account collection by this agent_recalls association.
Ideally it'd look something like (but obviously not):
#agents = Account.where(agent: true).order(:agent_recalls)
Question: What's the correct query to output an ordered list, by this agent_recall count?
Well to accomplish what you are looking for you have 2 options:
first, only a query, but it will implied a join, so there will be lost the Accounts that doesn't have any agent_recalls, so i will discard this option
second, i think this one is more appropriate for what you are trying to do
Account.find(:all, :conditions => { :agent => true }, :include => :agent_recalls).sort_by {|a| a. agent_recalls.size}
As you can see is a mix between a query and ruby, hope it helps :)

Rails: complex search on 3 models, return only newest - how to do this?

I'm trying to add an advanced search option to my app in which the user can search for certain links based on attributes from 3 different models.
My app is set up so that a User has_many :websites, Website has_many :links, and Link has_many :stats
I know how create SQL queries with joins or includes etc in Rails but I'm getting stuck since I only want to retrieve the latest stat for each link and not all of them - and I don't know the most efficient way to do this.
So for example, let's say a user has 2 websites, each with 10 links, and each link has 100 stats, that's 2,022 objects total, but I only want to search through 42 objects (only 1 stat per link).
Once I get only those 42 objects in a database query I can add .where("attribute like ?", user_input) and return the correct links.
Update
I've tried adding the following to my Link model:
has_many :stats, dependent: :destroy
has_many :one_stat, class_name: "Stat", order: "id ASC", limit: 1
But this doesn't seem to work, for example if I do:
#links = Link.includes(:one_stat).all
#links.each do |l|
puts l.one_stat.size
end
Instead of getting 1, 1, 1... I get the number of all the stats: 125, 40, 76....
Can I use the limit option to get the results I want or does it not work that way?
2nd Update
I've updated my code according to Erez's advice, but still not working properly:
has_one :latest_stat, class_name: "Stat", order: "id ASC"
#links = Link.includes(:latest_stat)
#links.each do |l|
puts l.latest_stat.indexed
end
=> true
=> true
=> true
=> false
=> true
=> true
=> true
Link.includes(:latest_stat).where("stats.indexed = ?", false).count
=> 6
Link.includes(:latest_stat).where("stats.indexed = ?", true).count
=> 7
It should return 1 and 6, but it's still checking all the stats rather than the latest only.
Sometimes, you gotta break through the AR abstraction and get your SQL on. Just a tiny bit.
Let's assume you have really simple relationships: Website has_many :links, and Link belongs_to :website and has_many :stats, and Stat belongs_to :link. No denormalization anywhere. Now, you want to build a query that finds, all of their links, and, for each link, the latest stat, but only for stats with some property (or it could be websites with some property or links with some property).
Untested, but something like:
Website
.includes(:links => :stats)
.where("stats.indexed" => true)
.where("stats.id = (select max(stats2.id)
from stats stats2 where stats2.link_id = links.id)")
That last bit subselects stats that are part of each link and finds the max id. It then filters out stats (from the join at the top) that don't match that max id. The query returns websites, which each have some number of links, and each link has just one stat in its stats collection.
Some extra info
I originally wrote this answer in terms of window functions, which turned out to be overkill, but I think I should cover it here anyway, since, well, fun. You'll note that the aggregate function trick we used above only works because we're determining which stat to use based on its ID, which exactly the property we need to filter the stats from the join by. But let's say you wanted only the first stat as ranked by some criteria other than ID, such as as, say, number_of_clicks; that trick won't work anymore because the aggregation loses track of the IDs. That's where window functions come in.
Again, totally untested:
Website
.includes(:links => :stats)
.where("stats.indexed" => true)
.where(
"(stats.id, 1) in (
select id, row_number()
over (partition by stats2.id order by stats2.number_of_clicks DESC)
from stat stats2 where stats2.link_id = links.id
)"
)
That last where subselects stats that match each link and order them by number_of_clicks ascending, then the in part matches it to a stat from the join. Note that window queries aren't portable to other database platforms. You could also use this technique to solve the original problem you posed (just swap stats2.id for stats2.number_of_clicks); it could conceivably perform better, and is advocated by this blog post.
I'd try this:
has_one :latest_stat, class_name: "Stat", order: "id ASC"
#links = Link.includes(:latest_stat)
#links.each do |l|
puts l.latest_stat
end
Note you can't print latest_stat.size since it is the stat object itself and not a relation.
Is this what you're looking for?
#user.websites.map { |site| site.links.map { |link| link.stats.last } }.flatten
For a given user, this will return an array with that contains the last stats for the links on that users website.

How to retrieve both "associated" records and "associated through" records in a performant way?

I am using Ruby on Rails 3.1 and I am trying to improve an SQL query in order to retrieve both "associated" records and "associated through" records (ActiveRecord::Associations) in a performant way so to avoid the "N + 1 query problem". That is, I have:
class Article < ActiveRecord::Base
has_many :category_relationships
has_many :categories,
:through => :category_relationships
end
class Category < ActiveRecord::Base
has_many :article_relationships
has_many :articles,
:through => :article_relationships
end
In a couple of SQL queries (that is, in a "performant way", maybe by using the Ruby on Rails includes() method) I would like to retrieve both categories and category_relationships, or both articles and article_relationships.
How can I make that?
P.S.: I am improve queries like the followings:
#category = Category.first
articles = #category.articles.where(:user_id => #current_user.id)
articles.each do |article|
# Note: In this example the 'inspect' method is just a method to "trigger" the
# "eager loading" functionalities
article.category_relationships.inspect
end
You can do
Article.includes(:category_relationships => :categories).find(1)
Which will reduce this to 3 queries (1 for each table). For performance, also make sure your foreign keys have an index.
But in general, I'm curious why the "category_relationships" entity exists at all, and why this isn't a has_and_belongs_to sort of situation?
Updated
As per your changed question, you can still do
Category.includes(:article_relationships => :articles).first
If you watch the console (or tail log/development) you'll see that when you call the associations, it'll hit the cached values and you're golden.
But I am still curious why you're not using a Has and Belongs To Many association.

Creating a news feed in Rails 3

I want to create an activity feed from recent article and comments in my rails app. They are two different types of activerecord (their table structures are different).
Ideally I would be able to create a mixed array of articles and comments and then show them in reverse chronological order.
So, I can figure out how to get an array of both articles and comments and then merge them together and sort by created_at, but I'm pretty sure that won't work as soon as I start using pagination as well.
Is there any way to create a scope like thing that will create a mixed array?
One of the other problems for me, is that it could be all articles and it could be all comments or some combination in between. So I can't just say I'll take the 15 last articles and the 15 last comments.
Any ideas on how to solve this?
When I've done this before I've managed it by having a denormalised UserActivity model or similar with a belongs_to polymorphic association to an ActivitySource - which can be any of the types of content that you want to display (posts, comments, up votes, likes, whatever...).
Then when any of the entities to be displayed are created, you have an Observer that fires and creates a row in the UserActivity table with a link to the record.
Then to display the list, you just query on UserActivity ordering by created_at descending, and then navigate through the polymorphic activity_source association to get the content data. You'll then need some smarts in your view templates to render comments and posts and whatever else differently though.
E.g. something like...
user_activity.rb:
class UserActivity < ActiveRecord::Base
belongs_to :activity_source, :polymorphic => true
# awesomeness continues here...
end
comment.rb (post/whatever)
class Comment < ActiveRecord::Base
# comment awesomeness here...
end
activity_source_observer.rb
class ActivitySourceObserver < ActiveRecord::Observer
observe :comment, :post
def after_create(activity_source)
UserActivity.create!(
:user => activity_source.user,
:activity_source_id => activity_source.id,
:activity_source_type => activity_source.class.to_s,
:created_at => activity_source.created_at,
:updated_at => activity_source.updated_at)
end
def before_destroy(activity_source)
UserActivity.destroy_all(:activity_source_id => activity_source.id)
end
end
Take a look at this railscast.
Then you can paginate 15 articles and in app/views/articles/index you can do something like this:
- #articles.each do |article|
%tr
%td= article.body
%tr
%td= nested_comments article.comment.descendants.arrange(:order => :created_at, :limit => 15)
This assumes the following relations:
#app/models/article.rb
has_one :comment # dummy root comment
#app/models/comment.rb
belongs_to :article
has_ancestry
And you add comments to an article as follows:
root_comment = #article.build_comment
root_comment.save
new_comment = root_comment.children.new
# add reply to new_comment
new_reply = new_comment.children.new
And so forth.

Ruby-on-Rails: How to pull out most recent entries from a limited subset of a database table

Imagine something like a model User who has many Friends, each of who has many Comments, where I'm trying to display to the user the latest 100 comments by his friends.
Is it possible to draw out the latest 100 in a single SQL query, or am I going to have to use Ruby application logic to parse a bigger list or make multiple queries?
I see two ways of going about this:
starting at User.find and use some complex combination of :join and :limit. This method seems promising, but unfortunately, would return me users and not comments, and once I get those back, I'd have lots of models taking up memory (for each Friend and the User), lots of unnecessary fields being transferred (everything for the User, and everything about the name row for the Friends), and I'd still have to step through somehow to collect and sort all the comments in application logic.
starting at the Comments and using some sort of find_by_sql, but I just can't seem to figure out what I'd need to put in. I don't know how you could have the necessary information to pass in with this to limit it to only looking at comments made by friends.
Edit: I'm having some difficult getting EmFi's solution to work, and would appreciate any insight anyone can provide.
Friends are a cyclic association through a join table.
has_many :friendships
has_many :friends,
:through => :friendships,
:conditions => "status = #{Friendship::FULL}"
This is the error I'm getting in relevant part:
ERROR: column users.user_id does not exist
: SELECT "comments".* FROM "comments" INNER JOIN "users" ON "comments".user_id = "users".id WHERE (("users".user_id = 1) AND ((status = 2)))
When I just enter user.friends, and it works, this is the query it executes:
: SELECT "users".* FROM "users" INNER JOIN "friendships" ON "users".id = "friendships".friend_id WHERE (("friendships".user_id = 1) AND ((status = 2)))
So it seems like it's mangling the :through to have two :through's in one query.
Given the following relationships:
class User < ActiveRecord::Base
has_many :friends
has_many :comments
has_many :friends_comments, :through => :friends, :source => :comments
end
This statement will execute a single SQL statement. Associations essentially create named scopes for you that aren't evaluated until the end of the chain.
#user.friends_comments.find(:limit => 100, :order => 'created_at DESC')
If this is a common query, the find can be simplified into its own scope.
class Comments < ActiveRecord::Base
belongs_to :user
#named_scope was renamed to scope in Rails 3.2. If you're working
#if you're working in a previous version uncomment the following line.
#named_scope :recent, :limit => 100, : order => 'created at DESC'
scope :recent, :limit => 100, :order => 'created_at DESC'
end
So now you can do:
#user.friends_comments.recent
N.B.: The friends association on user may be a cyclical one through a join table, but that's not important to this solution. As long as friends is a working association on User, the preceding will work.