Rails: complex search on 3 models, return only newest - how to do this? - sql

I'm trying to add an advanced search option to my app in which the user can search for certain links based on attributes from 3 different models.
My app is set up so that a User has_many :websites, Website has_many :links, and Link has_many :stats
I know how to create SQL queries with joins, includes, etc. in Rails, but I'm stuck because I only want to retrieve the latest stat for each link rather than all of them, and I don't know the most efficient way to do this.
So for example, let's say a user has 2 websites, each with 10 links, and each link has 100 stats. That's 2,022 objects total (2 websites + 20 links + 2,000 stats), but I only want to search through 42 of them (2 websites + 20 links + 20 stats, i.e. only 1 stat per link).
Once I get only those 42 objects in a database query I can add .where("attribute like ?", user_input) and return the correct links.
Update
I've tried adding the following to my Link model:
has_many :stats, dependent: :destroy
has_many :one_stat, class_name: "Stat", order: "id ASC", limit: 1
But this doesn't seem to work, for example if I do:
@links = Link.includes(:one_stat).all
@links.each do |l|
  puts l.one_stat.size
end
Instead of getting 1, 1, 1... I get the number of all the stats: 125, 40, 76....
Can I use the limit option to get the results I want or does it not work that way?
2nd Update
I've updated my code according to Erez's advice, but still not working properly:
has_one :latest_stat, class_name: "Stat", order: "id ASC"
@links = Link.includes(:latest_stat)
@links.each do |l|
  puts l.latest_stat.indexed
end
=> true
=> true
=> true
=> false
=> true
=> true
=> true
Link.includes(:latest_stat).where("stats.indexed = ?", false).count
=> 6
Link.includes(:latest_stat).where("stats.indexed = ?", true).count
=> 7
It should return 1 and 6, but it's still checking all the stats rather than the latest only.

Sometimes, you gotta break through the AR abstraction and get your SQL on. Just a tiny bit.
Let's assume you have really simple relationships: Website has_many :links, Link belongs_to :website and has_many :stats, and Stat belongs_to :link. No denormalization anywhere. Now, you want to build a query that finds all websites, all of their links, and, for each link, the latest stat, but only for stats with some property (or it could be websites with some property or links with some property).
Untested, but something like:
Website
  .includes(:links => :stats)
  .where("stats.indexed" => true)
  .where("stats.id = (select max(stats2.id)
          from stats stats2 where stats2.link_id = links.id)")
That last bit subselects stats that are part of each link and finds the max id. It then filters out stats (from the join at the top) that don't match that max id. The query returns websites, which each have some number of links, and each link has just one stat in its stats collection.
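To see why the order of "pick the latest stat" and "filter on indexed" matters, here is a minimal plain-Ruby sketch of what the query is meant to compute. It uses hypothetical in-memory rows (the StatRow struct and the sample data are assumptions for illustration, not part of the app) rather than ActiveRecord.

```ruby
# Hypothetical in-memory rows standing in for the stats table.
StatRow = Struct.new(:id, :link_id, :indexed)

SAMPLE_STATS = [
  StatRow.new(1, 1, false),
  StatRow.new(2, 1, true),   # latest stat for link 1
  StatRow.new(3, 2, true),
  StatRow.new(4, 2, false),  # latest stat for link 2
  StatRow.new(5, 3, true)    # only stat for link 3
].freeze

# Plain-Ruby counterpart of the max(stats2.id) subselect:
# one row per link, the one with the highest id.
def latest_stats(stats)
  stats.group_by(&:link_id).map { |_link_id, group| group.max_by(&:id) }
end

# Filtering on indexed alone looks at every stat,
# so link 2 still matches through its older stat 3.
def link_ids_with_any_indexed(stats)
  stats.select(&:indexed).map(&:link_id).uniq
end

# Filtering only the latest stats: link 2 drops out
# because its latest stat (id 4) is not indexed.
def link_ids_with_latest_indexed(stats)
  latest_stats(stats).select(&:indexed).map(&:link_id)
end
```

The first filter is what the question's counts of 6 and 7 correspond to; the second is what the max-id condition is supposed to achieve.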
Some extra info
I originally wrote this answer in terms of window functions, which turned out to be overkill, but I think I should cover it here anyway, since, well, fun. You'll note that the aggregate function trick we used above only works because we're determining which stat to use based on its ID, which is exactly the property we need to filter the stats from the join by. But let's say you wanted only the first stat as ranked by some criterion other than ID, such as, say, number_of_clicks; that trick won't work anymore because the aggregation loses track of the IDs. That's where window functions come in.
Again, totally untested:
Website
  .includes(:links => :stats)
  .where("stats.indexed" => true)
  .where(
    "(stats.id, 1) in (
      select id, row_number()
        over (partition by stats2.link_id order by stats2.number_of_clicks DESC)
      from stats stats2 where stats2.link_id = links.id
    )"
  )
That last where subselects the stats that match each link and orders them by number_of_clicks descending, then the in part matches the top-ranked one to a stat from the join. Note that window functions aren't available on every database platform, so this is less portable. You could also use this technique to solve the original problem you posed (just order by stats2.id instead of stats2.number_of_clicks); it could conceivably perform better, and is advocated by this blog post.
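As a rough illustration of what row_number() over a per-link partition does, here is a plain-Ruby sketch. The ClickStat struct and both helper methods are hypothetical names made up for this example, and it assumes no two stats of a link tie on number_of_clicks.

```ruby
# Hypothetical in-memory rows standing in for the stats table.
ClickStat = Struct.new(:id, :link_id, :number_of_clicks)

# Plain-Ruby counterpart of
#   row_number() over (partition by link_id order by number_of_clicks desc)
# Returns [stat, row_number] pairs; numbering restarts at 1 for each link.
def with_row_numbers(stats)
  stats.group_by(&:link_id).flat_map do |_link_id, group|
    group.sort_by { |s| -s.number_of_clicks }
         .each_with_index
         .map { |stat, index| [stat, index + 1] }
  end
end

# Keep only the rows numbered 1, i.e. the most-clicked stat of each link.
def top_stat_per_link(stats)
  with_row_numbers(stats).select { |_stat, row| row == 1 }.map(&:first)
end
```

Unlike the max(id) trick, the ranking column and the column used to identify the winning row are independent here, which is the point of using a window function.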

I'd try this:
has_one :latest_stat, class_name: "Stat", order: "id DESC"
@links = Link.includes(:latest_stat)
@links.each do |l|
  puts l.latest_stat
end
Note you can't print latest_stat.size since it is the stat object itself and not a relation.

Is this what you're looking for?
@user.websites.flat_map { |site| site.links.map { |link| link.stats.last } }
For a given user, this will return an array that contains the last stat for each link on that user's websites.

Related

Rails ActiveRecord Access association's(children?) latest created objects

As the title says, I am trying to access an array of objects through an association.
This is a has_many association.
Here are my classes:
class Keyword < ApplicationRecord
  has_many :rankings
end
class Ranking < ApplicationRecord
  belongs_to :keyword
end
There is an attribute in Ranking called position:integer. I want to be able to access the latest created ranking of every keyword. Here is what I've got so far:
Keyword.all.joins(:rankings).select( 'MAX(rankings.id) ').pluck(:created_at, :keyword_id, :position)
I've read some other posts suggesting that I use MAX on rankings.id, but I am still not able to return the array.
At the moment Keyword.count returns 4597
Ranking.count returns 9245
Each keyword has generated about 2 rankings, but I just want the latest ranking of each keyword in array format, so I should expect around 4597 results.
Not sure if I explained it clearly enough, hope you guys can help me :'( Thanks, really appreciate it.
If you are using Postgres, you can use DISTINCT ON:
@keywords = Keyword.joins(:rankings)
  .select("DISTINCT ON(rankings.keyword_id) keywords.*, rankings.position, rankings.created_at AS ranking_created_at")
  .order("rankings.keyword_id, rankings.id DESC")
Now you can access position and ranking_created_at:
@keywords.each do |k|
  k.position
  # ...
end
@keywords.map { |k| [k.id, k.ranking_created_at, k.position] }
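If DISTINCT ON is unfamiliar, its effect can be sketched in plain Ruby: sort the rows, then keep the first row of each group. This is only a model of the semantics on hypothetical in-memory rows (RankingRow and the method name are made up for the example), not something to run instead of the query.

```ruby
# Hypothetical in-memory rows standing in for the rankings table.
RankingRow = Struct.new(:id, :keyword_id, :position)

def latest_ranking_per_keyword(rankings)
  rankings
    .sort_by { |r| [r.keyword_id, -r.id] } # ORDER BY keyword_id, id DESC
    .uniq(&:keyword_id)                    # DISTINCT ON (keyword_id): first row per keyword is kept
end
```

Because Array#uniq keeps the first occurrence, the sort order decides which ranking survives for each keyword, just as the ORDER BY clause does for DISTINCT ON.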
If you have enough rankings, you might want to store the latest ranking on the keywords table as a read optimization:
class Keyword < ApplicationRecord
  # optional: true lets a keyword exist before it has any ranking
  # (assuming Rails 5+ defaults, where belongs_to is required)
  belongs_to :latest_ranking, class_name: "Ranking", optional: true
  has_many :rankings, after_add: :set_latest_ranking
  def set_latest_ranking(ranking)
    update!(latest_ranking: ranking)
  end
end
Keyword.joins(:latest_ranking)
  .pluck(:created_at, :id, "rankings.position")
This makes it both very easy to join and highly performant. I learned this after dealing with an application that had a huge row count and trying every possible solution like lateral joins to improve the pretty dismal performance of the query.
The cost is an extra write query when creating the record.
Keyword.joins(:rankings).group("keywords.id").pluck("keywords.id", "MAX(rankings.id)")
This will give you an array in which each element contains the ID of a keyword and the ID of the latest ranking associated with that keyword.
If you need to fetch more information about rankings rather than id, you can do it like this:
last_rankings_ids_scope = Ranking.joins(:keyword).group("keywords.id").select("MAX(rankings.id)")
Ranking.where(id: last_rankings_ids_scope).pluck(:created_at, :keyword_id, :position)

A tricky query for finding new forum posts

So I have these models.
class Forum < ActiveRecord::Base
  has_ancestry
  has_many :forum_topics
end
class ForumTopic < ActiveRecord::Base
  belongs_to :forum
  has_many :forum_topic_reads
  # has a :last_post_at date column
end
class ForumTopicRead < ActiveRecord::Base
  belongs_to :forum_topic
  belongs_to :user
  # has a :updated_at date column
end
Very basic setup.
Now what I want to get is an array of ids of forums that have unread posts somewhere in their subtree. The presence of new posts is decided by comparing forum_topics.last_post_at with forum_topic_reads.updated_at where forum_topics.id = forum_topic_reads.forum_topic_id for a particular user_id, or by the absence of a ForumTopicRead record for that topic and user.
The problem is that the only way I managed to get it working is by manually going through every forum, getting its subtree, then getting all the topics for the subtree, etc. That results in a ton of similar queries to the database and thus a very slow process.
I believe there should be a way to make it go faster. I just need the ids of the forums that have at least 1 unread topic in their subtrees, don't need the count, don't need the topic ids themselves.
UPDATE
Got a hint from @MrYoshiji
This query:
ForumTopic.joins(:forum_topic_reads).where('forum_topics.last_post_at > forum_topic_reads.updated_at AND forum_topic_reads.user_id = ?', user.id).pluck(:forum_id).uniq
does not work quite well, because it ignores the topics without appropriate topic_reads (and creating a read for every topic for every user is a bit of an overhead).
UPDATE 2
So I finally came up with a promising path. If I drop all the reads on a topic when a new post gets added to it (thus updating the :last_post_at field), I'll be able to collect the forum_ids with this query:
"SELECT distinct forum_id FROM `forum_topics` LEFT JOIN forum_topic_reads ON forum_topic_reads.forum_topic_id = forum_topics.id AND forum_topic_reads.user_id = #{user.id} GROUP BY forum_topics.id having count(forum_topic_reads.id) < 1"
Now the only big problem I have is translating this from SQL to ActiveRecord.
ForumTopic.unscoped.joins(:forum_topic_reads).where('user_id = ?', user[:id]).group(:id).having('forum_topic_reads.count < 1').pluck(:forum_id)
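To pin down what the final query has to return, the unread rule from the question can be modelled in plain Ruby. This is only a sketch on hypothetical in-memory rows (TopicRow, TopicReadRow and the method name are assumptions, and timestamps are plain integers for brevity); it ignores the ancestry subtree and covers just the "no read row, or read older than the last post" condition, which is what a LEFT JOIN with the user condition in the ON clause would express in SQL.

```ruby
# Hypothetical in-memory rows standing in for forum_topics and forum_topic_reads.
TopicRow = Struct.new(:id, :forum_id, :last_post_at)
TopicReadRow = Struct.new(:forum_topic_id, :user_id, :updated_at)

# A topic counts as unread for a user when there is no read row for it,
# or when the read row is older than the topic's last post.
def forum_ids_with_unread(topics, reads, user_id)
  read_at = reads.select { |r| r.user_id == user_id }
                 .to_h { |r| [r.forum_topic_id, r.updated_at] }

  topics.select { |t| read_at[t.id].nil? || t.last_post_at > read_at[t.id] }
        .map(&:forum_id)
        .uniq
end
```

Reads belonging to other users are filtered out before the comparison, which mirrors putting the user_id condition in the join rather than in the WHERE clause.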

Rails sort collection by association count

I'm working in Rails 4, and have two relevant models:
Account Model
has_many :agent_recalls, primary_key: "id", :foreign_key => "pickup_agent_id", class_name: "Booking"
Hence, queries like Account.find(10).agent_recalls would work.
What I want to do is sort the entire Account collection by this agent_recalls association.
Ideally it'd look something like (but obviously not):
@agents = Account.where(agent: true).order(:agent_recalls)
Question: What's the correct query to output an ordered list, by this agent_recall count?
Well, to accomplish what you are looking for you have 2 options:
First, a single query. It implies an inner join, though, so the Accounts that don't have any agent_recalls would be lost, so I will discard this option.
Second, and I think this one is more appropriate for what you are trying to do:
Account.where(agent: true).includes(:agent_recalls).sort_by { |a| a.agent_recalls.size }
As you can see, it is a mix of a query and Ruby. Hope it helps :)
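The reason for sorting in Ruby is that accounts with zero recalls must stay in the list. A small plain-Ruby sketch of that behaviour, with hypothetical data shapes (account ids as integers, bookings as hashes with a :pickup_agent_id key) and a made-up method name:

```ruby
# Sort account ids by how many bookings reference them as pickup agent.
# Accounts without any booking get a count of 0 and stay in the result.
def sort_by_recall_count(account_ids, bookings)
  counts = bookings.each_with_object(Hash.new(0)) do |booking, acc|
    acc[booking[:pickup_agent_id]] += 1
  end

  # The id is used as a tie-breaker so the order is deterministic.
  account_ids.sort_by { |id| [counts[id], id] }
end
```

An inner join would only ever see accounts that appear in the bookings, which is exactly the loss described above.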

Limit the amount of results in associated queries with Rails 3.2

I have the following query:
@books = Books.includes(:author, :pages)
  .find(:all,
        :order => 'created_at DESC')
Let's assume my "Pages" table has fields "words, pictures". For blank pages, field "words" is NULL. There are many "Pages" records per book.
The problem with the above query, is that it retrieves ALL the pages for each book. I would like to retrieve only 1 page record for example with the condition "NOT NULL" on the "words" field. However, I don't want to exclude from the query results the Books that do not match the pages query (I have 10 books in my table and I want 10 books to be retrieved. The book.page association should be "nil" for the books where the condition does not match.)
I hope this makes sense.
Check this SO question:
Rails 3 - Eager loading with conditions
It looks like what you want
class Category
  has_many :children, :class_name => "Category",
           :foreign_key => "parent_id"
  has_many :published_pages, :class_name => "Page",
           :conditions => { :is_published => true }
end
If you only want a single blank page to be returned then you could add an association:
has_one :first_blank_page, -> { merge(Page.blanks).limit(1) }, :class_name => "Page"
... where in page.rb ...
def self.blanks
  where(:words => nil)
end
Then you can:
@books = Books.includes(:author, :first_blank_page).order('created_at desc')
... and subsequently reading first_blank_page would be very efficient.
The limit will not be used if you eager load, though, as the SQL for that sort of thing would be very complex to execute as one query, so you'd want to consider whether to eager load all of the pages per book and then just use one per book. It's a tricky trade-off.
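The "load all pages, then use one per book" side of that trade-off can be sketched in plain Ruby. The PageRow struct and the method name are hypothetical, and the example picks the first page whose words are present, as the question asks, while keeping books that have no such page.

```ruby
# Hypothetical in-memory rows standing in for the pages table.
PageRow = Struct.new(:id, :book_id, :words)

# Returns a hash of book_id => first page with non-nil words, or nil
# when the book has no matching page. Every book id stays in the result.
def first_page_with_words(book_ids, pages)
  pages_by_book = pages.group_by(&:book_id)

  book_ids.to_h do |book_id|
    page = pages_by_book.fetch(book_id, []).find { |pg| !pg.words.nil? }
    [book_id, page]
  end
end
```

All pages are held in memory here, which is the cost being weighed against issuing one limited query per book.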

Adding integer to column in database

I have an integer column on the Users table called rating_number. This number is going to consist of two things.
Impressions on page views
The total number of likes they have on their posts
So far, I have the impression part taken care of. I'm using the gem is_impressionable with a counter_cache like so on my User model:
is_impressionable :counter_cache => true, :column_name => :rating_number, :unique => :all
Now, I'm trying to add to that column the second part, which is the total number of votes they have on their posts. I am getting that integer by:
@user = current_user # or some user
array = @user.posts.map { |post| post.votes.count }
count = array.inject { |sum, x| sum + x }
where count is the total number of votes they have on their posts. How can I automatically update the rating_number column in an efficient way every time a User gets one of their posts voted on? Should I instead manually add 1 to that column in the post's vote action after the vote has been saved successfully?
Not sure if this is useful, but I'm also using the thumbs_up gem for voting system.
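One detail of the counting snippet in the question is worth checking in isolation: inject without an initial value returns nil for an empty array, so a user with no posts would get nil instead of 0. A minimal sketch with hypothetical method names, operating on a plain array of per-post vote counts:

```ruby
# Mirrors the snippet in the question: no initial value is given to inject.
def total_votes_naive(vote_counts)
  vote_counts.inject { |sum, x| sum + x }
end

# Array#sum starts from 0, so an empty list of posts yields 0.
def total_votes(vote_counts)
  vote_counts.sum
end
```

Passing an explicit start value, as in inject(0), has the same effect as sum.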
Looking at your need, I am quite sure you need to use the callback called after_update in your User model. To understand how callbacks work, read Callbacks. But I would suggest keeping the data in 2 separate columns rather than a single column.
class User < ActiveRecord::Base
  after_update :vote_update
  # other methods
  def vote_update
    user = @post.user
    user.rating_number = user.rating_number + 1
    user.save!
  end
end