A tricky query for finding new forum posts - sql

So I have these models.
class Forum < ActiveRecord::Base
  has_ancestry
  has_many :forum_topics
end
class ForumTopic < ActiveRecord::Base
  belongs_to :forum
  has_many :forum_topic_reads
  # has a :last_post_at date column
end
class ForumTopicRead < ActiveRecord::Base
  belongs_to :forum_topic
  belongs_to :user
  # has a :updated_at date column
end
Very basic setup.
Now what I want to get is an array of ids of forums that have unread posts somewhere in their subtree. The presence of new posts is decided by comparing forum_topics.last_post_at with forum_topic_reads.updated_at, where forum_topics.id = forum_topic_reads.forum_topic_id for a particular user_id, or by the absence of a ForumTopicRead record for that topic and user.
The problem is that the only way I managed to get it working is by manually going through every forum, getting its subtree, and then getting all the topics for the subtree, etc. That results in a ton of similar queries to the database and thus a very slow process.
I believe there should be a way to make it go faster. I just need the ids of the forums that have at least 1 unread topic in their subtrees, don't need the count, don't need the topic ids themselves.
UPDATE
Got a hint from @MrYoshiji
This query:
ForumTopic.joins(:forum_topic_reads).where('forum_topics.last_post_at > forum_topic_reads.updated_at AND forum_topic_reads.user_id = ?', user.id).pluck(:forum_id).uniq
does not work quite well, because it ignores the topics without matching topic reads (and creating a read for every topic for every user is a bit of an overhead).
UPDATE 2
So I finally came up with a promising path. If I drop all the reads on a topic when a new post gets added to it (thus updating the :last_post_at field), I'll be able to collect the forum_ids with this query:
"SELECT distinct forum_id FROM `forum_topics` LEFT JOIN forum_topic_reads ON forum_topic_reads.forum_topic_id = forum_topics.id AND forum_topic_reads.user_id = #{user.id} GROUP BY forum_topics.id having count(forum_topic_reads.id) < 1"
Now the only big problem I have is translating this from SQL to ActiveRecord.

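# The user filter has to live in the LEFT JOIN's ON clause; an inner join plus WHERE drops topics that have no read at all.
ForumTopic.unscoped.joins("LEFT JOIN forum_topic_reads ON forum_topic_reads.forum_topic_id = forum_topics.id AND forum_topic_reads.user_id = #{user.id.to_i}").group('forum_topics.id').having('COUNT(forum_topic_reads.id) < 1').pluck(:forum_id).uniq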
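The unread rule from the question, including the "somewhere in the subtree" part, can also be sketched in plain Ruby once the rows are loaded with a couple of queries. This is only an illustration under assumptions: ForumRow, TopicRow and forum_ids_with_unread are hypothetical names, and the ancestry value is assumed to be a slash-separated path of ancestor ids (nil for a root forum).

```ruby
require "set"

# Hypothetical in-memory stand-ins for rows of forums and forum_topics.
# ancestry is assumed to look like "1/2" (ancestor ids, root first) or nil.
ForumRow = Struct.new(:id, :ancestry)
TopicRow = Struct.new(:id, :forum_id, :last_post_at)

# reads: { forum_topic_id => updated_at } for a single user.
def forum_ids_with_unread(forums, topics, reads)
  # A topic is unread when there is no read record, or the read is older
  # than the topic's last post.
  unread_topics = topics.select do |topic|
    read_at = reads[topic.id]
    read_at.nil? || topic.last_post_at > read_at
  end
  direct = unread_topics.map(&:forum_id).to_set

  # A forum with an unread topic marks itself and every ancestor.
  forums.each_with_object(Set.new) do |forum, result|
    next unless direct.include?(forum.id)
    result << forum.id
    forum.ancestry.to_s.split("/").each { |ancestor_id| result << ancestor_id.to_i }
  end
end

now = Time.utc(2020, 1, 1)
forums = [ForumRow.new(1, nil), ForumRow.new(2, "1"), ForumRow.new(3, "1/2"), ForumRow.new(4, nil)]
topics = [TopicRow.new(10, 3, now + 60), TopicRow.new(11, 4, now)]
reads  = { 10 => now, 11 => now + 10 }
p forum_ids_with_unread(forums, topics, reads).to_a.sort
```

Here topic 10 was read before its last post, so forum 3 and its ancestors 1 and 2 are reported, while forum 4 is not.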

Related

Rails ActiveRecord Access association's(children?) latest created objects

As the title says, I am trying to access an array of objects from an association.
This is a has_many association.
Here are my classes:
class Keyword < ApplicationRecord
  has_many :rankings
end
class Ranking < ApplicationRecord
  belongs_to :keyword
end
There is an attribute in Ranking called position:integer. I want to be able to access the latest created ranking for every keyword. Here is what I have so far:
Keyword.all.joins(:rankings).select( 'MAX(rankings.id) ').pluck(:created_at, :keyword_id, :position)
I've read some other posts suggesting that I use MAX on rankings.id, but I am still not able to return the array.
At the moment Keyword.count returns 4597
Ranking.count returns 9245
Each keyword has generated about 2 rankings, but I just want the latest ranking from each keyword in array format, so I should expect around 4597 results.
Not sure if I explained it clearly enough, hope you guys can help me :'( Thanks, I really appreciate it.
If you are using Postgres, you can use DISTINCT ON:
@keywords = Keyword.joins(:rankings)
  .select("DISTINCT ON (rankings.keyword_id) keywords.*, rankings.position, rankings.created_at AS rating_created_at")
  .order("rankings.keyword_id, rankings.id DESC")
Now you can access position and rating_created_at:
@keywords.each do |k|
  k.position
  # ....
end
@keywords.map { |k| [k.id, k.rating_created_at, k.position] }
If you have enough rankings you might want to store the latest ranking on the keywords table as a read optimization:
class Keyword < ApplicationRecord
  # optional: true lets a keyword exist before its first ranking (assumes Rails 5+ defaults)
  belongs_to :latest_ranking, class_name: "Ranking", optional: true
  has_many :rankings, after_add: :set_latest_ranking

  def set_latest_ranking(ranking)
    update!(latest_ranking: ranking)
  end
end
Keyword.joins(:latest_ranking)
  .pluck(:created_at, :id, "rankings.position")
This makes it both very easy to join and highly performant. I learned this after dealing with an application that had a huge row count and trying every possible solution like lateral joins to improve the pretty dismal performance of the query.
The cost is an extra write query when creating the record.
Keyword.joins(:rankings).group("keywords.id").pluck("keywords.id", "MAX(rankings.id)")
This will give you an array whose elements each contain the ID of a keyword and the ID of the latest ranking associated with that keyword.
If you need to fetch more information about rankings rather than id, you can do it like this:
last_rankings_ids_scope = Ranking.joins(:keyword).group("keywords.id").select("MAX(rankings.id)")
Ranking.where(id: last_rankings_ids_scope).pluck(:created_at, :keyword_id, :position)
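To make it concrete what the MAX(rankings.id) grouping selects, here is a small plain-Ruby sketch of the same "latest row per keyword" rule applied to rows that are already in memory. RankingRow and latest_ranking_per_keyword are hypothetical names used only for this illustration, and "latest" is assumed to mean "highest id", as in the queries above.

```ruby
# Hypothetical stand-in for a row of the rankings table.
RankingRow = Struct.new(:id, :keyword_id, :position)

# For each keyword_id keep only the ranking with the highest id,
# which is what GROUP BY keyword + MAX(rankings.id) picks out in SQL.
def latest_ranking_per_keyword(rankings)
  rankings.group_by(&:keyword_id).map { |_keyword_id, rows| rows.max_by(&:id) }
end

rankings = [
  RankingRow.new(1, 1, 5),
  RankingRow.new(2, 1, 3),
  RankingRow.new(3, 2, 9)
]
latest_ranking_per_keyword(rankings).each do |r|
  puts "keyword #{r.keyword_id}: ranking #{r.id}, position #{r.position}"
end
```

For thousands of keywords the database versions above are still the better fit, since they avoid loading every ranking into Ruby.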

Rails – order by "important", where "important" is in another table

I have a list of posts, which a user can mark as "Important". When the user lists them, I want the important ones to come first, with all the others below.
A user marks a post as important through another model, ImportantPost, which belongs to User and belongs to Post. The problem is that I don't know how to re-order Posts with a condition in the "order" statement, since every user has their own list of "important" posts.
My models are:
class Post < ActiveRecord::Base
  has_many :important_posts
  belongs_to :user
end
class User < ActiveRecord::Base
  has_many :posts
  has_many :important_posts
end
class ImportantPost < ActiveRecord::Base
  belongs_to :user
  belongs_to :post
end
I wrote the code right here, because my real situation is a little more complicated.
The best I've come up with so far is:
Post.joins(:important_posts).select("posts.*, important_posts.user_id = #{current_user.id} as important").order('important')
The only problem is that it displays only posts which have been marked as important. For example, if there are 3 posts in total but only one is marked as "Important", the code above returns only that one post.
UPD
It looks like a LEFT OUTER JOIN solves my problem... Can it cause any problems? Should I use a FULL OUTER JOIN?
Yes, my problem was fully solved by using a left outer join. It's the first time I've used an outer join and it works well!
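As a rough illustration of why the left outer join helps, the plain-Ruby sketch below mimics its result: every post is kept, each one gets an important flag for the current user, and flagged posts sort first. PostRow, ImportantPostRow and posts_important_first are hypothetical names, and the tie-break on post id is an assumption. The sketch filters marks by user before matching, which in SQL corresponds to putting the user_id condition inside the join's ON clause; without that, a post marked by several users could come back more than once.

```ruby
# Hypothetical stand-ins for rows of posts and important_posts.
PostRow = Struct.new(:id, :title)
ImportantPostRow = Struct.new(:user_id, :post_id)

# Returns [post, important] pairs: all posts are kept (like a LEFT OUTER JOIN),
# and posts marked by current_user_id come first.
def posts_important_first(posts, important_posts, current_user_id)
  marked_ids = important_posts
    .select { |mark| mark.user_id == current_user_id }
    .map(&:post_id)

  posts
    .map { |post| [post, marked_ids.include?(post.id)] }
    .sort_by { |post, important| [important ? 0 : 1, post.id] }
end

posts = [PostRow.new(1, "first"), PostRow.new(2, "second"), PostRow.new(3, "third")]
marks = [ImportantPostRow.new(7, 2), ImportantPostRow.new(8, 3)]
posts_important_first(posts, marks, 7).each do |post, important|
  puts "#{post.title}#{important ? ' (important)' : ''}"
end
```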

Rails: complex search on 3 models, return only newest - how to do this?

I'm trying to add an advanced search option to my app in which the user can search for certain links based on attributes from 3 different models.
My app is set up so that a User has_many :websites, Website has_many :links, and Link has_many :stats
I know how to create SQL queries with joins or includes etc. in Rails, but I'm getting stuck since I only want to retrieve the latest stat for each link and not all of them, and I don't know the most efficient way to do this.
So for example, let's say a user has 2 websites, each with 10 links, and each link has 100 stats, that's 2,022 objects total, but I only want to search through 42 objects (only 1 stat per link).
Once I get only those 42 objects in a database query I can add .where("attribute like ?", user_input) and return the correct links.
Update
I've tried adding the following to my Link model:
has_many :stats, dependent: :destroy
has_many :one_stat, class_name: "Stat", order: "id ASC", limit: 1
But this doesn't seem to work, for example if I do:
@links = Link.includes(:one_stat).all
@links.each do |l|
  puts l.one_stat.size
end
Instead of getting 1, 1, 1... I get the number of all the stats: 125, 40, 76....
Can I use the limit option to get the results I want or does it not work that way?
2nd Update
I've updated my code according to Erez's advice, but still not working properly:
has_one :latest_stat, class_name: "Stat", order: "id ASC"
@links = Link.includes(:latest_stat)
@links.each do |l|
  puts l.latest_stat.indexed
end
=> true
=> true
=> true
=> false
=> true
=> true
=> true
Link.includes(:latest_stat).where("stats.indexed = ?", false).count
=> 6
Link.includes(:latest_stat).where("stats.indexed = ?", true).count
=> 7
It should return 1 and 6, but it's still checking all the stats rather than the latest only.
Sometimes, you gotta break through the AR abstraction and get your SQL on. Just a tiny bit.
Let's assume you have really simple relationships: Website has_many :links, Link belongs_to :website and has_many :stats, and Stat belongs_to :link. No denormalization anywhere. Now, you want to build a query that finds websites, all of their links, and, for each link, the latest stat, but only for stats with some property (or it could be websites with some property or links with some property).
Untested, but something like:
Website
  .includes(:links => :stats)
  .where("stats.indexed" => true)
  .where("stats.id = (select max(stats2.id)
          from stats stats2 where stats2.link_id = links.id)")
That last bit subselects stats that are part of each link and finds the max id. It then filters out stats (from the join at the top) that don't match that max id. The query returns websites, which each have some number of links, and each link has just one stat in its stats collection.
Some extra info
I originally wrote this answer in terms of window functions, which turned out to be overkill, but I think I should cover it here anyway, since, well, fun. You'll note that the aggregate function trick we used above only works because we're determining which stat to use based on its ID, which is exactly the property we need to filter the stats from the join by. But let's say you wanted only the first stat as ranked by some criterion other than ID, such as, say, number_of_clicks; that trick won't work anymore because the aggregation loses track of the IDs. That's where window functions come in.
Again, totally untested:
Website
  .includes(:links => :stats)
  .where("stats.indexed" => true)
  .where(
    "(stats.id, 1) in (
      select stats2.id, row_number()
        over (partition by stats2.link_id order by stats2.number_of_clicks DESC)
      from stats stats2 where stats2.link_id = links.id
    )"
  )
That last where subselects the stats that belong to each link and ranks them by number_of_clicks descending; the in part then matches the top-ranked row to a stat from the join. Note that window functions are not available on every database platform. You could also use this technique to solve the original problem you posed (just order by stats2.id instead of stats2.number_of_clicks); it could conceivably perform better, and is advocated by this blog post.
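For readers unfamiliar with row_number(), the plain-Ruby sketch below mimics what the window function computes: rows are partitioned by link, ordered by number_of_clicks descending inside each partition, and only the row that would get row number 1 is kept. StatRow and top_stat_per_link are hypothetical names, and breaking ties by the lower id is an assumption (SQL leaves the order of ties unspecified unless you add a second sort key).

```ruby
# Hypothetical stand-in for a row of the stats table.
StatRow = Struct.new(:id, :link_id, :number_of_clicks)

# Returns { link_id => stat } with the stat that would receive row_number() = 1
# when partitioning by link_id and ordering by number_of_clicks DESC.
def top_stat_per_link(stats)
  stats.group_by(&:link_id).transform_values do |rows|
    # Smallest [-clicks, id] pair = most clicks, ties broken by lower id.
    rows.min_by { |stat| [-stat.number_of_clicks, stat.id] }
  end
end

stats = [
  StatRow.new(1, 1, 10),
  StatRow.new(2, 1, 30),
  StatRow.new(3, 2, 5)
]
top_stat_per_link(stats).each do |link_id, stat|
  puts "link #{link_id}: stat #{stat.id} with #{stat.number_of_clicks} clicks"
end
```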
I'd try this:
has_one :latest_stat, class_name: "Stat", order: "id ASC"
@links = Link.includes(:latest_stat)
@links.each do |l|
  puts l.latest_stat
end
Note you can't print latest_stat.size since it is the stat object itself and not a relation.
Is this what you're looking for?
@user.websites.map { |site| site.links.map { |link| link.stats.last } }.flatten
For a given user, this will return an array that contains the last stat for each link on that user's websites.

Rails 3: Chaining `has_many` relations with conditions on the last table, without excessive queries

I am new to both DBs and Rails (using 3.2), and I suspect this is a basic question that I'm just having trouble finding an answer for.
I'm making an app that tracks articles people submit to journals. So, each user has a number of articles. For each article, they can submit it to a number of journals. The submission object itself keeps track not only of the article being submitted and the journal it's submitted to, but also the date it was submitted, the way it was submitted (email, online system, etc.), the date a response was received (if any), the response (if any) (e.g. accepted, resubmit, decline).
I'm trying to allow a user to view all of their outstanding submissions in one place. So, for each user, I'm trying to query the db to get all submissions belonging to articles that belong to that user, but only returning the submissions for which response is NULL (meaning there is no response yet, and thus the submission is still outstanding).
So my models are related like this:
class User < ActiveRecord::Base
  has_many :articles
end
class Article < ActiveRecord::Base
  belongs_to :user
  has_many :submissions
end
class Journal < ActiveRecord::Base
  has_many :submissions
end
class Submission < ActiveRecord::Base
  belongs_to :article
  # belongs_to rather than has_one, to match Journal's has_many :submissions
  belongs_to :journal
end
Now, for a given @user, I think I could do something like
@articles_with_subs = @user.articles.joins(:submissions)
and then
@out_subs = Array.new
@articles_with_subs.each do |article|
  # outstanding submissions are the ones with no response yet
  outs = article.submissions.where("response IS NULL")
  @out_subs.push outs
end
@out_subs.flatten!
but that seems pretty inefficient. What probably big, obvious thing am I missing?
Thanks very much.
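# Outstanding means no response yet, so the condition is IS NULL; the hash key is the table name.
@out_subs = Submission.where("submissions.response IS NULL")
  .joins(:article).where(:articles => {:user_id => @user.id})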

Rails 3. Get most recent update

I am developing an API using Rails 3. A user can have several contact items, like phones, emails, websites and addresses. Each item has its own model.
I need to do caching in my iPhone app that uses this API, so I need to get the date when the latest update to any of the items occurred and compare it against the timestamp of the cache file.
How can I get the most recently updated item (when comparing all the item tables)?
I am getting the most recent item for each item table like this. Is this really a good approach?
@phones = @user.phones.order("updated_at desc").limit(1)
Thankful for all help!
You can make use of ActiveRecord's touch option on belongs_to. This is especially useful if you have one parent record with many child records. Every time a child record is saved or destroyed, the parent record will have its updated_at field set to the current time.
class User < ApplicationRecord
  has_many :phones
  has_many :addresses
end
class Phone < ApplicationRecord
  belongs_to :user, touch: true
end
class Address < ApplicationRecord
  belongs_to :user, touch: true
end
Now any time an address or a phone is updated, the User's updated_at will be set to the current time.
To check when the last update for the current user took place, across all their tables, you now do:
@last_updated = @user.updated_at
For a small overhead in writes you gain a lot with simpler queries on checking your cache expiry. The documentation for this can be found under belongs_to options.
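The comparison against the cache timestamp can be sketched in plain Ruby as follows. ItemRow, latest_update and cache_stale? are hypothetical names for illustration only: latest_update shows the "compare all the item tables" calculation that touch makes unnecessary, and cache_stale? is the check the client side would effectively perform, with last_updated_at standing in for the user's updated_at.

```ruby
# Hypothetical stand-in for any contact item row (phone, email, address...).
ItemRow = Struct.new(:updated_at)

# Most recent updated_at across any number of item collections,
# or nil when there are no items at all.
def latest_update(*collections)
  collections.flatten.map(&:updated_at).max
end

# The cache is stale when it does not exist yet or was written before
# the last update; with nothing to compare against it counts as fresh.
def cache_stale?(cache_written_at, last_updated_at)
  return true if cache_written_at.nil?
  return false if last_updated_at.nil?
  cache_written_at < last_updated_at
end

base      = Time.utc(2020, 1, 1)
phones    = [ItemRow.new(base), ItemRow.new(base + 300)]
addresses = [ItemRow.new(base + 120)]
last_updated = latest_update(phones, addresses)
puts cache_stale?(base + 200, last_updated)
```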