Slow active record query rails 5.1, selects all messages to get most recent and count - sql

I have a very slow query that ends up scanning all messages in order to show a user's most recent one.
def followed_members(limit = 6)
  page = helpers.param_helper(:page, 1)
  ids = @user.follows_by_type('User')
    .to_a
    .pluck(:followable_id)
  @top_members = User
    .where(User.arel_table[:id].in(ids))
    .order(ranking_points: :desc)
    .includes([:messages])
    .offset((page - 1) * limit)
    .limit(limit)
    .to_a
  @next_members = helpers.next_helper(@top_members, limit, page)
  if page > 1
    @next_page = @next_members
    @ajaxcontent = @top_members
  end
end
The :messages association is only used to populate a user card that shows how many messages a user has posted and the date of their most recent message:
user_cards.messages.length > 0 ? user_cards.messages[0].created_at.strftime("%m/%d/%Y") : ""
This runs for every user a member follows whenever they open their following list. It is very inefficient, but I'm a Ruby noob and inherited this legacy project. How would you tackle this so I don't need to load every message?
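The card only needs two aggregates per user, so one way to avoid loading every message is to let the database compute them. The sketch below uses Python's sqlite3 module purely to show the SQL shape; the table and column names (users, messages, user_id, created_at, ranking_points) are assumptions inferred from the code above, and the sample rows are invented. In Active Record the same idea would presumably be expressed with a left join plus select and group, which is not shown here.

```python
import sqlite3

# In-memory stand-in for the users/messages tables (assumed schema, invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, ranking_points INTEGER);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES (1, 50), (2, 90), (3, 10);
    INSERT INTO messages (user_id, created_at) VALUES
        (1, '2018-01-05'), (1, '2018-03-20'), (2, '2018-02-11');
""")

followed_ids = [1, 2, 3]   # what follows_by_type('User') would yield
limit, page = 6, 1
placeholders = ",".join("?" * len(followed_ids))

# One grouped query returns the count and the latest date per followed user.
# The LEFT JOIN keeps users who have never posted (count 0, date NULL).
rows = conn.execute(
    f"""
    SELECT u.id,
           COUNT(m.id)       AS message_count,
           MAX(m.created_at) AS last_message_at
    FROM users u
    LEFT JOIN messages m ON m.user_id = u.id
    WHERE u.id IN ({placeholders})
    GROUP BY u.id
    ORDER BY u.ranking_points DESC
    LIMIT ? OFFSET ?
    """,
    [*followed_ids, limit, (page - 1) * limit],
).fetchall()

for user_id, message_count, last_message_at in rows:
    print(user_id, message_count, last_message_at or "")
```

With this shape the database returns one row per card instead of one row per message, and an index on messages.user_id would likely matter more than anything done on the Ruby side.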

Related

How can I improve performance when using a Django queryset?

I'm trying to build a news feed. Each time the page is requested, the server must send multiple items. Each item contains a post, its number of likes, number of comments, number of comment children, the comments data, the comment children data, and so on.
My problem is that the page takes more than 5 seconds to load every time it is requested. I've already implemented a caching system, but it's still slow.
posts = Posts.objects.filter(page="feed").order_by('-likes')[:10].cache()
posts = PostsSerializer(posts, many=True)
hasPosted = Posts.objects.filter(page="feed", author="me").cache()
hasPosted = PostsSerializer(hasPosted, many=True)
for post in posts.data:
    # Several queries per post: comment count, top comments, like status
    commentsNum = Comments.objects.filter(parent=post["id"]).cache(ops=['count'])
    post["comments"] = len(commentsNum)
    comments = Comments.objects.filter(parent=post["id"]).order_by('-likes')[:10].cache()
    liked = Likes.objects.filter(post_id=post["id"], author="me").cache()
    comments = CommentsSerializer(comments, many=True)
    commentsObj[post["id"]] = {}
    for comment in comments.data:
        # Several more queries per comment: children, child count, like status
        children = CommentChildren.objects.filter(parent=comment["id"]).order_by('date')[:10].cache()
        numChildren = CommentChildren.objects.filter(parent=comment["id"]).cache(ops=['count'])
        post["comments"] = post["comments"] + len(numChildren)
        children = CommentChildrenSerializer(children, many=True)
        liked = Likes.objects.filter(post_id=comment["id"], author="me").cache()
        for child in children.data:
            if child["parent"] == comment["id"]:
                # One more query per child
                liked = Likes.objects.filter(post_id=child["id"], author="me").cache()
I'm looking for a simple way to fetch all this data faster and without unnecessary database hits. I need to reduce the loading time from 5 seconds to less than 1 if possible.
Any suggestions?
1. Store the number of children as an integer field on the comment model and update it every time a child comment is added or removed. That way, you won't have to query for that value. You can do this using signals.
2. Add an ArrayField (if you're using Postgres) or something similar to your Profile model that stores the primary keys of all liked posts. Instead of querying the Likes model, you would be able to do this:
profile = Profile.objects.get(name='me')
liked = comment_pk in profile.liked_posts
3. Use select_related to load CommentChildren instead of making an extra query for it. (If CommentChildren points to its comment through a ForeignKey, prefetch_related is the variant that follows that reverse relation.)
Implementing these 3 items will get rid of all the database queries executed in the "for comment in comments.data" loop, which is probably taking up the majority of the processing time.
If you're interested, check out django-debug-toolbar, which lets you see which queries are executed on every page.
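The first suggestion, a stored child count, can be sketched outside Django as follows. This is only an illustration with Python's sqlite3 module: the schema and the add_child helper are hypothetical, and in a Django project the increment would presumably live in a post_save/post_delete signal handler as the answer suggests.

```python
import sqlite3

# Hypothetical schema: comments carry a denormalized child_count column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        body TEXT,
        child_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE comment_children (
        id INTEGER PRIMARY KEY,
        parent INTEGER,
        body TEXT
    );
    INSERT INTO comments (id, body) VALUES (1, 'first'), (2, 'second');
""")


def add_child(conn, parent_id, body):
    """Insert a child comment and bump the parent's counter in one transaction."""
    with conn:  # commits both statements together, rolls back if either fails
        conn.execute(
            "INSERT INTO comment_children (parent, body) VALUES (?, ?)",
            (parent_id, body),
        )
        conn.execute(
            "UPDATE comments SET child_count = child_count + 1 WHERE id = ?",
            (parent_id,),
        )


add_child(conn, 1, "reply a")
add_child(conn, 1, "reply b")

# Reading the counts is now a plain column read, with no COUNT query per comment.
counts = dict(conn.execute("SELECT id, child_count FROM comments"))
print(counts)
```

The trade-off is that every write path that adds or removes a child has to keep the counter in sync, which is why the answer points at signals.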

Django: how to effectively use select_related() with Paginator?

I have 2 related models with 10 million rows each. I want to run an efficient paginated request for 50 000 items of one of them and access related data on the other one:
class RnaPrecomputed(models.Model):
    id = models.CharField(max_length=22, primary_key=True)
    rna = models.ForeignKey('Rna', db_column='upi', to_field='upi', related_name='precomputed')
    description = models.CharField(max_length=250)

class Rna(models.Model):
    id = models.IntegerField(db_column='id')
    upi = models.CharField(max_length=13, db_index=True, primary_key=True)
    timestamp = models.DateField()
    userstamp = models.CharField(max_length=30)
As you can see, RnaPrecomputed is related to Rna via a foreign key. Now, I want to fetch a specific page of 50 000 RnaPrecomputed items together with the Rna rows related to them. I expect an N+1 query problem if I do this without a select_related() call. Here are the timings:
First, for reference I won't touch the related model at all:
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all(), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
    message = message + str(object.id)
Takes:
real 0m12.614s
user 0m1.073s
sys 0m0.188s
Now, I'll try accessing data on related model:
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all(), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
    message = message + str(object.rna.upi)
it takes:
real 2m27.655s
user 1m20.194s
sys 0m4.315s
That is a lot, so I probably have an N+1 query problem.
But now, if I use select_related(),
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all().select_related('rna'), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
    message = message + str(object.rna.upi)
it takes even more:
real 7m9.720s
user 0m1.948s
sys 0m0.337s
So, somehow select_related() made things 3 times slower, instead of making them faster. And probably without it, I have N+1 requests, so for each entry of RnaPrecomputed, Django ORM probably has to do an additional request to the database to fetch the corresponding Rna?
What am I doing wrong and how to make select_related() perform well with paginated queryset?
It's worth checking that you're not missing an index in your database. You have db_index=True for the Rna.upi field, but are you sure the index exists in the database?
If the select_related is making the count() query slow, then you could try doing the select_related on the paginated object_list.
for object in rna_paginator.page(300).object_list.select_related():
    message = message + str(object.rna.upi)
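The N+1 pattern the question suspects can be made visible by counting the statements the database actually receives. The sketch below uses Python's sqlite3 module rather than the Django ORM; the two tables are simplified stand-ins for the Rna and RnaPrecomputed models, and the five sample rows are invented. It compares a lazy per-row lookup with a single JOIN, which is roughly the query shape select_related() is meant to produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rna (upi TEXT PRIMARY KEY, userstamp TEXT);
    CREATE TABLE rna_precomputed (id TEXT PRIMARY KEY, upi TEXT REFERENCES rna(upi));
""")
conn.executemany(
    "INSERT INTO rna VALUES (?, ?)", [(f"URS{i}", "demo") for i in range(5)]
)
conn.executemany(
    "INSERT INTO rna_precomputed VALUES (?, ?)",
    [(f"URS{i}_1", f"URS{i}") for i in range(5)],
)
conn.commit()

# Record every SELECT the connection runs from here on.
selects = []


def record(sql):
    if sql.lstrip().upper().startswith("SELECT"):
        selects.append(sql)


conn.set_trace_callback(record)

# Lazy access: one query for the page, then one more per row for the related Rna.
page = conn.execute("SELECT id, upi FROM rna_precomputed ORDER BY id").fetchall()
lazy = [
    conn.execute("SELECT upi FROM rna WHERE upi = ?", (upi,)).fetchone()[0]
    for _, upi in page
]
n_plus_one = len(selects)

# Joined access: the related row arrives with each page row in a single query.
selects.clear()
joined = [
    row[1]
    for row in conn.execute(
        "SELECT p.id, r.upi FROM rna_precomputed p "
        "JOIN rna r ON r.upi = p.upi ORDER BY p.id"
    )
]
join_queries = len(selects)

print(n_plus_one, join_queries)
```

Fewer queries does not guarantee a faster page: on 10 million rows the JOIN combined with a large OFFSET can still be slow, which is consistent with the answer's advice to check the index first.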

How to perform a conditional count in a Rails active record query?

In a Rails 3.2 app I'm trying to construct a query that will return an array with two calculated counts. The app has three models: a User has many Events, and a Location has many Events. I need to return an array for a user that contains the number of events they have at each location, as well as the number of active events.
e.g., [#<Location id: 1, name: "Location Name", total_events_count: 4, active_event_count: 4>]
I can get the total_events_count:
user = User.find(params[:id])
user.locations
  .select("locations.id, locations.name, count(events.id) AS total_events_count")
  .joins(:events)
  .group("locations.id")
Given that my Event model has a string status field that can take a value active, how would I also include an active_events_count in this query?
EDIT
After some useful suggestions from xdazz and mu, I'm still struggling with this. My problem appears to be that the count is counting all events, rather than events that belong to the location AND the user.
I'm going to try to rephrase my question, hopefully someone can help me understand this.
Events belong to both a User and a Location (i.e. User has_many :locations, through: :events)
Events have several fields, including status (a string) and participants (an integer).
In a User show view, I'm trying to generate a list of a User's locations. For each location, I want to display a "success rate". The success rate is the number of that User's events with participants, divided by the total number of that User's events. For example, if User1 has 4 events at LocationA, but only two of those events had participants, User1's success rate at LocationA is 0.5 (or 50%).
The way I thought to achieve this is via a select query that also includes calculated counts for total_events_count and successful_events_count. (There may be a better way?)
So I do something like:
user = User.find(params[:id])
user.locations
  .select("locations.id, locations.name, count(events.id) AS total_events_count, sum(case events.marked when NOT 0 then 1 else 0 end) AS successful_events_count")
  .joins(:events)
  .group("locations.id")
This is returning an array with the correct keys, but the values are not correct. I am getting the total number of all events (and all successful events) for that location, not just a count of those events that belong to the current user.
I have been looking at this problem for so long that I'm getting myself very confused. Would be very grateful for a fresh perspective and ideas!!
EDIT2
After a break and fresh eyes, I have managed to get the result I need using the following code. It seems quite convoluted. If there is a better way, please let me know. Otherwise I will tidy up this question in case anyone else runs into the same problem.
class User
  def location_ratings
    events = self.events
    successful_events = events.where('events.participants > 0')
    # Per-location count of all of this user's events
    user_events_by_location = Event.
      select("events.location_id AS l_id, count(events.id) AS location_events_count").
      where( id: events.pluck(:id) ).
      group("l_id").
      to_sql
    # Per-location count of this user's events that had participants
    user_successful_events_by_location = Event.
      select("events.location_id AS l_id, count(events.id) AS location_successful_events_count").
      where( id: successful_events.pluck(:id) ).
      group("l_id").
      to_sql
    Location.
      joins("JOIN (#{user_events_by_location}) AS user_events ON user_events.l_id = locations.id").
      joins("JOIN (#{user_successful_events_by_location}) AS successful_user_events ON successful_user_events.l_id = locations.id").
      select('locations.id, locations.name, user_events.location_events_count, successful_user_events.location_successful_events_count').
      order("user_events.location_events_count DESC")
  end
end
You could use sum(events.marked='active') to get it:
user.locations
  .select("locations.id, locations.name, count(events.id) AS total_events_count, sum(events.marked='active') AS marked_event_count")
  .joins(:events)
  .group("locations.id")
Update:
If you are using PostgreSQL, you have to cast the boolean to an int before using the SUM function:
user.locations
  .select("locations.id, locations.name, count(events.id) AS total_events_count, sum((events.marked='active')::int) AS marked_event_count")
  .joins(:events)
  .group("locations.id")
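The conditional count can also be written with a CASE expression, which avoids summing a boolean and so does not depend on a database-specific cast. The sketch below runs the raw SQL through Python's sqlite3 module; the schema (events.user_id, events.location_id, events.status) is an assumption based on the question's description, and the rows are invented. It also filters on the user inside the query, which is the scoping problem described in the question's first edit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        location_id INTEGER,
        status TEXT
    );
    INSERT INTO locations VALUES (1, 'Hall'), (2, 'Park');
    INSERT INTO events (user_id, location_id, status) VALUES
        (1, 1, 'active'), (1, 1, 'closed'), (1, 2, 'active'),
        (2, 1, 'active'), (2, 1, 'active');
""")

user_id = 1

# COUNT covers every matching event; SUM(CASE ...) counts only the active ones.
# The WHERE clause limits both aggregates to this user's events.
rows = conn.execute(
    """
    SELECT l.id,
           l.name,
           COUNT(e.id) AS total_events_count,
           SUM(CASE WHEN e.status = 'active' THEN 1 ELSE 0 END) AS active_events_count
    FROM locations l
    JOIN events e ON e.location_id = l.id
    WHERE e.user_id = ?
    GROUP BY l.id, l.name
    ORDER BY l.id
    """,
    (user_id,),
).fetchall()

for row in rows:
    print(row)
```

User 2's three events at location 1 do not leak into user 1's totals because the filter is applied before grouping.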

Rails query not returning updated record

Rails is not returning the updated version of a record.
I have two methods in a model, submit_job(sig, label, jobtype) for submitting a job to a db that will get processed on the backend, and then poll_result(id) which will poll that submitted job every second to see when it completes, and then return the results from the completed job to the user.
My issue is that the poll_result(id) method is never getting the updated record.
def self.poll_result(id)
  change = false
  Workbench.where("id = ?", id).each do |sig|
    if sig.resultsready.to_i == 1
      change = true
    end
  end
  return change
end
All this does is come back with the results from my original insert over and over, as I can see when I print out the record it is accessing. Looking directly at the database, I can see that it is querying the right ID and that the record has been updated. resultsready is set to 1 in the database, so the loop should end and the method should return, but it just gets stuck in an infinite loop.
My assumption is that it is somehow getting an old/stale record that is being cached somehow, but I cannot for the life of me figure out how to force it to get the new record.
Thank You,
-Dennis
Using the Workbench.connection.clear_query_cache fixed the issue! To be specific, I added it at the controller level, right before calling Workbench.poll_result(id)

Rails SQL efficiency for where statement

Is there a more efficient method for doing a Rails SQL statement of the following code?
It will be called across the site to hide certain content or users based on if a user is blocked or not so it needs to be fairly efficient or it will slow everything else down as well.
users.rb file:
def is_blocked_by_or_has_blocked?(user)
  status = relationships.where('followed_id = ? AND relationship_status = ?',
                               user.id, relationship_blocked).first ||
           user.relationships.where('followed_id = ? AND relationship_status = ?',
                                    self.id, relationship_blocked).first
  return status
end
In that code, relationship_blocked is just an abstraction of an integer to make it easier to read later.
In a view, I am calling this method like this:
- unless current_user.is_blocked_by_or_has_blocked?(user)
  - # show the content for unblocked users here
Edit
This is a sample query. It stops after it finds the first match (no need to check for a reverse relationship):
Relationship Load (0.2ms) SELECT "relationships".* FROM "relationships" WHERE ("relationships".follower_id = 101) AND (followed_id = 1 AND relationship_status = 2) LIMIT 1
You can change it to only run one query by making it use an IN (x,y,z) statement in the query (this is done by passing an array of ids to :followed_id). Also, by using .count, you bypass Rails instantiating an instance of the model for the resulting relationships, which will keep things faster (less data to pass around in memory):
def is_blocked_by_or_has_blocked?(user)
  relationships.where(:followed_id => [user.id, self.id], :relationship_status => relationship_blocked).count > 0
end
Edit: to make it look both ways (using the follower_id column shown in the sample query above):
Relationship.where(:follower_id => [user.id, self.id], :followed_id => [user.id, self.id], :relationship_status => relationship_blocked).count > 0
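The two-directional check can be expressed as a single EXISTS query that names both directions explicitly, so rows where follower and followed are the same user cannot match by accident. The sketch below uses Python's sqlite3 module to show the SQL; the column names and the status value 2 come from the sample query in the question, while the blocked_either_way helper and the rows are hypothetical.

```python
import sqlite3

BLOCKED = 2  # relationship_status value seen in the question's sample query

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relationships (
        id INTEGER PRIMARY KEY,
        follower_id INTEGER,
        followed_id INTEGER,
        relationship_status INTEGER
    );
    INSERT INTO relationships (follower_id, followed_id, relationship_status) VALUES
        (101, 1, 2),  -- user 101 has blocked user 1
        (5, 6, 1);    -- an ordinary, non-blocked relationship
""")


def blocked_either_way(conn, user_a, user_b):
    """Return True if either user has blocked the other, using one query."""
    row = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM relationships
            WHERE relationship_status = ?
              AND ((follower_id = ? AND followed_id = ?)
                OR (follower_id = ? AND followed_id = ?))
        )
        """,
        (BLOCKED, user_a, user_b, user_b, user_a),
    ).fetchone()
    return bool(row[0])


print(blocked_either_way(conn, 1, 101))
print(blocked_either_way(conn, 5, 6))
```

Because EXISTS stops at the first matching row and returns a single value, no model instances need to be built; a composite index on (follower_id, followed_id) would likely keep the lookup cheap when it is called on every page.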