How can i improve performances when using django queryset? - sql

I'm trying to make a news feed. Each time the page is called, server must send multiple items. One item contain a post, number of likes, number of comments, number of comment children, comments data, comment children data etc.
My problem is, each time my page is called, it takes more than 5 secondes to be loaded. I've already implemented a caching system. But it's still slow.
posts = Posts.objects.filter(page="feed").order_by('-likes')[:'10'].cache()
posts = PostsSerializer(post,many=True)
hasPosted = Posts.objects.filter(page="feed",author="me").cache()
hasPosted = PostsSerializer(hasPosted,many=True)
for post in post.data:
commentsNum = Comments.objects.filter(parent=posts["id"]).cache(ops=['count'])
post["comments"] = len(commentsNum)
comments = Comments.objects.filter(parent=posts["id"]).order_by('-likes')[:'10'].cache()
liked = Likes.objects.filter(post_id=posts["id"],author="me").cache()
comments = CommentsSerializer(comments,many=True)
commentsObj[posts["id"]] = {}
for comment in comments.data:
children = CommentChildren.objects.filter(parent=comment["id"]).order_by('date')[:'10'].cache()
numChildren = CommentChildren.objects.filter(parent=comment["id"]).cache(ops=['count'])
posts["comments"] = posts["comments"] + len(numChildren)
children = CommentChildrenSerializer(children,many=True)
liked = Likes.objects.filter(post_id=comment["id"],author="me").cache()
for child in children.data:
if child["parent"] == comment["id"]:
liked = Liked.objects.filter(post_id=child["id"],author="me").cache()
I'm trying to find a simple method to fetch all these data quicker and without unnecessary database hit. I need to reduce the loading time from 5 secs to less than 1 if possible.
Any suggestion ?

Add the number of children as a integer on the comment field that gets updated every time a comment is added or removed. That way, you won't have to query for that value. You can do this using signals.
Add an ArrayField(if you're using postgres) or something similar on your Profile model that stores all the primary keys of Liked posts. Instead of querying the Likes model, you would be able to do this:
profile = Profile.objects.get(name='me')
liked = True if comment_pk in profile.liked_posts else False
Use select_related to CommentChildren instead of making an extra query for it.
Implementing these 3 items will get rid of all the db queries being executed in the "comment in comments.data" forloop which is probably taking up the majority of the processing time.
If you're interested, check out django-debug-toolbar which enables you to see what queries are being executed on every page.

Related

Django: how to effectively use select_related() with Paginator?

I have 2 related models with 10 Million rows each and want to perform an efficient paginated request of 50 000 items of one of them and access related data on the other one:
class RnaPrecomputed(models.Model):
id = models.CharField(max_length=22, primary_key=True)
rna = models.ForeignKey('Rna', db_column='upi', to_field='upi', related_name='precomputed')
description = models.CharField(max_length=250)
class Rna(models.Model):
id = models.IntegerField(db_column='id')
upi = models.CharField(max_length=13, db_index=True, primary_key=True)
timestamp = models.DateField()
userstamp = models.CharField(max_length=30)
As you can see, RnaPrecomputed is related to RNA via a foreign key. Now, I want to fetch a specific page of 50 000 items of RnaPrecomputed and corresponding Rnas related to them. I expect N+1 requests problem, if I do this without select_related() call. Here are the timings:
First, for reference I won't touch the related model at all:
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all(), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
message = message + str(object.id)
Takes:
real 0m12.614s
user 0m1.073s
sys 0m0.188s
Now, I'll try accessing data on related model:
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all(), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
message = message + str(object.rna.upi)
it takes:
real 2m27.655s
user 1m20.194s
sys 0m4.315s
Which is a lot, so, probably I have N+1 requests problem.
But now, if I use select_related(),
rna_paginator = paginator.Paginator(RnaPrecomputed.objects.all().select_related('rna'), 50000)
message = ""
for object in rna_paginator.page(400).object_list:
message = message + str(object.rna.upi)
it takes even more:
real 7m9.720s
user 0m1.948s
sys 0m0.337s
So, somehow select_related() made things 3 times slower, instead of making them faster. And probably without it, I have N+1 requests, so for each entry of RnaPrecomputed, Django ORM probably has to do an additional request to the database to fetch the corresponding Rna?
What am I doing wrong and how to make select_related() perform well with paginated queryset?
It's worth checking that you're not missing an index in your database. You have db_index=True for the Rna.upi field, but are you sure the index exists in the database?
If the select_related is making the count() query slow, then you could try doing the select_related on the paginated object_list.
for object in rna_paginator.page(300).object_list.select_related():
message = message + str(object.rna.upi)

phalcon querybuilder total_items always returns 1

I make a query via createBuilder() and when executing it (getQuery()->execute()->toArray())
I got 10946 elements. I want to paginate it, so I pass it to:
$paginator = new \Phalcon\Paginator\Adapter\QueryBuilder(array(
"builder" => $builder,
"limit" => $limit,
"page" => $current_page
));
$limit is 25 and $current_page is 1, but when doing:
$paginator->getPaginate();
$page->total_items;
returns 1.
Is that a bug or am I missing something?
UPD: it seems like when counting items it uses created sql with limit. There is no difference what limit is, limit divided by items per page always equals 1. I might be mistaken.
UPD2: Colleague helped me to figure this out, the bug was in the query phalcon produces: count() of the group by counts grouped elements. So a workaround looks like:
$dataCount = $builder->getQuery()->execute()->count();
$page->next = $page->current + 1;
$page->before = $page->current - 1 > 0 ? $page->current - 1 : 1;
$page->total_items = $dataCount;
$page->total_pages = ceil($dataCount / 100);
$page->last = $page->total_pages;
I know this isn't much of an answer but this is most likely to be a bug. Great guys at Phalcon took on a massive job that is too big to do it properly in their little free time and things like PHQL, Volt and other big but non-core components do not receive as much attention as we'd like. Also given that most time in the past 6 months was spent on v2 there are nearly 500 bugs about stuff like that and it's counting. I came across considerable issues in ORM, Volt, Validation and Session, which in the end made me stick to other not as cool but more proven solutions. When v2 comes out I'm sure all attention will on the bug list and testing, until then we are mostly on our own. Given that it's all C right now, only a few enthusiast get involved, with v2 this will also change.
If this is the only problem you are hitting, the best approach is to update your query to get the information you need yourself without getPaginate().

Rally custom grid with ((Parent = null) or (Parent.Name = 'xxxx'))

Can you create a custom User Story grid in Rally with the following query?
(((Parent = null) AND (Owner.Name = "dummy.name#email.com")) OR ((Parent.Name contains "Example") AND (Owner.Name = "dummy.name#email.com")))
Every time I try to do this it only returns results for the second part of the query. It seems like it cannot combine the Parent = null and the Parent.Name contains "Example".
Thanks for any feedback! I know that we could create two grids, but it would be nice to combine it into one.
This looks like a bug to me. Ignoring the owner portion of the query and taking just the Parent portions, per the title of your question: ((Parent = null) or (Parent.Name = 'xxxx')), it seems that I am also always seeing results that match the latter condition. If I switch the order, I get Parent-less stories. I'd recommend submitting a Case to Rally Support - rallysupport#rallydev.com - they can get this filed with the Rally developers.

Magento Bulk update attributes

I am missing the SQL out of this to Bulk update attributes by SKU/UPC.
Running EE1.10 FYI
I have all the rest of the code working but I"m not sure the who/what/why of
actually updating our attributes, and haven't been able to find them, my logic
is
Open a CSV and grab all skus and associated attrib into a 2d array
Parse the SKU into an entity_id
Take the entity_id and the attribute and run updates until finished
Take the rest of the day of since its Friday
Here's my (almost finished) code, I would GREATLY appreciate some help.
/**
* FUNCTION: updateAttrib
*
* REQS: $db_magento
* Session resource
*
* REQS: entity_id
* Product entity value
*
* REQS: $attrib
* Attribute to alter
*
*/
See my response for working production code. Hope this helps someone in the Magento community.
While this may technically work, the code you have written is just about the last way you should do this.
In Magento, you really should be using the models provided by the code and not write database queries on your own.
In your case, if you need to update attributes for 1 or many products, there is a way for you to do that very quickly (and pretty safely).
If you look in: /app/code/core/Mage/Adminhtml/controllers/Catalog/Product/Action/AttributeController.php you will find that this controller is dedicated to updating multiple products quickly.
If you look in the saveAction() function you will find the following line of code:
Mage::getSingleton('catalog/product_action')
->updateAttributes($this->_getHelper()->getProductIds(), $attributesData, $storeId);
This code is responsible for updating all the product IDs you want, only the changed attributes for any single store at a time.
The first parameter is basically an array of Product IDs. If you only want to update a single product, just put it in an array.
The second parameter is an array that contains the attributes you want to update for the given products. For example if you wanted to update price to $10 and weight to 5, you would pass the following array:
array('price' => 10.00, 'weight' => 5)
Then finally, the third and final attribute is the store ID you want these updates to happen to. Most likely this number will either be 1 or 0.
I would play around with this function call and use this instead of writing and maintaining your own database queries.
General Update Query will be like:
UPDATE
catalog_product_entity_[backend_type] cpex
SET
cpex.value = ?
WHERE cpex.attribute_id = ?
AND cpex.entity_id = ?
In order to find the [backend_type] associated with the attribute:
SELECT
  backend_type
FROM
  eav_attribute
WHERE entity_type_id =
  (SELECT
    entity_type_id
  FROM
    eav_entity_type
  WHERE entity_type_code = 'catalog_product')
AND attribute_id = ?
You can get more info from the following blog article:
http://www.blog.magepsycho.com/magento-eav-structure-role-of-eav_attributes-backend_type-field/
Hope this helps you.

jqGrid/NHibernate/SQL: navigate to selected record

I use jqGrid to display data which is retrieved using NHibernate. jqGrid does paging for me, I just tell NHibernate to get "count" rows starting from "n".
Also, I would like to highlight specific record. For example, in list of employees I'd like a specific employee (id) to be shown and pre-selected in table.
The problem is that this employee may be on non-current page. E.g. I display 20 rows from 0, but "highlighted" employee is #25 and is on second page.
It is possible to pass initial page to jqGrid, so, if I somehow use NHibernate to find what page the "highlighted" employee is on, it will just navigate to that page and then I'll use .setSelection(id) method of jqGrid.
So, the problem is narrowed down to this one: given specific search query like the one below, how do I tell NHibernate to calculate the page where the "highlighted" employee is?
A sample query (simplified):
var query = Session.CreateCriteria<T>();
foreach (var sr in request.SearchFields)
query = query.Add(Expression.Like(sr.Key, "%" + sr.Value + "%"));
query.SetFirstResult((request.Page - 1) * request.Rows)
query.SetMaxResults(request.Rows)
Here, I need to alter (calculate) request.Page so that it points to the page where request.SelectedId is.
Also, one interesting thing is, if sort order is not defined, will I get the same results when I run the search query twice? I'd say that SQL Server may optimize query because order is not defined... in which case I'll only get predictable result if I pull ALL query data once, and then will programmatically in C# slice the specified portion of query results - so that no second query occur. But it will be much slower, of course.
Or, is there another way?
Pretty sure you'd have to figure out the page with another query. This would surely require you to define the column to order by. You'll need to get the order by and restriction working together to count the rows before that particular id. Once you have the number of rows before your id, you can figure what page you need to select and perform the usual paging query.
OK, so currently I do this:
var iquery = GetPagedCriteria<T>(request, true)
.SetProjection(Projections.Property("Id"));
var ids = iquery.List<Guid>();
var index = ids.IndexOf(new Guid(request.SelectedId));
if (index >= 0)
request.Page = index / request.Rows + 1;
and in jqGrid setup options
url: "${Url.Href<MyController>(c => c.JsonIndex(null))}?_SelectedId=${Id}",
// remove _SelectedId from url once loaded because we only need to find its page once
gridComplete: function() {
$("#grid").setGridParam({url: "${Url.Href<MyController>(c => c.JsonIndex(null))}"});
},
loadComplete: function() {
$("#grid").setSelection("${Id}");
}
That is, in request I lookup for index of id and set page if found (jqGrid even understands to display the appropriate page number in the pager because I return the page number to in in json data). In grid setup, I setup url to include the lookup id first, but after grid is loaded I remove it from url so that prev/next buttons work. However I always try to highlight the selected id in the grid.
And of course I always use sorting or the method won't work.
One problem still exists is that I pull all ids from db which is a bit of performance hit. If someone can tell how to find index of the id in the filtered/sorted query I'd accept the answer (since that's the real problem); if no then I'll accept my own answer ;-)
UPDATE: hm, if I sort by id initially I'll be able to use the technique like "SELECT COUNT(*) ... WHERE id < selectedid". This will eliminate the "pull ids" problem... but I'd like to sort by name initially, anyway.
UPDATE: after implemented, I've found a neat side-effect of this technique... when sorting, the active/selected item is preserved ;-) This works if _SelectedId is reset only when page is changed, not when grid is loaded.
UPDATE: here's sources that include the above technique: http://sprokhorenko.blogspot.com/2010/01/jqgrid-mvc-new-version-sources.html