Django Making 1000 Duplicate Queries - sql

Model:
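from django.contrib.auth.models import User
from django.db import models
from mptt.models import MPTTModel, TreeForeignKey
class Comment(MPTTModel):
    # on_delete is required since Django 2.0; CASCADE matches the old implicit default
    submitter = models.ForeignKey(User, blank=True, null=True, on_delete=models.CASCADE)
    post = models.ForeignKey(Post, related_name="post_comments", on_delete=models.CASCADE)  # Post is defined elsewhere in the app
    parent = TreeForeignKey('self', blank=True, null=True, related_name="children", on_delete=models.CASCADE)
    text = models.CharField("Text", max_length=1000)
    rank = models.FloatField(default=0.0)
    pub_date = models.DateTimeField(auto_now_add=True)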
Iterating through nodes has the same effect (>1000 queries).

I had a similar issue with MPTT models. It was solved with select_related
(also for the parent's foreign keys).
So, depending on your needs, a proper queryset could look like:
Comment.objects.select_related('post', 'submitter', 'parent', 'parent__submitter', 'parent__post')
Also, if you need each comment's children in your loop as well, it can be optimized like this:
queryset.prefetch_related('children')
Or even like this:
from django.db.models import Prefetch
queryset.prefetch_related(
    Prefetch(
        'children',
        # 'etc.' stands for whatever other related fields you need
        queryset=Comment.objects.select_related('post', 'etc.'),
        to_attr='children_with_posts'
    )
)
... and depending on tree depth, you can use this:
queryset.select_related('parent', 'parent__parent', 'parent__parent__parent')
# you get the idea :)
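To make the mechanism concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django. The tables post and comment and the column post_id are hypothetical stand-ins for the models above. The sketch counts the SQL statements issued by three access patterns:
- a per-row lookup of the related object, which is what happens when a loop touches comment.post with no optimization;
- a single JOIN, which is roughly what select_related generates;
- one extra IN (...) query, which is roughly what prefetch_related does.

```python
import sqlite3


def make_db():
    """Hypothetical post/comment tables: 3 posts, 10 comments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, text TEXT);
        """
    )
    conn.executemany("INSERT INTO post VALUES (?, ?)", [(i, f"post {i}") for i in range(1, 4)])
    conn.executemany(
        "INSERT INTO comment VALUES (?, ?, ?)",
        [(i, (i % 3) + 1, f"comment {i}") for i in range(1, 11)],
    )
    return conn


def count_queries(conn, fn):
    """Run fn(conn) and return (result, number of SQL statements executed)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


def lazy_access(conn):
    # One query for the comments, then one more per comment for its post (N+1).
    rows = conn.execute("SELECT id, post_id FROM comment ORDER BY id").fetchall()
    return [
        conn.execute("SELECT title FROM post WHERE id = ?", (post_id,)).fetchone()[0]
        for _, post_id in rows
    ]


def joined_access(conn):
    # A single JOIN: the idea behind select_related.
    rows = conn.execute(
        "SELECT post.title FROM comment JOIN post ON post.id = comment.post_id "
        "ORDER BY comment.id"
    ).fetchall()
    return [title for (title,) in rows]


def prefetched_access(conn):
    # Two queries (comments, then all needed posts via IN) joined in Python:
    # the idea behind prefetch_related.
    rows = conn.execute("SELECT id, post_id FROM comment ORDER BY id").fetchall()
    ids = sorted({post_id for _, post_id in rows})
    placeholders = ",".join("?" * len(ids))
    titles = dict(
        conn.execute(f"SELECT id, title FROM post WHERE id IN ({placeholders})", ids).fetchall()
    )
    return [titles[post_id] for _, post_id in rows]
```

All three functions return the same titles. The lazy pattern's query count grows with the number of rows, while the other two patterns stay constant.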

Duplicate queries happen because every object in the loop hits the database again whenever you access a related object.
Try using select_related in your view method.
Using Django's prefetch_related or select_related will probably resolve that; if it doesn't, I'm afraid you will need a raw query.
Have you read about optimizing Django queries? Here is a simple tutorial that explains a lot of things: https://docs.djangoproject.com/en/3.1/topics/db/optimization/

Related

What is the best way to get all linked instances of a models in Django?

I am trying to create a messaging system in Django, and I came across an issue: How could I efficiently find all messages linked in a thread?
Let's imagine I have two models:
class Conversation(models.Model):
    sender = models.ForeignKey(User)
    receiver = models.ForeignKey(User)
    first_message = models.OneToOneField(Message)
    last_message = models.OneToOneField(Message)
class Message(models.Model):
    previous = models.OneToOneField(Message)
    content = models.TextField()
(code not tested, I'm sure it wouldn't work as is)
Since it is designed as a simple linked list, is it the only way to traverse it recursively?
Should I try to just get the previous of the previous until I find the first, or is there a way to query all of them more efficiently?
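Here is a sketch that compares the two options the question raises. It uses Python's sqlite3 module and a hypothetical message table whose previous_id column mirrors the previous field. The hop-by-hop walk issues one query per message. The second function uses a recursive common table expression (WITH RECURSIVE), a technique not mentioned in the original thread, to fetch the whole chain in one query. PostgreSQL, SQLite and MySQL 8+ support recursive CTEs. In Django, such a query would have to go through Manager.raw() or a cursor.

```python
import sqlite3


def make_thread(n):
    """Create a hypothetical message table holding one thread of n messages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message (id INTEGER PRIMARY KEY, "
        "previous_id INTEGER UNIQUE REFERENCES message(id), content TEXT)"
    )
    for i in range(1, n + 1):
        conn.execute(
            "INSERT INTO message VALUES (?, ?, ?)",
            (i, i - 1 if i > 1 else None, f"message {i}"),
        )
    return conn


def walk_hop_by_hop(conn, last_id):
    """Follow previous_id one query at a time; returns (contents, query count)."""
    contents, current, queries = [], last_id, 0
    while current is not None:
        content, previous = conn.execute(
            "SELECT content, previous_id FROM message WHERE id = ?", (current,)
        ).fetchone()
        queries += 1
        contents.append(content)
        current = previous
    return contents, queries


def walk_recursive_cte(conn, last_id):
    """Fetch the whole chain, newest first, in a single recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, previous_id, content, depth) AS (
            SELECT id, previous_id, content, 0 FROM message WHERE id = ?
            UNION ALL
            SELECT m.id, m.previous_id, m.content, chain.depth + 1
            FROM message AS m JOIN chain ON m.id = chain.previous_id
        )
        SELECT content FROM chain ORDER BY depth
        """,
        (last_id,),
    ).fetchall()
    return [content for (content,) in rows]
```

Both functions return the same list. The recursive version needs one round trip no matter how long the thread is.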
I use a Django REST Framework serializer with the depth option. If you set a serializer's depth to 3, it will fetch the full related model for every available foreign key, up to three levels of parents.
https://www.django-rest-framework.org/api-guide/serializers/#specifying-nested-serialization
from rest_framework import serializers
class AppliedSerializer(serializers.ModelSerializer):
    class Meta:
        model = Applied  # the answerer's own model
        fields = "__all__"
        depth = 3

How to Annotate Specific Related Field Value

I am attempting to optimize some code. I have a model with many related models, and I want to annotate and filter by the value of a field on one specific type of these related models, as they are designed to be generic. I can find all instances of the type of related model I want, or all of the models related to the parent, but not the related model of the specific type related to the parent. Can anyone advise?
I initially tried
parents = parent.objects.all()
parents.annotate(field_value=Subquery(related_model.objects.get(
field__type='specific',
parent_id=OuterRef('id'),
).value)))
But I get the error "This queryset contains a reference to an outer query and may only be used in a subquery." When I tried
parents = parent.objects.all()
parents.annotate(field_value=Q(related_model.objects.get(
field__type='specific',
parent_id=F('id'),
).value)))
I get "DoesNotExist: related_field matching query does not exist.", which seems closer but still does not work.
Model structure:
class parent(models.Model):
    id = models.IntegerField(null=False, primary_key=True)
class field(models.Model):
    id = models.IntegerField(null=False, primary_key=True)
    type = models.CharField(max_length=60)
class related_model(models.Model):
    parent = models.ForeignKey(parent, on_delete=models.CASCADE, related_name='related_models')
    field = models.ForeignKey(field, on_delete=models.CASCADE, related_name='fields')
Is what I want to do even possible?
Never mind, I decided to do a reverse lookup, something like:
parent_ids = related_model.objects.filter(field__type='specific', parent_id__in=list_of_parents).values_list('parent_id', flat=True)
parents = parent.objects.filter(id__in=parent_ids)
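The original annotation is also possible. The first attempt fails because .get() runs its query immediately, and a queryset containing OuterRef may only be evaluated as part of an outer query. Django's documented Subquery pattern keeps the inner query lazy and limits it to one row: parent.objects.annotate(field_value=Subquery(related_model.objects.filter(parent=OuterRef('pk'), field__type='specific').values('value')[:1])), which can then be filtered on field_value. The sketch below runs the equivalent correlated subquery with Python's sqlite3 module so its behaviour can be checked. The value column is an assumption, since the question uses .value but the model structure does not list it.

```python
import sqlite3


def make_db():
    """Tables mirroring the question's models, plus a hypothetical value column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE field (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE related_model (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES parent(id),
            field_id INTEGER REFERENCES field(id),
            value TEXT
        );
        INSERT INTO parent VALUES (1), (2), (3);
        INSERT INTO field VALUES (1, 'specific'), (2, 'other');
        INSERT INTO related_model VALUES
            (1, 1, 1, 'a'), (2, 1, 2, 'x'), (3, 2, 2, 'y'), (4, 3, 1, 'c');
        """
    )
    return conn


# Correlated subquery: for each parent row, take the value of its related row
# whose field has the requested type (first by id, like [:1] in Django).
ANNOTATED = """
SELECT p.id AS id,
       (SELECT r.value
        FROM related_model AS r JOIN field AS f ON f.id = r.field_id
        WHERE r.parent_id = p.id AND f.type = ?
        ORDER BY r.id LIMIT 1) AS field_value
FROM parent AS p
"""


def annotate(conn, field_type):
    """Every parent with its field_value (None when no matching related row)."""
    return conn.execute(ANNOTATED + " ORDER BY p.id", (field_type,)).fetchall()


def filter_by_value(conn, field_type, wanted):
    """Ids of parents whose annotated field_value equals wanted."""
    sql = f"SELECT id FROM ({ANNOTATED}) AS annotated WHERE field_value = ? ORDER BY id"
    return [pid for (pid,) in conn.execute(sql, (field_type, wanted)).fetchall()]
```

Parents with no related row of the requested type get NULL (None), much like a Subquery annotation with no match.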

Django determine "most popular"

Given this somewhat simplified representation of my application's models, how do I globally find the most popular MyModel instances, i.e. the ones favorited most often by MyUsers?
I've come across similar blog posts about how to find favorite tags, but I don't think those apply to this particular situation.
class MyUser(models.Model):
    favorite_models = models.ManyToManyField('MyModel')
    ...
class MyModel(models.Model):
    name = models.CharField(...)
    ...
Can this be done in a single query? Or do I need to loop over every MyUser and MyModel to determine the most popular? Thanks in advance!
I'm too lazy to create a django project from scratch, but this one should do the job:
from django.db.models import Count
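MyModel.objects.annotate(Count('myuser')).order_by('-myuser__count')
(the lookup uses the lowercase model name, 'myuser'; myuser_set is only the reverse accessor on an instance, e.g. my_model.myuser_set.all(), so Count('myuser_set') won't work)
if you'd rather use an explicit name, set related_name:
class MyUser(models.Model):
    favorite_models = models.ManyToManyField(MyModel, related_name='favorited_by')
and then
MyModel.objects.annotate(Count('favorited_by')).order_by('-favorited_by__count')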
(let me know if it works, in any case this page should contain what you need to do that: https://docs.djangoproject.com/en/dev/topics/db/aggregation/)
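To answer the "single query?" part: yes. An annotate(Count(...)).order_by(...) queryset compiles to roughly one grouped query over the many-to-many join table. The sketch below shows that shape with Python's sqlite3 module. The table names mymodel, myuser and myuser_favorite_models are assumptions; Django's real join-table name is prefixed with the app label. The LEFT JOIN keeps models nobody has favorited, with a count of 0.

```python
import sqlite3


def make_db():
    """Hypothetical tables for MyModel, MyUser and their M2M join table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE mymodel (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE myuser (id INTEGER PRIMARY KEY);
        CREATE TABLE myuser_favorite_models (
            id INTEGER PRIMARY KEY,
            myuser_id INTEGER REFERENCES myuser(id),
            mymodel_id INTEGER REFERENCES mymodel(id)
        );
        INSERT INTO mymodel VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
        INSERT INTO myuser VALUES (1), (2), (3);
        INSERT INTO myuser_favorite_models (myuser_id, mymodel_id) VALUES
            (1, 1), (1, 2), (2, 2), (3, 2), (3, 3);
        """
    )
    return conn


def most_popular(conn):
    """Every MyModel with its favorite count, most favorited first (one query)."""
    return conn.execute(
        """
        SELECT m.name, COUNT(f.myuser_id) AS num_favorites
        FROM mymodel AS m
        LEFT JOIN myuser_favorite_models AS f ON f.mymodel_id = m.id
        GROUP BY m.id, m.name
        ORDER BY num_favorites DESC, m.id
        """
    ).fetchall()
```

Slicing the Django queryset (for example [:10]) would correspond to adding a LIMIT to this query.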

Django aggregate query

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).
Here is what I currently have:
Page.objects.annotate(most_recent_post=Max('post__post_time'))
This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?
Models:
from django.contrib.auth.models import User
from django.db import models
class Page(models.Model):
    name = models.CharField(max_length=50)
    created = models.DateTimeField(auto_now_add=True)
    enabled = models.BooleanField(default=True)
class Post(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    page = models.ForeignKey(Page, on_delete=models.CASCADE)
    post_time = models.DateTimeField(auto_now_add=True)
Depending on the relationship between the two, you should be able to follow the relationships quite easily and improve performance by using select_related.
Taking this:
class Page(models.Model):
    ...
class Post(models.Model):
    page = ForeignKey(Page, ...)
You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:
Post.objects.select_related('page').all()
This will result in only one (larger) query where all the page objects are prefetched.
In the reverse situation (like yours), where you want to get all pages and their associated posts, select_related won't work. See this, this and this question for more information about what you can do.
Probably your best bet is to use the techniques described in the Django docs here: Following Links Backward.
After you do:
pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post) for page in pages]
Then posts[0] should contain the most recent post for pages[0], and so on (each entry is a queryset, empty for a page without posts). Note that this runs one extra query per page. I don't know if this is the most efficient solution, but it was the one mentioned in another post about the lack of left joins in Django.
You can create a database view that contains all Page columns alongside the necessary columns of the latest Post:
CREATE VIEW `testapp_pagewithrecentpost` AS
SELECT testapp_page.*, testapp_post.* -- I suggest as few post columns as possible here
FROM `testapp_page`
LEFT JOIN `testapp_post`
    ON testapp_page.id = testapp_post.page_id
    AND testapp_post.post_time = (
        SELECT MAX(p2.post_time)
        FROM `testapp_post` AS p2
        WHERE p2.page_id = testapp_page.id
    );
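Here is a sketch of the same view, adapted to SQLite so it can be run with Python's sqlite3 module. The table names follow the answer's testapp_ prefix. The column list is explicit rather than page.*, post.*, which avoids two columns both named id; post_id is an alias chosen for this sketch. Timestamps are stored as ISO strings, so MAX compares them correctly.

```python
import sqlite3

SCHEMA = """
CREATE TABLE testapp_page (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE testapp_post (
    id INTEGER PRIMARY KEY,
    page_id INTEGER REFERENCES testapp_page(id),
    post_time TEXT
);
CREATE VIEW testapp_pagewithrecentpost AS
SELECT testapp_page.id AS id,
       testapp_page.name AS name,
       testapp_post.id AS post_id,
       testapp_post.post_time AS post_time
FROM testapp_page
LEFT JOIN testapp_post
    ON testapp_page.id = testapp_post.page_id
    AND testapp_post.post_time = (
        SELECT MAX(p2.post_time) FROM testapp_post AS p2
        WHERE p2.page_id = testapp_page.id
    );
"""


def pages_with_recent_post(conn):
    """One query: every page plus its newest post (NULLs if it has none)."""
    return conn.execute(
        "SELECT id, name, post_id, post_time FROM testapp_pagewithrecentpost ORDER BY id"
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO testapp_page VALUES (?, ?)", [(1, "Home"), (2, "Empty")])
    conn.executemany(
        "INSERT INTO testapp_post VALUES (?, ?, ?)",
        [(1, 1, "2024-01-01 10:00"), (2, 1, "2024-01-02 09:00")],
    )
    return pages_with_recent_post(conn)
```

The page without posts still appears, with NULL post columns, which is the LEFT JOIN behaviour the question asks for. If two posts on the same page share the maximal post_time, that page will appear twice.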
Then you need to create a model with managed = False (so that manage.py won't try to create or reset the table). You can also inherit from an abstract base model to avoid duplicating column definitions:
class PageWithRecentPost(models.Model):  # Or extend an abstract BasePost?
    # Page columns go here
    # Post columns go here
    # We use LEFT JOIN, so all columns from the
    # 'post' model will need blank=True, null=True
    class Meta:
        managed = False  # Django will not handle creation/reset automatically
By doing that you can do what you initially wanted, i.e. fetch from both tables in a single query:
pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
    print(page.name)       # Page column
    print(page.post_time)  # Post column
However, this approach is not free of drawbacks:
It's very DB engine-specific
You'll need to add VIEW creation SQL to your project
If your models are complex it's very likely that you'll need to resolve table column name clashes.
Model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
It adds complexity to your project. Allowing for multiple queries is definitely a simpler solution.
Changes in Page/Post will require re-creating the view.

NHibernate Partial Update

Is there a way in NHibernate to start with an unproxied model:
var m = new Model() { ID = 1 };
m.Name = "test";
//Model also has .LastName and .Age
and then save this model, updating only Name, without first selecting it from the session?
If the model has properties other than Name, you need to initialize them with their original values from the database; otherwise they will be set to null.
You can use HQL update operations (I have never tried them myself).
You could also use a native SQL statement ("Update model set name ...").
Usually this optimization is not needed. Cases where you really need to avoid selecting the data are rare, so writing these SQL statements is usually a waste of time. You are using an ORM, which means: write your software in an object-oriented way! Otherwise you won't get much advantage from it.
What Stefan says looks like what you need. Please be aware that this is really an edge case and you should be happy with fully loading your entity unless you have some ultra-high-performance issues.
If you simply don't want to hit the database, try caching; the entity cache is very simple and efficient.
If your entity is huge (e.g. it contains a blob), consider splitting it in two, with a many-to-one relationship so that you can use lazy loading.
http://www.hibernate.org/hib_docs/nhibernate/html/mapping.html
dynamic-update (optional, defaults to false): Specifies that UPDATE SQL should be generated at runtime and contain only those columns whose values have changed.
Place dynamic-update on the class in the HBM.
var m = new Model() { ID = 1 };
session.Update(m); // attach m to the session (ISession.Update returns void)
m.Name = "test";
session.Save(m);
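To illustrate what dynamic-update describes, here is a language-agnostic sketch in Python. It is not NHibernate's implementation. It builds an UPDATE that lists only the columns whose values differ from a snapshot of the original values. The table name model and the lowercased column names are assumptions based on the question. Note that this diffing needs the original values to compare against; without a snapshot there is nothing to diff.

```python
import sqlite3


def build_dynamic_update(table, key, original, current):
    """Build an UPDATE touching only the columns whose values changed.

    original: column values as last loaded (the snapshot).
    current:  column values the object holds now.
    Returns (sql, params), or (None, ()) when nothing changed.
    """
    changed = [col for col in current if col != key and current[col] != original.get(col)]
    if not changed:
        return None, ()
    assignments = ", ".join(f"{col} = ?" for col in changed)
    sql = f"UPDATE {table} SET {assignments} WHERE {key} = ?"
    params = tuple(current[col] for col in changed) + (current[key],)
    return sql, params


def demo():
    """Apply a Name-only change and show the other columns survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT, last_name TEXT, age INTEGER)")
    conn.execute("INSERT INTO model VALUES (1, 'old', 'Smith', 40)")
    original = {"id": 1, "name": "old", "last_name": "Smith", "age": 40}
    current = dict(original, name="test")
    sql, params = build_dynamic_update("model", "id", original, current)
    conn.execute(sql, params)
    row = conn.execute("SELECT name, last_name, age FROM model WHERE id = 1").fetchone()
    return sql, row
```

Only name appears in the generated SET clause, so last_name and age keep their stored values.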