What is the best way to get all linked instances of a model in Django?

I am trying to create a messaging system in Django, and I came across an issue: How could I efficiently find all messages linked in a thread?
Let's imagine I have two models:
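from django.contrib.auth.models import User
from django.db import models
class Conversation(models.Model):
    sender = models.ForeignKey(User, on_delete=models.CASCADE, related_name='sent_conversations')
    receiver = models.ForeignKey(User, on_delete=models.CASCADE, related_name='received_conversations')
    # 'Message' is given as a string because that class is defined further down
    first_message = models.OneToOneField('Message', on_delete=models.CASCADE, related_name='starts_conversation')
    last_message = models.OneToOneField('Message', on_delete=models.CASCADE, related_name='ends_conversation')
class Message(models.Model):
    # Each message points at the one before it (a singly linked list)
    previous = models.OneToOneField('self', null=True, blank=True, on_delete=models.SET_NULL, related_name='next')
    content = models.TextField()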
(code not tested, I'm sure it wouldn't work as is)
Since it is designed as a simple linked list, is traversing it recursively the only way?
Should I try to just get the previous of the previous until I find the first, or is there a way to query all of them more efficiently?
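For comparison, the "previous of the previous" walk can be pushed into the database as one recursive query instead of one query per hop. The sketch below uses Python's built-in sqlite3 with a recursive CTE. The table name `chat_message` and the column `previous_id` are assumptions that mimic Django's default naming for this Message model in a hypothetical app called `chat`. Django's ORM has no built-in recursive-CTE API, so in a real project this SQL would have to go through `Message.objects.raw()` or a database cursor.

```python
import sqlite3

# Assumed schema: Django-style table for the Message model in a hypothetical "chat" app.
SCHEMA = """
CREATE TABLE chat_message (
    id INTEGER PRIMARY KEY,
    previous_id INTEGER REFERENCES chat_message(id),
    content TEXT
);
"""

# Start at the newest message, then repeatedly join to its "previous" row.
THREAD_SQL = """
WITH RECURSIVE chain(id, previous_id, content, hops) AS (
    SELECT id, previous_id, content, 0
    FROM chat_message
    WHERE id = ?
    UNION ALL
    SELECT m.id, m.previous_id, m.content, chain.hops + 1
    FROM chat_message AS m
    JOIN chain ON m.id = chain.previous_id
)
SELECT id, content FROM chain ORDER BY hops
"""

def message_chain(conn, last_message_id):
    """Return (id, content) pairs from newest to oldest using a single query."""
    return conn.execute(THREAD_SQL, (last_message_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO chat_message (id, previous_id, content) VALUES (?, ?, ?)",
    [(1, None, "hi"), (2, 1, "hello"), (3, 2, "how are you?")],
)
print(message_chain(conn, 3))
# [(3, 'how are you?'), (2, 'hello'), (1, 'hi')]
```

Starting from `Conversation.last_message`, this returns the whole thread in one round trip, however long it is.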

I use a Django REST Framework serializer with depth. If you set depth = 3 on a serializer, it will fetch the full related model for every available foreign key, following relations up to three levels deep.
https://www.django-rest-framework.org/api-guide/serializers/#specifying-nested-serialization
from rest_framework import serializers
class AppliedSerializer(serializers.ModelSerializer):
    class Meta:
        model = Applied
        fields = "__all__"
        depth = 3
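To see what depth means for a linked list like the Message chain, here is a plain-Python sketch of the behaviour (not DRF's actual code): each level of depth nests one more related object, and once the budget runs out the relation is represented only by its primary key. The dicts stand in for model instances.

```python
def serialize_message(msg, depth):
    """Plain-Python imitation of a nested serializer with a `depth` limit.

    `msg` is a dict standing in for a Message instance:
    {"id": int, "content": str, "previous": dict or None}.
    """
    prev = msg["previous"]
    if prev is None:
        previous = None
    elif depth > 0:
        # Still within the depth budget: nest the whole related object.
        previous = serialize_message(prev, depth - 1)
    else:
        # Budget used up: represent the relation by its primary key only.
        previous = prev["id"]
    return {"id": msg["id"], "content": msg["content"], "previous": previous}

# Usage: build a five-message thread (newest is id 5) and serialize with depth=3.
thread = None
for i, text in enumerate(["a", "b", "c", "d", "e"], start=1):
    thread = {"id": i, "content": text, "previous": thread}

result = serialize_message(thread, depth=3)
print(result["previous"]["previous"]["previous"])
# {'id': 2, 'content': 'b', 'previous': 1}
```

So depth caps how much of the thread you get: with depth=3, the fourth message back arrives only as its id, and anything older is not in the response at all.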

Related

How do I construct complex django query statements?

I am not very familiar with SQL, so trying to make more complex calls via the Django ORM is stumping me. I have a Printer model that spawns Jobs, and the jobs receive statuses via a State model with a foreign key to Job. A job's status is determined by the most recent State object associated with it; this lets me track the history of a job's states throughout its life cycle. I want to be able to determine which Printers have successful jobs associated with them.
from django.db import models
class Printer(models.Model):
    label = models.CharField(max_length=120)
class Job(models.Model):
    label = models.CharField(max_length=120)
    printer = models.ForeignKey(
        Printer,
        on_delete=models.CASCADE,
        related_name='jobs',
        related_query_name='job'
    )
    def set_state(self, state):
        State.objects.create(state=state, job=self)
    @property
    def current_state(self):
        return self.states.latest('created_at').state
class State(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    state = models.SmallIntegerField()
    job = models.ForeignKey(
        Job,
        on_delete=models.CASCADE,
        related_name='states',
        related_query_name='state'
    )
I need a QuerySet of Printer objects that have at least one related job whose most recent (latest) State object has State.state == 200. Is there a way to construct a compound call that achieves this in the database, without having to pull in all Job objects and iterate over them in Python? Perhaps a custom manager? I've been reading posts about Subquery, annotation and OuterRef, but these ideas are just not sinking in in a way that shows me a path. I need them explained like I'm 5. They are very unpythonic statements.
The naive python way to describe what I want:
printers = []
for printer in Printer.objects.all():
    for job in printer.jobs.all():
        if job.states.latest('created_at').state == 200:
            printers.append(printer)
printers = list(set(printers))
But with the least number of DB round trips possible. Help!
edit: further question: what's the best way to filter Jobs based on their current state? Since Job.current_state is a calculated property, it cannot be used in a QuerySet filter. But, again, I don't want to have to pull in all Job objects.
Took about two days to sink in, but I think I have an answer using annotation and Subqueries:
from django.db.models import Exists, OuterRef, Subquery
state_sq = State.objects.filter(job=OuterRef('pk')).order_by('-created_at')
successful_jobs = Job.objects.annotate(
    latest_state=Subquery(state_sq.values('state')[:1])
).filter(printer=OuterRef('pk'), latest_state=200)
printers_with_successful_jobs = Printer.objects.annotate(
    has_success_jobs=Exists(successful_jobs)
).filter(has_success_jobs=True)
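Subquery, OuterRef and Exists are easier to follow once you see the SQL they stand for, so here is a hand-written approximation of that query, run with Python's built-in sqlite3. The table names (`printer`, `job`, `state`) are simplified assumptions rather than Django's real `<app>_<model>` names, and the SQL Django actually generates will differ in detail. The point to take away: OuterRef('pk') is just a reference to the current row of the enclosing query, so the inner query is conceptually evaluated for every outer row, entirely inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE printer (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE job (id INTEGER PRIMARY KEY, label TEXT,
                  printer_id INTEGER REFERENCES printer(id));
CREATE TABLE state (id INTEGER PRIMARY KEY, created_at TEXT, state INTEGER,
                    job_id INTEGER REFERENCES job(id));
INSERT INTO printer VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO job VALUES (1, 'j1', 1), (2, 'j2', 2), (3, 'j3', 3);
-- j1 ends in 200; j2 was 200 earlier but is now 500; j3 has no states yet.
INSERT INTO state (created_at, state, job_id) VALUES
    ('2024-01-01 10:00', 100, 1), ('2024-01-01 11:00', 200, 1),
    ('2024-01-01 10:00', 200, 2), ('2024-01-01 11:00', 500, 2);
""")

# Hand-written approximation of the printers_with_successful_jobs queryset.
PRINTERS_WITH_SUCCESSFUL_JOBS = """
SELECT p.id, p.label
FROM printer AS p
WHERE EXISTS (                           -- Exists(successful_jobs)
    SELECT 1
    FROM job AS j
    WHERE j.printer_id = p.id            -- printer=OuterRef('pk')
      AND (
          SELECT s.state                 -- state_sq.values('state')
          FROM state AS s
          WHERE s.job_id = j.id          -- job=OuterRef('pk')
          ORDER BY s.created_at DESC     -- order_by('-created_at')
          LIMIT 1                        -- [:1]
      ) = 200                            -- latest_state=200
)
ORDER BY p.id
"""

print(conn.execute(PRINTERS_WITH_SUCCESSFUL_JOBS).fetchall())
# [(1, 'A')]
```

Printer C's only job has no states, so its latest-state subquery yields NULL and the comparison with 200 is not true; that printer is excluded, just as a job without states would be in the ORM version.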
And further, I constructed a custom manager to return latest_state by default.
class JobManager(models.Manager):
    def get_queryset(self):
        # State's foreign key to Job is called "job", so correlate on that
        state_sq = State.objects.filter(
            job=OuterRef('pk')
        ).order_by('-created_at')
        return super().get_queryset().annotate(
            latest_state=Subquery(state_sq.values('state')[:1])
        )
class Job(models.Model):
    objects = JobManager()
    ...
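With that manager in place, the follow-up question in the edit (filtering Jobs by their current state) becomes `Job.objects.filter(latest_state=200)`. Roughly, the annotation turns into a scalar subquery column and the filter applies to that column. The sqlite3 sketch below shows that shape, again with simplified, assumed table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE state (id INTEGER PRIMARY KEY, created_at TEXT, state INTEGER,
                    job_id INTEGER REFERENCES job(id));
INSERT INTO job VALUES (1, 'j1'), (2, 'j2'), (3, 'j3');
INSERT INTO state (created_at, state, job_id) VALUES
    ('2024-01-01 10:00', 100, 1), ('2024-01-01 11:00', 200, 1),
    ('2024-01-01 10:00', 200, 2), ('2024-01-01 11:00', 500, 2);
""")

# Approximation of Job.objects.filter(latest_state=?) using JobManager's annotation.
JOBS_BY_LATEST_STATE = """
SELECT id, label, latest_state
FROM (
    SELECT j.id, j.label,
           (SELECT s.state
            FROM state AS s
            WHERE s.job_id = j.id        -- job=OuterRef('pk')
            ORDER BY s.created_at DESC
            LIMIT 1) AS latest_state
    FROM job AS j
) AS annotated
WHERE latest_state = ?
ORDER BY id
"""

def jobs_in_state(conn, state):
    """Jobs whose most recent State row has the given state value."""
    return conn.execute(JOBS_BY_LATEST_STATE, (state,)).fetchall()

print(jobs_in_state(conn, 200))  # [(1, 'j1', 200)]
print(jobs_in_state(conn, 500))  # [(2, 'j2', 500)]
```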

How to Annotate Specific Related Field Value

I am attempting to optimize some code. I have a model with many related models, and I want to annotate and filter by the value of a field on a specific type of these related models, as they are designed to be generic. I can find all instances of the type of related model I want, or all of the models related to the parent, but not the related model of the specific type that belongs to the parent. Can anyone advise?
I initially tried
parents = parent.objects.all()
parents.annotate(field_value=Subquery(related_model.objects.get(
    field__type='specific',
    parent_id=OuterRef('id'),
).value))
But get the error This queryset contains a reference to an outer query and may only be used in a subquery. When I tried
parents = parent.objects.all()
parents.annotate(field_value=Q(related_model.objects.get(
    field__type='specific',
    parent_id=F('id'),
).value))
I get DoesNotExist: related_field matching query does not exist. which seems closer but still does not work.
Model structure:
class parent(models.Model):
    id = models.IntegerField(null=False, primary_key=True)
class field(models.Model):
    id = models.IntegerField(null=False, primary_key=True)
    type = models.CharField(max_length=60)
class related_model(models.Model):
    parent = models.ForeignKey(parent, on_delete=models.CASCADE, related_name='related_models')
    field = models.ForeignKey(field, on_delete=models.CASCADE, related_name='fields')
Is what I want to do even possible?
Never mind I decided to do a reverse lookup, kinda like
parent_ids = related_model.objects.filter(field__type='specific', parent_id__in=list_of_parents).values_list('parent_id', flat=True)
parents = parent.objects.filter(id__in=parent_ids)
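For reference, the single-query version the question was aiming for is a correlated subquery. In the ORM that means building the inner query lazily, e.g. `related_model.objects.filter(field__type='specific', parent_id=OuterRef('id')).values('value')[:1]`, and wrapping it in Subquery; `.get()` cannot be used there because it executes immediately, before any outer query exists, which is what the first error message is complaining about. The sqlite3 sketch below shows the SQL shape. It assumes a `value` column on related_model (the question's code reads `.value`, but the posted model omits it) and uses simplified table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE field (id INTEGER PRIMARY KEY, type TEXT);
-- Assumption: related_model carries the "value" column the question reads.
CREATE TABLE related_model (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(id),
    field_id INTEGER REFERENCES field(id),
    value TEXT
);
INSERT INTO parent VALUES (1), (2), (3);
INSERT INTO field VALUES (1, 'specific'), (2, 'other');
INSERT INTO related_model VALUES (1, 1, 1, 'x'), (2, 1, 2, 'y'), (3, 2, 2, 'z');
""")

# Each parent gets the value of its related row whose field has type 'specific',
# or NULL when it has none (as an empty Subquery would give).
ANNOTATED_PARENTS = """
SELECT p.id,
       (SELECT r.value
        FROM related_model AS r
        JOIN field AS f ON f.id = r.field_id      -- the field__type lookup
        WHERE r.parent_id = p.id                  -- parent_id=OuterRef('id')
          AND f.type = 'specific'
        LIMIT 1) AS field_value
FROM parent AS p
ORDER BY p.id
"""

print(conn.execute(ANNOTATED_PARENTS).fetchall())
# [(1, 'x'), (2, None), (3, None)]
```

If a parent can have several related rows of the 'specific' type, add an ORDER BY inside the subquery so the row picked by LIMIT 1 is deterministic.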

The best way to avoid generic ForeignKey in django

I have a Picture model which contains links to different images. I also have People and Car models that may have one or more images. That means that certain pictures can belong to an object in the Car or People model.
I tried adding ForeignKeys, but one of the fields (car_id or people_id) would always be empty.
I can't create an abstract model and make a ForeignKey to it from Picture, since an abstract model can't be the target of a ForeignKey.
The last solution that I know of is GenericForeignKey, but it seems too complex for such a trivial task.
Is there a better way to solve the problem?
I solved the problem like this:
I created another model, Album, that has only an id:
class Album(models.Model):
    pass  # only the implicit id field
class People(models.Model):
    name = models.CharField(max_length=100)
    album = models.OneToOneField(Album, on_delete=models.CASCADE)
class Car(models.Model):
    horse_power = models.IntegerField()
    album = models.OneToOneField(Album, on_delete=models.CASCADE)
class Picture(models.Model):
    title = models.CharField(max_length=100)
    album = models.ForeignKey(Album, on_delete=models.CASCADE)
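In Django terms, a person's pictures are then reachable as `person.album.picture_set.all()`, and a car's pictures the same way through its album. Underneath, that is a single join through the shared album row. Here is a sqlite3 sketch of the schema and the lookup; the table names are simplified and the helper `picture_titles` is hypothetical, for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (id INTEGER PRIMARY KEY);
-- OneToOneField -> a UNIQUE foreign key column on the owning table.
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT,
                     album_id INTEGER UNIQUE REFERENCES album(id));
CREATE TABLE car (id INTEGER PRIMARY KEY, horse_power INTEGER,
                  album_id INTEGER UNIQUE REFERENCES album(id));
-- ForeignKey -> many pictures per album.
CREATE TABLE picture (id INTEGER PRIMARY KEY, title TEXT,
                      album_id INTEGER REFERENCES album(id));
INSERT INTO album VALUES (1), (2);
INSERT INTO people VALUES (1, 'Ann', 1);
INSERT INTO car VALUES (1, 150, 2);
INSERT INTO picture VALUES (1, 'portrait', 1), (2, 'holiday', 1), (3, 'side view', 2);
""")

# Table names cannot be bound as ? parameters, so only allow known owner tables.
OWNER_TABLES = {"people", "car"}

def picture_titles(conn, owner_table, owner_id):
    """Titles of the pictures in the album that belongs to one People/Car row."""
    if owner_table not in OWNER_TABLES:
        raise ValueError(f"unknown owner table: {owner_table}")
    sql = f"""
        SELECT pic.title
        FROM picture AS pic
        JOIN {owner_table} AS o ON o.album_id = pic.album_id
        WHERE o.id = ?
        ORDER BY pic.id
    """
    return [title for (title,) in conn.execute(sql, (owner_id,))]

print(picture_titles(conn, "people", 1))  # ['portrait', 'holiday']
print(picture_titles(conn, "car", 1))     # ['side view']
```

Picture needs only one foreign key, and neither owner column is ever left empty, which is the trade-off this design buys over GenericForeignKey.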

Django REST framework flat, read-write serializer

In Django REST framework, what is involved in creating a flat, read-write serializer representation? The docs refer to a 'flat representation' (end of the section http://django-rest-framework.org/api-guide/serializers.html#dealing-with-nested-objects) but don't offer examples or anything beyond a suggestion to use a RelatedField subclass.
For instance, how to provide a flat representation of the User and UserProfile relationship, below?
# Model
class UserProfile(models.Model):
user = models.OneToOneField(User)
favourite_number = models.IntegerField()
# Serializer
class UserProfileSerializer(serializers.ModelSerializer):
email = serialisers.EmailField(source='user.email')
class Meta:
model = UserProfile
fields = ['id', 'favourite_number', 'email',]
The above UserProfileSerializer doesn't allow writing to the email field, but I hope it expresses the intention sufficiently well. So, how should a 'flat' read-write serializer be constructed to allow a writable email attribute on the UserProfileSerializer? Is it at all possible to do this when subclassing ModelSerializer?
Thanks.
Looking at the Django REST framework (DRF) source, I settled on the view that a DRF serializer is strongly tied to an accompanying Model for unserializing purposes. A Field's source param makes this less so for serializing purposes.
With that in mind, and viewing serializers as encapsulating validation and save behaviour (in addition to their (un)serializing behaviour) I used two serializers: one for each of the User and UserProfile models:
from rest_framework import serializers
class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        fields = ['email',]
class UserProfileSerializer(serializers.ModelSerializer):
    email = serializers.EmailField(source='user.email')
    class Meta:
        model = UserProfile
        fields = ['id', 'favourite_number', 'email',]
The source param on the EmailField handles the serialization case adequately (e.g. when servicing GET requests). For unserializing (e.g. when servicing PUT requests) it is necessary to do a little work in the view, combining the validation and save behaviour of the two serializers:
from rest_framework import generics, status
from rest_framework.response import Response
class UserProfileRetrieveUpdate(generics.GenericAPIView):
    def get(self, request, *args, **kwargs):
        # Only UserProfileSerializer is required to serialize data since
        # email is populated by the 'source' param on EmailField.
        serializer = UserProfileSerializer(
            instance=request.user.get_profile())
        return Response(serializer.data)
    def put(self, request, *args, **kwargs):
        # Both UserSerializer and UserProfileSerializer are required
        # in order to validate and save data on their associated models.
        user_profile_serializer = UserProfileSerializer(
            instance=request.user.get_profile(),
            data=request.DATA)
        user_serializer = UserSerializer(
            instance=request.user,
            data=request.DATA)
        if user_profile_serializer.is_valid() and user_serializer.is_valid():
            user_profile_serializer.save()
            user_serializer.save()
            return Response(
                user_profile_serializer.data, status=status.HTTP_200_OK)
        # Combine errors from both serializers.
        errors = dict()
        errors.update(user_profile_serializer.errors)
        errors.update(user_serializer.errors)
        return Response(errors, status=status.HTTP_400_BAD_REQUEST)
First: better handling of nested writes is on its way.
Second: the Serializer Relations docs say of both PrimaryKeyRelatedField and SlugRelatedField that "By default this field is read-write...", so if your email field is unique (is it?) you might be able to use SlugRelatedField and it would just work. I haven't tried this yet, though.
Third: instead, I've used a plain Field subclass that uses the source="*" technique to accept the whole object. From there I manually pull out the related field in to_native and return it; this is read-only. To write, I check request.DATA in post_save and update the related object there. This isn't automatic, but it works.
So, Fourth: Looking at what you've already got, my approach (above) amounts to marking your email field as read-only and then implementing post_save to check for an email value and perform the update accordingly.
Although this does not strictly answer the question - I think it will solve your need. The issue may be more in the split of two models to represent one entity than an issue with DRF.
Since Django 1.5 you can define a custom user model. If all you want is some extra methods and fields, and apart from that you are happy with the default Django user, then all you need to do is:
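from django.contrib.auth.models import AbstractUser
from django.db import models
class MyUser(AbstractUser):
    # AbstractUser keeps the default user fields; AbstractBaseUser would drop them
    favourite_number = models.IntegerField()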
and in settings: AUTH_USER_MODEL = 'myapp.myuser'
(And of course a database migration, which could be made quite simple by using the db_table option to point to your existing user table and just adding the new columns there.)
After that, you have the common case which DRF excels at.

Django aggregate query

I have a model Page, which can have Posts on it. What I want to do is get every Page, plus the most recent Post on that page. If the Page has no Posts, I still want the page. (Sound familiar? This is a LEFT JOIN in SQL).
Here is what I currently have:
Page.objects.annotate(most_recent_post=Max('post__post_time'))
This only gets Pages, but it doesn't get Posts. How can I get the Posts as well?
Models:
class Page(models.Model):
    name = models.CharField(max_length=50)
    created = models.DateTimeField(auto_now_add=True)
    enabled = models.BooleanField(default=True)
class Post(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    page = models.ForeignKey(Page, on_delete=models.CASCADE)
    post_time = models.DateTimeField(auto_now_add=True)
Depending on the relationship between the two, you should be able to follow the relationships quite easily, and improve performance by using select_related.
Taking this:
class Page(models.Model):
    ...
class Post(models.Model):
    page = ForeignKey(Page, ...)
You can follow the forward relationship (i.e. get all the posts and their associated pages) efficiently using select_related:
Post.objects.select_related('page').all()
This results in a single (larger) query in which each post's page is fetched through a JOIN.
In the reverse situation (like yours), where you want to get all pages and their associated posts, select_related won't work. Several related questions discuss what you can do instead.
Probably your best bet is to use the techniques described in the django docs here: Following Links Backward.
After you do:
from django.db.models import Max
pages = Page.objects.annotate(most_recent_post=Max('post__post_time'))
posts = [page.post_set.filter(post_time=page.most_recent_post).first() for page in pages]
Then posts[0] is the most recent post for pages[0] (or None if that page has no posts), and so on. This runs one extra query per page, so it is probably not the most efficient solution, but it was the one suggested in another post about the lack of left joins in Django.
You can create a database view that contains all Page columns alongside the necessary columns of the latest Post:
CREATE VIEW `testapp_pagewithrecentpost` AS
SELECT testapp_page.*, testapp_post.id AS post_id, testapp_post.post_time -- I suggest as few post columns as possible here
FROM `testapp_page` LEFT JOIN `testapp_post`
ON testapp_page.id = testapp_post.page_id
AND testapp_post.post_time =
( SELECT MAX(latest.post_time)
FROM testapp_post AS latest WHERE latest.page_id = testapp_page.id );
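To check that the LEFT JOIN keeps pages without posts, here is the same idea run against an in-memory SQLite database with Python's sqlite3. The columns are cut down to the few needed and the table names follow the answer's `testapp_` prefix; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE testapp_page (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE testapp_post (id INTEGER PRIMARY KEY, post_time TEXT,
                           page_id INTEGER REFERENCES testapp_page(id));
INSERT INTO testapp_page VALUES (1, 'news'), (2, 'empty');
INSERT INTO testapp_post VALUES (1, '2024-01-01 09:00', 1), (2, '2024-01-02 09:00', 1);

-- Page columns plus the columns of each page's latest post (NULLs if none).
CREATE VIEW testapp_pagewithrecentpost AS
SELECT testapp_page.id, testapp_page.name,
       testapp_post.id AS post_id, testapp_post.post_time
FROM testapp_page
LEFT JOIN testapp_post
  ON testapp_page.id = testapp_post.page_id
 AND testapp_post.post_time = (
     SELECT MAX(latest.post_time)
     FROM testapp_post AS latest
     WHERE latest.page_id = testapp_page.id
 );
""")

rows = conn.execute(
    "SELECT id, name, post_id, post_time FROM testapp_pagewithrecentpost ORDER BY id"
).fetchall()
print(rows)
# [(1, 'news', 2, '2024-01-02 09:00'), (2, 'empty', None, None)]
```

If two posts on the same page share the maximum post_time, the view returns that page twice, so you may want a tie-breaker such as the post id.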
Then you need to create a model with managed = False in its Meta (so that manage.py syncdb won't try to create a table with the view's name). You can also inherit from an abstract model to avoid duplicating column definitions:
class PageWithRecentPost(models.Model):  # Or extend abstract BasePost?
    # Page columns go here
    # Post columns go here
    # We use LEFT JOIN, so all columns from the
    # 'post' model will need blank=True, null=True
    class Meta:
        managed = False  # Django will not handle creation/reset automatically
With that in place you can do what you originally wanted: fetch from both tables in a single query:
pages_with_recent_post = PageWithRecentPost.objects.filter(...)
for page in pages_with_recent_post:
    print(page.name)  # Page column
    print(page.post_time)  # Post column
However this approach is not drawback free:
It's very DB engine-specific
You'll need to add VIEW creation SQL to your project
If your models are complex it's very likely that you'll need to resolve table column name clashes.
Model based on a database view will very likely be read-only (INSERT/UPDATE will fail).
It adds complexity to your project. Allowing for multiple queries is definitely a simpler solution.
Changes in Page/Post will require re-creating the view.