How to use annotate to group by in Django? - sql

Suppose I have a Django model...
class Book(models.Model):
name = models.CharField(max_length=50)
author = models.CharField(max_length=50)
publisher = models.CharField(max_length=50)
and I want to know how many publishers each author has used? I've tried something like...
Book.objects.value('author').annotate(Count('publisher'))
but it doesn't seem to be working. What is the correct Django query?
Or alternatively how do I convert this sql query into Django?
SELECT author, COUNT(publisher) FROM Book GROUP BY author;

Related

Django SQL query count

I have the following models in Django:
class Author(models.Model):
name = models.CharField(max_length=120)
country = models.CharField(max_length=100)
class Book(models.Model):
title = models.CharField(max_length=1024)
publisher = models.CharField(max_length=255)
published_date = models.DateField()
author = models.ForeignKey(Author)
There are 9 records in the Author table and 4 in the Book table.
How many SQL queries would be issued when Book.objects.select_related().all() is evaluated?
My guess was 4, because there are 4 rows in the Book table, so 1 query each to search for all the authors related to each book. Why is my answer wrong?
The possible choices are 5, 4, 10 and 1.
select_related(*fields)
Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.

Django - join between different models without foreign key

Imagine I have two simple models (it's not really what I have but this will do):
Class Person(models.Model):
person_id = models.TextField()
name = models.TextField()
#...some other fields
Class Pet(models.Model):
person_id = models.TextField()
pet_name = models.TextField()
species = models.TextField()
#...even more fields
Here's the key difference between this example and some other questions I read about: my models don't enforce a foreign key, so I can't use select_related()
I need to create a view that shows a join between two querysets in each one. So, let's imagine I want a view with all owners named John with a dog.
# a first filter
person_query = Person.objects.filter(name__startswith="John")
# a second filter
pet_query = Pet.objects.filter(species="Dog")
# the sum of the two
magic_join_that_i_cant_find_and_possibly_doesnt_exist = join(person_query.person_id, pet_query.person_id)
Now, can I join those two very very simple querysets with any function?
Or should I use raw?
SELECT p.person_id, p.name, a.pet_name, a.species
FROM person p
LEFT JOIN pet a ON
p.person_id = a.person_id AND
a.species = 'Dog' AND
p.name LIKE 'John%'
Is this query ok? Damn, I'm not sure anymore... that's my issue with queries. Everything is all at once. But consecutive queries seem so simple...
If I reference in my model class a "foreign key" (for select_related() use), will it be enforced in the database after the migration? (I need that it DOESN'T happen)
Make a models.ForeignKey but use db_constraint=False.
See https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.ForeignKey.db_constraint
Also, if this model is managed=False, ie it is a legacy db table and you're not using Django migrations, the constraint won't ever be made in the first place and it's fine.
If you create a FK in the model, Django will create a constraint on migration, so you want to avoid that in your case.
I don't think there is a way to join in the database in Django if you don't declare the field to join as a foreign key. The only thing you can do is to do the join in Python, which might or might not be OK. Think that prefetch_related does precisely this.
The code would be something like:
person_query = Person.objects.filter(name__startswith="John")
person_ids = [person.id for person in person_query]
pet_query = Pet.objects.filter(species="Dog", person_id__in=person_ids).order_by('person_id')
pets_by_person_id = {person_id: pet_group for person_id, pet_group in itertools.groupby(pet_query, lambda pet: pet.person_id)}
# Now everytime you need the pets for a certain person
pets_by_person_id(person.id)
# You can also set it in all objects for easy retrieval
for person in person_query:
person.pets = pets_by_person_id(person.id)
The code might not be 100% accurate, but you get the idea I hope.

Django ORM - understanding foreign key queries

I'm in a process of optimizing my queries. Assume I have these models:
class Author(models.Model):
name = models.CharField(max_length=20)
class Book(models.Model):
name = models.CharField(max_length=20)
author = models.ForeignKey(Author)
A simple task here would be to get all the books of a given author, assume I have the author-ID.
In standard SQL I would only need to query the books table.
But In django code I do:
# given authorID
author = Author.objects.get(pk=authorID)
books = Book.objects.filter(author=author)
Which would take two queries. How can I avoid the first query ?
Try something like:
Book.objects.filter(author_id=authorID)
This will return all the books where author's foreign key is authorID.

Prevent slow INNER JOIN in Django model calls when unnecessary

Is it possible to prevent Django from using INNER JOIN in SQL relationship queries when unnecessary?
I have the two tables:
class Author(models.Model):
name = models.CharField(max_length=50, primary_key=True, db_index=True)
hometown = models.CharField(max_length=50)
class Book(models.Model):
title = models.CharField(max_length=50, primary_key=True, db_index=True)
author = models.ForeignKey(Author, db_index=True)
The author table has more than 50 million rows, which makes requests such getting all the books of one author, Book.objects.filter(author_id='John Smith'), incredibly slow (about 20 sec). However, when I use raw SQL to achieve the same result, the query is almost instant: SELECT * FROM books WHERE author_id='John Smith';.
Using result.query I have found that Django is slower because it runs a INNER JOIN query on the entire table:
SELECT books.title, books.author_id FROM books INNER JOIN authors
ON (books.author_id = authors.name) WHERE books.author_id = 'John Smith';
Is there a way to make Django avoid the INNER JOIN in cases such as this when it isn't necessary?
I would like to avoid using raw SQL queries if at all possible as this database structure is highly simplified.
The solution turned out to be removing a class Meta option:
class Book(models.Model):
(...)
class Meta:
ordering = ['author', 'title']

How do you access parent and children relationships through the ORM with a conditional on the parent where record child exist?

I am trying to create a query through the Django ORM which is a straight join. I am trying to extract only records from parent table that have an entry in the child table, In addition, I would like to add a conditional on the parent table.
Here are the sample model:
class Reporter(models.Model):
first_name = models.CharField(max_length=64)
last_name = models.CharField(max_length=64)
class Article(models.Model):
pub_date = models.DateField()
headline = models.CharField(max_length=200)
content = models.TextField()
reporter = models.ForeignKey(Reporter)
The SQL would look as follows:
Select * from Reporter
JOIN Article ON Article.reporter_id = Reporter.id
where Reporter.last_name="Jones"
How do I construct the query above using the Django ORM?
This will do an inner join and return reporters:
Reporter.objects.filter(last_name='Jones', article__isnull=False)
(it will also add a harmless article.id IS NOT NULL to the WHERE)