I'm in the process of optimizing my queries. Assume I have these models:
class Author(models.Model):
    name = models.CharField(max_length=20)

class Book(models.Model):
    name = models.CharField(max_length=20)
    author = models.ForeignKey(Author)
A simple task here would be to get all the books of a given author, assuming I have the author ID.
In standard SQL I would only need to query the books table.
But in Django code I do:
# given authorID
author = Author.objects.get(pk=authorID)
books = Book.objects.filter(author=author)
Which would take two queries. How can I avoid the first query?
Try something like:
Book.objects.filter(author_id=authorID)
This will return all the books whose author foreign key equals authorID, without loading the Author object first.
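To see why one query is enough, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for the database. The table and column names (author, book, author_id) and the sample rows are assumptions for illustration, and the SELECT is only roughly what the ORM would generate for the filter above.

```python
import sqlite3

# In-memory tables standing in for the Author and Book models (names are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")

author_id = 1
# Roughly the SQL behind Book.objects.filter(author_id=author_id):
# the author table is never read.
books = conn.execute(
    "SELECT name FROM book WHERE author_id = ? ORDER BY id", (author_id,)
).fetchall()
book_names = [row[0] for row in books]
print(book_names)
```

The foreign key value already lives in the book table, so the lookup never needs to touch the author table.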
Related
I have the following models in Django:
class Author(models.Model):
    name = models.CharField(max_length=120)
    country = models.CharField(max_length=100)

class Book(models.Model):
    title = models.CharField(max_length=1024)
    publisher = models.CharField(max_length=255)
    published_date = models.DateField()
    author = models.ForeignKey(Author)
There are 9 records in the Author table and 4 in the Book table.
How many SQL queries would be issued when Book.objects.select_related().all() is evaluated?
My guess was 4, because there are 4 rows in the Book table, so 1 query each to search for all the authors related to each book. Why is my answer wrong?
The possible choices are 5, 4, 10 and 1.
select_related(*fields)
Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.
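The difference in query counts can be sketched with sqlite3 standing in for the database. This is not what Django literally executes; the table names, the sample rows, and the run helper that records each statement are assumptions for illustration. The number of authors does not affect the count, so only three are inserted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER NOT NULL REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann', 'US'), (2, 'Bob', 'UK'), (3, 'Cy', 'FR');
    INSERT INTO book VALUES (1, 'T1', 1), (2, 'T2', 1), (3, 'T3', 2), (4, 'T4', 3);
""")

executed = []  # every SELECT we send, so the queries can be counted

def run(sql, params=()):
    """Hypothetical helper: record the statement, then execute it."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()

# Without select_related(): one query for the books, then one more per book
# when its author is first accessed.
for _book_id, _title, author_id in run("SELECT id, title, author_id FROM book"):
    run("SELECT id, name, country FROM author WHERE id = ?", (author_id,))
lazy_query_count = len(executed)

# With select_related(): a single JOIN fetches books and authors together.
executed.clear()
rows = run("""
    SELECT book.id, book.title, author.id, author.name, author.country
    FROM book INNER JOIN author ON book.author_id = author.id
""")
joined_query_count = len(executed)
print(lazy_query_count, joined_query_count, len(rows))
```

With 4 books, the lazy pattern issues 5 statements while the joined form issues 1, which matches the quoted description of select_related as a single, more complex query.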
Imagine I have two simple models (it's not really what I have but this will do):
class Person(models.Model):
    person_id = models.TextField()
    name = models.TextField()
    #...some other fields

class Pet(models.Model):
    person_id = models.TextField()
    pet_name = models.TextField()
    species = models.TextField()
    #...even more fields
Here's the key difference between this example and some other questions I read about: my models don't enforce a foreign key, so I can't use select_related().
I need to create a view that shows a join between two querysets. So, let's imagine I want a view with all owners named John who have a dog.
# a first filter
person_query = Person.objects.filter(name__startswith="John")
# a second filter
pet_query = Pet.objects.filter(species="Dog")
# the sum of the two
magic_join_that_i_cant_find_and_possibly_doesnt_exist = join(person_query.person_id, pet_query.person_id)
Now, can I join those two very very simple querysets with any function?
Or should I use raw?
SELECT p.person_id, p.name, a.pet_name, a.species
FROM person p
LEFT JOIN pet a ON
p.person_id = a.person_id AND
a.species = 'Dog' AND
p.name LIKE 'John%'
Is this query ok? Damn, I'm not sure anymore... that's my issue with queries. Everything is all at once. But consecutive queries seem so simple...
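One way to check the query is to run it against a few rows. The sketch below uses sqlite3 with made-up sample data (the rows and the INNER JOIN variant are assumptions, not part of the original question). It compares the LEFT JOIN above, where every condition sits in the ON clause, with an INNER JOIN that moves the filters into WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id TEXT, name TEXT);
    CREATE TABLE pet (person_id TEXT, pet_name TEXT, species TEXT);
    INSERT INTO person VALUES ('p1', 'John Doe'), ('p2', 'Johnny Roe'), ('p3', 'Mary Poe');
    INSERT INTO pet VALUES ('p1', 'Rex', 'Dog'), ('p2', 'Tom', 'Cat'), ('p3', 'Fido', 'Dog');
""")

# LEFT JOIN with every condition in ON: all persons come back,
# with NULL pet columns wherever the ON clause did not match.
left_rows = conn.execute("""
    SELECT p.person_id, p.name, a.pet_name, a.species
    FROM person p
    LEFT JOIN pet a ON p.person_id = a.person_id
        AND a.species = 'Dog' AND p.name LIKE 'John%'
    ORDER BY p.person_id
""").fetchall()

# INNER JOIN with the filters in WHERE: only Johns who own a dog.
inner_rows = conn.execute("""
    SELECT p.person_id, p.name, a.pet_name, a.species
    FROM person p
    INNER JOIN pet a ON p.person_id = a.person_id
    WHERE a.species = 'Dog' AND p.name LIKE 'John%'
    ORDER BY p.person_id
""").fetchall()
print(left_rows)
print(inner_rows)
```

In this sample the LEFT JOIN keeps every person, including Mary, so it does not express "owners named John with a dog"; the INNER JOIN form does.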
If I reference a "foreign key" in my model class (for select_related() use), will it be enforced in the database after the migration? (I need that NOT to happen.)
Make a models.ForeignKey but use db_constraint=False.
See https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.ForeignKey.db_constraint
Also, if this model is managed=False, i.e. it is a legacy db table and you're not using Django migrations, the constraint won't ever be made in the first place and it's fine.
If you create a FK in the model, Django will create a constraint on migration, so you want to avoid that in your case.
I don't think there is a way to join in the database in Django if you don't declare the field to join as a foreign key. The only thing you can do is to do the join in Python, which might or might not be OK. Note that prefetch_related does precisely this.
The code would be something like:
import itertools

person_query = Person.objects.filter(name__startswith="John")
# Pet.person_id matches Person.person_id, not the automatic id
person_ids = [person.person_id for person in person_query]
pet_query = Pet.objects.filter(species="Dog", person_id__in=person_ids).order_by('person_id')
# groupby yields one-shot iterators, so store each group as a list
pets_by_person_id = {
    person_id: list(pet_group)
    for person_id, pet_group in itertools.groupby(pet_query, lambda pet: pet.person_id)
}
# Now every time you need the pets for a certain person
pets_by_person_id.get(person.person_id, [])
# You can also set it on all objects for easy retrieval
for person in person_query:
    person.pets = pets_by_person_id.get(person.person_id, [])
The code might not be 100% accurate, but you get the idea I hope.
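As a variant, the grouping step can be done with collections.defaultdict, which does not need the pets sorted the way itertools.groupby does. The sketch below is self-contained: the dataclasses and sample rows are hypothetical stand-ins for the model instances the two querysets would return.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Person:
    person_id: str
    name: str
    pets: list = field(default_factory=list)

@dataclass
class Pet:
    person_id: str
    pet_name: str
    species: str

# Stand-ins for the results of the two filtered querysets.
people = [Person("p1", "John Doe"), Person("p2", "Johnny Roe")]
dogs = [Pet("p1", "Rex", "Dog"), Pet("p1", "Ace", "Dog")]

# Group pets by owner key; no sorting needed, unlike itertools.groupby.
pets_by_person_id = defaultdict(list)
for pet in dogs:
    pets_by_person_id[pet.person_id].append(pet)

# Attach each person's pets; people without a match get an empty list.
for person in people:
    person.pets = pets_by_person_id.get(person.person_id, [])

owners_with_dogs = [p.name for p in people if p.pets]
print(owners_with_dogs)
```

Filtering on a non-empty pets list afterwards gives the inner-join behaviour (only Johns with a dog), while keeping everyone corresponds to a left join.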
Is it possible to prevent Django from using INNER JOIN in SQL relationship queries when unnecessary?
I have the two tables:
class Author(models.Model):
    name = models.CharField(max_length=50, primary_key=True, db_index=True)
    hometown = models.CharField(max_length=50)

class Book(models.Model):
    title = models.CharField(max_length=50, primary_key=True, db_index=True)
    author = models.ForeignKey(Author, db_index=True)
The author table has more than 50 million rows, which makes requests such as getting all the books of one author, Book.objects.filter(author_id='John Smith'), incredibly slow (about 20 sec). However, when I use raw SQL to achieve the same result, the query is almost instant: SELECT * FROM books WHERE author_id='John Smith';.
Using result.query I have found that Django is slower because it runs an INNER JOIN query on the entire table:
SELECT books.title, books.author_id FROM books INNER JOIN authors
ON (books.author_id = authors.name) WHERE books.author_id = 'John Smith';
Is there a way to make Django avoid the INNER JOIN in cases such as this when it isn't necessary?
I would like to avoid using raw SQL queries if at all possible as this database structure is highly simplified.
The solution turned out to be removing a class Meta option. Ordering by the author relation appears to be what made Django join the authors table:
class Book(models.Model):
    (...)
    class Meta:
        ordering = ['author', 'title']
Suppose I have a Django model...
class Book(models.Model):
    name = models.CharField(max_length=50)
    author = models.CharField(max_length=50)
    publisher = models.CharField(max_length=50)
and I want to know how many publishers each author has used. I've tried something like...
Book.objects.value('author').annotate(Count('publisher'))
but it doesn't seem to be working. What is the correct Django query?
Or alternatively how do I convert this sql query into Django?
SELECT author, COUNT(publisher) FROM Book GROUP BY author;
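A minimal sketch of the grouping, using sqlite3 with made-up rows (the table name and data are assumptions). On the ORM side the queryset method is values, not value, so the matching call would be along the lines of Book.objects.values('author').annotate(n=Count('publisher')), with Count imported from django.db.models; the annotation name n is arbitrary. Since COUNT(publisher) counts books rather than distinct publishers, the sketch also shows the DISTINCT form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (name TEXT, author TEXT, publisher TEXT);
    INSERT INTO book VALUES
        ('B1', 'Ann', 'Acme'), ('B2', 'Ann', 'Acme'),
        ('B3', 'Ann', 'Zenith'), ('B4', 'Bob', 'Acme');
""")

# ORM counterpart (note values, not value):
#   Book.objects.values('author').annotate(n=Count('publisher'))
per_book = conn.execute(
    "SELECT author, COUNT(publisher) FROM book GROUP BY author ORDER BY author"
).fetchall()

# ORM counterpart: Count('publisher', distinct=True)
distinct = conn.execute(
    "SELECT author, COUNT(DISTINCT publisher) FROM book GROUP BY author ORDER BY author"
).fetchall()
print(per_book, distinct)
```

If the goal is the number of different publishers per author, the distinct form is probably the one wanted.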
I am trying to create a query through the Django ORM which is a straight join. I am trying to extract only records from the parent table that have an entry in the child table. In addition, I would like to add a condition on the parent table.
Here are the sample models:
class Reporter(models.Model):
    first_name = models.CharField(max_length=64)
    last_name = models.CharField(max_length=64)

class Article(models.Model):
    pub_date = models.DateField()
    headline = models.CharField(max_length=200)
    content = models.TextField()
    reporter = models.ForeignKey(Reporter)
The SQL would look as follows:
Select * from Reporter
JOIN Article ON Article.reporter_id = Reporter.id
where Reporter.last_name="Jones"
How do I construct the query above using the Django ORM?
This will do an inner join and return reporters:
Reporter.objects.filter(last_name='Jones', article__isnull=False)
(It will also add a harmless article.id IS NOT NULL condition to the WHERE clause.)
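One thing to watch with a join like this is that a reporter comes back once per matching article. The sketch below reproduces that with sqlite3 and made-up rows (table names and data are assumptions; the SQL only approximates what the ORM emits). On the ORM side, chaining .distinct() onto the filter is the usual way to collapse the duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reporter (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, headline TEXT,
                          reporter_id INTEGER NOT NULL REFERENCES reporter(id));
    INSERT INTO reporter VALUES (1, 'Al', 'Jones'), (2, 'Bea', 'Jones'), (3, 'Cal', 'Smith');
    INSERT INTO article VALUES (1, 'H1', 1), (2, 'H2', 1), (3, 'H3', 3);
""")

base = """
    SELECT {distinct} reporter.id, reporter.first_name
    FROM reporter
    INNER JOIN article ON article.reporter_id = reporter.id
    WHERE reporter.last_name = 'Jones' AND article.id IS NOT NULL
    ORDER BY reporter.id
"""
# Plain join: one row per (reporter, article) pair.
joined = conn.execute(base.format(distinct="")).fetchall()
# With DISTINCT: each matching reporter appears once.
deduped = conn.execute(base.format(distinct="DISTINCT")).fetchall()
print(joined)
print(deduped)
```

Here Al Jones has two articles and shows up twice in the plain join, while Bea Jones has none and is excluded either way.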