Is it possible to prevent Django from using INNER JOIN in SQL relationship queries when unnecessary?
I have the two tables:
class Author(models.Model):
    name = models.CharField(max_length=50, primary_key=True, db_index=True)
    hometown = models.CharField(max_length=50)

class Book(models.Model):
    title = models.CharField(max_length=50, primary_key=True, db_index=True)
    author = models.ForeignKey(Author, db_index=True)
The author table has more than 50 million rows, which makes requests such as getting all the books of one author, Book.objects.filter(author_id='John Smith'), incredibly slow (about 20 sec). However, when I use raw SQL to achieve the same result, the query is almost instant: SELECT * FROM books WHERE author_id='John Smith';.
Using result.query I have found that Django is slower because it runs an INNER JOIN query on the entire table:
SELECT books.title, books.author_id FROM books INNER JOIN authors
ON (books.author_id = authors.name) WHERE books.author_id = 'John Smith';
Is there a way to make Django avoid the INNER JOIN in cases such as this when it isn't necessary?
I would like to avoid using raw SQL queries if at all possible as this database structure is highly simplified.
The solution turned out to be removing a class Meta option:
class Book(models.Model):
    (...)
    class Meta:
        ordering = ['author', 'title']
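A likely explanation, as far as I understand Django's behaviour: ordering by the relation name 'author' sorts by the related Author model (its default ordering, or its primary key), and that needs the join, whereas ordering by the local column 'author_id' does not. Below is a minimal SQL-level sketch of that difference using Python's sqlite3 module. The table layout mirrors the question, but the sample rows and the index name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (name TEXT PRIMARY KEY, hometown TEXT);
    CREATE TABLE books (title TEXT PRIMARY KEY,
                        author_id TEXT REFERENCES authors(name));
    CREATE INDEX books_author_idx ON books(author_id);
    INSERT INTO authors VALUES ('John Smith', 'Leeds'), ('Ann Lee', 'York');
    INSERT INTO books VALUES ('B2', 'John Smith'), ('A1', 'Ann Lee'), ('B1', 'John Smith');
""")

# Ordering through the related table: the JOIN is required.
joined = conn.execute("""
    SELECT books.title, books.author_id
    FROM books INNER JOIN authors ON (books.author_id = authors.name)
    WHERE books.author_id = 'John Smith'
    ORDER BY authors.name, books.title
""").fetchall()

# Ordering by the local foreign-key column: one table is enough.
single = conn.execute("""
    SELECT title, author_id FROM books
    WHERE author_id = 'John Smith'
    ORDER BY author_id, title
""").fetchall()

# The foreign key stores the author's primary key, so both orderings agree.
print(joined == single)
```

In Django terms this would correspond to ordering = ['author_id', 'title'] in Meta, or to clearing the default ordering on a single queryset with an empty .order_by() call.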
Imagine I have two simple models (it's not really what I have but this will do):
class Person(models.Model):
    person_id = models.TextField()
    name = models.TextField()
    #...some other fields

class Pet(models.Model):
    person_id = models.TextField()
    pet_name = models.TextField()
    species = models.TextField()
    #...even more fields
Here's the key difference between this example and some other questions I read about: my models don't enforce a foreign key, so I can't use select_related()
I need to create a view that shows a join between two querysets, one on each model. So, let's imagine I want a view with all owners named John with a dog.
# a first filter
person_query = Person.objects.filter(name__startswith="John")
# a second filter
pet_query = Pet.objects.filter(species="Dog")
# the sum of the two
magic_join_that_i_cant_find_and_possibly_doesnt_exist = join(person_query.person_id, pet_query.person_id)
Now, can I join those two very very simple querysets with any function?
Or should I use raw?
SELECT p.person_id, p.name, a.pet_name, a.species
FROM person p
LEFT JOIN pet a ON
p.person_id = a.person_id AND
a.species = 'Dog' AND
p.name LIKE 'John%'
Is this query ok? Damn, I'm not sure anymore... that's my issue with queries. Everything is all at once. But consecutive queries seem so simple...
If I reference a "foreign key" in my model class (so I can use select_related()), will it be enforced in the database after the migration? (I need that NOT to happen.)
Make a models.ForeignKey but use db_constraint=False.
See https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.ForeignKey.db_constraint
Also, if this model is managed=False, i.e. it is a legacy db table and you're not using Django migrations, the constraint won't ever be made in the first place and it's fine.
If you create a FK in the model, Django will create a constraint on migration, so you want to avoid that in your case.
I don't think there is a way to join in the database in Django if you don't declare the field to join as a foreign key. The only thing you can do is to do the join in Python, which might or might not be OK. Note that prefetch_related does precisely this.
The code would be something like:
import itertools

person_query = Person.objects.filter(name__startswith="John")
person_ids = [person.person_id for person in person_query]
pet_query = Pet.objects.filter(species="Dog", person_id__in=person_ids).order_by('person_id')
# groupby yields lazy groups, so materialize each one as a list
pets_by_person_id = {
    person_id: list(pet_group)
    for person_id, pet_group in itertools.groupby(pet_query, lambda pet: pet.person_id)
}
# Now every time you need the pets for a certain person
pets_by_person_id.get(person.person_id, [])
# You can also set it on all objects for easy retrieval
for person in person_query:
    person.pets = pets_by_person_id.get(person.person_id, [])
The code might not be 100% accurate, but you get the idea I hope.
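On the raw SQL in the question: with a LEFT JOIN, conditions placed in the ON clause only decide which pets get attached. They do not remove any person rows. The sketch below shows this with Python's sqlite3 module. The table and column names follow the question, while the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id TEXT, name TEXT);
    CREATE TABLE pet (person_id TEXT, pet_name TEXT, species TEXT);
    INSERT INTO person VALUES ('1', 'John Doe'), ('2', 'Mary Poe'), ('3', 'John Roe');
    INSERT INTO pet VALUES ('1', 'Rex', 'Dog'), ('2', 'Fido', 'Dog'), ('3', 'Tom', 'Cat');
""")

# Filters inside the ON clause of a LEFT JOIN: every person is kept,
# and non-matching persons get NULL pet columns.
left_rows = conn.execute("""
    SELECT p.person_id, p.name, a.pet_name, a.species
    FROM person p
    LEFT JOIN pet a ON p.person_id = a.person_id
                   AND a.species = 'Dog'
                   AND p.name LIKE 'John%'
    ORDER BY p.person_id
""").fetchall()

# INNER JOIN with the filters in WHERE: only Johns who own a dog.
inner_rows = conn.execute("""
    SELECT p.person_id, p.name, a.pet_name, a.species
    FROM person p
    INNER JOIN pet a ON p.person_id = a.person_id
    WHERE a.species = 'Dog' AND p.name LIKE 'John%'
    ORDER BY p.person_id
""").fetchall()

for row in left_rows:
    print("left :", row)
for row in inner_rows:
    print("inner:", row)
```

So if the goal is "all owners named John with a dog", the INNER JOIN form with the filters in WHERE is probably what you want.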
I'm in a process of optimizing my queries. Assume I have these models:
class Author(models.Model):
    name = models.CharField(max_length=20)

class Book(models.Model):
    name = models.CharField(max_length=20)
    author = models.ForeignKey(Author)
A simple task here would be to get all the books of a given author, assume I have the author-ID.
In standard SQL I would only need to query the books table.
But in Django code I do:
# given authorID
author = Author.objects.get(pk=authorID)
books = Book.objects.filter(author=author)
This takes two queries. How can I avoid the first one?
Try something like:
Book.objects.filter(author_id=authorID)
This will return all the books where author's foreign key is authorID.
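To make the saving concrete, here is a rough SQL-level sketch using Python's sqlite3 module. It is not Django code: the table names, sample rows, and the run() helper are hypothetical stand-ins, and it only counts the statements each approach sends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'First', 1), (2, 'Second', 1), (3, 'Third', 2);
""")

queries = []

def run(sql, params=()):
    """Execute one statement and record it, so round trips can be counted."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()

author_id = 1

# Two-step version: fetch the author row first, then its books.
author = run("SELECT id, name FROM author WHERE id = ?", (author_id,))[0]
books_two_step = run("SELECT name FROM book WHERE author_id = ? ORDER BY id", (author[0],))
two_step_count = len(queries)

# One-step version: filter on the foreign-key column directly.
queries.clear()
books_one_step = run("SELECT name FROM book WHERE author_id = ? ORDER BY id", (author_id,))
one_step_count = len(queries)

print(two_step_count, one_step_count, books_two_step == books_one_step)
```

Both versions return the same books. The second simply skips the lookup of a row whose only useful value, the id, was already known.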
I want to display many-to-many field in report.
Currently my model is as follows:
from openerp.osv import orm, fields

class myClass(orm.Model):
    _name = 'my.Class'
    _columns = {
        'teacher_id': fields.many2many('fci.staff', 'lgna_teacher', 'ids_lol',
                                       'teacher_ids', 'Observers'),
    }
And I want to display them using SQL select statement.
In my example below, I'm considering that teachers and courses have a many-to-many relationship: teachers can teach multiple courses and courses can be taught by multiple teachers.
from openerp.osv import orm, fields

class Teachers(orm.Model):
    _name = 'teachers'
    name = fields.Char()

class Course(orm.Model):
    _name = 'course'
    title = fields.Char()
    teacher_ids = fields.Many2many('teachers', 'teacher_course_rel', 'course_id',
                                   'teacher_id', string='Teachers')
Using SQL (either using Odoo API or your DBMS), you can query the junction table (or cross-reference table) teacher_course_rel to retrieve the needed columns from each table.
For instance, the query below retrieves the names of all the teachers teaching the Physics course:
SELECT c.title, t.name
FROM teachers AS t
INNER JOIN teacher_course_rel AS tcr
    ON t.id = tcr.teacher_id
INNER JOIN course AS c
    ON tcr.course_id = c.id
WHERE c.title = 'Physics'
Please note that I have used an SQL INNER JOIN which returns rows from the two tables only when the conditions are met (i.e. the two INNER JOIN conditions and the WHERE condition). For your purposes, you may wish to use a different type of join depending on the information you wish to retrieve from both your tables.
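As a small illustration of how the join type changes the result, here is a sketch using Python's sqlite3 module rather than an Odoo database. The table names are assumed to mirror the models in this answer, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE teacher_course_rel (course_id INTEGER, teacher_id INTEGER);
    INSERT INTO teachers VALUES (1, 'Curie'), (2, 'Noether'), (3, 'Turing');
    INSERT INTO course VALUES (1, 'Physics'), (2, 'Algebra'), (3, 'Logic');
    INSERT INTO teacher_course_rel VALUES (1, 1), (1, 2), (2, 2);
""")

# INNER JOIN: only teacher/course pairs present in the junction table.
physics_teachers = conn.execute("""
    SELECT c.title, t.name
    FROM teachers AS t
    INNER JOIN teacher_course_rel AS tcr ON t.id = tcr.teacher_id
    INNER JOIN course AS c ON tcr.course_id = c.id
    WHERE c.title = 'Physics'
    ORDER BY t.name
""").fetchall()

# LEFT JOIN starting from course: courses without any teacher still appear.
all_courses = conn.execute("""
    SELECT c.title, t.name
    FROM course AS c
    LEFT JOIN teacher_course_rel AS tcr ON tcr.course_id = c.id
    LEFT JOIN teachers AS t ON t.id = tcr.teacher_id
    ORDER BY c.title, t.name
""").fetchall()

print(physics_teachers)
print(all_courses)
```

The Logic course has no row in the junction table, so it only shows up in the LEFT JOIN result, with an empty teacher name.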
I am having problems understanding how to make complex queries (or even simple ones) using Django models. I am looking to do an inner join, group by, and count in one statement using django models.
Example:
Select ab.userid, count(ab.userid) as bids, u.username
from auctionbids ab
inner join users u on ab.userid=u.id
group by ab.userid
order by bids desc;
This type of query is very common and straightforward, so I have to imagine it can be done with Django models, but it is not apparent from the documentation.
edit: added models
class Users(models.Model):
    id = models.IntegerField(primary_key=True)
    username = models.CharField(max_length=150)

class Auctionbids(models.Model):
    id = models.IntegerField(primary_key=True)
    user = models.ForeignKey(Users)
If you post your models.py file I can probably give you a more precise answer, but I think what you want is the Django Aggregation API.
You would use it something like this:
from django.db.models import Count
Users.objects.annotate(bids=Count('auctionbids')).order_by('-bids')
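For reference, a count annotation like this corresponds roughly to a grouped join in SQL. The sketch below runs that shape of query with Python's sqlite3 module. The table and column names are assumptions based on the models in the question, the rows are invented, and the exact SQL Django emits may differ by version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE auctionbids (id INTEGER PRIMARY KEY,
                              user_id INTEGER REFERENCES users(id));
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO auctionbids (user_id) VALUES (1), (2), (2), (2), (1);
""")

# One row per user with the number of bids, most active bidders first.
# The outer join keeps users who have never bid (count 0).
bid_counts = conn.execute("""
    SELECT u.id, u.username, COUNT(ab.id) AS bids
    FROM users u
    LEFT OUTER JOIN auctionbids ab ON ab.user_id = u.id
    GROUP BY u.id, u.username
    ORDER BY bids DESC
""").fetchall()

print(bid_counts)
```

If users without bids should be left out, as in the INNER JOIN from the question, filtering on the count afterwards (bids greater than zero) would be one way to do it.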
I am writing a Django application that has a model for People, and I have hit a snag. I am assigning Role objects to people using a Many-To-Many relationship - where Roles have a name and a weight. I wish to order my list of people by their heaviest role's weight. If I do People.objects.order_by('-roles__weight'), then I get duplicates when people have multiple roles assigned to them.
My initial idea was to add a denormalized field called heaviest-role-weight - and sort by that. This could then be updated every time a new role was added or removed from a user. However, it turns out that there is no way to perform a custom action every time a ManyToManyField is updated in Django (yet, anyway).
So, I thought I could then go completely overboard and write a custom field, descriptor and manager to handle this - but that seems extremely difficult when the ManyRelatedManager is created dynamically for a ManyToManyField.
I have been trying to come up with some clever SQL that could do this for me - I'm sure it's possible with a subquery (or a few), but I'd be worried about it not being compatible with all the database backends Django supports.
Has anyone done this before - or have any ideas how it could be achieved?
Django 1.1 (currently beta) adds aggregation support. Your query can be done with something like:
from django.db.models import Max
People.objects.annotate(max_weight=Max('roles__weight')).order_by('-max_weight')
This sorts people by their heaviest roles, without returning duplicates.
The generated query is:
SELECT people.id, people.name, MAX(role.weight) AS max_weight
FROM people LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
GROUP BY people.id, people.name
ORDER BY max_weight DESC
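To see why the aggregate removes the duplicates, here is a small sketch that runs both forms with Python's sqlite3 module. The table names follow the generated query above, while the role names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT, weight INTEGER);
    CREATE TABLE people_roles (people_id INTEGER, role_id INTEGER);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO role VALUES (1, 'admin', 30), (2, 'editor', 20), (3, 'viewer', 10);
    INSERT INTO people_roles VALUES (1, 2), (1, 3), (2, 1), (2, 3);
""")

# Plain ordering on the joined column: one row per (person, role) pair,
# so people with several roles appear several times.
duplicated = conn.execute("""
    SELECT people.name
    FROM people
    LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
    LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
    ORDER BY role.weight DESC
""").fetchall()

# Aggregated version: GROUP BY collapses each person to a single row.
# Note: where rows with a NULL max_weight sort is backend-dependent;
# SQLite puts them last for DESC.
ranked = conn.execute("""
    SELECT people.name, MAX(role.weight) AS max_weight
    FROM people
    LEFT OUTER JOIN people_roles ON (people.id = people_roles.people_id)
    LEFT OUTER JOIN role ON (people_roles.role_id = role.id)
    GROUP BY people.id, people.name
    ORDER BY max_weight DESC
""").fetchall()

print(len(duplicated), ranked)
```

People with no roles at all (Cy in this sample) are kept by the outer joins and get an empty max_weight.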
Here's a way to do it without an annotation:
class Role(models.Model):
    pass

class PersonRole(models.Model):
    weight = models.IntegerField()
    person = models.ForeignKey('Person')
    role = models.ForeignKey(Role)

    class Meta:
        # if you have an inline configured in the admin, this will
        # make the roles order properly
        ordering = ['weight']

class Person(models.Model):
    roles = models.ManyToManyField('Role', through='PersonRole')

    def ordered_roles(self):
        "Return a properly ordered set of roles"
        return self.roles.all().order_by('personrole__weight')
This lets you say something like:
>>> person = Person.objects.get(id=1)
>>> roles = person.ordered_roles()
Something like this in SQL:
select p.*, max (r.Weight) as HeaviestWeight
from persons p
inner join RolePersons rp on p.id = rp.PersonID
inner join Roles r on rp.RoleID = r.id
group by p.*
order by HeaviestWeight desc
Note: group by p.* may be disallowed by your dialect of SQL. If so, just list all the columns in table p that you intend to use in the select clause.
Note: if you just group by p.ID, you won't be able to call for the other columns in p in your select clause.
I don't know how this interacts with Django.