Flattened list from manytomany - sql

What is the nicest (quickest) way to create a full outer join on 2 models related by a ManyToManyField?
Take a book and an author, for example (normally you would use a ForeignKey for this), but suppose a book can have 2 authors (to match my case):
from django.db import models
# Book is defined first so that Author can reference it directly
class Book(models.Model):
    title = models.CharField(max_length=200)  # max_length is required; 200 is arbitrary
class Author(models.Model):
    books = models.ManyToManyField(Book, related_name='book_author')
And now I want to create a list like this (preferably a queryset):
author1, book1
author1, book2
author2, book1
author2, book3
author3, book4
Probably because it's Friday, but I need a bit of help with this...
I want to expose the flat result through an API (DRF), so it would be nice to get a queryset of this join.

You are trying to access the auto-generated through model between Author and Book (Author_books). You should be able to get that result like this:
>>> Author.books.through.objects.select_related('author', 'book')
<QuerySet [<Author_books: Author_books object>, ...]>
To get the primary keys only, you can use values_list:
>>> Author.books.through.objects.values_list('book', 'author')
<QuerySet [(1, 1), (1, 2), (1, 3)]>
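A note on the shape of the result: despite the "full outer join" wording, the list in the question is an inner join through the junction table, so authors without books and books without authors do not appear. The sketch below uses Python's built-in sqlite3 to show that join directly. The table names book, author and author_books are hypothetical stand-ins for the tables Django generates (which are prefixed with the app label by default), and the rows are the placeholder names from the question.

```python
import sqlite3

# Hypothetical schema mirroring what Django creates for Author.books:
# one table per model plus a junction table with one row per (author, book) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE author_books (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        book_id INTEGER REFERENCES book(id)
    );
    INSERT INTO book VALUES (1, 'book1'), (2, 'book2'), (3, 'book3'), (4, 'book4');
    INSERT INTO author VALUES (1, 'author1'), (2, 'author2'), (3, 'author3');
    INSERT INTO author_books (author_id, book_id) VALUES
        (1, 1), (1, 2), (2, 1), (2, 3), (3, 4);
""")

# Each junction row becomes one output row; the inner joins pull in the names.
# This is roughly the join that select_related('author', 'book') performs.
rows = conn.execute("""
    SELECT author.name, book.title
    FROM author_books
    JOIN author ON author.id = author_books.author_id
    JOIN book ON book.id = author_books.book_id
    ORDER BY author.name, book.title
""").fetchall()

for name, title in rows:
    print(name, title)
```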

Related

Select Related With Multiple Conditions

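Using the Django ORM, is it possible to perform a select_related (left join) with conditions in addition to the default table1.id = table2.fk?
Using these example models:
from django.db import models
class Author(models.Model):
    name = models.TextField()
    age = models.IntegerField()
class Book(models.Model):
    title = models.TextField()
    # implied by Book.author_id in the SQL below; on_delete chosen for illustration
    author = models.ForeignKey(Author, on_delete=models.CASCADE)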
and the raw SQL:
SELECT "Book".*, "Author"."name"
FROM "Book"
LEFT JOIN "Author"
    ON "Author"."id" = "Book"."author_id"
    AND "Author"."age" > 18;  -- this condition is what I'd like to express via the ORM
I understand that in this simple example you could filter after the join, but that hasn't worked in my specific case, as I am doing sums across multiple left joins that require filters.
# gets all books whose author is older than 18
# (select_related fetches each author in the same query instead of one query per book)
books = Book.objects.select_related('author').filter(author__age__gt=18)
This returns a QuerySet.
Then you can loop through the queryset to access specific values and print them:
for b in books:
    print(b.title, b.author.name, b.author.age)
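The answer above moves the condition into WHERE, which is not the same as putting it in the ON clause once a LEFT JOIN is involved, and the difference matters for sums over several left joins. A minimal sqlite3 sketch with hypothetical data shows it: with the condition in ON, every book survives and the author columns are NULL; with it in WHERE, the book disappears. On the ORM side, Django 2.0 added FilteredRelation, used together with annotate(), for adding extra conditions to the ON clause of a join; it has documented restrictions, so whether it covers a given case is worth checking in the Django docs.

```python
import sqlite3

# Hypothetical data: one adult author and one author under 18.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Adult', 40), (2, 'Teen', 15);
    INSERT INTO book VALUES (1, 'Grown-up book', 1), (2, 'Teen book', 2);
""")

# Condition in the ON clause: every book is kept; the author columns are
# NULL when the joined author fails the condition.
on_clause = conn.execute("""
    SELECT book.title, author.name
    FROM book
    LEFT JOIN author ON author.id = book.author_id AND author.age > 18
    ORDER BY book.id
""").fetchall()

# Same condition in WHERE: books whose author fails it are dropped,
# matching the rows Book.objects.filter(author__age__gt=18) returns.
where_clause = conn.execute("""
    SELECT book.title, author.name
    FROM book
    LEFT JOIN author ON author.id = book.author_id
    WHERE author.age > 18
    ORDER BY book.id
""").fetchall()

print(on_clause)     # [('Grown-up book', 'Adult'), ('Teen book', None)]
print(where_clause)  # [('Grown-up book', 'Adult')]
```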

Django ORM: Get first instance for each foreignkey

I have the following models:
from django.db import models
class Author(models.Model):
    name = models.CharField(max_length=100)
class Book(models.Model):
    title = models.CharField(max_length=100)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    number = models.IntegerField()
Some Authors might have no Book. Is there a way to get a list or set containing exactly one Book per Author who wrote at least one Book? I'm looking for a solution in a single SQL query.
For example, if I have the following entries:
Authors:
Albert Camus
Friedrich Nietzsche
Sigmund Freud
Books:
Thus Spoke Zarathustra by Nietzsche
The Myth of Sisyphus by Camus
The Rebel by Camus
I want a query which returns [Thus Spoke Zarathustra, The Myth of Sisyphus], or [Thus Spoke Zarathustra, The Rebel].
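Bonus points if the query returns the books with the lowest number.
You should be able to achieve this using a Subquery, for example:
from django.db.models import OuterRef, Subquery
# books by the author of the outer query, lowest number first
books = Book.objects.filter(author_id=OuterRef('pk')).order_by('number')
authors = Author.objects.annotate(
    # [:1] limits the subquery to one row; without it, databases such as
    # PostgreSQL raise an error when an author has several books
    book_title=Subquery(books.values('title')[:1])
).filter(book_title__isnull=False)  # skip authors without any book
for author in authors:
    print(author.name, author.book_title)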
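For reference, the Subquery annotation boils down to a correlated scalar subquery with LIMIT 1. The sqlite3 sketch below runs that SQL shape on the example authors and books; the table layout and the number values are assumptions made for illustration, not Django's generated SQL verbatim.

```python
import sqlite3

# Hypothetical tables matching the Author/Book models; the 'number'
# values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id),
        number INTEGER
    );
    INSERT INTO author VALUES
        (1, 'Albert Camus'), (2, 'Friedrich Nietzsche'), (3, 'Sigmund Freud');
    INSERT INTO book VALUES
        (1, 'Thus Spoke Zarathustra', 2, 1),
        (2, 'The Myth of Sisyphus', 1, 1),
        (3, 'The Rebel', 1, 2);
""")

# A correlated scalar subquery: for each author row, pick the title of
# that author's book with the lowest number (at most one row).
rows = conn.execute("""
    SELECT author.name,
           (SELECT b.title FROM book b
            WHERE b.author_id = author.id
            ORDER BY b.number
            LIMIT 1) AS book_title
    FROM author
    ORDER BY author.id
""").fetchall()

# Authors without books get NULL (None); drop them to match the question.
with_books = [(name, title) for name, title in rows if title is not None]
for name, title in with_books:
    print(name, title)
```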
Alternatively, you can use a raw query, something like this:
query = '''
SELECT b1.id, b1.title, b1.author_id, b1.number
FROM Books_book b1, (
    SELECT author_id, MIN(number) AS min_number
    FROM Books_book
    GROUP BY author_id
) AS b2
WHERE b1.author_id = b2.author_id AND b1.number = b2.min_number
'''
books_list = list(Book.objects.raw(query))
Now books_list contains one book for each author (the one with the lowest number), as required. If several books by the same author share the lowest number, all of them are returned.
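The raw query can be tried outside Django with sqlite3. In this sketch the books_book table and its number values are hypothetical; the second run creates a tie on the lowest number for Camus to show that both tied books come back, unlike the LIMIT 1 subquery approach.

```python
import sqlite3

# Hypothetical books_book table; the 'number' values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books_book (
        id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER, number INTEGER
    );
    INSERT INTO books_book VALUES
        (1, 'Thus Spoke Zarathustra', 2, 1),
        (2, 'The Myth of Sisyphus', 1, 1),
        (3, 'The Rebel', 1, 2);
""")

# Same shape as the raw query: join each book to its author's minimum number.
MIN_PER_AUTHOR = """
    SELECT b1.id, b1.title, b1.author_id, b1.number
    FROM books_book b1, (
        SELECT author_id, MIN(number) AS min_number
        FROM books_book
        GROUP BY author_id
    ) AS b2
    WHERE b1.author_id = b2.author_id AND b1.number = b2.min_number
    ORDER BY b1.id
"""

titles = [row[1] for row in conn.execute(MIN_PER_AUTHOR)]
print(titles)

# With a tie on the lowest number, every tied book is returned.
conn.execute("UPDATE books_book SET number = 1 WHERE id = 3")
tied_titles = [row[1] for row in conn.execute(MIN_PER_AUTHOR)]
print(tied_titles)
```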

Django ALL in query (as opposed to OR / plain IN clause)

Say I have 2 models joined via many-to-many:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=200, null=False, blank=False)
    sports = models.ManyToManyField('Sport')
class Sport(models.Model):
    name = models.CharField(max_length=200, null=False, blank=False)
    # no second ManyToManyField here: a 'people' field would create a separate,
    # unrelated join table; Person.sports already provides sport.person_set
I'd like to perform an AND query to filter Person by those who play ALL sports in a given list of sport ids. So something like:
Person.objects.filter(sports__id__all=[1,2,3])
Or, said differently, exclude anyone that doesn't play ALL the sports.
Filtering (retaining People that play all given Sports)
The solution is not trivial. If you can compute the length of the list, however, you can count the overlap between the list of sports and the sports a Person plays:
from django.db.models import Count
sports_list = [1, 2, 3]
Person.objects.filter(
    sports__in=sports_list
).annotate(
    overlap=Count('sports')
).filter(overlap=len(sports_list))
Because the .filter(..) comes before the .annotate(..), Count('sports') only counts the sports that matched the filter. So if the number of a Person's sports that are in sports_list equals the number of elements in sports_list, we know that the person plays all of those sports.
Non-unique Sports in the sports_list
Note that sports_list should contain unique Sport objects (or ids); with duplicates, len(sports_list) is larger than any possible overlap. You can, however, build a set of sports first, for example:
# in case a sport can occur multiple times in the list
from django.db.models import Count
sports_set = set([1, 2, 3, 2, 3, 3])
Person.objects.filter(
    sports__in=sports_set
).annotate(
    overlap=Count('sports')
).filter(overlap=len(sports_set))
SQL query
Behind the scenes, this constructs a query like:
SELECT `person`.*
FROM `person`
INNER JOIN `person_sport` ON `person`.`id` = `person_sport`.`person_id`
WHERE `person_sport`.`sport_id` IN (1, 2, 3)
GROUP BY `person`.`id`
HAVING COUNT(`person_sport`.`sport_id`) = 3
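To see the GROUP BY / HAVING logic at work, here is a sqlite3 sketch of the query above with hypothetical people: Ann plays sports 1 to 4, Bob plays 1 and 2, and Cy plays nothing. Only Ann reaches an overlap of 3, even though she also plays sport 4, because the WHERE clause drops the non-matching rows before counting. The ids are passed as query parameters rather than formatted into the SQL.

```python
import sqlite3

# Hypothetical data; person_sport plays the role of the M2M junction table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_sport (person_id INTEGER, sport_id INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Ann plays 1, 2, 3, 4; Bob plays 1, 2; Cy plays nothing
    INSERT INTO person_sport VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2);
""")

sports_set = {1, 2, 3}
placeholders = ", ".join("?" for _ in sports_set)

# Keep only matching rows, then require one row per requested sport.
rows = conn.execute(
    f"""
    SELECT person.name
    FROM person
    INNER JOIN person_sport ON person.id = person_sport.person_id
    WHERE person_sport.sport_id IN ({placeholders})
    GROUP BY person.id
    HAVING COUNT(person_sport.sport_id) = ?
    """,
    [*sports_set, len(sports_set)],
).fetchall()

print(rows)  # [('Ann',)]
```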
Excluding (retaining People that do not play all given Sports)
A related problem is the opposite: excluding the persons that play all the specified sports. This can be done as well, but filtering first causes a problem: people who play no sport at all, or only sports outside the list, would disappear from the result too, since the first .filter(..) removes them. Counting only the matching sports inside the aggregate, using the filter argument of Count (Django 2.0+), avoids the initial .filter(..), so these people are kept:
# opposite problem: excluding those people
from django.db.models import Q, Count
sports_set = set([1, 2, 3, 2, 3, 3])
Person.objects.annotate(
    # counts only the sports in sports_set; people without matches get 0
    overlap=Count('sports', filter=Q(sports__in=sports_set))
).exclude(overlap=len(sports_set))
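The exclusion can also be written without a preliminary filter by counting only the matching sports: people who play no sport, or only sports outside the list, then get a count of 0 and stay in the result. The sqlite3 sketch below (hypothetical data; Di only plays sport 4) expresses that conditional count with SUM(CASE ...) over a LEFT JOIN, which is the same idea as a filtered aggregate such as Count(..., filter=Q(...)) in Django 2.0+.

```python
import sqlite3

# Hypothetical data covering every case the exclusion has to handle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_sport (person_id INTEGER, sport_id INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Di');
    -- Ann: 1, 2, 3; Bob: 1, 2; Cy: none; Di: only 4
    INSERT INTO person_sport VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (4, 4);
""")

sports_set = {1, 2, 3}
placeholders = ", ".join("?" for _ in sports_set)

# LEFT JOIN keeps people without sports (sport_id is NULL for them);
# the CASE counts a row only when its sport is in the requested set.
rows = conn.execute(
    f"""
    SELECT person.name
    FROM person
    LEFT JOIN person_sport ON person.id = person_sport.person_id
    GROUP BY person.id
    HAVING SUM(CASE WHEN person_sport.sport_id IN ({placeholders})
                    THEN 1 ELSE 0 END) <> ?
    ORDER BY person.id
    """,
    [*sports_set, len(sports_set)],
).fetchall()

print(rows)  # [('Bob',), ('Cy',), ('Di',)]
```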

SQL join both ways to one result

I have two tables "TestItem" and "Connector" where Connector is used for relating two items in "TestItem".
I have two questions, in order of priority. But first, feel free to suggest alternative approaches; I'm open to suggestions that completely rethink my approach to what I want to achieve here.
Question 1) How to get relations both ways returned in the same result
Question 2) How to filter the most efficient way for specific items
Q1)
Two tables
Table: "TestItem"
ID, ITEM
1, "John Doe"
2, "Peggy Sue"
3, "Papa Sue"
Table: "Connector"
MOTHER, CHILD
1,2
The connector table will be used for several purposes (see below), but this is a distilled scenario for the equal type of connection, for instance marriage. If "John Doe" is married to "Peggy Sue", that information should also be sufficient to return "Peggy Sue" as married to "John Doe".
I can do this in two queries, but for efficiency (especially regarding my question 2) I'd like this done in one query, so that an implementation does not depend on which way the connection is defined.
What is the most efficient way to do this?
Here is the two-query approach, to illustrate how the data can be fetched and how each query misses the connection in one direction or the other.
-- Connector through the "mother" part
SELECT ITEM, SUBITEM FROM TestItem
INNER JOIN (
SELECT MOTHER, ITEM AS SUBITEM
FROM Connector
INNER JOIN TestItem ON Connector.CHILD = TestItem.ID
) AS SUB ON TestItem.ID = SUB.MOTHER
/* WHERE ITEM = "John Doe" return "Peggy Sue" => Correct
WHERE ITEM = "Peggy Sue" return nothing => Wrong
*/
-- Connector through the "child" part
SELECT ITEM, SUBITEM FROM TestItem
INNER JOIN (
SELECT CHILD, ITEM AS SUBITEM
FROM Connector
INNER JOIN TestItem ON Connector.MOTHER= TestItem.ID
) AS SUB ON TestItem.ID = SUB.CHILD
/* WHERE ITEM = "John Doe" return nothing => Wrong
WHERE ITEM = "Peggy Sue" return "John Doe" => Correct
*/
Q2) Having both directions returned in one result may increase the amount of data involved and hence hurt performance. If my focus is Peggy Sue, I assume filtering down to the relevant data as early as possible will improve performance. Is there a neat way of doing this at the top level, or will every subquery require an added WHERE?
PS: Some more information about the bigger picture.
I'm planning to use the connector table for several purposes: both the equal type mentioned above, like colleagues, family, friends, etc., and hierarchical connection types like mother/child, leader/employee, country/city.
Thus, solutions that eliminate the mother/child type of connection may not suit my bigger purpose.
Basically, I'm asking how to handle the equal type of connection without losing the ability to use the same architecture and data for hierarchical connections.
Through the same dataset, Peggy Sue may be defined as the daughter of Papa Sue via the relation:
Mother, Child, Mother_type, Child_type
3, 2, Father, Daughter
1, 2, Married to, Married to
(But, as mentioned, this is beside what I'm asking here.)
UNION ALL might be what you are looking for:
select mother.id as connectedToId,
mother.item as connectedToItem,
'Mother' as role
from TestItem ti
join Connector c on c.child = ti.id
join TestItem mother on c.mother = mother.id
where ti.item = 'John Doe'
union all
select child.id as connectedToId,
child.item as connectedToItem,
'Child' as role
from TestItem ti
join Connector c on c.mother = ti.id
join TestItem child on c.child = child.id
where ti.item = 'John Doe'
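A runnable sqlite3 sketch of this UNION ALL, using the TestItem and Connector rows from the question, shows the connection being found from both ends. It also touches on Q2: the WHERE sits inside each branch, so each branch is filtered before the union, and a named parameter (:item, a choice made for this sketch) binds the same value in both branches. Whether a database would push a single outer WHERE down into the branches on its own depends on its optimizer.

```python
import sqlite3

# The two tables and rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestItem (ID INTEGER PRIMARY KEY, ITEM TEXT);
    CREATE TABLE Connector (MOTHER INTEGER, CHILD INTEGER);
    INSERT INTO TestItem VALUES (1, 'John Doe'), (2, 'Peggy Sue'), (3, 'Papa Sue');
    INSERT INTO Connector VALUES (1, 2);
""")

# First branch: the item is the CHILD, so report its mother.
# Second branch: the item is the MOTHER, so report its child.
# The filter is applied inside each branch, before the UNION ALL.
BOTH_WAYS = """
    SELECT mother.id AS connectedToId, mother.item AS connectedToItem, 'Mother' AS role
    FROM TestItem ti
    JOIN Connector c ON c.child = ti.id
    JOIN TestItem mother ON c.mother = mother.id
    WHERE ti.item = :item
    UNION ALL
    SELECT child.id, child.item, 'Child'
    FROM TestItem ti
    JOIN Connector c ON c.mother = ti.id
    JOIN TestItem child ON c.child = child.id
    WHERE ti.item = :item
"""


def connections(item):
    """Return (id, item, role) rows connected to `item` in either direction."""
    return conn.execute(BOTH_WAYS, {"item": item}).fetchall()


print(connections("John Doe"))   # [(2, 'Peggy Sue', 'Child')]
print(connections("Peggy Sue"))  # [(1, 'John Doe', 'Mother')]
```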

Storing constants in tuple vs table

What is the best practice for deciding when to store constant values in a table vs. a tuple? For example, is it better to do this:
class ModelA(models.Model):
    SOME_VALUES = (
        (0, 'A1'),
        (1, 'A2'),
        (2, 'A3'),
    )
    fieldA = models.IntegerField(choices=SOME_VALUES)
or have another model for the constant values:
class ConstantValue(models.Model):
    text = models.CharField(max_length=2)
class ModelA(models.Model):
    # a ForeignKey lets many ModelA rows share one constant; a OneToOneField would not
    fieldA = models.ForeignKey(ConstantValue, on_delete=models.PROTECT)
and a fixture would populate ConstantValue.
I have a combination of the above in my code, but I'd like some consistency.
And what about many-to-many relationships? I have a model with constants as above and another model that points to the constant model with a ManyToManyField relationship. It's similar to the Django tutorial's pizza example. But I suppose I could have this:
class Topping(models.Model):
    TOPPINGS = (
        (0, 'Tomato'),
        (1, 'Peppers'),
        ...
    )
    topping = models.IntegerField(choices=TOPPINGS)
class Pizza(models.Model):
    topping = models.ForeignKey(Topping, on_delete=models.CASCADE)
I have a very basic answer:
Whenever your domain is likely to change, use another (domain) table (or another class), e.g. for pizza names. If you don't see a chance of change, as with gender (m/f/don't know), keep it short: although the gender of an entry may change, the range of values won't (or most likely won't).
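To make the tuple option concrete: a choices tuple is just (stored value, label) pairs, so it converts directly into a lookup dict, which is roughly what Django's get_fieldA_display() provides for a field declared with choices. The trade-off described above follows from this: the labels live in code, so extending the domain means a code change (and Django records choices in migrations), whereas a table gains new values as rows.

```python
# The tuple form from the question: (stored value, human-readable label).
SOME_VALUES = (
    (0, 'A1'),
    (1, 'A2'),
    (2, 'A3'),
)

# dict() turns the pairs into a value -> label lookup.
labels = dict(SOME_VALUES)
print(labels[1])  # A2
```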