Django ALL in query (as opposed to OR / plain IN clause) - sql

Say I have 2 models joined via many-to-many:
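from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=200, null=False, blank=False)
    sports = models.ManyToManyField('Sport')
class Sport(models.Model):
    name = models.CharField(max_length=200, null=False, blank=False)
    # note: Person.sports already provides the reverse accessor sport.person_set;
    # this second field creates a separate, independent join table
    people = models.ManyToManyField('Person')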
I'd like to perform an AND query to filter Person by those who play ALL sports given a list of sport ids. So something like:
Person.objects.filter(sports__id__all=[1,2,3])  # hypothetical: Django has no __all lookup
Or, said differently, exclude anyone that doesn't play ALL the sports.

Filtering (retaining People that play all given Sports)
The solution is not trivial. If, however, you can calculate the length of the list, it can be done by counting the overlap between the list of sports and the sports a Person plays:
from django.db.models import Count
sports_list = [1, 2, 3]
Person.objects.filter(
    sports__in=sports_list
).annotate(
    overlap=Count('sports')
).filter(overlap=len(sports_list))
So if the number of a Person's sports that appear in sports_list equals the number of elements in sports_list, we know that person plays all of those sports.
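The counting argument can be checked in plain Python: a person plays every requested sport exactly when the number of their sports that appear in the (duplicate-free) list equals the list's length. This is a minimal sketch; the `plays` dictionary and the `plays_all` helper are hypothetical stand-ins for the Person/Sport relation and the query.

```python
# Hypothetical sample data: person name -> set of sport ids they play
plays = {
    "alice": {1, 2, 3, 4},
    "bob": {1, 2},
    "carol": set(),
}


def plays_all(sports, wanted):
    """Overlap-count test: count the person's sports that are wanted, compare to len(wanted)."""
    overlap = sum(1 for s in sports if s in wanted)
    return overlap == len(wanted)


wanted = {1, 2, 3}
matching = sorted(name for name, sports in plays.items() if plays_all(sports, wanted))
print(matching)  # ['alice']
```

The test is equivalent to asking whether `wanted` is a subset of the person's sports, which is what "plays ALL sports" means.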
Non-unique Sports in the sports_list
Note that sports_list should contain unique Sport objects (or ids): with duplicates, len(sports_list) is larger than any overlap a Person can reach. You can, however, build a set of sports first, for example:
# in case a sport can occur multiple times in the list
from django.db.models import Count
sports_set = set([1, 2, 3, 2, 3, 3])
Person.objects.filter(
    sports__in=sports_set
).annotate(
    overlap=Count('sports')
).filter(overlap=len(sports_set))
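Why duplicates break the comparison can be seen with a few numbers: each of a person's sports is counted at most once (one join row per sport), so the overlap can never reach the length of a list with repeats. A small sketch with hypothetical values:

```python
sports_list = [1, 2, 3, 2, 3, 3]  # contains duplicates
person_sports = {1, 2, 3}  # hypothetical: this person plays all three sports

# Each of the person's sports is counted once, like one join row per sport
overlap = sum(1 for s in person_sports if s in sports_list)
print(overlap, len(sports_list), len(set(sports_list)))  # 3 6 3
```

Comparing against `len(sports_list)` (6) would wrongly reject this person; comparing against the size of the set (3) accepts them.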
SQL query
Behind the scenes, Django constructs a query like:
SELECT `person`.*
FROM `person`
INNER JOIN `person_sport` ON `person`.`id` = `person_sport`.`person_id`
WHERE `person_sport`.`sport_id` IN (1, 2, 3)
GROUP BY `person`.`id`
HAVING COUNT(`person_sport`.`sport_id`) = 3
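The GROUP BY / HAVING query above can be tried out against an in-memory SQLite database. This is a sketch under assumptions: the table names follow the simplified `person` / `person_sport` names in the answer (Django's real names include the app label), and the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person_sport (
        person_id INTEGER NOT NULL REFERENCES person(id),
        sport_id INTEGER NOT NULL,
        UNIQUE (person_id, sport_id)
    );
    INSERT INTO person VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    -- alice plays 1, 2, 3, 4; bob plays 1, 2; carol plays nothing
    INSERT INTO person_sport VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2);
""")

sports = sorted({1, 2, 3})
placeholders = ", ".join("?" for _ in sports)
rows = conn.execute(
    f"""
    SELECT person.id, person.name
    FROM person
    INNER JOIN person_sport ON person.id = person_sport.person_id
    WHERE person_sport.sport_id IN ({placeholders})
    GROUP BY person.id
    HAVING COUNT(person_sport.sport_id) = ?
    """,
    [*sports, len(sports)],
).fetchall()
print(rows)  # [(1, 'alice')]
```

The WHERE clause keeps only the join rows for requested sports, so the per-person count is exactly the overlap; alice keeps 3 rows and qualifies, bob keeps 2, carol has none.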
Excluding (retaining People that do not play all given Sports)
A related problem is the opposite one: excluding the persons that play all the specified sports. Simply swapping the last .filter(..) for .exclude(..) causes a problem: people that play no sport at all, or only sports outside the set, are removed as well, since the first .filter(..) drops them before the count is compared. We can, however, count only the matching sports inside the aggregate (the filter= argument of Count, available since Django 2.0), so these people are included as well:
# opposite problem: excluding those people
from django.db.models import Q, Count
sports_set = set([1, 2, 3, 2, 3, 3])
Person.objects.annotate(
    # count only the sports in sports_set; people without matching sports get 0
    overlap=Count('sports', filter=Q(sports__in=sports_set))
).exclude(overlap=len(sports_set))
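In SQL terms, an exclusion that keeps everyone needs two things: a LEFT OUTER JOIN, so people without any sport still produce a row, and a conditional count, so people who only play other sports get an overlap of 0 instead of being filtered away. A minimal SQLite sketch with hypothetical tables and rows; the CASE expression is one portable way to write a filtered aggregate, not necessarily the exact SQL Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person_sport (person_id INTEGER NOT NULL, sport_id INTEGER NOT NULL);
    INSERT INTO person VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
    -- alice: 1, 2, 3; bob: 1; carol: nothing; dave: only sport 5
    INSERT INTO person_sport VALUES (1, 1), (1, 2), (1, 3), (2, 1), (4, 5);
""")

sports = sorted({1, 2, 3, 2, 3, 3})
placeholders = ", ".join("?" for _ in sports)
rows = conn.execute(
    f"""
    SELECT name, overlap FROM (
        SELECT person.id AS id, person.name AS name,
               -- count only join rows whose sport is requested; NULL (no sport) counts 0
               SUM(CASE WHEN person_sport.sport_id IN ({placeholders}) THEN 1 ELSE 0 END) AS overlap
        FROM person
        LEFT OUTER JOIN person_sport ON person.id = person_sport.person_id
        GROUP BY person.id
    ) AS counted
    WHERE overlap <> ?
    ORDER BY id
    """,
    [*sports, len(sports)],
).fetchall()
print(rows)  # [('bob', 1), ('carol', 0), ('dave', 0)]
```

Only alice, who plays all three sports, is excluded; carol (no sports) and dave (only sport 5) stay in the result.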

Related

Return results from more than one database table in Django

Suppose I have 3 hypothetical models:
from django.db import models
class State(models.Model):
    name = models.CharField(max_length=20)
class Company(models.Model):
    name = models.CharField(max_length=60)
    # on_delete is required since Django 2.0; CASCADE is one possible choice
    state = models.ForeignKey(State, on_delete=models.CASCADE)
class Person(models.Model):
    name = models.CharField(max_length=60)
    state = models.ForeignKey(State, on_delete=models.CASCADE)
I want to be able to return results in a Django app, where the results, if using SQL directly, would be based on a query such as this:
SELECT a.name as 'personName',b.name as 'companyName', b.state as 'State'
FROM Person a, Company b
WHERE a.state=b.state
I have tried using the select_related() method as suggested here, but I don't think this is quite what I am after, since I am trying to join two tables that have a common foreign-key, but have no key-relationships amongst themselves.
Any suggestions?
Since a Person can have multiple Companys in the same state, it is not a good idea to do the JOIN at the database level: the database would (likely) return the same Company multiple times, making the output quite large.
We can prefetch the related companies with:
qs = Person.objects.select_related('state').prefetch_related('state__company_set')
Then we can query the Companys in the same state with:
for person in qs:
    print(person.state.company_set.all())
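Roughly speaking, prefetch_related runs one extra query per relation and does the matching in Python, instead of multiplying rows in a JOIN. The sketch below imitates that idea with SQLite: one query for the people, one `IN (...)` query for the companies of their states, then a dictionary lookup. Table names and rows are hypothetical, and it skips the state lookup that select_related('state') would fold into the person query.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, state_id INTEGER);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, state_id INTEGER);
    INSERT INTO state VALUES (1, 'Ohio'), (2, 'Utah');
    INSERT INTO company VALUES (1, 'Acme', 1), (2, 'Globex', 1), (3, 'Initech', 2);
    INSERT INTO person VALUES (1, 'Ann', 1), (2, 'Ben', 1), (3, 'Cid', 2);
""")

# Query 1: the people
people = conn.execute("SELECT id, name, state_id FROM person ORDER BY id").fetchall()

# Query 2: all companies in those states, fetched once with IN (...)
state_ids = sorted({state_id for _, _, state_id in people})
placeholders = ", ".join("?" for _ in state_ids)
companies_by_state = defaultdict(list)
for name, state_id in conn.execute(
    f"SELECT name, state_id FROM company WHERE state_id IN ({placeholders}) ORDER BY id",
    state_ids,
):
    companies_by_state[state_id].append(name)

# Match in Python: each state's company list is built once and shared
result = {name: companies_by_state[state_id] for _, name, state_id in people}
print(result)
```

Each company row is transferred once, no matter how many people live in its state.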
You can use a Prefetch-object [Django-doc] to store the prefetched companies in a list attribute, for example:
from django.db.models import Prefetch
qs = Person.objects.prefetch_related(
    Prefetch('state__company_set', Company.objects.all(), to_attr='same_state_companies')
)
Then you can print the companies with:
for person in qs:
    # to_attr is set on the last object of the lookup path, here the State
    print(person.state.same_state_companies)

Flattened list from manytomany

What is the nicest (quickest) way to create a full outer join on 2 models related by a many-to-many field?
For example, a book and an author (normally in this case you would use a foreign key), but suppose a book can have 2 authors (which matches my case).
For example:
class Author(models.Model):
    books = models.ManyToManyField('Book', related_name='book_author')
class Book(models.Model):
    title = models.CharField(max_length=200)  # max_length is required; 200 is arbitrary
And now I want to create a list like this (preferably a queryset):
author1, book1
author1, book2
author2, book1
author2, book3
author3, book4
Probably because it's Friday, but I need a bit of help with this...
I want to offer the flat result to an api (DRF), so would be nice to get a queryset of this join.
You are trying to access the auto-generated through model between Author and Book (Author_books). You should be able to get that result like this:
>>> Author.books.through.objects.select_related('book', 'author')
<QuerySet [<Author_books: Author_books object>, ...]>
To get the primary keys only, you can use values_list:
>>> Author.books.through.objects.values_list('book', 'author')
<QuerySet [(1, 1), (1, 2), (1, 3)]>
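The flat list is simply the join table read row by row, joined to the two model tables for readable values. A sketch in SQLite with hypothetical tables (`author_books` stands in for the auto-generated through table) and the sample pairs from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    -- hypothetical join table: one row per (author, book) pair
    CREATE TABLE author_books (author_id INTEGER, book_id INTEGER);
    INSERT INTO author VALUES (1, 'author1'), (2, 'author2'), (3, 'author3');
    INSERT INTO book VALUES (1, 'book1'), (2, 'book2'), (3, 'book3'), (4, 'book4');
    INSERT INTO author_books VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 4);
""")

pairs = conn.execute("""
    SELECT author.name, book.title
    FROM author_books
    JOIN author ON author.id = author_books.author_id
    JOIN book ON book.id = author_books.book_id
    ORDER BY author.id, book.id
""").fetchall()
print(pairs)
```

Note that this is an inner join over existing pairs: authors without books, or books without authors, do not appear, which differs from a true full outer join.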

Rails query for associated model with max column value

I have 3 models in Rails: Author, Book, and Page. Pages belong to books, and books belong to authors, like so:
class Author < ActiveRecord::Base
  has_many :books
end
class Book < ActiveRecord::Base
  belongs_to :author
  has_many :pages
end
class Page < ActiveRecord::Base
  belongs_to :book
end
the Page model has a column called page_number. I'm using Postgres.
My question is this: assuming I have an author @author, how do I query for all of that author's last pages? In other words, I want the last page of each book written by that author. I am trying the following, which isn't working:
Page.where(book_id: @author.books.pluck(:id)).select('MAX(page_number), *').group(:book_id)
EDIT
The following 2 lines work, but I would love to learn of a faster/cleaner solution:
all_pages = Page.where(book: @author.books)
last_pages = all_pages.select{ |a| !all_pages.select{ |b| b.book_id == a.book_id}.any?{ |c| c.page_number > a.page_number } }
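The nested select above compares every page with every other page of the same book, which is quadratic in the number of pages. If the pages are already loaded into memory, a single pass that keeps the best page seen so far per book gives the same result in linear time. A Python sketch of that idea, with hypothetical `(book_id, page_number)` tuples standing in for Page records:

```python
# Hypothetical page records: (book_id, page_number)
pages = [(1, 1), (1, 3), (1, 2), (2, 6), (2, 1), (2, 4)]

last_pages = {}
for page in pages:
    book_id, page_number = page
    best = last_pages.get(book_id)
    # keep the page with the highest page_number seen so far for this book
    if best is None or page_number > best[1]:
        last_pages[book_id] = page

print(sorted(last_pages.values()))  # [(1, 3), (2, 6)]
```

This still loads every page from the database, so the window-function query below is likely preferable for large books.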
The most efficient way might be to leverage Postgres window functions.
A query like this doesn't fit the common ActiveRecord use case, so you may have to use find_by_sql, but it may be well worth it.
In your case, grabbing the book ids first may be a good idea, as a join or an additional subquery may not be advantageous; that is your call.
Let's say you have a list of book ids from @author.books.ids. The next thing we want is a list of pages "grouped by" book so we can pluck off the last page for each group. Let 1, 2 be the book ids for the author in question.
We can use a window function and the rank function in postgres to create a resultset where pages are ranked over partitions (group) of book. We'll even sort those partitions of pages by the page number so that the maximum page number (last page) is at the top of each partition. The query would look like this:
select
*,
rank() over (
partition by book_id order by page_number desc
) as reverse_page_index
from pages
where book_id in (1,2)
Our imagined pages result set would look like this.
author 1, book 1, page 3, rank 1
author 1, book 1, page 2, rank 2
author 1, book 1, page 1, rank 3
author 1, book 2, page 6, rank 1
author 1, book 2, page 5, rank 2
author 1, book 2, page 4, rank 3
author 1, book 2, page 3, rank 4
author 1, book 2, page 2, rank 5
author 1, book 2, page 1, rank 6
The pages records are partitioned by book, sorted by page number descending and given a rank within their partition.
If we then want only the first ranked (last page) for each book after the window calculation is performed, we can use a sub-select like so:
select *
from
(
select
*,
rank() over (
partition by book_id order by page_number desc
) as reverse_page_index
from pages
where book_id in (1,2)
) as pages
where reverse_page_index = 1;
Here we filter the above imagined result set down to the page records whose rank (reverse_page_index) is 1 (i.e. the last page).
And now our result set would be:
author 1, book 1, page 3, rank 1
author 1, book 2, page 6, rank 1
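The same rank-over-partition query can be run against SQLite (version 3.25 or newer, which added window functions) to check the result. The `pages` table and its rows are hypothetical, chosen to match the imagined result set: book 1 has 3 pages, book 2 has 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, book_id INTEGER, page_number INTEGER)")
# Hypothetical data: book 1 has pages 1-3, book 2 has pages 1-6
conn.executemany(
    "INSERT INTO pages (book_id, page_number) VALUES (?, ?)",
    [(1, n) for n in range(1, 4)] + [(2, n) for n in range(1, 7)],
)

rows = conn.execute("""
    SELECT book_id, page_number
    FROM (
        SELECT *,
               rank() OVER (PARTITION BY book_id ORDER BY page_number DESC) AS reverse_page_index
        FROM pages
        WHERE book_id IN (1, 2)
    ) AS ranked
    WHERE reverse_page_index = 1
    ORDER BY book_id
""").fetchall()
print(rows)  # [(1, 3), (2, 6)]
```

If two pages of one book could share the same page_number, rank() would give both rank 1; row_number() would pick exactly one.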
You could also order this resultset by last modified or whatever you need.
Toss that query into find_by_sql and you'll have some ActiveRecord objects to use.

Query that finds objects with ALL association ids (Rails 4)

BACKGROUND: Posts have many Communities through CommunityPosts. I understand the following query returns posts associated with ANY ONE of these community_ids.
Post.joins(:communities).where(communities: { id: [1,2,3] })
OBJECTIVE: I'd like to query for posts associated with ALL THREE community_ids in the array, i.e. posts that have communities 1, 2, and 3 as associations.
EDIT: Please assume that the length of the array is unknown. I used this array for explanation purposes.
Try this:
ids = [...]
Post.joins(:communities)
    .where(communities: { id: ids })
    .group('posts.id')
    .having('COUNT(DISTINCT communities.id) = ?', ids.uniq.size)
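The GROUP BY / HAVING COUNT(DISTINCT ...) pattern can be checked in SQLite. Table names follow Rails conventions (`posts`, `community_posts`) but are assumptions, as are the rows; the `communities` table itself is skipped because only ids are compared. Post 1 deliberately has a duplicated link to show why DISTINCT matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE community_posts (post_id INTEGER, community_id INTEGER);
    INSERT INTO posts VALUES (1), (2), (3);
    -- post 1: communities 1, 2, 3 (3 linked twice); post 2: 1, 2; post 3: 1, 2, 3, 4
    INSERT INTO community_posts VALUES
        (1, 1), (1, 2), (1, 3), (1, 3), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3), (3, 4);
""")

ids = [1, 2, 3]
placeholders = ", ".join("?" for _ in ids)
rows = conn.execute(
    f"""
    SELECT posts.id
    FROM posts
    JOIN community_posts ON community_posts.post_id = posts.id
    WHERE community_posts.community_id IN ({placeholders})
    GROUP BY posts.id
    -- DISTINCT keeps duplicated links from inflating the count
    HAVING COUNT(DISTINCT community_posts.community_id) = ?
    ORDER BY posts.id
    """,
    [*ids, len(set(ids))],
).fetchall()
print(rows)  # [(1,), (3,)]
```

Post 3 qualifies even though it also belongs to community 4, because "ALL of these" does not mean "only these".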
ids = [1, 2, 3] # and so on
# note: this matches posts in ANY of the given communities, not all of them
Post.joins(:communities).where("communities.id IN (?)", ids)
Hope it helps.

Rails 3 HABTM query with array, need AND not OR/IN

Say I have two models with a HABTM relationship. Teacher and Student. Here is an example of what I currently have working:
student_ids = [1,2,3,4]
Teacher.joins(:students).where("students.id" => student_ids)
The problem is that this will return all Teacher objects with ANY of those student ids, but not require ALL of them:
SELECT `teachers`.* FROM `teachers` INNER JOIN `students_teachers` ON `students_teachers`.`teacher_id` = `teachers`.`id` INNER JOIN `students` ON `students`.`id` = `students_teachers`.`student_id` WHERE `students`.`id` IN (1, 2, 3, 4)
I have two cases, one of which is an OR condition, which the above handles fine since I just need to find Teachers with Student.id 1 OR 2 OR 3 OR 4. The other is AND, where I need to ensure that the Teachers being returned include ALL of the student_ids, so Teachers with Student.id 1 AND 2 AND 3 AND 4.
I would use includes. For example:
clients = Client.includes(:address).limit(10)
This is what would happen:
SELECT * FROM clients LIMIT 10
SELECT addresses.* FROM addresses
WHERE (addresses.client_id IN (1,2,3,4,5,6,7,8,9,10))
You can read more about it here:
http://guides.rubyonrails.org/active_record_querying.html
You can do something like:
teachers = nil # so it won't be used in the first pass
Student.includes(:teachers).where(id: student_ids).each do |student|
  # keep only the teachers shared with every student seen so far
  teachers = (teachers || student.teachers) & student.teachers
end
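The loop above is a running set intersection: after processing every student, only the teachers common to all of them remain. The same idea in Python, with a hypothetical `teachers_of` mapping standing in for the preloaded associations:

```python
from functools import reduce

# Hypothetical data: student id -> set of teacher ids
teachers_of = {
    1: {"t1", "t2"},
    2: {"t1", "t3"},
    3: {"t1", "t2"},
    4: {"t1"},
}

student_ids = [1, 2, 3, 4]
# Intersect the teacher sets: only teachers shared by every student remain
common = reduce(set.intersection, (teachers_of[s] for s in student_ids))
print(common)  # {'t1'}
```

This loads every matching student's teachers into memory, and reduce() without an initial value raises TypeError for an empty student_ids list; for large data, a GROUP BY / HAVING count in the database may scale better.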