Django ORM version of SQL COUNT(DISTINCT <column>)


I need to fill in a template with a summary of user activity in a simple messaging system. For each message sender, I want the number of messages sent and the number of distinct recipients.
Here's a simplified version of the model:
class Message(models.Model):
    sender = models.ForeignKey(User, related_name='messages_from')
    recipient = models.ForeignKey(User, related_name='messages_to')
    timestamp = models.DateTimeField(auto_now_add=True)
Here's how I'd do it in SQL:
SELECT sender_id, COUNT(id), COUNT(DISTINCT recipient_id)
FROM myapp_messages
GROUP BY sender_id;
I've been reading through the documentation on aggregation in ORM queries, and although annotate() can handle the first COUNT column, I don't see a way to get the COUNT(DISTINCT) result (even extra(select={}) hasn't been working, although it seems like it should). Can this be translated into a Django ORM query or should I just stick with raw SQL?

You can indeed use distinct and count together, as shown in this answer: https://stackoverflow.com/a/13145407/237091
In your case:
SELECT sender_id, COUNT(id), COUNT(DISTINCT recipient_id)
FROM myapp_messages
GROUP BY sender_id;
would become:
from django.db.models import Count

Message.objects.values('sender').annotate(
    message_count=Count('id'),
    recipient_count=Count('recipient', distinct=True),
)
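To see what the grouped query is expected to return, here is a minimal sketch that runs the question's SQL against an in-memory SQLite database. It uses only the standard library instead of Django; the table name `message`, the helper `sender_summary` and the sample rows are assumptions made up for illustration.

```python
import sqlite3


def sender_summary(rows):
    """Return (sender_id, message_count, distinct_recipient_count) per sender.

    rows is a list of (sender_id, recipient_id) pairs (hypothetical sample data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "id INTEGER PRIMARY KEY, sender_id INTEGER, recipient_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO message (sender_id, recipient_id) VALUES (?, ?)", rows
    )
    # Same shape as the SQL in the question: one row per sender.
    result = conn.execute(
        "SELECT sender_id, COUNT(id), COUNT(DISTINCT recipient_id) "
        "FROM message GROUP BY sender_id ORDER BY sender_id"
    ).fetchall()
    conn.close()
    return result


# Sender 1 writes three messages to two different people, sender 2 writes one.
sample = [(1, 2), (1, 2), (1, 3), (2, 1)]
print(sender_summary(sample))
```

The two counts differ for sender 1 because repeated recipients are only counted once by COUNT(DISTINCT ...), which is what distinct=True asks the ORM for.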

from django.db.models import Count
# Runs one extra query per sender: fine for a few senders, slow for many.
messages = Message.objects.values('sender').annotate(message_count=Count('sender'))
for m in messages:
    m['recipient_count'] = len(
        Message.objects.filter(sender=m['sender'])
        .values_list('recipient', flat=True)
        .distinct()
    )

Related

Annotate queryset with previous object in Django ORM

Example models:
class User(models.Model):
    pass
class UserStatusChange(models.Model):
    user = models.ForeignKey(User, related_name='status_changes')
    status = models.CharField()
    start_date = models.DateField()
I want to annotate a UserStatusChange queryset with an end_date field, where end_date equals the start_date of the next status change for the same user.
Eventually, I want to be able to do this:
qs = UserStatusChange.objects.annotate(end_date=???)
qs = qs.filter(start_date__lte=some_date, end_date__gte=another_date)
Logically that annotation should be something like this:
qs.annotate(
    end_date=qs.filter(
        user=OuterRef('user'),
        start_date__gt=OuterRef('start_date')
    ).order_by('start_date').first().start_date)
But it should be one DB query, if it is possible.
Solution:
subquery = UserStatusChange.objects.filter(
    user=OuterRef('user'),
    start_date__gt=OuterRef('start_date'),
).order_by('start_date')
UserStatusChange.objects.annotate(end_date=Subquery(subquery.values('start_date')[:1]))
That works, thanks to @hynekcer's answer. But with aggregate I got this error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
UPD: in Django 2.0+ it can be solved with Lead Window function.
In SQL it will be something like this:
select
user_id, status_id, start_date,
LEAD(start_date, 1) over (partition by user_id order by start_date)
from user_status_change;
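The LEAD query above can be tried out without Django. This is only a sketch using the standard-library sqlite3 module; it assumes the bundled SQLite is 3.25 or newer (the first release with window functions), and the helper name `with_end_dates`, the text `status` column and the sample rows are made up for illustration.

```python
import sqlite3


def with_end_dates(changes):
    """Pair every status change with the start_date of the user's next change.

    changes is a list of (user_id, status, start_date) tuples, with dates as
    ISO strings so that they sort correctly as text.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_status_change ("
        "user_id INTEGER, status TEXT, start_date TEXT)"
    )
    conn.executemany("INSERT INTO user_status_change VALUES (?, ?, ?)", changes)
    rows = conn.execute(
        "SELECT user_id, status, start_date, "
        "LEAD(start_date, 1) OVER ("
        "  PARTITION BY user_id ORDER BY start_date) AS end_date "
        "FROM user_status_change ORDER BY user_id, start_date"
    ).fetchall()
    conn.close()
    return rows


sample = [
    (1, "new", "2020-01-01"),
    (1, "active", "2020-02-01"),
    (2, "new", "2020-03-01"),
]
for row in with_end_dates(sample):
    print(row)
```

The last change of each user has no following row in its partition, so LEAD yields NULL (None in Python) there.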

You can use Subquery() with OuterRef() in Django 1.11.
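from datetime import date
from django.db.models import Min, OuterRef, Subquery
from django.db.models.functions import Coalesce

# start_date is a DateField, so the default must be a date as well
default_end = date.today()  # or the end of the recorded history
# Next start_date for the same user. aggregate() cannot be used here because
# it runs immediately; values().annotate() keeps the query lazy instead.
next_start = (
    UserStatusChange.objects
    .filter(user=OuterRef('user'), start_date__gt=OuterRef('start_date'))
    .order_by()
    .values('user')
    .annotate(next_start=Min('start_date'))
    .values('next_start')
)
qs = UserStatusChange.objects.annotate(
    end_date=Coalesce(Subquery(next_start), default_end)
)
qs = qs.order_by('user', 'start_date')
# an optional filter
qs = qs.filter(start_date__lte=some_date, end_date__gte=another_date, user__in=[...])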
It is compiled into a single query when executed, e.g. when combined with a User filter via prefetch_related. If you also want a meaningful end_date for the last item, use Coalesce() with a default value such as the current date or timestamp.
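For reference, the kind of SQL this annotation aims for is a correlated subquery wrapped in COALESCE. Below is a rough sketch against in-memory SQLite rather than Django; the table name `usc`, the helper `end_dates_via_subquery` and the sample dates are assumptions for illustration only.

```python
import sqlite3


def end_dates_via_subquery(changes, default_end):
    """Return (user_id, start_date, end_date) using a correlated subquery.

    changes is a list of (user_id, start_date) pairs with ISO date strings;
    default_end is used when a user has no later status change.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usc (id INTEGER PRIMARY KEY, user_id INTEGER, start_date TEXT)"
    )
    conn.executemany("INSERT INTO usc (user_id, start_date) VALUES (?, ?)", changes)
    rows = conn.execute(
        """
        SELECT a.user_id, a.start_date,
               COALESCE(
                   (SELECT MIN(b.start_date)
                    FROM usc AS b
                    WHERE b.user_id = a.user_id
                      AND b.start_date > a.start_date),
                   ?
               ) AS end_date
        FROM usc AS a
        ORDER BY a.user_id, a.start_date
        """,
        (default_end,),
    ).fetchall()
    conn.close()
    return rows


sample = [(1, "2020-01-01"), (1, "2020-02-01"), (2, "2020-03-01")]
for row in end_dates_via_subquery(sample, "9999-12-31"):
    print(row)
```

The inner SELECT refers to the outer row through `a.user_id` and `a.start_date`, which is the role OuterRef() plays on the ORM side.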

Django Queries: related subquery

I have 3 Models: Offer, Request and Assignment. Assignment makes a connection between Request and Offer. Now I want to do this:
select *
from offer as a
where places > (
select count(*)
from assignment
where offer_id = a.id and
to_date > "2014-07-07");
I am not quite sure how to achieve this with a Django QuerySet. Any tips?
Edit: The query above is just an example of how the query should look in general. The Django models look like this:
class Offer(models.Model):
    ...
    places = models.IntegerField()
    ...
class Request(models.Model):
    ...
class Assignment(models.Model):
    from_date = models.DateField()
    to_date = models.DateField()
    request = models.ForeignKey("Request", related_name="assignments")
    offer = models.ForeignKey("Offer", related_name="assignments")
People can now create an offer with a given number of places, or a request. The admin then connects a request with an offer for a given time; this is saved as an assignment. The query above should give me a list of offers that still have places left. To do that, I want to count the number of valid assignments for a given offer and compare it with its number of places. The list should be used to find a possible offer for a given request when creating a new assignment.
I hope this describes the problem better.

Unfortunately, correlated subqueries aren't directly supported by ORM operations in older Django versions (Subquery() and OuterRef() were added in Django 1.11). Using .extra(where=...) should be possible in this case.
To get the same results without using a subquery something like the following should work:
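from django.db.models import Count, F

# related_name="assignments" on Assignment.offer gives the lookup name used here.
# Note: offers with no matching assignment are dropped by the first filter().
Offer.objects.filter(
    assignments__to_date__gt=thedate
).annotate(
    assignment_cnt=Count('assignments')
).filter(
    assignment_cnt__lt=F('places')
)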
The exact query depends on the model definitions.
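One thing to watch with the join-based version is an offer that has no assignment after the cutoff date: it still has free places, so it should be listed. A small sketch of the join plus HAVING idea in plain SQL, run on in-memory SQLite, is below. The table names `offer` and `assignment`, the helper `offers_with_free_places` and the sample data are assumptions for illustration, not the asker's schema.

```python
import sqlite3


def offers_with_free_places(offers, assignments, cutoff):
    """Return ids of offers whose valid assignment count is below `places`.

    offers: list of (id, places); assignments: list of (offer_id, to_date)
    with ISO date strings; cutoff: only assignments ending after it count.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE offer (id INTEGER PRIMARY KEY, places INTEGER)")
    conn.execute("CREATE TABLE assignment (offer_id INTEGER, to_date TEXT)")
    conn.executemany("INSERT INTO offer VALUES (?, ?)", offers)
    conn.executemany("INSERT INTO assignment VALUES (?, ?)", assignments)
    rows = conn.execute(
        """
        SELECT o.id
        FROM offer AS o
        LEFT JOIN assignment AS a
               ON a.offer_id = o.id AND a.to_date > ?
        GROUP BY o.id, o.places
        HAVING COUNT(a.offer_id) < o.places
        ORDER BY o.id
        """,
        (cutoff,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


offers = [(1, 2), (2, 1), (3, 1)]
assignments = [(1, "2014-08-01"), (2, "2014-08-01"), (3, "2014-06-01")]
print(offers_with_free_places(offers, assignments, "2014-07-07"))
```

Putting the date condition in the ON clause of a LEFT JOIN, rather than in WHERE, is what keeps offer 3 in the result: its only assignment ended before the cutoff, so it counts as zero.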

query = '''select *
from yourapp_offer as a
where places > (
select count(*)
from yourapp_assignment
where offer_id = a.id and
to_date > "2014-07-07");'''
offers = Offer.objects.raw(query)
https://docs.djangoproject.com/en/1.6/topics/db/sql/

Filtering model with HABTM relationship

I have 2 models - Restaurant and Feature. They are connected via has_and_belongs_to_many relationship. The gist of it is that you have restaurants with many features like delivery, pizza, sandwiches, salad bar, vegetarian option,… So now when the user wants to filter the restaurants and lets say he checks pizza and delivery, I want to display all the restaurants that have both features; pizza, delivery and maybe some more, but it HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]) I (of course) get the restaurants that have either pizza or delivery or both, which is not at all what I want.
My SQL/Rails knowledge is kinda limited since I'm new to this but I asked a friend and now I have this huuuge SQL that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn FROM restaurants
JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
JOIN features ON features_restaurants.feature_id = features.id
WHERE features.id in (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better - more Rails even - way of doing this? How would you do it?
Oh BTW I'm using Rails 4 on Heroku so it's a Postgres DB.

This is an example of a set-within-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr join
features f
on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause is counting for the presence of one of the features -- "pizza" and "delivery". If both features are present, then you get the restaurant_id.
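The having clause can also be generated for any number of requested features. The following is a minimal sketch on in-memory SQLite, assuming a single denormalised table `restaurant_feature(restaurant_id, feature_name)` instead of the two joined tables; that table and the helper `restaurants_with_all` are made up for illustration.

```python
import sqlite3


def restaurants_with_all(pairs, wanted):
    """Return ids of restaurants that have every feature name in `wanted`.

    pairs is a list of (restaurant_id, feature_name) tuples.
    """
    wanted = list(wanted)
    if not wanted:
        raise ValueError("at least one feature name is required")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE restaurant_feature (restaurant_id INTEGER, feature_name TEXT)"
    )
    conn.executemany("INSERT INTO restaurant_feature VALUES (?, ?)", pairs)
    # One "this feature occurs at least once" condition per requested feature.
    having = " AND ".join(
        "SUM(CASE WHEN feature_name = ? THEN 1 ELSE 0 END) > 0" for _ in wanted
    )
    sql = (
        "SELECT restaurant_id FROM restaurant_feature "
        "GROUP BY restaurant_id HAVING " + having + " ORDER BY restaurant_id"
    )
    rows = conn.execute(sql, wanted).fetchall()
    conn.close()
    return [row[0] for row in rows]


pairs = [
    (1, "pizza"), (1, "delivery"),
    (2, "pizza"),
    (3, "delivery"), (3, "pizza"), (3, "salad bar"),
]
print(restaurants_with_all(pairs, ["pizza", "delivery"]))
```

Only the placeholders are generated dynamically; the feature names themselves are passed as bound parameters, so user input never ends up in the SQL text.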

How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on restaurant.
With this scheme your queries boil down to
select * from restaurants where restaurants.features @> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids @> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids @> ARRAY(
    select id from features where name in ('pizza', 'delivery')
)
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.

This is not a copy-and-paste solution, but if you follow these steps you will have a fast, working query.
index the feature_name column (I'm assuming that column feature_id is indexed on both tables)
place each feature_name param in its own exists(), correlated on the restaurant:
select r.id
from
    restaurants r
where
    exists(select true from features_restaurants fr join features f on fr.feature_id = f.feature_id
           where fr.restaurant_id = r.id and f.feature_name = 'pizza')
and
    exists(select true from features_restaurants fr join features f on fr.feature_id = f.feature_id
           where fr.restaurant_id = r.id and f.feature_name = 'delivery')

Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
  # accepts an array of feature names
  restaurants = Restaurant.all
  features.each do |f|
    feature_restaurants = Feature.find_by_name(f).restaurants
    restaurants = feature_restaurants & restaurants
  end
  return restaurants
end
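The merging idea is plain set intersection, independent of Rails. Here is a language-neutral sketch in Python; the mapping `features_to_restaurants` (feature name to restaurant ids) and the function name are hypothetical stand-ins for what the association lookups would return.

```python
from functools import reduce


def restaurants_with_all_features(features_to_restaurants, wanted):
    """Intersect the restaurant-id sets of every requested feature.

    features_to_restaurants maps a feature name to an iterable of restaurant
    ids. An unknown feature name contributes an empty set, so the result is
    empty in that case.
    """
    sets = [set(features_to_restaurants.get(name, ())) for name in wanted]
    if not sets:
        # Design choice for this sketch: no requested features, no matches.
        return set()
    return reduce(set.intersection, sets)


index = {
    "pizza": {1, 2, 3},
    "delivery": {1, 3},
    "salad bar": {3},
}
print(sorted(restaurants_with_all_features(index, ["pizza", "delivery"])))
```

Note that this issues one lookup per feature and intersects in application memory, which is fine for a handful of filters but does more work than a single grouped SQL query.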

Since it's an AND condition (OR conditions get dicey with AREL), I reread your stated problem, ignoring the SQL, and I think this is what you want.
# in Restaurant
has_many :features
# in Feature
has_many :restaurants
# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
  where(pizza: true)
end
def self.delivery
  where(delivery: true)
end
# query
Restaurant.features.pizza.delivery
Basically you call the association with ".features" and then you use the self methods defined on features. Hopefully I didn't misunderstand the original problem.
Cheers!

Restaurant
.joins(:features)
.where(features: {name: ['pizza','delivery']})
.group(:id)
.having('count(features.name) = ?', 2)
This seems to work for me. I tried it with SQLite though.

Django sql order by

I'm really struggling on this one.
I need to be able to sort my users by the number of positive votes received on their comments.
I have a table userprofile, a table comment and a table likeComment.
The table comment has a foreign key to the user who created it, and the table likeComment has a foreign key to the liked comment.
To get the number of positive votes a user received I do:
LikeComment.objects.filter(Q(type = 1), Q(comment__user=user)).count()
Now I want to be able to get all the users sorted so that the ones with the most positive votes come first. How do I do that? I tried to use extra and JOIN but this didn't go anywhere.
Thank you

It sounds like you want to perform a filter on an annotation:
class User(models.Model):
    pass
class Comment(models.Model):
    user = models.ForeignKey(User, related_name="comments")
class Like(models.Model):
    comment = models.ForeignKey(Comment, related_name="likes")
    type = models.IntegerField()
users = (
    User.objects
    .all()
    .extra(select={
        "positive_likes": """
            SELECT COUNT(*) FROM app_like
            JOIN app_comment ON app_like.comment_id = app_comment.id
            WHERE app_comment.user_id = app_user.id AND app_like.type = 1"""})
    .order_by("-positive_likes")  # most-liked users first
)

models.py
class UserProfile(models.Model):
    .........
    def like_count(self):
        return LikeComment.objects.filter(comment__user=self.user, type=1).count()
views.py
def getRanking(anObject):
    return anObject.like_count()
def myview(request):
    users = list(UserProfile.objects.filter())
    users.sort(key=getRanking, reverse=True)
    return render(request, 'page.html', {'users': users})

Timmy's suggestion to use a subquery is probably the simplest way to solve this kind of problem, but subqueries almost never perform as well as joins, so if you have a lot of users you may find that you need better performance.
So, re-using Timmy's models:
class User(models.Model):
    pass
class Comment(models.Model):
    user = models.ForeignKey(User, related_name="comments")
class Like(models.Model):
    comment = models.ForeignKey(Comment, related_name="likes")
    type = models.IntegerField()
the query you want looks like this in SQL:
SELECT app_user.id, COUNT(app_like.id) AS total_likes
FROM app_user
LEFT OUTER JOIN app_comment
ON app_user.id = app_comment.user_id
LEFT OUTER JOIN app_like
ON app_comment.id = app_like.comment_id AND app_like.type = 1
GROUP BY app_user.id
ORDER BY total_likes DESC
(If your actual User model has more fields than just id, then you'll need to include them all in the SELECT and GROUP BY clauses.)
Django's object-relational mapping system doesn't provide a way to express this query. (As far as I know—and I'd be very happy to be told otherwise!—it only supports aggregation across one join, not across two joins as here.) But when the ORM isn't quite up to the job, you can always run a raw SQL query, like this:
sql = '''
    SELECT app_user.id, COUNT(app_like.id) AS total_likes
    # etc (as above)
'''
for user in User.objects.raw(sql):
    print(user.id, user.total_likes)
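To check that the double LEFT OUTER JOIN really keeps users without any likes, the SQL can be run on its own. This sketch uses the standard-library sqlite3 module with the table names from the answer; the helper `rank_users_by_likes`, the extra tie-break on `app_user.id` and the sample data are assumptions added for illustration.

```python
import sqlite3


def rank_users_by_likes(user_ids, comments, likes):
    """Return (user_id, total_likes) rows, most-liked users first.

    comments: list of (comment_id, user_id); likes: list of (comment_id, type),
    where type 1 is a positive vote.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_user (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE app_comment (id INTEGER PRIMARY KEY, user_id INTEGER)")
    conn.execute(
        "CREATE TABLE app_like (id INTEGER PRIMARY KEY, comment_id INTEGER, type INTEGER)"
    )
    conn.executemany("INSERT INTO app_user VALUES (?)", [(u,) for u in user_ids])
    conn.executemany("INSERT INTO app_comment VALUES (?, ?)", comments)
    conn.executemany("INSERT INTO app_like (comment_id, type) VALUES (?, ?)", likes)
    rows = conn.execute(
        """
        SELECT app_user.id, COUNT(app_like.id) AS total_likes
        FROM app_user
        LEFT OUTER JOIN app_comment
          ON app_user.id = app_comment.user_id
        LEFT OUTER JOIN app_like
          ON app_comment.id = app_like.comment_id AND app_like.type = 1
        GROUP BY app_user.id
        ORDER BY total_likes DESC, app_user.id
        """
    ).fetchall()
    conn.close()
    return rows


users = [1, 2, 3]
comments = [(10, 1), (11, 2), (12, 2)]
likes = [(10, 1), (11, 1), (12, 1), (12, 0)]
print(rank_users_by_likes(users, comments, likes))
```

User 3 has no comments at all and still appears with a count of 0, because COUNT(app_like.id) ignores the NULLs produced by the outer joins.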

I believe this can be achieved with Django's queryset:
User.objects.filter(comments__likes__type=1) \
    .annotate(lks=Count('comments__likes')) \
    .order_by('-lks')
The only problem here is that this query will miss users with 0 likes. The code from @gareth-rees, @timmy-omahony and @Catherine also includes users with 0 likes.

Django DB API equivalent of a somewhat complex SQL query

I'm new to Django and still having some problems with simple queries.
Let's assume that I'm writing an email application. This is the Mail model:
class Mail(models.Model):
    to = models.ForeignKey(User, related_name = "to")
    sender = models.ForeignKey(User, related_name = "sender")
    subject = models.CharField()
    conversation_id = models.IntegerField()
    read = models.BooleanField()
    message = models.TextField()
    sent_time = models.DateTimeField(auto_now_add = True)
Each mail has a conversation_id which identifies a set of email messages that are written and replied to. Now, for listing emails in the inbox, I would like to show only the last email per conversation, as Gmail does.
I have the SQL equivalent which does the job, but how to construct native Django query for this?
select
*
from
main_intermail
where
id in
(select
max(id)
from
main_intermail
group by conversation_id);
Thank you in advance!

Does this work? It would require Django 1.1.
from django.db.models import Max
mail_list = Mail.objects.values('conversation_id').annotate(Max('id'))
conversation_id_list = mail_list.values_list('id__max',flat=True)
conversation_list = Mail.objects.filter(id__in=conversation_id_list)
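The result to aim for is the one produced by the SQL in the question: one row per conversation, namely the row with the highest id. As a quick way to check that expectation outside Django, here is a sketch on in-memory SQLite; the table name `mail`, the reduced column set and the helper `latest_per_conversation` are assumptions for illustration.

```python
import sqlite3


def latest_per_conversation(mails):
    """Return the newest mail (highest id) of every conversation.

    mails is a list of (id, conversation_id, subject) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mail (id INTEGER PRIMARY KEY, conversation_id INTEGER, subject TEXT)"
    )
    conn.executemany("INSERT INTO mail VALUES (?, ?, ?)", mails)
    rows = conn.execute(
        """
        SELECT id, conversation_id, subject
        FROM mail
        WHERE id IN (SELECT MAX(id) FROM mail GROUP BY conversation_id)
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


sample = [
    (1, 100, "hi"),
    (2, 100, "re: hi"),
    (3, 200, "lunch"),
    (4, 100, "re: re: hi"),
]
print(latest_per_conversation(sample))
```

The inner SELECT plays the role of the values('conversation_id').annotate(Max('id')) step, and the outer WHERE ... IN corresponds to the id__in filter.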

So, given a conversation_id you want to retrieve the related record which has the highest id. To do this use order_by to sort the results in descending order (because you want the highest id to come first), and then use array syntax to get the first item, which will be the item with the highest id.
# Get latest message for conversation #42
Mail.objects.filter(conversation_id__exact=42).order_by('-id')[0]
However, this differs from your SQL query. Your query appears to provide the latest message from every conversation. This provides the latest message from one specific conversation. You could always do one query to get the list of conversations for that user, and then follow up with multiple queries to get the latest message from each conversation.
