Join with subquery in Django ORM - sql

I want to run a filter using Django's ORM such that I get a distinct set of users with each user's most recent session. I have the tables set up so that a user has many sessions: there are User and Session models, with Session having user = models.ForeignKey(User).
What I've tried so far is User.objects.distinct('username').order_by('session__last_accessed'), but I know that this won't work because Django adds the session.last_accessed column to the SELECT list. So it returns, for example, 5 duplicate usernames with 5 distinct sessions rather than a single user with their most recent session.
Is it possible to query this via Django's ORM?
Edit: Okay, after some testing with SQL I've found that the SQL I want to use is:
select user.username, sub_query.last_accessed from (
select user_id, max(last_accessed) as last_accessed
from session
group by user_id
) sub_query
join user on
user.id = sub_query.user_id
order by sub_query.last_accessed desc
limit 5
And I can do sub_query via Session.objects.values('user').annotate(last_accessed=Max('last_accessed')). How can I use this sub_query to get the data I want with the ORM?
Edit 2: Specifically, I want to do this by performing one query only, like the SQL above does. Of course, I can query twice and do some processing in Python, but I'd prefer to hit the database once while using the ORM.
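To see what the target SQL from the edit actually returns, here is a minimal sketch that runs it against an in-memory SQLite database using Python's stdlib `sqlite3`. The `user`/`session` schema and the sample rows are assumptions made for illustration, not the asker's real tables:

```python
import sqlite3

# Hypothetical "user" and "session" tables mirroring the User/Session models
# from the question (column names and data are assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE session (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES user(id),
        last_accessed TEXT
    );
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO session (user_id, last_accessed) VALUES
        (1, '2012-01-01'), (1, '2012-03-01'),
        (2, '2012-02-01'), (2, '2012-01-15');
""")

# The query from the question: compute each user's latest session in a
# subquery, then join that per-user maximum back to the user table.
rows = conn.execute("""
    SELECT user.username, sub_query.last_accessed
    FROM (
        SELECT user_id, MAX(last_accessed) AS last_accessed
        FROM session
        GROUP BY user_id
    ) AS sub_query
    JOIN user ON user.id = sub_query.user_id
    ORDER BY sub_query.last_accessed DESC
    LIMIT 5
""").fetchall()

print(rows)  # one row per user, most recently active user first
```

Because the GROUP BY happens inside the subquery, the join produces exactly one row per user, which is the property the `distinct('username')` attempt was missing.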

If you are using the MySQL backend, the following solution may be useful:
users_in_session = Session.objects.values_list('user_id', flat=True)
sessions_by_the_user_list = Session.objects \
.filter(user__in=set(users_in_session)) \
.order_by('last_accessed').distinct()
If you use the subquery, calling order_by('last_accessed') should be enough to get the data back as an ordered list, although in my testing the results seemed unstable.
Update:
You can try:
Session.objects.values('user') \
.annotate(last_accessed=Max('last_accessed')) \
.order_by('last_accessed').distinct()

Calling distinct('username') should never return duplicate usernames. Are you sure you are using a Django version that supports .distinct(fields), that is, Django 1.4 or later? Before Django 1.4, .distinct(fields) was accepted by the ORM, but it didn't actually perform the correct DISTINCT ON query.
Another hint that things aren't working as expected is that .distinct('username').order_by('session__last_accessed') isn't a valid query: the order_by() call should have 'username' as its first argument, because the order_by() fields must start with the fields passed to .distinct(). (DISTINCT ON is also only supported on PostgreSQL.) See https://docs.djangoproject.com/en/1.4/ref/models/querysets/#django.db.models.query.QuerySet.distinct for details.
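The ordering rule is easier to see by emulating DISTINCT ON's semantics: sort the rows, then keep the first row of each group. The pure-Python sketch below shows that idea. It is not how Django or PostgreSQL implement it, and the `distinct_on` helper and sample rows are hypothetical:

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Hypothetical (username, last_accessed) rows, as a user/session join might return them.
rows = [
    ("bob", date(2012, 1, 15)),
    ("alice", date(2012, 1, 1)),
    ("bob", date(2012, 2, 1)),
    ("alice", date(2012, 3, 1)),
]


def distinct_on(rows, key, order):
    """Mimic DISTINCT ON: sort by `order`, then keep the first row for each `key`.

    `order` must sort primarily by `key`; otherwise rows with the same key are
    not adjacent and "the first row per group" is ill-defined. That is why the
    ORDER BY has to start with the DISTINCT ON fields.
    """
    ordered = sorted(rows, key=order)
    return [next(group) for _, group in groupby(ordered, key=key)]


# Roughly what .distinct('username').order_by('username', '-session__last_accessed')
# asks for: sort by username, newest session first, keep one row per username.
latest = distinct_on(
    rows,
    key=itemgetter(0),
    order=lambda r: (r[0], -r[1].toordinal()),
)
```

Ordering by `-session__last_accessed` as the second key determines which row survives for each username.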

Related

When a user views all articles, show their own articles first

We have two models:
Article
User
A user has many articles.
When a user views all articles, I need them to see their own articles first.
My idea:
Make two queries.
The first query returns the user's own articles:
Article.where(user_id: user_id)
The second query returns everyone else's:
Article.where.not(user_id: user_id)
Then merge the results.
Second idea:
Get all articles and use Ruby's select method.
But I need the best way to do this.
I use Ruby on Rails 6.1 and Ruby 3.
You could run one query but sort the articles with SQL depending on if they have a matching user_id:
Article.order(Arel.sql("CASE user_id WHEN #{user_id} THEN 0 ELSE 1 END"))
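Note: unlike where, order does not accept bind arguments (such as a ? placeholder plus a value), so the input is not sanitized for you. Use this only when you are sure that user_id contains a valid user id, for example by using current_user.id instead of user_id.
Rails 7 adds a method called in_order_of, which lets you write similar functionality like this:
Article.in_order_of(:user_id, [user_id])
Be aware, however, that in Rails 7.0 in_order_of also restricts the results to the listed values (it adds a WHERE user_id IN (...) condition), so this returns only the user's own articles, not all articles with theirs first.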
More programmatic approach
articles = Article.arel_table
case_statement = Arel::Nodes::Case.new(articles[:user_id])
.when(user_id)
.then(0)
.else(1)
Article.order(case_statement)
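Underneath, both Rails versions generate an `ORDER BY CASE ...` clause. The sketch below runs that SQL directly with Python's stdlib `sqlite3`, passing the user id as a bound `?` parameter, which avoids the interpolation concern from the note above. The `articles` table, its columns, and the sample rows are assumptions for illustration:

```python
import sqlite3

# Hypothetical articles table (schema and rows assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO articles (user_id, title) VALUES
        (2, 'first by user 2'),
        (1, 'first by user 1'),
        (3, 'by user 3'),
        (1, 'second by user 1');
""")

current_user_id = 1  # hypothetical id of the user viewing the list

# One query: the viewer's rows get sort key 0, all other rows get 1.
# id is a secondary key so the order within each bucket is deterministic.
rows = conn.execute(
    """
    SELECT id, user_id, title FROM articles
    ORDER BY CASE WHEN user_id = ? THEN 0 ELSE 1 END, id
    """,
    (current_user_id,),
).fetchall()
```

No rows are filtered out. The CASE expression only changes the sort order.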

Django ORM filter multiple fields using 'IN' statement

So I have the following model in Django:
class MemberLoyalty(models.Model):
    date_time = models.DateField(primary_key=True)
    member = models.ForeignKey(Member, models.DO_NOTHING)
    loyalty_value = models.IntegerField()
My goal is to get, for each member, the row with the most recent date. There are many ways to do it; one of them is to use a subquery that groups by member with the max date_time and then filter member_loyalty with its results. The working SQL for this solution is as follows:
SELECT
*
FROM
member_loyalty
WHERE
(date_time , member_id) IN (SELECT
max(date_time), member_id
FROM
member_loyalty
GROUP BY member_id);
Another way to do this would be by joining with the subquery.
How could I translate this into a Django query? I could not find a way to filter on two fields using IN, nor a way to join with a subquery using a specific ON clause.
I've tried:
cls.objects.values('member_id', 'loyalty_value').annotate(latest_date=Max('date_time'))
But then it also groups by loyalty_value.
I also tried building the subquery, but I can't find how to join it or use it in a filter:
subquery = cls.objects.values('member_id').annotate(max_date=Max('date_time'))
Also, I am using MySQL, so I cannot use the .distinct('param') method.
This is a typical greatest-n-per-group query; Stack Overflow even has a tag for it (greatest-n-per-group).
I believe the most efficient way to do it with recent versions of Django is a window query. Something along these lines should do the trick:
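For reference, the question's two-column IN query runs as-is on databases that support row values. The sketch below uses Python's stdlib `sqlite3` (row values need SQLite 3.15 or later). The table layout follows the model above, but the sample rows are made up, and `date_time` is not declared as a primary key here:

```python
import sqlite3

# Hypothetical member_loyalty rows (schema follows the model; data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_loyalty (
        date_time TEXT, member_id INTEGER, loyalty_value INTEGER
    );
    INSERT INTO member_loyalty VALUES
        ('2021-01-01', 1, 10), ('2021-02-01', 1, 20),
        ('2021-01-15', 2, 5),  ('2021-03-01', 2, 7);
""")

# Row-value IN: keep a row only if its (date_time, member_id) pair is one of
# the (latest date, member) pairs produced by the grouped subquery.
rows = conn.execute("""
    SELECT * FROM member_loyalty
    WHERE (date_time, member_id) IN (
        SELECT MAX(date_time), member_id
        FROM member_loyalty
        GROUP BY member_id
    )
    ORDER BY member_id
""").fetchall()
```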
from django.db.models import F, Max, Window
MemberLoyalty.objects.all().annotate(my_max=Window(
expression=Max('date_time'),
partition_by=F('member')
)).filter(my_max=F('date_time'))
Update: This actually won't work, because window annotations are not filterable (at least before Django 4.2, which added support for filtering against window functions). To filter on a window annotation you need to wrap it inside a Subquery, but with a Subquery you don't actually need a window function at all. There is another way to do it, which is my next example.
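The "wrap it" idea is clearest in raw SQL: compute the window value in an inner query, then filter on it in the outer query, where it is an ordinary column. Below is a sketch with Python's stdlib `sqlite3` (window functions need SQLite 3.25 or later), using the same made-up member_loyalty rows. It is not the SQL Django generates:

```python
import sqlite3

# Same hypothetical member_loyalty table and rows as in the question sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_loyalty (
        date_time TEXT, member_id INTEGER, loyalty_value INTEGER
    );
    INSERT INTO member_loyalty VALUES
        ('2021-01-01', 1, 10), ('2021-02-01', 1, 20),
        ('2021-01-15', 2, 5),  ('2021-03-01', 2, 7);
""")

# The window function runs in the inner query; the outer WHERE can then
# compare each row against its member's maximum.
rows = conn.execute("""
    SELECT date_time, member_id, loyalty_value
    FROM (
        SELECT date_time, member_id, loyalty_value,
               MAX(date_time) OVER (PARTITION BY member_id) AS my_max
        FROM member_loyalty
    ) AS with_max
    WHERE date_time = my_max
    ORDER BY member_id
""").fetchall()
```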
If either MySQL or Django does not support window queries, then a Subquery comes into play.
from django.db.models import Max, OuterRef, Subquery
MemberLoyalty.objects.filter(
date_time=Subquery(
(MemberLoyalty.objects
.filter(member=OuterRef('member'))
.values('member')
.annotate(max_date=Max('date_time'))
.values('max_date')[:1]
)
)
)
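The Subquery/OuterRef filter above corresponds, roughly, to a correlated subquery in the WHERE clause. Django's actual SQL also includes the GROUP BY and LIMIT 1 from the inner queryset. Here is a sketch of that shape with Python's stdlib `sqlite3`, again using made-up rows:

```python
import sqlite3

# Same hypothetical member_loyalty table and rows as in the earlier sketches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_loyalty (
        date_time TEXT, member_id INTEGER, loyalty_value INTEGER
    );
    INSERT INTO member_loyalty VALUES
        ('2021-01-01', 1, 10), ('2021-02-01', 1, 20),
        ('2021-01-15', 2, 5),  ('2021-03-01', 2, 7);
""")

# For each outer row, the inner query (correlated on member_id, which is what
# OuterRef('member') expresses) computes that member's latest date.
rows = conn.execute("""
    SELECT date_time, member_id, loyalty_value
    FROM member_loyalty AS ml
    WHERE ml.date_time = (
        SELECT MAX(u0.date_time)
        FROM member_loyalty AS u0
        WHERE u0.member_id = ml.member_id
    )
    ORDER BY ml.member_id
""").fetchall()
```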
If even Subqueries are not available (before Django 1.11), this should also work:
from django.db.models import F, Max
MemberLoyalty.objects.annotate(
max_date=Max('member__memberloyalty__date_time')  # lookups use the related query name, not the _set accessor
).filter(max_date=F('date_time'))

Rails .joins doesn't load the association

Hello,
My query:
@county = County.joins(:state)
.where("counties.slug = ? AND states.slug = ?", params[:county_slug], params[:state_slug]) # :state_slug param name assumed
.select('states.*, counties.*')
.first!
From the log, the SQL looks like this:
SELECT states.*, counties.* FROM "counties" INNER JOIN "states" ON "states"."id" = "counties"."state_id" LIMIT 1
My problem is that it doesn't eager load the data from the associated table (states). For example, when I call @county.state.name, it runs another query, although, as you can see from the log, it had already queried the database for the data in that table as well. It just doesn't prepopulate @county.state.
Any idea how I can get all the data from the database in just ONE query?
Thx
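I think you need to use includes instead of joins to get eager loading. Because the where condition here is a string that mentions states.slug, also add .references(:states) so that Rails loads both tables in a single JOIN query (eager_load(:state) does the same without needing references). There's a good Railscasts episode about the differences: http://railscasts.com/episodes/181-include-vs-joins , in particular: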
The question we need to ask is “are we using any of the related model’s attributes?” In our case the answer is “yes” as we’re showing the user’s name against each comment. This means that we want to get the users at the same time as we retrieve the comments and so we should be using include here.

Django Really Simple Aggregation (or Group By)

Imagine I just want to know the number of users with the same first_name in Django's auth app.
I know how to do this really easily in SQL:
SELECT first_name, COUNT(1) as num_users
FROM auth_user
GROUP BY first_name
ORDER BY num_users DESC;
And I also know how to get the desired output in Django, for instance by going through all the users, getting their email, and then filtering and counting.
Isn't there a simpler way to do this via Django's ORM? I can accomplish it if I'm aggregating with a foreign key but not with one of the table fields. I'm pretty sure I'm missing something.
Thanks.
I blogged about this very issue a couple of years ago. Contrary to the other answers, it's perfectly possible in Django, with no need for raw SQL.
Django's annotations allow you to attach some basic calculations to each object in your queryset (or aggregations across the entire queryset), but you can't filter those annotations (i.e. in your case, you only want to count those users who share your name).
Django also has F() objects, which allow you to use a field's value within a query. Ideally you could use these together with annotations to filter the objects you are annotating, but that's not currently possible (there's a fix on the way).
So, an easy solution is to perform the annotation manually:
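from django.contrib.auth.models import User
# The inner auth_user is aliased as u so the correlated WHERE can refer to the outer row.
users = User.objects.all().extra(select={
'same_name_count': """
SELECT COUNT(*)
FROM auth_user AS u
WHERE u.first_name = auth_user.first_name
"""
})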
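The `extra()` select above attaches a correlated COUNT to every row. The sketch below runs the same SQL shape with Python's stdlib `sqlite3` to show what each user ends up with. The minimal `auth_user` table (only `id` and `first_name`) and its rows are assumptions:

```python
import sqlite3

# Minimal hypothetical auth_user table with just the columns needed here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO auth_user (first_name) VALUES
        ('Ann'), ('Bob'), ('Ann'), ('Cid'), ('Ann');
""")

# Each user row gets a same_name_count computed by a correlated subquery
# over the aliased inner table u.
rows = conn.execute("""
    SELECT id, first_name,
           (SELECT COUNT(*) FROM auth_user AS u
            WHERE u.first_name = auth_user.first_name) AS same_name_count
    FROM auth_user
    ORDER BY id
""").fetchall()
```

Unlike a GROUP BY, this keeps one row per user, with the count repeated on every user who shares the name.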
Check this: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
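from django.contrib.auth.models import User
from django.db.models import Count
# values() before annotate() makes Django GROUP BY first_name
User.objects.values('first_name').annotate(num_users=Count('id')).order_by('-num_users')
For more complex queries you can use plain SQL, but try to avoid it.
UPD: Code fixed. Thanks to Timmy O'Mahony for pointing this out!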
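The values().annotate() pattern targets roughly the GROUP BY query from the question. Here it is run with Python's stdlib `sqlite3` on a made-up `auth_user` table, with `first_name` added as a tie-breaker so that the output order is deterministic:

```python
import sqlite3

# Minimal hypothetical auth_user table (only the columns this query needs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO auth_user (first_name) VALUES
        ('Ann'), ('Bob'), ('Ann'), ('Cid'), ('Ann');
""")

# GROUP BY collapses users with the same first_name into one row, and
# COUNT(*) counts the users in each group.
rows = conn.execute("""
    SELECT first_name, COUNT(*) AS num_users
    FROM auth_user
    GROUP BY first_name
    ORDER BY num_users DESC, first_name
""").fetchall()
```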

Django - finding the extreme member of each group

I've been playing around with the new aggregation functionality in the Django ORM, and there's a class of problem I think should be possible, but I can't seem to get it to work. The type of query I'm trying to generate is described here.
So, let's say I have the following models -
import datetime
from django.db import models

class ContactGroup(models.Model):
    # .... whatever ....
    pass

class Contact(models.Model):
    # on_delete is required on ForeignKey since Django 2.0
    group = models.ForeignKey(ContactGroup, on_delete=models.CASCADE)
    name = models.CharField(max_length=20)
    email = models.EmailField()
    # ...

class Record(models.Model):
    contact = models.ForeignKey(Contact, on_delete=models.CASCADE)
    group = models.ForeignKey(ContactGroup, on_delete=models.CASCADE)
    record_date = models.DateTimeField(default=datetime.datetime.now)
    # ... name, email, and other fields that are in Contact ...
So, each time a Contact is created or modified, a new Record is created that saves the information as it appears in the contact at that time, along with a timestamp. Now, I want a query that, for example, returns the most recent Record instance for every Contact associated to a ContactGroup. In pseudo-code:
group = ContactGroup.objects.get(...)
records_i_want = group.record_set.most_recent_record_for_every_contact()
Once I get this figured out, I just want to be able to throw a filter(record_date__lt=some_date) on the queryset, and get the information as it existed at some_date.
Anybody have any ideas?
edit: It seems I'm not really making myself clear. Using models like these, I want a way to do the following with the pure Django ORM (no extra()):
ContactGroup.record_set.extra(where=["history_date = (select max(history_date) from app_record r where r.id=app_record.id and r.history_date <= '2009-07-18')"])
Putting the subquery in the where clause is only one strategy for solving this problem, the others are pretty well covered by the first link I gave above. I know where-clause subselects are not possible without using extra(), but I thought perhaps one of the other ways was made possible by the new aggregation features.
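For comparison, the intended "latest record per contact as of a date" result can be written in plain SQL with a correlated subquery that correlates on contact_id rather than id. The sketch uses Python's stdlib `sqlite3`. The `app_record` columns beyond those named in the question and all sample rows are assumptions, and this is not Django-generated SQL:

```python
import sqlite3

# Hypothetical app_record table modelled on the Record model above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_record (
        id INTEGER PRIMARY KEY, contact_id INTEGER, group_id INTEGER,
        record_date TEXT, name TEXT
    );
    INSERT INTO app_record (contact_id, group_id, record_date, name) VALUES
        (1, 1, '2009-07-01', 'Ann v1'),
        (1, 1, '2009-07-10', 'Ann v2'),
        (1, 1, '2009-07-20', 'Ann v3'),
        (2, 1, '2009-06-01', 'Bob v1'),
        (3, 2, '2009-07-05', 'Cid v1');
""")

group_id, cutoff = 1, "2009-07-18"

# For each record in the group, keep it only if it is its contact's latest
# record on or before the cutoff date.
rows = conn.execute("""
    SELECT rec.contact_id, rec.record_date, rec.name
    FROM app_record AS rec
    WHERE rec.group_id = ?
      AND rec.record_date = (
          SELECT MAX(r.record_date) FROM app_record AS r
          WHERE r.contact_id = rec.contact_id
            AND r.record_date <= ?
      )
    ORDER BY rec.contact_id
""", (group_id, cutoff)).fetchall()
```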
It sounds like you want to keep records of changes to objects in Django.
Pro Django has a section in chapter 11 (Enhancing Applications) in which the author shows how to create a model that uses another model as a client that it tracks for inserts/deletes/updates. The model is generated dynamically from the client definition and relies on signals. The code shows a most_recent() function, but you could adapt it to obtain the object's state on a particular date.
I assume it is the tracking in Django that is problematic, not the SQL to obtain this, right?
First of all, I'll point out that:
ContactGroup.record_set.extra(where=["history_date = (select max(history_date) from app_record r where r.id=app_record.id and r.history_date <= '2009-07-18')"])
will not get you the same effect as:
records_i_want = group.record_set.most_recent_record_for_every_contact()
The first query returns every record associated with a particular group (or associated with any of the contacts of a particular group) that has a record_date less than the date/time specified in the extra. Run this in the shell and then do this to review the query Django created:
from django.db import connection
connection.queries[-1]
which reveals:
'SELECT "contacts_record"."id", "contacts_record"."contact_id", "contacts_record"."group_id", "contacts_record"."record_date", "contacts_record"."name", "contacts_record"."email" FROM "contacts_record" WHERE "contacts_record"."group_id" = 1 AND record_date = (select max(record_date) from contacts_record r where r.id=contacts_record.id and r.record_date <= \'2009-07-18\')'
Not exactly what you want, right?
Now, the aggregation feature is used to retrieve aggregated data, not the objects associated with aggregated data. So if you're trying to minimize the number of queries by using aggregation to obtain group.record_set.most_recent_record_for_every_contact(), you won't succeed.
Without using aggregation, you can get the most recent record for all contacts associated with a group using:
[x.record_set.all().order_by('-record_date')[0] for x in group.contact_set.all()]
Using aggregation, the closest I could get to that was:
from django.db.models import Max
group.record_set.values('contact').annotate(latest_date=Max('record_date'))
The latter returns a list of dictionaries like:
[{'contact': 1, 'latest_date': somedate }, {'contact': 2, 'latest_date': somedate }]
So there is one entry for each contact in a given group, with the latest record date associated with it.
Anyway, the minimum number of queries is probably 1 + the number of contacts in the group. If you are interested in obtaining the result using a single query, that is also possible, but you'll have to construct your models in a different way. But that's a totally different aspect of your problem.
I hope this helps you understand how to approach the problem using aggregation and the regular ORM functions.