Annotate queryset with previous object in Django ORM - sql

Example models:
class User(models.Model):
    pass

class UserStatusChange(models.Model):
    user = models.ForeignKey(User, related_name='status_changes')
    status = models.CharField()
    start_date = models.DateField()
I want to annotate a UserStatusChange queryset with an end_date field, where end_date equals the start_date of the next status change for the same user.
Eventually, I want to be able to do this:
qs = UserStatusChange.objects.annotate(end_date=???)
qs = qs.filter(start_date__lte=some_date, end_date__gte=another_date)
Logically that annotation should be something like this:
qs.annotate(
    end_date=qs.filter(
        user=OuterRef('user'),
        start_date__gt=OuterRef('start_date')
    ).order_by('start_date').first().start_date)
But it should be one DB query, if it is possible.
Solution:
subquery = UserStatusChange.objects.filter(
    user=OuterRef('user'),
    start_date__gt=OuterRef('start_date'),
).order_by('start_date')
UserStatusChange.objects.annotate(end_date=Subquery(subquery.values('start_date')[:1]))
That works, thanks to @hynekcer's answer. But with aggregate I got the error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
UPD: in Django 2.0+ it can be solved with Lead Window function.
In SQL it will be something like this:
select
user_id, status_id, start_date,
LEAD(start_date, 1) over (partition by user_id order by start_date)
from user_status_change;
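To see what LEAD returns, here is a small sketch that runs the same kind of query against an in-memory SQLite database (it needs SQLite 3.25 or newer for window functions). The sample rows are made up, and a plain status text column stands in for status_id.

```python
import sqlite3

# Hypothetical sample data; the table name follows the SQL above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_status_change (user_id INTEGER, status TEXT, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO user_status_change VALUES (?, ?, ?)",
    [
        (1, "trial", "2017-01-01"),
        (1, "active", "2017-02-01"),
        (1, "closed", "2017-06-01"),
        (2, "active", "2017-03-15"),
    ],
)

# LEAD(start_date, 1) looks one row ahead inside each user's partition,
# so every row gets the start_date of that user's next status change.
rows = conn.execute(
    """
    SELECT user_id, status, start_date,
           LEAD(start_date, 1) OVER (
               PARTITION BY user_id ORDER BY start_date
           ) AS end_date
    FROM user_status_change
    ORDER BY user_id, start_date
    """
).fetchall()

for row in rows:
    print(row)
```

The last status change of each user has no following row, so its end_date comes back as NULL (None in Python).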

You can use Subquery() with OuterRef() in Django 1.11.
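import datetime

from django.db.models import DateField, OuterRef, Subquery, Value
from django.db.models.functions import Coalesce

default_end = datetime.date.today()  # or the end of the recorded history

# Start date of the next status change for the same user.
# .aggregate() runs a query immediately and cannot contain OuterRef,
# so order the subquery and take its first row instead.
next_start = (
    UserStatusChange.objects
    .filter(
        user=OuterRef('user'),
        start_date__gt=OuterRef('start_date'),
    )
    .order_by('start_date')
    .values('start_date')[:1]
)

qs = UserStatusChange.objects.annotate(
    end_date=Coalesce(
        Subquery(next_start),
        Value(default_end),
        output_field=DateField(),
    )
)
qs = qs.order_by('user', 'start_date')
# an optional filter
qs = qs.filter(start_date__lte=some_date, end_date__gte=another_date, user__in=[...])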
The subquery is compiled into the outer statement, so everything runs as one query, also when it is combined with a filter on User or used with prefetch_related. If you also want a meaningful end_date for the last item, use Coalesce() with a default value such as the current date.
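For reference, this annotation is a correlated subquery wrapped in COALESCE. The sketch below shows that SQL shape using Python's sqlite3 module. The table name, sample rows and default_end value are invented for the demo, and the SQL Django actually generates will look different in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_change ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO status_change (user_id, start_date) VALUES (?, ?)",
    [(1, "2017-01-01"), (1, "2017-02-01"), (2, "2017-03-15")],
)

default_end = "2017-12-31"  # hypothetical end of the recorded history

# For each row, the inner SELECT finds the earliest later start_date of
# the same user; COALESCE substitutes default_end when there is none.
rows = conn.execute(
    """
    SELECT cur.user_id, cur.start_date,
           COALESCE(
               (SELECT nxt.start_date
                FROM status_change AS nxt
                WHERE nxt.user_id = cur.user_id
                  AND nxt.start_date > cur.start_date
                ORDER BY nxt.start_date
                LIMIT 1),
               ?
           ) AS end_date
    FROM status_change AS cur
    ORDER BY cur.user_id, cur.start_date
    """,
    (default_end,),
).fetchall()

for row in rows:
    print(row)
```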

Related

Django ORM: order by minimal value from m2m depending on specific value

I have 4 models:
A User has multiple Places where he lives, and each Place has multiple nearby MetroStations.
Then I have the travel time (MetroTimes) between every pair of metro stations.
class User(models.Model):
    pass

class Place(models.Model):
    user = models.ForeignKey(User)
    metro_stations = models.ManyToManyField('geo.MetroStation', related_name='places')

class MetroStation(models.Model):
    pass

class MetroTimes(models.Model):
    metro_station_1 = models.ForeignKey(MetroStation, related_name='metro_stations_1')
    metro_station_2 = models.ForeignKey(MetroStation, related_name='metro_stations_2')
    time = models.IntegerField()
The task is to sort all users by travel time (MetroTimes) to a specific MetroStation, measured from the closest MetroStation among all of the user's Places.
And here is the magic I can't work out:
specific_metro_station = MetroStation.objects.get(id=1)
User.objects.all().order_by(closest_metro_station_in_closest_user's_place_by_metro_time_to=specific_metro_station)
Big Thx for help!
I did it using Django's conditional expressions and query expressions.
The code in my case:
from django.db.models import Case, F, IntegerField, Min, When

current = MetroStation.objects.get(id=1)
users = User.objects.annotate(
    time=Min(
        Case(
            When(
                places__metro_stations__metro_stations_1__metro_station_2=current,
                then=F('places__metro_stations__metro_stations_1__time'),
            ),
            output_field=IntegerField(),
        )
    )
).order_by('time')
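The Min(Case(When(...))) annotation is a conditional aggregate: rows that do not lead to the target station produce NULL, and MIN ignores them. Here is a rough sketch of the equivalent SQL, run with sqlite3. The table names, the join table and the sample times are assumptions for illustration, not the schema Django would create.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE place (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE place_station (place_id INTEGER, station_id INTEGER);
    CREATE TABLE metro_time (station_1 INTEGER, station_2 INTEGER, time INTEGER);

    INSERT INTO place VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO place_station VALUES (10, 100), (11, 101), (20, 102);
    INSERT INTO metro_time VALUES
        (100, 1, 30), (101, 1, 12), (102, 1, 7), (102, 2, 3);
    """
)

target_station = 1  # hypothetical id of the specific station

# CASE yields the travel time only for rows that end at the target
# station; MIN then picks the shortest such time per user.
rows = conn.execute(
    """
    SELECT p.user_id,
           MIN(CASE WHEN t.station_2 = ? THEN t.time END) AS min_time
    FROM place AS p
    JOIN place_station AS ps ON ps.place_id = p.id
    JOIN metro_time AS t ON t.station_1 = ps.station_id
    GROUP BY p.user_id
    ORDER BY min_time
    """,
    (target_station,),
).fetchall()

print(rows)
```

User 2 sorts first here: the 3-minute row goes to a different station, so the CASE turns it into NULL and it never competes with the 7-minute row.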

Activerecord or SQL statement to find users where something very specific happens in the join table

I have a User that has_many MyVersions.
A MyVersion is created every time the column "profile_id" or "state" is changed in User. MyVersion has these columns:
user_id, object_changed (profile_id or state), before, after
I need to find Users that were active and had a specific profile at a specific time. That is, find all Users for which the following holds in their associated my_versions:
1. my_versions was created_at before a date AND :object_changed is 'state'. Within that time range:
1.1 THEN (not AND) find the last one and only select the user if the value for :after is 'active'
2. my_versions was created_at before a date AND :object_changed is 'profile_id'. Within that time range:
2.1 THEN find the last one and only select the user if the value for :after is '1'
3. Select only users that match both 1.1 and 2.1
EDIT 1: Apparently I'm getting closer but still not sure this is getting what I need:
active_user_ids = User.joins(:my_versions).merge(MyVersion.where(
  "my_versions.created_at = (SELECT MAX(created_at) FROM my_versions WHERE
   user_id = users.id AND created_at < '2016-01-01' AND object_changed = 'state')
   AND my_versions.after = 'activo'")).pluck(:id)
Now I have all user IDs that were active at the time (do I?). Then I can do the same for the profile, also passing in the previous IDs to combine the results properly:
active_and_right_profile =
  User.joins(:my_versions).merge(MyVersion.where(
    "my_versions.created_at = (SELECT MAX(created_at) FROM my_versions WHERE
     user_id = users.id AND created_at < '2016-01-01' AND object_changed = 'profile_id')
     AND my_versions.after = 1")).where(id: active_user_ids)
It doesn't look pretty and I'm not sure it does what I describe in the specification above. The first tests appear to be right, but I have many doubts because I don't understand some parts of the query:
Apparently when I use "SELECT MAX ... WHERE user_id = users.id" I'm asking for the top value for each user id. Is that right?
If that's true, I'm getting an array of results and passing it to the first created_at =. This means that if I have other versions outside the scope of this query but with the exact same timestamp, they will be in the results. Is that correct? That's relevant to me because a few of those versions' created_at values are being updated manually.
How does it look? Is there a way to make it better with only one query? Is there a way to avoid the problem of matching exact created_at values that I mention above?
Thanks!!
Previous attempts:
I tried this:
class User ...
  scope :active_at, -> (date) {
    joins(:my_versions).merge(MyVersion.on_state.before_date(date)
      .where("my_versions.created_at = (SELECT MAX(created_at) FROM my_versions WHERE user_id = users.id AND after = 'activo')"))
  }
But this creates the following query:
SELECT `users`.* FROM `users` INNER JOIN `my_versions` ON `my_versions`.`user_id` = `users`.`id` WHERE `my_versions`.`object_changed` = 'state' AND (my_versions.created_at < '2016-01-31') AND (my_versions.created_at = (SELECT MAX(created_at) FROM my_versions WHERE user_id = users.id AND after = 'activo'))
This is not what I need.
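To make the correlated MAX(created_at) subquery from EDIT 1 easier to reason about, here is a small sketch using Python's sqlite3 module rather than ActiveRecord. The sample rows and the helper name users_with_last_value are invented for the demo. One possible way around the exact-timestamp worry is shown as an assumption: the outer row is restricted to the same object_changed as the subquery, so a version of the other kind that happens to share the timestamp cannot match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_versions (
        user_id INTEGER, object_changed TEXT, "after" TEXT, created_at TEXT
    );
    INSERT INTO my_versions VALUES
        (1, 'state', 'active', '2015-03-01'),
        (1, 'profile_id', '1', '2015-04-01'),
        (2, 'state', 'active', '2015-03-01'),
        (2, 'state', 'inactive', '2015-09-01'),
        (2, 'profile_id', '1', '2015-09-01'),
        (3, 'state', 'active', '2015-05-01'),
        (3, 'profile_id', '2', '2015-06-01');
    """
)


def users_with_last_value(object_changed, expected_after, cutoff):
    """Users whose latest change of this kind before cutoff has the value."""
    rows = conn.execute(
        """
        SELECT v.user_id
        FROM my_versions AS v
        WHERE v.object_changed = ?
          AND v."after" = ?
          AND v.created_at = (
              -- evaluated once per outer row, i.e. per user
              SELECT MAX(created_at)
              FROM my_versions
              WHERE user_id = v.user_id
                AND object_changed = ?
                AND created_at < ?
          )
        """,
        (object_changed, expected_after, object_changed, cutoff),
    ).fetchall()
    return {user_id for (user_id,) in rows}


active = users_with_last_value("state", "active", "2016-01-01")
with_profile = users_with_last_value("profile_id", "1", "2016-01-01")
matching = sorted(active & with_profile)
print(matching)
```

User 2 was active at first but went inactive before the cutoff, so only the latest state row counts and the user is dropped from the first set.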

Django Queries: related subquery

I have 3 Models: Offer, Request and Assignment. Assignment makes a connection between Request and Offer. Now I want to do this:
select *
from offer as a
where places > (
select count(*)
from assignment
where offer_id = a.id and
to_date > "2014-07-07");
I am not quite sure how to achieve this with a Django QuerySet... Any tips?
Edit: The query above is just an example, how the query in general should look like. The django model looks like this:
class Offer(models.Model):
    ...
    places = models.IntegerField()
    ...

class Request(models.Model):
    ...

class Assignment(models.Model):
    from_date = models.DateField()
    to_date = models.DateField()
    request = models.ForeignKey("Request", related_name="assignments")
    offer = models.ForeignKey("Offer", related_name="assignments")
People can now create an offer with a given number of places, or a request. The admin then connects a request with an offer for a given time, and this is saved as an assignment. The query above should give me a list of offers that still have places left. Therefore I want to count the number of valid assignments for a given offer and compare it with its number of places. This list should be used to find a possible offer for a given request when creating a new assignment.
I hope this describes the problem better.
Unfortunately related subqueries aren't directly supported by ORM operations. Usage of .extra(where=...) should be possible in this case.
To get the same results without using a subquery something like the following should work:
from django.db.models import Count, F

Offer.objects.filter(
    assignments__to_date__gt=thedate
).annotate(
    assignment_cnt=Count('assignments')
).filter(
    assignment_cnt__lt=F('places')
)
The exact query depends on the model definitions.
query = '''select *
    from yourapp_offer as a
    where places > (
        select count(*)
        from yourapp_assignment
        where offer_id = a.id and
        to_date > '2014-07-07');'''
offers = Offer.objects.raw(query)
https://docs.djangoproject.com/en/1.6/topics/db/sql/
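The two approaches above do not return quite the same rows, which this small sqlite3 sketch tries to show with invented table names and data. The correlated subquery keeps offers that have no valid assignment at all, while a join-then-count query (roughly what filtering on the related table and then annotating produces) drops them, because the join needs at least one matching assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE offer (id INTEGER PRIMARY KEY, places INTEGER);
    CREATE TABLE assignment (offer_id INTEGER, to_date TEXT);

    INSERT INTO offer VALUES (1, 2), (2, 1), (3, 1);
    INSERT INTO assignment VALUES
        (1, '2014-08-01'), (1, '2014-06-01'), (2, '2014-09-01');
    """
)
cutoff = "2014-07-07"

# Correlated subquery: count the valid assignments for each offer row.
subquery_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT o.id
        FROM offer AS o
        WHERE o.places > (
            SELECT COUNT(*)
            FROM assignment AS a
            WHERE a.offer_id = o.id AND a.to_date > ?
        )
        ORDER BY o.id
        """,
        (cutoff,),
    )
]

# Join, filter, then count: offers without a valid assignment vanish.
join_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT o.id
        FROM offer AS o
        JOIN assignment AS a ON a.offer_id = o.id
        WHERE a.to_date > ?
        GROUP BY o.id, o.places
        HAVING COUNT(a.offer_id) < o.places
        ORDER BY o.id
        """,
        (cutoff,),
    )
]

print(subquery_ids, join_ids)
```

Offer 3 has a free place and no assignments, so it only shows up in the subquery result.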

How to order a grouped result?

I am creating a Rails 3.2.14 app.
In this app I have a model called Timereport. In the model I have a class method
that I am using to generate statistics.
def self.stats_time_spent(params)
  data = group("date(created_at)")
  data = data.where("backend_user_id = ?", params[:backend_user_id])
  data = data.where("created_at >= ?", params[:date_from])
  data = data.where("created_at <= ?", params[:date_to])
  data = data.select("date (created_at) as timecreated, sum(total_time) as timetotal")
  data
end
This function works but it outputs data in a random fashion. The dates are not sorted.
I tried to add .order("created_at desc") but then I get this error:
PG::GroupingError: ERROR: column "timereports.created_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...user_id = '1') GROUP BY date(created_at) ORDER BY created_at...
^
: SELECT COUNT(*) AS count_all, date(created_at) AS date_created_at FROM "timereports" WHERE
I have two questions: is this a good way of aggregating the data, and how do I order the output?
Thankful for all input!
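You should order by date(created_at), the same expression you group by, e.g. .order("date(created_at) desc"). Ordering by the raw created_at column fails because that column is neither grouped nor aggregated.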
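The idea can be tried out with plain SQL. The sketch below uses Python's sqlite3 module with made-up rows instead of Rails and PostgreSQL, so it will not reproduce the GroupingError; it only shows that ordering by the grouped expression gives one sorted row per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timereports ("
    "backend_user_id INTEGER, created_at TEXT, total_time INTEGER)"
)
conn.executemany(
    "INSERT INTO timereports VALUES (?, ?, ?)",
    [
        (1, "2014-01-03 09:00:00", 30),
        (1, "2014-01-01 10:00:00", 60),
        (1, "2014-01-03 15:00:00", 45),
        (1, "2014-01-02 08:00:00", 20),
    ],
)

# GROUP BY and ORDER BY use the same expression, date(created_at).
rows = conn.execute(
    """
    SELECT date(created_at) AS timecreated, SUM(total_time) AS timetotal
    FROM timereports
    WHERE backend_user_id = ?
    GROUP BY date(created_at)
    ORDER BY date(created_at) DESC
    """,
    (1,),
).fetchall()

for row in rows:
    print(row)
```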

Django - Count a subset of related models - Need to annotate count of active Coupons for each Item

I have a Coupon model that has some fields to define if it is active, and a custom manager which returns only live coupons. Coupon has an FK to Item.
In a query on Item, I'm trying to annotate the number of active coupons available. However, the Count aggregate seems to be counting all coupons, not just the active ones.
# models.py
class LiveCouponManager(models.Manager):
    """
    Returns only coupons which are active, and the current
    date is after the active_date (if specified) but before the valid_until
    date (if specified).
    """
    def get_query_set(self):
        today = datetime.date.today()
        passed_active_date = models.Q(active_date__lte=today) | models.Q(active_date=None)
        not_expired = models.Q(valid_until__gte=today) | models.Q(valid_until=None)
        return super(LiveCouponManager, self).get_query_set().filter(is_active=True).filter(passed_active_date, not_expired)

class Item(models.Model):
    # irrelevant fields

class Coupon(models.Model):
    item = models.ForeignKey(Item)
    is_active = models.BooleanField(default=True)
    active_date = models.DateField(blank=True, null=True)
    valid_until = models.DateField(blank=True, null=True)
    # more fields
    live = LiveCouponManager()  # defined first, should be default manager

# views.py
# this is the part that isn't working right
data = Item.objects.filter(q).distinct().annotate(num_coupons=Count('coupon', distinct=True))
The .distinct() and distinct=True bits are there for other reasons - the query is such that it will return duplicates. That all works fine, just mentioning it here for completeness.
The problem is that Count is including inactive coupons that are filtered out by the custom manager.
Is there any way I can specify that Count should use the live manager?
EDIT
The following SQL query does exactly what I need:
SELECT data_item.title, COUNT(data_coupon.id) FROM data_item LEFT OUTER JOIN data_coupon ON (data_item.id=data_coupon.item_id)
WHERE (
(is_active='1') AND
(active_date <= current_timestamp OR active_date IS NULL) AND
(valid_until >= current_timestamp OR valid_until IS NULL)
)
GROUP BY data_item.title
At least on sqlite. Any SQL guru feedback would be greatly appreciated - I feel like I'm programming by accident here. Or, even better, a translation back to Django ORM syntax would be awesome.
In case anyone else has the same problem, here's how I've gotten it to work:
Items = Item.objects.filter(q).distinct().extra(
    select={"num_coupons":
        """
        SELECT COUNT(data_coupon.id) FROM data_coupon
        WHERE (
            (data_coupon.is_active='1') AND
            (data_coupon.active_date <= current_timestamp OR data_coupon.active_date IS NULL) AND
            (data_coupon.valid_until >= current_timestamp OR data_coupon.valid_until IS NULL) AND
            (data_coupon.item_id = data_item.id)
        )
        """
    },).order_by(order_by)
I don't know that I consider this a 'correct' answer - it completely duplicates my custom manager in a possibly non portable way (I'm not sure how portable current_timestamp is), but it does work.
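For anyone who wants to check what this subquery counts, here is a minimal sqlite3 sketch with invented rows. It assumes the coupon table links to items through an item_id column and passes a fixed date as a parameter instead of current_timestamp, so the result does not depend on the day it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE data_item (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE data_coupon (
        id INTEGER PRIMARY KEY, item_id INTEGER, is_active INTEGER,
        active_date TEXT, valid_until TEXT
    );

    INSERT INTO data_item VALUES (1, 'book'), (2, 'lamp'), (3, 'desk');
    INSERT INTO data_coupon VALUES
        (1, 1, 1, NULL, NULL),                   -- live, no date limits
        (2, 1, 0, NULL, NULL),                   -- switched off
        (3, 1, 1, NULL, '2000-01-01'),           -- expired
        (4, 2, 1, '2000-01-01', NULL),           -- live
        (5, 2, 1, '2999-01-01', NULL),           -- not active yet
        (6, 1, 1, '2020-01-01', '2020-12-31');   -- live on the test date
    """
)
today = "2020-06-15"  # fixed stand-in for current_timestamp

# The subquery runs once per item and counts only its live coupons,
# so items without any live coupon still appear, with a count of 0.
rows = conn.execute(
    """
    SELECT i.title,
           (SELECT COUNT(c.id)
            FROM data_coupon AS c
            WHERE c.item_id = i.id
              AND c.is_active = 1
              AND (c.active_date <= ? OR c.active_date IS NULL)
              AND (c.valid_until >= ? OR c.valid_until IS NULL)
           ) AS num_coupons
    FROM data_item AS i
    ORDER BY i.id
    """,
    (today, today),
).fetchall()

print(rows)
```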
Are you sure your custom manager actually gets called? You set your manager as Model.live, but you query the normal manager at Model.objects.
Have you tried the following?
data = Data.live.filter(q)...