Django ORM: Getting rows based on max value of a column - sql

I have a class Marketorders which contains information about single market orders and they are gathered in snapshots of the market (represented by class Snapshot). Each order can appear in more than one snapshot with the latest row of course being the relevant one.
class Marketorders(models.Model):
id = models.AutoField(primary_key=True)
snapid = models.IntegerField()
orderid = models.IntegerField()
reportedtime = models.DateTimeField(null=True, blank=True)
...
class Snapshot(models.Model):
id = models.IntegerField(primary_key=True)
...
What I'm doing is getting all of the orders across several snapshots for processing, but I want to include only the most recent row for each order. In SQL I would simply do:
SELECT m1.* FROM marketorders m1 WHERE reportedtime = (SELECT max(reportedtime)
FROM marketorders m2 WHERE m2.orderid=m1.orderid);
or better yet with a join:
SELECT m1.* FROM marketorders m1 LEFT JOIN marketorders m2 ON
m1.orderid=m2.orderid AND m1.reportedtime < m2.reportedtime
WHERE m2.orderid IS NULL;
However, I just can't figure out how to do this with Django ORM. Is there any way to accomplish this without raw SQL?
EDIT: Just to clarify the problem. Let's say we have the following marketorders (leaving out everything unimportant and using only orderid, reportedtime):
1, 09:00:00
1, 10:00:00
1, 12:00:00
2, 09:00:00
2, 10:00:00
How do I get the following set with the ORM?
1, 12:00:00
2, 10:00:00

If I understood right you need a list of Marketorder objects that contains each Marketorder with highest reportedtime per orderid
Something like this should work (disclaimer: didn't test it directly):
m_orders = Marketorders.objects.filter(id__in=(
Marketorders.objects
.values('orderid')
.annotate(Max('reportedtime'))
.values_list('id', flat=True)
))
For documentation check:
http://docs.djangoproject.com/en/dev/topics/db/aggregation/
Edit:
This should get a single Marketorder with highest reportedtime for a specific orderid
order = (
Marketorders.objects
.filter(orderid=the_orderid)
.filter(reportedtime=(
Marketorders.objects
.filter(orderid=the_orderid)
.aggregate(Max('reportedtime'))
['reportedtime__max']
))
)

Do you have a good reason why you don't use ForeignKey or (in your case better) ManyToManyField. These fields represent the relational structure of ur models.
Furthermore it is not necessary to declare an pk-field id. if no pk is defined, django adds id.
The code below allow orm-queries like this:
m1 = Marketorder()
m1.save() # important: primary key will be added in db
s1 = Snapshot()
s2 = Snapshot()
s1.save()
s2.save()
m1.snapshots.add(s1)
m1.snapshots.add(s2)
m1.snapshots.all()
m1.snapshots.order_by("id") # snapshots in relations with m1
# ordered by id, that is added automatically
s1.marketorder_set.all() # reverse
so for your query:
snapshot = Snapshot.objects.order_by('-id')[0] # order desc, pick first
marketorders = snapshot.marketorder_set.all() # all marketorders in snapshot
or compact:
marketorders = Snapshot.objects.order_by('-id')[0].marketorder_set.all()
models.py:
class Snapshot(models.Model):
name = models.CharField(max_length=100)
class Marketorder(models.Model):
snapshots = models.ManyToManyField(Snapshot)
reportedtime = models.DateTimeField(auto_now= True)
By convention all model classes name are singular. Django makes it plural in different places automatically.
more on queries (filtering, sorting, complex lookups). a must-read.

Related

How to model complex left join Django

I have two Django models that have a relationship that cannot be modelled with a foreign key
class PositionUnadjusted(models.Model):
identifier = models.CharField(max_length=256)
timestamp = models.DateTimeField()
quantity = models.IntegerField()
class Adjustment(models.Model):
identifier = models.CharField(max_length=256)
start = models.DateTimeField()
end = models.DateTimeField()
quantity_delta = models.IntegerField()
I want to create the notion of an adjusted position, where the quantity is modified by the sum of qty_deltas of all adjustments where adj.start <= pos.date < adj.end. In SQL this would be
SELECT pos_unadjusted.id,
pos_unadjusted.timestamp,
pos_unadjusted.identifier,
CASE
WHEN Sum(qty_delta) IS NOT NULL THEN pos_unadjusted.qty + Sum(qty_delta)
ELSE qty
END AS qty,
FROM myapp_positionunadjusted AS pos_unadjusted
LEFT JOIN myapp_adjustment AS adjustments
ON pos_unadjusted.identifier = adjustments.identifier
AND pos_unadjusted.timestamp >= date_start
AND pos_unadjusted.timestamp < date_end
GROUP BY pos_unadjusted.id,
pos_unadjusted.timestamp,
pos_unadjusted.identifier,
Is there some way to get this result without using raw sql? I use this query as a base for many other queries so I don't want to use raw sql.
I've looked into QuerySet and extra() but can't seem to coerce them into having this precise relationship. I'd love for position and PositionUnadjusted to have the same model and same API with no copy-pasting since right now updating them is a lot of copy pasting.

Django complex filter and order

I have 4 model like this
class Site(models.Model):
name = models.CharField(max_length=200)
def get_lowest_price(self, mm_date):
'''This method returns lowest product price on a site at a particular date'''
class Category(models.Model):
name = models.CharField(max_length=200)
site = models.ForeignKey(Site)
class Product(models.Model):
name = models.CharField(max_length=200)
category = models.ForeignKey(Category)
class Price(models.Model):
date = models.DateField()
price = models.IntegerField()
product = models.ForeignKey(Product)
Here every have many category, every category have many product. Now product price can change every day so price model will hold the product price and date.
My problem is I want list of site filter by price range. This price range will depends on the get_lowest_price method and can be sort Min to Max and Max to Min. Already I've used lambda expression to do that but I think it's not appropriate
sorted(Site.objects.all(), key=lambda x: x.get_lowest_price(the_date))
Also I can get all site within a price range by running a loop but this is also not a good idea. Please help my someone to do the query in right manner.
If you still need more clear view of the question please see the first comment from "Ishtiaque Khan", his assumption is 100% right.
*In these models writing frequency is low and reading frequency is high.
1. Using query
If you just wanna query using a specific date. Here is how:
q = Site.objects.filter(category__product__price__date=mm_date) \
.annotate(min_price=Min('category__product__price__price')) \
.filter(min_price__gte=min_price, min_price__lte=max_price)
It will return a list of Site with lowest price on mm_date fall within range of min_price - max_price. You can also query for multiple date using query like so:
q = Site.objects.values('name', 'category__product__price__date') \
.annotate(min_price=Min('category__product__price__price')) \
.filter(min_price__gte=min_price, min_price__lte=max_price)
2. Eager/pre-calculation, you can use post_save signal. Since the write frequency is low this will not be expensive
Create another Table to hold lowest prices per date. Like this:
class LowestPrice(models.Model):
date = models.DateField()
site = models.ForeignKey(Site)
lowest_price = models.IntegerField(default=0)
Use post_save signal to calculate and update this every time there. Sample code (not tested)
from django.db.models.signals import post_save
from django.dispatch import receiver
#receiver(post_save, sender=Price)
def update_price(sender, instance, **kwargs):
cur_price = LowestPrice.objects.filter(site=instance.product.category.site, date=instance.date).first()
if not cur_price:
new_price = LowestPrice()
new_price.site = instance.product.category.site
new_price.date = instance.date
else:
new_price = cur_price
# update price only if needed
if instance.price<new_price.lowest_price:
new_price.lowest_price = instance.price
new_price.save()
Then just query directly from this table when needed:
LowestPrice.objects.filter(date=mm_date, lowest_price__gte=min_price, lowest_price__lte=max_price)
Solution:
from django.db.models import Min
Site.objects.annotate(
price_min=Min('categories__products__prices__price')
).filter(
categories__products__prices__date=the_date,
).distinct().order_by('price_min') # prefix '-' for descending order
For this to work, you need to modify the models by adding a related_name attribute to the ForeignKey fields.
Like this -
class Category(models.Model):
# rest of the fields
site = models.ForeignKey(Site, related_name='categories')
Similary, for Product and Price models, add related_name as products and prices in the ForeignKey fields.
Explanation:
Starting with related_name, it describes the reverse relation from one model to another.
After the reverse relationship is setup, you can use them to inner join the tables.
You can use the reverse relationships to get the price of a product of a category on a site and annotate the min price, filtered by the_date. I have used the annotated value to order by min price of the product, in ascending order. You can use '-' as a prefix character to do in descending order.
Do it with django queryset operations
Price.objects.all().order_by('price') #add [0] for only the first object
or
Price.objects.all().order_by('-price') #add [0] for only the first object
or
Price.objects.filter(date= ... ).order_by('price') #add [0] for only the first object
or
Price.objects.filter(date= ... ).order_by('-price') #add [0] for only the first object
or
Price.objects.filter(date= ... , price__gte=lower_limit, price__lte=upper_limit ).order_by('price') #add [0] for only the first object
or
Price.objects.filter(date= ... , price__gte=lower_limit, price__lte=upper_limit ).order_by('-price') #add [0] for only the first object
I think this ORM query could do the job ...
from django.db.models import Min
sites = Site.objects.annotate(price_min= Min('category__product__price'))
.filter(category__product__price=mm_date).unique().order_by('price_min')
or /and for reversing the order :
sites = Site.objects.annotate(price_min= Min('category__product__price'))
.filter(category__product__price=mm_date).unique().order_by('-price_min')

Django sql order by

I'm really struggling on this one.
I need to be able to sort my user by the number of positive vote received on their comment.
I have a table userprofile, a table comment and a table likeComment.
The table comment has a foreign key to its user creator and the table likeComment has a foreign key to the comment liked.
To get the number of positive vote a user received I do :
LikeComment.objects.filter(Q(type = 1), Q(comment__user=user)).count()
Now I want to be able to get all the users sorted by the ones that have the most positive votes. How do I do that ? I tried to use extra and JOIN but this didn't go anywhere.
Thank you
It sounds like you want to perform a filter on an annotation:
class User(models.Model):
pass
class Comment(models.Model):
user = models.ForeignKey(User, related_name="comments")
class Like(models.Model):
comment = models.ForeignKey(Comment, related_name="likes")
type = models.IntegerField()
users = User \
.objects \
.all()
.extra(select = {
"positive_likes" : """
SELECT COUNT(*) FROM app_like
JOIN app_comment on app_like.comment_id = app_comment.id
WHERE app_comment.user_id = app_user.id AND app_like.type = 1 """})
.order_by("positive_likes")
models.py
class UserProfile(models.Model):
.........
def like_count(self):
LikeComment.objects.filter(comment__user=self.user, type=1).count()
views.py
def getRanking( anObject ):
return anObject.like_count()
def myview(request):
users = list(UserProfile.objects.filter())
users.sort(key=getRanking, reverse=True)
return render(request,'page.html',{'users': users})
Timmy's suggestion to use a subquery is probably the simplest way to solve this kind of problem, but subqueries almost never perform as well as joins, so if you have a lot of users you may find that you need better performance.
So, re-using Timmy's models:
class User(models.Model):
pass
class Comment(models.Model):
user = models.ForeignKey(User, related_name="comments")
class Like(models.Model):
comment = models.ForeignKey(Comment, related_name="likes")
type = models.IntegerField()
the query you want looks like this in SQL:
SELECT app_user.id, COUNT(app_like.id) AS total_likes
FROM app_user
LEFT OUTER JOIN app_comment
ON app_user.id = app_comment.user_id
LEFT OUTER JOIN app_like
ON app_comment.id = app_like.comment_id AND app_like.type = 1
GROUP BY app_user.id
ORDER BY total_likes DESCENDING
(If your actual User model has more fields than just id, then you'll need to include them all in the SELECT and GROUP BY clauses.)
Django's object-relational mapping system doesn't provide a way to express this query. (As far as I know—and I'd be very happy to be told otherwise!—it only supports aggregation across one join, not across two joins as here.) But when the ORM isn't quite up to the job, you can always run a raw SQL query, like this:
sql = '''
SELECT app_user.id, COUNT(app_like.id) AS total_likes
# etc (as above)
'''
for user in User.objects.raw(sql):
print user.id, user.total_likes
I believe this can be achieved with Django's queryset:
User.objects.filter(comments__likes__type=1)\
.annotate(lks=Count('comments__likes'))\
.order_by('-lks')
The only problem here is that this query will miss users with 0 likes. Code from #gareth-rees, #timmy-omahony and #Catherine will include also 0-ranked users.

Query that joins just a single row from a ForeignKey relationship

I have the following models (simplified):
class Category(models.model):
# ...
class Product(models.model):
# ...
class ProductCategory(models.Model):
product = models.ForeignKey(Product)
category = models.ForeignKey(Category)
# ...
class ProductImage(models.Model):
product = models.ForeignKey(Product)
image = models.ImageField(upload_to=product_image_path)
sort_order = models.PositiveIntegerField(default=100)
# ...
I want to construct a query that will get all the products associated with a particular category. I want to include just one of the many associated images--the image with the lowest sort_order--in the queryset so that a single query gets all of the data needed to show all products within a category.
In raw SQL I would might use a GROUP BY something like this:
SELECT * FROM catalog_product p
LEFT JOIN catalog_productcategory c ON (p.id = c.product_id)
LEFT JOIN catalog_productimage i ON (p.id = i.product_id)
WHERE c.category_id=2
GROUP BY p.id HAVING i.sort_order = MIN(sort_order)
Can this be done without using a raw query?
Edit - I should have noted what I've tried...
# inside Category model...
products = Product.objects.filter(productcategory__category=self) \
.annotate(Min('productimage__sort_order'))
While this query does GROUP BY, I do not see any way to (a) get the right ProductImage.image into the QuerySet eg. HAVING clause. I'm effectively trying to dynamically add a field to the Product instance (or the QuerySet) from a specific ProductImage instance. This may not be the way to do it with Django.
It isn't quite a raw query, but it isn't quite public api either.
You can add a group by clause to the queryset before it is evaluated:
qs = Product.objects.filter(some__foreign__key__join=something)
qs.group_by = 'some_field'
results = list(qs)
Word of caution, though: this behaves differently depending on the db backend.
catagory = Catagory.objects.get(get_your_catagory)
qs = Product.objects.annotate(Min('productimage__sortorder').filter(productcategory__category = catagory)
This should hit the DB only once, because querysets are lazy.

Django - Count a subset of related models - Need to annotate count of active Coupons for each Item

I have a Coupon model that has some fields to define if it is active, and a custom manager which returns only live coupons. Coupon has an FK to Item.
In a query on Item, I'm trying to annotate the number of active coupons available. However, the Count aggregate seems to be counting all coupons, not just the active ones.
# models.py
class LiveCouponManager(models.Manager):
"""
Returns only coupons which are active, and the current
date is after the active_date (if specified) but before the valid_until
date (if specified).
"""
def get_query_set(self):
today = datetime.date.today()
passed_active_date = models.Q(active_date__lte=today) | models.Q(active_date=None)
not_expired = models.Q(valid_until__gte=today) | models.Q(valid_until=None)
return super(LiveCouponManager,self).get_query_set().filter(is_active=True).filter(passed_active_date, not_expired)
class Item(models.Model):
# irrelevant fields
class Coupon(models.Model):
item = models.ForeignKey(Item)
is_active = models.BooleanField(default=True)
active_date = models.DateField(blank=True, null=True)
valid_until = models.DateField(blank=True, null=True)
# more fields
live = LiveCouponManager() # defined first, should be default manager
# views.py
# this is the part that isn't working right
data = Item.objects.filter(q).distinct().annotate(num_coupons=Count('coupon', distinct=True))
The .distinct() and distinct=True bits are there for other reasons - the query is such that it will return duplicates. That all works fine, just mentioning it here for completeness.
The problem is that Count is including inactive coupons that are filtered out by the custom manager.
Is there any way I can specify that Count should use the live manager?
EDIT
The following SQL query does exactly what I need:
SELECT data_item.title, COUNT(data_coupon.id) FROM data_item LEFT OUTER JOIN data_coupon ON (data_item.id=data_coupon.item_id)
WHERE (
(is_active='1') AND
(active_date <= current_timestamp OR active_date IS NULL) AND
(valid_until >= current_timestamp OR valid_until IS NULL)
)
GROUP BY data_item.title
At least on sqlite. Any SQL guru feedback would be greatly appreciated - I feel like I'm programming by accident here. Or, even better, a translation back to Django ORM syntax would be awesome.
In case anyone else has the same problem, here's how I've gotten it to work:
Items = Item.objects.filter(q).distinct().extra(
select={"num_coupons":
"""
SELECT COUNT(data_coupon.id) FROM data_coupon
WHERE (
(data_coupon.is_active='1') AND
(data_coupon.active_date <= current_timestamp OR data_coupon.active_date IS NULL) AND
(data_coupon.valid_until >= current_timestamp OR data_coupon.valid_until IS NULL) AND
(data_coupon.data_id = data_item.id)
)
"""
},).order_by(order_by)
I don't know that I consider this a 'correct' answer - it completely duplicates my custom manager in a possibly non portable way (I'm not sure how portable current_timestamp is), but it does work.
Are you sure your custom manager actually get's called? You set your manager as Model.live, but you query the normal manager at Model.objects.
Have you tried the following?
data = Data.live.filter(q)...