Django complex filter and order - sql

I have 4 models like this:
class Site(models.Model):
    name = models.CharField(max_length=200)
    def get_lowest_price(self, mm_date):
        '''This method returns lowest product price on a site at a particular date'''
class Category(models.Model):
    name = models.CharField(max_length=200)
    site = models.ForeignKey(Site)
class Product(models.Model):
    name = models.CharField(max_length=200)
    category = models.ForeignKey(Category)
class Price(models.Model):
    date = models.DateField()
    price = models.IntegerField()
    product = models.ForeignKey(Product)
Here every site has many categories, and every category has many products. A product's price can change every day, so the Price model holds the product price and the date.
My problem is that I want a list of sites filtered by a price range. The price range depends on the get_lowest_price method, and the result can be sorted Min to Max or Max to Min. I've already used a lambda expression to do that, but I don't think it's appropriate:
sorted(Site.objects.all(), key=lambda x: x.get_lowest_price(the_date))
I can also get all sites within a price range by running a loop, but that is not a good idea either. Can someone please help me write the query in the right manner?
If you need a clearer view of the question, please see the first comment from "Ishtiaque Khan"; his assumption is 100% right.
*In these models the write frequency is low and the read frequency is high.

1. Using query
If you just want to query for a specific date, here is how:
from django.db.models import Min
q = Site.objects.filter(category__product__price__date=mm_date) \
    .annotate(min_price=Min('category__product__price__price')) \
    .filter(min_price__gte=min_price, min_price__lte=max_price)
It returns the sites whose lowest price on mm_date falls within the range min_price to max_price. You can also query for multiple dates at once, like so:
q = Site.objects.values('name', 'category__product__price__date') \
    .annotate(min_price=Min('category__product__price__price')) \
    .filter(min_price__gte=min_price, min_price__lte=max_price)
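To see what the single-date query asks the database to do, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (site, category, product, price) and the sample rows are hypothetical stand-ins, not Django's generated table names, and the SQL is only an approximation of what the ORM would emit.

```python
import sqlite3

# Hypothetical, simplified schema standing in for the four Django models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category (id INTEGER PRIMARY KEY, site_id INTEGER);
    CREATE TABLE product (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE price (id INTEGER PRIMARY KEY, product_id INTEGER,
                        date TEXT, price INTEGER);
    INSERT INTO site VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO category VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO product VALUES (1, 1), (2, 1), (3, 2), (4, 3);
    INSERT INTO price (product_id, date, price) VALUES
        (1, '2024-01-01', 50), (2, '2024-01-01', 30),
        (3, '2024-01-01', 80), (4, '2024-01-01', 500),
        (1, '2024-01-02', 5);
""")

def sites_in_range(conn, mm_date, min_price, max_price):
    """Return (site name, lowest price on mm_date) for sites in the range."""
    return conn.execute("""
        SELECT s.name, MIN(pr.price) AS min_price
        FROM site s
        JOIN category c ON c.site_id = s.id
        JOIN product p ON p.category_id = c.id
        JOIN price pr ON pr.product_id = p.id
        WHERE pr.date = ?                       -- the date filter
        GROUP BY s.id, s.name                   -- one row per site
        HAVING MIN(pr.price) BETWEEN ? AND ?    -- the range filter on the minimum
        ORDER BY min_price
    """, (mm_date, min_price, max_price)).fetchall()

rows = sites_in_range(conn, '2024-01-01', 10, 100)
```

The date condition sits in WHERE, so it is applied before the minimum is computed, while the range condition sits in HAVING, so it is applied to the per-site minimum. Swapping ORDER BY min_price for ORDER BY min_price DESC gives the Max to Min ordering.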
2. Eager/pre-calculation using the post_save signal. Since the write frequency is low, this will not be expensive.
Create another table to hold the lowest price per site and date, like this:
class LowestPrice(models.Model):
    date = models.DateField()
    site = models.ForeignKey(Site)
    lowest_price = models.IntegerField(default=0)
Use the post_save signal to calculate and update this every time a Price is saved. Sample code (not tested):
from django.db.models.signals import post_save
from django.dispatch import receiver
@receiver(post_save, sender=Price)
def update_price(sender, instance, **kwargs):
    cur_price = LowestPrice.objects.filter(site=instance.product.category.site, date=instance.date).first()
    if not cur_price:
        new_price = LowestPrice()
        new_price.site = instance.product.category.site
        new_price.date = instance.date
        # no row yet: start from this price, since the field default of 0 would never be beaten
        new_price.lowest_price = instance.price
        new_price.save()
    else:
        new_price = cur_price
    # update price only if needed
    if instance.price < new_price.lowest_price:
        new_price.lowest_price = instance.price
        new_price.save()
Then just query directly from this table when needed:
LowestPrice.objects.filter(date=mm_date, lowest_price__gte=min_price, lowest_price__lte=max_price)
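The bookkeeping behind the pre-calculated table can be sketched in plain Python, without Django. This is only an illustration: the dictionary stands in for the LowestPrice table, and record_price and sites_in_range are hypothetical helper names, not part of the answer's code.

```python
from datetime import date

# In-memory stand-in for the LowestPrice table, keyed by (site, day).
lowest_prices = {}

def record_price(site, day, price):
    """Keep the lowest price seen so far for a (site, day) pair."""
    key = (site, day)
    current = lowest_prices.get(key)
    if current is None or price < current:
        lowest_prices[key] = price

def sites_in_range(day, min_price, max_price):
    """Return sites whose lowest price on `day` is within the range, cheapest first."""
    matches = [(price, site) for (site, d), price in lowest_prices.items()
               if d == day and min_price <= price <= max_price]
    return [site for price, site in sorted(matches)]

# Simulate a few Price rows being saved.
record_price("A", date(2024, 1, 1), 50)
record_price("A", date(2024, 1, 1), 30)
record_price("A", date(2024, 1, 1), 40)
record_price("B", date(2024, 1, 1), 80)
```

Note that this logic only ever lowers the stored value. If a price is later edited upward or deleted, the stored minimum for that site and date would have to be recomputed from the remaining prices.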

Solution:
from django.db.models import Min
# filter first so that Min() only considers prices on the_date
Site.objects.filter(
    categories__products__prices__date=the_date,
).annotate(
    price_min=Min('categories__products__prices__price')
).distinct().order_by('price_min') # prefix '-' for descending order
For this to work, you need to modify the models by adding a related_name attribute to the ForeignKey fields, like this:
class Category(models.Model):
    # rest of the fields
    site = models.ForeignKey(Site, related_name='categories')
Similarly, for the Product and Price models, add related_name values of products and prices to the ForeignKey fields.
Explanation:
The related_name describes the reverse relation from one model to another.
Once the reverse relationships are set up, you can use them to join the tables.
You can follow the reverse relationships to reach the prices of the products in the categories of a site, filter them by the_date, and annotate the minimum price. I have used the annotated value to order by the minimum price in ascending order. Use '-' as a prefix to order in descending order.

Do it with django queryset operations
Price.objects.all().order_by('price') #add [0] for only the first object
or
Price.objects.all().order_by('-price') #add [0] for only the first object
or
Price.objects.filter(date= ... ).order_by('price') #add [0] for only the first object
or
Price.objects.filter(date= ... ).order_by('-price') #add [0] for only the first object
or
Price.objects.filter(date= ... , price__gte=lower_limit, price__lte=upper_limit ).order_by('price') #add [0] for only the first object
or
Price.objects.filter(date= ... , price__gte=lower_limit, price__lte=upper_limit ).order_by('-price') #add [0] for only the first object

I think this ORM query could do the job:
from django.db.models import Min
sites = Site.objects.filter(category__product__price__date=mm_date) \
    .annotate(price_min=Min('category__product__price__price')) \
    .distinct().order_by('price_min')
or, to reverse the order:
sites = Site.objects.filter(category__product__price__date=mm_date) \
    .annotate(price_min=Min('category__product__price__price')) \
    .distinct().order_by('-price_min')

Related

Conditional bulk update in Django using grouping

Suppose I have a list of transactions with the following model definition:
class Transaction(models.Model):
    amount = models.FloatField()
    client = models.ForeignKey(Client)
    date = models.DateField()
    description = models.CharField()
    invoice = models.ForeignKey(Invoice, null=True)
Now I want to create invoices for each client at the end of the month. The invoice model and the function that creates the invoices look like this:
from django.db.models import Sum
class Invoice(models.Model):
    client = models.ForeignKey(Client)
    invoice_date = models.DateField()
    invoice_number = models.CharField(unique=True)
    def amount_due(self):
        return self.transaction_set.aggregate(Sum('amount'))
def create_invoices(invoice_date):
    for client in Client.objects.all():
        transactions = Transaction.objects.filter(client=client)
        if transactions.exists():
            # keyword names must match the Invoice fields defined above
            invoice = Invoice(client=client, invoice_number=get_invoice_number(), invoice_date=invoice_date)
            invoice.save()
            transactions.update(invoice=invoice)
I know I can create all the invoices in one query with bulk_create, but I would still have to set the invoice field on each transaction individually.
Is it possible to set the invoice field of all the Transaction rows with a single query after I've created all the invoices? Preferably using the ORM, but I'm happy to use raw SQL if required.
I know I can also use group by client on the transaction list to get the total per client, but then the individual entries are not linked to the invoice.
You could try constructing a conditional update query if you are able to generate a mapping from clients to invoices before:
from django.db.models import Case, Value, When
# generate this after creating the invoices
client_invoice_mapping = {
    # client: invoice
}
cases = [When(client_id=client.pk, then=Value(invoice.pk))
         for client, invoice in client_invoice_mapping.items()]
Transaction.objects.update(invoice_id=Case(*cases))
Note that Conditional Queries are available since Django 1.8. Otherwise you may look into constructing something similar using raw SQL.
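As a rough picture of what a conditional update does at the SQL level, here is a minimal sketch using Python's built-in sqlite3 module. The txn table, the sample ids, and the assign_invoices helper are hypothetical and chosen only for illustration; the statement is an approximation, not the exact SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txn (id INTEGER PRIMARY KEY, client_id INTEGER,
                      invoice_id INTEGER);
    INSERT INTO txn (client_id) VALUES (1), (1), (2), (3);
""")

# Hypothetical mapping of client id -> invoice id, built after creating invoices.
client_invoice_mapping = {1: 101, 2: 102}

def assign_invoices(conn, mapping):
    """Set invoice_id for all mapped clients in one UPDATE statement."""
    if not mapping:
        return 0
    whens = " ".join("WHEN ? THEN ?" for _ in mapping)
    marks = ", ".join("?" for _ in mapping)
    # CASE parameters first (client, invoice pairs), then the IN (...) list.
    params = [v for pair in mapping.items() for v in pair] + list(mapping)
    cur = conn.execute(
        f"UPDATE txn SET invoice_id = CASE client_id {whens} END "
        f"WHERE client_id IN ({marks})",
        params,
    )
    return cur.rowcount

updated = assign_invoices(conn, client_invoice_mapping)
result = conn.execute(
    "SELECT client_id, invoice_id FROM txn ORDER BY id").fetchall()
```

The WHERE clause matters: a CASE with no matching branch and no default evaluates to NULL, so without it the rows of unmapped clients would have their invoice reset. In the ORM, the same guard can be expressed by filtering on client_id__in before calling update().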
To complement @Bernhard Vallant's answer: you can do it with only 3 queries.
from django.db.models import Case, Value, When
def create_invoices(invoice_date):
    # Maybe use Exists clause here instead of subquery,
    # do some tests for your case if the query is slow
    clients_with_transactions = Client.objects.filter(
        id__in=Transaction.objects.values('client')
    )
    invoices = [
        Invoice(client=client, invoice_number=get_invoice_number(), invoice_date=invoice_date)
        for client in clients_with_transactions
    ]
    # With PostgreSQL Django can populate id's here
    invoices = Invoice.objects.bulk_create(invoices)
    # And now use a conditional update
    cases = [
        When(client_id=invoice.client_id, then=Value(invoice.pk))
        for invoice in invoices
    ]
    Transaction.objects.update(invoice_id=Case(*cases))

Django ORM: order by minimal value from m2m depending on specific value

I have 4 models:
A User has multiple Place(s) where he lives. Each Place has multiple nearby MetroStation(s).
Then I have the travel time (MetroTimes) between every pair of metro stations.
class User(models.Model):
    pass
class Place(models.Model):
    user = models.ForeignKey(User)
    metro_stations = models.ManyToManyField('geo.MetroStation', related_name='places')
class MetroStation(models.Model):
    pass
class MetroTimes(models.Model):
    metro_station_1 = models.ForeignKey(MetroStation, related_name='metro_stations_1')
    metro_station_2 = models.ForeignKey(MetroStation, related_name='metro_stations_2')
    time = models.IntegerField()
The task is to sort all users by travel time (MetroTimes) to a specific MetroStation, measured from the closest MetroStation among all of the user's Places.
And here is the magic I can't work out:
specific_metro_station = MetroStation.objects.get(id=1)
User.objects.all().order_by(closest_metro_station_in_closest_user's_place_by_metro_time_to=specific_metro_station)
Big Thx for help!
I did it using Django's Conditional Expressions and Query Expressions.
Code in my case:
from django.db.models import Case, F, IntegerField, Min, When
current = MetroStation.objects.get(id=1)
users = User.objects.all().annotate(
    time=Min(
        Case(
            When(
                places__metro_stations__metro_stations_1__metro_station_2=current,
                then=F('places__metro_stations__metro_stations_1__time')
            ),
            output_field=IntegerField()
        )
    )
).order_by('time')
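The idea of taking a minimum over a conditional expression can be sketched in SQL with Python's built-in sqlite3 module. The place, place_station and metro_times tables and their sample rows are hypothetical simplifications of the models above, and the query is only an approximation of what the ORM would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE place_station (place_id INTEGER, station_id INTEGER);
    CREATE TABLE metro_times (station_1 INTEGER, station_2 INTEGER,
                              time INTEGER);
    INSERT INTO place VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO place_station VALUES (1, 2), (2, 3), (3, 4);
    INSERT INTO metro_times VALUES
        (2, 1, 25), (3, 1, 12), (4, 1, 18), (4, 5, 3);
""")

def users_by_travel_time(conn, target_station):
    """Rank users by their best travel time to target_station."""
    return conn.execute("""
        SELECT pl.user_id,
               -- rows for other destinations become NULL and are ignored by MIN
               MIN(CASE WHEN mt.station_2 = ? THEN mt.time END) AS best_time
        FROM place pl
        JOIN place_station ps ON ps.place_id = pl.id
        JOIN metro_times mt ON mt.station_1 = ps.station_id
        GROUP BY pl.user_id
        ORDER BY best_time
    """, (target_station,)).fetchall()

ranking = users_by_travel_time(conn, 1)
```

Here user 10 has two places, near stations 2 and 3, so the better of the two times to station 1 is kept. The row from station 4 to station 5 does not affect user 20, because the CASE turns it into NULL.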

Django retrieve results by date operation query

I have one model to make my Catalog like this
class Product(models.Model):
    item = models.CharField(max_length=10)
    description = models.CharField(max_length=140)
    unit = models.CharField(max_length=5)
    expiration = models.IntegerField(default=365) # Days after admission
    expiration_warning = models.IntegerField(default=30) # Days before expiration
    ...
and I have my Inventory like this:
class Inventory(models.Model):
    product = models.ForeignKey(Product)
    quantity = models.DecimalField(default=0, decimal_places=2, max_digits=10)
    admission = models.DateTimeField(auto_now_add=True)
    ...
Now I want to retrieve all Inventory objects whose expiration date is upcoming or has been reached.
With a raw SQL query, it would be something like:
SELECT product,quantity,admission,expiration,expiration_warning
FROM Inventory as I JOIN Product as P on I.product=P.id
WHERE DATEADD(day,(expiration_warning*-1),DATEADD(day,expiration,admission))<=GETDATE()
I haven't tried this query yet, it is just an example
I want to achieve this using Django query syntax, can you help me please?
Maybe something like this can achieve what you need:
from datetime import datetime, timedelta
expiring_inventory = []
expired_inventory = []
for inventory in Inventory.objects.select_related('product').all(): # 1 query
    p = inventory.product
    expire_datetime = inventory.admission + timedelta(days=p.expiration)
    expiring_datetime = expire_datetime - timedelta(days=p.expiration_warning)
    if expiring_datetime <= datetime.now() < expire_datetime: # not quite expired
        expiring_inventory.append(inventory)
    if expire_datetime <= datetime.now(): # expired
        expired_inventory.append(inventory)
You can read more about Query Sets in the django docs.
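The date arithmetic in that loop can be isolated into a small function and checked on its own. This is a minimal sketch in plain Python; expiry_status is a hypothetical helper name, and the sample dates are made up for illustration.

```python
from datetime import datetime, timedelta

def expiry_status(admission, expiration_days, warning_days, now):
    """Classify an item as 'ok', 'expiring' or 'expired' at time `now`."""
    expires_at = admission + timedelta(days=expiration_days)
    warn_from = expires_at - timedelta(days=warning_days)
    if now >= expires_at:
        return "expired"
    if now >= warn_from:
        return "expiring"
    return "ok"

# Admitted on 1 January 2024 with the model defaults (365 and 30 days):
# expires on 31 December 2024, warning window opens on 1 December 2024.
admitted = datetime(2024, 1, 1)
statuses = [
    expiry_status(admitted, 365, 30, datetime(2024, 6, 1)),
    expiry_status(admitted, 365, 30, datetime(2024, 12, 15)),
    expiry_status(admitted, 365, 30, datetime(2025, 1, 5)),
]
```

Passing now in as an argument, rather than calling datetime.now() inside, keeps both comparisons on the same instant and makes the function easy to test.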

Query that joins just a single row from a ForeignKey relationship

I have the following models (simplified):
class Category(models.Model):
    # ...
class Product(models.Model):
    # ...
class ProductCategory(models.Model):
    product = models.ForeignKey(Product)
    category = models.ForeignKey(Category)
    # ...
class ProductImage(models.Model):
    product = models.ForeignKey(Product)
    image = models.ImageField(upload_to=product_image_path)
    sort_order = models.PositiveIntegerField(default=100)
    # ...
I want to construct a query that will get all the products associated with a particular category. I want to include just one of the many associated images--the image with the lowest sort_order--in the queryset so that a single query gets all of the data needed to show all products within a category.
In raw SQL I might use a GROUP BY, something like this:
SELECT * FROM catalog_product p
LEFT JOIN catalog_productcategory c ON (p.id = c.product_id)
LEFT JOIN catalog_productimage i ON (p.id = i.product_id)
WHERE c.category_id=2
GROUP BY p.id HAVING i.sort_order = MIN(sort_order)
Can this be done without using a raw query?
Edit - I should have noted what I've tried...
# inside Category model...
products = Product.objects.filter(productcategory__category=self) \
    .annotate(Min('productimage__sort_order'))
While this query does GROUP BY, I do not see any way to get the right ProductImage.image into the QuerySet, e.g. with a HAVING clause. I'm effectively trying to dynamically add a field to the Product instance (or the QuerySet) from a specific ProductImage instance. This may not be the way to do it with Django.
It isn't quite a raw query, but it isn't quite public api either.
You can add a group by clause to the queryset before it is evaluated:
qs = Product.objects.filter(some__foreign__key__join=something)
qs.group_by = 'some_field'
results = list(qs)
Word of caution, though: this behaves differently depending on the db backend.
category = Category.objects.get(get_your_category)
qs = Product.objects.annotate(Min('productimage__sort_order')).filter(productcategory__category=category)
This should hit the DB only once, because querysets are lazy.
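One way to pull in just the image with the lowest sort_order is a correlated subquery. The sketch below shows the SQL shape using Python's built-in sqlite3 module. The unprefixed table names and sample rows are hypothetical, and this is an alternative technique to the annotate(Min(...)) call above, not what that call generates. In Django 1.11 and later, Subquery and OuterRef can express a similar pattern in the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE productcategory (product_id INTEGER, category_id INTEGER);
    CREATE TABLE productimage (id INTEGER PRIMARY KEY, product_id INTEGER,
                               image TEXT, sort_order INTEGER);
    INSERT INTO product VALUES (1, 'lamp'), (2, 'desk'), (3, 'sofa');
    INSERT INTO productcategory VALUES (1, 2), (2, 2), (3, 7);
    INSERT INTO productimage (product_id, image, sort_order) VALUES
        (1, 'lamp-b.jpg', 20), (1, 'lamp-a.jpg', 10), (3, 'sofa.jpg', 1);
""")

def products_with_first_image(conn, category_id):
    """Return (name, image) per product, using the lowest sort_order image."""
    return conn.execute("""
        SELECT p.name,
               (SELECT i.image FROM productimage i
                WHERE i.product_id = p.id
                ORDER BY i.sort_order, i.id   -- id breaks ties deterministically
                LIMIT 1) AS first_image
        FROM product p
        JOIN productcategory c ON c.product_id = p.id
        WHERE c.category_id = ?
        ORDER BY p.id
    """, (category_id,)).fetchall()

rows = products_with_first_image(conn, 2)
```

Products without any image still appear, with None in place of the image, which matches the intent of the LEFT JOIN in the raw SQL from the question.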

Django - Count a subset of related models - Need to annotate count of active Coupons for each Item

I have a Coupon model that has some fields to define if it is active, and a custom manager which returns only live coupons. Coupon has an FK to Item.
In a query on Item, I'm trying to annotate the number of active coupons available. However, the Count aggregate seems to be counting all coupons, not just the active ones.
# models.py
class LiveCouponManager(models.Manager):
    """
    Returns only coupons which are active, and the current
    date is after the active_date (if specified) but before the valid_until
    date (if specified).
    """
    def get_query_set(self):
        today = datetime.date.today()
        passed_active_date = models.Q(active_date__lte=today) | models.Q(active_date=None)
        not_expired = models.Q(valid_until__gte=today) | models.Q(valid_until=None)
        return super(LiveCouponManager, self).get_query_set().filter(is_active=True).filter(passed_active_date, not_expired)
class Item(models.Model):
    # irrelevant fields
class Coupon(models.Model):
    item = models.ForeignKey(Item)
    is_active = models.BooleanField(default=True)
    active_date = models.DateField(blank=True, null=True)
    valid_until = models.DateField(blank=True, null=True)
    # more fields
    live = LiveCouponManager() # defined first, should be default manager
# views.py
# this is the part that isn't working right
data = Item.objects.filter(q).distinct().annotate(num_coupons=Count('coupon', distinct=True))
The .distinct() and distinct=True bits are there for other reasons - the query is such that it will return duplicates. That all works fine, just mentioning it here for completeness.
The problem is that Count is including inactive coupons that are filtered out by the custom manager.
Is there any way I can specify that Count should use the live manager?
EDIT
The following SQL query does exactly what I need:
SELECT data_item.title, COUNT(data_coupon.id) FROM data_item LEFT OUTER JOIN data_coupon ON (data_item.id=data_coupon.item_id)
WHERE (
(is_active='1') AND
(active_date <= current_timestamp OR active_date IS NULL) AND
(valid_until >= current_timestamp OR valid_until IS NULL)
)
GROUP BY data_item.title
At least on sqlite. Any SQL guru feedback would be greatly appreciated - I feel like I'm programming by accident here. Or, even better, a translation back to Django ORM syntax would be awesome.
In case anyone else has the same problem, here's how I've gotten it to work:
Items = Item.objects.filter(q).distinct().extra(
    select={"num_coupons":
        """
        SELECT COUNT(data_coupon.id) FROM data_coupon
        WHERE (
            (data_coupon.is_active='1') AND
            (data_coupon.active_date <= current_timestamp OR data_coupon.active_date IS NULL) AND
            (data_coupon.valid_until >= current_timestamp OR data_coupon.valid_until IS NULL) AND
            (data_coupon.item_id = data_item.id)
        )
        """
    },).order_by(order_by)
I don't know that I consider this a 'correct' answer - it completely duplicates my custom manager in a possibly non portable way (I'm not sure how portable current_timestamp is), but it does work.
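For comparison, the same count can be written as a single join by moving the "live" conditions into the ON clause of the LEFT OUTER JOIN. The sketch below uses Python's built-in sqlite3 module with the data_item and data_coupon table names from the question. The sample rows, the live_coupon_counts helper, and the explicit today parameter (used instead of current_timestamp so the result is reproducible) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_item (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE data_coupon (id INTEGER PRIMARY KEY, item_id INTEGER,
                              is_active INTEGER, active_date TEXT,
                              valid_until TEXT);
    INSERT INTO data_item VALUES (1, 'book'), (2, 'pen'), (3, 'cup');
    INSERT INTO data_coupon (item_id, is_active, active_date, valid_until)
    VALUES
        (1, 1, NULL, NULL),                  -- live, no date limits
        (1, 1, '2024-01-01', '2024-12-31'),  -- live within its window
        (1, 0, NULL, NULL),                  -- switched off
        (2, 1, '2030-01-01', NULL),          -- not active yet
        (2, 1, NULL, '2020-01-01');          -- already expired
""")

def live_coupon_counts(conn, today):
    """Return (title, number of live coupons) for every item."""
    return conn.execute("""
        SELECT i.title, COUNT(c.id) AS num_coupons
        FROM data_item i
        LEFT OUTER JOIN data_coupon c
            ON c.item_id = i.id
           AND c.is_active = 1
           AND (c.active_date <= :today OR c.active_date IS NULL)
           AND (c.valid_until >= :today OR c.valid_until IS NULL)
        GROUP BY i.id, i.title
        ORDER BY i.id
    """, {"today": today}).fetchall()

counts = live_coupon_counts(conn, "2024-06-01")
```

With the conditions in the ON clause, items that have no live coupons still appear with a count of 0. If the same conditions are placed in WHERE, as in the SQL from the question, those items are filtered out of the result entirely.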
Are you sure your custom manager actually gets called? You set your manager as Model.live, but you query the normal manager at Model.objects.
Have you tried the following?
data = Data.live.filter(q)...