Selecting related model: Left join, prefetch_related or select_related? - sql

Considering I have the following relationships:
class House(Model):
    name = ...
class User(Model):
    """The standard auth model"""
    pass
class Alert(Model):
    user = ForeignKey(User)
    house = ForeignKey(House)
    somevalue = IntegerField()
    class Meta:
        # one alert per user per house
        unique_together = (('user', 'house'),)
In one query, I would like to get the list of houses, and whether the current user has any alert for any of them.
In SQL I would do it like this:
SELECT *
FROM house h
LEFT JOIN alert a
ON h.id = a.house_id
WHERE a.user_id = ?
OR a.user_id IS NULL
And I've found that I could use prefetch_related to achieve something like this:
p = Prefetch('alert_set', queryset=Alert.objects.filter(user=self.request.user), to_attr='user_alert')
houses = House.objects.order_by('name').prefetch_related(p)
The above example works, but houses.user_alert is a list, not an Alert object. I only have one alert per user per house, so what is the best way for me to get this information?
select_related didn't seem to work. Oh, and I know I can manage this in multiple queries, but I'd really like to have it done in one, and the 'Django way'.
Thanks in advance!

The solution is clearer if you start with the multiple query approach, and then try to optimise it. To get the user_alerts for every house, you could do the following:
houses = House.objects.order_by('name')
for house in houses:
    user_alerts = house.alert_set.filter(user=self.request.user)
The user_alerts queryset will cause an extra query for every house in the queryset. You can avoid this with prefetch_related.
from django.db.models import Prefetch
alerts_queryset = Alert.objects.filter(user=self.request.user)
houses = House.objects.order_by('name').prefetch_related(
    # the keyword is to_attr (singular)
    Prefetch('alert_set', queryset=alerts_queryset, to_attr='user_alerts'),
)
for house in houses:
    user_alerts = house.user_alerts
This will take two queries, one for houses and one for the alerts. I don't think you require select related here to fetch the user, since you already have access to the user with self.request.user. If you want you could add select_related to the alerts_queryset:
alerts_queryset = Alert.objects.filter(user=self.request.user).select_related('user')
In your case, user_alerts will be an empty list or a list with one item, because of your unique_together constraint. If you can't handle the list, you could loop through the queryset once, and set house.user_alert:
for house in houses:
    house.user_alert = house.user_alerts[0] if house.user_alerts else None
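As a side note on the SQL in the question: filtering the joined table in WHERE drops any house whose only alerts belong to other users, because the joined row exists but matches neither `a.user_id = ?` nor `a.user_id IS NULL`. Moving the user condition into the ON clause keeps every house. A minimal sketch with the standard-library sqlite3 module, assuming hypothetical table and column names modelled on the question (they are not taken from a real Django schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE house (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE alert (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        house_id INTEGER REFERENCES house(id),
        somevalue INTEGER,
        UNIQUE (user_id, house_id)
    );
    INSERT INTO house VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- user 1 has an alert on house A; user 2 has alerts on A and B
    INSERT INTO alert (user_id, house_id, somevalue)
    VALUES (1, 1, 10), (2, 1, 20), (2, 2, 30);
""")

# User condition in the ON clause: every house is kept.
on_rows = conn.execute("""
    SELECT h.name, a.somevalue
    FROM house h
    LEFT JOIN alert a ON a.house_id = h.id AND a.user_id = ?
    ORDER BY h.name
""", (1,)).fetchall()

# User condition in WHERE, as in the question: house B disappears for user 1.
where_rows = conn.execute("""
    SELECT h.name, a.somevalue
    FROM house h
    LEFT JOIN alert a ON a.house_id = h.id
    WHERE a.user_id = ? OR a.user_id IS NULL
    ORDER BY h.name
""", (1,)).fetchall()
```

Here `on_rows` has one row per house, with `None` where user 1 has no alert, which is the shape the prefetch approach produces as well.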

Related

Return results from more than one database table in Django

Suppose I have 3 hypothetical models;
class State(models.Model):
    name = models.CharField(max_length=20)
class Company(models.Model):
    name = models.CharField(max_length=60)
    state = models.ForeignKey(State)
class Person(models.Model):
    name = models.CharField(max_length=60)
    state = models.ForeignKey(State)
I want to be able to return results in a Django app, where the results, if using SQL directly, would be based on a query such as this:
SELECT a.name as 'personName',b.name as 'companyName', b.state as 'State'
FROM Person a, Company b
WHERE a.state=b.state
I have tried using the select_related() method as suggested here, but I don't think this is quite what I am after, since I am trying to join two tables that have a common foreign-key, but have no key-relationships amongst themselves.
Any suggestions?
Since a Person can have multiple Companys in the same state, it is not a good idea to do the JOIN at the database level. The database would (likely) return the same Company multiple times, making the output quite large.
We can prefetch the related companies, with:
qs = Person.objects.select_related('state').prefetch_related('state__company_set')
Then we can query the Companys in the same state with:
for person in qs:
    print(person.state.company_set.all())
You can use a Prefetch object [Django-doc] to prefetch the list of related companies into a custom attribute. Note that to_attr is set on the object the lookup ends at, so here it lands on the State of each Person rather than on the Person itself:
from django.db.models import Prefetch
qs = Person.objects.prefetch_related(
    Prefetch('state__company_set', Company.objects.all(), to_attr='same_state_companies')
)
Then you can print the companies with:
for person in qs:
    print(person.state.same_state_companies)
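To make the idea behind prefetching concrete, here is a rough pure-Python sketch of what it amounts to: load the companies once, group them by state, and look them up per person instead of joining row by row. The dataclasses and the `companies_for` helper are hypothetical stand-ins for the models above, not Django API:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    state_id: int


@dataclass
class Person:
    name: str
    state_id: int


def group_by_state(companies):
    """Build a state_id -> [Company, ...] index in a single pass."""
    index = defaultdict(list)
    for company in companies:
        index[company.state_id].append(company)
    return index


def companies_for(person, index):
    """Companies in the same state as the person (empty list if none)."""
    return index.get(person.state_id, [])


companies = [Company("Acme", 1), Company("Globex", 1), Company("Initech", 2)]
people = [Person("Ann", 1), Person("Bob", 2), Person("Cy", 3)]
index = group_by_state(companies)
names = {p.name: [c.name for c in companies_for(p, index)] for p in people}
```

Each company is held once in memory however many people share its state, which is the size advantage over a database-level join.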

Filtering model with HABTM relationship

I have 2 models - Restaurant and Feature. They are connected via has_and_belongs_to_many relationship. The gist of it is that you have restaurants with many features like delivery, pizza, sandwiches, salad bar, vegetarian option,… So now when the user wants to filter the restaurants and lets say he checks pizza and delivery, I want to display all the restaurants that have both features; pizza, delivery and maybe some more, but it HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]) I (of course) get the restaurants that have either feature (pizza, delivery, or both), which is not at all what I want.
My SQL/Rails knowledge is kinda limited since I'm new to this but I asked a friend and now I have this huuuge SQL that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn FROM restaurants
JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
JOIN features ON features_restaurants.feature_id = features.id
WHERE features.id in (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better - more Rails even - way of doing this? How would you do it?
Oh BTW I'm using Rails 4 on Heroku so it's a Postgres DB.
This is an example of a set-within-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr join
features f
on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause counts the rows that match one of the features -- "pizza" and "delivery". If both counts are greater than zero, both features are present and you get the restaurant_id.
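The group by / having query can be tried out locally. The sketch below runs it against an in-memory SQLite database through Python's sqlite3 module; the sample rows are made up for illustration, and the column names follow the query above (`feature_id`, `feature_name`), which may differ from your actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE features (feature_id INTEGER PRIMARY KEY, feature_name TEXT);
    CREATE TABLE features_restaurants (restaurant_id INTEGER, feature_id INTEGER);
    INSERT INTO features VALUES (1, 'pizza'), (2, 'delivery'), (3, 'salad bar');
    -- restaurant 1 has all three, 2 has only pizza, 3 has delivery and salad bar
    INSERT INTO features_restaurants VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1),
        (3, 2), (3, 3);
""")

both = [row[0] for row in conn.execute("""
    SELECT fr.restaurant_id
    FROM features_restaurants fr
    JOIN features f ON fr.feature_id = f.feature_id
    GROUP BY fr.restaurant_id
    HAVING SUM(CASE WHEN f.feature_name = 'pizza' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN f.feature_name = 'delivery' THEN 1 ELSE 0 END) > 0
    ORDER BY fr.restaurant_id
""")]
```

Only restaurant 1 survives, even though it also has a third feature.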
How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on restaurant.
With this scheme your queries boil down to
select * from restaurants where restaurants.features @> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids @> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids @> ARRAY(
    select id from features where name in ('pizza', 'delivery')
)
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.
This is not a "copy and paste" solution, but if you follow these steps you will have a fast working query.
Index the feature_name column (I'm assuming that column feature_id is indexed on both tables).
Place each feature_name param in its own exists(), correlated on the restaurant rather than on a single join row (one features_restaurants row carries only one feature, so it can never match both names):
select r.id
from
    restaurants r
where
    exists(select true from features_restaurants fr join features f on fr.feature_id = f.feature_id
           where fr.restaurant_id = r.id and f.feature_name = 'pizza')
    and
    exists(select true from features_restaurants fr join features f on fr.feature_id = f.feature_id
           where fr.restaurant_id = r.id and f.feature_name = 'delivery')
Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
  # accepts an array of feature names
  restaurants = Restaurant.all
  features.each do |f|
    feature_restaurants = Feature.find_by_name(f).restaurants
    restaurants = feature_restaurants & restaurants
  end
  return restaurants
end
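The merging idea is plain set intersection. As a language-neutral illustration, here is the same loop sketched in Python over restaurant ids; the `feature_index` mapping is a hypothetical stand-in for what `Feature.find_by_name(f).restaurants` would return:

```python
def restaurants_with_all(feature_index, all_restaurants, wanted):
    """Intersect the restaurant sets of every wanted feature.

    feature_index maps a feature name to the set of restaurant ids having it.
    With no wanted features, every restaurant is returned, as in the loop above.
    """
    result = set(all_restaurants)
    for name in wanted:
        # an unknown feature name matches no restaurant at all
        result &= feature_index.get(name, set())
    return result


feature_index = {
    "pizza": {1, 2},
    "delivery": {1, 3},
    "salad bar": {1, 3},
}
everyone = {1, 2, 3}
pizza_delivery = restaurants_with_all(feature_index, everyone, ["pizza", "delivery"])
```

Note that this approach costs one query per feature in the Rails version, which is fine for a handful of checkboxes but grows with the number of filters.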
Since it's an AND condition (OR conditions get dicey with AREL), I reread your stated problem, ignoring the SQL, and I think this is what you want.
# in Restaurant
has_many :features
# in Feature
has_many :restaurants
# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
  where(pizza: true)
end
def self.delivery
  where(delivery: true)
end
# query
Restaurant.features.pizza.delivery
Basically you call the association with ".features" and then you use the self methods defined on features. Hopefully I didn't misunderstand the original problem.
Cheers!
Restaurant
.joins(:features)
.where(features: {name: ['pizza','delivery']})
.group(:id)
.having('count(features.name) = ?', 2)
This seems to work for me. I tried it with SQLite though.
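The counting approach generalises to any number of checked features: the HAVING count just has to equal the number of requested names. A sketch of that in Python with sqlite3, assuming a hypothetical schema (`restaurants.id`, `features.id`, `features.name`) that mirrors the Rails tables; `COUNT(DISTINCT ...)` is my addition to guard against duplicate rows in the join table:

```python
import sqlite3


def restaurants_with_all(conn, feature_names):
    """Ids of restaurants that have every one of the given feature names."""
    wanted = sorted(set(feature_names))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT r.id
        FROM restaurants r
        JOIN features_restaurants fr ON fr.restaurant_id = r.id
        JOIN features f ON f.id = fr.feature_id
        WHERE f.name IN ({placeholders})
        GROUP BY r.id
        HAVING COUNT(DISTINCT f.name) = ?
        ORDER BY r.id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurants (id INTEGER PRIMARY KEY);
    CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE features_restaurants (restaurant_id INTEGER, feature_id INTEGER);
    INSERT INTO restaurants VALUES (1), (2), (3);
    INSERT INTO features VALUES (1, 'pizza'), (2, 'delivery'), (3, 'salad bar');
    INSERT INTO features_restaurants VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 2);
""")
```

Only the placeholders are interpolated into the SQL string; the feature names themselves are still passed as bound parameters.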

Django sql order by

I'm really struggling on this one.
I need to be able to sort my users by the number of positive votes received on their comments.
I have a table userprofile, a table comment and a table likeComment.
The table comment has a foreign key to its user creator and the table likeComment has a foreign key to the comment liked.
To get the number of positive votes a user received I do:
LikeComment.objects.filter(Q(type = 1), Q(comment__user=user)).count()
Now I want to be able to get all the users sorted by the ones that have the most positive votes. How do I do that? I tried to use extra and JOIN but this didn't go anywhere.
Thank you
It sounds like you want to perform a filter on an annotation:
class User(models.Model):
    pass
class Comment(models.Model):
    user = models.ForeignKey(User, related_name="comments")
class Like(models.Model):
    comment = models.ForeignKey(Comment, related_name="likes")
    type = models.IntegerField()
users = (
    User.objects
    .all()
    .extra(select={
        "positive_likes": """
            SELECT COUNT(*) FROM app_like
            JOIN app_comment ON app_like.comment_id = app_comment.id
            WHERE app_comment.user_id = app_user.id AND app_like.type = 1"""})
    # leading minus: most-liked users first
    .order_by("-positive_likes")
)
models.py
class UserProfile(models.Model):
    .........
    def like_count(self):
        # runs one COUNT query per profile
        return LikeComment.objects.filter(comment__user=self.user, type=1).count()
views.py
def getRanking(anObject):
    return anObject.like_count()
def myview(request):
    users = list(UserProfile.objects.filter())
    users.sort(key=getRanking, reverse=True)
    return render(request, 'page.html', {'users': users})
Timmy's suggestion to use a subquery is probably the simplest way to solve this kind of problem, but a correlated subquery is evaluated once per user and often performs worse than a join, so if you have a lot of users you may find that you need better performance.
So, re-using Timmy's models:
class User(models.Model):
    pass
class Comment(models.Model):
    user = models.ForeignKey(User, related_name="comments")
class Like(models.Model):
    comment = models.ForeignKey(Comment, related_name="likes")
    type = models.IntegerField()
the query you want looks like this in SQL:
SELECT app_user.id, COUNT(app_like.id) AS total_likes
FROM app_user
LEFT OUTER JOIN app_comment
ON app_user.id = app_comment.user_id
LEFT OUTER JOIN app_like
ON app_comment.id = app_like.comment_id AND app_like.type = 1
GROUP BY app_user.id
ORDER BY total_likes DESC
(If your actual User model has more fields than just id, then you'll need to include them all in the SELECT and GROUP BY clauses.)
Django's object-relational mapping system doesn't provide a way to express this query. (As far as I know—and I'd be very happy to be told otherwise!—it only supports aggregation across one join, not across two joins as here.) But when the ORM isn't quite up to the job, you can always run a raw SQL query, like this:
sql = '''
SELECT app_user.id, COUNT(app_like.id) AS total_likes
# etc (as above)
'''
for user in User.objects.raw(sql):
    print(user.id, user.total_likes)
I believe this can be achieved with Django's queryset:
User.objects.filter(comments__likes__type=1)\
.annotate(lks=Count('comments__likes'))\
.order_by('-lks')
The only problem here is that this query will miss users with 0 likes. Code from @gareth-rees, @timmy-omahony and @Catherine will also include 0-ranked users.
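To see the difference for users with no likes, the two-LEFT-JOIN query from the raw SQL answer can be run against an in-memory SQLite database. This is only a sketch: the table names follow the `app_` prefix used in the answers, and the sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY);
    CREATE TABLE app_comment (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE app_like (id INTEGER PRIMARY KEY, comment_id INTEGER, type INTEGER);
    INSERT INTO app_user VALUES (1), (2), (3), (4);
    -- user 3 has no comments at all
    INSERT INTO app_comment VALUES (1, 1), (2, 2), (4, 4);
    -- comment 1: two positive likes and one negative; comment 2: one positive;
    -- comment 4: only a negative like
    INSERT INTO app_like (comment_id, type) VALUES (1, 1), (1, 1), (1, 0), (2, 1), (4, 0);
""")

ranking = conn.execute("""
    SELECT app_user.id, COUNT(app_like.id) AS total_likes
    FROM app_user
    LEFT OUTER JOIN app_comment
        ON app_user.id = app_comment.user_id
    LEFT OUTER JOIN app_like
        ON app_comment.id = app_like.comment_id AND app_like.type = 1
    GROUP BY app_user.id
    ORDER BY total_likes DESC, app_user.id
""").fetchall()
```

Users 3 and 4 still appear with a count of 0 because the `type = 1` condition sits in the ON clause rather than in WHERE. In Django 2.0 and later, conditional aggregation (`Count(..., filter=Q(...))`) should let you express the same thing in the ORM without raw SQL.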

Query that joins just a single row from a ForeignKey relationship

I have the following models (simplified):
class Category(models.Model):
    # ...
class Product(models.Model):
    # ...
class ProductCategory(models.Model):
    product = models.ForeignKey(Product)
    category = models.ForeignKey(Category)
    # ...
class ProductImage(models.Model):
    product = models.ForeignKey(Product)
    image = models.ImageField(upload_to=product_image_path)
    sort_order = models.PositiveIntegerField(default=100)
    # ...
I want to construct a query that will get all the products associated with a particular category. I want to include just one of the many associated images--the image with the lowest sort_order--in the queryset so that a single query gets all of the data needed to show all products within a category.
In raw SQL I might use a GROUP BY, something like this:
SELECT * FROM catalog_product p
LEFT JOIN catalog_productcategory c ON (p.id = c.product_id)
LEFT JOIN catalog_productimage i ON (p.id = i.product_id)
WHERE c.category_id=2
GROUP BY p.id HAVING i.sort_order = MIN(sort_order)
Can this be done without using a raw query?
Edit - I should have noted what I've tried...
# inside Category model...
products = Product.objects.filter(productcategory__category=self) \
.annotate(Min('productimage__sort_order'))
While this query does GROUP BY, I do not see any way to get the right ProductImage.image into the QuerySet (e.g. via a HAVING clause). I'm effectively trying to dynamically add a field to the Product instance (or the QuerySet) from a specific ProductImage instance. This may not be the way to do it with Django.
It isn't quite a raw query, but it isn't quite public api either.
You can add a group by clause to the queryset before it is evaluated:
qs = Product.objects.filter(some__foreign__key__join=something)
qs.query.group_by = ['some_field']  # private API: set on the underlying query object
results = list(qs)
Word of caution, though: this behaves differently depending on the db backend.
category = Category.objects.get(get_your_category)
qs = Product.objects.annotate(Min('productimage__sort_order')).filter(productcategory__category=category)
This should hit the DB only once, because querysets are lazy.
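For reference, one portable way to pull exactly one image per product in SQL is to join on a correlated subquery that picks the image with the lowest sort_order. The sketch below tries this with sqlite3; the table names follow the `catalog_` prefix from the question, and the rows are invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog_product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE catalog_productcategory (product_id INTEGER, category_id INTEGER);
    CREATE TABLE catalog_productimage (
        id INTEGER PRIMARY KEY, product_id INTEGER, image TEXT, sort_order INTEGER
    );
    INSERT INTO catalog_product VALUES (1, 'lamp'), (2, 'chair'), (3, 'desk');
    INSERT INTO catalog_productcategory VALUES (1, 2), (2, 2), (3, 3);
    -- the lamp has two images, the chair has none
    INSERT INTO catalog_productimage (product_id, image, sort_order)
    VALUES (1, 'lamp-b.jpg', 200), (1, 'lamp-a.jpg', 50), (3, 'desk.jpg', 100);
""")

rows = conn.execute("""
    SELECT p.name, i.image
    FROM catalog_product p
    JOIN catalog_productcategory c ON c.product_id = p.id
    LEFT JOIN catalog_productimage i ON i.id = (
        -- first image of this product by sort_order (id breaks ties)
        SELECT i2.id
        FROM catalog_productimage i2
        WHERE i2.product_id = p.id
        ORDER BY i2.sort_order, i2.id
        LIMIT 1
    )
    WHERE c.category_id = ?
    ORDER BY p.name
""", (2,)).fetchall()
```

Products without any image are kept with `None`. In Django 1.11 and later, `Subquery` with `OuterRef` should let you annotate the same value onto each Product without raw SQL.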

Left join query in Django with multiple joins against same table

Not sure how to accomplish this in Django.
Models:
class LadderPlayer(models.Model):
    player = models.ForeignKey(User, unique=True)
    position = models.IntegerField(unique=True)
class Match(models.Model):
    date = models.DateTimeField()
    challenger = models.ForeignKey(LadderPlayer)
    challengee = models.ForeignKey(LadderPlayer)
Would like to query to get all info about a player in one shot, including any challenges they have issued or challenges against them. This SQL works:
select lp.position,
lp.player_id,
sc1.challengee_id challenging,
sc2.challenger_id challenged_by
from ladderplayer lp left join challenge sc1 on lp.player_id = sc1.challenger_id
left join challenge sc2 on lp.player_id = sc2.challengee_id
Which returns something like this, if player 3 has challenged player 2:
position player_id challenging challenged_by
---------- ---------- ----------- -------------
1 1
2 2 3
3 3 2
No idea how to do in Django ORM....any way to do this?
Actually you should probably change your models a bit, since there's a many-to-many relation from LadderPlayer to itself using Match as an intermediate table. Check out django's documentation on this topic. Then you should be able to make the queries you want using django's orm! Also have a look at symmetrical/asymmetrical many-to-many relationships!
Well, I did more digging and it looks like in Django 1.2 this is doable via the "raw()" method on the Query Manager thing. So this is the code using my query above:
ladder_players = LadderPlayer.objects.raw("""select lp.id, lp.position,lp.player_id,
sc1.challengee_id challenging,
sc2.challenger_id challenged_by
from ladderplayer lp left join challenge sc1 on lp.player_id = sc1.challenger_id
left join challenge sc2 on lp.player_id = sc2.challengee_id order by position""")
And in the template, you can refer to the "calculated" join fields:
{% for p in ladder_players %}
{{p.challenging}} {{p.challenged_by}}
...
etc.
Seems to work as I needed....
@lazerscience is absolutely correct. You should tweak your models, since you are setting up a de facto many-to-many relationship; doing so will allow you to leverage more features of the admin interface & so forth.
Additionally, regardless, there is no need to go to raw(), since this can be done entirely via normal usage of the Django ORM.
Something like:
class LadderPlayer(models.Model):
    player = models.ForeignKey(User, unique=True)
    position = models.IntegerField(unique=True)
    challenges = models.ManyToManyField("self", symmetrical=False, through='Match')
class Match(models.Model):
    date = models.DateTimeField()
    challenger = models.ForeignKey(LadderPlayer)
    challengee = models.ForeignKey(LadderPlayer)
should be all you need to change in the models. You then should be able to do a query like
player_of_interest = LadderPlayer.objects.filter(pk=some_id)
matches_of_interest = \
Match.objects.filter(Q(challenger__pk=some_id)|Q(challengee__pk=some_id))
to get all the information of interest about the player in question. Note that you'll need to have from django.db.models import Q to use that.
If you want exactly the same info you're presenting with your example query, I believe it'd be easiest to split the queries into separate ones for getting the challenger & challengee lists -- for example, something like:
challengers = LadderPlayer.objects.filter(challenges__challengee__pk=poi_id)
challenged_by = LadderPlayer.objects.filter(challenges__challenger__pk=poi_id)
will get the two relevant query sets for the player of interest (w/ a primary key of poi_id).
If there's some particular reason you don't want the de facto many-to-many relationship to become a de jure one, you can change those to something along the lines of
challengers = LadderPlayer.objects.filter(match__challengee__pk=poi_id)
challenged_by = LadderPlayer.objects.filter(match__challenger__pk=poi_id)
So the suggestion for the model change is merely to help leverage existing tools, and to make explicit a relationship which you are currently having occur implicitly.
Depending on how you want to use it, you might want to do something like
pl_tuple = ()
for p in LadderPlayer.objects.all():
    challengers = LadderPlayer.objects.filter(challenges__challengee__pk=p.id)
    challenged_by = LadderPlayer.objects.filter(challenges__challenger__pk=p.id)
    # append one nested tuple per player; a bare tuple would be flattened by +=
    pl_tuple += ((p.id, p.position, challengers, challenged_by),)
context_dict['ladder_players'] = pl_tuple
in your view to prepare the data for your template.
Regardless, you should probably be doing your query through the Django ORM instead of using raw() in this case.
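For comparison with the ORM queries, the double left join from the question can be reproduced with sqlite3. This is a sketch under assumptions: the table names `ladderplayer` and `challenge` are taken from the question's SQL, and the join is on `lp.id` because the foreign keys on Match point at LadderPlayer (in the sample data `id` and `player_id` happen to coincide):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ladderplayer (
        id INTEGER PRIMARY KEY, player_id INTEGER UNIQUE, position INTEGER UNIQUE
    );
    CREATE TABLE challenge (
        id INTEGER PRIMARY KEY, challenger_id INTEGER, challengee_id INTEGER
    );
    INSERT INTO ladderplayer VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
    -- player 3 has challenged player 2
    INSERT INTO challenge (challenger_id, challengee_id) VALUES (3, 2);
""")

rows = conn.execute("""
    SELECT lp.position,
           lp.player_id,
           sc1.challengee_id AS challenging,
           sc2.challenger_id AS challenged_by
    FROM ladderplayer lp
    LEFT JOIN challenge sc1 ON lp.id = sc1.challenger_id
    LEFT JOIN challenge sc2 ON lp.id = sc2.challengee_id
    ORDER BY lp.position
""").fetchall()
```

The same table is joined twice under different aliases, once per foreign key. A player involved in several matches yields several rows, so the result is not guaranteed to be one row per player.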