Django 1.0/1.1 rewrite of self join - sql

Is there a way to rewrite this query using the Django QuerySet object:
SELECT b.created_on, SUM(a.vote)
FROM votes a JOIN votes b ON a.created_on <= b.created_on
WHERE a.object_id = 1
GROUP BY 1
Where votes is a table, object_id is an int that occurs multiple times (foreign key - although that doesn't matter here), and created_on which is a datetime.
FWIW, this query allows one to get a score at any time in the past by summing up all previous votes on that object_id.

I'm pretty sure that query cannot be created with the Django ORM. The new Django aggregation code is pretty flexible, but I don't think it can do exactly what you want.
Are you sure that query works? You seem to be missing a check that b.object_id is 1.
This code should work, but it's more than one line and not that efficient.
from django.db.models import Sum
v_list = votes.objects.filter(object__id=1)
for v in v_list:
    # One extra query per vote: sum every vote cast up to and including this one
    v.previous_score = votes.objects.filter(object__id=1, created_on__lte=v.created_on).aggregate(Sum('vote'))["vote__sum"]
Aggregation is only available in trunk, so you might need to update your Django install before you can do this.
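If one query per vote turns out to be too slow, a possible alternative is to fetch the votes once and accumulate the running score in Python. The sketch below is not Django-specific: it assumes the votes for a single object have already been loaded as (created_on, vote) pairs, and the name running_scores is made up for illustration.

```python
from datetime import datetime
from itertools import accumulate


def running_scores(votes):
    """Map each timestamp to the sum of all votes cast at or before it.

    `votes` is an iterable of (created_on, vote) pairs for ONE object
    (a hypothetical input shape, not something the ORM returns by itself).
    """
    ordered = sorted(votes, key=lambda pair: pair[0])
    totals = accumulate(vote for _, vote in ordered)
    # Later entries overwrite earlier ones with the same timestamp, so ties
    # end up with the total that includes every vote at that timestamp.
    return {created_on: total for (created_on, _), total in zip(ordered, totals)}


sample = [
    (datetime(2009, 1, 1), 1),
    (datetime(2009, 1, 3), -1),
    (datetime(2009, 1, 2), 1),
]
scores = running_scores(sample)
```

This does one pass over the data instead of one aggregate query per vote, at the cost of loading all votes for the object into memory.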

Aggregation isn't the issue; the problem here is that Django's ORM simply doesn't do joins on anything that isn't a ForeignKey, AFAIK.

This is what I'm using now. Ironically, the sql is broken but this is the gist of it:
def get_score_over_time(self, obj):
    """
    Get a dictionary containing the score and number of votes
    at all times historically
    """
    ctype = ContentType.objects.get_for_model(obj)
    try:
        # Table names cannot be bound as parameters, so only they are
        # interpolated; the ids are passed to execute() for safe binding.
        table = connection.ops.quote_name(self.model._meta.db_table)
        query = """SELECT b.created_on, SUM(a.vote)
            FROM %s a JOIN %s b
            ON a.created_on <= b.created_on
            WHERE a.object_id = %%s
            AND a.content_type_id = %%s
            GROUP BY 1""" % (table, table)
        cursor = connection.cursor()
        cursor.execute(query, [obj.pk, ctype.id])
        result_list = list(cursor.fetchall())
    except models.ObjectDoesNotExist:
        result_list = None
    return result_list
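The self join can be tried outside Django with Python's built-in sqlite3 module. This is only a sketch under assumed names: the votes table layout and the sample rows are invented, and the query adds the b.object_id filter that the earlier answer pointed out was missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (object_id INTEGER, vote INTEGER, created_on TEXT)")
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [
        (1, 1, "2009-01-01"),
        (1, 1, "2009-01-02"),
        (1, -1, "2009-01-03"),
        (2, 1, "2009-01-02"),  # a different object; must not affect the result
    ],
)

# For every vote b on the object, sum all votes a on the same object
# that were cast at or before b.
rows = conn.execute(
    """
    SELECT b.created_on, SUM(a.vote)
    FROM votes a JOIN votes b ON a.created_on <= b.created_on
    WHERE a.object_id = ? AND b.object_id = ?
    GROUP BY b.created_on
    ORDER BY b.created_on
    """,
    (1, 1),
).fetchall()
```

Without the filter on b, rows belonging to other objects would also act as "points in time" and each matching a row would be summed once per such b row.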

How to perform translation from RAW SQL to django queryset

I am struggling to convert a raw SQL query into a Django queryset.
I am new to Django and any help will be appreciated.
There are simple models:
Winemaker - target model
Wine
Post
Winemaker has 1+ Wines
Wine has 1+ Posts
I know that it should be done with annotations but have no idea how to implement it.
select w2.*,
  (select count(wp.id)
   from web_winemaker www
   inner join web_wine ww on www.id = ww.winemaker_id
   inner join web_post wp on ww.id = wp.wine_id
   where ww.status = 20
     and wp.status = 20
     and www.id = w2.id
  ) as wineposts_count,
  (select count(w.id)
   from web_winemaker www1
   inner join web_wine w on www1.id = w.winemaker_id
   where w.status = 20
     and www1.id = w2.id
  ) as wines_count
from web_winemaker w2;
You should be able to accomplish this with a Count aggregation expression in an annotate function. I took a guess at your related_name values on your relationship fields, so the following code may not plug in directly, but should give you an idea of how to do what you want.
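from django.db.models import Count, Q
wine_makers = Winemaker.objects.annotate(
    posts_count=Count(
        'wines__posts__id',
        filter=Q(wines__status=20, wines__posts__status=20),
    ),
    wines_count=Count(
        'wines__id',
        filter=Q(wines__status=20),
        distinct=True,
    ),
)
The join through posts repeats each wine once per post, so wines_count needs distinct=True to avoid counting the same wine several times. More generally, you may need to supply distinct=True whenever several annotations cross multi-valued relationships in the same query.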
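To see why distinct matters when crossing relationships, here is a small sketch using Python's built-in sqlite3 module rather than the ORM. The table and column names are simplified stand-ins for the web_* tables in the question, and the status filters are left out to keep the effect of the join visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE winemaker (id INTEGER PRIMARY KEY);
    CREATE TABLE wine (id INTEGER PRIMARY KEY, winemaker_id INTEGER);
    CREATE TABLE post (id INTEGER PRIMARY KEY, wine_id INTEGER);
    INSERT INTO winemaker VALUES (1);
    -- Wine 10 has two posts, wine 11 has none.
    INSERT INTO wine VALUES (10, 1), (11, 1);
    INSERT INTO post VALUES (100, 10), (101, 10);
    """
)

# Joining through post produces one row per (wine, post) pair, so a plain
# COUNT(w.id) counts wine 10 twice.
row = conn.execute(
    """
    SELECT COUNT(p.id), COUNT(w.id), COUNT(DISTINCT w.id)
    FROM winemaker m
    LEFT JOIN wine w ON w.winemaker_id = m.id
    LEFT JOIN post p ON p.wine_id = w.id
    GROUP BY m.id
    """
).fetchone()
posts_count, inflated_wines_count, wines_count = row
```

The winemaker has two wines, but the undeduplicated count reports three because the join repeats wine 10 once per post.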

SQL IN and AND clause output

I have written a small query like the one below, and it gives me output.
select user_id
from table tf
where tf.function_id in ('1001051','1001060','1001061')
But when I run the query below, it returns no rows, even though I have verified manually that there are user_ids for which all three function_ids are present.
select user_id
from table tf
where tf.function_id='1001051'
and
tf.function_id='1001060'
and
tf.function_id='1001061'
Using the AND clause looks very simple, but I am not getting the desired output. Am I doing something wrong?
Thanks in advance
Is this what you want to do?
select tf.user_id
from table tf
where tf.function_id in ('1001051', '1001060', '1001061')
group by tf.user_id
having count(distinct tf.function_id) = 3;
This returns users that have all three functions.
EDIT:
This is the query in your comment:
select tu.dealer_id, tu.usr_alias, tf.function_nm
from t_usr tu, t_usr_function tuf, t_function tf
where tu.usr_id = tuf.usr_id and tuf.function_id = tf.function_id and
tf.function_id = '1001051' and tf.function_id = '1001060' and tf.function_id = '1001061' ;
First, you should learn proper join syntax. Simple rule: Never use commas in the from clause.
I think the query you want is:
select tu.dealer_id, tu.usr_alias
from t_usr tu join
t_usr_function tuf
on tu.usr_id = tuf.usr_id
where tuf.function_id in ('1001051', '1001060', '1001061')
group by tu.dealer_id, tu.usr_alias
having count(distinct tuf.function_id) = 3;
This doesn't give you the function name. I'm not sure why you need such detail if all three functions are there for each "user" (or at least dealer/user alias combination). And, the original question doesn't request this level of detail.
Using the AND operator means that a single row must satisfy all of the conditions.
In your case, you need to return rows where function_id='1001051' OR function_id='1001060' OR function_id='1001061'.
So, in brief, you need to replace AND with OR.
select user_id from table tf
where tf.function_id='1001051' OR tf.function_id='1001060' OR tf.function_id='1001061'
That's what IN does: it matches any of the listed values.
As I pointed out in the comment, AND is not the right operator, since a single row can never meet all three conditions together. Use OR instead:
select user_id from table tf
where tf.function_id='1001051' OR tf.function_id='1001060' OR tf.function_id='1001061'
You're asking for the value to be three different values at the same time. A better use would be to use OR instead of AND:
select user_id from table tf
where tf.function_id='1001051' or tf.function_id='1001060' or tf.function_id='1001061'
If all of these things are true:
tf.function_id='1001051'
tf.function_id='1001060'
tf.function_id='1001061'
Then simple algebra tells us this must also be true:
'1001051'='1001060'='1001061'
Since that clearly can't ever be true, your SQL statement's where clause will always resolve to false.
What you want to say is that any of those conditions is true (which is equivalent to in), which means you need to use or:
SELECT user_id
FROM table tf
WHERE tf.function_id = '1001051'
OR tf.function_id = '1001060'
OR tf.function_id = '1001061'
The where clause applies to each row returned by the query. In order to gather data across rows, you either need to join the table to itself enough times to create a single row that satisfies the condition you're looking for or use aggregate functions to consolidate several rows into a single row.
Self-join solution:
SELECT tf1.user_id
FROM table tf1
JOIN table tf2 ON tf1.user_id = tf2.user_id
JOIN table tf3 ON tf1.user_id = tf3.user_id
WHERE tf1.function_id = '1001051'
AND tf2.function_id = '1001060'
AND tf3.function_id = '1001061'
Aggregate solution:
SELECT user_id
FROM table tf
WHERE tf.function_id IN ('1001051', '1001060', '1001061')
GROUP BY user_id
HAVING COUNT (DISTINCT tf.function_id) = 3
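The difference between the AND version and the aggregate version can be checked with a few rows in an in-memory SQLite database. This is a sketch with made-up data; the table name user_function stands in for the placeholder "table" used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_function (user_id INTEGER, function_id TEXT)")
conn.executemany(
    "INSERT INTO user_function VALUES (?, ?)",
    [
        (1, "1001051"), (1, "1001060"), (1, "1001061"),  # user 1 has all three
        (2, "1001051"), (2, "1001060"),                  # user 2 has only two
    ],
)
wanted = ("1001051", "1001060", "1001061")

# WHERE is evaluated per row, and no single row has three different values.
and_rows = conn.execute(
    "SELECT user_id FROM user_function "
    "WHERE function_id = ? AND function_id = ? AND function_id = ?",
    wanted,
).fetchall()

# Filter with IN, then keep only users that matched all three distinct values.
having_rows = conn.execute(
    "SELECT user_id FROM user_function "
    "WHERE function_id IN (?, ?, ?) "
    "GROUP BY user_id HAVING COUNT(DISTINCT function_id) = 3",
    wanted,
).fetchall()
```

The AND query returns nothing, while the GROUP BY / HAVING query returns only the user that has every listed function.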
Try this, as described in the linked reference on SQL IN:
select function_id, user_id from table tf
where tf.function_id in ('1001051','1001060','1001061')

How to join on subqueries using ARel?

I have a few massive SQL requests involving joins across various models in my Rails application.
A single request can involve 6 to 10 tables.
To run the request faster I want to use sub-queries in the joins (that way I can filter these tables before the join and reduce the columns to the ones I need). I'm trying to achieve this using ARel.
I thought I found the solution to my problem there: How to do joins on subqueries in AREL within Rails,
but things must have changed because I get undefined method '[]' for Arel::SelectManager.
Does anybody have any idea how to achieve this (without using strings) ?
Pierre, I thought a better solution could be the following (inspiration from this gist):
a = A.arel_table
b = B.arel_table
subquery = b.project(b[:a_id].as('A_id')).where{c > 4}
subquery = subquery.as('intm_table')
query = A.join(subquery).on(subquery[:A_id].eq(a[:id]))
No particular reason for naming the alias as "intm_table", I just thought it would be less confusing.
OK, so my main problem was that you can't join an Arel::SelectManager ... BUT you can join a table alias.
So to generate the request in my comment above:
a = A.arel_table
b = B.arel_table
subquery = B.select(:a_id).where{c > 4}
query = A.join(subquery.as('B')).on(b[:a_id].eq(a[:id])
query.to_sql # SELECT A.* INNER JOIN (SELECT B.a_id FROM B WHERE B.c > 4) B ON A.id = B.a_id
I was looking for this and was helped by the other answers, but there are some errors in both, e.g. A.join(... should be a.join(....
I was also missing how to build an ActiveRecord::Relation.
Here is how to build an ActiveRecord::Relation in Rails 4:
a = A.arel_table
b = B.arel_table
subsel = B.select(b[:a_id]).where(b[:c].gt('4')).as('sub_select')
joins = a.join(subsel).on(subsel[:a_id].eq(a[:id])).join_sources
rel = A.joins(joins)
rel.to_sql
#=> "SELECT `a`.* FROM `a` INNER JOIN (SELECT `b`.`a_id` FROM `b` WHERE (`b`.`c` > 4)) sub_select ON sub_select.`a_id` = `a`.`id`"

Remove duplicates in a Django query

Is there a simple way to remove duplicates in the following basic query:
email_list = Emails.objects.order_by('email')
I tried using duplicate() but it was not working. What is the exact syntax for doing this query without duplicates?
This query will not give you duplicates - ie, it will give you all the rows in the database, ordered by email.
However, I presume what you mean is that you have duplicate data within your database. Adding distinct() here won't help, because even if you have only one field, you also have an automatic id field - so the combination of id+email is not unique.
Assuming you only need one field, email, de-duplicated, you can do this:
email_list = Email.objects.values_list('email', flat=True).distinct()
However, you should really fix the root problem, and remove the duplicate data from your database.
For example, deleting duplicate Emails by the email field:
for email in Email.objects.values_list('email', flat=True).distinct():
    Email.objects.filter(pk__in=Email.objects.filter(email=email).values_list('id', flat=True)[1:]).delete()
Or books by name:
for name in Book.objects.values_list('name', flat=True).distinct():
    Book.objects.filter(pk__in=Book.objects.filter(name=name).values_list('id', flat=True)[1:]).delete()
To check for duplicates, you can do a GROUP BY and HAVING in Django as below. We are using Django annotations here.
from django.db.models import Count
from app.models import Email
duplicate_emails = Email.objects.values('email').annotate(email_count=Count('email')).filter(email_count__gt=1)
Now loop through the above data and delete all emails except the first one (adjust to your requirements).
for data in duplicate_emails:
    email = data['email']
    # delete() is not allowed on a sliced queryset, so collect the pks to remove first
    extra_pks = list(Email.objects.filter(email=email).order_by('pk').values_list('pk', flat=True)[1:])
    Email.objects.filter(pk__in=extra_pks).delete()
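The same find-then-delete idea can also be expressed directly in SQL. The sketch below uses Python's built-in sqlite3 module with an invented email table; it is a different technique from the ORM loop above (a single DELETE that keeps the lowest id per address), shown only to make the GROUP BY / HAVING logic concrete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",), ("a@example.com",)],
)

# Find addresses that occur more than once.
duplicates = conn.execute(
    "SELECT email, COUNT(*) FROM email GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

# Keep the row with the smallest id for each address and delete the rest.
conn.execute(
    "DELETE FROM email WHERE id NOT IN (SELECT MIN(id) FROM email GROUP BY email)"
)
remaining = conn.execute("SELECT id, email FROM email ORDER BY id").fetchall()
```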
You can chain .distinct() on the end of your queryset to filter duplicates. Check out: http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
You may be able to use the distinct() function, depending on your model. If you only want to retrieve a single field from the model, you could do something like:
email_list = Emails.objects.values_list('email').order_by('email').distinct()
which should give you an ordered list of emails.
You can also use set()
email_list = set(Emails.objects.values_list('email', flat=True))
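Keep in mind that a set is unordered, so any ordering from the query is lost. A minimal sketch of two ways to get a de-duplicated list back in a predictable order, using a plain Python list in place of the values_list() result:

```python
# Stand-in for the flat list of addresses returned by the database.
emails = ["b@example.com", "a@example.com", "b@example.com", "c@example.com"]

# Sort after de-duplicating to get alphabetical order.
unique_sorted = sorted(set(emails))

# dict keys keep insertion order (Python 3.7+), so this preserves
# the order in which each address was first seen.
unique_in_order = list(dict.fromkeys(emails))
```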
Use a self-referencing subquery with queryset.annotate():
from django.db.models import Subquery, OuterRef
email_list = Emails.objects.filter(
    pk__in=Emails.objects.values('email').distinct().annotate(
        pk=Subquery(
            Emails.objects.filter(
                email=OuterRef('email')
            )
            .order_by('pk')
            .values('pk')[:1])
    )
    .values_list('pk', flat=True)
)
This queryset generates the following query:
SELECT `email`.`id`,
       `email`.`title`,
       `email`.`body`,
       ...
       ...
FROM `email`
WHERE `email`.`id` IN (
    SELECT DISTINCT (
        SELECT U0.`id`
        FROM `email` U0
        WHERE U0.`email` = V0.`email`
        ORDER BY U0.`id` ASC
        LIMIT 1
    ) AS `pk`
    FROM `email` V0
)
Cheat sheet:
from django.db.models import Subquery, OuterRef
group_by_duplicate_col_queryset = Models.objects.filter(
    pk__in=Models.objects.values('duplicate_col').distinct().annotate(
        pk=Subquery(
            Models.objects.filter(
                duplicate_col=OuterRef('duplicate_col')
            )
            .order_by('pk')
            .values('pk')[:1])
    )
    .values_list('pk', flat=True)
)
I used the following to actually remove the duplicate entries from the database; hopefully this helps someone else. Note that distinct() with field names is only supported on PostgreSQL.
adds = Address.objects.all()
d = adds.distinct('latitude', 'longitude')
for address in adds:
    if address not in d:
        address.delete()
You can use this raw query: your_model.objects.raw("select * from appname_Your_model group by column_name")

Django Generic Relations and ORM Queries

Say I have the following models:
class Image(models.Model):
    image = models.ImageField(max_length=200, upload_to=file_home)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()

class Article(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)

class BlogPost(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)
What's the most processor- and memory-efficient way to find all Articles that have at least one Image attached to them?
I've done this:
Article.objects.filter(pk__in=Image.objects.filter(content_type=ContentType.objects.get_for_model(Article)).values_list('object_id', flat=True))
Which works, but besides being ugly it takes forever.
I suspect there's a better solution using raw SQL, but that's beyond me. For what it's worth, the SQL generated by the above is as follows:
SELECT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` WHERE `issues_article`.`id` IN (SELECT U0.`object_id` FROM `uploads_image` U0 WHERE U0.`content_type_id` = 26 ) LIMIT 21
EDIT: czarchaic's suggestion has much nicer syntax but even worse (slower) performance. The SQL generated by his query looks like the following:
SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text`, COUNT(`uploads_image`.`id`) AS `num_images` FROM `issues_article` LEFT OUTER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) GROUP BY `issues_article`.`id` HAVING COUNT(`uploads_image`.`id`) > 0 ORDER BY NULL LIMIT 21
EDIT: Hooray for Jarret Hardie! Here's the SQL generated by his should-have-been-obvious solution:
SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` INNER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) WHERE (`uploads_image`.`id` IS NOT NULL AND `uploads_image`.`content_type_id` = 26 ) LIMIT 21
Thanks to generic relations, you should be able to query this structure using traditional query-set semantics for reverse relations:
Article.objects.filter(images__isnull=False)
This will produce duplicates for any Articles that are related to multiple Images, but you can eliminate that with the distinct() QuerySet method:
Article.objects.distinct().filter(images__isnull=False)
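The duplicate rows, and the effect of DISTINCT, can be reproduced with a small in-memory SQLite database. This is only a sketch: the table names are shortened, the rows are invented, and content_type_id 26 is taken from the generated SQL in the question as the id standing for Article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE image (id INTEGER PRIMARY KEY, content_type_id INTEGER, object_id INTEGER);
    INSERT INTO article VALUES (1, 'two images'), (2, 'no images'), (3, 'one image');
    -- content_type_id 26 stands for Article; 27 is some other model.
    INSERT INTO image VALUES (1, 26, 1), (2, 26, 1), (3, 26, 3), (4, 27, 2);
    """
)

join_sql = """
    SELECT {distinct} a.id
    FROM article a
    INNER JOIN image i ON a.id = i.object_id
    WHERE i.content_type_id = 26
    ORDER BY a.id
"""
# The inner join yields one row per matching image, so article 1 appears twice.
with_duplicates = [r[0] for r in conn.execute(join_sql.format(distinct=""))]
# DISTINCT collapses those rows back to one per article.
without_duplicates = [r[0] for r in conn.execute(join_sql.format(distinct="DISTINCT"))]
```

Article 2 is excluded in both cases because its only image row belongs to a different content type.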
I think your best bet would be to use aggregation:
from django.db.models import Count
Article.objects.annotate(num_images=Count('images')).filter(num_images__gt=0)