Django Generic Relations and ORM Queries

Say I have the following models:
class Image(models.Model):
    image = models.ImageField(max_length=200, upload_to=file_home)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey()

class Article(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)

class BlogPost(models.Model):
    text = models.TextField()
    images = generic.GenericRelation(Image)
What's the most processor- and memory-efficient way to find all Articles that have at least one Image attached to them?
I've done this:
Article.objects.filter(pk__in=Image.objects.filter(content_type=ContentType.objects.get_for_model(Article)).values_list('object_id', flat=True))
Which works, but besides being ugly it takes forever.
I suspect there's a better solution using raw SQL, but that's beyond me. For what it's worth, the SQL generated by the above is as follows:
SELECT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` WHERE `issues_article`.`id` IN (SELECT U0.`object_id` FROM `uploads_image` U0 WHERE U0.`content_type_id` = 26 ) LIMIT 21
EDIT: czarchaic's suggestion has much nicer syntax but even worse (slower) performance. The SQL generated by his query looks like the following:
SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text`, COUNT(`uploads_image`.`id`) AS `num_images` FROM `issues_article` LEFT OUTER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) GROUP BY `issues_article`.`id` HAVING COUNT(`uploads_image`.`id`) > 0 ORDER BY NULL LIMIT 21
EDIT: Hooray for Jarret Hardie! Here's the SQL generated by his should-have-been-obvious solution:
SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` INNER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) WHERE (`uploads_image`.`id` IS NOT NULL AND `uploads_image`.`content_type_id` = 26 ) LIMIT 21

Thanks to generic relations, you should be able to query this structure using traditional query-set semantics for reverse relations:
Article.objects.filter(images__isnull=False)
This will produce duplicates for any Articles that are related to multiple Images, but you can eliminate that with the distinct() QuerySet method:
Article.objects.distinct().filter(images__isnull=False)
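To compare the query shapes discussed above outside of Django, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are taken from the generated SQL in the question, the sample rows are made up, and 26 is simply the content type id that appears in that SQL. It only illustrates that the IN subquery and the JOIN with DISTINCT select the same articles, and that the JOIN without DISTINCT repeats an article once per image; it says nothing about relative speed on a real MySQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Article 1 has two images and article 3 has one.
# The image row with content_type_id 27 belongs to another model
# whose primary key happens to be 2, so article 2 must not match.
conn.executescript("""
    CREATE TABLE issues_article (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE uploads_image (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_id INTEGER
    );
    INSERT INTO issues_article VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO uploads_image VALUES
        (1, 26, 1), (2, 26, 1), (3, 26, 3), (4, 27, 2);
""")

ARTICLE_CT = 26  # assumed content type id for Article, as in the question's SQL

in_subquery = """
    SELECT id FROM issues_article
    WHERE id IN (SELECT object_id FROM uploads_image WHERE content_type_id = ?)
    ORDER BY id
"""
join_distinct = """
    SELECT DISTINCT a.id FROM issues_article a
    INNER JOIN uploads_image i ON a.id = i.object_id
    WHERE i.content_type_id = ?
    ORDER BY a.id
"""
join_without_distinct = join_distinct.replace("DISTINCT ", "")

ids_in = [row[0] for row in conn.execute(in_subquery, (ARTICLE_CT,))]
ids_join = [row[0] for row in conn.execute(join_distinct, (ARTICLE_CT,))]
ids_dupes = [row[0] for row in conn.execute(join_without_distinct, (ARTICLE_CT,))]
```

Note that the content type condition matters in both forms: without it, image 4 would wrongly attach to article 2.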

I think your best bet would be to use aggregation:
from django.db.models import Count
Article.objects.annotate(num_images=Count('images')).filter(num_images__gt=0)

Related

Django query not working in shell_plus but working in dbshell

I have a query that doesn't return the value I expect, but when I print the query itself and run it in dbshell, it does work. I am on Django 1.8.18 with SQLite version 3.11.0.
Recommendation has a foreign key to Car, and I need to get all the Cars that do not have a Recommendation with is_active=True AND description=FOO. I know I could probably make it work the other way around, but it would be much easier for me to make it work this way.
class Car(models.Model):
    kind = models.CharField(max_length=100)

class Recommendation(models.Model):
    car = models.ForeignKey(Car)
    is_active = models.BooleanField(default=True)
    description = models.CharField(max_length=100)
I have created a Recommendation linked to my Car id 100, with is_active set to False, and description to FOO
Car.objects.exclude(recommendation__is_active=True, recommendation__description="FOO")
This query returns nothing, when I expected it to return Car 100. I decided to print the actual query and try it in dbshell:
SELECT "myapp_car"."id"
FROM "myapp_car"
WHERE NOT ("myapp_car"."id" IN (SELECT U1."car_id" AS Col1 FROM "myotherapp_recommendation" U1 WHERE U1."description" = 'FOO') AND "myapp_car"."id" IN (SELECT U1."car_id" AS Col1 FROM "myotherapp_recommendation" U1 WHERE U1."is_active" = 'True'))
However, this works properly! It returns my Car 100.
I have also tried with Q, but it didn't work either:
Car.objects.exclude(Q(recommendation__is_active=True) & Q(recommendation__description="FOO"))
It feels like a Django bug, but I'd rather have your opinion.

What you have written here is, in effect, two separate conditions on the related table: you exclude Car objects that have a Recommendation with is_active=True and that also have a Recommendation with description='FOO', but those two Recommendations are not necessarily the same row. This is a consequence of how exclude() treats conditions that span a multi-valued relation: each condition gets its own subquery, as the SQL in the question shows.
With EXISTS
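If we want both conditions to apply to the same Recommendation, we can work with an EXISTS subquery (the Exists and OuterRef expressions are available from Django 1.11 onwards, so a 1.8 installation would need an upgrade):
from django.db.models import Exists, OuterRef

to_exclude = Recommendation.objects.filter(
    car=OuterRef('pk'),
    is_active=True,
    description="FOO",
)
Now we can exclude the Cars for which such a Recommendation exists:
Car.objects.annotate(
    has_recommendation_to_exclude=Exists(to_exclude)
).exclude(
    has_recommendation_to_exclude=True
)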
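The difference between the two readings can be checked with plain SQL. The sketch below uses Python's sqlite3 module with simplified table names (car, recommendation) and made-up rows; it illustrates the logic and is not the SQL that Django emits. Car 101 has one active recommendation and one FOO recommendation, but no single recommendation that is both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Car 100: one inactive FOO recommendation.
# Car 101: one active BAR and one inactive FOO recommendation.
# Car 102: one active FOO recommendation (the only car that should be excluded).
conn.executescript("""
    CREATE TABLE car (id INTEGER PRIMARY KEY);
    CREATE TABLE recommendation (
        id INTEGER PRIMARY KEY, car_id INTEGER,
        is_active INTEGER, description TEXT
    );
    INSERT INTO car VALUES (100), (101), (102);
    INSERT INTO recommendation VALUES
        (1, 100, 0, 'FOO'),
        (2, 101, 1, 'BAR'),
        (3, 101, 0, 'FOO'),
        (4, 102, 1, 'FOO');
""")

# Two independent subqueries: the conditions may match different rows.
two_subqueries = """
    SELECT id FROM car
    WHERE NOT (
        id IN (SELECT car_id FROM recommendation WHERE description = 'FOO')
        AND id IN (SELECT car_id FROM recommendation WHERE is_active = 1)
    )
    ORDER BY id
"""
# One correlated subquery: both conditions must hold on the same row.
same_row = """
    SELECT id FROM car c
    WHERE NOT EXISTS (
        SELECT 1 FROM recommendation r
        WHERE r.car_id = c.id AND r.is_active = 1 AND r.description = 'FOO'
    )
    ORDER BY id
"""
kept_two_subqueries = [row[0] for row in conn.execute(two_subqueries)]
kept_same_row = [row[0] for row in conn.execute(same_row)]
```

With these rows the two-subquery form drops car 101 as well, while the NOT EXISTS form keeps it.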
Using a COUNT-filter approach
from django.db.models import Case, IntegerField, Sum, Value, When

Car.objects.annotate(
    nrec=Sum(
        Case(
            When(
                recommendation__is_active=True,
                recommendation__description="FOO",
                then=Value(1)
            ),
            default=Value(0),
            output_field=IntegerField(),
        )
    )
).exclude(nrec__gt=0)

Selecting rows from Parent Table only if multiple rows in Child Table match

I'm building a program that learns tic-tac-toe by saving information in a database.
I have two tables, Games(ID,Winner) and Turns(ID,Turn,GameID,Place,Shape).
I want to find a parent row by matching several of its child rows.
For example:
SELECT GameID FROM Turns WHERE
GameID IN (WHEN Turn = 1 THEN Place = 1) AND GameID IN (WHEN Turn = 2 THEN Place = 4);
Is something like this possible?
I'm using MS Access.
Turn - game turn
GameID - game ID
Place - place on the matrix (1 = top right, 9 = bottom left)
Shape - X or circle
Thanks in advance

This very simple query will do the trick in a single scan, and doesn't require you to violate First Normal Form by storing multiple values in a string (shudder).
SELECT T.GameID
FROM Turns AS T
WHERE
(T.Turn = 1 AND T.Place = 1)
OR (T.Turn = 2 AND T.Place = 4)
GROUP BY T.GameID
HAVING Count(*) = 2;
There is no need to join to determine this information, as is suggested by other answers.
Please use proper database design principles in your database, and don't violate First Normal Form by storing multiple values together in a single string!
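The GROUP BY / HAVING pattern above generalises to any number of required moves. The following is a minimal sketch using Python's sqlite3 module rather than Access, with made-up rows; it assumes each game has at most one row per Turn, since otherwise COUNT(*) could reach the target without every move being present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Game 1 contains both wanted moves; games 2 and 3 contain only one each.
conn.executescript("""
    CREATE TABLE Turns (
        ID INTEGER PRIMARY KEY, Turn INTEGER, GameID INTEGER,
        Place INTEGER, Shape TEXT
    );
    INSERT INTO Turns (Turn, GameID, Place, Shape) VALUES
        (1, 1, 1, 'X'), (2, 1, 4, 'O'),
        (1, 2, 1, 'X'), (2, 2, 5, 'O'),
        (1, 3, 9, 'X'), (2, 3, 4, 'O');
""")

# Each wanted move is a (Turn, Place) pair; a game must contain all of them.
wanted = [(1, 1), (2, 4)]
conditions = " OR ".join("(Turn = ? AND Place = ?)" for _ in wanted)
params = [value for pair in wanted for value in pair] + [len(wanted)]

query = f"""
    SELECT GameID FROM Turns
    WHERE {conditions}
    GROUP BY GameID
    HAVING COUNT(*) = ?
    ORDER BY GameID
"""
matching_games = [row[0] for row in conn.execute(query, params)]
```

Only the placeholders are generated from the list; the values themselves are passed as parameters.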

The general solution to your problem can be accomplished by using a sub-query that contains a self-join between two instances of the Turns table:
SELECT * FROM Games
WHERE ID IN
(
    SELECT Turns1.GameID
    FROM Turns AS Turns1
    INNER JOIN Turns AS Turns2
    ON Turns1.GameID = Turns2.GameID
    WHERE (
        (Turns1.Turn=1 AND Turns1.Place = 1)
        AND
        (Turns2.Turn=2 AND Turns2.Place = 4))
);
The Self Join between Turns (aliased Turns1 and Turns2) is key, because if you just try to apply both sets of conditions at once like this:
WHERE (
(Turns.Turn=1 AND Turns.Place = 1)
AND
(Turns.Turn=2 AND Turns.Place = 4))
you will never get any rows back. This is because in your table there is no way for an individual row to satisfy both conditions at the same time.
My experience using Access is that to do a complex query like this you have to use the SQL View and type the query in on your own, rather than use the Query Designer. It may be possible to do in the Designer, but it's always been far easier for me to write the code myself.
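The point about a single row never satisfying both conditions can be seen in a small experiment. This is a sketch using Python's sqlite3 module instead of Access, with made-up rows; the key column of Games is assumed to be ID, as in the table listing in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Game 1 has turn 1 at place 1 and turn 2 at place 4; game 2 differs on turn 2.
conn.executescript("""
    CREATE TABLE Games (ID INTEGER PRIMARY KEY, Winner TEXT);
    CREATE TABLE Turns (
        ID INTEGER PRIMARY KEY, Turn INTEGER, GameID INTEGER,
        Place INTEGER, Shape TEXT
    );
    INSERT INTO Games VALUES (1, 'X'), (2, 'O');
    INSERT INTO Turns (Turn, GameID, Place, Shape) VALUES
        (1, 1, 1, 'X'), (2, 1, 4, 'O'),
        (1, 2, 1, 'X'), (2, 2, 5, 'O');
""")

# Both conditions applied to one row: no row has Turn = 1 and Turn = 2 at once.
single_row = conn.execute("""
    SELECT GameID FROM Turns
    WHERE (Turn = 1 AND Place = 1) AND (Turn = 2 AND Place = 4)
""").fetchall()

# Self-join: t1 and t2 are two different rows of the same game.
self_join = conn.execute("""
    SELECT g.ID, g.Winner FROM Games g
    WHERE g.ID IN (
        SELECT t1.GameID
        FROM Turns AS t1
        INNER JOIN Turns AS t2 ON t1.GameID = t2.GameID
        WHERE (t1.Turn = 1 AND t1.Place = 1)
          AND (t2.Turn = 2 AND t2.Place = 4)
    )
""").fetchall()
```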

select g.ID from Games g where exists (select * from Turns t where
t.GameID = g.ID and ((t.Turn = 1 and t.Place = 1) or (t.Turn = 2 and t.Place = 4)))
This will select all the games that have at least one turn matching either of the criteria; it does not require both turns to be present.
More info on EXISTS:
http://www.techonthenet.com/sql/exists.php

I bypassed this problem by adding a column that holds the turns as a string, for example "154728", and I search for that instead. I think this solution is also less demanding on the database.

Optimizing a request doing tons of exclusions

I have a huge, messy SQL request doing many exclusions, and I feel bad about it. Perhaps you know a better way to proceed.
Here's part of my request:
select name, version, iteration, score
from article, articlemaster
where article.idmaster = articlemaster.id
and article.id not in (select article.id
    from article, articlemaster
    where article.idmaster = articlemaster.id
    and articlemaster.name = 'nameOfMyArticle'
    and article.version = 'A'
    and article.state = 'CONTINUE')
and article.id not in....
and article.id not in....
You think it doesn't look that bad? Actually, this is only a portion of the request: each "and article.id not in ...." clause excludes one article, and I have more than 1000 to exclude, so I'm using a Java program to append the other 999.
Any idea how I could make a lighter version of this abomination?

You might be better off loading all of the articles to exclude into a temporary table, then joining that table in to your query.
For example, create exclude_articles:
name             version  state
---------------  -------  --------
nameOfMyArticle  A        CONTINUE
Then exclude its results from the query:
select
    articlemaster.name,
    article.version,
    article.iteration,
    article.score
from
    article
    join articlemaster
        on article.idmaster = articlemaster.id
where
    not exists (
        select 1
        from article article2
        join articlemaster articlemaster2
            on article2.idmaster = articlemaster2.id
        join exclude_articles
            on articlemaster2.name = exclude_articles.name
            and article2.version = exclude_articles.version
            and article2.state = exclude_articles.state
        where article.id = article2.id)
This is all assuming that the version and state are actually necessary for the exclusion logic. It would be a much easier case if the name is unique.
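The exclusion-table idea can be tried end to end with a small script. This is a sketch using Python's sqlite3 module with made-up rows and assumed column types; the NOT EXISTS here correlates directly with the outer row, which is a simplified variant of the query above rather than a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articlemaster (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY, idmaster INTEGER,
        version TEXT, iteration INTEGER, score INTEGER, state TEXT
    );
    CREATE TEMP TABLE exclude_articles (name TEXT, version TEXT, state TEXT);

    INSERT INTO articlemaster VALUES (1, 'nameOfMyArticle'), (2, 'other');
    INSERT INTO article VALUES
        (10, 1, 'A', 1, 5, 'CONTINUE'),
        (11, 1, 'B', 1, 7, 'CONTINUE'),
        (12, 2, 'A', 1, 9, 'CONTINUE');
""")

# The exclusion list is loaded once, instead of appending one NOT IN per article.
to_exclude = [("nameOfMyArticle", "A", "CONTINUE")]
conn.executemany("INSERT INTO exclude_articles VALUES (?, ?, ?)", to_exclude)

rows = conn.execute("""
    SELECT am.name, a.version, a.iteration, a.score
    FROM article a
    JOIN articlemaster am ON a.idmaster = am.id
    WHERE NOT EXISTS (
        SELECT 1 FROM exclude_articles e
        WHERE e.name = am.name AND e.version = a.version AND e.state = a.state
    )
    ORDER BY a.id
""").fetchall()
```

The query text stays the same size however many articles are excluded; only the contents of exclude_articles grow.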

If you're using Java to create the query and process the results, then why not do all the complicated logic in Java? Just ask the database for all the articles matching some basic criterion (or maybe you really do want to read through all of them) and then filter the results:
select am.name, a.version, a.iteration, a.score, a.state
from article a, articlemaster am
where a.idmaster = am.id
and <some other basic criteria>
Then in Java loop over all the results (sorry, my Java is super rusty) and filter out the ones you don't want:
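// needs: import java.util.ArrayList; import java.util.List;
// Object[] is a stand-in row type; recordList is filled from the query results
List<Object[]> recordList = new ArrayList<>();
List<Object[]> finalList = new ArrayList<>();
for (Object[] record : recordList) {
    // filterThisRecord() stands for your own exclusion test
    if (!filterThisRecord(record)) {
        finalList.add(record);
    }
}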

Performing a Django raw SQL (using "WHERE col IN" syntax) or translating raw SQL to .raw() or .extra()

Django 1.3-dev provides several ways to query the database using raw SQL. They are covered in the documentation. The recommended ways are to use the .raw() or the .extra() methods. The advantage is that if the retrieved data fits the Model, you can still use some of its features directly.
The page I'm trying to display is somewhat complex because it uses lots of information spread across multiple tables with different relationships (one2one, one2many). With the current approach the server has to do about 4K queries per page. This is obviously slow because of the communication between the database and the web server.
A possible solution is to use raw SQL to retrieve the relevant data, but due to the complexity of the query I couldn't translate it into an equivalent in Django.
The query is:
SELECT clin.iso as iso,
    (SELECT COUNT(*)
        FROM clin AS a
        LEFT JOIN clin AS b
        ON a.pat_id = b.pat_id
        WHERE a.iso = clin.iso
    ) AS multiple_iso,
    (SELECT COUNT(*)
        FROM samptopat
        WHERE samptopat.iso_id = clin.iso
    ) AS multiple_samp,
    (SELECT GROUP_CONCAT(value ORDER BY snp_id ASC)
        FROM samptopat
        RIGHT JOIN samptosnp
        USING(samp_id)
        WHERE iso_id = clin.iso
        GROUP BY samp_id
        LIMIT 1 -- Return 1st samp only
    ) AS snp
FROM clin
WHERE iso IN (...)
or alternatively WHERE iso = ....
Sample output looks like:
+-------+--------------+---------------+-------------+
| iso | multiple_iso | multiple_samp | snp |
+-------+--------------+---------------+-------------+
| 7 | 19883 | 0 | NULL |
| 8 | 19883 | 0 | NULL |
| 21092 | 1 | 2 | G,T,C,G,T,G |
| 31548 | 1 | 0 | NULL |
+-------+--------------+---------------+-------------+
4 rows in set (0.00 sec)
The documentation explains how one can do a query using WHERE col = %s but not the IN syntax.
One part of this question is: how do I perform raw SQL queries using Django and the IN operator?
The other part is, considering the following models:
class Clin(models.Model):
    iso = models.IntegerField(primary_key=True)
    pat = models.IntegerField(db_column='pat_id')

    class Meta:
        db_table = u'clin'

class SampToPat(models.Model):
    samptopat_id = models.IntegerField(primary_key=True)
    samp = models.OneToOneField(Samp, db_column='samp_id')
    pat = models.IntegerField(db_column='pat_id')
    iso = models.ForeignKey(Clin, db_column='iso_id')

    class Meta:
        db_table = u'samptopat'

class Samp(models.Model):
    samp_id = models.IntegerField(primary_key=True)
    samp = models.CharField(max_length=8)

    class Meta:
        db_table = u'samp'

class SampToSnp(models.Model):
    samptosnp_id = models.IntegerField(primary_key=True)
    samp = models.ForeignKey(Samp, db_column='samp_id')
    snp = models.IntegerField(db_column='snp_id')
    value = models.CharField(max_length=2)

    class Meta:
        db_table = u'samptosnp'
Is it possible to rewrite the above query into something more ORM oriented?

For a problem like this one, I'd split the query into a small number of simpler ones; I think that's quite possible. I have also found that MySQL may actually return results faster with this approach.
Edit: actually, after thinking a bit, I see that you need to "annotate on subqueries", which is not possible in the Django ORM (not in 1.2 at least). Maybe you have to use plain SQL here, or some other tool to build the query.
I tried to rewrite your models in a more conventional Django pattern; maybe it will help to understand the problem better. The Pat and Snp models are missing, though...
class Clin(models.Model):
    pat = models.ForeignKey(Pat)

    class Meta:
        db_table = u'clin'

class SampToPat(models.Model):
    samp = models.ForeignKey(Samp)
    pat = models.ForeignKey(Pat)
    iso = models.ForeignKey(Clin)

    class Meta:
        db_table = u'samptopat'
        unique_together = ['samp', 'pat']

class Samp(models.Model):
    samp = models.CharField(max_length=8)
    snp_set = models.ManyToManyField(Snp, through='SampToSnp')
    pat_set = models.ManyToManyField(Pat, through='SampToPat')

    class Meta:
        db_table = u'samp'

class SampToSnp(models.Model):
    samp = models.ForeignKey(Samp)
    snp = models.ForeignKey(Snp)
    value = models.CharField(max_length=2)

    class Meta:
        db_table = u'samptosnp'
The following seems to mean: get the count of unique patients per clinic...
(SELECT COUNT(*)
    FROM clin AS a
    LEFT JOIN clin AS b
    ON a.pat_id = b.pat_id
    WHERE a.iso = clin.iso
) AS multiple_iso,
Sample count per clinic:
(SELECT COUNT(*)
    FROM samptopat
    WHERE samptopat.iso_id = clin.iso
) AS multiple_samp,
This part is harder to understand, but in Django there is no way to do GROUP_CONCAT in plain ORM.
(SELECT GROUP_CONCAT(value ORDER BY snp_id ASC)
    FROM samptopat
    RIGHT JOIN samptosnp
    USING(samp_id)
    WHERE iso_id = clin.iso
    GROUP BY samp_id
    LIMIT 1 -- Return 1st samp only
) AS snp

Could you explain exactly what you're trying to extract with the snp subquery? I see you're joining over the two tables, but it looks like what you really want is Snp objects which have an associated Clin with the given id. If so, this becomes almost as straightforward to do as a separate query as the other two:
Snp.objects.filter(samp__pat__clin__pk=given_clin)
or some such thing ought to do the trick. You may have to rewrite that a bit due to all the ways you're violating the conventions, unfortunately.
The others are something like:
Pat.objects.filter(clin__pk=given_clin).count()
and
Samp.objects.filter(clin__pk=given_clin).count()
if @Evgeny's reading is correct (which is how I read it as well).
Often, with Django's ORM, I find I get better results if I try to think about directly what I want in terms of the ORM, instead of trying to translate to or from the SQL I might use if I wasn't using the ORM.
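The first part of the question, raw SQL with IN, is not addressed by the answers above. A common approach is to generate one placeholder per value and pass the values as query parameters. The sketch below shows the idea with Python's sqlite3 module, which uses "?" markers; in_clause is a hypothetical helper written for this example, and the clin rows are made up.

```python
import sqlite3

def in_clause(values, marker="?"):
    """Return a parenthesised placeholder list such as '(?, ?, ?)'."""
    if not values:
        raise ValueError("IN () with no values is not valid SQL")
    return "(" + ", ".join([marker] * len(values)) + ")"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clin (iso INTEGER PRIMARY KEY, pat_id INTEGER);
    INSERT INTO clin VALUES (7, 1), (8, 1), (21092, 2), (31548, 3);
""")

wanted = [7, 21092]
query = (
    "SELECT iso, pat_id FROM clin WHERE iso IN "
    + in_clause(wanted)
    + " ORDER BY iso"
)
# Only the placeholders are built into the SQL text; the values are bound.
rows = conn.execute(query, wanted).fetchall()

# Django's database backends use "%s" as the marker, so the equivalent
# placeholder list for Manager.raw() or cursor.execute() would be:
django_style = in_clause(wanted, "%s")
```

With Django, the same string building should carry over to something like Clin.objects.raw("SELECT * FROM clin WHERE iso IN " + in_clause(ids, "%s"), ids), which has not been run against the models in the question.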

Django 1.0/1.1 rewrite of self join

Is there a way to rewrite this query using the Django QuerySet object:
SELECT b.created_on, SUM(a.vote)
FROM votes a JOIN votes b ON a.created_on <= b.created_on
WHERE a.object_id = 1
GROUP BY 1
Here votes is a table, object_id is an int that occurs multiple times (a foreign key, although that doesn't matter here), and created_on is a datetime.
FWIW, this query allows one to get a score at any time in the past by summing up all previous votes on that object_id.

I'm pretty sure that query cannot be created with the Django ORM. The new Django aggregation code is pretty flexible, but I don't think it can do exactly what you want.
Are you sure that query works? You seem to be missing a check that b.object_id is 1.
This code should work, but it's more than one line and not that efficient.
from django.db.models import Sum

v_list = votes.objects.filter(object__id=1)
for v in v_list:
    v.previous_score = votes.objects.filter(object__id=1, created_on__lte=v.created_on).aggregate(Sum('vote'))["vote__sum"]
Aggregation is only available in trunk, so you might need to update your django install before you can do this.
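The missing check on b.object_id mentioned above can be seen in a small experiment. This is a sketch using Python's sqlite3 module with made-up votes and ISO date strings standing in for datetimes; it assumes created_on is unique per object, since two votes with the same timestamp would be counted twice by the self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Object 1 receives +1, +1, -1; object 2 receives a single +1 on the second day.
conn.executescript("""
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY, object_id INTEGER,
        vote INTEGER, created_on TEXT
    );
    INSERT INTO votes (object_id, vote, created_on) VALUES
        (1,  1, '2009-01-01'),
        (1,  1, '2009-01-02'),
        (2,  1, '2009-01-02'),
        (1, -1, '2009-01-03');
""")

# Original shape: b is not restricted, so votes on other objects add extra rows.
without_check = conn.execute("""
    SELECT b.created_on, SUM(a.vote)
    FROM votes a
    JOIN votes b ON a.created_on <= b.created_on
    WHERE a.object_id = ?
    GROUP BY b.created_on
    ORDER BY b.created_on
""", (1,)).fetchall()

# With the extra condition, each b row is a vote on the same object.
running_score = conn.execute("""
    SELECT b.created_on, SUM(a.vote)
    FROM votes a
    JOIN votes b ON a.created_on <= b.created_on
    WHERE a.object_id = ? AND b.object_id = ?
    GROUP BY b.created_on
    ORDER BY b.created_on
""", (1, 1)).fetchall()
```

In this data the unrestricted version doubles the score for the second day, because the vote on object 2 contributes a second b row with the same date.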

Aggregation isn't the issue; the problem here is that Django's ORM simply doesn't do joins on anything that isn't a ForeignKey, AFAIK.

This is what I'm using now. Ironically, the SQL is broken, but this is the gist of it:
def get_score_over_time(self, obj):
    """
    Get a list of (created_on, score) rows giving the score
    at all times historically
    """
    ctype = ContentType.objects.get_for_model(obj)
    try:
        # Only the quoted table name is interpolated into the SQL;
        # the values are passed to execute() as parameters.
        table = connection.ops.quote_name(self.model._meta.db_table)
        query = """SELECT b.created_on, SUM(a.vote)
            FROM %s a JOIN %s b
            ON a.created_on <= b.created_on
            WHERE a.object_id = %%s
            AND a.content_type_id = %%s
            GROUP BY 1""" % (table, table)
        cursor = connection.cursor()
        cursor.execute(query, [obj.pk, ctype.id])
        result_list = list(cursor.fetchall())
    except models.ObjectDoesNotExist:
        result_list = None
    return result_list
