how to add OR constraints to exclude query in django?

how to add OR constraints to exclude query in django? - sql

I have the following exclude that brings back all of books that have had transactions in the past two minutes. I want to add another constraint that says only if from a series of stores.
In SQL it would be where location_id = 1 or location_id = 2 or location_id = 3
Books have a location_id
How can I apply that to the below query?
transaction_window = timezone.now() + datetime.timedelta(minutes=-2)
ts = Book.objects.exclude(book_id__in = Status.objects
.filter(transaction_time__gte=transaction_window)
.values_list('book_id', flat=True))

You probably just want location__in=(1, 2, 3)

Use a Q object. They're built-in to django and allow more complex lookups.

From your code assuming Book model has field book_id and Status has also field book_id and this field are just plain IntegerField or similar.
from django.db.models import Q
transaction_window = timezone.now() - datetime.timedelta(minutes=2)
statuses = Status.objects.filter(transaction_time__gte=transaction_window)
ts = Book.objects.filter(~Q(book__id__in=statuses)
| ~Q(location_id__in=(1, 2, 3)))
If my guess about your models is correct.

Related

How to combine 4 sql queries into a single query with good performance?

I have a problem to solve. First I split this problem into parts and so I wrote four queries separately but now I need to put them together as if it were a single call to return a single result. How can I do this?
1) I select purchases according to branch and store
SELECT CD_PURCHASE FROM TB_PURCHASE_STORE WHERE CD_BRANCH = ? AND CD_STORE = ?
2) I validate if the promotional period of the purchase is within the current date (today)
SELECT CD_PURCHASE, DT_BEGIN_PROMOTION, DT_END_PROMOTION FROM TB_PURCHASE
WHERE SYSDATE BETWEEN TO_DATE(DT_BEGIN_PROMOTION) AND TO_DATE(DT_END_PROMOTION)
3) From the purchase code, I check which products are active
SELECT CD_PURCHASE, CD_PRODUCT FROM TB_PURCHASE_PRODUCT WHERE FL_ACTIVE = 1
4) Finally, I return some fields according to the customer id
SELECT CD_PURCHASE, CD_PRODUCT, ID_CUSTOMER, DT_LAST_PURCHASE
FROM TB_PURCHASE_SALES WHERE ID_CUSTOMER = ?

Have you tried below query? I've assumed you want INNER JOIN for all the tables, and CD_PURCHASE is common link in all the tables, and CD_PRODUCT is the link between TB_PURCHASE_PRODUCT and TB_PURCHASE_SALES.
SELECT TPS.CD_PURCHASE,
TP.CD_PURCHASE, TP.DT_BEGIN_PROMOTION, TP.DT_END_PROMOTION,
TPP.CD_PURCHASE, TPP.CD_PRODUCT,
TPSS.CD_PURCHASE, TPSS.CD_PRODUCT, TPSS.ID_CUSTOMER, TPSS.DT_LAST_PURCHASE
FROM TB_PURCHASE_STORE TPS,
TB_PURCHASE TP,
TB_PURCHASE_PRODUCT TPP,
TB_PURCHASE_SALES TPSS
WHERE TPS.CD_BRANCH = ? AND TPS.CD_STORE = ?
AND TPS.CD_PURCHASE = TP.CD_PURCHASE
AND SYSDATE BETWEEN TO_DATE(TP.DT_BEGIN_PROMOTION) AND TO_DATE(TP.DT_END_PROMOTION)
AND TPP.CD_PURCHASE = TPS.CD_PURCHASE
AND TPP.FL_ACTIVE = 1
AND TPSS.CD_PURCHASE = TPS.CD_PURCHASE AND TPSS.CD_PRODUCT = TPP.CD_PRODUCT
AND TPSS.ID_CUSTOMER = ?

Selecting rows from Parent Table only if multiple rows in Child Table match

Im building a code that learns tic tac toe, by saving info in a database.
I have two tables, Games(ID,Winner) and Turns(ID,Turn,GameID,Place,Shape).
I want to find parent by multiple child infos.
For Example:
SELECT GameID FROM Turns WHERE
GameID IN (WHEN Turn = 1 THEN Place = 1) AND GameID IN (WHEN Turn = 2 THEN Place = 4);
Is something like this possible?
Im using ms-access.
Turm - Game turn GameID - Game ID Place - Place on matrix
1=top right, 9=bottom left Shape - X or circle
Thanks in advance

This very simple query will do the trick in a single scan, and doesn't require you to violate First Normal Form by storing multiple values in a string (shudder).
SELECT T.GameID
FROM Turns AS T
WHERE
(T.Turn = 1 AND T.Place = 1)
OR (T.Turn = 2 AND T.Place = 4)
GROUP BY T.GameID
HAVING Count(*) = 2;
There is no need to join to determine this information, as is suggested by other answers.
Please use proper database design principles in your database, and don't violate First Normal Form by storing multiple values together in a single string!

The general solution to your problem can be accomplished by using a sub-query that contains a self-join between two instances of the Turns table:
SELECT * FROM Games
WHERE GameID IN
(
SELECT Turns1.GameID
FROM Turns AS Turns1
INNER JOIN Turns AS Turns2
ON Turns1.GameID = Turns2.GameID
WHERE (
(Turns1.Turn=1 AND Turns1.Place = 1)
AND
(Turns2.Turn=2 AND Turns2.Place = 4))
);
The Self Join between Turns (aliased Turns1 and Turns2) is key, because if you just try to apply both sets of conditions at once like this:
WHERE (
(Turns.Turn=1 AND Turns.Place = 1)
AND
(Turns.Turn=2 AND Turns.Place = 4))
you will never get any rows back. This is because in your table there is no way for an individual row to satisfy both conditions at the same time.
My experience using Access is that to do a complex query like this you have to use the SQL View and type the query in on your own, rather than use the Query Designer. It may be possible to do in the Designer, but it's always been far easier for me to write the code myself.

select GameID from Games g where exists (select * from turns t where
t.gameid = g.gameId and ((turn =1 and place = 1) or (turn =2 and place =5)))
This will select all the games that have atleast one turn with the coresponding criteria.
More info on exist:
http://www.techonthenet.com/sql/exists.php

I bypassed this problem by adding a column which holds the turns as a string example : "154728" and i search for it instead. I think this solution is also less demanding on the database

Remove duplicates in a Django query

Is there a simple way to remove duplicates in the following basic query:
email_list = Emails.objects.order_by('email')
I tried using duplicate() but it was not working. What is the exact syntax for doing this query without duplicates?

This query will not give you duplicates - ie, it will give you all the rows in the database, ordered by email.
However, I presume what you mean is that you have duplicate data within your database. Adding distinct() here won't help, because even if you have only one field, you also have an automatic id field - so the combination of id+email is not unique.
Assuming you only need one field, email_address, de-duplicated, you can do this:
email_list = Email.objects.values_list('email', flat=True).distinct()
However, you should really fix the root problem, and remove the duplicate data from your database.
Example, deleting duplicate Emails by email field:
for email in Email.objects.values_list('email', flat=True).distinct():
Email.objects.filter(pk__in=Email.objects.filter(email=email).values_list('id', flat=True)[1:]).delete()
Or books by name:
for name in Book.objects.values_list('name', flat=True).distinct():
Book.objects.filter(pk__in=Artwork.objects.filter(name=name).values_list('id', flat=True)[3:]).delete()

For checking duplicate you can do a GROUP_BY and HAVING in Django as below. We are using Django annotations here.
from django.db.models import Count
from app.models import Email
duplicate_emails = Email.objects.values('email').annotate(email_count=Count('email')).filter(email_count__gt=1)
Now looping through the above data and deleting all other emails except the first one (depends on requirement or whatever).
for data in duplicates_emails:
email = data['email']
Email.objects.filter(email=email).order_by('pk')[1:].delete()

You can chain .distinct() on the end of your queryset to filter duplicates. Check out: http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct

You may be able to use the distinct() function, depending on your model. If you only want to retrieve a single field form the model, you could do something like:
email_list = Emails.objects.values_list('email').order_by('email').distinct()
which should give you an ordered list of emails.

You can also use set()
email_list = set(Emails.objects.values_list('email', flat=True))

Use, self queryset.annotate()!
from django.db.models import Subquery, OuterRef
email_list = Emails.objects.filter(
pk__in = Emails.objects.values('emails').distinct().annotate(
pk = Subquery(
Emails.objects.filter(
emails= OuterRef("emails")
)
.order_by("pk")
.values("pk")[:1])
)
.values_list("pk", flat=True)
)
This queryset goes to make this query.
SELECT `email`.`id`,
`email`.`title`,
`email`.`body`,
...
...
FROM `email`
WHERE `email`.`id` IN (
SELECT DISTINCT (
SELECT U0.`id`
FROM `email` U0
WHERE U0.`email` = V0.`approval_status`
ORDER BY U0.`id` ASC
LIMIT 1
) AS `pk`
FROM `agent` V0
)
cheet-sheet
from django.db.models import Subquery, OuterRef
group_by_duplicate_col_queryset = Models.objects.filter(
pk__in = Models.objects.values('duplicate_col').distinct().annotate(
pk = Subquery(
Models.objects.filter(
duplicate_col= OuterRef('duplicate_col')
)
.order_by("pk")
.values("pk")[:1])
)
.values_list("pk", flat=True)
)

I used the following to actually remove the duplicate entries from from the database, hopefully this helps someone else.
adds = Address.objects.all()
d = adds.distinct('latitude', 'longitude')
for address in adds:
if i not in d:
address.delete()

you can use this raw query : your_model.objects.raw("select * from appname_Your_model group by column_name")

Performing a Django raw SQL (using "WHERE col IN" syntax) or translating raw SQL to .raw() or .extra()

Django 1.3-dev provides several ways to query the database using raw SQL. They are covered here and here. The recommended ways are to use the .raw() or the .extra() methods. The advantage is that if the retrieved data fits the Model you can still use some of it's features directly.
The page I'm trying to display is somewhat complex because it uses lots of information which is spread across multiple tables with different relationships (one2one, one2many). With the current approach the server has to do about 4K queries per page. This is obviously slow due to database to webserver communication.
A possible solution is to use raw SQL to retrieve the relevant data but due to the complexity of the query I couldn't translate this to an equivalent in Django.
The query is:
SELECT clin.iso as iso,
(SELECT COUNT(*)
FROM clin AS a
LEFT JOIN clin AS b
ON a.pat_id = b.pat_id
WHERE a.iso = clin.iso
) AS multiple_iso,
(SELECT COUNT(*)
FROM samptopat
WHERE samptopat.iso_id = clin.iso
) AS multiple_samp,
(SELECT GROUP_CONCAT(value ORDER BY snp_id ASC)
FROM samptopat
RIGHT JOIN samptosnp
USING(samp_id)
WHERE iso_id = clin.iso
GROUP BY samp_id
LIMIT 1 -- Return 1st samp only
) AS snp
FROM clin
WHERE iso IN (...)
or alternatively WHERE iso = ....
Sample output looks like:
+-------+--------------+---------------+-------------+
| iso | multiple_iso | multiple_samp | snp |
+-------+--------------+---------------+-------------+
| 7 | 19883 | 0 | NULL |
| 8 | 19883 | 0 | NULL |
| 21092 | 1 | 2 | G,T,C,G,T,G |
| 31548 | 1 | 0 | NULL |
+-------+--------------+---------------+-------------+
4 rows in set (0.00 sec)
The documentation explains how one can do a query using WHERE col = %s but not the IN syntax.
One part of this question is How do I perform raw SQL queries using Django and the IN statement?
The other part is, considering the following models:
class Clin(models.Model):
iso = models.IntegerField(primary_key=True)
pat = models.IntegerField(db_column='pat_id')
class Meta:
db_table = u'clin'
class SampToPat(models.Model):
samptopat_id = models.IntegerField(primary_key=True)
samp = models.OneToOneField(Samp, db_column='samp_id')
pat = models.IntegerField(db_column='pat_id')
iso = models.ForeignKey(Clin, db_column='iso_id')
class Meta:
db_table = u'samptopat'
class Samp(models.Model):
samp_id = models.IntegerField(primary_key=True)
samp = models.CharField(max_length=8)
class Meta:
db_table = u'samp'
class SampToSnp(models.Model):
samptosnp_id = models.IntegerField(primary_key=True)
samp = models.ForeignKey(Samp, db_column='samp_id')
snp = models.IntegerField(db_column='snp_id')
value = models.CharField(max_length=2)
class Meta:
db_table = u'samptosnp'
Is it possible to rewrite the above query into something more ORM oriented?

For a problem like this one, I'd split the query into a small number of simpler ones, I think it's quite possible. Also, I found that MySQL actually may return results faster with this approach.
edit ...Actually after thinking a bit I see that you need to "annotate on subqueries", which is not possible in Django ORM (not in 1.2 at least). Maybe you have to do plain sql here or use some other tool to build the query.
Tried to rewrite your models in more default django pattern, maybe it will help to understand the problem better. Models Pat and Snp are missing though...
class Clin(models.Model):
pat = models.ForeignKey(Pat)
class Meta:
db_table = u'clin'
class SampToPat(models.Model):
samp = models.ForeignKey(Samp)
pat = models.ForeignKey(Pat)
iso = models.ForeignKey(Clin)
class Meta:
db_table = u'samptopat'
unique_together = ['samp', 'pat']
class Samp(models.Model):
samp = models.CharField(max_length=8)
snp_set = models.ManyToManyField(Snp, through='SampToSnp')
pat_set = models.ManyToManyField(Pat, through='SaptToPat')
class Meta:
db_table = u'samp'
class SampToSnp(models.Model):
samp = models.ForeignKey(Samp)
snp = models.ForeignKey(Snp)
value = models.CharField(max_length=2)
class Meta:
db_table = u'samptosnp'
The following seems to mean - get count of unique patients per clinic ...
(SELECT COUNT(*)
FROM clin AS a
LEFT JOIN clin AS b
ON a.pat_id = b.pat_id
WHERE a.iso = clin.iso
) AS multiple_iso,
Sample count per clinic:
(SELECT COUNT(*)
FROM samptopat
WHERE samptopat.iso_id = clin.iso
) AS multiple_samp,
This part is harder to understand, but in Django there is no way to do GROUP_CONCAT in plain ORM.
(SELECT GROUP_CONCAT(value ORDER BY snp_id ASC)
FROM samptopat
RIGHT JOIN samptosnp
USING(samp_id)
WHERE iso_id = clin.iso
GROUP BY samp_id
LIMIT 1 -- Return 1st samp only
) AS snp

Could you explain exactly what you're trying to extract w/ the snp subquery? I see you're joining over the two tables, but it looks like what you really want is Snp objects which have an associated Clin which has the given id. If so, this becomes almost as straightforward to do as a separate query as the other 2:
Snp.objects.filter(samp__pat__clin__pk=given_clin)
or some such thing ought to do the trick. You may have to rewrite that a bit due to all the ways you're violating the conventions, unfortunately.
The others are something like:
Pat.objects.filter(clin__pk=given_clin).count()
and
Samp.objects.filter(clin__pk=given_clin).count()
if #Evgeny's reading is correct (which is how I read it as well).
Often, with Django's ORM, I find I get better results if I try to think about directly what I want in terms of the ORM, instead of trying to translate to or from the SQL I might use if I wasn't using the ORM.

Django Generic Relations and ORM Queries

Say I have the following models:
class Image(models.Model):
image = models.ImageField(max_length=200, upload_to=file_home)
content_type = models.ForeignKey(ContentType)
object_id = models.PositiveIntegerField()
content_object = generic.GenericForeignKey()
class Article(models.Model):
text = models.TextField()
images = generic.GenericRelation(Image)
class BlogPost(models.Model):
text = models.TextField()
images = generic.GenericRelation(Image)
What's the most processor- and memory-efficient way to find all Articles that have at least one Image attached to them?
I've done this:
Article.objects.filter(pk__in=Image.objects.filter(content_type=ContentType.objects.get_for_model(Article)).values_list('object_id', flat=True))
Which works, but besides being ugly it takes forever.
I suspect there's a better solution using raw SQL, but that's beyond me. For what it's worth, the SQL generated by the above is as following:
SELECT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` WHERE `issues_article`.`id` IN (SELECT U0.`object_id` FROM `uploads_image` U0 WHERE U0.`content_type_id` = 26 ) LIMIT 21
EDIT: czarchaic's suggestion has much nicer syntax but even worse (slower) performance. The SQL generated by his query looks like the following:
SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text`, COUNT(`uploads_image`.`id`) AS `num_images` FROM `issues_article` LEFT OUTER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) GROUP BY `issues_article`.`id` HAVING COUNT(`uploads_image`.`id`) > 0 ORDER BY NULL LIMIT 21
EDIT: Hooray for Jarret Hardie! Here's the SQL generated by his should-have-been-obvious solution:
SELECT DISTINCT `issues_article`.`id`, `issues_article`.`text` FROM `issues_article` INNER JOIN `uploads_image` ON (`issues_article`.`id` = `uploads_image`.`object_id`) WHERE (`uploads_image`.`id` IS NOT NULL AND `uploads_image`.`content_type_id` = 26 ) LIMIT 21

Thanks to generic relations, you should be able to query this structure using traditional query-set semantics for reverse relations:
Article.objects.filter(images__isnull=False)
This will produce duplicates for any Articles that are related to multiple Images, but you can eliminate that with the distinct() QuerySet method:
Article.objects.distinct().filter(images__isnull=False)

I think your best bet would be to use aggregation
from django.db.models import Count
Article.objects.annotate(num_images=Count('images')).filter(num_images__gt=0)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

how to add OR constraints to exclude query in django? - sql

You probably just want location__in=(1, 2, 3)

Use a Q object. They're built-in to django and allow more complex lookups.

Related

How to combine 4 sql queries into a single query with good performance?

Selecting rows from Parent Table only if multiple rows in Child Table match

Remove duplicates in a Django query

Performing a Django raw SQL (using "WHERE col IN" syntax) or translating raw SQL to .raw() or .extra()

Django Generic Relations and ORM Queries

Categories

Resources