How to get Django QuerySet 'exclude' to work right? - sql

I have a database that contains schemas for skus, kits, kit_contents, and checklists. Here is a query for "Give me all the SKUs defined for kitcontent records defined for kit records defined in checklist 1":
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON k.checklist_id = 1;
I'm using Django, and I mostly really like the ORM because I can express that query by:
skus = SKU.objects.filter(kitcontent__kit__checklist_id=1).distinct()
which is such a slick way to navigate all those foreign keys. Django's ORM produces basically the same as the SQL written above. The trouble is that it's not clear to me how to get all the SKUs not defined for checklist 1. In the SQL query above, I'd do this by replacing the "=" with "!=". But Django's models don't have a not equals operator. You're supposed to use the exclude() method, which one might guess would look like this:
skus = SKU.objects.filter().exclude(kitcontent__kit__checklist_id=1).distinct()
but Django produces this query, which isn't the same thing:
SELECT distinct s.* FROM skus s
WHERE NOT ((skus.id IN
(SELECT kc.sku_id FROM kit_contents kc
INNER JOIN kits k ON (kc.kit_id = k.id)
WHERE (k.checklist_id = 1 AND kc.sku_id IS NOT NULL))
AND skus.id IS NOT NULL))
(I've cleaned up the query for easier reading and comparison.)
I'm a beginner to the Django ORM, and I'd like to use it when possible. Is there a way to get what I want here?
EDIT:
karthikr gave an answer that doesn't work for the same reason the original ORM .exclude() solution doesn't work: a SKU can be in kit_contents in kits that exist on both checklist_id=1 and checklist_id=2. Using the by-hand query I opened my post with, using "checklist_id = 1" produces 34 results, using "checklist_id = 2" produces 53 results, and the following query produces 26 results:
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON k.checklist_id = 1
JOIN kit_contents kc2 ON kc2.sku_id = s.id
JOIN kits k2 ON k2.id = kc2.kit_id
JOIN checklists c2 ON k2.checklist_id = 2;
I think this is one reason why people don't seem to find the .exclude() solution a reasonable replacement for some kind of not_equals filter -- the latter allows you to say, succinctly, exactly what you mean. Presumably the former could also allow the query to be expressed, but I increasingly despair of such a solution being simple.

You could do this: get all the SKUs for checklist 1 (the skus queryset from your filter() call), and exclude them from the complete list.
sku_ids = skus.values_list('pk', flat=True)
non_checklist_1 = SKU.objects.exclude(pk__in=sku_ids).distinct()
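To see why the "!=" join and the exclude()-style NOT IN subquery return different rows, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The column layout and the three sample SKUs are assumptions made up for illustration; only the table names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skus (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE kits (id INTEGER PRIMARY KEY, checklist_id INTEGER);
    CREATE TABLE kit_contents (id INTEGER PRIMARY KEY, kit_id INTEGER, sku_id INTEGER);

    -- Hypothetical data: kit 10 is on checklist 1, kit 20 is on checklist 2.
    INSERT INTO skus VALUES (1, 'only-on-1'), (2, 'on-both'), (3, 'only-on-2');
    INSERT INTO kits VALUES (10, 1), (20, 2);
    INSERT INTO kit_contents (kit_id, sku_id) VALUES (10, 1), (10, 2), (20, 2), (20, 3);
""")

# "!=" on the join: SKUs that appear in at least one kit on a checklist other than 1.
neq_join = [row[0] for row in conn.execute("""
    SELECT DISTINCT s.id FROM skus s
    JOIN kit_contents kc ON kc.sku_id = s.id
    JOIN kits k ON k.id = kc.kit_id
    WHERE k.checklist_id != 1
    ORDER BY s.id
""")]

# NOT IN subquery (the shape exclude() produces): SKUs with no kit on checklist 1 at all.
not_in = [row[0] for row in conn.execute("""
    SELECT s.id FROM skus s
    WHERE s.id NOT IN (SELECT kc.sku_id FROM kit_contents kc
                       JOIN kits k ON k.id = kc.kit_id
                       WHERE k.checklist_id = 1)
    ORDER BY s.id
""")]

print(neq_join)  # [2, 3]
print(not_in)    # [3]
```

SKU 2 sits on both checklists, so it survives the "!=" join but is dropped by NOT IN. Which result is "right" depends on which of the two questions you are actually asking.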


How would you explain this query in layman terms?

Here is the database I'm using: https://drive.google.com/file/d/1ArJekOQpal0JFIr1h3NXYcFVngnCNUxg/view?usp=sharing
select distinct
AC1.givename, AC1.famname, AC2.givename, AC2.famname
from
academic AC1, author AU1, academic AC2, author AU2
where
AC1.acnum = AU1.acnum
and AC2.acnum = AU2.acnum
and AU1.panum = AU2.panum
and AU2.acnum > AU1.acnum
and not exists (select *
from Interest I1, Interest I2
where I1.acnum = AC1.acnum
and I2.acnum = AC2.acnum);
Output:
I'm having trouble explaining the output of the subquery and the full query in layman's terms (plain English).
Not sure if my explanation is right:
"The subquery finds the interested fields where two authors have no common field of interest.
The whole query finds the first and last names of the authors of papers which have at least two authors, and have no common field of interest."
As it currently stands, the subquery will produce rows only if both academics have at least one interest each.
So overall, the query is "produce pairs of academics who co-authored at least one paper and where at least one of them has no interests whatsoever". It's difficult to believe that that was the intent, and if it was, there are clearer ways of writing it that make the intent obvious.
If that's the query we want, though, I'd write it as:
SELECT
AC1.givename, AC1.famname, AC2.givename, AC2.famname
FROM
academic AC1
inner join
academic AC2
on
AC1.acnum < AC2.acnum
WHERE EXISTS
(select * from author au1 inner join author au2 on au1.panum = au2.panum
where au1.acnum = ac1.acnum and au2.acnum = ac2.acnum)
AND
(
NOT EXISTS (select * from interest i where i.acnum = ac1.acnum)
OR
NOT EXISTS (select * from interest i where i.acnum = ac2.acnum)
)
If, as is more likely, we wanted pairs of co-authors who have no interests in common, we would write something like:
SELECT
AC1.givename, AC1.famname, AC2.givename, AC2.famname
FROM
academic AC1
inner join
academic AC2
on
AC1.acnum < AC2.acnum
WHERE EXISTS
(select * from author au1 inner join author au2 on au1.panum = au2.panum
where au1.acnum = ac1.acnum and au2.acnum = ac2.acnum)
AND NOT EXISTS
(select * from interest i1 inner join interest i2 on i1.field = i2.field
where i1.acnum = ac1.acnum and i2.acnum = ac2.acnum)
Notice how neither of my queries uses distinct, because we've made sure that the outer query isn't joining additional rows where we only care about the existence or absence of those rows - we've moved all such checks into EXISTS subqueries.
I generally see distinct used far too often when the author is getting multiple results when they only want a single result and they're unwilling to expend the effort to discover why they're getting multiple results. In this case, it would be situations where the same pairs of academics have co-authored more than one paper.
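The difference between the two readings can be checked on a tiny data set. The sketch below uses Python's sqlite3 module with invented academics, papers, and interests (all sample rows are assumptions, as is the interest.field column, which is taken from the rewritten query above rather than from the original schema). It selects acnum pairs instead of names to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE academic (acnum INTEGER PRIMARY KEY, givename TEXT, famname TEXT);
    CREATE TABLE author (panum INTEGER, acnum INTEGER);
    CREATE TABLE interest (acnum INTEGER, field TEXT);

    -- Hypothetical data: academic 1 co-authors with 2, 3 and 4; academic 3 has no interests.
    INSERT INTO academic VALUES (1, 'Ann', 'A'), (2, 'Bob', 'B'), (3, 'Cy', 'C'), (4, 'Di', 'D');
    INSERT INTO author VALUES (10, 1), (10, 2), (11, 1), (11, 3), (12, 1), (12, 4);
    INSERT INTO interest VALUES (1, 'db'), (2, 'ai'), (4, 'db');
""")

# The query as posted: the subquery does not compare fields at all.
as_posted = conn.execute("""
    SELECT DISTINCT AC1.acnum, AC2.acnum
    FROM academic AC1, author AU1, academic AC2, author AU2
    WHERE AC1.acnum = AU1.acnum AND AC2.acnum = AU2.acnum
      AND AU1.panum = AU2.panum AND AU2.acnum > AU1.acnum
      AND NOT EXISTS (SELECT * FROM interest I1, interest I2
                      WHERE I1.acnum = AC1.acnum AND I2.acnum = AC2.acnum)
    ORDER BY AC1.acnum, AC2.acnum
""").fetchall()

# The "no interests in common" version: the subquery joins on field.
no_common = conn.execute("""
    SELECT AC1.acnum, AC2.acnum
    FROM academic AC1 INNER JOIN academic AC2 ON AC1.acnum < AC2.acnum
    WHERE EXISTS (SELECT * FROM author au1 INNER JOIN author au2 ON au1.panum = au2.panum
                  WHERE au1.acnum = AC1.acnum AND au2.acnum = AC2.acnum)
      AND NOT EXISTS (SELECT * FROM interest i1 INNER JOIN interest i2 ON i1.field = i2.field
                      WHERE i1.acnum = AC1.acnum AND i2.acnum = AC2.acnum)
    ORDER BY AC1.acnum, AC2.acnum
""").fetchall()

print(as_posted)  # [(1, 3)]
print(no_common)  # [(1, 2), (1, 3)]
```

Pair (1, 2) has interests but none in common, so only the second query returns it. Pair (1, 4) shares 'db' and is excluded by both.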

SQL Server : multi-join with tuple IN clause

I'm trying to join 4 tables that have a somewhat complex relationship. Because of where this will be used, it needs to be contained in a single query, but I'm having trouble since the primary query and the IN clause query both join 2 tables together and the lookup is on two columns.
The goal is to input a SalesNum and SalesType and have it return the Price.
Tables and relationships:
sdShipping: SalesNum[1], SalesType[2], Weight[3]
sdSales: SalesNum[1], SalesType[2], Zip[4]
spZones: Zip[4], Zone[5]
spPrices: Zone[5], Price, Weight[3]
Here's my latest attempt in T-SQL:
SELECT
spp.Price
FROM
spZones AS spz
LEFT OUTER JOIN
spPrices AS spp ON spz.Zone = spp.Zone
WHERE
(spp.Weight, spz.Zip) IN (SELECT ship.Weight, sales.Zip
FROM sdShipping AS ship
LEFT OUTER JOIN sdSales AS sales ON sales.SalesNum = ship.SalesNum
AND sales.SalesType = ship.SalesType
WHERE sales.SalesNum = (?)
AND ship.SalesType = (?));
SQL Server Management Studio says I have an error in my syntax near ',' (appropriately useless error message). Does anybody have any idea whether this is even allowed in Microsoft's version of SQL? Is there perhaps another way to accomplish it? I've seen the multi-key IN questions answered on here, but never in the case where both sides require a JOIN.
Many databases do support IN on tuples. SQL Server is not one of them.
Use EXISTS instead:
SELECT spp.Price
FROM spZones spz LEFT OUTER JOIN
spPrices spp
ON spz.Zone = spp.Zone
WHERE EXISTS (SELECT 1
FROM sdShipping ship LEFT JOIN
sdSales sales
ON sales.SalesNum = ship.SalesNum AND
sales.SalesType = ship.SalesType
WHERE spp.Weight = ship.Weight AND spz.Zip = sales.Zip AND
sales.SalesNum = (?) AND
ship.SalesType = (?)
);
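As a rough check of the EXISTS rewrite, the sketch below runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in here (it is not SQL Server and its dialect differs), and the column types and sample rows are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sdShipping (SalesNum INTEGER, SalesType TEXT, Weight INTEGER);
    CREATE TABLE sdSales (SalesNum INTEGER, SalesType TEXT, Zip TEXT);
    CREATE TABLE spZones (Zip TEXT, Zone INTEGER);
    CREATE TABLE spPrices (Zone INTEGER, Price REAL, Weight INTEGER);

    -- Hypothetical data: two sales, two zones, two weight tiers per zone.
    INSERT INTO sdShipping VALUES (1001, 'WEB', 5), (1002, 'WEB', 10);
    INSERT INTO sdSales VALUES (1001, 'WEB', '90210'), (1002, 'WEB', '10001');
    INSERT INTO spZones VALUES ('90210', 8), ('10001', 2);
    INSERT INTO spPrices VALUES (8, 12.5, 5), (8, 20.0, 10), (2, 7.25, 5), (2, 9.0, 10);
""")

# Same structure as the EXISTS query above: the two tuple columns become
# two correlated equality conditions inside the subquery.
QUERY = """
    SELECT spp.Price
    FROM spZones spz
    LEFT OUTER JOIN spPrices spp ON spz.Zone = spp.Zone
    WHERE EXISTS (SELECT 1
                  FROM sdShipping ship
                  LEFT JOIN sdSales sales
                    ON sales.SalesNum = ship.SalesNum
                   AND sales.SalesType = ship.SalesType
                  WHERE spp.Weight = ship.Weight
                    AND spz.Zip = sales.Zip
                    AND sales.SalesNum = ?
                    AND ship.SalesType = ?)
"""

prices_1001 = conn.execute(QUERY, (1001, "WEB")).fetchall()
prices_1002 = conn.execute(QUERY, (1002, "WEB")).fetchall()

print(prices_1001)  # [(12.5,)]
print(prices_1002)  # [(9.0,)]
```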

Joining tables, counting, and group to return a Model

So I've got a SQL query I'd like to duplicate in rails:
select g.*
from gamebox_favorites f
inner join gameboxes g on f.gamebox_id = g.id
group by f.gamebox_id
order by count(f.gamebox_id) desc;
I've been reading over the rails Active Record Query Interface site, but can't quite seem to put this together. I'd like the query to return a collection of Gamebox records, sorted by the number of 'favorites' a gamebox has. What is the cleanest way to do this in rails?
I believe this will work (it works on a similarly structured database locally), though I'm not sure I have the proper models in the proper spots for what you're trying to do, so you might need to move a couple of things around:
Gamebox.joins(:gamebox_favorites).
group('"gamebox_favorites"."gamebox_id"').
order('count("gamebox_favorites"."gamebox_id") DESC')
On the console, this should compile to (in the case of PostgreSQL on the back end):
SELECT "gameboxes".* FROM "gameboxes"
INNER JOIN "gamebox_favorites"
ON "gamebox_favorites"."gamebox_id" = "gameboxes"."id"
GROUP BY "gamebox_favorites"."gamebox_id"
ORDER BY count("gamebox_favorites"."gamebox_id") DESC
...and I'm guessing that you don't want to just wrap it in a find_by_sql call, such as:
Gamebox.find_by_sql("select g.* from gamebox_favorites f
inner join gameboxes g
on f.gamebox_id = g.id
group by f.gamebox_id
order by count(f.gamebox_id) desc")
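To sanity-check the ordering the SQL is meant to produce, independent of Rails, here is a small sketch using Python's sqlite3 module. The name column and the sample rows are assumptions for illustration. The sketch groups by the gameboxes columns it selects, which is a deliberate variation on the query above so that the statement does not rely on selecting ungrouped columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gameboxes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE gamebox_favorites (id INTEGER PRIMARY KEY, gamebox_id INTEGER);

    -- Hypothetical data: Go has 3 favorites, Risk 2, Chess 1, Uno none.
    INSERT INTO gameboxes VALUES (1, 'Chess'), (2, 'Go'), (3, 'Risk'), (4, 'Uno');
    INSERT INTO gamebox_favorites (gamebox_id) VALUES (2), (2), (2), (3), (3), (1);
""")

# Most-favorited gameboxes first. The inner join drops gameboxes with no favorites.
ranked = [row[0] for row in conn.execute("""
    SELECT g.name
    FROM gamebox_favorites f
    INNER JOIN gameboxes g ON f.gamebox_id = g.id
    GROUP BY g.id, g.name
    ORDER BY COUNT(f.gamebox_id) DESC
""")]

print(ranked)  # ['Go', 'Risk', 'Chess']
```

If gameboxes with zero favorites should also appear, a LEFT JOIN starting from gameboxes would be needed instead of the inner join.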

Help with Delphi 7, ADO, & MS Access SQL Statement - Part Deuce

I need help understanding why my SQL does not work, or whether I need to write it differently to get the results I need. As the title suggests, I am using Delphi 7, with ADO components, and a MS Access 2000 database. You can see my table structure from Part I here:
Help with Delphi 7, ADO, & MS Access SQL Statement
The SQL I am currently using to get all knowledge based on keywords is as follows:
select * from (knowledge K
inner join knowledge_keywords KKW on KKW.knowledgeid = K.id)
inner join keywords KW on KW.id = KKW.keywordid
where (KW.keyword = 'job') AND (KW.keyword = 'task')
However, this does not return any results, when there are clearly both of those words in the knowledge_keywords table with the same knowledge id.
However, if I do the same SQL with an OR instead of an AND, I get the two records I expected:
select * from (knowledge K
inner join knowledge_keywords KKW on KKW.knowledgeid = K.id)
inner join keywords KW on KW.id = KKW.keywordid
where (KW.keyword = 'job') OR (KW.keyword = 'task')
Thanks for any help.
Think about it this way: How many records are there in knowledge_keywords for which it is true both that keyword = 'job' AND keyword = 'task'? There are no such records. When you use AND you're asking for records that satisfy both the first condition AND the second condition at the same time. When you use OR, you're asking for records that satisfy one condition OR the other one (or both).
In this case, OR expresses what you want. AND expresses something different.
You can also use KW.keyword IN ('job', 'task') which is more concise and, perhaps, clearer.
I think the first query won't return any result, does it? That's because 'and' in speech differs from 'and' in programming. When you say, you want the keywords 'job' and 'task', you actually mean you want the rows where keyword is either 'job' or 'task'. A keyword cannot be both 'job' and 'task' so that query won't return any rows. You could replace the OR with an IN in the form of
WHERE KW.Keyword in ('job', 'task')
But this probably won't give you the result you want. I suspect you need to find articles that match both keywords.
To check whether a knowledge base article has both keywords, you might need something like this (although I'm not sure if Access accepts this):
select
*
from
knowledge K
where
exists
(select 'x' from
knowledge_keywords KKW
inner join keywords KW on KW.id = KKW.keywordid
where
KKW.knowledgeid = K.id and
KW.keyword = 'job')
and exists
(select 'x' from
knowledge_keywords KKW
inner join keywords KW on KW.id = KKW.keywordid
where
KKW.knowledgeid = K.id and
KW.keyword = 'task')
[edit]
A different approach, that might work better in Access (I'm sorry I can't test it) is by using a count like this. I made a small assumption about the fields in K for this example.
This way, you join each keyword in the list. For a knowledge base article that has both 'job' and 'task' it will return two rows at first. These rows are then grouped on the Knowledge fields, and the rows are counted. Only the articles where count matches the total number of keywords are returned.
Possible problem: When an article has the same keyword (job) linked twice, it is still returned. This can be solved by preventing that from happening using unique constraints.
select
K.ID,
K.Title,
K.Content
from
(knowledge K
inner join knowledge_keywords KKW on KKW.knowledgeid = K.id)
inner join keywords KW on KW.id = KKW.keywordid
where
KW.keyword in ('job', 'task')
group by
K.ID,
K.Title,
K.Content
having
count(*) = 2 /* Number of keywords */
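The three behaviours discussed in this thread (AND on a single row, OR/IN, and grouping with a count) can be compared side by side. The sketch below uses Python's sqlite3 module as a stand-in for Access, so the join syntax omits the parentheses Access requires. The title column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE knowledge (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT);
    CREATE TABLE knowledge_keywords (knowledgeid INTEGER, keywordid INTEGER);

    -- Hypothetical data: article 1 has both keywords, 2 only 'job', 3 only 'task'.
    INSERT INTO knowledge VALUES (1, 'has both'), (2, 'job only'), (3, 'task only');
    INSERT INTO keywords VALUES (1, 'job'), (2, 'task');
    INSERT INTO knowledge_keywords VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

JOINS = """
    FROM knowledge K
    INNER JOIN knowledge_keywords KKW ON KKW.knowledgeid = K.id
    INNER JOIN keywords KW ON KW.id = KKW.keywordid
"""

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# AND is evaluated per joined row, and no single row has two keywords.
with_and = ids("SELECT K.id" + JOINS +
               "WHERE KW.keyword = 'job' AND KW.keyword = 'task'")

# IN (equivalent to OR) matches articles that have either keyword.
with_in = ids("SELECT DISTINCT K.id" + JOINS +
              "WHERE KW.keyword IN ('job', 'task') ORDER BY K.id")

# Grouping and counting keeps only articles matched by both keywords.
with_both = ids("SELECT K.id" + JOINS +
                "WHERE KW.keyword IN ('job', 'task') "
                "GROUP BY K.id HAVING COUNT(*) = 2")

print(with_and)   # []
print(with_in)    # [1, 2, 3]
print(with_both)  # [1]
```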

Sorting rows by count of a many-to-many associated record

I know there are a lot of other SO entries that seem like this one, but I haven't found one that actually answers my question so hopefully one of you can either answer it or point me to another SO question that is related.
Basically, I have the following query that returns Venues that have any CheckIns that contain the searched Keyword ("foobar" in this example).
SELECT DISTINCT v.*
FROM "venues" v
INNER JOIN "check_ins" c ON c."venue_id" = v."id"
INNER JOIN "keywordings" ks ON ks."check_in_id" = c."id"
INNER JOIN "keywords" k ON ks."keyword_id" = k."id"
WHERE (k."name" = 'foobar')
I want to SELECT and ORDER BY the count of the matched Keyword for each given Venue. E.g. if there have been 5 CheckIns that have been created, associated with that Keyword, then there should be a returned column (called something like keyword_count) with the value 5 which is sorted.
Ideally this should be done without any subqueries in the SELECT clause, or preferably with no subqueries at all.
I've been struggling with this for a while and my mind is just going blank (perhaps it's been too long a day) so some help would be greatly appreciated here.
Thanks in advance!
Sounds like you need something like:
SELECT v.x, v.y, count(*) AS keyword_count
FROM "venues" v
INNER JOIN "check_ins" c ON c."venue_id" = v."id"
INNER JOIN "keywordings" ks ON ks."check_in_id" = c."id"
INNER JOIN "keywords" k ON ks."keyword_id" = k."id"
WHERE (k."name" = 'foobar')
GROUP BY v.x, v.y
ORDER BY 3
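A quick way to try the GROUP BY approach is an in-memory SQLite database driven from Python. In the sketch below the venue columns (id, name) stand in for the v.x, v.y placeholders, the sample rows are made up, and the sort is by the keyword_count alias in descending order, which is an assumption about the ordering wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venues (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE check_ins (id INTEGER PRIMARY KEY, venue_id INTEGER);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keywordings (check_in_id INTEGER, keyword_id INTEGER);

    -- Hypothetical data: Cafe has 1 'foobar' check-in, Bar has 2 (plus 1 'other'),
    -- Gym has only an 'other' check-in.
    INSERT INTO venues VALUES (1, 'Cafe'), (2, 'Bar'), (3, 'Gym');
    INSERT INTO check_ins VALUES (1, 1), (2, 2), (3, 2), (4, 2), (5, 3);
    INSERT INTO keywords VALUES (1, 'foobar'), (2, 'other');
    INSERT INTO keywordings VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
""")

# One row per venue, with the number of matching check-ins, highest count first.
rows = conn.execute("""
    SELECT v.id, v.name, COUNT(*) AS keyword_count
    FROM venues v
    INNER JOIN check_ins c ON c.venue_id = v.id
    INNER JOIN keywordings ks ON ks.check_in_id = c.id
    INNER JOIN keywords k ON ks.keyword_id = k.id
    WHERE k.name = ?
    GROUP BY v.id, v.name
    ORDER BY keyword_count DESC
""", ("foobar",)).fetchall()

print(rows)  # [(2, 'Bar', 2), (1, 'Cafe', 1)]
```

Grouping replaces the DISTINCT from the original query, so no subquery is needed to get the count.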