Filtering simultaneously on count of related objects and on count of related objects that satisfy a condition in Django - sql

So I have models amounting to this (very simplified, obviously):
class Mystery(models.Model):
name = models.CharField(max_length=100)
class Character(models.Model):
mystery = models.ForeignKey(Mystery, related_name="characters")
required = models.BooleanField(default=True)
Basically, in each mystery there are a number of characters, which can be essential to the story or not. The minimum number of actors that can stage a mystery is the number of required characters for that mystery; the maximum number is the number of characters total for the mystery.
Now I'm trying to query for mysteries that can be played by some given number of actors. It seemed straightforward enough using the way Django's filtering and annotation features function; after all, both of these queries work fine:
# Returns mystery objects with at least x characters in all
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(max_actors__gte=x)
# Returns mystery objects with no more than x required characters
Mystery.objects.filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x)
However, when I try to combine the two...
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x, max_actors__gte=x)
...it doesn't work. Both min_actors and max_actors come out containing the maximum number of actors. The relevant parts of the actual query being run look like this:
SELECT `mysteries_mystery`.`id`,
`mysteries_mystery`.`name`,
COUNT(DISTINCT `mysteries_character`.`id`) AS `max_actors`,
COUNT(DISTINCT `mysteries_character`.`id`) AS `min_actors`
FROM `mysteries_mystery`
LEFT OUTER JOIN `mysteries_character` ON (`mysteries_mystery`.`id` = `mysteries_character`.`mystery_id`)
INNER JOIN `mysteries_character` T5 ON (`mysteries_mystery`.`id` = T5.`mystery_id`)
WHERE T5.`required` = True
GROUP BY `mysteries_mystery`.`id`, `mysteries_mystery`.`name`
...which makes it clear that while Django is creating a second join on the character table just fine (the second copy of the table being aliased to T5), that table isn't actually being used anywhere and both of the counts are being selected from the non-aliased version, which obviously yields the same result both times.
Even when I try to use an extra clause to select from T5, I get told there is no such table as T5, even as examining the output query shows that it's still aliasing the second character table to T5. Another attempt to do this with extra clauses went like this:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).extra(select={'min_actors': "SELECT COUNT(*) FROM mysteries_character WHERE required = True AND mystery_id = mysteries_mystery.id"}).extra(where=["`min_actors` <= %s", "`max_actors` >= %s"], params=[x, x])
But that didn't work because I can't use a calculated field in the WHERE clause, at least on MySQL. If only I could use HAVING, but alas, Django's .extra() does not and will never allow you to set HAVING parameters.
Is there any way to get Django's ORM to do what I want?

How about combining your Count()s:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True),min_actors=Count('characters', distinct=True)).filter(characters__required=True).filter(min_actors__lte=x, max_actors__gte=x)
This seems to work for me but I didn't test it with your exact models.

It's been a couple of weeks with no suggested solutions, so here's how I ended up going about it, for anyone else who might be looking for an answer:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(max_actors__gte=x, id__in=Mystery.objects.filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x).values('id'))
In other words, filter on the first count and on IDs that match those in an explicit subquery that filters on the second count. Kind of clunky, but it works well enough for my purposes.

Related

SQLite alias (AS) not working in the same query

I'm stuck in an (apparently) extremely trivial task that I can't make work , and I really feel no chance than to ask for advice.
I used to deal with PHP/MySQL more than 10 years ago and I might be quite rusty now that I'm dealing with an SQLite DB using Qt5.
Basically I'm selecting some records while wanting to make some math operations on the fetched columns. I recall (and re-read some documentation and examples) that the keyword "AS" is going to conveniently rename (alias) a value.
So for example I have this query, where "X" is an integer number that I render into this big Qt string before executing it with a QSqlQuery. This query lets me select all the electronic components used in a Project and calculate how many of them to order (rounding to the nearest multiple of 5) and the total price per component.
SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price, UsedItems.qty_used as used_qty,
UsedItems.qty_used*X AS To_Order,
ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5,
Inventory.price*Nearest5 AS TotPrice
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'
ORDER BY RefDes, value ASC
So, for example, I aliased UsedItems.qty_used as used_qty. At first I tried to use it in the next field, multiplying it by X, writing "used_qty*X AS To_Order" ... Query failed. Well, no worries, I had just put the original tab.field name and it worked.
Going further, I have a complex calculation and I want to use its result on the next field, but the same issue popped out: if I alias "ROUND(...)" AS Nearest5, and then try to use this value by multiplying it in the next field, the query will fail.
Please note: the query WORKS, but ONLY if I don't use aliases in the following fields, namely if I don't use the alias Nearest5 in the TotPrice field. I just want to avoid re-writing the whole ROUND(...) thing for the TotPrice field.
What am I missing/doing wrong? Either SQLite does not support aliases on the same query or I am using a wrong syntax and I am just too stuck/confused to see the mistake (which I'm sure it has to be really stupid).
Column aliases defined in a SELECT cannot be used:
For other expressions in the same SELECT.
For filtering in the WHERE.
For conditions in the FROM clause.
Many databases also restrict their use in GROUP BY and HAVING.
All databases support them in ORDER BY.
This is how SQL works. The issue is two things:
The logic order of processing clauses in the query (i.e. how they are compiled). This affects the scoping of parameters.
The order of processing expressions in the SELECT. This is indeterminate. There is no requirement for the ordering of parameters.
For a simple example, what should x refer to in this example?
select x as a, y as x
from t
where x = 2;
By not allowing duplicates, SQL engines do not have to make a choice. The value is always t.x.
You can try with nested queries.
A SELECT query can be nested in another SELECT query within the FROM clause;
multiple queries can be nested, for example by following the following pattern:
SELECT *,[your last Expression] AS LastExp From (SELECT *,[your Middle Expression] AS MidExp FROM (SELECT *,[your first Expression] AS FirstExp FROM yourTables));
Obviously, respecting the order that the expressions of the innermost select query can be used by subsequent select queries:
the first expressions can be used by all other queries, but the other intermediate expressions can only be used by queries that are further upstream.
For your case, your query may be:
SELECT *, PRC*Nearest5 AS TotPrice FROM (SELECT *, ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5 FROM (SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category, Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer, Inventory.price AS PRC, UsedItems.qty_used*X AS To_Order FROM Inventory LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid WHERE UsedItems.pid='1' ORDER BY RefDes, value ASC))

Order by in subquery behaving differently than native sql query?

So I am honestly a little puzzled by this!
I have a query that returns a set of transactions that contain both repair costs and an odometer reading at the time of repair on the master level. To get an accurate Cost per mile reading I need to do a subquery to get both the first meter reading between a starting date and an end date, and an ending meter.
(select top 1 wf2.ro_num
from wotrans wotr2
left join wofile wf2
on wotr2.rop_ro_num = wf2.ro_num
and wotr2.rop_fac = wf2.ro_fac
where wotr.rop_veh_num = wotr2.rop_veh_num
and wotr.rop_veh_facility = wotr2.rop_veh_facility
AND ((#sdate = '01/01/1900 00:00:00' and wotr2.rop_tran_date = 0)
OR ([dbo].[udf_RTA_ConvertDateInt](#sdate) <= wotr2.rop_tran_date
AND [dbo].[udf_RTA_ConvertDateInt](#edate) >= wotr2.rop_tran_date))
order by wotr2.rop_tran_date asc) as highMeter
The reason I have the tables aliased as xx2 is because those tables are also used in the main query, and I don't want these to interact with each other except to pull the correct vehicle number and facility.
Basically when I run the main query it returns a value that is not correct; it returns the one that is second(keep in mind that the first and second have the same date.) But when I take the subquery and just copy and paste it into it's own query and run it, it returns the correct value.
I do have a work around for this, but I am just curious as to why this happening. I have searched quite a bit and found not much(other than the fact that people don't like order bys in subqueries). Talking to one of my friends that also does quite a bit of SQL scripting, it looks to us as if the subquery is ordering differently than the subquery by itsself when you have multiple values that are the same for the order by(i.e. 10 dates of 08/05/2016).
Any ideas would be helpful!
Like I said I have a work around that works in this one case, but don't know yet if it will work on a larger dataset.
Let me know if you want more code.

Access query speed differs

I have a local access database and in it a query which takes values from a form to populate a drop down menu. The weird (to me) thing is that with most options this query is quick (blink of an eye), but with a few options it's very slow (>10 seconds).
What the query is does is a follows: It populates a dropdown menu to record animals seen at a specific sighting, but only those animals which have not been recorded at that specific sighting yet (to avoid duplicate entries).
SELECT DISTINCT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblSightings INNER JOIN (tblAnimals INNER JOIN tblAnimalsatSighting ON tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID) ON tblSightings.SightingID = tblAnimalsatSighting.SightingID
WHERE (((tblAnimals.Species)=[form]![Species]) AND ((tblAnimals.CurrentGroup)=[form]![AnimalGroup2]) AND ((tblAnimals.[Dead?])=False) AND ((Exists (select tblAnimalsatSighting.AnimalID FROM tblAnimalsatSighting WHERE tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID AND tblAnimalsatSighting.SightingID = [form]![SightingID]))=False));
It performs well for all groups of 2 of the 4 possible species, for 1 species it performs well for 4 of the 5 groups, but not for the last group, and for the last species it performs very slowly for both groups. Anybody an idea what can be the cause of this kind of behavior? Is it problems with the query? Or duplicate entries in the tables which can cause this? I don't think it's duplicates in the tables, I've checked that, and there are some, but they appear both for groups where there are problems and where there aren't. Could I re-write the query so it performs faster?
As noted in our comments above, you confirmed that the extra joins were not really need and were in fact going to limit the results to animal that had already had a sighting. Those joins would also likely contribute to a slowdown.
I know that Access probably added most of the parentheses automatically but I've removed them and converted the subquery to a not exists form that's a lot more readable.
SELECT tblAnimals.AnimalID, tblAnimals.Nickname, tblAnimals.Species
FROM tblAnimals
WHERE
tblAnimals.Species = [form]![Species]
AND tblAnimals.CurrentGroup = [form]![AnimalGroup2]
AND tblAnimals.[Dead?] = False
AND NOT EXISTS (
SELECT tblAnimalsatSighting.AnimalID
FROM tblAnimalsatSighting
WHERE
tblAnimals.AnimalID = tblAnimalsatSighting.AnimalID
AND tblAnimalsatSighting.SightingID = [form]![SightingID]
);

SQL MIN() returns multiple values?

I am using SQL server 2005, querying with Web Developer 2010, and the min function appears to be returning more than one value (for each ID returned, see below). Ideally I would like it to just return the one for each ID.
SELECT Production.WorksOrderOperations.WorksOrderNumber,
MIN(Production.WorksOrderOperations.OperationNumber) AS Expr1,
Production.Resources.ResourceCode,
Production.Resources.ResourceDescription,
Production.WorksOrderExcel_ExcelExport_View.PartNumber,
Production.WorksOrderOperations.PlannedQuantity,
Production.WorksOrderOperations.PlannedSetTime,
Production.WorksOrderOperations.PlannedRunTime
FROM Production.WorksOrderOperations
INNER JOIN Production.Resources
ON Production.WorksOrderOperations.ResourceID = Production.Resources.ResourceID
INNER JOIN Production.WorksOrderExcel_ExcelExport_View
ON Production.WorksOrderOperations.WorksOrderNumber = Production.WorksOrderExcel_ExcelExport_View.WorksOrderNumber
WHERE Production.WorksOrderOperations.WorksOrderNumber IN
( SELECT WorksOrderNumber
FROM Production.WorksOrderExcel_ExcelExport_View AS WorksOrderExcel_ExcelExport_View_1
WHERE (WorksOrderSuffixStatus = 'Proposed'))
AND Production.Resources.ResourceCode IN ('1303', '1604')
GROUP BY Production.WorksOrderOperations.WorksOrderNumber,
Production.Resources.ResourceCode,
Production.Resources.ResourceDescription,
Production.WorksOrderExcel_ExcelExport_View.PartNumber,
Production.WorksOrderOperations.PlannedQuantity,
Production.WorksOrderOperations.PlannedSetTime,
Production.WorksOrderOperations.PlannedRunTime
If you can get your head around it, I am selecting certain columns from multiple tables where the WorksOrderNumber is also contained within a subquery, and numerous other conditions.
Result set looks a little like this, have blurred out irrelevant data.
http://i.stack.imgur.com/5UFIp.png (Wouldn't let me embed image).
The highlighted rows are NOT supposed to be there, I cannot explicitly filter them out, as this result set will be updated daily and it is likely to happen with a different record.
I have tried casting and converting the OperationNumber to numerous other data types, varchar type returns '100' instead of the '30'. Also tried searching search engines, no one seems to have the same problem.
I did not structure the tables (they're horribly normalised), and it is not possible to restructure them.
Any ideas appreciated, many thanks.
The MIN function returns the minimum within the group.
If you want the minimum for each ID you need to get group on just ID.
I assume that by "ID" you are referring to Production.WorksOrderOperations.WorksOrderNumber.
You can add this as a "table" in your SQL:
(SELECT Production.WorksOrderOperations.WorksOrderNumber,
MIN(Production.WorksOrderOperations.OperationNumber)
FROM Production.WorksOrderOperations
GROUP BY Production.WorksOrderOperations.WorksOrderNumber)

Rails SQL query optimization

I have a featured section in my website that contains featured posts of three types: normal, big and small. Currently I am fetching the three types in three separate queries like so:
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :big).limit(1)
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :small).limit(1)
#featured_big_first = Post.visible.where(pinged: 1).where('overlay_type =?', :normal).limit(5)
Basically I am looking for a query that will combine those three in to one and fetch 1 big, 1 small, 5 normal posts.
I'd be surprised if you don't want an order. As you have it, it is supposed to find a random small, random large, and 5 random normal.
Yes, you can use a UNION. However, you will have to do an execute SQL. Look at the log for the SQL for each of your three queries, and do an execute SQL of a string which is each of the three queries with UNION in between. It might work, or it might have problems with the limit.
It is possible in SQL by joining the table to itself, doing a group by on one of the aliases for the table, a where when the other aliased table is <= the group by table, and adding a having clause where count of the <= table is under the limit.
So, if you had a simple query of the posts table (without the visible and pinged conditions) and wanted the records with the latest created_at date, then the normal query would be:
SELECT posts1.*
FROM posts posts1, posts posts2
WHERE posts2.created_at >= posts1.create_at
AND posts1.overlay_type = 'normal'
AND posts2.overlay_type = 'normal'
GROUP BY posts1.id
HAVING count(posts2.id) <= 5
Take this SQL, and add your conditions for visible and pinged, remembering to use the condition for both posts1 and posts2.
Then write the big and small versions and UNION it all together.
I'd stick with the three database calls.
I don't think this is possible but you can use scope which is more rails way to write a code
Also it may just typo but you are reassigning the #featured_big_first so it will contain the data of the last query only
in post.rb
scope :overlay_type_record lamda{|type| joins(:visible).where(["visible.pinged=1 AND visible.overlay_type =", type])}
and in controller
#featured_big_first = Post.overlay_type_record(:big).limit(1)
#featured_small_first = Post.overlay_type_record(:small).limit(1)
#featured_normal_first = Post.overlay_type_record(:normal).limit(5)