Django annotate() multiple times causes wrong answers - sql

Django has the great new annotate() function for querysets. However I can't get it to work properly for multiple annotations in a single queryset.
For example,
tour_list = Tour.objects.all().annotate( Count('tourcomment') ).annotate( Count('history') )
A tour can contain multiple tourcomment and history entries. I'm trying to get how many comments and history entries exist for this tour. The resulting
history__count and tourcomment__count
values will be incorrect. If there's only one annotate() call the value will be correct.
There seems to be some kind of multiplicative effect coming from the two LEFT OUTER JOINs. For example, if a tour has 3 histories and 3 comments, 9 will be the count value for both. 12 histories + 1 comment = 12 for both values. 1 history + 0 comment = 1 history, 0 comments (this one happens to return the correct values).
The resulting SQL call is:
SELECT `testapp_tour`.`id`, `testapp_tour`.`operator_id`, `testapp_tour`.`name`, `testapp_tour`.`region_id`, `testapp_tour`.`description`, `testapp_tour`.`net_price`, `testapp_tour`.`sales_price`, `testapp_tour`.`enabled`, `testapp_tour`.`num_views`, `testapp_tour`.`create_date`, `testapp_tour`.`modify_date`, `testapp_tour`.`image1`, `testapp_tour`.`image2`, `testapp_tour`.`image3`, `testapp_tour`.`image4`, `testapp_tour`.`notes`, `testapp_tour`.`pickup_time`, `testapp_tour`.`dropoff_time`, COUNT(`testapp_tourcomment`.`id`) AS `tourcomment__count`, COUNT(`testapp_history`.`id`) AS `history__count`
FROM `testapp_tour` LEFT OUTER JOIN `testapp_tourcomment` ON (`testapp_tour`.`id` = `testapp_tourcomment`.`tour_id`) LEFT OUTER JOIN `testapp_history` ON (`testapp_tour`.`id` = `testapp_history`.`tour_id`)
GROUP BY `testapp_tour`.`id`
ORDER BY `testapp_tour`.`name` ASC
I have tried combining the results from two querysets that contain a single call to annotate (), but it doesn't work right... You can't really guarantee that the order will be the same. and it seems overly complicated and messy so I've been looking for something better...
tour_list = Tour.objects.all().filter(operator__user__exact = request.user ).filter(enabled__exact = True).annotate( Count('tourcomment') )
tour_list_historycount = Tour.objects.all().filter( enabled__exact = True ).annotate( Count('history') )
for i,o in enumerate(tour_list):
o.history__count = tour_list_historycount[i].history__count
Thanks for any help. Stackoverflow has saved my butt in the past with a lot of already-answered questions, but I wasn't able to find an answer to this one yet.

Thanks for your comment. That didn't quite work but it steered me in the right direction. I was finally able to solve this by adding distinct to both Count() calls:
Count('tourcomment', distinct=True)

tour_list = Tour.objects.all().annotate(tour_count=Count('tourcomment',distinct=True) ).annotate(history_count=Count('history',distinct=True) )
You have to add distinct=True to get the proper result else it will return the wrong answer.

I can't guarantee that this will solve your problem, but try appending .order_by() to your call. That is:
tour_list = Tour.objects.all().annotate(Count('tourcomment')).annotate(Count('history')).order_by()
The reason for this is that django needs to select all the fields in the ORDER BY clause, which causes otherwise identical results to be selected. By appending .order_by(), you're removing the ORDER BY clause altogether, which prevents this from happening. See the aggregation documentation for more information on this issue.

Related

ORA-01841 happens on one environment but not all

I have the following SQL-code in my (SAP IdM) Application:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
I already found the solution for the ORA-01841 problem, as it could either be a solution similar to MSSQLs try_to_date as mentioned here: How to handle to_date exceptions in a SELECT statment to ignore those rows?
or a solution where I change the code to something like this, to work soly on strings:
Select mcmskeyvalue as MKV,v1.searchvalue as STARTDATE, v2.avalue as Running_Changes_flag
from idmv_entry_simple
inner join idmv_value_basic_active v1 on mskey = mcmskey and attrname = 'Start_of_company_change'
and mcentrytype = 'MX_PERSON' and v1.searchvalue<= to_char(sysdate+3,'YYYY-MM-DD')
left join idmv_value_basic v2 on v2.mskey = mcmskey and v2.attrname = 'Running_Changes_flag'
where mcmskey not in (Select mskey from idmv_value_basic_active where attrname = 'Company_change_running_flag')
So for the actually problem I have a solution.
But now I came into discussion with my customers and teammates why the error happens at all.
Basically for all entries of idmv_value_basic_activ that comply to the requirement of "attrname = 'Start_of_company_change'" we can be sure that those are dates. In addition, if we execute the query to check all values that would be delivered, all are in a valid format.
I learned in university that the DB-Engine could decide in which order it will run individual segments of a query. So for me the most logical explanation would be that, on the development environment (where we face the problem), the section " to_date(v1.searchvalue,'YYYY-MM-DD')<= sysdate+3” is executed before the section “attrname = 'Start_of_company_change'”
Whereas on the productive environment, where everything works like a charm, the segments are executed in the order that is descripted by the SQL Statement.
Now my Question is:
First: do I remember that right, since the teacher said that only once and at that time I could not really make sense out of it
And Second: Is this assumption of mine correct or is there another reason for the problem?
Borderinformation:
The Tool uses a kind of shifted data structure which is why there can be quite a few different types in the actual “Searchvalue” column of the idmv_value_basic_activ view. The datatype on the database layer is always a varchar one.
"the DB-Engine could decide in which order it will run individual segments of a query"
This is correct. A SQL query is just a description of the data you want and where it's stored. Oracle will calculate an execution plan to retrieve that data as best it can. That plan will vary based on any number of factors, like the number of actual rows in the table and the presence of indexes, so it will vary from environment to environment.
So it sounds like you have an invalid date somewhere in your table, so to_date raises an exception. You can use validate_conversion to find it.

SQL Query not searching data if record contains zero

I have two tables and with column paperNo and some data regarding that paper. I am trying to search all data based on paper no. from both the tables. I have successfully written the query and it is retrieving the data successfully. but I have noticed that. If my paperNo contains zero(0) then the query is not searching for that data. And for the non zero contains paperNo it is retrieving the same record twice.
I don't understand what is going wrong. tried every thing.
Here is my Query .-
SELECT PaperDate.paperNo,
PaperDate.RAW_PAPER,
PaperDate.EDGE_SEALED,
PaperDate.HYDRO_120,
PaperDate.HYDRO_350,
PaperDate.CATALYST_1ST,
PaperDate.CATALYST_2ND,
PaperDate.SIC_350,
tblThicknessPaperDate.rawThickness,
tblThicknessPaperDate.catThickness,
tblThicknessPaperDate.sicThickness,
tblThicknessPaperDate.rejectedThickness
FROM tblThicknessPaperDate
FULL OUTER JOIN PaperDate ON PaperDate.paperNo =tblThicknessPaperDate.paperNo
WHERE (tblThicknessPaperDate.paperNo = #paperNo)
I would try:
FROM tblThicknessPaperDate
RIGHT JOIN PaperDate ON PaperDate.paperNo =tblThicknessPaperDate.paperNo
WHERE (PaperDate.paperNo = #paperNo)
The two changes are: swapping to a right join so even if a record isn't in tblThicknessPaperDate we will still see the record in PaperDate. The other change is to use PapterDate.paperNo in the where clause. Since tblThicknessPaperDate.paperNo could be null we don't want to use that in the where if we can avoid it.
SELECT PaperDate.paperNo,
PaperDate.RAW_PAPER,
PaperDate.EDGE_SEALED,
PaperDate.HYDRO_120,
PaperDate.HYDRO_350,
PaperDate.CATALYST_1ST,
PaperDate.CATALYST_2ND,
PaperDate.SIC_350,
tblThicknessPaperDate.rawThickness,
tblThicknessPaperDate.catThickness,
tblThicknessPaperDate.sicThickness,
tblThicknessPaperDate.rejectedThickness
FROM tblThicknessPaperDate
FULL OUTER JOIN PaperDate ON PaperDate.paperNo =tblThicknessPaperDate.paperNo
WHERE (tblThicknessPaperDate.paperNo = #papNo | PaperDate.paperNo = #paperNo)

relationships produce duplicate records on query, DISTINCT does not work, any other solutions available?

I am new to this portal. I have a very simple problem to be solved. It is related to the ANSI SQL. I am writing a reports using BIRT and I am fetching the data from several tables. I understand how the SQL joins work but maybe not fully. I researched google for hours and I could not find relevant answer.
My problem is that one of the relationships in the code produce a duplicate result (the same row is copied - duplicated). I was so determined to solve it I used every type of join available. Some of this SQL was produced already. I shall post my code below. I know that one of the solutions to my problem is use of the 'DISTINCT' keyword. I have used it and it does not solve my problem.
Can anyone propose any solution to that?
Sample code:
SELECT DISTINCT
partmaster.partdesc,
partmaster.uom,
traders.name AS tradername,
worksorders.id AS worksorderno,
worksorders.partid,
worksorders.quantity,
worksorders.duedate,
worksorders.traderid,
worksorders.orderid,
routingoperations.partid,
routingoperations.methodid,
routingoperations.operationnumber,
routingoperations.workcentreid,
routingoperations.settime,
routingoperations.runtime,
routingoperations.perquantity,
routingoperations.description,
routingoperations.alternativeoperation,
routingoperations.alternativeoperationpreference,
machines.macdesc,
machines.msection,
allpartmaster.partnum,
allpartmaster.nbq,
allpartmaster.partdesc,
routingoperationtools.toolid,
tools.tooldesc,
CAST (emediadetails.data as VARCHAR(MAX)) AS cplandata
FROM worksorders
INNER JOIN partmaster ON worksorders.partid = partmaster.partnum
INNER JOIN traders traders ON worksorders.traderid = traders.id
INNER JOIN routingoperations routingoperations ON worksorders.partid = routingoperations.partid
AND worksorders.routingmethod = routingoperations.methodid
INNER JOIN allpartmaster allpartmaster ON routingoperations.partid = allpartmaster.partnum
LEFT OUTER JOIN machines machines ON routingoperations.workcentreid = machines.macid
LEFT OUTER JOIN routingoperationtools routingoperationtools ON routingoperationtools.partid = routingoperations.partid
AND routingoperationtools.routingmethod = routingoperations.methodid
AND routingoperationtools.operationnumber = routingoperations.operationnumber
LEFT OUTER JOIN tools tools ON tools.toolid = routingoperationtools.toolid
LEFT OUTER JOIN emediadetails ON emediadetails.keyvalue1 = worksorders.id
AND emediadetails.keyvalue2 = routingoperations.operationnumber
AND emediadetails.emediaid = 'worksorderoperation'
I do not have too much of the test data but I know that one row is copied twice as the result of the query below even tho I used DISTINCT keyword. I know that my problem is rather specific and not general but the solution that someone will propose may help others with the similar problem.
I can't solve your problem for you without some test data, but I have some helpful hints.
In principle, you should be really careful with DISTINCT - its a great way of hiding bugs in your query. Only use DISTINCT if you are confident that the underlying data contains legitimate duplicates. If your joins are wrong, and you're getting a cartesian product, you can remove the duplicates from the results with DISTINCT - but that doesn't stop the cartesian product being generated. You'll get very poor performance, and possibly incorrect data.
Secondly, I am pretty sure that DISTINCT works properly - you are almost certainly not getting duplicates, but it may be hard to spot the difference between two rows. Leading or trailing spaces in text columns, for instance could be to blame.
Finally, to work through this problem, I'd recommend building the query up join by join, and seeing where you get the duplicate - that's the join that's to blame.
So, start with:
SELECT
traders.name AS tradername,
worksorders.id AS worksorderno,
worksorders.partid,
worksorders.quantity,
worksorders.duedate,
worksorders.traderid,
worksorders.orderid
FROM worksorders
INNER JOIN traders traders ON
worksorders.traderid = traders.id
and build up to the next join.
Are you sure the results are exact duplicates? Makes sure there isn't one column that actually has a different value.

sql query question inner join

LEFT JOIN PatientClinics AB ON PPhy.PatientID = AB.PatientID
JOIN Clinics CL ON CL.ID = AB.ClinicID
AND COUNT(AB.ClinicID) = 1
I get error using Count(AB.ClinicID) = 1 (ClinicID has duplicate values in the table and
I want to use only 1 value of each duplicate value of ClinicId to produce result)
What mistake am I making?
I've never seen a COUNT() being used in a JOIN before. Maybe you should use:
HAVING COUNT(AB.ClinicID) = 1
instead.
Count() can't be used as a join/filter predicate. It can be used in the HAVING clause however. You should include the entire query in order to get a better example of how to rewrite it.
maybe investigate the HAVING clause instead of using COUNT where you put it.
hard to help without the full query.

Access query returns empty fields depending on how table is linked

I've got an Access MDB I use for reporting that has linked table views from SQL Server 2005. I built a query that retrieves information off of a PO table and categorizes the line item depending on information from another table. I'm relatively certain the query was fine until approximately a month ago when we shifted from compatibility mode 80 to 90 on the Server as required by our primary application (which creates the data). I can't say this with 100% certainty, but that is the only major change made in the past 90 days. We noticed that suddenly data was not showing up in the query making the reports look odd.
This is a copy of the failing query:
SELECT dbo_porel.jobnum, dbo_joboper.opcode, dbo_porel.jobseqtype,
dbo_opmaster.shortchar01,
dbo_porel.ponum, dbo_porel.poline, dbo_podetail.unitcost
FROM ((dbo_porel
LEFT JOIN dbo_joboper ON (dbo_porel.assemblyseq = dbo_joboper.assemblyseq)
AND (dbo_porel.jobseq = dbo_joboper.oprseq)
AND (dbo_porel.jobnum = dbo_joboper.jobnum))
LEFT JOIN dbo_opmaster ON dbo_joboper.opcode = dbo_opmaster.opcode)
LEFT JOIN dbo_podetail ON (dbo_porel.poline = dbo_podetail.poline)
AND (dbo_porel.ponum = dbo_podetail.ponum)
WHERE (dbo_porel.jobnum="367000003")
It returns the following:
jobnum opcode jobseqtype shortchar01 ponum poline unitcost
367000003 S 6624 2 15
The query normally should have displayed a value for opcode and shortchar01. If I remove the linked table dbo_podetail, it properly displays data for these fields (although I obviously don't have unitcost anymore). At first I thought it might be a data issue, but I found if I nested the query and then linked the table, it worked fine.
For example the following code works perfectly:
SELECT qryTest.*, dbo_podetail.unitcost
FROM (
SELECT dbo_porel.jobnum, dbo_joboper.opcode, dbo_porel.jobseqtype,
dbo_opmaster.shortchar01, dbo_porel.ponum, dbo_porel.poline
FROM (dbo_porel
LEFT JOIN dbo_joboper ON (dbo_porel.jobnum=dbo_joboper.jobnum)
AND (dbo_porel.jobseq=dbo_joboper.oprseq)
AND (dbo_porel.assemblyseq=dbo_joboper.assemblyseq))
LEFT JOIN dbo_opmaster ON dbo_joboper.opcode=dbo_opmaster.opcode
WHERE (dbo_porel.jobnum="367000003")
) As qryTest
LEFT JOIN dbo_podetail ON (qryTest.poline = dbo_podetail.poline)
AND (qryTest.ponum = dbo_podetail.ponum)
I'm at a loss for why it works in the latter case and not in the first case. Worse yet, it seems to work intermittently for some records and not for others (it's consistent about the ones it does and does not work for).
Do any of you experts have any ideas?
You definitely need to use subqueries for multiple left/right joins in Access.
I think it's a limitation of the Jet optimizer that gets confused if you're just chaining left/right joins.
You can see that this is a recurrent problem that surfaces often.
I'm always confused by Access' use of brackets in joins. Try stripping out the extra brackets.
FROM
dbo_porel
LEFT JOIN
dbo_joboper ON (dbo_porel.assemblyseq = dbo_joboper.assemblyseq)
AND (dbo_porel.jobseq = dbo_joboper.oprseq)
AND (dbo_porel.jobnum = dbo_joboper.jobnum)
LEFT JOIN
dbo_opmaster ON (dbo_joboper.opcode = dbo_opmaster.opcode)
LEFT JOIN
dbo_podetail ON (dbo_porel.poline = dbo_podetail.poline)
AND (dbo_porel.ponum = dbo_podetail.ponum)
OK the above doesn't work - Sorry I give up