SQL Left Outer Join on Subquery - sql

I am attempting to build a query that contains a left join subquery - based on the principles I learned in a previous question - that should pull similar data sets from two different tables. The goal is to compare volume data by account || platform to ensure that the stored procedure that creates one table from another is doing so correctly.
The idea is this:
Account || Product || T1Vol || T2Vol
abc AT 10 10
def RT 20 25
ghi OB 30
So with this example, the idea is to pull all accounts and products from T1 (the table the procedure acts on) and any accounts and products from T2 (the newly created table) where there is a match (so, Left Join on T1 = T2). (Ideally, everything will match perfectly, with no variance in T1 vs T2 vol and no nulls in T2 volume).
I wrote the following the query to accomplish this but its not quite working. The current error I get is not a GROUP BY expression - which I don't think is the real issue. I have been searching and with iterations to no avail.
The query is below. (To keep with the example, T1 = OpStats and T2 = RegSplits. Any help is much appreciated.
SELECT DTA.trading_code Account, OpStats.product_dwkey Platform, SUM(OpStats.risk_amount_adj)/1000000 OpStatsVol, RegSplits.Volume RegSplitsVol
FROM fact_trade_presplit_rollup OpStats
INNER JOIN dim_trading_accounts DTA ON OpStats.trading_dwkey=DTA.trading_dwkey
LEFT OUTER JOIN
( SELECT b.trading_Code Account, a.product_dwkey Platform, SUM(a.risk_amount_adj)/1000000 Volume
FROM fact_trade_rollup a
INNER JOIN dim_trading_accounts b on a.trading_dwkey=b.trading_dwkey
WHERE a.account_type IN('Customer','Taker')
AND a.date_key>='01-JAN-16'
AND a.date_key<='31-MAR-16'
AND a.daily_db_metric NOT IN ('Manual Treasury Volume ($B)', 'Manual Volume ($B)', 'HSBC-WL POMS (Internal) Volume ($B)','JPMC-WL Order Book (Internal) Volume ($B)')
AND (a.product_dwkey IN('RT','HWL') AND a.source_name<>'STP')
GROUP BY b.trading_code, a.product_dwkey ) RegSplits
ON (DTA.trading_code = RegSplits.Account) /* is it because I am trying to join DTA to the subquery */
WHERE OpStats.account_type IN('Customer','Taker')
AND OpStats.date_key>='01-JAN-16'
AND OpStats.date_key<='31-MAR-16'
AND OpStats.daily_db_metric NOT IN ('Manual Treasury Volume ($B)', 'Manual Volume ($B)', 'HSBC-WL POMS (Internal) Volume ($B)','JPMC-WL Order Book (Internal) Volume ($B)')
AND (OpStats.product_dwkey IN('RT','HWL') AND OpStats.source_name<>'STP')
GROUP BY DTA.trading_code, OpStats.product_dwkey;

The "Not group by expression" error is very easy to check.
Just compare SELECT expressions with GROUP BY expressions:
SELECT DTA.trading_code Account,
OpStats.product_dwkey Platform,
SUM(OpStats.risk_amount_adj)/1000000 OpStatsVol,
RegSplits.Volume RegSplitsVol
FROM ......
......
GROUP BY DTA.trading_code,
OpStats.product_dwkey;
There are two elements in SELECT that are not in GROUP BY:
SUM(OpStats.risk_amount_adj)/1000000 OpStatsVol
RegSplits.Volume RegSplitsVol
The number 1 is OK - it's an aggregate function, it cannot be in GROUP BY.
The number 2 caused this error - it's not an aggregate function, and it is not listed in GROUP BY clause.

Related

Previous Record With Cross Apply Syntax

I have a table called ArchiveActivityDetails which shows the history of a Customer Repair Order. 1 Repair Order will have many visits (ActivityID) with a Technician allocated depending on who is available for that planned visit.
The system automatically allocates the time that is required for a job but sometimes a job requires longer so we manually ammend jobs.
My initial query from the customer was to pull the manually ammended jobs (ie: jobs where PlannedDuration >=60 minutes) and shows the Technician linked to that manually ammended job.
This report works fine.
My most recent request from the customer is to now ADD a column showing WHO WAS THE PREVIOUS TECHNICIAN linked that the Repair Order.
My collegues suggested I do a Cross Apply going back to the ArchiveActivityDetails table and then show "Previous Tech" but I have not used Cross Apply before and I am struggling with the syntax and unable to get the results I want. In my Cross Apply I used LAG to work out the 'PrevTech' but when pulling it into my main report, I get NULL. So I assume I am not doing the Cross Apply correctly.
DECLARE #DateFrom as DATE = '2019-05-20'
DECLARE #DATETO AS DATE = '2019-07-23'
----------------------------------------------------------------------------------
SELECT
AAD.Date
,ASM.ASM
,A.ASM as PrevASM
,ASM.KDGID2
,R.ResourceName
,R.ID_ResourceID
,A.ServiceOrderNumber
,CONCAT(EN.TECHVORNAME, ' ' , EN.TECHNACHNAME) as TechName
,A.PrevTech
,EN.TechnicianID
,AAD.ID_ActivityID
,SO.ServiceOrderNumber
,AAD.VisitNumber
,AAD.PlannedDuration
,AAD.ActualDuration
,AAD.PlannedDuration-AAD.ActualDuration as DIFF
,DR.Original_Duration
FROM
[Easy].[ASMTrans] AS ASM
INNER JOIN
[FS_OTBE].[EngPayrollNumbers] AS EN
ON ASM.KDGID2 = EN.KDGID2
INNER JOIN
[OFSA].[ResourceID] AS R
ON EN.TechnicianID = Try_cast(R.ResourceName as int)
INNER JOIN
[OFSDA].[ArchiveActivityDetails] as [AAD]
ON R.[ID_ResourceID] = AAD.ID_ResourceID
INNER JOIN
[OFSA].[ServiceOrderNumber] SO
ON SO.ID_ServiceOrderNumber = AAD.ID_ServiceOrderNumber
LEFT JOIN
[OFSE].[DurationRevision] DR
on DR.ID_ActivityID = AAD.ID_ActivityID
CROSS APPLY
(
SELECT
AD.Date
,AD.ID_CountryCode
,AD.ID_Status
,Activity_TypeID
,AD.ID_ActivityID
,AD.ID_ResourceID
,SO.ServiceOrderNumber
,ASM.ASM
,LAG(EN.TECHVORNAME+ ' '+EN.TECHNACHNAME) OVER (ORDER BY SO.ServiceOrderNumber,AD.ID_ActivityID) as PrevTech
,AD.VisitNumber
,AD.ID_ServiceOrderNumber
,AD.PlannedDuration
,AD.ActualDuration
,ROW_NUMBER() OVER (PARTITION BY AD.ID_ServiceOrderNumber Order by AD.ID_ActivityID,AD.Date) as ROWNUM
FROM
[Easy].[ASMTrans] AS ASM
INNER JOIN
[FS_OTBE].[EngPayrollNumbers] AS EN
ON ASM.KDGID2 = EN.KDGID2
INNER JOIN
[OFSA].[ResourceID] AS R
ON EN.TechnicianID = Try_cast(R.ResourceName as int)
INNER JOIN
[OFSDA].[ArchiveActivityDetails] as [AD]
ON R.[ID_ResourceID] = AD.ID_ResourceID
INNER JOIN
[OFSA].[ServiceOrderNumber] SO
ON SO.ID_ServiceOrderNumber = AD.ID_ServiceOrderNumber
WHERE
AAD.ID_ActivityID = AD.ID_ActivityID
AND
AD.ID_CountryCode = AAD.ID_CountryCode
AND AD.ID_Status = AAD.ID_Status
AND AD.ID_ResourceID = AAD.ID_ResourceID
AND AD.Activity_TypeID = AAD.Activity_TypeID
AND AD.ID_ServiceOrderNumber = AAD.ID_ServiceOrderNumber
AND AD.Date >= '2019-05-01'
) as A
WHERE
ASM.KDGID2
IN (50008323,50008326,50008329,50008332,50008335,50008338,50008341,50008344,50008347,50008350,50008353,50008356,50008359,50008362,50008365)
AND AAD.ID_Status = 1
AND AAD.ID_CountryCode = 7
AND AAD.Activity_TypeID=91
AND
(
AAD.[Date] BETWEEN IIF(#DateFrom < '20190520','20190520',#DateFrom) AND IIF(#DateTo < '20190520','20190520',#DateTo))
AND AAD.ActualDuration > 11
AND
(
(DR.Original_Duration >= 60)
OR
(DR.ID_ActivityID IS NULL AND AAD.PlannedDuration >= 60))
I expect to see the previous Tech and previous Area Sales Manager for the job that was Manually Ammended.
Business Reason: Managers want to see who initially requested for the job to be Manually Ammended. The time requested is being over estimated which is wasting time. To plan better they need to see who requests extra time at a job and try to reduce the time.
I will attach the ArchiveActivityDetail table showing the history of a Repair Order as well as expected results.
Your query results in the cross apply will appear as a table in your query, so you can use top(1) and order by descending to get the first row ordered by what you want (it looks like ActivityId? maybe VisitNumber?).
Simplifying to get at the root of the issue, say you have just one table with ServiceOrderNumber, ID_Activity, ASM, and TECH. To get the previous row for activity 2414073 you would do this:
select top(1) ASM, TECH
from OFSDA.ArchiveActivityDetails as AD
where ID_ServiceOrderNumber = 2370634229 -- same ServiceOrderNumber
and ID_Activity < 2414073 -- previous activities
order by ID_Activity desc -- highest activity less than 2414073
Instead of cross apply, you probably want to use outer apply. This is the same but you will get a row in your main query for the first activity, it will just have nulls for values in your apply. If you want the first row omitted from your results because it doesn't have a previous row, go ahead and use cross apply.
You can just put the above query into the parenthesis in outer apply() and add an alias (Previous). You link to the values for the current row in your main query, use top(1) to get the first row only, and order by ID_Activity descending to get the row with the highest ID_Activity.
select ASM, TECH,
PreviousASM, PreviousTECH
from OFSDA.ArchiveActivityDetails as AD
outer apply (
select top(1) ADInner.ASM as PreviousASM, ADInner.TECH as PreviousTECH
from OFSDA.ArchiveActivityDetails as ADInner
where ADInner.ID_ServiceOrderNumber = AD.ID_ServiceOrderNumber
and ADInner.ID_Activity < AD.ID_Activity
order by ADInnerID_Activity desc
) Previous
where ID_ServiceOrderNumber = 2370634229

Include missing years in Group By query

I am fairly new in Access and SQL programming. I am trying to do the following:
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
and group by year even when there is no amount in some of the years. I would like to have these years listed as well for a report with charts. I'm not certain if this is possible, but every bit of help is appreciated.
My code so far is as follows:
SELECT
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
FROM
Base_CustomerT
INNER JOIN (
SO_SalesOrderPaymentHistoryLineT
INNER JOIN SO_SalesOrderT
ON SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId
) ON Base_CustomerT.CustomerId = SO_SalesOrderT.CustomerId
GROUP BY
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
SO_SalesOrderPaymentHistoryLineT.PaymentType,
Base_CustomerT.IsActive
HAVING
(((SO_SalesOrderPaymentHistoryLineT.PaymentType)=1)
AND ((Base_CustomerT.IsActive)=Yes))
ORDER BY
Base_CustomerT.SalesRep,
Base_CustomerT.Customer;
You need another table with all years listed -- you can create this on the fly or have one in the db... join from that. So if you had a table called alltheyears with a column called y that just listed the years then you could use code like this:
WITH minmax as
(
select min(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as minyear,
max(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as maxyear)
from SalesOrderPaymentHistoryLineT
), yearsused as
(
select y
from alltheyears, minmax
where alltheyears.y >= minyear and alltheyears.y <= maxyear
)
select *
from yearsused
join ( -- your query above goes here! -- ) T
ON year(T.SO_SalesOrderPaymentHistoryLineT.DatePaid) = yearsused.y
You need a data source that will provide the year numbers. You cannot manufacture them out of thin air. Supposing you had a table Interesting_year with a single column year, populated, say, with every distinct integer between 2000 and 2050, you could do something like this:
SELECT
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
Sum(NZ(data.Amount)) AS [Sum Of PaymentPerYear]
FROM
(SELECT * FROM Base_CustomerT INNER JOIN Year) AS base
LEFT JOIN
(SELECT * FROM
SO_SalesOrderT
INNER JOIN SO_SalesOrderPaymentHistoryLineT
ON (SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId)
) AS data
ON ((base.CustomerId = data.CustomerId)
AND (base.year = Year(data.DatePaid))),
WHERE
(data.PaymentType = 1)
AND (base.IsActive = Yes)
AND (base.year BETWEEN
(SELECT Min(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT)
AND (SELECT Max(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT))
GROUP BY
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
ORDER BY
base.SalesRep,
base.Customer;
Note the following:
The revised query first forms the Cartesian product of BaseCustomerT with Interesting_year in order to have base customer data associated with each year (this is sometimes called a CROSS JOIN, but it's the same thing as an INNER JOIN with no join predicate, which is what Access requires)
In order to have result rows for years with no payments, you must perform an outer join (in this case a LEFT JOIN). Where a (base customer, year) combination has no associated orders, the rest of the columns of the join result will be NULL.
I'm selecting the CustomerId from Base_CustomerT because you would sometimes get a NULL if you selected from SO_SalesOrderT as in the starting query
I'm using the Access Nz() function to convert NULL payment amounts to 0 (from rows corresponding to years with no payments)
I converted your HAVING clause to a WHERE clause. That's semantically equivalent in this particular case, and it will be more efficient because the WHERE filter is applied before groups are formed, and because it allows some columns to be omitted from the GROUP BY clause.
Following Hogan's example, I filter out data for years outside the overall range covered by your data. Alternatively, you could achieve the same effect without that filter condition and its subqueries by ensuring that table Intersting_year contains only the year numbers for which you want results.
Update: modified the query to a different, but logically equivalent "something like this" that I hope Access will like better. Aside from adding a bunch of parentheses, the main difference is making both the left and the right operand of the LEFT JOIN into a subquery. That's consistent with the consensus recommendation for resolving Access "ambiguous outer join" errors.
Thank you John for your help. I found a solution which works for me. It looks quiet different but I learned a lot out of it. If you are interested here is how it looks now.
SELECT DISTINCTROW
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
FROM
Base_Customer_RevenueYearQ
LEFT JOIN CustomerPaymentPerYearQ
ON (Base_Customer_RevenueYearQ.RevenueYear = CustomerPaymentPerYearQ.[RevenueYear])
AND (Base_Customer_RevenueYearQ.CustomerId = CustomerPaymentPerYearQ.CustomerId)
GROUP BY
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
;

Joining multiple tables in Access and limiting to Top 1 result

I have three tables that need to be joined in order to get monthly inventory data in return.
Table 1: TargetInventory
Table 2: TargetValue
Table 3: TargetWeight
[TargetInventory] does not change after being added the first time.
[TargetValue] is just a small table that includes prices of various types of metal.
[TargetWeight] is updated monthly as part of our inventory process. We INSERT new data, we never UPDATE old data.
Below is the relationship between these tables. (Sorry, I don't have the reputation points to post an image. Brand new here, so hopefully this makes sense.)
(* = UniqueKey)
--TargetValue-- --TargetInventory-- --TargetWeight--
*MaterialID <===| *TargetID <=====| *ID
Material |===> MaterialID |===> TargetID
PricePerOunce Length RecordDate
Density Width Weight
Thickness
DateInInventory
The TargetWeight table contains multiple records for TargetID (since a new one is added every month at inventory). That's good for me to track historical usage, but for the current inventory value, I only need the most recent TargetWeight.Weight to be returned.
I don't know how to do a CROSS APPLY from within another INNER JOIN, so I'm at a loss for how to do this (without switching to mySQL and just doing a LIMIT 1...)
I think it needs to look something like what's below, but I'm not sure how to finish the query.
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
TargetWeight.ID,
TargetWeight.TargetID AS TargetWeight_TargetID,
TargetWeight.RecordDate,
TargetWeight.Weight
FROM
(TargetValue
INNER JOIN TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
CROSS APPLY (
SELECT TOP 1 *
FROM .....
)
The following query works for me in Access 2010. It uses an INNER JOIN on a subquery to take the place of the CROSS APPLY (which Access SQL doesn't support). It assumes that there will be no more than one [TargetWeight] record for a given (TargetID, RecordDate):
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
LatestWeight.ID,
LatestWeight.TargetID AS TargetWeight_TargetID,
LatestWeight.RecordDate,
LatestWeight.Weight
FROM
(
TargetValue
INNER JOIN
TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
INNER JOIN
(
SELECT tw.*
FROM
TargetWeight AS tw
INNER JOIN
(
SELECT TargetID, MAX(RecordDate) AS LatestDate
FROM TargetWeight
GROUP BY TargetID
) AS latest
ON latest.TargetID=tw.TargetID
AND latest.LatestDate=tw.RecordDate
) AS LatestWeight
ON LatestWeight.TargetID = TargetInventory.TargetID
Alternative approach specifically for Access 2010 or later
If the above query bogs down with a large number of rows in [TargetWeight] then another possible solution for Access 2010+ would be to add a Yes/No field named [Current] to the [TargetWeight] table and use the following After Insert data macro to ensure that only the latest record for each [TargetID] is flagged as [Current]:
Once that is done, the query would simply be
SELECT
TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density,
TargetWeight.ID,
TargetWeight.TargetID AS TargetWeight_TargetID,
TargetWeight.RecordDate,
TargetWeight.Weight
FROM
(
TargetValue
INNER JOIN
TargetInventory
ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
)
INNER JOIN
TargetWeight
ON TargetInventory.TargetID = TargetWeight.TargetID
WHERE TargetWeight.Current = True;
To maximize performance, the [TargetWeight].[TargetID] and [TargetWeight].[Current] fields should be indexed.
SELECT TargetInventory.TargetID AS TargetInventory_TargetID,
TargetInventory.MaterialID AS TargetInventory_MaterialID,
TargetInventory.Length,
TargetInventory.Width,
TargetInventory.Thickness,
TargetValue.MaterialID AS TargetValue_MaterialID,
TargetValue.PricePerOunce,
TargetValue.Density, Weight.ID,
Weight.TargetID AS TargetWeight_TargetID,
Weight.RecordDate,
Weight.Weight
FROM TargetInventory
INNER JOIN TargetValue ON TargetValue.[MaterialID] = TargetInventory.[MaterialID]
CROSS APPLY (
SELECT TOP 1 *
FROM TargetWeight
WHERE TargetID = TargetInventory.TargetID
ORDER BY RecordDate DESC
) AS Weight

SQl Query get data very slow from different tables

I am writing a sql query to get data from different tables but it is getting data from different tables very slowly.
Approximately above 2 minutes to complete.
What i am doing is here :
1. I am getting data differences and on behalf of date difference i am getting account numbers
2. I am comparing tables to get exact data i need.
here is my query
select T.accountno,
MAX(T.datetxn) as MxDt,
datediff(MM,MAX(T.datetxn), '2011-6-30') as Diffs,
max(P.Name) as POName
from Account_skd A,
AccountTxn_skd T,
POName P
where A.AccountNo = T.AccountNo and
GPOCode = A.OfficeCode and
Code = A.POCode and
A.servicecode = T.ServiceCode
group by T.AccountNo
order by len(T.AccountNo) DESC
please help that how i can use joins or any other way to get data within very less time say 5-10 seconds.
Since it appears you are getting EVERY ACCOUNT, and performance is slow, I would try by creating a prequery by just account, then do a single join to the other join tables something like..
select
T.Accountno,
T.MxDt,
datediff(MM, T.MxDt, '2011-6-30') as Diffs,
P.Name as POName
from
( select T1.AccountNo,
Max( T1.DateTxn ) MxDt
from AccontTxn_skd T1
group by T1.AccountNo ) T
JOIN Account_skd A
on T.AccountNo = A.AccountNo
JOIN POName P
on A.POCode = P.Code <-- GUESSING as you didn't qualify alias.field
AND A.OfficeCode = P.GPOCode <-- in your query for these two fields
order by
len(T.AccountNo) DESC
You had other elements based on the T.ServiceCode matching, but since you are only grouping on the account number anyhow, did it matter which service code was used? Otherwise, you would need to group by both the account AND service code (which I would have added the service code into the prequery and added as join condition to the account table too).

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)