Sum up multiple values based off a single column

For context, I work in transportation. Also, I apologize for the poor title; I'm not exactly sure how to summarize my issue.
I am currently editing an existing report which returns a driver's ID, their name, when they were hired, and the total number of miles they have driven since they started at the company. It was brought to my attention that drivers who move within the company are assigned a different driver ID, so the miles under one ID are not counted towards the total of the other. Using an example provided to me, I was able to confirm this scenario, as shown below:
DriverCode DriverName
----------- ----------------
WETDE Wethington,Dean
WETDEA Wethington,Dean
This is the query that gets the above (example driver is hardcoded at the moment):
select mpp.mpp_id as DriverCode,
mpp.mpp_lastfirst as DriverName
from manpowerprofile mpp
outer apply (select top 1 mpp_id
from manpowerprofile) as id
where mpp_firstname = 'Dean'
and mpp_lastname = 'Wethington'
This is the current query as it stands:
SELECT lh.lgh_driver1 as DriverCode
,m.mpp_lastfirst as DriverName
,m.mpp_hiredate as HireDate
,SUM(s.stp_lgh_mileage) as TotMiles
FROM stops s (nolock)
INNER JOIN legheader lh (nolock) on lh.lgh_number = s.lgh_number
INNER JOIN manpowerprofile m (nolock) on m.mpp_id = lh.lgh_driver1
/* OUTER APPLY ( SELECT top 1 mpp_id
FROM manpowerprofile) as id */
WHERE m.mpp_terminationdt > GETDATE()
AND m.mpp_id <> 'UNKNOWN'
AND lh.lgh_outstatus = 'CMP'
GROUP BY lh.lgh_driver1, m.mpp_lastfirst, m.mpp_hiredate
HAVING SUM(s.stp_lgh_mileage) > 850000
ORDER BY DriverCode DESC
What I'm looking to do is check whether a name exists twice and, if it does, add the total miles for both of those driver codes together to return a single result for that individual driver. I'm still a pretty novice SQL developer and have only recently started to delve into databases.
My current train of thought was to use an outer apply, but I'm sure there's a better way to do this.

As per your comment, leaving off the driver code and hire date...
(Because they could/would be different for the drivers being combined.)
SELECT
m.mpp_lastfirst as DriverName
,SUM(s.stp_lgh_mileage) as TotMiles
FROM
stops s (nolock)
INNER JOIN
legheader lh (nolock)
on lh.lgh_number = s.lgh_number
INNER JOIN
manpowerprofile m (nolock)
on m.mpp_id = lh.lgh_driver1
WHERE
m.mpp_terminationdt > GETDATE()
AND m.mpp_id <> 'UNKNOWN'
AND lh.lgh_outstatus = 'CMP'
GROUP BY
m.mpp_lastfirst
HAVING
SUM(s.stp_lgh_mileage) > 850000
ORDER BY
m.mpp_lastfirst DESC
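To see the effect of grouping by name rather than by driver code, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The `driver` and `trip` tables, their columns, and the mileage figures are made up for illustration and are not the real schema.

```python
import sqlite3

# Hypothetical, simplified schema: one row per driver code, one row per trip.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE driver (driver_id TEXT PRIMARY KEY, driver_name TEXT);
    CREATE TABLE trip   (driver_id TEXT, miles INTEGER);
    INSERT INTO driver VALUES ('WETDE',  'Wethington,Dean'),
                              ('WETDEA', 'Wethington,Dean'),
                              ('DOEJA',  'Doe,Jane');
    INSERT INTO trip VALUES ('WETDE', 500000), ('WETDEA', 400000),
                            ('DOEJA', 300000);
""")

# Grouping by name (not by driver code) folds both codes into one row.
totals = conn.execute("""
    SELECT d.driver_name, SUM(t.miles) AS tot_miles
    FROM trip AS t
    JOIN driver AS d ON d.driver_id = t.driver_id
    GROUP BY d.driver_name
    HAVING SUM(t.miles) > 850000
    ORDER BY d.driver_name DESC
""").fetchall()

print(totals)
```

Neither driver code reaches the threshold on its own here, but the combined row does. One caveat of this approach: two different people who share the same name would also be merged into a single row.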

Related

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))

Check your indexes for fragmentation to speed up your query, and make sure you have indexes in the first place.
If you just need one result, use TOP 1:
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))

If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added table aliases and removed all the unnecessary parentheses for readability. You may disagree that it is more readable, but I would expect most people to agree.
This may not offer any performance benefit over simply grouping by the columns you are selecting and keeping your joins as they are. SQL is, after all, a declarative language where you tell the engine what you want, not how to get it, so you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advantage of being more semantically tied to what you are trying to do, though, so it gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, then you may need to inspect the execution plan and see if it suggests any indexes.
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.
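The difference between joining to the transactions and testing for their existence can be sketched on a toy data set. This uses an in-memory SQLite database from Python; the `customer` and `txn` tables are hypothetical stand-ins, not the tables from the question.

```python
import sqlite3

# Hypothetical tables: Acme has two transactions in range, Globex has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE txn (customer_id INTEGER, effective_date TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO txn VALUES (1, '2022-03-01'), (1, '2022-06-15'),
                           (2, '2020-01-10');
""")

date_range = ("2021-12-31", "2022-12-31")

# INNER JOIN: one output row per matching transaction.
joined = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customer AS c
    JOIN txn AS t ON t.customer_id = c.customer_id
    WHERE t.effective_date BETWEEN ? AND ?
""", date_range).fetchall()

# EXISTS: one output row per customer with at least one matching transaction.
exists_rows = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customer AS c
    WHERE EXISTS (SELECT 1
                  FROM txn AS t
                  WHERE t.customer_id = c.customer_id
                    AND t.effective_date BETWEEN ? AND ?)
""", date_range).fetchall()

print(len(joined), exists_rows)
```

The join repeats the customer once per transaction, while the EXISTS version returns each qualifying customer once, which is the shape the question asks for.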

Previous Record With Cross Apply Syntax

I have a table called ArchiveActivityDetails which shows the history of a Customer Repair Order. One Repair Order will have many visits (ActivityID), each with a Technician allocated depending on who is available for that planned visit.
The system automatically allocates the time that is required for a job, but sometimes a job requires longer, so we manually amend jobs.
My initial request from the customer was to pull the manually amended jobs (i.e. jobs where PlannedDuration >= 60 minutes) and show the Technician linked to each manually amended job.
This report works fine.
My most recent request from the customer is to now ADD a column showing WHO WAS THE PREVIOUS TECHNICIAN linked to the Repair Order.
My colleagues suggested I do a Cross Apply going back to the ArchiveActivityDetails table and then show "Previous Tech", but I have not used Cross Apply before and I am struggling with the syntax and unable to get the results I want. In my Cross Apply I used LAG to work out the 'PrevTech', but when pulling it into my main report, I get NULL. So I assume I am not doing the Cross Apply correctly.
DECLARE @DateFrom as DATE = '2019-05-20'
DECLARE @DATETO AS DATE = '2019-07-23'
----------------------------------------------------------------------------------
SELECT
AAD.Date
,ASM.ASM
,A.ASM as PrevASM
,ASM.KDGID2
,R.ResourceName
,R.ID_ResourceID
,A.ServiceOrderNumber
,CONCAT(EN.TECHVORNAME, ' ' , EN.TECHNACHNAME) as TechName
,A.PrevTech
,EN.TechnicianID
,AAD.ID_ActivityID
,SO.ServiceOrderNumber
,AAD.VisitNumber
,AAD.PlannedDuration
,AAD.ActualDuration
,AAD.PlannedDuration-AAD.ActualDuration as DIFF
,DR.Original_Duration
FROM
[Easy].[ASMTrans] AS ASM
INNER JOIN
[FS_OTBE].[EngPayrollNumbers] AS EN
ON ASM.KDGID2 = EN.KDGID2
INNER JOIN
[OFSA].[ResourceID] AS R
ON EN.TechnicianID = Try_cast(R.ResourceName as int)
INNER JOIN
[OFSDA].[ArchiveActivityDetails] as [AAD]
ON R.[ID_ResourceID] = AAD.ID_ResourceID
INNER JOIN
[OFSA].[ServiceOrderNumber] SO
ON SO.ID_ServiceOrderNumber = AAD.ID_ServiceOrderNumber
LEFT JOIN
[OFSE].[DurationRevision] DR
on DR.ID_ActivityID = AAD.ID_ActivityID
CROSS APPLY
(
SELECT
AD.Date
,AD.ID_CountryCode
,AD.ID_Status
,Activity_TypeID
,AD.ID_ActivityID
,AD.ID_ResourceID
,SO.ServiceOrderNumber
,ASM.ASM
,LAG(EN.TECHVORNAME+ ' '+EN.TECHNACHNAME) OVER (ORDER BY SO.ServiceOrderNumber,AD.ID_ActivityID) as PrevTech
,AD.VisitNumber
,AD.ID_ServiceOrderNumber
,AD.PlannedDuration
,AD.ActualDuration
,ROW_NUMBER() OVER (PARTITION BY AD.ID_ServiceOrderNumber Order by AD.ID_ActivityID,AD.Date) as ROWNUM
FROM
[Easy].[ASMTrans] AS ASM
INNER JOIN
[FS_OTBE].[EngPayrollNumbers] AS EN
ON ASM.KDGID2 = EN.KDGID2
INNER JOIN
[OFSA].[ResourceID] AS R
ON EN.TechnicianID = Try_cast(R.ResourceName as int)
INNER JOIN
[OFSDA].[ArchiveActivityDetails] as [AD]
ON R.[ID_ResourceID] = AD.ID_ResourceID
INNER JOIN
[OFSA].[ServiceOrderNumber] SO
ON SO.ID_ServiceOrderNumber = AD.ID_ServiceOrderNumber
WHERE
AAD.ID_ActivityID = AD.ID_ActivityID
AND
AD.ID_CountryCode = AAD.ID_CountryCode
AND AD.ID_Status = AAD.ID_Status
AND AD.ID_ResourceID = AAD.ID_ResourceID
AND AD.Activity_TypeID = AAD.Activity_TypeID
AND AD.ID_ServiceOrderNumber = AAD.ID_ServiceOrderNumber
AND AD.Date >= '2019-05-01'
) as A
WHERE
ASM.KDGID2
IN (50008323,50008326,50008329,50008332,50008335,50008338,50008341,50008344,50008347,50008350,50008353,50008356,50008359,50008362,50008365)
AND AAD.ID_Status = 1
AND AAD.ID_CountryCode = 7
AND AAD.Activity_TypeID=91
AND
(
AAD.[Date] BETWEEN IIF(@DateFrom < '20190520','20190520',@DateFrom) AND IIF(@DateTo < '20190520','20190520',@DateTo))
AND AAD.ActualDuration > 11
AND
(
(DR.Original_Duration >= 60)
OR
(DR.ID_ActivityID IS NULL AND AAD.PlannedDuration >= 60))
I expect to see the previous Tech and previous Area Sales Manager for the job that was manually amended.
Business Reason: Managers want to see who initially requested that the job be manually amended. The time requested is being overestimated, which wastes time. To plan better, they need to see who requests extra time at a job and try to reduce it.
I will attach the ArchiveActivityDetail table showing the history of a Repair Order as well as expected results.

Your query results in the cross apply will appear as a table in your query, so you can use top(1) and order by descending to get the first row ordered by what you want (it looks like ActivityId? maybe VisitNumber?).
Simplifying to get at the root of the issue, say you have just one table with ServiceOrderNumber, ID_Activity, ASM, and TECH. To get the previous row for activity 2414073 you would do this:
select top(1) ASM, TECH
from OFSDA.ArchiveActivityDetails as AD
where ID_ServiceOrderNumber = 2370634229 -- same ServiceOrderNumber
and ID_Activity < 2414073 -- previous activities
order by ID_Activity desc -- highest activity less than 2414073
Instead of cross apply, you probably want to use outer apply. This is the same but you will get a row in your main query for the first activity, it will just have nulls for values in your apply. If you want the first row omitted from your results because it doesn't have a previous row, go ahead and use cross apply.
You can just put the above query into the parentheses of outer apply() and add an alias (Previous). You link to the values for the current row in your main query, use top(1) to get the first row only, and order by ID_Activity descending to get the row with the highest ID_Activity.
select ASM, TECH,
PreviousASM, PreviousTECH
from OFSDA.ArchiveActivityDetails as AD
outer apply (
select top(1) ADInner.ASM as PreviousASM, ADInner.TECH as PreviousTECH
from OFSDA.ArchiveActivityDetails as ADInner
where ADInner.ID_ServiceOrderNumber = AD.ID_ServiceOrderNumber
and ADInner.ID_Activity < AD.ID_Activity
order by ADInner.ID_Activity desc
) Previous
where ID_ServiceOrderNumber = 2370634229
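The "highest earlier activity within the same order" lookup can be tried out on a small data set. SQLite has no OUTER APPLY, so this sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1, which plays the same role for a single column. The `activity` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table: three visits on order 100, one visit on order 200.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (order_id INTEGER, activity_id INTEGER, tech TEXT);
    INSERT INTO activity VALUES (100, 1, 'Anna'), (100, 2, 'Ben'),
                                (100, 3, 'Cara'), (200, 4, 'Dev');
""")

# For each row, look up the tech on the latest earlier activity of the
# same order. Rows with no earlier activity get NULL (None in Python),
# which matches the OUTER APPLY behaviour described above.
rows = conn.execute("""
    SELECT a.order_id, a.activity_id, a.tech,
           (SELECT p.tech
            FROM activity AS p
            WHERE p.order_id = a.order_id
              AND p.activity_id < a.activity_id
            ORDER BY p.activity_id DESC
            LIMIT 1) AS previous_tech
    FROM activity AS a
    ORDER BY a.order_id, a.activity_id
""").fetchall()

for row in rows:
    print(row)
```

The first visit of each order has no previous technician, so it comes back with None rather than being dropped from the results.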

How to filter where a condition is true at least once

I need to filter down to only service orders that have a "service" work group value in at least one of their tasks. However, I don't want to get rid of the rows that aren't work group = "Service" if at least one of the task rows has that value. The end result would leave out all data from service orders that didn't have at least one BI_WRKFLW_TASK_KEY that was equal to "SERVICE". I know how to do normal filters but getting it to this specificity is beyond my current experience.
I've experimented with normal filters but they leave out rows that are a part of the same Service Order but just don't have that work group.
SELECT W.BI_WRKFLW_KEY,
T.BI_WORK_EVENT_CD,
T.BI_TASK_CD,
T.BI_WORKGRP,
M.BI_SO_NBR,
M.BI_SO_TYPE_CD,
M.BI_CLOSE_DT,
M.BI_OPEN_DT,
M.BI_SO_STAT_CD,
R.BI_WRKFLW_TMPLT_NM,
T.BI_WRKFLW_TASK_SEQ_NBR,
T.BI_WORKGRP,
A.BI_WORK_EVENT_CD,
A.BI_EVENT_DT_TM,
A.SY_JOB_QUEUE_ID,
A.BI_WORKGRP,
A.SY_USER_ID,
A.BI_WRKFLW_TASK_KEY
FROM BI_WRKFLW W
LEFT JOIN BI_WRKFLW_TASKS T ON W.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
LEFT JOIN BI_SO_DET D ON W.BI_WRKFLW_KEY = D.BI_WRKFLW_KEY
LEFT JOIN BI_SO_MASTER M ON D.BI_SO_NBR = M.BI_SO_NBR
LEFT JOIN BI_WRKFLW_TMPLT_REF R ON W.BI_WRKFLW_TMPLT_ID = R.BI_WRKFLW_TMPLT_ID
LEFT JOIN BI_TASK_ACT A ON T.BI_WRKFLW_TASKS_KEY = A.BI_WRKFLW_TASKS_KEY
WHERE M.BI_OPEN_DT >= ADD_MONTHS(CURRENT_DATE, -'12')
--AND M.BI_SO_TYPE_CD IN ('IVC-NEW1')
--AND M.BI_SO_STAT_CD LIKE 'O'
ORDER BY M.BI_SO_NBR, T.BI_EVENT_DT_TM
Any Service order row where the Service order has at least one BI_WRKFLOW_TASK_CD = "Service" would be kept and all other service orders filtered out.

I tried to map this out; I may not have got it quite right.
I think you are asking for BI_SO_MASTER records that have at least one BI_WRKFLW_TASKS row belonging to a certain group.
Try using a CTE to get the detail rows with a correct task; then you can find the SO population. After that, I am not sure what the ultimate result set should be.
;with matchingTasks as ( select D.BI_SO_NBR, D.<id> , W.BI_WRKFLW_KEY , T.<key> , A.Key
from BI_WRKFLW W
LEFT JOIN BI_WRKFLW_TASKS T ON W.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
LEFT JOIN BI_SO_DET D ON W.BI_WRKFLW_KEY = D.BI_WRKFLW_KEY
LEFT JOIN BI_TASK_ACT A ON T.BI_WRKFLW_TASKS_KEY = A.BI_WRKFLW_TASKS_KEY
Where
<good dates>
and <A.field is what I am looking for>
)
/*Here you have the SO population
as well as the ids that helped this SO qualify.
*/
, My_SO_Population as (select Distinct BI_SO_NBR from matchingTasks )
/*now you can go get what you need.
the challenge of finding SOs w/ >=1 matching task has been solved...
*/
select <necessary fields> from
My_SO_Population
join <whatever you need....this is where i am cloudy>
If I am missing the goal, let me know where.

You can just add this to your WHERE clause:
AND T.BI_WRKFLW_KEY IN (
SELECT BI_WRKFLW_KEY
FROM BI_WRKFLW_TASKS
WHERE BI_WRKFLOW_TASK_CD = 'Service')
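A small sketch of why the IN subquery keeps the non-matching rows of a qualifying workflow, run against an in-memory SQLite database from Python. The `task` table and its values are hypothetical and much simpler than the real schema.

```python
import sqlite3

# Hypothetical data: workflow 1 has a Service task and a Billing task,
# workflow 2 has only Billing, workflow 3 has only Service.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (workflow_id INTEGER, task_cd TEXT, workgroup TEXT);
    INSERT INTO task VALUES (1, 'T1', 'Service'), (1, 'T2', 'Billing'),
                            (2, 'T3', 'Billing'), (3, 'T4', 'Service');
""")

# The subquery picks the workflows that qualify; the outer query then
# returns every task row of those workflows, whatever its workgroup.
kept = conn.execute("""
    SELECT workflow_id, task_cd, workgroup
    FROM task
    WHERE workflow_id IN (SELECT workflow_id
                          FROM task
                          WHERE workgroup = 'Service')
    ORDER BY workflow_id, task_cd
""").fetchall()

print(kept)
```

Workflow 2 disappears entirely, while the Billing row of workflow 1 survives because a sibling row carries the Service value.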

SQL Server - Need to SUM values in across multiple returned records

In the following query I am trying to get TotalQty to SUM across both the locations for item 6112040, but so far I have been unable to make this happen. I do need to keep both lines for 6112040 separate in order to capture the different location.
This query feeds into a Jasper ireport using something called Java.Groovy. Despite this, none of the PDFs printed yet have been either stylish or stained brown. Perhaps someone could address that issue as well, but this SUM issue takes priority
I know Gordon Linoff will get on in about an hour so maybe he can help.
DECLARE @receipt INT
SET @receipt = 20
SELECT
ent.WarehouseSku AS WarehouseSku,
ent.PalletId AS [ReceivedPallet],
ISNULL(inv.LocationName,'') AS [ActualLoc],
SUM(ISNULL(inv.Qty,0)) AS [LocationQty],
SUM(ISNULL(inv.Qty,0)) AS [TotalQty],
MAX(CAST(ent.ReceiptLineNumber AS INT)) AS [LineNumber],
MAX(ent.WarehouseLotReference) AS [WarehouseLot],
LEFT(SUM(ent.WeightExpected),7) AS [GrossWeight],
LEFT(SUM(inv.[Weight]),7) AS [NetWeight]
FROM WarehouseReceiptDetail AS det
INNER JOIN WarehouseReceiptDetailEntry AS ent
ON det.ReceiptNumber = ent.ReceiptNumber
AND det.FacilityName = ent.FacilityName
AND det.WarehouseName = ent.WarehouseName
AND det.ReceiptLineNumber = ent.ReceiptLineNumber
LEFT OUTER JOIN Inventory AS inv
ON inv.WarehouseName = det.WarehouseName
AND inv.FacilityName = det.FacilityName
AND inv.WarehouseSku = det.WarehouseSku
AND inv.CustomerLotReference = ent.CustomerLotReference
AND inv.LotReferenceOne = det.ReceiptNumber
AND ISNULL(ent.CaseId,'') = ISNULL(inv.CaseId,'')
WHERE
det.WarehouseName = $Warehouse
AND det.FacilityName = $Facility
AND det.ReceiptNumber = @receipt
GROUP BY
ent.PalletId
, ent.WarehouseSku
, inv.LocationName
, inv.Qty
, inv.LotReferenceOne
ORDER BY ent.WarehouseSku
The lines I need partially coalesced are 4 and 5 in the above return.

Create a second dataset with a subquery and join to that subquery - you can extrapolate from the following to apply to your situation:
First the Subquery:
SELECT
WarehouseSku,
SUM(Qty)
FROM
Inventory
GROUP BY
WarehouseSku
Now apply to your query - insert into the FROM clause:
...
LEFT JOIN (
SELECT
WarehouseSKU,
SUM(Qty) AS SkuQty -- a derived table column needs a name
FROM
Inventory
GROUP BY
WarehouseSKU
) AS TotalQty
ON det.WarehouseSku = TotalQty.WarehouseSku
Without seeing the actual schema DDL it is hard to know the exact cardinality, but I think this will point you in the right direction.
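As a rough illustration of the pattern, the sketch below keeps one row per location while attaching the per-SKU total from a pre-aggregated subquery. It runs on an in-memory SQLite database from Python; the `inventory` table, its columns, and the quantities are assumptions made for the example.

```python
import sqlite3

# Hypothetical inventory: SKU 6112040 sits in two locations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (sku TEXT, location TEXT, qty INTEGER);
    INSERT INTO inventory VALUES ('6112040', 'A-01', 10),
                                 ('6112040', 'B-07', 5),
                                 ('7000001', 'C-03', 8);
""")

# The derived table computes one total per SKU; joining it back keeps the
# per-location rows separate while repeating the SKU-wide total on each.
rows = conn.execute("""
    SELECT i.sku, i.location,
           SUM(i.qty)  AS location_qty,
           t.total_qty AS total_qty
    FROM inventory AS i
    JOIN (SELECT sku, SUM(qty) AS total_qty
          FROM inventory
          GROUP BY sku) AS t ON t.sku = i.sku
    GROUP BY i.sku, i.location, t.total_qty
    ORDER BY i.sku, i.location
""").fetchall()

for row in rows:
    print(row)
```

Both lines for 6112040 stay separate with their own location quantity, and each carries the combined total of 15.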

Timeout running SQL query

I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by Django) which fails is below. I've tried running it directly in SQL Server Management Studio and it works, but takes 3.5 min.
It does look like it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have thought that should really cause it to take that long. The database isn't that big either: auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume that this is where the problem lies, but why is it happening?

SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count; see: Django annotate() multiple times causes wrong answers
As for why the query works that way: the query says to join the four tables together. So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 = 24 rows for that user, one for each combination of tickets. The distinct part will remove all the duplicates.
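The multiplication is easy to reproduce. This sketch builds a tiny in-memory SQLite database from Python with one user who captured 2 tickets, is responsible for 3, and watches 4. The table layout loosely mirrors the query in the question, but the names and rows are simplified assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY);
    CREATE TABLE ticket  (id INTEGER PRIMARY KEY,
                          capturer_id INTEGER, responsible_id INTEGER);
    CREATE TABLE watcher (ticket_id INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1);
    -- User 1 captured tickets 1-2 and is responsible for tickets 3-5.
    INSERT INTO ticket VALUES (1, 1, 9), (2, 1, 9),
                              (3, 9, 1), (4, 9, 1), (5, 9, 1);
    -- User 1 watches four tickets.
    INSERT INTO watcher VALUES (1, 1), (2, 1), (3, 1), (4, 1);
""")

# Three LEFT JOINs on the same user produce 2 * 3 * 4 = 24 rows, so plain
# COUNT sees 24 everywhere; COUNT(DISTINCT ...) recovers the real numbers.
counts = conn.execute("""
    SELECT COUNT(c.id), COUNT(r.id), COUNT(w.ticket_id),
           COUNT(DISTINCT c.id), COUNT(DISTINCT r.id),
           COUNT(DISTINCT w.ticket_id)
    FROM users AS u
    LEFT JOIN ticket  AS c ON c.capturer_id    = u.id
    LEFT JOIN ticket  AS r ON r.responsible_id = u.id
    LEFT JOIN watcher AS w ON w.user_id        = u.id
    GROUP BY u.id
""").fetchone()

print(counts)
```

With real table sizes the intermediate row count grows as the product of the three per-user counts, which is consistent with the very large row estimate seen in the execution plan.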

What about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with better performance)
