Very slow stored procedure - sql

I have a hard time with query optimization, currently I'm very close to the point of database redesign. And the stackoverflow is my last hope. I don't think that just showing you the query is enough so I've linked not only database script but also attached database backup in case you don't want to generate the data by hand
Here you can find both the script and the backup
The problems start when you try to do the following...
exec LockBranches #count=64,#lockedBy='034C0396-5C34-4DDA-8AD5-7E43B373AE5A',#lockedOn='2011-07-01 01:29:43.863',#unlockOn='2011-07-01 01:32:43.863'
The main problems occur in this part:
UPDATE B
SET B.LockedBy = #lockedBy,
B.LockedOn = #lockedOn,
B.UnlockOn = #unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (#count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
FROM Objectives AS O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
INNER JOIN Branches AS B ON B.GenerationID = G.ID
INNER JOIN
(
SELECT SB.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
/* ----> */ INNER JOIN Results AS R ON P.ID = R.ProbeID /* SSMS Estimated execution plan says this operation is the roughest */
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SB.BranchID
) AS X ON X.BranchID = B.ID
WHERE
(O.Active = 1)
AND (B.Sealed = 0)
AND (B.GenerationNo < O.BranchGenerations)
AND (B.LockedBy IS NULL OR DATEDIFF(SECOND, B.UnlockOn, GETDATE()) > 0)
AND (B.Complete = 1 OR X.SuitableProbes = O.BranchSize * O.EstimateCount * O.ProbeCount)
) AS B
EDIT: Here are the amounts of rows in each table:
Spicies 71536
Results 10240
Probes 10240
SpicieBranches 4096
Branches 256
Estimates 5
Generations 1
Versions 1
Objectives 1

Somebody else might be able to explain better than I can why this is much quicker. Experience tells me when you have a bunch of queries that collectively run slow together but should be quick in their individual parts then its worth trying a temporary table.
This is much quicker
ALTER PROCEDURE LockBranches
-- Add the parameters for the stored procedure here
#count INT,
#lockedOn DATETIME,
#unlockOn DATETIME,
#lockedBy UNIQUEIDENTIFIER
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON
--Create Temp Table
SELECT SpicieBranches.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
INTO #BranchSuitableProbeCount
FROM SpicieBranches
INNER JOIN Probes AS P ON P.SpicieID = SpicieBranches.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
INNER JOIN Results AS R ON P.ID = R.ProbeID
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SpicieBranches.BranchID
UPDATE B SET
B.LockedBy = #lockedBy,
B.LockedOn = #lockedOn,
B.UnlockOn = #unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (#count) Branches.LockedBy, Branches.LockedOn, Branches.UnlockOn, Branches.Complete
FROM Objectives
INNER JOIN Generations ON Generations.ObjectiveID = Objectives.ID
INNER JOIN Branches ON Branches.GenerationID = Generations.ID
INNER JOIN #BranchSuitableProbeCount ON Branches.ID = #BranchSuitableProbeCount.BranchID
WHERE
(Objectives.Active = 1)
AND (Branches.Sealed = 0)
AND (Branches.GenerationNo < Objectives.BranchGenerations)
AND (Branches.LockedBy IS NULL OR DATEDIFF(SECOND, Branches.UnlockOn, GETDATE()) > 0)
AND (Branches.Complete = 1 OR #BranchSuitableProbeCount.SuitableProbes = Objectives.BranchSize * Objectives.EstimateCount * Objectives.ProbeCount)
) AS B
END
This is much quicker with an average execution time of 54ms compared to 6 seconds with the original one.
EDIT
Had a look and combined my ideas with those from RBarryYoung's solution. If you use the following to create the temporary table
SELECT SB.BranchID AS BranchID, COUNT(*) AS SuitableProbes
INTO #BranchSuitableProbeCount
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
WHERE EXISTS(SELECT * FROM Results AS R WHERE R.ProbeID = P.ID)
GROUP BY SB.BranchID
then you can get this down to 15ms which is 400x better than we started with. Looking at the execution plan shows that there is a table scan happening on the temp table. Normally you avoid table scans as best you can but for 128 rows (in this case) it is quicker than whatever it was doing before.

This is basically a complete guess here, but in times past I've found that joining onto the results of a sub-query can be horrifically slow. That is, the subquery was being evaluated way too many times when it really didn't need to.
The way around this was to move the subqueries into CTEs and to join onto those instead. Good luck!

It appears the join on the two uniqueidentifier columns are the source of the problem. One is a clustered index, the other non-clustered on the (FK table). Good that there are indexes on them. Unfortunately guids are notoriously poor performing when joining with large numbers of rows.
As troubleshooting steps:
what state are the indexes in? When was the last time the statistics were updated?
how performant is that subquery onto itself, when executed adhoc? i.e. when you run this statement by itself, how fast does the resultset return? acceptable?
after rebuilding the 2 indexes, and updating statistics, is there any measurable difference?
SELECT P.ID, 1 AS SuitableProbes FROM Probes AS P
INNER JOIN Results AS R ON P.ID = R.ProbeID
GROUP BY P.ID HAVING COUNT(R.ID) > 0

The following runs about 15x faster on my system:
UPDATE B
SET B.LockedBy = #lockedBy,
B.LockedOn = #lockedOn,
B.UnlockOn = #unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (#count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
FROM Objectives AS O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
INNER JOIN Branches AS B ON B.GenerationID = G.ID
INNER JOIN
(
SELECT SB.BranchID AS BranchID, COUNT(*) AS SuitableProbes
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
WHERE EXISTS(SELECT * FROM Results AS R WHERE R.ProbeID = P.ID)
GROUP BY SB.BranchID
) AS X ON X.BranchID = B.ID
WHERE
(O.Active = 1)
AND (B.Sealed = 0)
AND (B.GenerationNo < O.BranchGenerations)
AND (B.LockedBy IS NULL OR DATEDIFF(SECOND, B.UnlockOn, GETDATE()) > 0)
AND (B.Complete = 1 OR X.SuitableProbes = O.BranchSize * O.EstimateCount * O.ProbeCount)
) AS B

Insertion of sub query into local temporary table
SELECT SB.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
into #temp FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
/* ----> */ INNER JOIN Results AS R ON P.ID = R.ProbeID /* SSMS Estimated execution plan says this operation is the roughest */
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SB.BranchID
The below query shows the partial joins with the corresponding table instead of complete!!
UPDATE B
SET B.LockedBy = #lockedBy,
B.LockedOn = #lockedOn,
B.UnlockOn = #unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (#count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
From
(
SELECT ID, BranchGenerations, (BranchSize * EstimateCount * ProbeCount) as MultipliedFactor
FROM Objectives AS O WHERE (O.Active = 1)
)O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
Inner Join
(
Select Sealed, GenerationNo, LockedBy, UnlockOn, ID, Complete
From Branches
Where B.Sealed = 0 AND (B.LockedBy IS NULL OR DATEDIFF(SECOND, B.UnlockOn, GETDATE()) > 0)
)B ON B.GenerationID = G.ID
INNER JOIN
(
Select * from #temp
) AS X ON X.BranchID = B.ID
WHERE
AND (B.GenerationNo < O.BranchGenerations)
AND (B.Complete = 1 OR X.SuitableProbes = O.MultipliedFactor)
) AS B

Related

Slow insert into but fast select query

A select query with join returns results in less than 1 sec(returns 1000 rows: 600 ms) but insert into temp table or physical table takes 15-16 seconds.
tested io performance : a select or insert into without any joins takes sub-second to write 1000 rows.
tried trace flag 1118
tried adding clustered index on the temp table and do insert into with tablock and maxdop hint.
None of these improved performance.
Thanks for all your comments. 6000 to 20000 rows that need to be inserted every 5 seconds from Kafka...
I get the data from kafka into sql server using table type variable
Pass it as a parameter to stored procedure
Load this data joining with other tables into a temporary table #table
I use the #table to merge the data into application table
Found a workaround that helps me achieve the target turnaround time but dont exactly know the reason for the behaviour. As I mentioned in the problem statement, the bottleneck was writing the resultset of the select statement that joins the table variable with various other tables to the temp table.
I put this into a stored prod and returned the execution of the stored proc to a temp table. Now the insert takes less than 1 sec
SELECT
i.Id AS IId,
df.Id AS dfid,
MAX(CASE
WHEN lp.Value IS NULL and f.pnp = 1 THEN 0
WHEN lp.Value = 0 and f.tzan = 1 and f.pnp = 0 THEN NULL
ELSE lp.Value
END) 'FV',
MAX(lp.TS),
MAX(lp.Eid),
MAX(0+lp.IsDelayedStream)
FROM
f1 f WITH (NOLOCK)
INNER JOIN ft1 ft WITH (NOLOCK) ON f.FeedTypeId = ft.Id
INNER JOIN FeedDataField fdf WITH (NOLOCK)
ON fdf.FeedId = f.Id
INNER JOIN df1 df WITH (NOLOCK)
ON fdf.dfId = df.Id
INNER JOIN ds1 ds WITH (NOLOCK)
ON df.dsid = ds.Id
INNER JOIN dp1 dp WITH (NOLOCK)
ON ds.dpId = dp.Id
INNER JOIN dc1 dc WITH (NOLOCK)
ON dc.dcId = ds.dcId
INNER JOIN i1 i WITH (NOLOCK)
ON f.iId = I.Id
INNER JOIN id1 id WITH (NOLOCK)
ON id.iId = i.Id
INNER JOIN IdentifierType it WITH (NOLOCK)
ON id.ItId = it.Id
INNER JOIN ivw_Tdf tdf WITH(NOEXPAND)
ON tdf.iId = i.Id
INNER JOIN z.dbo.[tlp] lp
ON lp.Ticker = id.Name AND lp.Field = df.SourceName AND
lp.Contributor = dc.Name AND lp.YellowKey = tdf.TextValue
WHERE
ft.Name in ('X', 'Y') AND f.SA = 1
AND dp.Name = 'B' AND (i.Inactive IS NULL OR i.Inactive = 0)
AND it.Name = 'T' AND id.ValidTo = #InfinityDate
AND tdf.SourceName = 'MSD'
AND tdf.ValidTo = #Infinity
GROUP BY i.Id, df.Id
OPTION(MAXDOP 4, OPTIMIZE FOR (#Infinity = '9999-12-31 23:59:59',
#InfinityDate = '9999-12-31))

Sort & Parallelism costing my query too much time

My SQL query is taking a large amount of time to run. I wrote a similar query and pit them against each other and this one runs FASTER when a small dataset (10K lines) is used, but about 20-30x slower than the other one when a LARGE dataset (500K+ lines) is used. My first query however does not have ONE column that I need, and I cannot add it without going about it with this different approach.
SELECT a.[RFIDTAGID], a.[JOB_NUMBER], d.[PROJECT_NUMBER], a.[PART_NUMBER], a.[QUANTITY], b.[DESIGNATION] as LOCATION,
c.[DESIGNATION] as CONTAINER, a.[LAST_SEEN_TIME], b.[TYPE], b.[BLDG], d.[PBG], d.[PLANNED_MFG_DELIVERY_DATE], d.[EXTENSION_DATE], a.[ORG_ID]
FROM [LTS].[dbo].[LTS_PACKAGE] as a
LEFT OUTER JOIN (
SELECT [DESIGNATION], [CONTAINER_ID], [LOCATION_ID]
FROM [LTS].[dbo].[LTS_CONTAINER]
) c ON a.[CONTAINER_ID] = c.[CONTAINER_ID]
LEFT OUTER JOIN (
SELECT [DESIGNATION], [TYPE], [BLDG], [LOCATION_ID]
FROM [LTS].[dbo].[LTS_LOCATION]
) b ON a.[LAST_SEEN_LOC_ID] = b.[LOCATION_ID] OR b.[LOCATION_ID] = c.[LOCATION_ID]
INNER JOIN (
SELECT [PBG], [PLANNED_MFG_DELIVERY_DATE], [EXTENSION_DATE], [DISCRETE_JOB_NUMBER], [PROJECT_NUMBER]
FROM [LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY]
)d ON a.[JOB_NUMBER] = d.[DISCRETE_JOB_NUMBER]
WHERE
d.[PLANNED_MFG_DELIVERY_DATE] <= GETDATE()
AND b.[TYPE] NOT IN('MFG', 'Manufacturing')
AND (b.[DESIGNATION] IS NOT NULL OR c.[DESIGNATION] IS NOT NULL)
ORDER BY [JOB_NUMBER], d.[PLANNED_MFG_DELIVERY_DATE] desc, [RFIDTAGID];
You can see below the usage, 100% is roughly 20,000, whereas my other query is about 900:
Is there something I can do to speed up my query, or where did I bog it down?
Remove inner selects and join directly to the tables:
SELECT a.[RFIDTAGID], a.[JOB_NUMBER], d.[PROJECT_NUMBER], a.[PART_NUMBER], a.[QUANTITY], b.[DESIGNATION] as LOCATION,
c.[DESIGNATION] as CONTAINER, a.[LAST_SEEN_TIME], b.[TYPE], b.[BLDG], d.[PBG], d.[PLANNED_MFG_DELIVERY_DATE], d.[EXTENSION_DATE], a.[ORG_ID]
FROM [LTS].[dbo].[LTS_PACKAGE] a
LEFT OUTER JOIN [LTS].[dbo].[LTS_CONTAINER]
c ON a.[CONTAINER_ID] = c.[CONTAINER_ID]
LEFT OUTER JOIN [dbo].[LTS_LOCATION]
b ON a.[LAST_SEEN_LOC_ID] = b.[LOCATION_ID] OR b.[LOCATION_ID] = c.[LOCATION_ID]
INNER JOIN
[LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY]
d ON a.[JOB_NUMBER] = d.[DISCRETE_JOB_NUMBER]
WHERE
d.[PLANNED_MFG_DELIVERY_DATE] <= GETDATE()
AND b.[TYPE] NOT IN('MFG', 'Manufacturing')
AND (b.[DESIGNATION] IS NOT NULL OR c.[DESIGNATION] IS NOT NULL)
ORDER BY [JOB_NUMBER], d.[PLANNED_MFG_DELIVERY_DATE] desc, [RFIDTAGID];

Stored Procedure Optimisation

I have written the following SQL Stored Procedure and because of all the select commands (I think) it's really slow running now the database has been populated with lots of data. Is there a way to optimise it so that it runs much quicker? Currently in an Azure S0 DB it takes around 1:40 min to process. Here's the stored procedure:
USE [cmb2SQL]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[spStockReport] #StockReportId as INT
AS
select
ProductId,
QtyCounted,
LastStockCount,
Purchases,
UnitRetailPrice,
CostPrice,
GrossProfit,
Consumed,
(Consumed * UnitRetailPrice) as ValueOfSales,
(QtyCounted * CostPrice) as StockOnHand,
StockCountId
from (
select
ProductId,
QtyCounted,
LastStockCount,
Purchases,
UnitRetailPrice,
CostPrice,
GrossProfit,
(LastStockCount + Purchases) - QtyCounted as Consumed,
StockCountId
from (
select
distinct
sci.StockCountItem_Product as ProductId,
(Select ISNULL(SUM(Qty), 0) as tmpQty from
(Select Qty from stockcountitems
join stockcounts on stockcountitems.stockcountitem_stockcount = stockcounts.id
where stockcountitem_Product = p.Id and stockcountitem_stockcount = sc.id and stockcounts.stockcount_pub = sc.StockCount_Pub
) as data
) as QtyCounted,
(Select ISNULL(SUM(Qty), 0) as LastStockCount from
(Select Qty from StockCountItems as sci
join StockCounts on sci.StockCountItem_StockCount = StockCounts.Id
join Products on sci.StockCountItem_Product = Products.id
where sci.StockCountItem_Product = p.id and sci.stockcountitem_stockcount =
(select top 1 stockcounts.id from stockcounts
join stockcountitems on stockcounts.id = stockcountitem_stockcount
where stockcountitems.stockcountitem_product = p.id and stockcounts.id < sc.id and StockCounts.StockCount_Pub = sc.StockCount_Pub
order by stockcounts.id desc)
) as data
) as LastStockCount,
(Select ISNULL(SUM(Qty * CaseSize), 0) as Purchased from
(select Qty, Products.CaseSize from StockPurchaseItems
join Products on stockpurchaseitems.stockpurchaseitem_product = products.id
join StockPurchases on stockpurchaseitem_stockpurchase = stockpurchases.id
join Periods on stockpurchases.stockpurchase_period = periods.id
where Products.id = p.Id and StockPurchases.StockPurchase_Period = sc.StockCount_Period and StockPurchases.StockPurchase_Pub = sc.StockCount_Pub) as data
) as Purchases,
sci.RetailPrice as UnitRetailPrice,
sci.CostPrice,
(select top 1 GrossProfit from Pub_Products where Pub_Products.Pub_Product_Product = p.id and Pub_Products.Pub_Product_Pub = sc.StockCount_Pub) as GrossProfit,
sc.Id as StockCountId
from StockCountItems as sci
join StockCounts as sc on sci.StockCountItem_StockCount = sc.Id
join Pubs on sc.StockCount_Pub = pubs.Id
join Periods as pd on sc.StockCount_Period = pd.Id
join Products as p on sci.StockCountItem_Product = p.Id
join Pub_Products as pp on p.Id = pp.Pub_Product_Product
Where StockCountItem_StockCount = #StockReportId and pp.Pub_Product_Pub = sc.StockCount_Pub
Group By sci.CostPrice, sci.StockCountItem_Product, sci.Qty, sc.Id, p.Id, sc.StockCount_Period, pd.Id, sci.RetailPrice, pp.CountPrice, sc.StockCount_Pub
) as data
) as final
GO
As requested here is the execution plan in XML (had to upload it to tinyupload as it exceeds the message character length):
execusionplan.xml
Schema:
Row Counts:
Table row_count
StockPurchaseItems 57511
Products 3116
StockCountItems 60949
StockPurchases 6494
StockCounts 240
Periods 30
Pub_Products 5694
Pubs 7
Without getting into the query rewrite, it's the most expensive and the last thing you should do probably. Try these steps first, one by one, and measure the impact - time, execution plan, and SET STATISTICS IO ON output. Create the baseline first for these metrics. Stop when you achieve the acceptable performance.
First of all, update statistics on relevant tables, I see some of the estimates are way off. Check the exec plan for estimated vs actual rows - any better now?
Create indexes on StockPurchaseItems(StockPurchaseItem_Product) and on StockCountItems(StockCountItem_Product, StockCountItem_StockCount). Check the execution plan, did optimizer consider using the indexes at all?
Add (include) other referenced columns to those two indexes in order to cover the query. Are they used now?
If nothing of above helped, consider breaking the query into smaller ones. Would be nice to have some real data to experiment with to be more specific.
** That "select distinct" smells real bad, are you sure the joins are all ok?

Reduce Runtime of T-SQL Query

-- WITH POD was causing the issue, removing this code reduced 2 year pull to 3 mins.
Will post new question to figure out best way to include POD data.
--Edit for clarity, I am a read only user to these tables.
I wrote the below query, but it takes a very long time to execute (20min).
It is currently limited to 1 month, but user wants at least 1 year preferably 2. I assume this would scale time to hours.
Can anyone take a look at let me know if there is a BKM I am not using to improve performance?
Or if there is a better method for a report of this size? At 2 years, it would return ~100K rows from 17 tables.
WITH
POD AS
(
SELECT SHIPMENTS.Delivery
,SHIPMENTS.Shipment_Number
,PROOF_OF_DELIVERY.Shipping_Carrier
,PROOF_OF_DELIVERY.Tracking_Number
,PROOF_OF_DELIVERY.Ship_Method
,PROOF_OF_DELIVERY.POD_Signature
,PROOF_OF_DELIVERY.POD_Date
,PROOF_OF_DELIVERY.POD_Time
FROM
SHIPMENTS
LEFT JOIN PROOF_OF_DELIVERY
ON SHIPMENTS.Shipment_Number = PROOF_OF_DELIVERY.Delivery_Or_Shipment
WHERE Load_Date IN
(
SELECT MAX(Load_Date)
FROM PROOF_OF_DELIVERY
GROUP BY Delivery_Or_Shipment
)
)
SELECT DISTINCT GI.GOODS_ISSUE_DOCUMENT_ID
,GI.SALES_ORDER_ID
,GI.SALES_ORDER_LINE_ID
,GI.SALES_ORDER_TYPE_CODE
,GI.DELIVERY_HEADER_ID
,GI.DELIVERY_ITEM_ID
,FD.FISCAL_MONTH_CODE
,GI.MATERIAL_NUMBER
,GI.SHIPPED_QTY
,SO.ORDERER_NAME
,SO.CREATED_BY
,SO.CONTACT_PERSON
,GI.SOLD_TO_CUSTOMER_ID
,GI.SHIP_TO_CUSTOMER_ID
,GI.ORIGINAL_COMMIT_DATE
,GI.SHIP_FROM_PLANT_ID
,GI.ACTUAL_PGI_DATE
,GI.CUSTOMER_PO_NUMBER
,GI.SHIPPED_PRICE
,(GI.SHIPPED_PRICE * GI.SHIPPED_QTY) AS EXT_SHIPPED_PRICE
,GI.SALES_ORGANIZATION_CODE
,GI.DELIVERY_NOTE_PRIORITY_CODE
,FD.FISCAL_WEEK_CODE
,DV.DIVISION_CODE
,DN.Delivery_Item_Creation_Date
,SOLD.CUSTOMER_SHORT_NAME AS SOLD_TO_CUSTOMER_SHORT_NAME
,SHIP.CUSTOMER_SHORT_NAME AS SHIP_TO_CUSTOMER_SHORT_NAME
,SHIP.Customer_Site_Name
,SHIP.REGION_NAME
,MATD.MATERIAL_DESCRIPTION
,MATD.STANDARD_COST
,(MATD.STANDARD_COST * GI.SHIPPED_QTY) AS EXT_STANDARD_COST
,MATD.GLOBAL_EVENT
,PLT.LEAD_TIME_FOR_ORIGINAL_COMMIT
,OPRM.BASE_PART_CODE
,MATD.PRODUCT_INSP_MEMO
,MATD.MATERIAL_PRICING_GROUP_CODE
,MATD.MATERIAL_STATUS AS MMPP
,PIM.PIM_PBG_GROUPING
,SOL.SHIPPING_CONDITION
,SVO.SERVICE_ORDER_NUM
,SO.CREATION_TIME AS SO_CREATION_TIME
,SOL.CREATED_TIME AS SO_LINE_CREATED_TIME
,SOL.SHIPPING_POINT
,SDT.SALES_DOCUMENT_TYPE_CODE AS SVO_DOCUMENT_TYPE_CODE
,EQU.EQUIPMENT_NUM
,EQU.SERIAL_NUMBER
,EQU.CUSTOMERTOOLID
,POD.Shipment_Number
,POD.Shipping_Carrier
,POD.Tracking_Number
,POD.Ship_Method
,POD.POD_Signature
,POD.POD_Date
,POD.POD_Time
,DATEDIFF(dd,SO.CREATION_TIME,GI.ACTUAL_PGI_DATE) AS Cycle_Time_to_PGI_Days
,DATEDIFF(hh,SO.CREATION_TIME,GI.ACTUAL_PGI_DATE) AS Cycle_Time_to_PGI_Hours
FROM GOODS_ISSUE AS GI
INNER JOIN dbo.Delivery_Notes AS DN
ON GI.DELIVERY_HEADER_ID = DN.DELIVERY_HEADER_CODE AND GI.DELIVERY_ITEM_ID = DN.DELIVERY_ITEM_CODE
INNER JOIN dbo.Customer_View AS SOLD
ON GI.SOLD_TO_CUSTOMER_ID = SOLD.CUSTOMER_CODE
INNER JOIN dbo.Customer_View AS SHIP
ON GI.SOLD_TO_CUSTOMER_ID = SHIP.CUSTOMER_CODE
INNER JOIN dbo.MATERIAL_DETAILS AS MATD
ON GI.MATERIAL_NUMBER = MATD.MATERIAL_NUMBER
INNER JOIN dbo.OPR_MATERIAL_DIM AS OPRM
ON OPRM.MATERIAL_NUMBER = GI.MATERIAL_NUMBER
LEFT JOIN dbo.SM_DATE_DIM AS FD
ON CAST(FD.CALENDAR_DAY AS DATE) = CAST(GI.ACTUAL_PGI_DATE AS DATE)
LEFT JOIN dbo.DIM_PUBLISHED_LEAD_TIME_COMMIT AS PLT
ON PLT.MATERIAL_NUMBER = OPRM.BASE_PART_CODE
LEFT JOIN dbo.PRODUCT_INSP_MEMO_DIM AS PIM
ON PIM.PRODUCT_INSP_MEMO = MATD.PRODUCT_INSP_MEMO
INNER JOIN dbo.SM_SALES_ORDER_LINE_FACT AS SOL
ON SOL.SALES_ORDER_CODE = GI.SALES_ORDER_ID AND SOL.SALES_ORDER_LINE_CODE = GI.SALES_ORDER_LINE_ID
INNER JOIN dbo.SM_SALES_ORDER_FACT AS SO
ON SO.SALES_ORDER_CODE = GI.SALES_ORDER_ID
INNER JOIN dbo.SM_DIVISION_DIM AS DV
ON SO.DIVISION_SID = DV.DIVISION_SID
LEFT JOIN dbo.SERVICE_ORDER_FACT AS SVO
ON SVO.SERVICE_ORDER_NUM = SO.SERVICE_ORDER_NUMBER
LEFT JOIN dbo.SM_SALES_DOCUMENT_TYPE_DIM AS SDT
ON SDT.SALES_DOCUMENT_TYPE_SID = SVO.SALES_DOCUMENT_TYPE_SID
LEFT JOIN dbo.SM_EQUIPMENT_DIM AS EQU
ON EQU.EQUIPMENT_SID = SVO.EQUIPMENT_SID
LEFT JOIN POD
ON POD.Delivery = GI.DELIVERY_HEADER_ID
WHERE GI.ACTUAL_PGI_DATE > GETDATE()-32
AND SOLD_TO_CUSTOMER_ID IN (0010000252,0010000898,0010001121,0010001409,0010001842,0010001852,0010001879,0010001977,0010001978,0010002021,0010002202,0010002227,0010002982,0010003118,0010003176,0010003294,0010005492,0010006904,0010007048,0010007080,0010010381,0010010572,0010010905,0010011999,0010012014,0010012048,0010012571,0010013124,0010013711,0010013713,0010013824,0010014180,0010014188,0010014333,0010015059,0010015313,0010015414,0010015541,0010015544,0010015550)
A CTE is just syntax
I suspect that CTE is evaluated many times
Materialze the CTE to #temp with indexe(s) so it is run once
This cast will hurt it
Make those columns true dates and index them
ON CAST(FD.CALENDAR_DAY AS DATE) = CAST(GI.ACTUAL_PGI_DATE AS DATE)
That where negates the left so you can just do a join
Also that MAX(Load_Date) could match on another shipment
SELECT SHIPMENTS.Delivery
,SHIPMENTS.Shipment_Number
,PROOF_OF_DELIVERY.Shipping_Carrier
,PROOF_OF_DELIVERY.Tracking_Number
,PROOF_OF_DELIVERY.Ship_Method
,PROOF_OF_DELIVERY.POD_Signature
,PROOF_OF_DELIVERY.POD_Date
,PROOF_OF_DELIVERY.POD_Time
FROM SHIPMENTS
JOIN PROOF_OF_DELIVERY
ON SHIPMENTS.Shipment_Number = PROOF_OF_DELIVERY.Delivery_Or_Shipment
WHERE PROOF_OF_DELIVERY.Load_Date IN
(
SELECT MAX(Load_Date)
FROM PROOF_OF_DELIVERY
GROUP BY Delivery_Or_Shipment
)
Pull this up into the join
INNER JOIN dbo.Customer_View AS SOLD
ON GI.SOLD_TO_CUSTOMER_ID = SOLD.CUSTOMER_CODE
AND GI.SOLD_TO_CUSTOMER_ID IN (0010000252,0010000898,0010001121,0010001409,0010001842,0010001852,0010001879,0010001977,0010001978,0010002021,0010002202,0010002227,0010002982,0010003118,0010003176,0010003294,0010005492,0010006904,0010007048,0010007080,0010010381,0010010572,0010010905,0010011999,0010012014,0010012048,0010012571,0010013124,0010013711,0010013713,0010013824,0010014180,0010014188,0010014333,0010015059,0010015313,0010015414,0010015541,0010015544,0010015550)

Filtering rows if two columns values are not same - sql server

I need to apply date filter on rows where OwningOfficeID and ScopeOfficeID is not same.
Here is the main query;
SELECT distinct v.VisitID, a.OfficeID AS OwningOfficeID,
scp.OfficeID AS ScopeOfficeID, V.VisitDate,
a.staffID as OwningStaff ,scp.StaffID as OwningScope
FROM Visits v
INNER JOIN VisitDocs vdoc ON vdoc.VisitID = v.VisitID
INNER JOIN InspectionScope scp ON scp.ScopeID = v.ScopeID
INNER JOIN Assignments a ON a.AssignmentID = scp.AssignmentID
INNER JOIN Staff s ON s.StaffID = a.StaffID
WHERE
v.VisitType = 1 AND
--'SCOPE OWNER AND LOOK FOR INSPECTION REPORT BUT NOT FOR COORD/FINAL REPORT.
(scp.StaffID = 141
AND EXISTS(SELECT *
FROM VisitDocs d
WHERE d.VisitID = v.VisitID
AND d.docType = 13)
AND NOT EXISTS(SELECT *
FROM VisitDocs d
WHERE d.VisitID = v.VisitID AND d.docType IN (1,2))
)
OR
--'ASSIGNMENT OWNER AND NOT SCOPE OWNER AND LOOK FOR COORDINATOR REPORT.
(a.StaffID = 141 AND scp.StaffID != 141
AND EXISTS(SELECT *
FROM VisitDocs d
WHERE d.VisitID = v.VisitID
AND d.docType = 2)
AND NOT EXISTS(SELECT * FROM VisitDocs d
WHERE d.VisitID = v.VisitID AND d.docType IN (1))
)
Result Set
Following condition can be applied to outer select query to achieve the results.
(OwningOfficeID <> ScopeOfficeID AND VisitDate >='01/11/2012' OR OwningOfficeID = ScopeOfficeID)
Is there anyway to do it in the one select statement?
I'm not sure what you mean by one select statement, but I think what you really want to know is how to do this query so that it runs quickly, is easy to see it is correct and easy to maintain.
Here is how to do that; use CTEs. (If you don't know what a CTE is go read about them and then come back here -- they are documented on Microsoft's website.)
CTEs can be much faster and clearer than sub-queries. Let me show you how, taking the code above and re-factoring with CTEs I get:
WITH DocTypes AS
(
SELECT VisitID, docType
FROM VisitDocs
), DocType13 AS
(
SELECT VisitID
FROM DocTypes
WHERE docType = 13
), DocType1and2 AS
(
SELECT VisitID
FROM DocTypes
WHERE docType IN (1,2)
), DocType1 AS
(
SELECT VisitID
FROM DocTypes1and2
WHERE docType = 1
), DocType2 AS
(
SELECT VisitID
FROM DocTypes1and2
WHERE docType = 2
), Base AS
(
SELECT distinct v.VisitID, a.OfficeID AS OwningOfficeID,
scp.OfficeID AS ScopeOfficeID, V.VisitDate,
a.staffID as OwningStaff ,scp.StaffID as OwningScope
FROM Visits v
INNER JOIN VisitDocs vdoc ON vdoc.VisitID = v.VisitID
INNER JOIN InspectionScope scp ON scp.ScopeID = v.ScopeID
INNER JOIN Assignments a ON a.AssignmentID = scp.AssignmentID
INNER JOIN Staff s ON s.StaffID = a.StaffID
WHERE
v.VisitType = 1 AND
--'SCOPE OWNER AND LOOK FOR INSPECTION REPORT BUT NOT FOR COORD/FINAL REPORT.
(scp.StaffID = 141
AND EXISTS(SELECT * FROM DocType13 d WHERE d.VisitID = v.VisitID)
AND NOT EXISTS(SELECT * FROM DocType1and2 d WHERE d.VisitID = v.VisitID)
)
OR
--'ASSIGNMENT OWNER AND NOT SCOPE OWNER AND LOOK FOR COORDINATOR REPORT.
(a.StaffID = 141 AND scp.StaffID != 141
AND EXISTS(SELECT * FROM DocType2 d WHERE d.VisitID = v.VisitID)
AND NOT EXISTS(SELECT * FROM DocType1 d WHERE d.VisitID = v.VisitID)
)
)
SELECT *
FROM Base
WHERE (OwningOfficeID <> ScopeOfficeID
AND VisitDate >='01/11/2012'(
OR OwningOfficeID = ScopeOfficeID
It should be clear why this is better, but if it isn't briefly:
All the selects you see at the beginning are exactly as they were in the original query, but because they are broken out prior the optimizer has an easier time, it will do better than when they were sub-queries -- I've seen huge changes in some cases, but in any case the optimizer has the option now to do a better job of memory caching and such.
It is clearer and easier to test. If you have problems with the query you can test it easy, comment out the main select and just select one of the the sub-queries, is it what you expect? These sub-queries are simple, but they can get complicated and this makes it easier.
You said you wanted to apply some logic to this query. I've added it to the CTE, as you can see it makes this complicated "layering" simple.