I have some questions about my query. I call this store-procedure in my first page, so it is important for me if it is optimize enough.
I do some select with some basic where expression, Then I filter them with some expression I passed through this store-procedure.
It is also considerable for me to select top n and its gonna search through millions of items (but I have hundreds of items already) and then do some paging in my website.
Select top (#NumberOfRows)
...
from(
SELECT
row_number() OVER (ORDER BY tblEventOpen.TicketAt, tblEvent.EventName, tblEventDetail.TimeStart) as RowNumber
, ...
FROM --[...some inner join logic...]
WHERE
(tblEventOpen.isValid = 1) AND (tblEvent.isValid = 1) and
(tblCondition_ResellerDetail.ResellerID = 1) AND
(tblEventOpen.TicketAt >= GETDATE()) AND
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt))
) as t1
where RowNumber >= (#PageNumber -1) * #NumberOfRows and
(#city='' or #city is null or city like #city) and
(#At is null or #At=At) and
(#TimeStartInMinute=-1 or #TimeStartInMinute=TimeStartInMinute) and
(#EventName='' or EventName like #EventName) and
(#CategoryID=-1 or #CategoryID = CategoryID) and
(#EventID is null or #EventID = EventID) and
(#DetailID is null or #DetailID = DetailID)
ORDER BY RowNumber
I'm worry about this part:
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt))
How does table t1 execute? I mean after I put some where expression after t1 (line 17 and further), does it filter items after execution of t1? for example I filter result by rownumber of 10, so it mean the inner (...) as t1 select will only return 10 items, or it select all items then my outer select will take 10 of them?
I want to filter my result by some optional parameters, so I put something like #DetailID is null or #DetailID = DetailID, is it a good way?
Anything else should I consider to make it faster (more optimize)?
My comment on your query:
You're correct, you should worry about condition "GETDATE() BETWEEN ...". Comparing value with function involving more than 1 table will most likely scan entire search space. Simplify your condition or if possible add a computed column for such function
Put all conditions except "RowNumber >= ..." in inner query
Its okay to put optional condition the way you do. I do it too :-)
Make sure you have index at least one for each column employed in the where clause as the first column of the index, and then the primary key. It would be better if your primary key is clustered
Well, these are based on my own experience. It may or may be not applicable to your situation.
[UPDATE] Here's the complete query
Select top (#NumberOfRows)
...
from(
SELECT
row_number() OVER (ORDER BY tblEventOpen.TicketAt, tblEvent.EventName, tblEventDetail.TimeStart) as RowNumber
, ...
FROM --[...some inner join logic...]
WHERE
(tblEventOpen.isValid = 1) AND (tblEvent.isValid = 1) and
(tblCondition_ResellerDetail.ResellerID = 1) AND
(tblEventOpen.TicketAt >= GETDATE()) AND
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt)) and
(#city='' or #city is null or city like #city) and
(#At is null or #At=At) and
(#TimeStartInMinute=-1 or #TimeStartInMinute=TimeStartInMinute) and
(#EventName='' or EventName like #EventName) and
(#CategoryID=-1 or #CategoryID = CategoryID) and
(#EventID is null or #EventID = EventID) and
(#DetailID is null or #DetailID = DetailID)
) as t1
where RowNumber >= (#PageNumber -1) * #NumberOfRows
ORDER BY RowNumber
Whilst you can seek advice on your query, it is better to learn how to optimise it yourself.
You need to view the execution plan, identify the bottlenecks and then see if there is anything that can be done to make an improvement.
In SSMS you can click "Query" ---> "Include Actual Execution Plan" before you run your query. (Ctrl+M) is they keyboard shortcut.
Then execute your query. SSMS will create a new tab in the results pane. Which will show you how the SQL engine executes your query, you can hover over each node for more information. The cost % will be particularly interesting, allowing you to see the most expensive part of your query.
It's difficult to advise you any more without that execution plan, which is why a number of people commented on your question. Your schema and indexes change how the query is executed, so it's not something that someone can accuratly replicate in their own environment without scripts for tables / indexes etc.... Even then statistics could be out of date and other problems could arise.
You can also execute SET STATISTICS PROFILE ON to get a textual view of the plan (maybe useful to seek help).
There are a number of articles that can help you fix the bottlenecks, or post another question for more advice.
http://msdn.microsoft.com/en-us/library/ms178071.aspx
SQL Server Query Plan Analysis
Execution Plan Basics
Related
The query structure: Helper-select in "with" clause - selects most recent entry using 'top 1 transaction_date'. Then does many joins. It takes too much time to run - what am I doing wrong?
CREATE VIEW [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView] AS
WITH TempTBLFactIvnItmDaily AS (
SELECT TOP 20
ITEM_NUMBER AS [InventoryItemNumber]
,CAST(FORMAT(TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey]
,BRANCH_PLANT_FHK AS [BranchPlantKey]
,BRANCH_PLANT_CODE AS [BranchPlantCode]
,CAST(QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand]
,TRANSACTION_DATE AS [Date]
,WAREHOUSE_LOCATION_FHK AS [WarehouseLocationKey]
,WAREHOUSE_LOCATION_CODE AS [WarehouseLocationCode]
,WAREHOUSE_LOT_NUMBER_CODE AS [WarehouseLotNumber]
,WAREHOUSE_LOT_NUMBER_FHK AS [WarehouseLotNumberKey]
,UNIT_OF_MEASURE AS [UnitOfMeasureName]
,UNIT_OF_MEASURE_PHK AS [UnitOfMeasureKey]
FROM dbo.RS_INV_ITEM_ON_HAND
-- below is where clause, choose only most recent entry
WHERE TRANSACTION_DATE = (SELECT TOP 1 TRANSACTION_DATE FROM dbo.RS_INV_ITEM_ON_HAND ORDER BY TRANSACTION_DATE DESC)
)
SELECT [InventoryItemNumber],
[DateKey],
[Date],
[BranchPlantCode] AS [BP],
[WarehouseLocationCode] AS [Location],
[QuantityOnHand],
[UnitOfMeasureName] AS [UoM],
CASE [WarehouseLotNumber]
WHEN 'Not Assigned' THEN NULL
ELSE [WarehouseLotNumber]
END
AS [Lot]
FROM TempTBLFactIvnItmDaily iioh
JOIN DWH.DimBranchPlant bp ON iioh.BranchPlantKey = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc ON iioh.WarehouseLocationKey = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimWarehouseLotNumber wlot ON iioh.WarehouseLotNumberKey = wlot.WarehouseLotNumber_PHK
JOIN DWH.DimUnitOfMeasure uom ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QuantityOnHand > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
GO
There are a lot of things that does not seems good. First of all, your base query must be a lot simpler. Something like this:
SELECT iioh.ITEM_NUMBER AS [InventoryItemNumber],
CAST(FORMAT(iioh.TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey],
iioh.TRANSACTION_DATE AS [Date],
iioh.BRANCH_PLANT_CODE AS [BP],
iioh.WAREHOUSE_LOCATION_CODE AS [Location],
CAST(iioh.QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand],
iioh.UNIT_OF_MEASURE AS [UoM],
NULLIF(iioh.WAREHOUSE_LOT_NUMBER_CODE, 'Not Assigned') AS [Lot]
FROM dbo.RS_INV_ITEM_ON_HAND iioh
JOIN DWH.DimBranchPlant bp
ON iioh.BranchPlantKey = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc
ON iioh.WarehouseLocationKey = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimUnitOfMeasure uom
ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QuantityOnHand > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
AND iioh.TRANSACTION_DATE = #TRANSACTION_DATE
For example, you are joining the DWH.DimWarehouseLotNumber but you are not extracting columns - do you really need it? Also, there are other columns which are not returned by the view - why to query them?
From, there you are first filtering by date and then y other fields, so your first TOP 20 records may be filtered by the next conditions - is this a behavior you want?
Also, do you really want this cast?
ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
It's better to use CONVERT, not FORMAT in performance aspect. Also, why not saving/materializing the TRANSACTION_DATE as INT (for example using a persisted computed column or just on CRUD) instead of calculating this value on each read?
Filtering by location code using LIKE clause can heart the performance, too. Why not adding a new column WareHouseLocationCodeType and set a same value for all locations satisfying this condition:
(wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
Then you can filter by this column in the view since this is very important for you. Also, you can create filter index on this column to increase the performance, more.
Also, you may want to create a inline-function instead a view and pass the date as parameter:
CREATE OR ALTER FUNCTION [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView]
(
#TRANSACTION_DATE datetime
)
RETURNS TABLE
AS
RETURN
(
SELECT iioh.ITEM_NUMBER AS [InventoryItemNumber],
CAST(FORMAT(iioh.TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey],
iioh.TRANSACTION_DATE AS [Date],
iioh.BRANCH_PLANT_CODE AS [BP],
iioh.WAREHOUSE_LOCATION_CODE AS [Location],
CAST(iioh.QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand],
iioh.UNIT_OF_MEASURE AS [UoM],
NULLIF(iioh.WAREHOUSE_LOT_NUMBER_CODE, 'Not Assigned') AS [Lot]
,iioh.TRANSACTION_DATE
FROM dbo.RS_INV_ITEM_ON_HAND iioh
JOIN DWH.DimBranchPlant bp
ON iioh.BranchPlantKey = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc
ON iioh.WarehouseLocationKey = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimUnitOfMeasure uom
ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QuantityOnHand > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
AND iioh.TRANSACTION_DATE = #TRANSACTION_DATE
)
Then call it like this:
SELECT TOP 20 *
FROM [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView] ('2020-12-04')
ORDER BY #TRANSACTION_DATE DESC
The query optimization is science today. If you want to find bottlenecks in your query you can follow some of these steps:
As the first step, enable statistics with these commands:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
Once you execute these commands in some query windows in the same window execute your query. When your query is executed switch to the Messages tab and you will see a lot of useful information like TIME execution, parse and compile-time and maybe the most interesting I/O reads.
As the second step, try to understand which table has a lot of reads, for example if you are expecting 10 rows from the query, but in some tables you have 10k or 100k logical reads something is wrong. That means for the 10 rows query execution from one table reads 10k pages. Obviously you are missing some index on this table, try to find which index you need.
If you are having some static values in where clause like the following one, then think about Filtered Index:
bp.BRANCH_PLANT_CODE = '96100' AND iioh.QuantityOnHand > 0
Not always, but in some cases conversion can break your indexes if you are casting them or using some other function in where clause like the following one, even you have an index on this column query optimizer will not use it in query execution:
CAST(iioh.UnitOfMeasureKey AS VARCHAR(100))
The last one, if you have OR logical operator in your query try to execute one by one part of your OR logical operator separately see to performance. This logical operator can really kill your query, and this is one example:
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
Once, you determine here that you don't have any issues you can go more further.
I have a query that takes 20 minutes to run, even though I have an index for every column in the where clauses, and every column being joined:
SELECT DISTINCT skt.VCDRAWING_REG_NO, skb.NDRAWING_ORG_NO, skb.NDRAWING_ORG_REV_NO, skb.CAPPLY_START_DATE, skb.CAPPLY_END_DATE, skto.*
FROM SPM_ABS_TRANBASE skt
JOIN SPM_ABS_BASE skb
ON skt.NDRAWING_ORG_REV_NO = skb.NDRAWING_ORG_REV_NO
AND skt.NDRAWING_ORG_NO = skb.NDRAWING_ORG_NO
JOIN SPM_ABS_MODEL skm
ON skb.NDRAWING_ORG_REV_NO = skm.NDRAWING_ORG_REV_NO
AND skb.NDRAWING_ORG_NO = skm.NDRAWING_ORG_NO
JOIN SPM_ABS_TRANOPT skto
ON skt.NDRAWING_SYSTEM_NO = skto.NDRAWING_SYSTEM_NO
JOIN ModelImport mi
ON skm.CMODEL = mi.ModelCode
WHERE (skb.CAPPLY_START_DATE <= DATEADD(day, 2, GETDATE()) OR skb.CAPPLY_START_DATE IS NULL)
AND (skb.CAPPLY_END_DATE >= DATEADD(day, -2, GETDATE()) OR skb.CAPPLY_END_DATE IS NULL)
Here is my query plan.
One thing that puzzles me is this: If I add the following WHERE clause, the query returns in about 0.5 seconds:
AND mi.ModelCode = '3FBK5'
Now you're saying, well, duh, of course it gets much faster with that - the thing is, the ModelImport table contains only 351 records. Which means if I were to split up the query above into 351 queries, each with its own where clause for a distinct ModelCode - then I can get 100% of my query results in about 175 seconds, or 2.9 minutes. This is dramatically faster. Which tells me that something in the wide-open query is grossly inefficient, and the query plan is bad.
Here is my query plan with AND mi.ModelCode = '3FBK5' added.
After viewing my query plan, any ideas how I can speed this up?
Is it possible that you eliminate some of the joins as you are not selecting anything from those tables or applying any where conditions to those tables, e.g., skm and mi?
Without having the table schema and sizes it's a little hard to give an exact answer, but here are some updates to try.
Use group by instead of distinct
Don't use * in the select results (particularly with distinct) instead give specific list of columns to return
Avoid "or" statements in where clause (maybe use ISNULL instead)
Here is what the query might look like with these updates (though there probably some other columns from skto you would want to add)
SELECT skt.VCDRAWING_REG_NO,
skb.NDRAWING_ORG_NO,
skb.NDRAWING_ORG_REV_NO,
skb.CAPPLY_START_DATE,
skb.CAPPLY_END_DATE,
skto.NDRAWING_SYSTEM_NO
FROM SPM_ABS_TRANBASE skt
JOIN SPM_ABS_BASE skb ON
skt.NDRAWING_ORG_REV_NO = skb.NDRAWING_ORG_REV_NO
AND skt.NDRAWING_ORG_NO = skb.NDRAWING_ORG_NO
JOIN SPM_ABS_MODEL skm ON
skb.NDRAWING_ORG_REV_NO = skm.NDRAWING_ORG_REV_NO
AND skb.NDRAWING_ORG_NO = skm.NDRAWING_ORG_NO
JOIN SPM_ABS_TRANOPT skto ON
skt.NDRAWING_SYSTEM_NO = skto.NDRAWING_SYSTEM_NO
JOIN ModelImport mi ON
skm.CMODEL = mi.ModelCode
WHERE ISNULL(skb.CAPPLY_START_DATE, DATEADD(day, 2, GETDATE())) <= DATEADD(day, 2, GETDATE())
AND ISNULL(skb.CAPPLY_END_DATE,DATEADD(day, -2, GETDATE())) >= DATEADD(day, -2, GETDATE())
GROUP BY skt.VCDRAWING_REG_NO,
skb.NDRAWING_ORG_NO,
skb.NDRAWING_ORG_REV_NO,
skb.CAPPLY_START_DATE,
skb.CAPPLY_END_DATE,
skto.NDRAWING_SYSTEM_NO
So, I'm trying to create a collection in SCCM which I would like to give me a
list of assets(name0) that don't have an .ide file linked to them that is newer than
21 days old. Once identified I can go off and investigate why these assets are not updating.
So far I have written the following query in SSMS before I set it up in SCCM,
but it's become evident that this isn't the correct approach .
SELECT DISTINCT v_GS_SYSTEM.Name0
FROM v_GS_SYSTEM inner join v_GS_SoftwareFile
ON v_GS_SoftwareFile.ResourceID = v_GS_SYSTEM.ResourceID
WHERE (DATEDIFF(day, v_GS_SoftwareFile.ModifiedDate, getdate()) >=21)
AND NOT
(DATEDIFF(day, v_GS_SoftwareFile.ModifiedDate, getdate()) <=21)
AND
v_GS_SoftwareFile.FileName like '/%.ide/'
ORDER BY v_GS_SYSTEM.Name0;
This code returns the "correct" values but doesn't consider the fact that an asset may
still have newer ide files related to it, which defeats the purpose of this exercise.
So (I think!) my question is, is there a way check if Name0 has any associated ModifiedDate records
newer than 21 days and only return a value if this check returns true/false?
EDIT: edited #MatBailie answer with output:
To join all '*.ide' Files to their Resource, but only for resources that have not had any '*.ide' files modified in the last 21 days...
SELECT
s.Name0,
f.FilePath,
f.FileName
FROM
(
SELECT
*,
MAX(ModifiedDate) OVER (PARTITION BY ResourceID) AS ResourceMaxModifiedDate
FROM
v_GS_SoftwareFile
WHERE
FileName LIKE '%.ide'
)
AS f
INNER JOIN
v_GS_SYSTEM AS s
ON s.ResourceID = f.ResourceID
WHERE
f.ResourceMaxModifiedDate <= DATEADD(DAY, -21, GETDATE())
ORDER BY
s.Name0,
f.FilePath,
f.FileName
To get all Resources that have had no '*.ide' files modified in the last 21 days...
SELECT
s.Name0
FROM
v_GS_SYSTEM AS s
WHERE
NOT EXISTS (
SELECT *
FROM v_GS_SoftwareFile AS f
WHERE f.FileName LIKE '%.ide'
AND f.ResourceID = s.ResourceID
AND f.ModifiedDate >= DATEADD(DAY, -21, GETDATE())
)
ORDER BY
s.name0
Consider your indexes on these tables depending on which query you end up with. A Covering index over (ResourceID, ModifiedDate) would be useful. And a flag for the file type would be useful too (LIKE '*.ide' is going to require scanning the rows to find the matches, it can't be solved with a typical index).
You can simply add an additional EXISTS clause where you check this.
I believe the query you're trying to write is:
SELECT DISTINCT vs.Name0
, vsf.FilePath
, vsf.FileName
FROM v_GS_SYSTEM vs
INNER JOIN v_GS_SoftwareFile vsf
ON vsf.ResourceID = vs.ResourceID
WHERE (DATEDIFF(day, vsf.ModifiedDate, getdate()) >= 21)
AND NOT (DATEDIFF(day, vsf.ModifiedDate, getdate()) <= 21) -- this seems a bit redundant and might even exclude some rows where the result is exactly (21 * 24) hours
AND vsf.FileName LIKE '/%.ide/'
AND NOT EXISTS (
SELECT 1
FROM v_GS_SoftwareFile vsf2
WHERE vsf2.ModifiedDate > GETDATE() - 21
AND vsf2.ResourceId = vsf.ResourceId
)
ORDER BY vs.Name0;
You can basically change the check the TRUE / FALSE by keeping or removing the NOT in the AND NOT EXISTS.
Edit:
Since you were mentioning performance problems check if you have non-clustered idexes:
on column ResourceId in v_GS_SoftwareFile table (I really hope this is not a view, but the v_ at the start kind of makes me think this one is).
on column ResourceID in v_GS_SYSTEM table (same concerned comment here)
I have the following query which i am directly executing in my Code & putting it in datatable. The problem is it is taking more than 10 minutes to execute this query. The main part which is taking time is NON EXISTS.
SELECT
[t0].[PayrollEmployeeId],
[t0].[InOutDate],
[t0].[InOutFlag],
[t0].[InOutTime]
FROM [dbo].[MachineLog] AS [t0]
WHERE
([t0].[CompanyId] = 1)
AND ([t0].[InOutDate] >= '2016-12-13')
AND ([t0].[InOutDate] <= '2016-12-14')
AND
( NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[TO_Entry] AS [t1]
WHERE
([t1].[EmployeeId] = [t0].[PayrollEmployeeId])
AND ([t1]. [CompanyId] = 1)
AND ([t0].[PayrollEmployeeId] = [t1].[EmployeeId])
AND (([t0].[InOutDate]) = [t1].[Entry_Date])
AND ([t1].[Entry_Method] = 'M')
))
)
ORDER BY
[t0].[PayrollEmployeeId], [t0].[InOutDate]
Is there any way i can optimize this query? What is the work around for this. It is taking too much of time.
It seems that you can convert the NOT EXISTS into a LEFT JOIN query with second table returning NULL values
Please check following SELECT and modify if required to fulfill your requirements
SELECT
[t0].[PayrollEmployeeId], [t0].[InOutDate], [t0].[InOutFlag], [t0].[InOutTime]
FROM [dbo].[MachineLog] AS [t0]
LEFT JOIN [dbo].[TO_Entry] AS [t1]
ON [t1].[EmployeeId] = [t0].[PayrollEmployeeId]
AND [t0].[PayrollEmployeeId] = [t1].[EmployeeId]
AND [t0].[InOutDate] = [t1].[Entry_Date]
AND [t1]. [CompanyId] = 1
AND [t1].[Entry_Method] = 'M'
WHERE
([t0].[CompanyId] = 1)
AND ([t0].[InOutDate] >= '2016-12-13')
AND ([t0].[InOutDate] <= '2016-12-14')
AND [t1].[EmployeeId] IS NULL
ORDER BY
[t0].[PayrollEmployeeId], [t0].[InOutDate]
You will realize that there is an informative message on the execution plan for your query
It is informing that there is a missing cluster index with an effect of 30% on the execution time
It seems that transaction data is occurring based on some date fields like Entry time.
Dates fields especially on your case are strong candidates for clustered indexes. You can create an index on Entry_Date column
I guess you have already some index on InOutDate
You can try indexing this field as well
I am having an issue with the following query returning results a bit too slow and I suspect I am missing something basic. My initial guess is the 'CASE' statement is taking too long to process its result on the underlying data. But it could be something in the derived tables as well.
The question is, how can I speed this up? Are there any glaring errors in the way I am pulling the data? Am I running into a sorting or looping issues somewhere? The query runs for about 40 seconds, which seems quite long. C# is my primary expertise, SQL is a work in progress.
Note I am not asking "write my code" or "fix my code". Just for a pointer in the right direction, I can't seem to figure out where the slow down occurs. Each derived table runs very quickly (less than a second) by themselves, the joins seem correct and the result set is returning exactly what I need. It's just too slow and I'm sure there are better SQL scripter's out there ;) Any tips would be greatly appreciated!
SELECT
hdr.taker
, hdr.order_no
, hdr.po_no as display_po
, cust.customer_name
, hdr.customer_id
, 'INCORRECT-LARGE ORDER' + CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK'
END AS Status
FROM
(myDb_view_oe_hdr hdr
LEFT OUTER JOIN myDb_view_customer cust
ON hdr.customer_id = cust.customer_id)
LEFT OUTER JOIN wpd_view_sales_territory_by_customer territory
ON cust.customer_id = territory.customer_id
LEFT OUTER JOIN
(select
order_no,
SUM(ext_price_calc) as ext_price_calc
from
(select
hdr.order_no,
line.item_id,
(line.qty_ordered - isnull(qty_canceled,0)) * unit_price as ext_price_calc
from myDb_view_oe_hdr hdr
left outer join myDb_view_oe_line line
on hdr.order_no = line.order_no
where
line.delete_flag = 'N'
AND line.cancel_flag = 'N'
AND hdr.projected_order = 'N'
AND hdr.delete_flag = 'N'
AND hdr.cancel_flag = 'N'
AND line.item_id not in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%', 'FUEL','NET-FUEL', 'CONVENIENCE-FEE')) as line
group by order_no) as order_total
on hdr.order_no = order_total.order_no
LEFT OUTER JOIN
(select
order_no,
count(order_no) as convenience_count
from oe_line with (nolock)
left outer join inv_mast inv with (nolock)
on oe_line.inv_mast_uid = inv.inv_mast_uid
where inv.item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%')
and oe_line.delete_flag <> 'Y'
group by order_no) as fee_count
on hdr.order_no = fee_count.order_no
INNER JOIN
(select
order_no,
unit_price
from oe_line line with (nolock)
where line.inv_mast_uid in (select inv_mast_uid from inv_mast with (nolock) where item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%'))) as fee_price
ON fee_count.order_no = fee_price.order_no
WHERE
hdr.projected_order = 'N'
AND hdr.cancel_flag = 'N'
AND hdr.delete_flag = 'N'
AND hdr.completed = 'N'
AND territory.territory_id = ‘CUSTOMERTERRITORY’
AND ext_price_calc > 600.00
AND hdr.carrier_id <> '100004'
AND fee_count.convenience_count is not null
AND CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK' END <> 'OK'
Just as a clue to the right direction for optimization:
When you do an OUTER JOIN to a query with calculated columns, you are guaranteeing not only a full table scan, but that those calculations must be performed against every row in the joined table. It appears that you can actually do your join to oe_line without the column calculations (i.e. by filtering ext_price_calc to a specific range).
You don't need to do most of the subqueries that are in your query--the master query can be recrafted to use regular table join syntax. Joins to subqueries containing subqueries presents a challenge to the SQL optimizer that it may not be able to meet. But by using regular joins, the optimizer has a much better chance at identifying more efficient query strategies.
You don't tag which SQL engine you're using. Every database has proprietary extensions that may allow for speedier or more efficient queries. It would be easier to provide useful feedback if you indicated whether you were using MySQL, SQL Server, Oracle, etc.
Regardless of the database you're using, reviewing the query plan is always a good place to start. This will tell you where most of the I/O and time in your query is being spent.
Just on general principle, make sure your statistics are up-to-date.
It's may not be solvable by any of us without the real stuff to test with.
IF that's the case and nobody else posts the answer, I can still help. Here is how to trouble shoot it.
(1) take joins and pieces out one by one.
(2) this will cause errors. Remove or fake the references to get rid of them.
(3) see how that works.
(4) Put items back before you try taking something else out
(5) keep track...
(6) also be aware where a removal of something might drastically reduce the result set.
You might find you're missing an index or some other smoking gun.
I was having the same problem and I was able to solve it by indexing one of the tables and setting a primary key.
I strongly suspect that the problem lies in the number of joins you're doing. A lot of databases do joins basically by systemically checking all possible combinations of the various tables as being valid - so if you're joinging table A and B on column C, and A looks like:
Name:C
Fred:1
Alice:2
Betty:3
While B looks like:
C:Pet
1:Alligator
2:Lion
3:T-Rex
When you do the join, it checks all 9 possibilities:
Fred:1:1:Alligator
Fred:1:2:Lion
Fred:1:3:T-Rex
Alice:2:1:Alligator
Alice:2:2:Lion
Alice:2:3:T-Rex
Betty:3:1:Alligator
Betty:3:2:Lion
Betty:3:3:T-Rex
And goes through and deletes the non-matching ones:
Fred:1:1:Alligator
Alice:2:2:Lion
Betty:3:3:T-Rex
... which means with three entries in each table, it creates nine temporary records, sorts through them all, and deletes six of them ... all before it actually sorts through the results for what you're after (so if you are looking for Betty's Pet, you only want one row on that final result).
... and you're doing how many joins and sub-queries?