SQL query with NOT EXISTS - how to optimize

I have the following query, which I am executing directly from my code and loading into a DataTable. The problem is that it takes more than 10 minutes to execute, and the part that takes the time is the NOT EXISTS.
SELECT
[t0].[PayrollEmployeeId],
[t0].[InOutDate],
[t0].[InOutFlag],
[t0].[InOutTime]
FROM [dbo].[MachineLog] AS [t0]
WHERE
([t0].[CompanyId] = 1)
AND ([t0].[InOutDate] >= '2016-12-13')
AND ([t0].[InOutDate] <= '2016-12-14')
AND
( NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[TO_Entry] AS [t1]
WHERE
([t1].[EmployeeId] = [t0].[PayrollEmployeeId])
AND ([t1].[CompanyId] = 1)
AND ([t0].[PayrollEmployeeId] = [t1].[EmployeeId])
AND (([t0].[InOutDate]) = [t1].[Entry_Date])
AND ([t1].[Entry_Method] = 'M')
))
)
ORDER BY
[t0].[PayrollEmployeeId], [t0].[InOutDate]
Is there any way I can optimize this query, or a workaround? It is taking too much time.

It seems that you can convert the NOT EXISTS into a LEFT JOIN, keeping only the rows where the second table returns NULL values.
Please check the following SELECT and modify it if required to fulfill your requirements:
SELECT
[t0].[PayrollEmployeeId], [t0].[InOutDate], [t0].[InOutFlag], [t0].[InOutTime]
FROM [dbo].[MachineLog] AS [t0]
LEFT JOIN [dbo].[TO_Entry] AS [t1]
ON [t1].[EmployeeId] = [t0].[PayrollEmployeeId]
AND [t0].[InOutDate] = [t1].[Entry_Date]
AND [t1].[CompanyId] = 1
AND [t1].[Entry_Method] = 'M'
WHERE
([t0].[CompanyId] = 1)
AND ([t0].[InOutDate] >= '2016-12-13')
AND ([t0].[InOutDate] <= '2016-12-14')
AND [t1].[EmployeeId] IS NULL
ORDER BY
[t0].[PayrollEmployeeId], [t0].[InOutDate]

You will probably notice an informative message in the execution plan for this query: it reports a missing index with an estimated impact of about 30% on the execution time.
It seems that your transaction data revolves around date fields such as the entry time. Date fields, especially in your case, are strong candidates for clustered indexes, so you can create an index on the Entry_Date column. I guess you already have some index on InOutDate; if not, it is worth indexing that field as well.
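For illustration only, here is a hedged sketch of such indexes (the names are made up, and the key order should be verified against your actual execution plan):
-- Supports the NOT EXISTS probe into TO_Entry
CREATE NONCLUSTERED INDEX IX_TO_Entry_Company_Method_Emp_Date
ON [dbo].[TO_Entry] ([CompanyId], [Entry_Method], [EmployeeId], [Entry_Date]);
-- Supports the range scan on MachineLog and covers the selected columns
CREATE NONCLUSTERED INDEX IX_MachineLog_Company_InOutDate
ON [dbo].[MachineLog] ([CompanyId], [InOutDate])
INCLUDE ([PayrollEmployeeId], [InOutFlag], [InOutTime]);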


Query takes too long to run, how to optimize it?

The query structure: a helper SELECT in a WITH clause picks the most recent entry using TOP 1 on transaction_date; then the outer query does many joins. It takes too much time to run - what am I doing wrong?
CREATE VIEW [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView] AS
WITH TempTBLFactIvnItmDaily AS (
SELECT TOP 20
ITEM_NUMBER AS [InventoryItemNumber]
,CAST(FORMAT(TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey]
,BRANCH_PLANT_FHK AS [BranchPlantKey]
,BRANCH_PLANT_CODE AS [BranchPlantCode]
,CAST(QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand]
,TRANSACTION_DATE AS [Date]
,WAREHOUSE_LOCATION_FHK AS [WarehouseLocationKey]
,WAREHOUSE_LOCATION_CODE AS [WarehouseLocationCode]
,WAREHOUSE_LOT_NUMBER_CODE AS [WarehouseLotNumber]
,WAREHOUSE_LOT_NUMBER_FHK AS [WarehouseLotNumberKey]
,UNIT_OF_MEASURE AS [UnitOfMeasureName]
,UNIT_OF_MEASURE_PHK AS [UnitOfMeasureKey]
FROM dbo.RS_INV_ITEM_ON_HAND
-- below is where clause, choose only most recent entry
WHERE TRANSACTION_DATE = (SELECT TOP 1 TRANSACTION_DATE FROM dbo.RS_INV_ITEM_ON_HAND ORDER BY TRANSACTION_DATE DESC)
)
SELECT [InventoryItemNumber],
[DateKey],
[Date],
[BranchPlantCode] AS [BP],
[WarehouseLocationCode] AS [Location],
[QuantityOnHand],
[UnitOfMeasureName] AS [UoM],
CASE [WarehouseLotNumber]
WHEN 'Not Assigned' THEN NULL
ELSE [WarehouseLotNumber]
END
AS [Lot]
FROM TempTBLFactIvnItmDaily iioh
JOIN DWH.DimBranchPlant bp ON iioh.BranchPlantKey = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc ON iioh.WarehouseLocationKey = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimWarehouseLotNumber wlot ON iioh.WarehouseLotNumberKey = wlot.WarehouseLotNumber_PHK
JOIN DWH.DimUnitOfMeasure uom ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QuantityOnHand > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
GO
There are a lot of things that do not look right. First of all, your base query should be a lot simpler. Something like this:
SELECT iioh.ITEM_NUMBER AS [InventoryItemNumber],
CAST(FORMAT(iioh.TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey],
iioh.TRANSACTION_DATE AS [Date],
iioh.BRANCH_PLANT_CODE AS [BP],
iioh.WAREHOUSE_LOCATION_CODE AS [Location],
CAST(iioh.QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand],
iioh.UNIT_OF_MEASURE AS [UoM],
NULLIF(iioh.WAREHOUSE_LOT_NUMBER_CODE, 'Not Assigned') AS [Lot]
FROM dbo.RS_INV_ITEM_ON_HAND iioh
JOIN DWH.DimBranchPlant bp
ON iioh.BRANCH_PLANT_FHK = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc
ON iioh.WAREHOUSE_LOCATION_FHK = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimUnitOfMeasure uom
ON CAST(iioh.UNIT_OF_MEASURE_PHK AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QUANTITY_ON_HAND > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
AND iioh.TRANSACTION_DATE = @TRANSACTION_DATE
For example, you are joining DWH.DimWarehouseLotNumber but not extracting any columns from it - do you really need it? Also, there are other columns which are not returned by the view - why query them?
Furthermore, the CTE first filters by date and only then applies the other conditions, so your first TOP 20 records may be filtered away by the outer conditions - is this the behavior you want?
Also, do you really want this cast?
ON CAST(iioh.UnitOfMeasureKey AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
Performance-wise, it's better to use CONVERT rather than FORMAT. Also, why not save/materialize TRANSACTION_DATE as an INT (for example, using a persisted computed column, or setting it during writes) instead of calculating this value on each read?
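A minimal sketch of that idea, assuming the base table can be altered (the DateKey column name is made up here):
-- CONVERT with style 112 yields 'yyyymmdd'; PERSISTED stores the value on write
ALTER TABLE dbo.RS_INV_ITEM_ON_HAND
ADD DateKey AS CAST(CONVERT(CHAR(8), TRANSACTION_DATE, 112) AS INT) PERSISTED;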
Filtering by location code using a LIKE clause can hurt performance, too. Why not add a new column WareHouseLocationCodeType and set the same value for all locations satisfying this condition:
(wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
Then you can filter by this column in the view, since this condition is clearly important for you. You can also create a filtered index on this column to increase performance further.
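A hedged sketch of that change (the column name comes from the suggestion above, the index name is made up, and the UPDATE would need to be kept in sync by the loading process):
ALTER TABLE DWH.DimWarehouseLocation
ADD WareHouseLocationCodeType TINYINT NULL;
-- Tag the matching locations once instead of evaluating LIKE on every read
UPDATE DWH.DimWarehouseLocation
SET WareHouseLocationCodeType = 1
WHERE WAREHOUSE_LOCATION_CODE LIKE '6000W01%'
OR WAREHOUSE_LOCATION_CODE LIKE 'BL%';
-- Filtered index: small, and covers exactly the rows the view cares about
CREATE NONCLUSTERED INDEX IX_DimWarehouseLocation_CodeType
ON DWH.DimWarehouseLocation (WareHouseLocationCodeType)
INCLUDE (WAREHOUSE_LOCATION_CODE)
WHERE WareHouseLocationCodeType = 1;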
Also, you may want to create an inline table-valued function instead of a view and pass the date as a parameter:
CREATE OR ALTER FUNCTION [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView]
(
@TRANSACTION_DATE datetime
)
RETURNS TABLE
AS
RETURN
(
SELECT iioh.ITEM_NUMBER AS [InventoryItemNumber],
CAST(FORMAT(iioh.TRANSACTION_DATE, 'yyyyMMdd') AS INT) AS [DateKey],
iioh.TRANSACTION_DATE AS [Date],
iioh.BRANCH_PLANT_CODE AS [BP],
iioh.WAREHOUSE_LOCATION_CODE AS [Location],
CAST(iioh.QUANTITY_ON_HAND AS BIGINT) AS [QuantityOnHand],
iioh.UNIT_OF_MEASURE AS [UoM],
NULLIF(iioh.WAREHOUSE_LOT_NUMBER_CODE, 'Not Assigned') AS [Lot]
,iioh.TRANSACTION_DATE
FROM dbo.RS_INV_ITEM_ON_HAND iioh
JOIN DWH.DimBranchPlant bp
ON iioh.BRANCH_PLANT_FHK = bp.BRANCH_PLANT_PHK
JOIN DWH.DimWarehouseLocation wloc
ON iioh.WAREHOUSE_LOCATION_FHK = wloc.WAREHOUSE_LOCATION_PHK
JOIN DWH.DimUnitOfMeasure uom
ON CAST(iioh.UNIT_OF_MEASURE_PHK AS VARCHAR(100)) = uom.UNIT_OF_MEASURE_PHK
where bp.BRANCH_PLANT_CODE = '96100'
AND iioh.QUANTITY_ON_HAND > 0
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
AND iioh.TRANSACTION_DATE = @TRANSACTION_DATE
)
Then call it like this:
SELECT TOP 20 *
FROM [IRWSMCMaterialization].[FactInventoryItemOnHandDailyView] ('2020-12-04')
ORDER BY TRANSACTION_DATE DESC
Query optimization is a science in itself. If you want to find the bottlenecks in your query, you can follow these steps:
As the first step, enable statistics with these commands:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
Once you execute these commands in a query window, execute your query in the same window. When the query finishes, switch to the Messages tab and you will see a lot of useful information: execution time, parse and compile time, and, maybe the most interesting part, the I/O reads.
As the second step, try to understand which table has a lot of reads. For example, if you expect 10 rows from the query but some table shows 10k or 100k logical reads, something is wrong: to produce those 10 rows, the execution reads 10k pages from that one table. You are obviously missing an index on it, so try to find out which index you need.
If you have static values in the WHERE clause like the following, think about a filtered index:
bp.BRANCH_PLANT_CODE = '96100' AND iioh.QuantityOnHand > 0
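For instance, a hedged sketch of a filtered index for the QUANTITY_ON_HAND condition (the index name is made up; key and INCLUDE columns should come from the actual plan):
CREATE NONCLUSTERED INDEX IX_RS_INV_ITEM_ON_HAND_InStock
ON dbo.RS_INV_ITEM_ON_HAND (TRANSACTION_DATE)
INCLUDE (ITEM_NUMBER, QUANTITY_ON_HAND)
WHERE QUANTITY_ON_HAND > 0; -- indexes only the rows the query can return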
Not always, but in some cases a conversion can break the use of your indexes. If you cast a column or apply some other function to it in the WHERE clause, like the following, the query optimizer will not use an index on that column even if one exists (the predicate is not sargable):
CAST(iioh.UnitOfMeasureKey AS VARCHAR(100))
The last one: if you have an OR logical operator in your query, try executing each part of the OR separately and compare the performance. This operator can really kill your query; here is one example:
AND (wloc.WAREHOUSE_LOCATION_CODE like '6000W01%' OR wloc.WAREHOUSE_LOCATION_CODE like 'BL%')
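A common workaround is to split the OR into two branches, sketched here on the assumption that the two prefixes can never match the same row (which makes UNION ALL safe, with no duplicate elimination needed):
SELECT WAREHOUSE_LOCATION_PHK, WAREHOUSE_LOCATION_CODE
FROM DWH.DimWarehouseLocation
WHERE WAREHOUSE_LOCATION_CODE LIKE '6000W01%' -- each branch can seek its own index
UNION ALL
SELECT WAREHOUSE_LOCATION_PHK, WAREHOUSE_LOCATION_CODE
FROM DWH.DimWarehouseLocation
WHERE WAREHOUSE_LOCATION_CODE LIKE 'BL%';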
Once you determine that you don't have any issues here, you can go further.

Please help to optimise the query

I am new to SQL. The query below takes almost 4 minutes to run, and the page remains stuck for that whole time. We need to optimize this query so that the user is not stuck waiting that long.
Please help to optimise it.
We are using an Oracle DB.
Some improvements can be made by having 2 pre-computed columns in the aaa_soldto_xyz table:
aaa_soldto_xyz.ID1 = Substr(aaa_soldto_xyz.xyz_id, 0, 6)
aaa_soldto_xyz.ID2 = Substr(aaa_soldto_xyz.xyz_id, 1, Length(aaa_soldto_xyz.xyz_id) - 5)
Those can make better use of existing or new indexes.
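In Oracle, those pre-computed columns could be virtual columns with regular indexes on top; a sketch, reusing the expressions above verbatim (treat the exact SUBSTR arguments as the suggestion's assumption):
ALTER TABLE aaa_soldto_xyz ADD (
id1 GENERATED ALWAYS AS (SUBSTR(xyz_id, 0, 6)) VIRTUAL,
id2 GENERATED ALWAYS AS (SUBSTR(xyz_id, 1, LENGTH(xyz_id) - 5)) VIRTUAL
);
-- Ordinary B-tree indexes can then be built on the virtual columns
CREATE INDEX ix_aaa_soldto_xyz_id1 ON aaa_soldto_xyz (id1);
CREATE INDEX ix_aaa_soldto_xyz_id2 ON aaa_soldto_xyz (id2);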
We cannot help you optimize the query without an explain plan.
But an obvious improvement needed in this query is to remove the abc_customer_address table from the subquery in the SELECT clause, do a LEFT JOIN in the FROM list instead, and check the result.
You need to change the following clauses.
FROM clause:
Left join (SELECT ADDR.zip_code,
ADDR.ID,
ROW_NUMBER() OVER (PARTITION BY ADDR.ID ORDER BY 1) AS RN
FROM abc_customer_address ADDR) ADDR
ON (ADDR.ID = abc_customer.billing AND ADDR.RN = 1)
SELECT clause:
CASE WHEN abc_customer_address.zip_code IS NULL THEN ADDR.zip_code
ELSE abc_customer_address.zip_code
END AS ZIP_CODE,
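In Oracle, the same fallback can be written more compactly with NVL; this expression is equivalent to the CASE above (assuming the same aliases) and would replace it in the select list:
NVL(abc_customer_address.zip_code, ADDR.zip_code) AS ZIP_CODE,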
Cheers!!

How to improve query performance in Oracle

The SQL query below is taking too much time to execute. It might be due to the repetitive use of the same table in the FROM clause. I am not able to work out how to fix this query so that its performance improves.
Can anyone help me out with this?
Thanks in advance!!
select --
from t_carrier_location act_end,
t_location end_loc,
t_carrier_location act_start,
t_location start_loc,
t_vm_voyage_activity va,
t_vm_voyage v,
t_location_position lp_start,
t_location_position lp_end
where act_start.carrier_location_id = va.carrier_location_id
and act_start.carrier_id = v.carrier_id
and act_end.carrier_location_id =
decode((select cl.carrier_location_id
from t_carrier_location cl
where cl.carrier_id = act_start.carrier_id
and cl.carrier_location_no =
act_start.carrier_location_no + 1),
null,
(select cl2.carrier_location_id
from t_carrier_location cl2, t_vm_voyage v2
where v2.hire_period_id = v.hire_period_id
and v2.voyage_id =
(select min(v3.voyage_id)
from t_vm_voyage v3
where v3.voyage_id > v.voyage_id
and v3.hire_period_id = v.hire_period_id)
and v2.carrier_id = cl2.carrier_id
and cl2.carrier_location_no = 1),
(select cl.carrier_location_id
from t_carrier_location cl
where cl.carrier_id = act_start.carrier_id
and cl.carrier_location_no =
act_start.carrier_location_no + 1))
and lp_start.location_id = act_start.location_id
and lp_start.from_date <=
nvl(act_start.actual_dep_time, act_start.actual_arr_time)
and (lp_start.to_date is null or
lp_start.to_date >
nvl(act_start.actual_dep_time, act_start.actual_arr_time))
and lp_end.location_position_id = act_end.location_id
and lp_end.from_date <=
nvl(act_end.actual_dep_time, act_end.actual_arr_time)
and (lp_end.to_date is null or
lp_end.to_date >
nvl(act_end.actual_dep_time, act_end.actual_arr_time))
and act_end.location_id = end_loc.location_id
and act_start.location_id = start_loc.location_id;
There is no single straightforward answer for your question and the query you've mentioned.
To get a better response time from any query, you need to keep a few things in mind while writing your queries. I will mention a few here which appear to be important for your query:
Use joins instead of subqueries.
Use EXPLAIN PLAN to determine whether queries are functioning appropriately (see the sketch after this list).
Make sure the columns used in your WHERE clause have indexes; otherwise, create indexes on those columns. Use your common sense about which columns to index, e.g. foreign key columns, deleted, orderCreatedAt, startDate, etc.
Keep the selected columns in the order they appear in the table instead of arbitrarily selecting columns.
The above four points are enough for the query you've provided.
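As referenced in the list above, a minimal sketch of the EXPLAIN PLAN step in Oracle (shown with a trivial stand-in query; substitute your own):
EXPLAIN PLAN FOR
SELECT v.carrier_id
FROM t_vm_voyage v
WHERE v.voyage_id = 1;
-- Display the plan that was just captured
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);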
To dig deep about SQL optimization and tuning refer this https://docs.oracle.com/database/121/TGSQL/tgsql_intro.htm#TGSQL130

SQL Query optimization

I have some questions about my query. I call this stored procedure on my front page, so it is important to me that it is optimized enough.
I do a SELECT with some basic WHERE expressions, then filter the rows with the expressions I pass into this stored procedure.
It also matters that I select the top n rows: the query is going to search through millions of items eventually (I only have hundreds of items so far) and then do some paging on my website.
Select top (@NumberOfRows)
...
from(
SELECT
row_number() OVER (ORDER BY tblEventOpen.TicketAt, tblEvent.EventName, tblEventDetail.TimeStart) as RowNumber
, ...
FROM --[...some inner join logic...]
WHERE
(tblEventOpen.isValid = 1) AND (tblEvent.isValid = 1) and
(tblCondition_ResellerDetail.ResellerID = 1) AND
(tblEventOpen.TicketAt >= GETDATE()) AND
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt))
) as t1
where RowNumber >= (@PageNumber - 1) * @NumberOfRows and
(@city = '' or @city is null or city like @city) and
(@At is null or @At = At) and
(@TimeStartInMinute = -1 or @TimeStartInMinute = TimeStartInMinute) and
(@EventName = '' or EventName like @EventName) and
(@CategoryID = -1 or @CategoryID = CategoryID) and
(@EventID is null or @EventID = EventID) and
(@DetailID is null or @DetailID = DetailID)
ORDER BY RowNumber
I'm worried about this part:
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt))
How does t1 execute? I mean, after I put some WHERE expressions on the outer query, does the filtering happen after t1 has executed? For example, if I filter on a RowNumber of 10, does the inner (...) as t1 SELECT return only 10 items, or does it select all items and then my outer SELECT takes 10 of them?
I want to filter my result by some optional parameters, so I put something like @DetailID is null or @DetailID = DetailID - is that a good way?
Is there anything else I should consider to make it faster (more optimized)?
My comment on your query:
You're correct, you should worry about the condition "GETDATE() BETWEEN ...". Comparing a value with a function result that involves columns from more than one table will most likely scan the entire search space. Simplify your condition or, if possible, add a computed column for such an expression.
Put all conditions except "RowNumber >= ..." in the inner query.
It's okay to put optional conditions the way you do. I do it too :-)
Make sure that, for each column used in the WHERE clause, you have at least one index with that column as the first key column, followed by the primary key (see the sketch below). It would be better if your primary key were clustered.
Well, these are based on my own experience. They may or may not be applicable to your situation.
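As an illustration only (the index name is hypothetical, and the real column choices depend on the join logic that was elided), the TicketAt predicate could be supported like this:
-- Filtered on isValid = 1 since the query always demands it;
-- TicketAt as the leading key supports the >= GETDATE() range seek
CREATE NONCLUSTERED INDEX IX_tblEventOpen_Valid_TicketAt
ON tblEventOpen (TicketAt)
WHERE isValid = 1;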
[UPDATE] Here's the complete query
Select top (@NumberOfRows)
...
from(
SELECT
row_number() OVER (ORDER BY tblEventOpen.TicketAt, tblEvent.EventName, tblEventDetail.TimeStart) as RowNumber
, ...
FROM --[...some inner join logic...]
WHERE
(tblEventOpen.isValid = 1) AND (tblEvent.isValid = 1) and
(tblCondition_ResellerDetail.ResellerID = 1) AND
(tblEventOpen.TicketAt >= GETDATE()) AND
(GETDATE() BETWEEN
DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.StartTime) , tblEventOpen.TicketAt)
AND DATEADD(minute, (tblEventDetail.TimeStart - 60 * tblCondition_ResellerDetail.EndTime) , tblEventOpen.TicketAt)) and
(@city = '' or @city is null or city like @city) and
(@At is null or @At = At) and
(@TimeStartInMinute = -1 or @TimeStartInMinute = TimeStartInMinute) and
(@EventName = '' or EventName like @EventName) and
(@CategoryID = -1 or @CategoryID = CategoryID) and
(@EventID is null or @EventID = EventID) and
(@DetailID is null or @DetailID = DetailID)
) as t1
where RowNumber >= (@PageNumber - 1) * @NumberOfRows
ORDER BY RowNumber
Whilst you can seek advice on your query, it is better to learn how to optimise it yourself.
You need to view the execution plan, identify the bottlenecks and then see if there is anything that can be done to make an improvement.
In SSMS you can click "Query" ---> "Include Actual Execution Plan" before you run your query; Ctrl+M is the keyboard shortcut.
Then execute your query. SSMS will create a new tab in the results pane showing how the SQL engine executes your query, and you can hover over each node for more information. The cost % is particularly interesting, as it lets you see the most expensive part of your query.
It's difficult to advise you any further without that execution plan, which is why a number of people commented on your question. Your schema and indexes change how the query is executed, so it's not something someone can accurately replicate in their own environment without scripts for the tables, indexes, etc. Even then, statistics could be out of date and other problems could arise.
You can also execute SET STATISTICS PROFILE ON to get a textual view of the plan (which may be useful when asking for help).
There are a number of articles that can help you fix the bottlenecks, or post another question for more advice.
http://msdn.microsoft.com/en-us/library/ms178071.aspx
SQL Server Query Plan Analysis
Execution Plan Basics

Bad performance of SQL query due to ORDER BY clause

I have a query joining 4 tables with a lot of conditions in the WHERE clause. The query also includes an ORDER BY clause on a numeric column. It takes 6 seconds to return, which is too long, and I need to speed it up. Surprisingly, I found that if I remove the ORDER BY clause, it takes 2 seconds. Why does the ORDER BY make such a massive difference, and how can I optimize it? I am using SQL Server 2005. Many thanks.
I cannot confirm that the ORDER BY makes a big difference once I clear the execution plan cache. However, can you shed some light on how to speed this up a little? The query is as follows (for simplicity it says "SELECT *", but I am only selecting the columns I need).
SELECT *
FROM View_Product_Joined j
INNER JOIN [dbo].[OPR_PriceLookup] pl on pl.siteID = NodeSiteID and pl.skuid = j.skuid
LEFT JOIN [dbo].[OPR_InventoryRules] irp on irp.ID = pl.SkuID and irp.InventoryRulesType = 'Product'
LEFT JOIN [dbo].[OPR_InventoryRules] irs on irs.ID = pl.siteID and irs.InventoryRulesType = 'Store'
WHERE SiteName = N'EcommerceSite'
AND Published = 1
AND DocumentCulture = N'en-GB'
AND NodeAliasPath LIKE N'/Products/Cats/Computers/Computer-servers/%'
AND NodeSKUID IS NOT NULL
AND SKUEnabled = 1
AND pl.PriceLookupID in (select TOP 1 PriceLookupID
from OPR_PriceLookup pl2
where pl.skuid = pl2.skuid and (pl2.RoleID = -1 or pl2.RoleId = 13)
order by pl2.RoleID desc)
ORDER BY NodeOrder ASC
Why does the ORDER BY make such a massive difference, and how do you optimize it?
The ORDER BY needs to sort the result set, which may take long if the set is big.
To optimize it, you may need to index the tables properly.
The index access path, however, has its drawbacks, so it can even take longer.
If you have something other than equijoins in your query, or ranged predicates (like <, > or BETWEEN), or a GROUP BY clause, then the index used for the ORDER BY may prevent the other indexes from being used.
If you post the query, I'll probably be able to tell you how to optimize it.
Update:
Rewrite the query:
SELECT *
FROM View_Product_Joined j
LEFT JOIN
[dbo].[OPR_InventoryRules] irp
ON irp.ID = j.skuid
AND irp.InventoryRulesType = 'Product'
LEFT JOIN
[dbo].[OPR_InventoryRules] irs
ON irs.ID = j.NodeSiteID
AND irs.InventoryRulesType = 'Store'
CROSS APPLY
(
SELECT TOP 1 *
FROM OPR_PriceLookup pl
WHERE pl.siteID = j.NodeSiteID
AND pl.skuid = j.skuid
AND pl.RoleID IN (-1, 13)
ORDER BY
pl.RoleID desc
) pl
WHERE SiteName = N'EcommerceSite'
AND Published = 1
AND DocumentCulture = N'en-GB'
AND NodeAliasPath LIKE N'/Products/Cats/Computers/Computer-servers/%'
AND NodeSKUID IS NOT NULL
AND SKUEnabled = 1
ORDER BY
NodeOrder ASC
The relation View_Product_Joined, as the name suggests, is probably a view.
Could you please post its definition?
If it is indexable, you may benefit from creating an index on View_Product_Joined (SiteName, Published, DocumentCulture, SKUEnabled, NodeOrder).
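For reference, a hedged sketch of that suggestion. In SQL Server, a view can only be indexed if it was created WITH SCHEMABINDING, and a unique clustered index must exist before any nonclustered one; the index names below are made up, and NodeSKUID is only assumed here to uniquely identify a row:
-- Prerequisite: a unique clustered index on the (schema-bound) view
CREATE UNIQUE CLUSTERED INDEX IX_View_Product_Joined_Key
ON View_Product_Joined (NodeSKUID);
CREATE NONCLUSTERED INDEX IX_View_Product_Joined_Filter
ON View_Product_Joined (SiteName, Published, DocumentCulture, SKUEnabled, NodeOrder);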