Query is slow due to the DATEDIFF function - sql

I am writing a SQL query that performs slowly. Because of the DATEDIFF function, it returns no results for the mails. Please help me rewrite this query so that it returns results faster. The query is below:
SELECT DISTINCT isnull(hrr.SourceEmailID,'')
,'Interview Rejected To Employee'
WHERE Title = 'Interview Reject Mail To Employee (Applicant Source- EGES)'
FROM hc_resume_bank hrb WITH (NOLOCK)
INNER JOIN hc_req_resume hrr WITH (NOLOCK)
ON hrr.resid = HRB.rid
WHERE hrrss.stageid = 4
AND hrrss.statusid = 9
AND hrr.SourceID = 4
AND isnull(hrb.SourceEmailId, '') <> ''
AND isnull(hrr.SourceEmailId, '') <> ''
and hrr.AddedType=10
AND Datediff(MI, dateadd(mi, 330, hrrss.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <=5

Assuming that you have established that datediff is the root cause of poor performance, I suggest changing this:
Datediff(MI, dateadd(mi, 330, hrrss.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <=5
to this:
hrrss.StatusDate >= DATEADD(MI, -5, GETDATE())
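This assumes the dates in StatusDate are in the same time zone as the server. If they are stored in UTC, as the original expression suggests (it shifts both StatusDate and GETUTCDATE() by the same 330 minutes), use GETUTCDATE() instead of GETDATE().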
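To see why the rewritten predicate helps, here is a minimal T-SQL sketch. The temp table #StatusDemo, its columns and the index name are made up for illustration, and the sketch assumes StatusDate holds UTC values. Both queries pick up the recent row, but only the second leaves the column bare, which is what allows an index seek on StatusDate.

```sql
-- Hypothetical demo table; names are illustrative, not from the original schema.
CREATE TABLE #StatusDemo (Id int IDENTITY(1,1) PRIMARY KEY, StatusDate datetime NOT NULL);
CREATE NONCLUSTERED INDEX ix_StatusDemo_StatusDate ON #StatusDemo (StatusDate);

INSERT INTO #StatusDemo (StatusDate) VALUES (DATEADD(MINUTE, -2, GETUTCDATE()));   -- recent row
INSERT INTO #StatusDemo (StatusDate) VALUES (DATEADD(MINUTE, -60, GETUTCDATE()));  -- old row

-- Non-sargable: the column is wrapped in functions, so every row must be evaluated.
SELECT Id FROM #StatusDemo
WHERE DATEDIFF(MI, DATEADD(MI, 330, StatusDate), DATEADD(MI, 330, GETUTCDATE())) <= 5;

-- Sargable: the column stands alone and the boundary value is computed once.
SELECT Id FROM #StatusDemo
WHERE StatusDate >= DATEADD(MI, -5, GETUTCDATE());

DROP TABLE #StatusDemo;
```

The two predicates are not strictly identical at the edge: DATEDIFF counts minute boundaries crossed, so the first form can include rows that are up to almost six minutes old. Comparing the execution plans of the two SELECTs on a realistically sized table is the way to confirm the difference.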

Salman A has a great answer that I'd like to expand on.
Similar to why Salman A suggested you move the function to the right side of your where clause for hrrss.StatusDate, the same applies to SourceEmailId, as putting a function on the left prevents the use of an index on these columns.
However, ISNULL() is a bit more tricky to resolve, and there are several possible ways it could be addressed.
Consider if the column should really allow NULLS, and if altering the column to not allow NULLS is an option. Then your where clause would look like this.
AND hrb.SourceEmailId <> ''
AND hrr.SourceEmailId <> ''
It's also possible that SourceEmailId is always either going to have a valid value or be NULL. This would be preferred, as NULL should be used where a value is unknown. In that case you shouldn't be checking for <> ''. Simply check that the email IS NOT NULL.
AND hrb.SourceEmailId IS NOT NULL
AND hrr.SourceEmailId IS NOT NULL
If options 1 and 2 are not possible, then consider a UNION result set. In this case, you'd write a query for hrb.SourceEmailId <> '' and UNION that to the results of a second query for hrb.SourceEmailId IS NOT NULL. Since you have checks for SourceEmailId on two different tables, it could mean as many as four queries. However, don't get caught up on the fact that it's more queries and assume that it will therefore be slower. If all 4 queries are properly tuned and each runs in 100ms, that's better than one combined query running in 5 minutes.
More details of the issues and possible work around to using ISNULL() can be found in the below links.
What are different ways to replace ISNULL() in a WHERE clause that uses only literal values?
Once these changes have been applied, you'll have a query that can actually use indexes on these columns. At that point, I'd start reviewing your execution plans and indexes, and possibly looking at removing the DISTINCT. But, as long as you have several WHERE clauses in your query that are going to force a SCAN every time they execute, doing these things now won't yield much benefit.
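One detail that may simplify the choice between the options above: under SQL's three-valued logic, NULL <> '' evaluates to UNKNOWN, so a plain <> '' predicate already filters out NULL rows. A minimal sketch, using a made-up temp table #EmailDemo rather than the real tables:

```sql
-- Hypothetical demo table; only the column name is borrowed from the question.
CREATE TABLE #EmailDemo (Id int NOT NULL, SourceEmailId varchar(100) NULL);
INSERT INTO #EmailDemo (Id, SourceEmailId) VALUES (1, 'a@example.com');
INSERT INTO #EmailDemo (Id, SourceEmailId) VALUES (2, '');
INSERT INTO #EmailDemo (Id, SourceEmailId) VALUES (3, NULL);

-- Original form: a function wrapped around the column.
SELECT Id FROM #EmailDemo WHERE ISNULL(SourceEmailId, '') <> '';

-- Bare column: NULL <> '' is UNKNOWN, so the NULL row is rejected as well.
SELECT Id FROM #EmailDemo WHERE SourceEmailId <> '';

DROP TABLE #EmailDemo;
```

Both SELECTs return only Id 1, so where this equivalence holds for your data the ISNULL() wrapper can be dropped without changing the result.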

DISTINCT in a query is almost always an indicator of a badly written query, where the author joins a lot of tables and builds a huge intermediate result that must then be boiled down to its real size with DISTINCT. This is a costly operation. It seems to apply to your query. If you simply want to make sure that the hc_req_resume.resid has an entry in hc_resume_bank with a sourceemailid, then use EXISTS or IN for this lookup, not a join.
Your query with appropriate lookup clauses:
,'Interview Rejected To Employee'
FROM hcm_template_library
WHERE title = 'Interview Reject Mail To Employee (Applicant Source- EGES)'
FROM hc_req_resume hrr
WHERE hrr.sourceid = 4
AND hrr.addedtype = 10
AND hrr.resid IN
(
  SELECT hrb.rid
  FROM hc_resume_bank hrb
  WHERE hrb.sourceemailid <> ''
)
AND hrr.rid IN
(
  SELECT hrrss.reqresid
  FROM hc_req_resume_stage_status hrrss
  WHERE hrrss.stageid = 4
  AND hrrss.statusid = 9
  AND hrrss.statusdate >= DATEADD(MI, -5, GETUTCDATE())
)
AND hrr.sourceid IN (SELECT hrs.rid FROM hcm_resume_source hrs)
AND hrr.rid IN (SELECT hrris.reqresid FROM hc_req_res_interview_stages hrris);
The naming of the columns doesn't make things easier here. Why is the column sometimes called rid and sometimes reqresid? And then I see a rid combined with a resid. Is this just yet another name for the same thing? Or are there two meanings of a rid? And what is the table called that the ID actually refers to? Is there a table called r or reqres or res? It doesn't seem so, but why does the ID of the table have a different name from the table, so that the reader must guess what is what? We cannot even make much of a guess as to whether it is possible for a rid not to have a match in hc_req_res_interview_stages or for a sourceid not to have a match in hcm_resume_source. Usually you have a foreign key constraint on IDs, so either the ID is null (if this is allowed) or it does have a match in the parent table. A lookup would then be pointless. Is it in your query? Or aren't those tables the parent tables, but just other child tables referring to the same parent?
Remove any lookups that are not needed. The lookups in hcm_resume_source and hc_req_res_interview_stages may be such candidates, but I cannot know.
Finally, you want appropriate indexes. For hc_req_resume this may be something like
create index idx1 on hc_req_resume (sourceid, addedtype, rid, resid);
Then you may want:
create index idx2 on hc_resume_bank (rid) where sourceemailid <> '';
create index idx3 on hc_req_resume_stage_status (stageid, statusid, statusdate, reqresid);
The order of the columns in the indexes should be adjusted according to their selectivity.

You search for a result in the future, is this correct? Edit: I realised it's just the last 5 minutes you are looking for, so in this case you might just as well remove the function on the left and see if this prevents the index scan.
About the slow performance: your query (only focusing on the DATEDIFF here) is not sargable this way. SQL Server will need to compute the expression for every row in the table first, always resulting in a table scan. Remove the function on the left side.
One way to get around this is to get the results from the main table first in a sargable way, put them in a temp table, then apply the function to the temp table and use its IDs to get back to the main table for the results. See the example below.
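-- Column types below are assumed; adjust them to your table.
IF OBJECT_ID('tempdb..#MyTableName') IS NOT NULL
    DROP TABLE #MyTableName;
CREATE TABLE #MyTableName (ID int, StatusDate datetime);
INSERT INTO #MyTableName (ID, StatusDate)
SELECT p.Id, p.StatusDate
FROM dbo.basetable p
WHERE p.StatusDate > DATEADD(HOUR, -1, GETUTCDATE()) --sargable pre-filter; narrow your date criteria as much as needed
SELECT P.* FROM #MyTableName T
JOIN dbo.basetable P
ON P.Id = T.ID
WHERE Datediff(MI, dateadd(mi, 330, T.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <= 5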
If you can, create a nonclustered index on your date column and see what it brings. The way you wrote the query, it will always scan, but at least it can scan the narrower index instead of the table. In the sargable version that index will also help a bunch.


TSQL Improving performance of Update cross apply like statement

I have a client with a stored procedure that currently takes 25 minutes to run. I have narrowed the cause of this to the following statement (column and table names changed):
UPDATE #customer_emails_tmp
SET #customer_emails_tmp.Possible_Project_Ref = cp.order_project_no,
#customer_emails_tmp.Possible_Project_id = cp.order_uid
FROM #customer_emails_tmp e
CROSS APPLY (
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] p
WHERE e.Subject LIKE '%' + p.order_title + '%'
AND p.order_date < e.timestamp
ORDER BY p.order_date DESC
) as cp
WHERE e.Possible_Project_Ref IS NULL;
There are 3 slightly different versions of the above, each joining to 1 of three tables. The issue is the CROSS APPLY with LIKE '%' + p.title + '%'. I have tried looking into CONTAINS() and FREETEXT() but, as far as my testing and investigations go, you cannot do CONTAINS(e.title, p.title) or FREETEXT(e.title, p.title).
Have I misread something, or is there a better way to write the above query?
Any help on this is much appreciated.
Updated query to actual query used. Execution plan:
Tmp table has the following indexes:
CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_first_recipient ON #customer_emails_tmp (First_Recipient);
CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_first_recipient_domain_name ON #customer_emails_tmp (First_Recipient_Domain_Name);
CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_client_id ON #customer_emails_tmp (customer_emails_client_id);
CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_subject ON #customer_emails_tmp ([subject]);
There is no index on the [order] table for column order_title
Edit 2
The purpose of this SP is to link orders (amongst others) to sent emails. This is done via multiple UPDATE statements; all the other update statements take less than a second. However, this one (and 2 others exactly the same but looking at 2 other tables) takes an extraordinary amount of time.
I cannot remove the filter on Possible_Project_Ref IS NULL as we only want to update the ones that are null.
Also, I cannot change WHERE e.Subject LIKE '%' + p.order_title + '%' to WHERE e.Subject LIKE p.order_title + '%' because the subject line may not start with the p.order_title, for example it could start with FW: or RE:
Reviewing your execution plan, I think the main issue is you're reading a lot of data from the order table. You are reading 27,447,044 rows just to match up to find 783 rows. Your 20k row temp table is probably nothing by comparison.
Without knowing your data or desired business logic, here's a couple things I'd consider:
Updating First Round of Exact Matches
I know you need to keep your %SearchTerm% parameters, but some data might have exact matches. So if you run an initial update for exact matches, it will reduce the ones you have to search with %SearchTerm%
Run something like this before your current update
/*Recommended index for this update*/
CREATE INDEX ix_test ON [order](order_title,order_date) INCLUDE (order_project_no, order_uid)
UPDATE #customer_emails_tmp
SET Possible_Project_Ref = cp.order_project_no
,Possible_Project_id = cp.order_uid
FROM #customer_emails_tmp e
CROSS APPLY (
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] p
WHERE e.Subject = p.order_title
AND p.order_date < e.timestamp
ORDER BY p.order_date DESC
) as cp
WHERE e.Possible_Project_Ref IS NULL;
Narrowing Search Range
This will technically change your matching criteria, but there are probably certain logical assumptions you can make that won't impact the final results. Here are a couple of ideas for you to consider, to get you thinking this way, but only you know your business. The end goal should be to narrow the data read from the order table
Is there a customer id you can match on? Something like e.customerID = p.customerID? Do you really match any email to any order?
Can you narrow your search date range to something like x days before timestamp? Do you really need to search all historical orders for all of time? Would you even want a match if an email matches to an order from 5 years ago? For this, try updating your APPLY date filter to something like p.order_date BETWEEN DATEADD(dd,-30,e.[timestamp]) AND e.[timestamp]
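The date-window idea can be sketched as follows. The temp tables #emails and #orders are hypothetical stand-ins for #customer_emails_tmp and [order], the column types are assumed, and the 30-day window is only an example value. The sketch uses >= and < rather than BETWEEN so that the upper bound stays strict, matching the original p.order_date < e.timestamp.

```sql
-- Hypothetical stand-ins for #customer_emails_tmp and [order]; column types are assumed.
CREATE TABLE #emails (email_id int, Subject varchar(200), [timestamp] datetime,
                      Possible_Project_Ref varchar(20) NULL, Possible_Project_id int NULL);
CREATE TABLE #orders (order_uid int, order_project_no varchar(20),
                      order_title varchar(100), order_date datetime);

INSERT INTO #emails VALUES (1, 'RE: Kitchen refit', '20240115', NULL, NULL);
INSERT INTO #orders VALUES (10, 'P-10', 'Kitchen refit', '20240110');  -- inside the 30-day window
INSERT INTO #orders VALUES (11, 'P-11', 'Kitchen refit', '20190101');  -- too old, ignored

UPDATE e
SET Possible_Project_Ref = cp.order_project_no,
    Possible_Project_id  = cp.order_uid
FROM #emails e
CROSS APPLY (
    SELECT TOP 1 p.order_project_no, p.order_uid
    FROM #orders p
    WHERE p.order_date >= DATEADD(dd, -30, e.[timestamp])  -- lower bound limits the rows read
      AND p.order_date <  e.[timestamp]                    -- same strict upper bound as before
      AND e.Subject LIKE '%' + p.order_title + '%'
    ORDER BY p.order_date DESC
) AS cp
WHERE e.Possible_Project_Ref IS NULL;

SELECT email_id, Possible_Project_Ref, Possible_Project_id FROM #emails;

DROP TABLE #emails;
DROP TABLE #orders;
```

With this sample data the email is linked to order P-10. On the real table, the date range only pays off if an index leading on order_date lets SQL Server seek to the window before the LIKE comparison runs.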
Other Miscellaneous Notes
If I'm understanding this correctly, you are trying to link emails to some sort of project #. Ideally, when the emails are generated, they would be linked to a project immediately. I know this is not always possible resource/time wise, but the clean solution is to calculate this at the beginning of the process, not afterwards. Generally, anytime you have to use fuzzy string matching, you will have data issues. I know business always wants results "yesterday" and always pushes for the shortcut, and nobody ever wants to update legacy processes, but sometimes you need to if you want clean data.
I'd review your indexes on the temp table. Generally I find the cost to create the indexes and for SQL Server to maintain them as I update the temp table is not worth it. So 9 times out of 10, I leave the temp table as a plain heap with 0 indexes
First, filter the NULLs when you create #customer_emails_tmp, not after. Then you can lose:
WHERE e.Possible_Project_Ref IS NULL. This way you are only bringing in rows you need instead of retrieving rows you don't need, then filtering them.
Next, use this for your WHERE clause:
WHERE EXISTS (SELECT 1 FROM [order] AS p WHERE p.order_date < e.timestamp)
If an email in e has no order dated before its timestamp, that email will not be considered at all.
Next remove the timestamp filter from your APPLY subquery. Now your subquery looks like this:
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] AS p
WHERE e.Subject LIKE '%' + p.order_title + '%'
ORDER BY p.order_date DESC
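This way you are applying your "Subject Like" filter to a much smaller set of rows. Note that without the date filter inside the APPLY, the order that gets picked is no longer guaranteed to predate the email, so check that this still meets your requirements. The final query would look like this: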
UPDATE #customer_emails_tmp
SET #customer_emails_tmp.Possible_Project_Ref = cp.order_project_no,
#customer_emails_tmp.Possible_Project_id = cp.order_uid
FROM #customer_emails_tmp e
CROSS APPLY (
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] p
WHERE e.Subject LIKE '%' + p.order_title + '%'
ORDER BY p.order_date DESC
) as cp
WHERE EXISTS (SELECT 1 FROM [order] AS p WHERE p.order_date < e.timestamp);

Improve Netezza SQL Query That Contains Hundreds of Strings in WHERE Clause

I have a Netezza query with a WHERE clause that includes several hundred potential strings. I'm surprised that it runs, but it takes time to complete and occasionally errors out ('transaction rolled back by client'). Here's a pseudo code version of my query.
A.I_TS BETWEEN '2017-01-01' AND '2017-01-15'
AND B.TAB_CODE IN ('00AV', '00BX', '00C2', '00DJ'...
) X
In my query, I'm limiting the results on B.TAB_CODE to about 1,200 values (out of more than 10k). I'm honestly surprised that it works at all, but it does most of the time.
Is there a more efficient way to handle this?
If the IN clause becomes too cumbersome, you can make your query in multiple parts. Create a temporary table containing a TAB_CODE set then use it in a JOIN.
WITH tab_codes(tab_code) AS (
--- etc ---
--- etc ---
INNER JOIN tab_codes Q ON B.TAB_CODE = Q.tab_code
If you want to boost performance even more, consider using a real temporary table (CTAS)
We've seen situations where it's "cheaper" to CTAS the original table to another, distributed on your primary condition, and then querying that table instead.
If I’m guessing correctly, X.I_TS is in fact a ‘timestamp’, and as such I expect it to contain many different values per day. Can you confirm that?
If I’m right, the query can possibly benefit from changing the ‘group by X.I_TS,...’ to ‘group by 1,...’
Furthermore the ‘Count(Distinct Case...’ can never return anything else than 1 or NULL. Can you confirm that?
If I’m right on that, you can get rid of the expensive ‘DISTINCT’ by changing it to ‘MAX(Case...’
Can you follow me :)

SQL Server 2005 Table Spool (Lazy spool) - performance

I have some legacy SQL (SP)
declare @FactorCollectionId int; select @FactorCollectionId = collectionID from dbo.collection where name = 'Factor'
declare @changeDate datetime; set @changeDate = getDate()
declare @changeTimeID int; set @changeTimeID = convert(int, convert(varchar(8), @changeDate, 112))
declare @MaxWindowID int; select @MaxWindowID = MAX(windowID) from dbo.window
select distinct @FactorCollectionId, ElementId, T.TimeID, @changeTimeID ChangeTimeID, 1 UserID, @MaxWindowID, 0 ChangeID
, null TransactionID, SystemSourceID, changeTypeID, 'R' OlapStatus, Comment, Net0 Delta0, Net0
, 1 CreatedBy, 1 UpdatedBy, @changeDate CreatedDate, @changeDate UpdatedDate, 1 CurrentRecord, MeasureTypeID
from dbo.aowCollectedFact FV
inner join dbo.timeView T on T.timeID >= FV.timeID
where FV.currentRecord = 1 --is current record
and T.CurrentHorizon <> 0 --Indicator that Time is part of current horizon
and FV.collectionID = @FactorCollectionId --factor collections only
and FV.timeID = (select MAX(timeID) --latest collected fact timeID for given collectionID and elementID
from aowCollectedFact FV2
where FV2.collectionId = @FactorCollectionId
and FV2.elementId = FV.elementID)
and (((T.ForecastLevel = 'Month') and (T.FirstDayInMonth = T.Date)) --Date is first of month for monthly customers, or
((T.ForecastLevel = 'Quarter')and (T.FirstDayInQuarter = T.Date))) --Date is first of quarter for quarterly customers
and not exists (select 1 --Record does not already exist in collected fact view
from aowCollectedFact FV3 -- for this factor collection, elementID, and timeID
where FV3.collectionId = @FactorCollectionId
and FV3.elementID = FV.elementId
and FV3.timeID = T.timeID)
This SQL processes over 2 million rows. I need to improve its performance. When I look at the execution plan I find that a lot of time is spent on a Table Spool (Lazy spool) operation (indexes exist in tables and they work well).
How to improve performance for this part ?
Before seeing the execution plan or table indices, I'll give best educated guesses. First, here are a couple links worth reading.
showplan operator of the week - lazy spool
Table spool/Lazy spool
INDEXING: Take a look at your indices to make sure that they're all covering the columns that you're selecting out of the tables. You'll want to aim to get all the columns included in JOINs and WHERE clauses within the indices. All other columns that are in the SELECT statements should be INCLUDEd, or covered, by the index.
OPERATORS: See if you can get rid of the not equals ("<>") operators, in favor of a single greater than or less than operator. Can this statement and T.CurrentHorizon <> 0 be changed to this and T.CurrentHorizon > 0?
JOINS: Get rid of the subqueries that are JOINing to tables outside of themselves. For instance, this line and FV2.elementId = FV.elementID might be causing some problems. There's no reason you can't move that out of a subquery and into a JOIN to dbo.aowCollectedFact FV, given that you're GROUPing (DISTINCT) in the main query already.
DISTINCT: Change it to a GROUP BY. I've got no reason other than, because it's good practice and takes two minutes.
LAST NOTE: The exception to all the above might be to leave the final subquery, the NOT EXISTS, as a subquery. If you change it to a JOIN, it'll have to be a LEFT JOIN ... WHERE ... IS NULL statement, which can actually cause spooling operations. No great way to get around that one.
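The JOINS suggestion above can be sketched like this. The temp table #fact is a hypothetical miniature of aowCollectedFact holding only the three columns involved, and the sample values are invented. Whether the join form actually removes the spool depends on the real plan, so treat it as something to test rather than a guaranteed fix.

```sql
-- Hypothetical miniature of aowCollectedFact; only the columns needed for the sketch.
CREATE TABLE #fact (collectionID int, elementID int, timeID int);
INSERT INTO #fact VALUES (1, 100, 20090101);
INSERT INTO #fact VALUES (1, 100, 20090201);  -- latest for element 100
INSERT INTO #fact VALUES (1, 200, 20090115);  -- latest for element 200

DECLARE @FactorCollectionId int;
SET @FactorCollectionId = 1;

-- Correlated form, as in the original query.
SELECT FV.elementID, FV.timeID
FROM #fact FV
WHERE FV.collectionID = @FactorCollectionId
  AND FV.timeID = (SELECT MAX(FV2.timeID)
                   FROM #fact FV2
                   WHERE FV2.collectionID = @FactorCollectionId
                     AND FV2.elementID = FV.elementID);

-- Join form: compute the latest timeID per element once, then join to it.
SELECT FV.elementID, FV.timeID
FROM #fact FV
INNER JOIN (SELECT elementID, MAX(timeID) AS maxTimeID
            FROM #fact
            WHERE collectionID = @FactorCollectionId
            GROUP BY elementID) latest
        ON latest.elementID = FV.elementID
       AND latest.maxTimeID = FV.timeID
WHERE FV.collectionID = @FactorCollectionId;

DROP TABLE #fact;
```

Both SELECTs return one row per element (element 100 with timeID 20090201, element 200 with timeID 20090115), so the two forms can be compared plan against plan on the real data.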

Optimising CTE for recursive queries

I have a table with self join. You can think of the structure as standard table to represent organisational hierarchy. Eg table:-
This table consists of 50000 sample records. I wrote CTE recursive query and it works absolutely fine. However the time it takes to process just 50000 records is round about 3 minutes on my machine (4GB Ram, 2.4 Ghz Core2Duo, 7200 RPM HDD).
How can I possibly improve the performance because 50000 is not so huge number. Over time it will keep on increasing. This is the query which is exactly what I have in my Stored Procedure. The query's purpose is to select all the members that come under a specific member. Eg. Under Owner of the company each and every person comes. For Manager, except Owner all of the records gets returned. I hope you understand the query's purpose.
Alter PROCEDURE spGetNonVirtualizedData
@MemberId int
As
With MembersCTE As
(
Select parent.MemberId As MemberId, 0 as Level
From Members as parent Where IsNull(MemberId,0) = IsNull(@MemberId,0)
Union ALL
Select child.MemberId As MemberId , Level + 1 as Level
From Members as child
Inner Join MembersCTE on MembersCTE.MemberId = child.RelatedMemberId
)
Select Members.*
From MembersCTE
Inner Join Members On MembersCTE.MemberId = Members.MemberId
option(maxrecursion 0)
As you can see to improve the performance, I have even made the Joins at the last step while selecting records so that all unnecessary records do not get inserted into temp table. If I made joins in my base step and recursive step of CTE (instead of Select at the last step) the query takes 20 minutes to execute!
MemberId is primary key in the table.
Thanks in advance :)
In your anchor condition you have Where IsNull(MemberId,0) = IsNull(@MemberId,0). I assume this is just because, when you pass NULL as a parameter, = doesn't bring back the IS NULL rows. This will cause a scan rather than a seek.
Use WHERE MemberId = @MemberId OR (@MemberId IS NULL AND MemberId IS NULL) instead, which is sargable.
Also, I'm assuming that you have an index on RelatedMemberId. If not, you should add one:
CREATE NONCLUSTERED INDEX ix_name ON Members(RelatedMemberId) INCLUDE (MemberId)
(though you can skip the included column bit if MemberId is the clustered index key as it will be included automatically)
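Putting both suggestions together, a minimal self-contained sketch could look like the following. The temp table #Members and its four rows are invented for illustration; only the column names come from the question.

```sql
-- Hypothetical miniature of the Members table.
CREATE TABLE #Members (MemberId int NOT NULL PRIMARY KEY, RelatedMemberId int NULL);
CREATE NONCLUSTERED INDEX ix_Members_RelatedMemberId ON #Members (RelatedMemberId);

INSERT INTO #Members VALUES (1, NULL);  -- owner
INSERT INTO #Members VALUES (2, 1);     -- manager, reports to owner
INSERT INTO #Members VALUES (3, 2);     -- reports to manager
INSERT INTO #Members VALUES (4, 1);     -- reports to owner

DECLARE @MemberId int;
SET @MemberId = 2;

WITH MembersCTE AS
(
    -- Anchor: bare column comparison, eligible for a seek on the primary key.
    SELECT m.MemberId, 0 AS Level
    FROM #Members m
    WHERE m.MemberId = @MemberId
       OR (@MemberId IS NULL AND m.MemberId IS NULL)
    UNION ALL
    -- Recursive step: the index on RelatedMemberId supports this lookup.
    SELECT child.MemberId, parent.Level + 1
    FROM #Members child
    INNER JOIN MembersCTE parent ON parent.MemberId = child.RelatedMemberId
)
SELECT MemberId, Level
FROM MembersCTE
OPTION (MAXRECURSION 0);

DROP TABLE #Members;
```

For @MemberId = 2 this returns member 2 at level 0 and member 3 at level 1. On the real 50000-row table, the execution plan should show seeks in both the anchor and the recursive branch if the index is being used.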

Avoiding a nested subquery in SQL

I have a SQL table that contains data of the form:
Id int
EventTime dateTime
CurrentValue int
The table may have multiple rows for a given id that represent changes to the value over time (the EventTime identifying the time at which the value changed).
Given a specific point in time, I would like to be able to calculate the count of distinct Ids for each given Value.
Right now, I am using a nested subquery and a temporary table, but it seems it could be much more efficient.
SELECT
(
SELECT TOP 1 [CurrentValue]
FROM [ValueHistory]
WHERE [Ids].[Id]=[ValueHistory].[Id] AND
[EventTime] < @StartTime
ORDER BY [EventTime] DESC
) as [LastValue]
INTO #temp
FROM [Ids]
SELECT [LastValue], COUNT([LastValue])
FROM #temp
GROUP BY [LastValue]
Here is my first go:
select ids.Id, count(distinct vh.currentvalue)
from ids
join valuehistory vh on ids.Id = vh.Id
where vh.eventtime < @StartTime
group by ids.Id
However, I am not sure I understand your table model very clearly, or the specific question you are trying to solve.
This would give, for each Id, the number of distinct 'currentvalues' in valuehistory before a certain date.
Is that what you are looking for?
I think I understand your question.
You want to get the most recent value for each id, group by that value, and then see how many ids have that same value? Is this correct?
If so, here's my first shot:
declare @StartTime datetime
set @StartTime = '20090513'
select ValueHistory.CurrentValue, count(ValueHistory.id)
from
(
select id, max(EventTime) as LatestUpdateTime
from ValueHistory
where EventTime < @StartTime
group by id
) CurrentValues
inner join ValueHistory on CurrentValues.id = ValueHistory.id
and CurrentValues.LatestUpdateTime = ValueHistory.EventTime
group by ValueHistory.CurrentValue
No guarantee that this is actually faster though - for this to work with any decent speed you'll need an index on EventTime.
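As a way to try this out, here is a small self-contained sketch with sample data. The temp table #ValueHistory and its rows are invented, and the composite index on (Id, EventTime) is only one guess at what a useful "index on EventTime" could look like for this query shape.

```sql
-- Hypothetical sample data; only the column names come from the question.
CREATE TABLE #ValueHistory (Id int NOT NULL, EventTime datetime NOT NULL, CurrentValue int NOT NULL);
CREATE NONCLUSTERED INDEX ix_vh_id_time ON #ValueHistory (Id, EventTime) INCLUDE (CurrentValue);

INSERT INTO #ValueHistory VALUES (1, '20090501', 10);
INSERT INTO #ValueHistory VALUES (1, '20090510', 20);  -- latest before the cutoff for Id 1
INSERT INTO #ValueHistory VALUES (2, '20090505', 20);  -- latest before the cutoff for Id 2
INSERT INTO #ValueHistory VALUES (3, '20090512', 30);  -- latest before the cutoff for Id 3
INSERT INTO #ValueHistory VALUES (3, '20090520', 40);  -- after the cutoff, ignored

DECLARE @StartTime datetime;
SET @StartTime = '20090513';

-- Latest event per Id before the cutoff, then count Ids per value.
SELECT vh.CurrentValue, COUNT(*) AS IdCount
FROM #ValueHistory vh
INNER JOIN (SELECT Id, MAX(EventTime) AS LatestUpdateTime
            FROM #ValueHistory
            WHERE EventTime < @StartTime
            GROUP BY Id) latest
        ON latest.Id = vh.Id
       AND latest.LatestUpdateTime = vh.EventTime
GROUP BY vh.CurrentValue;

DROP TABLE #ValueHistory;
```

With this sample data, value 20 is held by two Ids and value 30 by one. If two rows for the same Id share the same EventTime, that Id would be counted once per row, so a tie-breaker may be needed on real data.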
Let us keep in mind that, because the SQL language describes what you want and not how to get it, there are many ways of expressing a query that will eventually be turned into the same query execution plan by a good query optimizer. Of course, the level of "good" depends on the database you're using.
In general, subqueries are just a syntactically different way of describing joins. The query optimizer is going to recognize this and determine the most optimal way, to the best of its knowledge, to execute the query. Temporary tables may be created as needed. So in many cases, re-working the query is going to do nothing for your actual execution time -- it may come out to the same query execution plan in the end.
If you're going to attempt to optimize, you need to examine the query plan by doing a describe on that query. Make sure it's not doing full-table scans against large tables, and is picking the appropriate indices where possible. If, and only if, it is making sub-optimal choices here, should you attempt to manually optimize the query.
Now, having said all that, the query you pasted isn't entirely compatible with your stated goal of "calculat[ing] the count of distinct Ids for each given Value". So forgive me if I don't quite answer your need, but here's something to perf-test against your current query. (Syntax is approximate, sorry -- away from my desk).
SELECT ids.[Id], vh1.[CurrentValue], COUNT(vh2.[CurrentValue])
FROM [Ids] AS ids
JOIN [ValueHistory] AS vh1 ON ids.[Id] = vh1.[Id]
JOIN [ValueHistory] AS vh2 ON vh1.[CurrentValue] = vh2.[CurrentValue]
GROUP BY ids.[Id], vh1.[CurrentValue];
Note that you'll probably see better performance increases by adding indices to make those joins optimal than re-working the query, assuming you're willing to take the performance hit to update operations.