TSQL: Improving performance of an UPDATE ... CROSS APPLY ... LIKE statement

I have a client with a stored procedure that currently takes 25 minutes to run. I have narrowed the cause of this to the following statement (column and table names changed):
UPDATE e
SET e.Possible_Project_Ref = cp.order_project_no,
e.Possible_Project_id = cp.order_uid
FROM #customer_emails_tmp e
CROSS APPLY (
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] p
WHERE e.Subject LIKE '%' + p.order_title + '%'
AND p.order_date < e.timestamp
ORDER BY p.order_date DESC
) as cp
WHERE e.Possible_Project_Ref IS NULL;
There are 3 slightly different versions of the above, each joining to one of three tables. The issue is the CROSS APPLY ... LIKE '%' + p.order_title + '%'. I have tried looking into CONTAINS() and FREETEXT(), but as far as my testing and investigation go, you cannot do CONTAINS(e.title, p.title) or FREETEXT(e.title, p.title).
Have I misread something, or is there a better way to write the above query?
Any help on this is much appreciated.
EDIT
Updated query to actual query used. Execution plan:
https://www.brentozar.com/pastetheplan/?id=B1YPbJiX5
Tmp table has the following indexes:
CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_first_recipient ON #customer_emails_tmp (First_Recipient);
CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_first_recipient_domain_name ON #customer_emails_tmp (First_Recipient_Domain_Name);
CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_client_id ON #customer_emails_tmp (customer_emails_client_id);
CREATE NONCLUSTERED INDEX ix_tmp_customer_emails_subject ON #customer_emails_tmp ([subject]);
There is no index on the [order] table for column order_title
Edit 2
The purpose of this SP is to link orders (amongst others) to sent emails. This is done via multiple UPDATE statements; all the other UPDATE statements run in less than a second. However, this one (and 2 others exactly the same but looking at 2 other tables) takes an extraordinary amount of time.
I cannot remove the filter on Possible_Project_Ref IS NULL as we only want to update the ones that are null.
Also, I cannot change WHERE e.Subject LIKE '%' + p.order_title + '%' to WHERE e.Subject LIKE p.order_title + '%', because the subject line may not start with p.order_title; for example, it could start with FW: or RE:

Reviewing your execution plan, I think the main issue is that you're reading a lot of data from the order table. You are reading 27,447,044 rows just to find 783 matching rows. Your 20k-row temp table is probably nothing by comparison.
Without knowing your data or desired business logic, here are a couple of things I'd consider:
Updating First Round of Exact Matches
I know you need to keep your %SearchTerm% matching, but some data might have exact matches. If you run an initial update for exact matches first, it reduces the rows you have to search with %SearchTerm%.
Run something like this before your current update:
/*Recommended index for this update*/
CREATE INDEX ix_test ON [order](order_title,order_date) INCLUDE (order_project_no, order_uid)
UPDATE #customer_emails_tmp
SET Possible_Project_Ref = cp.order_project_no
,Possible_Project_id = cp.order_uid
FROM #customer_emails_tmp e
CROSS APPLY (
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] p
WHERE e.Subject = p.order_title
AND p.order_date < e.timestamp
ORDER BY p.order_date DESC
) as cp
WHERE e.Possible_Project_Ref IS NULL;
Narrowing Search Range
This will technically change your matching criteria, but there are probably logical assumptions you can make that won't impact the final results. Here are a couple of ideas to get you thinking this way, but only you know your business. The end goal should be to narrow the data read from the order table.
Is there a customer id you can match on? Something like e.customerID = p.customerID? Do you really match any email to any order?
Can you narrow your search date range to something like x days before the timestamp? Do you really need to search all historical orders for all of time? Would you even want a match if an email matches an order from 5 years ago? For this, try updating your APPLY date filter to something like p.order_date BETWEEN DATEADD(dd,-30,e.[timestamp]) AND e.[timestamp], as in the sketch below.
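For illustration, here is what the APPLY could look like with such a window; the 30-day cutoff is an assumption to tune against your business rules:
UPDATE e
SET e.Possible_Project_Ref = cp.order_project_no,
e.Possible_Project_id = cp.order_uid
FROM #customer_emails_tmp e
CROSS APPLY (
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] p
WHERE e.Subject LIKE '%' + p.order_title + '%'
AND p.order_date BETWEEN DATEADD(dd, -30, e.[timestamp]) AND e.[timestamp] --assumed 30-day window
ORDER BY p.order_date DESC
) AS cp
WHERE e.Possible_Project_Ref IS NULL;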
Other Miscellaneous Notes
If I'm understanding this correctly, you are trying to link emails to some sort of project number. Ideally, when the emails are generated, they would be linked to a project immediately. I know this is not always possible resource/time-wise, but the clean solution is to calculate this at the beginning of the process, not afterwards. Generally, any time you have to use fuzzy string matching, you will have data issues. I know business always wants results "yesterday" and always pushes for the shortcut, and nobody ever wants to update legacy processes, but sometimes you need to if you want clean data.
I'd review your indexes on the temp table. Generally, I find that the cost of creating the indexes, and of SQL Server maintaining them as I update the temp table, is not worth it. So 9 times out of 10, I leave the temp table as a plain heap with 0 indexes.
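If you want to test the heap approach here, a minimal sketch, using the index names from the question:
DROP INDEX ix_tmp_customer_emails_first_recipient ON #customer_emails_tmp;
DROP INDEX ix_tmp_customer_emails_first_recipient_domain_name ON #customer_emails_tmp;
DROP INDEX ix_tmp_customer_emails_client_id ON #customer_emails_tmp;
DROP INDEX ix_tmp_customer_emails_subject ON #customer_emails_tmp;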

First, filter the NULLs when you create #customer_emails_tmp, not after. Then you can lose WHERE e.Possible_Project_Ref IS NULL entirely. This way you only bring in the rows you need, instead of retrieving rows you don't need and then filtering them (see the sketch below).
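A sketch of the idea; the source query feeding the temp table is hypothetical, since the question doesn't show how #customer_emails_tmp is populated:
INSERT INTO #customer_emails_tmp (Subject, [timestamp], Possible_Project_Ref, Possible_Project_id)
SELECT Subject, [timestamp], Possible_Project_Ref, Possible_Project_id
FROM dbo.customer_emails --hypothetical source table
WHERE Possible_Project_Ref IS NULL; --filter at load time instead of in the UPDATE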
Next, use this for your WHERE clause:
WHERE EXISTS (SELECT 1 FROM [order] AS p WHERE p.order_date < e.timestamp)
If no order predates a given row's timestamp, that row of e won't be considered at all.
Next, remove the timestamp filter from your APPLY subquery. Now your subquery looks like this:
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] AS p
WHERE e.Subject LIKE '%' + p.order_title + '%'
ORDER BY p.order_date DESC
This way you are applying your "Subject Like" filter to a much smaller set of rows. The final query would look like this:
UPDATE e
SET e.Possible_Project_Ref = cp.order_project_no,
e.Possible_Project_id = cp.order_uid
FROM #customer_emails_tmp e
CROSS APPLY (
SELECT TOP 1 p.order_project_no, p.order_uid
FROM [order] p
WHERE e.Subject LIKE '%' + p.order_title + '%'
ORDER BY p.order_date DESC
) as cp
WHERE EXISTS (SELECT 1 FROM [order] AS p WHERE p.order_date < e.timestamp);

Related

Query performance is slow due to DATEDIFF function

I am writing a SQL query that performs slowly. Because of the DATEDIFF function, it returns no results for the mails. Please help me rework this query so that it produces results faster. The query is below:
SELECT DISTINCT isnull(hrr.SourceEmailID,'')
,''
,''
,hrr.RID
,hrr.ResID
,hrr.ReqID
,'Interview Rejected To Employee'
,(
SELECT TOP 1
RID
FROM HCM_TEMPLATE_LIBRARY WITH (NOLOCK)
WHERE Title = 'Interview Reject Mail To Employee (Applicant Source- EGES)'
)
,GETUTCDATE()
,hrr.CreatedUserID
,0
FROM hc_resume_bank hrb WITH (NOLOCK)
INNER JOIN hc_req_resume hrr WITH (NOLOCK)
ON hrr.resid = HRB.rid
INNER JOIN HC_REQ_RESUME_STAGE_STATUS hrrss WITH (NOLOCK) ON hrrss.ReqResID = hrr.RID
INNER JOIN HCM_RESUME_SOURCE hrs WITH (NOLOCK) ON hrs.RID = hrr.SourceID
INNER JOIN HC_REQ_RES_INTERVIEW_STAGES hrris ON hrris.ReqResId = hrr.RID
WHERE hrrss.stageid = 4
AND hrrss.statusid = 9
AND hrr.SourceID = 4
AND isnull(hrb.SourceEmailId, '') <> ''
AND isnull(hrr.SourceEmailId, '') <> ''
and hrr.AddedType=10
AND Datediff(MI, dateadd(mi, 330, hrrss.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <=5
Assuming that you have established that DATEDIFF is the root cause of the poor performance, I suggest changing this:
Datediff(MI, dateadd(mi, 330, hrrss.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <=5
to this:
hrrss.StatusDate >= DATEADD(MI, -5, GETDATE())
This assumes the dates in StatusDate are in the same timezone as the server.
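To see why the two forms match: adding 330 minutes to both arguments of DATEDIFF leaves the difference unchanged, so the predicate reduces step by step. Sketched here with GETUTCDATE() to mirror the original query's UTC arithmetic (note DATEDIFF counts minute boundaries, so the forms can differ by up to a minute at the edges):
--Original:                DATEDIFF(MI, DATEADD(MI, 330, StatusDate), DATEADD(MI, 330, GETUTCDATE())) <= 5
--The +330 offsets cancel: DATEDIFF(MI, StatusDate, GETUTCDATE()) <= 5
--Column left bare:        StatusDate >= DATEADD(MI, -5, GETUTCDATE())
AND hrrss.StatusDate >= DATEADD(MI, -5, GETUTCDATE()) --sargable: the column stands alone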
Salman A has a great answer that I'd like to expand on.
Similar to why Salman A suggested you move the function to the right side of your WHERE clause for hrrss.StatusDate, the same applies to SourceEmailId: putting a function on the left prevents the use of an index on these columns.
However, ISNULL() is a bit trickier to resolve, and there are several possible ways it could be addressed.
Consider whether the column should really allow NULLs, and whether altering the column to not allow NULLs is an option. Then your WHERE clause would look like this:
AND hrb.SourceEmailId <> ''
AND hrr.SourceEmailId <> ''
It's also possible that SourceEmailId is always either going to have a valid value or be NULL. This would be preferred, as NULL should be used where a value is unknown, in which case you shouldn't be checking for <> ''. Simply check that the email IS NOT NULL.
AND hrb.SourceEmailId IS NOT NULL
AND hrr.SourceEmailId IS NOT NULL
If options 1 and 2 are not available, then consider a UNION result set. In this case, you'd write a query for hrb.SourceEmailId <> '' and UNION it with the results of a second query for hrb.SourceEmailId IS NOT NULL. Since you have checks for SourceEmailId on two different tables, it could mean as many as four queries. However, don't get caught up on the fact that it's more queries, or assume that more queries somehow means slower. If all 4 queries are properly tuned and each runs in 100ms, that's better than one combined query running in 5 minutes.
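A sketch of the UNION shape on a hypothetical ISNULL(SourceEmailId, '') = '' predicate, which is the case where the split really pays off (for <> '', the bare comparison SourceEmailId <> '' already excludes NULLs on its own):
--Instead of: WHERE ISNULL(SourceEmailId, '') = ''  (not sargable)
SELECT r.RID, r.SourceEmailId
FROM hc_req_resume r
WHERE r.SourceEmailId = ''     --branch 1: empty strings, index-friendly
UNION ALL                      --branches are disjoint, so UNION ALL is safe
SELECT r.RID, r.SourceEmailId
FROM hc_req_resume r
WHERE r.SourceEmailId IS NULL; --branch 2: NULLs, also index-friendly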
More details of the issues, and possible workarounds to using ISNULL(), can be found in the links below.
isnull-around-the-predicate-and-sargability
What are different ways to replace ISNULL() in a WHERE clause that uses only literal values?
Once these changes have been applied, you'll have a query that can actually use indexes on these columns. At that point, I'd start reviewing your execution plans and indexes, and possibly look at removing the DISTINCT. But as long as you have several predicates in your WHERE clause that force a SCAN every time they execute, doing these things now won't yield much benefit.
DISTINCT in a query is almost always an indicator of a badly written query, where the author joins a lot of tables and builds a huge intermediate result that they must then boil down to its real size with DISTINCT. This is a costly operation, and it seems to apply to your query. If you simply want to make sure that hc_req_resume.resid has an entry in hc_resume_bank with a sourceemailid, then use EXISTS or IN for this lookup, not a join.
Your query with appropriate lookup clauses:
SELECT
ISNULL(hrr.sourceemailid,'')
,''
,''
,hrr.rid
,hrr.resid
,hrr.reqid
,'Interview Rejected To Employee'
,(
SELECT TOP 1
rid
FROM hcm_template_library
WHERE title = 'Interview Reject Mail To Employee (Applicant Source- EGES)'
)
,GETUTCDATE()
,hrr.createduserid
,0
FROM hc_req_resume hrr
WHERE hrr.sourceid = 4
AND hrr.addedtype = 10
AND hrr.resid IN
(
SELECT hrb.rid
FROM hc_resume_bank hrb
WHERE hrb.sourceemailid <> ''
)
AND hrr.rid IN
(
SELECT hrrss.reqresid
FROM hc_req_resume_stage_status hrrss
WHERE hrrss.stageid = 4
AND hrrss.statusid = 9
AND hrrss.statusdate >= DATEADD(MI, -5, GETUTCDATE())
)
AND hrr.sourceid IN (SELECT hrs.rid FROM hcm_resume_source hrs)
AND hrr.rid IN (SELECT hrris.reqresid FROM hc_req_res_interview_stages hrris);
The naming of the columns doesn't make things easier here. Why is the column sometimes called rid and sometimes reqresid? And then I see a rid combined with a resid. Is this just yet another name for the same thing, or are there two meanings of rid? And which table does the ID actually refer to? Is there a table called r, reqres, or res? It doesn't seem so, but why does the ID of a table have a different name from the table, so the reader must guess what is what? We cannot even guess whether it is possible for a rid not to have a match in hc_req_res_interview_stages, or for a sourceid not to have a match in hcm_resume_source. Usually you have a foreign key constraint on IDs, so either the ID is null (if this is allowed) or it does have a match in the parent table, and a lookup would be pointless. Is that the case in your query? Or aren't those the parent tables, but just other child tables referring to the same parent?
Remove any lookups that are not needed. The lookups in hcm_resume_source and hc_req_res_interview_stages may be such candidates, but I cannot know.
At last you want appropriate indexes. For hc_req_resume this may be something like
create index idx1 on hc_req_resume (sourceid, addedtype, rid, resid);
Then you may want:
create index idx2 on hc_resume_bank (rid) where sourceemailid <> '';
create index idx3 on hc_req_resume_stage_status (stageid, statusid, statusdate, reqresid);
The order of the columns in the indexes should be adjusted according to their selectivity.
You search for a result in the future, is this correct? Edit: I realised it's just the last 5 minutes you are looking for, so in this case you might just as well remove the function from the left side and see if this prevents the index scan.
About the slow performance: your query (only focusing on the DATEDIFF here) is not sargable this way. SQL Server will need to compute the expression for every row in the table first, always resulting in a table scan. Remove the function from the left side.
One way to get around this is to fetch the results from the main table first in a sargable way, put them in a temp table, then run the function against the temp table and use its IDs to get back to the main table for the results. See the example below.
IF OBJECT_ID('tempdb..#MyTableName') IS NOT NULL
DROP TABLE #MyTableName
CREATE TABLE #MyTableName
(
PK INT PRIMARY KEY IDENTITY (1,1) NOT NULL,
ID INT,
StatusDate DATETIME
)
INSERT INTO #MyTableName (ID,StatusDate )
SELECT
ID,StatusDate
FROM dbo.basetable p
WHERE p.StatusDate > GETUTCDATE() --narrow your date criteria as much as needed
GO
SELECT P.* FROM #MyTableName T
JOIN dbo.basetable P
ON P.Id = T.ID
WHERE Datediff(MI, dateadd(mi, 330, T.StatusDate), DATEADD(mi, 330, GETUTCDATE())) <= 5
OPTION (RECOMPILE)
;
If you can, create a nonclustered index on your date column and see what it brings. Written the way you have it, the query will still scan, but at least it has an index to scan. Written the sargable way, that index will also help a bunch.
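A sketch of such an index; the INCLUDE list is an assumption based on the columns this query touches:
CREATE NONCLUSTERED INDEX ix_hrrss_StatusDate
ON hc_req_resume_stage_status (StatusDate)
INCLUDE (ReqResID, StageID, StatusID); --assumed covering columns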

Optimizing SQL Stored Procedure Query for better performance

Is there a way to optimize the query below? It takes quite a while to retrieve the massive number of records from the tables T_School_Class and T_School. I have created indexes on Name, as well as on SchoolCode for T_School. In addition, a temp table was also created.
SELECT Distinct (S.SchoolCode) As Code, Name from T_STU_School AS S
LEFT JOIN T_STU_School_Class AS SC ON S.SchoolCode = SC.SchoolCode
WHERE S.SchoolCode IN
(SELECT SchoolCode FROM #MainLevelCodeTemp)
AND [Status] = 'A'
AND Name LIKE @Keyword
AND (@AcademicCode = '' OR SC.AcademicLevel IN (@AcademicCode))
Order BY Name ASC;
All the imperatives in the sproc are a waste; you're just forcing SQL to scan T_STU_School multiple times. All that logic should just be added to the WHERE clause:
SELECT Distinct (S.SchoolCode) As Code, Name from T_STU_School AS S
LEFT JOIN T_STU_School_Class AS SC ON S.SchoolCode = SC.SchoolCode
WHERE ((@MainLevelCode LIKE '%J%' AND S.MixLevelType IN ('T1','T2','T6'))
OR (@MainLevelCode LIKE '%S%' AND S.MixLevelType IN ('T1','T2','T5','T6'))
OR (@MainLevelCode LIKE '%P%' AND S.MixLevelType IN ('T1','T2','T6'))
OR (MainLevelCode IN (SELECT Item FROM [dbo].[SplitString](@MainLevelCode, ',')))
OR @MainLevelCode = '')
AND [Status] = 'A'
AND (@Keyword = '' OR Name LIKE @Keyword)
AND (@AcademicCode = '' OR SC.AcademicLevel IN (@AcademicCode))
Order BY Name ASC;
The reason both tables are still being scanned, per your execution plan, even though you've created indexes on Name and SchoolCode, is that there are no criteria on SchoolCode that would reduce the result set to less than the whole table, and likewise with Name whenever it is blank or starts with a "%". To prevent the full table scans, you should create indexes on:
T_STU_School (Status, Name)
T_STU_School_Class (MixLevelType, SchoolCode)
T_STU_School_Class (MainLevelCode, SchoolCode)
Also, any time you have stuff like (y = '' OR x = y) in the WHERE clause, it's a good idea to add an OPTION (RECOMPILE) at the bottom to avoid the eventual bad plan cache nightmare.
Also, this line is probably a bug:
AND (@AcademicCode = '' OR SC.AcademicLevel IN (@AcademicCode))
IN won't parse @AcademicCode, so this statement is equivalent to SC.AcademicLevel = @AcademicCode.
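If @AcademicCode is meant to hold a comma-separated list, the [dbo].[SplitString] helper already used for @MainLevelCode above could split it too; a sketch:
AND (@AcademicCode = ''
OR SC.AcademicLevel IN (SELECT Item FROM [dbo].[SplitString](@AcademicCode, ',')))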
You definitely need an index on T_STU_SCHOOL.SchoolCode. Your query plan shows that 65% of the query time is taken by the index scan that results from the join. An index on the SchoolCode column should turn that into an index seek, which will be much faster.
The Name index is not currently being used, probably because you're passing in values for @Keyword that start with a wildcard. Given that Name is on the T_STU_School table, which has a small number of rows, you can maybe afford a table scan there in order to use wildcards the way you want to. So you should be able to drop the Name index.

"Subquery has multiple rows for like comparison

I have one table, given below.
In the following query, the outer query filters the tag column with a LIKE comparison against the subquery.
SELECT top 6 *
FROM [piarchive].[picomp2]
WHERE tag Like
(
Select distinct left(tag, 19) + '%'
from (SELECT *
FROM [piarchive].[picomp2]
WHERE tag like '%CPU_Active' and time between '2014/10/02 15:13:08' and '2014/10/02 15:18:37'
and value = -524289 order by time desc) as t1
)
and tag not like '%CPU_Active' and tag not like '%Program%'
and time between '2014/10/02 15:13:08' and '2014/10/02 15:18:37'
order by time desc
But this subquery returns multiple rows, causing the following error:
Error : "When used as an expression, subquery can return at most one row."
Replace the where tag like (...) (where ... is the subquery, omitted here for brevity) part with where exists (...), and bring the like comparison into the subquery.
select top 6
*
from
[piarchive].[picomp2] t0
where
exists
(
select
*
from
(
select
*
from
[piarchive].[picomp2]
where
tag like '%cpu_active' and time between '2014/10/02 15:13:08' and '2014/10/02 15:18:37'
and
value = -524289
)
as t1
where
t0.tag like left(t1.tag, 19) + '%'
)
and
tag not like '%cpu_active'
and
tag not like '%program%'
and
time between '2014/10/02 15:13:08' and '2014/10/02 15:18:37'
order by
time desc;
I've added a table alias to the outer query to disambiguate the tag columns, but you can see the like comparison is shifted to within the subquery.
I can't vouch for how this will perform on large data sets, but that's a different topic. Personally, I would be looking for a way to get rid of the subquery altogether, since it's all querying the same table.
More on optimisation
It's not going to be easy to optimise, and indexes will be of little use here, for the following reasons:
The join criterion (t0.tag like left(t1.tag, 19) + '%') is not simple, and the query optimiser may have a hard time producing anything better than nested loops (i.e., executing the subquery for every row of the outer query). This is probably your biggest performance killer right here.
None of the like comparisons can utilise table indexes, because they are checking the end of the value, not the start.
Your only hope might be if the date-range check is highly selective (eliminates a lot of records). Since the same check on the time field is performed in both outer and inner queries, you could select that into a temp table:
select left(tag, 19) as [key], *
into #working
from [piarchive].[picomp2]
where [time] between '2014/10/02 15:13:08' and '2014/10/02 15:18:37';
#working now has only the records in the specified time period. Since your example range is quite narrow (only 5 1/2 minutes), I'd wager this might knock out ~99% of records. An index on time will speed this up significantly. After you do this, you're only dealing with a tiny fraction of the data.
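A sketch of that index, assuming [piarchive].[picomp2] is an ordinary indexable table in your environment:
create index ix_picomp2_time on [piarchive].[picomp2] ([time]);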
Then, possibly (see later), index the key column:
create clustered index cx_key on #working ([key]);
Then complete the rest of the query as:
select a.*
from #working a
where exists
(
select *
from #working b
where a.[key] = b.[key] and b.tag like '%cpu_active'
)
and
a.tag not like '%program%'
and
a.tag not like '%cpu_active'
What I've done is create a clustered index on the joining criterion (the first 19 chars of tag) to optimise the subquery. You'll have to test this out, as it may make no difference, or even slow things down, if the gains are outweighed by the cost of creating the index in the first place. This will depend on how much data you have, and other factors. I only got minimal gains by doing this (about a 5% speed increase), though I'm only running this against a few hundred rows of test data I knocked up. The more data you have, the more effective it should be.

Query returning nothing

My query is as follows:
Select h.ord_no
from sales_history_header h
INNER JOIN sales_history_detail d
ON d.NUMBER = h.NUMBER
WHERE d.COMMENTS LIKE '%3838CS%'
And I get no results. But I should get results, because:
I ran the query:
Select NUMBER, Comments from SALES_HISTORY_DETAIL WHERE NUMBER LIKE '%0000125199%'
and the results show a comment field with 3838CS contained in it.
And ran this query:
Select NUMBER, Ord_No from "SALES_HISTORY_HEADER" WHERE NUMBER = '0000125199'
and the results show that the Ord_No exists.
How come my original query returns no results? Do I have the syntax wrong?
Your query is returning nothing because the execution engine is using an index that is incorrectly referenced by this specific application (Sage BusinessVision); you have to work around the issue.
Explanation:
The issue you are having is related to the way BusinessVision created the index of the table SALES_HISTORY_DETAIL. The PK (index key0) for this table is on both columns NUMBER and RECNO.
Details on Pervasive indexes for BusinessVision
Here is the explanation of the way that index works with BV:
If you run a query that is capable of using an index, you will get better performance. Unfortunately, the way Pervasive computes this index for NUMBER, it does not work on its own.
--wrong way for this table
Select * from SALES_HISTORY_DETAIL WHERE NUMBER = '0000125199'
--return no result
Because of the way Pervasive handles the index, you get no results. The workaround is that you have to query on all the fields of the PK for it to work. In this case, RECNO represents a record number from 1 to 999, so we can specify all records with RECNO > 0.
--right way to use index key0
Select * from SALES_HISTORY_DETAIL WHERE NUMBER = '0000125199' and RECNO > 0
This will give you the result you expected for that table, and it will use the index, with the performance gain.
Note that you will get the same behavior in the table SALES_ORDER_DETAIL.
Back to your question.
The query you ran to see the details executed a table scan instead of using the index.
--the way you used in your question
Select * from SALES_HISTORY_DETAIL WHERE NUMBER LIKE '%0000125199%'
In that case it works, not because of the LIKE keyword, but because of the leading '%'; remove it and that query won't return anything, since the engine will optimise by using the weird index.
In your original query, because you are referencing d.NUMBER = h.NUMBER, Pervasive uses the index and you don't get any results. To fix that query, simply add and RECNO > 0:
Select h.ord_no
from sales_history_header h
INNER JOIN sales_history_detail d
ON d.NUMBER = h.NUMBER and RECNO > 0
WHERE d.COMMENTS LIKE '%3838CS%'
I think this is because you have different data types for NUMBER in the two tables.
There are no issues with your query; it looks like a data issue. The NUMBER stored in SALES_HISTORY_DETAIL might contain spaces. It's hard to tell from the screenshot whether there are spaces in the value.
Run the following query to see if the NUMBER value in your SALES_HISTORY_DETAIL table is stored correctly:
Select NUMBER, Comments from SALES_HISTORY_DETAIL WHERE NUMBER = '0000125199'
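If stray spaces are the suspicion, a quick check comparing trimmed and untrimmed forms can confirm it; this is a hypothetical T-SQL sketch (function availability may differ on other engines such as Pervasive):
SELECT NUMBER, Comments,
LEN(NUMBER) AS len_no_trailing_spaces, --LEN ignores trailing spaces
DATALENGTH(NUMBER) AS bytes_stored     --DATALENGTH does not
FROM SALES_HISTORY_DETAIL
WHERE LTRIM(RTRIM(NUMBER)) = '0000125199';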
Is the COMMENTS column text? Did you try:
Select h.ord_no
from sales_history_header h
INNER JOIN sales_history_detail d ON d.NUMBER = h.NUMBER
WHERE cast(d.COMMENTS as varchar(max)) LIKE '%3838CS%'

Can this SQL Query be optimized to run faster?

I have an SQL Query (For SQL Server 2008 R2) that takes a very long time to complete. I was wondering if there was a better way of doing it?
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90Million rows in it and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times; the total time it takes the procedure to finish is 27 minutes. I believe this is my biggest bottleneck because of the massive number of rows it queries against and the number of times it does it.
So if you know of a good way to optimize this, it would be greatly appreciated! If it cannot be optimized, then I guess I'm stuck with 27 min...
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt the GROUP BY approach as suggested below, but that will require a complete rewrite of the stored procedure and might take some time... (As I said before, I'm not the best at SQL, but it is starting to grow on me.)
In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether the code is in the excluded set or not? It requires a lot of maintenance, but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example, you could add a BIT column:
ALTER TABLE dbo.Table1 ADD IsExcluded BIT;
Make it NOT NULL and default it to 0. Then you can create a filtered index:
CREATE INDEX n ON dbo.Table1(name)
WHERE IsExcluded = 0;
Now you just have to update the table once:
UPDATE t
SET IsExcluded = 1
FROM dbo.Table1 AS t
INNER JOIN dbo.ExcludedCodes AS x
ON t.Code = x.Code;
And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
SELECT @Count = COUNT(Name)
FROM dbo.Table1 WHERE IsExcluded = 0;
EDIT
As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:
EDIT 2
I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
GROUP BY src.Name;
Or the LEFT JOIN equivalent:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
LEFT OUTER JOIN dbo.ExcludedCodes AS x
ON src.Code = x.Code
WHERE x.Code IS NULL
GROUP BY src.Name;
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure or whether you'd violate any of the restrictions, but it is worth investigating IMHO.
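A minimal sketch of what such an indexed view might look like, assuming the IsExcluded flag from earlier (indexed views require SCHEMABINDING and COUNT_BIG, and the unique clustered index is what materializes the view):
CREATE VIEW dbo.vw_Table1NameCounts
WITH SCHEMABINDING
AS
SELECT Name, COUNT_BIG(*) AS NameCount
FROM dbo.Table1
WHERE IsExcluded = 0
GROUP BY Name;
GO
CREATE UNIQUE CLUSTERED INDEX cx_vw_Table1NameCounts
ON dbo.vw_Table1NameCounts (Name);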
You say this gets called around 40K times. Why? Is it in a cursor? If so, do you really need a cursor? Couldn't you put the values you want for @name in a temp table, index it, and then join to it?
select t.name, count(t.name)
from table t
join #name n on t.name = n.name
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
That might get you all your results in one query, and it is almost certainly faster than 40K separate queries. Of course, if you need the count of all the names, it's even simpler:
select t.name, count(t.name)
from table t
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
NOT EXISTS typically performs better than NOT IN, but you should test it on your system.
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code)
Without knowing more about your query, it's tough to supply concrete optimization suggestions (i.e., code suitable for copy/paste). Does it really need to run 40,000 times? It sounds like your stored procedure needs reworking, if that's feasible. You could exec the above once at the start of the proc, insert the results into a temp table (which can keep the indexes from Table1), and then join on that instead of running this query.
This particular bit might not even be the bottleneck that makes your query run 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar valued UDFs in your WHERE clauses?
Have you thought about doing the query once and populating the data in a table variable or temp table? Something like:
insert into #temp (name, NameCount)
select name, count(name)
from table1
where name not in (select code from excludedcodes)
group by name
And don't forget that you could possibly use a filtered index, as long as the ExcludedCodes table is somewhat static.
Start evaluating the execution plan. Which is the heaviest part to compute?
Regarding the relation between the two tables, use a JOIN on indexed columns: indexes will optimize query execution.