Why is inserting into and joining #temp tables faster? - sql

I have a query that looks like
SELECT
P.Column1,
P.Column2,
P.Column3,
...
(
SELECT
A.ColumnX,
A.ColumnY,
...
FROM
dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
WHERE
A.Key = P.Key
FOR XML AUTO, TYPE
),
(
SELECT
B.ColumnX,
B.ColumnY,
...
FROM
dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
WHERE
B.Key = P.Key
FOR XML AUTO, TYPE
)
FROM
(
<joined tables here>
) AS P
FOR XML AUTO,ROOT('ROOT')
P has ~ 5000 rows
A and B ~ 4000 rows each
This query has a runtime performance of ~10+ minutes.
Changing it to this however:
SELECT
P.Column1,
P.Column2,
P.Column3,
...
INTO #P
FROM
(
<joined tables here>
) AS P
SELECT
A.ColumnX,
A.ColumnY,
...
INTO #A
FROM
dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
SELECT
B.ColumnX,
B.ColumnY,
...
INTO #B
FROM
dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
SELECT
P.Column1,
P.Column2,
P.Column3,
...
(
SELECT
A.ColumnX,
A.ColumnY,
...
FROM
#A AS A
WHERE
A.Key = P.Key
FOR XML AUTO, TYPE
),
(
SELECT
B.ColumnX,
B.ColumnY,
...
FROM
#B AS B
WHERE
B.Key = P.Key
FOR XML AUTO, TYPE
)
FROM #P AS P
FOR XML AUTO,ROOT('ROOT')
Has a performance of ~4 seconds.
This doesn't make a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default. My inclination is that SQL Server is doing the wrong type of "join" with the subquery, but unless I've missed it, there's no way to specify the join type to use with correlated subqueries.
Is there a way to achieve this without using #temp tables/@table variables, via indexes and/or hints?
EDIT: Note that dbo.TableReturningFunc1 and dbo.TableReturningFunc2 are inline TVFs, not multi-statement; i.e., they are essentially "parameterized" view statements.

Your procedures are being reevaluated for each row in P.
What you do with the temp tables is in fact caching the resultset generated by the stored procedures, thus removing the need to reevaluate.
Inserting into a temp table is fast because it does not generate redo / rollback.
Joins are also fast, since having a stable resultset allows the possibility of creating a temporary index with an Eager Spool or a Worktable.
You can reuse the procedures without temp tables, using CTEs, but for this to be efficient, SQL Server needs to materialize the results of the CTE.
You may try to force it to do this by using an ORDER BY inside a subquery:
WITH f1 AS
(
SELECT TOP 1000000000
A.ColumnX,
A.ColumnY
FROM dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
ORDER BY
A.key
),
f2 AS
(
SELECT TOP 1000000000
B.ColumnX,
B.ColumnY
FROM dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
ORDER BY
B.Key
)
SELECT …
This may result in an Eager Spool being generated by the optimizer.
However, this is far from guaranteed.
The guaranteed way is to add an OPTION (USE PLAN) hint to your query and wrap the corresponding CTE into the Spool clause.
See this entry in my blog on how to do that:
Generating XML in subqueries
This is hard to maintain, since you will need to rewrite your plan each time you rewrite the query, but this works well and is quite efficient.
Using the temp tables will be much easier, though.

This answer needs to be read together with Quassnoi's article
http://explainextended.com/2009/05/28/generating-xml-in-subqueries/
With judicious application of CROSS APPLY, you can force the caching or shortcut evaluation of inline TVFs. This query returns instantaneously.
SELECT *
FROM (
SELECT (
SELECT f.num
FOR XML PATH('fo'), ELEMENTS ABSENT
) AS x
FROM [20090528_tvf].t_integer i
cross apply (
select num
from [20090528_tvf].fn_num(9990) f
where f.num = i.num
) f
) q
--WHERE x IS NOT NULL -- covered by using CROSS apply
FOR XML AUTO
You haven't provided real structures so it's hard to construct something meaningful, but the technique should apply as well.
If you change the multi-statement TVF in Quassnoi's article to an inline TVF, the plan becomes even faster (at least one order of magnitude) and the plan magically reduces to something I cannot understand (it's too simple!).
CREATE FUNCTION [20090528_tvf].fn_num(@maxval INT)
RETURNS TABLE
AS RETURN
SELECT num + @maxval num
FROM t_integer
Statistics
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(10 row(s) affected)
Table 't_integer'. Scan count 2, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.

The problem is that your sub-query references your outer query, meaning the sub-query has to be compiled and executed for each row in the outer query.
Rather than using explicit temp tables, you can use a derived table.
To simplify your example:
SELECT P.Column1,
(SELECT [your XML transformation etc] FROM A where A.ID = P.ID) AS A
If P contains 10,000 records then SELECT A.ColumnX FROM A where A.ID = P.ID will be executed 10,000 times.
You can instead use a derived table as thus:
SELECT P.Column1, A2.Column FROM
P LEFT JOIN
(SELECT A.ID, [your XML transformation etc] FROM A) AS A2
ON P.ID = A2.ID
Okay, that's not the most illustrative pseudo-code, but the basic idea is the same as the temp table, except that SQL Server does the whole thing in memory: it first selects all the data in "A2" and constructs a temp table in memory, then joins on it. This saves you having to select it into a temp table yourself.
To give an example of the principle in another context where it may make more immediate sense, consider employee and absence information where you want to show the number of days of absence recorded for each employee.
Bad: (runs as many queries as there are employees in the DB)
SELECT EmpName,
(SELECT SUM(absdays) FROM Absence where Absence.PerID = Employee.PerID) AS Abstotal
FROM Employee
Good: (Runs only two queries)
SELECT EmpName, AbsSummary.Abstotal
FROM Employee LEFT JOIN
(SELECT PerID, SUM(absdays) As Abstotal
FROM Absence GROUP BY PerID) AS AbsSummary
ON AbsSummary.PerID = Employee.PerID

There are several possible reasons why using intermediate temp tables might speed up a query, but the most likely in your case is that the functions being called (but not listed) are probably multi-statement TVFs and not in-line TVFs. Multi-statement TVFs are opaque to the optimization of their calling queries, so the optimizer cannot tell whether there are any opportunities for re-use of data, or other logical/physical operator re-ordering optimizations. Thus, all it can do is re-execute the TVFs every time the containing query is supposed to produce another row with the XML columns.
In short, multi-statement TVF's frustrate the optimizer.
The usual solutions, in order of (typical) preference are:
Re-write the offending multi-statement TVF to be an in-line TVF
In-line the function code into the calling query, or
Dump the offending TVF's data into a temp table, which is what you've done...
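For illustration, here is a hedged sketch of the difference named in option 1, with hypothetical table and column names (not taken from the question): the same logic written first as a multi-statement TVF and then as its in-line rewrite.
-- Multi-statement TVF: the body is opaque to the calling query's optimizer.
CREATE FUNCTION dbo.GetOrders_Multi (@CustomerId INT)
RETURNS @result TABLE (OrderId INT, OrderDate DATETIME)
AS
BEGIN
    INSERT INTO @result (OrderId, OrderDate)
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
    RETURN;
END
GO
-- In-line TVF: a single RETURN (SELECT ...) that the optimizer can expand
-- into the calling query, much like a parameterized view.
CREATE FUNCTION dbo.GetOrders_Inline (@CustomerId INT)
RETURNS TABLE
AS RETURN
(
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
);
GO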

Consider using the WITH common_table_expression construct for what you now have as sub-selects or temporary tables, see http://msdn.microsoft.com/en-us/library/ms175972(SQL.90).aspx .
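As a minimal sketch of that suggestion, reusing the names and the "<joined tables here>" placeholder from the question (and keeping in mind, per the accepted answer above, that SQL Server may still re-evaluate a CTE per outer row unless it materializes the result):
WITH A AS
(
    SELECT ColumnX, ColumnY, [Key]
    FROM dbo.TableReturningFunc1(@StaticParam1, @StaticParam2)
),
B AS
(
    SELECT ColumnX, ColumnY, [Key]
    FROM dbo.TableReturningFunc2(@StaticParam1, @StaticParam2)
)
SELECT P.Column1, P.Column2, A.ColumnX, B.ColumnX
FROM <joined tables here> AS P
LEFT JOIN A ON A.[Key] = P.[Key]
LEFT JOIN B ON B.[Key] = P.[Key];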

This doesn't make a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default.
With temporary tables, you explicitly instruct SQL Server which intermediate storage to use. But if you stash everything in a big query, SQL Server will decide for itself. The difference is not really that big; at the end of the day, temporary storage is used whether you specify it as a temp table or not.
In your case, temporary tables work faster, so why not stick to them?

Agreed, the temp table is a good concept. However, when the row count in a table is large, say 40 million rows, and I want to update multiple columns by joining to another table, I would always prefer a common table expression that computes the updated columns in its select using a CASE statement, so that the CTE's result set already contains the updated rows. Inserting 40 million records into a temp table with a select using a CASE statement took 21 minutes for me, and creating an index took another 10 minutes, so my insert and index creation time was 30 minutes. Then I still had to apply the update by joining the temp table's result set to the main table, which took 5 minutes to update 10 million of the 40 million records, so my overall update time for those 10 million records was almost 35 minutes, versus 5 minutes with the common table expression. My choice in that case is the common table expression.
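A minimal sketch of that approach, with hypothetical table and column names, updating through a CTE that computes the new values with CASE rather than staging them in a temp table first:
-- Hypothetical names; the CTE computes the new value per row, and the UPDATE
-- writes through the CTE back to the single base table it selects from.
WITH ToUpdate AS
(
    SELECT t.Amount,
           NewAmount = CASE WHEN o.Adjustment IS NOT NULL
                            THEN t.Amount + o.Adjustment
                            ELSE t.Amount END
    FROM dbo.BigTable AS t
    JOIN dbo.OtherTable AS o ON o.Id = t.Id
)
UPDATE ToUpdate
SET Amount = NewAmount
WHERE Amount <> NewAmount;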

If temp tables turn out to be faster in your particular instance, you should instead use a table variable.
There is a good article here on the differences and performance implications:
http://www.codeproject.com/KB/database/SQP_performance.aspx
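As a rough sketch of what that would look like for one of the cached result sets from the original question (column types are assumed, and @StaticParam1/@StaticParam2 are taken from the surrounding batch or procedure):
-- Assumed column types; replace with the real ones.
DECLARE @A TABLE ([Key] INT PRIMARY KEY, ColumnX NVARCHAR(100), ColumnY NVARCHAR(100));

INSERT INTO @A ([Key], ColumnX, ColumnY)
SELECT A.[Key], A.ColumnX, A.ColumnY
FROM dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A;
-- The correlated FOR XML subquery would then read from @A instead of #A.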

Related

SQL best practice/performance when inserting into a table. To use a temp table or not

I have a select query that came about from trying to remove while loops from an existing query that was far too slow. As it stands, I first select into a temp table.
Then from that temp table I insert into the final table using the values from the temp table.
Below is a simplified example of the flow of my query:
select
b.BookId,
b.BookDescription,
a.Name,
a.BirthDate,
a.CountryOfOrigin
into #tempTable
from library.Book b
left join authors.Authors a
on a.AuthorId = b.AuthorId
insert into bookStore.BookStore
([BookStoreEntryId],
[BookId],
[BookDescription],
[Author],
[AuthorBirthdate],
[AuthorCountryOfOrigin])
select
NEWID(),
t.BookId,
t.BookDescription,
t.Name,
t.Birthdate,
t.CountryOfOrigin
from #tempTable t
drop table #tempTable
Would it be better to move the select statement at the start down below, so that it's incorporated into the insert statement, removing the need for the temp table?
There is no advantage at all to having a temporary table in this case. Just use the select query directly.
Sometimes, temporary tables can improve performance. One reason is that a real table has real statistics (notably the number of rows). The optimizer can use that information for better execution plans.
Temporary tables can also improve performance if they explicitly have an index on them.
However, they incur overhead of writing the table.
In this case, you just get all the overhead and there should be no benefit.
Actually, I could imagine one benefit under one circumstance. If the query took a long time to run -- say because the join required a nested loops join with no indexes -- then the destination table would be saved from locking and contention until all the rows are available for insert. That would be an unusual case, though.
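If the temp table were kept, the indexing point mentioned above might look like this (the column choice is purely illustrative; it should be whatever column later statements join or filter on):
-- Index the temp table on the column the later statement joins or filters on.
CREATE CLUSTERED INDEX IX_tempTable_BookId ON #tempTable (BookId);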
Do it in one step:
insert into bookStore.BookStore
( /* [BookStoreEntryId] <-- assuming this is auto id*/
[BookId],
[BookDescription],
[Author],
[AuthorBirthdate],
[AuthorCountryOfOrigin])
SELECT distinct
b.BookId,
b.BookDescription,
a.Name,
a.BirthDate,
a.CountryOfOrigin
from library.Book b
left join authors.Authors a
on a.AuthorId = b.AuthorId
Your performance will depend on the number of indexes on the target table: more indexes means a slower insert. It may be worth disabling them during the insert and then rebuilding them after the insert is completed.
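That disable/rebuild pattern might look like this (the nonclustered index name is hypothetical; the clustered index should not be disabled, since that makes the table unreadable):
-- Disable a nonclustered index before the bulk insert...
ALTER INDEX IX_BookStore_Author ON bookStore.BookStore DISABLE;
-- ...run the INSERT ... SELECT from above...
-- ...then rebuild it (or rebuild everything) afterwards.
ALTER INDEX IX_BookStore_Author ON bookStore.BookStore REBUILD;
-- or: ALTER INDEX ALL ON bookStore.BookStore REBUILD;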

Performance impact of chained CTE vs Temp table

I have the following Chained CTE query (simplified):
;WITH CTE1
AS(
SELECT * FROM TableA
),
CTE2
AS(
SELECT * FROM TableB b INNER JOIN CTE1 c ON b.id = c.id
)
SELECT * FROM CTE2
If I break CTE chain and store data of CTE1 into a temp table then the performance of the overall query improves (from 1 minute 20 seconds to 8 seconds).
;WITH CTE1
AS(
SELECT * FROM TableA
)
SELECT * INTO #Temp FROM CTE1
;WITH CTE2
AS(
SELECT * FROM TableB b INNER JOIN #Temp c ON b.id = c.id
)
SELECT * FROM CTE2
DROP TABLE #Temp
There are complex queries in CTE1 and CTE2. I have just created a simplified version to explain here.
Should breaking the CTE chain improve the performance?
SQL Server version: 2008 R2
Obviously, it can, as you yourself have shown.
Why? The most obvious reason is that the optimizer knows the size of a temporary table. That gives it more information for optimizing the query. The CTE is just an estimate. So, the improvement you are seeing is due to the query plan.
Another reason would be if the CTE is referenced multiple times in the query. SQL Server does not materialize CTEs, so the definition code would be run multiple times.
Sometimes, you purposely materialize CTEs as temporary tables so you can add indexes to them. That can also improve performance.
All that said, I prefer to avoid temporary tables. The optimizer is usually pretty good.
Consider the case where CTE1 is expensive:
;WITH CTE1
AS(
SELECT * FROM TableA
)
SELECT * INTO #Temp FROM CTE1
The above guarantees that CTE1 is only run once.
The chained CTE, in contrast, can evaluate CTE1 multiple times.
And even with #temp you should consider an index / PK and sorting the insert, as in the sketch below.
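A sketch of that, based on the simplified query above:
-- Materialize CTE1 once into a temp table with a clustered PK on the join column,
-- inserting in PK order so the clustered index is not fragmented.
CREATE TABLE #Temp (id INT NOT NULL PRIMARY KEY CLUSTERED /*, other columns from TableA */);

INSERT INTO #Temp (id /*, other columns */)
SELECT id /*, other columns */
FROM TableA
ORDER BY id;

SELECT *
FROM TableB b
INNER JOIN #Temp c ON b.id = c.id;

DROP TABLE #Temp;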
This depends upon many factors. Always try to write the single statement, if you can. Premature optimization is the root of a lot of evil.
If you do experience a performance problem, these are some of the advantages to decomposing your single statement:
It can increase maintainability, which is one of many non-functional requirements, by reducing complexity.
It can yield a better plan, so long as the cost of the intermediate materialization and the time saved is less than the original cost.
The intermediate tables can be indexed.
Indexes, primary keys, and unique constraints are very helpful to the optimizer, not only for choosing join types, but also for estimating cardinality, which has a large effect on memory grants.
You can choose to apply optimizer hints, such as MAXDOP, to individual statements rather than to one gigantic statement (see the sketch after this list). This is especially helpful when you need to manipulate memory grants.
You can tune individual statements to eliminate spill to tempdb.
Depending upon the complexity and total execution time of your process, you can potentially release resource locks earlier, depending also upon which isolation level your statements run under.
If your query plan is poor, due to an optimizer time-out, using less complex individual statements may yield better overall results.
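For example, a hint can be scoped to a single decomposed statement (the table and column names here are hypothetical):
-- Only this staging step runs serially; the rest of the process is unaffected.
SELECT CustomerId, SUM(Amount) AS Total
INTO #CustomerTotals
FROM dbo.Sales
GROUP BY CustomerId
OPTION (MAXDOP 1);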

Performance of nested select

I know this is a common question and I have read several other posts and papers but I could not find one that takes into account indexed fields and the volume of records that both queries could return.
My question is really simple: which of the two below, written in an SQL-like syntax, is recommended in terms of performance?
First query:
Select *
from someTable s
where s.someTable_id in
(Select someTable_id
from otherTable o
where o.indexedField = 123)
Second query:
Select *
from someTable s
where someTable_id in
(Select someTable_id
from otherTable o
where o.someIndexedField = s.someIndexedField
and o.anotherIndexedField = 123)
My understanding is that the second query will query the database for every tuple that the outer query returns, whereas the first query will evaluate the inner select first and then apply the filter to the outer query.
Now the second query may query the database super fast considering that someIndexedField is indexed, but say we have thousands or millions of records: wouldn't it be faster to use the first query?
Note: In an Oracle database.
In MySQL, if nested selects are over the same table, the execution time of the query can be hell.
A good way to improve performance in MySQL is to create a temporary table for the nested select and apply the main select against this table.
For example:
Select *
from someTable s1
where s1.someTable_id in
(Select someTable_id
from someTable s2
where s2.Field = 123);
Can have better performance with:
create temporary table temp_table as (
Select someTable_id
from someTable s2
where s2.Field = 123
);
Select *
from someTable s1
where s1.someTable_id in
(Select someTable_id
from temp_table s2);
I'm not sure about performance for a large amount of data.
About first query:
first query will evaluate the inner select first and then apply the
filter to the outer query.
That's not so simple.
In SQL it is mostly NOT possible to tell what will be executed first and what will be executed later, because SQL is a declarative language.
Your "nested selects" are nested only visually, not technically.
Example 1 - in "someTable" you have 10 rows, in "otherTable" 10,000 rows.
In most cases the database optimizer will read "someTable" first and then check "otherTable" for a match. For that it may or may not use indexes depending on the situation; my feeling is that in this case it will use the "indexedField" index.
Example 2 - in "someTable" you have 10,000 rows, in "otherTable" 10 rows.
In most cases the database optimizer will read all rows from "otherTable" into memory, filter them by 123, and then find a match in someTable's PK (someTable_id) index. As a result, no indexes will be used on "otherTable".
About the second query:
It is completely different from the first, so I don't know how to compare them:
The first query links the two tables by one pair: s.someTable_id = o.someTable_id.
The second query links the two tables by two pairs: s.someTable_id = o.someTable_id AND o.someIndexedField = s.someIndexedField.
The common practice for linking two tables is your first query.
But o.someTable_id should be indexed.
So the common rules are:
all PKs should be indexed (they are indexed by default)
all columns used for filtering (i.e. used in the WHERE part) should be indexed
all columns used to match rows between tables (including IN, JOIN, etc.) are also filtering, so they should be indexed too.
The DB engine will choose the best order of operations itself (or run them in parallel). In most cases you cannot determine this.
Use Oracle's EXPLAIN PLAN (something similar exists for most DBs) to compare execution plans of different queries on real data.
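A possible way to apply those rules to the question's example, in Oracle syntax (the index names are made up):
-- Index the filter column and the matching column on the inner table.
CREATE INDEX ix_other_filter ON otherTable (indexedField);
CREATE INDEX ix_other_match  ON otherTable (someTable_id);

-- Compare the plans of the candidate queries on real data.
EXPLAIN PLAN FOR
SELECT *
FROM someTable s
WHERE s.someTable_id IN (SELECT someTable_id
                         FROM otherTable o
                         WHERE o.indexedField = 123);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);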
When I used directly
where not exists (select VAL_ID from #newVals where VAL_ID = OLDPAR.VAL_ID)
it cost 20 seconds. When I added the temp table it cost 0 seconds. I don't understand why; as a C++ developer I had imagined that internally it is just a loop over the values.
-- Indexed table variable gives me a big speedup
declare @newValID table (VAL_ID int INDEX IX1 CLUSTERED);
insert into @newValID select VAL_ID from #newVals;

insert into #deleteValues
select OLDPAR.VAL_ID
from #oldVal AS OLDPAR
where
not exists (select VAL_ID from @newValID where VAL_ID = OLDPAR.VAL_ID)
or exists (select VAL_ID from #VaIdInternals where VAL_ID = OLDPAR.VAL_ID);

SubQuery vs TempTable before Merge

I have a complex query that I want to use as the Source of a Merge into a table. This will be executed over millions of rows. Currently I am trying to apply constraints to the data by inserting it into a temp table before the merge.
The operations are:
Filter out duplicate data.
Join some tables to pull in additional data
Insert into the temp table.
Here is the query.
-- Get all Orders that aren't in the system
WITH Orders AS
(
SELECT *
FROM [Staging].Orders o
WHERE NOT EXISTS
(
SELECT 1
FROM Maps.VendorBOrders vbo
JOIN OrderFact of
ON of.Id = vbo.OrderFactId
AND InternalOrderId = o.InternalOrderId
AND of.DataSetId = o.DataSetId
AND of.IsDelete = 0
)
)
INSERT INTO #VendorBOrders
(
CustomerId
,OrderId
,OrderTypeId
,TypeCode
,LineNumber
,FromDate
,ThruDate
,LineFromDate
,LineThruDate
,PlaceOfService
,RevenueCode
,BillingProviderId
,Cost
,AdjustmentTypeCode
,PaymentDenialCode
,EffectiveDate
,IDRLoadDate
,RelatedOrderId
,DataSetId
)
SELECT
vc.CustomerId
,OrderId
,OrderTypeId
,TypeCode
,LineNumber
,FromDate
,ThruDate
,LineFromDate
,LineThruDate
,PlaceOfService
,RevenueCode
,bp.Id
,Cost
,AdjustmentTypeCode
,PaymentDenialCode
,EffectiveDate
,IDRLoadDate
,ro.Id
,o.DataSetId
FROM
Orders o
-- Join related orders to match orders sharing same instance
JOIN Maps.VendorBRelatedOrder ro
ON ro.OrderControlNumber = o.OrderControlNumber
AND ro.EquitableCustomerId = o.EquitableCustomerId
AND ro.DataSetId = o.DataSetId
JOIN BillingProvider bp
ON bp.ProviderNPI = o.ProviderNPI
-- Join on customers and fail if the customer doesn't exist
LEFT OUTER JOIN [Maps].VendorBCustomer vc
ON vc.ExtenalCustomerId = o.ExtenalCustomerId
AND vc.VendorId = o.VendorId;
I am wondering if there is anything I can do to optimize it for time. I have tried using the DB Engine Tuner, but this query takes 100x more CPU Time than the other queries I am running. Is there anything else that I can look into or can the query not be improved further?
A CTE is just syntax.
That CTE is evaluated (run) at that join.
First, just run it as a select statement (no insert).
If the select is slow, then:
Move that CTE to a #temp table so it is evaluated once and materialized (see the sketch below)
Put an index (PK if applicable) on the three join columns
If the select is not slow, then it is insert time on #VendorBOrders:
First only create the PK, and sort the insert on the PK so as not to fragment that clustered index
Then, AFTER the insert is complete, build any other necessary indexes
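A hedged sketch of the "materialize then index" suggestion, using the names from the question (the OrderFact alias is renamed here only to avoid the reserved word OF, and the unqualified InternalOrderId is assumed to come from vbo):
-- Evaluate the CTE once into a temp table...
SELECT o.*
INTO #Orders
FROM [Staging].Orders o
WHERE NOT EXISTS
(
    SELECT 1
    FROM Maps.VendorBOrders vbo
    JOIN OrderFact ofct
      ON ofct.Id = vbo.OrderFactId
     AND vbo.InternalOrderId = o.InternalOrderId
     AND ofct.DataSetId = o.DataSetId
     AND ofct.IsDelete = 0
);

-- ...then index the three columns used by the subsequent joins.
CREATE CLUSTERED INDEX IX_Orders_Join
    ON #Orders (OrderControlNumber, EquitableCustomerId, DataSetId);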
Generally when I do speed testing I perform checks on the parts of the SQL to see where the problem lies. Turn on the 'Execution plan' and see where a lot of the time is going. Also, if you want a quick and dirty check, highlight your CTE and run just that. Is that fast? Yes, move on.
I have at times found a single index being off throws off a whole complex logic of joins by merely having the database do one part of something large and then finding that piece.
Another idea: if you have a fast tempdb on the production environment or the like, dump your CTE to a temp table as well, index it, and see if that speeds things up. Sometimes CTEs, table variables, and temp tables lose some performance at joins. I have found that creating an index on a partial object like this can improve performance at times, but you are also putting more load on tempdb to do it, so keep that in mind.

Can this SQL Query be optimized to run faster?

I have an SQL Query (For SQL Server 2008 R2) that takes a very long time to complete. I was wondering if there was a better way of doing it?
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90Million rows in it and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times; the total time it takes the procedure to finish is 27 minutes. I believe this is my biggest bottleneck because of the massive number of rows it queries against and the number of times it does it.
So if you know of a good way to optimize this it would be greatly appreciated! If it cannot be optimized then I guess I'm stuck with 27 min...
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt to do the group by statement as suggested below, but that will require a complete rewrite of the stored procedure and might take some time... (as I said before, I'm not the best at SQL but it is starting to grow on me. ^^)
In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether it is in this set or not? It requires a lot of maintenance but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example you could add a BIT column:
ALTER TABLE dbo.Table1 ADD IsExcluded BIT;
Make it NOT NULL and default to 0. Then you could create a filtered index:
CREATE INDEX n ON dbo.Table1(name)
WHERE IsExcluded = 0;
Now you just have to update the table once:
UPDATE t
SET IsExcluded = 1
FROM dbo.Table1 AS t
INNER JOIN dbo.ExcludedCodes AS x
ON t.Code = x.Code;
And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
SELECT @Count = COUNT(Name)
FROM dbo.Table1 WHERE IsExcluded = 0;
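One of those maintenance triggers might look roughly like this (the trigger name is made up; a corresponding DELETE trigger and triggers on Table1 would also be needed):
CREATE TRIGGER dbo.trExcludedCodes_Insert
ON dbo.ExcludedCodes
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Flag any existing rows whose Code was just added to the exclusion list.
    UPDATE t
       SET IsExcluded = 1
    FROM dbo.Table1 AS t
    INNER JOIN inserted AS i ON t.Code = i.Code;
END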
EDIT
As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:
EDIT 2
I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
GROUP BY src.Name;
Or the LEFT JOIN equivalent:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
LEFT OUTER JOIN dbo.ExcludedCodes AS x
ON src.Code = x.Code
WHERE x.Code IS NULL
GROUP BY src.Name;
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure and whether your violate any of the restrictions but it is worth investigating IMHO.
You say this gets called around 40K times. Why? Is it in a cursor? If so, do you really need a cursor? Couldn't you put the values you want for @name in a temp table, index it, and then join to it?
select t.name, count(t.name)
from table t
join #name n on t.name = n.name
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
That might get you all your results in one query and is almost certainly faster than 40K separate queries. Of course, if you need the count of all the names, it's even simpler:
select t.name, count(t.name)
from table t
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
NOT EXISTS typically performs better than NOT IN, but you should test it on your system.
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code)
Without knowing more about your query it's tough to supply concrete optimization suggestions (i.e. code suitable for copy/paste). Does it really need to run 40,000 times? Sounds like your stored procedure needs reworking, if that's feasible. You could exec the above once at the start of the proc and insert the results in a temp table, which can keep the indexes from Table1, and then join on that instead of running this query.
This particular bit might not even be the bottleneck that makes your query run 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar valued UDFs in your WHERE clauses?
Have you thought about doing the query once and populating the data in a table variable or temp table? Something like
insert into #temp (name, NameCount)
select name, count(name)
from Table1
where name not in (select Code from ExcludedCodes)
group by name
And don't forget that you could possibly use a filtered index as long as the excluded codes table is somewhat static.
Start evaluating the execution plan. Which is the heaviest part to compute?
Regarding the relation between the two tables, use a JOIN on indexed columns: indexes will optimize query execution.