SQL - Optimizing complex paged search queries

I'm currently developing a stored procedure for a complex search in a big database. Because many thousands of entries could be returned, I want to use paging. Although it works, I think it's too slow. I have read a lot of posts and articles about paginating SQL queries and optimizing performance, but most 'optimizations' were only helpful for very basic requests like 'give me items 20-30 from table x'.
Since our world is not that simple and there are more complex queries to run, I would like some help optimizing the following query:
CREATE PROCEDURE [SearchItems]
@SAttr1 BIT = 0,
@SAttr2 BIT = 0,
@SAttr3 BIT = 0,
@SAttr4 BIT = 0,
@Flag1 BIT = 0,
@Flag2 BIT = 0,
@Flag3 BIT = 0,
@Param1 VARCHAR(20),
@Param2 VARCHAR(10),
@SkipCount BIGINT,
@TakeCount BIGINT,
@SearchStrings NVARCHAR(1000)
AS
-- Split the comma-separated keywords into a table variable
DECLARE @SearchStringsT TABLE(
Val NVARCHAR(30)
)
INSERT INTO @SearchStringsT
SELECT * FROM dbo.Split(@SearchStrings,',');
WITH ResultTable AS (
SELECT Table1.*, ROW_NUMBER() OVER(ORDER BY Table1.ID ASC) AS [!ROWNUM!]
FROM Table1
INNER JOIN Table2 ON Table1.ID = Table2.FK1
INNER JOIN Table3 ON Table2.ID = Table3.FK2
INNER JOIN Table4 ON Table3.XX = Table4.FKX
WHERE Table1.X1 = @Param1
AND
(@Flag1 = 0 OR Table1.X2 = 1) AND
(@Flag2 = 0 OR Table2.X4 = @Param2) AND
(@Flag3 = 0 OR EXISTS(SELECT * FROM Table5 WHERE Table5.ID = Table3.X1))
AND
(
(@SAttr1 = 0 OR EXISTS(SELECT * FROM @SearchStringsT WHERE Table1.X1 LIKE Val)) OR
(@SAttr2 = 0 OR EXISTS(SELECT * FROM @SearchStringsT WHERE Table2.X1 LIKE Val)) OR
(@SAttr3 = 0 OR EXISTS(SELECT * FROM @SearchStringsT WHERE Table3.X1 LIKE Val)) OR
(@SAttr4 = 0 OR EXISTS(SELECT * FROM @SearchStringsT WHERE Table4.X1 LIKE Val))
)
)
-- Return only the requested page, in row-number order
SELECT TOP(@TakeCount) * FROM ResultTable
WHERE [!ROWNUM!] BETWEEN (@SkipCount + 1) AND (@SkipCount + @TakeCount)
ORDER BY [!ROWNUM!]
RETURN
The @SAttr parameters are bit parameters that specify whether to search a field or not, the @Flag parameters turn the checking of some boolean expressions on or off, and @SkipCount and @TakeCount are used for paging. @SearchStrings is a comma-separated list of search keywords, already including the wildcards.
I hope someone can help me optimize this, because a single search in a database with 20,000 entries in the main table takes 800 ms, and it keeps increasing with the entry count. The final application needs to deal with over 100,000 entries.
Thank you very much for any help.
Marks
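For readers who want to experiment with the ROW_NUMBER() paging pattern from the question outside SQL Server, here is a minimal sketch against an in-memory SQLite database through Python's standard `sqlite3` module (window functions need SQLite 3.25 or later). The single `table1` and the `search_terms` table are hypothetical stand-ins for the four joined tables and `@SearchStringsT`; only the paging and keyword-matching shape is reproduced, not the procedure itself.

```python
import sqlite3

# Hypothetical stand-in for Table1; the real procedure joins four tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, x1 TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, x1) VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 21)],
)

# Split search keywords (already containing wildcards), like @SearchStringsT.
conn.execute("CREATE TEMP TABLE search_terms (val TEXT)")
conn.executemany(
    "INSERT INTO search_terms (val) VALUES (?)", [("item1%",), ("item2%",)]
)

PAGE_SQL = """
WITH result_table AS (
    SELECT t.id, t.x1,
           ROW_NUMBER() OVER (ORDER BY t.id) AS rownum
    FROM table1 AS t
    WHERE EXISTS (SELECT 1 FROM search_terms AS s WHERE t.x1 LIKE s.val)
)
SELECT id, x1
FROM result_table
WHERE rownum BETWEEN :skip + 1 AND :skip + :take
ORDER BY rownum
"""


def fetch_page(skip, take):
    """Return rows skip+1 .. skip+take of the filtered, ordered result."""
    return conn.execute(PAGE_SQL, {"skip": skip, "take": take}).fetchall()


print(fetch_page(0, 5))
```

Note the explicit `ORDER BY rownum` on the outer query: without it, neither SQLite nor SQL Server guarantees that the page comes back in row-number order.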

While I agree with Tom H. that this may be a case where dynamic SQL is best (and I'm a stored proc kind of girl, so I don't say that very often), it may also be that you don't have good indexing on your tables. Are all the possible search fields indexed? Are all the FKs indexed?
I mean, 20,000 rows is a tiny, tiny table and so is 100,000, so it really seems as if you might not have indexed it yet.
Check your execution plan to see whether indexes are being used.
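A sketch of indexing the search column and the join's foreign key and then reading the plan, assuming SQLite (via Python's `sqlite3`) as a stand-in for SQL Server. Here `EXPLAIN QUERY PLAN` plays the role of SQL Server's execution plan, and the table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, x1 TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY,
                     fk1 INTEGER REFERENCES table1 (id),
                     x4 TEXT);
-- Index the searched column and the foreign key used by the join.
CREATE INDEX ix_table1_x1 ON table1 (x1);
CREATE INDEX ix_table2_fk1 ON table2 (fk1);
""")

QUERY = """
SELECT table1.id, table2.x4
FROM table1
JOIN table2 ON table1.id = table2.fk1
WHERE table1.x1 = ?
"""


def query_plan(sql, params=()):
    """Return the 'detail' column (index 3) of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


for step in query_plan(QUERY, ("abc",)):
    print(step)
```

Each printed step should name the index it searches; a step that reads `SCAN` on a large table is the SQLite counterpart of a table scan in a SQL Server plan.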

Stored procedures are not very good at being super-generic, because one cached plan has to serve every combination of parameters, which prevents SQL Server from always using optimal methods. In a similar situation recently I used (gasp) dynamic SQL. My search stored procedures would build the SQL code to perform the search, using pagination just like you have it (WITH with a ROW_NUMBER(), etc.). The advantage was that if the parameters indicated that one piece of information wasn't being used in the search, the generated code would omit it. In the end it allowed for better query plans.
Make sure that you use sp_executesql properly, passing the search values as parameters instead of concatenating them into the SQL string, to prevent SQL injection attacks.
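The dynamic-SQL idea can be sketched outside SQL Server as well. This hypothetical Python/SQLite version builds the WHERE clause from only the filters that are switched on, mirroring the @Flag/@SAttr switches, and binds every value as a parameter, which is the role the parameter list of `sp_executesql` plays in T-SQL. The schema and the `build_search` helper are illustrative assumptions, not the poster's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, x1 TEXT, x2 INTEGER);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, fk1 INTEGER, x4 TEXT);
INSERT INTO table1 VALUES (1, 'apple', 1), (2, 'banana', 0), (3, 'apricot', 1);
INSERT INTO table2 VALUES (10, 1, 'A'), (20, 2, 'B'), (30, 3, 'B');
""")


def build_search(flag1=False, param2=None, search_terms=()):
    """Return (sql, params) containing only the predicates actually in use.

    Values are always bound as parameters, never concatenated into the SQL
    text; that is the part that protects against injection.
    """
    sql = ("SELECT t1.id FROM table1 AS t1 "
           "JOIN table2 AS t2 ON t1.id = t2.fk1 WHERE 1 = 1")
    params = []
    if flag1:
        sql += " AND t1.x2 = 1"
    if param2 is not None:
        sql += " AND t2.x4 = ?"
        params.append(param2)
    if search_terms:
        # One LIKE per keyword, OR-ed together.
        sql += " AND (" + " OR ".join("t1.x1 LIKE ?" for _ in search_terms) + ")"
        params.extend(search_terms)
    return sql + " ORDER BY t1.id", params


def search(**filters):
    sql, params = build_search(**filters)
    return [row[0] for row in conn.execute(sql, params)]


print(search(flag1=True, param2="B", search_terms=["ap%"]))
```

Because each combination of filters produces different SQL text, each combination gets its own plan, which is the benefit described above; the trade-off is more distinct plans in the cache.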

Related

Recursive SQL function and change tracking don't work together

I have a recursive function that lets me pass in any GUID in the hierarchy and pulls back all the values below it. This is used for folder security.
ALTER FUNCTION dbo.ValidSiteClass
(
@GUID UNIQUEIDENTIFIER
)
RETURNS TABLE
AS
RETURN
(
WITH previous
AS ( SELECT
PK_SiteClass,
FK_Species_SiteClass,
CK_ParentClass,
ClassID,
ClassName,
Description,
SyncKey,
SyncState
FROM
dbo.SiteClass
WHERE
PK_SiteClass = @GUID
UNION ALL
SELECT
Cur.PK_SiteClass,
Cur.FK_Species_SiteClass,
Cur.CK_ParentClass,
Cur.ClassID,
Cur.ClassName,
Cur.Description,
Cur.SyncKey,
Cur.SyncState
FROM
dbo.SiteClass Cur,
previous
WHERE
Cur.CK_ParentClass = previous.PK_SiteClass)
SELECT DISTINCT
previous.PK_SiteClass,
previous.FK_Species_SiteClass,
previous.CK_ParentClass,
previous.ClassID,
previous.ClassName,
previous.Description,
previous.SyncKey,
previous.syncState
FROM
previous
)
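For comparison, here is the same anchor/recursive structure as `dbo.ValidSiteClass`, sketched in SQLite via Python's `sqlite3` as an assumption for the sake of a runnable example. SQLite spells it `WITH RECURSIVE`, while SQL Server infers recursion from the CTE referencing itself. The `site_class` table and its three columns are hypothetical simplifications of `dbo.SiteClass`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_class (pk TEXT PRIMARY KEY, parent TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO site_class (pk, parent, name) VALUES (?, ?, ?)",
    [
        ("root", None, "Root"),
        ("a", "root", "A"),
        ("b", "root", "B"),
        ("a1", "a", "A1"),
        ("other", None, "Other"),  # separate tree, never reached from "root"
    ],
)

SUBTREE_SQL = """
WITH RECURSIVE previous (pk, parent, name) AS (
    -- Anchor member: the requested node itself.
    SELECT pk, parent, name FROM site_class WHERE pk = ?
    UNION ALL
    -- Recursive member: children of any node found so far.
    SELECT cur.pk, cur.parent, cur.name
    FROM site_class AS cur
    JOIN previous ON cur.parent = previous.pk
)
SELECT DISTINCT pk FROM previous ORDER BY pk
"""


def subtree(pk):
    """Return the keys of pk and everything below it in the hierarchy."""
    return [row[0] for row in conn.execute(SUBTREE_SQL, (pk,))]


print(subtree("root"))
```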
I have a stored procedure that later needs to figure out which folders in the user's hierarchy have changed, which I use for change tracking. When I try to join the function with my change tracking, the query never returns. For example, the following never returns any results (it just spins; I stop it after 6 minutes):
DECLARE @ChangeTrackerNumber INT = 13;
DECLARE @SelectedSchema UNIQUEIDENTIFIER = '36EC6589-8297-4A82-86C3-E6AAECCC7D95';
WITH validones AS (SELECT PK_SITECLASS FROM ValidSiteClass(@SelectedSchema))
SELECT SiteClass.PK_SiteClass KeyGuid,
'' KeyString,
dbo.GetChangeOperationEnum(SYS_CHANGE_OPERATION) ChangeOp
FROM dbo.SiteClass
INNER JOIN CHANGETABLE(CHANGES SiteClass, @ChangeTrackerNumber) tracking --tracking
ON tracking.PK_SiteClass = SiteClass.PK_SiteClass
INNER JOIN validones
ON SiteClass.PK_SiteClass = validones.PK_SiteClass
WHERE SyncState IN ( 0, 2, 4 );
The only way I can make this work is with a temp table, such as:
DECLARE @ChangeTrackerNumber INT = 13;
DECLARE @SelectedSchema UNIQUEIDENTIFIER = '36EC6589-8297-4A82-86C3-E6AAECCC7D95';
CREATE TABLE #temptable
(
[PK_SiteClass] UNIQUEIDENTIFIER
);
INSERT INTO #temptable
(
PK_SiteClass
)
SELECT PK_SiteClass
FROM dbo.ValidSiteClass(@SelectedSchema);
SELECT SiteClass.PK_SiteClass KeyGuid,
'' KeyString,
dbo.GetChangeOperationEnum(SYS_CHANGE_OPERATION) ChangeOp
FROM dbo.SiteClass
INNER JOIN CHANGETABLE(CHANGES SiteClass, @ChangeTrackerNumber) tracking --tracking
ON tracking.PK_SiteClass = SiteClass.PK_SiteClass
INNER JOIN #temptable
ON SiteClass.PK_SiteClass = #temptable.PK_SiteClass
WHERE SyncState IN ( 0, 2, 4 );
DROP TABLE #temptable;
In other words, the CTE doesn't work and I need to use the temp table.
First question: isn't the CTE supposed to be the same thing as a temp table (but better)?
Second question: does anyone know why this could be so? I have also tried inner joins and a WHERE ... IN clause. Is there something different about a recursive query that might cause this odd behavior?
Generally, when you have a table-valued function, you'd just include it as if it were a regular table (assuming you have a parameter to pass to it). If you want to pass a series of parameters to it, you'd use OUTER APPLY, but that doesn't seem to be the case here.
I think (maybe) this is more like what you want (notice: no WITH clause):
select
s.PK_SiteClass KeyGuid,
'' KeyString,
dbo.GetChangeOperationEnum(c.SYS_CHANGE_OPERATION) ChangeOp
from
dbo.ValidSiteClass(@SelectedSchema) v
inner join
SiteClass s
on
s.PK_SiteClass = v.PK_SiteClass
inner join
changetable(changes SiteClass, @ChangeTrackerNumber) c
on
c.PK_SiteClass = s.PK_SiteClass
where
s.SyncState in ( 0, 2, 4 )
option (force order)
...which, I'll admit, doesn't look mechanically that different from what you have with the WITH clause. However, you could be running into SQL Server just picking a horrible plan because it has no other clues. Including OPTION (FORCE ORDER) makes SQL Server perform the joins in the order you wrote them...and sometimes this makes an incredible difference.
I wouldn't say this is recommended. In fact, it's a hack...just to see WTF. But, play around with the order...and get SQL Server to show you the actual execution plans to see why it might have come up with something so heinous. An inline table-valued function is visible to SQL Server's query plan engine, and it may decide to not treat the function as an isolated thing the way programmers traditionally think about functions. I suspect this is why it took so long to begin with.
Funnily enough, if your function were a so-called multi-statement table-valued function, SQL Server would definitely not have the same kind of visibility into it when planning this query...and it might run faster. Again, not a recommendation, just something that might hack a better plan.
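SQLite has a rough counterpart to `OPTION (FORCE ORDER)`: its planner never reorders the two sides of a `CROSS JOIN`, so the left table always becomes the outer loop. The sketch below (Python's `sqlite3`, hypothetical tables `site_class` and `valid_ones`) uses `EXPLAIN QUERY PLAN` to show the loop order flipping when the tables are swapped. It illustrates the idea of pinning join order; it is not how SQL Server implements the hint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_class (pk INTEGER PRIMARY KEY, sync_state INTEGER);
CREATE TABLE valid_ones (pk INTEGER);
CREATE INDEX ix_valid_pk ON valid_ones (pk);
""")


def loop_order(sql):
    """Table names in the order EXPLAIN QUERY PLAN lists them (outer loop first)."""
    order = []
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        for name in ("valid_ones", "site_class"):
            if name in row[3] and name not in order:
                order.append(name)
    return order


# SQLite never reorders a CROSS JOIN: the left table is the outer loop.
VALID_FIRST = """
SELECT site_class.pk FROM valid_ones CROSS JOIN site_class
WHERE site_class.pk = valid_ones.pk AND site_class.sync_state IN (0, 2, 4)
"""
SITE_FIRST = """
SELECT site_class.pk FROM site_class CROSS JOIN valid_ones
WHERE site_class.pk = valid_ones.pk AND site_class.sync_state IN (0, 2, 4)
"""

print(loop_order(VALID_FIRST), loop_order(SITE_FIRST))
```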

Performance of In Operator with OR conditional SQL

I have a common clause in most of my procedures, like:
Select * from TABLE A + Joins where <Conditions>
And
(
-- All Broker
('True' = (Select AllBrokers from SiteUser where ID = @SiteUserID))
OR
(
A.BrokerID in
(
Select BrokerID from SiteUserBroker where SiteUserID
= @SiteUserID)
)
)
So basically, if the user has access to all brokers, the filter should not be applied at all; otherwise, the query should be restricted to the user's list of brokers.
I am a bit worried about performance, as this is used in a lot of procedures and the data has started to exceed 100,000 records and will grow soon. Can this be written better?
Any ideas are highly appreciated.
One technique is to build a dynamic T-SQL statement and then execute it. Since this is done in a stored procedure, you are OK, and the idea is simple.
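DECLARE @DynamicTSQLStatement NVARCHAR(MAX);
SET @DynamicTSQLStatement = N'base query'; -- SELECT ... FROM TABLE A + joins WHERE <Conditions>
IF NOT EXISTS (SELECT 1 FROM SiteUser WHERE ID = @SiteUserID AND AllBrokers = 'True')
BEGIN
-- the user may only see their own brokers: append the extra filter
SET @DynamicTSQLStatement = @DynamicTSQLStatement
+ N' AND A.BrokerID IN (SELECT BrokerID FROM SiteUserBroker WHERE SiteUserID = @SiteUserID)';
END;
-- pass @SiteUserID as a real parameter (INT is assumed) instead of concatenating it
EXEC sp_executesql @DynamicTSQLStatement, N'@SiteUserID INT', @SiteUserID = @SiteUserID;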
Or, instead of using a dynamic T-SQL statement, you can have two separate queries: one for users who see all the data and one for users who see only part of it. This can lead to code duplication.
Another way is to turn this OR condition into an INNER JOIN. You should test the performance to be sure you are gaining something from it. The idea is to create a temporary table (it can have a primary key or indexes if needed) and store all visible broker IDs there: if the user sees everything, Select BrokerID from SiteUserBroker, and if the user sees only a few, Select BrokerID from SiteUserBroker where SiteUserID = @SiteUserID. This way you simplify the whole statement, but be sure to test whether performance actually improves.
CREATE TABLE #SiteUserBroker
(
[BrokerID] INT PRIMARY KEY
);
INSERT INTO #SiteUserBroker ([BrokerID])
SELECT DISTINCT BrokerID -- DISTINCT: an all-brokers user would otherwise insert duplicate IDs
FROM SiteUserBroker
where SiteUserID = @SiteUserID
OR ('True' = (Select AllBrokers from SiteUser where ID = @SiteUserID));
Select *
from TABLE A
INNER JOIN #SiteUserBroker B
ON A.BrokerID = B.[BrokerID]
-- other joins
where <Conditions>
Since we are using an INNER JOIN, you can put it at the beginning of the joins. If there are LEFT JOINs after it, this may affect performance positively.
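A sketch of the temporary-table approach above, assuming SQLite through Python's `sqlite3` and hypothetical tables `site_user`, `site_user_broker`, and `trades` (standing in for `SiteUser`, `SiteUserBroker`, and `TABLE A`). It rebuilds the visible-broker list per user and then filters the main query with a plain join instead of the OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_user (id INTEGER PRIMARY KEY, all_brokers INTEGER NOT NULL);
CREATE TABLE site_user_broker (site_user_id INTEGER, broker_id INTEGER);
CREATE TABLE trades (id INTEGER PRIMARY KEY, broker_id INTEGER);
INSERT INTO site_user VALUES (1, 1), (2, 0);
INSERT INTO site_user_broker VALUES (1, 10), (1, 30), (2, 10), (2, 20);
INSERT INTO trades VALUES (100, 10), (101, 20), (102, 30), (103, 10);
""")


def visible_trade_ids(user_id):
    # Rebuild the per-user list of visible brokers; PRIMARY KEY gives it an index.
    conn.execute("DROP TABLE IF EXISTS temp.visible_broker")
    conn.execute("CREATE TEMP TABLE visible_broker (broker_id INTEGER PRIMARY KEY)")
    # DISTINCT matters: an all-brokers user pulls every user's mappings,
    # which would otherwise violate the primary key.
    conn.execute("""
        INSERT INTO visible_broker (broker_id)
        SELECT DISTINCT broker_id
        FROM site_user_broker
        WHERE site_user_id = :uid
           OR (SELECT all_brokers FROM site_user WHERE id = :uid) = 1
    """, {"uid": user_id})
    # The OR has moved out of the main query: it is now a plain INNER JOIN.
    rows = conn.execute("""
        SELECT t.id
        FROM trades AS t
        JOIN visible_broker AS b ON t.broker_id = b.broker_id
        ORDER BY t.id
    """)
    return [row[0] for row in rows]


print(visible_trade_ids(1), visible_trade_ids(2))
```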
Adding to @gotqn's answer, you can use EXISTS instead of IN (note: this is not a complete answer):
AND EXISTS (
Select 1/0 from SiteUserBroker X
where A.BrokerID = X.BrokerID AND
X.SiteUserID = @SiteUserID
)
I have found that EXISTS performs better than IN in some cases. Please verify this for your case.

SQL Server stored procedure takes 1' 18" to run... seems long

Sure could use some optimization help here. I've got a stored procedure that takes approximately 1 minute, 18 seconds to run, and it gets even worse when I run the ASP.NET page that hits it.
Some stats:
tbl_Allocation typically has approximately 55K records
CS_Ready has ~300
Redate_Orders has ~2000
Here is the code:
ALTER PROCEDURE [dbo].[sp_Order_Display]
/*
(
@parameter1 int = 5,
@parameter2 datatype OUTPUT
)
*/
AS
/* SET NOCOUNT ON */
BEGIN
WITH CS_Ready AS
(
SELECT
tbl_Order_Notes.txt_Order_Only As CS_Ready_Order
FROM
tbl_Order_Notes
INNER JOIN
tbl_Order_Notes_by_line ON tbl_Order_Notes.txt_Order_Only = SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1, CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
WHERE
(tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NOT NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'True')
OR (tbl_Order_Notes.bin_Customer_Service_Review = 'True')
AND (tbl_Order_Notes_by_line.dat_Recommended_Date_by_line IS NULL)
AND (tbl_Order_Notes_by_line.bin_Redate_Request_by_line = 'False'
OR tbl_Order_Notes_by_line.bin_Redate_Request_by_line IS NULL)
),
Redate_Orders AS
(
SELECT DISTINCT
SUBSTRING(txt_Order_Key_by_line, 1, CHARINDEX('-', txt_Order_Key_by_line, 0) - 1) AS Redate_Order_Number
FROM
tbl_Order_Notes_by_line
WHERE
(bin_Redate_Request_by_line = 'True')
)
SELECT DISTINCT
tbl_Allocation.*, tbl_Order_Notes.*,
tbl_Order_Notes_by_line.*,
tbl_Max_Promised_Date_1.Max_Promised_Ship,
tbl_Max_Promised_Date_1.Max_Scheduled_Pick,
Redate_Orders.Redate_Order_Number, CS_Ready.CS_Ready_Order,
tbl_Most_Recent_Comments.Abbr_Comment,
MRC_Line.Abbr_Comment as Abbr_Comment_Line
FROM
tbl_Allocation
INNER JOIN
tbl_Max_Promised_Date AS tbl_Max_Promised_Date_1 ON tbl_Allocation.num_Order_Num = tbl_Max_Promised_Date_1.num_Order_Num
LEFT OUTER JOIN
CS_Ready ON tbl_Allocation.num_Order_Num = CS_Ready.CS_Ready_Order
LEFT OUTER JOIN
Redate_Orders ON tbl_Allocation.num_Order_Num = Redate_Orders.Redate_Order_Number
LEFT OUTER JOIN
tbl_Order_Notes ON Hidden_Order_Only = tbl_Order_Notes.txt_Order_Only
LEFT OUTER JOIN
tbl_Order_Notes_by_line ON Hidden_Order_Key = tbl_Order_Notes_by_line.txt_Order_Key_by_line
LEFT OUTER JOIN
tbl_Most_Recent_Comments ON Cast(tbl_Allocation.Hidden_Order_Only as varchar) = tbl_Most_Recent_Comments.Com_ID_Parent_Key
LEFT OUTER JOIN
tbl_Most_Recent_Comments as MRC_Line ON Cast(tbl_Allocation.Hidden_Order_Key as varchar) = MRC_Line.Com_ID_Parent_Key
ORDER BY
num_Order_Num, num_Line_Num
End
RETURN
What suggestions do you have to make this execute within five seconds or less?
Thanks,
Rob
Assuming you have appropriate indices defined, you still have several things that suggest problems.
1) You have 2 SELECT DISTINCT clauses in this query -- in a good design, DISTINCT clauses are rarely needed
2) The first inner join uses
tbl_Order_Notes_by_line
ON tbl_Order_Notes.txt_Order_Only
= SUBSTRING(tbl_Order_Notes_by_line.txt_Order_Key_by_line, 1,
CHARINDEX('-', tbl_Order_Notes_by_line.txt_Order_Key_by_line, 0) - 1)
This looks like a horrible join criterion -- function calls in the join condition prevent any decent query optimization, because an index on txt_Order_Key_by_line can't be used to seek on the result of SUBSTRING/CHARINDEX. My guess is that you are using data that has internal meaning and that you are parsing that internal meaning during the join, e.g.,
PartNumber = AAA-BBBB_NNNNNNNN
where AAA is the Country product line and BBBB is the year & month of the design
If you must have coded fields like these AND you need to manipulate them, put the codes into separate database fields and create a computed column -- or even a plain copy of the full part number field if the combined field is unusually complex.
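A minimal illustration of parsing the order number once and storing it in its own indexed column, so the join becomes a plain equality. It runs on SQLite via Python's `sqlite3`; in SQL Server the same idea could be a persisted computed column with an index on it. Table and column names are hypothetical simplifications of `tbl_Order_Notes` and `tbl_Order_Notes_by_line`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_notes (order_only TEXT PRIMARY KEY, cs_review INTEGER);
CREATE TABLE order_notes_by_line (
    order_key TEXT PRIMARY KEY,   -- e.g. '1001-2' (order number, dash, line)
    order_only TEXT NOT NULL      -- parsed once at write time: '1001'
);
CREATE INDEX ix_by_line_order_only ON order_notes_by_line (order_only);
""")
conn.executemany("INSERT INTO order_notes VALUES (?, ?)", [("1001", 1), ("1002", 0)])
# Parse the prefix on insert, not inside every join.
conn.executemany(
    "INSERT INTO order_notes_by_line (order_key, order_only) "
    "VALUES (:k, substr(:k, 1, instr(:k, '-') - 1))",
    [{"k": key} for key in ("1001-1", "1001-2", "1002-1")],
)

# The join is now a plain equality between two indexable columns.
READY_SQL = """
SELECT l.order_key
FROM order_notes AS n
JOIN order_notes_by_line AS l ON l.order_only = n.order_only
WHERE n.cs_review = 1
ORDER BY l.order_key
"""
ready = [row[0] for row in conn.execute(READY_SQL)]
print(ready)
```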
This point is not a performance issue, but you have a long subquery using multiple AND and OR clauses without parentheses. I know the rules for operator precedence, you may know the rules for operator precedence, but will the next guy? Will you remember them at 1:00 AM when stuff is broken?
ADDED
You are using 2 common table expressions. I know others say it does not happen, but I don't really trust the query optimizer with CTEs -- I have had to recode CTE-based joins for performance issues on several occasions -- and creating an actual view equivalent to the CTE and using that instead can be a significant speedup. It may well depend on the version of SQL Server, but if you are running an older version I would definitely wonder about CTE optimization. -- This is not as important as the first 2 things I've mentioned; try to fix those first.
ADDED
I'm going to be harsh on CTEs again, as I did not really explain why they are bad for performance, and it was bothering me. If you don't have performance issues and you like the syntax, they can be useful in at least limited usage; personally, I don't normally recommend them for anything more than that -- and given that they are mostly syntactic sugar, I really can't recommend them much at all.
I think the primary reason that CTEs don't get optimized well is that there are no statistics for the optimizer to use. If you are pulling a lot of rows into a CTE, you are probably better off creating a #temptable and populating it. You can even add an index or two to your #temptable, and the optimizer can figure out how to use them too. An @temp table variable is similar, but at least through SQL Server 2012 they were no faster than #temp tables as far as I could tell -- supposedly new goodness in SQL Server 2014 helps with this.
A CTE is really just a temporary view in disguise, which is why I suggested you can replace it with a real view to get better performance (and you often can), or you can populate a temp table and sometimes get even better performance.
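The temp-table alternative can be sketched like this, again using SQLite through Python's `sqlite3` as a stand-in and hypothetical, cut-down versions of the tables. The `Redate_Orders` CTE is materialized once into an indexed temporary table, `ANALYZE` gathers statistics for it, and the main query then joins to it like any other table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_notes_by_line (order_key TEXT PRIMARY KEY, redate_request INTEGER);
CREATE TABLE allocation (order_num TEXT, line_num INTEGER);
INSERT INTO order_notes_by_line VALUES
    ('1001-1', 1), ('1001-2', 1), ('1002-1', 0), ('1003-1', 1);
INSERT INTO allocation VALUES ('1001', 1), ('1002', 1), ('1003', 2);

-- Materialize what was the Redate_Orders CTE once, then index it.
CREATE TEMP TABLE redate_orders AS
SELECT DISTINCT substr(order_key, 1, instr(order_key, '-') - 1) AS order_num
FROM order_notes_by_line
WHERE redate_request = 1;
CREATE INDEX temp.ix_redate_orders ON redate_orders (order_num);

-- Collect statistics so the planner knows the table's size and selectivity.
ANALYZE;
""")

rows = conn.execute("""
SELECT a.order_num, a.line_num, r.order_num
FROM allocation AS a
LEFT JOIN redate_orders AS r ON a.order_num = r.order_num
ORDER BY a.order_num, a.line_num
""").fetchall()
print(rows)
```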

Can this SQL Query be optimized to run faster?

I have a SQL query (for SQL Server 2008 R2) that takes a very long time to complete. I was wondering if there is a better way of doing it.
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90 million rows in it and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times; the total time it takes the procedure to finish is 27 minutes. I believe this is my biggest bottleneck because of the massive number of rows it queries against and the number of times it does it.
So if you know of a good way to optimize this, it would be greatly appreciated! If it cannot be optimized, then I guess I'm stuck with 27 minutes...
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt the GROUP BY approach suggested below, but that will require a complete rewrite of the stored procedure and might take some time... (as I said before, I'm not the best at SQL, but it is starting to grow on me. ^^)
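Beyond speed, `NOT IN` and `NOT EXISTS` differ when the subquery can return NULL: a single NULL in `ExcludedCodes.Code` makes `NOT IN` filter out every row. The sketch below demonstrates this with SQLite through Python's `sqlite3` (SQL Server with the default ANSI_NULLS ON setting behaves the same way); the tables and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (name TEXT, code TEXT);
CREATE INDEX ix_table1_name_code ON table1 (name, code);
CREATE TABLE excluded_codes (code TEXT);
INSERT INTO table1 VALUES ('bob', 'A'), ('bob', 'B'), ('bob', 'C'), ('amy', 'A');
INSERT INTO excluded_codes VALUES ('B');
""")

NOT_IN = """SELECT COUNT(name) FROM table1 AS t
WHERE t.name = ? AND t.code NOT IN (SELECT code FROM excluded_codes)"""

NOT_EXISTS = """SELECT COUNT(name) FROM table1 AS t
WHERE t.name = ?
  AND NOT EXISTS (SELECT 1 FROM excluded_codes AS e WHERE e.code = t.code)"""


def count(sql, name):
    return conn.execute(sql, (name,)).fetchone()[0]


before = (count(NOT_IN, "bob"), count(NOT_EXISTS, "bob"))
# One NULL in the exclusion list: "code NOT IN (..., NULL)" is never true.
conn.execute("INSERT INTO excluded_codes VALUES (NULL)")
after = (count(NOT_IN, "bob"), count(NOT_EXISTS, "bob"))
print(before, after)
```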
In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether it is in this set or not? It requires a lot of maintenance but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example you could add a BIT column:
ALTER TABLE dbo.Table1 ADD IsExcluded BIT;
Make it NOT NULL and default to 0. Then you could create a filtered index:
CREATE INDEX n ON dbo.Table1(name)
WHERE IsExcluded = 0;
Now you just have to update the table once:
UPDATE t
SET IsExcluded = 1
FROM dbo.Table1 AS t
INNER JOIN dbo.ExcludedCodes AS x
ON t.Code = x.Code;
And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
SELECT @Count = COUNT(Name)
FROM dbo.Table1 WHERE Name = @name AND IsExcluded = 0;
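A runnable sketch of the flag-plus-filtered-index idea, assuming SQLite via Python's `sqlite3`, where a filtered index is called a partial index. The triggers are a hypothetical minimal version of the maintenance described above: they handle inserts into `table1` and inserts/deletes on `excluded_codes`, but not updates to `code`, which a real implementation would also need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE excluded_codes (code TEXT PRIMARY KEY);
CREATE TABLE table1 (
    name TEXT,
    code TEXT,
    is_excluded INTEGER NOT NULL DEFAULT 0
);
-- Partial ("filtered") index covering only rows that are not excluded.
CREATE INDEX ix_table1_name_included ON table1 (name) WHERE is_excluded = 0;

-- New rows in table1: set the flag from the current exclusion list.
CREATE TRIGGER trg_table1_insert AFTER INSERT ON table1
BEGIN
    UPDATE table1
    SET is_excluded = EXISTS (SELECT 1 FROM excluded_codes WHERE code = NEW.code)
    WHERE rowid = NEW.rowid;
END;

-- Changes to the exclusion list: re-flag the affected rows.
CREATE TRIGGER trg_excluded_insert AFTER INSERT ON excluded_codes
BEGIN
    UPDATE table1 SET is_excluded = 1 WHERE code = NEW.code;
END;
CREATE TRIGGER trg_excluded_delete AFTER DELETE ON excluded_codes
BEGIN
    UPDATE table1 SET is_excluded = 0 WHERE code = OLD.code;
END;
""")

COUNT_SQL = "SELECT COUNT(name) FROM table1 WHERE name = ? AND is_excluded = 0"


def count_included(name):
    return conn.execute(COUNT_SQL, (name,)).fetchone()[0]


conn.execute("INSERT INTO excluded_codes VALUES ('B')")
conn.executemany(
    "INSERT INTO table1 (name, code) VALUES (?, ?)",
    [("bob", "A"), ("bob", "B"), ("bob", "C"), ("amy", "B")],
)

before = count_included("bob")          # 'B' is excluded, so only A and C count
conn.execute("DELETE FROM excluded_codes WHERE code = 'B'")
after_delete = count_included("bob")    # the trigger cleared the flag on the 'B' row
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + COUNT_SQL, ("bob",))]
print(before, after_delete, plan)
```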
EDIT
As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:
EDIT 2
I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
GROUP BY src.Name;
Or the LEFT JOIN equivalent:
SELECT src.Name, COUNT(*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
LEFT OUTER JOIN dbo.ExcludedCodes AS x
ON src.Code = x.Code
WHERE x.Code IS NULL
GROUP BY src.Name;
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure and whether you violate any of the restrictions, but it is worth investigating IMHO.
You say this gets called around 40K times. Why? Is it in a cursor? If so, do you really need a cursor? Couldn't you put the values you want for @name in a temp table, index it, and then join to it?
select t.name, count(t.name)
from Table1 t
join #name n on t.name = n.name
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
That might get you all your results in one query and is almost certainly faster than 40K separate queries. Of course, if you need the count for all the names, it's even simpler:
select t.name, count(t.name)
from Table1 t
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
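The set-based version can be tried out with a small SQLite database through Python's `sqlite3`; the tables and data are hypothetical. One design choice differs from the queries above: this sketch uses a LEFT JOIN from the list of names, so names whose rows are all excluded (or that have no rows) still come back with a count of 0, as the per-name query would return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (name TEXT, code TEXT);
CREATE INDEX ix_table1_name_code ON table1 (name, code);
CREATE TABLE excluded_codes (code TEXT PRIMARY KEY);
INSERT INTO table1 VALUES
    ('bob', 'A'), ('bob', 'B'), ('amy', 'A'), ('amy', 'C'), ('cal', 'B');
INSERT INTO excluded_codes VALUES ('B');
-- The names the 40K-iteration loop used to visit, loaded once.
CREATE TEMP TABLE wanted_names (name TEXT PRIMARY KEY);
INSERT INTO wanted_names VALUES ('bob'), ('amy'), ('cal'), ('dan');
""")

COUNTS_SQL = """
SELECT n.name, COUNT(t.name) AS cnt
FROM wanted_names AS n
LEFT JOIN table1 AS t
       ON t.name = n.name
      AND NOT EXISTS (SELECT 1 FROM excluded_codes AS e WHERE e.code = t.code)
GROUP BY n.name
ORDER BY n.name
"""

# One query replaces one call per name.
counts = dict(conn.execute(COUNTS_SQL).fetchall())
print(counts)
```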
NOT EXISTS typically performs better than NOT IN, but you should test it on your system.
SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code)
Without knowing more about your query it's tough to supply concrete optimization suggestions (i.e. code suitable for copy/paste). Does it really need to run 40,000 times? It sounds like your stored procedure needs reworking, if that's feasible. You could exec the above once at the start of the proc, insert the results into a temp table (which you can index as needed), and then join on that instead of running this query repeatedly.
This particular bit might not even be the bottleneck that makes your query run for 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar-valued UDFs in your WHERE clauses?
Have you thought about doing the query once and populating the data in a table variable or temp table? Something like
insert into #temp (name, Namecount)
select name, count(name)
from Table1
where code not in (select code from ExcludedCodes)
group by name
And don't forget that you could possibly use a filtered index as long as the excluded codes table is somewhat static.
Start by evaluating the execution plan. Which part is the heaviest to compute?
Regarding the relation between the two tables, use a JOIN on indexed columns: indexes will optimize query execution.

How can I optimize this query for time performance?

I am working with a large amount of data: 6 million rows. I need the query to run as fast as possible, but am at a loss for further optimization. I already removed 3 subqueries and moved it from 11+ hours to just 35 minutes on a modest dataset of 100k rows. See below!
declare @UserId uniqueidentifier;
set @UserId = '936DA01F-9ABD-4d9d-80C7-02AF85C822A8';
select
temp.Address_Line1,
temp.Cell_Phone_Number,
temp.City,
temp.CPM_delt_acd,
temp.CPM_delt_date,
temp.Customer_Id,
temp.Customer_Type,
temp.Date_Birth,
temp.Email_Business,
temp.Email_Home,
temp.First_Name,
temp.Geo,
temp.Home_Phone_Number,
temp.Last_Name,
temp.Link_Customer_Id,
temp.Middle_Name,
temp.Naics_Code,
temp.Office_Phone_Number,
temp.St,
temp.Suffix,
temp.Tin,
temp.TIN_Indicator,
temp.Zip_Code,
crm_c.contactid as CrmRecordId,
crm_c.ownerid as OldOwnerId,
crm_c.ext_profiletype as old_profileType,
coalesce(crm_fim.ownerid, @UserId) as OwnerId,
2 as profileType,
case
when
(temp.Tin = crm_c.ext_retail_prime_taxid collate database_default
and temp.Last_Name = crm_c.lastname collate database_default)
then
('Tin/LastName: '+temp.Tin + '/' + temp.Last_Name)
when
(temp.Customer_ID = crm_c.ext_customerid collate database_default)
then
('Customer_ID: '+temp.Customer_ID)
else
('New Customer: '+temp.Customer_ID)
end as FriendlyName,
case
when
(temp.Customer_ID = crm_c.ext_customerid collate database_default)
then
0
else
1
end as ForceFieldLock
from DailyProfile_Current temp
left join crm_contact crm_c
on (temp.Customer_ID = crm_c.ext_customerid collate database_default
or (temp.Tin = crm_c.ext_retail_prime_taxid collate database_default
and temp.Last_Name = crm_c.lastname collate database_default))
and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode
left outer join crm_ext_ImportMapping crm_fim
on temp.Geo = crm_fim.ext_geocode collate database_default
and 0 = crm_fim.deletionstatecode and 0 = crm_fim.statecode
Where crm_contact is a synonym that points to a view in another database. That view pulls data from a contact table and a contactextension table. I need data from both. I could probably separate this into two joins, if necessary. In general, columns that begin with "ext_" are from the extension part of the crm_contact view.
When I run this against 100k rows in the DailyProfile_Current table, it takes about 35 minutes. That table is a bunch of nvarchar(200) columns that had a flatfile dumped into it. It sucks, but it's what I inherited. I wonder if using real datatypes would help, but I'd like possible solutions that don't involve that as well.
If the DailyProfile_Current table is full of things that don't match the join conditions, this runs incredibly fast. If the table is full of things that do match the join conditions, it's incredibly slow.
There are indexes on Customer_ID and Geo from the temp table. There are also assorted indexes on the crm_contact tables. I don't know how much an index can help on an nvarchar(200) column, though.
In case it matters, I'm using Sql Server 2005.
Any ideas are appreciated.
I would definitely split it into 2 queries, as an OR in a join condition can be slow at times. Also, put a nonclustered index on these columns (one index per line):
DailyProfile_Current:
Customer_ID
Tin, Last_Name
Geo
crm_contact:
ext_customerid,deletionstatecode,statecode
ext_retail_prime_taxid, lastname ,deletionstatecode,statecode
crm_ext_ImportMapping:
ext_geocode,deletionstatecode,statecode
Why don't you try running it through SQL Server Profiler? It might give you a few hints.
Or else include the actual execution plan with the query results and look through it.
From the look of the query, I can only suggest splitting it in two by moving the OR out of the JOIN clause and using UNION ALL to combine the results. At the very least, it might give you an idea of which of the two JOIN conditions is slow, and you can work from there.
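A small check that the split form returns the same pairs as the OR join, sketched with SQLite through Python's `sqlite3` and hypothetical, cut-down tables. It uses `UNION` rather than `UNION ALL`, because a contact that matches on both customer ID and TIN/last name would otherwise appear twice; with `UNION ALL`, the second branch has to exclude pairs already matched by the first. It also uses inner joins only, whereas the original query's LEFT JOIN would need an extra branch for profiles that match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_profile (customer_id TEXT, tin TEXT, last_name TEXT);
CREATE TABLE crm_contact (contact_id INTEGER PRIMARY KEY,
                          customer_id TEXT, tin TEXT, last_name TEXT);
CREATE INDEX ix_crm_customer ON crm_contact (customer_id);
CREATE INDEX ix_crm_tin_last ON crm_contact (tin, last_name);
INSERT INTO daily_profile VALUES
    ('C1', 'T1', 'Smith'), ('C2', 'T2', 'Jones'), ('C3', 'T3', 'Brown');
INSERT INTO crm_contact VALUES
    (1, 'C1', 'T1', 'Smith'),   -- matches on both conditions
    (2, 'CX', 'T2', 'Jones'),   -- matches on TIN/last name only
    (3, 'C9', 'T9', 'White');   -- matches nothing
""")

OR_JOIN = """
SELECT p.customer_id, c.contact_id
FROM daily_profile AS p
JOIN crm_contact AS c
  ON p.customer_id = c.customer_id
  OR (p.tin = c.tin AND p.last_name = c.last_name)
"""

# Each branch is a simple equi-join that can use its own index.
SPLIT_JOIN = """
SELECT p.customer_id, c.contact_id
FROM daily_profile AS p
JOIN crm_contact AS c ON p.customer_id = c.customer_id
UNION
SELECT p.customer_id, c.contact_id
FROM daily_profile AS p
JOIN crm_contact AS c ON p.tin = c.tin AND p.last_name = c.last_name
"""

or_rows = sorted(conn.execute(OR_JOIN).fetchall())
split_rows = sorted(conn.execute(SPLIT_JOIN).fetchall())
print(or_rows, split_rows)
```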
Run it through Query Analyzer and let it suggest indexes for you; I'm guessing you have at least SQL Server 2000. And why not move some of the functionality into application code? For example, you could do the CASE logic in code, assuming the query is feeding application code. I find that splitting up queries and moving some of the load into code makes a significant difference in run time.