Select with Many Discrete Values Possible in Where Clause - sql

Given a table like
CREATE TABLE [dbo].[Article](
[Id] [int] NOT NULL,
[CategoryId] [int] NOT NULL,
[Text] [nchar](10) NOT NULL)
users are allowed to select one or more categories for which they would like to view data. Typically they will select 1-20 categories. To accommodate that, I generate parameterized queries similar to:
SELECT * FROM Article
WHERE CategoryId IN (#c1, #c2, #c3, #c4, #c5)
However, in some rare use cases a user can legitimately select hundreds of categories. This lead me to discover a limitation of Linq-to-Entities, which I worked around by forming ranges of category codes. Unfortunately this only pushes off the issue, as there are limits to the size of a query that can be passed to SQL Server.
I would like to refactor this query to avoid any hard limits. My first thought was to create a temporary table containing the requested categories, and performing an inner join against that temporary table in lieu of the IN(...) clause. However, I understand that temporary tables can be quite slow.
Is there a more elegant and/or better performing solution to this problem?

Your first instinct is correct, thopugh you might find a table-valued variable sufficient in place of a temp table. Don't worry about the performance in a case like this; it won't be significant. An index could always be created onthe temp table if needed, but that seems unlikley. Is there an index on the CategoryId field?

EDIT:
Oops. I missed the Linq part.
Here is a alternate syntax that may be worth a try (for performance reasons, not for string length reasons)
Select * from dbo.Article art where exists ( select null from ( select 0 as MyV union all select 2 as MyV union all select 3 as MyV ) as derived1 where derived1.MyV = art.CategoryId )
......................
This is how I handle it.
Sometimes my variable table is changed to a #temp table. I test the 2 different scenarios for performance.
You can pass as many or as few values via xml.
DECLARE #input XML = '<root>
<category myvalue="1" />
<category myvalue="2" />
<category myvalue="3" />
</root>'
declare #holder table ( CatID int )
Insert into #holder (CatID)
SELECT
myvalue = MyXmlTable.value('(#myvalue)', 'int')
FROM
#input.nodes('/root/category') AS Tbl(MyXmlTable)
select * from #holder
SELECT * FROM Article art
where exists (select null from #holder hold where hold.CatID = art.CategoryId
Bigger write up here:
http://www.sqlservercentral.com/articles/Stored+Procedures/thezerotonparameterproblem/2283/

Related

SQL Server : join on uniqueidentifier

I have two tables Backup and Requests.
Below is the script for both the tables
Backup
CREATE TABLE UserBackup(
FileName varchar(70) NOT NULL,
)
File name is represented by a guid. Sometimes there is some additional information related to the file. Hence we have entries like guid_ADD entried in table.
Requests
CREATE TABLE Requests(
RequestId UNIQUEIDENTIFIER NOT NULL,
Status int Not null
)
Here are some sample rows :
UserBackup table:
FileName
15b993cc-e8be-405d-bb9f-0c58b66dcdfe
4cffe724-3f68-4710-b785-30afde5d52f8
4cffe724-3f68-4710-b785-30afde5d52f8_Add
7ad22838-ddee-4043-8d1f-6656d2953545
Requests table:
RequestId Status
15b993cc-e8be-405d-bb9f-0c58b66dcdfe 1
4cffe724-3f68-4710-b785-30afde5d52f8 1
7ad22838-ddee-4043-8d1f-6656d2953545 2
What I need is to return all the rows from userbackup table whose name (the guid) is matches RequestId in the Requests table and the status is 1. So here is the query I wrote
Select *
from UserBackup
inner join Requests on UserBackup.FileName = Requests.RequestId
where Requests.Status = 1
And this works fine. It returns me the following result
FileName RequestId Status
15b993cc-e8be-405d-bb9f-0c58b66dcdfe 15b993cc-e8be-405d-bb9f-0c58b66dcdfe 1
4cffe724-3f68-4710-b785-30afde5d52f8 4cffe724-3f68-4710-b785-30afde5d52f8 1
4cffe724-3f68-4710-b785-30afde5d52f8_Add 4cffe724-3f68-4710-b785-30afde5d52f8 1
This is exactly what I want. But what I don't understand is how it is working. If you notice the result is returning 4cffe724-3f68-4710-b785-30afde5d52f8_Add row as well. The inner join is on varchar and uniqueidentifier, and this join instead of working like "Equals to" comparison works like "contains" comparison. I want to know how this works so that I can be sure to use this code without any unexpected scenarios.
The values on both sides of a comparison have to be of the same data type. There's no such thing as, say, comparing a uniqueidentifier and a varchar.
uniqueidentifier has a higher precedence than varchar so the varchars will be converted to uniqueidentifiers before the comparison occurs.
Unfortunately, you get no error or warning if the string contains more characters than are needed:
select CONVERT(uniqueidentifier,'4cffe724-3f68-4710-b785-30afde5d52f8_Add')
Result:
4CFFE724-3F68-4710-B785-30AFDE5D52F8
If you want to force the comparison to occur between strings, you'll have to perform an explicit conversion:
Select *
from UserBackup
inner join Requests
on UserBackup.FileName = CONVERT(varchar(70),Requests.RequestId)
where Requests.Status = 1
When you compare two columns of different data types SQL Server will attempt to do implicit conversion on lower precedence.
The following comes from MSDN docs on uniqueidentifier
The following example demonstrates the truncation of data when the
value is too long for the data type being converted to. Because the
uniqueidentifier type is limited to 36 characters, the characters that
exceed that length are truncated.
DECLARE #ID nvarchar(max) = N'0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong';
SELECT #ID, CONVERT(uniqueidentifier, #ID) AS TruncatedValue;
http://msdn.microsoft.com/en-us/library/ms187942.aspx
Documentation is clear that data is truncated
When ever you are unsure about your join operation you can verify Actual Execution Plan.
Here is test sample that you can run inside SSMS or SQL Sentry Plan Explorer
DECLARE #userbackup TABLE ( _FILENAME VARCHAR(70) )
INSERT INTO #userbackup
VALUES ( '15b993cc-e8be-405d-bb9f-0c58b66dcdfe' ),
( '4cffe724-3f68-4710-b785-30afde5d52f8' ),
( '4cffe724-3f68-4710-b785-30afde5d52f8_Add' )
, ( '7ad22838-ddee-4043-8d1f-6656d2953545' )
DECLARE #Requests TABLE
(
requestID UNIQUEIDENTIFIER
,_Status INT
)
INSERT INTO #Requests
VALUES ( '15b993cc-e8be-405d-bb9f-0c58b66dcdfe', 1 )
, ( '4cffe724-3f68-4710-b785-30afde5d52f8', 1 )
, ( '7ad22838-ddee-4043-8d1f-6656d2953545', 2 )
SELECT *
FROM #userbackup u
JOIN #Requests r
ON u.[_FILENAME] = r.requestID
WHERE r.[_Status] = 1
Instead of regular join operation SQL Server is doing HASH MATCH with EXPR 1006 in SSMS it is hard to see what is doing but if you open XML file you will find this
<ColumnReference Column="Expr1006" />
<ScalarOperator ScalarString="CONVERT_IMPLICIT(uniqueidentifier,#userbackup.[_FILENAME] as [u].[_FILENAME],0)">
When ever in doubt check execution plan and always make sure to match data types when comparing.
This is great blog Data Mismatch on WHERE Clause might Cause Serious Performance Problems from Microsoft engineer on exact problem.
What is happening here is the FileName is being converted from varchar to a UniqueIdentifier, and during that process it ignores anything after the first 36 characters.
You can see it in action here
Select convert(uniqueidentifier, UserBackup.FileName), FileName
from UserBackup
It works, but to reduce confusion for the next person to come along, you might want to store the RequestId associated with the UserBackup as a GUID in the UserBackup table and join on that.
At the very least put a comment in ;)

SQL History table and triggers

I have a requirement to keep a history of changes made to a specific table when there is an UPDATE called, but only care about specific columns.
So, I have created a History table:
CREATE TABLE [dbo].[SourceTable_History](
[SourceTable_HistoryID] [int] IDENTITY(1,1) NOT NULL,
[SourceTableID] [int] NOT NULL,
[EventDate] [date] NOT NULL,
[EventUser] [date] NOT NULL,
[ChangedColumn] VARCHAR(50) NOT NULL,
[PreviousValue] VARCHAR(100) NULL,
[NewValue] VARCHAR(100) NULL
CONSTRAINT pk_SourceTable_History PRIMARY KEY ([SourceTable_HistoryID]),
CONSTRAINT fk_SourceTable_HistoryID_History_Source FOREIGN KEY ([SourceTableID]) REFERENCES SourceTable (SourceTableId)
)
Abd my plan is to create an Update trigger on the SourceTable. The business only cares about changes to certain columns, so, in psudo code, I was planning to do something like
If source.Start <> new.start
Insert into history (PrimaryKey, EventDate, EventUser, ColumnName, OldValue, NewValue)
(PK, GETDATYE(), updateuser, "StartDate", old.value, new.value)
And there would be a block like that per column we want history on.
We're NOT allowed to use CDC, so we have to roll our own, and this is my plan so far.
Does this seem a suitable plan?
There are 7 tables we need to monitor, with a column count of between 2 and 5 columns per table.
I just need to work out how to get a trigger to first comapr the before and after values of a specific columnm and then write a new row.
I thought it was something as simple as:
CREATE TRIGGER tr_PersonInCareSupportNeeds_History
ON PersonInCareSupportNeeds
FOR UPDATE
AS
BEGIN
IF(inserted.StartDate <> deleted.StartDate)
BEGIN
INSERT INTO [dbo].[PersonInCareSupportNeeds_History]
([PersonInCareSupportNeedsID], [EventDate], [EventUser], [ChangedColumn], [PreviousValue], [NewValue])
VALUES
(inserted.[PersonInCareSupportNeedsID], GETDATE(), [LastUpdateUser], 'StartDate', deleted.[StartDate], deleted.[StartDate])
END
END
We have trigger based auditing system and we basically created it by analyzing how third party tool for generating audit triggers ApexSQL Audit creates triggers and manages storage and developed our own system based on that.
I think your solution is generally ok but that you need to think about modifying storage a bit and plan for scaling.
What if business decides to keep track of all columns in all tables? What if they decide to track inserts and deletes as well? Will your solution be able to accommodate this?
Storage: Use two tables to hold your data. One table for holding all info about transactions (when, who, application name, table name, schema name, affected rows, etc… ). And another table to hold the actual data (before and after values, primary key, etc..).
Triggers: We ended up with a template for insert, update and delete triggers and very simple C# app where we enter tables and columns so application outputs DDL. This saved us a lot of time.
Depending on your requirments, I think history tables should mirror the table you want to capture, plus the extra audit details (who, when, why).
That can make it easier to use the same existing logic (sql, data classes, screens etc) to view historical data.
With your design getting the data in will be ok, but how easy will it be to pull the data out in a usable format?
Well I think that your idea is not so bad. Actually, I have the similar system in production. I will not give you my complete code (with acynchronious history saving), but I could give you some guidelines.
The main idea is to turn your data from relational model into Entity-Attribute-Value model. Also we want our triggers to be as much general as we can, that means - do not write column names explicitly. This could be done by different ways, but the most general I know in SQL Server is to use FOR XML and then select from xml:
declare #Data xml
select #Data = (select * from Test for xml raw('Data'))
select
T.C.value('../#ID', 'bigint') as ID,
T.C.value('local-name(.)', 'nvarchar(128)') as Name,
T.C.value('.', 'nvarchar(max)') as Value
from #Data.nodes('Data/#*') as T(C)
SQL FIDDLE EXAMPLE
To get different rows of two tables, you could use EXCEPT:
select * from Test1 except select * from Test2
union all
select * from Test2 except select * from Test1
SQL FIDDLE EXAMPLE
and, finally, your trigger could be something like this:
create trigger utr_Test_History on Test
after update
as
begin
declare #Data_Inserted xml, #Data_Deleted xml
select #Data_Inserted =
(
select *
from (select * from inserted except select * from deleted) as a
for xml raw('Data')
)
select #Data_Deleted =
(
select *
from (select * from deleted except select * from inserted) as a
for xml raw('Data')
)
;with CTE_Inserted as (
select
T.C.value('../#ID', 'bigint') as ID,
T.C.value('local-name(.)', 'nvarchar(128)') as Name,
T.C.value('.', 'nvarchar(max)') as Value
from #Data_Inserted.nodes('Data/#*') as T(C)
), CTE_Deleted as (
select
T.C.value('../#ID', 'bigint') as ID,
T.C.value('local-name(.)', 'nvarchar(128)') as Name,
T.C.value('.', 'nvarchar(max)') as Value
from #Data_Deleted.nodes('Data/#*') as T(C)
)
insert into History (Table_Name, Record_ID, Event_Date, Event_User, Column_Name, Value_Old, Value_New)
select 'Test', isnull(I.ID, D.ID), getdate(), system_user, isnull(D.Name, I.Name), D.Value, I.Value
from CTE_Inserted as I
full outer join CTE_Deleted as D on D.ID = I.ID and D.Name = I.Name
where
not
(
I.Value is null and D.Value is null or
I.Value is not null and D.Value is not null and I.Value = D.Value
)
end
SQL FIDDLE EXAMPLE

Solution to avoid non-sargable argument in where clause

In the code_list CTE in this query I have a row constructor that will eventually take any number of arguments. The column icd in the patient_codes CTE is a five digit identifier that is most descriptive that the three digit codes that the row constructor has. The table icd_patient has a 100 million rows so for performance's sake, I would like to filer the rows on this table before I do any further work. I have
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select distinct icd,pat_id,id
from icd_patient
where icd in (select icd from code_list)
)
select distinct pat_id from patient_codes
The problem is, however, is that in the icd_patient table all of the icd columns are five digit and more descriptive. If I look at the execution plan of this query it's pretty streamlined. If I do
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select substring(icd,1,3) as icd,pat_id
from icd_patient2
where substring(icd,1,3) in (select * from code_list)
)
select * from patient_codes
this if course has a large performance impact because of the substring expression in the where clause. Does something akin to like in exist so I can take advantage of my indexes?
Index on icd_patient
CREATE NONCLUSTERED INDEX [ix_icd_patient] ON [dbo].[icd_patient2]
(
[pat_id] ASC
)
INCLUDE ( [id],
This much simpler query should be better than (or, at worst, the same as) your existing query.
select pat_id
FROM dbo.icd_patient
where icd LIKE '707%'
OR icd LIKE '250%'
GROUP BY pat_id;
Note that sargability only matters if there is actually an index on this column.
An alternative (since OR can sometimes give the optimizer fits):
SELECT pat_id FROM
(
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '707%'
UNION ALL
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '250%'
) AS x
GROUP BY pat_id;
To make this extensible beyond a handful of OR conditions, I would use a table-valued parameter (TVP).
CREATE TYPE dbo.StringPatterns AS TABLE(s VARCHAR(3) PRIMARY KEY);
Then your stored procedure could say:
CREATE PROCEDURE dbo.whatever
#sp dbo.StringPatterns READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT p.pat_id
FROM dbo.icd_patient AS p
INNER JOIN #sp AS sp
ON p.pat_id LIKE sp.s + '%'
GROUP BY p.pat_id;
END
Then you can pass in your set of three-character substrings from a DataTable or other collection in C#. From T-SQL just as an example:
DECLARE #p dbo.StringPatterns;
INSERT #p VALUES('707'),('250');
EXEC dbo.whatever #sp = #p;
Something like like in does not exist. The following is sargable:
select *
from icd_patient
where icd like '70700%' or
icd like '25002%'
Because like with a constant initial substring is a special case for SQL Server. This does not work when the strings on the right are variables.
One solution is to create an indexed view on the icd_patient table with an index on the first five characters of the icd code.
Using "IN" makes that part of a command non-sargable on both sides. End of discussion.
Saying he fixes it using substring, completely changes what it would return while it remains non sarged.
Any "fix" should exactly match results. The actual fix is to join the cte so the five characters match or put three characters in the cte and match that in a join or put 4 characters in the cte where the fourth is "%" and join matching by using LIKE
Using a "like" that starts with "%" increases the complexity of the search, but it would still use the index to find the value because parsing the index should use less reading by only getting the full table row when a search is successful.

sql server-query optimization with many columns

we have "Profile" table with over 60 columns like (Id, fname, lname, gender, profilestate, city, state, degree, ...).
users search other peopel on website. query is like :
WITH TempResult as (
select ROW_NUMBER() OVER(ORDER BY #sortColumn DESC) as RowNum, profile.id from Profile
where
(#a is null or a = #a) and
(#b is null or b = #b) and
...(over 60 column)
)
SELECT profile.* FROM TempResult join profile on TempResult.id = profile.id
WHERE
(RowNum >= #FirstRow)
AND
(RowNum <= #LastRow)
sql server by default use clustered index for execution query. but total execution time is over 300. we test another solution such as multi column index in all columns in where clause but total execution time is over 400.
do you have any solution to make total execution time lower than 100.
we using sql server 2008.
Unfortunately I don't think there is a pure SQL solution to your issue. Here are a couple alternatives:
Dynamic SQL - build up a query that only includes WHERE clause statements for values that are actually provided. Assuming the average search actually only fills in 2-3 fields, indexes could be added and utilized.
Full Text Search - go to something more like a Google keyword search. No individual options.
Lucene (or something else) - Search outside of SQL; This is a fairly significant change though.
One other option that I just remembered implementing in a system once. Create a vertical table that includes all of the data you are searching on and build up a query for it. This is easiest to do with dynamic SQL, but could be done using Table Value Parameters or a temp table in a pinch.
The idea is to make a table that looks something like this:
Profile ID
Attribute Name
Attribute Value
The table should have a unique index on (Profile ID, Attribute Name) (unique to make the search work properly, index will make it perform well).
In this table you'd have rows of data like:
(1, 'city', 'grand rapids')
(1, 'state', 'MI')
(2, 'city', 'detroit')
(2, 'state', 'MI')
Then your SQL will be something like:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
WHERE (AttributeName = 'city' AND AttributeValue = 'grand rapids')
AND (AttributeName = 'state' AND AttributeValue = 'MI')
GROUP BY ProfileID
HAVING COUNT(*) = 2
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
Like I said, you could use a temp table that has attribute name/values:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
JOIN PassedInAttributeTable ON ProfileAttributes.AttributeName = PassedInAttributeTable.AttributeName
AND ProfileAttributes.AttributeValue = PassedInAttributeTable.AttributeValue
GROUP BY ProfileID
HAVING COUNT(*) = CountOfRowsInPassedInAttributeTable -- calculate or pass in
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
As I recall, this ended up performing very well, even on fairly complicated queries (though I think we only had 12 or so columns).
As a single query, I can't think of a clever way of optimising this.
Provided that each column's check is highly selective, however, the following (very long winded) code, might prove faster, assuming each individual column has it's own separate index...
WITH
filter AS (
SELECT
[a].*
FROM
(SELECT * FROM Profile WHERE #a IS NULL OR a = #a) AS [a]
INNER JOIN
(SELECT id FROM Profile WHERE b = #b UNION ALL SELECT NULL WHERE #b IS NULL) AS [b]
ON ([a].id = [b].id) OR ([b].id IS NULL)
INNER JOIN
(SELECT id FROM Profile WHERE c = #c UNION ALL SELECT NULL WHERE #c IS NULL) AS [c]
ON ([a].id = [c].id) OR ([c].id IS NULL)
.
.
.
INNER JOIN
(SELECT id FROM Profile WHERE zz = #zz UNION ALL SELECT NULL WHERE #zz IS NULL) AS [zz]
ON ([a].id = [zz].id) OR ([zz].id IS NULL)
)
, TempResult as (
SELECT
ROW_NUMBER() OVER(ORDER BY #sortColumn DESC) as RowNum,
[filter].*
FROM
[filter]
)
SELECT
*
FROM
TempResult
WHERE
(RowNum >= #FirstRow)
AND (RowNum <= #LastRow)
EDIT
Also, thinking about it, you may even get the same result just by having the 60 individual indexes. SQL Server can do INDEX MERGING...
You've several issues imho. One is that you're going to end up with a seq scan no matter what you do.
But I think your more crucial issue here is that you've an unnecessary join:
SELECT profile.* FROM TempResult
WHERE
(RowNum >= #FirstRow)
AND
(RowNum <= #LastRow)
This is a classic "SQL Filter" query problem. I've found that the typical approaches of "(#b is null or b = #b)" & it's common derivatives all yeild mediocre performance. The OR clause tends to be the cause.
Over the years I've done a lot of Perf/Tuning & Query Optimisation. The Approach I've found best is to generate Dynamic SQL inside a Stored Proc. Most times you also need to add "with Recompile" on the statement. The Stored Proc helps reduce potential for SQL injection attacks. The Recompile is needed to force the selection of indexes appropriate to the parameters you are searching on.
Generally it is at least an order of magnitude faster.
I agree you should also look at points mentioned above like :-
If you commonly only refer to a small subset of the columns you could create non-clustered "Covering" indexes.
Highly selective (ie:those with many unique values) columns will work best if they are the lead colum in the index.
If many colums have a very small number of values, consider using The BIT datatype. OR Create your own BITMASKED BIGINT to represent many colums ie: a form of "Enumerated datatyle". But be careful as any function in the WHERE clause (like MOD or bitwise AND/OR) will prevent the optimiser from choosing an index. It works best if you know the value for each & can combine them to use an equality or range query.
While often good to find RoWID's with a small query & then join to get all the other columns you want to retrieve. (As you are doing above) This approach can sometimes backfire. If the 1st part of the query does a Clustred Index Scan then often it is faster to get the otehr columns you need in the select list & savethe 2nd table scan.
So always good to try it both ways & see what works best.
Remember to run SET STATISTICS IO ON & SET STATISTICS TIME ON. Before running your tests. Then you can see where the IO is & it may help you with index selection, for the mose frequent combination of paramaters.
I hope this makes sense without long code samples. (it is on my other machine)

SQL Server 2005 Full Text forum Search

I'm working on a search stored procedure for our existing forums.
I've written the following code which uses standard SQL full text indexes, however I'm sure there is a better way of doing it and would like a point in the right direction.
To give some info on how it needs to work, The page has 1 search text box which when clicked will search thread titles, thread descriptions and post text and should return the results with the title matches first, then descriptions then post data.
Below is what I've written so far which works but is not elegant or as fast as I would like. To give an example of performance with 20K threads and 80K posts it takes about 12 seconds to search for 5 random words.
ALTER PROCEDURE [dbo].[SearchForums]
(
--Input Params
#SearchText VARCHAR(200),
#GroupId INT = -1,
#ClientId INT,
--Paging Params
#CurrentPage INT,
#PageSize INT,
#OutTotalRecCount INT OUTPUT
)
AS
--Create Temp Table to Store Query Data
CREATE TABLE #SearchResults
(
Relevance INT IDENTITY,
ThreadID INT,
PostID INT,
[Description] VARCHAR(2000),
Author BIGINT
)
--Create and populate table of all GroupID's This search will return from
CREATE TABLE #GroupsToSearch
(
GroupId INT
)
IF #GroupId = -1
BEGIN
INSERT INTO #GroupsToSearch
SELECT GroupID FROM SNetwork_Groups WHERE ClientId = #ClientId
END
ELSE
BEGIN
INSERT INTO #GroupsToSearch
VALUES(#GroupId)
END
--Get Thread Titles
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.[ThreadId],
(SELECT NULL) AS PostId,
SNetwork_Threads.[Description],
SNetwork_Threads.[OwnerUserId]
FROM
SNetwork_Threads
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Threads.[Description], #SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = #ClientId
--Get Thread Descriptions
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.[ThreadId],
(SELECT NULL) AS PostId,
SNetwork_Threads.[Description],
SNetwork_Threads.[OwnerUserId]
FROM
SNetwork_Threads
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Threads.[Name], #SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = #ClientId
--Get Posts
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.ThreadId,
SNetwork_Posts.PostId,
SNetwork_Posts.PostText,
SNetwork_Posts.[OwnerUserId]
FROM
SNetwork_Posts
INNER JOIN SNetwork_Threads ON SNetwork_Threads.ThreadId = SNetwork_Posts.ThreadId
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Posts.PostText, #SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = #ClientId
--Return Paged Result Sets
SELECT #OutTotalRecCount = COUNT(*) FROM #SearchResults
SELECT
#SearchResults.[ThreadID],
#SearchResults.[PostID],
#SearchResults.[Description],
#SearchResults.[Author]
FROM
#SearchResults
WHERE
#SearchResults.[Relevance] >= (#CurrentPage - 1) * #PageSize + 1 AND
#SearchResults.[Relevance] <= #CurrentPage*#PageSize
ORDER BY Relevance ASC
--Clean Up
DROP TABLE #SearchResults
DROP TABLE #GroupsToSearch
I know its a bit long winded but just a nudge in the right direction would be well appreciated.
Incase it helps 80% of the query time is taken up when search posts and according to teh query plan is spent on "Clustered Index Scan" on the posts table. I cant see anyway around this.
Thanks
Gavin
I'd really have to see an explain plan to know where the slow parts were, as I don't see anything particularly nasty in your code. Very first thing - make sure all your indexes are in good shape, they are being used, statistics are up to date, etc.
One other idea would be to do the search on thread title first, then use the results from that to prune the searches on thread description and post text. Similarly, use the results from the thread description search to prune the post text search.
The basic idea here is that if you find the keywords in the thread title, why bother searching the description and posts? I realize this may not work depending on how you are presenting the search results to the user, and it may not make a huge difference, but it's something to think about.
80k records isn't that much. I'd recommend not inserting the resulting data into your temp table, and instead only inserting the IDs, then joining to that table afterward. This will save on writing to the temp table, as you may store 10000 ints, instead of 10000 full posts (of which you discard all but one page of). This may reduce the amount of time spent scanning posts, as well.
It looks like you would need two temp tables, one for threads and one for posts. You would union them in the final select.