Optimize query in TSQL 2005 - sql-server-2005

I have to optimize this query. Can someone help me fine-tune it so that it returns data faster?
Currently it takes somewhere between 26 and 35 seconds to return. I also created an index on the Attachment table. Here is my query:
SELECT DISTINCT o.organizationlevel, o.organizationid, o.organizationname, o.organizationcode,
o.organizationcode + ' - ' + o.organizationname AS 'codeplusname'
FROM Organization o
JOIN Correspondence c ON c.organizationid = o.organizationid
JOIN UserProfile up ON up.userprofileid = c.operatorid
WHERE c.status = '4'
--AND c.correspondence > 0
AND o.organizationlevel = 1
AND (up.site = 'ALL' OR
up.site = up.site)
--AND (@Dept = 'ALL' OR @Dept = up.department)
AND EXISTS (SELECT 1 FROM Attachment a
WHERE a.contextid = c.correspondenceid
AND a.context = 'correspondence'
AND ( a.attachmentname like '%.rtf' or a.attachmentname like '%.doc'))
ORDER BY o.organizationcode
I can't freely change things in the database because of permission issues, so any help would be much appreciated.

I believe your headache is coming from this part specifically. A LIKE inside a WHERE EXISTS can be your performance bottleneck.
AND EXISTS (SELECT 1 FROM Attachment a
WHERE a.contextid = c.correspondenceid
AND a.context = 'correspondence'
AND ( a.attachmentname like '%.rtf' or a.attachmentname like '%.doc'))
This can be written as a join instead.
SELECT DISTINCT o.organizationlevel, o.organizationid, o.organizationname, o.organizationcode,
o.organizationcode + ' - ' + o.organizationname AS 'codeplusname'
FROM Organization o
JOIN Correspondence c ON c.organizationid = o.organizationid
JOIN UserProfile up ON up.userprofileid = c.operatorid
left join Attachment a on a.contextid = c.correspondenceid
AND a.context = 'correspondence'
and right(a.attachmentname, 4) in ('.doc', '.rtf')
....
This eliminates both the LIKE and the WHERE EXISTS. Put your WHERE clause at the bottom. Because it's a left join, a.anycolumn IS NULL means no matching record exists, and a.anycolumn IS NOT NULL means a record was found. WHERE a.anycolumn IS NOT NULL is therefore the equivalent of a true result in the WHERE EXISTS logic.
Edit to add:
Another thought for you...I'm unsure what you are trying to do here...
AND (up.site = 'ALL' OR
up.site = up.site)
For any non-NULL up.site that reduces to up.site = 'ALL' OR 1 = 1. Is the OR really needed?
A quick note on RIGHT: RIGHT(column, integer) returns that many characters from the right end of the string (I used 4, so it takes the last 4 characters of the column specified). I've found it runs far faster than a LIKE.
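To see how the join form lines up with the EXISTS form, here is a minimal, self-contained sketch using table variables. The table variables and sample rows are made up for illustration; only the column names come from the question. It also suggests why the DISTINCT already present in the original query matters once EXISTS becomes a join: a correspondence with two matching attachments would come back twice without it.

```sql
-- Hypothetical sample data; only the column names come from the question.
DECLARE @Correspondence TABLE (correspondenceid INT PRIMARY KEY);
DECLARE @Attachment TABLE (
    contextid      INT,
    context        VARCHAR(50),
    attachmentname VARCHAR(255)
);

INSERT INTO @Correspondence (correspondenceid)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3;

INSERT INTO @Attachment (contextid, context, attachmentname)
SELECT 1, 'correspondence', 'letter.doc' UNION ALL
SELECT 1, 'correspondence', 'notes.rtf'  UNION ALL
SELECT 2, 'correspondence', 'scan.pdf';

-- EXISTS form: correspondence 1 is returned once.
SELECT c.correspondenceid
FROM @Correspondence c
WHERE EXISTS (SELECT 1
              FROM @Attachment a
              WHERE a.contextid = c.correspondenceid
                AND a.context = 'correspondence'
                AND (a.attachmentname LIKE '%.rtf'
                     OR a.attachmentname LIKE '%.doc'));

-- LEFT JOIN form: IS NOT NULL plays the role of EXISTS.
-- DISTINCT collapses the two matching attachments of
-- correspondence 1 into a single row.
SELECT DISTINCT c.correspondenceid
FROM @Correspondence c
LEFT JOIN @Attachment a
       ON a.contextid = c.correspondenceid
      AND a.context = 'correspondence'
      AND RIGHT(a.attachmentname, 4) IN ('.doc', '.rtf')
WHERE a.contextid IS NOT NULL;
```

Whether the join actually beats EXISTS on the real tables depends on the indexes and row counts, so it is worth comparing the two execution plans rather than assuming.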

This is true for every row where up.site is not NULL, so you can most likely eliminate it (and maybe the join to up):
AND (up.site = 'ALL' OR up.site = up.site)
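A small sketch of what that predicate really does, using a made-up table variable in place of UserProfile: the only rows it removes are those where site is NULL, because NULL = NULL does not evaluate to true.

```sql
-- Hypothetical stand-in for UserProfile; the rows are made up.
DECLARE @UserProfile TABLE (userprofileid INT PRIMARY KEY, site VARCHAR(10) NULL);

INSERT INTO @UserProfile (userprofileid, site)
SELECT 1, 'ALL' UNION ALL
SELECT 2, 'NYC' UNION ALL
SELECT 3, NULL;

-- Returns userprofileid 1 and 2.
-- Row 3 is filtered out: site = site is UNKNOWN when site is NULL.
SELECT up.userprofileid, up.site
FROM @UserProfile up
WHERE (up.site = 'ALL' OR up.site = up.site);
```

If dropping NULL sites is the intent, up.site IS NOT NULL says so directly; if it is not, the whole condition can go.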
If you can live with dirty reads, then add WITH (NOLOCK) to the tables.
And I would try Attachment as a join. It might not help, but it's worth a try. LIKE is relatively expensive, and if it is being evaluated in a loop where it could be evaluated once, changing that would really help.
Join Attachment a
on a.contextid = c.correspondenceid
AND a.context = 'correspondence'
AND ( a.attachmentname like '%.rtf' or a.attachmentname like '%.doc')
I know there are some people on SO that insist that exists is always faster than a join. And yes it is often faster than a join but not always.
Another approach is to create a #temp table:
CREATE TABLE #Temp (contextid INT PRIMARY KEY CLUSTERED);
insert into #temp
Select distinct contextid
from Attachment
where context = 'correspondence'
AND ( attachmentname like '%.rtf' or attachmentname like '%.doc')
order by contextid;
go
select ...
from correspondence c
join #Temp
on #Temp.contextid = c.correspondenceid
go
drop table #temp
Creating the PK on #temp will help, especially if correspondenceid is the primary key (or part of the primary key) on Correspondence.
That way you can be sure the LIKE expression is evaluated only once. If the LIKE is the expensive part and it runs inside a loop, it could be tanking the query. I use this a lot when I have a fairly expensive core query and need its results to pick up reference data from multiple tables. With a lot of joins the query optimizer sometimes picks a poor plan, but if you give it a PK-to-PK join it tends to stay fast. The downside is that it takes about 0.5 seconds to create and populate the #temp table.

Related

Trying to optimize a system objects search

I’m trying to search the database for any stored procedures that contain one of about 3500 different values.
I created a table to store the values in. I’m running the query below. The problem is, just testing it with a SELECT TOP 100 is taking 3+ mins to run (I have 3500+ values). I know it’s happening due to the query using LIKE.
I’m wondering if anyone has an idea on how I could optimize the search. The only results I need are the names of every value being searched for (pulled directly from the table I created: “SearchTerms”) and then a column that displays a 1 if it exists, 0 if it doesn’t.
Here’s the query I’m running:
SELECT
trm.Pattern,
(CASE
WHEN sm.object_id IS NULL THEN 0
ELSE 1
END) AS "Exists"
FROM dbo.SearchTerms trm
LEFT OUTER JOIN sys.sql_modules sm
ON sm.definition LIKE '%' + trm.Pattern + '%'
ORDER BY trm.Pattern
Note: it’s a one-time deal —it’s not something that will be run consistently.
Try a CTE that collects the Patterns that appear in at least one module definition, using a WHERE condition with EXISTS (...). Then LEFT JOIN dbo.SearchTerms to that CTE to get the 1 or 0 value for the Exists column.
;WITH ExistsSearchTerms AS (
SELECT Pattern
FROM dbo.SearchTerms
WHERE EXISTS (SELECT 1 FROM sys.sql_modules sm WHERE sm.definition LIKE '%' + Pattern + '%')
)
SELECT trm.Pattern, IIF(trmExist.Pattern IS NULL, 0, 1) AS "Exists"
FROM dbo.SearchTerms trm
LEFT JOIN ExistsSearchTerms trmExist
ON trm.Pattern = trmExist.Pattern
ORDER BY trm.Pattern
References:
SQL performance on LEFT OUTER JOIN vs NOT EXISTS
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
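IIF requires SQL Server 2012 or later. On older versions the same 1/0 flag can be produced with CASE WHEN EXISTS directly in the select list, which also avoids the self-join. The sketch below is self-contained: @SearchTerms is a made-up stand-in for dbo.SearchTerms and the two patterns are only examples, so the flags you get depend on what is in your database.

```sql
-- Hypothetical stand-in for dbo.SearchTerms; the patterns are made up.
DECLARE @SearchTerms TABLE (Pattern NVARCHAR(200) PRIMARY KEY);

INSERT INTO @SearchTerms (Pattern)
SELECT N'SearchTerms' UNION ALL
SELECT N'NoSuchObjectName';

-- One row per pattern: 1 if any module definition contains it, else 0.
-- EXISTS can stop scanning sys.sql_modules at the first match for a pattern.
SELECT trm.Pattern,
       CASE WHEN EXISTS (SELECT 1
                         FROM sys.sql_modules sm
                         WHERE sm.definition LIKE N'%' + trm.Pattern + N'%')
            THEN 1
            ELSE 0
       END AS [Exists]
FROM @SearchTerms trm
ORDER BY trm.Pattern;
```

One thing to watch: characters such as _, % and [ inside a pattern are treated as LIKE wildcards, so search terms containing them may match more than intended.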

SQL Query Pull data from another table IF NULL

I have a system running a SQL Server Express database and I need to pull some data from it. I have the basic SQL query created but I have found that some data is located elsewhere.
The basic premise is I have a database of Repair Orders, Vehicles And Customers. The Vehicles are usually added via a VIN decoder so they have ID's associated from a MAKE and MODEL table. However in the case of a VIN not decoding the application allows the user to manually enter this information and then it is stored in another table named "UserVehicleAttributes". In this table there is the VehicleID, AttributeName, & AttributeValue.
UserAttributeId  VehicleId  AttributeName  AttributeValue
---------------  ---------  -------------  --------------
364              6829       Model          Sedona
365              6830       Make           Kia
366              6830       Model          Sedona
So what I need is this: if the Make or Model comes up as NULL from the Vehicle table, display what was manually entered instead.
I found that there is an existing function in the DB that looks to be able to do what I need but I don't know how to use it as part of my query.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [SM].[fnVehicleModelName]()
RETURNS TABLE
AS
RETURN
(
SELECT DISTINCT v.VehicleId,
CASE
WHEN v.SubModelId IS NULL THEN ISNULL(ua.[AttributeValue],'')
ELSE smm.[Name]
END as Model
FROM SM.Vehicle v (NOLOCK)
LEFT OUTER JOIN
(SELECT sm.SubModelId, m.[Name] + ' ' + sm.[Name] as Name
FROM DMV.SubModel (NOLOCK) sm
INNER JOIN DMV.Model m (NOLOCK)
ON sm.ModelId = m.ModelId ) as smm
ON v.SubModelId = smm.SubModelId
LEFT OUTER JOIN SM.UserVehicleAttributes ua (NOLOCK)--
ON v.VehicleId = ua.VehicleId and ua.AttributeName = 'Model'
)
Any help is greatly appreciated. I am not very good with SQL (obviously) but I am trying to figure this one out.
I'm not sure why you're making this a function with no parameters - that's kinda the same thing as a view. Consider if using a view here might simplify the situation.
You're correct that ISNULL is what you want to use here, but I think the join can be simpler. Your situation is basically "pull the column value from whichever table has a non-null value, giving preference to one table first".
In the outer join, all the columns from the outer joined tables will be null if there's not a match, and if there is a match, all the values should be filled in. Knowing that... you should be able to do something like this... (as an example to clarify how this concept works, not solving your query for you)
select v.VehicleId,
VehicleName = isnull(Model.Name, UserVehicle.Name)
from Vehicle v
left outer join Model on Model.VehicleID = v.VehicleID
left outer join UserVehicle on UserVehicle.VehicleID = v.VehicleId
So, what that does is join the possible rows from either table, and the ISNULL function returns the first value when it is non-null and falls back to the second otherwise. Do that for the rest of the columns, and fix the join conditions to match your actual schema, and you should be golden.
That function takes no parameters; if you want to use it, consider rewriting it as a view. It only returns the model, though, so you can use subqueries like this instead:
SELECT
VehicleId,
CASE
WHEN Make IS NULL
THEN ( SELECT AttributeValue FROM UserVehicleAttributes
WHERE VehicleId = Vehicles.VehicleId
AND AttributeName = 'Make' )
ELSE Make
END AS Make,
CASE
WHEN Model IS NULL
THEN ( SELECT AttributeValue FROM UserVehicleAttributes
WHERE VehicleId = Vehicles.VehicleId
AND AttributeName = 'Model' )
ELSE Model
END AS Model
FROM
Vehicles
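The same fallback can also be written with two LEFT JOINs to the attribute table, one per attribute name, and COALESCE. This is a minimal sketch under assumptions: the table variables are made up, and it assumes the vehicle table exposes Make and Model directly (in the real schema they come from the MAKE and MODEL lookup tables). The attribute rows are the ones shown in the question.

```sql
-- Hypothetical miniature tables; Make/Model as direct columns is an assumption.
DECLARE @Vehicle TABLE (
    VehicleId INT PRIMARY KEY,
    Make      VARCHAR(50) NULL,
    Model     VARCHAR(50) NULL
);
DECLARE @UserVehicleAttributes TABLE (
    UserAttributeId INT PRIMARY KEY,
    VehicleId       INT,
    AttributeName   VARCHAR(50),
    AttributeValue  VARCHAR(50)
);

INSERT INTO @Vehicle (VehicleId, Make, Model)
SELECT 6828, 'Honda', 'Civic' UNION ALL  -- fully decoded VIN
SELECT 6829, 'Kia',   NULL    UNION ALL  -- model entered by hand
SELECT 6830, NULL,    NULL;              -- make and model entered by hand

INSERT INTO @UserVehicleAttributes (UserAttributeId, VehicleId, AttributeName, AttributeValue)
SELECT 364, 6829, 'Model', 'Sedona' UNION ALL
SELECT 365, 6830, 'Make',  'Kia'    UNION ALL
SELECT 366, 6830, 'Model', 'Sedona';

-- Prefer the decoded value; fall back to the manually entered attribute.
SELECT v.VehicleId,
       COALESCE(v.Make,  uaMake.AttributeValue)  AS Make,
       COALESCE(v.Model, uaModel.AttributeValue) AS Model
FROM @Vehicle v
LEFT JOIN @UserVehicleAttributes uaMake
       ON uaMake.VehicleId = v.VehicleId
      AND uaMake.AttributeName = 'Make'
LEFT JOIN @UserVehicleAttributes uaModel
       ON uaModel.VehicleId = v.VehicleId
      AND uaModel.AttributeName = 'Model'
ORDER BY v.VehicleId;
-- 6828 Honda Civic
-- 6829 Kia   Sedona
-- 6830 Kia   Sedona
```

This relies on there being at most one row per vehicle and attribute name; if duplicates are possible, the joins would multiply rows and the subquery form is safer.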

Check the query efficiency

I have the SQL query below and would like an opinion on whether I can improve it using temp tables or something else, or whether it is good enough as it is. Basically I am just feeding the result set from each inner query to the outer one.
SELECT S.SolutionID
,S.SolutionName
,S.Enabled
FROM dbo.Solution S
WHERE s.SolutionID IN (
SELECT DISTINCT sf.SolutionID
FROM dbo.SolutionToFeature sf
WHERE sf.SolutionToFeatureID IN (
SELECT sfg.SolutionToFeatureID
FROM dbo.SolutionFeatureToUsergroup SFG
WHERE sfg.UsergroupID IN (
SELECT UG.UsergroupID
FROM dbo.Usergroup UG
WHERE ug.SiteID = @SiteID
)
)
)
It's going to depend largely on the indexes you have on those tables. Since you are only selecting data out of the Solution table, you can put everything else in an exists clause, do some proper joins, and it should perform better.
The EXISTS clause will allow you to remove the DISTINCT you have on the SolutionToFeature table. DISTINCT can cause a performance hit because the engine has to sort or hash the rows to find duplicates before it can return them, and that cost grows as your tables grow.
It will look something similar to what I have below, but without sample data or anything I can't tell if it's exactly right.
Select S.SolutionID, S.SolutionName, S.Enabled
From dbo.Solution S
Where Exists (
select 1
from dbo.SolutionToFeature sf
Inner Join dbo.SolutionFeatureToUsergroup SFG on sf.SolutionToFeatureID = SFG.SolutionToFeatureID
Inner Join dbo.UserGroup UG on sfg.UserGroupID = UG.UserGroupID
Where S.SolutionID = sf.SolutionID
and UG.SiteID = @SiteID
)

SQL Update with join

Is there any other faster way to do an update besides with a join? Here's my query but it's a bit slow:
UPDATE #user_dupes
SET ExternalEmail = ud2.Email
FROM #user_dupes ud1
INNER JOIN(
SELECT Email, UserName
FROM #user_flat_emailtable_dupes
WHERE EmailType = 2
AND Email IS NOT NULL AND LEN(Email) > 0
) ud2
ON ud1.UserName = ud2.UserName
Thanks for any ideas
If you are using SQL Server, you were almost there. It's just a little fix:
UPDATE ud1 --little fix here!
SET ExternalEmail = ud2.Email
FROM #user_dupes ud1
INNER JOIN
(
SELECT Email, UserName
FROM #user_flat_emailtable_dupes
WHERE EmailType = 2
AND Email IS NOT NULL AND LEN(Email) > 0
) ud2
ON ud1.UserName = ud2.UserName
A couple of changes, on top of what @Adrian said...
UPDATE
ud1 -- @Adrian's change. Update the instance that you have already aliased.
SET
externalEmail = ud2.Email
FROM
#user_dupes AS ud1
INNER JOIN
#user_flat_emailtable_dupes AS ud2
ON ud1.UserName = ud2.UserName
WHERE
ud2.EmailType = 2 -- Removed sub-query, for layout, doubt it will help performance
AND ud2.Email IS NOT NULL
AND ud2.Email <> '' -- Removed the `LEN()` function
But possibly the most important part is to ensure you have indexes. The JOIN is necessary for this logic (or correlated sub-queries, etc.), so you want the join to perform well.
That means an index on (UserName) on #user_dupes, and an index on (EmailType, Email, UserName) on #user_flat_emailtable_dupes. (This assumes ud2 is the smaller table after the filtering.)
With the indexes as specified, the change from LEN(Email) > 0 to Email <> '' may allow an index seek rather than scan. The larger your tables the more apparent this will be.
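As a rough sketch, the two indexes could be created like this. It assumes the tables are temp tables as written here; the CREATE TABLE statements exist only to make the snippet runnable, and the column types and index names are guesses, not taken from the question.

```sql
-- Assumed shapes for the two temp tables; column types are guesses.
CREATE TABLE #user_dupes (
    UserName      VARCHAR(100) NOT NULL,
    ExternalEmail VARCHAR(255) NULL
);

CREATE TABLE #user_flat_emailtable_dupes (
    UserName  VARCHAR(100) NOT NULL,
    EmailType INT          NOT NULL,
    Email     VARCHAR(255) NULL
);

-- Supports the join from ud2 back to ud1 on UserName.
CREATE INDEX IX_user_dupes_UserName
    ON #user_dupes (UserName);

-- Leading on EmailType and Email lets the filter seek,
-- and UserName in the key covers the join column.
CREATE INDEX IX_flat_dupes_EmailType_Email_UserName
    ON #user_flat_emailtable_dupes (EmailType, Email, UserName);

-- Clean up the demo tables.
DROP TABLE #user_flat_emailtable_dupes;
DROP TABLE #user_dupes;
```

In practice you would create the indexes on your existing tables, typically after they have been populated, and then check the plan for the UPDATE.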
I believe this query will do the same thing (although you'd have to be sure that #user_flat_emailtable_dupes contains no duplicate usernames). I haven't checked whether the two have different execution plans. It looks like you're refining junky input; I mention MERGE partly because I do a lot of that and find it useful (all the more so for me since I don't know how UPDATE FROM works), and partly because I had never used MERGE with table variables before. It appears that at least the target must be aliased when it is a table variable; otherwise the parser treats the name as a scalar variable and the statement fails.
MERGE #user_dupes AS ud1
USING #user_flat_emailtable_dupes AS ud2
ON emailType = 2
AND COALESCE(ud2.email, '') <> ''
AND ud2.username = ud1.username
WHEN MATCHED THEN UPDATE
SET externalEmail = ud2.email
;

SQL 2005 Query Optimisation

I have a SQL 2005 table consisting of around 10million records (dbo.Logs).
I have another table, dbo.Rollup that matches distinct dbo.Logs.URL to a FileId column in a third table, dbo.Files. The dbo.Rollup table forms the basis of various aggregate reports we run at a later stage.
Suffice to say for now, the problem I am having is in populating dbo.Rollup efficiently.
By definition, dbo.Logs has potentially tens of thousands of rows which all share the same URL field value. In our application, one URL can be matched to one dbo.Files.FileId. I.E. There is a many-to-one relationship between dbo.Logs.URL and dbo.Files.FileId (we parse the values of dbo.Logs to determine what the appropriate FileId is for a given URL).
My goal is to significantly reduce the amount of time it takes the first of three stored procedures that run in order to create meaningful statistics from our raw log data.
What I need is a specific example of how to refactor this SQL query to be much more efficient:
sp-Rollup-Step1:
INSERT INTO dbo.Rollup ([FileURL], [FileId])
SELECT
l.RequestedFile As [URL],
FileId = dbo.fn_GetFileIdFromURL(l.RequestedFile, l.CleanFileName)
FROM
dbo.Logs l (readuncommitted)
WHERE
NOT EXISTS (
SELECT
FileURL
FROM
dbo.Rollup
WHERE
FileUrl = RequestedFile
)
fn_GetFileIdFromURL():
CREATE FUNCTION [dbo].[fn_GetFileIdFromURL]
(
@URL nvarchar(500),
@CleanFileName nvarchar(255)
)
RETURNS uniqueidentifier
AS
BEGIN
DECLARE @id uniqueidentifier
if (exists(select FileURL from dbo.[Rollup] where [FileUrl] = @URL))
begin
-- This URL has been seen before in dbo.Rollup.
-- Retrieve the FileId from the dbo.Rollup table.
set @id = (select top 1 FileId from dbo.[Rollup] where [FileUrl] = @URL)
end
else
begin
-- This is a new URL. Hunt for a matching URL in our list of files,
-- and return a FileId if a match is found.
Set @id = (
SELECT TOP 1
f.FileId
FROM
dbo.[Files] f
INNER JOIN
dbo.[Servers] s on s.[ServerId] = f.[ServerId]
INNER JOIN
dbo.[URLs] u on
u.[ServerId] = f.[ServerId]
WHERE
Left(u.[PrependURLProtocol],4) = left(@URL, 4)
AND @CleanFileName = f.FileName
)
end
return @id
END
Key considerations:
dbo.Rollup should contain only one entry for each DISTINCT/unique URL found in dbo.Logs.
I would like to omit records from being inserted into dbo.[Rollup] where the FileId is NULL.
In my own observations, it seems the slowest part of the query by far is in the stored procedure: the "NOT EXISTS" clause (I am not sure at this point whether that continually refreshes the table or not).
I'm looking for a specific solution (with examples using either pseudo-code or by modifying my procedures shown here) - answer will be awarded to those who provide it!
Thanks in advance for any assistance you can provide.
/Richard.
The short answer is that you effectively have a CURSOR here: the scalar UDF is run once per row of output.
The udf could be 2 LEFT JOINs onto derived tables. A rough outline:
...
COALESCE (F.xxx, L.xxx) --etc
...
FROM
dbo.Logs l (readuncommitted)
LEFT JOIN
(select DISTINCT --added after comment
FileId, FileUrl from dbo.[Rollup]) R ON L.RequestedFile = R.FileUrl
LEFT JOIN
(SELECT DISTINCT --added after comment
f.FileId,
f.FileName,
left(u.PrependURLProtocol, 4) + '%' AS Left4
FROM
dbo.[Files] f
INNER JOIN
dbo.[Servers] s on s.[ServerId] = f.[ServerId]
INNER JOIN
dbo.[URLs] u on
u.[ServerId] = f.[ServerId]
) F ON L.CleanFileName = F.FileName AND L.RequestedFile LIKE F.Left4
...
I'm also not sure if you need the NOT EXISTS because of how the udf works. If you do, make sure the columns are indexed.
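One way the per-row UDF call could be replaced with a single set-based statement is sketched below. It is not taken from the answer above: it swaps the two LEFT JOINs for a DISTINCT over the log rows plus CROSS APPLY, which SQL Server 2005 supports at compatibility level 90. The table variables are made-up miniatures of dbo.Logs, dbo.Rollup, dbo.Files and dbo.URLs, the join to dbo.Servers is left out for brevity, and it assumes each URL always carries the same CleanFileName.

```sql
-- Hypothetical miniature versions of the tables in the question.
DECLARE @Logs   TABLE (RequestedFile NVARCHAR(500), CleanFileName NVARCHAR(255));
DECLARE @Rollup TABLE (FileURL NVARCHAR(500), FileId UNIQUEIDENTIFIER);
DECLARE @Files  TABLE (FileId UNIQUEIDENTIFIER, ServerId INT, FileName NVARCHAR(255));
DECLARE @URLs   TABLE (ServerId INT, PrependURLProtocol NVARCHAR(100));

INSERT INTO @Files (FileId, ServerId, FileName)
SELECT '11111111-1111-1111-1111-111111111111', 1, N'a.pdf';

INSERT INTO @URLs (ServerId, PrependURLProtocol)
SELECT 1, N'http://example.com/';

INSERT INTO @Logs (RequestedFile, CleanFileName)
SELECT N'http://example.com/a.pdf',       N'a.pdf'       UNION ALL
SELECT N'http://example.com/a.pdf',       N'a.pdf'       UNION ALL
SELECT N'http://example.com/missing.pdf', N'missing.pdf';

INSERT INTO @Rollup (FileURL, FileId)
SELECT d.RequestedFile, m.FileId
FROM (
    -- Collapse the log to one row per new URL before doing any lookup.
    SELECT DISTINCT l.RequestedFile, l.CleanFileName
    FROM @Logs l
    WHERE NOT EXISTS (SELECT 1 FROM @Rollup r WHERE r.FileURL = l.RequestedFile)
) d
CROSS APPLY (
    -- Same matching rule as the "new URL" branch of the UDF.
    -- CROSS APPLY drops URLs with no match, so no NULL FileId is inserted.
    SELECT TOP 1 f.FileId
    FROM @Files f
    INNER JOIN @URLs u ON u.ServerId = f.ServerId
    WHERE f.FileName = d.CleanFileName
      AND LEFT(u.PrependURLProtocol, 4) = LEFT(d.RequestedFile, 4)
) m;

-- One row: the a.pdf URL with its FileId. The duplicate log row and
-- the unmatched missing.pdf URL are not inserted.
SELECT FileURL, FileId FROM @Rollup;
```

The design point is that the lookup runs once per distinct new URL instead of once per log row, which also covers the two key considerations in the question (one entry per URL, no NULL FileId). The "seen before" branch of the UDF is not needed here because the NOT EXISTS already excludes URLs that are in dbo.Rollup.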
I think your hotspot is located here:
Left(u.[PrependURLProtocol],4) = left(@URL, 4)
This will cause the server to scan the URLs table. Avoid wrapping a column in a function in a join or WHERE condition. Try rewriting it as something like
... where PrependURLProtocol like left(@URL, 4) + '%'
And make sure you have an index on that column.
INSERT INTO dbo.Rollup ([FileURL], [FileId])
SELECT
l.RequestedFile As [URL],
FileId = dbo.fn_GetFileIdFromURL(l.RequestedFile, l.CleanFileName)
FROM dbo.Logs l (readuncommitted) LEFT OUTER JOIN dbo.Rollup
on FileUrl = RequestedFile
WHERE FileUrl IS NULL
The logic here is that if no dbo.Rollup row exists for the given FileUrl, the left outer join returns NULL for the Rollup columns. The NOT EXISTS then becomes an IS NULL check, which may be faster; compare the execution plans to confirm.