SQL Query Optimization - sql

This report used to take about 16 seconds when there were 8000 rows to process. Now there are 50000 rows and the report takes 2:30 minutes.
This was my first pass at this and the client needed it yesterday, so I wrote this code in the logical order of what needed to be done, but without optimization in mind.
Now with the report taking longer as the data increases, I need to take a second look at this and optimize it. I'm thinking indexed views, table functions, etc.
I think the biggest bottleneck is looping through the temp table, making 4 select statements, and updating the temp table...50,000 times.
I think I can condense ALL of this into one large SELECT with either (a) 4 joins to the same table to get the 4 statuses, but then I am not sure how to get the TOP 1 in there, or I can try (b) using nested subqueries, but both seem really messy compared to the current code.
I'm not expecting anyone to write code for me, but if some SQL experts can peruse this code and tell me about any obvious inefficiencies and alternate methods, or ways to speed this up, or techniques I should be using instead, it would be appreciated.
PS: Assume that this DB is for the most part normalized, but poorly designed, and that I am not able to add indexes. I basically have to work with it, as is.
Where the code says (less than) I had to replace a "less than" symbol because it was cropping some of my code.
Thanks!
CREATE PROCEDURE RptCollectionAccountStatusReport AS
SET NOCOUNT ON;
DECLARE #Accounts TABLE
(
[AccountKey] INT IDENTITY(1,1) NOT NULL,
[ManagementCompany] NVARCHAR(50),
[Association] NVARCHAR(100),
[AccountNo] INT UNIQUE,
[StreetAddress] NVARCHAR(65),
[State] NVARCHAR(50),
[PrimaryStatus] NVARCHAR(100),
[PrimaryStatusDate] SMALLDATETIME,
[PrimaryDaysRemaining] INT,
[SecondaryStatus] NVARCHAR(100),
[SecondaryStatusDate] SMALLDATETIME,
[SecondaryDaysRemaining] INT,
[TertiaryStatus] NVARCHAR(100),
[TertiaryStatusDate] SMALLDATETIME,
[TertiaryDaysRemaining] INT,
[ExternalStatus] NVARCHAR(100),
[ExternalStatusDate] SMALLDATETIME,
[ExternalDaysRemaining] INT
);
INSERT INTO
#Accounts (
[ManagementCompany],
[Association],
[AccountNo],
[StreetAddress],
[State])
SELECT
mc.Name AS [ManagementCompany],
a.LegalName AS [Association],
c.CollectionKey AS [AccountNo],
u.StreetNumber + ' ' + u.StreetName AS [StreetAddress],
CASE WHEN c.InheritedAccount = 1 THEN 'ZZ' ELSE u.State END AS [State]
FROM
ManagementCompany mc WITH (NOLOCK)
JOIN
Association a WITH (NOLOCK) ON a.ManagementCompanyKey = mc.ManagementCompanyKey
JOIN
Unit u WITH (NOLOCK) ON u.AssociationKey = a.AssociationKey
JOIN
Collection c WITH (NOLOCK) ON c.UnitKey = u.UnitKey
WHERE
c.Closed IS NULL;
DECLARE #MaxAccountKey INT;
SELECT #MaxAccountKey = MAX([AccountKey]) FROM #Accounts;
DECLARE #index INT;
SET #index = 1;
WHILE #index (less than) #MaxAccountKey BEGIN
DECLARE #CollectionKey INT;
SELECT #CollectionKey = [AccountNo] FROM #Accounts WHERE [AccountKey] = #index;
DECLARE #PrimaryStatus NVARCHAR(100) = NULL;
DECLARE #PrimaryStatusDate SMALLDATETIME = NULL;
DECLARE #PrimaryDaysRemaining INT = NULL;
DECLARE #SecondaryStatus NVARCHAR(100) = NULL;
DECLARE #SecondaryStatusDate SMALLDATETIME = NULL;
DECLARE #SecondaryDaysRemaining INT = NULL;
DECLARE #TertiaryStatus NVARCHAR(100) = NULL;
DECLARE #TertiaryStatusDate SMALLDATETIME = NULL;
DECLARE #TertiaryDaysRemaining INT = NULL;
DECLARE #ExternalStatus NVARCHAR(100) = NULL;
DECLARE #ExternalStatusDate SMALLDATETIME = NULL;
DECLARE #ExternalDaysRemaining INT = NULL;
SELECT TOP 1
#PrimaryStatus = a.StatusName, #PrimaryStatusDate = c.StatusDate, #PrimaryDaysRemaining = c.DaysRemaining
FROM CollectionAccountStatus c WITH (NOLOCK) JOIN AccountStatus a WITH (NOLOCK) ON c.AccountStatusKey = a.AccountStatusKey
WHERE c.CollectionKey = #CollectionKey AND a.StatusType = 'Primary Status' AND a.StatusName 'Cleared'
ORDER BY c.sysCreated DESC;
SELECT TOP 1
#SecondaryStatus = a.StatusName, #SecondaryStatusDate = c.StatusDate, #SecondaryDaysRemaining = c.DaysRemaining
FROM CollectionAccountStatus c WITH (NOLOCK) JOIN AccountStatus a WITH (NOLOCK) ON c.AccountStatusKey = a.AccountStatusKey
WHERE c.CollectionKey = #CollectionKey AND a.StatusType = 'Secondary Status' AND a.StatusName 'Cleared'
ORDER BY c.sysCreated DESC;
SELECT TOP 1
#TertiaryStatus = a.StatusName, #TertiaryStatusDate = c.StatusDate, #TertiaryDaysRemaining = c.DaysRemaining
FROM CollectionAccountStatus c WITH (NOLOCK) JOIN AccountStatus a WITH (NOLOCK) ON c.AccountStatusKey = a.AccountStatusKey
WHERE c.CollectionKey = #CollectionKey AND a.StatusType = 'Tertiary Status' AND a.StatusName 'Cleared'
ORDER BY c.sysCreated DESC;
SELECT TOP 1
#ExternalStatus = a.StatusName, #ExternalStatusDate = c.StatusDate, #ExternalDaysRemaining = c.DaysRemaining
FROM CollectionAccountStatus c WITH (NOLOCK) JOIN AccountStatus a WITH (NOLOCK) ON c.AccountStatusKey = a.AccountStatusKey
WHERE c.CollectionKey = #CollectionKey AND a.StatusType = 'External Status' AND a.StatusName 'Cleared'
ORDER BY c.sysCreated DESC;
UPDATE
#Accounts
SET
[PrimaryStatus] = #PrimaryStatus,
[PrimaryStatusDate] = #PrimaryStatusDate,
[PrimaryDaysRemaining] = #PrimaryDaysRemaining,
[SecondaryStatus] = #SecondaryStatus,
[SecondaryStatusDate] = #SecondaryStatusDate,
[SecondaryDaysRemaining] = #SecondaryDaysRemaining,
[TertiaryStatus] = #TertiaryStatus,
[TertiaryStatusDate] = #TertiaryStatusDate,
[TertiaryDaysRemaining] = #TertiaryDaysRemaining,
[ExternalStatus] = #ExternalStatus,
[ExternalStatusDate] = #ExternalStatusDate,
[ExternalDaysRemaining] = #ExternalDaysRemaining
WHERE
[AccountNo] = #CollectionKey;
SET #index = #index + 1;
END;
SELECT
[ManagementCompany],
[Association],
[AccountNo],
[StreetAddress],
[State],
[PrimaryStatus],
CONVERT(VARCHAR, [PrimaryStatusDate], 101) AS [PrimaryStatusDate],
[PrimaryDaysRemaining],
[SecondaryStatus],
CONVERT(VARCHAR, [SecondaryStatusDate], 101) AS [SecondaryStatusDate],
[SecondaryDaysRemaining],
[TertiaryStatus],
CONVERT(VARCHAR, [TertiaryStatusDate], 101) AS [TertiaryStatusDate],
[TertiaryDaysRemaining],
[ExternalStatus],
CONVERT(VARCHAR, [ExternalStatusDate], 101) AS [ExternalStatusDate],
[ExternalDaysRemaining]
FROM
#Accounts
ORDER BY
[ManagementCompany],
[Association],
[StreetAddress]
ASC;

Don't try to guess where the query is going wrong - look at the execution plan. It will tell you what's chewing up your resources.
You can update directly from another table, even from a table variable: SQL update from one Table to another based on a ID match
That would allow you to combine everything in your loop into a single (massive) statement. You can join to the same tables for the secondary and tertiary statuses using different aliases, e.g.,
JOIN AccountStatus As TertiaryAccountStatus...AND a.StatusType = 'Tertiary Status'
JOIN AccountStatus AS SecondaryAccountStatus...AND a.StatusType = 'Secondary Status'
I'll bet you don't have an index on the AccountStatus.StatusType field. You might try using the PK of that table instead.
HTH.

First use a temp table instead of a table varaiable. These can be indexed.
Next, do not loop! Looping is bad for performance in virtually every case. This loop ran 50000 times rather than once for 50000 records, it will be horrible when you havea million records! Here is a link that will help you understand how to do set-based processing instead. It is written to avoid cursos but loops are similar to cursors, so it should help.
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
And (nolock) will give dirty data reads which can be very bad for reporting. If you are in a version of SQl Server higher than 2000, there are better choices.

SELECT #CollectionKey = [AccountNo] FROM #Accounts WHERE [AccountKey] = #index;
This query would benefit from a PRIMARY KEY declaration on your table variable.
When you say IDENTITY, you are asking the database to auto-populate the column.
When you say PRIMARY KEY, you are asking the database to organize the data into a clustered index.
These two concepts are very different. Typically, you should use both of them.
DECLARE #Accounts TABLE
(
[AccountKey] INT IDENTITY(1,1) PRIMARY KEY,
I am not able to add indexes.
In that case, copy the data to a database where you may add indexes. And use: SET STATISTICS IO ON

Related

MSSQL says I am trying to convert a varchar to a string when I'm not

So I have this fairly long procedure at Work that I just made. What it does it not that important, but the end result is what matters.
I need to count some different types of descriptions in a table and that Works fine. I then need to take the two things that I Count and put them in a string that I return to my software. However, every time I run this procedure it gives me this:
Msg 245, Level 16, State 1, Procedure WorkDays, Line 43 Conversion
failed when converting the varchar value
'FlightDeck:161,CabinCrew:189' to data type int.
I just can't figure out why it keeps telling me this when I am not trying to convert a varchar to an int but rather ints to a single varchar.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[WorkDays] #requestedDate nchar(10)
AS
SET ANSI_WARNINGS OFF
DECLARE #date as nchar(10) = ''
DECLARE #returnVal as varchar(30) = ''
DECLARE #flightDeck as int = 0
DECLARE #cabinCrew as int = 0
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SET #date = #requestedDate
SELECT
#flightDeck = SUM(CASE WHEN dbo.Crew_Category.Description LIKE 'Flight Deck' THEN 1 END),
#cabinCrew = SUM(CASE WHEN dbo.Crew_Category.Description LIKE 'Cabin Crew' THEN 1 END)
FROM
dbo.CrewMember INNER JOIN
dbo.Crew_Category ON dbo.CrewMember.CrewCategorySeqNo = dbo.Crew_Category.CrewCategorySeqno
WHERE
(dbo.Crew_Category.Description = N'Flight Deck' OR
dbo.Crew_Category.Description = N'Cabin Crew') AND
(dbo.CrewMember.EmploymentEndDate > #date)
AND dbo.CrewMember.CrewSeqno NOT IN (
SELECT
CrewMember_1.CrewSeqno
FROM
dbo.CrewMember AS CrewMember_1 INNER JOIN
dbo.CrewReqAsg ON CrewMember_1.CrewSeqno = dbo.CrewReqAsg.crewSeqno INNER JOIN
dbo.activity ON dbo.CrewReqAsg.act_seqno = dbo.activity.act_seqno INNER JOIN
dbo.ActivityType ON dbo.activity.actType_seqno = dbo.ActivityType.actType_seqno INNER JOIN
dbo.ActivityCategory ON dbo.ActivityType.ActCat_seqno = dbo.ActivityCategory.actCat_seqno INNER JOIN
dbo.Crew_Category AS Crew_Category_1 ON CrewMember_1.CrewCategorySeqNo = Crew_Category_1.CrewCategorySeqno
WHERE (
dbo.ActivityCategory.Category = N'Ferie' OR
dbo.ActivityCategory.Category = N'Fridage' OR
dbo.ActivityCategory.Category = N'Sygdom') AND (Crew_Category_1.Description = N'Flight Deck' OR
Crew_Category_1.Description = N'Cabin Crew') AND (LEFT(dbo.activity.Start,10) LIKE #date));
SET #returnVal = 'FlightDeck:'+CAST(#flightDeck AS varchar);
SET #returnVal += ',CabinCrew:'+CAST(#cabinCrew AS varchar);
END
RETURN #returnVal
It's been a while since I've had to do this so perhaps I just forgot something fundamental. Please help me figure out why this happens? :)
Yes, you forgot something fundamental. To return data to the caller, use SELECT, not RETURN.
You need
SELECT #returnVal

SQl Server Performance

I have database with more than 30 tables and more than 270k records in one table (the most important table) and create view get data from this table and other tables,
When I run the code below on my machine it takes less than 4 sec to get data from the view.
select * from view
My problem is that,
When I run the same script of database on another machine and run the same query from the view it takes a very long time.
Code for view
SELECT
dbo.UserSite.UserId,
dbo.UserSite.Name,
dbo.Site.RootPageURL,
dbo.PDFDocument.DocumentId,
dbo.RunDocumentVerificationResult.Status,
dbo.UserSite.UserSiteId,
dbo.Systemcode.Value,
dbo.RunDocumentVerificationResult.PageNumber,
dbo.RunDocumentVerificationResult.TestNameID,
dbo.RunDocumentVerificationResult.VerificationResultID,
dbo.TaskRun.VerificationEndDate,
dbo.TaskRun.RunId,
dbo.RunDocument.IsTagged,
dbo.RunDocument.IsProtected,
dbo.RunDocument.IsCorrupted
FROM
dbo.UserSite
INNER JOIN dbo.Site ON dbo.UserSite.SiteId = dbo.Site.SiteId
INNER JOIN dbo.TaskUserSites ON dbo.UserSite.UserSiteId = dbo.TaskUserSites.UserSiteId
INNER JOIN dbo.Task ON dbo.TaskUserSites.TaskId = dbo.Task.TaskId
INNER JOIN dbo.TaskRun ON dbo.Task.TaskId = dbo.TaskRun.TaskId
INNER JOIN dbo.RunDocument ON dbo.TaskRun.RunId = dbo.RunDocument.RunId
INNER JOIN dbo.PDFDocument ON dbo.PDFDocument.DocumentId = dbo.RunDocument.DocumentId
INNER JOIN dbo.RunDocumentVerificationResult ON dbo.RunDocument.RunDocumentId = dbo.RunDocumentVerificationResult.RunDocumentID
INNER JOIN dbo.Systemcode ON dbo.RunDocumentVerificationResult.Status = dbo.Systemcode.ID
EstimatedTime
Procdure Code is
ALTER proc [dbo].[status]
as
begin
begin transaction
declare #usersiteid bigint
declare #runid bigint
declare #TestedFiles int
declare #TaggedFiles int
declare #UnTaggedFiles int
declare #PassedFiles int
declare #FaildFiles int
declare #Name varchar(500)
declare #VerificationEndDate datetime
declare #RootPageURL varchar (1024)
declare #status table ( Name varchar(1000) , Urlrootpage varchar(2000) ,Testedfile int , TaggedFiles int , Untaggedfile int ,passedfiles int , faildfiles int,VerificationEndDate datetime,rootpageurl varchar(1024) )
declare #domain table (name varchar(1000) , urlrootpage varchar (2000) )
if (1=2)
begin
select 'n' Name ,'r' Urlrootpage ,1 Testedfile ,1 TaggedFiles ,0 Untaggedfile ,0 passedfiles ,0 faildfiles,GETDATE() VerificationEndDate ,'r' rootpageurl where 1=2
end
create table #status ( Name varchar(1000) , Urlrootpage varchar(2000) ,Testedfile int , TaggedFiles int , Untaggedfile int ,passedfiles int , faildfiles int,VerificationEndDate datetime,rootpageurl varchar(1024) )
set #usersiteid = (select min (UserSiteId) from vw)
set #runid = (select max (runid) from vw where usersiteid = #usersiteid)
while #usersiteid is not null
begin
set #TestedFiles = (select (count ( distinct documentid )) from vw where UserSiteId=#usersiteid and runid=#runid )
set #TaggedFiles = (select (count ( distinct documentid )) from vw where istagged=1 and UserSiteId=#usersiteid and runid=#runid)
set #UnTaggedFiles =(select (count ( distinct documentid )) from vw where istagged=0 and UserSiteId=#usersiteid and runid=#runid)
set #PassedFiles =(select (count ( distinct documentid )) from vw where Status<>1 and DocumentId not in (select DocumentId from vw where status =1) and UserSiteId=#usersiteid and runid=#runid)
set #FaildFiles = ( select (count ( distinct documentid )) from vw where Status=1 and UserSiteId=#usersiteid and runid=#runid)
set #Name = (select distinct name from vw where UserSiteId=#usersiteid)
set #rootPageUrl = (select distinct RootPageURL from vw where UserSiteId=#usersiteid)
set #VerificationEndDate = (select max(distinct VerificationEndDate) from vw where UserSiteId=#usersiteid and RunId=#runid)
insert into #status ( Name, Urlrootpage , Testedfile , TaggedFiles , Untaggedfile ,passedfiles , faildfiles ,VerificationEndDate ) values
(#Name,#RootPageURL,#TestedFiles,#TaggedFiles ,#UnTaggedFiles,#PassedFiles,#FaildFiles,#VerificationEndDate)
set #usersiteid = (select min (UserSiteId) from vw where UserSiteId > #usersiteid)
set #runid = (select max (runid) from vw where usersiteid = #usersiteid)
end
insert into #domain select UserSite.Name , Site.RootPageURL from UserSite inner join Site on UserSite.SiteId=Site.SiteId where UserSiteId not in (select UserSiteId from vw)
insert into #status select name,urlrootpage,0,0,0,0,0,null,0 from #domain
select Name,Urlrootpage,Testedfile,TaggedFiles,Untaggedfile, passedfiles,faildfiles from #status
end
If (##Error <> 0) -- Check if any error
Begin
rollback transaction
End
else
commit transaction
return
I would do a little test to find out if it is actually, as suggested, the network bandwidth that causes your query to be slow, or, better said, to look like it's slow. Append a limit-statement to your query and run it, like LIMIT 10. So while the whole query will execute, only the 10 first rows will be sent, and if the network is your bottleneck, it should now be very fast. If it is still that slow, your machine's sql server probably has very little memory to use, so it can't fit the whole result in, and your local sql server is probably configured to use more memory, so it executes faster. In this case, giving your sql server more memory should fix the problem. This should be no problem at all, since, as already mentioned in the comments, your database is actually very small, so the currently used memory will be very small too.
If your network connection turns out to be the bottleneck, you need to decide if, and why, you need all the results to be sent at once. I can't really help you on that one, since I don't know what the application is supposed to do with the data. But probably you should either do some aggegration in the database, or only send a small part of the data over the network.

Can we create a view after a script from a variable?

I would like to create a view at the end of the following request.
I know that 'create view' must be the first statement in a query batch. My problem is that for this query i must use a variable (#listOfIDRUB).
This variable is only fill correctly at the end of my little script.
I also have tried to create the view before my first declaration but it created a problem with "DECLARE".
So is it possible to create a view easily from the result of my script or i have to do something else ?
DECLARE #CounterResId int;
DECLARE #lePath varchar(255);
DECLARE #listOfIDRUB TABLE (EXTERNALREFERENCE uniqueidentifier, ID varchar(255), DOCID varchar(255) );
DECLARE #Max int;
SET #lePath = '';
SET #CounterResId = 1;
SET #Max = (SELECT COUNT(*) FROM SYNTHETIC..EXTRANET_PURGE WHERE TYPE_SUPPR = 'ResId')
WHILE (#CounterResId <= #Max )
BEGIN;
set #lePath =
(select tmp.lePath from
(
select row_number() over(order by path)as NumLigne, CONCAT(path, '%' ) as lePath from RUBRIQUE
WHERE MODELE = 'CAEEE64D-2B00-44EF-AA11-6B72ABD9FE38'
and CODE in (SELECT ID FROM SYNTHETIC..EXTRANET_PURGE where TYPE_SUPPR='ResId')
) tmp
WHERE tmp.NumLigne = #CounterResId)
INSERT INTO #listOfIDRUB(EXTERNALREFERENCE, ID, DOCID)
SELECT SEC.EXTERNALREFERENCE , SEC.ID, SEC.DOCUMENTID
FROM WEBACCESS_FRONT..SECTIONS sec
inner join rubrique rub ON rub.ID_RUBRIQUE = sec.EXTERNALREFERENCE
inner join template_tree_item tti ON tti.id_template_tree_item = rub.modele
inner join template t ON t.id_template = tti.template
WHERE t.CODE IN (SELECT TEMPLATE_CODE from SYNTHETIC..EasyFlowEngineListTemplateCode)
and rub.path like #lePath
print #CounterResId;
print #lePath;
set #CounterResId = #CounterResId + 1;
END;
select * from #listOfIDRUB;
Instead of select * from #listOfIDRUB
i wanted create view test as select * from listOfIDRUB
I have also tried create view test as (all my request)
Whenever you ask something about SQL please state your RDBMS (product and version). The answers are highly depending on this...
From your code I assume this is SQL Server.
So to your question: No, a VIEW must be "inlineable" (single-statement or "ad-hoc") statement.
You might think about a multi-statement UDF, but this is in almost all cases a bad thing (bad performance). Only go this way, if your result table will consist of rather few rows!
Without knowing your tables this is rather blind walking, but you might try this (add parameters, if you can transfer external operations (e.g. filtering) into the function):
CREATE FUNCTION dbo.MyFunction()
RETURNS #listOfIDRUB TABLE (EXTERNALREFERENCE uniqueidentifier, ID varchar(255), DOCID varchar(255) )
AS
BEGIN
DECLARE #CounterResId int;
DECLARE #lePath varchar(255);
DECLARE #Max int;
SET #lePath = '';
SET #CounterResId = 1;
SET #Max = (SELECT COUNT(*) FROM SYNTHETIC..EXTRANET_PURGE WHERE TYPE_SUPPR = 'ResId')
WHILE (#CounterResId <= #Max )
BEGIN;
set #lePath =
(select tmp.lePath from
(
select row_number() over(order by path)as NumLigne, CONCAT(path, '%' ) as lePath from RUBRIQUE
WHERE MODELE = 'CAEEE64D-2B00-44EF-AA11-6B72ABD9FE38'
and CODE in (SELECT ID FROM SYNTHETIC..EXTRANET_PURGE where TYPE_SUPPR='ResId')
) tmp
WHERE tmp.NumLigne = #CounterResId)
INSERT INTO #listOfIDRUB(EXTERNALREFERENCE, ID, DOCID)
SELECT SEC.EXTERNALREFERENCE , SEC.ID, SEC.DOCUMENTID
FROM WEBACCESS_FRONT..SECTIONS sec
inner join rubrique rub ON rub.ID_RUBRIQUE = sec.EXTERNALREFERENCE
inner join template_tree_item tti ON tti.id_template_tree_item = rub.modele
inner join template t ON t.id_template = tti.template
WHERE t.CODE IN (SELECT TEMPLATE_CODE from SYNTHETIC..EasyFlowEngineListTemplateCode)
and rub.path like #lePath
--print #CounterResId;
--print #lePath;
set #CounterResId = #CounterResId + 1;
END;
RETURN;
END
You can call it like this (very similar to a VIEW)
SELECT * FROM dbo.MyFunction();
And you might even use it in joins...
And last but not least I'm quite sure, that one could solve this without declares and a loop too...

Stored procedure optimization

i have a stored procedure which takes lot of time to execure .Can any one suggest a better approch so that the same result set is achived.
ALTER PROCEDURE [dbo].[spFavoriteRecipesGET]
#USERID INT, #PAGENUMBER INT, #PAGESIZE INT, #SORTDIRECTION VARCHAR(4), #SORTORDER VARCHAR(4),#FILTERBY INT
AS
BEGIN
DECLARE
#ROW_START INT
DECLARE
#ROW_END INT
SET
#ROW_START = (#PageNumber-1)* #PageSize+1
SET
#ROW_END = #PageNumber*#PageSize
DECLARE
#RecipeCount INT
DECLARE
#RESULT_SET_TABLE
TABLE
(
Id INT NOT NULL IDENTITY(1,1),
FavoriteRecipeId INT,
RecipeId INT,
DateAdded DATETIME,
Title NVARCHAR(255),
UrlFriendlyTitle NVARCHAR(250),
[Description] NVARCHAR(MAX),
AverageRatingId FLOAT,
SubmittedById INT,
SubmittedBy VARCHAR(250),
RecipeStateId INT,
RecipeRatingId INT,
ReviewCount INT,
TweaksCount INT,
PhotoCount INT,
ImageName NVARCHAR(50)
)
INSERT INTO #RESULT_SET_TABLE
SELECT
FavoriteRecipes.FavoriteRecipeId,
Recipes.RecipeId,
FavoriteRecipes.DateAdded,
Recipes.Title,
Recipes.UrlFriendlyTitle,
Recipes.[Description],
Recipes.AverageRatingId,
Recipes.SubmittedById,
COALESCE(users.DisplayName,users.UserName,Recipes.SubmittedBy) As SubmittedBy,
Recipes.RecipeStateId,
RecipeReviews.RecipeRatingId,
COUNT(RecipeReviews.Review),
COUNT(RecipeTweaks.Tweak),
COUNT(Photos.PhotoId),
dbo.udfGetRecipePhoto(Recipes.RecipeId) AS ImageName
FROM
FavoriteRecipes
INNER JOIN Recipes ON FavoriteRecipes.RecipeId=Recipes.RecipeId AND Recipes.RecipeStateId <> 3
LEFT OUTER JOIN RecipeReviews ON RecipeReviews.RecipeId=Recipes.RecipeId AND RecipeReviews.ReviewedById=#UserId
AND RecipeReviews.RecipeRatingId= (
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=#UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
OR RecipeReviews.RecipeRatingId IS NULL
LEFT OUTER JOIN RecipeTweaks ON RecipeTweaks.RecipeId = Recipes.RecipeId AND RecipeTweaks.TweakedById= #UserId
LEFT OUTER JOIN Photos ON Photos.RecipeId = Recipes.RecipeId
AND Photos.UploadedById = #UserId AND Photos.RecipeId = FavoriteRecipes.RecipeId
AND Photos.PhotoTypeId = 1
LEFT OUTER JOIN users ON Recipes.SubmittedById = users.UserId
WHERE
FavoriteRecipes.UserId=#UserId
GROUP BY
FavoriteRecipes.FavoriteRecipeId,
Recipes.RecipeId,
FavoriteRecipes.DateAdded,
Recipes.Title,
Recipes.UrlFriendlyTitle,
Recipes.[Description],
Recipes.AverageRatingId,
Recipes.SubmittedById,
Recipes.SubmittedBy,
Recipes.RecipeStateId,
RecipeReviews.RecipeRatingId,
users.DisplayName,
users.UserName,
Recipes.SubmittedBy;
WITH SortResults
AS
(
SELECT
ROW_NUMBER() OVER (
ORDER BY CASE WHEN #SORTDIRECTION = 't' AND #SORTORDER='a' THEN TITLE END ASC,
CASE WHEN #SORTDIRECTION = 't' AND #SORTORDER='d' THEN TITLE END DESC,
CASE WHEN #SORTDIRECTION = 'r' AND #SORTORDER='a' THEN AverageRatingId END ASC,
CASE WHEN #SORTDIRECTION = 'r' AND #SORTORDER='d' THEN AverageRatingId END DESC,
CASE WHEN #SORTDIRECTION = 'mr' AND #SORTORDER='a' THEN RecipeRatingId END ASC,
CASE WHEN #SORTDIRECTION = 'mr' AND #SORTORDER='d' THEN RecipeRatingId END DESC,
CASE WHEN #SORTDIRECTION = 'd' AND #SORTORDER='a' THEN DateAdded END ASC,
CASE WHEN #SORTDIRECTION = 'd' AND #SORTORDER='d' THEN DateAdded END DESC
) RowNumber,
FavoriteRecipeId,
RecipeId,
DateAdded,
Title,
UrlFriendlyTitle,
[Description],
AverageRatingId,
SubmittedById,
SubmittedBy,
RecipeStateId,
RecipeRatingId,
ReviewCount,
TweaksCount,
PhotoCount,
ImageName
FROM
#RESULT_SET_TABLE
WHERE
((#FILTERBY = 1 AND SubmittedById= #USERID)
OR ( #FILTERBY = 2 AND (SubmittedById <> #USERID OR SubmittedById IS NULL))
OR ( #FILTERBY <> 1 AND #FILTERBY <> 2))
)
SELECT
RowNumber,
FavoriteRecipeId,
RecipeId,
DateAdded,
Title,
UrlFriendlyTitle,
[Description],
AverageRatingId,
SubmittedById,
SubmittedBy,
RecipeStateId,
RecipeRatingId,
ReviewCount,
TweaksCount,
PhotoCount,
ImageName
FROM
SortResults
WHERE
RowNumber BETWEEN #ROW_START AND #ROW_END
print #ROW_START
print #ROW_END
SELECT
#RecipeCount=dbo.udfGetFavRecipesCount(#UserId)
SELECT
#RecipeCount AS RecipeCount
SELECT COUNT(Id) as FilterCount FROM #RESULT_SET_TABLE
WHERE
((#FILTERBY = 1 AND SubmittedById= #USERID)
OR (#FILTERBY = 2 AND (SubmittedById <> #USERID OR SubmittedById IS NULL))
OR (#FILTERBY <> 1 AND #FILTERBY <> 2))
END
You need to look at the execution plan to see where the time is going. It could be indexes, table-scans caused by your UDF, any number of things. As you anayze the plan, try to break up the query into smaller pieces to see if you can make a difference in them.
Then learn about ROW_NUMBER to see if you can do without the local table.
Couple notes
Indexing - often times when people create procedures which use temp table or table variable they fail to realize you can create indexes on those objects and this can have massive performance implications.
UDF - Sometimes the query processor will effectively inline UDF logic and sometimes not, look closely at your query plan an see how this is being handled. Often times if you manually inline this logic in something like a correlated sub-query you can boost performance a lot.
As others have said, the only way to know is to look at explain plans. Glancing over the code, this part looks kind of fishy:
AND RecipeReviews.RecipeRatingId= (
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=#UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
In general, doing non-trivial stuff in join conditions is a Bad Idea. I would factor that out into a sub-select, and since it's an outer join, you'd probably have to combine that with RecipeReviews somehow.
BUT: All of this is speculation! Explain! Measure!
Well in addition to the possible poor performance of the UDF, this line of code concerns me
LEFT OUTER JOIN RecipeReviews
ON RecipeReviews.RecipeId=Recipes.RecipeId
AND RecipeReviews.ReviewedById=#UserId
AND RecipeReviews.RecipeRatingId=
(SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=#UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId )
OR RecipeReviews.RecipeRatingId IS NULL
It is generally a poor practice to use a subquery as part of a join. I would strongly supect this is not using any indexes you may have. And the OR part doesn;t make sense to mea atll all, the left join shoudl get you this.
Rewrite it to make a derived table instead.
If you have a lot of records a temp table usually performs better than a table variable and can (and probably should) be indexed.
You need to add parentheses around your OR conditions.
LEFT OUTER JOIN RecipeReviews
ON RecipeReviews.RecipeId = Recipes.RecipeId
AND RecipeReviews.ReviewedById = #UserId
AND
-- insert open parenthesis here:
(
RecipeReviews.RecipeRatingId = (... subquery ...)
OR RecipeReviews.RecipeRatingId IS NULL
-- insert close parenthesis here:
)
the very first, simple thing i would do, is move all your declare statements to the top.
DECLARE #ROW_START INT,
#ROW_END INT,
#RecipeCount INT
DECLARE
#RESULT_SET_TABLE
TABLE
(
Id INT NOT NULL IDENTITY(1,1),
)
The next part, which is still rather simple, is stuff like this:
AND Recipes.RecipeStateId <> 3
AND RecipeTweaks.TweakedById= #UserId
This can be taken out of the join and move to the where clause. if you can, change the <> to an in statement so that it can utlize an index seek.
AND RecipeReviews.RecipeRatingId=
(
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=#UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
that's jsut crazy looking and needs to be completely redone.

Sql Optimization on advertising system

I am currently developing on an advertising system, which have been running just fine for a while now, apart from recently when our views per day have shot up from about 7k to 328k. Our server cannot take the pressure on this anymore - and knowing that I am not the best SQL guy around (hey, I can make it work, but not always in the best way) I am asking here for some optimization guidelines. I hope that some of you will be able to give rough ideas on how to improve this - I don't specifically need code, just to see the light :).
As it is at the moment, when an advert is supposed to be shown a PHP script is called, which in return calls a stored procedure. This stored procedure does several checks, it tests up against our customer database to see if the person showing the advert (given by a primary key id) is an actual customer under the given locale (our system is running on several languages which are all run as separate sites). Next up is all the advert details fetched out (image location as an url, height and width of the advert) - and lest step calls a separate stored procedure to test if the advert is allowed to be shown (is the campaign expired by either date or number of adverts allowed to show?) and if the customer has access to it (we got 2 access systems running, a blacklist and a whitelist one) and lastly what type of campaign we're running, is the view unique and so forth.
The code consists of a couple of stored procedures that I will post in here.
--- procedure called from PHP
CREATE PROCEDURE [dbo].[ExecView]
(
#publisherId bigint,
#advertId bigint,
#localeId int,
#ip varchar(15),
#ipIsUnique bit,
#success bit OUTPUT,
#campaignId bigint OUTPUT,
#advert varchar(500) OUTPUT,
#advertWidth int OUTPUT,
#advertHeight int OUTPUT
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE #unique bit
DECLARE #approved bit
DECLARE #publisherEarning money
DECLARE #advertiserCost money
DECLARE #originalStatus smallint
DECLARE #advertUrl varchar(500)
DECLARE #return int
SELECT #success = 1, #advert = NULL, #advertHeight = NULL, #advertWidth = NULL
--- Must be valid publisher, ie exist and actually be a publisher
IF dbo.IsValidPublisher(#publisherId, #localeId) = 0
BEGIN
SELECT #success = 0
RETURN 0
END
--- Must be a valid advert
EXEC #return = FetchAdvertDetails #advertId, #localeId, #advert OUTPUT, #advertUrl OUTPUT, #advertWidth OUTPUT, #advertHeight OUTPUT
IF #return = 0
BEGIN
SELECT #success = 0
RETURN 0
END
EXEC CanAddStatToAdvert 2, #advertId, #publisherId, #ip, #ipIsUnique, #success OUTPUT, #unique OUTPUT, #approved OUTPUT, #publisherEarning OUTPUT, #advertiserCost OUTPUT, #originalStatus OUTPUT, #campaignId OUTPUT
IF #success = 1
BEGIN
INSERT INTO dbo.Stat (AdvertId, [Date], Ip, [Type], PublisherEarning, AdvertiserCost, [Unique], Approved, PublisherCustomerId, OriginalStatus)
VALUES (#advertId, GETDATE(), #ip, 2, #publisherEarning, #advertiserCost, #unique, #approved, #publisherId, #originalStatus)
END
END
--- IsValidPublisher
CREATE FUNCTION [dbo].[IsValidPublisher]
(
#publisherId bigint,
#localeId int
)
RETURNS bit
AS
BEGIN
DECLARE #customerType smallint
DECLARE #result bit
SET #customerType = (SELECT [Type] FROM dbo.Customer
WHERE CustomerId = #publisherId AND Deleted = 0 AND IsApproved = 1 AND IsBlocked = 0 AND LocaleId = #localeId)
IF #customerType = 2
SET #result = 1
ELSE
SET #result = 0
RETURN #result
END
-- Fetch advert details
CREATE PROCEDURE [dbo].[FetchAdvertDetails]
(
#advertId bigint,
#localeId int,
#advert varchar(500) OUTPUT,
#advertUrl varchar(500) OUTPUT,
#advertWidth int OUTPUT,
#advertHeight int OUTPUT
)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SELECT #advert = T1.Advert, #advertUrl = T1.TargetUrl, #advertWidth = T1.Width, #advertHeight = T1.Height FROM Advert as T1
INNER JOIN Campaign AS T2 ON T1.CampaignId = T2.Id
WHERE T1.Id = #advertId AND T2.LocaleId = #localeId AND T2.Deleted = 0 AND T2.[Status] <> 1
IF #advert IS NULL
RETURN 0
ELSE
RETURN 1
END
--- CanAddStatToAdvert
CREATE PROCEDURE [dbo].[CanAddStatToAdvert]
#type smallint, --- Type of stat to add
#advertId bigint,
#publisherId bigint,
#ip varchar(15),
#ipIsUnique bit,
#success bit OUTPUT,
#unique bit OUTPUT,
#approved bit OUTPUT,
#publisherEarning money OUTPUT,
#advertiserCost money OUTPUT,
#originalStatus smallint OUTPUT,
#campaignId bigint OUTPUT
AS
BEGIN
SET NOCOUNT ON;
DECLARE #campaignLimit int
DECLARE #campaignStatus smallint
DECLARE #advertsLeft int
DECLARE #campaignType smallint
DECLARE #campaignModeration smallint
DECLARE #count int
SELECT #originalStatus = 0
SELECT #success = 1
SELECT #approved = 1
SELECT #unique = 1
SELECT #campaignId = CampaignId FROM dbo.Advert
WHERE Id = #advertId
IF #campaignId IS NULL
BEGIN
SELECT #success = 0
RETURN
END
SELECT #campaignLimit = Limit, #campaignStatus = [Status], #campaignType = [Type], #publisherEarning = PublisherEarning, #advertiserCost = AdvertiserCost, #campaignModeration = ModerationType FROM dbo.Campaign
WHERE Id = #campaignId
IF (#type <> 0 AND #type <> 2 AND #type <> #campaignType) OR ((#campaignType = 0 OR #campaignType = 2) AND (#type = 1)) -- if not a click or view type, then type must match the campaign (ie, only able to do leads on lead campaigns, no isales or etc), click and view campaigns however can do leads too
BEGIN
SELECT #success = 0
RETURN
END
-- Take advantage of the fact that the variable only gets touched if there is a record,
-- which is supposed to override the existing one, if there is one
SELECT #publisherEarning = Earning FROM dbo.MapCampaignPublisherEarning
WHERE CanpaignId = #campaignId AND PublisherId = #publisherId
IF #campaignStatus = 1
BEGIN
SELECT #success = 0
RETURN
END
IF NOT #campaignLimit IS NULL
BEGIN
SELECT #advertsLeft = AdvertsLeft FROM dbo.Campaign WHERE Id = #campaignId
IF #advertsLeft < 1
BEGIN
SELECT #success = 0
RETURN
END
END
IF #campaignModeration = 0 -- blacklist
BEGIN
SELECT #count = COUNT([Status]) FROM dbo.MapCampaignModeration WHERE CampaignId = #campaignId AND PublisherId = #publisherId AND [Status] = 3
IF #count > 0
BEGIN
SELECT #success = 0
RETURN
END
END
ELSE -- whitelist
BEGIN
SELECT #count = COUNT([Status]) FROM dbo.MapCampaignModeration WHERE CampaignId = #campaignId AND PublisherId = #publisherId AND [Status] = 2
IF #count < 1
BEGIN
SELECT #success = 0
RETURN
END
END
IF #ipIsUnique = 1
BEGIN
SELECT #unique = 1
END
ELSE
BEGIN
IF (SELECT COUNT(T1.Id) FROM dbo.Stat AS T1
INNER JOIN dbo.IQ_Advert AS T2
ON T1.AdvertId = T2.Id
WHERE T2.CampaignId = #campaignId
AND T1.[Type] = #type
AND T1.[Unique] = 1
AND T1.PublisherCustomerId = #publisherId
AND T1.Ip = #ip
AND DATEADD(SECOND, 86400, T1.[Date]) > GETDATE()
) = 0
SELECT #unique = 1
ELSE
BEGIN
SELECT #unique = 0, #originalStatus = 1 -- not unique, and set status to be ip conflict
END
END
IF #unique = 0 AND #type <> 0 AND #type <> 2
BEGIN
SELECT #unique = 1, #approved = 0
END
IF #originalStatus = 0
SELECT #originalStatus = 5
IF #approved = 0 OR #type <> #campaignType
BEGIN
SELECT #publisherEarning = 0, #advertiserCost = 0
END
END
I am thinking this needs more than just a couple of indexes thrown in to help it, but rather a total rethinking of how to handle it. I have been heard that running this as a batch would help, but I am not sure how to get this implemented, and really not sure if i can implement it in a such way where I keep all these nice checks before the actual insert or if I have to give up on some of this.
Anyhow, all help would be appreciated, if you need any of the table layouts, let me know :).
Thanks for taking the time to look at it :)
Make sure to reference tables with the ownership prefix. So instead of:
INNER JOIN Campaign AS T2 ON T1.CampaignId = T2.Id
Use
INNER JOIN dbo.Campaign AS T2 ON T1.CampaignId = T2.Id
That will allow the database to cache the execution plan.
Another possibility is to disable database locking, which has data integrity risks, but can significantly increase performance:
INNER JOIN dbo.Campaign AS T2 (nolock) ON T1.CampaignId = T2.Id
Run a sample query in SQL Analyzer with "Show Execution Plan" turned on. This might give you a hint as to the slowest part of the query.
it seems like FetchAdvertDetails hit the same tables as the start of CanAddStatToAdvert (Advert and Campaign). If possible, I'd try to eliminate FetchAdvertDetails and roll its logic into CanAddStatToAdvert, so you don't have the hit Advert and Campaign the extra times.
Get rid of most of the SQL.
This stored procedure does several
checks, it tests up against our
customer database to see if the person
showing the advert (given by a primary
key id) is an actual customer under
the given locale (our system is
running on several languages which are
all run as separate sites). Next up is
all the advert details fetched out
(image location as an url, height and
width of the advert) - and lest step
calls a separate stored procedure to
test if the advert is allowed to be
shown (is the campaign expired by
either date or number of adverts
allowed to show?) and if the customer
has access to it (we got 2 access
systems running, a blacklist and a
whitelist one) and lastly what type of
campaign we're running, is the view
unique and so forth.
Most of this should not be done in the database for every request. In particular:
Customer and local can be stored in memory. Expire them after 5 minutes or so, but do not ask for this info on every repetitive request.
Advert details can also be stored. Every advert will have a "key" to identify it (Number)?. Ust a dictionary / hashtable in memory.
Eliminate as many SQL Parts as you can. Dumping repetitive work on the SQL Database is a typical mistake.