Stored procedure optimization - sql

I have a stored procedure which takes a lot of time to execute. Can anyone suggest a better approach that achieves the same result set?
ALTER PROCEDURE [dbo].[spFavoriteRecipesGET]
@USERID INT, @PAGENUMBER INT, @PAGESIZE INT, @SORTDIRECTION VARCHAR(4), @SORTORDER VARCHAR(4),@FILTERBY INT
AS
BEGIN
DECLARE
@ROW_START INT
DECLARE
@ROW_END INT
SET
@ROW_START = (@PageNumber-1)* @PageSize+1
SET
@ROW_END = @PageNumber*@PageSize
DECLARE
@RecipeCount INT
DECLARE
@RESULT_SET_TABLE
TABLE
(
Id INT NOT NULL IDENTITY(1,1),
FavoriteRecipeId INT,
RecipeId INT,
DateAdded DATETIME,
Title NVARCHAR(255),
UrlFriendlyTitle NVARCHAR(250),
[Description] NVARCHAR(MAX),
AverageRatingId FLOAT,
SubmittedById INT,
SubmittedBy VARCHAR(250),
RecipeStateId INT,
RecipeRatingId INT,
ReviewCount INT,
TweaksCount INT,
PhotoCount INT,
ImageName NVARCHAR(50)
)
INSERT INTO @RESULT_SET_TABLE
SELECT
FavoriteRecipes.FavoriteRecipeId,
Recipes.RecipeId,
FavoriteRecipes.DateAdded,
Recipes.Title,
Recipes.UrlFriendlyTitle,
Recipes.[Description],
Recipes.AverageRatingId,
Recipes.SubmittedById,
COALESCE(users.DisplayName,users.UserName,Recipes.SubmittedBy) As SubmittedBy,
Recipes.RecipeStateId,
RecipeReviews.RecipeRatingId,
COUNT(RecipeReviews.Review),
COUNT(RecipeTweaks.Tweak),
COUNT(Photos.PhotoId),
dbo.udfGetRecipePhoto(Recipes.RecipeId) AS ImageName
FROM
FavoriteRecipes
INNER JOIN Recipes ON FavoriteRecipes.RecipeId=Recipes.RecipeId AND Recipes.RecipeStateId <> 3
LEFT OUTER JOIN RecipeReviews ON RecipeReviews.RecipeId=Recipes.RecipeId AND RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeRatingId= (
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
OR RecipeReviews.RecipeRatingId IS NULL
LEFT OUTER JOIN RecipeTweaks ON RecipeTweaks.RecipeId = Recipes.RecipeId AND RecipeTweaks.TweakedById= @UserId
LEFT OUTER JOIN Photos ON Photos.RecipeId = Recipes.RecipeId
AND Photos.UploadedById = @UserId AND Photos.RecipeId = FavoriteRecipes.RecipeId
AND Photos.PhotoTypeId = 1
LEFT OUTER JOIN users ON Recipes.SubmittedById = users.UserId
WHERE
FavoriteRecipes.UserId=@UserId
GROUP BY
FavoriteRecipes.FavoriteRecipeId,
Recipes.RecipeId,
FavoriteRecipes.DateAdded,
Recipes.Title,
Recipes.UrlFriendlyTitle,
Recipes.[Description],
Recipes.AverageRatingId,
Recipes.SubmittedById,
Recipes.SubmittedBy,
Recipes.RecipeStateId,
RecipeReviews.RecipeRatingId,
users.DisplayName,
users.UserName,
Recipes.SubmittedBy;
WITH SortResults
AS
(
SELECT
ROW_NUMBER() OVER (
ORDER BY CASE WHEN @SORTDIRECTION = 't' AND @SORTORDER='a' THEN TITLE END ASC,
CASE WHEN @SORTDIRECTION = 't' AND @SORTORDER='d' THEN TITLE END DESC,
CASE WHEN @SORTDIRECTION = 'r' AND @SORTORDER='a' THEN AverageRatingId END ASC,
CASE WHEN @SORTDIRECTION = 'r' AND @SORTORDER='d' THEN AverageRatingId END DESC,
CASE WHEN @SORTDIRECTION = 'mr' AND @SORTORDER='a' THEN RecipeRatingId END ASC,
CASE WHEN @SORTDIRECTION = 'mr' AND @SORTORDER='d' THEN RecipeRatingId END DESC,
CASE WHEN @SORTDIRECTION = 'd' AND @SORTORDER='a' THEN DateAdded END ASC,
CASE WHEN @SORTDIRECTION = 'd' AND @SORTORDER='d' THEN DateAdded END DESC
) RowNumber,
FavoriteRecipeId,
RecipeId,
DateAdded,
Title,
UrlFriendlyTitle,
[Description],
AverageRatingId,
SubmittedById,
SubmittedBy,
RecipeStateId,
RecipeRatingId,
ReviewCount,
TweaksCount,
PhotoCount,
ImageName
FROM
@RESULT_SET_TABLE
WHERE
((@FILTERBY = 1 AND SubmittedById= @USERID)
OR ( @FILTERBY = 2 AND (SubmittedById <> @USERID OR SubmittedById IS NULL))
OR ( @FILTERBY <> 1 AND @FILTERBY <> 2))
)
SELECT
RowNumber,
FavoriteRecipeId,
RecipeId,
DateAdded,
Title,
UrlFriendlyTitle,
[Description],
AverageRatingId,
SubmittedById,
SubmittedBy,
RecipeStateId,
RecipeRatingId,
ReviewCount,
TweaksCount,
PhotoCount,
ImageName
FROM
SortResults
WHERE
RowNumber BETWEEN @ROW_START AND @ROW_END
print @ROW_START
print @ROW_END
SELECT
@RecipeCount=dbo.udfGetFavRecipesCount(@UserId)
SELECT
@RecipeCount AS RecipeCount
SELECT COUNT(Id) as FilterCount FROM @RESULT_SET_TABLE
WHERE
((@FILTERBY = 1 AND SubmittedById= @USERID)
OR (@FILTERBY = 2 AND (SubmittedById <> @USERID OR SubmittedById IS NULL))
OR (@FILTERBY <> 1 AND @FILTERBY <> 2))
END

You need to look at the execution plan to see where the time is going. It could be missing indexes, table scans caused by your UDF, or any number of other things. As you analyze the plan, try breaking the query into smaller pieces to see whether you can make a difference in each of them.
Then learn about ROW_NUMBER to see if you can do without the local table.
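A minimal sketch of that idea, assuming SQL Server 2008 or later: the rows are numbered inside the CTE and filtered on RowNumber in the same statement, so nothing has to be copied into a table variable first. @Recipes and its rows are invented stand-ins for the real FavoriteRecipes/Recipes join.

```sql
-- Paging with ROW_NUMBER() directly over the source rows, no intermediate table variable.
-- @Recipes and its data are illustrative only.
DECLARE @PageNumber INT = 2, @PageSize INT = 2;
DECLARE @Recipes TABLE (RecipeId INT PRIMARY KEY, Title NVARCHAR(50));
INSERT INTO @Recipes VALUES (1, N'Apple pie'), (2, N'Borscht'), (3, N'Chili'), (4, N'Dal'), (5, N'Empanadas');

WITH SortResults AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY Title ASC) AS RowNumber,
           RecipeId, Title
    FROM @Recipes
)
SELECT RowNumber, RecipeId, Title
FROM SortResults
WHERE RowNumber BETWEEN (@PageNumber - 1) * @PageSize + 1 AND @PageNumber * @PageSize;
-- Page 2 with a page size of 2 returns Chili and Dal.
```

With the real procedure, the joins and GROUP BY would move inside the CTE and the filter on @FILTERBY would go into its WHERE clause.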

A couple of notes:
Indexing - often, when people create procedures that use a temp table or table variable, they fail to realize that those objects can be indexed too (a table variable, before SQL Server 2014, only through PRIMARY KEY or UNIQUE constraints), and this can have massive performance implications.
UDF - the query processor expands inline table-valued functions into the calling query, but a scalar UDF such as dbo.udfGetRecipePhoto is (before SQL Server 2019) executed once per row. Look closely at your query plan and see how it is being handled. Often, if you manually inline this logic in something like a correlated subquery, you can boost performance a lot.
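A short sketch of both indexing options (the table and index names are made up). A temp table takes an ordinary CREATE INDEX; on SQL Server 2008 a table variable only gets the indexes implied by its PRIMARY KEY and UNIQUE constraints.

```sql
-- Temp table: indexes can be added after creation, as on a permanent table.
CREATE TABLE #Results
(
    Id       INT NOT NULL IDENTITY(1,1) PRIMARY KEY,  -- clustered index by default
    RecipeId INT NOT NULL,
    Title    NVARCHAR(255) NULL
);
CREATE NONCLUSTERED INDEX IX_Results_Title ON #Results (Title);

-- Table variable (SQL Server 2008): no CREATE INDEX, but each PRIMARY KEY
-- and UNIQUE constraint builds an index behind the scenes.
DECLARE @Results TABLE
(
    Id       INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    RecipeId INT NOT NULL,
    Title    NVARCHAR(255) NULL,
    UNIQUE (Title, Id)   -- adding Id keeps the pair unique even when titles repeat
);

-- DROP TABLE #Results;  -- when finished; it is also dropped when the session or procedure ends
```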

As others have said, the only way to know is to look at execution plans. Glancing over the code, this part looks kind of fishy:
AND RecipeReviews.RecipeRatingId= (
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
In general, doing non-trivial stuff in join conditions is a Bad Idea. I would factor that out into a sub-select, and since it's an outer join, you'd probably have to combine that with RecipeReviews somehow.
BUT: All of this is speculation! Explain! Measure!

Well in addition to the possible poor performance of the UDF, this line of code concerns me
LEFT OUTER JOIN RecipeReviews
ON RecipeReviews.RecipeId=Recipes.RecipeId
AND RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeRatingId=
(SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId )
OR RecipeReviews.RecipeRatingId IS NULL
It is generally poor practice to use a subquery as part of a join condition. I would strongly suspect this is not using any indexes you may have. And the OR part doesn't make sense to me at all; the left join should already give you this.
Rewrite it to make a derived table instead.
If you have a lot of records, a temp table usually performs better than a table variable and can (and probably should) be indexed.
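A sketch of the derived-table rewrite, using throwaway table variables named after the question's tables (the data and the @UserId value are invented): the user's highest rating per recipe is computed once in a grouped derived table, and the outer join then only matches on RecipeId.

```sql
DECLARE @UserId INT = 7;
DECLARE @Recipes TABLE (RecipeId INT PRIMARY KEY, Title NVARCHAR(50));
DECLARE @RecipeReviews TABLE (RecipeId INT, ReviewedById INT, RecipeRatingId INT);
INSERT INTO @Recipes VALUES (1, N'Soup'), (2, N'Stew');
INSERT INTO @RecipeReviews VALUES (1, 7, 3), (1, 7, 5), (1, 8, 1);

SELECT r.RecipeId, r.Title, rv.RecipeRatingId
FROM @Recipes AS r
LEFT OUTER JOIN
(
    -- Derived table: this user's highest rating per recipe, computed once.
    SELECT RecipeId, MAX(RecipeRatingId) AS RecipeRatingId
    FROM @RecipeReviews
    WHERE ReviewedById = @UserId
    GROUP BY RecipeId
) AS rv ON rv.RecipeId = r.RecipeId;
-- Soup gets 5; Stew has no review by this user, so its rating is NULL.
```

The review and tweak counts from the original query could be aggregated the same way, each in its own derived table, which also stops the joins from multiplying each other's rows.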

You need to add parentheses around your OR conditions.
LEFT OUTER JOIN RecipeReviews
ON RecipeReviews.RecipeId = Recipes.RecipeId
AND RecipeReviews.ReviewedById = @UserId
AND
-- insert open parenthesis here:
(
RecipeReviews.RecipeRatingId = (... subquery ...)
OR RecipeReviews.RecipeRatingId IS NULL
-- insert close parenthesis here:
)
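To see why the parentheses matter, here is a self-contained sketch that replaces each part of the join condition with a BIT flag (the flag names are invented). It models one review row that belongs to a different recipe and has a NULL RecipeRatingId: without parentheses, the trailing OR lets that row join anyway.

```sql
-- AND binds tighter than OR, so A AND B AND C OR D means (A AND B AND C) OR D.
DECLARE @RecipeMatches BIT = 0, @UserMatches BIT = 1, @RatingIsMax BIT = 0, @RatingIsNull BIT = 1;

SELECT
    -- Parsed as (RecipeMatches AND UserMatches AND RatingIsMax) OR RatingIsNull
    CASE WHEN @RecipeMatches = 1 AND @UserMatches = 1 AND @RatingIsMax = 1 OR @RatingIsNull = 1
         THEN 'joined' ELSE 'not joined' END AS WithoutParentheses,   -- 'joined'
    -- Parsed as intended: the recipe and user must match first
    CASE WHEN @RecipeMatches = 1 AND @UserMatches = 1 AND (@RatingIsMax = 1 OR @RatingIsNull = 1)
         THEN 'joined' ELSE 'not joined' END AS WithParentheses;      -- 'not joined'
```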

The very first, simple thing I would do is move all your DECLARE statements to the top.
DECLARE @ROW_START INT,
@ROW_END INT,
@RecipeCount INT
DECLARE
@RESULT_SET_TABLE
TABLE
(
Id INT NOT NULL IDENTITY(1,1)
-- , ...remaining columns as in the original
)
The next part, which is still rather simple, is stuff like this:
AND Recipes.RecipeStateId <> 3
AND RecipeTweaks.TweakedById= #UserId
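These can be taken out of the join and moved to the WHERE clause. That is safe for the condition on Recipes, which is an INNER JOIN; a condition on an outer-joined table such as RecipeTweaks, however, turns the LEFT OUTER JOIN into an inner join once it moves to WHERE, so that one has to stay in the ON clause. If you can, change the <> to an IN list of the allowed values so that the optimizer has a chance to use an index seek.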
AND RecipeReviews.RecipeRatingId=
(
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
That's just crazy looking and needs to be completely redone.
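Where a condition sits only stops mattering for inner joins. This throwaway demo (invented tables and data) shows what happens to a LEFT OUTER JOIN like the one on RecipeTweaks when its user filter moves from ON to WHERE:

```sql
DECLARE @UserId INT = 7;
DECLARE @Recipes TABLE (RecipeId INT PRIMARY KEY);
DECLARE @RecipeTweaks TABLE (RecipeId INT, TweakedById INT);
INSERT INTO @Recipes VALUES (1), (2);
INSERT INTO @RecipeTweaks VALUES (1, 7);

-- Filter in ON: both recipes come back; recipe 2 simply has a NULL tweak.
SELECT r.RecipeId, t.TweakedById
FROM @Recipes AS r
LEFT OUTER JOIN @RecipeTweaks AS t
    ON t.RecipeId = r.RecipeId AND t.TweakedById = @UserId;

-- Filter in WHERE: recipe 2's NULL row fails the test, so only recipe 1 remains.
SELECT r.RecipeId, t.TweakedById
FROM @Recipes AS r
LEFT OUTER JOIN @RecipeTweaks AS t
    ON t.RecipeId = r.RecipeId
WHERE t.TweakedById = @UserId;
```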

Related

SQL Server Performance

I have a database with more than 30 tables and more than 270k records in one table (the most important one), and I created a view that gets data from this table and other tables.
When I run the code below on my machine, it takes less than 4 seconds to get data from the view.
select * from view
My problem is that when I run the same database script on another machine and run the same query against the view, it takes a very long time.
Code for the view:
SELECT
dbo.UserSite.UserId,
dbo.UserSite.Name,
dbo.Site.RootPageURL,
dbo.PDFDocument.DocumentId,
dbo.RunDocumentVerificationResult.Status,
dbo.UserSite.UserSiteId,
dbo.Systemcode.Value,
dbo.RunDocumentVerificationResult.PageNumber,
dbo.RunDocumentVerificationResult.TestNameID,
dbo.RunDocumentVerificationResult.VerificationResultID,
dbo.TaskRun.VerificationEndDate,
dbo.TaskRun.RunId,
dbo.RunDocument.IsTagged,
dbo.RunDocument.IsProtected,
dbo.RunDocument.IsCorrupted
FROM
dbo.UserSite
INNER JOIN dbo.Site ON dbo.UserSite.SiteId = dbo.Site.SiteId
INNER JOIN dbo.TaskUserSites ON dbo.UserSite.UserSiteId = dbo.TaskUserSites.UserSiteId
INNER JOIN dbo.Task ON dbo.TaskUserSites.TaskId = dbo.Task.TaskId
INNER JOIN dbo.TaskRun ON dbo.Task.TaskId = dbo.TaskRun.TaskId
INNER JOIN dbo.RunDocument ON dbo.TaskRun.RunId = dbo.RunDocument.RunId
INNER JOIN dbo.PDFDocument ON dbo.PDFDocument.DocumentId = dbo.RunDocument.DocumentId
INNER JOIN dbo.RunDocumentVerificationResult ON dbo.RunDocument.RunDocumentId = dbo.RunDocumentVerificationResult.RunDocumentID
INNER JOIN dbo.Systemcode ON dbo.RunDocumentVerificationResult.Status = dbo.Systemcode.ID
Procedure code:
ALTER proc [dbo].[status]
as
begin
begin transaction
declare @usersiteid bigint
declare @runid bigint
declare @TestedFiles int
declare @TaggedFiles int
declare @UnTaggedFiles int
declare @PassedFiles int
declare @FaildFiles int
declare @Name varchar(500)
declare @VerificationEndDate datetime
declare @RootPageURL varchar (1024)
declare @status table ( Name varchar(1000) , Urlrootpage varchar(2000) ,Testedfile int , TaggedFiles int , Untaggedfile int ,passedfiles int , faildfiles int,VerificationEndDate datetime,rootpageurl varchar(1024) )
declare @domain table (name varchar(1000) , urlrootpage varchar (2000) )
if (1=2)
begin
select 'n' Name ,'r' Urlrootpage ,1 Testedfile ,1 TaggedFiles ,0 Untaggedfile ,0 passedfiles ,0 faildfiles,GETDATE() VerificationEndDate ,'r' rootpageurl where 1=2
end
create table #status ( Name varchar(1000) , Urlrootpage varchar(2000) ,Testedfile int , TaggedFiles int , Untaggedfile int ,passedfiles int , faildfiles int,VerificationEndDate datetime,rootpageurl varchar(1024) )
set @usersiteid = (select min (UserSiteId) from vw)
set @runid = (select max (runid) from vw where usersiteid = @usersiteid)
while @usersiteid is not null
begin
set @TestedFiles = (select (count ( distinct documentid )) from vw where UserSiteId=@usersiteid and runid=@runid )
set @TaggedFiles = (select (count ( distinct documentid )) from vw where istagged=1 and UserSiteId=@usersiteid and runid=@runid)
set @UnTaggedFiles =(select (count ( distinct documentid )) from vw where istagged=0 and UserSiteId=@usersiteid and runid=@runid)
set @PassedFiles =(select (count ( distinct documentid )) from vw where Status<>1 and DocumentId not in (select DocumentId from vw where status =1) and UserSiteId=@usersiteid and runid=@runid)
set @FaildFiles = ( select (count ( distinct documentid )) from vw where Status=1 and UserSiteId=@usersiteid and runid=@runid)
set @Name = (select distinct name from vw where UserSiteId=@usersiteid)
set @rootPageUrl = (select distinct RootPageURL from vw where UserSiteId=@usersiteid)
set @VerificationEndDate = (select max(distinct VerificationEndDate) from vw where UserSiteId=@usersiteid and RunId=@runid)
insert into #status ( Name, Urlrootpage , Testedfile , TaggedFiles , Untaggedfile ,passedfiles , faildfiles ,VerificationEndDate ) values
(@Name,@RootPageURL,@TestedFiles,@TaggedFiles ,@UnTaggedFiles,@PassedFiles,@FaildFiles,@VerificationEndDate)
set @usersiteid = (select min (UserSiteId) from vw where UserSiteId > @usersiteid)
set @runid = (select max (runid) from vw where usersiteid = @usersiteid)
end
insert into @domain select UserSite.Name , Site.RootPageURL from UserSite inner join Site on UserSite.SiteId=Site.SiteId where UserSiteId not in (select UserSiteId from vw)
insert into #status select name,urlrootpage,0,0,0,0,0,null,0 from @domain
select Name,Urlrootpage,Testedfile,TaggedFiles,Untaggedfile, passedfiles,faildfiles from #status
end
If (@@ERROR <> 0) -- Check if any error
Begin
rollback transaction
End
else
commit transaction
return
I would do a little test to find out whether it is actually, as suggested, the network bandwidth that makes your query slow, or, better said, makes it look slow. Limit the number of rows the query returns and run it again; SQL Server has no LIMIT clause, so use SELECT TOP 10 ... instead. The query still runs on the server, but only the first 10 rows are sent to the client (TOP can also let the optimizer stop early, so treat this as a rough check), and if the network is your bottleneck, it should now be very fast. If it is still that slow, the SQL Server instance on the slow machine probably has very little memory to use, so it can't fit the whole result in memory, while your local instance is probably configured to use more memory, so it executes faster. In this case, giving that SQL Server more memory should fix the problem. This should be no problem at all since, as already mentioned in the comments, your database is actually very small, so the memory it needs will be small too.
If your network connection turns out to be the bottleneck, you need to decide whether, and why, you need all the results to be sent at once. I can't really help you with that one, since I don't know what the application is supposed to do with the data. But you should probably either do some aggregation in the database or only send a small part of the data over the network.
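A sketch of both checks, assuming the view is the dbo.vw that the procedure queries; the 4096 MB figure is only a placeholder, and changing server settings needs server-level permission (ALTER SETTINGS).

```sql
-- 1) Network check: same view, but only 10 rows travel to the client.
SET STATISTICS TIME ON;   -- CPU and elapsed time appear in the Messages tab
SELECT TOP (10) * FROM dbo.vw;
SET STATISTICS TIME OFF;

-- 2) Memory check: show the current cap, then raise it (value in MB; 4096 is a placeholder).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';          -- shows config_value and run_value
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;
```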

Stored Procedure and output parameter from paging script (SQL Server 2008)

I have the stored procedure below and would like it to use only one SQL statement. At the moment there are two statements: one for the actual paging and one for a count of the total records, which needs to be returned to my app for paging.
However, this is inefficient, as I am already getting the total rows from the first query:
COUNT(*) OVER(PARTITION BY 1) as TotalRows
How can I set TotalRows as my output parameter?
ALTER PROCEDURE [dbo].[Nop_LoadAllOptimized]
(
@PageSize int = null,
@PageNumber int = null,
@WarehouseCombinationID int = null,
@CategoryId int = null,
@OrderBy int = null,
@TotalRecords int = null OUTPUT
)
AS
BEGIN
WITH Paging AS (
SELECT rn = (ROW_NUMBER() OVER (
ORDER BY
CASE WHEN @OrderBy = 0 AND @CategoryID IS NOT NULL AND @CategoryID > 0
THEN pcm.DisplayOrder END ASC,
CASE WHEN @OrderBy = 0
THEN p.[Name] END ASC,
CASE WHEN @OrderBy = 5
THEN p.[Name] END ASC,
CASE WHEN @OrderBy = 10
THEN wpv.Price END ASC,
CASE WHEN @OrderBy = 15
THEN wpv.Price END DESC,
CASE WHEN @OrderBy = 20
THEN wpv.Price END DESC,
CASE WHEN @OrderBy = 25
THEN wpv.UnitPrice END ASC
)),COUNT(*) OVER(PARTITION BY 1) as TotalRows, p.*, pcm.DisplayOrder, wpv.Price, wpv.UnitPrice FROM Nop_Product p
INNER JOIN Nop_Product_Category_Mapping pcm ON p.ProductID=pcm.ProductID
INNER JOIN Nop_ProductVariant pv ON p.ProductID = pv.ProductID
INNER JOIN Nop_ProductVariant_Warehouse_Mapping wpv ON pv.ProductVariantID = wpv.ProductVariantID
WHERE pcm.CategoryID = @CategoryId
AND (wpv.Published = 1 AND pv.Published = 1 AND p.Published = 1 AND p.Deleted = 0 AND pv.Deleted = 0 and wpv.Deleted = 0)
AND wpv.WarehouseID IN (select WarehouseID from Nop_WarehouseCombination where UserWarehouseCombinationID = @WarehouseCombinationID)
)
SELECT TOP (@PageSize) * FROM Paging PG
WHERE PG.rn > (@PageNumber * @PageSize) - @PageSize
SELECT @TotalRecords = COUNT(p.ProductId) FROM Nop_Product p
INNER JOIN Nop_Product_Category_Mapping pcm ON p.ProductID=pcm.ProductID
INNER JOIN Nop_ProductVariant pv ON p.ProductID = pv.ProductID
INNER JOIN Nop_ProductVariant_Warehouse_Mapping wpv ON pv.ProductVariantID = wpv.ProductVariantID
WHERE pcm.CategoryID = @CategoryId
AND (wpv.Published = 1 AND pv.Published = 1 AND p.Published = 1 AND p.Deleted = 0 AND pv.Deleted = 0 and wpv.Deleted = 0)
AND wpv.WarehouseID IN (select WarehouseID from Nop_WarehouseCombination where UserWarehouseCombinationID = @WarehouseCombinationID)
END
I think I understand your issue here. Have you considered doing the count BEFORE the CTE and then passing it into the CTE as a variable?
That is, set the value of @TotalRecords up front and reference it in the CTE, so the CTE uses this count rather than executing the count a second time.
Does this make sense, or have I missed your point here?
No problem, friend; it's highly possible I missed a trick here. However, without the schema and data it's tricky to test what I am suggesting. In the absence of someone giving a better answer, I've put together this test script with data to demo what I am talking about. If this isn't what you want, then no problem. If it is just plain missing the point again, then I'll take that on the chin.
Declare @pagesize as int
Declare @PageNumber as int
Declare @TotalRowsOutputParm as int
SET @pagesize = 3
SET @PageNumber = 2;
--create some test data
DECLARE @SomeData table
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[SomeValue] [nchar](10) NULL
)
INSERT INTO @SomeData VALUES ('TEST1')
INSERT INTO @SomeData VALUES ('TEST2')
INSERT INTO @SomeData VALUES ('TEST3')
INSERT INTO @SomeData VALUES ('TEST4')
INSERT INTO @SomeData VALUES ('TEST5')
INSERT INTO @SomeData VALUES ('TEST6')
INSERT INTO @SomeData VALUES ('TEST7')
INSERT INTO @SomeData VALUES ('TEST8')
INSERT INTO @SomeData VALUES ('TEST9')
INSERT INTO @SomeData VALUES ('TEST10');
--Get total count of all rows
Set @TotalRowsOutputParm = (SELECT COUNT(SomeValue) FROM @SomeData p) ;
WITH Paging AS
(
SELECT rn = (ROW_NUMBER() OVER (ORDER BY SomeValue ASC)),
@TotalRowsOutputParm as TotalRows, p.*
FROM @SomeData p
)
SELECT TOP (@PageSize) * FROM Paging PG
WHERE PG.rn > (@PageNumber * @PageSize) - @PageSize
PRINT @TotalRowsOutputParm
I don't think you can do it without running the query twice if you want to assign the total to a variable.
However, can't you just add another column and do something like this instead?
;WITH Paging AS (select *,ROW_NUMBER() OVER(ORDER BY name) AS rn FROM sysobjects)
SELECT (SELECT MAX(rn) FROM Paging) AS TotalRecords,* FROM Paging
WHERE rn < 10
Or in your case
SELECT TOP (@PageSize) *,(SELECT MAX(rn) FROM Paging) AS TotalRecords
FROM Paging PG
WHERE PG.rn > (@PageNumber * @PageSize) - @PageSize
Then grab that column in the front end.
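If the total has to come back in the @TotalRecords OUTPUT parameter rather than as a column, one option is to land the page, with its windowed count, in a temp table and read the count from there, so the base query runs only once. This is a sketch on invented data modeled on the test script above; the asker found COUNT(*) OVER() expensive on the real data, so compare it against two separate statements before adopting it.

```sql
DECLARE @PageSize INT = 3, @PageNumber INT = 2;
DECLARE @TotalRecords INT;   -- in the procedure this would be the OUTPUT parameter
DECLARE @SomeData TABLE (ID INT IDENTITY(1,1) NOT NULL, SomeValue NCHAR(10) NULL);
INSERT INTO @SomeData (SomeValue)
VALUES ('TEST1'), ('TEST2'), ('TEST3'), ('TEST4'), ('TEST5'), ('TEST6'), ('TEST7');

-- One pass: number the rows, attach the total, keep only the requested page.
WITH Paging AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS rn,
           COUNT(*) OVER () AS TotalRows,
           p.ID, p.SomeValue
    FROM @SomeData AS p
)
SELECT rn, TotalRows, ID, SomeValue
INTO #Page
FROM Paging
WHERE rn > (@PageNumber - 1) * @PageSize AND rn <= @PageNumber * @PageSize;

SELECT @TotalRecords = MAX(TotalRows) FROM #Page;   -- NULL if the requested page is empty
SELECT rn, ID, SomeValue FROM #Page ORDER BY rn;    -- the page itself (IDs 4 to 6)
DROP TABLE #Page;
```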
In the end I decided just to use two different SQL statements, one for count, one for select.
The "COUNT(*) OVER(PARTITION BY 1) as TotalRows" actually was pretty expensive and it turned out much quicker to just use two different statements.
Thank you everyone who helped with this question.

SQL Query Optimization

This report used to take about 16 seconds when there were 8000 rows to process. Now there are 50000 rows and the report takes 2:30 minutes.
This was my first pass at this and the client needed it yesterday, so I wrote this code in the logical order of what needed to be done, but without optimization in mind.
Now with the report taking longer as the data increases, I need to take a second look at this and optimize it. I'm thinking indexed views, table functions, etc.
I think the biggest bottleneck is looping through the temp table, making 4 select statements, and updating the temp table...50,000 times.
I think I can condense ALL of this into one large SELECT with either (a) 4 joins to the same table to get the 4 statuses, but then I am not sure how to get the TOP 1 in there, or I can try (b) using nested subqueries, but both seem really messy compared to the current code.
I'm not expecting anyone to write code for me, but if some SQL experts can peruse this code and tell me about any obvious inefficiencies and alternate methods, or ways to speed this up, or techniques I should be using instead, it would be appreciated.
PS: Assume that this DB is for the most part normalized, but poorly designed, and that I am not able to add indexes. I basically have to work with it, as is.
Where the code says (less than) I had to replace a "less than" symbol because it was cropping some of my code.
Thanks!
CREATE PROCEDURE RptCollectionAccountStatusReport AS
SET NOCOUNT ON;
DECLARE @Accounts TABLE
(
[AccountKey] INT IDENTITY(1,1) NOT NULL,
[ManagementCompany] NVARCHAR(50),
[Association] NVARCHAR(100),
[AccountNo] INT UNIQUE,
[StreetAddress] NVARCHAR(65),
[State] NVARCHAR(50),
[PrimaryStatus] NVARCHAR(100),
[PrimaryStatusDate] SMALLDATETIME,
[PrimaryDaysRemaining] INT,
[SecondaryStatus] NVARCHAR(100),
[SecondaryStatusDate] SMALLDATETIME,
[SecondaryDaysRemaining] INT,
[TertiaryStatus] NVARCHAR(100),
[TertiaryStatusDate] SMALLDATETIME,
[TertiaryDaysRemaining] INT,
[ExternalStatus] NVARCHAR(100),
[ExternalStatusDate] SMALLDATETIME,
[ExternalDaysRemaining] INT
);
INSERT INTO
@Accounts (
[ManagementCompany],
[Association],
[AccountNo],
[StreetAddress],
[State])
SELECT
mc.Name AS [ManagementCompany],
a.LegalName AS [Association],
c.CollectionKey AS [AccountNo],
u.StreetNumber + ' ' + u.StreetName AS [StreetAddress],
CASE WHEN c.InheritedAccount = 1 THEN 'ZZ' ELSE u.State END AS [State]
FROM
ManagementCompany mc WITH (NOLOCK)
JOIN
Association a WITH (NOLOCK) ON a.ManagementCompanyKey = mc.ManagementCompanyKey
JOIN
Unit u WITH (NOLOCK) ON u.AssociationKey = a.AssociationKey
JOIN
Collection c WITH (NOLOCK) ON c.UnitKey = u.UnitKey
WHERE
c.Closed IS NULL;
DECLARE @MaxAccountKey INT;
SELECT @MaxAccountKey = MAX([AccountKey]) FROM @Accounts;
DECLARE @index INT;
SET @index = 1;
WHILE @index <= @MaxAccountKey BEGIN -- <= so the last account is processed too
DECLARE @CollectionKey INT;
SELECT @CollectionKey = [AccountNo] FROM @Accounts WHERE [AccountKey] = @index;
DECLARE @PrimaryStatus NVARCHAR(100) = NULL;
DECLARE @PrimaryStatusDate SMALLDATETIME = NULL;
DECLARE @PrimaryDaysRemaining INT = NULL;
DECLARE @SecondaryStatus NVARCHAR(100) = NULL;
DECLARE @SecondaryStatusDate SMALLDATETIME = NULL;
DECLARE @SecondaryDaysRemaining INT = NULL;
DECLARE @TertiaryStatus NVARCHAR(100) = NULL;
DECLARE @TertiaryStatusDate SMALLDATETIME = NULL;
DECLARE @TertiaryDaysRemaining INT = NULL;
DECLARE @ExternalStatus NVARCHAR(100) = NULL;
DECLARE @ExternalStatusDate SMALLDATETIME = NULL;
DECLARE @ExternalDaysRemaining INT = NULL;
SELECT TOP 1
@PrimaryStatus = a.StatusName, @PrimaryStatusDate = c.StatusDate, @PrimaryDaysRemaining = c.DaysRemaining
FROM CollectionAccountStatus c WITH (NOLOCK) JOIN AccountStatus a WITH (NOLOCK) ON c.AccountStatusKey = a.AccountStatusKey
WHERE c.CollectionKey = @CollectionKey AND a.StatusType = 'Primary Status' AND a.StatusName <> 'Cleared'
ORDER BY c.sysCreated DESC;
SELECT TOP 1
@SecondaryStatus = a.StatusName, @SecondaryStatusDate = c.StatusDate, @SecondaryDaysRemaining = c.DaysRemaining
FROM CollectionAccountStatus c WITH (NOLOCK) JOIN AccountStatus a WITH (NOLOCK) ON c.AccountStatusKey = a.AccountStatusKey
WHERE c.CollectionKey = @CollectionKey AND a.StatusType = 'Secondary Status' AND a.StatusName <> 'Cleared'
ORDER BY c.sysCreated DESC;
SELECT TOP 1
@TertiaryStatus = a.StatusName, @TertiaryStatusDate = c.StatusDate, @TertiaryDaysRemaining = c.DaysRemaining
FROM CollectionAccountStatus c WITH (NOLOCK) JOIN AccountStatus a WITH (NOLOCK) ON c.AccountStatusKey = a.AccountStatusKey
WHERE c.CollectionKey = @CollectionKey AND a.StatusType = 'Tertiary Status' AND a.StatusName <> 'Cleared'
ORDER BY c.sysCreated DESC;
SELECT TOP 1
@ExternalStatus = a.StatusName, @ExternalStatusDate = c.StatusDate, @ExternalDaysRemaining = c.DaysRemaining
FROM CollectionAccountStatus c WITH (NOLOCK) JOIN AccountStatus a WITH (NOLOCK) ON c.AccountStatusKey = a.AccountStatusKey
WHERE c.CollectionKey = @CollectionKey AND a.StatusType = 'External Status' AND a.StatusName <> 'Cleared'
ORDER BY c.sysCreated DESC;
UPDATE
@Accounts
SET
[PrimaryStatus] = @PrimaryStatus,
[PrimaryStatusDate] = @PrimaryStatusDate,
[PrimaryDaysRemaining] = @PrimaryDaysRemaining,
[SecondaryStatus] = @SecondaryStatus,
[SecondaryStatusDate] = @SecondaryStatusDate,
[SecondaryDaysRemaining] = @SecondaryDaysRemaining,
[TertiaryStatus] = @TertiaryStatus,
[TertiaryStatusDate] = @TertiaryStatusDate,
[TertiaryDaysRemaining] = @TertiaryDaysRemaining,
[ExternalStatus] = @ExternalStatus,
[ExternalStatusDate] = @ExternalStatusDate,
[ExternalDaysRemaining] = @ExternalDaysRemaining
WHERE
[AccountNo] = @CollectionKey;
SET @index = @index + 1;
END;
SELECT
[ManagementCompany],
[Association],
[AccountNo],
[StreetAddress],
[State],
[PrimaryStatus],
CONVERT(VARCHAR, [PrimaryStatusDate], 101) AS [PrimaryStatusDate],
[PrimaryDaysRemaining],
[SecondaryStatus],
CONVERT(VARCHAR, [SecondaryStatusDate], 101) AS [SecondaryStatusDate],
[SecondaryDaysRemaining],
[TertiaryStatus],
CONVERT(VARCHAR, [TertiaryStatusDate], 101) AS [TertiaryStatusDate],
[TertiaryDaysRemaining],
[ExternalStatus],
CONVERT(VARCHAR, [ExternalStatusDate], 101) AS [ExternalStatusDate],
[ExternalDaysRemaining]
FROM
@Accounts
ORDER BY
[ManagementCompany],
[Association],
[StreetAddress]
ASC;
Don't try to guess where the query is going wrong - look at the execution plan. It will tell you what's chewing up your resources.
You can update directly from another table, even from a table variable: SQL update from one Table to another based on a ID match
That would allow you to combine everything in your loop into a single (massive) statement. You can join to the same tables for the secondary and tertiary statuses using different aliases, e.g.,
JOIN AccountStatus AS TertiaryAccountStatus ... AND TertiaryAccountStatus.StatusType = 'Tertiary Status'
JOIN AccountStatus AS SecondaryAccountStatus ... AND SecondaryAccountStatus.StatusType = 'Secondary Status'
I'll bet you don't have an index on the AccountStatus.StatusType field. You might try using the PK of that table instead.
HTH.
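Another way to get the TOP 1 per status type into a single statement is OUTER APPLY (SQL Server 2005+), which is a different technique from the aliased joins above. A sketch against the question's tables, untested against the real schema, that would replace the whole WHILE loop once @Accounts has been filled (the NOLOCK hints are left out on purpose):

```sql
-- One UPDATE instead of the loop. Each OUTER APPLY returns the newest non-'Cleared'
-- status of one type for the current account, or NULLs when there is none.
UPDATE acc
SET [PrimaryStatus]   = p.StatusName, [PrimaryStatusDate]   = p.StatusDate, [PrimaryDaysRemaining]   = p.DaysRemaining,
    [SecondaryStatus] = s.StatusName, [SecondaryStatusDate] = s.StatusDate, [SecondaryDaysRemaining] = s.DaysRemaining,
    [TertiaryStatus]  = t.StatusName, [TertiaryStatusDate]  = t.StatusDate, [TertiaryDaysRemaining]  = t.DaysRemaining,
    [ExternalStatus]  = e.StatusName, [ExternalStatusDate]  = e.StatusDate, [ExternalDaysRemaining]  = e.DaysRemaining
FROM @Accounts AS acc
OUTER APPLY (SELECT TOP 1 a.StatusName, c.StatusDate, c.DaysRemaining
             FROM CollectionAccountStatus AS c
             JOIN AccountStatus AS a ON c.AccountStatusKey = a.AccountStatusKey
             WHERE c.CollectionKey = acc.[AccountNo]
               AND a.StatusType = 'Primary Status' AND a.StatusName <> 'Cleared'
             ORDER BY c.sysCreated DESC) AS p
OUTER APPLY (SELECT TOP 1 a.StatusName, c.StatusDate, c.DaysRemaining
             FROM CollectionAccountStatus AS c
             JOIN AccountStatus AS a ON c.AccountStatusKey = a.AccountStatusKey
             WHERE c.CollectionKey = acc.[AccountNo]
               AND a.StatusType = 'Secondary Status' AND a.StatusName <> 'Cleared'
             ORDER BY c.sysCreated DESC) AS s
OUTER APPLY (SELECT TOP 1 a.StatusName, c.StatusDate, c.DaysRemaining
             FROM CollectionAccountStatus AS c
             JOIN AccountStatus AS a ON c.AccountStatusKey = a.AccountStatusKey
             WHERE c.CollectionKey = acc.[AccountNo]
               AND a.StatusType = 'Tertiary Status' AND a.StatusName <> 'Cleared'
             ORDER BY c.sysCreated DESC) AS t
OUTER APPLY (SELECT TOP 1 a.StatusName, c.StatusDate, c.DaysRemaining
             FROM CollectionAccountStatus AS c
             JOIN AccountStatus AS a ON c.AccountStatusKey = a.AccountStatusKey
             WHERE c.CollectionKey = acc.[AccountNo]
               AND a.StatusType = 'External Status' AND a.StatusName <> 'Cleared'
             ORDER BY c.sysCreated DESC) AS e;
```

The same four APPLY blocks could also be attached directly to the initial INSERT ... SELECT, which would remove the UPDATE altogether.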
First, use a temp table instead of a table variable. Temp tables can be indexed.
Next, do not loop! Looping is bad for performance in virtually every case. This loop ran 50,000 times rather than once for 50,000 records, and it will be horrible when you have a million records! Here is a link that will help you understand how to do set-based processing instead. It is written about avoiding cursors, but loops are similar to cursors, so it should help.
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
And (nolock) will give dirty reads, which can be very bad for reporting. If you are on a version of SQL Server later than 2000, there are better choices.
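Two of those better choices are the row-versioning options added in SQL Server 2005, sketched here with YourDatabase as a placeholder name. Either one lets the report read committed data without blocking on writers; switching READ_COMMITTED_SNAPSHOT on requires that no other connections are active in the database at that moment.

```sql
-- Option 1: make ordinary READ COMMITTED reads use row versions, database-wide.
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- Option 2: allow snapshot isolation, then opt in only where it is wanted.
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
-- e.g. at the top of RptCollectionAccountStatusReport, instead of WITH (NOLOCK) hints:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```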
SELECT @CollectionKey = [AccountNo] FROM @Accounts WHERE [AccountKey] = @index;
This query would benefit from a PRIMARY KEY declaration on your table variable.
When you say IDENTITY, you are asking the database to auto-populate the column.
When you say PRIMARY KEY, you are asking the database to enforce uniqueness and, by default, to organize the data into a clustered index.
These two concepts are very different. Typically, you should use both of them.
DECLARE @Accounts TABLE
(
[AccountKey] INT IDENTITY(1,1) PRIMARY KEY,
-- ...remaining columns as before
);
I am not able to add indexes.
In that case, copy the data to a database where you may add indexes. And use: SET STATISTICS IO ON
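A sketch of that workflow, where ScratchDb is a placeholder for a database you are allowed to change; the index columns are a guess chosen to match the loop's WHERE c.CollectionKey = ... ORDER BY c.sysCreated DESC lookups.

```sql
-- Copy the heaviest table into the scratch database.
SELECT * INTO ScratchDb.dbo.CollectionAccountStatus FROM dbo.CollectionAccountStatus;

-- Index for "latest status per collection" lookups.
CREATE INDEX IX_CAS_CollectionKey_sysCreated
    ON ScratchDb.dbo.CollectionAccountStatus (CollectionKey, sysCreated DESC)
    INCLUDE (AccountStatusKey, StatusDate, DaysRemaining);

SET STATISTICS IO ON;    -- logical and physical reads per table, in the Messages tab
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement
-- ...run the report query against ScratchDb here and compare with the original...
```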

how do I remove nested select statement

I have a Name table with the columns
NameID
Name
TypeID
With the following SQL
SELECT[NameID]
FROM[Name]
WHERE[TypeID] = @TypeID
AND NameID >= (SELECT MIN([NameID])
FROM [Name]
WHERE [Name]='Billy' AND [TypeID]=@TypeID)
I've been asked to convert this to an INNER JOIN without using any nested SELECT, but I'm not sure how to.
Thanks for your help!
Originally I didn't think you needed a join at all,
;WITH n AS
(
SELECT
NameID,
rn = ROW_NUMBER() OVER (ORDER BY NameID)
FROM [Name]
WHERE TypeID = @TypeID
AND [Name] = 'Billy'
)
SELECT NameID
FROM n
WHERE rn > 1;
Then again, maybe I do not have the requirements clear. What is the purpose of this query?
SELECT n1.NameID
FROM [Name] AS n1
INNER JOIN
(
SELECT NameID = MIN(NameID)
FROM [Name]
WHERE TypeID = @TypeID
AND [Name] = 'Billy'
) AS n2
ON n1.NameID >= n2.NameID
WHERE n1.TypeID = @TypeID;
I agree with Lukas, I am not sure why the person who is telling you to change this thinks an inner join will be better than your original.
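A self-contained sketch with made-up rows for checking that the INNER JOIN rewrite returns the same NameIDs as the original nested SELECT; a third query shows how a plain [Name] = 'Billy' filter differs on the same data.

```sql
DECLARE @TypeID INT = 1;
DECLARE @Name TABLE (NameID INT PRIMARY KEY, [Name] NVARCHAR(50), TypeID INT);
INSERT INTO @Name VALUES (1, N'Anna', 1), (2, N'Billy', 1), (3, N'Carl', 1), (4, N'Billy', 2), (5, N'Billy', 1);

-- Original nested SELECT: returns 2, 3, 5
SELECT NameID FROM @Name
WHERE TypeID = @TypeID
  AND NameID >= (SELECT MIN(NameID) FROM @Name WHERE [Name] = N'Billy' AND TypeID = @TypeID)
ORDER BY NameID;

-- INNER JOIN to a derived table: also returns 2, 3, 5
SELECT n1.NameID
FROM @Name AS n1
INNER JOIN (SELECT MIN(NameID) AS NameID FROM @Name WHERE TypeID = @TypeID AND [Name] = N'Billy') AS n2
    ON n1.NameID >= n2.NameID
WHERE n1.TypeID = @TypeID
ORDER BY n1.NameID;

-- Filtering on the name alone: returns only 2, 5 (Carl is lost)
SELECT NameID FROM @Name WHERE TypeID = @TypeID AND [Name] = N'Billy' ORDER BY NameID;
```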
You could remove the nested part via:
declare @NameID int
select @NameID = (SELECT MIN([NameID])
FROM [Name] WHERE [Name]='Billy' AND
[TypeID]=@TypeID)
SELECT [NameID] FROM [Name] WHERE [TypeID]
= @TypeID AND NameID >= @NameID
But as stated already, this does not provide any performance benefit as the subquery would only be evaluated once in your version, the same as in this.
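Well, it looks like just moving the condition [Name]='Billy' into the outer query should produce the same result for this specific query, but only if every row of that TypeID from Billy's lowest NameID onward is also named 'Billy'; in general the original also returns rows with other names, so check that assumption first. So convert your original: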
SELECT[NameID]
FROM[Name]
WHERE[TypeID] = @TypeID
AND NameID >= (SELECT MIN([NameID])
FROM [Name]
WHERE [Name]='Billy' AND [TypeID]=@TypeID)
to:
SELECT[NameID]
FROM[Name]
WHERE[TypeID] = @TypeID
AND[Name]='Billy'

How can I efficiently do a database massive update?

I have a table with some duplicate entries. I have to discard all but one, and then update this latest one. I've tried with a temporary table and a while statement, in this way:
CREATE TABLE #tmp_ImportedData_GenericData
(
Id int identity(1,1),
tmpCode varchar(255) NULL,
tmpAlpha3Code varchar(50) NULL,
tmpRelatedYear int NOT NULL,
tmpPreviousValue varchar(255) NULL,
tmpGrowthRate varchar(255) NULL
)
INSERT INTO #tmp_ImportedData_GenericData
SELECT
MCS_ImportedData_GenericData.Code,
MCS_ImportedData_GenericData.Alpha3Code,
MCS_ImportedData_GenericData.RelatedYear,
MCS_ImportedData_GenericData.PreviousValue,
MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
FROM MCS_ImportedData_GenericData AS M
GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')
-- SELECT * from #tmp_ImportedData_GenericData
-- DROP TABLE #tmp_ImportedData_GenericData
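DECLARE @counter int
DECLARE @rowsCount int
-- Loop variables (types match the temp table's columns)
DECLARE @Code varchar(255), @Alpha3Code varchar(50), @RelatedYear int, @OldValue varchar(255), @GrowthRate varchar(255)
SET @counter = 1
SELECT @rowsCount = count(*) from #tmp_ImportedData_GenericData
-- PRINT @rowsCount
WHILE @counter <= @rowsCount -- <= so the last row is processed too
BEGIN
SELECT
@Code = tmpCode,
@Alpha3Code = tmpAlpha3Code,
@RelatedYear = tmpRelatedYear,
@OldValue = tmpPreviousValue,
@GrowthRate = tmpGrowthRate
FROM
#tmp_ImportedData_GenericData
WHERE
Id = @counter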
DELETE FROM MCS_ImportedData_GenericData
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND (PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL)
UPDATE
MCS_ImportedData_GenericData
SET
PreviousValue = @OldValue, GrowthRate = @GrowthRate
WHERE
Code = @Code
AND Alpha3Code = @Alpha3Code
AND RelatedYear = @RelatedYear
AND MCS_ImportedData_GenericData.PreviousValue ='INDEFINITO'
SET @counter = @counter + 1
END
But it takes too long, even though there are only 20,000-30,000 rows to process.
Does anyone have suggestions for improving performance?
Thanks in advance!
WITH q AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END) AS rn
FROM MCS_ImportedData_GenericData m
WHERE PreviousValue <> 'INDEFINITO'
)
DELETE
FROM q
WHERE rn > 1
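The update half of the question can be done set-based as well. A sketch on invented data in table variables shaped like MCS_ImportedData_GenericData, assuming each duplicate group has one 'INDEFINITO' row that should survive and take over the other row's values: one DELETE with an OUTPUT clause removes the other rows while remembering their values, and one UPDATE copies them onto the survivor.

```sql
DECLARE @Data TABLE (Code VARCHAR(255), Alpha3Code VARCHAR(50), RelatedYear INT,
                     PreviousValue VARCHAR(255), GrowthRate VARCHAR(255));
DECLARE @Moved TABLE (Code VARCHAR(255), Alpha3Code VARCHAR(50), RelatedYear INT,
                      PreviousValue VARCHAR(255), GrowthRate VARCHAR(255));
INSERT INTO @Data VALUES ('A', 'ITA', 2010, 'INDEFINITO', NULL),
                         ('A', 'ITA', 2010, '12', '0.5'),
                         ('B', 'FRA', 2010, '7', '0.1');   -- not a duplicate: left alone

-- 1) Delete every non-'INDEFINITO' row whose group also has an 'INDEFINITO' row,
--    capturing the deleted values with OUTPUT.
DELETE src
OUTPUT deleted.Code, deleted.Alpha3Code, deleted.RelatedYear, deleted.PreviousValue, deleted.GrowthRate
INTO @Moved
FROM @Data AS src
WHERE (src.PreviousValue <> 'INDEFINITO' OR src.PreviousValue IS NULL)
  AND EXISTS (SELECT 1 FROM @Data AS k
              WHERE k.Code = src.Code AND k.Alpha3Code = src.Alpha3Code
                AND k.RelatedYear = src.RelatedYear AND k.PreviousValue = 'INDEFINITO');

-- 2) Copy the captured values onto the surviving 'INDEFINITO' row.
--    If a group had several deleted rows, which one wins is not defined.
UPDATE k
SET PreviousValue = m.PreviousValue, GrowthRate = m.GrowthRate
FROM @Data AS k
JOIN @Moved AS m
  ON m.Code = k.Code AND m.Alpha3Code = k.Alpha3Code AND m.RelatedYear = k.RelatedYear
WHERE k.PreviousValue = 'INDEFINITO'
  AND m.PreviousValue IS NOT NULL;
```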
Quassnoi's answer uses SQL Server 2005+ syntax, so I thought I'd put in my tuppence worth using something more generic...
First, to delete all the duplicates, but not the "original", you need a way of differentiating the duplicate records from each other. (The ROW_NUMBER() part of Quassnoi's answer)
It would appear that in your case the source data has no identity column (you create one in the temp table). If that is the case, there are two choices that come to my mind:
1. Add the identity column to the data, then remove the duplicates
2. Create a "de-duped" set of data, delete everything from the original, and insert the de-duped data back into the original
Option 1 could be something like...
(With the newly created ID field)
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
WHERE
id > (
SELECT
MIN(id)
FROM
MCS_ImportedData_GenericData
WHERE
CODE = [data].CODE
AND ALPHA3CODE = [data].ALPHA3CODE
AND RELATEDYEAR = [data].RELATEDYEAR
)
OR...
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
SELECT
MIN(id) AS [id],
CODE,
ALPHA3CODE,
RELATEDYEAR
FROM
MCS_ImportedData_GenericData
GROUP BY
CODE,
ALPHA3CODE,
RELATEDYEAR
)
AS [original]
ON [original].CODE = [data].CODE
AND [original].ALPHA3CODE = [data].ALPHA3CODE
AND [original].RELATEDYEAR = [data].RELATEDYEAR
AND [original].id <> [data].id
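Both DELETE statements rely on an id column that the table does not have yet. A sketch of adding it (SQL Server fills an IDENTITY column for the existing rows when it is added) and removing it again afterwards:

```sql
-- Adds and populates a surrogate key on the existing rows.
ALTER TABLE MCS_ImportedData_GenericData ADD id INT IDENTITY(1,1) NOT NULL;

-- ...run one of the DELETE statements above...

-- Optional: drop the helper column once the duplicates are gone.
ALTER TABLE MCS_ImportedData_GenericData DROP COLUMN id;
```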
I don't understand the syntax used well enough to post an exact answer, but here's an approach.
Identify the rows you want to preserve (e.g. select value, ... from .. where ...)
Do the update logic while identifying them (e.g. select value + 1 ... from ... where ...)
Do insert select to a new table.
Drop the original, rename new to original, recreate all grants/synonyms/triggers/indexes/FKs/... (or truncate the original and insert select from the new)
Obviously this has a pretty big overhead, but if you want to update/clear millions of rows, it is usually the fastest way.
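A sketch of those steps for the question's table, assuming SQL Server 2005+ and that the five columns from the question are all the table has. Which duplicate survives is decided by the ORDER BY inside ROW_NUMBER(), and any update logic would go into the outer SELECT list.

```sql
-- 1) Build the rows to keep into a new table: one row per key, preferring 'INDEFINITO'.
SELECT Code, Alpha3Code, RelatedYear, PreviousValue, GrowthRate
INTO dbo.MCS_ImportedData_GenericData_new
FROM (
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY Code, Alpha3Code, RelatedYear
                              ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 0 ELSE 1 END) AS rn
    FROM dbo.MCS_ImportedData_GenericData AS d
) AS x
WHERE rn = 1;

-- 2) Swap the names. Grants, triggers, indexes and FKs must then be recreated on the new table.
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.MCS_ImportedData_GenericData', 'MCS_ImportedData_GenericData_old';
EXEC sp_rename 'dbo.MCS_ImportedData_GenericData_new', 'MCS_ImportedData_GenericData';
COMMIT TRANSACTION;
```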