SQL IN operator in an UPDATE query takes a lot of time

Below is an update query that updates a table with about 40,000 records:
UPDATE tableName
SET colA = val, colB = val
WHERE ID IN (select RecordIDs from tableB where needUpdate = 'Y')
When the above query is executed, I found that the query below takes about 15 seconds:
SELECT RecordIDs
FROM tableB
WHERE needUpdate = 'Y'
But when I take away the WHERE clause (i.e. update tableName set colA = val, colB = val), the query runs smoothly.
Why does this happen? Are there any ways to shorten the execution time?
Edited:
Below is the structure of both tables:
tableName:
ID int,
VehicleBrandID int,
VehicleLicenseExpiryDate nvarchar(25),
LicensePlateNo nvarchar(MAX),
ContactPerson nvarchar(MAX),
ContactPersonID nvarchar(MAX),
ContactPersonPhoneNumber nvarchar(MAX),
ContactPersonAddress nvarchar(MAX),
CreatedDate nvarchar(MAX),
CreatedBy nvarchar(MAX)
PRIMARY KEY (ID)
tableB:
RowNumber int
RecordIDs int
NeedUpdate char(1)
PRIMARY KEY (RowNumber)
Edited
Below screenshot is the execution plan for the update query

The execution plan shows you are using table variables and are missing a useful index.
Keep the existing PK on @output
DECLARE @output TABLE (
ID INT PRIMARY KEY,
VehicleBrandID INT,
VehicleLicenseExpiryDate NVARCHAR(25),
LicensePlateNo NVARCHAR(MAX),
ContactPerson NVARCHAR(MAX),
ContactPersonID NVARCHAR(MAX),
ContactPersonPhoneNumber NVARCHAR(MAX),
ContactPersonAddress NVARCHAR(MAX),
CreatedDate NVARCHAR(MAX), /*<-- Don't store dates as strings*/
CreatedBy NVARCHAR(MAX))
And add a new index to @tenancyEditable
DECLARE @tenancyEditable TABLE (
RowNumber INT PRIMARY KEY,
RecordIDs INT,
NeedUpdate CHAR(1),
UNIQUE (NeedUpdate, RecordIDs, RowNumber))
With these indexes in place, the following query
UPDATE @output
SET LicensePlateNo = ''
WHERE ID IN (SELECT RecordIDs
FROM @tenancyEditable
WHERE NeedUpdate = 'Y')
OPTION (RECOMPILE)
can generate a more efficient-looking plan.
Also you should use appropriate datatypes rather than storing everything as NVARCHAR(MAX). A person name isn't going to need more than nvarchar(100) at most and CreatedDate should be stored as date[time2] for example.
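As a rough sketch of what that could look like, the table variable might be declared along these lines. The column lengths and the DATE/DATETIME2 choices are assumptions for illustration only, not taken from the question's data, and the existing string dates would need to be checked before converting them.

```sql
-- Sketch only: lengths and date types are assumed, adjust to the real data.
DECLARE @output TABLE (
    ID                       INT PRIMARY KEY,
    VehicleBrandID           INT,
    VehicleLicenseExpiryDate DATE,            -- was NVARCHAR(25)
    LicensePlateNo           NVARCHAR(20),    -- was NVARCHAR(MAX)
    ContactPerson            NVARCHAR(100),   -- was NVARCHAR(MAX)
    ContactPersonID          NVARCHAR(50),    -- was NVARCHAR(MAX)
    ContactPersonPhoneNumber NVARCHAR(30),    -- was NVARCHAR(MAX)
    ContactPersonAddress     NVARCHAR(400),   -- was NVARCHAR(MAX)
    CreatedDate              DATETIME2(0),    -- was NVARCHAR(MAX)
    CreatedBy                NVARCHAR(100)    -- was NVARCHAR(MAX)
);
```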

I suppose you are in one of the two cases below:
1/ The statistics are out of date because of some recent modification to your table. In this case you should execute this:
UPDATE STATISTICS tableB
2/ A wrong query plan is being used, in which case I recommend executing this in order to force recompilation of the query:
SELECT RecordIDs
FROM tableB
WHERE needUpdate = 'Y'
OPTION (RECOMPILE)
Tell us the result and we'll come back with more details.
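To see whether case 1 is plausible before changing anything, one option is to check when the statistics on tableB were last refreshed. This is a small sketch that assumes the table lives in the dbo schema; sys.stats and STATS_DATE are standard SQL Server catalog objects.

```sql
-- Lists each statistics object on tableB with the time it was last updated.
-- Assumption: the table is dbo.tableB; change the schema name if needed.
SELECT s.name                              AS StatisticsName,
       STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.tableB');
```

A LastUpdated value that is NULL or much older than the last large data change would be a hint that UPDATE STATISTICS is worth trying.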

This is an alternative. It is worth trying in your environment, as it has turned out to be faster for others.
MERGE INTO tableName tn
USING (
SELECT DISTINCT RecordIDs -- MERGE raises an error if one target row matches several source rows
FROM tableB
WHERE needUpdate = 'Y'
) tb
ON tn.ID = tb.RecordIDs
WHEN MATCHED THEN
UPDATE
SET colA = val,
colB = val;
EDIT:
I am not claiming this to be faster in every case or in every setup/environment - just that it is worth a try as it has worked for me and others I have worked with or read about.

You can use an inner join instead of the IN clause.
update t
set
t.colA = val, t.colB = val
from tableName t
inner join tableB x on
t.ID = x.RecordIDs
where x.needUpdate = 'Y'
Although the UPDATE...FROM syntax is essential in some circumstances, I prefer to use subqueries (by using the IN clause) whenever possible.

Related

Update with a case when (one row joined with multiple rows) - Execution plan question

The following test case
IF OBJECT_ID('tempdb..#a') IS NOT NULL
DROP TABLE #a
IF OBJECT_ID('tempdb..#b') IS NOT NULL
DROP TABLE #b
CREATE TABLE #a (
[Value] NVARCHAR(MAX),
[PickedPriority] NVARCHAR(MAX)
)
INSERT INTO #a([Value])
VALUES('Test')
CREATE TABLE #b (
[RowId] INT IDENTITY(1,1),
[ColumnMiddlePriority] NVARCHAR(MAX),
[ColumnTopPriority] NVARCHAR(MAX),
[ColumnLowPriority] NVARCHAR(MAX),
PRIMARY KEY([RowId])
)
INSERT INTO #b([ColumnLowPriority])
VALUES(N'Test')
INSERT INTO #b([ColumnTopPriority])
VALUES(N'Test')
INSERT INTO #b([ColumnMiddlePriority])
VALUES(N'Test')
UPDATE A
SET
A.[PickedPriority] = CASE
WHEN B.[ColumnTopPriority] = A.[Value] THEN N'TOP'
WHEN B.[ColumnMiddlePriority] = A.[Value] THEN N'MIDDLE'
WHEN B.[ColumnLowPriority] = A.[Value] THEN N'LOW'
END
FROM #a A
INNER JOIN #b B ON (
A.[Value] = B.[ColumnLowPriority]
OR A.[Value] = B.[ColumnTopPriority]
OR A.[Value] = B.[ColumnMiddlePriority]
)
produces this result: PickedPriority is always TOP, even if I change the order of insertion into table #b.
When I check the execution plan, I can understand why: a GROUP BY is performed after matching the A row with the B rows, and then the left-to-right evaluation of the CASE WHEN does the trick. But is the result deterministic here?
Could a different execution plan end up with a different result?
I've found my answer in this article:
https://sqlperformance.com/2019/07/sql-performance/the-any-aggregate-is-broken
and in the documentation
https://learn.microsoft.com/en-us/sql/t-sql/queries/update-transact-sql?view=sql-server-2017#best-practices
So no, in my case (multiple rows matching one row in an update statement), the result is not deterministic at all.
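One possible way to make the outcome deterministic is to pick exactly one priority per row of #a explicitly, instead of letting the join decide which matching row wins. The following is only a sketch against the #a and #b tables from the test case above; the V alias and its Ord, Priority and Candidate columns are names invented for the illustration.

```sql
-- Assumes #a and #b from the test case above already exist.
UPDATE A
SET A.[PickedPriority] = P.[Priority]
FROM #a AS A
CROSS APPLY (
    -- Unpivot the three priority columns of every #b row, keep the
    -- candidates equal to A.[Value], and take the best-ranked one.
    SELECT TOP (1) V.[Priority]
    FROM #b AS B
    CROSS APPLY (VALUES
        (1, N'TOP',    B.[ColumnTopPriority]),
        (2, N'MIDDLE', B.[ColumnMiddlePriority]),
        (3, N'LOW',    B.[ColumnLowPriority])
    ) AS V([Ord], [Priority], [Candidate])
    WHERE V.[Candidate] = A.[Value]
    ORDER BY V.[Ord]
) AS P;
```

Because the TOP (1) ... ORDER BY returns at most one row per row of #a, each target row is updated once with a well-defined value. Rows of #a without any match are left untouched.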

MS SQL Execute multiple updates based on a list

I found out how to fix the database; the only problem is that I have to enter the CaseNumber by hand for every execution.
In C# I would use some kind of string list for the broken records. Is there something similar in MS SQL?
In my code so far I have implemented a variable CaseNumber. I have a table with a lot of CaseNumber records that are broken. Is there a way to execute this for every CaseNumber from a different table?
Like:
1. Take the first CaseNumber and run this script.
2. Then take the second one and run this script again, until every CaseNumber has been fixed.
Thanks in advance for any ideas.
GO
DECLARE @CaseNumber VARCHAR(50)
SET @CaseNumber = '25615'
print 'Start fixing broken records.'
print 'Fixing FIELD2'
UPDATE t
SET t.FIELD2 = ( SELECT DISTINCT TOP 1 FIELD2
FROM {myTable} t2
WHERE IDFIELD = @CaseNumber
AND FIELD2 IS NOT NULL )
FROM {myTable} t
WHERE FIELD2 IS NULL
AND IDFIELD = @CaseNumber
Here are a couple different options...
-- This version will just "fix" everything that can be fixed.
UPDATE mt1 SET
mt1.FIELD2 = mtx.FIELD2
FROM
dbo.myTable mt1
CROSS APPLY (
SELECT TOP (1)
mt2.FIELD2
FROM
dbo.myTable mt2
WHERE
mt1.IDFIELD = mt2.IDFIELD
AND mt2.FIELD2 IS NOT NULL
) mtx
WHERE
mt1.FIELD2 IS NULL;
And if, for whatever reason, you don't want to fix the entire table in one go, you can restrict it to just the values you specify...
-- This version works off the same principle but limits itself to only those values in the @CaseNumCSV parameter.
DECLARE @CaseNumCSV VARCHAR(8000) = '25615,25616,25617,25618,25619';
IF OBJECT_ID('tempdb..#CaseNum', 'U') IS NOT NULL
BEGIN DROP TABLE #CaseNum; END;
CREATE TABLE #CaseNum (
CaseNumber VARCHAR(50) NOT NULL,
PRIMARY KEY (CaseNumber)
WITH(IGNORE_DUP_KEY = ON) -- just in case the same CaseNumber is in the string multiple times.
);
INSERT #CaseNum(CaseNumber)
SELECT
CaseNumber = dsk.Item
FROM
dbo.DelimitedSplit8K(@CaseNumCSV, ',') dsk;
-- a copy of DelimitedSplit8K can be found here: http://www.sqlservercentral.com/articles/Tally+Table/72993/
UPDATE mt1 SET
mt1.FIELD2 = mtx.FIELD2
FROM
#CaseNum cn
JOIN dbo.myTable mt1
ON cn.CaseNumber = mt1.IDFIELD
CROSS APPLY (
SELECT TOP (1)
mt2.FIELD2
FROM
dbo.myTable mt2
WHERE
mt1.IDFIELD = mt2.IDFIELD
AND mt2.FIELD2 IS NOT NULL
) mtx
WHERE
mt1.FIELD2 IS NULL;
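Since the question mentions that the broken case numbers already live in a different table, the CSV step could probably be skipped altogether. The sketch below assumes a hypothetical table dbo.BrokenCases with a CaseNumber column (neither name is from the question) and adds an ORDER BY so that the value picked by TOP (1) is repeatable.

```sql
-- dbo.BrokenCases(CaseNumber) is a hypothetical table holding the cases to fix.
UPDATE mt1 SET
    mt1.FIELD2 = mtx.FIELD2
FROM dbo.myTable AS mt1
CROSS APPLY (
    SELECT TOP (1) mt2.FIELD2
    FROM dbo.myTable AS mt2
    WHERE mt2.IDFIELD = mt1.IDFIELD
      AND mt2.FIELD2 IS NOT NULL
    ORDER BY mt2.FIELD2           -- assumption: any non-NULL value is acceptable
) AS mtx
WHERE mt1.FIELD2 IS NULL
  AND EXISTS (
        SELECT 1
        FROM dbo.BrokenCases AS bc
        WHERE bc.CaseNumber = mt1.IDFIELD
      );
```

This runs once for all listed cases, so no loop over individual case numbers is needed.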

Why is an SQL update on a table variable slower than with a temp table

I have something like:
DECLARE @tbl TABLE
(
name varchar(255),
type int
)
UPDATE c
SET c.name = t.name
FROM dbo.cars c
JOIN @tbl t ON t.type = c.type
I have a stored procedure that does something similar but it takes over 20 minutes with the table variable. It runs in less than 2 minutes if I change it from table variable to temp table. Why is this so?
I think this answer will be helpful for you:
https://stackoverflow.com/a/64891/1887827
I recommend that you look at this link;
https://support.microsoft.com/en-gb/kb/305977
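A commonly given explanation is that table variables carry no column statistics and, before the deferred compilation introduced in SQL Server 2019, are often estimated at a single row, which can lead to a poor join plan. A minimal sketch of the temp table variant of the statement from the question follows; the index name and the decision to index the join column are assumptions, not something the question states.

```sql
-- Temp table instead of a table variable: it gets column statistics,
-- so the optimizer can estimate how many rows take part in the join.
CREATE TABLE #tbl
(
    name varchar(255),
    type int
);

-- ... populate #tbl here ...

-- Assumed to help: an index on the join column, created after loading.
CREATE CLUSTERED INDEX IX_tbl_type ON #tbl (type);

UPDATE c
SET c.name = t.name
FROM dbo.cars c
JOIN #tbl t ON t.type = c.type;

DROP TABLE #tbl;
```

If the table variable has to stay, adding OPTION (RECOMPILE) to the UPDATE is another thing worth measuring, since it lets the optimizer see the actual row count at execution time.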

MS sql server looping through huge table

I have a table with 9 million records. I need to loop through each row and insert into multiple tables in each iteration.
My example query is
-- this is the table with 9 million records
create table tablename
(
ROWID INT IDENTITY(1, 1) primary key ,
LeadID int,
Title varchar(20),
FirstName varchar(50),
MiddleName varchar(20),
Surname varchar(50)
)
declare @counter int
declare @leadid int
declare @totalcounter int
set @counter = 1
select @totalcounter = count(ROWID) from tablename
while(@counter < @totalcounter)
begin
select @leadid = leadid from tablename
where ROWID = @counter
--perform some insert into multiple tables
--in each iteration i need to do this as well
select * from [sometable]
inner join tablename where leadid = @leadid
set @counter = @counter + 1
end
The problem here is that this is taking too long, especially the join in each iteration.
Can someone please help me optimize this?
Yes, your join is taking long because there is no join condition specified between your two tables, so you are creating a Cartesian product. That is definitely going to take a while.
If you want to optimize this, specify what you want to join those tables on.
If it is still slow, have a look at appropriate indexes.
It looks like you are trying to find all the rows in sometable that have the same leadid as the rows in tablename? If so, a simple join should work:
select t2.*
from tablename t1 inner join sometable t2
on t1.leadid=t2.leadid
As long as you have an index on leadid you shouldn't have any problems.
What are you really trying to do?
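If the per-row work is really just "insert this lead's data into several tables", the loop can usually be replaced by one set-based INSERT ... SELECT per target table. The sketch below uses two hypothetical target tables, dbo.LeadTitles and dbo.LeadNames, since the question does not show the real ones.

```sql
-- Hypothetical target tables; replace them with the real ones.
-- Each statement processes all 9 million rows in a single pass
-- instead of one iteration per ROWID.
INSERT INTO dbo.LeadTitles (LeadID, Title)
SELECT t.LeadID, t.Title
FROM tablename AS t;

INSERT INTO dbo.LeadNames (LeadID, FirstName, MiddleName, Surname)
SELECT t.LeadID, t.FirstName, t.MiddleName, t.Surname
FROM tablename AS t;
```

On a table of this size it may still be sensible to run such statements in batches (for example by ROWID range) to keep the transaction log manageable.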

Stored procedure optimization

I have a stored procedure which takes a lot of time to execute. Can anyone suggest a better approach that achieves the same result set?
ALTER PROCEDURE [dbo].[spFavoriteRecipesGET]
@USERID INT, @PAGENUMBER INT, @PAGESIZE INT, @SORTDIRECTION VARCHAR(4), @SORTORDER VARCHAR(4),@FILTERBY INT
AS
BEGIN
DECLARE
@ROW_START INT
DECLARE
@ROW_END INT
SET
@ROW_START = (@PageNumber-1)* @PageSize+1
SET
@ROW_END = @PageNumber*@PageSize
DECLARE
@RecipeCount INT
DECLARE
@RESULT_SET_TABLE
TABLE
(
Id INT NOT NULL IDENTITY(1,1),
FavoriteRecipeId INT,
RecipeId INT,
DateAdded DATETIME,
Title NVARCHAR(255),
UrlFriendlyTitle NVARCHAR(250),
[Description] NVARCHAR(MAX),
AverageRatingId FLOAT,
SubmittedById INT,
SubmittedBy VARCHAR(250),
RecipeStateId INT,
RecipeRatingId INT,
ReviewCount INT,
TweaksCount INT,
PhotoCount INT,
ImageName NVARCHAR(50)
)
INSERT INTO @RESULT_SET_TABLE
SELECT
FavoriteRecipes.FavoriteRecipeId,
Recipes.RecipeId,
FavoriteRecipes.DateAdded,
Recipes.Title,
Recipes.UrlFriendlyTitle,
Recipes.[Description],
Recipes.AverageRatingId,
Recipes.SubmittedById,
COALESCE(users.DisplayName,users.UserName,Recipes.SubmittedBy) As SubmittedBy,
Recipes.RecipeStateId,
RecipeReviews.RecipeRatingId,
COUNT(RecipeReviews.Review),
COUNT(RecipeTweaks.Tweak),
COUNT(Photos.PhotoId),
dbo.udfGetRecipePhoto(Recipes.RecipeId) AS ImageName
FROM
FavoriteRecipes
INNER JOIN Recipes ON FavoriteRecipes.RecipeId=Recipes.RecipeId AND Recipes.RecipeStateId <> 3
LEFT OUTER JOIN RecipeReviews ON RecipeReviews.RecipeId=Recipes.RecipeId AND RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeRatingId= (
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
OR RecipeReviews.RecipeRatingId IS NULL
LEFT OUTER JOIN RecipeTweaks ON RecipeTweaks.RecipeId = Recipes.RecipeId AND RecipeTweaks.TweakedById= @UserId
LEFT OUTER JOIN Photos ON Photos.RecipeId = Recipes.RecipeId
AND Photos.UploadedById = @UserId AND Photos.RecipeId = FavoriteRecipes.RecipeId
AND Photos.PhotoTypeId = 1
LEFT OUTER JOIN users ON Recipes.SubmittedById = users.UserId
WHERE
FavoriteRecipes.UserId=@UserId
GROUP BY
FavoriteRecipes.FavoriteRecipeId,
Recipes.RecipeId,
FavoriteRecipes.DateAdded,
Recipes.Title,
Recipes.UrlFriendlyTitle,
Recipes.[Description],
Recipes.AverageRatingId,
Recipes.SubmittedById,
Recipes.SubmittedBy,
Recipes.RecipeStateId,
RecipeReviews.RecipeRatingId,
users.DisplayName,
users.UserName,
Recipes.SubmittedBy;
WITH SortResults
AS
(
SELECT
ROW_NUMBER() OVER (
ORDER BY CASE WHEN @SORTDIRECTION = 't' AND @SORTORDER='a' THEN TITLE END ASC,
CASE WHEN @SORTDIRECTION = 't' AND @SORTORDER='d' THEN TITLE END DESC,
CASE WHEN @SORTDIRECTION = 'r' AND @SORTORDER='a' THEN AverageRatingId END ASC,
CASE WHEN @SORTDIRECTION = 'r' AND @SORTORDER='d' THEN AverageRatingId END DESC,
CASE WHEN @SORTDIRECTION = 'mr' AND @SORTORDER='a' THEN RecipeRatingId END ASC,
CASE WHEN @SORTDIRECTION = 'mr' AND @SORTORDER='d' THEN RecipeRatingId END DESC,
CASE WHEN @SORTDIRECTION = 'd' AND @SORTORDER='a' THEN DateAdded END ASC,
CASE WHEN @SORTDIRECTION = 'd' AND @SORTORDER='d' THEN DateAdded END DESC
) RowNumber,
FavoriteRecipeId,
RecipeId,
DateAdded,
Title,
UrlFriendlyTitle,
[Description],
AverageRatingId,
SubmittedById,
SubmittedBy,
RecipeStateId,
RecipeRatingId,
ReviewCount,
TweaksCount,
PhotoCount,
ImageName
FROM
@RESULT_SET_TABLE
WHERE
((@FILTERBY = 1 AND SubmittedById= @USERID)
OR ( @FILTERBY = 2 AND (SubmittedById <> @USERID OR SubmittedById IS NULL))
OR ( @FILTERBY <> 1 AND @FILTERBY <> 2))
)
SELECT
RowNumber,
FavoriteRecipeId,
RecipeId,
DateAdded,
Title,
UrlFriendlyTitle,
[Description],
AverageRatingId,
SubmittedById,
SubmittedBy,
RecipeStateId,
RecipeRatingId,
ReviewCount,
TweaksCount,
PhotoCount,
ImageName
FROM
SortResults
WHERE
RowNumber BETWEEN @ROW_START AND @ROW_END
print @ROW_START
print @ROW_END
SELECT
@RecipeCount=dbo.udfGetFavRecipesCount(@UserId)
SELECT
@RecipeCount AS RecipeCount
SELECT COUNT(Id) as FilterCount FROM @RESULT_SET_TABLE
WHERE
((@FILTERBY = 1 AND SubmittedById= @USERID)
OR (@FILTERBY = 2 AND (SubmittedById <> @USERID OR SubmittedById IS NULL))
OR (@FILTERBY <> 1 AND @FILTERBY <> 2))
END
You need to look at the execution plan to see where the time is going. It could be indexes, table scans caused by your UDF, any number of things. As you analyze the plan, try to break up the query into smaller pieces to see if you can make a difference in them.
Then learn about ROW_NUMBER to see if you can do without the local table.
Couple notes
Indexing - often times when people create procedures which use temp table or table variable they fail to realize you can create indexes on those objects and this can have massive performance implications.
UDF - Sometimes the query processor will effectively inline UDF logic and sometimes not; look closely at your query plan and see how this is being handled. Often, if you manually inline this logic in something like a correlated sub-query, you can boost performance a lot.
As others have said, the only way to know is to look at explain plans. Glancing over the code, this part looks kind of fishy:
AND RecipeReviews.RecipeRatingId= (
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
In general, doing non-trivial stuff in join conditions is a Bad Idea. I would factor that out into a sub-select, and since it's an outer join, you'd probably have to combine that with RecipeReviews somehow.
BUT: All of this is speculation! Explain! Measure!
Well, in addition to the possible poor performance of the UDF, this piece of code concerns me:
LEFT OUTER JOIN RecipeReviews
ON RecipeReviews.RecipeId=Recipes.RecipeId
AND RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeRatingId=
(SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId )
OR RecipeReviews.RecipeRatingId IS NULL
It is generally poor practice to use a subquery as part of a join. I strongly suspect this is not using any indexes you may have. And the OR part doesn't make sense to me at all; the left join should already get you this.
Rewrite it to make a derived table instead.
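A derived-table version of that join could look roughly like the sketch below. It aggregates the current user's reviews once per recipe and then left joins the result, which also removes the need for the OR ... IS NULL branch. The mr alias, the trimmed column list and the sample @UserId value are made up for the example, and the semantics may not match the original procedure exactly, so compare the results before adopting it.

```sql
DECLARE @UserId INT = 1;  -- hypothetical value for illustration

SELECT
    f.FavoriteRecipeId,
    r.RecipeId,
    mr.RecipeRatingId,                           -- NULL when the user has no review
    COALESCE(mr.ReviewCount, 0) AS ReviewCount
FROM FavoriteRecipes AS f
INNER JOIN Recipes AS r
    ON r.RecipeId = f.RecipeId
   AND r.RecipeStateId <> 3
LEFT OUTER JOIN (
    -- One row per recipe reviewed by this user.
    SELECT
        rr.RecipeId,
        MAX(rr.RecipeRatingId) AS RecipeRatingId,
        COUNT(rr.Review)       AS ReviewCount
    FROM RecipeReviews AS rr
    WHERE rr.ReviewedById = @UserId
    GROUP BY rr.RecipeId
) AS mr
    ON mr.RecipeId = r.RecipeId
WHERE f.UserId = @UserId;
```

Pre-aggregating each child table this way also keeps the counts from being multiplied when several one-to-many tables (reviews, tweaks, photos) are joined to the same recipe.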
If you have a lot of records a temp table usually performs better than a table variable and can (and probably should) be indexed.
You need to add parentheses around your OR conditions.
LEFT OUTER JOIN RecipeReviews
ON RecipeReviews.RecipeId = Recipes.RecipeId
AND RecipeReviews.ReviewedById = @UserId
AND
-- insert open parenthesis here:
(
RecipeReviews.RecipeRatingId = (... subquery ...)
OR RecipeReviews.RecipeRatingId IS NULL
-- insert close parenthesis here:
)
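The reason the parentheses matter is that AND binds more tightly than OR in T-SQL, so without them the OR branch applies to the whole join condition rather than just the rating test. A tiny standalone illustration with constant predicates:

```sql
-- AND is evaluated before OR, so the first expression is read as
-- (1 = 0 AND 1 = 1) OR 1 = 1.
SELECT
    CASE WHEN 1 = 0 AND 1 = 1 OR 1 = 1   THEN 'true' ELSE 'false' END AS WithoutParens,
    CASE WHEN 1 = 0 AND (1 = 1 OR 1 = 1) THEN 'true' ELSE 'false' END AS WithParens;
-- WithoutParens = 'true', WithParens = 'false'
```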
The very first, simple thing I would do is move all your DECLARE statements to the top.
DECLARE @ROW_START INT,
@ROW_END INT,
@RecipeCount INT
DECLARE
@RESULT_SET_TABLE
TABLE
(
Id INT NOT NULL IDENTITY(1,1),
-- ... remaining columns as in the original ...
)
The next part, which is still rather simple, is stuff like this:
AND Recipes.RecipeStateId <> 3
AND RecipeTweaks.TweakedById= @UserId
These can be taken out of the join and moved to the WHERE clause. If you can, change the <> to an IN list so that it can utilize an index seek. Be careful with the RecipeTweaks predicate, though: it belongs to a LEFT OUTER JOIN, and moving it to the WHERE clause would discard recipes that have no tweaks.
AND RecipeReviews.RecipeRatingId=
(
SELECT MAX(RecipeReviews.RecipeRatingId)
FROM RecipeReviews
WHERE RecipeReviews.ReviewedById=@UserId
AND RecipeReviews.RecipeId=FavoriteRecipes.RecipeId
)
That's just crazy looking and needs to be completely redone.