SELECT data first then UPDATE using clustered index or UPDATE directly? - sql

There's a table (in SQL Server) with high concurrent access (e.g. Transactions), and I need to update some data in it. However, the column(s) I filter on to find the rows to update have no indexes on them.
What would be the best approach to minimize table/row lock time?
Approach 1:
DECLARE @vt_TxnIds TABLE
(
    [Id] INT
)
/** Filter the required data first **/
INSERT INTO @vt_TxnIds
SELECT TXN.[Id]
FROM [Transactions] TXN WITH (NOLOCK) -- NOLOCK is fine in this case
LEFT JOIN [Account] ACC WITH (NOLOCK)
    ON ACC.[Id] = TXN.[AccountId] AND
       ACC.[IsActive] = 0
WHERE TXN.[Status] = 1 -- This column is not indexed
  AND ACC.[Id] IS NULL
/** Then update by clustered index **/
BEGIN TRAN;
UPDATE [Transactions]
SET [Status] = 5
WHERE [Id] IN ( -- [Id] is the clustered index
    SELECT [Id]
    FROM @vt_TxnIds
)
COMMIT;
Approach 2:
BEGIN TRAN;
UPDATE TXN
SET TXN.[Status] = 5
FROM [Transactions] TXN
LEFT JOIN [Account] ACC WITH (NOLOCK)
    ON ACC.[Id] = TXN.[AccountId] AND
       ACC.[IsActive] = 0
WHERE TXN.[Status] = 1 -- This column is not indexed
  AND ACC.[Id] IS NULL
COMMIT;
I'm not concerned about total execution time. For example, in my case it's fine if the whole query takes 15 seconds but the table/rows are locked for only 5 seconds, rather than the query taking 10 seconds with the whole table locked for all 10 seconds.
Could someone please suggest the best approach, or any alternative approach, that fulfils my requirement?
Many thanks!
Update: Creating a new index is not an option.

The first option is pointless extra work, and does not conform to ACID properties.
The unmentioned Approach #3 is best:
Approach #2 is good as a starting point
Remove the transaction as it is only a single statement
Remove the NOLOCK hint as that will just cause incorrect results and weird errors
Convert the left-join to a NOT EXISTS which is often more efficient.
UPDATE TXN
SET TXN.Status = 5
FROM Transactions TXN
WHERE TXN.Status = 1
AND NOT EXISTS (SELECT 1
FROM Account ACC
WHERE ACC.Id = TXN.AccountId
AND ACC.IsActive = 0
);
For this to work efficiently, you will want indexes (either clustered or non-clustered)
TXN (Status, AccountId)
ACC (IsActive, Id)
Alternatively you can use filtered non-clustered indexes
TXN (AccountId) INCLUDE (Status) WHERE (Status = 1)
ACC (Id) INCLUDE (IsActive) WHERE (IsActive = 0)
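If it helps, the equivalent CREATE INDEX statements would look something like this (the index names are placeholders, not from the original answer):
-- Plain composite indexes
CREATE NONCLUSTERED INDEX IX_Transactions_Status_AccountId ON [Transactions] ([Status], [AccountId]);
CREATE NONCLUSTERED INDEX IX_Account_IsActive_Id ON [Account] ([IsActive], [Id]);
-- Or the filtered variants
CREATE NONCLUSTERED INDEX IX_Transactions_Status1 ON [Transactions] ([AccountId]) INCLUDE ([Status]) WHERE ([Status] = 1);
CREATE NONCLUSTERED INDEX IX_Account_Inactive ON [Account] ([Id]) INCLUDE ([IsActive]) WHERE ([IsActive] = 0);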
If you want to prevent a lot of locking and/or you cannot add the indexes, you can do the update in a loop on a few rows at a time.
Note that a transaction is not used here, to prevent excessive locking. Obviously you cannot roll back each run of the loop once finished.
DECLARE @batchSize bigint = 1000;
WHILE (1=1)
BEGIN
    UPDATE TOP (@batchSize) TXN
    SET TXN.Status = 5
    FROM Transactions TXN
    WHERE TXN.Status = 1
      AND NOT EXISTS (SELECT 1
                      FROM Account ACC
                      WHERE ACC.Id = TXN.AccountId
                        AND ACC.IsActive = 0
                     );
    IF @@ROWCOUNT < @batchSize
        BREAK;
    WAITFOR DELAY '00:00:05'; -- or some other wait time
END;

Presumably this update is required for your application to function correctly. When dealing with an overzealous database administrator (I didn't say "incompetent", did I? :-) you, the developer, get the application right and leave the DBA to sort out the performance and table-locking problems. They can always add an index later when your production code gets slow. To which you say "hey, good idea!", presuming they ask you.
The same logic holds true for NOLOCK. The DBA can tell you if that's necessary. (It probably isn't.) Leave it out of your work.
Your objective here is to minimize the time during which a table is locked, as you said. Your secondary objective is to minimize the number of rows involved in any particular UPDATE operation.
You can do that, in SQL Server, by using TOP (n) to control the number of rows. That means you do multiple UPDATEs and keep going until the job is done. This kind of thing will work. (not debugged.)
DECLARE @batchsize int = 100;
DECLARE @count int = 1;
WHILE @count > 0 BEGIN
    SET DEADLOCK_PRIORITY LOW;
    UPDATE TOP (@batchsize) TXN
    SET TXN.[Status] = 5
    FROM [Transactions] TXN
    LEFT JOIN [Account] ACC
        ON ACC.[Id] = TXN.[AccountId] AND ACC.[IsActive] = 0
    WHERE TXN.[Status] = 1
      AND ACC.[Id] IS NULL;
    SET @count = @@ROWCOUNT;
END;
This works because your UPDATE sets Transactions.Status to 5. Once a row has been updated, that same row won't be chosen again for update.
Setting the deadlock priority to low is a good idea for this sort of repeating operation. If your update query somehow causes a deadlock with other application code, it tells SQL Server to stop your query rather than the others. Stopping your query doesn't matter: your update will catch the same rows the next time it runs.
Now, obviously, this doesn't update the whole table in a single ACID transaction. Instead it's each batch. But I suspect that will not damage your application, or your transactional code would have done the update in real time.

Related

SQL Server - Using joins in Update statement

I have a HRUser and an Audit table, both are in production with large number of rows.
Now I have added one more column to my HRUser table called IsActivated.
I need to create a one-time script which will be executed in production to populate this IsActivated column. From then on, whenever users activate their accounts, the HRUser table's IsActivated column will be updated automatically.
For updating the IsActivated column in the HRUser table, I need to check the Audit table whether the user has logged in till now.
UPDATE [dbo].HRUser
SET IsActivated = 1
FROM dbo.[UserAudit] A
JOIN dbo.[HRUser] U ON A.UserId = U.UserId
WHERE A.AuditTypeId = 14
AuditTypeId = 14 means the user has logged in. A user can log in any number of times, and every login gets captured in the UserAudit table...
The logic is that if the user has logged in at least once, the user is activated.
This cannot be tested in lower environments and needs to be executed directly in production, as we don't have any data in the UserAudit table in lower environments.
I am not really sure if that works as I have never used joins in an UPDATE statement, so I am looking for suggestions for any better approach for accomplishing my task.
You could use EXISTS and a correlated subquery to filter on rows whose UserId has at least one audit event of id 14:
UPDATE h
SET IsActivated = 1
FROM [dbo].HRUser h
WHERE EXISTS (
SELECT 1
FROM dbo.[UserAudit] a
WHERE a.UserId = h.UserId AND a.AuditTypeId = 14
)
Note that there is no point reopening the target table in the subquery; you just need to correlate it with the outer query.
Two methods below. Method 1 is NOT recommended for tables "in production with large number of rows". But it is much easier to code. Method 2 works in production with no downtime.
Whichever method you choose: TEST it outside production. Copy the data from production. If you cannot do that, then build your own. Build a toy system. Highly recommended that you test at some level before running either method in production.
METHOD 1:
Updating on a join is straightforward. Use an alias. Reminder: this is NOT RECOMMENDED "with a large number of rows" AND production running. The SQL Server optimizer will most likely escalate locks on both tables and block them until the update completes. IF you are taking an outage and are not concerned with how long the update takes, this method works.
UPDATE U
SET IsActivated = 1
FROM dbo.[UserAudit] A
JOIN dbo.[HRUser] U ON A.UserId = U.UserId
WHERE A.AuditTypeId = 14
METHOD 2:
IF you cannot afford to stop your production systems for this update (and most of us cannot), then I recommend that you do 2 things:
Set up a loop with the transaction inside the loop. This means that the optimizer will use row locks and not block the entire table. This method may take longer, but it will not block production. If the update takes longer, that is not a concern as long as the devops team never calls because production is blocked.
Capture rows to be updated outside the transaction. THEN, update based on a primary key (fastest). The total transaction time is how long the rows updated will be blocked.
Here is a toy example for looping.
-- STEP 1: get data to be updated
CREATE TABLE #selected ( ndx INT IDENTITY(1,1), UserId INT )
INSERT INTO #selected (UserId)
SELECT DISTINCT U.UserId
FROM dbo.[UserAudit] A
JOIN dbo.[HRUser] U ON A.UserId = U.UserId
WHERE A.AuditTypeId = 14
-- STEP 2: update on primary key in steps of 1000
DECLARE @RowsToUpdate INT = 1000
    , @LastId INT = 0
    , @RowCnt INT = 0
DECLARE @v TABLE(ndx INT, UserId INT)
WHILE 1=1
BEGIN
    DELETE @v
    INSERT INTO @v
    SELECT TOP(@RowsToUpdate) *
    FROM #selected WHERE ndx > @LastId
    ORDER BY ndx
    SET @RowCnt = @@ROWCOUNT
    IF @RowCnt = 0
        BREAK;
    BEGIN TRANSACTION
    UPDATE a
    SET IsActivated = 1
    FROM @v v
    JOIN dbo.HRUser a ON a.UserId = v.UserId
    COMMIT TRANSACTION
    SELECT @LastId = MAX(ndx) FROM @v
END

SQL Server Update query very slow

I ran the following query on a previous year's data and it took 3 hours; this year it took 13 days. I don't know why this is, though. Any help would be much appreciated.
I have just tested the queries on the old SQL Server and they complete in 3 hours. Therefore the problem must have something to do with the new SQL Server I created. Do you have any idea what the problem might be?
The query:
USE [ABCJan]
CREATE INDEX Link_Oct ON ABCJan2014 (Link_ref)
GO
CREATE INDEX Day_Oct ON ABCJan2014 (date_1)
GO
UPDATE ABCJan2014
SET ABCJan2014.link_id = LT.link_id
FROM ABCJan2014 MT
INNER JOIN [Central].[dbo].[LookUp_ABC_20142015] LT
ON MT.Link_ref = LT.Link_ref
UPDATE ABCJan2014
SET SumAvJT = ABCJan2014.av_jt * ABCJan2014.n
UPDATE ABCJan2014
SET ABCJan2014.DayType = LT2.DayType
FROM ABCJan2014 MT
INNER JOIN [Central].[dbo].[ABC_20142015_days] LT2
ON MT.date_1 = LT2.date1
With the following data structures:
ABCJan2014 (70 million rows - NO UNIQUE IDENTIFIER - Link_ref & date_1 together are unique)
Link_ID nvarchar (17)
Link_ref int
Date_1 smalldatetime
N int
Av_jt int
SumAvJT decimal(38,14)
DayType nvarchar (50)
LookUp_ABC_20142015
Link_ID nvarchar (17) PRIMARY KEY
Link_ref int INDEXED
Link_metres int
ABC_20142015_days
Date1 smalldatetime PRIMARY KEY & INDEXED
DayType nvarchar(50)
EXECUTION PLAN
It appears to be this part of the query that is taking such a long time.
Thanks again for any help, I'm pulling my hair out.
Create an index on the ABCJan2014 table as it is currently a heap.
If you look at the execution plan, the time is in the actual update.
Look at the log file:
Is the log file on a fast disk?
Is the log file on the same physical disk as the data?
Is the log file required to grow?
Size the log file to roughly half the size of the data file.
As far as indexes go, test and tune this:
If the join columns are indexed there is not much to do here.
select count(*)
FROM ABCJan2014 MT
INNER JOIN [Central].[dbo].[LookUp_ABC_20142015] LT
ON MT.Link_ref = LT.Link_ref
select count(*)
FROM ABCJan2014 MT
INNER JOIN [Central].[dbo].[ABC_20142015_days] LT2
ON MT.date_1 = LT2.date1
Start with a top (1000) to get update tuning working
For grins please give this a try
Please post this query plan
(do NOT add an index to ABCJan2014 link_id)
UPDATE TOP (1000) MT
SET MT.link_id = LT.link_id
FROM ABCJan2014 MT
JOIN [Central].[dbo].[LookUp_ABC_20142015] LT
ON MT.Link_ref = LT.Link_ref
AND MT.link_id <> LT.link_id
If LookUp_ABC_20142015 is not active then add a nolock
JOIN [Central].[dbo].[LookUp_ABC_20142015] LT with (nolock)
nvarchar (17) for a PK to me is just strange
why n - do you really have some unicode?
why not just char(17) and let it allocate space?
Why have 3 update statements when you can do it in one?
UPDATE MT
SET MT.link_id = CASE WHEN LT.link_id IS NULL THEN MT.link_id ELSE LT.link_id END,
MT.SumAvJT = MT.av_jt * MT.n,
MT.DayType = CASE WHEN LT2.DayType IS NULL THEN MT.DayType ELSE LT2.DayType END
FROM ABCJan2014 MT
LEFT OUTER JOIN [Central].[dbo].[LookUp_ABC_20142015] LT
ON MT.Link_ref = LT.Link_ref
LEFT OUTER JOIN [Central].[dbo].[ABC_20142015_days] LT2
ON MT.date_1 = LT2.date1
Also, I would create only one index for the join. Create the following index after the updates.
CREATE INDEX Day_Oct ON ABCJan2014 (date_1)
GO
Before you run, compare the execution plan by putting the update query above and your 3 update statements altogether in one query window, and do Display Estimated Execution Plan. It will show the estimated percentages and you'll be able to tell if it's any better (if new one is < 50%).
Also, it looks like the query is slow because it's doing a Hash Match. Please add a PK index on [LookUp_ABC_20142015].Link_ref.
[LookUp_ABC_20142015].Link_ID is a bad choice for PK, so drop the PK on that column.
Then add an index to [ABCJan2014].Link_ref.
See if that makes any improvement.
If you are going to update a table you need a unique identifier, so put one on ABCJan2014 ASAP, especially since it is so large. There is no reason why you can't create a unique index on the fields that together compose the unique record. In the future, do not ever design a table that does not have a unique index or PK. This is simply asking for trouble both in processing time and, more importantly, in data integrity.
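For example, a minimal sketch of enforcing that uniqueness (the index name is a placeholder; building it on 70 million rows will take time and space):
CREATE UNIQUE CLUSTERED INDEX IX_ABCJan2014_LinkRef_Date1
ON dbo.ABCJan2014 (Link_ref, date_1);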
When you have a lot of updating to do to a large table, it is sometimes more effective to work in batches. You don't tie up the table in a lock for a long period of time, and sometimes it is even faster due to how the database internals work the problem. Consider processing 50,000 records at a time (you may need to experiment to find the sweet spot of records to process in a batch; there is generally a point where the update starts to take significantly longer) in a loop or cursor.
UPDATE ABCJan2014
SET ABCJan2014.link_id = LT.link_id
FROM ABCJan2014 MT
JOIN [Central].[dbo].[LookUp_ABC_20142015] LT ON MT.Link_ref = LT.Link_ref
The code above will update all records from the join. If some of the records already have the link_id you might save considerable time by only updating the records where link_id is null or ABCJan2014.link_id <> LT.link_id. You have a 70 million record table, you don't need to be updating records that do not need a change. The same thing of course applies to your other updates as well.
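For example, a rough sketch of a batched version combining both ideas (the batch size is arbitrary and worth experimenting with):
DECLARE @batch int = 50000;
WHILE 1 = 1
BEGIN
    -- only touch rows that actually need a change, a batch at a time
    UPDATE TOP (@batch) MT
    SET MT.link_id = LT.link_id
    FROM ABCJan2014 MT
    JOIN [Central].[dbo].[LookUp_ABC_20142015] LT ON MT.Link_ref = LT.Link_ref
    WHERE MT.link_id IS NULL OR MT.link_id <> LT.link_id;
    IF @@ROWCOUNT = 0 BREAK;
END;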
Not knowing how much data gets added to this table or how often this number need updating, consider that this SumAvJT might be best defined as a persisted calculated field. Then it gets updated automatically when one of the two values changes. This wouldn't help if the table is bulk loaded but might if records come in individually.
In the execution plan, it makes recommendations for indexes being added. Have you created those indexes? Also, take a look at your older server's data structure - script out the table structures including indexes - and see if there are differences between them. At some point somebody's possibly built an index on your old server's tables to make this more efficient.
That said, what volume of data are you looking at? If you're looking at significantly different volumes of data, it could be that the execution plans generated by the servers differ significantly. SQL Server doesn't always guess right, when it builds the plans.
Also, are you using prepared statements (i.e., stored procedures)? If you are, then it's possible that the cached data access plan is simply out of date & needs to be updated, or you need to update statistics on the tables and then run the procedure with recompile so that a new data access plan is generated.
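For instance, a hedged sketch along those lines (the procedure name is hypothetical, only relevant if the updates live in a stored procedure):
UPDATE STATISTICS dbo.ABCJan2014 WITH FULLSCAN;
EXEC sp_recompile N'dbo.usp_UpdateABCJan2014'; -- hypothetical procedure name
-- or append OPTION (RECOMPILE) to the individual UPDATE statements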
Where is the [Central] server located?
Is it possible to duplicate your [Central].[dbo].[LookUp_ABC_20142015] and [Central].[dbo].[ABC_20142015_days] tables locally?
1) Do:
select * into [ABC_20142015_days] from [Central].[dbo].[ABC_20142015_days]
select * into [LookUp_ABC_20142015] from [Central].[dbo].[LookUp_ABC_20142015]
2) Recreate the index on [ABC_20142015_days] and [LookUp_ABC_20142015]...
3) Rewrite your updates by removing the "[Central].[dbo]." prefix!
Just after writing this solution, I found another one, but I'm not sure if it's applicable to your server: add the "REMOTE" join hint... I never use it, but you can find the documentation at https://msdn.microsoft.com/en-us/library/ms173815.aspx
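Syntax-wise the hint goes on the join keyword; a minimal sketch on a SELECT (I have not tested this against a linked server, and whether it helps depends on where [Central] lives):
SELECT MT.Link_ref, LT.link_id
FROM ABCJan2014 MT
INNER REMOTE JOIN [Central].[dbo].[LookUp_ABC_20142015] LT
    ON MT.Link_ref = LT.Link_ref;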
Hoping it helps you...
All the previous answers suggesting improvements to the table structure and the queries themselves are good to know, there is no doubt about that.
However, your question is why the SAME data/structure and the SAME queries show this huge difference.
So before you look at optimising the SQL you must find the real cause, and the real cause is hardware, software or configuration. Start by comparing the SQL Server setup with the old one, then move on to the hardware and benchmark it. Lastly, look at the software for differences.
Only when you have solved the actual problem can you start improving the SQL itself.
ALTER TABLE dbo.ABCJan2014
ADD SumAvJT AS av_jt * n --PERSISTED
CREATE INDEX ix_ABCJan2014_Link_ref ON ABCJan2014 (Link_ref) INCLUDE (link_id)
GO
CREATE INDEX ix_ABCJan2014_date_1 ON ABCJan2014 (date_1) INCLUDE (DayType)
GO
UPDATE ABCJan2014
SET ABCJan2014.link_id = LT.link_id
FROM ABCJan2014 MT
JOIN [Central].[dbo].[LookUp_ABC_20142015] LT ON MT.Link_ref = LT.Link_ref
UPDATE ABCJan2014
SET ABCJan2014.DayType = LT2.DayType
FROM ABCJan2014 MT
JOIN [Central].[dbo].[ABC_20142015_days] LT2 ON MT.date_1 = LT2.date1
I guess there is a lot of page splitting. Can You try this?
SELECT
(SELECT LT.link_id FROM [Central].[dbo].[LookUp_ABC_20142015] LT
WHERE MT.Link_ref = LT.Link_ref) AS Link_ID,
Link_ref,
Date_1,
N,
Av_jt,
MT.av_jt * MT.n AS SumAvJT,
(SELECT LT2.DayType FROM [Central].[dbo].[ABC_20142015_days] LT2
WHERE MT.date_1 = LT2.date1) AS DayType
INTO ABCJan2014new
FROM ABCJan2014 MT
In addition to all the answers above:
i) Even 3 hours is a lot. Even if a query takes 3 hours, I would first check the requirement and revise it, and raise the issue. Of course I would also optimise the query.
In your query, none of the updates appear to be a serious matter.
As @Devart pointed out, one of the columns can be a calculated column.
ii) Try running other queries on the new server and compare.
iii) Rebuild the indexes.
iv) Use "with (nolock)" in your joins.
v) Create an index on the LookUp_ABC_20142015 table's Link_ref column.
vi) A clustered index on nvarchar(17) or datetime is always a bad idea.
Joins on a datetime or varchar column always take time.
Try using the alias instead of repeating the table name in the UPDATE query:
USE [ABCJan]
CREATE INDEX Link_Oct ON ABCJan2014 (Link_ref)
GO
CREATE INDEX Day_Oct ON ABCJan2014 (date_1)
GO
UPDATE MT
SET MT.link_id = LT.link_id
FROM ABCJan2014 MT
INNER JOIN [Central].[dbo].[LookUp_ABC_20142015] LT
ON MT.Link_ref = LT.Link_ref
UPDATE ABCJan2014
SET SumAvJT = av_jt * n
UPDATE MT
SET MT.DayType = LT2.DayType
FROM ABCJan2014 MT
INNER JOIN [Central].[dbo].[ABC_20142015_days] LT2
ON MT.date_1 = LT2.date1
Frankly, I think you've already answered your own question.
ABCJan2014 (70 million rows - NO UNIQUE IDENTIFIER - Link_ref & date_1 together are unique)
If you know the combination is unique, then by all means 'enforce' it. That way the server will know it too and can make use of it.
Query Plan showing the need for an index on [ABCJAN2014].[date_1] 3 times in a row!
You shouldn't believe everything that MSSQL tells you, but you should at least give it a try =)
Combining both I'd suggest you add a PK to the table on the fields [date_1] and [Link_ref] (in that order!). Mind: adding a Primary Key -- which is essentially a clustered unique index -- will take a while and require a lot of space as the table pretty much gets duplicated along the way.
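Something along these lines (the constraint name is a placeholder, and both columns must be NOT NULL for a primary key):
ALTER TABLE dbo.ABCJan2014
ADD CONSTRAINT PK_ABCJan2014 PRIMARY KEY CLUSTERED (date_1, Link_ref);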
As far as your query goes, you could put all 3 updates in 1 statement (similar to what joordan831 suggests) but you should take care about the fact that a JOIN might limit the number of rows affected. As such I'd rewrite it like this:
UPDATE MT
SET MT.link_id = (CASE WHEN LT.Link_ref IS NULL THEN MT.link_id ELSE LT.link_id END), -- update when there is a match, otherwise keep the existing value
    MT.DayType = (CASE WHEN LT2.date1 IS NULL THEN MT.DayType ELSE LT2.DayType END), -- update when there is a match, otherwise keep the existing value
    MT.SumAvJT = MT.av_jt * MT.n
FROM ABCJan2014 MT
LEFT OUTER JOIN [Central].[dbo].[LookUp_ABC_20142015] LT
    ON MT.Link_ref = LT.Link_ref
LEFT OUTER JOIN [Central].[dbo].[ABC_20142015_days] LT2
    ON MT.date_1 = LT2.date1
which should have the same effect as running your original 3 updates sequentially; but hopefully taking a lot less time.
PS: Going by the query plans, you already have indexes on the tables you JOIN to ([LookUp_ABC_20142015] & [ABC_20142015_days]) but they seem to be non-unique (and not always clustered). Assuming they're suffering from the 'we know it's unique but the server doesn't' illness: it would be advisable to also add a Primary Key to those tables on the fields you join to, both for data-integrity and performance reasons!
Good luck.
UPDATE data
SET data.abcKey = surrogate.abcKey
FROM [MyData].[dbo].[fAAA_Stage] data
JOIN [MyData].[dbo].[dBBB_Surrogate] surrogate WITH (NOLOCK)
    ON data.MyKeyID = surrogate.MyKeyID
The surrogate table must have a nonclustered index with a unique key; MyKeyID must be created as a unique non-clustered key. The performance improvements are significant.
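A sketch of what that index could look like (the index name is a placeholder; the INCLUDE is optional but makes the join covering):
CREATE UNIQUE NONCLUSTERED INDEX UQ_dBBB_Surrogate_MyKeyID
ON [MyData].[dbo].[dBBB_Surrogate] (MyKeyID) INCLUDE (abcKey);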

Is a transaction that only updates a single table always isolated?

According to the UPDATE documentation, an UPDATE always acquires an exclusive lock on the whole table. However, I am wondering if the exclusive lock is acquired before the rows to be updated are determined or only just before the actual update.
My concrete problem is that I have a nested SELECT in my UPDATE like this:
UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id
FROM Tasks
WHERE Type = 1
AND (SELECT COUNT(*)
FROM Tasks
WHERE Status = 'Active') = 0
ORDER BY Id)
Now I am wondering whether it is really guaranteed that there is exactly one
task with Status = 'Active' afterwards if in parallel the same statement may be executed with another Type:
UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id
FROM Tasks
WHERE Type = 2 -- <== The only difference
AND (SELECT COUNT(*)
FROM Tasks
WHERE Status = 'Active') = 0
ORDER BY Id)
If for both statements the rows to change would be determined before the lock is acquired, I could end up with two active tasks which I must prevent.
If this is the case, how can I prevent it? Can I prevent it without setting the transaction level to SERIALIZABLE or messing with lock hints?
From the answer to Is a single SQL Server statement atomic and consistent? I learned that the problem arises when the nested SELECT accesses another table. However, I'm not sure if I have to care about this issue if only the updated table is concerned.
If you want exactly one task with Status = 'Active', then set up the table to ensure this is true. Use a filtered unique index:
create unique index unq_tasks_status_filter_active on tasks(status)
where status = 'Active';
A second concurrent update might fail, but you will be ensured of uniqueness. Your application code can process such failed updates, and re-try.
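For example, a rough sketch of that retry handling on the SQL side, relying on the filtered unique index above (2601/2627 are the duplicate-key error numbers; THROW requires SQL Server 2012+):
BEGIN TRY
    UPDATE Tasks
    SET Status = 'Active'
    WHERE Id = (SELECT TOP 1 Id FROM Tasks WHERE Type = 1 ORDER BY Id);
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627)
        PRINT 'Another task is already active; let the caller retry later.';
    ELSE
        THROW;
END CATCH;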
Relying on the actual execution plans of the updates might be dangerous. That is why it is safer to have the database do such validations. Underlying implementation details could vary, depending on the environment and version of SQL Server. For instance, what works in a single threaded, single processor environment may not work in a parallel environment. What works with one isolation level may not work with another.
EDIT:
And, I cannot resist. For efficiency purposes, consider writing the query as:
UPDATE Tasks
SET Status = 'Active'
WHERE NOT EXISTS (SELECT 1
FROM Tasks
WHERE Status = 'Active'
) AND
Id = (SELECT TOP 1 Id
FROM Tasks
WHERE Type = 2 -- <== The only difference
ORDER BY Id
);
Then place indexes on Tasks(Status) and Tasks(Type, Id). In fact, with the right query, you might find that the query is so fast (despite the update on the index) that your worry about current updates is greatly mitigated. This would not solve a race condition, but it might at least make it rare.
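In index DDL terms (the names are placeholders):
CREATE INDEX IX_Tasks_Status ON Tasks (Status);
CREATE INDEX IX_Tasks_Type_Id ON Tasks (Type, Id);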
And if you are capturing errors, then with the unique filtered index, you could just do:
UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id
FROM Tasks
WHERE Type = 2 -- <== The only difference
ORDER BY Id
);
This will return an error if a row already is active.
Note: all these queries and concepts can be applied to "one active per group". This answer is addressing the question that you asked. If you have a "one active per group" problem, then consider asking another question.
This is not an answer to your question... But your query is a pain for my eyes :)
;WITH cte AS
(
SELECT *, RowNum = ROW_NUMBER() OVER (PARTITION BY [type] ORDER BY id)
FROM Tasks
)
UPDATE cte
SET [Status] = 'Active'
WHERE RowNum = 1
AND [type] = 1
AND NOT EXISTS(
SELECT 1
FROM Tasks
WHERE [Status] = 'Active'
)
No, at least the nested select statement can be processed before the update is started and locks are acquired. To make sure that no other query interferes with this update it is required to set the transaction isolation level to SERIALIZABLE.
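A minimal sketch of what that could look like wrapped around the first statement (not part of the original answer):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id
            FROM Tasks
            WHERE Type = 1
              AND (SELECT COUNT(*) FROM Tasks WHERE Status = 'Active') = 0
            ORDER BY Id);
COMMIT TRANSACTION;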
This article (and the series it is part of) explains very well the subtleties of concurrency in SQL server:
http://sqlperformance.com/2014/02/t-sql-queries/confusion-caused-by-trusting-acid

Optimising CTE for recursive queries

I have a table with a self join. You can think of the structure as a standard table representing an organisational hierarchy, e.g.:
MemberId
MemberName
RelatedMemberId
This table consists of 50000 sample records. I wrote a recursive CTE query and it works absolutely fine. However, the time it takes to process just 50000 records is roughly 3 minutes on my machine (4GB RAM, 2.4 GHz Core2Duo, 7200 RPM HDD).
How can I improve the performance, given that 50000 is not such a huge number and over time it will keep increasing? This is the query exactly as I have it in my stored procedure. The query's purpose is to select all the members that come under a specific member. E.g. under the Owner of the company everyone comes; for a Manager, all records except the Owner get returned. I hope you understand the query's purpose.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
Alter PROCEDURE spGetNonVirtualizedData
(
@MemberId int
)
AS
BEGIN
With MembersCTE As
(
Select parent.MemberId As MemberId, 0 as Level
From Members as parent Where IsNull(MemberId,0) = IsNull(@MemberId,0)
Union ALL
Select child.MemberId As MemberId , Level + 1 as Level
From Members as child
Inner Join MembersCTE on MembersCTE.MemberId = child.RelatedMemberId
)
Select Members.*
From MembersCTE
Inner Join Members On MembersCTE.MemberId = Members.MemberId
option(maxrecursion 0)
END
GO
As you can see, to improve the performance I have even moved the join to the final SELECT, so that unnecessary records do not get inserted into the temp table. If I put the joins in the anchor and recursive steps of the CTE (instead of the final SELECT), the query takes 20 minutes to execute!
MemberId is primary key in the table.
Thanks in advance :)
In your anchor condition you have Where IsNull(MemberId,0) = IsNull(@MemberId,0). I assume this is just because when you pass NULL as a parameter, = doesn't work in terms of bringing back IS NULL values. This will cause a scan rather than a seek.
Use WHERE MemberId = @MemberId OR (@MemberId IS NULL AND MemberId IS NULL) instead, which is sargable.
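Applied to the anchor member of the CTE, that looks roughly like this (the recursive member stays as it is):
Select parent.MemberId As MemberId, 0 as Level
From Members as parent
Where parent.MemberId = @MemberId
   OR (@MemberId IS NULL AND parent.MemberId IS NULL)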
Also, I'm assuming that you can't add an index on RelatedMemberId. If you can, you should add one:
CREATE NONCLUSTERED INDEX ix_name ON Members(RelatedMemberId) INCLUDE (MemberId)
(though you can skip the included column bit if MemberId is the clustered index key as it will be included automatically)

SQL2000 to SQL2005. Query now a lot slower

This query used to take 3secs in SQL2000, now it takes about 70secs. Both databases give the same results. The 2005 database is not running in compatibility mode.
Currently we're rebuilding the query to run in SQL2005.. by a process of elimination and understanding the logic.
However - can anyone see anything obvious that we've missed.
And/or are there any tools that could help here?
We've been looking at the Execution plan... and profiler. And index tuning wizard.
Profiler points to a massive number more records being queried to get the same results.
I know that this is a very hard question to debug without the data... another pair of eyes is always good if there is anything obvious!
Cheers
Dave
ALTER PROCEDURE [dbo].[GetNodeList]
@ViewID int,
@UserID int = null
as
Select ProcessList.*,
A.NDOC_DOC_ID,
A.NDOC_Order,
A.OMNIBOOK_ID,
A.Node_Order
from (
(SELECT N.NOD_ID,
N.NOD_Name,
N.NOD_Procname,
N.NOD_Xpos,
N.NOD_Ypos,
N.NOD_Zpos,
VN.VNOD_VIE_ID
FROM Node N
INNER JOIN View_NODe VN
ON N.NOD_ID = VN.VNOD_NOD_ID
Where VN.VNOD_VIE_ID = @ViewID) ProcessList
Left Join
(
SELECT N.NOD_ID,
N.NOD_Name,
N.NOD_Procname,
N.NOD_Xpos as NOD_Xpos,
N.NOD_Ypos as NOD_Ypos,
N.NOD_Zpos as NOD_Zpos,
VN.VNOD_VIE_ID,
ND.NDOC_DOC_ID as NDOC_DOC_ID,
ND.NDOC_Order as NDOC_Order,
null as OMNIBOOK_ID,
null as Node_Order
FROM Node N
INNER JOIN View_NODe VN
ON N.NOD_ID = VN.VNOD_NOD_ID
LEFT JOIN NODe_DOCument ND
ON N.NOD_ID = ND.NDOC_NOD_ID
WHERE VN.VNOD_VIE_ID=@ViewID
and ND.NDOC_DOC_ID is not null
and (@UserID is null
or exists (Select 1
from Document D
where Doc_ID = ND.NDOC_DOC_ID
and dbo.fn_UserCanSeeDoc(@UserID,D.Doc_ID)<>0
)
)
UNION
SELECT N.NOD_ID,
N.NOD_Name,
N.NOD_Procname,
N.NOD_Xpos,
N.NOD_Ypos,
N.NOD_Zpos,
VN.VNOD_VIE_ID,
null,
null,
NOM.OMNIBOOK_ID,
NOM.Node_Order
FROM Node N
INNER JOIN View_NODe VN
ON N.NOD_ID = VN.VNOD_NOD_ID
LEFT JOIN NODe_OMNIBOOK NOM
ON N.NOD_ID = NOM.NODE_ID
WHERE VN.VNOD_VIE_ID=@ViewID
and NOM.OMNIBOOK_ID is not null
and exists (select 1 from Omnibook_Doc where OmnibookID = NOM.OMNIBOOK_ID)
) A
--On ProcessList.NOD_ID = A.NOD_ID
ON ProcessList.NOD_Xpos = A.NOD_Xpos
And ProcessList.NOD_Ypos = A.NOD_Ypos
And ProcessList.NOD_Zpos = A.NOD_Zpos
And ProcessList.VNOD_VIE_ID = A.VNOD_VIE_ID
)
ORDER BY
ProcessList.NOD_Xpos,
ProcessList.NOD_Zpos,
ProcessList.NOD_Ypos,
Coalesce(A.NDOC_Order,A.Node_Order),
Coalesce(A.NDOC_DOC_ID,A.OMNIBOOK_ID)
I've seen this before when the statistics haven't kept up with the data. It's possible in this instance that SQL Server 2005 uses the statistics differently to SQL Server 2000. Try rebuilding your statistics for the tables used in the query; so for each table:
UPDATE STATISTICS <table> WITH FULLSCAN
Yes, I'd add the FULLSCAN unless you know your data well enough to think that a sample of records will give good enough results. It'll slow down the stats creation, but will make it more accurate.
Is it possible that your statistics haven't come across in the 2005 database, so it doesn't have the information needed to make a good plan, as opposed to your old database which has good statistics on the tables and can choose a better plan for the data?
Could it be an issue with "parameter sniffing", i.e. SQL Server caching a query plan optimized for the parameters supplied for the first execution?
Microsoft technet has more
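If parameter sniffing is the culprit, two common mitigations (sketch only, not from the original answer):
EXEC sp_recompile N'dbo.GetNodeList';  -- throw away the cached plan for the procedure
-- or, inside the procedure, end the main SELECT with:
-- OPTION (RECOMPILE)  -- compile for the actual parameter values on every run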
A colleague has come up with a solution... regarding bringing the function fn_UserCanSeeDoc back into the SQL inline.
Shown below is the old commented out function code, then the new inline SQL below it. The code now runs super quick (from over 1 minute to about a second)
Looking at the old SQL I'm surprised how good a job SQL2000 did of running it!
Cheers
--and dbo.fn_UserCanSeeDoc(@UserID,D.Doc_ID)<>0
-- if exists(Select 1 from Omnibook where Omnibook_ID = @DocID)
-- Begin
-- Set @ReturnVal = 1
-- End
--
-- else
-- Begin
-- if exists(
-- Select 1
-- from UserSecurityModule USM
-- Inner join DocSecurity DS
-- On USM.SecurityModuleID = DS.SecurityModuleID
-- where USM.UserID = @UserID
-- and DS.DocID = @DocID
-- )
--
-- Set @ReturnVal = 1
--
-- else
--
-- Set @ReturnVal = 0
-- End
AND D.Doc_ID IN (select DS.DocID from UserSecurityModule USM
Inner join DocSecurity DS
On USM.SecurityModuleID = DS.SecurityModuleID
where USM.UserID = @UserID)