SQL Server - Using joins in an UPDATE statement

I have an HRUser table and an Audit table; both are in production with a large number of rows.
I have now added one more column to my HRUser table, called IsActivated.
I need to create a one-time script which will be executed in production to populate this IsActivated column. From then on, whenever a user activates their account, the application will update the HRUser table's IsActivated column automatically.
To populate the IsActivated column in the HRUser table, I need to check the Audit table to see whether the user has ever logged in.
UPDATE [dbo].HRUser
SET IsActivated = 1
FROM dbo.[UserAudit] A
JOIN dbo.[HRUser] U ON A.UserId = U.UserId
WHERE A.AuditTypeId = 14
AuditTypeId = 14 means the user has logged in. A user can log in any number of times, and every login is captured in the UserAudit table.
The logic is that if the user has logged in at least once, the user is considered activated.
This cannot be tested in lower environments and needs to be executed directly in production, because in the lower environments we don't have any data in the UserAudit table.
I am not really sure whether this works, as I have never used joins in an UPDATE statement. I am looking for suggestions for any better approach to accomplishing this task.

You could use EXISTS with a correlated subquery to filter on rows whose UserId has at least one audit event with AuditTypeId = 14:
UPDATE h
SET IsActivated = 1
FROM [dbo].HRUser h
WHERE EXISTS (
SELECT 1
FROM dbo.[UserAudit] a
WHERE a.UserId = h.UserId AND a.AuditTypeId = 14
)
Note that there is no point re-joining the target table inside the subquery; you just need to correlate the subquery with the outer query.

Two methods below. Method 1 is NOT recommended for tables "in production with large number of rows". But it is much easier to code. Method 2 works in production with no downtime.
Whichever method you choose: TEST it outside production. Copy the data from production; if you cannot do that, then build your own toy system. It is highly recommended that you test at some level before running either method in production.
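As a rough illustration of such a toy setup (the table and column names follow the question; the sample rows and the AuditId column are made up purely for testing):
-- Toy copies of the two tables with a handful of rows, so the UPDATE logic
-- can be sanity-checked in a scratch database before touching production.
CREATE TABLE dbo.HRUser (UserId INT PRIMARY KEY, IsActivated BIT NULL);
CREATE TABLE dbo.UserAudit (AuditId INT IDENTITY(1,1) PRIMARY KEY, UserId INT, AuditTypeId INT);
INSERT INTO dbo.HRUser (UserId) VALUES (1), (2), (3);
INSERT INTO dbo.UserAudit (UserId, AuditTypeId) VALUES (1, 14), (1, 14), (3, 7);  -- user 1 logged in twice, user 3 never did
-- run the candidate UPDATE here, then verify the result:
SELECT UserId, IsActivated FROM dbo.HRUser;  -- expect only UserId = 1 to end up activated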
METHOD 1:
Updating on a join is straightforward. Use an alias. Reminder: this is NOT RECOMMENDED "with large number of rows" AND production running. The SQL Server optimizer will most likely escalate locks on both tables and block them until the update completes. If you are taking an outage and are not concerned with how long the update takes, this method works.
UPDATE U
SET IsActivated = 1
FROM dbo.[UserAudit] A
JOIN dbo.[HRUser] U ON A.UserId = U.UserId
WHERE A.AuditTypeId = 14
METHOD 2:
If you cannot afford to stop your production systems for this update (and most of us cannot), then I recommend that you do 2 things:
Set up a loop with the transaction inside the loop. This means the optimizer will use row locks rather than blocking the entire table. This method may take longer, but it will not block production. If the update takes longer, that is not a concern as long as the devops team never calls because production is blocked.
Capture the rows to be updated outside the transaction. Then update based on a primary key (fastest). The total transaction time is how long the updated rows will be blocked.
Here is a toy example for looping.
-- STEP 1: get data to be updated
CREATE TABLE #selected ( ndx INT IDENTITY(1,1), UserId INT )
INSERT INTO #selected (UserId)
SELECT UserId
FROM dbo.[UserAudit] A
JOIN dbo.[HRUser] U ON A.UserId = U.UserId
WHERE A.AuditTypeId = 14
-- STEP 2: update on primary key in steps of 1000
DECLARE @RowsToUpdate INT = 1000
, @LastId INT = 0
, @RowCnt INT = 0
DECLARE @v TABLE(ndx INT, UserId INT)
WHILE 1=1
BEGIN
DELETE @v
INSERT INTO @v
SELECT TOP(@RowsToUpdate) *
FROM #selected WHERE ndx > @LastId
ORDER BY ndx
SET @RowCnt = @@ROWCOUNT
IF @RowCnt = 0
BREAK;
-- short transaction: only the current batch of HRUser rows is locked
BEGIN TRANSACTION
UPDATE a
SET IsActivated = 1
FROM @v v
JOIN dbo.HRUser a ON a.UserId = v.UserId
COMMIT TRANSACTION
SELECT @LastId = MAX(ndx) FROM @v
END

Related

SELECT data first then UPDATE using clustered index or UPDATE directly?

There's a table (in SQL Server) which has high concurrent access (e.g. Transactions), and I need to update some data in this table. However, the column(s) I filter on to find the rows that need updating have no indexes.
What would be the best approach to minimize table/row lock time?
Approach 1:
DECLARE @vt_TxnIds TABLE
(
[Id] INT
)
/** Filter the required data first **/
INSERT INTO @vt_TxnIds
SELECT TXN.[Id]
FROM [Transactions] TXN WITH (NOLOCK) -- NOLOCK is fine in this case
LEFT JOIN [Account] ACC WITH (NOLOCK)
ON ACC.[Id] = TXN.[AccountId] AND
ACC.[IsActive] = 0
WHERE TXN.[Status] = 1 -- This column is not indexed
AND ACC.[Id] IS NULL
/** Then update by clustered Index **/
BEGIN TRAN;
UPDATE [Transactions]
SET [Status] = 5
WHERE [Id] IN ( -- [Id] is clustered index
SELECT [Id]
FROM @vt_TxnIds
)
COMMIT;
Approach 2:
BEGIN TRAN;
UPDATE TXN
SET TXN.[Status] = 5
FROM [Transactions] TXN
LEFT JOIN [Account] ACC WITH (NOLOCK)
ON ACC.[Id] = TXN.[AccountId] AND
ACC.[IsActive] = 0
WHERE TXN.[Status] = 1 -- This column is not indexed
AND ACC.[Id] IS NULL
COMMIT;
I'm not concerned about total execution time. For example, in my case it's okay if the whole query takes 15 seconds as long as the table/rows are locked for only 5 seconds, rather than the whole table being locked for 10 seconds with the query also taking 10 seconds.
Could someone please suggest the best approach, or any alternative approach, that fulfils my requirement?
Many thanks!
Update: Creating a new index is not an option.
The first option is pointless extra work, and does not conform to ACID properties.
The unmentioned Approach #3 is best:
Approach #2 is good as a starting point
Remove the transaction as it is only a single statement
Remove the NOLOCK hint as that will just cause incorrect results and weird errors
Convert the left-join to a NOT EXISTS which is often more efficient.
UPDATE TXN
SET TXN.Status = 5
FROM Transactions TXN
WHERE TXN.Status = 1
AND NOT EXISTS (SELECT 1
FROM Account ACC
WHERE ACC.Id = TXN.AccountId
AND ACC.IsActive = 0
);
For this to work efficiently, you will want indexes (either clustered or non-clustered)
TXN (Status, AccountId)
ACC (IsActive, Id)
Alternatively you can use filtered non-clustered indexes
TXN (AccountId) INCLUDE (Status) WHERE (Status = 1)
ACC (Id) INCLUDE (IsActive) WHERE (IsActive = 0)
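Spelled out as DDL, those suggestions might look roughly like this (a sketch only; the index names are made up and should be checked against your own schema and naming conventions):
-- regular covering non-clustered indexes
CREATE NONCLUSTERED INDEX IX_Transactions_Status_AccountId ON dbo.Transactions (Status, AccountId);
CREATE NONCLUSTERED INDEX IX_Account_IsActive_Id ON dbo.Account (IsActive, Id);
-- or the filtered alternatives, which only index the rows this UPDATE cares about
CREATE NONCLUSTERED INDEX IX_Transactions_AccountId_Status1 ON dbo.Transactions (AccountId) INCLUDE (Status) WHERE Status = 1;
CREATE NONCLUSTERED INDEX IX_Account_Id_IsActive0 ON dbo.Account (Id) INCLUDE (IsActive) WHERE IsActive = 0;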
If you want to prevent a lot of locking and/or you cannot add the indexes, you can do the update in a loop on a few rows at a time.
Note that a transaction is not used here, to prevent excessive locking. Obviously you cannot roll back each run of the loop once finished.
DECLARE @batchSize bigint = 1000;
WHILE (1=1)
BEGIN
UPDATE TOP (@batchSize) TXN
SET TXN.Status = 5
FROM Transactions TXN
WHERE TXN.Status = 1
AND NOT EXISTS (SELECT 1
FROM Account ACC
WHERE ACC.Id = TXN.AccountId
AND ACC.IsActive = 0
);
IF @@ROWCOUNT < @batchSize
BREAK;
WAITFOR DELAY '00:00:05'; -- or some other wait time
END;
Presumably this update is required for your application to function correctly. When dealing with an overzealous database administrator (I didn't say "incompetent", did I? :-)), you, the developer, get the application right and leave the DBA to sort out the performance and table-locking problems. They can always add an index later when your production code gets slow. To which you say "hey, good idea!", presuming they ask you.
The same logic holds true for NOLOCK. The DBA can tell you if that's necessary. (It probably isn't.) Leave it out of your work.
Your objective here is to minimize the time during which a table is locked, as you said. Your secondary objective is to minimize the number of rows involved in any particular UPDATE operation.
You can do that, in SQL Server, by using TOP (n) to control the number of rows. That means you do multiple UPDATEs and keep going until the job is done. This kind of thing will work. (not debugged.)
DECLARE @batchsize int, @count int;
SET @batchsize = 100;
SET @count = 1;
WHILE @count > 0 BEGIN
SET DEADLOCK_PRIORITY LOW;
UPDATE TOP (@batchsize) TXN
SET TXN.[Status] = 5
FROM [Transactions] TXN
LEFT JOIN [Account] ACC
ON ACC.[Id] = TXN.[AccountId] AND ACC.[IsActive] = 0
WHERE TXN.[Status] = 1
AND ACC.[Id] IS NULL;
SET @count = @@ROWCOUNT;
END;
This works because your UPDATE sets Transactions.Status to 5. Once a row has been updated, that same row won't be chosen again for update.
Setting the deadlock priority to low is a good idea for this sort of repeating operation. If your update query somehow causes a deadlock with other application code, it tells SQL Server to stop your query rather than the others. Stopping your query doesn't matter: your update will catch the same rows the next time it runs.
Now, obviously, this doesn't update the whole table in a single ACID transaction; each batch is its own transaction. But I suspect that will not damage your application, or your transactional code would have done the update in real time.

Is it possible to pick one result set to return from a batch execution?

I have to retrieve some data from an MSSQL 2014 server through a proprietary application which uses an ODBC data source. The only thing I can modify is the query the application uses. I cannot modify the application or how it handles the results.
The following query does what I want if I execute it directly, e.g. in Heidi.
USE MY_DB;
BEGIN TRANSACTION
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DECLARE @myvar1 INT = 2;
DECLARE @myvar2 INT = 2;
PRINT @myvar1;
SELECT TOP 20 [vwPGIA].[OrNum],[vwPGIA].[DBM],[vwPGIA].[MBM],[vwPGIA].[MN],[NOMID],[Priority],SUBSTRING([Comment],0,254) AS Comment,[TLSAP],[Box],[SequenceNumber]
INTO #tmp_tbl
FROM [MY_DB].[dbo].[vwPGIA]
INNER JOIN [MY_DB].[dbo].[tblDLA] ON [dbo].[tblDLA].[OrNum]=[dbo].[vwPGIA].[OrNum]
INNER JOIN [dbo].[tblMDM] ON [vwPGIA].[MBM]=[tblMDM].[MBM]
WHERE ([TLSAP] = @myvar1)
AND [vwPGIA].[MBM] NOT IN (SELECT [MBM] FROM [MY_DB].[dbo].[vwDPS])
AND [vwPGIA].[OrNum] NOT IN (SELECT [OrNum] FROM [MY_DB].[dbo].[vwDPS] WHERE [MY_DB].[dbo].[vwDPS].[TLR] <> @myvar1)
ORDER BY [SequenceNumber];
SELECT TOP 1 [OrNum],[DBM],[MBM],[MN],[NOMID],[Priority],[Comment],[TLSAP],[Box],[WTT],[SequenceNumber]
FROM #tmp_tbl
INNER JOIN [dbo].[tblTBN] ON [Box]=[BoxN]
WHERE ([WTT]=@myvar2)
ORDER BY [SequenceNumber];
INSERT INTO [dbo].[tblDPS]
(OrNum,DBM,MBM,State,StateStartTime,Info,TLR)
SELECT TOP 1 [OrNum],[DBM],[MBM],'1',GETDATE(),'info',@myvar1
FROM #tmp_tbl
INNER JOIN [dbo].[tblTBN] ON [Box]=[BoxN]
WHERE ([WTT]=@myvar2)
ORDER BY [SequenceNumber]
;
DROP TABLE #tmp_tbl;
COMMIT TRANSACTION
Running this through the ODBC interface results in an empty result. The problem seems to be that I am doing a batch request which produces multiple result sets. The application probably only handles the first result set, or maybe it cannot handle more than one result set.
Finally the question: Is there a way or workaround to reduce the result sets to only the one returned by the SELECT TOP 1 ... part?
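One workaround worth trying (this is not from the thread; it assumes the extra "result sets" the driver sees are the row-count messages produced by the other statements) is to start the batch with SET NOCOUNT ON and drop the PRINT, so that only the SELECT TOP 1 returns anything to the client:
USE MY_DB;
SET NOCOUNT ON;  -- suppress the "n rows affected" messages for every statement in the batch
BEGIN TRANSACTION
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DECLARE @myvar1 INT = 2;
DECLARE @myvar2 INT = 2;
-- PRINT removed: some clients surface it as an extra message stream
-- ... the same SELECT ... INTO #tmp_tbl, SELECT TOP 1, INSERT and DROP TABLE as above ...
COMMIT TRANSACTION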

Performance of In Operator with OR conditional SQL

I have a common clause in most of my procedures, like:
Select * from TABLE A + Joins where <Conditions>
And
(
-- All Broker
('True' = (Select AllBrokers from SiteUser where ID = @SiteUserID))
OR
(
A.BrokerID in
(
Select BrokerID from SiteUserBroker
where SiteUserID = @SiteUserID)
)
)
So basically, if the user has access to all brokers, the whole filter should not be applied; otherwise it should restrict to the user's list of brokers.
I am a bit worried about the performance, as this is used in a lot of procedures and the data has started reaching over 100,000 records and will grow soon. Can this be written better?
Any ideas are highly appreciated.
One technique is to build a dynamic T-SQL statement and then execute it. Since this is done in a stored procedure you are OK, and the idea is simple.
DECLARE @DynamicTSQLStatement NVARCHAR(MAX);
SET @DynamicTSQLStatement = 'base query';
IF 'Getting All Brokers is not allowed'
BEGIN;
SET @DynamicTSQLStatement = @DynamicTSQLStatement + 'additional where clause'
END;
EXEC sp_executesql @DynamicTSQLStatement;
Or, instead of using a dynamic T-SQL statement, you can have two separate queries - one for users seeing all the data and one for users seeing part of the data. This can lead to code duplication.
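For illustration, that two-query variant might be structured like this (a sketch only; TABLE A, the joins and <Conditions> are placeholders carried over from the question):
IF (SELECT AllBrokers FROM SiteUser WHERE ID = @SiteUserID) = 'True'
BEGIN
    -- user sees everything: no broker filter at all
    SELECT * FROM TABLE A
    -- + joins
    WHERE <Conditions>;
END
ELSE
BEGIN
    -- user sees only their own brokers
    SELECT * FROM TABLE A
    -- + joins
    WHERE <Conditions>
    AND A.BrokerID IN (SELECT BrokerID FROM SiteUserBroker WHERE SiteUserID = @SiteUserID);
END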
Another way is to turn this OR condition into an INNER JOIN. You should test the performance in order to be sure you are gaining something from it. The idea is to create a temporary table (it can have a primary key or indexes if needed) and store all visible broker IDs there: if the user sees all brokers, then Select BrokerID from SiteUserBroker, and if the user sees only a few, Select BrokerID from SiteUserBroker where SiteUserID = @SiteUserID. This way you simplify the whole statement, but be sure to test whether performance is improved.
CREATE TABLE #SiteUserBroker
(
[BrokerID] INT PRIMARY KEY
);
INSERT INTO #SiteUserBroker ([BrokerID])
SELECT BrokerID
FROM SiteUserBroker
where SiteUserID = @SiteUserID
OR ('True' = (Select AllBrokers from SiteUser where ID = @SiteUserID));
Select *
from TABLE A
INNER JOIN #SiteUserBroker B
ON A.BrokerID = B.[BrokerID]
-- other joins
where <Conditions>
As we are using an INNER JOIN, you can add it at the beginning. If there are LEFT JOINs after it, it will affect performance in a positive way.
Adding to @gotqn's answer, you can use EXISTS instead of IN (note: this is not a complete answer) -
AND EXISTS (
Select 1/0 from SiteUserBroker X
where A.BrokerID = X.BrokerID AND
X.SiteUserID = @SiteUserID
)
I have found that EXISTS performs better than IN in some cases. Please verify for your case.

Is a transaction that only updates a single table always isolated?

According to the UPDATE documentation, an UPDATE always acquires an exclusive lock on the whole table. However, I am wondering if the exclusive lock is acquired before the rows to be updated are determined or only just before the actual update.
My concrete problem is that I have a nested SELECT in my UPDATE like this:
UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id
FROM Tasks
WHERE Type = 1
AND (SELECT COUNT(*)
FROM Tasks
WHERE Status = 'Active') = 0
ORDER BY Id)
Now I am wondering whether it is really guaranteed that there is exactly one task with Status = 'Active' afterwards, if the same statement may be executed in parallel with another Type:
UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id
FROM Tasks
WHERE Type = 2 -- <== The only difference
AND (SELECT COUNT(*)
FROM Tasks
WHERE Status = 'Active') = 0
ORDER BY Id)
If, for both statements, the rows to change were determined before the lock is acquired, I could end up with two active tasks, which I must prevent.
If this is the case, how can I prevent it? Can I prevent it without setting the transaction level to SERIALIZABLE or messing with lock hints?
From the answer to Is a single SQL Server statement atomic and consistent? I learned that the problem arises when the nested SELECT accesses another table. However, I'm not sure if I have to care about this issue if only the updated table is concerned.
If you want exactly one task with Status = 'Active', then set up the table to ensure this is true. Use a filtered unique index:
create unique index unq_tasks_status_filter_active on tasks(status)
where status = 'Active';
A second concurrent update might fail, but you will be ensured of uniqueness. Your application code can process such failed updates, and re-try.
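For example, the retry path could be handled roughly like this (a hypothetical error-handling sketch, not part of the original answer; 2601 is the duplicate-key error number raised by a unique index):
BEGIN TRY
    UPDATE Tasks
    SET Status = 'Active'
    WHERE Id = (SELECT TOP 1 Id FROM Tasks WHERE Type = 1 ORDER BY Id);
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 2601
        PRINT 'Another task is already active - retry later.';  -- the other session won the race
    ELSE
        THROW;  -- re-raise anything unexpected
END CATCH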
Relying on the actual execution plans of the updates might be dangerous. That is why it is safer to have the database do such validations. Underlying implementation details could vary, depending on the environment and version of SQL Server. For instance, what works in a single threaded, single processor environment may not work in a parallel environment. What works with one isolation level may not work with another.
EDIT:
And, I cannot resist. For efficiency purposes, consider writing the query as:
UPDATE Tasks
SET Status = 'Active'
WHERE NOT EXISTS (SELECT 1
FROM Tasks
WHERE Status = 'Active'
) AND
Id = (SELECT TOP 1 Id
FROM Tasks
WHERE Type = 2 -- <== The only difference
ORDER BY Id
);
Then place indexes on Tasks(Status) and Tasks(Type, Id). In fact, with the right query, you might find that the query is so fast (despite the update on the index) that your worry about concurrent updates is greatly mitigated. This would not solve a race condition, but it might at least make it rare.
And if you are capturing errors, then with the unique filtered index, you could just do:
UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id
FROM Tasks
WHERE Type = 2 -- <== The only difference
ORDER BY Id
);
This will return an error if a row already is active.
Note: all these queries and concepts can be applied to "one active per group". This answer is addressing the question that you asked. If you have a "one active per group" problem, then consider asking another question.
This is not an answer to your question... but your query is a pain for my eyes :)
;WITH cte AS
(
SELECT *, RowNum = ROW_NUMBER() OVER (PARTITION BY [type] ORDER BY id)
FROM Tasks
)
UPDATE cte
SET [Status] = 'Active'
WHERE RowNum = 1
AND [type] = 1
AND NOT EXISTS(
SELECT 1
FROM Tasks
WHERE [Status] = 'Active'
)
No: the nested SELECT statement can be processed before the update starts and before the locks are acquired. To make sure that no other query interferes with this update, you need to set the transaction isolation level to SERIALIZABLE.
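A minimal sketch of that suggestion (wrap the statement in an explicit transaction under SERIALIZABLE; under contention one of two concurrent runs may block or be chosen as a deadlock victim and need a retry):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id
            FROM Tasks
            WHERE Type = 1
            AND (SELECT COUNT(*) FROM Tasks WHERE Status = 'Active') = 0
            ORDER BY Id);
COMMIT TRANSACTION;
-- The range read by the nested SELECT stays protected until COMMIT, so a second
-- concurrent copy of this statement cannot commit a second 'Active' row.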
This article (and the series it is part of) explains the subtleties of concurrency in SQL Server very well:
http://sqlperformance.com/2014/02/t-sql-queries/confusion-caused-by-trusting-acid

SQL Server 2008 trigger not working correctly with multiple inserts

I've got the following trigger:
CREATE TRIGGER trFLightAndDestination
ON checkin_flight
AFTER INSERT,UPDATE
AS
BEGIN
IF NOT EXISTS
(
SELECT 1
FROM Flight v
INNER JOIN Inserted AS i ON i.flightnumber = v.flightnumber
INNER JOIN checkin_destination AS ib ON ib.airport = v.airport
INNER JOIN checkin_company AS im ON im.company = v.company
WHERE i.desk = ib.desk AND i.desk = im.desk
)
BEGIN
RAISERROR('This combination of flight and check-in desk is not possible',16,1)
ROLLBACK TRAN
END
END
What I want the trigger to do is to check the tables Flight, checkin_destination and checkin_company when a new record is added to checkin_flight. Every record of checkin_flight contains a flightnumber and the desk number where passengers need to check in for this destination.
The tables checkin_destination and checkin_company contain information about which companies and destinations are restricted to certain check-in desks. When adding a record to checkin_flight, I need information from the Flight table to get the destination and flight company for the inserted flightnumber. This information needs to be checked against the allowed check-in combinations of flights, destinations and companies.
I'm using the trigger as stated above, but when I try to insert a wrong combination the trigger allows it. What am I missing here?
EDIT 1:
I'm using the following multiple insert statement
INSERT INTO checkin_flight VALUES (5315,3),(5316,3),(5316,2)
-- 5315 is the flightnumber, 3 is the desk number to check in for that flight
EDIT 2:
I tested a single-row insert with an invalid combination, and the error is thrown correctly. So it's the multi-row insert that seems to cause the problem.
The problem is that your logic allows through any insert that includes at least one valid set of values. It only fails if all of the inserted records are invalid, rather than failing if any of the inserted records are invalid.
Change your IF NOT EXISTS (...) to IF EXISTS (...) and change your SELECT statement to return invalid flights.
eg:
IF EXISTS
(
SELECT 1
FROM Flight v
INNER JOIN Inserted AS i ON i.flightnumber = v.flightnumber
LEFT JOIN checkin_destination AS ib ON ib.airport = v.airport
AND i.desk = ib.desk
LEFT JOIN checkin_company AS im ON im.company = v.company
AND i.desk = im.desk
WHERE (im.desk IS NULL OR ib.desk IS NULL)
)
BEGIN
RAISERROR('This combination of flight and check-in desk is not possible',16,1)
ROLLBACK TRAN
END
I'm not sure of your business logic, but you need to check that the query does the proper thing.
Your problem is the IF NOT EXISTS: if the condition is true for just 1 of the 3 rows in inserted, the NOT EXISTS test fails and no error is raised. You need to convert it to find a problem row and use IF EXISTS, then error out.
However, when in a trigger, the best way to error out is:
RAISERROR()
ROLLBACK TRANSACTION
RETURN
I kind of doubt that the lack of a RETURN is your problem, but it is always best to include the three Rs when erroring out in a trigger.
The problem is that the condition will be true if even one of the inserted records is correct. You have to check that all records are correct, e.g.:
if (
(
select count(*) from inserted
) = (
select count(*) from flight v
inner join inserted i ...
)
) ...
The inserted table can contain multiple rows, and therefore all logic within a trigger MUST be able to apply to all of those rows. The idea that a trigger fires once per affected row is a common misunderstanding: SQL Server fires an AFTER trigger once per statement, so a multi-row insert arrives as a single call with all the rows in inserted.
To fix this, you might take a COUNT() of inserted, compare it with a COUNT() of the rows that match the conditions, and raise an error if there is a mismatch, as sketched below.
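Putting that idea together as a sketch (illustrative only; it reuses the join conditions from the original trigger and assumes each inserted row matches at most one row in each lookup table):
IF (SELECT COUNT(*) FROM Inserted) <>
   (SELECT COUNT(*)
    FROM Inserted AS i
    INNER JOIN Flight AS v ON i.flightnumber = v.flightnumber
    INNER JOIN checkin_destination AS ib ON ib.airport = v.airport AND i.desk = ib.desk
    INNER JOIN checkin_company AS im ON im.company = v.company AND i.desk = im.desk)
BEGIN
    RAISERROR('This combination of flight and check-in desk is not possible',16,1)
    ROLLBACK TRANSACTION
    RETURN
END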