How to lock a row in a table before updating it - sql

To implement multithreading in SQL Server 2012 for one of my update tasks, I need different threads to select a row from a table (Accounts) and mark that row as processed, using an update in a stored procedure.
Something like this:
create procedure ChooseNextAccountToProcess (@Account_ID Int Output)
as
select top 1 @Account_ID = Account_ID
from Accounts
order by LastProcessDate Desc
update Accounts
set LastProcessDate = getdate()
where Account_ID = @Account_ID
go
The problem with this approach is that two threads might call this stored procedure at exactly the same time and process the same account. My goal is to select an account from the Accounts table and lock it exclusively before the update has a chance to run.
I tried SELECT ... WITH (UPDLOCK) and an exclusive lock hint, but neither of these actually puts an exclusive lock on the row when I select it.
Any suggestions?

You can use update top (n) ..., but you can't specify order by directly in the statement. So, a little trick is in order:
declare @t table (
Id int primary key,
LastProcessDate date not null
);
insert into @t (Id, LastProcessDate)
values
(1, getdate() - 10),
(2, getdate() - 7),
(3, getdate() - 1),
(4, getdate() - 4);
-- Your stored procedure code starts from here
declare @res table (Id int primary key);
declare @AccountId int;
update a set LastProcessDate = getdate()
output inserted.Id into @res(Id)
from (select top (1) * from @t order by LastProcessDate desc) a;
select @AccountId = Id from @res;
-- Returns 3
select @AccountId;
Not exactly a one-liner, but close to it, yes.
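For the stored procedure in the question, the same single-statement update can be combined with locking hints so that concurrent callers skip a row another session has already claimed. This is a minimal sketch rather than a tested implementation: it assumes the Accounts table and column names from the question, and the UPDLOCK, READPAST and ROWLOCK hints are an addition here, not part of the code above. The ordering is kept exactly as in the question.

```sql
CREATE PROCEDURE dbo.ChooseNextAccountToProcess
    @Account_ID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- Holds the key of the row claimed by this call.
    DECLARE @picked TABLE (Account_ID int PRIMARY KEY);

    -- UPDLOCK: lock the candidate row while it is being read.
    -- READPAST: skip rows that another session currently has locked.
    -- ROWLOCK: ask for row-level locks rather than page or table locks.
    WITH next_account AS (
        SELECT TOP (1) Account_ID, LastProcessDate
        FROM dbo.Accounts WITH (UPDLOCK, READPAST, ROWLOCK)
        ORDER BY LastProcessDate DESC
    )
    UPDATE next_account
    SET LastProcessDate = GETDATE()
    OUTPUT inserted.Account_ID INTO @picked (Account_ID);

    -- Stays NULL when no unlocked row was available.
    SELECT @Account_ID = Account_ID FROM @picked;
END;
GO

-- Example call
DECLARE @id int;
EXEC dbo.ChooseNextAccountToProcess @Account_ID = @id OUTPUT;
SELECT @id AS ClaimedAccount;
```

Because the select and the update happen in one statement, there is no window between reading the row and marking it as processed.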

Related

SQL Server: automatically add a unique identifier to all rows inserted at one time

The SQL Server code below calculates the monthly pay for all employees and inserts it, along with each StaffID, into Tablepayroll.
INSERT INTO Tablepayroll (StaffID,Totalpaid)
(SELECT Tabletimelog.StaffID , Tabletimelog.hoursworked * Tablestaff.hourlypay
FROM Tabletimelog
JOIN Tablestaff ON
Tabletimelog.StaffID = Tablestaff.StaffID)
However, I also want to insert a batchID so that each run of the above insert, and the records it inserted, can be identified. All staff payroll calculated at the same time would have the same batchID number, and each subsequent batchID should simply increase by 1.
Please see the image below for a visual explanation.
I think that SELECT MAX(batch_id) + 1 would work, but I don't know how to include it in the insert statement.
You can use a subquery to find the latest batch_id in your current table:
INSERT INTO TablePayroll (StaffID, TotalPaid, batch_id)
SELECT T1.StaffID
, T1.HoursWorked * T2.HourlyPay
, ISNULL((SELECT MAX(batch_id) FROM TablePayRoll), 0) + 1 AS batch_id
FROM TableTimeLog AS T1
INNER JOIN TableStaff AS T2
ON T1.StaffID = T2.StaffID;
As you can see, I just add 1 to the current MAX(batch_id), and that's it.
By the way, learn to use aliases. It will make your life easier.
Yet another solution would be to make batch_id a GUID, so you wouldn't have to create sequences or read MAX(batch_id) from the current table.
DECLARE @batch_id UNIQUEIDENTIFIER = NEWID();
INSERT INTO TablePayroll (StaffID, TotalPaid, batch_id)
SELECT T1.StaffID, T1.HoursWorked * T2.HourlyPay, @batch_id
FROM TableTimeLog AS T1
INNER JOIN TableStaff AS T2
ON T1.StaffID = T2.StaffID;
Updated
First of all, obtaining the maximum value in a large table (judging by its name, this table is probably big) can be very expensive, especially if there is no index on the batch_id column.
Secondly, note that SELECT MAX(batch_id) + 1 may behave incorrectly under concurrent inserts. The solution from @EvaldasBuinauskas, without an open transaction and the right isolation level, can also produce the same batch_id if two inserts run in parallel.
If you are on SQL Server 2012 or higher, you can try a SEQUENCE. This at least ensures that batch_id values are not duplicated.
Creating SEQUENCE:
CREATE SEQUENCE dbo.BatchID
START WITH 1
INCREMENT BY 1 ;
-- DROP SEQUENCE dbo.BatchID
GO
And using it:
DECLARE @BatchID INT
SET @BatchID = NEXT VALUE FOR dbo.BatchID;
INSERT INTO Tablepayroll (StaffID,Totalpaid, batch_id)
(SELECT Tabletimelog.StaffID , Tabletimelog.hoursworked * Tablestaff.hourlypay, @BatchID
FROM Tabletimelog
JOIN Tablestaff ON Tabletimelog.StaffID = Tablestaff.StaffID)
An alternative to SEQUENCE is an additional table:
CREATE TABLE dbo.Batch (
ID INT NOT NULL IDENTITY
CONSTRAINT PK_Batch PRIMARY KEY CLUSTERED
,DT DATETIME
CONSTRAINT DF_Batch_DT DEFAULT GETDATE()
);
This solution works even on older versions of the server.
DECLARE @BatchID INT
INSERT INTO dbo.Batch (DT)
VALUES (GETDATE());
SET @BatchID = SCOPE_IDENTITY();
INSERT INTO Tablepayroll (StaffID,Totalpaid, batch_id)
(SELECT Tabletimelog.StaffID , Tabletimelog.hoursworked * Tablestaff.hourlypay, @BatchID
FROM Tabletimelog ...
And yes, none of these solutions guarantees gap-free numbering. Gaps can appear when a transaction is rolled back (for example, after a deadlock).
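If MAX(batch_id) + 1 has to be kept, the concurrency concern can be addressed by reading the maximum and inserting inside one transaction while holding a lock. The following is a rough sketch under assumptions: it reuses the table and column names from the answers above, and the TABLOCKX hint is one possible choice made here for simplicity. It serializes every batch run on the table, which may be too heavy for a busy system.

```sql
BEGIN TRANSACTION;

DECLARE @batch_id int;

-- TABLOCKX takes an exclusive table lock that is held until COMMIT,
-- so a second session running the same code waits at this statement
-- and then reads the already incremented maximum.
SELECT @batch_id = ISNULL(MAX(batch_id), 0) + 1
FROM TablePayroll WITH (TABLOCKX);

INSERT INTO TablePayroll (StaffID, TotalPaid, batch_id)
SELECT T1.StaffID, T1.HoursWorked * T2.HourlyPay, @batch_id
FROM TableTimeLog AS T1
INNER JOIN TableStaff AS T2
    ON T1.StaffID = T2.StaffID;

COMMIT TRANSACTION;
```

A SEQUENCE or the separate Batch table avoids this blocking, at the cost of possible gaps in the numbering.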

Left join with nearest value without duplicates

I want to achieve something like the below in MS SQL, using two tables and a join instead of iteration.
For each row in table A, I want to identify its nearest value in table B, and when a value has been selected, that value cannot be re-used. Please help if you've done something like this before. Thank you in advance! #SOreadyToAsk
Below is a set-based solution using CTEs and windowing functions.
The ranked_matches CTE assigns a closest match rank for each row in TableA along with a closest match rank for each row in TableB, using the index value as a tie breaker.
The best_matches CTE returns rows from ranked_matches that have the best rank (rank value 1) for both rankings.
Finally, the outer query uses a LEFT JOIN from TableA to the best_matches CTE to include the TableA rows that were not assigned a best match because their closest match was already assigned.
Note that this does not return a match for the index 3 TableA row indicated in your sample results. The closest match for this row is TableB index 3, a difference of 83. However, that TableB row is a closer match to the TableA index 2 row, a difference of 14, so it was already assigned. Please clarify your question if this isn't what you want. I think this technique can be tweaked accordingly.
CREATE TABLE dbo.TableA(
[index] int NOT NULL
CONSTRAINT PK_TableA PRIMARY KEY
, value int
);
CREATE TABLE dbo.TableB(
[index] int NOT NULL
CONSTRAINT PK_TableB PRIMARY KEY
, value int
);
INSERT INTO dbo.TableA
( [index], value )
VALUES ( 1, 123 ),
( 2, 245 ),
( 3, 342 ),
( 4, 456 ),
( 5, 608 );
INSERT INTO dbo.TableB
( [index], value )
VALUES ( 1, 152 ),
( 2, 159 ),
( 3, 259 );
WITH
ranked_matches AS (
SELECT
a.[index] AS a_index
, a.value AS a_value
, b.[index] b_index
, b.value AS b_value
, RANK() OVER(PARTITION BY a.[index] ORDER BY ABS(a.Value - b.value), b.[index]) AS a_match_rank
, RANK() OVER(PARTITION BY b.[index] ORDER BY ABS(a.Value - b.value), a.[index]) AS b_match_rank
FROM dbo.TableA AS a
CROSS JOIN dbo.TableB AS b
)
, best_matches AS (
SELECT
a_index
, a_value
, b_index
, b_value
FROM ranked_matches
WHERE
a_match_rank = 1
AND b_match_rank= 1
)
SELECT
TableA.[index] AS a_index
, TableA.value AS a_value
, best_matches.b_index
, best_matches.b_value
FROM dbo.TableA
LEFT JOIN best_matches ON
best_matches.a_index = TableA.[index]
ORDER BY
TableA.[index];
EDIT:
Although this method uses CTEs, recursion is not used and is therefore not limited to 32K recursions. There may be room for improvement here from a performance perspective, though.
I don't think it is possible without a cursor.
Even if it is possible to do it without a cursor, it would definitely require self-joins, maybe more than once. As a result performance is likely to be poor, likely worse than straight-forward cursor. And it is likely that it would be hard to understand the logic and later maintain this code. Sometimes cursors are useful.
The main difficulty is this part of the question:
when a value has been selected, that value cannot be re-used.
There was a similar question just a few days ago.
The logic is straightforward. The cursor loops through all rows of table A and, with each iteration, adds one row to the temporary destination table. To determine the value to add, I use the EXCEPT operator, which takes all values from table B and removes those that have been used before. My solution assumes that there are no duplicate values in table B, because the EXCEPT operator removes duplicates. If the values in table B are not unique, the temporary table would hold the unique indexB instead of valueB, but the main logic remains the same.
Here is SQL Fiddle.
Sample data
DECLARE @TA TABLE (idx int, value int);
INSERT INTO @TA (idx, value) VALUES
(1, 123),
(2, 245),
(3, 342),
(4, 456),
(5, 608);
DECLARE @TB TABLE (idx int, value int);
INSERT INTO @TB (idx, value) VALUES
(1, 152),
(2, 159),
(3, 259);
The main query inserts the result into the temporary table @TDst. It is possible to write that INSERT without the explicit variable @CurrValueB, but it looks a bit cleaner with the variable.
DECLARE @TDst TABLE (idx int, valueA int, valueB int);
DECLARE @CurrIdx int;
DECLARE @CurrValueA int;
DECLARE @CurrValueB int;
DECLARE @iFS int;
DECLARE @VarCursor CURSOR;
SET @VarCursor = CURSOR FAST_FORWARD
FOR
SELECT idx, value
FROM @TA
ORDER BY idx;
OPEN @VarCursor;
FETCH NEXT FROM @VarCursor INTO @CurrIdx, @CurrValueA;
SET @iFS = @@FETCH_STATUS;
WHILE @iFS = 0
BEGIN
SET @CurrValueB =
(
SELECT TOP(1) Diff.valueB
FROM
(
SELECT B.value AS valueB
FROM @TB AS B
EXCEPT -- remove values that have been selected before
SELECT Dst.valueB
FROM @TDst AS Dst
) AS Diff
ORDER BY ABS(Diff.valueB - @CurrValueA)
);
INSERT INTO @TDst (idx, valueA, valueB)
VALUES (@CurrIdx, @CurrValueA, @CurrValueB);
FETCH NEXT FROM @VarCursor INTO @CurrIdx, @CurrValueA;
SET @iFS = @@FETCH_STATUS;
END;
CLOSE @VarCursor;
DEALLOCATE @VarCursor;
SELECT * FROM @TDst ORDER BY idx;
Result
idx valueA valueB
1 123 152
2 245 259
3 342 159
4 456 NULL
5 608 NULL
It would help to have the following indexes:
TableA - (idx) include (value), because we SELECT idx, value ORDER BY idx;
TableB - (value) unique;
Temp destination table - (valueB) unique filtered NOT NULL, to help EXCEPT.
So, it may be better to use a temporary #table (or a permanent table) for the result instead of a table variable, because indexes cannot be added to a table variable with CREATE INDEX.
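As a sketch of the index suggestions above, assuming the data is held in temporary tables rather than table variables (the table names #TA, #TB, #TDst and all index names are illustrative):

```sql
CREATE TABLE #TA   (idx int NOT NULL, value int NOT NULL);
CREATE TABLE #TB   (idx int NOT NULL, value int NOT NULL);
CREATE TABLE #TDst (idx int NOT NULL, valueA int NOT NULL, valueB int NULL);

-- Covers SELECT idx, value ... ORDER BY idx in the cursor definition.
CREATE INDEX IX_TA_idx ON #TA (idx) INCLUDE (value);

-- Assumes the values in table B are unique, as stated above.
CREATE UNIQUE INDEX IX_TB_value ON #TB (value);

-- Filtered so that several unmatched rows (valueB IS NULL) are still allowed.
CREATE UNIQUE INDEX IX_TDst_valueB ON #TDst (valueB)
    WHERE valueB IS NOT NULL;
```

With tables this small the optimizer may ignore the indexes entirely; they only start to matter as the row counts grow.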
Another possible method would be to delete a row from table B (from original or from a copy) as its value is inserted into result. In this method we can avoid performing EXCEPT again and again and it could be faster overall, especially if it is OK to leave table B empty in the end. Still, I don't see how to avoid cursor and processing individual rows in sequence.
SQL Fiddle
DECLARE @TDst TABLE (idx int, valueA int, valueB int);
DECLARE @CurrIdx int;
DECLARE @CurrValueA int;
DECLARE @iFS int;
DECLARE @VarCursor CURSOR;
SET @VarCursor = CURSOR FAST_FORWARD
FOR
SELECT idx, value
FROM @TA
ORDER BY idx;
OPEN @VarCursor;
FETCH NEXT FROM @VarCursor INTO @CurrIdx, @CurrValueA;
SET @iFS = @@FETCH_STATUS;
WHILE @iFS = 0
BEGIN
WITH
CTE
AS
(
SELECT TOP(1) B.idx, B.value
FROM @TB AS B
ORDER BY ABS(B.value - @CurrValueA)
)
DELETE FROM CTE
OUTPUT @CurrIdx, @CurrValueA, deleted.value INTO @TDst;
FETCH NEXT FROM @VarCursor INTO @CurrIdx, @CurrValueA;
SET @iFS = @@FETCH_STATUS;
END;
CLOSE @VarCursor;
DEALLOCATE @VarCursor;
SELECT
A.idx
,A.value AS valueA
,Dst.valueB
FROM
@TA AS A
LEFT JOIN @TDst AS Dst ON Dst.idx = A.idx
ORDER BY idx;
I strongly believe THIS IS NOT GOOD PRACTICE, because I am bypassing SQL Server's own rule that functions may not have side effects (INSERT, UPDATE, DELETE). But because I wanted to solve this without resorting to iteration, I came up with this, and it gave me a better view of things.
create table tablea
(
num INT,
val MONEY
)
create table tableb
(
num INT,
val MONEY
)
I created a hard-table temp which I shall drop from time-to-time.
if((select 1 from sys.tables where name = 'temp_tableb') is not null) begin drop table temp_tableb end
select * into temp_tableb from tableb
I created a function that executes xp_cmdshell (this is where the side-effect bypassing happens)
CREATE FUNCTION [dbo].[GetNearestMatch]
(
@ParamValue MONEY
)
RETURNS MONEY
AS
BEGIN
DECLARE @ReturnNum MONEY
, @ID INT
SELECT TOP 1
@ID = num
, @ReturnNum = val
FROM temp_tableb ORDER BY ABS(val - @ParamValue)
DECLARE @SQL varchar(500)
SELECT @SQL = 'osql -S' + @@servername + ' -E -q "delete from test..temp_tableb where num = ' + CONVERT(NVARCHAR(150),@ID) + ' "'
EXEC master..xp_cmdshell @SQL
RETURN @ReturnNum
END
and my usage in my query simply looks like this.
-- initialize temp
if((select 1 from sys.tables where name = 'temp_tableb') is not null) begin drop table temp_tableb end
select * into temp_tableb from tableb
-- query nearest match
select
*
, dbo.GetNearestMatch(a.val) AS [NearestValue]
from tablea a
and gave me this..

Performance issue using Merge in azure sql stored proc

Problem: We are experiencing SQL timeouts that we believe are caused by a recent database change that altered the schema and introduced a new proc that handles deletion and insertion of rows in a table. This table is quite large, with around 4.5 million rows (but only five columns, three of which can be null), and is indexed with a primary key made up of two columns (UserID, GroupID). Unfortunately our database guy is unavailable and we are stuck between a rock and a hard place.
Question: Is there anything in the following stored procedure that sticks out as being a performance issue or is incorrectly done?
inputs:
UserID
GroupIDs (list of unique identifiers)
UpdateAdminID (Unique identifier of user who initiated stored proc)
Expectations:
When calling this stored procedure the expectation is that a row for each groupID will be inserted into UserGroups where it does not exist already. Also if a row is found that has the UserID parameter and a GroupID that is not in the input list then it must be deleted.
Procedure:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_UpdateUserGroups] (
@UserID uniqueidentifier,
@Groups IDs READONLY,
@UpdateAdminID uniqueidentifier = NULL
) AS
BEGIN
DECLARE @EnforceUserGroupHeirarchy bit
DECLARE @ClientID uniqueidentifier
DECLARE @AttributeClientID uniqueidentifier
select @ClientID = clientid from users where userid=@userid
select @AttributeClientID=isnull(ParentClientID, ClientID)from clients where ClientID=@ClientID
select @EnforceUserGroupHeirarchy= value from v_clientpreferences where preferenceid=323 and clientid=@AttributeClientID
DECLARE @NumRows INT
SET @NumRows = 1
CREATE TABLE #AllGroups
(
GroupID uniqueidentifier
)
insert into #AllGroups (GroupID) select id from @Groups
if @EnforceUserGroupHeirarchy = 1
BEGIN
WHILE @NumRows>0
BEGIN
INSERT INTO #AllGroups (GroupID)
(
SELECT groupid from groups where (parentgroupid in (select groupid from #AllGroups)) and groupid not in (select groupid from #AllGroups)
)
SET @NumRows = ROWCOUNT_BIG()
END
END
merge usergroups as T
USING (select @UserID as UserId, g.groupid as GroupId from #AllGroups g) as S
on (T.UserID = S.UserID and T.GroupID = S.GroupID)
WHEN NOT MATCHED BY TARGET
THEN INSERT (UserID, GroupID, DateCreated, UpdateAdminID) VALUES(S.Userid,S.GroupID,GetDate(),@UpdateAdminID)
WHEN NOT MATCHED BY SOURCE and T.UserID =@UserID
THEN DELETE;
END
GO
Edit: Fragmentation of indexes is less than 30 percent.
Edit2: When EnforceUserGroupHeirarchy is set, the proc looks at each group and recursively adds all of its children to #AllGroups.
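The child-group expansion described in Edit2 can also be written as a single recursive CTE instead of a WHILE loop. This is only a sketch under stated assumptions: the table variables @groups and @input and their int keys are stand-ins for the real groups table and the @Groups parameter, and it assumes the hierarchy contains no cycles. Whether it performs better than the loop would have to be measured against the real data.

```sql
-- Stand-in for the groups table (hypothetical sample data).
DECLARE @groups TABLE (groupid int PRIMARY KEY, parentgroupid int NULL);
INSERT INTO @groups (groupid, parentgroupid)
VALUES (1, NULL), (2, 1), (3, 2), (4, NULL);

-- Stand-in for the @Groups input list.
DECLARE @input TABLE (id int PRIMARY KEY);
INSERT INTO @input (id) VALUES (1);

WITH group_tree AS (
    -- Anchor: the groups passed in by the caller.
    SELECT i.id AS groupid
    FROM @input AS i
    UNION ALL
    -- Recursive step: children of any group found so far.
    SELECT g.groupid
    FROM @groups AS g
    INNER JOIN group_tree AS t
        ON g.parentgroupid = t.groupid
)
SELECT DISTINCT groupid
FROM group_tree;
-- Returns 1, 2 and 3; group 4 is not a descendant of group 1.
```

In the procedure, the final SELECT would feed the INSERT INTO #AllGroups. Note that the default recursion limit is 100 levels unless OPTION (MAXRECURSION n) is specified.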

How do I get temp values to be set after an insert has occured in a trigger?

I have a trigger I am working on that will insert rows into a table when another table has inserts or updates applied to it. So far the Update portion works (the column that I'm most concerned with is the Balance column), but when the first row is added for an insert on the Account table, in my AuditTrailCustomerBalance table OldBalance, NewBalance and CustNo are set to NULL. How can I get NewBalance and CustNo to reference to the values that were just inserted into the table from the trigger?
Here is the trigger:
ALTER TRIGGER AuditTrigger
ON Accounts
FOR INSERT, UPDATE
AS
IF UPDATE( Balance )
BEGIN
IF EXISTS
(
SELECT 'True'
FROM Inserted i
JOIN Deleted d
ON i.AccountID = d.AccountID
)
BEGIN
--1. Declare temp variables.
DECLARE @OldBalance NUMERIC( 18, 0 )
DECLARE @NewBalance NUMERIC( 18, 0 )
DECLARE @CustNo INT
--2. Set the variables.
SELECT @OldBalance = Balance FROM deleted
SELECT @NewBalance = Balance FROM inserted
SELECT @CustNo = CustNo FROM inserted
INSERT INTO AuditTrailCustomerBalance( TimeChanged, ChangedBy, OldBalance, NewBalance, CustNo )
VALUES( GETDATE(), SUSER_SNAME(), @OldBalance, @NewBalance, @CustNo )
END
END
GO
And the test statement:
INSERT INTO Custs( CustNo, GivenName, Surname, DOB, SIN )
VALUES( 1, 'Peter', 'Griffen', 'January 15, 1950', '555555555')
INSERT INTO Accounts( CustNo, Type, Balance, AccruedInt, WithdrawalCount )
VALUES( 1, 'Savings', 0, 0, 0 )
UPDATE Accounts SET Balance = 100
WHERE CustNo = 1
I believe that you want something like this:
ALTER TRIGGER AuditTrigger
ON Accounts
FOR INSERT, UPDATE
AS
INSERT INTO AuditTrailCustomerBalance(TimeChanged, ChangedBy,
OldBalance, NewBalance, CustNo )
SELECT GETDATE(), SUSER_SNAME(),
COALESCE(d.Balance,0), i.Balance, i.CustNo
FROM inserted i
left join
deleted d
on
i.AccountID = d.AccountID
WHERE
i.Balance <> d.Balance OR
d.Balance IS NULL
As I said in my comments, inserted and deleted can contain multiple rows (or no rows) and so you need to take that into account and write a set-based query that deals with all of those rows - also some rows may have had balance changes and some not - so deciding whether to write any entries based on UPDATE(Balance) was also flawed.
If you are sure that only one row is affected at a time, you can write something like this:
if (select count(*) from inserted) = 1
and then execute your code.
For the insert case, you can do this:
insert into AuditTrailCustomerBalance (.....)
select .... from inserted
As already posted, the problem with your trigger is that it does not account for whether one row or multiple rows are updated (the same applies to inserts).

How to scan for differences between two queries?

I have a table that loads new data every day and another table that contains a history of changes to that table. What's the best way to check if any of the data have changed since the last time data was loaded?
For example, I have table @a with some strategies for different countries, and table @b tracks the changes made to table @a. I can use checksum() to hash the fields that can change, and add rows to @b if the existing hash is different from the new hash. However, MSDN doesn't think this is a good idea since "collisions" can occur, i.e. two different values can map to the same checksum.
MSDN link for checksum
http://msdn.microsoft.com/en-us/library/aa258245(v=SQL.80).aspx
Sample code:
declare @a table
(
ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @a
select 1,'Long','USA'
insert into @a
select 2,'Short','CAN'
insert into @a
select 3,'Neutral','AUS'
declare @b table
(
Lastupdated datetime
,ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @b
(
Lastupdated
,ownerid
,strategy
,country
)
select
getdate()
,a.ownerid
,a.strategy
,a.country
from @a a left join @b b
on a.ownerid=b.ownerid
where
b.ownerid is null
select * from @b
--get a different timestamp
waitfor delay '00:00:00.1'
--change source data
update @a
set strategy='Short'
where ownerid=1
--add newly changed data into @b
insert into @b
select
getdate()
,a.ownerid
,a.strategy
,a.country
from
(select *,checksum(strategy,country) as hashval from @a) a
left join
(select *,checksum(strategy,country) as hashval from @b) b
on a.ownerid=b.ownerid
where
a.hashval<>b.hashval
select * from @b
How about writing a query using EXCEPT? Just write queries for both tables and then add EXCEPT between them:
(SELECT * FROM table_new) EXCEPT (SELECT * FROM table_old)
The result will be the entries in table_new that aren't in table_old (i.e. that have been updated or inserted).
Note: To get rows recently deleted from table_old, you can reverse the order of the queries.
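A small self-contained illustration of both directions, using hypothetical table variables @table_old and @table_new with made-up rows:

```sql
DECLARE @table_old TABLE (ownerid int, strategy varchar(50));
DECLARE @table_new TABLE (ownerid int, strategy varchar(50));

INSERT INTO @table_old (ownerid, strategy)
VALUES (1, 'Long'), (2, 'Short');

INSERT INTO @table_new (ownerid, strategy)
VALUES (1, 'Short'), (2, 'Short'), (3, 'Neutral');

-- Rows that are new or changed: (1, 'Short') and (3, 'Neutral').
SELECT ownerid, strategy FROM @table_new
EXCEPT
SELECT ownerid, strategy FROM @table_old;

-- Rows that were removed or have since changed: (1, 'Long').
SELECT ownerid, strategy FROM @table_old
EXCEPT
SELECT ownerid, strategy FROM @table_new;
```

EXCEPT compares whole rows and treats NULLs as equal to each other, so no hash column is needed and there is no risk of collisions.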
There is no need to check for changes if you use a different approach to the problem.
On your master table, create a trigger for INSERT, UPDATE and DELETE which tracks the changes for you by writing to table @b.
If you search the internet for "SQL audit table" you will find many pages describing the process, for example: Adding simple trigger-based auditing to your SQL Server database
Thanks to @newenglander I was able to use EXCEPT to find the changed row. As @Tony said, I'm not sure how multiple changes will work, but here's the same sample code reworked to use EXCEPT instead of CHECKSUM:
declare @a table
(
ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @a
select 1,'Long','USA'
insert into @a
select 2,'Short','CAN'
insert into @a
select 3,'Neutral','AUS'
declare @b table
(
Lastupdated datetime
,ownerid bigint
,Strategy varchar(50)
,country char(3)
)
insert into @b
(
Lastupdated
,ownerid
,strategy
,country
)
select
getdate()
,a.ownerid
,a.strategy
,a.country
from @a a left join @b b
on a.ownerid=b.ownerid
where
b.ownerid is null
select * from @b
--get a different timestamp
waitfor delay '00:00:00.1'
--change source data
update @a
set strategy='Short'
where ownerid=1
--add newly changed data using EXCEPT
insert into @b
select getdate(),
ownerid,
strategy,
country
from
(
(
select
ownerid
,strategy
,country
from @a changedtable
)
EXCEPT
(
select
ownerid
,strategy
,country
from @b historicaltable
)
) x
select * from @b