Is There a Better Way: Updating the same row within a Transaction - sql

I have a stored procedure that runs the inputs through a series of validation queries and kicks out an output parameter detailing whether the record will be saved with "complete" or "incomplete" information. Then we start a transaction, save the data to a couple of tables, commit. Done.
Essentially this:
EXEC dbo.ValidationProcedure --(sets output params)
BEGIN TRANSACTION
MERGE TABLE A (uses output params)
MERGE TABLE B
COMMIT TRANSACTION
In addition to this, we have a view that has the same validation queries, written such that it returns all occurrences of invalid data within our tables; the queries in the procedure version merely check the inputs provided (both the validation procedure and the view look at data for both TABLE A and B). This view returns the exact error code, message, recordID, etc., and we use it in audit queries, to get error messages when displaying the input form, yada yada yada....
My issue is that we have the same logic in multiple places, which I hate. My question, and potential solution to this: would it be bad practice/poor design to remove the validation procedure altogether and do the following:
BEGIN TRANSACTION
MERGE TABLE A
MERGE TABLE B
IF EXISTS (SELECT * FROM ValidationView T WHERE T.ID = @ID)
BEGIN
UPDATE TABLE A SET Incomplete = 1 WHERE [ID] = @ID;
END
COMMIT TRANSACTION
I thought of doing it this way a while ago, but I was not fond of affecting the same row twice. It seemed wasteful, unnecessary, and an incorrect way to go about it; I'm hoping I am wrong about that thinking. But now I'm having second thoughts and would like to know if we can unload some code overhead by going with the second example, or whether we should stay the course and maintain both the validation procedure and the validation view.

Related

Use T-SQL Transaction for batch of delete statements?

I have a stored procedure that deletes records from multiple tables.
I wish for either all of the delete statements to complete successfully, or none. The actual purpose here is to wipe all data related to a particular user.
Note that none of this data is related in any way to any other data. E.g. a user's data is not referenced in any way by another user's data. However, it is possible to have concurrent client sources accessing one user's data simultaneously. I don't know if this is relevant.
So I've wrapped it in BEGIN TRANSACTION ... COMMIT TRANSACTION
like so:
CREATE PROCEDURE [dbo].[spDeleteData]
@MyID AS INT
AS
BEGIN TRANSACTION
DELETE FROM [Table1] WHERE myId = @MyID;
DELETE FROM [Table2] WHERE myId = @MyID;
....
COMMIT TRANSACTION
RETURN 0
My question here is what are the implications of wrapping multiple DELETE calls in a transaction? Will it create possible deadlock scenarios, or hurt performance in some way?
From what I am reading, using TRANSACTION ISOLATION LEVEL only applies to read operations, is this true?
What you are guaranteeing is that either all the rows that match the conditions in both tables are successfully deleted, or none of the rows are deleted (i.e. if there is a problem, the deletes are rolled back). There are more locks and they are held for a longer period, but if the procedure fails you don't have to manually recreate the rows; the deletes are undone for you automatically. You probably want to add the statement:
set xact_abort on
at the beginning of the transaction, and to wrap the whole thing in a BEGIN TRY / BEGIN CATCH block.
Please see sommarskog.se/error-handling-I.html#XACT_ABORT for an excellent discussion of this statement and of error handling in T-SQL.
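As a rough sketch, that advice applied to the procedure above might look like this (same parameter and table names as in the question; THROW requires SQL Server 2012+, use RAISERROR on older versions):
CREATE PROCEDURE [dbo].[spDeleteData]
    @MyID AS INT
AS
SET XACT_ABORT ON;  -- any run-time error aborts and rolls back the whole transaction
BEGIN TRY
    BEGIN TRANSACTION;
    DELETE FROM [Table1] WHERE myId = @MyID;
    DELETE FROM [Table2] WHERE myId = @MyID;
    -- ... remaining tables ...
    COMMIT TRANSACTION;
    RETURN 0;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- undo whatever is still open
    THROW;  -- re-raise the error so the caller sees it
END CATCH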

How to execute part of the code only once while running multiple instances in T-SQL

I have a stored procedure that is called from business code. This code uses parallelism, so multiple instances of this SP could be running at the same time depending on some conditions.
There is some logic in this SP that I want to execute only once. I have a table (let's call it HISTORY) that holds a UID for the run and a DATETIME when this portion of the code is executed. Here's my flow:
SP BEGIN
-- some logic
IF certain conditions are met, check if HISTORY does not have an entry for the UID
1. Add an entry in HISTORY for the current UID
2. Run the once only code
SP END
The issue is that, at times, the logic above still gets executed multiple times if different instances reach that part at the same time. What can I do to ensure that it only runs once?
Thank you!
BEGIN TRANSACTION;
INSERT [HISTORY](UID, ...)
SELECT @UID, ...
WHERE NOT EXISTS (
    SELECT * FROM [HISTORY] WITH (HOLDLOCK) WHERE UID = @UID
);
IF @@ROWCOUNT = 1 BEGIN
-- we inserted, do logic that should run only once
END;
COMMIT;
HOLDLOCK (equivalent to running the transaction under SERIALIZABLE, but more granular) ensures that no other transaction running in parallel can insert an entry in HISTORY for that UID; any transaction that tries to do so will block until the first INSERT is finished and will then insert nothing (since a row already exists). Make sure an index on UID exists, otherwise it will lock a lot more than is healthy for performance.
Getting code like this right is always tricky, so make sure to test it in practice as well by stress-testing concurrent inserts for the same (and different) UID.
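For instance, a unique index on UID (assuming one row per UID, as the flow above implies; the index name is just illustrative) both speeds up the existence check and narrows the key range that HOLDLOCK has to protect:
CREATE UNIQUE NONCLUSTERED INDEX IX_HISTORY_UID
    ON [HISTORY] (UID);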

Query for Multiple Users - Best Practices

I currently have about 10 users that use their own personalized query for an internal process at my workplace. The user inputs a few values at the top of the query, hits execute, and voila, their report shows up in the grid. The source data tables they access are the same, but the tables created along the way are personalized with the suffixes _User1, _User2, ..., _User10. Each time they run the query, the previously created tables are dropped and created again. The entire query takes about 1 second to run.
The majority of the structure looks like this repeated 5 times for the 5 steps to get to their desired output:
DROP TABLE z
SELECT *
INTO z
FROM y
Now, the number of users is multiplying to 50, and that means that each tweak to the master query code will result in me changing 50 user-specific queries and sending them back out. Manageable and annoying with 10 users, completely unmanageable with 50.
My question is, what is the best way to go about structuring the database/query? Ideally I'd like to just have one query, one set of created tables (not 50). Since it only takes 1 second to run, would we run the risk of two or more users (with different inputs) running the query simultaneously, accessing the same tables and somehow getting bad data because they ran it at the exact same time?
Is there a specific way this is normally done? Hoping someone can shed some light.
Thanks
Disclaimer: As I've indicated in my comments, giving a bunch of users access directly to SSMS to run reports is a very bad idea. Get some sort of front-end, even a simple MS Access database - you would only need a single license to develop the database, and you could give the rest of the users Access Runtime, for instance. There are so many ways a user could really mess you up if they don't know what they're doing. I will offer some ideas below, but I don't recommend doing this.
One solution: use temp tables so you don't have to worry about each user's tables overlapping:
-- drop the table if it already exists
if object_id('tempdb..#z') is not null
DROP TABLE #z
SELECT *
INTO #z
FROM y
When you prefix a table name with #, it becomes a connection-scoped temporary table, which means separate sessions will not see the temporary tables in other sessions even if they have the same name.
Often it is not necessary to create a temp table at all unless you have some really complicated scenario. You should be able to make use of subqueries, views, CTEs, and stored procedures to generate the output in real time without any new tables being involved. You can even build views and procedures that reference other views so you can organize your complicated logic. For example, you might encapsulate the logic into a stored procedure like this:
CREATE PROCEDURE TheReport
(
@ReportID int,
@Name varchar(50),
@SomeField varchar(10)
)
AS
BEGIN
-- do some complicated query here
SELECT field1, field2 FROM Result Q
END
Then you don't even have to send updates to your users (unless the fields change). Just have their query call the stored procedure, and you can update the procedure directly at your convenience:
DECLARE @ReportID int
DECLARE @Name varchar(50)
DECLARE @SomeField varchar(10)
-- YOU CAN MODIFY THIS --
SET @ReportID = 5
SET @Name = 'MyName'
SET @SomeField = 'abc'
-- DON'T MODIFY BELOW THIS LINE --
EXEC [TheReport] @ReportID, @Name, @SomeField;

SQL Server stored procedure locking issue?

I created this pretty basic stored procedure that gets called by our CMS when a user creates a specific type of item. However, it looks like there are times when we get two rows for each CMS item created, with the same data but an off-by-one SourceID. I don't do much SQL work, so this might be something basic - but do I need to explicitly lock the table somehow in the stored procedure to keep this from happening?
Here is the stored procedure code:
BEGIN
SET @newid = (SELECT MAX(SourceID)+1 from [dbo].[sourcecode])
IF NOT EXISTS(SELECT SourceId from [dbo].[sourcecode] where SourceId = @newid)
INSERT INTO [dbo].[sourcecode]
(
SourceID,
Description,
RunCounts,
ShowOnReport,
SourceParentID,
ApprovedSource,
Created
)
VALUES
(
@newid,
@Desc,
1,
@ShowOnReport,
1,
1,
GetDate()
)
RETURN @newid
END
and here is an example of the duplicated data (less a couple of irrelevant columns):
SourceId Description Created
676 some text 2012-10-17 09:42:36.553
677 some text 2012-10-17 09:43:01.380
I am sure this has nothing to do with the SP. As Oded mentioned, this could be the result of your calling code.
I don't see anything in the stored procedure which is capable of generating duplicates.
Also, I wouldn't use MAX(SourceId) + 1. Why don't you use "Auto Increment" if you want a new SourceId each time anyway?
As it has been said in the comments, I think your issue is more in the code layer; none of the data seems to be violating any constraints. You may want to do a check to see if the same user has submitted the same data "recently" before performing the insert.
You can use locking when using stored procedures. On the ones I use I usually use WITH (ROWLOCK). Locking is used to ensure data integrity. I think a simple Google should bring up lots of information about why you should be using locking.
But as other commenters have said, see if there isn't anything in your code as well. Is there something that is calling the same method twice? Are there 'events' referencing the method that is doing the updating?
The description is probably duplicated because you are calling the same function twice, by clicking the button twice, or whatever.
You should use an IDENTITY on your SourceID column and use the Scope_Identity() function
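As a sketch, assuming SourceID is rebuilt as an IDENTITY column (so the insert no longer supplies it), the body of the procedure shrinks to something like:
INSERT INTO [dbo].[sourcecode]
    (Description, RunCounts, ShowOnReport, SourceParentID, ApprovedSource, Created)
VALUES
    (@Desc, 1, @ShowOnReport, 1, 1, GETDATE())

SET @newid = SCOPE_IDENTITY()  -- the SourceID generated by this insert, in this scope
RETURN @newid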
If you don't want to do that for some reason, then you should wrap the original MAX(SourceID) + 1 code in a transaction with the isolation level set to Serializable:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SET @newid = ....
COMMIT
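Filled in from the original procedure, that looks roughly like the sketch below. The WITH (UPDLOCK) hint is my addition, not part of the answer above: with only shared SERIALIZABLE range locks, two concurrent callers can both read the same MAX and then deadlock on their inserts, whereas UPDLOCK makes the second caller wait at the read.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN

-- the range read here stays locked until COMMIT, so no other caller can
-- compute the same MAX and slip in a duplicate SourceID
SET @newid = (SELECT MAX(SourceID) + 1
              FROM [dbo].[sourcecode] WITH (UPDLOCK))

INSERT INTO [dbo].[sourcecode]
    (SourceID, Description, RunCounts, ShowOnReport, SourceParentID, ApprovedSource, Created)
VALUES
    (@newid, @Desc, 1, @ShowOnReport, 1, 1, GETDATE())

COMMIT TRAN
RETURN @newid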

SQL Server Race Condition Question

(Note: this is for MS SQL Server)
Say you have a table ABC with a primary key identity column, and a CODE column. We want every row in here to have a unique, sequentially-generated code (based on some typical check-digit formula).
Say you have another table DEF with only one row, which stores the next available CODE (imagine a simple autonumber).
I know logic like below would present a race condition, in which two users could end up with the same CODE:
1) Run a select query to grab next available code from DEF
2) Insert said code into table ABC
3) Increment the value in DEF so it's not re-used.
I know that two users could get stuck at Step 1) and end up with the same CODE in the ABC table.
What is the best way to deal with this situation? I thought I could just wrap a "begin tran" / "commit tran" around this logic, but I don't think that worked. I had a stored procedure like this to test, but it didn't avoid the race condition when I ran it from two different windows in Management Studio:
begin tran
declare @x int
select @x = nextcode FROM def
waitfor delay '00:00:15'
update def set nextcode = nextcode + 1
select @x
commit tran
Can someone shed some light on this? I thought the transaction would prevent another user from being able to access my NextCodeTable until the first transaction completed, but I guess my understanding of transactions is flawed.
EDIT: I tried moving the wait to after the "update" statement, and I got two different codes... but I suspected that. I have the waitfor statement there to simulate a delay so the race condition can be easily seen. I think the key problem is my incorrect perception of how transactions work.
Set the Transaction Isolation Level to Serializable.
At lower isolation levels, other transactions can read the data in a row that is read, (but not yet modified) in this transaction. So two transactions can indeed read the same value. At very low isolation (Read Uncommitted) other transactions can even read data after it's been modified (but before committed)...
Review details about SQL Server Isolation Levels here
So the bottom line is that the isolation level is the critical piece here; it controls what access other transactions get to the data this one is working with.
NOTE. From the link, about Serializable
Statements cannot read data that has been modified but not yet committed by other transactions.
This is because the locks are placed when the row is modified, not when the BEGIN TRAN occurs. So what you have done may still allow another transaction to read the old value up until the point where you modify it. I would therefore change the logic to modify the row in the same statement you read it, thereby putting the lock on it at the same time.
begin tran
declare @x int
update def set @x = nextcode, nextcode += 1
waitfor delay '00:00:15'
select @x
commit tran
As other responders have mentioned, you can set the transaction isolation level to ensure that anything you 'read' using a SELECT statement cannot change within a transaction.
Alternatively, you could take out a lock specifically on the DEF table by adding the syntax WITH HOLDLOCK after the table name, e.g.,
SELECT nextcode FROM DEF WITH HOLDLOCK
It doesn't make much difference here, as your transaction is small, but it can be useful to take out locks for some SELECTs and not others within a transaction. It's a question of 'repeatability versus concurrency'.
A couple of relevant MS-SQL docs.
Isolation levels
Table hints
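Applied to the test script from the question, that might look like the sketch below. The added UPDLOCK hint is my assumption, not part of this answer: HOLDLOCK alone leaves both sessions holding compatible shared locks until the UPDATE, which typically ends in a deadlock rather than in duplicates.
BEGIN TRAN

DECLARE @x INT

-- HOLDLOCK keeps the lock until COMMIT; UPDLOCK makes the second session
-- wait here instead of reading the same nextcode value
SELECT @x = nextcode
FROM def WITH (HOLDLOCK, UPDLOCK)

UPDATE def SET nextcode = nextcode + 1

SELECT @x
COMMIT TRAN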
Late answer. You want to avoid a race condition...
"SQL Server Process Queue Race Condition"
Recap:
You began a transaction. This doesn't actually "do" anything in and of itself; it modifies subsequent behavior.
You read data from a table. The default isolation level is Read Committed, so the shared lock taken by this select is released as soon as the read completes; it does not protect the value for the rest of the transaction.
You then wait 15 seconds
You then issue an update. With the declared transaction, this will generate a lock until the transaction is committed.
You then commit the transaction, releasing the lock.
So, guessing you ran this simultaneously in two windows (A and B):
A read the "next" value from table def, then went into wait mode
B read the same "next" value from the table, then went into wait mode. (Since A only did a read, the transaction did not lock anything.)
A then updated the table, and probably committed the change before B exited the wait state.
B then updated the table, after A's write was committed.
Try putting the wait statement after the update, before the commit, and see what happens.
It's not a real race condition. It's more a common problem with concurrent transactions. One solution is to set a read lock on the table and therefore have serialization in place.
This is actually a common problem in SQL databases, and that is why most (all?) of them have some built-in features to take care of obtaining a unique identifier. Here are some things to look into if you are using MySQL or Postgres. If you are using a different database, I bet they provide something very similar.
A good example of this is postgres sequences which you can check out here:
Postgres Sequences
MySQL uses something called auto-increment columns.
MySQL auto increment
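For completeness, since the question is specifically about SQL Server: it has the same facility in the form of SEQUENCE objects (SQL Server 2012 and later). A minimal sketch with made-up names, leaving the check-digit formatting aside:
-- one-time setup: a sequence that hands out values atomically
CREATE SEQUENCE dbo.CodeSequence
    START WITH 1
    INCREMENT BY 1;

-- each insert asks the sequence for the next value; no DEF table, no race
INSERT INTO ABC (CODE)
VALUES (NEXT VALUE FOR dbo.CodeSequence);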
You can set the column to a computed value that is persisted. This will take care of the race condition.
Persisted Computed Columns
NOTE
Using this method means you do not need to store the next code in a table. The code column becomes the reference point.
Implementation
Give the column the following properties under computed column specification.
Formula = dbo.GetNextCode()
Is Persisted = Yes
Create Function dbo.GetNextCode()
Returns VarChar(10)
As
Begin
Declare @Return VarChar(10);
Declare @MaxId Int
Select @MaxId = Max(Id)
From Table
Select @Return = Code
From Table
Where Id = @MaxId;
/* Generate New Code ... */
Return @Return;
End