What is wrong with the below tsql code - sql

BEGIN TRAN
SELECT * FROM AnySchema.AnyTable
WHERE AnyColumn = SomeCondition
COMMIT
I know the transaction is not required here because it is just a select but just want to know how bad a programming it is and whether it is going to be an overhead on the DB engine.

You may use transaction on SELECT statements to ensure nobody else could update / delete records of the table of while the bunch of your select queries are executing.
Using WITH(NOLOCK):
Anyways, you may also use WITH(NOLOCK) for t_sql
SELECT * FROM AnySchema.AnyTable WITH(NOLOCK)
WHERE AnyColumn = SomeCondition
WITH (NOLOCK) is the equivalent of using READ UNCOMMITED as a transaction isolation level. Here stand the risk of reading an uncommitted row that is subsequently rolled back, i.e. data that never made it into the database.
So, while it can prevent reads being deadlocked by other operations, it comes with a risk.
TRANSACTION Block :
Using TRANSACTION block will not cause much of extra DB overload but if you keep the same type practice on, and suppose , at any SQL block you forget (you / your developers may forget, right ?) to close the transaction, then other processes can't work on the same table.
Anyways, it depends on what type of application you are using. If very frequent update and select things are there , it is advised not to use such transaction blocks. If medium level of updates and select are there, occurs next to each other, you may use transaction blocks for select (but ensure to close the transaction).

That's actually a good question. To understand what's going on, you need to know about SET IMPLICIT_TRANSACTIONS.
When ON, the system is in implicit transaction mode. This means that if ##TRANCOUNT = 0, any of the following Transact-SQL statements begins a new transaction. It is equivalent to an unseen BEGIN TRANSACTION being executed first:
When OFF, each of the preceding T-SQL statements is bounded by an unseen BEGIN TRANSACTION and an unseen COMMIT TRANSACTION statement. When OFF, we say the transaction mode is autocommit. [this is the default]
If your T-SQL code visibly issues a BEGIN TRANSACTION, we say the transaction mode is explicit.
https://msdn.microsoft.com/en-us/library/ms187807.aspx
Since the SQL Server would have created a transaction for you, manually doing doesn't actually change anything. The exact same thing would have happened either way.
Summary: What you are doing isn't 'wrong' because it has no effect, but unnecessary and confusing to the reader.

I think you would be taking a holdlock and most likely a tablock
table hints
That is not always a good thing as you would block any update or deletes (maybe even inserts)
It would be better to let SQL decide what level of of locks to take. Most likely pagelocks. I would stay away from nolock as bad stuff can happen.
On a select on a single table just let the optimizer do it's thing.

Related

SQL database locking difference between SELECT and UPDATE

I am re-writing an old stored procedure which is called by BizTalk. Now this has the potential to have 50-60 messages pushed through at once.
I occasionally have an issue with database locking when they are all trying to update at once.
I can only make changes in SQL (not BizTalk) and I am trying to find the best way to run my SP.
With this in mind what i have done is to make the majority of the statement to determine if an UPDATE is needed by using a SELECT statement.
What my question is - What is the difference regarding locking between an UPDATE statement and a SELECT with a NOLOCK against it?
I hope this makes sense - Thank you.
You use nolock when you want to read uncommitted data and want to avoid taking any shared lock on the data so that other transactions can take exclusive lock for updating/deleting.
You should not use nolock with update statement, it is really a bad idea, MS says that nolock are ignored for the target of update/insert statement.
Support for use of the READUNCOMMITTED and NOLOCK hints in the FROM
clause that apply to the target table of an UPDATE or DELETE statement
will be removed in a future version of SQL Server. Avoid using these
hints in this context in new development work, and plan to modify
applications that currently use them.
Source
Regarding your locking problem during multiple updates happening at the same time. This normally happens when you read data with the intention to update it later by just putting a shared lock, the following UPDATE statement can’t acquire the necessary Update Locks, because they are already blocked by the Shared Locks acquired in the different session causing the deadlock.
To resolve this you can select the records using UPDLOCK like following
DECLARE #IdToUpdate INT
SELECT #IdToUpdate =ID FROM [Your_Table] WITH (UPDLOCK) WHERE A=B
UPDATE [Your_Table]
SET X=Y
WHERE ID=#IdToUpdate
This will take the necessary Update lock on the record in advance and will stop other sessions to acquire any lock (shared/exclusive) on the record and will prevent from any deadlocks.
NOLOCK: Specifies that dirty reads are allowed. No shared locks are issued to prevent other transactions from modifying data read by the current transaction, and exclusive locks set by other transactions do not block the current transaction from reading the locked data. NOLOCK is equivalent to READUNCOMMITTED.
Thus, while using NOLOCK you get all rows back but there are chances to read Uncommitted (Dirty) data. And while using READPAST you get only Committed Data so there are chances you won’t get those records that are currently being processed and not committed.
For your better understanding please go through below link.
https://www.mssqltips.com/sqlservertip/2470/understanding-the-sql-server-nolock-hint/
https://www.mssqltips.com/sqlservertip/4468/compare-sql-server-nolock-and-readpast-table-hints/
https://social.technet.microsoft.com/wiki/contents/articles/19112.understanding-nolock-query-hint.aspx

Why deadlock occurs?

I use a small transaction which consists of two simple queries: select and update:
SELECT * FROM XYZ WHERE ABC = DEF
and
UPDATE XYZ SET ABC = 123
WHERE ABC = DEF
It is quite often situation when the transaction is started by two threads, and depending on Isolation Level deadlock occurs (RepeatableRead, Serialization). Both transactions try to read and update exactly the same row.
I'm wondering why it is happening. What is the order of queries which leads to deadlock? I've read a bit about lock (shared, exclusive) and how long locks last for each isolation level, but I still don't fully understand...
I've even prepared a simple test which always result in deadlock. I've looked at results of the test in SSMS and SQL Server Profiler. I started first query and then immediately the second.
First query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT ...
WAITFOR DELAY '00:00:04'
UPDATE ...
COMMIT
Second query:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT ...
UPDATE ...
COMMIT
Now I'm not able to show you detailed logs, but it looks less or more like this (I've very likely missed Lock:deadlock etc. somewhere):
(1) SQL:BatchStarting: First query
(2) SQL:BatchStarting: Second query
(3) Lock:timeout for second query
(4) Lock:timeout for first query
(5) Deadlock graph
If I understand locks well, in (1) first query takes a shared lock (to execute SELECT), then goes to sleep and keeps the shared lock until the end of transaction. In (2) second query also takes shared lock (SELECT) but cannot take exclusive lock (UPDATE) while there are shared locks on the same row, which results in Lock:timeout. But I can't explain why timeout for second query occurs. Probably I don't understand the whole process well. Can anybody give a good explanation?
I haven't noticed deadlocks using ReadCommitted but I'm afraid they may occur.
What solution do you recommend?
A deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource which the other tasks are trying to lock
http://msdn.microsoft.com/en-us/library/ms177433.aspx
"But I can't explain why timeout for second query occurs."
Because the first query holds shared lock. Then the update in the first query also tries to get the exclusive lock, which makes him sleep. So the first and second query are both sleeping waiting for the other to wake up - and this is a deadlock which results in timeout :-)
In mysql it works better - the deadlock is detected immediatelly and one of the transactions is rolled back (you need not to wait for timeout :-)).
Also, in mysql, you can do the following to prevent deadlock:
select ... for update
which will put a write-lock (i.e. exclusive lock) just from the beginning of the transaction, and this way you avoid the deadlock situation! Perhaps you can do something similar in your database engine.
For MSSQL there is a mechanism to prevent deadlocks. What you need here is called the WITH NOLOCK hint.
In 99.99% of the cases of SELECT statements it's usable and there is no need to bundle the SELECT with the UPDATE. There is also no need to put a SELECT into a transaction. The only exception is when dirty reads are not allowed.
Changing your queries to this form would solve all your issues:
SELECT ...
FROM yourtable WITH (NOLOCK)
WHERE ...
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE ...
COMMIT
It has been a long time since I last dealt with this, but I believe that the select statement creates a read-lock, which only prevents the data to be changed -- hence multiple queries can hold and share a read-lock on the same data. The shared-read-lock is for read consistency, that is if you multiple times in your transaction reads the same row, then read-consistency should mean that you should always get the same result.
The update statement requires an exclusive lock, and hence the update statement have to wait for the read-lock to be released.
None of the two transactions will release the locks, so the transactions fails.
Different databases implementations have different strategies for how to deal with this, with Sybase and MS-SQL-servers using lock escalation with timeout (escalate from read-to-write-lock) -- Oracle I believe (at some point) implemented read consistency though use of the roll-back-log, where MySQL have yet a different strategy.

Deadlocks - Will this really help?

So I've got a query that keeps deadlocking on me. People who know the system well can't figure out why the sproc is deadlocking, but they tell me that I should just add this to it:
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Is this really a valid solution? What does that do?
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
This will cause the system to return inconsitent data, including duplicate records and missing records. Read more at Previously committed rows might be missed if NOLOCK hint is used, or here at Timebomb - The Consistency problem with NOLOCK / READ UNCOMMITTED.
Deadlocks can be investigated and fixed, is not a big deal if you follow the proper procedure. Of course, throwing a dirty read may seem easier, but down the road you'll be sitting long hours staring at your general ledger and wondering why the heck it does not balance debits and credits. So read again until you really grok this: DIRTY READs ARE INCONSISTENT READS.
If you want a get-out-of-jail card, turn on snapshot isolation:
ALTER DATABASE MyDatabase
SET READ_COMMITTED_SNAPSHOT ON
But keep in mind that snapshot isolation does not fix the deadlocks, it only hides them. Proper investigation of the deadlock cause and fix is always the appropriate action.
NOCOUNT will keep your query from returning rowcounts back to the calling application (i.e. 1000000 rows affected).
TRANSACTION ISOLATION LEVEL READ UNCOMMITTED will allow for dirty reads as indicated here.
The isolation level may help, but do you want to allow dirty reads?
Randomly adding SET options to the query is unlikely to help I'm afraid
SET NOCOUNT ON
Will have no effect on the issue.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
will prevent your query taking out shared locks. As well as reading "dirty" data it also can lead to your query reading the same rows twice, or not at all, dependant upon what other concurrent activity is happening.
Whether this will resolve your deadlock issue depends upon the type of deadlock. It will have no effect at all if the issue is 2 writers deadlocking due to non linear ordering of lock requests. (transaction 1 updating row a, transaction 2 updating row b then tran 1 requesting a lock on b and tran 2 requesting a lock on a)
Can you post the offending query and deadlock graph? (if you are on SQL 2005 or later)
The best guide is:
http://technet.microsoft.com/es-es/library/ms173763.aspx
Snippet:
Specifies that statements can read rows that have been modified by other
transactions but not yet committed.
Transactions running at the READ
UNCOMMITTED level do not issue shared
locks to prevent other transactions
from modifying data read by the
current transaction. READ UNCOMMITTED
transactions are also not blocked by
exclusive locks that would prevent the
current transaction from reading rows
that have been modified but not
committed by other transactions. When
this option is set, it is possible to
read uncommitted modifications, which
are called dirty reads. Values in the
data can be changed and rows can
appear or disappear in the data set
before the end of the transaction.
This option has the same effect as
setting NOLOCK on all tables in all
SELECT statements in a transaction.
This is the least restrictive of the
isolation levels.
In SQL Server, you can also minimize
locking contention while protecting
transactions from dirty reads of
uncommitted data modifications using
either:
The READ COMMITTED isolation level
with the READ_COMMITTED_SNAPSHOT
database option set to ON. The
SNAPSHOT isolation level
.
On a different tack, there are two other aspects to consider, that may help.
1) Indexes and the indexes used by the SQL. The indexing strategy used on the tables will affect how many rows are affected. If you make the data modifications using a unique index, you may reduce the chance of deadlocks.
One algorithm - of course it will not work it all cases. The use of NOLOCK is targeted rather than being global.
The "old" way:
UPDATE dbo.change_table
SET somecol = newval
WHERE non_unique_value = 'something'
The "new" way:
INSERT INTO #temp_table
SELECT uid FROM dbo.change_table WITH (NOLOCK)
WHERE non_unique_value = 'something'
UPDATE dbo.change_table
SET somecol = newval
FROM dbo.change_table c
INNER JOIN
#temp_table t
ON (c.uid = t.uid)
2) Transaction duration
The longer a transaction is open the more likely there may be contention. If there is a way to reduce the amount of time that records remain locked, you can reduce the chances of a deadlock occurring.
For example, perform as many SELECT statements (e.g. lookups) at the start of the code instead of performing an INSERT or UPDATE, then a lookup, then an INSERT, and then another lookup.
This is where one can use the NOLOCK hint for SELECTs on "static" tables that are not changing reducing the lock "footprint" of the code.

SQL Server Race Condition Question

(Note: this is for MS SQL Server)
Say you have a table ABC with a primary key identity column, and a CODE column. We want every row in here to have a unique, sequentially-generated code (based on some typical check-digit formula).
Say you have another table DEF with only one row, which stores the next available CODE (imagine a simple autonumber).
I know logic like below would present a race condition, in which two users could end up with the same CODE:
1) Run a select query to grab next available code from DEF
2) Insert said code into table ABC
3) Increment the value in DEF so it's not re-used.
I know that, two users could get stuck at Step 1), and could end up with same CODE in the ABC table.
What is the best way to deal with this situation? I thought I could just wrap a "begin tran" / "commit tran" around this logic, but I don't think that worked. I had a stored procedure like this to test, but I didn't avoid the race condition when I ran from two different windows in MS:
begin tran
declare #x int
select #x= nextcode FROM def
waitfor delay '00:00:15'
update def set nextcode = nextcode + 1
select #x
commit tran
Can someone shed some light on this? I thought the transaction would prevent another user from being able to access my NextCodeTable until the first transaction completed, but I guess my understanding of transactions is flawed.
EDIT: I tried moving the wait to after the "update" statement, and I got two different codes... but I suspected that. I have the waitfor statement there to simulate a delay so the race condition can be easily seen. I think the key problem is my incorrect perception of how transactions work.
Set the Transaction Isolation Level to Serializable.
At lower isolation levels, other transactions can read the data in a row that is read, (but not yet modified) in this transaction. So two transactions can indeed read the same value. At very low isolation (Read Uncommitted) other transactions can even read data after it's been modified (but before committed)...
Review details about SQL Server Isolation Levels here
So bottom line is that the Isolation level is crtitical piece here to control what level of access other transactions get into this one.
NOTE. From the link, about Serializable
Statements cannot read data that has been modified but not yet committed by other transactions.
This is because the locks are placed when the row is modified, not when the Begin Trans occurs, So what you have done may still allow another transaction to read the old value until the point where you modify it. So I would change the logic to modify it in the same statement as you read it, thereby putting the lock on it at the same time.
begin tran
declare #x int
update def set #x= nextcode, nextcode += 1
waitfor delay '00:00:15'
select #x
commit tran
As other responders have mentioned, you can set the transaction isolation level to ensure that anything you 'read' using a SELECT statement cannot change within a transaction.
Alternatively, you could take out a lock specifically on the DEF table by adding the syntax WITH HOLDLOCK after the table name, e.g.,
SELECT nextcode FROM DEF WITH HOLDLOCK
It doesn't make much difference here, as your transaction is small, but it can be useful to take out locks for some SELECTs and not others within a transaction. It's a question of 'repeatability versus concurrency'.
A couple of relavant MS-SQL docs.
Isolation levels
Table hints
Late answer. You want to avoid a race condition...
"SQL Server Process Queue Race Condition"
Recap:
You began a transaction. This doesn't actually "do" anything in and of itself, it modifies subsequent behavior
You read data from a table. The default isolation level is Read Committed, so this select statement is not made part of the transaction.
You then wait 15 seconds
You then issue an update. With the declared transaction, this will generate a lock until the transaction is committed.
You then commit the transaction, releasing the lock.
So, guessing you ran this simultaneously in two windows (A and B):
A read the "next" value from table def, then went into wait mode
B read the same "next" value from the table, then went into wait mode. (Since A only did a read, the transaction did not lock anything.)
A then updated the table, and probably commited the change before B exited the wait state.
B then updated the table, after A's write was committed.
Try putting the wait statement after the update, before the commit, and see what happens.
It's not a real race condition. It's more a common problem with concurrent transactions. One solution is to set a read lock on the table and therefor have a serialization in place.
This is actually a common problem in SQL databases and that is why most (all?) of them have some built in features to take care of this issue of obtaining a unique identifier. Here are some things to look into if you are using Mysql or Postgres. If you are using a different database I bet the provide something very similar.
A good example of this is postgres sequences which you can check out here:
Postgres Sequences
Mysql uses something called auto increments.
Mysql auto increment
You can set the column to a computed value that is persisted. This will take care of the race condition.
Persisted Computed Columns
NOTE
Using this method means you do not need to store the next code in a table. The code column becomes the reference point.
Implementation
Give the column the following properties under computed column specification.
Formula = dbo.GetNextCode()
Is Persisted = Yes
Create Function dbo.GetNextCode()
Returns VarChar(10)
As
Begin
Declare #Return VarChar(10);
Declare #MaxId Int
Select #MaxId = Max(Id)
From Table
Select #Return = Code
From Table
Where Id = #MaxId;
/* Generate New Code ... */
Return #Return;
End

Should I commit or rollback a read transaction?

I have a read query that I execute within a transaction so that I can specify the isolation level. Once the query is complete, what should I do?
Commit the transaction
Rollback the transaction
Do nothing (which will cause the transaction to be rolled back at the end of the using block)
What are the implications of doing each?
using (IDbConnection connection = ConnectionFactory.CreateConnection())
{
using (IDbTransaction transaction = connection.BeginTransaction(IsolationLevel.ReadUncommitted))
{
using (IDbCommand command = connection.CreateCommand())
{
command.Transaction = transaction;
command.CommandText = "SELECT * FROM SomeTable";
using (IDataReader reader = command.ExecuteReader())
{
// Read the results
}
}
// To commit, or not to commit?
}
}
EDIT: The question is not if a transaction should be used or if there are other ways to set the transaction level. The question is if it makes any difference that a transaction that does not modify anything is committed or rolled back. Is there a performance difference? Does it affect other connections? Any other differences?
You commit. Period. There's no other sensible alternative. If you started a transaction, you should close it. Committing releases any locks you may have had, and is equally sensible with ReadUncommitted or Serializable isolation levels. Relying on implicit rollback - while perhaps technically equivalent - is just poor form.
If that hasn't convinced you, just imagine the next guy who inserts an update statement in the middle of your code, and has to track down the implicit rollback that occurs and removes his data.
If you haven't changed anything, then you can use either a COMMIT or a ROLLBACK. Either one will release any read locks you have acquired and since you haven't made any other changes, they will be equivalent.
If you begin a transaction, then best practice is always to commit it. If an exception is thrown inside your use(transaction) block the transaction will be automatically rolled-back.
Consider nested transactions.
Most RDBMSes do not support nested transactions, or try to emulate them in a very limited way.
For example, in MS SQL Server, a rollback in an inner transaction (which is not a real transaction, MS SQL Server just counts transaction levels!) will rollback the everything which has happened in the outmost transaction (which is the real transaction).
Some database wrappers might consider a rollback in an inner transaction as an sign that an error has occured and rollback everything in the outmost transaction, regardless whether the outmost transaction commited or rolled back.
So a COMMIT is the safe way, when you cannot rule out that your component is used by some software module.
Please note that this is a general answer to the question. The code example cleverly works around the issue with an outer transaction by opening a new database connection.
Regarding performance: depending on the isolation level, SELECTs may require a varying degree of LOCKs and temporary data (snapshots). This is cleaned up when the transaction is closed. It does not matter whether this is done via COMMIT or ROLLBACK. There might be a insignificant difference in CPU time spent - a COMMIT is probably faster to parse than a ROLLBACK (two characters less) and other minor differences. Obviously, this is only true for read-only operations!
Totally not asked for: another programmer who might get to read the code might assume that a ROLLBACK implies an error condition.
IMHO it can make sense to wrap read only queries in transactions as (especially in Java) you can tell the transaction to be "read-only" which in turn the JDBC driver can consider optimizing the query (but does not have to, so nobody will prevent you from issuing an INSERT nevertheless). E.g. the Oracle driver will completely avoid table locks on queries in a transaction marked read-only, which gains a lot of performance on heavily read-driven applications.
ROLLBACK is mostly used in case of an error or exceptional circumstances, and COMMIT in the case of successful completion.
We should close transactions with COMMIT (for success) and ROLLBACK (for failure), even in the case of read-only transactions where it doesn't seem to matter. In fact it does matter, for consistency and future-proofing.
A read-only transaction can logically "fail" in many ways, for example:
a query does not return exactly one row as expected
a stored procedure raises an exception
data fetched is found to be inconsistent
user aborts the transaction because it's taking too long
deadlock or timeout
If COMMIT and ROLLBACK are used properly for a read-only transaction, it will continue to work as expected if DB write code is added at some point, e.g. for caching, auditing or statistics.
Implicit ROLLBACK should only be used for "fatal error" situations, when the application crashes or exits with an unrecoverable error, network failure, power failure, etc.
Just a side note, but you can also write that code like this:
using (IDbConnection connection = ConnectionFactory.CreateConnection())
using (IDbTransaction transaction = connection.BeginTransaction(IsolationLevel.ReadUncommitted))
using (IDbCommand command = connection.CreateCommand())
{
command.Transaction = transaction;
command.CommandText = "SELECT * FROM SomeTable";
using (IDataReader reader = command.ExecuteReader())
{
// Do something useful
}
// To commit, or not to commit?
}
And if you re-structure things just a little bit you might be able to move the using block for the IDataReader up to the top as well.
If you put the SQL into a stored procedure and add this above the query:
set transaction isolation level read uncommitted
then you don't have to jump through any hoops in the C# code. Setting the transaction isolation level in a stored procedure does not cause the setting to apply to all future uses of that connection (which is something you have to worry about with other settings since the connections are pooled). At the end of the stored procedure it just goes back to whatever the connection was initialized with.
Given that a READ does not change state, I would do nothing. Performing a commit will do nothing, except waste a cycle to send the request to the database. You haven't performed an operation that has changed state. Likewise for the rollback.
You should however, be sure to clean up your objects and close your connections to the database. Not closing your connections can lead to issues if this code gets called repeatedly.
If you set AutoCommit false, then YES.
In an experiment with JDBC(Postgresql driver), I found that if select query breaks(because of timeout), then you can not initiate new select query unless you rollback.
Do you need to block others from reading the same data? Why use a transaction?
#Joel - My question would be better phrased as "Why use a transaction on a read query?"
#Stefan - If you are going to use AdHoc SQL and not a stored proc, then just add the WITH (NOLOCK) after the tables in the query. This way you dont incur the overhead (albeit minimal) in the application and the database for a transaction.
SELECT * FROM SomeTable WITH (NOLOCK)
EDIT # Comment 3: Since you had "sqlserver" in the question tags, I had assumed MSSQLServer was the target product. Now that that point has been clarified, I have edited the tags to remove the specific product reference.
I am still not sure of why you want to make a transaction on a read op in the first place.
In your code sample, where you have
// Do something useful
Are you executing a SQL Statement that changes data ?
If not, there's no such thing as a "Read" Transaction... Only changes from an Insert, Update and Delete Statements (statements that can change data) are in a Transaction... What you are talking about is the locks that SQL Server puts on the data you are reading, because of OTHER transactions that affect that data. The level of these locks is dependant on the SQL Server Isolation Level.
But you cannot Commit, or ROll Back anything, if your SQL statement has not changed anything.
If you are changing data, then you can change the isolation level without explicitly starting a transation... Every individual SQL Statement is implicitly in a transaction. explicitly starting a Transaction is only necessary to ensure that 2 or more statements are within the same transaction.
If all you want to do is set the transaction isolation level, then just set a command's CommandText to "Set Transaction Isolation level Repeatable Read" (or whatever level you want), set the CommandType to CommandType.Text, and execute the command. (you can use Command.ExecuteNonQuery() )
NOTE: If you are doing MULTIPLE read statements, and want them all to "see" the same state of the database as the first one, then you need to set the isolation Level top Repeatable Read or Serializable...