Delete and Insert Inside One Transaction (SQL Server)

I just want to ask: when statements are wrapped in a transaction, will the first query always be executed first? For example, I have 500k records to delete and 500k to insert; is there a possibility of locking?
I have already tested this query and it works fine, but I want to make sure my assumption is correct.
Note: this deletes and re-inserts the same records, with possible updates to other columns.
BEGIN TRAN;
DELETE FROM OutputTable WHERE ID IN (1, 2, 3, 4 /* etc. */);
INSERT INTO OutputTable VALUES (1, 2, 3, 4 /* etc. */);
COMMIT TRAN;

Within a transaction, all write locks (all locks acquired for modifications) must obey the strict two-phase locking rule. One of the consequences is that a write (X) lock acquired in a transaction cannot be released until the transaction commits. So yes, the DELETE and INSERT will execute sequentially, and all locks acquired during the DELETE will be retained while executing the INSERT.
Keep in mind that deleting 500k rows in a transaction will escalate the row locks to a single table lock; see Lock Escalation.
Deleting 500k rows and inserting 500k rows in a single transaction, while perhaps correct, is a bad idea. You should avoid such large units of work (long transactions) if possible. Long transactions pin the log in place, create blocking and contention, increase recovery and database startup time, and increase SQL Server resource consumption (locks require memory).
You should consider doing the operation in small batches (perhaps 10,000 rows at a time), using MERGE instead of DELETE/INSERT (if possible) and, last but not least, a partitioned sliding-window implementation; see How to Implement an Automatic Sliding Window in a Partitioned Table.
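As a rough sketch of the batching idea (not the poster's actual schema: dbo.OutputTable and a staging table dbo.StagingTable holding the replacement rows are hypothetical names):
DECLARE @batch INT = 10000, @rows INT = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRAN;

    -- Delete one small batch; each iteration is a short transaction, so locks
    -- are released frequently and the log can be reused.
    DELETE TOP (@batch) FROM dbo.OutputTable
    WHERE ID IN (SELECT ID FROM dbo.StagingTable);

    SET @rows = @@ROWCOUNT;

    COMMIT TRAN;
END;

-- Re-insert the replacement rows (batch this step too if the volume warrants it).
INSERT INTO dbo.OutputTable (ID, OtherColumn)
SELECT ID, OtherColumn
FROM dbo.StagingTable;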

From the documentation on TRANSACTION (emphasis mine):
BEGIN TRANSACTION represents a point at which the data referenced by a connection is logically and physically consistent. If errors are encountered, all data modifications made after the BEGIN TRANSACTION can be rolled back to return the data to this known state of consistency. Each transaction lasts until either it completes without errors and COMMIT TRANSACTION is issued to make the modifications a permanent part of the database, or errors are encountered and all modifications are erased with a ROLLBACK TRANSACTION statement.
BEGIN TRANSACTION starts a local transaction for the connection issuing the statement. Depending on the current transaction isolation level settings, many resources acquired to support the Transact-SQL statements issued by the connection are locked by the transaction until it is completed with either a COMMIT TRANSACTION or ROLLBACK TRANSACTION statement. Transactions left outstanding for long periods of time can prevent other users from accessing these locked resources, and also can prevent log truncation.
Although BEGIN TRANSACTION starts a local transaction, it is not recorded in the transaction log until the application subsequently performs an action that must be recorded in the log, such as executing an INSERT, UPDATE, or DELETE statement. An application can perform actions such as acquiring locks to protect the transaction isolation level of SELECT statements, but nothing is recorded in the log until the application performs a modification action.

Related

Delete Statement and Transaction for One Row in a Table

I have only one row in a table. Suppose I use the following statements:
BEGIN TRANSACTION
DELETE FROM mytable    -- if we check the table now, the row is gone
ROLLBACK TRANSACTION   -- the row is available in the table again
My question is: when we ROLLBACK the transaction, where does SQL Server restore the data from, and where is the data held temporarily during the transaction?
All of the data modifications within the transaction are stored in the transaction log, with additional space also reserved in the log for the undo records, in case the transaction has to roll back. Each log record contains sufficient information to reverse the change it describes, so that the change can be undone if required.
If we take a simple delete operation as an example, the record being deleted is stored inside an LOP_DELETE_ROWS transaction log entry, and with some non-trivial effort you can decode the entry and demonstrate that the entire row is within it.
If the transaction is rolled back, the undo space reserved in the log is used and the row is re-inserted. The reason for reserving undo space is to ensure that the transaction log cannot fill up mid-transaction, leaving no room to complete or roll back.
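If you want to peek at this yourself, here is a rough sketch using the undocumented (and unsupported) fn_dblog() function against the question's mytable, assuming a small test database:
BEGIN TRANSACTION

DELETE FROM mytable   -- the deleted row image is written to the transaction log

-- The LOP_DELETE_ROWS record produced by the delete carries the row contents.
SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE Operation = N'LOP_DELETE_ROWS'

ROLLBACK TRANSACTION  -- compensating log records put the row back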

Does a SQL UPDATE operation read data to "local memory"?

This answer quotes this Technet article which explains the two interpretations of lost updates:
A lost update can be interpreted in one of two ways. In the first scenario, a lost update is considered to have taken place when data that has been updated by one transaction is overwritten by another transaction, before the first transaction is either committed or rolled back. This type of lost update cannot occur in SQL Server 2005 because it is not allowed under any transaction isolation level.
The other interpretation of a lost update is when one transaction (Transaction #1) reads data into its local memory, and then another transaction (Transaction #2) changes this data and commits its change. After this, Transaction #1 updates the same data based on what it read into memory before Transaction #2 was executed. In this case, the update performed by Transaction #2 can be considered a lost update.
So it looks like the difference is that in the first scenario the whole update happens out of "local memory" while in the second one there's "local memory" used and this makes a difference.
Suppose I have the following code:
UPDATE MagicTable SET MagicColumn = MagicColumn + 10 WHERE SomeCondition
Does this involve "local memory"? Is it prone to the first or to the second interpretation of lost updates?
I suppose it would come under the second interpretation.
However, the way this type of UPDATE is implemented in SQL Server, a lost update is still not possible. Rows read for the update are protected with a U lock (converted to an X lock when the row is actually updated).
U locks are not compatible with other U locks (or with X locks).
So at all isolation levels if two concurrent transactions were to run this statement then one of them would end up blocked behind the other transaction's U lock or X lock and would not be able to proceed until that transaction completes.
Therefore it is not possible for lost updates to occur with this pattern in SQL Server at any isolation level.
To achieve a lost update you would need to do something like
BEGIN TRAN
DECLARE @MagicColumn INT;
/* Two concurrent transactions can both read the same pre-update value */
SELECT @MagicColumn = MagicColumn FROM MagicTable WHERE SomeCondition
UPDATE MagicTable SET MagicColumn = @MagicColumn + 10 WHERE SomeCondition
COMMIT
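Conversely, one way to close that window while keeping the read-then-write shape is to take the update lock at read time. This is a sketch against the same hypothetical MagicTable; the UPDLOCK hint is my addition, not part of the original answer:
BEGIN TRAN

DECLARE @MagicColumn INT;

-- UPDLOCK makes the SELECT take a U lock, so a second transaction running the
-- same code blocks on the read instead of seeing the stale pre-update value.
SELECT @MagicColumn = MagicColumn FROM MagicTable WITH (UPDLOCK) WHERE SomeCondition

UPDATE MagicTable SET MagicColumn = @MagicColumn + 10 WHERE SomeCondition

COMMIT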

With TSQL SNAPSHOT ISOLATION, is it possible to update only non-locked rows?

A large SQL Server 2008 table is normally updated in (relatively) small chunks using SNAPSHOT ISOLATION transactions. Snapshot works very well for those updates since the chunks never overlap. These updates aren't a single long-running operation, but many small one-row inserts/updates grouped by transaction.
I would like a lower-priority transaction to update all the rows which aren't currently locked. Does anyone know how I can get this behavior? Will another SNAPSHOT ISOLATION transaction fail as soon as a row clashes, or will it update everything it can before failing?
Could SET DEADLOCK_PRIORITY LOW with a try-catch be of any help? Maybe in a retry loop with a WHERE which targets only rows which haven't been updated?
Snapshot isolation doesn't really work that way; the optimistic locking model means it won't check for locks or conflicts until it's ready to write/commit. You also can't set query 'priority' per se, nor can you use the READPAST hint on an update.
Each update is an implicit atomic transaction, so if 1 update out of 10 (in a single transaction) fails, they all roll back.
SET DEADLOCK_PRIORITY only sets a preference for which transaction is rolled back in the event of a deadlock (otherwise the 'cheapest' rollback is selected).
A try-catch is pretty much a requirement if you're expecting regular collisions.
The retry loop would work, as would using a different locking model and the NOWAIT hint to fail immediately on (and thereby skip) statements that would otherwise block.
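As a sketch of that combination (a hypothetical dbo.BigTable with a Processed flag, not from the original question; SET LOCK_TIMEOUT 0 is equivalent to the NOWAIT table hint but applies to the whole session):
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET LOCK_TIMEOUT 0;   -- fail immediately (error 1222) instead of waiting for a lock

DECLARE @done BIT = 0;

WHILE @done = 0
BEGIN
    BEGIN TRY
        -- Update one small batch; anything skipped because of an error is
        -- picked up by a later iteration.
        UPDATE TOP (1000) dbo.BigTable
        SET Processed = 1
        WHERE Processed = 0;

        IF @@ROWCOUNT = 0
            SET @done = 1;
    END TRY
    BEGIN CATCH
        -- Most likely error 1222 (lock request time-out) or 1205 (deadlock victim):
        -- another transaction holds the rows, so back off briefly and retry.
        WAITFOR DELAY '00:00:01';
    END CATCH
END;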
SNAPSHOT ISOLATION transaction fails as soon as it encounters an update conflict. However, I would use some queue outside the database to prioritize updates.

SQL Server Insert query for a forum

Considering a forum table and many users simultaneously inserting messages into it, how safe is this transaction?
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
DECLARE @LastMessageId SMALLINT
SELECT @LastMessageId = MAX(MessageId)
FROM Discussions
WHERE ForumId = @ForumId AND DiscussionId = @DiscussionId
INSERT INTO Discussions
    (ForumId, DiscussionId, MessageId, ParentId, MessageSubject, MessageBody)
VALUES
    (@ForumId, @DiscussionId, @LastMessageId + 1, @ParentId, @MessageSubject, @MessageBody)
IF @@ERROR = 0
BEGIN
    COMMIT TRANSACTION
    RETURN 0
END
ROLLBACK TRANSACTION
RETURN 1
Here I read the last MessageId and increment it. I can't use an IDENTITY column because MessageId needs to increment per discussion group, not for every message inserted into the table.
Your transaction should be quite safe indeed - check out the MSDN docs on the SERIALIZABLE transaction level:
SERIALIZABLE
Specifies the following:
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction. This means that if any of the statements in a transaction are executed a second time, they will read the same set of rows. The range locks are held until the transaction completes. This is the most restrictive of the isolation levels because it locks entire ranges of keys and holds the locks until the transaction completes. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.
The main problem with this transaction isolation level is that it puts a pretty heavy load on the server and serializes (as the name implies) access, so performance and scalability will suffer; with very high numbers of users you may get lots of timeouts for users waiting on a transaction to finish.
So using the more lightweight approach of a global message id as an INT IDENTITY is definitely much better!
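For what it's worth, a minimal sketch of that IDENTITY-based alternative (the column types are assumptions based on the question's column list, not the poster's actual DDL):
CREATE TABLE Discussions
(
    MessageId      INT IDENTITY(1,1) PRIMARY KEY,  -- globally increasing id assigned by the engine
    ForumId        INT NOT NULL,
    DiscussionId   INT NOT NULL,
    ParentId       INT NULL,
    MessageSubject NVARCHAR(200) NOT NULL,
    MessageBody    NVARCHAR(MAX) NOT NULL
)

-- No SERIALIZABLE transaction or MAX() lookup is needed; concurrent inserts do not block each other.
INSERT INTO Discussions (ForumId, DiscussionId, ParentId, MessageSubject, MessageBody)
VALUES (@ForumId, @DiscussionId, @ParentId, @MessageSubject, @MessageBody)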

Is there a difference between a SELECT statement inside a transaction and one that is outside of it?

Does the default READ COMMITTED isolation level somehow make a SELECT statement act differently inside a transaction than one that is not in a transaction?
I am using MS SQL.
Yes, the one inside the transaction can see changes made by previous INSERT/UPDATE/DELETE statements in that same transaction; a SELECT statement outside the transaction cannot.
If all you are asking about is what the isolation level does, then understand that all SELECT statements (indeed, all statements of any kind) are in a transaction. The only difference between one that is explicitly in a transaction and one that is standing on its own is that the standalone one starts its transaction immediately before it executes and commits or rolls back immediately after it executes, whereas the one that is explicitly in a transaction (because it has a BEGIN TRANSACTION statement) can have other statements (inserts, updates, deletes, whatever) occurring within that same transaction, either before or after that SELECT statement.
So whatever the isolation level is set to, both SELECTs (inside or outside an explicit transaction) will nevertheless be in a transaction operating at that isolation level.
Addition:
The following is for SQL Server, but most databases work in a similar way. In SQL Server the query processor is always in one of three transaction modes: autocommit, implicit, or explicit.
Autocommit is the default transaction management mode of the SQL Server Database Engine. ... Every Transact-SQL statement is committed or rolled back when it completes. ... If a statement completes successfully, it is committed; if it encounters any error, it is rolled back. This is the default, and is the answer to @Alex's question in the comments.
In Implicit Transaction mode, "... the SQL Server Database Engine automatically starts a new transaction after the current transaction is committed or rolled back. You do nothing to delineate the start of a transaction; you only commit or roll back each transaction. Implicit transaction mode generates a continuous chain of transactions. ..." Note that the italicized snippet is for each transaction, whether it be a single or multiple statement transaction.
The engine is placed in Explicit Transaction mode when you explicitly initiate a transaction with BEGIN TRANSACTION Statement. Then, every statement is executed within that transaction until you explicitly terminate the transaction (with COMMIT or ROLLBACK) or if a failure occurs that causes the engine to terminate and Rollback.
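A brief sketch of the three modes in T-SQL (dbo.T is a placeholder table, not from the question):
-- Autocommit (the default): each statement is its own transaction.
INSERT INTO dbo.T (Col) VALUES (1);   -- committed as soon as it completes successfully

-- Implicit transactions: the engine opens a transaction for you, but you must end it.
SET IMPLICIT_TRANSACTIONS ON;
INSERT INTO dbo.T (Col) VALUES (2);   -- a transaction is now open
COMMIT;                               -- you must COMMIT (or ROLLBACK) explicitly
SET IMPLICIT_TRANSACTIONS OFF;

-- Explicit transaction: you delimit it yourself with BEGIN/COMMIT.
BEGIN TRANSACTION;
INSERT INTO dbo.T (Col) VALUES (3);
COMMIT TRANSACTION;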
Yes, there is a bit of a difference. In MySQL, the database doesn't actually establish its snapshot until your first query. Therefore, it's not BEGIN that matters, but the first statement within the transaction. If I do the following:
#Session 1
begin; select * from table;
#Session 2
delete from table; # implicit autocommit
#Session 1
select * from table;
Then I'll get the same thing in session one both times (the information that was in the table before I deleted it). When I end session one's transaction (commit, begin, or rollback) and check again from that session, the table will show as empty.
The READ COMMITTED isolation level is about the records that have been written. It has nothing to do with whether or not this select statement is in a transaction (except for those things written during that same transaction).
If your database (or, in MySQL, the underlying storage engine of all tables used in your SELECT statement) is transactional, then there is simply no way to execute it "outside of a transaction".
Perhaps you meant "run it in autocommit mode", but that is not the same as "not transactional". In the latter case, it still runs in a transaction; it's just that the transaction ends immediately after your statement is finished.
So, in both cases, during the run, a single select statement will be isolated at the READ COMMITTED level from the other transactions.
Now what this means for your READ COMMITTED transaction isolation level: perhaps surprisingly, not that much.
READ COMMITTED means that you may encounter non-repeatable reads: when running multiple SELECT statements in the same transaction, it is possible that rows you selected at a certain point in time are modified and committed by another transaction. You will see those changes when you re-execute the SELECT statement later in the same pending transaction. In autocommit mode, those two SELECT statements would each run in their own transaction. If another transaction had modified and committed the rows you selected the first time, you would see those changes just as well when you executed the statement the second time.
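To make the non-repeatable-read point concrete, a small SQL Server sketch (dbo.Accounts and its values are made up for illustration):
-- Session 1: READ COMMITTED, explicit transaction
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN;

SELECT Amount FROM dbo.Accounts WHERE AccountId = 1;   -- returns, say, 100

-- Session 2 (meanwhile, in autocommit mode):
--   UPDATE dbo.Accounts SET Amount = 200 WHERE AccountId = 1;

SELECT Amount FROM dbo.Accounts WHERE AccountId = 1;   -- now returns 200: a non-repeatable read
COMMIT;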