Should I commit or rollback a read transaction? - sql

I have a read query that I execute within a transaction so that I can specify the isolation level. Once the query is complete, what should I do?
Commit the transaction
Rollback the transaction
Do nothing (which will cause the transaction to be rolled back at the end of the using block)
What are the implications of doing each?
using (IDbConnection connection = ConnectionFactory.CreateConnection())
{
using (IDbTransaction transaction = connection.BeginTransaction(IsolationLevel.ReadUncommitted))
{
using (IDbCommand command = connection.CreateCommand())
{
command.Transaction = transaction;
command.CommandText = "SELECT * FROM SomeTable";
using (IDataReader reader = command.ExecuteReader())
{
// Read the results
}
}
// To commit, or not to commit?
}
}
EDIT: The question is not whether a transaction should be used or whether there are other ways to set the transaction level. The question is whether it makes any difference that a transaction that does not modify anything is committed or rolled back. Is there a performance difference? Does it affect other connections? Any other differences?

You commit. Period. There's no other sensible alternative. If you started a transaction, you should close it. Committing releases any locks you may have had, and is equally sensible with ReadUncommitted or Serializable isolation levels. Relying on implicit rollback - while perhaps technically equivalent - is just poor form.
If that hasn't convinced you, just imagine the next guy who inserts an update statement in the middle of your code, and has to track down the implicit rollback that occurs and removes his data.
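For what it's worth, here is roughly what the pattern in the question boils down to if you express it directly in T-SQL - a sketch only, using the table from the question - with the transaction opened explicitly and closed explicitly with COMMIT:
-- Read at the desired isolation level inside an explicit transaction,
-- then close it with COMMIT to release whatever locks the read took.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION;

SELECT * FROM SomeTable;

COMMIT TRANSACTION;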

If you haven't changed anything, then you can use either a COMMIT or a ROLLBACK. Either one will release any read locks you have acquired and since you haven't made any other changes, they will be equivalent.

If you begin a transaction, then best practice is always to commit it. If an exception is thrown inside your using (transaction) block, the transaction will be automatically rolled back.

Consider nested transactions.
Most RDBMSes do not support nested transactions, or try to emulate them in a very limited way.
For example, in MS SQL Server, a rollback in an inner transaction (which is not a real transaction - MS SQL Server just counts transaction levels!) will roll back everything that has happened in the outermost transaction (which is the real transaction).
Some database wrappers might treat a rollback in an inner transaction as a sign that an error has occurred and roll back everything in the outermost transaction, regardless of whether the outermost transaction issued a commit or a rollback.
So a COMMIT is the safe way when you cannot rule out that your component will be used inside some other software module's transaction.
Please note that this is a general answer to the question. The code example cleverly works around the issue with an outer transaction by opening a new database connection.
Regarding performance: depending on the isolation level, SELECTs may require a varying degree of locks and temporary data (snapshots). This is cleaned up when the transaction is closed; it does not matter whether that happens via COMMIT or ROLLBACK. There might be an insignificant difference in CPU time spent - a COMMIT is probably faster to parse than a ROLLBACK (two characters less) and other minor differences. Obviously, this is only true for read-only operations!
Totally not asked for: another programmer who later reads the code might assume that a ROLLBACK implies an error condition.

IMHO it can make sense to wrap read-only queries in transactions because (especially in Java) you can mark the transaction as "read-only", which the JDBC driver can use to optimize the query (it does not have to, though, so nothing stops you from issuing an INSERT anyway). E.g. the Oracle driver will completely avoid table locks on queries in a transaction marked read-only, which gains a lot of performance in heavily read-driven applications.
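For reference, a read-only transaction expressed in plain Oracle SQL looks roughly like this (a sketch; the table name is hypothetical, and SET TRANSACTION must be the first statement of the transaction):
SET TRANSACTION READ ONLY;

-- All queries in this transaction see a consistent snapshot;
-- any INSERT/UPDATE/DELETE would be rejected with an error.
SELECT * FROM some_table;

COMMIT;  -- ends the read-only transaction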

ROLLBACK is mostly used in case of an error or exceptional circumstances, and COMMIT in the case of successful completion.
We should close transactions with COMMIT (for success) and ROLLBACK (for failure), even in the case of read-only transactions where it doesn't seem to matter. In fact it does matter, for consistency and future-proofing.
A read-only transaction can logically "fail" in many ways, for example:
a query does not return exactly one row as expected
a stored procedure raises an exception
data fetched is found to be inconsistent
user aborts the transaction because it's taking too long
deadlock or timeout
If COMMIT and ROLLBACK are used properly for a read-only transaction, it will continue to work as expected if DB write code is added at some point, e.g. for caching, auditing or statistics.
Implicit ROLLBACK should only be used for "fatal error" situations, when the application crashes or exits with an unrecoverable error, network failure, power failure, etc.

Just a side note, but you can also write that code like this:
using (IDbConnection connection = ConnectionFactory.CreateConnection())
using (IDbTransaction transaction = connection.BeginTransaction(IsolationLevel.ReadUncommitted))
using (IDbCommand command = connection.CreateCommand())
{
command.Transaction = transaction;
command.CommandText = "SELECT * FROM SomeTable";
using (IDataReader reader = command.ExecuteReader())
{
// Do something useful
}
// To commit, or not to commit?
}
And if you re-structure things just a little bit you might be able to move the using block for the IDataReader up to the top as well.

If you put the SQL into a stored procedure and add this above the query:
set transaction isolation level read uncommitted
then you don't have to jump through any hoops in the C# code. Setting the transaction isolation level in a stored procedure does not cause the setting to apply to all future uses of that connection (which is something you have to worry about with other settings since the connections are pooled). At the end of the stored procedure it just goes back to whatever the connection was initialized with.

Given that a READ does not change state, I would do nothing. Performing a commit will do nothing except waste a round trip to the database to send the request. You haven't performed an operation that changed state. Likewise for the rollback.
You should, however, be sure to clean up your objects and close your connections to the database. Not closing your connections can lead to issues if this code gets called repeatedly.

If you set AutoCommit to false, then YES, you should close the transaction.
In an experiment with JDBC (PostgreSQL driver), I found that if a SELECT query breaks (because of a timeout), you cannot issue a new SELECT query on that connection until you roll back.
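What that looks like at the SQL level, sketched with a deliberately failing statement standing in for the timed-out query:
BEGIN;
SELECT 1/0;   -- any error aborts the transaction (here: division by zero)
SELECT 1;     -- rejected: "current transaction is aborted, commands ignored until end of transaction block"
ROLLBACK;     -- required before this connection can run queries again
SELECT 1;     -- works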

Do you need to block others from reading the same data? Why use a transaction?
#Joel - My question would be better phrased as "Why use a transaction on a read query?"
#Stefan - If you are going to use ad hoc SQL and not a stored proc, then just add WITH (NOLOCK) after the tables in the query. This way you don't incur the overhead (albeit minimal) of a transaction in the application and the database.
SELECT * FROM SomeTable WITH (NOLOCK)
EDIT # Comment 3: Since you had "sqlserver" in the question tags, I had assumed MS SQL Server was the target product. Now that the point has been clarified, I have edited the tags to remove the specific product reference.
I am still not sure why you want to make a transaction on a read op in the first place.

In your code sample, where you have
// Do something useful
Are you executing a SQL statement that changes data?
If not, there's no such thing as a "read" transaction... Only changes from INSERT, UPDATE, and DELETE statements (statements that can change data) are in a transaction... What you are talking about is the locks that SQL Server puts on the data you are reading, because of OTHER transactions that affect that data. The level of these locks depends on the SQL Server isolation level.
But you cannot commit or roll back anything if your SQL statement has not changed anything.
If you are changing data, then you can change the isolation level without explicitly starting a transaction... Every individual SQL statement is implicitly in a transaction. Explicitly starting a transaction is only necessary to ensure that two or more statements are within the same transaction.
If all you want to do is set the transaction isolation level, then just set a command's CommandText to "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ" (or whatever level you want), set the CommandType to CommandType.Text, and execute the command. (You can use Command.ExecuteNonQuery().)
NOTE: If you are doing MULTIPLE read statements and want them all to "see" the same state of the database as the first one, then you need to set the isolation level to Repeatable Read or Serializable...
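A short T-SQL sketch of that multi-read case (again using the table from the question):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

-- Rows read here cannot be changed by other sessions until we commit;
-- use SERIALIZABLE instead if you also need to block phantom inserts.
SELECT COUNT(*) FROM SomeTable;
SELECT * FROM SomeTable;

COMMIT TRANSACTION;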

Related

Do default transactions in PostgreSQL provide any benefit when only the last statement is writing?

I just learned that in PostgreSQL the default transaction isolation level is "read committed". I'm very used to MySQL's "REPEATABLE READ" isolation level. By my understanding this means that in a default PostgreSQL transaction "two successive SELECT commands can see different data". With that in mind, is there any benefit to transactions when only the last statement in the transaction is writing?
The transaction does not prevent data from changing between statements; the only benefit I see is rolling the transaction back on failure. But if only one writing statement exists, at the end, then that would happen anyway.
To make it a bit clearer what I'm referring to, let's take a generic, simple sequence of (pseudo) queries against a table:
BEGIN TRANSACTION
SELECT userId FROM users WHERE username = "the provided username"
INSERT INTO activites (activity, user_fk) VALUES ("posted on SO", userId)
COMMIT
In this sequence, and in any general sequence of statements where only the last statement is writing, is there a benefit in PostgreSQL to using a transaction with the default isolation level?
Bonus question, is there any overhead from it?
The difference between READ COMMITTED and REPEATABLE READ is that the former takes a new database snapshot for each statement, while the latter takes a snapshot only for the first SQL statement and uses that snapshot for the whole transaction. This implies that REPEATABLE READ actually performs better than READ COMMITTED, since it takes fewer snapshots.
The disadvantage of REPEATABLE READ is that you can get serialization errors. That does not affect your example, but if you had an UPDATE instead of an INSERT, it could be that the row you are trying to update has been modified by a concurrent transaction since the snapshot was taken. The resulting serialization error would mean that you have to repeat the transaction. Another disadvantage of REPEATABLE READ transactions is that a long-running read-only transaction can hinder the progress of VACUUM, which it wouldn't do in READ COMMITTED mode.
For read-only transactions or transactions like the one you are showing, REPEATABLE READ is often the better isolation level. The nice thing about READ COMMITTED is that you cannot get serialization errors, apart from deadlocks.
To explicitly answer your question: there is no advantage to running the statement from your example in a single transaction. You may as well use the default autocommit mode to run them in separate transactions.
Incidentally, the SQL standard decrees that the default transaction isolation level be SERIALIZABLE, but I don't know any database that implements that.
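To see the snapshot behaviour described above, you can run something like this in psql (a sketch, using the users table from the question):
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- The snapshot is taken at the first statement, so both SELECTs return the
-- same result even if another session commits changes to users in between.
SELECT count(*) FROM users;
SELECT count(*) FROM users;

COMMIT;

-- Under the default READ COMMITTED, each SELECT would take a fresh snapshot
-- and could see different data.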

Postgres could not serialize access due to concurrent update

I have an issue with "could not serialize access due to concurrent update". I checked the logs and I can clearly see that two transactions were trying to update a row at the same time.
My SQL query:
UPDATE sessionstore SET valid_until = %s WHERE sid = %s;
How can I tell Postgres to "try" to update the row without throwing any exception?
There is a caveat here which has been mentioned in the comments: you must be using REPEATABLE READ transaction isolation or higher. Why? That is not typically required unless you really have a specific reason.
Your problem will go away if you use the standard READ COMMITTED. But it is still better to use SKIP LOCKED, to avoid lock waits, redundant updates, and wasted WAL traffic.
As of Postgres 9.5+, there is a much better way to handle this, which would be like this:
UPDATE sessionstore
SET valid_until = %s
WHERE sid = (
SELECT sid FROM sessionstore
WHERE sid = %s
FOR UPDATE SKIP LOCKED
);
The first transaction to acquire the lock in SELECT FOR UPDATE SKIP LOCKED will cause any conflicting transaction to select nothing, leading to a no-op. As requested, it will not throw an exception.
See SKIP LOCKED notes here:
https://www.postgresql.org/docs/current/static/sql-select.html
Also, the advice about a savepoint is not specific enough. What if the update fails for a reason other than a serialization error - like an actual deadlock? You don't want to just silently ignore all errors. And these are in general a bad idea anyway - an exception handler or a savepoint for every row update is a lot of extra overhead, especially if you have high traffic. That is why you should use READ COMMITTED and SKIP LOCKED together, to simplify the matter; any actual error would then be truly unexpected.
The canonical way to do that would be to set a savepoint before the UPDATE:
SAVEPOINT x;
If the update fails,
ROLLBACK TO SAVEPOINT x;
and the transaction can continue.
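Put together, the savepoint pattern looks roughly like this (a sketch; the literal values stand in for the %s parameters of the question):
BEGIN;

SAVEPOINT x;

UPDATE sessionstore SET valid_until = now() + interval '1 hour'
WHERE sid = 'some-session-id';
-- If that UPDATE fails with a serialization error:
--   ROLLBACK TO SAVEPOINT x;
-- and the rest of the transaction can continue.

COMMIT;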
The default isolation level is "read committed"; you don't need to change it unless you have a specific use case.
You must be using the "repeatable read" or "serializable" isolation level. There, the current transaction will roll back if an already-running transaction updates a value that the current transaction was also supposed to update.
This scenario is handled easily by the "read committed" isolation level, where the current transaction accepts the updated value from the other transaction and performs its statements after the previous transaction has committed:
ALTER DATABASE <database_name> SET DEFAULT_TRANSACTION_ISOLATION TO 'read committed';
Ref: https://www.postgresql.org/docs/9.5/transaction-iso.html

What is wrong with the below T-SQL code

BEGIN TRAN
SELECT * FROM AnySchema.AnyTable
WHERE AnyColumn = SomeCondition
COMMIT
I know the transaction is not required here because it is just a SELECT, but I just want to know how bad a practice it is and whether it is going to be an overhead on the DB engine.
You may use a transaction on SELECT statements to ensure nobody else can update or delete records of the table while your batch of SELECT queries is executing.
Using WITH(NOLOCK):
Anyway, you may also use WITH (NOLOCK) in T-SQL:
SELECT * FROM AnySchema.AnyTable WITH(NOLOCK)
WHERE AnyColumn = SomeCondition
WITH (NOLOCK) is the equivalent of using READ UNCOMMITTED as the transaction isolation level. You stand the risk of reading an uncommitted row that is subsequently rolled back, i.e. data that never made it into the database.
So, while it can prevent reads being deadlocked by other operations, it comes with a risk.
TRANSACTION Block :
Using a TRANSACTION block will not cause much extra load on the DB, but if you make a habit of it and at some point forget (you or your developers may forget, right?) to close the transaction, other processes can't work on the same table.
Anyway, it depends on what type of application you have. If updates and selects are very frequent, it is advisable not to use such transaction blocks. If there is a medium level of updates and selects occurring next to each other, you may use transaction blocks for selects (but be sure to close the transaction).
That's actually a good question. To understand what's going on, you need to know about SET IMPLICIT_TRANSACTIONS.
When ON, the system is in implicit transaction mode. This means that if @@TRANCOUNT = 0, any statement from a list of Transact-SQL statements (including SELECT) begins a new transaction. It is equivalent to an unseen BEGIN TRANSACTION being executed first.
When OFF, each of those T-SQL statements is bounded by an unseen BEGIN TRANSACTION and an unseen COMMIT TRANSACTION statement. When OFF, we say the transaction mode is autocommit. [This is the default.]
If your T-SQL code visibly issues a BEGIN TRANSACTION, we say the transaction mode is explicit.
https://msdn.microsoft.com/en-us/library/ms187807.aspx
Since SQL Server would have created a transaction for you anyway, doing it manually doesn't actually change anything. The exact same thing would have happened either way.
Summary: what you are doing isn't 'wrong' because it has no effect, but it is unnecessary and confusing to the reader.
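You can watch this happening via @@TRANCOUNT (a small sketch, using the query from the question):
-- Autocommit (the default): the SELECT runs in its own invisible transaction.
SELECT * FROM AnySchema.AnyTable WHERE AnyColumn = SomeCondition;
SELECT @@TRANCOUNT;   -- 0: nothing is left open

-- Explicit transaction: exactly the same read, just with BEGIN/COMMIT visible.
BEGIN TRAN;
SELECT * FROM AnySchema.AnyTable WHERE AnyColumn = SomeCondition;
SELECT @@TRANCOUNT;   -- 1 until the COMMIT below
COMMIT;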
I think you would be taking a HOLDLOCK and most likely a TABLOCK.
table hints
That is not always a good thing, as you would block any updates or deletes (maybe even inserts).
It would be better to let SQL Server decide what level of locks to take - most likely page locks. I would stay away from NOLOCK, as bad stuff can happen.
On a SELECT on a single table, just let the optimizer do its thing.

batch procedure, when to commit transactions?

I'm pretty new to PL/SQL, although I've got lots of DB experience with other RDBMSes. Here's my current issue.
procedure CreateWorkUnit
is
begin
  update workunit
  set workunitstatus = 2 --workunit loaded
  where
  SYSDATE between START_DATE and END_DATE
  and workunitstatus = 1; --workunit created
  --commit here?
  loader; --loads records based on status, will have a commit of its own
  update workunit wu
  set workunititemcount = (select count(*) from workunititems wui where wui.wuid = wu.wuid)
  where workunitstatus = 2;
end CreateWorkUnit;
So the behaviour I'm seeing, with or without commit statements, is that I have to execute it twice: once to flip the statuses, and then the loader will run on the second execution. I'd like it all to run in one go.
I'd appreciate any words of oracle wisdom.
Thanks!
When to commit transactions in a batch procedure? It is a good question, although it seems only vaguely related to the problems with the code you posted. But let's answer it anyway.
We need to commit when the PL/SQL procedure has completed a unit of work. A unit of work is a business transaction. This would normally be at the end of the program, the last statement before the EXCEPTION section.
Sometimes not even then. The decision to commit or roll back properly lies with the top of the calling stack. If our PL/SQL is being called from a client (say, a user clicking a button on a screen), then perhaps the client should issue the commit.
But it is not unreasonable for a batch process to manage its own commit (and rollback in the case of errors). The main point is that only the topmost procedure should issue COMMIT. If a procedure calls other procedures, those called programs should not issue commits or rollbacks; they should handle any errors (log, etc.) and re-raise them to the calling program, and let it decide whether to roll back. Because all the called procedures run in the same session and hence the same transaction, a rollback in a called program will revert all the changes in the batch process. That's not right. The same reasoning applies to commits.
You will sometimes read advice on using intermittent commits to break up long-running processes into smaller units, e.g. committing every 1000 inserts. This is bad advice for several reasons, not all of them related to transactions. The pertinent ones are:
Issuing a commit frees locks on resources, and committing while a cursor is still being fetched ("fetching across a commit") is the classic cause of ORA-01555 "snapshot too old" errors.
It also affects read consistency, which applies only at the statement and/or transaction level. Fetching across a commit is likewise the cause of ORA-01002 "fetch out of sequence" errors.
It affects re-startability. If the program fails having processed 30% of the records, can we be confident it will only process the remaining 70% when we re-run the batch?
Once we commit records other sessions can see those changes: does it make sense for other users to see a partially changed view of the data?
So, the words of "Oracle wisdom" are: always align the database transaction with the business transaction, with a single commit per unit of work.
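Applied to the procedure in the question, that shape would look roughly like this (a sketch; loader is assumed not to commit on its own):
procedure CreateWorkUnit
is
begin
  update workunit
     set workunitstatus = 2
   where sysdate between start_date and end_date
     and workunitstatus = 1;

  loader;  -- no COMMIT inside; it runs in the same transaction

  update workunit wu
     set workunititemcount = (select count(*)
                                from workunititems wui
                               where wui.wuid = wu.wuid)
   where workunitstatus = 2;

  commit;  -- single commit for the whole unit of work
exception
  when others then
    rollback;
    raise;   -- let the caller see the error
end CreateWorkUnit;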
Somebody mentioned autonomous transactions as a way of issuing commits in sub-processes. This is usually a bad idea. Changes made in an autonomous transaction are visible to other sessions but not to our own. That very rarely makes sense. It also creates the same problems with re-startability which I discussed earlier.
The only acceptable use for autonomous transactions is recording activity (error log, trace, audit records). We need that data to persist regardless of what happens in the wider transaction. Any other use of the pragma is almost certainly a workaround for a poor design, which actually just makes the problem worse.
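For that logging case, the pattern is roughly this (a sketch; the error_log table and the procedure name are hypothetical):
procedure log_error (p_msg in varchar2)
is
  pragma autonomous_transaction;
begin
  insert into error_log (logged_at, message)
  values (sysdate, p_msg);
  commit;  -- commits only the log row; the caller's transaction is untouched
end log_error;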
You may not need to commit in the PL/SQL procedure at all. Procedures that you call from another procedure use the same session, so they don't need to commit. Also, a procedure's work is rolled back completely if its session rolls back or an exception propagates out.
I mis-classified my problem. I thought this was a transaction problem, but really it was one of my flags not being set as expected: a number field was null when I was expecting 0.
Sorry for that.
Josh Robinson

When will a commit inside a procedure actually affect the tables?

I am working with MS SQL and the Struts framework.
When calling the procedure, I set autocommit to false in my program.
While the procedure runs, I need to commit one separate transaction, and it must affect the table immediately.
But nothing is saved until the conn.commit() statement executes in my program.
Is there any other way to commit the transaction in the procedure itself, so that the table is affected at the end of that single transaction inside the procedure?
Please tell me if you know.
T.Saravanan
You should start and commit/rollback a transaction at the same level, otherwise you are introducing a lot of unpredictable paths - and, frankly, some bad design. So: if you need to commit at the server, use BEGIN TRAN / COMMIT TRAN in the T-SQL to handle the transaction locally.
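A minimal sketch of what handling the transaction locally can look like (the procedure name, table, and UPDATE are hypothetical):
CREATE PROCEDURE dbo.DoWork
AS
BEGIN
    BEGIN TRY
        BEGIN TRAN;

        UPDATE SomeTable SET SomeColumn = 1 WHERE Id = 42;  -- the actual work

        COMMIT TRAN;   -- committed on the server, regardless of conn.commit()
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN;
        RAISERROR('DoWork failed', 16, 1);  -- surface the error to the caller
    END CATCH
END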
Note, though, that T-SQL exception/error handling is not as rich as handling errors at a caller such as Java/C#. If the problem is that you want to disassociate this work from another, unrelated transaction, then it depends on how your calling code works:
if it is using connection-level transactions, then you will need to use a separate connection; just run the transaction on a different connection using the java/C#/whatever transaction API (i.e. the same as your existing code, by the sound of it, but on a different connection)
if it is using things like scope-based transactions (TransactionScope in C#; not sure about java etc - but this is an LTM or DTC transaction) then you can explicitly create a new scope that is bound to either a new (isolated) transaction, or the nil-transaction (i.e. the inner scope is not enlisted)
As for affecting the tables... SQL Server generally makes changes optimistically, i.e. yes, the changes are applied immediately (so that commit is cheap and rollback is more expensive); however, the isolation level will generally prevent other SPIDs from seeing that data. A competing SPID with a low isolation level (or using the NOLOCK hint) will see the uncommitted data, but this may be a phantom/non-repeatable read if the data eventually gets rolled back.