Do default transactions in PostgreSQL provide any benefit when only the last statement is writing? - sql

I just learned that in PostgreSQL the default transaction isolation level is "Read committed". I'm very used to MySQL's "REPEATABLE READ" isolation level. By my understanding, in PostgreSQL this means that in a default transaction "two successive SELECT commands can see different data". With that in mind, is there any benefit to transactions when only the last statement in the transaction is writing?
The transaction does not prevent the data from changing between statements; the only benefit I see is rolling the transaction back on failure. But if only one writing statement exists at the end, that would happen anyway.
To make it a bit clearer what I'm referring to, let's take a generic, simple sequence of (pseudo) queries to a table:
BEGIN TRANSACTION
SELECT userId FROM users WHERE username = "the provided username"
INSERT INTO activities (activity, user_fk) VALUES ("posted on SO", userId)
COMMIT
In this sequence, and in any general sequence of statements where only the last statement is writing, is there a benefit in PostgreSQL to using a transaction with the default isolation level?
Bonus question, is there any overhead from it?

The difference between READ COMMITTED and REPEATABLE READ is that the former takes a new database snapshot for each statement, while the latter takes a snapshot only for the first SQL statement and uses that snapshot for the whole transaction. This implies that REPEATABLE READ actually performs better than READ COMMITTED, since it takes fewer snapshots.
The disadvantage of REPEATABLE READ is that you can get serialization errors. That does not affect your example, but if you had an UPDATE instead of an INSERT, it could be that the row you are trying to update has been modified by a concurrent transaction since the snapshot was taken. The resulting serialization error would mean that you have to repeat the transaction. Another disadvantage of REPEATABLE READ transactions is that a long-running read-only transaction can hinder the progress of VACUUM, which it wouldn't do in READ COMMITTED mode.
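For illustration, here is a minimal sketch of that failure mode in PostgreSQL (the accounts table and its columns are made up for the example):

-- session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;   -- the snapshot for the whole transaction is taken here

-- session 2 (autocommit), runs and commits while session 1 is still open
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- session 1 again
UPDATE accounts SET balance = balance + 10 WHERE id = 1;
-- ERROR: could not serialize access due to concurrent update
ROLLBACK;   -- the whole transaction has to be retried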
For read-only transactions or transactions like the one you are showing, REPEATABLE READ is often the better isolation level. The nice thing about READ COMMITTED is that you can get no serialization errors apart from deadlocks.
To explicitly answer your question: there is no advantage to running the statement from your example in a single transaction. You may as well use the default autocommit mode to run them in separate transactions.
Incidentally, the SQL standard decrees that the default transaction isolation level be SERIALIZABLE, but I don't know any database that implements that.

Related

Avoid dirty/phantom reads while selecting from database

I have two tables A and B.
My transactions are like this:
Read -> read from table A
Write -> write in table B, write in table A
I want to avoid dirty/phantom reads since I have multiple nodes making requests to the same database.
Here is an example:
Transaction 1 - Update is happening on table B
Transaction 2 - Read is happening on table A
Transaction 1 - Update is happening on table A
Transaction 2 - Completed
Transaction 1 - Rollback
Now Transaction 2 client has dirty data. How should I avoid this?
If your database is not logged, there is nothing you can do. By choosing an unlogged database, those who set it up decided this sort of issue was not a problem. The only way to fix the problem here is to change the database mode to logged, but that is not something you do casually on a whim; there are lots of ramifications to the change.
Assuming your database is logged (it doesn't matter here whether it uses buffered or unbuffered logging, or is, mostly, a MODE ANSI database), then unless you set DIRTY READ isolation, you are running with at least COMMITTED READ isolation (it will be Informix's REPEATABLE READ level, standard SQL's SERIALIZABLE level, if the database is MODE ANSI).
If you want to ensure that data rows do not change after a transaction has read them, you need to run at a higher isolation level: REPEATABLE READ. (See SET ISOLATION in the manual for the details, and beware of the nomenclature for SET TRANSACTION; there's a section of the manual about Comparing SET ISOLATION and SET TRANSACTION, and related sections.) The downside to using SET ISOLATION TO REPEATABLE READ (or SET TRANSACTION ISOLATION LEVEL SERIALIZABLE) is that the extra locks needed reduce concurrency, but they give you the best guarantees about the state of the database.
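A minimal sketch of that in Informix syntax (table and column names are placeholders):

SET ISOLATION TO REPEATABLE READ;

BEGIN WORK;
SELECT val FROM table_a WHERE pk = 42;            -- rows examined stay locked until the transaction ends
UPDATE table_b SET val = val + 1 WHERE pk = 42;
UPDATE table_a SET val = val + 1 WHERE pk = 42;
COMMIT WORK;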

What happens to Isolation Level in Stored Proc that fails

I have an SP that takes about 30 seconds to run. 10 seconds of this is spent creating an initial temp table on which I operate, e.g.
SELECT * INTO #tmp FROM view_name ....(lots of joins and where conditions)
My understanding is that by default this long select (insert) can lock the table, and therefore block the application attached to the database?
My solution to this was to add
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
to the top of the stored proc and then reset it to READ COMMITTED at the end.
I guess this is a two-part question: firstly, is my understanding of this correct, and secondly, if somewhere in my SP it fails with an error, will the DB remain in READ UNCOMMITTED mode?
For the specific case of code executed from a stored procedure, the isolation level is reset to what it was before you called the stored procedure, regardless of what happens. From the docs:
If you issue SET TRANSACTION ISOLATION LEVEL in a stored procedure or trigger, when the object returns control the isolation level is reset to the level in effect when the object was invoked.
If you're not using a stored procedure, or you're dealing with code inside the stored procedure, the (much more complicated) story follows below.
Isolation level is a property of the session, not of the database. Whether or not a statement like SET TRANSACTION ISOLATION LEVEL executes after an error depends on whether the error aborts the entire batch or only the statement. The rules for this are complicated and unintuitive. To make things even more interesting, the isolation level of a session is not reset if it's associated with a pooled connection. If you want foolproof code that maintains an isolation level, preferably set the isolation level consistently from client code. If you must do it in T-SQL, it's a good idea to use SET XACT_ABORT ON and TRY .. CATCH, or the WITH (NOLOCK) table hint to restrict the locking options to a particular table or tables.
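If you do want to handle it in T-SQL, a minimal sketch (using the #tmp/view_name query from the question as the placeholder workload) might look like this:

SET XACT_ABORT ON;
BEGIN TRY
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT * INTO #tmp FROM view_name;               -- the long-running query
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
END TRY
BEGIN CATCH
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- restore the level even on error
    THROW;                                           -- re-raise (SQL Server 2012+); use RAISERROR on older versions
END CATCH;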
But why complicate your life? Consider if you really need to use READ UNCOMMITTED (or WITH (NOLOCK)), because you have no guarantee of data consistency anymore when you do, and even if you think you're OK with that, it can lead to results that aren't just a little bit wrong but completely useless. Unfortunately, this is true especially when the load on the database increases (which is why you might consider reducing locking in the first place).
If your query takes a long time while locking data, and that is a problem to applications (neither of which should be assumed, because SQL Server is good at locking data for only as long as necessary), your first instinct should be to see if the query itself can be optimized to reduce lock time (by adding indexes, most likely). If you've still got unacceptable locking left, consider using snapshot isolation. Only after those things have been evaluated should you consider dirty reads, and even then only for cases where consistency of the data is not your primary concern (for example, the query gets some monitoring statistics that are refreshed every so often anyway).

Deadlocks - Will this really help?

So I've got a query that keeps deadlocking on me. People who know the system well can't figure out why the sproc is deadlocking, but they tell me that I should just add this to it:
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Is this really a valid solution? What does that do?
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
This will cause the system to return inconsistent data, including duplicate records and missing records. Read more at Previously committed rows might be missed if NOLOCK hint is used, or here at Timebomb - The Consistency problem with NOLOCK / READ UNCOMMITTED.
Deadlocks can be investigated and fixed; it is not a big deal if you follow the proper procedure. Of course, throwing in a dirty read may seem easier, but down the road you'll be sitting long hours staring at your general ledger and wondering why the heck it does not balance debits and credits. So read again until you really grok this: DIRTY READS ARE INCONSISTENT READS.
If you want a get-out-of-jail card, turn on snapshot isolation:
ALTER DATABASE MyDatabase
SET READ_COMMITTED_SNAPSHOT ON
But keep in mind that snapshot isolation does not fix the deadlocks, it only hides them. Proper investigation of the deadlock cause and fix is always the appropriate action.
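(Note that READ_COMMITTED_SNAPSHOT changes how READ COMMITTED itself behaves; true SNAPSHOT isolation is a separate, opt-in level that would be enabled and used roughly like this:)

ALTER DATABASE MyDatabase
SET ALLOW_SNAPSHOT_ISOLATION ON
-- then, in each session that wants it:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;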
NOCOUNT will keep your query from returning rowcounts back to the calling application (e.g. "1000000 rows affected").
TRANSACTION ISOLATION LEVEL READ UNCOMMITTED will allow for dirty reads as indicated here.
The isolation level may help, but do you want to allow dirty reads?
Randomly adding SET options to the query is unlikely to help, I'm afraid.
SET NOCOUNT ON
Will have no effect on the issue.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
will prevent your query from taking out shared locks. As well as reading "dirty" data, it can also lead to your query reading the same rows twice, or not at all, dependent upon what other concurrent activity is happening.
Whether this will resolve your deadlock issue depends upon the type of deadlock. It will have no effect at all if the issue is two writers deadlocking due to non-linear ordering of lock requests (transaction 1 updates row a, transaction 2 updates row b, then tran 1 requests a lock on b and tran 2 requests a lock on a).
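For example, a sketch of that writer/writer deadlock (table and column names are made up):

-- session 1
BEGIN TRAN;
UPDATE dbo.t SET val = 1 WHERE id = 'a';   -- takes an exclusive lock on row a

-- session 2
BEGIN TRAN;
UPDATE dbo.t SET val = 2 WHERE id = 'b';   -- takes an exclusive lock on row b

-- session 1
UPDATE dbo.t SET val = 1 WHERE id = 'b';   -- blocks, waiting for session 2

-- session 2
UPDATE dbo.t SET val = 2 WHERE id = 'a';   -- deadlock; SQL Server picks one session as the victim

READ UNCOMMITTED makes no difference here, because both sessions take exclusive locks regardless of the isolation level.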
Can you post the offending query and deadlock graph? (if you are on SQL 2005 or later)
The best guide is:
http://technet.microsoft.com/es-es/library/ms173763.aspx
Snippet:
Specifies that statements can read rows that have been modified by other transactions but not yet committed.

Transactions running at the READ UNCOMMITTED level do not issue shared locks to prevent other transactions from modifying data read by the current transaction. READ UNCOMMITTED transactions are also not blocked by exclusive locks that would prevent the current transaction from reading rows that have been modified but not committed by other transactions. When this option is set, it is possible to read uncommitted modifications, which are called dirty reads. Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the isolation levels.

In SQL Server, you can also minimize locking contention while protecting transactions from dirty reads of uncommitted data modifications using either:
The READ COMMITTED isolation level with the READ_COMMITTED_SNAPSHOT database option set to ON.
The SNAPSHOT isolation level.
On a different tack, there are two other aspects to consider that may help.
1) Indexes and the indexes used by the SQL. The indexing strategy used on the tables will affect how many rows are locked. If you make the data modifications using a unique index, you may reduce the chance of deadlocks.
One algorithm, which of course will not work in all cases; the use of NOLOCK here is targeted rather than global.
The "old" way:
UPDATE dbo.change_table
SET somecol = newval
WHERE non_unique_value = 'something'
The "new" way:
SELECT uid
INTO #temp_table
FROM dbo.change_table WITH (NOLOCK)
WHERE non_unique_value = 'something'

UPDATE c
SET somecol = newval
FROM dbo.change_table c
INNER JOIN #temp_table t
ON (c.uid = t.uid)
2) Transaction duration
The longer a transaction is open, the more likely contention becomes. If there is a way to reduce the amount of time that records remain locked, you can reduce the chances of a deadlock occurring.
For example, perform as many SELECT statements (e.g. lookups) as possible at the start of the code, instead of interleaving them with writes (an INSERT or UPDATE, then a lookup, then another INSERT, and then another lookup).
This is also where one can use the NOLOCK hint for SELECTs on "static" tables that are not changing, reducing the lock "footprint" of the code.
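For instance (hypothetical lookup table and parameter), a targeted hint on such a table looks like this:

SELECT s.description
FROM dbo.status_codes s WITH (NOLOCK)   -- "static" lookup table that nothing is modifying
WHERE s.status_id = @status_id;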

Would this prevent the row from being read during the transaction?

I remember an example where reading in a transaction and then writing back the data is not safe, because another transaction may read/write to it in the time between. So I would like to check the date and prevent the row from being modified or read until my transaction is finished. Would this do the trick? And are there any SQL variants that this will not work on?
update tbl set id=id where date>expire_date and id=#id
Note: date>expire_date happens to be my condition. It could be anything. Would this prevent other transactions from reading the row until I commit or roll back?
In a lot of cases, your UPDATE statement will not prevent other transactions from reading the row.
ziang mentioned transaction isolation levels.
Depending on the isolation level, databases use different types of locking. At the highest level, locking can be divided into two categories:
- pessimistic,
- optimistic
MS SQL 2008, for example, has 6 isolation levels: 4 of them are pessimistic, 2 are optimistic. By default, it uses the READ COMMITTED isolation level, which falls into the pessimistic category.
Oracle, on another note, uses optimistic locking by default.
The statement that will lock your record for writing is
SELECT * FROM TBL WITH (UPDLOCK) WHERE id=#id
From that point on, no other transaction will be able to update your record with id=#id
And only transactions running in isolation level READ UNCOMMITTED will be able to read it.
With the default transaction level, READ COMMITTED, no other transaction will be able to read or write into this record until you either commit or roll back your entire transaction.
It depends on the transaction isolation level you set on your transaction control. There are 4 isolation levels:
READ UNCOMMITTED: this allows dirty reads
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
For more info, you can check MSDN.
You should be able to do this in a normal SELECT using a combination of the HOLDLOCK/ROWLOCK table hints.
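A sketch of that (adding UPDLOCK, as the earlier answer suggests; @id stands in for the question's #id placeholder):

BEGIN TRAN;

SELECT id
FROM tbl WITH (ROWLOCK, UPDLOCK, HOLDLOCK)
WHERE date > expire_date AND id = @id;   -- the row stays locked until the transaction ends

-- decide what to do based on the result, then write back
UPDATE tbl SET date = GETDATE() WHERE id = @id;

COMMIT;

The UPDLOCK/HOLDLOCK combination keeps other writers (and other readers that also request UPDLOCK) away from the row until you commit or roll back.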
It very well may work. Different platforms offer different services. For instance, in T-SQL, you can simply set the isolation level of the transaction and, as a result, force a lock to be obtained. I don't know what platform you are using so I cannot answer your question definitively.

Should I commit or rollback a read transaction?

I have a read query that I execute within a transaction so that I can specify the isolation level. Once the query is complete, what should I do?
Commit the transaction
Rollback the transaction
Do nothing (which will cause the transaction to be rolled back at the end of the using block)
What are the implications of doing each?
using (IDbConnection connection = ConnectionFactory.CreateConnection())
{
    using (IDbTransaction transaction = connection.BeginTransaction(IsolationLevel.ReadUncommitted))
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.Transaction = transaction;
            command.CommandText = "SELECT * FROM SomeTable";
            using (IDataReader reader = command.ExecuteReader())
            {
                // Read the results
            }
        }
        // To commit, or not to commit?
    }
}
EDIT: The question is not whether a transaction should be used or whether there are other ways to set the transaction level. The question is whether it makes any difference that a transaction that does not modify anything is committed or rolled back. Is there a performance difference? Does it affect other connections? Any other differences?
You commit. Period. There's no other sensible alternative. If you started a transaction, you should close it. Committing releases any locks you may have had, and is equally sensible with ReadUncommitted or Serializable isolation levels. Relying on implicit rollback - while perhaps technically equivalent - is just poor form.
If that hasn't convinced you, just imagine the next guy who inserts an update statement in the middle of your code, and has to track down the implicit rollback that occurs and removes his data.
If you haven't changed anything, then you can use either a COMMIT or a ROLLBACK. Either one will release any read locks you have acquired and since you haven't made any other changes, they will be equivalent.
If you begin a transaction, then best practice is always to commit it. If an exception is thrown inside your using(transaction) block, the transaction will be automatically rolled back.
Consider nested transactions.
Most RDBMSes do not support nested transactions, or try to emulate them in a very limited way.
For example, in MS SQL Server, a rollback in an inner transaction (which is not a real transaction; MS SQL Server just counts transaction levels!) will roll back everything that has happened in the outermost transaction (which is the real transaction).
Some database wrappers might consider a rollback in an inner transaction as a sign that an error has occurred and roll back everything in the outermost transaction, regardless of whether the outermost transaction committed or rolled back.
So a COMMIT is the safe way when you cannot rule out that your component is used by some other software module.
Please note that this is a general answer to the question. The code example cleverly works around the issue with an outer transaction by opening a new database connection.
Regarding performance: depending on the isolation level, SELECTs may require a varying degree of locks and temporary data (snapshots). This is cleaned up when the transaction is closed, and it does not matter whether this is done via COMMIT or ROLLBACK. There might be an insignificant difference in CPU time spent; a COMMIT is probably faster to parse than a ROLLBACK (two characters less), plus other minor differences. Obviously, this is only true for read-only operations!
Totally not asked for: another programmer who might get to read the code might assume that a ROLLBACK implies an error condition.
IMHO it can make sense to wrap read-only queries in transactions, as (especially in Java) you can mark the transaction as "read-only", which the JDBC driver can in turn use to optimize the query (but does not have to, so nobody will prevent you from issuing an INSERT nevertheless). E.g. the Oracle driver will completely avoid table locks on queries in a transaction marked read-only, which gains a lot of performance in heavily read-driven applications.
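The same idea is available at the SQL level; for example, in Oracle (table and column names are placeholders):

SET TRANSACTION READ ONLY;
SELECT count(*) FROM order_lines;
SELECT sum(amount) FROM order_lines;   -- sees the same snapshot as the first query
COMMIT;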
ROLLBACK is mostly used in case of an error or exceptional circumstances, and COMMIT in the case of successful completion.
We should close transactions with COMMIT (for success) and ROLLBACK (for failure), even in the case of read-only transactions where it doesn't seem to matter. In fact it does matter, for consistency and future-proofing.
A read-only transaction can logically "fail" in many ways, for example:
a query does not return exactly one row as expected
a stored procedure raises an exception
data fetched is found to be inconsistent
user aborts the transaction because it's taking too long
deadlock or timeout
If COMMIT and ROLLBACK are used properly for a read-only transaction, it will continue to work as expected if DB write code is added at some point, e.g. for caching, auditing or statistics.
Implicit ROLLBACK should only be used for "fatal error" situations, when the application crashes or exits with an unrecoverable error, network failure, power failure, etc.
Just a side note, but you can also write that code like this:
using (IDbConnection connection = ConnectionFactory.CreateConnection())
using (IDbTransaction transaction = connection.BeginTransaction(IsolationLevel.ReadUncommitted))
using (IDbCommand command = connection.CreateCommand())
{
    command.Transaction = transaction;
    command.CommandText = "SELECT * FROM SomeTable";
    using (IDataReader reader = command.ExecuteReader())
    {
        // Do something useful
    }
    // To commit, or not to commit?
}
And if you re-structure things just a little bit you might be able to move the using block for the IDataReader up to the top as well.
If you put the SQL into a stored procedure and add this above the query:
set transaction isolation level read uncommitted
then you don't have to jump through any hoops in the C# code. Setting the transaction isolation level in a stored procedure does not cause the setting to apply to all future uses of that connection (which is something you have to worry about with other settings since the connections are pooled). At the end of the stored procedure it just goes back to whatever the connection was initialized with.
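A minimal sketch of that approach (the procedure name is made up; SomeTable is the table from the question):

-- The isolation level set inside the procedure reverts to the caller's
-- setting as soon as the procedure returns.
CREATE PROCEDURE dbo.GetSomeTable
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT * FROM SomeTable;
END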
Given that a READ does not change state, I would do nothing. Performing a commit will do nothing, except waste a cycle to send the request to the database. You haven't performed an operation that has changed state. Likewise for the rollback.
You should however, be sure to clean up your objects and close your connections to the database. Not closing your connections can lead to issues if this code gets called repeatedly.
If you set AutoCommit to false, then YES.
In an experiment with JDBC (PostgreSQL driver), I found that if a select query fails (because of a timeout), then you cannot initiate a new select query unless you roll back.
Do you need to block others from reading the same data? Why use a transaction?
#Joel - My question would be better phrased as "Why use a transaction on a read query?"
#Stefan - If you are going to use ad hoc SQL and not a stored proc, then just add WITH (NOLOCK) after the tables in the query. This way you don't incur the overhead (albeit minimal) in the application and the database for a transaction.
SELECT * FROM SomeTable WITH (NOLOCK)
EDIT # Comment 3: Since you had "sqlserver" in the question tags, I had assumed MSSQLServer was the target product. Now that that point has been clarified, I have edited the tags to remove the specific product reference.
I am still not sure why you want to use a transaction on a read op in the first place.
In your code sample, where you have
// Do something useful
Are you executing a SQL statement that changes data?
If not, there's no such thing as a "read" transaction... Only changes from INSERT, UPDATE and DELETE statements (statements that can change data) are in a transaction... What you are talking about is the locks that SQL Server puts on the data you are reading, because of OTHER transactions that affect that data. The level of these locks is dependent on the SQL Server isolation level.
But you cannot commit or roll back anything if your SQL statement has not changed anything.
If you are changing data, then you can change the isolation level without explicitly starting a transaction... Every individual SQL statement is implicitly in a transaction. Explicitly starting a transaction is only necessary to ensure that 2 or more statements are within the same transaction.
If all you want to do is set the transaction isolation level, then just set a command's CommandText to "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ" (or whatever level you want), set the CommandType to CommandType.Text, and execute the command (you can use Command.ExecuteNonQuery()).
NOTE: If you are doing MULTIPLE read statements, and want them all to "see" the same state of the database as the first one, then you need to set the isolation level to REPEATABLE READ or SERIALIZABLE...
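A sketch of that last point in T-SQL (table names are placeholders):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT * FROM TableA WHERE id = 1;   -- the rows read here stay share-locked...
SELECT * FROM TableB WHERE id = 1;   -- ...so nothing can change them before this statement runs
COMMIT;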