Is there a difference between commit and rollback in a transaction only having selects?


The in-house application framework we use at my company makes it necessary to put every SQL query into a transaction, even when I know that none of the commands will change the database. At the end of the session, before closing the connection, I commit the transaction to close it properly. I wonder whether there would be any difference if I rolled it back instead, especially in terms of speed.
Please note that I am using Oracle, but I guess other databases behave similarly. Also, I can't do anything about the requirement to begin the transaction; that part of the codebase is out of my hands.

Databases often preserve either a before-image journal (what the data was before the transaction) or an after-image journal (what it will be when the transaction completes). If a database keeps a before-image, that has to be restored on a rollback. If it keeps an after-image, that has to replace the data in the event of a commit.
Oracle has both a journal and rollback space. The transaction journal accumulates blocks which are later written by DB writers. Since these writes are asynchronous, almost nothing DB-writer-related has any impact on your transaction (if the queue fills up, then you might have to wait).
Even for a query-only transaction, I'd be willing to bet that there's some little bit of transactional record-keeping in Oracle's rollback areas. I suspect that a rollback requires some work on Oracle's part before it determines there's nothing to actually roll back, and I think this work is synchronous with your transaction. You can't really release any locks until the rollback is completed. [Yes, I know you aren't holding any locks in your transaction, but the locking issue is why I think a rollback has to be fully completed first; only then can all the locks be released, and only then is your rollback finished.]
On the other hand, the commit is more or less the expected outcome, and I suspect that discarding the rollback area might be slightly faster. You created no transaction entries, so the DB writer will never even wake up to check and discover that there was nothing to do.
I also expect that while a commit may be faster, the difference will be minor; so minor that you might not even be able to measure it in a side-by-side comparison.
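If you want to try such a side-by-side comparison, the following could be a starting point. It is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle; that is an assumption for illustration only, since SQLite's transaction machinery is very different and its timings say nothing about Oracle's. It times many BEGIN / SELECT / end-of-transaction cycles, once ending with COMMIT and once with ROLLBACK. The function and table names are hypothetical.

```python
import sqlite3
import time


def time_readonly_cycles(ending, iterations=2000):
    """Time BEGIN / SELECT / <ending> cycles, where ending is 'COMMIT' or 'ROLLBACK'.

    Returns (elapsed_seconds, row_count_afterwards).
    """
    # isolation_level=None disables the driver's implicit transaction handling,
    # so the BEGIN / COMMIT / ROLLBACK statements below are sent exactly as written.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    conn.executemany("INSERT INTO t (v) VALUES (?)", [("x",)] * 100)

    start = time.perf_counter()
    for _ in range(iterations):
        conn.execute("BEGIN")
        conn.execute("SELECT COUNT(*) FROM t").fetchall()  # read only, no DML
        conn.execute(ending)                               # end the read-only transaction
    elapsed = time.perf_counter() - start

    # Neither ending should have changed the data.
    rows = conn.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]
    conn.close()
    return elapsed, rows


if __name__ == "__main__":
    for ending in ("COMMIT", "ROLLBACK"):
        elapsed, rows = time_readonly_cycles(ending)
        print(f"{ending:8s} {elapsed:.4f}s rows={rows}")
```

Run it several times before drawing conclusions; run-to-run noise is likely to be as large as any difference between the two endings.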

I agree with the previous answers that there's no difference between COMMIT and ROLLBACK in this case. There might be a negligible difference between the CPU time needed to determine that there's nothing to COMMIT and the CPU time needed to determine that there's nothing to ROLLBACK. But if it's a negligible difference, we can safely forget about it.
However, it's worth pointing out that there's a difference between a session that does a bunch of queries in the context of a single transaction and a session that does the same queries in the context of a series of transactions.
If a client starts a transaction, performs a query, performs a COMMIT or ROLLBACK, then starts a second transaction and performs a second query, there's no guarantee that the second query will observe the same database state as the first query. Sometimes, maintaining a single consistent view of the data is of the essence. Sometimes, getting a more current view of the data is of the essence. It depends on what you are doing.
I know, I know, the OP didn't ask this question. But some readers may be asking it in the back of their minds.

In general a COMMIT is much faster than a ROLLBACK, but in the case where you have done nothing they are effectively the same.

The documentation states that:
Oracle recommends that you explicitly end every transaction in your application programs with a COMMIT or ROLLBACK statement, including the last transaction, before disconnecting from Oracle Database. If you do not explicitly commit the transaction and the program terminates abnormally, then the last uncommitted transaction is automatically rolled back. A normal exit from most Oracle utilities and tools causes the current transaction to be committed. A normal exit from an Oracle precompiler program does not commit the transaction and relies on Oracle Database to roll back the current transaction.
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_4010.htm#SQLRF01110
If you want to choose one or the other, you might as well pick the one that is the same as doing nothing and just commit.

Well, we must take into account what a SELECT returns in Oracle. There are two modes. By default, a SELECT returns data as it looked at the very moment the SELECT statement started executing (this is the behavior of READ COMMITTED, the default isolation level). So if an UPDATE or INSERT is committed after the SELECT was issued, that change won't be visible in the result set.
This can be a problem if you need to compare two result sets (for example, the debit and credit sides of a general ledger app). For that there is a second mode, in which a SELECT returns data as it looked at the moment the current transaction began (the behavior in READ ONLY and SERIALIZABLE transactions).
So, at least sometimes it is necessary to execute SELECTs in transaction.
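The general-ledger case can be sketched with Python's sqlite3 module, assuming SQLite in WAL mode as a stand-in. There, each autocommit SELECT runs in its own read transaction (roughly like statement-level consistency), while an explicit BEGIN makes the following SELECTs share one snapshot, which is closer to Oracle's SERIALIZABLE or READ ONLY behavior than to its default READ COMMITTED. Another session commits a balanced posting between the two reads. The table layout and function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def ledger_totals(single_snapshot):
    """Read the debit total, let another session post, then read the credit total."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ledger.db")
        setup = sqlite3.connect(path, isolation_level=None)
        setup.execute("PRAGMA journal_mode=WAL")  # readers keep a stable snapshot; one writer at a time
        setup.execute("CREATE TABLE entries (side TEXT, amount INTEGER)")
        setup.execute("INSERT INTO entries VALUES ('debit', 100), ('credit', 100)")
        setup.close()

        reader = sqlite3.connect(path, isolation_level=None)
        writer = sqlite3.connect(path, isolation_level=None)

        def total(side):
            sql = "SELECT SUM(amount) FROM entries WHERE side = ?"
            return reader.execute(sql, (side,)).fetchall()[0][0]

        if single_snapshot:
            reader.execute("BEGIN")  # the snapshot is taken at the first read below
        debit = total("debit")

        # A balanced posting committed by another session between the two reads.
        writer.execute("BEGIN")
        writer.execute("INSERT INTO entries VALUES ('debit', 50)")
        writer.execute("INSERT INTO entries VALUES ('credit', 50)")
        writer.execute("COMMIT")

        credit = total("credit")
        if single_snapshot:
            reader.execute("COMMIT")
        reader.close()
        writer.close()
        return debit, credit
```

With single_snapshot=False the two totals come from different database states (100 vs 150), so the ledger looks unbalanced; with single_snapshot=True both reads see the same state (100 vs 100).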

Since you've not done any DML, I suspect there'd be no difference between a COMMIT and ROLLBACK in Oracle. Either way there's nothing to do.

I'd think a commit would be more efficient: since you'd generally expect most DB transactions to be committed, you would think the DB optimizes for that case (as opposed to trying to be more efficient for a rollback).

Related

Can I open a stoppable transaction with SQL Server?

I'm looking for something similar to an SQL transaction. I need the usual protections that transactions provide, but I don't want it to slow down anyone else.
Imagine client A connects to the DB and runs these commands:
BEGIN TRAN
SELECT (something)
(Wait a few seconds maybe.)
UPDATE (something)
COMMIT
In between the SELECT and the UPDATE, client B comes along and attempts a query that, under normal circumstances, would have to wait for A to COMMIT.
What I'd like is for client A to open its transaction in such a way that, should B come along and perform its query, client A will find its transaction immediately rolled back and its subsequent commands failing. Client B would experience only minimal delay.
(Note that the SELECT and UPDATE are simply illustrative commands.)
Update...
I've got a high priority task (client B) that sometimes (once a month-ish) gets an SQL timeout error, and a low priority task (client A) with a transaction which causes that timeout. I'd rather that the low priority task fails and is reattempted in the next cycle.
I ended up fixing this problem by eliminating the transactions entirely and replacing them with an informal set of flags. The queries were refactored to only do something if the right set of flags are raised and I added something that cleared up abandoned records that the rollback would have cleared in the past.
I fixed my transaction issues by eliminating transactions.

Using SNAPSHOT isolation level will prevent B from blocking. B will see data in the state they were before A issued BEGIN TRANSACTION. Unless B modifies data, they will never block each other.

While not a transaction at all, Optimistic Concurrency may be useful -- it is used by default in LINQ2SQL, etc.
The general idea is that the data is read, modifications can be made independently, and then the data is written back with a "check" (this is loosely comparable to a compare-and-swap). If the check fails, it is up to the application to decide what to do (restart the process, proceed anyway, or fail).
This naturally doesn't work for all scenarios and may not detect a number of interactions, such as new items added between the "read" and "write". Both the actual read and write can be in separate transactions with the appropriate isolation level; the separate transactions may allow additional transactions to be interleaved.
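A minimal sketch of the read / check-on-write idea, assuming a hypothetical account table with a version column, which is one common way to implement the check. sqlite3 is used only so the example runs; the UPDATE ... WHERE version = ? pattern itself is plain SQL, and the read and the write each run as their own short transaction.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO account VALUES (1, 100, 0)")
    return conn


def read_account(conn, account_id):
    """The read step: return (balance, version) and remember the version seen."""
    sql = "SELECT balance, version FROM account WHERE id = ?"
    return conn.execute(sql, (account_id,)).fetchall()[0]


def write_account(conn, account_id, new_balance, seen_version):
    """The write step with the check: only succeeds if nobody changed the row."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, seen_version),
    )
    # rowcount 0 means another writer got there first; the caller decides what to do.
    return cur.rowcount == 1
```

Usage: two clients both read version 0; the first write succeeds and bumps the version to 1, so the second write matches no row and returns False instead of silently overwriting the first change.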
Of course, depending upon the exact problem and interactions... different isolation levels and/or finer grained locking may be sufficient.
Happy coding.

That is back to front.
You can't have later clients aborting earlier transactions: that's chaos.
You can use snapshot isolation so that client B has a consistent view and isn't blocked (mostly) by client A. See also Wikipedia for more general background.
Perhaps describe your problem more fully so we can offer suggestions for that...

One thing that I've seen used (but I'm afraid that I don't have any code handy for it) is having transaction A spawn another process which then monitors the transaction. If it sees any blocks caused by the transaction then it immediately issues a KILL to the spid.
If I can find the code for this then I'll add it here.

Locking the database

Hi, I'm trying to see what's locking the database, and I found two types of locking: optimistic and pessimistic. I found some articles on Wikipedia, but I would like to know more! Can someone explain these kinds of locking to me? Should we only use locking when we need exclusive access to something? Does locking only happen when we use transactions?
Thanks in advance.
Kevin

Optimistic locking is no locking at all.
It works by noting the state the system was in before you started making your changes, and then going ahead and just making those changes, assuming (optimistically) that no one else will want to make conflicting updates. Just as you are about to atomically commit those changes, you check whether in the meantime someone else has also updated the same data, in which case your commit fails.
Subversion, for example, uses optimistic locking. When you try to commit, you have to handle any conflicts, but before that you can do whatever you want to your working copy.
Pessimistic locks work with real locks. Assuming that there will be contention, you lock everything you want to update before touching it. Everyone else will have to wait for you to commit or rollback.
When using a relational database with transaction support, the database usually takes care of locking internally (such as when you issue an UPDATE statement), so for normal online processing you do not need to handle this yourself. Only if you want to do maintenance work or large batches do you sometimes want to lock down tables.
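For contrast, here is a sketch of pessimistic locking using sqlite3, assuming SQLite is acceptable as an illustration even though it locks the whole database file rather than individual rows. BEGIN IMMEDIATE takes the write lock before any data is touched, and a second session configured not to wait (timeout=0) is refused until the first one commits. Table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def pessimistic_demo():
    """Return (second_session_was_refused, final_quantity)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "locks.db")
        a = sqlite3.connect(path, isolation_level=None)
        b = sqlite3.connect(path, isolation_level=None, timeout=0)  # fail instead of waiting
        a.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
        a.execute("INSERT INTO stock VALUES ('widget', 10)")

        a.execute("BEGIN IMMEDIATE")  # lock first, then touch the data
        a.execute("UPDATE stock SET qty = qty - 1 WHERE item = 'widget'")
        try:
            b.execute("BEGIN IMMEDIATE")
            refused = False
        except sqlite3.OperationalError:  # "database is locked"
            refused = True
            if b.in_transaction:  # defensive: make sure b holds nothing
                b.execute("ROLLBACK")

        a.execute("COMMIT")  # releases the lock
        b.execute("BEGIN IMMEDIATE")  # now succeeds
        b.execute("UPDATE stock SET qty = qty - 1 WHERE item = 'widget'")
        b.execute("COMMIT")

        qty = b.execute("SELECT qty FROM stock").fetchall()[0][0]
        a.close()
        b.close()
        return refused, qty
```

With a non-zero timeout, the second session would wait for the lock instead of failing, which is the "everyone else will have to wait" behavior described above.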
Should we only use locking when we need exclusive access to something?
You need it to prevent conflicting operations from other sessions. In general, this means updates. Reading data can normally go on concurrently.
Does locking only happen when we use transactions?
Yes. You will accumulate locks while proceeding with your transaction, releasing all of them at the end of it. Note that a single SQL command in auto-commit mode is still a transaction by itself.
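The last point can be checked with sqlite3, used here only as a convenient, runnable stand-in: even with no explicit BEGIN, a single multi-row INSERT is applied all-or-nothing, so when its last row violates the primary key, the rows it had already inserted are backed out too. This shows the atomicity side of the implicit statement-level transaction, not the locking side. The function name is hypothetical.

```python
import sqlite3


def rows_after_failed_statement():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit: no explicit BEGIN
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO t VALUES (1)")
    try:
        # One statement, three rows; the third duplicates id 1.
        conn.execute("INSERT INTO t VALUES (2), (3), (1)")
    except sqlite3.IntegrityError:
        pass  # the whole statement is undone, including rows 2 and 3
    rows = conn.execute("SELECT id FROM t ORDER BY id").fetchall()
    conn.close()
    return rows
```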

Transaction isolation levels also specify the locking behaviour. Books Online (BOL) says that transaction isolation levels control:
- Whether locks are taken when data is read, and what type of locks are requested.
- How long the read locks are held.
- Whether a read operation referencing rows modified by another transaction:
  - Blocks until the exclusive lock on the row is freed.
  - Retrieves the committed version of the row that existed at the time the statement or transaction started.
  - Reads the uncommitted data modification.
The isolation levels defined by the ISO standard are:
- Read uncommitted (the lowest level, where transactions are isolated only enough to ensure that physically corrupt data is not read)
- Read committed (Database Engine default level)
- Repeatable read
- Serializable (the highest level, where transactions are completely isolated from one another)

Can a COMMIT statement (in SQL) ever fail? How?

When working with database transactions, what are the possible conditions (if any) that would cause the final COMMIT statement in a transaction to fail, presuming that all statements within the transaction already executed without issue?
For example... let's say you have some two-phase or three-phase commit protocol where you do a bunch of statements, then wait for some master process to tell you when it is ok to finally commit the transaction:
-- <initial handshaking stuff>
START TRANSACTION;
-- <Execute a bunch of SQL statements>
-- <Inform master of readiness to commit>
-- <Time passes... background transactions happening while we wait>
-- <Receive approval to commit from master (finally!)>
COMMIT;
If your code gets to that final COMMIT statement and sends it to your DBMS, can you ever get an error (uniqueness issue, database full, etc) at that statement? What errors? Why? How do they appear? Does it vary depending on what DBMS you run?

COMMIT may fail. You might have had sufficient resources to log all the changes you wished to make, but lack the resources to actually implement the changes.
And that's not considering other reasons it might fail:
The change itself might not fit the constraints of the database.
Power loss stops things from completing.
The level of requested selection concurrency might disallow an update (cursors updating a modified table, for example).
The commit might time out or be on a connection which times out due to starvation issues.
The network connection between the client and the database may be lost.
And all the other "simple" reasons I can't think of off the top of my head.

It is possible for some database engines to defer UNIQUE index constraint checking until COMMIT. Obviously if the constraint does not hold true at the time of commit then it will fail.

Sure.
In a multi-user environment, the COMMIT may fail because of changes by other users (e.g. your COMMIT would violate a referential constraint when applied to the now current database...).
Thomas

If you're using two-phase commit, then no. Everything that could go wrong is done in the prepare phase.
There could still be a network outage, power loss, cosmic rays, etc., during the commit, but even so, the transactions will have been written to permanent storage, and if a commit has been triggered, recovery processes should carry them through.
Hopefully.

Certainly, there could be a number of issues. The act of committing, in and of itself, must make some final, permanent entry to indicate that the transaction committed. If making that entry fails, then the transaction can't commit.
As Ignacio states, there can be deferred constraint checking (this could be any form of constraint, not just unique constraint, depending on the DBMS engine).
SQL Server Specific: flushing FILESTREAM data can be deferred until commit time. That could fail.
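Deferred constraint checking is easy to observe with sqlite3, assuming SQLite as a stand-in for whichever DBMS you actually run. A foreign key declared DEFERRABLE INITIALLY DEFERRED is not checked when the row is inserted, only at COMMIT, so the COMMIT statement itself raises the error. In SQLite the transaction stays open after that failure, so the sketch rolls it back explicitly; other engines may roll the transaction back automatically instead, so check your DBMS's documentation. Table and function names are hypothetical.

```python
import sqlite3


def commit_with_deferred_fk():
    """Return (commit_error_message, transaction_still_open, child_rows_left)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES parent(id) DEFERRABLE INITIALLY DEFERRED)"
    )

    conn.execute("BEGIN")
    conn.execute("INSERT INTO child VALUES (1, 42)")  # no parent 42, but accepted for now
    try:
        conn.execute("COMMIT")  # the deferred check runs here
        error = None
    except sqlite3.IntegrityError as exc:
        error = str(exc)

    still_open = conn.in_transaction  # SQLite leaves the transaction active
    if still_open:
        conn.execute("ROLLBACK")
    remaining = conn.execute("SELECT COUNT(*) FROM child").fetchall()[0][0]
    conn.close()
    return error, still_open, remaining
```

Every statement inside the transaction succeeded; only the COMMIT failed, which is exactly the situation asked about in the question.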

One very simple and often overlooked item: hardware failure. The commit can fail if the underlying server dies. This might be disk, CPU, memory, or even network related.
The transaction could fail if it never receives approval from the master (for any number of reasons).

No matter how wonderfully a system may be designed, there is going to be some possibility that a commit will get into a situation where it's impossible to know whether it succeeded or not. In some cases, it may not matter (e.g. if a hard drive holding the database turns into a pile of slag, it may be impossible to tell whether the commit succeeded before that occurred, but it wouldn't really matter); in other cases, however, this could be a problem. Especially with distributed database systems, if a connection failure occurs at just the right time during a commit, it will be impossible for both sides to be certain whether the other side is expecting a commit or a rollback.

With MySQL or MariaDB, when used with Galera clustering, COMMIT is when the other nodes in the cluster are checked. So yes, important errors can be discovered at COMMIT, and you must check for these errors.

AUTONOMOUS_TRANSACTION

I was thinking of using AUTONOMOUS_TRANSACTION Pragma for some logging in a batch process. Does anyone have any experience with this ? If so any pros and cons would be appreciated.

IMO autonomous transactions are particularly well suited to logging: they run independently of the main transaction, meaning you can write to a table and commit or roll back changes without affecting the main transaction.
They also add little overhead: if you run big statements and add an autonomous transaction between each statement the performance cost will be negligible.
There is also a side effect that you may find interesting: since autonomous transactions are independent of the calling transaction, you can follow the progress of your main process as it is running. You don't have to wait for the main transaction to finish: you can query the logging table as it is filled by the autonomous transactions.

Obviously, any logging done in an autonomous transaction will remain in the database even if the main transaction rolls back. For logging this is probably what you want, but it is important to remember that a log record saying "inserted row X into table Y" doesn't mean that that insert actually got committed.
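SQLite has no PRAGMA AUTONOMOUS_TRANSACTION, but the effect described above can be imitated for illustration by logging through a second connection in autocommit mode. This sketch assumes a separate log database file, because SQLite allows only one writer per database file and the logger would otherwise wait on the main transaction's lock. The main transaction rolls back, yet the log row saying the insert happened survives. File, table, and function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def log_then_rollback():
    """Return (rows_left_in_main_table, log_rows)."""
    with tempfile.TemporaryDirectory() as d:
        main = sqlite3.connect(os.path.join(d, "app.db"), isolation_level=None)
        log = sqlite3.connect(os.path.join(d, "log.db"), isolation_level=None)
        main.execute("CREATE TABLE y (x INTEGER)")
        log.execute("CREATE TABLE log (msg TEXT)")

        main.execute("BEGIN")
        main.execute("INSERT INTO y VALUES (7)")
        # Independent write, committed immediately, like an autonomous transaction.
        log.execute("INSERT INTO log VALUES ('inserted row 7 into table y')")
        main.execute("ROLLBACK")  # the main work is undone...

        rows_in_y = main.execute("SELECT COUNT(*) FROM y").fetchall()[0][0]
        log_rows = log.execute("SELECT msg FROM log").fetchall()  # ...but the log entry remains
        main.close()
        log.close()
        return rows_in_y, log_rows
```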

What are the problems of using transactions in a database?

From this post. One obvious problem is scalability/performance. What other problems does using transactions cause?
Could you say there are two sets of problems, one for long running transactions and one for short running ones? If yes, how would you define them?
EDIT: Deadlock is another problem, but data inconsistency might be worse, depending on the application domain. Assuming a transaction-worthy domain (banking, to use the canonical example), isn't the possibility of deadlock more a cost to pay for ensuring data consistency than a problem with using transactions, or would you disagree? If you disagree, what other solutions would you use to ensure data consistency that are deadlock-free?

It depends a lot on the transactional implementation inside your database and may also depend on the transaction isolation level you use. I'm assuming "repeatable read" or higher here. Holding transactions open for a long time (even ones which haven't modified anything) forces the database to hold on to deleted or updated rows of frequently-changing tables (just in case you decide to read them) which could otherwise be thrown away.
Also, rolling back transactions can be really expensive. I know that in MySQL's InnoDB engine, rolling back a big transaction can take FAR longer than committing it (we've seen a rollback take 30 minutes).
Another problem has to do with database connection state. In a distributed, fault-tolerant application, you can't ever really know what state a database connection is in. Stateful database connections can't be maintained easily, as they could fail at any moment (the application needs to remember what it was in the middle of doing and redo it). Stateless ones can simply be reconnected and have the (atomic) command re-issued without (in most cases) breaking state.

You can get deadlocks even without using explicit transactions. For one thing, most relational databases will apply an implicit transaction to each statement you execute.
Deadlocks are fundamentally caused by acquiring multiple locks, and any activity that involves acquiring more than one lock can deadlock with any other activity that involves acquiring at least two of the same locks as the first activity. In a database transaction, some of the acquired locks may be held longer than they would otherwise be held -- to the end of the transaction, in fact. The longer locks are held, the greater the chance for a deadlock. This is why a longer-running transaction has a greater chance of deadlock than a shorter one.
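The "can deadlock" above can be made precise for two activities that take exclusive locks and hold them until their transactions end. The helper below is hypothetical, not a DBMS feature: it reports a possible deadlock when there is a point where one activity waits for a lock the other holds while the other waits for a lock the first holds, and the locks each already holds do not conflict. Sharing two locks is necessary but not sufficient; for example, if both activities take the same lock first, that lock serializes them.

```python
def can_deadlock(order_a, order_b):
    """True if two activities acquiring exclusive locks in the given orders,
    and holding each lock until they finish, can end up waiting on each other.

    Assumes each lock name appears at most once per order.
    """
    for i, wanted_by_a in enumerate(order_a):
        held_a = set(order_a[:i])  # A holds everything before the lock it now requests
        for j, wanted_by_b in enumerate(order_b):
            held_b = set(order_b[:j])
            if (
                wanted_by_b in held_a      # B waits for something A holds...
                and wanted_by_a in held_b  # ...while A waits for something B holds
                and not (held_a & held_b)  # and both states can exist at once
            ):
                return True
    return False
```

Usage: can_deadlock(["x", "y"], ["y", "x"]) is True, while acquiring in one consistent order, can_deadlock(["x", "y"], ["x", "y"]), is False, which is why a consistent lock order is a common way to avoid deadlocks.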

One issue with transactions is that it's possible (unlikely, but possible) to get deadlocks in the DB. You do have to understand how your database works, locks, transacts, etc., in order to debug these interesting/frustrating problems.
-Adam

I think the major issue is at the design level: at what level or levels within my application do I use transactions?
For example I could:
Create transactions within stored procedures,
Use the data access API (ADO.NET) to control transactions
Use some form of implicit rollback higher in the application
Use a distributed transaction (via DTC / COM+).
Using more than one of these levels in the same application often seems to create performance and/or data integrity issues.
