Serializable isolation level atomicity

Serializable isolation level atomicity - sql

I have several threads executing some SQL select queries with serializable isolation level. I am not sure which implementation to choose. This:
_repository.Select(...)
or this
lock (_lockObject)
{
_repository.Select(...);
}
In other words, is it possible several transactions will start executing at the same time and partially block records inside Select operation range.
P. S. I am using MySQL but I guess it is a more general question.

Transactions performing SELECT queries place a shared lock on the rows, permitting other transactions to read those rows, but preventing them from making changes to the rows (including inserting new records into the gaps)
Locking in the application is doing something else, it will not allow other threads to enter the code block which fetches the data from the repository, This approach can lead to very bad performance for a few reasons:
If any of the rows are locked by another transaction (outside the application) via a exclusive lock, the lock in the application will not help.
Multiple transactions will not be able to perform reads even on rows that are not locked in exclusive mode (not being updated).
The lock will not be released until all the data is fetched and returned to the client. This includes the network latency and any other overhead that it takes converting the MySql result set to a code object.
Most importantly, Enforcing data integrity & atomicity is the databases job, it knows how to handle it very well, how to detect potential deadlocks. When to perform record locks, and when to add Index gap locks. It is what databases are for, and MySql is ACID complaint and is proven to handle these situations
I suggest you read through Section 13.2.8. The InnoDB Transaction Model and Locking of the MySql docs, it will give you a great insight how locking in InnoDB is performed.

Related

Could an query with READ UNCOMMITTED isolation level cause locks on the tables it access?

My app needs to batch process 10M rows, the result of a complex SQL query that join tables.
I'll plan to be iterating a resultset, reading a hundred per iteration.
To run this on a busy OLTP production DB and avoid locks, I figured I'll query with a READ UNCOMMITTED isolation level.
Would that get the query out of the way of any DB writes? avoiding any rows/table locks?
My main concern is my query blocking any other DB activity, I'm far less concerned with the other way around.
Side Notes:
1. I'll be reading historical data, so I'm unlikely to meet uncommitted data. It's OK if I do.
2. The iteration process could take hours. The DB connection would remain open through this process.
3. I'll have two such concurrent batch instances at most.
4. I can tolerate dup rows. (by product of read uncommitted).
5. DB2 is the target DB, but I want a solution that fits other DBs vendors as well.
6. Will snapshot isolation level help me clear out server memory?

Have you actually encountered any real locks on read?
As far as I'm concerned, the only reason that READ UNCOMMITED existed in SQL standard was to allow non-locking reads. So I don't know DB2, but I blindly bet that it does not lock data during read in READ UNCOMMITED mode. Most modern RDBMS systems however don't lock data at all during read (*). So READ UNCOMMITED is either not available (in Oracle, for example) or is silently promoted to READ COMMITED (PostgreSQL).
If you can freely choose the engine, either check DB2 transaction isolation level handling or go for Oracle/PostgreSQL/other.
(*) More precisely, they don't exclusively lock the data. Some shared locks can be placed on queried tables so no DDL alters them during read.

My answer applies to SQL Server.
Read committed releases lock after every row read (approximately). Locking is probably not your problem.
I recommend you use the safer READ COMMITTED. Better yet, use snapshot isolation. That removes many locking problems. There are disadvantages as well, sou you better read a little about it.
My main concern is my query blocking any other DB activity
Snapshot isolation makes all locking concerns go away for read-only transactions. No blocking either way, full data consistency. Be aware that long-running transactions can cause TempDB to fill with snapshot versions.
The DB connection would remain open through this process.
That's a problem because a network hiccup, app deployment or mirroring failover would kill your batch process.
Be aware, that read uncommitted can cause queries to sporadically fail outright. You need retry logic or tolerate failed jobs.

In sql server Transaction isolation level Read uncommitted cause no lock on table.

Locking the database

Hi I'm trying to see what's locking the database and found 2 types of locking. Optimistic and Pessimistic Locking. I found some articles on Wiki but I would like to know more ! Can someone explain me about those locking ? We should only use locking when we need exclusive access to something? Locking only happens when we use transaction?
Thanks in advance.
Kevin

Optimistic locking is no locking at all.
It works by noting the state the system was in before you started making your changes, and then going ahead and just making those changes, assuming (optimistically) that no one else will want to make conflicting updates. Just as you are about to atomically commit those changes, you would check if in the mean-time someone else has also updated the same data. In which case, your commit fails.
Subversion for example using optimistic locking. When you try to commit, you have to handle any conflicts, but before that, you can do on your working copy whatever you want.
Pessimistic locks work with real locks. Assuming that there will be contention, you lock everything you want to update before touching it. Everyone else will have to wait for you to commit or rollback.
When using a relational database with transaction support, the database usually takes care of locking internally (such as when you issue an UPDATE statement), so for normal online processing you do not need to handle this yourself. Only if you want to do maintenance work or large batches do you sometimes want to lock down tables.
We should only use locking when we need exclusive access to something?
You need it to prevent conflicting operations from other sessions. In general, this means updates. Reading data can normally go on concurrently.
Locking only happens when we use transaction?
Yes. You will accumulate locks while proceeding with your transaction, releasing all of them at the end of it. Note that a single SQL command in auto-commit mode is still a transaction by itself.

Transactions isolation levels also specify the locking behaviour. BOL refers:Transaction isolation levels control:
Whether locks are taken when data is read, and what type of locks are requested.
How long the read locks are held.
Whether a read operation referencing rows modified by another transaction:
Blocks until the exclusive lock on the row is freed.
Retrieves the committed version of the row that existed at the time the statement or transaction started.
Reads the uncommitted data modification.
The default levels are:
Read uncommitted (the lowest level where transactions are isolated only enough to ensure that physically corrupt data is not read)
Read committed (Database Engine default level)
Repeatable read
Serializable (the highest level, where transactions are completely isolated from one another)

Regarding SQL Server Locking Mechanism

I would like to ask couple of questions regarding SQL Server Locking mechanism
If i am not using the lock hint with SQL Statement, SQL Server uses PAGELOCK hint by default. am I right??? If yes then why? may be its due to the factor of managing too many locks this is the only thing i took as drawback but please let me know if there are others. and also tell me if we can change this default behavior if its reasonable to do.
I am writing a server side application, a Sync Server (not using sync framework) and I have written database queries in C# code file and using ODBC connection to execute them. Now question is what is the best way to change the default locking from Page to Row keeping drawbacks in mind (e.g. adding lock hint in queries this is what i am planning for).
What if a sql query(SELECT/DML) is being executed without the scope of transaction and statement contains lock hint then what kind of lock will be acquired (e.g. shared, update, exclusive)? AND while in transaction scope does Isolation Level of transaction has impact on lock type if ROWLOCK hint is being used.
Lastly, If some could give me sample so i could test and experience all above scenarios my self (e.g. dot net code or sql script)
Thanks
Mubashar

No. It locks as it sees fit and escalates locks as needed
Let the DB engine manage it
See point 2
See point 2
I'd only use lock hints if you want specific and certain behaviours eg queues or non-blocking (dirty) reads.
More generally, why do you think the DB engine can't do what you want by default?

The default locking is row locks not page locks, although the way in which the locking mechanism works means you will be placing locks on all the objects within the hierarchy e.g. reading a single row will place a shared lock on the table, a shared lock on the page and then a shared lock on the row.
This enables an action requesting an exclusive lock on the table to know it may not take it yet, since there is a shared lock present (otherwise it would have to check every page / row for locks.)
If you issue too many locks for an individual query however, it performs lock escalation which reduces the granularity of the lock - so that is it managing less locks.
This can be turned off using a trace flag but I wouldn't consider it.
Until you know you actually have a locking / lock escalation issue you risk prematurely optimizing a non-existant problem.

Default SQL Server IsolationLevel Changes

we have a customer that's been experiencing some blocking issues with our database application. We asked them to run a Blocked Process Report trace and the trace they gave us shows blocking occurring between a SELECT and UPDATE operation. The trace files show the following:
The same SELECT query is being executed at different isolation levels. One trace shows a Serializable IsolationLevel while a later trace shows a RepeatableRead IsolationLevel. We do not use an explicit transaction while executing the query.
The UPDATE query is being executed with a RepeatableRead isolation level but is being blocked by the SELECT query. This is expected as our updates are wrapped in an explicit transaction with IsolationLevel of RepeatableRead.
So basically we're at a loss as to why the Isolation Level of the SELECT query would not be the default ReadCommitted IsolationLevel but, even more confusingly, why the IsolationLevel of the query would change over time? It is only one customer that is seeing this behaviour so we suspect it may be a database configuration issue.
Any ideas?
Thanks in advance,
Graham

In your scenario, I would recommend explicitly setting isolation level to snapshot - that will prevent read from getting in the way of writes (inserts and updates) by preventing locks, yet those read would still be "good" reads (i.e. not dirty data - it is not the same as a NOLOCK)
Generally i find that where i have locking issues with my queries, i manually control the lock applied. e.g. i would do updates with row-level locks to avoid page/table level locking, and set my reads to readpast (accepting that i may miss some data, in some scenarios that might be ok)
link|edit|delete|flag
EDIT-- Combining all the comments into the answer
As part of the optimisation process, sql server avoids getting commited reads on a page that it know hasn't changed, and automatically falls back to a lesser locking strategy. In your case, sql server drops from a serializable read to a repeatable read.
Q: Thanks for that useful info regarding dropping Isolation Levels. Can you think of any reason that it would use Serializable IsolationLevel in the first place, given that we don't use an explicit transaction for the SELECT - it was our understanding that the implicit transaction would use ReadCommitted?
A: By default, SQL Server will use Read Commmited if that is your default isolation level BUT if you do not additionally specify a locking strategy in your query, you are basically saying to sql server "do what you think is best, but my preference is Read Commited". Since SQL Server is free to choose, so it does in order to optimise the query. (The optimisation algorithm in sql server is very complex and i do not fully understand it myself). Not explicitly executing within a transaction does not, afaik, affect the isolation level that sql server uses.
Q: One last thing, does it seem reasonable that SQL Server would increase the Isolation Level (and presumably the number of locks required) to optimise the query? I'm also wondering whether the reuse of a pooled connection would affect this if it inherited the last used Isolation Level?
A: Sql server will do that as part of a process called "Lock Escalation". From http://support.microsoft.com/kb/323630, i quote: "Microsoft SQL Server dynamically determines when to perform lock escalation. When making this decision, SQL Server takes into account the number of locks that are held on a particular scan, the number of locks that are held by the whole transaction, and the memory that is being used for locks in the system as a whole. Typically, SQL Server's default behavior results in lock escalation occurring only at those points where it would improve performance or when you must reduce excessive system lock memory to a more reasonable level. However, some application or query designs may trigger lock escalation at a time when it is not desirable, and the escalated table lock may block other users".
Although lock escalation is not exactly the same thing as changing the isolation level a query runs under, this surprises me because i would not have expected sql server to take more locks than what the default isolation level permits.

More info regarding why SQL would take more locks by escalating: this is incorrect, escalating reduces (not increases) the number of locks required. A table lock is a single lock vs. all the page or row locks required to do the same from a lower level. Lock escalation is always done for one reason: it's more efficient to take a higher level lock than to lock all the lower-level objects
For example, perhaps there is no index available to lock efficiently against. I.e. if you take a count with UPDLOCK on all records with a year of 2010 in a field, and there is no index on that date field, this will require a row lock on each record in 2010, which is not efficient if many records are hit, and a page lock will not help either since they are presumably distributed randomly across pages, therefore SQL takes a table lock. Moreover, SQL MUST also lock other records from changing to being in the year 2010 while the UPDLOCK is held, and with no index on this field to do a range lock, SQL has NO CHOICE but to take a table lock to prevent this from happening. This latter point is one often missed by those new to optimization: the realization that SQL must also "protect" the integrity of the queries already executed in the transaction.

What are the problems of using transactions in a database?

From this post. One obvious problem is scalability/performance. What are the other problems that transactions use will provoke?
Could you say there are two sets of problems, one for long running transactions and one for short running ones? If yes, how would you define them?
EDIT: Deadlock is another problem, but data inconsistency might be worse, depending on the application domain. Assuming a transaction-worthy domain (banking, to use the canonical example), deadlock possibility is more like a cost to pay for ensuring data consistency, rather than a problem with transactions use, or you would disagree? If so, what other solutions would you use to ensure data consistency which are deadlock free?

It depends a lot on the transactional implementation inside your database and may also depend on the transaction isolation level you use. I'm assuming "repeatable read" or higher here. Holding transactions open for a long time (even ones which haven't modified anything) forces the database to hold on to deleted or updated rows of frequently-changing tables (just in case you decide to read them) which could otherwise be thrown away.
Also, rolling back transactions can be really expensive. I know that in MySQL's InnoDB engine, rolling back a big transaction can take FAR longer than committing it (we've seen a rollback take 30 minutes).
Another problem is to do with database connection state. In a distributed, fault-tolerant application, you can't ever really know what state a database connection is in. Stateful database connections can't be maintained easily as they could fail at any moment (the application needs to remember what it was in the middle of doing it and redo it). Stateless ones can just be reconnected and have the (atomic) command re-issued without (in most cases) breaking state.

You can get deadlocks even without using explicit transactions. For one thing, most relational databases will apply an implicit transaction to each statement you execute.
Deadlocks are fundamentally caused by acquiring multiple locks, and any activity that involves acquiring more than one lock can deadlock with any other activity that involves acquiring at least two of the same locks as the first activity. In a database transaction, some of the acquired locks may be held longer than they would otherwise be held -- to the end of the transaction, in fact. The longer locks are held, the greater the chance for a deadlock. This is why a longer-running transaction has a greater chance of deadlock than a shorter one.

One issue with transactions is that it's possible (unlikely, but possible) to get deadlocks in the DB. You do have to understand how your database works, locks, transacts, etc in order to debug these interesting/frustrating problems.
-Adam

I think the major issue is at the design level. At what level or levels within my application do I utilise transactions.
For example I could:
Create transactions within stored procedures,
Use the data access API (ADO.NET) to control transactions
Use some form of implicit rollback higher in the application
A distributed transaction in (via DTC / COM+).
Using more then one of these levels in the same application often seems to create performance and/or data integrity issues.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas