I am trying to understand how latches work in databases. I am trying to build a concurrent btree with latch crabbing / coupling techniques. Lock coupling guarantees isolation of single latch operations (inserts, deletes, and scans). But each SQL command may require multiple latches to be acquired. In between two latch operations for the same command, how is it guaranteed that there isn't another latch operation performed in between the two btree operations from the first command?
It sounds like you're mixing the roles of locks and latches. Locks are a "high level" database feature which are associated with transactions, and latches are a "low level" feature that only needs to provide short-term mutual exclusion between threads. What a database calls a latch, most concurrency APIs call a lock (or mutex).
A concurrent b-tree performs latch coupling, not lock coupling. An SQL command might acquire multiple locks, and the locks are held for the duration of the transaction. Latches are released as soon as possible, and they don't need to be tracked by the transaction.
Locks guard logical records, like rows, tables, etc. Latches only guard b-tree nodes. When thinking about b-tree concurrency, just think in terms of latches and don't think about locks or transactions.
When an SQL command performs an operation like an insert, it might first acquire a lock to prevent concurrent modifications to the row. Exactly how it does it, and what kind of lock it acquires varies by database. Note that lock acquisition doesn't need to interact with the b-tree at all. Locks can be managed by a separate hashtable.
When the insert writes into the b-tree, it latches the root node, finds the relevant child, and then latches that child node. Once the child's latch is held, the parent node's latch is released. This is latch coupling. The descent repeats until the target leaf node is found, the record is inserted into it, and the latch is released. If the node needs to split, this creates special problems, because latch coupling cannot go in reverse (child to parent) without risking deadlocks. The b-link tree design solves this problem.
As long as all b-tree operations (including reads) follow the proper root-to-child latch coupling strategy, multiple threads can safely interact with the b-tree. For improved concurrency, a mix of shared/exclusive latches is required. Read-only operations can use shared latches, but write operations generally require exclusive latches. Shared latches can be used for write operations too, but only when searching through the parent nodes. Splits and merges require special attention. Again, b-link trees solve this problem.
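To make the coupling order concrete, here is a minimal C# sketch of a read-only descent, assuming a simplified Node type with one ReaderWriterLockSlim per node. The Node and ChildFor names are illustrative only, not from any particular implementation:

using System.Threading;

class Node
{
    public readonly ReaderWriterLockSlim Latch = new ReaderWriterLockSlim();
    public Node[] Children = new Node[0];     // empty for leaf nodes
    public bool IsLeaf => Children.Length == 0;

    // Choose which child covers the search key; details omitted in this sketch.
    public Node ChildFor(int key) => Children[0];
}

static class BTreeDescent
{
    // Latch coupling: take the child's latch before releasing the parent's,
    // so no other thread can modify the link we just followed.
    public static Node FindLeaf(Node root, int key)
    {
        root.Latch.EnterReadLock();
        Node current = root;
        while (!current.IsLeaf)
        {
            Node child = current.ChildFor(key);
            child.Latch.EnterReadLock();     // shared latch is enough for a read-only descent
            current.Latch.ExitReadLock();    // only now release the parent
            current = child;
        }
        return current;                      // caller reads the leaf, then releases its latch
    }
}

A writer follows the same root-to-leaf order but takes EnterWriteLock on the nodes it may modify (or, as an optimization, shared latches on interior nodes and an exclusive latch only on the leaf when no split is possible).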
For the sake of this answer, I'm going to assume that you're using a lock-based concurrency control (LCC) protocol. This is a common protocol in databases, and it's the one that I'm most familiar with.
Lock Types
In LCC, there are two types of locks: shared locks and exclusive locks. A shared lock allows concurrent readers to access a resource, while an exclusive lock gives a single transaction sole access to it. For example, if you hold a shared lock on a database table, you can run SELECT queries against it, but you can't modify it with an INSERT; other transactions can hold shared locks on the same table at the same time. If you hold an exclusive lock on a table, you can both read and modify it, but no other transaction can acquire any lock on it until yours is released.
Transaction Types
In LCC, there are two types of transactions: read-only and read-write. A read-only transaction is one that only performs SELECT queries. A read-write transaction is one that performs SELECT queries and INSERT, UPDATE, or DELETE queries. In LCC, read-only transactions are allowed to acquire shared locks on resources, while read-write transactions are allowed to acquire both shared and exclusive locks on resources.
Lock Acquisition
In LCC, a transaction is allowed to acquire a lock on a resource only if the requested lock is compatible with the locks that other transactions already hold on that resource. Shared locks are compatible with other shared locks, but an exclusive lock is incompatible with every other lock. So, for example, a read-only transaction can acquire a shared lock on a table that other readers have already share-locked, but a read-write transaction cannot acquire an exclusive lock on that table until all of those shared locks are released.
Lock Release
In LCC (specifically two-phase locking), a transaction may only start releasing locks once it has acquired every lock it will need: after it releases its first lock, it may not acquire any new ones. In practice, most databases use strict two-phase locking, where every lock the transaction holds is released only when the transaction commits or rolls back. So, for example, a read-write transaction holding an exclusive lock on a table keeps that lock until it commits, even if it finished writing to that table long ago.
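As a rough illustration of the acquisition and release rules above, here is a minimal, hypothetical per-resource lock-table entry in C# (not any real database's implementation): shared requests are granted alongside other shared holders, exclusive requests only when nobody else holds the resource, and everything a transaction holds is dropped together at commit or rollback.

using System.Collections.Generic;

enum LockMode { Shared, Exclusive }

class LockTableEntry
{
    private readonly Dictionary<int, LockMode> _holders = new Dictionary<int, LockMode>();

    // Returns true if the requested mode is compatible with what other transactions hold.
    public bool TryAcquire(int txId, LockMode mode)
    {
        foreach (var holder in _holders)
        {
            if (holder.Key == txId) continue;                     // our own lock doesn't conflict
            if (mode == LockMode.Exclusive) return false;         // X conflicts with everything
            if (holder.Value == LockMode.Exclusive) return false; // S conflicts with X
        }
        _holders[txId] = mode;                                    // S + S is the only compatible pair
        return true;
    }

    // Strict two-phase locking: the transaction drops its lock here at commit/rollback.
    public void Release(int txId) => _holders.Remove(txId);
}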
I hope that this helps.
Related
Will a bulk insert on a master DB (only user is one job that writes to it) that takes 20 seconds cause the read replicas to also get locked for 20 seconds while the change gets propagated to them?
The only locks that get replicated with streaming replication (I guess that's what you are talking about) are ACCESS EXCLUSIVE locks, which are taken by LOCK TABLE, DROP, TRUNCATE and similar statements, but also (and that's the most frequent case) by vacuum truncation (removing empty blocks at the end of a table).
These locks will block activity on the standby. Everything else will not interfere with the activity on the standby.
The activity on the standby is limited to reading, and reads do not conflict with data modification: the documentation shows that the ACCESS SHARE lock taken by a SELECT on the table doesn't conflict with the ROW EXCLUSIVE lock taken by data modifications, so that latter lock doesn't even have to be replicated. And since SELECT doesn't take row locks at all, you can never be blocked that way.
I need to write a SQL SELECT statement that reads committed records without creating any locks on the tables.
Can someone help please...
Reading/selecting data under the default transaction isolation level doesn't lock the table, but it obtains shared locks on the resources. This means multiple users can read the same rows, all obtaining shared locks on those resources.
And when a user modifies a row, it obtains an exclusive lock on the resource. An exclusive lock means no one else can access the data while it's being modified; it is exclusively locked by that user.
Therefore, the moral of the story is: stick to the default transaction isolation level, Read Committed, and it will obtain a shared lock on the row before retrieving it, to avoid dirty reads.
The less strict isolation level Read Uncommitted, on the other hand, does not obtain any locks and can result in dirty reads.
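If you do decide to opt out of those shared locks from application code, the usual way is to lower the isolation level for that one transaction and accept the possibility of dirty reads. A C# / ADO.NET sketch, assuming SQL Server; the connection string, table and column names are placeholders:

using System.Data;
using Microsoft.Data.SqlClient;   // or System.Data.SqlClient on older stacks

class ReportReader
{
    public static void ReadWithoutSharedLocks(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            // READ UNCOMMITTED: no shared locks are taken, but dirty reads are possible.
            using (var tx = conn.BeginTransaction(IsolationLevel.ReadUncommitted))
            {
                using (var cmd = new SqlCommand("SELECT Id, Name FROM dbo.Customers", conn, tx))
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // each row may contain uncommitted ("dirty") data
                    }
                }
                tx.Commit();   // read-only, but end the transaction cleanly
            }
        }
    }
}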
You can turn on the READ_COMMITTED_SNAPSHOT database option. With that option on, row versioning instead of locking is used to provide the default READ_COMMITTED isolation behavior.
There is some cost when this option is enabled. There is an additional 14 bytes per row, plus the overhead of maintaining the row version store in tempdb. However, the overhead can be more than offset by concurrency improvements, depending on your workload. You also need to make sure applications are not coded to rely on the default locking behavior.
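Turning the option on is a one-time ALTER DATABASE statement, typically run during a maintenance window because it needs (near-)exclusive access to the database. A hedged sketch of issuing it from C#; "MyDb" is a placeholder database name:

using Microsoft.Data.SqlClient;   // or System.Data.SqlClient

class EnableRcsi
{
    public static void Enable(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            // WITH ROLLBACK IMMEDIATE kicks out other sessions so the option change can proceed.
            const string sql =
                "ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;";
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}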
Hi, I'm trying to see what's locking the database and found 2 types of locking: optimistic and pessimistic locking. I found some articles on Wiki but I would like to know more! Can someone explain those locking types to me? We should only use locking when we need exclusive access to something? Locking only happens when we use transaction?
Thanks in advance.
Kevin
Optimistic locking is no locking at all.
It works by noting the state the system was in before you started making your changes, and then going ahead and just making those changes, assuming (optimistically) that no one else will want to make conflicting updates. Just as you are about to atomically commit those changes, you would check if in the mean-time someone else has also updated the same data. In which case, your commit fails.
Subversion, for example, uses optimistic locking. When you try to commit, you have to handle any conflicts, but before that, you can do whatever you want on your working copy.
Pessimistic locking works with real locks. Assuming that there will be contention, you lock everything you want to update before touching it. Everyone else has to wait for you to commit or roll back.
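A common way to implement the optimistic variant in application code is a version (or timestamp) column: the UPDATE only succeeds if the row still has the version you read. A C# sketch, assuming a hypothetical Accounts table with Id, Balance, and Version columns:

using Microsoft.Data.SqlClient;   // or System.Data.SqlClient

class OptimisticUpdate
{
    // Returns false if someone else changed the row since we read it (version mismatch).
    public static bool TrySetBalance(SqlConnection conn, int id, decimal newBalance, int versionWeRead)
    {
        const string sql =
            "UPDATE dbo.Accounts " +
            "SET Balance = @balance, Version = Version + 1 " +
            "WHERE Id = @id AND Version = @version";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@balance", newBalance);
            cmd.Parameters.AddWithValue("@id", id);
            cmd.Parameters.AddWithValue("@version", versionWeRead);

            // 0 rows affected means the optimistic check failed: re-read the row and retry.
            return cmd.ExecuteNonQuery() == 1;
        }
    }
}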
When using a relational database with transaction support, the database usually takes care of locking internally (such as when you issue an UPDATE statement), so for normal online processing you do not need to handle this yourself. Only if you want to do maintenance work or large batches do you sometimes want to lock down tables.
We should only use locking when we need exclusive access to something?
You need it to prevent conflicting operations from other sessions. In general, this means updates. Reading data can normally go on concurrently.
Locking only happens when we use transaction?
Yes. You will accumulate locks while proceeding with your transaction, releasing all of them at the end of it. Note that a single SQL command in auto-commit mode is still a transaction by itself.
Transaction isolation levels also specify the locking behaviour. BOL (Books Online) says: Transaction isolation levels control:
Whether locks are taken when data is read, and what type of locks are requested.
How long the read locks are held.
Whether a read operation referencing rows modified by another transaction:
Blocks until the exclusive lock on the row is freed.
Retrieves the committed version of the row that existed at the time the statement or transaction started.
Reads the uncommitted data modification.
The available levels, from lowest to highest, are:
Read uncommitted (the lowest level where transactions are isolated only enough to ensure that physically corrupt data is not read)
Read committed (Database Engine default level)
Repeatable read
Serializable (the highest level, where transactions are completely isolated from one another)
I have several threads executing some SQL select queries with serializable isolation level. I am not sure which implementation to choose. This:
_repository.Select(...)
or this
lock (_lockObject)
{
    _repository.Select(...);
}
In other words, is it possible that several transactions will start executing at the same time and partially block records inside the Select operation's range?
P. S. I am using MySQL but I guess it is a more general question.
Under the serializable isolation level, transactions performing SELECT queries place a shared lock on the rows, permitting other transactions to read those rows, but preventing them from making changes to the rows (including inserting new records into the gaps).
Locking in the application does something else: it prevents other threads from entering the code block that fetches the data from the repository. This approach can lead to very bad performance for a few reasons:
If any of the rows are locked by another transaction (outside the application) via an exclusive lock, the lock in the application will not help.
Multiple threads will not be able to perform reads even on rows that are not locked in exclusive mode (i.e. not being updated).
The application lock will not be released until all the data is fetched and returned to the client. This includes the network latency and any other overhead of converting the MySQL result set into application objects.
Most importantly, enforcing data integrity and atomicity is the database's job; it knows how to handle it very well: how to detect potential deadlocks, when to take record locks, and when to add index gap locks. That is what databases are for, and MySQL (InnoDB) is ACID compliant and proven to handle these situations.
I suggest you read through Section 13.2.8, The InnoDB Transaction Model and Locking, of the MySQL docs; it will give you great insight into how locking in InnoDB is performed.
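In other words, instead of wrapping the call in lock (_lockObject), let the database serialize the readers: start the transaction at the isolation level you need and run the SELECT inside it. A minimal C# sketch against the generic ADO.NET DbConnection API (MySQL's connector exposes the same base class); the table and column names here are made up:

using System.Data;
using System.Data.Common;

class SerializableSelect
{
    public static void Run(DbConnection conn)
    {
        // InnoDB takes the shared/gap locks itself under SERIALIZABLE;
        // no application-level lock object is needed.
        using (var tx = conn.BeginTransaction(IsolationLevel.Serializable))
        {
            using (var cmd = conn.CreateCommand())
            {
                cmd.Transaction = tx;
                cmd.CommandText = "SELECT id, status FROM orders WHERE status = 'pending'";
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // map rows to objects...
                    }
                }
            }
            tx.Commit();
        }
    }
}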
I'm taking an intro class on database management systems, and had a question that wasn't answered by my book. This question is not from my homework, I was just curious.
The textbook continually stresses that a transaction is one logical unit of work. However, when coming across the shared/exclusive locking modes, I got a little confused.
There was a diagram in the book that looked like this:
Time | Transaction Status
  1  | Request Lock
  2  | Receive Lock
  3  | Process transaction
  4  | Release Lock
  5  | Lock is released
Does the transaction get processed all at the same time, or does it get processed as individual locks are obtained?
If there are commands in two transactions that result in a shared lock as well as an exclusive lock, do those transactions run concurrently, or are they scheduled one after the other?
The answer is, as usual, "it depends" :-)
Generally speaking, you don't need to take out all your locks before you begin; however, you need to take out all your locks before you release any locks.
So you can do the following:
lock resource A
update A
lock resource B
update B
unlock A
unlock B
This allows you to be a bit friendlier to other transactions that may want to read B, and don't care about A, for example. It does introduce more risk -- you may be unable to acquire a lock on B, and decide to roll back your transaction. Them's the breaks.
You also want to always acquire locks in the same order, so that you don't wind up in a deadlock (transaction 1 has A and wants B; transaction 2 has B and wants A; standoff at high noon, no one wins). If you enforce a consistent order, transaction 2 will try to get A before B, and will either wait (letting transaction 1 proceed) or fail if transaction 1 already holds A -- either way, no deadlock.
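If you ever hold several locks at once in application code, the same ordering rule applies there. A small C# sketch of the idea (the Account class and its Id field are made up for illustration): always sort the resources by some stable key before locking them.

using System.Threading;

class Account
{
    public int Id;
    public decimal Balance;
    public readonly object Lock = new object();
}

static class Transfer
{
    // Lock both accounts in Id order so two concurrent transfers
    // (A->B and B->A) can never deadlock on each other.
    public static void Move(Account from, Account to, decimal amount)
    {
        var first  = from.Id < to.Id ? from : to;
        var second = from.Id < to.Id ? to : from;

        lock (first.Lock)
        lock (second.Lock)
        {
            from.Balance -= amount;
            to.Balance   += amount;
        }
    }
}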
Things get more interesting when you have intent-to-exclude locks -- locks that are taken as shared with an "option" to make them exclusive. This might be covered somewhere in the back of your book :-)
In practice each operation acquires the needed lock before it proceeds. A SELECT will first acquire a shared lock on a row, then read the row. An UPDATE will first acquire an exclusive lock on that row, then update the row. In theory you can say that 'locks are acquired, then the transaction processes', but in real life it is each individual operation in the transaction that knows what locks are required.
If it needs an exclusive lock, it will either block the other transaction or it will wait for the other transaction to finish before obtaining the lock.
Things that need exclusive locks (UPDATE/DELETE/etc) can't happen while anything else is accessing the data.
In general, locks are determined at run time. When the BEGIN TRANSACTION command is processed, nothing has run in the transaction yet, so there are no locks. As commands execute in the transaction, locks are acquired.
"If there are commands in two transactions that result in a shared lock as well as an exclusive lock, do those transactions run concurrently, or are they scheduled one after the other?"
A lock does not consist solely of the notion "shared/exclusive". The most important thing about a lock is the resource that it applies to.
Two transactions that each hold an exclusive lock on distinct resources (say, two separate tables, or two separate partitions, or two separate pages, or two separate rows, or two separate printers, or two separate IP ports, ...) can continue to run concurrently without any problem.
Transaction serialization only becomes necessary when a transaction requests a lock on some resource, where the sharing mode of that lock is incompatible with a lock held on the same resource by some other transaction.
If your textbook really gives the sequence of events as you state, then throw it away. Lock requests emerge as the transaction is being processed, and there is no definitive and final way for the transaction processor to know at the start of the transaction which locks it will be needing (otherwise deadlocking would be a nonexistent problem).