Combining code that relies on different transaction isolation levels in Postgres

I have two functions which both require a transaction. One calls the other. I have code that can nest such transactions into a single one using SAVEPOINT.
If they have the same transaction isolation level there is no problem. Now, if they do not, is there still a way I could 'correctly' combine the transactions?
What would be the risk, other than decreased performance, if I ran both transactions under the more restrictive isolation level of the two?

In this situation, yes: generally you can combine the transactions and run them at the more restrictive isolation level.
The main risk is that the higher isolation level will raise more serialisation errors (i.e. ERROR: could not serialize access due to concurrent update in REPEATABLE READ and ERROR: could not serialize access due to read/write dependencies among transactions in SERIALIZABLE). The typical way to handle these serialisation failures is to retry the transaction, but you should verify whether this makes sense within the context of your application.
Another possible error is a deadlock. Postgres detects deadlocks and breaks them by aborting one of the transactions involved (which should then be retried), but if you can, you should write your application so that deadlocks can't occur in the first place. The main technique for avoiding deadlocks is to make sure that every application that acquires locks (implicit or explicit) acquires them in a consistent order.
You may need to take special care if your application makes requests to an external service: verify whether a retry would cause unwanted duplicate requests, especially if those requests are not idempotent.
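The retry advice above can be sketched as a small wrapper. This is a minimal sketch, not a definitive implementation: SerializationFailure, run_with_retry, and the backoff values are hypothetical names introduced for illustration. With a real driver you would catch whatever exception it maps to SQLSTATE 40001 (serialization_failure) or 40P01 (deadlock_detected), and roll back before retrying.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001 or 40P01."""


def run_with_retry(txn_body, max_attempts=5, base_delay=0.01):
    """Run txn_body() and retry it from the start on a serialization failure.

    txn_body must be safe to re-run as a whole: it should begin, do its work,
    and commit, with no non-idempotent external side effects inside it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            # Randomized exponential backoff so competing retries spread out.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_transaction():
        # Simulated transaction that fails twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise SerializationFailure("could not serialize access")
        return "committed"

    print(run_with_retry(flaky_transaction), "after", calls["n"], "attempts")
```

Retrying the whole outer transaction, rather than only the inner function, matters here: once the nested code runs inside a SAVEPOINT, a serialization failure invalidates the snapshot of the enclosing transaction as well.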

Related

Concurrent issues in SQL Server

I have a set of validations that decide whether a record is inserted into the database with a valid status code. The issue we are facing is that many users make requests at the same time: in the middle of one transaction another transaction arrives, and both records get inserted with a valid status, which shouldn't happen. The second one should return an error saying that the record already exists. That could easily be handled by a simple query, but in specific scenarios we allow duplicates to be inserted. I have tried sp_getapplock, which solves my problem but compromises performance big time. Are there any better ways to handle concurrent requests?
Thanks.

sp_getapplock is pretty much the beefiest and most arbitrary lock you can take. It functions more like the lock keyword does in OO programming. Basically you name a resource, give it an owner (session or transaction), then lock it. Pretty much nothing can bypass that lock, which is why it has solved your race conditions. It's also probably mad overkill for what you're trying to do.
The first code/architecture idea that comes to mind is to restructure this table. I'm going to assume you have high update volumes or you wouldn't be running into these violations. You could simply use a try/catch block, and have the catch block retry on a PK violation. Clumsy, but might just do the trick.
Next, you could consider altering the structure of the table which receives this stream of updates throughout the day. Make this table primary keyed off an identity column, and pretty much nothing else. Inserts will be lightning fast, so any blockage will be negligible. You can then move this data in batches into a table better suited for batch processing (as opposed to trying to batch-process in real time)
There is also a whole range of transaction isolation settings that adjust SQL Server's regular locking behaviour, either at the batch level or inline via query hints. I'd read up on those; in particular, you might consider looking at SERIALIZABLE isolation. The various settings enforce different runtime rules to fit your needs.
Also be sure to check your transactions. You probably want to be locking the hell out of this table (and potentially during some other action) but once that need is gone, so should the lock.
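The try/catch suggestion can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, and the requests table, its columns, and insert_once are hypothetical. The point is that a unique constraint lets the database arbitrate the race, so the application reacts to the violation instead of checking first and inserting second.

```python
import sqlite3


def insert_once(conn, request_key, status):
    """Insert a row, returning False if a row with the same key already exists."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO requests (request_key, status) VALUES (?, ?)",
                (request_key, status),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a duplicate; report it to the caller.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (request_key TEXT PRIMARY KEY, status TEXT NOT NULL)"
)
print(insert_once(conn, "order-42", "valid"))  # first insert succeeds
print(insert_once(conn, "order-42", "valid"))  # duplicate is rejected
```

For the scenarios where duplicates are allowed, the key would have to include whatever distinguishes a permitted duplicate from a forbidden one; that part depends on rules the question does not spell out.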

How to know when a transaction scheme is serializable?

I'm studying SQL and need to know whether a certain transaction scheme is serializable. I understand the method of determining this is making a graph with the transactions as nodes and direction between the nodes and if the graph is cyclic then the scheme is not serializable. But what does it mean and what determines whether there is a directed edge in the graph from one transaction to the other? Is serialization in this case the same kind of serialization as writing objects to disk?
Thanks for any insight

Transaction serialization has nothing to do with object serialization. The serializable transaction isolation level, when fully implemented, ensures that the behavior of any set of concurrent serializable transactions is consistent with some serial (one-at-a-time) sequence of execution -- as though the transactions had been run one at a time. This means that if you can show that a database transaction will do the right thing when it is run alone, it will do the right thing in any mix of serializable transactions, or it will roll back with a serialization failure so that it can be retried from the start.
Serializable transaction isolation can be enforced in many ways. The most common scheme is strict two-phase locking (S2PL). This one is so common that you often see answers on SO which discuss things only in terms of this technique. There are also optimistic concurrency control (OCC), serializable snapshot isolation (SSI), and others.
PostgreSQL versions before 9.1, MS SQL Server in some configurations, and all versions of Oracle don't actually provide serializable transactions. They let you ask for them, but actually provide snapshot isolation. PostgreSQL versions starting with 9.1 use SSI when serializable transaction isolation is requested.
It's not possible to thoroughly discuss how any of these techniques work in an SO answer, but to summarize the techniques mentioned above:
Under S2PL every write within a transaction acquires a lock which cannot be shared with anything, and every read within the transaction acquires a lock which can be shared with other reads but can not be shared with a write. The read locks need to cover "gaps" in scanned indexes. Locks are held until the end of the transaction and released atomically with the work of the transaction becoming visible to other transactions. If the blocking creates a cycle, this is called a "deadlock", and one of the transactions involved in the cycle is rolled back.
Under OCC a transaction keeps track of what data it has used, without locking it. When transaction commit is requested, the transaction checks whether any other transaction modified any of its data and committed. If so, the commit request fails and the work is rolled back.
Under SSI writes block each other, but reads don't block writes and writes don't block reads. There is tracking of read-write dependencies to look for patterns of visibility which would create a cycle in the apparent order of execution. If a "dangerous structure" is found, which means that a cycle in the apparent order of execution is possible, one of the transactions involved in the possible cycle is rolled back. It is more like OCC than S2PL, but doesn't have as many rollbacks under higher contention.
Full disclosure: I teamed with Dan R.K. Ports of MIT to implement the new SSI-based serializable transactions in PostgreSQL 9.1.

Serialization means that the transactions can be executed serially, one after the other (it has nothing to do with object serialization). A schedule is serializable if, even though the operations of its transactions are interleaved, the result is the same as if the transactions had been executed one after the other in some order. If the graph is cyclic, the schedule is not serializable and there is some risk of conflict. This is where your isolation level helps decide whether the transactions should be executed serially, first one and then the other, or whether the database should try to interleave them, hoping there are no conflicts.
It's not a complete answer, but I hope this helps.
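To make the graph test concrete: in the usual textbook construction (conflict serializability), there is an edge from Ti to Tj when an operation of Ti comes before an operation of Tj in the schedule, both touch the same data item, and at least one of them is a write. The sketch below builds that graph and checks it for a cycle. The schedule encoding (tuples of transaction, operation, item) and the function names are assumptions made for this illustration.

```python
def precedence_edges(schedule):
    """Return the set of edges (Ti, Tj) of the precedence graph.

    schedule is a list of (transaction, op, item) tuples in execution order,
    where op is "r" for read or "w" for write.
    """
    edges = set()
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            conflicting = item1 == item2 and "w" in (op1, op2)
            if t1 != t2 and conflicting:
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Detect a cycle with depth-first search over the directed graph."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True  # back edge: cycle found
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))


# T1 and T2 both read x, then both write x: edges T1->T2 and T2->T1, a cycle.
lost_update = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
# T1 finishes with x before T2 touches it: only the edge T1->T2.
serial_like = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]

print(is_conflict_serializable(lost_update))
print(is_conflict_serializable(serial_like))
```

When the graph is acyclic, any topological order of its nodes gives a serial order of the transactions that is equivalent to the interleaved schedule.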

Will transactions stop other code from reading inconsistent data?

I have a stored procedure that inserts into several tables in a single transaction. I know transactions can maintain data consistency in non-concurrent situations by allowing rollbacks after errors, power failure, etc., but if other code selects from these tables before I commit the transaction, could it possibly select inconsistent data?
Basically, can you select uncommitted transactions?
If so, then how do people typically deal with this?

This depends on the ISOLATION LEVEL of the read query rather than the transaction. This can be set centrally on the connection or provided in the SELECT hint.
See:
Connection side: http://msdn.microsoft.com/en-us/library/system.data.isolationlevel.aspx
Database side: http://msdn.microsoft.com/en-us/library/ms173763.aspx

As already mentioned by Aliostad, this depends on the selected isolation level. The Wikipedia article has examples of the different common scenarios.
So yes, you can choose to get uncommitted data, but only by choice. I never did that and I have to admit that the idea seems a bit ... dangerous to me. But there are probably reasonable use cases.

Extending Aliostad's answer:
By default, other reading processes won't read data that is being changed (uncommitted data, aka "dirty reads"). This applies to all clients and drivers.
You have to override this default deliberately, either with the NOLOCK hint or by changing the isolation level, to allow "dirty reads".

SELECT FOR UPDATE for locked queries

I'm using MySQL 5.x and in my environment, I have a table with the name CALLS.
Table CALLS has a column status which takes an enum {inprogress, completed}.
I want reads/updates of the table to be row-locked, so:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET AUTOCOMMIT = 0;
SELECT amount from CALLS where callId=1213 FOR UPDATE;
COMMIT;
Basically I'm doing a FOR UPDATE even in situations where I only need to read the amount and return it. I find that this allows me to ensure that reads and updates are prevented from interfering with each other. However, I've been told this will reduce the concurrency of the app.
Is there any way to achieve the same transaction consistency without incurring the locking overhead? Thanks.

Disclaimer: MySQL is generally full of surprises, so the following could be untrue.
What you are doing doesn't make any sense to me: You are committing after the SELECT, which should break the lock. So in my opinion, your code shouldn't really incur any significant overhead; but it doesn't give you any consistency improvements, either.
In general, SELECT FOR UPDATE can be a very sound and reasonable way to ensure consistency without taking more locks than are really needed. But of course, it should only be used when needed. Maybe you should have different code paths: One (using FOR UPDATE) used when the retrieved value is used in a subsequent change-operation. And another one (not using FOR UPDATE) used when the value doesn't have to be protected from changes.

What you've implemented there--in case you weren't familiar with it--is called pessimistic locking. You're sacrificing performance for consistency, which is sometimes a valid choice. In my professional experience, I've found pessimistic locking to be far more of a hindrance than a help.
For one thing, it can lead to deadlock.
The (better imho) alternative is optimistic locking, where you make the assumption that collisions occur infrequently and you simply deal with them when they happen. You're doing your work in a transaction, so a collision shouldn't leave your data in an inconsistent state.
Here's more information on optimistic locking in a Java sense but the ideas are applicable to anything.
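A common way to implement optimistic locking is a version column that every update checks and increments. The following is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, and the calls table layout, the version column, and the helper names are hypothetical.

```python
import sqlite3


def read_call(conn, call_id):
    """Read the amount together with the version it was read at (no lock taken)."""
    return conn.execute(
        "SELECT amount, version FROM calls WHERE call_id = ?", (call_id,)
    ).fetchone()


def update_amount(conn, call_id, new_amount, expected_version):
    """Write only if nobody changed the row since expected_version was read."""
    with conn:
        cur = conn.execute(
            "UPDATE calls SET amount = ?, version = version + 1 "
            "WHERE call_id = ? AND version = ?",
            (new_amount, call_id, expected_version),
        )
    return cur.rowcount == 1  # False means a collision: re-read and retry


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (call_id INTEGER PRIMARY KEY, amount INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO calls VALUES (1213, 100, 0)")
conn.commit()

amount, version = read_call(conn, 1213)
print(update_amount(conn, 1213, amount + 10, version))  # version matched
print(update_amount(conn, 1213, amount + 20, version))  # stale version, rejected
```

Plain reads take no lock at all in this scheme; only a writer whose version is stale pays a cost, by re-reading and trying again.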

Zero SQL deadlock by design - any coding patterns?

I am encountering very infrequent yet annoying SQL deadlocks on a .NET 2.0 webapp running on top of MS SQL Server 2005. In the past, we have dealt with SQL deadlocks in a very empirical way: basically tweaking the queries until they work.
Yet I find this approach very unsatisfactory: time-consuming and unreliable. I would much prefer to follow deterministic query patterns that ensure by design that no SQL deadlock will be encountered, ever.
For example, in C# multithreaded programming, a simple design rule such as the locks must be taken following their lexicographical order ensures that no deadlock will ever happen.
Are there any SQL coding patterns guaranteed to be deadlock-proof?

Writing deadlock-proof code is really hard. Even when you access the tables in the same order you may still get deadlocks [1]. I wrote a post on my blog that elaborates on some approaches that will help you avoid and resolve deadlock situations.
If you want to ensure two statements/transactions will never deadlock you may be able to achieve it by observing which locks each statement consumes using the sp_lock system stored procedure. To do this you have to either be very fast or use an open transaction with a holdlock hint.
Notes:
Any SELECT statement that needs more than one lock at once can deadlock against an intelligently designed transaction which grabs the locks in reverse order.

Zero deadlocks is basically an incredibly costly problem in the general case because you must know all the tables/obj that you're going to read and modify for every running transaction (this includes SELECTs). The general philosophy is called ordered strict two-phase locking (not to be confused with two-phase commit) (http://en.wikipedia.org/wiki/Two_phase_locking ; even 2PL does not guarantee no deadlocks)
Very few DBMS actually implement strict 2PL because of the massive performance hit such a thing causes (there are no free lunches) while all your transactions wait around for even simple SELECT statements to be executed.
Anyway, if this is something you're really interested in, take a look at SET TRANSACTION ISOLATION LEVEL in SQL Server. You can tweak that as necessary. http://en.wikipedia.org/wiki/Isolation_level
For more info, see wikipedia on Serializability: http://en.wikipedia.org/wiki/Serializability
That said -- a great analogy is like source code revisions: check in early and often. Keep your transactions small (in # of SQL statements, # of rows modified) and quick (wall clock time helps avoid collisions with others). It may be nice and tidy to do a LOT of things in a single transaction -- and in general I agree with that philosophy -- but if you're experiencing a lot of deadlocks, you may break the trans up into smaller ones and then check their status in the application as you move along. TRAN 1 - OK Y/N? If Y, send TRAN 2 - OK Y/N? etc. etc
As an aside, in my many years of being a DBA and also a developer (of multiuser DB apps measuring thousands of concurrent users) I have never found deadlocks to be such a massive problem that I needed special cognizance of it (or to change isolation levels willy-nilly, etc).

There is no magic general-purpose solution to this problem that works in practice. You can push concurrency control to the application, but this can be very complex, especially if you need to coordinate with other programs running in separate memory spaces.
General answers to reduce deadlock opportunities:
Basic query optimization (proper index use), hotspot-avoidant design, holding transactions for the shortest possible time, etc.
When possible, set reasonable query timeouts so that if a deadlock should occur it is self-clearing after the timeout period expires.
Deadlocks in MSSQL are often due to its default read concurrency model, so it's very important not to depend on it: assume Oracle-style MVCC in all designs. Use snapshot isolation or, if possible, the READ UNCOMMITTED isolation level.

I believe the following useful read/write pattern is deadlock-proof given some constraints:
Constraints:
One table
An index or PK is used for read/write so engine does not resort to table locks.
A batch of records can be read using a single SQL where clause.
Using SQL Server terminology.
Write Cycle:
All writes within a single "Read Committed" transaction.
The first update in the transaction is to a specific, always-present record within each update group.
Multiple records may then be written in any order. (They are "protected" by the write to the first record).
Read Cycle:
The default read committed transaction level
No transaction
Read records as a single select statement.
Benefits:
Secondary write cycles are blocked at the write of first record until the first write transaction completes entirely.
Reads are blocked/queued/executed atomically between the write commits.
Achieve transaction level consistency w/o resorting to "Serializable".
I need this to work too so please comment/correct!!

As you said, always accessing tables in the same order is a very good way to avoid deadlocks. Furthermore, shorten your transactions as much as possible.
Another cool trick is to combine two SQL statements into one whenever you can. Single statements are always transactional. For example, use "UPDATE ... SELECT" or "INSERT ... SELECT", and use "@@ERROR" and "@@ROWCOUNT" instead of "SELECT COUNT" or "IF (EXISTS ...)".
Lastly, make sure that your calling code can handle deadlocks by resubmitting the query a configurable number of times. Sometimes it just happens; it's normal behaviour and your application must be able to deal with it.
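The same-order rule is easiest to see outside the database. In this sketch, plain Python threading.Lock objects stand in for row or table locks (an assumption for illustration; a DBMS acquires its locks implicitly), and transfer, worker, and the account names are hypothetical. Because every thread sorts the resources before locking them, no thread can hold the second lock while waiting for the first.

```python
import threading

balances = {"a": 100, "b": 100}
locks = {name: threading.Lock() for name in balances}


def transfer(src, dst, amount):
    """Move amount from src to dst, always locking in sorted name order.

    Assumes src != dst; the locks used here are not reentrant.
    """
    first, second = sorted((src, dst))
    with locks[first]:
        with locks[second]:
            balances[src] -= amount
            balances[dst] += amount


def worker(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


# Opposite transfer directions would risk deadlock if each thread locked
# its source first; sorted acquisition removes that possibility.
threads = [
    threading.Thread(target=worker, args=("a", "b")),
    threading.Thread(target=worker, args=("b", "a")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balances)
```

In SQL the equivalent discipline is to touch tables, and rows within a table, in one agreed order in every transaction, for example by always updating rows in ascending key order.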

In addition to consistent sequence of lock acquisition - another path is explicit use of locking and isolation hints to reduce time/resources wasted unintentionally acquiring locks such as shared-intent during read.

Something that no one has mentioned (surprisingly) is that, where SQL Server is concerned, many locking problems can be eliminated with the right set of covering indexes for a DB's query workload. Why? Because it can greatly reduce the number of bookmark lookups into a table's clustered index (assuming it's not a heap), thus reducing contention and locking.

If you have enough design control over your app, restrict your updates / inserts to specific stored procedures and remove update / insert privileges from the database roles used by the app (only explicitly allow updates through those stored procedures).
Isolate your database connections to a specific class in your app (every connection must come from this class) and specify that "query only" connections set the isolation level to "dirty read" ... the equivalent to a (nolock) on every join.
That way you isolate the activities that can cause locks (to specific stored procedures) and take "simple reads" out of the "locking loop".

Quick answer is no, there is no guaranteed technique.
I don't see how you can make any application deadlock proof in general as a design principle if it has any non-trivial throughput. If you pre-emptively lock all the resources you could potentially need in a process in the same order even if you don't end up needing them, you risk the more costly issue where the second process is waiting to acquire the first lock it needs, and your availability is impacted. And as the number of resources in your system grows, even trivial processes have to lock them all in the same order to prevent deadlocks.
The best way to solve SQL deadlock problems, like most performance and availability problems, is to look at the workload in the profiler and understand the behavior.

Not a direct answer to your question, but food for thought:
http://en.wikipedia.org/wiki/Dining_philosophers_problem
The "Dining philosophers problem" is an old thought experiment for examining the deadlock problem. Reading about it might help you find a solution to your particular circumstance.
