Serializable Isolation Violation Errors in Amazon Redshift - sql

Recently we have been seeing a huge number of errors pertaining to serializable isolation violations on a table. We have some base tables that form our core data, and we extract values from these tables to run our business logic in Lambdas.
Scenario :
Lambda 1 : Runs every 15 mins and gets the latest data from the source (RDBMS) into Redshift, which forms our base tables (does a DELETE and INSERT)
Lambda 2 : Triggered after a successful run of the above Lambda; this is where the business logic lives, and it consists of normal SELECT statements
Lambda 3 : Triggered every 15 mins, also runs against the base tables, and has only SELECT statements
When Lambda 1 is triggered for its next run, at times we see that it fails with the serializable isolation violation error.
Based on most of the posts, putting a LOCK on the table might solve the issue, but it will increase the wait time for the other queries, making them run longer than expected, and due to the constraint of Lambda it will time out after 15 mins, which is not ideal. I also saw posts stating that putting a LOCK didn't entirely solve it, so I'm skeptical about using it.
Something that struck me: would creating a VIEW on top of the base table and using the view in all the SELECT statements help here? If someone has any insights on this, it would be really helpful.

So the issue is with the locks each transaction is creating and Redshift being unable to determine a correct order in which the locks can be resolved. See: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-serializable-isolation/
Now your description doesn't have enough writes in flight, as stated, to see how you are getting this from one pass of the Lambda runs. So either the description isn't complete (multiple Lambdas updating tables) or the issue is coming between runs of Lambdas. A possibility is that the transactions aren't being closed and the Lambda invocations are holding locks that cannot be resolved. Do you have COMMITs closing transactions? More info is needed to know which.
You can inspect pg_locks between Lambdas to see what locks are left around. The XID is the transaction that has the lock. I'd guess you have many more open transactions than you expect. Are your Lambda sessions in autocommit mode? Are you updating tables from multiple sessions? Are you COMMITting your changes? Are you reusing scratch tables?
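A rough sketch of that kind of inspection, assuming the pg_locks catalog view mentioned above is reachable from your session (on Redshift the analogous system views are STV_LOCKS and SVV_TRANSACTIONS):

-- List sessions/transactions still holding locks, and on which tables (PostgreSQL-style catalog).
SELECT l.pid,
       l.mode,
       l.granted,
       l.transactionid AS xid,
       c.relname       AS locked_table
FROM pg_locks l
LEFT JOIN pg_class c ON c.oid = l.relation
ORDER BY l.pid;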
Adding an explicit LOCK to serialize things can work if you know why / what tables are causing the serialization issue. This is also not likely the best solution and a better approach will likely be apparent when the issue is understood.
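For illustration only, the explicit-LOCK approach would look roughly like this inside Lambda 1's refresh transaction; base_table and staging_base_table are placeholder names, and this only makes sense once you know this is the contended table:

BEGIN;
LOCK base_table;                          -- serializes writers; readers queue behind the refresh
DELETE FROM base_table;                   -- the existing delete step
INSERT INTO base_table
SELECT * FROM staging_base_table;         -- staging_base_table is a made-up source name
COMMIT;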
Adding a view to the mix won't resolve the issue (though it might move it if it changes the timing of events).

Related

Suggestion to handle Deadlock

There are about 6 procedures which are called internally to get data from a transactional table, do aggregations on the retrieved data, format it as XML, and then send emails hourly.
During this process, a lot of logging is done, and logs are also sent as email in an HTML format (in the same email). There is one procedure where a deadlock occurs, and one section of the email (LOGS) is always missed, or we have a deadlock occurrence. So in order to handle this I am trying to use READ_COMMITTED_SNAPSHOT in that particular procedure. Can anyone please suggest whether this has worked for them, or else which is the best way to handle this kind of deadlock?
Can I retry that particular procedure internally by checking whether its output is NULL or not?
I can't let the other process fail as that is a transaction. But I need the HTML to show all the information without missing anything in the body.
EDIT: This occurs very rarely, but the frequency is increasing daily now. I am not able to understand it, as this procedure is just trying to read from the transactional table, make some calculations, and format the result into XML, while the other transaction is writing to the transactional table. So how does a WRITE affect a READ?
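For reference, READ_COMMITTED_SNAPSHOT is a database-wide option in SQL Server rather than something switched on inside one procedure; enabling it, or opting a single session into SNAPSHOT isolation instead, looks roughly like this (MyAppDb is a placeholder database name):

-- Database-level: readers see row versions instead of blocking on writers' locks.
-- (Typically requires being the only active connection when it is switched on.)
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;
-- Or, to opt one procedure/session into snapshot isolation instead:
ALTER DATABASE MyAppDb SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;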
You need to fix the deadlock in order to resolve this.
A deadlock occurs when one process holds a resource that the other requires in order to proceed and vice versa. You'll get a deadlock when you have two processes that acquire the same set of resources in different orders. For instance, if process P1 acquires resources in the following order:
Resource A
Resource B
And a competing process, P2, requires the same resources in a different order:
Resource B
Resource A
P1 starts and acquires exclusive access to Resource A.
P2 starts and acquires exclusive access to Resource B.
In order for each to continue, P1 needs access to Resource B and P2 needs access to Resource A.
Neither of them can acquire the resource they need, thus causing the deadlock.
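A concrete sketch of that interleaving, using SQL Server syntax and two made-up tables (ResourceA, ResourceB) touched from two separate sessions:

-- Session P1                                -- Session P2
BEGIN TRAN;                                  BEGIN TRAN;
UPDATE ResourceA SET v = 1 WHERE id = 1;     UPDATE ResourceB SET v = 1 WHERE id = 1;
-- each session now needs the row the other one holds:
UPDATE ResourceB SET v = 2 WHERE id = 1;     UPDATE ResourceA SET v = 2 WHERE id = 1;
-- the engine detects the cycle and rolls one session back as the victim (error 1205)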
This is different than blocking, where one process is simply waiting for another process to release the needed resource. Given sufficient time, the blocking will be resolved. In a deadlock, the blocking cannot be resolved.
The SQL Engine can (and does) detect the deadlock situation. It resolves it by selecting one process or the other as the deadlock victim and rolling it back.
Fix the deadlock by identifying the problem and resolving it, not by simply retrying and hoping it goes through. SQL Trace may help you identify the problem. You may need a DBA to help you.
A simpler (less dangerous) approach would be to change the six procedures in question so that they do dirty reads (i.e., WITH(NOLOCK)). This should work even in a deadlock, although you might get garbage data.
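A minimal sketch of what that would look like in one of the procedures; table and column names are placeholders:

-- Dirty read: takes no shared locks, so it cannot participate in this deadlock,
-- but it can return uncommitted ("garbage") rows.
SELECT t.id, t.amount, t.created_at
FROM dbo.TransactionalTable AS t WITH (NOLOCK)
WHERE t.created_at >= DATEADD(HOUR, -1, GETDATE());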

Recover from SQL batch-abort errors inside a transaction? Alternative?

I'm looking for a way to continue execution of a transaction despite errors while inserting low-priority data. It seems like real nested transactions could be a solution, but they aren't supported by SQL Server 2005/2008. Another solution would be to have logic to decide if an error is critical or not, but it would seem that's not possible either.
Here's more detail on my scenario:
Data is periodically inserted into the database using ADO.NET/C#, and while some of it is vital, some could also be missing without problems. When the inserts are done, some computations are made on the data (both vital and non-vital). This whole process is inside a transaction so everything remains in sync.
Currently, transaction save points are used, and partial rollbacks are made on exceptions which occur during non-vital inserts. However, this doesn't work for "batch-abort" errors, which automatically roll back the entire transaction. I understand some errors are critical, but things like failed casts are considered by SQL Server to be batch-abort errors. (Info on batch errors) I'm trying to prevent these errors from bringing down the whole insert if they occur on low-priority data.
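Expressed in T-SQL rather than ADO.NET, the savepoint pattern described above looks roughly like the following (table names are placeholders); a batch-aborting error such as a failed cast defeats it because the whole transaction becomes doomed:

BEGIN TRANSACTION;
INSERT INTO dbo.VitalData (id, payload) VALUES (1, N'must succeed');
SAVE TRANSACTION BeforeOptional;               -- savepoint before the low-priority insert
BEGIN TRY
    INSERT INTO dbo.OptionalData (id, payload) VALUES (1, N'nice to have');
END TRY
BEGIN CATCH
    -- Works for statement-level errors; if the error doomed the transaction
    -- (XACT_STATE() = -1), only a full rollback is possible and the final COMMIT will fail.
    IF XACT_STATE() = 1
        ROLLBACK TRANSACTION BeforeOptional;
END CATCH;
-- ... computations on whatever made it in ...
COMMIT TRANSACTION;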
If what I'm describing isn't possible, I'm willing to consider any alternative way to achieve data integrity but allow the failure of the non-vital inserts.
Thanks for your help.
Unfortunately, it can't be done as you describe (full support for nested transactions would be key here). A couple of things I can think of that have been used to get around this in the past:
The best option would probably be to separate the commands into important/non-important commands that could be executed distinctly; naturally this would require that they not be order-dependent on each other.
Could also use a messaging based approach (see Service Broker) where you would execute the primary commands inline and push the non-primary commands onto a queue for execution later/separately. The push to the queue would be transactional within the batch, but the execution of the command when you pop off the queue would be separate. This too would require they not be order-dependent on each other.
If order-dependent, you could use the messaging approach for everything, which would ensure order and could have separate messages per operation, then grouping them together (via conversation groups) would allow you to pull them off the queue in order as well and use separate transactions for each 'type' of operation (i.e. primary vs. non-primary). This would require some special coding on your part if all the grouped messages must be a single autonomous operation, but could be done.
I hesitate to even mention this option, because it is a terrible option, but for full disclosure I suppose you could consider it at your discretion if you think it fits (but it is definitely not an architecture that would apply to almost any scenario). You could use xp_cmdshell to call out to the command line and execute sqlcmd/osql for the non-critical tasks - this sqlcmd execution would be in a separate transaction from the module you are executing from, and simply ignoring the xp_cmdshell failure should allow the primary batch to continue.
Those are some ideas...
Can you do your import into a temporary location, using transactions only for the important parts? Once the temp location is loaded, having absorbed any non-critical errors, you can copy the data into its final destination in a single transaction. It depends on the nature of the work you are doing, but this is potentially a viable option; a rough sketch follows.
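A rough sketch of that staging approach, with all object names as placeholders:

-- Load into a staging table outside the critical transaction; absorb non-vital failures here.
CREATE TABLE #Staging (id INT PRIMARY KEY, payload NVARCHAR(200));
-- ... per-row or per-batch inserts into #Staging, each tolerating its own errors ...
-- Then move everything that survived in one short transaction.
BEGIN TRANSACTION;
INSERT INTO dbo.FinalDestination (id, payload)
SELECT id, payload FROM #Staging;
COMMIT TRANSACTION;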

Deadlock error in INSERT statement

We've got a web-based application. There are time-bound database operations (INSERTs and UPDATEs) in the application which take a long time to complete, so this particular flow has been moved into a Java Thread so that it will not wait (block) until the database operation is completed.
My problem is, if more than 1 user comes across this particular flow, I'm facing the following error thrown by PostgreSQL:
org.postgresql.util.PSQLException: ERROR: deadlock detected
Detail: Process 13560 waits for ShareLock on transaction 3147316424; blocked by process 13566.
Process 13566 waits for ShareLock on transaction 3147316408; blocked by process 13560.
The above error is consistently thrown in INSERT statements.
Additional Information:
1) I have PRIMARY KEY defined in this table.
2) There are FOREIGN KEY references in this table.
3) A separate database connection is passed to each Java Thread.
Technologies
Web Server: Tomcat v6.0.10
Java v1.6.0
Servlet
Database: PostgreSQL v8.2.3
Connection Management: pgpool II
One way to cope with deadlocks is to have a retry mechanism that waits for a random interval and tries to run the transaction again. The random interval is necessary so that the colliding transactions don't continuously keep bumping into each other, causing what is called a live lock - something even nastier to debug. Actually most complex applications will need such a retry mechanism sooner or later when they need to handle transaction serialization failures.
Of course if you are able to determine the cause of the deadlock it's usually much better to eliminate it or it will come back to bite you. For almost all cases, even when the deadlock condition is rare, the little bit of throughput and coding overhead to get the locks in deterministic order or get more coarse-grained locks is worth it to avoid the occasional large latency hit and the sudden performance cliff when scaling concurrency.
When you are consistently getting two INSERT statements deadlocking, it's most likely a unique index insert order issue. Try, for example, the following in two psql command windows:
Thread A           | Thread B
BEGIN;             | BEGIN;
                   | INSERT uniq=1;
INSERT uniq=2;     |
                   | INSERT uniq=2;
                   |   blocks, waiting for thread A to commit or roll back,
                   |   to see if this is a unique key error
INSERT uniq=1;     |
  blocks, waiting  |
  for thread B     |
DEADLOCK           |
(time flows downward)
Usually the best course of action to resolve this is to figure out the parent objects that guard all such transactions. Most applications have one or two primary entities, such as users or accounts, that are good candidates for this. Then all you need is for every transaction to get the locks on the primary entity it touches via SELECT ... FOR UPDATE. Or, if it touches several, get locks on all of them, but in the same order every time (ordering by primary key is a good choice).
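A small sketch of that pattern; accounts, orders and account_balances are placeholder tables, and ordering the FOR UPDATE by primary key keeps the lock order deterministic when several parent rows are involved:

BEGIN;
-- Lock the parent rows first, always in primary-key order.
SELECT id FROM accounts WHERE id IN (7, 42) ORDER BY id FOR UPDATE;
-- Now the inserts/updates that used to deadlock run under those locks.
INSERT INTO orders (account_id, total) VALUES (42, 99.50);
UPDATE account_balances SET amount = amount - 99.50 WHERE account_id = 42;
COMMIT;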
What PostgreSQL does here is covered in the documentation on Explicit Locking. The example in the "Deadlocks" section shows what you're probably doing. The part you may not have expected is that when you UPDATE something, that acquires a lock on that row that continues until the transaction involved ends. If you have multiple clients all doing updates of more than one thing at once, you'll inevitably end up with deadlocks unless you go out of your way to prevent them.
If you have multiple things that take out implicit locks like UPDATE, you should wrap the whole sequence in BEGIN/COMMIT transaction blocks, and make sure you're consistent everywhere about the order in which they acquire locks (even the implicit ones like what UPDATE grabs). If you need to update something in table A then table B, and one part of the app does A then B while the other does B then A, you're going to deadlock one day. Two UPDATEs against the same table are similarly destined to fail unless you can enforce some ordering of the two that's repeatable among clients. Sorting by primary key once you have the set of records to update and always grabbing the "lower" one first is a common strategy.
It's less likely your INSERTs are to blame here; those are much harder to get into a deadlocked situation, unless you violate a primary key as Ants already described.
What you don't want to do is try and duplicate locking in your app, which is going to turn into a giant scalability and reliability mess (and will likely still result in database deadlocks). If you can't work around this within the confines of the standard database locking methods, consider using either the advisory lock facility or explicit LOCK TABLE to enforce what you need instead. That will save you a world of painful coding over trying to push all the locks onto the client side. If you have multiple updates against a table and can't enforce the order they happen in, you have no choice but to lock the whole table while you execute them; that's the only route that doesn't introduce a potential for deadlock.
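A sketch of those two escape hatches in PostgreSQL; the table name and advisory-lock key are placeholders:

-- Coarse-grained: take a table lock for the duration of the conflicting updates.
BEGIN;
LOCK TABLE orders IN SHARE ROW EXCLUSIVE MODE;
-- ... the updates whose order can't otherwise be controlled ...
COMMIT;

-- Finer-grained: an application-defined advisory lock keyed by an arbitrary integer.
SELECT pg_advisory_lock(12345);
-- ... do the work ...
SELECT pg_advisory_unlock(12345);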
Deadlock explained:
In a nutshell, what is happening is that a particular SQL statement (INSERT or other) is waiting on another statement to release a lock on a particular part of the database before it can proceed. Until this lock is released, the first SQL statement, call it "statement A", cannot access this part of the database to do its job (= a regular lock situation). But statement A has also put a lock on another part of the database, to ensure that no other users of the database access it (for reading, or modifying/deleting, depending on the type of lock). Now the second SQL statement is itself in need of accessing the data section marked by the lock of statement A. That is a DEADLOCK: both statements will wait, ad infinitum, on one another.
The remedy...
This would require knowing the specific SQL statements these various threads are running, and looking at whether there is a way to either:
a) remove some of the locks, or change their types. For example, maybe the whole table is locked, where only a given row, or a page thereof, would be necessary.
b) prevent multiple of these queries from being submitted at a given time. This would be done by way of semaphores/locks (aka mutexes) at the level of the multi-threading logic.
Beware that the "b)" approach, if not correctly implemented, may just move the deadlock situation from within SQL to within the program/threads logic. The key would be to create only one mutex, to be obtained first by any thread which is about to run one of these deadlock-prone queries.
Your problem, probably, is that the INSERT command is trying to lock one or both indexes, and the indexes are locked by the other thread.
One common mistake is to lock resources in a different order in each thread. Check the order and try to lock the resources in the same order in all threads.

How can I get dead lock in this situation?

In my client application I have a method like this (in practice it's more complex, but I've left the main part):
public void btnUpdate_Click(...)
{
    ...
    dataAdapter.Update(...);
    ...
    dataAdapter.Fill(...); // here I got exception one time
}
The exception I found in the logs says "Deadlock found when trying to get lock; try restarting transaction". I've met this exception only once, so it hasn't been repeated.
As I understand it, the DataAdapter.Fill() method executes only a SELECT query. I don't create an explicit transaction, and I have autocommit enabled.
So how can I get a deadlock on a simple SELECT query which is not part of a bigger transaction?
As I understand it, to get a deadlock, two transactions should wait for each other. How is that possible with a single SELECT not inside a transaction? Maybe it's a bug in MySQL?
Thank you in advance.
You are right that it takes two transactions to make a deadlock. That is to say, no statement or statements within a single transaction can deadlock with other statements within the same transaction.
But it only takes one transaction to notice a report of a deadlock. How do you know that the transaction you are seeing the deadlock reported in is the only transaction being executed in the database? Isn't there other activity going on in this database?
Also, your statements "I don't make an explicit transaction" and "... which is not a part of bigger transaction" imply that you do not understand that every SQL statement executed is always in an implicit transaction, even if you do not explicitly start one.
Most databases have reporting mechanisms specifically designed to track, report and/or log instances of deadlocks for diagnostic purposes. In SQL Server there is a trace flag that causes a log entry with much detail about each deadlock that occurs, including details about each of the two transactions involved, like what SQL statements were being executed, what objects in the database were being locked, and why the lock could not be obtained. I'd guess MySQL has a similar diagnostic tool. Find out what it is and turn it on so that the next time this occurs you can look and find out exactly what happened.
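For illustration, the diagnostics referred to above look roughly like this; the SQL Server statement uses one of the documented deadlock-reporting trace flags, and on the MySQL side InnoDB keeps the latest deadlock in its status output:

-- SQL Server: write detailed deadlock reports to the error log.
DBCC TRACEON (1222, -1);
-- MySQL / InnoDB: the "LATEST DETECTED DEADLOCK" section shows both statements involved.
SHOW ENGINE INNODB STATUS;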
You can deadlock a simple SELECT against other statements, like an UPDATE. On my blog I have an example explaining a deadlock between two well-tuned statements: Read/Write deadlock. While the example is SQL Server specific, the principle is generic. I don't have enough knowledge of MySQL to claim this is necessarily the case or not, especially in light of the various engines MySQL can deploy, but nonetheless a simple SELECT can be the victim of a deadlock.
I haven't looked into how MySQL transactions work, but this is based on how MSSQL transactions work:
If you are not using a transaction, each query is a transaction by itself. Otherwise you would get a mess every time an update failed in the middle.
The reason for the deadlock might be lock escalation. The database tries to lock as little as possible for each query, so it starts out by locking only the single rows affected. When most of the rows in a page are locked by the query, it might decide that escalating the lock into locking the entire page would be better, which may have the side effect of locking some rows not otherwise affected by the query.
If a SELECT query and an UPDATE query are trying to escalate locks on the same table, they may cause a deadlock even though only a single table is involved.
I agree that this is unlikely to be the cause in this particular case, but this is supplemental to the other answers in terms of limiting their scope, recorded for posterity in case someone finds it useful.
MySQL can in rare cases have single statements periodically deadlock against themselves. This seems to happen particularly on bulk inserts, and the issue is almost certainly a deadlock between different threads relating to the operation. I would expect bulk updates to have the same problem. In the past, when faced with this sort of issue, I have generally just cut down on the number of rows being inserted (or updated) in a single statement. You won't usually get a deadlock when trying to obtain the lock in this case, but other messages.
A colleague of mine and I were discussing similar problems in MS SQL Server (so this is not unique to MySQL!) and he pointed out that the solution there is to tell the server not to parallelize the insert or update. The problems here are spinlock-related deadlocks, not logical lock deadlocks in the RDBMS.
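For the SQL Server case mentioned, "not parallelizing" typically means a degree-of-parallelism hint on the offending statement; a sketch with placeholder table names:

-- Force the bulk insert to run without intra-query parallelism.
INSERT INTO dbo.TargetTable (id, payload)
SELECT id, payload
FROM dbo.StagingTable
OPTION (MAXDOP 1);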

ORM Support for Handling Deadlocks

Do you know of any ORM tool that offers deadlock recovery? I know deadlocks are a bad thing, but sometimes any system will suffer from them given enough load. In SQL Server, the deadlock message says "Rerun the transaction", so I would suspect that rerunning a deadlocked statement is a desirable feature in ORMs.
I don't know of any special ORM tool support for automatically rerunning transactions that failed because of deadlocks. However, I don't think an ORM makes dealing with locking/deadlocking issues very different. Firstly, you should analyze the root cause of your deadlocks, then redesign your transactions and queries in a way that deadlocks are avoided or at least reduced. There are lots of options for improvement, like choosing the right isolation level for (parts of) your transactions, using lock hints, etc. This depends much more on your database system than on your ORM. Of course it helps if your ORM allows you to use stored procedures for some fine-tuned commands, etc.
If this doesn't help to avoid deadlocks completely, or you don't have the time to implement and test the real fix now, of course you could simply place a try/catch around your save/commit/persist (or whatever the call is), check caught exceptions to see whether they indicate that the failed transaction is a "deadlock victim", and then simply recall save/commit/persist after sleeping for a few seconds. Waiting a few seconds is a good idea since deadlocks are often an indication that there is a temporary peak of transactions competing for the same resources, and rerunning the same transaction quickly again and again would probably make things even worse.
For the same reason you would probably want to make sure that you only try to rerun the same transaction once.
In a real-world scenario we once implemented this kind of workaround, and about 80% of the "deadlock victims" succeeded on the second go. But I strongly recommend digging deeper to fix the actual reason for the deadlocking, because these problems usually increase exponentially with the number of users. Hope that helps.
Deadlocks are to be expected, and SQL Server seems to be worse off on this front than other database servers. First, you should try to minimize your deadlocks. Try using the SQL Server Profiler to figure out why it's happening and what you can do about it. Next, configure your ORM not to read after making an update in the same transaction, if possible. Finally, after you've done that, if you happen to use Spring and Hibernate together, you can put in an interceptor to watch for this situation. Extend MethodInterceptor and place it in your Spring bean under interceptorNames. When the interceptor is run, use invocation.proceed() to execute the transaction. Catch any exceptions, and define the number of times you want to retry.
An O/R mapper can't detect this, as the deadlock always occurs inside the DBMS, and could be caused by locks set by other threads or even other apps.
To be sure a piece of code doesn't create a deadlock, always use these rules:
- do fetching outside the transaction. So first fetch, then perform processing, then perform DML statements like INSERT, DELETE and UPDATE
- every action inside a method or series of methods which contain / work with a transaction has to use the same connection to the database. This is required because, for example, write locks are ignored by statements executed over the same connection (as that same connection set the locks ;)).
Often, deadlocks occur because code either fetches data inside a transaction in a way that causes a NEW connection to be opened (which has to wait for locks) or uses different connections for the statements in a transaction.
I had a quick look (no doubt you have too) and couldn't find anything suggesting that Hibernate, at least, offers this. This is probably because ORMs consider this outside the scope of the problem they are trying to solve.
If you are having issues with deadlocks certainly follow some of the suggestions posted here to try and resolve them. After that you just need to make sure all your database access code gets wrapped with something which can detect a deadlock and retry the transaction.
One system I worked on was based on “commands” that were committed to the database when the user pressed save. It worked like this:
While (true)
    start a database transaction
    Foreach command to process
        read the data the command needs into objects
        update the objects by calling the command.run method
    EndForeach
    save the objects to the database
    If not deadlock
        commit the database transaction
        we are done, so exit the loop
    Else
        abort the database transaction
        log the deadlock and try again
    EndIf
EndWhile
You may be able to do something like this with any ORM; we used an in-house data access system, as ORMs were too new at the time.
We ran the commands outside of a transaction while the user was interacting with the system, then reran them as above (when the user did a "save") to cope with changes other people had made. As we already had a good idea of the rows a command would change, we could even use locking hints or “select for update” to take out all the write locks we needed at the start of the transaction. (We sorted the set of rows to be updated to reduce the number of deadlocks even more.)