VIEW repeatedly deadlocked by application-side commands - sql

I have a schema-bound view (SQL Server 2008 R2) running off a set of tables maintained and updated by a front-end application. Earlier this week, after a deployment to update the application, the view suddenly gets chosen as the deadlock victim every time it's run in Prod, despite running successfully in Dev through Staging.
Running a trace and grabbing the deadlock graph showed the competing DELETE statement came from the application (it doesn't UPDATE records; instead it DELETEs and INSERTs).
Edit 1: The deadlocks are being caused by competing application-side commands holding IX-level locks. The VIEW issues S-level locks, but the competing commands continue to deadlock, with the VIEW query consistently chosen as the victim process. Setting the isolation level to 'read uncommitted' does not resolve the issue.
The VIEW outer-joins the same tables against themselves multiple times to create a linked history of records. I suspect this is what makes the VIEW too complex to evade the timing of the application's locks. It works fine on some days and then deadlocks consistently on others.
Is this simply a capacity issue, or is there a better way to build reporting structures that would remedy the deadlocking issues?

If you're getting a lot of deadlocking in the view, it may be worthwhile breaking it down into a larger number of simpler views (see the sketch below). Where a schema-bound view has an index drawn from multiple tables, it can also be particularly prone to locking issues.
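A minimal sketch of that layering in T-SQL, with made-up table and column names rather than the asker's actual schema: instead of one view that outer-joins the history table against itself several times, each layer joins only once.
-- Illustrative only: dbo.Records, RecordId, Version, ChangedOn, IsCurrent are assumptions.
CREATE VIEW dbo.vw_RecordCurrent
AS
SELECT r.RecordId, r.Version, r.ChangedOn
FROM dbo.Records AS r
WHERE r.IsCurrent = 1;
GO

CREATE VIEW dbo.vw_RecordWithPrevious
AS
SELECT cur.RecordId,
       cur.Version  AS CurrentVersion,
       prev.Version AS PreviousVersion
FROM dbo.vw_RecordCurrent AS cur
LEFT JOIN dbo.Records AS prev
       ON prev.RecordId = cur.RecordId
      AND prev.Version  = cur.Version - 1;
GO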

How to delete large data from Firebird SQL database

I have a very large database (at least for me): more than 1,000,000 records, and I need to delete all records with a timestamp lower than some value. For example:
DELETE FROM table WHERE TS < '2020-01-01';
The problem I'm facing is that after the transaction finishes (if it finishes at all), the database is unresponsive and unusable. How can I delete this many records without that problem?
I'm new to this; so far I've only worked with databases that had 1,000-10,000 rows, and the commands I used to delete records there caused no problems.
Based on your description in the question and your comments, the problem has to do with how garbage collection works in Firebird. Firebird is a so-called Multi-Version Concurrency Control (MVCC) database: each change you make to a row (record), including a deletion, creates a new version of that record, and the previous version stays available for transactions that were started before the transaction that made the change was committed.
If there are no more transactions 'interested' in a previous version of a record, that previous version becomes eligible for garbage collection. Firebird has two options for garbage collection: cooperative (supported by all server modes) and background (supported by SuperServer), plus a third, combined mode which does both (this is the default for SuperServer).
In background mode, a dedicated thread cleans up the garbage; it is signaled by active statements when they encounter garbage.
In the cooperative mode, a statement that sees garbage is also the one that has to clean it up. This can be especially costly when the statement performs a full table scan just after a large update or delete. Instead of just finding and returning rows, that statement will also rewrite database pages to get rid of that garbage.
See also the slides Garbage collection mechanism and sweep in details.
There are some possible solutions:
1. If you're using SuperServer, change the policy by setting GCPolicy in firebird.conf to background.
   The downside of this solution is that it might take longer before all garbage is collected, but the big benefit is that transactions are not slowed down by doing garbage collection work.
2. After committing a transaction that produced a lot of garbage, execute a statement that performs a full table scan (e.g. select count(*) from table) to trigger garbage collection, using a separate worker thread so it doesn't block the rest of your process.
   This option only really works if there are no active transactions that are still interested in those old record versions.
3. Create a backup of the database (there is no need to restore, except to verify that the backup worked correctly).
   By default (unless you specify the -g option to disable garbage collection), the gbak tool performs garbage collection during a backup. This has the same restriction as option 2, as it works because gbak does the equivalent of a select * from table.
4. Perform a 'sweep' of the database using gfix -sweep.
   This has similar restrictions to the previous two options.
5. For connections that cannot incur the slowdown of garbage collection, specify the connection option isc_dpb_no_garbage_collect (details vary between drivers and connection libraries).
   If you specify this for all connections, and your policy is cooperative (either because it is configured that way, or because you're using the Classic or SuperClassic server mode), then no garbage collection will take place at all, which can cause an eventual slowdown as well, because the engine will have to scan ever longer chains of record versions. This can be mitigated by using the previous two options to perform a garbage collection.
6. Instead of really deleting records, introduce a soft delete in your application to mark records as deleted rather than actually deleting them (see the sketch after this list).
   Either keep those records permanently, or really delete them at a later time, for example with a scheduled job running at a time when the database is not under load, combined with one of the previous options to trigger a garbage collection.
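A minimal sketch of the soft-delete idea from option 6 (the table name and DELETED_AT column are examples, not from the original question; only TS comes from the question):
-- Add a marker column once:
ALTER TABLE EVENTS ADD DELETED_AT TIMESTAMP;

-- Mark rows instead of deleting them:
UPDATE EVENTS
SET DELETED_AT = CURRENT_TIMESTAMP
WHERE TS < '2020-01-01' AND DELETED_AT IS NULL;

-- Queries then ignore the marked rows:
SELECT * FROM EVENTS WHERE DELETED_AT IS NULL;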
Actually, background garbage collection is exactly what can cause the "unresponsive database" behaviour, because of high contention between the garbage collector and the worker threads. Cooperative GC may slow down operations, but it keeps the database responsive. At least this is the case for version 2.5.
Another cause is bad indexes that contain a lot of duplicates. Such indexes are often useless for queries and should simply be dropped. If that is not an option, they can be deactivated before the delete and reactivated afterwards, in separate transactions (as shown below; as a side effect, the activation will trigger a full garbage collection).
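For example (Firebird; the index name is made up), roughly:
-- Deactivate the index in its own transaction before the big delete:
ALTER INDEX IDX_EVENTS_TS INACTIVE;
COMMIT;

-- ... run the large DELETE and commit it ...

-- Reactivating rebuilds the index; per the note above, this also
-- triggers a full garbage collection as a side effect:
ALTER INDEX IDX_EVENTS_TS ACTIVE;
COMMIT;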
But of course the best option is to keep all the data. A million records is not that much for a well-designed database on decent hardware.

Can a deadlock in one database affect other databases or hang the entire server?

This sounds pretty strange, but a deadlock is occurring in one database out of the bunch we've got on a server (we can tell by the output in the log), and it seems to be affecting other databases, since the system hangs when the deadlock occurs.
We've identified the objects involved in the deadlock event, but none of them live in the databases used by the system we are working with.
I still need to look at the procedure bodies, but is this possible: processes from other databases entering a deadlock and hanging the entire server or other databases?
A deadlock is not a fatal event in MS SQL Server (unlike, e.g., in application code). This is because SQL Server periodically scans for deadlocks and then picks one of the processes to kill. That's when you get the log messages.
Absent a Sql Server bug (which I've never encountered), I'd think it's more likely that the order is reversed - the hung server/database prevents normal execution of queries, resulting in deadlocks as procedures take longer to execute.
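As an aside, if the default log messages don't tell you enough, trace flag 1222 (SQL Server 2005 and later) makes the deadlock monitor write the full deadlock graph to the error log when it picks a victim:
-- Write detailed deadlock information to the SQL Server error log
-- for all sessions until the instance is restarted:
DBCC TRACEON (1222, -1);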
I have seen this happen when two processes that are in a deadlock also have objects locked in TempDB.
The locked objects in tempdb then stop other processes from being able to create objects and thus hang.
This was an issue on older versions of SQL Server (2000), but I can't recall seeing it on more recent versions.
It is possible. If your server can't react to an interrupt, it can't execute other requests (correctly).

sql deadlocking and timing out almost constantly

Looks like today is going to be another rubbish one. We recently updated our SQL box to a complete monster with loads of cores and RAM; however, we are stuck with our old DB schema, which is crapola. Our old SQL box had problems, but nothing like what we are experiencing with the new one. On the day of rollout it was running super fast, yet within a week it's a complete mess...
Our .NET app, used by a couple of hundred people or so, is generating a huge number of deadlocks and timeouts on the SQL box, and we are struggling to work out why. We have checked all the indexes and they are as good as they can be right now. Some of the major tables are way too wide and have a stupid number of triggers on them, but there is nothing we can do about that now.
A lot of the PIDs seem to be the same for the same users, who are retrying multiple times. So, for instance:
User: user1 Time: 09:21 Error Message: Transaction (Process ID 76) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
User: user1 Time: 09:22 Error Message: Transaction (Process ID 76) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
etc. When we moved the DB to the new box, it was backed up from the old one and restored to the new one...
If anyone has any suggestions as to something we can do, I will buy them multiple pints.
thanks
nat
Deadlocks don't necessarily need high load to occur. They tend to be the byproduct of design issues in terms of which processes are locking data in which order, for how long, etc.
There are some useful features of the SQL Profiler (2008 article here) to help you track and analyse deadlocks. I'd recommend this as the best starting point. If you're lucky, you'll find that there are just one or two culprits where you can readily, for instance, remove transactions or reduce their longevity to alleviate the situation.
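If leaving Profiler running is awkward, a rough alternative on SQL Server 2008 is to read recent deadlock graphs back out of the built-in system_health Extended Events session; a sketch of the usual query:
SELECT xed.value('@timestamp', 'datetime') AS deadlock_time,
       xed.query('.')                      AS deadlock_graph
FROM (
    SELECT CAST(st.target_data AS XML) AS target_data
    FROM sys.dm_xe_session_targets AS st
    JOIN sys.dm_xe_sessions AS s
      ON s.address = st.event_session_address
    WHERE s.name = 'system_health'
      AND st.target_name = 'ring_buffer'
) AS src
CROSS APPLY src.target_data.nodes(
    'RingBufferTarget/event[@name="xml_deadlock_report"]') AS x(xed);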

Firebird backup restore is frustrating, is there a way to avoid it?

I am using Firebird, but lately the database has been growing really seriously.
There are a lot of DELETE statements running, as well as UPDATEs/INSERTs, and the database file size grows really fast.
After deleting tons of records the database size doesn't decrease and, even worse, I have the feeling that queries are actually getting a bit slower.
In order to fix this, a daily backup/restore process has been put in place, but because of the time it takes to complete, I have to say that it makes Firebird really frustrating to use.
Any ideas on workarounds or solutions for this will be welcome.
I am also considering switching to InterBase because I heard from a friend that it does not have this issue - is that so?
We have a lot of huge databases on Firebird in production and have never had an issue with database growth. Yes, every time a record is deleted or updated, an old version of it is kept in the file, but sooner or later the garbage collector will sweep it away. Once both processes balance each other out, the database file grows only by the size of new data and indices.
As a general precaution against enormous database growth, try to make your transactions as short as possible. In our applications we use one READ ONLY transaction for reading all the data; this transaction stays open for the whole application lifetime (see the sketch below). For every batch of INSERT/UPDATE/DELETE statements we use short, separate transactions.
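For reference, this is roughly what such a transaction looks like when started by hand in isql (in application code you would set the equivalent transaction parameters through your driver); a read-only READ COMMITTED transaction can stay open for a long time without holding up garbage collection:
-- Long-lived transaction used only for reading:
SET TRANSACTION READ ONLY ISOLATION LEVEL READ COMMITTED RECORD_VERSION;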
Slowdowns of database operations can also result from obsolete index statistics. Here you can find an example of how to recalculate statistics for all indices: http://www.firebirdfaq.org/faq167/
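Recalculating statistics is plain DSQL; a sketch (the index name is an example, and the second query just generates the commands for every user index, which is essentially what the FAQ above does):
SET STATISTICS INDEX IDX_EVENTS_TS;

-- Generate the statement for every user-defined index:
SELECT 'SET STATISTICS INDEX ' || TRIM(RDB$INDEX_NAME) || ';'
FROM RDB$INDICES
WHERE COALESCE(RDB$SYSTEM_FLAG, 0) = 0;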
Check whether you have unfinished transactions in your applications. If a transaction is started but never committed or rolled back, the database has to keep its own revision of records for every transaction after the oldest active one.
You can check the database statistics (with gstat or an external tool); they show the oldest transaction and the next transaction. If the difference between those numbers keeps growing, you have a stuck-transaction problem.
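If you are on Firebird 2.1 or later, the same counters gstat prints are also exposed through the monitoring tables, so you can watch them from any SQL client:
SELECT MON$OLDEST_TRANSACTION,
       MON$OLDEST_ACTIVE,
       MON$NEXT_TRANSACTION
FROM MON$DATABASE;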
There are also monitoring tools to check the situation; one I've used is Sinatica Monitor for Firebird.
Edit: Also, the database file never shrinks automatically. Parts of it get marked as unused (after a sweep operation) and will be reused. http://www.firebirdfaq.org/faq41/
The space occupied by deleted records will be re-used as soon as it is garbage collected by Firebird.
If GC is not happening (transaction problems?), the DB will keep growing until GC can do its job.
Also, there is a problem when you do a massive delete in a table (e.g. millions of records): the next SELECT on that table will "trigger" the garbage collection, and performance will drop until GC finishes. The only way to work around this is to do massive deletes at a time when the server is not heavily used, and to run a sweep after that, making sure there are no stuck transactions.
Also, keep in mind that if you are using "standard" tables to hold temporary data (i.e. info is inserted and deleted over and over), you can end up with a corrupted database in some circumstances. I strongly suggest you start using the Global Temporary Tables feature.
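A minimal example of a global temporary table (Firebird 2.1+; the names are illustrative) whose rows disappear at commit and are kept outside the main database file, so they never need to be deleted or garbage collected there:
CREATE GLOBAL TEMPORARY TABLE TMP_IMPORT (
    ID      INTEGER NOT NULL,
    PAYLOAD VARCHAR(200)
) ON COMMIT DELETE ROWS;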

ORM Support for Handling Deadlocks

Do you know of any ORM tool that offers deadlock recovery? I know deadlocks are a bad thing, but sometimes any system will suffer from them given enough load. In SQL Server, the deadlock message says "Rerun the transaction", so I would expect that rerunning a deadlocked statement is a desirable feature in ORMs.
I don't know of any special ORM tool support for automatically rerunning transactions that failed because of deadlocks. However, I don't think an ORM makes dealing with locking/deadlocking issues very different. First, you should analyze the root cause of your deadlocks, then redesign your transactions and queries in a way that avoids, or at least reduces, deadlocks. There are lots of options for improvement, like choosing the right isolation level for (parts of) your transactions, using lock hints, etc. This depends much more on your database system than on your ORM. Of course it helps if your ORM allows you to use stored procedures for some fine-tuned commands, etc.
If this doesn't help you avoid deadlocks completely, or you don't have the time to implement and test the real fix now, you could of course simply place a try/catch around your save/commit/persist (or whatever) call, check the caught exceptions to see whether they indicate that the failed transaction was a "deadlock victim", and then simply call save/commit/persist again after sleeping for a few seconds. Waiting a few seconds is a good idea, since deadlocks are often an indication of a temporary peak of transactions competing for the same resources, and rerunning the same transaction quickly again and again would probably make things even worse.
For the same reason you probably want to make sure that you only try to rerun the same transaction once.
In a real-world scenario we once implemented this kind of workaround, and about 80% of the "deadlock victims" succeeded on the second go. But I strongly recommend digging deeper to fix the actual cause of the deadlocking, because these problems usually increase exponentially with the number of users. Hope that helps.
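Since exception types and retry plumbing differ per ORM, here is the same pattern sketched in plain T-SQL instead (error 1205 is the "deadlock victim" error); the application-side version is just a try/catch around save/commit doing the same thing:
DECLARE @retries INT;
SET @retries = 1;   -- retry the failed transaction at most once

WHILE @retries >= 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- ... the work that occasionally gets picked as a deadlock victim ...
        COMMIT TRANSACTION;
        BREAK;   -- success, leave the loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205 AND @retries > 0   -- 1205 = deadlock victim
        BEGIN
            SET @retries = @retries - 1;
            WAITFOR DELAY '00:00:05';   -- back off a few seconds before retrying
        END
        ELSE
        BEGIN
            RAISERROR('Giving up after deadlock retry.', 16, 1);
            BREAK;
        END
    END CATCH
END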
Deadlocks are to be expected, and SQL Server seems to be worse off on this front than other database servers. First, you should try to minimize your deadlocks. Try using the SQL Server Profiler to figure out why they are happening and what you can do about them. Next, configure your ORM not to read after making an update in the same transaction, if possible. Finally, after you've done that, if you happen to use Spring and Hibernate together, you can put in an interceptor to watch for this situation. Extend MethodInterceptor and place it in your Spring bean under interceptorNames. When the interceptor is run, use invocation.proceed() to execute the transaction, catch any exceptions, and define the number of times you want to retry.
An O/R mapper can't detect this, as the deadlock always occurs inside the DBMS, and could be caused by locks set by other threads or even other applications.
To be sure a piece of code doesn't create a deadlock, always use these rules:
- Do fetching outside the transaction: first fetch, then perform the processing, then perform the DML statements like INSERT, DELETE and UPDATE.
- Every action inside a method, or series of methods, that contains or works with a transaction has to use the same connection to the database. This is required because, for example, write locks are ignored by statements executed over the same connection (as that same connection set the locks ;)).
Often, deadlocks occur because code either fetches data inside a transaction in a way that opens a NEW connection (which then has to wait for locks), or uses different connections for the statements within one transaction.
I had a quick look (no doubt you have too) and couldn't find anything suggesting that hibernate at least offers this. This is probably because ORMs consider this outside of the scope of the problem they are trying to solve.
If you are having issues with deadlocks certainly follow some of the suggestions posted here to try and resolve them. After that you just need to make sure all your database access code gets wrapped with something which can detect a deadlock and retry the transaction.
One system I worked on was based on "commands" that were committed to the database when the user pressed Save. It worked like this:
While (true)
    Start a database transaction
    Foreach command to process
        Read the data the command needs into objects
        Update the objects by calling the command.Run method
    EndForeach
    Save the objects to the database
    If no deadlock occurred
        Commit the database transaction
        We are done
    Else
        Abort the database transaction
        Log the deadlock and try again
    EndIf
EndWhile
You may be able to do something like this with any ORM; we used an in-house data access system, as ORMs were too new at the time.
We ran the commands outside of a transaction while the user was interacting with the system, then reran them as above (when the user did a "save") to cope with changes other people had made. As we already had a good idea of the rows each command would change, we could even use locking hints or "select for update" to take out all the write locks we needed at the start of the transaction (see the sketch below). (We sorted the set of rows to be updated to reduce the number of deadlocks even more.)
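A rough T-SQL illustration of taking the write locks up front (the table, column and key values are invented): lock the rows the command will touch, in a consistent key order, so competing transactions queue behind each other instead of deadlocking.
BEGIN TRANSACTION;

SELECT Id
FROM dbo.Orders WITH (UPDLOCK, ROWLOCK)
WHERE Id IN (42, 97, 105)   -- the rows this command is going to change
ORDER BY Id;                -- consistent ordering across transactions

-- ... run the command's UPDATE / DELETE / INSERT statements here ...

COMMIT TRANSACTION;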