I was wondering what everyone's opinion was with regard to pre-checking foreign key lookups before INSERTs and UPDATEs versus letting the database handle it. As you know, the server will throw an exception if the corresponding row does not exist.
Within .NET we always try to avoid exception-driven coding, in the sense of not using raised exceptions to drive code flow. This means we attempt to detect potential errors before the runtime does.
With SQL, I see two opposing points:
1) Whether you check or not, the database always will. This means that you could be wasting (how much is subjective) CPU cycles doing the same check twice. This makes one lean towards letting only the database do it.
2) Pre-checking allows the developer to raise more informative exceptions back to the calling application. Instead of receiving the generic "foreign key violation", one could return different error codes for each check that needs to be done.
What are your thoughts?
Don't test before:
the DB engine will check anyway on INSERT (so you get two reads of the index, not one)
it won't scale without lock hints or semaphores, which reduce concurrency and performance (a second, overlapping concurrent call can pass the EXISTS check before the first call does its INSERT)
What you can do is wrap the INSERT in its own TRY/CATCH and ignore error xxxx (the foreign key violation; sorry, I don't know the number). I've mentioned this before (for unique keys, error 2627):
Only inserting a row if it's not already there
Select / Insert version of an Upsert: is there a design pattern for high concurrency?
SQL Server 2008: INSERT if not exists, maintain unique column
This scales very well to high volumes.
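A minimal sketch of the insert-and-catch approach, using Python's built-in sqlite3 module as a stand-in for SQL Server (the tables and the try_insert helper are hypothetical). SQLite raises sqlite3.IntegrityError for constraint failures and does not use SQL Server's error numbers, so the sketch matches on SQLite's message prefixes instead; in T-SQL you would test ERROR_NUMBER() inside the CATCH block.

```python
import sqlite3

# SQLite message prefixes; they play the role of SQL Server's error numbers
# (2627 for a unique key, and the FK-violation number for a missing parent).
UNIQUE_VIOLATION = "UNIQUE constraint failed"
FK_VIOLATION = "FOREIGN KEY constraint failed"


def make_db() -> sqlite3.Connection:
    """Hypothetical demo schema: orders reference customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customer(id))"
    )
    return conn


def try_insert(conn, sql, params, ignore=(UNIQUE_VIOLATION,)) -> bool:
    """Run the INSERT with no pre-check and let the engine do the single lookup.

    Returns True if the row was inserted, False if it failed with one of the
    ignorable constraint errors; any other error propagates to the caller.
    """
    try:
        with conn:  # commits on success, rolls back if the statement fails
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError as exc:
        if str(exc).startswith(ignore):
            return False
        raise
```

Inserting the same order id twice returns True and then False; passing ignore=(FK_VIOLATION,) makes a missing customer return False instead of raising.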
Data integrity maintenance is the database's job, so I would say you let the DB handle it. Raising an exception here is valid: even though it could be avoided, it is a correctly raised exception, because it means something in the code didn't work right and is sending an orphaned record for insert (or something failed in the first insert, however you are inserting it). Besides, you should have a try/catch anyway, so you can implement a meaningful way to handle this...
I don't see the benefit of pre-checking for FK violations.
If you want more informative error statements, you can simply wrap your insert in a try-catch block and return custom error messages at that point. That way you're only running the extra queries on failure rather than every time.
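A sketch of that idea, again with Python's sqlite3 standing in for the real server (the schema, the error codes and the MissingParentError class are hypothetical): the INSERT runs unchecked, and only when it fails are the parent tables queried to work out which reference was missing. The diagnosis is best-effort, since a parent row could be added or removed between the failed INSERT and the follow-up SELECTs.

```python
import sqlite3


class MissingParentError(Exception):
    """Hypothetical application error carrying a specific code per FK."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


# (column, parent table, error code) for each FK on order_line (hypothetical).
PARENTS = [
    ("customer_id", "customer", "E_NO_CUSTOMER"),
    ("product_id", "product", "E_NO_PRODUCT"),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE order_line (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customer(id), "
        "product_id INTEGER NOT NULL REFERENCES product(id))"
    )
    return conn


def add_order_line(conn, line_id, customer_id, product_id) -> None:
    values = {"customer_id": customer_id, "product_id": product_id}
    try:
        with conn:
            conn.execute(
                "INSERT INTO order_line (id, customer_id, product_id) "
                "VALUES (?, ?, ?)",
                (line_id, customer_id, product_id),
            )
    except sqlite3.IntegrityError:
        # Failure path only: find out *which* parent is missing.
        # Table names come from the fixed PARENTS list, never from user input.
        for column, table, code in PARENTS:
            row = conn.execute(
                f"SELECT 1 FROM {table} WHERE id = ?", (values[column],)
            ).fetchone()
            if row is None:
                raise MissingParentError(
                    code, f"{table} {values[column]} does not exist"
                ) from None
        raise  # not an FK problem (e.g. duplicate id): re-raise unchanged
```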
I had to remove a few thousand records from a table which had many FK constraints. I disabled constraint enforcement for all tables, deleted the records, fixed some obvious constraint errors that the DBCC check showed me, and then re-enabled the constraints. After that, the DBCC check unfortunately still shows some errors, but at this point I have no time to find out why. My question, as in the title, may be stupid, but what can happen if I leave the database with constraint errors for a while? Will the app which uses this DB be affected?
Can I defer fixing the constraints? (I use SQL Server 2008)
Thanks
You'll have data that is lacking integrity (has some invalid information). This will not make the database crash or produce errors, but how the application will react to this depends entirely on the application. You might see some error pages, or even worse, the invalid data will propagate to produce even more invalid data or actions (that could be even harder to track down and fix afterwards).
The point of these constraints is to make the database validate (or reject) the data, so that the application can rely on certain invalid patterns not occurring in the data. Many applications are built to not use these assurances and handle data integrity themselves, but if your application did depend on them, and then you pull them out from under it, that sounds a bit dangerous.
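To make "data lacking integrity" concrete, here is a small sketch using SQLite through Python's sqlite3 rather than SQL Server (table names are hypothetical). It disables foreign-key enforcement, lets an orphan row in, then re-enables enforcement: new writes are checked again, but the existing orphan stays, silently drops out of inner joins, and is only found by an explicit check (SQLite's PRAGMA foreign_key_check here, roughly the role the DBCC check plays in the question).

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    # isolation_level=None means autocommit, so the PRAGMAs below take effect
    # (SQLite ignores foreign_keys changes inside an open transaction).
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES parent(id))"
    )
    conn.execute("INSERT INTO parent (id) VALUES (1)")

    conn.execute("PRAGMA foreign_keys = OFF")         # like disabling the constraints
    conn.execute("INSERT INTO child VALUES (10, 1)")  # valid row
    conn.execute("INSERT INTO child VALUES (11, 2)")  # orphan: parent 2 doesn't exist
    conn.execute("PRAGMA foreign_keys = ON")          # existing rows are NOT re-checked
    return conn


def orphans(conn):
    """List (child table, rowid, parent table) for every row violating an FK."""
    return [
        (table, rowid, parent)
        for table, rowid, parent, _fkid in conn.execute("PRAGMA foreign_key_check")
    ]
```

An application query that joins child to parent now returns one row while the child table holds two, which is exactly the kind of silent inconsistency that can propagate.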
I'm writing some SQL code that needs to be executed when rows are inserted in a database table, so I'm using an AFTER INSERT trigger; the code is quite complex, so there could still be some bugs in it.
I've discovered that, if an error happens when executing a trigger, SQL Server aborts the batch and/or the whole transaction. This is not acceptable for me, because it causes problems for the main application that uses the database; I also don't have the source code for that application, so I can't debug it properly. I absolutely need all database actions to succeed, even if my trigger fails.
How can I code my trigger so that, should an error happen, SQL Server will not abort the INSERT action?
Additionally, how can I perform proper error handling so that I can actually know the trigger has failed? Sending an email with the error data would be ok for me (the trigger's main purpose is actually sending emails), but how do I detect an error condition in a trigger and react to it?
Edit:
Thanks for the tips about optimizing performance by using something other than a trigger, but this code is not "complex" in the sense of being long-running or performance-intensive. It simply builds and sends a mail message, but to do so it must retrieve data from various linked tables. Since I am reverse-engineering this application, I don't have the database schema available and am still finding my way around it; this is why conversion errors or unexpected/null values can still creep in and crash the trigger execution.
Also, as stated above, I absolutely can't debug the application itself, nor modify it to do what I need in the application layer; the only way to react to an application event is to fire a database trigger when the application writes to the DB that something has just happened.
If the operations in the trigger are complex and/or potentially long running, and you don't want the activity to affect the original transaction, then you need to find a way to decouple the activity.
One way might be to use Service Broker. In the trigger, just create message(s) (one per row) and send them on their way, then do the rest of the processing in the service.
If that seems too complex, the older way to do it is to insert the rows needing processing into a work/queue table, and then have a job continuously pull rows from there and do the work.
Either way, you're now not preventing the original transaction from committing.
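A sketch of the work/queue-table variant, with SQLite via Python's sqlite3 standing in for SQL Server (the table names and the send callback are hypothetical; on SQL Server the trigger would be T-SQL and the worker an Agent job or external service). The trigger only records which rows need an e-mail, so it has almost nothing that can fail; the fragile work happens later in the worker, where each queue row gets its own transaction and a failure is recorded instead of rolling back the application's INSERT.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE mail_queue (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL,
            error TEXT                  -- NULL = pending, else last failure
        );
        -- The trigger only enqueues; it does no risky processing itself.
        CREATE TRIGGER orders_enqueue AFTER INSERT ON orders
        BEGIN
            INSERT INTO mail_queue (order_id) VALUES (NEW.id);
        END;
        """
    )
    return conn


def process_queue(conn, send):
    """Worker step: handle each pending row in its own transaction.

    `send(order_id)` is a hypothetical callback that builds and sends the mail.
    Returns the (order_id, error) pairs that failed; they stay in the queue.
    """
    failures = []
    pending = conn.execute(
        "SELECT id, order_id FROM mail_queue WHERE error IS NULL ORDER BY id"
    ).fetchall()
    for queue_id, order_id in pending:
        try:
            send(order_id)
        except Exception as exc:  # log and continue: one bad row can't stop the rest
            failures.append((order_id, repr(exc)))
            conn.execute(
                "UPDATE mail_queue SET error = ? WHERE id = ?", (repr(exc), queue_id)
            )
        else:
            conn.execute("DELETE FROM mail_queue WHERE id = ?", (queue_id,))
        conn.commit()
    return failures
```

The failed rows keep their error text, which doubles as the "know the trigger has failed" signal the question asks for.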
Triggers are part of the transaction. You could wrap the trigger code in a TRY/CATCH that swallows the error, or (somewhat more professionally) one that logs the error and then swallows it, but really you should let it fail loudly and then fix the real problem, which can only be in your trigger.
If none of the above are acceptable, then you can't use a trigger.
I'm looking for a way to continue executing a transaction despite errors while inserting low-priority data. It seems like true nested transactions could be a solution, but they aren't supported by SQL Server 2005/2008. Another solution would be logic that decides whether an error is critical or not, but it would seem that's not possible either.
Here's more detail on my scenario:
Data is periodically inserted into the database using ADO.NET/C#, and while some of it is vital, some could be missing without problems. When the inserts are done, some computations are made on the data (both vital and non-vital). This whole process is inside a transaction so everything remains in sync.
Currently, transaction savepoints are used, and partial rollbacks are made on exceptions that occur during non-vital inserts. However, this doesn't work for "batch-abort" errors, which automatically roll back the entire transaction. I understand some errors are critical, but things like failed casts are considered by SQL Server to be batch-abort errors. (Info on batch errors) I'm trying to prevent these errors from bringing down the whole insert if they occur on low-priority data.
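For readers unfamiliar with the savepoint pattern described above, here is a minimal sketch using SQLite through Python's sqlite3 (the reading/extra tables and the load function are hypothetical). Each non-vital INSERT gets its own savepoint, so a failure rolls back only that row, while a failure in vital data rolls back everything. In this sketch the constraint errors only abort the failing statement, so it does not reproduce SQL Server's batch-abort behavior, which rolls back the whole transaction regardless of savepoints.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Autocommit mode so BEGIN/SAVEPOINT/COMMIT below are fully explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE reading (sensor TEXT NOT NULL, value REAL NOT NULL)")
    conn.execute("CREATE TABLE extra (sensor TEXT NOT NULL, note TEXT NOT NULL)")
    return conn


def load(conn, vital_rows, optional_rows):
    """Insert vital and optional rows in ONE transaction.

    Each optional row is wrapped in its own savepoint; a failing optional row
    is rolled back alone and reported. A failing vital row aborts everything.
    """
    skipped = []
    conn.execute("BEGIN")
    try:
        for row in vital_rows:
            conn.execute("INSERT INTO reading (sensor, value) VALUES (?, ?)", row)
        for row in optional_rows:
            conn.execute("SAVEPOINT optional_row")
            try:
                conn.execute("INSERT INTO extra (sensor, note) VALUES (?, ?)", row)
            except sqlite3.Error:
                conn.execute("ROLLBACK TO optional_row")  # undo just this row
                skipped.append(row)
            conn.execute("RELEASE optional_row")  # ROLLBACK TO leaves it open
        # ...the computations over vital and non-vital data would go here...
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return skipped
```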
If what I'm describing isn't possible, I'm willing to consider any alternative way to achieve data integrity but allow the failure of the non-vital inserts.
Thanks for your help.
Unfortunately, this can't be done as you describe (full support for nested transactions would be key here). A couple of things I can think of that have been used to get around this in the past:
The best option would probably be to separate the commands into important and non-important commands that can be executed separately; naturally, this requires that they not be order-dependent on each other.
Could also use a messaging based approach (see Service Broker) where you would execute the primary commands inline and push the non-primary commands onto a queue for execution later/separately. The push to the queue would be transactional within the batch, but the execution of the command when you pop off the queue would be separate. This too would require they not be order-dependent on each other.
If they are order-dependent, you could use the messaging approach for everything. That would preserve the order while allowing separate messages per operation, and grouping them together (via conversation groups) would let you pull them off the queue in order as well and use separate transactions for each 'type' of operation (i.e. primary vs. non-primary). This would require some special coding on your part if all the grouped messages must be a single autonomous operation, but it could be done.
I hesitate to even mention this option, because it is a terrible option, but for full disclosure I suppose you could consider it at your discretion if you think it fits (but it is definitely not an architecture that would apply to almost any scenario). You could use xp_cmdshell to call out to the command line and execute sqlcmd/osql for the non-critical tasks - this sqlcmd execution would be in a separate transaction from the module you are executing from, and simply ignoring the xp_cmdshell failure should allow the primary batch to continue.
Those are some ideas...
Can you do your import into a temporary location, using transactions only for the important parts? Once the temporary location is loaded, having absorbed any non-critical errors, you can copy the data into its final destination in a single transaction. It depends on the nature of the work you are doing, but it's potentially a viable option.
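A minimal sketch of that two-phase import, using SQLite via Python's sqlite3 in place of SQL Server (the staging and reading tables and the import_via_staging function are hypothetical): phase one loads rows into a temporary staging table one at a time and records the ones that fail; phase two moves everything that survived into the real table in a single transaction, so the final destination is updated all-or-nothing.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
    conn.execute("CREATE TABLE reading (sensor TEXT NOT NULL, value REAL NOT NULL)")
    return conn


def import_via_staging(conn, rows):
    """Return the rows rejected during staging; the rest land in `reading`."""
    # Phase 1: a connection-private staging table with the same constraints.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS staging "
        "(sensor TEXT NOT NULL, value REAL NOT NULL)"
    )
    conn.execute("DELETE FROM staging")
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO staging (sensor, value) VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)  # non-critical: skip it and keep loading

    # Phase 2: copy the clean data into its final destination atomically.
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT INTO reading (sensor, value) SELECT sensor, value FROM staging"
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rejected
```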
We've got a web-based application. Some time-bound database operations (INSERTs and UPDATEs) in the application take a long time to complete, so this particular flow has been moved into a Java Thread so that it does not wait (block) for the database operation to complete.
My problem is that if more than one user goes through this particular flow, PostgreSQL throws the following error:
org.postgresql.util.PSQLException: ERROR: deadlock detected
Detail: Process 13560 waits for ShareLock on transaction 3147316424; blocked by process 13566.
Process 13566 waits for ShareLock on transaction 3147316408; blocked by process 13560.
The above error is consistently thrown in INSERT statements.
Additional Information:
1) I have PRIMARY KEY defined in this table.
2) There are FOREIGN KEY references in this table.
3) A separate database connection is passed to each Java Thread.
Technologies
Web Server: Tomcat v6.0.10
Java v1.6.0
Servlet
Database: PostgreSQL v8.2.3
Connection Management: pgpool II
One way to cope with deadlocks is a retry mechanism that waits for a random interval and then runs the transaction again. The random interval is necessary so that the colliding transactions don't keep bumping into each other, causing what is called a livelock, something even nastier to debug. In fact, most complex applications will need such a retry mechanism sooner or later anyway, to handle transaction serialization failures.
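The retry loop can be sketched in a few lines of Python (names are hypothetical; a real caller would pass a function that opens a transaction, does all the work and commits, so each retry re-runs the whole transaction from scratch). For PostgreSQL the errors worth retrying are SQLSTATE 40P01 (deadlock_detected) and 40001 (serialization_failure); how you read the SQLSTATE depends on your driver, so the sketch takes an is_retryable predicate. The delay is random within a window that doubles on each attempt, one common way to keep colliding transactions from retrying in lock-step.

```python
import random
import time


def run_with_retry(txn, is_retryable, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call txn() until it succeeds, retrying only errors is_retryable() accepts.

    The wait before retry n is random in [0, base_delay * 2**(n-1)] seconds,
    so two colliding transactions are unlikely to collide again (livelock).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Passing sleep as a parameter is only there so the loop can be tested without real waiting.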
Of course if you are able to determine the cause of the deadlock it's usually much better to eliminate it or it will come back to bite you. For almost all cases, even when the deadlock condition is rare, the little bit of throughput and coding overhead to get the locks in deterministic order or get more coarse-grained locks is worth it to avoid the occasional large latency hit and the sudden performance cliff when scaling concurrency.
When you consistently get two INSERT statements deadlocking, it's most likely a unique index insert order issue. Try, for example, the following in two psql command windows:
Thread A                 | Thread B
-------------------------+-------------------------------------------------
BEGIN;                   | BEGIN;
                         | INSERT uniq=1;
INSERT uniq=2;           |
                         | INSERT uniq=2;
                         |   blocks, waiting for thread A to commit or roll
                         |   back, to see if this is a unique key error
INSERT uniq=1;           |
  blocks, waiting for    |
  thread B: DEADLOCK     |
(time flows downward)
Usually the best course of action to resolve this is to figure out the parent objects that guard all such transactions. Most applications have one or two primary entities, such as users or accounts, that are good candidates for this. Then all you need is for every transaction to take the lock on the primary entity it touches via SELECT ... FOR UPDATE. Or, if it touches several, to lock all of them, but in the same order every time (ordering by primary key is a good choice).
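The ordering rule can be illustrated outside the database with a small Python threading model, where one threading.Lock per primary key stands in for the row lock SELECT ... FOR UPDATE would take (Accounts and transfer are hypothetical names; this illustrates the ordering idea, not PostgreSQL's lock manager). Because every transfer locks its rows in ascending key order, a transfer from 1 to 2 and one from 2 to 1 both try key 1 first, so neither can hold one lock while waiting for the other.

```python
import threading
from contextlib import ExitStack


class Accounts:
    """In-memory balances with one lock per primary key (a stand-in for row locks)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.locks = {key: threading.Lock() for key in self.balances}

    def transfer(self, src, dst, amount):
        with ExitStack() as stack:
            # Lock every row this "transaction" touches in primary-key order.
            # The set avoids taking the same non-reentrant lock twice if src == dst.
            for key in sorted({src, dst}):
                stack.enter_context(self.locks[key])
            self.balances[src] -= amount
            self.balances[dst] += amount
        # Leaving the ExitStack releases the locks in reverse order.
```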
What PostgreSQL does here is covered in the documentation on Explicit Locking. The example in the "Deadlocks" section shows what you're probably doing. The part you may not have expected is that when you UPDATE something, that acquires a lock on that row that continues until the transaction involved ends. If you have multiple clients all doing updates of more than one thing at once, you'll inevitably end up with deadlocks unless you go out of your way to prevent them.
If you have multiple things that take out implicit locks like UPDATE, you should wrap the whole sequence in BEGIN/COMMIT transaction blocks, and make sure you're consistent everywhere about the order in which they acquire locks (even implicit ones like what UPDATE grabs). If you need to update something in table A then table B, and one part of the app does A then B while the other does B then A, you're going to deadlock one day. Two UPDATEs against the same table are similarly destined to fail unless you can enforce some ordering of the two that's repeatable among clients. Sorting by primary key once you have the set of records to update and always grabbing the "lower" one first is a common strategy.
It's less likely that your INSERTs are to blame here; those are much harder to get into a deadlocked situation, unless you violate a primary key as Ants already described.
What you don't want to do is try and duplicate locking in your app, which is going to turn into a giant scalability and reliability mess (and will likely still result in database deadlocks). If you can't work around this within the confines of the standard database locking methods, consider using either the advisory lock facility or explicit LOCK TABLE to enforce what you need instead. That will save you a world of painful coding over trying to push all the locks onto the client side. If you have multiple updates against a table and can't enforce the order they happen in, you have no choice but to lock the whole table while you execute them; that's the only route that doesn't introduce a potential for deadlock.
Deadlock explained:
In a nutshell, a particular SQL statement (INSERT or other), call it "statement A", is waiting for another statement to release a lock on a particular part of the database before it can proceed. Until that lock is released, statement A cannot access that part of the database to do its job (= a regular lock situation). But statement A has also put a lock on another part of the database, to ensure that no other users of the database access it (for reading, or for modifying/deleting, depending on the type of lock). Now the second SQL statement itself needs to access the data protected by statement A's lock. That is a DEADLOCK: each statement would wait on the other indefinitely, which is why the database detects the cycle and aborts one of them with an error like the one in the question.
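The cycle described above is what a database's deadlock detector looks for. Here is a deliberately simplified Python model (the find_deadlock function is hypothetical; real engines such as PostgreSQL track many lock types, a process may wait on several holders, and PostgreSQL only runs its check once a lock wait exceeds deadlock_timeout): each waiting process points at the process blocking it, and a deadlock is a cycle in that wait-for graph. The sample input mirrors the two processes in the PostgreSQL error quoted in the question.

```python
def find_deadlock(waits_for):
    """Return the processes forming a wait-for cycle, or None if there is none.

    `waits_for` maps each blocked process id to the id of the process it is
    waiting on (simplification: at most one blocker per process).
    """
    for start in waits_for:
        path = []
        current = start
        while current in waits_for:        # follow the chain of waits
            if current in path:            # came back to a process already seen
                return path[path.index(current):]
            path.append(current)
            current = waits_for[current]
    return None                            # every chain ends at a running process
```

For example, find_deadlock({13560: 13566, 13566: 13560}) returns [13560, 13566], while a plain chain such as {1: 2, 2: 3} returns None because process 3 is not waiting on anyone.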
The remedy...
This requires knowing the specific SQL statements these various threads are running, and looking at whether there is a way to either:
a) remove some of the locks, or change their types. For example, maybe the whole table is locked where locking only a given row, or the page containing it, would be enough.
b) prevent several of these queries from being submitted at the same time. This would be done with semaphores/locks (aka mutexes) in the multi-threading logic.
Beware that the "b)" approach, if not implemented correctly, may just move the deadlock from within SQL into the program/thread logic. The key is to create only one mutex, which must be obtained first by any thread that is about to run one of these deadlock-prone queries.
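Approach b) with a single mutex might look like the following Python sketch (the lock and function names are hypothetical). It only serializes threads inside one process; several application servers or processes would each have their own lock, and it trades concurrency for safety, which is the scalability cost another answer here warns about.

```python
import threading

# The ONE mutex every thread must take before running a deadlock-prone
# transaction; with a single lock there is no lock ordering to get wrong.
DEADLOCK_PRONE_MUTEX = threading.Lock()


def run_serialized(work):
    """Run work() (e.g. the whole INSERT/UPDATE transaction) under the mutex."""
    with DEADLOCK_PRONE_MUTEX:
        return work()
```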
Your problem is probably that the insert command is trying to lock one or both indexes, and the indexes are locked by the other thread.
One common mistake is to lock resources in a different order in each thread. Check the order and try to lock the resources in the same order in all threads.
In my client application I have a method like this (in practice it's more complex, but I've kept only the main part):
public void btnUpdate_Click(...)
{
...
dataAdapter.Update(...);
...
dataAdapter.Fill(...); // here I got the exception, once
}
The exception I found in the logs says "Deadlock found when trying to get lock; try restarting transaction". I got this exception only once; it hasn't happened again.
As I understand it, the DataAdapter.Fill() method executes only a SELECT query. I don't start an explicit transaction and I have autocommit enabled.
So how can I get a deadlock on a simple SELECT query which is not part of a bigger transaction?
As I understand it, to get a deadlock, two transactions must wait for each other. How is that possible with a single SELECT that is not inside a transaction? Maybe it's a bug in MySQL?
Thank you in advance.
You are right: it takes two transactions to make a deadlock. That is to say, no statement or statements within a single transaction can deadlock with other statements within the same transaction.
But it only takes one transaction to see a deadlock reported. How do you know that the transaction you are seeing the deadlock reported in is the only transaction being executed in the database? Isn't there other activity going on in this database?
Also, your statements "I don't make an explicit transaction" and "... which is not a part of bigger transaction" imply that you do not understand that every SQL statement executed is always in an implicit transaction, even if you do not explicitly start one.
Most databases have reporting mechanisms specifically designed to track, report and/or log instances of deadlocks for diagnostic purposes. In SQL Server there is a trace flag that causes a log entry with much detail about each deadlock that occurs, including details about each of the two transactions involved, such as which SQL statements were being executed, which objects in the database were being locked, and why the lock could not be obtained. I'd guess MySQL has a similar diagnostic tool. Find out what it is and turn it on so that the next time this occurs you can look and find out exactly what happened.
You can deadlock a simple SELECT against other statements, like an UPDATE. On my blog I have an example explaining a deadlock between two well-tuned statements: Read/Write deadlock. While the example is SQL Server-specific, the principle is generic. I don't know enough about MySQL to say whether this is necessarily the case there, especially in light of the various engines MySQL can deploy, but nonetheless a simple SELECT can be the victim of a deadlock.
I haven't looked into how MySQL transaction works, but this is based on how MSSQL transactions work:
If you are not using a transaction, each query has a transaction by itself. Otherwise you would get a mess every time an update failed in the middle.
The reason for the deadlock might be lock escalation. The database tries to lock as little as possible for each query, so it starts out by locking only the individual rows affected. When a query holds a large number of row locks, the database may decide that escalating to a coarser lock would be cheaper (in SQL Server, escalation goes from row or page locks straight to a table lock), which has the side effect of locking rows the query does not otherwise touch.
If a SELECT query and an UPDATE query are both trying to escalate locks on the same table, they may cause a deadlock even though only a single table is involved.
I agree that this is unlikely to be the cause in this particular case, but this answer supplements the others by limiting their scope, and I'm recording it for posterity in case someone finds it useful.
MySQL can in rare cases have single statements periodically deadlock against themselves. This seems to happen particularly on bulk inserts, and the issues are almost certainly deadlocks between different threads working on the operation. I would expect bulk updates to have the same problem. In the past, when faced with this sort of issue, I have generally just cut down the number of rows inserted (or updated) in a single statement. In this case you usually won't get a deadlock error when trying to obtain the lock, but other messages instead.
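Cutting a bulk load into smaller statements is easy to wrap in a helper. A sketch using Python's sqlite3 (the table t, the helper name and the default batch size of 500 are arbitrary assumptions, and executemany stands in for a multi-row INSERT): each batch is its own short transaction, so locks are held for less time, at the cost that the load as a whole is no longer atomic.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows into t(x) in separate transactions of at most batch_size rows.

    Returns the number of batches committed. If a batch fails, earlier
    batches stay committed, so callers must be able to resume or clean up.
    """
    it = iter(rows)
    batches = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return batches
        with conn:  # one short transaction per batch
            conn.executemany("INSERT INTO t (x) VALUES (?)", batch)
        batches += 1
```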
A colleague of mine and I were discussing similar problems in MS SQL Server (so this is not unique to MySQL!) and he pointed out that the solution there is to tell the server not to parallelize the insert or update. The problems here are spinlock-related deadlocks, not logical lock deadlocks in the RDBMS.