SQL unique: manual check vs catch exception

I'm working on a big database and I'm looking for anything that can speed it up. The question is: when you have a unique index on some fields, which is faster: making a SELECT request first to check whether the insert is OK, or trying it anyway and catching the exception if the entry already exists?
I did some research but found nothing conclusive. Thanks.

A manual check won't do what you think it does. (See below.)
If you check first, every insert requires two round-trips to the database. It might also require serializable transactions.
And you have to trap errors anyway. A duplicate value is just one thing that can go wrong on an insert; there are a lot of other things that can go wrong.
I say just insert, and trap the errors.
The point of a SELECT before INSERT is to determine whether a value already exists in the database. But you can't rely on that to work. Here's why.
Open two terminal sessions (for example), and connect both to your database. Assume this table already exists and is empty:
create table test (
test_id serial primary key,
test_email varchar(15) not null unique
);
A: begin transaction;
A: select test_email
from test
where test_email = 'a#b.com';
(0 rows)
B: begin transaction;
A: insert into test (test_email)
values ('a#b.com');
INSERT 0 1
B: select test_email
from test
where test_email = 'a#b.com';
(0 rows)
B: insert into test (test_email)
values ('a#b.com');
(waiting for lock)
A: commit;
B: ERROR: duplicate key value violates unique constraint...
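To close the loop: if you want to trap the duplicate inside the database itself, here's a minimal sketch in PL/pgSQL (assuming PostgreSQL, to match the demo above; in application code you'd catch your driver's equivalent error instead):
do $$
begin
    insert into test (test_email) values ('a#b.com');
exception
    when unique_violation then
        -- the row already exists; swallow the error and carry on
        raise notice 'duplicate ignored';
end
$$;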

You have two choices:
Try the insert, and if the query fails, catch the exception and roll back the transaction.
Query once to check whether the value exists; if it does not exist, insert it.
In my opinion the first one is better, because checking first means using the network connection twice, though a SELECT can be a good option when you have really big data. In the first case you try the insert and get a DataIntegrityException if the row already exists; a single request and response is better than two requests and two responses. A transaction manager can handle the exception as well.

My understanding is that try/catch exceptions abruptly stop the flow of the program, even when properly handled. The recommended practice is to use them apart from domain logic. An extra SELECT shouldn't be that bad unless your database server is physically far away.

Related

INSERT INTO .. SELECT .. unique constraint violation

I'm running a stored procedure that selects values from my temp table and inserts them into the database like so:
INSERT INTO emails (EmailAddress) (
SELECT
DISTINCT eit.EmailAddress
FROM #EmailInfoTemp eit
LEFT JOIN emails ea
ON eit.EmailAddress = ea.EmailAddress
WHERE ea.EmailAddressID IS NULL )
In rare cases (roughly once every couple of hours, on a server that handles thousands of requests a minute), I then receive a unique constraint error, "Violation of UNIQUE KEY constraint...", on an index on the EmailAddress column.
I can confirm that I am not passing in duplicate values. Even if I was, it should be caught by the DISTINCT.
-SQL Server 2008
-Stored proc + not using transactions + JDBC CallableStatement
Could it happen that between the SELECT and the ensuing INSERT, there was another call to the same/different stored proc that completed an INSERT with similar data? If so, what would be the best way to prevent that?
Some ideas: we have many duplicate instances of "clients" that communicate with this one SQL Server at once in production, so my first reaction was a concurrency issue, but I can't seem to replicate it myself. That's the best guess I had, but it's gone nowhere so far. This does not happen in our staging environment, where the load is insignificant compared to production; that was the main reason I started looking into concurrency issues.
The error is probably caused by two sessions executing an insert at the same time.
You can make your SQL code safer by using MERGE. As Aaron Bertrand's comment says (thanks!), you have to include a WITH (HOLDLOCK) hint to make MERGE really safe.
; merge emails e with (holdlock)
using #EmailInfoTemp eit
on e.EmailAddress = eit.EmailAddress
when not matched then insert
(EmailAddress) values (eit.EmailAddress);
The MERGE statement will take appropriate locks to ensure that no other session can sneak in between its "not matched" check and the "insert".
If you can't use merge, you could solve the problem client-side. Make sure that no two inserts are running at the same time. This is typically easy to do with a mutex or other synchronization construct.
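If you can't use MERGE but can change the SQL, another option along those lines is to keep the existence check and the insert in a single statement and take the range locks explicitly with locking hints. A sketch against the same schema; treat it as a starting point to test, not a drop-in replacement:
insert into emails (EmailAddress)
select distinct eit.EmailAddress
from #EmailInfoTemp eit
where not exists (
    -- updlock + holdlock serialize the existence check
    -- against concurrent inserts of the same address
    select 1
    from emails e with (updlock, holdlock)
    where e.EmailAddress = eit.EmailAddress
);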

What kinds of errors in SQL queries cause a ROLLBACK?

For example:
insert into table( a, b ) values ('a','b') could generate the following error:
a-b duplicate entry
BUT here I can ignore this error by selecting the ID of these values, then using that ID:
select ID from table where a = 'a' and b = 'b'
insert into brother( table ) values (ID)
Finally I could COMMIT the procedure. Note that this error isn't relevant for a rollback if I need the ID.
The question is: what kinds of errors will force me to ROLLBACK the procedure?
I hope you understand.
I think you're asking, "What kind of errors can an INSERT statement cause that will make MySQL rollback a transaction?"
An INSERT that violates any constraint will cause a rollback. It could be a foreign key constraint like you've outlined, but it could also be a UNIQUE constraint or a CHECK constraint. (A CHECK constraint would probably be implemented as a trigger in MySQL.)
Trying to insert values that aren't valid (NULL in nonnullable columns, numbers that are out of range, invalid dates) might cause a rollback. But they might not, depending on the server configuration. (See link below.)
An INSERT can also fail because it lacks permissions. That will also cause a rollback.
Some conditions that would cause a rollback on other platforms don't cause a rollback on MySQL.
The options MySQL has when an error occurs are to stop the statement in the middle or to recover as well as possible from the problem and continue. By default, the server follows the latter course. This means, for example, that the server may coerce illegal values to the closest legal values.
That quote is from How MySQL Deals with Constraints.
One of my favorite quotes from the MySQL documentation, 1.8.6.2. Constraints on Invalid Data:
MySQL enables you to store certain incorrect date values into DATE and DATETIME columns (such as '2000-02-31' or '2000-02-00'). The idea is that it is not the job of the SQL server to validate dates.
Isn't that cute?
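If you'd rather have MySQL reject such values instead of coercing or storing them, you can turn on strict SQL mode. A sketch (the exact error raised depends on the MySQL version and the full sql_mode value; recent versions enable strict mode by default):
set session sql_mode = 'STRICT_ALL_TABLES';
create table t (d date not null);
-- in strict mode this fails with an "Incorrect date value" error
-- instead of the bad date being stored:
insert into t (d) values ('2000-02-31');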

SQL continue executing queries after duplicate key violation

I have a situation where I want to insert a row if it doesn't exist, and to not insert it if it already does. I tried creating sql queries that prevented this from happening (see here), but I was told a solution is to create constraints and catch the exception when they're violated.
I have constraints in place already. My question is - how can I catch the exception and continue executing more queries? If my code looks like this:
cur = transaction.cursor()
# execute some queries that succeed
try:
    cur.execute(fooquery, bardata)  # this query might fail, but that's OK
except psycopg2.IntegrityError:
    pass
cur.execute(fooquery2, bardata2)
Then I get an error on the second execute:
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block
How can I tell the computer that I want it to keep executing queries? I don't want to transaction.commit(), because I might want to roll back the entire transaction (the queries that succeeded before).
I think what you could do is use a SAVEPOINT before trying to execute the statement that could cause the violation. If the violation happens, you can roll back to the SAVEPOINT but keep your original transaction.
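A rough sketch of the statement sequence, with each statement issued through cur.execute (the savepoint name before_insert is arbitrary):
savepoint before_insert;
-- run fooquery here; if it raises psycopg2.IntegrityError,
-- issue this from the except branch instead of 'pass':
rollback to savepoint before_insert;
-- the transaction is no longer aborted, so fooquery2 runs normally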
Here's another thread which may be helpful:
Continuing a transaction after primary key violation error
I gave an up-vote to the SAVEPOINT answer--especially since it links to a question where my answer was accepted. ;)
However, given your statement in the comments section that you expect errors "more often than not," may I suggest another alternative?
This solution actually harks back to your other question. The difference here is that the data is loaded, very quickly, into a staging table with the right layout and format and then moved with a single SELECT, and it is generic for any table you want to populate (so the same code could be used for multiple different tables). Here's a rough layout of how I would do it in pure PostgreSQL, assuming I had a CSV file in the same format as the table to be inserted into:
CREATE TEMP TABLE input_file (LIKE target_table);
COPY input_file FROM '/path/to/file.csv' WITH CSV;
INSERT INTO target_table
SELECT * FROM input_file
WHERE (<unique key field list>) NOT IN (
SELECT <unique key field list>
FROM target_table
);
Okay, this is an idealized example and I'm also glossing over several things (like reporting back the duplicates, pushing the data into the table from Python in-memory data, COPY from STDIN rather than from a file, etc.), but hopefully the basic idea is there, and it's going to avoid much of the overhead if you expect more records to be rejected than accepted.

Why could "insert (...) values (...)" not insert a new row?

I have a simple SQL insert statement of the form:
insert into MyTable (...) values (...)
It is used repeatedly to insert rows and usually works as expected. It inserts exactly 1 row into MyTable, which is also the value returned by the Delphi statement AffectedRows := myInsertADOQuery.ExecSQL.
After some time there was a temporary network connectivity problem. As a result, other threads of the same application got EOleExceptions (connection failure, -2147467259 = unspecified error). Later, the network connection was reestablished, and these threads reconnected and were fine.
The thread responsible for executing the insert statement described above, however, did not notice the connectivity problems (no exceptions); probably it simply wasn't executed while the network was down. But after the network connectivity problems, myInsertADOQuery.ExecSQL always returned 0 and no rows were inserted into MyTable anymore. After a restart of the application, the insert statement worked again as expected.
For SQL Server, is there any defined case where an insert statement like the one above would not insert a row and would return 0 as the number of affected rows? The primary key is an autogenerated GUID. There are no unique or check constraints (which should result in an exception anyway, rather than in a row silently not being inserted).
Are there any known ADO bugs (Provider=SQLOLEDB.1)?
Any other explanations for this behaviour?
Thanks,
Nang.
If you do not get any exceptions, then:
When a table has triggers without SET NOCOUNT ON, the operation (INSERT / UPDATE / DELETE) may actually finish successfully, but the number of affected records may be returned as 0.
Depending on the transaction activity in the current session, other sessions may not see changes made by the current session. But the current session will see its own changes, and its number of affected records will (probably) not be 0.
So the exact answer may depend on your table DDL (plus triggers, if any) and on how you are checking the inserted rows.
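To illustrate the trigger case: without SET NOCOUNT ON, the row count the client sees can come from the trigger's own statements rather than from your INSERT. A hedged sketch (the trigger and log table here are made up for illustration):
create trigger trg_MyTable_log on MyTable
after insert
as
begin
    -- without this line, the count produced by the trigger's own DML
    -- can be what the client sees instead of the INSERT's count
    set nocount on;
    insert into MyTableLog (InsertedAt)
    select getdate() from inserted;
end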
It looks like your insert thread silently lost the connection and is not checking it in order to auto-reconnect when needed, but keeps queuing the inserts without actually sending them.
I would isolate this code in a small standalone app to debug it and see how it behaves when you voluntarily disconnect the network then reconnect it.
I would not be surprised if you either found a "swallowed" exception, or some code omitting to check for success/failure.
Hope it helps...
If the values you're trying to insert are violating
a CHECK constraint
a FOREIGN KEY relationship
a NOT NULL constraint
a UNIQUE constraint
or any other constraints, then the row(s) will not be inserted.
Do you use transactions? Maybe your application has autocommit turned off? Some drivers do not commit data if there was an error in the transaction.

creating a table if it doesn't exist

Given the following:
if object_id('MyTable') is null create table MyTable( myColumn int )
Is it not possible that two separate callers could both evaluate object_id('MyTable') as null and so both attempt to create the table?
Obviously one of the two callers in that scenario would fail, but ideally no caller should fail; rather, one should block while the other creates the table, and then the blocked caller will see object_id('MyTable') as not null and proceed.
On what can I apply an exclusive lock, such that I'm not locking more than is absolutely required?
After your initial check, use a try/catch when creating the table; if the error is that the table already exists, proceed, and if not, you have a bigger problem.
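A sketch of that pattern in T-SQL (error 2714 is "There is already an object named ... in the database"; THROW needs SQL Server 2012 or later, so use RAISERROR on older versions):
begin try
    if object_id('MyTable') is null
        create table MyTable( myColumn int );
end try
begin catch
    -- anything other than "already exists" is a real problem
    if error_number() <> 2714
        throw;
end catch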
Usually CREATE TABLE is run from setup and installation scripts, and it is unreasonable to expect installation scripts to allow for concurrent installation from separate connections.
I recommend you use a session-scoped app lock acquired at the beginning of your install/upgrade procedure; see sp_getapplock.
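A sketch of the app-lock approach (the resource name 'MyAppSetup' is arbitrary; in production you'd also check sp_getapplock's return value, which is 0 or 1 on success):
-- serialize all installers on one named lock, independent of any table
exec sp_getapplock @Resource = 'MyAppSetup',
                   @LockMode = 'Exclusive',
                   @LockOwner = 'Session';
if object_id('MyTable') is null
    create table MyTable( myColumn int );
exec sp_releaseapplock @Resource = 'MyAppSetup', @LockOwner = 'Session';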
I don't think you should be worrying about this.
DDL statements don't run under a transaction. Also, the 2nd caller will fail if the table was already created by a call from the 1st caller.
I don't allow users to create tables. In general that's a bad practice. If they need to insert data, the table is already there. If you are worried about two people creating the same table, are you also worried about whether their data is crossing? I don't know what your proc does, but if it does something like delete the records if the table exists and then insert, you could get strange results if two users were on at the same time. In general, though, needing to create a table at run time is usually a sign that your design needs work.