INSERT INTO .. SELECT .. unique constraint violation - sql

I'm running a stored procedure that selects values from my temp table and inserts them into the database like so:
INSERT INTO emails (EmailAddress)
SELECT DISTINCT eit.EmailAddress
FROM #EmailInfoTemp eit
LEFT JOIN emails ea
    ON eit.EmailAddress = ea.EmailAddress
WHERE ea.EmailAddressID IS NULL
In rare cases (roughly once every couple of hours, on a server that handles thousands of requests a minute), I then receive a unique constraint error "Violation of UNIQUE KEY constraint …" on an index on the EmailAddress column.
I can confirm that I am not passing in duplicate values, and even if I were, they should be caught by the DISTINCT.
- SQL Server 2008
- Stored procedure, not using transactions, JDBC CallableStatement
Could it happen that between the SELECT and the ensuing INSERT, another call to the same (or a different) stored procedure completed an INSERT with similar data? If so, what would be the best way to prevent that?
Some ideas: we have many identical instances of "clients" communicating with this one SQL Server at once in production, so my first reaction was a concurrency issue, but I can't seem to replicate it myself. That's the best guess I have, but it has gone nowhere so far. The problem does not happen in our staging environment, where the load is insignificant compared to production; that was the main reason I started looking into concurrency issues.

The error is probably caused by two sessions executing an insert at the same time.
You can make your SQL code safer by using MERGE. As Aaron Bertrand's comment says (thanks!), you have to include a WITH (HOLDLOCK) hint to make MERGE really safe.
; merge emails with (holdlock) as e  -- holdlock keeps the key range locked
using #EmailInfoTemp eit             -- between the match check and the insert
on e.EmailAddress = eit.EmailAddress
when not matched then insert
(EmailAddress) values (eit.EmailAddress);  -- MERGE must end with a semicolon
The MERGE statement will take appropriate locks to ensure that no other session can sneak in between its "not matched" check and the insert.
If you can't use MERGE, you could solve the problem client-side: make sure that no two inserts run at the same time. This is typically easy to do with a mutex or other synchronization construct.
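If you'd rather keep the original INSERT ... SELECT shape, the same protection can come from locking hints on the duplicate check itself. A minimal sketch, using the tables from the question (this variant is my illustration, not part of the original answer):
insert into emails (EmailAddress)
select distinct eit.EmailAddress
from #EmailInfoTemp eit
where not exists (
    -- updlock + holdlock make the existence check take and hold a key-range
    -- lock, so no concurrent session can insert the same address in between
    select 1
    from emails e with (updlock, holdlock)
    where e.EmailAddress = eit.EmailAddress
);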

Related

postgresql: \copy method enter valid entries and discard exceptions

When entering the following command:
\copy mmcompany from '<path>/mmcompany.txt' delimiter ',' csv;
I get the following error:
ERROR: duplicate key value violates unique constraint "mmcompany_phonenumber_key"
I understand why it's happening, but how do I execute the command in a way that valid entries will be inserted and ones that create an error will be discarded?
The reason PostgreSQL doesn't do this is related to how it implements constraints and validation. When a constraint fails it causes a transaction abort. The transaction is in an unclean state and cannot be resumed.
It is possible to create a new subtransaction for each row but this is very slow and defeats the purpose of using COPY in the first place, so it isn't supported by PostgreSQL in COPY at this time. You can do it yourself in PL/PgSQL with a BEGIN ... EXCEPTION block inside a LOOP over a select from the data copied into a temporary table. This works fairly well but can be slow.
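A minimal PL/pgSQL sketch of that per-row approach, assuming the file has already been \copy'd into a temporary table named staging with the same shape as mmcompany (the staging table name is hypothetical):
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM staging LOOP
        BEGIN
            -- each row gets its own subtransaction via the EXCEPTION block
            INSERT INTO mmcompany VALUES (r.*);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- discard the offending row and carry on
        END;
    END LOOP;
END;
$$;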
It's better, if possible, to use SQL to check the constraints before doing any insert that violates them. That way you can just:
CREATE TEMPORARY TABLE stagingtable(...);
\copy stagingtable FROM 'somefile.csv'
INSERT INTO realtable
SELECT * FROM stagingtable
WHERE check_constraints_here;
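Applied to this question, and assuming from the constraint name mmcompany_phonenumber_key that the unique column is phonenumber (an inference; adjust to your actual schema), the pattern could look like:
CREATE TEMPORARY TABLE stagingtable (LIKE mmcompany);
\copy stagingtable from '<path>/mmcompany.txt' delimiter ',' csv
INSERT INTO mmcompany
SELECT DISTINCT ON (s.phonenumber) s.*   -- also drops duplicates within the file
FROM stagingtable s
WHERE NOT EXISTS (
    SELECT 1 FROM mmcompany m
    WHERE m.phonenumber = s.phonenumber
);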
Do keep concurrency issues in mind, though. If you're trying to do a merge/upsert via COPY, you must LOCK TABLE realtable; at the start of your transaction or you will still have the potential for errors. It looks like that's what you're trying to do - a copy-if-not-exists. If so, skipping errors is absolutely the wrong approach. See:
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
Insert, on duplicate update in PostgreSQL?
Postgresql - Clean way to insert records if they don't exist, update if they do
Can COPY be used with a function?
Postgresql csv importation that skips rows
... this is a much-discussed issue.
One way to handle the constraint violations is to define triggers on the target table to handle the errors. This is not ideal as there can still be race conditions (if concurrently loading), and triggers have pretty high overhead.
Another method: COPY into a staging table and load the data into the target table using SQL with some handling to skip existing entries.
Another useful method is to use pgloader.

sql error - unique constraint

I have a data migration script like this:
Data_migration.sql
Its contents are:
insert into table1 select * from old_schema.table1;
commit;
insert into table2 select * from old_schema.table2;
commit;
And table1 has the pk_productname constraint. When I execute the script
SQL> @ "data_migration.sql"
I get a unique constraint (pk_productname) violation, but when I execute the individual SQL statements I don't get any error. Is there a reason behind this, and how can I resolve it?
The failure of the unique constraint means you are attempting to insert one or more records whose primary key columns collide.
If it happens when you run a script but not when you run the individual statements then there must be a bug in your script. Without seeing the script it is impossible for us to be sure what that bug is, but the most likely thing is you are somehow running the same statement twice.
Another possible cause is that the constraint is deferred, meaning it is not enforced until the end of the transaction. In that case the INSERT statement would appear to succeed if you ran it without issuing the subsequent COMMIT.
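The deferred case is easy to demonstrate in isolation (a hypothetical table; the constraint must have been created DEFERRABLE for this to apply):
CREATE TABLE demo (
    id NUMBER,
    CONSTRAINT demo_pk PRIMARY KEY (id) DEFERRABLE INITIALLY DEFERRED
);
INSERT INTO demo VALUES (1);
INSERT INTO demo VALUES (1);  -- no error yet: the check is deferred
COMMIT;                       -- the violation (ORA-00001) surfaces only here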
It is common to run data migrations with constraints disabled and re-enable them afterwards using an EXCEPTIONS table; this makes it easier to investigate problems.
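A sketch of that approach, using the table and constraint names from the question (the EXCEPTIONS table itself can be created with Oracle's utlexcpt.sql script):
ALTER TABLE table1 DISABLE CONSTRAINT pk_productname;
-- run the migration inserts here
ALTER TABLE table1 ENABLE CONSTRAINT pk_productname
    EXCEPTIONS INTO exceptions;  -- ROWIDs of the offending rows land here
SELECT * FROM table1
WHERE ROWID IN (SELECT row_id FROM exceptions);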

SQL continue executing queries after duplicate key violation

I have a situation where I want to insert a row if it doesn't exist, and not insert it if it already does. I tried creating SQL queries that prevented this from happening (see here), but I was told a solution is to create constraints and catch the exception when they're violated.
I have constraints in place already. My question is - how can I catch the exception and continue executing more queries? If my code looks like this:
cur = transaction.cursor()
# execute some queries that succeed
try:
    cur.execute(fooquery, bardata)  # this query might fail, but that's OK
except psycopg2.IntegrityError:
    pass
cur.execute(fooquery2, bardata2)
Then I get an error on the second execute:
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block
How can I tell the computer that I want it to keep executing queries? I don't want to transaction.commit(), because I might want to roll back the entire transaction (the queries that succeeded before).
I think what you could do is use a SAVEPOINT before trying to execute the statement which could cause the violation. If the violation happens, you can roll back to the SAVEPOINT while keeping your original transaction.
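In SQL terms the pattern looks like this (a sketch; each statement would be issued through cur.execute(), and the savepoint name is arbitrary):
SAVEPOINT before_risky_insert;
-- the INSERT that might violate the constraint goes here;
-- if it raises IntegrityError:
ROLLBACK TO SAVEPOINT before_risky_insert;  -- undoes only the failed statement
-- if it succeeds:
RELEASE SAVEPOINT before_risky_insert;      -- discards the savepoint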
Here's another thread which may be helpful:
Continuing a transaction after primary key violation error
I gave an up-vote to the SAVEPOINT answer--especially since it links to a question where my answer was accepted. ;)
However, given your statement in the comments section that you expect errors "more often than not," may I suggest another alternative?
This solution actually harkens back to your other question. The difference here is that the data is loaded very quickly into the right place and format, moved into the target with a single SELECT, and the approach is generic for any table you want to populate (so the same code could be used for multiple different tables). Here's a rough layout of how I would do it in pure PostgreSQL, assuming I had a CSV file in the same format as the table to be inserted into:
CREATE TEMP TABLE input_file (LIKE target_table);
COPY input_file FROM '/path/to/file.csv' WITH CSV;
INSERT INTO target_table
SELECT * FROM input_file
WHERE (<unique key field list>) NOT IN (
    SELECT <unique key field list>
    FROM target_table
);
Okay, this is an idealized example and I'm also glossing over several things (like reporting back the duplicates, pushing the data into the table via Python in-memory data, COPY FROM STDIN rather than from a file, etc.), but hopefully the basic idea is there, and it will avoid much of the overhead if you expect more records to be rejected than accepted.

Why could "insert (...) values (...)" not insert a new row?

I have a simple SQL insert statement of the form:
insert into MyTable (...) values (...)
It is used repeatedly to insert rows and usually works as expected. It inserts exactly 1 row into MyTable, which is also the value returned by the Delphi statement AffectedRows := myInsertADOQuery.ExecSQL.
After some time there was a temporary network connectivity problem. As a result, other threads of the same application received EOleExceptions (connection failure, -2147467259 = unspecified error). Later, the network connection was re-established, and these threads reconnected and were fine.
The thread responsible for executing the insert statement described above, however, did not notice the connectivity problems (no exceptions), probably because it simply was not executed while the network was down. But after the network connectivity problems, myInsertADOQuery.ExecSQL always returned 0 and no rows were inserted into MyTable anymore. After a restart of the application, the insert statement worked again as expected.
For SQL Server, is there any defined case where an insert statement like the one above would not insert a row and would return 0 as the number of affected rows? The primary key is an autogenerated GUID. There are no unique or check constraints (which should result in an exception anyway, rather than in a row silently not being inserted).
Are there any known ADO bugs (Provider=SQLOLEDB.1)?
Any other explanations for this behaviour?
Thanks,
Nang.
If you do not get any exceptions, then:
When a table has triggers without SET NOCOUNT ON, the operation (INSERT / UPDATE / DELETE) may actually finish successfully, but the number of affected records may be returned as 0.
Depending on transaction activity in the current session, other sessions may not see changes made by the current session. The current session, however, will see its own changes, and the number of affected records will not be 0.
So the exact answer may depend on your table DDL (plus triggers, if any) and on how you are checking the inserted rows.
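To illustrate the trigger point, here is a hypothetical audit trigger (AuditLog is invented here), where the SET NOCOUNT ON line is exactly what keeps the trigger's own statements from masking the rowcount the client sees:
CREATE TRIGGER trg_MyTable_Insert
ON MyTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;  -- without this, the trigger's INSERT can distort
                     -- the affected-rows count returned to the caller
    INSERT INTO AuditLog (InsertedAt)
    SELECT GETDATE() FROM inserted;
END;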
It looks like your insert thread silently lost its connection and is not checking for that in order to auto-reconnect if needed, but keeps queuing the inserts without actually sending them.
I would isolate this code in a small standalone app to debug it and see how it behaves when you voluntarily disconnect the network then reconnect it.
I would not be surprised if you either found a "swallowed" exception, or some code omitting to check for success/failure.
Hope it helps...
If the values you're trying to insert violate
- a CHECK constraint
- a FOREIGN KEY relationship
- a NOT NULL constraint
- a UNIQUE constraint
or any other constraint, then the row(s) will not be inserted.
Do you use transactions? Maybe your application has autocommit disabled? Some drivers do not commit data if there was an error in the transaction.

How can remove lock from table in SQL Server 2005?

I am using a function in a stored procedure; the procedure contains a transaction, and it updates and inserts values into the same table, while the function called by the procedure also fetches data from that same table.
The procedure hangs when the function is called.
Is there any solution for this?
If I'm hearing you right, you're talking about an insert BLOCKING ITSELF, not two separate queries blocking each other.
We had a similar problem, an SSIS package was trying to insert a bunch of data into a table, but was trying to make sure those rows didn't already exist. The existing code was something like (vastly simplified):
INSERT INTO bigtable
SELECT r.customerid, r.productid, ...
FROM rawtable r
WHERE NOT EXISTS (SELECT 1 FROM bigtable b
                  WHERE b.CustomerID = r.customerid
                    AND b.ProductID = r.productid)
AND ... (other conditions)
This ended up blocking itself because the SELECT in the WHERE NOT EXISTS was preventing the INSERT from occurring.
We considered a few different options, I'll let you decide which approach works for you:
Change the transaction isolation level (see this MSDN article). Our SSIS package was defaulted to SERIALIZABLE, which is the most restrictive. (note, be aware of issues with READ UNCOMMITTED or NOLOCK before you choose this option)
Create a UNIQUE index with IGNORE_DUP_KEY = ON (see the sketch after this list). This means we can insert ALL rows and remove the "WHERE NOT EXISTS" clause altogether. Duplicates will be rejected, but the batch won't fail completely, and all other valid rows will still insert.
Change your query logic to do something like put all candidate rows into a temp table, then delete all rows that are already in the destination, then insert the rest.
In our case, we already had the data in a temp table, so we simply deleted the rows we didn't want inserted, and did a simple insert on the rest.
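A sketch of option 2 (the index name and key columns are assumptions based on the simplified example above):
CREATE UNIQUE INDEX IX_bigtable_Customer_Product
ON bigtable (CustomerID, ProductID)
WITH (IGNORE_DUP_KEY = ON);
-- duplicate keys are now discarded with a warning instead of an error,
-- and the rest of the batch inserts normally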
This can be difficult to diagnose. Microsoft has provided some information here:
INF: Understanding and resolving SQL Server blocking problems
A brute force way to kill the connection(s) causing the lock is documented here:
http://shujaatsiddiqi.blogspot.com/2009/01/killing-sql-server-process-with-x-lock.html
Some more Microsoft info here: http://support.microsoft.com/kb/323630
How big is the table? Do you get the problem if you call the procedure from separate windows? Maybe the problem is related to the amount of data the procedure is working with and a lack of indexes.