I have a situation where I want to insert a row if it doesn't exist, and skip the insert if it already does. I tried writing SQL queries that prevented this from happening (see here), but I was told a solution is to create constraints and catch the exception when they're violated.
I have constraints in place already. My question is - how can I catch the exception and continue executing more queries? If my code looks like this:
cur = transaction.cursor()
# execute some queries that succeed
try:
    cur.execute(fooquery, bardata)  # this query might fail, but that's OK
except psycopg2.IntegrityError:
    pass
cur.execute(fooquery2, bardata2)
Then I get an error on the second execute:
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block
How can I tell the computer that I want it to keep executing queries? I don't want to transaction.commit(), because I might want to roll back the entire transaction (the queries that succeeded before).
I think what you could do is use a SAVEPOINT before trying to execute the statement that could cause the violation. If the violation happens, you can roll back to the SAVEPOINT but keep your original transaction.
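A minimal sketch of the pattern in plain SQL, with a made-up table (from psycopg2 the SAVEPOINT and ROLLBACK TO SAVEPOINT statements can be issued with cur.execute(); the transaction itself is managed by the connection):

BEGIN;
CREATE TEMP TABLE demo_table (id integer PRIMARY KEY);  -- stand-in for your real table
INSERT INTO demo_table VALUES (1);              -- earlier queries that succeed
SAVEPOINT before_risky_insert;
INSERT INTO demo_table VALUES (1);              -- violates the primary key
ROLLBACK TO SAVEPOINT before_risky_insert;      -- undoes only the failed INSERT
INSERT INTO demo_table VALUES (2);              -- the transaction is still usable
COMMIT;                                         -- or roll the whole thing back instead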
Here's another thread which may be helpful:
Continuing a transaction after primary key violation error
I gave an up-vote to the SAVEPOINT answer--especially since it links to a question where my answer was accepted. ;)
However, given your statement in the comments section that you expect errors "more often than not," may I suggest another alternative?
This solution actually harkens back to your other question. The difference here is that the data is loaded very quickly into the right place and format, so that moving it around takes a single SELECT, and the approach is generic for any table you want to populate (so the same code could be used for multiple different tables). Here's a rough layout of how I would do it in pure PostgreSQL, assuming I had a CSV file in the same format as the table to be inserted into:
CREATE TEMP TABLE input_file (LIKE target_table);
COPY input_file FROM '/path/to/file.csv' WITH CSV;
INSERT INTO target_table
SELECT * FROM input_file
WHERE (<unique key field list>) NOT IN (
    SELECT <unique key field list>
    FROM target_table
);
Okay, this is an idealized example and I'm also glossing over several things (like reporting back the duplicates, pushing the data into the table from Python in-memory data, COPY FROM STDIN rather than from a file, etc.), but hopefully the basic idea is there, and it should avoid much of the overhead if you expect more records to be rejected than accepted.
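For what it's worth, two of those glossed-over pieces might look roughly like this, reusing the input_file staging table above (the key-field placeholder is the same one as in the query):

-- Load from the client's standard input instead of a server-side file:
COPY input_file FROM STDIN WITH CSV;

-- Report back the duplicates that were skipped:
SELECT input_file.*
FROM input_file
JOIN target_table USING (<unique key field list>);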
I am writing my first PL/SQL script.
In a batch-processing script I'm writing, I'm inserting data from a "comment" field of Table A into a column of Table B that has various integrity constraints (hence concerns about ORA-12899 - value too long, or ORA-02291 - value not allowed).
If an attempt to insert a value into Table B raises an exception, I'd like to use standard output to report not only that an insertion had to be skipped, but which contents of Table A resulted in a skipped insertion.
I'm aware that I could procedurally loop through Table A's data and essentially call one INSERT at a time, reporting the value of any data that raised an exception on that call. But if possible, I'd like to stick closer to the principle of "letting SQL do its job."
Is there any way to display data from Table A when a straight-SQL "INSERT INTO {Table B} SELECT FROM {Table A}" raises an exception?
Note: I don't have any DDL permissions against the database and know that I can't get any; I only have DML and query privileges like INSERT/UPDATE/SELECT. So solutions like "creating a log table" are out of the question, unfortunately.
Update: Second part to my question: reading Question 1065829, I'm starting to think that even skipping the invalid insertions and moving on with the valid ones isn't possible with straight SQL and no DDL access (which I had presumed would be possible before writing my question above). It looks like I'm going to have to choose between iterating through Table A's rows (doing one INSERT call at a time) and asking for special DDL permissions. Is this correct?
Thank you!
When entering the following command:
\copy mmcompany from '<path>/mmcompany.txt' delimiter ',' csv;
I get the following error:
ERROR: duplicate key value violates unique constraint "mmcompany_phonenumber_key"
I understand why it's happening, but how do I execute the command in a way that valid entries will be inserted and ones that create an error will be discarded?
The reason PostgreSQL doesn't do this is related to how it implements constraints and validation. When a constraint fails it causes a transaction abort. The transaction is in an unclean state and cannot be resumed.
It is possible to create a new subtransaction for each row, but this is very slow and defeats the purpose of using COPY in the first place, so it isn't supported by PostgreSQL in COPY at this time. You can do it yourself in PL/PgSQL with a BEGIN ... EXCEPTION block inside a LOOP over a SELECT from the data after copying it into a temporary table. This works fairly well but can be slow.
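A rough sketch of that PL/PgSQL approach, assuming the data has already been COPYed into a staging table (the column names are placeholders, and the DO block needs PostgreSQL 9.0 or newer):

DO $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT * FROM stagingtable LOOP
        BEGIN
            INSERT INTO realtable (id, name) VALUES (r.id, r.name);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- skip the duplicate row and carry on
        END;
    END LOOP;
END;
$$;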
It's better, if possible, to use SQL to check the constraints before doing any insert that violates them. That way you can just:
CREATE TEMPORARY TABLE stagingtable(...);
\copy stagingtable FROM 'somefile.csv'
INSERT INTO realtable
SELECT * FROM stagingtable
WHERE check_constraints_here;
Do keep concurrency issues in mind though. If you're trying to do a merge/upsert via COPY you must LOCK TABLE realtable; at the start of your transaction or you will still have the potential for errors. It looks like that's what you're trying to do - a copy if not exists. If so, skipping errors is absolutely the wrong approach. See:
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
Insert, on duplicate update in PostgreSQL?
Postgresql - Clean way to insert records if they don't exist, update if they do
Can COPY be used with a function?
Postgresql csv importation that skips rows
... this is a much-discussed issue.
One way to handle the constraint violations is to define triggers on the target table to handle the errors (a rough sketch follows below). This is not ideal, as there can still be race conditions (if loading concurrently), and triggers have pretty high overhead.
Another method: COPY into a staging table and load the data into the target table using SQL with some handling to skip existing entries.
Another useful method is to use pgloader.
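Here's a rough sketch of the trigger idea for the mmcompany example, assuming the unique column is phonenumber (as the constraint name suggests):

CREATE OR REPLACE FUNCTION skip_duplicates() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM mmcompany WHERE phonenumber = NEW.phonenumber) THEN
        RETURN NULL;  -- silently drop the duplicate row
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skip_duplicates_trg
    BEFORE INSERT ON mmcompany
    FOR EACH ROW EXECUTE PROCEDURE skip_duplicates();

Note that this still has the race condition mentioned above if two sessions load the same phone number at the same time.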
At work we have a table to hold settings which essentially contains the following columns:
PARAMNAME
VALUE
Most of the time new settings are added, but on rare occasions settings are removed. Unfortunately this means that any scripts which previously updated a removed setting will continue to do so, even though the update results in "0 rows updated", and this leads to unexpected behaviour.
This situation was picked up recently by a regression test failure but only after much investigation into why the data in the system was different.
So my question is: Is there a way to generate an error condition when an update results in zero rows updated?
Here are some options I have thought of, but none of them are really all that desirable:
PL/SQL wrapper which notices the failed update and throws an exception.
Not ideal as it doesn't stop anyone/a script from manually doing an update.
A trigger on the table which throws an exception.
Goes against our current policy of phasing out triggers.
Requires updating the trigger every time a setting is removed and maintaining a list of obsolete settings (if doing exclusion).
Might run into mutating-table errors (if doing inclusion by querying which settings currently exist).
A PL/SQL wrapper seems like the best option to me. Triggers are a great thing to phase out, with the exception of generating sequences and inserting history records.
If you're concerned about someone manually updating rather than using the PL/SQL wrapper, just restrict the user role so that it does not have UPDATE privileges on the table but has EXECUTE privileges on the procedure.
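A minimal sketch of such a wrapper (the procedure and table names are invented; the essential part is the SQL%ROWCOUNT check):

CREATE OR REPLACE PROCEDURE set_param (p_name IN VARCHAR2, p_value IN VARCHAR2) AS
BEGIN
    UPDATE app_settings
       SET value = p_value
     WHERE paramname = p_name;

    IF SQL%ROWCOUNT = 0 THEN
        RAISE_APPLICATION_ERROR(-20001, 'No such setting: ' || p_name);
    END IF;
END set_param;
/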
Not really a solution but a method to organize things a bit:
Create a separate table with the parameter definitions and link to that table from the parameter value table. Make the reference to the parameter definition required (nulls not allowed).
Definition table PARAMS (ID, NAME)
Actual settings table PARAM_VALUES (PARAM_ID, VALUE)
(changing your table structure is also a very effective way to evoke errors in scripts that have not been updated...)
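In Oracle DDL that could look roughly like this (types and sizes are just placeholders):

CREATE TABLE params (
    id   NUMBER        PRIMARY KEY,
    name VARCHAR2(64)  NOT NULL UNIQUE
);

CREATE TABLE param_values (
    param_id NUMBER NOT NULL REFERENCES params (id),
    value    VARCHAR2(4000)
);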
Maybe you can use the MERGE statement.
Here is a link for it:
http://www.oracle-developer.net/display.php?id=203
The MERGE statement allows you to combine insert and update in the same query, so if the desired row does not exist you can insert a record into a buffer table to indicate that the row does not exist, or else you can update the required record.
Hope it helps
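For example, a rough sketch of MERGE against a settings table like the one described above (the table name and the source query are assumed):

MERGE INTO app_settings s
USING (SELECT 'SOME_PARAM' AS param_name, 'new value' AS param_value FROM dual) src
    ON (s.paramname = src.param_name)
WHEN MATCHED THEN
    UPDATE SET s.value = src.param_value
WHEN NOT MATCHED THEN
    INSERT (paramname, value) VALUES (src.param_name, src.param_value);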
I am not sure if the transaction log is what I need or what.
My first problem is that I have a stored procedure that inserts some rows. I was just checking out elmah and I see that some SQL exceptions happen. They are all the same error (a PK constraint was violated).
Other than that, elmah is not telling me much more. So I don't know which row caused this primary key constraint violation (I am guessing the same row was for some reason being added twice).
So I am not sure if the transaction log would tell me what happened and what data was trying to be inserted. I can't recreate this error; it always works for me.
My second problem is that, for some reason, when my page loads up I have a row from the database that I don't think exists anymore (I have a hidden column with the PK in it). When I try to find this primary key, it does not exist in the database.
I am using ms sql 2005.
Thanks
I don't think transaction log will help you.
SQL has two modes for how to insert data when there is a uniqueness violation.
There is a setting: IGNORE_DUP_KEY. By default it is OFF. If you turn it ON, SQL will ignore duplicate rows and your INSERT statement will succeed.
You can read about it here:
http://msdn.microsoft.com/en-us/library/ms175132.aspx
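For example (the index and table names here are made up), the option is set on the unique index or constraint:

-- Duplicate rows are discarded with a warning instead of failing the whole INSERT:
CREATE UNIQUE INDEX UX_MyTable_Id
    ON dbo.MyTable (Id)
    WITH (IGNORE_DUP_KEY = ON);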
BTW, to view transaction log, you can use this command:
SELECT * FROM fn_dblog(null, null)
You can inspect the log with the (undocumented) function fn_dblog(), but it won't tell you anything in the case of a duplicate key violation, because the violation happens before the row is inserted, so no log record is generated. It is true, though, that you'll get the other operations around the time of the error, and from those you can possibly recreate the actions that led to the error condition. Note that if the database is in the SIMPLE recovery model then the log gets reused and you have likely lost track of anything that happened.
Have a look at the article How do checkpoints work and what gets logged for an example of fn_dblog() usage. Although it is on a different topic, it shows how the function works.
If you can repeat the error I would suggest using SQL Server profiler so you can see exactly what is going on.
If you are using asp.net to load the page are you using any output caching or data caching that might be retaining the row that no longer exists in the db?
I am trying to do a bulk upload into a SQL server DB. The source file has duplicates which I want to remove, so I was hoping that the operation would automatically upload the first one, then discard the rest. (I've set a unique key constraint). Problem is, the moment a duplicate upload is attempted the whole thing fails and gets rolled back. Is there any way I can just tell SQL to keep going?
Try to bulk insert the data into a temporary table and then SELECT DISTINCT as #madcolor suggested, or:
INSERT INTO yourTable
SELECT * FROM #tempTable tt
WHERE NOT EXISTS (SELECT 1 FROM yourTable yt WHERE yt.id = tt.id)
or use another field in the WHERE clause.
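The staging step itself might look something like this (the file path and column layout are just placeholders):

CREATE TABLE #tempTable (id INT, name VARCHAR(100));

BULK INSERT #tempTable
FROM 'C:\data\source.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');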
If you're doing this through a SQL tool like SQL Plus or DBVis or Toad, then I suspect not. If you're doing this programmatically in a language, then you need to divide and conquer. Presumably executing an update line by line and catching each exception would be too lengthy a process, so instead you could do a batch operation first on the whole SQL block, and if it fails, do it on the first half, and if that fails, do it on the first half of the first half. Iterate this way until you have a block that succeeds. Discard the block and do the same procedure on the rest of the SQL. Anything that violates a constraint will eventually end up as a sole SQL statement, which you know to log and discard. This should import with as much bulk processing as possible while still throwing out the invalid lines.
Use SSIS for this. You can tell it to skip the duplicates. But first make sure they are true duplicates: what if the data in some of the columns is different? How do you know which is the better record to keep?