PLSQL - Fastest way to check for dups before insert

PLSQL - Fastest way to check for dups before insert - sql

(Using Oracle) I have a table with one column (myCol) which is the primary key.
When doing an insert, is it faster to check with a select statement before doing the insert or just write the insert and let error handling check it?
So would this be faster?
BEGIN
SELECT count(*) INTO v_count FROM myTbl WHERE myCol = v_newVal;
IF v_count = 0 THEN
INSERT INTO myTbl (myCol) VALUES (v_newVal);
END IF;
END;
or this?
BEGIN
INSERT INTO myTbl (myCol) VALUES (v_newVal);
EXCEPTION WHEN DUP_VAL_ON_INDEX THEN
null;
END;
Thanks!

In terms of performance, certainly the second option is better.
But, more importantly, the first option has a bug - it will fail in cases where more than one session tries to insert the same value at the same time - one of them will succeed, the other session will wait for the first to commit and then raise DUP_VAL_ON_INDEX (which you haven't handled).

I think you could try merge. Typically you should avoid count() from dup checking. There is also a hint you can use to ignore duplicates. This page gives a nice example of this.
http://guyharrison.squarespace.com/blog/2010/1/1/the-11gr2-ignore_row_on_dupkey_index-hint.html

Related

PL/SQL Oracle Stored Procedure loop structure

Just wondering if the way I put COMMIT in the code block is appropriate or not? Should I put them when it finished loop or after each insert statement or after the if else statement?
FOR VAL1 IN (SELECT A.* FROM TABLE_A A) LOOP
IF VAL1.QTY >= 0 THEN
INSERT INTO TEMP_TABLE VALUES('MORE OR EQUAL THAN 0');
COMMIT; /*<-- Should I put this here?*/
INSERT INTO AUDIT_TABLE VALUE('DATA INSERTED >= 0');
COMMIT; /*<-- Should I put this here too?*/
ELSE
INSERT INTO TEMP_TABLE VALUES ('0');
COMMIT; /*<-- Should I put this here too?*/
INSERT INTO AUDIT_TABLE('DATA INSERTED IS 0');
COMMIT; /*<-- Should I put this here too?*/
END IF;
/*Or put commit here?*/
END LOOP;
/*Or here??*/

Generally, committing in a loop is not a good idea, especially after every DML in that loop. Doing so you force oracle(LGWR) to write data in redo log files and may find yourself in a situation when other sessions hang because of log file sync wait event. Or facing ORA-1555 because undo segments will be cleared more often.
Divide your DMLs into logical units of work (transactions) and commit when that unit of work is done, not before and not too late or in a middle of a transaction. This will allow you to keep your database in a consistent state. If, for example, two insert statements form a one unit of work(one transaction), it makes sense to commit or rollback them altogether not separately.
So, generally, you should commit as less as possible. If you have to commit in a loop, introduce some threshold. For instance issue commit after, let say 150 rows:
declare
l_commit_rows number;
For i in (select * from some_table)
loop
l_commit_rows := l_commit_rows + 1;
insert into some_table(..) values(...);
if mode(l_commit_rows, 150) = 0
then
commit;
end if;
end loop;
-- commit the rest
commit;

It is rarely appropriate; say your insert into TEMP_TABLE succeeds but your insert into AUDIT_TABLE fails. You then don't know where you are at all. Additionally, commits will increase the amount of time it takes to perform an operation.
It would be more normal to do everything within a single transaction; that is remove the LOOP and perform your inserts in a single statement. This can be done by using a multi-table insert and would look something like this:
insert
when ( a.qty >= 0 ) then
into temp_table values ('MORE OR EQUAL THAN 0')
into audit_table values ('DATA INSERTED >= 0')
else
into temp_table values ('0')
into audit_table values ('DATA INSERTED IS 0')
select qty from table_a
A simple rule is to not commit in the middle of an action; you need to be able to tell exactly where you were if you have to restart an operation. This normally means, go back to the beginning but doesn't have to. For instance, if you were to place your COMMIT inside your loop but outside the IF statement then you know that that has completed. You'd have to write back somewhere to tell you that this operation has been completed though or use your SQL statement to determine whether you need to re-evaluate that row.

If you insert commit after each insert statement then the database will commit each row inserted. Same will happen if you insert commit after the IF statement ends. (So both will commit after each inserted row). If commit is given after loop then commit will happen after all rows are inserted.
Commit after the loop should work faster as it will commit bulk data but if your loop encounters any error (say after 50 rows are processed there is an error) then your 50 rows also won't be inserted.
So according to your requirement u can either use commit after if or after loop

Oracle DB insert and do nothing on duplicate key

I have to insert some data in oracle DB, without previously checking if it already exist.
Does exist any way, transiction on oracle to catch the exception inside the query and handle it to don't return any exception?
It would be perfect something in mysql's style like
insert .... on duplicate key a=a

You can use MERGE. The syntax is a bit different from a regular insert though;
MERGE INTO test USING (
SELECT 1 AS id, 'Test#1' AS value FROM DUAL -- your row to insert here
) t ON (test.id = t.id) -- duplicate check
WHEN NOT MATCHED THEN
INSERT (id, value) VALUES (t.id, t.value); -- insert if no duplicate
An SQLfiddle to test with.

If you can use PL/SQL, and you have a unique index on the columns where you don't want any duplicates, then you can catch the exception and ignore it:
begin
insert into your_table (your_col) values (your_value);
exception
when dup_val_on_index then null;
end;

Since 11g there is the ignore_row_on_dupkey_index hint that ignores Unique Constraint-exceptions and let the script go on with the next row if there is any, see Link. The exception is not logged. It needs two arguments, the table name and the index name.
INSERT /*+ ignore_row_on_dupkey_index(my_table, my_table_idx) */
INTO my_table(id,name,phone)
VALUES (24,'Joe','+49 19450704');

PLSQL Insert into with subquery and returning clause

I can't figure out the correct syntax for the following pseudo-sql:
INSERT INTO some_table
(column1,
column2)
SELECT col1_value,
col2_value
FROM other_table
WHERE ...
RETURNING id
INTO local_var;
I would like to insert something with the values of a subquery.
After inserting I need the new generated id.
Heres what oracle doc says:
Insert Statement
Returning Into
OK i think it is not possible only with the values clause...
Is there an alternative?

You cannot use the RETURNING BULK COLLECT from an INSERT.
This methodology can work with updates and deletes howeveer:
create table test2(aa number)
/
insert into test2(aa)
select level
from dual
connect by level<100
/
set serveroutput on
declare
TYPE t_Numbers IS TABLE OF test2.aa%TYPE
INDEX BY BINARY_INTEGER;
v_Numbers t_Numbers;
v_count number;
begin
update test2
set aa = aa+1
returning aa bulk collect into v_Numbers;
for v_count in 1..v_Numbers.count loop
dbms_output.put_line('v_Numbers := ' || v_Numbers(v_count));
end loop;
end;
You can get it to work with a few extra steps (doing a FORALL INSERT utilizing TREAT)
as described in this article:
returning with insert..select
T
to utilize the example they create and apply it to test2 test table
CREATE or replace TYPE ot AS OBJECT
( aa number);
/
CREATE TYPE ntt AS TABLE OF ot;
/
set serveroutput on
DECLARE
nt_passed_in ntt;
nt_to_return ntt;
FUNCTION pretend_parameter RETURN ntt IS
nt ntt;
BEGIN
SELECT ot(level) BULK COLLECT INTO nt
FROM dual
CONNECT BY level <= 5;
RETURN nt;
END pretend_parameter;
BEGIN
nt_passed_in := pretend_parameter();
FORALL i IN 1 .. nt_passed_in.COUNT
INSERT INTO test2(aa)
VALUES
( TREAT(nt_passed_in(i) AS ot).aa
)
RETURNING ot(aa)
BULK COLLECT INTO nt_to_return;
FOR i IN 1 .. nt_to_return.COUNT LOOP
DBMS_OUTPUT.PUT_LINE(
'Sequence value = [' || TO_CHAR(nt_to_return(i).aa) || ']'
);
END LOOP;
END;
/

Unfortunately that's not possible. RETURNING is only available for INSERT...VALUES statements. See this Oracle forum thread for a discussion of this subject.

You can't, BUT at least in Oracle 19c, you can specify a SELECT subquery inside the VALUES clause and so use RETURNING! This can be a good workaround, even if you may have to repeat the WHERE clause for every field:
INSERT INTO some_table
(column1,
column2)
VALUES((SELECT col1_value FROM other_table WHERE ...),
(SELECT col2_value FROM other_table WHERE ...))
RETURNING id
INTO local_var;

Because the insert is based on a select, Oracle is assuming that you are permitting a multiple-row insert with that syntax. In that case, look at the multiple row version of the returning clause document as it demonstrates that you need to use BULK COLLECT to retrieve the value from all inserted rows into a collection of results.
After all, if your insert query creates two rows - which returned value would it put into an single variable?
EDIT - Turns out this doesn't work as I had thought.... darn it!

This isn't as easy as you may think, and certainly not as easy as it is using MySQL. Oracle doesn't keep track of the last inserts, in a way that you can ping back the result.
You will need to work out some other way of doing this, you can do it using ROWID - but this has its pitfalls.
This link discussed the issue: http://forums.oracle.com/forums/thread.jspa?threadID=352627

Get count of records affected by INSERT or UPDATE in PostgreSQL

My database driver for PostgreSQL 8/9 does not return a count of records affected when executing INSERT or UPDATE.
PostgreSQL offers the non-standard syntax "RETURNING" which seems like a good workaround. But what might be the syntax? The example returns the ID of a record, but I need a count.
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
RETURNING did;

I know this question is oooolllllld and my solution is arguably overly complex, but that's my favorite kind of solution!
Anyway, I had to do the same thing and got it working like this:
-- Get count from INSERT
WITH rows AS (
INSERT INTO distributors
(did, dname)
VALUES
(DEFAULT, 'XYZ Widgets'),
(DEFAULT, 'ABC Widgets')
RETURNING 1
)
SELECT count(*) FROM rows;
-- Get count from UPDATE
WITH rows AS (
UPDATE distributors
SET dname = 'JKL Widgets'
WHERE did <= 10
RETURNING 1
)
SELECT count(*) FROM rows;
One of these days I really have to get around to writing a love sonnet to PostgreSQL's WITH clause ...

I agree w/ Milen, your driver should do this for you. What driver are you using and for what language? But if you are using plpgsql, you can use GET DIAGNOSTICS my_var = ROW_COUNT;
http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS

You can take ROW_COUNT after update or insert with this code:
insert into distributors (did, dname) values (DEFAULT, 'XYZ Widgets');
get diagnostics v_cnt = row_count;

It's not clear from your question how you're calling the statement. Assuming you're using something like JDBC you may be calling it as a query rather than an update. From JDBC's executeQuery:
Executes the given SQL statement, which returns a single ResultSet
object.
This is therefore appropriate when you execute a statement that returns some query results, such as SELECT or INSERT ... RETURNING. If you are making an update to the database and then want to know how many tuples were affected, you need to use executeUpdate which returns:
either (1) the row count for SQL Data Manipulation Language (DML)
statements or (2) 0 for SQL statements that return nothing

You could wrap your query in a transaction and it should show you the count before you ROLLBACK or COMMIT. Example:
BEGIN TRANSACTION;
INSERT .... ;
ROLLBACK TRANSACTION;
If you run the first 2 lines of the above, it should give you the count. You can then ROLLBACK (undo) the insert if you find that the number of affected lines isn't what you expected. If you're satisfied that the INSERT is correct, then you can run the same thing, but replace line 3 with COMMIT TRANSACTION;.
Important note: After you run any BEGIN TRANSACTION; you must either ROLLBACK; or COMMIT; the transaction, otherwise the transaction will create a lock that can slow down or even cripple an entire system, if you're running on a production environment.

How bad is ignoring Oracle DUP_VAL_ON_INDEX exception?

I have a table where I'm recording if a user has viewed an object at least once, hence:
HasViewed
ObjectID number (FK to Object table)
UserId number (FK to Users table)
Both fields are NOT NULL and together form the Primary Key.
My question is, since I don't care how many times someone has viewed an object (after the first), I have two options for handling inserts.
Do a SELECT count(*) ... and if no records are found, insert a new record.
Always just insert a record, and if it throws a DUP_VAL_ON_INDEX exceptions (indicating that there already was such a record), just ignore it.
What's the downside of choosing the second option?
UPDATE:
I guess the best way to put it is : "Is the overhead caused by the exception worse than the overhead caused by the initial select?"

I would normally just insert and trap the DUP_VAL_ON_INDEX exception, as this is the simplest to code. This is more efficient than checking for existence before inserting. I don't consider doing this a "bad smell" (horrible phrase!) because the exception we handle is raised by Oracle - it's not like raising your own exceptions as a flow-control mechanism.
Thanks to Igor's comment I have now run two different benchamrks on this: (1) where all insert attempts except the first are duplicates, (2) where all inserts are not duplicates. Reality will lie somewhere between the two cases.
Note: tests performed on Oracle 10.2.0.3.0.
Case 1: Mostly duplicates
It seems that the most efficient approach (by a significant factor) is to check for existence WHILE inserting:
prompt 1) Check DUP_VAL_ON_INDEX
begin
for i in 1..1000 loop
begin
insert into hasviewed values(7782,20);
exception
when dup_val_on_index then
null;
end;
end loop
rollback;
end;
/
prompt 2) Test if row exists before inserting
declare
dummy integer;
begin
for i in 1..1000 loop
select count(*) into dummy
from hasviewed
where objectid=7782 and userid=20;
if dummy = 0 then
insert into hasviewed values(7782,20);
end if;
end loop;
rollback;
end;
/
prompt 3) Test if row exists while inserting
begin
for i in 1..1000 loop
insert into hasviewed
select 7782,20 from dual
where not exists (select null
from hasviewed
where objectid=7782 and userid=20);
end loop;
rollback;
end;
/
Results (after running once to avoid parsing overheads):
1) Check DUP_VAL_ON_INDEX
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.54
2) Test if row exists before inserting
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.59
3) Test if row exists while inserting
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.20
Case 2: no duplicates
prompt 1) Check DUP_VAL_ON_INDEX
begin
for i in 1..1000 loop
begin
insert into hasviewed values(7782,i);
exception
when dup_val_on_index then
null;
end;
end loop
rollback;
end;
/
prompt 2) Test if row exists before inserting
declare
dummy integer;
begin
for i in 1..1000 loop
select count(*) into dummy
from hasviewed
where objectid=7782 and userid=i;
if dummy = 0 then
insert into hasviewed values(7782,i);
end if;
end loop;
rollback;
end;
/
prompt 3) Test if row exists while inserting
begin
for i in 1..1000 loop
insert into hasviewed
select 7782,i from dual
where not exists (select null
from hasviewed
where objectid=7782 and userid=i);
end loop;
rollback;
end;
/
Results:
1) Check DUP_VAL_ON_INDEX
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.15
2) Test if row exists before inserting
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.76
3) Test if row exists while inserting
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.71
In this case DUP_VAL_ON_INDEX wins by a mile. Note the "select before insert" is the slowest in both cases.
So it appears that you should choose option 1 or 3 according to the relative likelihood of inserts being or not being duplicates.

I don't think there is a downside to your second option. I think it's a perfectly valid use of the named exception, plus it avoids the lookup overhead.

Try this?
SELECT 1
FROM TABLE
WHERE OBJECTID = 'PRON_172.JPG' AND
USERID='JCURRAN'
It should return 1, if there is one there, otherwise NULL.
In your case, it looks safe to ignore, but for performance, one should avoid exceptions on the common path. A question to ask, "How common will the exceptions be?"
Few enough to ignore? or so many another method should be used?

IMHO it is best to go with Option 2: Other than what is already been said, you should consider thread safety. If you go with option 1 and If multiple threads are executing your PL/SQL block then its possible that two or more threads fire select at the same time and at that time there is no record, this will end up leading all threads to insert and you will get unique constraint error.

Usually, exception handling is slower; however if it would happen only seldom, then you would avoid the overhead of the query.
I think it mainly depends on the frequency of the exception, but if performance is important, I would suggest some benchmarking with both approaches.
Generally speaking, treating common events as exception is a bad smell; for this reason you could see also from another point of view.
If it is an exception, then it should be treated as an exception - and your approach is correct.
If it is a common event, then you should try to explicitly handle it - and then checking if the record is already inserted.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

PLSQL - Fastest way to check for dups before insert - sql

I think you could try merge. Typically you should avoid count() from dup checking. There is also a hint you can use to ignore duplicates. This page gives a nice example of this. http://guyharrison.squarespace.com/blog/2010/1/1/the-11gr2-ignore_row_on_dupkey_index-hint.html

Related

PL/SQL Oracle Stored Procedure loop structure

Oracle DB insert and do nothing on duplicate key

PLSQL Insert into with subquery and returning clause

Get count of records affected by INSERT or UPDATE in PostgreSQL

How bad is ignoring Oracle DUP_VAL_ON_INDEX exception?

Categories

Resources