How to merge rows + retrieve new and existing keys - sql

In an Oracle table (e.g. MYTABLE, with a numeric sequenced field as primary key), I have to insert several thousand rows, but some of them are supposed to already exist in the table.
Naturally, I should try to use MERGE, but I also need to retrieve all created (when inserting) and existing (when updating) primary keys.
It should also be as fast as possible.
Is the following attempt (pseudo code) the only way to go? Thanks.
keys_list = empty array
for each row to merge
do query 'SELECT PK_MYTABLE FROM MYTABLE WHERE PK_MYTABLE = '+row.pk_mytable
==> retrieve key
if found then:
add key to keys_list
else:
do query 'INSERT INTO MYTABLE (PK_MYTABLE, ...) VALUES (SEQ_MYTABLE.NEXTVAL, ...)'
do query 'SELECT SEQ_MYTABLE.CURRVAL FROM DUAL' ==> retrieve key
add key to keys_list

Add a MODIFICATION_DATE column to the table.
Grab and save the sysdate before you start.
When you merge, update/insert that saved value into MODIFICATION_DATE as well.
When the merge is complete, select the rows where MODIFICATION_DATE equals the saved value (not the current SYSDATE, which will have moved on) and you have the set you are interested in.
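The approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the source table staging_rows, the matching column business_key and the column value_field are hypothetical names, and MYTABLE is assumed to already have the MODIFICATION_DATE column. Because DATE only has one-second resolution, another batch running in the same second would be picked up too, so treat this as a starting point rather than a finished solution.

```sql
DECLARE
  -- Capture the timestamp once so every merged row carries the same value.
  v_batch_ts DATE := SYSDATE;
  TYPE t_keys IS TABLE OF mytable.pk_mytable%TYPE;
  v_keys t_keys;
BEGIN
  -- staging_rows and business_key are hypothetical; adapt to your schema.
  MERGE INTO mytable mt
  USING staging_rows st
  ON (mt.business_key = st.business_key)
  WHEN MATCHED THEN UPDATE
    SET mt.value_field       = st.value_field,
        mt.modification_date = v_batch_ts
  WHEN NOT MATCHED THEN INSERT
    (pk_mytable, business_key, value_field, modification_date)
  VALUES
    (seq_mytable.NEXTVAL, st.business_key, st.value_field, v_batch_ts);

  -- Both the inserted and the updated rows now carry v_batch_ts.
  SELECT pk_mytable
  BULK COLLECT INTO v_keys
  FROM mytable
  WHERE modification_date = v_batch_ts;

  DBMS_OUTPUT.put_line('Keys touched: ' || v_keys.COUNT);
END;
/
```

An index on MODIFICATION_DATE would probably help the final SELECT on a large table.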

Why can't you use a MERGE statement for this? This is exactly what a MERGE is for. Here is a rough idea of how it would look...
merge into mytable mt
using
(
select key_field, value_field from sourcetable
) st
on
( mt.key_field = st.key_field )
when matched then update
set mt.value_field = st.value_field
when not matched then insert
( key_field, value_field )
values
( st.key_field, st.value_field )
;
Using a MERGE statement is fast because it is a single statement, and the Oracle optimizer can use indexes and choose a better execution plan than you get by iterating through a cursor in PL/SQL.

If the keys are being generated from a sequence, then the normal way to get the key generated by that insert is to use the returning clause:
declare
v_insert_seq integer;
begin
insert into t1 (pk, c1)
values (myseq.nextval, 'value') returning pk into v_insert_seq;
end;
/
However, as best as I can tell, the merge statement doesn't support that returning feature.
Depending on the source of your new rows, there are different ways you could do this. If you are inserting one row at a time, then the approach above will work pretty well.
To detect the duplicate records, just catch the exceptions when you are inserting (when dup_val_on_index) and then handle them with updates.
If your source of rows is another table, you probably want to look at bulk inserts, and allowing Oracle to return you an array of new PK values. I tried this, but couldn't get it working, so perhaps it's not supported (or I'm missing something today - it gives a syntax error):
declare
type t_type is table of t1.pk%type;
v_insert_seqs t_type;
begin
insert into t1 (pk, c1)
select level newpk, 'value' c1value
from dual
connect by level <= 10 returning pk bulk collect into v_insert_seqs;
exception
when dup_val_on_index then
raise;
end;
/
The next best thing is to select the rows into arrays and then use bulk binds with the returning clause to capture the new PK IDs, and also use SAVE EXCEPTIONS to catch all the rows that failed to insert. Then you can process any of the failed inserts afterwards:
set serveroutput on
declare
type t_pk is table of t1.pk%type;
type t_c1 is table of t1.c1%type;
v_pks t_pk;
v_c1s t_c1;
v_new_pks t_pk;
ex_dml_errors EXCEPTION;
PRAGMA EXCEPTION_INIT(ex_dml_errors, -24381);
begin
-- get the batch of rows you want to insert
select level newpk, 'value' c1
bulk collect into v_pks, v_c1s
from dual connect by level <= 10;
-- bulk bind insert, saving exceptions and capturing the newly inserted
-- records
forall i in v_pks.first .. v_pks.last save exceptions
insert into t1 (pk, c1)
values (v_pks(i), v_c1s(i)) returning pk bulk collect into v_new_pks;
exception
-- Process the exceptions
when ex_dml_errors then
for i in 1..SQL%BULK_EXCEPTIONS.count loop
DBMS_OUTPUT.put_line('Error: ' || i ||
' Array Index: ' || SQL%BULK_EXCEPTIONS(i).error_index ||
' Message: ' || SQLERRM(-SQL%BULK_EXCEPTIONS(i).ERROR_CODE));
end loop;
end;
/

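If you are running Oracle 10g or later, you might be able to do much the same thing nearly for free: issue a commit before the merge to update the SCN, then, after the merge, use the ORA_ROWSCN pseudocolumn to detect which rows have changed. Note that ORA_ROWSCN is tracked per block by default, so unless the table was created with ROWDEPENDENCIES it can also flag unchanged rows that share a block with a changed one.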

Related

Solution to Oracle mutating trigger

I am stuck on a small requirement.
My table should reject any overlapping data that is inserted or updated.
Below is my try so far:
CREATE TABLE my_table (
ID NUMBER,
startdate DATE,
enddate DATE,
CONSTRAINT my_table_pk PRIMARY KEY ( ID,startdate,enddate )
);
/
CREATE OR REPLACE TRIGGER trg_my_table_biu
BEFORE INSERT OR UPDATE
ON my_table
FOR EACH ROW
DECLARE
v_count NUMBER;
BEGIN
SELECT COUNT(*)
INTO v_count
FROM my_table
WHERE id = :new.id
AND startdate <= :new.enddate
AND enddate >= :new.startdate;
IF v_count >= 1 THEN
raise_application_error( -20001, 'Cannot make the data overlapped.!' );
END IF;
END;
/
--existing data - good data - Result: Success
INSERT INTO my_table VALUES (1, to_date('01/02/2018','dd/mm/yyyy '),to_date('01/03/2018','dd/mm/yyyy '));
--1 good data - Result: Success
INSERT INTO my_table VALUES (1, to_date('01/01/2018','dd/mm/yyyy '),to_date('15/01/2018','dd/mm/yyyy '));
--2 good data - Result: Success
INSERT INTO my_table VALUES (1, to_date('02/03/2018','dd/mm/yyyy '),to_date('31/03/2018','dd/mm/yyyy '));
--3 bad data - Result: Success
INSERT INTO MY_TABLE VALUES (1, TO_DATE('01/01/2018','dd/mm/yyyy '),TO_DATE('01/04/2018','dd/mm/yyyy '));
--4 bad data - Result: Success
INSERT INTO my_table VALUES (1, to_date('15/01/2018','dd/mm/yyyy '),to_date('02/02/2018','dd/mm/yyyy '));
--5 bad data - Result: Success
INSERT INTO my_table VALUES (1, to_date('16/02/2018','dd/mm/yyyy '),to_date('15/03/2018','dd/mm/yyyy '));
--6 bad data - Result: Success
INSERT INTO my_table VALUES (1, to_date('15/02/2018','dd/mm/yyyy '),to_date('20/02/2018','dd/mm/yyyy '));
--7 good data - Result: Fail
UPDATE my_table
SET enddate = TO_DATE('31/03/2018','dd/mm/yyyy') + 1
WHERE startdate = TO_DATE('02/03/2018','dd/mm/yyyy');
For the 7th statement, i.e. the UPDATE, I am facing a mutating table error.
Please help me here.
Thanks in advance.
As @mic.sca's answer says, triggers are a poor/tricky way to implement rules like this. What you really want is a constraint that can work at table level rather than row level. ANSI SQL would call this an "assertion", but no DBMS vendor has implemented this to date (though it seems that Oracle is seriously considering doing so in a future release).
However, there is a way to simulate such a constraint/assertion using materialized views. I blogged about this way back in 2004 - your requirement is very like my example 2 there. Modified for your table this would be:
create materialized view my_table_mv1
refresh complete on commit as
select 1 dummy
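from my_table t1, my_table t2
where t1.id = t2.id
and t1.startdate <= t2.enddate
and t1.enddate >= t2.startdate
-- exclude each row's match with itself; with the (id, startdate, enddate)
-- primary key, two distinct rows for one id differ in at least one date
and (t1.startdate <> t2.startdate or t1.enddate <> t2.enddate);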
alter table my_table_mv1
add constraint my_table_mv1_chk
check (1=0) deferrable;
This materialized view only contains instances of overlaps, so should always be empty. As soon as an overlap is created, a row is inserted into the materialized view - but immediately violates its check constraint, which can never be satisfied!
Note that this is a deferred constraint, i.e. it will not be checked until commit time.
By the way, I don't know why I didn't use ANSI join syntax back in 2004 - maybe I just wasn't using it then. However, there are cases (I think more with outer joins) where materialized views can't be created using ANSI syntax but can be with the equivalent old-style syntax!
The mutating table error occurs because the row-level trigger queries my_table while that same table is being modified by the update.
My advice would be not to use a trigger and instead do all the inserts and updates through stored procedures that check that the dates do not overlap before doing the operation.
To prevent concurrent operations on the same id, you also need a mechanism to serialize the concurrent sessions operating on the data. You might have a separate parent table with your ids, and every operation on a specific id should do a SELECT ... FOR UPDATE on that id in the parent table before running inserts or updates on my_table.
Triggers might look cool but can create maintenance headaches in the long run, as they are not that explicit and they apply to all operations on a table (http://www.oracle.com/technetwork/testcontent/o58asktom-101055.html).
By the way, if two users concurrently update two rows with the same id, with your trigger you could end up with overlapping values without the trigger raising any error (though it is very unlikely).
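A minimal sketch of the stored-procedure approach might look like this. The parent table my_table_ids (one row per id) and the procedure name add_period are assumptions made up for the illustration; only my_table and its columns come from the question.

```sql
CREATE OR REPLACE PROCEDURE add_period (
  p_id    IN my_table.id%TYPE,
  p_start IN my_table.startdate%TYPE,
  p_end   IN my_table.enddate%TYPE
) AS
  v_locked_id my_table_ids.id%TYPE;  -- my_table_ids is a hypothetical parent table
  v_count     NUMBER;
BEGIN
  -- Serialize sessions working on the same id by locking its parent row.
  -- Raises NO_DATA_FOUND if the id has no parent row yet.
  SELECT id
  INTO v_locked_id
  FROM my_table_ids
  WHERE id = p_id
  FOR UPDATE;

  -- Same overlap test as the trigger, but run outside any row-level trigger,
  -- so it cannot hit the mutating table error.
  SELECT COUNT(*)
  INTO v_count
  FROM my_table
  WHERE id = p_id
  AND startdate <= p_end
  AND enddate >= p_start;

  IF v_count > 0 THEN
    raise_application_error(-20001, 'Cannot make the data overlapped.');
  END IF;

  INSERT INTO my_table (id, startdate, enddate)
  VALUES (p_id, p_start, p_end);
END add_period;
/
```

The lock is held until the caller commits or rolls back, so a second session calling add_period for the same id waits until the first one has finished. An update procedure would follow the same pattern, additionally excluding the row being changed from the overlap count.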

Insert row into SQL table based on existing row

I need to add a new row to a table in Oracle. The problem is that the table has 50 columns and I really don't want to write them all out for an INSERT statement. I tried to do a SELECT INTO statement to duplicate the row and then change the fields I care about individually, but this results in a UNIQUE violation on the primary key. So what I really want to do is declare a variable that holds one row without naming all the columns, change the primary key field, and then insert that variable. How can I do this?
You can use %ROWTYPE in an anonymous PL/SQL block to declare a record representing a row from a table and then select a row into that record and change the primary key and insert the updated record. You can even re-use it for multiple inserts:
DECLARE
rec SOME_TABLE%ROWTYPE;
BEGIN
SELECT *
INTO rec
FROM SOME_TABLE
WHERE A = 1; -- Primary Key
rec.A := 2; -- Change the primary key value.
INSERT INTO SOME_TABLE VALUES rec;
rec.A := 3; -- Change the primary key again.
INSERT INTO SOME_TABLE VALUES rec;
FOR i IN 4 .. 9 LOOP
rec.A := i; -- Change it repeatedly...
INSERT INTO SOME_TABLE VALUES rec;
END LOOP;
FOR i IN 1 .. 3 LOOP
rec.A := SOME_SEQUENCE.NEXTVAL; -- Or you can manage the primary key's value using a sequence.
INSERT INTO SOME_TABLE VALUES rec;
END LOOP;
END;
/
I have often wanted to do something similar to this, but it's just not possible in any SQL variant I know of. You cannot ask for only some of the columns in a table without explicitly naming them (or perhaps defining a view on them in advance).
The only shortcut I can suggest is to dump the list of column names into a convenient location and then just copy it into an insert statement, changing only the value you need:
insert into foo (select 'newC1' as c1, c2, c3, c4, ..., c50 from foo where bar='baz');
::edit:: In fact, I do this so often that I wrote a Python script to help me. I tell it what table I'm editing, some where clause that matches exactly 1 row, the list of column(s) I want to change, and the list of new value(s) I want in those columns. Then it does the rest.
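In Oracle, one way to dump the column list is to query the data dictionary. The sketch below assumes the table is called FOO, lives in your own schema, and that you are on 11g Release 2 or later (for LISTAGG):

```sql
-- Builds a comma-separated list of FOO's columns, in table column order,
-- ready to paste into an INSERT ... SELECT statement.
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id) AS column_list
FROM   user_tab_columns
WHERE  table_name = 'FOO';
```

Dictionary views store unquoted table names in upper case, hence 'FOO' rather than 'foo'. For a table in another schema, ALL_TAB_COLUMNS with an additional OWNER filter would be the equivalent.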

PLSQL Insert into with subquery and returning clause

I can't figure out the correct syntax for the following pseudo-sql:
INSERT INTO some_table
(column1,
column2)
SELECT col1_value,
col2_value
FROM other_table
WHERE ...
RETURNING id
INTO local_var;
I would like to insert something with the values of a subquery.
After inserting I need the new generated id.
Here's what the Oracle docs say:
Insert Statement
Returning Into
OK, I think it is only possible with the VALUES clause...
Is there an alternative?
You cannot use RETURNING ... BULK COLLECT with an INSERT ... SELECT.
This methodology can work with updates and deletes, however:
create table test2(aa number)
/
insert into test2(aa)
select level
from dual
connect by level<100
/
set serveroutput on
declare
TYPE t_Numbers IS TABLE OF test2.aa%TYPE
INDEX BY BINARY_INTEGER;
v_Numbers t_Numbers;
v_count number;
begin
update test2
set aa = aa+1
returning aa bulk collect into v_Numbers;
for v_count in 1..v_Numbers.count loop
dbms_output.put_line('v_Numbers := ' || v_Numbers(v_count));
end loop;
end;
/
You can get it to work with a few extra steps (doing a FORALL INSERT utilizing TREAT), as described in this article:
returning with insert..select
To utilize the example they create and apply it to the test2 test table:
CREATE or replace TYPE ot AS OBJECT
( aa number);
/
CREATE TYPE ntt AS TABLE OF ot;
/
set serveroutput on
DECLARE
nt_passed_in ntt;
nt_to_return ntt;
FUNCTION pretend_parameter RETURN ntt IS
nt ntt;
BEGIN
SELECT ot(level) BULK COLLECT INTO nt
FROM dual
CONNECT BY level <= 5;
RETURN nt;
END pretend_parameter;
BEGIN
nt_passed_in := pretend_parameter();
FORALL i IN 1 .. nt_passed_in.COUNT
INSERT INTO test2(aa)
VALUES
( TREAT(nt_passed_in(i) AS ot).aa
)
RETURNING ot(aa)
BULK COLLECT INTO nt_to_return;
FOR i IN 1 .. nt_to_return.COUNT LOOP
DBMS_OUTPUT.PUT_LINE(
'Sequence value = [' || TO_CHAR(nt_to_return(i).aa) || ']'
);
END LOOP;
END;
/
Unfortunately that's not possible. RETURNING is only available for INSERT...VALUES statements. See this Oracle forum thread for a discussion of this subject.
You can't, BUT at least in Oracle 19c, you can specify a SELECT subquery inside the VALUES clause and so use RETURNING! This can be a good workaround, even if you may have to repeat the WHERE clause for every field:
INSERT INTO some_table
(column1,
column2)
VALUES((SELECT col1_value FROM other_table WHERE ...),
(SELECT col2_value FROM other_table WHERE ...))
RETURNING id
INTO local_var;
Because the insert is based on a select, Oracle is assuming that you are permitting a multiple-row insert with that syntax. In that case, look at the multiple row version of the returning clause document as it demonstrates that you need to use BULK COLLECT to retrieve the value from all inserted rows into a collection of results.
After all, if your insert query creates two rows, which returned value would it put into a single variable?
EDIT - Turns out this doesn't work as I had thought.... darn it!
This isn't as easy as you may think, and certainly not as easy as it is using MySQL. Oracle doesn't keep track of the last inserts, in a way that you can ping back the result.
You will need to work out some other way of doing this, you can do it using ROWID - but this has its pitfalls.
This link discussed the issue: http://forums.oracle.com/forums/thread.jspa?threadID=352627

Efficient way to update all rows in a table

I have a table with a lot of records (could be more than 500 000 or 1 000 000). I added a new column in this table and I need to fill a value for every row in the column, using the corresponding row value of another column in this table.
I tried to use separate transactions for selecting every next chunk of 100 records and update the value for them, but still this takes hours to update all records in Oracle10 for example.
What is the most efficient way to do this in SQL, without using dialect-specific features, so that it works everywhere (Oracle, MSSQL, MySQL, PostgreSQL, etc.)?
ADDITIONAL INFO: There are no calculated fields. There are indexes. Used generated SQL statements which update the table row by row.
The usual way is to use UPDATE:
UPDATE mytable
SET new_column = <expr containing old_column>
You should be able to do this in a single transaction.
As Marcelo suggests:
UPDATE mytable
SET new_column = <expr containing old_column>;
If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:
UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;
Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
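The run, commit, repeat cycle can also be automated in a small PL/SQL block. This is only a sketch: it uses a plain copy of old_column as a stand-in for the real expression, and it adds an old_column IS NOT NULL guard (an assumption, not part of the answer above) so that rows whose new value would stay NULL cannot keep the loop running forever.

```sql
BEGIN
  LOOP
    -- Update at most 100000 still-unfilled rows per pass.
    UPDATE mytable
    SET new_column = old_column      -- stand-in for <expr containing old_column>
    WHERE new_column IS NULL
    AND old_column IS NOT NULL       -- guard: the new value must never be NULL
    AND ROWNUM <= 100000;

    -- Stop once a pass finds nothing left to update.
    EXIT WHEN SQL%ROWCOUNT = 0;

    -- Commit each batch so undo is released between passes.
    COMMIT;
  END LOOP;
  COMMIT;
END;
/
```

The batch size of 100000 is taken from the statement above and can be tuned; smaller batches commit more often at the cost of more passes over the table.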
EDIT:
A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE API.
Sample code (from Oracle docs):
DECLARE
l_sql_stmt VARCHAR2(1000);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');
-- Chunk the table by ROWID
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);
-- Execute the DML in parallel
l_sql_stmt := 'update EMPLOYEES e
SET e.salary = e.salary + 10
WHERE rowid BETWEEN :start_id AND :end_id';
DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
parallel_level => 10);
-- If there is an error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
END LOOP;
-- Done with processing; drop the task
DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/
Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333
You could drop any indexes on the table, then do your insert, and then recreate the indexes.
This might not work for you, but it is a technique I've used a couple of times in the past in similar circumstances.
Create updated_{table_name}, then select-insert into this table in batches. Once finished (and this hinges on Oracle, which I don't know or use, supporting the ability to rename tables atomically), updated_{table_name} becomes {table_name} while {table_name} becomes original_{table_name}.
Last time I had to do this was for a heavily indexed table with several million rows that absolutely positively could not be locked for the duration needed to make some serious changes to it.
What is the database version? Check out virtual columns in 11g:
Adding Columns with a Default Value
http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html
update Hotels set Discount=30 where Hotelid >= 1 and Hotelid <= 5504
For Postgresql I do something like this (if we are sure no more updates/inserts take place):
create table new_table as table orig_table with data;
update new_table set column = <expr>;
start transaction;
drop table orig_table;
alter table new_table rename to orig_table;
commit;
commit;
Update:
One improvement of this approach is that, if your table is very large, you do not lock the original table during the update, which could otherwise take minutes.
Again, this is only safe if you are sure that no inserts and/or updates take place during the process.

Oracle: how to UPSERT (update or insert into a table?)

The UPSERT operation either updates or inserts a row in a table, depending on whether the table already has a row that matches the data:
if table t has a row that has key X:
update t set mystuff... where mykey=X
else
insert into t mystuff...
Since Oracle doesn't have a specific UPSERT statement, what's the best way to do this?
The MERGE statement merges data between two tables. Using DUAL allows us to use this command. Note that this is not protected against concurrent access.
create or replace
procedure ups(xa number)
as
begin
merge into mergetest m using dual on (a = xa)
when not matched then insert (a,b) values (xa,1)
when matched then update set b = b+1;
end ups;
/
drop table mergetest;
create table mergetest(a number, b number);
call ups(10);
call ups(10);
call ups(20);
select * from mergetest;
A                      B
---------------------- ----------------------
10                     2
20                     1
The DUAL example above, which is in PL/SQL, was great because I wanted to do something similar, but I wanted it client side, so here is the SQL I used to send a similar statement directly from some C#:
MERGE INTO Employee USING dual ON ( "id"=2097153 )
WHEN MATCHED THEN UPDATE SET "last"='smith' , "name"='john'
WHEN NOT MATCHED THEN INSERT ("id","last","name")
VALUES ( 2097153,'smith', 'john' )
However, from a C# perspective this proved to be slower than doing the update, checking whether the number of rows affected was 0, and doing the insert if it was.
An alternative to MERGE (the "old fashioned way"):
begin
insert into t (mykey, mystuff)
values ('X', 123);
exception
when dup_val_on_index then
update t
set mystuff = 123
where mykey = 'X';
end;
Another alternative without the exception check:
UPDATE tablename
SET val1 = in_val1,
val2 = in_val2
WHERE val3 = in_val3;
IF ( sql%rowcount = 0 )
THEN
INSERT INTO tablename
VALUES (in_val1, in_val2, in_val3);
END IF;
Insert if not exists, then update:
INSERT INTO mytable (id1, t1)
SELECT 11, 'x1' FROM DUAL
WHERE NOT EXISTS (SELECT id1 FROM mytable WHERE id1 = 11);
UPDATE mytable SET t1 = 'x1' WHERE id1 = 11;
None of the answers given so far is safe in the face of concurrent accesses, as pointed out in Tim Sylvester's comment, and will raise exceptions in case of races. To fix that, the insert/update combo must be wrapped in some kind of loop statement, so that in case of an exception the whole thing is retried.
As an example, here's how Grommit's code can be wrapped in a loop to make it safe when run concurrently:
PROCEDURE MyProc (
...
) IS
BEGIN
LOOP
BEGIN
MERGE INTO Employee USING dual ON ( "id"=2097153 )
WHEN MATCHED THEN UPDATE SET "last"='smith' , "name"='john'
WHEN NOT MATCHED THEN INSERT ("id","last","name")
VALUES ( 2097153,'smith', 'john' );
EXIT; -- success? -> exit loop
EXCEPTION
WHEN NO_DATA_FOUND THEN -- the entry was concurrently deleted
NULL; -- exception? -> no op, i.e. continue looping
WHEN DUP_VAL_ON_INDEX THEN -- an entry was concurrently inserted
NULL; -- exception? -> no op, i.e. continue looping
END;
END LOOP;
END;
N.B. In transaction mode SERIALIZABLE, which I don't recommend btw, you might run into "ORA-08177: can't serialize access for this transaction" exceptions instead.
I like Grommit's answer, except that it requires duplicating the values. I found a solution where each value appears only once: http://forums.devshed.com/showpost.php?p=1182653&postcount=2
MERGE INTO KBS.NUFUS_MUHTARLIK B
USING (
SELECT '028-01' CILT, '25' SAYFA, '6' KUTUK, '46603404838' MERNIS_NO
FROM DUAL
) E
ON (B.MERNIS_NO = E.MERNIS_NO)
WHEN MATCHED THEN
UPDATE SET B.CILT = E.CILT, B.SAYFA = E.SAYFA, B.KUTUK = E.KUTUK
WHEN NOT MATCHED THEN
INSERT ( CILT, SAYFA, KUTUK, MERNIS_NO)
VALUES (E.CILT, E.SAYFA, E.KUTUK, E.MERNIS_NO);
I've been using the first code sample for years. Notice notfound rather than count.
UPDATE tablename SET val1 = in_val1, val2 = in_val2
WHERE val3 = in_val3;
IF ( sql%notfound ) THEN
INSERT INTO tablename
VALUES (in_val1, in_val2, in_val3);
END IF;
The code below is the possibly new and improved code
MERGE INTO tablename USING dual ON ( val3 = in_val3 )
WHEN MATCHED THEN UPDATE SET val1 = in_val1, val2 = in_val2
WHEN NOT MATCHED THEN INSERT
VALUES (in_val1, in_val2, in_val3)
In the first example the update does an index lookup. It has to, in order to update the right row. Oracle opens an implicit cursor, and we use it to wrap a corresponding insert so we know that the insert will only happen when the key does not exist. But the insert is an independent command and it has to do a second lookup. I don't know the inner workings of the merge command but since the command is a single unit, Oracle could execute the correct insert or update with a single index lookup.
I think merge is better when you do have some processing to be done that means taking data from some tables and updating a table, possibly inserting or deleting rows. But for the single row case, you may consider the first case since the syntax is more common.
A note regarding the two solutions that suggest:
1) Insert, if exception then update,
or
2) Update, if sql%rowcount = 0 then insert
The question of whether to insert or update first is also application dependent. Are you expecting more inserts or more updates? The one that is most likely to succeed should go first.
If you pick the wrong one you will get a bunch of unnecessary index reads. Not a huge deal but still something to consider.
Try this,
insert into b_building_property (
select
'AREA_IN_COMMON_USE_DOUBLE','Area in Common Use','DOUBLE', null, 9000, 9
from dual
)
minus
(
select * from b_building_property where id = 9
)
;
From http://www.praetoriate.com/oracle_tips_upserts.htm:
"In Oracle9i, an UPSERT can accomplish this task in a single statement:"
INSERT
FIRST WHEN
credit_limit >=100000
THEN INTO
rich_customers
VALUES(cust_id,cust_credit_limit)
INTO customers
ELSE
INTO customers SELECT * FROM new_customers;