Efficient way to update all rows in a table - sql

I have a table with a lot of records (could be more than 500 000 or 1 000 000). I added a new column to this table, and I need to populate it for every row, using the corresponding value of another column in the same table.
I tried using separate transactions to select each successive chunk of 100 records and update them, but this still takes hours to update all records (on Oracle 10g, for example).
What is the most efficient way to do this in SQL without using dialect-specific features, so that it works everywhere (Oracle, MSSQL, MySQL, PostgreSQL, etc.)?
ADDITIONAL INFO: There are no calculated fields. There are indexes. The updates were done with generated SQL statements that update the table row by row.

The usual way is to use UPDATE:
UPDATE mytable
SET new_column = <expr containing old_column>
You should be able to do this in a single transaction.

As Marcelo suggests:
UPDATE mytable
SET new_column = <expr containing old_column>;
If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:
UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;
Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
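If you want to script the rinse-and-repeat, a minimal PL/SQL sketch of the same idea could look like the block below; new_column and mytable are the placeholder names used above, and old_column stands in for whatever expression actually computes the new value.
DECLARE
  l_rows_updated PLS_INTEGER;
BEGIN
  LOOP
    -- update the next batch of rows that still have no value
    UPDATE mytable
    SET new_column = old_column        -- stand-in for <expr containing old_column>
    WHERE new_column IS NULL
    AND ROWNUM <= 100000;

    l_rows_updated := SQL%ROWCOUNT;
    COMMIT;
    EXIT WHEN l_rows_updated = 0;
  END LOOP;
END;
/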
EDIT:
A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE API.
Sample code (from Oracle docs):
DECLARE
  l_sql_stmt VARCHAR2(1000);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  -- Create the TASK
  DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');

  -- Chunk the table by ROWID
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);

  -- Execute the DML in parallel
  l_sql_stmt := 'update EMPLOYEES e
                 SET e.salary = e.salary + 10
                 WHERE rowid BETWEEN :start_id AND :end_id';
  DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, RESUME it for at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  END LOOP;

  -- Done with processing; drop the task
  DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/
Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333

You could drop any indexes on the table, then do your update, and then recreate the indexes.
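A rough sketch of that order of operations; the index name and its definition are invented for illustration, and old_column stands in for the real expression.
-- capture the existing index DDL first, e.g. with DBMS_METADATA.GET_DDL
DROP INDEX mytable_old_column_idx;

UPDATE mytable
SET new_column = old_column;           -- stand-in for <expr containing old_column>
COMMIT;

-- recreate the index once the mass update is done
CREATE INDEX mytable_old_column_idx ON mytable (old_column);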

This might not work for you, but it's a technique I've used a couple of times in the past in similar circumstances.
Create updated_{table_name}, then insert-select into it in batches. Once that's finished, and this hinges on Oracle (which I don't know or use) supporting the ability to rename tables atomically, updated_{table_name} becomes {table_name} while {table_name} becomes original_{table_name}.
The last time I had to do this was for a heavily indexed table with several million rows that absolutely, positively could not be locked for the duration needed to make some serious changes to it.
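In Oracle, the swap described above might look roughly like the sketch below; all names are placeholders, a single CTAS stands in for the batched insert-select, and grants, indexes, constraints and triggers have to be recreated by hand because they do not follow the rename.
-- build the replacement table with the new column already populated
CREATE TABLE updated_mytable AS
SELECT t.*, t.old_column AS new_column   -- old_column stands in for the real expression
FROM   mytable t;

-- recreate indexes, constraints and grants on updated_mytable here, then swap:
ALTER TABLE mytable         RENAME TO original_mytable;
ALTER TABLE updated_mytable RENAME TO mytable;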

What is the database version? Check out virtual columns in 11g:
Adding Columns with a Default Value
http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html
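If the new value is a constant, or can be derived from another column, 11g can avoid rewriting the rows at all. A sketch of both variants follows; the column names and types are made up.
-- adding a NOT NULL column with a DEFAULT is a metadata-only change in 11g
ALTER TABLE mytable ADD (new_column VARCHAR2(30) DEFAULT 'N' NOT NULL);

-- or a virtual column computed from an existing column, with no storage at all
ALTER TABLE mytable ADD (new_column2 GENERATED ALWAYS AS (UPPER(old_column)) VIRTUAL);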

update Hotels set Discount=30 where Hotelid >= 1 and Hotelid <= 5504

For PostgreSQL I do something like this (assuming we are sure no more updates/inserts will take place):
create table new_table as table orig_table with data;
update new_table set column = <expr>;
start transaction;
drop table orig_table;
alter table new_table rename to orig_table;
commit;
Update:
One advantage of this approach is that you do not lock the table during the long-running update, which on a very large table could otherwise take minutes.
It only works if you are sure that no inserts and/or updates take place on the original table while this is running.


Statistics rapidly changing within transaction - fixing execution plan

A problem I'm facing (Oracle 11g):
I create a table, let's call it table_xyz, with index (not unique, no primary key).
I create a package with a procedure that will insert, say, 10 million records monthly. It is not a simple "insert into"; it is thousands of lines of procedural code, and some of it also selects data from table_xyz to work out what data to insert next.
For example, somewhere within the procedure there is a query against table_xyz, like the cursor in the code attached below.
Now, there is a problem:
When the procedure is run for the first time, all queries on table_xyz will have execution plan based on the moment, when there were 0 records in table_xyz.
So, all queries will effectively full scan table_xyz, instead of starting to use indexes at some point.
This leads to terrible performance and in my case actually, the first run will never finish...
Now, there are three approaches I thought of:
1. At some point within the transaction, recalculate statistics. For example, run "analyze table / analyze index compute statistics" after the count of records in table_xyz reaches a power of 10.
This is not possible, however, since ANALYZE commits the transaction, and I cannot allow that.
2. At some point within the transaction, recalculate statistics as above, but do it in an autonomous transaction. This does not work either, since the new statistics are not visible to the main transaction (I tested that).
3. Just hint all the cursors that use table_xyz with USE_INDEX. This gets the job done, but it is ugly and generally frowned upon in the codebase.
Are there any other ways?
I attach some code. It is just an example, please do not try to optimize it by removing the procedure and so on.
create table table_xyz (idx        number(10) /*+ Specifically this is NOT a primary key */
                       ,some_value varchar2(10)
                       );
create index table_xyz_idx on table_xyz (idx);

declare
  cursor idxes is
    select level idx
    from dual d
    connect by level < 100000;

  current_val varchar2(10);

  function calculate_some_value(p_idx number) return varchar2
  is
    cursor c_previous is
      select t.some_value
      from table_xyz t
      where t.idx in (round(p_idx / 2, 0), round(p_idx / 3, 0), round(p_idx / 5, 0))
      order by t.idx desc;
    x varchar2(100);
  begin
    open c_previous;
    fetch c_previous into x;
    close c_previous;
    x := nvl(x, 'XYZ');
    if mod(p_idx, 2) = 0 then
      x := x || '2';
    elsif mod(p_idx, 3) = 0 then
      x := '3' || x;
    elsif mod(p_idx, 5) = 0 then
      x := substr(x, 1, 1) || '5' || substr(x, 2, 2 + mod(p_idx, 7));
    end if;
    x := substr(x, 1, 10);
    return x;
  end calculate_some_value;
begin
  for idx in idxes
  loop
    current_val := calculate_some_value(idx.idx);
    insert into table_xyz(idx, some_value) values (idx.idx, current_val);
  end loop;
end;
Consider taking a look at the DBMS_STATS package.
Option A: use the DBMS_STATS procedures for manually setting table, column, and index statistics (i.e., SET_TABLE_STATS, SET_COLUMN_STATS, and SET_INDEX_STATS, respectively). Then use DBMS_STATS.LOCK_TABLE_STATS to keep your manually set statistics from being overwritten (e.g., by a DBA gathering schema statistics while your table happens to be empty). A sketch of this follows after the list below.
Option B: run your procedure as is and then, afterwards, manually gather stats on the table. Then, as above, use DBMS_STATS.LOCK_TABLE_STATS to keep them from being overwritten.
Either way, the idea is to set or gather statistics on your table and then lock them in place.
If you want to get fancier, maybe you could automate this. E.g.,
At install time, manually set the statistics and lock them for your first run
In your procedure code, at the end, unlock the statistics, gather them, and re-lock them.
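A minimal sketch of Option A, assuming the schema is called MY_SCHEMA and using made-up row counts that you would replace with your expected monthly volumes:
BEGIN
  -- pretend table_xyz already holds roughly a month's worth of data
  DBMS_STATS.SET_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'TABLE_XYZ',
                             numrows => 10000000, numblks => 100000);
  DBMS_STATS.SET_INDEX_STATS(ownname => 'MY_SCHEMA', indname => 'TABLE_XYZ_IDX',
                             numrows => 10000000, numdist => 10000000);
  -- (SET_COLUMN_STATS calls would go here for the columns you filter on)

  -- stop scheduled statistics gathering from overwriting the manual values
  DBMS_STATS.LOCK_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'TABLE_XYZ');
END;
/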

Updating Millions of Records Oracle

I have created a query to update a column across 35 million records,
but unfortunately it took more than an hour to process.
Did I miss anything in the below query?
DECLARE
CURSOR exp_cur IS
SELECT
DECODE(
COLUMN_NAME,
NULL, NULL,
standard_hash(COLUMN_NAME)
) AS COLUMN_NAME
FROM TABLE1;
TYPE nt_fName IS TABLE OF VARCHAR2(100);
fname nt_fName;
BEGIN
OPEN exp_cur;
FETCH exp_cur BULK COLLECT INTO fname LIMIT 1000000;
CLOSE exp_cur;
--Print data
FOR idx IN 1 .. fname.COUNT
LOOP
UPDATE TABLE1 SET COLUMN_NAME=fname(idx);
commit;
DBMS_OUTPUT.PUT_LINE (idx||' '||fname(idx) );
END LOOP;
END;
The reason why bulk collect used with a forall construction is generally faster than the equivalent row-by-row loop is that it applies all the updates in one shot, instead of laboriously stepping through the rows one at a time and launching 35 million separate update statements, each one requiring the database to search for the individual row before updating it. But what you have written (even when the bugs are fixed) is still a row-by-row loop with 35 million search-and-update statements, plus the additional overhead of populating a 700 MB array in memory, 35 million commits, and 35 million dbms_output messages. It has to be slower because it has significantly more work to do than a plain update.
If it is practical to copy the data to a new table, an insert will be a lot faster than an update. At the end you can reapply any grants, indexes and constraints to the new table, rename both tables and drop the old one. You can also use insert /*+ parallel enable_parallel_dml */ (prior to Oracle 12c, you have to ALTER SESSION ENABLE PARALLEL DML separately). You could define the new table as NOLOGGING during the copy, but check with your DBA, as that can affect replication and backups, though that might not matter if this is a test system. This will all need careful scripting if it is going to form part of a routine workflow. A rough sketch follows.
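A hedged sketch of that copy-and-swap approach; table1_new and the column list are placeholders, and the hint and NOLOGGING choices should be checked with your DBA.
-- empty copy of the structure (names are placeholders)
CREATE TABLE table1_new NOLOGGING AS
SELECT * FROM table1 WHERE 1 = 0;

-- from 12c the hint enables parallel DML; before that, first run:
--   ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ parallel enable_parallel_dml */ INTO table1_new (column_name /*, other columns */)
SELECT DECODE(column_name, NULL, NULL, standard_hash(column_name)) /*, other columns */
FROM   table1;
COMMIT;

-- reapply grants, indexes and constraints to table1_new, then swap the names
ALTER TABLE table1     RENAME TO table1_old;
ALTER TABLE table1_new RENAME TO table1;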
Your code updates every record of TABLE1 on each pass through the loop, because the UPDATE has no WHERE clause. (Each iteration rewrites all 35 million rows; that's why it is taking so long.)
You can simply use a single update statement as follows:
UPDATE TABLE1 SET COLUMN_NAME = standard_hash(COLUMN_NAME)
WHERE COLUMN_NAME IS NOT NULL;
So, if you want to use BULK COLLECT and FORALL, then you can use them as follows:
DECLARE
CURSOR EXP_CUR IS
SELECT COLUMN_NAME FROM TABLE1
WHERE COLUMN_NAME IS NOT NULL;
TYPE NT_FNAME IS TABLE OF VARCHAR2(100);
FNAME NT_FNAME;
BEGIN
OPEN EXP_CUR;
FETCH EXP_CUR BULK COLLECT INTO FNAME LIMIT 1000000;
FORALL IDX IN FNAME.FIRST..FNAME.LAST
UPDATE TABLE1
SET COLUMN_NAME = STANDARD_HASH(COLUMN_NAME)
WHERE COLUMN_NAME = FNAME(IDX);
COMMIT;
CLOSE EXP_CUR;
END;
/

Tuning an SQL update query

I am using this update query, which takes about 8 hours to execute. I want it to take less time; how can I do that? My query is:
BEGIN
  LOOP
    UPDATE ENCORE_LIVE.INSTRUMENT_CLASSIFICATION
    SET CODE = NULL
    WHERE TYPE_ID = 15
    AND CODE IS NOT NULL
    AND ROWNUM < 50000;
    EXIT WHEN SQL%ROWCOUNT < 49999;
    COMMIT;
  END LOOP;
  COMMIT;
END;
So you are updating the table in batches of 50 000 records, right?
You should verify whether the TYPE_ID and CODE columns of the INSTRUMENT_CLASSIFICATION table are indexed and, if necessary, create an index.
Check also this answer.
Your UPDATE query is basic and cannot be optimized much further by rewriting it. To create an index you can use:
CREATE INDEX instrumentClasification_TypeCode_idx
ON ENCORE_LIVE.INSTRUMENT_CLASSIFICATION
(TYPE_ID, CODE)
You should also consider running the update as one statement rather than splitting it into batches.
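If the undo tablespace can cope with it, the whole job collapses to a single statement (same table and predicate as above):
UPDATE ENCORE_LIVE.INSTRUMENT_CLASSIFICATION
SET    CODE = NULL
WHERE  TYPE_ID = 15
AND    CODE IS NOT NULL;
COMMIT;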

Fast Update database with more than 10 million records

I am fairly new to SQL and was wondering if someone can help me.
I got a database that has around 10 million rows.
I need to make a script that finds the records that have some NULL fields and then updates them to a certain value.
The problem with doing a simple update statement is that it will blow out the rollback space.
I was reading around that I need to use BULK COLLECT AND FETCH.
My idea was to fetch 10,000 records at a time, update, commit, and continue fetching.
I tried looking for examples on Google but I have not found anything yet.
Any help?
Thanks!!
This is what I have so far:
DECLARE
CURSOR rec_cur IS
SELECT DATE_ORIGIN
FROM MAIN_TBL WHERE DATE_ORIGIN IS NULL;
TYPE date_tab_t IS TABLE OF DATE;
date_tab date_tab_t;
BEGIN
OPEN rec_cur;
LOOP
FETCH rec_cur BULK COLLECT INTO date_tab LIMIT 1000;
EXIT WHEN date_tab.COUNT() = 0;
FORALL i IN 1 .. date_tab.COUNT
UPDATE MAIN_TBL SET DATE_ORIGIN = '23-JAN-2012'
WHERE DATE_ORIGIN IS NULL;
END LOOP;
CLOSE rec_cur;
END;
I think I see what you're trying to do. There are a number of points I want to make about the differences between the code below and yours.
Your forall loop will not use an index. This is easy to get round by using rowid to update your table.
By committing after each forall you reduce the amount of undo needed, but you make it more difficult to roll back if something goes wrong. That said, logically your process could easily be restarted in the middle without detriment to your objective.
Rowids are small; collect at least 25k at a time, if not 100k.
You cannot index a null in Oracle. There are plenty of questions on Stack Overflow about this if you need more information. A functional index on something like nvl(date_origin, 'x'), as a loose example, would increase the speed at which you select data. It also means you never actually have to touch the table itself; you only select from the index.
Your date data-type seems to be a string. I've kept this but it's not wise.
If you can get someone to increase your undo tablespace size then a straight up update will be quicker.
Assuming as per your comments date_origin is a date then the index should be on something like:
nvl(date_origin,to_date('absolute_minimum_date_in_Oracle_as_a_string','yyyymmdd'))
I don't have access to a DB at the moment, but to find the absolute minimum date in Oracle as a string, run the following query:
select to_date('0001','yyyy') from dual;
It should raise a useful error for you.
Working example in PL/SQL Developer.
create table main_tbl as
select cast( null as date ) as date_origin
from all_objects
;
create index i_main_tbl
on main_tbl ( nvl( date_origin
, to_date('0001-01-01' ,'yyyy-mm-dd') )
)
;
declare
cursor c_rec is
select rowid
from main_tbl
where nvl(date_origin,to_date('0001-01-01','yyyy-mm-dd'))
= to_date('0001-01-01','yyyy-mm-dd')
;
type t__rec is table of rowid index by binary_integer;
t_rec t__rec;
begin
open c_rec;
loop
fetch c_rec bulk collect into t_rec limit 50000;
exit when t_rec.count = 0;
forall i in t_rec.first .. t_rec.last
update main_tbl
set date_origin = to_date('23-JAN-2012','DD-MON-YYYY')
where rowid = t_rec(i)
;
commit ;
end loop;
close c_rec;
end;
/

How to merge rows + retrieve new and existing keys

In an Oracle table (e.g. MYTABLE, with a numeric sequenced field as primary key), I have to insert several thousand of rows, but some of them are supposed to already exist in the table.
Naturally, I should try to use MERGE, but I also need to retrieve all the created (when inserting) and existing (when updating) primary keys.
As well, it should be as fast as possible.
Is the following attempt (pseudo code) the only way to go? Thanks.
keys_list = empty array
for each row to merge
do query 'SELECT PK_MYTABLE FROM MYTABLE WHERE PK_MYTABLE = '+row.pk_mytable
==> retrieve key
if found then:
add key to keys_list
else:
do query 'INSERT INTO MYTABLE (PK_MYTABLE, ...) VALUES (SEQ_MYTABLE.NEXTVAL, ...)'
do query 'SELECT SEQ_MYTABLE.CURRVAL FROM DUAL' ==> retrieve key
add key to keys_list
Add a MODIFICATION_DATE column to the table.
Grab and save the SYSDATE.
When you merge, set MODIFICATION_DATE to that saved value on both the update and the insert branches.
When the merge is complete, select the rows where MODIFICATION_DATE equals the saved value and you
have the set you are interested in.
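A rough sketch of that idea, adding the MODIFICATION_DATE column to placeholder names (mytable, sourcetable, key_field, value_field are illustrative, not from your schema):
DECLARE
  v_stamp DATE := SYSDATE;   -- grab and save the timestamp once
BEGIN
  MERGE INTO mytable mt
  USING (SELECT key_field, value_field FROM sourcetable) st
  ON (mt.key_field = st.key_field)
  WHEN MATCHED THEN UPDATE
    SET mt.value_field = st.value_field,
        mt.modification_date = v_stamp
  WHEN NOT MATCHED THEN INSERT (key_field, value_field, modification_date)
    VALUES (st.key_field, st.value_field, v_stamp);
  COMMIT;

  -- every key touched by this merge, whether inserted or updated
  FOR r IN (SELECT key_field FROM mytable WHERE modification_date = v_stamp) LOOP
    NULL;   -- collect r.key_field into your keys list here
  END LOOP;
END;
/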
Why can't you use a MERGE statement for this? This is exactly what a MERGE is for. Here is a rough idea of how it would look...
merge into mytable mt
using
(
select key_field, value_field from sourcetable
) st
on
( mt.key_field = st.key_field )
when matched then update
set mt.value_field = st.value_field
when not matched then insert
( key_field, value_field )
values
( st.key_field, st.value_field )
;
Using a MERGE statement is fast because it is a single statement and the Oracle optimizer can utilize indexes and choose a better explain path than iterating through a cursor using PL/SQL.
If the keys are being generated from a sequence, then the normal way to get the key generated by that insert is to use the returning clause:
declare
v_insert_seq integer;
begin
insert into t1 (pk, c1)
values (myseq.nextval, 'value') returning pk into v_insert_seq;
end;
/
However, as best as I can tell, the merge statement doesn't support that returning feature.
Depending on the source of your new rows, there are different ways you could do this. If you are inserting one row at a time, then the approach above will work pretty well.
To detect the duplicate records, just catch the exceptions when you are inserting (when dup_val_on_index) and then handle them with updates.
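A single-row sketch of that insert-then-update idea, assuming t1 also has a unique business-key column (called bk here purely for illustration) so that dup_val_on_index has something to fire on:
declare
  v_pk t1.pk%type;
  v_bk varchar2(30) := 'some-business-key';   -- bk is a hypothetical unique column on t1
begin
  begin
    insert into t1 (pk, bk, c1)
    values (myseq.nextval, v_bk, 'value')
    returning pk into v_pk;                    -- new row: freshly generated key
  exception
    when dup_val_on_index then
      update t1
      set    c1 = 'value'
      where  bk = v_bk
      returning pk into v_pk;                  -- existing row: its current key
  end;
  -- v_pk now holds the created or the pre-existing primary key
end;
/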
If your source of rows is another table, you probably want to look at bulk inserts, and allowing Oracle to return you an array of new PK values. I tried this, but couldn't get it working, so perhaps it's not supported (or I'm missing something today - it gives a syntax error):
declare
type t_type is table of t1.pk%type;
v_insert_seqs t_type;
begin
insert into t1 (pk, c1)
select level newpk, 'value' c1value
from dual
connect by level <= 10 returning pk bulk collect into v_insert_seqs;
exception
when dup_val_on_index then
raise;
end;
/
The next best thing is to select the rows into arrays and then use bulk binds with the returning clause to capture the new PK IDs, and also use SAVE EXCEPTIONS to catch all the rows that failed to insert. Then you can process any of the failed inserts afterwards:
set serveroutput on
declare
type t_pk is table of t1.pk%type;
type t_c1 is table of t1.c1%type;
v_pks t_pk;
v_c1s t_c1;
v_new_pks t_pk;
ex_dml_errors EXCEPTION;
PRAGMA EXCEPTION_INIT(ex_dml_errors, -24381);
begin
-- get the batch of rows you want to insert
select level newpk, 'value' c1
bulk collect into v_pks, v_c1s
from dual connect by level <= 10;
-- bulk bind insert, saving exceptions and capturing the newly inserted
-- records
forall i in v_pks.first .. v_pks.last save exceptions
insert into t1 (pk, c1)
values (v_pks(i), v_c1s(i)) returning pk bulk collect into v_new_pks;
exception
-- Process the exceptions
when ex_dml_errors then
for i in 1..SQL%BULK_EXCEPTIONS.count loop
DBMS_OUTPUT.put_line('Error: ' || i ||
' Array Index: ' || SQL%BULK_EXCEPTIONS(i).error_index ||
' Message: ' || SQLERRM(-SQL%BULK_EXCEPTIONS(i).ERROR_CODE));
end loop;
end;
/
If you are running Oracle 10 or better, you might be able to do much the same thing for nearly free: issue a COMMIT before the merge to update the SCN, then after the merge
use ORA_ROWSCN to detect which rows have changed.
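A sketch of that ORA_ROWSCN variant, reusing the placeholder names from the MERGE example above (note that it needs EXECUTE on DBMS_FLASHBACK, and that without ROWDEPENDENCIES the SCN is tracked per block rather than per row, so it can over-report):
DECLARE
  v_scn NUMBER;
BEGIN
  COMMIT;   -- close any open transaction so the SCN below is a clean baseline
  v_scn := DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER;

  MERGE INTO mytable mt
  USING (SELECT key_field, value_field FROM sourcetable) st
  ON (mt.key_field = st.key_field)
  WHEN MATCHED THEN UPDATE SET mt.value_field = st.value_field
  WHEN NOT MATCHED THEN INSERT (key_field, value_field)
    VALUES (st.key_field, st.value_field);
  COMMIT;

  -- rows whose SCN moved past the baseline were inserted or updated by the merge
  FOR r IN (SELECT key_field FROM mytable WHERE ORA_ROWSCN > v_scn) LOOP
    NULL;   -- collect r.key_field here
  END LOOP;
END;
/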