I wrote a query to update one column across 35 million records,
but it took more than an hour to run.
Did I miss anything in the query below?
DECLARE
CURSOR exp_cur IS
SELECT
DECODE(
COLUMN_NAME,
NULL, NULL,
standard_hash(COLUMN_NAME)
) AS COLUMN_NAME
FROM TABLE1;
TYPE nt_fName IS TABLE OF VARCHAR2(100);
fname nt_fName;
BEGIN
OPEN exp_cur;
FETCH exp_cur BULK COLLECT INTO fname LIMIT 1000000;
CLOSE exp_cur;
--Print data
FOR idx IN 1 .. fname.COUNT
LOOP
UPDATE TABLE1 SET COLUMN_NAME=fname(idx);
commit;
DBMS_OUTPUT.PUT_LINE (idx||' '||fname(idx) );
END LOOP;
END;
Bulk collect used with a forall construction is generally faster than the equivalent row-by-row loop because it applies all the updates in one shot, instead of laboriously stepping through the rows one at a time and launching 35 million separate update statements, each one requiring the database to search for the individual row before updating it. But what you have written (even when the bugs are fixed) is still a row-by-row loop with 35 million search and update statements, plus the additional overhead of populating a 700 MB array in memory, 35 million commits, and 35 million dbms_output messages. It has to be slower because it has significantly more work to do than a plain update.
If it is practical to copy the data to a new table, insert will be a lot faster than update. At the end you can reapply any grants, indexes and constraints to the new table, rename both tables and drop the old one. You can also use insert /*+ parallel enable_parallel_dml */ (prior to Oracle 12c, you have to run alter session enable parallel dml separately). You could define the new table as nologging during the copy, but check with your DBA first, as that can affect replication and backups, though it might not matter on a test system. This will all need careful scripting if it's going to form part of a routine workflow.
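The copy-and-swap approach described above could be sketched roughly as follows. This is only an outline under assumptions: TABLE1_NEW and TABLE1_OLD are hypothetical names, ID stands in for whatever other columns the real table has, and RAWTOHEX is added so that the hash is stored explicitly as hex text rather than through an implicit RAW conversion.

```sql
-- Empty copy of the table structure (TABLE1_NEW is a hypothetical name)
CREATE TABLE table1_new AS
SELECT * FROM table1 WHERE 1 = 0;

-- Needed before Oracle 12c; from 12c the enable_parallel_dml hint can be used instead
ALTER SESSION ENABLE PARALLEL DML;

-- ID is a placeholder for the table's other columns
INSERT /*+ append parallel */ INTO table1_new (id, column_name)
SELECT id,
       CASE
         WHEN column_name IS NOT NULL THEN RAWTOHEX(STANDARD_HASH(column_name))
       END
FROM table1;

COMMIT;

-- Swap the tables once grants, indexes and constraints have been reapplied
ALTER TABLE table1 RENAME TO table1_old;
ALTER TABLE table1_new RENAME TO table1;
```

The old table can be dropped once the new one has been checked.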
The UPDATE inside your loop has no WHERE clause, so every iteration updates all records of TABLE1 (one full-table update per fetched row). That is why it takes so long.
You can simply use a single update statement as follows:
UPDATE TABLE1 SET COLUMN_NAME = standard_hash(COLUMN_NAME)
WHERE COLUMN_NAME IS NOT NULL;
If you still want to use BULK COLLECT and FORALL, you can do it as follows:
DECLARE
  CURSOR EXP_CUR IS
    SELECT COLUMN_NAME FROM TABLE1
    WHERE COLUMN_NAME IS NOT NULL;
  TYPE NT_FNAME IS TABLE OF VARCHAR2(100);
  FNAME NT_FNAME;
BEGIN
  OPEN EXP_CUR;
  LOOP
    -- Fetch in batches; a single fetch would process only the first batch
    FETCH EXP_CUR BULK COLLECT INTO FNAME LIMIT 1000000;
    EXIT WHEN FNAME.COUNT = 0;
    -- One bulk-bound UPDATE per batch instead of one statement per row
    FORALL IDX IN 1 .. FNAME.COUNT
      UPDATE TABLE1
      SET COLUMN_NAME = STANDARD_HASH(COLUMN_NAME)
      WHERE COLUMN_NAME = FNAME(IDX);
    COMMIT;
  END LOOP;
  CLOSE EXP_CUR;
END;
/
Does Postgres have any limit on the number of tables that can be dropped in one DROP TABLE command (about 10000 in my case)? Does it depend on the version? Will it be faster than executing the command 10 times with 1000 tables each?
The possibilities for testing this are limited in my case, so please share if you've had a similar experience.
There is no theoretical limit on the number of tables you can drop in one statement. However, each table dropped will require a couple of ACCESS EXCLUSIVE locks, which are retained in the locking table until the end of the transaction, so you will exceed the default limit of 6400 locks at some point. Increasing max_locks_per_transaction will increase the limit and is safe to do.
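To see where the lock limit mentioned above comes from, the settings could be inspected and raised roughly like this. This is a sketch only: the value 256 is an arbitrary example, and the change takes effect only after a server restart.

```sql
-- The shared lock table holds roughly
-- max_locks_per_transaction * (max_connections + max_prepared_transactions) entries
SHOW max_locks_per_transaction;  -- 64 by default
SHOW max_connections;            -- 100 by default

-- Raise the limit (256 is an arbitrary example value); requires superuser
ALTER SYSTEM SET max_locks_per_transaction = 256;
-- Restart the server afterwards; this parameter cannot be changed at runtime
```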
Use this to drop all tables in the current schema:
DO $$
DECLARE
r RECORD;
BEGIN
FOR r IN
(
SELECT table_name
FROM information_schema.tables
WHERE table_schema = current_schema()
AND table_type = 'BASE TABLE' -- skip views, which DROP TABLE rejects
)
LOOP
EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(r.table_name) || ' CASCADE';
END LOOP;
END $$ ;
I wrote this procedure to bulk delete data (35M records). Can you see why this PL/SQL procedure runs without exiting and the rows are not getting deleted?
create or replace procedure clear_logs
as
CURSOR c_logstodel IS SELECT * FROM test where id=23;
TYPE typ_log is table of test%ROWTYPE;
v_log_del typ_log;
BEGIN
OPEN c_logstodel;
LOOP
FETCH c_logstodel BULK COLLECT INTO v_log_del LIMIT 5000;
EXIT WHEN c_logstodel%NOTFOUND;
FORALL i IN v_log_del.FIRST..v_log_del.LAST
DELETE FROM test WHERE id =v_log_del(i).id;
COMMIT;
END LOOP;
CLOSE c_logstodel;
END clear_logs;
Selecting rowid instead of the column, using exit when v_log_del.count = 0; instead of EXIT WHEN c_logstodel%NOTFOUND; and raising the chunk limit to 50,000 allowed the script to clear 35 million rows in 15 minutes:
create or replace procedure clear_logs
as
CURSOR c_logstodel IS SELECT rowid FROM test where id=23;
TYPE typ_log is table of rowid index by binary_integer;
v_log_del typ_log;
BEGIN
OPEN c_logstodel;
LOOP
FETCH c_logstodel BULK COLLECT INTO v_log_del LIMIT 50000;
exit when v_log_del.count = 0;
FORALL i IN v_log_del.FIRST..v_log_del.LAST
DELETE FROM test WHERE rowid =v_log_del(i);
COMMIT;
END LOOP;
COMMIT;
CLOSE c_logstodel;
END clear_logs;
First off, when using BULK COLLECT with LIMIT X, %NOTFOUND takes on a slightly unexpected meaning: it is set when Oracle could not retrieve X rows, even if the final fetch returned some. (Technically this is consistent with a single-row fetch, which reports that it could not fill its one-row buffer.) So just move the EXIT WHEN %NOTFOUND to after the FORALL. But there is actually no reason to retrieve the data and then delete the retrieved rows. While a single DELETE statement would be considerably faster, 35M rows would require significant rollback space. There is an intermediate solution.
Although it is not commonly used this way, a DELETE statement supports ROWNUM just as a SELECT does. This value can be used to limit the number of rows processed. So to break the work into a given commit size, just limit ROWNUM on the delete:
create or replace procedure clear_logs
as
k_max_rows_per_interation constant integer := 50000;
begin
loop
delete
from test
where id=23
and rownum <= k_max_rows_per_interation;
exit when sql%rowcount < k_max_rows_per_interation;
commit;
end loop;
commit;
end;
As @Stilgar points out, deletes are expensive, meaning slow, so their solution may be better. But this has the advantage that it does not essentially take the table completely out of service during the operation. NOTE: I tend to use a much larger commit interval, generally around 300,000 - 400,000 rows. I suggest you talk with your DBA and see what they think this limit should be. Remember it is their job to properly size rollback space for typical operations. If this is normal in your operation, they need to set it correctly. If you can get rollback space for 35M deletes, then a single delete is the fastest you are going to get.
I need help optimizing this query to use bulk collect and forall statements. I have created backup tables (BCK_xxxx) to copy all data from the original tables (ORIG_xxx), but I am having problems converting this to bulk collect. Most BULK COLLECT examples I have seen define the table name and structure up front using %rowtype. However, I have hundreds of tables to back up, so I need the table name in my query to be dynamic. This is my original query, which deletes and inserts data without bulk collect and takes a lot of time:
DECLARE
--select all table names from backup tables (ex: BCK_tablename)
CURSOR cur_temp_tbl IS
SELECT table_name
FROM all_tables
WHERE OWNER = 'BCKUP'
ORDER BY 1;
--select all table names from original tables (ex: ORIG_tablename)
CURSOR cur_original_tbl IS
SELECT table_name
FROM all_tables
WHERE OWNER = 'ORIG'
ORDER BY 1;
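l_tbl_nm VARCHAR2(30 CHAR);
-- counters used below; they must be declared and initialised
l_deleted_cnt PLS_INTEGER := 0;
l_inserted_cnt PLS_INTEGER := 0;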
BEGIN
--first loop to delete all tables from backup
FOR a IN cur_temp_tbl LOOP
l_tbl_nm := a.table_name;
EXECUTE IMMEDIATE 'DELETE FROM '|| l_tbl_nm;
l_deleted_cnt := l_deleted_cnt +1;
END LOOP;
--second loop to insert data from original to backup
FOR b IN cur_original_tbl LOOP
l_tbl_nm := b.table_name;
CASE
WHEN INSTR(l_tbl_nm,'ORIG_') > 0 THEN
l_tbl_nm := REPLACE(l_tbl_nm,'ORIG_','BCK_');
ELSE
l_tbl_nm := 'BCK_' || l_tbl_nm;
END CASE;
EXECUTE IMMEDIATE 'INSERT INTO ' || l_tbl_nm || ' SELECT * FROM ' || b.table_name;
l_inserted_cnt := l_inserted_cnt +1;
END LOOP;
dbms_output.put_line('Deleted/truncated tables from backup :' ||l_deleted_cnt);
dbms_output.put_line('No of tables inserted with data from original to backup :' ||l_inserted_cnt);
EXCEPTION
WHEN OTHERS THEN
dbms_output.put_line(SQLERRM);
dbms_output.put_line(l_tbl_nm);
END;
I am thinking of including the code below to add after my second loop but I am having problems how to declare the 'cur_tbl' cursor and 'l_tbl_data' TABLE data type. I am unable to use rowtype since the tablename should be dynamic and will change in each iteration of my second loop that will list all table names from original table:
TYPE CurTblTyp IS REF CURSOR;
cur_tbl CurTblTyp;
TYPE l_tbl_t IS TABLE OF tablename.%ROWTYPE;
l_tbl_data l_tbl_t ;
OPEN cur_tbl FOR 'SELECT * FROM :s ' USING b.table_name;
FETCH cur_tbl BULK COLLECT INTO l_tbl_data LIMIT 5000;
EXIT WHEN cur_tbl%NOTFOUND;
CLOSE cur_tbl;
FORALL i IN 1 .. l_tbl_data .count
EXECUTE IMMEDIATE 'insert into '||l_tbl_nm||' values (:1)' USING
l_tbl_data(i);
Hope you can help me and suggest how I can make this code much simpler. Thanks a lot.
It looks like you want to remove all rows from the existing backup tables, then re-copy the entire contents from the original tables to the backup tables. If this is correct, using DELETE for the deletion and any loop operation for the insert will be slow.
First, to remove the data, use TRUNCATE. Since you are going to repopulate, use the REUSE STORAGE option. This is the most efficient way to delete all rows from the table.
TRUNCATE TABLE <backup table> REUSE STORAGE;
Second, to repopulate, just INSERT with a SELECT.
INSERT INTO <backup table> SELECT * FROM <orig table>;
You can use these in your loops as you loop by table. No need to cursor through the table rows as this will be faster.
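Put together, the per-table loop might look roughly like the sketch below. It assumes the ORIG and BCKUP schemas from the question, that every backup table already exists with the same column layout as its original, that all table names are plain uppercase identifiers, and that the executing user may truncate and insert in BCKUP. The variable l_bck_nm is a name introduced here for illustration.

```sql
DECLARE
  l_bck_nm VARCHAR2(128);
BEGIN
  FOR t IN (SELECT table_name
            FROM all_tables
            WHERE owner = 'ORIG'
            ORDER BY table_name)
  LOOP
    -- Same naming rule as in the question: ORIG_x -> BCK_x, otherwise prefix BCK_
    IF INSTR(t.table_name, 'ORIG_') > 0 THEN
      l_bck_nm := REPLACE(t.table_name, 'ORIG_', 'BCK_');
    ELSE
      l_bck_nm := 'BCK_' || t.table_name;
    END IF;

    -- TRUNCATE is DDL, so it commits implicitly and cannot be rolled back
    EXECUTE IMMEDIATE 'TRUNCATE TABLE BCKUP.' || l_bck_nm || ' REUSE STORAGE';

    -- One set-based statement per table; no row-by-row fetching
    EXECUTE IMMEDIATE 'INSERT INTO BCKUP.' || l_bck_nm ||
                      ' SELECT * FROM ORIG.' || t.table_name;
    COMMIT;
  END LOOP;
END;
/
```

Because the table names are concatenated into dynamic SQL, this is only appropriate when the names come from the data dictionary, as they do here.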
If you have a new table, you can do something similar with a CTAS...
CREATE TABLE <backup table> AS SELECT * FROM <orig_table>;
There is a third option in addition to delete and truncate: rename/drop. You rename the old backup tables, then recreate the new backups with CTAS. If the create/insert is successful, you drop the renamed tables; if the new backup fails, you rename the old backups back to their original names. You basically trade temporary usage of disk space for redo logs.
You don't need bulk processing; CTAS is still faster.
Be careful with suggestions to use FORCE DELETE.
Oracle has no such statement, and a DELETE cannot ignore enabled constraints on the table.
The following is not valid Oracle SQL and fails with a syntax error:
FORCE DELETE FROM <table_name>;
I am fairly new to SQL and was wondering if someone can help me.
I got a database that has around 10 million rows.
I need to make a script that finds the records that have some NULL fields, and then updates it to a certain value.
The problem I have from doing a simple update statement, is that it will blow the rollback space.
I was reading around that I need to use BULK COLLECT AND FETCH.
My idea was to fetch 10,000 records at a time, update, commit, and continue fetching.
I tried looking for examples on Google but I have not found anything yet.
Any help?
Thanks!!
This is what I have so far:
DECLARE
CURSOR rec_cur IS
SELECT DATE_ORIGIN
FROM MAIN_TBL WHERE DATE_ORIGIN IS NULL;
TYPE date_tab_t IS TABLE OF DATE;
date_tab date_tab_t;
BEGIN
OPEN rec_cur;
LOOP
FETCH rec_cur BULK COLLECT INTO date_tab LIMIT 1000;
EXIT WHEN date_tab.COUNT() = 0;
FORALL i IN 1 .. date_tab.COUNT
UPDATE MAIN_TBL SET DATE_ORIGIN = '23-JAN-2012'
WHERE DATE_ORIGIN IS NULL;
END LOOP;
CLOSE rec_cur;
END;
I think I see what you're trying to do. There are a number of points I want to make about the differences between the code below and yours.
Your forall loop will not use an index. This is easy to get round by using rowid to update your table.
By committing after each forall you reduce the amount of undo needed; but make it more difficult to rollback if something goes wrong. Though logically your query could be re-started in the middle easily and without detriment to your objective.
rowids are small, collect at least 25k at a time; if not 100k.
You cannot index a null in Oracle, because a regular B-tree index does not store entries whose key is entirely null. There are plenty of questions on Stack Overflow about this if you need more information. A function-based index on something like nvl(date_origin,'x'), as a loose example, would increase the speed at which you select data. It also means you never actually have to use the table itself. You only select from the index.
Your date data-type seems to be a string. I've kept this but it's not wise.
If you can get someone to increase your undo tablespace size then a straight up update will be quicker.
Assuming as per your comments date_origin is a date then the index should be on something like:
nvl(date_origin,to_date('absolute_minimum_date_in_Oracle_as_a_string','yyyymmdd'))
I don't have access to a DB at the moment, but to find out the absolute minimum date run the following query:
select to_date('0000','yyyy') from dual;
It should raise a useful error (ORA-01841) that states the valid range of years.
Working example in PL/SQL Developer.
create table main_tbl as
select cast( null as date ) as date_origin
from all_objects
;
create index i_main_tbl
on main_tbl ( nvl( date_origin
, to_date('0001-01-01','yyyy-mm-dd') )
)
;
declare
cursor c_rec is
select rowid
from main_tbl
where nvl(date_origin,to_date('0001-01-01','yyyy-mm-dd'))
= to_date('0001-01-01','yyyy-mm-dd')
;
type t__rec is table of rowid index by binary_integer;
t_rec t__rec;
begin
open c_rec;
loop
fetch c_rec bulk collect into t_rec limit 50000;
exit when t_rec.count = 0;
forall i in t_rec.first .. t_rec.last
update main_tbl
set date_origin = to_date('23-JAN-2012','DD-MON-YYYY')
where rowid = t_rec(i)
;
commit ;
end loop;
close c_rec;
end;
/
I have a table with a lot of records (could be more than 500 000 or 1 000 000). I added a new column in this table and I need to fill a value for every row in the column, using the corresponding row value of another column in this table.
I tried to use separate transactions for selecting every next chunk of 100 records and update the value for them, but still this takes hours to update all records in Oracle10 for example.
What is the most efficient way to do this in SQL, without using some dialect-specific features, so it works everywhere (Oracle, MSSQL, MySQL, PostGre etc.)?
ADDITIONAL INFO: There are no calculated fields. There are indexes. Used generated SQL statements which update the table row by row.
The usual way is to use UPDATE:
UPDATE mytable
SET new_column = <expr containing old_column>
You should be able to do this in a single transaction.
As Marcelo suggests:
UPDATE mytable
SET new_column = <expr containing old_column>;
If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:
UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;
Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
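The run, commit, repeat cycle described above could be automated with a small PL/SQL loop, sketched here under assumptions: NVL(UPPER(old_column), '?') is only a placeholder for the real expression, chosen because it never yields NULL. If the expression could return NULL, the same rows would be selected again on every pass and the loop would not terminate.

```sql
BEGIN
  LOOP
    UPDATE mytable
    SET new_column = NVL(UPPER(old_column), '?')  -- placeholder expression, never NULL
    WHERE new_column IS NULL
    AND ROWNUM <= 100000;

    EXIT WHEN SQL%ROWCOUNT = 0;  -- nothing left to update
    COMMIT;                      -- end the transaction after each batch
  END LOOP;
  COMMIT;
END;
/
```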
EDIT:
A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE API.
Sample code (from Oracle docs):
DECLARE
l_sql_stmt VARCHAR2(1000);
l_try NUMBER;
l_status NUMBER;
BEGIN
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');
-- Chunk the table by ROWID
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);
-- Execute the DML in parallel
l_sql_stmt := 'update EMPLOYEES e
SET e.salary = e.salary + 10
WHERE rowid BETWEEN :start_id AND :end_id';
DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
parallel_level => 10);
-- If there is an error, RESUME it for at most 2 times.
l_try := 0;
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
LOOP
l_try := l_try + 1;
DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
END LOOP;
-- Done with processing; drop the task
DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/
Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333
You could drop any indexes on the table, then do your insert, and then recreate the indexes.
This might not work for you, but it is a technique I've used a couple of times in the past in similar circumstances.
Create updated_{table_name}, then insert into this table from a select, in batches. Once finished, swap the names: updated_{table_name} becomes {table_name} while {table_name} becomes original_{table_name}. This hinges on Oracle (which I don't know or use) supporting the ability to rename tables in an atomic fashion.
Last time I had to do this was for a heavily indexed table with several million rows that absolutely positively could not be locked for the duration needed to make some serious changes to it.
What is the database version? Check out virtual columns in 11g:
Adding Columns with a Default Value
http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html
update Hotels set Discount=30 where Hotelid >= 1 and Hotelid <= 5504
For Postgresql I do something like this (if we are sure no more updates/inserts take place):
create table new_table as table orig_table with data;
update new_table set column = <expr>;
start transaction;
drop table orig_table;
alter table new_table rename to orig_table;
commit;
Update:
One advantage is that a very large table is not locked while the copy is being updated, an operation that could take minutes.
Use this only if you are sure that no inserts or updates take place on the original table during the process.