Update million rows using rowids from one table to another Oracle - sql

Hi, I have two tables with a million rows in each. I am on Oracle 11g R1.
I am sure many of us must have gone through this situation.
What is the most efficient and fast way to update from one table to another where the values are different?
For example: Table 1 has 4 NUMBER columns with high precision, e.g. 0.2212454215454212.
Table 2 has 6 columns.
I want to update Table 2's four columns based on a common column in both tables, but only for the rows whose values differ.
I have something like this
DECLARE
   TYPE test1_t IS TABLE OF test.score%TYPE INDEX BY PLS_INTEGER;
   TYPE test2_t IS TABLE OF test.id%TYPE INDEX BY PLS_INTEGER;
   TYPE test3_t IS TABLE OF test.urank%TYPE INDEX BY PLS_INTEGER;
   vscore test1_t;
   vid    test2_t;
   vurank test3_t;
BEGIN
   SELECT id, score, urank
     BULK COLLECT INTO vid, vscore, vurank
     FROM test;

   FORALL i IN 1 .. vid.COUNT
      MERGE INTO final T
      USING (SELECT vid (i)    AS o_id,
                    vurank (i) AS o_urank,
                    vscore (i) AS o_score
               FROM DUAL) S
         ON (S.o_id = T.id)
       WHEN MATCHED THEN
          UPDATE SET T.crank = S.o_urank
                 WHERE T.crank <> S.o_urank;
END;
/
Since the numbers have high precision, is that slowing it down?
I tried the Bulk Collect and MERGE combination, but it still takes ~30 minutes in the worst case when I have to update 1 million rows.
Is there something I can do with rowid?
Help will be appreciated.

If you want to update all the rows, then just use update:
update table_1
set (col1,
     col2) = (
    select col1,
           col2
      from table_2
     where table_2.col_a = table_1.col_a
       and table_2.col_b = table_1.col_b)
Bulk collect or any PL/SQL technique will always be slower than a pure SQL technique.
The numeric precision is probably not significant, and rowid is not relevant as there is no common value between the two tables.
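Applied to the tables in the question, a minimal sketch (assuming test.id is unique and that test.urank feeds final.crank, as in the question's MERGE) that restricts the update to rows whose values actually differ might look like this:

update final t
   set t.crank = (select s.urank
                    from test s
                   where s.id = t.id)   -- assumes test.id is unique
 where exists (select 1
                 from test s
                where s.id = t.id
                  and s.urank <> t.crank);

The EXISTS clause both limits the update to changed rows and prevents unmatched rows in final from being set to NULL.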

When dealing with millions of rows, parallel DML is a game changer. Of course you need to have Enterprise Edition to use parallel, but it's really the only thing which will make much difference.
I recommend you read an article on OraFAQ by rleishman comparing 8 Bulk Update Methods. His key finding is that "the cost of disk reads so far outweighs the context switches that that they are barely noticable (sic)". In other words, unless your data is already cached in memory there really isn't a significant difference between SQL and PL/SQL approaches.
The article does have some neat suggestions on employing parallel. The surprising outcome is that a parallel pipelined function offers the best performance.
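For illustration, a hedged sketch of what enabling parallel DML for the question's MERGE could look like (Enterprise Edition assumed; the degree of 8 is arbitrary):

alter session enable parallel dml;

merge /*+ parallel(t 8) */ into final t
using (select id, urank from test) s
   on (t.id = s.id)
 when matched then
    update set t.crank = s.urank
           where t.crank <> s.urank;

commit;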

Focusing on the syntax that has been used and skipping the logic (a pure UPDATE plus a pure INSERT may solve the problem; also consider the MERGE cost, indexes, a possible full scan on MERGE, and so on):
You should use the LIMIT clause in the BULK COLLECT syntax.
A bulk collect with no limit will
load all records into memory, and
with no partially committed merges, create a large redo log
that must be applied at the end of the process.
Both lead to poor performance.
DECLARE
   v_fetchSize NUMBER := 1000;  -- tune based on hardware, design, etc.
   CURSOR a_cur IS
      SELECT id, score, urank FROM test;
   -- separate collections are used because FORALL generally cannot
   -- reference individual fields of a collection of records
   TYPE id_tab    IS TABLE OF test.id%TYPE;
   TYPE score_tab IS TABLE OF test.score%TYPE;
   TYPE urank_tab IS TABLE OF test.urank%TYPE;
   v_id    id_tab;
   v_score score_tab;
   v_urank urank_tab;
BEGIN
   OPEN a_cur;
   LOOP
      FETCH a_cur BULK COLLECT INTO v_id, v_score, v_urank LIMIT v_fetchSize;

      -- the DML goes here; for example, the MERGE from the question:
      FORALL i IN 1 .. v_id.COUNT
         MERGE INTO final T
         USING (SELECT v_id (i) AS o_id, v_urank (i) AS o_urank FROM DUAL) S
            ON (S.o_id = T.id)
          WHEN MATCHED THEN
             UPDATE SET T.crank = S.o_urank
                    WHERE T.crank <> S.o_urank;

      COMMIT;
      EXIT WHEN a_cur%NOTFOUND;
   END LOOP;
   CLOSE a_cur;
END;
/

Just to be sure: test.id and final.id must be indexed.
With the first SELECT ... FROM test you fetch too many records from Table 1, and after that you need to compare all of them with the records in Table 2. Try to select only what you need to update. There are at least two variants:
a) select only changed records:
SELECT source_table.id, source_table.score, source_table.urank
BULK COLLECT INTO vid,vscore,vurank
FROM
test source_table,
final destination_table
where
source_table.id = destination_table.id
and
source_table.crank <> destination_table.crank
;
b) Add a new datetime field to the source table and fill it in a trigger with the current time. While synchronizing, pick only the records changed during the last day. This field needs to be indexed (a sketch of the column and trigger appears at the end of this answer).
After such a change, in the update phase you don't need to compare other fields, only match the IDs:
FORALL i IN 1 .. vid.COUNT
   MERGE INTO final T
   USING (
      SELECT vid (i)    AS o_id,
             vurank (i) AS o_urank,
             vscore (i) AS o_score
        FROM DUAL
   ) S
   ON (S.o_id = T.id)
   WHEN MATCHED THEN
      UPDATE SET T.crank = S.o_urank;
If you worry about the size of the undo/redo segments then variant b) is more useful, because you can fetch records from source Table 1 in time slices and commit the changes after updating each slice, e.g. from 00:00 to 01:00, from 01:00 to 02:00, etc.
In this variant the update can be done with a plain SQL statement, without selecting data into collections, while keeping the redo/undo logs at an acceptable size.
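A hypothetical sketch of variant b) (the column, index, and trigger names are made up for illustration):

ALTER TABLE test ADD (last_changed DATE);
CREATE INDEX test_last_changed_ix ON test (last_changed);

CREATE OR REPLACE TRIGGER test_set_last_changed
BEFORE INSERT OR UPDATE ON test
FOR EACH ROW
BEGIN
   :NEW.last_changed := SYSDATE;
END;
/

-- During synchronization, pick only the rows touched in the last day:
SELECT id, score, urank
  FROM test
 WHERE last_changed >= SYSDATE - 1;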


Why does changing the WHERE condition to a variable make the query 4 times slower?

I am inserting data from one table "Tags" in the "Recovery" database into another table "Tags" in the "R3" database.
They both live on the same SQL Server instance on my laptop.
I have built the insert query, and because the Recovery..Tags table is around 180M records I decided to break it into smaller subsets (1 million records at a time).
Here is my query (let's call it Query A):
insert into R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags T inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between 13000001 and 14000000
it takes around 2 minutes.
That is OK.
To make things a bit easier for me,
I put the iID range in the WHERE clause into a variable,
so my query looks like this (let's call it Query B):
declare @i int = 12
insert into R3..Tags (iID,DT,RepID,Tag,xmiID,iBegin,iEnd,Confidence,Polarity,Uncertainty,Conditional,Generic,HistoryOf,CodingScheme,Code,CUI,TUI,PreferredText,ValueBegin,ValueEnd,Value,Deleted,sKey,RepType)
SELECT T.iID,T.DT,T.RepID,T.Tag,T.xmiID,T.iBegin,T.iEnd,T.Confidence,T.Polarity,T.Uncertainty,T.Conditional,T.Generic,T.HistoryOf,T.CodingScheme,T.Code,T.CUI,T.TUI,T.PreferredText,T.ValueBegin,T.ValueEnd,T.Value,T.Deleted,T.sKey,R.RepType
FROM Recovery..tags T inner join Recovery..Reps R on T.RepID = R.RepID
where T.iID between (1000000 * @i) + 1 and (@i+1)*1000000
but that causes the insert to become very slow (around 10 min).
So I tried query A again and it took around 2 min;
I tried query B again and it took around 8 min!
I am attaching the exec plan for each one (at a site that shows an analysis of the query plan) - Query A Plan and Query B Plan.
Any idea why this is happening?
And how to fix it?
The big difference in time is due to the very different plans that are being created to join Tags and Reps.
Fundamentally, in version A, it knows how much data is being extracted (a million rows) and it can design an efficient query for that. However, because you are using variables in B to define how much data is being imported, it has to define a more generic query - one that would work for 10 rows, a million rows, or a hundred million rows.
In the plans, here are the relevant sections of the query joining Tags and Reps...
... in A
... and B
Note that in A it takes just over a minute to do the join; in B it takes 6 and a half minutes.
The key thing that appears to take the time is that it does a table scan of the Tags table which takes 5:44 to complete. The plan has this as a table scan, as the next time you run the query you may want many more than 1 million rows.
A secondary issue is that the amount of data it reads (or expects to read) from Reps is also way out of whack. In A it expected to read 2 million rows and read 1421; in B it basically read them all (even though technically it probably only needed the same 1421).
I think you have two main approaches to fix this:
Look at indexing to remove the table scan on Tags - ensure the indexes match what is needed and allow the query to do a scan on that index (it appears that the index at the top of MikePetri's answer, or similar, is what you need). This way, instead of doing a table scan, it can do an index scan, which can start 'in the middle' of the data set (a table scan must start at either the start or end of the data set).
Separate this into two processes. The first process gets the relevant million rows from Tags and saves them in a temporary table. The second process uses the data in the temporary table to join to Reps (also try using OPTION (RECOMPILE) in the second query so that it checks the temporary table's size before creating the plan); see the sketch after this list.
You can even put an index or two (and/or a primary key) on that temporary table to make it better for the next step.
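A rough sketch of that two-step approach, under the assumption that a temp table (here called #TagsBatch, a made-up name) is acceptable; OPTION (RECOMPILE) lets the optimizer see the temp table's actual size before building the plan:

DECLARE @i INT = 12;

-- Step 1: pull the relevant million rows out of Tags
SELECT *
INTO #TagsBatch
FROM Recovery..tags
WHERE iID BETWEEN (1000000 * @i) + 1 AND (@i + 1) * 1000000;

CREATE CLUSTERED INDEX CIX_TagsBatch_RepID ON #TagsBatch (RepID);

-- Step 2: join the small temp table to Reps
INSERT INTO R3..Tags (iID, DT, RepID, Tag, xmiID, iBegin, iEnd, Confidence,
    Polarity, Uncertainty, Conditional, Generic, HistoryOf, CodingScheme, Code,
    CUI, TUI, PreferredText, ValueBegin, ValueEnd, Value, Deleted, sKey, RepType)
SELECT T.iID, T.DT, T.RepID, T.Tag, T.xmiID, T.iBegin, T.iEnd, T.Confidence,
    T.Polarity, T.Uncertainty, T.Conditional, T.Generic, T.HistoryOf,
    T.CodingScheme, T.Code, T.CUI, T.TUI, T.PreferredText, T.ValueBegin,
    T.ValueEnd, T.Value, T.Deleted, T.sKey, R.RepType
FROM #TagsBatch T
INNER JOIN Recovery..Reps R ON T.RepID = R.RepID
OPTION (RECOMPILE);

DROP TABLE #TagsBatch;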
The reason the first query is so much faster is that it went parallel. This means the cardinality estimator knew enough about the data it had to handle, and the query was large enough to tip the threshold for parallel execution. The engine then passed chunks of data to different processors to handle individually, before reporting back and repartitioning the streams.
With the value as a variable, it effectively becomes a scalar function evaluation, and a query cannot go parallel with a scalar function, because the value has to be determined before the cardinality estimator can figure out what to do with it. Therefore, it runs in a single thread and is slower.
Some sort of looping mechanism might help. Create the included indexes to assist the engine in handling this request. You can probably find a better looping mechanism, since you are familiar with the identity ranges you care about, but this should get you in the right direction. Adjust for your needs.
With a loop like this, it commits the changes with each loop, so you aren't locking the table indefinitely.
USE Recovery;
GO
CREATE INDEX NCI_iID
ON Tags (iID)
INCLUDE (
DT
,RepID
,tag
,xmiID
,iBegin
,iEnd
,Confidence
,Polarity
,Uncertainty
,Conditional
,Generic
,HistoryOf
,CodingScheme
,Code
,CUI
,TUI
,PreferredText
,ValueBegin
,ValueEnd
,value
,Deleted
,sKey
);
GO
CREATE INDEX NCI_RepID ON Reps (RepID) INCLUDE (RepType);
USE R3;
GO
CREATE INDEX NCI_iID ON Tags (iID);
GO
DECLARE @RowsToProcess BIGINT
,@StepIncrement INT = 1000000;
SELECT @RowsToProcess = (
SELECT COUNT(1)
FROM Recovery..tags AS T
WHERE NOT EXISTS (
SELECT 1
FROM R3..Tags AS rt
WHERE T.iID = rt.iID
)
);
WHILE @RowsToProcess > 0
BEGIN
INSERT INTO R3..Tags
(
iID
,DT
,RepID
,Tag
,xmiID
,iBegin
,iEnd
,Confidence
,Polarity
,Uncertainty
,Conditional
,Generic
,HistoryOf
,CodingScheme
,Code
,CUI
,TUI
,PreferredText
,ValueBegin
,ValueEnd
,Value
,Deleted
,sKey
,RepType
)
SELECT TOP (@StepIncrement)
T.iID
,T.DT
,T.RepID
,T.Tag
,T.xmiID
,T.iBegin
,T.iEnd
,T.Confidence
,T.Polarity
,T.Uncertainty
,T.Conditional
,T.Generic
,T.HistoryOf
,T.CodingScheme
,T.Code
,T.CUI
,T.TUI
,T.PreferredText
,T.ValueBegin
,T.ValueEnd
,T.Value
,T.Deleted
,T.sKey
,R.RepType
FROM Recovery..tags AS T
INNER JOIN Recovery..Reps AS R ON T.RepID = R.RepID
WHERE NOT EXISTS (
SELECT 1
FROM R3..Tags AS rt
WHERE T.iID = rt.iID
)
ORDER BY
T.iID;
SET @RowsToProcess = @RowsToProcess - @StepIncrement;
END;

Update statement taking more than 24 hours for updating 32k rows

Below is the update statement that I am running 32k times, and it has been running for more than 15 hours.
I have to update the value in TABLE_2 for 32k different M_DISPLAY values.
UPDATE TABLE_2 T2 SET T2.M_VALUE = 'COL_ANC'
WHERE EXISTS (SELECT 1 FROM TABLE_2 T1 WHERE TRIM(T1.M_DISPLAY) = 'ANCHORTST' AND T1.M_LABEL=T2.M_LABEL );
I am not sure why it is taking such a long time, as I have tuned the query.
I have copied the 32,000 update statements into an Update.sql file and am running the SQL from the command line.
Though it is updating the table, it is a never-ending process.
Please advise if I have gone wrong anywhere.
Regards
Using FORALL
If you cannot rewrite the query to run a single bulk-update instead of 32k individual updates, you might still get lucky by using PL/SQL's FORALL. An example:
DECLARE
TYPE rec_t IS RECORD (
m_value table_2.m_value%TYPE,
m_display table_2.m_display%TYPE
);
TYPE tab_t IS TABLE OF rec_t;
data tab_t := tab_t();
BEGIN
-- Fill in data object. Replace this by whatever your logic for matching
-- m_value to m_display is
data.extend(1);
data(1).m_value := 'COL_ANC';
data(1).m_display := 'ANCHORTST';
-- Then, run the 32k updates using FORALL
FORALL i IN 1 .. data.COUNT
UPDATE table_2 t2
SET t2.m_value = data(i).m_value
WHERE EXISTS (
SELECT 1
FROM table_2 t1
WHERE trim(t1.m_display) = data(i).m_display
AND t1.m_label = t2.m_label
);
END;
/
Concurrency
If you're not the only process on the system, 32k updates in a single transaction can hurt. It's definitely worth committing a few thousand rows in sub-transactions to reduce concurrency effects with other processes that might read the same table while you're updating.
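For instance, a rough sketch of batching the commits (the 5,000-row batch size is arbitrary, and only the single M_DISPLAY value from the question is shown):

DECLARE
   l_rows PLS_INTEGER;
BEGIN
   LOOP
      UPDATE table_2 t2
         SET t2.m_value = 'COL_ANC'
       WHERE (t2.m_value IS NULL OR t2.m_value <> 'COL_ANC')  -- skip rows already done
         AND EXISTS (SELECT 1
                       FROM table_2 t1
                      WHERE trim(t1.m_display) = 'ANCHORTST'
                        AND t1.m_label = t2.m_label)
         AND ROWNUM <= 5000;

      l_rows := SQL%ROWCOUNT;  -- capture before COMMIT resets the attribute
      COMMIT;
      EXIT WHEN l_rows = 0;
   END LOOP;
END;
/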
Bulk update
Really, the goal of any improvement should be bulk updating the entire data set in one go (or perhaps split in a few bulks, see concurrency).
If you had a staging table containing the update instructions:
CREATE TABLE update_instructions (
m_value VARCHAR2(..),
m_display VARCHAR2(..)
);
Then you could pull off something along the lines of:
MERGE INTO table_2 t2
USING (
SELECT u.*, t1.m_label
FROM update_instructions u
JOIN table_2 t1 ON trim(t1.m_display) = u.m_display
) t1
ON t2.m_label = t1.m_label
WHEN MATCHED THEN UPDATE SET t2.m_value = t1.m_value;
This should be even faster than FORALL (but might have more concurrency implications).
Indexing and data sanitisation
Of course, one thing that might definitely hurt you when running 32k individual update statements is the TRIM() function, which prevents using an index on M_DISPLAY efficiently. If you could sanitise your data so it doesn't need trimming first, that would definitely help. Otherwise, you could add a function based index just for the update (and then drop it again):
CREATE INDEX i ON table_2 (trim (m_display));
The query and subquery query the same table: TABLE_2. Assuming that M_LABEL is unique, the subquery returns 1s for all rows in TABLE_2 where M_DISPLAY is ANCHORTST. Then the update query updates the same (!) TABLE_2 for all 1s returned from subquery - so for all rows where M_DISPLAY is ANCHORTST.
Therefore, the query could be simplified, exploiting the fact that both update and select work on the same table - TABLE_2:
UPDATE TABLE_2 T2 SET T2.M_VALUE = 'COL_ANC' WHERE TRIM(T2.M_DISPLAY) = 'ANCHORTST'
If M_LABEL is not unique, then the above is not going to work - thanks to commentators for pointing that out!
For significantly faster execution:
Ensure that you have created an index on M_DISPLAY and M_LABEL columns that are in your WHERE clause.
Ensure that M_DISPLAY has a function based index. If it does not, then do not pass it to the TRIM function, because the function will prevent the database from using the index that you have created for the M_DISPLAY column. TRIM the data before storing in the table.
That's it.
By the way, as has been mentioned, you shouldn't need 32k queries to meet your objective. One will probably suffice. Look into query-based updates. As an example, see the accepted answer here:
Oracle SQL: Update a table with data from another table
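For reference, a generic correlated-update sketch in the spirit of the linked answer, reusing the hypothetical update_instructions staging table from above (and assuming m_display is unique in it):

UPDATE table_2 t2
   SET t2.m_value = (SELECT u.m_value
                       FROM update_instructions u
                      WHERE u.m_display = trim(t2.m_display))
 WHERE EXISTS (SELECT 1
                 FROM update_instructions u
                WHERE u.m_display = trim(t2.m_display));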

How to SELECT COUNT from a table currently being INSERTed into?

Hi, consider an INSERT statement running on a table TABLE_A that takes a long time; I would like to see how far it has progressed.
What I tried was to open up a new session (a new query window in SSMS) while the long-running statement was still in progress, and run the query
SELECT COUNT(1) FROM TABLE_A WITH (nolock)
hoping that it would return right away with the number of rows every time I run it, but in my test, even with (nolock), it only returns after the INSERT statement has completed.
What have I missed? Do I add (nolock) to the INSERT statement as well? Or is this not achievable?
(Edit)
OK, I have found what I missed. If you first use CREATE TABLE TABLE_A, then INSERT INTO TABLE_A, the SELECT COUNT will work. If you use SELECT * INTO TABLE_A FROM xxx without first creating TABLE_A, then none of the following will work (not even sysindexes).
Short answer: You can't do this.
Longer answer: A single INSERT statement is an atomic operation. As such, the query has either inserted all the rows or has inserted none of them. Therefore you can't get a count of how far through it has progressed.
Even longer answer: Martin Smith has given you a way to achieve what you want. Whether you still want to do it that way is up to you of course. Personally I still prefer to insert in manageable batches if you really need to track progress of something like this. So I would rewrite the INSERT as multiple smaller statements. Depending on your implementation, that may be a trivial thing to do.
If you are using SQL Server 2016 the live query statistics feature can allow you to see the progress of the insert in real time.
The below screenshot was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.
It shows that the insert was 88% complete on the clustered index and this will be followed by a sort operator to get the values into non clustered index key order before inserting into the NCI. This is a blocking operator and the sort cannot output any rows until all input rows are consumed so the operators to the left of this are 0% done.
With respect to your question on NOLOCK
It is trivial to test
Connection 1
USE tempdb
CREATE TABLE T2
(
X INT IDENTITY PRIMARY KEY,
F CHAR(8000)
);
-- Spin until the insert running in connection 2 has written at least one row
-- (the 1-second poll interval is arbitrary)
WHILE NOT EXISTS(SELECT * FROM T2 WITH (NOLOCK))
    WAITFOR DELAY '00:00:01';
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting for 10 seconds',0,1) WITH NOWAIT;
WAITFOR delay '00:00:10';
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting to drop table',0,1) WITH NOWAIT
DROP TABLE T2
Connection 2
use tempdb;
--Insert 2000 * 2000 = 4 million rows
WITH T
AS (SELECT TOP 2000 'x' AS x
FROM master..spt_values)
INSERT INTO T2
(F)
SELECT 'X'
FROM T v1
CROSS JOIN T v2
OPTION (MAXDOP 1)
Example Results - Showing row count increasing
SELECT queries with NOLOCK allow dirty reads. They don't actually take no locks and can still be blocked, they still need a SCH-S (schema stability) lock on the table (and on a heap it will also take a hobt lock).
The only thing incompatible with a SCH-S is a SCH-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps created it in the same tran)
For the use case of a large insert, where an approximate in-flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non-deprecated alternative DMVs are available).
When an insert has a wide update plan you can even see it inserting to the various indexes in turn that way.
If the table is created inside the inserting transaction this sysindexes query will still block though as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with nolock instead.
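A sketch of that workaround, which simply avoids the OBJECT_ID() call:

SELECT si.rows
FROM sysindexes si
WHERE si.id = (SELECT t.object_id
               FROM sys.tables t WITH (NOLOCK)
               WHERE t.name = 'T2')
  AND si.indid < 2;  -- heap or clustered index row count only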
Use the query below to find the count for any large table, locked table, or table currently being inserted into, in seconds. Just replace the table name you want to check:
SELECT Total_Rows = SUM(st.row_count)
FROM sys.dm_db_partition_stats st
WHERE object_name(st.object_id) = 'TABLENAME'
  AND st.index_id < 2;
For those who just need to see the record count while a long-running INSERT script executes, I found you can see the current record count through SSMS by right-clicking on the destination database table -> Properties -> Storage, then viewing the "Row Count" value like so:
Close the window and repeat to see the updated record count.

update x set y = null takes a long time

At work, I have a large table (some 3 million rows, with 40-50 columns). I sometimes need to empty some of the columns and fill them with new data. What I did not expect is that
UPDATE table1 SET y = null
takes much more time than filling the column with data that is generated, for example, in the SQL query from other columns of the same table or queried from other tables in a subquery. It does not matter whether I go through all table rows at once (as in the update query above) or use a cursor to go through the table row by row (using the PK). It also does not matter whether I use the large table at work or create a small test table and fill it with a few hundred thousand test rows. Setting the column to null always takes far longer (throughout the tests, I encountered factors of 2 to 10) than updating the column with some dynamic data (which is different for each row).
What is the reason for this? What does Oracle do when setting a column to null? Or what is my error in reasoning?
Thanks for your help!
P.S.: I am using Oracle 11gR2, and found these results using both PL/SQL Developer and Oracle SQL Developer.
Is column Y indexed? It could be that setting the column to null means Oracle has to delete from the index, rather than just update it. If that's the case, you could drop and rebuild it after updating the data.
EDIT:
Is it just column Y that exhibits the issue, or is it independent of the column being updated? Can you post the table definition, including constraints?
Summary
I think updating to null is slower because Oracle (incorrectly) tries to take advantage of the way it stores nulls, causing it to frequently re-organize the rows in the block ("heap block compress"), creating a lot of extra UNDO and REDO.
What's so special about null?
From the Oracle Database Concepts:
"Nulls are stored in the database if they fall between columns with data values. In these cases they require 1 byte to store the length of the column (zero).
Trailing nulls in a row require no storage because a new row header signals that the remaining columns in the previous row are null. For example, if the last three columns of a table are null, no information is stored for those columns. In tables with many columns,
the columns more likely to contain nulls should be defined last to conserve disk space."
Test
Benchmarking updates is very difficult because the true cost of an update cannot be measured just from the update statement. For example, log switches will
not happen with every update, and delayed block cleanout will happen later. To accurately test an update, there should be multiple runs,
objects should be recreated for each run, and the high and low values should be discarded.
For simplicity the script below does not throw out high and low results, and only tests a table with a single column. But the problem still occurs regardless of the number of columns, their data, and which column is updated.
I used the RunStats utility from http://www.oracle-developer.net/utilities.php to compare the resource consumption of updating-to-a-value with updating-to-a-null.
create table test1(col1 number);
BEGIN
dbms_output.enable(1000000);
runstats_pkg.rs_start;
for i in 1 .. 10 loop
execute immediate 'drop table test1 purge';
execute immediate 'create table test1 (col1 number)';
execute immediate 'insert /*+ append */ into test1 select 1 col1
from dual connect by level <= 100000';
commit;
execute immediate 'update test1 set col1 = 1';
commit;
end loop;
runstats_pkg.rs_pause;
runstats_pkg.rs_resume;
for i in 1 .. 10 loop
execute immediate 'drop table test1 purge';
execute immediate 'create table test1 (col1 number)';
execute immediate 'insert /*+ append */ into test1 select 1 col1
from dual connect by level <= 100000';
commit;
execute immediate 'update test1 set col1 = null';
commit;
end loop;
runstats_pkg.rs_stop();
END;
/
Result
There are dozens of differences, these are the four I think are most relevant:
Type Name Run1 Run2 Diff
----- ---------------------------- ------------ ------------ ------------
TIMER elapsed time (hsecs) 1,269 4,738 3,469
STAT heap block compress 1 2,028 2,027
STAT undo change vector size 55,855,008 181,387,456 125,532,448
STAT redo size 133,260,596 581,641,084 448,380,488
Solutions?
The only possible solution I can think of is to enable table compression. The trailing-null storage trick doesn't happen for compressed tables.
So even though the "heap block compress" number gets even higher for Run2, from 2028 to 23208, I guess it doesn't actually do anything.
The redo, undo, and elapsed time between the two runs is almost identical with table compression enabled.
However, there are lots of potential downsides to table compression. Updating to a null will run much faster, but every other update will run at least slightly slower.
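If you wanted to experiment with that, a minimal sketch using basic table compression (note that MOVE leaves existing indexes unusable and they would need a rebuild):

-- e.g. recreate the test table with basic compression:
create table test1 (col1 number) compress;

-- or compress an existing table in place:
alter table test1 move compress;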
That's because it deletes that data from the blocks.
And delete is the hardest operation. If you can avoid a delete, do it.
I recommend you create another table with that column null (Create Table As Select, for example, or insert-select), fill the column with your procedure, drop the old table, and then rename the new table to the current name.
UPDATE:
Another important thing is that you should update the column directly with the new values. It is useless to set it to null and refill it afterwards.
If you do not have values for all rows, you can do the update like this:
update table1
set y = (select new_value from source where source.key = table1.key)
and it will set to null those rows that do not exist in source.
I would try what Tom Kyte suggested for large updates.
When it comes to huge tables, it is best to go like this: take a few rows, update them, take some more, update those, etc. Don't try to issue an update on the whole table. That's a killer move right from the start.
Basically, create a BINARY_INTEGER-indexed table, fetch rows in batches, and update them.
Here is a piece of code that I have used on large tables with success. Because I'm lazy and it's like 2 AM now, I'll just copy-paste it here and let you figure it out, but let me know if you need help:
DECLARE
TYPE BookingRecord IS RECORD (
bprice number,
bevent_id number,
book_id number
);
TYPE array is TABLE of BookingRecord index by binary_integer;
l_data array;
CURSOR c1 is
SELECT LVC_USD_PRICE_V2(ev.activity_version_id,ev.course_start_date,t.local_update_date,ev.currency,nvl(t.delegate_country,ev.sponsor_org_country),ev.price,ev.currency,t.ota_status,ev.location_type) x,
ev.title,
t.ota_booking_id
FROM ota_gsi_delegate_bookings_t#diseulprod t,
inted_parted_events_t#diseulprod ev
WHERE t.event_id = ev.event_id
and t.ota_booking_id =
BEGIN
open c1;
loop
fetch c1 bulk collect into l_data limit 20;
for i in 1..l_data.count
loop
update ou_inc_int_t_01
set price = l_data(i).bprice,
updated = 'Y'
where booking_id = l_data(i).book_id;
end loop;
exit when c1%notfound;
end loop;
close c1;
END;
What can also help speed up updates is to use ALTER TABLE table1 NOLOGGING so that the update won't generate redo logs. Another possibility is to drop the column and re-add it; since this is a DDL operation, it will generate neither redo nor undo.
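A sketch of that drop-and-re-add idea (the NUMBER datatype is assumed for illustration; SET UNUSED is a lighter-weight variant at drop time):

alter table table1 drop column y;
alter table table1 add (y number);

-- or, if the physical drop is too expensive right now:
alter table table1 set unused column y;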

Selecting a subset of records from a very large record set in Oracle runs out of memory

I have a process that is converting dates from GMT to Australian Eastern Standard Time. To do this, I need to select the records from the database, process them and then save them back.
To select the records, I have the following query:
SELECT id,
user_id,
event_date,
event,
resource_id,
resource_name
FROM
(SELECT rowid id,
rownum r,
user_id,
event_date,
event,
resource_id,
resource_name
FROM user_activity
ORDER BY rowid)
WHERE r BETWEEN 0 AND 50000
to select a block of 50000 rows from a total of approx. 60 million rows. I am splitting them up because a) Java (what the update process is written in) runs out of memory with too many rows (I have a bean object for each row) and b) I only have 4 gig of Oracle temp space to play with.
In the process, I use the rowid to update the record (so I have a unique value) and the rownum to select the blocks. I then call this query in iterations, selecting the next 50000 records until none remain (the java program controls this).
The problem I'm getting is that I'm still running out of Oracle temp space with this query. My DBA has told me that more temp space cannot be granted, so another method must be found.
I've tried substituting the subquery (which I presume is using all the temp space for the sort) with a view, but the explain plan using the view is identical to that of the original query.
Is there a different/better way to achieve this without running into the memory/temp space problems? I'm assuming an update query to update the dates (as opposed to a Java program) would suffer from the same temp space problem?
Your assistance on this is greatly appreciated.
Update
I went down the path of the pl/sql block as suggested below:
declare
cursor c is select event_date from user_activity for update;
begin
for t_row in c loop
update user_activity
set event_date = t_row.event_date + 10/24 where current of c;
commit;
end loop;
end;
However, I'm running out of undo space. I was under the impression that if the commit was made after each update, then the need for undo space is minimal. Am I incorrect in this assumption?
A single update probably would not suffer from the same issue, and would probably be orders of magnitude faster. The large amount of temp tablespace is only needed because of the sorting. Although if your DBA is so stingy with the temp tablespace you may end up running out of UNDO space or something else. (Take a look at ALL_SEGMENTS, how large is your table?)
But if you really must use this method, maybe you can use a filter instead of an order by. Create 1200 buckets and process them one at a time:
where ora_hash(rowid, 1200) = 1
where ora_hash(rowid, 1200) = 2
...
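For illustration, one bucket's pass might look like this (the +10/24 shift is taken from the question; the bucket number would be iterated over by the driving program):

UPDATE user_activity
   SET event_date = event_date + 10/24
 WHERE ora_hash(rowid, 1200) = 1;

COMMIT;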
But this will be horribly, horribly slow. And what happens if a value changes halfway through the process? A single SQL statement is almost certainly the best way to do this.
Why not just one UPDATE or MERGE?
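For this case the single statement would simply be (using the +10/24 shift from the question):

UPDATE user_activity
   SET event_date = event_date + 10/24;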
Or you can write an anonymous PL/SQL block that processes the data with a cursor.
For example:
declare
  cursor c is select * from aa for update;
begin
  for t_row in c loop
    update aa
       set val = t_row.val || ' new value'
     where current of c;
  end loop;
  commit;
end;
How about not updating it at all?
rename user_activity to user_activity_gmt
create view user_activity as
select id,
user_id,
event_date+10/24 as event_date,
event,
resource_id,
resource_name
from user_activity_gmt;