Update statement taking more than 24 hours for updating 32k rows

Update statement taking more than 24 hours for updating 32k rows - sql

Below is the update statement that I am running for 32k times and it is taking more than 15 hours and running.
I have to update the value in table 2 for 32k different M_DISPLAY VALUES.
UPDATE TABLE_2 T2 SET T2.M_VALUE = 'COL_ANC'
WHERE EXISTS (SELECT 1 FROM TABLE_2 T1 WHERE TRIM(T1.M_DISPLAY) = 'ANCHORTST' AND T1.M_LABEL=T2.M_LABEL );
Am not sure Why is it taking such a long time as I have tuned the query,
I have copied 32000 update statements in a Update.sql file and running the SQL in command line.
Though it is updating the table, it is a neverending process
Please advice if I have gone wrong anywhere
Regards

Using FORALL
If you cannot rewrite the query to run a single bulk-update instead of 32k individual updates, you might still get lucky by using PL/SQL's FORALL. An example:
DECLARE
TYPE rec_t IS RECORD (
m_value table_2.m_value%TYPE,
m_display table_2.m_display%TYPE
);
TYPE tab_t IS TABLE OF rec_t;
data tab_t := tab_t();
BEGIN
-- Fill in data object. Replace this by whatever your logic for matching
-- m_value to m_display is
data.extend(1);
data(1).m_value := 'COL_ANC';
data(1).m_display := 'ANCHORTST';
-- Then, run the 32k updates using FORALL
FORALL i IN 1 .. data.COUNT
UPDATE table_2 t2
SET t2.m_value = data(i).m_value
WHERE EXISTS (
SELECT 1
FROM table_2 t1
WHERE trim(t1.m_display) = data(i).m_display
AND t1.m_label = t2.m_label
);
END;
/
Concurrency
If you're not the only process on the system, 32k updates in a single transaction can hurt. It's definitely worth committing a few thousand rows in sub-transactions to reduce concurrency effects with other processes that might read the same table while you're updating.
Bulk update
Really, the goal of any improvement should be bulk updating the entire data set in one go (or perhaps split in a few bulks, see concurrency).
If you had a staging table containing the update instructions:
CREATE TABLE update_instructions (
m_value VARCHAR2(..),
m_display VARCHAR2(..)
);
Then you could pull off something along the lines of:
MERGE INTO table_2 t2
USING (
SELECT u.*, t1.m_label
FROM update_instructions u
JOIN table_2 t1 ON trim(t1.m_display) = u.m_display
) t1
ON t2.m_label = t1.m_label
WHEN MATCHED THEN UPDATE SET t2.m_value = t1.m_value;
This should be even faster than FORALL (but might have more concurrency implications).
Indexing and data sanitisation
Of course, one thing that might definitely hurt you when running 32k individual update statements is the TRIM() function, which prevents using an index on M_DISPLAY efficiently. If you could sanitise your data so it doesn't need trimming first, that would definitely help. Otherwise, you could add a function based index just for the update (and then drop it again):
CREATE INDEX i ON table_2 (trim (m_display));

The query and subquery query the same table: TABLE_2. Assuming that M_LABEL is unique, the subquery returns 1s for all rows in TABLE_2 where M_DISPLAY is ANCHORTST. Then the update query updates the same (!) TABLE_2 for all 1s returned from subquery - so for all rows where M_DISPLAY is ANCHORTST.
Therefore, the query could be simplified, exploiting the fact that both update and select work on the same table - TABLE_2:
UPDATE TABLE_2 T2 SET T2.M_VALUE = 'COL_ANC' WHERE TRIM(T2.M_DISPLAY) = 'ANCHORTST'
If M_LABEL is not unique, then the above is not going to work - thanks to commentators for pointing that out!

For significantly faster execution:
Ensure that you have created an index on M_DISPLAY and M_LABEL columns that are in your WHERE clause.
Ensure that M_DISPLAY has a function based index. If it does not, then do not pass it to the TRIM function, because the function will prevent the database from using the index that you have created for the M_DISPLAY column. TRIM the data before storing in the table.
Thats it.
By the way, as has been mentioned, you shouldn't need 32k queries for meeting your objective. One will probably suffice. Look into query based update. As an example, see the accepted answer here:
Oracle SQL: Update a table with data from another table

Related

How to SELECT COUNT from tables currently being INSERT?

Hi consider there is an INSERT statement running on a table TABLE_A, which takes a long time, I would like to see how has it progressed.
What I tried was to open up a new session (new query window in SSMS) while the long running statement is still in process, I ran the query
SELECT COUNT(1) FROM TABLE_A WITH (nolock)
hoping that it will return right away with the number of rows everytime I run the query, but the test result was even with (nolock), still, it only returns after the INSERT statement is completed.
What have I missed? Do I add (nolock) to the INSERT statement as well? Or is this not achievable?
(Edit)
OK, I have found what I missed. If you first use CREATE TABLE TABLE_A, then INSERT INTO TABLE_A, the SELECT COUNT will work. If you use SELECT * INTO TABLE_A FROM xxx, without first creating TABLE_A, then non of the following will work (not even sysindexes).

Short answer: You can't do this.
Longer answer: A single INSERT statement is an atomic operation. As such, the query has either inserted all the rows or has inserted none of them. Therefore you can't get a count of how far through it has progressed.
Even longer answer: Martin Smith has given you a way to achieve what you want. Whether you still want to do it that way is up to you of course. Personally I still prefer to insert in manageable batches if you really need to track progress of something like this. So I would rewrite the INSERT as multiple smaller statements. Depending on your implementation, that may be a trivial thing to do.

If you are using SQL Server 2016 the live query statistics feature can allow you to see the progress of the insert in real time.
The below screenshot was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.
It shows that the insert was 88% complete on the clustered index and this will be followed by a sort operator to get the values into non clustered index key order before inserting into the NCI. This is a blocking operator and the sort cannot output any rows until all input rows are consumed so the operators to the left of this are 0% done.
With respect to your question on NOLOCK
It is trivial to test
Connection 1
USE tempdb
CREATE TABLE T2
(
X INT IDENTITY PRIMARY KEY,
F CHAR(8000)
);
WHILE NOT EXISTS(SELECT * FROM T2 WITH (NOLOCK))
LOOP:
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting for 10 seconds',0,1) WITH NOWAIT;
WAITFOR delay '00:00:10';
SELECT COUNT(*) AS CountMethod FROM T2 WITH (NOLOCK);
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('T2');
RAISERROR ('Waiting to drop table',0,1) WITH NOWAIT
DROP TABLE T2
Connection 2
use tempdb;
--Insert 2000 * 2000 = 4 million rows
WITH T
AS (SELECT TOP 2000 'x' AS x
FROM master..spt_values)
INSERT INTO T2
(F)
SELECT 'X'
FROM T v1
CROSS JOIN T v2
OPTION (MAXDOP 1)
Example Results - Showing row count increasing
SELECT queries with NOLOCK allow dirty reads. They don't actually take no locks and can still be blocked, they still need a SCH-S (schema stability) lock on the table (and on a heap it will also take a hobt lock).
The only thing incompatible with a SCH-S is a SCH-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps created it in the same tran)
For the use case of a large insert, where an approximate in flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non deprecated alternative DMVs are available)
When an insert has a wide update plan you can even see it inserting to the various indexes in turn that way.
If the table is created inside the inserting transaction this sysindexes query will still block though as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with nolock instead.

Use the below query to find the count for any large table or locked table or being inserted table in seconds . Just replace the table name which you want to search.
SELECT
Total_Rows= SUM(st.row_count)
FROM
sys.dm_db_partition_stats st
WHERE
object_name(object_id) = 'TABLENAME' AND (index_id < 2)

For those who just need to see the record count while executing a long running INSERT script, I found you can see the current record count through SSMS by right clicking on the destination database table, -> Properties -> Storage, then view the "Row Count" value like so:
Close window and repeat to see the updated record count.

Alternatives to UPDATE statement Oracle 11g

I'm currently using Oracle 11g and let's say I have a table with the following columns (more or less)
Table1
ID varchar(64)
Status int(1)
Transaction_date date
tons of other columns
And this table has about 1 Billion rows. I would want to update the status column with a specific where clause, let's say
where transaction_date = somedatehere
What other alternatives can I use rather than just the normal UPDATE statement?
Currently what I'm trying to do is using CTAS or Insert into select to get the rows that I want to update and put on another table while using AS COLUMN_NAME so the values are already updated on the new/temporary table, which looks something like this:
INSERT INTO TABLE1_TEMPORARY (
ID,
STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS)
SELECT
ID
3 AS STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS
FROM TABLE1
WHERE
TRANSACTION_DATE = SOMEDATE
So far everything seems to work faster than the normal update statement. The problem now is I would want to get the remaining data from the original table which I do not need to update but I do need to be included on my updated table/list.
What I tried to do at first was use DELETE on the same original table using the same where clause so that in theory, everything that should be left on that table should be all the data that i do not need to update, leaving me now with the two tables:
TABLE1 --which now contains the rows that i did not need to update
TABLE1_TEMPORARY --which contains the data I updated
But the delete statement in itself is also too slow or as slow as the orginal UPDATE statement so without the delete statement brings me to this point.
TABLE1 --which contains BOTH the data that I want to update and do not want to update
TABLE1_TEMPORARY --which contains the data I updated
What other alternatives can I use in order to get the data that's the opposite of my WHERE clause (take note that the where clause in this example has been simplified so I'm not looking for an answer of NOT EXISTS/NOT IN/NOT EQUALS plus those clauses are slower too compared to positive clauses)
I have ruled out deletion by partition since the data I need to update and not update can exist in different partitions, as well as TRUNCATE since I'm not updating all of the data, just part of it.
Is there some kind of JOIN statement I use with my TABLE1 and TABLE1_TEMPORARY in order to filter out the data that does not need to be updated?
I would also like to achieve this using as less REDO/UNDO/LOGGING as possible.
Thanks in advance.

I'm assuming this is not a one-time operation, but you are trying to design for a repeatable procedure.
Partition/subpartition the table in a way so the rows touched are not totally spread over all partitions but confined to a few partitions.
Ensure your transactions wouldn't use these partitions for now.
Per each partition/subpartition you would normally UPDATE, perform CTAS of all the rows (I mean even the rows which stay the same go to TABLE1_TEMPORARY). Then EXCHANGE PARTITION and rebuild index partitions.
At the end rebuild global indexes.
If you don't have Oracle Enterprise Edition, you would need to either CTAS entire billion of rows (followed by ALTER TABLE RENAME instead of ALTER TABLE EXCHANGE PARTITION) or to prepare some kind of "poor man's partitioning" using a view (SELECT UNION ALL SELECT UNION ALL SELECT etc) and a bunch of tables.
There is some chance that this mess would actually be faster than UPDATE.
I'm not saying that this is elegant or optimal, I'm saying that this is the canonical way of speeding up large UPDATE operations in Oracle.

How about keeping in the UPDATE in the same table, but breaking it into multiple small chunks?
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 0000000 and 0999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 1000000 and 1999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 2000000 and 2999999
COMMIT
This could help if the total workload is potentially manageable, but doing it all in one chunk is the problem. This approach breaks it into modest-sized pieces.
Doing it this way could, for example, enable other apps to keep running & give other workloads a look in; and would avoid needing a single humungous transaction in the logfile.

Update million rows using rowids from one table to another Oracle

Hi I have two table with million rows in each.I have oracle 11 g R1
I am sure many of us must have gone through this situation.
What is the most efficient and fast way to update from one table to another where the values are DIFFERENT.
Eg: Table 1 has 4 NUMBER columns with a high precision eg : 0.2212454215454212
Table 2 has 6 columns.
update table 2's four columns based on common column on both the tables, only the different ones.
I have something like this
DECLARE
TYPE test1_t IS TABLE OF test.score%TYPE INDEX BY PLS_..;
TYPE test2_t IS TABLE OF test.id%TYPE INDEX BY PLS..;
TYPE test3_t IS TABLE OF test.Crank%TYPE INDEX BY PLS..;
vscore test1_t;
vid test2_t;
vurank test4_t;
BEGIN
SELECT id,score,urank
BULK COLLECT INTO vid,vscore,vurank
FROM test;
FORALL i IN 1 .. vid.COUNT
MERGE INTO final T
USING (SELECT vid (i) AS o_id,
vurank (i) AS o_urank,
vscore (i) AS o_score FROM DUAL) S
ON (S.o_id = T.id)
WHEN MATCHED THEN
UPDATE SET T.crank = S.o_crank
WHERE T.crank <> S.o_crank;
Since the numbers are with high precision is it slowing down?
I tried Bulk Collect and Merge combination still its taking time ~ 30 mins for worst case scenario if I have to update 1 million rows.
Is there something with rowid?
Help will be appreciated.

If you want to update all the rows, then just use update:
update table_1
set (col1,
col2) = (
select col1,
col2
from table2
where table2.col_a = table1.col_a and
table2.col_b = table1.col_b)
Bulk collect or any PL/SQL technique will always be slower than a pure SQL technique.
The numeric precision is probably not significant, and rowid is not relevant as there is no common value between the two tables.

When dealing with millions of rows, parallel DML is a game changer. Of course you need to have Enterprise Edition to use parallel, but it's really the only thing which will make much difference.
I recommend you read an article on OraFAQ by rleishman comparing 8 Bulk Update Methods. His key finding is that "the cost of disk reads so far outweighs the context switches that that they are barely noticable (sic)". In other words, unless your data is already cached in memory there really isn't a significant difference between SQL and PL/SQL approaches.
The article does have some neat suggestions on employing parallel. The surprising outcome is that a parallel pipelined function offers the best performance.

Focusing on the syntax have been used and skipping the logic (may using a pure update + pure insert may solve the problem, merge cost, indexes, possible full scan on merge and else )
You should use Limit in Bulk Collect syntax
Using a bulk collect with no limit
Will case all records to be loaded in memory
With no partially committed merges, you will create a larg redolog,
that must be apply in the end of the process.
Both will reason in low performance.
DECLARE
v_fetchSize NUMBER := 1000; -- based on hardware, design and .... could be scaled
CURSOR a_cur IS
SELECT id,score,urank FROM test;
TYPE myarray IS TABLE OF a_cur%ROWTYPE;
cur_array myarray;
BEGIN
OPEN a_cur;
LOOP
FETCH a_cur BULK COLLECT INTO cur_array LIMIT v_fetchSize;
FORALL i IN 1 .. cur_array.COUNT
// DO Operation
COMMIT;
EXIT WHEN a_cur%NOTFOUND;
END LOOP;
CLOSE a_cur;
END;

Just to be sure: test.id and final.id must be indexed.
With first select ... from test you got too much records from Table 1 and after that you need to compare all of them with records on Table 2. Try to select only what you need to update. So, there are at least 2 variants:
a) select only changed records:
SELECT source_table.id, source_table.score, source_table.urank
BULK COLLECT INTO vid,vscore,vurank
FROM
test source_table,
final destination_table
where
source_table.id = destination_table.id
and
source_table.crank <> destination_table.crank
;
b) Add new field to source table with datetime value and fill it in trigger with current time. While synchronizing pick only records changed during last day. This field needs to be indexed.
After such a change on update phase you don't need to compare other fields, only match ID's:
FORALL i IN 1 .. vid.COUNT
MERGE INTO FINAL T
USING (
SELECT vid (i) AS o_id,
vurank (i) AS o_urank,
vscore (i) AS o_score FROM DUAL
) S
ON (S.o_id = T.id)
WHEN MATCHED
THEN UPDATE SET T.crank = S.o_crank
If you worry about size of undo/redo segments then variant b) is more useful, because you can get records from source Table 1 divided to time slices and commit changes after updating every slice. E.g. from 00:00 to 01:00 , from 01:00 to 02:00 etc.
In this variant update can be done just by SQL statement without selecting a data into collections in row with maintaining acceptable sizes of redo/undo logs.

Oracle query performance degrades when inserting many rows in a single transaction

In a single transaction I am inserting may rows into a table, before inserting the row I perform a query to see if there is already a row with the key I am about to insert.
What I see is that the query to check the key exists gets very slow within my transaction, but from another transaction it is fast, and in the next transaction it is fast.
I cant break this work down into smaller transactions as the request I am processing needs to be in a single transaction.
Is there anything I can do to make the select query in this transaction fast?

So, please add constraint / primary key. This will allow you to remove all your selects.
May be consider to use MERGE as # Egor_Skriptunoff recommended.
OR Add indexes for columns you are selected by.

if inserting millions of thousand of records first thing is do incremental commits as you are likely hitting temp space fragmentation or limits which result in slowdowns. that can be done in a begin end block
also, this allows you to add index via
create index b indexName on table_name(col1, col2, col3);
merge is faster as previous answer states.
alternatively add all ignoring duplicates then remove duplicates
this can be done via
for example
begin
insert into table_name select * from table_name; [ if pulling from another table]or[use values and column maps]
delete from table_name A where rowid >(select min(rowid) from table_name B where A.key_value=B.key_value);
end
if in a procedure this would also require
both query and delete can be in begin end block and execute immediate(' you ddl statement here';');

update x set y = null takes a long time

At work, I have a large table (some 3 million rows, like 40-50 columns). I sometimes need to empty some of the columns and fill them with new data. What I did not expect is that
UPDATE table1 SET y = null
takes much more time than filling the column with data which is generated, for example, in the sql query from other columns of the same table or queried from other tables in a subquery. It does not matter if I go through all table rows at once (like in the update query above) or if I use a cursor to go through the table row by row (using the pk). It does not matter if I use the large table at work or if I create a small test table and fill it with some hundredthousands of test-rows. Setting the column to null always takes way longer (Throughout the tests, I encountered factors of 2 to 10) than updating the column with some dynamic data (which is different for each row).
Whats the reason for this? What does Oracle do when setting a column to null? Or - what's is my error in reasoning?
Thanks for your help!
P.S.: I am using oracle 11g2, and found these results using both plsql developer and oracle sql developer.

Is column Y indexed? It could be that setting the column to null means Oracle has to delete from the index, rather than just update it. If that's the case, you could drop and rebuild it after updating the data.
EDIT:
Is it just column Y that exhibits the issue, or is it independent of the column being updated? Can you post the table definition, including constraints?

Summary
I think updating to null is slower because Oracle (incorrectly) tries to take advantage of the way it stores nulls, causing it to frequently re-organize the rows in the block ("heap block compress"), creating a lot of extra UNDO and REDO.
What's so special about null?
From the Oracle Database Concepts:
"Nulls are stored in the database if they fall between columns with data values. In these cases they require 1 byte to store the length of the column (zero).
Trailing nulls in a row require no storage because a new row header signals that the remaining columns in the previous row are null. For example, if the last three columns of a table are null, no information is stored for those columns. In tables with many columns,
the columns more likely to contain nulls should be defined last to conserve disk space."
Test
Benchmarking updates is very difficult because the true cost of an update cannot be measured just from the update statement. For example, log switches will
not happen with every update, and delayed block cleanout will happen later. To accurately test an update, there should be multiple runs,
objects should be recreated for each run, and the high and low values should be discarded.
For simplicity the script below does not throw out high and low results, and only tests a table with a single column. But the problem still occurs regardless of the number of columns, their data, and which column is updated.
I used the RunStats utility from http://www.oracle-developer.net/utilities.php to compare the resource consumption of updating-to-a-value with updating-to-a-null.
create table test1(col1 number);
BEGIN
dbms_output.enable(1000000);
runstats_pkg.rs_start;
for i in 1 .. 10 loop
execute immediate 'drop table test1 purge';
execute immediate 'create table test1 (col1 number)';
execute immediate 'insert /*+ append */ into test1 select 1 col1
from dual connect by level <= 100000';
commit;
execute immediate 'update test1 set col1 = 1';
commit;
end loop;
runstats_pkg.rs_pause;
runstats_pkg.rs_resume;
for i in 1 .. 10 loop
execute immediate 'drop table test1 purge';
execute immediate 'create table test1 (col1 number)';
execute immediate 'insert /*+ append */ into test1 select 1 col1
from dual connect by level <= 100000';
commit;
execute immediate 'update test1 set col1 = null';
commit;
end loop;
runstats_pkg.rs_stop();
END;
/
Result
There are dozens of differences, these are the four I think are most relevant:
Type Name Run1 Run2 Diff
----- ---------------------------- ------------ ------------ ------------
TIMER elapsed time (hsecs) 1,269 4,738 3,469
STAT heap block compress 1 2,028 2,027
STAT undo change vector size 55,855,008 181,387,456 125,532,448
STAT redo size 133,260,596 581,641,084 448,380,488
Solutions?
The only possible solution I can think of is to enable table compression. The trailing-null storage trick doesn't happen for compressed tables.
So even though the "heap block compress" number gets even higher for Run2, from 2028 to 23208, I guess it doesn't actually do anything.
The redo, undo, and elapsed time between the two runs is almost identical with table compression enabled.
However, there are lots of potential downsides to table compression. Updating to a null will run much faster, but every other update will run at least slightly slower.

That's because it deletes from blocks that data.
And delete is the hardest operation. If you can avoid a delete, do it.
I recommend you to create another table with that column null(Create table as select for example, or insert select), and fill it(the column) with your procedure. Drop old table and then rename the new table with current name.
UPDATE:
Another important thing is that you should update the column as is, with new values. It is useless to set them null and after that refill them.
If you do not have values for all rows, you can do the update like this:
udpate table1
set y = (select new_value from source where source.key = table1.key)
and will set to null those rows that does not exists in source.

I would try what Tom Kyte suggested on large updates.
When it comes to huge tables, it best to go like this : take a few rows, update them, take some more, update those etc. Don't try to issue an update on all the table. That's a killer move right from the start.
Basically create binary_integer indexed table, fetch 10 rows at a time, and update them.
Here is a piece of code that i have used of large tables with success. Because im lazy and its like 2AM now ill just copy paste it here and let you figure it out, but let me know if you need help :
DECLARE
TYPE BookingRecord IS RECORD (
bprice number,
bevent_id number,
book_id number
);
TYPE array is TABLE of BookingRecord index by binary_integer;
l_data array;
CURSOR c1 is
SELECT LVC_USD_PRICE_V2(ev.activity_version_id,ev.course_start_date,t.local_update_date,ev.currency,nvl(t.delegate_country,ev.sponsor_org_country),ev.price,ev.currency,t.ota_status,ev.location_type) x,
ev.title,
t.ota_booking_id
FROM ota_gsi_delegate_bookings_t#diseulprod t,
inted_parted_events_t#diseulprod ev
WHERE t.event_id = ev.event_id
and t.ota_booking_id =
BEGIN
open c1;
loop
fetch c1 bulk collect into l_data limit 20;
for i in 1..l_data.count
loop
update ou_inc_int_t_01
set price = l_data(i).bprice,
updated = 'Y'
where booking_id = l_data(i).book_id;
end loop;
exit when c1%notfound;
end loop;
close c1;
END;

what can also help speed up updates is to use alter table table1 nologging so that the update won't generate redo logs. another possibility is to drop the column and re-add it. since this is a DDL operation it will generate neither redo nor undo.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas