Statistics rapidly changing within transaction - fixing execution plan - sql

A problem I'm facing (Oracle 11g):
I create a table, let's call it table_xyz, with index (not unique, no primary key).
I create a package with a procedure that will insert, say, 10 million records monthly. It is not a simple "insert into": it is thousands of lines of procedural code, and some of it also selects data from table_xyz to calculate what data to insert next.
For example, somewhere within the procedure there is a query like the one in the sample code below.
Now, there is a problem:
When the procedure is run for the first time, all queries on table_xyz get execution plans based on the moment when there were 0 records in table_xyz.
So, all queries will effectively full scan table_xyz, instead of starting to use indexes at some point.
This leads to terrible performance, and in my case the first run never actually finishes...
Now, there are three approaches I thought of:
At some point within the transaction, recalculate statistics. For example, run "analyze table / analyze index compute statistics" after the count of records in table_xyz reaches a power of 10.
This is not possible, however, since ANALYZE commits the transaction and I cannot allow that.
At some point within the transaction, recalculate statistics as above, but do it in an autonomous transaction. This does not work either, since the new statistics are not visible to the main transaction (I tested that).
Just hint all the cursors that use table_xyz with an INDEX hint (see the hinted example after the sample code below). This gets the job done, but it is ugly and generally frowned upon in the codebase.
Are there any other ways?
I attach some code. It is just an example; please do not try to optimize it by removing the procedure and so on.
create table table_xyz (idx number(10) /* specifically, this is NOT a primary key */
                       ,some_value varchar2(10)
                       );
create index table_xyz_idx on table_xyz (idx);
declare
  cursor idxes is
    select level idx
    from   dual d
    connect by level < 100000;

  current_val varchar2(10);

  function calculate_some_value(p_idx number) return varchar2
  is
    cursor c_previous is
      select t.some_value
      from   table_xyz t
      where  t.idx in (round(p_idx / 2, 0), round(p_idx / 3, 0), round(p_idx / 5, 0))
      order by t.idx desc;
    x varchar2(100);
  begin
    open c_previous;
    fetch c_previous into x;
    close c_previous;
    x := nvl(x, 'XYZ');
    if mod(p_idx, 2) = 0 then
      x := x || '2';
    elsif mod(p_idx, 3) = 0 then
      x := '3' || x;
    elsif mod(p_idx, 5) = 0 then
      x := substr(x, 1, 1) || '5' || substr(x, 2, 2 + mod(p_idx, 7));
    end if;
    x := substr(x, 1, 10);
    return x;
  end calculate_some_value;
begin
  for idx in idxes
  loop
    current_val := calculate_some_value(idx.idx);
    insert into table_xyz(idx, some_value) values (idx.idx, current_val);
  end loop;
end;
/
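For reference, the hinted variant from the third approach would look something like this on the c_previous cursor above (the hint simply names the existing index and is shown for illustration only):

cursor c_previous is
  select /*+ index(t table_xyz_idx) */
         t.some_value
  from   table_xyz t
  where  t.idx in (round(p_idx / 2, 0), round(p_idx / 3, 0), round(p_idx / 5, 0))
  order by t.idx desc;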

Consider taking a look at the DBMS_STATS package.
Option A: use the DBMS_STATS procedures for manually setting table, column, and index statistics (i.e., SET_TABLE_STATS, SET_COLUMN_STATS, and SET_INDEX_STATS, respectively). Then use DBMS_STATS.LOCK_TABLE_STATS to keep your manually set statistics from being overwritten (e.g., by a DBA gathering schema statistics while your table happens to be empty).
Option B: run your procedure as is and then manually gather statistics on the table afterwards. Then, as above, use DBMS_STATS.LOCK_TABLE_STATS to keep them from being overwritten.
Either way, the idea is to set or gather statistics on your table and then lock them in place.
If you want to get fancier, maybe you could automate this. E.g.,
At install time, manually set the statistics and lock them for your first run
In your procedure code, at the end, unlock the statistics, gather them, and re-lock them.
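A minimal sketch of Option A combined with the automation above; the statistic values, the schema (current user) and the CASCADE gather are assumptions and should reflect a realistic post-load state of table_xyz:

-- At install time: pretend the table is already populated, then lock the stats.
begin
  dbms_stats.set_table_stats(ownname => user, tabname => 'TABLE_XYZ',
                             numrows => 10000000, numblks => 50000, avgrlen => 20);
  dbms_stats.set_index_stats(ownname => user, indname => 'TABLE_XYZ_IDX',
                             numrows => 10000000, numdist => 100000, indlevel => 3);
  dbms_stats.lock_table_stats(ownname => user, tabname => 'TABLE_XYZ');
end;
/

-- At the end of the monthly procedure, once the load has committed:
begin
  dbms_stats.unlock_table_stats(ownname => user, tabname => 'TABLE_XYZ');
  dbms_stats.gather_table_stats(ownname => user, tabname => 'TABLE_XYZ', cascade => true);
  dbms_stats.lock_table_stats(ownname => user, tabname => 'TABLE_XYZ');
end;
/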

Related

Updating Millions of Records Oracle

I have created a query to update a column across 35 million records,
but unfortunately it takes more than an hour to process.
Did I miss anything in the query below?
DECLARE
  CURSOR exp_cur IS
    SELECT DECODE(COLUMN_NAME,
                  NULL, NULL,
                  standard_hash(COLUMN_NAME)) AS COLUMN_NAME
    FROM TABLE1;

  TYPE nt_fName IS TABLE OF VARCHAR2(100);
  fname nt_fName;
BEGIN
  OPEN exp_cur;
  FETCH exp_cur BULK COLLECT INTO fname LIMIT 1000000;
  CLOSE exp_cur;

  --Print data
  FOR idx IN 1 .. fname.COUNT
  LOOP
    UPDATE TABLE1 SET COLUMN_NAME = fname(idx);
    COMMIT;
    DBMS_OUTPUT.PUT_LINE(idx || ' ' || fname(idx));
  END LOOP;
END;
The reason a bulk collect used with a forall construction is generally faster than the equivalent row-by-row loop is that it applies all the updates in one shot, instead of laboriously stepping through the rows one at a time and launching 35 million separate update statements, each one requiring the database to search for the individual row before updating it. But what you have written (even when the bugs are fixed) is still a row-by-row loop with 35 million search-and-update statements, plus the additional overhead of populating a 700 MB array in memory, 35 million commits, and 35 million dbms_output messages. It has to be slower because it has significantly more work to do than a plain update.
If it is practical to copy the data to a new table, insert will be a lot faster than update. At the end you can reapply any grants, indexes and constraints to the new table, rename both tables and drop the old one. You can also use insert /*+ parallel enable_parallel_dml */ (prior to Oracle 12c you have to alter session enable parallel dml separately). You could define the new table as nologging during the copy, but check with your DBA as that can affect replication and backups, though that might not matter if this is a test system. This will all need careful scripting if it is going to form part of a routine workflow.
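A rough sketch of that copy-and-swap approach, assuming TABLE1 has only the single column shown in the question (in practice you would list every column and re-create indexes, constraints and grants before the swap):

alter session enable parallel dml;  -- needed before 12c; 12c+ can use the enable_parallel_dml hint instead

create table table1_new nologging as
select * from table1 where 1 = 0;   -- empty copy of the structure

insert /*+ append parallel(t, 8) */ into table1_new t
select decode(column_name, null, null, standard_hash(column_name))
from   table1;
commit;

-- once indexes, constraints and grants are back in place, swap the tables
rename table1 to table1_old;
rename table1_new to table1;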
Your code is updating all records of TABLE1 in each iteration of the loop: the UPDATE has no WHERE clause, so every pass rewrites every row in the table and then commits. That is why it is taking so long.
You can simply use a single update statement as follows:
UPDATE TABLE1 SET COLUMN_NAME = standard_hash(COLUMN_NAME)
WHERE COLUMN_NAME IS NOT NULL;
If you still want to use BULK COLLECT and FORALL, you can do it as follows:
DECLARE
  CURSOR EXP_CUR IS
    SELECT COLUMN_NAME
    FROM TABLE1
    WHERE COLUMN_NAME IS NOT NULL;

  TYPE NT_FNAME IS TABLE OF VARCHAR2(100);
  FNAME NT_FNAME;
BEGIN
  OPEN EXP_CUR;
  LOOP
    FETCH EXP_CUR BULK COLLECT INTO FNAME LIMIT 1000000;
    EXIT WHEN FNAME.COUNT = 0;   -- keep looping so all 35 million rows are processed

    FORALL IDX IN FNAME.FIRST..FNAME.LAST
      UPDATE TABLE1
      SET COLUMN_NAME = STANDARD_HASH(COLUMN_NAME)
      WHERE COLUMN_NAME = FNAME(IDX);

    COMMIT;
  END LOOP;
  CLOSE EXP_CUR;
END;
/

Postgresql insert trigger becomes slow when querying current table

When inserting a lot number into a table we count the number of times the base number already exists, and add a -## to the end of the new number based on that count.
I have stripped out most of the logic (we check for other things as well). I am also aware of the logic flaw here that would skip -1.
-- Function: stone._lsuniqueid()
-- DROP FUNCTION stone._lsuniqueid();
CREATE OR REPLACE FUNCTION stone._lsuniqueid()
  RETURNS trigger AS
$BODY$
DECLARE
  _count INTEGER;
BEGIN
  -- Obtain the number of occurrences of this new ls_number
  SELECT COUNT(ls_number) INTO _count
  FROM ls
  WHERE ls_number LIKE CAST(NEW.ls_number || '%' AS text);

  -- Allow new ls_numbers to be entered as is, otherwise add "-#{count + 1}"
  -- to the end of the ls_number
  IF _count > 0 THEN
    NEW.ls_number = NEW.ls_number || '-' || CAST(_count + 1 AS text);
  END IF;

  RETURN NEW;
END
$BODY$
  LANGUAGE plpgsql;
INSERT INTO ls VALUES (NEXTVAL('ls_ls_id_seq'),7285,UPPER('20151012'));
--> Query returned successfully: one row affected, 391 ms execution time.
The count query is plenty fast
SELECT COUNT(ls_number)
FROM ls
WHERE ls_number LIKE CAST('20151012' || '%' AS text);
--> 19ms
For comparison I tried a similar trigger, but ran the count against a different table with the same number of rows and a similar query time.
SELECT COUNT(lsdetail_id)
FROM lsdetail
WHERE lsdetail_id > 2433308
--> 20ms
Running the same insert with the count running against a different table returns the result 20 times faster.
INSERT INTO ls VALUES (NEXTVAL('ls_ls_id_seq'),7285,UPPER('20151012'));
--> Query returned successfully: one row affected, 20 ms execution time.
The ls table has about 2.5 million rows
I've tried a couple of different things and the issue seems to be when selecting from the same table I'm inserting into.
I would like to know why this happening, but I would also be open to a better way to create "sub-lot" numbers.
Thanks!
Found the answer here:
http://www.postgresql.org/message-id/27705.1150381444#sss.pgh.pa.us
Re: How to analyze function performance
"Mindaugas" writes:
Is it possible to somehow analyze function performance? E.g. we are using function cleanup() which takes obviously too much time to execute but I have problems trying to figure what is slowing things down.
When I explain analyze the function's lines step by step it shows quite acceptable performance.
--
Are you sure you are "explain analyze"ing the same queries the function
is really doing? You have to account for the fact that what plpgsql is
issuing is parameterized queries, and sometimes that limits the
planner's ability to pick a good plan. For instance, if you have
declare x int;
begin
...
for r in select * from foo where key = x loop ...
then what is really getting planned and executed is "select * from foo
where key = $1" --- every plpgsql variable gets replaced by a parameter
symbol "$n". You can model this for EXPLAIN purposes with a prepared
statement:
prepare p1(int) as select * from foo where key = $1;
explain analyze execute p1(42);
If you find out that a particular query really sucks when parameterized,
you can work around this by using EXECUTE to force the query to be
planned afresh on each use, with literal constants instead of parameters.
Then I looked into this:
http://www.postgresql.org/docs/9.1/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
39.5.4. Executing Dynamic Commands
Oftentimes you will want to generate dynamic commands inside your PL/pgSQL functions, that is, commands that will involve different tables or different data types each time they are executed. PL/pgSQL's normal attempts to cache plans for commands (as discussed in Section 39.10.2) will not work in such scenarios. To handle this sort of problem, the EXECUTE statement is provided:
EXECUTE 'SELECT count(*) FROM mytable WHERE inserted_by = $1 AND inserted <= $2'
INTO c
USING checked_user, checked_date;
--
So in the end it was a matter of updating the count select to this:
EXECUTE 'SELECT COALESCE(COUNT(ls_number), 0) FROM ls WHERE ls_number LIKE $1 || ''%'';'
INTO _count
USING NEW.ls_number;

Will moving a table/partition to a different tablespace interrupt queries accessing said table/partition?

As the title says: will moving (without renaming) a table/partition while it's being accessed have negative consequences on any queries accessing it?
For example, say there's a long-running SELECT COUNT(*) FROM some_table.
If I were to ALTER TABLE some_table MOVE TABLESPACE some_other_tablespace, would the SELECT fail?
Would it complete, but have incorrect results? Maybe something else entirely?
The only info I could find was that moving the table to a different tablespace would require rebuilding the indexes under certain circumstances, but nothing mentioned what happens to any active queries.
It might fail with ORA-08103: object no longer exists.
In Oracle, readers and writers do not block each other, which means DML and queries will not interfere with each other, excluding a few weird cases like running out of UNDO space. But moving a table to another tablespace, or any other type of ALTER or DDL statement, is not a normal write. The multiversion concurrency control model breaks down when you run DDL, at least for the involved objects, and weird things start to happen.
Testing a large move is difficult, but you can reproduce these errors by looping through a lot of small alters and queries. In case you think this is only a theoretical issue: I have seen these errors occur in real life, on a production database.
Warning: infinite loops below since I can't predict how long it will take to reproduce this error. But it usually only takes me tens of seconds.
--Create sample table.
drop table test1 purge;

create table test1(a number, b number)
partition by list(a) (partition p1 values(1), partition p2 values(2))
nologging tablespace users;

--Session 1
begin
  loop
    execute immediate '
      insert /*+ append */ into test1 select mod(level,2)+1, level
      from dual connect by level <= 100000';
    commit;
    execute immediate 'alter table test1 move partition p1 tablespace users';
  end loop;
end;
/

--Session 2: Read from moved partition
declare
  v_count number;
begin
  loop
    select count(*) into v_count from test1 where a = 1;
  end loop;
end;
/

--Session 3: Read from unmoved partition
declare
  v_count number;
begin
  loop
    select count(*) into v_count from test1 where a = 2;
  end loop;
end;
/
Session 2 will eventually die with:
ORA-08103: object no longer exists
ORA-06512: at line 6
Session 3 will not fail because it is not querying an altered partition. Each partition has its own segment and is a separate object that can potentially "no longer exist".
A partition move WILL cause DML to fail if it is accessing the partition.
I just proved it :-/.
I was doing a large delete and tried to move a partition (alter table move partition).
The error I received was:
ORA-12801: error signaled in parallel query server P004
ORA-08103: object no longer exists

Fast Update database with more than 10 million records

I am fairly new to SQL and was wondering if someone can help me.
I have a database table with around 10 million rows.
I need to write a script that finds the records that have some NULL fields, and then updates them to a certain value.
The problem with doing a simple update statement is that it would blow the rollback space.
I was reading around that I need to use BULK COLLECT and FETCH.
My idea was to fetch 10,000 records at a time, update, commit, and continue fetching.
I tried looking for examples on Google but I have not found anything yet.
Any help?
Thanks!!
This is what I have so far:
DECLARE
  CURSOR rec_cur IS
    SELECT DATE_ORIGIN
    FROM MAIN_TBL WHERE DATE_ORIGIN IS NULL;

  TYPE date_tab_t IS TABLE OF DATE;
  date_tab date_tab_t;
BEGIN
  OPEN rec_cur;
  LOOP
    FETCH rec_cur BULK COLLECT INTO date_tab LIMIT 1000;
    EXIT WHEN date_tab.COUNT() = 0;

    FORALL i IN 1 .. date_tab.COUNT
      UPDATE MAIN_TBL SET DATE_ORIGIN = '23-JAN-2012'
      WHERE DATE_ORIGIN IS NULL;
  END LOOP;
  CLOSE rec_cur;
END;
I think I see what you're trying to do. There are a number of points I want to make about the differences between the code below and yours.
Your forall loop will not use an index. This is easy to get round by using rowid to update your table.
By committing after each forall you reduce the amount of undo needed; but make it more difficult to rollback if something goes wrong. Though logically your query could be re-started in the middle easily and without detriment to your objective.
rowids are small, collect at least 25k at a time; if not 100k.
You cannot index a null in Oracle. There are plenty of questions on Stack Overflow about this if you need more information. A functional index on something like nvl(date_origin, 'x'), as a loose example, would increase the speed at which you select data. It also means you never actually have to use the table itself. You only select from the index.
Your date data-type seems to be a string. I've kept this but it's not wise.
If you can get someone to increase your undo tablespace size then a straight up update will be quicker.
Assuming as per your comments date_origin is a date then the index should be on something like:
nvl(date_origin,to_date('absolute_minimum_date_in_Oracle_as_a_string','yyyymmdd'))
I don't have access to a DB at the moment but to find out the amdiOaas run the following query:
select to_date('0001','yyyy') from dual;
It should raise a useful error for you.
Working example in PL/SQL Developer.
create table main_tbl as
select cast( null as date ) as date_origin
from all_objects
;
create index i_main_tbl
  on main_tbl ( nvl( date_origin
                   , to_date('0001-01-01','yyyy-mm-dd') )
              )
;
declare
cursor c_rec is
select rowid
from main_tbl
where nvl(date_origin,to_date('0001-01-01','yyyy-mm-dd'))
= to_date('0001-01-01','yyyy-mm-dd')
;
type t__rec is table of rowid index by binary_integer;
t_rec t__rec;
begin
open c_rec;
loop
fetch c_rec bulk collect into t_rec limit 50000;
exit when t_rec.count = 0;
forall i in t_rec.first .. t_rec.last
update main_tbl
set date_origin = to_date('23-JAN-2012','DD-MON-YYYY')
where rowid = t_rec(i)
;
commit ;
end loop;
close c_rec;
end;
/

Efficient way to update all rows in a table

I have a table with a lot of records (could be more than 500 000 or 1 000 000). I added a new column to this table and I need to fill a value for every row in that column, using the corresponding row value of another column in the same table.
I tried using separate transactions to select each next chunk of 100 records and update the value for them, but this still takes hours to update all records, in Oracle 10g for example.
What is the most efficient way to do this in SQL, without using dialect-specific features, so that it works everywhere (Oracle, MSSQL, MySQL, PostgreSQL etc.)?
ADDITIONAL INFO: there are no calculated fields, there are indexes, and the generated SQL statements used so far update the table row by row.
The usual way is to use UPDATE:
UPDATE mytable
SET new_column = <expr containing old_column>
You should be able to do this in a single transaction.
As Marcelo suggests:
UPDATE mytable
SET new_column = <expr containing old_column>;
If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:
UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;
Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
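If you would rather not re-run it by hand, a small anonymous block can repeat the statement until nothing is left; this is only a sketch, and UPPER(old_column) is a placeholder for whatever your real expression is:

BEGIN
  LOOP
    UPDATE mytable
    SET new_column = UPPER(old_column)   -- placeholder expression; substitute your own
    WHERE new_column IS NULL
    AND ROWNUM <= 100000;

    EXIT WHEN SQL%ROWCOUNT = 0;          -- "0 rows updated" means we are done
    COMMIT;
  END LOOP;
  COMMIT;
END;
/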
EDIT:
A better, more efficient alternative is to use the DBMS_PARALLEL_EXECUTE API.
Sample code (from Oracle docs):
DECLARE
  l_sql_stmt VARCHAR2(1000);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  -- Create the TASK
  DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');

  -- Chunk the table by ROWID
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);

  -- Execute the DML in parallel
  l_sql_stmt := 'update EMPLOYEES e
                 SET e.salary = e.salary + 10
                 WHERE rowid BETWEEN :start_id AND :end_id';
  DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, RESUME it for at most 2 times.
  l_try    := 0;
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  END LOOP;

  -- Done with processing; drop the task
  DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/
Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333
You could drop any indexes on the table, then do your insert, and then recreate the indexes.
Might not work for you, but it is a technique I've used a couple of times in the past for similar circumstances.
Create updated_{table_name}, then insert-select into this table in batches. Once finished (and this hinges on Oracle, which I don't know or use, supporting the ability to rename tables in an atomic fashion), updated_{table_name} becomes {table_name} while {table_name} becomes original_{table_name} (see the sketch below).
Last time I had to do this was for a heavily indexed table with several million rows that absolutely positively could not be locked for the duration needed to make some serious changes to it.
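A rough Oracle sketch of that idea; the table and column names and the derivation expression are placeholders, and note that the two RENAME statements are quick dictionary operations but not a single atomic swap:

-- build the replacement table with the new column already populated
create table updated_my_table nologging as
select old_column,
       upper(old_column) as new_column   -- placeholder for the real derivation
from   my_table;

-- re-create any indexes, constraints and grants on updated_my_table here, then swap
rename my_table to original_my_table;
rename updated_my_table to my_table;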
What is the database version? Check out virtual columns in 11g:
Adding Columns with a Default Value
http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html
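For context, on 11g adding a NOT NULL column with a DEFAULT is a metadata-only operation, so existing rows are not physically rewritten; something along these lines (the column name and default value are purely illustrative):

alter table mytable add (new_column varchar2(30) default 'N/A' not null);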
update Hotels set Discount=30 where Hotelid >= 1 and Hotelid <= 5504
For Postgresql I do something like this (if we are sure no more updates/inserts take place):
create table new_table as table orig_table with data;
update new_table set new_column = <expr>;
start transaction;
drop table orig_table;
alter table new_table rename to orig_table;
commit;
Update:
One improvement is that if your table is very large you avoid locking it while the new values are filled in; on a big table that update could otherwise take minutes.
This only works if you are sure that no inserts and/or updates take place during the process.