Oracle SQL script parallel execution

For my job I need to prepare two tables (CTAS) and then do some joins between them. For this I created a script (run in SQL Developer) which creates these two tables sequentially, one after the other. Since the two tables are not related, I'd like to create them in parallel. Is it possible in a SQL script to start two table creations (or two other scripts) in parallel and then proceed when both have finished?

Here's one option.
I wouldn't use CTAS here - I'd rather create both tables in advance, and then insert rows into them. Why? Because this approach uses stored procedures, which - in order to perform DDL (and CTAS is DDL) - require dynamic SQL. Not that it is impossible to do; on the contrary, but it is much simpler not to.
I'd create yet another table (let's call it table_done) which contains only one row with two columns: table_1 and table_2 whose values can be 0 (meaning: data for that table is not ready) or 1 (data ready).
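A minimal sketch of that flag table (the NUMBER(1) datatype is an assumption):
create table table_done (
    table_1 number(1) default 0,
    table_2 number(1) default 0
);

-- seed the single row both procedures will update
insert into table_done (table_1, table_2) values (0, 0);
commit;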
Furthermore, I'd create two stored procedures which look the same; the only difference is that each of them inserts rows into its own table:
create procedure p_insert_1 as
begin
  -- remove old data
  execute immediate 'truncate table table_1';
  -- mark table_1 data as not ready
  update table_done set table_1 = 0;
  -- prepare new data
  insert into table_1 (...) select ...;
  -- mark table_1 data as ready (table_done holds a single row, so update it)
  update table_done set table_1 = 1;
  commit;
end;
The 3rd, "main" procedure is the one you'd run manually. What does it do? It creates two one-time database jobs that run immediately, each starting its own p_insert procedure so that they run in parallel. The procedure then (in a loop) checks whether both columns in table_done are set to 1 and - if so - continues execution.
create procedure p_main is
  l_job_1 number;
  l_job_2 number;
  --
  l_t1_done number;
  l_t2_done number;
begin
  dbms_job.submit(l_job_1, 'begin p_insert_1; end;');
  dbms_job.submit(l_job_2, 'begin p_insert_2; end;');
  -- dbms_job jobs only start once the submitting transaction commits
  commit;

  loop
    select table_1, table_2
      into l_t1_done, l_t2_done
      from table_done;

    if l_t1_done = 1 and l_t2_done = 1 then
      -- both tables are ready; exit the loop
      exit;
    else
      -- tables aren't ready yet; wait 60 seconds and try again
      dbms_lock.sleep(60);
    end if;
  end loop;

  -- process data prepared in table_1 and table_2
end;
That's just a simplified idea; I didn't test it myself, so I apologize for any errors. Also:
- instead of dbms_job, you could choose to use the more capable dbms_scheduler (a sketch follows below)
- if you're on 18c (or later), use dbms_session.sleep instead of dbms_lock.sleep
- and so forth
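For reference, a minimal dbms_scheduler equivalent of the two job submissions (the job names are arbitrary; enabled => true makes these one-time jobs run immediately):
begin
  dbms_scheduler.create_job(
    job_name   => 'job_insert_1',
    job_type   => 'PLSQL_BLOCK',
    job_action => 'begin p_insert_1; end;',
    enabled    => true);
  dbms_scheduler.create_job(
    job_name   => 'job_insert_2',
    job_type   => 'PLSQL_BLOCK',
    job_action => 'begin p_insert_2; end;',
    enabled    => true);
end;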

Use SQL parallelism instead of process concurrency. While the words parallelism and concurrency are colloquially interchangeable, in Oracle they have different meanings. Parallelism implies that the SQL engine handles all the coordination of breaking work into little pieces, running those pieces at the same time, and then re-assembling the results at the end. Concurrency implies that the user will create multiple sessions and handle the coordination manually.
For simply creating two tables, parallelism will probably be simpler and faster than concurrency. For parallelism, you may only need to create the table in parallel. (And you probably want to reset the parallelism back to none at the end.)
CREATE TABLE TABLE1 PARALLEL 2 AS SELECT ...;
ALTER TABLE TABLE1 NOPARALLEL;
The PARALLEL 2 option instructs Oracle to run two server processes at the same time while the SQL statement is running. You can easily increase that number, but don't go too high or you'll be stealing too many resources from other sessions.
DBMS_SCHEDULER and other concurrency mechanisms are powerful and useful, but I recommend avoiding them if possible. Running and monitoring scheduler jobs will likely be much more complicated than the preceding code. (Although you may still need to occasionally monitor the parallel SQL statement using a tool like OEM SQL Monitor Reports to ensure that the server is actually using the requested parallelism.)
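If you want to verify that a statement actually ran in parallel, a quick check (a sketch; requires access to the V$ views) is the session's parallel-query statistics right after the statement finishes:
select statistic, last_query, session_total
from v$pq_sesstat
where statistic in ('Queries Parallelized', 'DML Parallelized');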

Related

UPDATE a Table with 500+ Million Records using 50 Parallel SQL*Plus Sessions with Bulk Collect LIMIT and FORALL

I am new to Oracle and I am trying to improve the performance of updating a huge table (500+ million records) - no matter what I try, performance is poor.
I am trying to UPDATE a table with 500 million records using 50 parallel SQL*Plus sessions with BULK COLLECT LIMIT and FORALL, which is taking a huge amount of time (Elapsed: 04:32:30.29).
CURSOR NIT_CURSOR IS
SELECT SA_ROWID, NEW_TRX_1, NEW_TRX_2
FROM NIT_SA
WHERE MOD(DBMS_ROWID.rowid_block_number(SA_ROWID),&2) = &1;
Note: &2 is 50, whereas &1 is the instance number passed in from the shell (0-49);
SA_ROWID is the actual rowid from the SA table.
Now, the UPDATE statement is as below:
C_COMMIT CONSTANT PLS_INTEGER := 10000;
FETCH NIT_CURSOR BULK COLLECT INTO BLK_NIT_SA LIMIT C_COMMIT;
FORALL indx IN 1 .. BLK_NIT_SA.COUNT
UPDATE SA
SET INS_TRX_ID = BLK_NIT_SA(indx).NEW_TRX_1,
UPD_TRX_ID = BLK_NIT_SA(indx).NEW_TRX_2
WHERE ROWID = BLK_NIT_SA(indx).SA_ROWID;
This script is invoked by ksh in 50 parallel SQL*Plus sessions:
typeset -A sqlFile
sqlFile[2.3.sql]=50
for counter_i in "${!sqlFile[@]}"
do
for counter_j in {0..$((${sqlFile[$counter_i]}-1))}
do
sqlplus -s ${DATABASE_CONNECT} @${SQLFILE} "${SPOOLFILE}" ${sqlFile[$counter_i]} $counter_j &
done
done
As per my understanding, since I am directly using the rowid of the SA table - which should be the fastest way of accessing a row - 50 SQL sessions should process the data much faster. What I observe instead is that as soon as I increase the number of parallel SQL sessions from 50 to 100, the update time of each process increases from 4.5 hours to 7.5 hours.
I intend to finish it in 1.5 to 2 hours; not sure if that is realistic.
Can someone please help me with above?
Updating 500+ million records row by row is not a good idea. If you can share the table structure and which columns need to be updated, we can consider the following options:
if the required new values can be calculated dynamically, we can create a view on top of the table, so that whenever we read the view we get the required values.
create a new table and insert all the data from the existing table as required (with new values for those columns), then drop the old table and rename the new one; a sketch follows.
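A minimal sketch of the second option; the column list is a placeholder (only INS_TRX_ID and UPD_TRX_ID come from the question), and NOLOGGING/PARALLEL are optional speed-ups:
create table sa_new nologging parallel 8 as
select s.some_column,                                 -- hypothetical remaining columns
       nvl(n.new_trx_1, s.ins_trx_id) as ins_trx_id,  -- keep the old value when no match
       nvl(n.new_trx_2, s.upd_trx_id) as upd_trx_id
from sa s
left join nit_sa n on n.sa_rowid = s.rowid;

drop table sa;
alter table sa_new rename to sa;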
Converting the PL/SQL into a single SQL statement and using statement parallelism should significantly improve the performance:
alter session enable parallel dml;
merge /*+ parallel(50) */ into sa using
(
select sa_rowid, new_trx_1, new_trx_2
from nit_sa
) nit_sa
on (sa.rowid = nit_sa.sa_rowid)
when matched then update set
sa.ins_trx_id = nit_sa.new_trx_1,
sa.upd_trx_id = nit_sa.new_trx_2;
The above statement will only read from the table once. Oracle will automatically divide the table into pieces called granules; there's no need to manually partition the tables.
Parallelism is easy in theory - just add a hint and the run time can improve by an order of magnitude. But getting it right in practice can be tricky. Parallelism requires Enterprise Edition, sufficient resources, a sane configuration, etc. And running parallel DML will lock the entire table until the statement is complete - no other DML can run on the table at the same time. (Although other processes can still read from the table.)
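To confirm that parallel DML is actually in effect for your session before running the merge (a sketch; requires access to v$session):
select pdml_status, pq_status
from v$session
where sid = sys_context('userenv', 'sid');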

Stored procedures vs standard select update, avoid locks

In order to retrieve an ID, I first do a select and then an update, in two consecutive queries.
The problem is that I am having problems with locked rows. I've read that putting both statements, the select and the update, in one stored procedure helps with the locks. Is this true?
The queries I run are:
select counter
from dba.counter_list
where table_name = :TableName
update dba.counter_list
set counter = :NewCounter
where table_name = :TableName
The problem is that multiple users may select the same row, and it is also possible that they update the same row.
Assumptions:
you're using Sybase ASE
your select returns a single value for counter
you may want the old counter value for some purpose other than performing the update
Consider the following update statement which should eliminate any race conditions that may occur with multiple users running your select/update logic concurrently:
declare @counter int -- change to the appropriate datatype
update dba.counter_list
set @counter = counter,     -- grab current value
    counter  = :NewCounter  -- set to new value
where table_name = :TableName
select @counter -- send previous counter value to client
the update obtains an exclusive lock on the desired row (or page/table depending on table design and locking scheme)
with an exclusive lock in place you're able to retrieve the current value and set the new value with a single statement
Whether you submit the above via a SQL batch or a stored proc call is up to you and your DBA to decide ...
if statement cache is disabled, a SQL batch will need to be compiled each time it's submitted to the dataserver
if statement cache is enabled, and you submit this SQL batch on a regular basis then there's a chance the previous query plan is still in statement/procedure cache thus eliminating the (costly) compilation step
if a copy of the previous stored proc (query) plan is not in procedure cache, then you'll incur the (costly) compilation step when loading a (proc) query plan into procedure cache
a stored proc is typically easier to replace in the event of a syntax/logic/performance issue (as opposed to editing, and possibly compiling, a front-end application)
... add your (least) favorite argument for SQL batch vs stored proc (vs prepared statement?) vs ??? ...
Is the table counter_list accessed by multiple clients concurrently?
The best practice for OLTP is to call a stored procedure that performs the update logic in one transaction.
Check that the table dba.counter_list has an index on the column table_name.
Check also that it is row-level locked; a sketch of both checks follows.
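Both checks in Sybase ASE syntax (the index name is an assumption):
create index counter_list_tname_idx on dba.counter_list (table_name)
go

alter table dba.counter_list lock datarows
go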

Same query is executing with different time intervals

I have a scenario in which one stored proc contains a set of SQL statements (a combination of joins and subqueries; the query is too large to display),
and the final result is stored in a temp table.
It is executed by a user from the frontend, or by a programmer from the backend, with specific permissions.
The problem is that there is a difference in execution time for this query:
sometimes it takes 10 minutes, sometimes 1 hour, but the average elapsed time is 10 minutes, and one common thing is that it always returns (approximately) the same number of records.
As ErikL mentioned, checking the execution plan of the query is a good start. In Oracle 11g you can also use DBMS_PROFILER. This will give you information about the offending statements. I would run it multiple times and see what the difference is between run times. First check whether DBMS_PROFILER is installed; I believe it comes as a separate package.
To start the profiler (the argument is a free-text comment that labels the run, not a procedure name):
SQL> execute dbms_profiler.start_profiler('my_run_comment');
Run your stored procedure:
SQL> exec your_procedure_name
Stop the profiler:
SQL> execute dbms_profiler.stop_profiler;
This will show you all statements in your stored procedure and their associated run times; this way you can narrow the problem down to possibly a single query that is causing the difference.
Here is the Oracle doc on DBMS_PROFILER:
Oracle DBMS PROFILER
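Once a run has finished, the per-line timings are stored in the profiler tables (created by proftab.sql). A minimal query sketch, assuming the default table names:
select u.unit_name, d.line#, d.total_occur, d.total_time
from plsql_profiler_units u
join plsql_profiler_data d
  on d.runid = u.runid
 and d.unit_number = u.unit_number
where u.runid = (select max(runid) from plsql_profiler_runs)
order by d.total_time desc;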
If you are new to Oracle, you can use dbms_output or a logging table to store intermediate execution times; that way you will know which SQL is causing the issue.
declare
  run_nbr number;
begin
  run_nbr := 1; -- or get it from a sequence
  SQL1;         -- placeholder for your first statement
  log(run_nbr, user, 'sql1', sysdate);
  SQL2;         -- placeholder for your second statement
  log(run_nbr, user, 'sql2', sysdate);
  commit;
end;
Here the log procedure is nothing but a simple insert statement into a logging table with minimal columns, say run_nbr, username, sql_name, execution_date. (The table is called exec_log below, because a table and a stand-alone procedure cannot both be named LOG in the same schema.)
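A sketch of that table (the datatypes are assumptions):
create table exec_log (
    run_nbr        number,
    username       varchar2(30),
    sql_name       varchar2(30),
    execution_date date
);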
create procedure log(p_run_nbr number, p_user varchar2, p_sql_name varchar2, p_execution_date date)
is
begin
  insert into exec_log values (p_run_nbr, p_user, p_sql_name, p_execution_date);
  -- commit; -- un-comment if you are using pragma autonomous_transaction
end;
Putting these log statements in is a little time-consuming, but it can give you an idea of the execution times. Later, once you know the issue, you simply remove/comment these lines, or take a code backup of your original procedure without the log statements and re-compile it after pinpointing the issue.
I would check the execution plan of the query; maybe there are profiles in there that are not always used.
Or if that doesn't solve it, you can also try tracing the session that calls the SP from the frontend. There's a very good explanation of tracing here: http://tinky2jed.wordpress.com/technical-stuff/oracle-stuff/what-is-the-correct-way-to-trace-a-session-in-oracle/

Query performance difference: PL/SQL FORALL insert vs. plain SQL insert

We have been using a temporary table to store intermediate results in a PL/SQL stored procedure. Could anyone tell me if there is a performance difference between doing a bulk collect insert through PL/SQL and a plain SQL insert?
Insert into [Table name] [Select query Returning huge amount of data]
or
Cursor for [Select query returning huge amount of data]
open cursor
fetch cursor bulk collect into collection
Use FORALL to perform insert
Which of the above two options is better for inserting a huge amount of temporary data?
Some experimental data for your problem (Oracle 9.2)
bulk collect
DECLARE
  TYPE t_number_table IS TABLE OF NUMBER;
  v_tab t_number_table;
BEGIN
  SELECT ROWNUM
  BULK COLLECT INTO v_tab
  FROM dual
  CONNECT BY LEVEL < 100000;

  FORALL i IN 1..v_tab.COUNT
    INSERT INTO test VALUES (v_tab(i));
END;
/
-- 2.6 sec
insert
-- test table
CREATE GLOBAL TEMPORARY TABLE test (id NUMBER)
ON COMMIT PRESERVE ROWS;

BEGIN
  INSERT INTO test
  SELECT ROWNUM FROM dual
  CONNECT BY LEVEL < 100000;
END;
/
-- 1.4 sec
direct path insert
http://download.oracle.com/docs/cd/B10500_01/server.920/a96524/c21dlins.htm
BEGIN
  INSERT /*+ append */ INTO test
  SELECT ROWNUM FROM dual
  CONNECT BY LEVEL < 100000;
END;
/
-- 1.2 sec
An insert into ... select must certainly be faster; it skips the overhead of storing the data in a collection first.
It depends on the nature of the work you're doing to populate the intermediate results. If the work can be done relatively simply in the SELECT statement for the INSERT, that will generally perform better.
However, if you have some complex intermediate logic, it may be easier (from a code maintenance point of view) to fetch and insert the data in batches using bulk collects/binds. In some cases it might even be faster.
One thing to note very carefully: the query plan used by the INSERT INTO x SELECT ... will sometimes be quite different from the one used when the query is run by itself (e.g. in a PL/SQL explicit cursor). When comparing performance, you need to take this into account.
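One way to check for such a plan change (a sketch using the test table from above; DBMS_XPLAN has been available since 9.2):
EXPLAIN PLAN FOR
  INSERT INTO test
  SELECT ROWNUM FROM dual CONNECT BY LEVEL < 100000;
SELECT * FROM TABLE(dbms_xplan.display);

EXPLAIN PLAN FOR
  SELECT ROWNUM FROM dual CONNECT BY LEVEL < 100000;
SELECT * FROM TABLE(dbms_xplan.display);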
Tom Kyte of AskTom fame has answered this question more firmly. If you are willing to do some searching, you can find the question and his response, which contains detailed testing results and explanations. He shows a PL/SQL cursor vs. PL/SQL bulk collect (including the effect of periodic commits) vs. SQL insert as select.
Insert as select wins hands down all the time, and the difference on even modest datasets is dramatic.
That said, a comment was made earlier about the complexity of intermediary computations. I can think of three situations where this would be relevant.
1) If computations require going outside of the Oracle database, then clearly a simple insert as select does not do the trick.
2) If the solution requires PL/SQL function calls, then context switching can potentially kill your query, and you may get better results with PL/SQL calling PL/SQL functions. PL/SQL was made to call SQL, but not the other way around; calling PL/SQL from SQL is expensive.
3) If the computations make the SQL code very difficult to read, then even though it may be slower, a PL/SQL bulk collect solution may be better for those other reasons.
Good luck.
When we declare a cursor explicitly, Oracle allocates a private SQL work area in memory. When a select statement returns multiple rows, they are copied from the table or view into the private SQL work area as the ACTIVE SET; its size is the number of rows that meet your search criteria. Once the cursor is opened, your pointer is placed on the first row of the ACTIVE SET. Here you can perform DML. For example, if you perform some update operation, it applies the changes to rows in the work area rather than in the table directly, so the table is not touched every time we need to update: rows are fetched once into the work area and, after the operation, the update is applied once for all operations. This reduces input/output data transfer between the database and the user.
I suggest using a PL/SQL explicit cursor: you are just going to perform the DML operation in the private work area allotted to the cursor, which will not hit database server performance during peak hours.

Bulk Insert into Oracle database: Which is better: FOR Cursor loop or a simple Select?

Which would be a better option for a bulk insert into an Oracle database?
A FOR cursor loop, like:
DECLARE
  CURSOR C1 IS SELECT * FROM FOO;
BEGIN
  FOR C1_REC IN C1 LOOP
    INSERT INTO BAR (A, B, C)
    VALUES (C1_REC.A, C1_REC.B, C1_REC.C);
  END LOOP;
END;
or a simple select, like:
INSERT INTO BAR(A,
B,
C)
(SELECT A,
B,
C
FROM FOO);
Any specific reason either one would be better ?
I would recommend the select option because cursor loops take longer.
Also, the select version is much easier to understand for anyone who has to modify your query.
The general rule-of-thumb is, if you can do it using a single SQL statement instead of using PL/SQL, you should. It will usually be more efficient.
However, if you need to add more procedural logic (for some reason), you might need to use PL/SQL, but you should use bulk operations instead of row-by-row processing. (Note: in Oracle 10g and later, your FOR loop will automatically use BULK COLLECT to fetch 100 rows at a time; however your insert statement will still be done row-by-row).
e.g.
DECLARE
  TYPE tA IS TABLE OF FOO.A%TYPE INDEX BY PLS_INTEGER;
  TYPE tB IS TABLE OF FOO.B%TYPE INDEX BY PLS_INTEGER;
  TYPE tC IS TABLE OF FOO.C%TYPE INDEX BY PLS_INTEGER;
  rA tA;
  rB tB;
  rC tC;
BEGIN
  SELECT * BULK COLLECT INTO rA, rB, rC FROM FOO;
  -- (do some procedural logic on the data?)
  FORALL i IN rA.FIRST..rA.LAST
    INSERT INTO BAR (A, B, C)
    VALUES (rA(i), rB(i), rC(i));
END;
The above has the benefit of minimising context switches between SQL and PL/SQL. Oracle 11g also has better support for tables of records so that you don't have to have a separate PL/SQL table for each column.
Also, if the volume of data is very great, it is possible to change the code to process the data in batches; a sketch follows.
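A minimal sketch of the batched, record-based variant (assuming BAR has the same column list as FOO; the LIMIT of 1000 is an arbitrary batch size):
DECLARE
  TYPE foo_tab IS TABLE OF FOO%ROWTYPE INDEX BY PLS_INTEGER;
  r foo_tab;
  CURSOR c IS SELECT * FROM FOO;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO r LIMIT 1000;
    EXIT WHEN r.COUNT = 0;
    -- (procedural logic on this batch?)
    FORALL i IN 1..r.COUNT
      INSERT INTO BAR VALUES r(i); -- record-based FORALL insert (9iR2 and later)
  END LOOP;
  CLOSE c;
END;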
If your rollback/undo segment can accommodate the size of the transaction, then option 2 is better. Option 1 is useful if you do not have the rollback capacity needed and have to break the large insert into smaller commits, so you don't get "rollback/undo segment too small" errors.
A simple insert/select like your 2nd option is far preferable. Each insert in the 1st option requires a context switch from PL/SQL to SQL. Run each with trace/tkprof and examine the results.
If, as Michael mentions, your rollback cannot handle the statement, then have your DBA give you more. Disk is cheap, while partial results that come from inserting your data in multiple passes are potentially quite expensive. (There is almost no undo associated with an insert.)
I think one important piece of information is missing from this question:
How many records will you insert?
If from 1 to approx. 10,000, then you should use the single SQL statement (as others said, it is easy to understand and easy to write).
If from approx. 10,000 to approx. 100,000, then you should use a cursor, but you should add logic to commit every 10,000 records.
If from approx. 100,000 to millions, then you should use bulk collect for better performance.
As you can see from the other answers, there are a lot of options available. If you are just doing < 10k rows you should go with the second option.
In short, for approx. > 10k up to say < 100k, it is kind of a gray area. A lot of old geezers will bark at big rollback segments. But honestly, hardware and software have made amazing progress to where you may be able to get away with option 2 for a lot of records if you only run the code a few times. Otherwise you should probably commit every 1k-10k or so rows. Here is a snippet that I use. I like it because it is short and I don't have to declare a cursor, and (in 10g and later) the implicit FOR loop fetches in bulk behind the scenes.
begin
  for r in (select rownum rn, t.* from foo t) loop
    insert into bar (A, B, C) values (r.A, r.B, r.C);
    if mod(r.rn, 1000) = 0 then
      commit;
    end if;
  end loop;
  commit;
end;
I found this link on the Oracle site that illustrates the options in more detail.
You can use:
Bulk collect along with FORALL, which is called bulk binding.
The PL/SQL FORALL operator can speed up simple table inserts dramatically (often quoted as up to 30x).
BULK COLLECT and Oracle FORALL together are known as bulk binding. Bulk binds are a PL/SQL technique where, instead of executing multiple individual SELECT, INSERT, UPDATE or DELETE statements to retrieve data from, or store data in, a table, all of the operations are carried out at once, in bulk. This avoids the context switching you get when the PL/SQL engine has to pass over to the SQL engine, then back to the PL/SQL engine, and so on, when you access rows one at a time. To do bulk binds with INSERT, UPDATE and DELETE statements, you enclose the SQL statement within a PL/SQL FORALL statement. To do bulk binds with SELECT statements, you include the BULK COLLECT clause in the SELECT statement instead of using INTO.
It improves performance.
I do neither for a daily complete reload of data. For example, say I am loading my Denver site. (There are other strategies for near-real-time deltas.)
I use a CREATE TABLE ... AS SELECT, as I have found it is almost as fast as a bulk load.
For example, below a create table statement is used to stage the data, casting the columns to the correct datatypes needed:
CREATE TABLE sales_dataTemp AS
SELECT
    cast(column1 AS Date)   AS SALES_QUARTER,
    cast(sales   AS number) AS SALES_IN_MILLIONS,
    ....
FROM
    TABLE1;
This staging table mirrors my target table's structure exactly, which is list-partitioned by site.
I then do a partition swap with the DENVER partition, and I have a new data set; a sketch follows.
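A sketch of that swap (the target table and partition names are assumptions; WITHOUT VALIDATION skips the check that every staged row belongs in the partition):
ALTER TABLE sales_data
  EXCHANGE PARTITION denver
  WITH TABLE sales_dataTemp
  WITHOUT VALIDATION;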