SQL optimization

Right now I am facing an optimization problem.
I have a list of articles (17,000+) and some of them are inactive. The list was provided by the client in an Excel file, and he asked me to resend it (obviously with only the active articles).
For this, I have to filter the production database based on the list provided by the customer. Unfortunately, I cannot load the list into a separate table in production and then join it with the article master table, but I was able to load it into a UAT database that is linked to the production one.
The production article master data contains 200,000,000+ rows, but by filtering I can reduce that to around 80,000,000.
In order to retrieve only the active articles from production, I was thinking of using collections, but the last filter is taking far too long.
Here is my code:
declare
  type t_art is table of number index by pls_integer;
  v_art      t_art;
  v_filtered t_art;
  idx        number := 0;
begin
  -- load the article list from UAT over the database link
  for i in (select * from test_table@UAT_DATABASE)
  loop
    idx := idx + 1;
    v_art(idx) := i.art_nr;
  end loop;
  -- filter each article against the production master data
  for j in v_art.first .. v_art.last
  loop
    select distinct art_nr
    bulk collect into v_filtered -- note: overwritten on every iteration
    from production_article_master_data
    where status = 0 -- status is active
    and sperr_stat in (0, 2)
    and trunc(valid_until) >= trunc(sysdate)
    and art_nr = v_art(j);
  end loop;
end;
Explanation: from the UAT database, via the DB link, I am loading the list into an associative array in production (v_art). Then, for each value in v_art (17,000+ distinct articles), I am filtering against the production article master data, returning only the valid articles (there might be 6,000-8,000) into a second associative array.
Unfortunately, this filtering action is taking hours.
Can someone give me some hints on how to improve this in order to decrease the execution time, please?
Thank you,

Just use SQL and join the two tables:
select distinct p.art_nr
from production_article_master_data p
INNER JOIN test_table@UAT_DATABASE t
  ON ( p.art_nr = t.art_nr )
where p.status = 0 -- status is active
and p.sperr_stat in (0, 2)
and trunc(p.valid_until) >= trunc(sysdate)
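If the distributed join performs badly, it can sometimes help to pull the remote list across the link once before joining; a sketch of that idea, using the same tables and the (undocumented but widely used) materialize hint:
with remote_list as (
  select /*+ materialize */ art_nr
  from test_table@UAT_DATABASE
)
select distinct p.art_nr
from production_article_master_data p
inner join remote_list t
  on ( p.art_nr = t.art_nr )
where p.status = 0 -- status is active
and p.sperr_stat in (0, 2)
and trunc(p.valid_until) >= trunc(sysdate)
With only 17,000 remote rows against 200,000,000+ local ones, the work should stay on the production side; materializing the CTE just ensures the remote table is fetched only once.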
If you have to do it in PL/SQL then:
CREATE OR REPLACE TYPE numberlist IS TABLE OF NUMBER;
/

declare
  -- If you are using Oracle 12c you should be able to declare the
  -- type in the PL/SQL block. In earlier versions you will need to
  -- declare it in the SQL scope instead.
  -- TYPE numberlist IS TABLE OF NUMBER;
  v_art      NUMBERLIST;
  v_filtered NUMBERLIST;
begin
  select art_nr
  BULK COLLECT INTO v_art
  from test_table@UAT_DATABASE;

  select distinct art_nr
  bulk collect into v_filtered
  from production_article_master_data
  where status = 0 -- status is active
  and sperr_stat in (0, 2)
  and trunc(valid_until) >= trunc(sysdate)
  and art_nr MEMBER OF v_art;
end;
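A side note on MEMBER OF: it is evaluated as a per-row check and cannot drive an index on art_nr. A sketch of an alternative, assuming the same SQL-level NUMBERLIST type, is to unnest the collection with TABLE() and join to it:
declare
  v_art      NUMBERLIST;
  v_filtered NUMBERLIST;
begin
  select art_nr
  bulk collect into v_art
  from test_table@UAT_DATABASE;

  select distinct m.art_nr
  bulk collect into v_filtered
  from production_article_master_data m
  join table(v_art) t on ( m.art_nr = t.column_value )
  where m.status = 0 -- status is active
  and m.sperr_stat in (0, 2)
  and trunc(m.valid_until) >= trunc(sysdate);
end;
The TABLE() form lets the optimizer treat the 17,000 values as a row source, so it can choose an index or hash join against the master table.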

Related

How to store multiple rows in a variable in pl/sql function?

I'm writing a PL/SQL function. I need to select multiple rows with a select statement:
SELECT pel.ceid
FROM pa_exception_list pel
WHERE trunc(pel.creation_date) >= trunc(SYSDATE-7)
If I use:
SELECT pel.ceid
INTO v_ceid
it only stores one value, but I need to store all the values that this select returns. Given that this is a function, I can't just use a plain select, because I get the error "INTO - is expected."
You can use a collection type to do that. The example below should work for you:
DECLARE
  TYPE v_array_type IS VARRAY (10) OF NUMBER;
  var v_array_type;
BEGIN
  SELECT x
  BULK COLLECT INTO var
  FROM (
    SELECT 1 x FROM dual
    UNION
    SELECT 2 x FROM dual
    UNION
    SELECT 3 x FROM dual
  );
  FOR i IN 1 .. var.COUNT LOOP
    dbms_output.put_line(var(i));
  END LOOP;
END;
So in your case, it would be something like:
select pel.ceid
BULK COLLECT INTO <the collection variable you create>
from pa_exception_list pel
where trunc(pel.creation_date) >= trunc(sysdate - 7);
If you really need to store multiple rows, look at the BULK COLLECT INTO statement and its examples; but maybe a cursor FOR loop with row-by-row processing would be the better choice.
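A minimal sketch of both approaches, using the hypothetical type name t_ceid_tab (the %TYPE anchors it to the column's real type):
DECLARE
  TYPE t_ceid_tab IS TABLE OF pa_exception_list.ceid%TYPE;
  l_ceids t_ceid_tab;
BEGIN
  -- BULK COLLECT: one round trip, every row lands in the collection
  SELECT pel.ceid
  BULK COLLECT INTO l_ceids
  FROM pa_exception_list pel
  WHERE trunc(pel.creation_date) >= trunc(SYSDATE - 7);

  -- cursor FOR loop: rows are processed one at a time as they are fetched
  FOR rec IN ( SELECT pel.ceid
               FROM pa_exception_list pel
               WHERE trunc(pel.creation_date) >= trunc(SYSDATE - 7) )
  LOOP
    dbms_output.put_line(rec.ceid);
  END LOOP;
END;
/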
You may store everything in a %ROWTYPE variable and show whichever columns you want (assuming ceid is your primary key column, and col1 and col2 are some other columns of your table):
set serveroutput on;

declare
  l_exp pa_exception_list%rowtype;
begin
  for c in ( select *
             from pa_exception_list pel
             where trunc(pel.creation_date) >= trunc(SYSDATE - 7)
           ) -- select multiple rows
  loop
    select *
    into l_exp
    from pa_exception_list
    where ceid = c.ceid; -- fetch exactly one row (ceid is the primary key)
    dbms_output.put_line(l_exp.ceid||' - '||l_exp.col1||' - '||l_exp.col2); -- show the results
  end loop;
end;
/
SET SERVEROUTPUT ON

BEGIN
  FOR rec IN (
    -- an implicit cursor is created here
    SELECT pel.ceid AS ceid
    FROM pa_exception_list pel
    WHERE trunc(pel.creation_date) >= trunc(SYSDATE - 7)
  )
  LOOP
    dbms_output.put_line(rec.ceid);
  END LOOP;
END;
/
Notes from here:
In this case, the cursor FOR LOOP declares, opens, fetches from, and
closes an implicit cursor. However, the implicit cursor is internal;
therefore, you cannot reference it.
Note that Oracle Database automatically optimizes a cursor FOR LOOP to
work similarly to a BULK COLLECT query. Although your code looks as if
it fetched one row at a time, Oracle Database fetches multiple rows at
a time and allows you to process each row individually.
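Written out by hand, that optimization is roughly equivalent to the following sketch (the 100-row array fetch matches what the documentation describes; treat it as an illustration rather than the exact internal mechanism):
DECLARE
  CURSOR c_exc IS
    SELECT pel.ceid
    FROM pa_exception_list pel
    WHERE trunc(pel.creation_date) >= trunc(SYSDATE - 7);
  TYPE t_ceid_tab IS TABLE OF pa_exception_list.ceid%TYPE;
  l_ceids t_ceid_tab;
BEGIN
  OPEN c_exc;
  LOOP
    FETCH c_exc BULK COLLECT INTO l_ceids LIMIT 100; -- array fetch of 100 rows
    EXIT WHEN l_ceids.COUNT = 0;
    FOR i IN 1 .. l_ceids.COUNT LOOP
      dbms_output.put_line(l_ceids(i)); -- rows are still handed to you one at a time
    END LOOP;
  END LOOP;
  CLOSE c_exc;
END;
/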

Performance of Local PL/SQL Array Vs SQL Call

If I'm using Oracle SQL and PL/SQL to do computations on an employee, and then selecting more data based on the result of those computations: will it be faster to select all the data I may need at once when selecting the employee (assume something like 20 rows with 5 columns each) and then use that data as a local array, or to select just the one row I will need when finished?
-- Example with multiple selects
declare
  l_employeeID number;
  l_birthday   date;
  l_horoscope  varchar2(100); -- varchar2 needs a length in PL/SQL
begin
  select employeeID into l_employeeID
  from employeeTbl
  where rownum = 1; -- rownum is a pseudocolumn, not a table column
  l_birthday := get_birthdayFromID(l_employeeID);
  select horoscope into l_horoscope
  from horoscopeTable t
  where l_birthday between t.fromDate and t.toDate;
  return l_horoscope; -- assumes this block is the body of a function
end;
-- Example with table selection, and loop iteration
declare
  l_employeeID     number;
  l_birthday       date;
  l_horoscope      varchar2(100);
  l_horoscopeDates date_table;
begin
  select employeeID,
         cast(multiset(select horoscopeDate from horoscopeTable) as date_table)
  into l_employeeID, l_horoscopeDates
  from employeeTbl
  where rownum = 1;
  l_birthday := get_birthdayFromID(l_employeeID);
  for i in 1 .. l_horoscopeDates.count - 1 loop
    if l_birthday between l_horoscopeDates(i) and l_horoscopeDates(i + 1) then
      return l_horoscopeDates(i); -- assumes this block is the body of a function
    end if;
  end loop;
  return null;
end;
I understand that I'm paying more RAM and I/O to select the additional data, but is that more efficient than incurring another context switch to call SQL, when the extra data is not significantly larger than what is needed?
Context switches are considered very expensive in Oracle. If the table doesn't contain large amounts of data, you should definitely query more data at once in order to reduce the number of times PL/SQL issues an SQL query.
Having said that, I think your question should be asked the other way around: why are you using PL/SQL at all, when your entire logic can be expressed in a single SQL statement?
For example:
select h.horoscope
from horoscopeTable h
where (select get_birthdayFromID(e.employeeID)
       from employeeTbl e
       where rownum = 1) between h.fromDate and h.toDate;
The syntax might need a few touch-ups, but the main idea seems right to me.
You can also find more observations in this piece about tuning PL/SQL code - http://logicalread.solarwinds.com/sql-plsql-oracle-dev-best-practices-mc08/

For loop update better alternative

In Oracle 11g, I am using the following in a procedure. Can someone please provide a better solution that achieves the same results?
FOR REC IN (SELECT E.EMP
            FROM EMPLOYEE E
            JOIN COMPANY C ON E.EMP = C.EMP
            WHERE C.FLAG = 'Y')
LOOP
  UPDATE EMPLOYEE SET FLAG = 'Y' WHERE EMP = REC.EMP;
END LOOP;
Is there a more efficient/better way to do this? I feel as if this method will run one update statement for each record found (Please correct me if I am wrong).
Here's the actual code in full:
create or replace
PROCEDURE ACTION_MSC AS
BEGIN
  -- ALL MIGRATED CONTACTS, CANDIDATES, COMPANIES, JOBS

  -- ALL MIGRATED CANDIDATES, CONTACTS
  FOR REC IN (SELECT DISTINCT AC.PEOPLE_HEX
              FROM ACTION AC JOIN PEOPLE P ON AC.PEOPLE_HEX = P.PEOPLE_HEX
              WHERE P.TO_MIGRATE = 'Y')
  LOOP
    UPDATE ACTION SET TO_MIGRATE = 'Y' WHERE PEOPLE_HEX = REC.PEOPLE_HEX;
  END LOOP;

  -- ALL MIGRATED COMPANIES
  FOR REC IN (SELECT DISTINCT AC.COMPANY_HEX
              FROM ACTION AC JOIN COMPANY CM ON AC.COMPANY_HEX = CM.COMPANY_HEX
              WHERE CM.TO_MIGRATE = 'Y')
  LOOP
    UPDATE ACTION SET TO_MIGRATE = 'Y' WHERE COMPANY_HEX = REC.COMPANY_HEX;
  END LOOP;

  -- ALL MIGRATED JOBS
  FOR REC IN (SELECT DISTINCT AC.JOB_HEX
              FROM ACTION AC JOIN "JOB" J ON AC.JOB_HEX = J.JOB_HEX
              WHERE J.TO_MIGRATE = 'Y')
  LOOP
    UPDATE ACTION SET TO_MIGRATE = 'Y' WHERE JOB_HEX = REC.JOB_HEX;
  END LOOP;

  COMMIT;
END ACTION_MSC;
You're right, it will do one update for each record found. Looks like you could just do:
UPDATE EMPLOYEE SET FLAG = 'Y'
WHERE EMP IN (SELECT EMP FROM COMPANY WHERE FLAG = 'Y')
AND FLAG != 'Y';
A single update will generally be faster and more efficient than multiple individual row updates in a loop; see this answer for another example. Apart from anything else, you're reducing the number of context switches between PL/SQL and SQL, which add up if you have a lot of rows. You could always benchmark this with your own data, of course.
I've added a check of the current flag state so you don't do a pointless update with no changes.
It's fairly easy to compare the approaches to see that a single update is faster than one in a loop; with some contrived data:
create table people (id number, people_hex varchar2(16), to_migrate varchar2(1));
insert into people (id, people_hex, to_migrate)
select level, to_char(level - 1, 'xx'), 'Y'
from dual
connect by level <= 100;
create table action (id number, people_hex varchar2(16), to_migrate varchar2(1));
insert into action (id, people_hex, to_migrate)
select level, to_char(mod(level, 200), 'xx'), 'N'
from dual
connect by level <= 500000;
All of these will update half the rows in the action table. Updating in a loop:
begin
  for rec in (select distinct ac.people_hex
              from action ac join people p on ac.people_hex = p.people_hex
              where p.to_migrate = 'Y')
  loop
    update action set to_migrate = 'Y' where people_hex = rec.people_hex;
  end loop;
end;
/
Elapsed: 00:00:10.87
Single update (after rollback; I've left this in a block to mimic your procedure):
begin
  update action set to_migrate = 'Y'
  where people_hex in (select people_hex from people where to_migrate = 'Y');
end;
/
Elapsed: 00:00:07.14
Merge (after rollback):
begin
  merge into action a
  using (select people_hex, to_migrate from people where to_migrate = 'Y') p
  on (a.people_hex = p.people_hex)
  when matched then update set a.to_migrate = p.to_migrate;
end;
/
Elapsed: 00:00:07.00
There's some variation between repeated runs; in particular, update and merge are usually pretty close and sometimes swap places in my environment, but both are always significantly faster than updating in a loop. You can repeat this in your own environment and with your own data spread and volumes, and you should if performance is that critical; but a single update is going to be faster than the loop. Whether you use update or merge isn't likely to make much difference.

PL/SQL loop through cursor

My problem isn't overly complicated, but I am a newbie to PL/SQL.
I need to make a selection from a COMPANIES table based on certain conditions. I then need to loop through these and convert some of the fields into a different format (I have created functions for this), and finally use this converted version to join to a reference table to get the score variable I need. So basically:
select id, total_empts, bank from COMPANIES where turnover > 100000
loop through this selection
insert into MY_TABLE (select score from REF where conversion_func(MY_CURSOR.total_emps) = REF.total_emps)
This is basically what I am looking to do. It's slightly more complicated but I'm just looking for the basics and how to approach it to get me started!
Here's the basic syntax for cursor loops in PL/SQL:
BEGIN
  FOR r_company IN (
    SELECT id,
           total_emps,
           bank
    FROM companies
    WHERE turnover > 100000
  ) LOOP
    INSERT INTO my_table
    SELECT score
    FROM ref_table ref
    WHERE ref.total_emps = conversion_func( r_company.total_emps );
  END LOOP;
END;
/
You don't need to use PL/SQL to do this:
insert into my_table
select score
from ref r
join companies c
  on r.total_emps = conversion_func(c.total_emps)
where c.turnover > 100000
If you have to do this in a PL/SQL loop as asked, then I'd ensure that you do as little work as possible. I would, however, recommend bulk collect instead of the loop (see the sketch after the loop example below).
begin
  for xx in ( select conversion_func(total_emps) as tot_emp
              from companies
              where turnover > 100000 ) loop
    insert into my_table
    select score
    from ref
    where total_emps = xx.tot_emp;
  end loop;
end;
/
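A minimal sketch of that bulk collect alternative, assuming the same tables and functions, and that conversion_func returns a number; the converted values are collected in one pass and then drive the inserts with FORALL:
declare
  type t_emp_tab is table of number;
  l_emps t_emp_tab;
begin
  -- one pass over companies: convert and collect all qualifying values
  select conversion_func(total_emps)
  bulk collect into l_emps
  from companies
  where turnover > 100000;

  -- one bulk-bound insert per collected value
  forall i in 1 .. l_emps.count
    insert into my_table
    select score
    from ref
    where total_emps = l_emps(i);
end;
/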
For either method you need an index on ref.total_emps and preferably one on companies.turnover.
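For reference, a sketch of those indexes (the index names are illustrative):
create index ref_total_emps_ix on ref (total_emps);
create index companies_turnover_ix on companies (turnover);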

Fast Update database with more than 10 million records

I am fairly new to SQL and was wondering if someone can help me.
I have a database that has around 10 million rows.
I need to make a script that finds the records that have some NULL fields and then updates them to a certain value.
The problem with a simple update statement is that it will blow the rollback space.
I was reading around that I need to use BULK COLLECT AND FETCH.
My idea was to fetch 10,000 records at a time, update, commit, and continue fetching.
I tried looking for examples on Google but I have not found anything yet.
Any help?
Thanks!!
This is what I have so far:
DECLARE
  CURSOR rec_cur IS
    SELECT DATE_ORIGIN
    FROM MAIN_TBL
    WHERE DATE_ORIGIN IS NULL;

  TYPE date_tab_t IS TABLE OF DATE;
  date_tab date_tab_t;
BEGIN
  OPEN rec_cur;
  LOOP
    FETCH rec_cur BULK COLLECT INTO date_tab LIMIT 1000;
    EXIT WHEN date_tab.COUNT() = 0;

    FORALL i IN 1 .. date_tab.COUNT
      UPDATE MAIN_TBL SET DATE_ORIGIN = '23-JAN-2012'
      WHERE DATE_ORIGIN IS NULL; -- note: the collection is never referenced here
  END LOOP;
  CLOSE rec_cur;
END;
I think I see what you're trying to do. There are a number of points I want to make about the differences between the code below and yours.
Your forall loop will not use an index. This is easy to get round by using rowid to update your table.
By committing after each forall you reduce the amount of undo needed, but you make it more difficult to roll back if something goes wrong. That said, your process could logically be restarted in the middle without detriment to your objective.
Rowids are small; collect at least 25k at a time, if not 100k.
You cannot index a null in Oracle. There are plenty of questions on Stack Overflow about this if you need more information. A function-based index on something like nvl(date_origin, 'x'), as a loose example, would increase the speed at which you select data. It also means you never actually have to touch the table itself; you only select from the index.
Your date data type seems to be a string. I've kept this, but it's not wise.
If you can get someone to increase your undo tablespace size, then a straight-up update will be quicker (a sketch follows below).
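For reference, a sketch of that straight-up update (matching the working example below, which treats date_origin as a DATE):
update main_tbl
set date_origin = to_date('23-JAN-2012','DD-MON-YYYY')
where date_origin is null;

commit;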
Assuming, as per your comments, that date_origin is a date, the index should be on something like:
nvl(date_origin,to_date('absolute_minimum_date_in_Oracle_as_a_string','yyyymmdd'))
I don't have access to a DB at the moment, but to find out the amdiOaas (absolute minimum date in Oracle as a string), run the following query:
select to_date('0001','yyyy') from dual;
It should raise a useful error for you.
Working example in PL/SQL Developer.
create table main_tbl as
  select cast( null as date ) as date_origin
  from all_objects;

-- the index expression must match the cursor predicate below,
-- otherwise the index cannot be used
create index i_main_tbl
  on main_tbl ( nvl( date_origin
                   , to_date('0001-01-01','yyyy-mm-dd') ) );

declare

  cursor c_rec is
    select rowid
    from main_tbl
    where nvl(date_origin, to_date('0001-01-01','yyyy-mm-dd'))
            = to_date('0001-01-01','yyyy-mm-dd');

  type t__rec is table of rowid index by binary_integer;
  t_rec t__rec;

begin
  open c_rec;
  loop
    fetch c_rec bulk collect into t_rec limit 50000;
    exit when t_rec.count = 0;

    forall i in t_rec.first .. t_rec.last
      update main_tbl
      set date_origin = to_date('23-JAN-2012','DD-MON-YYYY')
      where rowid = t_rec(i);

    commit;
  end loop;
  close c_rec;
end;
/