Oracle SQL and PL/SQL: how to minimize the execution time of retrieving members of an object (returned by a user-defined function)

I wrote a function to get a number of values in an Oracle view. As functions can't return more than one value, I used an object type (with eight number attributes). This works, but not well.
The execution time of a select query (and of selecting from a view based on this query) is proportional to the number of attributes retrieved, i.e.:
retrieving 1 attribute takes 1 second (the same as retrieving the WHOLE object, but the object value is unusable for a report),
retrieving 2 attributes takes 2 seconds,
and so on...
It looks as if Oracle executes the PL/SQL function once for every attribute of the returned object.
I don't think a function returning a varray(8) of numbers will solve the problem either: the eight implicit calls would just be replaced by eight explicit subqueries. Can anybody solve this problem? (Other than by rewriting it to use a function that returns one string, which I will try myself now...)
Here is the type declaration:
create or replace type "ARD"."PAY_FINE_FR_12_" AS object
(fed1 number
, reg1 number
, fed_nach number
, reg_nach number
, fed_upl number
, reg_upl number
, fed2 number
, reg2 number);

I will assume you have given meaningful names to your type's attributes. In which case you are returning not eight numbers but four pairs of numbers. This suggests a possible way of improving things. Whether it could actually solve your problem will depend on the precise details of your situation (which you have not provided).
Here is a type representing those number pairs, and a nested table type we can use for array processing.
create or replace type pay_pair as object
( pay_cat varchar2(4)
, fed number
, reg number )
/
create or replace type pay_pair_nt as table of pay_pair
/
This is a function which populates an array with four pairs of numbers. In the absence of any actual business rule I have plumped for the simplest possible example.
create or replace function get_pay_pairs
  return pay_pair_nt
is
  return_value pay_pair_nt;
begin
  select
    pay_pair (
      -- map the category code to a short label (at most 4 characters)
      case col1
        when 1 then 'one'
        when 2 then 'nach'
        when 3 then 'upl'
        when 4 then 'two'
        else null
      end
      , fed
      , pay )
  bulk collect into return_value
  from v23;
  return return_value;
end;
/
If you need the signature of the original type you can rewrite your function like this:
create or replace function get_pay_fine
  return PAY_FINE_FR_12_
is
  -- an object must be initialised before its attributes can be assigned
  return_value PAY_FINE_FR_12_ :=
    PAY_FINE_FR_12_(null, null, null, null, null, null, null, null);
  l_array pay_pair_nt;
begin
  l_array := get_pay_pairs;
  -- loop over however many pairs came back (four are expected)
  for i in 1 .. l_array.count loop
    case l_array(i).pay_cat
      when 'one' then
        return_value.fed1 := l_array(i).fed;
        return_value.reg1 := l_array(i).reg;
      when 'nach' then
        return_value.fed_nach := l_array(i).fed;
        return_value.reg_nach := l_array(i).reg;
      when 'upl' then
        return_value.fed_upl := l_array(i).fed;
        return_value.reg_upl := l_array(i).reg;
      else
        return_value.fed2 := l_array(i).fed;
        return_value.reg2 := l_array(i).reg;
    end case;
  end loop;
  return return_value;
end;
/
I'll repeat, this is a demonstration of available techniques rather than a proposed solution. The crux is how your view supplies the values.

I wrote the procedure, only one column is filled in, tell me where the error is?

I tried to write a procedure that fills a table with random data using a pseudo-random sequence formula, but only one column is filled. Please help me find my error. Here is my code:
CREATE OR REPLACE PROCEDURE pr_zakaz2(Номер integer, Сумма integer)
AS
$$
BEGIN
INSERT INTO Заказ(Номер, Сумма)
SELECT Заказ(Номер, Сумма) FROM generate_series(1, 100000), i WHERE
result = next * 1103515245+12345;
END;
$$ language 'plpgsql';
I think the syntax you want is:
INSERT INTO Заказ(Номер, Сумма)
SELECT i, (next * 1103515245+12345) % 101
FROM generate_series(1, 100000) gs(i);
END;
Note the % 101, which is normally used in this case. It limits the result to a finite range, in this case 0 to 100. Also, the two numbers used would typically be prime (or at least relatively prime), and numbers ending in 5 are not.
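To make the formula concrete, here is a minimal Python sketch of the same kind of generator. The multiplier and increment are the ones from the question; the modulus 2**31, the seed, and the function name lcg are assumptions made only for illustration, since the original code states none of them.

```python
# Sketch of a linear congruential generator (LCG).
# MULTIPLIER and INCREMENT come from the formula in the question;
# MODULUS and the seed are assumptions, the original states neither.
MULTIPLIER = 1103515245
INCREMENT = 12345
MODULUS = 2**31


def lcg(seed, count):
    """Yield `count` pseudo-random values in the range 0..100."""
    state = seed
    for _ in range(count):
        # each new state is computed from the previous one
        state = (state * MULTIPLIER + INCREMENT) % MODULUS
        # same range reduction as `% 101` in the SQL above
        yield state % 101


values = list(lcg(seed=1, count=5))
print(values)
```

The point of the sketch is that each value depends on the previous state, which is why the SQL needs a defined starting value for next before the formula can produce anything.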

How to speed up custom window function with arrays and loops in PostgreSQL?

I'm currently learning UDFs and wrote the PostgreSQL UDF below to calculate the mean absolute deviation (MAD). It is the average absolute difference between each value and the mean, over any window.
In python pandas/numpy, to find the MAD, we could write something like this:
series_mad = abs(series - series.mean()).mean()
Where series is a set of numbers and series_mad is a single numeric value representing the MAD of the series.
I'm trying to write this in PostgreSQL using window functions and a UDF. So far, this is what I have:
CREATE TYPE misc_tuple AS (
arr_store numeric[],
ma_period integer
);
CREATE OR REPLACE FUNCTION mad_func(prev misc_tuple, curr numeric, ma_period integer)
RETURNS misc_tuple AS $$
BEGIN
IF curr is null THEN
RETURN (null::numeric[], -1);
ELSEIF prev.arr_store is null THEN
RETURN (ARRAY[curr]::numeric[], ma_period);
ELSE
-- accumulate new values in array
prev.arr_store := array_append(prev.arr_store, curr);
RETURN prev;
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION mad_final(prev misc_tuple)
RETURNS numeric AS $$
DECLARE
total_len integer;
count numeric;
mad_val numeric;
mean_val numeric;
BEGIN
count := 0;
mad_val := 0;
mean_val := 0;
total_len := array_length(prev.arr_store, 1);
-- first loop to find the mean of the series
FOR i IN greatest(1,total_len-prev.ma_period+1)..total_len
LOOP
mean_val := mean_val + prev.arr_store[i];
count := count + 1;
END LOOP;
mean_val := mean_val/NULLIF(count,0);
-- second loop to subtract mean from each value
FOR i IN greatest(1,total_len-prev.ma_period+1)..total_len
LOOP
mad_val := mad_val + abs(prev.arr_store[i]-mean_val);
END LOOP;
RETURN mad_val/NULLIF(count, 0);
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE AGGREGATE mad(numeric, integer) (
SFUNC = mad_func,
STYPE = misc_tuple,
FINALFUNC = mad_final
);
This is how I'm testing the performance:
-- find rolling 12-period MAD
SELECT x,
mad(x, 12) OVER (ROWS 12-1 PRECEDING)
FROM generate_series(0,1000000) as g(x);
Currently, it takes ~45-50 secs on my desktop (i5 4670, 3.4 GHz, 16 GB RAM). I'm still learning UDFs, so I'm not sure what else I could do to my function to make it faster. I have a few other similar UDFs - but ones which don't use arrays and they take <15 secs on the same 1m rows. My guess is maybe I'm not efficiently looping the arrays or something could be done about the 2 loops in the UDF.
Are there any changes I can make to this UDF to make it faster?
Your example code does not work: you have an extra comma in the type definition, and you use an undefined variable cnt in one of the functions.
Why are you specifying 12 as both an argument to the aggregate itself, and in the ROWS PRECEDING? That seems redundant.
Your comparison to numpy doesn't seem very apt, as that is not a sliding window function.
"I have a few other similar UDFs - but ones which don't use arrays and they take <15 secs on the same 1m rows"
Are they also used as sliding window functions? Are they also written in plpgsql? Could you show one and its usage?
pl/pgsql is not generally a very efficient language, especially not when manipulating large arrays. Although in your usage, the arrays never get very large, so I would not expect that to be particularly a problem.
One way to make it more efficient would be to write the code in C rather than pl/pgsql, using INTERNAL datatype rather than an SQL composite type.
Another way to improve this particular usage (a large number of windows each of which is small) might be to implement the MINVFUNC function and friends for this aggregate, so that it doesn't have to keep restarting the aggregation from scratch for every row.
Here is an example inverse function, which doesn't change the output at all, but does cut the run time in about half:
CREATE OR REPLACE FUNCTION mad_invfunc(prev misc_tuple, curr numeric, ma_period integer)
RETURNS misc_tuple AS $$
BEGIN
-- remove prev value
prev.arr_store := prev.arr_store[2:];
RETURN prev;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE AGGREGATE mad(numeric, integer) (
SFUNC = mad_func,
STYPE = misc_tuple,
FINALFUNC = mad_final,
MSFUNC = mad_func,
MSTYPE = misc_tuple,
MFINALFUNC = mad_final,
MINVFUNC = mad_invfunc
);
If I change the type from numeric to double precision everywhere, that cuts the run time in half again. So while the loops over the array might not be terribly efficient, when using only 12-member windows they are not the main bottleneck.
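For checking the aggregate's output on a small input, a reference implementation outside the database can help. The following Python sketch mirrors the moving-aggregate idea: each step adds the new value and drops the oldest one instead of rebuilding the window. The function name rolling_mad and the sample data are assumptions for illustration, not part of the original code.

```python
from collections import deque


def rolling_mad(values, period):
    """Return the MAD of the last `period` values at each position."""
    window = deque()
    result = []
    for value in values:
        window.append(value)      # forward transition: add the new row
        if len(window) > period:
            window.popleft()      # inverse transition: drop the oldest row
        # final step: like mad_final, this still scans the whole window
        mean = sum(window) / len(window)
        result.append(sum(abs(v - mean) for v in window) / len(window))
    return result


mads = rolling_mad([0, 1, 2, 3, 4, 5], period=3)
print(mads)
```

As in the SQL version, only the window maintenance becomes incremental; the final calculation is still proportional to the window size, which is acceptable for 12-member windows.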

remove_duplicate_vertices in Oracle Spatial converted polygon/multipolygon types to collections, How do I convert these back to multipolygons?

I have run REMOVE_DUPLICATE_VERTICES on a spatial table that contained polygons and multipolygons. Now there are 6 geometries tagged as Collection (2004). How can I safely change these back to multipolygons?
Example of one (with a shortened coordinate array):
MDSYS.SDO_GEOMETRY(2004,8307,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1,5,1003,1,13,1003,1,21,1003,1,279,2003,1,581,2003,1,961,2003,1,1551,2003,1,2073,2003,1,2215,2003,1,2277,2003,1,2349,2003,1,2379,2003,1,2671,2003,1,2847,2003,1,3033,2003,1,3145,2003,1,3263,2003,1,3271,2003,1,3403,2003,1),MDSYS.SDO_ORDINATE_ARRAY(-89.60549292,-1.30359069399998,...,-89.6571104179999,-0.900517332999925))
One way to do this would be:
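create or replace function polygons_only (ingeom in mdsys.sdo_geometry)
return mdsys.sdo_geometry
is
  -- polygons_only is a hypothetical name; ingeom is the geometry to filter
  wgeom mdsys.sdo_geometry;
  outgeom mdsys.sdo_geometry;
begin
  for i in 1 .. nvl(sdo_util.getNumElem(ingeom), 0)
  loop
    wgeom := sdo_util.extract(ingeom, i, 0);
    -- keep only polygon (3) and multipolygon (7) elements
    if wgeom.get_gtype() in (3,7) then
      outgeom := sdo_util.append(outgeom, wgeom);
    end if;
  end loop;
  return outgeom;
end;
/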
It loops through the geometry's elements and appends each one that is a polygon to a geometry variable. You can put it in a function which you 'feed' with your geometry (which should be a valid geometry) and which outputs the filtered outgeom.
I use this approach because it is easy to put logic in the loop to filter out, e.g., elements under a certain area. I use it in production, with millions of geometries.
If you need further assistance, let me know.

Number formatting in Oracle using TO_CHAR

What is the proper way to format numbers in Oracle stored procedures?
I need to display currency fields with 2 decimals.
Expected output is as follows:
0 > 0.00
5 > 5.00
1253.6 > 1253.60
1253.689 > 1253.69
Below worked for me:
select to_char(9876.23 , 'fm999990.00') from dual;
But this has the issue of hard coding a bunch of 9s. If I give a larger number it will be displayed as "##############"
Is there any other way I can do this?
I need to display currency fields with 2 decimals.
Ensure you use the NUMBER data type with a scale and precision appropriate to the data, rather than using NUMBER without scale and precision. If you are going to be storing dollars, euros, pounds, etc., then note that the Gross World Product was of the order of $100,000,000,000,000 in 2014. Let's assume that you are not going to be dealing with more than this [citation needed]; then your currency column can be:
NUMBER(17,2)
If you get a value that is bigger than that then you need to perform a sanity check on your data and think whether an amount bigger than the world's gross product makes sense. If you are going to store the values as, for example, Yen or Zimbabwe dollars then adjust the scale appropriately.
You could even define a sub-type in a package as:
CREATE PACKAGE currencies_pkg IS
SUBTYPE currency_type IS NUMBER(17,2);
FUNCTION formatCurrency(
amount IN CURRENCY_TYPE
) RETURN VARCHAR2;
END;
/
And your code to format it can be:
CREATE PACKAGE BODY currencies_pkg IS
FUNCTION formatCurrency(
amount IN CURRENCY_TYPE
) RETURN VARCHAR2
IS
BEGIN
RETURN TO_CHAR( amount, 'FM999999999999990D00' );
END;
END;
/
Then if you reference that sub-type in your stored procedures/packages you will not be able to exceed the maximum size of the currency data type without an exception being raised. The format model for displaying the value only needs to be defined in a single place and since the input is limited to the currency sub-type, then the formatting function will never exceed the imposed scale/precision and cannot output #s.
CREATE PROCEDURE your_procedure(
in_value1 IN ACCOUNTS_TABLE.ACCOUNT_BALANCE%TYPE,
in_value2 IN ACCOUNTS_TABLE.ACCOUNT_BALANCE%TYPE
)
IS
v_value CURRENCIES_PKG.CURRENCY_TYPE;
BEGIN
-- Do something
v_value := in_value1 + in_value2;
-- Output formatted value
DBMS_OUTPUT.PUT_LINE( CURRENCIES_PKG.formatCurrency( v_value ) );
END;
/
Why is "hardcoding a bunch of 9s" an issue? (It's how you need to do it if you plan to use TO_CHAR)
select to_char(9876.23 , 'fm9999999999999999999990D00') from dual;
P.S. You might want to consider using D rather than . (not every country uses . as a decimal separator; D is language sensitive and will use the appropriate symbol).

oracle stored procedure - local variable behaving mysteriously

This is how I am declaring the local variables:
team_counter number (38) := 0;
username varchar2(50) := '';
This is how I am trying to use and inspect their values after a SELECT INTO statement:
dbms_output.put_line(team_counter||'.'||username);
if team_counter< 30 AND username <>'' then
begin
dbms_output.put_line('yuhj');
end;
end if;
The second output is not being printed!
The first output is being printed as '1.tuser' which I was expecting.
This is because you're comparing a string with a zero-length string using an inequality operator.
Oracle treats zero-length strings as NULL, and any comparison with NULL that doesn't use the NULL-specific conditions (IS NULL, IS NOT NULL) evaluates to NULL rather than true, so the IF branch is never taken. To quote:
Oracle Database currently treats a character value with a length of
zero as null. However, this may not continue to be true in future
releases, and Oracle recommends that you do not treat empty strings
the same as nulls.
Simply put, this means that your IF statement should be:
if team_counter < 30 and username is not null then
...
As an additional note there's no need for the begin ... end around the dbms_output.put_line. As you're not catching any exceptions explicitly related to this call or declaring additional variables etc there's no real need.
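The difference between the two conditions can be seen outside Oracle as well. The following Python sketch uses SQLite purely as a stand-in: SQLite does not treat '' as NULL, so a NULL literal is written where Oracle would see the empty string. That substitution is an assumption of the sketch, not Oracle behaviour being demonstrated directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite, unlike Oracle, keeps '' distinct from NULL, so a NULL literal
# stands in here for what Oracle sees when the code says ''.
inequality = conn.execute("SELECT 'tuser' <> NULL").fetchone()[0]
null_check = conn.execute("SELECT 'tuser' IS NOT NULL").fetchone()[0]

print(inequality)  # None: the comparison is unknown, not true
print(null_check)  # 1: the IS NOT NULL test is true
conn.close()
```

An IF statement only runs its branch when the condition is true, so the unknown result of the inequality skips it, while the IS NOT NULL test behaves as intended.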