I'm migrating an Oracle database to PostgreSQL. Transferring the tables was no problem, but I'm having trouble porting a function so that it runs in Postgres. Below is the function as found in Oracle:
create or replace FUNCTION "FN_HOUR_MINUTE" (P_HOUR IN NUMBER)
RETURN NUMBER
IS
-- PL/SQL Specification
V_RETORN NUMBER(4);
-- Convert hour to minute
-- PL/SQL Block
BEGIN
V_RETORN := 60*TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(P_HOUR,'0000'),' ') ,1,2))+
TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(P_HOUR, '0000'),' '), 3,2));
RETURN V_RETORN;
EXCEPTION
WHEN OTHERS THEN
RETURN NULL ;
END;
I tried writing in postgres as follows:
CREATE OR REPLACE FUNCTION fn_hour_minute(p_hour in NUMERIC)
RETURNS NUMERIC(4) AS $$
DECLARE
v_retorn NUMERIC(4);
BEGIN
v_retorn := 60*TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(p_hour,'0000'),' ') ,1,2))+
TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(p_hour, '0000'),' '), 3,2));
RETURN v_retorn;
END;
$$ LANGUAGE plpgsql;
But it gives an error saying that the to_number function does not exist.
If you break the expression into its parts:
select TO_CHAR(1234,'0000'),
ltrim(TO_CHAR(1234,'0000')),
substr(ltrim(TO_CHAR(1234,'0000')),1,2),
substr(ltrim(TO_CHAR(1234,'0000')),3,2)
from dual;
TO_CH LTRIM SU SU
----- ----- -- --
1234 1234 12 34
you will see that this is just a very elaborate way to calculate the expression
60 * TRUNC( p_hour / 100 ) + p_hour % 100
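A minimal sketch of that simplification as a Postgres function (keeping the original's name and its NULL-on-error behaviour; an illustration, not the only way to write it):

CREATE OR REPLACE FUNCTION fn_hour_minute(p_hour numeric)
RETURNS numeric AS $$
BEGIN
    -- e.g. 0130 -> 60 * 1 + 30 = 90 minutes
    RETURN 60 * trunc(p_hour / 100) + p_hour % 100;
EXCEPTION
    WHEN OTHERS THEN
        RETURN NULL;
END;
$$ LANGUAGE plpgsql;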
You forgot to include the formatting for the TO_NUMBER section. Update the TO_NUMBER calls to TO_NUMBER(SUBSTR(LTRIM(TO_CHAR(p_hour,'0000'),' '), 1, 2), '0000') and it should work.
Related
I have strings (saved in the database as varchar) and I have to cut them just before the n'th occurrence of a delimiter.
Example input:
String: 'My-Example-Awesome-String'
Delimiter: '-'
Occurrence: 2
Output:
My-Example
I implemented this function as a fast prototype:
CREATE OR REPLACE FUNCTION find_position_delimiter(fulltext varchar, delimiter varchar, occurence integer)
RETURNS varchar AS
$BODY$
DECLARE
    result  varchar := '';
    arr     text[]  := regexp_split_to_array(fulltext, delimiter);
    word    text;
    counter integer := 0;
BEGIN
    FOREACH word IN ARRAY arr LOOP
        EXIT WHEN (counter = occurence);
        IF (counter > 0) THEN
            result := result || delimiter;
        END IF;
        result := result || word;
        counter := counter + 1;
    END LOOP;
    RETURN result;
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE;
SELECT find_position_delimiter('My-Example-Awesome-String', '-', 2);
For now it assumes that the string is not empty (guaranteed by the query in which I will call the function) and that the string contains at least one occurrence of the delimiter.
But now I need something better for a performance test. If possible, I would love to see the most universal solution, because not every user of my system works on a PostgreSQL database (a few prefer Oracle, MySQL or SQLite), but that is not the most important thing. Performance is, because in a specific search this function can be called even a few hundred times.
I didn't find anything about quickly and easily using a varchar as a table of chars and checking for occurrences of the delimiter (I could remember the positions of the occurrences and then create a substring from the first char to the n'th delimiter position minus 1). Any ideas? Are there smarter solutions?
# EDIT: yeah, I know the function will be a bit different in every database, but the body can be very similar or the same. Generality is not the main goal :) And sorry for the bad working name of the function, I just noticed it doesn't convey the right meaning.
you can try doing something based on this:
select
varcharColumnName,
INSTR(varcharColumnName,'-',1,2),
case when INSTR(varcharColumnName,'-',1,2) <> 0
THEN SUBSTR(varcharColumnName, 1, INSTR(varcharColumnName,'-',1,2) - 1)
else '...'
end
from tableName;
of course, you have to handle the "else" branch the way you want. It works on Oracle; SUBSTR and CASE are widely available, but note that the four-argument INSTR is Oracle-specific, so on other DBMSs (PostgreSQL included) you need a compatible instr() in place first.
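The PostgreSQL manual's "Porting from Oracle PL/SQL" appendix ships a full compatible instr(); a minimal shim covering just the positive start position used here could look like this (a sketch, not the official version):

CREATE OR REPLACE FUNCTION instr(str varchar, sub varchar, beg integer, occurrence integer)
RETURNS integer AS
$BODY$
DECLARE
    pos        integer := 0;    -- absolute position of the current match
    searchfrom integer := beg;  -- where the next scan starts
    rel        integer;         -- match position relative to searchfrom
BEGIN
    FOR i IN 1 .. occurrence LOOP
        rel := position(sub IN substring(str FROM searchfrom));
        IF rel = 0 THEN
            RETURN 0;           -- fewer than "occurrence" matches
        END IF;
        pos := searchfrom + rel - 1;
        searchfrom := pos + 1;
    END LOOP;
    RETURN pos;
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE;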
//edit - as a function; however, written this way it's rather hard to make it cross-DBMS
CREATE OR REPLACE FUNCTION find_position_delimiter(fulltext varchar, delimiter varchar, occurence integer)
RETURNS varchar AS
$BODY$
DECLARE
    result       varchar := '';
    delimiterPos integer := 0;
BEGIN
    delimiterPos := INSTR(fulltext, delimiter, 1, occurence);
    result := SUBSTR(fulltext, 1, delimiterPos - 1);
    RETURN result;
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE;
SELECT find_position_delimiter('My-Example-Awesome-String', '-', 2);
create or replace function trunc(string text, delimiter char, occurence int) returns text as $$
return delimiter.join(string.split(delimiter)[:occurence])
$$ language plpythonu;
# select trunc('My-Example-Awesome-String', '-', 2);
trunc
------------
My-Example
(1 row)
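If staying in plain SQL on PostgreSQL is acceptable, the same cut can be done with array slicing; a sketch (string_to_array treats the delimiter literally, unlike regexp_split_to_array):

SELECT array_to_string(
           (string_to_array('My-Example-Awesome-String', '-'))[1:2],
           '-');
-- returns 'My-Example'; the slice [1:occurrence] keeps the first n parts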
In Ruby:
-2 % 24
=> 22
In Postgres:
SELECT -2 % 24;
?column?
----------
-2
SELECT mod(-2,24);
mod
-----
-2
I can easily write one myself, but I'm curious whether Postgres has a real modulus operation, as opposed to remainder after division.
SELECT MOD(24 + MOD(-2, 24), 24);
will return 22 instead of -2
It seems that I won't find anything easier than:
CREATE OR REPLACE FUNCTION modulus(dividend numeric, divisor numeric) RETURNS numeric AS $$
DECLARE
result numeric;
BEGIN
divisor := ABS(divisor);
result := MOD(dividend, divisor);
IF result < 0 THEN
result := result + divisor;
END IF;
RETURN result;
END;
$$ LANGUAGE plpgsql;
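For example (the second call shows the effect of taking ABS() of the divisor):

SELECT modulus(-2, 24);   -- 22
SELECT modulus(-2, -24);  -- also 22, since ABS() drops the divisor's sign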
Just a belated answer to the side question that a_horse_with_no_name asked: yes, this is specified in the SQL standard, and the mentioned vendors comply.
For simple things, is it better to use the translate function on the premise that it is less CPU-intensive, or is regexp_replace the way to go?
This question comes forth from How can I replace brackets to hyphens within Oracle REGEXP_REPLACE function?
I think you're running into a simple optimization: the regexp is so expensive to compute that its result is cached in the hope that it will be reused later. If you actually use distinct strings to convert, you will see that the modest translate is naturally faster, because it is the more specialized function.
Here's my example, running on 11.1.0.7.0:
DECLARE
    TYPE t IS TABLE OF VARCHAR2(4000);
    l       t;
    l_level NUMBER := 1000;
    l_time  TIMESTAMP;
    l_char  VARCHAR2(4000);
BEGIN
    -- init
    EXECUTE IMMEDIATE 'ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL=2';
    SELECT dbms_random.STRING('p', 2000)
      BULK COLLECT INTO l
      FROM dual
    CONNECT BY LEVEL <= l_level;
    -- regex
    l_time := systimestamp;
    FOR i IN 1 .. l.count LOOP
        l_char := regexp_replace(l(i), '[]()[]', '-', 1, 0);
    END LOOP;
    dbms_output.put_line('regex :' || (systimestamp - l_time));
    -- translate
    l_time := systimestamp;
    FOR i IN 1 .. l.count LOOP
        l_char := translate(l(i), '()[]', '----');
    END LOOP;
    dbms_output.put_line('translate :' || (systimestamp - l_time));
END;
/
regex :+000000000 00:00:00.979305000
translate :+000000000 00:00:00.238773000
PL/SQL procedure successfully completed
on 11.2.0.3.0:
regex :+000000000 00:00:00.617290000
translate :+000000000 00:00:00.138205000
Conclusion: In general I suspect translate will win.
For SQL, I tested this with the following script:
set timing on
select sum(length(x)) from (
select translate('(<FIO>)', '()[]', '----') x
from (
select *
from dual
connect by level <= 2000000
)
);
select sum(length(x)) from (
select regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0) x
from (
select *
from dual
connect by level <= 2000000
)
);
and found that the performance of translate and regexp_replace was almost always the same; but it could be that the cost of the other operations is swamping the cost of the functions I'm trying to test.
Next, I tried a PL/SQL version:
set timing on
declare
x varchar2(100);
begin
for i in 1..2500000 loop
x := translate('(<FIO>)', '()[]', '----');
end loop;
end;
/
declare
x varchar2(100);
begin
for i in 1..2500000 loop
x := regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0);
end loop;
end;
/
Here the translate version takes just under 10 seconds, while the regexp_replace version takes around 0.2 seconds -- around two orders of magnitude faster(!)
Based on this result, I will be using regular expressions much more often in my performance critical code -- both SQL and PL/SQL.
I'm using Oracle 11g R2 and I need to concatenate strings (VARCHAR2, 300) from multiple rows. I'm using LISTAGG, which works great until the concatenated string reaches the limit; at that point I receive an ORA-01489: result of string concatenation is too long.
In the end, I only want the first 4000 chars of the concatenated string. How I get there doesn't matter. I will accept inefficient solutions.
Here's my query:
SELECT LISTAGG(T.NAME, ' ') WITHIN GROUP (ORDER BY NULL)
FROM T
This code works for any length of data and is fast enough:
SELECT REPLACE(
REPLACE(
XMLAGG(
XMLELEMENT("A",T.NAME)
ORDER BY 1).getClobVal(),
'<A>',''),
'</A>','[delimiter]')
FROM T
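Since the question only needs the first 4000 characters, the aggregated CLOB can be cut down in the same statement; a sketch using dbms_lob.substr (with ' ' as the delimiter, to match the original query):

SELECT dbms_lob.substr(
           REPLACE(
               REPLACE(
                   XMLAGG(XMLELEMENT("A", T.NAME) ORDER BY 1).getClobVal(),
                   '<A>', ''),
               '</A>', ' '),
           4000, 1)   -- dbms_lob.substr(clob, amount, offset) returns VARCHAR2
FROM T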
You can either use the built-in (but deprecated) STRAGG function
select sys.stragg(distinct name) from t
(please note that distinct seems to be necessary to avoid duplicates)
or define your own aggregation function / type:
CREATE OR REPLACE TYPE "STRING_AGG_TYPE" as object
(
total varchar2(4000),
static function ODCIAggregateInitialize(sctx IN OUT string_agg_type) return number,
member function ODCIAggregateIterate(self IN OUT string_agg_type,
value IN varchar2) return number,
member function ODCIAggregateTerminate(self IN string_agg_type,
returnValue OUT varchar2,
flags IN number) return number,
member function ODCIAggregateMerge(self IN OUT string_agg_type,
ctx2 IN string_agg_type) return number
);
CREATE OR REPLACE TYPE BODY "STRING_AGG_TYPE" is
static function ODCIAggregateInitialize(sctx IN OUT string_agg_type) return number is
begin
sctx := string_agg_type(null);
return ODCIConst.Success;
end;
member function ODCIAggregateIterate(self IN OUT string_agg_type,
value IN varchar2) return number is
begin
-- prevent buffer overflow for more than 4,000 characters
if nvl(length(self.total),
0) + nvl(length(value),
0) < 4000 then
self.total := self.total || ';' || value;
end if;
return ODCIConst.Success;
end;
member function ODCIAggregateTerminate(self IN string_agg_type,
returnValue OUT varchar2,
flags IN number) return number is
begin
returnValue := ltrim(self.total,
';');
return ODCIConst.Success;
end;
member function ODCIAggregateMerge(self IN OUT string_agg_type,
ctx2 IN string_agg_type) return number is
begin
self.total := self.total || ctx2.total;
return ODCIConst.Success;
end;
end;
CREATE OR REPLACE FUNCTION stragg(input varchar2 )
RETURN varchar2
PARALLEL_ENABLE AGGREGATE USING string_agg_type;
and use it like this:
select STRAGG(name) from t
I believe this approach was originally proposed by Tom Kyte (at least, that's where I got it from - AskTom: StringAgg).
Maybe it will help you:
substr(string, 1, 4000)
EDIT:
or try
SELECT [column], rtrim(
xmlserialize(content
extract(
xmlagg(xmlelement("n", (T.NAME||',') order by [column])
, '//text()'
)
)
, ','
) as list
FROM [table]
GROUP BY [column]
;
This is the drawback of the LISTAGG function: it does not limit the length of the string it generates. To work around that, you take the cumulative sum of the lengths and filter on it.
A worked example on the emp table:
select listagg(ename, ' ') within group (order by null)
from
(
    select ename,
           sum(length(ename) + 1)
               over (order by ename rows between unbounded preceding and current row) lngth
    from emp
)
where lngth <= 4000;
But this will not give a perfect result, because if you look at the inner query, it generates a column with ename and its cumulative length, as shown below:
ename     lngth
===================
gaurav        6
rohan        11
:             :
:             :
garima     3996
anshoo     4002
===================
So the query above will give you the result up to garima, not up to anshoo, because the listagg is filtered on the cumulative lngth column of the inner query.
I have a query where I need to call a SQL function to format a particular column. The formatting needed is very similar to formatting a phone number, i.e. changing 1234567890 into (123)456-7890.
I've read that calling a function from a select statement can be a performance killer, and that was somewhat borne out in my situation: the query took more than three times as long, and I did not think the function would add that much. The function runs in linear time but does use loops. To give an idea of the size of the database, this particular query returns about 220,000 rows. The run time went from under 3 s without the function call to over 9 s with it. The column that needs formatting isn't indexed or used in a join condition or WHERE clause.
Is the performance drop here expected or is there something I can do to improve it?
This is the function in question:
CREATE OR REPLACE FUNCTION fn(bigint)
RETURNS character varying LANGUAGE plpgsql AS
$BODY$
DECLARE
v_chars varchar[];
v_ret varchar;
v_length int4;
v_count int4;
BEGIN
if ($1 isnull or $1 = 0) then
return null;
end if;
v_chars := regexp_split_to_array($1::varchar,'');
v_ret := '';
v_length := array_upper (v_chars,1);
v_count := 0;
for v_index in 1..11 loop
v_count := v_count + 1;
if (v_index <= v_length) then
v_ret := v_chars[v_length - (v_index - 1)] || v_ret;
else
v_ret := '0' || v_ret;
end if;
if (v_count <= 6 and (v_count % 2) = 0) then
v_ret := '.' || v_ret;
end if;
end loop;
return v_ret;
END
$BODY$
It depends on the specifics of the function. To find out how much a bare function call will cost, create dummy functions like:
CREATE FUNCTION f_bare_plpgsql(text)
RETURNS text LANGUAGE plpgsql IMMUTABLE AS
$BODY$
BEGIN
RETURN $1;
END
$BODY$;
CREATE FUNCTION f_bare_sql(text)
RETURNS text LANGUAGE sql IMMUTABLE AS
$BODY$
SELECT $1;
$BODY$;
And try your query again.
If you then wonder why your function is slow, add it to your question.
Solution for updated question
Your function could be improved in many places, but there is a more radical solution:
SELECT to_char(12345678901, '00000"."00"."00"."00')
Many times faster, obviously. More about to_char() in the manual.
Consider the following demo:
WITH x(n) AS (
VALUES (1::bigint), (12), (123), (1234), (12345), (123456), (1234567)
,(12345678), (123456789), (1234567890), (12345678901), (123456789012)
)
SELECT n, fn(n), to_char(n, '00000"."00"."00"."00')
FROM x
n | fn | to_char
--------------+----------------+-----------------
1 | 00000.00.00.01 | 00000.00.00.01
12 | 00000.00.00.12 | 00000.00.00.12
123 | 00000.00.01.23 | 00000.00.01.23
1234 | 00000.00.12.34 | 00000.00.12.34
12345 | 00000.01.23.45 | 00000.01.23.45
123456 | 00000.12.34.56 | 00000.12.34.56
1234567 | 00001.23.45.67 | 00001.23.45.67
12345678 | 00012.34.56.78 | 00012.34.56.78
123456789 | 00123.45.67.89 | 00123.45.67.89
1234567890 | 01234.56.78.90 | 01234.56.78.90
12345678901 | 12345.67.89.01 | 12345.67.89.01
123456789012 | 23456.78.90.12 | #####.##.##.##
to_char() is only prepared for up to 11 decimal digits with this pattern, as you can see.
It can easily be extended if need be.
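For instance, one extra leading digit position covers 12-digit inputs (an illustration):

SELECT to_char(123456789012, '000000"."00"."00"."00');  -- gives 123456.78.90.12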
If you really must perform the formatting in the database then modify your table to include a field to store the formatted number.
A trigger can call your function to generate the formatted number when the value changes; then you only (slightly) increase the time taken to INSERT or UPDATE a few rows at a time, rather than paying the cost for all rows on every query.
Your query returning all 220k rows then becomes a simple SELECT of the formatted value and should be nice and quick.
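A minimal sketch of that approach, assuming a hypothetical table contacts with a bigint column phone and the fn() function from the question:

-- hypothetical table/column names; fn() is the formatting function above
ALTER TABLE contacts ADD COLUMN phone_formatted varchar;

CREATE OR REPLACE FUNCTION contacts_format_phone() RETURNS trigger AS $$
BEGIN
    NEW.phone_formatted := fn(NEW.phone);  -- keep the stored copy in sync
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_contacts_format_phone
BEFORE INSERT OR UPDATE OF phone ON contacts
FOR EACH ROW EXECUTE PROCEDURE contacts_format_phone();

Queries can then select phone_formatted directly instead of calling fn() per row.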