Oracle. Not valid ascii value of regex result - sql

I'd like to transform a string: in each pair of adjacent digits, keep the first digit and turn the second into a letter (00 -> 0a, 01 -> 0b, 23 -> 2d, etc.).
For example, 111324 -> 1b1d2e.
Here is my code:
set serveroutput on size unlimited
declare
str varchar2(128);
function convr(num varchar2) return varchar2 is
begin
return chr(ascii(num)+49);
-- return chr(ascii(num)+49)||'<-'||(ascii(num)+49)||','||ascii(num)||','||num||'|';
end;
function replace_dd(str varchar2) return varchar2 is
begin
return regexp_replace(str,'((\d)(\d))','\2'||convr('\3'));
end;
begin
str := '111324';
Dbms_Output.Put_Line(str);
Dbms_Output.Put_Line(replace_dd(str));
end;
But I get the following string: '112'.
When I checked the result using the commented-out return statement, I got:
'1<-141,92,1|1<-141,92,3|2<-141,92,4|'.
ascii(num) does not depend on num. It always behaves like ascii('\'), which is 92; adding 49 gives 141, which is outside the ASCII table. Yet num itself is printed correctly.
How can I get the correct values? Or is there another way to solve this?

What is happening is that the replacement argument is evaluated first, like any other expression, and only after it is fully evaluated does regexp_replace substitute the remaining backreferences such as \2 with string fragments. So convr('\3') is processed first, and at this stage '\3' is a literal two-character string. ascii() returns the ASCII code of the FIRST character of whatever string it receives as argument, so the 3 plays no role; you only get ascii('\'), as you noticed. Your user-defined function is then evaluated and its result plugged back into the concatenation, and by then there is no \3 left in the replacement string.
Exercise: Try to explain/understand why
regexp_replace('abcdef', '(b).*(e)', '\2' || upper('\1'))
is aebf and not aeBf. (Hint: what is the return from upper('\1') by itself, unrelated to anything else?)
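The exercise can be checked with a small query. This is only a sketch; the column aliases are made up for illustration.

```sql
-- upper('\1') is evaluated before regexp_replace ever sees it.
select upper('\1') as replacement_part,
       regexp_replace('abcdef', '(b).*(e)', '\2' || upper('\1')) as result
from dual;
```

The first column comes back as the unchanged two-character string \1, because upper has no letters to change there; the replacement passed to regexp_replace is therefore just '\2\1', which produces aebf.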
You could split the input string into component characters, apply your transformation on those with even index and combine the string back (all in SQL, no need for loops and such). Something like this (done in plain SQL, you can rewrite it into your function if you like):
with
inputs ( str ) as (
select '111324' from dual union all
select '372' from dual
),
singletons ( str, idx, ch ) as (
select str, level, substr(str, level, 1)
from inputs
connect by level <= length(str)
and prior str = str
and prior sys_guid() is not null
)
select str,
listagg(case mod(idx, 2) when 1 then ch else chr(ascii(ch)+49) end, '')
within group (order by idx)
as modified_str
from singletons
group by str
;
STR MODIFIED_STR
------ --------------
111324 1b1d2e
372 3h2
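If a PL/SQL function is preferred, the same per-character idea could be sketched as below. It assumes, like the query above, that the input consists of digits only; the local names ret and ch are illustrative.

```sql
set serveroutput on size unlimited
declare
  function replace_dd(str varchar2) return varchar2 is
    ret varchar2(128);
    ch  varchar2(1);
  begin
    -- nvl guards against a NULL (empty) input, for which length() is NULL
    for i in 1 .. nvl(length(str), 0) loop
      ch  := substr(str, i, 1);
      -- keep characters at odd positions, shift those at even positions by 49
      ret := ret || case mod(i, 2) when 1 then ch else chr(ascii(ch) + 49) end;
    end loop;
    return ret;
  end;
begin
  dbms_output.put_line(replace_dd('111324'));  -- 1b1d2e
  dbms_output.put_line(replace_dd('372'));     -- 3h2
end;
/
```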

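Here is code that inserts a 5 before each non-digit character and resolves the issue. The second digit of each pair is first wrapped in # markers by regexp_replace and then converted to a letter in a loop.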
set serveroutput on size unlimited
declare
str varchar2(128);
str1 varchar2(128);
function replace_a(str varchar2) return varchar2 is
begin
return regexp_replace(str,'(\D)','5\1');
end;
function convr(str varchar2) return varchar2 is
ind number;
ret varchar2(128);
begin
Dbms_Output.Put_Line(str);
--return chr(ascii(num)+49)||'<-'||(ascii(num)+49)||','||ascii(num)||','||num||'|';
ind := 1 ;
ret :=str;
loop
ind := regexp_instr(':'||ret,'(#\d#)',ind) ;
exit when ind=0;
Dbms_Output.Put_Line(ind);
ret := substr(ret,1,ind-2)||chr(ascii(substr(ret,ind,1))+49)||substr(ret,ind+2);
SYS.Dbms_Output.Put_Line(ret);
end loop;
return ret;
end;
function replace_dd(str varchar2) return varchar2 is
begin
return convr(regexp_replace(str,'((\d)(\d))','\2#\3#'));
end;
begin
str := '11a34';
Dbms_Output.Put_Line(str);
Dbms_Output.Put_Line(replace_a(str));
Dbms_Output.Put_Line(replace_dd(replace_a(str)));
end;
result:
11a34
115a34
1#1#5a3#4#
3
1b5a3#4#
7
1b5a3e
1b5a3e

Related

Oracle remove html from clob fields

I have a simple function to convert html blob to plain text
FUNCTION HTML_TO_TEXT(html IN CLOB) RETURN CLOB
IS v_return CLOB;
BEGIN
select utl_i18n.unescape_reference(regexp_replace(html, '<.+?>', ' ')) INTO v_return from dual;
return (v_return);
END;
It is called like this:
SELECT A, B, C, HTML_TO_TEXT(BLobField) FROM t1
Everything works fine until BlobField contains more than 4000 characters; then I get:
ORA-01704: string literal too long
01704. 00000 - "string literal too long"
*Cause: The string literal is longer than 4000 characters.
*Action: Use a string literal of at most 4000 characters.
Longer values may only be entered using bind variables.
I tried to avoid the string inside the function by using variables, but nothing changed:
FUNCTION HTML_TO_TEXT(html IN CLOB) RETURN CLOB
IS v_return CLOB;
"stringa" CLOB;
BEGIN
SELECT regexp_replace(html, '<.+?>', ' ') INTO "stringa" FROM DUAL;
select utl_i18n.unescape_reference("stringa") INTO v_return from dual;
return (v_return);
END;
Do not use regular expressions to parse HTML. If you want to extract the text then use an XML parser:
SELECT a,
b,
c,
UTL_I18N.UNESCAPE_REFERENCE(
XMLQUERY(
'//text()'
PASSING XMLTYPE(blobfield, 1)
RETURNING CONTENT
).getStringVal()
) AS text
FROM t1
Which will work where the extracted text is 4000 characters or less (since XMLTYPE.getStringVal() will return a VARCHAR2 data type and UTL_I18N.UNESCAPE_REFERENCE accepts a VARCHAR2 argument).
If you want to get it to work on CLOB values then you can still use XMLQUERY and getClobVal() but UTL_I18N.UNESCAPE_REFERENCE still only works on VARCHAR2 input (and not CLOBs) so you will need to split the CLOB into segments and parse those and concatenate them once you are done.
Something like:
CREATE FUNCTION html_to_text(
i_xml IN XMLTYPE
) RETURN CLOB
IS
v_text CLOB;
v_output CLOB;
str VARCHAR2(4000);
len PLS_INTEGER;
pos PLS_INTEGER := 1;
lim CONSTANT PLS_INTEGER := 4000;
BEGIN
SELECT XMLQUERY(
'//text()'
PASSING i_xml
RETURNING CONTENT
).getClobVal()
INTO v_text
FROM DUAL;
len := LENGTH(v_text);
WHILE pos <= len LOOP
str := DBMS_LOB.SUBSTR(v_text, lim, pos);
v_output := v_output || UTL_I18N.UNESCAPE_REFERENCE(str);
pos := pos + lim;
END LOOP;
RETURN v_output;
END;
/
However, you probably want to make it more robust and check if you are going to split the string in the middle of an escaped XML character.

PL/SQL LOOP - Return a row with mixed capital letters

I know this question probably has an easy answer, but I can't get my head around it.
I'm trying to, inside a loop, return a string (in the SQL output) with mixed capital and non-capital letters.
Example: If a name in the row is John Doe, the output will print JoHn DoE, or MiXeD CaPiTaL.
This is my code (which I know is poorly written, but I need to use the cursor!):
declare
aa_ VARCHAR2(2000);
bb_ NUMBER:=0;
cc_ NUMBER:=0;
CURSOR cur_ IS
SELECT first_name namn, last_name efternamn FROM person_info
;
begin
FOR rec_ IN cur_ LOOP
dbms_output.put_line(rec_.namn);
FOR bb_ IN 1.. LENGTH(rec_.namn) LOOP
dbms_output.put(UPPER(SUBSTR(rec_.namn,bb_,1)));
cc_ := MOD(bb_,2);
IF cc_ = 0 THEN
dbms_output.put(UPPER(SUBSTR(rec_.namn,cc_,1)));
ELSE
dbms_output.put(LOWER(SUBSTR(rec_.namn,2)));
END IF;
end loop;
dbms_output.new_line;
end loop;
end;
Again, I know the code is really bad but yeah, trying to learn!
Thanks in advance :)
You may use plain SQL for this purpose, without any loop:
Split input text by pairs separated with some special character (that doesn't appear in the text).
Use initcap SQL function to turn each first letter to upper case.
Remove the special separator.
with a as (
select 'John Doe' as a
from dual
union all
select 'mixed capital and non-capital letters'
from dual
)
select
replace(
initcap(
/*Convert case*/
regexp_replace(a, '([a-zA-Z]{2})',
/*Add ASCII nul after each two letters*/
'\1' || chr(0)
)
),
/*Remove ASCII nul to revert changes*/
chr(0)
) as mixed_case
from a
| MIXED_CASE |
| :------------------------------------ |
| JoHn DoE |
| MiXeD CaPiTaL AnD NoN-CaPiTaL LeTtErS |
I'd put the text transformation into a function, rather than including all the logic in the body of the loop.
declare
cursor c_people is
select 'John' as first_name, 'Doe' as last_name from dual union all
select 'Mixed', 'Capitals'
from dual;
function mixCaps(inText varchar2) return varchar2
is
letter varchar2(1);
outText varchar2(4000);
begin
for i in 1..length(inText) loop
letter := substr(inText,i,1);
outText := outText ||
case mod(i,2)
when 0 then lower(letter)
else upper(letter)
end;
end loop;
return outText;
end mixCaps;
begin
for person in c_people loop
dbms_output.put_line(mixCaps(person.first_name|| ' ' || person.last_name));
end loop;
end;
If performance was critical and you had large numbers of values, you might consider inlining the function using pragma inline (but then you wouldn't be using dbms_output anyway).
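A minimal sketch of what requesting inlining might look like, assuming the same mixCaps function; the variable named result is made up for illustration. The pragma is only a request to the compiler, and it takes effect only when PL/SQL optimization is enabled at a level that performs inlining.

```sql
set serveroutput on
declare
  result varchar2(4000);

  function mixCaps(inText varchar2) return varchar2 is
    outText varchar2(4000);
  begin
    for i in 1 .. length(inText) loop
      outText := outText ||
        case mod(i, 2)
          when 0 then lower(substr(inText, i, 1))
          else upper(substr(inText, i, 1))
        end;
    end loop;
    return outText;
  end mixCaps;
begin
  -- Applies to calls of mixCaps in the statement that immediately follows.
  pragma inline(mixCaps, 'YES');
  result := mixCaps('John Doe');
  dbms_output.put_line(result);
end;
/
```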
For learning purposes you can use the code below (it is not efficient; it is meant to demonstrate Oracle features).
Steps:
split the word into letters using connect by level
get the letter at position level with a regular expression ('.?' searched from that position)
convert every other letter to upper case (the odd positions here)
concatenate the letters back using listagg, ordered by letter position
the function is declared in the with clause, so you can apply it to any SQL table
with
function mixed(iv_name varchar2) return varchar2 as
l_result varchar2(4000);
begin
with src_letters as
(select REGEXP_SUBSTR(iv_name, '.?', level) as letter
,level lvl
from dual
connect by level <= length(iv_name)),
mixed_letters as
(select case
when mod(lvl, 2) = 0 then
letter
else
upper(letter)
end as letter
,lvl
from src_letters
order by lvl)
select listagg(letter) within group(order by lvl)
into l_result
from mixed_letters;
return l_result;
end;
select mixed('text') from dual

Why this procedure does not work properly? INSTR and SUBSTR issues

I made this procedure:
create or replace procedure calculate_vertices(vertices VARCHAR2) AS
pos2 INTEGER;
pos1 INTEGER := 1;
posDash INTEGER;
lat VARCHAR2(20);
lon VARCHAR2(20);
BEGIN
loop
pos2:=INSTR(vertices,'#',pos1);
exit when pos2 = 0;
posDash := INSTR (vertices,'-',pos1);
lat := SUBSTR(vertices,pos1,pos2-(posDash+1));
dbms_output.put_line(lat);
lon := SUBSTR(vertices,posDash+1,pos2-(posDash+1));
dbms_output.put_line(lon);
pos1 := pos2+1;
end loop;
END;
Then I called it as follows:
exec calculate_vertices('122.23-243.345#222.22-323#');
The result expected is:
122.23
243.345
222.22
323
But the real output is:
122.23-
243.345
222
323
How is it possible?
EDIT: I noticed that it puts into the variable LAT the same number of characters as it puts into the variable LON. Why?
I think
lat := SUBSTR(vertices,pos1,pos2-(posDash+1));
is wrong, it should be
lat := SUBSTR(vertices,pos1,posDash -pos1);
Just think about it: the length of the string before the dash depends only on where that string starts and where the dash is, not on the position of the #.
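With that change applied, the loop can be tried out as an anonymous block. This is a sketch that hard-codes the sample input from the question and prints the values directly instead of storing them in lat and lon.

```sql
set serveroutput on
declare
  vertices varchar2(100) := '122.23-243.345#222.22-323#';
  pos1     pls_integer := 1;
  pos2     pls_integer;
  posDash  pls_integer;
begin
  loop
    pos2 := instr(vertices, '#', pos1);
    exit when pos2 = 0;
    posDash := instr(vertices, '-', pos1);
    -- latitude: from the start of the pair up to (not including) the dash
    dbms_output.put_line(substr(vertices, pos1, posDash - pos1));
    -- longitude: from just after the dash up to (not including) the #
    dbms_output.put_line(substr(vertices, posDash + 1, pos2 - posDash - 1));
    pos1 := pos2 + 1;
  end loop;
end;
/
```

For the sample input this prints 122.23, 243.345, 222.22 and 323, one per line.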
You may use REGEXP_SUBSTR, REGEXP_COUNT in a single select instead of your PL/SQL block.
WITH t (str)
AS (SELECT '122.23-243.345#222.22-323#'
FROM dual)
SELECT TRIM(REGEXP_SUBSTR(str, '[^-#]+', 1, LEVEL)) str
FROM t
CONNECT BY LEVEL <= REGEXP_COUNT (str, '[^-#]+');

I have a string value i need to check the format of the value in plsql

I have an input variable holding a string such as 'AB1234567'; it must not be longer than 9 characters. I need an Oracle function that uses regular expressions to check the format of the string.
That is, the first two characters of the string must be letters and the next 7 characters must be digits.
If either of the first two characters is not a letter (for example, a special character), the function needs to return 'F'; if any of the next 7 characters is not a digit, it needs to return 'f'.
The universal format of the string is 'AB1234567': the first two characters are letters and the next 7 are digits.
Thank you
You can use regexp_like with different character classes to check for different patterns.
create or replace function str_test(txt in varchar2) return varchar2 as
begin
if not regexp_like(txt, '^[[:alpha:]]{2}') then
return 'F';
elsif not regexp_like(txt, '^.{2}\d{7}$') then
return 'f';
else
return 'some other output';
end if;
end;
/
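The same two checks can be tried in plain SQL before wrapping them in a function. The sample values below are made up for illustration, and 'ok' stands in for whatever the function should return on success.

```sql
select txt,
       case
         when not regexp_like(txt, '^[[:alpha:]]{2}') then 'F'  -- first two are not letters
         when not regexp_like(txt, '^.{2}\d{7}$')     then 'f'  -- not exactly 7 digits after them
         else 'ok'
       end as result
from (select 'AB1234567'  as txt from dual union all
      select '1B1234567'  from dual union all
      select 'AB12345x7'  from dual union all
      select 'AB12345678' from dual);
```

The four rows evaluate to ok, F, f and f respectively; the last one fails the second check because it has eight digits after the two letters.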
If I understand correctly, you need something like this (updated to cover the "not more than 9 characters" requirement, which I missed at first):
create function func(var nvarchar2)
return nvarchar2
as
begin
if NOT REGEXP_LIKE(var, '^[A-Z]{2}') then
return 'F';
elsif NOT REGEXP_LIKE(var, '^..[0-9]{7}$') then
return 'f';
else
return 'ok';
end if;
end;
CREATE OR REPLACE FUNCTION str_test (
str IN VARCHAR2
) RETURN CHAR
AS
n NUMBER;
BEGIN
IF str IS NULL THEN
RETURN 'X';
ELSIF LENGTH( str ) != 9 THEN
RETURN 'Y';
ELSIF SUBSTR( str, 1, 1 ) NOT BETWEEN 'A' AND 'Z'
OR SUBSTR( str, 2, 1 ) NOT BETWEEN 'A' AND 'Z' THEN
RETURN 'F';
ELSE
-- Raises an exception if the remaining characters cannot be converted to a number
n := TO_NUMBER( SUBSTR( str, 3 ) );
RETURN NULL;
END IF;
EXCEPTION
WHEN OTHERS THEN
RETURN 'f';
END;
/

recursive permutation algorithm in plsql

I'm trying to run a recursive procedure that permutes a given string.
It compiles in SQL Developer, but when I try to run it with input it gives me ora-06502: numeric or value errors on line 13 (the prefix assignment).
create or replace
procedure print_anagrams
(pre in varchar2, str in varchar2)
is
prefix varchar2(30);
stringg varchar2(30);
strlen number;
begin
strlen := length(str);
if strlen = 0 then
dbms_output.put_line(pre);
else
for i in 1..strlen loop
prefix := pre || SUBSTR(str,i,1);
stringg := SUBSTR(str,1,i) || SUBSTR(str,i+1,strlen);
print_anagrams(prefix,stringg);
end loop;
end if;
end;
There were two problems:
Firstly, the LENGTH function returns NULL if its parameter is NULL, not 0, so the following condition in your code was never true (because strlen is NULL):
if strlen = 0 then
When the str argument is empty, the upper range limit of the FOR LOOP is NULL (because strlen is NULL), and a NULL loop bound raises ora-06502: numeric or value errors:
for i in 1..NULL loop
And this yields:
ora-06502: numeric or value errors
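The first point can be checked with a one-line query; the column aliases are illustrative.

```sql
-- In Oracle an empty string is NULL, so its length is NULL rather than 0.
select length('')         as len,
       nvl(length(''), 0) as len_or_zero
from dual;
```

The first column is NULL and the second is 0, which is why a comparison such as strlen = 0 needs an NVL around it.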
Secondly, the last parameter of the substr function in Oracle has different meaning than String's substring method in Java. In Oracle, that parameter means "how many characters should be returned", whereas in Java it stands for "end index of the substring to be returned from the original string", so the following line should be changed:
stringg := SUBSTR(str,1,i) || SUBSTR(str,i+1,strlen);
to:
stringg := SUBSTR(str,1,i - 1) || SUBSTR(str,i+1,strlen);
The change had to be made because, in the Java code that you provided the link to, the loop starts from 0, so on the first iteration the end index is 0 and an empty string is returned. Without the change, the first iteration in the PL/SQL version returns the first character of the argument, so stringg never becomes shorter than str, the recursion never ends, and prefix eventually outgrows its varchar2(30) limit, which raises ora-06502 at the prefix assignment.
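The difference is easy to see on the first loop iteration (i = 1) for the string 'cat'. A small sketch, with made-up column aliases:

```sql
-- The third argument of SUBSTR is a length, not an end index.
select substr('cat', 1, 1) as head_with_i,          -- 'c'
       substr('cat', 1, 0) as head_with_i_minus_1,  -- NULL (empty)
       substr('cat', 2, 3) as tail                  -- 'at'
from dual;
```

With a length of i the head still contains the character that was just moved into prefix, so head || tail is 'cat' again; with i - 1 it is 'at', one character shorter, and the recursion can terminate.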
In the end, you get a working procedure:
create or replace
procedure print_anagrams
(pre in varchar2, str in varchar2)
is
prefix varchar2(30);
stringg varchar2(30);
strlen number;
begin
strlen := length(str);
if NVL(strlen, 0) = 0 then
dbms_output.put_line(pre);
else
for i in 1..strlen loop
prefix := pre || SUBSTR(str,i,1);
stringg := SUBSTR(str,1,i - 1) || SUBSTR(str,i+1,strlen);
print_anagrams(prefix,stringg);
end loop;
end if;
end;
/
Test:
EXEC print_anagrams('', 'cat');
Output:
cat
cta
act
atc
tca
tac
Oracle Substr Function
Java String's substring method