Oracle regular expression to replace numbers with alphabets - sql

I have a string that contains only digits.
I need to replace every digit in the string with a corresponding letter, as follows:
0 -> A
1 -> B
2 -> C
..
9 -> J
I tried the following with the TRANSLATE and REPLACE functions, and it works fine for me:
Forward:
WITH T (ID) AS (SELECT '10005614827' FROM DUAL)
SELECT ID, TRANSLATE(ID,'0123456789','ABCDEFGHIJ') "TRANSLATE",
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(ID,'0','A'),'1','B'),'2','C'),'3','D'),'4','E'),'5','F'),'6','G'),'7','H'),'8','I'),'9','J') "REPLACE"
FROM T;
Output:
ID TRANSLATE REPLACE
10005614827 BAAAFGBEICH BAAAFGBEICH
Reverse:
WITH T (ID) AS (SELECT 'BAAAFGBEICH' FROM DUAL)
SELECT ID, TRANSLATE(ID,'ABCDEFGHIJ','0123456789') "TRANSLATE",
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(ID,'A','0'),'B','1'),'C','2'),'D','3'),'E','4'),'F','5'),'G','6'),'H','7'),'I','8'),'J','9') "REPLACE"
FROM T;
Output:
ID TRANSLATE REPLACE
BAAAFGBEICH 10005614827 10005614827
Is there any way to implement this with a regular expression? My attempt below does not work:
WITH T (ID) AS (SELECT '10005614827' FROM DUAL)
SELECT ID, REGEXP_REPLACE(ID,'[0-9]','[A-J]')
FROM T;

It's not possible in Oracle because of the limitations of the current implementation.
More specifically, you cannot apply a function to a matched value; you can only use backreferences of the form \n, where n is a digit from 1 to 9.
For example, you can match each digit and repeat it as many times as its value:
column example format a40
with t(id) as (select '10005614827' from dual)
select id,
regexp_replace(id,'(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(0)','\1\2\2\3\3\3\4\4\4\4\5\5\5\5\5\6\6\6\6\6\6\7\7\7\7\7\7\7\8\8\8\8\8\8\8\8\9\9\9\9\9\9\9\9\9') example
from t
/
ID EXAMPLE
----------- ----------------------------------------
10005614827 1555556666661444488888888227777777
1 row selected.
But you can't apply any function to \n in the replacement string.
On the other hand, in languages like Perl, Java, Scala, or even PowerShell, it's doable.
Example from the Scala REPL:
scala> val str = "10005614827"
str: String = 10005614827
scala> // matching and converting each digit separately
scala> "\\d".r.replaceAllIn(str, x => (x.group(0)(0)+17).toChar + "")
res0: String = BAAAFGBEICH
scala> // matching and converting sequences of digits
scala> "\\d+".r.replaceAllIn(str, x => x.group(0).map(x=>(x+17).toChar))
res1: String = BAAAFGBEICH
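For readers more at home in Python, here is the same callback-style replacement as a minimal sketch (not part of the original answer; `digits_to_letters` and `repeat_each_digit` are hypothetical names). `re.sub` accepts a function that receives each match object and returns the replacement text, which is exactly what Oracle's REGEXP_REPLACE cannot do. The second function reproduces the "repeat each digit by its value" backreference example above, but computes the result from the matched text instead of enumerating \1 to \9.

```python
import re

def digits_to_letters(s: str) -> str:
    # Each matched digit is passed to the lambda; shifting its code point by 17
    # turns '0' (48) into 'A' (65), '1' into 'B', ..., '9' into 'J'.
    return re.sub(r"[0-9]", lambda m: chr(ord(m.group(0)) + 17), s)

def repeat_each_digit(s: str) -> str:
    # The replacement repeats each digit as many times as its value,
    # so '0' disappears, '1' stays, '5' becomes '55555', and so on.
    return re.sub(r"[0-9]", lambda m: m.group(0) * int(m.group(0)), s)

print(digits_to_letters("10005614827"))
print(repeat_each_digit("10005614827"))
```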
To complete the picture, here is a MODEL clause solution, just for fun:
with t(id) as (select '10005614827' from dual)
select *
from t
model partition by (id) dimension by (0 i) measures (id result)
rules iterate(10)
(result[0] = replace(result[0],iteration_number,chr(ascii(iteration_number)+17)))
/
ID I RESULT
----------- ---------- -----------
10005614827 0 BAAAFGBEICH
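The MODEL rules boil down to ten sequential replacements, one per ITERATION_NUMBER from 0 to 9, which is also what the question's nested REPLACE calls do. A minimal Python sketch of that loop, for illustration only (`replace_iteratively` is a hypothetical helper):

```python
def replace_iteratively(s: str) -> str:
    # Pass i plays the role of ITERATION_NUMBER = i: every occurrence of
    # digit i is replaced by the character 17 code points later ('0' -> 'A').
    for i in range(10):
        d = str(i)
        s = s.replace(d, chr(ord(d) + 17))
    return s

print(replace_iteratively("10005614827"))
```

This only works because the replacement letters are never digits, so a later pass cannot touch the output of an earlier one.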
TRANSLATE is still the best approach in this case, though: this is exactly what the function was made for.
PS. Scala equivalent of the backreference example above, applying a function to the matched value instead of using backreferences:
"\\d".r.replaceAllIn(str, x => (x.group(0)*(x.group(0)(0)-48)))

I don't see any problem with using the TRANSLATE function.
For example, you can convert the numeric string '3389432543' using:
SELECT TRANSLATE('3389432543','0123456789','ABCDEFGHIJ')
FROM DUAL;
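Outside the database, Python's `str.translate` with a table built by `str.maketrans` performs the same character-for-character mapping as TRANSLATE. A minimal sketch of both directions (`encode` and `decode` are hypothetical names):

```python
# Equivalent of TRANSLATE(s, '0123456789', 'ABCDEFGHIJ') and its reverse:
# str.maketrans builds a one-to-one character map from two equal-length strings.
FORWARD = str.maketrans("0123456789", "ABCDEFGHIJ")
REVERSE = str.maketrans("ABCDEFGHIJ", "0123456789")

def encode(s: str) -> str:
    return s.translate(FORWARD)

def decode(s: str) -> str:
    return s.translate(REVERSE)

print(encode("3389432543"))
```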

Related

How to replace characters at specific position in several words using REGEX_REPLACE

I have a query similar to this:
SELECT YEAR_CODE FROM YEAR_CODES
and it returns several records: typically 1 but sometimes 2 or 3. The returned records look like this: 2018FOO, 2019BAR
I need to get the matching previous year of the returned codes. For instance:
2018FOO becomes 2017FOO
2019BAR becomes 2018BAR
Looking for something similar to:
REGEX_REPLACE(SELECT YEAR_CODE FROM YEAR_CODES, 4th character, 4th character minus 1)
You don't need regexp_replace(); the substr() function combined with concat() (or the concatenation operator ||) is enough:
with year_codes(year_code) as
(
select '2018FOO' from dual union all
select '2019BAR' from dual
)
select concat(substr(year_code,1,4) - 1,substr(year_code,-3)) as year_code
from year_codes;
YEAR_CODE
---------
2017FOO
2018BAR
An explicit to_number() conversion is unnecessary, since Oracle implicitly converts a string consisting entirely of digits to a number in an arithmetic operation.
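For comparison, the same substring arithmetic in Python, as a minimal sketch that assumes every code starts with a four-digit year followed by a suffix (`previous_year_code` is a hypothetical name). Unlike Oracle, Python does not convert strings to numbers implicitly, so the int() call is required:

```python
def previous_year_code(code: str) -> str:
    # First 4 characters are the year; everything after that is the suffix.
    return str(int(code[:4]) - 1) + code[4:]

for code in ("2018FOO", "2019BAR"):
    print(previous_year_code(code))
```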
You can use string operations:
with c as (
<your query here>
)
select yc.year_code
from year_codes yc
join c
on to_number(substr(yc.year_code, 1, 4)) = to_number(substr(c.year_code, 1, 4)) - 1
and substr(yc.year_code, 5) = substr(c.year_code, 5)

Retrieving specific occurrences of a given Regex with Oracle SQL

In simplified form, I'm trying to retrieve the first, the second, or the third occurrence of the '.*?=(.*?);.*' regex, that is, x, y, or z in the following example (I want to be able to hardcode in the query whether I want the first, second, or third value):
select regexp_replace(
'margin=x;margin=y;margin=z;',
'.*?=(.*?);.*',
'\1',
1 -- occurrences. I thought that picking 1, 2 or 3 would solve my problem?
) from dual;
-- This returns "xyz", which is terrible. I was expecting it to return "x", in this case.
Looking at the Oracle documentation, I thought this would be relatively straightforward, as the last parameter (occurrences) apparently allows me to select which groups to take into consideration. But it doesn't! Why?
Thanks
I'm going off in a completely different direction. Would combining a hierarchical substring select with regexp_replace be an option for your needs?
This way you could select either one or multiple values, depending on your needs. You wouldn't need to write a concatenated regex, and you could adapt the select further to your needs.
select regexp_replace(subselect.val, '.*=(.*?);', '\1') -- remove "margin="
from (select regexp_substr(
'margin=x;margin=y;margin=z;',
'.*?=(.*?);',
1,
level) val,
level lvl
from dual
connect by regexp_substr('margin=x;margin=y;margin=z;',
'.*?=(.*?);',
1,
level) is not null) subselect -- This select represents each margin=T as a single row
where lvl = 1; -- you could select multiple values as well
You need a regex that matches n occurrences of the whole group, e.g.
([^=]*=([^;]*);){2}.*
(replaced with the \2 backreference) returns the 2nd attribute value. Your own regex can be used this way as well (it is essentially equivalent to the pattern above): (.*?=(.*?);){2}.*
If you define the index variable as IDX, you can use something like:
select regexp_replace(
'margin=x;margin=y;margin=z;',
'([^=]*=([^;]*);){' || IDX || '}.*', -- Oracle's CONCAT accepts only two arguments, so use ||
'\2'
) from dual;
NOTE: If you want to get an empty string as a result of trying to obtain a non-existing value, add |.* at the end of the regex:
(.*?=(.*?);){4}.*|.*
With your input string, the result will be an empty string (which Oracle treats as NULL), since there is no 4th value.
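To see why the repeated group works, here is the same pattern run through Python's `re` module. This is an illustrative sketch only, since Python's regex engine is not Oracle's: a repeated capturing group keeps only its last iteration, so group 2 holds the IDX-th value, and the `|.*` fallback yields an empty string when there is no such value. `nth_value` is a hypothetical helper; Python 3.5+ is assumed, because earlier versions raise an error on unmatched groups in the replacement.

```python
import re

def nth_value(s: str, idx: int) -> str:
    # "(key=value;)" repeated exactly idx times, then the rest of the string;
    # the "|.*" alternative matches when there are fewer than idx pairs.
    pattern = r"([^=]*=([^;]*);){%d}.*|.*" % idx
    # count=1: replace only the first (whole-string) match, like one REGEXP_REPLACE pass
    return re.sub(pattern, r"\2", s, count=1)

for i in range(1, 5):
    print(i, repr(nth_value("margin=x;margin=y;margin=z;", i)))
```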
Perhaps all you need is this: the fourth parameter is NOT the occurrence but the POSITION at which the search starts. The FIFTH parameter is the occurrence.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
Also, are you sure you want REPLACE and not SUBSTR?
EDITED to clarify (it seems at least one person was confused): I show a possible solution to what you (perhaps) need at the end, but first let's look at REGEXP_REPLACE. I rewrote your query to use different occurrences; I put the index in a CTE, but you can instead make idx a bind variable, or use any other mechanism you need. As you will see, the output makes no sense.
with t1 ( idx ) as (select 1 from dual union all select 2 from dual
union all select 3 from dual)
select idx,
regexp_replace('margin=x;margin=y;margin=z;', '.*?=(.*?);.*', '\1', 1, idx) as val
from t1;
Output:
IDX VAL
---------- -----------------------
1 xmargin=y;margin=z;
2 margin=x;ymargin=z;
3 margin=x;margin=y;z
3 rows selected.
I guess this is not what you needed, but it demonstrates what was wrong with your query. The fourth argument to REGEXP_REPLACE, 1 in all cases in the query above, is the position at which the search begins. The fifth argument, idx, is the occurrence. This query replaces the first, second, or third occurrence with the subexpression, which is probably not what you wanted.
If you need to extract x, y, or z depending on the occurrence number, you must use REGEXP_SUBSTR, not REGEXP_REPLACE. Note also that I changed the match pattern: the .*? at the beginning and the .* at the end are unnecessary. If you want to find x, y, or z in something like margin=x; but not in length=x;, then you must make that explicit, and the match pattern should be 'margin=(.*?);'.
with t1 ( idx ) as (select 1 from dual union all select 2 from dual
union all select 3 from dual)
select idx,
regexp_substr('margin=x;margin=y;margin=z;', '=(.*?);', 1, idx, null, 1) as val -- position 1, occurrence idx, subexpression 1 (Oracle 11g and later)
from t1;
Output:
IDX VAL
---------- -------
1 x
2 y
3 z
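A rough Python analogue of REGEXP_SUBSTR with an occurrence and a subexpression, as an illustrative sketch (`nth_group` is a hypothetical helper, and Python's regex engine is not Oracle's): take the n-th non-overlapping match and return its first capture group, or None, mirroring NULL, when there is no such occurrence.

```python
import re
from typing import Optional

def nth_group(s: str, pattern: str, occurrence: int, group: int = 1) -> Optional[str]:
    # Collect all non-overlapping matches scanning from the start (position 1),
    # then pick the requested occurrence (1-based, as in Oracle).
    matches = list(re.finditer(pattern, s))
    if 1 <= occurrence <= len(matches):
        return matches[occurrence - 1].group(group)
    return None  # no such occurrence

for idx in (1, 2, 3):
    print(idx, nth_group("margin=x;margin=y;margin=z;", "=(.*?);", idx))
```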

PLSQL Getting values from string

I have this varchar2(2000) string:
id=100\nid2=0\nid3=0\dtext='more Text'
and I want to get only the values, e.g. more Text, or 0 (for id3).
I was trying to use a custom SPLIT function with \n as the separator, but it only returns, for example, id3=0 (in this case I need 0 as the result).
How can I do this more efficiently?
Since you only want the value, e.g. more Text, simply use SUBSTR and INSTR:
WITH DATA AS
( SELECT q'[id=100\nid2=0\nid3=0\dtext='more Text']' str FROM dual
)
SELECT SUBSTR(str,instr(str, '''')+1,LENGTH(SUBSTR(str,instr(str, '''')))-2) str
FROM DATA
/
STR
---------
more Text
You could get all the values with something like this:
WITH DATA AS
(SELECT q'[id=100\nid2=0\nid3=0\ndtext='more Text']' str FROM dual)
SELECT replace(substr(regexp_substr(str,'(=.+?\n)|(=.+?$)',1,level),2),'\n') v
FROM DATA
CONNECT BY LEVEL <= LENGTH(regexp_replace(str,'([^=]+=.+?\n)|([^=]=.+?$)'))
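If the data is processed outside the database, splitting on the separator and then at the first `=` gives every value directly. A minimal Python sketch, assuming the separator is the literal two-character sequence `\n` (a backslash followed by n, as in the q'[...]' literals above) and using the string with `\ndtext` from the previous query; `parse_pairs` is a hypothetical name:

```python
def parse_pairs(s: str) -> dict:
    # "\\n" is backslash + 'n', not a real newline.
    result = {}
    for piece in s.split("\\n"):
        key, sep, value = piece.partition("=")  # split at the first '=' only
        if sep:
            result[key] = value.strip("'")  # drop surrounding single quotes
    return result

data = r"id=100\nid2=0\nid3=0\ndtext='more Text'"
print(parse_pairs(data))
```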

Deleting records with number repeating more than 5

I have data in a table, where each value is 9 characters long, like:
999999969
000000089
666666689
I want to delete only the rows in which any digit from 1 to 9 repeats more than 5 times.
OK, so the logic here can be summed up as:
Find the longest series of the same consecutive digit in any given number; and
Return true if that longest value is > 5 digits
Right?
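The two steps above can be sketched in Python with `itertools.groupby`, which yields one group per block of identical consecutive characters. This is an illustrative sketch, not part of the answer; it assumes zeros should not count, per "any number from 1-9" in the question, and `longest_run`/`should_delete` are hypothetical names:

```python
from itertools import groupby

def longest_run(s: str, digits: str = "123456789") -> int:
    # Length of the longest block of one repeated digit, counting only
    # the listed digits (zeros are ignored by default).
    return max((sum(1 for _ in grp) for ch, grp in groupby(s) if ch in digits), default=0)

def should_delete(s: str) -> bool:
    return longest_run(s) > 5

for value in ("999999969", "000000089", "666666689"):
    print(value, longest_run(value), should_delete(value))
```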
So, let's split it into series of consecutive digits:
regress=> SELECT regexp_matches('666666689', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g');
regexp_matches
----------------
{6666666}
{8}
{9}
(3 rows)
then filter for the longest:
regress=>
SELECT x[1]
FROM regexp_matches('6666666898', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
ORDER BY length(x[1]) DESC
LIMIT 1;
x
---------
6666666
(1 row)
... but really, we don't actually care about that, just if any entry is longer than 5 digits, so:
SELECT x[1]
FROM regexp_matches('6666666898', '(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
WHERE length(x[1]) > 5;
can be used as an EXISTS test, e.g.
WITH blah(n) AS (VALUES('999999969'),('000000089'),('666666689'),('15552555'))
SELECT n
FROM blah
WHERE EXISTS (
SELECT x[1]
FROM regexp_matches(n, '(1+|2+|3+|4+|5+|6+|7+|8+|9+)', 'g') x
WHERE length(x[1]) > 5
)
which is actually pretty efficient and returns the correct result (always nice). But it can be simplified a little more:
WITH blah(n) AS (VALUES('999999969'),('000000089'),('666666689'),('15552555'))
SELECT n
FROM blah
WHERE EXISTS (
SELECT x[1]
FROM regexp_matches(n, '(1{6}|2{6}|3{6}|4{6}|5{6}|6{6}|7{6}|8{6}|9{6})', 'g') x
)
You can use the same WHERE clause in a DELETE.
This can be much simpler with a regular expression using a back reference.
DELETE FROM tbl
WHERE col ~ '([1-9])\1{5}';
That's all.
Explanation:
([1-9]) ... a character class with digits from 1 to 9, parenthesized for the following back reference.
\1 ... back reference to first (and only in this case) parenthesized subexpression.
{5} ... exactly 5 more times, i.e. 6 identical digits in a row, which makes it "more than 5".
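The same backreference pattern works in Python's `re` module, which makes it easy to check which of the sample values would be deleted. A minimal sketch for illustration (`SIX_IN_A_ROW` is a hypothetical name):

```python
import re

# A digit 1-9 captured once, then the same digit exactly 5 more times.
SIX_IN_A_ROW = re.compile(r"([1-9])\1{5}")

rows = ["999999969", "000000089", "666666689"]
kept = [r for r in rows if not SIX_IN_A_ROW.search(r)]  # rows a DELETE would leave
print(kept)
```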
Per documentation:
A back reference (\n) matches the same string matched by the previous
parenthesized subexpression specified by the number n [...] For example, ([bc])\1 matches bb or cc but not bc or cb.
Horrible and terrible in terms of performance, but it should work:
DELETE FROM YOURTABLE
WHERE YOURDATA LIKE '%111111%'
OR YOURDATA LIKE '%222222%'
OR YOURDATA LIKE '%333333%'
OR YOURDATA LIKE '%444444%'
OR YOURDATA LIKE '%555555%'
OR YOURDATA LIKE '%666666%'
OR YOURDATA LIKE '%777777%'
OR YOURDATA LIKE '%888888%'
OR YOURDATA LIKE '%999999%'

Netezza string comparison

I have 2 columns, a string and a character. My requirement is to compare the two: if the second column (char) occurs in the first column (string), the result should be some value (say 1).
I tried using translate but it doesn't work.
Are there any functions available for this requirement?
Netezza implements the usual ANSI standard position function.
TESTDB.ADMIN(ADMIN)=> select position('BC' in 'ABCDEF');
STRPOS
--------
2
(1 row)
TESTDB.ADMIN(ADMIN)=> select position('BZ' in 'ABCDEF');
STRPOS
--------
0
(1 row)
The translate function converts certain characters in a source string, so that won't do what you want at all.
TESTDB.ADMIN(ADMIN)=> select translate('ABCDE', 'BE', 'XYZ');
TRANSLATE
-----------
AXCDY
(1 row)
select
case when strpos(trim(lower('abc.com')), trim(lower('xyz'))) > 0
then 1
else 2
end;
To test this code, put your first string in place of 'abc.com' and your second string in place of 'xyz'. If the second string is present in the first, it outputs 1; otherwise it outputs 2.
Hope this helps.
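For comparison, both answers' logic in Python, as a minimal sketch (the helper names `position` and `contains_flag` are hypothetical): `str.find` is 0-based and returns -1 when the substring is absent, so adding 1 gives a 1-based position with 0 meaning "not found", like POSITION/STRPOS. Note that `strip()` removes all surrounding whitespace, which is somewhat broader than SQL's default TRIM of spaces.

```python
def position(needle: str, haystack: str) -> int:
    # 1-based position of needle in haystack, 0 if absent
    return haystack.find(needle) + 1

def contains_flag(string: str, char: str) -> int:
    # 1 if char occurs in string (case-insensitive, trimmed), else 2
    return 1 if position(char.strip().lower(), string.strip().lower()) > 0 else 2

print(position("BC", "ABCDEF"), position("BZ", "ABCDEF"))
print(contains_flag("abc.com", "xyz"))
```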