Oracle RegExp_Like for Two Repeating Digits - sql

I would like to write a regexp_like function which would identify if a string consists of two repeating characters. It would only identify a string that has alternating numbers and only consisting of two unique numbers, but the unique number cannot repeat, it must alternate.
Requirement :
Regular expression should match the pattern for 787878787, but it should NOT match the pattern 787878788
It should NOT consider the pattern like 000000000

I think you want the following:
WITH t1 AS (
SELECT '787878787' AS str FROM dual
UNION
SELECT '787878788' AS str FROM dual
UNION
SELECT '7878787878' AS str FROM dual
UNION
SELECT '78' AS str FROM dual
)
SELECT * FROM t1
WHERE REGEXP_LIKE(str, '^(.)(.)(\1\2)*\1?$')
AND SUBSTR(str, 1, 1) != SUBSTR(str, 2, 1)
This will cover the case (mentioned in the requirements) where the string ends with the same character with which it begins. If you want only digits, replace the . in the regex with \d.
Update:
Here is how the regex breaks down:
^ = start of string
(.) = first character - can be anything - in parentheses to capture it and use it in a backreference
(.) = second character - can be anything
\1 = backreference to first captured group
\2 = backreference to second captured group
(\1\2)* = These should appear together zero or more times
\1? = The first captured group should appear zero or one times
$ = end of the string
Hope this helps.

You might do something like this -
SQL> WITH DATA AS(
2 SELECT '787878787' str FROM dual UNION ALL
3 SELECT '787878788' FROM dual
4 )
5 SELECT *
6 FROM DATA
7 WHERE REGEXP_LIKE(str, '(\d+?)\1')
8 AND SUBSTR(str, 1,1) = SUBSTR(str, -1, 1)
9 /
STR
---------
787878787
SQL>
Since you are dealing only with digits, I used \d.
\d+? will match the digits, and, \1 are the captured digits. The substr in the AND condition is checking whether the first and last digit of the string are same.
Edit : Additional requirement by OP
To avoid the numbers like 00000000, you need to add a NOT condition to the predicate.
SQL> WITH DATA AS
2 ( SELECT '787878787' str FROM dual
3 UNION ALL
4 SELECT '787878788' FROM dual
5 UNION ALL
6 SELECT '787878788' FROM dual
7 )
8 SELECT *
9 FROM DATA
10 WHERE REGEXP_LIKE(str, '(\d+?)\1')
11 AND SUBSTR(str, 1,1) = SUBSTR(str, -1, 1)
12 AND SUBSTR(str, 2,1) <> SUBSTR(str, -1, 1)
13 /
STR
---------
787878787
SQL>

You could try:
^(..)\1*$
Breakdown:
^ - assert beginning of line
(..) - capture the first 2 characters
\1* - repeat the captured group pattern zero or more times
$ - assert end of line
Untested in oracle...

Related

Issues with SUBSTR function Oracle_SQL

I used the SUBSTR function for the similar purposes, but I encountered the following issue:
I am extracting 6 characters from the right, but the data in column is inconsistent and for some rows it has characters less than 6, i.e. 5 or 4. So for such rows, the function returns blanks. How can I fix this?
Example Scenario 1:
SUBSTR('0000123456',-6,6)
Output: 123456
Scenario 2 (how do I fix this?, I need it to return '23456'):
SUBSTR('23456',-6,6)
Output: ""
You can use a case expression: if the string length is strictly greater than 6 then return just the last 6 characters; otherwise return the string itself. This way you don't need to call substr unless it is really needed.
Alternatively, if speed is not the biggest issue and you are allowed to use regular expressions, you can write this more compactly - select between 0 and 6 characters - as many as possible - at the end of the string.
Finally, if you don't mind using undocumented functions, you can use reverse and standard substr (starting from character 1 and extracting the first 6 characters; that will work as expected even if the string has length less than 6). So: reverse the string, extract first (up to) 6 characters, and then reverse again to restore the order. WARNING: This is shown only for fun; DO NOT USE THIS METHOD!
with
test_data (str) as (
select '0123449389' from dual union all
select '00000000' from dual union all
select null from dual union all
select 'abcd' from dual
)
select str,
case when length(str) > 6 then substr(str, -6) else str end as case_substr,
regexp_substr(str, '.{0,6}$') as regexp_substr,
reverse(substr(reverse(str), 1, 6)) as rev_substr
from test_data
;
STR CASE_SUBSTR REGEXP_SUBSTR REV_SUBSTR
---------- ------------- ------------- --------------
0123449389 449389 449389 449389
00000000 000000 000000 000000
abcd abcd abcd abcd
One method uses coalesce():
select coalesce(substr('23456', -6, 6), '23456')
Another tweaks the length:
select substr('23456', greatest(- length('23456'), -6), 6)

ORACLE SQL - REGEXP_LIKE Contains First Character As a Number and Second Character as an Alphabet

I am trying to generate a query in Oracle where i can get records that has first character in String as 3 or 4 AND second character is an alphabet. The rest can be anything else.
Something like this
SELECT COL1 FROM TABLE
WHERE REGEXP_LIKE (COL1, '3[A-Za-Z]')
OR REGEXP_LIKE (COL1, '4[A-Za-z]')
I Do get the output but for few records the data doesn't start with 3 or 4.
Meaning it selects those records who have 3 and An alphabet together anywhere in the column.
ex: 10573T2 (10573T2). I have to query records that should start with either 3 or 4 and the next character should be a letter.
Any help would be great
SQL> with test (col) as
2 (select '10573T2' from dual union all
3 select '3A1234F' from dual union all
4 select '23XXX02' from dual union all
5 select '4GABC23' from dual union all
6 select '31234FX' from dual
7 )
8 select col
9 from test
10 where regexp_like(col, '(^3|^4)[[:alpha:]]');
COL
-------
3A1234F
4GABC23
SQL>
begins ^ with 3 or | 4
and is followed by a letter [[:alpha:]]
As of your ^ doubts: that character has two roles:
[^ ... ] - Non-Matching Character List: matches any character not in list ...
^ - Beginning of Line Anchor: match the subsequent expression only when it occurs at the beginning of a line.
You need to anchor the pattern at the beginning of the string:
REGEXP_LIKE(COL1, '^[34][A-Za-z]')
Here is a db<>fiddle

Retrieve the characters before a matching pattern

135 ;1111776698 ;AB555678765
I have the above string and what I am looking for is to retrieve all the digits before the first occurrence of ;.
But the number of characters before the first occurrence of ; varies i.e. it may be a 4 digit number or 3 digit number.
I have played with regex_instr and instr, but I unable to figure this out.
The query should return all the digits before the first occurrence of ;
This answer assumes that you are using Oracle database. I don't know of way to do this using REGEX_INSTR alone, but we can do with REGEXP_REPLACE using capture groups:
SELECT REGEXP_REPLACE('135 ;1111776698 ;AB555678765', '^\s*(\d{3,4})\s*;.*', '\1')
FROM dual;
Demo
Here is the regex pattern being used:
^\s*(\d{3,4})\s*;.*
This allows, from the start of the string, any amount of leading whitespace, followed by a 3 or 4 digit number, followed again by any amount of whitespace, then a semicolon. The .* at the end of the pattern just consumes whatever remains in your string. Note (\d{3,4}), which captures the 3-4 digit number, which is then available in the replacement as \1.
Using INSTR,SUBTSR and TRIM should work ( based on your comment that there are "just white spaces and digits" )
select TRIM(SUBSTR(s,1, INSTR(s,';')-1)) FROM t;
Demo
The following using regexp_substr() should work:
SELECT s, REGEXP_SUBSTR(s, '^[^;]*')
Make sure you try all possible values in that first position, even those you don't expect and make sure they are handled as you want them to be. Always expect the unexpected! This regex matches the first subgroup of zero or more optional digits (allows a NULL to be returned) when followed by an optional space then a semi-colon, or the end of the line. You may need to tighten (or loosen) up the matching rules for your situation, just make sure to test even for incorrect values, especially if the input comes from user-entered data.
with tbl(id, str) as (
select 1, '135 ;1111776698 ;AB555678765' from dual union all
select 2, ' 135 ;1111776698 ;AB555678765' from dual union all
select 3, '135;1111776698 ;AB555678765' from dual union all
select 4, ';1111776698 ;AB555678765' from dual union all
select 5, ';135 ;1111776698 ;AB555678765' from dual union all
select 6, ';;1111776698 ;AB555678765' from dual union all
select 7, 'xx135 ;1111776698 ;AB555678765' from dual union all
select 8, '135;1111776698 ;AB555678765' from dual union all
select 9, '135xx;1111776698 ;AB555678765' from dual
)
select id, regexp_substr(str, '(\d*?)( ?;|$)', 1, 1, NULL, 1) element_1
from tbl
order by id;
ID ELEMENT_1
---------- ------------------------------
1 135
2 135
3 135
4
5
6
7 135
8 135
9
9 rows selected.
To get the desired result, you should use REGEX_SUBSTR as it will substring your desired data from the string you give. Here is the example of the Query.
Solution to your example data:
SELECT REGEXP_SUBSTR('135 ;1111776698 ;AB555678765','[^;]+',1,1) FROM DUAL;
So what it does, Regex splits the string on the basis of ; separator. You needed the first occurrence so I gave arguments as 1,1.
So if you need the second string 1111776698 as your output you can give an argument as 1,2.
The syntax for Regexp_substr is as following:
REGEXP_SUBSTR( string, pattern [, start_position [, nth_appearance [, match_parameter [, sub_expression ] ] ] ] )
Here is the link for more examples:
https://www.techonthenet.com/oracle/functions/regexp_substr.php
Let me know if this works for you. Best luck.

Oracle SQL query to convert a string into a comma separated string with comma after every n characters

How can we convert a string of any length into a comma separated string with comma after every n characters. I am using Oracle 10g and above. I tried with REGEXP_SUBSTR but couldn't get desired result.
e.g.: for below string comma after every 5 characters.
input:
aaaaabbbbbcccccdddddeeeeefffff
output:
aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff,
or
aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff
Thanks in advance.
This can be done with regexp_replace, like so:
WITH sample_data AS (SELECT 'aaaaabbbbbcccccdddddeeeeefffff' str FROM dual UNION ALL
SELECT 'aaaa' str FROM dual UNION ALL
SELECT 'aaaaabb' str FROM dual)
SELECT str,
regexp_replace(str, '(.{5})', '\1,')
FROM sample_data;
STR REGEXP_REPLACE(STR,'(.{5})','\
------------------------------ --------------------------------------------------------------------------------
aaaaabbbbbcccccdddddeeeeefffff aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff,
aaaa aaaa
aaaaabb aaaaa,bb
The regexp_replace simply looks for any 5 characters (.{5}), and then replaces them with the same 5 characters plus a comma. The brackets around the .{5} turn it into a labelled subexpression - \1, since it's the first set of brackets - which we can then use to represent our 5 characters in the replacement section.
You would then need to trim the extra comma off the resultant string, if necessary.
SELECT RTRIM ( REGEXP_REPLACE('aaaaabbbbbcccccdddddeeeeefffff', '(.{5})' ,'\1,') ,',') replaced
FROM DUAL;
This worked for me:
WITH strlen AS
(
SELECT 'aaaaabbbbbcccccdddddeeeeefffffggggg' AS input,
LENGTH('aaaaabbbbbcccccdddddeeeeefffffggggg') AS LEN,
5 AS part
FROM dual
)
,
pattern AS
(
SELECT regexp_substr(strlen.input, '[[:alnum:]]{5}', 1, LEVEL)
||',' AS line
FROM strlen,
dual
CONNECT BY LEVEL <= strlen.len / strlen.part
)
SELECT rtrim(listagg(line, '') WITHIN GROUP (
ORDER BY 1), ',') AS big_bang$
FROM pattern ;

How to replace more than one character in oracle?

How to replace multiple whole characters, except those in combinations...?
The below code replaces multiple characters, but it also disturbing those in combinations.
SELECT regexp_replace('a,ca,va,ea,r,y,q,b,g','(a|y|q|g)','X') RESULT FROM dual;
Current output:
RESULT
--------------------
X,cX,vX,eX,r,X,X,b,X
Expected output:
RESULT
------------------------
'X,ca,va,ea,r,X,X,b,X
I just want to replace only separate whole characters('a','y','q','g'), but not the 1 in combinations('ca','va','ea')...
Because you are delimiting with a comma ',' you can combine that like ',a,'
and this will replace only single a's.
you can try follows:
with t as
(
select 'a,ca,va,ea,r,y,q,b,g' str
from dual
)
select substr(sys_connect_by_path(regexp_replace(regexp_substr(str, '[^,]+', 1, level), '^(a|y|q|g)$', 'X'), ','), 2) as str
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^,]*')) + 1;
Sadly oracle doesn´t support lookahead and lookbehind. But this is a solution i came up with.
SELECT regexp_replace
(regexp_replace
('a,ca,va,ea,r,y,q,b,g',
'^[ayqg](,)|(,)[ayqg](,)|(,)[ayqg]$',
'\2\4X\1\3'),'(,)[ayqg](,)','\1X\2')
RESULT FROM dual;
I had to use the regexp twice sadly, since it doesn´t find two similar values following after each other and replacing it. ..,a,y,.. is getting replaced as ..,X,y,... So the second call replaces the missing [ayqg] with the exact values. In the first inner regexp call replaces the first and last values.
Maybe this could be simplified into one expression, but i am not that conform with the regex from oracle.
As a explanation i am grouping the commata and basicly replace every ,[ayqg], with ,X, by backreferencing the commata
You would look for word boundaries, which is \b, and which is unfortunately not supported by Oracle's regexp_replace.
So let's look for a non-word character \W or the beginning ^ or ending $ of the text.
select
regexp_replace('a,ca,va,ea,r,y,q,b,g','(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3') as result
from dual;
In order to not remove the non-word characters, we must have them in the replace string: \1 for the expression in the first parenteses, \3 for the ones in the third. Thus we only change the expression in the second parentheses, which is a, y, q or g, with X.
Unfortunately above gives
X,ca,va,ea,r,X,q,b,X
The q was not replaced, because we recognize ',y,' thus being positioned a 'g,' whereas we'd need to be positioned at ',g,' to recognize g as a word, too.
So we need to replace in iterations (i.e. recursively):
with results(txt, num) as
(
select 'a,ca,va,ea,r,y,q,b,g' as txt, 0 as num from dual
union all
select regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3'), num + 1 as num
from results
where txt <> regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3')
)
select max(txt) keep (dense_rank last order by num) as result
from results;
EDIT: Kevin Esche is right; of course one has to do it only twice. Hence you can also do:
select
regexp_replace(txt, search_str, replace_str) as result
from
(
select
regexp_replace(txt, search_str, replace_str) as txt, search_str, replace_str
from
(
select
'a,ca,va,ea,r,y,q,y,q,b,g' as txt,
'(^|$|\W)(a|y|q|g)(^|$|\W)' as search_str,
'\1X\3' as replace_str
from dual
)
);
with replaced_values as (
SELECT case when length(val)=1 then regexp_replace(val,'(a|y|q|g)','X') else val end new_val, lvl
from (
SELECT regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+', 1, LEVEL) val, level lvl FROM dual
connect by regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+',1, LEVEL) is not null
) all_values
)
select lISTAGG(new_val, ',') WITHIN GROUP (ORDER BY lvl) RESULT
from replaced_values
This statement pivots data into rows and replaces only lines wich contains one character.
Data are then unpivoted in one rows
This sql works also with empty entries like 'a,,,b,c' and more complex regular expressions:
with t as
(select ',a,,ca,va,ea,bbb,ba,r,y,q,b,g,,,' as str,
',' as delimiter,
'(a|y|q|g|ea|[b]*)' as regexp_expr,
'X' as replace_expr
from dual)
(select substr (sys_connect_by_path(regexp_replace(substr(str,
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1)) + 1,
decode(instr(str, ',', 1, level),
0,
length(str),
instr(str, ',', 1, level) - 1) -
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1))),
'^' || regexp_expr || '$',
replace_expr), ','), 2)
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^'|| delimiter||']')) + 1)
Result
,X,,ca,va,X,X,ba,r,X,X,X,X,,,
Don't Know much Oracle, but I would have thought something like this could work. Assuming the delimiter is always a comma.
SELECT
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace('a,ca,va,ea,r,y,q,b,g','(,a,|,y,|,q,|,g,)',',X,') ,'(,a,|,y,|,q,|,g,)',',X,'), '(^a,|^y,|^q,|^g,)','X,'), '(,a$|,y$|,q$|,g$)',',X'), '(^a$|^y$|^q$|^g$)','X')
RESULT FROM test;
The first two parts replaces a single character in commas in the middle, the third part gets those at the start of the string, the fourth is for the end of the string and the fifth is for when then string has just one character.
This answer might will be simplifiable by advanced Regexp use.
How i can replace words?
RS & OS ===> D, LS & IS ==== >
SECTION_ID Output required
1-LS-1991 1-P-1991
1-IS-1991 1-P-1991
1-RS-1991 1- D- 1991
1-OS-1991 1-D-1991