\b regular expression character in Oracle 11g

Unfortunately, the \b regular expression character doesn't work in Oracle.
As a workaround, I found the following expression:
(^|\s|\W)(100100|100101|100102|100103)($|\s|\W)
(see: The missing \b regular expression special character in Oracle.), but in the test string data:
Test string 100100/100101, ABC-DEF, 100102 100103 test data abc100100 100100abc.
the values 100101 and 100103 are not matched, although I expect them to be matched, as they would be with \b.
Is there any way to make this work? I am using Oracle 11g.
I would appreciate any help.
EDIT:
My goal is to tag all matches. The output that I am expecting is:
Test string [ddd]100100[/ddd]/[ddd]100101[/ddd], ABC-DEF, [ddd]100102[/ddd] [ddd]100103[/ddd] test data abc100100 100100abc.
For this purpose I am using the following statement:
regexp_replace(p_text,'(^|\s|\W)(' || l_ids || ')($|\s|\W)', '\1[ddd]\2[/ddd]\3');
Where:
l_ids - list of ids separated by |; an id can contain digits, letters, underscores and dashes
p_text - input text
EDIT 2:
In the above test string, the value 100100 should not be matched inside the words abc100100 and 100100abc.

Assuming -
chr(1) does not appear in the text
Any character that is not in [a-zA-Z0-9] is considered as a delimiter (e.g. /)
with t (p_text) as (
  select 'Test string 100100/100101, ABC-DEF, 100102 100103 test data abc100100 100100abc.' from dual
)
select replace(
         regexp_replace(
           regexp_replace(
             p_text,
             '([a-zA-Z0-9]+)',
             chr(1) || '\1' || chr(1)
           ),
           chr(1) || '(100100|100101|100102|100103)' || chr(1),
           '[ddd]\1[/ddd]'
         ),
         chr(1)
       )
from t
Test string [ddd]100100[/ddd]/[ddd]100101[/ddd], ABC-DEF,
[ddd]100102[/ddd] [ddd]100103[/ddd] test data abc100100 100100abc.
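The three nested calls above can be hard to follow, so here is a minimal Python sketch of the same idea, written only to illustrate the logic (Python's regex engine is not Oracle's, and the function names tag_ids and tag_ids_naive are made up for this example). It also reproduces why the pattern from the question misses 100101 and 100103: each match consumes the delimiter that the next id would need in front of it.

```python
import re

SENTINEL = "\x01"  # assumed never to occur in the input, like chr(1) in the SQL


def tag_ids(text, ids):
    """Wrap every id that forms a whole alphanumeric run in [ddd]...[/ddd]."""
    # Pass 1: surround each maximal alphanumeric run with sentinels.
    wrapped = re.sub(r"([a-zA-Z0-9]+)", SENTINEL + r"\1" + SENTINEL, text)
    # Pass 2: tag only the runs that are exactly one of the ids.
    alternation = "|".join(re.escape(i) for i in ids)
    tagged = re.sub(
        SENTINEL + "(" + alternation + ")" + SENTINEL,
        r"[ddd]\1[/ddd]",
        wrapped,
    )
    # Pass 3: drop the sentinels left around untagged runs.
    return tagged.replace(SENTINEL, "")


def tag_ids_naive(text, ids):
    """The pattern from the question; it consumes the delimiters it matches."""
    alternation = "|".join(re.escape(i) for i in ids)
    return re.sub(
        r"(^|\s|\W)(" + alternation + r")($|\s|\W)",
        r"\1[ddd]\2[/ddd]\3",
        text,
    )


if __name__ == "__main__":
    sample = ("Test string 100100/100101, ABC-DEF, 100102 100103 "
              "test data abc100100 100100abc.")
    ids = ["100100", "100101", "100102", "100103"]
    print(tag_ids(sample, ids))
    print(tag_ids_naive(sample, ids))
```

Like the SQL, this treats every character outside [a-zA-Z0-9] as a delimiter, so ids that themselves contain dashes or underscores would need a wider character class in the first pass.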


Adapt regex with lookahead to oracle db format

After the useful answers to my previous question (see How do I create a regex to avoid a repeated number with optional hyphen?) we reached a solution that met my needs.
The final result was:
^(?!(\d)(?:-?\1)*$)\d{2}-?\d{7}$
The above regex excludes these data:
00-0000000 and 000000000
11-1111111 and 111111111
22-2222222 and 222222222
...
99-9999999 and 999999999
Note that 22-2222221 is valid.
Note also that the position of the hyphen can be anywhere after the first digit and before the last one
Now that everything seemed to work fine, we noticed that this pattern is not compatible with the Oracle database REGEXP_LIKE function.
Any suggestion on how to adapt it?
Thanks in advance.
I read here Oracle regular expression replacement for negative lookahead/lookbehind and the solution provided doesn't seem to work for me.
Given your comment:
the hyphen can be anywhere after the first digit and before the last one
You can do it all without regular expressions using:
SELECT *
FROM table_name
WHERE -- Check that the value has the correct length
LENGTH(value) IN (9, 10)
-- Check that the value has the correct length without hyphens
AND LENGTH(REPLACE(value, '-')) = 9
-- Check that the value has only digits or hyphens
AND TRANSLATE(value, 'a-0123456789', 'a') IS NULL
-- Check that all the characters are not either hyphens or the same as the
-- first character
AND TRANSLATE(value, 'a-' || SUBSTR(value, 1, 1), 'a') IS NOT NULL;
If the hyphen will always be the 3rd character (if it is present) then:
SELECT *
FROM table_name
WHERE -- Check that the value has the correct format
( value LIKE '_________' OR value LIKE '__-_______' )
-- Check that the other characters are digits
AND TRANSLATE(
SUBSTR(value, 1, 2) || SUBSTR(value, -7),
'a0123456789',
'a'
) IS NULL
-- Check that all the characters are not either hyphens or the same as the
-- first character
AND TRANSLATE(value, 'a-' || SUBSTR(value, 1, 1), 'a') IS NOT NULL;
If you want to use regular expressions then you will need two regular expressions:
SELECT *
FROM table_name
WHERE REGEXP_LIKE(value, '^\d{2}-?\d{7}$')
AND NOT REGEXP_LIKE(value, '^(\d)\1-?\1{7}$');
or for hyphens anywhere:
SELECT *
FROM table_name
WHERE REGEXP_LIKE(value, '^\d+-?\d+$')
AND REGEXP_LIKE(value, '^(-?\d){9}$')
AND NOT REGEXP_LIKE(value, '^(\d)(-?\1){8}$');
Alternatively, you can enable Java inside the database and use look-ahead via a Java method and the regular expression:
^(?!(\d)(-?\1){8}$)(?=(-?\d){9}$)\d+-?\d+$
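The idea of replacing one negative look-ahead with a positive check plus a negated check can be sanity-checked outside the database. The following is a small Python sketch, offered as an illustration only: it uses Python's re module rather than Oracle's engine, covers the variant where the hyphen (if present) is the third character, and the names SHAPE, ALL_SAME and is_valid are invented for the example.

```python
import re

# Look-ahead pattern from the question (not usable in Oracle's REGEXP_LIKE).
LOOKAHEAD = re.compile(r"^(?!(\d)(?:-?\1)*$)\d{2}-?\d{7}$")

# The two look-ahead-free patterns from the REGEXP_LIKE / NOT REGEXP_LIKE pair.
SHAPE = re.compile(r"^\d{2}-?\d{7}$")      # positive check: overall format
ALL_SAME = re.compile(r"^(\d)\1-?\1{7}$")  # negated check: one digit repeated


def is_valid(value):
    """True when the format matches and the digits are not all identical."""
    return SHAPE.search(value) is not None and ALL_SAME.search(value) is None


if __name__ == "__main__":
    for candidate in ["00-0000000", "222222222", "22-2222221", "12-3456789"]:
        print(candidate, is_valid(candidate), bool(LOOKAHEAD.search(candidate)))
```

For each sample value the two columns printed should agree, which is the property the SQL rewrite relies on.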

How to give argument for a repeating character in snowflake regex

My string is a comment that looks like:
***z|Samuel|Amount:15|Frequency:1
I want to use a regex to filter all such rows out of a database. My query is below:
select
ID,
COMMENT,
max(case when lower(COMMENT) Rlike '\*+z\|Samuel\|Amount:[0-9]+\|Frequency:[0-9]+'
then 1 else 0 end) as indicator
from Table_Name group by 1,2
But this gives me an error:
Invalid regular expression: '*+z|Samuel|Amount:[0-9]+|Frequency:[0-9]+', no argument for repetition operator: *
Does anyone know how to navigate through this?
Using '[*]+z[|]Samuel[|]Amount:[0-9]+[|]Frequency:[0-9]+':
CREATE OR REPLACE TEMPORARY TABLE t AS
SELECT '***z|Samuel|Amount:15|Frequency:1' AS COMMENT;
SELECT *
FROM t
WHERE RLIKE (t.COMMENT, '[*]+z[|]Samuel[|]Amount:[0-9]+[|]Frequency:[0-9]+', 'i');
Alternatively, either double each backslash in the original pattern or write the pattern as a dollar-quoted string ($$...$$) instead of wrapping it in single quotes:
'\*+z\|Samuel\|Amount:[0-9]+\|Frequency:[0-9]+'
=>
'\\*+z\\|Samuel\\|Amount:[0-9]+\\|Frequency:[0-9]+'
$$\*+z\|Samuel\|Amount:[0-9]+\|Frequency:[0-9]+$$
Matching Characters That Are Metacharacters
If you are using the regular expression in a single-quoted string constant, you must escape the backslash with a second backslash (e.g. \\., \\*, \\?, etc.).
SELECT COMMENT,
RLIKE (t.COMMENT, '[*]+z[|]Samuel[|]Amount:[0-9]+[|]Frequency:[0-9]+', 'i') AS "[]",
RLIKE (t.COMMENT, $$\*+z\|Samuel\|Amount:[0-9]+\|Frequency:[0-9]+$$, 'i') AS "$$",
RLIKE (t.COMMENT, '\\*+z\\|Samuel\\|Amount:[0-9]+\\|Frequency:[0-9]+', 'i') AS "\\"
FROM t;
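To make the failure mode concrete, here is a hedged Python sketch of what the regex engine ends up seeing. It is an analogy, not Snowflake itself: Python's re module stands in for the Snowflake engine, re.fullmatch is used because RLIKE matches against the whole string, and the variable names are made up for the illustration.

```python
import re

comment = "***z|Samuel|Amount:15|Frequency:1"

# What reaches the regex engine when the string literal swallows the backslashes,
# as in the error message quoted in the question.
stripped = "*+z|Samuel|Amount:[0-9]+|Frequency:[0-9]+"
# What reaches the engine when the backslashes survive (doubled or dollar-quoted).
escaped = r"\*+z\|Samuel\|Amount:[0-9]+\|Frequency:[0-9]+"
# The bracket form needs no backslashes at all.
bracketed = "[*]+z[|]Samuel[|]Amount:[0-9]+[|]Frequency:[0-9]+"

try:
    re.compile(stripped)
    stripped_compiles = True
except re.error:
    # A leading * has nothing to repeat, so the pattern is rejected.
    stripped_compiles = False

escaped_matches = re.fullmatch(escaped, comment) is not None
bracketed_matches = re.fullmatch(bracketed, comment) is not None

# Lowercasing the text while keeping capitals in the pattern cannot match;
# a case-insensitive flag (the 'i' parameter above) avoids that.
lowered_matches = re.fullmatch(escaped, comment.lower()) is not None
insensitive_matches = re.fullmatch(escaped, comment.lower(), re.IGNORECASE) is not None

if __name__ == "__main__":
    print(stripped_compiles, escaped_matches, bracketed_matches)
    print(lowered_matches, insensitive_matches)
```

The last two checks suggest a second issue in the original query: lower(COMMENT) is compared against a pattern containing capital letters, which is presumably why the answer passes 'i' instead.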

REGEXP_REPLACE for spark.sql()

I need to write a REGEXP_REPLACE query for a spark.sql() job.
If the value follows the pattern below, only the words before the first hyphen should be extracted and assigned to the target column 'name'; if the pattern doesn't match, the entire 'name' should be returned.
Pattern:
Values should be hyphen delimited. Any values can be present before the first hyphen (numbers, letters, special characters or even spaces).
The first hyphen should be followed by exactly 2 words, separated by a hyphen (each can contain only numbers, letters or both; special characters and blanks are not allowed).
The two words should be followed by one or more digits, followed by a hyphen.
The last portion should be only one or more digits.
For Example:
if name = abc45-dsg5-gfdvh6-9890-7685, output of REGEXP_REPLACE = abc45
if name = abc, output of REGEXP_REPLACE = abc
if name = abc-gf5-dfg5-asd5-98-00, output of REGEXP_REPLACE = abc-gf5-dfg5-asd5-98-00
I have
spark.sql("SELECT REGEXP_REPLACE(name , '-[^-]+-\\w{2}-\\d+-\\d+$','',1,1,'i') AS name").show();
But it does not work.
Use
^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$
See proof. Replace with $1. If $1 does not work, use \1. If \1 does not work use \\1.
EXPLANATION
^                 the beginning of the string
(                 group and capture to \1:
  [^-]*           any character except '-' (0 or more times, matching as many as possible)
)                 end of \1
(                 group and capture to \2 (2 times):
  -               '-'
  [a-zA-Z0-9]+    any character of 'a' to 'z', 'A' to 'Z', '0' to '9' (1 or more times, matching as many as possible)
){2}              end of \2 (NOTE: because a quantifier is applied to this capture, only the LAST repetition of the captured pattern is stored in \2)
-                 '-'
[0-9]+            any character of '0' to '9' (1 or more times, matching as many as possible)
-                 '-'
[0-9]+            any character of '0' to '9' (1 or more times, matching as many as possible)
$                 before an optional \n, and the end of the string
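As a quick check of the pattern against the three examples from the question, here is a minimal Python sketch. It is only an illustration: Spark uses Java regular expressions (where the replacement is written $1), while this uses Python's re module with \1, and the helper name extract_name is invented for the example.

```python
import re

# Same pattern as above; group 1 is everything before the first hyphen.
PATTERN = re.compile(r"^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$")


def extract_name(name):
    """Return the part before the first hyphen when the whole value fits the
    pattern; otherwise return the value unchanged (no match, no replacement)."""
    return PATTERN.sub(r"\1", name)


if __name__ == "__main__":
    for value in ["abc45-dsg5-gfdvh6-9890-7685", "abc", "abc-gf5-dfg5-asd5-98-00"]:
        print(value, "->", extract_name(value))
```

Because the pattern is anchored at both ends, a value that does not fit the whole shape is left untouched, which gives the "report the entire name" behaviour without a separate branch.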

How do I use an ORACLE REGEX function to remove all leading and trailing line break characters and spaces?

For example, assume I have the following string, in which the line breaks are actual (invisible) carriage return and line feed characters. Here's the input:
"
SELECT *
FROM
TABLE
"
And here's the desired output:
"SELECT *
FROM
TABLE"
This would do it if regexp_replace() is a requirement:
select regexp_replace('
SELECT *
FROM
TABLE
', '^\s*|\s*$', '') as hello
from dual
See https://www.techonthenet.com/oracle/functions/regexp_replace.php for documentation.
A single regexp_replace is sufficient, e.g.
select regexp_replace('
select frut
from prut
','^[[:space:]]*(.*[^[:space:]])[[:space:]]*$','\1',1,1,'mn') from dual;
results in
select frut
from prut
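For comparison, the "leading run or trailing run of whitespace" alternation used in the first answer can be sketched in Python as follows. This is an illustration under the assumption that only the very start and very end of the string should be trimmed; the function name trim_whitespace is made up, and Python's \A and \Z are used so the anchors never match at inner line breaks.

```python
import re


def trim_whitespace(text):
    """Remove whitespace (spaces, CR, LF, tabs) at the very start and very end
    of the string while leaving inner line breaks untouched."""
    return re.sub(r"\A\s+|\s+\Z", "", text)


if __name__ == "__main__":
    raw = "\r\nSELECT *\r\nFROM\r\nTABLE\r\n"
    print(repr(trim_whitespace(raw)))
```

In Python the same result is available from str.strip(); the regex form is shown only because it mirrors the pattern passed to regexp_replace.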

Delete certain character based on the preceding or succeeding character - ORACLE

I have used the REPLACE function to delete email addresses from hundreds of records. However, the semicolon is the separator between one email address and another, so a lot of stray semicolons are left behind.
For example: the field:
123#hotmail.com;456#yahoo.com;789#gmail.com;xyz#msn.com
Let's say that after I deleted two email addresses, the field content became like:
;456#yahoo.com;789#gmail.com;
I need to clean these fields from these extra undesired semicolons to be like
456#yahoo.com;789#gmail.com
For double semicolons I have also used REPLACE, replacing each ;; with ;
Is there any way to delete a semicolon that is not both preceded and followed by another character?
If you only need to replace semicolons at the start or end of the string, using a regular expression with the anchor '^' (beginning of string) / '$' (end of string) should achieve what you want:
with v_data as (
select '123#hotmail.com;456#yahoo.com;789#gmail.com;xyz#msn.com' value
from dual union all
select ';456#yahoo.com;789#gmail.com;' value from dual
)
select
value,
regexp_replace(regexp_replace(value, '^;', ''), ';$', '') as normalized_value
from v_data
If you also need to replace stray semicolons from the middle of the string, you'll probably need regexes with lookahead/lookbehind.
You remove leading and trailing characters with TRIM:
select trim(both ';' from ';456#yahoo.com;;;789#gmail.com;') from dual;
To replace multiple characters with only one occurrence use REGEXP_REPLACE:
select regexp_replace(';456#yahoo.com;;;789#gmail.com;', ';+', ';') from dual;
Both methods combined:
select regexp_replace( trim(both ';' from ';456#yahoo.com;;;789#gmail.com;'), ';+', ';' ) from dual;
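The combined TRIM plus REGEXP_REPLACE step can be mirrored in a few lines of Python. This is a sketch for illustration only, with a made-up helper name clean_semicolons, and it assumes the semicolon is the only separator that needs cleaning.

```python
import re


def clean_semicolons(value):
    """Strip leading and trailing semicolons, then collapse any run of
    semicolons in the middle down to a single separator."""
    return re.sub(r";+", ";", value.strip(";"))


if __name__ == "__main__":
    print(clean_semicolons(";456#yahoo.com;;;789#gmail.com;"))
```

Trimming first and collapsing second means no look-ahead or look-behind is needed for the stray semicolons in the middle.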
A regular expression replace can also remove the address together with the semicolons that follow it:
select regexp_replace('123#hotmail.com;456#yahoo.com;;456#yahoo.com;;789#gmail.com',
'456#yahoo.com(;)+') as result from dual;
Output:
| RESULT |
|-------------------------------|
| 123#hotmail.com;789#gmail.com |