ORACLE REGEXP_SUBSTR. Retrieving values between characters [duplicate] - sql

I have values in a table in a format similar to the one below. I only want to retrieve the string between A-E>> and the first following occurrence of >>. In the example below, that would be Chubb Fire & Security Pty Ltd - ABN 47000067541:
A-E>>Chubb Fire & Security Pty Ltd - ABN 47000067541>>C2004/10539>>My Docs
I have tried REGEXP_SUBSTR(path,'A-E>>([^.]+)>>',1,1,NULL,1) and other variations, but it also returns values past the first >>. For example, it would return
Chubb Fire & Security Pty Ltd - ABN 47000067541>>C2004/10539>>My Docs
Any ideas what I have missed in my Regex?

One option is REGEXP_REPLACE() with two capture groups: the first consumes everything up to and including A-E>>, the second captures the characters up to the next >, and \2 returns that second group:
SELECT REGEXP_REPLACE(path,'^(.*A-E>>)([^>>]*).*','\2') AS new_path
FROM t -- your table

I'd rather suggest a simple and fast substr + instr combination: extract the substring between the 1st and the 2nd occurrence of the >> sign.
Sample data and query:
with test (col) as
  (select 'A-E>>Chubb Fire & Security Pty Ltd - ABN 47000067541>>C2004/10539>>My Docs' from dual)
select substr(col, instr(col, '>>', 1, 1) + 2,
              instr(col, '>>', 1, 2) - instr(col, '>>', 1, 1) - 2
             ) result
from test;
RESULT
-----------------------------------------------
Chubb Fire & Security Pty Ltd - ABN 47000067541

You could try this regex pattern in your query: A-E>>([^.>]+)>>.*$
SELECT
REGEXP_SUBSTR(
'A-E>>Chubb Fire & Security Pty Ltd - ABN 47000067541>>C2004/10539>>My Docs',
'A-E>>([^.>]+)>>.*$',1,1,NULL,1)
FROM DUAL

SELECT REGEXP_REPLACE('A-E>>Chubb Fire Security Pty Ltd - ABN 47000067541>>C2004/10539>>My DoC>>some other text',
       '^[^>]*>>([^>]+).*$', '\1') AS extracted_text
FROM DUAL;
'^[^>]*': Matches any characters at the start of the string that are not ">" (i.e. the characters before the first ">>").
'>>': Matches the first occurrence of ">>"
'([^>]+)': Matches one or more characters that are not ">" and captures them in a group (i.e. the characters between the first and second ">>")
'.*$': Matches any characters to the end of the string.
Alternatively, use the same function with the pattern (.+[a-zA-Z])(>>)([a-zA-Z].+?)>>.* and the third capture group as the replacement (written \3 in Oracle).
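As a rough side-by-side sketch (not taken verbatim from any of the answers), the query below runs the pattern from the question next to two common fixes on the sample value. It assumes that the wanted text never contains a single > (for the negated class) and that the database supports the non-greedy .*? quantifier; the column aliases are made up for illustration.

```sql
-- Sketch: compare the question's pattern with two corrected ones.
with test (col) as
  (select 'A-E>>Chubb Fire & Security Pty Ltd - ABN 47000067541>>C2004/10539>>My Docs' from dual)
select -- [^.]+ is greedy and also matches '>', so it can run past the first >>
       regexp_substr(col, 'A-E>>([^.]+)>>', 1, 1, null, 1) as greedy_original,
       -- [^>]+ stops at the first '>' (assumes no single '>' inside the value)
       regexp_substr(col, 'A-E>>([^>]+)>>', 1, 1, null, 1) as negated_class,
       -- .*? stops at the first >> that allows the overall match
       regexp_substr(col, 'A-E>>(.*?)>>',   1, 1, null, 1) as non_greedy
from test;
```

The first column is expected to overshoot because [^.]+ only excludes periods; the other two should stop at the first >>.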

Related

Need to remove the exact group of characters

I need to remove all the characters after a particular string (-->).
select
REGEXP_SUBSTR('-->Team Name - Red-->blue', '[^(-->)]+')
from dual;
The expected result from the above query is "Team Name - Red", but it returns "Team Name".
It stops as soon as it meets any one of the characters listed in the brackets.
You can still use the REGEXP_SUBSTR() function:
Select Regexp_Substr('-->Team Name - Red-->blue',
'-{2}>(.*?)-{2}>',1,1,null,1) as "Result"
From dual;
Result
---------------
Team Name - Red
-{2}> ~ exactly two occurrences of - followed by a single > (i.e. -->)
(.*?) ~ non-greedy group that matches whatever lies between the two delimiters above
You could try using REGEXP_REPLACE here with a capture group:
SELECT
REGEXP_REPLACE('-->Team Name - Red-->blue', '.*-->(.*?)-->.*', '\1')
FROM dual;
The output from this is Team Name - Red
It seems that you actually want to return the string between two --> marks. A good old substr + instr option would be
with test (col) as
  (select '-->Team Name - Red-->blue' from dual)
select substr(col,
              instr(col, '-->', 1, 1) + 3,
              instr(col, '-->', 1, 2) - instr(col, '-->', 1, 1) - 3
             ) result
from test;
RESULT
---------------
Team Name - Red
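The root cause in the question is that [^(-->)]+ is a bracket expression, so it is read as a set of individual characters rather than as the literal sequence -->. A minimal sketch contrasting it with a pattern that treats --> as a sequence (the column aliases are illustrative, and .*? assumes non-greedy quantifier support):

```sql
-- Sketch: bracket expression versus literal delimiter sequence.
with test (col) as
  (select '-->Team Name - Red-->blue' from dual)
select -- character class: stops at the first character that appears in the brackets
       regexp_substr(col, '[^(-->)]+')                  as char_class,
       -- literal --> on both sides, with a non-greedy capture group in between
       regexp_substr(col, '-->(.*?)-->', 1, 1, null, 1) as between_marks
from test;
```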

SELECT MULTIPLE TEXT FROM A TEXT STRING USING REGEXP ORACLE

I have the following text string stored in an Oracle 11g table
"MGK8M76HRT Confirmed. You have received Kshs 6,678.00 from Peter 0700123456 on 1/1/2018"
I would like to extract the following from the text using regexp
6,678.00 - amount paid
MGK8M76HRT - unique payment transaction code (the pattern changes every time)
0700123456 - phone number
1/1/2018 - payment date
I have tried multiple oracle regexp patterns to extract the texts without any success. Any assistance/ideas will be appreciated.
I tried:
CONFIRMATION_CODE_PATTERN = "[A-Z0-9]+ Confirmed.";
PHONE_PATTERN = "07[\\d]{8}";
AMOUNT_PATTERN = "Ksh[,|.|\\d]+";
DATETIME_PATTERN = "d/M/yy hh:mm a";
Note that in Oracle regex you cannot use regex escapes inside bracket expressions: [\d] does not match a digit, it matches a \ or a d. Use [0-9] or [[:digit:]] instead. Next, use capturing groups, (...), to wrap the parts of the pattern that you want to extract.
You may use the following regular expressions:
select regexp_substr('MGK8M76HRT Confirmed. You have received Kshs 6,678.00 from Peter 0700123456 on 1/1/2018',
'Kshs\s*(\d([,.0-9]*\d)?)', 1, 1, NULL, 1) as Paid from dual;
select regexp_substr('MGK8M76HRT Confirmed. You have received Kshs 6,678.00 from Peter 0700123456 on 1/1/2018',
'(\D|^)(07\d{8})(\D|$)', 1, 1, NULL, 2) as Phone from dual;
select regexp_substr('MGK8M76HRT Confirmed. You have received Kshs 6,678.00 from Peter 0700123456 on 1/1/2018',
'(\S+)\s+Confirmed\.', 1, 1, NULL, 1) as Code from dual;
select regexp_substr('MGK8M76HRT Confirmed. You have received Kshs 6,678.00 from Peter 0700123456 on 1/1/2018',
'\d{1,2}/\d{1,2}/\d{4}') as TrDate from dual;
Please organize this as per your requirements, it does not seem to be in the scope of the question.
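One possible way to organize the four expressions is a single query over the message text. This is only a sketch: the msg/txt names and the column aliases are hypothetical, and the patterns are the ones from the answer above, unchanged.

```sql
-- Sketch: extract all four fields from the message in one pass.
with msg (txt) as
  (select 'MGK8M76HRT Confirmed. You have received Kshs 6,678.00 from Peter 0700123456 on 1/1/2018' from dual)
select regexp_substr(txt, '(\S+)\s+Confirmed\.',      1, 1, null, 1) as payment_code,
       regexp_substr(txt, 'Kshs\s*(\d([,.0-9]*\d)?)', 1, 1, null, 1) as paid,
       regexp_substr(txt, '(\D|^)(07\d{8})(\D|$)',    1, 1, null, 2) as phone,
       regexp_substr(txt, '\d{1,2}/\d{1,2}/\d{4}')                  as trdate
from msg;
```

In a real table, the CTE would be replaced by the table and txt by the column that stores the message.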

How to get string after character oracle

I have VP3 - Art & Design and HS5 - Health & Social Care, I need to get string after '-' in Oracle. Can this be achieved using substring?
For a string operation as simple as this, I might just use the base INSTR() and SUBSTR() functions. In the query below, we take the substring of your column beginning at two positions after the hyphen.
SELECT
SUBSTR(col, INSTR(col, '-') + 2) AS subject
FROM yourTable
We could also use REGEXP_SUBSTR() here (see Gordon's answer), but it would be a bit more complex and the performance might not be as good as the above query.
You can use regexp_substr():
select regexp_substr(col, '[^-]+', 1, 2)
If you want to remove an optional space, then you can use trim():
select trim(leading ' ', regexp_substr(col, '[^-]+', 1, 2))
The non-obvious parameters mean:
1 -- search from the first character of the source. 1 is the default, but you have to set it anyway to be able to provide the next parameter.
2 -- take the second match as the result substring. The default would be 1.
You can use:
SELECT CASE
WHEN INSTR(value, '-') > 0
THEN SUBSTR(value, INSTR(value, '-') + 1)
END AS subject
FROM table_name
or
SELECT REGEXP_SUBSTR( value, '-(.*)$', 1, 1, NULL, 1 ) AS subject
FROM table_name
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'VP3 - Art & Design and HS5 - Health & Social Care' FROM DUAL UNION ALL
SELECT '1-2-3' FROM DUAL UNION ALL
SELECT '123456' FROM DUAL
Both output:
| SUBJECT |
| :------------------------------------------- |
| Art & Design and HS5 - Health & Social Care |
| 2-3 |
| null |
Trimming leading white-space:
If you want to trim the leading white-space then you can use:
SELECT CASE
WHEN INSTR(value, '-') > 0
THEN LTRIM(SUBSTR(value, INSTR(value, '-') + 1))
END AS subject
FROM table_name
or
SELECT REGEXP_SUBSTR( value, '-\s*(.*)$', 1, 1, NULL, 1 ) AS subject
FROM table_name
Which both output:
| SUBJECT |
| :------------------------------------------ |
| Art & Design and HS5 - Health & Social Care |
| 2-3 |
| null |
Why the naive solutions don't always work:
SELECT SUBSTR(value, INSTR(value, '-') + 2) AS subject
FROM table_name
Does not work in 2 cases:
It finds the index of the - character and then skips 2 characters (the - character and then the assumed white-space character); if the second character is not a white-space character then it will miss the first character of the substring (i.e. if the input is 1-2-3 then the output would be -3 rather than 2-3).
It assumes that there will always be a - character in the string; if this is not the case then it will erroneously return the substring starting from the second character rather than returning NULL (i.e. if the input is 123456 then the output is 23456 rather than NULL).
Using the regular expression:
SELECT REGEXP_SUBSTR(value, '[^-]+', 1, 2)
FROM table_name
Does not find the substring after the 1st - character; it will find the substring between the 1st and 2nd - characters and strip any characters outside that range (inclusive of the - characters). So if the input is VP3 - Art & Design and HS5 - Health & Social Care then the output is Art & Design and HS5 rather than the expected Art & Design and HS5 - Health & Social Care.

PLSQL show digits from end of the string

I have the following problem.
There is a String:
There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235
I need to show only the last date from this string: 2015.06.07.
I tried regexp_substr with instr, but it doesn't work.
This is just a test; if I can solve it, I will use the solution in a CLOB query where there are multiple dates and I need only the last one. I know that regexp_count exists and would help here, but the database I use is Oracle 10g, so it is not available.
Can somebody help me?
The key to solving this problem is the idea of reversing the words in the string, presented in this answer.
Here is a possible solution:
WITH words AS
(
SELECT regexp_substr(str, '[^[:space:]]+', 1, LEVEL) word,
rownum rn
FROM (SELECT 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 2015.06.08 2015.06.17. 2015.07.01. 12345678999 12125235' str
FROM dual) tab
CONNECT BY LEVEL <= LENGTH(str) - LENGTH(REPLACE(str, ' ')) + 1
)
, words_reversed AS
(
SELECT *
FROM words
ORDER BY rn DESC
)
SELECT regexp_substr(word, '\d{4}\.\d{2}\.\d{2}', 1, 1)
FROM words_reversed
WHERE regexp_like(word, '\d{4}\.\d{2}\.\d{2}')
AND rownum = 1;
From the documentation on regexp_substr, I see one problem immediately:
The . (period) matches any character. You need to escape those with a backslash: \. in order to match only a period character.
For reference, I am linking this post which appears to be the approach you are taking with substr and instr.
Relevant documentation from Oracle:
INSTR(string , substring [, position [, occurrence]])
When position is negative, then INSTR counts and searches backward from the end of string. The default value of position is 1, which means that the function begins searching at the beginning of string.
The problem here is that your regular expression only returns a single value, as explained here, so you may not be giving the instr function the appropriate match when there are multiple dates.
Now, because of this limitation, I recommend using the approach that was proposed in this question, namely reverse the entire string (and your regular expression, i.e. \d{2}\.\d{2}\.\d{4}) and then the first match will be the 'last match'. Then, perform another string reversal to get the original date format.
Maybe this isn't the best solution, but it should work.
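The negative position argument of INSTR quoted above can also be used directly, without reversing anything. The sketch below assumes that every date is terminated by a period followed by a space, and that the last such ". " in the string belongs to a date, as in the sample; the inline view and aliases are illustrative only.

```sql
-- Sketch: search backward for the last '. ' and take the 10 characters before it.
select instr(col_string, '. ', -1)                              as last_dot_pos,
       substr(col_string, instr(col_string, '. ', -1) - 10, 10) as last_date
from (select 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235' col_string
      from dual);
```

If other sentences in the real data end with ". ", this assumption breaks and a regular expression is the safer choice.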
There are three different PL/SQL functions that will get you there:
1. INSTR identifies where the first period in the date string appears.
2. SUBSTR, applied to the entire string, uses the value from (1) as the start point.
3. TO_DATE with a specific date mask, YYYY.MM.DD, converts the result from (2) into an Oracle date type.
To make this work in procedural code, the standard blocks apply:
DECLARE
v_position pls_integer;
... other variables
BEGIN
sql code and function calls;
END
Oracle 11g R2 Schema Setup:
CREATE TABLE finddate
(column1 varchar2(11), column2 varchar2(39))
;
INSERT ALL
INTO finddate (column1, column2)
VALUES ('row1', '1234567 242424 2015.06.07. 12125235')
INTO finddate (column1, column2)
VALUES ('string2', '1234567 242424 2015.06.07. 12125235')
SELECT * FROM dual
;
Query 1:
select instr(column2,'.',1) from finddate
where column1 = 'string2'
select substr(column2,(20-4),10) from finddate
select to_date('2015.06.07','YYYY.MM.DD') from finddate
Results:
| TO_DATE('2015.06.07','YYYY.MM.DD') |
|------------------------------------|
| June, 07 2015 00:00:00 |
| June, 07 2015 00:00:00 |
Here's a way using regexp_replace() that should work with 10g, assuming the format of the lines will be the same:
with tbl(col_string) as
(
select 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235'
from dual
)
select regexp_replace(col_string, '^.*(\d{4}\.\d{2}\.\d{2})\. \d*$', '\1')
from tbl;
The regex can be read as:
^ - Match the start of the line
. - followed by any character
* - followed by 0 or more of the previous character (which is any character)
( - Start a remembered group
\d{4}\.\d{2}\.\d{2} - 4 digits followed by a literal period followed by 2 digits, etc
) - End the first remembered group
\. - followed by a literal period
(space) - followed by a space
\d* - followed by any number of digits
$ - followed by the end of the line
regexp_replace then replaces all that with the first remembered group (\1).
Basically describe the whole line as a regular expression, group around what you want to return. You will most likely need to tweak the regex for the end of the line if it could be other characters than digits but this should give you an idea.
For the sake of argument this works too ONLY IF there are 2 occurrences of the date pattern:
with tbl(col_string) as
(
select 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235' from dual
)
select regexp_substr(col_string, '\d{4}\.\d{2}\.\d{2}', 1, 2)
from tbl;
returns the second occurrence of the pattern. I expect the above regexp_replace more accurately describes the solution.
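If the text after the last date is not predictable, a looser variant of the regexp_replace approach may be enough. This is a sketch under assumptions: the sample string with extra trailing words is made up, and the 'n' match parameter is added so that . also matches newlines, which may matter for CLOB content.

```sql
-- Sketch: the leading greedy .* pushes the capture group to the last date in the string.
select regexp_replace(col_string,
                      '^.*(\d{4}\.\d{2}\.\d{2}).*$',
                      '\1', 1, 1, 'n') as last_date
from (select 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235 and more text' col_string
      from dual);
```

Note that when the pattern does not match at all, regexp_replace returns the source string unchanged rather than NULL, so rows without a date need separate handling.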

Regular Expression - Retrieve specific asterisk separated value string

I need to retrieve a specific part of a string whose values are separated by asterisks.
In the example below I need to retrieve the string Client Contact Centre Seniors2, which sits between the 6th and 7th asterisks.
I am fairly new to regular expressions and have only managed to select a value between 2 asterisks using *[\w]+*
Is there a way to specify which number of asterisk to look at using regular expression, or is there a better way for me to retrieve the string I am after?
String:
2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*
Note: I will be using this regular expression in Oracle SQL using REGEXP_LIKE(string, regex).
* is a regex operator and needs to be escaped unless it is used inside brackets that hold a character list. You can use this simplified pattern to extract the seventh word.
regexp_substr(Audits.audit_log,'[^*]+',1,7)
Query 1:
with x(y) as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*'
from dual
)
select regexp_substr(y,'([^*]+)\*',1,7,null,1)
from x
Results:
| REGEXP_SUBSTR(Y,'([^*]+)\*',1,7,NULL,1) |
|-----------------------------------------|
| Client Contact Centre Seniors2 |
Query 2:
with x(y) as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*'
from dual
)
select regexp_substr(y,'[^*]+',1,7)
from x
Results:
| REGEXP_SUBSTR(Y,'[^*]+',1,7) |
|--------------------------------|
| Client Contact Centre Seniors2 |
You could also use INSTR and SUBSTR for that. Simple and fast, but not as concise as the REGEXP_SUBSTR.
with t as (
select '2*J25*Owner11*Owner Group2*L231*CLIENTCONTACTCENTRESENIORSQUEUE29*Client Contact Centre Seniors2*K20*0*2*C110*SR_STAT_ID2*N18*Referred2*O10*' testvalue
from dual
)
select substr(testvalue, instr(testvalue, '*', 1, 6)+1, instr(testvalue, '*', 1, 7) - instr(testvalue, '*', 1, 6) - 1)
from t;
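One caveat with '[^*]+' is that it counts only non-empty fields, so an empty value between two asterisks shifts every later field by one. A minimal sketch of the difference, using a hypothetical value with an empty second field and made-up column aliases:

```sql
-- Sketch: 'a**c*d' has an empty second field.
with x (y) as
  (select 'a**c*d' from dual)
select -- skips the empty field, so the "second" match is actually the third field
       regexp_substr(y, '[^*]+', 1, 2)                as skips_empty_field,
       -- captures each field up to the next asterisk or end of string, empty ones included
       regexp_substr(y, '(.*?)(\*|$)', 1, 2, null, 1) as keeps_empty_field
from x;
```

The sample string in the question has no empty fields, so both approaches agree there; the second form is worth considering only if empty values can occur.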