How to extract text between brackets in an Oracle SQL query

I am trying to extract a value between the brackets from a string.
For example, I have this string:
No information was found [AI1234].
And I want to get the result between the brackets, i.e. AI1234.
However the expression is not always the same. It may vary.
I am trying to write a query like this:
REGEXP_SUBSTR(mssg, '\((.+)\)', 1, 1, NULL, 1) AS "description" from book
But it is not returning anything. What am I missing?
I also tried something like the following. The problem is that the length of the bracketed text may vary, so the expression below returns something, but not what I am looking for:
substr(mssg,instr(mssg,'(')-8,10) as "description"

If you're looking for a group of digits between square brackets, try this:
WITH
indata(msg) AS (
SELECT 'No information was found [1234]' FROM dual
)
SELECT
REGEXP_SUBSTR(
msg -- the string
, '^[^[]+[[](\d+)[]].*$' -- the pattern (with a captured
-- string "\d+" in round parentheses)
, 1 -- start from position 1
, 1 -- first found occurrence
, '' -- no modifiers
, 1 -- first captured group
) AS extr
FROM indata;
extr
------
1234

Oracle's regular expression documentation is worth reading for this kind of problem.
Try this (Oracle 11g and later):
SELECT REGEXP_SUBSTR(mssg, '\[[^0-9]*(\d+)[^0-9]*\]', 1, 1, NULL, 1) description
FROM book;
UPDATE: This will be OK.
SELECT REGEXP_SUBSTR('No information was found [{AI1234}].', '[[({][^0-9]*(\d+)[^0-9]*[]})]', 1, 1, NULL, 1) description
FROM dual;
UPDATE: Final solution
SELECT REGEXP_SUBSTR('No information was found [{AI1234}].', '[[({]+([^][)(}{]*)[])}]+', 1, 1, NULL, 1) description
FROM dual;
Here, pay attention to the bracket expression [^][)(}{].
Do NOT reorder the bracket characters inside it.
I'll quote from Oracle 11g Regexp reference
[ ]
Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. A non-matching list expression begins with a circumflex (^) and specifies a list that matches any character except for the expressions represented in the list.
To specify a right bracket (]) in the bracket expression, place it first in the list (after the initial circumflex (^), if any).
To specify a hyphen in the bracket expression, place it first in the list (after the initial circumflex (^), if any), last in the list, or as an ending range point in a range expression.
This part, [^ ], was a hard nut to crack, and I finally found the solution in the reference. That is why I emphasize it.
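For the plain square-bracket case in the question, a shorter pattern may be enough. The sketch below is only an illustration: it assumes the value never contains a closing bracket, it needs Oracle 11g or later for the subexpression argument, and it reuses the question's sample string. The original attempt returned nothing because \( and \) match literal round parentheses, which the message does not contain.

```sql
-- Minimal sketch: capture everything between the first [ and the next ].
-- \[ and \] are literal brackets; [^]]+ is a non-matching list whose only
-- member is ], placed first as the quoted reference requires.
SELECT REGEXP_SUBSTR(
         'No information was found [AI1234].',
         '\[([^]]+)\]',
         1,      -- start at the first character
         1,      -- first occurrence
         NULL,   -- no match modifiers
         1       -- return the first subexpression
       ) AS description
FROM dual;
-- description: AI1234
```

Against the table from the question, the literal would be replaced by the mssg column and dual by book.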

Related

Oracle SQL How to extract a string on the first occurrence between two specific fixed strings of "//"

I have data in an Oracle column in the below format:
Ch99// 4// Sub// 586915 16// jkc 12/12/22
And I need to extract the substring of "4" that will always be between the first two occurrences of "//".
So the output needed is:
4
I'm currently using regexp_substr(column, '//([^}]+)//', 1,1,NULL,1), but it's returning everything after the first "//".
I think the simplest thing is probably to use '//(.*?)//'. This matches any characters between // and //, but does it in a non-greedy way.
It is not clear why you are trying to use [^}]+ (matches one or more characters that are not a closing curly brace), since your question doesn't mention any requirements around curly braces.
Here it is working with your example:
SELECT regexp_substr(
'Ch99// 4// Sub// 586915 16// jkc 12/12/22',
'//(.*?)//',
1, 1, null, 1)
FROM dual
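returns the following (the captured value keeps the leading space that follows the first //; wrap the call in TRIM() to drop it)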
4
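If non-greedy quantifiers feel fragile, a negated character class gives the same result. This is a sketch under the assumption that the wanted value never contains a slash; the alias first_field is made up for the example.

```sql
-- [^/]* cannot run past the next slash, so the match stops at the second //.
-- TRIM removes the space that follows the first // in the sample data.
SELECT TRIM(
         REGEXP_SUBSTR(
           'Ch99// 4// Sub// 586915 16// jkc 12/12/22',
           '//([^/]*)//',
           1, 1, NULL, 1)
       ) AS first_field
FROM dual;
-- first_field: 4
```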

Use REGEXP_SUBSTR to extract string of varied length

I want to extract alphanumeric text of varying length from a string, between the second occurrence of specific characters.
I have tried various forms of substr and regexp_substr but can't seem to get the syntax right. This is for use in Teradata SQL assistant. In the past I would have to create a temp table and use substr twice before trimming down the string to what I need. I want to do it all in one go.
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num\\:+(\\:+)',1,2, ':') as RESULTING_STRING
My desired result is to return ONLY what is between "Num:" and the next "," in this case "12345T6". The length of the order number can vary so it is not a fixed length. When I run my code the actual output is a '?' returned by Teradata. What am I doing wrong?
This seems to work:
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:(\w*)', 1, 1, NULL, 1) as RESULTING_STRING from dual
This finds Num: and then captures as many word characters as are available (a comma is not a word character). The last parameter, subexpr, specifies which subexpression (aka capture group) you want; without it the whole match is returned (Num:12345T6).
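Note that \w* stops at the first character that is not a letter, digit, or underscore. If order numbers can contain other characters, capturing everything up to the next comma may be safer. The sketch below is Oracle syntax, and the hyphenated order number in it is a hypothetical value chosen to show the difference.

```sql
-- [^,]* captures up to (not including) the next comma.
-- With 'Num:(\w*)' the same input would yield only 12345.
SELECT REGEXP_SUBSTR(
         'Channel:DF GB, Order Num:12345-T6, Order Date:01/01/2019',
         'Num:([^,]*)',
         1, 1, NULL, 1) AS resulting_string
FROM dual;
-- resulting_string: 12345-T6
```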
Assuming you use Teradata SQL Assistant to query a Teradata system (but then why tag the question Oracle?), the regex syntax is slightly different, because the two systems use different regex dialects:
Teradata's RegExp_Substr doesn't support the subexpression parameter. You can either switch to the undocumented (I really don't know why) RegExp_Substr_gpl
RegExp_Substr_gpl(x, 'Num:([^,]*)', 1, 1, 'i', 1)
or tell the RegEx to forget the previous match using \K:
RegExp_Substr(x, 'Num:\K[^,]*', 1,1, 'i')
You can try the pattern search below:
SELECT REGEXP_REPLACE ((REGEXP_SUBSTR('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:[A-Za-z0-9]*',1,1, 'i')),'Num:','',1,1,'i') AS RESULTING_STRING
The REGEXP_SUBSTR pattern 'Num:[A-Za-z0-9]*' first picks out 'Num:' together with the alphanumeric characters that follow it; the asterisk matches zero or more occurrences of the preceding character class.
For example, 'Num:12345T6' is extracted from the string provided. Note also that the last parameter of REGEXP_SUBSTR is 'i', which makes the search case-insensitive.
Lastly, REGEXP_REPLACE replaces 'Num:' in the output of REGEXP_SUBSTR with an empty string, resulting in the final string '12345T6'.

How can I extract a substring from a character column without using SUBSTR()?

I have a question regarding the data below.
You can clearly see that each EMP_IDENTIFIER is linked to an EMP_ID.
I need to pull out only the identifier, which is 10 characters long, so that it can be inserted into another column.
How would I do that?
I did it the traditional way, using INSTR and SUBSTR.
I just want to know whether there is any other way to do it that does not use INSTR or SUBSTR.
EMP_ID (VARCHAR2)   EMP_IDENTIFIER (VARCHAR2)
62049               62049-2162400111
6394                6394-1368000222
64473               64473-1814702333
61598               61598-0876000444
57452               57452-0336503555
5842                5842-0000070666
75778               75778-0955501777
76021               76021-0546004888
76274               76274-0000454999
73910               73910-0574500122
I am using Oracle 11g.
If you want the second part of the identifier and it is always 10 characters:
select t.*, substr(emp_identifier, -10) as secondpart
from t;
Here is one way:
REGEXP_SUBSTR (EMP_IDENTIFIER, '-(.{10})',1,1,null,1)
That will give the first 10-character string that follows a dash ("-") in your string. Thanks to mathguy for the improvement.
Beyond that, you'll have to provide more details on the exact logic for picking out the identifier you want.
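To compare a few SUBSTR-free variants side by side, here is a small sketch using two of the sample rows. The column aliases are made up for the example, and the last two variants assume the identifier contains exactly one dash.

```sql
WITH t AS (
  SELECT '62049' AS emp_id, '62049-2162400111' AS emp_identifier FROM dual
  UNION ALL
  SELECT '6394', '6394-1368000222' FROM dual
)
SELECT emp_id,
       -- ten characters after the first dash, returned via the capture group
       REGEXP_SUBSTR(emp_identifier, '-(.{10})', 1, 1, NULL, 1) AS via_subexpression,
       -- the run of non-dash characters at the end of the string
       REGEXP_SUBSTR(emp_identifier, '[^-]+$')                  AS after_last_dash,
       -- delete everything up to and including the first dash
       REGEXP_REPLACE(emp_identifier, '^[^-]*-')                AS prefix_removed
FROM t;
```

For these rows all three columns come out the same (2162400111 and 1368000222); they differ only once the data stops following the one-dash, ten-character layout.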
Since apparently this is for learning purposes... let's say the assignment was more complicated. Let's say you had a longer input string, and it had several groups separated by -, and the groups could include letters and digits. You know there are at least two groups that are "digits only" and you need to grab the second such "purely numeric" group. Then something like this will work (and there will not be an instr/substr solution):
select regexp_substr(input_str, '(-|^)(\d+)(-|$)', 1, 2, null, 2) from ....
This searches the input string for one or more digits (\d means any digit, + means one or more occurrences) between a - or the beginning of the string (^ means beginning of the string; (a|b) means match a OR b) and a - or the end of the string ($ means end of the string). It starts searching at the first character (the third argument of the function is 1); it looks for the second occurrence (the argument 2); it doesn't do any special matching such as ignoring case (the argument null); and when the match is found, it returns the fragment of the match included in the second set of parentheses (the last argument, 2). The second fragment is the \d+, the sequence of digits, without the leading and/or trailing dash.
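This solution will work in your example too; it's just overkill. It is meant to find the right "digits-only" group in something like AS23302-ATX-20032-33900293-CWV20-3499-RA, where the second numeric group is 33900293. Be aware that two digits-only groups that are directly adjacent share a single dash, so check the result on inputs like this one.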
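One thing to watch with a pattern of this shape is that a match includes its trailing dash, so a digits-only group that immediately follows another one has no leading dash left to match against. The sketch below is an assumed workaround, not something taken from the answer: it doubles every dash with REPLACE before matching, so that each group gets its own leading and trailing delimiter.

```sql
-- After REPLACE the input reads AS23302--ATX--20032--33900293--CWV20--3499--RA,
-- so the match on -20032- leaves a dash in front of 33900293.
SELECT REGEXP_SUBSTR(
         REPLACE('AS23302-ATX-20032-33900293-CWV20-3499-RA', '-', '--'),
         '(-|^)(\d+)(-|$)',
         1,     -- start at the first character
         2,     -- second occurrence of the whole pattern
         NULL,  -- no match modifiers
         2      -- return the second subexpression, the digits
       ) AS second_numeric_group
FROM dual;
-- intended result: 33900293
```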

Argument '0' is out of range error

I have a query (sql) to pull out a street name from a string. It's looking for the last occurrence of a digit, and then pulling the proceeding text as the street name. I keep getting the oracle
"argument '0' is out of range"
error but I'm struggling to figure out how to fix it.
the part of the query in question is
substr(address,regexp_instr(address,'[[:digit:]]',1,regexp_count(address,'[[:digit:]]'))+2)
any help would be amazing. (using sql developer)
The fourth parameter of regexp_instr is the occurrence:
occurrence is a positive integer indicating which occurrence of
pattern in source_string Oracle should search for. The default is 1,
meaning that Oracle searches for the first occurrence of pattern.
In this case, if an address contains no digits, regexp_count returns 0, which is not a valid occurrence.
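One way to keep the original expression and still avoid the error is to guard the zero-occurrence case. The sketch below assumes the desired fallback is to return the whole address when it contains no digit; the sample rows are borrowed from elsewhere in this thread.

```sql
WITH t AS (
  SELECT '422 Hickory Str.' AS address FROM dual
  UNION ALL
  SELECT 'One US Bank Plaza' FROM dual
)
SELECT address,
       CASE
         -- only call REGEXP_INSTR when there is at least one digit,
         -- so the occurrence argument is never 0
         WHEN REGEXP_COUNT(address, '[[:digit:]]') > 0 THEN
           SUBSTR(address,
                  REGEXP_INSTR(address, '[[:digit:]]', 1,
                               REGEXP_COUNT(address, '[[:digit:]]')) + 2)
         ELSE address
       END AS street
FROM t;
```

With these two rows the street column should come out as Hickory Str. and One US Bank Plaza respectively.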
A simpler solution, which does not require separate treatment for addresses without a house number, is this:
with t (address) as (
select '422 Hickory Str.' from dual union all
select 'One US Bank Plaza' from dual
)
select regexp_substr(address, '\s*([^0-9]*)$', 1, 1, null, 1) as street from t;
The output looks like this:
STREET
-------------------------
Hickory Str.
One US Bank Plaza
The third argument to regexp_substr is the first of the three 1's. It means start the search at the first character of address. The second 1 means find the first occurrence of the search pattern. The null means no special match modifiers (such as case insensitive - nothing like that needed here). The last 1 means "return the first SUBEXPRESSION from the match pattern". Subexpressions are parts of the match expression enclosed in parentheses.
The match pattern has a $ at the end - meaning "anchor at the end of the input string" ($ means the end of the string). Then [...] means match any of the characters in square brackets, but the ^ in [^...] changes it to match any character OTHER THAN what is in the square brackets. 0-9 means all characters between 0 and 9; so [^0-9] means match any character(s) OTHER THAN digits, and the * after that means "any number of such characters" (between 0 and everything in the input string). \s is "blank space" - if there are any blank spaces following a possible number in the address, you don't want them included right at the beginning of the street name. The subexpression is just [^0-9]* meaning the non-digits, not including any spaces before them (because the \s* is outside the left parenthesis).
My example illustrates a potential problem though - sometimes an address does, in fact, have a "number" in it, but spelled out as a word instead of using digits. What I show is in fact a real-life address in my town.
Good luck!
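To see what the final subexpression argument changes, the same pattern can be run with and without it. This is only a small illustration; the column aliases are made up.

```sql
SELECT REGEXP_SUBSTR('422 Hickory Str.', '\s*([^0-9]*)$')                AS whole_match,
       REGEXP_SUBSTR('422 Hickory Str.', '\s*([^0-9]*)$', 1, 1, NULL, 1) AS first_subexpression
FROM dual;
-- whole_match keeps the space after the house number (' Hickory Str.');
-- first_subexpression does not ('Hickory Str.').
```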
"looking for the last occurrence of a digit, and then pulling the proceeding text as the street name"
You could simply do:
SELECT REGEXP_REPLACE( address, '^.*\d\s*(\D*)$', '\1' )
AS street_name
FROM address_table;

Comparing fields when a field has data in between 2 characters that match the field being compared

I have code that looks like this:
left outer join
gme_batch_header bh
on
substr(ln.lot_number,instr(ln.lot_number,'(') + 1,
instr(ln.lot_number,')') - instr(ln.lot_number,'(') - 1)
=
bh.batch_no
It works fine, but I have come across a few lot numbers that have two sections of text in parentheses. How would I compare what is between the second set of parentheses? Here is an example of the data in the lot number field:
E142059-307-SCRAP-(74055)
This one works with the code,
58LF-3-B-2-2-2 (SCRAP)-(61448)
This one tries comparing SCRAP with the batch no, which isn't correct. It needs to be the 61448.
The result is always the last item in parentheses.
After more research, I actually got it to work with this code:
substr(ln.lot_number,instr(ln.lot_number,'(',-1) + 1, instr(ln.lot_number,')',-1) - instr(ln.lot_number,'(',-1) - 1)
Assuming SQL Server 2005+, and that it is always the last occurrence you want, I would suggest finding the last instance of a ( and taking the substring from there. To get the text inside the last pair of parentheses (assuming the value ends with the closing parenthesis), you could use something like:
REVERSE(SUBSTRING(REVERSE(lot_number), 2, CHARINDEX('(', REVERSE(lot_number)) - 2))
If your version of Oracle supports regular expressions try this:
substr(regexp_substr(ln.lot_number,'[0-9]+\)$'),1,length(regexp_substr(ln.lot_number,'[0-9]+\)$'))-1)
Explanation:
regexp_substr(scrap_row,'[0-9]+\)$') ==> finds the run of digits immediately before a closing parenthesis at the end of the string. The result includes the closing parenthesis.
To remove the closing parenthesis, pass the result through substr and take everything from the first character up to one character short of its length.
Query for analysis:
with scrap
as (select '58LF-3-B-2-2-2 (SCRAP)-(61448)' as scrap_row from dual)
select scrap_row,
regexp_substr(scrap_row,'[0-9]+\)$') as regex_substring,
length(regexp_substr(scrap_row,'[0-9]+\)$')) as length_regex_substring,
substr(regexp_substr(scrap_row,'[0-9]+\)$'),1,length(regexp_substr(scrap_row,'[0-9]+\)$'))-1) as regex_sans_parenthesis
from scrap
If you have 11g, this will do it pretty simply by using the subgroup argument of regexp_substr() and constructing the regex appropriately:
SQL> with tbl(data) as
(
select 'E142059-307-SCRAP-(74055)' from dual
union
select '58LF-3-B-2-2-2 (SCRAP)-(61448)' from dual
)
select data from tbl
where regexp_substr(data, '\((\d+)\)$', 1, 1, NULL, 1)
= '61448';
DATA
------------------------------
58LF-3-B-2-2-2 (SCRAP)-(61448)
The regular expression can be read as:
\( - Search for a literal left paren
( - Start a remembered subgroup
\d+ - followed by 1 or more digits
) - End remembered subgroup
\) - followed by a literal right paren
$ - at the end of the line.
The regexp_substr function arguments are:
Source - the source string
Pattern - The regex pattern to look for
position - Position in the string to start looking for the pattern
occurrence - If the pattern occurs multiple times, which occurrence you want
match_params - See the docs, not used here
subexpression - which subexpression to use (the remembered group)
So in English, look for a series of 1 or more digits surrounded by parens, where it occurs at the end of the line and save the digit part only to use to compare. IMHO a lot easier to follow/maintain than nested instr(), substr().
For reusability, make a function called get_last_number_in_parens() that contains this code and takes the string to search as its argument. This way the logic is encapsulated and can be reused by people who may not be comfortable with regular expressions but can still benefit from their power. There is also only one place to maintain the code. Then call it like this:
select data from tbl
where get_last_number_in_parens(data) = '61448';
How easy is that?!
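A minimal sketch of such a function might look like the following. The function name comes from the answer above; the parameter name p_str, the VARCHAR2 return type, and the DETERMINISTIC hint are assumptions made for the example.

```sql
CREATE OR REPLACE FUNCTION get_last_number_in_parens (p_str IN VARCHAR2)
  RETURN VARCHAR2
  DETERMINISTIC
IS
BEGIN
  -- Returns the digits inside a trailing "(...)", or NULL when the string
  -- does not end with a parenthesized number.
  RETURN REGEXP_SUBSTR(p_str, '\((\d+)\)$', 1, 1, NULL, 1);
END get_last_number_in_parens;
/
```

The same call could then replace the nested substr/instr expression in the join condition from the question, for example get_last_number_in_parens(ln.lot_number) = bh.batch_no.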
Hello, you can check with this code. It works whatever the condition may be.
SELECT SUBSTR('58LF-3-B-2-2-2-(61448)',instr('58LF-3-B-2-2-2-(61448)','(',-1)+1,LENGTH('58LF-3-B-2-2-2-(61448)')-instr('58LF-3-B-2-2-2-(61448)','(',-1)-1)
FROM dual;
SELECT SUBSTR('58LF-3-B-2-2-2 (SCRAP)-(61448)',instr('58LF-3-B-2-2-2 (SCRAP)-(61448)','(',-1)+1,LENGTH('58LF-3-B-2-2-2 (SCRAP)-(61448)')-instr('58LF-3-B-2-2-2 (SCRAP)-(61448)','(',-1)-1)
FROM dual;
Output
==================================
61448
==================================