Regular expression - capture number between underscores within a sequence between commas - sql

I have a field in a database table in the format:
111_2222_33333,222_444_3,aaa_bbb_ccc
This format is uniform across the entire field: three underscore-separated numeric values, a comma, three more underscore-separated numeric values, another comma, and then three underscore-separated text values, with no spaces in between.
I want to extract the middle value from the second numeric sequence; in the example above, I want to get 444.
In a SQL query I inherited, the regex used is ^.,(\d+)_.$ but this doesn't seem to do anything.
I've tried to match the first comma, the number after it, and the following underscore (,222_) as a starting point, and from there capture the next number without the _ after it.
The pattern (,\d*_)(\d+[^_]) selects ,222_444 and is the closest I've gotten.

We can try using REGEXP_REPLACE with a capture group:
SELECT
REGEXP_REPLACE(
'111_2222_33333,222_444_3,aaa_bbb_ccc',
'^[^,]+,[^_]+_(.*?)_[^_]+,.*$',
'\1') AS num
FROM yourTable;
The first capture group of the above regex contains the value you want (444 for the sample input).
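For a quick check outside the database, here is a minimal Python sketch of the same substitution. It assumes Python's re module treats this particular pattern the same way the SQL engine does, and the function name is made up for illustration.

```python
import re

# Same pattern as in the REGEXP_REPLACE call above.
PATTERN = r'^[^,]+,[^_]+_(.*?)_[^_]+,.*$'


def middle_of_second_group(value: str) -> str:
    """Replace the whole string with capture group 1 (hypothetical helper)."""
    return re.sub(PATTERN, r'\1', value)


if __name__ == '__main__':
    print(middle_of_second_group('111_2222_33333,222_444_3,aaa_bbb_ccc'))  # 444
```

The pattern is anchored with ^ and $, so the replacement swaps the entire field for the captured middle value.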

Related

Add a character in a string at certain location based on logic in SQL Server

I have comma-separated data like this in one of the columns:
48FGTG,100ERTD,18NH,07EWR,9FDC,2POANAR,100GTEDC
46FGTG,78ERTD,67NH,76EWR,3FDC
The number at the start of each item is a percentage (it varies from 0 to 100); everything from the first alphabetic character onward is the label.
I have to update the data to look like this:
48% FGTG,100% ERTD,18% NH,07% EWR,9% FDC,2% POANAR,100% GTEDC
46% FGTG,78% ERTD,67% NH,76% EWR,3% FDC
I can pick out the percentage with a regex, but I'm not sure how to do that in SQL. Any lead would be helpful.
You can do it like this (shown for one row's value):
select STRING_AGG(
    substring(value,0,PATINDEX('%[^0-9]%',value))                 -- leading digits
    +'% '+substring(value,PATINDEX('%[^0-9]%',value),len(value)), -- '% ', then the rest
    ',')
from string_split('48FGTG,100ERTD,18NH,07EWR,9FDC,2POANAR,100GTEDC',',')
Here's what I have done:
1. Use PATINDEX to find the position of the first non-digit character.
2. Use the SUBSTRING function to extract the leading number and then the remaining string.
3. Use STRING_AGG to concatenate the resulting strings, placing the separator between them.
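The three steps above can be sketched in Python as follows. This only illustrates the logic and is not SQL Server code; add_percent_signs is a hypothetical name, and leaving an item unchanged when it contains no non-digit character is my own assumption.

```python
import re


def add_percent_signs(row: str) -> str:
    """Insert '% ' between the leading digits and the label of each item."""
    parts = []
    for item in row.split(','):              # like STRING_SPLIT
        match = re.search(r'[^0-9]', item)   # like PATINDEX('%[^0-9]%', value)
        if match is None:
            parts.append(item)               # assumption: nothing to split, keep as is
            continue
        cut = match.start()
        parts.append(item[:cut] + '% ' + item[cut:])  # the two SUBSTRING pieces
    return ','.join(parts)                   # like STRING_AGG(..., ',')


if __name__ == '__main__':
    print(add_percent_signs('46FGTG,78ERTD,67NH,76EWR,3FDC'))
```

Unlike the SQL version, Python's split keeps the items in their original order by construction.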

How can I find two extra characters in DB2 and list them in a column?

I have written this expression for checking extra characters and I am counting the occurrence of those extra characters.
REGEXP_COUNT('Mr.John® Êlite', regexp_extract ('Mr.John® Êlite','[^\x00-\x7F]'))
It works fine if the string has only one extra character, e.g.
Mr. John®
It extracts ® and gives me a count of 1.
But if the string has two extra characters, it only picks up the first one and ignores the second, e.g.
Mr.John® Êlite
My expression extracts ® and ignores Ê.
I have tried a subquery as well, but it did not work. I need help.
As noted by Wiktor Stribiżew, REGEXP_COUNT needs just a source string and a regexp:
db2 "values REGEXP_COUNT('Mr.John® Êlite', '[^\x00-\x7F]')"
1
-----------
2
Because you used REGEXP_EXTRACT, only the first occurrence is extracted:
The REGEXP_EXTRACT scalar function returns one occurrence of a substring of a string that matches the regular expression pattern.
Only then is the count done, and it counts just that one extracted character.
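The difference between the two calls can be reproduced with a rough Python sketch. Python's re module stands in for the DB2 functions here, and both helper names are hypothetical. The findall call also gives the list of extra characters asked about in the title.

```python
import re

NON_ASCII = re.compile(r'[^\x00-\x7F]')


def non_ascii_chars(text: str) -> list:
    """Return every non-ASCII character, in order of appearance."""
    return NON_ASCII.findall(text)


def count_first_only(text: str) -> int:
    """Mimic the original expression: extract the first match, then count only it."""
    first = NON_ASCII.search(text)
    if first is None:
        return 0
    return text.count(first.group())


if __name__ == '__main__':
    sample = 'Mr.John\u00ae \u00calite'    # 'Mr.John® Êlite'
    print(non_ascii_chars(sample))         # both extra characters
    print(len(non_ascii_chars(sample)))    # 2
    print(count_first_only(sample))        # 1
```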

Extract a number from comma separated string using regular expressions in oracle sql

I am trying to fetch a number which starts with 628 in a comma separated string.
Below is what I am using:
SELECT
REGEXP_REPLACE(REGEXP_SUBSTR('62810,5152,,', ',?628[[:alnum:]]+,?'),',','') first,
REGEXP_REPLACE(REGEXP_SUBSTR('5152,62810,,', ',?628[[:alnum:]]+,?'),',','') second,
REGEXP_REPLACE(REGEXP_SUBSTR('5152,562810,,', ',?628[[:alnum:]]+,?'),',','') third,
REGEXP_REPLACE(REGEXP_SUBSTR(',5152,,62810', ',?(628[[:alnum:]]+),?'),',','') fourth
FROM DUAL;
It works, except in one case: the third column, where the number is 562810. I am expecting NULL in the third column.
Actual output from above query is:
"FIRST","SECOND","THIRD","FOURTH"
"62810","62810","62810","62810"
Not sure why you are using [[:alnum:]]. You could use a capture group to extract the number starting with 628 that is either at the start of the string or preceded by a comma. REPLACE can be avoided this way.
If the values can contain letters as well, modify the second capture group () accordingly.
SELECT
REGEXP_SUBSTR('62810,5152,,' , '(^|,)(628\d*)',1,1,NULL,2) first,
REGEXP_SUBSTR('5152,62810,,' , '(^|,)(628\d*)',1,1,NULL,2) second,
REGEXP_SUBSTR('5152,562810,,', '(^|,)(628\d*)',1,1,NULL,2) third,
REGEXP_SUBSTR(',5152,,62810' , '(^|,)(628\d*)',1,1,NULL,2) fourth
FROM DUAL;
The problem with your regex logic is that you are searching for an optional comma before the digits 628. This means that any number having 628 anywhere would match. Instead, you can look for 628 preceded by either a comma or the start of the string.
SELECT
REGEXP_REPLACE(REGEXP_SUBSTR('62810,5152,,', '(,|^)628[[:alnum:]]+,?'),',','') first,
REGEXP_REPLACE(REGEXP_SUBSTR('5152,62810,,', '(,|^)628[[:alnum:]]+,?'),',','') second,
REGEXP_REPLACE(REGEXP_SUBSTR('5152,562810,,', '(,|^)628[[:alnum:]]+,?'),',','') third,
REGEXP_REPLACE(REGEXP_SUBSTR(',5152,,62810', '(,|^)(628[[:alnum:]]+),?'),',','') fourth
FROM DUAL
The ideal pattern we'd like to use here is \b628.*, or something along these lines. But Oracle's regex functions do not appear to support word boundaries, hence we can use (^|,)628.* as an alternative.
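To see the effect of the anchor on the four sample strings, here is a small Python sketch comparing the two patterns. It is an approximation: [0-9A-Za-z] stands in for [[:alnum:]], Python's re stands in for Oracle's regex functions, and the function names are invented for the example.

```python
import re
from typing import Optional

LOOSE = re.compile(r',?628[0-9A-Za-z]+,?')   # optional comma, as in the question
ANCHORED = re.compile(r'(?:^|,)(628\d*)')    # start of string or a comma must come first


def loose_628(csv: str) -> Optional[str]:
    """Question's approach: match, then strip the commas."""
    match = LOOSE.search(csv)
    return match.group().replace(',', '') if match else None


def anchored_628(csv: str) -> Optional[str]:
    """Answers' approach: require ^ or a comma before 628, return the number only."""
    match = ANCHORED.search(csv)
    return match.group(1) if match else None


if __name__ == '__main__':
    for sample in ('62810,5152,,', '5152,62810,,', '5152,562810,,', ',5152,,62810'):
        print(sample, loose_628(sample), anchored_628(sample))
```

For '5152,562810,,' the loose pattern still finds 62810 inside 562810, while the anchored one finds nothing.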

How can I extract a substring from a character column without using SUBSTR()?

I have a question regarding the data below.
As you can see, each EMP_IDENTIFIER starts with its EMP_ID.
I need to pull out only the 10-character identifier, which will be inserted into another column.
How would I do that?
I did it the traditional way, using INSTR and SUBSTR.
I just want to know whether there is any other way to do it without using INSTR and SUBSTR.
EMP_ID (VARCHAR2)   EMP_IDENTIFIER (VARCHAR2)
62049               62049-2162400111
6394                6394-1368000222
64473               64473-1814702333
61598               61598-0876000444
57452               57452-0336503555
5842                5842-0000070666
75778               75778-0955501777
76021               76021-0546004888
76274               76274-0000454999
73910               73910-0574500122
I am using Oracle 11g.
If you want the second part of the identifier and it is always 10 characters:
select t.*, substr(emp_identifier, -10) as secondpart
from t;
Here is one way:
REGEXP_SUBSTR (EMP_IDENTIFIER, '-(.{10})',1,1,null,1)
That will give the 1st 10 character string that follows a dash ("-") in your string. Thanks to mathguy for the improvement.
Beyond that, you'll have to provide more details on the exact logic for picking out the identifier you want.
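As a rough illustration of what the capture group in '-(.{10})' picks up, here is the same pattern in Python. The re module stands in for REGEXP_SUBSTR, and after_dash is a hypothetical helper name.

```python
import re
from typing import Optional


def after_dash(emp_identifier: str) -> Optional[str]:
    """Return the first 10 characters that follow a dash, or None if there are none."""
    match = re.search(r'-(.{10})', emp_identifier)
    return match.group(1) if match else None


if __name__ == '__main__':
    print(after_dash('62049-2162400111'))  # 2162400111
    print(after_dash('5842-0000070666'))   # 0000070666
```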
Since apparently this is for learning purposes... let's say the assignment was more complicated. Let's say you had a longer input string, and it had several groups separated by -, and the groups could include letters and digits. You know there are at least two groups that are "digits only" and you need to grab the second such "purely numeric" group. Then something like this will work (and there will not be an instr/substr solution):
select regexp_substr(input_str, '(-|^)(\d+)(-|$)', 1, 2, null, 2) from ....
This searches the input string for one or more digits (\d means any digit, + means one or more occurrences) between a - or the beginning of the string (^ means beginning of the string; (a|b) means match a OR b) and a - or the end of the string ($ means end of the string). It starts searching at the first character (the third argument of the function is 1) and looks for the second occurrence (the argument 2). It doesn't apply any special matching such as ignoring case (the argument null), and when the match is found, it returns the fragment matched by the second set of parentheses (the last argument, 2). That second fragment is \d+, the sequence of digits without the leading and/or trailing dash.
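One caveat: each match consumes its trailing dash, so the next match cannot begin at the group that immediately follows. The pattern therefore picks the intended group only when the numeric groups are not directly adjacent. In something like AS23302-ATX-20032-33900293-CWV20-3499-RA, the first match is -20032-, the search resumes at 33900293 with no dash left in front of it, and the second occurrence returned is 3499 rather than 33900293. For the same reason, asking for the second occurrence returns NULL on your two-group identifiers such as 62049-2162400111, where substr remains the simpler choice.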
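Here is a minimal Python sketch of the "occurrence plus subexpression" idea, with re.finditer playing the role of the occurrence argument and group(1) the role of the subexpression argument. The function name and the sample input are made up. Matches do not overlap and each one consumes its trailing dash, so the sample keeps a non-numeric group between the numeric ones.

```python
import re
from typing import Optional

# Digits bounded by a dash or by the start/end of the string.
NUMERIC_GROUP = re.compile(r'(?:-|^)(\d+)(?:-|$)')


def nth_numeric_group(text: str, occurrence: int) -> Optional[str]:
    """Return the digits of the n-th non-overlapping match, or None."""
    for index, match in enumerate(NUMERIC_GROUP.finditer(text), start=1):
        if index == occurrence:
            return match.group(1)
    return None


if __name__ == '__main__':
    sample = 'AB12-100-XY-2002-Z9'          # hypothetical input
    print(nth_numeric_group(sample, 1))     # 100
    print(nth_numeric_group(sample, 2))     # 2002
```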

Display certain sequence only in VARCHAR

I have a column error_desc with values like:
Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX.
What SQL query can I use to display only the number 111232 from that column? The number is placed at 66th position in VARCHAR column and ends 71st.
SELECT substr(ERROR_DESC,66,6) as ABC FROM bl1_cycle_errors where error_desc like '%FAN%'
This solution uses regular expressions.
The challenge I faced was dealing with alphanumeric tokens. To detect the standalone number, we have to keep only pure numbers and filter out plain words, alphanumeric tokens, and punctuation.
Pure strings and words not containing numbers can be easily filtered out using
[^[:digit:]]
Possible combinations of alphanumerics are:
1. Begins with letters, contains numbers, may end with letters or punctuation:
[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*
2. Begins with numbers, then contains letters, may contain punctuation:
[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*
3. Begins with numbers, then contains punctuation, may contain letters:
[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*
Combining these regular expressions using | operator we get:
select trim(REGEXP_REPLACE(error_desc,'[^[:digit:]]|[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*|[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*|[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*',' '))
from error_table;
This will work in most cases.
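To experiment with the combined pattern outside the database, here is a rough Python translation. Several things are assumptions: [[:punct:]] is approximated with string.punctuation, [[:digit:]] with 0-9, and Python tries alternatives left to right, which may not match the database's regex engine in every case. The function name is hypothetical.

```python
import re
import string

# Stand-in for the POSIX class [[:punct:]], which Python's re does not support.
P = '[' + re.escape(string.punctuation) + ']'

NOISE = re.compile(
    f'[^0-9]'
    f'|[a-zA-Z]+[0-9]+{P}*[a-zA-Z]*{P}*'
    f'|[0-9]+{P}*[a-zA-Z]+{P}*'
    f'|[0-9]+[a-zA-Z]*{P}+[a-zA-Z]*'
)


def standalone_number(error_desc: str) -> str:
    """Blank out everything except standalone digit runs, then trim."""
    return NOISE.sub(' ', error_desc).strip()


if __name__ == '__main__':
    message = ('Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. '
               'Fan id 111232 is not Effective or not present in BL9_XXXXX '
               'for date 20160XXX.')
    print(standalone_number(message))  # 111232
```

If a message contains more than one standalone number, the result keeps all of them, separated by runs of spaces.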