Oracle REGEXP_SUBSTR to ignore the first ocurrence of a character but include the 2nd occurence - sql

I have a string that has this format "number - name" I'm using REGEXP_SUBSTR to split it in two separate columns one for name and one for number.
SELECT
REGEXP_SUBSTR('123 - ABC','[^-]+',1,1) AS NUM,
REGEXP_SUBSTR('123 - ABC','[^-]+',1,2) AS NAME
from dual;
But it doesn't work if the name includes a hyphen for example: ABC-Corp then the name is shown only like 'ABC' instead of 'ABC-Corp'. How can I get a regex exp to ignore everything before the first hypen and include everything after it?

You want to split the string on the first occurence of ' - '. It is a simple enough task to be efficiently performed by string functions rather than regexes:
select
substr(mycol, 1, instr(mycol, ' - ') - 1) num,
substr(mycol, instr(mycol, ' - ') + 3) name
from mytable
Demo on DB Fiddlde:
with mytable as (
select '123 - ABC' mycol from dual
union all select '123 - ABC - Corp' from dual
)
select
mycol,
substr(mycol, 1, instr(mycol, ' - ') - 1) num,
substr(mycol, instr(mycol, ' - ') + 3) name
from mytable
MYCOL | NUM | NAME
:--------------- | :-- | :---------
123 - ABC | 123 | ABC
123 - ABC - Corp | 123 | ABC - Corp

NB: #GMB solution is much better in your simple case. It's an overkill to use regular expressions for that.
tldr;
Usually it's easierr and more readable to use subexpr parameter instead of occurrence in case of such fixed masks. So you can specify full mask: \d+\s*-\s*\S+
ie numbers, then 0 or more whitespace chars, then -, again 0 or more whitespace chars and 1+ non-whitespace characters.
Then we adding () to specify subexpressions: since we need only numbers and trailing non-whitespace characters we puts them into ():
'(\d+)\s*-\s*(\S+)'
Then we just specify which subexpression we need, 1 or 2:
SELECT
REGEXP_SUBSTR(column_value,'(\d+)\s*-\s*(\S+)',1,1,null,1) AS NUM,
REGEXP_SUBSTR(column_value,'(\d+)\s*-\s*(\S+)',1,1,null,2) AS NAME
from table(sys.odcivarchar2list('123 - ABC', '123 - ABC-Corp'));
Result:
NUM NAME
---------- ----------
123 ABC
123 ABC-Corp
https://docs.oracle.com/database/121/SQLRF/functions164.htm#SQLRF06303
https://docs.oracle.com/database/121/SQLRF/ap_posix003.htm#SQLRF55544

Related

Extracting certain text between two characters

I have a nvarchar string from which I need to extract certain text from between characters.
Example: 1.abc.5,m001-1-Exit,822-FName-18001233321--2021-09-23 13:53:10 Thursday-m001-1-Exit-Swipe,Card NO: 822User ID: FNameName: 18001233321Dept: Read Date: 2021-09-23 13:53:10 ThursdayAddr: m001-1-ExitStatus: Swipe,07580ec2000002a52E917D0000000000372BA56E11010000
What I need:
| Name | Phone Number |
| -------- | -------------- |
| FName | 1800123321 |
My Attempt:
SELECT SUBSTRING(col, LEN(LEFT(col, CHARINDEX ('-', col))) + 1, LEN(col) - LEN(LEFT(col, CHARINDEX ('-', col))) - LEN(RIGHT(col, LEN(col) - CHARINDEX ('-', col))) - 1);
One way:
Use patindex to find "FName-"
Remove the start of the string up until and including "FName-"
Use patindex to find "--"
Remove the rest of the string from and including "--"
You can consolidate the query down to one line, but you'll find yourself repeating parts of the logic - which I like to avoid. And calculating one thing at a time makes it easier to debug.
select
A.Col
, B.StringStart
, C.NewString
, patindex('%--%',C.NewString) NewStringEnd
, substring(C.NewString,1,patindex('%--%',C.NewString)-1) -- <- Required Result
from (
values
(N'1.abc.5,m001-1-Exit,822-FName-18001233321--2021-09-23 13:53:10 Thursday-m001-1-Exit-Swipe,Card NO: 822User ID: FNameName: 18001233321Dept: Read Date: 2021-09-23 13:53:10 ThursdayAddr: m001-1-ExitStatus: Swipe,07580ec2000002a52E917D0000000000372BA56E11010000')
) A (Col)
cross apply (
values
(patindex('%FName-%',Col))
) B (StringStart)
cross apply (
values
(substring(A.Col,B.StringStart+6,len(A.Col)-B.StringStart-6))
) C (NewString);

PostgreSQL - Extract string before ending delimiter

I have a column of data that looks like this:
58,0:102,56.00
52,0:58,68
58,110
57,440.00
52,0:58,0:106,6105.95
I need to extract the character before the last delimiter (',').
Using the data above, I want to get:
102
58
58
57
106
Might be done with a regular expression in substring(). If you want:
the longest string of only digits before the last comma:
substring(data, '(\d+)\,[^,]*$')
Or you may want:
the string before the last comma (',') that's delimited at the start either by a colon (':') or the start of the string.
Could be another regexp:
substring(data, '([^:]*)\,[^,]*$')
Or this:
reverse(split_part(split_part(reverse(data), ',', 2), ':', 1))
More verbose but typically much faster than a (expensive) regular expression.
db<>fiddle here
Can't promise this is the best way to do it, but it is a way to do it:
with splits as (
select string_to_array(bar, ',') as bar_array
from foo
),
second_to_last as (
select
bar_array[cardinality(bar_array)-1] as field
from splits
)
select
field,
case
when field like '%:%' then split_part (field, ':', 2)
else field
end as last_item
from second_to_last
I went a little overkill on the CTEs, but that was to expose the logic a little better.
With a CTE that removes everything after the last comma and then splits the rest into an array:
with cte as (
select
regexp_split_to_array(
replace(left(col, length(col) - position(',' in reverse(col))), ':', ','),
','
) arr
from tablename
)
select arr[array_upper(arr, 1)] from cte
See the demo.
Results:
| result |
| ------ |
| 102 |
| 58 |
| 58 |
| 57 |
| 106 |
The following treats the source string as an "array of arrays". It seems each data element can be defined as S(x,y) and the overall string as S1:S2:...Sn.
The task then becomes to extract x from Sn.
with as_array as
( select string_to_array(S[n], ',') Sn
from (select string_to_array(col,':') S
, length(regexp_replace(col, '[^:]','','g'))+1 n
from tablename
) t
)
select Sn[array_length(Sn,1)-1] from as_array
The above extends S(x,y) to S(a,b,...,x,y) the task remains to extracting x from Sn. If it is the case that all original sub-strings S are formatted S(x,y) then the last select reduces to select Sn[1]

How to extract the number from a string using Oracle?

I have a string as follows: first, last (123456) the expected result should be 123456. Could someone help me in which direction should I proceed using Oracle?
It will depend on the actual pattern you care about (I assume "first" and "last" aren't literal hard-coded strings), but you will probably want to use regexp_substr.
For example, this matches anything between two brackets (which will work for your example), but you might need more sophisticated criteria if your actual examples have multiple brackets or something.
SELECT regexp_substr(COLUMN_NAME, '\(([^\)]*)\)', 1, 1, 'i', 1)
FROM TABLE_NAME
Your question is ambiguous and needs clarification. Based on your comment it appears you want to select the six digits after the left bracket. You can use the Oracle instr function to find the position of a character in a string, and then feed that into the substr to select your text.
select substr(mycol, instr(mycol, '(') + 1, 6) from mytable
Or if there are a varying number of digits between the brackets:
select substr(mycol, instr(mycol, '(') + 1, instr(mycol, ')') - instr(mycol, '(') - 1) from mytable
Find the last ( and get the sub-string after without the trailing ) and convert that to a number:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE test ( str ) AS
SELECT 'first, last (123456)' FROM DUAL UNION ALL
SELECT 'john, doe (jr) (987654321)' FROM DUAL;
Query 1:
SELECT TO_NUMBER(
TRIM(
TRAILING ')' FROM
SUBSTR(
str,
INSTR( str, '(', -1 ) + 1
)
)
) AS value
FROM test
Results:
| VALUE |
|-----------|
| 123456 |
| 987654321 |

How to get string after character oracle

I have VP3 - Art & Design and HS5 - Health & Social Care, I need to get string after '-' in Oracle. Can this be achieved using substring?
For a string operation as simple as this, I might just use the base INSTR() and SUBSTR() functions. In the query below, we take the substring of your column beginning at two positions after the hyphen.
SELECT
SUBSTR(col, INSTR(col, '-') + 2) AS subject
FROM yourTable
We could also use REGEXP_SUBSTR() here (see Gordon's answer), but it would be a bit more complex and the performance might not be as good as the above query.
You can use regexp_substr():
select regexp_substr(col, '[^-]+', 1, 2)
If you want to remove an optional space, then you can use trim():
select trim(leading ' ', regexp_substr(col, '[^-]+', 1, 2))
The non-ovious parameters mean
1 -- search from the first character of the source. 1 is the default, but you have to set it anyway to be able to provide the second parameter.
2 -- take the second match as the result substring. the default would be 1.
You can use:
SELECT CASE
WHEN INSTR(value, '-') > 0
THEN SUBSTR(value, INSTR(value, '-') + 1)
END AS subject
FROM table_name
or
SELECT REGEXP_SUBSTR( value, '-(.*)$', 1, 1, NULL, 1 ) AS subject
FROM table_name
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'VP3 - Art & Design and HS5 - Health & Social Care' FROM DUAL UNION ALL
SELECT '1-2-3' FROM DUAL UNION ALL
SELECT '123456' FROM DUAL
Both output:
| SUBJECT |
| :------------------------------------------- |
| Art & Design and HS5 - Health & Social Care |
| 2-3 |
| null |
Trimming leading white-space:
If you want to trim the leading white-space then you can use:
SELECT CASE
WHEN INSTR(value, '-') > 0
THEN LTRIM(SUBSTR(value, INSTR(value, '-') + 1))
END AS subject
FROM table_name
or
SELECT REGEXP_SUBSTR( value, '-\s*(.*)$', 1, 1, NULL, 1 ) AS subject
FROM table_name
Which both output:
| SUBJECT |
| :------------------------------------------ |
| Art & Design and HS5 - Health & Social Care |
| 2-3 |
| null |
Why the naive solutions don't always work:
SELECT SUBSTR(value, INSTR(value, '-') + 2) AS subject
FROM table_name
Does not work in 2 cases:
It finds the index of the - character and then skips 2 characters (the - character and then the assumed white-space character); if the second character is not a white-space character then it will miss the first character of the substring (i.e. if the input is 1-2-3 then the output would be -3 rather than 2-3).
It assumes that there will always be a - character in the string; if this is not the case then it will erroneously return the substring starting from the second character rather than returning NULL (i.e. if the input is 123456 then the output is 23456 rather than NULL).
Using the regular expression:
SELECT REGEXP_SUBSTR(value, '[^-]+', 1, 2)
FROM table_name
Does not find the substring after the 1st - character; it will find the substring between the 1st and 2nd - characters and strip any characters outside that range (inclusive of the - characters). So if the input is VP3 - Art & Design and HS5 - Health & Social Care then the output is Art & Design and HS5 rather than the expected Art & Design and HS5 - Health & Social Care.

Split string by space and character as delimiter in Oracle with regexp_substr

I'm trying to split a string with regexp_subtr, but i can't make it work.
So, first, i have this query
select regexp_substr('Helloworld - test!' ,'[[:space:]]-[[:space:]]') from dual
which very nicely extracts my delimiter - blank-blank
But then, when i try to split the string with this option, it just doesn't work.
select regexp_substr('Helloworld - test!' ,'[^[[:space:]]-[[:space:]]]+')from dual
The query returns nothing.
Help will be much appreciated!
Thanks
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST( str ) AS
SELECT 'Hello world - test-test! - test' FROM DUAL
UNION ALL SELECT 'Hello world2 - test2 - test-test2' FROM DUAL;
Query 1:
SELECT Str,
COLUMN_VALUE AS Occurrence,
REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, COLUMN_VALUE, NULL, 1 ) AS split_value
FROM TEST,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL < REGEXP_COUNT( str ,'(.*?)([[:space:]]-[[:space:]]|$)' )
)
AS SYS.ODCINUMBERLIST
)
)
Results:
| STR | OCCURRENCE | SPLIT_VALUE |
|-----------------------------------|------------|--------------|
| Hello world - test-test! - test | 1 | Hello world |
| Hello world - test-test! - test | 2 | test-test! |
| Hello world - test-test! - test | 3 | test |
| Hello world2 - test2 - test-test2 | 1 | Hello world2 |
| Hello world2 - test2 - test-test2 | 2 | test2 |
| Hello world2 - test2 - test-test2 | 3 | test-test2 |
If i understood correctly, this will help you. Currently you are getting output as Helloworld(with space at the end). So i assume u don't want to have space at the end. If so you can simply use the space in the delimiter also like.
select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1)from dual;
OUTPUT
Helloworld(No space at the end)
As u mentioned in ur comment if u want two columns output with Helloworld and test!. you can do the following.
select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1),
regexp_substr('Helloworld - test!' ,'[^ - ]+',1,3) from dual;
OUTPUT
col1 col2
Helloworld test!
Trying to negate the match string '[[:space:]]-[[:space:]]' by putting it in a character class with a circumflex (^) to negate it will not work. Everything between a pair of square brackets is treated as a list of optional single characters except for named named character classes which expand out to a list of optional characters, however, due to the way character classes nest, it's very likely that your outer brackets are being interpreted as follows:
[^[[:space:]] A single non space non left square bracket character
- followed by a single hyphen
[[:space:]] followed by a single space character
]+ followed by 1 or more closing square brackets.
It may be easier to convert your multi-character separator to a single character with regexp_replace, then use regex_substr to find you individual pieces:
select regexp_substr(regexp_replace('Helloworld - test!'
,'[[:space:]]-[[:space:]]'
,chr(11))
,'([^'||chr(11)||']*)('||chr(11)||'|$)'
,1 -- Start here
,2 -- return 1st, 2nd, 3rd, etc. match
,null
,1 -- return 1st sub exp
)
from dual;
In this code I first changed - to chr(11). That's the ASCII vertical tab (VT) character which is unlikely to appear in most text strings. Then the match expression of the regexp_substr matches all non VT characters followed by either a VT character or the end of line. Only the non VT characters are returned (the first subexpression).
Slight improvement on MT0's answer. Dynamic count using regexp_count and proves it handles nulls where the format of [^delimiter]+ as a pattern does NOT handle NULL list elements. More info on that here: Split comma seperated values to columns
SQL> with tbl(str) as (
2 select ' - Hello world - test-test! - - test - ' from dual
3 )
4 SELECT LEVEL AS Occurrence,
5 REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, LEVEL, NULL, 1 ) AS split_value
6 FROM tbl
7 CONNECT BY LEVEL <= regexp_count(str, '[[:space:]]-[[:space:]]')+1;
OCCURRENCE SPLIT_VALUE
---------- ----------------------------------------
1
2 Hello world
3 test-test!
4
5 test
6
6 rows selected.
SQL>
CREATE OR REPLACE FUNCTION field(i_string VARCHAR2
,i_delimiter VARCHAR2
,i_occurance NUMBER
,i_return_number NUMBER DEFAULT 0
,i_replace_delimiter VARCHAR2) RETURN VARCHAR2 IS
-----------------------------------------------------------------------
-- Function Name.......: FIELD
-- Author..............: Dan Simson
-- Date................: 05/06/2016
-- Description.........: This function is similar to the one I used from
-- long ago by Prime Computer. You can easily
-- parse a delimited string.
-- Example.............:
-- String.............: This is a cool function
-- Delimiter..........: ' '
-- Occurance..........: 2
-- Return Number......: 3
-- Replace Delimiter..: '/'
-- Return Value.......: is/a/cool
-------------------------------------------------------------------------- ---
v_return_string VARCHAR2(32767);
n_start NUMBER := i_occurance;
v_delimiter VARCHAR2(1);
n_return_number NUMBER := i_return_number;
n_max_delimiters NUMBER := regexp_count(i_string, i_delimiter);
BEGIN
IF i_return_number > n_max_delimiters THEN
n_return_number := n_max_delimiters + 1;
END IF;
FOR a IN 1 .. n_return_number LOOP
v_return_string := v_return_string || v_delimiter || regexp_substr (i_string, '[^' || i_delimiter || ']+', 1, n_start);
n_start := n_start + 1;
v_delimiter := nvl(i_replace_delimiter, i_delimiter);
END LOOP;
RETURN(v_return_string);
END field;
SELECT field('This is a cool function',' ',2,3,'/') FROM dual;
SELECT regexp_substr('This is a cool function', '[^ ]+', 1, 1) Word1
,regexp_substr('This is a cool function', '[^ ]+', 1, 2) Word2
,regexp_substr('This is a cool function', '[^ ]+', 1, 3) Word3
,regexp_substr('This is a cool function', '[^ ]+', 1, 4) Word4
,regexp_substr('This is a cool function', '[^ ]+', 1, 5) Word5
FROM dual;