Split string by space and character as delimiter in Oracle with regexp_substr - sql

I'm trying to split a string with regexp_substr, but I can't make it work.
So, first, I have this query
select regexp_substr('Helloworld - test!' ,'[[:space:]]-[[:space:]]') from dual
which very nicely extracts my delimiter: blank, hyphen, blank.
But then, when I try to split the string with this option, it just doesn't work.
select regexp_substr('Helloworld - test!' ,'[^[[:space:]]-[[:space:]]]+')from dual
The query returns nothing.
Help will be much appreciated!
Thanks

SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST( str ) AS
SELECT 'Hello world - test-test! - test' FROM DUAL
UNION ALL SELECT 'Hello world2 - test2 - test-test2' FROM DUAL;
Query 1:
SELECT Str,
COLUMN_VALUE AS Occurrence,
REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, COLUMN_VALUE, NULL, 1 ) AS split_value
FROM TEST,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL < REGEXP_COUNT( str ,'(.*?)([[:space:]]-[[:space:]]|$)' )
)
AS SYS.ODCINUMBERLIST
)
)
Results:
| STR | OCCURRENCE | SPLIT_VALUE |
|-----------------------------------|------------|--------------|
| Hello world - test-test! - test | 1 | Hello world |
| Hello world - test-test! - test | 2 | test-test! |
| Hello world - test-test! - test | 3 | test |
| Hello world2 - test2 - test-test2 | 1 | Hello world2 |
| Hello world2 - test2 - test-test2 | 2 | test2 |
| Hello world2 - test2 - test-test2 | 3 | test-test2 |

If I understood correctly, this will help you. Currently you are getting the output as Helloworld (with a space at the end), so I assume you don't want the trailing space. If so, you can simply include the space in the delimiter as well, like this:
select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1)from dual;
OUTPUT
Helloworld (no space at the end)
As you mentioned in your comment, if you want a two-column output with Helloworld and test!, you can do the following:
select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1),
regexp_substr('Helloworld - test!' ,'[^ - ]+',1,3) from dual;
OUTPUT
col1 col2
Helloworld test!

Trying to negate the match string '[[:space:]]-[[:space:]]' by putting it in a character class with a circumflex (^) will not work. Everything between a pair of square brackets is treated as a list of optional single characters, except for named character classes, which expand out to a list of optional characters. However, due to the way character classes nest, it's very likely that your outer brackets are being interpreted as follows:
[^[[:space:]] A single non space non left square bracket character
- followed by a single hyphen
[[:space:]] followed by a single space character
]+ followed by 1 or more closing square brackets.
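One way to check this reading is to feed the same pattern a string that fits the breakdown above. This is only a sketch: the test string 'x- ]]' is made up for illustration, and the outcome depends on Oracle parsing the bracket expression as described.

```sql
-- Hypothetical test string: one character that is neither whitespace nor '[',
-- then a hyphen, a whitespace character, and two closing square brackets.
select regexp_substr('x- ]]', '[^[[:space:]]-[[:space:]]]+') as matched
from dual;
```

If the pattern is parsed as described, the whole test string should come back, whereas 'Helloworld - test!' contains no ] at all and therefore cannot match.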
It may be easier to convert your multi-character separator to a single character with regexp_replace, then use regexp_substr to find your individual pieces:
select regexp_substr(regexp_replace('Helloworld - test!'
,'[[:space:]]-[[:space:]]'
,chr(11))
,'([^'||chr(11)||']*)('||chr(11)||'|$)'
,1 -- Start here
,2 -- return 1st, 2nd, 3rd, etc. match
,null
,1 -- return 1st sub exp
)
from dual;
In this code I first changed the ' - ' delimiter to chr(11). That's the ASCII vertical tab (VT) character, which is unlikely to appear in most text strings. Then the match expression of the regexp_substr matches all non-VT characters followed by either a VT character or the end of the string. Only the non-VT characters are returned (the first subexpression).
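To return every piece rather than only the second one, the same idea can be combined with a CONNECT BY row generator. This is a sketch, assuming chr(11) never occurs in the data; the sample string is borrowed from the SQL Fiddle answer above and the alias names are arbitrary.

```sql
with t (str) as (
  -- Collapse the three-character delimiter into a single VT character first.
  select regexp_replace('Hello world - test-test! - test',
                        '[[:space:]]-[[:space:]]',
                        chr(11))
  from dual
)
select level as occurrence,
       regexp_substr(str,
                     '([^'||chr(11)||']*)('||chr(11)||'|$)',
                     1, level, null, 1) as split_value
from t
-- One row per element: number of delimiters plus one.
connect by level <= regexp_count(str, chr(11)) + 1;
```

For this input the query should return three rows: Hello world, test-test! and test.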

A slight improvement on MT0's answer: it uses a dynamic count via regexp_count, and the example shows that it handles NULLs, whereas a pattern of the form [^delimiter]+ does NOT handle NULL list elements. More info on that here: Split comma seperated values to columns
SQL> with tbl(str) as (
select ' - Hello world - test-test! -  - test - ' from dual
)
SELECT LEVEL AS Occurrence,
REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, LEVEL, NULL, 1 ) AS split_value
FROM tbl
CONNECT BY LEVEL <= regexp_count(str, '[[:space:]]-[[:space:]]')+1;
OCCURRENCE SPLIT_VALUE
---------- ----------------------------------------
1
2 Hello world
3 test-test!
4
5 test
6
6 rows selected.
SQL>
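The difference in NULL handling can be seen side by side on a short list with an empty middle element. This is a minimal sketch; the sample string 'a,,c' and the column aliases are made up for illustration.

```sql
select regexp_substr('a,,c', '[^,]+', 1, 2)              as negated_class,
       regexp_substr('a,,c', '(.*?)(,|$)', 1, 2, null, 1) as lazy_group
from dual;
```

The negated class skips the empty element and returns c as the "second" value, while the lazy-group pattern returns NULL for position 2, which keeps the remaining elements in their correct positions.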

CREATE OR REPLACE FUNCTION field(i_string VARCHAR2
,i_delimiter VARCHAR2
,i_occurance NUMBER
,i_return_number NUMBER DEFAULT 0
,i_replace_delimiter VARCHAR2) RETURN VARCHAR2 IS
-----------------------------------------------------------------------
-- Function Name.......: FIELD
-- Author..............: Dan Simson
-- Date................: 05/06/2016
-- Description.........: This function is similar to the one I used from
-- long ago by Prime Computer. You can easily
-- parse a delimited string.
-- Example.............:
-- String.............: This is a cool function
-- Delimiter..........: ' '
-- Occurance..........: 2
-- Return Number......: 3
-- Replace Delimiter..: '/'
-- Return Value.......: is/a/cool
-----------------------------------------------------------------------
v_return_string VARCHAR2(32767);
n_start NUMBER := i_occurance;
v_delimiter VARCHAR2(1);
n_return_number NUMBER := i_return_number;
n_max_delimiters NUMBER := regexp_count(i_string, i_delimiter);
BEGIN
IF i_return_number > n_max_delimiters THEN
n_return_number := n_max_delimiters + 1;
END IF;
FOR a IN 1 .. n_return_number LOOP
v_return_string := v_return_string || v_delimiter || regexp_substr (i_string, '[^' || i_delimiter || ']+', 1, n_start);
n_start := n_start + 1;
v_delimiter := nvl(i_replace_delimiter, i_delimiter);
END LOOP;
RETURN(v_return_string);
END field;
SELECT field('This is a cool function',' ',2,3,'/') FROM dual;
SELECT regexp_substr('This is a cool function', '[^ ]+', 1, 1) Word1
,regexp_substr('This is a cool function', '[^ ]+', 1, 2) Word2
,regexp_substr('This is a cool function', '[^ ]+', 1, 3) Word3
,regexp_substr('This is a cool function', '[^ ]+', 1, 4) Word4
,regexp_substr('This is a cool function', '[^ ]+', 1, 5) Word5
FROM dual;

Related

Oracle REGEXP_SUBSTR to ignore the first occurrence of a character but include the 2nd occurrence

I have a string in the format "number - name". I'm using REGEXP_SUBSTR to split it into two separate columns, one for the name and one for the number.
SELECT
REGEXP_SUBSTR('123 - ABC','[^-]+',1,1) AS NUM,
REGEXP_SUBSTR('123 - ABC','[^-]+',1,2) AS NAME
from dual;
But it doesn't work if the name includes a hyphen, for example ABC-Corp: the name is then shown only as 'ABC' instead of 'ABC-Corp'. How can I write a regular expression that ignores everything before the first hyphen and includes everything after it?
You want to split the string on the first occurrence of ' - '. It is a simple enough task to be performed efficiently by string functions rather than regexes:
select
substr(mycol, 1, instr(mycol, ' - ') - 1) num,
substr(mycol, instr(mycol, ' - ') + 3) name
from mytable
Demo on DB Fiddle:
with mytable as (
select '123 - ABC' mycol from dual
union all select '123 - ABC - Corp' from dual
)
select
mycol,
substr(mycol, 1, instr(mycol, ' - ') - 1) num,
substr(mycol, instr(mycol, ' - ') + 3) name
from mytable
MYCOL | NUM | NAME
:--------------- | :-- | :---------
123 - ABC | 123 | ABC
123 - ABC - Corp | 123 | ABC - Corp
NB: @GMB's solution is much better in your simple case. It's overkill to use regular expressions for that.
tl;dr:
Usually it's easier and more readable to use the subexpr parameter instead of occurrence for fixed masks like this. So you can specify the full mask: \d+\s*-\s*\S+
i.e. digits, then 0 or more whitespace characters, then -, again 0 or more whitespace characters, and 1 or more non-whitespace characters.
Then we add () to specify subexpressions: since we need only the digits and the trailing non-whitespace characters, we put them in ():
'(\d+)\s*-\s*(\S+)'
Then we just specify which subexpression we need, 1 or 2:
SELECT
REGEXP_SUBSTR(column_value,'(\d+)\s*-\s*(\S+)',1,1,null,1) AS NUM,
REGEXP_SUBSTR(column_value,'(\d+)\s*-\s*(\S+)',1,1,null,2) AS NAME
from table(sys.odcivarchar2list('123 - ABC', '123 - ABC-Corp'));
Result:
NUM NAME
---------- ----------
123 ABC
123 ABC-Corp
https://docs.oracle.com/database/121/SQLRF/functions164.htm#SQLRF06303
https://docs.oracle.com/database/121/SQLRF/ap_posix003.htm#SQLRF55544
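Note that \S+ stops at the first whitespace character, so a name such as 'ABC - Corp' would be cut short. A possible variant, sketched here under the assumption that the number is always the leading run of digits, anchors the mask and captures the rest of the string instead. The third sample value is added for illustration.

```sql
SELECT
  -- Subexpression 1: the leading digits.
  REGEXP_SUBSTR(column_value, '^(\d+)\s*-\s*(.+)$', 1, 1, null, 1) AS NUM,
  -- Subexpression 2: everything after the first hyphen and its surrounding whitespace.
  REGEXP_SUBSTR(column_value, '^(\d+)\s*-\s*(.+)$', 1, 1, null, 2) AS NAME
from table(sys.odcivarchar2list('123 - ABC', '123 - ABC-Corp', '123 - ABC - Corp'));
```

Because \d+ can only consume digits, the hyphen in the mask is forced onto the first hyphen in the string, and (.+) keeps any later hyphens or spaces as part of the name.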

match something ended with a specific char BUT not those which have something in front of the ending char

I have the following string:
aaa'dd?'d'xxx'
The delimiter is ' but if it has ? in front of it, it should not be considered a delimiter but just a literal (the ? is an escape character for the delimiter).
The result I want to display is:
aaa
dd'd
xxx
At the moment I am using [^']+, which does not take the escape character (?) into consideration.
Can you help me, please?
A simple option is to replace the offending string with something else; for example, I used #. For the final result, replace it back with a single quote, '.
SQL> with test (col) as
(select q'[aaa'dd?'d'xxx']' from dual),
inter as
(select replace(col, '?''', '#') icol
from test
)
select replace(regexp_substr(icol, '[^'']+', 1, level), '#', '''') result
from inter
connect by level <= regexp_count(icol, '''');
RESULT
-------------
aaa
dd'd
xxx
If you want to do this without replacing the ?' pattern with a fixed dummy character (whether that's a '#' or anything else you are sure will never actually appear), then you can use a regular expression pattern like this:
-- bind variable for sample value
var str varchar2(20);
exec :str := q'[aaa'dd?'d'xxx']';
select regexp_substr(:str, '((.*?[^?])*?)(''|$)', 1, level, null, 1) as result
from dual
connect by level < regexp_count(:str, '((.*?[^?])*?)(''|$)');
RESULT
--------------------
aaa
dd?'d
xxx
and you can then just apply a simple replace afterwards:
select replace(
regexp_substr(:str, '((.*?[^?])*?)(''|$)', 1, level, null, 1),
'?''',
'''') as result
from dual
connect by level < regexp_count(:str, '((.*?[^?])*?)(''|$)');
RESULT
--------------------
aaa
dd'd
xxx
If you have two adjacent unescaped delimiters you get a null element back from that position (this didn't happen with an earlier version of the regex pattern):
exec :str := q'[aaa''dd?'d'xxx']';
-- just to make them more visible...
set null (null)
select replace(
regexp_substr(:str, '((.*?[^?])*?)(''|$)', 1, level, null, 1),
'?''',
'''') as result
from dual
connect by level < regexp_count(:str, '((.*?[^?])*?)(''|$)');
RESULT
--------------------
aaa
(null)
dd'd
xxx

Oracle: split and join the split elements

For example, I have strings like this
this is test string1
this is another test string2
this is another another test string3
I need to split the strings by space, then join all the elements except last two. So the output should look like this
this is
this is another
this is another another
Regexp_Replace() should do the job here:
regexp_replace(yourcolumn, ' [^ ]* [^ ]*$','')
SQLFiddle of this in action (Oracle isn't working on sqlfiddle today, so this is postgres; but their implementation of regexp_replace is nearly the same, and for this example it's exactly the same)
CREATE TABLE test(f1 VARCHAR(500));
INSERT INTO test VALUES
('this is another another test string3'),
('this is test string1'),
('this is another test string2');
SELECT regexp_replace(f1, ' [^ ]* [^ ]*$','') FROM test;
+-------------------------+
| regexp_replace |
+-------------------------+
| this is another another |
| this is |
| this is another |
+-------------------------+
The regex string here ' [^ ]* [^ ]*$' says to find a space, followed by any number of non-space characters [^ ]* followed by another space, followed by any number of non-space characters [^ ]*, followed by the end of the string $ which we just replace out with nothing ''.
A different approach avoids regular expressions; it is longer to type but faster to execute, so which one to use mainly depends on what you need.
It's not completely clear what to do if the input string has fewer than three tokens, so this shows two ways to handle that case:
select str,
case when instr(str, ' ', 1, 2) != 0 then
substr(str, 1, instr(str, ' ', -1, 2)-1)
else
str
end as res1,
substr(str, 1, instr(str, ' ', -1, 2)-1) as res2
from (
select 'this' str from dual union all
select 'this is' str from dual union all
select 'this is test' str from dual union all
select 'this is test string1' str from dual union all
select 'this is another test string2' str from dual union all
select 'this is another another test string3' str from dual
)
STR                                  RES1                                 RES2
------------------------------------ ------------------------------------ ------------------------------------
this                                 this
this is                              this is
this is test                         this                                 this
this is test string1                 this is                              this is
this is another test string2         this is another                      this is another
this is another another test string3 this is another another              this is another another

oracle sql split text into columns based on each occurrence of a certain character set

In our database (Oracle), there is a field named CONVERSATION containing speech to text records (formatted as CLOB).
After some pre-processing and replacement of unnecessary characters, currently this field has a format as the example below.
I want to split the agents' and customers' texts into separate columns, with each part that starts with "a:" or "c:" separated by a comma.
How can I do that?
"a:" stands for agent and "c:" stands for customer
CREATE TABLE TEXT_RECORDS (
CONVERSATION CLOB
);
INSERT INTO TEXT_RECORDS
(CONVERSATION)
VALUES
('a:some text 1 c:some text 2 a:some text 3 c:some text 4 a:some text 5 c:some text 6');
--EDITED (previously it was 'a:some_text_1 c:some_text_2 a:some_text_3 c:some_text_4 a:some_text_5 c:some_text_6')
Desired output as two separate fields:
CONV_AGENT                             CONV_CUSTOMER
some text 1 ,some text 3, some text 5  some text 2 ,some text 4, some text 6
You can just remove the sub-strings which do not have the correct prefix:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEXT_RECORDS (
CONVERSATION CLOB
);
INSERT INTO TEXT_RECORDS(CONVERSATION)
SELECT 'a:some_text_1 c:some_text_2 a:some_text_3 c:some_text_4 a:some_text_5 c:some_text_6' FROM DUAL UNION ALL
SELECT 'a:some_text_1 a:some_text_2 a:some_text_3' FROM DUAL UNION ALL
SELECT 'c:some_text_1 a:some_text_2 a:some_text_3 c:some_text_4' FROM DUAL;
Query 1:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
conversation,
'.*?(a:(\S+))?(\s|$)', -- Find each word starting with "a:"
'\2, ' -- replace with just that part without prefix
),
'(, ){2,}', -- Replace multiple delimiters
', ' -- With a single delimiter
),
'^, |, $' -- Remove leading and trailing delimiters
) AS conv_agent,
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
conversation,
'.*?(c:(\S+))?(\s|$)', -- Find each word starting with "c:"
'\2, ' -- replace with just that part without prefix
),
'(, ){2,}', -- Replace multiple delimiters
', ' -- With a single delimiter
),
'^, |, $' -- Remove leading and trailing delimiters
) AS conv_customer
FROM text_records
Results:
| CONV_AGENT | CONV_CUSTOMER |
|---------------------------------------|---------------------------------------|
| some_text_1, some_text_3, some_text_5 | some_text_2, some_text_4, some_text_6 |
| some_text_1, some_text_2, some_text_3 | |
| some_text_2, some_text_3 | some_text_1, some_text_4 |
Updated - Spaces in conversation sentences
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEXT_RECORDS (
CONVERSATION CLOB
);
INSERT INTO TEXT_RECORDS(CONVERSATION)
SELECT 'a:some text 1 c:some text 2 a:some text 3 c:some text 4 a:some text 5 c:some text 6' FROM DUAL UNION ALL
SELECT 'a:some text 1 a:some text 2 a:some text 3' FROM DUAL UNION ALL
SELECT 'c:some text 1 a:some text 2 a:some text 3 c:some text 4' FROM DUAL;
Query 1:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
conversation,
'.*?(a:([^:]*))?(\s|$)',
'\2, '
),
'(, ){2,}',
', '
),
'^, |, $'
) AS conv_agent,
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
conversation,
'.*?(c:([^:]*))?(\s|$)',
'\2, '
),
'(, ){2,}',
', '
),
'^, |, $'
) AS conv_customer
FROM text_records
Results:
| CONV_AGENT | CONV_CUSTOMER |
|---------------------------------------|---------------------------------------|
| some text 1, some text 3, some text 5 | some text 2, some text 4, some text 6 |
| some text 1, some text 2, some text 3 | |
| some text 2, some text 3 | some text 1, some text 4 |
You can create two functions, one that gets the agent conversation and one that gets the customer conversation. The function below gets the agent conversation.
CREATE OR REPLACE FUNCTION get_agent_conv(p_text CLOB) RETURN clob
IS
v_indx NUMBER := 1;
v_agent_conv CLOB;
v_occur NUMBER := 0;
BEGIN
LOOP
v_occur := v_occur + 1;
v_indx := DBMS_LOB.INSTR(p_text, 'a:', 1, v_occur);
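EXIT WHEN v_indx = 0; -- stop before appending once no further 'a:' is found
v_agent_conv := v_agent_conv||', '||SUBSTR(p_text, v_indx+2, (DBMS_LOB.INSTR(p_text, 'c:', 1, v_occur)-4)-(v_indx-1));
END LOOP;
-- TRIM accepts only a single trim character, so strip the ', ' separators with LTRIM/RTRIM instead
RETURN RTRIM(LTRIM(v_agent_conv, ', '), ', ');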
END;
/
SELECT GET_AGENT_CONV(conversation) agent_conversation
FROM text_records;
AGENT_CONVERSATION
-------------------------------------
some_text_1, some_text_3, some_text_5

How to extract the number from a string using Oracle?

I have a string as follows: first, last (123456) the expected result should be 123456. Could someone help me in which direction should I proceed using Oracle?
It will depend on the actual pattern you care about (I assume "first" and "last" aren't literal hard-coded strings), but you will probably want to use regexp_substr.
For example, this matches anything between two brackets (which will work for your example), but you might need more sophisticated criteria if your actual examples have multiple brackets or something.
SELECT regexp_substr(COLUMN_NAME, '\(([^\)]*)\)', 1, 1, 'i', 1)
FROM TABLE_NAME
Your question is ambiguous and needs clarification. Based on your comment it appears you want to select the six digits after the left bracket. You can use the Oracle instr function to find the position of a character in a string, and then feed that into the substr to select your text.
select substr(mycol, instr(mycol, '(') + 1, 6) from mytable
Or if there are a varying number of digits between the brackets:
select substr(mycol, instr(mycol, '(') + 1, instr(mycol, ')') - instr(mycol, '(') - 1) from mytable
Find the last ( and get the sub-string after without the trailing ) and convert that to a number:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE test ( str ) AS
SELECT 'first, last (123456)' FROM DUAL UNION ALL
SELECT 'john, doe (jr) (987654321)' FROM DUAL;
Query 1:
SELECT TO_NUMBER(
TRIM(
TRAILING ')' FROM
SUBSTR(
str,
INSTR( str, '(', -1 ) + 1
)
)
) AS value
FROM test
Results:
| VALUE |
|-----------|
| 123456 |
| 987654321 |