Oracle: split and join the split elements - sql

For example, I have strings like this:
this is test string1
this is another test string2
this is another another test string3
I need to split the strings by space, then join all the elements except the last two. So the output should look like this:
this is
this is another
this is another another

Regexp_Replace() should do the job here:
regexp_replace(yourcolumn, ' [^ ]* [^ ]*$','')
SQLFiddle of this in action (Oracle isn't working on sqlfiddle today, so this is postgres; but their implementation of regexp_replace is nearly the same, and for this example it's exactly the same)
CREATE TABLE test(f1 VARCHAR(500));
INSERT INTO test VALUES
('this is another another test string3'),
('this is test string1'),
('this is another test string2');
SELECT regexp_replace(f1, ' [^ ]* [^ ]*$','') FROM test;
+-------------------------+
| regexp_replace          |
+-------------------------+
| this is another another |
| this is                 |
| this is another         |
+-------------------------+
The regex string here ' [^ ]* [^ ]*$' says to find a space, followed by any number of non-space characters [^ ]* followed by another space, followed by any number of non-space characters [^ ]*, followed by the end of the string $ which we just replace out with nothing ''.
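Since the demo above ran on Postgres, here is a minimal sketch of the same query written for Oracle. It assumes the strings come from an inline WITH clause (the names sample_data and trimmed are only for illustration) and adds two shorter strings to show what the pattern does when there are fewer than three words.

```sql
-- Oracle version of the demo above; the WITH clause only supplies test rows
WITH sample_data (f1) AS (
  SELECT 'this is test string1'                 FROM dual UNION ALL
  SELECT 'this is another test string2'         FROM dual UNION ALL
  SELECT 'this is another another test string3' FROM dual UNION ALL
  SELECT 'this is'                              FROM dual UNION ALL  -- only one space: no match
  SELECT 'this'                                 FROM dual            -- no space: no match
)
SELECT f1,
       regexp_replace(f1, ' [^ ]* [^ ]*$', '') AS trimmed
FROM   sample_data;
```

Rows with fewer than two spaces do not match the pattern, so regexp_replace returns them unchanged.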

A different approach avoids regular expressions: it is longer to type but faster to execute, so the choice mainly depends on what you need.
It's not completely clear what to do if the input string has fewer than 3 tokens, so here are two ways to handle it: res1 returns the string unchanged in that case, while res2 returns NULL:
select str,
case when instr(str, ' ', 1, 2) != 0 then
substr(str, 1, instr(str, ' ', -1, 2)-1)
else
str
end as res1,
substr(str, 1, instr(str, ' ', -1, 2)-1) as res2
from (
select 'this' str from dual union all
select 'this is' str from dual union all
select 'this is test' str from dual union all
select 'this is test string1' str from dual union all
select 'this is another test string2' str from dual union all
select 'this is another another test string3' str from dual
)
STR                                  RES1                                 RES2
------------------------------------ ------------------------------------ ------------------------------------
this                                 this
this is                              this is
this is test                         this                                 this
this is test string1                 this is                              this is
this is another test string2         this is another                      this is another
this is another another test string3 this is another another              this is another another

Related

We need to mask data for the String up to fixed length in Oracle

I am trying to mask the data for the below String :
This is the new ADHAR NUMBER 123456789989 this is the string 3456798983 from Customer Name like 345678 to a String .
In above data I want to mask data starting from ADHAR NUMBER to length up to 60 characters.
OUTPUT :
This is the new *********************************************************Customer Name like 345678 to a String .
Can anyone please help
A little bit of substr + instr does the job (sample data in the first 2 lines; query begins at line #3):
SQL> with test (col) as
2 (select 'This is the new ADHAR NUMBER 123456789989 this is the string 3456798983 from Customer Name like 345678 to a String .' from dual)
3 select substr(col, 1, instr(col, 'ADHAR NUMBER') - 1) ||
4 lpad('*', 60, '*') ||
5 substr(col, instr(col, 'ADHAR NUMBER') + 60) result
6 from test;
RESULT
--------------------------------------------------------------------------------
This is the new ************************************************************ Cus
tomer Name like 345678 to a String .
SQL>
Here is a solution that covers all possibilities (I think). Notice the different inputs in the WITH clause (which is not part of the solution - remove it, and use your actual table and column names in the query). This is how one should test their solutions - consider all possible cases, including NULL input, non-NULL input string that doesn't contain the "magic words", string that has the "magic words" right at the beginning, etc.
There is one important situation the solution does NOT address, namely when the exact substring 'ADHAR NUMBER' is not two full words, but it is part of longer words - for example 'BHADHAR NUMBERS'. In this case the output will look like 'BH****************' masking ADHAR NUMBER and the S after NUMBER and more characters, up to 60 total.
Note that the output string has the same length as the input. This is generally part of the definition of "masking".
with
test (col) as (
select 'This is the new ADHAR NUMBER 123456789989 this is the string ' ||
'3456798983 from Customer Name like 345678 to a String.'
from dual union all
select 'This string does not contain the magic words' from dual union all
select 'ADHAR NUMBER 12345' from dual union all
select 'Blah blah ADHAR NUMBER 1234' from dual union all
select null from dual union all
select 'Another blah ADHAR NUMBER' from dual
)
select case when pos > 0
then
substr(col, 1, pos - 1) ||
rpad('*', least(60, length(col) - pos + 1), '*') ||
substr(col, pos + 60)
else col end as masked
from (
select col, instr(col, 'ADHAR NUMBER') as pos
from test
)
;
MASKED

This is the new ************************************************************ Customer Name like 345678 to a String.
This string does not contain the magic words
******************
Blah blah *****************
Another blah ************
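For the whole-word limitation mentioned above, one possible refinement is to locate the phrase with regexp_instr and require a non-word character (or the string boundary) on each side. This is only a sketch, under the assumption that \W is an adequate definition of a word boundary for your data. It needs Oracle 11g or later for the subexpression argument, and the test strings and column aliases are made up.

```sql
with
  test (col) as (
    select 'Blah blah ADHAR NUMBER 1234' from dual union all
    select 'BHADHAR NUMBERS 1234'        from dual   -- phrase embedded in longer words
  )
select col,
       instr(col, 'ADHAR NUMBER') as pos_plain,
       -- arguments: start at position 1, first occurrence, return the start of the
       -- match (0), case-sensitive ('c'), position of the 2nd subexpression
       regexp_instr(col, '(^|\W)(ADHAR NUMBER)(\W|$)', 1, 1, 0, 'c', 2) as pos_whole_words
from   test;
```

For the second row pos_whole_words should come back as 0, so using it in place of pos in the query above would leave that string unmasked.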

How to get first string after character Oracle SQL

I'm trying to get first string after a character.
Example is like
ABCDEF||GHJ||WERT
I need only
GHJ
I tried to use REGEXP but I couldn't do it.
Can anyone help me with this, please?
Thank you
Somewhat simpler:
SQL> select regexp_substr('ABCDEF||GHJ||WERT', '\w+', 1, 2) result from dual;
                                                         ^
RES                                                      |
---                                           give me the 2nd "word"
GHJ
SQL>
which reads as: give me the 2nd word out of that string. Won't work properly if GHJ consists of several words (but that's not what your example suggests).
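To illustrate that caveat, here is the same call on an assumed input whose second element contains a space:

```sql
-- \w+ stops at the space, so only the first word of the second element is returned
select regexp_substr('ABCDEF||GH J||WERT', '\w+', 1, 2) result from dual;
-- returns GH rather than GH J
```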
This is how I would do it with the separator taken into account; in this case it is || (or |). The example is for an Oracle database.
-- pattern --> [^...] matches any single character not listed in the brackets, and + means one or more of them
-- 3rd parameter --> starting position
-- 4th parameter --> nth occurrence
WITH tbl(str) AS
(SELECT 'ABCDEF||GHJ||WERT' str FROM dual)
SELECT regexp_substr(str
,'[^||]+'
,1
,2) output
FROM tbl;
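One thing to keep in mind with this pattern: inside square brackets each character stands alone, so '[^||]+' behaves the same as '[^|]+' and splits on every single |, not only on ||. A small sketch with an assumed test value that contains a single bar inside the first element:

```sql
-- assumed test value: the first element contains a single vertical bar
WITH tbl(str) AS
 (SELECT 'ABC|DEF||GHJ||WERT' str FROM dual)
SELECT regexp_substr(str, '[^||]+', 1, 2) AS two_bars_in_class, -- DEF
       regexp_substr(str, '[^|]+',  1, 2) AS one_bar_in_class   -- DEF as well
FROM   tbl;
```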
I think the most general solution is:
WITH tbl(str) AS (
SELECT 'ABCDEF||GHJ||WERT' str FROM dual UNION ALL
SELECT 'ABC|DEF||GHJ||WERT' str FROM dual UNION ALL
SELECT 'ABClDEF||GHJ||WERT' str FROM dual
)
SELECT regexp_replace(str, '^.*\|\|(.*)\|\|.*', '\1')
FROM tbl;
Note that this works even if the individual elements contain punctuation or a single vertical bar, which the other solutions do not.
Presumably, the double vertical bar is being used for maximum flexibility.
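Because .* is greedy, the pattern above picks the element between the last two || separators, which is the first one after a separator only when the string has exactly three elements. If the first element after the separator is wanted regardless of how many follow, a non-greedy capture is one option. This is a sketch with an assumed four-element test value, not part of the original answer:

```sql
WITH tbl(str) AS (
  SELECT 'ABC|DEF||GHJ||WERT' str FROM dual UNION ALL
  SELECT 'A||B||C||D'         str FROM dual   -- assumed extra test case
)
SELECT str,
       -- greedy version from above: captures the element before the last ||
       regexp_replace(str, '^.*\|\|(.*)\|\|.*', '\1') AS greedy,
       -- non-greedy capture up to the next || or the end of the string;
       -- the last argument (1) returns only the first subexpression
       regexp_substr(str, '\|\|(.*?)(\|\||$)', 1, 1, NULL, 1) AS first_after
FROM   tbl;
```

For the second row the two columns should differ: C for the greedy pattern and B for the non-greedy one.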
You should use regexp_substr function
select regexp_substr('ABCDEF||GHJ||WERT ', '\|{2}([^|]+)', 1, 1, 'i', 1) str
from dual;
STR
---
GHJ

Consecutive Pattern replacing is not happening with REGEXP_REPLACE

I have a string as below
Welcome to the world of the Hackers
I am trying to replace the occurrences of the listed strings (of, to, the) throughout the string using the query below, but it fails when the patterns are consecutive.
SELECT regexp_replace( 'Welcome to the world of the Hackers', '( to )|( the )|( of )', ' ' )
FROM dual;
Output: Welcome the world the Hackers
It also fails when the same pattern is repeated consecutively, i.e.
SELECT regexp_replace( 'Welcome to to the world of the Hackers', '( to )|( the )|( of )', ' ' )
FROM dual;
Output: Welcome to world the Hackers
Whereas my expected output is: Welcome world Hackers
Is there any alternative/solution for this using REGEXP_REPLACE?
You can use the regular expression (^|\s+)((to|the|of)(\s+|$))+:
SQL Fiddle
Query 1:
WITH test_data ( sentence ) AS (
SELECT 'to the of' FROM DUAL UNION ALL
SELECT 'woof breathe toto' FROM DUAL UNION ALL -- has all the words as sub-strings of words
SELECT 'theory of the offer to total' FROM DUAL -- mix of words to replace and words starting with those words
)
SELECT sentence,
regexp_replace(
sentence,
'(^|\s+)((to|the|of)(\s+|$))+',
'\1'
) AS replaced
FROM test_data
Results:
| SENTENCE | REPLACED |
|------------------------------|--------------------|
| to the of | (null) | -- All words replaced
| woof breathe toto | woof breathe toto |
| theory of the offer to total | theory offer total |
Why doesn't regexp_replace( 'Welcome to the world of the Hackers', '( to )|( the )|( of )', ' ' ) work with successive matches?
Because the regular expression parser will look for the second match after the end of the first match and will not include the already parsed part of the string or the replacement text when looking for subsequent matches.
So the first match will be:
'Welcome to the world of the Hackers'
^^^^
The second match will look in the sub-string following that match
'the world of the Hackers'
^^^^
The 'the ' at the start of the sub-string will not be matched as it has no leading space character (yes, there was a space before it but that was matched in the previous match and, yes, that match was replaced with a space but overlapping matches and matches on previous replacements are not how regular expressions work).
So the second match is the ' of ' in the middle of the remaining sub-string.
There will be no third match as the remaining un-parsed sub-string is:
'the Hackers'
and, again, the 'the ' is not matched as there is no leading space character to match.
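Applied to the string from the question, the pattern from the start of this answer removes the consecutive words in a single pass, because each match consumes the whole run of listed words together with the whitespace between them. A quick side-by-side check (the column aliases are only for illustration):

```sql
SELECT regexp_replace(
         'Welcome to the world of the Hackers',
         '( to )|( the )|( of )',          -- pattern from the question
         ' '
       ) AS original_attempt,              -- Welcome the world the Hackers
       regexp_replace(
         'Welcome to the world of the Hackers',
         '(^|\s+)((to|the|of)(\s+|$))+',   -- one match per run of listed words
         '\1'
       ) AS replaced                       -- Welcome world Hackers
FROM   DUAL;
```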
REGEXP_REPLACE does not match a second pattern which is a part of the already matched pattern. This is more apparent when you use the multi-pattern matching like |. Thus, you can't rely on spaces for word boundaries to match multiple patterns this way. One solution could be to split and combine the characters. This may not be the best way, but works nonetheless. I would be glad to know a better solution.
This also assumes that you are OK with single spaces in the combined string where the original had more than one. Also, words ending with a comma or semicolon aren't handled. You may enhance it using NOT REGEXP_LIKE instead of NOT IN for such cases.
WITH t (id,s)
AS (
SELECT 1 , 'Welcome to the world of the Hackers, you told me these words at the'
FROM DUAL
UNION ALL
SELECT 2, 'The second line.Welcome to the world of the Hackers, you told me these words at the'
FROM DUAL
)
SELECT LISTAGG(word, ' ') WITHIN
GROUP (
ORDER BY w
)
FROM (
SELECT id,
LEVEL AS w
,REGEXP_SUBSTR(s, '[^ ]+', 1, LEVEL) AS word
FROM t CONNECT BY LEVEL <= REGEXP_COUNT(s, '[^ ]+')
AND PRIOR id = id
AND PRIOR SYS_GUID() IS NOT NULL
)
WHERE lower(word) NOT IN (
'to'
,'the'
,'of'
)
GROUP BY id;

Oracle SQL query to convert a string into a comma separated string with comma after every n characters

How can we convert a string of any length into a comma-separated string with a comma after every n characters? I am using Oracle 10g and above. I tried with REGEXP_SUBSTR but couldn't get the desired result.
e.g.: for below string comma after every 5 characters.
input:
aaaaabbbbbcccccdddddeeeeefffff
output:
aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff,
or
aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff
Thanks in advance.
This can be done with regexp_replace, like so:
WITH sample_data AS (SELECT 'aaaaabbbbbcccccdddddeeeeefffff' str FROM dual UNION ALL
SELECT 'aaaa' str FROM dual UNION ALL
SELECT 'aaaaabb' str FROM dual)
SELECT str,
regexp_replace(str, '(.{5})', '\1,')
FROM sample_data;
STR REGEXP_REPLACE(STR,'(.{5})','\
------------------------------ --------------------------------------------------------------------------------
aaaaabbbbbcccccdddddeeeeefffff aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff,
aaaa aaaa
aaaaabb aaaaa,bb
The regexp_replace simply looks for any 5 characters (.{5}), and then replaces them with the same 5 characters plus a comma. The brackets around the .{5} turn it into a labelled subexpression - \1, since it's the first set of brackets - which we can then use to represent our 5 characters in the replacement section.
You would then need to trim the extra comma off the resultant string, if necessary.
SELECT RTRIM ( REGEXP_REPLACE('aaaaabbbbbcccccdddddeeeeefffff', '(.{5})' ,'\1,') ,',') replaced
FROM DUAL;
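Since the question asks for a comma after every n characters, the group size can be concatenated into the pattern instead of being hard-coded. A minimal sketch, assuming n arrives as a column (it could equally be a bind variable); the column n and the second test row are illustrative:

```sql
WITH sample_data AS (SELECT 'aaaaabbbbbcccccdddddeeeeefffff' str, 5 n FROM dual UNION ALL
                     SELECT 'abcdefgh' str, 3 n FROM dual)
SELECT str,
       n,
       -- build '(.{n})' from the column value, then drop a trailing comma if present
       RTRIM(regexp_replace(str, '(.{' || n || '})', '\1,'), ',') AS replaced
FROM sample_data;
-- replaced: aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff and abc,def,gh
```

RTRIM also removes commas that were already at the end of the input, so this assumes the data does not end with one.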
This worked for me:
WITH strlen AS
(
SELECT 'aaaaabbbbbcccccdddddeeeeefffffggggg' AS input,
5 AS part
FROM dual
)
,
pattern AS
(
-- one row per chunk; LEVEL is kept so the chunks can be re-joined in order
SELECT LEVEL AS pos,
-- '.{1,part}' also returns a shorter final chunk if the length is not a multiple of part
regexp_substr(strlen.input, '.{1,' || strlen.part || '}', 1, LEVEL) AS line
FROM strlen
CONNECT BY LEVEL <= CEIL(LENGTH(strlen.input) / strlen.part)
)
-- joining with ',' as the separator avoids a trailing comma
SELECT listagg(line, ',') WITHIN GROUP (
ORDER BY pos) AS big_bang$
FROM pattern ;

Split string by space and character as delimiter in Oracle with regexp_substr

I'm trying to split a string with regexp_substr, but I can't make it work.
So, first, I have this query
select regexp_substr('Helloworld - test!' ,'[[:space:]]-[[:space:]]') from dual
which very nicely extracts my delimiter (blank, hyphen, blank).
But then, when I try to split the string with this option, it just doesn't work.
select regexp_substr('Helloworld - test!' ,'[^[[:space:]]-[[:space:]]]+')from dual
The query returns nothing.
Help will be much appreciated!
Thanks
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST( str ) AS
SELECT 'Hello world - test-test! - test' FROM DUAL
UNION ALL SELECT 'Hello world2 - test2 - test-test2' FROM DUAL;
Query 1:
SELECT Str,
COLUMN_VALUE AS Occurrence,
REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, COLUMN_VALUE, NULL, 1 ) AS split_value
FROM TEST,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL < REGEXP_COUNT( str ,'(.*?)([[:space:]]-[[:space:]]|$)' )
)
AS SYS.ODCINUMBERLIST
)
)
Results:
| STR | OCCURRENCE | SPLIT_VALUE |
|-----------------------------------|------------|--------------|
| Hello world - test-test! - test | 1 | Hello world |
| Hello world - test-test! - test | 2 | test-test! |
| Hello world - test-test! - test | 3 | test |
| Hello world2 - test2 - test-test2 | 1 | Hello world2 |
| Hello world2 - test2 - test-test2 | 2 | test2 |
| Hello world2 - test2 - test-test2 | 3 | test-test2 |
If I understood correctly, this will help you. Currently you are getting Helloworld with a space at the end, so I assume you don't want the trailing space. If so, you can simply include the space in the delimiter as well, like this:
select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1)from dual;
OUTPUT
Helloworld (no space at the end)
As you mentioned in your comment, if you want a two-column output with Helloworld and test!, you can do the following (the hyphen itself comes back as occurrence 2, which is why occurrence 3 is requested):
select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1),
regexp_substr('Helloworld - test!' ,'[^ - ]+',1,3) from dual;
OUTPUT
col1       col2
Helloworld test!
Trying to negate the match string '[[:space:]]-[[:space:]]' by putting it in a character class with a circumflex (^) will not work. Everything between a pair of square brackets is treated as a list of optional single characters, except for named character classes, which expand out to a list of optional characters. However, due to the way character classes nest, it's very likely that your outer brackets are being interpreted as follows:
[^[[:space:]]   A single non-space, non-left-square-bracket character
-               followed by a single hyphen
[[:space:]]     followed by a single space character
]+              followed by 1 or more closing square brackets.
It may be easier to convert your multi-character separator to a single character with regexp_replace, then use regexp_substr to find your individual pieces:
select regexp_substr(regexp_replace('Helloworld - test!'
,'[[:space:]]-[[:space:]]'
,chr(11))
,'([^'||chr(11)||']*)('||chr(11)||'|$)'
,1 -- Start here
,2 -- return 1st, 2nd, 3rd, etc. match
,null
,1 -- return 1st sub exp
)
from dual;
In this code I first changed - to chr(11). That's the ASCII vertical tab (VT) character which is unlikely to appear in most text strings. Then the match expression of the regexp_substr matches all non VT characters followed by either a VT character or the end of line. Only the non VT characters are returned (the first subexpression).
A slight improvement on MT0's answer: the row count is computed dynamically with regexp_count, and the example shows that this pattern handles NULL list elements, which a pattern of the form [^delimiter]+ does NOT. More info on that here: Split comma seperated values to columns
SQL> with tbl(str) as (
select ' - Hello world - test-test! -  - test - ' from dual
)
SELECT LEVEL AS Occurrence,
REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, LEVEL, NULL, 1 ) AS split_value
FROM tbl
CONNECT BY LEVEL <= regexp_count(str, '[[:space:]]-[[:space:]]')+1;
OCCURRENCE SPLIT_VALUE
---------- ----------------------------------------
         1
         2 Hello world
         3 test-test!
         4
         5 test
         6
6 rows selected.
SQL>
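To see the difference in NULL handling in isolation, the two pattern styles can be compared on a short list with an empty second element. A sketch with an assumed comma-delimited test value:

```sql
-- 'a,,c' has three elements; the second one is empty
SELECT regexp_substr('a,,c', '[^,]+', 1, 2)               AS skips_empty,  -- c (the 3rd element)
       regexp_substr('a,,c', '(.*?)(,|$)', 1, 2, NULL, 1) AS keeps_empty   -- NULL
FROM   dual;
```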
CREATE OR REPLACE FUNCTION field(i_string VARCHAR2
,i_delimiter VARCHAR2
,i_occurance NUMBER
,i_return_number NUMBER DEFAULT 0
,i_replace_delimiter VARCHAR2) RETURN VARCHAR2 IS
-----------------------------------------------------------------------
-- Function Name.......: FIELD
-- Author..............: Dan Simson
-- Date................: 05/06/2016
-- Description.........: This function is similar to the one I used from
-- long ago by Prime Computer. You can easily
-- parse a delimited string.
-- Example.............:
-- String.............: This is a cool function
-- Delimiter..........: ' '
-- Occurance..........: 2
-- Return Number......: 3
-- Replace Delimiter..: '/'
-- Return Value.......: is/a/cool
-----------------------------------------------------------------------
v_return_string VARCHAR2(32767);
n_start NUMBER := i_occurance;
v_delimiter VARCHAR2(1);
n_return_number NUMBER := i_return_number;
n_max_delimiters NUMBER := regexp_count(i_string, i_delimiter);
BEGIN
IF i_return_number > n_max_delimiters THEN
n_return_number := n_max_delimiters + 1;
END IF;
FOR a IN 1 .. n_return_number LOOP
v_return_string := v_return_string || v_delimiter || regexp_substr (i_string, '[^' || i_delimiter || ']+', 1, n_start);
n_start := n_start + 1;
v_delimiter := nvl(i_replace_delimiter, i_delimiter);
END LOOP;
RETURN(v_return_string);
END field;
SELECT field('This is a cool function',' ',2,3,'/') FROM dual;
SELECT regexp_substr('This is a cool function', '[^ ]+', 1, 1) Word1
,regexp_substr('This is a cool function', '[^ ]+', 1, 2) Word2
,regexp_substr('This is a cool function', '[^ ]+', 1, 3) Word3
,regexp_substr('This is a cool function', '[^ ]+', 1, 4) Word4
,regexp_substr('This is a cool function', '[^ ]+', 1, 5) Word5
FROM dual;