Split string by number of characters in Oracle SQL

I want to split a string every 70 characters. I found the code below at this link, but it does something slightly different from what I need: it splits at a maximum of 70 characters, using the last space as the break point, whereas I want to split at exactly 70 characters regardless of spaces. I tried to modify the code, but I am still learning the regexp_substr function and cannot figure out where the last space is taken into account. It would be very helpful if someone could guide me through this. Thanks.
Code:
with val as (
select 'I am also looking for similar solution but slightly different where the split occurs at 70 character and not considering last space.' str
from dual
), words as (
select regexp_substr(str, '[^ ]+', 1, level) w
from val
connect by regexp_substr(str, '[^ ]+', 1, level) is not null
), grps as (
select * from words
match_recognize (
measures
match_number () grp,
count ( text.* ) word#
all rows per match
pattern ( init text* )
define
text as sum ( length ( w ) ) + count ( text.* ) <= 70
)
)
select listagg ( w, ' ' )
within group ( order by word# ) split_strs
from grps
group by grp
order by grp;
Result from above code:
What I am looking to achieve is,

You need a hierarchical query using the SUBSTR() function, such as
SELECT level-1 AS "Row_Number",
-- the third argument of SUBSTR is a length, so it stays at 70 for every chunk
SUBSTR(str,70*(level-1)+1,70)
FROM val
CONNECT BY level <= CEIL(LENGTH(str)/70)
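If you prefer to avoid CONNECT BY, or the source table holds more than one row, a recursive WITH clause is another way to cut fixed-width chunks. This is only a minimal sketch, assuming Oracle 11g Release 2 or later; the chunks CTE and its column names (chunk_no, chunk, rest) are made up for illustration.

```sql
WITH val (str) AS (
  SELECT 'I am also looking for similar solution but slightly different where the split occurs at 70 character and not considering last space.'
  FROM dual
),
-- hypothetical helper CTE: peel 70 characters off the front on each pass
chunks (chunk_no, chunk, rest) AS (
  SELECT 1, SUBSTR(str, 1, 70), SUBSTR(str, 71)
  FROM val
  UNION ALL
  SELECT chunk_no + 1, SUBSTR(rest, 1, 70), SUBSTR(rest, 71)
  FROM chunks
  WHERE rest IS NOT NULL  -- SUBSTR past the end returns NULL, which ends the recursion
)
SELECT chunk_no, chunk
FROM chunks
ORDER BY chunk_no;
```

Every chunk except possibly the last is exactly 70 characters long; spaces are treated like any other character.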

For exactly 70 characters you can use SUBSTR instead of regexp_substr.
E.g. the first 70 characters = SUBSTR(str,1,70)

This one did the trick. Thanks to those who responded before I got to this.
with val as (
select 'I am also looking for similar solution but slightly different where the split occurs at 70 character and not considering last space.' str
from dual
), words as (
select regexp_substr(str, '(.{1})', 1, level) w
from val
connect by regexp_substr(str, '(.{1})', 1, level) is not null
), grps as (
select * from words
match_recognize (
measures
match_number () grp,
count ( text.* ) word#
all rows per match
pattern ( init text* )
define
text as sum ( length ( w ) ) <= 70
)
)
select listagg ( w, '' )
within group ( order by word# ) split_strs
from grps
group by grp
order by grp
;

Related

Apply order by in comma separated string in oracle

I have a column in an Oracle table that holds the value below:
select csv_val from my_table where date='09-OCT-18';
output
==================
50,100,25,5000,1000
I want these values in ascending order from a select query, so the output would look like:
output
==================
25,50,100,1000,5000
I tried this link, but it looks like it has some restriction on the number of digits.
Here, I made you a modified version of the answer you linked to that can handle an arbitrary (hardcoded) number of commas. It's pretty heavy on CTEs. As with most LISTAGG answers, it'll have a 4000-char limit. I also changed your regexp to be able to handle null list entries, based on this answer.
WITH
T (N) AS --TEST DATA
(SELECT '50,100,25,5000,1000' FROM DUAL
UNION
SELECT '25464,89453,15686' FROM DUAL
UNION
SELECT '21561,68547,51612' FROM DUAL
),
nums (x) as -- arbitrary limit of 20, can be changed
(select level from dual connect by level <= 20),
splitstr (N, x, substring) as
(select N, x, regexp_substr(N, '(.*?)(,|$)', 1, x, NULL, 1)
from T
inner join nums on x <= 1 + regexp_count(N, ',')
order by N, x)
select N, listagg(substring, ',') within group (order by to_number(substring)) as sorted_N
from splitstr
group by N
;
Probably it can be improved, but eh...
Based on the sample data you posted, a relatively simple query would work (you need lines 3 - 7). If the data doesn't really look like that, the query might need adjustment.
SQL> with my_table (csv_val) as
2 (select '50,100,25,5000,1000' from dual)
3 select listagg(token, ',') within group (order by to_number(token)) result
4 from (select regexp_substr(csv_val, '[^,]+', 1, level) token
5 from my_table
6 connect by level <= regexp_count(csv_val, ',') + 1
7 );
RESULT
-------------------------
25,50,100,1000,5000
SQL>
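The CONNECT BY in the query above is written for a single-row source; with several rows in my_table it would generate many duplicate tokens. A hedged sketch of one way around that, assuming Oracle 12c or later (for CROSS APPLY) and a hypothetical id column that identifies each row:

```sql
WITH my_table (id, csv_val) AS (   -- id is an assumed key column, not in the question
  SELECT 1, '50,100,25,5000,1000' FROM dual UNION ALL
  SELECT 2, '25464,89453,15686'   FROM dual
)
SELECT t.id,
       LISTAGG(x.token, ',') WITHIN GROUP (ORDER BY TO_NUMBER(x.token)) AS sorted_csv
FROM my_table t
CROSS APPLY (
  -- split only the current row's list; dual keeps the hierarchy to one row
  SELECT REGEXP_SUBSTR(t.csv_val, '[^,]+', 1, LEVEL) AS token
  FROM dual
  CONNECT BY LEVEL <= REGEXP_COUNT(t.csv_val, ',') + 1
) x
GROUP BY t.id
ORDER BY t.id;
-- id 1: 25,50,100,1000,5000
-- id 2: 15686,25464,89453
```

Sorting by TO_NUMBER(token) rather than by the raw string is what removes the digit-count restriction: a plain string sort would put 1000 before 25.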

REGEXP_REPLACE to replace emails in a list except a specific domain

I am a novice at regular expressions. I am trying to remove emails from a list which do not belong to a specific domain.
For example, I have the list of emails below:
John@yahoo.co.in , Jacob@gmail.com, Bob@rediff.com,
Lisa@abc.com, sam@gmail.com , rita@yahoo.com
I need to get only the gmail ids:
Jacob@gmail.com, sam@gmail.com
Please note we may have spaces before the comma delimiters.
I'd appreciate any help!
This could be a start for you.
SELECT *
FROM ( SELECT REGEXP_SUBSTR (str,
'[[:alnum:]\.\+]+@gmail.com',
1,
LEVEL)
AS SUBSTR
FROM (SELECT ' John@yahoo.co.in , Jacob.foo@gmail.com, Bob@rediff.com,Lisa@abc.com, sam@gmail.com , sam.bar+stackoverflow@gmail.com, rita@yahoo.com, foobar '
AS str
FROM DUAL)
CONNECT BY LEVEL <= LENGTH (REGEXP_REPLACE (str, '[^,]+')) + 1)
WHERE SUBSTR IS NOT NULL ;
I put in a few more examples, but a proper email checker should comply with the relevant RFCs; see Wikipedia for more about them: https://en.wikipedia.org/wiki/Email_address
Inspiration from https://stackoverflow.com/a/17597049/869069
Rather than suppress the emails not matching a particular domain (in your example, gmail.com), you might try getting only those emails that match the domain:
WITH a1 AS (
SELECT 'John@yahoo.co.in , Jacob@gmail.com, Bob@rediff.com,Lisa@abc.com, sam@gmail.com , rita@yahoo.com' AS email_list FROM dual
)
SELECT LISTAGG(TRIM(email), ',') WITHIN GROUP ( ORDER BY priority )
FROM (
SELECT REGEXP_SUBSTR(email_list, '[^,]+@gmail.com', 1, LEVEL, 'i') AS email
, LEVEL AS priority
FROM a1
CONNECT BY LEVEL <= REGEXP_COUNT(email_list, '[^,]+@gmail.com', 1, 'i')
);
That said, Oracle is probably not the best tool for this (do you have these email addresses stored as a list in a table somewhere? If so then @GordonLinoff's comment is apt - fix your data model if you can).
Here's a method using a CTE just for a different take on the problem. First step is to make a CTE "table" that contains the parsed list elements. Then select from that. The CTE regex handles NULL list elements.
with main_tbl(email) as (
select ' John@yahoo.co.in , Jacob.foo@gmail.com, Bob@rediff.com,Lisa@abc.com, sam@gmail.com , sam.bar+stackoverflow@gmail.com, rita@yahoo.com, foobar '
from dual
),
email_list(email_addr) as (
select trim(regexp_substr(email, '(.*?)(,|$)', 1, level, NULL, 1))
from main_tbl
connect by level <= regexp_count(email, ',')+1
)
-- select * from email_list;
select LISTAGG(TRIM(email_addr), ', ') WITHIN GROUP ( ORDER BY email_addr )
from email_list
where lower(email_addr) like '%gmail.com';
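To illustrate what "handles NULL list elements" means for the '(.*?)(,|$)' pattern used in the CTE, here is a small side-by-side comparison with the more common '[^,]+' pattern. The sample list 'a,,c' and the column aliases are made up for the illustration.

```sql
SELECT LEVEL AS pos,
       -- '[^,]+' only matches non-empty runs, so the empty 2nd element is skipped
       REGEXP_SUBSTR('a,,c', '[^,]+', 1, LEVEL) AS skips_empty,
       -- '(.*?)(,|$)' matches each element up to its delimiter; subexpression 1 is the element
       REGEXP_SUBSTR('a,,c', '(.*?)(,|$)', 1, LEVEL, NULL, 1) AS keeps_empty
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT('a,,c', ',') + 1;
-- pos 1: skips_empty = a,    keeps_empty = a
-- pos 2: skips_empty = c,    keeps_empty = NULL
-- pos 3: skips_empty = NULL, keeps_empty = c
```

With '[^,]+' the third element silently moves into position 2, which matters whenever the position of an element in the list carries meaning.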

How to remove duplicates from space separated list by Oracle regexp_replace? [duplicate]

I have a list 'A B A A C D'. My expected result is 'A B C D'. So far, from the web, I have found this expression:
regexp_replace(l_user ,'([^,]+)(,[ ]*\1)+', '\1');
But this is for a comma-separated list. What modification is needed to make it work for a space-separated list? There is no need to preserve the order.
If I understand well you don't simply need to replace ',' with a space, but also to remove duplicates in a smarter way.
If I modify that expression to work with space instead of ',', I get
select regexp_replace('A B A A C D' ,'([^ ]+)( [ ]*\1)+', '\1') from dual
which gives 'A B A C D', not what you need.
A way to get your needed result could be the following, a bit more complicated:
with string(s) as ( select 'A B A A C D' from dual)
select listagg(case when rn = 1 then str end, ' ') within group (order by lev)
from (
select str, row_number() over (partition by str order by lev) rn, lev -- order by lev so rn = 1 is always the first occurrence
from (
SELECT trim(regexp_substr(s, '[^ ]+', 1, level)) str,
level as lev
FROM string
CONNECT BY instr(s, ' ', 1, level - 1) > 0
)
)
My main problem here is that I'm not able to build a regexp that checks for non adjacent duplicates, so I need to split the string, check for duplicates and then aggregate again the non duplicated values, keeping the order.
If you don't mind the order of the tokens in the result string, this can be simplified:
with string(s) as ( select 'A B A A C D' from dual)
select listagg(str, ' ') within group (order by 1)
from (
SELECT distinct trim(regexp_substr(s, '[^ ]+', 1, level)) as str
FROM string
CONNECT BY instr(s, ' ', 1, level - 1) > 0
)
Assuming you want to keep the component strings in the order of their first occurrence (and not, say, reorder them alphabetically - your example is poorly chosen in this regard, because both lead to the same result), the problem is more complicated, because you must keep track of order too. Then for each letter you must keep just the first occurrence - here is where row_number() helps.
with
inputs ( str ) as ( select 'A B A A C D' from dual)
-- end test data; solution begins below this line
select listagg(token, ' ') within group (order by id) as new_str
from (
select level as id, regexp_substr(str, '[^ ]+', 1, level) as token,
row_number() over (
partition by regexp_substr(str, '[^ ]+', 1, level)
order by level ) as rn
from inputs
connect by regexp_substr(str, '[^ ]+', 1, level) is not null
)
where rn = 1
;
Xquery?
select xmlquery('string-join(distinct-values(ora:tokenize(.," ")), " ")' passing 'A B A A C D' returning content) result from dual

Regexp_replace processing result

I have a string with groups of numbers, and I'd like to pad each group to a constant length. At the moment I use two regexp_replace calls: the first prepends 10 zeros to each group, and the second cuts each group down to its last 10 digits:
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace(
regexp_replace(txt, '(\d+)','0000000000\1')
,'\d+(\d{10})','\1') from s ;
But I'd like to use only one regexp_replace, something like
regexp_replace(txt, '(\d+)',lpad('\1',10,'0'))
But it doesn't work: lpad is executed before the regexp. Do you have any ideas?
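To see why the single-call attempt fails: LPAD is evaluated first, on the two-character literal '\1', so the replacement string that regexp_replace receives is eight zeros followed by the backreference. A small sketch showing both values (the column aliases are made up):

```sql
SELECT LPAD('\1', 10, '0') AS replacement_seen_by_regexp,
       REGEXP_REPLACE('1030123:12031:1341', '(\d+)', LPAD('\1', 10, '0')) AS result
FROM dual;
-- replacement_seen_by_regexp is '00000000\1', so every group simply gets
-- eight zeros put in front of it, whatever its original length
```

The padding would have to depend on the length of each match, which a fixed replacement string cannot express.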
With a slightly different approach, you can try the following:
with s(id, txt) as
(
select rownum, txt
from (
select '1030123:12031:1341' as txt from dual union all
select '1234:0123456789:1341' from dual
)
)
SELECT listagg(lpad(regexp_substr(s.txt, '[^:]+', 1, lines.column_value), 10, '0'), ':') within group (order by column_value) txt
FROM s,
TABLE (CAST (MULTISET
(SELECT LEVEL FROM dual CONNECT BY instr(s.txt, ':', 1, LEVEL - 1) > 0
) AS sys.odciNumberList )) lines
group by id
TXT
-----------------------------------
0001030123:0000012031:0000001341
0000001234:0123456789:0000001341
This uses CONNECT BY to split every string on the separator ':', then uses LPAD to pad each part to 10 characters, and finally aggregates the padded parts back into one string per row.
This works for non-empty sequences (e.g. 123::456)
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace (regexp_replace (txt,'(\d+)',lpad('0',10,'0') || '\1'),'0*(\d{10})','\1')
from s
;

Sum the numbers in a string in Oracle

Below is an interview question; can someone please help me solve it?
select 'a1b2c3d4e5f6g7' from dual;
The output should be the sum of the digits in the string: (1+2+3+4+5+6+7) = 28.
Any help?
Use a regex to keep only the digits, then use CONNECT BY to produce one row per digit and sum them:
With T
as (select regexp_replace('a1b2c3d4e5f6g7', '[A-Za-z]') as col from dual)
select sum(val)
From
(
select substr(col,level,1) val from t connect by level <= length(col)
)
Since it is only 1 digit numbers you can use SUBSTR() to extract every other character:
Query 1:
WITH data ( value ) AS (
select 'a1b2c3d4e5f6g7' from dual
)
SELECT SUM( TO_NUMBER( SUBSTR( value, 2*LEVEL, 1 ) ) ) AS total
FROM data
CONNECT BY 2 * LEVEL <= LENGTH( value )
Results:
| TOTAL |
|-------|
| 28 |
However, if you have two digit numbers then you can do:
Query 2:
WITH data ( value ) AS (
select 'a1b2c3d4e5f6g7h8i9j10' from dual
)
SELECT SUM( TO_NUMBER( REGEXP_SUBSTR( value, '\d+', 1, LEVEL ) ) ) AS total
FROM data
CONNECT BY LEVEL <= REGEXP_COUNT( value, '\d+' )
Results:
| TOTAL |
|-------|
| 55 |
You can use regexp_substr to extract exactly the numbers, then just sum them:
with t as (select 'a1b2c3d4e5f6g7' expr from dual)
select sum(regexp_substr(t.expr, '[0-9]+',1, level)) as col
from t -- select from t (not dual) so that t.expr can be resolved
-- keep going while a level-th number exists (regexp_instr returns 0 otherwise)
connect by regexp_instr(t.expr, '[0-9]+',1, level) > 0;
example:
select sum(regexp_substr('a1b2c3d4e5f6g7r22g4', '[0-9]+',1, level)) as col
from dual
connect by regexp_instr('a1b2c3d4e5f6g7r22g4', '[0-9]+',1, level) > 0;
Result:
54
This solution works with numbers with more than 1 digit and it doesn't matter how many characters are between the numbers:
with t as (select 'a1b2c3d4e5f6g7' as str from dual)
select sum(to_number(regexp_substr(str,'[0-9]+',1,level)))
from t
connect by regexp_substr(str,'[0-9]+',1,level) is not null