Related
I have table with column having data in below format in Oracle DB.
COL 1
abc,mno:EMP
xyz:EMP;tyu,opr:PROF
abc,mno:EMP;tyu,opr:PROF
I am trying to convert the data in below format
COL 1
abc:EMP;mno:EMP
xyz:EMP;tyu:PROF;opr:PROF
abc:EMP;mno:EMP;tyu:PROF;opr:PROF
Basically trying to get everything after : and before ; to move it substitute comma with it.
I tried some SUBSTR and LISTAGG but couldn't get anything worth sharing.
Regards.
Here's one option; read comments within code.
SQL> with test (id, col) as
2 -- sample data
3 (select 1, 'abc,mno:EMP' from dual union all
4 select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
5 select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual
6 ),
7 temp as
8 -- split sample data to rows
9 (select id,
10 column_value cv,
11 regexp_substr(col, '[^;]+', 1, column_value) val
12 from test cross join
13 table(cast(multiset(select level from dual
14 connect by level <= regexp_count(col, ';') + 1
15 ) as sys.odcinumberlist))
16 )
17 -- finally, replace comma with a string that follows a colon sign
18 select id,
19 listagg(replace(val, ',', substr(val, instr(val, ':')) ||';'), ';') within group (order by cv) new_val
20 from temp
21 group by id
22 order by id;
ID NEW_VAL
---------- ----------------------------------------
1 abc:EMP;mno:EMP
2 xyz:EMP;tyu:PROF;opr:PROF
3 abc:EMP;mno:EMP;tyu:PROF;opr:PROF
SQL>
Using the answer of littlefoot, if i were to use cross apply i wouldnt need to cast as multiset...
with test (id, col) as
-- sample data
(select 1, 'abc,mno:EMP' from dual union all
select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual
),
temp as
-- split sample data to rows
(select id,
column_value cv,
regexp_substr(col, '[^;]+', 1, column_value) val
from test
cross apply (select level as column_value
from dual
connect by level<= regexp_count(col, ';') + 1)
)
-- finally, replace comma with a string that follows a colon sign
select id,
listagg(replace(val, ',', substr(val, instr(val, ':')) ||';'), ';') within group (order by cv) new_val
from temp
group by id
order by id;
You do not need recursive anything, just basic regex: if the pattern is always something,something2:someCode (e.g. you have no colon before the comma), then it would be sufficient.
with test (id, col) as (
select 1, 'abc,mno:EMP' from dual union all
select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual union all
select 3, 'abc,mno:EMP;tyu,opr:PROF;something:QWE;something2:QWE' from dual
)
select
/*
Grab this groups:
1) Everything before the comma
2) Then everything before the colon
3) And then everything between the colon and a semicolon
Then place group 3 between 1 and 2
*/
trim(trailing ';' from regexp_replace(col || ';', '([^,]+),([^:]+):([^;]+)', '\1:\3;\2:\3')) as res
from test
| RES |
| :------------------------------------------------------------- |
| abc:EMP;mno:EMP |
| xyz:EMP;tyu:PROF;opr:PROF |
| abc:EMP;mno:EMP;tyu:PROF;opr:PROF |
| abc:EMP;mno:EMP;tyu:PROF;opr:PROF;something:QWE;something2:QWE |
db<>fiddle here
i have table like below :
|-------------|---------------------------------------------------|
|ID. | CONTENT |
|-------------|---------------------------------------------------|
|1 |<TITLE> <SUB-TITLE-1> Content <SUB-TITLE-2>Content.
|2 |<TITLE> <SUB-TITLE-1> Content <SUB-TITLE-2>Content.
|3 |<TITLE> <SUB-TITLE-1> Content <SUB-TITLE-2>Content. <SUB-TITLE-3> Content
|-------------|---------------------------------------------------|
I want to extract all text in between <>, so it will become like below :
|-------------|-------------------------------------------------|
|ID. | CONTENT |
|-------------|-------------------------------------------------|
|1 |TITLE |
|1 |SUB-TITLE-1 |
|1 |SUB-TITLE-2 |
|2 |TITLE |
|2 |SUB-TITLE-1 |
|2 |SUB-TITLE-2 |
|3 |TITLE |
|3 |SUB-TITLE-1 |
|3 |SUB-TITLE-2 |
|3 |SUB-TITLE-3 |
|-------------|-------------------------------------------------|
How to achieve this ? I'm trying to do by regex, but I think I'm lost..
My Oracle version is 18c, if that's help...
You can use the 4th argument of REGEXP_SUBSTR to specify an occurrence for matching.
To get a row for the 1st, 2nd, and 3rd occurrence, you can cross-join with a sub-query from dual.
WITH test_data AS (
SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION
SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)
SELECT t.content_id,
REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, s.match_occurrence, 'i', 1) AS content_match
FROM test_data t
CROSS JOIN (
SELECT 1 AS match_occurrence FROM dual UNION
SELECT 2 AS match_occurrence FROM dual UNION
SELECT 3 AS match_occurrence FROM dual UNION
SELECT 4 AS match_occurrence FROM dual
/* ... etc, with the number of rows equal to the maximum number of matches that can appear */
) s
WHERE REGEXP_SUBSTR(t.content_data, '<.*?>', 1, s.match_occurrence) IS NOT NULL /* Only return records that have a match for the given occurrence */
ORDER BY t.content_id, s.match_occurrence
Borrowing the CONNECT_BY_LEVEL from Barbaros' excellent answer, you could do it more concisely as:
WITH test_data AS (
SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION
SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)
SELECT t.content_id,
REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, LEVEL, 'i', 1) AS content_match
FROM test_data t
CONNECT BY
LEVEL <= REGEXP_COUNT(t.content_data, '<.*?>')
AND PRIOR sys_guid() IS NOT NULL
AND PRIOR content_id = content_id
ORDER BY t.content_id, LEVEL
Note that the CONNECT_BY_LEVEL method might be slower on large datasets, so I would avoid that if performance is a concern.
One option would be using instr() and substr() functions together within a
SELECT .. FROM ..CONNECT BY level style query in order to repeat through counting the numbers of > (or <) signs within each strings :
SELECT id, substr(content,
instr(content,'<',1,level)+1,
instr(content,'>',1,level)-instr(content,'<',1,level)-1) as content
FROM tab
CONNECT BY level <= regexp_count(content,'>')
AND PRIOR sys_guid() IS NOT NULL
AND PRIOR id = id
Demo
I had tried the more conventional way using SUBSTR and INSTR
With data as
Select column1,
Trim(CONTENT, '<', '>') as col2 FROM
TABLE
WITH subdata as
( Select column1,
SUBSTR(col2,0, INSTR(col2, ' '))
as s1
from
data) t1
Union
( Select t1.column1 as col1,
SUBSTR(col2, Length(t1.s1)+1
INSTR(
SUBSTR(
t1.col2, Length(t1.s1)+1,
LENGTH(col2)), ' '))) as col2
From
data) t3
Union
........ t3.... t4
From table
Perfect approach, #Josh Eller !
Only that #Gerry Gry needs it without the greater/smaller signs.
Try with grouping using parentheses?
WITH test_data(content_id,content_data) AS (
SELECT 1 , '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<SUB-TITLE-3>Content.' FROM dual
UNION SELECT 2 , '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' FROM dual
)
SELECT t.content_id
, match_occurrence
, REGEXP_SUBSTR(
t.content_data -- input string
, '[<]([^>]*)[>]' -- regex
, 1 -- starting position
, s.match_occurrence -- n-th occurrence
, '' -- regexp modifier
, 1 -- captured-subexp
) AS content_match
FROM test_data t
CROSS JOIN (
SELECT 1 FROM dual
UNION SELECT 2 FROM dual
UNION SELECT 3 FROM dual
UNION SELECT 4 FROM dual
) s(match_occurrence)
WHERE
REGEXP_SUBSTR(
t.content_data -- input string
, '[<]([^>]*)[>]' -- regex
, 1 -- starting position
, s.match_occurrence -- n-th occurrence
, '' -- regexp modifier
, 1 -- captured-subexp
)
IS NOT NULL
ORDER BY t.content_id, s.match_occurrence
;
-- out Time: First fetch (0 rows): 0.656 ms. All rows formatted: 0.667 ms
-- out content_id | match_occurrence | content_match
-- out ------------+------------------+---------------
-- out 1 | 1 | TITLE
-- out 1 | 2 | SUB-TITLE-1
-- out 1 | 3 | SUB-TITLE-2
-- out 1 | 4 | SUB-TITLE-3
-- out 2 | 1 | TITLE
-- out 2 | 2 | SUB-TITLE-1
-- out 2 | 3 | SUB-TITLE-2
-- out (7 rows)
-- out
-- out Time: First fetch (7 rows): 29.904 ms. All rows formatted: 29.947 ms
I'm searching for a way to get a list of special characters and how many times they appear in my column. I've tried using using regexp_count which works, but I'm not sure how to extend it to make it work for all special characters in one query.
For example for syntax = 'x=y*100' with the following query I get
select *
from (
select regexp_count(syntax, '\*') as charCnt, syntax
from tblTemp
)
where charCnt > 0
charCnt=1 and syntax='x=y*100'.
Which is correct but I want to be able to get back
specChar Cnt
\* 1
= 1
etc..
Oracle Setup:
CREATE TABLE table_name(
id INT,
value NVARCHAR2(200)
);
INSERT INTO table_name
SELECT 1, N'y=20x+3' FROM DUAL UNION ALL
SELECT 2, N'***^%$%$%*&*.&\?' FROM DUAL UNION ALL
SELECT 3, UNISTR('\00B5\00B6\00B5') FROM DUAL UNION ALL
SELECT 4, N'!"£$%^&*()!"£$%^&*()!"£$%^&*()!"£$%^&*()!"£$%^&*()'
|| N'!"£$%^&*()!"£$%^&*()!"£$%^&*()!"£$%^&*()!"£$%^&*()'
|| N'!"£$%^&*()!"£$%^&*()!"£$%^&*()!"£$%^&*()!"£$%^&*()'
|| N'!"£$%^&*()!"£$%^&*()!"£$%^&*()!"£$%^&*()!"£$%^&*()' FROM DUAL;
CREATE OR REPLACE TYPE CHAR_LIST IS TABLE OF CHAR(1 CHAR);
/
Query:
SELECT t.id,
--MAX( t.value ) AS value,
CAST( c.COLUMN_VALUE AS CHAR(1 CHAR) ) AS character,
COUNT(1) AS frequency
FROM table_name t,
TABLE(
CAST(
MULTISET(
SELECT SUBSTR( t.value, LEVEL, 1 )
FROM DUAL
WHERE REGEXP_LIKE( SUBSTR( t.value, LEVEL, 1 ), '[^a-zA-Z0-9]' )
CONNECT BY LEVEL <= LENGTH( t.value )
) AS CHAR_LIST
)
) c
GROUP BY t.id, c.COLUMN_VALUE
ORDER BY id, character;
Output:
ID CHARACTER FREQUENCY
---------- --------- ----------
1 + 1
1 = 1
2 $ 2
2 % 3
2 & 2
2 * 5
2 . 1
2 ? 1
2 \ 1
2 ^ 1
3 µ 2
3 ¶ 1
4 ! 20
4 " 20
4 $ 20
4 % 20
4 & 20
4 ( 20
4 ) 20
4 * 20
4 ^ 20
4 £ 20
I am having csv data in column named component_id in form of 800230,6015,6312,6315,700255,800170,
using the count of the comma I need to generated a unique sequence for the component_id in another column named component_instance_id.
e.g. 1, 2, 3, 4, 5, 6
+-------------------------------------+----------------+
| Component_id | Component |
+-------------------------------------+----------------+
| 800230,6015,6312,6315,700255,800170 | 1,2,3,4,5,6 |
| 800230,6015,6312,6315,700255,800170 | 7,8,9,10,11,12 |
| 800230,6015,6312,6315 | 13,14,15,16 |
+-------------------------------------+----------------+
You could use :
REGEXP_COUNT : To count the number of occurrences of comma
REGEXP_SUBSTR : To split the delimited string into rows
CONNECT BY : For row generation
LISTAGG : For string aggregation
See Split comma delimited strings in a table in Oracle
For example,
SQL> WITH t_1 AS
2 ( SELECT '800230,6015,6312,6315,700255,800170' component_id FROM dual
3 UNION ALL
4 SELECT '800230,6015,6312,6315,700255,800170' FROM dual
5 UNION ALL
6 SELECT '800230,6015,6312,6315' FROM dual
7 ),
8 t AS
9 ( SELECT ROWNUM ID, component_id FROM t_1
10 )
11 SELECT listagg(text, ',') WITHIN GROUP(
12 ORDER BY cv) component_id,
13 listagg(rn, ',') WITHIN GROUP(
14 ORDER BY cv) component_instance_id
15 FROM
16 (SELECT id, lines.column_value cv,
17 rownum rn,
18 trim(regexp_substr(component_id, '[^,]+', 1, lines.column_value)) text
19 FROM t,
20 TABLE (CAST (MULTISET
21 (SELECT LEVEL FROM dual CONNECT BY LEVEL <= regexp_count(component_id, ',')+1
22 ) AS sys.odciNumberList ) ) lines
23 ORDER BY id
24 )
25 GROUP BY id
26 /
COMPONENT_ID COMPONENT_INSTANCE_ID
----------------------------------- ------------------------------------------------
800230,6015,6312,6315,700255,800170 1,2,3,4,5,6
800230,6015,6312,6315,700255,800170 7,8,9,10,11,12
800230,6015,6312,6315 13,14,15,16
SQL>
NOTE : The WITH clause is only to build the sample data for demonstration. You need to only use the query and rename the table names per your requirement.
I want to preserve the record order, which is provided as comma delimited string. The 5th item in by delimited string is a null. I need the 5th row to be null as well.
with test as
(select 'ABC,DEF,GHI,JKL,,MNO' str from dual
)
select rownum, regexp_substr (str, '[^,]+', 1, rownum) split
from test
connect by level <= length (regexp_replace (str, '[^,]+' )) + 1
The current result I'm getting puts this in the 6th position:
1 ABC
2 DEF
3 GHI
4 JKL
5 MNO
6
Order is preserved by your expression, but your regular expression doesn't match nulls correctly, so the 5th item disappears. The 6th row is a NULL because there are no more match after the 5th match.
You could do this instead:
SQL> with test as
2 (select 'ABC,DEF,GHI,JKL,,MNO' str from dual
3 )
4 SELECT rownum,
5 rtrim(regexp_substr(str || ',', '[^,]*,', 1, rownum), ',') split
6 FROM test
7 CONNECT BY LEVEL <= length(regexp_replace(str, '[^,]+')) + 1;
ROWNUM SPLIT
---------- ---------------------------------------------------------------
1 ABC
2 DEF
3 GHI
4 JKL
5
6 MNO
6 rows selected
Or this:
SQL> with test as
2 (select 'ABC,DEF,GHI,JKL,,MNO' str from dual
3 )
4 SELECT rownum,
5 regexp_substr(str, '([^,]*)(,|$)', 1, rownum, 'i', 1) split
6 FROM test
7 CONNECT BY LEVEL <= length(regexp_replace(str, '[^,]+')) + 1;
ROWNUM SPLIT
---------- ------------------------------------------------------------
1 ABC
2 DEF
3 GHI
4 JKL
5
6 MNO
6 rows selected
Try Something like this:
SELECT
STR,
REPLACE ( SUBSTR ( STR,
CASE LEVEL
WHEN 1
THEN
0
ELSE
INSTR ( STR,
'~',
1,
LEVEL
- 1 )
END
+ 1,
1 ),
'~' )
FROM
(SELECT 'A~~C~~E' AS STR FROM DUAL)
CONNECT BY
LEVEL <= LENGTH ( REGEXP_REPLACE ( STR,
'[^~]+' ) )
+ 1;
This one works..
SELECT
ROWNUM,
CAST ( REGEXP_SUBSTR ( STR,
'(.*?)(,|$)',
1,
LEVEL,
NULL,
1 ) AS CHAR ( 12 ) )
OUTPUT
FROM
(SELECT 'ABC,DEF,GHI,JKL,,MNO' AS STR FROM DUAL)
CONNECT BY
LEVEL <= REGEXP_COUNT ( STR,
',' )
+ 1;