pl sql regular expression get specific occurrence

I have a string like this: aa;bb;cc
The number of characters in each block can vary.
; is the delimiter.
I need to extract the values separately. For example, I want only the value in the second position (bb).
I tried this:
SELECT trim(regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL)) str
FROM dual
CONNECT BY regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL) IS NOT NULL;
But if I do:
SELECT * FROM (SELECT trim(regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL)) str
FROM dual
CONNECT BY regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL) IS NOT NULL)
WHERE ROWNUM = 2;
It doesn't work.

Why don't you simply write this?
select trim(regexp_substr('aa;bb;cc', '[^;]+', 1, 2)) str from dual
If you want to use a hierarchical query, alias rownum in the inner query or use the level pseudocolumn:
select str
from (
select level lvl, trim(regexp_substr('aa;bb;cc', '[^;]+', 1, level)) str
from dual
connect by regexp_substr('aa;bb;cc', '[^;]+', 1, level) is not null)
where lvl = 2
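The answer mentions aliasing rownum but only shows the level variant. A minimal sketch of what the rownum variant could look like, assuming the alias name rn (a name chosen here for illustration, not taken from the answer):

```sql
-- Sketch: capture ROWNUM under an alias (rn, a hypothetical name) in the
-- inner query, so the outer WHERE filters on a stored column value
-- rather than on the live ROWNUM pseudocolumn.
select str
from (
  select rownum as rn,
         trim(regexp_substr('aa;bb;cc', '[^;]+', 1, level)) as str
  from dual
  connect by regexp_substr('aa;bb;cc', '[^;]+', 1, level) is not null
)
where rn = 2;
```

For this single-row source, rn should line up with level, so the query should return bb. Filtering on level states the intent more directly, which is probably why the answer shows that form.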

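WHERE ROWNUM = 2 will never return any result, because ROWNUM is assigned to rows as they pass the filter. The first candidate row gets ROWNUM 1, fails the ROWNUM = 2 test and is discarded, so the next candidate is numbered 1 again. Since there never is a first row, ROWNUM = 2 is never reached.
The easiest fix is to use OFFSET and FETCH instead (available from Oracle 12c):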
SELECT * FROM (
SELECT trim(regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL)) str
FROM dual
CONNECT BY regexp_substr('aa;bb;cc', '[^;]+', 1, LEVEL) IS NOT NULL
)
OFFSET 1 ROW FETCH NEXT 1 ROWS ONLY
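The ROWNUM behaviour described in this answer can be checked directly. The following is a small sketch, not part of the original answer, that counts how many of three generated rows survive different ROWNUM predicates:

```sql
-- Sketch: three generated rows (n = 1..3) filtered by different ROWNUM predicates.
-- ROWNUM is assigned to a row only when it passes the WHERE clause.
select 'rownum = 1' as predicate, count(*) as cnt
from (select level as n from dual connect by level <= 3)
where rownum = 1
union all
select 'rownum = 2', count(*)
from (select level as n from dual connect by level <= 3)
where rownum = 2
union all
select 'rownum <= 2', count(*)
from (select level as n from dual connect by level <= 3)
where rownum <= 2;
-- cnt is 1 for "rownum = 1", 0 for "rownum = 2" and 2 for "rownum <= 2"
```

Equality with any value above 1 yields nothing, while an upper bound such as ROWNUM <= 2 works as expected.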

You can use
with t(str) as
(
select 'aa;bb;cc' from dual
), t2 as
(
select trim(regexp_substr(str, '[^;]+', 1, level)) str,
level as lvl
from t
connect by regexp_substr(str, '[^;]+', 1, level) is not null
)
select str
from t2
where lvl = 2;
STR
---
bb
I suggest avoiding rownum as much as possible, especially in queries with subqueries and ORDER BY clauses. In your case, WHERE ROWNUM = 1 returns a value (though the result is not reliable; with real values derived from tables it may not be the row you want), while the other equalities, ROWNUM = 2 or ROWNUM = 3, return no rows at all.

Related

SELECT rows with a new DISTINCT from a VARCHAR with CSV in it

I have an Oracle database table with a field called Classification which is VARCHAR. The VARCHAR is a CSV(using semi colons). Example:
;CHR;
;OTR;CHR;ROW;
;CHA;ROW;
;OTR;ROW;
I want to pull only the rows that contain at least one value not already seen in an earlier row. It is OK if a row also contains previously found values, as long as it has a new one.
For instance from the above dataset it would be:
;CHR;
;OTR;CHR;ROW;
;CHA;ROW;
If I do just:
Select DISTINCT Classification from Table1
I get rows that overlap distinct values due to the overall VARCHAR being Distinct.
I can get all the distinct values using:
select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final
FROM
(
select distinct trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val
from Table1
connect by regexp_substr("Classification", '[^;]+', 1, level) is not null
ORDER BY val
)
which give me
FINAL
CHA,CHR,OTR,ROW
but am unable to make the link to pull out one record per unique value
Is this possible with SQL?
EDIT: This is a database created by a large corporation and mine purchased the product. Now I am tasked with data mining the backend database for BI and have absolutely no control of the database structure.
No offence but I see many answers in the questions I have researched stating 'Do better database design/normalization' and while I agree MOST I have read have no control over the database and are asking for SO assistance with a problem because of this, not ridicule on bad database design.
I apologize if I offend anyone
There is no parent/child relationship. I cannot see the object layer but I assume these values are changed in the object layer before propagating to the client as there is no link to them in the actual database
Clarification:
I see 2 ways to solve this:
1: One select statement that pulls out 1 row based on a new unique value within the VARCHAR CSV(Classification)
2: Use my select statement to loop through and pull one row containing that value in the VARCHAR CSV(Classification)
Thanks all for the input. I upvoted the ones that worked for me. In the end I will be using the one I developed just because I can easily manipulate the output(to a csv) for what the analyst wishes.

Here's one way to approach it:
Assign row numbers to the original CSV data
Split the CSV -> rows
Now assign the split CSV values row numbers, sorted by the CSV ordering from the first step
Return any rows where the row number for the previous step = 1
Return the distinct list of CSVs
For example:
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), ranks as (
select row_number() over ( order by str ) rn, tab.* from tab
), rws as (
select trim ( regexp_substr(str,'[^;]+', 1, level ) ) as val, rn, str
from ranks
connect by regexp_substr ( str, '[^;]+', 1, level ) is not null
and prior rn = rn
and prior sys_guid () is not null
), rns as (
select row_number () over (
partition by val
order by rn
) val_rn, r.*
from rws r
)
select distinct str
from rns
where val_rn = 1;
STR
;CHA;ROW;
;OTR;CHR;ROW;
;CHR;
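In the query above, the conditions prior rn = rn and prior sys_guid () is not null keep each source row's split separate from the others. On Oracle 12c or later, a lateral join is another way to get the same per-row split. This is a sketch under that version assumption, not part of the original answer; the alias x is chosen here for illustration:

```sql
-- Sketch (Oracle 12c or later): split each row's string in its own
-- lateral subquery, so rows cannot connect to one another and no
-- "prior" conditions are needed.
with tab as (
  select ';CHR;' str from dual union all
  select ';OTR;CHR;ROW;' str from dual union all
  select ';CHA;ROW;' str from dual union all
  select ';OTR;ROW;' str from dual
)
select t.str, x.val
from tab t
cross join lateral (
  select trim(regexp_substr(t.str, '[^;]+', 1, level)) as val
  from dual
  connect by level <= regexp_count(t.str, '[^;]+')
) x;
```

This should produce one row per (str, val) pair, which can then feed the same row_number() step used in the rns subquery.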

This is an ad hoc proposal for cases where the generic answer performs poorly and the following restrictions are fulfilled:
all the keys have a fixed length
the maximum number of keys is known
Then you can parse the CSV string with this query (add further UNION ALL branches for longer strings):
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), tab2 as (
select str, substr(str,2,3) val from tab union all
select str, substr(str,6,3) val from tab where substr(str,6,3) is not null union all
select str, substr(str,10,3) val from tab where substr(str,10,3) is not null)
select * from tab2;
which results in
STR           VAL
------------- ------------
;CHR;         CHR
;OTR;CHR;ROW; OTR
;CHA;ROW;     CHA
;OTR;ROW;     OTR
;OTR;CHR;ROW; CHR
;CHA;ROW;     ROW
;OTR;ROW;     ROW
;OTR;CHR;ROW; ROW
Now you only need to find the first occurrence of each key and return the distinct strings that contain that first occurrence.
I'm reusing the approach from Chris Saxon's solution:
with tab as (
select ';CHR;' str from dual union all
select ';OTR;CHR;ROW;' str from dual union all
select ';CHA;ROW;' str from dual union all
select ';OTR;ROW;' str from dual
), tab2 as (
select str, substr(str,2,3) val from tab union all
select str, substr(str,6,3) val from tab where substr(str,6,3) is not null union all
select str, substr(str,10,3) val from tab where substr(str,10,3) is not null),
tab3 as (
select STR, VAL,
row_number() over (partition by val order by str) rn
from tab2)
select distinct str
from tab3
where rn = 1

You were very close since you had already gotten the list of distinct values. Instead of combining them with LISTAGG, you can use that list to find a row that contains that unique value. Below are two separate queries that will return a Classification for each unique value. You can try them both and see which performs better based on the data you have in the table.
Query Option 1
WITH
table1 (classification)
AS
(SELECT ';CHR;' FROM DUAL
UNION ALL
SELECT ';OTR;CHR;ROW;' FROM DUAL
UNION ALL
SELECT ';CHA;ROW;' FROM DUAL
UNION ALL
SELECT ';OTR;ROW;' FROM DUAL),
dist_vals (val)
AS
( SELECT DISTINCT TRIM (REGEXP_SUBSTR (classification,
'[^;]+',
1,
LEVEL)) AS val
FROM Table1
CONNECT BY LEVEL < REGEXP_COUNT (classification, ';'))
SELECT val, classification
FROM (SELECT dv.val,
t.classification,
ROW_NUMBER () OVER (PARTITION BY dv.val ORDER BY t.classification) AS occurence
FROM dist_vals dv, table1 t
WHERE t.classification LIKE '%;' || dv.val || ';%')
WHERE occurence = 1;
Query Option 2
WITH
table1 (classification)
AS
(SELECT ';CHR;' FROM DUAL
UNION ALL
SELECT ';OTR;CHR;ROW;' FROM DUAL
UNION ALL
SELECT ';CHA;ROW;' FROM DUAL
UNION ALL
SELECT ';OTR;ROW;' FROM DUAL),
dist_vals (val)
AS
( SELECT DISTINCT TRIM (REGEXP_SUBSTR (classification,
'[^;]+',
1,
LEVEL)) AS val
FROM Table1
CONNECT BY LEVEL < REGEXP_COUNT (classification, ';'))
SELECT dv.val,
(SELECT classification
FROM table1
WHERE classification LIKE '%;' || dv.val || ';%' AND ROWNUM = 1)
FROM dist_vals dv;

I figured it out this way, and it runs fast (even once all my joins to other tables are added). I will test the other answers as I can and decide on the best one (the others look better than mine if they work, as I would rather not use dbms_output).
DECLARE
v_search_string varchar2(4000);
v_classification varchar2(4000);
BEGIN
select LISTAGG(val,',') WITHIN GROUP ( ORDER BY val ) as final
INTO v_search_string
FROM
(
select distinct trim(regexp_substr("Classification",'[^;]+', 1, level) ) as val
from mytable
connect by regexp_substr("Classification", '[^;]+', 1, level) is not null
ORDER BY val
);
FOR i IN
(SELECT trim(regexp_substr(v_search_string, '[^,]+', 1, LEVEL)) l
FROM dual
CONNECT BY LEVEL <= regexp_count(v_search_string, ',')+1
)
LOOP
SELECT "Classification"
INTO v_classification
FROM mytable
WHERE "Classification" LIKE '%' || i.l || '%'
FETCH NEXT 1 ROWS ONLY;
dbms_output.put_line(v_classification);
END LOOP;
END;

How can I eliminate duplicate data in multiple columns query

I just asked a question about how to eliminate duplicate data in a column:
How can I eliminate duplicate data in column
The code below removes duplicates within a single column:
with data as
(
select 'apple, apple, apple, apple' col from dual
)
select listagg(col, ',') within group(order by 1) col
from (
select distinct trim(regexp_substr(col, '[^,]+', 1, level)) col
from data
connect by level <= regexp_count(col, ',') + 1
)
My next question: I do not know how to eliminate duplicates across multiple columns:
select 'apple, apple, apple' as col1,
'prince,prince,princess' as col2,
'dog, cat, cat' as col3
from dual;
I would like to show
COL1  COL2             COL3
----- ---------------- --------
apple prince, princess dog, cat

You can apply the same split-and-aggregate pattern once per column:
select
(
select listagg(str,',') within group (order by 0)
from
(
select distinct trim(regexp_substr('apple, apple, apple','[^,]+', 1, level)) as str
from dual
connect by level <= regexp_count('apple, apple, apple',',') + 1
)
) as str1,
(
select listagg(str,',') within group (order by 0)
from
(
select distinct trim(regexp_substr('prince,prince,princess','[^,]+', 1, level)) as str
from dual
connect by level <= regexp_count('prince,prince,princess',',') + 1
)
) as str2,
(
select listagg(str,',') within group (order by 0)
from
(
select distinct trim(regexp_substr('dog, cat, cat','[^,]+', 1, level)) as str
from dual
connect by level <= regexp_count('dog, cat, cat',',') + 1
)
) as str3
from dual;
STR1   STR2            STR3
------ --------------- --------
apple  prince,princess cat,dog

SQL slow when having subquery in in statement

I have the following statement:
WITH foos AS (
select regexp_substr('H50','[^,]+', 1, level) x from dual
connect by regexp_substr('H50', '[^,]+', 1, level) is not null
), bars AS (
select regexp_substr('G30','[^,]+', 1, level) x from dual
connect by regexp_substr('G30', '[^,]+', 1, level) is not null
)
select
count(*)
from VIEW
where foo in (
'H50'
-- select x from foos
)
and bar in (
'G30'
-- select x from bars
);
When I use the constants G30 and H50, it is really fast. However, when I use the subqueries, it is really slow (~5 seconds).
I have no idea why this is the case. Any ideas?

To start with, in the first case the WITH subqueries are never referenced, so they are not executed at all. The same effect can be seen here, where t2 is defined but unused:
with t1 (n) as (select 1 from dual union all select n+1 from t1 where n <= 10)
,t2 (n) as (select 1 from dual union all select n+1 from t2 where n <= 1000000000000)
select *
from t1
;

Hierarchical table merge itself many times

CREATE TABLE b( ID VARCHAR2(50 BYTE),
PARENT_ID VARCHAR2(50 BYTE),
NAME NVARCHAR2(200)
);
https://community.oracle.com/thread/3513540?
I explained everything in the thread linked above.

Is there a reason you don't like the answer posted there? Does the SQL Fiddle illustrating that answer not produce the results you described?
Edit:
An SQL Fiddle using the technique Frank Kulash suggested. This is the first Oracle based query I have written since 2002 ( on 9iAS ).
SELECT REGEXP_SUBSTR( a.PIDPath, '[^,]+', 1, 1 ) Header1,
REGEXP_SUBSTR( a.IDPath, '[^,]+', 1, 1 ) Header2,
REGEXP_SUBSTR( a.NamePath, '[^,]+', 1, 1 ) Header3,
REGEXP_SUBSTR( a.PIDPath, '[^,]+', 1, 2 ) Header4,
REGEXP_SUBSTR( a.IDPath, '[^,]+', 1, 2 ) Header5,
REGEXP_SUBSTR( a.NamePath, '[^,]+', 1, 2 ) Header6,
REGEXP_SUBSTR( a.PIDPath, '[^,]+', 1, 3 ) Header7,
REGEXP_SUBSTR( a.IDPath, '[^,]+', 1, 3 ) Header8,
REGEXP_SUBSTR( a.NamePath, '[^,]+', 1, 3 ) Header9,
REGEXP_SUBSTR( a.PIDPath, '[^,]+', 1, 4 ) Header10,
REGEXP_SUBSTR( a.IDPath, '[^,]+', 1, 4 ) Header11,
REGEXP_SUBSTR( a.NamePath, '[^,]+', 1, 4 ) Header12,
REGEXP_SUBSTR( a.PIDPath, '[^,]+', 1, 5 ) Header13,
REGEXP_SUBSTR( a.IDPath, '[^,]+', 1, 5 ) Header14,
REGEXP_SUBSTR( a.NamePath, '[^,]+', 1, 5 ) Header15
--,REGEXP_SUBSTR( a.PIDPath, '[^,]+', 1, n )
--,REGEXP_SUBSTR( a.IDPath, '[^,]+', 1, n )
--,REGEXP_SUBSTR( a.NamePath, '[^,]+', 1, n )
FROM ( SELECT LEVEL RootLvl,
b.ID RootID,
SYS_CONNECT_BY_PATH( b.PARENT_ID, ',' )
|| ',' PIDPath,
SYS_CONNECT_BY_PATH( b.ID, ',' )
|| ',' IDPath,
SYS_CONNECT_BY_PATH( b.NAME, ',' )
|| ',' NamePath
FROM t b
START WITH b.PARENT_ID = '1'
CONNECT BY NOCYCLE PRIOR b.ID = b.PARENT_ID ) a
ORDER BY a.RootLvl, a.RootID;
Frank Kulash points out:
The number of columns in the result set has to be hard-coded into the query. You can code something today that produces 30 columns (that is, enough for 10 levels in the hierarchy), but if you change the data later so that there are 11 or more levels, then your query will start to lose results.
I may be back to try a PIVOT thing later. Oracle is craziness.
Edit:
But manageable enough. A static PIVOT works just fine. I can't get CROSS APPLY working like I think it should with VALUES ( MSSQL style ), so I've given up and substituted it for a passable UNION ALL. There's potential for some dynamic SQL work with this one, so it actually might be able to do what you need without hard coding the columns.
WITH c ( RID, ID, PARENT_ID, NAME ) AS (
SELECT ROW_NUMBER() OVER (
ORDER BY PARENT_ID ) RID,
ID, PARENT_ID, NAME
FROM t
UNION ALL
SELECT b.RID, a.ID, a.PARENT_ID, a.NAME
FROM t a,
c b
WHERE a.ID = b.PARENT_ID
)
SELECT p."'ID_1'" Header1,
p."'PARENT_ID_1'" Header2,
p."'NAME_1'" Header3,
p."'ID_2'" Header4,
p."'PARENT_ID_2'" Header5,
p."'NAME_2'" Header6,
p."'ID_3'" Header7,
p."'PARENT_ID_3'" Header8,
p."'NAME_3'" Header9,
p."'ID_4'" Header10,
p."'PARENT_ID_4'" Header11,
p."'NAME_4'" Header12,
p."'ID_5'" Header13,
p."'PARENT_ID_5'" Header14,
p."'NAME_5'" Header15
FROM ( SELECT RID,
'ID_' || ROW_NUMBER() OVER (
PARTITION BY RID
ORDER BY ID ) KeyName,
ID KeyValue
FROM c
UNION ALL
SELECT RID,
'PARENT_ID_' || ROW_NUMBER() OVER (
PARTITION BY RID
ORDER BY ID ) KeyName,
PARENT_ID KeyValue
FROM c
UNION ALL
SELECT RID,
'NAME_' || ROW_NUMBER() OVER (
PARTITION BY RID
ORDER BY ID ) KeyName,
CAST( NAME AS VARCHAR2( 200 ) ) KeyValue
FROM c ) s
PIVOT ( MAX( KeyValue ) FOR KeyName IN (
'ID_1', 'PARENT_ID_1', 'NAME_1',
'ID_2', 'PARENT_ID_2', 'NAME_2',
'ID_3', 'PARENT_ID_3', 'NAME_3',
'ID_4', 'PARENT_ID_4', 'NAME_4',
'ID_5', 'PARENT_ID_5', 'NAME_5' ) ) p
ORDER BY p.RID;

split string into several rows

I have a table with a string which contains several delimited values, e.g. a;b;c.
I need to split this string and use its values in a query. For example I have following table:
str
a;b;c
b;c;d
a;c;d
I need to group by a single value from str column to get following result:
str count(*)
a 2
b 2
c 3
d 2
Is it possible to implement using single select query? I can not create temporary tables to extract values there and query against that temporary table.

From your comment on @PrzemyslawKruglej's answer:
Main problem is with internal query with connect by, it generates astonishing amount of rows
The amount of rows generated can be reduced with the following approach:
/* test table populated with sample data from your question */
create table t1(str) as(
select 'a;b;c' from dual union all
select 'b;c;d' from dual union all
select 'a;c;d' from dual
);
Table created
-- The number of rows generated depends solely on the longest string.
-- If, say, the longest string contains 3 words (not counting the `;` separators)
-- and we have 100 rows in our table, we will end up with 300 rows
-- for further processing, no more.
with occurrence(ocr) as(
select level
from ( select max(regexp_count(str, '[^;]+')) as mx_t
from t1 ) t
connect by level <= mx_t
)
select count(regexp_substr(t1.str, '[^;]+', 1, o.ocr)) as generated_for_3_rows
from t1
cross join occurrence o;
Result: For three rows where the longest one is made up of three words, we will generate 9 rows:
GENERATED_FOR_3_ROWS
--------------------
9
Final query:
with occurrence(ocr) as(
select level
from ( select max(regexp_count(str, '[^;]+')) as mx_t
from t1 ) t
connect by level <= mx_t
)
select res
, count(res) as cnt
from (select regexp_substr(t1.str, '[^;]+', 1, o.ocr) as res
from t1
cross join occurrence o)
where res is not null
group by res
order by res;
Result:
RES CNT
----- ----------
a 2
b 2
c 3
d 2
Find out more about the regexp_count() (11g and up) and regexp_substr() regular expression functions.
Note: Regular expression functions are relatively expensive to compute, so when processing a very large amount of data it might be worth switching to plain PL/SQL.
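One way to avoid the regular expression cost without leaving SQL is to cut the tokens out with INSTR and SUBSTR. The following is a sketch, not the PL/SQL approach the note refers to; it assumes the t1 table created above and hard-codes a maximum of 3 tokens per string to stay short:

```sql
-- Sketch: same counting query, but tokens are extracted with INSTR/SUBSTR
-- instead of REGEXP_SUBSTR. Assumes table t1 from above and at most
-- 3 tokens per string (hard-coded here for brevity).
with occurrence(ocr) as (
  select level from dual connect by level <= 3
)
select res, count(*) as cnt
from (
  -- Token n starts right after the (n-1)th delimiter and ends before the
  -- nth one; padding the string with ';' on either side makes the first
  -- and last tokens follow the same rule.
  select substr(t1.str,
                instr(';' || t1.str, ';', 1, o.ocr),
                instr(t1.str || ';', ';', 1, o.ocr)
                  - instr(';' || t1.str, ';', 1, o.ocr)) as res
  from t1
  cross join occurrence o
)
where res is not null
group by res
order by res;
```

For the sample data this should give the same counts as the regular expression version (a 2, b 2, c 3, d 2). One difference to keep in mind: an empty token such as the middle of 'a;;c' comes back as NULL at its own position here, whereas '[^;]+' skips it and shifts later tokens forward.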

This is ugly, but seems to work. The problem with the CONNECT BY splitting is that it returns duplicate rows. I managed to get rid of them, but you'll have to test it:
WITH
data AS (
SELECT 'a;b;c' AS val FROM dual
UNION ALL SELECT 'b;c;d' AS val FROM dual
UNION ALL SELECT 'a;c;d' AS val FROM dual
)
SELECT token, COUNT(1)
FROM (
SELECT DISTINCT token, lvl, val, p_val
FROM (
SELECT
regexp_substr(val, '[^;]+', 1, level) AS token,
level AS lvl,
val,
NVL(prior val, val) p_val
FROM data
CONNECT BY regexp_substr(val, '[^;]+', 1, level) IS NOT NULL
)
WHERE val = p_val
)
GROUP BY token;
TOKEN COUNT(1)
-------------------- ----------
d 2
b 2
a 2
c 3

SELECT NAME, COUNT(NAME)
FROM (
  SELECT NAME FROM (
    SELECT rownum AS ID, REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) NAME
    FROM dual CONNECT BY REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) IS NOT NULL)
  UNION ALL
  SELECT NAME FROM (
    SELECT rownum AS ID, REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) NAME
    FROM dual CONNECT BY REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)
  UNION ALL
  SELECT NAME FROM (
    SELECT rownum AS ID, REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) NAME
    FROM dual CONNECT BY REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)
)
GROUP BY NAME;
NAME COUNT(NAME)
----- -----------
d 2
a 2
b 2
c 3
