Oracle regex count multiple occurrences of a string surrounded by commas - sql

This question is similar to a previous question of mine. I am looking for a way to count a character string in a comma-separated list of values in a column in an Oracle (11g) SQL database. For example, suppose I have the following data:
SELECT ('SL,PK') as col1 FROM dual
UNION ALL
SELECT ('SL,CR,SL') as col1 FROM dual
UNION ALL
SELECT ('PK,SL') as col1 FROM dual
UNION ALL
SELECT ('SL,SL') as col1 FROM dual
UNION ALL
SELECT ('SL') as col1 FROM dual
UNION ALL
SELECT ('PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,SL,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,SL,SL,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,OSL,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,SLR,PK') as col1 FROM dual
COL1
-----
SL,PK
SL,CR,SL
PK,SL
SL,SL
SL
PK
PI,SL,PK
PI,SL,SL,PK
PI,SL,SL,SL,PK
PI,SL,SL,SL,SL,PK
PI,OSL,SL,PK
PI,SL,SLR,PK
I am looking to count all occurrences of the substring 'SL', strictly (i.e. not including 'OSL', 'SLR', etc). The ideal result would look like this:
COL1 COL2
----- -----
SL,PK 1
SL,CR,SL 2
PK,SL 1
SL,SL 2
SL 1
PK 0
PI,SL,PK 1
PI,SL,SL,PK 2
PI,SL,SL,SL,PK 3
PI,SL,SL,SL,SL,PK 4
PI,OSL,SL,PK 1
PI,SL,SLR,PK 1
I can accomplish this using length and regexp_replace:
SELECT
col1,
(length(col1) - NVL(length(regexp_replace(regexp_replace(col1,'(^|,)(SL)($|,)','\1' || '' || '\3',1,0,'imn'),'(^|,)(SL)($|,)','\1' || '' || '\3',1,0,'imn')),0))/length('SL') as col2
FROM (
SELECT ('SL,PK') as col1 FROM dual
UNION ALL
SELECT ('SL,CR,SL') as col1 FROM dual
UNION ALL
SELECT ('PK,SL') as col1 FROM dual
UNION ALL
SELECT ('SL,SL') as col1 FROM dual
UNION ALL
SELECT ('SL') as col1 FROM dual
UNION ALL
SELECT ('PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,SL,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,SL,SL,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,OSL,SL,PK') as col1 FROM dual
UNION ALL
SELECT ('PI,SL,SLR,PK') as col1 FROM dual
)
COL1 COL2
----- -----
SL,PK 1
SL,CR,SL 2
PK,SL 1
SL,SL 2
SL 1
PK 0
PI,SL,PK 1
PI,SL,SL,PK 2
PI,SL,SL,SL,PK 3
PI,SL,SL,SL,SL,PK 4
PI,OSL,SL,PK 1
PI,SL,SLR,PK 1
but was hoping for a more elegant solution, perhaps with regexp_count. I have achieved my goal successfully in other regex implementations that have the word boundary \b construct available (with \bSL\b), but have not found a solution for Oracle's regex.

You can use regexp_count() if you hack the string:
select col1, regexp_count(replace(col1, ',', ',,'), '(^|\W)SL(\W|$)')
This doubles the delimiter so the first match doesn't eat it up -- getting around the underlying issue which is that Oracle regular expressions do not support look-ahead.
Here is a db<>fiddle.
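A stricter variant of the same trick, offered as a sketch rather than the answer's own method: \W also treats spaces and other punctuation as boundaries, so if only commas should delimit values, one option is to wrap the doubled string in commas and count the literal ',SL,'. The inline rows (the demo CTE) and the col2 alias are illustrative assumptions.

```sql
-- Sketch: treat commas as the only delimiters.
-- Doubling the commas gives every value its own leading and trailing comma,
-- so consecutive matches never compete for the same delimiter.
with demo (col1) as (
  select 'SL,SL'        from dual union all
  select 'PI,OSL,SL,PK' from dual union all
  select 'PI,SL,SLR,PK' from dual union all
  select 'PK'           from dual
)
select col1,
       regexp_count(',' || replace(col1, ',', ',,') || ',', ',SL,') as col2
from demo;
```

For these four rows the counts should come out as 2, 1, 1 and 0, because 'OSL' and 'SLR' never appear with a comma on both sides of a bare 'SL'.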

Here's one option:
SQL> with temp as
       (select col1,
               regexp_substr(col1, '[^,]+', 1, column_value) val
        from test cross join
             table(cast(multiset(select level from dual
                                 connect by level <= regexp_count(col1, ',') + 1
                                ) as sys.odcinumberlist))
       )
     select col1,
            sum(case when val = 'SL' then 1 else 0 end) col2
     from temp
     group by col1;
COL1 COL2
----------------- ----------
PI,SL,SLR,PK 1
PK,SL 1
PK 0
SL,CR,SL 2
PI,OSL,SL,PK 1
SL,SL 2
PI,SL,SL,PK 2
PI,SL,SL,SL,PK 3
SL,PK 1
SL 1
PI,SL,PK 1
PI,SL,SL,SL,SL,PK 4
12 rows selected.
SQL>
What does it do?
The temp CTE splits each col1 value into rows, one per comma-separated element.
The final select then counts, for each col1, the rows whose element equals 'SL'.
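To see the splitting step on its own, here is a minimal sketch that applies the same regexp_substr / connect by idea to a single literal list instead of a table column; the literal and the aliases n and val are chosen for illustration.

```sql
-- Sketch: split one comma-separated literal into rows.
-- regexp_count(...) + 1 is the number of elements, and
-- regexp_substr(..., 1, level) picks the level-th run of non-comma characters.
select level as n,
       regexp_substr('PI,SL,SLR,PK', '[^,]+', 1, level) as val
from dual
connect by level <= regexp_count('PI,SL,SLR,PK', ',') + 1;
```

This should return four rows (PI, SL, SLR, PK), of which only one equals 'SL'. Note that '[^,]+' skips empty elements, so a list such as 'A,,B' would need a different pattern.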

You can use an XMLTABLE to split the string and then count:
SELECT col1,
(
SELECT COUNT(*)
FROM XMLTABLE(
('"' || REPLACE( col1, ',', '","' ) || '"')
COLUMNS
value VARCHAR2(4000) PATH '.'
)
WHERE value = 'SL'
) AS col2
FROM test_data
So, for your test data:
CREATE TABLE test_data ( col1 ) AS
SELECT 'SL,PK' FROM dual UNION ALL
SELECT 'SL,CR,SL' FROM dual UNION ALL
SELECT 'PK,SL' FROM dual UNION ALL
SELECT 'SL,SL' FROM dual UNION ALL
SELECT 'SL' FROM dual UNION ALL
SELECT 'PK' FROM dual UNION ALL
SELECT 'PI,SL,PK' FROM dual UNION ALL
SELECT 'PI,SL,SL,PK' FROM dual UNION ALL
SELECT 'PI,SL,SL,SL,PK' FROM dual UNION ALL
SELECT 'PI,SL,SL,SL,SL,PK' FROM dual UNION ALL
SELECT 'PI,OSL,SL,PK' FROM dual UNION ALL
SELECT 'PI,SL,SLR,PK' FROM dual
This outputs:
COL1 | COL2
:---------------- | ---:
SL,PK | 1
SL,CR,SL | 2
PK,SL | 1
SL,SL | 2
SL | 1
PK | 0
PI,SL,PK | 1
PI,SL,SL,PK | 2
PI,SL,SL,SL,PK | 3
PI,SL,SL,SL,SL,PK | 4
PI,OSL,SL,PK | 1
PI,SL,SLR,PK | 1
db<>fiddle here
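The expression passed to XMLTABLE is an XQuery sequence of quoted strings built from the list. As a small sketch, the string construction can be inspected on its own; the literal and the alias xquery_sequence are illustrative.

```sql
-- Sketch: show the XQuery sequence that the REPLACE call builds for one list.
select '"' || replace('PI,SL,SLR,PK', ',', '","') || '"' as xquery_sequence
from dual;
```

This should produce "PI","SL","SLR","PK", which XMLTABLE then returns as one row per item. The construction assumes the values themselves contain no double quotes.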

Related

How to create binary column based on starts character in other column in Oracle SQL Developer?

I have a table like the one below in Oracle SQL Developer:
col1
-------
Regio Apo
Makreg One15
Regio Kawalisz
Makreg Podl
Makrego BB
AAA
Based on the values in "col1" I need to create a new binary column "col2":
When the value in "col1" starts with "M", return 1 in "col2"
When the value in "col1" starts with "R", return 0 in "col2"
All values in "col1" should start with M or R, but in case one starts with another letter, return NULL
So as a result I need something like below:
col1 col2
-----------------------
Regio Apo | 0
Makreg One15 | 1
Regio Kawalisz | 0
Makreg Podl | 1
Makrego BB | 1
AAA | NULL
How can I do that in Oracle SQL Developer?
CASE seems to be the most obvious:
SQL> with test (col1) as
       (select 'Regio Apo' from dual union all
        select 'Makreg One15' from dual union all
        select 'Regio Kawalisz' from dual union all
        select 'Makreg Podl' from dual union all
        select 'Makrego BB' from dual union all
        select 'AAA' from dual
       )
     select col1,
            case when substr(col1, 1, 1) = 'M' then 1
                 when substr(col1, 1, 1) = 'R' then 0
                 else null
            end cols
     from test;
COL1 COLS
-------------- ----------
Regio Apo 0
Makreg One15 1
Regio Kawalisz 0
Makreg Podl 1
Makrego BB 1
AAA
6 rows selected.
SQL>
I need to create new column "col2".
Add a virtual column to the table:
ALTER TABLE table_name
ADD (
col2 NUMBER(1,0)
GENERATED ALWAYS AS (
CASE SUBSTR(col1, 1, 1)
WHEN 'M' THEN 1
WHEN 'R' THEN 0
ELSE NULL
END
)
);
Which, for the sample data:
CREATE TABLE table_name (col1) AS
SELECT 'Regio Apo' FROM DUAL UNION ALL
SELECT 'Makreg One15' FROM DUAL UNION ALL
SELECT 'Regio Kawalisz' FROM DUAL UNION ALL
SELECT 'Makreg Podl' FROM DUAL UNION ALL
SELECT 'Makrego BB' FROM DUAL UNION ALL
SELECT 'AAA' FROM DUAL;
After adding the column, then:
SELECT * FROM table_name;
Outputs:
COL1 | COL2
:------------- | ---:
Regio Apo | 0
Makreg One15 | 1
Regio Kawalisz | 0
Makreg Podl | 1
Makrego BB | 1
AAA |
db<>fiddle here
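A short usage sketch, assuming the table_name table and the virtual column defined above already exist: because col2 is generated, new rows supply only col1 and the flag is derived when the row is read. The value 'Makreg Nowy' is a made-up example.

```sql
-- Sketch: col2 is computed from col1, so it is left out of the insert list.
insert into table_name (col1) values ('Makreg Nowy');

-- The new row starts with 'M', so the virtual column should evaluate to 1.
select col1, col2
from table_name
where col1 = 'Makreg Nowy';
```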

(Oracle)Splitting strings then averaging at once

My column COL1 sometimes contains data such as '10|20'.
My goal is to split the data when it contains "|" and then average the parts, to get 15.
How to modify my code below to add COL2 like this?
(Expected results)
COL1 COL2
------- -------
10 10
10|20 15
10|20|30 20
(My code)
WITH A AS (
SELECT '10' COL1 FROM DUAL
UNION ALL
SELECT '10|20' FROM DUAL
UNION ALL
SELECT '10|20|30' FROM DUAL
) SELECT COL1 FROM A
You can use a correlated XMLTABLE to split the values:
WITH A AS (
SELECT '10' COL1 FROM DUAL UNION ALL
SELECT '10|20' FROM DUAL UNION ALL
SELECT '10|20|30' FROM DUAL
)
SELECT col1,
(
SELECT AVG( TO_NUMBER( column_value ) )
FROM xmltable(('"' || REPLACE(a.col1, '|', '","') || '"'))
) AS col2
FROM A
Which outputs:
COL1 | COL2
:------- | ---:
10 | 10
10|20 | 15
10|20|30 | 20
db<>fiddle here
Here you go:
SQL> with a as
2 (select '10' col1 from dual union all
3 select '10|20' from dual union all
4 select '10|20|30' from dual
5 )
6 select
7 col1,
8 avg(to_number(regexp_substr(col1, '[^\|]+', 1, column_value))) col2
9 from a cross join
10 table(cast(multiset(select level from dual
11 connect by level <= regexp_count(col1, '\|') + 1
12 ) as sys.odcinumberlist))
13 group by col1
14 order by col1;
COL1 COL2
-------- ----------
10 10
10|20 15
10|20|30 20
SQL>
What does it do?
Line #8 (with a little help of lines #10 - 12):
REGEXP_SUBSTR part is used to split column to rows
TO_NUMBER converts substring to number
AVG calculates average value
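The intermediate rows that AVG works on can be inspected with a minimal sketch on one literal value; the literal and the aliases n and part are illustrative.

```sql
-- Sketch: split '10|20|30' on the pipe and convert each piece to a number.
-- Inside a bracket expression the pipe is a literal character;
-- outside it must be escaped, as in regexp_count below.
select level as n,
       to_number(regexp_substr('10|20|30', '[^|]+', 1, level)) as part
from dual
connect by level <= regexp_count('10|20|30', '\|') + 1;
```

This should return the three numbers 10, 20 and 30, whose average is the 20 shown in the result above.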
WITH t AS (
SELECT '10' text FROM DUAL
UNION ALL
SELECT '10|20' FROM DUAL
UNION ALL
SELECT '10|20|30' FROM DUAL
)
SELECT text,
avg(to_number(regexp_substr(t.text, '[^\|]+', 1, column_value))) average
FROM t,
TABLE (CAST (MULTISET
(SELECT LEVEL FROM dual
CONNECT BY instr(t.text, '|', 1, LEVEL - 1) > 0
) AS sys.odciNumberList ) ) lines
GROUP BY t.text ORDER BY t.text;
TEXT AVERAGE
-------- ----------
10 10
10|20 15
10|20|30 20

Regular expression matching with Oracle

I am working on SQL Developer. I want only those records which have non-numeric data. The query I used is:
select * from TBL_NAME where regexp_like (mapping_name,'%[!0-9]%');
Strangely this is not working.
How about this? As you said, return values that are NOT numbers.
SQL> with test (col) as
       (select 'abc123' from dual union
        select '12345' from dual union
        select 'abc' from dual union
        select '($ff3' from dual union
        select '12.345' from dual
       )
     select col
     from test
     where not regexp_like (col, '^\d+(\.\d+)?$');
COL
------
($ff3
abc
abc123
SQL>
If there are no decimal values, regular expression is even simpler: '^\d+$'
[EDIT, after sample data have been provided]
Piece of cake:
SQL> with test (col) as
       (select 'ABC' from dual union
        select 'BCE1' from dual union
        select '2GHY' from dual union
        select 'WE56S' from dual union
        select 'TUY' from dual
       )
     select col
     from test
     where not regexp_like (col, '\d');
COL
-----
ABC
TUY
SQL>
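As for why the pattern in the question does not behave as hoped, a likely explanation is that '%' is a LIKE wildcard rather than a regex one, and '!' does not negate a bracket expression in Oracle regular expressions; negation is written [^...]. The sketch below contrasts "contains at least one non-digit" with "contains no digit at all"; the sample rows and the two aliases are illustrative.

```sql
-- Sketch: two different readings of "non-numeric".
with test (col) as
  (select 'ABC'   from dual union all
   select 'BCE1'  from dual union all
   select '12345' from dual
  )
select col,
       -- 'Y' when at least one character is not a digit
       case when regexp_like(col, '[^0-9]') then 'Y' else 'N' end as has_non_digit,
       -- 'Y' when no character is a digit
       case when not regexp_like(col, '[0-9]') then 'Y' else 'N' end as has_no_digit
from test;
```

For these rows, 'ABC' should be flagged Y/Y, 'BCE1' Y/N and '12345' N/N, so the choice of pattern depends on which of the two meanings is wanted.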

SQL Order of execution when DISTINCT and AVG are present in the same SELECT statement

I'm using Oracle 11g. In what order will this SQL statement be "parsed"?
Assuming there are many duplicate values in col2:
SELECT DISTINCT col1, AVG(col2)
FROM table1
GROUP BY col1
Will it:
1. remove all the duplicate col1-col2 data combination, and then do an average on col2 on this reduced resultset, OR
2. do an aggregate average on col2 first, and then do a distinct on this resultset?
An example should be self-explanatory:
SQL> create table testDistinct (col1, col2) as(
       select 1, 100 from dual union all
       select 1, 10 from dual union all
       select 1, 10 from dual union all
       select 2, 50 from dual union all
       select 3, 1 from dual union all
       select 3, 100 from dual
     );
Table created.
SQL> select col1, avg(col2)
     from testDistinct
     group by col1;
COL1 AVG(COL2)
---------- ----------
1 40
2 50
3 50,5
SQL> select DISTINCT col1, avg(col2)
     from testDistinct
     group by col1;
COL1 AVG(COL2)
---------- ----------
1 40
2 50
3 50,5
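The two results are identical, so the aggregation runs first and DISTINCT is applied to its output (your option 2). For comparison, applying the GROUP BY over the result of a DISTINCT gives: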
SQL> select col1, avg(col2)
     from (
       select DISTINCT col1, col2
       from testDistinct
     )
     group by col1;
COL1 AVG(COL2)
---------- ----------
1 55
2 50
3 50,5
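If averaging over de-duplicated values is what is actually wanted, one option is to put DISTINCT inside the aggregate instead of wrapping the query. This is a sketch against the testDistinct table created above; the alias avg_distinct is illustrative.

```sql
-- Sketch: de-duplicate col2 within each col1 group before averaging.
select col1, avg(distinct col2) as avg_distinct
from testDistinct
group by col1;
```

For col1 = 1 this averages the distinct values 100 and 10, so it should match the 55 returned by the DISTINCT subquery above.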

Select records where all rows have same value in two columns

Here is my sample table
Col1 Col2
A 1
B 1
A 1
B 2
C 3
I want to be able to select distinct records where all rows have the same value in Col1 and Col2. So my answer should be
Col1 Col2
A 1
C 3
I tried
SELECT Col1, Col2 FROM Table GROUP BY Col1, Col2
This gives me
Col1 Col2
A 1
B 1
B 2
C 3
which is not the result I am looking for. Any tips would be appreciated.
Try this out:
SELECT col1, MAX(col2) aCol2 FROM t
GROUP BY col1
HAVING COUNT(DISTINCT col2) = 1
Output:
| COL1 | ACOL2 |
|------|-------|
| A | 1 |
| C | 3 |
Fiddle here.
Basically, this keeps only the col1 values that have exactly one distinct col2 value.
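An equivalent formulation, shown as a sketch with the question's sample rows inlined: a group holds a single col2 value exactly when its minimum and maximum coincide. The CTE name t follows the answer above; the inline data is taken from the question.

```sql
-- Sketch: keep col1 groups whose col2 values are all the same.
with t (col1, col2) as
  (select 'A', 1 from dual union all
   select 'B', 1 from dual union all
   select 'A', 1 from dual union all
   select 'B', 2 from dual union all
   select 'C', 3 from dual
  )
select col1, min(col2) as col2
from t
group by col1
having min(col2) = max(col2);
```

This should return A with 1 and C with 3; B is dropped because it has both 1 and 2. Like COUNT(DISTINCT col2), MIN and MAX ignore NULLs, so rows with a NULL col2 do not affect the comparison.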
Try this:
SELECT * FROM MYTABLE
GROUP BY Col1, Col2
HAVING COUNT(*)>1
For example SQLFiddle here
You can try either of the following:
select col1, col2 from
(
select 'A' Col1 , 1 Col2
from dual
union all
select 'B' , 1
from dual
union all
select 'A' ,1
from dual
union all
select 'B' ,2
from dual
)
group by col1, col2
having count(*) >1;
OR
select col1, col2
from
(
select col1, col2, row_number() over (partition by col1, col2 order by col1, col2) cnt
from
(
select 'A' Col1 , 1 Col2
from dual
union all
select 'B' , 1
from dual
union all
select 'A' ,1
from dual
union all
select 'B' ,2
from dual
)
)
where cnt>1;