How to extract unique words from a cell and count them - SQL

I have a column "DESCRIPTION" (VARCHAR2(500 BYTE)).
I want two columns as a result: the first should list the unique words extracted from all cells, and the second should show how often each word occurs.
Additionally, I have a limiting parameter "ENTRYDATE" (i.e. "WHERE ENTRYDATE BETWEEN 20180101 AND 20190101"), because the table is quite big.
I have a solution of sorts in Excel, but it's messy and painful to use.
Is it even possible to do this in Oracle with a SELECT?
Example:
NUMBER OF COLUMN | EXPLANATION
1 | roses are red violets are blue
2 | red violets
3 | red
4 | roses
5 | blue
RESULT:
WORDS | COUNTING
roses | 2
are | 2
red | 3
violets | 2
blue | 2
Variation of query:
with test as
(select 1 as nor, 'roses are red violets are blue' as explanation from dual union all
select 2 as nor, 'red violets' as explanation from dual union all
select 3 as nor, 'red' as explanation from dual union all
select 4 as nor, 'roses' as explanation from dual union all
select 5 as nor, 'blue' as explanation from dual
),
temp as
(select nor,
trim(column_value) word
from test join xmltable(('"' || replace(explanation, ' ', '","') ||'"')) on 1 = 1
)
select word,
count(*)
from temp
group by word
order by word;
This returns ORA-00905: missing keyword.

Split explanation into rows (so that you'd get words), then apply COUNT function to those words.
SQL> with test (nor, explanation) as
       (select 1, 'roses are red violets are blue' from dual union all
        select 2, 'red violets' from dual union all
        select 3, 'red' from dual union all
        select 4, 'roses' from dual union all
        select 5, 'blue' from dual
       ),
     temp as
       (select nor,
               regexp_substr(explanation, '[^ ]+', 1, column_value) word
        from test join table(cast(multiset(select level from dual
                                           connect by level <= regexp_count(explanation, ' ') + 1
                                          ) as sys.odcinumberlist)) on 1 = 1
       )
     select word,
            count(*)
     from temp
     group by word
     order by word;
WORD COUNT(*)
------------------------------ ----------
are 2
blue 2
red 3
roses 2
violets 2
SQL>
You mentioned an ENTRYDATE column, but there's none in your sample data; if necessary, include it in the TEMP CTE.
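One way the date filter could be added is to restrict the rows in a separate CTE before splitting them, so that only the wanted period is processed. This is only a sketch: your_table is a hypothetical table name, while DESCRIPTION and ENTRYDATE are the columns named in the question, and the date range is the one given there.

```sql
-- Sketch only: your_table is a hypothetical table name; DESCRIPTION and
-- ENTRYDATE are the columns named in the question.
with filtered as
  -- restrict the rows first, so that only the wanted period gets split
  (select description
   from your_table
   where entrydate between 20180101 and 20190101
  ),
temp as
  -- split every description into one row per word
  (select regexp_substr(description, '[^ ]+', 1, column_value) word
   from filtered join table(cast(multiset(select level from dual
                                          connect by level <= regexp_count(description, ' ') + 1
                                         ) as sys.odcinumberlist)) on 1 = 1
  )
select word,
       count(*) counting
from temp
where word is not null   -- repeated spaces would otherwise yield empty words
group by word
order by word;
```

Filtering in its own CTE keeps the splitting logic identical to the query above; the extra WHERE word IS NOT NULL only guards against descriptions that contain consecutive spaces.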
Edit
Huh, Oracle 9i ... back to the Dark Ages:
SQL> with test (nor, explanation) as
       (select 1, 'roses are red violets are blue' from dual union all
        select 2, 'red violets' from dual union all
        select 3, 'red' from dual union all
        select 4, 'roses' from dual union all
        select 5, 'blue' from dual
       ),
     temp as
       (select nor,
               trim(column_value) word
        from test join xmltable(('"' || replace(explanation, ' ', '","') ||'"')) on 1 = 1
       )
     select word,
            count(*)
     from temp
     group by word
     order by word;
WORD COUNT(*)
-------------------- ----------
are 2
blue 2
red 3
roses 2
violets 2
SQL>

The problem is your old Oracle version. This query should work on it, because it uses only basic CONNECT BY, INSTR, SUBSTR and DBMS_RANDOM:
select word, count(1) counting
from (
select id, trim(case pos2 when 0 then substr(description, pos1)
else substr(description, pos1, pos2 - pos1)
end) word
from (
select id, description,
case level when 1 then 1 else instr(description, ' ', 1, level - 1) end pos1,
instr(description, ' ', 1, level) pos2
from t
connect by prior dbms_random.value is not null
and prior id = id
and level <= length(description) - length(replace(description, ' ', '')) + 1))
group by word

-- Oracle 12c+
with test (nor, explanation) as (
select 1, 'roses are red violets are blue' from dual union all
select 2, 'red violets' from dual union all
select 3, 'red' from dual union all
select 4, 'roses' from dual union all
select 5, 'blue' from dual)
select regexp_substr(explanation, '\S+', 1, lvl) word, count(*) cnt
from test,
lateral(
select rownum lvl
from dual
connect by level <= regexp_count(explanation, '\S+')
)
group by regexp_substr(explanation, '\S+', 1, lvl);
WORD CNT
------------------------------ ----------
roses 2
are 2
violets 2
red 3
blue 2

Related

SUBSTR to ADD value in oracle

I have a table in an Oracle DB with a column holding data in the format below.
COL 1
abc,mno:EMP
xyz:EMP;tyu,opr:PROF
abc,mno:EMP;tyu,opr:PROF
I am trying to convert the data to the format below:
COL 1
abc:EMP;mno:EMP
xyz:EMP;tyu:PROF;opr:PROF
abc:EMP;mno:EMP;tyu:PROF;opr:PROF
Basically, I'm trying to take everything after the : and before the ; and substitute each comma with it.
I tried some SUBSTR and LISTAGG, but couldn't get anything worth sharing.
Regards.
Here's one option; read comments within code.
SQL> with test (id, col) as
       -- sample data
       (select 1, 'abc,mno:EMP' from dual union all
        select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
        select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual
       ),
     temp as
       -- split sample data to rows
       (select id,
               column_value cv,
               regexp_substr(col, '[^;]+', 1, column_value) val
        from test cross join
             table(cast(multiset(select level from dual
                                 connect by level <= regexp_count(col, ';') + 1
                                ) as sys.odcinumberlist))
       )
     -- finally, replace comma with a string that follows a colon sign
     select id,
            listagg(replace(val, ',', substr(val, instr(val, ':')) ||';'), ';') within group (order by cv) new_val
     from temp
     group by id
     order by id;
ID NEW_VAL
---------- ----------------------------------------
1 abc:EMP;mno:EMP
2 xyz:EMP;tyu:PROF;opr:PROF
3 abc:EMP;mno:EMP;tyu:PROF;opr:PROF
SQL>
Building on Littlefoot's answer: with CROSS APPLY (available from Oracle 12c) you don't need the CAST ... MULTISET construct:
with test (id, col) as
-- sample data
(select 1, 'abc,mno:EMP' from dual union all
select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual
),
temp as
-- split sample data to rows
(select id,
column_value cv,
regexp_substr(col, '[^;]+', 1, column_value) val
from test
cross apply (select level as column_value
from dual
connect by level<= regexp_count(col, ';') + 1)
)
-- finally, replace comma with a string that follows a colon sign
select id,
listagg(replace(val, ',', substr(val, instr(val, ':')) ||';'), ';') within group (order by cv) new_val
from temp
group by id
order by id;
You do not need anything recursive, just a basic regex. If the pattern is always something,something2:someCode (i.e. there is never a colon before the comma, and at most one comma per code), this is sufficient:
with test (id, col) as (
select 1, 'abc,mno:EMP' from dual union all
select 2, 'xyz:EMP;tyu,opr:PROF' from dual union all
select 3, 'abc,mno:EMP;tyu,opr:PROF' from dual union all
select 4, 'abc,mno:EMP;tyu,opr:PROF;something:QWE;something2:QWE' from dual
)
select
/*
Capture these groups:
1) Everything before the comma
2) Then everything before the colon
3) And then everything between the colon and a semicolon
Then emit group 1 and group 2, each followed by a colon and group 3
*/
trim(trailing ';' from regexp_replace(col || ';', '([^,]+),([^:]+):([^;]+)', '\1:\3;\2:\3')) as res
from test;
| RES |
| :------------------------------------------------------------- |
| abc:EMP;mno:EMP |
| xyz:EMP;tyu:PROF;opr:PROF |
| abc:EMP;mno:EMP;tyu:PROF;opr:PROF |
| abc:EMP;mno:EMP;tyu:PROF;opr:PROF;something:QWE;something2:QWE |

Print number and character in two different column from a single column in oracle 11g

My table
create table tdata (val varchar(5));
val
a
1
b
c
dd
ee
f
2
3
4
5
The output I want is:
Number Character
1 a
2 b
3 c
4 d
5 e
I have done this, but the only problem is that I'm getting NULL values in both columns: in the number column where the row holds a character, and vice versa.
Query:
select REGEXP_SUBSTR(val, '[0-9]+') as num1,
REGEXP_SUBSTR(substr(val,1,1),'[a-z]')as char1 from tdata
The way you put it, see if something like this helps:
- create two additional tables (using CTEs): one for numbers, another for letters
- fetch ROWNUM, which will then be used to join those tables
- join them!
Sample data in lines #1 - 9, the rest might be what you need.
SQL> with tdata (val) as
2 (select 'a' from dual union all
3 select '1' from dual union all
4 select 'b' from dual union all
5 select 'c' from dual union all
6 select '2' from dual union all
7 select '3' from dual
8 ),
9 --
10 numbers as
11 (select val,
12 rownum rn
13 from tdata
14 where regexp_like(val, '[[:digit:]]')
15 ),
16 letters as
17 (select substr(val, 1, 1) val,
18 rownum rn
19 from tdata
20 where regexp_like(val, '[[:alpha:]]')
21 )
22 select n.val, l.val
23 from numbers n join letters l on n.rn = l.rn;
VAL VAL
----- -----
1 a
2 b
3 c
SQL>
WITH cte AS (
SELECT SUBSTR(val, 1, 1) Character FROM tdata
UNION
SELECT SUBSTR(val, 2, 1) Character FROM tdata WHERE LENGTH(val) > 1
UNION
SELECT SUBSTR(val, 3, 1) Character FROM tdata WHERE LENGTH(val) > 2
UNION
SELECT SUBSTR(val, 4, 1) Character FROM tdata WHERE LENGTH(val) > 3
UNION
SELECT SUBSTR(val, 5, 1) Character FROM tdata WHERE LENGTH(val) > 4
)
SELECT ROW_NUMBER() OVER (ORDER BY Character) "Number", Character
FROM cte
WHERE Character BETWEEN 'a' AND 'z';

How to query for non-consecutive values?

I have an id column with the values 1, 3, 4, 9, 10, 11 in a table called t_mark.
How can I get the non-consecutive ranges? (e.g. [1, 3], [4, 9])
Alternatively, using LEAD analytic function, along with your fancy formatting. TEST CTE is what you already have; lines #9 onwards is what you need.
SQL> with test (col) as
2 (select 1 from dual union all
3 select 3 from dual union all
4 select 4 from dual union all
5 select 9 from dual union all
6 select 10 from dual union all
7 select 11 from dual
8 ),
9 temp as
10 (select col,
11 lead(col) over (order by col) lcol
12 from test
13 )
14 select '[' || col ||' - '|| lcol ||']' result
15 From temp
16 where lcol - col > 1
17 order by col;
RESULT
-------------------------------------------------------
[1 - 3]
[4 - 9]
SQL>
[EDIT: Adjusted so that you shouldn't have to think too much]
This is what you have:
SQL> select * From t_mark;
M_ID
----------
1
3
4
9
10
11
6 rows selected.
This is what you need:
SQL> with temp as
       (select m_id,
               lead(m_id) over (order by m_id) lm_id
        from t_mark
       )
     select '[' || m_id ||' - '|| lm_id ||']' result
     from temp
     where lm_id - m_id > 1
     order by m_id;
RESULT
------------------------------------------------------------------
[1 - 3]
[4 - 9]
SQL>
Basically, you should learn how to use a CTE (common table expression, a.k.a. the subquery factoring clause, i.e. the WITH clause).
Assuming that by "list" you mean a table with a column, you can do this with lag(). Note that NUMBER is a reserved word in Oracle, so this uses the t_mark table and id column from the question:
select prev_id, id
from (select t.*, lag(id) over (order by id) as prev_id
from t_mark t
) t
where prev_id <> id - 1;
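For a quick check of the LAG approach without creating a table, here is a sketch with the sample ids from the question inlined as a CTE. The names sample_ids, id and prev_id are placeholders chosen for illustration, not from the original post.

```sql
-- Sketch only: sample_ids, id and prev_id are illustrative names.
with sample_ids (id) as
  (select 1 from dual union all
   select 3 from dual union all
   select 4 from dual union all
   select 9 from dual union all
   select 10 from dual union all
   select 11 from dual
  )
select prev_id, id
from (select id,
             -- value of the previous row when ordered by id (NULL for the first row)
             lag(id) over (order by id) as prev_id
      from sample_ids
     )
-- keep only the rows where the previous id is not directly adjacent
where prev_id <> id - 1
order by id;
```

The first row has no predecessor, so its prev_id is NULL and the comparison filters it out; no separate IS NOT NULL check is needed.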
This should do the trick :
WITH original_table(number_column) as (select 1 from dual union all
select 3 from dual union all
select 4 from dual union all
select 9 from dual union all
select 10 from dual union all
select 11 from dual),
numbers AS (
SELECT row_number() over (ORDER BY number_column ASC ) row_num,
number_column
FROM original_table
)
SELECT nb1.number_column AS lnumber,
nb2.number_column AS rnumber
FROM numbers nb1
INNER JOIN numbers nb2 ON nb1.row_num + 1 = nb2.row_num
AND nb1.number_column + 1 < nb2.number_column
Result :
| LNUMBER | RNUMBER |
|---------|---------|
| 1 | 3 |
| 4 | 9 |

How to sort alphanumeric String in oracle?

Input is:
Section1
Section2
Section3
Section10
Section11
Section1A
Section1B
Section12
Section11A
Section11B
And I want output like:
Section1
Section1A
Section1B
Section2
Section3
Section10
Section11
Section11A
Section11B
Section12
I tried query :
select section_name
from sections
order by length(section_name),section_name
Assuming that the structure of your strings is fixed, as in your example, this could be a way:
SQL> select x,
            to_number(regexp_substr(x, '[0-9]+')) numericPart,
            regexp_substr(x, '([0-9]+)([A-Z])', 1, 1, '', 2) optionalChar
     from (
           select 'Section1' x from dual union all
           select 'Section2' from dual union all
           select 'Section3' from dual union all
           select 'Section10' from dual union all
           select 'Section11' from dual union all
           select 'Section1A' from dual union all
           select 'Section1B' from dual union all
           select 'Section12' from dual union all
           select 'Section11A' from dual union all
           select 'Section11B' from dual
          )
     order by numericPart,
              optionalChar nulls first
     ;
X NUMERICPART OPTIONALCHAR
---------- ----------- ----------------------------------------
Section1 1
Section1A 1 A
Section1B 1 B
Section2 2
Section3 3
Section10 10
Section11 11
Section11A 11 A
Section11B 11 B
Section12 12
Here you first order by the numeric part, treating it as number, and then consider the (optional) character after the number.
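Applied to the table from the question, the same two sort keys can go straight into the ORDER BY clause. This is a sketch that assumes the sections table and section_name column from the question, the same fixed string structure, and Oracle 11g or later (for the subexpression argument of REGEXP_SUBSTR).

```sql
-- Sketch only: assumes the SECTIONS table and SECTION_NAME column from the question.
select section_name
from sections
order by
  -- first key: the digits, compared as a number (so 2 sorts before 10)
  to_number(regexp_substr(section_name, '[0-9]+')),
  -- second key: the optional letter after the digits; rows without one come first
  regexp_substr(section_name, '([0-9]+)([A-Z])', 1, 1, '', 2) nulls first;
```

Keeping the expressions in ORDER BY avoids returning the helper columns, at the cost of repeating them if they are also needed in the select list.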

Oracle SQL : Regexp_substr

I have the sample values below in a column:
Abc-123-xyz
Def-456-uvw
Ghi-879-rst-123
Jkl-abc
The expected output is the third element when splitting by '-'; if there is no third element, the last element should be retrieved.
See expected output below:
Xyz
Uvw
Rst
Abc
Thanks ahead for the help.
SELECT initcap(nvl(regexp_substr(word, '[^-]+', 1,3),regexp_substr(word, '[^-]+', 1,2))) FROM your_table;
Another approach:
SQL> with t1(col) as(
       select 'Abc-123-xyz' from dual union all
       select 'Def-456-uvw' from dual union all
       select 'Ghi-879-rst-123' from dual union all
       select 'Jkl-Abc' from dual
     )
     select regexp_substr( col
                         , '[^-]+'
                         , 1
                         , case
                             when regexp_count(col, '[^-]+') >= 3
                             then 3
                             else regexp_count(col, '[^-]+')
                           end
                         ) as res
     from t1
     ;
Result:
RES
---------------
xyz
uvw
rst
Abc
Or, as a single expression (col stands for your column name):
regexp_substr(col, '(.*?-){0,2}([^-]+)', 1, 1, '', 2)
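To try the single-expression variant against the sample values, it can be wrapped in a query like the following sketch. The CTE t1 and its column col are borrowed from the previous answer and are assumptions here, not part of this answer.

```sql
-- Sketch only: t1 and col are borrowed from the previous answer for illustration.
with t1 (col) as
  (select 'Abc-123-xyz' from dual union all
   select 'Def-456-uvw' from dual union all
   select 'Ghi-879-rst-123' from dual union all
   select 'Jkl-Abc' from dual
  )
select col,
       -- group 1 consumes up to two leading "element-" prefixes,
       -- group 2 is the element that follows them and is what gets returned
       regexp_substr(col, '(.*?-){0,2}([^-]+)', 1, 1, '', 2) as res
from t1;
```

Because the quantifier {0,2} takes as many prefixes as are available, a value with only two elements yields its second element, which covers the "last element" fallback from the question.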
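You can also do it without RegEx if you only need the last element. Note that this returns 123 rather than rst for Ghi-879-rst-123, and that REVERSE is not a documented Oracle SQL function: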
with t1 as(
select 'Abc-123-xyz' as MyText from dual union all
select 'Def-456-uvw' from dual union all
select 'Ghi-879-rst-123' from dual union all
select 'Jkl-Abc' from dual
)
SELECT
SUBSTR(t1.mytext, LENGTH(t1.mytext) - INSTR(REVERSE(t1.mytext), '-') + 2)
FROM t1
;