Count and order comma separated values - sql

I have the one-column "table" below (apologies for the data model, not my fault :():
COL_IN
------
2K, E
E, 2K
O
I would like to obtain the below output, ordered by count descending:
COL_OUT COUNT
----------
K 4
E 2
O 1

COUNT is an SQL keyword (the name of the aggregate function), so it's not a good column name, even in the final output. I use COUNT_ instead (with an underscore).
Other than that, you can modify the input strings so they become valid JSON arrays, so that you can then use JSON functions to split them. After you split the strings into tokens, it's a simple matter to separate the leading number (if present) from the rest of the string, and to aggregate. NVL in the sum adds 1 for each token without a leading integer.
Including the sample data for testing only (if you have an actual table, remove the WITH clause at the top):
with
tbl (col_in) as (
select '2K, E' from dual union all
select 'E, 2K' from dual union all
select 'O' from dual
)
select ltrim(col, '0123456789') as col_out
, sum(nvl(to_number(regexp_substr(col, '^\d*')), 1)) as count_
from tbl,
json_table('["' || regexp_replace(col_in, ', *', '","') || '"]', '$[*]'
columns col path '$')
group by ltrim(col, '0123456789')
order by count_ desc, col_out
;
COL_OUT COUNT_
------- ------
K 4
E 2
O 1
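To sanity-check the expected result outside the database, the same split-and-sum logic can be sketched in Python. This is only an illustration of the idea, not part of the Oracle solution; the function name count_tokens is made up for this example.

```python
import re
from collections import Counter


def count_tokens(rows):
    """Sum the optional leading integer of each comma-separated token, per label."""
    totals = Counter()
    for row in rows:
        for token in row.split(","):
            token = token.strip()
            if not token:
                continue
            # Optional leading digits, then the label; a missing number counts as 1
            # (the same role NVL(..., 1) plays in the SQL above).
            digits, label = re.fullmatch(r"(\d*)(.*)", token).groups()
            totals[label] += int(digits) if digits else 1
    # Highest count first, ties broken by label (like ORDER BY count_ DESC, col_out).
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


print(count_tokens(["2K, E", "E, 2K", "O"]))
# prints [('K', 4), ('E', 2), ('O', 1)]
```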

You can use a hierarchical query such as:
WITH t2 AS
(
SELECT TRIM(REGEXP_SUBSTR(col_in,'[^,]+',1,level)) AS s
FROM t
CONNECT BY level <= REGEXP_COUNT(col_in,',')+1
AND PRIOR SYS_GUID() IS NOT NULL
AND PRIOR col_in = col_in
)
-- Group only by the label, so that tokens such as K and 2K are summed together
SELECT REGEXP_SUBSTR(s,'[^0-9]+') AS col_out,
SUM(NVL(TO_NUMBER(REGEXP_SUBSTR(s,'^[0-9]+')),1)) AS count_
FROM t2
GROUP BY REGEXP_SUBSTR(s,'[^0-9]+')
ORDER BY count_ DESC
presuming all of the data are alphanumeric only (e.g. not containing special characters such as $, #, !, etc.)

Related

SQL How to perform multiple look-ups from a list, in one query

We have a weird database table (wt) for which I can construct a query that can return a single row with these fields:
wt.thing_a_id = 5, wt.thing_b_id = 12, wt.thing_c_id = 9
Then, there's another lookup table (dt) that holds descriptions for these numbers, you could imagine it like this:
id desc
5 "flour"
12 "cups"
9 "barley"
What I need to end up with is the numbers from wt, along with their descriptions from dt.
I can do 3 simple queries, one to look up each of my three thing_ values (select desc from dt where id = ) but I was hoping to do it all in one query.
Is there a way to do this?
Even better, is there a way to run my query to get my single row of thing ids and combine them with their descriptions? I think the fundamental problem/challenge is that my thing ids are not one per row; they come back as fields in just one row. This makes it really hard to join against them, for example.
Michael
You seem to want conditional aggregation:
select
max(case when id = 5 then descr end) descr_5,
max(case when id = 12 then descr end) descr_12,
max(case when id = 9 then descr end) descr_9
from dt
where id in (5, 12, 9)
Note that desc is a SQL keyword, hence a poor choice for a column name. I renamed it descr in the query.
You will need multiple joins to the dt table to get the description of each of the "things" you want in a single row:
SELECT thing_a_id, dta.desc AS thing_a_desc,
thing_b_id, dtb.desc AS thing_b_desc,
thing_c_id, dtc.desc AS thing_c_desc
FROM wt
JOIN dt dta ON dta.id = wt.thing_a_id
JOIN dt dtb ON dtb.id = wt.thing_b_id
JOIN dt dtc ON dtc.id = wt.thing_c_id
I love to play with common table expressions (CTE), this is an ideal candidate for one.
In the example below, descriptions and dataset are substitutes for the actual tables you use. I am just building them in memory rather than using actual tables.
In the "breakdown" CTE I am splitting up the CSV value from dataset into multiple rows.
In the last part of the select I am converting everything after the = sign to a number, and then matching that on id from the descriptions CTE. The resulting dataset is I believe what you requested.
WITH
descriptions AS
(SELECT 5 AS id, 'flour' AS description FROM DUAL
UNION ALL
SELECT 12 AS id, 'cups' AS description FROM DUAL
UNION ALL
SELECT 9 AS id, 'barley' AS description FROM DUAL),
dataset AS
(SELECT 'wt.thing_a_id = 5, wt.thing_b_id = 12, wt.thing_c_id = 9' AS result FROM DUAL),
breakdown ( result, REMAINDER ) AS
(SELECT TRIM( SUBSTR( result
, 1
, INSTR( result || ',', ',' ) - 1 ) ) AS result
, TRIM( SUBSTR( result, INSTR( result || ',', ',' ) + 1 ) || ',' ) AS REMAINDER
FROM dataset
UNION ALL
SELECT TRIM( SUBSTR( REMAINDER
, 1
, INSTR( REMAINDER, ',' ) - 1 ) )
, SUBSTR( REMAINDER, INSTR( REMAINDER || ',', ',' ) + 1 ) AS REMAINDER
FROM breakdown
WHERE REMAINDER IS NOT NULL)
SELECT result, TO_NUMBER( TRIM( SUBSTR( result, INSTR( result, '=' ) + 1 ) ) ) AS id, description
FROM breakdown
LEFT OUTER JOIN descriptions
ON TO_NUMBER( TRIM( SUBSTR( breakdown.result, INSTR( breakdown.result, '=' ) + 1 ) ) ) =
descriptions.id
Results:
Result ID DESCRIPTION
wt.thing_a_id = 5 5 flour
wt.thing_b_id = 12 12 cups
wt.thing_c_id = 9 9 barley
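As a cross-check of the result above, the same idea (split on commas, take the number after the = sign, look it up) can be sketched in Python. The function name lookup_descriptions and the dictionary are illustrative stand-ins for the dt table, not anything from the original schema.

```python
def lookup_descriptions(result, descriptions):
    """Split 'name = id, name = id, ...' and attach the description of each id."""
    rows = []
    for part in result.split(","):
        part = part.strip()
        if not part:
            continue
        # Everything after the first '=' is the numeric id.
        thing_id = int(part.split("=", 1)[1])
        # dict.get returns None for an unknown id, like the LEFT OUTER JOIN.
        rows.append((part, thing_id, descriptions.get(thing_id)))
    return rows


descriptions = {5: "flour", 12: "cups", 9: "barley"}
result = "wt.thing_a_id = 5, wt.thing_b_id = 12, wt.thing_c_id = 9"
for row in lookup_descriptions(result, descriptions):
    print(row)
# prints ('wt.thing_a_id = 5', 5, 'flour')
# prints ('wt.thing_b_id = 12', 12, 'cups')
# prints ('wt.thing_c_id = 9', 9, 'barley')
```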

How to select the list of words containing a particular substring as part of a SQL query (oracle)?

I'm trying to return the list of "words" (separated by spaces) containing a certain substring within a string as part of an Oracle Sql query. Would like to return the result as a comma separated list. Separate rows for each match would also work.
Example String in [text_col] field:
some words 123-asdf-789A and also this one 456-asdf-555A more words etc.
Desired result: 123-asdf-789A, 456-asdf-555A
This is what I have so far but it only returns the first result and the fact that it's two separate regular expressions makes it difficult to concatenate all matches as I would like to do.
CONCAT(REGEXP_SUBSTR(text_col, ''(([^[:space:]]+)\asdf)'', 1, 1, ''i'', 1),
REGEXP_SUBSTR(text_col, ''\asdf([^[:space:]]+)'', 1, 1, ''i'', 1))
You can combine a few regexp functions like this:
with tab(str) as
(
select 'some words 123-asdf-789A and also this one 456-asdf-555A more words etc' from dual
), t as
(
select regexp_substr(str,'[^[:space:]]+',1,level) as str, level as lvl
from tab
connect by level <= regexp_count(str,'[^[:space:]]+') -- one level per word
)
select listagg(str,',') within group (order by lvl) as "Result"
from t
where regexp_like(str,'-');
Result
---------------------------------
123-asdf-789A,456-asdf-555A
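This first splits the string on whitespace (using the POSIX [[:space:]] class), keeps the words that contain a dash, and finally concatenates them with the listagg() function. Note that the filter tests for a dash, not for the substring asdf itself.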
Use a recursive sub-query factoring clause and iterate through all the matches concatenating the string as you go:
Oracle Setup:
CREATE TABLE test_data ( value ) AS
SELECT 'some words 123-asdf-789A and also this one 456-asdf-555A more words etc.' FROM DUAL UNION ALL
SELECT 'some words without the expected sub-string' FROM DUAL UNION ALL
SELECT 'asdf asdf-123 456-asdf 78-asdf-90' FROM DUAL
Query:
WITH matches ( value, idx, cnt, match ) AS (
SELECT value,
0,
REGEXP_COUNT( value, '\S*asdf\S*' ),
CAST( NULL AS VARCHAR2(4000) )
FROM test_data
UNION ALL
SELECT value,
idx + 1,
cnt,
CASE idx WHEN 0 THEN '' ELSE match || ' ' END
|| REGEXP_SUBSTR( value, '\S*asdf\S*', 1, idx + 1 )
FROM matches
WHERE idx < cnt
)
SELECT value, match
FROM matches
WHERE idx = cnt;
Output:
VALUE | MATCH
:----------------------------------------------------------------------- | :--------------------------------
some words without the expected sub-string | null
some words 123-asdf-789A and also this one 456-asdf-555A more words etc. | 123-asdf-789A 456-asdf-555A
asdf asdf-123 456-asdf 78-asdf-90 | asdf asdf-123 456-asdf 78-asdf-90
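The pattern \S*asdf\S* used above can be tried out quickly in Python before running it against the table. This is just a sketch for checking the regex; words_containing is a hypothetical helper name.

```python
import re


def words_containing(text, needle="asdf"):
    """Return the whitespace-delimited words of `text` that contain `needle`."""
    # \S* on both sides extends the match to the whole surrounding word.
    pattern = r"\S*" + re.escape(needle) + r"\S*"
    return re.findall(pattern, text)


text = "some words 123-asdf-789A and also this one 456-asdf-555A more words etc."
print(", ".join(words_containing(text)))
# prints 123-asdf-789A, 456-asdf-555A
```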

looping in sql with delimiter

I was wondering how I can loop in SQL.
For example
I have this column
PARAMETER_VALUE
E,C;S,C;I,X;G,T;S,J;S,F;C,S;
I want to store every value that comes before a comma (,) in one temp column and every value that comes before a semicolon (;) in another column,
and it should not stop until there are no more values after the last (;).
Expected Output for Example
COL1 E S I G S S C
COL2 C C X T J F S
etc . . .
You can get this by using the regexp_substr() function together with a connect by level <= clause:
with t1(PARAMETER_VALUE) as
(
select 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual
), t2 as
(
select level as rn,
regexp_substr(PARAMETER_VALUE,'([^,]+)',1,level) as str1,
regexp_substr(PARAMETER_VALUE,'([^;]+)',1,level) as str2
from t1
connect by level <= regexp_count(PARAMETER_VALUE,';')
)
select listagg( regexp_substr(str1,'([^;]+$)') ,' ') within group (order by rn) as col1,
listagg( regexp_substr(str2,'([^,]+$)') ,' ') within group (order by rn) as col2
from t2;
COL1          COL2
------------- -------------
E S I G S S C C C X T J F S
Assuming that you need to separate the input into rows, at the ; delimiters, and then into columns at the , delimiter, you could do something like this:
-- WITH clause included to simulate input data. Not part of the solution;
-- use actual table and column names in the SELECT statement below.
with
t1(id, parameter_value) as (
select 1, 'E,C;S,C;I,X;G,T;S,J;S,F;C,S;' from dual union all
select 2, ',U;,;V,V;' from dual union all
select 3, null from dual
)
-- End of simulated input data
select id,
level as ord,
regexp_substr(parameter_value, '(;|^)([^,]*),', 1, level, null, 2) as col1,
regexp_substr(parameter_value, ',([^;]*);' , 1, level, null, 1) as col2
from t1
connect by level <= regexp_count(parameter_value, ';')
and id = prior id
and prior sys_guid() is not null
order by id, ord
;
 ID ORD COL1 COL2
--- --- ---- ----
  1   1 E    C
  1   2 S    C
  1   3 I    X
  1   4 G    T
  1   5 S    J
  1   6 S    F
  1   7 C    S
  2   1      U
  2   2
  2   3 V    V
  3   1
Note - this is not the most efficient way to split the inputs (nothing will be very efficient - the data model, which is in violation of First Normal Form, is the reason). This can be improved using standard instr and substr, but the query will be more complicated, and for that reason, harder to maintain.
I generated more input data, to illustrate a few things. You may have several inputs that must be broken up at the same time; that must be done with care. (Note the additional conditions in CONNECT BY). I also illustrate the handling of NULL - if a comma comes right after a semicolon, that means that the "column 1" part of that pair must be NULL. That is shown in the output.
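To make the splitting rules concrete (rows at ;, columns at ,, empty parts treated as NULL), here is a small Python sketch of the same logic. It is only an illustration; split_pairs is a made-up name, and unlike the query above it returns no pairs at all for a NULL input instead of one empty row.

```python
def split_pairs(value):
    """Split 'a,b;c,d;' into (col1, col2) tuples; empty parts become None."""
    if not value:
        return []
    pairs = []
    # Every pair is terminated by ';', so the piece after the last ';' is dropped.
    for chunk in value.split(";")[:-1]:
        left, _, right = chunk.partition(",")
        pairs.append((left or None, right or None))
    return pairs


print(split_pairs(",U;,;V,V;"))
# prints [(None, 'U'), (None, None), ('V', 'V')]
```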

Replace string with random text - Oracle SQL

I have a table table1 with 1 column - edi_value which is of type CLOB.
These are the entries:
seq edi_message
1 ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
2 ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
Please note - there can be varying number of lines, from 3 to 500.
What I'm looking for is the following conditions:
Ignore the text before the first * on each line: that prefix (for example GS, ST) must not change. ONLY the text after the first * should be randomized.
Replace digits [0-9] with random digits consistently; for example, if 0 is replaced with 1, then every 0 becomes 1 throughout.
Replace letters [A-Za-z] with random letters consistently; for example, if A is replaced with W, then every A becomes W throughout.
Leave special characters as is.
Each character/digit should map to ONLY one random character/digit.
Output can be:
seq edi_message
1 ISA*11* *11* *13*4030111101 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130A1~
2 ISA*11* *11* *13*30234320023 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130W1~
How can this be achieved in Oracle SQL?
You can use translate with a helper function for generating random strings (though @LukStorms has a much neater SQL solution for that using LISTAGG), along with a method to tokenise and then re-concatenate the values into lines (I use a pure SQL method here for demonstration):
create or replace function f(p_low integer, p_high integer)
return varchar as
r varchar(2000) := '';
x integer;
begin
for i in p_low..p_high loop
x := dbms_random.value(0,length(r)+1);
r := substr(r,1,x)||chr(i)||substr(r,x+1);
end loop;
return r;
end;
/
select * from table1;
| EDI_VALUE |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A1~ |
| ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A |
with t as (select f(48,57)||f(65,90) translate_chars from dual)
select (select new_value
from (select substr(sys_connect_by_path(r_line,'
'),2) new_value, connect_by_isleaf isleaf
from (select lvl
, substr(line,1,instr(line,'*')-1)||
translate(substr(line,instr(line,'*'))
,'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
,(select translate_chars from t)) r_line
from (select level lvl
, regexp_substr(edi_value,'^.*$',1,level,'m') line
from (select table1.edi_value from dual)
connect by level <= regexp_count(edi_value,'^.*$',1,'m')))
start with lvl=1 connect by lvl=(prior lvl)+1)
where isleaf=1)
from table1;
| (SELECTNEW_VALUEFROM(SELECTSUBSTR(SYS_CONNECT_BY_PATH(R_LINE,''),2)NEW_VALUE,CONNECT_BY_ISLEAFISLEAFFROM(SELECTLVL,SUBSTR(LINE,1,INSTR(LINE,'*')-1)||TRANSLATE(SUBSTR(LINE,INSTR(LINE,'*')),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',(SELECTTRANSLATE_CHARSFR |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*66* *66* *67*1935006626 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G0~ |
| ISA*66* *66* *67*32471742247 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G |
You can use CTEs with CONNECT BY to generate the strings of letters and digits.
Then use the ordered and the scrambled strings in TRANSLATE.
A CROSS APPLY can be used to split the message into parts with a regex.
Then translate only the parts that start with a *.
Finally, use LISTAGG to glue the parts back together.
WITH
NUMS as
(
select
LISTAGG(n, '') WITHIN GROUP (ORDER BY n) as n_from,
LISTAGG(n, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as n_to
from (select level-1 n from dual connect by level <= 10)
),
LETTERS as
(
select
LISTAGG(c, '') WITHIN GROUP (ORDER BY c) as c_from,
LISTAGG(c, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as c_to
from (select chr(ascii('A')+level-1 ) c from dual connect by level <= 26)
)
SELECT ca.scrambled as scrambled_message
FROM table1 t
CROSS JOIN NUMS
CROSS JOIN LETTERS
CROSS APPLY
(
SELECT LISTAGG(CASE WHEN part like '*%' then translate(part, n_from||c_from, n_to||c_to) else part end, '') WITHIN GROUP (ORDER BY lvl) as scrambled
FROM
(
SELECT
level AS lvl,
REGEXP_SUBSTR(t.edi_message,'[*]\S+|[^*]+',1,level,'m') AS part
FROM dual
CONNECT BY level <= regexp_count(t.edi_message, '[*]\S+|[^*]+')+1
) parts
) ca;
Example output:
SCRAMBLED_MESSAGE
-----------------------------------------------------------------------------------------------------------
ISA*99* *99* *92*3525999959 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925V9~
ISA*99* *99* *92*25023205502 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925W9~
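The rules from the question (keep the prefix before the first * on each line, map each character to exactly one replacement, leave special characters alone) can be sketched in Python for comparison. This is only an illustration under the same assumption as both answers, namely that only digits and uppercase letters are scrambled; make_mapping and scramble are made-up names.

```python
import random
import string


def make_mapping(rng):
    """Build a one-to-one random mapping for digits and for uppercase letters."""
    digits = list(string.digits)
    letters = list(string.ascii_uppercase)
    shuffled_digits = digits[:]
    shuffled_letters = letters[:]
    rng.shuffle(shuffled_digits)   # digits only ever map to digits
    rng.shuffle(shuffled_letters)  # letters only ever map to letters
    return str.maketrans("".join(digits + letters),
                         "".join(shuffled_digits + shuffled_letters))


def scramble(message, mapping):
    """Translate everything after the first '*' of each line; keep the prefix."""
    out = []
    for line in message.split("\n"):
        head, star, tail = line.partition("*")
        # Characters missing from the mapping (*, ~, spaces, ...) pass through.
        out.append(head + star + tail.translate(mapping))
    return "\n".join(out)


mapping = make_mapping(random.Random())
print(scramble("GS*AG*5137624388*X*005010~\nST*824*021390001*005010X186A1~", mapping))
```

Because one mapping is built once and reused for every line and every row, the same source character always gets the same replacement.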

Apply order by in comma separated string in oracle

One of the columns in my Oracle table has the value below:
select csv_val from my_table where date='09-OCT-18';
output
==================
50,100,25,5000,1000
I want these values in ascending order using a select query; the output would look like:
output
==================
25,50,100,1000,5000
I tried this link, but it looks like it has some restriction on the number of digits.
Here, I made you a modified version of the answer you linked to that can handle an arbitrary (hardcoded) number of commas. It's pretty heavy on CTEs. As with most LISTAGG answers, it'll have a 4000-char limit. I also changed your regexp to be able to handle null list entries, based on this answer.
WITH
T (N) AS --TEST DATA
(SELECT '50,100,25,5000,1000' FROM DUAL
UNION
SELECT '25464,89453,15686' FROM DUAL
UNION
SELECT '21561,68547,51612' FROM DUAL
),
nums (x) as -- arbitrary limit of 20, can be changed
(select level from dual connect by level <= 20),
splitstr (N, x, substring) as
(select N, x, regexp_substr(N, '(.*?)(,|$)', 1, x, NULL, 1)
from T
inner join nums on x <= 1 + regexp_count(N, ',')
order by N, x)
select N, listagg(substring, ',') within group (order by to_number(substring)) as sorted_N
from splitstr
group by N
;
Probably it can be improved, but eh...
Based on the sample data you posted, a relatively simple query would work (you need lines 3 - 7). If the data doesn't really look like that, the query might need adjustment.
SQL> with my_table (csv_val) as
2 (select '50,100,25,5000,1000' from dual)
3 select listagg(token, ',') within group (order by to_number(token)) result
4 from (select regexp_substr(csv_val, '[^,]+', 1, level) token
5 from my_table
6 connect by level <= regexp_count(csv_val, ',') + 1
7 );
RESULT
-------------------------
25,50,100,1000,5000
SQL>
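For comparison, the key point of both queries is that the tokens are sorted as numbers (to_number) rather than as strings. A minimal Python sketch of the same idea, assuming every non-empty token is an integer (sort_csv_numeric is a made-up name):

```python
def sort_csv_numeric(csv_val):
    """Sort the comma-separated integers in `csv_val` numerically."""
    # Skip empty entries; int() as the sort key avoids the string order 100 < 25.
    tokens = [t.strip() for t in csv_val.split(",") if t.strip()]
    return ",".join(sorted(tokens, key=int))


print(sort_csv_numeric("50,100,25,5000,1000"))
# prints 25,50,100,1000,5000
```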