Dynamically split rows into multiple comma-separated lists - sql

I have a table that contains a list of users.
USER_TABLE
USER_ID DEPT
------- ----
USER1 HR
USER2 FINANCE
USER3 IT
Using a SQL statement, I need to get the list of users as a delimited string returned as a varchar2 - this is the only datatype I can use as dictated by the application I'm using, e.g.
USER1, USER2, USER3
The issue I have is that the list will exceed 4000 characters. The following manually chunks the users into lists of 150 at a time (based on a USER_ID max size of 20 characters, plus delimiters, safely fitting into 4000 characters).
SELECT LISTAGG(USER_ID, ',') WITHIN GROUP (ORDER BY USER_ID)
FROM   (SELECT DISTINCT USER_ID, ROW_NUMBER() OVER (ORDER BY USER_ID) RN FROM TABLE_NAME)
WHERE  RN <= 150
START WITH RN = 1
CONNECT BY PRIOR RN = RN - 1
UNION
SELECT LISTAGG(USER_ID, ',') WITHIN GROUP (ORDER BY USER_ID)
FROM   (SELECT DISTINCT USER_ID, ROW_NUMBER() OVER (ORDER BY USER_ID) RN FROM TABLE_NAME)
WHERE  RN > 150 AND RN <= 300
START WITH RN = 1
CONNECT BY PRIOR RN = RN - 1
This is manual, would require an additional UNION for each chunk of 150 users, and the total number of users could increase at a later date.
Is it possible to generate the delimited strings of user_ids dynamically, so that they fit into multiple chunks of 4000 characters and no user_id is split across strings?
Ideally, I'd want the output to look like this:
USER1, USER2, USER3 (to) USER149
USER150, USER151, USER152 (to) USER300
USER301, USER302, USER303 (to) USER450
The solution needs to be a SELECT statement as the schema is read-only and we aren't able to create any objects on the database. We're using Oracle 11g.

You can do this with a pipelined function:
create or replace function get_user_ids
    return sys.dbms_debug_vc2coll pipelined
is
    rv varchar2(4000) := null;
begin
    for r in ( select user_id, length(user_id) as lng
               from   user_table
               order  by user_id )
    loop
        if length(rv) + r.lng + 1 > 4000
        then
            rv := rtrim(rv, ','); -- remove trailing comma
            pipe row (rv);
            rv := null;
        end if;
        rv := rv || r.user_id || ',';
    end loop;
    if rv is not null then
        rv := rtrim(rv, ','); -- don't drop the final, partial chunk
        pipe row (rv);
    end if;
    return;
end;
/
You would call it like this:
select column_value as user_id_csv
from table(get_user_ids);
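For readers who want the packing rule spelled out, here is a minimal Python sketch of the same greedy chunking (the function name `chunk_ids` is mine, not part of the answer): append ids to the current chunk until adding one more would exceed the limit, then start a new chunk.

```python
def chunk_ids(user_ids, max_len=4000):
    """Pack sorted ids into comma-separated strings of at most max_len
    characters, never splitting an id across two strings."""
    chunks, current = [], ""
    for uid in sorted(user_ids):
        candidate = uid if not current else current + "," + uid
        if len(candidate) > max_len and current:
            chunks.append(current)   # current chunk is full
            current = uid            # start the next chunk with this id
        else:
            current = candidate
    if current:
        chunks.append(current)       # emit the final, partial chunk
    return chunks
```

Note that, like the pipelined function, it has to emit the last partial chunk after the loop ends.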

An alternative is the function below, which returns the whole list as a single CLOB:
create or replace FUNCTION my_agg_user
RETURN CLOB IS
  l_string CLOB;
  TYPE t_bulk_collect_test_tab IS TABLE OF VARCHAR2(4000);
  l_tab t_bulk_collect_test_tab;
  CURSOR user_list IS
    SELECT USER_ID
    FROM   USER_TABLE;
BEGIN
  OPEN user_list;
  LOOP
    FETCH user_list BULK COLLECT INTO l_tab LIMIT 1000;
    FOR indx IN 1 .. l_tab.COUNT LOOP
      l_string := l_string || l_tab(indx) || ',';
    END LOOP;
    EXIT WHEN user_list%NOTFOUND;
  END LOOP;
  CLOSE user_list;
  RETURN RTRIM(l_string, ','); -- drop the trailing comma
END my_agg_user;
Once the function is created:
select my_agg_user from dual;

I believe the SQL below should work in most cases. I've hard-coded it to break the strings up into chunks of 150 user ids, but the rest is dynamic.
The middle part produces duplicates, which an additional DISTINCT eliminates, but I'm not sure if there is a better way to do this.
WITH POSITION AS (
  SELECT ((LEVEL-1) * 150 + 1) FROM_POS, LEVEL * 150 TO_POS
  FROM   DUAL
  -- CEIL, so a final partial chunk of users is not dropped
  CONNECT BY LEVEL <= (SELECT CEIL(COUNT(DISTINCT USER_ID) / 150) FROM TABLE_NAME)
)
SELECT DISTINCT
       LISTAGG(USER_ID, ',') WITHIN GROUP (ORDER BY USER_ID) OVER (PARTITION BY FROM_POS, TO_POS)
FROM   (SELECT DISTINCT USER_ID, ROW_NUMBER() OVER (ORDER BY USER_ID) RN FROM TABLE_NAME) V0,
       POSITION
WHERE  V0.RN >= POSITION.FROM_POS
AND    V0.RN <= POSITION.TO_POS
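As a sanity check on the bucketing arithmetic (the CEIL matters: with 310 users and chunks of 150, 310/150 rounds down to 2 and the last 10 users vanish), here is a small Python sketch of the same fixed-size bucketing, with my own names, not part of the answer:

```python
import math

def bucket_ids(user_ids, per_chunk=150):
    """Fixed-size bucketing by row number, like the FROM_POS/TO_POS ranges."""
    ids = sorted(set(user_ids))                  # DISTINCT + ORDER BY
    n_buckets = math.ceil(len(ids) / per_chunk)  # CEIL, or the partial last bucket is lost
    return [",".join(ids[i * per_chunk:(i + 1) * per_chunk])
            for i in range(n_buckets)]
```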

Related

Oracle SQL - group only by nearby same records

I need to sum delay time in seconds for records that have the same value column. The problem is I need them grouped only in consecutive chunks, not all together. For example, for the data below I would need the sum of 3 records for value 3, and separately for the 2 records further down, and I should not sum the records for value 4 because they are not together. Is there a way to do this?
ID Value Timestamp Delay(s)
166549627 4 19-OCT-21 11:00:19 11.4
166549450 8 19-OCT-21 11:00:27 7.5
166549446 3 19-OCT-21 11:00:34 7.1
166549625 3 19-OCT-21 11:00:45 10.9
166549631 3 19-OCT-21 11:00:58 13.3
166550549 3 19-OCT-21 11:01:03 4.5
166549618 7 19-OCT-21 11:01:14 8.8
166549627 4 19-OCT-21 11:01:23 11.4
166550549 3 19-OCT-21 11:01:45 4.5
166550549 3 19-OCT-21 11:01:59 4.5
You don't even need PL/SQL for this; plain SQL suffices.
The solution below uses the recursive common table expression (CTE) technique to create sub-groups according to the value and timestamp columns.
with ranked_rows (ID, VALUE, TIMESTAMP, DELAY, RNB) as (
  select ID, VALUE, TIMESTAMP, DELAY, row_number() over (order by TIMESTAMP) rnb
  from ranked_source.YourTable
)
, cte (ID, VALUE, TIMESTAMP, DELAY, RNB, GRP) as (
  select ID, VALUE, TIMESTAMP, DELAY, RNB, 1 grp
  from ranked_rows
  where rnb = 1
  union all
  select t.ID, t.VALUE, t.TIMESTAMP, t.DELAY, t.RNB,
         case when c.VALUE = t.VALUE then c.grp else c.grp + 1 end
  from ranked_rows t
  join cte c on c.rnb + 1 = t.rnb
)
select VALUE, sum(DELAY) sum_consecutive_DELAY, min(TIMESTAMP) min_TIMESTAMP,
       max(TIMESTAMP) max_TIMESTAMP, count(*) nb_rows
from cte
group by VALUE, GRP
order by min_TIMESTAMP
;
demo
If you want to use PL/SQL, you can loop over all records ordered by timestamp.
Just remember the last value and add to the sum while it matches the current value. When it changes, save the sum somewhere else and continue.
You could also write this as a pipelined function so the data can be accessed with a query.
You can also write this as a pipelined function to use a query to access the data.
declare
  l_sum number := 0;
  l_last_val number;
begin
  for rec in (select * from your_table order by timestamp) loop
    if l_last_val = rec.value then
      l_sum := l_sum + rec.delay;
      continue;
    elsif l_last_val is not null then
      dbms_output.put_line('value: ' || l_last_val || ' sum: ' || l_sum); -- save last_val and sum
    end if;
    l_last_val := rec.value; -- was rec.val, which doesn't exist
    l_sum := rec.delay;
  end loop;
  dbms_output.put_line('value: ' || l_last_val || ' sum: ' || l_sum); -- save last_val and sum
end;
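Both answers implement the same "group only consecutive equal values" idea; in Python terms it is exactly itertools.groupby over rows ordered by timestamp. A sketch with my own function name, using (value, delay) pairs:

```python
from itertools import groupby

def sum_consecutive(rows):
    """rows: (value, delay) pairs already ordered by timestamp.
    Returns one (value, total_delay) per run of equal consecutive values."""
    return [(value, sum(delay for _, delay in run))
            for value, run in groupby(rows, key=lambda r: r[0])]
```

Running it on the question's data gives separate totals for each run of value 3, rather than one combined total.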

Remove duplicate values from comma separated variable in Oracle

I have a variable (called all_email_list) which contains 3 email address lists concatenated together. (I found some similar questions, but none with a proper solution for this case.)
Example: test#asd.com, test2#asd.com,test#asd.com,test3#asd.com, test4#asd.com,test2#asd.com (it can contain spaces between commas, but not always)
The desired output: test#asd.com, test2#asd.com,test3#asd.com,test4#asd.com
declare
first_email_list varchar2(4000);
second_email_list varchar2(4000);
third_email_list varchar2(4000);
all_email_list varchar2(4000);
begin
select listagg(EMAIL,',') into first_email_list from UM_USER a left join UM_USERROLLE b on (a.mynetuser=b.NT_NAME) left join UM_RULES c on (c.id=b.RULEID) where RULEID = 902;
select listagg(EMAIL,',') into second_email_list from table2 where CFT_ID =:P25_CFT_TEAM;
select EMAIL into third_email_list from table3 WHERE :P25_ID = ID;
all_email_list:= first_email_list || ',' || second_email_list || ',' || third_email_list;
dbms_output.put_line(all_email_list);
end;
Any solution to solve this in a simple way? By regex maybe.
Solution description: use a CTE to first split the list of emails into rows with one email address per row (testd_rows), then select the distinct rows (testd_rows_unique), and finally put them back together with LISTAGG. From 19c onwards you can use LISTAGG with the DISTINCT keyword.
set serveroutput on size 999999
clear screen
declare
all_email_list varchar2(4000);
l_unique_email_list varchar2(4000);
begin
all_email_list := 'test#asd.com, test2#asd.com,test#asd.com,test3#asd.com, test4#asd.com,test2#asd.com';
WITH testd_rows(email) AS
(
select regexp_substr (all_email_list, '[^, ]+', 1, rownum) split
from dual
connect by level <= length (regexp_replace (all_email_list, '[^, ]+')) + 1
), testd_rows_unique(email) AS
(
SELECT distinct email FROM testd_rows
)
SELECT listagg(email, ',') WITHIN GROUP (ORDER BY email)
INTO l_unique_email_list
FROM testd_rows_unique;
dbms_output.put_line(l_unique_email_list);
end;
/
test2#asd.com,test3#asd.com,test4#asd.com,test#asd.com
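The whole pipeline (split on commas, drop stray spaces, de-duplicate, re-join sorted) is small enough to sketch in Python; note that a plain byte-wise sort puts test#asd.com first, whereas the Oracle session above sorted it last (an NLS sorting difference). The function name is mine:

```python
def dedupe_csv(s):
    """Split a comma-separated list, trim spaces, keep unique entries,
    and re-join them sorted -- the LISTAGG-over-distinct-rows idea."""
    emails = {e.strip() for e in s.split(",") if e.strip()}
    return ",".join(sorted(emails))
```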
But ... why are you converting rows to a comma-separated string and then de-duping it? Use UNION to take out the duplicate values in a single SELECT statement and do LISTAGG on the values; no regexp needed then. UNION skips duplicates, as opposed to UNION ALL, which returns all the rows.
DECLARE
all_email_list varchar2(4000);
BEGIN
WITH all_email (email) AS
(
select email from UM_USER a left join UM_USERROLLE b on (a.mynetuser=b.NT_NAME) left join UM_RULES c on (c.id=b.RULEID) where RULEID = 902
UNION
select email from table2 where CFT_ID =:P25_CFT_TEAM
UNION
select email from table3 WHERE :P25_ID = ID
)
SELECT listagg(email, ',') WITHIN GROUP (ORDER BY email)
INTO all_email_list
FROM all_email;
dbms_output.put_line(all_email_list);
END;
/
You could leverage the apex_string.split table function to simplify the code.
12c lets you call the table function without a TABLE() wrapper, and 19c adds DISTINCT support to LISTAGG, which makes it really clean:
select listagg(distinct column_value, ',') within group (order by null)
from apex_string.split(replace('test#asd.com, test2#asd.com,test#asd.com,test3#asd.com, test4#asd.com,test2#asd.com'
                              ,' ')
                      ,',');
11g needs a wrapping table(), and its listagg doesn't support distinct:
select listagg(email, ',') within group (order by null)
from
  (select distinct column_value email
   from table(apex_string.split(replace('test#asd.com, test2#asd.com,test#asd.com,test3#asd.com, test4#asd.com,test2#asd.com', ' '), ','))
  );

Listagg Overflow function implementation (Oracle SQL)

I am using the LISTAGG function in my query, but it returned an ORA-01489: result of string concatenation is too long error. I googled that error and found I can use ON OVERFLOW TRUNCATE, so I added that to my SQL, but now it generates a missing right parenthesis error and I can't figure out why.
My query
SELECT DISTINCT cust_id, acct_no, state, language_indicator, billing_system, market_code,
EMAIL_ADDR, DATE_OF_CHANGE, TO_CHAR(DATE_LOADED, 'DD-MM-YYYY') DATE_LOADED,
(SELECT LISTAGG( SUBSTR(mtn, 7, 4),'<br>' ON OVERFLOW TRUNCATE '***' )
WITHIN GROUP (ORDER BY cust_id || acct_no) mtnlist
FROM process.feature WHERE date_loaded BETWEEN TO_DATE('02-08-2018','MM-dd-yyyy')
AND TO_DATE('02-09-2018', 'MM-dd-yyyy') AND cust_id = ffsr.cust_id
AND acct_no = ffsr.acct_no AND filename = 'FEATURE.VB2B.201802090040'
GROUP BY cust_id||acct_no) mtnlist
FROM process.feature ffsr WHERE date_loaded BETWEEN TO_DATE('02-08-2018','MM-dd-yyyy')
AND TO_DATE('02-09-2018','MM-dd-yyyy') AND cust_id BETWEEN 0542185146 AND 0942025571
AND src_ind = 'B' AND filename = 'FEATURE.VB2B.201802090040'
AND letter_type = 'FA' ORDER BY cust_id;
LISTAGG ... ON OVERFLOW TRUNCATE only exists from Oracle 12.2 onwards; on earlier versions the parser rejects the clause, which is where the missing right parenthesis error comes from. With a little help from XML you might get it to work. The example is based on the HR schema.
SQL> select
2 listagg(s.department_name, ',') within group (order by null) result
3 from departments s, departments d;
from departments s, departments d
*
ERROR at line 3:
ORA-01489: result of string concatenation is too long
SQL>
SQL> select
2 rtrim(xmlagg(xmlelement (e, s.department_name || ',')).extract
3 ('//text()').getclobval(), ',') result
4 from departments s, departments d;
RESULT
--------------------------------------------------------------------------------
Administration,Administration,Administration,Administration,Administration,Admin
SQL>
This demo sourced from livesql.oracle.com
-- Create table with 93 strings of different lengths, plus one NULL string. Notice the only ASCII character not used is '!', so I will use it as a delimiter in LISTAGG.
create table strings as
with letters as (
select level num,
chr(ascii('!')+level) let
from dual
connect by level <= 126 - ascii('!')
union all
select 1, null from dual
)
select rpad(let,num,let) str from letters;
-- Note the use of LENGTHB to get the length in bytes, not characters.
select str,
sum(lengthb(str)+1) over(order by str rows unbounded preceding) - 1 cumul_lengthb,
sum(lengthb(str)+1) over() - 1 total_lengthb,
count(*) over() num_values
from strings
where str is not null;
-- This statement implements the ON OVERFLOW TRUNCATE WITH COUNT option of LISTAGG in 12.2. If there is no overflow, the result is the same as a normal LISTAGG.
select listagg(str, '!') within group(order by str) ||
case when max(total_lengthb) > 4000 then
'! ... (' || (max(num_values) - count(*)) || ')'
end str_list
from (
select str,
sum(lengthb(str)+1) over(order by str) - 1 cumul_lengthb,
sum(lengthb(str)+1) over() - 1 total_lengthb,
count(*) over() num_values
from strings
where str is not null
)
where total_lengthb <= 4000
or cumul_lengthb <= 4000 - length('! ... (' || num_values || ')');
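The cumulative-length trick above is easier to see outside SQL. A hedged Python sketch of the same TRUNCATE WITH COUNT behaviour (the names and the exact reservation rule are mine; like the SQL demo, it reserves room for the marker using the total value count as an upper bound):

```python
def listagg_truncate(values, delim="!", max_len=4000):
    """Join sorted values with delim; if the full result would exceed
    max_len, keep a prefix of whole values and append '<delim> ... (n)'
    where n is the number of omitted values."""
    values = sorted(values)
    full = delim.join(values)
    if len(full) <= max_len:
        return full                   # no overflow: plain LISTAGG result
    # Reserve room for the marker, sized with the total count as a bound.
    reserve = len(f"{delim} ... ({len(values)})")
    kept, used = [], -len(delim)      # the first value adds no delimiter
    for v in values:
        if used + len(delim) + len(v) > max_len - reserve:
            break
        kept.append(v)
        used += len(delim) + len(v)
    return delim.join(kept) + f"{delim} ... ({len(values) - len(kept)})"
```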

PL SQL Query NVL with Multiple values separated by commas

I have a working query that allows the user to select by date range, store (one store number), and zip code.
For store, I want to be able to enter multiple store numbers separated by commas.
The code below works for a single store but not for multiple store numbers.
SELECT tt.id_str_rt store
,SUBSTR(tt.inf_ct,1,5) zip_code
,COUNT(tt.ai_trn) tran_count
,SUM(tr.mo_nt_tot) sales_value
FROM orco_owner.tr_trn tt
,orco_owner.tr_rtl tr
WHERE tt.id_str_rt = tr.id_str_rt
AND (tt.id_str_rt IN NVL(:PM_store_number,tt.id_str_rt) OR :PM_store_number IS NULL)
AND NVL(SUBSTR(tt.inf_ct,1,5),0) = NVL(:PM_zip_code,NVL(SUBSTR(tt.inf_ct,1,5),0))
AND tt.id_ws = tr.id_ws
AND tt.dc_dy_bsn = tr.dc_dy_bsn
AND tt.ai_trn = tr.ai_trn
AND TRUNC(TO_DATE(tt.dc_dy_bsn,'yyyy-MM-dd'))
BETWEEN NVL(:PM_date_from, TRUNC(TO_DATE(tt.dc_dy_bsn,'yyyy-MM-dd')))
AND NVL(:PM_date_to,TRUNC(TO_DATE(tt.dc_dy_bsn,'yyyy-MM-dd')))
AND LENGTH(TRIM(TRANSLATE(SUBSTR(inf_ct,1,5), '0123456789', ' '))) IS NULL
GROUP BY tt.id_str_rt,SUBSTR(tt.inf_ct,1,5)
ORDER BY zip_code, store
You could create a function like the one described in how to convert csv to table in oracle:
create or replace function splitter(p_str in varchar2) return sys.odcivarchar2list
is
v_tab sys.odcivarchar2list:=new sys.odcivarchar2list();
begin
with cte as (select level ind from dual
connect by
level <=regexp_count(p_str,',') +1
)
select regexp_substr(p_str,'[^,]+',1,ind)
bulk collect into v_tab
from cte;
return v_tab;
end;
/
Then you would use it in your query like this:
and (tt.id_str_rt in (select column_value from table(splitter(:PM_store_number)) ))
instead of this:
AND (tt.id_str_rt IN NVL(:PM_store_number,tt.id_str_rt) OR :PM_store_number IS NULL)
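The idea is simply "turn the one bind string into a set of values, then filter with IN". In Python terms it would look like this (illustrative names, not part of the answer):

```python
def split_bind(p_str):
    """Split a comma-separated bind value into individual tokens,
    like the splitter() table function does."""
    return [tok for tok in p_str.split(",") if tok != ""]

def filter_stores(rows, bind):
    """Keep rows whose store id appears in the bind list; an empty
    bind keeps nothing here (the NULL-bind case still needs an OR)."""
    wanted = set(split_bind(bind))
    return [r for r in rows if r in wanted]
```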

oracle query slow with REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL)

I am using a query to get separate rows instead of semicolon-separated values.
The table looks like this:
row_id aggregator
1 12;45
2 25
Using the query I want the output to look like:
row_id aggregator
1 12
1 45
2 25
I am using the following query:
SELECT
DISTINCT ROW_ID,
REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) as AGGREGATOR
FROM DUMMY_1
CONNECT BY REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) IS NOT NULL;
but it is very slow even for 300 records, and I have to work with 40000 records.
Regular expressions are known to be expensive functions, so you should minimize their use when performance is critical (for instance in the CONNECT BY clause).
Using standard functions (INSTR, SUBSTR, REPLACE) will be more efficient, but the resulting code is harder to read/understand/maintain.
I could not resist writing a recursive CTE, which is much more efficient than both regular expressions and standard functions. Furthermore, recursive CTE queries arguably have some elegance. You'll need Oracle 11.2:
WITH rec_sql(row_id, aggregator, lvl, tail) AS (
SELECT row_id,
nvl(substr(aggregator, 1, instr(aggregator, ';') - 1),
aggregator),
1 lvl,
CASE WHEN instr(aggregator, ';') > 0 THEN
substr(aggregator, instr(aggregator, ';') + 1)
END tail
FROM dummy_1 initialization
UNION ALL
SELECT r.row_id,
nvl(substr(tail, 1, instr(tail, ';') - 1), tail),
lvl + 1,
CASE WHEN instr(tail, ';') > 0 THEN
substr(tail, instr(tail, ';') + 1)
END tail
FROM rec_sql r
WHERE r.tail IS NOT NULL
)
SELECT * FROM rec_sql;
You can see on SQLFiddle that this solution is very performant and on par with @A.B.Cade's solution. (Thanks to A.B.Cade for the test case.)
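The recursion is just "peel the head off before the first delimiter and recurse on the tail". An iterative Python sketch of the same walk (my own naming):

```python
def split_tail(aggregator, delim=";"):
    """Mirror the recursive CTE: emit the piece before the first
    delimiter, then continue with the remaining tail until exhausted."""
    parts, tail = [], aggregator
    while tail is not None:
        pos = tail.find(delim)
        if pos < 0:
            parts.append(tail)        # no delimiter left: last piece
            tail = None
        else:
            parts.append(tail[:pos])  # head before the delimiter
            tail = tail[pos + 1:]     # continue with the tail
    return parts
```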
Sometimes a pipelined table can be faster, try this:
create or replace type t is object(word varchar2(100), pk number);
/
create or replace type t_tab as table of t;
/
create or replace function split_string(del in varchar2) return t_tab
pipelined is
word varchar2(4000);
str_t varchar2(4000) ;
v_del_i number;
iid number;
cursor c is
select * from DUMMY_1;
begin
for r in c loop
str_t := r.aggregator;
iid := r.row_id;
while str_t is not null loop
v_del_i := instr(str_t, del, 1, 1);
if v_del_i = 0 then
word := str_t;
str_t := '';
else
word := substr(str_t, 1, v_del_i - 1);
str_t := substr(str_t, v_del_i + 1);
end if;
pipe row(t(word, iid));
end loop;
end loop;
return;
end split_string;
Here is a sqlfiddle demo
And here is another demo with 22 rows containing 3 vals in aggregator each - see the difference between first and second query..
Your connect by produces many more records than needed; that's why the performance is poor and you need DISTINCT to limit the number of records. An approach that does not need DISTINCT would be:
select row_id, regexp_substr(aggregator,'[^;]+',1,n) aggregator
from dummy_1, (select level n from dual connect by level < 100)
where n <= regexp_count(aggregator,';')+1
The above works if the number of semicolons is less than 99. The solution below does not have this limitation, and is faster when the maximum number of semicolons is lower:
with dummy_c as (select row_id, aggregator, regexp_count(aggregator,';')+1 c from dummy_1)
select row_id, regexp_substr(aggregator,'[^;]+',1,n) aggregator
from dummy_c, (select level n from dual connect by level <= (select max(c) from dummy_c))
where n <= c
I think the DISTINCT may be the problem. Besides, I do not understand why you need CONNECT BY REGEXP_SUBSTR(AGGREGATOR,'[^;]+',1,LEVEL) IS NOT NULL. You are using a regexp in both your SELECT and your CONNECT BY. Can you use WHERE AGGREGATOR IS NOT NULL instead of the CONNECT BY? Find a way to get rid of the DISTINCT and revise your query; you could use EXISTS instead of DISTINCT. To help you more, I would need the tables and data.
SELECT * FROM
(
SELECT REGEXP_SUBSTR(AGGREGATOR ,'[^;]+',1,LEVEL) as AGGREGATOR
FROM your_table
)
WHERE AGGREGATOR IS NOT NULL
/