Oracle SQL - group only by nearby same records - sql

I need to sum the delay time in seconds for records that have the same value column. The problem is that I need them grouped only in consecutive chunks, not all together. For example, for the data below I would need one sum for the consecutive records with value 3, a separate sum for the 2 records with value 3 further down, and I should not sum the records for value 4 because they are not adjacent. Is there a way to do this?
ID Value Timestamp Delay(s)
166549627 4 19-OCT-21 11:00:19 11.4
166549450 8 19-OCT-21 11:00:27 7.5
166549446 3 19-OCT-21 11:00:34 7.1
166549625 3 19-OCT-21 11:00:45 10.9
166549631 3 19-OCT-21 11:00:58 13.3
166550549 3 19-OCT-21 11:01:03 4.5
166549618 7 19-OCT-21 11:01:14 8.8
166549627 4 19-OCT-21 11:01:23 11.4
166550549 3 19-OCT-21 11:01:45 4.5
166550549 3 19-OCT-21 11:01:59 4.5

You don't even need PL/SQL for this; plain SQL is enough.
The solution below uses a recursive common table expression (CTE) to build sub-groups according to the value and timestamp columns.
with ranked_rows (ID, VALUE, TIMESTAMP, DELAY, RNB) as (
  select ID, VALUE, TIMESTAMP, DELAY, row_number() over (order by TIMESTAMP) rnb
  from YourTable
)
, cte (ID, VALUE, TIMESTAMP, DELAY, RNB, GRP) as (
  select ID, VALUE, TIMESTAMP, DELAY, RNB, 1 grp
  from ranked_rows
  where rnb = 1
  union all
  select t.ID, t.VALUE, t.TIMESTAMP, t.DELAY, t.RNB,
         case when c.VALUE = t.VALUE then c.grp else c.grp + 1 end
  from ranked_rows t
  join cte c on c.rnb + 1 = t.rnb
)
select VALUE, sum(DELAY) sum_consecutive_DELAY, min(TIMESTAMP) min_TIMESTAMP,
       max(TIMESTAMP) max_TIMESTAMP, count(*) nb_rows
from cte
group by VALUE, GRP
order by min_TIMESTAMP
;
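If you are on Oracle 12c or later, the same consecutive grouping can arguably be expressed more directly with MATCH_RECOGNIZE. A minimal sketch, assuming the column names from the question (untested):
select *
from YourTable
match_recognize (
  order by TIMESTAMP
  measures
    first(VALUE)     as grp_value,              -- value shared by the consecutive run
    sum(DELAY)       as sum_consecutive_delay,  -- total delay of the run
    first(TIMESTAMP) as min_timestamp,
    last(TIMESTAMP)  as max_timestamp,
    count(*)         as nb_rows
  pattern (strt same*)                          -- a run = one starting row plus rows keeping the same value
  define same as VALUE = prev(VALUE)
);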

If you want to use PL/SQL, you can loop over all records ordered by timestamp.
Just remember the last value and keep summing while it matches the current value. When it changes, save the sum somewhere else and continue.
You can also write this as a pipelined function so the data can be accessed from a query.
declare
  l_sum      number := 0;
  l_last_val number;
begin
  for rec in (select * from your_table order by timestamp) loop
    if l_last_val = rec.value then
      l_sum := l_sum + rec.delay;   -- same value as before: keep accumulating
      continue;
    elsif l_last_val is not null then
      dbms_output.put_line('value: ' || l_last_val || ' sum: ' || l_sum); -- save last_val and sum
    end if;
    l_last_val := rec.value;        -- new chunk: remember its value
    l_sum := rec.delay;             -- and restart the sum
  end loop;
  dbms_output.put_line('value: ' || l_last_val || ' sum: ' || l_sum); -- don't forget the last chunk
end;
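A minimal sketch of the pipelined-function variant mentioned above, assuming you are allowed to create two helper SQL types (the names t_value_sum, t_value_sum_tab and consecutive_sums are made up for illustration):
create type t_value_sum as object (chunk_value number, sum_delay number);
/
create type t_value_sum_tab as table of t_value_sum;
/
create or replace function consecutive_sums
  return t_value_sum_tab pipelined
is
  l_sum      number;
  l_last_val number;
begin
  for rec in (select value, delay from your_table order by timestamp) loop
    if l_last_val = rec.value then
      l_sum := l_sum + rec.delay;                  -- still inside the same chunk
    else
      if l_last_val is not null then
        pipe row (t_value_sum(l_last_val, l_sum)); -- emit the finished chunk
      end if;
      l_last_val := rec.value;
      l_sum      := rec.delay;
    end if;
  end loop;
  if l_last_val is not null then
    pipe row (t_value_sum(l_last_val, l_sum));     -- emit the last chunk
  end if;
  return;
end;
/
-- usage
select * from table(consecutive_sums);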

Related

SQL - Extracting an ID range for a packet of records

I have a table with about 40,000,000 records; min(id) = 2 and max(id) = 80,000,000.
I would like to create an automated script that will run in a loop.
But I don't want to create about 80 iterations, because some of them would be empty.
How can I find the range min(id) and max(id) for the first iteration, and for the next ones?
I used mod but it doesn't work correctly:
SELECT MIN(ID), MAX(ID)
FROM (
SELECT mod(id,45), id FROM table
WHERE mod(id,45) = 0
GROUP BY mod(id,45), id
ORDER BY id desc
)
Because I want:
first iteration: range for 1 million records, min(id) = 2, max(id) = 1 500 000
second iteration: range for 1 million records, min(id) = 1 550 000, max(id) = 5 000 000
and so on
This should be easy in any DBMS that supports ordered numbering of rows. Below is a Db2 example.
Every SELECT returns 2 rows, except possibly the last one, which may return fewer.
SELECT 'SELECT * FROM MYTAB WHERE I BETWEEN ' || MIN (I) || ' AND ' || MAX (I) AS STMT
FROM
(
SELECT I, (ROW_NUMBER () OVER (ORDER BY I) - 1) / 2 AS RN_
FROM (VALUES 1, 9, 2, 7, 4) MYTAB (I)
) G
GROUP BY RN_
The result is:
STMT
SELECT * FROM MYTAB WHERE I BETWEEN 1 AND 2
SELECT * FROM MYTAB WHERE I BETWEEN 4 AND 7
SELECT * FROM MYTAB WHERE I BETWEEN 9 AND 9
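The same idea adapted to the asker's Oracle table, 1,000,000 ids per chunk, might look like the sketch below (the table name your_table is assumed; in Oracle the division must be truncated explicitly):
SELECT 'SELECT * FROM your_table WHERE id BETWEEN ' || MIN(id) || ' AND ' || MAX(id) AS stmt
FROM (
  SELECT id,
         TRUNC((ROW_NUMBER() OVER (ORDER BY id) - 1) / 1000000) AS rn_   -- chunk number
  FROM your_table
) g
GROUP BY rn_
ORDER BY MIN(id);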

How to compare range of integer values in PL/SQL?

I am trying to compare ranges of integer values between a test table and a reference table. Wherever a range from the test table overlaps a range in the reference table, the overlapping part should be removed.
Sorry if that's not clear, but here is some example data:
TEST_TABLE:
MIN MAX
10 121
122 648
1200 1599
REFERENCE_TABLE:
MIN MAX
50 106
200 1400
1450 1500
MODIFIED TEST_TABLE: (expected result after running PL/SQL)
MIN MAX
10 49
107 121
122 199
1401 1449
1501 1599
In the first row from the example above, the 10-121 has been cut down into two rows: 10-49 and 107-121 because the values 50, 51, ..., 106 are included in the first row of the reference_table (50-106); and so on..
Here's what I've written so far, using nested loops. I create two additional temp tables to store all the individual values found in the reference table, and then build the new sets of ranges to insert into test_table.
But this does not seem to work correctly, and it might cause performance issues, especially when dealing with values in the millions and above:
CREATE TABLE new_table (num_value NUMBER);
CREATE TABLE new_table_next (num_value NUMBER, next_value NUMBER);
-- PL/SQL start
DECLARE
l_count NUMBER;
l_now_min NUMBER;
l_now_max NUMBER;
l_final_min NUMBER;
l_final_max NUMBER;
BEGIN
FOR now IN (SELECT min_num, max_num FROM test_table) LOOP
l_now_min:=now.min_num;
l_now_max:=now.max_num;
WHILE (l_now_min < l_now_max) LOOP
SELECT COUNT(*) -- to check if number is found in reference table
INTO l_count
FROM reference_table refr
WHERE l_now_min >= refr.min_num
AND l_now_min <= refr.max_num;
IF l_count > 0 THEN
INSERT INTO new_table (num_value) VALUES (l_now_min);
COMMIT;
END IF;
l_now_min:=l_now_min+1;
END LOOP;
INSERT INTO new_table_next (num_value, next_value)
VALUES (SELECT num_value, (SELECT MIN (num_value) FROM new_table t2 WHERE t2.num_value > t.num_value) AS next_value FROM new_table t);
DELETE FROM test_table t
WHERE now.min_num = t.min_num
AND now.max_num = t.max_num;
COMMIT;
SELECT (num_value + 1) INTO l_final_min FROM new_table_next;
SELECT (next_value - num_value - 2) INTO l_final_max FROM new_table_next;
INSERT INTO test_table (min_num, max_num)
VALUES (l_final_min, l_final_max);
COMMIT;
DELETE FROM new_table;
DELETE FROM new_table_next;
COMMIT;
END LOOP;
END;
/
Please help, I'm stuck. :)
The idea behind this approach is to unwind both tables, keeping track of whether the numbers are in the reference table or the original table. This is really cumbersome, because adjacent values can cause problems.
The idea is then to do a "gaps-and-islands" type solution along both dimensions -- and then only keep the values that are in the original table and not in the second. Perhaps this could be called "exclusionary gaps-and-islands".
Here is a working version:
with vals as (
select min as x, 1 as inc, 0 as is_ref
from test_table
union all
select max + 1, -1 as inc, 0 as is_ref
from test_table
union all
select min as x, 0, 1 as is_ref
from reference_table
union all
select max + 1 as x, 0, -1 as is_ref
from reference_table
)
select min, max
from (select refgrp, incgrp, ref, inc2, min(x) as min, (lead(min(x), 1, max(x) + 1) over (order by min(x)) - 1) as max
from (select v.*,
row_number() over (order by x) - row_number() over (partition by ref order by x) as refgrp,
row_number() over (order by x) - row_number() over (partition by inc2 order by x) as incgrp
from (select v.*, sum(is_ref) over (order by x, inc) as ref,
sum(inc) over (order by x, inc) as inc2
from vals v
) v
) v
group by refgrp, incgrp, ref, inc2
) v
where ref = 0 and inc2 = 1 and min < max
order by min;
The inverse problem of getting the overlaps is much easier. It might be feasible to "invert" the reference table to handle this.
select greatest(tt.min, rt.min), least(tt.max, rt.max)
from test_table tt join
reference_table rt
on tt.min < rt.max and tt.max > rt.min -- is there an overlap?
This is modified from a similar task (using dates instead of numbers) I did on Teradata. It's based on the same base data as Gordon's answer (all begin/end values combined in a single list), but uses simpler logic:
WITH minmax AS
( -- create a list of all existing start/end values (possible to simplify using Unpivot or Cross Apply)
SELECT Min AS val, -1 AS prio, 1 AS flag -- main table, range start
FROM test_table
UNION ALL
SELECT Max+1, -1, -1 -- main table, range end
FROM test_table
UNION ALL
SELECT Min, 1, 1 -- reference table, adjusted range start
FROM reference_table
UNION ALL
SELECT Max+1, 1, -1 -- reference table, adjusted range end
FROM reference_table
)
, all_ranges AS
( -- create all ranges from current to next row
SELECT minmax.*,
Lead(val) Over (ORDER BY val, prio desc, flag) AS next_val, -- next value = end of range
Sum(flag) Over (ORDER BY val, prio desc, flag ROWS Unbounded Preceding) AS Cnt -- how many overlapping periods exist
FROM minmax
)
SELECT val, next_val-1
FROM all_ranges
WHERE Cnt = 1 -- 1st level only
AND prio + flag = 0 -- either (prio -1 and flag 1) = range start in base table
-- or (prio 1 and flag -1) = range end in ref table
ORDER BY 1
Here's one way to do this. I put the test data in a WITH clause rather than creating the tables (I find this is easier for testing purposes). I used your column names (MIN and MAX); these are very poor choices though, as MIN and MAX are Oracle keywords. They will generate confusion for sure, and they may cause queries to error out.
The strategy is simple - first take the COMPLEMENT of the ranges in REFERENCE_TABLE, which will also be a union of intervals (using NULL as marker for minus infinity and plus infinity); then take the intersection of each interval in TEST_TABLE with each interval in the complement of REFERENCE_TABLE. How that is done is shown in the final (outer) query in the solution below.
with
test_table (min, max) as (
select 10, 121 from dual union all
select 122, 648 from dual union all
select 1200, 1599 from dual
)
, reference_table (min, max) as (
select 50, 106 from dual union all
select 200, 1400 from dual union all
select 1450, 1500 from dual
)
,
prep (min, max) as (
select lag(max) over (order by max) + 1 as min
, min - 1 as max
from ( select min, max from reference_table
union all
select null, null from dual
)
)
select greatest(t.min, nvl(p.min, t.min)) as min
, least (t.max, nvl(p.max, t.max)) as max
from test_table t inner join prep p
on t.min <= nvl(p.max, t.max)
and t.max >= nvl(p.min, t.min)
order by min
;
MIN MAX
---------- ----------
10 49
107 121
122 199
1401 1449
1501 1599
An example that solves the problem:
CREATE TABLE xrange_reception
(
vdeb NUMBER,
vfin NUMBER
);
CREATE TABLE xrange_transfert
(
vdeb NUMBER,
vfin NUMBER
);
CREATE TABLE xrange_resultat
(
vdeb NUMBER,
vfin NUMBER
);
insert into xrange_reception values (10,50);
insert into xrange_transfert values (15,25);
insert into xrange_transfert values (30,33);
insert into xrange_transfert values (40,45);
DECLARE
CURSOR cr_rec IS SELECT * FROM xrange_reception;
CURSOR cr_tra IS
SELECT *
FROM xrange_transfert
ORDER BY vdeb;
i NUMBER;
vdebSui NUMBER;
BEGIN
FOR rc IN cr_rec
LOOP
i := 1;
vdebSui := NULL;
FOR tr IN cr_tra
LOOP
IF tr.vdeb BETWEEN rc.vdeb AND rc.vfin
THEN
IF i = 1 AND tr.vdeb > rc.vdeb
THEN
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (rc.vdeb, tr.vdeb - 1);
ELSIF i = cr_rec%ROWCOUNT AND tr.vfin < rc.vfin
THEN
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (tr.vfin, rc.vfin);
ELSIF vdebSui < tr.vdeb
THEN
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (vdebSui + 1, tr.vdeb - 1);
END IF;
vdebSui := tr.vfin;
i := i + 1;
END IF;
END LOOP;
IF vdebSui IS NOT NULL THEN
IF vdebSui < rc.vfin
THEN
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (vdebSui + 1, rc.vfin);
END IF;
ELSE
INSERT INTO xrange_resultat (vdeb, vfin)
VALUES (rc.vdeb, rc.vfin);
END IF;
END LOOP;
END;
So:
Table xrange_reception:
vdeb vfin
10 50
Table xrange_transfert:
vdeb vfin
15 25
30 33
40 45
Table xrange_resultat:
vdeb vfin
10 14
26 29
34 39
46 50

Dynamically split rows into multiple comma-separated lists

I have a table that contains a list of users.
USER_TABLE
USER_ID DEPT
------- ----
USER1 HR
USER2 FINANCE
USER3 IT
Using a SQL statement, I need to get the list of users as a delimited string returned as a varchar2 - this is the only datatype I can use as dictated by the application I'm using, e.g.
USER1, USER2, USER3
The issue I have is that the list will exceed 4000 characters. I have the following, which manually chunks up the users into lists of 150 users at a time (based on the user_id max size being 20 characters, plus delimiters, safely fitting into 4000 characters).
SELECT LISTAGG(USER_ID, ',') WITHIN GROUP (ORDER BY USER_ID)
FROM (SELECT DISTINCT USER_ID AS USER_ID, ROW_NUMBER() OVER (ORDER BY USER_ID) RN FROM TABLE_NAME)
WHERE RN <= 150
START WITH RN = 1
CONNECT BY PRIOR RN = RN - 1
UNION
SELECT LISTAGG(USER_ID, ',') WITHIN GROUP (ORDER BY USER_ID)
FROM (SELECT DISTINCT USER_ID AS USER_ID, ROW_NUMBER() OVER (ORDER BY USER_ID) RN FROM TABLE_NAME)
WHERE RN > 150 AND RN <= 300
START WITH RN = 1
CONNECT BY PRIOR RN = RN - 1
This is manual and would require an additional UNION for each chunk of 150 users and the total number of users could increase at a later date.
Is it possible to do this so the delimited strings of user_ids are generated dynamically so they fit in to multiple chunks of 4000 characters and no user_ids are split over multiple strings?
Ideally, I'd want the output to look like this:
USER1, USER2, USER3 (to) USER149
USER150, USER151, USER152 (to) USER300
USER301, USER302, USER303 (to) USER450
The solution needs to be a SELECT statement as the schema is read-only and we aren't able to create any objects on the database. We're using Oracle 11g.
You can do this with a pipelined function:
create or replace function get_user_ids
  return sys.dbms_debug_vc2coll pipelined
is
  rv varchar2(4000) := null;
begin
  for r in ( select user_id, length(user_id) as lng
             from user_table
             order by user_id )
  loop
    if length(rv) + r.lng + 1 > 4000
    then
      rv := rtrim(rv, ','); -- remove trailing comma
      pipe row (rv);
      rv := null;
    end if;
    rv := rv || r.user_id || ',';
  end loop;
  if rv is not null then
    pipe row (rtrim(rv, ',')); -- don't lose the final, partially filled chunk
  end if;
  return;
end;
/
You would call it like this:
select column_value as user_id_csv
from table(get_user_ids);
An alternative way, using the function below:
create or replace FUNCTION my_agg_user
RETURN CLOB IS
l_string CLOB;
TYPE t_bulk_collect_test_tab IS TABLE OF VARCHAR2(4000);
l_tab t_bulk_collect_test_tab;
CURSOR user_list IS
SELECT USER_ID
FROM USER_TABLE ;
BEGIN
OPEN user_list;
LOOP
FETCH user_list
BULK COLLECT INTO l_tab LIMIT 1000;
FOR indx IN 1 .. l_tab.COUNT
LOOP
l_string := l_string || l_tab(indx);
l_string := l_string || ',';
END LOOP;
EXIT WHEN user_list%NOTFOUND;
END LOOP;
CLOSE user_list;
RETURN l_string;
END my_agg_user;
Once the function is created, call it like this:
select my_agg_user from dual;
I believe the SQL below should work in most cases. I've hard-coded it to break the strings up into chunks of 150 user ids, but the rest is dynamic.
The middle part produces duplicates, which requires an additional DISTINCT to eliminate, but I'm not sure whether there is a better way to do this.
WITH POSITION AS ( SELECT ((LEVEL-1) * 150 + 1) FROM_POS, LEVEL * 150 TO_POS
                   FROM DUAL
                   CONNECT BY LEVEL <= (SELECT CEIL(COUNT(DISTINCT USER_ID) / 150) FROM TABLE_NAME)
)
SELECT DISTINCT
       LISTAGG(USER_ID, ',') WITHIN GROUP (ORDER BY USER_ID) OVER (PARTITION BY FROM_POS, TO_POS)
FROM
  (SELECT DISTINCT USER_ID AS USER_ID, ROW_NUMBER() OVER (ORDER BY USER_ID) RN FROM TABLE_NAME) V0,
  POSITION
WHERE V0.RN >= POSITION.FROM_POS
  AND V0.RN <= POSITION.TO_POS
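A way to avoid the analytic LISTAGG and the outer DISTINCT is to group on a computed chunk number instead. A sketch along the same lines (150 ids per chunk, table and column names as above):
SELECT LISTAGG(USER_ID, ',') WITHIN GROUP (ORDER BY USER_ID) AS USER_ID_CSV
FROM (
  SELECT USER_ID,
         TRUNC((ROW_NUMBER() OVER (ORDER BY USER_ID) - 1) / 150) AS CHUNK  -- 0-based chunk number
  FROM (SELECT DISTINCT USER_ID FROM TABLE_NAME)
)
GROUP BY CHUNK
ORDER BY CHUNK;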

Fill in missing dates in date range from a table

table A
no date count
1 20160401 1
1 20160403 4
2 20160407 3
result
no date count
1 20160401 1
1 20160402 0
1 20160403 4
1 20160404 0
.
.
.
2 20160405 0
2 20160406 0
2 20160407 3
.
.
.
I'm using Oracle and I want to write a query that returns rows for every date within a range based on table A.
Is there some function in Oracle that can help me?
You can use a SEQUENCE.
First create the sequence:
Create Sequence seq_name start with 20160401 maxvalue n;
where n is the maximum value up to which you want to display.
Then use SQL along these lines:
select seq_name.nextval, case when seq_name.nextval = date then count else 0 end from tableA;
Note: it's better not to use date and count as the column names.
Try this:
with
A as (
select 1 no, to_date('20160401', 'yyyymmdd') dat, 1 cnt from dual union all
select 1 no, to_date('20160403', 'yyyymmdd') dat, 4 cnt from dual union all
select 2 no, to_date('20160407', 'yyyymmdd') dat, 3 cnt from dual),
B as (select min(dat) mindat, max(dat) maxdat from A t),
C as (select level + mindat - 1 dat from B connect by level + mindat - 1 <= maxdat),
D as (select distinct no from A),
E as (select * from D,C)
select E.no, E.dat, nvl(cnt, 0) cnt
from E
full outer join A on A.no = E.no and A.dat = E.dat
order by 1, 2, 3
This isn't an Oracle-specific answer; you'll need to translate it to Oracle yourself.
Create an intervals table containing all integers from 0 to 999. Something like this (SQL Server syntax):
CREATE TABLE intervals (days int);
INSERT INTO intervals (days) VALUES (0), (1);
DECLARE @rc int;
SELECT @rc = 2;
WHILE (SELECT Count(*) FROM intervals) < 1000 BEGIN
  INSERT INTO intervals (days) SELECT days + @rc FROM intervals WHERE days + @rc < 1000;
  SELECT @rc = @rc * 2
END;
Then all the dates in the range can be identified by adding intervals.days to the first date you've got, where the first date + intervals.days is <= the end date and the resulting date is new. Do this by cross joining intervals to your own table. Something like this (again, you'll need to translate):
SELECT DATEADD(d, i.days, a.min_date)
FROM (SELECT MIN(date) AS min_date FROM table_A) a, intervals i
WHERE DATEADD(d, i.days, a.min_date) < (SELECT MAX(date) FROM table_A)
  AND NOT EXISTS (SELECT 1 FROM table_A aa WHERE aa.date = DATEADD(d, i.days, a.min_date))
Hope this gives you a starting point
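For completeness, a possible Oracle translation of that idea, generating the day offsets with CONNECT BY instead of a physical intervals table. This is only a sketch: it assumes the table is called table_a with a date column dat (as in the earlier answer), and it lists just the missing dates:
select b.min_dat + i.days as missing_dat
from (select min(dat) as min_dat, max(dat) as max_dat from table_a) b
     cross join (select level - 1 as days
                 from dual
                 connect by level <= 1000) i          -- offsets 0..999, like the intervals table
where b.min_dat + i.days <= b.max_dat
  and not exists (select 1 from table_a t where t.dat = b.min_dat + i.days);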

Sum of working days with date ranges from multiple records (overlapping)

suppose there are records as follows:
Employee_id, work_start_date, work_end_date
1, 01-jan-2014, 07-jan-2014
1, 03-jan-2014, 12-jan-2014
1, 23-jan-2014, 25-jan-2014
2, 15-jan-2014, 25-jan-2014
2, 07-jan-2014, 15-jan-2014
2, 09-jan-2014, 12-jan-2014
The requirement is to write a SQL select statement that summarizes the worked days grouped by employee_id, but excludes the overlapping periods (meaning they are taken into the calculation only once).
The desired output would be:
Employee_id, worked_days
1, 13
2, 18
The calculations for working days in the date range are done like this:
If work_start_date = 5 and work_end_date = 9 then worked_days = 4 (9 - 5).
I could write a pl/sql function which solves this (manually iterating over the records and doing the calculation), but I'm sure it can be done using SQL for better performance.
Can someone please point me in the right direction?
Thanks!
This is a slightly modified query from similar question:
compute sum of values associated with overlapping date ranges
SELECT "Employee_id",
SUM( "work_end_date" - "work_start_date" )
FROM(
SELECT "Employee_id",
"work_start_date" ,
lead( "work_start_date" )
over (Partition by "Employee_id"
Order by "Employee_id", "work_start_date" )
As "work_end_date"
FROM (
SELECT "Employee_id", "work_start_date"
FROM Table1
UNION
SELECT "Employee_id","work_end_date"
FROM Table1
) x
) x
WHERE EXISTS (
SELECT 1 FROM Table1 t
WHERE t."work_start_date" > x."work_end_date"
AND t."work_end_date" > x."work_start_date"
OR t."work_start_date" = x."work_start_date"
AND t."work_end_date" = x."work_end_date"
)
GROUP BY "Employee_id"
;
Demo: http://sqlfiddle.com/#!4/4fcce/2
This is a tricky problem. For instance, you can't use lag(), because the overlapping period may not be the "previous" one, and different periods can start and/or stop on the same day.
The idea is to reconstruct the periods. How? Find the records where a period starts -- that is, where no earlier period overlaps its start. Then use this as a flag and sum it cumulatively to number the overlapping groups. Getting the working days is just aggregation from there:
with ps as (
      select e.*,
             (case when exists (select 1
                                from emps e2
                                where e2.employee_id = e.employee_id and
                                      e2.work_start_date < e.work_start_date and
                                      e2.work_end_date >= e.work_start_date
                               )
                   then 0 else 1
              end) as IsPeriodStart
      from emps e
     )
select employee_id, sum(work_end_date - work_start_date) as Days_Worked
from (select employee_id, min(work_start_date) as work_start_date,
             max(work_end_date) as work_end_date
      from (select ps.*,
                   sum(IsPeriodStart) over (partition by employee_id
                                            order by work_start_date
                                           ) as grp
            from ps
           ) ps
      group by employee_id, grp
     ) ps
group by employee_id;
First, a collection type (date_tbl) declared in a package:
create or replace package RG_TYPE is
type date_tbl is table of date;
end;
Then a pipelined function that returns the dates between two parameters as a table:
create or replace function dates
(
p_from date,
p_to date
) return rg_type.date_tbl pipelined
is
l_idx date:=p_from;
begin
loop
if l_idx>nvl(p_to,p_from) then
exit;
end if;
pipe row(l_idx);
l_idx:=l_idx+1;
end loop;
return;
end;
SQL:
select employee_id,sum(c)
from
(select e.employee_id,d.column_value,count(distinct w.employee_id) as c
from (select distinct employee_id from works) e,
table(dates((select min(work_start_date) as a from works),(select max(work_end_date) as b from works))) d,
works w
where e.employee_id=w.employee_id
and d.column_value>=w.work_start_date
and d.column_value<w.work_end_date
group by e.employee_id,d.column_value) Sub
group by employee_id
order by 1,2