Oracle SQL: calculate average/opening/closing balances from discrete data

I have account balances like this
acc_no balance balance_date
account1 5000 2020-01-01
account1 6000 2020-01-05
account2 3000 2020-01-01
account1 3500 2020-01-08
account2 7500 2020-01-15
The effective balance for any day without a balance entry is equal to the last recorded balance, e.g. account1's balance on 2, 3 and 4 Jan is 5000, and so on.
I would like to produce the daily average, opening and closing balance from this data for any period. I came up with the following query and it works, but it takes half an hour when I run it against the full data set. Is my approach correct, or is there a more efficient method?
WITH cte_period
AS (
SELECT '2020-01-01' date_from
,'2020-01-31' date_to
FROM dual
)
,cte_calendar
AS (
SELECT rownum
,(
SELECT to_date(date_from, 'YYYY-MM-DD')
FROM cte_period
) + rownum - 1 AS balance_day
FROM dual connect BY rownum <= (
SELECT to_date(date_to, 'YYYY-MM-DD')
FROM cte_period
) - (
SELECT to_date(date_from, 'YYYY-MM-DD')
FROM cte_period
) + 1
)
,cte_balances
AS (
SELECT 'account1' acc_no
,5000 balance
,to_date('2020-01-01', 'YYYY-MM-DD') sys_date
FROM dual
UNION ALL
SELECT 'account1'
,6000
,to_date('2020-01-05', 'YYYY-MM-DD')
FROM dual
UNION ALL
SELECT 'account2'
,3000
,to_date('2020-01-01', 'YYYY-MM-DD')
FROM dual
UNION ALL
SELECT 'account1'
,3500
,to_date('2020-01-08', 'YYYY-MM-DD')
FROM dual
UNION ALL
SELECT 'account2'
,7500
,to_date('2020-01-15', 'YYYY-MM-DD')
FROM dual
)
,cte_accounts
AS (
SELECT DISTINCT acc_no
FROM cte_balances
)
SELECT t.acc_no
,(
SELECT eff_bal
FROM (
SELECT cal.balance_day
,acc_nos.acc_no
,(
SELECT balance
FROM cte_balances bal
WHERE bal.sys_date <= cal.balance_day
AND acc_nos.acc_no = bal.acc_no
ORDER BY bal.sys_date DESC FETCH first 1 row ONLY
) eff_bal
FROM cte_calendar cal
CROSS JOIN cte_accounts acc_nos
) t1
WHERE balance_day = (
SELECT to_date(date_from, 'YYYY-MM-DD')
FROM cte_period
)
AND t.acc_no = t1.acc_no
) opening_bal
,(
SELECT eff_bal
FROM (
SELECT cal.balance_day
,acc_nos.acc_no
,(
SELECT balance
FROM cte_balances bal
WHERE bal.sys_date <= cal.balance_day
AND acc_nos.acc_no = bal.acc_no
ORDER BY bal.sys_date DESC FETCH first 1 row ONLY
) eff_bal
FROM cte_calendar cal
CROSS JOIN cte_accounts acc_nos
) t1
WHERE balance_day = (
SELECT to_date(date_to, 'YYYY-MM-DD')
FROM cte_period
)
AND t.acc_no = t1.acc_no
) closing_bal
,round(avg(eff_bal), 2) avg_bal
FROM (
SELECT cal.balance_day
,acc_nos.acc_no
,(
SELECT balance
FROM cte_balances bal
WHERE bal.sys_date <= cal.balance_day
AND acc_nos.acc_no = bal.acc_no
ORDER BY bal.sys_date DESC FETCH first 1 row ONLY
) eff_bal
FROM cte_calendar cal
CROSS JOIN cte_accounts acc_nos
) t
GROUP BY acc_no
order by acc_no
The expected result
ACC_NO OPENING_BAL CLOSING_BAL AVG_BAL
account1 5000 3500 3935.48
account2 3000 7500 5467.74

Yes, there is. You are unnecessarily selecting from the same table many times. Produce the calendar as you did, join it with your data partitioned by account, and use analytic functions for the computations:
select acc_no, round(avg(bal), 2) av_bal,
max(bal) keep (dense_rank first order by day) op_bal,
max(bal) keep (dense_rank last order by day) cl_bal
from (
select acc_no, day,
nvl(balance, lag(balance) ignore nulls over (partition by acc_no order by day)) bal
from (
select date_from + level - 1 as day
from (select date '2020-01-01' date_from, date '2020-01-31' date_to from dual)
connect by date_from + level - 1 <= date_to)
left join cte_balances partition by (acc_no) on day = sys_date)
group by acc_no
dbfiddle
Edit:
sometimes the first day of the month has no balance entry, it should
take it from the last available
We have to treat the first row in a special way. This is done in the subquery data, where, in the case of the first row having a null balance, I run a correlated subquery which looks for the balance from the latest prior date.
with
cte_calendar as (
select level lvl, date_from + level - 1 as day
from (select date '2020-01-01' date_from, date '2020-01-31' date_to from dual)
connect by date_from + level - 1 <= date_to),
data as (
select lvl, day, acc_no,
case when balance is null and lvl = 1
then (select max(balance) keep (dense_rank last order by sys_date)
from cte_balances a
where a.acc_no = b.acc_no and a.sys_date <= day)
else balance
end bal
from cte_calendar
left join cte_balances b partition by (acc_no) on day = sys_date)
select acc_no,
max(bal) keep (dense_rank first order by day) op_bal,
max(bal) keep (dense_rank last order by day) cl_bal,
round(avg(bal), 2)
from (
select acc_no, day,
nvl(bal, lag(bal) ignore nulls over (partition by acc_no order by day)) bal
from data)
group by acc_no
dbfiddle
although I don't understand it yet
There are three things which are not obvious here and which you should know to understand the query:
partitioned outer join. It's the main part of the solution, which produces the whole period for each account. You can read about them here, for instance,
lag() ignore nulls - fills null balance values, taking them from the previous non-null row,
max(bal) keep (dense_rank first order by day) takes the balance value from the first date for the opening balance; last - from the last row for the closing balance.
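As a cross-check on the logic (not the SQL itself), the carry-forward plus opening/closing/average computation can be sketched in Python; `daily_stats` and its layout are illustrative helpers of mine, not part of the query:

```python
from datetime import date, timedelta

def daily_stats(entries, date_from, date_to):
    """Per account: (opening, closing, average) over the inclusive period,
    carrying the last known balance forward -- the same effect the
    partitioned outer join + LAG ... IGNORE NULLS achieves in SQL.
    Assumes each account has an entry on or before date_from."""
    n_days = (date_to - date_from).days + 1
    out = {}
    for acc, rows in entries.items():
        rows = sorted(rows)
        series = []
        for i in range(n_days):
            day = date_from + timedelta(days=i)
            # effective balance = last entry on or before this day
            known = [bal for d, bal in rows if d <= day]
            series.append(known[-1])
        out[acc] = (series[0], series[-1], round(sum(series) / n_days, 2))
    return out

balances = {
    'account1': [(date(2020, 1, 1), 5000), (date(2020, 1, 5), 6000),
                 (date(2020, 1, 8), 3500)],
    'account2': [(date(2020, 1, 1), 3000), (date(2020, 1, 15), 7500)],
}
print(daily_stats(balances, date(2020, 1, 1), date(2020, 1, 31)))
# {'account1': (5000, 3500, 3935.48), 'account2': (3000, 7500, 5467.74)}
```

The result matches the expected output table above, which is a useful sanity check before tuning the SQL.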

If you can afford to use the first_value and last_value analytic functions, then this, based on my understanding of your description, may help:
with data as (
select 'account1' as acc, 5000 as balance, to_date('2020-01-01', 'YYYY-MM-DD') as d from dual
union all select 'account1' as acc, 6000 as balance, to_date('2020-01-05', 'YYYY-MM-DD') as d from dual
union all select 'account2' as acc, 3000 as balance, to_date('2020-01-01', 'YYYY-MM-DD') as d from dual
union all select 'account1' as acc, 3500 as balance, to_date('2020-01-08', 'YYYY-MM-DD') as d from dual
union all select 'account2' as acc, 7500 as balance, to_date('2020-01-15', 'YYYY-MM-DD') as d from dual
)
select acc, avg(balance) over (partition by acc order by balance) as average,
first_value(balance) over(partition by acc order by balance asc rows unbounded preceding) as first,
last_value(balance) over(partition by acc order by balance asc rows unbounded preceding) as last
from data
where d between to_date('2020-01-01', 'YYYY-MM-DD') and to_date('2020-01-06', 'YYYY-MM-DD')
order by acc
ACC | AVERAGE | FIRST | LAST
:------- | ------: | ----: | ---:
account1 | 5000 | 5000 | 5000
account1 | 5500 | 5000 | 6000
account2 | 3000 | 3000 | 3000
db<>fiddle here

Related

SQL to Calculate Customer Tenure/Service Start Date using a cooldown period logic

The business scenario here is to calculate the customer's tenure with the service provider. Customer tenure is calculated based on the aspects below:
The oldest account start date is to be taken for the tenure calculation
One customer can have more than one active account at a given time
The cooldown period is 6 months, i.e., for a customer to stay a customer, s/he has 6 months to open a new account with the provider after closing an account, or should already have another account open before closing
If the customer opens an account after the 6 months, then the tenure calculation starts from the new account's open date
We can better understand this with an example (values in bold are the Customer-since/tenure-start dates):
Customer_ID | ACCT_SERIAL_NUM | ACCT_STRT_DT | ACCT_END_DT | COMMENTS
11111 | Account1 | 2000-01-20 | (null) | Customer already had an active account before closing the existing account
11111 | Account2 | 2002-12-10 | 2021-09-22 |
11111 | Account3 | 2021-10-22 | (null) |
Customer_ID | ACCT_SERIAL_NUM | ACCT_STRT_DT | ACCT_END_DT | COMMENTS
11112 | Account1 | 2000-01-20 | 2002-08-10 | Account closed but customer opened another account within cooling period of 6 months
11112 | Account2 | 2002-12-10 | 2021-09-22 |
11112 | Account3 | 2021-10-22 | (null) |
Customer_ID | ACCT_SERIAL_NUM | ACCT_STRT_DT | ACCT_END_DT | COMMENTS
11113 | Account1 | 2000-01-20 | 2002-05-10 | Account closed but customer didn't open another account within cooling period of 6 months
11113 | Account2 | 2002-12-10 | 2021-09-22 | Hence this is the new customer tenure start date
11113 | Account3 | 2021-10-22 | (null) |
The query I was trying (below) could possibly help me if the events occur sequentially (like in above 3 scenarios)
With dataset as (
SELECT Customer_ID, ACCT_SERIAL_NUM, ACCT_STRT_DT, ACCT_END_DT, COMMENTS,
CASE WHEN NVL(LEAD(ACCT_STRT_DT, 1) OVER(PARTITION BY Customer_ID ORDER BY ACCT_STRT_DT asc ) , SYSDATE-1 ) < ADD_MONTHS(nvl(acct_end_dt, SYSDATE), 6)
THEN 'Y' ELSE 'N' END as ACTV_FLG
FROM calc_customer_tenure ct
order by Customer_ID, ACCT_STRT_DT asc )
SELECT
Customer_ID, MIN(CASE WHEN FLAG = 'Y' THEN ACCT_STRT_DT ELSE NULL END) as CUST_TNUR
FROM (
SELECT ds.*,
CASE WHEN ACCT_END_DT is NULL
THEN 'Y' ELSE MIN(ACTV_FLG) OVER (PARTITION BY Customer_ID ORDER BY ACCT_STRT_DT asc ROWS between current row and unbounded following)
END as FLAG
from dataset ds )
GROUP BY Customer_ID ORDER BY Customer_ID ;
but it fails for the below scenario (which is a typical real-world scenario). Unfortunately the above code takes Account3's start date instead of Account1's:
Customer_ID | ACCT_SERIAL_NUM | ACCT_STRT_DT | ACCT_END_DT | COMMENTS
11114 | Account1 | 2000-01-20 | 2021-08-22 | Customer closed this account (1) after the subsequent account (2) was closed, but then opened an account (3) within 6 months of closing account (1), hence this is the tenure start date
11114 | Account2 | 2002-12-10 | 2003-12-10 |
11114 | Account3 | 2021-10-22 | (null) |
Thanks to Akina I was able to re-write the query to fit as required! Also thanks to P3Consulting for contributing! Really appreciate the support!
Re-posting the final SQL here for Oracle, which helped with my use case.
Below is the version using a recursive CTE:
WITH cte1 as (
SELECT customer_id, ACCT_STRT_DT, ACCT_END_DT,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ACCT_STRT_DT) rn
FROM calc_customer_tenure
), cte2 (customer_id, ACCT_STRT_DT, ACCT_END_DT, rn, tenure_start_date, tenure_end_date) AS (
SELECT customer_id, ACCT_STRT_DT, ACCT_END_DT, rn,
ACCT_STRT_DT tenure_start_date,
ACCT_END_DT tenure_end_date
FROM cte1
WHERE rn = 1
UNION ALL
SELECT cte1.customer_id, cte1.ACCT_STRT_DT, cte1.ACCT_END_DT, cte1.rn,
CASE WHEN cte1.ACCT_STRT_DT > ADD_MONTHS(cte2.tenure_end_date, 6)
THEN cte1.ACCT_STRT_DT
ELSE cte2.tenure_start_date
END,
CASE WHEN cte1.ACCT_STRT_DT > ADD_MONTHS(cte2.tenure_end_date, 6)
THEN cte1.ACCT_END_DT
ELSE GREATEST(cte1.ACCT_END_DT, cte2.tenure_end_date)
END
FROM cte1
JOIN cte2 ON cte1.customer_id = cte2.customer_id AND cte1.rn = cte2.rn + 1
)
SELECT customer_id, CASE WHEN ADD_MONTHS(NVL(tenure_end_date, SYSDATE), 6) < SYSDATE THEN NULL ELSE tenure_start_date END AS CUSTOMER_TENURE_START_DATE FROM (
SELECT
cte2.*, row_number() over (partition by customer_id order by rn desc) as rank_derv
FROM cte2 ) subset
WHERE rank_derv = 1
ORDER BY 1,2 ;
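To see what the recursion is doing, here is a plain-Python walk of the same rolling rule (illustrative only; `tenure_start` and the simplified `add_months` helper are mine, not Oracle built-ins):

```python
from datetime import date

def add_months(d, n):
    # simplified ADD_MONTHS; clamps the day to 28 so the result is always valid
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, min(d.day, 28))

def tenure_start(accounts, today=date(2022, 1, 1)):
    """accounts: [(start, end-or-None), ...]; mirrors cte2's row-by-row walk."""
    accounts = sorted(accounts, key=lambda a: a[0])
    t_start, t_end = accounts[0]
    for start, end in accounts[1:]:
        if t_end is not None and start > add_months(t_end, 6):
            t_start, t_end = start, end      # gap > 6 months: tenure restarts
        else:
            # like GREATEST() with NULL: an open (None) account keeps tenure open
            t_end = None if (end is None or t_end is None) else max(t_end, end)
    if t_end is not None and add_months(t_end, 6) < today:
        return None                          # lapsed: no current tenure
    return t_start

accts_11113 = [(date(2000, 1, 20), date(2002, 5, 10)),
               (date(2002, 12, 10), date(2021, 9, 22)),
               (date(2021, 10, 22), None)]
print(tenure_start(accts_11113))  # 2002-12-10
```

On the four sample customers this walk reproduces the expected tenure start dates, including the tricky 11114 case where the accounts do not close in start-date order.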
I am also posting one which may work in Oracle only (since it uses hierarchical query syntax):
WITH dataset_rnkd as (
SELECT CT.*, row_number() over (partition by customer_id order by ACCT_STRT_DT DESC) as row_rnk
from calc_customer_tenure CT
)
SELECT customer_id, MIN(ACCT_STRT_DT) as CUSTOMER_TENURE FROM (
SELECT * FROM dataset_rnkd
START WITH ADD_MONTHS(NVL(ACCT_END_DT, SYSDATE), 6) >= SYSDATE
CONNECT BY NOCYCLE PRIOR customer_id = customer_id AND PRIOR ACCT_STRT_DT <= ADD_MONTHS(NVL(ACCT_END_DT, SYSDATE), 6)
) DS
GROUP BY customer_id
ORDER BY customer_id ;
Try this; it gives the correct results for the dataset you supplied, but you should test more scenarios:
with data(customer_id,acct_serial_num,acct_strt_dt,acct_end_dt,comments) as
(
select '11111', 'Account1', to_date('2000-01-20','yyyy-mm-dd'), cast(NULL AS DATE), 'Customer already had an active account before closing the existing account' from dual union all
select '11111', 'Account2', to_date('2002-12-10','yyyy-mm-dd'), to_date('2021-09-22','yyyy-mm-dd'), '' from dual union all
select '11111', 'Account3', to_date('2021-10-22','yyyy-mm-dd'), cast(NULL AS DATE), '' from dual union all
select '11112', 'Account1', to_date('2000-01-20','yyyy-mm-dd'), to_date('2002-08-10','yyyy-mm-dd'), 'Account closed but customer opened another account within cooling period of 6months' from dual union all
select '11112', 'Account2', to_date('2002-12-10','yyyy-mm-dd'), to_date('2021-09-22','yyyy-mm-dd'), '' from dual union all
select '11112', 'Account3', to_date('2021-10-22','yyyy-mm-dd'), cast(NULL AS DATE), '' from dual union all
select '11113', 'Account1', to_date('2000-01-20','yyyy-mm-dd'), to_date('2002-05-10','yyyy-mm-dd'), 'Account closed but customer didn''t open another account within cooling period of 6months' from dual union all
select '11113', 'Account2', to_date('2002-12-10','yyyy-mm-dd'), to_date('2021-09-22','yyyy-mm-dd'), 'Hence this is the new customer tenure start date' from dual union all
select '11113', 'Account3', to_date('2021-10-22','yyyy-mm-dd'), cast(NULL AS DATE), '' from dual union all
select '11114', 'Account1', to_date('2000-01-20','yyyy-mm-dd'), to_date('2021-08-22','yyyy-mm-dd'), 'Customer has closed this account(1) after subsequent account(2) is closed. But then has opened an account(3) within 6 months of closing the account(1) hence this is the tenure start date' from dual union all
select '11114', 'Account2', to_date('2002-12-10','yyyy-mm-dd'), to_date('2003-12-10','yyyy-mm-dd'), '' from dual union all
select '11114', 'Account3', to_date('2021-10-22','yyyy-mm-dd'), cast(NULL AS DATE), '' from dual
),
datawc as (
select d.customer_id,acct_serial_num,acct_strt_dt, nvl(d.acct_end_dt, to_date('2999-12-31','yyyy-mm-dd')) as acct_end_dt,
nvl(add_months(acct_end_dt,6),to_date('2999-12-31','yyyy-mm-dd')) as cooldown_end_dt,
case when
acct_strt_dt < add_months(lag(acct_end_dt,1) over(partition by customer_id order by acct_strt_dt),6)
then 1 else 0 end as prev_within_cooldown,
case when
add_months(nvl(acct_end_dt, to_date('2999-12-31','yyyy-mm-dd')),6) > lead(acct_strt_dt,1) over(partition by customer_id order by acct_strt_dt)
then 1 else 0 end as next_within_cooldown,
d.comments
from data d
),
mergeddata as (
select customer_id, acct_serial_num, acct_strt_dt, acct_end_dt, prev_within_cooldown, next_within_cooldown, cooldown_end_dt, comments
from datawc d
match_recognize(
partition by customer_id
order by acct_strt_dt,acct_end_dt
measures first(acct_serial_num) as acct_serial_num, cooldown_end_dt as cooldown_end_dt,
first(prev_within_cooldown) as prev_within_cooldown, first(next_within_cooldown) as next_within_cooldown,
comments as comments, first(acct_strt_dt) as acct_strt_dt, max(acct_end_dt) as acct_end_dt
pattern( merged* str)
define merged as acct_end_dt >= next(acct_strt_dt)
)
)
select d.customer_id, min(acct_strt_dt) as tenure_dt
from mergeddata d
where next_within_cooldown = 1
group by d.customer_id
;
CUSTO TENURE_DT
----- -----------
11111 20-JAN-2000
11112 20-JAN-2000
11113 10-DEC-2002
11114 20-JAN-2000

Calculate standard deviation over time

I have information about sales per day. For example:
Date - Product - Amount
01-07-2020 - A - 10
01-03-2020 - A - 20
01-02-2020 - B - 10
Now I would like to know the average sales per day and the standard deviation for the last year. For the average I can just count the number of entries per item, then take 365 minus that count and pad with that many 0's, but I wonder what the best way is to calculate the standard deviation while incorporating the 0's for the days there are no sales.
Use a hierarchical (or recursive) query to generate the daily dates for the year, then use a PARTITION OUTER JOIN to join it to your product data; you can then find the average and standard deviation with the AVG and STDDEV aggregate functions and use COALESCE to fill in NULL values with zeroes:
WITH start_date ( dt ) AS (
SELECT DATE '2020-01-01' FROM DUAL
),
calendar ( dt ) AS (
SELECT dt + LEVEL - 1
FROM start_date
CONNECT BY dt + LEVEL - 1 < ADD_MONTHS( dt, 12 )
)
SELECT product,
AVG( COALESCE( amount, 0 ) ) AS average_sales_per_day,
STDDEV( COALESCE( amount, 0 ) ) AS stddev_sales_per_day
FROM calendar c
LEFT OUTER JOIN (
SELECT t.*
FROM test_data t
INNER JOIN start_date s
ON (
s.dt <= t."DATE"
AND t."DATE" < ADD_MONTHS( s.dt, 12 )
)
) t
PARTITION BY ( t.product )
ON ( c.dt = t."DATE" )
GROUP BY product
So, for your sample data:
CREATE TABLE test_data ( "DATE", Product, Amount ) AS
SELECT DATE '2020-07-01', 'A', 10 FROM DUAL UNION ALL
SELECT DATE '2020-03-01', 'A', 20 FROM DUAL UNION ALL
SELECT DATE '2020-02-01', 'B', 10 FROM DUAL;
This outputs:
PRODUCT | AVERAGE_SALES_PER_DAY | STDDEV_SALES_PER_DAY
:------ | ----------------------------------------: | ----------------------------------------:
A | .0819672131147540983606557377049180327869 | 1.16752986363678031669548047505759328696
B | .027322404371584699453551912568306010929 | .5227083734893166933219264686616717636897
db<>fiddle here
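If you want to sanity-check those numbers outside the database, the same zero-filled statistics are easy to reproduce in Python (note that Oracle's STDDEV is the sample standard deviation, i.e. it divides by n - 1, and 2020 has 366 days; `zero_filled_stats` is my illustrative helper):

```python
import math

def zero_filled_stats(amounts, n_days=366):
    # pad with zeros for the days without sales, like COALESCE(amount, 0)
    values = amounts + [0] * (n_days - len(amounts))
    mean = sum(values) / n_days
    # sample variance: divide by n - 1, matching Oracle's STDDEV
    var = sum((v - mean) ** 2 for v in values) / (n_days - 1)
    return round(mean, 4), round(math.sqrt(var), 4)

print(zero_filled_stats([10, 20]))  # product A -> (0.082, 1.1675)
print(zero_filled_stats([10]))      # product B -> (0.0273, 0.5227)
```

These agree with the query output above to four decimal places.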

SQL query needed - Counting 365 days backwards

I have searched the forum many times but couldn't find a solution for my situation. I am working with an Oracle database.
I have a table with all Order Numbers and Customer Numbers by Day. It looks like this:
Day | Customer Nbr | Order Nbr
2018-01-05 | 25687459 | 256
2018-01-09 | 36478592 | 398
2018-03-07 | 25687459 | 1547
and so on....
Now I need a SQL Query which gives me a table by day and Customer Nbr and counts the number of unique Order Numbers within the last 365 days starting from column 1.
For the example above the resulting table should look like:
Day | Customer Nbr | Order Cnt
2019-01-01 | 25687459 | 2
2019-01-02 | 25687459 | 2
...
2019-03-01 | 25687459 | 1
One method is to generate values for all days of interest for each customer and then use a correlated subquery:
with dates as (
select date '2019-01-01' + rownum as dte from dual
connect by date '2019-01-01' + rownum < sysdate
)
select d.dte, t.customer_nbr,
(select count(*)
from t t2
where t2.customer_nbr = t.customer_nbr and
t2.day <= d.dte and
t2.day > d.dte - 365
) as order_cnt
from dates d cross join
(select distinct customer_nbr from t) t;
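The window that the correlated subquery counts over can be stated in one line of Python (illustrative only; `rolling_order_count` is not part of the query):

```python
from datetime import date, timedelta

def rolling_order_count(order_days, as_of):
    # orders in the half-open window (as_of - 365 days, as_of]
    return sum(1 for d in order_days if as_of - timedelta(days=365) < d <= as_of)

orders = [date(2018, 1, 5), date(2018, 3, 7)]  # customer 25687459
print(rolling_order_count(orders, date(2019, 1, 1)))  # 2
print(rolling_order_count(orders, date(2019, 3, 1)))  # 1
```

This matches the expected result table in the question: by 2019-03-01 the 2018-01-05 order has aged out of the window.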
Edit:
I've just seen you clarify the question, which I've interpreted to mean:
For every day in the last year, show how many orders there were for each customer between that date, and 1 year previously. Working on an answer now...
Updated Answer:
For each customer, we count the number of records between the order day, and 365 days before it...
WITH yourTable AS
(
SELECT SYSDATE - 1 Day, 'Alex' CustomerNbr FROM DUAL
UNION ALL
SELECT SYSDATE - 2, 'Alex' FROM DUAL
UNION ALL
SELECT SYSDATE - 366, 'Alex' FROM DUAL
UNION ALL
SELECT SYSDATE - 400, 'Alex' FROM DUAL
UNION ALL
SELECT SYSDATE - 500, 'Alex' FROM DUAL
UNION ALL
SELECT SYSDATE - 1, 'Joe' FROM DUAL
UNION ALL
SELECT SYSDATE - 300, 'Chris' FROM DUAL
UNION ALL
SELECT SYSDATE - 1, 'Chris' FROM DUAL
)
SELECT Day, CustomerNbr, OrdersLast365Days
FROM yourTable t
OUTER APPLY
(
SELECT COUNT(1) OrdersLast365Days
FROM yourTable t2
WHERE t.CustomerNbr = t2.CustomerNbr
AND TRUNC(t2.Day) >= TRUNC(t.Day) - 364
AND TRUNC(t2.Day) <= TRUNC(t.Day)
)
ORDER BY t.Day DESC, t.CustomerNbr;
If you want to report on just the days you have orders for, then a simple WHERE clause should be enough:
SELECT Day, CustomerNbr, COUNT(1) OrderCount
FROM <yourTable>
WHERE TRUNC(DAY) >= TRUNC(SYSDATE -364)
GROUP BY Day, CustomerNbr
ORDER BY Day Desc;
If you want to report on every day, you'll need to generate the days first. This can be done with a hierarchical (CONNECT BY) query, which you then join to your table:
WITH last365Days AS
(
SELECT TRUNC (SYSDATE - ROWNUM + 1) Day
FROM DUAL CONNECT BY ROWNUM < 365
)
SELECT d.Day, COALESCE(t.CustomerNbr, 'None') CustomerNbr, SUM(CASE WHEN t.CustomerNbr IS NULL THEN 0 ELSE 1 END) OrderCount
FROM last365Days d
LEFT OUTER JOIN <yourTable> t
ON d.Day = TRUNC(t.Day)
GROUP BY d.Day, t.CustomerNbr
ORDER BY d.Day Desc;
I would probably have done it with an analytic function. In your windowing clause, you can specify a number of rows before, or a range; in this case I will use a range.
This will give you, for each customer and each day, the number of orders during the rolling year before the date displayed:
WITH DATES AS (
SELECT * FROM
(SELECT TRUNC(SYSDATE)-(LEVEL-1) AS DAY FROM DUAL CONNECT BY TRUNC(SYSDATE)-(LEVEL-1) >= ( SELECT MIN(TRUNC(DAY)) FROM MY_TABLE ))
CROSS JOIN
(SELECT DISTINCT CUST_ID FROM MY_TABLE))
SELECT DISTINCT
DATES.DAY,
DATES.CUST_ID,
COUNT(ORDER_ID) OVER (PARTITION BY DATES.CUST_ID ORDER BY DATES.DAY RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND INTERVAL '1' SECOND PRECEDING)
FROM
DATES
LEFT JOIN
MY_TABLE
ON DATES.DAY=TRUNC(MY_TABLE.DAY) AND DATES.CUST_ID=MY_TABLE.CUST_ID
ORDER BY DATES.CUST_ID,DATES.DAY;

Select min/max dates for periods that don't intersect

Example: I have a table with 4 columns; the date format is dd.MM.yy.
id ban start end
1 1 01.01.15 31.12.18
1 1 02.02.15 31.12.18
1 1 05.04.15 31.12.17
In this case, dates from rows 2 and 3 are included in the dates from row 1.
1 1 02.04.19 31.12.20
1 1 05.05.19 31.12.20
In this case, dates from row 5 are included in the dates from row 4. Basically we have 2 periods that don't intersect:
01.01.15 31.12.18
and
02.04.19 31.12.20
Situations where a date starts in one period and ends in another are impossible. The end result should look like this:
1 1 01.01.15 31.12.18
1 1 02.04.19 31.12.20
I tried using analytic functions (LAG):
select id
, ban
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(start) over (partition by id, ban order by start, end asc), start)
else start
end as start
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(end) over (partition by id, ban order by start, end asc), end)
else end
end as end
from table
Here I order the rows and, if the current dates are included in the previous ones, I replace them. It works if I have just 2 rows. For example this
1 1 08.09.15 31.12.99
1 1 31.12.15 31.12.99
turns into this
1 1 08.09.15 31.12.99
1 1 08.09.15 31.12.99
which I can then group by all fields and get what I want, but if there are more
1 2 13.11.15 31.12.99
1 2 31.12.15 31.12.99
1 2 16.06.15 31.12.99
I get
1 2 16.06.15 31.12.99
1 2 16.06.15 31.12.99
1 2 13.11.15 31.12.99
I understand why this happens, but how do I work around it? Running the query multiple times is not an option.
This query looks promising:
-- test data
with t(id, ban, dtstart, dtend) as (
select 1, 1, date '2015-01-01', date '2015-03-31' from dual union all
select 1, 1, date '2015-02-02', date '2015-03-31' from dual union all
select 1, 1, date '2015-03-15', date '2015-03-31' from dual union all
select 1, 1, date '2015-08-05', date '2015-12-31' from dual union all
select 1, 2, date '2015-01-01', date '2016-12-31' from dual union all
select 2, 1, date '2016-01-01', date '2017-12-31' from dual),
-- end of test data
step1 as (select id, ban, dt, to_number(inout) direction
from t unpivot (dt for inout in (dtstart as '1', dtend as '-1'))),
step2 as (select distinct id, ban, dt, direction,
sum(direction) over (partition by id, ban order by dt) sm
from step1),
step3 as (select id, ban, direction, dt dt1,
lead(dt) over (partition by id, ban order by dt) dt2
from step2
where (direction = 1 and sm = 1) or (direction = -1 and sm = 0) )
select id, ban, dt1, dt2
from step3 where direction = 1 order by id, ban, dt1
step1 - unpivot dates and assign 1 for a start date, -1 for an end date (column direction)
step2 - add cumulative sum for direction
step3 - filter only interesting dates, pivot second date using lead()
You can shorten this syntax, I divided it to steps to show what's going on.
Result:
ID BAN DT1 DT2
------ ---------- ----------- -----------
1 1 2015-01-01 2015-03-31
1 1 2015-08-05 2015-12-31
1 2 2015-01-01 2016-12-31
2 1 2016-01-01 2017-12-31
I assumed that for different (ID, BAN) we have to make calculations separately. If not - change partitioning and ordering in sum() and lead().
Pivot and unpivot work in Oracle 11 and later; for earlier versions you need case when.
BTW - START is a reserved word in Oracle, so in my example I changed the column names slightly.
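The +1/-1 running sum above is a classic sweep-line technique; here is a minimal Python sketch of the same idea (the `merge_periods` helper is mine, for illustration):

```python
from datetime import date

def merge_periods(periods):
    """+1 at each start, -1 at each end; when the running depth returns
    to 0, a merged period closes -- same idea as steps 1-3 in the query."""
    events = sorted([(s, 1) for s, e in periods] + [(e, -1) for s, e in periods],
                    key=lambda ev: (ev[0], -ev[1]))  # starts before ends on ties
    merged, depth, cur_start = [], 0, None
    for dt, step in events:
        if depth == 0 and step == 1:
            cur_start = dt
        depth += step
        if depth == 0:
            merged.append((cur_start, dt))
    return merged

rows = [(date(2015, 1, 1), date(2018, 12, 31)),
        (date(2015, 2, 2), date(2018, 12, 31)),
        (date(2015, 4, 5), date(2017, 12, 31)),
        (date(2019, 4, 2), date(2020, 12, 31)),
        (date(2019, 5, 5), date(2020, 12, 31))]
print(merge_periods(rows))
# two non-intersecting periods: 2015-01-01..2018-12-31 and 2019-04-02..2020-12-31
```

Running it on the question's sample rows yields exactly the two expected periods.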
I like to do this by identifying the period starts, then doing a cumulative sum to define the group, and a final aggregation:
select id, ban, min(start), max(end)
from (select t.*, sum(start_flag) over (partition by id, ban order by start) as grp
from (select t.*,
(case when exists (select 1
from t t2
where t2.id = t.id and t2.ban = t.ban and
t.start <= t2.end and t.end >= t2.start and
t.start <> t2.start and t.end <> t2.end
)
then 0 else 1
end) as start_flag
from t
) t
) t
group by id, ban, grp;

Number of unique dates

There is table:
CREATE TABLE my_table
(gr_id NUMBER,
start_date DATE,
end_date DATE);
All dates always have a zero time portion. I need to know the fastest way to compute the number of unique dates inside each gr_id.
For example, if there are rows (dd.mm.rrrr):
1 | 01.01.2000 | 07.01.2000
1 | 01.01.2000 | 07.01.2000
2 | 01.01.2000 | 03.01.2000
2 | 05.01.2000 | 07.01.2000
3 | 01.01.2000 | 04.01.2000
3 | 03.01.2000 | 05.01.2000
then the right answer will be
1 | 7
2 | 6
3 | 5
Right now I use an additional table
CREATE TABLE mfr_date_list
(MFR_DATE DATE);
with every date between 01.01.2000 and 31.12.2020, and a query like this:
SELECT COUNT(DISTINCT mfr_date_list.mfr_date) cnt,
dt.gr_id
FROM dwh_mfr.mfr_date_list,
(SELECT gr_id,
start_date AS sd,
end_date AS ed
FROM my_table
) dt
WHERE mfr_date_list.mfr_date BETWEEN dt.sd AND dt.ed
AND dt.ed IS NOT NULL
GROUP BY dt.gr_id
This query returns the correct result set, but I think it's not the fastest way. I think there is some way to build the query without the mfr_date_list table at all.
Oracle 11.2 64-bit.
I would expect what you're doing to be the fastest way (as always, test). Your query can be simplified, though this only aids understanding and not necessarily speed:
select t.gr_id, count(distinct dl.mfr_date) as cnt
from my_table t
join mfr_date_list dl
on dl.mfr_date between t.start_date and t.end_date
where t.end_date is not null
group by t.gr_id
Whatever you do you have to generate the data between the two dates somehow as you need to remove the overlap. One way would be to use CAST(MULTISET()), as Lalit Kumar explains:
select gr_id, count(distinct end_date - column_value + 1)
from my_table m
cross join table(cast(multiset(select level
from dual
connect by level <= m.end_date - m.start_date + 1
) as sys.odcinumberlist))
group by gr_id;
GR_ID COUNT(DISTINCTEND_DATE-COLUMN_VALUE+1)
---------- --------------------------------------
1 7
2 6
3 5
This is very Oracle specific but should perform substantially better than most other row-generators as you're only accessing the table once and you're generating the minimal number of rows required due to the condition linking MY_TABLE and your generated rows.
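Whichever row generator you pick, the quantity being computed is just the size of the union of the covered date sets; a throwaway Python check (illustrative only; `unique_days` is not part of either query):

```python
from datetime import date, timedelta

def unique_days(ranges):
    # union of all dates covered by the (start, end) ranges, inclusive
    covered = set()
    for start, end in ranges:
        covered.update(start + timedelta(days=i)
                       for i in range((end - start).days + 1))
    return len(covered)

print(unique_days([(date(2000, 1, 1), date(2000, 1, 4)),
                   (date(2000, 1, 3), date(2000, 1, 5))]))  # gr_id 3 -> 5
```

The three sample groups give 7, 6 and 5, matching the expected answer.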
What you really need to do is combine the ranges and then count the lengths. This can be quite challenging because of duplicate dates. The following is one way to approach this.
First, enumerate the dates and determine whether the date is "in" or "out". When the cumulative sum is 0 then it is "out":
select t.gr_id, dt,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from (select t.gr_id, t.start_date as dt, 1 as inc
from my_table t
union all
select t.gr_id, t.end_date + 1, -1 as inc
from my_table t
) t
Then, use lead() to determine how long the period is:
with inc as (
select t.gr_id, dt,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from (select t.gr_id, t.start_date as dt, 1 as inc
from my_table t
union all
select t.gr_id, t.end_date + 1, -1 as inc
from my_table t
) t
)
select t.gr_id,
sum(nextdt - dt) as daysInUse
from (select inc.*, lead(dt) over (partition by gr_id order by dt) as nextdt
from inc
) t
group by t.gr_id;
This is close to what you want. The following are two challenges: (1) putting in the limits and (2) handling ties. The following should work (although there might be off-by-one and boundary issues):
with inc as (
select t.gr_id, dt, priority,
sum(inc) over (partition by t.gr_id order by dt) as cume_inc
from ((select t.gr_id, t.start_date as dt, count(*) as inc, 1 as priority
from my_table t
group by t.gr_id, t.start_date
)
union all
(select t.gr_id, t.end_date + 1, - count(*) as inc, -1
from my_table t
group by t.gr_id, t.end_date
)
) t
)
select t.gr_id,
sum(least(nextdt, date '2020-12-31') - greatest(dt, date '2000-01-01')) as daysInUse
from (select inc.*, lead(dt) over (partition by gr_id order by dt, priority) as nextdt
from inc
) t
group by t.gr_id;