Add missing months to a result with values from the previous month - SQL

I have a result set with the month as the first column. Some months are missing from the result. For each missing month, up to the last month, I need to add a copy of the previous month's record.
Current data:
Desired Output:
I have a query, but instead of filling in only the missing months, it takes every row into account and populates months for each of them.
select
to_char(generate_series(date_trunc('MONTH',to_date(period,'YYYYMMDD')+interval '1' month),
date_trunc('MONTH',now()+interval '1' day),
interval '1' month) - interval '1 day','YYYYMMDD') as period,
name,age,salary,rating
from( values ('20201205','Alex',35,100,'A+'),
('20210110','Alex',35,110,'A'),
('20210512','Alex',35,999,'A+'),
('20210625','Jhon',20,175,'B-'),
('20210922','Jhon',20,200,'B+')) v (period,name,age,salary,rating) order by 2,3,4,5,1;
Output of this query:
Can someone help me get the desired output?
Regards!!

You can achieve this with a recursive cte like this:
with RECURSIVE ctetest as (SELECT * FROM (values ('2020-12-31'::date,'Alex',35,100,'A+'),
('2021-01-31'::date,'Alex',35,110,'A'),
('2021-05-31'::date,'Alex',35,999,'A+'),
('2021-06-30'::date,'Jhon',20,175,'B-'),
('2021-09-30'::date,'Jhon',20,200,'B+')) v (mth, emp, age, salary, rating)),
cte AS (
SELECT MIN(mth) AS mth, emp, age, salary, rating
FROM ctetest
GROUP BY emp, age, salary, rating
UNION
SELECT COALESCE(n.mth, (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date), COALESCE(n.emp, l.emp),
COALESCE(n.age, l.age), COALESCE(n.salary, l.salary), COALESCE(n.rating, l.rating)
FROM cte l
LEFT OUTER JOIN ctetest n ON n.mth = (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date
AND n.emp = l.emp
WHERE (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date <= (SELECT MAX(mth) FROM ctetest)
)
SELECT * FROM cte order by 2, 1;
Note that although ctetest is not itself recursive (it is only used to supply the test data), if any CTE in a WITH clause is recursive, the RECURSIVE keyword must follow WITH.
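The recursive member steps from one month end to the next with `mth + interval '1 day' + interval '1 month' - interval '1 day'`. Going through the first of the following month matters because adding one month directly to a month end can drift (in PostgreSQL, 2021-02-28 + 1 month is 2021-03-28, not 2021-03-31). A minimal Python sketch of the same arithmetic, assuming the input is always a month-end date; the helper names are hypothetical:

```python
from datetime import date, timedelta


def add_one_month(first_of_month: date) -> date:
    """Return the first day of the following month (input must be the 1st)."""
    if first_of_month.month == 12:
        return date(first_of_month.year + 1, 1, 1)
    return date(first_of_month.year, first_of_month.month + 1, 1)


def next_month_end(month_end: date) -> date:
    """Mirror mth + 1 day + 1 month - 1 day for a month-end date."""
    first_of_next = month_end + timedelta(days=1)   # + interval '1 day'
    first_after = add_one_month(first_of_next)      # + interval '1 month'
    return first_after - timedelta(days=1)          # - interval '1 day'


# Walk Alex's first gap: from 2021-01-31 up to 2021-05-31.
d = date(2021, 1, 31)
chain = []
while d < date(2021, 5, 31):
    d = next_month_end(d)
    chain.append(d)
print(chain)
```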

You can use CROSS JOIN LATERAL with generate_series to fill the gaps and then UNION ALL the result with the original data.
WITH the_table (period, name, age, salary, rating) as ( values
('2020-12-01'::date, 'Alex', 35, 100, 'A+'),
('2021-01-01'::date, 'Alex', 35, 110, 'A'),
('2021-05-01'::date, 'Alex', 35, 999, 'A+'),
('2021-06-01'::date, 'Jhon', 20, 100, 'B-'),
('2021-09-01'::date, 'Jhon', 20, 200, 'B+')
),
t as (
select *, coalesce(
lead(period) over (partition by name order by period) - interval 'P1M',
max(period) over ()
) last_period
from the_table
)
SELECT lat::date period, name, age, salary, rating
from t
cross join lateral generate_series
(period + interval 'P1M', last_period, interval 'P1M') lat
UNION ALL
SELECT * from the_table
ORDER BY name, period;
Please note that using an integer data type for a date column is suboptimal. It is better to review your data design and use the date data type instead; you can then present the value as an integer if necessary.
period     name age salary rating
---------- ---- --- ------ ------
2020-12-01 Alex  35    100 A+
2021-01-01 Alex  35    110 A
2021-02-01 Alex  35    110 A
2021-03-01 Alex  35    110 A
2021-04-01 Alex  35    110 A
2021-05-01 Alex  35    999 A+
2021-06-01 Alex  35    999 A+
2021-07-01 Alex  35    999 A+
2021-08-01 Alex  35    999 A+
2021-09-01 Alex  35    999 A+
2021-06-01 Jhon  20    100 B-
2021-07-01 Jhon  20    100 B-
2021-08-01 Jhon  20    100 B-
2021-09-01 Jhon  20    200 B+
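The lateral approach repeats each row for every month up to the month before that name's next row (`lead(period) - interval 'P1M'`), and repeats the last row per name up to the overall latest period (the `max(period) over ()` fallback). A minimal Python sketch of that logic, useful for checking the expected rows; the sample data is copied from `the_table` above and the function names are hypothetical:

```python
from datetime import date
from itertools import groupby


def next_month(d: date) -> date:
    # First day of the following month (inputs are always the 1st here).
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def fill_months(rows):
    """rows: (period, name, age, salary, rating), period on the 1st of a month."""
    last_overall = max(r[0] for r in rows)                    # max(period) over ()
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for _, group in groupby(ordered, key=lambda r: r[1]):     # partition by name
        group = list(group)
        for i, (period, *rest) in enumerate(group):
            # Exclusive stop: the next row's period, or one month past the global max.
            stop = group[i + 1][0] if i + 1 < len(group) else next_month(last_overall)
            month = period
            while month < stop:                               # original row + gap rows
                out.append((month, *rest))
                month = next_month(month)
    return out


rows = [
    (date(2020, 12, 1), "Alex", 35, 100, "A+"),
    (date(2021, 1, 1), "Alex", 35, 110, "A"),
    (date(2021, 5, 1), "Alex", 35, 999, "A+"),
    (date(2021, 6, 1), "Jhon", 20, 100, "B-"),
    (date(2021, 9, 1), "Jhon", 20, 200, "B+"),
]
filled = fill_months(rows)
for row in filled:
    print(row)
```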

Explode time duration defined by start and end timestamp by the hour

I have a table with work shifts (1 row per shift) that include date, start and end time.
Main goal: I want to aggregate the number of working hours per hour per store.
This is what my shift table looks like:
employee_id store start_timestamp  end_timestamp
1           1     2022-01-01T07:00 2022-01-01T11:30
2           1     2022-01-01T08:30 2022-01-01T12:30
...         ...   ...              ...
I want to "explode" the information into a table like this:
hour  employee_id store date       scheduled_work (h)
07:00 1           1     2022-01-01 1
08:00 1           1     2022-01-01 1
09:00 1           1     2022-01-01 1
10:00 1           1     2022-01-01 1
11:00 1           1     2022-01-01 0.5
08:00 2           1     2022-01-01 0.5
09:00 2           1     2022-01-01 1
10:00 2           1     2022-01-01 1
11:00 2           1     2022-01-01 1
12:00 2           1     2022-01-01 0.5
...   ...         ...   ...        ...
I tried a method using a cross join; it consumed a lot of memory and looks like this:
with test as (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
)
, cte as (
select ts
, test.*
, safe_divide(
timestamp_diff(
least(date_add(ts, interval 1 hour), end_timestamp)
, greatest(ts, start_timestamp)
, millisecond
)
, 3600000
) as scheduled_work
from test
cross join unnest(generate_timestamp_array(timestamp('2022-01-01 07:00:00'),
timestamp('2022-01-01 12:30:00'), interval 1 hour)) as ts
order by employee_id, ts)
select * from cte
where scheduled_work >= 0;
It works, but I know it won't scale well as the number of shifts grows. Does anyone have a more efficient solution?
I'm using BigQuery.
You might want to remove the ORDER BY inside the CTE; it will hurt query performance.
And another similar approach:
WITH test AS (
select 1 as employee_id, 1 as store_id, timestamp('2022-01-01 07:00:00') as start_timestamp, timestamp('2022-01-01 11:30:00') as end_timestamp union all
select 2 as employee_id, 1 as store_id, timestamp('2022-01-01 08:30:00') as start_timestamp, timestamp('2022-01-01 12:30:00') as end_timestamp
),
explodes AS (
SELECT employee_id, store_id, EXTRACT(DATE FROM h) date, TIME_TRUNC(EXTRACT(TIME FROM h), HOUR) hour, 1 AS scheduled_work
FROM test,
UNNEST (GENERATE_TIMESTAMP_ARRAY(
TIMESTAMP_TRUNC(start_timestamp + INTERVAL 1 HOUR, HOUR),
TIMESTAMP_TRUNC(end_timestamp - INTERVAL 1 HOUR, HOUR), INTERVAL 1 HOUR
)) h
UNION ALL
SELECT employee_id, store_id, EXTRACT(DATE FROM h), TIME_TRUNC(EXTRACT(TIME FROM h), HOUR),
CASE offset
WHEN 0 THEN 1 - (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
WHEN 1 THEN (EXTRACT(MINUTE FROM h) * 60 + EXTRACT(SECOND FROM h)) / 3600
END
FROM test, UNNEST([start_timestamp, end_timestamp]) h WITH OFFSET
)
SELECT * FROM explodes WHERE scheduled_work > 0;
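Both answers avoid cross joining every shift against one fixed hour range: they generate, per shift, only the hours the shift touches, and give the first and last hour a partial weight. A minimal Python sketch of that per-shift expansion, useful for checking expected values (it is not BigQuery code; the function name is hypothetical):

```python
from datetime import datetime, timedelta


def explode_shift(start: datetime, end: datetime):
    """Yield (hour_start, hours_worked) for every clock hour the shift overlaps."""
    hour = start.replace(minute=0, second=0, microsecond=0)   # truncate to the hour
    while hour < end:
        next_hour = hour + timedelta(hours=1)
        # Overlap of [start, end) with [hour, next_hour), in hours
        overlap = min(end, next_hour) - max(start, hour)
        yield hour, overlap.total_seconds() / 3600
        hour = next_hour


shifts = [
    (1, 1, datetime(2022, 1, 1, 7, 0), datetime(2022, 1, 1, 11, 30)),
    (2, 1, datetime(2022, 1, 1, 8, 30), datetime(2022, 1, 1, 12, 30)),
]
rows = [
    (hour.strftime("%H:%M"), emp, store, hour.date().isoformat(), worked)
    for emp, store, start, end in shifts
    for hour, worked in explode_shift(start, end)
]
for row in rows:
    print(row)
```

The work done per shift is proportional to the shift's own length, not to the number of hours in the whole range.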
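Consider the approach below. It assumes your table has separate date, start_time and end_time columns, with the times stored as 'HH:MM' strings: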
with temp as (
select * replace(
parse_time('%H:%M', start_time) as start_time,
parse_time('%H:%M', end_time) as end_time
)
from your_table
)
select * except(start_time, end_time),
case
when hour = time_trunc(start_time, hour) then (60 - time_diff(start_time, hour, minute)) / 60
when hour = time_trunc(end_time, hour) then time_diff(end_time, hour, minute) / 60
else 1
end as scheduled_work
from (
select time_add(time_trunc(start_time, hour), interval delta hour) as hour,
employee_id, store, date, start_time, end_time
from temp, unnest(generate_array(0,time_diff(end_time, start_time, hour))) delta
)
order by employee_id, hour
If applied to the sample data in your question, the output is:

SQL COUNT number of patients each month

I have a table with:
PATIENT_ID START_DATE END_DATE   Ward
1          19/01/2022 19/02/2022 A
2          20/01/2022 19/03/2022 A
I want to create a summary table showing, for each month, how many patients were active in that ward and the total number of patient days in that month. Is this possible in SQL?
I think I might need an external DIM_DATE table containing every month from the earliest START_DATE across all patients up to now, but that doesn't sound very efficient.
Note that I have shown only one ward, but there are other wards too.
Expected result:
Month      Ward COUNT_PATIENTS TOTAL_NUMBER_DAYS
31/01/2022 A    2              33
28/02/2022 A    2              47
31/03/2022 A    1              19
with data(PATIENT_ID, START_DATE, END_DATE, Ward) as (
select column1, to_date(column2, 'dd/mm/yyyy'), to_date(column3, 'dd/mm/yyyy'), column4
from values
(1, '19/01/2022','19/02/2022','A'),
(2, '20/01/2022','19/03/2022','A')
), ranges as (
select
date_trunc('month', min(start_date)) as min_start,
dateadd('day', -1, dateadd('month', 1, date_trunc('month', max(end_date)))) as max_end
from data
), gen as (
select
row_number() over(order by null)-1 as rn
from table(generator(ROWCOUNT => 1000))
), all_months as (
select
dateadd('month', g.rn, date_trunc(month, r.min_start)) as month_start,
dateadd('day', -1, dateadd('month', 1, month_start)) as month_end
from ranges as r
cross join gen as g
qualify month_start <= r.max_end
)
select
a.month_end as month,
d.ward,
count(distinct patient_id) as count_patients,
sum(datediff(days, greatest(d.start_date, a.month_start), least(d.END_DATE, a.month_end))+1) as total_numbers_days
from all_months as a
left join data as d
on a.month_start between date_trunc('month', d.START_DATE) and date_trunc('month', d.END_DATE)
group by 1,2
order by 1,2
gives:
MONTH      WARD COUNT_PATIENTS TOTAL_NUMBERS_DAYS
2022-01-31 A    2              25
2022-02-28 A    2              47
2022-03-31 A    1              19
I think your 33 is wrong, as the partials are:
MONTH      WARD PATIENT_ID _START     _END       DAYS
2022-01-31 A    1          2022-01-19 2022-01-31 13
2022-01-31 A    2          2022-01-20 2022-01-31 12
2022-02-28 A    1          2022-02-01 2022-02-19 19
2022-02-28 A    2          2022-02-01 2022-02-28 28
2022-03-31 A    2          2022-03-01 2022-03-19 19
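The partial-day arithmetic above can be checked with a small Python sketch: each stay is clipped to every calendar month it overlaps, counting both the first and last day (the `+1` in the `datediff`). The data is the two sample patients; the function name is hypothetical:

```python
import calendar
from collections import defaultdict
from datetime import date


def month_summary(stays):
    """stays: (patient_id, start, end, ward). Returns {(month_end, ward): (patients, days)}."""
    patients = defaultdict(set)
    days = defaultdict(int)
    for pid, start, end, ward in stays:
        year, month = start.year, start.month
        while (year, month) <= (end.year, end.month):
            month_start = date(year, month, 1)
            month_end = date(year, month, calendar.monthrange(year, month)[1])
            # greatest(start, month_start) .. least(end, month_end), inclusive
            overlap = (min(end, month_end) - max(start, month_start)).days + 1
            patients[(month_end, ward)].add(pid)
            days[(month_end, ward)] += overlap
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return {key: (len(patients[key]), days[key]) for key in sorted(days)}


stays = [
    (1, date(2022, 1, 19), date(2022, 2, 19), "A"),
    (2, date(2022, 1, 20), date(2022, 3, 19), "A"),
]
summary = month_summary(stays)
for key, value in summary.items():
    print(key, value)
```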

SQL Query calculating two additional columns

I have a table that is populated daily with the database size. I need to modify the query so that it also calculates the daily growth and the weekly growth.
select * from sys.dbsize
where SNAP_TIME > sysdate -3
order by SNAP_TIME
Current Output
I would like to add two additional columns which would be
Daily Growth (DB_SIZE sysdate - DB_SIZE (sysdate -1))
Weekly Growth (DB_SIZE sysdate - DB_SIZE (sysdate -7))
Need some help constructing the SQL for those two additional columns. Any help will be greatly appreciated.
Thanks,
One option is to use the LAG analytic function to calculate the daily growth and a correlated subquery (in the SELECT list) for the weekly growth.
For example:
with dbsize (snap_time, db_size) as
(select sysdate - 8, 100 from dual union all
select sysdate - 7, 110 from dual union all
select sysdate - 6, 105 from dual union all
select sysdate - 5, 120 from dual union all
select sysdate - 4, 130 from dual union all
select sysdate - 3, 130 from dual union all
select sysdate - 2, 142 from dual union all
select sysdate - 1, 144 from dual union all
select sysdate - 0, 150 from dual
)
select
a.snap_time,
a.db_size,
a.db_size - lag(a.db_size) over (order by a.snap_time) daily_growth,
--
db_size - (select db_size from dbsize b
where trunc(b.snap_time) = trunc(a.snap_time) - 7
) weekly_growth
from dbsize a
order by a.snap_time;
SNAP_TIME DB_SIZE DAILY_GROWTH WEEKLY_GROWTH
------------------- ---------- ------------ -------------
24.08.2020 21:52:20 100
25.08.2020 21:52:20 110 10
26.08.2020 21:52:20 105 -5
27.08.2020 21:52:20 120 15
28.08.2020 21:52:20 130 10
29.08.2020 21:52:20 130 0
30.08.2020 21:52:20 142 12
31.08.2020 21:52:20 144 2 44
01.09.2020 21:52:20 150 6 40
9 rows selected.
I would recommend lag() for both columns:
select s.*,
(s.db_size - s.dbsize_1) as daily_growth,
(s.db_size - s.dbsize_7) as weekly_growth
from (select d.*,
lag(d.db_size) over (order by d.snap_time) as dbsize_1,
lag(d.db_size, 7) over (order by d.snap_time) as dbsize_7
from sys.dbsize d
) s
where s.snap_time > sysdate - 3
order by s.snap_time;
If you don't have a snapshot each day, you can handle this with a window frame:
select s.*,
(s.db_size - s.dbsize_1) as daily_growth,
(s.db_size - s.dbsize_7) as weekly_growth
from (select d.*,
max(d.db_size) over (order by trunc(d.snap_time) range between interval '1' day preceding and interval '1' second preceding) as dbsize_1,
max(d.db_size) over (order by trunc(d.snap_time) range between interval '7' day preceding and interval '6 1' day to hour preceding) as dbsize_7
from sys.dbsize d
) s
where s.snap_time > sysdate - 3
order by s.snap_time;
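To see why these queries compute the window functions in a subquery and apply the `sysdate - 3` filter outside it, here is a small sketch using Python's built-in `sqlite3` module (SQLite 3.25 or later is needed for window functions). It reuses the nine sample sizes and dates from the first answer's output; the in-memory table stands in for `sys.dbsize`, and a fixed cutoff date replaces `sysdate - 3`:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbsize (snap_time TEXT, db_size INTEGER)")
sizes = [100, 110, 105, 120, 130, 130, 142, 144, 150]
start = date(2020, 8, 24)
conn.executemany(
    "INSERT INTO dbsize VALUES (?, ?)",
    [((start + timedelta(days=i)).isoformat(), size) for i, size in enumerate(sizes)],
)

# Window functions first, filter afterwards (as in the answers above).
outer_filter = conn.execute("""
    SELECT snap_time, db_size,
           db_size - dbsize_1 AS daily_growth,
           db_size - dbsize_7 AS weekly_growth
    FROM (SELECT d.*,
                 lag(db_size) OVER (ORDER BY snap_time) AS dbsize_1,
                 lag(db_size, 7) OVER (ORDER BY snap_time) AS dbsize_7
          FROM dbsize d)
    WHERE snap_time > '2020-08-29'
    ORDER BY snap_time
""").fetchall()

# Filtering first: lag() can only see the rows that survived the WHERE clause.
inner_filter = conn.execute("""
    SELECT snap_time, db_size,
           db_size - lag(db_size) OVER (ORDER BY snap_time) AS daily_growth,
           db_size - lag(db_size, 7) OVER (ORDER BY snap_time) AS weekly_growth
    FROM dbsize
    WHERE snap_time > '2020-08-29'
    ORDER BY snap_time
""").fetchall()

print(outer_filter)
print(inner_filter)
```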
If there is always exactly one record per day, you can use lag():
select
snap_time,
db_size,
db_size - lag(db_size, 1) over(order by snap_time) daily_growth,
db_size - lag(db_size, 7) over(order by snap_time) weekly_growth
from sys.dbsize
order by snap_time
Note that this looks 1 row back and 7 rows back, not 1 day and 7 days back. If there are missing dates or multiple records per day, you could instead average the size per day and use a RANGE window in the window function:
select
trunc(snap_time) snap_day,
avg(db_size) avg_db_size,
avg(db_size) - avg(avg(db_size)) over(
order by trunc(snap_time)
range between interval '1' day preceding and interval '1' day preceding
) daily_growth,
avg(db_size) - avg(avg(db_size)) over(
order by trunc(snap_time)
range between interval '7' day preceding and interval '7' day preceding
) weekly_growth
from sys.dbsize
group by trunc(snap_time)
order by trunc(snap_time)
If you want the results for the last 3 days only, you can turn either of the two queries above into a subquery and filter in the outer query (filtering inside would remove the earlier rows the window functions need):
select *
from ( ... ) t
where snap_time > sysdate - 3 -- or: snap_day > trunc(sysdate) - 3
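The difference between `lag(db_size, 7)` (seven rows back) and the RANGE frame (exactly seven days back) only shows once days are missing. A minimal Python sketch of the per-day average with date-based lookups, run on hypothetical snapshots (no snapshot on 2020-08-27, two on 2020-08-31):

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

snapshots = [  # (snap_time, db_size) -- hypothetical sample data
    (datetime(2020, 8, 24, 21, 52), 100),
    (datetime(2020, 8, 25, 21, 52), 110),
    (datetime(2020, 8, 26, 21, 52), 105),
    (datetime(2020, 8, 28, 21, 52), 130),   # no snapshot on 2020-08-27
    (datetime(2020, 8, 29, 21, 52), 130),
    (datetime(2020, 8, 30, 21, 52), 142),
    (datetime(2020, 8, 31, 9, 0), 140),
    (datetime(2020, 8, 31, 21, 52), 144),   # two snapshots on 2020-08-31
    (datetime(2020, 9, 1, 21, 52), 150),
]

by_day = defaultdict(list)
for snap_time, size in snapshots:
    by_day[snap_time.date()].append(size)                # group by trunc(snap_time)
avg_by_day = {day: mean(sizes) for day, sizes in sorted(by_day.items())}

rows = []
for day, avg_size in avg_by_day.items():
    prev_day = avg_by_day.get(day - timedelta(days=1))   # exactly 1 day back
    prev_week = avg_by_day.get(day - timedelta(days=7))  # exactly 7 days back
    rows.append((
        day,
        avg_size,
        None if prev_day is None else avg_size - prev_day,
        None if prev_week is None else avg_size - prev_week,
    ))
for row in rows:
    print(row)
```

With the gap, seven rows back from 2020-09-01 would be 2020-08-24 (eight days earlier), whereas the date lookup uses 2020-08-25.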

Complete datetime list despite missing values for analysis

I have been reading Stack Overflow for a long time, and I nearly always find my answers here. Great!
But now I have a problem that I have not been able to solve yet:
I have an Oracle table with an ID, a date and a value.
Think of it as a list of outstanding tasks (value) and project (ID). When the number of open tasks of a project changes, the list gets a new entry.
It looks like this:
ID month RemainingValue
1 01/01/2018 1000
1 01/03/2018 800
1 01/04/2018 600
1 01/07/2018 400
2 01/02/2018 700
2 01/03/2018 650
2 01/05/2018 600
3 01/02/2018 50
3 01/08/2018 40
4 01/01/2018 2000
(DateFormat DD/MM/YYYY)
Please note that not every month has a value!
I have to calculate the sum of all open tasks per month.
If there is no value for a month, that means that the number of open tasks has not decreased in that month, so the query should take the previous existing value of this project into account.
I want this result:
month result calculation remark
01/01/2018 3000 =1000 + 2000 ID 1+4
01/02/2018 3750 =1000 + 700 + 50 + 2000 ID 1[value of 01/01/2018]+2+3+4[value of 01/01/2018]
01/03/2018 3500 =800 + 650 + 50 + 2000 ID 1+2+3[value of 01/02/2018]+4[value of 01/01/2018]
What I did already:
I created a list of all months using the CONNECT BY LEVEL functionality, similar to this:
SELECT LEVEL AS NR
, ADD_MONTHS('01-JAN-2018', LEVEL) AS MONAT
FROM DUAL
CONNECT BY LEVEL <= (... SOME.SUBSELECT.TO.GET.THE.NUMBER.OF.LEVELS ...)
Then I can outer join this list of months to the table above based on the date.
The problem is that the task values for the unfilled months are NULL. I don't want them to be NULL; I want the previous filled value instead.
I tried with LAG functions, but without success so far.
I am hoping there is some functionality in (Oracle) SQL that can do this and that I just don't know about.
Or maybe it's even simpler and I just don't get it...
The resulting query should also be performant, because the underlying table has millions of rows. So I'd like to avoid slow PL/SQL solutions...
Hope you can help!
Kind Regards,
Nadine
You could use an analytic query to get the latest value for each ID, up to and including that month (relying on the default windowing clause).
This uses your sample data in a CTE, and adds another one to provide your month generation (may not match your desired range of course):
-- first CTE to replicate your data
with my_table(ID, month, RemainingValue) as (
select 1, to_date('01/01/2018', 'DD/MM/YYYY'), 1000 from dual
union all select 1, to_date('01/03/2018', 'DD/MM/YYYY'), 800 from dual
union all select 1, to_date('01/04/2018', 'DD/MM/YYYY'), 600 from dual
union all select 1, to_date('01/07/2018', 'DD/MM/YYYY'), 400 from dual
union all select 2, to_date('01/02/2018', 'DD/MM/YYYY'), 700 from dual
union all select 2, to_date('01/03/2018', 'DD/MM/YYYY'), 650 from dual
union all select 2, to_date('01/05/2018', 'DD/MM/YYYY'), 600 from dual
union all select 3, to_date('01/02/2018', 'DD/MM/YYYY'), 50 from dual
union all select 3, to_date('01/08/2018', 'DD/MM/YYYY'), 40 from dual
union all select 4, to_date('01/01/2018', 'DD/MM/YYYY'), 2000 from dual
),
-- second CTE to generate all months, here based on full range in table
-- use whatever you currently have for this
all_months (month) as (
select add_months(min_month, + level - 1)
from (
select min(month) as min_month, max(month) as max_month from my_table
)
connect by level <= months_between(max_month, min_month) + 1
)
select am.month, mt.id,
max(mt.remainingvalue) keep (dense_rank last order by mt.month) as remainingvalue
from all_months am
left join my_table mt on mt.month <= am.month
group by am.month, mt.id
order by id, month;
which gets
MONTH ID REMAININGVALUE
---------- ---------- --------------
2018-01-01 1 1000
2018-02-01 1 1000
2018-03-01 1 800
2018-04-01 1 600
2018-05-01 1 600
2018-06-01 1 600
2018-07-01 1 400
2018-08-01 1 400
2018-02-01 2 700
2018-03-01 2 650
2018-04-01 2 650
...
And then use that as an inline view or another CTE, summing the values:
-- first CTE to replicate your data
with my_table(ID, month, RemainingValue) as (
select 1, to_date('01/01/2018', 'DD/MM/YYYY'), 1000 from dual
union all select 1, to_date('01/03/2018', 'DD/MM/YYYY'), 800 from dual
union all select 1, to_date('01/04/2018', 'DD/MM/YYYY'), 600 from dual
union all select 1, to_date('01/07/2018', 'DD/MM/YYYY'), 400 from dual
union all select 2, to_date('01/02/2018', 'DD/MM/YYYY'), 700 from dual
union all select 2, to_date('01/03/2018', 'DD/MM/YYYY'), 650 from dual
union all select 2, to_date('01/05/2018', 'DD/MM/YYYY'), 600 from dual
union all select 3, to_date('01/02/2018', 'DD/MM/YYYY'), 50 from dual
union all select 3, to_date('01/08/2018', 'DD/MM/YYYY'), 40 from dual
union all select 4, to_date('01/01/2018', 'DD/MM/YYYY'), 2000 from dual
),
-- second CTE to generate all months, here based on full range in table
-- use whatever you currently have for this
all_months (month) as (
select add_months(min_month, + level - 1)
from (
select min(month) as min_month, max(month) as max_month from my_table
)
connect by level <= months_between(max_month, min_month) + 1
),
-- third CTE to get the latest value for each ID up to that month
inter (month, id, remainingvalue) as (
select am.month, mt.id,
max(mt.remainingvalue) keep (dense_rank last order by mt.month)
from all_months am
left join my_table mt on mt.month <= am.month
group by am.month, mt.id
)
select month, sum(remainingvalue) as result,
listagg(remainingvalue, ' + ') within group (order by id) as calculation
from inter
group by month
order by month;
which gets:
MONTH RESULT CALCULATION
---------- ---------- ------------------------------
2018-01-01 3000 1000 + 2000
2018-02-01 3750 1000 + 700 + 50 + 2000
2018-03-01 3500 800 + 650 + 50 + 2000
2018-04-01 3300 600 + 650 + 50 + 2000
2018-05-01 3250 600 + 600 + 50 + 2000
2018-06-01 3250 600 + 600 + 50 + 2000
2018-07-01 3050 400 + 600 + 50 + 2000
2018-08-01 3040 400 + 600 + 40 + 2000
I assume the calculation and remark columns in your result are just for our benefit to understand the logic; if you do want them then calculation is easy to get as above, and if you want remark too then you just need to identify the month the value comes from too, and add another listagg:
...
-- third CTE to get the latest value for each ID up to that month
inter (month, id, remainingvalue, valuemonth) as (
select am.month, mt.id,
max(mt.remainingvalue) keep (dense_rank last order by mt.month),
max(mt.month)
from all_months am
left join my_table mt on mt.month <= am.month
group by am.month, mt.id
)
select month, sum(remainingvalue) as result,
'= ' || listagg(remainingvalue, ' + ') within group (order by id) as calculation,
'ID ' || listagg(id || case when month != valuemonth then '[' || valuemonth || ']' end, ' + ')
within group (order by id) as remark
from inter
group by month
order by month;
MONTH RESULT CALCULATION REMARK
---------- ---------- ------------------------ -----------------------------------------------------------------
2018-01-01 3000 = 1000 + 2000 ID 1 + 4
2018-02-01 3750 = 1000 + 700 + 50 + 2000 ID 1[2018-01-01] + 2 + 3 + 4[2018-01-01]
2018-03-01 3500 = 800 + 650 + 50 + 2000 ID 1 + 2 + 3[2018-02-01] + 4[2018-01-01]
2018-04-01 3300 = 600 + 650 + 50 + 2000 ID 1 + 2[2018-03-01] + 3[2018-02-01] + 4[2018-01-01]
2018-05-01 3250 = 600 + 600 + 50 + 2000 ID 1[2018-04-01] + 2 + 3[2018-02-01] + 4[2018-01-01]
2018-06-01 3250 = 600 + 600 + 50 + 2000 ID 1[2018-04-01] + 2[2018-05-01] + 3[2018-02-01] + 4[2018-01-01]
2018-07-01 3050 = 400 + 600 + 50 + 2000 ID 1 + 2[2018-05-01] + 3[2018-02-01] + 4[2018-01-01]
2018-08-01 3040 = 400 + 600 + 40 + 2000 ID 1[2018-07-01] + 2[2018-05-01] + 3 + 4[2018-01-01]
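The `keep (dense_rank last order by mt.month)` step picks, for each month and ID, the value from the latest row on or before that month, and the outer query sums those values. A minimal Python sketch of the same "last value up to this month, then sum" logic on the question's sample data (months written as ISO `YYYY-MM` strings so they compare correctly; the function name is hypothetical):

```python
from collections import defaultdict

# (id, month, remaining_value) from the question's sample data
rows = [
    (1, "2018-01", 1000), (1, "2018-03", 800), (1, "2018-04", 600), (1, "2018-07", 400),
    (2, "2018-02", 700), (2, "2018-03", 650), (2, "2018-05", 600),
    (3, "2018-02", 50), (3, "2018-08", 40),
    (4, "2018-01", 2000),
]


def monthly_totals(rows, months):
    history = defaultdict(list)
    for id_, month, value in sorted(rows, key=lambda r: r[1]):
        history[id_].append((month, value))
    totals = {}
    for month in months:
        total = 0
        for entries in history.values():
            # latest entry on or before this month (mt.month <= am.month)
            earlier = [value for m, value in entries if m <= month]
            if earlier:
                total += earlier[-1]
        totals[month] = total
    return totals


months = [f"2018-{m:02d}" for m in range(1, 9)]
totals = monthly_totals(rows, months)
print(totals)
```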
You seem to want to sum the most recent value for each project before a given month.
The following gets the remaining value for each id for each month:
with months as (
SELECT LEVEL AS NR, ADD_MONTHS(DATE '2018-01-01', LEVEL - 1) AS MONTH
FROM DUAL
CONNECT BY LEVEL <= (... SOME.SUBSELECT.TO.GET.THE.NUMBER.OF.LEVELS ...)
)
select m.month, i.id,
(select max(t.remainingvalue) keep (dense_rank first order by month desc)
from t
where t.id = i.id and t.month <= m.month
) as remainingvalue
from months m cross join
(select distinct id from t) i;
Now let's just summarize this:
with months as (
SELECT LEVEL AS NR, ADD_MONTHS(DATE '2018-01-01', LEVEL - 1) AS MONTH
FROM DUAL
CONNECT BY LEVEL <= (... SOME.SUBSELECT.TO.GET.THE.NUMBER.OF.LEVELS ...)
)
select month, sum(remainingvalue)
from (select m.month, i.id,
(select max(t.remainingvalue) keep (dense_rank first order by month desc)
from t
where t.id = i.id and t.month <= m.month
) as remainingvalue
from months m cross join
(select distinct id from t) i
) mi
group by month;

Oracle date as fraction of month

I would like to get a table of months between two dates with a fraction of each month that the two dates cover.
For example with a start date of 15/01/2017 and end date of 01/03/2017 it would output:
01/2017 : 0.5483..
02/2017 : 1
03/2017: 0.0322..
where for January and March the calculations are 17/31 and 1/31 respectively. I currently have the query:
WITH dates_between as (SELECT ADD_MONTHS(TRUNC(TO_DATE(:givenStartDate,'dd/mm/yyyy'), 'MON'), ROWNUM - 1) date_out
FROM DUAL
CONNECT BY ADD_MONTHS(TRUNC(TO_DATE(:givenStartDate,'dd/mm/yyyy'), 'MON'), ROWNUM - 1)
<= TRUNC(TO_DATE(:givenEndDate,'dd/mm/yyyy'), 'MON')
)
select * from dates_between
This outputs each month between two dates and formats it to the start of the month. I just need another column to give me the fraction the start and end dates cover. I'm not sure of a way to do this without it getting messy.
The months_between() function "calculates the fractional portion of the result based on a 31-day month". That means that if your range starts or ends in a month that doesn't have 31 days, the fraction you get might not be quite what you expect:
select months_between(date '2017-04-02', date '2017-04-01') as calc from dual
CALC
----------
.0322580645
... which is 1/31, not 1/30. To get 0.0333... instead you'd need to calculate the number of days in each month, at least for the first and last month. This uses a recursive CTE (11gR2+) to get the months, using a couple of date ranges provided by another CTE as a demo to show the difference (you can use a hierarchical query too of course):
with ranges (id, start_date, end_date) as (
select 1, date '2017-01-15', date '2017-03-01' from dual
union all select 2, date '2017-01-31', date '2017-03-01' from dual
union all select 3, date '2017-02-28', date '2017-04-01' from dual
),
months (id, month_start, month_days, range_start, range_end) as (
select id,
trunc(start_date, 'MM'),
extract(day from last_day(start_date)),
start_date,
end_date
from ranges
union all
select id,
month_start + interval '1' month,
extract(day from last_day(month_start + interval '1' month)),
range_start,
range_end
from months
where month_start < range_end
)
select id,
to_char(month_start, 'YYYY-MM-DD') as month_start,
month_days,
case when month_start = trunc(range_start, 'MM')
then month_days - extract(day from range_start) + 1
when month_start = trunc(range_end, 'MM')
then extract(day from range_end)
else month_days end as range_days,
(case when month_start = trunc(range_start, 'MM')
then month_days - extract(day from range_start) + 1
when month_start = trunc(range_end, 'MM')
then extract(day from range_end)
else month_days end) / month_days as fraction
from months
order by id, month_start;
which gets:
ID MONTH_STAR MONTH_DAYS RANGE_DAYS FRACTION
------ ---------- ---------- ---------- --------
1 2017-01-01 31 17 0.5483
1 2017-02-01 28 28 1
1 2017-03-01 31 1 0.0322
2 2017-01-01 31 1 0.0322
2 2017-02-01 28 28 1
2 2017-03-01 31 1 0.0322
3 2017-02-01 28 1 0.0357
3 2017-03-01 31 31 1
3 2017-04-01 30 1 0.0333
The first CTE ranges is just the demo data. The second, recursive, CTE months generates the start and number of days in each month, while keeping track of the original range dates too. The final query just calculates the fractions based on the number of days in the month in the range against the number of days in that month overall.
The month_days and range_days are only shown in the output so you can see what the calculation is based on, you can obviously omit those from your actual result, and format the month start date however you want.
With your original single pair of bind variables the equivalent would be:
with months (month_start, month_days, range_start, range_end) as (
select trunc(to_date(:givenstartdate, 'DD/MM/YYYY'), 'MM'),
extract(day from last_day(to_date(:givenstartdate, 'DD/MM/YYYY'))),
to_date(:givenstartdate, 'DD/MM/YYYY'),
to_date(:givenenddate, 'DD/MM/YYYY')
from dual
union all
select month_start + interval '1' month,
extract(day from last_day(month_start + interval '1' month)),
range_start,
range_end
from months
where month_start < range_end
)
select to_char(month_start, 'MM/YYYY') as month,
(case when month_start = trunc(range_start, 'MM')
then month_days - extract(day from range_start) + 1
when month_start = trunc(range_end, 'MM')
then extract(day from range_end)
else month_days end) / month_days as fraction
from months
order by month_start;
MONTH FRACTION
------- --------
01/2017 0.5483
02/2017 1
03/2017 0.0322
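The final query's calculation (days of the month that fall inside the range, divided by the days in that month) can be checked with a short Python sketch; `calendar.monthrange` supplies the month length and `Fraction` keeps the ratios exact. The function name is hypothetical. Clipping each month with min/max also covers a range that starts and ends in the same month:

```python
import calendar
from datetime import date
from fractions import Fraction


def month_fractions(start: date, end: date):
    """Yield (first_of_month, fraction of that month covered by start..end inclusive)."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        days_in_month = calendar.monthrange(year, month)[1]
        first = date(year, month, 1)
        last = date(year, month, days_in_month)
        covered = (min(end, last) - max(start, first)).days + 1
        yield first, Fraction(covered, days_in_month)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


fractions = list(month_fractions(date(2017, 1, 15), date(2017, 3, 1)))
for first, frac in fractions:
    print(first.strftime("%m/%Y"), frac, float(frac))
```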
Here's how I would do it (n.b. I have expanded your dates_between to work against multiple rows, purely for demonstration purposes. If you're only working with a single set of parameters, you wouldn't need to do that):
WITH params AS (SELECT 1 ID, '15/01/2017' givenstartdate, '01/03/2017' givenenddate FROM dual UNION ALL
SELECT 2 ID, '15/01/2017' givenstartdate, '23/01/2017' givenenddate FROM dual UNION ALL
SELECT 3 ID, '01/01/2017' givenstartdate, '07/04/2017' givenenddate FROM dual),
dates_between AS (SELECT ID,
to_date(givenstartdate, 'dd/mm/yyyy') givenstartdate,
to_date(givenenddate, 'dd/mm/yyyy') givenenddate,
add_months(trunc(to_date(givenstartdate, 'dd/mm/yyyy'), 'MON'), LEVEL - 1) start_of_month,
last_day(add_months(trunc(to_date(givenstartdate, 'dd/mm/yyyy'), 'MON'), LEVEL - 1)) end_of_month
FROM params
CONNECT BY add_months(trunc(to_date(givenstartdate, 'dd/mm/yyyy'), 'MON'), LEVEL - 1) <=
trunc(to_date(givenenddate, 'dd/mm/yyyy'), 'MON')
AND PRIOR ID = ID
AND PRIOR sys_guid() IS NOT NULL)
SELECT ID,
givenstartdate,
givenenddate,
start_of_month date_out,
end_of_month,
months_between(LEAST(givenenddate, end_of_month) + 1, GREATEST(start_of_month, givenstartdate)) AS diff
FROM dates_between;
ID GIVENSTARTDATE GIVENENDDATE DATE_OUT END_OF_MONTH DIFF
1 15/01/2017 01/03/2017 01/01/2017 31/01/2017 0.54838709
1 15/01/2017 01/03/2017 01/02/2017 28/02/2017 1
1 15/01/2017 01/03/2017 01/03/2017 31/03/2017 0.03225806
2 15/01/2017 23/01/2017 01/01/2017 31/01/2017 0.29032258
3 01/01/2017 07/04/2017 01/01/2017 31/01/2017 1
3 01/01/2017 07/04/2017 01/02/2017 28/02/2017 1
3 01/01/2017 07/04/2017 01/03/2017 31/03/2017 1
3 01/01/2017 07/04/2017 01/04/2017 30/04/2017 0.22580645
N.B. You may need to add a CASE expression to decide whether or not to add 1 in the difference calculation, depending on your requirements.
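Whether to add 1 depends on whether the end date itself counts as a covered day. Within a single month, Oracle's months_between divides the day difference by 31, so for the second demo range (15/01/2017 to 23/01/2017) the two choices differ by exactly 1/31. A quick Python check of that arithmetic:

```python
from datetime import date

start, end = date(2017, 1, 15), date(2017, 1, 23)
exclusive_days = (end - start).days        # like months_between(end, start)
inclusive_days = (end - start).days + 1    # like months_between(end + 1, start)
print(exclusive_days / 31, inclusive_days / 31)
```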
Try this.
For the first month, I calculated remaining days / total days, and for the last month I subtracted that from 1 to get days passed / total days.
WITH tbl AS
(SELECT date '2017-01-15' AS givenStartDate
,date '2017-03-01' AS givenEndDate
FROM dual
)
SELECT ADD_MONTHS(TRUNC(givenStartDate, 'MON'), ROWNUM - 1) AS date_out ,
CASE
WHEN
rownum - 1 = 0
THEN months_between(last_day(givenStartDate), givenStartDate)
WHEN ADD_MONTHS(TRUNC(givenStartDate, 'MON'), ROWNUM - 1) = TRUNC(givenEndDate, 'MON')
THEN 1 - (months_between(last_day(givenEndDate), givenEndDate))
ELSE 1
END AS perc
FROM tbl
CONNECT BY ADD_MONTHS(TRUNC(givenStartDate, 'MON'), ROWNUM - 1)
<= TRUNC(givenEndDate, 'MON');
Output
+-----------+-------------------------------------------+
| DATE_OUT | PERC |
+-----------+-------------------------------------------+
| 01-JAN-17 | .5161290322580645161290322580645161290323 |
| 01-FEB-17 | 1 |
| 01-MAR-17 | .0322580645161290322580645161290322580645 |
+-----------+-------------------------------------------+
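Note that the first-month figure here is .5161..., which is 16/31 rather than the 17/31 (.5483...) the question asks for: `months_between(last_day(givenStartDate), givenStartDate)` measures from 15 January to 31 January, which does not count the start day itself. A quick Python check of both figures, using the 31-day divisor that months_between applies within a month:

```python
from datetime import date

start = date(2017, 1, 15)                  # givenStartDate
month_end = date(2017, 1, 31)              # last_day(givenStartDate)
after_start = (month_end - start).days     # 16: excludes 15 January itself
inclusive = after_start + 1                # 17: counts 15 January as well
print(after_start / 31, inclusive / 31)
```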