I have a table UPCALL_HISTORY that has 3 columns: SUBSCRIBER_ID, START_DATE and END_DATE. Let the number of unique subscribers be N.
I want to create a new table with 3 columns:
SUBSCRIBER_ID: All of the unique subscriber ids repeated 36 times in a row.
MONTHLY_CALENDAR_ID: For each SUBSCRIBER_ID, this column will have dates listed from July 2015 until July 2018 (36 months).
ACTIVE: This column will be used as a flag for each subscriber and whether they have a subscription during that month. This subscription data is in a table called UPCALL_HISTORY.
I am fairly new to SQL, don't have a lot of experience. I am good at Python but it seems that SQL doesn't work like Python.
Any query ideas that could help me build this table?
Let my UPCALL_HISTORY table be:
+---------------+------------+------------+
| SUBSCRIBER_ID | START_DATE | END_DATE |
+---------------+------------+------------+
| 119 | 01/07/2015 | 01/08/2015 |
| 120 | 01/08/2015 | 01/09/2015 |
| 121 | 01/09/2015 | 01/10/2015 |
+---------------+------------+------------+
I want a table that looks like:
+---------------+------------+--------+
| SUBSCRIBER_ID | MON_CA | ACTIVE |
+---------------+------------+--------+
| 119 | 01/07/2015 | 1 |
| * | 01/08/2015 | 0 |
| * | 01/09/2015 | 0 |
| (36 times) | 01/10/2015 | 0 |
| * | * | 0 |
| 119 | 01/07/2018 | 0 |
+---------------+------------+--------+
that continues for 120 and 121
EDIT: Added Example
Here's how I understood the question.
Sample table and several rows:
SQL> create table upcall_history
2 (subscriber_id number,
3 start_date date,
4 end_date date);
Table created.
SQL> insert into upcall_history
2 select 1, date '2015-12-25', date '2016-01-13' from dual union
3 select 1, date '2017-07-10', date '2017-07-11' from dual union
4 select 2, date '2018-01-01', date '2018-04-24' from dual;
3 rows created.
Create a new table. For distinct SUBSCRIBER_ID's, it creates 36 "monthly" rows, fixed (as you stated).
SQL> create table new_table as
2 select
3 x.subscriber_id,
4 add_months(date '2015-07-01', column_value - 1) monthly_calendar_id,
5 0 active
6 from (select distinct subscriber_id from upcall_history) x,
7 table(cast(multiset(select level from dual
8 connect by level <= 36
9 ) as sys.odcinumberlist));
Table created.
Update ACTIVE column value to "1" for rows whose MONTHLY_CALENDAR_ID is contained in START_DATE and END_DATE of the UPCALL_HISTORY table.
SQL> merge into new_table n
2 using (select subscriber_id, start_date, end_date from upcall_history) x
3 on ( n.subscriber_id = x.subscriber_id
4 and n.monthly_calendar_id between trunc(x.start_date, 'mm')
5 and trunc(x.end_date, 'mm')
6 )
7 when matched then
8 update set n.active = 1;
7 rows merged.
SQL>
Result (only ACTIVE = 1):
SQL> select * from new_table
2 where active = 1
3 order by subscriber_id, monthly_calendar_id;
SUBSCRIBER_ID MONTHLY_CA ACTIVE
------------- ---------- ----------
1 2015-12-01 1
1 2016-01-01 1
1 2017-07-01 1
2 2018-01-01 1
2 2018-02-01 1
2 2018-03-01 1
2 2018-04-01 1
7 rows selected.
SQL>
If you're on 12c you can use an inline view of all the months with cross apply to get the combinations of those with all IDs:
select uh.subscriber_id, m.month,
case when trunc(uh.start_date, 'MM') <= m.month
and (uh.end_date is null or uh.end_date >= add_months(m.month, 1))
then 1 else 0 end as active
from upcall_history uh
cross apply (
select add_months(trunc(sysdate, 'MM'), - level) as month
from dual
connect by level <= 36
) m
order by uh.subscriber_id, m.month;
I've made it a rolling 36-months window up to the current month, but you may actually want fixed dates as you had in the question.
With sample data from a CTE:
with upcall_history (subscriber_id, start_date, end_date) as (
select 1, date '2015-09-04', '2015-12-15' from dual
union all select 2, date '2017-12-04', '2018-05-15' from dual
)
that generates 72 rows:
SUBSCRIBER_ID MONTH ACTIVE
------------- ---------- ----------
1 2015-07-01 0
1 2015-08-01 0
1 2015-09-01 1
1 2015-10-01 1
1 2015-11-01 1
1 2015-12-01 0
1 2016-01-01 0
...
2 2017-11-01 0
2 2017-12-01 1
2 2018-01-01 1
2 2018-02-01 1
2 2018-03-01 1
2 2018-04-01 1
2 2018-05-01 0
2 2018-06-01 0
You can use that to create a new table, or populate an existing table; though if you do want a rolling window then a view might be more appropriate.
If you aren't on 12c then cross apply isn't available - you'll get "ORA-00905: missing keyword".
You can get the same result with two CTEs (one to get all the months, the other to get all the IDs) cross-joined, and then outer joined to your actual data:
with m (month) as (
select add_months(trunc(sysdate, 'MM'), - level)
from dual
connect by level <= 36
),
i (subscriber_id) as (
select distinct subscriber_id
from upcall_history
)
select i.subscriber_id, m.month,
case when uh.subscriber_id is null then 0 else 1 end as active
from m
cross join i
left join upcall_history uh
on uh.subscriber_id = i.subscriber_id
and trunc(uh.start_date, 'MM') <= m.month
and (uh.end_date is null or uh.end_date >= add_months(m.month, 1))
order by i.subscriber_id, m.month;
You can do this in 11g using Partitioned Outer Joins, like so:
WITH upcall_history AS (SELECT 119 subscriber_id, to_date('01/07/2015', 'dd/mm/yyyy') start_date, to_date('01/08/2015', 'dd/mm/yyyy') end_date FROM dual UNION ALL
SELECT 120 subscriber_id, to_date('01/08/2015', 'dd/mm/yyyy') start_date, to_date('01/09/2015', 'dd/mm/yyyy') end_date FROM dual UNION ALL
SELECT 121 subscriber_id, to_date('01/09/2015', 'dd/mm/yyyy') start_date, to_date('01/10/2015', 'dd/mm/yyyy') end_date FROM dual),
mnths AS (SELECT add_months(TRUNC(SYSDATE, 'mm'), + 1 - LEVEL) mnth
FROM dual
CONNECT BY LEVEL <= 12 * 3 + 1)
SELECT uh.subscriber_id,
m.mnth,
CASE WHEN mnth BETWEEN start_date AND end_date - 1 THEN 1 ELSE 0 END active
FROM mnths m
LEFT OUTER JOIN upcall_history uh PARTITION BY (uh.subscriber_id) ON (1=1)
ORDER BY uh.subscriber_id,
m.mnth;
SUBSCRIBER_ID MNTH ACTIVE
------------- ----------- ----------
119 01/07/2015 1
119 01/08/2015 0
119 01/09/2015 0
119 01/10/2015 0
<snip>
119 01/06/2018 0
119 01/07/2018 0
--
120 01/07/2015 0
120 01/08/2015 1
120 01/09/2015 0
120 01/10/2015 0
<snip>
120 01/06/2018 0
120 01/07/2018 0
--
121 01/07/2015 0
121 01/08/2015 0
121 01/09/2015 1
121 01/10/2015 0
<snip>
121 01/06/2018 0
121 01/07/2018 0
N.B. I have assumed some things about your start/end dates and what constitutes active; hopefully it should be easy enough for you to tweak the case statement to fit the logic that works best for your situation.
I also believe this is can be an example for CROSS JOIN. All I had to do was create a small table of all of the dates and then CROSS JOIN it with the table of subscribers.
Example: https://www.essentialsql.com/cross-join-introduction/
Related
I'm sure there is an easy way to do this in SQL Oracle, but I'm not able to find it.
I have a table ( temp ) , with 3 columns (id_1, id_2, date ), where date is Date type. Each line is unique.
The output I want is to repeat each line 15 times, with a new date columns where I get in one row the original date, in the second the original + 1 day, in the third the original + 2 days, etc, for each original row...
Define a helper CTO with 15 rows and simple cross joinit with your original table
with days as (
select rownum -1 as day_offset
from dual connect by level <= 15)
select
a.id1, a.id2, a.date_d + b.day_offset new_date_d
from tab a
cross join days b
order by 1,2,3;
Example - for the sample data ...
select * from tab;
ID1 ID2 DATE_D
---------- ---------- -------------------
1 1 01.01.2020 00:00:00
2 2 01.01.2019 00:00:00
... you will get the following output
ID1 ID2 NEW_DATE_D
---------- ---------- -------------------
1 1 01.01.2020 00:00:00
1 1 02.01.2020 00:00:00
1 1 03.01.2020 00:00:00
1 1 04.01.2020 00:00:00
1 1 05.01.2020 00:00:00
....
2 2 13.01.2019 00:00:00
2 2 14.01.2019 00:00:00
2 2 15.01.2019 00:00:00
30 rows selected.
Alternativly you may use recursive subquery factoring
The *day offset is calculated recursively as days.day_offset + 1 is limited to 15 and used to build the new date value
with days( id1, id2, date_d, day_offset) as (
select id1, id2, date_d, 0 day_offset from tab
union all
select days.id1, days.id2, tab.date_d + days.day_offset + 1 as date_d,
days.day_offset + 1 as day_offset
from tab
join days
on tab.id1 = days.id1 and tab.id2 = days.id2
and days.day_offset +1 < 15)
select * from days
order by 1,2,3
I have the following data in a SQL table:
+------------------------------------+
| ID YEARS START_DATE |
+------------------------------------+
| ----------- ----------- ---------- |
| 1 5 2020-12-01 |
| 2 8 2020-12-01 |
+------------------------------------+
Trying to create a SQL that would expand the above data and give me a start and end date for each year depending on YEARS and START_DATE from above table. Sample output below:
+-----------------------------------------------+
| ID YEAR DATE_START DATE_END |
+-----------------------------------------------+
| ----------- ----------- ---------- ---------- |
| 1 1 2020-12-01 2021-11-30 |
| 1 2 2021-12-01 2022-11-30 |
| 1 3 2022-12-01 2023-11-30 |
| 1 4 2023-12-01 2024-11-30 |
| 1 5 2024-12-01 2025-11-30 |
| 2 1 2020-12-01 2021-11-30 |
| 2 2 2021-12-01 2022-11-30 |
| 2 3 2022-12-01 2023-11-30 |
| 2 4 2023-12-01 2024-11-30 |
| 2 5 2024-12-01 2025-11-30 |
| 2 6 2025-12-01 2026-11-30 |
| 2 7 2026-12-01 2027-11-30 |
| 2 8 2027-12-01 2028-11-30 |
+-----------------------------------------------+
I would use an inline tally for this, as they are Far faster than a recursive CTE solution. Assuming you have low values for Years:
WITH YourTable AS(
SELECT *
FROM (VALUES(1,5,CONVERT(date,'20201201')),
(2,8,CONVERT(date,'20201201')))V(ID,Years, StartDate))
SELECT ID,
V.I + 1 AS [Year],
DATEADD(YEAR, V.I, YT.StartDate) AS StartDate,
DATEADD(DAY, -1, DATEADD(YEAR, V.I+1, YT.StartDate)) AS EndDate
FROM YourTable YT
JOIN (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))V(I) ON YT.Years > V.I;
If you have more than 10~ years you can use either create a tally table, or create an large one inline in a CTE. This would start as:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I --remove the -1 if you don't want to start from 0
FROM N N1, N N2) --100 rows, add more Ns for more rows
...
Of course, I doubt you have 1,000 of years of data.
You can use a recursive CTE:
with cte as (
select id, 1 as year, start_date,
dateadd(day, -1, dateadd(year, 1, start_date)) as end_date,
years as num_years
from t
union all
select id, year + 1, dateadd(year, 1, start_date),
dateadd(day, -1, dateadd(year, 1, start_date)) as end_date,
num_years
from cte
where year < num_years
)
select id, year, start_date, end_date
from cte;
Here is a db<>fiddle.
In a query, you can use the following:
DATEADD(YEAR, 1, DATE_START) - 1
to add this to the table you can just create the extra column, and set it equal to the value of the above, e.g.
UPDATE MyTable
SET DATE_END = DATEADD(YEAR, 1, DATE_START) - 1
If you are working with sql server, then you can try to use operator CROSS APPLY with master.dbo.spt_values table to get list of numbers and generate dates:
select ID,T.number+1 as YEAR,
--generate date_start using T.number
dateadd(year,T.number,START_DATE)date_start,
--generate end_date: adding 1 year to start date
dateadd(dd,-1,dateadd(year,1,dateadd(year,T.number,START_DATE)))date_end
from Table
cross apply
master.dbo.spt_values T
where T.type='P' and T.number<YEARS
My knowledge is pretty basic so your help would be highly appreciated.
I'm trying to return the same row multiple times when it meets the condition (I only have access to select query).
I have a table of more than 500000 records with Customer ID, Start Date and End Date, where end date could be null.
I am trying to add a new column called Week_No and list all rows accordingly. For example if the date range is more than one week, then the row must be returned multiple times with corresponding week number. Also I would like to count overlapping days, which will never be more than 7 (week) per row and then count unavailable days using second table.
Sample data below
t1
ID | Start_Date | End_Date
000001 | 12/12/2017 | 03/01/2018
000002 | 13/01/2018 |
000003 | 02/01/2018 | 11/01/2018
...
t2
ID | Unavailable
000002 | 14/01/2018
000003 | 03/01/2018
000003 | 04/01/2018
000003 | 08/01/2018
...
I cannot pass the stage of adding week no. I have tried using CASE and UNION ALL but keep getting errors.
declare #week01start datetime = '2018-01-01 00:00:00'
declare #week01end datetime = '2018-01-07 00:00:00'
declare #week02start datetime = '2018-01-08 00:00:00'
declare #week02end datetime = '2018-01-14 00:00:00'
...
SELECT
ID,
'01' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= #week01end and End_Date >= #week01start)
or (Start_Date <= #week01end and End_Date is null)
UNION ALL
SELECT
ID,
'02' as Week_No,
'2018' as YEAR,
Start_Date,
End_Date
FROM t1
WHERE (Start_Date <= #week02end and End_Date >= #week02start)
or (Start_Date <= #week02end and End_Date is null)
...
The new table should look like this
ID | Week_No | Year | Start_Date | End_Date | Overlap | Unavail_Days
000001 | 01 | 2018 | 12/12/2017 | 03/01/2018 | 3 |
000002 | 02 | 2018 | 13/01/2018 | | 2 | 1
000003 | 01 | 2018 | 02/01/2018 | 11/01/2018 | 6 | 2
000003 | 02 | 2018 | 02/01/2018 | 11/01/2018 | 4 | 1
...
business wise i cannot understand what you are trying to achieve. You can use the following code though to calculate your overlapping days etc. I did it the way you asked, but i would recommend a separate table, like a Time dimension to produce a "cleaner" solution
/*sample data set in temp table*/
select '000001' as id, '2017-12-12'as start_dt, ' 2018-01-03' as end_dt into #tmp union
select '000002' as id, '2018-01-13 'as start_dt, null as end_dt union
select '000003' as id, '2018-01-02' as start_dt, '2018-01-11' as end_dt
/*calculate week numbers and week diff according to dates*/
select *,
DATEPART(WK,start_dt) as start_weekNumber,
DATEPART(WK,end_dt) as end_weekNumber,
case
when DATEPART(WK,end_dt) - DATEPART(WK,start_dt) > 0 then (DATEPART(WK,end_dt) - DATEPART(WK,start_dt)) +1
else (52 - DATEPART(WK,start_dt)) + DATEPART(WK,end_dt)
end as WeekDiff
into #tmp1
from
(
SELECT *,DATEADD(DAY, 2 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [start_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, start_dt), CAST(start_dt AS DATE)) [startdt_Week_End_Date],
DATEADD(DAY, 2 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_Start_Date],
DATEADD(DAY, 8 - DATEPART(WEEKDAY, end_dt), CAST(end_dt AS DATE)) [end_dt_Week_End_Date]
from #tmp
) s
/*cte used to create duplicates when week diff is over 1*/
;with x as
(
SELECT TOP (10) rn = ROW_NUMBER() --modify the max you want
OVER (ORDER BY [object_id])
FROM sys.all_columns
ORDER BY [object_id]
)
/*final query*/
select --*
ID,
start_weekNumber+ (r-1) as Week,
DATEPART(YY,start_dt) as [YEAR],
start_dt,
end_dt,
null as Overlap,
null as unavailable_days
from
(
select *,
ROW_NUMBER() over (partition by id order by id) r
from
(
select d.* from x
CROSS JOIN #tmp1 AS d
WHERE x.rn <= d.WeekDiff
union all
select * from #tmp1
where WeekDiff is null
) a
)a_ext
order by id,start_weekNumber
--drop table #tmp1,#tmp
The above will produce the results you want except the overlap and unavailable columns. Instead of just counting weeks, i added the number of week in the year using start_dt, but you can change that if you don't like it:
ID Week YEAR start_dt end_dt Overlap unavailable_days
000001 50 2017 2017-12-12 2018-01-03 NULL NULL
000001 51 2017 2017-12-12 2018-01-03 NULL NULL
000001 52 2017 2017-12-12 2018-01-03 NULL NULL
000002 2 2018 2018-01-13 NULL NULL NULL
000003 1 2018 2018-01-02 2018-01-11 NULL NULL
000003 2 2018 2018-01-02 2018-01-11 NULL NULL
I am learning SQL and I was wondering how to select active users by month, depending on their starting and ending date (both timestamp(6)). My table looks like this:
Cust_Num | Start_Date | End_Date
1 | 2018-01-01 | 2019-01-01
2 | 2018-01-01 | NULL
3 | 2019-01-01 | 2019-06-01
4 | 2017-01-01 | 2019-03-01
So, counting the active users by month, I should have an output like:
As of. | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1
So far, I do a manual operation by entering each month:
Select
201906,
count(distinct a.cust_num)
From
active_users a
Where
to_date(‘20190630’,’yyyymmdd) between a.start_date and nvl (a.end_date, ‘31-dec-9999)
union all
Select
201905,
count(distinct a.cust_num)
From
active_users a
Where
to_date(‘20190531’,’yyyymmdd) between a.start_date and nvl (a.end_date, ‘31-dec-9999)
union all
...
Not very optimized and sustainable if I want to enter 10 years ao 120 months lol.
Any help is welcome. Thanks a lot!
This query shows the active-user-count effective as-of the end of the month.
How it works:
Convert each input row (with StartDate and EndDate value) into two rows that represent a point-in-time when the active-user-count incremented (on StartDate) and decremented (on EndDate). We need to convert NULL to a far-off date value because NULL values are sorted before instead of after non-NULL values:
This makes your data look like this:
OnThisDate Change
2018-01-01 1
2019-01-01 -1
2018-01-01 1
9999-12-31 -1
2019-01-01 1
2019-06-01 -1
2017-01-01 1
2019-03-01 -1
Then we simply SUM OVER the Change values (after sorting) to get the active-user-count as of that specific date:
So first, sort by OnThisDate:
OnThisDate Change
2017-01-01 1
2018-01-01 1
2018-01-01 1
2019-01-01 1
2019-01-01 -1
2019-03-01 -1
2019-06-01 -1
9999-12-31 -1
Then SUM OVER:
OnThisDate ActiveCount
2017-01-01 1
2018-01-01 2
2018-01-01 3
2019-01-01 4
2019-01-01 3
2019-03-01 2
2019-06-01 1
9999-12-31 0
Then we PARTITION (not group!) the rows by month and sort them by their date so we can identify the last ActiveCount row for that month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() for each month PARTITION):
OnThisDate ActiveCount IsLastInMonth
2017-01-01 1 1
2018-01-01 2 0
2018-01-01 3 1
2019-01-01 4 0
2019-01-01 3 1
2019-03-01 2 1
2019-06-01 1 1
9999-12-31 0 1
Then filter on that where IsLastInMonth = 1 (actually, where ROW_COUNT() = COUNT(*) inside each PARTITION) to give us the final output data:
At-end-of-month Active-count
2017-01 1
2018-01 3
2019-01 3
2019-03 2
2019-06 1
9999-12 0
This does result in "gaps" in the result-set because the At-end-of-month column only shows rows where the Active-count value actually changed rather than including all possible calendar months - but that's ideal (as far as I'm concerned) because it excludes redundant data. Filling in the gaps can be done inside your application code by simply repeating output rows for each additional month until it reaches the next At-end-of-month value.
Here's the query using T-SQL on SQL Server (I don't have access to Oracle right now). And here's the SQLFiddle I used to come to a solution: http://sqlfiddle.com/#!18/ad68b7/24
SELECT
OtdYear,
OtdMonth,
ActiveCount
FROM
(
-- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth == RowsInMonth )
SELECT
OnThisDate,
OtdYear,
OtdMonth,
ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
ActiveCount
FROM
(
SELECT
OnThisDate,
YEAR( OnThisDate ) AS OtdYear,
MONTH( OnThisDate ) AS OtdMonth,
SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
FROM
(
SELECT
StartDate AS [OnThisDate],
1 AS [Change]
FROM
tbl
UNION ALL
SELECT
ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate],
-1 AS [Change]
FROM
tbl
) AS sq1
) AS sq2
) AS sq3
WHERE
RowInMonth = RowsInMonth
ORDER BY
OtdYear,
OtdMonth
This query can be flattened into fewer nested queries by using aggregate and window functions directly instead of using aliases (like OtdYear, ActiveCount, etc) but that would make the query much harder to understand.
I have created the query which will give the result of all the months starting from the minimum start date in the table till maximum end date.
You can change it using adding one condition in WHERE clause.
-- table creation
CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE)
-- data creation
INSERT INTO ACTIVE_USERS
SELECT * FROM
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
)
-- data in the actual table
SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;
CUST_NUM START_DATE END_DATE
---------- ---------- ----------
1 2018-01-01 2019-01-01
2 2018-01-01
3 2019-01-01 2019-06-01
4 2017-01-01 2019-03-01
Query to fetch desired result
WITH CTE ( START_DATE, END_DATE ) AS
(
SELECT
ADD_MONTHS( START_DATE, LEVEL - 1 ),
ADD_MONTHS( START_DATE, LEVEL ) - 1
FROM
(
SELECT
MIN( START_DATE ) AS START_DATE,
MAX( END_DATE ) AS END_DATE
FROM
ACTIVE_USERS
)
CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
)
--
--
SELECT
C.START_DATE,
COUNT(1) AS CNT
FROM
CTE C
JOIN ACTIVE_USERS D ON
(
C.END_DATE BETWEEN
D.START_DATE
AND
CASE
WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
ELSE C.END_DATE
END
)
GROUP BY
C.START_DATE
ORDER BY
C.START_DATE;
-- output --
START_DATE CNT
---------- ----------
2017-01-01 1
2017-02-01 1
2017-03-01 1
2017-04-01 1
2017-05-01 1
2017-06-01 1
2017-07-01 1
2017-08-01 1
2017-09-01 1
2017-10-01 1
2017-11-01 1
START_DATE CNT
---------- ----------
2017-12-01 1
2018-01-01 3
2018-02-01 3
2018-03-01 3
2018-04-01 3
2018-05-01 3
2018-06-01 3
2018-07-01 3
2018-08-01 3
2018-09-01 3
2018-10-01 3
START_DATE CNT
---------- ----------
2018-11-01 3
2018-12-01 3
2019-01-01 3
2019-02-01 3
2019-03-01 2
2019-04-01 2
2019-05-01 2
2019-06-01 1
30 rows selected.
Cheers!!
Trying to find an efficient way of achieving the results in table B below based on data from table A. Is there an efficient way (i.e. not resource hungry) of doing so assuming one has millions of such records in table A? Please note ID 1 has an end date of 12/31/2199 (not a typo), and we only list the income for each ID during the months of 09/2016 to 12/2016. Also note that ID 3 has two incomes in the month of 11/2016, with 600 representing the November income (since that's the income the ID had at the end of November 2016 month). As for IDs that started in say November 2016, their rows would be missing for Sept 16 and Oct 16 since they did not exist pre-November.
Table A:
ID INCOME EFFECTIVE_DATE END_DATE
1 700 07/01/2016 12/31/2199
2 500 08/20/2016 12/31/2017
3 600 11/11/2016 02/28/2017
3 100 09/01/2016 11/10/2016
4 400 11/21/2016 12/31/2016
Table B (Intended results):
ID INCOME MONTH
1 700 12/2016
1 700 11/2016
1 700 10/2016
1 700 09/2016
2 500 12/2016
2 500 11/2016
2 500 10/2016
2 500 09/2016
3 600 12/2016
3 600 11/2016
3 100 10/2016
3 100 09/2016
4 400 12/2016
4 400 11/2016
RESOLVED I used the answer provided by #mathguy below and it worked like a charm -- learned something new in this process: pivot and unpivot. Also thanks to #MTO (and everyone else) for taking the time to help.
Here is a solution that uses each row from the base table just once, and does not require joins, group by, etc. It uses the unpivot operation, available since Oracle 11.1, which is not an expensive operation.
with
table_a ( id, income, effective_date, end_date ) as (
select 1, 700, date '2016-07-01', date '2199-12-31' from dual union all
select 2, 500, date '2016-08-20', date '2017-12-31' from dual union all
select 3, 600, date '2016-11-11', date '2017-02-28' from dual union all
select 3, 100, date '2016-09-01', date '2016-11-10' from dual union all
select 4, 400, date '2016-11-21', date '2016-12-31' from dual
)
-- end of test data (not part of the solution): SQL query begins BELOW THIS LINE
select id, income, mth
from (
select id,
case when date '2016-09-30' between effective_date and end_date
then income end as sep16,
case when date '2016-10-31' between effective_date and end_date
then income end as oct16,
case when date '2016-11-30' between effective_date and end_date
then income end as nov16,
case when date '2016-12-31' between effective_date and end_date
then income end as dec16
from table_a
)
unpivot ( income for mth in ( sep16 as '09/2016', oct16 as '10/2016', nov16 as '11/2016',
dec16 as '12/2016' )
)
order by id, mth desc
;
Output:
ID INCOME MTH
-- ------ -------
1 700 12/2016
1 700 11/2016
1 700 10/2016
1 700 09/2016
2 500 12/2016
2 500 11/2016
2 500 10/2016
2 500 09/2016
3 600 12/2016
3 600 11/2016
3 100 10/2016
3 100 09/2016
4 400 12/2016
4 400 11/2016
14 rows selected.
A solution using a recursive sub-query factoring clause. This does not rely on hard-coding the bounds into the query as they can be passed as the bind variable :lower_bound and :upper_bound; in the example below they are set to DATE '2016-09-01' and DATE '2016-12-31' respectively.
Query:
WITH months ( id, income, month, end_dt ) AS (
SELECT id,
income,
CAST( TRUNC( GREATEST( a.effective_date, :lower_bound ), 'MM' ) AS DATE ),
LEAST( a.end_date, :upper_bound )
FROM TableA a
WHERE :lower_bound <= a.end_date
AND a.effective_date <= :upper_bound
UNION ALL
SELECT id,
income,
CAST( ADD_MONTHS( month, 1 ) AS DATE ),
end_dt
FROM months
WHERE ADD_MONTHS( month, 1 ) <= end_dt
)
SELECT id,
income,
LAST_DAY( month ) AS month
FROM months
WHERE LAST_DAY( month ) <= end_dt
ORDER BY id, month;
Output:
ID INCOME MONTH
-- ------ ----------
1 700 2016-09-30
1 700 2016-10-31
1 700 2016-11-30
1 700 2016-12-31
2 500 2016-09-30
2 500 2016-10-31
2 500 2016-11-30
2 500 2016-12-31
3 100 2016-09-30
3 100 2016-10-31
3 600 2016-11-30
3 600 2016-12-31
4 400 2016-11-30
4 400 2016-12-31