I have an ASSIGNMENT table like this:
EMPLID | RCD | COMPANY | EFFDT | SALARY
---------------------------------------------------
100 | 0 | xyz | 1/1/2000 | 1000
100 | 0 | xyz | 1/15/2000 | 1100
100 | 0 | xyz | 1/31/2000 | 1200
100 | 0 | ggg | 2/15/2000 | 1500
100 | 1 | abc | 3/1/2000 | 2000
100 | 1 | abc | 4/1/2000 | 2100
I need a counter that increases whenever the RCD and COMPANY combination changes, with the rows ordered by EFFDT:
EMPLID | RCD | COMPANY | EFFDT | SALARY | COUNTER
-------|-----|---------|---------------|-------------|----------
100 | 0 | xyz | 1/1/2000 | 1000 | 1
100 | 0 | xyz | 1/15/2000 | 1100 | 1
100 | 0 | xyz | 1/31/2000 | 1200 | 1
100 | 0 | ggg | 2/15/2000 | 1500 | 2
100 | 1 | abc | 3/1/2000 | 2000 | 3
100 | 1 | abc | 4/1/2000 | 2100 | 3
I tried the DENSE_RANK function ordered by EMPLID, RCD, COMPANY. It gives me a counter, but the counter does not follow EFFDT order:
SELECT EMPLID,RCD,COMPANY,EFFDT,
DENSE_RANK() over (order by EMPLID , RCD , COMPANY) AS COUNTER
FROM ASSIGNMENT ;
Ordering by EFFDT instead gives an incrementing counter 1 ... 6:
SELECT EMPLID,RCD,COMPANY,EFFDT,
DENSE_RANK() over (order by EFFDT) AS COUNTER
FROM ASSIGNMENT;
Kindly help me to find out what I am missing.
Try LAG
WITH flagged AS (
  SELECT t.*,
         -- 0 when the previous row (by EFFDT) has the same RCD and COMPANY, otherwise 1
         CASE WHEN LAG(RCD) OVER(PARTITION BY EMPLID ORDER BY EFFDT) = RCD
               AND LAG(COMPANY) OVER(PARTITION BY EMPLID ORDER BY EFFDT) = COMPANY THEN 0 ELSE 1 END strtFlag
  FROM tbl t
)
-- the running sum of the start flags is the counter
SELECT EMPLID, RCD, COMPANY, EFFDT, SALARY,
       SUM(strtFlag) OVER(PARTITION BY EMPLID ORDER BY EFFDT) COUNTER
FROM flagged
Alternatively, apply DENSE_RANK() to a group number:
WITH grps AS (
  SELECT t.*,
         ROW_NUMBER() OVER(PARTITION BY EMPLID ORDER BY EFFDT) -
         ROW_NUMBER() OVER(PARTITION BY EMPLID, RCD, COMPANY ORDER BY EFFDT) grp
  FROM tbl t
)
SELECT EMPLID, RCD, COMPANY, EFFDT, SALARY
, DENSE_RANK() OVER(PARTITION BY EMPLID ORDER BY grp) COUNTER
FROM grps
Either way, it looks like two steps are needed to get dense numbering.
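To see why a change flag followed by a running sum gives the requested counter, here is a minimal Python trace of the same logic on the question's sample rows. It is only an illustration of the idea, not a replacement for the SQL; the function name `change_counters` and the tuple layout are assumptions made for this sketch.

```python
# Sample rows from the question: (emplid, rcd, company, effdt as ISO string).
rows = [
    (100, 0, "xyz", "2000-01-01"),
    (100, 0, "xyz", "2000-01-15"),
    (100, 0, "xyz", "2000-01-31"),
    (100, 0, "ggg", "2000-02-15"),
    (100, 1, "abc", "2000-03-01"),
    (100, 1, "abc", "2000-04-01"),
]


def change_counters(rows):
    """Return (row, counter) pairs, mirroring the LAG flag + running SUM.

    The counter restarts per emplid and goes up by one whenever
    (rcd, company) differs from the previous row in effdt order.
    """
    previous = {}  # emplid -> (rcd, company) of the previous row
    counter = {}   # emplid -> running sum of the start flags
    result = []
    # ISO date strings sort chronologically, so a plain sort is enough here.
    for row in sorted(rows, key=lambda r: (r[0], r[3])):
        emplid, rcd, company, _ = row
        # The first row of an emplid has no predecessor, so its flag is 1.
        flag = 0 if previous.get(emplid) == (rcd, company) else 1
        counter[emplid] = counter.get(emplid, 0) + flag
        previous[emplid] = (rcd, company)
        result.append((row, counter[emplid]))
    return result


counters = [c for _, c in change_counters(rows)]
print(counters)
```

Note that with this approach a combination that comes back later, after a different one, starts a new counter value rather than reusing its old one.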
This should work, with the clarification that a combination of rcd and company keeps the same "counter" even if it appears in non-consecutive periods. I added two more rows to the test data to make sure I get the correct result.
Like Serg's solutions (which answer a different question), the solution does one pass over the base data, and then a second pass over the results of the first pass (all in memory, so it should be relatively fast). There's no way around that - this requires two different analytic functions where one depends on the results of the other, and nested analytic functions are not allowed. (This part of the answer addresses a comment by the OP to the Answer by Serg.)
with
test_data ( emplid, rcd, company, effdt, salary ) as (
select 100, 0, 'xyz', to_date('1/1/2000' , 'mm/dd/yyyy'), 1000 from dual union all
select 100, 0, 'xyz', to_date('1/15/2000', 'mm/dd/yyyy'), 1100 from dual union all
select 100, 0, 'xyz', to_date('1/31/2000', 'mm/dd/yyyy'), 1200 from dual union all
select 100, 0, 'ggg', to_date('2/15/2000', 'mm/dd/yyyy'), 1500 from dual union all
select 100, 1, 'abc', to_date('3/1/2000' , 'mm/dd/yyyy'), 2000 from dual union all
select 100, 1, 'abc', to_date('4/1/2000' , 'mm/dd/yyyy'), 2100 from dual union all
select 100, 0, 'xyz', to_date('5/1/2000' , 'mm/dd/yyyy'), 2200 from dual union all
select 100, 1, 'ggg', to_date('8/15/2000', 'mm/dd/yyyy'), 2300 from dual
)
-- end of test data; the actual solution (SQL query) begins below this line
select emplid, rcd, company, effdt, salary,
dense_rank() over (partition by emplid order by min_dt) as counter
from ( select emplid, rcd, company, effdt, salary,
min(effdt) over (partition by emplid, rcd, company) as min_dt
from test_data )
order by effdt -- ORDER BY is optional
;
EMPLID RCD COM EFFDT SALARY COUNTER
---------- ---------- --- ------------------- ---------- ----------
100 0 xyz 2000-01-01 00:00:00 1000 1
100 0 xyz 2000-01-15 00:00:00 1100 1
100 0 xyz 2000-01-31 00:00:00 1200 1
100 0 ggg 2000-02-15 00:00:00 1500 2
100 1 abc 2000-03-01 00:00:00 2000 3
100 1 abc 2000-04-01 00:00:00 2100 3
100 0 xyz 2000-05-01 00:00:00 2200 1
100 1 ggg 2000-08-15 00:00:00 2300 4
8 rows selected
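The two passes described above can be traced in plain Python as a rough sketch: first find the earliest effdt for each (emplid, rcd, company) combination, then dense-rank those earliest dates within each emplid. The function name `first_seen_counters` and the tuple layout are assumptions for this illustration.

```python
from datetime import date

# Test data from the answer: (emplid, rcd, company, effdt).
rows = [
    (100, 0, "xyz", date(2000, 1, 1)),
    (100, 0, "xyz", date(2000, 1, 15)),
    (100, 0, "xyz", date(2000, 1, 31)),
    (100, 0, "ggg", date(2000, 2, 15)),
    (100, 1, "abc", date(2000, 3, 1)),
    (100, 1, "abc", date(2000, 4, 1)),
    (100, 0, "xyz", date(2000, 5, 1)),
    (100, 1, "ggg", date(2000, 8, 15)),
]


def first_seen_counters(rows):
    """Return (row, counter) pairs ordered by effdt."""
    # Pass 1: min(effdt) per (emplid, rcd, company).
    min_dt = {}
    for emplid, rcd, company, effdt in rows:
        key = (emplid, rcd, company)
        min_dt[key] = min(effdt, min_dt.get(key, effdt))

    # Pass 2: dense rank of those minimum dates within each emplid.
    ranks = {}
    for emplid in {key[0] for key in min_dt}:
        distinct = sorted({d for key, d in min_dt.items() if key[0] == emplid})
        ranks[emplid] = {d: i for i, d in enumerate(distinct, start=1)}

    ordered = sorted(rows, key=lambda r: r[3])
    return [(r, ranks[r[0]][min_dt[r[:3]]]) for r in ordered]


counters = [c for _, c in first_seen_counters(rows)]
print(counters)
```

The row dated 2000-05-01 gets counter 1 again because its combination was first seen on 2000-01-01, which is the behaviour this answer aims for.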
I think you're looking for:
SELECT EMPLID,RCD,COMPANY,EFFDT,
DENSE_RANK() over (order by EMPLID , RCD , COMPANY) AS COUNTER
FROM (select * from ASSIGNMENT order by EFFDT);
or
SELECT EMPLID,RCD,COMPANY,EFFDT,
DENSE_RANK() over (order by EMPLID , RCD , COMPANY) AS COUNTER
FROM (select * from ASSIGNMENT order by EMPLID , RCD , COMPANY, EFFDT);
I have a report that needs to list activity where total is >= 150 over 3 consecutive days.
Let's say I've created a temp table foo, to summarize daily totals.
| ID | Day | Total |
| -- | ---------- | ----- |
| 01 | 2020-01-01 | 10 |
| 01 | 2020-01-02 | 50 |
| 01 | 2020-01-03 | 50 |
| 01 | 2020-01-04 | 50 |
| 01 | 2020-01-05 | 20 |
| 02 | 2020-01-01 | 10 |
| 02 | 2020-01-02 | 10 |
| 02 | 2020-01-03 | 10 |
| 02 | 2020-01-04 | 10 |
| 02 | 2020-01-05 | 10 |
How would I write SQL to return ID 01, but not 02?
Example Result:
| ID |
| -- |
| 01 |
I suspect that you want window functions here:
select distinct id
from (
select
t.*,
sum(total) over(partition by id order by day rows between 2 preceding and current row) sum_total,
count(*) over(partition by id order by day rows between 2 preceding and current row) cnt
from mytable t
) t
where cnt = 3 and sum_total >= 150
This gives you the ids that have a total greater than the given threshold over 3 consecutive days - which is how I understood your question.
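As a rough illustration of what the `rows between 2 preceding and current row` frame computes, the Python sketch below slides a three-row window over each id's daily totals. Like the ROWS frame, it assumes exactly one row per day with no missing days; the function name `ids_over_threshold` and its parameters are made up for this example.

```python
from collections import defaultdict

# Daily totals from the question: (id, day as ISO string, total).
daily = [
    ("01", "2020-01-01", 10), ("01", "2020-01-02", 50),
    ("01", "2020-01-03", 50), ("01", "2020-01-04", 50),
    ("01", "2020-01-05", 20),
    ("02", "2020-01-01", 10), ("02", "2020-01-02", 10),
    ("02", "2020-01-03", 10), ("02", "2020-01-04", 10),
    ("02", "2020-01-05", 10),
]


def ids_over_threshold(daily, window=3, threshold=150):
    """Return the ids that have `window` consecutive rows summing to >= threshold."""
    totals_by_id = defaultdict(list)
    # Sorting the tuples orders them by id, then by day.
    for id_, _day, total in sorted(daily):
        totals_by_id[id_].append(total)

    hits = []
    for id_, totals in totals_by_id.items():
        # Starting at index window - 1 plays the role of the cnt = 3 filter:
        # only full windows are considered.
        for i in range(window - 1, len(totals)):
            if sum(totals[i - window + 1 : i + 1]) >= threshold:
                hits.append(id_)
                break
    return sorted(hits)


result = ids_over_threshold(daily)
print(result)
```

For id 01 the full windows sum to 110, 150 and 120, so the second window qualifies; every window for id 02 sums to 30.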
If you just want to output the rows that have 3 consecutive days with a sum >= 150, you can use an analytic function to determine the moving total across each 3 day period per id, and then find the aggregate max value of the moving total per id, returning the id where it's >= 150.
E.g.:
WITH your_table AS (SELECT 1 ID, to_date('01/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 1 ID, to_date('02/01/2020', 'dd/mm/yyyy') dy, 50 total FROM dual UNION ALL
SELECT 1 ID, to_date('03/01/2020', 'dd/mm/yyyy') dy, 50 total FROM dual UNION ALL
SELECT 1 ID, to_date('04/01/2020', 'dd/mm/yyyy') dy, 50 total FROM dual UNION ALL
SELECT 1 ID, to_date('05/01/2020', 'dd/mm/yyyy') dy, 20 total FROM dual UNION ALL
SELECT 2 ID, to_date('01/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 2 ID, to_date('02/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 2 ID, to_date('03/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 2 ID, to_date('04/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 2 ID, to_date('05/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual),
moving_sums AS (SELECT ID,
dy,
total,
SUM(total) OVER (PARTITION BY ID ORDER BY dy RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) moving_sum
FROM your_table)
SELECT ID
FROM moving_sums
GROUP BY ID
HAVING MAX(moving_sum) >= 150;
ID
----------
1
You can use a HAVING clause with GROUP BY ID to list the desired ID values:
SELECT ID
FROM foo
GROUP BY ID
HAVING COUNT( distinct day )>=3 AND SUM( NVL(Total,0) ) >= 150
Use this if you are to specify dates
WITH foo( ID, Day, Total ) AS
(
SELECT '01', date'2020-01-01' , 10 FROM dual
UNION ALL SELECT '01', date'2020-01-02' , 50 FROM dual
UNION ALL SELECT '01', date'2020-01-03' , 50 FROM dual
UNION ALL SELECT '01', date'2020-01-04' , 50 FROM dual
UNION ALL SELECT '01', date'2020-01-05' , 20 FROM dual
UNION ALL SELECT '02', date'2020-01-01' , 10 FROM dual
UNION ALL SELECT '02', date'2020-01-02' , 10 FROM dual
UNION ALL SELECT '02', date'2020-01-03' , 10 FROM dual
UNION ALL SELECT '02', date'2020-01-04' , 10 FROM dual
UNION ALL SELECT '02', date'2020-01-05' , 10 FROM dual
)SELECT
ID
FROM foo
WHERE day BETWEEN TO_DATE('2020-01-01', 'YYYY-MM-DD' ) AND TO_DATE('2020-01-04', 'YYYY-MM-DD' )
GROUP BY ID HAVING SUM(Total) >= 150;
RESULT:
ID|
--|
01|
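Maybe you can try something like this:
SELECT
*
FROM foo
-- date literals must be written as DATE '...'; a bare 2020-01-01 is evaluated as arithmetic
WHERE day BETWEEN DATE '2020-01-01' AND DATE '2020-01-04'
AND total > 150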
I am unable to group by the date part of a timestamp column in the query below:
CHG_TABLE
+----+--------+----------------+-----------------+-------+-----------+
| Key|Seq_Num | Start_Date | End_Date | Value |Record_Type|
+----+--------+----------------+-----------------+-------+-----------+
| 1 | 1 | 5/25/2019 2.05 | 12/31/9999 00.00| 800 | Insert |
| 1 | 1 | 5/25/2019 2.05 | 5/31/2019 11.12 | 800 | Update |
| 1 | 2 | 5/31/2019 11.12| 12/31/9999 00.00| 900 | Insert |
| 1 | 2 | 5/31/2019 11.12| 6/15/2019 12.05 | 900 | Update |
| 1 | 3 | 6/15/2019 12.05| 12/31/9999 00.00| 1000 | Insert |
| 1 | 3 | 6/15/2019 12.05| 6/25/2019 10.20 | 1000 | Update |
+----+--------+----------------+-----------------+-------+-----------+
RESULT:
+-----+------------------+----------------+-----------+----------+
| Key | Month_Start_Date | Month_End_Date |Begin_Value|End_Value |
+-----+------------------+----------------+-----------+----------+
| 1 | 6/1/2019 | 6/30/2019 | 1700 | 1000 |
| 1 | 7/1/2019 | 7/31/2019 | 1000 | 1000 |
+-----+------------------+----------------+-----------+----------+
Begin_Value : Sum(Value) for Max(Start_Date) < Month_Start_Date -> Should pick up latest date from last month
End_Value : Sum(Value) for Max(Start_Date) <= Month_End_Date -> Should pick up the latest date
SELECT k.key,
dd.month_start_date,
dd.month_end_date,
gendata.value first_value,
gendata.next_value last_value
FROM dim_date dd CROSS JOIN dim_person k
JOIN (SELECT ct.key,
dateadd('day',1,last_day(ct.start_date)) start_date ,
SUM(ct.value),
lead(SUM(ct.value)) OVER(ORDER BY ct.start_date) next_value
FROM (SELECT key,to_char(start_Date,'MM-YYYY') MMYYYY, max(start_Date) start_date
FROM CHG_TABLE
GROUP BY to_char(start_Date,'MM-YYYY'), key
) dt JOIN CHG_TABLE ct ON
dt.start_date = ct.start_date AND
dt.key = ct.key
group by ct.key, to_char(start_Date,'MM-YYYY')
) gendata ON
to_char(dd.month_end_date,'MM-YYYY') = to_char(to_char(start_Date,'MM-YYYY')) AND
k.key = gendata.key;
Error:
start_Date is not a valid group by expression
Related post:
Monthly Snapshot using Date Dimension
I hope I understood your question correctly.
You can try the query below:
WITH chg_table ( key, seq_num, start_date, end_date, value, record_type ) AS
(
SELECT 1,1,TO_DATE('5/25/2019 2.05','MM/DD/YYYY HH24.MI'),TO_DATE('12/31/9999 00.00','MM/DD/YYYY HH24.MI'), 800, 'Insert' FROM DUAL UNION ALL
SELECT 1,1,TO_DATE('5/25/2019 2.05','MM/DD/YYYY HH24.MI'),TO_DATE('5/31/2019 11.12','MM/DD/YYYY HH24.MI'), 800, 'Update' FROM DUAL UNION ALL
SELECT 1,2,TO_DATE('5/31/2019 11.12','MM/DD/YYYY HH24.MI'),TO_DATE('12/31/9999 00.00','MM/DD/YYYY HH24.MI'), 900, 'Insert' FROM DUAL UNION ALL
SELECT 1,2,TO_DATE('5/31/2019 11.12','MM/DD/YYYY HH24.MI'),TO_DATE('6/15/2019 12.05','MM/DD/YYYY HH24.MI'), 900, 'Update' FROM DUAL UNION ALL
SELECT 1,3,TO_DATE('6/15/2019 12.05','MM/DD/YYYY HH24.MI'),TO_DATE('12/31/9999 00.00','MM/DD/YYYY HH24.MI'), 1000, 'Insert' FROM DUAL UNION ALL
SELECT 1,3,TO_DATE('6/15/2019 12.05','MM/DD/YYYY HH24.MI'),TO_DATE('6/25/2019 10.20','MM/DD/YYYY HH24.MI'), 1000, 'Update' FROM DUAL
)
select key , new_start_date Month_Start_Date , new_end_date Month_End_Date , begin_value ,
nvl(lead(begin_value) over(order by new_start_date),begin_value) end_value
from
(
select key , new_start_date , new_end_date , sum(value) begin_value
from
(
select key, seq_num, start_date
, value, record_type ,
trunc(add_months(start_date,1),'month') new_start_date ,
trunc(add_months(start_date,2),'month')-1 new_end_date
from chg_table
where record_type = 'Insert'
)
group by key , new_start_date , new_end_date
)
order by new_start_date
;
Db Fiddle link: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=c77a71afa82769b48f424e1c0fa1c0b6
I am assuming that you are getting an "ORA-00979: not a GROUP BY expression" and this is due to your use of the TO_CHAR(timestamp_col,'DD-MM-YYYY') in the GROUP BY clause.
Adding the TO_CHAR(timestamp_col,'DD-MM-YYYY') to the select side of your statement should resolve this and provide the results you are expecting.
a, b, dateadd('day',1,last_day(timestamp_col)) start_date, TO_CHAR(timestamp_col,'DD-MM-YYYY'), ...
I have a table with 200.000 rows in a SQL Server 2014 database looking like this:
CREATE TABLE DateRanges
(
Contract VARCHAR(8),
Sector VARCHAR(8),
StartDate DATE,
EndDate DATE
);
INSERT INTO DateRanges (Contract, Sector, StartDate, Enddate)
SELECT '111', '999', '01-01-2014', '03-31-2014'
union
SELECT '111', '999', '04-01-2014', '06-30-2014'
union
SELECT '111', '999', '07-01-2014', '09-30-2014'
union
SELECT '111', '999', '10-01-2014', '12-31-2014'
union
SELECT '111', '888', '08-01-2014', '08-31-2014'
union
SELECT '111', '777', '08-15-2014', '08-31-2014'
union
SELECT '222', '999', '01-01-2014', '03-31-2014'
union
SELECT '222', '999', '04-01-2014', '06-30-2014'
union
SELECT '222', '999', '07-01-2014', '09-30-2014'
union
SELECT '222', '999', '10-01-2014', '12-31-2014'
union
SELECT '222', '666', '11-01-2014', '11-30-2014'
UNION
SELECT '222', '555', '11-15-2014', '11-30-2014';
As you can see, there can be multiple overlaps for each contract. The result I would like looks like this:
Contract Sector StartDate EndDate
---------------------------------------------
111 999 01-01-2014 07-31-2014
111 888 08-01-2014 08-14-2014
111 777 08-15-2014 08-31-2014
111 999 09-01-2014 12-31-2014
222 999 01-01-2014 10-31-2014
222 666 11-01-2014 11-14-2014
222 555 11-15-2014 11-30-2014
222 999 12-01-2014 12-31-2014
I cannot figure out how this can be done, and the examples I have seen on this site do not quite fit my problem.
This answer makes use of a few different techniques. The first is a recursive-cte that creates a table with every relevant cal_date which then gets cross apply'd with unique Contract values to get every combination of both values. The second is window-functions such as lag and row_number to determine a variety of things detailed in the comments below. Lastly, and probably most importantly, gaps-and-islands to determine when one Contract/Sector combination ends and the next begins.
Answer:
--determine range of dates
declare @bgn_dt date = (select min(StartDate) from DateRanges)
, @end_dt date = (select max(EndDate) from DateRanges)
--use a recursive CTE to create a record for each day / Contract
; with dates as
(
select @bgn_dt as cal_date
union all
select dateadd(d, 1, a.cal_date) as cal_date
from dates as a
where a.cal_date < @end_dt
)
select d.cal_date
, c.Contract
into #contract_dates
from dates as d
cross apply (select distinct Contract from DateRanges) as c
option (maxrecursion 0)
--Final Select
select f.Contract
, f.Sector
, min(f.cal_date) as StartDate
, max(f.cal_date) as EndDate
from (
--Use the sum-over to obtain the Island Numbers
select dr.Contract
, dr.Sector
, dr.cal_date
, sum(dr.IslandBegin) over (partition by dr.Contract order by dr.cal_date asc) as IslandNbr
from (
--Determine if the record is the start of a new Island
select a.Contract
, a.Sector
, a.cal_date
, case when lag(a.Sector, 1, NULL) over (partition by a.Contract order by a.cal_date asc) = a.Sector then 0 else 1 end as IslandBegin
from (
--Determine which Contract/Date combinations are valid, and rank the Sectors that are in effect
select cd.cal_date
, dr.Contract
, dr.Sector
, dr.EndDate
, row_number() over (partition by dr.Contract, cd.cal_date order by dr.StartDate desc) as ConractSectorRnk
from #contract_dates as cd
left join DateRanges as dr on cd.Contract = dr.Contract
and cd.cal_date between dr.StartDate and dr.EndDate
) as a
where a.ConractSectorRnk = 1
and a.Contract is not null
) as dr
) as f
group by f.Contract
, f.Sector
, f.IslandNbr
order by f.Contract asc
, min(f.cal_date) asc
Output:
+----------+--------+------------+------------+
| Contract | Sector | StartDate | EndDate |
+----------+--------+------------+------------+
| 111 | 999 | 2014-01-01 | 2014-07-31 |
| 111 | 888 | 2014-08-01 | 2014-08-14 |
| 111 | 777 | 2014-08-15 | 2014-08-31 |
| 111 | 999 | 2014-09-01 | 2014-12-31 |
| 222 | 999 | 2014-01-01 | 2014-10-31 |
| 222 | 666 | 2014-11-01 | 2014-11-14 |
| 222 | 555 | 2014-11-15 | 2014-11-30 |
| 222 | 999 | 2014-12-01 | 2014-12-31 |
+----------+--------+------------+------------+
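For readers who want to follow the logic outside the database, here is a minimal Python sketch of the same three steps for contract 111: expand the ranges to one row per day, let the range with the latest StartDate win on each day, then collapse consecutive days with the same Sector into islands. The function name `flatten` and the tuple layout are assumptions for this illustration, and it is not tuned for 200,000 rows.

```python
from datetime import date, timedelta
from itertools import groupby

# Contract 111 from the question: (contract, sector, start, end).
ranges = [
    ("111", "999", date(2014, 1, 1), date(2014, 3, 31)),
    ("111", "999", date(2014, 4, 1), date(2014, 6, 30)),
    ("111", "999", date(2014, 7, 1), date(2014, 9, 30)),
    ("111", "999", date(2014, 10, 1), date(2014, 12, 31)),
    ("111", "888", date(2014, 8, 1), date(2014, 8, 31)),
    ("111", "777", date(2014, 8, 15), date(2014, 8, 31)),
]


def flatten(ranges):
    """Return non-overlapping (contract, sector, start, end) rows."""
    out = []
    for contract in sorted({r[0] for r in ranges}):
        mine = [r for r in ranges if r[0] == contract]
        day = min(r[2] for r in mine)
        last = max(r[3] for r in mine)
        daily = []  # (day, winning sector), one entry per covered day
        while day <= last:
            active = [r for r in mine if r[2] <= day <= r[3]]
            if active:  # days covered by no range are skipped
                # Same rule as row_number() ... order by StartDate desc.
                winner = max(active, key=lambda r: r[2])
                daily.append((day, winner[1]))
            day += timedelta(days=1)
        # Islands: runs of consecutive entries with the same sector.
        for sector, run in groupby(daily, key=lambda d: d[1]):
            run = list(run)
            out.append((contract, sector, run[0][0], run[-1][0]))
    return out


flat = flatten(ranges)
for row in flat:
    print(row)
```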
I have a table with four columns : id,validFrom,validTo and price.
This table contains the price of an article and the duration when that price is effective.
| id| validFrom | validTo | price
|---|-----------|-----------|---------
| 1 | 01-01-17 | 10-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
Now, for these inputs in my table, my query output should be:
| id| validFrom | validTo | price
|---|-----------|----------|-------
| 1 | 01-01-17 | 03-01-17 | 30000
| 1 | 04-01-17 | 09-01-17 | 20000
| 1 | 10-01-17 | 10-01-17 | 30000
I can compare the dates and check if products with same id have overlapping dates but I have no idea how to split those dates into non-overlapping dates. Also I am not allowed to use PL/SQL.
Is this possible using only SQL?
Oracle Setup:
CREATE TABLE prices ( id, validFrom, validTo, price ) AS
SELECT 1, DATE '2017-01-01', DATE '2017-01-10', 30000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-04', DATE '2017-01-09', 20000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-11', DATE '2017-01-15', 10000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-16', DATE '2017-01-18', 15000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-17', DATE '2017-01-20', 40000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-21', DATE '2017-01-24', 28000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-23', DATE '2017-01-26', 23000 FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-26', DATE '2017-01-26', 17000 FROM DUAL;
Query:
WITH daily_prices ( id, dt, price, duration ) AS (
-- Unroll the price ranges to individual days
SELECT id,
d.COLUMN_VALUE,
price,
validTo - validFrom
FROM prices p,
TABLE(
CAST(
MULTISET(
SELECT p.validFrom + LEVEL - 1
FROM DUAL
CONNECT BY p.validFrom + LEVEL - 1 <= p.validTo
)
AS SYS.ODCIDATELIST
)
) d
),
min_daily_prices ( id, dt, price ) AS (
-- Where a day falls between multiple ranges group them so the price
-- is for the shortest duration offer and if there are two equally short
-- durations then take the minimum price
SELECT id,
dt,
MIN( price ) KEEP ( DENSE_RANK FIRST ORDER BY duration )
FROM daily_prices
GROUP BY id, dt
),
group_changes ( id, dt, price, has_changed_group ) AS (
-- Find when the price changes or a day is skipped which means a new price
-- group is beginning
SELECT id,
dt,
price,
CASE WHEN dt = LAG( dt ) OVER ( PARTITION BY id ORDER BY dt ) + 1
AND price = LAG( price ) OVER ( PARTITION BY id ORDER BY dt )
THEN 0
ELSE 1
END
FROM min_daily_prices
),
groups ( id, dt, price, grp ) AS (
-- Calculate unique indexes (per id) for each group of price ranges
SELECT id,
dt,
price,
SUM( has_changed_group ) OVER ( PARTITION BY id ORDER BY dt )
FROM group_changes
)
SELECT id,
MIN( dt ) AS validFrom,
MAX( dt ) AS validTo,
MIN( price ) AS price
FROM groups
GROUP BY id, grp
ORDER BY id, validFrom;
Output:
ID VALIDFROM VALIDTO PRICE
---------- -------------------- -------------------- ----------
1 01-JAN-2017 00:00:00 03-JAN-2017 00:00:00 30000
1 04-JAN-2017 00:00:00 09-JAN-2017 00:00:00 20000
1 10-JAN-2017 00:00:00 10-JAN-2017 00:00:00 30000
1 11-JAN-2017 00:00:00 15-JAN-2017 00:00:00 10000
1 16-JAN-2017 00:00:00 18-JAN-2017 00:00:00 15000
1 19-JAN-2017 00:00:00 20-JAN-2017 00:00:00 40000
1 21-JAN-2017 00:00:00 22-JAN-2017 00:00:00 28000
1 23-JAN-2017 00:00:00 25-JAN-2017 00:00:00 23000
1 26-JAN-2017 00:00:00 26-JAN-2017 00:00:00 17000
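The steps in the query above (unroll to days, keep the price of the shortest range with the lowest price as tie-breaker, then start a new group on a price change or a skipped day) can be traced with a short Python sketch. It is only meant to make the logic easier to follow; the function name `split_prices` and the tuple layout are assumptions for this example, and it handles a single id.

```python
from datetime import date, timedelta

# Price ranges for id 1 from the setup above: (valid_from, valid_to, price).
prices = [
    (date(2017, 1, 1), date(2017, 1, 10), 30000),
    (date(2017, 1, 4), date(2017, 1, 9), 20000),
    (date(2017, 1, 11), date(2017, 1, 15), 10000),
    (date(2017, 1, 16), date(2017, 1, 18), 15000),
    (date(2017, 1, 17), date(2017, 1, 20), 40000),
    (date(2017, 1, 21), date(2017, 1, 24), 28000),
    (date(2017, 1, 23), date(2017, 1, 26), 23000),
    (date(2017, 1, 26), date(2017, 1, 26), 17000),
]


def split_prices(ranges):
    """Return non-overlapping (valid_from, valid_to, price) rows."""
    daily = {}  # day -> (duration, price) of the winning range
    for start, end, price in ranges:
        candidate = (end - start).days, price
        day = start
        while day <= end:
            # Tuple comparison: shortest duration first, then lowest price.
            if day not in daily or candidate < daily[day]:
                daily[day] = candidate
            day += timedelta(days=1)

    result = []
    for day in sorted(daily):
        price = daily[day][1]
        same_price = bool(result) and result[-1][2] == price
        adjacent = bool(result) and result[-1][1] + timedelta(days=1) == day
        if same_price and adjacent:
            result[-1] = (result[-1][0], day, price)  # extend current group
        else:
            result.append((day, day, price))          # start a new group
    return result


split = split_prices(prices)
for row in split:
    print(row)
```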
I have a database of IDs with income and start/end dates as below but I have trouble breaking the income per ID per month for the given start/end date range.
A sample of the table data is below:
ID | INCOME | START_DATE | END_DATE
1 | 2000 | 02/01/2016 | 05/31/2016
1 | 1500 | 12/01/2015 | 01/31/2016
2 | 1000 | 01/01/2016 | 04/30/2016
The outcome should be:
ID | INCOME | MONTH
1 | 2000 | 05/2016
1 | 2000 | 04/2016
1 | 2000 | 03/2016
1 | 2000 | 02/2016
1 | 1500 | 01/2016
1 | 1500 | 12/2015
2 | 1000 | 04/2016
2 | 1000 | 03/2016
2 | 1000 | 02/2016
2 | 1000 | 01/2016
How would I write the Oracle SQL such that I am able to produce the above outcome efficiently (assuming the table has thousands of unique IDs)?
You can do this using connect by, like so:
with sample_data as (select 1 id, 2000 income, to_date('01/02/2016', 'dd/mm/yyyy') start_date, to_date('31/05/2016', 'dd/mm/yyyy') end_date from dual union all
select 1 id, 1500 income, to_date('01/12/2015', 'dd/mm/yyyy') start_date, to_date('31/01/2016', 'dd/mm/yyyy') end_date from dual union all
select 2 id, 1000 income, to_date('01/01/2016', 'dd/mm/yyyy') start_date, to_date('30/04/2016', 'dd/mm/yyyy') end_date from dual)
select id,
income,
add_months(trunc(start_date, 'mm'), -1 + level) mnth
from sample_data
connect by prior id = id
and prior income = income
and prior sys_guid() is not null
and add_months(trunc(start_date, 'mm'), -1 + level) <= trunc(end_date, 'mm')
order by id, income desc, mnth desc;
ID INCOME MNTH
---------- ---------- ---------
1 2000 01-MAY-16
1 2000 01-APR-16
1 2000 01-MAR-16
1 2000 01-FEB-16
1 1500 01-JAN-16
1 1500 01-DEC-15
2 1000 01-APR-16
2 1000 01-MAR-16
2 1000 01-FEB-16
2 1000 01-JAN-16
You could use recursive subquery factoring, if you're on 11gR2 or higher:
with r (id, income, this_date, end_date) as (
select id, income, trunc(start_date, 'MM'), trunc(end_date, 'MM')
from your_table
union all
select id, income, this_date + interval '1' month, end_date
from r
where end_date > this_date
)
select id, income, to_char(this_date, 'MM/YYYY') as month
from r
order by id, this_date desc;
ID INCOME MONTH
---------- ---------- -------
1 2000 05/2016
1 2000 04/2016
1 2000 03/2016
1 2000 02/2016
1 1500 01/2016
1 1500 12/2015
2 1000 04/2016
2 1000 03/2016
2 1000 02/2016
2 1000 01/2016
The anchor member gets the starting information - which I'm truncating to the start of the month, probably redundantly, but just in case one starts late enough in the month to cause a problem with interval addition. The recursive member then keeps adding a month to each existing member until it reaches the end date.
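The anchor-then-add-a-month idea can be sketched in Python as follows. This is just an illustration of the iteration, with the generator name `months_between` and the function name `expand` invented for the example; it truncates both dates to the first of the month, as the anchor member does.

```python
from datetime import date

# Sample data from the question: (id, income, start_date, end_date).
incomes = [
    (1, 2000, date(2016, 2, 1), date(2016, 5, 31)),
    (1, 1500, date(2015, 12, 1), date(2016, 1, 31)),
    (2, 1000, date(2016, 1, 1), date(2016, 4, 30)),
]


def months_between(start, end):
    """Yield the first day of each month from start's month to end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def expand(incomes):
    """Return (id, income, 'MM/YYYY') rows, ordered by id, newest month first."""
    rows = [
        (id_, income, m)
        for id_, income, start, end in incomes
        for m in months_between(start, end)
    ]
    rows.sort(key=lambda r: r[2], reverse=True)  # newest month first
    rows.sort(key=lambda r: r[0])                # stable sort keeps month order
    return [(id_, income, m.strftime("%m/%Y")) for id_, income, m in rows]


expanded = expand(incomes)
for row in expanded:
    print(row)
```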