Extreme values within each group of dataset - abap

I have an SQLScript query written in AMDP which creates two new columns source_contract and target_contract.
RETURN SELECT client as client,
pob_id as pob_id,
dateto as change_to,
datefrom as change_from,
cast( cast( substring( cast( datefrom as char( 8 ) ), 1,4 ) as NUMBER ( 4 ) ) as INT )
as change_year,
cast( CONCAT( '0' , ( substring( cast( datefrom as char( 8 ) ), 5,2 ) ) ) as VARCHAR (3))
as change_period,
LAG( contract_id, 1, '00000000000000' ) OVER ( PARTITION BY pob_id ORDER BY pob_id, datefrom )
as source_contract,
contract_id as target_contract
from farr_d_pob_his
ORDER BY pob_id
Original data:
POB Valid To Valid From Contract
257147 05.04.2018 05.04.2018 10002718
257147 29.05.2018 06.04.2018 10002719
257147 31.12.9999 30.05.2018 10002239
Data from AMDP view:
I want to ignore any intermediate rows (the date is the criterion that decides the order). Any suggestions or ideas?
I thought of using GROUP BY to get the max date and min date and using a UNION on these entries in a separate consumption view, but with GROUP BY we can't fetch the other columns. The other possibility is ORDER BY date, but that is not available in CDS.

You already have the optimal solution with sub-selects.
Pseudo code:
SELECT *
FROM OriginalData
WHERE (POB, ValidFrom)
IN (SELECT POB, MIN(ValidFrom)
FROM OriginalData
GROUP BY POB)
OR (POB, ValidTo)
IN (SELECT POB, MAX(ValidTo)
FROM OriginalData
GROUP BY POB);
GROUP BY won't work as it "mixes up" the minimums in different columns.
A nice touch might be extracting the sub-selects into views of their own, e.g. EarliestContractPerPob and LatestContractPerPob.
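The pseudo code above can be run almost verbatim. Here is a sketch using SQLite via Python (the table name and sample rows are invented to mirror the question's data; row-value `IN` needs SQLite 3.15+):

```python
import sqlite3

# Demo of the sub-select approach; table and sample rows mirror the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE OriginalData (POB TEXT, ValidTo TEXT, ValidFrom TEXT, Contract TEXT)")
con.executemany("INSERT INTO OriginalData VALUES (?, ?, ?, ?)", [
    ("257147", "2018-04-05", "2018-04-05", "10002718"),
    ("257147", "2018-05-29", "2018-04-06", "10002719"),
    ("257147", "9999-12-31", "2018-05-30", "10002239"),
])

# Keep only the earliest-starting and the latest-ending row per POB.
rows = con.execute("""
    SELECT POB, Contract
    FROM OriginalData
    WHERE (POB, ValidFrom) IN (SELECT POB, MIN(ValidFrom) FROM OriginalData GROUP BY POB)
       OR (POB, ValidTo)   IN (SELECT POB, MAX(ValidTo)   FROM OriginalData GROUP BY POB)
    ORDER BY ValidFrom
""").fetchall()
```

The intermediate row (contract 10002719) is filtered out because it holds neither the minimum ValidFrom nor the maximum ValidTo of its POB.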

Here is a proof of concept for your task.
Suppose we have a dataset pre-selected by material type (MTART), based on table mara, which is quite similar to yours:
------------------------------------------------
| MATNR | ERSDA | VPSTA |MTART|
------------------------------------------------
| 17000000007|18.06.2018|KEDBXCZ |ZSHD |
| 17000000008|21.06.2018|K |ZSHD |
| 17000000011|21.06.2018|K |ZSHD |
| 17000000023|22.06.2018|KEDCBGXZLV|ZSHD |
| 17000000103|09.01.2019|K |ZSHD |
| 17000000104|09.01.2019|K |ZSHD |
| 17000000105|09.01.2019|K |ZSHD |
| 17000000113|06.02.2019|V |ZSHD |
------------------------------------------------
Here are the materials; we want to keep only the last and the first material (MATNR) by creation date (ERSDA) and find the maintenance type (VPSTA) for each:
------------------------------------------------
| MATNR | ERSDA | VPSTA |MTART|
------------------------------------------------
| 17000000007|18.06.2018|KEDBXCZ |ZSHD |
| 17000000113|06.02.2019|V |ZSHD |
------------------------------------------------
In your case you similarly search, within each POB (analogous to mtart), for the source and target contract_id (analogous to the first and last vpsta) based on the datefrom criterion (analogous to ersda).
One can achieve that using UNION and two selects with sub-queries:
SELECT ersda AS date, matnr AS max, mtart AS type, vpsta AS maint
FROM mara AS m
WHERE ersda = ( SELECT MAX( ersda ) FROM mara WHERE mtart = m~mtart )
UNION SELECT ersda AS date, matnr AS max, mtart AS type, vpsta AS maint
FROM mara AS m2
WHERE ersda = ( SELECT MIN( ersda ) FROM mara WHERE mtart = m2~mtart )
ORDER BY type, date
INTO TABLE @DATA(lt_result).
Here you can notice that the first SELECT fetches the max ersda dates and the second SELECT fetches the min ones.
The resulting set, ordered by type and date, will be roughly what you are looking for (F = first, L = last):
Your SELECT should look somewhat like this:
SELECT datefrom as change_from, contract_id AS contract, pob_id AS pob
FROM farr_d_pob_his AS farr
WHERE datefrom = ( SELECT MAX( datefrom ) FROM farr_d_pob_his WHERE pob_id = farr~pob_id )
UNION SELECT datefrom as change_from, contract_id AS contract, pob_id AS pob
FROM farr_d_pob_his AS farr2
WHERE datefrom = ( SELECT MIN( datefrom ) FROM farr_d_pob_his WHERE pob_id = farr2~pob_id )
ORDER BY pob, change_from
INTO TABLE @DATA(lt_result).
Note that this will work only if the datefrom dates are unique; otherwise the query will not know which last/first contract you want to use. Also, if there is only one contract within a POB, there will be only one record.
A couple of words about implementation. In your sample I see that you use an AMDP class, but later you mention that ORDER BY is not supported by CDS. Yes, it is not supported in CDS, and neither are sub-queries, but both are supported in AMDP.
You should differentiate two types of AMDP functions: functions for AMDP methods and functions for CDS table functions. The first ones handle SELECTs with sorting and sub-queries perfectly well. You can view the samples in the CL_DEMO_AMDP_VS_OPEN_SQL demo class, which demonstrates AMDP features including sub-queries. You can put your code in an AMDP function and call it from your CDS table function implementation.
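For readers without a HANA/ABAP system at hand, the UNION-of-max/min pattern from this answer can be sketched in plain SQL with SQLite (the ABAP `~` alias separator becomes `.`; sample rows abridged from the mara listing above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mara (matnr TEXT, ersda TEXT, vpsta TEXT, mtart TEXT)")
con.executemany("INSERT INTO mara VALUES (?, ?, ?, ?)", [
    ("17000000007", "2018-06-18", "KEDBXCZ",    "ZSHD"),
    ("17000000008", "2018-06-21", "K",          "ZSHD"),
    ("17000000023", "2018-06-22", "KEDCBGXZLV", "ZSHD"),
    ("17000000113", "2019-02-06", "V",          "ZSHD"),
])

# First SELECT fetches the max ersda per mtart, the second fetches the min.
rows = con.execute("""
    SELECT ersda AS date, matnr, mtart AS type, vpsta AS maint
    FROM mara AS m
    WHERE ersda = (SELECT MAX(ersda) FROM mara WHERE mtart = m.mtart)
    UNION
    SELECT ersda, matnr, mtart, vpsta
    FROM mara AS m2
    WHERE ersda = (SELECT MIN(ersda) FROM mara WHERE mtart = m2.mtart)
    ORDER BY type, date
""").fetchall()
```

Only the first (17000000007) and the last (17000000113) material per mtart survive, as in the expected output table.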

Related

How to create an observation with 0 in the column

I am using the code below to get the quarterly wages for individuals from 2010Q1-2020Q4. If an individual did not work in a particular quarter they do not have an observation for that quarter. Instead, I would like for there to be an observation but have the quarterly wage be 0. For example,
What is currently happening:
| MPI | Quarter| Wage|
|PersonA|2010Q1 | 100 |
|PersonA|2010Q2 | 100 |
|PersonA|2010Q3 | 100 |
|PersonB|2010Q1 | 100 |
Desired output
| MPI | Quarter| Wage|
|PersonA|2010Q1 | 100 |
|PersonA|2010Q2 | 100 |
|PersonA|2010Q3 | 100 |
|PersonA|2010Q4 | 0 |
|PersonB|2010Q1 | 100 |
|PersonB|2010Q2 | 0 |
|PersonB|2010Q3 | 0 |
|PersonB|2010Q4 | 0 |
ws_data AS (
SELECT
MASTER_PERSON_INDEX AS mpi
,SUBSTR(cast(wg.naics as string), 1, 2) AS NAICS_2
,SUBSTR(cast(wg.yrqtr as string), 0,5) AS quarter
,wg.yrqtr
,wg.employer
,wg.wages
,SUBSTR(cast(wg.yrqtr as string), 0,4) AS YEAR
FROM
( SELECT
*
FROM
`ws.ws_ui_wage_records_di` wsui
WHERE
wsui.MASTER_PERSON_INDEX IN (SELECT mpi FROM rc_table_ra16_all_grads_1b)
AND
wsui.yrqtr IN (20101, 20102, 20103, 20104,
20111, 20112, 20113, 20114,
20121, 20122, 20123, 20124,
20131, 20132, 20133, 20134,
20141, 20142, 20143, 20144,
20151, 20152, 20153, 20154,
20161, 20162, 20163, 20164,
20171, 20172, 20173, 20174,
20181, 20182, 20183, 20184,
20191, 20192, 20193, 20194,
20201, 20202, 20203, 20204)
)wg
),
ws_agg AS (
SELECT
mpi
-- ,STATS_MODE(NAICS_2) AS NAICS_2
-- ,STATS_MODE(NAICS_DESC) AS NAICS_DESC
,quarter
,SUM(wages) AS wages_quart
FROM
ws_data
GROUP BY
mpi, quarter
),
ws_annot AS (
SELECT
dagg.*
,row_number() OVER(PARTITION BY dagg.mpi, cast(wages_quart as string) ORDER BY dagg.wages_quart DESC)AS rn
FROM
ws_agg dagg
)
Try using this data to create a CTE at the top as a Quarter table, and then use that as the starting point in your main FROM statement. You should be able to replace the original code I copied from (the wg WHERE clause) with that top CTE as well.
(20101, 20102, 20103, 20104,
20111, 20112, 20113, 20114,
20121, 20122, 20123, 20124,
20131, 20132, 20133, 20134,
20141, 20142, 20143, 20144,
20151, 20152, 20153, 20154,
20161, 20162, 20163, 20164,
20171, 20172, 20173, 20174,
20181, 20182, 20183, 20184,
20191, 20192, 20193, 20194,
20201, 20202, 20203, 20204)
Your db may have a DateDimension table with quarters in it that you could use as well.
Since you want all quarters, and all individuals, one way to achieve this is to start with building all individual-quarter combinations in your data and use that as a 'driver' in a left join; like this:
select
Pers.MID
, Qtr.Quarter
, coalesce(W.Wage,0) as Wage
, ...
from
(select distinct MPI from YourTable) Pers
cross join
(select distinct Quarter from DateDimensionTable) Qtr
left join
YourTable W
on w.MPI=Pers.MPI
and w.Quarter=Qtr.Quarter
If your table has all the periods you are interested in, you can use YourTable instead of DateDimensionTable. But if it doesn't (and I guess that can't be guaranteed), you can use a Date/Calendar table here if you have one, or you can dynamically generate the quarters between the min and max quarter in YourTable (just search for those terms). You can also hardcode them as you have in your query (as JBontje recommended).
If a combination is missing from YourTable, the Wage for that combo will be NULL; you can use coalesce to treat it as zero.
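A minimal runnable version of the driver-table idea, using SQLite via Python (`wages` and `quarters` are invented stand-ins for YourTable and DateDimensionTable):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE wages (MPI TEXT, Quarter TEXT, Wage INTEGER)")
con.executemany("INSERT INTO wages VALUES (?, ?, ?)", [
    ("PersonA", "2010Q1", 100), ("PersonA", "2010Q2", 100),
    ("PersonA", "2010Q3", 100), ("PersonB", "2010Q1", 100),
])
con.execute("CREATE TABLE quarters (Quarter TEXT)")
con.executemany("INSERT INTO quarters VALUES (?)",
                [("2010Q1",), ("2010Q2",), ("2010Q3",), ("2010Q4",)])

# Build every person-quarter combination, then left-join the actual wages;
# missing combinations come back NULL and are coalesced to 0.
rows = con.execute("""
    SELECT Pers.MPI, Qtr.Quarter, COALESCE(W.Wage, 0) AS Wage
    FROM (SELECT DISTINCT MPI FROM wages) AS Pers
    CROSS JOIN quarters AS Qtr
    LEFT JOIN wages AS W
           ON W.MPI = Pers.MPI AND W.Quarter = Qtr.Quarter
    ORDER BY Pers.MPI, Qtr.Quarter
""").fetchall()
```

Every person gets all four quarters, with zero wages filled in for the quarters they did not work, matching the desired output.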

Two dimensional comparison in sql

DB schema
CREATE TABLE newsletter_status
(
cryptid varchar(255) NOT NULL,
status varchar(25),
regDat timestamp,
confirmDat timestamp,
updateDat timestamp,
deleteDat timestamp
);
There are rows with the same cryptid, and I need to squash them into one row, so that the cryptid becomes effectively unique. The complexity comes from the fact that I need to compare dates across rows as well as across columns. How can I implement that?
The rule I need to use is:
status should be taken from the row with the latest timestamp (among all 4 dates)
for every date column I need to select the latest date
Example:
002bc5 | new | 2010.01.15 | 2001.01.15 | NULL | 2020.01.10
002bc5 | confirmed | NULL | 2020.01.30 | 2020.01.15 | 2020.01.15
002bc5 | deactivated | NULL | NULL | NULL | 2020.12.03
needs to be squashed into:
002bc5 | deactivated | 2010.01.15 | 2020.01.30 | 2020.01.15 | 2020.12.03
The status deactivated is taken because the timestamp 2020.12.03 is the latest
To get the status, you need to sort the rowset by the dates in descending order. Oracle has agg_func(<arg>) keep (dense_rank first ...); in other databases it can be replaced with row_number() and a filter. Because analytic functions in HANA sometimes perform poorly, I suggest a little trick using the only aggregate function I know in HANA that supports ordering inside: STRING_AGG. Provided you do not have thousands of status rows per cryptid (i.e. the concatenated statuses stay under the 4000-character varchar limit), it will work. This is the query:
select
cryptid,
max(regDat) as regDat,
max(confirmDat) as confirmDat,
max(updateDat) as updateDat,
max(deleteDat) as deleteDat,
substr_before(
string_agg(status, '|'
order by greatest(
ifnull(regDat, date '1000-01-01'),
ifnull(confirmDat, date '1000-01-01'),
ifnull(updateDat, date '1000-01-01'),
ifnull(deleteDat, date '1000-01-01')
) desc),
'|'
) as status
from newsletter_status
group by cryptid
You can use aggregation:
select cryptid,
       coalesce(max(case when status = 'deactivated' then status end),
                max(case when status = 'confirmed' then status end),
                max(case when status = 'new' then status end)
       ) as status,
max(regDat),
max(confirmDat),
max(updateDat),
max(deleteDat)
from newsletter_status
group by cryptid;
The coalesce()s are a trick to get the statuses in priority order.
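The priority trick can be checked quickly with SQLite (note the hidden assumption: every status value in the data must appear in the CASE list, otherwise it is silently dropped):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE newsletter_status
               (cryptid TEXT, status TEXT, regDat TEXT,
                confirmDat TEXT, updateDat TEXT, deleteDat TEXT)""")
con.executemany("INSERT INTO newsletter_status VALUES (?, ?, ?, ?, ?, ?)", [
    ("002bc5", "new",         "2010-01-15", "2001-01-15", None,         "2020-01-10"),
    ("002bc5", "confirmed",   None,         "2020-01-30", "2020-01-15", "2020-01-15"),
    ("002bc5", "deactivated", None,         None,         None,         "2020-12-03"),
])

# Each MAX(CASE ...) is either the status name or NULL; COALESCE then
# picks the first non-NULL in priority order.
row = con.execute("""
    SELECT cryptid,
           COALESCE(MAX(CASE WHEN status = 'deactivated' THEN status END),
                    MAX(CASE WHEN status = 'confirmed'   THEN status END),
                    MAX(CASE WHEN status = 'new'         THEN status END)) AS status,
           MAX(regDat), MAX(confirmDat), MAX(updateDat), MAX(deleteDat)
    FROM newsletter_status
    GROUP BY cryptid
""").fetchone()
```

On this sample the priority order happens to agree with the latest-timestamp rule, but the two rules differ in general; the EDIT below handles the timestamp-based version.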
EDIT:
If you just want the row with the latest timestamp:
select cryptid,
max(case when seqnum = 1 then status end) as status_on_max_date,
max(regDat),
max(confirmDat),
max(updateDat),
max(deleteDat)
from (select ns.*,
row_number() over (partition by cryptid
                   order by greatest(coalesce(regDat, '2000-01-01'),
                                     coalesce(confirmDat, '2000-01-01'),
                                     coalesce(updateDat, '2000-01-01'),
                                     coalesce(deleteDat, '2000-01-01')
                                     ) desc
                   ) as seqnum
from newsletter_status ns
) ns
group by cryptid;
I would start by ranking the rows of each cryptid by the greatest value of the date columns, latest first. Then we can use that information to identify the latest status per cryptid, and aggregate:
select cryptid,
max(case when rn = 1 then status end) as status,
max(regDat) as regDat,
max(confirmDat) as confirmDat,
max(updateDat) as updateDat,
max(deleteDat) as deleteDat
from (
select ns.*,
row_number() over(
partition by cryptid
order by greatest(
coalesce(regDat, '0001-01-01'),
coalesce(confirmDat, '0001-01-01'),
coalesce(updateDat, '0001-01-01'),
coalesce(deleteDat, '0001-01-01')
) desc
) rn
from newsletter_status ns
) ns
group by cryptid
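This ranking approach translates almost verbatim to SQLite, which is handy for verifying it: SQLite's multi-argument scalar MAX plays the role of GREATEST, and the dates are ISO strings so they compare correctly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE newsletter_status
               (cryptid TEXT, status TEXT, regDat TEXT,
                confirmDat TEXT, updateDat TEXT, deleteDat TEXT)""")
con.executemany("INSERT INTO newsletter_status VALUES (?, ?, ?, ?, ?, ?)", [
    ("002bc5", "new",         "2010-01-15", "2001-01-15", None,         "2020-01-10"),
    ("002bc5", "confirmed",   None,         "2020-01-30", "2020-01-15", "2020-01-15"),
    ("002bc5", "deactivated", None,         None,         None,         "2020-12-03"),
])

# rn = 1 marks the row whose greatest date is the latest; its status wins.
row = con.execute("""
    SELECT cryptid,
           MAX(CASE WHEN rn = 1 THEN status END) AS status,
           MAX(regDat), MAX(confirmDat), MAX(updateDat), MAX(deleteDat)
    FROM (
        SELECT ns.*,
               ROW_NUMBER() OVER (
                   PARTITION BY cryptid
                   ORDER BY MAX(COALESCE(regDat,     '0001-01-01'),
                                COALESCE(confirmDat, '0001-01-01'),
                                COALESCE(updateDat,  '0001-01-01'),
                                COALESCE(deleteDat,  '0001-01-01')) DESC
               ) AS rn
        FROM newsletter_status AS ns
    )
    GROUP BY cryptid
""").fetchone()
```

The result matches the expected squashed row: deactivated wins because its only date, 2020-12-03, is the latest across all rows and columns.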

logic to create more rows in sql

I have a table1 that I wanted to transform into the expected table.
Expected table logic for columns:
cal: comes from cal of table1. ID comes from the ID of table1.
code: this is populated with lp or fp. If we have a value in f_a, we create a new record with fp as the code; for that record we take the date from f_a and put it in the Al column for the same ID, and if f_pl is populated we take its date and put it in the pl column.
If the code is lp, we check whether l_a is populated; if yes, we take that date and place it in Al for that code and ID. We also check lpl; if it is populated, we take that date and put it in pl.
I am just a starter with SQL so it is a bit overwhelming for me on how to get it started. Please post some solutions.
table1:
ID f_a l_a f_pl lpl cal
CNT 6/20/2018 6/28/2018 6/28/2018 1/31/2020
expected output:
ID Cal code pl Al
CNT 1/31/2020 lp 6/28/2018 6/28/2018
CNT 1/31/2020 fp 6/20/2018
Update:
I have more IDs in the table, so it is not that CNT is the only Id. If I use unpivot then it should follow the same logic for all IDs.
This is a question about how to unpivot columns to rows. In Oracle, I would recommend a lateral join:
select t.id, t.cal, x.*
from mytable t
cross apply (
select 'lp' as code, t.lpl as pl, l_a as al from dual
union all
select 'fp', t.f_pl, t.f_a from dual
) x
This syntax is available in Oracle 12.1 onwards. In earlier versions, you would use union all:
select id, cal, 'lp' as code, lpl as pl, l_a as al from mytable
union all
select id, cal, 'fp', f_pl, f_a from mytable
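The union all variant can be executed against the question's sample row; here is a SQLite sketch (column names follow the question, including lpl without an underscore):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (ID TEXT, f_a TEXT, l_a TEXT, f_pl TEXT, lpl TEXT, cal TEXT)")
con.execute("""INSERT INTO table1
               VALUES ('CNT', '2018-06-20', '2018-06-28', NULL, '2018-06-28', '2020-01-31')""")

# One branch per code value: lp reads (lpl, l_a), fp reads (f_pl, f_a).
rows = con.execute("""
    SELECT ID, cal, 'lp' AS code, lpl AS pl, l_a AS al FROM table1
    UNION ALL
    SELECT ID, cal, 'fp', f_pl, f_a FROM table1
""").fetchall()
```

Each source row fans out into two rows, one per code, reproducing the expected output (the fp row has a NULL pl because f_pl is empty).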
You can use UNPIVOT for multiple columns then do the checks you need on dates:
with a as (
select
'CNT' as ID,
date '2018-06-20' as f_a,
date '2018-06-28' as l_a,
cast(null as date) as f_pl,
date '2018-06-28' as l_pl,
date '2020-01-31' as cal
from dual
)
select *
from a
unpivot(
(pl, al) for code in ((l_pl, l_a) as 'lp', (f_pl, f_a) as 'fp')
) up
ID  | CAL       | CODE | PL        | AL
CNT | 31-JAN-20 | lp   | 28-JUN-18 | 28-JUN-18
CNT | 31-JAN-20 | fp   |           | 20-JUN-18
Working example here.
Please try this script, which is not version dependent:
-- Here we select columns from the source table. Please change the names if they are different
with r as (
select
ID,
f_a,
l_a,
f_pl,
lpl, -- Not sure if the example is wrong or the column definition has no underscore
cal
from table_1 -- Please put real table name here
)
select * from (
select r.id, r.cal, 'lp' as code, r.lpl as pl, r.l_a as al
from r
where r.l_a is not null
union all
select r1.id, r1.cal, 'fp', r1.f_pl, r1.f_a
from r r1
where r1.f_a is not null
)
order by id, cal, code;

datediff for row that meets my condition only once per row

I want to do a datediff between 2 dates on different rows only if the rows have a condition.
my table looks like the following, with additional columns (like guid)
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | with this
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
With this example I would like to have 2 rows in my selection, representing the difference between the dates of ids 5-3 and ids 2-1.
As of now I have a query that gives me the difference between the dates of ids 5-3, 5-1 and 2-1:
with t as (
SELECT TOP (100000)
*
FROM mydatatable
order by CreateDateAndTime desc)
select
DATEDIFF(SECOND, f.CreateDateAndTime, s.CreateDateAndTime) time
from t f
join t s on (f.[guid] = s.[guid] )
where f.condition like '%I need to compare this state%'
and s.condition like '%with this%'
and (f.id - s.id) < 0
My problem is I cannot set f.id - s.id to a value since other rows can be between the ones I want to make the diff on.
How can I make the datediff only on the first rows that meet my conditions?
EDIT : To make it more clear
My condition is an eventname and I want to calculate the time between the occurence of my event 1 and my event 2 and fill a column named time for example.
@Salman A's answer is really close to what I want, except it does not work when my event 2 does not occur (which was not in my initial example),
i.e. in a table like the following, it will compute the datediff between row id 5 and row id 2:
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | state 3
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
the code I modified :
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id desc ) AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT *
,DATEDIFF(second, currdate, prevdate) time
FROM cte
WHERE condition = 'I need to compare this state'
and DATEDIFF(second, currdate, prevdate) != 0
order by id desc
Perhaps you want to match ids with the nearest smaller id. You can use window functions for this:
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, CASE WHEN LAG(condition) OVER (PARTITION BY guid ORDER BY id) = 'with this'
       THEN LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id)
  END AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT *
, DATEDIFF(second, currdate, prevdate)
FROM cte
WHERE condition = 'I need to compare this state'
The CASE expression pairs each 'I need to compare this state' row with the immediately preceding 'with this' row. If the pair does not match, it returns NULL.
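The corrected CTE can be exercised in SQLite (no guid column here, so the PARTITION BY is dropped; seconds come from strftime('%s', ...) since SQLite has no DATEDIFF):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, CreateDateAndTime TEXT, condition TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2018-12-11 12:07:55.273", "with this"),
    (2, "2018-12-11 12:07:53.550", "I need to compare this state"),
    (3, "2018-12-11 12:07:53.550", "with this"),
    (4, "2018-12-11 12:06:40.780", "state 3"),
    (5, "2018-12-11 12:06:39.317", "I need to compare this state"),
])

# prevdate is filled only when the directly preceding kept row is 'with this'.
rows = con.execute("""
    WITH cte AS (
        SELECT id,
               CreateDateAndTime AS currdate,
               CASE WHEN LAG(condition) OVER (ORDER BY id) = 'with this'
                    THEN LAG(CreateDateAndTime) OVER (ORDER BY id)
               END AS prevdate,
               condition
        FROM t
        WHERE condition IN ('I need to compare this state', 'with this')
    )
    SELECT id,
           CAST(strftime('%s', prevdate) AS INTEGER)
         - CAST(strftime('%s', currdate) AS INTEGER) AS seconds
    FROM cte
    WHERE condition = 'I need to compare this state'
    ORDER BY id
""").fetchall()
```

Only the pairs 2-1 and 5-3 survive: the window functions run after the WHERE filter, so the 'state 3' rows never come between a compare row and its 'with this' partner.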
try by using analytic function lead()
with cte as
(
select 1 as id, '2018-12-11 12:07:55.273' as CreateDateAndTime,'with this' as condition union all
select 2,'2018-12-11 12:07:53.550','I need to compare this state' union all
select 3,'2018-12-11 12:07:53.550','with this' union all
select 4,'2018-12-11 12:06:40.780','state 3' union all
select 5,'2018-12-11 12:06:39.317','I need to compare this state'
) select *,
DATEDIFF(SECOND,CreateDateAndTime,lead(CreateDateAndTime) over(order by Id))
from cte
where condition in ('with this','I need to compare this state')
You ideally want LEADIF/LAGIF functions, because you are looking for the previous row where condition = 'with this'. Since there are no LEADIF/LAGIF, I think the best option is to use OUTER/CROSS APPLY with TOP 1, e.g.
CREATE TABLE #T (Id INT, CreateDateAndTime DATETIME, condition VARCHAR(28));
INSERT INTO #T (Id, CreateDateAndTime, condition)
VALUES
(1, '2018-12-11 12:07:55', 'with this'),
(2, '2018-12-11 12:07:53', 'I need to compare this state'),
(3, '2018-12-11 12:07:53', 'with this'),
(4, '2018-12-11 12:06:40', 'state 3'),
(5, '2018-12-11 12:06:39', 'I need to compare this state');
SELECT ID1 = t1.ID,
Date1 = t1.CreateDateAndTime,
ID2 = t2.ID,
Date2 = t2.CreateDateAndTime,
Difference = DATEDIFF(SECOND, t1.CreateDateAndTime, t2.CreateDateAndTime)
FROM #T AS t1
CROSS APPLY
( SELECT TOP 1 t2.CreateDateAndTime, t2.ID
FROM #T AS t2
WHERE t2.Condition = 'with this'
AND t2.CreateDateAndTime > t1.CreateDateAndTime
--AND t2.GUID = t.GUID
ORDER BY CreateDateAndTime
) AS t2
WHERE t1.Condition = 'I need to compare this state';
Which Gives:
ID1 Date1 ID2 Date2 Difference
-------------------------------------------------------------------------------
2 2018-12-11 12:07:53.000 1 2018-12-11 12:07:55.000 2
5 2018-12-11 12:06:39.000 3 2018-12-11 12:07:53.000 74
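Databases without CROSS APPLY can express the same nearest-following-row lookup with a correlated subquery; a SQLite sketch of that idea (the guid join is omitted, as in the commented-out line above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (Id INTEGER, CreateDateAndTime TEXT, condition TEXT)")
con.executemany("INSERT INTO T VALUES (?, ?, ?)", [
    (1, "2018-12-11 12:07:55", "with this"),
    (2, "2018-12-11 12:07:53", "I need to compare this state"),
    (3, "2018-12-11 12:07:53", "with this"),
    (4, "2018-12-11 12:06:40", "state 3"),
    (5, "2018-12-11 12:06:39", "I need to compare this state"),
])

# For each 'I need to compare' row, find the earliest later 'with this' date
# and take the difference in seconds.
rows = con.execute("""
    SELECT t1.Id,
           CAST(strftime('%s', (SELECT MIN(t2.CreateDateAndTime)
                                FROM T AS t2
                                WHERE t2.condition = 'with this'
                                  AND t2.CreateDateAndTime > t1.CreateDateAndTime)) AS INTEGER)
         - CAST(strftime('%s', t1.CreateDateAndTime) AS INTEGER) AS Difference
    FROM T AS t1
    WHERE t1.condition = 'I need to compare this state'
    ORDER BY t1.Id
""").fetchall()
```

The MIN(...) correlated subquery plays the role of TOP 1 ... ORDER BY CreateDateAndTime, giving the same 2-second and 74-second differences as the CROSS APPLY version.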
I would enumerate the values and then use window functions for the difference.
select min(id), max(id),
datediff(second, min(CreateDateAndTime), max(CreateDateAndTime)) as seconds
from (select t.*,
row_number() over (partition by condition order by CreateDateAndTime) as seqnum
from t
where condition in ('I need to compare this state', 'with this')
) t
group by seqnum;
I cannot tell what you want the results to look like. This version only outputs the differences, along with the ids of the rows you care about. The difference can also be applied to the original rows, rather than put into summary rows.

How to write Oracle query to find a total length of possible overlapping from-to dates

I'm struggling to find the query for the following task
I have the following data and want to find the total network day for each unique ID
ID From To NetworkDay
1 03-Sep-12 07-Sep-12 5
1 03-Sep-12 04-Sep-12 2
1 05-Sep-12 06-Sep-12 2
1 06-Sep-12 12-Sep-12 5
1 31-Aug-12 04-Sep-12 3
2 04-Sep-12 06-Sep-12 3
2 11-Sep-12 13-Sep-12 3
2 05-Sep-12 08-Sep-12 3
Problem is the date range can be overlapping and I can't come up with SQL that will give me the following results
ID From To NetworkDay
1 31-Aug-12 12-Sep-12 9
2 04-Sep-12 08-Sep-12 4
2 11-Sep-12 13-Sep-12 3
and then
ID Total Network Day
1 9
2 7
In case the network day calculation is not possible just get to the second table would be sufficient.
Hope my question is clear
We can use Oracle analytics, namely the "OVER ... PARTITION BY" clause, to do this. The PARTITION BY clause is kind of like a GROUP BY, but without the aggregation part. That means we can group rows together (i.e. partition them) and then perform an operation on them as separate groups. As we operate on each row, we can access the columns of the previous row. (PARTITION BY is not related to partitioning a table for performance.)
So then how do we output the non-overlapping dates? We first order the query on the (ID, DFROM) fields, then use the ID field to make our partitions (row groups). We then test the previous row's TO value and the current row's FROM value for overlap, using an expression like this (in pseudo code):
max(previous.DTO, current.DFROM) as DFROM
This basic expression will return the original DFROM value if it doesn't overlap, but will return the previous TO value if there is overlap. Since our rows are ordered, we only need to be concerned with the last row. In cases where a previous row completely overlaps the current row, we want the row to have a 'zero' date range. So we do the same thing for the DTO field to get:
max(previous.DTO, current.DFROM) as DFROM, max(previous.DTO, current.DTO) as DTO
Once we have generated the new results set with the adjusted DFROM and DTO values, we can aggregate them up and count the range intervals of DFROM and DTO.
Be aware that most date calculations in databases are not inclusive, unlike your data. So something like DATEDIFF(dto, dfrom) will not include the day dto actually refers to, which is why we adjust dto up a day first.
I don't have access to an Oracle server anymore, but I know this is possible with Oracle analytics. The query should go something like this:
(Please update my post if you get this to work.)
SELECT id,
greatest(dfrom, nvl(max(dto) OVER (PARTITION BY id ORDER BY dfrom
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), dfrom)) as dfrom,
greatest(dto, nvl(max(dto) OVER (PARTITION BY id ORDER BY dfrom
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), dto)) as dto
from (
select id, dfrom, dto+1 as dto from my_sample -- adjust the table so that dto becomes non-inclusive
) sample;
The secret here is the max(dto) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) expression, which returns the largest end date among the rows before the current one (GREATEST is the scalar maximum, and nvl handles the first row of each partition, which has no previous value).
So this query should output new dfrom/dto values which dont overlap. It's then a simple matter of sub-querying this doing (dto-dfrom) and sum the totals.
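The running-previous-maximum idea works in any database with window functions. Here is a SQLite sketch that merges the overlapping ranges of the sample data (dates stay inclusive, so a new group starts only when dfrom is strictly after everything seen so far):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sample (id INTEGER, dfrom TEXT, dto TEXT)")
con.executemany("INSERT INTO sample VALUES (?, ?, ?)", [
    (1, "2012-09-03", "2012-09-07"), (1, "2012-09-03", "2012-09-04"),
    (1, "2012-09-05", "2012-09-06"), (1, "2012-09-06", "2012-09-12"),
    (1, "2012-08-31", "2012-09-04"), (2, "2012-09-04", "2012-09-06"),
    (2, "2012-09-11", "2012-09-13"), (2, "2012-09-05", "2012-09-08"),
])

# prev_max = latest end date among earlier rows; a gap starts a new island.
rows = con.execute("""
    WITH ordered AS (
        SELECT id, dfrom, dto,
               MAX(dto) OVER (PARTITION BY id ORDER BY dfrom, dto
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_max
        FROM sample
    ),
    flagged AS (
        SELECT *, CASE WHEN prev_max IS NULL OR dfrom > prev_max THEN 1 ELSE 0 END AS new_grp
        FROM ordered
    ),
    grouped AS (
        SELECT *, SUM(new_grp) OVER (PARTITION BY id ORDER BY dfrom, dto) AS grp
        FROM flagged
    )
    SELECT id, MIN(dfrom) AS dfrom, MAX(dto) AS dto
    FROM grouped
    GROUP BY id, grp
    ORDER BY id, dfrom
""").fetchall()
```

This reproduces the intermediate result the question asks for: one merged range for ID 1 and two for ID 2. Counting weekdays within each merged range would then be a separate step.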
Using MySQL
I did have access to a MySQL server, so I got it working there. MySQL doesn't have result-set partitioning (analytics) like Oracle, so we have to use result-set variables. This means we use #var := xxx expressions to remember the last date value and adjust dfrom/dto accordingly. Same algorithm, just a little longer and more complex syntax. We also have to forget the last date value whenever the ID field changes!
So here is the sample table (same values you have):
create table sample(id int, dfrom date, dto date, networkDay int);
insert into sample values
(1,'2012-09-03','2012-09-07',5),
(1,'2012-09-03','2012-09-04',2),
(1,'2012-09-05','2012-09-06',2),
(1,'2012-09-06','2012-09-12',5),
(1,'2012-08-31','2012-09-04',3),
(2,'2012-09-04','2012-09-06',3),
(2,'2012-09-11','2012-09-13',3),
(2,'2012-09-05','2012-09-08',3);
On to the query, we output the un-grouped result set like above:
The variable #ldt is "last date" and the variable #lid is "last id". Any time #lid changes, we reset #ldt to null. FYI, in MySQL the := operator is where the assignment happens; a plain = operator is just equality.
This is a 3-level query, but it could be reduced to 2. I went with an extra outer query to keep things more readable. The innermost query simply adjusts the dto column to be non-inclusive and does the proper row ordering. The middle query adjusts the dfrom/dto values to make them non-overlapping. The outer query drops the unused fields and calculates the interval range.
set #ldt=null, #lid=null;
select id, no_dfrom as dfrom, no_dto as dto, datediff(no_dto, no_dfrom) as days from (
select if(#lid=id,#ldt,#ldt:=null) as last, dfrom, dto, if(#ldt>=dfrom,#ldt,dfrom) as no_dfrom, if(#ldt>=dto,#ldt,dto) as no_dto, #ldt:=if(#ldt>=dto,#ldt,dto), #lid:=id as id,
datediff(dto, dfrom) as overlapped_days
from (select id, dfrom, dto + INTERVAL 1 DAY as dto from sample order by id, dfrom) as sample
) as nonoverlapped
order by id, dfrom;
The above query gives the results (notice dfrom/dto are non-overlapping here):
+------+------------+------------+------+
| id | dfrom | dto | days |
+------+------------+------------+------+
| 1 | 2012-08-31 | 2012-09-05 | 5 |
| 1 | 2012-09-05 | 2012-09-08 | 3 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-13 | 5 |
| 2 | 2012-09-04 | 2012-09-07 | 3 |
| 2 | 2012-09-07 | 2012-09-09 | 2 |
| 2 | 2012-09-11 | 2012-09-14 | 3 |
+------+------------+------------+------+
How about constructing a SQL query which merges intervals by removing holes and considering only maximal intervals? It goes like this (not tested):
SELECT DISTINCT F.ID, F.From, L.To
FROM Temp AS F, Temp AS L
WHERE F.From < L.To AND F.ID = L.ID
AND NOT EXISTS (SELECT *
FROM Temp AS T
WHERE T.ID = F.ID
AND F.From < T.From AND T.From < L.To
AND NOT EXISTS ( SELECT *
FROM Temp AS T1
WHERE T1.ID = F.ID
AND T1.From < T.From
AND T.From <= T1.To)
)
AND NOT EXISTS (SELECT *
FROM Temp AS T2
WHERE T2.ID = F.ID
AND (
(T2.From < F.From AND F.From <= T2.To)
OR (T2.From < L.To AND L.To < T2.To)
)
)
with t_data as (
select 1 as id,
to_date('03-sep-12','dd-mon-yy') as start_date,
to_date('07-sep-12','dd-mon-yy') as end_date from dual
union all
select 1,
to_date('03-sep-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('05-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('06-sep-12','dd-mon-yy'),
to_date('12-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('31-aug-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('04-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('11-sep-12','dd-mon-yy'),
to_date('13-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('05-sep-12','dd-mon-yy'),
to_date('08-sep-12','dd-mon-yy') from dual
),
t_holidays as (
select to_date('01-jan-12','dd-mon-yy') as holiday
from dual
),
t_data_rn as (
select rownum as rn, t_data.* from t_data
),
t_model as (
select distinct id,
start_date
from t_data_rn
model
partition by (rn, id)
dimension by (0 as i)
measures(start_date, end_date)
rules
( start_date[for i
from 1
to end_date[0]-start_date[0]
increment 1] = start_date[0] + cv(i),
end_date[any] = start_date[cv()] + 1
)
order by 1,2
),
t_network_days as (
select t_model.*,
case when
mod(to_char(start_date, 'j'), 7) + 1 in (6, 7)
or t_holidays.holiday is not null
then 0 else 1
end as working_day
from t_model
left outer join t_holidays
on t_holidays.holiday = t_model.start_date
)
select id,
sum(working_day) as network_days
from t_network_days
group by id;
t_data - your initial data
t_holidays - contains list of holidays
t_data_rn - just adds unique key (rownum) to each row of t_data
t_model - expands t_data date ranges into a flat list of dates
t_network_days - marks each date from t_model as a working day or not, based on the day of week (Sat and Sun) and the holidays list
final query - calculates the number of network days per group.