I have a dataset that looks like this:
StartDate EndDate InstrumentID Dimension DimensionValue
2018-01-01 2018-01-01 123 Currency GBP
2018-01-02 2018-01-02 123 Currency GBP
2018-01-03 2018-01-03 123 Currency USD
2018-01-04 2018-01-04 123 Currency USD
2018-01-05 2018-01-05 123 Currency GBP
2018-01-06 2018-01-06 123 Currency GBP
What I would like is to transform this dataset into a date-bound dataset like the one below:
StartDate EndDate InstrumentID Dimension DimensionValue
2018-01-01 2018-01-02 123 Currency GBP
2018-01-03 2018-01-04 123 Currency USD
2018-01-05 2018-01-06 123 Currency GBP
I thought about writing the SQL like this:
SELECT
MIN(StartDate) AS StartDate
, MAX(EndDate) AS EndDate
, [InstrumentID]
, Dimension
, DimensionValue
FROM #Worktable
GROUP BY InstrumentID, Dimension, DimensionValue
However, this obviously won't work: it ignores the change of currency in between and groups all the GBP rows into a single record with a start date of 2018-01-01 and an end date of 2018-01-06.
Is there a way in which I can do this and achieve the dates I require?
Thanks
This is a common Gaps and Islands question. There are plenty of examples out there on how to do this; for example:
WITH VTE AS(
SELECT CONVERT(date,StartDate) AS StartDate,
CONVERT(Date,EndDate) AS EndDate,
InstrumentID,
Dimension,
DimensionValue
FROM (VALUES('20180101','20180101',123,'Currency','GBP'),
('20180102','20180102',123,'Currency','GBP'),
('20180103','20180103',123,'Currency','USD'),
('20180104','20180104',123,'Currency','USD'),
('20180105','20180105',123,'Currency','GBP'),
('20180106','20180106',123,'Currency','GBP')) V(StartDate,EndDate,InstrumentID,Dimension,DimensionValue)),
Grps AS (
SELECT StartDate,
EndDate,
InstrumentID,
Dimension,
DimensionValue,
ROW_NUMBER() OVER (PARTITION BY InstrumentID, Dimension ORDER BY StartDate) -
ROW_NUMBER() OVER (PARTITION BY InstrumentID, Dimension, DimensionValue ORDER BY StartDate) AS Grp
FROM VTE)
SELECT MIN(StartDate) AS StartDate,
MAX(EndDate) AS EndDate,
InstrumentID,
Dimension,
DimensionValue
FROM Grps
GROUP BY InstrumentID,
Dimension,
DimensionValue,
Grp
ORDER BY StartDate;
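To see why the difference of the two ROW_NUMBER() values identifies each island, it can help to look at the counters side by side. The following is only a sketch under stated assumptions: it loads the sample rows into an in-memory SQLite database from Python (SQLite 3.25 or later is assumed for window functions), and the table name `worktable` and the helper `run` are made up for this illustration.

```python
import sqlite3

ROWS = [
    ("2018-01-01", "2018-01-01", 123, "Currency", "GBP"),
    ("2018-01-02", "2018-01-02", 123, "Currency", "GBP"),
    ("2018-01-03", "2018-01-03", 123, "Currency", "USD"),
    ("2018-01-04", "2018-01-04", 123, "Currency", "USD"),
    ("2018-01-05", "2018-01-05", 123, "Currency", "GBP"),
    ("2018-01-06", "2018-01-06", 123, "Currency", "GBP"),
]

# Shared CTE: one counter over all rows, one counter per DimensionValue.
GRPS = """
WITH Grps AS (
    SELECT StartDate, EndDate, InstrumentID, Dimension, DimensionValue,
           ROW_NUMBER() OVER (PARTITION BY InstrumentID, Dimension
                              ORDER BY StartDate) AS rn_all,
           ROW_NUMBER() OVER (PARTITION BY InstrumentID, Dimension, DimensionValue
                              ORDER BY StartDate) AS rn_val
    FROM worktable
)
"""

# Per-row view of the two counters and their difference.
DETAIL_SQL = GRPS + """
SELECT StartDate, DimensionValue, rn_all, rn_val, rn_all - rn_val AS Grp
FROM Grps
ORDER BY StartDate
"""

# Collapse each (value, difference) pair into one date range.
ISLAND_SQL = GRPS + """
SELECT MIN(StartDate), MAX(EndDate), InstrumentID, Dimension, DimensionValue
FROM Grps
GROUP BY InstrumentID, Dimension, DimensionValue, rn_all - rn_val
ORDER BY MIN(StartDate)
"""

def run(sql, rows):
    """Load rows into a throwaway in-memory table and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE worktable (StartDate TEXT, EndDate TEXT, "
                "InstrumentID INTEGER, Dimension TEXT, DimensionValue TEXT)")
    con.executemany("INSERT INTO worktable VALUES (?, ?, ?, ?, ?)", rows)
    out = con.execute(sql).fetchall()
    con.close()
    return out

DETAIL = run(DETAIL_SQL, ROWS)
ISLANDS = run(ISLAND_SQL, ROWS)
```

Within a run of the same value both counters advance together, so their difference stays constant (0 for the first GBP run, 2 for the second). Note that this variant counts rows rather than comparing dates, so a missing day inside a run of the same value would not start a new island.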
This is a form of gaps-and-islands. But because there are start dates and end dates, you need to be careful. I recommend lag() and a cumulative sum:
select InstrumentID, Dimension, DimensionValue,
min(startdate) as startdate, max(enddate) as enddate
from (select w.*,
-- a row continues the island when it starts the day after the previous row ended
sum(case when prev_enddate = dateadd(day, -1, startdate) then 0 else 1 end)
over (partition by InstrumentID, Dimension,
DimensionValue order by startdate) as grp
from (select w.*,
lag(enddate) over (partition by InstrumentID, Dimension, DimensionValue
order by startdate) as prev_enddate
from #worktable w
) w
) w
group by InstrumentID, Dimension, DimensionValue, grp
order by InstrumentID, Dimension, DimensionValue, min(startdate);
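As a rough, runnable illustration of the lag-plus-running-sum idea, the sketch below uses an in-memory SQLite database from Python. It rests on several assumptions that are not in the answer: a row is treated as continuing the previous one when it starts exactly one day after the previous end date, SQLite's `date(x, '-1 day')` stands in for date arithmetic, and the table name `worktable`, the helper `find_islands` and the extra 2018-01-08 row are invented for the example.

```python
import sqlite3

ROWS = [
    ("2018-01-01", "2018-01-01", 123, "Currency", "GBP"),
    ("2018-01-02", "2018-01-02", 123, "Currency", "GBP"),
    ("2018-01-03", "2018-01-03", 123, "Currency", "USD"),
    ("2018-01-04", "2018-01-04", 123, "Currency", "USD"),
    ("2018-01-05", "2018-01-05", 123, "Currency", "GBP"),
    ("2018-01-06", "2018-01-06", 123, "Currency", "GBP"),
    # Hypothetical extra row: 2018-01-07 is missing, so this starts a new island.
    ("2018-01-08", "2018-01-08", 123, "Currency", "GBP"),
]

QUERY = """
SELECT InstrumentID, Dimension, DimensionValue,
       MIN(StartDate) AS IslandStart, MAX(EndDate) AS IslandEnd
FROM (
    SELECT lagged.*,
           -- flag = 1 whenever the row does not directly follow the previous one
           SUM(CASE WHEN prev_enddate = date(StartDate, '-1 day') THEN 0 ELSE 1 END)
               OVER (PARTITION BY InstrumentID, Dimension, DimensionValue
                     ORDER BY StartDate) AS grp
    FROM (
        SELECT w.*,
               LAG(EndDate) OVER (PARTITION BY InstrumentID, Dimension, DimensionValue
                                  ORDER BY StartDate) AS prev_enddate
        FROM worktable w
    ) AS lagged
) AS grouped
GROUP BY InstrumentID, Dimension, DimensionValue, grp
ORDER BY IslandStart
"""

def find_islands(rows):
    """Load rows into a throwaway in-memory table and return the merged ranges."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE worktable (StartDate TEXT, EndDate TEXT, "
                "InstrumentID INTEGER, Dimension TEXT, DimensionValue TEXT)")
    con.executemany("INSERT INTO worktable VALUES (?, ?, ?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out

ISLANDS = find_islands(ROWS)
```

Because the flag is driven by dates rather than row counts, the gap before 2018-01-08 yields a separate GBP range, which is the main difference from the row-number approach.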
You need to use dense rank like:
with x as(
select DENSE_RANK() OVER
(PARTITION BY DimensionValue ORDER BY StartDate) AS Rank , *
from #Worktable
) select StartDate AS StartDate
, EndDate AS EndDate
, [InstrumentID]
, Max(Dimension) AS Dimension
, DimensionValue, Rank
FROM x
GROUP BY InstrumentID, StartDate, EndDate, DimensionValue,Rank
Update: I just thought of this. I haven't been able to test it yet, but I think it will work the way you want it to.
Select StartDate, EndDate, InstrumentID, Dimension, DimensionValue From (
SELECT
StartDate AS StartDate
, EndDate AS EndDate
, [InstrumentID]
, Dimension
, DimensionValue
, Count(*) AS Cnt
FROM #Worktable
GROUP BY InstrumentID, StartDate, EndDate, Dimension, DimensionValue) x
Hope this helps!
Try something like the following:
WITH CTE AS(
SELECT StartDate::DATE AS StartDate,
EndDate::DATE AS EndDate,
InstrumentID,
Dimension,
DimensionValue
FROM (VALUES('20180101','20180101',123,'Currency','GBP'),
('20180102','20180102',123,'Currency','GBP'),
('20180103','20180103',123,'Currency','USD'),
('20180104','20180104',123,'Currency','USD'),
('20180105','20180105',123,'Currency','GBP'),
('20180106','20180106',123,'Currency','GBP')) V(StartDate,EndDate,InstrumentID,Dimension,DimensionValue))
SELECT startdate
, enddate
, instrumentid
, dimension
, dimensionvalue
FROM (
SELECT *
, CASE WHEN (LAG(enddate, 1) OVER(PARTITION BY dimensionvalue ORDER BY startdate) IS NULL) OR (enddate - LAG(enddate, 1) OVER(PARTITION BY dimensionvalue ORDER BY startdate) <> 1) THEN 0
ELSE 1 END is_valid
FROM CTE
) a
WHERE is_valid = 1
ORDER BY startdate;
Credit to @Lamu for creating the temp table.
Related
My dataset looks like this, and I need to generate StartDate (min) and EndDate (max) by grouping on the Name and Type columns. When Type changes, the grouping should break and take the max date up to that point.
Name Type Date
A xx 1/1/2018
A xx 1/2/2018
A yy 1/3/2018
A xx 1/4/2018
A xx 1/5/2018
A xx 1/6/2018
The output would be like:
Name Type StartDate EndDate
A xx 1/1/2018 1/2/2018
A yy 1/3/2018 1/3/2018
A xx 1/4/2018 1/6/2018
The approach below is a bit clumsy, but it produces the desired output. The buckets are partitioned based on the difference in days between consecutive dates.
declare @tbl table(name varchar(5),type varchar(5),[date] date)
insert into @tbl
values('A','xx','1/1/2018')
,('A','xx','1/2/2018')
,('A','yy','1/3/2018')
,('A','xx','1/4/2018')
,('A','xx','1/5/2018')
,('A','xx','1/6/2018')
select distinct name,type
,min(date)over(partition by name,type,diffmodified order by diffmodified) as [StartDate]
,max(date)over(partition by name,type,diffmodified order by diffmodified) as [EndDate]
from(
select *
,case when max(diff)over(partition by name,type order by [date]) > 1
then max(diff)over(partition by name,type order by [date]) else diff end as [diffmodified]
from(
select *,
isnull(DATEDIFF(day, lag([date],1)
over(partition by name,type order by [date]), [date] ),1)[diff]
from
@tbl)
t)t
The challenge in this case is to identify all target groups by the columns Name and Type, taking the gaps into account. As a possible solution, you can use an additional grouping expression based on the difference between a Row_Number ordered by Date and a Row_Number ordered by Date with Partition by Name, Type.
With A As (
Select Name, [Type], [Date],
Row_Number() Over (Order by [Date]) As Num,
Row_Number() Over (Partition by Name, [Type] Order by [Date]) As Num_1
From Tbl)
Select Name, [Type],
Convert(VarChar(10), Min([Date]), 103) As StartDate,
Convert(VarChar(10), Max([Date]), 103) As EndDate
From A
Group by Name, [Type], Num - Num_1
Order by Min([Date])
Name Type StartDate EndDate
A xx 01/01/2018 01/02/2018
A yy 01/03/2018 01/03/2018
A xx 01/04/2018 01/06/2018
Hope this clarifies things.
select Name,Type,min(date) as StartDate,max(date) as EndDate
from Table_Name
group by Type,Name
Here is my data:
id customercode startdate enddate
1 122 20200812 20200814
2 122 20200816 20200817
3 122 20200817 20200819
4 122 20200821 20200822
5 122 20200823 20200824
I tried the following code:
select Customercode, min(startdate) as startdate, max(enddate) as enddate
from (
select Customercode, startdate, enddate,
sum(rst) over (order by Customercode, DOS) as grp
from (
select Customercode, startdate, enddate,
case when coalesce(lag(enddate) over (partition by Customercode order by Customercode, startdate), startdate) + 1 <> startdate then 1 end rst
from tbl
) t1
) t2
group by grp, Customercode
order by startdate
My result
id customercode startdate enddate
1 122 20200812 20200814
2 122 20200816 20200817
3 122 20200817 20200819
4 122 20200821 20200824
The desired output should be like this. Please share your thoughts.
id customercode startdate enddate
1 122 20200812 20200814
2 122 20200816 20200819
3 122 20200821 20200824
It is unclear whether you want to group records whose start date is the same as the previous end date, or one day after it.
If you want to group on the same date, you would phrase the query as:
select customercode, min(startdate), max(enddate)
from (
select t.*,
sum(case when startdate = lag_enddate then 0 else 1 end)
over(partition by customercode order by startdate) as grp
from (
select t.*,
lag(enddate) over(partition by customercode order by startdate) as lag_enddate
from tbl t
) t
) t
group by customercode, grp
order by min(startdate)
You can also allow both cases at once by modifying the conditional window sum(). This requires a little date arithmetic, whose syntax varies across databases. In standard SQL:
sum(case when startdate <= lag_enddate + interval '1' day then 0 else 1 end)
over(partition by customercode order by startdate) as grp
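A minimal sketch of the "both cases at once" condition, run against the sample data in an in-memory SQLite database from Python. The assumptions here are mine: dates are stored as ISO `YYYY-MM-DD` strings so that plain comparison works, SQLite's `date(x, '+1 day')` replaces `+ interval '1' day`, and the helper name `merge_ranges` is hypothetical.

```python
import sqlite3

ROWS = [
    (1, 122, "2020-08-12", "2020-08-14"),
    (2, 122, "2020-08-16", "2020-08-17"),
    (3, 122, "2020-08-17", "2020-08-19"),  # starts on the previous end date
    (4, 122, "2020-08-21", "2020-08-22"),
    (5, 122, "2020-08-23", "2020-08-24"),  # starts one day after the previous end date
]

QUERY = """
SELECT customercode, MIN(startdate) AS range_start, MAX(enddate) AS range_end
FROM (
    SELECT lagged.*,
           -- 0 when the row touches or directly follows the previous range
           SUM(CASE WHEN startdate <= date(lag_enddate, '+1 day') THEN 0 ELSE 1 END)
               OVER (PARTITION BY customercode ORDER BY startdate) AS grp
    FROM (
        SELECT t.*,
               LAG(enddate) OVER (PARTITION BY customercode
                                  ORDER BY startdate) AS lag_enddate
        FROM tbl t
    ) AS lagged
) AS grouped
GROUP BY customercode, grp
ORDER BY range_start
"""

def merge_ranges(rows):
    """Load rows into a throwaway in-memory table and return the merged ranges."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, customercode INTEGER, "
                "startdate TEXT, enddate TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out

MERGED = merge_ranges(ROWS)
```

On this data the sketch returns the three ranges from the desired output. It only compares each row with the immediately preceding end date, so a long range that fully contains later, shorter ones would need a running maximum instead of lag().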
I have date ranges with a start and end date per person, and I want to get only the continuous date ranges for each person:
Input:
NAME | STARTDATE | END DATE
--------------------------------------
MIKE | **2019-05-15** | 2019-05-16
MIKE | 2019-05-17 | **2019-05-18**
MIKE | 2020-05-18 | 2020-05-19
Expected output like:
MIKE | **2019-05-15** | **2019-05-18**
MIKE | 2020-05-18 | 2020-05-19
So basically output is MIN and MAX for each continuous period for the person.
Appreciate any help.
I have tried the below query:
With N AS (
SELECT Name, StartDate, EndDate
, LastStop = MAX(EndDate)
OVER (PARTITION BY Name ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM Table
), B AS (
SELECT Name, StartDate, EndDate
, Block = SUM(CASE WHEN LastStop Is Null Then 1
WHEN LastStop < StartDate Then 1
ELSE 0
END)
OVER (PARTITION BY Name ORDER BY StartDate, LastStop)
FROM N
)
SELECT Name
, MIN(StartDate) DateFrom
, MAX(EndDate) DateTo
FROM B
GROUP BY Name, Block
ORDER BY Name, Block
But it's not treating adjacent ranges as one continuous period. It just returns the same rows as the input.
This is a type of gaps-and-islands problem. There is no need to expand the data out by day! That seems very inefficient.
Instead, determine the "islands". This is where there is no overlap -- in your case lag() is sufficient. Then a cumulative sum and aggregation:
select name, min(startdate), max(enddate)
from (select t.*,
sum(case when prev_enddate >= dateadd(day, -1, startdate) then 0 else 1 end) over
(partition by name order by startdate) as grp
from (select t.*,
lag(enddate) over (partition by name order by startdate) as prev_enddate
from t
) t
) t
group by name, grp;
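If the ranges for one person can overlap or nest, comparing against only the previous row's end date may no longer be enough. The sketch below is an illustration under my own assumptions, not part of the answer: it combines the running MAX(EndDate) from the question with the "one day before the start" test from this answer, runs in an in-memory SQLite database from Python, and adds invented ANNA rows to show the nested case. The helper name `continuous_ranges` is hypothetical.

```python
import sqlite3

ROWS = [
    ("MIKE", "2019-05-15", "2019-05-16"),
    ("MIKE", "2019-05-17", "2019-05-18"),
    ("MIKE", "2020-05-18", "2020-05-19"),
    # Hypothetical rows: the first range covers the gap between the other two.
    ("ANNA", "2019-01-01", "2019-01-31"),
    ("ANNA", "2019-01-05", "2019-01-10"),
    ("ANNA", "2019-01-20", "2019-02-05"),
]

QUERY = """
SELECT name, MIN(startdate) AS date_from, MAX(enddate) AS date_to
FROM (
    SELECT n.*,
           -- new island only when every earlier range ended before the day
           -- preceding this row's start date
           SUM(CASE WHEN last_stop >= date(startdate, '-1 day') THEN 0 ELSE 1 END)
               OVER (PARTITION BY name ORDER BY startdate, enddate
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM (
        SELECT t.*,
               MAX(enddate) OVER (PARTITION BY name ORDER BY startdate, enddate
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                   AS last_stop
        FROM t
    ) AS n
) AS g
GROUP BY name, grp
ORDER BY name, date_from
"""

def continuous_ranges(rows):
    """Load rows into a throwaway in-memory table and return merged ranges."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (name TEXT, startdate TEXT, enddate TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out

RANGES = continuous_ranges(ROWS)
```

For MIKE this gives the same two ranges as the lag() query. For the invented ANNA rows, a plain lag() would split at 2019-01-20 because the previous row ended on 2019-01-10, whereas the running maximum keeps everything in one island.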
Here is an example using an ad-hoc tally table
;with cte as (
Select A.[Name]
,B.D
,Grp = datediff(day,'1900-01-01',D) - dense_rank() over (partition by [Name] Order by D)
From YourTable A
Cross Apply (
Select Top (DateDiff(DAY,StartDate,EndDate)+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),StartDate)
From master..spt_values n1,master..spt_values n2
) B
)
Select [Name]
,StartDate= min(D)
,EndDate = max(D)
From cte
Group By [Name],Grp
Returns
Name StartDate EndDate
MIKE 2019-05-15 2019-05-18
MIKE 2020-05-18 2020-05-19
This will give you the same result
SELECT subquery.name,min(subquery.startdate),max(subquery.enddate1)
FROM (SELECT NAME,startdate,
CASE WHEN EXISTS(SELECT yt1.startdate
FROM t yt1
WHERE yt1.startdate = DATEADD(day, 1, yt2.enddate)
) THEN null else yt2.enddate END as enddate1
FROM t yt2) as subquery
GROUP by NAME, CAST(MONTH(subquery.startdate) AS VARCHAR(2)) + '-' + CAST(YEAR(subquery.startdate) AS VARCHAR(4))
For the CASE WHEN EXISTS, I referred to SQL CASE.
For grouping by month and year, see GROUP BY MONTH AND YEAR.
Please help me split a date range into 6-month periods. The start date could be anything, but the first period should run from the start date up to 09-30 only, and the next day (10-01) should become the start date of the next period. I tried using a recursive CTE but am still not getting the exact result.
startdate enddate
06-22-2018 09-30-2022
output
startdate enddate
06-22-2018 09-30-2018
10-01-2018 03-31-2019
04-01-2019 09-30-2019
10-01-2019 03-31-2020
04-01-2020 09-30-2020
Here is another option which uses an ad-hoc tally table
Declare @YourTable table (startdate date, enddate date)
Insert Into @YourTable values
('06/22/2018','09/30/2022')
;with cte as (
Select *
,Grp = sum( case when day(D)=1 and month(D) in (4,10) then 1 else 0 end) over (order by d)
From @YourTable A
Cross Apply (
Select Top (DateDiff(DAY,startdate,enddate)+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),startdate)
From master..spt_values n1,master..spt_values n2
) B
)
Select StartDate = min(D)
,EndDate = max(D)
From cte
Group by Grp
Order By min(D)
Returns
StartDate EndDate
2018-06-22 2018-09-30
2018-10-01 2019-03-31
2019-04-01 2019-09-30
2019-10-01 2020-03-31
2020-04-01 2020-09-30
2020-10-01 2021-03-31
2021-04-01 2021-09-30
2021-10-01 2022-03-31
2022-04-01 2022-09-30
Option where we JOIN to an ad-hoc calendar table (note the TOP 10000 and base date of 2000-01-01)
Declare @YourTable table (id int,startdate date, enddate date)
Insert Into @YourTable values
(1,'06/22/2018','09/30/2022')
;with cte as (
Select A.*
,B.D
,Grp = sum( case when day(D)=1 and month(D) in (4,10) then 1 else 0 end) over (order by d)
From @YourTable A
Join (
Select Top 10000 D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),'2000-01-01')
From master..spt_values n1,master..spt_values n2
) B on D between startDate and EndDate
and (D in (startdate,EndDate)
or ( day(D) in (1,day(eomonth(d))) and month(D) in (3,4,9,10))
)
)
Select ID
,StartDate = min(D)
,EndDate = max(D)
From cte
Group by ID,Grp
Order By ID,min(D)
Returns
ID StartDate EndDate
1 2018-06-22 2018-09-30
1 2018-10-01 2019-03-31
1 2019-04-01 2019-09-30
1 2019-10-01 2020-03-31
1 2020-04-01 2020-09-30
1 2020-10-01 2021-03-31
1 2021-04-01 2021-09-30
1 2021-10-01 2022-03-31
1 2022-04-01 2022-09-30
You can use a recursive CTE:
with cte as (
select startdate, eomonth(datefromparts(year(startdate), 9, 1)) as enddate, enddate as orig_enddate
from t
union all
select dateadd(day, 1, enddate), eomonth(dateadd(month, 5, dateadd(day, 1, enddate))) as enddate, orig_enddate
from cte
where enddate < orig_enddate
)
select *
from cte;
It is unclear what year you want for the first row. As per your question, this uses Sep 30th of the year of the startdate.
If you need more than 100 recursion levels, add option (maxrecursion 0) to the query.
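The recursion can be traced with a small sketch. This one is an approximation run in an in-memory SQLite database from Python, so the date functions differ from the T-SQL above: `strftime('%Y', ...) || '-09-30'` replaces eomonth(datefromparts(...)), and `date(x, '+1 day', '+6 months', '-1 day')` replaces the eomonth(dateadd(...)) step. Like the query above, it assumes the start date is on or before 30 September of its year and that the overall end date lies on a period boundary. The helper name `split_periods` is made up.

```python
import sqlite3

QUERY = """
WITH RECURSIVE cte(startdate, enddate, orig_enddate) AS (
    -- anchor: first period ends on 30 September of the start date's year
    SELECT startdate,
           strftime('%Y', startdate) || '-09-30',
           enddate
    FROM t
    UNION ALL
    -- step: next period starts the following day and lasts six months
    SELECT date(enddate, '+1 day'),
           date(enddate, '+1 day', '+6 months', '-1 day'),
           orig_enddate
    FROM cte
    WHERE enddate < orig_enddate
)
SELECT startdate, enddate
FROM cte
ORDER BY startdate
"""

def split_periods(startdate, enddate):
    """Return the half-year periods covering startdate..enddate."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (startdate TEXT, enddate TEXT)")
    con.execute("INSERT INTO t VALUES (?, ?)", (startdate, enddate))
    out = con.execute(QUERY).fetchall()
    con.close()
    return out

PERIODS = split_periods("2018-06-22", "2022-09-30")
```

For the sample row this produces nine periods, from 2018-06-22 to 2018-09-30 through 2022-04-01 to 2022-09-30, matching the tally-table results shown earlier.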
I'm trying to do a query on this table:
Id startdate enddate amount
1 2013-01-01 2013-01-31 0.00
2 2013-02-01 2013-02-28 0.00
3 2013-03-01 2013-03-31 245
4 2013-04-01 2013-04-30 529
5 2013-05-01 2013-05-31 0.00
6 2013-06-01 2013-06-30 383
7 2013-07-01 2013-07-31 0.00
8 2013-08-01 2013-08-31 0.00
I want to get the output:
2013-01-01 2013-02-28 0
2013-03-01 2013-06-30 1157
2013-07-01 2013-08-31 0
I wanted to get that result so I would know when money started to come in and when it stopped. I am also interested in the number of months before money started coming in (which explains the first row), and the number of months where money has stopped (which explains why I'm also interested in the 3rd row for July 2013 to Aug 2013).
I know I can use min and max on the dates and sum on amount but I can't figure out how to get the records divided that way.
Thanks!
with CT as
(
select t1.*,
( select max(endDate)
from t
where startDate<t1.StartDate and SIGN(amount)<>SIGN(t1.Amount)
) as GroupDate
from t as t1
)
select min(StartDate) as StartDate,
max(EndDate) as EndDate,
sum(Amount) as Amount
from CT
group by GroupDate
order by StartDate
Here's one idea:
;WITH MoneyComingIn AS
(
SELECT MIN(startdate) AS startdate, MAX(enddate) AS enddate,
SUM(amount) AS amount
FROM myTable
WHERE amount > 0
)
SELECT MIN(startdate) AS startdate, MAX(enddate) AS enddate,
SUM(amount) AS amount
FROM myTable
WHERE enddate < (SELECT startdate FROM MoneyComingIn)
UNION ALL
SELECT startdate, enddate, amount
FROM MoneyComingIn
UNION ALL
SELECT MIN(startdate) AS startdate, MAX(enddate) AS enddate,
SUM(amount) AS amount
FROM myTable
WHERE startdate > (SELECT enddate FROM MoneyComingIn)
And a second, without using UNION:
SELECT MIN(startdate), MAX(enddate), SUM(amount)
FROM
(
SELECT startdate, enddate, amount,
CASE
WHEN EXISTS(SELECT 1
FROM myTable b
WHERE b.id>=a.id AND b.amount > 0) THEN
CASE WHEN EXISTS(SELECT 1
FROM myTable b
WHERE b.id<=a.id AND b.amount > 0)
THEN 2
ELSE 1
END
ELSE 3
END AS partition_no
FROM myTable a
) x
GROUP BY partition_no
although I suppose that, as written, it assumes the Id values are in date order. You could substitute ROW_NUMBER() OVER(ORDER BY startdate) for Id.
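One way to avoid relying on Id altogether is to order by startdate inside window frames. The sketch below is a different technique from the correlated EXISTS subqueries above, offered only as an illustration: it runs in an in-memory SQLite database from Python, relies on SQLite evaluating `amount > 0` to 0 or 1, and uses a made-up helper name `money_periods`.

```python
import sqlite3

ROWS = [
    (1, "2013-01-01", "2013-01-31", 0.0),
    (2, "2013-02-01", "2013-02-28", 0.0),
    (3, "2013-03-01", "2013-03-31", 245.0),
    (4, "2013-04-01", "2013-04-30", 529.0),
    (5, "2013-05-01", "2013-05-31", 0.0),
    (6, "2013-06-01", "2013-06-30", 383.0),
    (7, "2013-07-01", "2013-07-31", 0.0),
    (8, "2013-08-01", "2013-08-31", 0.0),
]

QUERY = """
SELECT MIN(startdate), MAX(enddate), SUM(amount)
FROM (
    SELECT startdate, enddate, amount,
           CASE
               -- no positive amount seen up to and including this row
               WHEN MAX(amount > 0) OVER (ORDER BY startdate
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 0
                   THEN 1
               -- no positive amount from this row onwards
               WHEN MAX(amount > 0) OVER (ORDER BY startdate
                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) = 0
                   THEN 3
               ELSE 2
           END AS partition_no
    FROM myTable
) AS x
GROUP BY partition_no
ORDER BY partition_no
"""

def money_periods(rows):
    """Load rows into a throwaway in-memory table and return the three periods."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (id INTEGER, startdate TEXT, "
                "enddate TEXT, amount REAL)")
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out

PERIODS = money_periods(ROWS)
```

The zero-amount month of May falls into partition 2 because positive amounts exist both before and after it, which matches the desired output in the question.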
Something like this should do it:
select min(startdate), max(enddate), sum(amount) from paiements
where enddate < (select min(startdate) from paiements where amount >0)
union
select min(startdate), max(enddate), sum(amount) from paiements
where startdate >= (select min(startdate) from paiements where amount >0)
and enddate <= (select max(enddate) from paiements where amount >0)
union
select min(startdate), max(enddate), sum(amount) from paiements
where startdate > (select max(enddate) from paiements where amount >0)
But for this kind of reporting, it is probably clearer to use multiple queries.
This does what you want:
-- determine the three periods
DECLARE @StartMoneyIn INT
DECLARE @EndMoneyIn INT
SELECT @StartMoneyIn = MIN(Id)
FROM [Amounts]
WHERE amount > 0
SELECT @EndMoneyIn = MAX(Id)
FROM [Amounts]
WHERE amount > 0
-- retrieve the amounts
SELECT MIN(startdate) AS startdate, MAX(enddate) AS enddate, SUM(amount) AS amount
FROM [Amounts]
WHERE Id < @StartMoneyIn
UNION
SELECT MIN(startdate), MAX(enddate), SUM(amount)
FROM [Amounts]
WHERE Id >= @StartMoneyIn AND Id <= @EndMoneyIn
UNION
SELECT MIN(startdate), MAX(enddate), SUM(amount)
FROM [Amounts]
WHERE Id > @EndMoneyIn
If all you want to do is to see when money started coming in and when it stopped, this might work for you:
select
min(startdate),
max(enddate),
sum(amount)
from MoneyTable
where
amount > 0
This would not include the periods where there was no money coming in though.
If you don't care about the total in the period, but only want the records where you go from 0 to something and vice versa, you could do something crazy like this:
select *
from MoneyTable mt
where exists ( select *
from MoneyTable mtTemp
where mtTemp.enddate = dateadd(day, -1, mt.startDate)
and mtTemp.amount <> mt.amount
and mtTemp.amount * mt.amount = 0)
Or if you must include the first record:
select *
from MoneyTable mt
where exists ( select *
from MoneyTable mtTemp
where mtTemp.enddate = dateadd(day, -1, mt.startDate)
and mtTemp.amount <> mt.amount
and mtTemp.amount * mt.amount = 0 )
or not exists ( select *
from MoneyTable mtTemp
where mtTemp.enddate = dateadd(day, -1, mt.startDate))