How to get Cumulative data from the same table using SQL? - sql

I have this table
table1
eventid entityid eventdate
----------------------------------------
123 xyz Jan-02-2019
541 xyz Jan-02-2019
234 xyz Jan-03-2019
432 xyz Jan-04-2019
111 xyz Jan-05-2019
124 xyz Jan-06-2019
123 xyz Jan-07-2019
234 xyz Jan-08-2019
432 xyz Jan-09-2019
111 xyz Jan-12-2019
I want to show final result as
entityid interval1 interval2
------------------------------
xyz 2 4
here intervals are in days.
Logic to calculate intervals are :
Ex - event 123 and 234 happens multiple time so date difference between each occurrence as shown below would be added finally into interval1.
Please note - its not necessary 234 would be always in a next row of 123. there could be other events in between.
Formula is
interval1 = datediff(day,eventdate of 123,eventdate of 234) + datediff(day,eventdate of 123,eventdate of 234) + and so on
Same for interval2 but for event 432 & 111.
entityid eventid1 eventid2 event_date_diff
--------------------------------------------
xyz 123 234 1
xyz 123 234 1
xyz 432 111 1
xyz 432 111 3
The challenge here is to find out if event 123 has 234 event or not in upcoming rows (not necessarily in immediate next row) and if its there then find the date difference. If there are any other events between 123-234 then we need to ignore those in between events. Also if 123 appears twice then need latest eventdate for 123.

Let's go over this in terms of your requirements, and build up the necessary pieces. This won't be approached in the order you stated them in, but in an order that makes them easier to understand.
Also if 123 appears twice then need latest eventdate for 123.
This means we need to create a range bounds. This is pretty easy:
NextOccurence AS (SELECT eventId, entityId, eventDate,
LEAD(eventDate) OVER(PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
FROM Table1)
... this will give us every occurrence of an event, with the next one, if present (these can be limited to just your "source" events, but I'm not bothering with that here).
The challenge here is to find out if event 123 has 234 event or not in upcoming rows (not necessarily in immediate next row) and if its there then find the date difference. If there are any other events between 123-234 then we need to ignore those in between events.
(and you previously mentioned it should be the minimum following date, if there were multiple following events).
For this we need to first map events:
EventMap AS (SELECT 123 AS original, 234 AS follow
UNION ALL
SELECT 432, 111)
... and use this to get the "next" following event in range, in what is partially a greatest-n-per-group query:
SELECT NextOccurence.entityId, NextOccurence.eventId, DATEDIFF(day, NextOccurence.eventDate, Table1.eventDate) AS diff
FROM NextOccurence
JOIN EventMap
ON EventMap.original = NextOccurence.eventId
CROSS APPLY (SELECT TOP 1 Table1.eventDate
FROM Table1
WHERE Table1.entityId = NextOccurence.entityId
AND Table1.eventId = EventMap.follow
AND Table1.eventDate >= NextOccurence.eventDate
AND (Table1.eventDate < NextOccurence.nextOccurenceDate OR NextOccurence.nextOccurenceDate IS NULL)
ORDER BY Table1.eventDate) AS Table1
... at this point, we have something close to your intermediate results table:
| entityId | eventId | diff |
|----------|---------|------|
| xyz | 123 | 1 |
| xyz | 123 | 1 |
| xyz | 432 | 1 |
| xyz | 432 | 3 |
... and what follows afterwards would be a standard PIVOT query to aggregate the results.
The final query ends up looking like this:
WITH NextOccurence AS (SELECT eventId, entityId, eventDate,
LEAD(eventDate) OVER(PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
FROM Table1),
EventMap AS (SELECT 123 AS original, 234 AS follow
UNION ALL
SELECT 432, 111)
SELECT entityId, [123] AS '123-234', [432] AS '432-111'
FROM (SELECT NextOccurence.entityId, NextOccurence.eventId, DATEDIFF(day, NextOccurence.eventDate, Table1.eventDate) AS diff
FROM NextOccurence
JOIN EventMap
ON EventMap.original = NextOccurence.eventId
CROSS APPLY (SELECT TOP 1 Table1.eventDate
FROM Table1
WHERE Table1.entityId = NextOccurence.entityId
AND Table1.eventId = EventMap.follow
AND Table1.eventDate >= NextOccurence.eventDate
AND (Table1.eventDate < NextOccurence.nextOccurenceDate OR NextOccurence.nextOccurenceDate IS NULL)
ORDER BY Table1.eventDate) AS Table1) AS d
PIVOT (SUM(diff)
FOR eventId IN ([123], [432])
) AS pvt
Fiddle example
...which generates the expected results:
| entityId | 123-234 | 432-111 |
|----------|---------|---------|
| xyz | 2 | 4 |

From what I understood of the question, we are asked to provide occurrences of each eventid per date. However these are to be represented in columns rather than rows.
My approach to this problem is firstly to pivot the data within a cte and then to select the unique value from each column in to the cross apply operator of a query. There may be better ways of doing it but this made the most sense to me.
DECLARE #T TABLE
(
EventId INT,
EntityId NVARCHAR(3),
EventDate DATETIME
);
INSERT INTO #T (EventId, EntityId, EventDate)
SELECT * FROM (VALUES
(123, 'xyz', '2019-01-02'),
(234, 'xyz', '2019-01-03'),
(432, 'xyz', '2019-01-04'),
(111, 'xyz', '2019-01-05'),
(124, 'xyz', '2019-01-06'),
(123, 'xyz', '2019-01-07'),
(234, 'xyz', '2019-01-08'),
(432, 'xyz', '2019-01-09'),
(111, 'xyz', '2019-01-12')
) X (EVENTID, ENTITYID, EVENTDATE);
with cte as (
select EntityId, [123] as Interval1, [234] as Interval2, [432] as Interval3, [111] as
Interval4, [124] as Interval5
from
(
select top 5 EntityId, EventId, min(eventdate) as ordering, count(distinct EventDate)
as
vol from #T
group by EntityId, EventId
order by ordering
) src
PIVOT
(
max(vol)
for EventId in ([123], [234], [432], [111], [124])
) as pvt)
select distinct EntityId, Interval1, Interval2, Interval3, Interval4, Interval5
from (select EntityId from cte) a
cross apply
(select Interval1 from cte where Interval1 is not null) b
cross apply
(select Interval2 from cte where Interval2 is not null) c
cross apply
(select Interval3 from cte where Interval3 is not null) d
cross apply
(select Interval4 from cte where Interval4 is not null) e
cross apply
(select Interval5 from cte where Interval5 is not null) f;

You can use lead() and conditional aggregation for this:
select sum(case when eventid = 123 and next_eventid = 234
then datediff(day, eventdate, next_eventdate)
end) as interval1,
sum(case when eventid = 432 and next_eventid = 111
then datediff(day, eventdate, next_eventdate)
end) as interval2
from (select t.*,
lead(eventid) over (partition by entityid order by eventdate) as next_eventid,
lead(eventdate) over (partition by entityid order by eventdate) as next_eventdate
from t
) t;
Probably the simplest way to handle intervening events is conditional cumulative arithemtic:
select sum(case when eventid = 123 and
then datediff(day, eventdate, next_eventdate_234)
end) as interval1,
sum(case when eventid = 432 and
then datediff(day, eventdate, next_eventdate_111)
end) as interval2
from (select t.*,
min(case when eventid = 234 then eventdate end) over (order by eventdate desc) as next_eventdate_234,
min(case when eventid = 111 then eventdate end) over (order by eventdate desc) as next_eventdate_111
from t
where eventid in (123, 234)
) t
where eventid in (123, 432);

Related

Create date pairs from list of dates in one column table

I have a problem with a SQL query. I have a list of dates in one column, I would like to create pairs of dates. The dates are sequenced, so I have to match the first date with the second and create a record, then the third date with the fourth date and create a record etc .. as in the following example:
ID DATA
50 10/04/2019
50 12/04/2019
50 13/04/2019
50 17/04/2019
50 18/04/2019
50 19/04/2019
ID DATA_START DATA_END
50 10/04/2019 12/04/2019
50 13/04/2019 17/04/2019
50 18/04/2019 19/04/2019
Thanks very much everyone for the help
You should mark your rows that should be grouped together (into single row) and which date will have which role (start or end).
Here's the code:
with a as (
/*Source data*/
select 50 as id, convert(date, '2019-04-10', 23) as dt union all
select 50 as id, convert(date, '2019-04-12', 23) as dt union all
select 50 as id, convert(date, '2019-04-13', 23) as dt union all
select 50 as id, convert(date, '2019-04-17', 23) as dt union all
select 50 as id, convert(date, '2019-04-18', 23) as dt union all
select 50 as id, convert(date, '2019-04-19', 23) as dt
)
select
id,
[1] as dt_start,
[0] as dt_end
from (
select
id,
dt,
/*
the first row (with modulo = 1) is date from
and the second row (with modulo = 0) is date to
*/
(row_number() over(partition by id order by dt)) % 2 as dt_role,
/*Integer division by 2 will group rows together*/
(row_number() over(partition by id order by dt) + 1) / 2 as dt_group
from a
) as s
pivot (
max(dt) for dt_role in ([0], [1])
) as p
GO
id | dt_start | dt_end
-: | :--------- | :---------
50 | 2019-04-10 | 2019-04-12
50 | 2019-04-13 | 2019-04-17
50 | 2019-04-18 | 2019-04-19
db<>fiddle here

Merge rows if date columns are overlapping in TSQL

I have a table in the following format
Id StartDate EndDate Type
1 2012-02-18 2012-03-18 1
1 2012-03-17 2012-06-29 1
1 2012-06-27 2012-09-27 1
1 2014-08-23 2014-09-24 3
1 2014-09-23 2014-10-24 3
1 2014-10-23 2014-11-24 3
2 2015-07-04 2015-08-06 1
2 2015-08-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
I found similar questions here, but not something that could help me solve my problem. I want to merge rows that has the same Id, Type and overlapping date periods.
The result from the above table should be
Id StartDate EndDate Type
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
In another server, I was able to do it with the following restrictions and the query below:
Didn't care about the Type column, but just the Id
Had a newer version of SQL Server (2012), but now I have 2008 which the code is not compatible
SELECT Id
, MIN(StartDate) AS StartDate
, MAX(EndDate) AS EndDate
FROM (
SELECT *
, SUM(CASE WHEN a.EndDate = a.StartDate THEN 0
ELSE 1
END
) OVER (ORDER BY Id, StartDate) sm
FROM (
SELECT Id
, StartDate
, EndDate
, LAG(EndDate, 1, NULL) OVER (PARTITION BY Id ORDER BY Id, EndDate) EndDate
FROM #temptable
) a
) b
GROUP BY Id, sm
Any advice how I can
Include Type on the process
Make it work on SQL Server 2008
This approach uses an additional temp table to identify the groups of overlapping dates, and then performs a quick aggregate based on the groupings.
SELECT *, ROW_NUMBER() OVER (ORDER BY Id, Type) AS UID,
ROW_NUMBER() OVER (ORDER BY Id, Type) AS GroupId INTO #G FROM #TempTable
WHILE ##ROWCOUNT <> 0 BEGIN
UPDATE T1 SET
GroupId = T2.GroupId
FROM #G T1
INNER JOIN (
SELECT T1.UID, CASE WHEN T1.GroupId < T2.GroupId THEN T1.GroupId ELSE T2.GroupId END
FROM #G T1
LEFT OUTER JOIN #G T2
ON T1.Id = T2.Id AND T1.Type = T2.Type AND T1.GroupId <> T2.GroupId
AND T1.StartDate <= T2.EndDate AND T2.StartDate <= T1.EndDate
) T2 (UID, GroupId)
ON T1.UID = T2.UID
WHERE T1.GroupId <> T2.GroupId
END
SELECT Id, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate, Type
FROM #G G GROUP BY GroupId, Id, Type
This returns the expected values
Id StartDate EndDate Type
----------- ---------- ---------- -----------
1 2012-02-18 2012-09-27 1
1 2014-08-23 2014-11-24 3
2 2015-07-04 2015-09-06 1
3 2013-11-01 2013-12-01 0
3 2018-01-09 2018-02-09 0
This is 2008 compatible. A CTE really is the best way to link up all overlapping records in my opinion. The date overlap logic came from this thread: SO Date Overlap
I added extra data that's more complex to make sure that it's working as expected.
DECLARE #Data table (Id INT, StartDate DATE, EndDate DATE, Type INT)
INSERT INTO #data
SELECT 1,'2/18/2012' ,'3/18/2012', 1 UNION ALL
select 1,'3/17/2012','6/29/2012',1 UNION ALL
select 1,'6/27/2012','9/27/2012',1 UNION ALL
select 1,'8/23/2014','9/24/2014',3 UNION ALL
select 1,'9/23/2014','10/24/2014',3 UNION ALL
select 1,'10/23/2014','11/24/2014',3 UNION ALL
select 2,'7/4/2015','8/6/2015',1 UNION ALL
select 2,'8/4/2015','9/6/2015',1 UNION ALL
select 3,'11/1/2013','12/1/2013',0 UNION ALL
select 3,'1/9/2018','2/9/2018',0 UNION ALL
select 4,'1/1/2018','1/2/2018',0 UNION ALL --many non overlapping dates
select 4,'1/4/2018','1/5/2018',0 UNION ALL
select 4,'1/7/2018','1/9/2018',0 UNION ALL
select 4,'1/11/2018','1/13/2018',0 UNION ALL
select 4,'2/7/2018','2/8/2018',0 UNION ALL --many overlapping dates
select 4,'2/8/2018','2/9/2018',0 UNION ALL
select 4,'2/9/2018','2/10/2018',0 UNION all
select 4,'2/10/2018','2/11/2018',0 UNION all
select 4,'2/11/2018','2/12/2018',0 UNION all
select 4,'2/12/2018','2/13/2018',0 UNION all
select 4,'3/7/2018','3/8/2018',0 UNION ALL --many overlapping dates, second instance of id 4, type 0
select 4,'3/8/2018','3/9/2018',0 UNION ALL
select 4,'3/9/2018','3/10/2018',0 UNION all
select 4,'3/10/2018','3/11/2018',0 UNION all
select 4,'3/11/2018','3/12/2018',0 UNION all
select 4,'3/12/2018','3/13/2018',0
;
WITH cdata
AS (SELECT Id,
d.Type,
d.StartDate,
d.EndDate,
CurrentStart = d.StartDate
FROM #Data d
WHERE
NOT EXISTS (
SELECT * FROM #Data x WHERE x.StartDate < d.StartDate AND d.StartDate <= x.EndDate AND d.EndDate >= x.StartDate AND d.Id = x.Id AND d.Type = x.Type --get first records for overlapping ranges
)
UNION ALL
SELECT d.Id,
d.Type,
StartDate = CASE WHEN d2.StartDate < d.StartDate THEN d2.StartDate ELSE d.StartDate END,
EndDate = CASE WHEN d2.EndDate > d.EndDate THEN d2.EndDate ELSE d.EndDate END,
CurrentStart = d2.StartDate
FROM cdata d
INNER JOIN #Data d2
ON (
d.StartDate <= d2.EndDate
AND d.EndDate >= d2.StartDate
)
AND d2.Id = d.Id
AND d2.Type = d.Type
AND d2.StartDate > d.CurrentStart)
SELECT cdata.Id, cdata.Type, cdata.StartDate, EndDate = MAX(cdata.EndDate)
FROM cdata
GROUP BY cdata.Id, cdata.Type, cdata.StartDate
This looks like a Packing Intervals problem. See the post by Itzik Ben-Gan for all the details and what indexes he recommends to make it work efficiently. He presents a solution without recursive CTE.
Two notes.
The query below assumes that intervals are [closed; open), i.e. StartDate is inclusive and EndDate is exclusive. This way to represent such data is often the most convenient. (in the same sense as having arrays as zero-based instead of 1-based is usually more convenient in programming languages).
I added a RowID column to have unambiguous sorting.
Sample data
DECLARE #T TABLE
(
RowID int IDENTITY,
id int,
StartDate date,
EndDate date,
tp int
);
INSERT INTO #T(Id, StartDate, EndDate, tp) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
-- Make EndDate an opened interval, make it exclusive
-- [Start; End)
UPDATE #T
SET EndDate = DATEADD(day, 1, EndDate)
;
Recommended indexes
-- indexes to support solutions
CREATE UNIQUE INDEX idx_start_id ON T(id, tp, StartDate, RowID);
CREATE UNIQUE INDEX idx_end_id ON T(id, tp, EndDate, RowID);
Query
Read the Itzik's post to understand what is going on. He has nice illustrations there. In short, each timestamp (start or end) is treated as an event. Each event has a + or - type. Each time we encounter a + event (some interval starts) we increase the running counter. Each time we encounter a - event (some interval ends) we decrease the running counter. When the running counter is 0 it means that the streak of overlapping intervals is over.
I took Itzik's query as is and simply changed the column names to match your names.
WITH C1 AS
-- let e = end ordinals, let s = start ordinals
(
SELECT
RowID, id, tp, StartDate AS ts, +1 AS EventType,
NULL AS e,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY StartDate, RowID) AS s
FROM #T
UNION ALL
SELECT
RowID, id, tp, EndDate AS ts, -1 AS EventType,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY EndDate, RowID) AS e,
NULL AS s
FROM #T
),
C2 AS
-- let se = start or end ordinal, namely, how many events (start or end) happened so far
(
SELECT C1.*,
ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts, EventType DESC, RowID) AS se
FROM C1
),
C3 AS
-- For start events, the expression s - (se - s) - 1 represents how many sessions were active
-- just before the current (hence - 1)
--
-- For end events, the expression (se - e) - e represents how many sessions are active
-- right after this one
--
-- The above two expressions are 0 exactly when a group of packed intervals
-- either starts or ends, respectively
--
-- After filtering only events when a group of packed intervals either starts or ends,
-- group each pair of adjacent start/end events
(
SELECT id, tp, ts,
((ROW_NUMBER() OVER(PARTITION BY id, tp ORDER BY ts) - 1) / 2 + 1)
AS grpnum
FROM C2
WHERE COALESCE(s - (se - s) - 1, (se - e) - e) = 0
)
SELECT id, tp, MIN(ts) AS StartDate, DATEADD(day, -1, MAX(ts)) AS EndDate
FROM C3
GROUP BY id, tp, grpnum
ORDER BY id, tp, StartDate;
Result
+----+----+------------+------------+
| id | tp | StartDate | EndDate |
+----+----+------------+------------+
| 1 | 1 | 2012-02-18 | 2012-09-27 |
| 1 | 3 | 2014-08-23 | 2014-11-24 |
| 2 | 1 | 2015-07-04 | 2015-09-06 |
| 3 | 0 | 2013-11-01 | 2013-12-01 |
| 3 | 0 | 2018-01-09 | 2018-02-09 |
+----+----+------------+------------+
create table #table
(Id int,StartDate date, EndDate date, Type int)
insert into #table
values
('1','2012-02-18','2012-03-18','1'),('1','2012-03-19','2012-06-19','1'),
('1','2012-06-27','2012-09-27','1'),('1','2014-08-23','2014-09-24','3'),
('1','2014-09-23','2014-10-24','3'),('1','2014-10-23','2014-11-24','3'),
('2','2015-07-04','2015-08-06','1'),('2','2015-08-04','2015-09-06','1'),
('3','2013-11-01','2013-12-01','0'),('3','2018-01-09','2018-02-09','0')
select ID,MIN(startdate)sd,MAX(EndDate)ed,type from #table
group by ID,TYPE,YEAR(startdate),YEAR(EndDate)
this can be easily achieved by using some window-functions and CTE's. Here is the solution
DECLARE #table TABLE
(id INT,
StartDate DATE,
EndDate DATE,
[Type] INT
);
INSERT INTO #table(Id, StartDate, EndDate, [Type]) VALUES
(1, '2012-02-18', '2012-03-18', 1),
(1, '2012-03-17', '2012-06-29', 1),
(1, '2012-06-27', '2012-09-27', 1),
(1, '2014-08-23', '2014-09-24', 3),
(1, '2014-09-23', '2014-10-24', 3),
(1, '2014-10-23', '2014-11-24', 3),
(2, '2015-07-04', '2015-08-06', 1),
(2, '2015-08-04', '2015-09-06', 1),
(3, '2013-11-01', '2013-12-01', 0),
(3, '2018-01-09', '2018-02-09', 0);
WITH C1 AS
(
SELECT *,
MAX(EndDate) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevEnd
FROM #table
),
C2 AS
(
SELECT *,
SUM(StartFlag) OVER(PARTITION BY Id, [Type]
ORDER BY StartDate, EndDate
ROWS UNBOUNDED PRECEDING) AS GroupID
FROM C1
CROSS APPLY ( VALUES(CASE WHEN StartDate <= PrevEnd THEN NULL ELSE 1 END) ) AS A(StartFlag)
)
SELECT Id, [Type], MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM C2
GROUP BY Id, [Type], GroupID;

Date range with minimum and maximum dates from dataset having records with continuous date range

I have a dataset with id ,Status and date range of employees.
The input dataset given below are the details of one employee.
The date ranges in the records are continuous(in exact order) such that startdate of second row will be the next date of enddate of first row.
If an employee takes leave continuously for different months, then the table is storing the info with date range as separated for different months.
For example: In the input set, the employee has taken Sick leave from '16-10-2016' to '31-12-2016' and joined back on '1-1-2017'.
So there are 3 records for this item but the dates are continuous.
In the output I need this as one record as shown in the expected output dataset.
INPUT
Id Status StartDate EndDate
1 Active 1-9-2007 15-10-2016
1 Sick 16-10-2016 31-10-2016
1 Sick 1-11-2016 30-11-2016
1 Sick 1-12-2016 31-12-2016
1 Active 1-1-2017 4-2-2017
1 Unpaid 5-2-2017 9-2-2017
1 Active 10-2-2017 11-2-2017
1 Unpaid 12-2-2017 28-2-2017
1 Unpaid 1-3-2017 31-3-2017
1 Unpaid 1-4-2017 30-4-2017
1 Active 1-5-2017 13-10-2017
1 Sick 14-10-2017 11-11-2017
1 Active 12-11-2017 NULL
EXPECTED OUTPUT
Id Status StartDate EndDate
1 Active 1-9-2007 15-10-2016
1 Sick 16-10-2016 31-12-2016
1 Active 1-1-2017 4-2-2017
1 Unpaid 5-2-2017 9-2-2017
1 Active 10-2-2017 11-2-2017
1 Unpaid 12-2-2017 30-4-2017
1 Active 1-5-2017 13-10-2017
1 Sick 14-10-2017 11-11-2017
1 Active 12-11-2017 NULL
I can't take min(startdate) and max(EndDate) group by id,status because if the same employee has taken another Sick leave then that end date ('11-11-2017' in the example) will come as the End date.
can anyone help me with the query in SQL server 2014?
It suddenly hit me that this is basically a gaps and islands problem - so I've completely changed my solution.
For this solution to work, the dates does not have to be consecutive.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE #T AS TABLE
(
Id int,
Status varchar(10),
StartDate date,
EndDate date
);
SET DATEFORMAT DMY; -- This is needed because how you specified your dates.
INSERT INTO #T (Id, Status, StartDate, EndDate) VALUES
(1, 'Active', '1-9-2007', '15-10-2016'),
(1, 'Sick', '16-10-2016', '31-10-2016'),
(1, 'Sick', '1-11-2016', '30-11-2016'),
(1, 'Sick', '1-12-2016', '31-12-2016'),
(1, 'Active', '1-1-2017', '4-2-2017'),
(1, 'Unpaid', '5-2-2017', '9-2-2017'),
(1, 'Active', '10-2-2017', '11-2-2017'),
(1, 'Unpaid', '12-2-2017', '28-2-2017'),
(1, 'Unpaid', '1-3-2017', '31-3-2017'),
(1, 'Unpaid', '1-4-2017', '30-4-2017'),
(1, 'Active', '1-5-2017', '13-10-2017'),
(1, 'Sick', '14-10-2017', '11-11-2017'),
(1, 'Active', '12-11-2017', NULL);
The (new) common table expression:
;WITH CTE AS
(
SELECT Id,
Status,
StartDate,
EndDate,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY StartDate)
- ROW_NUMBER() OVER(PARTITION BY Id, Status ORDER BY StartDate) As IslandId,
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY StartDate DESC)
- ROW_NUMBER() OVER(PARTITION BY Id, Status ORDER BY StartDate DESC) As ReverseIslandId
FROM #T
)
The (new) query:
SELECT DISTINCT Id,
Status,
MIN(StartDate) OVER(PARTITION BY IslandId, ReverseIslandId) As StartDate,
NULLIF(MAX(ISNULL(EndDate, '9999-12-31')) OVER(PARTITION BY IslandId, ReverseIslandId), '9999-12-31') As EndDate
FROM CTE
ORDER BY StartDate
(new) Results:
Id Status StartDate EndDate
1 Active 01.09.2007 15.10.2016
1 Sick 16.10.2016 31.12.2016
1 Active 01.01.2017 04.02.2017
1 Unpaid 05.02.2017 09.02.2017
1 Active 10.02.2017 11.02.2017
1 Unpaid 12.02.2017 30.04.2017
1 Active 01.05.2017 13.10.2017
1 Sick 14.10.2017 11.11.2017
1 Active 12.11.2017 NULL
You can see a live demo on rextester.
Please note that string representation of dates in SQL should be acccording to ISO 8601 - meaning either yyyy-MM-dd or yyyyMMdd as it's unambiguous and will always be interpreted correctly by SQL Server.
It's an example of GROUPING AND WINDOW.
First you set a reset point for each Status
Sum to set a group
Then get max/min dates of each group.
;with x as
(
select Id, Status, StartDate, EndDate,
iif (lag(Status) over (order by Id, StartDate) = Status, null, 1) rst
from emp
), y as
(
select Id, Status, StartDate, EndDate,
sum(rst) over (order by Id, StartDate) grp
from x
)
select Id,
MIN(Status) as Status,
MIN(StartDate) StartDate,
MAX(EndDate) EndDate
from y
group by Id, grp
order by Id, grp
GO
Id | Status | StartDate | EndDate
-: | :----- | :------------------ | :------------------
1 | Active | 01/09/2007 00:00:00 | 15/10/2016 00:00:00
1 | Sick | 16/10/2016 00:00:00 | 31/12/2016 00:00:00
1 | Active | 01/01/2017 00:00:00 | 04/02/2017 00:00:00
1 | Unpaid | 05/02/2017 00:00:00 | 09/02/2017 00:00:00
1 | Active | 10/02/2017 00:00:00 | 11/02/2017 00:00:00
1 | Unpaid | 12/02/2017 00:00:00 | 30/04/2017 00:00:00
1 | Active | 01/05/2017 00:00:00 | 13/10/2017 00:00:00
1 | Sick | 14/10/2017 00:00:00 | 11/11/2017 00:00:00
1 | Active | 12/11/2017 00:00:00 | null
dbfiddle here
Here's an alternative answer that doesn't use LAG.
First I need to take a copy of your test data:
DECLARE #table TABLE (Id INT, [Status] VARCHAR(50), StartDate DATE, EndDate DATE);
INSERT INTO #table SELECT 1, 'Active', '20070901', '20161015';
INSERT INTO #table SELECT 1, 'Sick', '20161016', '20161031';
INSERT INTO #table SELECT 1, 'Sick', '20161101', '20161130';
INSERT INTO #table SELECT 1, 'Sick', '20161201', '20161231';
INSERT INTO #table SELECT 1, 'Active', '20170101', '20170204';
INSERT INTO #table SELECT 1, 'Unpaid', '20170205', '20170209';
INSERT INTO #table SELECT 1, 'Active', '20170210', '20170211';
INSERT INTO #table SELECT 1, 'Unpaid', '20170212', '20170228';
INSERT INTO #table SELECT 1, 'Unpaid', '20170301', '20170331';
INSERT INTO #table SELECT 1, 'Unpaid', '20170401', '20170430';
INSERT INTO #table SELECT 1, 'Active', '20170501', '20171013';
INSERT INTO #table SELECT 1, 'Sick', '20171014', '20171111';
INSERT INTO #table SELECT 1, 'Active', '20171112', NULL;
Then the query is:
WITH add_order AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY StartDate) AS order_id
FROM
#table),
links AS (
SELECT
a1.Id,
a1.[Status],
a1.order_id,
MIN(a1.order_id) AS start_order_id,
MAX(ISNULL(a2.order_id, a1.order_id)) AS end_order_id,
MIN(a1.StartDate) AS StartDate,
MAX(ISNULL(a2.EndDate, a1.EndDate)) AS EndDate
FROM
add_order a1
LEFT JOIN add_order a2 ON a2.Id = a1.Id AND a2.[Status] = a1.[Status] AND a2.order_id = a1.order_id + 1 AND a2.StartDate = DATEADD(DAY, 1, a1.EndDate)
GROUP BY
a1.Id,
a1.[Status],
a1.order_id),
merged AS (
SELECT
l1.Id,
l1.[Status],
l1.[StartDate],
ISNULL(l2.EndDate, l1.EndDate) AS EndDate,
ROW_NUMBER() OVER (PARTITION BY l1.Id, l1.[Status], ISNULL(l2.EndDate, l1.EndDate) ORDER BY l1.order_id) AS link_id
FROM
links l1
LEFT JOIN links l2 ON l2.order_id = l1.end_order_id)
SELECT
Id,
[Status],
StartDate,
EndDate
FROM
merged
WHERE
link_id = 1
ORDER BY
StartDate;
Results are:
Id Status StartDate EndDate
1 Active 2007-09-01 2016-10-15
1 Sick 2016-10-16 2016-12-31
1 Active 2017-01-01 2017-02-04
1 Unpaid 2017-02-05 2017-02-09
1 Active 2017-02-10 2017-02-11
1 Unpaid 2017-02-12 2017-04-30
1 Active 2017-05-01 2017-10-13
1 Sick 2017-10-14 2017-11-11
1 Active 2017-11-12 NULL
How does it work? First I add a sequence number, to assist with merging contiguous rows together. Then I determine the rows that can be merged together, add a number to identify the first row in each set that can be merged, and finally pick the first rows out of the final CTE. Note that I also have to handle rows that can't be merged, hence the LEFT JOINs and ISNULL statements.
Just for interest, this is what the output from the final CTE looks like, before I filter out all but the rows with a link_id of 1:
Id Status StartDate EndDate link_id
1 Active 2007-09-01 2016-10-15 1
1 Sick 2016-10-16 2016-12-31 1
1 Sick 2016-11-01 2016-12-31 2
1 Sick 2016-12-01 2016-12-31 3
1 Active 2017-01-01 2017-02-04 1
1 Unpaid 2017-02-05 2017-02-09 1
1 Active 2017-02-10 2017-02-11 1
1 Unpaid 2017-02-12 2017-04-30 1
1 Unpaid 2017-03-01 2017-04-30 2
1 Unpaid 2017-04-01 2017-04-30 3
1 Active 2017-05-01 2017-10-13 1
1 Sick 2017-10-14 2017-11-11 1
1 Active 2017-11-12 NULL 1
You could use lag() and lead() function together to check the previous and next status
WITH CTE AS
(
select *,
COALESCE(LEAD(status) OVER(ORDER BY (select 1)), '0') Nstatus,
COALESCE(LAG(status) OVER(ORDER BY (select 1)), '0') Pstatus
from table
)
SELECT * FROM CTE
WHERE (status <> Nstatus AND status <> Pstatus) OR
(status <> Pstatus)

SQL Server - MyCTE query based on 24 hour period (next day)

I have this bit of code:
;WITH MyCTE AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY CardUser ORDER BY CardTableID) AS NewVariation
FROM CardChecker
)
UPDATE MyCTE
SET Status = NewVariation
which currently updates the status column, however what I want to happen is over a 24 hour period, the status starts again the next day at 1, and counts again based on the CardUser like specified above:
Current data and what happens:
2 aaa 1 2015-06-25 08:00:00.000 123 1 NULL
3 ccc 1 2015-06-25 00:00:00.000 124 1 NULL
4 aaa 1 2015-06-25 17:30:00.000 125 2 NULL
5 aaa 1 2015-06-26 17:30:00.000 125 *3* NULL
what I want to happen:
2 aaa 1 2015-06-25 08:00:00.000 123 1 NULL
3 ccc 1 2015-06-25 00:00:00.000 124 1 NULL
4 aaa 1 2015-06-25 17:30:00.000 125 2 NULL
5 aaa 1 2015-06-26 17:30:00.000 125 *1* NULL
im not quite sure how I could add this to the above query so would it be possible for someone to point me in the right direction?
the main problem is the EventTime field contains both the date and the time, so adding it is as a PARTITION means the status would always be 1 based on the time parameter of the field
thanks for the help
Current CardTable structure:
CREATE TABLE CardTable (CardTableID INT IDENTITY (1,1) NOT NULL,
CardUser VARCHAR(50),
CardNumber VARCHAR(50),
EventTime DATETIME,
Status INT)
You can CONVERT() the EventTime to DATE type and then PARTITION:
;WITH MyCTE AS
(
SELECT Status,
ROW_NUMBER() OVER(PARTITION BY CardUser, CONVERT(DATE, EventTime)
ORDER BY CardTableID) AS NewVariation
FROM CardChecker
)
UPDATE MyCTE
SET Status = NewVariation
Your query basically unnecessarily updating entire table everytime. If EventTime is current date time of the system, having a flag to mark already updated status would improve the performance.
;WITH MyCTE AS
(
SELECT Status,
ROW_NUMBER() OVER(PARTITION BY CardUser, CONVERT(DATE, EventTime)
ORDER BY CardTableID) AS NewVariation
FROM CardChecker
WHERE Status IS NULL OR
CONVERT(DATE, EventTime) = CONVERT(DATE, GETDATE())
)
UPDATE MyCTE
SET Status = NewVariation

SQL JOIN combining columns to a single column

I've written the following query for Microsoft SQL Server 2008 R2 ...
with
downloads as
(
select convert(varchar(10), timestamp, 112) as downloadDate, COUNT(*) as counter
from <download_table>
group by convert(varchar(10), timestamp,112)
),
uploads as
(
select CONVERT(varchar(10), dateadded, 112) as uploadDate, COUNT(*) as counter
from <upload_table>
group by CONVERT(varchar(10), dateadded, 112)
)
select
downloads.downloadDate,
uploads.uploadDate,
downloads.counter as dCount,
uploads.counter as uCount
from downloads
full join uploads on uploads.uploadDate = downloads.downloadDate
order by downloadDate desc;
which returns the following table...
downloadDate uploadDate dCount uCount
20121211 NULL 40 NULL
20121210 NULL 238 NULL
20121207 20121207 526 4
20121206 20121206 217 12
20121205 NULL 108 NULL
20121204 20121204 190 13
20121203 NULL 141 NULL
20121130 20121130 248 187
20121129 NULL 134 NULL
20121128 NULL 102 NULL
20121127 20121127 494 57
20121126 NULL 153 NULL
20121119 20121119 319 20
20121118 NULL 4 NULL
20121116 20121116 215 16
20121112 20121112 431 144
20121109 20121109 168 48
20121108 20121108 132 181
NULL 20121125 NULL 3
but I can't get the two dates to combine into a single 'date' column without getting some NULL entries, nor can I get the NULL values in the dCount or uCount to display 0 instead of NULL.
Can somebody help me with this please ?
In SQL Serve,r you can use COALESCE around the date field which returns the first non-null value and ISNULL around the count totals to replace the null value with zero:
with
downloads as
(
select convert(varchar(10), timestamp, 112) as downloadDate, COUNT(*) as counter
from download_table
group by convert(varchar(10), timestamp,112)
),
uploads as
(
select CONVERT(varchar(10), dateadded, 112) as uploadDate, COUNT(*) as counter
from upload_table
group by CONVERT(varchar(10), dateadded, 112)
)
select
coalesce(downloads.downloadDate, uploads.uploadDate) as dDate,
isnull(downloads.counter, 0) as dCount,
isnull(uploads.counter, 0) as uCount
from downloads
full join uploads
on uploads.uploadDate = downloads.downloadDate
order by downloadDate desc;
See SQL Fiddle with Demo
Result:
| DDATE | DCOUNT | UCOUNT |
------------------------------
| 20121211 | 2 | 0 |
| 20121210 | 1 | 1 |
| 20121207 | 1 | 0 |
| 20121206 | 2 | 1 |
| 20121208 | 0 | 1 |
| 20121209 | 0 | 1 |
| 20121204 | 0 | 1 |
| 20121205 | 0 | 1 |
Depending on your SQL dialect, something like IFNULL(), NVL(), COALESCE(), IIF() etc. will help you get rid of the NULLs in favour of a date in the past, such as '18000101'.
After having done that, you can use MAX(), SWITCH(), IIF(), IF() or friends to make a single "last usage date" column.
You can use coalesce and nvl like this:
with
downloads as
(
select convert(varchar(10), timestamp, 112) as downloadDate, COUNT(*) as counter
from <download_table>
group by convert(varchar(10), timestamp,112)
),
uploads as
(
select CONVERT(varchar(10), dateadded, 112) as uploadDate, COUNT(*) as counter
from <upload_table>
group by CONVERT(varchar(10), dateadded, 112)
)
select
coalesce(downloads.downloadDate, uploads.uploadDate) as dDate,
nvl(downloads.counter, 0) as dCount,
nvl(uploads.counter, 0) as uCount
from downloads
full join uploads on uploads.uploadDate = downloads.downloadDate
order by downloadDate desc;
Firstly, it is much better to cast a datetime to a date if you want to remove the time element, rather than converting to varchar, in SQL-Server 2008 you can simply use:
CAST(DateAdded AS DATE)
Then rather than using a FULL JOIN I would do this do this using UNION ALL, it should perform better (although I can't say 100% without testing on your actual data).
WITH Data AS
( SELECT [Date] = CAST(Timestamp AS DATE),
[Downloads] = 1,
[Uploads] = 0
FROM Download_Table
UNION ALL
SELECT [Date] = CAST(DateAdded AS DATE),
[Downloads] = 0,
[Uploads] = 1
FROM Upload_Table
)
SELECT [Date],
[Downloads] = SUM(Downloads),
[Uploads] = SUM(Uploads)
FROM Data
GROUP BY [Date]
ORDER BY [Date];