Increment column for streaks - sql

How do I get the streak column described below?
Essentially, I want a calculated field that increments by 1 for each consecutive row where VeganOption = 1 and is zero when VeganOption = 0, so the count restarts after every zero.
I have tried the following query, but partitioning by VeganOption keeps incrementing after a zero instead of restarting the count. I'm a bit stuck on this one.
SELECT [UniqueId]
,[Meal]
,[VDate]
,[VeganOption]
, row_number() over (partition by [VeganOption] order by [UniqueId])
FROM [Control]
order by [UniqueId]
Table Data:
CREATE TABLE Control
([UniqueId] int, [Meal] varchar(10), [VDate] datetime, [VeganOption] int);
INSERT INTO Control ([UniqueId], [Meal], [VDate], [VeganOption])
VALUES
(1, 'Breakfast', '2018-08-01 00:00:00', 1),
(2, 'Lunch', '2018-08-01 00:00:00', 1),
(3, 'Dinner', '2018-08-01 00:00:00', 1),
(4, 'Breakfast', '2018-08-02 00:00:00', 1),
(5, 'Lunch', '2018-08-02 00:00:00', 0),
(6, 'Dinner', '2018-08-02 00:00:00', 0),
(7, 'Breakfast', '2018-08-03 00:00:00', 1),
(8, 'Lunch', '2018-08-03 00:00:00', 1),
(9, 'Dinner', '2018-08-03 00:00:00', 1),
(10, 'Breakfast', '2018-08-04 00:00:00', 0),
(11, 'Lunch', '2018-08-04 00:00:00', 1),
(12, 'Dinner', '2018-08-04 00:00:00', 1)
;
This is for SQL Server 2016+

You could create subgroups using SUM and then ROW_NUMBER:
WITH cte AS (
SELECT [UniqueId]
,[Meal]
,[VDate]
,[VeganOption]
,sum(CASE WHEN VeganOption = 1 THEN 0 ELSE 1 END)
over (order by [UniqueId]) AS grp --running count of zeros: each 0 starts a new grp
FROM [Control]
)
SELECT *,CASE WHEN VeganOption =0 THEN 0
ELSE ROW_NUMBER() OVER(PARTITION BY veganOption, grp ORDER BY [UniqueId])
END AS VeganStreak -- main group and calculated subgroup
FROM cte
order by [UniqueId];
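One way to sanity-check this query without a SQL Server instance is to run the same logic through Python's built-in sqlite3 module. This is a sketch under some assumptions: the bundled SQLite must be 3.25 or newer (for window functions), the table is trimmed to the two columns the query uses, and vegan_streaks is just an illustrative name.

```python
import sqlite3

# VeganOption for UniqueId 1..12, taken from the sample INSERT above.
FLAGS = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1]


def vegan_streaks(flags):
    """Run the SUM-then-ROW_NUMBER query in SQLite and return VeganStreak per row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Control (UniqueId INTEGER, VeganOption INTEGER)")
    con.executemany(
        "INSERT INTO Control VALUES (?, ?)", list(enumerate(flags, start=1))
    )
    rows = con.execute(
        """
        WITH cte AS (
            SELECT UniqueId, VeganOption,
                   -- running count of zeros: every 0 moves later rows to a new grp
                   SUM(CASE WHEN VeganOption = 1 THEN 0 ELSE 1 END)
                       OVER (ORDER BY UniqueId) AS grp
            FROM Control
        )
        SELECT CASE WHEN VeganOption = 0 THEN 0
                    ELSE ROW_NUMBER() OVER (PARTITION BY VeganOption, grp
                                            ORDER BY UniqueId)
               END AS VeganStreak
        FROM cte
        ORDER BY UniqueId
        """
    ).fetchall()
    con.close()
    return [streak for (streak,) in rows]


print(vegan_streaks(FLAGS))  # [1, 2, 3, 4, 0, 0, 1, 2, 3, 0, 1, 2]
```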

This is a variant on gaps-and-islands.
I like to define streaks using the difference of row numbers. This looks like:
select c.*,
(case when veganoption = 1
then row_number() over (partition by veganoption, seqnum - seqnum_v order by uniqueid)
else 0
end) as veganstreak
from (select c.*,
row_number() over (partition by veganoption order by uniqueid) as seqnum_v,
row_number() over (order by uniqueid) as seqnum
from Control c
) c
order by uniqueid;
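Why this works is a bit hard to explain, but the results of the subquery make it visible. Within a run of consecutive rows with the same VeganOption, seqnum and seqnum_v both increase by 1 per row, so their difference stays constant. When a run is interrupted by rows with the other value, seqnum keeps counting while seqnum_v does not, so the next run gets a larger difference. Each (veganoption, seqnum - seqnum_v) pair therefore identifies one streak. The rest is just applying row_number() to enumerate the meals within it.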
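To make the "look at the subquery" step concrete, here is a small plain-Python sketch. It is a hand-rolled stand-in for the two row_number() calls, not SQL, and streak_keys is a hypothetical name. It lists seqnum, seqnum_v and their difference for the sample VeganOption column.

```python
def streak_keys(flags):
    """Return (flag, seqnum, seqnum_v, seqnum - seqnum_v) for each row.

    seqnum   mimics row_number() over (order by uniqueid)
    seqnum_v mimics row_number() over (partition by veganoption order by uniqueid)
    """
    per_value = {}  # running row_number per VeganOption value
    out = []
    for seqnum, flag in enumerate(flags, start=1):
        per_value[flag] = per_value.get(flag, 0) + 1
        seqnum_v = per_value[flag]
        out.append((flag, seqnum, seqnum_v, seqnum - seqnum_v))
    return out


for row in streak_keys([1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1]):
    print(row)
```

The difference column comes out as 0, 0, 0, 0, 4, 4, 2, 2, 2, 7, 3, 3. It is constant inside each run, and together with the flag it names each streak.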

One method is to use a CTE to define your groupings and then apply a further ROW_NUMBER() within each VeganOption/Grp group, resulting in:
WITH Grps AS(
SELECT *,
ROW_NUMBER() OVER (ORDER BY UniqueId ASC) -
ROW_NUMBER() OVER (PARTITION BY VeganOption ORDER BY UniqueId ASC) AS Grp
FROM Control)
SELECT *,
CASE VeganOption WHEN 0 THEN 0 ELSE ROW_NUMBER() OVER (PARTITION BY VeganOption, Grp ORDER BY UniqueId ASC) END AS VeganStreak
FROM Grps
ORDER BY UniqueId;
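A caution on this approach: Grp on its own is not unique across the two VeganOption values, because a run of 0s and a later run of 1s can end up with the same difference. The plain-Python sketch below uses hypothetical helper names that mimic the two ROW_NUMBER() calls. It shows the problem for the short sequence 1, 1, 0, 0, 1. With PARTITION BY Grp alone, the final 1 continues the count of the 0s before it, while partitioning by VeganOption and Grp restarts it. The question's sample data happens not to hit this case.

```python
def grp_values(flags):
    """Grp = ROW_NUMBER() OVER (ORDER BY id)
           - ROW_NUMBER() OVER (PARTITION BY flag ORDER BY id)."""
    seen = {}
    grps = []
    for rn, flag in enumerate(flags, start=1):
        seen[flag] = seen.get(flag, 0) + 1
        grps.append(rn - seen[flag])
    return grps


def streaks(flags, partition_by_flag):
    """ROW_NUMBER() within each partition for rows with flag 1, else 0."""
    counters = {}
    result = []
    for flag, grp in zip(flags, grp_values(flags)):
        key = (flag, grp) if partition_by_flag else grp
        counters[key] = counters.get(key, 0) + 1
        result.append(counters[key] if flag == 1 else 0)
    return result


flags = [1, 1, 0, 0, 1]
print(grp_values(flags))                        # [0, 0, 2, 2, 2]
print(streaks(flags, partition_by_flag=False))  # [1, 2, 0, 0, 3]  (last row should be 1)
print(streaks(flags, partition_by_flag=True))   # [1, 2, 0, 0, 1]
```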

Related

Get last date of modification in database by value

How can I find when the value last changed (by date) in this table?
id  date        value
1   01.01.2021  0.0
1   02.01.2021  10.0
1   03.01.2021  15.0
1   04.01.2021  25.0
1   05.01.2021  25.0
1   06.01.2021  25.0
Of course I could filter with a WHERE clause and it would work, but I have a lot of rows, and for some of them I don't know the exact day when the change happened.
The result should be:
id  date        value
1   04.01.2021  25.0
Try this one (BigQuery Standard SQL):
with mytable as (
select 1 as id, date '2021-01-01' as date, 0 as value union all
select 1, date '2021-01-02', 10 union all
select 1, date '2021-01-03', 15 union all
select 1, date '2021-01-04', 25 union all
select 1, date '2021-01-05', 25 union all
select 1, date '2021-01-06', 25
)
select id, array_agg(struct(date, value) order by last_change_date desc limit 1)[offset(0)].*
from (
select *, if(value != lag(value) over (partition by id order by date), date, null) as last_change_date
from mytable
)
group by id
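The lag() step can be traced in plain Python, which may help when checking the logic outside BigQuery. The idea is to flag each date whose value differs from the previous row of the same id, then keep the latest flagged date. This is a sketch only: last_change_by_lag is a hypothetical name, and unlike the query above it simply leaves out ids whose value never changes.

```python
from datetime import date

# (id, date, value) rows from the question.
ROWS = [
    (1, date(2021, 1, 1), 0.0),
    (1, date(2021, 1, 2), 10.0),
    (1, date(2021, 1, 3), 15.0),
    (1, date(2021, 1, 4), 25.0),
    (1, date(2021, 1, 5), 25.0),
    (1, date(2021, 1, 6), 25.0),
]


def last_change_by_lag(rows):
    """Return {id: (date, value)} for the most recent row whose value differs
    from the previous row of the same id (ordered by date).

    Ids whose value never changes are left out.
    """
    result = {}
    prev_value = {}
    for row_id, day, value in sorted(rows):  # order by id, date
        # same test as: value != lag(value) over (partition by id order by date)
        if row_id in prev_value and value != prev_value[row_id]:
            result[row_id] = (day, value)  # later changes overwrite earlier ones
        prev_value[row_id] = value
    return result


print(last_change_by_lag(ROWS))  # {1: (datetime.date(2021, 1, 4), 25.0)}
```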
In this scenario, I would add two fields to the table, created_at and updated_at, both of type timestamp. You can then simply fetch your records ordered by the updated_at field.
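See what this gives you (MIN returns the first date on which each value appears, which is the last change only if a value never comes back after changing):
SELECT MIN(date) OVER (PARTITION BY value) AS lastChange
FROM mytable
WHERE id = 1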
The following query works on the reproducible example below (originally posted on DB Fiddle). I've also included some additional test records.
CREATE TABLE my_data (
`id` INTEGER,
`date` date,
`value` INTEGER
);
INSERT INTO my_data
(`id`, `date`, `value`)
VALUES
(1, '2021-01-01', 0),
(1, '2021-01-02', 10),
(1, '2021-01-03', 15),
(1, '2021-01-04', 25),
(1, '2021-01-05', 25),
(1, '2021-01-06', 25),
(2, '2021-01-05', 25),
(2, '2021-01-06', 23),
(3, '2021-01-03', 15),
(3, '2021-01-04', 25),
(3, '2021-01-05', 17),
(3, '2021-01-06', 17);
Query #1
SELECT
id,
date,
value
FROM (
SELECT
*,
row_number() over (partition by id order by date desc) as id_rank
FROM (
SELECT
id,
m1.date,
m1.value,
rank() over (partition by id,m1.value order by date asc) as id_value_rank,
CASE
WHEN (m1.date = (max(m1.date) over (partition by id,m1.value ))) THEN 1
ELSE 0
END AS is_max_date_for_group,
CASE
WHEN (m1.date = (max(m1.date) over (partition by id ))) THEN 1
ELSE 0
END AS is_max_date_for_id
from
my_data m1
) m2
WHERE (m2.is_max_date_for_group = m2.is_max_date_for_id and is_max_date_for_group <> 0 and id_value_rank=1) or (id_value_rank=1 and is_max_date_for_id=0)
) t
where t.id_rank=1
order by id, date, value;
id  date        value
1   04.01.2021  25
2   06.01.2021  23
3   05.01.2021  17
I actually find that the simplest method is to enumerate the rows per id and per id/value, both by date in descending order. The two numbers are equal only for the rows in the last run of identical values, and the rest is aggregation:
select id, min(date) as last_change_date, value
from (select t.*,
row_number() over (partition by id order by date desc) as seqnum,
row_number() over (partition by id, value order by date desc) as seqnum_2
from t
) t
where seqnum = seqnum_2
group by id, value;
If you use lag(), I would recommend using qualify for performance:
select t.*
from (select t.*
from t
qualify lag(value) over (partition by id order by date) <> value or
lag(value) over (partition by id order by date) is null
) t
qualify row_number() over (partition by id order by date desc) = 1;
Note: both of these also work when the value is the same for all rows of an id. Other methods may not work in that situation.
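Here is a sketch of the two-row_number() approach run in SQLite via Python's sqlite3 module (assuming SQLite 3.25+ for window functions). It uses the test rows from the MySQL example, with ISO date strings. value is listed in GROUP BY because it is not aggregated and many engines reject it otherwise; within the surviving rows it is constant per id anyway. last_run_start is an illustrative name.

```python
import sqlite3

# (id, date, value) from the MySQL example above, with ISO dates.
ROWS = [
    (1, "2021-01-01", 0), (1, "2021-01-02", 10), (1, "2021-01-03", 15),
    (1, "2021-01-04", 25), (1, "2021-01-05", 25), (1, "2021-01-06", 25),
    (2, "2021-01-05", 25), (2, "2021-01-06", 23),
    (3, "2021-01-03", 15), (3, "2021-01-04", 25),
    (3, "2021-01-05", 17), (3, "2021-01-06", 17),
]


def last_run_start(rows):
    """For each id, return (id, first date of the final run of equal values, value)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, date TEXT, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT id, MIN(date) AS last_change_date, value
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS seqnum,
                     ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY date DESC) AS seqnum_2
              FROM t) t
        -- the two counters agree only while walking back through the last run
        WHERE seqnum = seqnum_2
        GROUP BY id, value
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return result


for row in last_run_start(ROWS):
    print(row)
```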

Calculating average by using the previous row's value and following row's value

I have calculated average values for each month. Some months are NULL, and my manager wants me to fill those months using the previous month's value and the following month's value.
[Screenshots: current result; expected result]
DECLARE @DATE DATE = '2017-01-01';
WITH MONTHS AS
(
SELECT DISTINCT DTM.FirstDayOfMonth
FROM DATEDIM DTM
WHERE Date >= '01/01/2017'
AND Date <= DATEADD(mm,-1,Getdate())
),
Tab1 AS
(
SELECT
T1.FirstDayOfMonth AS MONTH_START,
AVG1,
ROW_NUMBER() OVER (
ORDER BY DATEADD(MM,DATEDIFF(MM, 0, T1.FirstDayOfMonth),0) DESC
) AS RNK
FROM MONTHS T1
LEFT OUTER JOIN (
SELECT
DATEADD(MM,DATEDIFF(MM, 0, StartDate),0) MONTH_START,
AVG(CAST(DATEDIFF(dd, StartDate, EndDate) AS FLOAT)) AS AVG1
FROM DATATable
WHERE EndDate >= StartDate
AND StartDate >= @DATE
AND EndDate >= @DATE
GROUP BY DATEADD(MM,DATEDIFF(MM, 0, StartDate),0)
) T2 ON T1.FirstDayOfMonth = T2.MONTH_START
)
SELECT *
FROM Tab1
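Using your CTEs, this fills each NULL with the nearest non-NULL value from a later month (RNK is numbered from the newest month down):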
select MONTH_START,
case when AVG1 is null then
(select top(1) t2.AVG1
from Tab1 t2
where t1.RNK > t2.RNK and t2.AVG1 is not null
order by t2.RNK desc)
else AVG1 end AVG1,
RNK
from Tab1 t1
Edit
Version for an average of the nearest preceding and nearest following non-NULL values. Both must exist; otherwise NULL is returned.
select MONTH_START,
case when AVG1 is null then
( (select top(1) t2.AVG1
from Tab1 t2
where t1.RNK > t2.RNK and t2.AVG1 is not null
order by t2.RNK desc)
+(select top(1) t2.AVG1
from Tab1 t2
where t1.RNK < t2.RNK and t2.AVG1 is not null
order by t2.RNK)
) / 2
else AVG1 end AVG1,
RNK
from Tab1 t1
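The rule in the edited version can be written as a plain-Python reference to test the SQL against: take the mean of the nearest non-NULL value on each side, or NULL when either side has none. This is a sketch. fill_gaps is a hypothetical name, the list is assumed to be in chronological order (not RNK order), and the sample values are illustrative.

```python
def fill_gaps(values):
    """Replace each None with the mean of the nearest non-None neighbours on both sides.

    If either side has no non-None value, the gap stays None
    (like the edited query, where NULL + x is NULL).
    """
    n = len(values)
    prev_seen = [None] * n  # nearest non-None value at or before i
    next_seen = [None] * n  # nearest non-None value at or after i
    last = None
    for i in range(n):
        if values[i] is not None:
            last = values[i]
        prev_seen[i] = last
    last = None
    for i in range(n - 1, -1, -1):
        if values[i] is not None:
            last = values[i]
        next_seen[i] = last
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
        elif prev_seen[i] is None or next_seen[i] is None:
            filled.append(None)
        else:
            filled.append((prev_seen[i] + next_seen[i]) / 2)
    return filled


print(fill_gaps([None, 20, 30, None, None, None, 40, None, None, 36, 22]))
# [None, 20, 30, 35.0, 35.0, 35.0, 40, 38.0, 38.0, 36, 22]
```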
I can't quite tell what you are trying to calculate the average of, but this is quite simple with window functions:
select t.*,
avg(val) over (order by month_start rows between 1 preceding and 1 following)
from t;
In your case, I think this translates as:
select datefromparts(year(startdate), month(startdate), 1) as month_start,
avg(val) as monthaverage,
avg(avg(val)) over (order by min(startdate) rows between 1 preceding and 1 following)
from datatable d
where . . .
group by datefromparts(year(startdate), month(startdate), 1)
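To see what the one-row-each-side frame does with NULLs, here is a small run in SQLite through Python's sqlite3 module. It assumes SQLite 3.25+ for window functions, and the table t and its values are made up. AVG ignores NULLs, so an isolated NULL month gets the mean of its two neighbours. However, non-NULL months are averaged with their neighbours as well, and each NULL in a run of two only picks up its one adjacent value.

```python
import sqlite3


def frame_avgs(values):
    """AVG(val) over a 3-row frame (1 preceding, current, 1 following), in month order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (month_start INTEGER, val REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(values, start=1)))
    rows = con.execute(
        """
        SELECT AVG(val) OVER (ORDER BY month_start
                              ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
        FROM t
        ORDER BY month_start
        """
    ).fetchall()
    con.close()
    return [avg for (avg,) in rows]


print(frame_avgs([10, None, 30, 50, None, None, 80]))
# [10.0, 20.0, 40.0, 40.0, 50.0, 80.0, 80.0]
```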
You can access previous and following row values using window functions:
SELECT MAX(row_value) OVER(
ORDER BY ... ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS Previous_Value,
MAX(row_value) OVER(
ORDER BY ... ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS Next_Value
Alternatively you can use LAG/LEAD functions and modify your sub-query where you get the AVG:
SELECT
src.MONTH_START,
CASE
WHEN src.prev_val IS NULL OR src.next_val IS NULL
THEN COALESCE(src.prev_val, src.next_val) -- Return non-NULL value (if exists)
ELSE (src.prev_val + src.next_val ) / 2
END AS AVG_new
FROM (
SELECT
DATEADD(MM,DATEDIFF(MM, 0, StartDate),0) MONTH_START,
LAG(AVG(CAST(DATEDIFF(dd, StartDate, EndDate) AS FLOAT))) OVER(ORDER BY ...) AS prev_val,
LEAD(AVG(CAST(DATEDIFF(dd, StartDate, EndDate) AS FLOAT))) OVER(ORDER BY ...) AS next_val
-- AVG(CAST(DATEDIFF(dd, StartDate, EndDate) AS FLOAT)) AS AVG1
FROM DATATable
WHERE EndDate >= StartDate
AND StartDate >= @DATE
AND EndDate >= @DATE
GROUP BY DATEADD(MM,DATEDIFF(MM, 0, StartDate),0)
) AS src
I haven't tested it, but give it a shot and see how it works. You will need to replace the ... in each window function's ORDER BY with at least one column.
You could try this query (my sample data includes only the relevant columns; I omitted the date column):
declare @tbl table (rank int, value int);
insert into @tbl values
(1, null),
(2, 20),
(3, 30),
(4, null),
(5, null),
(6, null),
(7, 40),
(8, null),
(9, null),
(10, 36),
(11, 22);
;with cte as (
select *,
DENSE_RANK() over (order by case when value is null then rank else value end) drank,
case when value is null then lag(value) over (order by rank) end lag,
case when value is null then lead(value) over (order by rank) end lead
from @tbl
)
select rank, value, case when value is null then
max(lag) over (partition by grp) / 2 +
max(lead) over (partition by grp) / 2
else value end valueWithAvg
from (
select *,
rank - drank grp from cte
) a order by rank

fetch two distinct rows with discontinuous dates

I want to fetch the two rows where the dates are discontinuous in a data sample. Normally, the end date of each row equals the start date of the next row, and I want to print both rows in full where that is not the case.
I tried LEAD, but it did not work:
select t1.*
from (select t.*, lead(cast(startdate as date)) over (order by currenykey,cast(enddate as date)) as next_start_date
from table t
) t1
where enddate <> next_start_date
ID start date end date
1 11/6/17 0:00.00 11/13/17 0:00.00
2 11/13/17 0:00.00 12/26/17 0:00.00
3 12/26/17 0:00.00 1/8/18 0:00.00
4 10/22/18 0:11.13 2/25/19 0:16.35
5 2/25/19 0:16.35 3/4/19 0:09.57
6 3/4/19 0:09.57 3/11/19 0:12.30
7 3/11/19 0:12.30 3/18/19 0:10.21
8 3/18/19 0:10.21 3/25/19 0:09.20
9 3/25/19 0:09.20 4/1/19 0:10.19
I want to print the entire rows 3 and 4.
If you're on SQL Server 2012 or later, you could use the LAG and LEAD functions (see the LAG (Transact-SQL) and LEAD (Transact-SQL) documentation).
For example:
declare @StackOverflow table (
[ID] int not null,
[StartDate] datetime not null,
[EndDate] datetime not null
);
insert @StackOverflow values
(1, '11/6/17 0:00.00', '11/13/17 0:00.00'),
(2, '11/13/17 0:00.00', '12/26/17 0:00.00'),
(3, '12/26/17 0:00.00', '1/8/18 0:00.00'),
(4, '10/22/18 0:11.13', '2/25/19 0:16.35'),
(5, '2/25/19 0:16.35', '3/4/19 0:09.57'),
(6, '3/4/19 0:09.57', '3/11/19 0:12.30'),
(7, '3/11/19 0:12.30', '3/18/19 0:10.21'),
(8, '3/18/19 0:10.21', '3/25/19 0:09.20'),
(9, '3/25/19 0:09.20', '4/1/19 0:10.19');
select [ID], [StartDate], [EndDate]
from (
select [ID],
[StartDate],
[EndDate],
[Previous] = cast(lag([EndDate]) over (order by [ID]) as date),
[Next] = cast(lead([StartDate]) over (order by [ID]) as date)
from @StackOverflow SO
) SO
where Previous != cast([StartDate] as date)
or Next != cast([EndDate] as date);
Which yields:
ID StartDate EndDate
3 26/12/2017 00:00:00 08/01/2018 00:00:00
4 22/10/2018 00:11:00 25/02/2019 00:16:00
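The same LAG/LEAD comparison can be tried in SQLite via Python's sqlite3 module (window functions need SQLite 3.25+). This sketch stores the dates as ISO yyyy-mm-dd strings, reading the question's values as month/day/year and dropping the time part, since the query compares only dates. SQLite's date() stands in for cast(... as date), and broken_links is an illustrative name.

```python
import sqlite3

# (ID, StartDate, EndDate) from the question, reduced to ISO dates.
PERIODS = [
    (1, "2017-11-06", "2017-11-13"),
    (2, "2017-11-13", "2017-12-26"),
    (3, "2017-12-26", "2018-01-08"),
    (4, "2018-10-22", "2019-02-25"),
    (5, "2019-02-25", "2019-03-04"),
    (6, "2019-03-04", "2019-03-11"),
    (7, "2019-03-11", "2019-03-18"),
    (8, "2019-03-18", "2019-03-25"),
    (9, "2019-03-25", "2019-04-01"),
]


def broken_links(periods):
    """Return the IDs whose start does not match the previous end, or whose end
    does not match the next start."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE periods (ID INTEGER, StartDate TEXT, EndDate TEXT)")
    con.executemany("INSERT INTO periods VALUES (?, ?, ?)", periods)
    rows = con.execute(
        """
        SELECT ID FROM (
            SELECT ID, StartDate, EndDate,
                   date(LAG(EndDate)    OVER (ORDER BY ID)) AS Previous,
                   date(LEAD(StartDate) OVER (ORDER BY ID)) AS Next
            FROM periods
        )
        -- comparisons with NULL (first/last row) are not true, so those sides drop out
        WHERE Previous <> date(StartDate) OR Next <> date(EndDate)
        ORDER BY ID
        """
    ).fetchall()
    con.close()
    return [row_id for (row_id,) in rows]


print(broken_links(PERIODS))  # [3, 4]
```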
Your query is on the right path, with two caveats:
You want to convert to dates for the comparison.
You need to compare both lead() and lag().
So:
select t.*
from (select t.*,
lead(startdate) over (order by startdate) as next_startdate,
lag(enddate) over (order by startdate) as prev_enddate
from t
) t
where convert(date, enddate) <> convert(date, next_startdate) or
convert(date, startdate) <> convert(date, prev_enddate) ;
That said, I think you are safer with not exists subqueries:
select *
from t
where (not exists (select 1
from t t2
where convert(date, t.startdate) = convert(date, t2.enddate)
) or
not exists (select 1
from t t2
where convert(date, t.enddate) = convert(date, t2.startdate)
)
) and
t.startdate <> (select min(t2.startdate) from t t2) and
t.startdate <> (select max(t2.startdate) from t t2) ;
To understand why, consider what happens if the start date of row 3 changes: the two approaches can then produce different results.

Entry and Exit points on times series chart data

Is the following actually possible in SQL?
I have some time-series data and I want to extract some entry and exit points based on prices.
[Screenshots: desired output; example data]
SQL Data:
CREATE TABLE Control
([PKey] int, [TimeStamp] datetime, [Name] varchar(10), [Price1] float, [Price2] float);
INSERT INTO Control ([PKey], [TimeStamp], [Name], [Price1], [Price2])
VALUES
(1,'2018-10-01 09:00:00', 'Name1',120, 125),
(2,'2018-10-01 09:10:00', 'Name1',110, 115),
(3,'2018-10-01 09:20:00', 'Name1',101, 106),
(4,'2018-10-01 09:30:00', 'Name1',105, 110),
(5,'2018-10-01 09:40:00', 'Name1',106, 111),
(6,'2018-10-01 09:50:00', 'Name1',108, 113),
(7,'2018-10-01 10:00:00', 'Name1',110, 115),
(8,'2018-10-01 10:10:00', 'Name1',104, 109),
(9,'2018-10-01 10:20:00', 'Name1',101, 106),
(10,'2018-10-01 10:30:00', 'Name1',99, 104),
(11,'2018-10-01 10:40:00', 'Name1',95, 100),
(12,'2018-10-01 10:50:00', 'Name1',101, 106),
(13,'2018-10-01 11:00:00', 'Name1',102, 107),
(14,'2018-10-01 11:10:00', 'Name1',101, 106),
(15,'2018-10-01 11:20:00', 'Name1',99, 104),
(16,'2018-10-01 11:30:00', 'Name1',105, 110),
(17,'2018-10-01 11:40:00', 'Name1',108, 113),
(18,'2018-10-01 11:50:00', 'Name1',108, 113),
(19,'2018-10-01 12:00:00', 'Name1',109, 114),
(20,'2018-10-01 12:10:00', 'Name1',108, 113),
(21,'2018-10-01 12:20:00', 'Name1',105, 110),
(22,'2018-10-01 12:30:00', 'Name1',101, 106),
(23,'2018-10-01 12:40:00', 'Name1',102, 107),
(24,'2018-10-01 09:00:00', 'Name2',103, 108),
(25,'2018-10-01 09:10:00', 'Name2',101, 106),
(26,'2018-10-01 09:20:00', 'Name2',104, 109),
(27,'2018-10-01 09:30:00', 'Name2',106, 111),
(28,'2018-10-01 09:40:00', 'Name2',108, 113),
(29,'2018-10-01 09:50:00', 'Name2',108, 113),
(30,'2018-10-01 10:00:00', 'Name2',105, 110),
(31,'2018-10-01 10:10:00', 'Name2',103, 108),
(32,'2018-10-01 10:20:00', 'Name2',101, 106),
(33,'2018-10-01 10:30:00', 'Name2',99, 104),
(34,'2018-10-01 10:40:00', 'Name2',101, 106),
(35,'2018-10-01 10:50:00', 'Name2',104, 109),
(36,'2018-10-01 11:00:00', 'Name2',101, 106),
(37,'2018-10-01 11:10:00', 'Name2',99, 104),
(38,'2018-10-01 11:20:00', 'Name2',106, 111),
(39,'2018-10-01 11:30:00', 'Name2',103, 108),
(40,'2018-10-01 11:40:00', 'Name2',105, 110),
(41,'2018-10-01 11:50:00', 'Name2',108, 113),
(42,'2018-10-01 12:00:00', 'Name2',105, 110),
(43,'2018-10-01 12:10:00', 'Name2',104, 109),
(44,'2018-10-01 12:20:00', 'Name2',108, 113),
(45,'2018-10-01 12:30:00', 'Name2',110, 115),
(46,'2018-10-01 12:40:00', 'Name2',105, 110)
;
What I have tried:
I am able to get the first entry and exit point using the following query, which finds the first entry-point PKey and then the first exit-point PKey after it:
declare @EntryPrice1 float = 101.0; -- Entry when Price1 <= 101.0 (when not already Entered)
declare @ExitPrice2 float = 113.0; -- Exit when Price2 >= 113.0 (after Entry only)
select
t1.[Name]
,t2.[Entry PKey]
,min(case when t1.[Price2] >= @ExitPrice2 and t1.[PKey] > t2.[Entry PKey] then t1.[PKey] else null end) as [Exit PKey]
from [dbo].[Control] t1
left outer join
(select min(case when [Price1] <= @EntryPrice1 then [PKey] else null end) as [Entry PKey]
,[Name]
from [dbo].[Control]
group by [Name]) t2
on t1.[Name] = t2.[Name]
group by t1.[Name],t2.[Entry PKey]
--Name Entry PKey Exit PKey
--Name1 3 6
--Name2 25 28
I'm stuck on an approach that would return multiple entry/exit points, and I'm not sure it's even possible in SQL.
The logic for entry and exit points is:
Entry - when price1 <= 101.0 and not already in an entry that has not exited.
Exit - when price2 >= 113.0 and inside an entry.
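Stated as a per-name state machine, the two rules are easy to pin down outside SQL, which gives a reference result to check any set-based query against. This plain-Python sketch uses the sample prices and the thresholds from the question. Only Price1 is listed because Price2 is always Price1 + 5 in this data. Names such as entry_exit_pairs are illustrative.

```python
ENTRY_PRICE1 = 101.0  # enter when Price1 <= this and not already entered
EXIT_PRICE2 = 113.0   # exit when Price2 >= this while entered

# First PKey and Price1 series per Name, in TimeStamp order (Price2 = Price1 + 5).
SERIES = {
    "Name1": (1, [120, 110, 101, 105, 106, 108, 110, 104, 101, 99, 95, 101,
                  102, 101, 99, 105, 108, 108, 109, 108, 105, 101, 102]),
    "Name2": (24, [103, 101, 104, 106, 108, 108, 105, 103, 101, 99, 101, 104,
                   101, 99, 106, 103, 105, 108, 105, 104, 108, 110, 105]),
}


def entry_exit_pairs(first_pkey, prices1):
    """Return [(entry_pkey, exit_pkey or None)] for one Name."""
    pairs = []
    entry = None
    for offset, price1 in enumerate(prices1):
        pkey, price2 = first_pkey + offset, price1 + 5
        if entry is None:
            if price1 <= ENTRY_PRICE1:
                entry = pkey                 # open a position
        elif price2 >= EXIT_PRICE2:
            pairs.append((entry, pkey))      # close it
            entry = None
    if entry is not None:
        pairs.append((entry, None))          # still open at the end of the data
    return pairs


for name, (first, prices) in SERIES.items():
    print(name, entry_exit_pairs(first, prices))
# Name1 [(3, 6), (9, 17), (22, None)]
# Name2 [(25, 28), (32, 41)]
```

The first pair for each name matches the single entry/exit the question's query already finds (3 to 6, 25 to 28).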
It's a kind of gaps-and-islands problem. This is a generic solution using windowed aggregates (it should work in most DBMSes):
declare @EntryPrice1 float = 101.0; -- Entry when Price1 <= 101.0 (when not already Entered)
declare @ExitPrice2 float = 113.0; -- Exit when Price2 >= 113.0 (after Entry only)
WITH cte AS
( -- apply your logic to mark potential entry and exit rows
SELECT *
,CASE WHEN Price1 <= @EntryPrice1 THEN Timestamp END AS possibleEntry
,CASE WHEN Price2 >= @ExitPrice2 THEN Timestamp END AS possibleExit
,Max(CASE WHEN Price1 <= @EntryPrice1 THEN Timestamp END) -- most recent possibleEntry
Over (PARTITION BY Name
ORDER BY Timestamp
ROWS Unbounded Preceding) AS lastEntry
,Max(CASE WHEN Price2 >= @ExitPrice2 THEN Timestamp END) -- most recent possibleExit
Over (PARTITION BY Name
ORDER BY Timestamp
ROWS BETWEEN Unbounded Preceding AND 1 Preceding) AS lastExit
FROM [dbo].[Control]
)
-- SELECT * FROM cte ORDER BY Name, PKey
,groupRows AS
( -- mark rows from the 1st entry to the exit row
SELECT *
-- if lastEntry <= lastExit we're after an exit and before an entry -> don't return this row
,CASE WHEN lastEntry <= lastExit THEN 0 ELSE 1 END AS returnFlag
-- assign the same group number to consecutive rows in group
,Sum(CASE WHEN lastEntry <= lastExit THEN 1 ELSE 0 END)
Over (PARTITION BY Name
ORDER BY Timestamp
ROWS Unbounded Preceding) AS grp
FROM cte
WHERE (possibleEntry IS NOT NULL OR possibleExit IS NOT NULL)
AND lastEntry IS NOT NULL
)
-- SELECT * FROM groupRows ORDER BY Name, PKey
,rowNum AS
( -- get the data from the first and last row of an entry/exit group
SELECT *
-- to get the values of the 1st row in a group
,Row_Number() Over (PARTITION BY Name, grp ORDER BY Timestamp) AS rn
-- to get the values of the last row in a group
,Last_Value(Price2)
Over (PARTITION BY Name, grp
ORDER BY Timestamp
ROWS BETWEEN Unbounded Preceding AND Unbounded Following) AS ExitPrice
,Last_Value(possibleExit)
Over (PARTITION BY Name, grp
ORDER BY Timestamp
ROWS BETWEEN Unbounded Preceding AND Unbounded Following) AS ExitTimestamp
,Last_Value(CASE WHEN possibleExit IS NOT NULL THEN PKey END)
Over (PARTITION BY Name, grp
ORDER BY Timestamp
ROWS BETWEEN Unbounded Preceding AND Unbounded Following) AS ExitPKey
FROM groupRows
WHERE returnFlag = 1
)
SELECT Name
,Price1 AS EntryPrice
,ExitPrice
,Timestamp AS EntryTimestamp
,ExitTimestamp
,PKey AS EntryPKey
,ExitPKey
FROM rowNum
WHERE rn = 1 -- return 1st row of each group
ORDER BY Name, Timestamp
Of course it might be possible to simplify the logic or apply some proprietary SQL Server syntax...
This is a weird form of gaps-and-islands. Start with the very basic definitions of entry and exit:
select c.*,
(case when [Price1] <= @EntryPrice1 then 1 else 0 end) as is_entry,
(case when [Price2] >= @ExitPrice2 then 1 else 0 end) as is_exit
from control c;
This doesn't quite work because two adjacent "entries" count only as a single entry. We can get the information we need by looking at the previous entry/exit time. With that logic, we can determine which entries are "real". We might as well get the next exit time as well:
with cee as (
select c.*,
(case when [Price1] <= @EntryPrice1 then 1 else 0 end) as is_entry,
(case when [Price2] >= @ExitPrice2 then 1 else 0 end) as is_exit
from control c
),
cp as (
select cee.*,
max(case when is_entry = 1 then pkey end) over (partition by name order by timestamp rows between unbounded preceding and 1 preceding) as prev_entry,
max(case when is_exit = 1 then pkey end) over (partition by name order by timestamp) as prev_exit,
min(case when is_exit = 1 then pkey end) over (partition by name order by timestamp desc) as next_exit
from cee
)
Next, use this logic to generate a cumulative sum of real entries, and then do some fancy filtering:
with cee as (
select c.*,
(case when [Price1] <= @EntryPrice1 then 1 else 0 end) as is_entry,
(case when [Price2] >= @ExitPrice2 then 1 else 0 end) as is_exit
from control c
),
cp as (
select cee.*,
max(case when is_entry = 1 then pkey end) over (partition by name order by timestamp rows between unbounded preceding and 1 preceding) as prev_entry,
max(case when is_exit = 1 then pkey end) over (partition by name order by timestamp) as prev_exit,
min(case when is_exit = 1 then pkey end) over (partition by name order by timestamp desc) as next_exit
from cee
)
select *
from cp
where cp.is_entry = 1 and
(prev_entry is null or prev_exit > prev_entry)
This gives you the rows where the entry starts. You can join in to get the additional information you want.
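Here is a sketch of the final query run in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed). It also returns next_exit, so each real entry is paired with its exit. The sketch makes a few adaptations that are not part of the answer:

- the thresholds are bound as named parameters;
- the TimeStamp column is renamed ts;
- the rows are generated from the sample prices, using the fact that Price2 is always Price1 + 5 in this data.

Like the query, it relies on PKey increasing with the timestamp within each name.

```python
import sqlite3

ENTRY_PRICE1, EXIT_PRICE2 = 101.0, 113.0

# First PKey and Price1 series per Name in time order (Price2 = Price1 + 5).
SERIES = {
    "Name1": (1, [120, 110, 101, 105, 106, 108, 110, 104, 101, 99, 95, 101,
                  102, 101, 99, 105, 108, 108, 109, 108, 105, 101, 102]),
    "Name2": (24, [103, 101, 104, 106, 108, 108, 105, 103, 101, 99, 101, 104,
                   101, 99, 106, 103, 105, 108, 105, 104, 108, 110, 105]),
}

QUERY = """
WITH cee AS (
    SELECT c.*,
           CASE WHEN price1 <= :entry THEN 1 ELSE 0 END AS is_entry,
           CASE WHEN price2 >= :exit THEN 1 ELSE 0 END AS is_exit
    FROM control c
),
cp AS (
    SELECT cee.*,
           MAX(CASE WHEN is_entry = 1 THEN pkey END) OVER (
               PARTITION BY name ORDER BY ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_entry,
           MAX(CASE WHEN is_exit = 1 THEN pkey END) OVER (
               PARTITION BY name ORDER BY ts) AS prev_exit,
           MIN(CASE WHEN is_exit = 1 THEN pkey END) OVER (
               PARTITION BY name ORDER BY ts DESC) AS next_exit
    FROM cee
)
SELECT name, pkey AS entry_pkey, next_exit AS exit_pkey
FROM cp
WHERE is_entry = 1 AND (prev_entry IS NULL OR prev_exit > prev_entry)
ORDER BY name, ts
"""


def real_entries():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE control (pkey INTEGER, ts TEXT, name TEXT, "
                "price1 REAL, price2 REAL)")
    for name, (first_pkey, prices1) in SERIES.items():
        for i, price1 in enumerate(prices1):
            minutes = 9 * 60 + 10 * i  # 09:00, 09:10, ... 12:40
            ts = f"2018-10-01 {minutes // 60:02d}:{minutes % 60:02d}:00"
            con.execute("INSERT INTO control VALUES (?, ?, ?, ?, ?)",
                        (first_pkey + i, ts, name, price1, price1 + 5))
    rows = con.execute(QUERY, {"entry": ENTRY_PRICE1, "exit": EXIT_PRICE2}).fetchall()
    con.close()
    return rows


for row in real_entries():
    print(row)
```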

How to group records when grouping should be skipped in a discontinued range?

Test data:
CREATE TABLE #Products
(Product VARCHAR(100), BeginDate DATETIME, EndDate DATETIME NULL, Rate INT);
INSERT INTO #Products (Product, BeginDate, EndDate, Rate)
VALUES ('Football', '01-01-1982', '05-03-2011', 2),
('Football', '05-04-2011', '08-01-2012', 1),
('Football', '08-02-2012', '01-01-2013', 2),
('Football', '01-02-2013', NULL, 3),
('Eggs', '01-01-1982', '05-03-2011', 1),
('Eggs', '05-04-2011', '08-01-2012', 1),
('Eggs', '08-02-2012', NULL, 1),
('Potato', '01-01-1982', '05-03-2011', 1),
('Potato', '05-04-2011', '08-01-2012', 1),
('Potato', '08-02-2012', '08-01-2013', 2),
('Potato', '08-02-2013', '08-01-2014', 2),
('Potato', '08-02-2014', '08-01-2015', 3),
('Potato', '08-02-2015', NULL, 3);
Expected result:
CREATE TABLE #Results
(Product VARCHAR(100), BeginDate DATETIME, EndDate DATETIME NULL, Rate INT);
INSERT INTO #Results (Product, BeginDate, EndDate, Rate)
VALUES ('Football', '01-01-1982', '05-03-2011', 2),
('Football', '05-04-2011', '08-01-2012', 1),
('Football', '08-02-2012', '01-01-2013', 2),
('Football', '01-02-2013', NULL, 3),
('Eggs', '01-01-1982', NULL, 1),
('Potato', '01-01-1982', '08-01-2012', 1),
('Potato', '08-02-2012', '08-01-2014', 2),
('Potato', '08-02-2014', NULL, 3);
I want to group by the Product and Rate columns, but only merge periods whose rate stays the same without interruption. Take Football in the test data: although there are two rows with a Rate of 2, they shouldn't be grouped, because a different rate applied for a period in between. The BeginDate value will always be 1 day after the previous EndDate.
I tried GROUP BY, but that didn't work.
This is an islands problem. One possible solution:
SELECT Product, min(BeginDate) AS BeginDate, EndDate, rate
FROM (
SELECT Product, BeginDate, rate, grp
,last_value(EndDate) over(partition by Product, Rate, grp order by BeginDate
rows between unbounded preceding and unbounded following) EndDate
FROM (
SELECT Product, BeginDate, EndDate, rate
,row_number() over(partition by Product order by BeginDate) - row_number() over(partition by Product, Rate order by BeginDate) grp
FROM #Products
) g
) t
GROUP BY Product, grp, EndDate, rate
ORDER BY Product, min(BeginDate)
Result
Product BeginDate EndDate rate
Eggs 01.01.1982 00:00:00 NULL 1
Football 01.01.1982 00:00:00 03.05.2011 00:00:00 2
Football 04.05.2011 00:00:00 01.08.2012 00:00:00 1
Football 02.08.2012 00:00:00 01.01.2013 00:00:00 2
Football 02.01.2013 00:00:00 NULL 3
Potato 01.01.1982 00:00:00 01.08.2012 00:00:00 1
Potato 02.08.2012 00:00:00 01.08.2014 00:00:00 2
Potato 02.08.2014 00:00:00 NULL 3
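Here is a sketch of the row-number-difference islands query run in SQLite through Python's sqlite3 module (3.25+ assumed for window functions), with ISO dates in place of the MM-DD-YYYY strings. The last EndDate is taken per island, that is, partitioned by Product, Rate and grp. Partitioning by Product and Rate alone would give Football's first rate-2 period the end date of its second one. merge_islands is an illustrative name.

```python
import sqlite3

PRODUCTS = [  # (Product, BeginDate, EndDate, Rate) from the test data, ISO dates
    ("Football", "1982-01-01", "2011-05-03", 2),
    ("Football", "2011-05-04", "2012-08-01", 1),
    ("Football", "2012-08-02", "2013-01-01", 2),
    ("Football", "2013-01-02", None, 3),
    ("Eggs", "1982-01-01", "2011-05-03", 1),
    ("Eggs", "2011-05-04", "2012-08-01", 1),
    ("Eggs", "2012-08-02", None, 1),
    ("Potato", "1982-01-01", "2011-05-03", 1),
    ("Potato", "2011-05-04", "2012-08-01", 1),
    ("Potato", "2012-08-02", "2013-08-01", 2),
    ("Potato", "2013-08-02", "2014-08-01", 2),
    ("Potato", "2014-08-02", "2015-08-01", 3),
    ("Potato", "2015-08-02", None, 3),
]


def merge_islands(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (Product TEXT, BeginDate TEXT, EndDate TEXT, Rate INTEGER)")
    con.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT Product, MIN(BeginDate), EndDate, Rate
        FROM (
            SELECT Product, BeginDate, Rate, grp,
                   -- end date of the last row in the same island
                   LAST_VALUE(EndDate) OVER (
                       PARTITION BY Product, Rate, grp ORDER BY BeginDate
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS EndDate
            FROM (
                SELECT p.*,
                       ROW_NUMBER() OVER (PARTITION BY Product ORDER BY BeginDate)
                     - ROW_NUMBER() OVER (PARTITION BY Product, Rate ORDER BY BeginDate) AS grp
                FROM products p
            )
        )
        GROUP BY Product, Rate, grp, EndDate
        ORDER BY Product, MIN(BeginDate)
        """
    ).fetchall()
    con.close()
    return result


for row in merge_islands(PRODUCTS):
    print(row)
```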
You can use lag to get the previous row's endDate and Rate, and use a case expression to start a new group when the specified conditions aren't met. Use sum() over() to assign groups. Thereafter, you can use the first_value window function to get the first beginDate, the last endDate and the rate per product and group.
select distinct product,
first_value(begindate) over(partition by product,grp order by beginDate),
first_value(enddate) over(partition by product,grp order by beginDate desc),
max(rate) over(partition by product,grp)
from
(select p.*,
sum(case when datediff(day,prevEnd,beginDate)=1 and prevRate=Rate then 0 else 1 end)
over(partition by product order by beginDate) as grp
from
(select p.*,
lag(endDate,1,endDate) over(partition by product order by beginDate) as prevEnd,
lag(Rate,1,Rate) over(partition by product order by beginDate) as prevRate
from #Products p
) p
) p
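The lag()/sum() over() grouping can be traced in plain Python. A row starts a new group unless it begins exactly one day after the previous row ends and has the same rate. This is an illustrative sketch: group_periods is a hypothetical name, and it works on one product at a time using datetime.date values from the test data.

```python
from datetime import date, timedelta


def group_periods(rows):
    """Merge one product's periods, sorted by begin date.

    rows: list of (begin, end_or_None, rate). A row continues the current
    group only if it starts exactly one day after the previous row ends and
    has the same rate; otherwise it starts a new group.
    """
    merged = []
    prev_end = prev_rate = None
    for begin, end, rate in rows:
        continues = (
            len(merged) > 0
            and prev_end is not None
            and begin - prev_end == timedelta(days=1)
            and rate == prev_rate
        )
        if continues:
            merged[-1] = (merged[-1][0], end, rate)  # keep first begin, take this end
        else:
            merged.append((begin, end, rate))        # start a new group
        prev_end, prev_rate = end, rate
    return merged


POTATO = [
    (date(1982, 1, 1), date(2011, 5, 3), 1),
    (date(2011, 5, 4), date(2012, 8, 1), 1),
    (date(2012, 8, 2), date(2013, 8, 1), 2),
    (date(2013, 8, 2), date(2014, 8, 1), 2),
    (date(2014, 8, 2), date(2015, 8, 1), 3),
    (date(2015, 8, 2), None, 3),
]
FOOTBALL = [
    (date(1982, 1, 1), date(2011, 5, 3), 2),
    (date(2011, 5, 4), date(2012, 8, 1), 1),
    (date(2012, 8, 2), date(2013, 1, 1), 2),
    (date(2013, 1, 2), None, 3),
]

for period in group_periods(POTATO):
    print(period)
```

Football comes back unchanged, because no two adjacent rows share a rate.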
You can use Row_Number and query as below:
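Select top (1) with ties Product, BeginDate, IslandEnd AS EndDate, Rate from (
SElect *, last_value(EndDate) over (partition by Product, Rate, RowN order by BeginDate rows between unbounded preceding and unbounded following) as IslandEnd
from (
select *, RowN = Row_number() over (partition by Product order by begindate) - Row_number() over (partition by product,rate order by begindate)
from #Products
) r
) a order by row_number() over(partition by Product, Rate, RowN order by BeginDate)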
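I think this does it (it relies on each BeginDate being one day after the previous EndDate, as stated in the question):
select Product, BeginDate
, case when nextBegin is null then lastEnd else dateadd(day, -1, nextBegin) end as EndDate
, Rate
from ( select *
, lead(BeginDate) over(partition by Product order by BeginDate) as nextBegin
from ( select *
, lag(Rate, 1) over(partition by Product order by BeginDate) as prevRate
, last_value(EndDate) over(partition by Product order by BeginDate
rows between unbounded preceding and unbounded following) as lastEnd
from #Products
) l
where rate <> prevRate or prevRate is null
) s
order by Product, BeginDate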