T-SQL, list of DATETIME, create <from> - <to> from it

What would I need to do to achieve the following? Somehow I can't seem to find a good solution.
I have a few CTEs, and the last one is producing just a list of DATETIME values, with a row number column, and those are ordered by the DATETIME.
For example
rn datetime
---------------------------
1 2023-01-07 01:00:00.000
2 2023-01-08 05:30:00.000
3 2023-01-08 08:00:00.000
4 2023-01-09 21:30:00.000
How do I have to join this CTE with each other in order to get the following result:
from to
---------------------------------------------------
2023-01-07 01:00:00.000 2023-01-08 05:30:00.000
2023-01-08 08:00:00.000 2023-01-09 21:30:00.000
Doing a regular inner join (with t1.rn = t2.rn - 1) gives me one row too many (the one from 05:30 to 08:00). So basically each date can only be "used" once.
Hope that makes sense... thanks!
I tried inner joining the CTE with itself, which didn't return the wanted result.

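You can aggregate the outcome of your CTE and assign rows to pairs using arithmetic: integer division by 2 comes to mind.
Assuming that your CTE returns columns dt (a datetime field) and rn (an integer row number):
select min(dt) dt_from, max(dt) dt_to
from cte
group by ( rn - 1 ) / 2
In T-SQL, dividing one integer by another truncates the result, so rows 1 and 2 fall into group 0, rows 3 and 4 into group 1, and so on. Grouping by ( rn - 1 ) % 2 instead would only separate odd rows from even rows and collapse everything into two groups, which is not the pairing we want.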

You can avoid both the JOIN and the GROUP BY by using LAG to retrieve a previous column value in the result set. The server may be able to generate an execution plan that iterates over the data just once instead of joining or grouping :
with pairs as (
SELECT rn, lag(datetime) OVER(ORDER BY rn) as dt_from, datetime as dt_to
from another_cte
)
select dt_from, dt_to
from pairs
where rn % 2 = 0
ORDER BY rn
The row number itself can be calculated from datetime :
with pairs as (
SELECT
ROW_NUMBER() OVER(ORDER BY datetime) AS rn,
lag(datetime) OVER(ORDER BY datetime) as dt_from,
datetime as dt_to
from another_cte
)
select dt_from, dt_to
from pairs
where rn % 2 = 0
ORDER BY dt_to
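The LAG idea can be sketched in Python as well, as a rough model of what the query does rather than actual T-SQL. The name pair_with_lag is hypothetical, and the timestamps are plain ISO strings so that sorting them as text matches sorting them by time.

```python
def pair_with_lag(values):
    """Emit (previous, current) for every even-numbered row."""
    result = []
    previous = None  # plays the role of LAG(...): the value from the prior row
    for rn, current in enumerate(sorted(values), start=1):
        if rn % 2 == 0:  # same filter as "where rn % 2 = 0"
            result.append((previous, current))
        previous = current
    return result


sample = [
    "2023-01-07 01:00",
    "2023-01-08 05:30",
    "2023-01-08 08:00",
    "2023-01-09 21:30",
]
lag_pairs = pair_with_lag(sample)
for dt_from, dt_to in lag_pairs:
    print(dt_from, "->", dt_to)
```

Unlike the grouping approach, a trailing unpaired row is simply dropped here, because only even row numbers are kept.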

Related

SQL to find sum of total days in a window for a series of changes

Following is the table:
start_date
recorded_date
id
2021-11-10
2021-11-01
1a
2021-11-08
2021-11-02
1a
2021-11-11
2021-11-03
1a
2021-11-10
2021-11-04
1a
2021-11-10
2021-11-05
1a
I need a query to find the total day changes in aggregate for a given id. In this case, it changed from 10th Nov to 8th Nov so 2 days, then again from 8th to 11th Nov so 3 days and again from 11th to 10th for a day, and finally from 10th to 10th, that is 0 days.
In total there is a change of 2+3+1+0 = 6 days for the id - '1a'.
Basically for each change there is a recorded_date, so we arrange that in ascending order and then calculate the aggregate change of days grouped by id. The final result should be like:
id
Agg_Change
1a
6
Is there a way to do this using SQL? I am using a Vertica database.
Thanks.
You can use the window function LEAD to get the difference between consecutive rows and then group by id:
select id, sum(daydiff) Agg_Change
from (
select id, abs(datediff(day, start_Date, lead(start_date,1,start_date) over (partition by id order by recorded_date))) as daydiff
from tablename
) t group by id
It is indeed a case for LAG(): get the previous date in an OLAP query, then have an outer query compute the absolute date difference and sum it, grouping by id:
WITH
-- your input - don't use in real query ...
indata(start_date,recorded_date,id) AS (
SELECT DATE '2021-11-10',DATE '2021-11-01','1a'
UNION ALL SELECT DATE '2021-11-08',DATE '2021-11-02','1a'
UNION ALL SELECT DATE '2021-11-11',DATE '2021-11-03','1a'
UNION ALL SELECT DATE '2021-11-10',DATE '2021-11-04','1a'
UNION ALL SELECT DATE '2021-11-10',DATE '2021-11-05','1a'
)
-- real query starts here, replace following comma with "WITH" ...
,
w_lag AS (
SELECT
id
, start_date
, LAG(start_date) OVER w AS prevdt
FROM indata
WINDOW w AS (PARTITION BY id ORDER BY recorded_date)
)
SELECT
id
, SUM(ABS(DATEDIFF(DAY,start_date,prevdt))) AS dtdiff
FROM w_lag
GROUP BY id
-- out id | dtdiff
-- out ----+--------
-- out 1a | 6
I thought the LAG function would give me the answer, but it kept returning the wrong result because my logic was wrong in one place. I now have the answer I need:
with cte as(
select id, start_date, recorded_date,
row_number() over(partition by id order by recorded_date asc) as idrank,
lag(start_date,1) over(partition by id order by recorded_date asc) as prev
from table_temp
)
select id, sum(abs(date(start_date) - date(prev))) as Agg_Change
from cte
group by 1
If someone has a better solution please let me know.
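For anyone who wants to sanity-check the expected total of 6 outside the database, the same logic (previous start_date per id, ordered by recorded_date, summing absolute day differences) can be sketched in Python. The function name total_day_changes is invented for this illustration; the rows are the sample data from the question.

```python
from datetime import date


def total_day_changes(rows):
    """rows: iterable of (id, start_date, recorded_date) tuples."""
    totals = {}
    previous = {}  # last start_date seen per id, i.e. the LAG value
    # Order by id, then recorded_date, like PARTITION BY id ORDER BY recorded_date.
    for row_id, start, _recorded in sorted(rows, key=lambda r: (r[0], r[2])):
        totals.setdefault(row_id, 0)
        if row_id in previous:
            totals[row_id] += abs((start - previous[row_id]).days)
        previous[row_id] = start
    return totals


data = [
    ("1a", date(2021, 11, 10), date(2021, 11, 1)),
    ("1a", date(2021, 11, 8), date(2021, 11, 2)),
    ("1a", date(2021, 11, 11), date(2021, 11, 3)),
    ("1a", date(2021, 11, 10), date(2021, 11, 4)),
    ("1a", date(2021, 11, 10), date(2021, 11, 5)),
]
changes = total_day_changes(data)
print(changes)
```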

Collapse multiple rows based on time values

I'm trying to collapse rows with consecutive timeline within the same day into one row but having an issue because of gap in time. For example, my dataset looks like this.
Date StartTime EndTime ID
2017-12-1 09:00:00 11:00:00 12345
2017-12-1 11:00:00 13:00:00 12345
2018-09-08 09:00:00 10:00:00 78465
2018-09-08 10:00:00 12:00:00 78465
2018-09-08 15:00:00 16:00:00 78465
2018-09-08 16:00:00 18:00:00 78465
As you can see, the first two rows can just be combined without any issue because there's no time gap within that day. However, for the entries on 2018-09-08, there is a gap between 12:00 and 15:00, and I'd like to merge these four records into two different rows like this:
Date StartTime EndTime ID
2017-12-1 09:00:00 13:00:00 12345
2018-09-08 09:00:00 12:00:00 78465
2018-09-08 15:00:00 18:00:00 78465
In other words, I want to collapse the rows only when the time values are consecutive within the same day for the same ID.
Could anyone please help me with this? I tried to generate unique group using LAG and LEAD functions but it didn't work.
You can use a recursive CTE. Put a row in the same group as the previous row if the previous EndTime is the same as its StartTime, and then find the MIN() and MAX() per group:
with cte as
(
select rn = row_number() over (partition by [ID], [Date] order by [StartTime]),
*
from tbl
),
rcte as
(
-- anchor member
select rn, [ID], [Date], [StartTime], [EndTime], grp = 1
from cte
where rn = 1
union all
-- recursive member
select c.rn, c.[ID], c.[Date], c.[StartTime], c.[EndTime],
grp = case when r.[EndTime] = c.[StartTime]
then r.grp
else r.grp + 1
end
from rcte r
inner join cte c on r.[ID] = c.[ID]
and r.[Date] = c.[Date]
and r.rn = c.rn - 1
)
select [ID], [Date],
min([StartTime]) as StartTime,
max([EndTime]) as EndTime
from rcte
group by [ID], [Date], grp
Unless you have a particular objection to collapsing non-consecutive rows, which are consecutive for that ID, you can just use GROUP BY:
SELECT
Date,
StartTime = MIN(StartTime),
EndTime = MAX(EndTime),
ID
FROM table
GROUP BY ID, Date
Otherwise you can use a solution based on ROW_NUMBER:
SELECT
Date,
StartTime,
EndTime,
ID
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY Date, ID ORDER BY StartTime)
FROM table
) t
WHERE rn = 1
This is an example of a gaps-and-islands problem -- actually a pretty simple example. The idea is to assign an "island" grouping to each row specifying that they should be combined because they overlap. Then aggregate.
How do you assign the island? In this case, look at the previous endtime and if it is different from the starttime, then the row starts a new island. Voila! A cumulative sum of the start flag identifies each island.
As SQL:
select id, date, min(starttime), max(endtime)
from (select t.*,
sum(case when prev_endtime = starttime then 0 else 1 end) over (partition by id, date order by starttime) as grp
from (select t.*,
lag(endtime) over (partition by id, date order by starttime) as prev_endtime
from t
) t
) t
group by id, date, grp;
Note: This assumes that the time periods never span multiple days. The code can be very easily modified to handle that . . . but with a caveat. The start and end times should be stored as datetime (or a related timestamp) rather than separating the date and times into different columns. Why? SQL Server doesn't support '24:00:00' as a valid time.
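As a cross-check of the island rule (a row starts a new island unless the previous end time equals its start time), here is a rough single-pass sketch in Python. It is not the windowed SQL itself: it merges while walking the sorted rows instead of computing a flag and a cumulative sum. The function name collapse_consecutive is made up, and the times are assumed to be zero-padded strings so they compare correctly as text.

```python
def collapse_consecutive(rows):
    """rows: (id, date, start, end) tuples; times are 'HH:MM:SS' strings."""
    merged = []
    # Sorting tuples orders by id, then date, then start time.
    for row_id, day, start, end in sorted(rows):
        last = merged[-1] if merged else None
        same_island = (
            last is not None
            and last[0] == row_id
            and last[1] == day
            and last[3] == start  # previous end time equals this start time
        )
        if same_island:
            merged[-1] = (row_id, day, last[2], end)  # extend the island
        else:
            merged.append((row_id, day, start, end))  # start a new island
    return merged


schedule = [
    (12345, "2017-12-01", "09:00:00", "11:00:00"),
    (12345, "2017-12-01", "11:00:00", "13:00:00"),
    (78465, "2018-09-08", "09:00:00", "10:00:00"),
    (78465, "2018-09-08", "10:00:00", "12:00:00"),
    (78465, "2018-09-08", "15:00:00", "16:00:00"),
    (78465, "2018-09-08", "16:00:00", "18:00:00"),
]
islands = collapse_consecutive(schedule)
for island in islands:
    print(island)
```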

How to get Date Range which is matching a criteria

I can get the desired output by using while loop but since original table has thousands of record, performance is very slow.
How can I get the desired results using Common Table Expression?
Thank You.
This will produce the desired results. Not as elegant as Gordon's, but it does allow for gaps in dates and duplicate dates.
If you have a Calendar/Tally Table, the cte logic can be removed.
Example
Declare @YourTable Table ([AsOfDate] Date,[SecurityID] varchar(50),[IsHeld] bit)
Insert Into @YourTable Values
('2017-05-19','S1',1)
,('2017-05-20','S1',1)
,('2017-05-21','S1',1)
,('2017-05-22','S1',1)
,('2017-05-23','S1',0)
,('2017-05-24','S1',0)
,('2017-05-25','S1',0)
,('2017-05-26','S1',1)
,('2017-05-27','S1',1)
,('2017-05-28','S1',1)
,('2017-05-29','S1',0)
,('2017-05-30','S1',0)
,('2017-05-31','S1',1)
;with cte1 as ( Select D1=min(AsOfDate),D2=max(AsOfDate) From @YourTable )
,cte2 as (
Select Top (DateDiff(DAY,(Select D1 from cte1),(Select D2 from cte1))+1)
D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),(Select D1 from cte1))
,R=Row_Number() over (Order By (Select Null))
From master..spt_values n1,master..spt_values n2
)
Select [SecurityID]
,[StartDate] = min(D)
,[EndDate] = max(D)
From (
Select *,Grp = dense_rank() over (partition by securityId order by asofdate )-R
From @YourTable A
Join cte2 B on AsOfDate=B.D
Where IsHeld=1
) A
Group By [SecurityID],Grp
Order By min(D)
Returns
SecurityID StartDate EndDate
S1 2017-05-19 2017-05-22
S1 2017-05-26 2017-05-28
S1 2017-05-31 2017-05-31
This is a variant of the gaps-and-islands problem. In this case, you can use date arithmetic to calculate the rows with adjacent dates:
select securityId, isheld, min(asofdate), max(asofdate)
from (select t.*,
dateadd(day,
- row_number() over (partition by securityId, isheld
order by asofdate
),
asofdate) as grp
from t
) t
group by grp, securityId, isheld;
Note: This assumes that the dates are contiguous and have no duplicates. The query can be modified to take those factors into account.
The basic idea is that if you have a sequence of days that are increasing one at a time, then you can subtract a sequence of values and get a constant. That is what grp is. The rest is just aggregation.
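The "subtract a sequence and get a constant" idea is easy to verify by hand. Below is a small Python sketch of it, using the IsHeld = 1 dates from the sample data in the other answer. The name held_ranges is hypothetical, and duplicates are removed up front, which is an extra step beyond the query's stated assumptions.

```python
from datetime import date, timedelta
from itertools import groupby


def held_ranges(dates):
    """Collapse runs of consecutive dates into (start, end) ranges."""
    ordered = sorted(set(dates))
    # date minus row number is constant within a run of consecutive days.
    keyed = [(d - timedelta(days=rn), d) for rn, d in enumerate(ordered, start=1)]
    ranges = []
    for _, run in groupby(keyed, key=lambda pair: pair[0]):
        run_dates = [d for _, d in run]
        ranges.append((run_dates[0], run_dates[-1]))
    return ranges


held = [date(2017, 5, day) for day in (19, 20, 21, 22, 26, 27, 28, 31)]
result = held_ranges(held)
for start, end in result:
    print(start, end)
```

For example, 19 - 1, 20 - 2, 21 - 3 and 22 - 4 all land on May 18, while 26 - 5 lands on May 21, so the first gap starts a new group.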

trying to find the maximum number of occurrences over time T-SQL

I have data recording the StartDateTime and EndDateTime (both DATETIME2) of a process for all of the year 2013.
My task is to find the maximum number of instances of the process that were running at any one time throughout the year.
I have written some code to check, every minute/second, how many processes were running at that specific time, but this takes a very long time and it would be impossible to let it run for the whole year.
Here is the code (in this case it checks every minute of 2013-10-24):
CREATE TABLE dbo.#Hit
(
ID INT IDENTITY (1,1) PRIMARY KEY,
Moment DATETIME2,
COUNT INT
)
DECLARE @moment DATETIME2
SET @moment = '2013-10-24 00:00:00'
WHILE @moment < '2013-10-25'
BEGIN
INSERT INTO #Hit ( Moment, COUNT )
SELECT @moment, COUNT(*)
FROM dbo.tblProcessTimeLog
WHERE ProcessFK IN (25)
AND @moment BETWEEN StartDateTime AND EndDateTime
AND DelInd = 0
PRINT @moment
SET @moment = DATEADD(MINute,1,@moment)
END
END
SELECT * FROM #Hit
ORDER BY COUNT DESC
Can anyone think how I could get a similar result (I just need the maximum number of processes running at any given time), but for the whole year?
Thanks
DECLARE @d DATETIME = '20130101'; -- the first day of the year you care about
;WITH m(m) AS
( -- all the minutes in a day
SELECT TOP (1440) ROW_NUMBER() OVER (ORDER BY number) - 1
FROM master..spt_values
),
d(d) AS
( -- all the days in *that* year (accounts for leap years vs. hard-coding 365)
SELECT TOP (DATEDIFF(DAY, @d, DATEADD(YEAR, 1, @d))) DATEADD(DAY, number, @d)
FROM master..spt_values WHERE type = N'P' ORDER BY number
),
x AS
( -- all the minutes in *that* year
SELECT moment = DATEADD(MINUTE, m.m, d.d) FROM m CROSS JOIN d
)
SELECT TOP (1) WITH TIES -- in case more than one at the top
x.moment, [COUNT] = COUNT(l.ProcessFK)
FROM x
INNER JOIN dbo.tblProcessTimeLog AS l
ON x.moment >= l.StartDateTime
AND x.moment <= l.EndDateTime
WHERE l.ProcessFK = 25 AND l.DelInd = 0
GROUP BY x.moment
ORDER BY [COUNT] DESC;
See this post for why I don't think you should use BETWEEN for range queries, even in cases where it does semantically do what you want.
Create a table T whose rows represent some time segments.
This table could well be a temporary table (depending on your case).
Say:
row 1 - [from=00:00:00, to=00:00:01)
row 2 - [from=00:00:01, to=00:00:02)
row 3 - [from=00:00:02, to=00:00:03)
and so on.
Then just join from your main table
(tblProcessTimeLog, I think) to this table
based on the datetime values recorded in
tblProcessTimeLog.
A year has just over half a million minutes (525,600 in a non-leap year),
so that is not too many rows to store in T.
I recently pulled some code from SO trying to solve the 'island and gaps' problem, and the algorithm for that should help you solve your problem.
The idea is that you want to find the point in time that has the most started processes, much like figuring out the deepest nesting of parentheses in an expression:
( ( ( ) ( ( ( (deepest here, 6)))))
This sql will produce this result for you (I included a temp table with sample data):
/*
CREATE TABLE #tblProcessTimeLog
(
StartDateTime DATETIME2,
EndDateTime DATETIME2
)
-- delete from #tblProcessTimeLog
INSERT INTO #tblProcessTimeLog (StartDateTime, EndDateTime)
Values ('1/1/2012', '1/6/2012'),
('1/2/2012', '1/6/2012'),
('1/3/2012', '1/6/2012'),
('1/4/2012', '1/6/2012'),
('1/5/2012', '1/7/2012'),
('1/6/2012', '1/8/2012'),
('1/6/2012', '1/10/2012'),
('1/6/2012', '1/11/2012'),
('1/10/2012', '1/12/2012'),
('1/15/2012', '1/16/2012')
;
*/
with cteProcessGroups (EventDate, GroupId) as
(
select EVENT_DATE, (E.START_ORDINAL - E.OVERALL_ORDINAL) GROUP_ID
FROM
(
select EVENT_DATE, EVENT_TYPE,
MAX(START_ORDINAL) OVER (ORDER BY EVENT_DATE, EVENT_TYPE ROWS UNBOUNDED PRECEDING) as START_ORDINAL,
ROW_NUMBER() OVER (ORDER BY EVENT_DATE, EVENT_TYPE) AS OVERALL_ORDINAL
from
(
Select StartDateTime AS EVENT_DATE, 1 as EVENT_TYPE, ROW_NUMBER() OVER (ORDER BY StartDateTime) as START_ORDINAL
from #tblProcessTimeLog
UNION ALL
select EndDateTime, 0 as EVENT_TYPE, NULL
FROM #tblProcessTimeLog
) RAWDATA
) E
)
select Max(EventDate) as EventDate, count(GroupId) as OpenProcesses
from cteProcessGroups
group by (GroupId)
order by COUNT(GroupId) desc
Results:
EventDate OpenProcesses
2012-01-05 00:00:00.0000000 5
2012-01-06 00:00:00.0000000 4
2012-01-15 00:00:00.0000000 2
2012-01-10 00:00:00.0000000 2
2012-01-08 00:00:00.0000000 1
2012-01-07 00:00:00.0000000 1
2012-01-11 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-16 00:00:00.0000000 1
Note that the 'in-between' rows don't give anything meaningful. Basically this output is only tuned to tell you when the most activity was. Looking at the other rows in the output, there wasn't just 1 process running on 1/8 (there were actually 3). But the way this code works is that by grouping the processes that are concurrent together in a group, you can count the number of simultaneous processes. The date returned is when the max concurrent processes began. It doesn't tell you how long they were going on for, but you can solve that with an additional query. (Once you know the date when the most were occurring, you can find out the specific process IDs by using a BETWEEN statement on the date.)
Hope this helps.
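The parenthesis-nesting picture can also be checked with a simple sweep over start and end events, sketched here in Python. Note this is a different formulation from the ordinal-difference query above: it keeps a running count of +1 for each start and -1 for each end. The name peak_concurrency is made up, and like the query it treats an end as happening before a start at the same timestamp.

```python
from datetime import datetime


def peak_concurrency(intervals):
    """Return (moment, count) for the first moment of maximum overlap."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # opening parenthesis
        events.append((end, -1))    # closing parenthesis
    # Tuples sort by time first; at equal times -1 sorts before +1.
    events.sort()
    running = peak = 0
    peak_at = None
    for moment, delta in events:
        running += delta
        if running > peak:
            peak, peak_at = running, moment
    return peak_at, peak


def d(month, day):
    return datetime(2012, month, day)


log = [
    (d(1, 1), d(1, 6)), (d(1, 2), d(1, 6)), (d(1, 3), d(1, 6)),
    (d(1, 4), d(1, 6)), (d(1, 5), d(1, 7)), (d(1, 6), d(1, 8)),
    (d(1, 6), d(1, 10)), (d(1, 6), d(1, 11)), (d(1, 10), d(1, 12)),
    (d(1, 15), d(1, 16)),
]
when, count = peak_concurrency(log)
print(when, count)
```

On the sample data this agrees with the top row of the results above: 5 processes open from 2012-01-05. If a process ending at the same instant another starts should count as overlapping (as BETWEEN would), the tie-break in the sort would need to be reversed.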

Find date ranges between large gaps and ignore smaller gaps

I have a column of mostly continuous, unique dates in ascending order. Although the dates are mostly continuous, there are some gaps: some of 3 days or less, others of more than 3 days.
I need to create a table where each record has a start date and an end date of the range that includes a gap of 3 days or less. But a new record has to be generated if the gap is longer than 3 days.
so if dates are:
1/2/2012
1/3/2012
1/4/2012
1/15/2012
1/16/2012
1/18/2012
1/19/2012
I need:
1/2/2012 1/4/2012
1/15/2012 1/19/2012
You can do something like this:
WITH CTE_Source AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY DT) RN
FROM dbo.Table1
)
,CTE_Recursion AS
(
SELECT *, 1 AS Grp
FROM CTE_Source
WHERE RN = 1
UNION ALL
SELECT src.*, CASE WHEN DATEADD(DD,3,rec.DT) < src.DT THEN rec.Grp + 1 ELSE rec.Grp END AS Grp
FROM CTE_Source src
INNER JOIN CTE_Recursion rec ON src.RN = rec.RN +1
)
SELECT
MIN(DT) AS StartDT, MAX(DT) AS EndDT
FROM CTE_Recursion
GROUP BY Grp
First CTE is just to assign continuous numbers for all rows in order to join them later. Then using recursive CTE you can join on each next row assigning groups if date difference is larger than 3 days. In the end just group by grouping column and select desired results.
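The grouping rule described here (start a new group only when the gap to the previous date is more than 3 days) can be sketched in Python to check it against the sample dates. This is just a model of the logic, not the recursive CTE; the name ranges_with_max_gap and its max_gap_days parameter are invented for the example.

```python
from datetime import date


def ranges_with_max_gap(dates, max_gap_days=3):
    """Merge sorted dates into (start, end) ranges, tolerating small gaps."""
    ranges = []
    for current in sorted(dates):
        # Same test as the CTE: a gap of more than max_gap_days starts a new group.
        if ranges and (current - ranges[-1][1]).days <= max_gap_days:
            ranges[-1] = (ranges[-1][0], current)
        else:
            ranges.append((current, current))
    return ranges


dts = [date(2012, 1, day) for day in (2, 3, 4, 15, 16, 18, 19)]
grouped = ranges_with_max_gap(dts)
for start_dt, end_dt in grouped:
    print(start_dt, end_dt)
```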