How can I get consecutive dates in one row using SQL? - sql

I want to get the minimum starttime and the maximum endtime over runs of consecutive dates (within the same month); every date in a run has to be consecutive.
I want to combine each run into one row. How can I get the second table below from the first?
IP Address    Starttime    Endtime
192.168.0.1   15/12/2022   15/12/2022
192.168.0.1   26/12/2022   26/12/2022
192.168.0.1   27/12/2022   27/12/2022
192.168.0.1   28/12/2022   28/12/2022
192.168.0.1   11/01/2023   11/01/2023
192.168.0.1   12/01/2023   12/01/2023
192.168.0.1   13/01/2023   13/01/2023
192.168.0.1   14/01/2023   14/01/2023
192.168.0.1   15/01/2023   15/01/2023
This is actually what I want:
IP Address    Starttime    Endtime
192.168.0.1   15/12/2022   15/12/2022
192.168.0.1   26/12/2022   28/12/2022
192.168.0.1   11/01/2023   15/01/2023

Use LAG(endtime) or LEAD(starttime) OVER (PARTITION BY ip_address ORDER BY starttime) in a first pass so that each row can see the date from its neighboring row (you can go in either direction). Then use a CASE or DECODE to test whether the dates are contiguous. If they are not, emit the date; otherwise emit NULL. Then, in a parent block, use LAST_VALUE() with the IGNORE NULLS option and ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to get the most recent of those conditionally computed dates. That value can then be used in a GROUP BY in a third parent query block to finalize your results.
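As a rough illustration of those three passes outside the database, here is a minimal Python sketch. The function name merge_consecutive and the tuple layout are hypothetical, and the same-month check is an assumption taken from the question's wording. It is not Oracle code, only a model of the LAG, LAST_VALUE ... IGNORE NULLS, and GROUP BY steps.

```python
from datetime import date, timedelta
from itertools import groupby


def merge_consecutive(rows):
    """rows: iterable of (ip, starttime, endtime) tuples holding date values."""
    rows = sorted(rows)  # PARTITION BY ip ORDER BY starttime

    # Pass 1 (LAG): emit the start date when a row does NOT continue its
    # predecessor, otherwise emit None (the SQL NULL).
    flagged = []
    prev_ip, prev_end = None, None
    for ip, start, end in rows:
        contiguous = (
            ip == prev_ip
            and start == prev_end + timedelta(days=1)
            # Assumption from the question: runs must stay within one month.
            and (start.year, start.month) == (prev_end.year, prev_end.month)
        )
        flagged.append((ip, start, end, None if contiguous else start))
        prev_ip, prev_end = ip, end

    # Pass 2 (LAST_VALUE ... IGNORE NULLS): carry the latest marker forward.
    carried = []
    marker = None
    for ip, start, end, emitted in flagged:
        if emitted is not None:
            marker = emitted
        carried.append((ip, marker, start, end))

    # Pass 3 (GROUP BY ip, marker): one output row per run.
    merged = []
    for (ip, _), members in groupby(carried, key=lambda r: (r[0], r[1])):
        members = list(members)
        merged.append((ip, min(m[2] for m in members), max(m[3] for m in members)))
    return merged


ip = "192.168.0.1"
days = [date(2022, 12, d) for d in (15, 26, 27, 28)]
days += [date(2023, 1, d) for d in (11, 12, 13, 14, 15)]
for row in merge_consecutive([(ip, d, d) for d in days]):
    print(row)
```

With the question's sample data this yields three rows: 15/12 alone, 26/12 to 28/12, and 11/01 to 15/01.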

Or the classical MATCH_RECOGNIZE for the merging of intervals:
select * from data
match_recognize(
    partition by ipaddress
    order by starttime, endtime
    measures first(starttime) as starttime, max(endtime) as endtime
    pattern( merged* strt )
    define
        merged as endtime = next(starttime) - 1
);

Related

PostgreSQL query to count time interval inside day period

I have an application that logs three states of a machine. I need to report how long the machine stayed in each state from 00:00:00 until 23:59:59 on each day.
I need help building a PostgreSQL query that gets the total time each state lasted during the day.
For example, as you can see below, the data run from 23:50:00 on the previous day until 01:00:00 on the following day.
device_id varchar(50)   Value int4   Date_time timestamptz
device1                 0            2022-23-04 23:50:00
device1                 0            2022-24-04 00:10:00
device1                 0            2022-24-04 00:15:00
device1                 0            2022-24-04 00:20:00
device1                 1            2022-24-04 00:25:00
device1                 1            2022-24-04 00:30:00
device1                 1            2022-24-04 11:00:00
device1                 0            2022-24-04 21:00:00
device1                 1            2022-25-04 01:00:00
I am calculating the duration between state changes, storing it in the table, and simply summing it, but that gives me the following result:
Total:
State = 0 - 04:35
State = 1 - 20:35
Sum Both = 25:10:00
The query I need should not count the portion of time that belongs to days other than 24/04/2022, and must give me:
Total Day 24/04/2022:
State = 0 - 03:25
State = 1 - 20:35
Sum Both = 24:00:00
At the end of each day I need the percentage of time the machine stayed in each state, so that I can build a pie chart.
Is there a way to write a query that fits these needs?
Thank you all in advance for the help.
@shawnt00's answer worked. Now I am trying to figure out how to organize the data so the query result comes out as follows:
device_id   state_0            state_1            state_9
device_id   timespan_state_0   timespan_state_1   timespan_state_9
For the given example, it should look as shown below. I added a second device just to extend the example :)
device_id   state_0   state_1   state_9
device1     03:25     20:35     00:00
device2     X         Y         Z
Att.
Winner Martins
I think you just need to use lead()/lag() along with some case expressions to detect the spans across midnight. No join is required:
with data as (
    select *,
        cast(date_trunc('day', Date_time) as date) as dt,
        lag(Date_time) over (partition by device_id order by Date_time) as last_Date_time,
        lead(Date_time) over (partition by device_id order by Date_time) as next_Date_time
    from T
)
select device_id, dt as "date", Value,
    coalesce(sum(
        case when date_trunc('day', next_Date_time) > date_trunc('day', Date_time)
            then date_trunc('day', Date_time + interval '1 day') - Date_time
            else coalesce(next_date_time - Date_time, interval '0 seconds') end
        +
        case when date_trunc('day', last_Date_time) < date_trunc('day', Date_time)
            then Date_time - date_trunc('day', Date_time)
            else interval '0 seconds' end
    ), interval '0 seconds') as timespan2
from data
group by device_id, dt, Value
order by device_id, dt, Value;
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=ab32fee1615b637f9f2f844aa1bf5064
I'm not overly familiar with all the Postgres date functions, so there's possibly a slightly cleaner way to do the time calculation.
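To make the midnight-splitting arithmetic easier to check by hand, here is a small Python sketch of the same idea. It is not a translation of the query above: it walks each state interval and cuts it at every midnight with an explicit loop. The function name daily_state_durations is hypothetical, timestamps are assumed to be naive (no time zone handling), and the sample dates are read as 23 to 25 April 2022.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_state_durations(events):
    """events: list of (timestamp, state) for a single device.

    Each state is assumed to last until the next event; the final event
    has no successor and therefore contributes no time.
    """
    events = sorted(events)
    totals = defaultdict(timedelta)  # (day, state) -> time spent
    for (start, state), (end, _) in zip(events, events[1:]):
        cursor = start
        while cursor < end:
            # Cut the interval at the next midnight so each piece lies in one day.
            next_midnight = datetime.combine(
                cursor.date() + timedelta(days=1), datetime.min.time()
            )
            piece_end = min(end, next_midnight)
            totals[(cursor.date(), state)] += piece_end - cursor
            cursor = piece_end
    return dict(totals)


sample = [
    (datetime(2022, 4, 23, 23, 50), 0),
    (datetime(2022, 4, 24, 0, 10), 0),
    (datetime(2022, 4, 24, 0, 15), 0),
    (datetime(2022, 4, 24, 0, 20), 0),
    (datetime(2022, 4, 24, 0, 25), 1),
    (datetime(2022, 4, 24, 0, 30), 1),
    (datetime(2022, 4, 24, 11, 0), 1),
    (datetime(2022, 4, 24, 21, 0), 0),
    (datetime(2022, 4, 25, 1, 0), 1),
]
for (day, state), spent in sorted(daily_state_durations(sample).items()):
    print(day, state, spent)
```

For 24 April this gives 3:25:00 in state 0 and 20:35:00 in state 1, which sum to the 24 hours the question expects.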
The query below would work.
select
    std.state,
    sum(case
            when std.rk = 1 then std.time_diff + std.time_diff_start
            when std.trunc_state_start = std.trunc_state_end then std.time_diff
            when std.trunc_state_start <> std.trunc_state_end then std.time_diff_end
            else std.time_diff
        end)
from
    (
    select
        a.state,
        date_trunc('day', a.date_time) as trunc_state_start,
        date_trunc('day', b.date_time) as trunc_state_end,
        b.date_time - a.date_time as time_diff,
        a.date_time - date_trunc('day', a.date_time) as time_diff_start,
        date_trunc('day', b.date_time) - a.date_time as time_diff_end,
        rank() over(order by a.date_time) rk
    from
        (select ds.*, rank() over(order by date_time) rk from devicestat ds) a
    inner join
        (select ds.*, rank() over(order by date_time) rk from devicestat ds) b
    on
        a.rk + 1 = b.rk
    where
        date_trunc('day', a.date_time) = '2022-04-24') std
group by
    std.state;
The self-join makes it easy to calculate the time difference between each state's start and end time. The rest is working out the boundary differences at the start and end of the day. There are many ways to do that, but this is what came to mind.

Collapse multiple rows based on time values

I'm trying to collapse rows with consecutive timeline within the same day into one row but having an issue because of gap in time. For example, my dataset looks like this.
Date StartTime EndTime ID
2017-12-1 09:00:00 11:00:00 12345
2017-12-1 11:00:00 13:00:00 12345
2018-09-08 09:00:00 10:00:00 78465
2018-09-08 10:00:00 12:00:00 78465
2018-09-08 15:00:00 16:00:00 78465
2018-09-08 16:00:00 18:00:00 78465
As you can see, the first two rows can simply be combined because there's no time gap within that day. However, for the entries on 2018-09-08, there is a gap between 12:00 and 15:00, and I'd like to merge these four records into two rows like this:
Date StartTime EndTime ID
2017-12-1 09:00:00 13:00:00 12345
2018-09-08 09:00:00 12:00:00 78465
2018-09-08 15:00:00 18:00:00 78465
In other words, I want to collapse rows only when the times are consecutive within the same day for the same ID.
Could anyone please help me with this? I tried to generate a unique group using the LAG and LEAD functions, but it didn't work.
You can use a recursive CTE. Keep a row in the same group as the previous row if the previous EndTime equals its StartTime; otherwise start a new group. Then take the MIN() and MAX() per group.
with cte as
(
    select rn = row_number() over (partition by [ID], [Date] order by [StartTime]),
           *
    from tbl
),
rcte as
(
    -- anchor member
    select rn, [ID], [Date], [StartTime], [EndTime], grp = 1
    from cte
    where rn = 1
    union all
    -- recursive member
    select c.rn, c.[ID], c.[Date], c.[StartTime], c.[EndTime],
           grp = case when r.[EndTime] = c.[StartTime]
                      then r.grp
                      else r.grp + 1
                 end
    from rcte r
    inner join cte c on r.[ID] = c.[ID]
                    and r.[Date] = c.[Date]
                    and r.rn = c.rn - 1
)
select [ID], [Date],
       min([StartTime]) as StartTime,
       max([EndTime]) as EndTime
from rcte
group by [ID], [Date], grp
If you don't mind collapsing rows for the same ID and date even when they are not consecutive, you can just use GROUP BY:
SELECT
Date,
StartTime = MIN(StartTime),
EndTime = MAX(EndTime),
ID
FROM table
GROUP BY ID, Date
Otherwise you can use a solution based on ROW_NUMBER:
SELECT
Date,
StartTime,
EndTime,
ID
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY Date, ID ORDER BY StartTime)
FROM table
) t
WHERE rn = 1
This is an example of a gaps-and-islands problem -- actually a pretty simple example. The idea is to assign an "island" grouping to each row, marking rows that should be combined because each one starts exactly where the previous one ends. Then aggregate.
How do you assign the island? In this case, look at the previous endtime; if it is different from the starttime, then the row starts a new island. Voila! A cumulative sum of the start flag identifies each island.
As SQL:
select id, date, min(starttime), max(endtime)
from (select t.*,
             sum(case when prev_endtime = starttime then 0 else 1 end) over (partition by id, date order by starttime) as grp
      from (select t.*,
                   lag(endtime) over (partition by id, date order by starttime) as prev_endtime
            from t
           ) t
     ) t
group by id, date, grp;
Note: This assumes that the time periods never span multiple days. The code can be very easily modified to handle that . . . but with a caveat. The start and end times should be stored as datetime (or a related timestamp) rather than separating the date and times into different columns. Why? SQL Server doesn't support '24:00:00' as a valid time.
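The start-flag-plus-running-sum idea can also be traced step by step in a few lines of Python. This is only a sketch of the logic, not SQL Server code: the function name collapse_rows and the tuple layout (id, date, start, end) are hypothetical, and times are kept as zero-padded strings so that plain comparison orders them correctly.

```python
from itertools import accumulate, groupby


def collapse_rows(rows):
    """rows: list of (id, date, start, end); returns one row per island."""
    rows = sorted(rows)  # partition by id, date; order by start

    # Start flag: 0 when the row begins exactly where the previous row of the
    # same id and date ended, 1 when it opens a new island.
    flags = []
    for i, (rid, day, start, _end) in enumerate(rows):
        prev = rows[i - 1] if i else None
        continues = (
            prev is not None
            and prev[0] == rid
            and prev[1] == day
            and prev[3] == start
        )
        flags.append(0 if continues else 1)

    # The cumulative sum of the flags numbers the islands.
    island_ids = list(accumulate(flags))

    merged = []
    for _, pairs in groupby(zip(island_ids, rows), key=lambda p: p[0]):
        members = [row for _, row in pairs]
        rid, day = members[0][0], members[0][1]
        merged.append(
            (rid, day, min(m[2] for m in members), max(m[3] for m in members))
        )
    return merged


data = [
    (12345, "2017-12-1", "09:00:00", "11:00:00"),
    (12345, "2017-12-1", "11:00:00", "13:00:00"),
    (78465, "2018-09-08", "09:00:00", "10:00:00"),
    (78465, "2018-09-08", "10:00:00", "12:00:00"),
    (78465, "2018-09-08", "15:00:00", "16:00:00"),
    (78465, "2018-09-08", "16:00:00", "18:00:00"),
]
for row in collapse_rows(data):
    print(row)
```

On the question's six rows this produces the three merged rows the asker wants, with the 12:00 to 15:00 gap splitting ID 78465 into two islands.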

Transform records with duration to time-of-day table

I have created a rather complex transformation in Power Query, and for performance reasons I need to push it back to a SQL Server backend. However, I am having trouble implementing it; maybe you can give me some clues on how to approach this problem.
I have a source table of transactions with a duration, i.e. start and end timestamps, and these transactions can spread over multiple days. I would like to transform the table to a time-of-day scale to analyse how these transactions spread across the day from 0:00:00 to 23:59:59.
The distribution is linear using time percentage.
So if I have a source table sample like this:
Record_ID StartTime StopTime Measure
----------------------------------------------------------
1 2020.06.06 9:45 2020.06.06 18:31 682
2 2020.06.06 23:21 2020.06.07 10:51 543
3 2020.06.06 16:38 2020.06.08 9:49 20921
The result would look like this:
Record_ID StartTime StopTime Measure
--------------------------------------------------------------
1 2020.06.06 9:45 2020.06.06 18:31 682
2 2020.06.06 23:21 2020.06.06 23:59 30,5
2 2020.06.07 0:00 2020.06.07 10:51 512,5
3 2020.06.06 16:38 2020.06.06 23:59 3739,3
3 2020.06.07 0:00 2020.06.07 23:59 12189,2
3 2020.06.08 0:00 2020.06.08 9:49 4992,5
Some notes for the calculations:
For record 1 - no transformation is needed as this is not overlapping midnight
For record 2 - it overlaps one midnight, so two records are created, weighing based on minutes
5.61% * 543 = 30.5
94.39% * 543 = 512.5
For record 3 - it overlaps multiple midnights, so multiple records are created again based on minutes. If multiple days are covered, then I would need even more 0:00 - 23:59 type records to cover the whole duration
17.87% * 20921 = 3739.3
58.26% * 20921 = 12189.2
23.86% * 20921 = 4992.5
Is there a pattern reference I could use? Is this possible to do in SQL? Is it possible to do it without loops?
You can use a recursive CTE:
with cte as (
      select Record_ID, StartTime, endTime, Measure
      from t
      union all
      select record_id, convert(datetime, dateadd(day, 1, convert(date, StartTime))),
             endtime, measure
      from cte
      where datediff(day, starttime, endtime) > 0
     )
select cte.*,
       measure * (diff * 1.0 / sum(diff) over (partition by record_id)) as measure
from (select record_id, starttime,
             (case when datediff(day, starttime, endtime) = 0
                   then endtime
                   else dateadd(day, 1, convert(date, StartTime))
              end) as endtime,
             measure
      from cte
     ) cte cross apply
     (values (datediff(second, starttime, endtime))) v(diff);
Note that this registers the stoptime as the beginning of the following day, so there are no gaps. That makes the allocation of measure more accurate.
Another option is to create a date dimensions / calendar table.
Then you can do something like this:
WITH cte_dates
as
(
    SELECT m.*,
           CASE WHEN m.StartTime < ad.TheDate THEN ad.theDate ELSE m.StartTime END as newStartTime,
           CASE WHEN ad.theNextDay < m.StopTime THEN ad.theNextDay ELSE m.StopTime END as newStopTime
    FROM myTable m
    JOIN allDates ad
      ON ad.theDate between cast(m.StartTime as date) and cast(m.StopTime as date)
)
SELECT cd.Record_ID, 1.0 * datediff(MINUTE, cd.newStartTime, cd.newStopTime) / datediff(MINUTE, cd.StartTime, cd.StopTime) * measure as measure
FROM cte_dates cd
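For checking the allocation by hand, the split-at-midnight logic with a time-proportional share of the measure can be sketched in Python. The function name split_by_day is hypothetical. Like the recursive CTE answer, this sketch ends each piece at the start of the following day (not at 23:59), so the figures can differ slightly from the ones in the question.

```python
from datetime import datetime, timedelta


def split_by_day(start, stop, measure):
    """Return (piece_start, piece_stop, share_of_measure) per calendar day."""
    total_seconds = (stop - start).total_seconds()
    pieces = []
    cursor = start
    while cursor < stop:
        # End the piece at the next midnight or at the real stop time,
        # whichever comes first.
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), datetime.min.time()
        )
        piece_stop = min(stop, next_midnight)
        share = (piece_stop - cursor).total_seconds() / total_seconds
        pieces.append((cursor, piece_stop, measure * share))
        cursor = piece_stop
    return pieces


# Record 3 from the question: spans two midnights, so three pieces.
for piece in split_by_day(
    datetime(2020, 6, 6, 16, 38), datetime(2020, 6, 8, 9, 49), 20921
):
    print(piece)
```

Because every piece gets its share of the total duration, the shares of one record always add back up to the original measure, apart from floating-point rounding.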

How can I reference column values from previous rows in BigQuery SQL, in order to perform operations or calculations?

I have sorted my data by start time, and I want to create a new field that rolls up data that overlap start times from the previous rows start and end time.
More specifically, I want to write logic that, for a given record X, if the start time is somewhere between the start and end time of the previous row, I want to give record X the same value for the new field as that previous row. If the start time happens after the end time of the previous row, it would get a new value for the new field.
Is something like this possible in BigQuery SQL? Was thinking maybe lag or window function, but not quite sure. Below are examples of what the base table looks like and what I want for the final table.
Any insight appreciated!
Below is for BigQuery Standard SQL
#standardSQL
SELECT recordID, startTime, endTime,
COUNTIF(newRange) OVER(ORDER BY startTime) AS newRecordID
FROM (
SELECT *,
startTime >= MAX(endTime) OVER(ORDER BY startTime ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS newRange
FROM `project.dataset.table`
)
You can test and play with the above using sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 recordID, TIME '12:35:00' startTime, TIME '12:50:00' endTime UNION ALL
SELECT 2, '12:46:00', '12:59:00' UNION ALL
SELECT 3, '14:27:00', '16:05:00' UNION ALL
SELECT 4, '15:48:00', '16:35:00' UNION ALL
SELECT 5, '16:18:00', '17:04:00'
)
SELECT recordID, startTime, endTime,
COUNTIF(newRange) OVER(ORDER BY startTime) AS newRecordID
FROM (
SELECT *,
startTime >= MAX(endTime) OVER(ORDER BY startTime ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS newRange
FROM `project.dataset.table`
)
-- ORDER BY startTime
with result
Row recordID startTime endTime newRecordID
1 1 12:35:00 12:50:00 0
2 2 12:46:00 12:59:00 0
3 3 14:27:00 16:05:00 1
4 4 15:48:00 16:35:00 1
5 5 16:18:00 17:04:00 1
This is a gaps-and-islands problem. What you want to do is assign a group id to non-intersecting groups. You can calculate the non-intersections using window functions.
A record starts a new group if the cumulative maximum of the end time, ordered by start time and ending at the previous record, is not later than the current start time. The rest is just a cumulative sum to assign a group id.
For your data:
select t.*,
       sum(case when prev_endtime > starttime then 0 else 1 end) over (order by starttime) as group_id
from (select t.*,
             max(endtime) over (order by starttime rows between unbounded preceding and 1 preceding) as prev_endtime
      from t
     ) t;
The only potential issue is if two records start at exactly the same time. If this can happen, the logic might need to be slightly more complex.
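The running-maximum rule is easy to verify on the sample rows with a short Python sketch. The function name assign_groups is hypothetical, and group ids start at 0 to match the newRecordID column in the result shown earlier; this models the logic only and is not BigQuery code.

```python
from datetime import time


def assign_groups(records):
    """records: list of (record_id, start, end); returns {record_id: group_id}."""
    groups = {}
    group_id = 0
    max_end = None  # running maximum of end times over all earlier rows
    for record_id, start, end in sorted(records, key=lambda r: r[1]):
        # A row opens a new group when it starts at or after every earlier end.
        if max_end is not None and start >= max_end:
            group_id += 1
        groups[record_id] = group_id
        max_end = end if max_end is None else max(max_end, end)
    return groups


sample = [
    (1, time(12, 35), time(12, 50)),
    (2, time(12, 46), time(12, 59)),
    (3, time(14, 27), time(16, 5)),
    (4, time(15, 48), time(16, 35)),
    (5, time(16, 18), time(17, 4)),
]
print(assign_groups(sample))
```

Tracking the maximum end time, and not just the previous row's end time, matters when a long record fully contains a shorter one that follows it.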

TSQL adjustable time interval

I have a TSQL query that is returning a list of variable names and their values at a point in time. Currently it is truncating the datetime column to give me a minute-by-minute result set.
It would be incredibly useful to me to be able to specify whatever interval of data I want. Every x seconds, every x minutes, or every x hours.
I cannot GROUP BY because I do not want to aggregate the selected values.
Here is my current query:
SELECT time, var_name, value
FROM (
SELECT time, var_name, value, ROW_NUMBER() over (partition by var_id, convert(varchar(16), time, 121) order by time desc) as seqnum
FROM var_values vv
JOIN var_names vn ON vn.id = vv.tag_id
WHERE ( var_id = 1 OR var_id = 2)
AND time >= '2013-06-04 00:00:00' AND time < '2013-06-04 16:20:17'
) k
WHERE seqnum = 1
ORDER BY time;
And the result set:
2013-06-04 00:20:52.847 Random.Boolean 0
2013-06-04 00:20:52.850 Random.Int1 76
2013-06-04 00:21:52.893 Random.Boolean 1
2013-06-04 00:21:52.897 Random.Int1 46
2013-06-04 00:22:52.920 Random.Boolean 1
2013-06-04 00:22:52.927 Random.Int1 120
Also just to be complete, I want to retain the ability to modify the WHERE clause to choose which var_id's I want in my result set.
You should be able to partition by the Unix timestamp divided by your required interval in seconds:
(PARTITION BY var_id, DATEDIFF(SECOND,{d '1970-01-01'}, time) / 60 -- 60 seconds
ORDER BY TIME DESC) AS seqnum
Integer division gives the same result for every timestamp within the same 60-second window, so all rows in that interval land in the same partition. Change the divisor to get a different interval.
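The bucketing can be illustrated with a small Python sketch that keeps the latest row per variable and per interval, mirroring the ROW_NUMBER ... ORDER BY time DESC filter. The function name latest_per_bucket and the sample readings are hypothetical, and timestamps are assumed to be naive values on the same clock as the epoch constant.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def latest_per_bucket(rows, interval_seconds):
    """rows: list of (timestamp, var_id, value).

    Keeps the most recent row for each (var_id, bucket), where the bucket is
    the number of whole intervals elapsed since the epoch.
    """
    latest = {}
    for ts, var_id, value in rows:
        bucket = int((ts - EPOCH).total_seconds()) // interval_seconds
        key = (var_id, bucket)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, var_id, value)
    return sorted(latest.values())


readings = [
    (datetime(2013, 6, 4, 0, 20, 10), 1, 0),
    (datetime(2013, 6, 4, 0, 20, 52), 1, 1),
    (datetime(2013, 6, 4, 0, 21, 52), 1, 1),
]
print(latest_per_bucket(readings, 60))   # one row per minute
print(latest_per_bucket(readings, 120))  # one row per two minutes
```

With a 60-second interval the first two readings share a bucket and only the later one survives; with 120 seconds all three fall into one bucket.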