Grouping Data 3 Hours after the Initial Time - sql

I need to be able to filter down a dataset to only show the first instance every 3 hours. If an instance is found, any other instances that occur up to 3 hours afterwards should be hidden.
The closest thing I've been able to find is using date_trunc to get the first instance each hour, but I need to hide everything for exactly 3 hours after each first instance.
Example Data:
+------------------------+-------+
| Timestamp | Value |
+------------------------+-------+
| "2015-12-29 13:35:00" | 65 |
| "2015-12-29 13:40:00" | 26 |
| "2015-12-29 13:45:00" | 80 |
| "2015-12-29 13:50:00" | 10 |
| "2015-12-29 16:40:00" | 76 |
| "2015-12-29 16:45:00" | 73 |
| "2016-01-04 08:05:00" | 87 |
| "2016-01-04 08:10:00" | 90 |
| "2016-01-04 08:15:00" | 52 |
| "2016-01-04 08:20:00" | 90 |
| "2016-01-04 08:25:00" | 23 |
| "2016-01-04 08:30:00" | 96 |
| "2016-01-04 13:35:00" | 53 |
| "2016-01-04 13:40:00" | 15 |
| "2016-01-04 13:45:00" | 85 |
+------------------------+-------+
Expected Result:
+------------------------+-------+
| Timestamp | Value |
+------------------------+-------+
| "2015-12-29 13:35:00" | 65 |
| "2015-12-29 16:40:00" | 76 |
| "2016-01-04 08:05:00" | 87 |
| "2016-01-04 13:35:00" | 53 |
+------------------------+-------+
Anyone have any ideas? Thank you so much for your help.

This is tricky, because you need to keep track of the last picked record to identify the next one, so you can't just group by 3-hour intervals.
Here is one approach using a recursive cte:
with recursive cte(ts, value) as (
select ts, value
from mytable
where ts = (select min(ts) from mytable)
union all
select x.*
from (select ts from cte order by ts desc limit 1) c
cross join lateral (
select t.ts, t.value
from mytable t
where t.ts >= c.ts + interval '3' hour
order by t.ts
limit 1
) x
)
select * from cte order by ts
The idea is to start from the earliest record in the table, then iterate by picking the first available record that is at least 3 hours later (this assumes no duplicates in the timestamp column).
Note that timestamp is not a good choice for a column name, because it conflicts with a language keyword (it is a datatype). I renamed it to ts in the query.
Demo on DB Fiddle:
ts | value
:------------------ | ----:
2015-12-29 13:35:00 | 65
2015-12-29 16:40:00 | 76
2016-01-04 08:05:00 | 87
2016-01-04 13:35:00 | 53
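To make the "pick, then skip 3 hours" logic easier to follow, here is a minimal Python sketch of the same greedy walk over the question's sample data. It is only an illustration of the idea, not a replacement for the SQL; the function name first_per_window and the helper parse are made up for this example.

```python
from datetime import datetime, timedelta

def parse(text):
    # Hypothetical helper: parse the timestamp format used in the question.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

def first_per_window(rows, window=timedelta(hours=3)):
    """Keep a row only if it is at least `window` after the last kept row."""
    kept = []
    for ts, value in sorted(rows):
        if not kept or ts >= kept[-1][0] + window:
            kept.append((ts, value))
    return kept

rows = [(parse(t), v) for t, v in [
    ("2015-12-29 13:35:00", 65), ("2015-12-29 13:40:00", 26),
    ("2015-12-29 13:45:00", 80), ("2015-12-29 13:50:00", 10),
    ("2015-12-29 16:40:00", 76), ("2015-12-29 16:45:00", 73),
    ("2016-01-04 08:05:00", 87), ("2016-01-04 08:10:00", 90),
    ("2016-01-04 08:15:00", 52), ("2016-01-04 08:20:00", 90),
    ("2016-01-04 08:25:00", 23), ("2016-01-04 08:30:00", 96),
    ("2016-01-04 13:35:00", 53), ("2016-01-04 13:40:00", 15),
    ("2016-01-04 13:45:00", 85),
]]

result = first_per_window(rows)
for ts, value in result:
    print(ts, value)
```

Each kept row resets the 3-hour window, which is why fixed buckets (date_trunc and the like) cannot reproduce this result.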

Related

Is there a way to select records inserted in previous hour from the last recorded inserted record?

I have a table that looks like this that has quite a few records in it:
+---------+------+------------------------+
| unit | temp | login_time_utc |
+---------+------+------------------------+
| 1 | 53 | 2022-01-24 10:02:06 |
| 1 | 62 | 2022-01-24 10:10:01 |
| 2 | 34 | 2022-01-24 10:04:00 |
| 2 | 65 | 2022-01-24 16:08:59 |
| 2 | 65 | 2022-01-24 16:03:56 |
| 2 | 74 | 2022-01-24 16:06:53 |
| 3 | 74 | 2022-01-24 16:05:51 |
| 3 | 83 | 2022-01-24 17:09:49 |
| 3 | 73 | 2022-01-24 18:07:46 |
| 4 | 74 | 2022-01-24 18:11:43 |
+---------+------+------------------------+
I would like to select all the records for each unit that were inserted in the last hour from the most recently inserted record of that respective unit. Is that possible?
I can do this easily if it's just the last hour from now, but I don't know how to do this if it's the last hour before each unit's most recent insert.
I cannot use a loop or a cursor in this situation.
with cutoff as (
select unit, max(login_time_utc) as max_login
from T group by unit
)
select data.*
from cutoff cross apply (
select * from T t
where t.unit = cutoff.unit
and t.login_time_utc >= dateadd(hour, -1, cutoff.max_login)
) as data
You can use a window function and a CTE to identify the MAX date per unit. Then use DATEDIFF to find all the records in the last hour.
WITH cte AS (
SELECT *, MAX(login_time_utc) OVER (PARTITION BY unit) AS login_time_utc_max
FROM yourtable
)
SELECT unit, temp, login_time_utc
FROM cte
WHERE DATEDIFF(SS, login_time_utc, login_time_utc_max) <= 3600
ORDER BY login_time_utc
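For anyone who wants to check the logic outside the database, here is a rough Python sketch of the same two steps (latest time per unit, then a one-hour cutoff relative to it), run on the sample rows from the question. The variable names are invented for this example.

```python
from datetime import datetime, timedelta

def parse(text):
    # Hypothetical helper for the timestamp format in the question.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

readings = [(unit, temp, parse(ts)) for unit, temp, ts in [
    (1, 53, "2022-01-24 10:02:06"), (1, 62, "2022-01-24 10:10:01"),
    (2, 34, "2022-01-24 10:04:00"), (2, 65, "2022-01-24 16:08:59"),
    (2, 65, "2022-01-24 16:03:56"), (2, 74, "2022-01-24 16:06:53"),
    (3, 74, "2022-01-24 16:05:51"), (3, 83, "2022-01-24 17:09:49"),
    (3, 73, "2022-01-24 18:07:46"), (4, 74, "2022-01-24 18:11:43"),
]]

# Step 1: most recent login time per unit (the MAX ... PARTITION BY unit part).
latest = {}
for unit, _, ts in readings:
    latest[unit] = max(latest.get(unit, ts), ts)

# Step 2: keep rows no more than one hour older than their unit's latest row.
recent = [r for r in readings if r[2] >= latest[r[0]] - timedelta(hours=1)]

for row in recent:
    print(row)
```

On this data the early 10:04:00 row for unit 2 and the 16:05:51 row for unit 3 drop out, and the other eight rows remain.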

Select only record until timestamp from another table

I have three tables.
The first one is Device table
+----------+------+
| DeviceId | Type |
+----------+------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
+----------+------+
The second one is History table - data received by different devices.
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 1 | 31 | 15.08.2020 1:42:00 |
| 2 | 100 | 15.08.2020 1:42:01 |
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 1 | 34 | 15.08.2020 1:45:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
| 2 | 45 | 15.08.2020 1:47:00 |
+----------+-------------+--------------------+
The third one is DeviceStatusHistory table
+----------+---------+--------------------+
| DeviceId | State | TimeStamp |
+----------+---------+--------------------+
| 1 | 1(OK) | 15.08.2020 1:42:00 |
| 2 | 1(OK) | 15.08.2020 1:43:00 |
| 1 | 1(OK) | 15.08.2020 1:44:00 |
| 1 | 0(FAIL) | 15.08.2020 1:44:30 |
| 1 | 0(FAIL) | 15.08.2020 1:46:00 |
| 2 | 0(FAIL) | 15.08.2020 1:46:10 |
+----------+---------+--------------------+
I want to select the last temperature of each device, but take into account only those history records that occur before the device's first failure.
Since device1 starts failing from 15.08.2020 1:44:30, I don't want its records that go after that timestamp.
The same for the device2.
So as a final result, I want to have only data of all devices until they get first FAIL status:
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
+----------+-------------+--------------------+
I can select an appropriate history only if device failed at least once:
SELECT * FROM Device D
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE D.Id = H.DeviceId
and H.DeviceTimeStamp <
(select MIN(UpdatedOn) from DeviceStatusHistory Y where [State]=0 and DeviceId=D.Id)
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY D.Id;
The problem is, if a device never fails, I don't get its history at all.
Update:
My idea is to use something like this
SELECT * FROM DeviceHardwarePart HP
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE HP.Id = H.DeviceId
and H.DeviceTimeStamp <
(select ISNULL((select MIN(UpdatedOn) from DeviceMetadataPart where [State]=0 and DeviceId=HP.Id),
cast('12/31/9999 23:59:59.997' as datetime)))
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY HP.Id;
I'm not sure whether it is a good solution
You can use COALESCE: coalesce(min(UpdatedOn), cast('9999-12-31 23:59:59' as datetime)). This ensures you always have an upper bound for your select instead of NULL.
I will treat this as a two-part problem:
First, find the time at which each device first failed; if it has never failed, use a large placeholder value such as a timestamp in 2099.
Then join with the history table and take the latest value before the failure timestamp.
For the first part there are several possible approaches. Off the top of my head, something like the query below should work:
select device_id, coalesce(min(failed_timestamps), cast('01-01-2099 01:01:01' as timestamp)) as failed_at
from (select device_id, case when state = 0 then timestamp else null end as failed_timestamps from DeviceStatusHistory) as X
group by device_id
This gives us the earliest failure timestamp for each device, and an arbitrarily large value for devices that have never failed.
After this, I think the rest of the solution is straightforward.
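As a sanity check of the two-part idea, here is a small Python sketch run on the sample tables from the question. It is only an illustration under a few assumptions: state 0 means FAIL, FAR_FUTURE stands in for the "large value" placeholder, and the helper t and the dictionary names are invented here.

```python
from datetime import datetime

FAR_FUTURE = datetime(2099, 1, 1)  # assumed placeholder for "never failed"

def t(hour, minute, second=0):
    # Hypothetical helper: all sample rows fall on 15.08.2020.
    return datetime(2020, 8, 15, hour, minute, second)

history = [  # (device_id, temperature, timestamp)
    (1, 31, t(1, 42)), (2, 100, t(1, 42, 1)), (2, 40, t(1, 43)),
    (1, 32, t(1, 44)), (1, 34, t(1, 45)), (3, 20, t(1, 46)),
    (2, 45, t(1, 47)),
]
status = [  # (device_id, state, timestamp); state 0 is assumed to mean FAIL
    (1, 1, t(1, 42)), (2, 1, t(1, 43)), (1, 1, t(1, 44)),
    (1, 0, t(1, 44, 30)), (1, 0, t(1, 46)), (2, 0, t(1, 46, 10)),
]

# Part 1: earliest failure per device; devices without one get no entry here.
first_fail = {}
for device_id, state, ts in status:
    if state == 0:
        first_fail[device_id] = min(first_fail.get(device_id, FAR_FUTURE), ts)

# Part 2: latest reading strictly before the cutoff (FAR_FUTURE if never failed).
last_ok = {}
for device_id, temp, ts in history:
    if ts < first_fail.get(device_id, FAR_FUTURE):
        if device_id not in last_ok or ts > last_ok[device_id][1]:
            last_ok[device_id] = (temp, ts)

for device_id in sorted(last_ok):
    print(device_id, *last_ok[device_id])
```

Device 3 never appears in the status table at all, which is why the fallback is applied at lookup time rather than only inside the aggregation; in SQL that corresponds to a left join from the device table.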

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
sum(days) over (partition by ClientId) as total_days
from data
https://rextester.com/HCKOR53440
You need a subquery that computes the sum grouped by client_id, joined back to your table, e.g.:
select Stay_id, client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id
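If it helps to see the arithmetic outside SQL, here is a minimal Python sketch of the same calculation: clamp the start date to one year ago, default a missing end date to today, then sum per client. TODAY is fixed to an assumed reference date so the output is reproducible, which means the totals need not match the figures quoted in the question.

```python
from collections import defaultdict
from datetime import date

TODAY = date(2019, 6, 5)  # assumed reference date, stands in for CURRENT_TIMESTAMP
YEAR_AGO = TODAY.replace(year=TODAY.year - 1)  # would need care on Feb 29

stays = [  # (stay_id, client_id, start_date, end_date or None)
    (1, 38, date(2018, 1, 1), date(2019, 1, 31)),
    (2, 16, date(2019, 1, 3), date(2019, 1, 7)),
    (3, 27, date(2019, 1, 10), date(2019, 1, 12)),
    (4, 27, date(2019, 5, 15), None),
    (5, 38, date(2019, 5, 17), None),
]

def nights(start, end):
    """Days between start and end, counting only the past year."""
    start = max(start, YEAR_AGO)   # the CASE WHEN ... DATEADD(year, -1, ...) branch
    end = end or TODAY             # the ISNULL(StayDateEnd, CURRENT_TIMESTAMP) part
    return (end - start).days

totals = defaultdict(int)
for _, client_id, start, end in stays:
    totals[client_id] += nights(start, end)

for client_id, total in sorted(totals.items()):
    print(client_id, total)
```

The per-row value is computed first and only then aggregated per client, which is the same shape as the subquery with GROUP BY above.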

Query to get maximum value based on timestamp every 4 hours

I have a SQL table that stores data every 15 minutes, but I want to fetch the maximum value for every 4 hours.
This is my Actual table:
+----+----+----+-------------------------+
| Id | F1 | F2 | timestamp |
+----+----+----+-------------------------+
| 1 | 24 | 30 | 2019-03-25 12:15:00.000 |
| 2 | 22 | 3 | 2019-03-25 12:30:00.000 |
| 3 | 2 | 4 | 2019-03-25 12:45:00.000 |
| 4 | 5 | 35 | 2019-03-25 13:00:00.000 |
| 5 | 18 | 23 | 2019-03-25 13:15:00.000 |
| ' | ' | ' | ' |
| 16 | 21 | 34 | 2019-03-25 16:00:00.000 |
+----+----+----+-------------------------+
The Output I am looking for is:
+----+----+----+
| Id | F1 | F2 |
+----+----+----+
| 1 | 24 | 35 |1st 4 Hours
+----+----+----+
| 2 | 35 | 25 |Next 4 Hours
+----+----+----+
I did use the query
select max(F1) as F1,
max(F2) as F2
from table
where timestamp>='2019/3/26 12:00:01'
and timestamp<='2019/3/26 16:00:01'
and it returns the maximum for the first 4 hours, but when I widen the range from 4 hours to 8 hours it still gives me a single maximum rather than one per 4-hour block.
I did try with the group by clause but wasn't able to get the expected result.
This should work
SELECT Max(f1),
Max(f2), datepart(hh,timestamp), convert(date,timestamp)
FROM TABLE
WHERE datepart(hh,timestamp)%4 = 0
AND timestamp>='2019/3/26 12:00:01'
AND timestamp<='2019/3/26 16:00:01'
GROUP BY datepart(hh,timestamp), convert(date,timestamp)
ORDER BY convert(date,timestamp) asc
Here is a relatively simple method:
select convert(date, timestamp) as dte,
(datepart(hour, timestamp) / 4) * 4 as hour,
max(F1) as F1,
max(F2) as F2
from table
group by convert(date, timestamp), (datepart(hour, timestamp) / 4) * 4;
This puts the date and hour into separate columns; you can use dateadd() to put them in one column.
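The bucketing trick is integer division: (hour / 4) * 4 maps hours 12 to 15 onto 12, 16 to 19 onto 16, and so on. Here is a small Python sketch of the same grouping, using only the rows that are visible in the question (the elided rows are left out), with invented variable names.

```python
from datetime import datetime

rows = [  # (id, f1, f2, timestamp), only the rows shown in the question
    (1, 24, 30, datetime(2019, 3, 25, 12, 15)),
    (2, 22, 3, datetime(2019, 3, 25, 12, 30)),
    (3, 2, 4, datetime(2019, 3, 25, 12, 45)),
    (4, 5, 35, datetime(2019, 3, 25, 13, 0)),
    (5, 18, 23, datetime(2019, 3, 25, 13, 15)),
    (16, 21, 34, datetime(2019, 3, 25, 16, 0)),
]

buckets = {}
for _, f1, f2, ts in rows:
    # Same key as the GROUP BY: the date plus the hour rounded down to a multiple of 4.
    key = (ts.date(), (ts.hour // 4) * 4)
    best_f1, best_f2 = buckets.get(key, (f1, f2))
    buckets[key] = (max(best_f1, f1), max(best_f2, f2))

for (day, hour), (f1, f2) in sorted(buckets.items()):
    print(day, hour, f1, f2)
```

Note that these buckets are aligned to the clock (00, 04, 08, ...), so a row at exactly 16:00 starts a new bucket rather than closing the 12:00 one.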
Try this query:
declare @startingDatetime datetime = '2017-10-04 12:00:00';
select grp, max(F1) F1, max(F2) F2
from (
select datediff(hour, @startingDatetime, [timestamp]) / 4 grp, *
from MyTable
where [timestamp] > @startingDatetime
) a group by grp

Where clause changing the results of my datediff column, How can I work around this?

I'm trying to obtain the time elapsed while st1=5. Here is what I currently have, which gives me the datediff time for each state change. My issue is that when I add a where st1=5 clause, the datediff shows the difference in time between instances where the state = 5 instead of the time elapsed while the state is 5.
select timestamp,st1,st2,st3,st4,
datediff(second, timestamp, lead(timestamp)
over (order by timestamp)) as timediff
from A6K_status
Order By Timestamp DESC
+-----+-----+-----+-----+---------------------+----------+
| st1 | st2 | st3 | st4 | TimeStamp | TimeDiff |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 3 | 3 | 2018-07-23 07:51:06 | |
+-----+-----+-----+-----+---------------------+----------+
| 5 | 5 | 5 | 5 | 2018-07-23 07:50:00 | 66 |
+-----+-----+-----+-----+---------------------+----------+
| 0 | 0 | 10 | 10 | 2018-07-23 07:47:19 | 161 |
+-----+-----+-----+-----+---------------------+----------+
| 5 | 5 | 5 | 5 | 2018-07-23 07:39:07 | 492 |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 10 | 10 | 2018-07-23 07:37:48 | 79 |
+-----+-----+-----+-----+---------------------+----------+
| 3 | 3 | 10 | 10 | 2018-07-23 07:37:16 | 32 |
+-----+-----+-----+-----+---------------------+----------+
I am trying to sum the time that the state of station1 is 5. From the table above (what I have right now), if I could just sum timediff where st1=5, that would work perfectly. But adding "where st1=5" to my query gives me the time difference between instances where the state = 5.
Any help would be much appreciated. I feel very close to the result I would like to achieve. Thank you.
Edit
This is what I would like to achieve
+-----+------------+----------+
| st1 | TimeStamp | TimeDiff |
+-----+------------+----------+
| 5 | 2018-07-23 | 558 |
+-----+------------+----------+
You would use a subquery (or CTE):
select sum(timediff)
from (select timestamp, st1, st2, st3, st4,
datediff(second, timestamp, lead(timestamp) over (order by timestamp)) as timediff
from A6K_status
) s
where st1 = 5;
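The reason this works is ordering: the difference to the next row is computed over all rows first, and only afterwards are the st1 = 5 rows picked out. A short Python sketch of the same order of operations on the question's sample rows (only st1 and the timestamp are kept here):

```python
from datetime import datetime

def t(hour, minute, second):
    # Hypothetical helper: all sample rows fall on 2018-07-23.
    return datetime(2018, 7, 23, hour, minute, second)

rows = [  # (st1, timestamp)
    (3, t(7, 51, 6)), (5, t(7, 50, 0)), (0, t(7, 47, 19)),
    (5, t(7, 39, 7)), (3, t(7, 37, 48)), (3, t(7, 37, 16)),
]

rows.sort(key=lambda r: r[1])  # lead() is defined over ascending timestamp

total = 0
# Pair each row with its successor; the last row has no successor (NULL in SQL).
for (st1, ts), (_, next_ts) in zip(rows, rows[1:]):
    if st1 == 5:  # filter only after the difference is known
        total += int((next_ts - ts).total_seconds())

print(total)  # 558
```

Filtering before pairing would instead measure the gap between consecutive st1 = 5 rows, which is the behaviour described in the question.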
Assuming SQL Server, try something like this:
WITH SourceTable AS (
select TOP 100 PERCENT timestamp,st1,st2,st3,st4,
datediff(second, timestamp, lead(timestamp)
over (order by timestamp)) as timediff
from A6K_status
Order By Timestamp DESC
)
SELECT SUM(timediff) as totaltimediff
FROM SourceTable
WHERE st1 = 5