Set a flag based on the value of another flag in the past hour - sql

I have a table with the following design:
+------+-------------------------+-------------+
| Shop | Date | SafetyEvent |
+------+-------------------------+-------------+
| 1 | 2018-06-25 10:00:00.000 | 0 |
| 1 | 2018-06-25 10:30:00.000 | 1 |
| 1 | 2018-06-25 10:45:00.000 | 0 |
| 2 | 2018-06-25 11:00:00.000 | 0 |
| 2 | 2018-06-25 11:30:00.000 | 0 |
| 2 | 2018-06-25 11:45:00.000 | 0 |
| 3 | 2018-06-25 12:00:00.000 | 1 |
| 3 | 2018-06-25 12:30:00.000 | 0 |
| 3 | 2018-06-25 12:45:00.000 | 0 |
+------+-------------------------+-------------+
Basically at each shop, we track the date/time of a repair and flag if a safety event occurred. I want to add an additional column that tracks if a safety event has occurred in the last 8 hours at each shop. The end result will be like this:
+------+-------------------------+-------------+-------------------+
| Shop | Date | SafetyEvent | SafetyEvent8Hours |
+------+-------------------------+-------------+-------------------+
| 1 | 2018-06-25 10:00:00.000 | 0 | 0 |
| 1 | 2018-06-25 10:30:00.000 | 1 | 1 |
| 1 | 2018-06-25 10:45:00.000 | 0 | 1 |
| 2 | 2018-06-25 11:00:00.000 | 0 | 0 |
| 2 | 2018-06-25 11:30:00.000 | 0 | 0 |
| 2 | 2018-06-25 11:45:00.000 | 0 | 0 |
| 3 | 2018-06-25 12:00:00.000 | 1 | 1 |
| 3 | 2018-06-25 12:30:00.000 | 0 | 1 |
| 3 | 2018-06-25 12:45:00.000 | 0 | 1 |
+------+-------------------------+-------------+-------------------+
I was trying to use DATEDIFF but couldn't figure out how to have it occur for each row.

This isn't particularly efficient, but you can use apply or a correlated subquery:
select t.*, t8.SafetyEvent8Hours
from t outer apply
(select max(SafetyEvent) as SafetyEvent8Hours
from t t2
where t2.shop = t.shop and
t2.date <= t.date and
t2.date > dateadd(hour, -8, t.date)
) t8;
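The correlated-subquery form mentioned above can be sketched as follows. This is only an illustration under some assumptions: it runs against SQLite through Python's sqlite3 module rather than SQL Server, so dateadd(hour, -8, ...) is swapped for SQLite's datetime(..., '-8 hours'), and the table is loaded with the sample rows from the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (shop INTEGER, date TEXT, safetyevent INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2018-06-25 10:00:00", 0),
        (1, "2018-06-25 10:30:00", 1),
        (1, "2018-06-25 10:45:00", 0),
        (2, "2018-06-25 11:00:00", 0),
        (2, "2018-06-25 11:30:00", 0),
        (2, "2018-06-25 11:45:00", 0),
        (3, "2018-06-25 12:00:00", 1),
        (3, "2018-06-25 12:30:00", 0),
        (3, "2018-06-25 12:45:00", 0),
    ],
)

# For each row, take the max flag among rows of the same shop
# that fall in the 8 hours up to and including the row's own time.
query = """
SELECT t.shop, t.date, t.safetyevent,
       (SELECT MAX(t2.safetyevent)
        FROM t AS t2
        WHERE t2.shop = t.shop
          AND t2.date <= t.date
          AND t2.date > datetime(t.date, '-8 hours')
       ) AS safetyevent8hours
FROM t
ORDER BY t.shop, t.date
"""
flags = [row[3] for row in conn.execute(query)]
print(flags)
```

Because the current row always falls inside its own window, the subquery never returns NULL here.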
If you can rely on events being logged exactly every 15 minutes, then a more efficient method is to use window functions over the current row and the 31 rows before it (32 rows, just under 8 hours):
select t.*,
max(SafetyEvent) over (partition by shop order by date rows between 31 preceding and current row) as SafetyEvent8Hours
from t

Related

Redshift SQL - Count Sequences of Repeating Values Within Groups

I have a table that looks like this:
| id | date_start | gap_7_days |
| -- | ------------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 |
| 1 | 2021-06-13 00:00:00 | 0 |
| 1 | 2021-06-19 00:00:00 | 0 |
| 1 | 2021-06-27 00:00:00 | 0 |
| 2 | 2021-07-04 00:00:00 | 1 |
| 2 | 2021-07-11 00:00:00 | 1 |
| 2 | 2021-07-18 00:00:00 | 1 |
| 2 | 2021-07-25 00:00:00 | 1 |
| 2 | 2021-08-01 00:00:00 | 1 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-09 00:00:00 | 0 |
| 2 | 2021-08-16 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
| 2 | 2021-08-30 00:00:00 | 1 |
| 2 | 2021-08-31 00:00:00 | 0 |
| 2 | 2021-09-01 00:00:00 | 0 |
| 2 | 2021-08-08 00:00:00 | 1 |
| 2 | 2021-08-15 00:00:00 | 1 |
| 2 | 2021-08-22 00:00:00 | 1 |
| 2 | 2021-08-23 00:00:00 | 1 |
For each ID, I check whether consecutive date_start values are 7 days apart, and put a 1 or 0 in gap_7_days accordingly.
I want to do the following (using Redshift SQL only):
Get the length of each sequence of consecutive 1s in gap_7_days for each ID
Expected output:
| id | date_start | gap_7_days | sequence_length |
| -- | ------------------- | --------------- | --------------- |
| 1 | 2021-06-10 00:00:00 | 0 | |
| 1 | 2021-06-13 00:00:00 | 0 | |
| 1 | 2021-06-19 00:00:00 | 0 | |
| 1 | 2021-06-27 00:00:00 | 0 | |
| 2 | 2021-07-04 00:00:00 | 1 | 6 |
| 2 | 2021-07-11 00:00:00 | 1 | 6 |
| 2 | 2021-07-18 00:00:00 | 1 | 6 |
| 2 | 2021-07-25 00:00:00 | 1 | 6 |
| 2 | 2021-08-01 00:00:00 | 1 | 6 |
| 2 | 2021-08-08 00:00:00 | 1 | 6 |
| 2 | 2021-08-09 00:00:00 | 0 | |
| 2 | 2021-08-16 00:00:00 | 1 | 3 |
| 2 | 2021-08-23 00:00:00 | 1 | 3 |
| 2 | 2021-08-30 00:00:00 | 1 | 3 |
| 2 | 2021-08-31 00:00:00 | 0 | |
| 2 | 2021-09-01 00:00:00 | 0 | |
| 2 | 2021-08-08 00:00:00 | 1 | 4 |
| 2 | 2021-08-15 00:00:00 | 1 | 4 |
| 2 | 2021-08-22 00:00:00 | 1 | 4 |
| 2 | 2021-08-23 00:00:00 | 1 | 4 |
Get the number of sequences for each ID
Expected output:
| id | num_sequences |
| -- | ------------------- |
| 1 | 0 |
| 2 | 3 |
How can I achieve this?
If you want the number of sequences, just look at the previous value. When the current value is "1" and the previous is NULL or 0, then you have a new sequence.
So:
select id,
sum( (gap_7_days = 1 and coalesce(prev_gap_7_days, 0) = 0)::int ) as num_sequences
from (select t.*,
lag(gap_7_days) over (partition by id order by date_start) as prev_gap_7_days
from t
) t
group by id;
If you actually want the lengths of the sequences, as in the intermediate results, then ask a new question. That information is not needed for this question.
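A minimal sketch of the counting query, assuming SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The Redshift-style ::int cast is replaced by a CASE expression, and the rows are a small hypothetical data set rather than the table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date_start TEXT, gap_7_days INTEGER)")
# Hypothetical rows: id 1 has no sequences, id 2 has three.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2021-06-10", 0),
        (1, "2021-06-13", 0),
        (2, "2021-07-04", 1),
        (2, "2021-07-11", 1),
        (2, "2021-07-12", 0),
        (2, "2021-07-19", 1),
        (2, "2021-07-20", 0),
        (2, "2021-07-27", 1),
        (2, "2021-08-03", 1),
    ],
)

# A sequence starts where the flag is 1 and the previous flag is 0 or missing.
query = """
SELECT id,
       SUM(CASE WHEN gap_7_days = 1 AND COALESCE(prev_gap_7_days, 0) = 0
                THEN 1 ELSE 0 END) AS num_sequences
FROM (SELECT t.*,
             LAG(gap_7_days) OVER (PARTITION BY id ORDER BY date_start) AS prev_gap_7_days
      FROM t) AS s
GROUP BY id
ORDER BY id
"""
num_sequences = dict(conn.execute(query).fetchall())
print(num_sequences)
```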

Get the total time every time a truck has no speed in SQL?

I have the following table in SQL Server 2014:
Vehicle_Id | Speed | Event | Datetime
-----------+---------+--------------+----------------------
1 | 0 | Door-Open | 2019-05-04 15:00:00
1 | 0 | Door-Closed | 2019-05-04 15:15:00
1 | 50 | Driving | 2019-05-04 15:35:00
1 | 0 | Parked | 2019-05-04 15:50:00
1 | 0 | Door-Open | 2019-05-04 15:51:00
1 | 0 | Door-Closed | 2019-05-04 15:52:00
1 | 50 | Driving | 2019-05-04 15:57:00
I need to identify blocks of time during which the truck has stayed at speed = 0 for more than an hour. Every run of consecutive rows with speed 0 should get its own block_id, lasting until a row with speed > 0 appears. The total time for a block should run from the first row with speed 0 to the next row with speed > 0.
Expected Output:
Vehicle_Id | Speed | Event | Datetime | Block | Total_State_Time_Block(Minutes)
-----------+---------+--------------+------------------------+-------------+---------------------------------
1 | 0 | Door-Open | 2019-05-04 15:00:00 | 1 | 35 Minutes
1 | 0 | Door-Closed | 2019-05-04 15:15:00 | 1 | 35 Minutes
1 | 50 | Driving | 2019-05-04 15:35:00 | 2 | 15 Minutes
1 | 0 | Parked | 2019-05-04 15:50:00 | 3 | 7 Minutes
1 | 0 | Door-Open | 2019-05-04 15:51:00 | 3 | 7 Minutes
1 | 0 | Door-Closed | 2019-05-04 15:52:00 | 3 | 7 Minutes
1 | 50 | Driving | 2019-05-04 15:57:00 | 4 | ...
So, as it's ordered by datetime, the idea is to create groups of adjacent rows with speed = 0 so I can identify the times a truck hasn't moved for more than an hour.
I tried windowing functions to get the result by vehicle and day. But I can't achieve this last step.
You can try with lag()
select
vehicle_id,
speed,
event,
datetime,
sum(case when speed = rnk then 0 else 1 end) over (partition by vehicle_id order by datetime) as block
from
(
select
*,
lag(speed) over (partition by vehicle_id order by datetime) as rnk
from myTable
) val
output:
| vehicle_id | speed | event | datetime | block |
| ---------- | ----- | ----------- | ------------------------ | ----- |
| 1 | 0 | Door-Open | 2019-05-04 15:00:00 | 1 |
| 1 | 0 | Door-Closed | 2019-05-04 15:15:00 | 1 |
| 1 | 50 | Driving | 2019-05-04 15:35:00 | 2 |
| 1 | 0 | Parked | 2019-05-04 15:50:00 | 3 |
| 1 | 0 | Door-Open | 2019-05-04 15:51:00 | 3 |
| 1 | 0 | Door-Closed | 2019-05-04 15:52:00 | 3 |
| 1 | 50 | Driving | 2019-05-04 15:57:00 | 4 |
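As a rough sketch, the block numbers can be extended to the per-block minutes from the expected output by measuring from each block's first row to the next block's first row. The assumptions here: SQLite via Python's sqlite3 (3.25 or later for window functions) instead of SQL Server, strftime('%s', ...) in place of datediff, and a comparison of "stationary or not" rather than exact speeds, so that a change from 50 to 60 would not open a new block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (vehicle_id INTEGER, speed INTEGER, event TEXT, datetime TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 0, "Door-Open", "2019-05-04 15:00:00"),
        (1, 0, "Door-Closed", "2019-05-04 15:15:00"),
        (1, 50, "Driving", "2019-05-04 15:35:00"),
        (1, 0, "Parked", "2019-05-04 15:50:00"),
        (1, 0, "Door-Open", "2019-05-04 15:51:00"),
        (1, 0, "Door-Closed", "2019-05-04 15:52:00"),
        (1, 50, "Driving", "2019-05-04 15:57:00"),
    ],
)

query = """
WITH flagged AS (
    SELECT m.*,
           LAG(speed) OVER (PARTITION BY vehicle_id ORDER BY datetime) AS prev_speed
    FROM mytable AS m
),
numbered AS (
    -- Running count of state changes (stationary <-> moving) is the block id.
    SELECT f.*,
           SUM(CASE WHEN (speed = 0) = (prev_speed = 0) THEN 0 ELSE 1 END)
               OVER (PARTITION BY vehicle_id ORDER BY datetime) AS block
    FROM flagged AS f
),
starts AS (
    SELECT vehicle_id, block, MIN(datetime) AS block_start
    FROM numbered
    GROUP BY vehicle_id, block
),
spans AS (
    SELECT s.*,
           LEAD(block_start) OVER (PARTITION BY vehicle_id ORDER BY block) AS next_start
    FROM starts AS s
)
SELECT block,
       (CAST(strftime('%s', next_start) AS INTEGER)
        - CAST(strftime('%s', block_start) AS INTEGER)) / 60 AS minutes
FROM spans
ORDER BY vehicle_id, block
"""
block_minutes = conn.execute(query).fetchall()
print(block_minutes)
```

The last block has no following block, so its duration comes back as NULL (None in Python).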
If you just want periods where the truck has been at speed = 0 for an hour or more, you don't need your expected output. Instead, you can look at the next value with a speed and calculate the decimal hours.
That is, you can get the blocks directly. This gets the start of the block with the duration:
select t.*,
datediff(second, datetime, coalesce(next_speed, max_datetime)
) / (60.0 * 60) as decimal_hours
from (select t.*,
lag(speed) over (partition by vehicle_id order by datetime) as prev_speed,
min(case when speed > 0 then datetime end) over (partition by vehicle_id order by datetime desc) as next_speed,
max(datetime) over (partition by vehicle_id) as max_datetime
from t
) t
where (prev_speed is null or prev_speed > 0) and
speed = 0
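Here is a hedged sketch of the direct approach: keep only the first row of each stationary run and measure to the next row with speed > 0, found with a window ordered by datetime descending. It assumes SQLite via Python's sqlite3 (3.25 or later), reports whole minutes instead of decimal hours, and adds a hypothetical vehicle 2 so that one stop exceeds an hour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (vehicle_id INTEGER, speed INTEGER, event TEXT, datetime TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 0, "Door-Open", "2019-05-04 15:00:00"),
        (1, 0, "Door-Closed", "2019-05-04 15:15:00"),
        (1, 50, "Driving", "2019-05-04 15:35:00"),
        (1, 0, "Parked", "2019-05-04 15:50:00"),
        (1, 0, "Door-Open", "2019-05-04 15:51:00"),
        (1, 0, "Door-Closed", "2019-05-04 15:52:00"),
        (1, 50, "Driving", "2019-05-04 15:57:00"),
        # Hypothetical second vehicle with a 90-minute stop.
        (2, 0, "Parked", "2019-05-04 08:00:00"),
        (2, 0, "Door-Open", "2019-05-04 08:30:00"),
        (2, 40, "Driving", "2019-05-04 09:30:00"),
    ],
)

query = """
SELECT vehicle_id,
       datetime AS stopped_at,
       (CAST(strftime('%s', COALESCE(next_moving, max_datetime)) AS INTEGER)
        - CAST(strftime('%s', datetime) AS INTEGER)) / 60 AS stopped_minutes
FROM (SELECT t.*,
             LAG(speed) OVER (PARTITION BY vehicle_id ORDER BY datetime) AS prev_speed,
             -- Descending order makes the frame cover this row and all later rows.
             MIN(CASE WHEN speed > 0 THEN datetime END)
                 OVER (PARTITION BY vehicle_id ORDER BY datetime DESC) AS next_moving,
             MAX(datetime) OVER (PARTITION BY vehicle_id) AS max_datetime
      FROM t) AS s
WHERE (prev_speed IS NULL OR prev_speed > 0)
  AND speed = 0
ORDER BY vehicle_id, datetime
"""
stops = conn.execute(query).fetchall()
long_stops = [row for row in stops if row[2] >= 60]
print(stops)
print(long_stops)
```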

Get next result with specific ORDER BY satisfying the WHERE clause

Given a TripID, I need to grab the next result that satisfies certain criteria (TripSource <> 1 AND HasLot = 1). The catch is that "the next Trip" is defined by "ORDER BY TripDate, TripOrder", so TripID has nothing to do with the order.
(I'm using SQL Server 2008, so I can't use LEAD or LAG but I'm also interested in answers using them.)
Example datasource:
+--------+-------------------------+-----------+------------+--------+
| TripID | TripDate | TripOrder | TripSource | HasLot |
+--------+-------------------------+-----------+------------+--------+
1. | 37172 | 2019-08-01 00:00:00.000 | 0 | 1 | 0 |
2. | 37211 | 2019-08-01 00:00:00.000 | 1 | 1 | 0 |
3. | 37198 | 2019-08-01 00:00:00.000 | 2 | 2 | 1 |
4. | 37213 | 2019-08-01 00:00:00.000 | 3 | 1 | 0 |
5. | 37245 | 2019-08-02 00:00:00.000 | 0 | 1 | 0 |
6. | 37279 | 2019-08-02 00:00:00.000 | 1 | 1 | 0 |
7. | 37275 | 2019-08-02 00:00:00.000 | 2 | 1 | 0 |
8. | 37264 | 2019-08-02 00:00:00.000 | 3 | 2 | 0 |
9. | 37336 | 2019-08-03 00:00:00.000 | 0 | 1 | 1 |
10. | 37320 | 2019-08-05 00:00:00.000 | 0 | 1 | 0 |
11. | 37354 | 2019-08-05 00:00:00.000 | 1 | 1 | 0 |
12. | 37329 | 2019-08-05 00:00:00.000 | 2 | 1 | 0 |
13. | 37373 | 2019-08-06 00:00:00.000 | 0 | 1 | 0 |
14. | 37419 | 2019-08-06 00:00:00.000 | 1 | 1 | 0 |
15. | 37421 | 2019-08-06 00:00:00.000 | 2 | 1 | 0 |
16. | 37414 | 2019-08-06 00:00:00.000 | 3 | 1 | 1 |
17. | 37459 | 2019-08-07 00:00:00.000 | 0 | 2 | 1 |
18. | 37467 | 2019-08-07 00:00:00.000 | 1 | 1 | 0 |
19. | 37463 | 2019-08-07 00:00:00.000 | 2 | 1 | 0 |
20. | 37461 | 2019-08-07 00:00:00.000 | 3 | 0 | 0 |
+--------+-------------------------+-----------+------------+--------+
Results I need:
Given TripID 37211 (Row 2.) I need to get 37198 (Row 3.)
Given TripID 37198 (Row 3.) I need to get 37459 (Row 17.)
Given TripID 37459 (Row 17.) I need to get null
Given TripID 37463 (Row 19.) I need to get null
You can use a correlated subquery or outer apply:
select t.*, t2.tripid
from trips t outer apply
(select top (1) t2.*
from trips t2
where t2.tripsource <> 1 and t2.haslot = 1 and
(t2.tripdate > t.tripdate or
t2.tripdate = t.tripdate and t2.triporder > t.triporder
)
order by t2.tripdate asc, t2.triporder asc
) t2;
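The correlated-subquery variant can be checked against the sample data with a small sketch. It assumes SQLite through Python's sqlite3, which has no OUTER APPLY or TOP, so a scalar subquery with LIMIT 1 is used; the candidates are ordered ascending by (tripdate, triporder) so that the first match is the next trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (tripid INTEGER, tripdate TEXT, triporder INTEGER, "
    "tripsource INTEGER, haslot INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?)",
    [
        (37172, "2019-08-01", 0, 1, 0), (37211, "2019-08-01", 1, 1, 0),
        (37198, "2019-08-01", 2, 2, 1), (37213, "2019-08-01", 3, 1, 0),
        (37245, "2019-08-02", 0, 1, 0), (37279, "2019-08-02", 1, 1, 0),
        (37275, "2019-08-02", 2, 1, 0), (37264, "2019-08-02", 3, 2, 0),
        (37336, "2019-08-03", 0, 1, 1), (37320, "2019-08-05", 0, 1, 0),
        (37354, "2019-08-05", 1, 1, 0), (37329, "2019-08-05", 2, 1, 0),
        (37373, "2019-08-06", 0, 1, 0), (37419, "2019-08-06", 1, 1, 0),
        (37421, "2019-08-06", 2, 1, 0), (37414, "2019-08-06", 3, 1, 1),
        (37459, "2019-08-07", 0, 2, 1), (37467, "2019-08-07", 1, 1, 0),
        (37463, "2019-08-07", 2, 1, 0), (37461, "2019-08-07", 3, 0, 0),
    ],
)

query = """
SELECT t.tripid,
       (SELECT t2.tripid
        FROM trips AS t2
        WHERE t2.tripsource <> 1
          AND t2.haslot = 1
          AND (t2.tripdate > t.tripdate
               OR (t2.tripdate = t.tripdate AND t2.triporder > t.triporder))
        ORDER BY t2.tripdate, t2.triporder
        LIMIT 1
       ) AS next_tripid
FROM trips AS t
"""
next_trip = dict(conn.execute(query).fetchall())
for trip_id in (37211, 37198, 37459, 37463):
    print(trip_id, "->", next_trip[trip_id])
```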

SQL Matching many-to-many dates for ID field

Edit: Fixed Start Date for User 2
I have a list of user ids, each having many start dates and many end dates.
A start date can be recorded many times after the "actual" start date of an "event", same goes for the end date.
The result should be the first start date and the first end date for each user "event".
I hope that makes sense, see the example below.
Thanks!
Assuming the Following tables are given:
Start Table:
+--------+-------------+
| UserID | Start |
+--------+-------------+
| 1 | 2019-01-01 |
| 1 | 2019-01-02 |
| 1 | 2019-01-03 |
| 1 | 2019-04-01 |
| 1 | 2019-04-02 |
| 1 | 2019-04-03 |
| 2 | 2019-06-01 |
| 2 | 2019-06-02 |
| 2 | 2019-10-01 |
| 2 | 2019-10-02 |
+--------+-------------+
End Table:
+--------+------------+
| UserID | End |
+--------+------------+
| 1 | 2019-03-01 |
| 1 | 2019-03-02 |
| 1 | 2019-03-03 |
| 1 | 2019-05-01 |
| 1 | 2019-05-02 |
| 1 | 2019-05-03 |
| 2 | 2019-08-01 |
| 2 | 2019-08-02 |
| 2 | 2019-12-01 |
| 2 | 2019-12-02 |
+--------+------------+
Result:
+--------+------------+------------+
| UserID | Start | End |
+--------+------------+------------+
| 1 | 2019-01-01 | 2019-03-01 |
| 1 | 2019-04-01 | 2019-05-01 |
| 2 | 2019-06-01 | 2019-08-01 |
| 2 | 2019-10-01 | 2019-12-01 |
+--------+------------+------------+
Not sure I agree with your 2019-10-02
Here is one solution
Example
Select UserID
,[Start] = min([Start])
,[End]
From (
Select A.*
,[End] = (Select min([End]) From EndTable Where UserID=A.UserID and [End] >= A.Start )
From StartTable A
) A
Group By UserID,[End]
Returns
UserID Start End
1 2019-01-01 2019-03-01
1 2019-04-01 2019-05-01
2 2019-06-01 2019-08-01
2 2019-10-01 2019-12-01
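For anyone who wants to try this outside SQL Server, here is a minimal sketch of the same idea in SQLite via Python's sqlite3. The table and column names (start_table, end_table, user_id, start_date, end_date) are renamed for illustration, partly because END is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE start_table (user_id INTEGER, start_date TEXT)")
conn.execute("CREATE TABLE end_table (user_id INTEGER, end_date TEXT)")
conn.executemany(
    "INSERT INTO start_table VALUES (?, ?)",
    [
        (1, "2019-01-01"), (1, "2019-01-02"), (1, "2019-01-03"),
        (1, "2019-04-01"), (1, "2019-04-02"), (1, "2019-04-03"),
        (2, "2019-06-01"), (2, "2019-06-02"),
        (2, "2019-10-01"), (2, "2019-10-02"),
    ],
)
conn.executemany(
    "INSERT INTO end_table VALUES (?, ?)",
    [
        (1, "2019-03-01"), (1, "2019-03-02"), (1, "2019-03-03"),
        (1, "2019-05-01"), (1, "2019-05-02"), (1, "2019-05-03"),
        (2, "2019-08-01"), (2, "2019-08-02"),
        (2, "2019-12-01"), (2, "2019-12-02"),
    ],
)

# Step 1: attach to every start the earliest end on or after it.
# Step 2: starts sharing that end belong to one event; keep the earliest.
query = """
SELECT user_id, MIN(start_date) AS start_date, end_date
FROM (SELECT s.user_id,
             s.start_date,
             (SELECT MIN(e.end_date)
              FROM end_table AS e
              WHERE e.user_id = s.user_id
                AND e.end_date >= s.start_date) AS end_date
      FROM start_table AS s) AS matched
GROUP BY user_id, end_date
ORDER BY user_id, end_date
"""
events = conn.execute(query).fetchall()
print(events)
```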

Get last value with delta from previous row

I have data
| account | type | position | created_date |
|---------|------|----------|---------------------|
| 1 | 1 | 1 | 2016-08-01 00:00:00 |
| 2 | 1 | 2 | 2016-08-01 00:00:00 |
| 1 | 2 | 2 | 2016-08-01 00:00:00 |
| 2 | 2 | 1 | 2016-08-01 00:00:00 |
| 1 | 1 | 2 | 2016-08-02 00:00:00 |
| 2 | 1 | 1 | 2016-08-02 00:00:00 |
| 1 | 2 | 1 | 2016-08-03 00:00:00 |
| 2 | 2 | 2 | 2016-08-03 00:00:00 |
| 1 | 1 | 2 | 2016-08-04 00:00:00 |
| 2 | 1 | 1 | 2016-08-04 00:00:00 |
| 1 | 2 | 2 | 2016-08-07 00:00:00 |
| 2 | 2 | 1 | 2016-08-07 00:00:00 |
I need to get the last position for each (account, type) pair and its delta from the previous position. I'm trying to use window functions, but I only get all rows and can't group them or pick the last one.
SELECT
account,
type,
FIRST_VALUE(position) OVER w AS position,
FIRST_VALUE(position) OVER w - LEAD(position, 1, 0) OVER w AS delta,
created_date
FROM table
WINDOW w AS (PARTITION BY account ORDER BY created_date DESC)
I have result
| account | type | position | delta | created_date |
|---------|------|----------|-------|--------------|
| 1 | 1 | 1 | 1 | 2016-08-01 00:00:00 |
| 1 | 1 | 2 | 1 | 2016-08-02 00:00:00 |
| 1 | 1 | 2 | 0 | 2016-08-04 00:00:00 |
| 1 | 2 | 2 | 2 | 2016-08-01 00:00:00 |
| 1 | 2 | 1 | -1 | 2016-08-03 00:00:00 |
| 1 | 2 | 2 | 1 | 2016-08-07 00:00:00 |
| 2 | 1 | 2 | 2 | 2016-08-01 00:00:00 |
| 2 | 2 | 1 | 1 | 2016-08-01 00:00:00 |
| and so on |
but I need only the last record for each account/type pair
| account | type | position | delta | created_date |
|---------|------|----------|-------|--------------|
| 1 | 1 | 2 | 0 | 2016-08-04 00:00:00 |
| 1 | 2 | 2 | 1 | 2016-08-07 00:00:00 |
| 2 | 1 | 1 | 0 | 2016-08-04 00:00:00 |
| and so on |
Sorry for my bad language and Thanks for any help.
My "best" try..
WITH cte_delta AS (
SELECT
account,
type,
FIRST_VALUE(position) OVER w AS position,
FIRST_VALUE(position) OVER w - LEAD(position, 1, 0) OVER w AS delta,
created_date
FROM table
WINDOW w AS (PARTITION BY account ORDER BY created_date DESC)
),
cte_date AS (
SELECT
account,
type,
MAX(created_date) AS created_date
FROM cte_delta
GROUP BY account, type
)
SELECT cd.*
FROM
cte_delta cd,
cte_date ct
WHERE
cd.account = ct.account
AND cd.type = ct.type
AND cd.created_date = ct.created_date
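One possible approach, not taken from the thread, is to partition by both account and type, compute the delta with LAG in ascending date order, and keep the newest row per pair with ROW_NUMBER. The sketch below assumes SQLite through Python's sqlite3 (3.25 or later) and a hypothetical table name positions, since table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (account INTEGER, type INTEGER, "
    "position INTEGER, created_date TEXT)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2016-08-01 00:00:00"), (2, 1, 2, "2016-08-01 00:00:00"),
        (1, 2, 2, "2016-08-01 00:00:00"), (2, 2, 1, "2016-08-01 00:00:00"),
        (1, 1, 2, "2016-08-02 00:00:00"), (2, 1, 1, "2016-08-02 00:00:00"),
        (1, 2, 1, "2016-08-03 00:00:00"), (2, 2, 2, "2016-08-03 00:00:00"),
        (1, 1, 2, "2016-08-04 00:00:00"), (2, 1, 1, "2016-08-04 00:00:00"),
        (1, 2, 2, "2016-08-07 00:00:00"), (2, 2, 1, "2016-08-07 00:00:00"),
    ],
)

query = """
SELECT account, type, position, delta, created_date
FROM (SELECT account, type, position, created_date,
             -- Difference from the previous position of the same pair
             -- (0 is used as the default when there is no previous row).
             position - LAG(position, 1, 0)
                 OVER (PARTITION BY account, type ORDER BY created_date) AS delta,
             ROW_NUMBER()
                 OVER (PARTITION BY account, type ORDER BY created_date DESC) AS rn
      FROM positions) AS ranked
WHERE rn = 1
ORDER BY account, type
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

The window functions are evaluated before the outer filter, so the delta still sees the earlier rows even though only the newest row per pair is returned.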