Get the total time every time a truck has no speed in SQL?

I have the following table in SQL Server 2014:
Vehicle_Id | Speed | Event | Datetime
-----------+---------+--------------+----------------------
1 | 0 | Door-Open | 2019-05-04 15:00:00
1 | 0 | Door-Closed | 2019-05-04 15:15:00
1 | 50 | Driving | 2019-05-04 15:35:00
1 | 0 | Parked | 2019-05-04 15:50:00
1 | 0 | Door-Open | 2019-05-04 15:51:00
1 | 0 | Door-Closed | 2019-05-04 15:52:00
1 | 50 | Driving | 2019-05-04 15:57:00
I need to identify blocks of time in which the truck has been at speed = 0 for more than an hour. Every time a row appears with speed 0, it should start a new block_id that lasts until a row with speed > 0 appears. So the total time for a block runs from the first row with speed 0 until the next row with speed > 0.
Expected Output:
Vehicle_Id | Speed | Event | Datetime | Block | Total_State_Time_Block(Minutes)
-----------+---------+--------------+------------------------+-------------+---------------------------------
1 | 0 | Door-Open | 2019-05-04 15:00:00 | 1 | 35 Minutes
1 | 0 | Door-Closed | 2019-05-04 15:15:00 | 1 | 35 Minutes
1 | 50 | Driving | 2019-05-04 15:35:00 | 2 | 15 Minutes
1 | 0 | Parked | 2019-05-04 15:50:00 | 3 | 7 Minutes
1 | 0 | Door-Open | 2019-05-04 15:51:00 | 3 | 7 Minutes
1 | 0 | Door-Closed | 2019-05-04 15:52:00 | 3 | 7 Minutes
1 | 50 | Driving | 2019-05-04 15:57:00 | 4 | ...
So, as it's ordered by datetime, the idea is to create groups of adjacent rows with speed = 0 so I can identify the times a truck hasn't moved for more than an hour.
I tried window functions to get the result by vehicle and day, but I can't achieve this last step.

You can try with lag():
select
    vehicle_id,
    speed,
    event,
    datetime,
    sum(case when speed = prev_speed then 0 else 1 end)
        over (partition by vehicle_id order by datetime) as block
from
(
    select
        *,
        lag(speed) over (partition by vehicle_id order by datetime) as prev_speed
    from myTable
) val
output:
| vehicle_id | speed | event | datetime | block |
| ---------- | ----- | ----------- | ------------------------ | ----- |
| 1 | 0 | Door-Open | 2019-05-04 15:00:00 | 1 |
| 1 | 0 | Door-Closed | 2019-05-04 15:15:00 | 1 |
| 1 | 50 | Driving | 2019-05-04 15:35:00 | 2 |
| 1 | 0 | Parked | 2019-05-04 15:50:00 | 3 |
| 1 | 0 | Door-Open | 2019-05-04 15:51:00 | 3 |
| 1 | 0 | Door-Closed | 2019-05-04 15:52:00 | 3 |
| 1 | 50 | Driving | 2019-05-04 15:57:00 | 4 |

If you just want periods where the truck has been at speed = 0 for an hour or more, you don't need your expected output. Instead, you can look ahead to the next row with a positive speed and calculate the duration in decimal hours.
That is, you can get the blocks directly. This gets the start of each block along with its duration:
select t.*,
       datediff(second, datetime,
                coalesce(next_speed, max_datetime)
               ) / (60.0 * 60) as decimal_hours
from (select t.*,
             lag(speed) over (partition by vehicle_id order by datetime) as prev_speed,
             min(case when speed > 0 then datetime end) over (partition by vehicle_id order by datetime desc) as next_speed,
             max(datetime) over (partition by vehicle_id) as max_datetime
      from t
     ) t
where (prev_speed is null or prev_speed > 0) and
      speed = 0
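Both answers rest on the same gaps-and-islands idea: start a new block whenever the speed differs from the previous row's speed. If you want to sanity-check the block numbering outside SQL Server, here's a sketch using Python's sqlite3 (needs SQLite >= 3.25 for window functions; the Datetime column is renamed dt, and the sample rows are taken from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE myTable (vehicle_id INT, speed INT, event TEXT, dt TEXT)")
con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", [
    (1, 0,  "Door-Open",   "2019-05-04 15:00:00"),
    (1, 0,  "Door-Closed", "2019-05-04 15:15:00"),
    (1, 50, "Driving",     "2019-05-04 15:35:00"),
    (1, 0,  "Parked",      "2019-05-04 15:50:00"),
    (1, 0,  "Door-Open",   "2019-05-04 15:51:00"),
    (1, 0,  "Door-Closed", "2019-05-04 15:52:00"),
    (1, 50, "Driving",     "2019-05-04 15:57:00"),
])

# A block starts whenever speed differs from the previous row's speed
# (the NULL comparison on the first row also counts as a change).
blocks = con.execute("""
    SELECT vehicle_id, speed, event, dt,
           SUM(CASE WHEN speed = prev_speed THEN 0 ELSE 1 END)
               OVER (PARTITION BY vehicle_id ORDER BY dt) AS block
    FROM (SELECT *, LAG(speed) OVER (PARTITION BY vehicle_id ORDER BY dt) AS prev_speed
          FROM myTable)
    ORDER BY vehicle_id, dt
""").fetchall()
print([r[4] for r in blocks])  # block ids: [1, 1, 2, 3, 3, 3, 4]
```

Once each row carries its block id, a plain GROUP BY over (vehicle_id, block) yields the per-block durations.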

Related

Find MIN date associated with FIRST non-0 value

I am trying to generate a list of manager start dates, which can be determined by the minimum AS_OF date (the table partition).
I'm not sure how to accomplish this in a non-processing-heavy manner. I believe there are window functions better suited to accomplish this.
I do have the below, which works but is terribly slow.
SELECT
    Employee_ID,
    MIN(As_Of) AS manager_start_date
FROM table
WHERE Direct_Reports > 0
GROUP BY Employee_ID
Sample table below with desired output at bottom.
+-------------+----------------+----------+
| Employee_ID | Direct_Reports | As_Of |
+-------------+----------------+----------+
| 1 | 0 | 1/1/2019 |
+-------------+----------------+----------+
| 1 | 0 | 1/2/2019 |
+-------------+----------------+----------+
| 1 | 0 | 1/3/2019 |
+-------------+----------------+----------+
| 1 | 1 | 1/4/2019 | '<--- First non 0 value for Employee 1'
+-------------+----------------+----------+
| 2 | 0 | 1/1/2019 |
+-------------+----------------+----------+
| 2 | 0 | 1/2/2019 |
+-------------+----------------+----------+
| 2 | 5 | 1/3/2019 | '<--- First non 0 value for Employee 2'
+-------------+----------------+----------+
| 3 | 0 | 1/1/2019 |
+-------------+----------------+----------+
| 3 | 0 | 1/2/2019 |
+-------------+----------------+----------+
| 3 | 5 | 1/3/2019 | '<--- First non 0 value for Employee 3'
+-------------+----------------+----------+
| 3 | 10 | 1/4/2019 |
+-------------+----------------+----------+
| 3 | 7 | 1/5/2019 |
+-------------+----------------+----------+
+-------------+--------------------+
| Employee_ID | Manager_Start_Date |
+-------------+--------------------+
| 1 | 1/4/2019 |
+-------------+--------------------+
| 2 | 1/3/2019 |
+-------------+--------------------+
| 3 | 1/3/2019 |
+-------------+--------------------+
Try this:
select Employee_ID,
       min(case when Direct_Reports > 0 then As_Of end) as manager_start_date
from dbo.manager
group by Employee_ID
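The conditional aggregate reads the table once and also keeps employees who never had a direct report (they get a NULL start date). A quick sqlite3 sketch with the sample data (dates switched to ISO format so MIN compares them correctly as strings):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE manager (employee_id INT, direct_reports INT, as_of TEXT)")
con.executemany("INSERT INTO manager VALUES (?, ?, ?)", [
    (1, 0, "2019-01-01"), (1, 0, "2019-01-02"), (1, 0, "2019-01-03"), (1, 1, "2019-01-04"),
    (2, 0, "2019-01-01"), (2, 0, "2019-01-02"), (2, 5, "2019-01-03"),
    (3, 0, "2019-01-01"), (3, 0, "2019-01-02"), (3, 5, "2019-01-03"),
    (3, 10, "2019-01-04"), (3, 7, "2019-01-05"),
])

# MIN over a CASE skips the zero-report rows, so no WHERE clause is needed
# and the whole table is scanned a single time.
result = con.execute("""
    SELECT employee_id,
           MIN(CASE WHEN direct_reports > 0 THEN as_of END) AS manager_start_date
    FROM manager
    GROUP BY employee_id
    ORDER BY employee_id
""").fetchall()
print(result)  # [(1, '2019-01-04'), (2, '2019-01-03'), (3, '2019-01-03')]
```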

SQL - Identify consecutive numbers in a table

Is there a way to flag consecutive numbers in an SQL table?
Based on the values in the 'value_group_4' column, is it possible to tag continuous values? This needs to be done within groups of each 'date_group_1'.
I tried row_number, rank, and dense_rank, but I was unable to come up with a foolproof way.
This has nothing to do with consecutiveness. You simply want to mark all rows where date_group_1 and value_group_4 are not unique.
One way:
select
mytable.*,
case when exists
(
select null
from mytable agg
where agg.date_group_1 = mytable.date_group_1
and agg.value_group_4 = mytable.value_group_4
group by agg.date_group_1, agg.value_group_4
having count(*) > 1
) then 1 else 0 end as flag
from mytable
order by date_group_1, value_group_4;
In a later version of SQL Server you'd use COUNT OVER instead.
SQL tables represent unordered sets. There is no such thing as consecutive values, unless a column specifies the ordering. Your data does not have such an obvious column, but I'll assume one exists and just call it id for convenience.
With such a column, lag()/lead() does what you want:
select t.*,
       (case when lag(value_group_4) over (partition by date_group_1 order by id) = value_group_4
             then 1
             when lead(value_group_4) over (partition by date_group_1 order by id) = value_group_4
             then 1
             else 0
        end) as flag
from t;
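Assuming the id column above fixes the row order, the lag()/lead() comparison is easy to check in plain Python: flag a row when either neighbour in the same date_group_1 partition carries the same value_group_4.

```python
def flag_adjacent_duplicates(values):
    """Flag each position whose previous or next neighbour holds the same
    value -- the lag()/lead() comparison from the query, for one partition."""
    flags = []
    for i, v in enumerate(values):
        prev_same = i > 0 and values[i - 1] == v
        next_same = i < len(values) - 1 and values[i + 1] == v
        flags.append(1 if (prev_same or next_same) else 0)
    return flags

# One date_group_1 partition from the sample output, ordered by the assumed id:
print(flag_adjacent_duplicates([15.3, 17.3, 17.3, 21]))  # [0, 1, 1, 0]
```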
On close inspection, value_group_3 may do what you want, so you can use that for the id.
If your version of SQL Server doesn't have a full suite of window functions, it should still be possible. This problem looks like a last-non-null problem, for which Itzik Ben-Gan has a good example here: http://www.itprotoday.com/software-development/last-non-null-puzzle
Also, look at Mikael Eriksson's answer here, which uses no window functions.
If the order of your data is determined by the date_group_1, value_group_3 column values, then why not make it as simple as the following query:
select
*,
rank() over(partition by date_group_1 order by value_group_3) - 1 value_group_3,
case
when count(*) over(partition by date_group_1, value_group_3) > 1 then 1
else 0
end expected_result
from data;
Output:
| date_group_1 | category_group_2 | value_group_3 | value_group_3 | expected_result |
+--------------+------------------+---------------+---------------+-----------------+
| 2018-01-11 | A | 15.3 | 0 | 0 |
| 2018-01-11 | B | 17.3 | 1 | 1 |
| 2018-01-11 | A | 17.3 | 1 | 1 |
| 2018-01-11 | B | 21 | 3 | 0 |
| 2018-01-22 | A | 15.3 | 0 | 0 |
| 2018-01-22 | B | 17.3 | 1 | 0 |
| 2018-01-22 | A | 21 | 2 | 0 |
| 2018-01-22 | B | 23 | 3 | 0 |
| 2018-03-13 | A | 15.3 | 0 | 0 |
| 2018-03-13 | B | 17.3 | 1 | 1 |
| 2018-03-13 | A | 17.3 | 1 | 1 |
| 2018-03-13 | B | 23 | 3 | 0 |
| 2018-05-15 | A | 6 | 0 | 0 |
| 2018-05-15 | B | 6.3 | 1 | 0 |
| 2018-05-15 | A | 15 | 2 | 0 |
| 2018-05-15 | B | 16.3 | 3 | 1 |
| 2018-05-15 | A | 16.3 | 3 | 1 |
| 2018-05-15 | B | 22 | 5 | 0 |
| 2019-05-04 | A | 0 | 0 | 0 |
| 2019-05-04 | B | 7 | 1 | 0 |
| 2019-05-04 | A | 15.3 | 2 | 0 |
| 2019-05-04 | B | 17.3 | 3 | 0 |
Test it online with SQL Fiddle.

Set a flag based on the value of another flag in the past hour

I have a table with the following design:
+------+-------------------------+-------------+
| Shop | Date | SafetyEvent |
+------+-------------------------+-------------+
| 1 | 2018-06-25 10:00:00.000 | 0 |
| 1 | 2018-06-25 10:30:00.000 | 1 |
| 1 | 2018-06-25 10:45:00.000 | 0 |
| 2 | 2018-06-25 11:00:00.000 | 0 |
| 2 | 2018-06-25 11:30:00.000 | 0 |
| 2 | 2018-06-25 11:45:00.000 | 0 |
| 3 | 2018-06-25 12:00:00.000 | 1 |
| 3 | 2018-06-25 12:30:00.000 | 0 |
| 3 | 2018-06-25 12:45:00.000 | 0 |
+------+-------------------------+-------------+
Basically at each shop, we track the date/time of a repair and flag if a safety event occurred. I want to add an additional column that tracks if a safety event has occurred in the last 8 hours at each shop. The end result will be like this:
+------+-------------------------+-------------+-------------------+
| Shop | Date | SafetyEvent | SafetyEvent8Hours |
+------+-------------------------+-------------+-------------------+
| 1 | 2018-06-25 10:00:00.000 | 0 | 0 |
| 1 | 2018-06-25 10:30:00.000 | 1 | 1 |
| 1 | 2018-06-25 10:45:00.000 | 0 | 1 |
| 2 | 2018-06-25 11:00:00.000 | 0 | 0 |
| 2 | 2018-06-25 11:30:00.000 | 0 | 0 |
| 2 | 2018-06-25 11:45:00.000 | 0 | 0 |
| 3 | 2018-06-25 12:00:00.000 | 1 | 1 |
| 3 | 2018-06-25 12:30:00.000 | 0 | 1 |
| 3 | 2018-06-25 12:45:00.000 | 0 | 1 |
+------+-------------------------+-------------+-------------------+
I was trying to use DATEDIFF but couldn't figure out how to have it occur for each row.
This isn't particularly efficient, but you can use cross apply or a correlated subquery:
select t.*, t8.SafetyEvent8Hours
from t cross apply
     (select max(SafetyEvent) as SafetyEvent8Hours
      from t t2
      where t2.shop = t.shop and
            t2.date <= t.date and
            t2.date > dateadd(hour, -8, t.date)
     ) t8;
If you can rely on events being logged every 15 minutes, then a more efficient method is to use window functions (31 preceding rows plus the current row cover 32 × 15 minutes = 8 hours):
select t.*,
max(SafetyEvent) over (partition by shop order by date rows between 31 preceding and current row) as SafetyEvent8Hours
from t
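The correlated-subquery version ports almost verbatim to other engines. Here's a sketch in Python's sqlite3, using sqlite's datetime(..., '-8 hours') modifier in place of DATEADD, with the column renamed dt and the sample rows from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE repairs (shop INT, dt TEXT, safety_event INT)")
con.executemany("INSERT INTO repairs VALUES (?, ?, ?)", [
    (1, "2018-06-25 10:00:00", 0), (1, "2018-06-25 10:30:00", 1), (1, "2018-06-25 10:45:00", 0),
    (2, "2018-06-25 11:00:00", 0), (2, "2018-06-25 11:30:00", 0), (2, "2018-06-25 11:45:00", 0),
    (3, "2018-06-25 12:00:00", 1), (3, "2018-06-25 12:30:00", 0), (3, "2018-06-25 12:45:00", 0),
])

# For each repair, look back 8 hours within the same shop and take the MAX flag.
rows = con.execute("""
    SELECT r.shop, r.dt, r.safety_event,
           (SELECT MAX(r2.safety_event)
            FROM repairs r2
            WHERE r2.shop = r.shop
              AND r2.dt <= r.dt
              AND r2.dt > datetime(r.dt, '-8 hours')) AS safety_event_8h
    FROM repairs r
    ORDER BY r.shop, r.dt
""").fetchall()
print([r[3] for r in rows])  # [0, 1, 1, 0, 0, 0, 1, 1, 1]
```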

Aggregating tsrange values into day buckets with a tie-breaker

So I've got a schema that lets people donate $ to a set of organizations, with each donation tied to an arbitrary period of time. I'm working on a report that looks at each day and, for each organization, shows the total number of donations and the total cumulative value of those donations for that day.
For example, here's a mockup of 3 donors, Alpha (orange), Bravo (green), and Charlie (Blue) donating to 2 different organizations (Foo and Bar) over various time periods:
I've created a SQLFiddle that implements the above example in a schema that somewhat reflects what I'm working with in reality: http://sqlfiddle.com/#!17/88969/1
(The schema is broken out into more tables than what you'd come up with given the problem statement to better reflect the real-life version I'm working with)
So far, the query that I've managed to put together looks like this:
WITH report_dates AS (
SELECT '2018-01-01'::date + g AS date
FROM generate_series(0, 14) g
), organizations AS (
SELECT id AS organization_id FROM users
WHERE type = 'Organization'
)
SELECT * FROM report_dates rd
CROSS JOIN organizations o
LEFT JOIN LATERAL (
SELECT
COALESCE(sum(doa.amount_cents), 0) AS total_donations_cents,
COALESCE(count(doa.*), 0) AS total_donors
FROM users
LEFT JOIN donor_organization_amounts doa ON doa.organization_id = users.id
LEFT JOIN donor_amounts da ON da.id = doa.donor_amounts_id
LEFT JOIN donor_schedules ds ON ds.donor_amounts_id = da.id
WHERE (users.id = o.organization_id) AND (ds.period && tsrange(rd.date::timestamp, rd.date::timestamp + INTERVAL '1 day', '[)'))
) o2 ON true;
With the results looking like this:
| date | organization_id | total_donations_cents | total_donors |
|------------|-----------------|-----------------------|--------------|
| 2018-01-01 | 1 | 0 | 0 |
| 2018-01-02 | 1 | 250 | 1 |
| 2018-01-03 | 1 | 250 | 1 |
| 2018-01-04 | 1 | 1750 | 3 |
| 2018-01-05 | 1 | 1750 | 3 |
| 2018-01-06 | 1 | 1750 | 3 |
| 2018-01-07 | 1 | 750 | 2 |
| 2018-01-08 | 1 | 850 | 2 |
| 2018-01-09 | 1 | 850 | 2 |
| 2018-01-10 | 1 | 500 | 1 |
| 2018-01-11 | 1 | 500 | 1 |
| 2018-01-12 | 1 | 500 | 1 |
| 2018-01-13 | 1 | 1500 | 2 |
| 2018-01-14 | 1 | 1000 | 1 |
| 2018-01-15 | 1 | 0 | 0 |
| 2018-01-01 | 2 | 0 | 0 |
| 2018-01-02 | 2 | 250 | 1 |
| 2018-01-03 | 2 | 250 | 1 |
| 2018-01-04 | 2 | 1750 | 2 |
| 2018-01-05 | 2 | 1750 | 2 |
| 2018-01-06 | 2 | 1750 | 2 |
| 2018-01-07 | 2 | 1750 | 2 |
| 2018-01-08 | 2 | 2000 | 2 |
| 2018-01-09 | 2 | 2000 | 2 |
| 2018-01-10 | 2 | 1500 | 1 |
| 2018-01-11 | 2 | 1500 | 1 |
| 2018-01-12 | 2 | 0 | 0 |
| 2018-01-13 | 2 | 1000 | 2 |
| 2018-01-14 | 2 | 500 | 1 |
| 2018-01-15 | 2 | 0 | 0 |
That's pretty close. The problem with this query, however, is that on days where one of a donor's donations ends and a new one begins, it should only count that donor once, using the higher-amount donation as the tie-breaker for the cumulative $ count. An example of that is 2018-01-13 for organization Foo: total_donors should be 1 and total_donations_cents 1000.
I tried to implement a tie-breaker for using DISTINCT ON but I got off into the weeds... any help would be appreciated!
Also, should I be worried about the performance implications of my implementation so far, given the CTEs and the CROSS JOIN?
Figured it out using DISTINCT ON: http://sqlfiddle.com/#!17/88969/4
WITH report_dates AS (
SELECT '2018-01-01'::date + g AS date
FROM generate_series(0, 14) g
), organizations AS (
SELECT id AS organization_id FROM users
WHERE type = 'Organization'
), donors_by_date AS (
SELECT * FROM report_dates rd
CROSS JOIN organizations o
LEFT JOIN LATERAL (
SELECT DISTINCT ON (date, da.donor_id)
da.donor_id,
doa.id,
doa.donor_amounts_id,
doa.amount_cents
FROM users
LEFT JOIN donor_organization_amounts doa ON doa.organization_id = users.id
LEFT JOIN donor_amounts da ON da.id = doa.donor_amounts_id
LEFT JOIN donor_schedules ds ON ds.donor_amounts_id = da.id
WHERE (users.id = o.organization_id) AND (ds.period && tsrange(rd.date::timestamp, rd.date::timestamp + INTERVAL '1 day', '[)'))
ORDER BY date, da.donor_id, doa.amount_cents DESC
) foo ON true
)
SELECT
date,
organization_id,
COALESCE(SUM(amount_cents), 0) AS total_donations_cents,
COUNT(*) FILTER (WHERE donor_id IS NOT NULL) AS total_donors
FROM donors_by_date
GROUP BY date, organization_id
ORDER BY organization_id, date;
Result:
| date | organization_id | total_donations_cents | total_donors |
|------------|-----------------|-----------------------|--------------|
| 2018-01-01 | 1 | 0 | 0 |
| 2018-01-02 | 1 | 250 | 1 |
| 2018-01-03 | 1 | 250 | 1 |
| 2018-01-04 | 1 | 1750 | 3 |
| 2018-01-05 | 1 | 1750 | 3 |
| 2018-01-06 | 1 | 1750 | 3 |
| 2018-01-07 | 1 | 750 | 2 |
| 2018-01-08 | 1 | 850 | 2 |
| 2018-01-09 | 1 | 850 | 2 |
| 2018-01-10 | 1 | 500 | 1 |
| 2018-01-11 | 1 | 500 | 1 |
| 2018-01-12 | 1 | 500 | 1 |
| 2018-01-13 | 1 | 1000 | 1 |
| 2018-01-14 | 1 | 1000 | 1 |
| 2018-01-15 | 1 | 0 | 0 |
| 2018-01-01 | 2 | 0 | 0 |
| 2018-01-02 | 2 | 250 | 1 |
| 2018-01-03 | 2 | 250 | 1 |
| 2018-01-04 | 2 | 1750 | 2 |
| 2018-01-05 | 2 | 1750 | 2 |
| 2018-01-06 | 2 | 1750 | 2 |
| 2018-01-07 | 2 | 1750 | 2 |
| 2018-01-08 | 2 | 2000 | 2 |
| 2018-01-09 | 2 | 2000 | 2 |
| 2018-01-10 | 2 | 1500 | 1 |
| 2018-01-11 | 2 | 1500 | 1 |
| 2018-01-12 | 2 | 0 | 0 |
| 2018-01-13 | 2 | 1000 | 2 |
| 2018-01-14 | 2 | 500 | 1 |
| 2018-01-15 | 2 | 0 | 0 |
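The tie-breaker reduces to: within each (day, donor) pair, keep only the highest-amount row before summing. If it helps to sanity-check that dedup step outside Postgres, here's a plain-Python sketch (the sample rows are hypothetical, not the fiddle's schema):

```python
from collections import defaultdict

# Hypothetical rows for one organization on 2018-01-13, where donor "alpha"
# has an ending donation (500) overlapping a starting one (1000).
rows = [
    ("2018-01-13", "alpha", 500),
    ("2018-01-13", "alpha", 1000),
]

# DISTINCT ON (date, da.donor_id) ... ORDER BY amount_cents DESC keeps,
# per (date, donor) pair, only the highest-amount row.
best = {}
for date, donor, cents in rows:
    key = (date, donor)
    if key not in best or cents > best[key]:
        best[key] = cents

# The GROUP BY date then aggregates over the deduplicated rows.
per_day = defaultdict(lambda: [0, 0])  # date -> [total_cents, donor_count]
for (date, donor), cents in best.items():
    per_day[date][0] += cents
    per_day[date][1] += 1

print(dict(per_day))  # {'2018-01-13': [1000, 1]}
```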

Calculating time frames between status using SQL 2008/2012

I have the following table that stores the status of a student:
+----+-----------+------------------+---------+---------+
| ID | PERSON_ID | TIMESTAMP | IN_HOME | STUDYNG |
+----+-----------+------------------+---------+---------+
| 1 | 1 | 17/10/2012 19:00 | 0 | 0 |
| 2 | 1 | 17/10/2012 19:02 | 1 | 0 |
| 3 | 1 | 17/10/2012 19:03 | 1 | 1 |
| 4 | 1 | 17/10/2012 19:04 | 1 | 1 |
| 5 | 1 | 17/10/2012 19:05 | 1 | 0 |
| 6 | 1 | 17/10/2012 19:10 | 0 | 0 |
| 7 | 1 | 17/10/2012 19:12 | 0 | 0 |
| 8 | 1 | 17/10/2012 19:20 | 1 | 0 |
| 9 | 1 | 17/10/2012 19:25 | 1 | 0 |
| 10 | 1 | 17/10/2012 19:26 | 1 | 1 |
| 11 | 1 | 17/10/2012 19:30 | 1 | 0 |
+----+-----------+------------------+---------+---------+
And I would like to produce results in two ways to make some reports:
I:
+-----------+------------------+------------------+---------+---------+
| PERSON_ID | START | END | IN_HOME | STUDYNG |
+-----------+------------------+------------------+---------+---------+
| 1 | 17/10/2012 19:00 | 17/10/2012 19:02 | 0 | 0 |
| 1 | 17/10/2012 19:02 | 17/10/2012 19:03 | 1 | 0 |
| 1 | 17/10/2012 19:03 | 17/10/2012 19:05 | 1 | 1 |
| 1 | 17/10/2012 19:05 | 17/10/2012 19:10 | 1 | 0 |
| 1 | 17/10/2012 19:10 | 17/10/2012 19:20 | 0 | 0 |
| 1 | 17/10/2012 19:20 | 17/10/2012 19:26 | 1 | 0 |
| 1 | 17/10/2012 19:26 | 17/10/2012 19:30 | 1 | 1 |
+-----------+------------------+------------------+---------+---------+
II:
+-----+------------------+------------------+--------+---------+----------+----------+
| PID | START | END | InHOME | TotTIME | FreeTIME | StudTIME |
+-----+------------------+------------------+--------+---------+----------+----------+
| 1 | 17/10/2012 19:00 | 17/10/2012 19:02 | 0 | 2min | 2min | 0min |
| 1 | 17/10/2012 19:02 | 17/10/2012 19:10 | 1 | 8min | 6min | 2min |
| 1 | 17/10/2012 19:10 | 17/10/2012 19:20 | 0 | 10min | 10min | 0min |
| 1 | 17/10/2012 19:20 | 17/10/2012 19:26 | 1 | 6min | 6min | 0min |
+-----+------------------+------------------+--------+---------+----------+----------+
What's the best solution to these problems?
The first one may look like this. I just don't understand why you have STUDYNG = 0 in the last row of report I (maybe a mistake?):
select
T.PERSON_ID,
min(T.[TIMESTAMP]) as START,
CALC.[TIMESTAMP] as [END],
T.IN_HOME, T.STUDYNG
from #Temp as T
cross apply
(
select top 1 TT.*
from #Temp as TT
where
TT.PERSON_ID = T.PERSON_ID and TT.[TIMESTAMP] > T.[TIMESTAMP] and
(TT.IN_HOME <> T.IN_HOME or TT.STUDYNG <> T.STUDYNG)
order by TT.[TIMESTAMP] asc
) as CALC
group by
T.PERSON_ID,
CALC.[TIMESTAMP],
T.IN_HOME, T.STUDYNG
order by START
If you can use SQL 2012 (as stated in the title), I suggest you look into using the LEAD/LAG functions.
When SQL Fiddle comes back on-line, I'll cook up a nice little example.
Here's the first part:
;WITH DATA AS (
    SELECT *,
           CASE
               WHEN ID = 1 THEN 1
               WHEN IN_HOME = LAG(IN_HOME, 1) OVER (ORDER BY TIMESTAMP)
                    AND STUDYNG = LAG(STUDYNG, 1) OVER (ORDER BY TIMESTAMP) THEN 0
               ELSE 1
           END AS rn
    FROM STUDY
),
PREPARED_DATA AS (
    -- running sum of the change flags numbers each block of identical states
    SELECT t1.ID,
           t1.PERSON_ID,
           t1.TIMESTAMP,
           SUM(t2.rn) AS rn,
           t1.IN_HOME,
           t1.STUDYNG
    FROM DATA t1
    INNER JOIN DATA t2
        ON t1.ID >= t2.ID
    GROUP BY t1.ID, t1.PERSON_ID, t1.TIMESTAMP, t1.IN_HOME, t1.STUDYNG
),
SECOND AS (
    SELECT PERSON_ID,
           MAX(TIMESTAMP) AS max_time,
           MIN(TIMESTAMP) AS min_time,
           IN_HOME,
           STUDYNG
    FROM PREPARED_DATA
    GROUP BY PERSON_ID, IN_HOME, STUDYNG, rn
)
SELECT PERSON_ID,
       min_time AS [START],
       LEAD(min_time, 1) OVER (ORDER BY min_time) AS [END],
       IN_HOME,
       STUDYNG
FROM SECOND
ORDER BY min_time
A working example can be found here.
If you like the idea, I can prepare the second part too. Just let me know.
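If you want to sanity-check the change-point logic behind both answers, a few lines of Python reproduce report I: emit an interval each time the (IN_HOME, STUDYNG) pair changes, closing it at the next change (timestamps abbreviated to HH:MM):

```python
def compress_intervals(rows):
    """rows: (timestamp, in_home, studyng) tuples ordered by timestamp.
    Returns (start, end, in_home, studyng) intervals -- report I."""
    out = []
    start, state = rows[0][0], rows[0][1:]
    for ts, *cur in rows[1:]:
        if tuple(cur) != state:
            out.append((start, ts, *state))
            start, state = ts, tuple(cur)
    return out  # the final open-ended interval is dropped, as in the answers

rows = [
    ("19:00", 0, 0), ("19:02", 1, 0), ("19:03", 1, 1), ("19:04", 1, 1),
    ("19:05", 1, 0), ("19:10", 0, 0), ("19:12", 0, 0), ("19:20", 1, 0),
    ("19:25", 1, 0), ("19:26", 1, 1), ("19:30", 1, 0),
]
print(compress_intervals(rows))
```

Report II then follows by merging adjacent intervals with the same IN_HOME value and summing the studying vs. free minutes inside each merged interval.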