Calculating time frames between status using SQL 2008/2012 - sql

I have the following table, which stores the status of a student:
+----+-----------+------------------+---------+---------+
| ID | PERSON_ID | TIMESTAMP | IN_HOME | STUDYNG |
+----+-----------+------------------+---------+---------+
| 1 | 1 | 17/10/2012 19:00 | 0 | 0 |
| 2 | 1 | 17/10/2012 19:02 | 1 | 0 |
| 3 | 1 | 17/10/2012 19:03 | 1 | 1 |
| 4 | 1 | 17/10/2012 19:04 | 1 | 1 |
| 5 | 1 | 17/10/2012 19:05 | 1 | 0 |
| 6 | 1 | 17/10/2012 19:10 | 0 | 0 |
| 7 | 1 | 17/10/2012 19:12 | 0 | 0 |
| 8 | 1 | 17/10/2012 19:20 | 1 | 0 |
| 9 | 1 | 17/10/2012 19:25 | 1 | 0 |
| 10 | 1 | 17/10/2012 19:26 | 1 | 1 |
| 11 | 1 | 17/10/2012 19:30 | 1 | 0 |
+----+-----------+------------------+---------+---------+
I would like to produce results in two ways for some reports:
I:
+-----------+------------------+------------------+---------+---------+
| PERSON_ID | START | END | IN_HOME | STUDYNG |
+-----------+------------------+------------------+---------+---------+
| 1 | 17/10/2012 19:00 | 17/10/2012 19:02 | 0 | 0 |
| 1 | 17/10/2012 19:02 | 17/10/2012 19:03 | 1 | 0 |
| 1 | 17/10/2012 19:03 | 17/10/2012 19:05 | 1 | 1 |
| 1 | 17/10/2012 19:05 | 17/10/2012 19:10 | 1 | 0 |
| 1 | 17/10/2012 19:10 | 17/10/2012 19:20 | 0 | 0 |
| 1 | 17/10/2012 19:20 | 17/10/2012 19:26 | 1 | 0 |
| 1 | 17/10/2012 19:26 | 17/10/2012 19:30 | 1 | 1 |
+-----------+------------------+------------------+---------+---------+
II:
+-----+------------------+------------------+--------+---------+----------+----------+
| PID | START | END | InHOME | TotTIME | FreeTIME | StudTIME |
+-----+------------------+------------------+--------+---------+----------+----------+
| 1 | 17/10/2012 19:00 | 17/10/2012 19:02 | 0 | 2min | 2min | 0min |
| 1 | 17/10/2012 19:02 | 17/10/2012 19:10 | 1 | 8min | 6min | 2min |
| 1 | 17/10/2012 19:10 | 17/10/2012 19:20 | 0 | 10min | 10min | 0min |
| 1 | 17/10/2012 19:20 | 17/10/2012 19:26 | 1 | 6min | 6min | 0min |
+-----+------------------+------------------+--------+---------+----------+----------+
What's the best way to solve these problems?

The first one may look like this. I just don't understand why you have STUDYNG = 0 in the last row of report I (maybe a mistake?)
select
    T.PERSON_ID,
    min(T.[TIMESTAMP]) as START,
    CALC.[TIMESTAMP] as [END],
    T.IN_HOME, T.STUDYNG
from #Temp as T
cross apply
(
    select top 1 TT.*
    from #Temp as TT
    where
        TT.PERSON_ID = T.PERSON_ID and TT.[TIMESTAMP] > T.[TIMESTAMP] and
        (TT.IN_HOME <> T.IN_HOME or TT.STUDYNG <> T.STUDYNG)
    order by TT.[TIMESTAMP] asc
) as CALC
group by
    T.PERSON_ID,
    CALC.[TIMESTAMP],
    T.IN_HOME, T.STUDYNG
order by START

If you can use SQL 2012 (as stated in the title), I suggest you look into using the LEAD/LAG functions.
When SQL Fiddle comes back on-line, I'll cook up a nice little example.
Here's the first part:
;WITH DATA
AS (SELECT *,
CASE
WHEN ID = 1 THEN 1
ELSE
CASE
WHEN IN_HOME = Lag(IN_HOME, 1)
OVER (
ORDER BY TIMESTAMP)
AND STUDYNG = Lag(STUDYNG, 1)
OVER (
ORDER BY TIMESTAMP) THEN 0
ELSE 1
END
END rn
FROM STUDY),
PREPARED_DATA
AS (SELECT t1.ID,
t1.PERSON_ID,
t1.TIMESTAMP,
Sum(t2.RN) RN,
t1.IN_HOME,
t1.STUDYNG
FROM DATA t1
INNER JOIN DATA t2
ON t1.ID >= t2.ID
GROUP BY t1.ID,
t1.PERSON_ID,
t1.TIMESTAMP,
t1.IN_HOME,
t1.STUDYNG),
SECOND
AS (SELECT PERSON_ID,
Max(TIMESTAMP) max_time,
Min(TIMESTAMP) min_time,
IN_HOME,
STUDYNG
FROM PREPARED_DATA
GROUP BY PERSON_ID,
IN_HOME,
STUDYNG,
RN)
SELECT PERSON_ID,
MIN_TIME AS [START],
Lead(MIN_TIME, 1)
OVER (
ORDER BY MIN_TIME) AS [END],
IN_HOME,
STUDYNG
FROM SECOND
ORDER BY MIN_TIME
A working example can be found here.
If you like the idea, I can prepare the second part too. Just let me know.
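For anyone who wants to try the LAG idea without a SQL Server 2012 instance, here is a minimal sketch of report I run through Python's sqlite3 module. It assumes SQLite 3.25 or later (for window functions), and the table name `study`, the column name `ts` and the ISO-formatted timestamps are my own choices for the illustration, not part of the original schema. The last status row is dropped because it has no following row to close its range.

```python
import sqlite3

# Sample rows from the question; timestamps rewritten in ISO format so they sort as text.
rows = [
    (1, 1, "2012-10-17 19:00", 0, 0),
    (2, 1, "2012-10-17 19:02", 1, 0),
    (3, 1, "2012-10-17 19:03", 1, 1),
    (4, 1, "2012-10-17 19:04", 1, 1),
    (5, 1, "2012-10-17 19:05", 1, 0),
    (6, 1, "2012-10-17 19:10", 0, 0),
    (7, 1, "2012-10-17 19:12", 0, 0),
    (8, 1, "2012-10-17 19:20", 1, 0),
    (9, 1, "2012-10-17 19:25", 1, 0),
    (10, 1, "2012-10-17 19:26", 1, 1),
    (11, 1, "2012-10-17 19:30", 1, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE study (id INTEGER, person_id INTEGER, ts TEXT,"
    " in_home INTEGER, studyng INTEGER)"
)
conn.executemany("INSERT INTO study VALUES (?, ?, ?, ?, ?)", rows)

REPORT_I = """
WITH flagged AS (
    -- is_start = 1 when the status differs from the previous row (or there is none)
    SELECT *,
           CASE WHEN in_home = LAG(in_home) OVER w
                 AND studyng = LAG(studyng) OVER w
                THEN 0 ELSE 1 END AS is_start
    FROM study
    WINDOW w AS (PARTITION BY person_id ORDER BY ts)
),
grouped AS (
    -- a running sum of the start flags gives every run of equal status its own number
    SELECT *, SUM(is_start) OVER (PARTITION BY person_id ORDER BY ts) AS grp
    FROM flagged
),
islands AS (
    SELECT person_id, grp, MIN(ts) AS start_ts, in_home, studyng
    FROM grouped
    GROUP BY person_id, grp, in_home, studyng
),
ranges AS (
    -- each range ends where the next one starts
    SELECT person_id, start_ts,
           LEAD(start_ts) OVER (PARTITION BY person_id ORDER BY start_ts) AS end_ts,
           in_home, studyng
    FROM islands
)
SELECT * FROM ranges WHERE end_ts IS NOT NULL ORDER BY person_id, start_ts
"""

report_i = conn.execute(REPORT_I).fetchall()
for row in report_i:
    print(row)
```

The running sum replaces the self-join used in PREPARED_DATA above, and partitioning by person_id keeps several students from being mixed together.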

Related

How to count the number of occurrences of each user ID with conditions in a SQL database

I have a table in MS SQL that collects the status of each ID in a marketing campaign. For each month, one column records whether the consumer ID is in the marketing campaign (is_in_programme) and, if so, another records whether they are a newcomer to the programme that month (is_new_apply). Each ID can apply to the programme multiple times.
My table contains a datetime (reported on the last day of every month, with no skipped months), the ID, and the status columns described above. For each period, I want to know how many times each ID has been in the programme so far (the EXPECTED column).
In my Output column, I've tried to use the ROW_NUMBER() function partitioned by id, is_in_programme, is_new_apply when is_in_programme and is_new_apply are both 1. But I cannot count the occurrences of each ID when is_new_apply = 0.
+------------+-------+-----------------+--------------+--------+----------+
| datetime | ID | is_in_programme | is_new_apply | Output | EXPECTED |
+------------+-------+-----------------+--------------+--------+----------+
| 31/01/2020 | 12345 | 1 | 1 | 1 | 1 |
| 29/02/2020 | 12345 | 1 | 0 | 0 | 1 |
| 31/03/2020 | 12345 | 1 | 0 | 0 | 1 |
| 30/04/2020 | 12345 | 1 | 0 | 0 | 1 |
| 31/05/2020 | 12345 | 0 | 0 | 0 | 0 |
| 30/06/2020 | 12345 | 1 | 1 | 2 | 2 |
| 31/07/2020 | 12345 | 1 | 0 | 0 | 2 |
| 31/08/2020 | 12345 | 1 | 0 | 0 | 2 |
| 31/01/2020 | 67890 | 0 | 0 | 0 | 0 |
| 29/02/2020 | 67890 | 1 | 1 | 1 | 1 |
| 31/03/2020 | 67890 | 1 | 0 | 0 | 1 |
| 30/04/2020 | 67890 | 0 | 0 | 0 | 0 |
| 31/05/2020 | 67890 | 0 | 0 | 0 | 0 |
| 30/06/2020 | 67890 | 1 | 1 | 2 | 2 |
| 31/07/2020 | 67890 | 1 | 0 | 0 | 2 |
| 31/08/2020 | 67890 | 1 | 0 | 0 | 2 |
| 30/09/2020 | 67890 | 0 | 0 | 0 | 0 |
| 31/10/2020 | 67890 | 1 | 1 | 3 | 3 |
| 30/11/2020 | 67890 | 1 | 0 | 0 | 3 |
| 31/12/2020 | 67890 | 1 | 0 | 0 | 3 |
+------------+-------+-----------------+--------------+--------+----------+
Is there any way to count how many times each ID has been in the marketing campaign in each period, like my EXPECTED column?
You seem to want a cumulative sum of is_new_apply for the rows where is_in_programme is not 0. That would be:
select t.*,
       (case when is_in_programme <> 0
             then sum(is_new_apply) over (partition by id order by datetime)
             else 0
        end) as expected
from t;
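To check the cumulative-sum idea against the sample data, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The table name `t` follows the answer, while the column name `month_end` and the ISO date strings are assumptions made for this illustration, since dd/mm/yyyy text would not sort correctly.

```python
import sqlite3

# Rows for ID 12345 from the question: (month_end, id, is_in_programme, is_new_apply)
rows = [
    ("2020-01-31", 12345, 1, 1),
    ("2020-02-29", 12345, 1, 0),
    ("2020-03-31", 12345, 1, 0),
    ("2020-04-30", 12345, 1, 0),
    ("2020-05-31", 12345, 0, 0),
    ("2020-06-30", 12345, 1, 1),
    ("2020-07-31", 12345, 1, 0),
    ("2020-08-31", 12345, 1, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (month_end TEXT, id INTEGER,"
    " is_in_programme INTEGER, is_new_apply INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

QUERY = """
SELECT month_end, id,
       CASE WHEN is_in_programme <> 0
            -- running count of applications so far for this ID
            THEN SUM(is_new_apply) OVER (PARTITION BY id ORDER BY month_end)
            ELSE 0
       END AS expected
FROM t
ORDER BY id, month_end
"""

expected_column = [row[2] for row in conn.execute(QUERY)]
print(expected_column)
```

For these eight rows the printed list should match the EXPECTED column of the question for ID 12345.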

Get the total time every time a truck has no speed in SQL?

I have the following table in SQL Server 2014:
Vehicle_Id | Speed | Event | Datetime
-----------+---------+--------------+----------------------
1 | 0 | Door-Open | 2019-05-04 15:00:00
1 | 0 | Door-Closed | 2019-05-04 15:15:00
1 | 50 | Driving | 2019-05-04 15:35:00
1 | 0 | Parked | 2019-05-04 15:50:00
1 | 0 | Door-Open | 2019-05-04 15:51:00
1 | 0 | Door-Closed | 2019-05-04 15:52:00
1 | 50 | Driving | 2019-05-04 15:57:00
I need to identify blocks of time in which the truck has been at speed = 0 for more than an hour. Every time a row appears with speed 0, it should start a unique block_id that lasts until a row with speed > 0 appears. The total time should run from the first row where the truck has speed 0 until the next row with speed > 0.
Expected Output:
Vehicle_Id | Speed | Event | Datetime | Block | Total_State_Time_Block(Minutes)
-----------+---------+--------------+------------------------+-------------+---------------------------------
1 | 0 | Door-Open | 2019-05-04 15:00:00 | 1 | 35 Minutes
1 | 0 | Door-Closed | 2019-05-04 15:15:00 | 1 | 35 Minutes
1 | 50 | Driving | 2019-05-04 15:35:00 | 2 | 15 Minutes
1 | 0 | Parked | 2019-05-04 15:50:00 | 3 | 7 Minutes
1 | 0 | Door-Open | 2019-05-04 15:51:00 | 3 | 7 Minutes
1 | 0 | Door-Closed | 2019-05-04 15:52:00 | 3 | 7 Minutes
1 | 50 | Driving | 2019-05-04 15:57:00 | 4 | ...
So, as it's ordered by datetime, the idea is to create groups of adjacent rows with speed = 0 so I can identify the times a truck hasn't moved for more than an hour.
I tried windowing functions to get the result by vehicle and day. But I can't achieve this last step.
You can try with lag()
select
vehicle_id,
speed,
event,
datetime,
sum(case when speed = rnk then 0 else 1 end) over (order by datetime) as block
from
(
select
*,
lag(speed) over (order by datetime) as rnk
from myTable
) val
output:
| vehicle_id | speed | event | datetime | block |
| ---------- | ----- | ----------- | ------------------------ | ----- |
| 1 | 0 | Door-Open | 2019-05-04 15:00:00 | 1 |
| 1 | 0 | Door-Closed | 2019-05-04 15:15:00 | 1 |
| 1 | 50 | Driving | 2019-05-04 15:35:00 | 2 |
| 1 | 0 | Parked | 2019-05-04 15:50:00 | 3 |
| 1 | 0 | Door-Open | 2019-05-04 15:51:00 | 3 |
| 1 | 0 | Door-Closed | 2019-05-04 15:52:00 | 3 |
| 1 | 50 | Driving | 2019-05-04 15:57:00 | 4 |
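If you also need the Total_State_Time_Block column, one possible extension is sketched below using Python's sqlite3 module (assuming SQLite 3.25 or later). The names `my_table`, `dt`, `is_start` and `minutes` are hypothetical, and unlike the query above this variant compares the moving/stopped state (speed > 0) rather than the exact speed, so a change from 50 to 60 would not open a new block. Each block's duration is measured from its first row to the first row of the next block.

```python
import sqlite3

rows = [
    (1, 0, "Door-Open", "2019-05-04 15:00:00"),
    (1, 0, "Door-Closed", "2019-05-04 15:15:00"),
    (1, 50, "Driving", "2019-05-04 15:35:00"),
    (1, 0, "Parked", "2019-05-04 15:50:00"),
    (1, 0, "Door-Open", "2019-05-04 15:51:00"),
    (1, 0, "Door-Closed", "2019-05-04 15:52:00"),
    (1, 50, "Driving", "2019-05-04 15:57:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (vehicle_id INTEGER, speed INTEGER, event TEXT, dt TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)

QUERY = """
WITH flagged AS (
    -- is_start = 1 when the moving/stopped state changes (or there is no previous row)
    SELECT *,
           CASE WHEN (speed > 0) =
                     LAG(speed > 0) OVER (PARTITION BY vehicle_id ORDER BY dt)
                THEN 0 ELSE 1 END AS is_start
    FROM my_table
),
blocked AS (
    SELECT *, SUM(is_start) OVER (PARTITION BY vehicle_id ORDER BY dt) AS block
    FROM flagged
),
starts AS (
    SELECT vehicle_id, block, MIN(dt) AS block_start
    FROM blocked
    GROUP BY vehicle_id, block
),
spans AS (
    SELECT vehicle_id, block, block_start,
           LEAD(block_start) OVER (PARTITION BY vehicle_id ORDER BY block) AS next_start
    FROM starts
)
SELECT b.vehicle_id, b.speed, b.event, b.dt, b.block,
       -- whole minutes between block start and next block start; NULL for the last block
       (CAST(strftime('%s', s.next_start) AS INTEGER)
        - CAST(strftime('%s', s.block_start) AS INTEGER)) / 60 AS minutes
FROM blocked AS b
JOIN spans AS s ON s.vehicle_id = b.vehicle_id AND s.block = b.block
ORDER BY b.vehicle_id, b.dt
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The last block has no following row, so its duration comes back as NULL (None in Python), which matches the "..." in the expected output.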
If you just want periods where the truck has been at speed = 0 for an hour or more, you don't need your expected output. Instead, you can look at the next value with a speed and calculate the decimal hours.
That is, you can get the blocks directly. This gets the start of the block with the duration:
select t.*,
datediff(second, datetime, coalesce(next_speed, max_datetime)
) / (60.0 * 60) as decimal_hours
from (select t.*,
lag(speed) over (partition by vehicle_id order by datetime) as prev_speed,
min(case when speed > 0 then datetime end) over (partition by vehicle_id order by datetime desc) as next_speed,
max(datetime) over (partition by vehicle_id) as max_datetime
from t
) t
where (prev_speed is null or prev_speed > 0) and
speed = 0
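Here is a rough, runnable version of the same idea using Python's sqlite3 module (assuming SQLite 3.25 or later). The names `my_table`, `dt`, `next_moving` and `max_dt` are placeholders of mine, and strftime('%s', ...) stands in for SQL Server's datediff. The window ordered by dt DESC is what makes the MIN look at the current and following rows, so it finds the next time the truck moves.

```python
import sqlite3

rows = [
    (1, 0, "Door-Open", "2019-05-04 15:00:00"),
    (1, 0, "Door-Closed", "2019-05-04 15:15:00"),
    (1, 50, "Driving", "2019-05-04 15:35:00"),
    (1, 0, "Parked", "2019-05-04 15:50:00"),
    (1, 0, "Door-Open", "2019-05-04 15:51:00"),
    (1, 0, "Door-Closed", "2019-05-04 15:52:00"),
    (1, 50, "Driving", "2019-05-04 15:57:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (vehicle_id INTEGER, speed INTEGER, event TEXT, dt TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)

QUERY = """
SELECT vehicle_id, dt,
       (CAST(strftime('%s', COALESCE(next_moving, max_dt)) AS INTEGER)
        - CAST(strftime('%s', dt) AS INTEGER)) / 3600.0 AS decimal_hours
FROM (
    SELECT *,
           LAG(speed) OVER (PARTITION BY vehicle_id ORDER BY dt) AS prev_speed,
           -- earliest moving row at or after the current row
           MIN(CASE WHEN speed > 0 THEN dt END)
               OVER (PARTITION BY vehicle_id ORDER BY dt DESC) AS next_moving,
           MAX(dt) OVER (PARTITION BY vehicle_id) AS max_dt
    FROM my_table
) AS s
-- keep only the first row of each stopped block
WHERE (prev_speed IS NULL OR prev_speed > 0) AND speed = 0
ORDER BY vehicle_id, dt
"""

stops = conn.execute(QUERY).fetchall()
for row in stops:
    print(row)
```

To keep only the long stops, you could wrap this in another query and filter on decimal_hours >= 1; with the sample data no block would pass that filter.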

Cumulative sum of multiple window functions V3

I have this table:
id | date | player_id | score | all_games | all_wins | n_games | n_wins
============================================================================================
6747 | 2018-08-10 | 1 | 0 | 1 | | 1 |
6751 | 2018-08-10 | 1 | 0 | 2 | 0 | 2 |
6764 | 2018-08-10 | 1 | 0 | 3 | 0 | 3 |
6783 | 2018-08-10 | 1 | 0 | 4 | 0 | 4 |
6804 | 2018-08-10 | 1 | 0 | 5 | 0 | 5 |
6821 | 2018-08-10 | 1 | 0 | 6 | 0 | 6 |
6828 | 2018-08-10 | 1 | 0 | 7 | 0 | 7 |
17334 | 2018-08-23 | 1 | 0 | 8 | 0 | 8 | 0
17363 | 2018-08-23 | 1 | 0 | 9 | 0 | 9 | 0
17398 | 2018-08-23 | 1 | 0 | 10 | 0 | 10 | 0
17403 | 2018-08-23 | 1 | 0 | 11 | 0 | 11 | 0
17409 | 2018-08-23 | 1 | 0 | 12 | 0 | 12 | 0
33656 | 2018-09-13 | 1 | 0 | 13 | 0 | 13 | 0
33687 | 2018-09-13 | 1 | 0 | 14 | 0 | 14 | 0
45393 | 2018-09-27 | 1 | 0 | 15 | 0 | 15 | 0
45402 | 2018-09-27 | 1 | 0 | 16 | 0 | 16 | 0
45422 | 2018-09-27 | 1 | 1 | 17 | 0 | 17 | 0
45453 | 2018-09-27 | 1 | 0 | 18 | 1 | 18 | 0
45461 | 2018-09-27 | 1 | 0 | 19 | 1 | 19 | 0
45474 | 2018-09-27 | 1 | 0 | 20 | 1 | 20 | 0
57155 | 2018-10-11 | 1 | 0 | 21 | 1 | 21 | 1
57215 | 2018-10-11 | 1 | 0 | 22 | 1 | 22 | 1
57225 | 2018-10-11 | 1 | 0 | 23 | 1 | 23 | 1
69868 | 2018-10-25 | 1 | 0 | 24 | 1 | 24 | 1
The issue that I now need to solve is that I need n_games to be a rolling count of the last number of games per day, i.e. a user can play multiple games per day; at present it is just the same as row_number(*) OVER all_games.
The other issue is that the column n_wins only does a sum(*) of the rolling window's wins for the day, so if a user wins a couple of games early in the day, they will not be added to the n_wins column until the next day.
I have the example DEMO:
I have tried this query
SELECT id,
date,
player_id,
score,
row_number(*) OVER all_races AS all_games,
sum(score) OVER all_races AS all_wins,
row_number(*) OVER last_n AS n_games,
sum(score) OVER last_n AS n_wins
FROM scores
WINDOW
all_races AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
last_n AS (PARTITION BY player_id ORDER BY date ASC RANGE BETWEEN interval '7 days' PRECEDING AND interval '1 day' PRECEDING);
Ideally I need a query that will output something like this table
id | date | player_id | score | all_games | all_wins | n_games | n_wins
============================================================================================
6747 | 2018-08-10 | 1 | 0 | 1 | | 1 |
6751 | 2018-08-10 | 1 | 0 | 2 | 0 | 2 |
6764 | 2018-08-10 | 1 | 0 | 3 | 0 | 3 |
6783 | 2018-08-10 | 1 | 0 | 4 | 0 | 4 |
6804 | 2018-08-10 | 1 | 0 | 5 | 0 | 5 |
6821 | 2018-08-10 | 1 | 0 | 6 | 0 | 6 |
6828 | 2018-08-10 | 1 | 0 | 7 | 0 | 7 |
17334 | 2018-08-23 | 1 | 0 | 8 | 0 | 1 | 0
17363 | 2018-08-23 | 1 | 0 | 9 | 0 | 2 | 0
17398 | 2018-08-23 | 1 | 0 | 10 | 0 | 3 | 0
17403 | 2018-08-23 | 1 | 0 | 11 | 0 | 4 | 0
17409 | 2018-08-23 | 1 | 0 | 12 | 0 | 5 | 0
33656 | 2018-09-13 | 1 | 1 | 13 | 1 | 6 | 0
33687 | 2018-09-13 | 1 | 0 | 14 | 1 | 7 | 1
45393 | 2018-09-27 | 1 | 0 | 15 | 1 | 1 | 1
45402 | 2018-09-27 | 1 | 0 | 16 | 1 | 2 | 1
45422 | 2018-09-27 | 1 | 1 | 17 | 1 | 3 | 1
45453 | 2018-09-27 | 1 | 0 | 18 | 2 | 4 | 2
45461 | 2018-09-27 | 1 | 0 | 19 | 2 | 5 | 2
45474 | 2018-09-27 | 1 | 0 | 20 | 2 | 6 | 1
57155 | 2018-10-11 | 1 | 0 | 21 | 2 | 7 | 1
57215 | 2018-10-11 | 1 | 0 | 22 | 2 | 1 | 1
57225 | 2018-10-11 | 1 | 0 | 23 | 2 | 2 | 1
69868 | 2018-10-25 | 1 | 0 | 24 | 2 | 3 | 1

SQL Getting Running Count with SUM and OVER

In SQL I have a history table for our items; each item can have "in" or "out" records, each with a quantity. I'm trying to get a running count of how many of an item we have, based on whether the activity is out or in. Here is my final SQL:
SELECT itemid,
activitydate,
activitycode,
SUM(quantity) AS quantity,
SUM(CASE WHEN activitycode = 'IN'
THEN quantity
WHEN activitycode = 'OUT'
THEN -quantity
ELSE 0 END) OVER (PARTITION BY itemid ORDER BY activitydate rows unbounded preceding) AS runningcount
FROM itemhistory
GROUP BY itemid,
activitydate,
activitycode
This results in:
+--------+-------------------------+--------------+----------+--------------+
| itemid | activitydate | activitycode | quantity | runningcount |
+--------+-------------------------+--------------+----------+--------------+
| 1 | 2017-06-08 13:58:00.000 | IN | 1 | 1 |
| 1 | 2017-06-08 16:02:00.000 | IN | 6 | 2 |
| 1 | 2017-06-15 11:43:00.000 | OUT | 3 | 1 |
| 1 | 2017-06-19 12:36:00.000 | IN | 1 | 2 |
| 2 | 2017-06-08 13:50:00.000 | IN | 5 | 1 |
| 2 | 2017-06-12 12:41:00.000 | IN | 4 | 2 |
| 2 | 2017-06-15 11:38:00.000 | OUT | 2 | 1 |
| 2 | 2017-06-20 12:54:00.000 | IN | 15 | 2 |
| 2 | 2017-06-08 13:52:00.000 | IN | 5 | 3 |
| 2 | 2017-06-12 13:09:00.000 | IN | 1 | 4 |
| 2 | 2017-06-15 11:47:00.000 | OUT | 1 | 3 |
| 2 | 2017-06-20 13:14:00.000 | IN | 1 | 4 |
+--------+-------------------------+--------------+----------+--------------+
I want the end result to look like this:
+--------+-------------------------+--------------+----------+--------------+
| itemid | activitydate | activitycode | quantity | runningcount |
+--------+-------------------------+--------------+----------+--------------+
| 1 | 2017-06-08 13:58:00.000 | IN | 1 | 1 |
| 1 | 2017-06-08 16:02:00.000 | IN | 6 | 7 |
| 1 | 2017-06-15 11:43:00.000 | OUT | 3 | 4 |
| 1 | 2017-06-19 12:36:00.000 | IN | 1 | 5 |
| 2 | 2017-06-08 13:50:00.000 | IN | 5 | 5 |
| 2 | 2017-06-12 12:41:00.000 | IN | 4 | 9 |
| 2 | 2017-06-15 11:38:00.000 | OUT | 2 | 7 |
| 2 | 2017-06-20 12:54:00.000 | IN | 15 | 22 |
| 2 | 2017-06-08 13:52:00.000 | IN | 5 | 27 |
| 2 | 2017-06-12 13:09:00.000 | IN | 1 | 28 |
| 2 | 2017-06-15 11:47:00.000 | OUT | 1 | 27 |
| 2 | 2017-06-20 13:14:00.000 | IN | 1 | 28 |
+--------+-------------------------+--------------+----------+--------------+
You want sum(sum()), because this is an aggregation query:
SELECT itemid, activitydate, activitycode,
SUM(quantity) AS quantity,
SUM(SUM(CASE WHEN activitycode = 'IN' THEN quantity
WHEN activitycode = 'OUT' THEN -quantity
ELSE 0
END)
) OVER (PARTITION BY itemid ORDER BY activitydate ) AS runningcount
FROM itemhistory
GROUP BY itemid, activitydate, activitycode
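If the nested SUM(SUM(...)) looks odd, it may help to see it as two steps: aggregate per group first, then take a running sum over the aggregated rows. The sketch below spells that out with a CTE and runs it through Python's sqlite3 module (assuming SQLite 3.25 or later). The CTE name `per_activity` and the column `net` are made up for the illustration, and only the first eight rows of the question's sample are loaded.

```python
import sqlite3

# (itemid, activitydate, activitycode, quantity), taken from the question
rows = [
    (1, "2017-06-08 13:58:00.000", "IN", 1),
    (1, "2017-06-08 16:02:00.000", "IN", 6),
    (1, "2017-06-15 11:43:00.000", "OUT", 3),
    (1, "2017-06-19 12:36:00.000", "IN", 1),
    (2, "2017-06-08 13:50:00.000", "IN", 5),
    (2, "2017-06-12 12:41:00.000", "IN", 4),
    (2, "2017-06-15 11:38:00.000", "OUT", 2),
    (2, "2017-06-20 12:54:00.000", "IN", 15),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE itemhistory (itemid INTEGER, activitydate TEXT,"
    " activitycode TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO itemhistory VALUES (?, ?, ?, ?)", rows)

QUERY = """
WITH per_activity AS (
    -- step 1: the GROUP BY aggregation (the inner SUM)
    SELECT itemid, activitydate, activitycode,
           SUM(quantity) AS quantity,
           SUM(CASE activitycode WHEN 'IN' THEN quantity
                                 WHEN 'OUT' THEN -quantity
                                 ELSE 0 END) AS net
    FROM itemhistory
    GROUP BY itemid, activitydate, activitycode
)
-- step 2: the running total over the aggregated rows (the outer SUM ... OVER)
SELECT itemid, activitydate, activitycode, quantity,
       SUM(net) OVER (PARTITION BY itemid ORDER BY activitydate) AS runningcount
FROM per_activity
ORDER BY itemid, activitydate
"""

running = [row[4] for row in conn.execute(QUERY)]
print(running)
```

The original query counted rows instead of quantities because the window SUM was applied to the CASE expression before the grouping; summing the grouped net quantity avoids that.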

How to Group by 6 days in Postgresql

I want to group this kind of data into 6-day periods.
+-----+--------------+------------+
| gid | cnt | date |
+-----+--------------+------------+
| 1 | 1 | 2012-02-05 |
| 2 | 2 | 2012-02-06 |
| 3 | 1 | 2012-02-07 |
| 4 | 1 | 2012-02-08 |
| 5 | 1 | 2012-02-09 |
| 6 | 2 | 2012-02-10 |
| 7 | 3 | 2012-02-11 |
| 8 | 1 | 2012-02-12 |
| 9 | 1 | 2012-02-13 |
| 10 | 2 | 2012-02-14 |
| 11 | 3 | 2012-02-15 |
| 12 | 4 | 2012-02-16 |
| 13 | 1 | 2012-02-17 |
| 14 | 1 | 2012-02-18 |
| 15 | 1 | 2012-02-19 |
| 16 | NULL | 2012-02-20 |
| 17 | 6 | 2012-02-21 |
| 18 | NULL | 2012-02-22 |
+-----+--------------+------------+
↓↓↓↓↓↓↓↓↓↓↓↓↓↓
The dates are continuous (one row per day, with no gaps).
If I understand correctly you need something like this:
WITH x AS (SELECT date::date, (random() * 3)::int AS cnt FROM generate_series('2012-02-05'::date, '2012-02-22'::date, '1 day'::interval) AS date
)
SELECT start::date,
(start + '5 day'::interval)::date AS end,
sum(cnt)
FROM generate_series(
(SELECT min(date) FROM x),
(SELECT max(date) FROM x),
'6 day'::interval -- step by 6 days so the inclusive 6-day buckets do not overlap
) AS start
LEFT JOIN x ON (x.date >= start AND x.date <= start + '5 day'::interval)
GROUP BY 1, 2
ORDER BY 1
In x I emulate your table.
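As a cross-check on the bucket boundaries, here is a small sketch that groups the question's actual rows into non-overlapping 6-day periods using Python's sqlite3 module. The table name `daily` and column name `day` are my own, and the bucket number is simply the day offset from the first date divided by 6, which is a different technique from the generate_series join above.

```python
import sqlite3
from datetime import date, timedelta

# cnt values from the question, one per day starting 2012-02-05 (None = NULL)
counts = [1, 2, 1, 1, 1, 2, 3, 1, 1, 2, 3, 4, 1, 1, 1, None, 6, None]
first = date(2012, 2, 5)
rows = [((first + timedelta(days=i)).isoformat(), c) for i, c in enumerate(counts)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO daily VALUES (?, ?)", rows)

QUERY = """
WITH bucketed AS (
    SELECT cnt,
           (SELECT MIN(day) FROM daily) AS first_day,
           -- whole days since the first date, integer-divided into 6-day buckets
           CAST((julianday(day) - julianday((SELECT MIN(day) FROM daily))) / 6
                AS INTEGER) AS bucket
    FROM daily
)
SELECT date(first_day, '+' || (bucket * 6) || ' days') AS start_day,
       date(first_day, '+' || (bucket * 6 + 5) || ' days') AS end_day,
       SUM(cnt) AS total  -- SUM skips NULL counts
FROM bucketed
GROUP BY first_day, bucket
ORDER BY bucket
"""

periods = conn.execute(QUERY).fetchall()
for row in periods:
    print(row)
```

With 18 consecutive days this yields three periods, each covering exactly six dates, so no day is counted twice.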