Sequence of Patterns within Date/time range - sql

I have a problem I need help with.
In the example below, I want to find scenarios based on the status patterns within each id: 010 as scenario1, 000 as scenario2, and 111 as scenario3. Records that don't follow any of the patterns should be ignored.
Ex:
id date Status
1 2012-10-18 1
1 2012-10-19 1
1 2012-10-20 0
1 2012-10-21 0
1 2012-10-22 0
1 2012-10-23 0
1 2012-10-24 1
1 2012-10-25 0
1 2012-10-26 0
1 2012-10-27 0
1 2012-10-28 1
2 2012-10-19 0
2 2012-10-20 0
2 2012-10-21 0
2 2012-10-22 1
2 2012-10-23 1
Scenario1:
1 2012-10-23 0
1 2012-10-24 1
1 2012-10-25 0
Scenario2:
1 2012-10-20 0
1 2012-10-21 0
1 2012-10-22 0
2 2012-10-19 0
2 2012-10-20 0
2 2012-10-21 0
Scenario3 - none (no records)

You can construct the patterns as strings and then use string comparison.
At least part of the trick is that you want every row of the pattern. For each row, you therefore need to build all three windows in which it might appear: as the last, the middle, or the first element.
select t.*
from (select t.*,
             -- pat1: this row is the last of three; pat2: the middle; pat3: the first
             concat(lag(status, 2) over (partition by id order by date),
                    lag(status, 1) over (partition by id order by date),
                    status
                   ) as pat1,
             concat(lag(status, 1) over (partition by id order by date),
                    status,
                    lead(status, 1) over (partition by id order by date)
                   ) as pat2,
             concat(status,
                    lead(status, 1) over (partition by id order by date),
                    lead(status, 2) over (partition by id order by date)
                   ) as pat3
      from t
     ) t
where '010' in (pat1, pat2, pat3);
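The same three-window idea can be tried end to end in SQLite from Python. This is a sketch under a few assumptions:

- SQLite 3.25 or later, for LAG/LEAD.
- Dates stored as ISO-8601 text.
- The date column is renamed `dt`.
- The function name `rows_matching` is hypothetical.
- SQLite's `||` operator replaces `concat()`.

```python
import sqlite3

# Sample data from the question. Assumption: dates are ISO-8601 text and the
# column is named dt rather than date.
ROWS = [
    (1, "2012-10-18", 1), (1, "2012-10-19", 1), (1, "2012-10-20", 0),
    (1, "2012-10-21", 0), (1, "2012-10-22", 0), (1, "2012-10-23", 0),
    (1, "2012-10-24", 1), (1, "2012-10-25", 0), (1, "2012-10-26", 0),
    (1, "2012-10-27", 0), (1, "2012-10-28", 1),
    (2, "2012-10-19", 0), (2, "2012-10-20", 0), (2, "2012-10-21", 0),
    (2, "2012-10-22", 1), (2, "2012-10-23", 1),
]

# pat1: the row is the last of three; pat2: the middle; pat3: the first.
# SQLite's || returns NULL if either operand is NULL, so a window that runs
# past the start or end of an id never matches.
QUERY = """
select id, dt
from (select t.*,
             lag(status, 2) over (partition by id order by dt)
               || lag(status, 1) over (partition by id order by dt)
               || status as pat1,
             lag(status, 1) over (partition by id order by dt)
               || status
               || lead(status, 1) over (partition by id order by dt) as pat2,
             status
               || lead(status, 1) over (partition by id order by dt)
               || lead(status, 2) over (partition by id order by dt) as pat3
      from t) as s
where ? in (pat1, pat2, pat3)
order by id, dt
"""


def rows_matching(pattern):
    """Return (id, dt) for every row that sits inside a run spelling `pattern`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, dt text, status integer)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, (pattern,)).fetchall()
    conn.close()
    return result


print(rows_matching("010"))
```

With this data, `'010'` returns the three id 1 rows from 2012-10-23 to 2012-10-25, and `'111'` returns nothing. For `'000'` the query returns 10 rows, because every row inside any 000 window matches. That set includes 2012-10-23 and 2012-10-25 through 2012-10-27 for id 1. The question's expected output lists fewer rows, so the rule for overlapping or longer runs may need clarifying.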

Related

SQL status changes with start and end dates

This is a table of user statuses over the period of 9/1/2021 to 9/10/2021. 1 means "active." 0 means "canceled."
date user status
9/1/2021 1 1
9/1/2021 2 0
9/1/2021 3 1
9/2/2021 1 1
9/2/2021 2 1
9/2/2021 3 1
9/3/2021 1 0
9/3/2021 2 1
9/3/2021 3 1
9/4/2021 1 0
9/4/2021 2 1
9/4/2021 3 1
9/5/2021 1 0
9/5/2021 2 1
9/5/2021 3 0
9/6/2021 1 1
9/6/2021 2 1
9/6/2021 3 0
9/7/2021 1 1
9/7/2021 2 1
9/7/2021 3 0
9/8/2021 1 0
9/8/2021 2 1
9/8/2021 3 1
9/9/2021 1 0
9/9/2021 2 1
9/9/2021 3 1
9/10/2021 1 1
9/10/2021 2 0
9/10/2021 3 1
I want to get the start and end date for each user's active and canceled periods during this time. I know this involves a window function, but I can't quite figure out how to do it. This is my desired output:
user status start_date end_date
1 1 9/1/2021 9/2/2021
1 0 9/3/2021 9/5/2021
1 1 9/6/2021 9/7/2021
1 0 9/8/2021 9/9/2021
1 1 9/10/2021 9/10/2021
2 0 9/1/2021 9/1/2021
2 1 9/2/2021 9/9/2021
2 0 9/10/2021 9/10/2021
3 1 9/1/2021 9/4/2021
3 0 9/5/2021 9/7/2021
3 1 9/8/2021 9/10/2021
Here is an example: fiddle
Updated query:
;with cte as (
SELECT *
,RANK() OVER (partition by usr, status order by dt) as rnk
,ROW_NUMBER() OVER (partition by usr order by dt) as rnum
FROM TABLE1
)
-- rnum - rnk stays constant within each unbroken run of the same status
SELECT usr, status, MIN(dt) as start_date, MAX(dt) as end_date
FROM cte
GROUP BY usr, status, rnum - rnk
ORDER BY usr, start_date
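The rnum and rnk counters in the answer above are the ingredients of the row-number-difference ("gaps and islands") trick. Within one unbroken run of the same status, both counters advance together, so rnum - rnk is constant for that run.

The sketch below checks the idea in SQLite from Python. It rests on these assumptions:

- SQLite 3.25 or later.
- ISO-8601 date strings.
- A table named `subs`, as in the other answer.
- The helper name `status_periods` is hypothetical.
- ROW_NUMBER() is used for both counters. With one row per user and day, RANK() would give the same values.

```python
import sqlite3

# Daily status for Sept 1-10, 2021, transcribed from the question's table.
STATUS = {
    1: [1, 1, 0, 0, 0, 1, 1, 0, 0, 1],
    2: [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    3: [1, 1, 1, 1, 0, 0, 0, 1, 1, 1],
}

# rnum counts all of a user's days; rnk counts only days with the same status.
# Their difference identifies each run, so grouping by it yields the periods.
QUERY = """
with cte as (
  select usr, dt, status,
         row_number() over (partition by usr order by dt) as rnum,
         row_number() over (partition by usr, status order by dt) as rnk
  from subs
)
select usr, status, min(dt) as start_date, max(dt) as end_date
from cte
group by usr, status, rnum - rnk
order by usr, start_date
"""


def status_periods():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table subs (usr integer, dt text, status integer)")
    conn.executemany(
        "insert into subs values (?, ?, ?)",
        [(usr, f"2021-09-{day:02d}", s)
         for usr, flags in STATUS.items()
         for day, s in enumerate(flags, start=1)],
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in status_periods():
    print(row)
```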
I was able to figure it out.
The key step is keeping only the rows where the current status differs from the previous status, or where there is no previous status. Each of those rows marks a date on which the user's status changes.
On those filtered rows, you can use the LEAD() window function and subtract 1 day to get the end date of each status period. The last period has no next change, so it falls back to the last date in the data.
with win as
(
select
usr
, dt
, lag(status) over (partition by usr order by dt) as prev_status
, status
, max(dt) over (partition by usr) as last_dt -- each user's last date, computed before filtering
from subs
)
select
usr
, status
, dt as start_date
, coalesce(lead(dt) over (partition by usr order by dt) - interval '1 day', last_dt) as end_date
from win
where
status <> prev_status
or prev_status is null
order by usr, start_date
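Here is a sketch of the same change-point approach, translated to SQLite and run from Python. Assumptions:

- SQLite 3.25 or later.
- ISO-8601 date strings, so `date(x, '-1 day')` stands in for PostgreSQL's `- interval '1 day'`.
- The helper name `change_periods` is hypothetical.
- The last period's end date is each user's own `max(dt)`, computed in the CTE before the WHERE filter, rather than a table-wide maximum.

```python
import sqlite3

# Daily status for Sept 1-10, 2021, transcribed from the question's table.
STATUS = {
    1: [1, 1, 0, 0, 0, 1, 1, 0, 0, 1],
    2: [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    3: [1, 1, 1, 1, 0, 0, 0, 1, 1, 1],
}

QUERY = """
with win as (
  select usr, dt, status,
         lag(status) over (partition by usr order by dt) as prev_status,
         max(dt) over (partition by usr) as last_dt
  from subs
)
select usr, status, dt as start_date,
       -- window functions run after WHERE, so lead() sees only change rows:
       -- the end is the day before the next change, else the user's last date
       coalesce(date(lead(dt) over (partition by usr order by dt), '-1 day'),
                last_dt) as end_date
from win
where prev_status is null or status <> prev_status
order by usr, start_date
"""


def change_periods():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table subs (usr integer, dt text, status integer)")
    conn.executemany(
        "insert into subs values (?, ?, ?)",
        [(usr, f"2021-09-{day:02d}", s)
         for usr, flags in STATUS.items()
         for day, s in enumerate(flags, start=1)],
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in change_periods():
    print(row)
```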

Time series group by day and kind

I create a table using the command below:
CREATE TABLE IF NOT EXISTS stats (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
session_kind INTEGER NOT NULL,
ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
)
I insert some time series data using the command below:
INSERT INTO stats (session_kind) values (?1)
After executing the insert command several times, I have time series data like this:
id session_kind ts
-----------------------------------------
1 0 2020-04-18 12:59:51 // day 1
2 1 2020-04-19 12:59:52 // day 2
3 0 2020-04-19 12:59:53
4 1 2020-04-19 12:59:54
5 0 2020-04-19 12:59:55
6 2 2020-04-19 12:59:56
7 2 2020-04-19 12:59:57
8 2 2020-04-19 12:59:58
9 2 2020-04-19 12:59:59
10 0 2020-04-20 12:59:51 // day 3
11 1 2020-04-20 12:59:52
12 0 2020-04-20 12:59:53
13 1 2020-04-20 12:59:54
14 0 2020-04-20 12:59:55
15 2 2020-04-20 12:59:56
16 2 2020-04-20 12:59:57
17 2 2020-04-20 12:59:58
18 2 2020-04-21 12:59:59 // day 4
What I would like is a command that groups my data by date, from the most recent day to the oldest, and counts each session_kind, like below (I don't want to pass any parameter to this command):
0 1 2 ts
-------------------------
0 0 1 2020-04-21 // day 4
3 2 3 2020-04-20 // day 3
2 2 4 2020-04-19 // day 2
1 0 0 2020-04-18 // day 1
How can I group my data as above?
You can do conditional aggregation:
select
sum(session_kind= 0) session_kind_0,
sum(session_kind= 1) session_kind_1,
sum(session_kind= 2) session_kind_2,
date(ts) ts_day
from stats
group by date(ts)
order by ts_day desc
If you want something dynamic, then it might be simpler to put the results in rows rather than columns:
select date(ts) ts_day, session_kind, count(*) cnt
from stats
group by date(ts), session_kind
order by ts_day desc, session_kind
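Both queries can be checked against the question's sample rows with Python's sqlite3 module. This is a minimal sketch that assumes:

- The `stats` table from the question.
- `ts` values inserted explicitly, instead of relying on `DEFAULT CURRENT_TIMESTAMP`.
- The helper name `run` is hypothetical.

```python
import sqlite3

# (session_kind, ts) for ids 1-18 in the question; ts is given explicitly.
EVENTS = [
    (0, "2020-04-18 12:59:51"),
    (1, "2020-04-19 12:59:52"), (0, "2020-04-19 12:59:53"),
    (1, "2020-04-19 12:59:54"), (0, "2020-04-19 12:59:55"),
    (2, "2020-04-19 12:59:56"), (2, "2020-04-19 12:59:57"),
    (2, "2020-04-19 12:59:58"), (2, "2020-04-19 12:59:59"),
    (0, "2020-04-20 12:59:51"), (1, "2020-04-20 12:59:52"),
    (0, "2020-04-20 12:59:53"), (1, "2020-04-20 12:59:54"),
    (0, "2020-04-20 12:59:55"), (2, "2020-04-20 12:59:56"),
    (2, "2020-04-20 12:59:57"), (2, "2020-04-20 12:59:58"),
    (2, "2020-04-21 12:59:59"),
]

# One column per kind: a comparison is 0 or 1 in SQLite, so sum() counts matches.
PIVOT = """
select sum(session_kind = 0) as session_kind_0,
       sum(session_kind = 1) as session_kind_1,
       sum(session_kind = 2) as session_kind_2,
       date(ts) as ts_day
from stats
group by date(ts)
order by ts_day desc
"""

# One row per (day, kind): works for any number of kinds.
LONG = """
select date(ts) as ts_day, session_kind, count(*) as cnt
from stats
group by date(ts), session_kind
order by ts_day desc, session_kind
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table stats ("
        " id integer not null primary key autoincrement,"
        " session_kind integer not null,"
        " ts timestamp not null default current_timestamp)"
    )
    conn.executemany("insert into stats (session_kind, ts) values (?, ?)", EVENTS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(PIVOT))
```

The pivot form returns one row per day, newest first. Day 2020-04-20, for example, gives counts 3, 2 and 3 for kinds 0, 1 and 2, matching the table in the question.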
If I understand correctly, you just want to count the rows of each kind per day:
select date(ts) as ts_day,
sum(case when session_kind = 0 then 1 else 0 end) as cnt_0,
sum(case when session_kind = 1 then 1 else 0 end) as cnt_1,
sum(case when session_kind = 2 then 1 else 0 end) as cnt_2
from stats
group by date(ts)
order by ts_day desc;
You can also simplify this, since SQLite evaluates a comparison to 0 or 1:
select date(ts) as ts_day,
sum( session_kind = 0 ) as cnt_0,
sum( session_kind = 1 ) as cnt_1,
sum( session_kind = 2 ) as cnt_2
from stats
group by date(ts)
order by ts_day desc;

How to count consecutive dates using Netezza

I need to count consecutive days in order to define my cohorts. I have a table that looks like:
pat_id admin_date
----------------------------
1 3/10/2019
1 3/11/2019
1 3/23/2019
1 3/24/2019
1 3/25/2019
2 12/26/2017
2 2/27/2019
2 3/16/2019
2 3/17/2019
I want output like this:
pat_id admin_date consecutive
--------------------------------------------
1 3/10/2019 1
1 3/11/2019 2
1 3/23/2019 1
1 3/24/2019 2
1 3/25/2019 3
2 12/26/2017 1
2 2/27/2019 1
2 3/16/2019 1
2 3/17/2019 2
so that I can use these consecutive-day values (per pat_id) to filter for my cohort. I've seen a few posts that suggest using DateDiff/DateAdd with row_number, such as:
dateadd(day, -row_number() over (partition by pat_id order by admin_date), admin_date)
but the datediff/dateadd functions don't work on Netezza.
The closest I've got so far is:
select row_number() over (partition by pat_id order by admin_date) as consecutive
which doesn't recognize gaps between dates and returns this output:
pat_id admin_date consecutive
--------------------------------------------
1 3/10/2019 1
1 3/11/2019 2
1 3/23/2019 3
1 3/24/2019 4
1 3/25/2019 5
2 12/26/2017 1
2 2/27/2019 2
2 3/16/2019 3
2 3/17/2019 4
Does anyone know how to tackle this?
Use lag() to see where the groups start and a cumulative sum to define the group. The rest is just row_number():
select t.*,
row_number() over (partition by pat_id, grp order by admin_date) as consecutive
from (select t.*,
sum( case when prev_ad = admin_date - interval '1 day' then 0 else 1 end) over
(partition by pat_id order by admin_date) as grp
from (select t.*,
lag(admin_date) over (partition by pat_id order by admin_date) as prev_ad
from t
) t
)t ;
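Here is a runnable sketch of the same three steps (lag, cumulative sum, row_number), translated to SQLite and executed from Python. Assumptions:

- SQLite 3.25 or later.
- Dates stored as ISO-8601 text, with `date(admin_date, '-1 day')` in place of Netezza's date arithmetic.
- A table named `t`.
- The helper name `consecutive_days` is hypothetical.

```python
import sqlite3

# (pat_id, admin_date) from the question, as ISO-8601 dates.
ADMIN = [
    (1, "2019-03-10"), (1, "2019-03-11"), (1, "2019-03-23"),
    (1, "2019-03-24"), (1, "2019-03-25"),
    (2, "2017-12-26"), (2, "2019-02-27"), (2, "2019-03-16"),
    (2, "2019-03-17"),
]

QUERY = """
select pat_id, admin_date,
       row_number() over (partition by pat_id, grp order by admin_date) as consecutive
from (select pat_id, admin_date,
             -- a new group starts unless the previous date is exactly one day earlier
             sum(case when prev_ad = date(admin_date, '-1 day') then 0 else 1 end)
               over (partition by pat_id order by admin_date) as grp
      from (select pat_id, admin_date,
                   lag(admin_date) over (partition by pat_id order by admin_date) as prev_ad
            from t) as with_prev) as with_grp
order by pat_id, admin_date
"""


def consecutive_days():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (pat_id integer, admin_date text)")
    conn.executemany("insert into t values (?, ?)", ADMIN)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in consecutive_days():
    print(row)
```

The first row of each patient has a NULL `prev_ad`, so the CASE falls through to 1 and always opens a new group.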

How to get running total from consecutive columns in Oracle SQL

I'm having trouble displaying consecutive holidays from an existing date dataset in Oracle SQL. For example, in December 2017, between the 20th and the 30th, there are the following days off (because of Christmas and weekends):
23.12.2017 Saturday
24.12.2017 Sunday
25.12.2017 Christmas
30.12.2017 Saturday
Now I want my result dataset to look like this (the RUNTOT column is what I need):
DAT ISOFF RUNTOT
20.12.2017 0 0
21.12.2017 0 0
22.12.2017 0 0
23.12.2017 1 1
24.12.2017 1 2
25.12.2017 1 3
26.12.2017 0 0
27.12.2017 0 0
28.12.2017 0 0
29.12.2017 0 0
30.12.2017 1 1
That is, I want to count (or sum) the consecutive rows where ISOFF is 1, starting over whenever ISOFF changes.
I tried an analytic function that sums ISOFF up to the current row:
SELECT DAT,
ISOFF,
SUM (ISOFF)
OVER (ORDER BY DAT ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RUNTOT
FROM (TIME_DATASET)
WHERE DAT BETWEEN DATE '2017-12-20' AND DATE '2017-12-30'
ORDER BY 1
What I get now is the following dataset:
DAT ISOFF RUNTOT
20.12.2017 0 0
21.12.2017 0 0
22.12.2017 0 0
23.12.2017 1 1
24.12.2017 1 2
25.12.2017 1 3
26.12.2017 0 3
27.12.2017 0 3
28.12.2017 0 3
29.12.2017 0 3
30.12.2017 1 4
How can I reset the running total if ISOFF changes to 0? Or is this the wrong approach to solve this problem?
Thank you for your help!
This is a gaps-and-islands problem. Here is one method that assigns the groups by the number of 0s up to that row:
select t.*,
       (case when isoff = 1
             -- partition by isoff too, so the 0 row that opens each group is not counted
             then row_number() over (partition by grp, isoff order by dat)
             else 0
        end) as runtot
from (select t.*,
             sum(case when isoff = 0 then 1 else 0 end) over (order by dat) as grp
      from TIME_DATASET t
     ) t;
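The grouping trick can be verified in SQLite from Python. This sketch assumes SQLite 3.25 or later, a `time_dataset` table with ISO-8601 date text, and a hypothetical helper named `running_totals`.

It partitions by both `grp` and `isoff`. Partitioning by `grp` alone would also number the ISOFF = 0 row that opens each group, so the first holiday after it would get 2 instead of 1.

```python
import sqlite3

# ISOFF flags for 20-30 December 2017, taken from the question.
FLAGS = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
DAYS = [(f"2017-12-{d}", flag) for d, flag in zip(range(20, 31), FLAGS)]

QUERY = """
select dat, isoff,
       case when isoff = 1
            then row_number() over (partition by grp, isoff order by dat)
            else 0
       end as runtot
from (select dat, isoff,
             -- grp grows by one at every working day, so each holiday run
             -- shares the grp of the working day just before it
             sum(case when isoff = 0 then 1 else 0 end) over (order by dat) as grp
      from time_dataset) as t
order by dat
"""


def running_totals():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table time_dataset (dat text, isoff integer)")
    conn.executemany("insert into time_dataset values (?, ?)", DAYS)
    result = [row[2] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(running_totals())
```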
You may use recursive subquery factoring (a recursive WITH clause). The precondition is that your dates are consecutive without gaps, or that you have some other row-number sequence that increases in steps of one.
WITH t1(dat, isoff, runtot) AS (
SELECT dat, isoff, isoff AS runtot -- seed: a holiday on the first day starts at 1
FROM tab
WHERE DAT = DATE'2017-12-20'
UNION ALL
SELECT t2.dat, t2.isoff,
case when t2.isoff = 0 then 0 else t1.runtot + t2.isoff end as runtot
FROM tab t2, t1
WHERE t2.dat = t1.dat + 1
)
SELECT dat, isoff, runtot
FROM t1
ORDER BY dat;
DAT ISOFF RUNTOT
------------------- ---------- ----------
20.12.2017 00:00:00 0 0
21.12.2017 00:00:00 0 0
22.12.2017 00:00:00 0 0
23.12.2017 00:00:00 1 1
24.12.2017 00:00:00 1 2
25.12.2017 00:00:00 1 3
26.12.2017 00:00:00 0 0
27.12.2017 00:00:00 0 0
28.12.2017 00:00:00 0 0
29.12.2017 00:00:00 0 0
30.12.2017 00:00:00 1 1
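The recursive variant can also be tried in SQLite, which spells it `WITH RECURSIVE`. This sketch assumes ISO-8601 date text, so `date(t1.dat, '+1 day')` replaces Oracle's `t1.dat + 1`. It also assumes a table named `tab` and a hypothetical helper named `recursive_totals`. The seed row uses its own ISOFF value, so a holiday on the first day would start at 1.

```python
import sqlite3

# ISOFF flags for 20-30 December 2017, taken from the question.
FLAGS = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
DAYS = [(f"2017-12-{d}", flag) for d, flag in zip(range(20, 31), FLAGS)]

QUERY = """
with recursive t1(dat, isoff, runtot) as (
  select dat, isoff, isoff from tab where dat = '2017-12-20'
  union all
  -- step to the next calendar day; reset on a working day, else add one
  select t2.dat, t2.isoff,
         case when t2.isoff = 0 then 0 else t1.runtot + t2.isoff end
  from tab as t2
  join t1 on t2.dat = date(t1.dat, '+1 day')
)
select dat, isoff, runtot from t1 order by dat
"""


def recursive_totals():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tab (dat text, isoff integer)")
    conn.executemany("insert into tab values (?, ?)", DAYS)
    result = [row[2] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(recursive_totals())
```

As the answer notes, the recursion stops at the first missing date, because the join finds no row for the next day.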
Another variation needs no subquery or CTE, but does need all days to be present and to share the same time component. For the holiday dates only (where isoff = 1), it computes how many days have passed since the last non-holiday date:
select dat,
isoff,
case
when isoff = 1 then
coalesce(dat - max(case when isoff = 0 then dat end)
over (order by dat range between unbounded preceding and 1 preceding), 1)
else 0
end as runtot
from time_dataset
order by dat;
DAT ISOFF RUNTOT
---------- ---------- ----------
2017-12-20 0 0
2017-12-21 0 0
2017-12-22 0 0
2017-12-23 1 1
2017-12-24 1 2
2017-12-25 1 3
2017-12-26 0 0
2017-12-27 0 0
2017-12-28 0 0
2017-12-29 0 0
2017-12-30 1 1
The coalesce() is there in case the first date in the range is a holiday. Since there is no earlier non-holiday date to compare against, the subtraction would yield null.
db<>fiddle with a slightly larger data set.
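Here is a sketch of the "days since the last non-holiday" idea in SQLite, run from Python. Assumptions:

- SQLite 3.25 or later.
- ISO-8601 date text, with `julianday()` differences standing in for Oracle date subtraction.
- The helper name `days_since_workday` is hypothetical.
- The frame is `ROWS ... 1 PRECEDING` instead of `RANGE ... 1 PRECEDING`, because SQLite allows RANGE offsets only on a numeric ORDER BY. With exactly one row per date, the two frames select the same rows.

```python
import sqlite3

# ISOFF flags for 20-30 December 2017, taken from the question.
FLAGS = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
DAYS = [(f"2017-12-{d}", flag) for d, flag in zip(range(20, 31), FLAGS)]

QUERY = """
select dat, isoff,
       case when isoff = 1 then
              -- days since the latest earlier working day; 1 if there is none
              coalesce(cast(julianday(dat)
                            - max(case when isoff = 0 then julianday(dat) end)
                                over (order by dat
                                      rows between unbounded preceding and 1 preceding)
                            as integer), 1)
            else 0
       end as runtot
from time_dataset
order by dat
"""


def days_since_workday():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table time_dataset (dat text, isoff integer)")
    conn.executemany("insert into time_dataset values (?, ?)", DAYS)
    result = [row[2] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(days_since_workday())
```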

Counting number of rows grouped by date and hour

I am tracking customer store entry data in Microsoft SQL Server 2008 R2 that looks something like this:
DoorID DateTimeStamp EntryType
1 2013-09-02 09:01:16.000 IN
1 2013-09-02 09:04:09.000 IN
1 2013-09-02 10:19:29.000 IN
1 2013-09-02 10:19:30.000 IN
1 2013-09-02 10:19:32.000 OUT
1 2013-09-02 10:26:36.000 IN
1 2013-09-02 10:26:40.000 OUT
I don't want to count the OUT rows, just IN.
I believe it needs to be grouped by Date and DoorID, and then totaled by hour.
I would like it to come out like this.
Date DoorID HourOfDay TotalInPersons
2013-09-02 1 0 0
2013-09-02 1 1 0
2013-09-02 1 2 0
2013-09-02 1 3 0
2013-09-02 1 4 0
2013-09-02 1 5 0
2013-09-02 1 6 0
2013-09-02 1 7 0
2013-09-02 1 8 0
2013-09-02 1 9 2
2013-09-02 1 10 3
2013-09-02 1 11 0
2013-09-02 1 12 0
2013-09-02 1 13 0
2013-09-02 1 14 0
2013-09-02 1 15 0
2013-09-02 1 16 0
2013-09-02 1 17 0
2013-09-02 1 18 0
2013-09-02 1 19 0
2013-09-02 1 20 0
2013-09-02 1 21 0
2013-09-02 1 22 0
2013-09-02 1 23 0
SELECT
[Date] = CONVERT(DATE, DateTimeStamp),
DoorID,
HourOfDay = DATEPART(HOUR, DateTimeStamp),
TotalInPersons = COUNT(*)
FROM dbo.tablename
WHERE EntryType = 'IN'
GROUP BY
CONVERT(DATE, DateTimeStamp),
DoorID,
DATEPART(HOUR, DateTimeStamp)
ORDER BY
[Date], DoorID, HourOfDay;
Of course, if you need all hours, even those with no rows, here is one solution. For each day, it limits the output to the doors that have at least one IN entry on that day:
;WITH h AS
(
SELECT TOP (24) h = number FROM Master..spt_values
WHERE type = N'P' ORDER BY number
),
doors AS
(
SELECT DISTINCT DoorID, [Date] = CONVERT(DATE,DateTimeStamp)
FROM dbo.tablename WHERE EntryType = 'IN'
)
SELECT
d.[Date],
d.DoorID,
HourOfDay = h.h,
TotalInPersons = COUNT(t.EntryType)
FROM doors AS d CROSS JOIN h
LEFT OUTER JOIN dbo.tablename AS t
ON CONVERT(DATE, t.DateTimeStamp) = d.[Date]
AND t.DoorID = d.DoorID
AND DATEPART(HOUR, t.DateTimeStamp) = h.h
AND t.EntryType = 'IN'
GROUP BY d.[Date], d.DoorID, h.h
ORDER BY d.[Date], d.DoorID, h.h;
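The zero-filling pattern (door/day pairs cross-joined with 24 hours, then left-joined to the entries) can be tried outside SQL Server. Below is a sketch in SQLite from Python. Assumptions:

- A recursive CTE `hours` replaces `Master..spt_values`, which is SQL Server-specific.
- `date()` and `strftime('%H', ...)` stand in for `CONVERT(DATE, ...)` and `DATEPART(HOUR, ...)`.
- The table name `entries` and the helper name `hourly_counts` are hypothetical.

```python
import sqlite3

# Entry rows from the question: (DoorID, DateTimeStamp, EntryType).
ENTRIES = [
    (1, "2013-09-02 09:01:16.000", "IN"),
    (1, "2013-09-02 09:04:09.000", "IN"),
    (1, "2013-09-02 10:19:29.000", "IN"),
    (1, "2013-09-02 10:19:30.000", "IN"),
    (1, "2013-09-02 10:19:32.000", "OUT"),
    (1, "2013-09-02 10:26:36.000", "IN"),
    (1, "2013-09-02 10:26:40.000", "OUT"),
]

QUERY = """
with recursive hours(hr) as (
  select 0 union all select hr + 1 from hours where hr < 23
),
doors as (
  select distinct DoorID, date(DateTimeStamp) as day
  from entries where EntryType = 'IN'
)
select d.day, d.DoorID, h.hr as HourOfDay,
       count(t.EntryType) as TotalInPersons  -- NULLs from unmatched hours count as 0
from doors as d
cross join hours as h
left join entries as t
  on date(t.DateTimeStamp) = d.day
 and t.DoorID = d.DoorID
 and cast(strftime('%H', t.DateTimeStamp) as integer) = h.hr
 and t.EntryType = 'IN'
group by d.day, d.DoorID, h.hr
order by d.day, d.DoorID, h.hr
"""


def hourly_counts():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table entries (DoorID integer, DateTimeStamp text, EntryType text)")
    conn.executemany("insert into entries values (?, ?, ?)", ENTRIES)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in hourly_counts():
    print(row)
```

The output has 24 rows for door 1 on 2013-09-02, with 2 entries in hour 9, 3 in hour 10, and 0 everywhere else.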
How about something like this:
SELECT
CAST(DateTimeStamp AS DATE) AS Date
,DoorID
,DATEPART(HOUR, DateTimeStamp) AS HourOfDay
,COUNT(*) AS TotalInPersons
FROM StoreTable
WHERE EntryType = 'IN'
GROUP BY
CAST(DateTimeStamp AS DATE)
,DoorID
,DATEPART(HOUR, DateTimeStamp)
This should work. I guessed at how you would pull DoorID and TotalPersons, but the overall logic is correct:
SELECT CONVERT(date, DateTimeStamp) AS Date,
DATEPART(hh, DateTimeStamp) AS HourOfDay,
DoorID,
COUNT(*) AS TotalPersons
FROM yourtable
WHERE EntryType = 'IN'
GROUP BY CONVERT(date, DateTimeStamp), DATEPART(hh, DateTimeStamp), DoorID
ORDER BY CONVERT(date, DateTimeStamp), DATEPART(hh, DateTimeStamp), DoorID