I'm trying to calculate duration between different status. Which is working for most part.
I have this table
Table
for id = 102, I was able to calculate duration of each status.
with ab as (
select id,
status,
max(updated_time) as end_time,
min(updated_time) as updated_time
from Table
group by id, status
)
select *,
lead(updated_time) over (partition by id order by updated_time) - updated_time as duration,
extract(epoch from duration) as duration_seconds
from ab
Output for id = 102
but for id = 101, status moved between 'IN_PROGRESS' to 'BLOCKED' & back to 'IN_PROGRESS'
here I need the below result so that I can get the correct IN_PROGRESS duration
Expected
One way to do this would be to track every time there is a change of STATUS for a given ID sorted by VERSION. The below query provides the desired output. More than brevity, I thought having multiple steps showing the transformations would be helpful. The column UNIX timestamp can be easily converted to human readable DateTimestamp format based on the specific database being used. The sample table definition and file used has also been shared below.
Query
WITH VW_STATUS_CHANGE AS
(
SELECT ID, STATUS, LAG(STATUS) OVER (PARTITION BY ID ORDER BY VERSION) LAG_STATUS, VERSION, UNIXTIME,
CASE WHEN LAG (STATUS) OVER (PARTITION BY ID ORDER BY VERSION) <> STATUS THEN 1 ELSE 0 END STATUS_CHANGE
FROM STACKOVERFLOWSQL
),
VW_CREATE_SYNTHETIC_PARTITION AS
(
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME,STATUS_CHANGE,
SUM(STATUS_CHANGE) OVER (ORDER BY ID, VERSION) AS ROWNUMBER
FROM VW_STATUS_CHANGE
) ,
VW_RESULTS_INTERMEDIATE AS
(
SELECT ID, STATUS, LAG_STATUS, VERSION, UNIXTIME, STATUS_CHANGE,
"FIRST_VALUE"(UNIXTIME) OVER (
PARTITION BY "ID",
"STATUS", ROWNUMBER
ORDER BY
"VERSION"
) "TIME_FIRST_VALUE",
"FIRST_VALUE"(UNIXTIME) OVER (
PARTITION BY "ID",
"STATUS", ROWNUMBER
ORDER BY
"VERSION" DESC
) "TIME_LAST_VALUE"
FROM VW_CREATE_SYNTHETIC_PARTITION
ORDER BY ID, VERSION
)
SELECT DISTINCT ID, STATUS, TIME_FIRST_VALUE, TIME_LAST_VALUE
FROM VW_RESULTS_INTERMEDIATE
ORDER BY TIME_FIRST_VALUE
AWS Athena Table Used along with Sample data.
CREATE EXTERNAL TABLE STACKOVERFLOWSQL (
ID INTEGER,
STATUS STRING,
VERSION INTEGER,
UNIXTIME INTEGER
)
ROW FORMAT SERDE 'ORG.APACHE.HADOOP.HIVE.SERDE2.OPENCSVSERDE'
WITH SERDEPROPERTIES (
'SEPARATORCHAR' = ',',
"SKIP.HEADER.LINE.COUNT"="1"
)
STORED AS TEXTFILE
LOCATION 'S3://<S3BUCKETNAME>/';
Dataset Used:
ID,STATUS,VERSION,UNIXTIME
101,NOT_ASSIGNED,1,1668124141
101,IN_PROGRESS,2,1668124143
101,IN_PROGRESS,3,1668124146
101,IN_PROGRESS,4,1668124150
101,IN_PROGRESS,5,1668124155
101,BLOCKED,6,1668124161
101,BLOCKED,7,1668124168
101,IN_PROGRESS,8,1668124176
101,IN_PROGRESS,9,1668124185
101,IN_PROGRESS,10,1668124195
101,COMPLETED,11,1668124206
105,NOT_ASSIGNED,1,1668124207
105,IN_PROGRESS,2,1668124209
105,IN_PROGRESS,3,1668124212
105,IN_PROGRESS,4,1668124216
105,IN_PROGRESS,5,1668124221
105,IN_PROGRESS,6,1668124227
105,COMPLETED,7,1668124234
Result from the View
ID STATUS TIME_FIRST_VALUE TIME_LAST_VALUE
101 NOT_ASSIGNED 1668124141 1668124141
101 IN_PROGRESS 1668124143 1668124155
101 BLOCKED 1668124161 1668124168
101 IN_PROGRESS 1668124176 1668124195
101 COMPLETED 1668124206 1668124206
105 NOT_ASSIGNED 1668124207 1668124207
105 IN_PROGRESS 1668124209 1668124227
105 COMPLETED 1668124234 1668124234
I have a table with the following columns:
contactId (int)
interval (int)
date (smalldate)
small sample data:
1,120,'12/02/2010'
1,121,'12/02/2010'
1,122,'12/02/2010'
1,123,'12/02/2010'
1,145,'12/02/2010'
1,146,'12/02/2010'
1,147,'12/02/2010'
2,122,'12/02/2010'
2,123,'12/02/2010'
2,124,'12/02/2010'
2,320,'12/02/2010'
2,321,'12/02/2010'
2,322,'12/02/2010'
2,450,'12/02/2010'
2,451,'12/02/2010'
how/is it possible - to get sql to return columns "contactId, minInterval, maxInterval, date", e.g
1,120,123,'12/02/2010'
1,145,147,'12/02/2010'
2,122,124,'12/02/2010'
2,320,322,'12/02/2010'
2,450,451,'12/02/2010'
hopefully this makes sense, basically i'm looking to figure out the min/max range of the intervals by provider & date for the range where they increment by one... once there is a break in the interval incrementer (e.g. more than one) then it would indicate a new min/max range...
any help is greatly appreciated :)
here is my exact SQL table setup:
create table availability
(
Id (int)
ProviderId (int)
IntervalId (int)
Date (date)
)
sample data
providerid,intervalid,date
1128,108,2010-12-27
1128,109,2010-12-27
1128,110,2010-12-27
1128,111,2010-12-27
1128,112,2010-12-27
1128,113,2010-12-27
1128,114,2010-12-27
1128,120,2010-12-27
1128,121,2010-12-27
1128,122,2010-12-27
1128,123,2010-12-27
1128,124,2010-12-27
1128,125,2010-12-27
1213,108,2010-12-27
1213,109,2010-12-27
1213,110,2010-12-27
1213,111,2010-12-27
1213,112,2010-12-27
1213,113,2010-12-27
1213,114,2010-12-27
1213,115,2010-12-27
1213,232,2010-12-27
1213,233,2010-12-27
1213,234,2010-12-27
3954,198,2010-12-27
3954,199,2010-12-27
3954,200,2010-12-27
3954,201,2010-12-27
3954,202,2010-12-27
3954,203,2010-12-27
3954,204,2010-12-27
3954,205,2010-12-27
3954,206,2010-12-27
3954,207,2010-12-27
3954,208,2010-12-27
3954,209,2010-12-27
3954,210,2010-12-27
3954,211,2010-12-27
3954,212,2010-12-27
3954,213,2010-12-27
3954,214,2010-12-27
3954,215,2010-12-27
3954,216,2010-12-27
3954,217,2010-12-27
3954,218,2010-12-27
3954,229,2010-12-27
3954,230,2010-12-27
3954,231,2010-12-27
3954,232,2010-12-27
3954,233,2010-12-27
3954,234,2010-12-27
1128,108,2010-12-28
1128,109,2010-12-28
1128,110,2010-12-28
1128,111,2010-12-28
1128,112,2010-12-28
1128,113,2010-12-28
1128,114,2010-12-28
1128,115,2010-12-28
1128,116,2010-12-28
3954,186,2010-12-28
3954,187,2010-12-28
3954,188,2010-12-28
3954,189,2010-12-28
3954,190,2010-12-28
3954,213,2010-12-28
3954,214,2010-12-28
3954,215,2010-12-28
3954,216,2010-12-28
3954,217,2010-12-28
3954,218,2010-12-28
3954,219,2010-12-28
3954,220,2010-12-28
3954,221,2010-12-28
3954,222,2010-12-28
sample result using current sql within answers:
1062,180,180,2010-12-20
1062,179,179,2010-12-20
1062,178,178,2010-12-20
1062,177,177,2010-12-20
1062,176,176,2010-12-20
1062,175,175,2010-12-20
1062,174,174,2010-12-20
1062,173,173,2010-12-20
1062,172,172,2010-12-20
1062,171,171,2010-12-20
1062,170,170,2010-12-20
1062,169,169,2010-12-20
1062,168,168,2010-12-20
1062,167,167,2010-12-20
1062,166,166,2010-12-20
1062,165,165,2010-12-20
1062,164,164,2010-12-20
1062,163,163,2010-12-20
1062,162,162,2010-12-20
1062,161,161,2010-12-20
1062,160,160,2010-12-20
1062,159,159,2010-12-20
1062,158,158,2010-12-20
1062,157,157,2010-12-20
1062,156,156,2010-12-20
1062,155,155,2010-12-20
1062,154,154,2010-12-20
1062,153,153,2010-12-20
1062,152,152,2010-12-20
1062,151,151,2010-12-20
1062,150,150,2010-12-20
1062,149,149,2010-12-20
1062,148,148,2010-12-20
1062,147,147,2010-12-20
1062,146,146,2010-12-20
1062,145,145,2010-12-20
1062,144,144,2010-12-20
1062,143,143,2010-12-20
1062,142,142,2010-12-20
1062,141,141,2010-12-20
1062,140,140,2010-12-20
1062,139,139,2010-12-20
1062,138,138,2010-12-20
1062,137,137,2010-12-20
1062,136,136,2010-12-20
1062,135,135,2010-12-20
1062,134,134,2010-12-20
1062,133,133,2010-12-20
1062,132,132,2010-12-20
1062,131,131,2010-12-20
1062,130,130,2010-12-20
1062,129,129,2010-12-20
1062,128,128,2010-12-20
1062,127,127,2010-12-20
1062,126,126,2010-12-20
1062,125,125,2010-12-20
1062,124,124,2010-12-20
1062,123,123,2010-12-20
1062,122,122,2010-12-20
1062,121,121,2010-12-20
1062,120,120,2010-12-20
1062,119,119,2010-12-20
1062,118,118,2010-12-20
1062,117,117,2010-12-20
1062,116,116,2010-12-20
1062,115,115,2010-12-20
1062,114,114,2010-12-20
1062,113,113,2010-12-20
1062,112,112,2010-12-20
In SQL Server, Oracle and PostgreSQL:
WITH q AS
(
SELECT t.*, interval - ROW_NUMBER() OVER (PARTITION BY contactID, date ORDER BY interval) AS sr
FROM mytable t
)
SELECT contactID, date, MIN(interval), MAX(interval)
FROM q
GROUP BY
date, contactID, sr
ORDER BY
date, contactID, sr
Update:
With your test data I get this output:
WITH mytable (providerId, intervalId, date) AS
(
SELECT 1128,108,'2010-12-27' UNION ALL
SELECT 1128,109,'2010-12-27' UNION ALL
SELECT 1128,110,'2010-12-27' UNION ALL
SELECT 1128,111,'2010-12-27' UNION ALL
SELECT 1128,112,'2010-12-27' UNION ALL
SELECT 1128,113,'2010-12-27' UNION ALL
SELECT 1128,114,'2010-12-27' UNION ALL
SELECT 1128,120,'2010-12-27' UNION ALL
SELECT 1128,121,'2010-12-27' UNION ALL
SELECT 1128,122,'2010-12-27' UNION ALL
SELECT 1128,123,'2010-12-27' UNION ALL
SELECT 1128,124,'2010-12-27' UNION ALL
SELECT 1128,125,'2010-12-27'
),
q AS
(
SELECT t.*, intervalId - ROW_NUMBER() OVER (PARTITION BY providerId, date ORDER BY intervalId) AS sr
FROM mytable t
)
SELECT providerId, date, MIN(intervalId), MAX(intervalId)
FROM q
GROUP BY
date, providerId, sr
ORDER BY
date, providerId, sr
1128 2010-12-27 108 114
1128 2010-12-27 120 125
, i. e. exactly what you were after.
Are you sure you are using the query correctly? Are you having duplicates on (providerId, intervalId, date)?
it's probably possible to do this with a SQL query alone, but it will probably be a bit mind-boggling. Basically a subquery to find places where it increments by one, joined to the original dataset, with tons of other logic in there. That's my impression at least.
If I were you,
If this is a one-time deal, don't care about performance and just iterate over it and do the calculation 'manually'.
If this is a production dataset and you need to do this operation on a frequent / automated / performance-intensive setting, then rearrange the original dataset to make this kind of query easier.
Hope one of those options is available to you.