SQL: Calculate duration based on dates and parameters (Change Log)

I have a dataset that works like a ticketing-system change log. I am trying to calculate something like an SLA time across the records: how long did a specific ticket sit with Group 2 before it was either resolved or moved to another group to resolve?
The data looks like this:
| ID | field | value | start | end |
|----|-------|-------|-------|-----|
| 1 | assignment_group | Group 1 | 2022-03-21 08:00:00 | 2022-03-21 08:05:00 |
| 1 | incident_state | Work in Progress | 2022-03-21 08:05:00 | 2022-03-21 08:30:00 |
| 1 | assignment_group | Group 2 | 2022-03-21 08:35:00 | 2022-03-21 08:50:00 |
| 1 | assigned_to | User 1 | 2022-03-21 08:50:00 | 2022-03-21 08:51:00 |
| 1 | incident_state | Work in Progress | 2022-03-21 09:00:00 | 2022-03-21 09:30:00 |
| 1 | incident_state | Resolved | 2022-03-21 09:30:00 | 2022-03-21 09:31:00 |
| 2 | assignment_group | Group 2 | 2022-01-21 11:30:00 | 2022-01-21 11:35:00 |
| 2 | assigned_to | User 1 | 2022-01-21 11:35:00 | 2022-01-21 11:37:00 |
| 2 | incident_state | Work in Progress | 2022-01-21 11:40:00 | 2022-01-21 11:55:00 |
| 2 | assignment_group | Group 3 | 2022-01-21 11:58:00 | 2022-01-21 12:00:00 |
| 2 | assigned_to | User 2 | 2022-01-21 12:05:00 | 2022-01-21 12:06:00 |
| 2 | incident_state | Resolved | 2022-01-21 12:10:00 | 2022-01-21 12:07:00 |
The issue I am having is calculating the duration from the time the ticket was assigned to a specific group to the time the ticket was either resolved by that group or moved to another group to resolve. I am only interested in Group 2. For ticket 1, the ticket sat with Group 2 from 2022-03-21 08:35:00 until Group 2 resolved it at 2022-03-21 09:31:00, so the duration is 56 minutes. For ticket 2, the ticket sat with Group 2 from 2022-01-21 11:30:00 until it was transferred to another group to resolve at 2022-01-21 11:58:00, so the duration is 28 minutes.
My code currently looks like the following. I join two tables to pull in the ticket information and then the ticket state changes (one row every time an action is taken on the ticket), which leaves me with the table above. I suspect I need a LEAD function, but I can't figure out how to get the correct end time for the correct record (when incident_state = Resolved OR assignment_group = another group):
WITH incidents AS (
  SELECT number, sys_id AS SYS_ID_INCIDENT
  FROM tables.ServiceIncidents
),
changes AS (
  SELECT id, start, field, field_value, value, `end`
  FROM tables.IncidentInstances
),
-- one row per change, with its own duration and a per-ticket sequence number
incident_changes AS (
  SELECT *,
         TIMESTAMP_DIFF(changes.`end`, changes.start, MINUTE) AS Duration,
         ROW_NUMBER() OVER (PARTITION BY number ORDER BY start) AS RN
  FROM incidents
  LEFT JOIN changes
    ON incidents.SYS_ID_INCIDENT = changes.id
),
-- keep only tickets that were assigned to Group 2 at some point
IAMtickets AS (
  SELECT i.number, i.SYS_ID_INCIDENT, start, field, value, `end`, Duration, RN
  FROM incident_changes i
  INNER JOIN (
    SELECT DISTINCT number FROM incident_changes WHERE value = 'Group 2'
  ) r
    ON i.number = r.number
),
cte AS (
  SELECT *,
         LEAD(value) OVER (PARTITION BY IAMtickets.number ORDER BY start) AS next_value
  FROM IAMtickets
)
SELECT * FROM cte
I want the output to be something like this:
| ID | Assigned to | Duration | Outcome |
|----|-------------|----------|---------|
| 1 | User 1 | 56 minutes | Resolved |
| 2 | User 1 | 28 minutes | Transferred |
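One possible way to approach this is sketched below with SQLite through Python's sqlite3 module, not the BigQuery dialect used in the question. The idea is to find each Group 2 assignment and then pick the earliest later row that is either a reassignment to a different group or a Resolved state. The table name changes, the column names started_at and ended_at (standing in for start and end), and the rule that a transfer ends at the next group's start while a resolution ends at the Resolved row's end are all assumptions read off the example, not a confirmed schema. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question: (id, field, value, started_at, ended_at).
ROWS = [
    (1, "assignment_group", "Group 1", "2022-03-21 08:00:00", "2022-03-21 08:05:00"),
    (1, "incident_state", "Work in Progress", "2022-03-21 08:05:00", "2022-03-21 08:30:00"),
    (1, "assignment_group", "Group 2", "2022-03-21 08:35:00", "2022-03-21 08:50:00"),
    (1, "assigned_to", "User 1", "2022-03-21 08:50:00", "2022-03-21 08:51:00"),
    (1, "incident_state", "Work in Progress", "2022-03-21 09:00:00", "2022-03-21 09:30:00"),
    (1, "incident_state", "Resolved", "2022-03-21 09:30:00", "2022-03-21 09:31:00"),
    (2, "assignment_group", "Group 2", "2022-01-21 11:30:00", "2022-01-21 11:35:00"),
    (2, "assigned_to", "User 1", "2022-01-21 11:35:00", "2022-01-21 11:37:00"),
    (2, "incident_state", "Work in Progress", "2022-01-21 11:40:00", "2022-01-21 11:55:00"),
    (2, "assignment_group", "Group 3", "2022-01-21 11:58:00", "2022-01-21 12:00:00"),
    (2, "assigned_to", "User 2", "2022-01-21 12:05:00", "2022-01-21 12:06:00"),
    (2, "incident_state", "Resolved", "2022-01-21 12:10:00", "2022-01-21 12:07:00"),
]

QUERY = """
WITH g2 AS (                      -- every moment a ticket landed with Group 2
    SELECT id, started_at AS g2_start
    FROM changes
    WHERE field = 'assignment_group' AND value = 'Group 2'
),
closing AS (                      -- later rows that end Group 2's ownership
    SELECT g.id, g.g2_start,
           CASE WHEN c.field = 'assignment_group'
                THEN c.started_at ELSE c.ended_at END AS g2_end,
           CASE WHEN c.field = 'assignment_group'
                THEN 'Transferred' ELSE 'Resolved' END AS outcome,
           ROW_NUMBER() OVER (PARTITION BY g.id, g.g2_start
                              ORDER BY c.started_at) AS rn
    FROM g2 g
    JOIN changes c
      ON c.id = g.id
     AND c.started_at > g.g2_start
     AND ((c.field = 'assignment_group' AND c.value <> 'Group 2')
       OR (c.field = 'incident_state' AND c.value = 'Resolved'))
)
SELECT cl.id,
       (SELECT a.value            -- first assignee while Group 2 owned the ticket
          FROM changes a
         WHERE a.id = cl.id AND a.field = 'assigned_to'
           AND a.started_at >= cl.g2_start AND a.started_at < cl.g2_end
         ORDER BY a.started_at LIMIT 1) AS assigned_to,
       (strftime('%s', cl.g2_end) - strftime('%s', cl.g2_start)) / 60 AS minutes,
       cl.outcome
FROM closing cl
WHERE cl.rn = 1                   -- keep only the earliest closing event
ORDER BY cl.id
"""


def group2_durations(rows):
    """Load the change log into an in-memory database and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE changes "
        "(id INTEGER, field TEXT, value TEXT, started_at TEXT, ended_at TEXT)"
    )
    con.executemany("INSERT INTO changes VALUES (?, ?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()


result = group2_durations(ROWS)
for row in result:
    print(row)
```

On the sample rows this gives 56 minutes (Resolved) for ticket 1 and 28 minutes (Transferred) for ticket 2. In BigQuery the same shape should carry over with TIMESTAMP_DIFF in place of the strftime arithmetic, though that port is untested here.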

Related

How to count consecutive days in a table where days are duplicated "PostgreSQL"

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
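| id | user_id | day | ground_id | created_at |
|----|---------|-----|-----------|------------|
| 1 | 1 | 1 | 1 | 2023-01-24 10:00:00 |
| 2 | 1 | 2 | 1 | 2023-01-25 10:00:00 |
| 3 | 1 | 3 | 1 | 2023-01-26 10:00:00 |
| 4 | 1 | 4 | 1 | 2023-01-27 10:00:00 |
| 5 | 1 | 5 | 1 | 2023-01-28 10:00:00 |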
The closest I could get is this query, which works only if the user has trained on a single ground per day.
SELECT COUNT(*) AS days_in_row
FROM (
  SELECT row_number() OVER (ORDER BY day) - day AS grp
  FROM logs
  WHERE created_at >= '2023-01-24 00:00:00'
    AND user_id = 1
) x
GROUP BY grp
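logs table:
| id | user_id | day | ground_id | created_at |
|----|---------|-----|-----------|------------|
| 1 | 1 | 1 | 1 | 2023-01-24 10:00:00 |
| 2 | 1 | 2 | 1 | 2023-01-25 10:00:00 |
| 3 | 1 | 3 | 1 | 2023-01-26 10:00:00 |
| 4 | 1 | 4 | 1 | 2023-01-27 10:00:00 |
| 5 | 1 | 5 | 1 | 2023-01-28 10:00:00 |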
This query would return a count of 5 consecutive days which is correct.
However my query doesn't work once a user trains multiple times on different training grounds in one day:
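logs table:
| id | user_id | day | ground_id | created_at |
|----|---------|-----|-----------|------------|
| 1 | 1 | 1 | 1 | 2023-01-24 10:00:00 |
| 2 | 1 | 2 | 1 | 2023-01-25 10:00:00 |
| 3 | 1 | 3 | 1 | 2023-01-26 10:00:00 |
| 4 | 1 | 3 | 2 | 2023-01-26 10:00:00 |
| 5 | 1 | 4 | 1 | 2023-01-27 10:00:00 |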
Then the query above would return a count of 2 consecutive days, which is not what I expect. I would expect 4, because the user has trained on the following days in a row: 1, 2, 3, 4.
Thank you for reading.
Select only the distinct data of interest first:
SELECT min(created_at) AS start, COUNT(*) AS days_in_row
FROM (
  SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
  FROM (
    SELECT DISTINCT day, created_at
    FROM logs
    WHERE created_at >= '2023-01-24 00:00:00'
      AND user_id = 1
  ) t
) x
GROUP BY grp
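The deduplicate-then-number idea can be tried out with a small sketch like the one below. It runs SQLite through Python's sqlite3 module and not PostgreSQL, and it deduplicates on day alone, which is a slight variation on the query above (an assumption that only the day matters for a streak, even if created_at differs between sessions on the same day). Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question, including two sessions on day 3:
# (id, user_id, day, ground_id, created_at).
LOGS = [
    (1, 1, 1, 1, "2023-01-24 10:00:00"),
    (2, 1, 2, 1, "2023-01-25 10:00:00"),
    (3, 1, 3, 1, "2023-01-26 10:00:00"),
    (4, 1, 3, 2, "2023-01-26 10:00:00"),
    (5, 1, 4, 1, "2023-01-27 10:00:00"),
]

QUERY = """
SELECT MIN(day) AS first_day, COUNT(*) AS days_in_row
FROM (
    -- consecutive days share the same (day - row_number) value
    SELECT day, day - ROW_NUMBER() OVER (ORDER BY day) AS grp
    FROM (
        -- collapse multiple sessions on one day into a single row
        SELECT DISTINCT day
        FROM logs
        WHERE user_id = ? AND created_at >= ?
    ) AS d
) AS x
GROUP BY grp
ORDER BY first_day
"""


def streaks(rows, user_id, since):
    """Return (first_day, days_in_row) for each run of consecutive days."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE logs "
        "(id INTEGER, user_id INTEGER, day INTEGER, ground_id INTEGER, created_at TEXT)"
    )
    con.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?)", rows)
    return con.execute(QUERY, (user_id, since)).fetchall()


runs = streaks(LOGS, 1, "2023-01-24 00:00:00")
longest = max(length for _, length in runs)
print(runs, longest)
```

Taking the maximum of days_in_row over the groups gives the longest streak, which is 4 for the sample with the duplicated day.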

Elapsed time between two DateTime values

Say you have a room with an indefinite number of light bulbs, and these are turning randomly on and off. Each time a bulb is turned on and then off, a record is entered in a table with TurnedOn and TurnedOff values.
What should the query look like if I am interested in how long (HH.mm.ss) it was possible to see in the room between two DateTime values?
e.g.
| LightBulbId | TurnedOn | TurnedOff |
|-------------|----------|-----------|
| 1 | 2022-10-01 06:00:00 | 2022-10-01 11:00:00 |
| 2 | 2022-10-01 07:00:00 | 2022-10-01 10:00:00 |
| 3 | 2022-10-01 08:00:00 | 2022-10-01 09:00:00 |
| 4 | 2022-10-01 12:00:00 | 2022-10-01 13:00:00 |
| 5 | 2022-10-01 14:00:00 | 2022-10-01 15:00:00 |
So for the example above, in the period between 2022-10-01 06:00:00 and 2022-10-01 15:00:00, 9 hours passed and it was possible to see in the room for 7 of them.
The bulb can be on for more than 24 hours.
One hour increments are put in the example for simplicity.
If at least one light bulb is on, you can see in the room.
When a light is turned on, you can see in the room from that moment on, and when a light is turned off, you cannot see from that moment on :-)
Another example with the same logic:
Say you have a machine that more than one person can work on at the same time. StartTime and EndTime are added to the table each time a person starts and then stops working on the machine. I am interested in the machine's total work time for a given time period.
select sign(on_off) as on_off
      ,sum(hour_diff) as hours
from (
    select *
          ,datediff(second, time, lead(time) over(order by time)) / 3600.0 as hour_diff
          ,sum(case when status = 'TurnedOn' then 1 else -1 end) over(order by time) as on_off
    from t
    unpivot (time for status in (TurnedOn, TurnedOff)) up
) t
group by sign(on_off)
| on_off | hours |
|--------|-------|
| 0 | 2.000000 |
| 1 | 7.000000 |
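The same running-count technique can be sketched outside SQL Server. The example below uses SQLite through Python's sqlite3 module, replaces UNPIVOT with a UNION ALL of +1 and -1 events, and formats the totals in Python. The table name lights, the snake_case column names, and the hms helper are illustrative assumptions. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question: (bulb_id, turned_on, turned_off).
LIGHTS = [
    (1, "2022-10-01 06:00:00", "2022-10-01 11:00:00"),
    (2, "2022-10-01 07:00:00", "2022-10-01 10:00:00"),
    (3, "2022-10-01 08:00:00", "2022-10-01 09:00:00"),
    (4, "2022-10-01 12:00:00", "2022-10-01 13:00:00"),
    (5, "2022-10-01 14:00:00", "2022-10-01 15:00:00"),
]

QUERY = """
WITH events AS (                  -- one row per switch: +1 for on, -1 for off
    SELECT turned_on AS t, 1 AS delta FROM lights
    UNION ALL
    SELECT turned_off, -1 FROM lights
),
timeline AS (
    SELECT t,
           SUM(delta) OVER (ORDER BY t) AS bulbs_on,   -- bulbs lit from t onward
           strftime('%s', LEAD(t) OVER (ORDER BY t))
             - strftime('%s', t) AS secs               -- seconds until next event
    FROM events
)
SELECT bulbs_on > 0 AS lit, SUM(secs) AS seconds
FROM timeline
WHERE secs IS NOT NULL            -- the final event has no following interval
GROUP BY bulbs_on > 0
ORDER BY lit
"""


def hms(seconds):
    """Format a number of seconds as HH:MM:SS; hours may exceed 24."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def lit_and_dark_seconds(rows):
    """Return {0: dark_seconds, 1: lit_seconds} between first and last event."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE lights (bulb_id INTEGER, turned_on TEXT, turned_off TEXT)")
    con.executemany("INSERT INTO lights VALUES (?, ?, ?)", rows)
    return dict(con.execute(QUERY).fetchall())


totals = lit_and_dark_seconds(LIGHTS)
print({state: hms(seconds) for state, seconds in totals.items()})
```

Like the answer above, this measures from the first event to the last one; restricting it to an arbitrary reporting window would need the event times clipped to that window first.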
To expand on the correct answer above, this will give the results in the desired hh:mm:ss format:
select
    sign(on_off) as on_off,
    -- VARCHAR(2) assumes each total stays below 100 hours
    RIGHT('0' + CONVERT(VARCHAR(2), SUM(secs_diff) / 3600), 2) + 'h'
    + RIGHT('0' + CONVERT(VARCHAR(2), SUM(secs_diff) / 60 % 60), 2) + 'm'
    + RIGHT('0' + CONVERT(VARCHAR(2), SUM(secs_diff) % 60), 2) + 's' as duration
from (
    select *
          ,datediff(second, time, lead(time) over(order by time)) as secs_diff
          ,sum(case when status = 'TurnedOn' then 1 else -1 end) over(order by time) as on_off
    from #Lights
    unpivot (time for status in (TurnedOn, TurnedOff)) up
) t
group by sign(on_off)

Rolling Sum Calculation Based on 2 Date Fields

Giving up after a few hours of failed attempts.
My data is in the following format. The create_date can never be later than the event_date.
I need to calculate, on a rolling n-day basis (let's say 3), the sum of units where the create_date and event_date both fall within the same 3-day window. The data is illustrative; each event_date can have more than 500 different create_dates associated with it, and the number isn't constant. Some event_dates may be missing.
So let's say for 2022-02-03, I only want to sum units where both the event_date and create_date values were between 2022-02-01 and 2022-02-03.
| event_date | create_date | rowid | units |
|------------|-------------|-------|-------|
| 2022-02-01 | 2022-01-20 | 1 | 100 |
| 2022-02-01 | 2022-02-01 | 2 | 100 |
| 2022-02-02 | 2022-01-21 | 3 | 100 |
| 2022-02-02 | 2022-01-23 | 4 | 100 |
| 2022-02-02 | 2022-01-31 | 5 | 100 |
| 2022-02-02 | 2022-02-02 | 6 | 100 |
| 2022-02-03 | 2022-01-30 | 7 | 100 |
| 2022-02-03 | 2022-02-01 | 8 | 100 |
| 2022-02-03 | 2022-02-03 | 9 | 100 |
| 2022-02-05 | 2022-02-01 | 10 | 100 |
| 2022-02-05 | 2022-02-03 | 11 | 100 |
The output I need is below. In brackets I added the rows that should be included in the calculation for each date, but my result only needs the numerical sum. I tried calculating using either date, but neither returned the results I needed.
| date | units |
|------|-------|
| 2022-02-01 | 100 (Row 2) |
| 2022-02-02 | 300 (Row 2,5,6) |
| 2022-02-03 | 400 (Row 2,6,8,9) |
| 2022-02-04 | 200 (Row 6,9) |
| 2022-02-05 | 200 (Row 9,11) |
In Python I solved the above with a function that loops through the dates, filtering a dataframe for each one, but I am struggling to do the same in SQL.
Thank you!
Consider the approach below:
with events_dates as (
  select date
  from (
    select min(event_date) min_date, max(event_date) max_date
    from your_table
  ), unnest(generate_date_array(min_date, max_date)) date
)
select date, sum(units) as units, string_agg(cast(rowid as string)) rows_included
from events_dates
left join your_table
  on create_date between date - 2 and date
  and event_date between date - 2 and date
group by date
if applied to sample data in your question - output is
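As a rough check, the same date-spine join can be reproduced with SQLite through Python's sqlite3 module. This is a sketch and not the BigQuery query itself: a recursive CTE stands in for generate_date_array, date(d, '-2 day') stands in for date - 2, and the rows_included column is left out. The table name your_table is kept from the answer above.

```python
import sqlite3

# Sample rows from the question: (event_date, create_date, rowid, units).
ROWS = [
    ("2022-02-01", "2022-01-20", 1, 100),
    ("2022-02-01", "2022-02-01", 2, 100),
    ("2022-02-02", "2022-01-21", 3, 100),
    ("2022-02-02", "2022-01-23", 4, 100),
    ("2022-02-02", "2022-01-31", 5, 100),
    ("2022-02-02", "2022-02-02", 6, 100),
    ("2022-02-03", "2022-01-30", 7, 100),
    ("2022-02-03", "2022-02-01", 8, 100),
    ("2022-02-03", "2022-02-03", 9, 100),
    ("2022-02-05", "2022-02-01", 10, 100),
    ("2022-02-05", "2022-02-03", 11, 100),
]

QUERY = """
WITH RECURSIVE spine(d) AS (      -- every calendar day, so gaps like 02-04 appear
    SELECT MIN(event_date) FROM your_table
    UNION ALL
    SELECT date(d, '+1 day') FROM spine
    WHERE d < (SELECT MAX(event_date) FROM your_table)
)
SELECT s.d AS date, COALESCE(SUM(t.units), 0) AS units
FROM spine s
LEFT JOIN your_table t            -- both dates must fall in the 3-day window
  ON t.event_date  BETWEEN date(s.d, '-2 day') AND s.d
 AND t.create_date BETWEEN date(s.d, '-2 day') AND s.d
GROUP BY s.d
ORDER BY s.d
"""


def rolling_units(rows):
    """Return (date, units) for each day between the first and last event_date."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE your_table "
        "(event_date TEXT, create_date TEXT, rowid_ INTEGER, units INTEGER)"
    )
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()


result = rolling_units(ROWS)
for row in result:
    print(row)
```

For 2022-02-03 the four qualifying rows (2, 6, 8, 9) sum to 400, and 2022-02-04 still appears even though it has no event_date of its own, because the spine generates every day in the range.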

Count consecutive recurring values

I am struggling to find any info on this on the internet after a couple of hours of searching, trial, error and failure. We have the following table structure:
| Name | EventDateTime | Mark |
|------|---------------|------|
| Dave | 2021-03-24 09:00:00 | Present |
| Dave | 2021-03-24 14:00:00 | Absent |
| Dave | 2021-03-25 09:00:00 | Absent |
| Dave | 2021-03-26 09:00:00 | Absent |
| Dave | 2021-03-27 09:00:00 | Present |
| Dave | 2021-03-27 14:00:00 | Absent |
| Dave | 2021-03-28 09:00:00 | Absent |
| Dave | 2021-03-29 10:00:00 | Absent |
| Dave | 2021-03-30 13:00:00 | Absent |
| Jane | 2021-03-30 13:00:00 | Absent |
These are basically attendance registers for people at events. We need to pull a report to see who we have not had contact from for more than x consecutive days. Consecutive here means the days on which they have events in the data, not consecutive calendar days. Also, if there is a Present on a day where they were also Absent, the count needs to start again from the next day they were absent.
The first issue I've got is getting the distinct dates where there are only absences; the second is getting the number of consecutive days of absences. I've done the second in MySQL with variables but struggled to migrate it to PostgreSQL, where the reporting is done.
An example of the output I'd want is:
| Name | EventDateTime | Mark | ConsecCount |
|------|---------------|------|-------------|
| Dave | 2021-03-24 09:00:00 | Present | 0 |
| Dave | 2021-03-24 14:00:00 | Absent | 0 |
| Dave | 2021-03-25 09:00:00 | Absent | 1 |
| Dave | 2021-03-26 09:00:00 | Absent | 2 |
| Dave | 2021-03-27 09:00:00 | Present | 0 |
| Dave | 2021-03-27 14:00:00 | Absent | 0 |
| Dave | 2021-03-28 09:00:00 | Absent | 1 |
| Dave | 2021-03-29 10:00:00 | Absent | 2 |
| Dave | 2021-03-30 13:00:00 | Absent | 3 |
| Jane | 2021-03-30 13:00:00 | Absent | 0 |
This table currently has 639931 records, generated since 1st October, and it will continue to grow at this rate.
Any help, or advice on where to start, would be great.
This can be achieved using window functions as follows:
WITH with_row_numbers AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime) AS this_row_number,
    -- row number of each Present row, 0 for every other row
    (CASE WHEN Mark = 'Present'
          THEN ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime)
          ELSE 0 END) AS row_number_if_present
  FROM events
)
SELECT
  Name,
  EventDateTime,
  Mark,
  -- distance from the most recent Present row, minus the same-day grace row
  GREATEST(0, this_row_number
              - MAX(row_number_if_present) OVER (PARTITION BY Name ORDER BY EventDateTime)
              - 1) AS ConsecCount
FROM with_row_numbers
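To see the running-maximum trick at work on the sample data, here is a small sketch using SQLite through Python's sqlite3 module. It is an adaptation and not the PostgreSQL query itself: SQLite has no GREATEST, so the two-argument scalar MAX(0, ...) is used instead, and the lowercase column names and the extra with_last_present step are illustrative choices. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question: (name, event_time, mark).
EVENTS = [
    ("Dave", "2021-03-24 09:00:00", "Present"),
    ("Dave", "2021-03-24 14:00:00", "Absent"),
    ("Dave", "2021-03-25 09:00:00", "Absent"),
    ("Dave", "2021-03-26 09:00:00", "Absent"),
    ("Dave", "2021-03-27 09:00:00", "Present"),
    ("Dave", "2021-03-27 14:00:00", "Absent"),
    ("Dave", "2021-03-28 09:00:00", "Absent"),
    ("Dave", "2021-03-29 10:00:00", "Absent"),
    ("Dave", "2021-03-30 13:00:00", "Absent"),
    ("Jane", "2021-03-30 13:00:00", "Absent"),
]

QUERY = """
WITH numbered AS (
    SELECT name, event_time, mark,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY event_time) AS rn
    FROM events
),
with_last_present AS (
    -- running maximum: row number of the latest Present seen so far (0 if none)
    SELECT *,
           MAX(CASE WHEN mark = 'Present' THEN rn ELSE 0 END)
               OVER (PARTITION BY name ORDER BY event_time) AS last_present_rn
    FROM numbered
)
SELECT name, event_time, mark,
       MAX(0, rn - last_present_rn - 1) AS consec_count
FROM with_last_present
ORDER BY name, event_time
"""


def consecutive_absences(rows):
    """Return (name, event_time, mark, consec_count) for every event row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (name TEXT, event_time TEXT, mark TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()


result = consecutive_absences(EVENTS)
counts = [row[3] for row in result]
print(counts)
```

The counts come out as 0, 0, 1, 2, 0, 0, 1, 2, 3 for Dave and 0 for Jane, matching the desired ConsecCount column. Note that this counts event rows since the last Present, which coincides with days only when there is one row per day after it.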
Original answer with LATERAL join:
WITH with_row_numbers AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime) AS row_number
  FROM events e
)
SELECT
  t1.Name,
  t1.EventDateTime,
  t1.Mark,
  GREATEST(0, t1.row_number - COALESCE(sub.prev_present_row_number, 0) - 1) AS ConsecCount
FROM with_row_numbers AS t1
CROSS JOIN LATERAL (
  SELECT MAX(row_number) AS prev_present_row_number
  FROM with_row_numbers t2
  WHERE t2.Name = t1.Name
    AND t2.EventDateTime <= t1.EventDateTime
    AND t2.Mark = 'Present'
) sub

How to match the closest date in sql (redshift)?

For example, my table A is work_schedule:
| Employee_id | Week_start | Work_schedule |
|-------------|------------|---------------|
| A | 2021-01-03 | Day shift |
| A | 2021-01-10 | Day shift |
| A | 2021-01-17 | Night shift |
| B | 2020-12-27 | Day shift |
| B | 2021-01-03 | Day shift |
Table B is employee_history:
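| Employee_id | Calendar_date | Tenure |
|-------------|---------------|--------|
| A | 2020-12-20 | 0 |
| A | 2020-12-21 | 1 |
| A | --- | 2-30 |
| A | 2021-01-19 | 31 |
| A | 2021-01-20 | 32 |
| B | 2020-12-15 | 0 |
| B | 2020-12-16 | 1 |
| B | --- | |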
Employees can choose a work schedule 2 weeks ahead, and I want to fetch the tenure at the snapshot date (2 weeks before week_start). For employee A, the date 14 days earlier matches a calendar_date. But employee B started less than 2 weeks before the first week_start, so I want the closest date to the 2-week date instead.
The ideal output is:
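| Employee_id | Week_start | Work_schedule | Calendar_date (to calculate tenure) | Tenure (at 2 weeks ago) |
|-------------|------------|---------------|-------------------------------------|-------------------------|
| A | 2021-01-03 | Day shift | 2020-12-20 | 0 |
| A | 2021-01-10 | Day shift | 2020-12-27 | 7 |
| A | 2021-01-17 | Night shift | 2021-01-03 | 14 |
| B | 2020-12-27 | Day shift | 2020-12-15 | 0 |
| B | 2021-01-03 | Day shift | 2020-12-20 | 5 |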
For one record, to fetch the closest date, I can use:
order by abs(datediff(day, (week_start - 14), calendar_date)) asc
limit 1
For example, this fetches '2020-12-15' as the closest date to '2020-12-13':
select employee_id, calendar_date, tenure
from employee_history h
where employee_id = 'B'
order by abs(datediff(day, date '2020-12-27' - 14, calendar_date)) asc
limit 1
But I have more than one employee in this situation. How can I get the closest calendar_date for all those that have no exact match at 2 weeks?
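One common way to extend an ORDER BY ... LIMIT 1 lookup to every row is to rank the candidates per (employee_id, week_start) with ROW_NUMBER and keep rank 1. The sketch below tries that idea with SQLite through Python's sqlite3 module, not Redshift. The history rows are hypothetical (one row per day, tenure counted in days since the first day) because the question elides most of them, and the tie-break on the earlier calendar_date is an assumption. In Redshift the ORDER BY expression would presumably be the abs(datediff(...)) from the question. Window functions need SQLite 3.25 or newer.

```python
import sqlite3
from datetime import date, timedelta

# work_schedule rows from the question: (employee_id, week_start, work_schedule).
SCHEDULE = [
    ("A", "2021-01-03", "Day shift"),
    ("A", "2021-01-10", "Day shift"),
    ("A", "2021-01-17", "Night shift"),
    ("B", "2020-12-27", "Day shift"),
    ("B", "2021-01-03", "Day shift"),
]


def daily_history(employee_id, first_day, last_day):
    """Hypothetical history: one row per day, tenure = days since first_day."""
    span = (last_day - first_day).days
    return [(employee_id, (first_day + timedelta(days=n)).isoformat(), n)
            for n in range(span + 1)]


HISTORY = (daily_history("A", date(2020, 12, 20), date(2021, 1, 20))
           + daily_history("B", date(2020, 12, 15), date(2021, 1, 20)))

QUERY = """
SELECT employee_id, week_start, work_schedule, calendar_date, tenure
FROM (
    SELECT w.employee_id, w.week_start, w.work_schedule,
           h.calendar_date, h.tenure,
           ROW_NUMBER() OVER (
               PARTITION BY w.employee_id, w.week_start
               -- distance in days from the date 14 days before week_start;
               -- ties go to the earlier calendar_date (an assumption)
               ORDER BY ABS(julianday(h.calendar_date)
                            - julianday(w.week_start, '-14 days')),
                        h.calendar_date
           ) AS rn
    FROM work_schedule w
    JOIN employee_history h ON h.employee_id = w.employee_id
) ranked
WHERE rn = 1
ORDER BY employee_id, week_start
"""


def closest_snapshots(schedule, history):
    """Return each schedule row with its closest history date and tenure."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE work_schedule (employee_id TEXT, week_start TEXT, work_schedule TEXT)"
    )
    con.execute(
        "CREATE TABLE employee_history (employee_id TEXT, calendar_date TEXT, tenure INTEGER)"
    )
    con.executemany("INSERT INTO work_schedule VALUES (?, ?, ?)", schedule)
    con.executemany("INSERT INTO employee_history VALUES (?, ?, ?)", history)
    return con.execute(QUERY).fetchall()


result = closest_snapshots(SCHEDULE, HISTORY)
for row in result:
    print(row)
```

The join pairs every schedule row with every history row for the same employee, so on a large history table it may be worth restricting the join to a date range around the target before ranking.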