SQL query to find the average time differences between two statuses

I am trying to find the time between status changes for tickets. The statuses are A,B,C,D,E. I need to identify where the bottlenecks are in the system. The table looks something like this:
ticket_no  created_at  current_status  next_status
---------  ----------  --------------  -----------
1          12/2/2022   A               B
1          12/3/2022   B               C
1          12/3/2022   C               B
1          12/4/2022   B               C
1          12/4/2022   C               E
2          12/4/2022   A               C
2          12/5/2022   C               D
2          12/7/2022   D               E
As you can see for ticket 1, it cycled between statuses B and C before finally ending at E. I want to calculate the average time tickets take to move between specific statuses (say A->C, C->E). It’s a bit confusing because tickets can return to previous statuses and tickets don’t need to move through every status. There is an order to the statuses but you can return to a previous state.
Any ideas?
I’ve tried a bunch of things, like lagging (only looks at previous/next), or even pivoting with case statements and subtracting but it doesn't seem to work.
Again the ask is to find the time spent (on average) to go between 2 specific statuses, such as A->C or C->E
Here's my query so far. The idea is to pivot things and just subtract, but I'm really not sure this is gonna be valid
with pv_times as (
select ticket_no,
max(case when current_status='A' and next_status='B' then created_at else null end) as ab_time,
max(case when current_status='A' and next_status='C' then created_at else null end) as ac_time
FROM statuses
GROUP BY 1
)
select * from pv_times
-- subtract times to find diff...but is this even valid?

time spent to go between 2 specific statuses
Enumerate all such statuses.
This is the lower diagonal triangle of a 5 × 5 matrix.
Then do a JOIN (.merge) to aggregate all observed
transitions against that vector of possibilities,
.count()'ing the number of them we observed.
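
Not the asker's or answerer's actual code, just a minimal Python sketch of one way to read "time between two statuses", using the sample rows from the question. It rests on assumptions that are mine: a ticket "enters" a status at the created_at of the row whose next_status is that status; if it never entered the source status (the initial status), the earliest row leaving it is used instead; and the clock stops at the first arrival at the target status on or after that point. The name avg_days_between is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (ticket_no, created_at, current_status, next_status)
ROWS = [
    (1, "12/2/2022", "A", "B"),
    (1, "12/3/2022", "B", "C"),
    (1, "12/3/2022", "C", "B"),
    (1, "12/4/2022", "B", "C"),
    (1, "12/4/2022", "C", "E"),
    (2, "12/4/2022", "A", "C"),
    (2, "12/5/2022", "C", "D"),
    (2, "12/7/2022", "D", "E"),
]


def avg_days_between(rows, src, dst):
    """Average days from first reaching `src` to first reaching `dst` (assumed definition)."""
    by_ticket = defaultdict(list)
    for ticket, created, cur, nxt in rows:
        by_ticket[ticket].append((datetime.strptime(created, "%m/%d/%Y"), cur, nxt))

    diffs = []
    for events in by_ticket.values():
        entered = [t for t, cur, nxt in events if nxt == src]
        left = [t for t, cur, nxt in events if cur == src]
        # Assumption: fall back to the first time the ticket left `src`
        # when no row shows it entering `src` (e.g. the initial status).
        if entered:
            start = min(entered)
        elif left:
            start = min(left)
        else:
            continue  # this ticket never touched `src`
        arrivals = [t for t, cur, nxt in events if nxt == dst and t >= start]
        if arrivals:
            diffs.append((min(arrivals) - start).total_seconds() / 86400)
    return sum(diffs) / len(diffs) if diffs else None


print(avg_days_between(ROWS, "A", "C"))
print(avg_days_between(ROWS, "C", "E"))
```

Whether cycling tickets (B -> C -> B -> C) should count from the first or the last arrival is a business decision; swapping min for max in either place changes that rule.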


How do I stop my query from pulling duplicates?

Yes, I know this seems simple:
SELECT DISTINCT(...)
Except, it apparently isn't
Here is my actual Query:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS,
IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune,
IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical,
IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther,
IIf([DecReason]=7,1,0) AS YesAlready
FROM
EmployeeInformation
INNER JOIN (CompletedTrainings
LEFT JOIN DeclinationReasons ON CompletedTrainings.DecReason = DeclinationReasons.ReasonID)
ON EmployeeInformation.ID = CompletedTrainings.Employee
GROUP BY
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No"),
IIf([DecReason]=1,1,0),
IIf([DecReason]=2,1,0),
IIf([DecReason]=3,1,0),
IIf([DecReason]=4,1,0),
IIf([DecReason]=5,1,0),
IIf([DecReason]=6,1,0),
IIf([DecReason]=7,1,0)
HAVING
((((EmployeeInformation.Active) Like -1)
AND ((CompletedTrainings.DecShotDate + 365 >= DATE())
OR (CompletedTrainings.DecShotDate IS NULL))));
This joins a few tables (obviously) in order to get a number of records. The problem is that if someone appears more than once in the table, with a NULL in the date field on one row and a date on another, the query pulls both the NULL row and the date row, or pulls multiple NULL rows. It might also pull multiple dates, but no such cases are present at the moment.
I need the NULLs; they are actual data in this particular case. But if someone has both a date and a NULL, I need to pull only the newest record. I thought I could add MAX(RecordID) from the table, but that didn't change the results of the query either.
That code:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
MAX(CompletedTrainings.RecordID),
CompletedTrainings.DecShotDate
...
It had the same issue: duplicated EmployeeInformation.ID values with different DecShotDate values.
Currently it returns:
ID  Active  DecShotDate  etc. x a bunch
--  ------  -----------  --------------
1   -1      date date    whatever goes
2   -1                   in these
2   -1      date date    columns
These results feed a report that determines the total number of employees who fit the report's criteria. The NULLs in DecShotDate are needed as they show people who did not refuse to get a flu vaccine in the current year, while the dates are people who did refuse.
I have come up with one simple solution: I could add a column to the CompletedTrainings table that contains a date or other value, and add that to the HAVING clause. This might be the right solution as this is a yearly training questionnaire that employees have to fill out. But I am asking for advice before doing this.
Am I right in thinking I need to add a column to filter by so that older data isn't being pulled, or should I be able to do this by pulling recordID, and did I just bork that part of the query up?
Edited to add raw table views:
EmployeeInformation Table:
ID  Last  First   empID  Active  Termdate  DoH   Title  PT/FT/PD  PI
--  ----  ------  -----  ------  --------  ----  -----  --------  --
1   Doe   Jane    982    -1                date  Sr     PD        X
2   Roe   John    278    0       date      date  Jr     PD        X
3   Moe   Larry   1232   -1                date  Sr     FT        X
4   Zoe   Debbie  1424   -1                date  Sr     PT        X
DeclinationReasons Table:
ReasonID  Reason
--------  --------------
1         Allergy
2         Already got it
3         Illness
CompletedTrainings Table:
RecordID  Employee  Training  ...  DecShotdate  DecShotLocation  DecShotReason  DecExp
--------  --------  --------  ---  -----------  ---------------  -------------  ------
1         1         4              date         location         2              text
2         1         4
3         2         4
4         3         4              date         location         3              text
5         3         4              date         location         1              text
6         4         4
After some serious soul searching, I decided to use another column and filter by that.
In the end my query looks like this:
SELECT *
FROM (
(
SELECT RecordID, DecShotDate, DecShotLocation, DecReason, DecExplanation, Employee,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS, IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune, IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical, IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther, IIf([DecReason]=7,1,0) AS YesAlready
FROM CompletedTrainings WHERE (CompletedDate > DATE() - 365 ) AND (Training = 69)) AS T1
LEFT JOIN
(
SELECT ID, Active FROM EmployeeInformation) AS T2 ON T1.Employee = T2.ID)
LEFT JOIN
(
SELECT Reason, ReasonID FROM DeclinationReasons) AS T3 ON T1.DecReason = T3.ReasonID;
This may not have been the best solution, but it did exactly what I needed, which is to get the information by latest entry into the database.
Previously I had tried to use MAX(), DISTINCT(), etc. but always had a problem of multiple records being retrieved. In this case, I intentionally SELECT the most recent records first, then join them to the results of the next query, and so on. Until I have all the required data for my report.
I write this in hopes someone else finds it useful. Or even better if someone tells me why this is wrong, so as to improve my own skills.
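
For anyone comparing approaches: the query above filters by a date window, but the original ask ("only the newest record per employee") can also be illustrated directly. Below is a small Python sketch of that idea, not the query I ended up using. It assumes a higher RecordID always means a newer record, and the function name latest_per_employee is made up; the rows mirror the CompletedTrainings sample, with None standing in for NULL.

```python
# Rows mirror the CompletedTrainings sample: (RecordID, Employee, DecShotDate).
# None stands in for NULL; "date" is the placeholder used in the question.
ROWS = [
    (1, 1, "date"),
    (2, 1, None),
    (3, 2, None),
    (4, 3, "date"),
    (5, 3, "date"),
    (6, 4, None),
]


def latest_per_employee(rows):
    """Keep one row per employee: the one with the highest RecordID.

    Assumption: RecordID increases over time, so the highest is the newest.
    """
    latest = {}
    for row in rows:
        record_id, employee = row[0], row[1]
        if employee not in latest or record_id > latest[employee][0]:
            latest[employee] = row
    # Return rows ordered by employee ID for stable output.
    return [latest[employee] for employee in sorted(latest)]


for row in latest_per_employee(ROWS):
    print(row)
```

In SQL the same idea is usually a join against a subquery that selects MAX(RecordID) grouped by Employee only. Grouping by the date columns as well, as in my first attempts, is what kept the duplicates.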

Finding the exact overlapping time

with tickets as (
select o.SSTID, o.open_Id, o.Createddatetime openTime, c.Createddatetime closeTime
from dbo.Close_ticket c
inner join dbo.Openticket o ON o.SSTID = c.SSTID and c.Open_ID=o.open_id
)
select t1.SSTID,
SUM(isnull(datediff(hour
, case when t1.openTime > t2.openTime then t1.openTime else t2.openTime end
, case when t1.closeTime > t2.closeTime then t2.closeTime else t1.closeTime end),0)) as [OverLappingtime]
from tickets t1
left join tickets t2 on t1.SSTID = t2.SSTID
and t1.openTime < t2.closeTime and t2.openTime < t1.closeTime
and t1.open_id < t2.open_id
group by t1.SSTID
This is my code, where each ticket is compared to every other ticket to find the total overlapping time. But if I create more tickets, the total time exceeds 24 hours even when all the tickets were created on the same day. How can I find the exact overlapping time? If we look at the first three tickets, the second and third tickets were opened and closed within the opening and closing time of the first ticket.
I need the exact overlapping time.
This is my Openticket table.
[Open_ID,SSTID,Createddatetime]
- 1,1,2020-04-27 06:40:32.337
- 2,1,2020-04-27 12:40:32.337
- 3,1,2020-04-27 14:40:32.337
- 4,1,2020-04-27 15:40:32.337
- 5,1,2020-04-27 18:40:32.337
This is my Close_ticket table.
[Close_id,open_id,SSTID,Createddatetime]
- 1,1,1,2020-04-27 20:40:32.337
- 2,2,1,2020-04-27 15:40:32.337
- 3,3,1,2020-04-27 16:40:32.337
- 4,4,1,2020-04-27 17:40:32.337
- 5,5,1,2020-04-27 21:40:32.337
You keep saying "the logic I've used so far is the one I mentioned" but at no point have you actually mentioned this logic in any useful form so that anyone can understand what it is you are doing: all you are doing is stating numbers with no indication on how you calculated these numbers.
Please provide a step by step guide to show how you calculated an overlap figure of 4 hours for the first 3 tickets.
For example, taking your data but moving the start/end times to the hour (rather than 40:32.337) for the sake of simplicity, tickets 1, 2 and 3 run 06:00-20:00, 12:00-15:00 and 14:00-16:00.
Possible overlap calculations:
2 overlaps 1 by 3 hours => overlap is 3
3 overlaps 1 by 2 hours => overlap is 2
You want to calculate overlap of both 2 & 3 compared to 1: 3 + 2 = 5
You only want the overlap when all 3 tickets overlap: 1
You don't want to double count any overlap: 2 overlaps 1 by 3 hours, 3 overlaps 1 by 2 hours, 2 & 3 overlap each other by 1 hour (double count) => 3 + 2 - 1 = 4
So which of these possible calculations are you using, or are you using completely different logic and, if so, what is that logic?
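
If the "no double counting" reading is the intended one, it amounts to measuring the union of all pairwise overlaps. Here is a minimal Python sketch of that interpretation only, using the question's data rounded to the hour. The function name overlapped_hours and the helper h are made up, and this is not a confirmed statement of what the asker wants.

```python
from datetime import datetime
from itertools import combinations


def h(hour):
    """Helper: a datetime on 2020-04-27 at the given hour (times rounded to the hour)."""
    return datetime(2020, 4, 27, hour)


# (open, close) per ticket, taken from Openticket / Close_ticket for SSTID 1
TICKETS = [
    (h(6), h(20)),   # ticket 1
    (h(12), h(15)),  # ticket 2
    (h(14), h(16)),  # ticket 3
    (h(15), h(17)),  # ticket 4
    (h(18), h(21)),  # ticket 5
]


def overlapped_hours(tickets):
    """Hours during which at least two tickets are open, each hour counted once."""
    # Step 1: intersect every pair of tickets.
    pieces = []
    for (o1, c1), (o2, c2) in combinations(tickets, 2):
        start, end = max(o1, o2), min(c1, c2)
        if start < end:
            pieces.append((start, end))

    # Step 2: merge the intersections so shared time is not double counted.
    pieces.sort()
    total_seconds = 0.0
    cur_start = cur_end = None
    for start, end in pieces:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total_seconds += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total_seconds += (cur_end - cur_start).total_seconds()
    return total_seconds / 3600


print(overlapped_hours(TICKETS[:3]))  # first three tickets
print(overlapped_hours(TICKETS))      # all five tickets
```

Because the result is a union of time ranges, it can never exceed the span of the day, which is the symptom described in the question.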

Return 0 in Sheets Query if there is no data

I need some advice in google query language.
I want to count rows depending on date and a condition. But if the condition is not met, it should return 0.
What I'm trying to achieve:
Date Starts
05.09.2018 0
06.09.2018 3
07.09.2018 0
What I get:
Date Starts
06.09.2018 3
The query looks like =Query(Test!$A2:P; "select P, count(B) where (B contains 'starts') group by P label count(B) 'Starts'")
P contains ascending datevalues and B an event (like start in this case).
How can I force output a 0 for the dates with no entry containing "start"?
The main point is to get all needed data in one table in ascending order. But this is only working, if every day has an entry. If there is no entry for a day, the results for "start" do not match the datevalue in column A. 3 in column D would be in the first row of the table then.
I need it like this:
A B C D
Date Logins Sessions Starts
05.09.2018 1 2 0
06.09.2018 3 4 3
07.09.2018 4 5 0
Maybe this is easy to fix, but I don't see it.
Thanks in advance!
You can do some pre-processing before the query. For example, check whether column B contains 'start' with regexmatch and use a double unary (--) to force the boolean values into 1's and 0's. Then use query to sum.
=Query(Arrayformula({--regexmatch(Test!$B2:B; "start")\ Test!$A2:P}); "select Col17, sum(Col1) where Col17 is not null group by Col17 label sum(Col1) 'Starts'")
Change ranges to suit.
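
To see why summing a 0/1 flag keeps the empty dates while a filtered count drops them, here is the same idea as a small Python sketch. The rows are made up for illustration (they are not from the asker's sheet) and the function name is hypothetical.

```python
# Made-up rows standing in for the sheet: (date from column P, event from column B)
ROWS = [
    ("05.09.2018", "login"),
    ("06.09.2018", "starts"),
    ("06.09.2018", "starts"),
    ("06.09.2018", "login"),
    ("06.09.2018", "starts"),
    ("07.09.2018", "session"),
]


def count_starts_per_date(rows):
    """Sum a 0/1 'contains start' flag per date, so dates without a match stay at 0."""
    totals = {}
    for date, event in rows:
        flag = 1 if "start" in event else 0  # same role as --regexmatch(...)
        totals[date] = totals.get(date, 0) + flag
    return totals


for date, starts in count_starts_per_date(ROWS).items():
    print(date, starts)
```

Filtering first (the where clause in the original query) removes the rows for 05.09 and 07.09 before grouping, so those dates never get a group at all.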

Efficiently identify all FK items with n>3 dates within any 8 week period from a SQL table?

I have a ~400,000 row table containing the dates at which a collection of ~30,000 people had appointments. Each row has the patient ID number and an appointment date. I want to efficiently select people who had at least 4 appointments in an 8 week span. Ideally, I would also flag the appointments that were within this 8 week span as I did so. I am working in a server environment that does not allow CLR aggregate functions. Is this possible to do in SQL server? If so, how?
What I've thought about:
If I could write my own aggregate function to do this via GROUP BY that would obviously be best - but I can't seem to find any way to do it with the built in aggregate functions.
I can add a column to my original table giving a date 8 weeks out from any given appointment, but can't come up with any way that doesn't involve a for loop to then ask the question row by row whether there are at least 3 other appointments within that window.
Finally, I've even thought that perhaps I could just do GROUP BY but somehow create 100 new columns (as there are up to that many appointments for some patients) to create a table that contains every appointment indexed by patient, but even as a SQL newbie I'm pretty sure that as soon as I get to the point of imagining adding 100 new columns I'm going down the wrong road....
For clarity of discussion, here is some notation:
MyTable:
ApptID PatientID ApptDate (in smalldatetime)
--------------------------------------------------
Apt1 Pt1 Datetime1
Apt2 Pt1 Datetime2
Apt3 Pt2 Datetime3
... ... ...
Desired output (one option):
PatientID 4aptsIn8weeks? (Boolean) InitialApptDateForWin
Pt1 1 Datetime1
Pt2 0 NULL
Pt3 1 Datetime3
...
Desired output (another option):
ApptID PatientID ApptDate InAn8wkWindow? InitialApptDateForWin
Apt1 Pt1 Datetime1 1 Datetime1
Apt2 Pt1 Datetime2 1 Datetime1
Apt3 Pt2 Datetime3 0 NULL
... ... ...
But really, any output format that will in the end let me select patients and appointments that meet this criterion would be dandy....
Thanks for any ideas!
EDIT: Here's a slightly decompressed outline of my implementation of the selected answer below, just in case the details are helpful for anyone else (being new to SQL, it took me a couple stabs to get it working):
WITH MyTableAlias AS (
SELECT * FROM MyTable
)
SELECT MyTableAlias.PatientID, MyTable.Apptdate AS V1,
MyTableAlias.Apptdate AS V2
INTO temp1
FROM MyTable INNER JOIN MyTableAlias
ON (
MyTable.PatientID = MyTableAlias.PatientID
AND (DATEDIFF(Wk,MyTable.Apptdate,MyTableAlias.Apptdate) <=8 )
);
-- Since this gives for any given two visit dates 3 hits
-- (V1-V1, V1-V2, V2-V2), delete the ones where the second visit is being
-- selected as V1:
DELETE FROM temp1
WHERE V2<V1;
-- So far we have just selected pairs of visits within an 8 week
-- span of each other, including an entry for each visit being
-- within 8 weeks of itself, but for the rest only including the item
-- where the second visit is after the first. Now we want to look
-- for examples of first visits where there are at least 4 hits:
SELECT PatientID, V1, MAX(V2) AS lastvisitinspan, DATEDIFF(Wk,V1,MAX(V2))
AS nWeeksInSpan, COUNT(*) AS nVisitsInSpan
INTO MyOutputTable
FROM temp1
GROUP BY PatientID, V1
HAVING COUNT(*)>3;
-- From here on it's just a matter of how I want to handle patients with two
-- separate V1 examples meeting criteria...
Rough outline of the query:
INNER JOIN the table ("table") with itself ("alias"), the ON clause would be:
table.patientid = alias.patientid
table.appointment_date < alias.appointment_date
datediff(table.appointment_date, alias.appointment_date) <= 8 week
Then GROUP BY table.patientid, table.appointment_date
Output table.patientid, table.appointment_date, MAX(alias.appointment_date), COUNT(*)
Add a HAVING COUNT(*) > n clause
There are some issues though:
With 400,000 rows the JOIN could produce a very large result set
It will count some date ranges twice. E.g. if there were 4 visits in 9 week period then it will return two rows (#1, #2, #3 and #2, #3, #4).
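
As a cross-check on the self-join logic, the same criterion can be written as a sorted sliding window, which avoids the large intermediate result. This is only a Python sketch under assumptions: "8 weeks" is taken as 56 days inclusive (SQL Server's DATEDIFF(Wk, ...) counts week boundaries, so edge cases can differ), the sample dates are invented, and first_dense_window is a made-up name.

```python
from collections import defaultdict
from datetime import date, timedelta

# Invented sample data: (PatientID, ApptDate)
APPOINTMENTS = [
    ("Pt1", date(2020, 1, 1)),
    ("Pt1", date(2020, 1, 15)),
    ("Pt1", date(2020, 2, 1)),
    ("Pt1", date(2020, 2, 20)),
    ("Pt2", date(2020, 1, 1)),
    ("Pt2", date(2020, 3, 1)),
    ("Pt2", date(2020, 5, 1)),
    ("Pt2", date(2020, 7, 1)),
]


def first_dense_window(appointments, min_count=4, span=timedelta(weeks=8)):
    """Per patient: start date of the first window holding `min_count` visits, else None."""
    by_patient = defaultdict(list)
    for patient_id, appt_date in appointments:
        by_patient[patient_id].append(appt_date)

    result = {}
    for patient_id, dates in by_patient.items():
        dates.sort()
        result[patient_id] = None
        # Compare each visit with the visit (min_count - 1) positions later:
        # if those two are within the span, everything between them is too.
        for i in range(len(dates) - min_count + 1):
            if dates[i + min_count - 1] - dates[i] <= span:
                result[patient_id] = dates[i]
                break
    return result


print(first_dense_window(APPOINTMENTS))
```

In SQL Server 2012 or later, the same comparison can be expressed with LEAD(ApptDate, 3) OVER (PARTITION BY PatientID ORDER BY ApptDate), which needs one pass over the table instead of a self-join.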

Convert list of transitions (points in time) to list of states (periods of time)

Did something similar long ago, but when I think I'm doing the same thing now, it doesn't work.
A history table is a list of events happening to accounts. Some of those events are changes in status, in which case a multipurpose Detail column shows the new status. Sample:
... where Event_Type = 'Change_Status';
Acct Line Event_Type Detail
---- ---- ------------- -------
A 1 Change_Status Created
A 4 Change_Status Billed
A 7 Change_Status Paid
A 10 Change_Status Audited
B 1 Change_Status Created
B 6 Change_Status Billed
Now it is easy enough to join this to itself and get a table of time periods WHERE A.Acct = B.Acct and A.Line < B.Line but two things I'm failing on:
I also need to capture the last status, but in that case there is no end (B.*). I thought a left join would get it (B.Line is null) but it doesn't.
Need to eliminate periods that span more than one status, such as A-1 to A-7. Tried both items below, but either one eliminated everything.
AND A.LINE = (SELECT Max(Line) FROM Events TEMP
WHERE TEMP.Acct = A.Acct
AND TEMP.Line < B.Line or B.Line is null);
AND NOT EXISTS (SELECT Line FROM Events TEMP
WHERE TEMP.Acct = A.Acct
AND TEMP.Line between A.Line and B.Line);
If any of that is unclear, what I need to create is effectively
Acct Line Acct Line Status
---- ---- ---- ---- -------
from A 1 To A 4 Created
from A 4 To A 7 Billed
from A 7 To A 10 Paid
from A 10 To Audited
from B 1 To B 6 Created
I poked around with this on a postgres 9.1 database (so, ymmv). This is the query I came up with:
select
x.acct, x.line, y.line, x.status
from
statchanges x
left join statchanges y on x.acct = y.acct
and y.line > x.line
where
y.line is null or
(y.line - x.line =
(select min(y1.line - x1.line)
from statchanges x1, statchanges y1
where x1.acct = x.acct
and x1.line = x.line
and x1.acct = y1.acct
and y1.line > x1.line));
Important differences:
1. In the join clause, I'm joining on b.line > a.line, rather than a.line < b.line. This appears to be because (on postgres 9.1, at least) null is sorted after non-nulls, unless otherwise specified.
2. I'm jumping through some hoops to make sure I get the right min in the sub-query: making a very similar join (don't have to do a left join since we don't care about the nulls), and making sure the acct and starting line match with the outer query.
I'm not sure if this is completely what you're looking for, but it should hopefully give you some directions to explore.
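
For comparison, here is the transformation itself as a small Python sketch: sort each account's status changes by line and pair every row with the one that follows it, leaving the end empty for the last status. This illustrates the intended result rather than the SQL above; the function name to_periods is made up, and the rows are the sample from the question.

```python
from collections import defaultdict

# Sample status-change events from the question: (Acct, Line, Detail)
EVENTS = [
    ("A", 1, "Created"),
    ("A", 4, "Billed"),
    ("A", 7, "Paid"),
    ("A", 10, "Audited"),
    ("B", 1, "Created"),
    ("B", 6, "Billed"),
]


def to_periods(events):
    """Turn transitions into (acct, from_line, to_line, status) periods.

    to_line is None for the last (still open) status of each account.
    """
    by_acct = defaultdict(list)
    for acct, line, status in events:
        by_acct[acct].append((line, status))

    periods = []
    for acct in sorted(by_acct):
        rows = sorted(by_acct[acct])  # order by line within the account
        for i, (line, status) in enumerate(rows):
            end = rows[i + 1][0] if i + 1 < len(rows) else None
            periods.append((acct, line, end, status))
    return periods


for period in to_periods(EVENTS):
    print(period)
```

Note that this also yields an open-ended row for B 6 (Billed), which the expected output in the question leaves out. On databases with window functions, LEAD(line) OVER (PARTITION BY acct ORDER BY line) does the same pairing without a self-join.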