SQL to find overlapping time periods and sub-faults

Long time lurker, first time poster (and SQL beginner). My question is similar to this one: SQL to find time elapsed from multiple overlapping intervals, except I'm able to use CTEs, UDFs, etc., and am looking for more detail.
On a piece of large scale equipment I have a record of all faults that arise. Faults can arise on different sub-components of the system, some may take it offline completely (complete outage = yes), while others do not (complete outage = no). Faults can overlap in time, and may not have end times if the fault has not yet been repaired.
Outage_ID  StartDateTime    EndDateTime      CompleteOutage
1          07:00 3-Jul-13   08:55 3-Jul-13   Yes
2          08:30 3-Jul-13   10:00 4-Jul-13   No
3          12:00 4-Jul-13                    No
4          12:30 4-Jul-13   12:35 4-Jul-13   No
1 |---------|
2 |---------|
3 |--------------------------------------------------------------
4 |---|
I need to be able to work out, for a user-defined time period, how long the total system is fully functional (no faults), how long it's degraded (one or more non-complete outages), and how long it's inoperable (one or more complete outages). I also need to be able to work out which faults were on the system during any given time period. I was thinking of creating a "state change" table with a row for any time a fault is opened or closed, but I am stuck on the best way to do this - any help on this or better solutions would be appreciated!

This isn't a complete solution (I leave that as an exercise :)) but should illustrate the basic technique. The trick is to create a state table (as you say). If you record a 1 for a "start" event and a -1 for an "end" event then a running total in event date/time order gives you the current state at that particular event date/time. The SQL below is T-SQL but should be easily adaptable to whatever database server you're using.
Using your data for partial outage as an example:
DECLARE @Faults TABLE (
    StartDateTime DATETIME NOT NULL,
    EndDateTime DATETIME NULL
)

INSERT INTO @Faults (StartDateTime, EndDateTime)
SELECT '2013-07-03 08:30', '2013-07-04 10:00'
UNION ALL SELECT '2013-07-04 12:00', NULL
UNION ALL SELECT '2013-07-04 12:30', '2013-07-04 12:35'

-- "Unpivot" the events and assign 1 to a start and -1 to an end
;WITH FaultEvents AS (
    SELECT *, Ord = ROW_NUMBER() OVER(ORDER BY EventDateTime)
    FROM (
        SELECT EventDateTime = StartDateTime, Evt = 1
        FROM @Faults
        UNION ALL SELECT EndDateTime, Evt = -1
        FROM @Faults
        WHERE EndDateTime IS NOT NULL
    ) X
)
-- Running total of Evt gives the current state at each date/time point
, FaultEventStates AS (
    SELECT A.Ord, A.EventDateTime, A.Evt,
           [State] = (SELECT SUM(B.Evt) FROM FaultEvents B WHERE B.Ord <= A.Ord)
    FROM FaultEvents A
)
SELECT StartDateTime = S.EventDateTime, EndDateTime = F.EventDateTime
FROM FaultEventStates S
OUTER APPLY (
    -- Find the nearest transition to the no-fault state
    SELECT TOP 1 *
    FROM FaultEventStates B
    WHERE B.[State] = 0
    AND B.Ord > S.Ord
    ORDER BY B.Ord
) F
-- Restrict to start events transitioning from the no-fault state
WHERE S.Evt = 1 AND S.[State] = 1
If you are using SQL Server 2012 then you have the option to calculate the running total using a windowing function.
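Outside SQL, the same +1/-1 running-total idea can be sketched as a small sweep-line routine. The code below is a minimal illustration, not the poster's schema: it takes hypothetical fault tuples, tracks separate counters for partial and complete outages, and tallies how long the system spends functional, degraded, and inoperable - the full breakdown the question asks for.

```python
from datetime import datetime, timedelta

def outage_summary(faults, window_start, window_end):
    """faults: iterable of (start, end_or_None, is_complete_outage).
    Returns timedeltas spent functional / degraded / inoperable
    within [window_start, window_end)."""
    events = []  # (time, partial_delta, complete_delta)
    for start, end, complete in faults:
        s = max(start, window_start)
        e = min(end or window_end, window_end)  # open fault runs to window end
        if s >= e:
            continue  # fault lies entirely outside the reporting window
        dp, dc = (0, 1) if complete else (1, 0)
        events.append((s, dp, dc))
        events.append((e, -dp, -dc))
    events.sort(key=lambda ev: ev[0])

    totals = {"functional": timedelta(), "degraded": timedelta(),
              "inoperable": timedelta()}
    partial = complete = 0
    prev = window_start
    # sentinel event closes out the tail of the window
    for t, dp, dc in events + [(window_end, 0, 0)]:
        state = ("inoperable" if complete else
                 "degraded" if partial else "functional")
        totals[state] += t - prev
        prev = t
        partial += dp
        complete += dc
    return totals
```

Run against the four faults from the question over 3-4 July, this reports 9 hours functional, 37h05m degraded, and 1h55m inoperable.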

The below is a rough guide to getting this working. It compares against an interval table of dates and an interval table of 15-minute slots. It then sums the outage events (one event per interval), but does not count a partial outage where there is also a full outage.
You could use a more granular time interval if you needed; I chose 15 minutes for speed of coding.
I already had a date interval table set up ("CAL.t_Calendar"), so you would need to create one of your own to run this code.
Please note, this does not represent actual code you should use. It is only intended as a demonstration and to point you in a possible direction...
EDIT: I've just realised I haven't accounted for the NULL end dates. The code will need amending to check for NULL end dates and use @EndDate, or GETDATE() if @EndDate is in the future.
--drop table #Events
CREATE TABLE #Events (OUTAGE_ID INT IDENTITY(1,1) PRIMARY KEY
    ,StartDateTime datetime
    ,EndDateTime datetime
    ,completeOutage bit)

INSERT INTO #Events VALUES ('2013-07-03 07:00','2013-07-03 08:55',1),('2013-07-03 08:30','2013-07-04 10:00',0)
,('2013-07-04 12:00',NULL,0),('2013-07-04 12:30','2013-07-04 12:35',0)

--drop table #FiveMins
CREATE TABLE #FiveMins (ID int IDENTITY(1,1) PRIMARY KEY, TimeInterval Time)

DECLARE @Time INT = 0
WHILE @Time <= 1425 --covers 00:00 through 23:45 in 15 min steps
BEGIN
    INSERT INTO #FiveMins SELECT DATEADD(MINUTE, @Time, '00:00')
    SET @Time = @Time + 15
END

SELECT * FROM #FiveMins

DECLARE @StartDate DATETIME = '2013-07-03'
DECLARE @EndDate DATETIME = '2013-07-04 23:59:59.997' -- .997: DATETIME rounds .999 up to the next day

SELECT SUM(FullOutage) * 15 AS MinutesFullOutage
    ,SUM(PartialOutage) * 15 AS MinutesPartialOutage
    ,SUM(NoOutage) * 15 AS MinutesNoOutage
FROM
(
    SELECT DateAnc.EventDateTime
        ,CASE WHEN COUNT(OU.OUTAGE_ID) > 0 THEN 1 ELSE 0 END AS FullOutage
        ,CASE WHEN COUNT(OU.OUTAGE_ID) = 0 AND COUNT(pOU.OUTAGE_ID) > 0 THEN 1 ELSE 0 END AS PartialOutage
        ,CASE WHEN COUNT(OU.OUTAGE_ID) > 0 OR COUNT(pOU.OUTAGE_ID) > 0 THEN 0 ELSE 1 END AS NoOutage
    FROM
    (
        SELECT CAL.calDate + MI.TimeInterval AS EventDateTime
        FROM CAL.t_Calendar CAL
        CROSS JOIN #FiveMins MI
        WHERE CAL.calDate BETWEEN @StartDate AND @EndDate
    ) DateAnc
    LEFT JOIN #Events OU
        ON DateAnc.EventDateTime BETWEEN OU.StartDateTime AND OU.EndDateTime
        AND OU.completeOutage = 1
    LEFT JOIN #Events pOU
        ON DateAnc.EventDateTime BETWEEN pOU.StartDateTime AND pOU.EndDateTime
        AND pOU.completeOutage = 0
    GROUP BY DateAnc.EventDateTime
) AllOutages
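The bucket-counting idea above is easy to prototype outside SQL. This is a rough Python equivalent with hypothetical data, not the answer's tables; the fixed `now` value stands in for GETDATE(), which the answer's EDIT notes is needed for NULL end dates. It samples the window every 15 minutes and classifies each sample, letting a full outage trump a partial one.

```python
from datetime import datetime, timedelta

def outage_minutes(events, start, end, step=15,
                   now=datetime(2013, 7, 10)):
    """events: list of (start, end_or_None, is_complete).
    Samples [start, end) every `step` minutes; each sample counts as
    full, partial, or no outage (full wins when both apply)."""
    full = partial = clear = 0
    t = start
    while t < end:
        # BETWEEN in the SQL is inclusive, so use <= on both bounds
        active = [c for (s, e, c) in events if s <= t <= (e or now)]
        if any(active):
            full += 1
        elif active:
            partial += 1
        else:
            clear += 1
        t += timedelta(minutes=step)
    return full * step, partial * step, clear * step
```

Note the trade-off versus the running-total approach: sampling is simple but only accurate to the bucket size (an outage shorter than 15 minutes that falls between samples is missed entirely).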


Return 0 values where corresponding type is missing

I need a select query that returns the number of visits, the type of visit (to sort by), and the dates on which they happened.
My problem is that I don't get 0s for times where there are no visits.
I am using this code:
SELECT COUNT(*) AS numberOfVisits,
CONVERT(DATE, CONVERT(VARCHAR(10), CAST(dateOfVisit AS DATE), 121)) AS dateOfVisit,
typeOfVisit
FROM WEB
WHERE dateOfVisit >= '1-1-2021'
AND dateOfVisit <= '1-31-2021'
GROUP BY CAST(dateOfVisit AS DATE), typeOfVisit
ORDER BY CAST(dateOfVisit AS DATE)
/* I am converting and casting the date for other reasons that I need in
my API/frontend, and I also get the date condition from the API; 1-1-2021
and 1-31-2021 are just samples */
With this I get the following results:
https://imgur.com/a/o3c9A2p
The columns Follow as: numberOfVisits | dateOfVisit | typeOfVisit
As you can see, I have 4 types.
The first 3 rows contain types 2, 3 and 4 but not 1; I want an extra row there with value 0, the same date, and type 1.
The value 0 is not actually written in the database, and neither is NULL.
The desired result would be like this:
0 2021-01-04 1
1 2021-01-04 2
10 2021-01-04 3
2 2021-01-04 4
and if the result would only show that there were visits of type 2 then it would be something like this:
0 2021-01-04 1
1 2021-01-04 2
0 2021-01-04 3
0 2021-01-04 4
UPDATE:
So I built what I think is a calendar table, without actually creating one, and I do get the correct format of data, but now my count is way off... let me show you.
The query:
DECLARE @MinDate DATE = '1-1-2021',
        @MaxDate DATE = '1-31-2021';

SELECT COUNT(*) AS numberOfVisits,
    CONVERT(DATE, CONVERT(VARCHAR(10), CAST(b.VisitDate AS DATE), 121)) AS dateOfVisit,
    a.typeOfVisit
FROM WEB a
CROSS JOIN WEB b
WHERE b.VisitDate >= @MinDate
    AND b.VisitDate <= @MaxDate
GROUP BY CAST(b.VisitDate AS DATE), a.typeOfVisit
ORDER BY CAST(b.VisitDate AS DATE)
and the result of the query: https://imgur.com/a/BtpoqD1
As you can see, the dates and types can all be seen now, but the count is huge, and if I don't do the cross join then the dates with no count can't be seen again... I have no clue what goes wrong; can you please help me?
Maybe it wasn't clear enough what I wanted in my question, but I managed to come to a solution with a CASE in my SELECT.
If anyone is interested in the solution I came up with, here is how I wrote the SELECT, with everything else the same as in the original question.
SELECT SUM(CASE WHEN TypeOfVisit = 1 THEN 1 ELSE 0 END) VisitType1,
--...
You can't count rows for a date that has no records. You need a calendar table with every date; start the select from that, then left join your visit table and count.
It would be something like this:
if OBJECT_ID('tempdb..#Calendar') is not null drop table #Calendar
create table #Calendar (adDate date)

declare @dDate date = '2021-01-01'
while @dDate <= '2021-01-31'
begin
    insert into #Calendar (adDate)
    select @dDate
    set @dDate = dateadd(DAY, 1, @dDate)
end

select
    c.adDate,
    w.typeOfVisit,
    COUNT(w.dateOfVisit) as VisitCount
from #Calendar c
left join web w on w.dateOfVisit = c.adDate
group by
    c.adDate,
    w.typeOfVisit
order by
    c.adDate
If you want to have 0 visits by type also, not just by day then you can do it like this:
if OBJECT_ID('tempdb..#Calendar') is not null drop table #Calendar
create table #Calendar (adDate date, anTypeOfVisit int)

declare
    @dDate date = '2021-01-01',
    @nTypeOfVisit int

while @dDate <= '2021-01-31'
begin
    set @nTypeOfVisit = 1
    while @nTypeOfVisit <= 4
    begin
        insert into #Calendar (adDate, anTypeOfVisit)
        select @dDate, @nTypeOfVisit
        set @nTypeOfVisit = @nTypeOfVisit + 1
    end
    set @dDate = dateadd(DAY, 1, @dDate)
end
select
c.adDate,
c.anTypeOfVisit,
COUNT(w.dateofvisit) as VisitCount
from #Calendar c
left join web w on w.dateOfVisit = c.adDate and w.typeofvisit = c.anTypeOfVisit
group by
c.adDate,
c.anTypeOfVisit
order by
c.adDate,
c.anTypeOfVisit
You need to create a table that contains a row for each visit type for each date. Then do a left join to bring in the counts from your actual data
DROP TABLE IF EXISTS #Visit
CREATE TABLE #Visit (ID INT IDENTITY(1,1) ,DateOfVisit Date,TypeOfVisit int)
INSERT INTO #Visit
VALUES ('2021-01-04',2),('2021-01-04',3),('2021-01-04',3),('2021-01-04',4);
WITH cte_DistinctVisitDate AS (
SELECT DateOfVisit
FROM #Visit
WHERE DateOfVisit BETWEEN '2020-01-04' AND '2021-01-04' /*Always use YYYY-MM-DD for params as it is culture-agnostic*/
GROUP BY DateOfVisit
),
cte_TypePerDate AS (
/*One row for each type on each visit date*/
SELECT DateOfVisit,TypeOfVisit
FROM cte_DistinctVisitDate
CROSS JOIN (VALUES (1),(2),(3),(4)) AS B(TypeOfVisit) /*I am hard coding raw data, but you should use lookup table that contains each visit type*/
),
cte_VisitCount AS (
/*Actual counts, but only lists visit types with visits*/
SELECT DateOfVisit,TypeOfVisit,VisitCnt = COUNT(*)
FROM #Visit
GROUP BY DateOfVisit,TypeOfVisit
)
SELECT A.DateOfVisit,A.TypeOfVisit,VisitCnt = ISNULL(B.VisitCnt,0)
FROM cte_TypePerDate AS A
LEFT JOIN cte_VisitCount AS B
ON A.DateOfVisit = B.DateOfVisit
AND A.TypeOfVisit = B.TypeOfVisit
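The scaffold idea in this answer translates directly to any language: build every (date, type) pair first, then look up the real counts and default to zero. A minimal Python sketch with hypothetical input rows (not the real WEB table):

```python
from collections import Counter
from datetime import date, timedelta

def visit_counts(visits, start, end, types):
    """visits: list of (visit_date, visit_type) rows.
    Emits one (date, type, count) row per scaffold cell, with explicit
    zeros where no visit exists -- the LEFT JOIN + ISNULL(..., 0) step."""
    counts = Counter(visits)
    days = (end - start).days + 1
    return [(d, t, counts.get((d, t), 0))
            for d in (start + timedelta(n) for n in range(days))
            for t in types]
```

With the sample rows inserted into #Visit above, the scaffold fills in the missing type-1 row with a zero count.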

Splitting up group by with relevant aggregates beyond the basic ones?

I'm not sure if this has been asked before because I'm having trouble even asking it myself. I think the best way to explain my dilemma is to use an example.
Say I've rated my happiness on a scale of 1-10 every day for 10 years and I have the results in a big table where I have a single date correspond to a single integer value of my happiness rating. I say, though, that I only care about my happiness over 60 day periods on average (this may seem weird but this is a simplified example). So I wrap up this information to a table where I now have a start date field, an end date field, and an average rating field where the start days are every day from the first day to the last over all 10 years, but the end dates are exactly 60 days later. To be clear, these 60 day periods are overlapping (one would share 59 days with the next one, 58 with the next, and so on).
Next I pick a threshold rating, say 5, where I want to categorize everything below it into a "bad" category and everything above into a "good" category. I could easily add another field and use a case structure to give every 60-day range a "good" or "bad" flag.
Then to sum it up, I want to display the total periods of "good" and "bad" from maximum beginning to maximum end date. This is where I'm stuck. I could group by the good/bad category and then just take min(start date) and max(end date), but then if, say, the ranges go from good to bad to good then to bad again, output would show overlapping ranges of good and bad. In the aforementioned situation, I would want to show four different ranges.
I realize this may seem clearer to me than it would to someone else, so if you need clarification just ask.
Thank you
---EDIT---
Here's an example of what the before would look like:
StartDate| EndDate| MoodRating
------------+------------+------------
1/1/1991 |3/1/1991 | 7
1/2/1991 |3/2/1991 | 7
1/3/1991 |3/3/1991 | 4
1/4/1991 |3/4/1991 | 4
1/5/1991 |3/5/1991 | 7
1/6/1991 |3/6/1991 | 7
1/7/1991 |3/7/1991 | 4
1/8/1991 |3/8/1991 | 4
1/9/1991 |3/9/1991 | 4
And the after:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/2/1991 |good
1/3/1991|3/4/1991 |bad
1/5/1991|3/6/1991 |good
1/7/1991|3/9/1991 |bad
Currently my query with the group by rating would show:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/6/1991 |good
1/3/1991|3/9/1991 |bad
This is something along the lines of
select min(StartDate), max(EndDate), Good_Bad
from sourcetable
group by Good_Bad
While Jason A Long's answer may be correct - I can't read it or figure it out, so I figured I would post my own answer. Assuming that this isn't a process that you're going to be constantly running, the CURSOR's performance hit shouldn't matter. But (at least to me) this solution is very readable and can be easily modified.
In a nutshell - we insert the first record from your source table into our results table. Next, we grab the next record and see if the mood score is the same as the previous record. If it is, we simply update the previous record's end date with the current record's end date (extending the range). If not, we insert a new record. Rinse, repeat. Simple.
Here is your setup and some sample data:
DECLARE @MoodRanges TABLE (StartDate DATE, EndDate DATE, MoodRating int)

INSERT INTO @MoodRanges
VALUES
    ('1/1/1991','3/1/1991', 7),
    ('1/2/1991','3/2/1991', 7),
    ('1/3/1991','3/3/1991', 4),
    ('1/4/1991','3/4/1991', 4),
    ('1/5/1991','3/5/1991', 7),
    ('1/6/1991','3/6/1991', 7),
    ('1/7/1991','3/7/1991', 4),
    ('1/8/1991','3/8/1991', 4),
    ('1/9/1991','3/9/1991', 4)
Next, we can create a table to store our results, as well as some variable placeholders for our cursor:
DECLARE @MoodResults TABLE(ID INT IDENTITY(1, 1), StartDate DATE, EndDate DATE, MoodScore varchar(50))

DECLARE @CurrentStartDate DATE, @CurrentEndDate DATE, @CurrentMoodScore INT,
        @PreviousStartDate DATE, @PreviousEndDate DATE, @PreviousMoodScore INT
Now we put all of the sample data into our CURSOR:
DECLARE MoodCursor CURSOR FOR
SELECT StartDate, EndDate, MoodRating
FROM @MoodRanges

OPEN MoodCursor
FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @PreviousStartDate IS NOT NULL
    BEGIN
        IF (@PreviousMoodScore >= 5 AND @CurrentMoodScore >= 5)
            OR (@PreviousMoodScore < 5 AND @CurrentMoodScore < 5)
        BEGIN
            UPDATE @MoodResults
            SET EndDate = @CurrentEndDate
            WHERE ID = (SELECT MAX(ID) FROM @MoodResults)
        END
        ELSE
        BEGIN
            INSERT INTO @MoodResults
            VALUES (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
        END
    END
    ELSE
    BEGIN
        INSERT INTO @MoodResults
        VALUES (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
    END

    SET @PreviousStartDate = @CurrentStartDate
    SET @PreviousEndDate = @CurrentEndDate
    SET @PreviousMoodScore = @CurrentMoodScore

    FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
END

CLOSE MoodCursor
DEALLOCATE MoodCursor
And here are the results:
SELECT * FROM @MoodResults
ID StartDate EndDate MoodScore
----------- ---------- ---------- --------------------------------------------------
1 1991-01-01 1991-03-02 GOOD
2 1991-01-03 1991-03-04 BAD
3 1991-01-05 1991-03-06 GOOD
4 1991-01-07 1991-03-09 BAD
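The cursor's merge rule (extend the previous row when the flag matches, otherwise start a new row) is a classic run-length merge, sketched here in Python with the question's sample mood data as hypothetical input:

```python
def merge_mood_ranges(rows, threshold=5):
    """rows: (start, end, rating) tuples in date order.
    Collapses consecutive rows on the same side of `threshold`,
    extending the previous range's end date -- the cursor's UPDATE step."""
    merged = []
    for start, end, rating in rows:
        flag = "GOOD" if rating >= threshold else "BAD"
        if merged and merged[-1][2] == flag:
            prev_start, _, _ = merged[-1]
            merged[-1] = (prev_start, end, flag)  # extend the open run
        else:
            merged.append((start, end, flag))     # start a new run
    return merged
```

One linear pass produces exactly the four ranges shown in the results above, which is why the cursor is tolerable here despite its usual performance cost.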
Is this what you're looking for?
IF OBJECT_ID('tempdb..#MyDailyMood', 'U') IS NOT NULL
DROP TABLE #MyDailyMood;
CREATE TABLE #MyDailyMood (
TheDate DATE NOT NULL,
MoodLevel INT NOT NULL
);
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Calendar (dt) AS (
SELECT TOP (DATEDIFF(dd, '2007-01-01', '2017-01-01'))
DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '2007-01-01')
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT #MyDailyMood (TheDate, MoodLevel)
SELECT
c.dt,
ABS(CHECKSUM(NEWID()) % 10) + 1
FROM
cte_Calendar c;
--==========================================================
WITH
cte_AddRN AS (
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS (
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
)
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup;
Post OP update solution...
WITH
cte_AddRN AS ( -- Add a row number to each row that resets to 1 every 60 rows.
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS ( -- Use DENSE_RANK to create groups based on the RN added above.
-- How it works: RN sets the row number 1-60 and then repeats,
-- but we don't want every 60th row grouped together. We want blocks of 60 consecutive rows grouped together.
-- DENSE_RANK accomplishes this by ranking within all the "1's", "2's"... and so on.
-- Verify with the following query: SELECT * FROM cte_AssignGroups ag ORDER BY ag.TheDate
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
),
cte_AggRange AS ( -- This is just a straightforward aggregation/rollup. It produces results similar to the sample data you posted in your edit.
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END,
ag.DateGroup
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup
),
cte_CompactGroup AS ( -- This time we're using dense rank to group all of the consecutive "Good" and "Bad" values so that they can be further aggregated below.
SELECT
ar.BegOfRange, ar.EndOfRange, ar.AverageMoodLevel, ar.GorB, ar.DateGroup,
DenseGroup = ar.DateGroup - DENSE_RANK() OVER (PARTITION BY ar.GorB ORDER BY ar.BegOfRange)
FROM
cte_AggRange ar
)
-- The final aggregation step...
SELECT
BegOfRange = MIN(cg.BegOfRange),
EndOfRange = MAX(cg.EndOfRange),
cg.GorB
FROM
cte_CompactGroup cg
GROUP BY
cg.DenseGroup,
cg.GorB
ORDER BY
BegOfRange;
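The `DateGroup - DENSE_RANK()` keying in cte_CompactGroup can be hard to visualise, so here is the same computation in miniature (Python, with a made-up label sequence): rows in one run of consecutive equal labels end up with the same key, and grouping by (label, key) then yields one row per run.

```python
def run_keys(labels):
    """labels: per-block 'Good'/'Bad' values in date order.
    Key = block number - dense rank of the block within its label;
    the key is constant across a run of consecutive equal labels."""
    rank_within = {}
    keys = []
    for block_no, label in enumerate(labels, start=1):
        # blocks are distinct and already ordered, so the dense rank
        # within a label is just its occurrence count so far
        rank_within[label] = rank_within.get(label, 0) + 1
        keys.append(block_no - rank_within[label])
    return keys
```

For ["Good", "Good", "Bad", "Good", "Bad", "Bad"] the keys come out [0, 0, 2, 1, 3, 3]: four distinct (label, key) pairs, i.e. four runs, matching the MIN/MAX rollup in the final SELECT.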

How do I calculate coverage dates during a period of time in SQL against a transactional table?

I'm attempting to compile a list of date ranges like so:
Coverage Range: 10/1/2016 - 10/5/2016
Coverage Range: 10/9/2016 - 10/31/2016
for each policy in a database table. The table is transactional, and there is one cancellation transaction code, but three codes that can indicate coverage has begun. Also, there can be instances where the codes that indicate start of coverage can occur in sequence (start on 10/1, then another start on 10/5, then cancel on 10/14). Below is an example of a series of transactions that I would like to generate the above results from:
TransID PolicyID EffDate
NewBus 1 9/15/2016
Confirm 1 9/17/2016
Cancel 1 10/5/2016
Reinst 1 10/9/2016
Cancel 1 10/15/2016
Reinst 1 10/15/2016
PolExp 1 3/15/2017
So in this dataset, I want the following results for the date range 10/1 - 10/31:
Coverage Range: 10/1/2016 - 10/5/2016
Coverage Range: 10/9/2016 - 10/31/2016
Note that since the cancel and reinstatement happen on the same day, I'm excluding them from the result set. I tried pairing the transactions with subqueries:
CONVERT(varchar(10),
    CASE WHEN overall.sPTRN_ID in (SELECT code FROM #cancelTransCodes)
    -- This is a coverage cancellation entry
    THEN -- Set coverage start date using previous paired record
        CASE WHEN ((SELECT MAX(inn.PD_EffectiveDate) FROM PolicyData inn
                    WHERE inn.sPTRN_ID in (SELECT code FROM #startCoverageTransCodes)
                    and inn.PD_EffectiveDate <= overall.PD_EffectiveDate
                    and inn.PD_PolicyCode = overall.PD_PolicyCode) < @sDate) THEN @sDate
        ELSE
            (SELECT MAX(inn.PD_EffectiveDate) FROM PolicyData inn
             WHERE inn.sPTRN_ID in (SELECT code FROM #startCoverageTransCodes)
             and inn.PD_EffectiveDate <= overall.PD_EffectiveDate
             and inn.PD_PolicyCode = overall.PD_PolicyCode)
        END
    ELSE -- Set coverage start date using current record
        CASE WHEN (overall.PD_EffectiveDate < @sDate) THEN @sDate ELSE overall.PD_EffectiveDate END
    END, 101)
as [Effective_Date]
This mostly works except for the situation I listed above. I'd rather not rewrite this query if I can help it. I have a similar line for expiration date:
ISNULL(CONVERT(varchar(10),
    CASE WHEN overall.sPTRN_ID in (SELECT code FROM #cancelTransCodes) -- This is a coverage cancellation entry
    THEN -- Set coverage end date with current record
        overall.PD_EffectiveDate
    ELSE -- check for future coverage end dates
        CASE WHEN
            (SELECT COUNT(*) FROM PolicyData pd WHERE pd.PD_EffectiveDate > overall.PD_EffectiveDate and pd.sPTRN_ID in (SELECT code FROM #cancelTransCodes)) > 1
        THEN -- There are future end dates
            CASE WHEN ((SELECT TOP 1 pd.PD_ExpirationDate FROM PolicyData pd
                        WHERE pd.PD_PolicyCode = overall.PD_PolicyCode
                        and pd.PD_EntryDate between @sDate and @eDate
                        and pd.sPTRN_ID in (SELECT code FROM #cancelTransCodes))) > @eDate
            THEN @eDate
            ELSE
                (SELECT TOP 1 pd.PD_ExpirationDate FROM PolicyData pd
                 WHERE pd.PD_PolicyCode = overall.PD_PolicyCode
                 and pd.PD_EntryDate between @sDate and @eDate
                 and pd.sPTRN_ID in (SELECT code FROM #cancelTransCodes))
            END
        ELSE -- No future coverage end dates
            CASE WHEN (overall.PD_ExpirationDate > @eDate) THEN @eDate ELSE overall.PD_ExpirationDate END
        END
    END, 101), CONVERT(varchar(10), CASE WHEN (overall.PD_ExpirationDate > @eDate) THEN @eDate ELSE overall.PD_ExpirationDate END, 101))
as [Expiration_Date]
I can't help but feel like there's a simpler solution I'm missing here. So my question is: how can I modify the above portion of my query to accommodate the above scenario? OR what is the better answer? If I can simplify this, I would love to hear how.
Here's the solution I ended up implementing
I took a simplified table where I boiled all the start transaction codes down to START and all the cancel transaction codes down to CANCEL. When I viewed the table based on that, it was MUCH easier to watch how my logic affected the results. I ended up using a simplified system where I used CASE WHEN clauses to identify specific scenarios and built my date ranges based on that. I also changed my starting point away from looking at cancellations and finding the related starts, reversing it (find starts, then the related cancellations). So here's the code I implemented:
/* Get Coverage Dates */
,cast((CASE WHEN sPTRN_ID in (SELECT code FROM #startCoverageTransCodes) THEN
CASE WHEN (cast(overall.PD_EntryDate as date) <= #sDate) THEN #sDate
WHEN (cast(overall.PD_EntryDate as date) > #sDate AND cast(overall.PD_EntryDate as date) <= #eDate) THEN overall.PD_EntryDate
WHEN (cast(overall.PD_EntryDate as date) > #eDate) THEN #eDate
ELSE cast(overall.PD_EntryDate as date) END
ELSE
null
END) as date) as Effective_Date
,cast((CASE WHEN sPTRN_ID in (SELECT code FROM #startCoverageTransCodes) THEN
CASE WHEN (SELECT MIN(p.PD_EntryDate) FROM PolicyData p WITH (NOLOCK) WHERE p.sPTRN_ID in (SELECT code FROM #cancelTransCodes) AND p.PD_EntryDate > overall.PD_EntryDate AND p.PD_PolicyCOde = overall.PD_PolicyCode) > #eDate THEN #eDate
ELSE ISNULL((SELECT MIN(p.PD_EntryDate) FROM PolicyData p WITH (NOLOCK) WHERE p.sPTRN_ID in (SELECT code FROM #cancelTransCodes) AND p.PD_EntryDate > overall.PD_EntryDate AND p.PD_PolicyCOde = overall.PD_PolicyCode), #eDate) END
ELSE
CASE WHEN (SELECT MAX(p.PD_EntryDate) FROM PolicyData p WITH (NOLOCK) WHERE p.sPTRN_ID in (SELECT code FROM #startCoverageTransCodes) AND p.PD_EntryDate > overall.PD_EntryDate AND p.PD_PolicyCOde = overall.PD_PolicyCode) > #eDate THEN #eDate
ELSE (SELECT MAX(p.PD_EntryDate) FROM PolicyData p WITH (NOLOCK) WHERE p.sPTRN_ID in (SELECT code FROM #startCoverageTransCodes) AND p.PD_EntryDate > overall.PD_EntryDate AND p.PD_PolicyCOde = overall.PD_PolicyCode)
END END) as date) as Expiration_Date
As you can see, I relied on subqueries in this case. I had a lot of this logic as joins, which caused extra rows where I didn't need them. By basing the date range logic on subqueries, I sped the stored procedure up by several seconds, bringing my execution time to under 1 second where before it was between 2 and 5 seconds.
There might be a simpler solution, but I just do not see it right now.
The outline for each step is:
Generate dates for date range, which you do not need to do if you have a calendar table.
Transform the incoming data set as you described in your question (skipping start/cancel on the same day); and add the next EffDate for each row.
Explode the data set with a row for each day between the generated ranges of step 2.
Reduce the data set back down based on consecutive days of coverage.
test setup: http://rextester.com/GUNSO45644
/* set date range */
declare @fromdate date = '20161001'
declare @thrudate date = '20161031'

/* generate dates in range -- you can skip this if you have a calendar table */
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
    select top (datediff(day, @fromdate, @thrudate)+1)
        [Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1, @fromdate))
    from n as deka
    cross join n as hecto /* 100 days */
    cross join n as kilo /* 2.73 years */
    cross join n as [tenK] /* 27.3 years */
    order by [Date]
)
/* reduce test table to desired input*/
, pol as (
select
Coverage = case when max(TransId) in ('Cancel','PolExp')
then 0 else 1 end
, PolicyId
, EffDate = case when max(TransId) in ('Cancel','PolExp')
then dateadd(day,1,EffDate) else EffDate end
, NextEffDate = oa.NextEffDate
from t
outer apply (
select top 1
NextEffDate = case
when i.TransId in ('Cancel','PolExp')
then dateadd(day,1,i.EffDate)
else i.EffDate
end
from t as i
where i.PolicyId = t.PolicyId
and i.EffDate > t.EffDate
order by
i.EffDate asc
, case when i.TransId in ('Cancel','PolExp') then 1 else 0 end desc
) as oa
group by t.PolicyId, t.EffDate, oa.NextEffDate
)
/* explode desired input by day, add row_numbers() */
, e as (
select pol.PolicyId, pol.Coverage, d.Date
, rn_x = row_number() over (
partition by pol.PolicyId
order by d.Date
)
, rn_y = row_number() over (
partition by pol.PolicyId, pol.Coverage
order by d.date)
from pol
inner join dates as d
on d.date >= pol.EffDate
and d.date < pol.NextEffDate
)
/* reduce to date ranges where Coverage = 1 */
select
PolicyId
, FromDate = convert(varchar(10),min(Date),120)
, ThruDate = convert(varchar(10),max(Date),120)
from e
where Coverage = 1
group by PolicyId, (rn_x-rn_y);
returns:
+----------+------------+------------+
| PolicyId | FromDate | ThruDate |
+----------+------------+------------+
| 1 | 2016-10-01 | 2016-10-05 |
| 1 | 2016-10-09 | 2016-10-31 |
+----------+------------+------------+
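The explode-then-reduce step relies on the rn_x - rn_y trick: within a run of consecutive covered days, a day's ordinal minus its position in the covered-day list is constant. A compact Python rendering of that reduction, using a hypothetical covered-day list built from the expected output above:

```python
from datetime import date

def coverage_ranges(covered_days):
    """covered_days: sorted list of dates with coverage.
    Groups consecutive days into (from_date, thru_date) ranges using
    the constant-difference (rn_x - rn_y) grouping key."""
    runs = {}
    for pos, day in enumerate(covered_days):
        # consecutive days share the same ordinal-minus-position key
        runs.setdefault(day.toordinal() - pos, []).append(day)
    return [(days[0], days[-1]) for days in runs.values()]
```

The SQL pays the cost of materialising one row per covered day to get this key; the upside is that the final GROUP BY needs no loops or self-joins.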

How to implement this logic?

I have designed a script to get an inspector performance score, which is based on different factors. Inspectors are awarded grades based on their performance score. The script runs overnight as a SQL job and updates grades for all inspectors (over 6,500).
We are checking the last 90 days' progress, but many inspectors who have done no work in the last 90 days are getting full marks. To avoid this we have decided to look at the last 90 days and, if the number of reports is zero, go back another 90 days for that inspector.
i.e. If, out of 6,500 inspectors, say 250 have done no job, then the script needs to go back another 90 days for those 250 inspectors and see if they have any work.
This could have been implemented with cursors very easily, but I can't use a cursor as it is taking too long, as discussed here: select query in Cursor taking too long
What are the other options? Should I write a function which first checks whether any work has been done in the last 90 days for one inspector and, if not, goes back another 90 days? But wouldn't I still need a cursor for that?
ADDED
I have tried setting the dates in a temp table as mentioned by @Raj, but it is taking too much time. This is the same query that took so long when using a cursor. The other stats are running fine, and I think it is something to do with this query.
Requirements:
Number of visits for each inspectors where visits has uploaded document (1 or 2 or 13)
Tables:
Inspectors: InspectorID
InspectionScope: ScopeID, InspectorID (FK)
Visits: VisitID, VisitDate, ScopeID (FK)
VisitDocs: DocID, DocType, VisitID (FK)
DECLARE
    @DateFrom90 date, @DateTo date, @DateFrom180 date, @DateFrom date;

SELECT @DateTo = CAST(GETDATE() AS DATE)
    ,@DateFrom90 = CAST(GETDATE() - 90 AS DATE)
    ,@DateFrom180 = CAST(GETDATE() - 180 AS DATE)

DECLARE @Inspectors TABLE (
    InspectorID int,
    InspectorGrade int,
    DateFrom date,
    DateTo date
);

insert into @Inspectors (
    InspectorID,
    InspectorGrade,
    DateFrom,
    DateTo
)
select
    tmp.InspectorID, tmp.InspectorGrade
    ,case when tmp.VisitWithReport = 0 then @DateFrom180 else @DateFrom90 end StartDate
    ,@DateTo EndDate
from
(
    select
        i.InspectorID, i.InspectorGrade
        ,VisitWithReport = (select COUNT(v.visitid) from visits v
            inner join InspectionScope s on s.ScopeID = v.ScopeID
            where v.ReportStandard not in (0,9) and v.VisitType = 1
            and v.VisitDate BETWEEN @DateFrom90 and @DateTo
            and s.InspectorID = i.InspectorID)
    from inspectors i
) tmp;

--select * from @Inspectors

SELECT i.InspectorID, i.InspectorGrade
    ,TotalVisitsWithAtLeastOneReport = (select COUNT(distinct v.visitID) from Visits v
        inner join InspectionScope s on s.ScopeID = v.ScopeID
        inner join VisitDocs vd on vd.VisitID = v.VisitID
        where vd.DocType IN (1,2,13) and s.InspectorID = i.InspectorID
        and v.VisitDate BETWEEN i.DateFrom and i.DateTo
    )
from @Inspectors i
You can identify the last job/work date first, before applying any logic. For example, you can store InspectorID and LastWorkDay in a temp table (assuming LastWorkDay is available in some table). Then, based on LastWorkDay, you can decide how many days you have to go back - 90 or 180. This can be another field (StartDate) in the temp table, derived from the LastWorkDay column.
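The temp-table suggestion amounts to a two-pass, set-based rule: count each inspector's reports over 90 days once, then derive the window per inspector with no cursor. As a tiny sketch of that rule (hypothetical names, not the real schema):

```python
from datetime import date, timedelta

def scoring_windows(report_counts_90, today):
    """report_counts_90: dict of inspector_id -> reports in last 90 days.
    Returns inspector_id -> (date_from, date_to): the normal 90-day
    window, widened to 180 days for inspectors with no reports."""
    d90 = today - timedelta(days=90)
    d180 = today - timedelta(days=180)
    return {ins: ((d90 if n > 0 else d180), today)
            for ins, n in report_counts_90.items()}
```

The point is that the fallback decision is pure data (a derived StartDate column), so the main scoring query can simply join on it instead of branching per inspector.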

Better way to calculate utilisation

I have a rather complicated (and very inefficient) way of getting utilisation from a large list of periods (Code below).
Currently I'm running this for a period of 8 weeks and it's taking between 30 and 40 seconds to return data.
I need to run this regularly for periods of 6 months, 1 year and two years which will obviously take a massive amount of time.
Is there a smarter way to run this query to lower the number of table scans?
I have tried several ways of joining the data, all seem to return junk data.
I've tried to comment the code as much as I can but if anything is unclear let me know.
Table Sizes:
[Stock] ~12,000 records
[Contitems] ~90,000 records
Pseudocode for clarity:
For each week between Start and End:
Get list of unique items active between dates (~12,000 rows)
For each unique item
Loop through ContItems table (~90,000 rows)
Return matches
Group
Group
Return results
The Code
DECLARE @WEEKSTART DATETIME; -- Used to pass start of period to search
DECLARE @WEEKEND DATETIME; -- Used to pass end of period to search
DECLARE @DC DATETIME; -- Used to increment dates
DECLARE @INT INT; -- days to increment for each iteration (7 = weeks)
DECLARE @TBL TABLE(DT DATETIME, SG VARCHAR(20), SN VARCHAR(50), TT INT, US INT); -- Return table

SET @WEEKSTART = '2012-05-01'; -- Set start of period
SET @WEEKEND = '2012-06-25'; -- Set end of period
SET @DC = @WEEKSTART; -- Start counter at first date
SET @INT = 7; -- Set increment to weeks

WHILE (@DC < @WEEKEND) -- Loop through dates every [@INT] days (weeks)
BEGIN
    SET @DC = DATEADD(D,@INT,@DC); -- Add 7 days to the counter
    INSERT INTO @TBL (DT, SG, SN, TT, US) -- Insert results from subquery into return table
    SELECT @DC, SUB.GRPCODE, SubGrp.NAME, SUM(SUB.TOTSTK), SUM(USED)
    FROM
    (
        SELECT STK.GRPCODE, 1 AS TOTSTK, CASE (SELECT COUNT(*)
            FROM ContItems -- Contains list of hires with a start and end date
            WHERE STK.ITEMNO = ContItems.ITEMNO -- unique item reference
            AND ContItems.DELDATE <= DATEADD(MS,-2,DATEADD(D,@INT,@DC)) -- Hires starting before end of week searched
            AND (ContItems.DOCDATE#5 >= @DC -- Hires ending after start of week searched
            OR ContItems.DOCDATE#5 = '1899-12-30 00:00:00.000')) -- Or hire is still active
        WHEN 0 THEN 0 -- None found return zero
        WHEN NULL THEN 0 -- NULL return zero
        ELSE 1 END AS USED -- Otherwise return 1
        FROM Stock STK -- List of unique items
        WHERE [UNIQUE] = 1 AND [TYPE] != 4 -- Business rules
        AND DATEPURCH < @DC AND (DATESOLD = '1899-12-30 00:00:00.000' OR DATESOLD > DATEADD(MS,-2,DATEADD(D,@INT,@DC))) -- Stock is valid in the selected week
    ) SUB
    INNER JOIN SubGrp -- Used to get 'pretty' names
        ON SUB.GRPCODE = SubGrp.CODE
    GROUP BY SUB.GRPCODE, SubGrp.NAME
END

-- Next section gets data from temp table
SELECT SG, SN, SUM(TT) AS TOT, SUM(US) AS USED, CAST(SUM(US) AS FLOAT) / CAST(SUM(TT) AS FLOAT) AS UTIL
FROM @TBL
GROUP BY SG, SN
ORDER BY TOT DESC
I have two suggestions.
First, rewrite the query to move the "select" statement from the case statement to the from clause:
SELECT @DC, SUB.GRPCODE, SubGrp.NAME, SUM(SUB.TOTSTK), SUM(USED)
FROM (SELECT STK.GRPCODE, 1 AS TOTSTK,
        (CASE MAX(ContGrp.cnt)
            WHEN 0 THEN 0 -- None found return zero
            WHEN NULL THEN 0 -- NULL return zero
            ELSE 1
        END) AS USED -- Otherwise return 1
    FROM Stock STK left outer join -- List of unique items
        (SELECT itemno, COUNT(*) as cnt
         FROM ContItems -- Contains list of hires with a start and end date
         WHERE ContItems.DELDATE <= DATEADD(MS,-2,DATEADD(D,@INT,@DC)) AND -- Hires starting before end of week searched
            (ContItems.DOCDATE#5 >= @DC OR -- Hires ending after start of week searched
             ContItems.DOCDATE#5 = '1899-12-30 00:00:00.000' -- Or hire is still active
            )
         group by ITEMNO
        ) ContGrp
        on STK.ITEMNO = ContGrp.itemno
    WHERE [UNIQUE] = 1 AND [TYPE] != 4 AND -- Business rules
        DATEPURCH < @DC AND (DATESOLD = '1899-12-30 00:00:00.000' OR DATESOLD > DATEADD(MS,-2,DATEADD(D,@INT,@DC))) -- Stock is valid in the selected week
    ) SUB INNER JOIN SubGrp -- Used to get 'pretty' names
        ON SUB.GRPCODE = SubGrp.CODE
GROUP BY SUB.GRPCODE, SubGrp.NAME
In doing this, I found something suspicious. The CASE statement operates at the level of ItemNo, but the grouping is by GrpCode, so the COUNT(*) is really returning the sum at the group level. Is this what you intend?
The second is to dispense with the WHILE loop, if you have multiple weeks. To do this, you just need to convert DatePurch to an appropriate week. However, if the code usually runs on just one or two weeks, this effort may not help very much.
Well, start by replacing the DATEADD functions in the WHERE clauses.
You already have
SET @DC = DATEADD(D,@INT,@DC);
Why not declare another local variable for deletion date:
WHILE (@DC < @WEEKEND) -- Loop through dates every [@INT] days (weeks)
BEGIN
    SET @DC = DATEADD(D,@INT,@DC);
    DECLARE @DeletionDate DATETIME = DATEADD(MS,-2,DATEADD(D,@INT,@DC));
And use it in the CASE statement:
CASE (SELECT COUNT(*) .... AND ContItems.DELDATE <= @DeletionDate ....
And also in the outer WHERE clause...
Then you need to make sure that you have correctly indexed your tables.