Better way to calculate utilisation - sql

I have a rather complicated (and very inefficient) way of getting utilisation from a large list of periods (Code below).
Currently I'm running this for a period of 8 weeks and it's taking between 30 and 40 seconds to return data.
I need to run this regularly for periods of 6 months, 1 year and two years which will obviously take a massive amount of time.
Is there a smarter way to run this query to lower the number of table scans?
I have tried several ways of joining the data, all seem to return junk data.
I've tried to comment the code as much as I can but if anything is unclear let me know.
Table Sizes:
[Stock] ~12,000 records
[Contitems] ~90,000 records
Pseudocode for clarity:
For each week between Start and End:
    Get list of unique items active between dates (~12,000 rows)
    For each unique item:
        Loop through ContItems table (~90,000 rows)
        Return matches
    Group
Group
Return results
The Code
DECLARE @WEEKSTART DATETIME; -- Used to pass start of period to search
DECLARE @WEEKEND DATETIME; -- Used to pass end of period to search
DECLARE @DC DATETIME; -- Used to increment dates
DECLARE @INT INT; -- days to increment for each iteration (7 = weeks)
DECLARE @TBL TABLE(DT DATETIME, SG VARCHAR(20), SN VARCHAR(50), TT INT, US INT); -- Return table
SET @WEEKSTART = '2012-05-01'; -- Set start of period
SET @WEEKEND = '2012-06-25'; -- Set end of period
SET @DC = @WEEKSTART; -- Start counter at first date
SET @INT = 7; -- Set increment to weeks
WHILE (@DC < @WEEKEND) -- Loop through dates every [@INT] days (weeks)
BEGIN
    SET @DC = DATEADD(D,@INT,@DC); -- Add 7 days to the counter
    INSERT INTO @TBL (DT, SG, SN, TT, US) -- Insert results from subquery into return table
    SELECT @DC, SUB.GRPCODE, SubGrp.NAME, SUM(SUB.TOTSTK), SUM(USED)
    FROM
    (
        SELECT STK.GRPCODE, 1 AS TOTSTK, CASE (SELECT COUNT(*)
                FROM ContItems -- Contains list of hires with a start and end date
                WHERE STK.ITEMNO = ContItems.ITEMNO -- unique item reference
                AND ContItems.DELDATE <= DATEADD(MS,-2,DATEADD(D,@INT,@DC)) -- Hires starting before end of week searching
                AND (ContItems.DOCDATE#5 >= @DC -- Hires ending after start of week searching
                OR ContItems.DOCDATE#5 = '1899-12-30 00:00:00.000')) -- Or hire is still active
            WHEN 0 THEN 0 -- None found return zero
            WHEN NULL THEN 0 -- NULL return zero
            ELSE 1 END AS USED -- Otherwise return 1
        FROM Stock STK -- List of unique items
        WHERE [UNIQUE] = 1 AND [TYPE] != 4 -- Business rules
        AND DATEPURCH < @DC AND (DATESOLD = '1899-12-30 00:00:00.000' OR DATESOLD > DATEADD(MS,-2,DATEADD(D,@INT,@DC))) -- Stock is valid between selected week
    ) SUB
    INNER JOIN SubGrp -- Used to get 'pretty' names
    ON SUB.GRPCODE = SubGrp.CODE
    GROUP BY SUB.GRPCODE, SubGrp.NAME
END
-- Next section gets data from temp table
SELECT SG, SN, SUM(TT) AS TOT, SUM(US) AS USED, CAST(SUM(US) AS FLOAT) / CAST(SUM(TT) AS FLOAT) AS UTIL
FROM @TBL
GROUP BY SG, SN
ORDER BY TOT DESC

I have two suggestions.
First, rewrite the query to move the "select" statement from the case statement to the from clause:
SELECT @DC, SUB.GRPCODE, SubGrp.NAME, SUM(SUB.TOTSTK), SUM(USED)
FROM (SELECT STK.GRPCODE, 1 AS TOTSTK,
             (CASE WHEN ContGrp.cnt > 0 THEN 1 -- At least one hire overlaps the week
                   ELSE 0 -- None found (cnt is NULL after the outer join): return zero
              END) AS USED
      FROM Stock STK left outer join -- List of unique items
           (SELECT ITEMNO, COUNT(*) as cnt
            FROM ContItems -- Contains list of hires with a start and end date
            WHERE ContItems.DELDATE <= DATEADD(MS,-2,DATEADD(D,@INT,@DC)) AND -- Hires starting before end of week searching
                  (ContItems.DOCDATE#5 >= @DC OR -- Hires ending after start of week searching
                   ContItems.DOCDATE#5 = '1899-12-30 00:00:00.000' -- Or hire is still active
                  )
            group by ITEMNO
           ) ContGrp
           on STK.ITEMNO = ContGrp.ITEMNO
      WHERE [UNIQUE] = 1 AND [TYPE] != 4 AND -- Business rules
            DATEPURCH < @DC AND (DATESOLD = '1899-12-30 00:00:00.000' OR DATESOLD > DATEADD(MS,-2,DATEADD(D,@INT,@DC))) -- Stock is valid between selected week
     ) SUB INNER JOIN SubGrp -- Used to get 'pretty' names
     ON SUB.GRPCODE = SubGrp.CODE
GROUP BY SUB.GRPCODE, SubGrp.NAME
In doing this, I found something suspicious. The case statement is operating at the level of "ItemNo", but the grouping is by "GrpCode". So, the "Count(*)" is really returning the sum at the group level. Is this what you intend?
The second is to dispense with the WHILE loop, if you have multiple weeks. To do this, you just need to convert DatePurch to an appropriate week. However, if the code usually runs on just one or two weeks, this effort may not help very much.
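To make the "no WHILE loop" idea concrete, here is a minimal sketch, not a drop-in replacement for your query. It uses SQLite through Python purely so it can be run anywhere. The `weeks` table, the simplified column names (`enddate` instead of `DOCDATE#5`) and the use of NULL instead of the '1899-12-30' sentinel are all assumptions made for the illustration. Every week is joined to the stock valid in that week, the "used" flag is computed once per item and week in a derived table, and only then is it summed per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weeks (week_start TEXT, week_end TEXT);  -- hypothetical: one row per week, end exclusive
CREATE TABLE stock (itemno INTEGER, grpcode TEXT, datepurch TEXT, datesold TEXT);
CREATE TABLE contitems (itemno INTEGER, deldate TEXT, enddate TEXT);

INSERT INTO weeks VALUES ('2012-05-01', '2012-05-08'), ('2012-05-08', '2012-05-15');
INSERT INTO stock VALUES
    (1, 'A', '2012-01-01', NULL),
    (2, 'A', '2012-01-01', NULL),
    (3, 'B', '2012-05-05', NULL);      -- bought during week 1, so only valid in week 2
INSERT INTO contitems VALUES
    (1, '2012-05-02', '2012-05-03'),   -- hire entirely inside week 1
    (3, '2012-05-10', NULL);           -- open-ended hire starting in week 2
""")

rows = conn.execute("""
SELECT grpcode, COUNT(*) AS tot, SUM(used) AS used
FROM (
    -- one row per (week, valid stock item) with a 0/1 "used" flag
    SELECT s.grpcode,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM contitems c
                    WHERE c.itemno = s.itemno
                      AND c.deldate < w.week_end
                      AND (c.enddate IS NULL OR c.enddate >= w.week_start)
                ) THEN 1 ELSE 0 END AS used
    FROM weeks w
    JOIN stock s
      ON s.datepurch < w.week_start
     AND (s.datesold IS NULL OR s.datesold >= w.week_end)
) per_item_week
GROUP BY grpcode
ORDER BY grpcode
""").fetchall()

print(rows)  # [('A', 4, 1), ('B', 1, 1)]
```

Group A has two items valid in both weeks (4 item-weeks) with one of them hired, and group B has one item valid in one week and hired in it. On SQL Server the same shape should let the optimizer read ContItems once for the whole date range instead of once per loop iteration, though how much that helps depends on your indexes.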

Well, start by replacing the DATEADD calls in the WHERE clauses.
You already have
SET @DC = DATEADD(D,@INT,@DC);
Why not declare another local variable for the deletion date:
WHILE (@DC < @WEEKEND) -- Loop through dates every [@INT] days (weeks)
BEGIN
SET @DC = DATEADD(D,@INT,@DC);
DECLARE @DeletionDate DATETIME = DATEADD(MS,-2,DATEADD(D,@INT,@DC));
And use it in the CASE expression:
CASE (SELECT COUNT(*) .... AND ContItems.DELDATE <= @DeletionDate ....
And also in the outer WHERE clause...
Then make sure that your tables are correctly indexed.

Related

Using Conditional Aggregation in SubQuery

Thanks to the help of another user, I was able to use Conditional Aggregation to get the data point I need. I now need to implement this into an existing query in order to get an SLA % for a date range (rather than each package). Previous post for reference: Pull a DATEDIFF between Rows with Distinct value and WHERE Clause
The query below was written on the assumption that the 2 timestamps in 'PackageTable' were accurate enough to calculate SLA. Since I found out they are not, I have to run the query on a different table (PackageTable_Audit) that records an event row each time a package moves from LifeCycleStatusId = 1 (creation) to LifeCycleStatusId = 3 (Assigned) to LifeCycleStatusId = 5 (Completed). The SLA adherence % is the number of packages that were completed within X seconds / total packages. Since I can't use a simple DATEDIFF in a sub-query, and instead have to use aggregate functions to get a DATEDIFF between rows, I'm not sure how to work it into the query.
I've updated my old query with the Conditional Aggregate, but I get the following Error:
"Cannot perform an aggregate function on an expression containing an aggregate or a subquery."
Query:
-- VARIABLE DECLARATION AND INITIALIZATION
DECLARE @StartDate varchar(10);
DECLARE @EndDate varchar(10);
SET @StartDate = '2019-06-01';
SET @EndDate = '2019-06-30'; -- June has 30 days
-- TABLE DECLARATION ##################################################
DECLARE @TABLE1 TABLE("No. Packages in SLA" INT, "Total Packages" INT, "SLA %" FLOAT)
--#####################################################################
-- WHAT GETS INSERTED INTO TABLE 1
INSERT INTO @TABLE1
SELECT
    A.NUM, A.DENOM, CAST(A.NUM AS FLOAT)/A.DENOM*100
FROM
(
    -- COLUMN SELECTION. TWO NUMBERS WILL REPRESENT A NUM AND A DENOM
    SELECT
        (SELECT SUM(CASE
                    WHEN
                        datediff(second, MAX(CASE WHEN LifeCycleStatusId = 2 THEN rowDateModified END),
                                         MAX(CASE WHEN LifeCycleStatusId = 5 THEN rowDateModified END)
                        ) < 172800
                    THEN 1
                    ELSE 0
                    END) AS IN_SLA
         FROM PackageTable WITH (nolock)
         WHERE lifecyclestatusid = 5
         AND rowDateCreated BETWEEN @StartDate AND @EndDate)
        AS NUM,
        (SELECT COUNT(PackageGuid) As No_Packages
         FROM PackageTable WITH (nolock)
         WHERE lifecyclestatusid = 5
         AND rowDateCreated BETWEEN @StartDate AND @EndDate)
        AS DENOM
) A
SELECT "No. Packages in SLA", "Total Packages", "SLA %"
FROM @TABLE1
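One common way around the "aggregate on an expression containing an aggregate" error is to do the two aggregations at two levels: first collapse the audit rows to one row per package (the MAX(CASE ...) part), then SUM and COUNT over that derived table. The sketch below only illustrates that shape. It runs on SQLite through Python, and the `package_audit` table, its columns and the sample rows are made up for the example; on SQL Server the seconds difference would be a DATEDIFF(second, ...) rather than the strftime arithmetic used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package_audit (package_guid TEXT, status_id INTEGER, modified TEXT);  -- hypothetical audit table
INSERT INTO package_audit VALUES
    ('a', 1, '2019-06-01 00:00:00'),
    ('a', 2, '2019-06-01 01:00:00'),
    ('a', 5, '2019-06-02 01:00:00'),   -- completed 86400 s after assignment: in SLA
    ('b', 2, '2019-06-01 00:00:00'),
    ('b', 5, '2019-06-04 00:00:00'),   -- completed 259200 s after assignment: out of SLA
    ('c', 2, '2019-06-05 00:00:00'),
    ('c', 5, '2019-06-05 12:00:00'),   -- completed 43200 s after assignment: in SLA
    ('d', 2, '2019-06-06 00:00:00');   -- never completed, so not counted at all
""")

in_sla, total = conn.execute("""
SELECT SUM(CASE WHEN secs < 172800 THEN 1 ELSE 0 END) AS in_sla,
       COUNT(*) AS total
FROM (
    -- inner level: one row per package, with the assigned-to-completed gap in seconds
    SELECT package_guid,
           CAST(strftime('%s', MAX(CASE WHEN status_id = 5 THEN modified END)) AS INTEGER)
         - CAST(strftime('%s', MAX(CASE WHEN status_id = 2 THEN modified END)) AS INTEGER) AS secs
    FROM package_audit
    GROUP BY package_guid
    HAVING MAX(CASE WHEN status_id = 5 THEN modified END) IS NOT NULL
) per_package
""").fetchone()

sla_pct = 100.0 * in_sla / total
print(in_sla, total, round(sla_pct, 2))  # 2 3 66.67
```

Because the outer SUM only sees plain columns of the derived table, there is no nested aggregate left for the engine to complain about.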

SQL: Count of records by consecutive date, even when no records exist for a date

I'm using SQL Server 2008 R2.
I’m querying a table of Hospital Appointment Slots and trying to return a list of how many appointment slots for a specific doctor are flagged as being booked, grouped by week number/year.
There are some instances of weeks that don’t have any booked appointments yet, but I want the result to list ALL the forthcoming weeks, even where the count of booked appointment slots is zero.
The output I’m looking for is along these lines:
-------------------------------------------
Year | Week Number | Number of Booked Slots
-------------------------------------------
2017 | 48 | 10
2017 | 49 | 0
2017 | 50 | 4
2017 | 51 | 2
2017 | 52 | 0
2018 | 1 | 5
I understand that a standard select aggregating the results won’t show those weeks where there's a zero count of records, because there’s nothing to return – so I’ve tried to get around this by using a cte to first produce a list of all the forthcoming weeks.
However, try as I might, I can’t get the query to display the zero weeks...
I’ve seen a number of solutions to similar problems to this, but despite experimenting I haven’t been able to apply them to my particular problem (including SQL Count to include zero values)
This is the latest iteration of the query I’ve written so far.
WITH CTE_Dates AS
(
SELECT DISTINCT Slot_Start_Date AS cte_date
FROM Outpatients.vw_OP_Clinic_Slots WITH (NOLOCK)
)
SELECT
DATEPART(year,OPCS.Slot_Start_Date) [Year]
,DATEPART(week,OPCS.Slot_Start_Date) [Week Number]
,count(OPCS.Slot_Start_Date) [Number of Booked Slots]
FROM
Outpatients.vw_OP_Clinic_Slots OPCS WITH (NOLOCK)
LEFT OUTER JOIN CTE_Dates ON OPCS.Slot_Start_Date=CTE_Dates.cte_date
LEFT OUTER JOIN Outpatients.vw_OP_Clinics CLIN ON OPCS.Clinic_Code=CLIN.Clinic_Code
WHERE
OPCS.Slot_Start_Date >= '14/08/2017'
AND OPCS.Booked_Flag = 'Y'
AND CLIN.Lead_Healthcare_Professional_Name = 'Dr X'
GROUP BY
DATEPART(year,OPCS.Slot_Start_Date)
,DATEPART(week,OPCS.Slot_Start_Date)
ORDER BY
DATEPART(year,OPCS.Slot_Start_Date) asc
,DATEPART(week,OPCS.Slot_Start_Date) asc
The result it’s returning me is correct, BUT I just need it to include those weeks in the list where the count is zero.
Please can anyone explain where I’m going wrong? I’m guessing I’m not joining the cte correctly, but I’ve tried both right and left joins which produce the same result. I’ve also tried inverting the query by swopping the above query statement and cte around, but this doesn’t work either.
Appreciate any guidance anyone can suggest.
Just RIGHT JOIN the @ListOfWeeks table to the result set you have:
DECLARE @ListOfWeeks TABLE ([Week_No] int, [Year_Number] int);
DECLARE @i tinyint = 1, @y int = 2010;
WHILE @i <= 52 AND @y < 2018
BEGIN
    INSERT INTO @ListOfWeeks([Week_No], [Year_Number]) VALUES (@i, @y);
    IF @i = 52 BEGIN
        SET @i = 0
        SET @y += 1
    END
    SET @i += 1
END
SELECT * FROM @ListOfWeeks;
WITH [Your_Part] AS(
    SELECT
        DATEPART(year,OPCS.Slot_Start_Date) [Year]
        ,DATEPART(week,OPCS.Slot_Start_Date) [Week Number]
        ,count(OPCS.Slot_Start_Date) [Number of Booked Slots]
    FROM
        Outpatients.vw_OP_Clinic_Slots OPCS WITH (NOLOCK)
        LEFT OUTER JOIN Outpatients.vw_OP_Clinics CLIN ON OPCS.Clinic_Code=CLIN.Clinic_Code
    WHERE
        OPCS.Slot_Start_Date >= '14/08/2017'
        AND OPCS.Booked_Flag = 'Y'
        AND CLIN.Lead_Healthcare_Professional_Name = 'Dr X'
    GROUP BY
        DATEPART(year,OPCS.Slot_Start_Date)
        ,DATEPART(week,OPCS.Slot_Start_Date)
)
-- Weeks with no bookings have no row in [Your_Part], so turn the NULL into 0
SELECT xxx.[Year_Number], xxx.[Week_No], ISNULL(yp.[Number of Booked Slots], 0) AS [Number of Booked Slots]
FROM [Your_Part] yp
RIGHT JOIN @ListOfWeeks xxx ON yp.[Year] = xxx.[Year_Number] AND yp.[Week Number] = xxx.[Week_No]
IF OBJECT_ID(N'tempdb..##cal_weeks_temp', N'U') IS NOT NULL
DROP TABLE ##cal_weeks_temp;
create table ##cal_weeks_temp (date_of_week date, week_num int)
declare @start_date date
declare @end_date date
set @start_date='01/01/2017'
set @end_date='12/31/2018'
while @start_date<=@end_date
begin
-- insert first, then step forward, so the first and last dates are both included
insert into ##cal_weeks_temp values (@start_date,DATEPART(week,@start_date))
set @start_date=dateadd(day,1,@start_date)
end
select YEAR(t1.date_of_week) 'YEAR',t1.week_num,
sum(case convert(varchar,t2.BookedTime,105) when convert(varchar,t1.date_of_week,105) then 1 else 0 end) 'count'
from ##cal_weeks_temp t1
left join Your_Table t2
on convert(varchar,t2.BookedTime,105)=convert(varchar,t1.date_of_week,105)
group by YEAR(t1.date_of_week) ,t1.week_num
order by YEAR(t1.date_of_week) ,t1.week_num
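The reason the original query never shows the empty weeks is worth spelling out: the list of weeks has to be the driving (left) side of the join, and any filter on the slots has to live in the ON clause, because a WHERE filter on the slots table throws the unmatched weeks away again. A small sketch of that pattern follows, run on SQLite through Python. The `weeks` and `slots` tables, with the year and week already split out into columns, are simplifications assumed for the example rather than your real views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weeks (yr INTEGER, wk INTEGER)")  # hypothetical calendar of weeks
conn.execute(
    "CREATE TABLE slots (slot_id INTEGER PRIMARY KEY, yr INTEGER, wk INTEGER, booked_flag TEXT)"
)

all_weeks = [(2017, w) for w in range(48, 53)] + [(2018, 1)]
conn.executemany("INSERT INTO weeks VALUES (?, ?)", all_weeks)

# Booked slots per week, mirroring the desired output in the question
booked = {(2017, 48): 10, (2017, 50): 4, (2017, 51): 2, (2018, 1): 5}
slot_rows = [(yr, wk, "Y") for (yr, wk), n in booked.items() for _ in range(n)]
slot_rows += [(2017, 49, "N")] * 3  # unbooked slots must not be counted
conn.executemany("INSERT INTO slots (yr, wk, booked_flag) VALUES (?, ?, ?)", slot_rows)

rows = conn.execute("""
SELECT w.yr, w.wk, COUNT(s.slot_id) AS booked   -- COUNT(column) skips the NULLs of unmatched weeks
FROM weeks w
LEFT JOIN slots s
       ON s.yr = w.yr
      AND s.wk = w.wk
      AND s.booked_flag = 'Y'                   -- filter in ON, not WHERE, to keep empty weeks
GROUP BY w.yr, w.wk
ORDER BY w.yr, w.wk
""").fetchall()

for row in rows:
    print(row)
```

This prints one row per week, including `(2017, 49, 0)` and `(2017, 52, 0)`. Moving `s.booked_flag = 'Y'` into a WHERE clause would silently drop those two rows.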

Splitting up group by with relevant aggregates beyond the basic ones?

I'm not sure if this has been asked before because I'm having trouble even asking it myself. I think the best way to explain my dilemma is to use an example.
Say I've rated my happiness on a scale of 1-10 every day for 10 years and I have the results in a big table where I have a single date correspond to a single integer value of my happiness rating. I say, though, that I only care about my happiness over 60 day periods on average (this may seem weird but this is a simplified example). So I wrap up this information to a table where I now have a start date field, an end date field, and an average rating field where the start days are every day from the first day to the last over all 10 years, but the end dates are exactly 60 days later. To be clear, these 60 day periods are overlapping (one would share 59 days with the next one, 58 with the next, and so on).
Next I pick a threshold rating, say 5, where I want to categorize everything below it into a "bad" category and everything above into a "good" category. I could easily add another field and use a case structure to give every 60-day range a "good" or "bad" flag.
Then to sum it up, I want to display the total periods of "good" and "bad" from maximum beginning to maximum end date. This is where I'm stuck. I could group by the good/bad category and then just take min(start date) and max(end date), but then if, say, the ranges go from good to bad to good then to bad again, output would show overlapping ranges of good and bad. In the aforementioned situation, I would want to show four different ranges.
I realize this may seem clearer to me than it would to someone else, so if you need clarification just ask.
Thank you
---EDIT---
Here's an example of what the before would look like:
StartDate| EndDate| MoodRating
------------+------------+------------
1/1/1991 |3/1/1991 | 7
1/2/1991 |3/2/1991 | 7
1/3/1991 |3/3/1991 | 4
1/4/1991 |3/4/1991 | 4
1/5/1991 |3/5/1991 | 7
1/6/1991 |3/6/1991 | 7
1/7/1991 |3/7/1991 | 4
1/8/1991 |3/8/1991 | 4
1/9/1991 |3/9/1991 | 4
And the after:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/2/1991 |good
1/3/1991|3/4/1991 |bad
1/5/1991|3/6/1991 |good
1/7/1991|3/9/1991 |bad
Currently my query with the group by rating would show:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/6/1991 |good
1/3/1991|3/9/1991 |bad
This is something along the lines of
select min(StartDate), max(EndDate), Good_Bad
from sourcetable
group by Good_Bad
While Jason A Long's answer may be correct - I can't read it or figure it out, so I figured I would post my own answer. Assuming that this isn't a process that you're going to be constantly running, the CURSOR's performance hit shouldn't matter. But (at least to me) this solution is very readable and can be easily modified.
In a nutshell - we insert the first record from your source table into our results table. Next, we grab the next record and see if the mood score is the same as the previous record. If it is, we simply update the previous record's end date with the current record's end date (extending the range). If not, we insert a new record. Rinse, repeat. Simple.
Here is your setup and some sample data:
DECLARE @MoodRanges TABLE (StartDate DATE, EndDate DATE, MoodRating int)
INSERT INTO @MoodRanges
VALUES
('1/1/1991','3/1/1991', 7),
('1/2/1991','3/2/1991', 7),
('1/3/1991','3/3/1991', 4),
('1/4/1991','3/4/1991', 4),
('1/5/1991','3/5/1991', 7),
('1/6/1991','3/6/1991', 7),
('1/7/1991','3/7/1991', 4),
('1/8/1991','3/8/1991', 4),
('1/9/1991','3/9/1991', 4)
Next, we can create a table to store our results, as well as some variable placeholders for our cursor:
DECLARE @MoodResults TABLE(ID INT IDENTITY(1, 1), StartDate DATE, EndDate DATE, MoodScore varchar(50))
DECLARE @CurrentStartDate DATE, @CurrentEndDate DATE, @CurrentMoodScore INT,
        @PreviousStartDate DATE, @PreviousEndDate DATE, @PreviousMoodScore INT
Now we put all of the sample data into our CURSOR:
DECLARE MoodCursor CURSOR FOR
    SELECT StartDate, EndDate, MoodRating
    FROM @MoodRanges
    ORDER BY StartDate -- the comparison with the previous row relies on date order
OPEN MoodCursor
FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @PreviousStartDate IS NOT NULL
    BEGIN
        IF (@PreviousMoodScore >= 5 AND @CurrentMoodScore >= 5)
        OR (@PreviousMoodScore < 5 AND @CurrentMoodScore < 5)
        BEGIN
            -- Same category as the previous row: extend the current range
            UPDATE @MoodResults
            SET EndDate = @CurrentEndDate
            WHERE ID = (SELECT MAX(ID) FROM @MoodResults)
        END
        ELSE
        BEGIN
            -- Category changed: start a new range
            INSERT INTO
            @MoodResults
            VALUES
            (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
        END
    END
    ELSE
    BEGIN
        -- First row: always starts a range
        INSERT INTO
        @MoodResults
        VALUES
        (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
    END
    SET @PreviousStartDate = @CurrentStartDate
    SET @PreviousEndDate = @CurrentEndDate
    SET @PreviousMoodScore = @CurrentMoodScore
    FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
END
CLOSE MoodCursor
DEALLOCATE MoodCursor
And here are the results:
SELECT * FROM @MoodResults
ID StartDate EndDate MoodScore
----------- ---------- ---------- --------------------------------------------------
1 1991-01-01 1991-03-02 GOOD
2 1991-01-03 1991-03-04 BAD
3 1991-01-05 1991-03-06 GOOD
4 1991-01-07 1991-03-09 BAD
Is this what you're looking for?
IF OBJECT_ID('tempdb..#MyDailyMood', 'U') IS NOT NULL
DROP TABLE #MyDailyMood;
CREATE TABLE #MyDailyMood (
TheDate DATE NOT NULL,
MoodLevel INT NOT NULL
);
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Calendar (dt) AS (
SELECT TOP (DATEDIFF(dd, '2007-01-01', '2017-01-01'))
DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '2007-01-01')
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT #MyDailyMood (TheDate, MoodLevel)
SELECT
c.dt,
ABS(CHECKSUM(NEWID()) % 10) + 1
FROM
cte_Calendar c;
--==========================================================
WITH
cte_AddRN AS (
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS (
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
)
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup;
Post OP update solution...
WITH
cte_AddRN AS ( -- Add a row number to each row that resets to 1 every 60 rows.
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS ( -- Use DENSE_RANK to create groups based on the RN added above.
-- How it works: RN sets the row number 1 - 60, then repeats itself,
-- but we don't want every 60th row grouped together. We want blocks of 60 consecutive rows grouped together.
-- DENSE_RANK accomplishes this by ranking within all the "1's", "2's"... and so on.
-- Verify with the following query... SELECT * FROM cte_AssignGroups ag ORDER BY ag.TheDate
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
),
cte_AggRange AS ( -- This is just a straightforward aggregation/rollup. It produces results similar to the sample data you posted in your edit.
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END,
ag.DateGroup
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup
),
cte_CompactGroup AS ( -- This time we're using dense rank to group all of the consecutive "Good" and "Bad" values so that they can be further aggregated below.
SELECT
ar.BegOfRange, ar.EndOfRange, ar.AverageMoodLevel, ar.GorB, ar.DateGroup,
DenseGroup = ar.DateGroup - DENSE_RANK() OVER (PARTITION BY ar.GorB ORDER BY ar.BegOfRange)
FROM
cte_AggRange ar
)
-- The final aggregation step...
SELECT
BegOfRange = MIN(cg.BegOfRange),
EndOfRange = MAX(cg.EndOfRange),
cg.GorB
FROM
cte_CompactGroup cg
GROUP BY
cg.DenseGroup,
cg.GorB
ORDER BY
BegOfRange;
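For anyone who wants to see the "islands" trick from the last two CTEs in isolation, here is a stripped-down sketch run directly against the nine sample rows from the question's edit. It uses the ROW_NUMBER difference variant of the same idea (an overall row number minus a per-category row number stays constant within each consecutive run). It runs on SQLite through Python, which needs SQLite 3.25 or later for window functions; the `mood_ranges` table name and the ISO date strings are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mood_ranges (start_date TEXT, end_date TEXT, rating INTEGER)")
ratings = [7, 7, 4, 4, 7, 7, 4, 4, 4]
conn.executemany(
    "INSERT INTO mood_ranges VALUES (?, ?, ?)",
    [(f"1991-01-{d:02d}", f"1991-03-{d:02d}", r) for d, r in enumerate(ratings, start=1)],
)

rows = conn.execute("""
WITH labelled AS (
    SELECT start_date, end_date,
           CASE WHEN rating >= 5 THEN 'good' ELSE 'bad' END AS label
    FROM mood_ranges
),
islands AS (
    -- The difference of the two row numbers is constant inside each consecutive run of a label
    SELECT start_date, end_date, label,
           ROW_NUMBER() OVER (ORDER BY start_date)
         - ROW_NUMBER() OVER (PARTITION BY label ORDER BY start_date) AS island
    FROM labelled
)
SELECT MIN(start_date) AS min_start, MAX(end_date) AS max_end, label
FROM islands
GROUP BY label, island
ORDER BY min_start
""").fetchall()

for row in rows:
    print(row)
# ('1991-01-01', '1991-03-02', 'good')
# ('1991-01-03', '1991-03-04', 'bad')
# ('1991-01-05', '1991-03-06', 'good')
# ('1991-01-07', '1991-03-09', 'bad')
```

That is the same four ranges as the "after" table in the question, without a cursor. Grouping by both `label` and `island` matters, because a good run and a bad run can happen to share the same difference value.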

SQL To Track Rules

I am using MS SQL Server and have a stored procedure where I evaluate transactions based on certain rules and mark each row as eligible or not based on these rules. For example, a transaction from prior year is ineligible, certain products may not be eligible.
I also want to record the reason why the transaction is ineligible. For example, from prior year, ineligible product, etc. I have a table that lists all ineligibility codes.
I apply rules sequentially and record the first reason for ineligibility in the field eligCode defined as int.
But I cannot seem to figure out how to code this in the stored procedure. Any help would be greatly appreciated. Thanks in advance.
This should give you a head start:
CREATE PROCEDURE [dbo].[YourSprocName]
AS
BEGIN
    SELECT CASE
            WHEN datepart(year, transaction_date) = datepart(year, getdate()) - 1 --if transaction_date year = previous year
                THEN 'Ineligible'
            WHEN product_type = < something > -- non eligible products
                THEN 'Non-eligible'
            ELSE 'Eligible'
            END AS transaction_status
        ,CASE
            WHEN ineligibility_code = 1 -- assuming 1 is one of the ineligibility codes
                THEN 'Bad transaction'
            ELSE 'Unknown'
            END AS ineligibility_reason_desc
    FROM YourTable
    WHERE yourColumn = < condition >
END
declare @eligCode as int
declare @TransID as int
....
select @TransID = NNN, -- some value declared above
       @eligCode = CASE
           WHEN YEAR(TRANSACTION_DATE) < YEAR(getdate()) THEN 1
           WHEN [CONDITION2] THEN 2
           WHEN [CONDITION3] THEN 3
           WHEN [CONDITION4] THEN 4
           ELSE 0
       END
...
Insert into LOG_TABLE (Transaction_ID, eligCode) values (@TransID, @eligCode)
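Since a CASE expression returns the result of the first WHEN that matches, listing the rules in priority order is enough to record only the first reason for ineligibility. The sketch below shows that, plus a join to a code table for the description. It runs on SQLite through Python, and every table, column and code value in it (`transactions`, `ineligible_products`, `elig_codes`, codes 0 to 2) is invented for the illustration. The current year is passed in as a parameter so the result does not depend on when it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER, tx_date TEXT, product TEXT);
CREATE TABLE ineligible_products (product TEXT);
CREATE TABLE elig_codes (code INTEGER, reason TEXT);      -- hypothetical lookup of ineligibility codes

INSERT INTO transactions VALUES
    (1, '2022-12-30', 'banned'),    -- breaks both rules; only the first one is recorded
    (2, '2023-03-01', 'banned'),
    (3, '2023-04-01', 'widget');
INSERT INTO ineligible_products VALUES ('banned');
INSERT INTO elig_codes VALUES
    (0, 'Eligible'),
    (1, 'Prior-year transaction'),
    (2, 'Ineligible product');
""")

rows = conn.execute("""
SELECT t.id, t.elig_code, e.reason
FROM (
    SELECT id,
           CASE   -- rules in priority order: the first matching WHEN wins
               WHEN CAST(strftime('%Y', tx_date) AS INTEGER) < :current_year THEN 1
               WHEN product IN (SELECT product FROM ineligible_products) THEN 2
               ELSE 0
           END AS elig_code
    FROM transactions
) t
JOIN elig_codes e ON e.code = t.elig_code
ORDER BY t.id
""", {"current_year": 2023}).fetchall()

for row in rows:
    print(row)
# (1, 1, 'Prior-year transaction')
# (2, 2, 'Ineligible product')
# (3, 0, 'Eligible')
```

Transaction 1 would also fail the product rule, but it gets code 1 because the date rule is listed first.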

SQL to find overlapping time periods and sub-faults

Long time stalker, first time poster (and SQL beginner). My question is similar to this one SQL to find time elapsed from multiple overlapping intervals, except I'm able to use CTE, UDFs etc and am looking for more detail.
On a piece of large scale equipment I have a record of all faults that arise. Faults can arise on different sub-components of the system, some may take it offline completely (complete outage = yes), while others do not (complete outage = no). Faults can overlap in time, and may not have end times if the fault has not yet been repaired.
Outage_ID  StartDateTime   EndDateTime     CompleteOutage
1          07:00 3-Jul-13  08:55 3-Jul-13  Yes
2          08:30 3-Jul-13  10:00 4-Jul-13  No
3          12:00 4-Jul-13                  No
4          12:30 4-Jul-13  12:35 4-Jul-13  No
1 |---------|
2 |---------|
3 |--------------------------------------------------------------
4 |---|
I need to be able to work out, for a user-defined time period, how long the total system is fully functional (no faults), how long it is degraded (one or more non-complete outages) and how long it is inoperable (one or more complete outages). I also need to be able to work out, for any given time period, which faults were on the system. I was thinking of creating a "State Change" table with a row any time a fault is opened or closed, but I am stuck on the best way to do this - any help on this or better solutions would be appreciated!
This isn't a complete solution (I leave that as an exercise :)) but should illustrate the basic technique. The trick is to create a state table (as you say). If you record a 1 for a "start" event and a -1 for an "end" event then a running total in event date/time order gives you the current state at that particular event date/time. The SQL below is T-SQL but should be easily adaptable to whatever database server you're using.
Using your data for partial outage as an example:
DECLARE @Faults TABLE (
    StartDateTime DATETIME NOT NULL,
    EndDateTime DATETIME NULL
)
INSERT INTO @Faults (StartDateTime, EndDateTime)
SELECT '2013-07-03 08:30', '2013-07-04 10:00'
UNION ALL SELECT '2013-07-04 12:00', NULL
UNION ALL SELECT '2013-07-04 12:30', '2013-07-04 12:35'
-- "Unpivot" the events and assign 1 to a start and -1 to an end
;WITH FaultEvents AS (
    SELECT *, Ord = ROW_NUMBER() OVER(ORDER BY EventDateTime)
    FROM (
        SELECT EventDateTime = StartDateTime, Evt = 1
        FROM @Faults
        UNION ALL SELECT EndDateTime, Evt = -1
        FROM @Faults
        WHERE EndDateTime IS NOT NULL
    ) X
)
-- Running total of Evt gives the current state at each date/time point
, FaultEventStates AS (
    SELECT A.Ord, A.EventDateTime, A.Evt, [State] = (SELECT SUM(B.Evt) FROM FaultEvents B WHERE B.Ord <= A.Ord)
    FROM FaultEvents A
)
SELECT StartDateTime = S.EventDateTime, EndDateTime = F.EventDateTime
FROM FaultEventStates S
OUTER APPLY (
    -- Find the nearest transition to the no-fault state
    SELECT TOP 1 *
    FROM FaultEventStates B
    WHERE B.[State] = 0
    AND B.Ord > S.Ord
    ORDER BY B.Ord
) F
-- Restrict to start events transitioning from the no-fault state
WHERE S.Evt = 1 AND S.[State] = 1
If you are using SQL Server 2012 then you have the option to calculate the running total using a windowing function.
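As a rough illustration of that windowed version, the running total can be written as SUM(...) OVER (ORDER BY ...) instead of the correlated subquery. The sketch below uses the same three partial-outage rows, but runs on SQLite (3.25 or later) through Python rather than SQL Server, and the `faults` table and column names are simplified stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE faults (start_time TEXT NOT NULL, end_time TEXT);
INSERT INTO faults VALUES
    ('2013-07-03 08:30', '2013-07-04 10:00'),
    ('2013-07-04 12:00', NULL),               -- still open
    ('2013-07-04 12:30', '2013-07-04 12:35');
""")

rows = conn.execute("""
WITH fault_events AS (
    -- +1 when a fault opens, -1 when it closes
    SELECT start_time AS event_time, 1 AS evt FROM faults
    UNION ALL
    SELECT end_time, -1 FROM faults WHERE end_time IS NOT NULL
)
SELECT event_time, evt,
       SUM(evt) OVER (ORDER BY event_time
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS state
FROM fault_events
ORDER BY event_time
""").fetchall()

for row in rows:
    print(row)
# ('2013-07-03 08:30', 1, 1)
# ('2013-07-04 10:00', -1, 0)
# ('2013-07-04 12:00', 1, 1)
# ('2013-07-04 12:30', 1, 2)
# ('2013-07-04 12:35', -1, 1)
```

The `state` column is the number of faults open after each event, so a value of 0 marks a return to the no-fault state. If two events can share the same timestamp you would need a tie-breaker in the ORDER BY, which this sketch does not handle.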
The below is a rough guide to getting this working. It will compare against an interval table of dates and an interval table of 15 mins. It will then sum the outage events (1 event per interval), but not sum a partial outage if there is a full outage.
You could use a more granular time interval if you needed; I chose 15 mins for speed of coding.
I already had a date interval table set up "CAL.t_Calendar" so you would need to create one of your own to run this code.
Please note, this does not represent actual code you should use. It is only intended as a demonstration and to point you in a possible direction...
EDIT I've just realised I haven't accounted for the null end dates. The code will need amending to check for NULL EndDateTime values and use @EndDate, or GETDATE() if @EndDate is in the future.
--drop table #Events
CREATE TABLE #Events (OUTAGE_ID INT IDENTITY(1,1) PRIMARY KEY
,StartDateTime datetime
,EndDateTime datetime
, completeOutage bit)
INSERT INTO #Events VALUES ('2013-07-03 07:00','2013-07-03 08:55',1),('2013-07-03 08:30','2013-07-04 10:00',0)
,('2013-07-04 12:00',NULL,0),('2013-07-04 12:30','2013-07-04 12:35',0)
--drop table #FiveMins
CREATE TABLE #FiveMins (ID int IDENTITY(1,1) PRIMARY KEY, TimeInterval Time)
DECLARE @Time INT = 0
WHILE @Time <= 1425 -- last 15-minute slot of the day starts at 23:45 (95 * 15 minutes)
BEGIN
    INSERT INTO #FiveMins SELECT DATEADD(MINUTE , @Time, '00:00')
    SET @Time = @Time + 15
END
SELECT * from #FiveMins
DECLARE @StartDate DATETIME = '2013-07-03'
DECLARE @EndDate DATETIME = '2013-07-04 23:59:59.997' -- .999 would round up to midnight of the next day in DATETIME
SELECT SUM(FullOutage) * 15 as MinutesFullOutage
,SUM(PartialOutage) * 15 as MinutesPartialOutage
,SUM(NoOutage) * 15 as MinutesNoOutage
FROM
(
SELECT DateAnc.EventDateTime
, CASE WHEN COUNT(OU.OUTAGE_ID) > 0 THEN 1 ELSE 0 END AS FullOutage
, CASE WHEN COUNT(OU.OUTAGE_ID) = 0 AND COUNT(pOU.OUTAGE_ID) > 0 THEN 1 ELSE 0 END AS PartialOutage
, CASE WHEN COUNT(OU.OUTAGE_ID) > 0 OR COUNT(pOU.OUTAGE_ID) > 0 THEN 0 ELSE 1 END AS NoOutage
FROM
(
SELECT CAL.calDate + MI.TimeInterval AS EventDateTime
FROM CAL.t_Calendar CAL
CROSS JOIN #FiveMins MI
WHERE CAL.calDate BETWEEN @StartDate AND @EndDate
) DateAnc
LEFT JOIN #Events OU
ON DateAnc.EventDateTime BETWEEN OU.StartDateTime AND OU.EndDateTime
AND OU.completeOutage = 1
LEFT JOIN #Events pOU
ON DateAnc.EventDateTime BETWEEN pOU.StartDateTime AND pOU.EndDateTime
AND pOU.completeOutage = 0
GROUP BY DateAnc.EventDateTime
) AllOutages