DAX running total based on 3 columns, one of which is a repeating integer running total - powerpivot

Very new to DAX/PowerPivot, and faced with devilishly tricky question on day one.
I have some data (90,000 rows) I'm trying to use to calculate a cumulative fatigue score for folk working shifts (using PowerPivot/Excel 2016). As per the screenshot below, the dataset is shift data for multiple employees. It has a cumulative count of days worked vs. days off that resets back to 1 whenever they switch from one state to the other, and a 'Score' column that in my production data contains a measure of how fatigued they are.
I would like to cumulatively sum that fatigue score, and reset it whenever they move between the 'Days worked' and 'Days off' states. My desired output is in the 'Desired' column far right, and I've used green highlighting to show days worked vs. days off as well as put a bold border around separate Emp_ID blocks to help demonstrate the data.
There is some similarity between my question and the SO post at DAX running total (or count) across 2 groups except that one of my columns (i.e. the Cumulative Days one) is in a repeating sequence from 1 to x. And Javier Guillén's post would probably make a good starting point if I'd had a couple of months of DAX under my belt, rather than the couple of hours I've gained today.
I can barely begin to conceptualize what the DAX would need to look like, given I'm a DAX newbie (my background is VBA, SQL, and Excel formulas). But lest someone berate me for not even providing a starting point, I tried to tweak the following DAX without really having a clue what I was doing:
Cumulative:=CALCULATE(
SUM( Shifts[Score] ) ,
FILTER(Shifts,Shifts[Cumulative Days] <= VALUES(Shifts[Cumulative Days] )) ,
ALLEXCEPT( shifts, Shifts[Workday],Shifts[EMP_ID] ) )
Now I'll be the first to admit that this code is the DAX equivalent of the Infinite Monkey Theorem. And alas, I have no bananas today, and my only hope is that someone finds this problem suitably a-peeling.

The problem with this table is that there is no way to determine when to stop summing while performing the cumulative total.
I think one way to achieve it is to calculate, for each row, the next date on which the continuous workday status changes.
For example, the workday status in the first three rows for EMP_ID 70073 is the same until the fourth row, dated 04-May, which is the date the workday status changes. My idea is to create a calculated column that finds the status change date for each workday series. That column lets us implement the cumulative sum.
Below is the expression for the calculated column I named Helper.
Helper =
IF (
    ISBLANK (
        CALCULATE (
            MIN ( [Date] ),
            FILTER (
                'Shifts',
                'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
                    && 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
                    && [Date] > EARLIER ( 'Shifts'[Date] )
            )
        )
    ),
    CALCULATE (
        MAX ( [Date] ),
        FILTER (
            Shifts,
            Shifts[Date] >= EARLIER ( Shifts[Date] )
                && Shifts[EMP_ID] = EARLIER ( Shifts[EMP_ID] )
        )
    )
        + 1,
    CALCULATE (
        MIN ( [Date] ),
        FILTER (
            'Shifts',
            'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
                && 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
                && [Date] > EARLIER ( 'Shifts'[Date] )
        )
    )
)
In short, the expression says: if the lookup for the next workday status change returns blank, use the last date for that EMP_ID plus one day.
Note that there is no way to calculate the change date for the last workday series (in this case the 08-May rows), so if the calculation returns blank it means the row belongs to the last series, and the expression returns the max date for that EMP_ID plus one day.
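For readers more at home outside DAX, the logic of the Helper column and the running total it enables can be sketched in Python. This is only an illustration of the idea, not the DAX itself: the sample rows, the year in the dates, and the function names `helper_date` and `cumulative_scores` are all made up for the example.

```python
from datetime import date, timedelta

# Hypothetical sample rows: (emp_id, date, workday flag, score).
shifts = [
    (70073, date(2016, 5, 1), 1, 2.0),
    (70073, date(2016, 5, 2), 1, 3.0),
    (70073, date(2016, 5, 3), 1, 1.0),
    (70073, date(2016, 5, 4), 0, 0.5),
    (70073, date(2016, 5, 5), 0, 0.5),
    (70073, date(2016, 5, 6), 1, 4.0),
]


def helper_date(row, rows):
    """Next date on which this employee's workday status differs.

    Falls back to the employee's last date plus one day when there is
    no later change (the row belongs to the last series).
    """
    emp, day, workday, _ = row
    changes = [d for e, d, w, _ in rows if e == emp and w != workday and d > day]
    if changes:
        return min(changes)
    last = max(d for e, d, _, _ in rows if e == emp and d >= day)
    return last + timedelta(days=1)


def cumulative_scores(rows):
    """Running total of score within each (emp_id, helper) block, by date."""
    totals = []
    for row in rows:
        emp, day, _, _ = row
        block = helper_date(row, rows)
        total = sum(
            other[3]
            for other in rows
            if other[0] == emp
            and other[1] <= day
            and helper_date(other, rows) == block
        )
        totals.append(total)
    return totals


result = cumulative_scores(shifts)
print(result)
```

Rows that share the same helper date belong to the same unbroken run of worked days or days off, which is what makes the reset possible.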
Once the calculated column is in the table you can use the following expression to create a measure for the cumulative value:
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER ( ALL ( 'Shifts'[Helper] ), [Helper] = MAX ( [Helper] ) ),
FILTER ( ALL ( 'Shifts'[Date] ), [Date] <= MAX ( [Date] ) )
)
In a table in Power BI (I won't have access to PowerPivot for at least eight hours) the result is this:
I think there is an easier solution. My first thought was to use a variable, but that is only supported in DAX 2015, and it is quite possible you are not using Excel 2016.
UPDATE: Leaving only one filter in the measure calculation. FILTER is an iterator over the whole table it receives, so using a single FILTER with logical operators could be more performant.
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER (
ALL ( 'Shifts'[Helper], Shifts[Date] ),
[Helper] = MAX ( [Helper] )
&& [Date] <= MAX ( [Date] )
)
)
UPDATE 2: Solution for pivot tables (matrix), since the previous expression worked only for a tabular visualization. The measure expression was also optimized to use only one filter.
This should be the final expression for a pivot table:
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER (
ALLSELECTED ( Shifts ),
[Helper] = MAX ( [Helper] )
&& [EMP_ID] = MAX ( Shifts[EMP_ID] )
&& [Date] <= MAX ( Shifts[Date] )
)
)
Note: If you want to ignore filters, use ALL instead of ALLSELECTED.
Results in Power BI Matrix:
Results in PowerPivot Pivot Table:
Let me know if this helps.

Related

Z-Score in SQL based on last 1 year

I have daily data structured in the below format. Please note this is just a subset of the data and I had to make some modifications to be able to share it.
The first column is the [DataValue] for which I need to find the Z-score by IndexValue, [Qualifier], [QualifierCode] and [QualifierType]. I also have the [Date] column in there.
I essentially need to find the Z-score value for each data point by IndexValue, [Qualifier], [QualifierCode] and [QualifierType]. The main point of focus here is that I have data for the last 3 years but in order to calculate Z-score, I only want to take the average and standard deviation for the last one year.
Z-Score = ([DataValue] - (Avg in last 1 year)) / (Std Dev in last 1 year)
I am struggling with how to get average for the last one year. Would anybody be able to help me with this?
SELECT [IndexValue]
,[Qualifier]
,[QualifierCode]
,[QualifierType],[Date]
,[Month]
,[Year]
,[Z-Score] = ([DataValue] - ROUND(AVG([DataValue]),3))/ ROUND(STDEV([DataValue]),3)
FROM [TABLEA]
GROUP BY [IndexValue]
,[Qualifier]
,[QualifierCode]
,[QualifierType]
,[Date]
,[Month]
,[Year]
order by [IndexValue]
,[Qualifier]
,[QualifierCode]
,[QualifierType]
,[Date] desc
You need window functions for this:
SELECT a.*,
( (DataValue - AVG(DataValue) OVER ()) /
STDEV(DataValue) OVER ()
) as z_score
FROM [TABLEA] a;
Note: if DataValue is an integer, you will need to convert it to a decimal number so the average is not truncated:
SELECT a.*,
( (DataValue - AVG(DataValue * 1.0) OVER ()) /
STDEV(DataValue) OVER ()
) as z_score
FROM [TABLEA] a;
Rounding for the calculation seems to be way off base, unless your intention is to produce a z-like score that isn't really a z-score.
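The queries above take the average and standard deviation over the whole table. To show what restricting those statistics to the last year, per group, would look like, here is a small Python sketch. It is an assumption-laden illustration rather than the SQL itself: the function name `z_scores_last_year`, the reference date `as_of`, the reading of "last one year" as the 365 days before `as_of`, and the sample rows are all hypothetical. It uses the sample standard deviation, which is what SQL Server's STDEV returns.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def z_scores_last_year(rows, as_of):
    """rows: list of (group_key, date, value).

    Returns {(group_key, date): z}, where the mean and sample standard
    deviation come only from rows dated within 365 days before as_of.
    """
    cutoff = as_of - timedelta(days=365)

    # Collect the last-year values for each group.
    recent = {}
    for key, day, value in rows:
        if cutoff < day <= as_of:
            recent.setdefault(key, []).append(value)

    # Score every row against its group's last-year statistics.
    scores = {}
    for key, day, value in rows:
        values = recent.get(key, [])
        if len(values) < 2:
            continue  # sample standard deviation is undefined
        sigma = stdev(values)
        if sigma == 0:
            continue  # avoid dividing by zero
        scores[(key, day)] = (value - mean(values)) / sigma
    return scores


# Hypothetical data: one group, three rows inside the window, one older row.
rows = [
    ("A", date(2015, 1, 1), 40.0),
    ("A", date(2017, 1, 10), 10.0),
    ("A", date(2017, 2, 10), 20.0),
    ("A", date(2017, 3, 10), 30.0),
]
scores = z_scores_last_year(rows, as_of=date(2017, 6, 1))
print(scores[("A", date(2015, 1, 1))])
```

In SQL the group key would correspond to the combination of IndexValue, Qualifier, QualifierCode and QualifierType.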

Next available Date

I have a tabular model that has a standard star schema.
On my dim date table there is a column that flags UK holidays.
If a key points to a date that has been flagged, I would like to use the next available date instead of the flagged one.
I don't have much access to the database to build a function for this, as I've seen others do.
Could anyone suggest some DAX or a method of doing this?
Thanks so much in advance.
sample
You can create a calculated column that returns the next working dateKey if the date is flagged as a non-working date. If the date is not flagged, the column contains the row's own dateKey value.
Use this DAX expression in the calculated column:
=
IF (
[isDefaultCalendarNonWorkingDay] = 1,
CALCULATE (
MIN ( [dateKey] ),
FILTER (
DimDate,
[dateKey] > EARLIER ( [dateKey] )
&& [isDefaultCalendarNonWorkingDay] = 0
)
),
[dateKey]
)
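The same lookup can be sketched in Python to make the row-by-row logic explicit. The function name `next_working_key` and the sample keys are hypothetical, and the sketch assumes dateKey values sort in calendar order (for example yyyymmdd integers).

```python
def next_working_key(date_key, all_keys, non_working):
    """Return date_key itself if it is a working day, otherwise the
    smallest later key that is not flagged as non-working.

    Returns None when no later working day exists, much as the DAX
    expression would return a blank.
    """
    if date_key not in non_working:
        return date_key
    later = [k for k in all_keys if k > date_key and k not in non_working]
    return min(later) if later else None


# Hypothetical date keys around a long holiday weekend.
all_keys = [20160324, 20160325, 20160326, 20160327, 20160328, 20160329]
non_working = {20160325, 20160326, 20160327, 20160328}

print(next_working_key(20160324, all_keys, non_working))
print(next_working_key(20160325, all_keys, non_working))
```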
I've recreated your DimDate table with some sample data:
Let me know if this helps.

DAX Time Intelligence custom previous periods

My cube has a fact table with a "Sales" column.
There is a related Date Table "SalesDate" (properly marked as a Date Table)
I created a measure for "average sales" called [AvgSales]
There is also a measure for "past year average sales"
[AvgSales] :=
AVERAGE([Sales])
[PY AvgSales] :=
IF (
HASONEVALUE ( 'SalesDate'[Date] ),
CALCULATE (
[AvgSales],
DATEADD ( 'SalesDate'[Date], -1, YEAR )
),
BLANK ()
)
This works beautifully, and I can slice it in Excel like this: SalesDate[Year] on rows, SalesDate[Month] on columns.
The task at hand is to write a "past 5 year average sales" measure.
It is important that this measure will also work properly if you slice like described above (years on rows, months on columns)
I've spent a lot of time on http://www.daxpatterns.com/time-patterns/ but I'm really confused how to approach this properly.
This might be a bit simplistic, but can't you just change the DATEADD function to -5 years?
[AvgSales] :=
AVERAGE([Sales])
[PY AvgSales] :=
IF (
HASONEVALUE ( 'SalesDate'[Date] ),
CALCULATE (
[AvgSales],
DATEADD ( 'SalesDate'[Date], -5, YEAR )
),
BLANK ()
)
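One caveat worth keeping in mind: DATEADD with -5 shifts the current dates back by five years, so the measure above returns the average for the single period five years earlier rather than an average across the five preceding years. If the latter is what is wanted, the intended arithmetic can be sketched in Python as follows. The function name `past_years_avg`, the sample data, and the choice to pool all rows from the five preceding years (rather than averaging the yearly averages) are assumptions made for illustration.

```python
def past_years_avg(sales, year, month, years_back=5):
    """sales: list of (year, month, amount).

    Average of all rows for the given month across the years_back
    calendar years preceding `year`; None if there are no such rows.
    """
    window = [
        amount
        for y, m, amount in sales
        if m == month and year - years_back <= y < year
    ]
    return sum(window) / len(window) if window else None


# Hypothetical sales rows.
sales = [
    (2010, 1, 100.0),
    (2011, 1, 200.0),
    (2012, 1, 300.0),
    (2015, 1, 999.0),  # current year, excluded from the window
    (2011, 2, 50.0),
]

print(past_years_avg(sales, 2015, 1))
```

With years on rows and months on columns, each pivot cell would correspond to one (year, month) pair passed to this function.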

Combine two DAX-measures in Power Pivot

I have two calculated measures, one that calculates number of new customers for each month and another that calculates transaction values for each month. Is it possible to combine the two measures to, for example, calculate the transaction value but only for the new customers?
Okay, I think I solved it. I may have been unclear about what I was doing, and I was thinking in the wrong direction. I was using Marco Russo's formula for new customers, and instead of counting rows for customers I simply changed it to calculate transaction value. Now it looks something like this:
Test:=CALCULATE([Transaction Value SEK]; (
FILTER (
ADDCOLUMNS (
VALUES ( MonthlyStatistics[Mid] );
"PreviousSales"; CALCULATE (
COUNTROWS (MonthlyStatistics);
FILTER (
ALL ( 'Date' );
'Date'[Date] < MIN ( 'Date'[Date] )
)
)
);
[PreviousSales] = 0
)
))
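To spell out what the measure computes, here is a rough Python equivalent. It assumes that Mid identifies a customer and that a "new" customer in a month is one with no rows in any earlier month; the function name `new_customer_value`, the integer month index, and the sample rows are hypothetical.

```python
def new_customer_value(rows, month):
    """rows: list of (customer_id, month_index, transaction_value).

    Sums the transaction value in `month` for customers that have no
    rows in any earlier month.
    """
    seen_before = {cust for cust, m, _ in rows if m < month}
    return sum(
        value
        for cust, m, value in rows
        if m == month and cust not in seen_before
    )


# Hypothetical monthly statistics.
rows = [
    ("a", 1, 10.0),
    ("c", 1, 1.0),
    ("a", 2, 5.0),
    ("b", 2, 7.0),
    ("c", 2, 3.0),
]

print(new_customer_value(rows, 2))
```

The inner FILTER on the date table plays the role of `seen_before`, and the outer CALCULATE plays the role of the final sum.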

Powerpivot-Flag first occurence depending on what user filters

In power pivot, I am trying to figure out how to tag the first occurrence of a visit, based upon what a user filters. For example, if they are looking at calendar year 2014 below is the data. Distinctcount works if you don't care about the time period in which the first count occurs.
If a user filters to March 2014 only, they would see the following:
This seems tricky at first, but can be done very easily with DAX:
=
IF (
CALCULATE (
MIN ( Visits[VisitID] ),
ALL ( Visits[VisitID] ),
ALL ( Visits[AdmitDate] )
)
- MAX ( [VisitID] )
= 0,
1,
0
)
What this does is very straightforward: it removes the filter on both VisitID and AdmitDate, and by doing so it calculates the minimum VisitID for every single Patient ID. Then it subtracts the MAX of VisitID for a given row. If the difference equals 0 (meaning this is the first visit), the value is set to 1; otherwise it is set to 0.
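The comparison described above can be sketched in Python. This is only a model of the logic, assuming that visit IDs increase over time and that the user's filter is a month selection; the function name `first_visit_flags` and the sample data are hypothetical.

```python
def first_visit_flags(visits, month=None):
    """visits: list of (patient_id, visit_id, admit_month).

    Returns {visit_id: 1 or 0} for the visits that pass the optional
    month filter; 1 marks each patient's lowest visible visit ID.
    """
    visible = [v for v in visits if month is None or v[2] == month]

    # Lowest visit ID per patient among the visible rows.
    first = {}
    for patient, visit, _ in visible:
        first[patient] = min(visit, first.get(patient, visit))

    return {visit: 1 if visit == first[patient] else 0
            for patient, visit, _ in visible}


# Hypothetical visits: patient 1 has one visit in January and two in March.
visits = [
    (1, 100, 1),
    (1, 101, 3),
    (1, 102, 3),
    (2, 200, 3),
]

print(first_visit_flags(visits))
print(first_visit_flags(visits, month=3))
```

Because the minimum is taken only over rows that pass the filter, the flag moves to a later visit when earlier ones are filtered out.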
I have named this measure Check and if you then add it to your table, the result should look like this:
Works well with filtering too (in this case Filter is set on Month = 3):
Alternative approach using RANKX in case of multiple columns
Also, RANKX could be used to achieve this - it seems to be a bit more flexible solution, however I am not sure what would be the performance in a very large dataset.
=
IF (
HASONEVALUE ( Visits[PatientID] ),
IF (
RANKX (
FILTER (
ALLSELECTED ( Visits ),
Visits[PatientID] = MAX ( Visits[PatientID] )
),
[MIN Visit],
,
1,
DENSE
)
= 1,
1,
0
),
DISTINCTCOUNT ( Visits[PatientID] )
)
This works perfectly even if you filter any column. It's a bit complex to understand, but play around a bit with it and also check the linked documentation. What the formula basically does is a dynamic RANK across the group that is determined by [PatientID].
The Visit ID is the key item being ranked - you have to use a new measure which I named MIN Visit:
=MIN([VisitID])
The very first IF checks whether the current row is a Total Row, and if so it performs a different calculation to get the total of patients with first visits (by counting distinct values of Patient ID).
I have updated the source Excel file as well. Here is the link (2013 version).
Hope this helps.
It depends on what you want to ultimately see.
If you just want the first visit date, you would add a calculated field such as:
CALCULATE( FIRSTDATE( 'Date'[Date]), FILTER( Fact, Fact[AdmitDate] >= MIN( 'Date'[Date] ) ) )
If you want to count the number of patients by date where it was their first visit, based on how the data is sliced, that gets much more complicated.