I have a calculated member which represents an active customer. That would be the following;
WITH MEMBER [Measures].[Active Customers] AS
Count ( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )
This works great, when I want to get active customers in the current period and previous ones, as I get my time dimension, and use the CurrentMember, CurrentMember.PrevMember and CurrentMember with the Lag function in order to get customers who were active in previous periods.
My problem is when I want to get the count of customers, who are common in different members. Say I want to get customers who are active in the current period, and NOT in the previous period. Or another case, active in current, and active in previous. Because of this, I would need to use the INTERSECT function, and my customer dimension has 4 million records. This is already a subset of 9 million records.
So when checking for a customer who is active in 2 consecutive periods, I do this (The Active Previous Period, and Active Current Period is basically the calculated member above, however with CurrentMember and CurrentMember.PrevMember) :
set [Previous Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
[Measures].[Active Previous Period] > 0
)
set [Current Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Current Period] ),
[Measures].[Active Current Period] > 0
)
member [Measures].[Active 2 consecutive periods] as
count(INTERSECT([Current Active Customers Set],[Previous Active Customers Set]) )
This takes forever. Is there anyway to improve, or go around this performance problem of using the INTERSECT with large sets? Or maybe optimizations on the MDX query? I tried always using a subset of my customers dimension, but this only reduced the number of records to less than 4 million - so it's still large. Any help would be appreciated!
I would assume you can speed this up if you avoid using named sets and calculated members as far as possible.
One step towards this would be as follows: Create a new fact table with foreign keys just to your customer and time dimension, and add a record to it if a customer was active on that day. Build a measure group, let's say "activeCustomers" based on this table, just using "count" as the measure. But make this invisible, as we do not need it.
Then, you can replace
count( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )
with
count( Exists(
[Customer].[Customer Key].Members,
<state your time selection here>,
"activeCustomers"
) )
Exists should be more efficient than Filter.
Another optimization approach could be the observation that instead of intersecting two sets generated via Filter, you could define one set with a more complex filter, avoiding that AS is looping along the customers twice, and then intersecting the results:
set [Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
[Measures].[Active Previous Period] > 0
AND
[Measures].[Active Current Period] > 0
)
Related
In a cube that contains memberships of a club, I have a column MembersInOut in my fact-table which holds when a member joined the club (Value = 1) and leaving (value = -1). The Club started jan 1. 2000. so no members before that date.
Now to know the current number of members on a specific date I can do this:
CREATE MEMBER CURRENTCUBE.[Measures].[Calculated MembersOfTheClub]
AS
Sum(
{[Date Dim].[Date].&[2000-01-01T00:00:00]:
[Date Dim].[Date].currentmember},
[Measures].[MembersInOut]
)
This works fine on the actuel date, but how to make this work on a date hierarchie [Year-Month-day] ?
Thanks
You could create Y-M-D hierarchy, then use expression like below
with member[Measures].[S1] AS
sum(
{NULL:[Date].[Calendar Date].CurrentMember}
, [Measures].[Internet Sales Count])
select nonempty ([Date].[Calendar Date].members) on rows, nonempty ({[Measures].[S1],[Measures].[Internet Sales Count]}) on columns from [Analysis Services Tutorial]
Zoe
Very new to DAX/PowerPivot, and faced with devilishly tricky question on day one.
I have some data (90,000 rows) I'm trying to use to calculate a cumulative fatigue score for folk working shifts(using PowerPivot/Excel 2016). As per the below screenshot, the dataset is shift data for multiple employees, that has a cumulative count of days worked vs. days off that resets back to 1 whenever they switch from one state to the other, and a 'Score' column that in my production data contains a measure of how fatigued they are.
I would like to cumulatively sum that fatigue score, and reset it whenever they move between the 'Days worked' and 'Days off' states. My desired output is in the 'Desired' column far right, and I've used green highlighting to show days worked vs. days off as well as put a bold border around separate Emp_ID blocks to help demonstrate the data.
There is some similarity between my question and the SO post at DAX running total (or count) across 2 groups except that one of my columns (i.e. the Cumulative Days one) is in a repeating sequence from 1 to x. And Javier Guillén's post would probably make a good starting point if I'd had a couple of months of DAX under my belt, rather than the couple of hours I've gained today.
I can barely begin to conceptualize what the DAX would need to look like, given I'm a DAX newbie (my background is VBA, SQL, and Excel formulas). But lest someone berate me for not even providing a starting point, I tried to tweak the following DAX without really having a clue what I was doing:
Cumulative:=CALCULATE(
SUM( Shifts[Score] ) ,
FILTER(Shifts,Shifts[Cumulative Days] <= VALUES(Shifts[Cumulative Days] )) ,
ALLEXCEPT( shifts, Shifts[Workday],Shifts[EMP_ID] ) )
Now I'll be the first to admit that this code is DAX equivelant of the Infinite Monkey Theorem. And alas, I have no bananas today, and my only hope is that someone finds this problem suitably a-peeling.
The problem with this table is there is no way to determine when stop summing while performing the cumulative total.
I think one way to achive it could be calculating the next first date where continuous workday status changes.
For example the workday status in the first three rows for EMP_ID 70073 are the same, until the fourth row, date 04-May which is the date the workday status changes. My idea is to create a calculated column that find the status change date for each workday serie. That column lets us implement the cumulative sum.
Below is the expression for the calculated column I named Helper.
Helper =
IF (
ISBLANK (
CALCULATE (
MIN ( [Date] ),
FILTER (
'Shifts',
'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
&& 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
&& [Date] > EARLIER ( 'Shifts'[Date] )
)
)
),
CALCULATE (
MAX ( [Date] ),
FILTER (
Shifts,
Shifts[Date] >= EARLIER ( Shifts[Date] )
&& Shifts[EMP_ID] = EARLIER ( Shifts[EMP_ID] )
)
)
+ 1,
CALCULATE (
MIN ( [Date] ),
FILTER (
'Shifts',
'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
&& 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
&& [Date] > EARLIER ( 'Shifts'[Date] )
)
)
)
In short, the expression says if the date calculation for the current workday series change returns a blank use the last date for that EMP_ID ading one date.
Note there is no way to calculate the change date for the last workday serie, in this case 08-May rows, so if the the calculation returns blank it means it is being evaluated in the last serie then my expression should return the max date for that EMP_ID adding one day.
Once the calculated column is in the table you can use the following expression to create a measure for the cumulative value:
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER ( ALL ( 'Shifts'[Helper] ), [Helper] = MAX ( [Helper] ) ),
FILTER ( ALL ( 'Shifts'[Date] ), [Date] <= MAX ( [Date] ) )
)
In a table in Power BI (I have no access to PowerPivot at least eight hours) the result is this:
I think there is an easier solution, my first thought was using a variable, but that is only supported in DAX 2015, it is quite possible you are not using Excel 2016.
UPDATE: Leaving only one filter in the measure calculation. FILTER are iterators through the entire table, so using only one filter and logic operators could be more performant.
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER (
ALL ( 'Shifts'[Helper], Shifts[Date] ),
[Helper] = MAX ( [Helper] )
&& [Date] <= MAX ( [Date] )
)
)
UPDATE 2: Solution for pivot tables (matrix), since previous expression worked only for a tabular visualization. Also measure expression was optimized to implement only one filter.
This should be the final expression for pivot table:
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER (
ALLSELECTED ( Shifts ),
[Helper] = MAX ( [Helper] )
&& [EMP_ID] = MAX ( Shifts[EMP_ID] )
&& [Date] <= MAX ( Shifts[Date] )
)
)
Note: If you want to ignore filters use ALL instead of
ALLSELECTED.
Results in Power BI Matrix:
Results in PowerPivot Pivot Table:
Let me know if this helps.
In power pivot, I am trying to figure out how to tag the first occurrence of a visit, based upon what a user filters. For example, if they are looking at calendar year 2014 below is the data. Distinctcount works if you don't care about the time period in which the first count occurs.
If a user filters to March 2014 only, the would see
the following:
this seems tricky at first, but can be done very easily with DAX:
=
IF (
CALCULATE (
MIN ( Visits[VisitID] ),
ALL ( Visits[VisitID] ),
ALL ( Visits[AdmitDate] )
)
- MAX ( [VisitID] )
= 0,
1,
0
)
What this does is very straightforward - it removes the filter on both VisitID and AdmitDate, and by doing so it calculates the minimum for every single Patient ID. Then it subtracts MAX of VisitID for a given row. If the difference equals to 0 (that means this is the first visit), then the value is set to 1, otherwise the value is set to 0.
I have named this measure Check and if you then add it to your table, the result should look like this:
Works well with filtering too (in this case Filter is set on Month = 3):
Alternative approach using RANKX in case of multiple columns
Also, RANKX could be used to achieve this - it seems to be a bit more flexible solution, however I am not sure what would be the performance in a very large dataset.
=
IF (
HASONEVALUE ( Visits[PatientID] ),
IF (
RANKX (
FILTER (
ALLSELECTED ( Visits ),
Visits[PatientID] = MAX ( Visits[PatientID] )
),
[MIN Visit],
,
1,
DENSE
)
= 1,
1,
0
),
DISTINCTCOUNT ( Visits[PatientID] )
)
This works perfectly even if you filter any column. It's a bit complex to understand, but play around a bit with it and also check the linked documentation. What the formula basically does is dynamic RANK across group that is determined by [PatiendID].
The Visit ID is the key item being ranked - you have to use a new measure which I named MIN Visit:
=MIN([VisitID])
The very first IF checks, if the current row is a Total Row, and if so it performs a different calculation to get the total of patients with first visits (by counting distinct values of Patiend ID).
I have updated the source Excel file as well. Here is the link (2013 version).
Hope this helps.
It depends on what you want to ultimately see.
If you just want the first visit date, you would add a calculated field such as:
CALCULATE( FIRSTDATE( 'Date'[Date]), FILTER( Fact, Fact[AdmitDate] >= MIN( 'Date'[Date] ) ) )
IF you want to count the number of patients by date where it was their first visit, based on how the data is sliced, that gets much more complicated.
I'm a newcomer to SQL MDX and don't know exactly how to achieve this.
I need to get data from my cube for the last X days from the last available data.
The following is my code:
SELECT { [Measures].[Fact Stays Count], [Measures].[Time Spent] } ON COLUMNS,
NON EMPTY { ( [Dim Locals].[Local Description].[Local Description].ALLMEMBERS * [FK Date].[Date].[Date] ) } ON ROWS
FROM
(
select { TAIL(FILTER([FK Date].[Date].MEMBERS, NOT ISEMPTY([FK Date].[Date].CURRENTMEMBER)),30) } ON COLUMNS
FROM (
SELECT ( STRTOSET(#userId, CONSTRAINED) ) ON COLUMNS
FROM [DW]
)
)
The problem is the query returns the last 30 days where data exists, not the last 30 consecutive calendar days.
How can I change the query to get the results I want?
Try this. The only thing I changed is the select with the dates in it. Instead of asking for the last 30 days where there is data for the measures, I'm asking for the last day where there is data for the measures, getting that last item and then doing the lag of 29 days for the beginning of the date range and then without the lag (to the last day with data) for the end of the date range.
SELECT { [Measures].[Fact Stays Count], [Measures].[Time Spent] } ON COLUMNS,
NON EMPTY { ( [Dim Locals].[Local Description].[Local Description].ALLMEMBERS * [FK Date].[Date].[Date] ) } ON ROWS
FROM
(
select { TAIL(FILTER([FK Date].[Date].MEMBERS, NOT ISEMPTY([FK Date].[Date].CURRENTMEMBER)),1).item(0).lag(29): TAIL(FILTER([FK Date].[Date].MEMBERS, NOT ISEMPTY([FK Date].[Date].CURRENTMEMBER)),1).itm(0)} ON COLUMNS
FROM (
SELECT ( STRTOSET(#userId, CONSTRAINED) ) ON COLUMNS
FROM [DW]
)
)
Be aware that the way you have the query now will return the last day where there is data for both measures. If those two measures don't line up it might not provide what you want. For instance, if there is data through Dec 30 2013 on Fact Stays Count and data through Jan 5 2014 on Time Spent, it would return Dec 30 2013. If you want it to depend on both measures, you are good. If you want it to depend on one measures, you can switch it to be something like the below instead.
Tail(Filter([FK Date].[Date].[Date].MEMBERS.MEMBERS, [Measures].[Fact Stays Count] >0),1).item(0)
I'd like to get the product with zero sales over 30 days. E.g. Below is my expected result:
Store,Product,Days
Store1, product1, 33
Store1, product2, 100
Store2, product5, 96
Store34, product14, 78
Store100, product9, 47
So I wrote below query:
WITH
MEMBER [Measures].[Zero Sales Days]
AS
COUNT(
FILTER(
NONEMPTY( [Calendar].[Date].[Day],[Measures].[POS Qty])
, ( [Measures].[POS Qty]=0)
)
)
SELECT
([Store].[Store].[Store],[product].[product].[product]) on 1,
([MEASURES].[Zero Sales Days]) ON 0
FROM [testcube]
The problem is: How to filter the case: days of zero sales<30
Thanks,
Nia
I did some change and then ran against my DB.
I got nothing if I added the where cause. If not, the result is '#Error'.
I need not select any time related dimension. What I want to do for the report is: select store and product dimension, and define a calculated measure to get the count. Boyan, I will be really appreciated it if you can need the detailed the query for it.
The function LastPeriods is what you're looking for:
WITH
MEMBER [Measures].[Zero Sales Days]
AS COUNT(
FILTER([Calendar].[Date].[Day],
SUM( LastPeriods(30, [Calendar].[Date].currentmember),[Measures].[POS Qty])
= 0 )
)
SELECT
([Store].[Store].[Store],[product].[product].[product]) on 1,
([MEASURES].[Zero Sales Days]) ON 0
FROM [testcube]
The following query works against Adventure Works and shows you the products with no sales for over 30 days from the date in the WHERE clause back:
WITH
MEMBER [Measures].[Number of Periods With No Sales] AS
Iif(([Date].[Date].CurrentMember, [Measures].[Internet Sales Amount])=0,
([Date].[Date].PrevMember, [Measures].[Number of Periods With No Sales])+1,
NULL
)
MEMBER [Measures].[Number of > 30 Periods With No Sales] AS
Sum(
Iif([Measures].[Number of Periods With No Sales] > 30,
[Measures].[Number of Periods With No Sales],
NULL
)
)
SELECT
{
[Measures].[Number of > 30 Periods With No Sales]
} ON 0,
NON EMPTY {
[Product].[Product Categories].[Product]
} ON 1
FROM [Adventure Works]
WHERE [Date].[Calendar].[Date].&[860]
You will need to re-work it (change the dimension/measure names) to get it to work against your db. Please let me know if you need a query which can give you all products regardless of the date, which have at least one period with more than 30 days with no sales (e.g. max period with no sales, or an arbitrary such period). This will require a few changes. Also, since the query is using recursion it may be slow - if it is too slow we can see how to improve its performance - something which may require changes to your data model to support this bit of analytics.