How to get data from last days - sql

I'm a newcomer to SQL MDX and don't know exactly how to achieve this.
I need to get data from my cube for the last X days from the last available data.
The following is my code:
SELECT { [Measures].[Fact Stays Count], [Measures].[Time Spent] } ON COLUMNS,
NON EMPTY { ( [Dim Locals].[Local Description].[Local Description].ALLMEMBERS * [FK Date].[Date].[Date] ) } ON ROWS
FROM
(
select { TAIL(FILTER([FK Date].[Date].MEMBERS, NOT ISEMPTY([FK Date].[Date].CURRENTMEMBER)),30) } ON COLUMNS
FROM (
SELECT ( STRTOSET(#userId, CONSTRAINED) ) ON COLUMNS
FROM [DW]
)
)
The problem is the query returns the last 30 days where data exists, not the last 30 consecutive calendar days.
How can I change the query to get the results I want?

Try this. The only thing I changed is the subselect that contains the dates. Instead of asking for the last 30 days that have data for the measures, I ask for the last day that has data, take that member with .item(0), and apply .lag(29) to it for the start of the date range. The same member without the lag (the last day with data) is the end of the date range.
SELECT { [Measures].[Fact Stays Count], [Measures].[Time Spent] } ON COLUMNS,
NON EMPTY { ( [Dim Locals].[Local Description].[Local Description].ALLMEMBERS * [FK Date].[Date].[Date] ) } ON ROWS
FROM
(
select { TAIL(FILTER([FK Date].[Date].MEMBERS, NOT ISEMPTY([FK Date].[Date].CURRENTMEMBER)),1).item(0).lag(29) : TAIL(FILTER([FK Date].[Date].MEMBERS, NOT ISEMPTY([FK Date].[Date].CURRENTMEMBER)),1).item(0) } ON COLUMNS
FROM (
SELECT ( STRTOSET(#userId, CONSTRAINED) ) ON COLUMNS
FROM [DW]
)
)
Be aware that the query as written will return the last day where there is data for both measures. If those two measures don't line up, it might not provide what you want. For instance, if there is data through Dec 30 2013 on Fact Stays Count and data through Jan 5 2014 on Time Spent, it would return Dec 30 2013. If you want it to depend on both measures, you are good. If you want it to depend on one measure, you can switch it to something like the below instead.
Tail(Filter([FK Date].[Date].[Date].MEMBERS, [Measures].[Fact Stays Count] > 0),1).item(0)
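To make the difference between the two date windows concrete, here is a minimal Python sketch of the same logic outside MDX. The function names and the sample data are hypothetical and only illustrate the idea: the original query keeps the last N days that have data, while the revised one keeps N consecutive calendar days ending at the last day with data.

```python
from datetime import date, timedelta

def last_n_days_with_data(values_by_day, n):
    """Last n days that have a value (what TAIL(FILTER(...), n) selects)."""
    days_with_data = sorted(d for d, v in values_by_day.items() if v is not None)
    return days_with_data[-n:]

def last_n_calendar_days(values_by_day, n):
    """n consecutive calendar days ending at the last day with a value
    (the lag(n - 1) : last-day range)."""
    last_day = max(d for d, v in values_by_day.items() if v is not None)
    return [last_day - timedelta(days=offset) for offset in range(n - 1, -1, -1)]

# Hypothetical data: values on 1, 2 and 5 Jan, nothing on 3 and 4 Jan.
sample = {
    date(2014, 1, 1): 4,
    date(2014, 1, 2): 7,
    date(2014, 1, 3): None,
    date(2014, 1, 4): None,
    date(2014, 1, 5): 2,
}

non_empty_window = last_n_days_with_data(sample, 3)  # 1, 2 and 5 Jan
calendar_window = last_n_calendar_days(sample, 3)    # 3, 4 and 5 Jan
```

With gaps in the data the two windows differ, which is the behaviour described in the question.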

DAX running total based on 3 columns, one of which is a repeating integer running total

Very new to DAX/PowerPivot, and faced with devilishly tricky question on day one.
I have some data (90,000 rows) I'm trying to use to calculate a cumulative fatigue score for folk working shifts(using PowerPivot/Excel 2016). As per the below screenshot, the dataset is shift data for multiple employees, that has a cumulative count of days worked vs. days off that resets back to 1 whenever they switch from one state to the other, and a 'Score' column that in my production data contains a measure of how fatigued they are.
I would like to cumulatively sum that fatigue score, and reset it whenever they move between the 'Days worked' and 'Days off' states. My desired output is in the 'Desired' column far right, and I've used green highlighting to show days worked vs. days off as well as put a bold border around separate Emp_ID blocks to help demonstrate the data.
There is some similarity between my question and the SO post at DAX running total (or count) across 2 groups except that one of my columns (i.e. the Cumulative Days one) is in a repeating sequence from 1 to x. And Javier Guillén's post would probably make a good starting point if I'd had a couple of months of DAX under my belt, rather than the couple of hours I've gained today.
I can barely begin to conceptualize what the DAX would need to look like, given I'm a DAX newbie (my background is VBA, SQL, and Excel formulas). But lest someone berate me for not even providing a starting point, I tried to tweak the following DAX without really having a clue what I was doing:
Cumulative:=CALCULATE(
SUM( Shifts[Score] ) ,
FILTER(Shifts,Shifts[Cumulative Days] <= VALUES(Shifts[Cumulative Days] )) ,
ALLEXCEPT( shifts, Shifts[Workday],Shifts[EMP_ID] ) )
Now I'll be the first to admit that this code is the DAX equivalent of the Infinite Monkey Theorem. And alas, I have no bananas today, and my only hope is that someone finds this problem suitably a-peeling.
The problem with this table is that there is no way to determine when to stop summing while performing the cumulative total.
I think one way to achieve it is to calculate the next date on which the continuous workday status changes.
For example, the workday status in the first three rows for EMP_ID 70073 is the same until the fourth row, dated 04-May, which is the date the workday status changes. My idea is to create a calculated column that finds the status-change date for each workday series. That column lets us implement the cumulative sum.
Below is the expression for the calculated column I named Helper.
Helper =
IF (
    ISBLANK (
        CALCULATE (
            MIN ( [Date] ),
            FILTER (
                'Shifts',
                'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
                    && 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
                    && [Date] > EARLIER ( 'Shifts'[Date] )
            )
        )
    ),
    CALCULATE (
        MAX ( [Date] ),
        FILTER (
            Shifts,
            Shifts[Date] >= EARLIER ( Shifts[Date] )
                && Shifts[EMP_ID] = EARLIER ( Shifts[EMP_ID] )
        )
    )
        + 1,
    CALCULATE (
        MIN ( [Date] ),
        FILTER (
            'Shifts',
            'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
                && 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
                && [Date] > EARLIER ( 'Shifts'[Date] )
        )
    )
)
In short, the expression says: if the status-change date for the current workday series comes back blank, use the last date for that EMP_ID plus one day.
Note that there is no way to calculate the change date for the last workday series (in this case the 08-May rows), so if the calculation returns blank, the row belongs to the last series and the expression returns the max date for that EMP_ID plus one day.
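For readers who find the row-context logic easier to follow outside DAX, the Helper idea can be sketched in Python. This is only an illustration under assumptions: the row layout (dicts with emp_id, date, workday, score), the function names and the sample rows are hypothetical and not taken from the original data.

```python
from datetime import date, timedelta

def add_helper(rows):
    """Give each row a 'helper' date: the first later date on which the same
    employee's workday status differs, or that employee's last date plus one
    day when no later change exists (the last series)."""
    for row in rows:
        same_emp = [r for r in rows if r["emp_id"] == row["emp_id"]]
        change_dates = [
            r["date"] for r in same_emp
            if r["workday"] != row["workday"] and r["date"] > row["date"]
        ]
        if change_dates:
            row["helper"] = min(change_dates)
        else:
            row["helper"] = max(r["date"] for r in same_emp) + timedelta(days=1)
    return rows

def cumulative_score(rows, row):
    """Running total of score within the row's series (same employee, same helper)."""
    return sum(
        r["score"] for r in rows
        if r["emp_id"] == row["emp_id"]
        and r["helper"] == row["helper"]
        and r["date"] <= row["date"]
    )

# Hypothetical shifts: three days worked, two days off, one day worked.
shifts = add_helper([
    {"emp_id": 70073, "date": date(2016, 5, 1), "workday": True, "score": 1},
    {"emp_id": 70073, "date": date(2016, 5, 2), "workday": True, "score": 2},
    {"emp_id": 70073, "date": date(2016, 5, 3), "workday": True, "score": 3},
    {"emp_id": 70073, "date": date(2016, 5, 4), "workday": False, "score": 4},
    {"emp_id": 70073, "date": date(2016, 5, 5), "workday": False, "score": 5},
    {"emp_id": 70073, "date": date(2016, 5, 6), "workday": True, "score": 6},
])
totals = [cumulative_score(shifts, r) for r in shifts]
```

Every row in the same unbroken run shares one helper date, so the helper acts as a group key and the running total restarts whenever the status flips.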
Once the calculated column is in the table you can use the following expression to create a measure for the cumulative value:
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER ( ALL ( 'Shifts'[Helper] ), [Helper] = MAX ( [Helper] ) ),
FILTER ( ALL ( 'Shifts'[Date] ), [Date] <= MAX ( [Date] ) )
)
In a table in Power BI (I have no access to PowerPivot for at least eight hours) the result is this:
I think there is an easier solution. My first thought was to use a variable, but variables are only supported in DAX 2015, and it is quite possible you are not using Excel 2016.
UPDATE: I left only one filter in the measure calculation. FILTER iterates over the entire table it is given, so using a single FILTER with logical operators could be more performant.
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER (
ALL ( 'Shifts'[Helper], Shifts[Date] ),
[Helper] = MAX ( [Helper] )
&& [Date] <= MAX ( [Date] )
)
)
UPDATE 2: Solution for pivot tables (matrix), since the previous expression worked only in a tabular visualization. The measure expression was also optimized to use only one filter.
This should be the final expression for pivot table:
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER (
ALLSELECTED ( Shifts ),
[Helper] = MAX ( [Helper] )
&& [EMP_ID] = MAX ( Shifts[EMP_ID] )
&& [Date] <= MAX ( Shifts[Date] )
)
)
Note: If you want to ignore filters, use ALL instead of ALLSELECTED.
Results in Power BI Matrix:
Results in PowerPivot Pivot Table:
Let me know if this helps.

DAX Time Intelligence custom previous periods

My cube has a fact table with a "Sales" column.
There is a related Date Table "SalesDate" (properly marked as a Date Table)
I created a measure for "average sales" called [AvgSales]
There is also a measure for "past year average sales"
[AvgSales] :=
AVERAGE([Sales])
[PY AvgSales] :=
IF (
HASONEVALUE ( 'SalesDate'[Date] ),
CALCULATE (
[AvgSales],
DATEADD ( 'SalesDate'[Date], -1, YEAR )
),
BLANK ()
)
This works beautifully, and I can slice it in Excel like this: SalesDate[Year] on rows, SalesDate[Month] on columns.
The task at hand is to write a "past 5 year average sales" measure.
It is important that this measure will also work properly if you slice like described above (years on rows, months on columns)
I've spent a lot of time on http://www.daxpatterns.com/time-patterns/ but I'm really confused how to approach this properly.
This might be a bit simplistic, but can't you just change the DATEADD offset to -5 years?
[AvgSales] :=
AVERAGE([Sales])
[PY AvgSales] :=
IF (
HASONEVALUE ( 'SalesDate'[Date] ),
CALCULATE (
[AvgSales],
DATEADD ( 'SalesDate'[Date], -5, YEAR )
),
BLANK ()
)
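One thing worth checking before adopting this is which reading of "past 5 year average" is wanted. The Python sketch below is a rough illustration, not DAX semantics: shift_years, both average functions and the sample data are hypothetical. It contrasts shifting the selected period back by five years with averaging over each of the previous five years.

```python
from datetime import date
from statistics import mean

def shift_years(d, years):
    """Hypothetical helper: move a date by whole years (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def avg_sales_shifted(sales, period_dates, years_back):
    """Average of sales on the selected dates moved back by years_back years."""
    shifted = {shift_years(d, -years_back) for d in period_dates}
    values = [amount for d, amount in sales if d in shifted]
    return mean(values) if values else None

def avg_sales_past_n_years(sales, period_dates, n):
    """Average of sales on the selected dates in each of the previous n years."""
    shifted = set()
    for k in range(1, n + 1):
        shifted |= {shift_years(d, -k) for d in period_dates}
    values = [amount for d, amount in sales if d in shifted]
    return mean(values) if values else None

# Hypothetical data: one sale per year on 1 March, 2010 through 2015.
sales = [(date(2010 + i, 3, 1), 10 * (i + 1)) for i in range(6)]
selected = [date(2015, 3, 1)]

five_years_ago = avg_sales_shifted(sales, selected, 5)       # only 2010 -> 10
past_five_years = avg_sales_past_n_years(sales, selected, 5) # 2010-2014 -> 30
```

If the second reading is the intended one, a single five-year shift will not be enough and the filter has to cover all five prior periods.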

How do I build a MDX query that considers only facts that happened in the last 10 days of February?

I have a fact table that has a time dimension, which contains year, month, day and hour.
I was able to find ways to filter things that happened in a given day or month (a simple where/filter on the desired level). But I would like to create an MDX query that filters the results so that my cube only returns information about the facts recorded in the last 10 days of February.
Is there any way I can do it?
Assuming you have all the days of February in your cube, you could use a set inside the WHERE clause.
Something like this:
WHERE ([Date].Month)
Supposing you have a Year-Month-Day-Hour hierarchy in place and there may be some dates missing
SELECT ....... ON COLUMNS,
....... ON ROWS
FROM ....
WHERE
({[Time].[Month].&[Feb 2015].LastChild.LAG(9) : [Time].[Month].&[Feb 2015].LastChild})
If no dates are missing in the date dim,
select ... ON COLUMNS,
... ON ROWS
FROM ...
WHERE
({[Time].[Date].&[02/19/2015] : [Time].[Date].&[02/28/2015]})
If you want the sales for last 10 days of Feb for every year:
SELECT Measures.Sales ON COLUMNS,
Products.Products.MEMBERS ON ROWS
FROM
(
SELECT
generate //This would build the set for the last 10 days of Feb for every year
(
[Time].[Year].[All].children,
TAIL //This returns the last 10 days of february(second month)
(
[Time].[Year].CURRENTMEMBER.FIRSTCHILD.LEAD(1).CHILDREN,
10
)
) ON COLUMNS
FROM YourCube
)
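The Generate/Tail combination above amounts to "for each year, take the days of February and keep the last ten". A small Python sketch of that set construction follows. The function name is hypothetical, and it assumes every calendar day exists, which may not hold for a sparse date dimension.

```python
import calendar
from datetime import date

def last_days_of_february(years, n=10):
    """For each year, collect the last n days of February (leap years included)."""
    result = []
    for year in years:
        days_in_feb = calendar.monthrange(year, 2)[1]  # 28 or 29
        feb_days = [date(year, 2, day) for day in range(1, days_in_feb + 1)]
        result.extend(feb_days[-n:])  # same role as TAIL(..., 10)
    return result

feb_tail = last_days_of_february([2015, 2016])
# 2015 contributes 19-28 Feb, 2016 (a leap year) contributes 20-29 Feb
```

Because the tail is taken per year, a leap year shifts the window by one day, which a fixed date range such as 19 to 28 February would not do.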
Just as some extra info - if you want a "rolling" 10 day sum or 10 day average then code similar to the following is a possible approach:
WITH
MEMBER [Measures].[Sum 10] AS
Sum
(
LastPeriods
(10
,[Date].[Calendar].CurrentMember
)
,[Measures].[Internet Order Count]
)
MEMBER [Measures].[MovAvg 10] AS
Avg
(
LastPeriods
(10
,[Date].[Date].CurrentMember
)
,[Measures].[Internet Order Count]
), format_string = "#.000"
SELECT
{
[Measures].[Internet Order Count]
,[Measures].[Sum 10]
,[Measures].[MovAvg 10]
} ON 0
,Descendants
(
[Date].[Calendar].[Month].&[2006]&[2]
,[Date].[Calendar].[Date]
) ON 1
FROM [Adventure Works];
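As a rough illustration of what the two calculated members compute, here is a Python sketch of a trailing window. The function names and sample values are hypothetical. It assumes every day has a value, whereas the MDX Avg function skips empty cells, so results can differ on sparse data.

```python
def last_periods(values, index, n):
    """Up to n values ending at position index, shorter near the start of the
    list (similar in spirit to LastPeriods(n, CurrentMember))."""
    start = max(0, index - n + 1)
    return values[start:index + 1]

def rolling_sum_and_avg(values, n):
    """Return a (sum, average) pair for the trailing window at each position."""
    out = []
    for i in range(len(values)):
        window = last_periods(values, i, n)
        out.append((sum(window), sum(window) / len(window)))
    return out

# Hypothetical daily order counts with a 3-day window.
rolling = rolling_sum_and_avg([1, 2, 3, 4], 3)
# [(1, 1.0), (3, 1.5), (6, 2.0), (9, 3.0)]
```

The first few positions use a shorter window because fewer than n earlier periods exist.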
The query returns data like the following:

How to calculate average on two dimensions in MDX

I'm trying to convert the following SQL query into a calculated member in my SSAS cube.
SELECT ActionKey, AVG(1.0 * Days) AS AverageDays
FROM( SELECT ActionKey, UserKey, COUNT(DISTINCT DateKey) AS Days
FROM [TEST].[dbo].[FactActivity]
GROUP BY ActionKey, UserKey) a
GROUP BY ActionKey
How do I do this in MDX? I tried the following, but it gives me the wrong result:
IIF([Measures].[Dim User Count] = 0, 0, [Measures].[Dim Date Count] / [Measures].[Dim User Count])
In my cube, I have two derived measures: "Dim Date Count", which is the count of rows in the DimDate table, and "Dim User Count", which is the count of rows in the DimUser table. Both have many-to-many relationships with the other dimensions of the cube, so I can calculate the distinct days and users easily.
This worked
AVG([Users].[User Key].[User Key], [Measures].[DATE COUNT])
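A small Python sketch may help show why averaging the date count over users differs from dividing one distinct count by the other. The function names and sample facts are hypothetical. The first function mirrors the SQL in the question (distinct days per user, then averaged per action), and the second mirrors the ratio that gave the wrong result.

```python
from collections import defaultdict

def average_days_per_user(facts):
    """facts: iterable of (action_key, user_key, date_key) tuples.
    Returns {action_key: average number of distinct days per user}."""
    days = defaultdict(set)
    for action, user, day in facts:
        days[(action, user)].add(day)
    per_action = defaultdict(list)
    for (action, _user), day_set in days.items():
        per_action[action].append(len(day_set))
    return {action: sum(counts) / len(counts) for action, counts in per_action.items()}

def ratio_of_distinct_counts(facts):
    """Distinct days divided by distinct users per action (the ratio approach)."""
    dates, users = defaultdict(set), defaultdict(set)
    for action, user, day in facts:
        dates[action].add(day)
        users[action].add(user)
    return {action: len(dates[action]) / len(users[action]) for action in dates}

# Hypothetical facts: user A is active on days 1 and 2, user B on day 2 (twice).
facts = [(1, "A", 1), (1, "A", 2), (1, "B", 2), (1, "B", 2)]

per_user_avg = average_days_per_user(facts)  # {1: 1.5}
ratio = ratio_of_distinct_counts(facts)      # {1: 1.0}
```

The ratio counts day 2 only once even though two users were active on it, so it understates the per-user average whenever users share days.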
(not a solution but maybe helps)
Are the two measures that you've created giving the results you expect? If you run the equivalent of the following against [YourCube] is it just the new measure [Measures].[AverageDays] that is wrong?
SELECT
NON EMPTY
{
[Measures].[AverageDays]
,[Measures].[Dim Date Count]
,[Measures].[Dim User Count]
} ON COLUMNS
,NON EMPTY
{
[Action].[Action].MEMBERS
*
[Date].[Calendar].[Month].ALLMEMBERS
} ON ROWS
FROM [YourCube];

Using intersect with 2 large sets to get the distinct count - MDX

I have a calculated member which represents an active customer. It is defined as follows:
WITH MEMBER [Measures].[Active Customers] AS
Count ( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )
This works great, when I want to get active customers in the current period and previous ones, as I get my time dimension, and use the CurrentMember, CurrentMember.PrevMember and CurrentMember with the Lag function in order to get customers who were active in previous periods.
My problem is when I want to get the count of customers, who are common in different members. Say I want to get customers who are active in the current period, and NOT in the previous period. Or another case, active in current, and active in previous. Because of this, I would need to use the INTERSECT function, and my customer dimension has 4 million records. This is already a subset of 9 million records.
So when checking for a customer who is active in 2 consecutive periods, I do this (The Active Previous Period, and Active Current Period is basically the calculated member above, however with CurrentMember and CurrentMember.PrevMember) :
set [Previous Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
[Measures].[Active Previous Period] > 0
)
set [Current Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Current Period] ),
[Measures].[Active Current Period] > 0
)
member [Measures].[Active 2 consecutive periods] as
count(INTERSECT([Current Active Customers Set],[Previous Active Customers Set]) )
This takes forever. Is there any way to improve on, or work around, this performance problem of using INTERSECT with large sets? Or are there optimizations I could make to the MDX query? I tried always using a subset of my customer dimension, but this only reduced the number of records to just under 4 million, so it's still large. Any help would be appreciated!
I would assume you can speed this up if you avoid using named sets and calculated members as far as possible.
One step towards this would be as follows: Create a new fact table with foreign keys just to your customer and time dimension, and add a record to it if a customer was active on that day. Build a measure group, let's say "activeCustomers" based on this table, just using "count" as the measure. But make this invisible, as we do not need it.
Then, you can replace
count( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )
with
count( Exists(
[Customer].[Customer Key].Members,
<state your time selection here>,
"activeCustomers"
) )
Exists should be more efficient than Filter.
Another optimization approach is based on the observation that, instead of intersecting two sets generated via Filter, you could define one set with a more complex filter. That way Analysis Services does not have to loop over the customers twice and then intersect the results:
set [Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
[Measures].[Active Previous Period] > 0
AND
[Measures].[Active Current Period] > 0
)
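The equivalence behind this rewrite can be sketched in Python: intersecting two filtered sets gives the same members as one pass with a combined condition. The function names, period keys and sample turnover figures are hypothetical, and the sketch says nothing about how Analysis Services actually executes either form.

```python
def active(turnover, period):
    """Customers with positive turnover in the given period."""
    return {c for c, by_period in turnover.items() if by_period.get(period, 0) > 0}

def active_in_both_single_pass(turnover, current, previous):
    """One loop over the customers with both conditions combined by AND."""
    return {
        c for c, by_period in turnover.items()
        if by_period.get(previous, 0) > 0 and by_period.get(current, 0) > 0
    }

# Hypothetical turnover per customer and period.
turnover = {
    "c1": {"2014-01": 5, "2014-02": 3},
    "c2": {"2014-02": 8},
    "c3": {"2014-01": 2},
}

two_pass = active(turnover, "2014-02") & active(turnover, "2014-01")
one_pass = active_in_both_single_pass(turnover, "2014-02", "2014-01")
new_customers = active(turnover, "2014-02") - active(turnover, "2014-01")
```

The "active now but not in the previous period" case from the question follows the same pattern, with AND NOT in the condition instead of a set difference.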