Is COUNTDISTINCT base measure faster than calculated distinct measure? - ssas

I have a fact table that looks like this
Fact {DateKey,UserKey,ActionKey,Action Count}
I have a user dimension table that looks like this
DimUser {UserKey,Name,IsActive}
and a date dimension table that looks like this
DimDate { DateKey, Week, Year}
I have a physical distinct count measure on the DateKey column in the measure group based on the fact table. Let's call it "DATE COUNT".
Now I want to have a calculated measure called "DAILY ACTIVE USERS", which will look like this
[IsDailyActive] = IIF([Measures].[DATE COUNT] >= 21, 1, 0)
[Daily Active Users] = SUM([Dim User].[User Key].[User Key].Members, [Measures].[IsDailyActive])
There is also another way to solve this problem without using the physical distinct count measure, as follows
[HasAction] = IIF([Measures].[Action Count] > 1, 1, 0)
[DATE COUNT] = SUM([Dim Date].[DATE KEY].[DATE KEY].Members, [Measures].[HasAction])
[IsDailyActive] = IIF([Measures].[DATE COUNT] >= 21, 1, 0)
[Daily Active Users] = SUM([Dim User].[User Key].[User Key].Members, [Measures].[IsDailyActive])
But I'm afraid the calculated measure will be slower than the physical distinct count measure, which is calculated at processing time instead of at query time. Any thoughts on which will perform better? My fact table has a billion rows!
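To make the comparison concrete, here is a minimal Python sketch of what the two formulations compute. The sample rows, the function names and the threshold of 3 (instead of 21) are hypothetical and chosen only to keep the data small. It assumes a day should count as soon as it has at least one action; note that the [HasAction] expression above tests `> 1`, which would skip days with exactly one action and make the two approaches disagree.

```python
from collections import defaultdict

# Hypothetical fact rows: (date_key, user_key, action_count)
fact = [
    (20240101, 1, 2),
    (20240101, 1, 1),  # second row for the same user and day
    (20240102, 1, 1),
    (20240103, 1, 4),
    (20240101, 2, 5),
]

def daily_active_users_distinct(rows, threshold):
    """Approach 1: distinct count of DateKey per user, then threshold."""
    days = defaultdict(set)
    for date_key, user_key, _ in rows:
        days[user_key].add(date_key)
    return sum(1 for d in days.values() if len(d) >= threshold)

def daily_active_users_summed(rows, threshold):
    """Approach 2: flag each (user, day) that has actions, sum the flags."""
    totals = defaultdict(int)  # (user_key, date_key) -> total action count
    for date_key, user_key, count in rows:
        totals[(user_key, date_key)] += count
    date_count = defaultdict(int)
    for (user_key, _), total in totals.items():
        if total > 0:  # assumption: any action makes the day count
            date_count[user_key] += 1
    return sum(1 for c in date_count.values() if c >= threshold)

print(daily_active_users_distinct(fact, 3))
print(daily_active_users_summed(fact, 3))
```

Both functions walk every user, which mirrors the SUM over [Dim User].[User Key].[User Key].Members in either MDX version; the second one additionally walks every date per user, which is the extra query-time work the question is worried about.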

Related

MDX Dynamic dimension filter based on value of other dimension

How can I filter using values from two dimension in MDX?
The required result should include records where [Purchase Date].[Date] is before today minus number of years from [Accounting Date].[Year]. So, the result should include YoY records from today based on [Purchase Date].[Date] for each [Accounting Date].[Year] member.
I would like something like the following:
SELECT NON EMPTY [Measures].[Amount] ON 0,
NON EMPTY [Accounting Date].[Year].[Year].ALLMEMBERS ON 1
FROM [Tabular_Model]
WHERE (
NULL :
STRTOMEMBER("[Purchase Date].[Date].&["+ Format(DateAdd("YYYY", [Accounting Date].[Year].CURRENTMEMBER.MEMBER_VALUE - 2020, Now()),"yyyy-MM-ddT00:00:00") + "]")
)
But it fails with error: Execution of the managed stored procedure DateAdd failed with the following error: Microsoft::AnalysisServices::AdomdServer::AdomdException.
The syntax for 'All' is incorrect. (All).
Why does CURRENTMEMBER.MEMBER_VALUE work for HAVING but not in my WHERE clause? What is the right way?
Try the following measure and query:
WITH
MEMBER [Measures].[Trailing Amount] as SUM({NULL :
STRTOMEMBER("[Purchase Date].[Date].&["+ Format(DateAdd("YYYY", [Accounting Date].[Year].CURRENTMEMBER.MEMBER_VALUE - 2020, Now()),"yyyy-MM-ddT00:00:00") + "]")}, [Measures].[Amount])
SELECT [Measures].[Trailing Amount] ON 0,
NON EMPTY [Accounting Date].[Year].[Year].MEMBERS ON 1
FROM [Tabular_Model]
If the MDX doesn't perform as well as you hope, consider adding the following DAX measure to your Tabular model. The DAX query below illustrates how to use it; once the measure is in your model you can also query it from MDX, and it will likely perform better than an MDX calculation:
define
measure 'Your Table Name Here'[Trailing Sales] =
VAR YearOffset = SELECTEDVALUE('Accounting Date'[Year]) - 2020
VAR NowDate = NOW()
VAR EndDate = DATE(YEAR(NowDate)+YearOffset,MONTH(NowDate),DAY(NowDate))
RETURN CALCULATE([Amount], 'Purchase Date'[Date] <= EndDate)
evaluate ADDCOLUMNS(ALL('Accounting Date'[Year]),"Trailing Sales",[Trailing Sales])

SSAS MDX sum up on memberships in date hierarchies?

In a cube that contains memberships of a club, I have a column MembersInOut in my fact table which records when a member joined the club (value = 1) or left it (value = -1). The club started on Jan 1, 2000, so there are no members before that date.
Now to know the current number of members on a specific date I can do this:
CREATE MEMBER CURRENTCUBE.[Measures].[Calculated MembersOfTheClub]
AS
Sum(
{[Date Dim].[Date].&[2000-01-01T00:00:00]:
[Date Dim].[Date].currentmember},
[Measures].[MembersInOut]
)
This works fine on the actual date, but how do I make this work on a date hierarchy [Year-Month-Day]?
Thanks
You could create a Y-M-D hierarchy, then use an expression like the one below
with member [Measures].[S1] AS
sum(
{NULL:[Date].[Calendar Date].CurrentMember}
, [Measures].[Internet Sales Count])
select nonempty ([Date].[Calendar Date].members) on rows, nonempty ({[Measures].[S1],[Measures].[Internet Sales Count]}) on columns from [Analysis Services Tutorial]
Zoe
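As a rough illustration of why the {NULL : CurrentMember} range works at any level of the hierarchy, here is a small Python sketch of the same running total. The data and function names are hypothetical; the point is only that a month's value is the sum of all joins and leaves up to and including that month.

```python
from datetime import date

# Hypothetical net membership changes per day (+1 join, -1 leave, summed)
changes = {
    date(2000, 1, 1): 3,
    date(2000, 1, 2): -1,
    date(2000, 2, 1): 2,
}

def members_on(changes, as_of):
    """Day level: sum every change from the start up to the given day."""
    return sum(delta for day, delta in changes.items() if day <= as_of)

def members_at_month(changes, year, month):
    """Month level: sum every change in all months up to and including this one."""
    return sum(
        delta
        for day, delta in changes.items()
        if (day.year, day.month) <= (year, month)
    )

print(members_on(changes, date(2000, 1, 2)))
print(members_at_month(changes, 2000, 1))
print(members_at_month(changes, 2000, 2))
```

The month-level result equals the day-level result for the last day of that month, which is the behaviour you would expect from a membership balance.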

MDX - filter by [Day] but display category of [Month]

I’m hoping someone can help me out with restructuring/rewriting my MDX query – I’m fairly new to MDX and only know enough to be dangerous. I am using Mondrian if that makes a difference.
Here is the stacked bar chart I am producing:
[Chart: Injuries by Month and Category]
And here is my query (simplified to remove all the stuff not relevant to this question)…
WITH
SET [Date Range] AS {${mdxStartDateParam}.Parent : ${mdxEndDateParam}.Parent}
MEMBER [Measures].[Month Name] as [Incident Date.YQMD].currentmember.parent.parent.name || "-" || [Incident Date.YQMD].currentmember.name
SET [Classification Month Set] AS (
Hierarchize(
ORDER(
Hierarchize(FILTER([Classification].[Classification].members,[Classification].CURRENTMEMBER IN {Descendants([Classification].[${paramInjClass}])})),
[Measures].[Injury Count],
BDESC
)
) * [Date Range]
)
SELECT {[Measures].[Injury Count], [Measures].[Month Name]} ON COLUMNS,
NON EMPTY [Classification Month Set] ON ROWS
FROM [Injury Analysis]
The problem I have is that my two date parameters (${mdxStartDateParam} and ${mdxEndDateParam}) can be any date at the [Day] level, while my chart X Axis is showing at the [Month] level, and even if the ${mdxStartDateParam} is midway through a month my query is returning all data for the month.
E.g. if I have an Injury that occurred on February 2nd but my ${mdxStartDateParam} is [Incident Date.YQMD].[2017].[Q1].[Feb].[17], then that Injury is still being included in the chart.
Is there a way I can restructure my MDX so that the bar for February does not show all data for February, but only the data for February that is >= ${mdxStartDateParam} and <= ${mdxEndDateParam}?
Since Mondrian doesn't support sub-queries, you can't use your Calendar hierarchy in both the WHERE clause and on an axis. There is also no way to filter by days and show only months on the axis using that single hierarchy. So, if you have two separate hierarchies for Days and Months, you may use the following:
WITH
SET [Date Range] AS [YourDateDim].[YourHierarchyNotInDateParam].[MonthLevel].Members
MEMBER [Measures].[Month Name] as [Incident Date.YQMD].currentmember.parent.parent.name || "-" || [Incident Date.YQMD].currentmember.name
SET [Classification Month Set] AS (
Hierarchize(
ORDER(
Hierarchize(FILTER([Classification].[Classification].members,[Classification].CURRENTMEMBER IN {Descendants([Classification].[${paramInjClass}])})),
[Measures].[Injury Count],
BDESC
)
) * [Date Range]
)
SELECT {[Measures].[Injury Count], [Measures].[Month Name]} ON COLUMNS,
NON EMPTY [Classification Month Set] ON ROWS
FROM [Injury Analysis]
WHERE {${mdxStartDateParam} : ${mdxEndDateParam}}
Otherwise you have to put the days on the axis and group them into months afterwards.
Without knowing anything about the dialect of MDX you're using, or being able to see the dimension structure, my guess is that the problem is with the definition of [Date Range]:
SET [Date Range] AS {${mdxStartDateParam}.Parent : ${mdxEndDateParam}.Parent}
If the two parameters are at the Day level, does .Parent return their parent months?
The solution might be to make the date range be a set of days:
SET [Date Range] AS {${mdxStartDateParam} : ${mdxEndDateParam}}
and then aggregate by month somehow.
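What "filter at day level, then aggregate by month" amounts to can be sketched in a few lines of Python. The injury dates and the function name are hypothetical, and this only models the intended result; it is not something Mondrian runs.

```python
from collections import Counter
from datetime import date

# Hypothetical incident dates, one entry per injury
injuries = [
    date(2017, 2, 2),
    date(2017, 2, 20),
    date(2017, 3, 5),
    date(2017, 4, 1),
]

def injuries_by_month(incidents, start, end):
    """Keep only incidents within [start, end], then count them per (year, month)."""
    counts = Counter()
    for day in incidents:
        if start <= day <= end:
            counts[(day.year, day.month)] += 1
    return dict(counts)

print(injuries_by_month(injuries, date(2017, 2, 17), date(2017, 3, 31)))
```

With a start date of February 17th, the injury on February 2nd is excluded even though the February bar is still shown, which is the behaviour asked for in the question.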

How to calculate average on two dimensions in MDX

I'm trying to convert the following SQL query into a calculated member in my SSAS cube.
SELECT ActionKey, AVG(1.0 * Days) AS AverageDays
FROM( SELECT ActionKey, UserKey, COUNT(DISTINCT DateKey) AS Days
FROM [TEST].[dbo].[FactActivity]
GROUP BY ActionKey, UserKey) a
GROUP BY ActionKey
How do I do this in MDX? I tried the following but it's giving me wrong result
IIF([Measures].[Dim User Count] = 0, 0, [Measures].[Dim Date Count] / [Measures].[Dim User Count])
In my cube, I have two derived measures: "Dim Date Count", which is the count of rows in the DimDate table, and "Dim User Count", which is the count of rows in the DimUser table. Both have a many-to-many relationship with the other dimensions of the cube, so I can calculate the distinct days and users easily.
This worked
AVG([Users].[User Key].[User Key], [Measures].[DATE COUNT])
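A small Python sketch may help show why averaging per user differs from dividing the two distinct counts. The rows and function names are hypothetical; the first function follows the SQL in the question (distinct days per action and user, then the average per action), the second mimics the ratio that gave the wrong result.

```python
from collections import defaultdict

# Hypothetical FactActivity rows: (action_key, user_key, date_key)
rows = [
    ("login", 1, 20240101),
    ("login", 1, 20240102),
    ("login", 1, 20240102),  # duplicate day, must count once
    ("login", 2, 20240101),
    ("post", 1, 20240101),
]

def average_days_per_action(rows):
    """AVG over users of COUNT(DISTINCT DateKey), grouped by action."""
    days = defaultdict(set)  # (action, user) -> distinct dates
    for action, user, date_key in rows:
        days[(action, user)].add(date_key)
    per_action = defaultdict(list)
    for (action, _), dates in days.items():
        per_action[action].append(len(dates))
    return {a: sum(c) / len(c) for a, c in per_action.items()}

def ratio_of_distinct_counts(rows):
    """Distinct dates divided by distinct users, per action (the failed attempt)."""
    dates, users = defaultdict(set), defaultdict(set)
    for action, user, date_key in rows:
        dates[action].add(date_key)
        users[action].add(user)
    return {a: len(dates[a]) / len(users[a]) for a in dates}

print(average_days_per_action(rows))
print(ratio_of_distinct_counts(rows))
```

The ratio undercounts whenever two users are active on the same day, because that day is counted once in the numerator but should contribute to each user's own day count.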
(not a solution but maybe helps)
Are the two measures that you've created giving the results you expect? If you run the equivalent of the following against [YourCube] is it just the new measure [Measures].[AverageDays] that is wrong?
SELECT
NON EMPTY
{
[Measures].[AverageDays]
,[Measures].[Dim Date Count]
,[Measures].[Dim User Count]
} ON COLUMNS
,NON EMPTY
{
[Action].[Action].MEMBERS
*
[Date].[Calendar].[Month].ALLMEMBERS
} ON ROWS
FROM [YourCube];

Using intersect with 2 large sets to get the distinct count - MDX

I have a calculated member which represents an active customer. It is defined as follows:
WITH MEMBER [Measures].[Active Customers] AS
Count ( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )
This works great, when I want to get active customers in the current period and previous ones, as I get my time dimension, and use the CurrentMember, CurrentMember.PrevMember and CurrentMember with the Lag function in order to get customers who were active in previous periods.
My problem is when I want to get the count of customers who are common to different members. Say I want to get customers who are active in the current period and NOT in the previous period. Or, in another case, active in the current period and active in the previous one. Because of this, I would need to use the INTERSECT function, and my customer dimension has 4 million records. This is already a subset of 9 million records.
So when checking for a customer who is active in 2 consecutive periods, I do this (The Active Previous Period, and Active Current Period is basically the calculated member above, however with CurrentMember and CurrentMember.PrevMember) :
set [Previous Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
[Measures].[Active Previous Period] > 0
)
set [Current Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Current Period] ),
[Measures].[Active Current Period] > 0
)
member [Measures].[Active 2 consecutive periods] as
count(INTERSECT([Current Active Customers Set],[Previous Active Customers Set]) )
This takes forever. Is there any way to improve this, or to work around the performance problem of using INTERSECT with large sets? Or maybe some optimizations to the MDX query? I tried always using a subset of my customer dimension, but this only reduced the number of records to less than 4 million, so it's still large. Any help would be appreciated!
I would assume you can speed this up if you avoid using named sets and calculated members as far as possible.
One step towards this would be as follows: Create a new fact table with foreign keys just to your customer and time dimension, and add a record to it if a customer was active on that day. Build a measure group, let's say "activeCustomers" based on this table, just using "count" as the measure. But make this invisible, as we do not need it.
Then, you can replace
count( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )
with
count( Exists(
[Customer].[Customer Key].Members,
<state your time selection here>,
"activeCustomers"
) )
Exists should be more efficient than Filter.
Another optimization comes from the observation that instead of intersecting two sets generated via Filter, you could define one set with a more complex filter condition. That way Analysis Services does not have to loop over the customers twice and then intersect the results:
set [Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
[Measures].[Active Previous Period] > 0
AND
[Measures].[Active Current Period] > 0
)
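The equivalence behind that last suggestion can be illustrated with a minimal Python sketch. The turnover figures and function names are hypothetical; the sketch only shows that one pass with a combined condition yields the same count as building two sets and intersecting them, and says nothing about how the Analysis Services engine actually evaluates either form.

```python
# Hypothetical turnover per customer: customer_key -> (previous period, current period)
turnover = {
    "C1": (100.0, 50.0),
    "C2": (0.0, 75.0),
    "C3": (20.0, 0.0),
    "C4": (10.0, 5.0),
}

def active_both_intersect(turnover):
    """Two passes over the customers, then a set intersection."""
    previous = {c for c, (prev, _) in turnover.items() if prev > 0}
    current = {c for c, (_, cur) in turnover.items() if cur > 0}
    return len(previous & current)

def active_both_single_pass(turnover):
    """One pass with both conditions combined by AND."""
    return sum(1 for prev, cur in turnover.values() if prev > 0 and cur > 0)

def active_current_not_previous(turnover):
    """The 'active now but NOT in the previous period' case, also in one pass."""
    return sum(1 for prev, cur in turnover.values() if cur > 0 and not prev > 0)

print(active_both_intersect(turnover))
print(active_both_single_pass(turnover))
print(active_current_not_previous(turnover))
```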