MDX Query percentile 25th, 50th and 75th - mdx

I have a question whose answer I haven't been able to find, either in this forum or elsewhere:
I need to calculate the 25th percentile, the median (the 50th percentile) and the 75th percentile.
In other words: I need to write an MDX query in SSRS that tells me which values are the 25th percentile, the median and the 75th percentile.
Nothing I have found so far returns the exact values of each of them.
Thanks

I've been working on the same issue for my own data. The trouble I was having is in figuring out the Median() function. Here's how I interpret the parameters of the function:
Microsoft's definition:
MEDIAN(Set_Expression [, Numeric_Expression])
My interpretation:
Set_Expression is the set of values that define the grain to which the measure is summed before the median is evaluated
Numeric_Expression is the measure that is summed, which set of sums is then sorted and evaluated to find the median
In my case, to find the straight median across the entire data set, I didn't want to sum the values at all. To prevent any sums from being calculated, I used the key attribute of a dimension that has a 1-1 cardinality with the records in the fact table containing the measure I'm using. The only flaw I've seen so far is that the median sometimes returns a whole number when there is an even number of records and the mean of the two middle records should end in .5. For example, when the values of the two middle records are 16 and 17, the function returns 17 instead of 16.5. Since this is a minor flaw, I'm willing to overlook it for now.
This is what my calculation with the median function looks like:
WITH MEMBER Measures.[Set Median] AS MEDIAN(
[Dimension].[Key Attribute].MEMBERS
,Measures.[Non-summable Measure]
)
I used a combination of Median and TopCount to get the 75th percentile. I use TopCount to limit the set for the median to the second half of the data since TopCount sorts the data in descending order. I'll explain how I understand TopCount:
Microsoft's definition:
TopCount(Set_Expression, Count [, Numeric_Expression])
My interpretation:
Set_Expression is the set of values from which the desired number of tuples will be returned
Count is the number of tuples to return from the set
Numeric_Expression is the value that will be used to sort the set in descending order
I want the Median function to use the last half of the records in the fact table that are returned in the query, so I again use the key for the dimension table that has a 1-1 cardinality with the fact table and I sort it by the measure from which I want to find the median value.
Here is how I coded the member:
MEMBER Measures.[75th Percentile] AS MEDIAN(
TOPCOUNT(
[Dimension].[Key Attribute].MEMBERS
,Measures.[Fact Table Record Count] / 2
,Measures.[Non-summable Measure]
)
,Measures.[Non-summable Measure]
)
So far, this combination of functions has returned a true 75th percentile from my data set. To get the 25th percentile, I tried replacing TOPCOUNT in my code with BOTTOMCOUNT, which is supposed to do the same thing, only sorting the data in ascending order to use the first half of the records instead of the second half. Unfortunately, I haven't been able to get anything but NULL from this combination of functions, so I'm open to suggestions on how to get the 25th percentile.
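The arithmetic behind the median-of-a-half approach can be checked outside the cube. The following is only a sketch in Python, not MDX: the function names and sample values are made up for illustration, and the integer division used for the half size is an assumption (the MDX above passes the record count divided by 2 directly).

```python
from statistics import median

def top_half_median(values):
    """Median of the largest half of the values (TopCount-style, descending sort)."""
    ordered = sorted(values, reverse=True)
    return median(ordered[: len(ordered) // 2])

def bottom_half_median(values):
    """Median of the smallest half of the values (BottomCount-style, ascending sort)."""
    ordered = sorted(values)
    return median(ordered[: len(ordered) // 2])

# Hypothetical measure values, one per fact record
data = [12, 15, 16, 17, 19, 22, 25, 30]

q1 = bottom_half_median(data)  # approximates the 25th percentile
q2 = median(data)              # the 50th percentile
q3 = top_half_median(data)     # approximates the 75th percentile
print(q1, q2, q3)
```

With an odd number of records, the integer division leaves the middle record out of both halves, which is one of several common quartile conventions.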
This is how my final query looks:
SELECT
{
Measures.[Set Median]
,Measures.[25th Percentile]
,Measures.[75th Percentile]
} ON 0
,[Dimensional row members here] ON 1
FROM [Cube]
WHERE
[Non-axis dimensional filter members here]

Related

MDX Result Count

I am a beginner in MDX queries. Can anyone tell me how to get the record count of the result of an MDX query?
The query is following:
select {[Measures].[Employee Department History Count],[Measures].[Rate]} on columns, Non Empty{{Filter([Shift].[Shift ID].[Shift ID].Members, ([Shift].[Shift ID].CurrentMember.name <> "1"))}*{[Employee].[Business Entity ID].[Business Entity ID].Members}} on rows from [Adventure Works2012].
I have tried various methods and I haven't really got a solution for that.
I assume you mean row count when you talk of "record count", as MDX does not know a concept of records, but the result shown from an MDX query is the space built by the tuples on the axes.
I see two possibilities to get the row count:
Just count the rows returned from your above query in the tool from which you call the MDX query.
If you want to count in MDX, then let's state what you want to have:
You want to know the number of members of the set of non empty combinations of [Shift ID]s and [Business Entity ID]s where the Shift ID is not "1" and at least one of the measures [Employee Department History Count] and [Rate] is not null.
To state that differently: let's call the tuples like above for which the first measure is not null "SET1", and the tuples like above for which the second measure is not null "SET2". Then you want to know the count of the tuples which are contained in one of these sets (or in both).
To achieve this, we define these two sets and then a calculated member (a new measure in our case) containing this calculation in its definition, and then use this calculated member in the select clause to show it:
WITH
-- SET1: tuples for which the first measure is not empty
SET SET1 AS
NonEmpty({{Filter([Shift].[Shift ID].[Shift ID].Members,
([Shift].[Shift ID].CurrentMember.name <> "1"))}
* {[Employee].[Business Entity ID].[Business Entity ID].Members}},
{[Measures].[Employee Department History Count]})
-- SET2: tuples for which the second measure is not empty
SET SET2 AS
NonEmpty({{Filter([Shift].[Shift ID].[Shift ID].Members,
([Shift].[Shift ID].CurrentMember.name <> "1"))}
* {[Employee].[Business Entity ID].[Business Entity ID].Members}},
{[Measures].[Rate]})
-- count the tuples that appear in at least one of the two sets
MEMBER [Measures].[MyCalculation] AS
COUNT(SET1 + SET2)
SELECT [Measures].[MyCalculation] ON COLUMNS
FROM [Adventure Works2012]
Please note:
The sets SET1 and SET2 are not absolutely necessary; you could also put the whole calculation in one long and complicated definition of the MyCalculation measure, but splitting it up makes it easier to read. However, the definition of a new member is necessary, as in MDX you can only put members on axes (rows, columns, ...). These members must either already be defined in the cube, or you have to define them in the WITH clause of your query. There is no such thing as putting expressions/calculations on axes in MDX, only members.
The + for sets is a union which removes duplicates, hence this operation gives us the tuples which have a non-empty value for at least one of the measures. Alternatively, you could have used the Union function equivalently to the +.
The Nonempty() I used in the definitions of the sets is the NonEmpty function, which is slightly different from the NON EMPTY keyword that you can use on the axes. We use one of the measures as second argument to this function in both set definitions.
I have currently no working SSAS installation available to test my statement, hence there might be a minor error or typo in my above statement, but the idea should work.
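The counting logic itself (the union of two non-empty sets, duplicates removed) can be illustrated outside MDX. This is only a sketch in Python with made-up tuples and measure values; none of the data comes from Adventure Works.

```python
# Hypothetical (shift_id, business_entity_id) tuples with sample measure values.
# None stands for an empty cell.
cells = {
    ("2", "10"): {"history_count": 1, "rate": None},
    ("2", "11"): {"history_count": None, "rate": 12.5},
    ("3", "10"): {"history_count": 2, "rate": 9.0},
    ("3", "12"): {"history_count": None, "rate": None},
}

# SET1: tuples where the first measure is not empty
set1 = {key for key, m in cells.items() if m["history_count"] is not None}
# SET2: tuples where the second measure is not empty
set2 = {key for key, m in cells.items() if m["rate"] is not None}

# The union keeps each tuple once, even if it is in both sets
my_calculation = len(set1 | set2)
print(my_calculation)
```

The tuple ("3", "10") is in both sets but is counted once, and ("3", "12") is in neither, so it does not contribute.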

Last Available value MDX

I have a requirement to extract data from a cube within an SSRS dataset, using the query builder, with the time dimension in the result set across a range of dates. The conditions are:
The measures are to be displayed for each day of the date range.
The sub total row should have the last available measures value for that time range.
There is a time filter (currently a single date filter with a multi select option).
My MDX is below.
The measure has a 'Sum' as the aggregation type.
I have a calculated measure with the scope defined as below.
SCOPE([MEASURES].[Measure1]);
SCOPE([Date].[Date].MEMBERS);
THIS = TAIL(EXISTING ([Date].[Date].MEMBERS),1).ITEM(0) ;
END SCOPE;
END SCOPE;
The scope statement above works perfectly. However, when I select more than one date member, the query slows down dramatically. The performance numbers are:
Across 1 date - 4 seconds
Across 2 dates - 22 minutes
Across 3 dates - unknown (in Hours)
This drastic degradation in performance goes away if I remove the scope statement, which makes me think there should be a better way to do the same thing. The final report query is below.
SELECT
NON EMPTY
{[Measures].[Measure1]} ON COLUMNS
,NON EMPTY
{ [Dimension1].[Dimension1].[Dimension1].ALLMEMBERS*
[Dimension2].[Dimension2].[Dimension2].ALLMEMBERS*
[Dimension3].[Dimension3].[Dimension3].ALLMEMBERS*
[Date].[Date].[Date].ALLMEMBERS
} ON ROWS
FROM (
SELECT {[Date].[Date].&[2014-06-13T00:00:00]
,[Date].[Date].&[2014-06-16T00:00:00] } ON COLUMNS
FROM [Cube]
)
So the question again is: is there a way to do the last-available-value part of the scope statement with better performance? Also, is there another way to write the final MDX that would help performance?
Please let me know if anything is unclear regarding the question.
Thanks
Srikanth
The first optimization step would be to change your query to
SELECT
NON EMPTY
{[Measures].[Measure1]} ON COLUMNS
,NON EMPTY
{ [Dimension1].[Dimension1].[Dimension1].ALLMEMBERS*
[Dimension2].[Dimension2].[Dimension2].ALLMEMBERS*
[Dimension3].[Dimension3].[Dimension3].ALLMEMBERS*
{[Date].[Date].&[2014-06-13T00:00:00], [Date].[Date].&[2014-06-16T00:00:00] }
} ON ROWS
FROM [Cube]
Furthermore, I am not sure why you added the inner SCOPE([Date].[Date].MEMBERS); statement. Maybe it helps to omit it and the corresponding END SCOPE.

Calculating percentile values in SSAS

I am trying to calculate percentile (for example 90th percentile point of my measure) in a cube and I think I am almost there. The problem I am facing is, I am able to return the row number of the 90th percentile, but do not know how to get my measure.
With
Member [Measures].[cnt] as
Count(NonEmpty(
-- dimensions to find the percentile on (the same set is repeated below)
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members
,
-- add the measure to group
[Measures].[Profit]))
-- define percentile
Member [Measures].[Percentile] as 90
Member [Measures].[PercentileInt] as Int((([Measures].[cnt]) * [Measures].[Percentile]) / 100)
-- this part finds the tuple from the set based on the index of the percentile point and I am using the item(index) to get the necessary info from tuple and I am unable to get the measure part
Member [Measures].[PercentileLo] as
(
Order(
NonEmpty(
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members,
[Measures].[Profit]),
[Measures].[Profit].Value, BDESC)).Item([Measures].[PercentileInt]).Item(3)
select
{
[Measures].[cnt],
[Measures].[Percentile],[Measures].[PercentileInt],
[Measures].[PercentileLo],
[Measures].[Profit]
}
on 0
from
[TestData]
I think there must be a way to get the measure of a tuple found through the index of a set. Please help, and let me know if you need any more information. Thanks!
You should extract the tuple at position [Measures].[PercentileInt] from your set and add the measure to it to build a tuple of four elements. Then you want to return its value as the measure PercentileLo, i.e. define
Member [Measures].[PercentileLo] as
(
[Measures].[Profit],
Order(
NonEmpty(
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members,
[Measures].[Profit]),
[Measures].[Profit], BDESC).Item([Measures].[PercentileInt])
)
The way you implemented it, you tried to extract the fourth (as Item() starts counting from zero) item from a tuple containing only three elements. Your ordered set only has three hierarchies.
Just another unrelated remark: I think you should avoid using complete hierarchies for [Calendar].[Hierarchy].members, [Region Dim].[Region].members, and [Product Dim].[Product].members. Your code looks like you are including all levels (including the all member) in the calculation. But I do not know the structure and names of your cube, hence I may be wrong with this.
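The index-based lookup discussed above (drop empty values, order the rest, take the element at Int(count * percentile / 100)) can be sketched outside MDX. This is a Python illustration only: the function name, the clamping of the index to the last position and the sample values are assumptions, not part of the original query.

```python
def value_at_percentile_index(values, percentile, descending=True):
    """Return the value at position int(n * percentile / 100) of the ordered,
    non-empty values, mirroring the Order(...).Item(index) pattern."""
    non_empty = [v for v in values if v is not None]
    if not non_empty:
        raise ValueError("no non-empty values")
    ordered = sorted(non_empty, reverse=descending)
    index = int(len(ordered) * percentile / 100)
    index = min(index, len(ordered) - 1)  # assumption: clamp to the last item
    return ordered[index]

# Hypothetical profit values; None stands for an empty cell
profits = [5, None, 40, 10, 25, 35, 15, 30, 20, 45, 50]

from_descending = value_at_percentile_index(profits, 90, descending=True)
from_ascending = value_at_percentile_index(profits, 90, descending=False)
print(from_descending, from_ascending)
```

With these sample values the descending order returns a value from the small end and the ascending order one from the large end, so the sort direction decides which tail the index points into.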
An alternate method could be to find the median of the last 20% of the records in the table. I've used this combination of functions to find the 75th percentile. By dividing the record count by 5, you can use the TopCount function to return a set of tuples that make up 20% of the whole table sorted in descending order by your target measure. The median function should then land you at the correct 90th percentile value without having to find the record's coordinates. In my own use, I use the same measure for the last parameter in both the Median and TopCount functions.
Here's my code:
WITH MEMBER Measures.[90th Percentile] AS MEDIAN(
TOPCOUNT(
[set definition]
,Measures.[Fact Table Record Count] / 5
,Measures.[Value by which to sort the set so the first 20% of records are chosen]
)
,Measures.[Value from which the median should be determined]
)
Based on what you've supplied in your problem definition, I would expect your code to look something like this:
WITH MEMBER Measures.[90th Percentile] AS MEDIAN(
TOPCOUNT(
{
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members
}
,Measures.[Fact Table Record Count] / 5
,[Measures].[Profit]
)
,[Measures].[Profit]
)

Why is Mondrian pre-aggregating before calculating Average?

I have a super-simple analyzer report, where all I'm doing is calculating the average of a measure. In the schema, its default aggregation is AVERAGE. The only other aspect of the report is a filter on date, where I restrict it to being within a list of 3 dates.
What's odd is that it appears that Mondrian is actually calculating the average for each date BEFORE averaging those 3 numbers to get the value displayed in the report. This seems very wrong (the report only has that one average displayed - no other fields).
I don't know MDX that well, but below is what I pulled from the mdx log if that helps:
With
Set [*NATIVE_CJ_SET] as 'Filter([*BASE_MEMBERS_ActivityDate], Not IsEmpty ([Measures].[AveragePosition]))'
Set [*NATIVE_MEMBERS_ActivityDate] as 'Generate([*NATIVE_CJ_SET], {[ActivityDate].CurrentMember})'
Set [*BASE_MEMBERS_Measures] as '{[Measures].[*FORMATTED_MEASURE_0]}'
Set [*BASE_MEMBERS_ActivityDate] as '{[ActivityDate].[2012-09-01 00:00:00.0],[ActivityDate].[2012-09-02 00:00:00.0],[ActivityDate].[2012-09-03 00:00:00.0]}'
Set [*CJ_COL_AXIS] as '[*NATIVE_CJ_SET]'
Member [ActivityDate].[*SLICER_MEMBER] as 'Aggregate ([*NATIVE_MEMBERS_ActivityDate])', SOLVE_ORDER=-400
Member [Measures].[*FORMATTED_MEASURE_0] as '[Measures].[AveragePosition]', FORMAT_STRING = '#,###.00;(#,###.00)', SOLVE_ORDER=400
Select
[*BASE_MEMBERS_Measures] on columns
From [SQLTestCube1_JustResults]
Where ([ActivityDate].[*SLICER_MEMBER])
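To make the concern concrete, here is a small Python sketch (not Mondrian code) showing why an average of per-date averages can differ from the average over all underlying rows. The dates match the query above, but the position values are invented; whether Mondrian actually evaluates the slicer this way is exactly what the question asks.

```python
from statistics import mean

# Hypothetical AveragePosition source rows per date (values are made up)
positions_by_date = {
    "2012-09-01": [1.0, 2.0, 3.0],
    "2012-09-02": [10.0],
    "2012-09-03": [4.0, 6.0],
}

# Average each date first, then average those three numbers
daily_averages = [mean(rows) for rows in positions_by_date.values()]
average_of_daily_averages = mean(daily_averages)

# Average over every underlying row at once
all_rows = [x for rows in positions_by_date.values() for x in rows]
overall_average = mean(all_rows)

print(average_of_daily_averages, overall_average)
```

The two results agree only when every date contributes the same number of rows; otherwise dates with few rows are over-weighted in the first figure.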

SSAS 2012 Calculated Member for Percentage

Being an SSAS newbie, I was wondering if it's possible to create a calculated member that references an individual row's value as well as the aggregated value in order to create a percentage?
For example, if I have a fact table with ValueA, I'd like to create a calculate member that essentially performed:
[Measures].[ValueA] (for each row I've sliced the data by) / [Measures].[ValueA] (the total)
Also I'd like to keep the total as the sum of whatever's been filtered in the cube browser. I feel certain this must be possible but I'm clearly missing something.
You can use the Axis function. Here is an example:
WITH MEMBER [Measures].[Percentage] AS
[Measures].[ValueA] / (Axis(1).CurrentMember.Parent, [Measures].[ValueA])
SELECT {[Measures].[ValueA], [Measures].[Percentage]} ON 0,
'what you want' ON 1
FROM your cube
(You may need to add a check in the calculated member expression.)
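As an illustration of the ratio and of one kind of check that might be needed, here is a Python sketch (not MDX) that divides each row's value by the total and returns nothing when the total is missing or zero. The function name, the row labels and the choice of guard are assumptions; the answer above does not say which check it has in mind.

```python
def percentage_of_total(value, total):
    """Return value / total, or None when the ratio is undefined."""
    if value is None or total in (None, 0):
        return None  # assumed guard: empty value, empty total or zero total
    return value / total

# Hypothetical ValueA per row member after filtering
rows = {"A": 30.0, "B": 50.0, "C": 20.0}
total = sum(rows.values())

shares = {name: percentage_of_total(value, total) for name, value in rows.items()}
print(shares)
```

Because the total is computed from the rows that remain after filtering, the shares always add up to 1 for the visible rows, which matches the requirement that the total follow the current filter.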