Calculating Percentile in MDX - mdx

Does icCube provide any function that can be used for creating calculated members with aggregated percentile, like Median or 95th percentile?

For the median there is an MDX function available. Percentile is not yet directly supported (it looks like a good idea to add), but we can use the Vector MDX+ function and apply its percentile object function.
Vector( [Set] , [Value], EXCLUDEEMPTY )->percentile(50) // for the median
same as
Median( [Set] , [Value] )
If you're looking to do this directly from a measure (fact level), there is a list of aggregation methods that return a Vector. Let's say we've defined a measure [P&L Vector] with the aggregation method Vector. The median would then be:
[P&L Vector]->percentile(50)
Starting with the upcoming 6.2 version, there will be an MDX function that supports Percentile directly:
Percentile( [Set], [Value] , 95 )
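
To make the relationship between the median and percentile(50) concrete, here is a small Python sketch of the arithmetic. It is not icCube code: the `percentile` helper is hypothetical and uses linear interpolation between closest ranks, which is one common convention and may differ from what icCube's percentile method does. Dropping `None` values stands in for EXCLUDEEMPTY.

```python
import math
import statistics


def percentile(values, p):
    """Percentile by linear interpolation between closest ranks (an assumed convention)."""
    # Drop empty cells first, similar in spirit to EXCLUDEEMPTY.
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    rank = (len(data) - 1) * p / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return data[lo] + (data[hi] - data[lo]) * (rank - lo)


# Made-up sample values; None marks an empty cell.
values = [12.0, None, 7.0, 3.0, 9.0, None, 15.0]

median_value = percentile(values, 50)  # same result as statistics.median on the non-empty values
p95 = percentile(values, 95)
```

With this convention, the 50th percentile and the median agree for any non-empty input, which is the equivalence the two MDX expressions above rely on.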

Related

What does MDX Aggregate() do with a single argument?

I understand how to use the MDX Aggregate() and Sum() functions, and the differences between them.
(One interesting one is that the Sum of a measure defined at a higher level in a hierarchy over that level's Children multiplies the measure by the number of children - whereas Aggregate "correctly" returns just the value defined at the higher level).
The documented syntax on MSDN is:
Aggregate(Set_Expression [ ,Numeric_Expression ])
I've always used it with both arguments. But what does Aggregate do when only the set_expression argument is provided? The documentation (again from MSDN) is pretty obscure:
If a numeric expression is not provided, this function aggregates each measure within the current query context by using the default aggregation operator that is specified for each measure.
I tried it in an MDX query like this:
WITH MEMBER WeekSummedTotal AS
Aggregate([Days].[WeeksAndDays].CurrentMember.Children)
SELECT
{Measures.ThingoCount,Measures.WeekTotal,Measures.WeekSummedTotal} ON 0,
[Days].[WeeksAndDays].[WeekName] ON 1
FROM DateGRoupingTest
What would this do? Would Aggregate aggregate the cube's default measure over the set? Or the set Measures.Members? Or the set of other measures specified on the 0 axis?
None of these! The query runs and returns results, but the calculated measure WeekSummedTotal shows #Error, with a completely nonsensical error:
Aggregate functions cannot be used on calculated members in the measures dimension
Now this is true, but completely irrelevant. None of the other measures in the query is calculated, and in fact the cube doesn't have any calculated members. So what is Aggregate() actually trying to do here? Is this error message (again, in MDX!) completely misleading?
ADDITION: @whytheq in the answer below suggested creating the calculated measure using Aggregate, but creating it on a spare dimension hierarchy rather than in the Measures dimension. This works, but only if the cross-join with the [All] member of the selected "any old..." dimension is included.
Creating the measure there also makes it impossible to put the two (base) measures and the calculated measure on the same axis. If I try to do this:
{Measures.ThingoCount,Measures.WeekTotal,[Ages].[Age Key].WeekSummedTotal} ON 0,
I get the deeply-unhelpful error message:
Members, tuples or sets must use the same hierarchies in the function.
which, I think, translates to "I can't make a set using the , (UNION) function between members of Measures and members of [Ages].[Age Key] because they're members of different dimensions".
My conclusion, thanks to your informative answers, is that Aggregate() with a single argument is a tricky beast; I wonder why it was designed with the second argument optional?
I've also noted that trying to create my calculated member on my Ages dimension (only one hierarchy, only one attribute) gives me the misleading error message:
The 'Ages' dimension contains more than one hierarchy, therefore
the hierarchy must be explicitly specified.
unless I explicitly specify the hierarchy. MDX has so much potential, but the learning curve would be that much gentler if MS had put more effort into making it feed back errors properly.
What would this do? Would Aggregate aggregate the cube's default
measure over the set? Or the set Measures.Members? Or the set of other
measures specified on the 0 axis?
The Aggregate function aggregates the set over the current member of the Measures dimension. A measure is "current" if it is in scope. If no measure is in scope, the default member of the Measures dimension is used for the aggregation.
A measure can be brought into scope in several ways, for example:
Having the measure on axes
with member [Customer].[Customer].abc as
aggregate([Customer].[Customer].members)
select [Customer].[Customer].abc on 0,
{[Measures].[Internet Sales Amount],[Measures].[Reseller Sales Amount]} on 1
from [Adventure Works]
In the above example the member abc was calculated twice, once for each measure.
Using Subcube
with member [Customer].[Customer].abc as
aggregate([Customer].[Customer].members)
select [Customer].[Customer].abc on 0
from (select {[Measures].[Internet Sales Amount] } on 0 from [Adventure Works])
Having the measure in definition
with member [Customer].[Customer].abc as
aggregate([Customer].[Customer].members, [Measures].[Internet Sales Amount])
select [Customer].[Customer].abc on 0
from [Adventure Works]
In Where clause
with member [Customer].[Customer].abc as
aggregate([Customer].[Customer].members)
select [Customer].[Customer].abc on 0
from [Adventure Works]
where [Measures].[Internet Sales Amount]
As suggested by whytheq, put the member on some other dimension-hierarchy combination. Otherwise, the Aggregate function would probably cause the calculated member to reference itself.
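
The scoping rule described in this answer can be modelled outside MDX. The following Python sketch is only an analogy under stated assumptions: the `cube` dictionary, the customer names, the numbers and the `DEFAULT_MEASURE` choice are all made up, and a plain sum stands in for each measure's real aggregation operator.

```python
# Hypothetical cube: measure name -> {customer -> value}.
cube = {
    "Internet Sales Amount": {"Alice": 100.0, "Bob": 50.0},
    "Reseller Sales Amount": {"Alice": 30.0, "Bob": 20.0},
}
DEFAULT_MEASURE = "Internet Sales Amount"  # assumed default member of Measures


def aggregate(members, measure=None, current_measure=None):
    """Model of Aggregate(set [, measure]): explicit measure, else the one in scope, else the default."""
    chosen = measure or current_measure or DEFAULT_MEASURE
    # A sum stands in for the measure's own aggregation operator.
    return sum(cube[chosen][m] for m in members)


customers = ["Alice", "Bob"]

# Measures on an axis: the single-argument form is evaluated once per measure.
per_measure = {name: aggregate(customers, current_measure=name) for name in cube}

# No measure in scope: the default measure is used.
fallback = aggregate(customers)
```

Here `per_measure` holds one result per measure, mirroring the "calculated twice" behaviour in the first example, while `fallback` shows the default-member case.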
Taking this section of the MSDN definition:
...this function aggregates each measure within the current query
context ...
each measure in the context of your script is the following:
{Measures.ThingoCount,Measures.WeekTotal,Measures.WeekSummedTotal}
Now Measures.WeekSummedTotal is a calculated member in the Measures dimension, hence the error.
I'd imagine something like the following would work, where Aggregate is used to create a member in a dimension other than Measures:
WITH
MEMBER [SomeSpareDim].[SomeSpareHier].WeekSummedTotal AS
Aggregate
(
[Days].[WeeksAndDays].CurrentMember.Children * [SomeDim].[SomeHier].[All]
)
SELECT
[SomeSpareDim].[SomeSpareHier].WeekSummedTotal ON 0
,[Days].[WeeksAndDays].[WeekName] ON 1
FROM DateGRoupingTest;
The above can be changed to show Aggregate being very useful:
WITH
MEMBER [Days].[WeeksAndDays].[Last3Weeks] AS
Aggregate
(
{
[Days].[WeeksAndDays].[Weekx]
,[Days].[WeeksAndDays].[Weeky]
,[Days].[WeeksAndDays].[Weekz]
}
)
SELECT
{Measures.ThingoCount,Measures.WeekTotal} ON 0
,{
//<< the following custom aggregated member will work for any measure, that is ON 0, via Aggregate
//it can be mixed up with the normal members of the same hierarchy like the following
[Days].[WeeksAndDays].[Last3Weeks]
,[Days].[WeeksAndDays].[WeekName].members
} ON 1
FROM DateGRoupingTest;

MDX Error while creating Calculated Measure

I am trying to create a calculated measure that finds the difference between two measures by using the following mdx query
WITH MEMBER [Measures].[Available]
AS ([Measures].[Capacity days] ,[Project].[Projects by Name].[All],[Project].[Projects by Code].[All])
- ([Measures].[Worked Days] ,EXCEPT([Project].[Projects by Name].[Project].MEMBERS,
[Project].[Projects by Name].[Project].&[1214]),[Version].[Version].[Combined],[Charge].[Charge].[All])
For the second measure, Worked Days, I want the value for all projects except one, so I am using the EXCEPT function, which results in the following error:
"The function expects a string or numeric expression for the argument. A tuple set expression was used"
Is there any other way to perform this operation?
The query mixes tuples with sets. It may help to check this gentle introduction to MDX for the main concepts and notation.
The second tuple uses a set (the result of EXCEPT) as its second member, which is not possible. You could instead use the Aggregate function as follows to compute [Worked Days] over the members of this set:
AS ( [Measures].[Capacity days], ... )
- Aggregate(
Except (
[Project].[Projects by Name].[Project].MEMBERS,
[Project].[Projects by Name].[Project].&[1214]
),
( [Measures].[Worked Days], ... )
)
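
The difference between a tuple (one cell) and a set (many cells that need an aggregation) can be sketched in Python. This is only a model of the arithmetic: the project keys other than 1214, the numbers, and the helper names `except_` and `aggregate` are hypothetical, and a sum is assumed as the aggregation for Worked Days.

```python
# Made-up values: capacity for all projects, worked days per project key.
capacity_days_all_projects = 200.0
worked_days = {"1213": 40.0, "1214": 25.0, "1215": 35.0}


def except_(members, excluded):
    """Model of Except(set1, set2): members of the first set not in the second."""
    return [m for m in members if m not in excluded]


def aggregate(members, values):
    """Model of Aggregate(set, value): one number computed over all members (sum assumed)."""
    return sum(values[m] for m in members)


# A set cannot be used as a single coordinate; it has to be aggregated first.
remaining_projects = except_(worked_days.keys(), {"1214"})
available = capacity_days_all_projects - aggregate(remaining_projects, worked_days)
```

The subtraction only makes sense once the set has been reduced to a single number, which is the role Aggregate plays in the MDX above.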

Calculating percentile values in SSAS

I am trying to calculate percentile (for example 90th percentile point of my measure) in a cube and I think I am almost there. The problem I am facing is, I am able to return the row number of the 90th percentile, but do not know how to get my measure.
With
Member [Measures].[cnt] as
Count(NonEmpty(
-- dimensions to find percentile on (the same should be repeated again
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members
,
-- add the measure to group
[Measures].[Profit]))
-- define percentile
Member [Measures].[Percentile] as 90
Member [Measures].[PercentileInt] as Int((([Measures].[cnt]) * [Measures].[Percentile]) / 100)
-- this part finds the tuple from the set based on the index of the percentile point and I am using the item(index) to get the necessary info from tuple and I am unable to get the measure part
Member [Measures].[PercentileLo] as
(
Order(
NonEmpty(
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members,
[Measures].[Profit]),
[Measures].[Profit].Value, BDESC)).Item([Measures].[PercentileInt]).Item(3)
select
{
[Measures].[cnt],
[Measures].[Percentile],[Measures].[PercentileInt],
[Measures].[PercentileLo],
[Measures].[Profit]
}
on 0
from
[TestData]
I think there must be a way to get the measure of a tuple found through the index of a set. Please help, and let me know if you need any more information. Thanks!
You should extract the tuple at position [Measures].[PercentileInt] from your set and add the measure to it to build a tuple of four elements. Then you want to return its value as the measure PercentileLo, i.e. define:
Member [Measures].[PercentileLo] as
(
[Measures].[Profit],
Order(
NonEmpty(
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members,
[Measures].[Profit]),
[Measures].[Profit], BDESC).Item([Measures].[PercentileInt])
)
The way you implemented it, you tried to extract the fourth (as Item() starts counting from zero) item from a tuple containing only three elements. Your ordered set only has three hierarchies.
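
The indexing problem can be reproduced in a few lines of Python. This is a sketch of the logic only, with made-up members and profit values: `cells` plays the role of the cube, each key is a three-element tuple, and the measure is looked up with the tuple rather than read out of it.

```python
# Hypothetical cells: (calendar, region, product) -> Profit; None is an empty cell.
cells = {
    ("2023", "North", "Bikes"): 120.0,
    ("2023", "South", "Bikes"): 80.0,
    ("2023", "North", "Helmets"): 45.0,
    ("2023", "South", "Helmets"): None,
}

# NonEmpty(...) followed by Order(..., BDESC).
non_empty = [t for t, v in cells.items() if v is not None]
ordered = sorted(non_empty, key=lambda t: cells[t], reverse=True)

cnt = len(ordered)
percentile = 90
index = int(cnt * percentile / 100)  # same arithmetic as [PercentileInt]

chosen = ordered[index]  # a tuple with exactly three elements: positions 0, 1, 2
value = cells[chosen]    # combine the tuple with the measure to get the number
```

Asking for `chosen[3]` fails because positions start at zero and the tuple has only three elements, while `cells[chosen]` corresponds to putting [Measures].[Profit] into the tuple.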
Just another unrelated remark: I think you should avoid using complete hierarchies for [Calendar].[Hierarchy].members, [Region Dim].[Region].members, and [Product Dim].[Product].members. Your code looks like you are including all levels (including the all member) in the calculation. But I do not know the structure and names of your cube, hence I may be wrong with this.
An alternate method could be to find the median of the last 20% of the records in the table. I've used this combination of functions to find the 75th percentile. By dividing the record count by 5, you can use the TopCount function to return a set of tuples that make up 20% of the whole table sorted in descending order by your target measure. The median function should then land you at the correct 90th percentile value without having to find the record's coordinates. In my own use, I use the same measure for the last parameter in both the Median and TopCount functions.
Here's my code:
WITH MEMBER Measures.[90th Percentile] AS MEDIAN(
TOPCOUNT(
[set definition]
,Measures.[Fact Table Record Count] / 5
,Measures.[Value by which to sort the set so the first 20% of records are chosen]
)
,Measures.[Value from which the median should be determined]
)
Based on what you've supplied in your problem definition, I would expect your code to look something like this:
WITH MEMBER Measures.[90th Percentile] AS MEDIAN(
TOPCOUNT(
{
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members
}
,Measures.[Fact Table Record Count] / 5
,[Measures].[Profit]
)
,[Measures].[Profit]
)
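
As a quick sanity check of the idea (not of the MDX itself), the median of the top fifth of a data set can be computed in Python. The helper name `median_of_top` and the sample data are made up, and integer division is used for the count, which is an assumption about how a fractional count would be treated.

```python
import statistics


def median_of_top(values, divisor):
    """Median of the top 1/divisor of the values, sorted in descending order."""
    ordered = sorted(values, reverse=True)
    count = len(ordered) // divisor  # assumption: fractional counts are truncated
    return statistics.median(ordered[:count])


values = list(range(1, 101))  # made-up data: 1..100

# Top 20% is 81..100, whose median sits at the 90th percentile.
p90_estimate = median_of_top(values, 5)
```

On this evenly spread sample the result lands between 90 and 91, which is where a 90th percentile is expected.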

MDX Query percentile 25th, 50th and 75th

I have a question for which I haven't been able to find an answer (neither in this forum nor elsewhere):
I need to calculate the 25th percentile, the median (the 50th percentile) and the 75th percentile.
In other words: I need to write an MDX query in SSRS that tells me which values are the 25th percentile, the median and the 75th percentile.
So far, nothing I have found gives the exact values for each of them.
Thanks
I've been working on the same issue for my own data. The trouble I was having is in figuring out the Median() function. Here's how I interpret the parameters of the function:
Microsoft's definition:
MEDIAN(Set_Expression [, Numeric_Expression])
My interpretation:
Set_Expression is the set of values that define the grain to which the measure is summed before the median is evaluated
Numeric_Expression is the measure that is summed, which set of sums is then sorted and evaluated to find the median
In my case for finding the straight median across the entire data set, I didn't want to sum the values at all. To prevent any sums from being calculated, I used the key attribute for a dimension that had a 1-1 cardinality with the records in the fact table that contains the measure that I'm using. The only flaw I've seen so far is that sometimes the median returns a whole number when there are an even number of records and the mean of the two middle records should result in a number ending in .5. For example, the values of the two middle records are 16 and 17 and the function is returning 17 instead of 16.5. Since this is a minor flaw, I'm willing to overlook it for now.
This is what my calculation with the median function looks like:
WITH MEMBER Measures.[Set Median] AS MEDIAN(
[Dimension].[Key Attribute].MEMBERS
,Measures.[Non-summable Measure]
)
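
The 16 versus 16.5 observation matches the difference between two common median conventions for an even number of values. The Python sketch below only illustrates those conventions with made-up numbers; it does not establish which one the MDX Median function applies.

```python
import statistics

# Made-up values with an even count; the two middle records are 16 and 17.
values = [12, 15, 16, 17, 20, 23]

interpolated = statistics.median(values)       # mean of the two middle values
upper_middle = statistics.median_high(values)  # the larger of the two middle values
```

If a result of 17 appears where 16.5 was expected, the upper-middle convention is one possible explanation worth checking.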
I used a combination of Median and TopCount to get the 75th percentile. I use TopCount to limit the set for the median to the second half of the data since TopCount sorts the data in descending order. I'll explain how I understand TopCount:
Microsoft's definition:
TopCount(Set_Expression, Count [, Numeric_Expression])
My interpretation:
Set_Expression is the set of values from which the desired number of tuples will be returned
Count is the number of tuples to return from the set
Numeric_Expression is the value that will be used to sort the set in descending order
I want the Median function to use the last half of the records in the fact table that are returned in the query, so I again use the key for the dimension table that has a 1-1 cardinality with the fact table and I sort it by the measure from which I want to find the median value.
Here is how I coded the member:
MEMBER Measures.[75th Percentile] AS MEDIAN(
TOPCOUNT(
[Dimension].[Key Attribute].MEMBERS
,Measures.[Fact Table Record Count] / 2
,Measures.[Non-summable Measure]
)
,Measures.[Non-summable Measure]
)
So far, this combination of functions has returned a true 75th percentile from my data set. To get the 25th percentile, I tried replacing TOPCOUNT in my code with BOTTOMCOUNT, which is supposed to do the same thing, only sorting the data in ascending order to use the first half of the records instead of the second half. Unfortunately, I haven't been able to get anything but NULL from this combination of functions, so I'm open to suggestions on how to get the 25th percentile.
This is how my final query looks:
SELECT
{
Measures.[Set Median]
,Measures.[25th Percentile]
,Measures.[75th Percentile]
} ON 0
,[Dimensional row members here] ON 1
FROM [Cube]
WHERE
[Non-axis dimensional filter members here]
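
The halving approach can be checked numerically in Python. This is a sketch of the arithmetic, not of the MDX: the helper `percentile_via_half` and the data are made up, and empty values are dropped before splitting. Whether empty cells are what makes the BOTTOMCOUNT version return NULL is not confirmed here; it is only one thing worth testing.

```python
import statistics


def percentile_via_half(values, upper):
    """Median of the upper or lower half of the non-empty values."""
    data = sorted(v for v in values if v is not None)  # assumption: empties are removed first
    half = len(data) // 2
    part = data[len(data) - half:] if upper else data[:half]
    return statistics.median(part) if part else None


# Made-up data: 1..100 plus some empty cells.
values = list(range(1, 101)) + [None] * 10

p75 = percentile_via_half(values, upper=True)   # median of 51..100
p25 = percentile_via_half(values, upper=False)  # median of 1..50
```

On this sample the upper half gives a value between 75 and 76 and the lower half a value between 25 and 26, so the same technique is symmetric once both halves contain real values.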

Why does the AVG function perform a SUM?

I would like to execute a query with a calculated member which returns the AVG (of the measure) of the Coils belonging to a particular LINESPEED.
The query is:
With
Member [Measures].[Avg1] As
AVG(
([LINESPEED].currentmember,
[GRUPPO].[Coil].currentmember)
,
[Measures].[KPI1]
)
SELECT [Measures].[Avg1] on 0,
non empty {[LINESPEED].children} On 1
from [HDGL]
But the AVG function computes exactly the sum of the KPIs of the coils related to a particular LINESPEED!
Why?
Your formula is using a single tuple, so AVG() is equivalent to SUM() :
AVG( ([LINESPEED].currentmember, [GRUPPO].[Coil].currentmember), ...)
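
The effect can be reproduced with plain arithmetic. The Python sketch below uses made-up coil names and KPI values and assumes that the single tuple evaluates to the total over all coils (that is, that the Coil current member is the top-level member and KPI1 aggregates by sum).

```python
# Made-up KPI1 values per coil for one LINESPEED member.
kpi1_by_coil = {"coil A": 10.0, "coil B": 20.0, "coil C": 30.0}


def avg(cells):
    """Average over a collection of cell values."""
    cells = list(cells)
    return sum(cells) / len(cells)


# A single tuple is a set with one element, whose value is already the total.
single_tuple_value = sum(kpi1_by_coil.values())
avg_over_one_tuple = avg([single_tuple_value])  # dividing by 1 leaves the sum

# Averaging over the individual coils is what was intended.
avg_over_coils = avg(kpi1_by_coil.values())
```

Averaging a one-element set returns that element unchanged, so the first argument has to be a set of coil members for the average to differ from the sum.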