NONEMPTY and CROSSJOIN performance and order in MDX

NONEMPTY and CROSSJOIN performance and order in MDX - ssas

I was wondering which of the following two queries is more performant?
Query 1:
SELECT NONEMPTY(CROSSJOIN({[Product].[Category].children},
{[Scenario].[Scenario].members}
)
) ON COLUMNS
FROM [Analysis Services Tutorial]
Query 2:
SELECT CROSSJOIN(NONEMPTY({[Product].[Category].children}),
NONEMPTY({[Scenario].[Scenario].members})
) ON COLUMNS
FROM [Analysis Services Tutorial]
I would say query 2 is more performant/optimized because first you take out all the unnecessary members and then crossjoin them. The first query you crossjoin everything and then take out the nulls. That would be my guess but I want somebody who can clear me up.
Edit 1 In response of comments of an answer
Lets say I add a measure as a second parameter, so it does not go to the "default measure". How could second query return values with null? I am specifying to crossjoin between nonempty members. And I just really dont see how the can return different results no matter the dimensions involved. To me they seemed pretty equivalent. What am I not seeing?
Query 1:
SELECT NONEMPTY(CROSSJOIN({[Product].[Category].children},
{[Scenario].[Scenario].members}
), [Total Internet Sales]
) ON COLUMNS
FROM [Analysis Services Tutorial]
Query 2:
SELECT CROSSJOIN(NONEMPTY({[Product].[Category].children},[Total Internet Sales]),
NONEMPTY({[Scenario].[Scenario].members},[Total Internet Sales])
) ON COLUMNS
FROM [Analysis Services Tutorial]
Edit 2
As the answer said the queries are not the same. I realized when #GregGalloway presented other scenario.
I did an excel with sample data so maybe someone can find it useful.

They aren't equivalent since both queries we will return different results. For example, against the real Adventure Works (not some tutorial version) these two queries return different results. Notice that the Clothing/Kentucky column shows null on the second query:
SELECT NONEMPTY(CROSSJOIN({[Product].[Category].children},
{[Customer].[State-Province].[State-Province].Members}
), [Measures].[Internet Sales Amount]
) ON COLUMNS
FROM [Adventure Works]
where [Measures].[Internet Sales Amount]
SELECT CROSSJOIN(NONEMPTY({[Product].[Category].children},[Measures].[Internet Sales Amount]),
NONEMPTY({[Customer].[State-Province].[State-Province].Members},[Measures].[Internet Sales Amount])
) ON COLUMNS
FROM [Adventure Works]
where [Measures].[Internet Sales Amount]
Note that the Scenario dimension doesn't relate to the Internet Sales measure group, I don't think. So that may not be a good example. I chose the Product dimension and the Customer dimension for my example.
As discussed (and as you updated in your question) NonEmpty() should always have a second parameter so it is clear what measure you are doing NonEmpty against. Your query should also mention a measure on one axis or the WHERE clause so that you're not returning some vague "default measure". I've included a WHERE clause with a measure in my examples.
Anyway, to answer your question... assuming the measure is a physical measure or a well optimized calculated measure that runs in block mode I wouldn't be surprised if Query 1 is faster. But it depends on the measure and the size of dimensions and the sparsity of the cube. This question is very theoretical and the two queries don't return equivalent results.

Related

Simple Calculated Member Running for Forever - MDX

I am facing very strange issue with MDX (SSAS 2014), on which simplest calculated member is taking forever to execute. Could someone please help me to understand why i am facing this issue. If i not use calculated member everything works fast and result comes in seconds. When i remove Producer attribute, query performances well.
Below is the complete query.
WITH
MEMBER Measures.AsOfDate AS ("[Policy Effective Date].[Year-Month].[Date].&[2018-01-04T00:00:00]")
MEMBER Measures.YTDPremium AS AGGREGATE (YTD(STRTOMEMBER(Measures.AsOfDate)), [Measures].[Written Premium])
SELECT NON EMPTY
{
Measures.YTDPremium
} ON COLUMNS, NON EMPTY
{
(
[Program].[Program Name].[Program Name]
,[Insuring Company].[Insuring Company Name].[Insuring Company Name]
,[Line Of Business].[Line Of Business].[Line Of Business]
,[Producer].[Producer Name].[Producer Name]
)
} ON ROWS
FROM [Premium]

Try understand what the following part does in your query
NON EMPTY { ( [Program].[Program Name].[Program Name]
,[Insuring Company].[Insuring Company Name].[Insuring Company Name]
,[Line Of Business].[Line Of Business].[Line Of Business]
,[Producer].[Producer Name].[Producer Name]
) } ON ROWS
In the above MDX you are telling the server to take a cross product of all values of "Programs", "Line Of Business" and "Producer Name". So lets say you have 4 values of programs , 3 values of line of business and 2 values of producer name. The total combinations are 4*3*2=24
Now the "Non empty" removes any combinations that are not present in your dataset. This is done by removing all rows that have "Null" value in column value.
Your measure is returning value irrespective if that combination exists or not. You can modify your Calculatedmeasure to return value only in the case if the combination is valid. This can be achived by checking an actual measure for that combination
Edit: based the below example is based on the comment
In the below example i am trying to get the internet sales amount categories and components
select
{ [Measures].[Internet Sales Amount] }
on columns,
(
[Product].[Category].[Category],
[Customer].[Country].[Country]
)
on rows
from [Adventure Works]
Result
Now add "Non empty" to the query and observe the results.
Results
Now lets add calculted measure that returns "hello". Notice how the non empty clause is ineffective.
Now modify the code make the calculated measure check other measures for null
with member measures.t as
case when [Measures].[Internet Sales Amount] = null then null else "hello" end
select
{ [Measures].[Internet Sales Amount] ,measures.t }
on columns,
non empty
(
[Product].[Category].[Category],
[Customer].[Country].[Country]
)
on rows
from [Adventure Works]
Result
The bottom line: Because of cross product your result is so huge that SSAS is having hard time handling it.

MDX - 3rd + dimension example needed

I am trying to learn MDX. I am an experienced SQL Developer.
I am trying to find an example of an MDX query that has more than two dimensions. Every single webpage that talks about MDX provides simple two dimensional examples link this:
select
{[Measures].[Sales Amount]} on columns,
Customer.fullname.members on rows
from [Adventure Works DW2012]
I am looking for examples that use the following aliases: PAGES (third dimension?), section (forth dimension?) and Chapter (fifth dimension?). I have tried this but I do not think it is correct:
select
{[Measures].[Sales Amount]} on columns,
Customer.fullname.members on rows,
customer.Location.[Customer Geography] as pages
from [Adventure Works DW2012]
I am trying to get this output using an MDX query (this is from AdventureWorks DW2012):

That's not a 3-dimensional resultset in your screenshot, unless there's something cropped from it.
Something like
SELECT [Geography].[Country].Members ON 0,
[Customer].[CustomerName].Members ON 1
FROM [whatever the cube is called]
WHERE [Measures].[Sales Amount]
(dimension/hierarchy/level names may not be exactly right)
would give a resultset like the one in your message.
The beyond 2nd-dimension dimensions and dimension names are not used in any client tool that I know. (Others may know different). They seem to be there in MDX so that MDX can hand >2-dimensional resultsets to clients that can handle them (e.g. an MDX subquery handing its results to the main query).
An often-used trick in MDX is to get the members of two dimensions onto one axis by cross-joining:
SELECT
{[Date].[Calendar Date].[Calendar Year].Members * [Geography].[Country].Members} ON 0,
[something else] ON 1
FROM [Cube]

How about the following - it does not send more than two dimensions back to a flat screen but it uses quite a few dimensions explicitly:
SELECT
[Measures].[Sales Amount] ON O,
[Customer].[fullname].MEMBERS ON 1
FROM
(
SELECT
[Date].[Calendar Month].[Calendar Month].&[February-2012] ON 0,
[Geography].[Country].[Country].&[Canada] ON 1,
[Product].[Product].&[Red Bike] ON 2,
[Customer].[Customer].&[foo bar] ON 3
FROM [Adventure Works DW2012]
)
I've made up the dimension | hierarchy | member combinations as I do not have access to the cube.
Also if we consider implicit dimensions then take the following:
SELECT
[Customer].[Location].[Customer Geography] ON 0,
[Customer].[fullname].[fullname].&[Aaron Flores] ON 1
FROM [Adventure Works DW2012]
WHERE
(
[Measures].[Sales Amount]
);
On the slicer I've used braces (..) which indicate a tuple, but this is actually shorthand for the following:
SELECT
[Customer].[Location].[Customer Geography] ON 0,
[Customer].[fullname].[fullname].&[Aaron Flores] ON 1
FROM [Adventure Works DW2012]
WHERE
(
[Measures].[Sales Amount]
,[Date].[Calendar Month].[Calendar Month].[All],
,[Geography].[Country].[Country].[All],
,[Product].[Product].[All]
,...
,...
....
);
The All member from every dimension in the cube could be included in this slicer without affecting the result.
So the whole nature of mdx is multi-dimensional - yes you do not get more than a 2 dimensional table returned to your screen but the way you get to that cellset could well involve many dimensions.

MDX - Count of Filtered CROSSJOIN - Performance Issues

BACKGROUND: I've been using MDX for a bit but I am by no means an expert at it - looking for some performance help. I'm working on a set of "Number of Stores Authorized / In-Stock / Selling / Etc" calculated measures (MDX) in a SQL Server Analysis Services 2012 Cube. I had these calculations performing well originally, but discovered that they weren't aggregating across my product hierarchy the way I needed them to. The two hierarchies predominantly used in this report are Business -> Item and Division -> Store.
For example, in the original MDX calcs the Stores In-Stock measure would perform correctly at the "Item" level but wouldn't roll up a proper sum to the "Business" level above it. At the business level, we want to see the total number of store/product combinations in-stock, not a distinct or MAX value as it appeared to do originally.
ORIGINAL QUERY RESULTS: Here's an example of it NOT working correctly (imagine this is an Excel Pivot Table):
[FILTER: CURRENT WEEK DAYS]
[BUSINESS] [AUTH. STORES] [STORES IN-STOCK] [% OF STORES IN STOCK]
[+] Business One 2,416 2,392 99.01%
[-] Business Two 2,377 2,108 93.39%
-Item 1 2,242 2,094 99.43%
-Item 2 2,234 1,878 84.06%
-Item 3 2,377 2,108 88.68%
-Item N ... ... ...
FIXED QUERY RESULTS: After much trial and error, I switched to using a filtered count of a CROSSJOIN() of the two hierarchies using the DESCENDANTS() function, which yielded the correct numbers (below):
[FILTER: CURRENT WEEK DAYS]
[BUSINESS] [AUTH. STORES] [STORES IN-STOCK] [% OF STORES IN STOCK]
[+] Business One 215,644 149,301 93.90%
[-] Business Two 86,898 55,532 83.02%
-Item 1 2,242 2,094 99.43%
-Item 2 2,234 1,878 99.31%
-Item 3 2,377 2,108 99.11%
-Item N ... ... ...
QUERY THAT NEEDS HELP: Here is the "new" query that yields the results above:
CREATE MEMBER CURRENTCUBE.[Measures].[Num Stores In-Stock]
AS COUNT(
FILTER(
CROSSJOIN(
DESCENDANTS(
[Product].[Item].CURRENTMEMBER,
[Product].[Item].[UPC]
),
DESCENDANTS(
[Division].[Store].CURRENTMEMBER,
[Division].[Store].[Store ID]
)
),
[Measures].[Inventory Qty] > 0
)
),
FORMAT_STRING = "#,#",
NON_EMPTY_BEHAVIOR = { [Inventory Qty] },
This query syntax is used in a bunch of other "Number of Stores Selling / Out of Stock / Etc."-type calculated measures in the cube, with only a variation to the [Inventory Qty] condition at the bottom or by chaining additional conditions.
In its current condition, this query can take 2-3 minutes to run which is way too long for the audience of this reporting. Can anyone think of a way to reduce the query load or help me rewrite this to be more efficient?
Thank you!
UPDATE 2/24/2014: We solved this issue by bypassing a lot of the MDX involved and adding flag values to our named query in the DSV.
For example, instead of doing a filter command in the MDX code for "number of stores selling" - we simply added this to the fact table named query...
CASE WHEN [Sales Qty] > 0
THEN 1
ELSE NULL
END AS [Flag_Selling]
...then we simply aggregated these measures as LastNonEmpty in the cube. They roll up much faster than the full-on MDX queries.

It should be much faster to model your conditions into the cube, avoiding the slow Filter function:
If there are just a handful of conditions, add an attribute for each of them with two values, one for condition fulfilled, say "cond: yes", and one for condition not fulfilled, say "cond: no". You can define this in a view on the physical fact table, or in the DSV, or you can model it physically. These attributes can be added to the fact table directly, defining a dimension on the same table, or more cleanly as a separate dimension table referenced from the fact table. Then define your measure as
CREATE MEMBER CURRENTCUBE.[Measures].[Num Stores In-Stock]
AS COUNT(
CROSSJOIN(
DESCENDANTS(
[Product].[Item].CURRENTMEMBER,
[Product].[Item].[UPC]
),
DESCENDANTS(
[Division].[Store].CURRENTMEMBER,
[Division].[Store].[Store ID]
),
{ [Flag dim].[cond].[cond: yes] }
)
)
Possibly, you even could define the measure as a standard count measure of the fact table.
In case there are many conditions, it might make sense to add just a single attribute with one value for each condition as a many-to-many relationship. This will be slightly slower, but still faster than the Filter call.

I believe you can avoid the cross join as well as filter completely. Try using this:
CREATE MEMBER CURRENTCUBE.[Measures].[Num Stores In-Stock]
AS
CASE WHEN [Product].[Item Name].CURRENTMEMBER IS [Product].[Item Name].[All]
THEN
SUM(EXISTS([Product].[Item Name].[Item Name].MEMBERS,[Business].[Business Name].CURRENTMEMBER),
COUNT(
EXISTS(
[Division].[Store].[Store].MEMBERS,
(
[Business].[Business Name].CURRENTMEMBER,
[Product].[Item Name].CURRENTMEMBER
),
"Measure Group Name"
)
))
ELSE
COUNT(
EXISTS(
[Division].[Store].[Store].MEMBERS,
(
[Business].[Business Name].CURRENTMEMBER,
[Product].[Item Name].CURRENTMEMBER
),
"Measure Group Name"
)
)
END
I tried it using a dimension in my cube and using Area-Subsidiary hierarchy.
The case statement handles the situation of viewing data at Business level. Basically, the SUM() across all members of Item Names used in CASE statement calculates values for individual Item Names and then sums up all the values. I believe this is what you needed.

MDX currentmember/IS inconsistent?

I have three queries to filter by a member using the currentmember function. When the filter is applied to the hierarchy that has the member I want to filter by, I can match the members using the IS operator and get the correct result. It does not work though when the filtered set and the member are in different hierarchies. Yet, I can get the filtered results correctly for the second case if instead of objects comparison I just do a caption comparison. The examples use the AdventureWorks database.
This query is working as expected with the IS operator:
select non empty [Measures].[Reseller Sales Amount] on 0,
Filter (NonEmpty({[Geography].[Country].[Country].ALLMEMBERS * [Geography].[City].[City].ALLMEMBERS}), [Geography].[City].Currentmember IS [Geography].[City].&[Seattle]&[WA]) on 1
from [Adventure Works]
This one uses a caption comparison (different result, as expected)
select non empty [Measures].[Reseller Sales Amount] on 0,
Filter (NonEmpty({[Geography].[Country].[Country].ALLMEMBERS}), [Geography].[City].Currentmember.MEMBER_CAPTION = 'Seattle') on 1
from [Adventure Works]
This one though, which should produce the same result as the previous query, does not return anything:
select non empty [Measures].[Reseller Sales Amount] on 0,
Filter (NonEmpty({[Geography].[Country].[Country].ALLMEMBERS }), [Geography].[City].Currentmember IS [Geography].[City].&[Seattle]&[WA]) on 1
from [Adventure Works]
Thanks.

In fact, this is a bit strange. For me, the most surprising result is the second one. No reference in the set to be filtered to the city, and nevertheless, a filter is applied. I would think the reason for the second result is that somehow "implicit overwrite" kicks in.
And probably, the second and third case are treated differently as the optimizer somehow choses different ways to interpret the statement. Normally, string operations like the reference to caption are less efficient than the IS operator which works on object identity.

It looks like most comments confirm that the result of the second query is a bug. Some more comments here in this other thread

Why does Order ignore subselect?

I've run into some confusing behavior in Analysis Services 2005 (and 2008 R2) and would appreciate it if someone could explain why this is happening. For the sake of this question, I've reproduced the behavior against the Adventure Works cube.
Given this MDX:
SELECT [Customer].[Education].[(All)].ALLMEMBERS ON COLUMNS,
Order(DrillDownLevel({[Customer].[Customer Geography].[All Customers]}),
([Measures].[Internet Order Count]),
ASC) ON ROWS
FROM (SELECT {[Customer].[Education].&[Partial High School]} ON COLUMNS FROM [Adventure Works])
WHERE [Measures].[Internet Order Count];
The query evaluates with the ordered set on rows:
All Customers: 2, 136
Germany: 269
France: 298
Canada: 304
United Kingdom: 311
United States: 457
Australia: 497
However, if I include the All Member (or defaultmember) for Education in the tuple used in the order statement:
SELECT [Customer].[Education].[(All)].ALLMEMBERS ON COLUMNS,
Order(DrillDownLevel({[Customer].[Customer Geography].[All Customers]}),
([Measures].[Internet Order Count], [Customer].[Education].[All Customers]),
ASC) ON ROWS
FROM (SELECT {[Customer].[Education].&[Partial High School]} ON COLUMNS FROM [Adventure Works])
WHERE [Measures].[Internet Order Count];
Then the the set comes back in a significantly different order:
All Customers: 2, 136
France: 298
Germany: 269
United Kingdom: 311
Canada: 304
Australia: 497
United States: 459
Note that France and Germany are out of order relative to each other. Same with Canada / UK and with USA / Australia. From what I can tell, it's ordering based on the aggregation before the sub-cube is evaluated.
Why does including this member (which should implicitly be in the tuple in the first example?) cause the evaluation of the order statement to look outside of the subcube's visual totals? Filter and TopCount etc functions seem to have the same behavior.
Cheers.

My guess is that since the 2 attributes are related only by key (no direct relationship exists) the resulting crossjoin of the attributes results on the key members being aggregated and this is where the visualtotals is not applied (it's one of the ways how to calculate tho nonvisualtotal with subselects).
I've constructed a query to demonstrate what is happening as a result of that crossjoin:
WITH
MEMBER sortExpr as
AGGREGATE(Descendants([Customer].[Customer Geography].CurrentMember), [Measures].[Internet Order Count])
SELECT
[Customer].[Education].[(All)].ALLMEMBERS
*
{[Measures].[Internet Order Count], sortExpr} ON COLUMNS,
Order
(
DrillDownLevel({[Customer].[Customer Geography].[All Customers]})
,sortExpr
,ASC
) ON ROWS
FROM
(
SELECT
{[Customer].[Education].&[Partial High School]} ON COLUMNS
FROM [Adventure Works]
)
;
EDIT:
Here is another query that shows that the expression works correctly with attribute overwrites once you solve the subselect problem using a query scoped set (well-known solution):
WITH
set sub as
[Customer].[Customer Geography].[Customer]
MEMBER sortExpr2 as
Aggregate(existing sub, [Measures].[Internet Order Count])
SELECT
[Customer].[Education].[(All)].ALLMEMBERS
*
{[Measures].[Internet Order Count], sortExpr2} ON COLUMNS,
Order
(
DrillDownLevel({[Customer].[Customer Geography].[All Customers]})
,
(
-- [Customer].[Customer Geography].CurrentMember,
[Customer].[Education].[All Customers],
sortExpr2
)
,ASC
) ON ROWS
FROM
(
SELECT
{[Customer].[Education].&[Partial High School]} ON COLUMNS
FROM [Adventure Works]
)
;
Thanks to Boyan for tweeting this question.
Regards, Hrvoje

Note: Have a look at http://www.bp-msbi.com/2011/07/mdx-subselects-some-insight/ where I explained the behaviour in more detail.
A great article about subselects is Mosha's 2008 MDX: subselects and CREATE SUBCUBE in non-visual mode
Note what happens when you use a subselect. Implicit Exists and visual totals are applied. The key bit of information here in regards to what you are experiencing is:
2. Applies visual totals to the values of physical measures even within expressions if there are no coordinate overwrites.
In your first query SSAS applies visual totals by default. You can change your query to not do this like this:
SELECT [Customer].[Education].[(All)].ALLMEMBERS ON COLUMNS,
Order(DrillDownLevel({[Customer].[Customer Geography].[All Customers]}),
([Measures].[Internet Order Count]),
ASC) ON ROWS
FROM NON VISUAL (SELECT {[Customer].[Education].&[Partial High School]} ON COLUMNS FROM [Adventure Works])
WHERE [Measures].[Internet Order Count];
The NON VISUAL keyword in a SELECT statement tells SSAS to only apply the implicit Exists, but not the visual totals part. The results of the query will be the same as in the second case, but you will also see the real numbers it is ordering by.
Because you are explicitly overwriting the All member in the second query, SSAS does not apply the visual totals to this expression and orders by the total amounts for all years. However, it still displays the visual totals for the selected on ROWS measure after it evaluates the order on non-visual totals.

I'm not a SSAS specialist but this is related to attribute overwrite.
You're overwriting your sub-select in the Order function evaluation. Note, this behavior is context dependent (axes eval, calculated members and pivot). In you example the context is a function during the evaluation of the axis. The behavior is different from the evaluation of the pivot (once you're axes are known). It's complicated but the way it is.
Note that in icCube and after discussing with some MDX specialist we decided to simplify and not follow this behavior: subselect filter is always applied.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

NONEMPTY and CROSSJOIN performance and order in MDX - ssas

Related

Simple Calculated Member Running for Forever - MDX

MDX - 3rd + dimension example needed

MDX - Count of Filtered CROSSJOIN - Performance Issues

MDX currentmember/IS inconsistent?

Why does Order ignore subselect?

Categories

Resources