I have a many-to-many dimension in my cube (next to other regular dimensions). When I want to exclude fact rows in my row count measure, I usually do something like the following in MDX
SELECT [Measures].[Row Count] on 0
FROM cube
WHERE ([dimension].[attribute].Children - [dimension].[attribute].&[value])
This might seem more complicated than needed in this simple example, but in this case the WHERE can grow sometimes, also including UNIONs.
So this works for regular dimensions, but now I have a many-to-many dimension. If I perform the trick above it does not produce the desired result, namely I want to exclude all rows that have that specific attribute in the many-to-many dimension.
Actually it does exactly what the MDX asks, namely count all rows, but ignore the specified attribute. Since a row in the fact table can have multiple attributes in a many-to-many dimension, the row will still be counted.
That's not what I need, I need it to explicitly exclude rows that have that dimension attribute value. Also, I might exclude multiple values. So what I need is something similar to T-SQL's WHERE .. NOT IN (...)
I realize that I can just subtract the resulting values from [attribute].all and [attribute].&[value], but that won't work any more when UNIONing multiple WHERE statements.
Anybody got a good idea on how to solve this?
Thanks in advance,
Delta
I have not tested this, but I think you could do this if you had an attribute that was at the same level of granularity as the rows (so probably implemented as a fact relationship).
So if you wanted to count the number of orders that did NOT have a product category of bikes (assuming a M2M relationship between OrderID and Category) then something like the following should work. (you can find more info on the EXISTS function in Books Online)
[Orders].[Order ID].[Order ID].Members
- EXISTS([Orders].[Order ID].[Order ID].Members
, [Product].[Category].&[Bikes]
, "Order Facts")
Although it could be quite slow as this sort of query is forcing the SSAS engine to add up a lot of facts from a low level.
Have you tried the EXCEPT command? It's syntax is like the following:
EXCEPT({the set i want}, {a set of members i dont want})
You could use the Filter function:
SELECT [Measures].[Row Count] on 0
FROM [cube]
WHERE Filter([dimension].[attribute].Children, [dimension].CurrentMember.MemberValue <> value)
Related
For the life of me I cannot figure out why MDX defines dimensionality per-attribute on a dimension rather than on the key. Perhaps I'm misunderstanding something here, but it seems like a very odd way to do this if I'm understanding things correctly. Let's say I have the following data:
Person
ID (Key)
Name
Age
And I have some data like this: [('Tom123', 'Tom', 15), ('Brad456', 'Brad', 16).
Now, why I could select the two users the following two ways:
{Person.Name.Tom, Person.Name.Brad}
Or:
{Person.ID.Tom123, Person.ID.Brad456}
But not the following way:
{Person.Name.Tom, Person.ID.Brad456}
Yet all three use the same 'dimension' and even 'dimensionality' since all three ways uniquely address the same two Person entities!
This seems so odd to me, in that they are both using the same 'dimension' and 'dimensionality' and should be able to use the Key for that dimension rather than thinking each attribute is unique. Why is this so? Or, am I misunderstanding something in this.
If we use this image:
Why would it matter if we address the individual cube (tuple) by doing: {Product.TV, Geography.Asia, Time.Q1} or doing {Product.TV, Geography.Asia, Time.Quarter One}. They're just two ways of doing exactly the same thing but yet MDX considers them different dimensionality (?).
That is an excellect question.
Yet all three use the same 'dimension' and even 'dimensionality' since
all three ways uniquely address the same two Person entities!
This seems so odd to me, in that they are both using the same
'dimension' and 'dimensionality' and should be able to use the Key for
that dimension rather than thinking each attribute is unique. Why is
this so? Or, am I misunderstanding something in this.
So for any language, the syntax is checked before anything else. The case you are refering is the only instance where Dimensionality can be ignored. But it will not be wise for a language to go evaluate the data first(in terms of SSAS and MDX its called AttributeID). Secondly this will be more confusing to a user. building on your example
{Person.Name.Tom, Person.ID.Brad456}
lets we have 100 people in Newyork, so as per your logic the following
the set will be {Person.City.Newyork}, then MDX will replace all 100 IDs. This would prevent MDX from using Aggregates, which is the biggest advantge of a CUBE. Mostly the queries sent to a Cube are targeted towards summarized data. Most queries in MDX would need data of entire NEWYORK rather then a single person in NEWYORK
So to summarize, the compile time will increase and the Aggregates will not be useable.
I'm trying to optimize a 2M row SSAS query into Power BI by using MDX prior to the Power Query. I have experience in T-SQL and found a website to help translate T-SQL experience into MDX, which was successful for some queries (basic rows/column selects, crossjoins, non empty, order by, filter, where). So now I want to get in my sales data which contains three dimensions and four measures but I get the following error:
Executing the query ...
Query (3, 1) The 'Measures' hierarchy appears more than once in the tuple.
Run complete
I attempted a few variations related to crossjoining the measures and the dimensions, only selecting one measure (which still took too long), and specifying members vs children.
'''
select
([Date].[OrderDate].children, [Customer].[CustID].children, [ProdLevel].[ProdNumber].children) on rows,
([Measures].[Revenue], [Measures].[Units], [Measures].[ASP], [Measures].[Profit]) on columns
from [RepProdDB]
where [ProdLevel].[Prod Description].[MyBusinessUnit]
'''
Looking up the error: "The 'Measures' hierarchy appears more than once in the tuple." is a bit vague to me as I have slight but probably incomplete understanding of tuples.
My hope is to have something that I can easily get in PivotTable OLAP, Power Pivot, and Power Query but using the actual MDX code. Thoughts?
So you need to understand the diffrence between tuples and sets.
select
non empty
(
[Date].[OrderDate].children,
[Customer].[CustID].children,
[ProdLevel].[ProdNumber].children
)
on rows,
{
[Measures].[Revenue],
[Measures].[Units],
[Measures].[ASP],
[Measures].[Profit]
}
on columns
from [RepProdDB]
where
[ProdLevel].[Prod Description].[MyBusinessUnit]
I am trying to run a query that aggregates data, groups the results by several different fields, and extract all relevant "SubTotal" permutations. (similar to CUBE() in MSSQL)
When Using Group By Rollup(), I get only permutations according to the order of the Group By fields in the Rollup function.
For example the query below (runs on a public dataset), it returns subtotal by year, or by year and month, or by year, month and medallion... but it doesn't subtotal by medallion.
SELECT
trip_year,
trip_month,
medallion,
SUM(trip_count) AS Sum_trip_count
FROM
[nyc-tlc:yellow.Trips_ByMonth_ByMedallion]
WHERE
medallion IN ("2R76", "8J82", "3B85", "4L79", "5D59", "6H75", "7P60", "8V48", "1H12", "2C69", "2F38", "5Y86", "5j90", "8A75", "8V41", "9J24", "9J55", "1E13", "1J82")
GROUP BY
ROLLUP(trip_year,
trip_month,
medallion)
My question is:
What should I do in order to get all different permutations of "Sub Totals" in a single query results.
Already tried: Union with similar query but with different order, it works, but not elegant (it would require too many unions).
Thanks
You are correct on both counts. In BigQuery, ROLLUP respects the hierarchy treating the listed fields as a strictly ordered list. Their order will not be changed during aggregation.
The CUBE aggregate commonly found in other SQL environments is unordered and in fact aggregates every possible order/subset of its listed fields. At this time, CUBE has not been implemented in BigQuery. The workaround you suggest is also what I would suggest. UNION all result sets from ROLLUP using each permutation of its contained fields. Albeit not ideal, you should get the same results.
In short, UNIONs of several queries with different permutations of ROLLUP fields is the only way to achieve this at the moment. The downsides are as you state that this may be difficult to maintain and can be more expensive in queries.
If you would like to see CUBE implemented in BigQuery, I strongly encourage you to file a feature request on the Big Query public issue tracker. Be sure to include a thorough use case in this request.
UPDATE: To support the feature request filed by the OP, please star it and you'll receive notifications with updates.
In analysis services, I have cube that is based on hospitalization data. For each hospitalization there are potentially 9 icd codes and these are each stored in their own field in the view on which the cube is based. These are stored in a child table in the relational database on which the SSAS database is based.
I would like to query the cube to return all rows that have a certain ICD code in any one or more of the 9 icd code fields. It seems as if it should be simple to have this sort of "OR" in the WHERE or the Filter clause, but I'm not finding the correct method.
Thanks in advance,
Jeremy Schrader
As far as i understand, you are an SQL guy and new to MDX, so that's why you have difficulties for the query.
it would be better if you tell us what are the measures you want to select with ICD codes but i am going to try to show you an mdx query sample as simple as possible. Your query should like below;
select {Measure1,Measure2,...} on columns
ICDCodeDimension.Children on rows
//{ICDCodeDimension.ICDCode1,ICDCodeDimension.ICDCode5,...} on rows
from Cube
MDX is highly advanced query language and there are many more concept you should know/learn to use it effectively.
Hope this help.
I am guessing that you'd have a dimension called [ICD Codes] with a single level called [Codes] and 9 members called [Code A] and [Code B] or whatever. Perhaps even a member for [No code] too?
In that case your query would be able to tell you the total number of hospitalisation cases for each code, for a certain time period, across all hospitals:
SELECT {[ICD Codes].[Codes].members} ON ROWS,
{[Measures].[Number of Cases]} ON COLUMNS
FROM [CubeName]
WHERE ([Time].[2010].[Quarter 1])
Thanks for both of your feedback. After I researched further (particularly an article here: http://sqlblog.com/blogs/mosha/default.aspx that uses the SUBCUBE method to give some "OR" functionality but with very poor performance) I realized that the OR construct that I was looking for requires record-level information and so doesn't work after the aggregation that SSAS performs. Thus, I need to create a field on the fact table that has the result of the SQL "OR" statement that I need.
In this case I will just create a flag for any record that has a certain range of ICD codes in any of the 9 ICD code fields. Then, I'll create a measure that gives a count of these. Luckily, the requirements of my app are that only a limited number of diagnoses need to be looked at in this way(i.e. any hospitalization that is diabetes-related,tobacco-related,etc.). I'm still curious how one would approach this if you needed to allow the user to choose any ICD code. My understanding at this point is that you would then need to revert back to plain SQL.
Jeremy
I have a situation where I have a product and a time dimension, with a fact table of sales volume. Over time, various details about the product changes, with the except of the business key for the product. In my flat reporting from the cube, I want to include some aggregration at the 'business key' level, regardless of what other parts of the product dimension are shown.
In sql this would be trivial as something like:
select sum(volume) over (partition by productKey,year) as Total
Regardless of whatever else I had selected, the Total column would be aggregated only on those two fields.
In MDX I have managed to achieve the same result, but it seems like there must be a simpler way.
WITH MEMBER Measures.ProductKeyTotal AS
'SUM(([Product].[ProductKey],[Time].[Year]
,[Product].[Product Name].[Product Name].ALLMEMBERS
,[Volume Type].[Volume Type Id].[Volume Type Id].ALLMEMBERS)
,[Measures].[Volume])'
SELECT {[Measures].[Volume],[Measures].[ProductKeyTotal]} ON COLUMNS,
NONEMPTYCROSSJOIN ([Product].[ProductKey].[ProductKey].ALLMEMBERS
,[Time].[Time].[Year].ALLMEMBERS
,[Product].[Product Name].[Product Name].ALLMEMBERS
,[Volume Type].[Volume Type Id].[Volume Type Id].ALLMEMBERS) ON ROWS
FROM [My Cube]
WHERE ([Product].[Include In Report].&[True])
1) If I don't include the allmembers for the rows I don't want in the calculated member the total is not correct, is there a shortcut to force it to ignore all the dimensions other that what you specify?
Part of the reason I ask is that I need to add a bunch of other calculated members, some of which will be using parameters and if I use the method from the example above I am going to need to duplicate the same stuff in multiple places, and the code will get weighty.
Well, first of all, don't use NonEmptyCrossJoin--it's been deprecated. Use non empty and then the cross join operator (*).
It's important to understand how tuples and tuple sets work to answer your question. Essentially, any dimension not explicitly stated will always get the CurrentMember of a given dimension. Typically, this is DefaultMember, but if you have it set to something else in your query, that will change this up. The reason you have to specify ALLMEMBERS for those dimensions is because it will use CurrentMember, otherwise. You could just use the [All] member in lieu of trying to sum up ALLMEMBERS (especially if they're not flat!), which will give you a bit better performance.
The most performant way to do this is to add another Measure Group to your cube, and then remove the keys that don't apply to the measure from that Measure Group. This way, you get a native calculation for these rather than a run-time calculation (which tend to be slow, especially when you're adding up everything in your cube). Moreover, you can even set up some aggregation design on that Measure Group, and it will be very performant.