Why does dimensionality depend on the attribute in MDX - ssas

For the life of me I cannot figure out why MDX defines dimensionality per-attribute on a dimension rather than on the key. Perhaps I'm misunderstanding something here, but it seems like a very odd way to do this if I'm understanding things correctly. Let's say I have the following data:
Person
ID (Key)
Name
Age
And I have some data like this: [('Tom123', 'Tom', 15), ('Brad456', 'Brad', 16).
Now, why I could select the two users the following two ways:
{Person.Name.Tom, Person.Name.Brad}
Or:
{Person.ID.Tom123, Person.ID.Brad456}
But not the following way:
{Person.Name.Tom, Person.ID.Brad456}
Yet all three use the same 'dimension' and even 'dimensionality' since all three ways uniquely address the same two Person entities!
This seems so odd to me, in that they are both using the same 'dimension' and 'dimensionality' and should be able to use the Key for that dimension rather than thinking each attribute is unique. Why is this so? Or, am I misunderstanding something in this.
If we use this image:
Why would it matter if we address the individual cube (tuple) by doing: {Product.TV, Geography.Asia, Time.Q1} or doing {Product.TV, Geography.Asia, Time.Quarter One}. They're just two ways of doing exactly the same thing but yet MDX considers them different dimensionality (?).

That is an excellect question.
Yet all three use the same 'dimension' and even 'dimensionality' since
all three ways uniquely address the same two Person entities!
This seems so odd to me, in that they are both using the same
'dimension' and 'dimensionality' and should be able to use the Key for
that dimension rather than thinking each attribute is unique. Why is
this so? Or, am I misunderstanding something in this.
So for any language, the syntax is checked before anything else. The case you are refering is the only instance where Dimensionality can be ignored. But it will not be wise for a language to go evaluate the data first(in terms of SSAS and MDX its called AttributeID). Secondly this will be more confusing to a user. building on your example
{Person.Name.Tom, Person.ID.Brad456}
lets we have 100 people in Newyork, so as per your logic the following
the set will be {Person.City.Newyork}, then MDX will replace all 100 IDs. This would prevent MDX from using Aggregates, which is the biggest advantge of a CUBE. Mostly the queries sent to a Cube are targeted towards summarized data. Most queries in MDX would need data of entire NEWYORK rather then a single person in NEWYORK
So to summarize, the compile time will increase and the Aggregates will not be useable.

Related

Aggregation of an annotation in GROUP BY in Django

UPDATE
Thanks to the posted answer, I found a much simpler way to formulate the problem. The original question can be seen in the revision history.
The problem
I am trying to translate an SQL query into Django, but am getting an error that I don't understand.
Here is the Django model I have:
class Title(models.Model):
title_id = models.CharField(primary_key=True, max_length=12)
title = models.CharField(max_length=80)
publisher = models.CharField(max_length=100)
price = models.DecimalField(decimal_places=2, blank=True, null=True)
I have the following data:
publisher title_id price title
--------------------------- ---------- ------- -----------------------------------
New Age Books PS2106 7 Life Without Fear
New Age Books PS2091 10.95 Is Anger the Enemy?
New Age Books BU2075 2.99 You Can Combat Computer Stress!
New Age Books TC7777 14.99 Sushi, Anyone?
Binnet & Hardley MC3021 2.99 The Gourmet Microwave
Binnet & Hardley MC2222 19.99 Silicon Valley Gastronomic Treats
Algodata Infosystems PC1035 22.95 But Is It User Friendly?
Algodata Infosystems BU1032 19.99 The Busy Executive's Database Guide
Algodata Infosystems PC8888 20 Secrets of Silicon Valley
Here is what I want to do: introduce an annotated field dbl_price which is twice the price, then group the resulting queryset by publisher, and for each publisher, compute the total of all dbl_price values for all titles published by that publisher.
The SQL query that does this is as follows:
SELECT SUM(dbl_price) AS total_dbl_price, publisher
FROM (
SELECT price * 2 AS dbl_price, publisher
FROM title
) AS A
GROUP BY publisher
The desired output would be:
publisher tot_dbl_prices
--------------------------- --------------
Algodata Infosystems 125.88
Binnet & Hardley 45.96
New Age Books 71.86
Django query
The query would look like:
Title.objects
.annotate(dbl_price=2*F('price'))
.values('publisher')
.annotate(tot_dbl_prices=Sum('dbl_price'))
but gives an error:
KeyError: 'dbl_price'.
which indicates that it can't find the field dbl_price in the queryset.
The reason for the error
Here is why this error happens: the documentation says
You should also note that average_rating has been explicitly included
in the list of values to be returned. This is required because of the ordering of the values() and annotate() clause.
If the values() clause precedes the annotate() clause, any annotations
will be automatically added to the result set. However, if the
values() clause is applied after the annotate() clause, you need to explicitly include the aggregate column.
So, the dbl_price could not be found in aggregation, because it was created by a prior annotate, but wasn't included in values().
However, I can't include it in values either, because I want to use values (followed by another annotate) as a grouping device, since
If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.
which is the basis of how Django implements SQL GROUP BY. This means that I can't include dbl_price inside values(), because then the grouping will be based on unique combinations of both fields publisher and dbl_price, whereas I need to group by publisher only.
So, the following query, which only differs from the above in that I aggregate over model's price field rather than annotated dbl_price field, actually works:
Title.objects
.annotate(dbl_price=2*F('price'))
.values('publisher')
.annotate(sum_of_prices=Count('price'))
because the price field is in the model rather than being an annotated field, and so we don't need to include it in values to keep it in the queryset.
The question
So, here we have it: I need to include annotated property into values to keep it in the queryset, but I can't do that because values is also used for grouping (which will be wrong with an extra field). The problem essentially is due to the two very different ways that values is used in Django, depending on the context (whether or not values is followed by annotate) - which is (1) value extraction (SQL plain SELECT list) and (2) grouping + aggregation over the groups (SQL GROUP BY) - and in this case these two ways seem to conflict.
My question is: is there any way to solve this problem (without things like falling back to raw sql)?
Please note: the specific example in question can be solved by moving all annotate statements after values, which was noted by several answers. However, I am more interested in solutions (or discussion) which would keep the annotate statement(s) before values(), for three reasons: 1. There are also more complex examples, where the suggested workaround would not work. 2. I can imagine situations, where the annotated queryset has been passed to another function, which actually does GROUP BY, so that the only thing we know is the set of names of annotated fields, and their types. 3. The situation seems to be pretty straightforward, and it would surprise me if this clash of two distinct uses of values() has not been noticed and discussed before.
Update: Since Django 2.1, everything works out of the box. No workarounds needed and the produced query is correct.
This is maybe a bit too late, but I have found the solution (tested with Django 1.11.1).
The problem is, call to .values('publisher'), which is required to provide grouping, removes all annotations, that are not included in .values() fields param.
And we can't include dbl_price to fields param, because it will add another GROUP BY statement.
The solution in to make all aggregation, which requires annotated fields firstly, then call .values() and include that aggregations to fields param(this won't add GROUP BY, because they are aggregations).
Then we should call .annotate() with ANY expression - this will make django add GROUP BY statement to SQL query using the only non-aggregation field in query - publisher.
Title.objects
.annotate(dbl_price=2*F('price'))
.annotate(sum_of_prices=Sum('dbl_price'))
.values('publisher', 'sum_of_prices')
.annotate(titles_count=Count('id'))
The only minus with this approach - if you don't need any other aggregations except that one with annotated field - you would have to include some anyway. Without last call to .annotate() (and it should include at least one expression!), Django will not add GROUP BY to SQL query. One approach to deal with this is just to create a copy of your field:
Title.objects
.annotate(dbl_price=2*F('price'))
.annotate(_sum_of_prices=Sum('dbl_price')) # note the underscore!
.values('publisher', '_sum_of_prices')
.annotate(sum_of_prices=F('_sum_of_prices')
Also, mention, that you should be careful with QuerySet ordering. You'd better call .order_by() either without parameters to clear ordering or with you GROUP BY field. If the resulting query will contain ordering by any other field, the grouping will be wrong.
https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Also, you might want to remove that fake annotation from your output, so call .values() again.
So, final code looks like:
Title.objects
.annotate(dbl_price=2*F('price'))
.annotate(_sum_of_prices=Sum('dbl_price'))
.values('publisher', '_sum_of_prices')
.annotate(sum_of_prices=F('_sum_of_prices'))
.values('publisher', 'sum_of_prices')
.order_by('publisher')
This is expected from the way group_by works in Django. All annotated fields are added in GROUP BY clause. However, I am unable to comment on why it was written this way.
You can get your query to work like this:
Title.objects
.values('publisher')
.annotate(total_dbl_price=Sum(2*F('price'))
which produces following SQL:
SELECT publisher, SUM((2 * price)) AS total_dbl_price
FROM title
GROUP BY publisher
which just happens to work in your case.
I understand this might not be the complete solution you were looking for, but some even complex annotations can also be accommodated in this solution by using CombinedExpressions(I hope!).
Your problem comes from values() follow by annotate(). Order are important.
This is explain in documentation about [order of annotate and values clauses](
https://docs.djangoproject.com/en/1.10/topics/db/aggregation/#order-of-annotate-and-values-clauses)
.values('pub_id') limit the queryset field with pub_id. So you can't annotate on income
The values() method takes optional positional arguments, *fields,
which specify field names to which the SELECT should be limited.
This solution by #alexandr addresses it properly.
https://stackoverflow.com/a/44915227/6323666
What you require is this:
from django.db.models import Sum
Title.objects.values('publisher').annotate(tot_dbl_prices=2*Sum('price'))
Ideally I reversed the scenario here by summing them up first and then doubling it up. You were trying to double it up then sum up. Hope this is fine.

Trouble with DIMENSION_PROPERTIES

I'm doing some exploratory work with tiny test dimensions and cubes. I'm hoping to finally fully understand SSAS's concept of a "dimension attribute", which I find pretty incomprehensible, though I've used SSAS for years without really getting to the bottom of it.
But that's too broad a question. My problem now is this:
I've created an Employee dimension. Key column is EID (int), Name column is Name (varchar). OK, my naming conventions aren't brilliant (but good naming conventions seem to be very difficult in SSAS dimension design, given the fact that all object names are exposed to users).
There is also an attribute called Salary.
What I'm trying to do here is explore the possibility of a "dimension leaf-level property". Not really "dimension attributes" in the sense SSAS uses them, though the only way to implement them is as dimension attributes. What I mean by a "leaf-level property" is a non-aggregatable "attribute", with the same cardinality as the leaf level (or, in any case, to be treated as having the same cardinality). I don't want to aggregate by Salary, for the purposes of this exercise: I just want Salary to be available as a property of each leaf-level member of the dimension. A bit like an extension of the Key Column and Name Column properties.
I'm familiar with this common way of showing member properties:
WITH MEMBER Measures.ESal AS Employee.EID.CurrentMember.Properties("Salary")
SELECT
Measures.ESal ON 0,
Employee.EID.EID.Members
ON 1
FROM $Employee
This works. To prove that I'm only returning members from the leaf level, here's the result set:
(It makes sense that a member property for a non-leaf level might not have a value - and I've read that in this case, MDX simply throw an error rather than showing values for the leaf-level members and nothing for non-leaf).
But trying to use DIMENSION PROPERTIES causes an error:
SELECT
{} ON 0,
Employee.EID.EID.Members
DIMENSION PROPERTIES Employee.Salary
ON 1
FROM $Employee
Result: Query (4, 22) The [Employee].[Salary] dimension attribute was not found.
If I put the hierarchy and level in, though, it works (though SSMS doesn't actually show anything specified in DIMENSION PROPERTIES; as I'd heard already):
SELECT
{} ON 0,
Employee.EID.EID.Members
DIMENSION PROPERTIES Employee.EID.EID.Salary
ON 1
FROM $Employee
Result: just like before, but with no ESal column (obviously).
Two questions:
Why does the Salary attribute have to be addressed as an attribute
of dimension/hierarchy/level Employee.EID.EID? Surely it's an
attribute in the dimension? (I find the way SSAS/SSDT implements
multiple hierarchies deeply confused and confusing)
What is the point of DIMENSION PROPERTIES? Do other clients
actually show what it returns? If so, isn't it then a nightmare to
test this code?

SSAS: Two sets specified in the function have different dimensionality

I'm trying to run the following MDX query (I'm newbie in the matter):
WITH MEMBER [Measures].[Not Null SIGNEDDATA] AS IIF( IsEmpty( [Measures].[SIGNEDDATA] ), 0, [Measures].[SIGNEDDATA] )
SELECT
{[Measures].[Not Null SIGNEDDATA]} ON COLUMNS,
{[Cuenta].[818000_001],[Cuenta].[818000_G02]} ON ROWS
FROM [Notas_SIC]
WHERE ([Auditoria].[AUD_NA],[Concepto].[CONCEPTO_NA],[Entidad].[CCB],
[Indicador].[INDICADOR_NA],[Interco].[I_NONE],[Moneda].[COP],
[Tiempo].[2010.01],[Version].[VERSION_NA])
Where 818000_001 is a base member of my 'Cuenta' dimension, and 818000_G02 is a node or aggregation. I receive the following message:
"Two sets specified in the function have different dimensionality"
What am I doing wrong? If I put the query with only base members (many) or only differents aggregations, the result is ok as expected.
Thanks in advance!
By using commas in your WHERE clause, you are trying to create a set out of members from different dimensions. You should use asterisks to make a CROSSJOIN instead.
So replace this:
WHERE ([Auditoria].[AUD_NA],[Concepto].[CONCEPTO_NA],[Entidad].[CCB],
[Indicador].[INDICADOR_NA],[Interco].[I_NONE],[Moneda].[COP],
[Tiempo].[2010.01],[Version].[VERSION_NA])
With this:
WHERE ({[Auditoria].[AUD_NA],[Concepto].[CONCEPTO_NA],[Entidad].[CCB]} *
{[Indicador].[INDICADOR_NA],[Interco].[I_NONE],[Moneda].[COP]} *
{[Tiempo].[2010.01],[Version].[VERSION_NA]})
I added curly braces though I'm not sure if they're absolutely required.
This is maybe your problem:
{[Cuenta].[818000_001],[Cuenta].[818000_G02]} ON ROWS
You have put them as a set but you can only make a set out of members with the same dimensionality. From your description it sounds like [Cuenta].[818000_G02] is being classed as a different hierarchy which is unexpected.
Is [Cuenta].[818000_G02] created using the Aggregate function? Can you swap to using the Sum function? Does the script then work?
(not a great answer - just more questions?)
{[Cuenta].[818000_001],[Cuenta].[818000_G02]}
Your problem probably lies in the above statement. As is clear, they could be from different hierarchies(which is not clear from your example and I come to that below).
Try replacing the above with
{[Cuenta].[Hierarchy1].[818000_001]} * {[Cuenta].[Hierarchy2].[818000_G02]}
where Hierarchy1 and Hierarchy2 are the hierarchies to which these members belong. (If you are not sure, just try {[Cuenta].[818000_001]} * {[Cuenta].[818000_G02]}). I am assuming that the second aggregate members may be built on a different hierarchy and thus the engine is throwing an error.
It is generally a good habit to always use the fully qualified member name because then the SSAS engine has to spend less time in searching for the member.

MDX newbie: How do I filter/except based on an intermediate level which is not unique

I have a dimension ItemSales with a hierarchy of levels like Site->ItemType->Item
ItemType is not unique. Sites can sell the same ItemType.
I want to exclude a certain type from the totals (such as Unknown)
This must be easy but I'm stuck. It seems like Except would work but as far I can work out, except requires me to enumerate every site
Except([ItemSales].[Sites].[ItemType].Members],{[ItemSales].[site1].[unknown],[ItemSales].[site2].[Unknown]})
this also doesn't help if I just want to aggregate at Sites level.
The examples I see for filter focus on numerical filters for measures. Can you filter on the name of a member or whatever we call the key value it gets from the column?
Sorry for asking such an easy question but I'm not getting any less confused the more I read.
Not sure this is the best way to address your problem, but you can do something with the Filter() function using the .name of the members to keep all [ItemType] that are not 'Unknown' :
Filter(
[ItemSales].[Sites].[ItemType].Members,
[ItemSales].[Sites].currentMember.name <> 'Unknown'
)

MDX WHERE NOT IN() equivalent in many-to-many dimension

I have a many-to-many dimension in my cube (next to other regular dimensions). When I want to exclude fact rows in my row count measure, I usually do something like the following in MDX
SELECT [Measures].[Row Count] on 0
FROM cube
WHERE ([dimension].[attribute].Children - [dimension].[attribute].&[value])
This might seem more complicated than needed in this simple example, but in this case the WHERE can grow sometimes, also including UNIONs.
So this works for regular dimensions, but now I have a many-to-many dimension. If I perform the trick above it does not produce the desired result, namely I want to exclude all rows that have that specific attribute in the many-to-many dimension.
Actually it does exactly what the MDX asks, namely count all rows, but ignore the specified attribute. Since a row in the fact table can have multiple attributes in a many-to-many dimension, the row will still be counted.
That's not what I need, I need it to explicitly exclude rows that have that dimension attribute value. Also, I might exclude multiple values. So what I need is something similar to T-SQL's WHERE .. NOT IN (...)
I realize that I can just subtract the resulting values from [attribute].all and [attribute].&[value], but that won't work any more when UNIONing multiple WHERE statements.
Anybody got a good idea on how to solve this?
Thanks in advance,
Delta
I have not tested this, but I think you could do this if you had an attribute that was at the same level of granularity as the rows (so probably implemented as a fact relationship).
So if you wanted to count the number of orders that did NOT have a product category of bikes (assuming a M2M relationship between OrderID and Category) then something like the following should work. (you can find more info on the EXISTS function in Books Online)
[Orders].[Order ID].[Order ID].Members
- EXISTS([Orders].[Order ID].[Order ID].Members
, [Product].[Category].&[Bikes]
, "Order Facts")
Although it could be quite slow as this sort of query is forcing the SSAS engine to add up a lot of facts from a low level.
Have you tried the EXCEPT command? It's syntax is like the following:
EXCEPT({the set i want}, {a set of members i dont want})
You could use the Filter function:
SELECT [Measures].[Row Count] on 0
FROM [cube]
WHERE Filter([dimension].[attribute].Children, [dimension].CurrentMember.MemberValue <> value)