SSAS Multidimensional - filter a fact on a dimension attribute

I have two fact tables:
F_Budget
F_Actual
They both share the same dimensions, except for D_Version, which applies only to F_Budget.
In D_Version, I have an attribute ReferenceDate, so that each budget version can specify which actual figures were used at the time of budgeting.
Now I need to filter my "Actual" measure so that it looks at the D_Version dimension, takes the ReferenceDate, and then filters F_Actual, like:
F_Actual.Date = D_Version.ReferenceDate (where D_Version.ReferenceDate comes from the F_Budget version)
The idea is to be able to put Budget_Round1 and Actual (with Date = ReferenceDate) side by side.
Any idea how to do that, or is there perhaps a better way?
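For illustration only (this is not from the original post), one way to express that relationship in MDX is LinkMember, which maps a member of one hierarchy onto another hierarchy with matching keys. A rough sketch, assuming hypothetical hierarchy and measure names ([D Version].[Reference Date], [Date].[Date], [Measures].[Actual Amount]):

// Calculated member returning the Actual value at the current version's reference date.
// Assumes the Reference Date attribute and the Date key attribute share the same key values.
CREATE MEMBER CURRENTCUBE.[Measures].[Actual at Reference Date] AS
(
    [Measures].[Actual Amount],
    LinkMember(
        [D Version].[Reference Date].CurrentMember,
        [Date].[Date]
    )
);

Whether this works as written depends on the attribute relationships and on a single version (and hence a single reference date) being in context; a SCOPE assignment or a many-to-many bridge would be alternative designs.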


HANA Studio - Calculation view Calculated column not being aggregated correctly

I am encountering a problem while trying to aggregate (sum) a calculated column that was created in an Aggregation node of another calculation view.
Calculation view: TEST2
Projection 1 (plain projection of another query)
(screenshot: Projection 1)
Aggregation 1: sum of Amount_LC by HKONT and Unique_document_identifier. In the aggregation, a calculated column Clearing_sum is created with the following formula:
(screenshot: Aggregation 1, showing the Clearing_sum formula)
[Question 1] The result of this calculation in the raw data preview makes sense to me, but the result in the Analysis tab seems incorrect. What causes this difference between the Analysis and Raw Data outputs?
(screenshot: Result Raw Data)
(screenshot: Result Analysis)
I thought it might be that, instead of summing up, the Analysis tab re-applies the formula of Clearing_sum, since it is in the same node.
So I tried creating a new calculation view (TEST3) with a projection on TEST2 (all columns included) and ran it to check the output. I still get the same result (correct raw data but incorrect analysis).
(screenshot: TEST3)
(screenshot: Result Analysis, TEST3)
[Question 2] How can I get my desired result? (e.g. the sum of Clearing_sum for the highlighted row should be 2 according to the Raw Data tab). I also tried enabling client-side aggregation on the calculated column, but it did not help.
Without the actual models (and not just screenshots), it is hard to tell what the cause of the problem is.
One possible cause could be that removing HKONT changes the grouping level of the underlying view that computes SUM(Amount_LC); in turn, this affects the calculation of Clearing_sum.
A way to avoid this is to instruct HANA not to strip those unreferenced columns and not to change the grouping level. To do that, the KEEP FLAG needs to be set for the columns that should stay part of the grouping.
For a more detailed explanation of this flag, check the documentation and/or blog posts like Usage of “Keep Flag”.

DeepAR Building Product Categories

I have a problem understanding the DeepAR algorithm.
I am trying to forecast the sales of individual products with it.
First I tried it for one SKU at a daily frequency, but I got the following error message:
ParamValidationError: Parameter validation failed:
Invalid type for parameter Body, value: [datetime
I thought the reason for that error was that I have too many "NaN" values in my targets. Could that be the reason?
(I didn't apply any categories or dynamic_feat values.)
I then tried to make the forecast at a monthly frequency, but then I didn't have enough timestamps for the algorithm.
Would it be possible to group my products within the DeepAR algorithm through the "cat" or "dynamic_feat" fields, so that I would have fewer "NaN" values in my targets?
I would like to group the products by different features like color, price or size. Do you know if that is possible, or do I have to do that before I apply DeepAR?
Thanks in advance :)
It looks like the error is thrown by boto (ParamValidationError). I suspect that you are not using the correct JSON format to send the requests. See an example here.
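The "Invalid type for parameter Body, value: [datetime" fragment suggests a raw Python list containing datetime objects is being passed as the request body. A minimal sketch (the endpoint name is hypothetical) that serializes the body to the DeepAR JSON inference format before calling invoke_endpoint:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Timestamps go in as ISO-formatted strings, not datetime objects,
# and the whole body must be a JSON string, not a Python list.
request = {
    "instances": [
        {"start": "2019-01-01 00:00:00", "target": [12.0, 0.0, 3.0, "NaN", 7.0]}
    ],
    "configuration": {
        "num_samples": 50,
        "output_types": ["mean", "quantiles"],
        "quantiles": ["0.5", "0.9"],
    },
}

response = runtime.invoke_endpoint(
    EndpointName="deepar-demo",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(request),
)
print(response["Body"].read().decode())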
I then tried to make the forecast at a monthly frequency, but then I didn't have enough timestamps for the algorithm.
There is also a weekly frequency, which you could try. However, DeepAR should also be able to handle NaN values.
Would it be possible to group my products within the DeepAR algorithm through the "cat" or "dynamic_feat" fields, so that I would have fewer "NaN" values in my targets?
Generally, cat is used to assign one or more categories to a time series. However, I don't see how that would affect the number of NaN values in your targets. Also, DeepAR does not emit NaNs in its predictions.
I would like to group the products by different features like color, price or size. Do you know if that is possible, or do I have to do that before I apply DeepAR?
Yes, that's what cat is for. The documentation explains how you can encode these category values.
Why do you have so many NaN values? Is it because you don't know the values for those days, or because there were no sales on those days? Before feeding data into any algorithm, you need to handle missing values. If you can replace your NaN values with zeros or use another imputation method, you will have fewer issues.
DeepAR works best when you add more categories ("cat" values) such as brand, color, size, and similar attributes. DeepAR uses these categories to calculate embeddings that encode the "meaning" of each category as it affects the sales values. For example, if you have 10 colors, some of which are "girly" and some "boyish", or some "crazy" and some "solid", the embedding calculation has the potential to capture these attributes and use them to improve the accuracy of the forecast.
Prices are different: they are often elastic (discounts or promotions are applied), so they should be represented as "dynamic_feat", with values supplied for each day/month/other period of the series frequency.
If your prices are static, you can still use them as categories ("cat") by converting them into buckets such as "high"/"medium"/"low". This is a standard approach: analyze the features you have and transform them to match the capabilities and strengths of the algorithm you are going to use. DeepAR, in this case, is good at encoding categories (static, low-cardinality features) and at regressing on numeric features that may correlate with the target.
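As an illustration (the values and feature choices here are made up), one training record in the DeepAR JSON Lines format could carry integer-encoded static features such as color and price bucket in cat, the daily price in dynamic_feat, and missing sales encoded as "NaN" in target:

import json

# One product's series: cat = [color_id, price_bucket_id] (integer-encoded),
# dynamic_feat = the daily price aligned with the target (and extended over
# the prediction horizon at inference time); missing sales encoded as "NaN".
record = {
    "start": "2019-01-01 00:00:00",
    "target": [3.0, 0.0, "NaN", 5.0, 2.0],
    "cat": [2, 1],
    "dynamic_feat": [[19.99, 19.99, 14.99, 14.99, 19.99]],
}

with open("train.json", "a") as f:
    f.write(json.dumps(record) + "\n")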

PowerPivot DAX MAXX Performance Issue

I am building a data model with PowerPivot for Excel 2013 and need to be able to identify the maximum number of emails sent per person. The DAX formula below gives me the result that I am looking for, but performance is incredibly slow. Is there an alternative that will compute a maximum by group without the performance hit?
Maximum Emails per Constituent:
=MAXX(SUMMARIZE('Email Data','Email Data'[person_id],"MAX Value",
([Emails Sent]/[Unique Count People])),[MAX Value])
So, without the measure definitions for [Emails Sent] or [Unique Count People], it is not possible to give definitive advice on performance. I'm going to assume they are trivial measures, though, based on their names - note that this is an assumption and its truth will affect the rest of my post. That being said, there is an obvious optimization to make to start with.
Maximum Emails per Constituent:=
MAXX(
    ADDCOLUMNS(
        VALUES('Email Data'[person_id])
        ,"MAX Value"
        ,[Emails Sent] / [Unique Count People]
    )
    ,[MAX Value]
)
I used ADDCOLUMNS() rather than SUMMARIZE() to calculate the new column. See this post for an explanation of the performance implications.
Additionally, since you're not grouping by multiple columns, there's no need to use SUMMARIZE(). The performance impact of using VALUES() instead should be minimal.
The other question that comes to mind is whether this needs to be a measure. Are you going to be slicing by other dimensions? If not, this becomes a static attribute of each [person_id], which could be calculated during ETL or in a calculated column.
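As a rough sketch of that calculated-column route (assuming [Emails Sent] and [Unique Count People] depend only on 'Email Data'; the column name is made up), the per-person value could be materialized once and a plain MAX taken over it:

Calculated column [Emails per Person] on 'Email Data':
=CALCULATE(
    [Emails Sent] / [Unique Count People],
    ALLEXCEPT('Email Data', 'Email Data'[person_id])
)

Measure:
Maximum Emails per Constituent:=MAX('Email Data'[Emails per Person])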
A final note: I've also been assuming that your model itself is optimal. Again, we'd need to see it to comment on whether something you're doing there could cause performance issues.

OLTP variable required as Dimension and Measure in OLAP?

Scenario
Designing a star schema for an OLAP environment for the Incident Management process. Management requests to be able both to filter on SLA status (breached, achieved or in progress) and to calculate the percentage of SLAs achieved vs. breached. Reporting will be done in Excel/SSRS through SSAS (Tabular).
Question
I’m reasonably inexperienced in designing for an OLAP environment. I know my idea will work, but I’m concerned this is not the best approach.
My idea:
SLA needs to be both a measure and a dimension.
DimSLA
…
(Nullable bool) Sla Achieved -> Yes=True, No=False, and InProgress=NULL
…
FactIncident
…
(Nullable integer) Sla Achieved -> Yes=1, No=0, and In Progress=NULL
…
Then, in SSAS, publish a calculated percentage field that averages FactIncident.SlaAchieved.
Is this the right/advisable way to do it?
As you describe it, "SLA achieved" should be an attribute, as you want to classify by it, not sum it. The only things you would sum or aggregate are other measures (maybe an incident count) under the condition that the "SLA achieved" attribute has certain values like "achieved" or "not achieved". This is the main rule in dimensional design: things you use to classify or break down by are attributes, and things that you calculate are measures. There are a few cases where you need a column for both, but not many.
Do not just use a boolean value. Use a string value easily understood by users, such as "SLA achieved", "SLA not achieved", "in progress". This makes the cube much easier for non-technical users to work with. If you use this in a dimension table, there would be just three records with those strings, and the fact table would reference them with, say, a one-byte foreign key, so the more meaningful texts do not use up millions of bytes.
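For the ratio itself, a hedged DAX sketch for the Tabular model (table, column, and measure names are hypothetical, following the DimSLA/FactIncident naming above) would count incidents by the status attribute rather than averaging a nullable flag:

SLA Achieved %:=
DIVIDE(
    CALCULATE(COUNTROWS(FactIncident), DimSLA[SLA Status] = "SLA achieved"),
    CALCULATE(
        COUNTROWS(FactIncident),
        DimSLA[SLA Status] = "SLA achieved"
            || DimSLA[SLA Status] = "SLA not achieved"
    )
)

Incidents still in progress are excluded from the denominator, which mirrors the NULL handling intended by the nullable integer column.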

Multiplying Quantity * Price in Calculated Member

I know MDX is used for much more sophisticated math, so please forgive the simplistic scenario, but this is one of my first calculated members.
When I multiply Price x Quantity, the AS cube's data browser shows the correct information in the leaf elements, but not in any of the parents. The reason seems to be that I want something like (1 * 2) + (2 * 3) + (4 * 5) and not (7 * 10), which I think I am getting as a result of how the sum is done on the columns.
Is the IsLeaf expression intended to be used in these circumstances? Or is there another way? If so, are there any examples as simple as this that I can see?
The calculated member that I tried to create is just this:
[Measures].[Price]*[Measures].[Quantity]
The result for a particular line item (the leaf) is correct. But the result for, say, all of April is an incredibly high number.
Edit:
I am now considering that this might be an issue with bad data. It would be helpful, though, if someone could confirm that the above calculated member should work under normal circumstances.
Here is a blog post dealing with this particular problem: Aggregating the Result of an MDX Calculation Using Scoped Assignments.
For leaf-level computations whose results are then summed up, MDX is rather complex and slow.
The simplest way to do what you want to achieve would be to make this a normal measure, based on the Price x Quantity calculation defined in the data source view.
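If the calculation has to stay in MDX instead of being moved into the data source view, the scoped-assignment pattern from the linked post looks roughly like the sketch below (names are hypothetical; [Measures].[Sales Amount] is assumed to be a real measure, e.g. bound to a dummy column, so that the leaf-level assignments aggregate up with the measure's normal Sum):

SCOPE([Measures].[Sales Amount]);
    // Assign the product only at the leaf level of the granularity dimension;
    // the regular aggregation then rolls the per-leaf results up to parents.
    SCOPE(Leaves([Date]));
        THIS = [Measures].[Price] * [Measures].[Quantity];
    END SCOPE;
END SCOPE;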