I am editing my question which I want exactly.
I have two columns Actual Units, Future Units from Fact A and Fact B respectively but at same granular level.I also have Demand Units from Fact B
My requirement is :
1. Projected Units = Coalesce(Actual Units,Future Units)
2. Stock Units = IF(Projected Units > Demand Units,Demand Units,Projected
Units)
3. Stock Rate = (Stock Units/Demand Units)
I cannot join the two facts in the data source view level and do the
calculation there because they are a very huge tables, so I think the
performance would be very slow. If you say that doing the calculations at
the data source view level level is the only way we have, please let me
know.
Did you get this?
When calculating the grand total MDX is summing up A, summing up B, and then comparing them.
If you want the calculation to occur at the row level (checking whether B>A) then edit the Data Source View and add a new calculated column to the table your measure group is based upon. The calculated column should be:
CASE WHEN B>A THEN A ELSE B END
Then create a Sum measure based upon that new column.
This approach will perform much better compared to any completely MDX approach to calculating this at a very detailed grain. If your fact tables had 500,000 rows or less and you had a degenerate Dimension which was the same grain as the grain you need to calculate at, we could possibly do it in MDX. But since you are concerned with SQL query performance I am assuming the tables are big. Just remember that SQL is done once at processing time. MDX is calculated in every query at query time. So do expensive things in SQL when you can.
Related
I am trying to come up with some arithmetic calculations for some survey data. I want to do these calculations for a number of segments and want to figure out how to do it without writing numerous SELECT statements.
This is what I have so far:
FACT table. This tables holds survey data at a respondent level - for example, if a survey had 10 questions, this table will have 11 columns: a column to identify the respondent_ID and 10 other columns to identify the responses to those questions.
DIMENSION table. This table segments we want to view the survey data by at a respondent level - for example, if we want to view survey responses by membership_status and age_bracket, this table will have 3 columns: a column to identify the respondent_ID, and two columns to identify membership_status and age_bracket.
OUTPUT.
I want to get aggregate calculations to summarizes the responses to the survey overall and to each question. I also want to be able to get this information for all possible segments that exist in the DIMENSIONS table.
I can do the query below, however I'll need to do this for every segment:
SELECT
COUNT(DISTINCT(CASE WHEN f.QUESTION_1 IN ('8', '9', '10') THEN f.RESPONDENT_ID END))*1.0 / COUNT(DISTINCT(CASE WHEN f.QUESTION_1 IS NOT NULL THEN f.RESPONDENT_ID END))*1.0 AS CSAT_1
FROM FACT f
JOIN DIMENSION d ON f.RESPONDENT_ID = d.RESPONDENT_ID
WHERE d.MEMBERSHIP_STATUS = 'ACTIVE'
The calculation above gives us something called a top 3 box. That is just one calculation, I will need to do many of them. Additionally, ever calculation will need to be done for each segment. In order to get a calculation for nonactive members, I would need to run another query and set d.MEMBERSHIP_STATUS = 'INACTIVE' and I would need to run another query with no filter, to get the overall calculation.
Is there a way I could store all my arithmetic calculations needed in my output as a function (maybe in a temp table or something) - my thought is that it'll be better to set the functions somewhere, and then when I need to calculate the output, I would some how call the function to do all the calculations I need, and give me all the calculations for every segment I have?
I can't fully envision how to get there, or if this is even a good solution, so guidance and detailed SQL code would be extremely helpful.Examples please!
I'm looking for advice on how to optimize a multi-level DAX summarize query. This one is very slow because, I think, it is running O(n^3) because of the nesting. Unfortunately, i need to have several levels because the hierarchy levels Order > Order Line > Order Detail need to be calculated differently.
Units need to sum up to the Detail level
That needs to be averaged up to the Line level
That needs to be summed up to the Order level
SUMX(
SUMMARIZE(
'FACT Opportunity'
,Opportunity[LineId]
,"Units"
,AVERAGEX(
SUMMARIZE(
'FACT Opportunity'
,Opportunity[DetailId]
,"SumDetail"
,SUM('FACT Opportunity'[Units])
)
,[SumDetail]
)
)
,[Units]
)
Any help or advice you could provide would be very much appreciated.
It's very hard to provide an optimisation advise without seeing the data and data model (it'd be great if they were included in the question).
The key issue here is that the presence of the duplicates makes fact "Units" non-additive, meaning that you can't simply roll it up the hierarchy. As a result, you are forced to do a very expensive triple-looping.
An obvious solution then is to make "Units" fully additive. You can compute de-duplicated (adjusted for duplicates) Units and store them in fact Opportunity, as a calculated column:
Adjusted Units =
DIVIDE (
'FACT Opportunity'[Units],
CALCULATE ( COUNT ( 'FACT Opportunity'[DetailId] ) )
)
Here, you divide Units by the number of unique DetailIDs (usually, it will be 1, but in case of duplicate DetailIDs it will be 2, etc).
This calculated column will increase your data loading time a bit, but save a lot of query time. To further optimize, consider pre-calculating it in a data warehouse.
The adjusted Units are fully additive, so you dax is now simple:
Total Units = SUM('FACT Opportunity'[Adjusted Units])
It should work correctly on any level of the Order > Line > Detail hierarchy (unless there are additional problems not described in the question), and it should be fast.
I have this DAX formula that gives me a count of id that appear on the fact table in a month, averaged over the year. I can put this measure is a table ad it's unpacked by row with no issues (by adding variables from dimensions)
Measure:= AVERAGEX(
SUMMARIZE(
CALCULATETABLE(fact_table;FILTER('Time_Dimension';'Time_Dimension'[Last_month] <> "LAST"));
Time_Dimension[Month Name];
"Count";DISTINCTCOUNT(fact_table[ID])
);
[Count]
)
But it's terrible slow (I have 3 measures like this on a single table) and the fact table is big (like 300Million rows big)
I was reading that SUMMARIZE perform really bad with aggregations and It should be replaced with SUMMARIZECOLUMNS.
I wrote this formula
Measure_v2:= AVERAGEX(
SUMMARIZECOLUMNS(
Time_Dimension[Month Name];
FILTER(Time_Dimension;
Time_Dimension[Month Name]<>"LAST"
);
"Count";DISTINCTCOUNT(fact_table[ID])
)
[Count]
)
And it works when I visualize the measure as it is, but when I try to put it in a context (like the table above) it gives me the error "Can't use SUMMARIZECOLUMN and ADDMISSINGITEMS() in this context" How can I make a sustainable optimization from the original SUMMARIZE function?
Before optimizing SUMMARIZE, I would re-visit the overall approach. If your goal is to calculate average fact count per year-month, there is a simpler (and faster) way.
[ID Count]:=CALCULATE(COUNT('fact_table'[ID]),'Time_Dimension'[Last_month] <> "LAST")
[Average ID Count]:=AVERAGEX( VALUES('Time_Dimension'[Year_Month]), [ID Count`])
assuming that:
you have year-month attribute in your time dimension;
IDs in your fact table are unique (and therefore, simple count is
enough)
If this solution does not solve your problem, then please post your data model - it's hard to optimize without knowing the data structure.
On a side note, I would remove ID field from the fact table. It adds no value to the model, and consumes huge amounts of memory. Your objective can be achieved by simply counting rows:
[Fact Count]:=CALCULATE(COUNTROWS('fact_table'),'Time_Dimension'[Last_month] <> "LAST")
I need to create a weighted average that multiplies a column of volume manufactured for multiple manufacturing plants by a column containing the cost to manufacture at each plant, and returns one weighted average value for a specific product type for all plants.
I've tried adding this as a calculated column using:
=sumx('Plant','Plant'[Cost]*'Plant'[Tonnage])/sum('Plant'[Tonnage])
But this goes row by row, so it doesn't give me the full over riding average that I need for the company. I can aggregate the data, but really want to see the average lined up against individual plant for benchmarking
Any ideas how I can do this?
You can do this in multiple ways. You can either make a single more complex calculation, or you can make a few calculated columns to make the final calculation more transparent. I will pick the latter approach here, because it is more easy to show what is going on. I'm going to use the following DAX functions: CALCULATE, SUM, and ALLEXCEPT.
First, create three new calculated columns.
The first one should contain the [Volume] times [Cost] for each record:
VolumeTimesCost:=[Volume] * [Cost]
The second one should contain the sum of [VolumeTimesCost] for all plants within a given product type. It could look like this:
TotalProductTypeCost:=CALCULATE(SUM([VolumeTimesCost]),ALLEXCEPT([Product Type]))
Using the ALLEXCEPT([Product Type]) removes the filter from all other columns than the [Product Type] column.
The third calculated column should contain the SUM of [Volume] for all plants within a given product type. It could look like this:
TotalProductTypeVolume:=CALCULATE(SUM([Volume]),ALLEXCEPT([Product Type]))
You can then create your measure based on the two calculated columns [TotalProductTypeCost] and [TotalProductTypeVolume].
I hope that helps you solve the issue correctly. Otherwise feel free to let me know!
I'm building a cube in MS BIDS. I need to create a calculated measure that returns the weighted-average of the rank value weighted by the number of searches. I want this value to be calculated at any level, no matter what dimensions have been applied to break-down the data.
I am trying to do something like the following:
I have one measure called [Rank Search Product] which I want to apply at the lowest level possible and then sum all values of it
IIf([Measures].[Searches] IS NOT NULL, [Measures].[Rank] * [Measures].[Searches], NULL)
And then my weighted average measure uses this:
IIf([Measures].[Rank Search Product] IS NOT NULL AND SUM([Measures].[Searches]) <> 0,
SUM([Measures].[Rank Search Product]) / SUM([Measures].[Searches]),
NULL)
I'm totally new to writing MDX queries and so this is all very confusing to me. The calculation should be
([Rank][0]*[Searches][0] + [Rank][1]*[Searches][1] + [Rank][2]*[Searches][2] ...)
/ SUM([searches])
I've also tried to follow what is explained in this link http://sqlblog.com/blogs/mosha/archive/2005/02/13/performance-of-aggregating-data-from-lower-levels-in-mdx.aspx
Currently loading my data into a pivot table in Excel is return #VALUE! for all calculations of my custom measures.
Please halp!
First of all, you would need an intermediate measure, lets say Rank times Searches, in the cube. The most efficient way to implement this would be to calculate it when processing the measure group. You would extend your fact table by a column e. g. in a view or add a named calculation in the data source view. The SQL expression for this column would be something like Searches * Rank. In the cube definition, you would set the aggregation function of this measure to Sum and make it invisible. Then just define your weighted average as
[Measures].[Rank times Searches] / [Measures].[Searches]
or, to avoid irritating results for zero/null values of searches:
IIf([Measures].[Searches] <> 0, [Measures].[Rank times Searches] / [Measures].[Searches], NULL)
Since Analysis Services 2012 SP1, you can abbreviate the latter to
Divide([Measures].[Rank times Searches], [Measures].[Searches], NULL)
Then the MDX engine will apply everything automatically across all dimensions for you.
In the second expression, the <> 0 test includes a <> null test, as in numerical contexts, NULL is evaluated as zero by MDX - in contrast to SQL.
Finally, as I interpret the link you have in your question, you could leave your measure Rank times Searches on SQL/Data Source View level to be anything, maybe just 0 or null, and would then add the following to your calculation script:
({[Measures].[Rank times Searches]}, Leaves()) = [Measures].[Rank] * [Measures].[Searches];
From my point of view, this solution is not as clear as to directly calculate the value as described above. I would also think it could be slower, at least if you use aggregations for some partitions in your cube.