I have this DAX formula that gives me a count of the IDs that appear in the fact table in a month, averaged over the year. I can put this measure in a table and it is broken down by row with no issues (by adding attributes from the dimensions).
Measure:= AVERAGEX(
SUMMARIZE(
CALCULATETABLE(fact_table;FILTER('Time_Dimension';'Time_Dimension'[Last_month] <> "LAST"));
Time_Dimension[Month Name];
"Count";DISTINCTCOUNT(fact_table[ID])
);
[Count]
)
But it's terribly slow (I have 3 measures like this on a single table) and the fact table is big (about 300 million rows).
I was reading that SUMMARIZE performs really badly with aggregations and that it should be replaced with SUMMARIZECOLUMNS.
I wrote this formula
Measure_v2:= AVERAGEX(
SUMMARIZECOLUMNS(
Time_Dimension[Month Name];
FILTER(Time_Dimension;
Time_Dimension[Month Name]<>"LAST"
);
"Count";DISTINCTCOUNT(fact_table[ID])
);
[Count]
)
And it works when I visualize the measure on its own, but when I try to put it in a context (like the table above) it gives me the error "Can't use SUMMARIZECOLUMN and ADDMISSINGITEMS() in this context".
How can I make a sustainable optimization of the original SUMMARIZE formula?
Before optimizing SUMMARIZE, I would re-visit the overall approach. If your goal is to calculate average fact count per year-month, there is a simpler (and faster) way.
[ID Count]:=CALCULATE(COUNT('fact_table'[ID]),'Time_Dimension'[Last_month] <> "LAST")
[Average ID Count]:=AVERAGEX( VALUES('Time_Dimension'[Year_Month]), [ID Count])
assuming that:
you have a year-month attribute in your time dimension;
IDs in your fact table are unique (and therefore a simple count is enough).
If this solution does not solve your problem, then please post your data model - it's hard to optimize without knowing the data structure.
On a side note, I would remove ID field from the fact table. It adds no value to the model, and consumes huge amounts of memory. Your objective can be achieved by simply counting rows:
[Fact Count]:=CALCULATE(COUNTROWS('fact_table'),'Time_Dimension'[Last_month] <> "LAST")
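To make the intended arithmetic concrete, here is a minimal Python sketch of what the two measures compute together: count the rows in each year-month, skip the months flagged "LAST", then average the monthly counts. The sample rows and the function name are made up for illustration, and it assumes one row per ID, as stated above.

```python
from collections import defaultdict

# Hypothetical fact rows: (id, year_month, last_month_flag).
fact_rows = [
    (1, "2023-01", ""),
    (2, "2023-01", ""),
    (3, "2023-02", ""),
    (4, "2023-02", ""),
    (5, "2023-02", ""),
    (6, "2023-03", "LAST"),  # excluded, mirroring Last_month <> "LAST"
]


def average_monthly_count(rows):
    """Count rows per year-month (skipping flagged months), then average the counts."""
    counts = defaultdict(int)
    for _id, year_month, flag in rows:
        if flag != "LAST":
            counts[year_month] += 1
    # Months without any qualifying rows do not contribute to the average.
    return sum(counts.values()) / len(counts) if counts else None


average_id_count = average_monthly_count(fact_rows)
print(average_id_count)  # 2.5
```

Because every ID appears once, counting rows and counting distinct IDs give the same number here, which is why the ID column is not needed for this measure.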
I'm looking for advice on how to optimize a multi-level DAX SUMMARIZE query. This one is very slow because, I think, it is running in O(n^3) due to the nesting. Unfortunately, I need several levels because the hierarchy levels Order > Order Line > Order Detail need to be calculated differently.
Units need to sum up to the Detail level
That needs to be averaged up to the Line level
That needs to be summed up to the Order level
SUMX(
SUMMARIZE(
'FACT Opportunity'
,Opportunity[LineId]
,"Units"
,AVERAGEX(
SUMMARIZE(
'FACT Opportunity'
,Opportunity[DetailId]
,"SumDetail"
,SUM('FACT Opportunity'[Units])
)
,[SumDetail]
)
)
,[Units]
)
Any help or advice you could provide would be very much appreciated.
It's very hard to give optimization advice without seeing the data and the data model (it would be great if they were included in the question).
The key issue here is that the presence of the duplicates makes fact "Units" non-additive, meaning that you can't simply roll it up the hierarchy. As a result, you are forced to do a very expensive triple-looping.
An obvious solution then is to make "Units" fully additive. You can compute de-duplicated (adjusted for duplicates) Units and store them in fact Opportunity, as a calculated column:
Adjusted Units =
DIVIDE (
'FACT Opportunity'[Units],
CALCULATE ( COUNT ( 'FACT Opportunity'[DetailId] ) )
)
Here, you divide Units by the number of rows that share the current row's DetailId (usually this is 1, but for a duplicated DetailId it is 2, and so on).
This calculated column will increase your data loading time a bit, but save a lot of query time. To further optimize, consider pre-calculating it in a data warehouse.
The adjusted Units are fully additive, so your DAX is now simple:
Total Units = SUM('FACT Opportunity'[Adjusted Units])
It should work correctly on any level of the Order > Line > Detail hierarchy (unless there are additional problems not described in the question), and it should be fast.
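As a rough illustration of why the adjusted column is additive, here is a small Python sketch. The rows and IDs are invented, and it assumes that duplicated detail rows repeat the same Units value; if your duplicates differ, the result will be the average per DetailId instead.

```python
from collections import Counter

# Hypothetical fact rows: (order_id, line_id, detail_id, units).
# Detail "D2" is loaded twice with the same Units value (a duplicate).
rows = [
    ("O1", "L1", "D1", 10.0),
    ("O1", "L1", "D2", 4.0),
    ("O1", "L1", "D2", 4.0),
    ("O1", "L2", "D3", 6.0),
]

# Number of rows that share each DetailId.
rows_per_detail = Counter(detail_id for _, _, detail_id, _ in rows)

# Counterpart of the calculated column: Units / rows sharing the DetailId.
adjusted = [
    (order_id, line_id, detail_id, units / rows_per_detail[detail_id])
    for order_id, line_id, detail_id, units in rows
]


def total_units(adjusted_rows, line_id=None):
    """Plain sum of adjusted units, optionally restricted to one line."""
    return sum(u for _, l, _, u in adjusted_rows if line_id is None or l == line_id)


print(total_units(adjusted))        # 20.0
print(total_units(adjusted, "L1"))  # 14.0
```

A plain sum now counts the duplicated detail only once at every level, which is what lets the measure collapse to a single SUM.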
The problems I have are all of the form 'filter one fact table measure based on the value of a column in the same fact table'.
I have a cube with a measure called Report. ‘Calls’ and ‘Failure’ are columns of this measure. There is a dimension called ‘trial’. I have to write a few new calculations in the SSAS cube.
Sum([measure].[calls] ) only if failure = 1 and value of [Trial categories].[Trail].&[1].
I didn’t get the desired result using a filter. So I created a new column in the fact table, ‘calls_if_failure’ = ‘calls’ or 0 depending on the value of the failure column.
And then in the calculated column I used sum([calls_if_failure], [Trial categories].[Trail].&[1]) . Is this the only way to do this?
Now I have many more requirements of the nature ->
Sum([measure].[calls] ) only if [measure].[visit] = 1
Should I take the same approach as before to arrive at the solution? If yes, then this would mean many more columns in fact table.
Appreciate any help. Thank You.
Struggling to understand the question a little but maybe the following will at least help to give you some ideas:
SUM (
IIF (
[Trial categories].CURRENTMEMBER IS [Trial categories].[Trail].&[1]
, [measure].[calls]
, NULL
)
)
I am editing my question to state exactly what I want.
I have two columns, Actual Units and Future Units, from Fact A and Fact B respectively, both at the same level of granularity. I also have Demand Units from Fact B.
My requirement is :
1. Projected Units = Coalesce(Actual Units,Future Units)
2. Stock Units = IF(Projected Units > Demand Units,Demand Units,Projected Units)
3. Stock Rate = (Stock Units/Demand Units)
I cannot join the two facts at the data source view level and do the calculation there because they are very large tables, so I think the performance would be very slow. If doing the calculations at the data source view level is the only way we have, please let me know.
Did you get this?
When calculating the grand total, MDX sums up A, sums up B, and then compares the two totals.
If you want the calculation to occur at the row level (checking whether B>A) then edit the Data Source View and add a new calculated column to the table your measure group is based upon. The calculated column should be:
CASE WHEN B>A THEN A ELSE B END
Then create a Sum measure based upon that new column.
This approach will perform much better compared to any completely MDX approach to calculating this at a very detailed grain. If your fact tables had 500,000 rows or less and you had a degenerate Dimension which was the same grain as the grain you need to calculate at, we could possibly do it in MDX. But since you are concerned with SQL query performance I am assuming the tables are big. Just remember that SQL is done once at processing time. MDX is calculated in every query at query time. So do expensive things in SQL when you can.
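The difference between comparing per row and comparing the totals is easy to see with a few numbers. This is only a Python sketch with made-up (A, B) pairs, not something you would run against the cube:

```python
# Hypothetical fact rows: (A, B) pairs at the grain where the comparison must happen.
rows = [(5, 8), (7, 3), (4, 4)]

# Row level: apply CASE WHEN B > A THEN A ELSE B END to each row, then sum.
row_level_total = sum(a if b > a else b for a, b in rows)

# Aggregate level: sum each column first, then compare the two totals.
sum_a = sum(a for a, _ in rows)
sum_b = sum(b for _, b in rows)
aggregate_level_total = sum_a if sum_b > sum_a else sum_b

print(row_level_total)        # 12
print(aggregate_level_total)  # 15
```

The calculated column in the Data Source View gives you the first number; a calculated measure that compares the two summed measures gives you the second.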
I'm building a cube in MS BIDS. I need to create a calculated measure that returns the weighted-average of the rank value weighted by the number of searches. I want this value to be calculated at any level, no matter what dimensions have been applied to break-down the data.
I am trying to do something like the following:
I have one measure called [Rank Search Product] which I want to apply at the lowest level possible and then sum all values of it
IIf([Measures].[Searches] IS NOT NULL, [Measures].[Rank] * [Measures].[Searches], NULL)
And then my weighted average measure uses this:
IIf([Measures].[Rank Search Product] IS NOT NULL AND SUM([Measures].[Searches]) <> 0,
SUM([Measures].[Rank Search Product]) / SUM([Measures].[Searches]),
NULL)
I'm totally new to writing MDX queries and so this is all very confusing to me. The calculation should be
([Rank][0]*[Searches][0] + [Rank][1]*[Searches][1] + [Rank][2]*[Searches][2] ...)
/ SUM([searches])
I've also tried to follow what is explained in this link http://sqlblog.com/blogs/mosha/archive/2005/02/13/performance-of-aggregating-data-from-lower-levels-in-mdx.aspx
Currently, loading my data into a pivot table in Excel returns #VALUE! for all calculations of my custom measures.
Please help!
First of all, you would need an intermediate measure, let's say Rank times Searches, in the cube. The most efficient way to implement this would be to calculate it when processing the measure group. You would extend your fact table by a column, e.g. in a view, or add a named calculation in the data source view. The SQL expression for this column would be something like Searches * Rank. In the cube definition, you would set the aggregation function of this measure to Sum and make it invisible. Then just define your weighted average as
[Measures].[Rank times Searches] / [Measures].[Searches]
or, to avoid irritating results for zero/null values of searches:
IIf([Measures].[Searches] <> 0, [Measures].[Rank times Searches] / [Measures].[Searches], NULL)
Since Analysis Services 2012 SP1, you can abbreviate the latter to
Divide([Measures].[Rank times Searches], [Measures].[Searches], NULL)
Then the MDX engine will apply everything automatically across all dimensions for you.
In the second expression, the <> 0 test includes a <> null test, as in numerical contexts, NULL is evaluated as zero by MDX - in contrast to SQL.
Finally, as I interpret the link you have in your question, you could leave your measure Rank times Searches on SQL/Data Source View level to be anything, maybe just 0 or null, and would then add the following to your calculation script:
({[Measures].[Rank times Searches]}, Leaves()) = [Measures].[Rank] * [Measures].[Searches];
From my point of view, this solution is not as clear as to directly calculate the value as described above. I would also think it could be slower, at least if you use aggregations for some partitions in your cube.
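To show why summing Rank times Searches first and dividing afterwards works at any level, here is a short Python sketch of the same arithmetic. The rows, the product names and the function name are invented for illustration:

```python
# Hypothetical fact rows: (product, rank, searches).
rows = [
    ("A", 1, 100),
    ("A", 3, 50),
    ("B", 2, 0),
    ("B", 5, 10),
]


def weighted_avg_rank(subset):
    """Sum rank * searches and searches separately, then divide once."""
    rank_times_searches = sum(rank * searches for _, rank, searches in subset)
    searches = sum(s for _, _, s in subset)
    # Same guard as the IIf / Divide version: no searches means no result.
    return rank_times_searches / searches if searches != 0 else None


only_b = [r for r in rows if r[0] == "B"]

print(weighted_avg_rank(rows))    # 1.875
print(weighted_avg_rank(only_b))  # 5.0
print(weighted_avg_rank([]))      # None
```

Whatever subset of rows the slicing selects, the two sums are taken over that subset and the division happens only once, at the end.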
I have a query to pull clickthrough for a funnel, where if a user hit a page it records as "1", else NULL --
SELECT datestamp
,COUNT(visits) as Visits
,count([QE001]) as firstcount
,count([QE002]) as secondcount
,count([QE004]) as thirdcount
,count([QE006]) as finalcount
,user_type
,user_loc
FROM
dbname.dbo.loggingtable
GROUP BY datestamp, user_type, user_loc
I want to have a column for each ratio, e.g. firstcount/Visits, secondcount/firstcount, etc. as well as a total (finalcount/Visits).
I know this can be done
in an Excel PivotTable by adding a "calculated field"
in SQL by grouping
in PowerPivot by adding a CalculatedColumn, e.g.
=IFERROR(QueryName[finalcount]/QueryName[Visits],0)
But I need to give the report consumer the option of slicing by just user_type or just user_loc, etc., and Excel will tend to ADD the proportions, which won't work because
SUM(A/B) != SUM(A)/SUM(B)
Is there a way in DAX/MDX/PowerPivot to add a calculated column/measure, so that it will be calculated as SUM(finalcount)/SUM(Visits), for any user-defined subset of the data (daterange, user type, location, etc.)?
Yes, via calculated measures. Calculated columns are for creating values that you want to see on rows/columns/report header. Calculated measures are for creating values that you want to see in the values section of a pivot table and that can be sliced and diced by the columns in the model.
The easiest way would be to create 3 calculated "measures" in the calculation area of the powerpivot sheet.
TotalVisits:=SUM(QueryName[visits])
TotalFinalCount:=SUM(QueryName[finalcount])
TotalFinalCount2VisitsRatio:=[TotalFinalCount]/[TotalVisits]
You can then slice the calculated measure [TotalFinalCount2VisitsRatio] by user_type or just user_loc (or whatever) and the value will be calculated correctly. The difference here is that you are explicitly telling the xVelocity engine to SUM-then-DIVIDE. If you create the calculated column, then the engine thinks you want to DIVIDE-then-SUM.
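Here is a tiny Python sketch of the SUM-then-DIVIDE versus DIVIDE-then-SUM difference, with invented numbers for two user types:

```python
# Hypothetical pre-aggregated rows: (user_type, visits, finalcount).
rows = [
    ("new", 100, 25),
    ("returning", 60, 15),
]

# DIVIDE-then-SUM: what summing a calculated column of ratios produces.
sum_of_ratios = sum(finalcount / visits for _, visits, finalcount in rows)

# SUM-then-DIVIDE: what the calculated measure produces.
total_visits = sum(visits for _, visits, _ in rows)
total_final_count = sum(finalcount for _, _, finalcount in rows)
ratio_of_sums = total_final_count / total_visits

print(sum_of_ratios)  # 0.5
print(ratio_of_sums)  # 0.25
```

Each user type converts at 25%, so the overall rate should also be 25%. Adding the per-row ratios reports 50% instead, which is the problem described in the question.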
Also, you don't have to break down the measure into 3 separate measures...it's just good practice. If you're interested in learning more, I'd recommend this book...the author is the PowerPivot/DAX guru and the book is very straightforward.