I'm attempting to speed up an existing script.
I suspect I've overcomplicated the extraction of the days relevant to the script: currently I'm using EXISTS across levels of the date dimension, and I'm also creating a custom set:
WITH
  SET [Last24Mths] AS
    Tail
    (
      {[Date].[Calendar].[Month].MEMBERS}
     ,25
    )
  SET [TargetDays] AS
    Union
    (
      Exists({[Date].[Calendar].[Date].MEMBERS}, SubSet([Last24Mths], 24, 1))
     ,Exists({[Date].[Calendar].[Date].MEMBERS}, SubSet([Last24Mths], 23, 1))
     ,Exists({[Date].[Calendar].[Date].MEMBERS}, SubSet([Last24Mths], 22, 1))
     ,Exists({[Date].[Calendar].[Date].MEMBERS}, SubSet([Last24Mths], 0, 1))
     ,Exists({[Date].[Calendar].[Date].MEMBERS}, SubSet([Last24Mths], 12, 1))
    )
SELECT
  {} ON 0
 ,{[TargetDays]} ON 1
FROM [Adventure Works];
In the context of the complete query there is no reason why this set needs to be defined as a named set, so in my 'improved' script I've moved it out of the WITH clause. Also, since we are really using different levels of the same hierarchy, I've switched to DESCENDANTS. Is the following the best I can do in terms of performance? In fact, is it actually more efficient at all?
SELECT
  {} ON 0
 ,DESCENDANTS
  (
    {
      Tail({[Date].[Calendar].[Month].MEMBERS})
     ,Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(1)
     ,Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(2)
     ,Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(12)
     ,Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(24)
    }
   ,[Date].[Calendar].[Date]
  ) ON 1
FROM [Adventure Works];
For calculations that can be done purely within the date dimension, efficiency is normally not critical, as there are just a little more than a thousand members in the whole dimension. The important thing is to avoid needing measure groups etc. for the calculation; once you have achieved that, you can normally stop optimizing.
Having said that, you can optimize your expression slightly: you can combine the first three Tail expressions into one:
{
Tail({[Date].[Calendar].[Month].MEMBERS}),
Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(1),
Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(2),
Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(12),
Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(24)
}
is equivalent to
{
Tail({[Date].[Calendar].[Month].MEMBERS}, 3),
Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(12),
Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(24)
}
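Plugged back into your query, the trimmed version looks like this (the same Adventure Works query as above, just with the first three expressions combined):
SELECT
  {} ON 0
 ,DESCENDANTS
  (
    {
      Tail({[Date].[Calendar].[Month].MEMBERS}, 3)
     ,Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(12)
     ,Tail({[Date].[Calendar].[Month].MEMBERS}).ITEM(0).LAG(24)
    }
   ,[Date].[Calendar].[Date]
  ) ON 1
FROM [Adventure Works];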
And if you need the last month in many queries, it might make sense to create a set on cube level
CREATE SET CURRENTCUBE.[LastMonth] AS Tail({[Date].[Calendar].[Month].MEMBERS});
and then reference it in your calculations:
{
Tail({[Date].[Calendar].[Month].MEMBERS}, 3),
[LastMonth].ITEM(0).LAG(12),
[LastMonth].ITEM(0).LAG(24)
}
This could improve caching, but I am not absolutely sure about that. At the very least it might make your calculations slightly more readable - and developer performance is something you should not forget completely ;-) Sometimes it is still more important than software performance.
This question follows on from the one here.
I've been trying to do a similar count measure, and I applied the suggested solution, but it's still running... it's been more than 30 minutes with no results, whereas without that measure the query runs in under a minute. Am I missing something? Any guidance would help. Here is my query:
WITH
  MEMBER [Measures].[IteractionCount] AS
    NONEMPTY
    (
      FILTER
      (
        (
          [DimInteraction].[InteractionId].[ALL].Children
         ,[Measures].[Impression Count]
        )
       ,[DimInteraction].[Interaction State].&[Enabled]
      )
    ).COUNT
SELECT
  {
    [Measures].[IteractionCount]
   ,[Measures].[Impression Count]
  } ON COLUMNS
 ,(
    (
      [DimCampaign].[CampaignId].[CampaignId].MEMBERS
     ,[DimCampaign].[Campaign Name].[Campaign Name].MEMBERS
     ,[DimCampaign].[Owner].[Owner].MEMBERS
    )
   ,[DimDate].[date].[date].MEMBERS
  ) ON ROWS
FROM
(
  SELECT
    {[DimDate].[date].&[2020-05-06T00:00:00] : [DimDate].[date].&[2020-05-27T00:00:00]} ON COLUMNS
  FROM [Model]
)
WHERE
(
  {[DimCampaign].[EndDate].&[2020-05-27T00:00:00]:NULL}
 ,[DimCampaign].[Campaign State].&[Active]
 ,{[DimInteraction].[End Date].&[2020-05-27T00:00:00]:NULL}
 //,[DimInteraction].[Interaction State].&[Enabled]
)
I don't know if the FILTER is affecting it in any way, but I tried it with and without and it still runs forever. I do need it specifically filtered to [DimInteraction].[Interaction State].&[Enabled]. I have also tried filtering on that member in the WHERE clause instead, but no luck.
Any suggestions to optimize this would be greatly appreciated! Thanks!
UPDATE:
I ended up using this query to load data into a Python dataframe. Here is my code for that. I used this script for connecting and loading the data, though I had to make some edits to it to use Windows authentication.
ssas_api._load_assemblies() # this uses Windows authentication
conn = ssas_api.set_conn_string(server='server name', db_name='db name')
df = ssas_api.get_DAX(connection_string=conn, dax_string=query)
The dax_string parameter is what accepts the DAX or MDX query to pull from the cube.
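For anyone following along, here is a minimal end-to-end sketch; the server and database names are placeholders, ssas_api is the script linked above, and the query can be any DAX or MDX string:
import ssas_api  # the linked helper script, saved locally as ssas_api.py

# Placeholder query; any valid DAX or MDX string works here.
query = """
SELECT {[Measures].[Impression Count]} ON COLUMNS
FROM [Model]
"""

ssas_api._load_assemblies()  # loads the AMO/ADOMD assemblies; uses Windows authentication
conn = ssas_api.set_conn_string(server='my-ssas-server', db_name='MyDatabase')
df = ssas_api.get_DAX(connection_string=conn, dax_string=query)  # returns a pandas DataFrame
print(df.head())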
Please try this optimization:
WITH
  MEMBER [Measures].[IteractionCount] AS
    SUM
    (
      [DimInteraction].[InteractionId].[InteractionId].MEMBERS
      * [DimInteraction].[Interaction State].&[Enabled]
     ,IIF
      (
        IsEmpty([Measures].[Impression Count])
       ,NULL
       ,1
      )
    )
SELECT
  {
    [Measures].[IteractionCount]
   ,[Measures].[Impression Count]
  } ON COLUMNS
 ,(
    (
      [DimCampaign].[CampaignId].[CampaignId].MEMBERS
     ,[DimCampaign].[Campaign Name].[Campaign Name].MEMBERS
     ,[DimCampaign].[Owner].[Owner].MEMBERS
    )
   ,[DimDate].[date].[date].MEMBERS
  )
  PROPERTIES MEMBER_CAPTION
  ON ROWS
FROM
(
  SELECT
    {[DimDate].[date].&[2020-05-06T00:00:00] : [DimDate].[date].&[2020-05-27T00:00:00]} ON COLUMNS
  FROM [Model]
)
WHERE
(
  {[DimCampaign].[EndDate].&[2020-05-27T00:00:00]:NULL}
 ,[DimCampaign].[Campaign State].&[Active]
 ,{[DimInteraction].[End Date].&[2020-05-27T00:00:00]:NULL}
 //,[DimInteraction].[Interaction State].&[Enabled]
)
CELL PROPERTIES VALUE
If that doesn't perform well, then please describe the number of rows returned by your query when you comment out IteractionCount (sic) from the columns axis, and please describe how many unique InteractionId values you have.
I think my problem comes from not understanding the order of operations DAX uses, because it's super confusing to me, but here is my problem:
I have a fairly simple query using SUMMARIZE that basically pulls in a field I want and then calculates several metrics, each of which is filtered within a CALCULATE function. I want to exclude all rows that end up with a null value because no data is available at that level. I'm not at the computer with the code or I would paste exactly what I have, but a simplified version is basically like this:
EVALUATE
FILTER(
    SUMMARIZE(
        'Fact Table',
        FieldTable[Field1],
        "Metric1", CALCULATE(
            [MetricA],
            FILTER('Fact Table', 'Fact Table'[Year] = 2017)
        ),
        "Metric2", CALCULATE(
            [MetricB],
            FILTER('Fact Table', 'Fact Table'[Year] = 2016)
        )
    ),
    NOT(ISBLANK([Metric1]))
)
Hopefully I got all that right. I don't have DAX Studio in front of me to fix my problems, so there might be minor errors, but hopefully you get the gist of it.
The problem is that this returns a blank table. If I take out the FILTER surrounding the SUMMARIZE function, everything works exactly as I want, except that it brings in a ton of blank rows, which is what I'm trying to eliminate with the outer FILTER. Any thoughts on how to do that?
I don't know if this is the best solution, but I figured one out: add 0 to the measure and then change the filter to <> 0.
EVALUATE
FILTER(
    SUMMARIZE(
        'Fact Table',
        FieldTable[Field1],
        "Metric1", CALCULATE(
            [MetricA],
            FILTER('Fact Table', 'Fact Table'[Year] = 2017)
        ) + 0,
        "Metric2", CALCULATE(
            [MetricB],
            FILTER('Fact Table', 'Fact Table'[Year] = 2016)
        )
    ),
    [Metric1] <> 0
)
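This works because DAX coerces blanks in arithmetic and comparisons: BLANK() + 0 yields 0, and that 0 is then excluded by the <> 0 predicate. A quick way to see the coercion rules, runnable in DAX Studio against any model:
EVALUATE
ROW (
    "BlankPlusZero", BLANK() + 0,        -- 0
    "BlankEqualsZero", BLANK() = 0,      -- TRUE
    "BlankNotEqualZero", BLANK() <> 0    -- FALSE
)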
Good day all.
I really hope someone can assist with this. The following code works great in DAX Studio and returns a TOPN table.
EVALUATE
TOPN (
    10,
    SUMMARIZE (
        factDailyPlay,
        factDailyPlay[PlayerAccountNumber],
        "Top10", SUM ( factDailyPlay[ActualWin] )
    ),
    [Top10], 0
)
What I am trying to return in my model, though, is the sum of those top 10 values as a single scalar value from that TOPN table.
I keep getting the following error:
The expression refers to multiple columns. Multiple columns cannot be converted to a scalar value.
Thanks
Try using:
EVALUATE
ROW (
"Total", SUMX (
TOPN (
10,
SUMMARIZE (
factDailyPlay,
factDailyPlay[PlayerAccountNumber],
"Top10", SUM ( factDailyPlay[ActualWin] )
),
[Top10], 0
),
[Top10]
)
)
Basically, the expression below calculates the sum you require:
SUMX (
TOPN (
10,
SUMMARIZE (
factDailyPlay,
factDailyPlay[PlayerAccountNumber],
"Top10", SUM ( factDailyPlay[ActualWin] )
),
[Top10], 0
),
[Top10]
)
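If you want this as a model measure rather than a query, the same expression can be wrapped in a measure definition (a sketch; the measure name is just an example):
Top10ActualWin :=
SUMX (
    TOPN (
        10,
        SUMMARIZE (
            factDailyPlay,
            factDailyPlay[PlayerAccountNumber],
            "Top10", SUM ( factDailyPlay[ActualWin] )
        ),
        [Top10], 0
    ),
    [Top10]
)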
I created a query which calculates the variance on sales and the percentage of that variance. However, my query returns NULL whenever I test against the CostSales measure!
IIF(
    IsEmpty(
        (
            ParallelPeriod(
                [Time].[CalendarSales].[CalendarYear],
                1,
                [Time].[CalendarSales].CurrentMember
            ),
            [Measures].[CostSales]
        )
    )
    OR
    (
        ParallelPeriod(
            [Time].[CalendarSales].[CalendarYear],
            1,
            [Time].[CalendarSales].CurrentMember
        ),
        [Measures].[CostSales]
    ) = 0,
    0,
    [Measures].[ParallelPeriodFactSalesVariance] / [Measures].[ParellelPeriodFactSales]
)
Any idea of what I am doing wrong here?
If you are not getting 0 returned, then I'd assume there is a problem with [Measures].[ParallelPeriodFactSalesVariance]/[Measures].[ParellelPeriodFactSales].
Two diagnostics you can run are:
1. Change the whole measure to this:
IIF(
    IsEmpty(
        (
            ParallelPeriod(
                [Time].[CalendarSales].[CalendarYear],
                1,
                [Time].[CalendarSales].CurrentMember
            ),
            [Measures].[CostSales]
        )
    )
    OR
    (
        ParallelPeriod(
            [Time].[CalendarSales].[CalendarYear],
            1,
            [Time].[CalendarSales].CurrentMember
        ),
        [Measures].[CostSales]
    ) = 0,
    0,
    999
)
I'd imagine 999 is being returned.
2. If it is, then try changing the calculated member to just this:
[Measures].[ParallelPeriodFactSalesVariance]/[Measures].[ParellelPeriodFactSales]
Is NULL now being returned? Then this is the problem - but as Greg's comment suggests, we need to see the code for these two measures. Also, since context is important, we could do with seeing the full script where your code is being used.
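To see where the NULLs come from, you could also run all three measures side by side (a sketch; [YourCube] is a placeholder for your actual cube name):
SELECT
  {
    [Measures].[CostSales]
   ,[Measures].[ParallelPeriodFactSalesVariance]
   ,[Measures].[ParellelPeriodFactSales]
  } ON 0
 ,[Time].[CalendarSales].[CalendarYear].MEMBERS ON 1
FROM [YourCube];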
I'm using Oracle and I want to replace null values with 0, but it does not work. This is my query:
with pivot_data as (
    select NVL(d.annulation, 0) as annulation,
           d.id_cc1_annulation as nature,
           t.mois as mois
    from datamart_cnss d, ref_temps t
    where t.id_temps = d.id_temps
)
select *
from pivot_data
pivot ( sum(annulation)
        for nature in (2 as debit, 1 as redit) )
order by mois asc;
I guess (because of the missing example) that your idea is to show no nulls in the result after pivoting. If so, your query (I guess) doesn't always return both nature values 1 and 2.
The NVL call itself works fine, but you put it in the wrong place. This is the spot that generates your NULLs, because no rows are found for the given criteria:
PIVOT ( sum(annulation)
If you apply NVL to the sum results instead, I am pretty sure it will work as you expect.
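Since Oracle's PIVOT clause only accepts a bare aggregate, one place to apply NVL to the sum results is the outer SELECT (a sketch reusing the aliases from the query above):
with pivot_data as (
    select d.annulation as annulation,
           d.id_cc1_annulation as nature,
           t.mois as mois
    from datamart_cnss d, ref_temps t
    where t.id_temps = d.id_temps
)
select mois,
       NVL(debit, 0) as debit,  -- replaces the NULLs produced by the pivot
       NVL(redit, 0) as redit
from pivot_data
pivot ( sum(annulation)
        for nature in (2 as debit, 1 as redit) )
order by mois asc;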