Last Available value MDX - ssas

I have a requirement where in i am to extract data from a cube, within the SSRS dataset using the query builder ,with the time dimension in the result set, across a range of dates. The conditions are
The measures are to be displayed for each day of the date range.
The sub total row should have the last available measures value for that time range.
There is a time filter (currently a single date filter with a multi select option).
my MDX is as below.
The measure has a 'Sum' as the aggregation type.
I have a calculated measure with the scope defined as below.
SCOPE([MEASURES].[Measure1]);
SCOPE([Date].[Date].MEMBERS);
THIS = TAIL(EXISTING ([Date].[Date].MEMBERS),1).ITEM(0) ;
END SCOPE;
END SCOPE;
This above scope statement works perfectly. however, when i select in more that one date member this query slows WAYYYYYYY down. Performance numbers are
Across 1 date - 4 seconds
Across 2 dates - 22 minutes
Across 3 dates - unknown (in Hours)
This drastic degradation in performance goes away if i remove the scope statement, which makes me thing that there should be a better way to do the same. the final report query is as below.
SELECT
NON EMPTY
{[Measures].[Measure1]} ON COLUMNS
,NON EMPTY
{ [Dimension1].[Dimension1].[Dimension1].ALLMEMBERS*
[Dimension2].[Dimension2].[Dimension2].ALLMEMBERS*
[Dimension3].[Dimension3].[Dimension3].ALLMEMBERS*
[Date].[Date].[Date].ALLMEMBERS
} ON ROWS
FROM (
SELECT {[Date].[Date].&[2014-06-13T00:00:00]
,[Date].[Date].&[2014-06-16T00:00:00] } ON COLUMNS
FROM [Cube]
)
So the question again is, Is there a way to do the last available value part of the scope statement so as to have a better performance? Also, if there is another way to write the final mdx that would help the performance?.
Please let me know if there are anythings unclear regarding the question.
Thanks
Srikanth

The first optimization step would be to change your query to
SELECT
NON EMPTY
{[Measures].[Measure1]} ON COLUMNS
,NON EMPTY
{ [Dimension1].[Dimension1].[Dimension1].ALLMEMBERS*
[Dimension2].[Dimension2].[Dimension2].ALLMEMBERS*
[Dimension3].[Dimension3].[Dimension3].ALLMEMBERS*
{[Date].[Date].&[2014-06-13T00:00:00], [Date].[Date].&[2014-06-16T00:00:00] }
} ON ROWS
FROM [Cube]
Furthermore, I am not sure why you added the SCOPE([Date].[Date].MEMEBER); (probably Date].[Date].MEMBERS, actually). Maybe it helps to omit it and the corresponding END SCOPE.

Related

MDX - Use Aggregate on a CROSSJOINed multi-dimensional set to return a grand total row

I raised a ticket on grand totals when filtering ages ago and I used the proposed solution yesterday to get one dimension working:
WITH
SET [FilteredSet] AS
{
FILTER([CDR].[CDR].Members,([CDR].[CDR].CURRENTMEMBER.MEMBERVALUE="70648") OR ([CDR].[CDR].CURRENTMEMBER.MEMBERVALUE="70646"))
}
MEMBER [CDR].[CDR].[Test] AS Aggregate([FilteredSet])
SELECT NON EMPTY {[Measures].[CS01.TenorTotal] } ON COLUMNS, NON EMPTY {[FilteredSet], [CDR].[CDR].[Test]} ON ROWS FROM [TraderCube] WHERE ([Date].[Date].[2022-04-01],[Epoch].[Epoch].[Branch].[master].[17])
(Other ticket with useful info)
MDX - Grand total missing when filtering
And as I expand the test query (to get it all working and I then expand the C# code to add it) I note that this works:
WITH
SET [FilteredSet] AS
{
FILTER(CROSSJOIN(HIERARCHIZE(DRILLDOWNLEVEL([CDR].[CDR].[ALL].[AllMember]),POST),HIERARCHIZE(DRILLDOWNLEVEL([Book].[Book].[ALL].[AllMember]),POST)),(LEFT([CDR].[CDR].CURRENTMEMBER.MEMBERVALUE,2)="66") )
}
SELECT NON EMPTY {[Measures].[CS01.TenorTotal] } ON COLUMNS, NON EMPTY {[FilteredSet]} ON ROWS FROM [TraderCube] WHERE ([Date].[Date].[2022-04-01],[Epoch].[Epoch].[Branch].[master].[17])
But we will often filter on the 2nd (or 3rd, 4th etc) dimension, like so and the total (AllMember) is filtered out from Book as it does match the INSTR comparisons.
How do I get my Aggregate MEMBER to work properly here? I've placed [?] in position as I can get this sort of setup to work on one dimension (E.g. [CDR] as shown in first code block) but here the query returns [CDR] and then the [Book] members that match and I want to 'glue' my Agregate MEMBER [?] of the measure as a grand total.
WITH
SET [FilteredSet] AS
{
FILTER(CROSSJOIN(HIERARCHIZE(DRILLDOWNLEVEL([CDR].[CDR].[ALL].[AllMember]),POST),HIERARCHIZE(DRILLDOWNLEVEL([Book].[Book].[ALL].[AllMember]),POST)),(LEFT([CDR].[CDR].CURRENTMEMBER.MEMBERVALUE,2)="66")
AND (INSTR([Book].[Book].CURRENTMEMBER.MEMBERVALUE,"72")>0 OR INSTR([Book].[Book].CURRENTMEMBER.MEMBERVALUE,"73")>0))
}
MEMBER [?] AS Aggregate([FilteredSet])
SELECT NON EMPTY {[Measures].[CS01.TenorTotal] } ON COLUMNS, NON EMPTY {[FilteredSet], [?]} ON ROWS FROM [TraderCube] WHERE ([Date].[Date].[2022-04-01],[Epoch].[Epoch].[Branch].[master].[17])
Current MDX error back is:
'The 2 objects do not have the same dimensionality: [[CDR], [Book]] and [[Measures]]'
Thanks to more advanced MDX-ers out there! :)
Leigh Tilley
TilleyTech Ltd
P.S. Sorry no screenshots...client browser trackers would detect it and I'd get in trouble!

MDX Filter based on a Dimension field and Measure field not returning expected result

I'm using SSAS 2019 and have the following MDX query to determine data errors - i.e. in this case where a given score ([Measures].[Score]) is above what the maximum score ([PerformanceLevel].[MaxScore]) is
SELECT NON EMPTY {
[Measures].[FactTestScoreCount]
} ON COLUMNS,
NON EMPTY
{(
[TestEvent].[Key].[Key].Members
)} ON ROWS
FROM [TestScore]
WHERE
(
FILTER(
[PerformanceLevel].[MaxScore].[MaxScore].Members,
[PerformanceLevel].[MaxScore].CurrentMember.MemberValue < [Measures].[Score]
)
)
...you'll note it uses a Measure value to compare against a Dimension value in the filter
However the above query incorrectly returns all rows (it should return zero rows as we have no scores in the current database that are above the maximum score)
I have tried using a HAVING and also tried to add the filter into the FROM however I get the same result. I have also tried converting the data-type to int on both sides of the expression but to no avail. I tried temporarily changing the condition to hard-code (numeric) values and this seems to narrow down the issue as being the [Measures].[Score] in the FILTER - i.e. as far as I can see putting the Measure within the FILTER doesn't seem to be working as expected but I can't work out why
Any ideas would be much appreciated
I didn't fully understand what the first parameter of the Filter function represented - specifically the granularity, otherwise it (depending on your aggregation type will SUM the measure). Also moving the Filter up to the select allows more flexibility when joining/displaying rows - so I came up with the below which works well
SELECT NON EMPTY {
[Measures].[FactTestScoreCount]
} ON COLUMNS,
Filter(
NonEmpty(
{(
[TestEvent].[Key].[Key].members
)}
), [Measures].[Score] > [PerformanceLevel.[MaxScore].CurrentMember.MemberValue
) ON ROWS
FROM [TestScore]

Finding the last 4, 3, 2, 1 months consecutive order drops among clients based on drop variance

Here I have this query that finds out the drop percentage of a bunch of clients based on the orders they have received(i.e. It finds the percentage difference in orders by comparing the current month with the previous month). What I want to achieve here is to have a field where I can see the clients who had 4 months continuous drop, 3 months drop, 2 months drop, and 1 month drop.
I know, it can only be achieved by comparing the last 4 months using the lag function or sub queries. can you guys pls help me out on this one, would appreciate it very much
select
fd.customers2, fd.Month1, fd.year1, fd.variance, case when
(fd.variance < -0.00001 and fd.year1 = '2022.0' and fd.Month1 = '1')
then '1month drop' else fd.customers2 end as 1_most_host_drop
from 
(SELECT
c.*,
sa.customers as customers2,
sum(sa.order) as orders,
date_part(mon, sa.date) as Month1,
date_part(year, sa.date) as year1,
(cast(orders - LAG(orders) OVER(Partition by customers2 ORDER BY
 year1, Month1) as NUMERIC(10,2))/NULLIF(LAG(orders) 
OVER(partition by customers2 ORDER BY year1, Month1) * 1, 0)) AS variance
FROM stats sa join (select distinct
    d.id, d.customers 
     from configer d 
    ) c on sa.customers=c.customers
WHERE sa.date >= '2021-04-1' 
GROUP BY Month1, sa.customers, c.id,  year1, 
     c.customers)fd
In a spirit of friendliness: I think you are a little premature in posting this here as there are several issues with the syntax before even reaching the point where you can solve the problem:
You have at least two places with a comma immediately preceding the word FROM:
...AS variance, FROM stats_archive sa ...
...d.customers, FROM config d...
Recommend you don't use VARIANCE as an alias (it is a system function in PostgreSQL and so is likely also a system function name in Redshift)
Not super important, but there's no need for c.* - just select the columns you will use
DATE_PART requires a string as the first parameter DATE_PART('mon',current_date)
I might be wrong about this, but I suspect you cannot use column aliases in the partition by or order by of a window function. Put the originating expressions there instead:
... OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
LAG has three parameters. (1) The column you want to retrieve the value from, (2) the row offset, where a positive integer indicates how many rows prior to the current row you should retrieve a value from according to the partition and order context and (3) the value the function should return as a default (in case of the first row in the partition). As such, you don't need NULLIF. So, to get the row immediately prior to the current row, or return 0 in case the current row is the first row in the partition:
LAG(orders,1,0) OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
If you use 0 as a default in the calculation of what is currently aliased variance, you will almost certainly run into a div/0 error either now, or worse, when you least expect it in the future. You should protect against that with some CASE logic or better, provide a more appropriate default value or even better, calculate the LAG with the default 0, then filter out the 0 rows before doing the calculation.
You can't use column aliases in the GROUP BY. You must reference each field that is not participating in an aggregate in the group by, whether through direct mention (sa.date) or indirectly in an expression (DATE_PART('mon',sa.date))
Your date should be '2021-04-01'
All in all, without sample data, expected results using the posted sample data and without first removing syntax errors, it is a tall order to attempt to offer advice on the problem which is any more specific than:
Build the source of the calculation as a completely separate query first. Calculate the LAG in that source query. Only when you've run that source query and verified that the LAG is producing the correct result should you then wrap it as a sub-query or CTE (not sure if Redshift supports these, but presumably) at which point you can filter out the rows with a zero as the denominator (the first month of orders for each customer).
Good luck!

MDX Query SUM PROD to do Weighted Average

I'm building a cube in MS BIDS. I need to create a calculated measure that returns the weighted-average of the rank value weighted by the number of searches. I want this value to be calculated at any level, no matter what dimensions have been applied to break-down the data.
I am trying to do something like the following:
I have one measure called [Rank Search Product] which I want to apply at the lowest level possible and then sum all values of it
IIf([Measures].[Searches] IS NOT NULL, [Measures].[Rank] * [Measures].[Searches], NULL)
And then my weighted average measure uses this:
IIf([Measures].[Rank Search Product] IS NOT NULL AND SUM([Measures].[Searches]) <> 0,
SUM([Measures].[Rank Search Product]) / SUM([Measures].[Searches]),
NULL)
I'm totally new to writing MDX queries and so this is all very confusing to me. The calculation should be
([Rank][0]*[Searches][0] + [Rank][1]*[Searches][1] + [Rank][2]*[Searches][2] ...)
/ SUM([searches])
I've also tried to follow what is explained in this link http://sqlblog.com/blogs/mosha/archive/2005/02/13/performance-of-aggregating-data-from-lower-levels-in-mdx.aspx
Currently loading my data into a pivot table in Excel is return #VALUE! for all calculations of my custom measures.
Please halp!
First of all, you would need an intermediate measure, lets say Rank times Searches, in the cube. The most efficient way to implement this would be to calculate it when processing the measure group. You would extend your fact table by a column e. g. in a view or add a named calculation in the data source view. The SQL expression for this column would be something like Searches * Rank. In the cube definition, you would set the aggregation function of this measure to Sum and make it invisible. Then just define your weighted average as
[Measures].[Rank times Searches] / [Measures].[Searches]
or, to avoid irritating results for zero/null values of searches:
IIf([Measures].[Searches] <> 0, [Measures].[Rank times Searches] / [Measures].[Searches], NULL)
Since Analysis Services 2012 SP1, you can abbreviate the latter to
Divide([Measures].[Rank times Searches], [Measures].[Searches], NULL)
Then the MDX engine will apply everything automatically across all dimensions for you.
In the second expression, the <> 0 test includes a <> null test, as in numerical contexts, NULL is evaluated as zero by MDX - in contrast to SQL.
Finally, as I interpret the link you have in your question, you could leave your measure Rank times Searches on SQL/Data Source View level to be anything, maybe just 0 or null, and would then add the following to your calculation script:
({[Measures].[Rank times Searches]}, Leaves()) = [Measures].[Rank] * [Measures].[Searches];
From my point of view, this solution is not as clear as to directly calculate the value as described above. I would also think it could be slower, at least if you use aggregations for some partitions in your cube.

MDX Calculating Time Between Events

I have a Cube which draws its data from 4 fact/dim tables.
FactCaseEvents (EventID,CaseID,TimeID)
DimEvents (EventID, EventName)
DimCases (CaseID,StateID,ClientID)
DimTime (TimeID,FullDate)
Events would be: CaseReceived,CaseOpened,CaseClientContacted,CaseClosed
DimTime holds an entry for every hour.
I would like to write an MDX statement that will get me 2 columns: "CaseRecievedToCaseOpenedOver5" and "CaseClientContactedToCaseClosedOver5"
CaseRecievedToCaseOpenedOver5 would hold the number of cases that had a time difference over 5 hours for the time between CaseReceived and CaseOpened.
I'm guessing that "CaseRecievedToCaseOpenedOver5" and "CaseClientContactedToCaseClosedOver5" would be calculated members, but I need some help figuring out how to create them.
Thanks in advance.
This looks like a good place to use an accumulating snapshot type fact table and calculate the time it takes to move from one stage of the pipeline to the next in the ETL process.
Query for AdventureWorks (DateDiff works in MDX):
WITH
MEMBER Measures.NumDays AS
'iif(ISEMPTY(([Delivery Date].[Date].CurrentMember
,[Ship Date].[Date].CurrentMember
,Measures.[Order Count]))
,null
, Datediff("d",[Ship Date].[Date].CurrentMember.Name
,[Delivery Date].[Date].CurrentMember.Name))'
SELECT
NON EMPTY {[Ship Date].[Date].&[63]
:[Ship Date].[Date].&[92]} ON COLUMNS,
NON EMPTY {[Delivery Date].[Date].&[63]
:[Delivery Date].[Date].&[92]}
* {[Measures].[NumDays]
, [Measures].[Order Count]} ON ROWS
FROM [Adventure Works]
Taken from: http://www.mombu.com/microsoft/sql-server-olap/t-can-i-have-datediff-in-mdx-265763.html
If you'll be using this member a lot, create it as a calculated member in the cube, on the Calculations tab if I remember right.