If I create a set of tuples via the following crossjoin, which hierarchy is joined to which first?
hierA * hierB * hierC
If I know that the counts of non-empty members in A and B are significantly lower than the count of members in C, should this influence the order?
It depends on the engine executing the query, i.e. on how it optimizes a crossjoin on an axis.
In any case, you will get a real boost only if you first reduce the cube space with a subselect, which prevents crossjoining non-matching members:
select
{ [Measures].[X] } on 0,
{ [DimA].[A] * [DimB].[B] } on 1
From (
select { [DimB].[B].members} on 0
from CubeA
)
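To make the point about size concrete, here is a minimal Python sketch (not MDX, and not how any particular engine evaluates a crossjoin). The member names and the set of C members that have data are made up for illustration. It only shows that the size of a crossjoin does not depend on operand order, while restricting one operand first shrinks the result considerably.

```python
from itertools import product

# Hypothetical members: A and B are small, C is large.
hier_a = ["a1", "a2"]
hier_b = ["b1", "b2", "b3"]
hier_c = [f"c{i}" for i in range(1000)]

# The number of tuples is the same whatever the operand order.
size_abc = len(list(product(hier_a, hier_b, hier_c)))
size_cba = len(list(product(hier_c, hier_b, hier_a)))

# Restricting C first plays the role of the subselect.
# Assumption: only these C members have data.
c_with_data = {"c0", "c1", "c2", "c3"}
reduced_c = [c for c in hier_c if c in c_with_data]
size_reduced = len(list(product(hier_a, hier_b, reduced_c)))

print(size_abc, size_cba, size_reduced)
```

Whether the join order affects timing is still up to the engine; the sketch only covers the number of tuples produced.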
I'm completely new to SSAS cubes, their terminology (members, hierarchies, etc.) and MDX queries, but I have started my journey to learn this stuff, so apologies if my question is not very clear.
SELECT NON EMPTY { } ON COLUMNS, {
[Suggestions].[Parent_id].[Parent_id] *--.ALLMEMBERS *
[Suggestions].[id].[id] * --.ALLMEMBERS *
[Suggestions].[Sugg - #].[Sugg - #] *-- .ALLMEMBERS *
[Suggestions].[Sugg - Assigned].[Sugg - Assigned] * --.ALLMEMBERS *
[Suggestions].[Sugg - Assigned to].[Sugg - Assigned to]* --.ALLMEMBERS *
[Suggestions].[Sugg - Status].[Sugg - Status] *--.ALLMEMBERS
--[Parent_Details].[Unit_Name].[Unit_Name] --.ALLMEMBERS
}
DIMENSION PROPERTIES MEMBER_CAPTION,
MEMBER_UNIQUE_NAME ON ROWS
FROM ( SELECT ( { [Suggestions].[Sugg - Assigned to].&[UNIT] } ) ON COLUMNS
FROM ( SELECT ( STRTOSET('SG123', CONSTRAINED) ) ON COLUMNS
FROM ( SELECT ( { [Suggestions].[Sugg - Status].&[Pending Inputt] } ) ON COLUMNS
FROM [BOI_Tracker-Stats])))
CELL PROPERTIES VALUE
I am executing the MDX query above. I generated it with the MDX query designer tool in SSMS and made only simple modifications by hand.
In the query, if I comment out the line [Parent_Details].[Unit_Name].[Unit_Name] --.ALLMEMBERS, I get the correct number of rows.
Main Question.
If I un-comment it so that the Unit_Name column is returned, my rows are duplicated. The original 100 correct rows now become thousands of rows with duplicate values all over. Does anybody know what I should look out for that is causing this? It looks like a wrong join is being applied.
Other things I would like to understand.
1. The query designer generated the query in the format
[Suggestions].[Parent_id].[Parent_id].ALLMEMBERS * . If I comment out .ALLMEMBERS * so that the query is just [Suggestions].[Parent_id].[Parent_id], the results are the same. So what is the use of .ALLMEMBERS * ?
2. I also notice that the column I want to select is repeated twice, as in
[Suggestions].[Parent_id].[Parent_id]. Why is this so? Why can't it just be generated as [Suggestions].[Parent_id]?
If you select from different dimensions like that, you basically multiply the results. If you think about it, that's the correct behavior. In your case you have [Suggestions] and [Parent_Details]. These are different dimensions. Your query asks for results containing both, so it does the following:
For each member of [Suggestions], get all members of [Parent_Details] and add them to the result. So the result set becomes:
[Suggestion-1][Parent_Details-1][Measures...]
[Suggestion-1][Parent_Details-2][Measures...]
[Suggestion-2][Parent_Details-1][Measures...]
[Suggestion-2][Parent_Details-2][Measures...]
[Suggestion-3][Parent_Details-1][Measures...]
etc.
(Using several attributes from the same [Suggestions] dimension doesn't multiply the rows.)
This is correct behavior, because if you add these two dimensions you probably want to know something like "What are the measures for that suggestion and for these parent details?", and that exact row will be correct in the result set. It all depends on what result you want to get (what you ask for).
The repetition of the name depends on your cube design. In [Suggestions].[Parent_id].[Parent_id], the first [Parent_id] is the attribute hierarchy and the second is its level, which happens to share the name. If you create a user-defined hierarchy, for example, it will not look like that.
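The row multiplication described above can be sketched in a few lines of Python. This is an illustration only: the member names and the `facts` set are made up, and the filter stands in for the kind of restriction that NON EMPTY or NonEmpty() evaluated against a measure linking both dimensions is meant to provide. Whether such a measure exists in this cube is an assumption.

```python
from itertools import product

# Hypothetical members of the two dimensions.
suggestions = ["S1", "S2", "S3"]
parent_details = ["UnitA", "UnitB"]

# Hypothetical fact rows: which suggestion actually belongs to which unit.
facts = {("S1", "UnitA"), ("S2", "UnitA"), ("S3", "UnitB")}

# A plain crossjoin pairs every suggestion with every unit.
all_pairs = list(product(suggestions, parent_details))

# Keeping only pairs backed by a fact row removes the "duplicates".
related_pairs = [pair for pair in all_pairs if pair in facts]

print(len(all_pairs), len(related_pairs))
```

With 3 suggestions and 2 units the crossjoin yields 6 rows, while only 3 of them correspond to real data, which mirrors 100 rows growing into thousands.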
Just wondering if you could explain to me why these two MDX statements yield different results, and how I would alter the calculated measure to be correct. I am assuming the calculated measure is using the Sum function incorrectly.
SELECT
{
[Measures].[PPPH]
}
ON COLUMNS,
{
[Matter].[Parent Matter Lawyer - Responsible].&[qnas]
}
ON ROWS
FROM [Expert Hypercube]
WHERE
(
[Date].[Fiscal].[Month].&[201702],
[Relative Period].[Relative Period].&[YTD]
)
versus
with
member measures.[PPPHx] as
sum({{linkmember([Timekeeper].[Person].currentmember,[Matter].[Parent Matter Lawyer - Responsible])}},measures.[PPPH])
select
{
measures.[PPPHx]
} on columns,
{
[Timekeeper].[Person].&[qnas]
} on rows
from [Expert Hypercube]
where
(
[Date].[Fiscal].[Month].&[201702],
[Relative Period].[Relative Period].&[YTD]
)
It's unclear why you use the Sum function at all. The LinkMember function returns a member, so there is no need to wrap it in {...} like a set. You also need to add the All member of [Timekeeper].[Person] to the tuple; otherwise the measure is evaluated in the context of both [Timekeeper].[Person].CurrentMember and LinkMember([Timekeeper].[Person].CurrentMember,[Matter].[Parent Matter Lawyer - Responsible]).
With
Member [Measures].[PPPHx] as
(
LinkMember([Timekeeper].[Person].CurrentMember,[Matter].[Parent Matter Lawyer - Responsible]),
[Timekeeper].[Person].[All],
[Measures].[PPPH]
)
See Richard Lees's article about it.
I have two dimensions, let's say Date Hierarchy and Product, and a measure (Measures.[Max]) which has MAX aggregation.
The requirement is to have the SUM of Measures.[Max] taken at the DAY or HOUR level of Date Hierarchy and summarized at the Month level.
I have the following query:
With
Member Measures.SumOfMax as SUM([Date].[Hierarchy].[Hour].AllMembers, Measures.[Max])
Select
NON Empty
{
Measures.SumOfMax
} ON COLUMNS,
NON EMPTY
{
[Date].[Hierarchy].[Month].AllMembers *
[Product].[Product Name].[Product Name].Allmembers
} Having Measures.[Max] > 0
ON ROWS
FROM [Cube]
The above query runs very slowly. Are there any ways to optimize this?
The problem with this query is that the calculated measure Measures.SumOfMax is evaluated for every cell on the axis, although it yields the same value each time. The SSAS engine is not intelligent enough to recognize that, but since you know about this behavior, you can take advantage of formula engine (FE) caching so that the value is evaluated only once and then served from the FE cache. Read more on it here
With
Member Measures.[_SumOfMax] as SUM([Date].[Hierarchy].[Hour].AllMembers, Measures.[Max])
Member Measures.[SumOfMax] as ([Date].[Hierarchy].[Hour].[All], Measures.[_SumOfMax])
Select
NON Empty
{
Measures.SumOfMax
} ON COLUMNS,
NON EMPTY
{
[Date].[Hierarchy].[Month].AllMembers *
[Product].[Product Name].[Product Name].Allmembers
} Having Measures.[Max] > 0
ON ROWS
FROM [Cube]
Hope this helps.
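As a loose analogy only (this is not how the SSAS formula engine is implemented), the idea of evaluating an expensive, context-independent value once and reusing it for every cell can be sketched in Python with memoization. The hourly values and the cell list are made up for illustration.

```python
from functools import lru_cache

# Hypothetical hourly maxima.
hourly_max = {"h1": 4, "h2": 7, "h3": 2}
evaluations = 0  # counts how often the expensive sum really runs


@lru_cache(maxsize=None)
def sum_of_max():
    """Sum the hourly maxima; the result is cached after the first call."""
    global evaluations
    evaluations += 1
    return sum(hourly_max.values())


# Hypothetical (month, product) cells on the rows axis.
cells = [("2017-01", "P1"), ("2017-02", "P1"), ("2017-03", "P1")]
results = [sum_of_max() for _ in cells]

print(results, evaluations)
```

Every cell gets the same value, but the sum itself is computed a single time; without the cache it would be computed once per cell.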
Right now I'm dealing with a program that can generate and return SQL or MDX queries (depending on the source database of the queries). I'm working on adding a feature that counts all the rows returned by a given query.
Now, I have some small background with SQL, so I was able to parse table names and generate a rowcount. However, MDX is a completely new beast for me.
In SQL, I'm creating:
SELECT
SUM(CNT)
AS TOTAL_ROWS
FROM
(
SELECT COUNT(*) AS CNT FROM TABLE1
UNION ALL
SELECT COUNT(*) AS CNT FROM TABLE2
UNION ALL
SELECT COUNT(*) AS CNT FROM TABLE3
-- ETC...
) AS T
Now, what I'm wondering is, how would I do something similar with MDX? I've done some reading on MDX, and from what I gathered, the basic notation is
[Dimension].[Hierarchy].[Level]
With SQL, I parsed the table names out of a larger generated query and simply inserted them into a new, programmatically generated query. What would I have to grab from a larger MDX query to generate my own row-counting query and send it off to run? A simpler example of the MDX I'm dealing with would be:
WITH
MEMBER [BUSINESS1].[XQE_RS_CM1] AS '([BUSINESS1].[COMPANY_H].[all])', SOLVE_ORDER = 8
MEMBER [BUSINESS2].[XQE_RS_CM0] AS '([BUSINESS2].[all])', SOLVE_ORDER = 4
SELECT
NON EMPTY {[BUSINESS2].[ALL_TIME_H].[CALENDAR_YEAR_L].MEMBERS AS [XQE_SA1] , HEAD({[BUSINESS2].[XQE_RS_CM0]}, COUNT(HEAD([XQE_SA1]), INCLUDEEMPTY))} DIMENSION PROPERTIES PARENT_LEVEL, PARENT_UNIQUE_NAME ON AXIS(0),
NON EMPTY {[BUSINESS1].[COMPANY_H].[COMPANY_CD__L].MEMBERS AS [XQE_SA0] , HEAD({[BUSINESS1].[XQE_RS_CM1]}, COUNT(HEAD([XQE_SA0]), INCLUDEEMPTY))} DIMENSION PROPERTIES PARENT_LEVEL, PARENT_UNIQUE_NAME ON AXIS(1),
NON EMPTY {[Measures].[Measures].[BUSINESS3]} DIMENSION PROPERTIES PARENT_LEVEL, PARENT_UNIQUE_NAME ON AXIS(2)
FROM
[SOURCE] CELL PROPERTIES CELL_ORDINAL, FORMAT_STRING, VALUE
Any insight would be awesome, thanks.
At first glance your script looks reasonable, but after unravelling it, it becomes a bit(!) more complex.
The main difference between this and other scripts is its use of axis(2). Extra dimensions are often used in a sub-select, but this is a little odd, as most clients can't handle 3-dimensional cellsets, so I'm intrigued by what is consuming this info.
Also, the member [BUSINESS1].[XQE_RS_CM1] is a single member, as is [BUSINESS2].[XQE_RS_CM0], so what is the point of the HEAD... sections?
WITH
MEMBER [BUSINESS1].[XQE_RS_CM1] AS
([BUSINESS1].[COMPANY_H].[all]), SOLVE_ORDER = 8
MEMBER [BUSINESS2].[XQE_RS_CM0] AS
([BUSINESS2].[all]), SOLVE_ORDER = 4
SELECT
NON EMPTY
{[BUSINESS2].[ALL_TIME_H].[CALENDAR_YEAR_L].MEMBERS AS [XQE_SA1]
,HEAD(
{[BUSINESS2].[XQE_RS_CM0]},
COUNT(
HEAD([XQE_SA1])
,INCLUDEEMPTY
)
)}
ON AXIS(0),
NON EMPTY
{[BUSINESS1].[COMPANY_H].[COMPANY_CD__L].MEMBERS AS [XQE_SA0]
,HEAD(
{[BUSINESS1].[XQE_RS_CM1]},
COUNT(
HEAD([XQE_SA0])
,INCLUDEEMPTY
)
)}
ON AXIS(1),
NON EMPTY
{
[Measures].[Measures].[BUSINESS3]
}
ON AXIS(2)
FROM
[SOURCE]
Does the following return the same data as the original script?
SELECT
NON EMPTY
{
[BUSINESS2].[ALL_TIME_H].[CALENDAR_YEAR_L].MEMBERS
,[BUSINESS2].[all]
}
ON 0,
NON EMPTY
{
[BUSINESS1].[COMPANY_H].[COMPANY_CD__L].MEMBERS
,[BUSINESS1].[COMPANY_H].[all]
}
ON 1
FROM [SOURCE]
WHERE [Measures].[Measures].[BUSINESS3];
All you need to calculate then is the count of members returned in the following set on the rows:
{
[BUSINESS1].[COMPANY_H].[COMPANY_CD__L].MEMBERS
,[BUSINESS1].[COMPANY_H].[all]
}
I just stumbled over jOOQ's maxDistinct SQL aggregation function.
What does MAX(DISTINCT x) do differently from just MAX(x)?
maxDistinct and minDistinct were defined to stay consistent with the other aggregate functions, where a distinct option actually makes a difference (e.g., countDistinct, sumDistinct).
Since the maximum (or minimum) of the distinct values of a dataset is mathematically equal to the plain maximum (or minimum) of the same dataset, these functions are essentially redundant.
In short, there will be no difference. In the case of MySQL, this is even stated on the manual page:
Returns the maximum value of expr. MAX() may take a string argument;
in such cases, it returns the maximum string value. See Section 8.5.3,
“How MySQL Uses Indexes”. The DISTINCT keyword can be used to find the
maximum of the distinct values of expr, however, this produces the
same result as omitting DISTINCT.
DISTINCT is allowed here to keep compatibility with other platforms. Internally there is no difference: MySQL simply ignores DISTINCT and does not build a distinct set of rows first. For an indexed column, EXPLAIN shows "Select tables optimized away" (one value is read from the index, not from the table); for a non-indexed column it is a full scan.
If I'm not wrong, there is no difference.
For a column:
ID
1
2
2
3
3
4
5
5
The output of both queries is the same: 5
MAX(DISTINCT x)
// ID = 1,2,2,3,3,4,5,5
// DISTINCT = 1,2,3,4,5
// MAX = 5
// 1 row
and for
MAX(x)
// ID = 1,2,2,3,3,4,5,5
// MAX = 5
// 1 row
Theoretically, DISTINCT x removes duplicate values from a set, and the max operator selects the highest value from a set. Removing duplicates cannot change the highest value, so in plain SQL there should be no difference between the two.
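The claim is easy to check. The sketch below is a minimal test that uses Python's built-in sqlite3 module as a stand-in (an assumption: it is neither MySQL nor jOOQ, and the table name `t` is made up) and loads the ID values from the example above. It also includes COUNT and SUM to show the aggregates where DISTINCT does change the result.

```python
import sqlite3

# In-memory database with the sample values from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany(
    "INSERT INTO t (id) VALUES (?)",
    [(v,) for v in (1, 2, 2, 3, 3, 4, 5, 5)],
)

# Compare each aggregate with and without DISTINCT.
row = conn.execute(
    "SELECT MAX(id), MAX(DISTINCT id), "
    "COUNT(id), COUNT(DISTINCT id), "
    "SUM(id), SUM(DISTINCT id) FROM t"
).fetchone()
conn.close()

max_all, max_distinct, count_all, count_distinct, sum_all, sum_distinct = row
print(row)
```

MAX gives 5 either way, whereas COUNT drops from 8 to 5 and SUM from 25 to 15 once duplicates are removed.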