I am new to MDX and I know that this must be a simple question but I haven't been able to find an answer.
I am modeling a questionnaire that has questions and answers. What I am trying to achieve is to find out the number of people who gave specific answers to questions, e.g. the number of males aged between 20-25.
When I run the query below for the questions individually the correct result is returned
SELECT
[Measures].[Fact Demographics Count] ON Columns
FROM
[Dsv All]
WHERE
[Answer].[Dim Answer].&[1]
[Measures].[Fact Demographics Count] is a count of the primary key column
[Answer].[Dim Answer].&[1] is the key for the Male answer
Result for number of people who are male = 150
Result for number of people who are between 20-25 = 12
But when I run the next query below, rather than getting the number of people who are male and aged between 20-25, I get the sum of the number of people who are male and the number of people who are aged between 20-25.
SELECT
[Measures].[Fact Demographics Count] ON Columns
FROM
[Dsv All]
WHERE
{[Answer].[Dim Answer].&[1],[Answer].[Dim Answer].&[9]}
result = 162
The structure of the fact table is
FactDemographicsKey,
RespodentKey,
QuestionKey,
AnswerKey
Any help would be greatly appreciated
Thanks
Take a look at the MDX function FILTER - this may give you what you need. A combination of FILTER and Member Properties to filter against the ID's might do it. You're having a problem because what you're trying to do is a little against the grain of how an OLAP cube is structured (from my experience) because Age and Gender are both members of the same dimension (Answers), which means that they each get their own cells for aggregation, but unlike if Age and Gender were each on their own dimension, they don't get aggregated with respect to one another except to get added together. In an OLAP cube, each combination of each member of each dimension with each member of every other dimension gets a "cell" with the value of each measure that is unique to that combination - that is what you want, but members of the same dimension (such as Answers) aren't cross-calculated in that way.
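To make the difference concrete, here is a minimal Python sketch (not MDX) of the two calculations. The rows are made-up illustration data, not the asker's cube; only the answer keys 1 (Male) and 9 (aged 20-25) come from the question.

```python
# Hypothetical fact rows: (respondent_key, question_key, answer_key).
# Answer 1 = Male, answer 9 = aged 20-25 (keys taken from the question).
fact_rows = [
    (101, 1, 1), (101, 2, 9),   # male, 20-25
    (102, 1, 1), (102, 2, 8),   # male, some other age band
    (103, 1, 2), (103, 2, 9),   # female, 20-25
]

# Slicing by the set {&[1], &[9]} counts every fact row whose answer is in
# the set, so the two groups are simply added together.
rows_in_set = sum(1 for _, _, answer in fact_rows if answer in {1, 9})

# What the question is after: respondents who gave BOTH answers.
males = {r for r, _, a in fact_rows if a == 1}
aged_20_25 = {r for r, _, a in fact_rows if a == 9}
both = males & aged_20_25

print(rows_in_set)  # 4
print(len(both))    # 1
```

The intersection has to be taken on the respondent, which is why the answers need to end up on separate dimensions (or be related through the respondent) before the cube can do it for you.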
If possible, I would recommend breaking out the individual answers into individual dimensions, i.e. Age and Gender each have their own dimensions with their own members, then what you want to do will naturally flow out of your cube. Otherwise, I'm afraid you will have lots of MDX fiddlery to do. (I am not an MDX expert, though, so I could be completely off base on this one, but that is my understanding.)
Also, definitely read the book previously mentioned, MDX Solutions, unless this is the only MDX query you think you'll need to write. MDX and Multidimensional analysis are nothing like SQL, and a solid understanding of the structure of an OLAP database and MDX in general is absolutely essential, and that book does a very, very nice job of getting you where you need to be in that department.
When trying to figure out problems with where-criteria or slices I find it helpful to break down the items that you're slicing on into dimensions, rather than measures.
select
[Measures].[Fact Demographics Count] on Columns
from [Dsv All]
where
(
[Answer].[Dim Answer].&[1],
[Dim Age Band].[20-25]
)
Although then you're not really using the power of MDX - you're getting just one value.
select
[Dim Answer].Members on Columns,
[Dim Age Band].Members on Rows
from [Dsv All]
where ( [Measures].[Fact Demographics Count] )
Will give you a pivot table (or crosstable) breaking down gender (on columns) by age-bands (on rows).
BTW - if you're learning MDX, this book, MDX Solutions, is far and away the best starting point that I've found.
Firstly thanks to everyone for their replies. This was an interesting one to solve and for anyone new to MDX and coming from SQL it's an easy trap to fall into.
So for those interested here is a brief overview of the solution.
I have 3 tables
factDemographics: holds respondents and their answers (who answered what)
dimAnswer: the answers
dimRespondent: the respondents
In the datasource view for the cube I duplicated factDemographics 5 times using Named Queries and I named these fact1, fact2, ..., fact5. (which will create 5 measure groups)
Using Visual Studio's create cube wizard I set the following fact tables
fact1, fact2, ... as fact tables
dimRespondent as a fact table. I use this table to get the number of respondents.
Removed the original factDemographics table.
Once the cube was created I duplicated the dimAnswer dimension 5 times, naming them filter1, filter2, ...
Finally in the Cube Structure's Dimension Usage tab I linked these together as follows
filter1 many to many dimRespondent
filter2 many to many dimRespondent
filter3 many to many dimRespondent
filter4 many to many dimRespondent
filter5 many to many dimRespondent
filter1 regular relationship fact1
filter2 regular relationship fact2
filter3 regular relationship fact3
filter4 regular relationship fact4
filter5 regular relationship fact5
This now enables me to rewrite the query I used in my original post as
SELECT
[Measures].[Dim Respondent Count] On 0
FROM
[DemographicsCube]
WHERE
(
[Filter1].[Answer].&[Male],
[Filter2].[Answer].&[20-25]
)
My query can now be filtered by up to 5 questions.
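For readers coming from SQL, the duplicated fact tables play roughly the same role as joining the fact table to itself once per filter. Below is a rough sketch of that idea using Python's built-in sqlite3 module. The table contents are invented for illustration, and the answer text is stored directly instead of an answer key to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE factDemographics (RespondentKey INTEGER, AnswerKey TEXT)")
conn.executemany(
    "INSERT INTO factDemographics VALUES (?, ?)",
    [
        (1, "Male"), (1, "20-25"),
        (2, "Male"), (2, "26-30"),
        (3, "Female"), (3, "20-25"),
    ],
)

# One alias of the fact table per filter, tied together by the respondent.
query = """
    SELECT COUNT(DISTINCT f1.RespondentKey)
    FROM factDemographics AS f1
    JOIN factDemographics AS f2 ON f2.RespondentKey = f1.RespondentKey
    WHERE f1.AnswerKey = ? AND f2.AnswerKey = ?
"""
(male_20_25,) = conn.execute(query, ("Male", "20-25")).fetchone()
print(male_20_25)  # 1
```

Each additional filter would need one more alias and join, which mirrors why the cube above needs one fact copy and one filter dimension per question it can be sliced by.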
Although this works I'm sure that there is a more elegant solution. If anyone knows what that is I'd love to hear it.
Thanks
If you are using MSSQL, you can use the "WITH ROLLUP" to get some extra rows which would have the information you want. Also, you are not using a "GROUP BY" which you will need.
Use the GROUP BY to break up the set into groups and then use aggregate functions to get your counts and other stats.
Example:
select AGE, GENDER, count(1)
from MY_TABLE
group by AGE, GENDER
with rollup
This would give you the number of each gender of person in your table in each age group, and the "rollup" would add the total number of people in your table and the numbers in each age group regardless of gender. To also get the numbers of each gender regardless of age, as in the output below, use "with cube" instead of "with rollup". Something like
AGE  GENDER  COUNT
---  ------  -----
20   M        1245
21   M        1012
20   F         942
21   F         838
     M        2257
     F        1780
20            2187
21            1850
              4037
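The subtotal rows are just sums over the detail rows. As a quick check, here is a small Python sketch that rebuilds them from the four detail counts in the example output above (the variable names are mine):

```python
from collections import defaultdict

# The four detail counts from the example output.
detail = {(20, "M"): 1245, (21, "M"): 1012, (20, "F"): 942, (21, "F"): 838}

by_age = defaultdict(int)
by_gender = defaultdict(int)
for (age, gender), n in detail.items():
    by_age[age] += n        # subtotal per age, any gender
    by_gender[gender] += n  # subtotal per gender, any age
grand_total = sum(detail.values())

print(dict(by_age))     # {20: 2187, 21: 1850}
print(dict(by_gender))  # {'M': 2257, 'F': 1780}
print(grand_total)      # 4037
```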
Related
I am trying to come up with some arithmetic calculations for some survey data. I want to do these calculations for a number of segments and want to figure out how to do it without writing numerous SELECT statements.
This is what I have so far:
FACT table. This table holds survey data at a respondent level - for example, if a survey had 10 questions, this table will have 11 columns: a column to identify the respondent_ID and 10 other columns to identify the responses to those questions.
DIMENSION table. This table holds the segments we want to view the survey data by, at a respondent level - for example, if we want to view survey responses by membership_status and age_bracket, this table will have 3 columns: a column to identify the respondent_ID, and two columns to identify membership_status and age_bracket.
OUTPUT.
I want to get aggregate calculations that summarize the responses to the survey overall and to each question. I also want to be able to get this information for all possible segments that exist in the DIMENSION table.
I can do the query below, however I'll need to do this for every segment:
SELECT
COUNT(DISTINCT(CASE WHEN f.QUESTION_1 IN ('8', '9', '10') THEN f.RESPONDENT_ID END))*1.0 / COUNT(DISTINCT(CASE WHEN f.QUESTION_1 IS NOT NULL THEN f.RESPONDENT_ID END))*1.0 AS CSAT_1
FROM FACT f
JOIN DIMENSION d ON f.RESPONDENT_ID = d.RESPONDENT_ID
WHERE d.MEMBERSHIP_STATUS = 'ACTIVE'
The calculation above gives us something called a top 3 box. That is just one calculation; I will need to do many of them. Additionally, every calculation will need to be done for each segment. In order to get the calculation for inactive members, I would need to run another query with d.MEMBERSHIP_STATUS = 'INACTIVE', and I would need to run yet another query with no filter to get the overall calculation.
Is there a way I could store all the arithmetic calculations needed in my output as a function (maybe in a temp table or something)? My thought is that it would be better to define the functions somewhere, and then when I need to calculate the output, I would somehow call them to do all the calculations I need and give me the results for every segment I have.
I can't fully envision how to get there, or if this is even a good solution, so guidance and detailed SQL code would be extremely helpful. Examples please!
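One way to picture the goal is to define the calculation once and apply it to every segment, plus an unfiltered "OVERALL" bucket, in a single pass. The Python sketch below is only an illustration under assumptions: the rows are invented, there is one row per respondent, and scores are integers rather than the strings used in the query above.

```python
from collections import defaultdict

# Hypothetical rows: (respondent_id, membership_status, question_1 score or None).
rows = [
    (1, "ACTIVE", 9), (2, "ACTIVE", 5), (3, "ACTIVE", None),
    (4, "INACTIVE", 10), (5, "INACTIVE", 8), (6, "INACTIVE", 9), (7, "INACTIVE", 2),
]

def top3_box(scores):
    """Share of answered scores that are 8, 9 or 10; None if nobody answered."""
    answered = [s for s in scores if s is not None]
    if not answered:
        return None
    return sum(1 for s in answered if s >= 8) / len(answered)

scores_by_segment = defaultdict(list)
for _, status, score in rows:
    scores_by_segment[status].append(score)
    scores_by_segment["OVERALL"].append(score)  # the unfiltered total

csat_1 = {segment: top3_box(scores) for segment, scores in scores_by_segment.items()}
print(csat_1)
```

In SQL the same idea corresponds to grouping by the segment column instead of filtering on it, so that one statement returns a row per segment.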
I have this DAX formula that gives me a count of IDs that appear in the fact table in a month, averaged over the year. I can put this measure in a table and it's broken down by row with no issues (by adding variables from dimensions).
Measure:= AVERAGEX(
SUMMARIZE(
CALCULATETABLE(fact_table;FILTER('Time_Dimension';'Time_Dimension'[Last_month] <> "LAST"));
Time_Dimension[Month Name];
"Count";DISTINCTCOUNT(fact_table[ID])
);
[Count]
)
But it's terribly slow (I have 3 measures like this on a single table) and the fact table is big (about 300 million rows).
I was reading that SUMMARIZE performs really badly with aggregations and that it should be replaced with SUMMARIZECOLUMNS.
I wrote this formula
Measure_v2:= AVERAGEX(
SUMMARIZECOLUMNS(
Time_Dimension[Month Name];
FILTER(Time_Dimension;
Time_Dimension[Month Name]<>"LAST"
);
"Count";DISTINCTCOUNT(fact_table[ID])
);
[Count]
)
And it works when I visualize the measure on its own, but when I try to put it in a context (like the table above) it gives me the error "Can't use SUMMARIZECOLUMN and ADDMISSINGITEMS() in this context". How can I make a sustainable optimization of the original SUMMARIZE function?
Before optimizing SUMMARIZE, I would re-visit the overall approach. If your goal is to calculate average fact count per year-month, there is a simpler (and faster) way.
[ID Count]:=CALCULATE(COUNT('fact_table'[ID]),'Time_Dimension'[Last_month] <> "LAST")
[Average ID Count]:=AVERAGEX( VALUES('Time_Dimension'[Year_Month]), [ID Count])
assuming that:
you have year-month attribute in your time dimension;
IDs in your fact table are unique (and therefore, a simple count is enough)
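As a plain illustration of what the two measures compute (not of how the DAX engine evaluates them), here is a short Python sketch with invented rows: count the fact rows per year-month, then average those counts.

```python
from collections import Counter

# Hypothetical fact rows, reduced to their Year_Month value.
# IDs are assumed unique, so counting rows equals counting IDs.
fact_months = ["2017-01", "2017-01", "2017-01", "2017-02", "2017-03", "2017-03"]

id_count_by_month = Counter(fact_months)  # plays the role of [ID Count] per month
average_id_count = sum(id_count_by_month.values()) / len(id_count_by_month)

print(average_id_count)  # 2.0
```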
If this solution does not solve your problem, then please post your data model - it's hard to optimize without knowing the data structure.
On a side note, I would remove the ID field from the fact table. It adds no value to the model, and consumes huge amounts of memory. Your objective can be achieved by simply counting rows:
[Fact Count]:=CALCULATE(COUNTROWS('fact_table'),'Time_Dimension'[Last_month] <> "LAST")
I'm relatively new to SQL but have learned some cool stuff. I'm getting results that don't make sense. I've got a query with several subqueries and what-not but I have a windowed function that isn't working like I'm expecting.
The part that isn't working is this (simplified from the 300 line query):
SELECT AVG(table.sales_amount)
OVER (PARTITION BY table.month, table.sales_rep, table.department)
FROM table
The problem is that when I pull the data non-aggregated I get a different value (107) than the query above returns (95).
I've used windowed functions for COUNT and SUM and they work fine, but AVG is acting strangely. Am I missing something about how this works with AVG?
The subquery that table is a standin for looks like:
sales_rep, month, department, sales_amount
1, 2017-1, abc, 125.20
1, 2017-2, abc, 120.00
2, 2017-1, def, 100.00
...etc
Working out of SQL Server Management Studio.
SOLVED: I did finally figure it out. The results I was joining this subquery to had the sales rep multiple times in a month (selling objects A & B), which caused whoever sold both to be counted twice. Whoops, my bad.
The results that you get should be the same values as in:
SELECT AVG(table.sales_amount)
FROM table
GROUP BY table.month, table.sales_rep, table.department;
Of course, the rows will be different. You need to match up the three key columns.
Based on your sample data, it looks like the partitioning keys uniquely define each row. Perhaps you really intend:
SELECT AVG(table.sales_amount) OVER () as overall_average
FROM table;
EDIT:
For the departmental average:
SELECT AVG(table.sales_amount) OVER (partition by table.department) as department_average
FROM table;
After some brute-forcing of potential errors I finally figured out the issue. I was joining that subquery to another one which had multiple instances of a sales_rep in a given month (selling objects a & b), which caused those with sales of both objects to be counted twice in the average instead of once.
So sales rep 1 sold objects a & b, which made his average count as 66% of the dept avg instead of 50%, and sales rep 2 count as only 33%.
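The effect is easy to reproduce outside SQL. The Python sketch below uses invented amounts and mimics an inner join on sales_rep against a table in which rep 1 appears twice:

```python
# Hypothetical data: one sales row per rep, and a second table in which
# rep 1 appears twice (objects a and b).
sales = [(1, 120.0), (2, 90.0)]                # (sales_rep, sales_amount)
objects_sold = [(1, "a"), (1, "b"), (2, "a")]  # (sales_rep, object)

def avg(values):
    return sum(values) / len(values)

plain_avg = avg([amount for _, amount in sales])

# Inner join on sales_rep: rep 1's amount is repeated once per matching row.
joined = [
    amount
    for rep, amount in sales
    for other_rep, _ in objects_sold
    if rep == other_rep
]
joined_avg = avg(joined)

print(plain_avg)   # 105.0
print(joined)      # [120.0, 120.0, 90.0]
print(joined_avg)  # 110.0
```

Rep 1 now carries two of the three rows, which is the 66% versus 33% weighting described above. COUNT and SUM are inflated by the same duplication, so it is worth checking the row counts after the join.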
I have a question related to creating a (more efficient) custom Distinct Count Measure using MDX.
Background
My cube has several long many to many relationship chains between Facts and Dimensions and it is important for me to be able to track which members in certain Dimensions do and do not relate to other Dimensions. As such, I have created a "Not Related" record in each of my dimension tables and set those records' ID values to -1. Then in my intermediate mapping fact tables I use the -1 ID to connect to these "Not Related" records.
The issue arises when I try to run a normal out-of-the-box distinct count on any field where the -1 members are present. In the case that a -1 member exists, the distinct count measure will return a result of 1 more than the true answer.
To solve this issue I have written the following MDX:
CREATE MEMBER CURRENTCUBE.[Measures].[Provider DCount]
AS
//Oddly enough MDX seems to require that the PID (Provider ID) field be different from both the linking field and the user sliceable field.
SUM( [Providers].[PID Used For MDX].Children ,
//Don't count the 'No Related Record' item.
IIF( NOT([Providers].[PID Used For MDX].CURRENTMEMBER IS [Providers].[PID Used For MDX].&[-1])
//For some reason this seems to be necessary to calculate the Unknown Member correctly.
//The "Regular Provider DCount Measure" below is the out-of-the-box, non-MDX measure built off the same field, and is not shown in the final output.
AND [Measures].[Regular Provider DCount Measure] > 0 , 1 , NULL )
),
VISIBLE = 1 , DISPLAY_FOLDER = 'Distinct Count Measures' ;
The Issue
This MDX works and always shows the correct answer (yeah!), but it is EXTREMELY slow when users start pulling Pivot Tables with more than a few hundred cells that use this measure. For fewer than 100 cells, the results are nearly instantaneous. For a few thousand cells (which is not uncommon at all), the results could take up to an hour to resolve (uggghhh!).
Can anyone help show me how to write a more efficient MDX formula to accomplish this task? Your help would be GREATLY appreciated!!
Jon Oakdale
jonoakdale#hotmail.com
Jon
You can use a predefined SCOPE to nullify all unnecessary (-1) members and then create your measure.
SCOPE ([Providers].[PID Used For MDX].&[-1]
,[Measures].[Regular Provider DCount Measure]);
THIS = NULL;
END SCOPE;
CREATE MEMBER CURRENTCUBE.[Measures].[Provider DCount]
AS
SUM([Providers].[PID Used For MDX].Children
,[Measures].[Regular Provider DCount Measure]),
VISIBLE = 1;
By the way, in my tests I used the [Providers].[PID Used For MDX].[All].Children construction, since I don't know what the dimension / hierarchy / ALL-level is in your case. It seems like [PID Used For MDX] is the ALL-level and [Providers] is the name of both the dimension and the hierarchy, and HierarchyUniqueName is set to Hide.
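The intent of both versions, stripped of MDX, is a distinct count that leaves out the placeholder member. A tiny Python sketch of that intent, with invented provider IDs:

```python
NOT_RELATED = -1  # ID of the "Not Related" placeholder record, as in the question

# Hypothetical provider IDs reached from the current cell through the mapping table.
provider_ids = [17, 42, 17, NOT_RELATED, 42, 8]

naive_distinct = len(set(provider_ids))                 # counts the placeholder too
true_distinct = len(set(provider_ids) - {NOT_RELATED})  # placeholder removed first

print(naive_distinct)  # 4
print(true_distinct)   # 3
```

This only illustrates the expected numbers; it says nothing about how either MDX formulation performs in the cube.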
I’m pretty new to the many-to-many dimensions but I have a scenario to solve, which raised a couple of questions that I can’t solve myself… So your help would be highly appreciated!
The scenario is:
There is a parent-child Categories dimension which has a recursive Categories hierarchy with NonLeafDataVisible set
There is a regular Products dimension, that slices the fact table
There is a bridge many-to-many ProductCategory table which defines the relation between the two. Important to note is that a product can belong to any level of the categories hierarchy – i.e. a particular category can have both – directly assigned products and sub-categories.
There is a fact Transactions table that holds a FK to the Product that has been sold, as well as a FK to its category. The FK is needed, because
I have all this modeled in BIDS, the dimension usage is set between each of the dimensions and the facts, and the many-to-many relation between the Categories and the Transactions table is in place. In other words everything seems kind of OK.
I now need to write an MDX which I would use to create a report that shows something like that:
Lev1 Lev2 Lev3 Prod Count
-A
-AA 6
-AA 2
P6 1
P5 1
-AAA 2
P1 1
P2 1
-AAB 2
P3 1
P4 1
+BB
The following MDX almost returns what I need:
SELECT
[Measures].[SALES Count] ON COLUMNS,
NONEMPTYCROSSJOIN(
DESCENDANTS([Category].[PARENTCATEGORY].[Level 01].MEMBERS),
[Product].[Prod KEY].[Prod KEY].MEMBERS,
[Measures].[Measures].[Bridge Distinct Count],
[Measures].[SALES Count],
2) ON ROWS
FROM [Sales]
The problem that I have is that for each of the non-leaf categories, the cross join returns a valid intersection with each of the products that’s been sold for it + all subcategories. Hence the result set contains way too much redundant data and besides I can’t find a way to filter out the redundancies in the SSRS report itself.
Any idea on how to rewrite the MDX so that it only returns the result set above?
Another problem is that if I create a role-playing Category dimension which I set to slice directly the transactions data, then the numbers that I get when browsing the cube are completely off… It seems as SSAS is doing something during processing (but it’s not the SQL statements it shoots to the OLTP, as those remain exactly the same) that causes the problem, but I’ve no idea what. Any ideas?
Cheers,
Alex
I think I found a solution to the problem, using the following query:
WITH
MEMBER [Measures].[Visible] AS
IsLeaf([DIM Eco Res Category].[PARENTCATEGORY].CurrentMember)
MEMBER [Measures].[CurrentProd] AS
IIF
(
[Measures].[Visible]
,[DIM Eco Res Product].[Prod KEY].CurrentMember.Name
,""
)
SELECT
{
[Measures].[Visible]
,[Measures].[CurrentProd]
,[Measures].[FACT PRODSALES Count]
} ON COLUMNS
,NonEmptyCrossJoin
(
Descendants
(
[DIM Eco Res Product].[Prod KEY].[(All)],
,Leaves
)
,Descendants([DIM Eco Res Category].[PARENTCATEGORY].[(All)])
,[Measures].[FACT PRODSALES Count]
,2
)
DIMENSION PROPERTIES
MEMBER_CAPTION
,MEMBER_UNIQUE_NAME
,PARENT_UNIQUE_NAME
,LEVEL_NUMBER
ON ROWS
FROM [Sales];
In the report then I use the [Measures].[CurrentProd] as a source for the product column and that seems to work fine so far.