I'm totally new to MDX (My background is T-SQL). I have a fact table / cube with basically 2 columns
[Start Date], [Finish Date]
20140101, -1
20140101, 20140401
20140301, 20140501
...
I have a simple measure counting the records based on those columns
Counting how many people started (or finished) in some date is a simple task, but I want to know the cumulative total in a period.
Looking at the example above:
Month/Year, Started, Finished, CumulativeTotal
01/2014, 2, 0, 2
02/2014, 0, 0, 2
03/2014, 1, 0, 3
04/2014, 0, 1, 2
05/2014, 0, 1, 1
The "-1" is a Foreign Key to a "Not Defined" value in my date dimensions. This means the record is not finished yet, therefore, it must appear in every month after the started date.
How can I achieve this? I just can't find any reference for it. Or I just don't know what to look for (probably true).
Not to avoid answering your question, but this isn't a good design for a fact table to be used to populate a cube. You want to create a star schema, so a better design would be something like this:
PersonId Month Count
Where PersonId is a foreign key to your People table,
Month is a DateTime that is a foreign key to a Time dimension table,
Count is either a 1 or 0 depending on if that person should be counted in that month or not.
To get the results you want from the table you have, you're better off using straight SQL. This isn't the kind of thing that Cubes and MDX are meant to handle.
Related
Would much appreciate any help on this.
I have a measure called "Sales" populated with values, however i am trying to turn the "Sales" value to 0, whenever the "Sales Flag" is set to 0.
Important Note: The Sales Flag is based on Date (lowest level of detail).
The difficulty that i am really experiencing and cant get a grip on, is how i am trying the display the MDX outcome.
As explained above, i would want to make the "Sales" value 0 whenever we have a 0 in the "Sales Flag" (which is based on the Date), but when I run the MDX Script I would wan't on the ROWS to NOT display the Date, but instead just the Week (higher Level to Date), as below shows:
I really have spent hours on this and can't seem to understand how we can create this needed custom Sales measure based on the Sales Flag on the date level, but having the MDX outcome display ROWS on Week level.
Laz
You need to define the member in the MDX before the select. Something like that:
WITH MEMBER [Measures].[Fixed Sales] as IIF([Sales Flag].currentMember=1,[Sales], 0)
SELECT [Measures].[Fixed Sales] on 0, [Sales Flag] on 1 from [Cube]
I am writing the code without SSAS here so it might not be the 100% correct syntax but you can get the general idea ;)
You can add the iif in the SELECT part but I find creating member to be the cleaner solution.
SELECT IIF([Sales Flag].currentMember=1,[Sales], 0) on 0, [Sales Flag] on 1 from [Cube]
If you have a control over the cube in SSAS you can create a calculated member there and you can access it easier.
Glad to hear if Veselin's answer works for you, but if not...
Several approaches are also possible.
Use Measure expression for Sales measure:
Use SCOPE command for Day level (if it's Key level of Date dimension). If it's not a key level you have to aggregate on EVERY level (week, year etc) to emulate AggregateFunction of Sales measure but with updated behavior for one flag:
SCOPE([Date].[Your Date Hierarchy].[Day].members,[Measures].[Sales]);
THIS=IIF([Sales Flag].CurrentMember = 1,[Measures].[Sales],0);
END SCOPE;
Update logic in DSV to multiply Sales column by SalesFlag. This is the easiest way from T-SQL perspective.
Data: I have a single row that represents an annual subscription to a product, it has an overall startDate and endDate, there is also third date which is startdate + 1 month called endDateNew. I also have a non-related date table (called table X).
Output I'm looking for: I need a new column called Categorisation that will return 'New' if the date selected in table X is between startDate and endDateNew and 'Existing' if the date is between startDate and endDate.
Problem: The column seems to evaluate immediately without taking in to account the date context from the non-related date table - I kinda expected this to happen in visual studio (where it assumes the context is all records?) but when previewing in Excel it carries through this same value through.
The bit that is working:I have an aggregate (an active subscriber count) that correctly counts the subscription as active over the months selected in Table X.
The SQL equivalent on an individual date:
case
when '2015-10-01' between startDate and endDateNew then 'New'
when '2015-10-01' < endDate then 'Existing'
end as Category
where the value would be calculated for each date in table X
Thanks!
Ross
Calculated columns are only evaluated at model refresh/process time. This is by design. There is no way to make a calculated column change based on run-time changes in filter context from a pivot table.
Ross,
Calculated columns work differently than Excel. Optimally the value is known when the record is first added to the model.
Your example is kinda similar to a slowly changing dimension .
There are several possible solutions. Here are two and a half:
Full process on the last 32 days of data every time you process the subscriptions table (which may be unacceptably inefficient).
OR
Create a new table 'Subscription scd' with the primary key from the subscriptions table and your single calculated column of 'Subscription Age in Days'. Like an outrigger. This table could be reprocessed more efficiently than reprocessing the subscriptions table, so process the subscriptions table as incrementals only and do a full process on this table for the data within the last 32 days instead.
OR
Decide which measures are interesting within the 'new/existing' context and write explicit measures for them using a dynamic filter on the date column in the measures
eg. Define
'Sum of Sales - New Subscriptions',
'Sum of Sales - Existing Subscriptions',
'Distinct Count of New Subscriptions - Last 28 Days', etc
I want to model a fact table for our users to help us calculate DAU (Daily active Users), WAU (Weekly active users) and MAU (Monthly active users).
The definitions of these measures are as follows:
1. DAU are users who is active every day during last 28 days.
2. WAU are users who are active at least on one day in each 7 days period during last 28 days
3. MAU are users who are active at least 20 days during last 28 days
I have built a SSAS cube with my fact table and user dimension table as follows
Fact : { date, user_id, activity_name}
Dimension: { date, user_id, gender, age, country }
Now I want to build a cube over this data so that we can see all the measures in any given day for last 28 days.
I thought of initially storing 28 days of data for all users in the SQL server and then do count distinct on date to see which measures they fall into.. but this proved very expensive since the data per day is huge..almost 10 millions rows.
So my next thought was to model the fact table (before moving it to SQL) such that it has a new column called "active_status" which is a 32 bit binary type column.
Basically, I'll store a binary number (or decimal equivalent) like 11000001101111011111111111111 which has a bit set on the days the user is active and off on the days user is not active.
This way I can compress 28 days worth of data in a single day before loading into data mart
Now the problem is , I think MDX doesn't support bitwise operations on columns in the expressions for calculated members like regular SQL does. I was hoping to create calculated measures daily_active_users, weekly_active_users and monthly_active_users using MDX that looks at this active_status bit for the user and does bitwise operation to determine the status.
Any suggestions on how to solve this problem? if MDX doesn't allow bitwise, what else can I do SSAS to achieve this.
thanks for the help
Additonal notes:
#Frank
Interesting thought about using a view to do the conversion from bitset to a dimension category..but I'm afraid it won't work. Because I have few dimensions connected to this fact table that have many-many relationships..for ex: I have a dimension called DimLanguage and another dimension called DimCountry and they have many-many relationship. And what ultimately I would like to do in the cube is to calculate the DAU/WAU/MAU which are COUNT(DISTINCT UserId) based on the combination of dimensions. So for ex; If a user is not MAU for dimension country US because he is only active 15 days out of 28 ....but he will be considered
You do not want to show the bitmap data to the users of the cube, but just the categories DAU, WAU, MAU, you should do the conversion from bitmap to category on data loading time. Just create a dimension table containing e. g. the following data:
id category
-- --------
1 DAU
2 WAU
3 MAU
Then define a view on your fact table that evaluates the bitmap data, and for each user and each date just calculates the id value of the category the user is in. This is then conceptually a foreign key to the dimension table. Use this view instead of the fact table in your cube.
All the bitmap evaluations are thus done on the relational side, where you have the bit operators available.
EDIT
As your requirement is that you need to aggregate the bitmap data in Analysis Services using bitwise OR as the aggregation method, I see no simple way to do that.
What you could do, however, would be to have 28 single columns, say Day1 to Day28, which would be either 0 or 1. These could be of type byte to save some space. You would use Maximum as aggregation method, which is equivalent to binary OR on a single bit.
Then, it would not be really complex to calculate the final measure, as we know the values are either zero or one, and thus we can just sum across the days:
CASE
WHEN Measures.[Day1] + ... + Measures.[Day28] = 28 THEN 'DAU'
WHEN Measures.[Day1] + ... + Measures.[Day7] >= 1 AND
Measures.[Day8] + ... + Measures.[Day14] >= 1 AND
Measures.[Day15] + ... + Measures.[Day21] >= 1 AND
Measures.[Day22] + ... + Measures.[Day28] >= 1 THEN 'WAU'
WHEN Measures.[Day1] + ... + Measures.[Day28] >= 20 THEN 'MAU'
ELSE 'Other'
END
The order of the clauses in the CASE is relevant, as the first condition matching is taken, and your definitions of WAU and MAU have some intersection.
If you have finally tested everything, you would make the measures Day1 to Day28 invisible in order not to confuse the users of the cube.
I am working on an Hotel DB, and the booking table changes a lot since people book and cancel reservation all the time. Trying to find out the best way to convert the booking table to a fact table in SSAS. I want to be able to get the right statsics from it.
For example: if a client X booked a room on Sep 20th for Dec 20th and canceled the order on Oct 20th. If I run the cube on the month of September (run it in Nov) and I want to see how many rooms got booked in the month of Sep, the order X made should be counted in the sum.
However, if I run the cube for YTD calculation (run it in Nov), the order shouldn't be counted in the sum.
I was thinking about inserting the updates to the same fact table every night, and in addition to the booking number (unique key) and add revision column to the table. So going back to the example, let say client X booking number is 1234, the first time I enter it to the table will get revision 0, in Oct when I add the cancellation record, it will get revision 1 (of course with timestamp on the row).
Now, if I want to look on any piroed of time, I can take it by the timestamp and look at the MAX(revision).
Does it make sense? Any ideas?
NOTE: I gave the example of cancelling the order, but we want to track another statistics.
Another option I read about is partitioning the cubes, but do I partition the entire table. I want to be able to add changes every night. Will I need to partition the entire table every night? it's a huge table.
One way to handle this is to insert records in your fact table for bookings and cancellations. You don't need to look at the max(revision) - cubes are all about aggregation.
If your table looks like this:
booking number, date, rooms booked
You can enter data like this:
00001, 9/10, 1
00002, 9/12, 1
00001, 10/5, -1
Then your YTDs will always have information accurate as of whatever month you're looking at. Simply sum up the booked rooms.
New to MDX and a bit bewildered...
We have a cube with an excel pivot table over the top for reporting. What I need to provide is an easy means for people to see the max (last) member of the relevant date dimension when also sliced against another dimension, and the row count on that day.
I have a "booking count" measure and I want to see the number of bookings on the last date I had any bookings for any other dimensions children. If that makes sense.
The closest i've got so far is this:
select [Measures].[Booking Count] on 0, filter(([Sub Channel].[Channels].[Channel].members, [Booking Date].[Date].children), not isempty([Measures].[Booking Count])) on 1 from myCube
But I can't then use LastChild or similar to get the last member of each channel.
Hope that makes sense! Huge thanks in advance for any help.
You are nearly there, just use the Tail function to get the last date from your set:
select [Measures].[Booking Count]
on 0,
Tail(filter(([Sub Channel].[Channels].[Channel].members, [Booking Date].[Date].children),
not isempty([Measures].[Booking Count])))
on 1
from myCube