Grouping SSAS measures - ssas

Hi I have a fact table in my cube with member visits to a building. There is an hour dimension on the fact table. For example:
Member 1 visits 2 times between 1:00-2:00
Member 2 visits 2 times between 2:00-3:00
Member 1 visits 2 times between 4:00-5:00
So overall 6 individual visits. Is there a way of getting unique visitors between 1:00 - 4:00 which in this case is 2.
It's a bit of vague question but any pointers in the right direction would be appreciated.

use distinct count in the measure aggregation
Semi additive measures

Related

How to calculated on created fields? Why the calculation is wrong?

I am working on the workforce analysis project. And I did some case when conditional calculations in Google Data Studio. However, when I successfully conducted the creation of the new field, I couldn't do the calculation again based on the fields I created.
Based on my raw data, I generated the start_headcount, new_hires, terminated, end_headcount by applying the Case When conditional calculations. However, I failed in the next step to calculate the Turnover rate and Retention rate.
The formula for Turnover rate is
terms/((start_headcount+end_headcount)/2)
for retention is
end_headcount/start_headcount
However, the result is wrong. Part of my table is as below:
Supervisor sheadcount newhire terms eheadcount turnover Retention
A 1 3 1 3 200% 0%
B 6 2 2 6 200% 500%
C 6 1 3 4 600% 300%
So the result is wrong. The turnover rate for A should be 1/((1+3)/2)=50%; For B should be 2/((6+6)/2)=33.33%.
I don't know why it is going wrong. Can anyone help?
For example, I wrote below for start_headcount for each employee
CASE
WHEN Last Hire Date<'2018-01-01' AND Termination Date>= '2018-01-01'
OR Last Hire Date<'2018-01-01' AND Termination Date IS NULL
THEN 1
ELSE 0
END
which means if an employee meets the above standard, will get 1. And then they all grouped under a supervisor. I think it might be the problem why the turnover rate in sum is wrong since it is not calculated on the grouped date but on each record and then summed up.
Most likely you are trying to do both steps within the same query and thus newly created fields like start_headcount, etc. not visible yet within the same select statement - instead you need to put first calculation as a subquery as in example below
#standardSQL
SELECT *, terms/((start_headcount+end_headcount)/2) AS turnover
FROM (
<query for your first step>
)

Distinctcount - suppliers for departments over a period of time - slow performance

In a model that contains the following dimensions:
- Time - granularity month - 5 years - 20 quarters - 60 months
- Suppliers- 6000 suppliers at lowest level
- departments - 500 departments on lowest level
I need to have the distinct count of the suppliers for each department.
I use the function:
with member [measures].[#suppliers] as
distinctcount(([Supplier].[Supplier].[supplier].members
,[Measures].[amount]))
)
select [Measures].[#suppliers] on 0
, order([Department].[Department].[department].members, [#suppliers], BDESC) on 1
from [cube]
where [Time].[Time].[2017 10]:[Time].[Time].[2018 01]
The time component may vary, as the dashboard user is free to choose a reporting period.
But the MDX is very slow. It takes about 38ms to calculate the measure for each row. I want to use this measure to rank the departments and to calculate a cumulative % and assign scores to these values. As you can imagine performance will not improve.
I have tried to use functions and cache the result, but results - for me - got worse (according to the log 2x as bad).
What can I do to improve the performance?
To go fast adding a measure that calculates de Distinct Count on the Supplier ID of the table associated to[Measures].[Amount] will help. In the Schema definition.
The other ones are not scalable as Supplier is growing.
Nonetheless, why did you use DistinctCount instead of Count(NonEmpty())) ?
DistinctCount is mainly for calculating the number of members/tuples that are different in a set. It only makes sense if it's possible to have two same members in a set. As our initial members have no duplicated, it's useless.
Count(NonEmpty()) filters the set whith the nonempty and counts the number of items in the set. This can be easily calculated in parallel

What is the best partitioning strategy for multiple distinct count measures in a cube

I have a cube that has a fact table with a month's worth of data. The fact table is 1.5 billion rows.
Fact table contains the following columns { DateKey,UserKey,ActionKey, ClientKey, ActionCount } .
The fact table contains one row per user per client per action per day with the no of activities done.
Now I want to calculate the below measures in my cube as follows
Avg Days Engaged per user
AVG([Users].[User Key].[User Key], [Measures].[DATE COUNT])
Users Engaged >= 14 Days
SUM([Users].[User Key].[User Key], IIF([Measures].[DATE COUNT] >= 14, 1, 0))
Avg Requests Per User
IIF([Measures].[USER COUNT] = 0, 0 ,[Measures].[ACTIVITY COUNT]/[Measures].[USER COUNT])
So to do this, I have created two distinct count measures DATE COUNT and USER COUNT which are distinct aggregations on the DateKey and UserKey columns of fact table. I want to know partition the measure group(s) ( there are 3 of them bcoz of distinct measure going in it's own measure group).
What is the best strategy to partition the cube? I have read the analysis service distinct count guide end-end and it mentioned that partitioning the cube by non-overlapping user ids is the best strategy for single user queries and user X time is the best for single user time-set queries.
I want to know if I should partition by cube into 75 partitions each (1.5 billion rows/20 million rows per partition) which will have each partition with non-overlapping and sequential user ids or should I partition it into 31 partitions one per day with overlapping userid but distinct days in each partition or 31 * 3 = 93 partitions where I break down the cube into per day and then for each day further partition in to 3 equal parts with non-overlapping user ids within each day (but users will overlap between days) or partition by ActionKey into 45 partitions of un-equal size since most of the times the measures are sliced by Action?
I'm a bit confused because the paper only talks about optimizing on a single distinct count measure, where as I need to do distinct counts on both user and dates for my measures.
any tips ?
I would first take a step back and try the Many-to-Many dimension count technique to achieve Distinct Count results without the overhead of actual Distinct Count aggregations.
Probably the best explanation of this is the "Distinct Count" section of the "Many to Many Revolution 2.0" paper:
http://www.sqlbi.com/articles/many2many/
Note Solution C is the one I am referring to.
You usually find this solution scales much better than a standard "Distinct Count" measure. For example I have one cube with 2b rows in the biggest Fact (and only 4 partitions), and a "M2M Distinct Count" fact on 9m rows - performance is great e.g. 6-7 hours to completely reprocess all the data, less than 5 seconds for most queries. The server is OK but not great e.g. VM, 4 cores, 32 GB RAM (shared with SQL, SSRS, SSIS etc), no SSD.
I think you can get carried away with too many partitions and overcomplicating the design. The basic engine can do wonders with careful design.

How to do bitwise operations in SSAS cube for aggregations using MDX

I want to model a fact table for our users to help us calculate DAU (Daily active Users), WAU (Weekly active users) and MAU (Monthly active users).
The definitions of these measures are as follows:
1. DAU are users who is active every day during last 28 days.
2. WAU are users who are active at least on one day in each 7 days period during last 28 days
3. MAU are users who are active at least 20 days during last 28 days
I have built a SSAS cube with my fact table and user dimension table as follows
Fact : { date, user_id, activity_name}
Dimension: { date, user_id, gender, age, country }
Now I want to build a cube over this data so that we can see all the measures in any given day for last 28 days.
I thought of initially storing 28 days of data for all users in the SQL server and then do count distinct on date to see which measures they fall into.. but this proved very expensive since the data per day is huge..almost 10 millions rows.
So my next thought was to model the fact table (before moving it to SQL) such that it has a new column called "active_status" which is a 32 bit binary type column.
Basically, I'll store a binary number (or decimal equivalent) like 11000001101111011111111111111 which has a bit set on the days the user is active and off on the days user is not active.
This way I can compress 28 days worth of data in a single day before loading into data mart
Now the problem is , I think MDX doesn't support bitwise operations on columns in the expressions for calculated members like regular SQL does. I was hoping to create calculated measures daily_active_users, weekly_active_users and monthly_active_users using MDX that looks at this active_status bit for the user and does bitwise operation to determine the status.
Any suggestions on how to solve this problem? if MDX doesn't allow bitwise, what else can I do SSAS to achieve this.
thanks for the help
Additonal notes:
#Frank
Interesting thought about using a view to do the conversion from bitset to a dimension category..but I'm afraid it won't work. Because I have few dimensions connected to this fact table that have many-many relationships..for ex: I have a dimension called DimLanguage and another dimension called DimCountry and they have many-many relationship. And what ultimately I would like to do in the cube is to calculate the DAU/WAU/MAU which are COUNT(DISTINCT UserId) based on the combination of dimensions. So for ex; If a user is not MAU for dimension country US because he is only active 15 days out of 28 ....but he will be considered
You do not want to show the bitmap data to the users of the cube, but just the categories DAU, WAU, MAU, you should do the conversion from bitmap to category on data loading time. Just create a dimension table containing e. g. the following data:
id category
-- --------
1 DAU
2 WAU
3 MAU
Then define a view on your fact table that evaluates the bitmap data, and for each user and each date just calculates the id value of the category the user is in. This is then conceptually a foreign key to the dimension table. Use this view instead of the fact table in your cube.
All the bitmap evaluations are thus done on the relational side, where you have the bit operators available.
EDIT
As your requirement is that you need to aggregate the bitmap data in Analysis Services using bitwise OR as the aggregation method, I see no simple way to do that.
What you could do, however, would be to have 28 single columns, say Day1 to Day28, which would be either 0 or 1. These could be of type byte to save some space. You would use Maximum as aggregation method, which is equivalent to binary OR on a single bit.
Then, it would not be really complex to calculate the final measure, as we know the values are either zero or one, and thus we can just sum across the days:
CASE
WHEN Measures.[Day1] + ... + Measures.[Day28] = 28 THEN 'DAU'
WHEN Measures.[Day1] + ... + Measures.[Day7] >= 1 AND
Measures.[Day8] + ... + Measures.[Day14] >= 1 AND
Measures.[Day15] + ... + Measures.[Day21] >= 1 AND
Measures.[Day22] + ... + Measures.[Day28] >= 1 THEN 'WAU'
WHEN Measures.[Day1] + ... + Measures.[Day28] >= 20 THEN 'MAU'
ELSE 'Other'
END
The order of the clauses in the CASE is relevant, as the first condition matching is taken, and your definitions of WAU and MAU have some intersection.
If you have finally tested everything, you would make the measures Day1 to Day28 invisible in order not to confuse the users of the cube.

SSAS cube - Finding count of days overdue per order

I'm trying to add a new measure to my OLAP cube, in SSAS.
The fact table is at the order detail level, so each row in the fact table represents an order and a product, like:
I'd like to add a measure that indicates the maximum number of days overdue per order number. So that measure should say:
OrderNo 1 -> 5 days overdue
OrderNo 2 -> 1 day overdue
OrderNo 3 -> 0 days overdue
I have tried using the MAX operator, with no success.
By the way, this is a multidimensional SSAS schema. I'm using SQL Server 2008.
Thanks in advance!
Just define a measure "max days overdue" or how ever you want to call it in the cube designer, tab "cube structure". In the properties of the measure, use "Max" as the aggregate function. That's it.