Im trying to create a daily calculation in my Cube or an MDX statement that will do a calculation daily and roll up to any dimension. I've been able to successfully get the values back, however the performance is not what I think it should be.
My fact table will have 4 dimensions 1 of which being daily date (time). I have a formula that uses 4 other measures in this fact table and those need to be calculated daily and then geometrically linked across the time dimension.
The following MDX statement works great and produces the correct value but it is very slow. I have tried using exp(sum(log+1))-1 and multiply seems to perform a little better but not good enough. Is there another approach to this solution or is there something wrong with my MDX statement?
I have tried defining aggregations For [Calendar_Date] and [Dim_Y].[Y ID], but it does not seem to use these aggregations.
WITH
MEMBER Measures.MyCustomCalc AS (
(
Measures.x -Measures.y
) -
(
Measures.z - Measures.j
)
)
/
Measures.x
MEMBER Measures.LinkedCalc AS ASSP.MULTIPLY(
[Dim_Date].[Calendar Date].Members,
Measures.MyCustomCalc + 1
) - 1
SELECT
Measures.LinkedCalc ON Columns,
[Dim_Y].[Y ID].Members ON Rows
FROM
[My DB]
The above query takes 7 seconds to run w/ the following number of records:
Measure: 98,160 records
Dim_Date: 5,479 records
Dim_Y: 42 records
We have assumed that by defining an aggregation that the amount of calculations we'd be performing would only be 42 * number of days, in this case a maximum of 5479 records.
Any help or suggestions would be greatly appreciated!
Related
I've created a query that I'm hoping to use to fill a table with daily budgets at the end of every day. To do this, I just need some simple maths:
monthly budget / number of days left in the month.
Then, at the end of the month, I can SUM all of the rows to calculate an accurate budget.
The query is as follows:
SELECT *,
ROUND(SAFE_DIVIDE(budget, DATETIME_DIFF(CURRENT_DATE(), LAST_DAY(CURRENT_DATE()), DAY)),2) AS daily_budget
FROM `mm-demo-project.marketing_hub.budget_manager`
When executing the query, my results present as negative numbers, which according to the documentation for this function, is likely caused by the result overflowing the result type.
View the results of the query.
I've made a fools guess at rounding the calculation. Needless to say that it did not work at all.
How can I stop my query from returning negative number?
use below
SELECT *,
ROUND(SAFE_DIVIDE(budget, DATETIME_DIFF(LAST_DAY(CURRENT_DATE()), CURRENT_DATE(), DAY)),2) AS daily_budget
FROM `mm-demo-project.marketing_hub.budget_manager`
I am new to SSAS and I am trying to write a query in ssas caluculation tab that will show all customers (customername) and items ordered (ItemName) in last 40 days. I am not sure is calculation tab the best option to do this.
I have got only one measure VOLUME and dimension attributes (customername,itemname,officename,shipdate ...) I have got also Date Hierarchy with Year->Q->Month->Date.
I have got this statement that gives me netvolume for specific itemname in last 40 days, but what i am looking for is all customers and items ordered in last 40 days. I do not need measure. ( this 40 days might change to 20 or 60 days)
SELECT
{ [Measures].[NET VOLUME]} ON COLUMNS,
FILTER
(
{[CUBENAME].[SHIPDATE].&[2018-03-23T00:00:00]:[CUBENAME].[SHIPDATE].&[2018-05-01T00:00:00]},
[CUBENAME].[ITEMNAME].&[11# RPC 6411]
) ON ROWS
FROM[CUBENAME]
Try using named query in the DSV and filter based on the date field.
In a model that contains the following dimensions:
- Time - granularity month - 5 years - 20 quarters - 60 months
- Suppliers- 6000 suppliers at lowest level
- departments - 500 departments on lowest level
I need to have the distinct count of the suppliers for each department.
I use the function:
with member [measures].[#suppliers] as
distinctcount(([Supplier].[Supplier].[supplier].members
,[Measures].[amount]))
)
select [Measures].[#suppliers] on 0
, order([Department].[Department].[department].members, [#suppliers], BDESC) on 1
from [cube]
where [Time].[Time].[2017 10]:[Time].[Time].[2018 01]
The time component may vary, as the dashboard user is free to choose a reporting period.
But the MDX is very slow. It takes about 38ms to calculate the measure for each row. I want to use this measure to rank the departments and to calculate a cumulative % and assign scores to these values. As you can imagine performance will not improve.
I have tried to use functions and cache the result, but results - for me - got worse (according to the log 2x as bad).
What can I do to improve the performance?
To go fast adding a measure that calculates de Distinct Count on the Supplier ID of the table associated to[Measures].[Amount] will help. In the Schema definition.
The other ones are not scalable as Supplier is growing.
Nonetheless, why did you use DistinctCount instead of Count(NonEmpty())) ?
DistinctCount is mainly for calculating the number of members/tuples that are different in a set. It only makes sense if it's possible to have two same members in a set. As our initial members have no duplicated, it's useless.
Count(NonEmpty()) filters the set whith the nonempty and counts the number of items in the set. This can be easily calculated in parallel
I want to model a fact table for our users to help us calculate DAU (Daily active Users), WAU (Weekly active users) and MAU (Monthly active users).
The definitions of these measures are as follows:
1. DAU are users who is active every day during last 28 days.
2. WAU are users who are active at least on one day in each 7 days period during last 28 days
3. MAU are users who are active at least 20 days during last 28 days
I have built a SSAS cube with my fact table and user dimension table as follows
Fact : { date, user_id, activity_name}
Dimension: { date, user_id, gender, age, country }
Now I want to build a cube over this data so that we can see all the measures in any given day for last 28 days.
I thought of initially storing 28 days of data for all users in the SQL server and then do count distinct on date to see which measures they fall into.. but this proved very expensive since the data per day is huge..almost 10 millions rows.
So my next thought was to model the fact table (before moving it to SQL) such that it has a new column called "active_status" which is a 32 bit binary type column.
Basically, I'll store a binary number (or decimal equivalent) like 11000001101111011111111111111 which has a bit set on the days the user is active and off on the days user is not active.
This way I can compress 28 days worth of data in a single day before loading into data mart
Now the problem is , I think MDX doesn't support bitwise operations on columns in the expressions for calculated members like regular SQL does. I was hoping to create calculated measures daily_active_users, weekly_active_users and monthly_active_users using MDX that looks at this active_status bit for the user and does bitwise operation to determine the status.
Any suggestions on how to solve this problem? if MDX doesn't allow bitwise, what else can I do SSAS to achieve this.
thanks for the help
Additonal notes:
#Frank
Interesting thought about using a view to do the conversion from bitset to a dimension category..but I'm afraid it won't work. Because I have few dimensions connected to this fact table that have many-many relationships..for ex: I have a dimension called DimLanguage and another dimension called DimCountry and they have many-many relationship. And what ultimately I would like to do in the cube is to calculate the DAU/WAU/MAU which are COUNT(DISTINCT UserId) based on the combination of dimensions. So for ex; If a user is not MAU for dimension country US because he is only active 15 days out of 28 ....but he will be considered
You do not want to show the bitmap data to the users of the cube, but just the categories DAU, WAU, MAU, you should do the conversion from bitmap to category on data loading time. Just create a dimension table containing e. g. the following data:
id category
-- --------
1 DAU
2 WAU
3 MAU
Then define a view on your fact table that evaluates the bitmap data, and for each user and each date just calculates the id value of the category the user is in. This is then conceptually a foreign key to the dimension table. Use this view instead of the fact table in your cube.
All the bitmap evaluations are thus done on the relational side, where you have the bit operators available.
EDIT
As your requirement is that you need to aggregate the bitmap data in Analysis Services using bitwise OR as the aggregation method, I see no simple way to do that.
What you could do, however, would be to have 28 single columns, say Day1 to Day28, which would be either 0 or 1. These could be of type byte to save some space. You would use Maximum as aggregation method, which is equivalent to binary OR on a single bit.
Then, it would not be really complex to calculate the final measure, as we know the values are either zero or one, and thus we can just sum across the days:
CASE
WHEN Measures.[Day1] + ... + Measures.[Day28] = 28 THEN 'DAU'
WHEN Measures.[Day1] + ... + Measures.[Day7] >= 1 AND
Measures.[Day8] + ... + Measures.[Day14] >= 1 AND
Measures.[Day15] + ... + Measures.[Day21] >= 1 AND
Measures.[Day22] + ... + Measures.[Day28] >= 1 THEN 'WAU'
WHEN Measures.[Day1] + ... + Measures.[Day28] >= 20 THEN 'MAU'
ELSE 'Other'
END
The order of the clauses in the CASE is relevant, as the first condition matching is taken, and your definitions of WAU and MAU have some intersection.
If you have finally tested everything, you would make the measures Day1 to Day28 invisible in order not to confuse the users of the cube.
I have a Microsoft Access 2010 database of thyroid surgeries. I have a query that counts the number of surgeries in the entire database. I have a second query that counts the number of surgeries performed between certain dates. I created a new query using SQL to calculate the percentage of surgeries in that date range (a percentage of the total surgery number). Here is the code:
SELECT
((select count(ThyroidSurgeryType)
from [Thyroid Surgeries]
HAVING ([Thyroid Surgeries].SurgeryDate) Between #1/1/2011# And #12/31/2012#)/(select count(ThyroidSurgeryType)
from [Thyroid Surgeries])) AS Percentage
FROM [Thyroid Surgeries];
I get .33 (I then set the row format to percentage to get 33%), but I get 6 rows of 33% as opposed to just one. Why is it displaying this way? Thanks
The way you're using inline queries, you're executing the same query per row of the outer query. So if your table has six rows, you'd be executing it six time (and getting the same result for each query, naturally). You can simplify things by removing the outer query and using an iif expression:
SELECT SUM (IIF (SurgeryDate Between #1/1/2011# And #12/31/2012#, 1, 0)) /
COUNT(ThyroidSurgeryType) AS Percentage
FROM [Thyroid Surgeries];