Approaching mixed-granularity date dimensions for operational periods in cube design - SSAS

I am building a cube in SSAS, modelling (amongst other things) activity of engineering teams. I have a fact table (TeamActivity), with facts such as Mileage and TimeOnSite on a DAILY granularity. This references a date dimension table (DimDate). DimDate contains typical attributes so data can be analysed by calendar/fiscal month or year etc. This is all fine.
In another fact table (TeamPay) we have more facts (HoursClaimed, AmountPaid) which are stored on a WEEKLY granularity per team. These are business-specific operational weeks which run from Saturday to Friday.
Business users want to correlate the data in these two fact tables (e.g. HoursClaimed vs. TimeOnSite). Obviously they can't go to a "per day" level, but they will want to analyse it per operational week, or per calendar/fiscal month or year etc.
How can I design the cube to accommodate this? I have looked at "Lower Date Granularity for FactBudget", which may relate to my issue, but I am not sure whether it applies in my situation.

For me it is always much simpler to modify the raw data and push it down to the more detailed granularity level.
So in this situation I would pick either the first or the last day of your week (let the business decide whether they want it all on Friday or on Saturday) and land all the facts on that day of the week. Connect to the day level in the Date dimension and it is good to go!
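That push-down can be sketched with a small helper. A minimal sketch, assuming the Saturday-to-Friday weeks from the question and that the business picks the closing Friday (`operational_week_end` is a name I've invented for illustration):

```python
from datetime import date, timedelta

def operational_week_end(d: date) -> date:
    """Return the Friday that closes the Saturday-to-Friday
    operational week containing d."""
    # date.weekday(): Monday=0 ... Friday=4, Saturday=5, Sunday=6
    return d + timedelta(days=(4 - d.weekday()) % 7)

# A weekly TeamPay row keyed by its week (here, by its starting Saturday)
# gets restated on the week's last day before loading the fact table:
week_start = date(2024, 3, 2)            # a Saturday
print(operational_week_end(week_start))  # 2024-03-08, the closing Friday
```

Every date inside the same Saturday-to-Friday window maps to the same Friday, so the weekly fact can join DimDate at the day level like any other fact.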

Related

Should I separate Actual and Forecast data into 2 tables?

Forecast and Actual data are both entered monthly by humans into the same column, yet they follow different calculation rules and represent different business objects.
One might argue that this is a matter of opinion. However, I think the answer is rather obvious. And the answer is "yes".
Why? Forecast data is much more complicated than actual data. For instance, I have worked with a company that prepared several budgets for a year before deciding on the final budget in early January. Then, on a monthly basis, it updated the budget forecast for the year, taking business conditions into account.
So, a forecast might have information such as:
as-of-date
how developed
who approved it
what time period it is for
granularity
and more
In addition, the actuals and forecast might not be at the same level of granularity. For instance, "forecast" might be by business unit. But the actuals might break down by sales person.
These considerations suggest that the two are different entities and should be stored in different tables in the database.

SSAS joining fact tables with different time granularity

I have two Measure Groups, one with time grain year/month (YearMonthDim), the other with time grain datetime (CalenderDim). Can I link the month-grained fact to the CalenderDimension so I can make reports joining both fact tables on the time dimension?
I just made a quick fix and added the YearMonthDim to the fact table with the datetime granularity. Is there another way to solve this?
The term I was looking for was Role Playing Dimension.
I connected the second measure group, at Month granularity, to the Month attribute of the already existing CalendarDimension, whose granularity is daily.
It now works like a charm.

Partition and process SSAS cube for huge data

I have an SSAS cube with rigid attribute relationships. Daily, I get data from the source for the last 2 months only. My cube has data from 2010 onwards.
I am planning to partition that cube and then process it. My questions are:
I know that with rigid relationships I have to use Process Full. Does that mean I have to process all partitions with Process Full, or can I go ahead with Process Full on selected partitions only?
How should I design my partition strategy? If I use 2-month partitions I will end up with 6 partitions per year, and later they may increase. I thought of going with 6-month partitions, but then in the 7th month (or the 1st) I have to process two partitions (i.e. current + last 6 months). Is that good enough?
Marking attribute relationships as Rigid when they actually do change (meaning the rollups change, such as Product A rolling up to the Cereal category vs. the Oatmeal category) is a bad idea. Just mark them as Flexible relationships. Rigid vs. Flexible doesn't impact query performance, only processing performance. And if Rigid forces you to do a ProcessFull on dimensions, that means you have to reprocess all your measure group partitions. So change the relationships to Flexible unless you are 100% sure you never run an UPDATE statement on your dimension tables in your ETL.
I would partition by month. Then you can just process the most recent two months every day. To be more explicit:
ProcessUpdate your dimensions
ProcessData the most recent two months of partitions.
ProcessIndexes on your cube (which rebuilds indexes and flexible aggregations on older partitions)
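The partition-selection part of that schedule is easy to script. A minimal sketch, assuming monthly partitions and a hypothetical Sales_YYYYMM naming convention (both names are mine, not from the question):

```python
from datetime import date

def partitions_to_process(today: date, months_back: int = 2) -> list[str]:
    """Names of the monthly partitions covering the window the daily
    source feed can still change (the most recent N months)."""
    year, month = today.year, today.month
    names = []
    for _ in range(months_back):
        names.append(f"Sales_{year}{month:02d}")  # hypothetical naming scheme
        month -= 1
        if month == 0:  # wrap from January back into December of last year
            year, month = year - 1, 12
    return names

print(partitions_to_process(date(2024, 1, 15)))
# ['Sales_202401', 'Sales_202312']
```

Note the year boundary: in January the two-month window correctly reaches back into December of the previous year, which is the wrinkle that makes 6-month partitions awkward.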

Weekly summary table; how to reference the time dimension

We're thinking about adding a weekly summary table to our little data warehouse. We have a classic time dimension down to the daily level (Year/Month/Day) with the appropriate Week/Quarter/etc. columns.
We'd like the time key in this new weekly summary table to reference our time dimension. What's the best practice here: have the time key reference the id of the first day of the week it represents? Or the last day? Or something entirely different?
By convention, the fact tables with date period aggregations (week, month...) reference the DateKey of the last day of the period -- so, for this example you would reference the last day of the week.
Kind of logical too, the week must end in order to be aggregated.
It is important to clearly state (somewhere) that the grain of the fact table is one-week, so that report designers are aware of this.
Days are a good example of an entity best identified by natural keys: their representations in the Gregorian calendar.
To identify a week or a month, it's best to use its first day. In Oracle, you can easily retrieve it by a call to TRUNC:
SELECT TRUNC(fact_date, 'month'), SUM(fact_value)
FROM fact
GROUP BY TRUNC(fact_date, 'month')
In other systems it's a little bit more complex but quite easy too.
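For illustration, the same month-truncation grouping in another system. A sketch in Python with made-up fact rows (the dates and values are invented):

```python
from collections import defaultdict
from datetime import date

# Hypothetical (fact_date, fact_value) rows
facts = [
    (date(2010, 9, 3), 100),
    (date(2010, 9, 20), 50),
    (date(2010, 10, 1), 75),
]

totals = defaultdict(int)
for fact_date, fact_value in facts:
    month_key = fact_date.replace(day=1)  # same effect as TRUNC(fact_date, 'month')
    totals[month_key] += fact_value

print(sorted(totals.items()))
# [(datetime.date(2010, 9, 1), 150), (datetime.date(2010, 10, 1), 75)]
```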
What about making a new dimension "Week"?
You can create a relation between time and week dimension, if you need.
Apropos an earlier answer: I would actually expect to store data associated with an interim level of the time dimension hierarchy, when it relates to an atomic measurement for that interim time period, by attaching it to the key associated with the first day of the period. This makes things much more straightforward when loading (especially with months; I guess weeks might always require some calculation) and also when reporting. Nonetheless, it is a convention, and as long as you pick a common-sense option (and stick to it) you will be fine.
BTW, do not create a Week dimension. You should be using a rich time dimension with all the hierarchies available within it for year, quarter, month, week, day, etc. (bearing in mind there are often multiple, exclusive hierarchies). In this instance only, I would also recommend a non-meaningless surrogate key in the form 20100920. Dates are immutable, and in this format they can easily be held as int columns, so there is little value in using meaningless keys for dates (or in dim_time either). If you have ever had to write queries to dereference data where meaningless SKs are used for the time dimension, you know the (unnecessary) pain...
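The 20100920-style key is trivial to derive and to invert, which is the point about avoiding meaningless surrogates. A sketch (the function names are mine):

```python
from datetime import date

def date_key(d: date) -> int:
    """Smart surrogate key in yyyymmdd form, e.g. 20100920."""
    return d.year * 10000 + d.month * 100 + d.day

def key_to_date(key: int) -> date:
    """Dereference a key without joining back to the time dimension."""
    return date(key // 10000, (key // 100) % 100, key % 100)

print(date_key(date(2010, 9, 20)))  # 20100920
print(key_to_date(20100920))        # 2010-09-20
```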

Data Warehouse - business hours

I'm working on a data warehouse which, in the end, will require me to create reports based on business hours. Currently, my time dimension is granular to the hour. I'm wondering whether I should modify my Time dimension to include a bit field for "business hour", or whether I should create some sort of calculated measure for it on the analysis end. Any examples would be super magnificent!
Use a bit (or even another column) to specify whether an hour is a business hour at the time it is stored. Otherwise, when you change the business hours, you will be unable to reproduce historical reports.
Is all of your sales data in the same time zone? For example, are you tracking sales for outlets in different time zones, or end users in different time zones? If so, you may want to create that bit field for "business hour" in the sales fact table, because it'll be pretty difficult to calculate that on the fly for users and outlets in different time zones.
Plus, you only want to calculate this once - when the sale is imported into the data warehouse - because this data is not likely to change very often. It's not like you're going to say, "This sale used to be in business hours, but it no longer is."
Business hours are business rules, therefore they may change in the future.
Represent business hours as a base time and a duration, e.g. StartTime 0900, Duration 9.5 hours. That way you can easily change the interval, do what-if scenarios based on different business hours, and business hours can cross date lines without complicating queries.
Of course, all datetimes should be GMT (UTC), never local time, to avoid daylight savings time complexities.
EDIT: I think I misunderstood the question, your data is already granular to the hour... No, I think my answer stands, but with the addition of Effective Start and End dates for the business-hour intervals. That would permit the granularity to change in the future while still preserving history
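A sketch of that effective-dated base-time-plus-duration idea, with invented interval values (the rules, dates, and function name are all assumptions for illustration):

```python
from datetime import date, datetime

# Hypothetical business-hour rules, effective-dated so a rule change
# does not rewrite history:
# (effective_from, effective_to, opening_hour, duration_hours)
RULES = [
    (date(2009, 1, 1), date(2009, 12, 31), 9, 8.0),   # 09:00 for 8 hours
    (date(2010, 1, 1), date(9999, 12, 31), 8, 9.5),   # hours changed in 2010
]

def is_business_hour(ts: datetime) -> bool:
    """Flag a timestamp using the rule in effect on its date, so
    historical reports still reproduce after the rules change."""
    for eff_from, eff_to, opening, duration in RULES:
        if eff_from <= ts.date() <= eff_to:
            hour = ts.hour + ts.minute / 60
            return opening <= hour < opening + duration
    return False

print(is_business_hour(datetime(2009, 6, 1, 16, 30)))  # True  (09:00-17:00 rule)
print(is_business_hour(datetime(2010, 6, 1, 17, 0)))   # True  (08:00-17:30 rule)
```

Whether this runs at load time (flagging rows) or at report time is the trade-off the other answers discuss; the effective dating works the same either way.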
I'm not sure if this helps, but I'd use UTC to store all the times, and then have start and end times to specify the business hours. Once that is set up, it's a simple If (SpecificHour >= BusinessStartingHour) And (SpecificHour <= BusinessEndingHour) Then ... operation.
You can play and test with your different options if you use Microsoft PerformancePoint 2007. You can modify your dimensions and output your results in charts, pivot-tables, other reporting tools etc.
http://office.microsoft.com/en-us/performancepoint/FX101680481033.aspx
Could the "business hours" change over time? I guess I'm asking whether each row needs to tie to a business hour flag, or whether just having the reports themselves (or some reference) table decide whether that transaction took place during a business hour or not is enough.
All else equal, I'd probably have the report do it for you, instead of flagging rows, but if business hours are volatile over time, you'd have to flag the rows to make sure your historic data stays correct.
It's a judgement call I think... one that depends on performance testing, system usage, etc. Personally, I'd probably create an indexed field to hold a flag in the interest of dealing with the logic to determine what is and isn't a business hour up-front (i.e. when the data is loaded). If done correctly (and again, depending on the specific usage) I think you might be able to get a performance gain as well.