Weekly summary table; how to reference the time dimension - sql

We're thinking about adding a weekly summary table to our little data warehouse. We have a classic time dimension down to the daily level (Year/Month/Day) with the appropriate Week/Quarter/etc. columns.
We'd like to have the time key in this new weekly summary table reference our time dimension. What's the best practice here—have the time key reference the id of the first day in the week it represents? Or the last day? Or something entirely different?

By convention, the fact tables with date period aggregations (week, month...) reference the DateKey of the last day of the period -- so, for this example you would reference the last day of the week.
This is logical too: the week must end before it can be aggregated.
It is important to clearly state (somewhere) that the grain of the fact table is one-week, so that report designers are aware of this.
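As a rough illustration of the last-day convention, the sketch below maps any date to the key of the final day of its week. It assumes, purely for illustration, that the time dimension uses integer keys of the form YYYYMMDD and that weeks end on Sunday. Neither is stated in the question, and `week_ending_date_key` is a hypothetical helper name.

```python
from datetime import date, timedelta


def week_ending_date_key(d: date, week_end_weekday: int = 6) -> int:
    """Return the YYYYMMDD key of the last day of the week containing d.

    week_end_weekday follows date.weekday() numbering (Monday=0 ... Sunday=6).
    The Sunday default and the YYYYMMDD key format are assumptions.
    """
    # Number of days from d forward to the week-ending weekday (0 if d is that day).
    days_ahead = (week_end_weekday - d.weekday()) % 7
    last_day = d + timedelta(days=days_ahead)
    return last_day.year * 10000 + last_day.month * 100 + last_day.day
```

Every daily row that falls in the same week maps to the same key, so the weekly fact row can carry that key as its foreign key to the daily time dimension.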

Days are a good example of an entity best identified by natural keys: their representations in the Gregorian calendar.
To identify a week or a month, it's best to use its first day. In Oracle, you can easily retrieve it with a call to TRUNC (use the 'IW' format for the Monday of the ISO week instead of 'month'):
SELECT TRUNC(fact_date, 'month'), SUM(fact_value)
FROM fact
GROUP BY
TRUNC(fact_date, 'month')
Other systems need a slightly different expression, but it is just as easy. PostgreSQL, for example, has DATE_TRUNC('month', fact_date).
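The same truncate-and-group idea can be sketched outside the database. The snippet below is only an illustration in Python, not part of the original answer: `month_start` and `sum_by_month` are hypothetical helper names, and the facts are assumed to arrive as (date, value) pairs.

```python
from collections import defaultdict
from datetime import date


def month_start(d: date) -> date:
    """Truncate a date to the first day of its month, like TRUNC(d, 'month')."""
    return d.replace(day=1)


def sum_by_month(facts):
    """Sum fact values per month, keyed by the first day of each month.

    facts is an iterable of (fact_date, fact_value) pairs.
    """
    totals = defaultdict(float)
    for fact_date, fact_value in facts:
        totals[month_start(fact_date)] += fact_value
    return dict(totals)
```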

What about making a new dimension "Week"?
You can create a relationship between the time and week dimensions if you need one.

Apropos an earlier answer: when data relates to an atomic measurement for an interim level of the time dimension hierarchy, I would expect to attach it to the key of the first day of the period. This makes loading much more straightforward (especially with months; I guess weeks will always require some calculation), and reporting too. Nonetheless, it is a convention, and as long as you pick a common-sense option and stick to it, you will be fine.
BTW, do not create a week dimension. You should be using a rich time dimension with all the hierarchies available within it for year, quarter, month, week, day, etc. (bearing in mind there are often multiple, mutually exclusive hierarchies). In this instance only, I would also recommend a non-meaningless surrogate key in the form 20100920. Dates are immutable and in this format fit easily in int columns, so there is little value in using meaningless keys for dates (or in dim_time either). If you have ever had to write queries to dereference data where meaningless SKs are used for the time dimension, you know the (unnecessary) pain...
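To make the first-day convention and the 20100920-style key concrete, here is a minimal sketch. It assumes weeks start on Monday, which the answer does not specify, and all function names are hypothetical.

```python
from datetime import date, timedelta


def date_key(d: date) -> int:
    """Encode a date as an integer key of the form YYYYMMDD."""
    return d.year * 10000 + d.month * 100 + d.day


def key_to_date(key: int) -> date:
    """Decode a YYYYMMDD integer key back into a date."""
    return date(key // 10000, key // 100 % 100, key % 100)


def week_start_key(d: date, week_start_weekday: int = 0) -> int:
    """Key of the first day of the week containing d (Monday=0 by assumption)."""
    days_back = (d.weekday() - week_start_weekday) % 7
    return date_key(d - timedelta(days=days_back))


def month_start_key(d: date) -> int:
    """Key of the first day of the month containing d."""
    return date_key(d.replace(day=1))
```

The month case needs no calendar arithmetic at all, which matches the remark that months are simpler to load than weeks.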

Related

SSAS joining fact tables with different time granularity

I have two Measure Groups, one with time grain year/month (YearMonthDim), the other with time grain datetime (CalenderDim). Can I link the Month-grained fact to the CalenderDimension so I can make reports joined from both fact tables on the time dimension?
I just made a quick fix and added the YearMonthDim to the fact table with the datetime granularity. Is there another way to solve this?
The term I was looking for was Role Playing Dimension.
I connected the second measure group on Month granularity with the Month attribute in the already existing CalendarDimension whose granularity is on days.
It now works like a charm.

Migrating Relational to Multidimensional - How to Create Time Dimension from Timestamp - SSAS

Objective: I have a relational database (RDB) model. A few tables have a timestamp attribute.
I want to create a date dimension for my multidimensional model.
Viewing the Microsoft Tutorial Solution 3, I noticed that the Date dimension table's attribute FullDateAlternateKey has the same format as the timestamp attribute in the RDB's tables.
Question: So, I was wondering if there is a way to automatically generate a Date dimension table schema (with the FullDateAlternateKey as primary key) and populate it with the data from the timestamps in the RDB's tables?
Then I could make the timestamp attribute from the RDB's tables a foreign key to the Time dimension table in my multidimensional model.
Don't.
First, decide the "grain" of your dimension. It sounds like you want a DATE dimension, so the grain will be a day.
Then, decide the columns you want in the dimension. Examples are week number, day number in week, day number in year, day name, month name, etc.
Next, build a spreadsheet that contains one row per date, for the range of dates you need, and calculates the columns you require.
Finally, load and process the dimension, from the spreadsheet, using your preferred ETL/ELT method.
The reason you don't build it from the incoming data values is that you may have gaps in the data. A date dimension should have ALL dates in your desired range (e.g., 1900-01-01 to 2999-12-31) so that your BI tools can eventually use it for time series reporting. If you don't have ALL the dates, and try to show date on the x-axis of a graph, you will get misleading visualisations.
Another reason for using a spreadsheet as your source is that the DATE dimension is one of the most volatile dimensions in your design. Your users will ask for new columns, and variations on columns (e.g., "Can we have a column with the date like 4th. of August, 2017?"), and a spreadsheet is a very fast way to manage the data and rebuild the dimension when necessary.
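The answer recommends a spreadsheet; the sketch below swaps in a short script that produces the same kind of gap-free rows, one per calendar date, which could then be exported to a file for the ETL step. The column names and the YYYYMMDD `date_key` are assumptions chosen for illustration, and `build_date_dimension` is a hypothetical name.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date):
    """Return one row (dict) per calendar date from start to end inclusive."""
    rows = []
    d = start
    while d <= end:
        # isocalendar() yields (ISO year, ISO week number, ISO weekday 1-7).
        _iso_year, iso_week, iso_weekday = d.isocalendar()
        rows.append({
            "date_key": d.year * 10000 + d.month * 100 + d.day,
            "full_date": d,
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "month_name": d.strftime("%B"),   # name depends on the active locale
            "day_name": d.strftime("%A"),     # name depends on the active locale
            "day_of_week": iso_weekday,       # Monday=1 ... Sunday=7
            "day_of_year": d.timetuple().tm_yday,
            "iso_week": iso_week,
        })
        d += timedelta(days=1)
    return rows
```

Because the rows come from the calendar rather than from the fact data, every date in the range is present even if no transaction happened on it. Adding a new column means adding one entry to the dict and regenerating.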
Step 1:
Choose granularity and generate time dimension key based on that granularity. For example, choosing the granularity hour, one would need something like 2001020323 (yyyymmddhh).
There is no automatic way to do this using SSAS, so it's best to use a script to build the time dimension table in the underlying data source, then import it to DSV in SSAS and use it to build the time dimension. (like this one)
Step 2:
I have to match the time dimension's keys, so I need an ETL process/job/script taking my timestamp as input and returning a key for that timestamp that matches the keys in the time dimension table.
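Step 2 can be sketched as a small function that turns a timestamp into the yyyymmddhh key from Step 1. This is only an illustration of the key derivation; `hour_key` is a hypothetical name, and in practice the same expression would live in whatever ETL tool or script loads the fact table.

```python
from datetime import datetime


def hour_key(ts: datetime) -> int:
    """Map a timestamp to an hour-grain key of the form yyyymmddhh.

    Minutes and seconds are discarded, so every timestamp within the same
    hour resolves to the same time dimension row.
    """
    return int(ts.strftime("%Y%m%d%H"))
```

For example, a timestamp of 2001-02-03 23:45:10 resolves to the key 2001020323 used in Step 1.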

Hour of Day - SSAS

I would like to analyse data per hour using SSAS. The built in date dimension does not create any hour attributes.
Currently I am creating a new table with HourOfDay and HourOfDayName fields and will use this table to create a date dimension.
Could anyone tell me if there is a common way of achieving time-of-day-based analysis using SSAS 2005?
Thanks
Typically you create separate day and time dimensions. This is done to prevent the dimension from growing too large. You can add special descriptive attributes to the time dimension to designate time periods that are relevant to your business or type of analysis (different shifts in a factory, for example). Then you would just use the time dimension to slice the data like any other dimension.
You can also build more interesting analysis paths by pivoting your design and use a time period i.e. duration as a measure. This often requires creating a new fact table or using Relational views.
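A separate time-of-day dimension at hour grain has only 24 rows. The sketch below generates them with a descriptive shift attribute; the shift names and boundaries are invented for illustration, while `hour_of_day` and `hour_of_day_name` mirror the HourOfDay and HourOfDayName fields mentioned in the question.

```python
def _in_shift(hour: int, start: int, end: int) -> bool:
    """True if hour falls in [start, end), allowing shifts that wrap past midnight."""
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end


def build_hour_dimension(shifts):
    """Return 24 rows, one per hour of the day, each tagged with a shift name.

    shifts is a list of (name, start_hour, end_hour) tuples; hours not covered
    by any shift are labelled "Unassigned".
    """
    rows = []
    for hour in range(24):
        shift_name = next(
            (name for name, start, end in shifts if _in_shift(hour, start, end)),
            "Unassigned",
        )
        rows.append({
            "hour_of_day": hour,
            "hour_of_day_name": f"{hour:02d}:00",
            "shift": shift_name,
        })
    return rows


# Hypothetical factory shifts; the night shift wraps past midnight.
EXAMPLE_SHIFTS = [("Day", 6, 14), ("Evening", 14, 22), ("Night", 22, 6)]
```

The fact table then carries two keys, one to the date dimension and one to this small table, instead of one key to a combined dimension with a row for every hour of every day.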

Fact/Dim Table Time Value

I'm setting up Fact and Dim tables and trying to figure out the best way to set up my time values. AdventureworksDW uses a timekey (UID) for each time entry in the DimTime table. I'm wondering if there's any reason I shouldn't just use a time value instead, i.e. 0106090800 (my granularity is hourly)?
"Intelligent keys" (in this case, a coded date and hour number) can lead to problems when you want to change definitions in your dimension. For example, your users might insist on a change from local time to UTC. Now your key is no longer actually a useful number, it's the old value in the dimension.
Further, with a midnight roll-over issue, the date part of your intelligent key might not match the actual date of the UTC vs. local time change.
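The midnight roll-over can be shown with a small worked example. It assumes a fixed UTC-5 local offset and a yyyymmddhh key layout, both chosen only for illustration; the point is that the same instant yields keys with different date parts.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone with a fixed UTC-5 offset (no daylight saving handling).
LOCAL = timezone(timedelta(hours=-5))


def hour_key(ts: datetime) -> int:
    """Coded date-and-hour key of the form yyyymmddhh (an assumed layout)."""
    return int(ts.strftime("%Y%m%d%H"))


# One instant, expressed in local time and in UTC.
local_ts = datetime(2009, 1, 6, 20, 0, tzinfo=LOCAL)
utc_ts = local_ts.astimezone(timezone.utc)

local_key = hour_key(local_ts)  # date part is 6 January
utc_key = hour_key(utc_ts)      # date part is 7 January
```

Existing fact rows keyed with the local value would have to be rewritten after a switch to UTC, or the key would silently disagree with the dimension's date attributes.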
To prevent the key from becoming a problem, you can't use it for any calculation of any kind. In which case, it's little better than a simple GUID or auto-increment number.
Auto-increment keys (or GUIDs) are fast and simple. Most important, they are trivially consistent across all dimensions.
Time happens to have a numeric mapping, but it helps to look at this as a weird coincidence, not a basis for a good design.
Here's Ralph Kimball's latest on time dimension. It's dated 2004, but it's still good.
This one will help, too.
The primary key should be surrogate, meaningless -- however, using YYYYMMDD for date dimension key is hard to resist, and also allows for easy table partitioning. The trick is that it should still be regarded as meaningless -- the fact that it looks like a date should be regarded as purely coincidental. This key should never be exposed to business users.

Data Warehouse - business hours

I'm working on a Data Warehouse which, in the end, will require me to create reports based on business hours. Currently, my time dimension is granular to the hour. I'm wondering if I should be modifying my Time dimension to include a bit field for "business hour" or should I be creating some sort of calculated measure for it on the analysis end? Any examples would be super magnificent?
Use a bit (or even another column) to specify whether an hour is a business hour at the time it is stored. Otherwise when you change the business hours you will become unable to reproduce historical reports.
Is all of your sales data in the same time zone? For example, are you tracking sales for outlets in different time zones, or end users in different time zones? If so, you may want to create that bit field for "business hour" in the sales fact table, because it'll be pretty difficult to calculate that on the fly for users and outlets in different time zones.
Plus, you only want to calculate this once - when the sale is imported into the data warehouse - because this data is not likely to change very often. It's not like you're going to say, "This sale used to be in business hours, but it no longer is."
Business hours are business rules, therefore they may change in the future.
Represent business hours as a base time and a duration, e.g. StartTime 0900, Duration 9.5 hours. That way you can easily change the interval, do what-if scenarios based on different business hours, and business hours can cross date lines without complicating queries.
Of course, all datetimes should be GMT (UTC), never local time, to avoid daylight saving time complexities.
EDIT: I think I misunderstood the question, your data is already granular to the hour... No, I think my answer stands, but with the addition of Effective Start and End dates for the business-hour intervals. That would permit the granularity to change in the future while still preserving history
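The start-time-plus-duration idea with effective dates can be sketched as follows. The rule values, the class name `BusinessHoursRule`, and the function `is_business_hour` are all hypothetical; timestamps are assumed to be naive datetimes already in UTC, and weekends and holidays are deliberately ignored.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import Optional


@dataclass(frozen=True)
class BusinessHoursRule:
    effective_start: date            # first day the rule applies
    effective_end: Optional[date]    # last day the rule applies; None = open-ended
    start_time: time                 # when the business window opens
    duration: timedelta              # how long the window stays open


def is_business_hour(ts: datetime, rules) -> bool:
    """True if ts falls inside a business window of any rule in effect.

    The previous day's window is also checked so that windows crossing
    midnight are handled without special cases.
    """
    for day in (ts.date(), ts.date() - timedelta(days=1)):
        for rule in rules:
            in_effect = rule.effective_start <= day and (
                rule.effective_end is None or day <= rule.effective_end
            )
            if not in_effect:
                continue
            window_start = datetime.combine(day, rule.start_time)
            if window_start <= ts < window_start + rule.duration:
                return True
    return False


# Hypothetical history: 09:00 + 9.5 h during 2008, then 08:00 + 10 h from 2009.
EXAMPLE_RULES = [
    BusinessHoursRule(date(2008, 1, 1), date(2008, 12, 31), time(9, 0), timedelta(hours=9.5)),
    BusinessHoursRule(date(2009, 1, 1), None, time(8, 0), timedelta(hours=10)),
]
```

Because each rule carries its own effective dates, a 2008 transaction is still judged against the 2008 hours after the rule changes, which is what keeps historical reports reproducible.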
I'm not sure if this helps, but I'd use UTC to store all the times, and then have start and end times to specify the business hours. Once that is set up, it would be a simple If (SpecificHour >= BusinessStartingHour) And (SpecificHour <= BusinessEndingHour) Then ... operation.
You can play and test with your different options if you use Microsoft PerformancePoint 2007. You can modify your dimensions and output your results in charts, pivot-tables, other reporting tools etc.
http://office.microsoft.com/en-us/performancepoint/FX101680481033.aspx
Could the "business hours" change over time? I guess I'm asking whether each row needs to tie to a business hour flag, or whether just having the reports themselves (or some reference) table decide whether that transaction took place during a business hour or not is enough.
All else equal, I'd probably have the report do it for you, instead of flagging rows, but if business hours are volatile over time, you'd have to flag the rows to make sure your historic data stays correct.
It's a judgement call I think... one that depends on performance testing, system usage, etc. Personally, I'd probably create an indexed field to hold a flag in the interest of dealing with the logic to determine what is and isn't a business hour up-front (i.e. when the data is loaded). If done correctly (and again, depending on the specific usage) I think you might be able to get a performance gain as well.