Is it best practice to split a datetime into two dimensions, a day dimension and a UTC time dimension, especially in the context of tabular models? Google did not turn up much information.
Thanks.
It depends on the data set you are analyzing. If you want to analyse by hour or minute, it is better for performance to separate the dimensions. This requires a date key and a time key in your fact tables.
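As a minimal sketch of what "a date key and a time key" can look like at load time, the Python below splits one timestamp into two integer keys. The YYYYMMDD date key, the HHMM minute-grain time key and the function name `split_datetime_keys` are illustrative assumptions, not something the answer prescribes.

```python
from datetime import datetime, timezone

def split_datetime_keys(ts: datetime) -> tuple[int, int]:
    """Split a timestamp into (date_key, time_key) for separate date and time dimensions.

    Hypothetical key formats: date_key is a YYYYMMDD integer and time_key is an
    HHMM integer (minute grain). Naive timestamps are assumed to already be UTC.
    """
    if ts.tzinfo is not None:
        # Normalize aware timestamps to UTC so every fact row uses one clock.
        ts = ts.astimezone(timezone.utc)
    date_key = ts.year * 10000 + ts.month * 100 + ts.day
    time_key = ts.hour * 100 + ts.minute
    return date_key, time_key

# Example: an event recorded at 2015-03-07 14:05 UTC.
print(split_datetime_keys(datetime(2015, 3, 7, 14, 5, tzinfo=timezone.utc)))  # (20150307, 1405)
```

The fact row then carries both keys, and each one joins to its own small dimension table.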
I have two measure groups: one with a year/month time grain (YearMonthDim), the other with a datetime time grain (CalenderDim). Can I link the month-grained fact to the CalenderDim dimension so that I can build reports that join both fact tables on the time dimension?
I just made a quick fix and added the YearMonthDim to the fact table with the datetime granularity. Is there another way to solve this?
The term I was looking for was Role Playing Dimension.
I related the second measure group at Month granularity to the Month attribute of the existing CalendarDimension, whose grain is days.
It now works like a charm.
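The fix above is done in SSAS by relating the month-grain measure group to the Month attribute of the day-grain calendar. As a rough relational analogue (not SSAS's own mechanism), the sketch below rolls the day-grain fact up to the calendar's month attribute before joining the month-grain fact. All table and column names are hypothetical.

```python
import sqlite3

# Hypothetical schema: a day-grain calendar with a month attribute, a day-grain
# fact, and a month-grain fact that only knows month_key (YYYYMM).
SCHEMA = """
CREATE TABLE dim_calendar (date_key INTEGER PRIMARY KEY, month_key INTEGER NOT NULL);
CREATE TABLE fact_daily   (date_key INTEGER NOT NULL, amount REAL NOT NULL);
CREATE TABLE fact_monthly (month_key INTEGER NOT NULL, budget REAL NOT NULL);
INSERT INTO dim_calendar VALUES (20150101, 201501), (20150102, 201501), (20150201, 201502);
INSERT INTO fact_daily   VALUES (20150101, 10), (20150102, 5), (20150201, 7);
INSERT INTO fact_monthly VALUES (201501, 20), (201502, 8);
"""

# Roll the day-grain fact up to the month attribute first, then join the
# month-grain fact on that attribute. Joining fact_monthly straight to the day
# rows would repeat each monthly budget once per day.
REPORT = """
WITH daily_by_month AS (
    SELECT c.month_key, SUM(f.amount) AS actual
    FROM fact_daily AS f
    JOIN dim_calendar AS c ON c.date_key = f.date_key
    GROUP BY c.month_key
)
SELECT d.month_key, d.actual, m.budget
FROM daily_by_month AS d
JOIN fact_monthly AS m ON m.month_key = d.month_key
ORDER BY d.month_key
"""

def month_report() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(REPORT).fetchall()
    finally:
        conn.close()

print(month_report())  # [(201501, 15.0, 20.0), (201502, 7.0, 8.0)]
```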
Basically, we are building a reporting dashboard for our software that gives our clients access to basic reporting information.
Example (I've stripped 99% of our actual system's complexity from this example, but it should still get across what I'm trying to do):
One example metric is the number of unique products viewed over a given time period. For example, if 5 products were each viewed by customers 100 times over the course of a month, a report run for that month should show 5 as the number of products viewed.
Are there any recommendations on how to store data so that it can be queried for any time range and return a unique count of products viewed? For the sake of this example, let's say there is a rule that the application cannot query the source tables directly, so we have to store summary data in a separate database and query it from there.
As a side note, we store tons of other metrics, aggregated by day. But this particular metric is different because of the uniqueness issue: distinct counts for individual days cannot simply be added together.
I personally don't think it's possible. Our current solution is to offer four pre-computed time ranges in which the metrics affected by uniqueness are available. If you use a custom time range, those metrics are no longer available because we don't have the data pre-computed.
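The uniqueness issue can be seen with a tiny data set. The sketch below uses made-up view events: it computes one distinct count per day (the way the other metrics are stored) and compares the sum of those daily counts with the true distinct count over the same range. The data and function names are purely illustrative.

```python
from datetime import date

# Hypothetical raw view events: (day, product_code).
views = [
    (date(2015, 6, 1), "A"), (date(2015, 6, 1), "B"),
    (date(2015, 6, 2), "A"), (date(2015, 6, 2), "C"),
    (date(2015, 6, 3), "A"),
]

def daily_distinct_counts(events):
    """Pre-aggregate like the other metrics: one distinct-product count per day."""
    per_day: dict[date, set[str]] = {}
    for day, product in events:
        per_day.setdefault(day, set()).add(product)
    return {day: len(products) for day, products in per_day.items()}

def true_distinct(events, start: date, end: date) -> int:
    """Distinct products over [start, end], computed from the detail rows."""
    return len({product for day, product in events if start <= day <= end})

daily = daily_distinct_counts(views)
summed = sum(daily.values())  # 2 + 2 + 1 = 5: product A is counted three times
actual = true_distinct(views, date(2015, 6, 1), date(2015, 6, 3))  # A, B, C = 3
print(summed, actual)  # 5 3
```

Distinct counts are non-additive, so summing stored daily counts overstates the answer for any range longer than one day.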
Your problem is that you're trying to change the grain of the fact table. This can't be done.
Your best option is what I think you are doing now: define aggregate fact tables at the grain of day, week and month to meet your performance constraint.
You can handle custom time ranges simply by advising your users that these will be slower than the standard aggregations. For example, a user who wants to know the count of unique products sold on Tuesdays can write a query like this, at some cost in performance:
-- count each product code once, however many Tuesday sales it has
select count(distinct dim_prod.pcode) as unique_products
from fact_sale
join dim_prod on dim_prod.pkey = fact_sale.pkey
join dim_date on dim_date.dkey = fact_sale.dkey
where dim_date.day_name = 'Tuesday'
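The query could also be written against a daily aggregate rather than the transactional fact; because it would scan less data it would run faster, perhaps even fast enough to meet your need. For a distinct count, the aggregate has to keep one row per product per day, since pre-counted daily totals cannot be deduplicated afterwards.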
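A minimal sketch of that daily aggregate, assuming it is kept at day × product grain so that `COUNT(DISTINCT ...)` still works for any date range. The table names `raw_views` and `agg_daily_product_views` are hypothetical, and both tables live in one in-memory SQLite database only for brevity; in the setup described in the question the aggregate would sit in the separate summary database.

```python
import sqlite3

# Hypothetical raw view events (day, product_code); A is viewed 3 times on 06-01.
RAW_VIEWS = [
    ("2015-06-01", "A"), ("2015-06-01", "A"), ("2015-06-01", "A"),
    ("2015-06-01", "B"), ("2015-06-02", "A"), ("2015-06-02", "C"),
    ("2015-06-03", "A"),
]

def build_daily_aggregate() -> sqlite3.Connection:
    """Summarize raw views into one row per (view_date, product_code)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_views (view_date TEXT NOT NULL, product_code TEXT NOT NULL);
        CREATE TABLE agg_daily_product_views (
            view_date    TEXT    NOT NULL,  -- ISO dates compare correctly as text
            product_code TEXT    NOT NULL,
            view_count   INTEGER NOT NULL,  -- additive, safe to SUM over any range
            PRIMARY KEY (view_date, product_code)
        );
    """)
    conn.executemany("INSERT INTO raw_views VALUES (?, ?)", RAW_VIEWS)
    conn.execute("""
        INSERT INTO agg_daily_product_views
        SELECT view_date, product_code, COUNT(*)
        FROM raw_views
        GROUP BY view_date, product_code
    """)
    return conn

def unique_products(conn: sqlite3.Connection, start: str, end: str) -> int:
    """Distinct products viewed between start and end (inclusive), for any range."""
    (count,) = conn.execute(
        """SELECT COUNT(DISTINCT product_code)
           FROM agg_daily_product_views
           WHERE view_date BETWEEN ? AND ?""",
        (start, end),
    ).fetchone()
    return count

conn = build_daily_aggregate()
print(unique_products(conn, "2015-06-01", "2015-06-03"))  # 3
print(unique_products(conn, "2015-06-02", "2015-06-03"))  # 2
```

The aggregate holds five rows here instead of seven raw events. Its size grows with products × days, so the performance gain depends on how many repeat views each product gets per day.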
From the information you have provided, I think you are trying to measure 'number of unique products viewed over a month (for example)'.
I'm not sure whether you are using the Kimball methodology to design your fact tables. I believe that in the Kimball methodology an accumulating snapshot fact table would be recommended to meet such a requirement.
I might be preaching to the converted (apologies if so), but if not, I'd suggest reading the following article, where the experts explain the concept in detail:
http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
I have also included another Kimball article, which explains the different types of fact tables in detail:
http://www.kimballgroup.com/2014/06/design-tip-167-complementary-fact-table-types/
I hope that explains the concepts. I'm more than happy to answer any questions (to the best of my ability).
Cheers
Nithin
We're thinking about adding a weekly summary table to our little data warehouse. We have a classic time dimension down to the daily level (Year/Month/Day) with the appropriate Week/Quarter/etc. columns.
We'd like the time key in this new weekly summary table to reference our time dimension. What's the best practice here: should the time key reference the ID of the first day of the week it represents, the last day, or something else entirely?
By convention, fact tables with date-period aggregations (week, month, ...) reference the DateKey of the last day of the period, so in this example you would reference the last day of the week.
This is logical, too: a week must end before it can be aggregated.
It is important to clearly state (somewhere) that the grain of the fact table is one-week, so that report designers are aware of this.
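A minimal Python sketch of the last-day-of-period convention for a weekly summary. It assumes weeks end on Sunday and that DateKey is a YYYYMMDD integer; both are illustrative choices, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: weeks end on Sunday. Weekday numbers follow date.weekday()
# (Monday=0 ... Sunday=6); change this constant for another convention.
WEEK_END_WEEKDAY = 6

def week_end(d: date) -> date:
    """Return the last day of the week containing d."""
    return d + timedelta(days=(WEEK_END_WEEKDAY - d.weekday()) % 7)

def date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer key."""
    return d.year * 10000 + d.month * 100 + d.day

def weekly_summary(daily_rows):
    """Roll (date, amount) rows up to one row per week, keyed by the week-ending DateKey."""
    totals: dict[int, float] = {}
    for d, amount in daily_rows:
        key = date_key(week_end(d))
        totals[key] = totals.get(key, 0) + amount
    return dict(sorted(totals.items()))

# 2010-09-20 is a Monday, so its week ends on Sunday 2010-09-26.
print(weekly_summary([(date(2010, 9, 20), 5), (date(2010, 9, 26), 3), (date(2010, 9, 27), 4)]))
# {20100926: 8, 20101003: 4}
```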
Days are a good example of an entity best identified by a natural key: its representation in the Gregorian calendar.
To identify a week or a month, it's best to use its first day. In Oracle, you can easily retrieve it by a call to TRUNC:
SELECT TRUNC(fact_date, 'month'), SUM(fact_value)
FROM fact
GROUP BY
TRUNC(fact_date, 'month')
In other systems it takes a little more work, but it's still quite easy.
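Where the database has no `TRUNC` equivalent, the same first-day-of-period key can be computed in the load process. The Python sketch below mirrors `TRUNC(d, 'MM')` for months and `TRUNC(d, 'IW')` (Monday of the ISO week) for weeks, then groups by the truncated date like the Oracle query above. The function names and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def trunc_month(d: date) -> date:
    """First day of d's month, like Oracle TRUNC(d, 'MM')."""
    return d.replace(day=1)

def trunc_iso_week(d: date) -> date:
    """Monday of d's ISO week, like Oracle TRUNC(d, 'IW')."""
    return d - timedelta(days=d.weekday())

def sum_by_month(facts):
    """Equivalent of GROUP BY TRUNC(fact_date, 'month') over (fact_date, fact_value) rows."""
    totals = defaultdict(int)
    for fact_date, fact_value in facts:
        totals[trunc_month(fact_date)] += fact_value
    return dict(totals)

rows = [(date(2010, 9, 3), 2), (date(2010, 9, 20), 5), (date(2010, 10, 1), 1)]
print(sum_by_month(rows))
# {datetime.date(2010, 9, 1): 7, datetime.date(2010, 10, 1): 1}
```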
What about making a new dimension "Week"?
If you need to, you can create a relationship between the time and week dimensions.
Apropos an earlier answer, I would actually expect to store data associated with an interim level of the time dimension hierarchy (when it relates to an atomic measurement for that interim period) by attaching it to the key of the first day of the period. This makes loading much more straightforward (especially with months; I guess weeks might always require some calculation), and reporting too. Nonetheless, it is a convention, and as long as you pick a common-sense option (and stick to it) you will be fine.
BTW, do not create a week dimension. You should be using a rich time dimension with all the hierarchies available within it for year, quarter, month, week, day, etc. (bearing in mind there are often multiple, exclusive hierarchies). In this instance only, I would also recommend a meaningful surrogate key in the form 20100920: dates are immutable, and in this format they fit easily in int columns, so there is little value in using meaningless keys for dates (or for dim_time either). If you have ever had to write queries to dereference data where meaningless surrogate keys are used for the time dimension, you know the (unnecessary) pain...
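A small Python sketch of the YYYYMMDD key recommended above, showing that it round-trips to a date and that integer order matches date order. That ordering is what lets a filter such as `date_key BETWEEN 20100901 AND 20100930` work directly on the fact table without dereferencing the dimension. The function names are hypothetical.

```python
from datetime import date

def to_date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer, e.g. 2010-09-20 -> 20100920."""
    return d.year * 10000 + d.month * 100 + d.day

def from_date_key(key: int) -> date:
    """Decode a YYYYMMDD key; date() raises ValueError for impossible keys like 20100231."""
    return date(key // 10000, key // 100 % 100, key % 100)

# Integer order of the keys matches calendar order of the dates.
dates = [date(2010, 12, 31), date(2010, 9, 20), date(2011, 1, 1)]
print(sorted(to_date_key(d) for d in dates))  # [20100920, 20101231, 20110101]
```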
M
I would like to analyse data per hour using SSAS. The built-in date dimension does not create any hour attributes.
Currently I am creating a new table with HourOfDay and HourOfDayName fields and will use this table to create a date dimension.
Could anyone tell me whether there is a common way to do time-of-day analysis in SSAS 2005?
Thanks
Typically you create separate day and time dimensions. This keeps the dimension from growing too large. You can add descriptive attributes to the time dimension to designate time periods that are relevant to your business or type of analysis (different shifts in a factory, for example). Then you use the time dimension to slice the data like any other dimension.
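A sketch of the rows such a separate time-of-day dimension might hold, at minute grain with hour attributes and a shift attribute. Kept separate, it has 1,440 rows in total; combined with the date at minute grain, a single year would need 365 × 1,440 = 525,600 rows. The column names echo the HourOfDay/HourOfDayName fields from the question, while the shift boundaries and function names are hypothetical.

```python
def shift_for(hour: int) -> str:
    """Hypothetical factory shifts; replace with periods relevant to your business."""
    if 6 <= hour < 14:
        return "Day"
    if 14 <= hour < 22:
        return "Evening"
    return "Night"  # 22:00-05:59 wraps past midnight

def build_time_dim() -> list[dict]:
    """One row per minute of the day, keyed by an HHMM integer."""
    rows = []
    for minute_of_day in range(24 * 60):
        hour, minute = divmod(minute_of_day, 60)
        rows.append({
            "time_key": hour * 100 + minute,                     # HHMM
            "hour_of_day": hour,                                 # HourOfDay
            "hour_of_day_name": f"{hour:02d}:00-{hour:02d}:59",  # HourOfDayName
            "shift": shift_for(hour),
        })
    return rows

dim = build_time_dim()
print(len(dim), dim[14 * 60 + 5])
# 1440 {'time_key': 1405, 'hour_of_day': 14, 'hour_of_day_name': '14:00-14:59', 'shift': 'Evening'}
```

If only hourly analysis is needed, the same idea works with 24 rows instead of 1,440.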
You can also build more interesting analysis paths by pivoting your design and using a time period (i.e., a duration) as a measure. This often requires creating a new fact table or using relational views.
I'm working on a data warehouse that will eventually require me to create reports based on business hours. Currently, my time dimension is granular to the hour. Should I modify my time dimension to include a bit field for "business hour", or should I create some sort of calculated measure for it on the analysis end? Any examples would be super magnificent!
Use a bit (or even another column) to record whether an hour is a business hour at the time it is stored. Otherwise, when you change the business hours, you will no longer be able to reproduce historical reports.
Is all of your sales data in the same time zone? For example, are you tracking sales for outlets in different time zones, or end users in different time zones? If so, you may want to create that bit field for "business hour" in the sales fact table, because it'll be pretty difficult to calculate that on the fly for users and outlets in different time zones.
Plus, you only want to calculate this once - when the sale is imported into the data warehouse - because this data is not likely to change very often. It's not like you're going to say, "This sale used to be in business hours, but it no longer is."
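A minimal sketch of computing the flag once, at load time, in each outlet's local time. The outlet codes, the 09:00-17:00 hours and the fixed UTC offsets are all hypothetical. Fixed offsets ignore daylight saving time, so a real loader would look up an IANA time zone (for example with Python's `zoneinfo`) instead.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical outlet -> UTC offset table. Fixed offsets ignore daylight saving
# time; a real loader would use each outlet's IANA zone instead.
OUTLET_OFFSETS = {
    "NYC": timezone(timedelta(hours=-5)),
    "LON": timezone.utc,
}
BUSINESS_START_HOUR = 9  # assumed 09:00 local, inclusive
BUSINESS_END_HOUR = 17   # assumed 17:00 local, exclusive

def is_business_hour(sale_utc: datetime, outlet: str) -> bool:
    """Evaluate the rule in the outlet's local time."""
    local = sale_utc.astimezone(OUTLET_OFFSETS[outlet])
    return BUSINESS_START_HOUR <= local.hour < BUSINESS_END_HOUR

def load_sale(sale_utc: datetime, outlet: str, amount: float) -> dict:
    """Build the fact row, storing the flag alongside the measures so it is computed once."""
    return {
        "sale_utc": sale_utc,
        "outlet": outlet,
        "amount": amount,
        "is_business_hour": is_business_hour(sale_utc, outlet),
    }

# 13:00 UTC is 08:00 in the hypothetical NYC outlet but 13:00 in London.
sale_time = datetime(2015, 6, 1, 13, 0, tzinfo=timezone.utc)
print(is_business_hour(sale_time, "NYC"), is_business_hour(sale_time, "LON"))  # False True
```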
Business hours are business rules, so they may change in the future.
Represent business hours as a base time and a duration, e.g. StartTime 0900, Duration 9.5 hours. That way you can easily change the interval, run what-if scenarios based on different business hours, and let business hours cross date lines without complicating queries.
Of course, all datetimes should be stored in GMT (UTC), never local time, to avoid daylight saving time complexities.
EDIT: I think I misunderstood the question; your data is already granular to the hour. Still, I think my answer stands, with the addition of effective start and end dates for the business-hour intervals. That would allow the business hours to change in the future while still preserving history.
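A minimal sketch of the base-time-plus-duration idea with effective dates. Each rule applies to the day its interval opens, and checking the previous day's interval as well handles hours that cross midnight (assuming no interval is longer than 24 hours). The rule history, class name and function name are hypothetical. Timestamps and start times are assumed to be on the same clock (UTC, per the answer).

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta

@dataclass(frozen=True)
class BusinessHoursRule:
    """One versioned rule: start time plus duration, valid for opening days in a date range."""
    start: time
    duration: timedelta
    effective_from: date
    effective_to: date  # inclusive

# Hypothetical history: 09:00 + 9.5h until the end of 2014, then 22:00 + 10h (crosses midnight).
RULES = [
    BusinessHoursRule(time(9, 0), timedelta(hours=9.5), date(2000, 1, 1), date(2014, 12, 31)),
    BusinessHoursRule(time(22, 0), timedelta(hours=10), date(2015, 1, 1), date(9999, 12, 31)),
]

def in_business_hours(ts: datetime, rules=RULES) -> bool:
    """True if ts falls in an interval that opened on ts's date or on the previous day."""
    for opening_day in (ts.date(), ts.date() - timedelta(days=1)):
        for rule in rules:
            if rule.effective_from <= opening_day <= rule.effective_to:
                opens = datetime.combine(opening_day, rule.start)
                if opens <= ts < opens + rule.duration:
                    return True
    return False

print(in_business_hours(datetime(2014, 6, 2, 18, 0)))  # True: 09:00-18:30 rule
print(in_business_hours(datetime(2015, 3, 3, 5, 0)))   # True: opened 2015-03-02 22:00
print(in_business_hours(datetime(2015, 1, 1, 3, 0)))   # False: the old rule was in force on 2014-12-31
```

Because old rules keep their effective dates, historical rows are still evaluated against the hours that applied at the time.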
I'm not sure if this helps, but I'd use UTC to store all the times and then use start and end times to specify the business hours. Once that is set up, it would be a simple If (SpecificHour >= BusinessStartingHour) And (SpecificHour <= BusinessEndingHour) Then ... operation.
You can experiment with your different options in Microsoft PerformancePoint 2007: modify your dimensions and output the results to charts, pivot tables, other reporting tools, etc.
http://office.microsoft.com/en-us/performancepoint/FX101680481033.aspx
Could the "business hours" change over time? I guess I'm asking whether each row needs to be tied to a business-hour flag, or whether it's enough for the reports themselves (or some reference table) to decide whether a transaction took place during business hours.
All else being equal, I'd probably have the report do it for you instead of flagging rows, but if business hours are volatile over time, you'd have to flag the rows to make sure your historical data stays correct.
It's a judgement call, I think, and one that depends on performance testing, system usage, etc. Personally, I'd probably create an indexed field to hold a flag, so that the logic that determines what is and isn't a business hour runs up front (i.e., when the data is loaded). If done correctly (and, again, depending on your specific usage), you might get a performance gain as well.