As of now, my fact table has around 6 date keys which need to be linked to one Date dimension.
The possible solution I have on the table is:
Creating multiple date dimensions and linking them to the fact table. Since this involves duplicating the data, I want to avoid this solution, as I have around 10 date columns in my fact table.
I am looking for a solution that reduces the number of redundant tables in my model.
Something like a role-playing ("user-role") dimension.
Without increasing the number of tables, you can keep all the relationships to the single Date dimension and use whichever one is relevant to each measure. Only one relationship can be active at a time; the rest stay inactive, but an inactive relationship can be activated per-measure in DAX using the USERELATIONSHIP function. So with one active and two inactive relationships, each measure simply invokes the relationship it needs, as in the sketch below.
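A minimal sketch, assuming a Sales fact with an active relationship on Sales[OrderDateKey] and an inactive one on Sales[ShipDateKey], both pointing at 'Date'[DateKey] (all names are illustrative):

    Sales Amount by Ship Date :=
    CALCULATE (
        SUM ( Sales[SalesAmount] ),
        -- activate the otherwise-inactive ship-date relationship for this measure only
        USERELATIONSHIP ( Sales[ShipDateKey], 'Date'[DateKey] )
    )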
Project requires a usable data warehouse (DW) in SQL Server tables. They prefer no analysis services, with the SQL Server DW providing everything they need.
They are starting to use PowerBI, and have expressed the desire to provide all facts and measures in SQL Server tables, as opposed to a multi dimensional cube. Client has also used SSRS (to a large degree), and some Excel (by users).
At first they only required the Revenue FACT for Period x Product x Location. This uses a periodic snapshot type of fact, not a transactional grain fact.
So, to provide YTD measures for all periods, my first challenge was filling in the "empty" facts: rows for which there was no revenue, but where there was revenue in prior (and/or subsequent) periods. The fact table includes a column for the YTD measure.
I solved that by creating empty facts -- zero revenue -- for the no-revenue periods, like this:
Period 1, Loc1, Widget1, $1 revenue, $1 YTD
Period 2, Loc1, Widget1, $0 revenue, $1 YTD (this $0 "fact" created for YTD)
Period 3, Loc1, Widget1, $1 revenue, $2 YTD
I'm just using YTD as an example, but the requirements include measures for Last 12 Months, and Annualized, in addition to YTD.
I found that the best way to tally YTD was actually to create records to hold the measure (YTD) where there is no fact coming from the transactional data (meaning: no revenue for that combination of dimensions).
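For illustration, a hedged sketch of that gap-filling step in T-SQL; the object names (FactRevenue, DimPeriod, and so on) are assumptions, and it is simplified to fill every period for every product/location combination that ever had revenue:

    INSERT INTO dbo.FactRevenue (PeriodKey, LocationKey, ProductKey, Revenue)
    SELECT p.PeriodKey, c.LocationKey, c.ProductKey, 0   -- zero-revenue filler rows
    FROM dbo.DimPeriod AS p
    CROSS JOIN (SELECT DISTINCT LocationKey, ProductKey
                FROM dbo.FactRevenue) AS c               -- combinations seen in real facts
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.FactRevenue AS f
                      WHERE f.PeriodKey   = p.PeriodKey
                        AND f.LocationKey = c.LocationKey
                        AND f.ProductKey  = c.ProductKey);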
Now the requirements need the revenue fact by two more dimensions, Market Segment and Customer. This means I need to refactor my existing stored procedures to do the same process, but now for a more granular fact:
Period x Widget x Location x Market x Customer
This will result in creating many more records to hold the YTD (and other) measures. There will be far more records for these measures than there are real facts.
Here are what I believe are possible solutions:
Just do it in SQL DW table(s). This makes it easy to use wherever needed (as it is now).
Do this in Power BI -- presumably as DAX expressions in the PBIX?
SSAS Tabular -- is Tabular an appropriate place to calculate the YTD etc. measures, or should that be handled at the reporting layer?
For what it's worth, the client is reluctant to use SSAS Tabular because they want to keep the number of layers to a minimum.
Follow up questions:
Is there a SQL Server architecture that provides this sort of solution the way I built it, perhaps reducing the number of records necessary?
If they use Power BI for the YTD, 12M, and Annualized measures, what do I need to provide in the SQL DW -- anything more than the facts?
Is this something that SSAS Tabular solves, inherently?
This is my experience:
I have always maintained that the DW should be where all of your data is. Then any client tool can use that DW and get the same answer.
I came across the same issue in a recent project: generating "same day last year" type calcs (along with YTD, financial YTD, etc.). SQL Server seemed like the obvious place to put these 'sparse' facts, but as I discovered (as you have), the sparsity just gets bigger and more complicated as the dimensions increase. You end up blowing out the size, continually coming back to chase down missing sparse facts, and, worst of all, having to invent weird 'allocation' rules to push measures down to the required level of detail.
IMHO DAX is the place to do this, but there is a lot of pain in learning the language, especially if you come from a traditional relational background. But I really do think it's the best thing since SQL, if you can just get past the learning curve.
One of the most obvious advantages of using DAX rather than the DW is that DAX recognises what your current filters are in the client tool (Power BI, Excel, or whatever) at run time and can adjust its calculation automatically. Obviously you can't do that with figures baked into the DW. For example, it can recognise that the page, chart, or row is filtered on a given date, so your current-year/prior-year calcs automatically compute the correct YTD based on that date.
DAX has a number of 'calendar' type functions (called "time intelligence"), but they only work for a particular type of calendar and come with a lot of constraints, so usually you end up needing to create your own calendar table and build your calculations around it.
My advice is to start here: https://www.daxpatterns.com/ and try generating some YTD calcs in DAX, along the lines of the sketch below.
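A minimal sketch of a calendar-year YTD measure, assuming a Revenue fact and a 'Calendar' table marked as the model's date table (all names are illustrative):

    Revenue YTD :=
    CALCULATE (
        SUM ( Revenue[Amount] ),        -- base measure
        DATESYTD ( 'Calendar'[Date] )   -- built-in time intelligence, calendar year
    )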
For what it's worth, the client is reluctant to use SSAS Tabular because they want to keep the number of layers to a minimum.
Power BI already has a (required) modelling layer that effectively uses SSAS Tabular internally, so you already have an additional logical layer; it's just in the same tool as the reporting layer. The difference is that doing modelling only in Power BI currently isn't an "Enterprise" approach: features such as model version control, partitioned loads, and advanced row-level security aren't supported by Power BI (although who knows what next month will bring).
Layers are not bad things as long as you keep them under control. Otherwise we should just go back to monolithic COBOL programs.
It's certainly possible to start doing your modelling purely in Power BI and then, at a later stage when you need the features, control, and scalability, migrate to SSAS Tabular.
One thing to consider is that the SSAS Tabular PaaS offering in Azure can get pretty pricey, but if you ever need partitioned loads (i.e. loading just this week's data into a very large cube with a lot of history), you'll need to use it.
Is there a SQL Server architecture that provides this sort of solution the way I built it, perhaps reducing the number of records necessary?
I guess that architecture would be defining the records in views, which has a lot of obvious drawbacks, as sketched below. There is a SPARSE column designator, but that just optimises storage for columns holding lots of NULLs, which may not even be the case here.
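A sketch of what such a view might look like, computing YTD as a running sum instead of storing sparse rows; it assumes SQL Server 2012+ for the windowed SUM, and all object names are illustrative:

    CREATE VIEW dbo.vRevenueYTD AS
    SELECT f.PeriodKey, f.LocationKey, f.ProductKey, f.Revenue,
           SUM(f.Revenue) OVER (PARTITION BY f.LocationKey, f.ProductKey,
                                             p.CalendarYear
                                ORDER BY f.PeriodKey
                                ROWS UNBOUNDED PRECEDING) AS RevenueYTD  -- running total within the year
    FROM dbo.FactRevenue AS f
    JOIN dbo.DimPeriod AS p ON p.PeriodKey = f.PeriodKey;
    -- Note: this still returns rows only for periods that have facts,
    -- which is exactly the sparsity problem described above.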
If they use Power BI for the YTD, 12M, and Annualized measures, what do I need to provide in the SQL DW -- anything more than the facts?
You definitely need a comprehensive calendar table defining the fiscal year, something like the sketch below.
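A minimal sketch of such a calendar table; the exact column set is an illustrative assumption, not a fixed standard:

    CREATE TABLE dbo.DimCalendar (
        DateKey         int      NOT NULL PRIMARY KEY,  -- e.g. 20240701
        [Date]          date     NOT NULL,
        CalendarYear    smallint NOT NULL,
        CalendarMonth   tinyint  NOT NULL,
        FiscalYear      smallint NOT NULL,              -- e.g. for a fiscal year starting 1 July
        FiscalPeriod    tinyint  NOT NULL,
        FiscalYearStart date     NOT NULL
    );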
Is this something that SSAS Tabular solves, inherently?
If you only want to report by calendar periods (1 Jan to 31 Dec), then the built-in time intelligence is "inherent", but if you want to report by fiscal periods, the built-in time intelligence can't be used. Either way you still need to define the DAX calcs yourself, and they can get really big.
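For example, a hedged sketch of a fiscal YTD measure built on the custom calendar table rather than built-in time intelligence ('Calendar'[FiscalYear] and the other names are assumptions):

    Revenue Fiscal YTD :=
    VAR CurrentDate = MAX ( 'Calendar'[Date] )        -- last date in the current filter
    VAR CurrentFY   = MAX ( 'Calendar'[FiscalYear] )  -- fiscal year of that selection
    RETURN
        CALCULATE (
            SUM ( Revenue[Amount] ),
            FILTER (
                ALL ( 'Calendar' ),                   -- replace the existing date filters
                'Calendar'[FiscalYear] = CurrentFY
                    && 'Calendar'[Date] <= CurrentDate
            )
        )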
First, SSAS Tabular and Power BI use the same engine. So they are equally applicable.
And the ability to define measures that can be calculated across any slice of the data, using any of a large number of categorical attributes, is one of the main reasons why you want something like SSAS Tabular or Power BI in front of your SQL Server. (The others are caching, simplified end-user reporting, the ability to mash up data across sources, and custom security.)
Ideally, SQL Server should provide the Facts, along with single-column joins to any dimension tables, including a Date Dimension table. Power BI / SSAS Tabular would then layer on the DAX Measure definitions, the Filter Flow behavior and perhaps Row Level Security.
I have a requirement where the user wants an "All" option in a few fields.
1. Sites has around 20 records (includes an "All" option).
2. Cost Centers, which are dependent on 1. Sites, have around 540 records in total across all Sites; Sites may have different numbers of Cost Centers (includes an "All" option).
3. Employees, which are dependent on 2. Cost Centers, have around 29,000 records in total; each Cost Center may include a different number of Employees (includes an "All" option).
4. Processes, which are independent of all the above, have around 20 records (includes an "All" option).
Now Sites, Cost Centers, Employees, and Processes each have a dropdown with "All" alongside the other options.
How would I design the database table, considering the scenarios below?
The user selects the following:
Sites : Riyadh
Cost Centers : MA - Medical
Employees : All
Processes : Travel Request and Authorization
The user has gone for "All" in Cost Centers:
Sites : Jeddah
Cost Centers : All
Employees : All
Processes : All
Likewise, there are a few other combinations. Also, how should the user see the inserted records so that he/she can easily navigate to a record and update/delete it? Right now I was thinking of inserting a single record for the "All" option. For example, the user selects:
Sites : Riyadh
Cost Centers : Nursing
Employees : All
Processes : All
This would insert just one row in the database table.
But the user also has a requirement that if there are 200 Employees under the selected Cost Center, he may want to apply the record to only 70 of them; in that case he has to do more work.
How would the user edit the inserted records afterwards? And how should the view of all records be rendered so that editing a particular record is easy for the user?
Don't model "ALL" in your data, or you'll have to deal with people mis-assigning an employee to a cost center named "ALL" under a site named "ALL". You don't want that!
Sites have cost centers, cost centers have employees, and there are processes to which (I assume) employees may be assigned, implying a table that links employees to processes. Only store REAL data.
Then be smart in your queries, so that if the user selects "ALL" for a given drop-down they get all matching records, and make inserted data meet proper referential integrity: a cost center must belong to a valid site, and an employee must belong to a cost center and may be linked to one or more processes. A sketch of the query-time approach follows below.
But putting in "All" placeholder rows? You're opening yourself up to a world of hurt managing pseudo-relationships versus real relationships if you go down that route.
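A minimal sketch of resolving "All" at query time rather than storing placeholder rows; NULL parameters stand in for an "All" selection, and every name here is an illustrative assumption:

    DECLARE @SiteId       int = NULL;  -- NULL = "All" sites
    DECLARE @CostCenterId int = NULL;  -- NULL = "All" cost centers

    SELECT e.EmployeeId, e.EmployeeName
    FROM dbo.Employee AS e
    JOIN dbo.CostCenter AS cc ON cc.CostCenterId = e.CostCenterId
    WHERE (@SiteId IS NULL OR cc.SiteId = @SiteId)
      AND (@CostCenterId IS NULL OR e.CostCenterId = @CostCenterId);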
Actually, you have two relationships between Sites and Cost Centers (I'm narrowing it down to only those two entities). Both are optional, and one of them must be defined.
The first is an unproblematic zero-to-one relationship from Site to Cost Center, covering the case where the cost center is known and assigned for the Site.
The second relationship covers the case where no cost center is assigned and the cost must be "somehow allocated"; "ALL" may mean, say, that each cost center receives an equal share.
This split into two relationships makes the database design cleaner, but it does not address the main problem, which lies in querying the relation.
The problem manifests as OR conditions in join predicates (chasing both paths), which can lead to sub-optimal performance.
So this is the touchstone of your design: collect the main queries and check how they perform on sample data.
One possible approach to attacking performance problems would be to define materialized views that expand the "ALL" relationship to every Cost Center (as proposed by #Michael) and that can be refreshed when a new Cost Center is defined, so you don't have to handle such changes manually. A sketch follows below.
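A sketch of such an expansion view; the Request/CostCenter names are illustrative assumptions, and the UNION ALL form avoids the OR-in-join-predicate problem mentioned above. The result could be persisted (e.g. into a reporting table refreshed when cost centers change) rather than computed on every query:

    CREATE VIEW dbo.vRequestExpanded AS
    SELECT r.RequestId, r.SiteId, cc.CostCenterId
    FROM dbo.Request AS r
    JOIN dbo.CostCenter AS cc ON cc.SiteId = r.SiteId
    WHERE r.CostCenterId IS NULL              -- "ALL": fan out to every cost center of the site
    UNION ALL
    SELECT r.RequestId, r.SiteId, r.CostCenterId
    FROM dbo.Request AS r
    WHERE r.CostCenterId IS NOT NULL;         -- explicitly assigned cost center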
I've got a simple cube with a fact table that has a date field (among others), and I've connected it to a time dimension that has 2 hierarchies.
What I want to do is create one measure that will be filtered only by the first time hierarchy, and a second measure for the second time hierarchy.
Basically this:
Measure1 ----> Cannot be affected by filtering of time_hierarchy2 and gets filtered only by time_hierarchy1
And the same for Measure2.
With what I've tried so far I can't do this, because whenever I add a time hierarchy to the cube browser filter area it affects both measures, while I want them to be independent.
Is this possible?
The idea is to create two instances (i.e. cube dimensions) of your database dimension and put one hierarchy in each of them. This concept is also known as a role-playing dimension.
You can then add filters using these role-playing dimensions to filter your Measures.
With your current data model as described, this is not possible. Within Analysis Services, if you review the Dimension Usage tab you will see the dimension-to-measure-group usage. With a single measure-to-dimension relationship, the measure will be affected by all attributes/hierarchies of the related dimension when browsing the cube.
If having a separate TimeKey in your fact is a viable option, you may establish a role-playing dimension and have multiple constraints from your fact to the Time dimension.
Another option could be similar to what I did recently: splitting this setup into multiple facts, each with a single reference to the Time dimension, so that I could plot separate measures on the same graph with the same time axis. See: How to avoid Role Playing Dimension
In SQL Server 2008+, we'd like to enable tracking of historical changes to a "Customers" table in an operational database.
It's a new table and our app controls all writing to the database, so we don't need evil hacks like triggers. Instead we will build the change tracking into our business object layer, but we need to figure out the right database schema to use.
The number of rows will be under 100,000 and number of changes per record will average 1.5 per year.
There are at least two ways we've been looking at modelling this:
As a Type 2 Slowly Changing Dimension table called CustomersHistory, with columns for EffectiveStartDate, EffectiveEndDate (set to NULL for the current version of the customer), and auditing columns like ChangeReason and ChangedByUsername. Then we'd build a Customers view over that table which is filtered to EffectiveEndDate=NULL. Most parts of our app would query using that view, and only parts that need to be history-aware would query the underlying table. For performance, we could materialize the view and/or add a filtered index on EffectiveEndDate=NULL. (A sketch of this schema appears after this list.)
With a separate audit table. Every change to a Customer record writes once to the Customer table and again to a CustomerHistory audit table.
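For concreteness, a minimal sketch of option 1's schema; the column names beyond those mentioned above are illustrative assumptions:

    CREATE TABLE dbo.CustomersHistory (
        CustomerHistoryId  int IDENTITY(1,1) PRIMARY KEY,
        CustomerId         int           NOT NULL,
        CustomerName       nvarchar(200) NOT NULL,
        EffectiveStartDate datetime2     NOT NULL,
        EffectiveEndDate   datetime2     NULL,          -- NULL = current version
        ChangeReason       nvarchar(400) NULL,
        ChangedByUsername  sysname       NOT NULL
    );
    GO
    CREATE VIEW dbo.Customers AS
    SELECT CustomerId, CustomerName, EffectiveStartDate
    FROM dbo.CustomersHistory
    WHERE EffectiveEndDate IS NULL;                     -- current versions only
    GO
    -- Filtered index (SQL Server 2008+) keeps current-version lookups fast
    -- and guarantees one current row per customer.
    CREATE UNIQUE INDEX IX_CustomersHistory_Current
        ON dbo.CustomersHistory (CustomerId)
        WHERE EffectiveEndDate IS NULL;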
From a quick review of StackOverflow questions, #2 seems to be much more popular. But is this because most DB apps have to deal with legacy and rogue writers?
Given that we're starting from a blank slate, what are the pros and cons of each approach? Which would you recommend?
In general, the issue with SCD Type 2 is that if the average number of changes to the attribute values is very high, you end up with a very fat dimension table. This growing dimension table, joined with a huge fact table, gradually slows down query performance. It's like slow poisoning: initially you don't see the impact, and when you realize it, it's too late!
Now, I understand that you will create a separate materialized view filtered to EffectiveEndDate = NULL and that it will be used in most of your joins. Additionally, your data volume is comparatively low (100,000 rows). With an average of only 1.5 changes per year, I don't think data volume or query performance is going to be your problem in the near future.
In other words, your table is truly a slowly changing dimension (as opposed to a rapidly changing dimension, where your option #2 is a better fit). In your case, I would prefer option #1.