I have the following scenario:
Fact table A linked to Dimensions D1, D2, D3, D4, D5
Fact table B linked to Dimensions D1, D2, D3
I want D4 to be linked to Fact B as well. I can use Fact A for this: Fact A would serve as the intermediate table of a many-to-many (M2M) relationship.
Is using an existing fact table as the M2M intermediate good practice?
Also, in SSAS you do not specify which dimensions will be linked when using M2M. Does this mean that I would have to link both D4 and D5? And what happens to D1, D2, and D3? Are they linked again?
It is totally fine to have a many-to-many relationship table containing facts. In fact, for currency conversions, this is a standard case, where the exchange rate is the fact in the table relating time and possibly transaction currency to the target currency.
And you do configure the many-to-many relationship for each dimension individually: on the "Dimension Usage" tab of Cube Designer, in the cell at the row for dimension D4 and the column for measure group B, you specify that the relationship is many-to-many via measure group A. If you set the cell in the same column for dimension D5 to "No Relationship" (shown gray), D5 will not be related to the measures of measure group B.
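To make the effect of such a relationship more concrete, here is a minimal Python sketch of how the measures of B could be resolved by D4 through A. It is a simplification for illustration, not SSAS's actual implementation: the key names `d1` to `d5`, the measure `amount`, and all rows are made up, and the shared dimensions D1, D2, D3 are assumed to be the join path between the two fact tables.

```python
# Hypothetical rows; key names, the "amount" measure and all values are made up.
fact_a = [
    {"d1": 1, "d2": 1, "d3": 1, "d4": "x", "d5": "p"},
    {"d1": 1, "d2": 1, "d3": 1, "d4": "y", "d5": "p"},
    {"d1": 2, "d2": 1, "d3": 1, "d4": "x", "d5": "q"},
    {"d1": 2, "d2": 1, "d3": 1, "d4": "x", "d5": "r"},
]
fact_b = [
    {"d1": 1, "d2": 1, "d3": 1, "amount": 10},
    {"d1": 2, "d2": 1, "d3": 1, "amount": 5},
    {"d1": 3, "d2": 1, "d3": 1, "amount": 7},
]

# Dimensions that both fact tables are linked to (assumed join path).
SHARED = ("d1", "d2", "d3")


def b_amount_by_d4(fact_a, fact_b):
    """Sum B's amount per D4 member, going through A as the intermediate table."""
    # For each D4 member, collect the distinct shared-key combinations seen in A.
    combos = {}
    for row in fact_a:
        key = tuple(row[k] for k in SHARED)
        combos.setdefault(row["d4"], set()).add(key)

    # Each B row is counted at most once per D4 member, even if A has
    # several rows for the same combination (e.g. one per D5 member).
    totals = {}
    for d4, keys in combos.items():
        totals[d4] = sum(
            r["amount"] for r in fact_b if tuple(r[k] for k in SHARED) in keys
        )
    return totals


totals = b_amount_by_d4(fact_a, fact_b)
print(totals)
```

In this sketch D5 plays no role in the result, which mirrors leaving D5 at "No Relationship" for measure group B. Note also that the per-member values need not add up to the grand total of B, because a B row can be reached from several D4 members or from none.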
First, I will explain my problem using a real-life example.
Let's say that we are a company selling different means of transport, e.g. cars, buses, trucks, trains, planes, etc.
Let's say that we have around 10,000,000 different items, with daily changes.
Each item has a unique name (e.g. car Audi A8 X or plane Boeing 747-200B Y), where X and Y are unique values.
Don't worry about the naming, because it works just fine.
Each item also has some special data that depends on its type. For cars, e.g.: dimensions (length, width, height, ...), powertrain, etc. For planes, e.g.: length, interior width, wingspan, wing area, wing sweep, etc.
And now the problem: I would like to put all this data, which currently lives in different Excel files and on paper, into a database.
Question 1: Which database model is better?
Idea #1: I create one table, called items, that stores only the name of each product we sell (e.g. car Audi A8 X, plane Boeing 747-200B Y, etc.). Then, in other tables (car, plane, train, ...), I store the extra data for cars, planes, and trains.
So to get all the data for a car, I would have to check the car table; to get all the data for a train, I would have to check the train table.
Idea #2: I create one table that stores all item names (items, just like in Idea #1), and then an additional pivot table (e.g. data, with the fields item, key, value) where all the other information can be found.
Question 2: I need a history of all data. With Idea #1, I would have to duplicate a whole row in, e.g., the car table just because one field changed. With Idea #2, every row in the pivot table data would need information on whether it is valid (or when it was valid).
Can you please help me? I have no idea which model is better or what is actually used in production. Also, is there a good book about storing historical data in a database?
Thanks!
You present two problems to us. The first is organizing specialized data about subtypes (cars, buses, trucks, etc.). The second is dealing with temporal (historical) data.
Your Idea #1 resembles a design pattern known as "Class Table Inheritance". If you search for this phrase, you will find many articles outlining exactly how it works. These will largely confirm your initial approach and add a lot of helpful detail. You will also find numerous references to previous Q&A entries on this site and on the DBA site.
For an alternative design, look up "Single Table Inheritance". This stores everything in a single wide table, with NULLs in the columns that don't apply to the row at hand.
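As a rough illustration of Class Table Inheritance, the following sketch builds the layout in an in-memory SQLite database from Python. The table and column names (`items`, `cars`, `planes`, `length_mm`, and so on) and the inserted values are placeholders chosen for this example, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supertype table: one row per item, whatever its type.
CREATE TABLE items (
    item_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    item_type TEXT NOT NULL
);
-- Subtype tables share the primary key of the supertype.
CREATE TABLE cars (
    item_id    INTEGER PRIMARY KEY REFERENCES items(item_id),
    length_mm  INTEGER,
    powertrain TEXT
);
CREATE TABLE planes (
    item_id      INTEGER PRIMARY KEY REFERENCES items(item_id),
    wingspan_m   REAL,
    wing_area_m2 REAL
);
""")

# Placeholder values, not real specifications.
conn.execute("INSERT INTO items VALUES (1, 'car Audi A8 X', 'car')")
conn.execute("INSERT INTO cars VALUES (1, 5000, 'petrol')")
conn.execute("INSERT INTO items VALUES (2, 'plane Boeing 747-200B Y', 'plane')")
conn.execute("INSERT INTO planes VALUES (2, 60.0, 500.0)")

# All data for cars: join the supertype to the matching subtype table.
car_rows = conn.execute("""
    SELECT i.name, c.length_mm, c.powertrain
    FROM items AS i
    JOIN cars AS c ON c.item_id = i.item_id
""").fetchall()
print(car_rows)
```

With Single Table Inheritance, the columns of `cars` and `planes` would instead all live in `items`, and the plane columns would simply be NULL on car rows.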
I am not sure what you mean by storing something in a pivot table. I'm familiar with pivot tables in Excel, but I have always used them as results calculated from ordinary tables where the data is stored.
How to deal with historical data is a separate issue.
I have a DimPerson table and a DimPersonDecileOutrigger table, which stores decile data. The outrigger is structured so that a customer is given a decile for the current year (TY) and the previous year (LY) if they bought in that period. This means a customer might have a TY decile but no LY decile, or vice versa; some customers have both.
In SSAS, when I picked the columns in the Dimension Structure tab, I initially picked only columns from DimPerson and none from the outrigger. The browser then showed all the IDs, starting from 1. But once I dragged in some columns from the outrigger, the browser no longer showed all PersonIDs. I want to see all customers, regardless of whether they have a decile.
A picture is attached to show what it looks like in the Dimension Structure tab. The relationship is between OutriggerID in the outrigger (primary key) and OutriggerID in DimPerson (foreign key).
If you just want to solve the problem, you can create a View in your underlying relational database that uses LEFT OUTER JOIN to link the two tables, so that the view will return all rows from DimPerson, even if they don't have a Decile.
Then use the view as the source for your dimension instead of the tables.
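A minimal sketch of such a view, run against an in-memory SQLite database from Python so it can be tried without a server. The view name `vDimPerson` and the columns `DecileTY` and `DecileLY` are assumptions made for the example; only the table names and `OutriggerID` come from the question, and the syntax may need small adjustments for your actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimPersonDecileOutrigger (
    OutriggerID INTEGER PRIMARY KEY,
    DecileTY    INTEGER,   -- assumed column: this year's decile
    DecileLY    INTEGER    -- assumed column: last year's decile
);
CREATE TABLE DimPerson (
    PersonID    INTEGER PRIMARY KEY,
    OutriggerID INTEGER REFERENCES DimPersonDecileOutrigger(OutriggerID)
);
INSERT INTO DimPersonDecileOutrigger VALUES (10, 3, NULL);
INSERT INTO DimPerson VALUES (1, 10), (2, NULL), (3, NULL);

-- Hypothetical view: keeps every person, with NULL deciles where none exist.
CREATE VIEW vDimPerson AS
SELECT p.PersonID, o.DecileTY, o.DecileLY
FROM DimPerson AS p
LEFT OUTER JOIN DimPersonDecileOutrigger AS o
    ON o.OutriggerID = p.OutriggerID;
""")

view_rows = conn.execute(
    "SELECT * FROM vDimPerson ORDER BY PersonID"
).fetchall()

# For comparison: an inner join drops the persons without an outrigger row.
inner_rows = conn.execute("""
    SELECT p.PersonID
    FROM DimPerson AS p
    JOIN DimPersonDecileOutrigger AS o ON o.OutriggerID = p.OutriggerID
""").fetchall()
print(view_rows, inner_rows)
```

The comparison query shows the likely cause of the missing IDs: an inner join between the two tables only returns persons that have a matching outrigger row.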
I am reading a Database Normalization tutorial and I am having difficulty understanding the following:
Functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must also have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally determines Y.
What do the above two refer to? What is meant by "Functionally Determines"?
I can have two tuples where A1, A2, A3 are the same but B1, B2, B3 are different.
A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A -> B which would be the same as stating "B is functionally dependent upon A."
In a table listing employee characteristics including Social Security Number (SSN) and name, it can be said that name is functionally dependent upon SSN (or SSN -> name), because an employee's name can be uniquely determined from their SSN. However, the reverse statement (name -> SSN) is not true, because more than one employee can have the same name but different SSNs.
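One way to read the definition is as a check you could run against the rows of a table. The sketch below is only an illustration: the function name `holds` and the sample employees are made up. Keep in mind that a functional dependency is a rule about every valid state of the table, so sample data can show that a dependency is violated but cannot prove that it always holds.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later, different value for the same key violates lhs -> rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up sample data: two different employees share the name "Ann".
employees = [
    {"ssn": "000-00-0001", "name": "Ann"},
    {"ssn": "000-00-0002", "name": "Ann"},
    {"ssn": "000-00-0003", "name": "Bob"},
]

ssn_determines_name = holds(employees, ["ssn"], ["name"])
name_determines_ssn = holds(employees, ["name"], ["ssn"])
print(ssn_determines_name, name_determines_ssn)
```

If a table does contain two tuples with equal A values but different B values, that simply means the dependency A -> B does not hold for that table.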
I have a fact table with the column Product Model and the measure Installed Base (which measure is not important; any other would behave the same).
Then I have a dimension table, and here comes the tricky part:
I have many product models, and each belongs to higher-level custom groups (four levels in total).
The same product model may belong to more than one custom group X, and a custom group X may belong to more than one custom group X-1.
For example, here are hierarchies from the highest level (CG1) to the lowest (PM):
XXX=>YYY=>ZZZ=>WWW
LLL=>MMM=>QQQ=>WWW
RRR=>PPP=>QQQ=>TTT
You can see that the lowest-level PM WWW belongs to two different custom groups 3 (QQQ and ZZZ), whereas custom group 3 QQQ belongs to two different custom groups 2 (MMM and PPP in this case).
I tried to model this via hierarchies in SSAS, but either I got wrong measure results (IB is summed up without aggregation, and the results are the same for all custom groups) or some custom groups were missing from my hierarchy.
You can solve this using a many-to-many relationship. Create a dimension table with four columns for the four custom group levels, as well as a custom_group_id column. Populate this table with all combinations of custom groups that appear in your data. Then build a bridge table (also known as a factless fact table) with the two columns product_model and custom_group_id, and insert one row for each group combination that a product model belongs to.
Then, in BIDS, create a dimension from the custom group table and a measure group from your bridge table, using a count as its only measure, and make this measure invisible. Finally, on the "Dimension Usage" tab of Cube Designer, configure the relationship between the main measure group and the custom group dimension to be many-to-many via the bridge measure group.
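The intended result can be sketched in plain Python using the three example paths from the question. This only illustrates the aggregation behaviour one would expect from the bridge table, under the assumption that each product model counts once per group member; it is not how SSAS computes it internally. The installed base figures are made up, and only the three group levels shown in the example are modelled.

```python
# Custom group dimension: custom_group_id -> (CG1, CG2, CG3), from the example.
custom_groups = {
    1: ("XXX", "YYY", "ZZZ"),
    2: ("LLL", "MMM", "QQQ"),
    3: ("RRR", "PPP", "QQQ"),
}

# Bridge table: (product_model, custom_group_id).
bridge = [("WWW", 1), ("WWW", 2), ("TTT", 3)]

# Main fact table, reduced to one made-up measure value per product model.
installed_base = {"WWW": 100, "TTT": 40}


def total_by_level(level):
    """Sum installed base per member of one group level (0=CG1, 1=CG2, 2=CG3)."""
    models_per_member = {}
    for model, group_id in bridge:
        member = custom_groups[group_id][level]
        # A set ensures a model is counted once per member, however many
        # bridge rows lead to it.
        models_per_member.setdefault(member, set()).add(model)
    return {
        member: sum(installed_base[m] for m in models)
        for member, models in models_per_member.items()
    }


by_cg3 = total_by_level(2)
by_cg1 = total_by_level(0)
grand_total = sum(installed_base.values())
print(by_cg3, by_cg1, grand_total)
```

WWW contributes to both ZZZ and QQQ, so the members of a level can add up to more than the grand total, while the grand total itself still counts each product model only once.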
I was hoping someone could explain the appropriate use of the 'Fact' relationship type on the Dimension Usage tab. Is it simply for creating a dimension out of your fact table, to access attributes on the fact table itself?
Thanks in advance!
Yes, if your fact table has attributes that you would like to slice by (create a dimension from), you would use this relationship type.
Functionally, to the users it behaves no differently than a regular relationship.
After you create your dimensions and cubes you need to define how each dimension is related to each measure group. A measure group is a set of measures exposed by a single fact table.
Each cube can contain multiple fact tables and multiple dimensions. However, not every dimension will be related to every fact table.
To define relationships, right-click the cube in BIDS and choose Open; then navigate to the Dimension Usage tab. If you click the ellipsis button next to each dimension, you will see a screen that allows you to change the dimension usage for a particular measure group. You can choose from the following options:
Regular: default option; the dimension is joined directly to the fact table.
No relationship: the dimension is not related to the current measure group.
Fact: the dimension and fact are derived from a single table. If this is the case, your dimensional warehouse has poor design and isn't likely to perform well. Consider separating fact and dimension tables.
Referenced: the dimension is joined to an intermediate table prior to being joined to the fact table. A referenced relationship resembles a snowflake dimension, but is slightly different. Suppose you have a customer dimension and a sales fact; you'd like to examine total sales by customer, but you also want to examine line item sales by customer. Instead of duplicating the customer key in the line item fact table, you can treat the sales fact as an intermediate table to join customer to line item.
Many-to-many: this option involves two fact tables and two dimension tables. Dimension A is joined to an intermediate fact A, which in turn joins to dimension B, to which fact B is joined. Much like with the fact option, if you need to use the many-to-many option, your design could probably use some improvement. This type of relationship is sometimes necessary if you are building cubes on top of a relational database that is in 3rd normal form. It is strongly advisable to use a dimensional model with a star schema for all cubes. For example, you could have two fact tables: vehicles and options; each vehicle can come with a number of options. You're likely to examine vehicle sales by customer, and options by the items that are included in each option. Therefore you would have a customer dimension and an item dimension. You could also want to examine vehicle sales by included item. If so, the vehicle fact would be joined to the options fact and the customer dimension; the options fact would also join to the item dimension.
Data mining: the target dimension is based on a mining model which is built from a source dimension. Both the source dimension and the target dimension must be included in the cube.