I have a fact table with foreign key relationships to 4 other tables, so I have four dimensions and 1 measure. I need to combine all 4 dimensions into 1. Can anyone suggest how to do this?
Thanks
I just finished with a project optimizing processing time. They had a dimension that used the fact key as the dimension key. The fact and dimension each had 300 million rows. The hardware was good and it took 3 hours to process the dimension and another hour to process the fact partitions. Dimensions can't be partitioned in Multidimensional.
We took apart that 300 million row dimension and built separate dimensions, organized logically by entity. Now processing takes about an hour for everything.
All that is to say that you can create a dimension with the same number of rows as your fact table, but processing performance will suffer. I would only combine unrelated dimensions if held down and tortured by the business. Let business users dictate the business need, not the implementation details.
That being said, with under a million rows you will probably be fine either way. Your question was how to combine them. One way is to use all 4 dimension key columns in the key of your dimension. Your combined dimension would have one key attribute, and its KeyColumns property would list all 4 key columns from your fact table. Then add the dimension attribute columns from all 4 dimension tables to that one SSAS dimension. The relationships defined in the DSV make this possible.
Alternately if your fact table has an identity column you can use that as the key.
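To make the composite-key idea concrete, here is a minimal sketch in plain Python rather than SSAS. It builds one combined dimension row per distinct combination of the four keys found in the fact rows. The table contents, key values, and names such as `build_combined_dimension` are invented for illustration and are not from the original poster's schema.

```python
# Hypothetical fact rows: (region_key, product_key, channel_key, status_key, amount)
fact_rows = [
    (1, 10, 100, 1000, 5.0),
    (1, 10, 100, 1000, 7.5),
    (2, 10, 101, 1000, 3.0),
]

# Hypothetical lookups standing in for the four dimension tables.
region = {1: "North", 2: "South"}
product = {10: "Widget"}
channel = {100: "Web", 101: "Store"}
status = {1000: "Open"}


def build_combined_dimension(rows):
    """Return one dimension row per distinct combination of the four keys."""
    combined = {}
    for r_key, p_key, c_key, s_key, _amount in rows:
        composite_key = (r_key, p_key, c_key, s_key)
        if composite_key not in combined:
            # Attributes from all four source dimensions land on one row.
            combined[composite_key] = {
                "region": region[r_key],
                "product": product[p_key],
                "channel": channel[c_key],
                "status": status[s_key],
            }
    return combined


combined_dim = build_combined_dimension(fact_rows)
print(len(combined_dim))
```

The combined dimension can grow toward the row count of the fact table when the key combinations are mostly distinct, which is the processing cost described above.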
I'm building a data warehouse, and the data is of a quality where 8 fields may be required to uniquely identify a record, and this applies to three tables, each of which will have a few million rows of data per year. It's all 0NF.
Obviously every situation is unique, but considering that the purpose of the data warehouse is for OLAP, am I right in thinking that I would be better to create a single column to use as the primary key rather than a composite primary key of 8 separate fields? It's straightforward to concatenate the fields into an extra column as part of the ETL pipeline.
I appreciate the redundancy increases the storage requirement, and we are talking millions of rows a year, but I'm guessing it'll significantly improve query performance? And reduce memory requirements if the data is modelled in a BI tool?
Can anybody give me any general thoughts or advice on this please?
Below is some entirely made-up simulated data. I need to link the order table to the shipment table to get where the order was shipped from, for example, or maybe the order table to the shipment table to sum the quantity shipped.
I don't think normalising the tables is the way to go, as all four of the columns I'm using here would be subject to change, and only combined they form a reliable key for a unique shipment.
Because of this the data is bulk deleted/inserted based on shift date.
Thanks!
Phil.
Those look like fact tables. In a dimensional model only dimension tables need single-column keys. Fact tables typically have compound keys made up of the dimension foreign keys that define the fact table grain.
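As a rough illustration of joining two fact-like tables on a compound key without adding a concatenated column, here is a sketch using Python's built-in `sqlite3` module. The column names (`shift_date`, `site`, `order_no`, `line_no`) and all data are assumptions made up for the example; the original post does not name its key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical four-column compound key shared by both tables.
    CREATE TABLE orders (
        shift_date TEXT, site TEXT, order_no TEXT, line_no INTEGER,
        qty_ordered INTEGER,
        PRIMARY KEY (shift_date, site, order_no, line_no)
    );
    CREATE TABLE shipments (
        shift_date TEXT, site TEXT, order_no TEXT, line_no INTEGER,
        shipped_from TEXT, qty_shipped INTEGER
    );
    INSERT INTO orders VALUES
        ('2024-01-01', 'A', '1001', 1, 10),
        ('2024-01-01', 'A', '1001', 2, 4);
    INSERT INTO shipments VALUES
        ('2024-01-01', 'A', '1001', 1, 'Dock 1', 6),
        ('2024-01-01', 'A', '1001', 1, 'Dock 2', 4),
        ('2024-01-01', 'A', '1001', 2, 'Dock 1', 4);
""")

# Join on every column of the compound key, then sum the shipped quantity.
shipped_totals = conn.execute("""
    SELECT o.order_no, o.line_no, SUM(s.qty_shipped) AS qty_shipped
    FROM orders AS o
    JOIN shipments AS s
      ON s.shift_date = o.shift_date
     AND s.site = o.site
     AND s.order_no = o.order_no
     AND s.line_no = o.line_no
    GROUP BY o.order_no, o.line_no
    ORDER BY o.order_no, o.line_no
""").fetchall()
print(shipped_totals)
```

Whether a concatenated single-column key performs better than this depends on the engine and the BI tool, so it is worth measuring on real data rather than assuming.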
I am modelling a cube in SSAS. The cube has around 20 dimensions and 6 fact tables. Some of the dimensions are shared among the fact tables, e.g. the Time dimension. Fact_PNL has 3 date columns, for which we have 3 role-playing dimensions in the Dimension Usage tab. Another fact table has 5 date columns, and for those we also have separate role-playing dimensions in the Dimension Usage tab. We have a common dimension, Company, which is a foreign key in all fact tables. We might need to combine the data from multiple facts to get the final output.
Should I create separate role-playing dimensions for each of the 6 fact tables, or use the same dimension for all fact tables?
Should role-playing dimensions be created when multiple columns point to the same dimension?
It's up to you. If the role playing dimension plays the same logical role for each fact table, then I would use the same RPD for the same logical role in each fact table. But if you want to use separate ones for each fact table, maybe because you think in the future they might be used differently, then you can.
In short, either way works fine, so whatever makes the most intuitive sense to you and other users is the way you should go.
Yes, that is the purpose of role-playing dimensions: they are used when two or more columns in the same fact table reference the same dimension.
I have a cube with 3 fact tables and 20 + dimensions that relate easily to all 3 fact tables and everything works fine except for the fact that one of the dimensions (Warehouse) is only related to 2 of the 3 fact tables. My problem I guess is a display issue. When the user is viewing measures from all 3 fact tables then drags over the Warehouse dimension, it simply repeats the grand total of the measure in the 3rd fact table for every possible value of Warehouse. This certainly makes sense to me as there is no relationship set up and it's conceptually behaving almost like a cross-join. Nonetheless, it's confusing to users and I'd like to not have the grand total duplicated for each dimension member in Warehouse. I was thinking one solution was to create a dummy warehouse called "Not Applicable" and then relate every row in the 3rd fact table to that dimension member. I was hoping there's just a setting in SSAS where I could control this behavior so I didn't have to create any new warehouse values. Is there a standard way to handle non-related dimensions with multiple fact tables? Thanks in advance.
You can use the "IgnoreUnrelatedDimensions" property of the measure group not related to Warehouse: set it from the default value true to false. Then, measure values for this measure group will only be shown for the "All" members from the warehouse dimension, and the cells will be null (empty) for non-All members of this dimension.
This is a global setting per measure group; you cannot configure it individually per dimension and measure group. But for your purpose, this should be fine.
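The difference between the two settings can be pictured with a toy model in Python. This is not SSAS code or an SSAS API; `cell_value` is a hypothetical function that only mimics the behavior described above for a measure group with no relationship to Warehouse.

```python
def cell_value(grand_total, warehouse_member, ignore_unrelated_dimensions):
    """Toy model of a cell from a measure group unrelated to Warehouse."""
    if warehouse_member == "All":
        # The All member shows the total under either setting.
        return grand_total
    if ignore_unrelated_dimensions:
        # Default (true): the unrelated dimension is ignored, so the
        # grand total repeats for every warehouse member.
        return grand_total
    # false: cells for non-All members of the unrelated dimension are empty.
    return None


for member in ("All", "Warehouse A", "Warehouse B"):
    print(member, cell_value(500, member, True), cell_value(500, member, False))
```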
I have designed place-related warehouse tables: DimPlaces, FactPlaces, and DimGeography. It is a straightforward design. All the locations are in DimPlaces (Addrline1, Addrline2, placename, etc.) and the geography hierarchy is in DimGeography (City, State, Country, PostCode). FactPlaces is the table that has foreign keys to DimPlaces and DimGeography.
I would like to maintain historical data, because place names or their properties might change, and if the location of a place changes then its geographic hierarchy key changes too.
I have found this design pattern:
Another useful design pattern is to add the durable account key to the fact table in addition to the dimension’s surrogate key. This joins back to the current rows in the dimension to make it easier to report all of history by the current dimension attributes.
Could you please tell me whether it is OK to follow this solution? If yes, do I need to use a key of type UNIQUEIDENTIFIER for a unique value?
Another question on this: I have employee data (DimEmployee and FactEmployee). Each employee is associated with the places where he works. How do I connect these employee tables with the places tables? Do I need to connect FactEmployee with FactPlaces?
I think in the first instance, they're referring to business keys? So if your dimension table has two rows, surrogate key 1 & 2, but they both refer to the same thing, so both have AccountId/ProductId/WhateverId of 1, then you will have some fact table rows with surrogate key 1 and business key 1, and later ones with surrogate key 2 and business key 1.
Uniqueidentifiers are very wide; try to avoid using them on fact tables and for joins if possible.
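Here is a small sketch of the durable-key pattern quoted in the question, using Python's built-in `sqlite3` module. The fact table carries both the surrogate key (`place_sk`) and an integer durable/business key (`place_id`). The table names, the `is_current` flag, and the data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two versions of the same place: same durable key, different surrogate keys.
    CREATE TABLE dim_place (
        place_sk INTEGER PRIMARY KEY,
        place_id INTEGER,
        place_name TEXT,
        is_current INTEGER
    );
    CREATE TABLE fact_places (place_sk INTEGER, place_id INTEGER, visits INTEGER);
    INSERT INTO dim_place VALUES (1, 500, 'Old Mill', 0), (2, 500, 'Mill Lofts', 1);
    INSERT INTO fact_places VALUES (1, 500, 3), (2, 500, 4);
""")

# History as recorded at the time: join on the surrogate key.
as_recorded = conn.execute("""
    SELECT d.place_name, SUM(f.visits)
    FROM fact_places AS f
    JOIN dim_place AS d ON d.place_sk = f.place_sk
    GROUP BY d.place_name
    ORDER BY d.place_name
""").fetchall()

# All of history by current attributes: join on the durable key to the current row.
by_current = conn.execute("""
    SELECT d.place_name, SUM(f.visits)
    FROM fact_places AS f
    JOIN dim_place AS d ON d.place_id = f.place_id AND d.is_current = 1
    GROUP BY d.place_name
""").fetchall()

print(as_recorded)
print(by_current)
```

A plain integer is enough for the durable key in this sketch, which fits the advice to avoid wide uniqueidentifier columns on fact tables.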
For your last question - That's really more a reporting thing. Do you need to do that? Is that what people need to see, do they need to slice by that? You could consider a referenced dimension - Where the places table links to the fact tables via a placeId on the employees dimension. Or, you could have a factemployees table with start and stop dates. It depends on what you need to achieve.
I have designed a fact table that stores the facts for a specific date dimension and an action type such as create, update or cancelled. The facts can be created and cancelled only once, but updated many times.
myfact
---------------
date_key
location_key
action_type_key
This will allow me to get a count for all the updates done, all the new ones created for a period and specify a specific region through the location dimension.
In addition, I also have 2 counts for each fact: Number of People and Number of Buildings. There is no relation between these. I would like to query how many of the facts have a specific count, such as how many have 10 buildings, how many have 9, etc.
What would be the best table design for these. Basically I see the following options, but am open to hear better solutions.
1. Add the counts as reference info in the fact table as people_count and building_count.
2. Add a dimension for each of these that stores the valid options, i.e. a people dimension that stores a key and a count, and a building dimension that stores a key and a count. The main fact will have a people_key and a building_key.
3. Add one dimension for the count that is used for both people and building counts, i.e. a count dimension that stores a key and a generic count. The main fact will have a people_count_key and a building_count_key.
First, your counts are essentially "dimensions" in the purest sense (you can think of dimensions as a way to group records for reporting purposes). The question, though, is whether dimensional modeling is what you want to do. I think you are better off seeing this as something of an implicit dimension than adding dimension tables. Essentially, dimension tables add nothing here, and they create corner cases for errors that I just don't think are worth it unless you need to track a lot of information related to the numbers.
If it were me I would just add the counts to the fact table, not to other tables.
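As a sketch of what that looks like in practice, the example below uses Python's built-in `sqlite3` module with the counts stored directly on the fact table and grouped on as an implicit dimension. The column names `people_count` and `building_count` come from the question; the data and key values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE myfact (
        date_key INTEGER,
        location_key INTEGER,
        action_type_key INTEGER,
        people_count INTEGER,
        building_count INTEGER
    )
""")
conn.executemany(
    "INSERT INTO myfact VALUES (?, ?, ?, ?, ?)",
    [
        (20240101, 1, 1, 4, 10),
        (20240101, 1, 2, 4, 10),
        (20240102, 2, 1, 7, 9),
        (20240102, 2, 1, 2, 10),
    ],
)

# Group directly on the count column: how many fact rows have each building count.
rows_per_building_count = conn.execute("""
    SELECT building_count, COUNT(*) AS fact_rows
    FROM myfact
    GROUP BY building_count
    ORDER BY building_count DESC
""").fetchall()
print(rows_per_building_count)
```

This counts fact rows, so a fact with several update rows is counted once per row; filtering on `action_type_key` would be one way to narrow that, depending on what the report should show.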