I'm new to this topic. I have a database with a flat fact table that contains data such as date, product group, product subgroup, actual product name, and some calculations/statistics. All I need to do is create a report using an OLAP cube. I have two ideas for how to build it, but I don't know which design is better (or whether either is even correct). The original DAILY_REPORT... table has no primary key; it's just a data table. In the first concept I created every table that will serve as a dimension with an ID, and connected product -> product family -> project -> building in a hierarchy. The other concept has no IDs and no hierarchy; the relationships are created automatically based on names. Can somebody explain which direction I should take?
First idea:
http://imgur.com/iKNfAXF
Second:
http://imgur.com/IZjW1W6
Thanks in advance!
You can follow these steps to create your cube:
Create a separate view for each of the dimensions you want to have. Group similar types of data in one view, e.g. Product Name, Product Group, Product Sub-Group, etc.
Keep the data in each dimension view DISTINCT, e.g. SELECT DISTINCT [Product Name], [Product Group], [Product Sub-Group] FROM TABLE
Keep an 'ID' column in each dimension view, e.g. Product ID in the Product view.
Create a view for your fact. Include the 'ID' column of each dimension in your fact view. This lets you create relationships on the 'ID' columns, which will be a lot faster than relationships built on names.
For creating hierarchies from dimension attributes, SSAS provides drag-and-drop functionality.
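The steps above can be sketched in T-SQL roughly as follows. This is a minimal sketch, not a definitive implementation: the table dbo.DailyReport, its columns (ReportDate, ProductGroup, ProductSubGroup, ProductName, Quantity) and the view names are all hypothetical, and generating the ID with ROW_NUMBER() is just one possible way to produce it.

```sql
-- Hypothetical names throughout; adjust to the real schema.
-- Dimension view: one row per distinct product, with a generated ID.
CREATE VIEW dbo.v_DimProduct AS
SELECT
    ROW_NUMBER() OVER (ORDER BY d.ProductGroup, d.ProductSubGroup, d.ProductName) AS ProductID,
    d.ProductGroup,
    d.ProductSubGroup,
    d.ProductName
FROM (
    SELECT DISTINCT ProductGroup, ProductSubGroup, ProductName
    FROM dbo.DailyReport
) AS d;
GO  -- batch separator (SSMS/sqlcmd); CREATE VIEW must start its own batch

-- Fact view: looks up the generated ProductID by matching on the names.
-- Rows with NULL in any of the three name columns would be dropped by this
-- join; handle them separately if the source data can contain NULLs.
CREATE VIEW dbo.v_FactDailyReport AS
SELECT
    p.ProductID,
    f.ReportDate,
    f.Quantity
FROM dbo.DailyReport AS f
JOIN dbo.v_DimProduct AS p
    ON  p.ProductGroup    = f.ProductGroup
    AND p.ProductSubGroup = f.ProductSubGroup
    AND p.ProductName     = f.ProductName;
```

Because the IDs are recomputed every time the view is queried, they are only consistent as long as the underlying data does not change between processing the dimension and the measure group. Persisting the dimension into a table with a real key is the sturdier option if that is a concern.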
If you need more details let me know.
You could construct the dimensions you need from views based on distinct queries (i.e. SELECT DISTINCT) over the source data. These can be used to populate the dimensions.
You can make a synthetic date dimension fairly easily.
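One way to build such a date dimension in T-SQL is a recursive CTE that generates one row per day. This is only a sketch: the table name dbo.DimDate, the column names and the date range are assumptions for illustration.

```sql
-- Hypothetical table name, columns and date range.
WITH Dates AS (
    SELECT CAST('2020-01-01' AS date) AS DateValue
    UNION ALL
    SELECT DATEADD(DAY, 1, DateValue)
    FROM Dates
    WHERE DateValue < '2020-12-31'
)
SELECT
    CONVERT(int, CONVERT(char(8), DateValue, 112)) AS DateKey,  -- yyyymmdd as an integer
    DateValue,
    YEAR(DateValue)              AS CalendarYear,
    DATEPART(QUARTER, DateValue) AS CalendarQuarter,
    MONTH(DateValue)             AS CalendarMonth,
    DATENAME(MONTH, DateValue)   AS MonthName,
    DAY(DateValue)               AS DayOfMonth
INTO dbo.DimDate
FROM Dates
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```

The fact side can then derive the same DateKey from its own date column to join against this table.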
Then you can create a DSV that joins the views back against the fact table to populate the measure group.
If you need to fake a primary key then you can use a view that annotates the fact table with a column generated from row_number() or some similar means. Note that this is not necessarily stable across runs, so you can't rely on it for incremental loads. However, it would work fine for complete refreshes.
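A minimal sketch of such a view, assuming a hypothetical fact table dbo.DailyReport with ReportDate and ProductName columns (none of these names come from the original question):

```sql
-- Hypothetical table and column names.
CREATE VIEW dbo.v_FactWithKey AS
SELECT
    -- Rows that tie on the ORDER BY columns are numbered in arbitrary order,
    -- so the generated key can differ from one run to the next.
    ROW_NUMBER() OVER (ORDER BY f.ReportDate, f.ProductName) AS FactRowID,
    f.*
FROM dbo.DailyReport AS f;
```

The ORDER BY columns only influence how the numbers are handed out; the key is unique within a single query either way.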
Related
I was wondering if it is possible to create a generated column by using a GROUP BY query. For example, in the diagram below, is it possible to have the quantity of an equipment be generated based on the number of assets that match its foreign key?
You can't do that with a computed column. If you wanted to store and maintain such information, you would need trigger code for every DML operation on table asset, which makes things rather complex.
You can, on the other hand, create a view:
-- quantity = number of asset rows that reference each equipment row
create view v_equipment as
select e.*,
(select count(*) from asset a where a.equipment_id = e.equipment_id) as quantity
from equipment e
This gives you an always up-to-date view of your data. You can query the view instead of the table whenever you need the quantity information.
I have a DimPerson table and a DimPersonDecileOutrigger table which stores decile data. The outrigger is structured so that a customer is given a decile for the current year and the previous year (if they have bought in the period), which means a customer might have TY and not LY, and vice versa. Some customers have both.
In SSAS, when I picked the columns in the Dimension Structure tab, I initially only picked columns from DimPerson and not from the outrigger. That way the browser showed all the IDs starting from 1. But when I dragged in some columns from the outrigger, the browser no longer showed all PersonIDs. I want to see all customers regardless of whether they have a decile or not.
Pic attached to show what it looks like in the Dimension Structure tab. The relationship is between OutriggerID as the primary key in the outrigger and OutriggerID as the foreign key in DimPerson.
If you just want to solve the problem, you can create a View in your underlying relational database that uses LEFT OUTER JOIN to link the two tables, so that the view will return all rows from DimPerson, even if they don't have a Decile.
Then use the view as the source for your dimension instead of the tables.
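A minimal sketch of such a view is below. The table names and the OutriggerID join column come from the question; PersonID, PersonName, DecileTY and DecileLY are assumed column names, as is the view name.

```sql
-- Column names other than OutriggerID are assumptions for illustration.
CREATE VIEW dbo.v_DimPerson AS
SELECT
    p.PersonID,
    p.PersonName,
    o.DecileTY,   -- NULL when the person has no decile row
    o.DecileLY
FROM dbo.DimPerson AS p
LEFT OUTER JOIN dbo.DimPersonDecileOutrigger AS o
    ON o.OutriggerID = p.OutriggerID;
```

Every DimPerson row is returned; people without a matching outrigger row simply get NULL deciles, which you could replace with a label such as 'No decile' using COALESCE if that reads better in the browser.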
I found a schema in Google Images (see below) that illustrates a problem I'm having in my data warehouse design:
My design is different, but this is the simplest figure I could find to convey my question. Given the figure, how could the schema accommodate the following scenario: a product has a unique number assigned to it by the SalesOrg (salesOrg_product_number). For example, a salesOrg sells food items and assigns all food items of the same kind the same unique salesOrg_product_number. A different salesOrg would have a different salesOrg_product_number for that type of product.
I'm inclined to place the salesOrg_product_number attribute in the Product dimension table, but part of me thinks it should be in the salesOrg dimension table instead. Which of these is the correct way in a data warehouse (not relational db) design to maintain the star schema?
In a perfect world, the primary key of a dimension table should be just a surrogate key, with no meaning for the business. Table IDs should be invisible to end users, but business codes should of course be available.
A possible solution would be to have a product table with a structure like:
Product_id
Product_desc
Product_SO1_number
Product_SO2_number
...
Of course, this requires showing the correct field to the correct sales organization. Depending on your reporting tool, this can be more or less difficult. For example, if you write your queries manually, you just need to put the right column in your SELECT.
Another possibility would be to have a product/sales_org table, which combines the Product and Sales_Org tables:
Product_Sales_Org_id
Product_id
Sales_Org_id
Product_SO_number
...
This table will be a child of the two dimension tables, and the fact table will have a Product_Sales_Org_id column. Depending on the product and the sales organization, Product_SO_number will return the correct number for that sales organization.
If you want to keep this in a star schema structure, you can put Product, Sales_Org, and Product_Sales_Org together in a single table, like:
Product_Sales_Org_id
Product_id
Sales_Org_id
Product_desc
Sales_Org_desc
Product_SO_number
...
Honestly, I would go for the second solution: keep the Product and Sales_Org tables separate, because they are two different business entities, and implement the relationship table in between.
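As a rough sketch, the second solution could look like the following. The column names follow the lists above; the data types and constraints are assumptions.

```sql
-- Data types and constraints are assumptions for illustration.
CREATE TABLE Product (
    Product_id   INT PRIMARY KEY,
    Product_desc VARCHAR(100)
);

CREATE TABLE Sales_Org (
    Sales_Org_id   INT PRIMARY KEY,
    Sales_Org_desc VARCHAR(100)
);

-- Relationship table: one row per (product, sales organization) pair,
-- carrying the number that this sales organization assigns to the product.
CREATE TABLE Product_Sales_Org (
    Product_Sales_Org_id INT PRIMARY KEY,
    Product_id           INT NOT NULL REFERENCES Product (Product_id),
    Sales_Org_id         INT NOT NULL REFERENCES Sales_Org (Sales_Org_id),
    Product_SO_number    VARCHAR(50) NOT NULL,
    UNIQUE (Product_id, Sales_Org_id)
);

-- Resolve the sales-organization-specific product number.
SELECT p.Product_desc, s.Sales_Org_desc, pso.Product_SO_number
FROM Product_Sales_Org AS pso
JOIN Product   AS p ON p.Product_id   = pso.Product_id
JOIN Sales_Org AS s ON s.Sales_Org_id = pso.Sales_Org_id;
```

The fact table would then carry Product_Sales_Org_id, as described above.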
I hope this helps.
We have a data warehouse that contains a large fact table with over 100 million rows. I'm trying to create a cube that includes this fact table and need to create a fact dimension based off of this table. The issue that I'm running into is that there is no way to find uniqueness on this table using the fields that would be included in the fact dimension, without using every field in the table.
I created a surrogate key in the dsv using:
Row_Number() OVER (ORDER BY ID, Dt, Num)
I've used this method to create a surrogate key in another dsv and it worked, but I was also able to find uniqueness with the fields in the Order By.
When I browse the cube based on this fact table, I get the correct results when using regular dimensions. When I try to use fields from the fact dimension, I get erroneous results in most cases; some are correct, though very few.
Would this be a case where I should request that a surrogate key get created on the fact table? Is there a better solution that someone could suggest?
I was hoping someone could explain the appropriate use of the 'Fact' relationship type under the Dimension Usage tab. Is it simply for creating a dimension out of your fact table, to access attributes on the fact table itself?
Thanks in advance!
Yes, if your fact table has attributes that you would like to slice by (create a dimension from), you would use this relationship type.
Functionally, to the users it behaves no differently than a regular relationship.
After you create your dimensions and cubes you need to define how each dimension is related to each measure group. A measure group is a set of measures exposed by a single fact table.
Each cube can contain multiple fact tables and multiple dimensions. However, not every dimension will be related to every fact table.
To define relationships, right-click the cube in BIDS and choose Open; then navigate to the Dimension Usage tab. If you click the ellipsis button next to each dimension, you will see a screen that lets you change the dimension usage for a particular measure group. You can choose from the following options:
Regular: the default option; the dimension is joined directly to the fact table.
No relationship: the dimension is not related to the current measure group.
Fact: the dimension and the fact are derived from a single table. If this is the case, your dimensional warehouse has poor design and isn't likely to perform well. Consider separating fact and dimension tables.
Referenced: the dimension is joined to an intermediate table before being joined to the fact table. A referenced relationship resembles a snowflake dimension, but is slightly different. Suppose you have a customer dimension and a sales fact; you'd like to examine total sales by customer, but you also want to examine line item sales by customer. Instead of duplicating the customer key in the line item fact table, you can treat the sales fact as an intermediate table to join customer to line item.
Many-to-many: this option involves two fact tables and two dimension tables. Dimension A is joined to an intermediate fact A, which in turn joins to dimension B, to which fact B is joined. Much like with the Fact option, if you need the many-to-many option your design could probably use some improvement. This type of relationship is sometimes necessary if you are building cubes on top of a relational database that is in 3rd normal form. It is strongly advisable to use a dimensional model with a star schema for all cubes. For example, you could have two fact tables, vehicles and options; each vehicle can come with a number of options. You're likely to examine vehicle sales by customer, and options by the items that are included in each option. Therefore you would have a customer dimension and an item dimension. You might also want to examine vehicle sales by included item. If so, the vehicle fact would be joined to the options fact and the customer dimension; the options fact would also join to the item dimension.
Data mining: the target dimension is based on a mining model which is built from a source dimension. Both the source dimension and the target dimension must be included in the cube.