OLAP Dimension structure - ssas

I have a dimension "Customer". Each customer can have several business units and several departments.
I need to build two hierarchies: Customer -> Department and Customer -> Business Unit.
I also need to set a key attribute, and this is my question: what should be used as the key attribute?
Or am I doing this wrong?
Could you help?

To define hierarchies, ask the following questions:
If I group the departments, do I get a customer? If I group the business units, do I get a customer?
If I group the departments and business units together, do I get a customer?
If grouping the departments gives a customer, the hierarchy is Customer > Department. The same applies to business units.
If grouping department and business unit together (an attribute in the dimension that combines both pieces of information, for example DPT1-BUS1) gives a customer, the hierarchy is Customer > Department_Business.
Null attributes are not recommended in a dimension, so make sure every customer has a business unit and a department. Otherwise, rethink the data warehouse model. Generally, the key of a dimension is an artificial auto-increment (surrogate) key.
I recommend reading Kimball.
Hope this helps.


Database Relation Anomalies

I am tasked with finding anomalies within this relation. I have identified a few insertion, deletion and update anomalies.
Commission Percentage: the percentage of the total sales made by a salesperson that is paid as commission to that salesperson.
Year of Hire: the year the salesperson was first hired
Department Number: the number of the department where the salesperson works
Manager Name: name of the manager of the department
However, I am confused about one anomaly that I pulled out. Below is the statement:
There can not be a manager with the same name in the company as there is no primary identifier for the manager entity except for the name, which can be a duplicate within the company.
May I know how I should phrase the above statement, and under which anomaly (update/deletion/insertion) I should include it?
Thank you
May I request additional assistance below as well:
How would you change the current design, and how does your new design address the problems you have identified with the current design?
My current design splits it into 3 relations:
Salesperson(salespersonNumber, salespersonName, commissionPercentage, YearOfHire, departmentNumber)
Product(productNumber, productName, unitPrice)
Manager(managerNumber, managerName, departmentNumber)
However, I am missing the quantity entity.
Quantity requires a composite key of productNumber & salespersonNumber.
Should I make it in another relation by itself?
Quantity(productNumber, salespersonNumber)
Anomalies
When identifying (potential) anomalies, you're listing dependent attributes that are affected by the anomalies (you forgot Salesperson Name, btw). Specifically, you listed attributes that depend on a subset of the key (Salesperson Number, Product Number), thus violating 2NF. You're on the right track.
However, be careful not to confuse attributes with anomalies. An update anomaly would be if 1 of the 3 instances of Bilstein got changed. The (assumed) functional dependency Salesperson Name depends on Salesperson Number would be broken and the data would be inconsistent (Salesperson Number 437 would be associated with more than one name). Remember that normalization aims to eliminate redundant associations.
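To make the update anomaly concrete, here is a minimal Python sketch. The rows, the product numbers and the misspelled name are made up for illustration; only salesperson 437 and the name Bilstein come from the discussion above. The helper `fd_violations` is a hypothetical name, not part of any library. It reports determinant values that are associated with more than one dependent value.

```python
from collections import defaultdict

def fd_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {key: values for key, values in seen.items() if len(values) > 1}

# Hypothetical unnormalized rows: the name is repeated once per product sold.
rows = [
    {"salesperson_number": 437, "product_number": 19440, "salesperson_name": "Bilstein"},
    {"salesperson_number": 437, "product_number": 24013, "salesperson_name": "Bilstein"},
    {"salesperson_number": 437, "product_number": 26722, "salesperson_name": "Bilstein"},
]

before = fd_violations(rows, "salesperson_number", "salesperson_name")

# Update anomaly: only one of the three redundant copies gets changed.
rows[0]["salesperson_name"] = "Billstein"
after = fd_violations(rows, "salesperson_number", "salesperson_name")

print(before)  # nothing violates the dependency yet
print(after)   # 437 is now associated with two different names
```

Once the name is stored only once per salesperson number, this kind of partial update can no longer happen.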
Identity
The problem with identifying managers by name indicates a poor modeling decision. As you stated, a company's set of managers isn't uniquely identified by name, so there's a mismatch between the logical data model and the world it models. This won't cause insert, update or delete anomalies as long as we use different values for different managers, but it will prevent convenient identification of managers. Possible improvements would be to use multiple attributes (abstract domains are often easily identified by a combination of attributes, but natural domains like people usually aren't, e.g. Manager Name, Birthdate would be more identifying but still not a good solution), turn the Manager Name into a surrogate key (e.g. Scott1, Scott2), or introduce a new surrogate key (e.g. a numeric ID).
Proposed improvement
Your proposed design normalizes the original table as well as addressing the identification problem. It's a good answer except for two issues: it doesn't include the association between Salesperson and Manager, and in your Quantity relation, you forgot to include the quantity as a dependent attribute.
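As a rough sketch of the design with both issues addressed, here is one way to write it down using Python's built-in sqlite3 module. The relation and attribute names follow your proposal. The assumptions are mine: each department has exactly one manager, so the shared departmentNumber carries the Salesperson-Manager association, and all sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manager (
    managerNumber    INTEGER PRIMARY KEY,      -- surrogate key instead of the name
    managerName      TEXT NOT NULL,
    departmentNumber INTEGER NOT NULL UNIQUE   -- assumption: one manager per department
);
CREATE TABLE Salesperson (
    salespersonNumber    INTEGER PRIMARY KEY,
    salespersonName      TEXT NOT NULL,
    commissionPercentage REAL,
    yearOfHire           INTEGER,
    departmentNumber     INTEGER REFERENCES Manager(departmentNumber)
);
CREATE TABLE Product (
    productNumber INTEGER PRIMARY KEY,
    productName   TEXT NOT NULL,
    unitPrice     REAL
);
CREATE TABLE Quantity (
    productNumber     INTEGER REFERENCES Product(productNumber),
    salespersonNumber INTEGER REFERENCES Salesperson(salespersonNumber),
    quantity          INTEGER NOT NULL,        -- the dependent attribute
    PRIMARY KEY (productNumber, salespersonNumber)
);
""")

# Hypothetical sample data: two managers may now share a name.
conn.executemany("INSERT INTO Manager VALUES (?, ?, ?)",
                 [(1, "Scott", 73), (2, "Scott", 59)])
conn.execute("INSERT INTO Salesperson VALUES (437, 'Bilstein', 10.0, 2001, 73)")
conn.execute("INSERT INTO Product VALUES (19440, 'Hammer', 17.5)")
conn.execute("INSERT INTO Quantity VALUES (19440, 437, 473)")

# Find the manager of salesperson 437 through the shared department.
manager = conn.execute("""
    SELECT m.managerNumber, m.managerName
    FROM Salesperson s
    JOIN Manager m ON m.departmentNumber = s.departmentNumber
    WHERE s.salespersonNumber = 437
""").fetchone()
scott_count = conn.execute(
    "SELECT COUNT(*) FROM Manager WHERE managerName = 'Scott'").fetchone()[0]

print(manager, scott_count)
```

If a department can have several managers, the UNIQUE constraint has to go and the association needs its own attribute or relation.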
Good job so far, hope this helps.

Star schema for target and actual comparison Kimball

I am going to model one of the star schemas for a university data warehousing project. We need to compare the actual application count with a target.
There are target counts (set by the colleges every year) associated with Departments, Course groups, and Courses.
The requirement is to verify that the targets are allocated correctly and to track the progress of applications against the targets.
One proposal is to include all the actual counts (department level total accepted count, course group level total accepted count, course level total accepted count) and the corresponding target counts (dept level target counts, course group target counts, course target counts) in a single fact table. One of the dimensions in this star schema is the Course dimension. It consists of all the course, course group and department information. I do understand there is a granularity problem here, but this could be handled at the cube implementation level.
Or if I want to set the target at different hierarchy levels of a dimension, should I build different fact tables? As mentioned below:
Implement 3 fact tables for the 3 different types of targets and connect these fact tables to the actual fact table. In this situation, the Course dimension can snowflake into course group and department dimensions. The first target fact is connected to course, the second to course group, and the third to dept. The actual fact table has course-level grain, so all three target fact tables can be related to it via course, and it can be aggregated to get higher-level actual counts such as course group and dept.
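To illustrate the second option, here is a small Python sketch of what the comparison would compute. Everything in it is hypothetical: the course, group and department keys, the counts, and the helper names `roll_up` and `compare`. It only shows that course-grain actuals can be aggregated up the Course dimension and compared with targets held at each level.

```python
from collections import defaultdict

# Hypothetical Course dimension: course -> (course group, department).
course_dim = {
    "C1": ("G1", "D1"),
    "C2": ("G1", "D1"),
    "C3": ("G2", "D1"),
}

# Actual fact at course grain (accepted application counts).
actual_fact = {"C1": 40, "C2": 25, "C3": 30}

# One target fact per level of the hierarchy.
course_target = {"C1": 50, "C2": 20, "C3": 30}
group_target = {"G1": 80, "G2": 35}
dept_target = {"D1": 120}

def roll_up(course_values, level):
    """Aggregate course-grain values to level 0 (course group) or 1 (department)."""
    totals = defaultdict(int)
    for course, value in course_values.items():
        totals[course_dim[course][level]] += value
    return dict(totals)

def compare(actuals, targets):
    """Return {key: (actual, target, actual - target)} for every target key."""
    return {k: (actuals.get(k, 0), t, actuals.get(k, 0) - t) for k, t in targets.items()}

group_report = compare(roll_up(actual_fact, 0), group_target)
dept_report = compare(roll_up(actual_fact, 1), dept_target)

# Allocation check: course targets summed per group versus the group target.
allocated_to_groups = roll_up(course_target, 0)

print(group_report)
print(dept_report)
print(allocated_to_groups)
```

The allocation check at the end corresponds to the requirement that targets set at a higher level are distributed correctly to the levels below.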
Data Architects please comment!

How Do I Architect Out This Relationship

The best way I can describe this is with an example:
Imagine many businesses. In any one business there can be multiple divisions, and in each division there can be multiple departments. This would be a series of one-to-many relationships, right? But what if a given department can be elevated to a division, or a division can be sold off and become its own business, or two businesses merge? I need a fluid design where, as I envision it, each entity can easily be moved up or down, in or out.
You can do this with a self-referential parent.
Table Business [BusinessID, BusinessLevelID, ParentBusinessID]
Table BusinessLevel [BusinessLevelID, Description] (e.g. business, division, dept)
Look at using either isActive boolean fields or start and end date fields to handle changes. If Division X becomes Business Y, deactivate the old record and add a new one. Regarding the associated departments, either do the same thing or update the applicable field depending on your requirements for preserving historical data.
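Here is a minimal sketch of that idea using Python's built-in sqlite3 module. The Name and IsActive columns, the sample rows and the `subtree` helper are assumptions added for illustration. The recursive query walks the active part of the hierarchy from a given root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BusinessLevel (
    BusinessLevelID INTEGER PRIMARY KEY,
    Description     TEXT NOT NULL
);
CREATE TABLE Business (
    BusinessID       INTEGER PRIMARY KEY,
    Name             TEXT NOT NULL,               -- hypothetical column
    BusinessLevelID  INTEGER REFERENCES BusinessLevel(BusinessLevelID),
    ParentBusinessID INTEGER REFERENCES Business(BusinessID),
    IsActive         INTEGER NOT NULL DEFAULT 1   -- hypothetical column
);
INSERT INTO BusinessLevel VALUES (1, 'business'), (2, 'division'), (3, 'dept');
INSERT INTO Business (BusinessID, Name, BusinessLevelID, ParentBusinessID) VALUES
    (1, 'Acme', 1, NULL),
    (2, 'Division X', 2, 1),
    (3, 'Dept A', 3, 2);
""")

def subtree(root_id):
    """Return (Name, Depth) for the active entities at and below root_id."""
    return conn.execute("""
        WITH RECURSIVE tree(BusinessID, Name, Depth) AS (
            SELECT BusinessID, Name, 0 FROM Business
            WHERE BusinessID = ? AND IsActive = 1
            UNION ALL
            SELECT b.BusinessID, b.Name, t.Depth + 1
            FROM Business b JOIN tree t ON b.ParentBusinessID = t.BusinessID
            WHERE b.IsActive = 1
        )
        SELECT Name, Depth FROM tree ORDER BY Depth, BusinessID
    """, (root_id,)).fetchall()

before = subtree(1)

# Division X becomes Business Y: deactivate the old record, add a new one,
# and repoint its department to the new parent.
conn.execute("UPDATE Business SET IsActive = 0 WHERE BusinessID = 2")
conn.execute("INSERT INTO Business (BusinessID, Name, BusinessLevelID, ParentBusinessID) "
             "VALUES (4, 'Business Y', 1, NULL)")
conn.execute("UPDATE Business SET ParentBusinessID = 4 WHERE BusinessID = 3")

after_old = subtree(1)
after_new = subtree(4)
print(before, after_old, after_new)
```

Repointing the department overwrites its history. If you need to preserve it, deactivate the department row as well and insert a new one under Business Y.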

How to choose the right intermediate measure group in a many-to-many relationship when you have multiple options

I have a fact table called "FactActivity" and a few dimension tables like users, clients, actions, date and tenants.
I created measure groups corresponding to each of them as follows:
FactActivity => Sum of ActivityCount column
DimUser => Count of rows
DimTenant => Count of rows
DimDate => Count of rows and distinct count of weekofyear column
Each user can do multiple actions using multiple clients. A tenant is a logical grouping of users, so a tenant contains multiple users, but a user can't belong to more than one tenant. All the dimension tables and fact tables are connected to DimDate via a regular relationship.
The cube structure is as follows.
Now I want to define the dimension relationships for each of the measure groups. Some of them are many-to-many relationships (to enable distinct count calculation). The designer shows me multiple options to choose from for many of the intersections. I'm confused as to which one to select as the intermediate measure group. Should I always pick the measure group with the fewest rows, e.g. DimDate? Or what is the right logic to determine the intermediate measure group?
This is what I got. Is this right? If not, what is wrong?
Some more information to help choose the right answer:
FactActivity = 1 billion rows
DimUser = 35 million rows
DimTenant = 1 million rows
DimDate = 1000 rows
The correct way to choose the intermediate measure group depends on how you want to evaluate your measures with respect to the related dimension.
Let's start with the Activity measure group and the Tenant dimension. The question is: how should Analysis Services determine the activity count (or any other measure in the Activity measure group) of a tenant? The only reasonable way is to go from the activity fact table through the user table to the tenant table. The last relationship is actually not many-to-many but many-to-one, i.e. you could optimize away the tenant dimension by integrating it into the user dimension. Using a many-to-many relationship will work as well; it will just be a little less efficient. You might also consider using a reference relationship from user to tenant instead of a many-to-many relationship. There may be other reasons why you chose to keep them as two separate dimensions, so I will not discuss this any further.
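As a rough illustration of the path activity -> user -> tenant, here is a small Python sketch. The keys, the counts and the function name `activity_by_tenant` are made up. It is not how Analysis Services evaluates the relationship internally, only what the result should look like.

```python
from collections import defaultdict

# Hypothetical fact rows: (user_key, activity_count).
fact_activity = [("u1", 5), ("u2", 3), ("u1", 2), ("u3", 7)]

# Hypothetical user dimension: each user belongs to exactly one tenant.
user_to_tenant = {"u1": "t1", "u2": "t1", "u3": "t2"}

def activity_by_tenant(facts, user_tenant):
    """Roll activity counts up to tenants by going through the user."""
    totals = defaultdict(int)
    for user_key, count in facts:
        totals[user_tenant[user_key]] += count
    return dict(totals)

per_tenant = activity_by_tenant(fact_activity, user_to_tenant)
grand_total = sum(count for _, count in fact_activity)

print(per_tenant)
print(grand_total)
```

Because user to tenant is many-to-one, the tenant totals add up exactly to the grand total. That is why the tenant could just as well be an attribute of the user dimension.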
Now let us continue with the next one: Tenant measure group to User dimension: The way you have configured it (using the date measure group) means that for each date that a tenant and a user have in common, the tenant count of a user adds one to the count. This is probably not what you want. I would assume you want to relate tenant measures to user dimension by the user measure group. However, I am not sure what the purpose of the DateKey in the user and tenant dimension tables is at all. Thus, your relationship may be correct.
Let's continue with the relationships from the Date measure group to the Tenant and User dimensions. I would assume there should be no relationship at all, as the week of the year and the date count do not depend on tenants or users. Please note that it is absolutely OK to have no relationship between some measure groups and some dimensions. If you look at the Microsoft sample cube "Adventure Works", it has more gray cells (i.e. measure group and dimension are unrelated) in the Dimension Usage than white ones (i.e. there is some kind of relationship between measure group and dimension, of whichever type). With the default setting IgnoreUnrelatedDimensions = true on a measure group, this means that the measure value will be the same for all members of the dimension. This should be the case for date count and week of year. However, again, as I do not know the purpose of the DateKey in the user and tenant dimension tables, I am not sure whether this assumption is correct for your data.
And after these examples, I would hope you can continue with the rest of relationships yourself.

Attribute in multiple hierarchies in Analysis Services 2008

I have designed a relatively simple data warehouse that uses the star schema. I have a fact table with just a primary key along with CompanyID and Amount (the actual measurement) columns. Of course I also have a dimension table to represent the companies which the fact table references.
Now I'm required to create a single level hierarchy (CompanyGroup) for companies. This seems like an easy task but the catch is that a single company should be allowed to exist within multiple CompanyGroups.
I experimented with this by creating a new dimension table called CompanyHierarchy that holds a primary key, GroupKey and CompanyKey. Defining a user-defined hierarchy where GroupKey is the top level and CompanyKey is the second level yields an "A duplicate attribute key has been found" error for the CompanyKey attribute while processing the dimension.
So, I'm not quite sure how to even start with this. How can I create a user defined hierarchy within a dimension where attributes can exist multiple times?
Screen shot of my current cube definition can be seen at:
img132.imageshack.us/img132/6729/ssasm2m.gif
You need to create a many-to-many relationship (one company can belong to many groups, and one group can have many companies). There is an example of a many-to-many relationship in the Adventure Works cube around the sales reason dimension, and there is an extensive white paper here that explains a number of different ways of using many-to-many relationships.
There is also a technique for supporting multiple members in the one hierarchy that I documented here
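To show what the many-to-many relationship described above means for the numbers, here is a small Python sketch with a bridge between groups and companies. The company keys, group keys and amounts are invented, and `amount_by_group` is a hypothetical helper. In SSAS the bridge would be the intermediate measure group built on your CompanyHierarchy table.

```python
from collections import defaultdict

# Hypothetical fact rows: (company_key, amount).
fact = [("A", 100), ("B", 50), ("C", 25)]

# Hypothetical bridge rows: (group_key, company_key). Company B is in two groups.
bridge = [("G1", "A"), ("G1", "B"), ("G2", "B"), ("G2", "C")]

def amount_by_group(fact_rows, bridge_rows):
    """Sum amounts per group; a company contributes to every group it belongs to."""
    company_total = defaultdict(int)
    for company, amount in fact_rows:
        company_total[company] += amount
    group_total = defaultdict(int)
    for group, company in bridge_rows:
        group_total[group] += company_total[company]
    return dict(group_total)

per_group = amount_by_group(fact, bridge)
grand_total = sum(amount for _, amount in fact)

print(per_group)
print(grand_total)
```

The group totals deliberately do not add up to the grand total, because company B is counted in both groups while the grand total counts it once. That is the expected behaviour of a many-to-many dimension.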