Data Warehouse - Multidimensional Model - Fact Table is Smaller than Dimension Table - ssas

I am working on a data warehouse project where the customer dimension table is larger than the fact table. The dimension and fact tables are built from a CRM system.
The fact table tracks activities such as a letter being sent to a customer or a customer calling for assistance. Half of the customers have no activities, and the remaining customers have very few; most customers who have activities have only one.
I am not sure whether a star schema is the best solution for this project. Have you worked on similar projects, and what was the solution?

If many of the dimension members are not related to any facts at all, I would suggest filtering out the unused dimension members during the ETL process.
So you would do something like:
SELECT Customer_ID, Name FROM DIL.Customers
WHERE Customer_ID IN
(SELECT Customer_ID FROM DIL.Calls)
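To try the filter out end to end, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the Call_ID column are made up for illustration, and the DIL schema prefix is dropped because an in-memory SQLite database has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Calls (Call_ID INTEGER PRIMARY KEY, Customer_ID INTEGER);
    -- Hypothetical sample data: customers 1 and 3 have no activity.
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
    INSERT INTO Calls VALUES (10, 2), (11, 4), (12, 4);
""")

# Keep only the customers that appear in the activity table.
active_customers = conn.execute("""
    SELECT Customer_ID, Name
    FROM Customers
    WHERE Customer_ID IN (SELECT Customer_ID FROM Calls)
    ORDER BY Customer_ID
""").fetchall()

print(active_customers)
```

Only the customers with at least one call are loaded into the dimension; the others are dropped before they reach the warehouse.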

Related

Can I create one table for purchase orders and sales orders?

I'm creating a motorcycle store system, and I'm wondering whether I should create one table that contains all orders, both sales orders and purchase orders, with a column called OrderType that determines whether a row is a purchase order or a sales order.
The same goes for payments: money that I pay to suppliers and money that customers pay to me would go in one table called Payment, with a column that determines whether a payment is outgoing or incoming.
Is that OK, or do I need to create separate tables?
I would advise against it. Purchases are from a vendor that YOU get products from in order to build, fix, or manufacture something.
Sales orders have a customer you are selling to, so the two kinds of order would have foreign keys to different tables, unless your customer table also holds your vendors and has a column to distinguish vendor from customer.
Additionally, as you expand your design, queries, purchasing history, etc., it may be more beneficial to keep them separate.
You can create a single table. Whether this is a good design or an unfortunate one depends on how you use the data. How often do you want to query these two datasets as if they were one dataset? How often do you query them separately?
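As a rough illustration of the single-table option, the sketch below keeps both kinds of order in one table with an OrderType column and adds two views for the cases where they are queried separately. The PartyName and Amount columns, the view names, and the sample rows are assumptions made for the example, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (
        OrderID   INTEGER PRIMARY KEY,
        OrderType TEXT NOT NULL CHECK (OrderType IN ('PURCHASE', 'SALE')),
        PartyName TEXT NOT NULL,  -- supplier for purchases, customer for sales
        Amount    REAL NOT NULL
    );
    -- Views give each order type its own "table" for separate queries.
    CREATE VIEW PurchaseOrders AS SELECT * FROM Orders WHERE OrderType = 'PURCHASE';
    CREATE VIEW SalesOrders    AS SELECT * FROM Orders WHERE OrderType = 'SALE';
    INSERT INTO Orders VALUES
        (1, 'PURCHASE', 'Supplier A', 500.0),
        (2, 'SALE',     'Customer X', 120.0),
        (3, 'SALE',     'Customer Y',  80.0);
""")

# Queried separately through the views.
sales_total = conn.execute("SELECT SUM(Amount) FROM SalesOrders").fetchone()[0]
purchase_total = conn.execute("SELECT SUM(Amount) FROM PurchaseOrders").fetchone()[0]

# Queried together as one dataset.
orders_by_type = conn.execute("""
    SELECT OrderType, COUNT(*) FROM Orders
    GROUP BY OrderType ORDER BY OrderType
""").fetchall()

print(sales_total, purchase_total, orders_by_type)
```

Note that PartyName sidesteps the foreign key problem raised above; with real Customer and Supplier tables, a single Orders table would need either two nullable foreign keys or a shared party table.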

Data warehouse FactTable

I am trying to use this article to learn about data warehousing and am having trouble understanding the concept of the fact table.
http://www.codeproject.com/Articles/652108/Create-First-Data-WareHouse
What are some queries that I could run to find information from the fact table, and what questions do they answer?
A fact table is used in the dimensional model in data warehouse design. A fact table is found at the center of a star schema or snowflake schema surrounded by dimension tables.
A fact table consists of facts of a particular business process e.g., sales revenue by month by product. Facts are also known as measurements or metrics. A fact table record captures a measurement or a metric.
Example of fact table -
In the schema below, we have a fact table FACT_SALES that has a grain which gives us a number of units sold by date, by store and by product.
All other tables, such as DIM_DATE, DIM_STORE and DIM_PRODUCT, are dimension tables. This schema is known as a star schema.
Let's translate this a bit.
Firstly, in a fact table we usually store numeric values (rarely strings, chars, or other data types).
The purpose of a fact table is to bring together the KEYS of dimension tables (and, more rarely and not as good practice, of other fact tables) AND measurements (by measurements I mean numbers that change frequently, like prices, quantities, etc.).
Let's take an example:
Think of a row in a fact table as a product in a supermarket that gets scanned when you pass it through the checkout. What will be recorded for that checkout line in your fact table? Possibly:
Product_ID | ProductName | CustomerID | CustomerName | InventoryID | StoreID | StaffID | Price | Quantity ... etc.
So all those keys and measurements are brought together in one fact table, which gives a big advantage in performance and understandability.
A fact table contains the primary keys of its dimension tables (as foreign keys) and measures like "Sales Amount".
A Fact table is a table that stores your measurements of a business process. Here you would record numeric values that apply to an event like a sale in a store. It is surrounded by dimension tables which give the measurement context (which store? which product? which date?).
Using the dimensions you can ask lots of questions of your facts, like how many of a particular product have been sold each month in a region.
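As a sketch of that kind of question, the example below builds a tiny star schema in an in-memory SQLite database and asks how many units of one product were sold per month and region. The table names follow the FACT_SALES / DIM_* naming used earlier in this thread; the column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIM_DATE    (Date_Key INTEGER PRIMARY KEY, Month TEXT);
    CREATE TABLE DIM_STORE   (Store_Key INTEGER PRIMARY KEY, Region TEXT);
    CREATE TABLE DIM_PRODUCT (Product_Key INTEGER PRIMARY KEY, Product_Name TEXT);
    CREATE TABLE FACT_SALES (
        Date_Key    INTEGER REFERENCES DIM_DATE(Date_Key),
        Store_Key   INTEGER REFERENCES DIM_STORE(Store_Key),
        Product_Key INTEGER REFERENCES DIM_PRODUCT(Product_Key),
        Units_Sold  INTEGER
    );
    INSERT INTO DIM_DATE VALUES
        (20130104, '2013-01'), (20130120, '2013-01'), (20130204, '2013-02');
    INSERT INTO DIM_STORE VALUES (1, 'North'), (2, 'South');
    INSERT INTO DIM_PRODUCT VALUES (1, 'Helmet'), (2, 'Gloves');
    INSERT INTO FACT_SALES VALUES
        (20130104, 1, 1, 3),
        (20130120, 1, 1, 2),
        (20130120, 2, 1, 4),
        (20130204, 1, 1, 1),
        (20130204, 1, 2, 5);
""")

# "How many helmets were sold each month in each region?"
# The fact table supplies the measure; the dimensions supply the context.
units_by_month_region = conn.execute("""
    SELECT d.Month, s.Region, SUM(f.Units_Sold) AS Units
    FROM FACT_SALES f
    JOIN DIM_DATE    d ON d.Date_Key    = f.Date_Key
    JOIN DIM_STORE   s ON s.Store_Key   = f.Store_Key
    JOIN DIM_PRODUCT p ON p.Product_Key = f.Product_Key
    WHERE p.Product_Name = 'Helmet'
    GROUP BY d.Month, s.Region
    ORDER BY d.Month, s.Region
""").fetchall()

for row in units_by_month_region:
    print(row)
```

Changing the WHERE clause or the GROUP BY columns answers a different question against the same fact rows, which is the main appeal of the layout.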
Some further info:
All dimension keys in a fact table should be FKs to the dimension.
If a key is unknown, it should point to a zero-key row in the dimension that represents the unknown member.
Every fact row joins to exactly one row in each dimension (many fact rows to one dimension row). Bridge tables are a technique to cater for many-to-many relationships, but this is more advanced.
All measurements in a fact table are numeric, but can contain NULLs if unknown (never use 0 to represent unknown).
When joining facts to dimensions there is no need for an outer join, because of the FKs described above.
If you have 999 rows in a fact table, then no matter which dimensions you join to, you should always get 999 rows back.
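A small sketch of these rules, using an in-memory SQLite database: the dimension has a zero-key "Unknown" row, the fact table's key is a NOT NULL foreign key, and one measure is NULL because it is unknown. The table layout, store names, and amounts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE DIM_STORE (Store_Key INTEGER PRIMARY KEY, Store_Name TEXT);
    CREATE TABLE FACT_SALES (
        Sale_ID      INTEGER PRIMARY KEY,
        Store_Key    INTEGER NOT NULL REFERENCES DIM_STORE(Store_Key),
        Sales_Amount REAL              -- NULL when the amount is unknown
    );
    -- Key 0 is the "unknown" member that facts with a missing store point to.
    INSERT INTO DIM_STORE VALUES (0, 'Unknown'), (1, 'Main Street'), (2, 'Harbour');
    INSERT INTO FACT_SALES VALUES
        (1, 1, 10.0),
        (2, 2, 20.0),
        (3, 0, 5.0),    -- store not known: points at the zero key
        (4, 1, NULL);   -- amount not known: NULL, not 0
""")

fact_rows = conn.execute("SELECT COUNT(*) FROM FACT_SALES").fetchone()[0]

# A plain inner join loses no fact rows, because every key has a match.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM FACT_SALES f
    JOIN DIM_STORE s ON s.Store_Key = f.Store_Key
""").fetchone()[0]

# AVG skips NULLs; a 0 in place of the NULL would have dragged the average down.
avg_amount = conn.execute("SELECT AVG(Sales_Amount) FROM FACT_SALES").fetchone()[0]

print(fact_rows, joined_rows, avg_amount)
```

The row count is the same before and after the join, and the average is taken over the three known amounts only.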

Customer Dimension as Fact Table in Star Schema

Can a dimension table become a fact table as well? For instance, I have a Customer dimension table with standard attributes such as name, gender, etc.
I need to know how many customers were created today, last month, last year, etc. using SSAS.
I could create a factless fact table with a customer key and a date key, or I could use the same customer dimension table because it already has both keys.
Is it normal to use the Customer dimension table as both fact and dimension?
Thanks
Yes, you can use a dimension table as a fact table as well. In your case, you would just have a single measure, which would be the count, assuming there is one record per customer in the customer table. If you have more than one record per customer, e.g. because you use complex slowly changing dimension logic, you would use a distinct count instead.
Given your example, it is sufficient to run the query directly against the Customer dimension. There is no need to create another table to do that, such as a fact table. In fact it would be a bad idea to do that because you would have to maintain it every day. It is simpler just to run the query on the fly as long as you have time attributes in the customer table itself. In a sense you are using a dimension as a fact but, after all, data is data and can be queried as need be.
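Outside SSAS, the same idea can be sketched in plain SQL. The example below counts customers created per month directly from the dimension table and uses a distinct count on the business key so that a customer with several slowly-changing-dimension versions is counted once. The DIM_CUSTOMER layout, its column names, and the rows are assumptions for illustration; this is not SSAS cube configuration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIM_CUSTOMER (
        Customer_Key INTEGER PRIMARY KEY,  -- surrogate key, one per row version
        Customer_ID  TEXT,                 -- business key from the source system
        Name         TEXT,
        Created_Date TEXT                  -- ISO date the customer was created
    );
    INSERT INTO DIM_CUSTOMER VALUES
        (1, 'C1', 'Ann',    '2012-11-03'),
        (2, 'C2', 'Bob',    '2013-01-15'),
        (3, 'C2', 'Robert', '2013-01-15'), -- second version of customer C2
        (4, 'C3', 'Cid',    '2013-01-20'),
        (5, 'C4', 'Dee',    '2013-02-01');
""")

# The dimension acts as its own fact table: the only measure is a count.
created_per_month = conn.execute("""
    SELECT strftime('%Y-%m', Created_Date) AS Month,
           COUNT(DISTINCT Customer_ID)     AS Customers_Created
    FROM DIM_CUSTOMER
    GROUP BY Month
    ORDER BY Month
""").fetchall()

print(created_per_month)
```

With a plain COUNT(*), January 2013 would report three customers instead of two because customer C2 has two rows.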

Fact Table Recommendation

I have a data mart which only needs to capture a serial number of a product, the date of the activity, and where the activity took place (which account).
There are five possible activities. The issue I have is this. Two of the activities take place at a warehouse level. The remaining three take place at the account-level (WH does not apply). Ultimately however every warehouse rolls up to a master account.
So if I had one fact table, I would essentially need two FKs, and you would have to traverse the fact table to build the WH > Account hierarchy, which seems hard to maintain. I'd like one dimension table.
Or is it recommended that I split this into two fact tables, even though the only difference between the tables is whether the activity took place at a warehouse or not?
The reporting will be done at the account level, but having the WH information may be useful at some point. I also need to check for duplicates, etc., which is why I was leaning towards the first option, but I don't know how to handle the hierarchies appropriately.
Single Fact Table Design
Item: 1
Account: 14
Warehouse: 2
ActivityType: 3
Date: 20130204
SerialNumber: 123456
Count: 1
Dual Fact Table Design
Table 1
Item: 1
Warehouse: 2
ActivityType: 3
Date: 20130204
SerialNumber: 123456
Count: 1
Table 2
Item: 1
Account: 2
ActivityType: 3
Date: 20130204
SerialNumber: 123456
Count: 1
I've interpreted your situation as:
ALL activities require an account.
Some activities involve a warehouse.
The selection of a warehouse implies an account.
The accounts mentioned in the points above are of the same type (there is only one account dimension table).
In which case you should be OK with the single FACT table design:
[ACTIVITY_FACT]
SK (Optional, I find unique surrogate PKs useful)
ITEM_SK (Link to your ITEM_DIM table)
ACCOUNT_SK (Link to your ACCOUNT_DIM table)
WAREHOUSE_SK (Link to your WAREHOUSE_DIM table, -1 for no warehouse activities)
ACTIVITY_TYPE_SK (Link to your ACTIVITY_TYPE_DIM table)
ACTIVITY_DATE_SK (Link to your DATE_DIM table)
ITEM_SERIAL_NUMBER
ITEM_COUNT
Have a record in your WAREHOUSE dimension for NONE or NOT APPLICABLE and allocate it a nice obvious special condition SK value of -1 or -9 or whatever your shop is using for such things.
For activity records that reference a warehouse, put in the appropriate warehouse SK AND the account SK that belongs to that warehouse.
For activities that do not involve a warehouse, populate the warehouse SK with the NONE / NOT APPLICABLE warehouse dimension record and the appropriate account SK.
Now your fact table can be joined to your Account and Warehouse dimension tables without having to worry about outer joins or NULL handling. This should allow you and your users to play about with warehouse dimension data as required, and you don't have to faff about with managing two tables that contain essentially the same data.
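Here is a cut-down sketch of that single fact table in an in-memory SQLite database, keeping only the account and warehouse keys, the serial number, and the count. The dimension names and the -1 convention follow the outline above; the dimension name columns and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ACCOUNT_DIM   (ACCOUNT_SK INTEGER PRIMARY KEY, ACCOUNT_NAME TEXT);
    CREATE TABLE WAREHOUSE_DIM (WAREHOUSE_SK INTEGER PRIMARY KEY, WAREHOUSE_NAME TEXT);
    CREATE TABLE ACTIVITY_FACT (
        ACCOUNT_SK         INTEGER NOT NULL REFERENCES ACCOUNT_DIM(ACCOUNT_SK),
        WAREHOUSE_SK       INTEGER NOT NULL REFERENCES WAREHOUSE_DIM(WAREHOUSE_SK),
        ITEM_SERIAL_NUMBER TEXT,
        ITEM_COUNT         INTEGER
    );
    INSERT INTO ACCOUNT_DIM VALUES (14, 'Account 14'), (15, 'Account 15');
    -- -1 is the special "no warehouse" member.
    INSERT INTO WAREHOUSE_DIM VALUES (-1, 'NOT APPLICABLE'), (2, 'Warehouse 2');
    INSERT INTO ACTIVITY_FACT VALUES
        (14,  2, '123456', 1),   -- warehouse-level activity, carries its account too
        (14, -1, '123457', 1),   -- account-level activity
        (15, -1, '123458', 1);   -- account-level activity
""")

# Account-level reporting: plain inner join, every row has an account.
by_account = conn.execute("""
    SELECT a.ACCOUNT_NAME, SUM(f.ITEM_COUNT)
    FROM ACTIVITY_FACT f
    JOIN ACCOUNT_DIM a ON a.ACCOUNT_SK = f.ACCOUNT_SK
    GROUP BY a.ACCOUNT_SK, a.ACCOUNT_NAME
    ORDER BY a.ACCOUNT_SK
""").fetchall()

# Warehouse-level reporting: also an inner join, thanks to the -1 member.
by_warehouse = conn.execute("""
    SELECT w.WAREHOUSE_NAME, SUM(f.ITEM_COUNT)
    FROM ACTIVITY_FACT f
    JOIN WAREHOUSE_DIM w ON w.WAREHOUSE_SK = f.WAREHOUSE_SK
    GROUP BY w.WAREHOUSE_SK, w.WAREHOUSE_NAME
    ORDER BY w.WAREHOUSE_SK
""").fetchall()

print(by_account)
print(by_warehouse)
```

Both reports account for all three fact rows, and neither needs an outer join or a NULL check.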
A possibility is to define the hierarchy in a single dimension table. Guessing at what you’re dealing with, I came up with the following.
Outline of dimension table:
TABLE: Account
Account_ID <surrogate key>
Account <account name, identifier>
Warehouse <warehouse name, identifier>
Sample data:
Account_ID  Account  Warehouse
1           A        n/a
2           B        n/a
3           C        n/a
4           W        wh1
5           W        wh2
6           Z        wh3
7           Z        n/a
Account_ID is just a surrogate key, having no intrinsic meaning or value.
Account lists the accounts. Here, I show five: A, B, C, W and Z. Select distinct to get the list of accounts; joining to a fact table by Account_ID where Account = “W” gets all data for that account (for however many warehouses, if applicable).
Warehouse lists all warehouses and the account they are associated with; here, “W” is the account for two separate warehouses (wh1, wh2); Z is associated with warehouse wh3, but could also be used by a fact table with “no” warehouse. Joining to a fact table by Account_ID where Warehouse = “wh1” gets all data for that warehouse.
Using this, with Account_ID in a fact table you could drill down for all entries for any given Account or for a specific warehouse (or for no warehouse, if there is value in that).
There are lots of variations and permutations possible with this kind of approach.
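The single-dimension approach can be sketched as follows, loading the sample Account rows above into an in-memory SQLite database. The Activity_Fact table, its Item_Count column, and its rows are hypothetical and only there to give the joins something to aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (
        Account_ID INTEGER PRIMARY KEY,  -- surrogate key
        Account    TEXT,
        Warehouse  TEXT
    );
    INSERT INTO Account VALUES
        (1, 'A', 'n/a'), (2, 'B', 'n/a'), (3, 'C', 'n/a'),
        (4, 'W', 'wh1'), (5, 'W', 'wh2'),
        (6, 'Z', 'wh3'), (7, 'Z', 'n/a');

    -- Hypothetical fact table: one key covers both hierarchy levels.
    CREATE TABLE Activity_Fact (
        Account_ID INTEGER NOT NULL REFERENCES Account(Account_ID),
        Item_Count INTEGER
    );
    INSERT INTO Activity_Fact VALUES (4, 1), (5, 1), (5, 1), (6, 1), (7, 1), (1, 1);
""")

# All activity for account W, across however many warehouses it has.
total_for_account_w = conn.execute("""
    SELECT SUM(f.Item_Count)
    FROM Activity_Fact f
    JOIN Account a ON a.Account_ID = f.Account_ID
    WHERE a.Account = 'W'
""").fetchone()[0]

# Activity for one specific warehouse.
total_for_wh1 = conn.execute("""
    SELECT SUM(f.Item_Count)
    FROM Activity_Fact f
    JOIN Account a ON a.Account_ID = f.Account_ID
    WHERE a.Warehouse = 'wh1'
""").fetchone()[0]

# The list of accounts comes from a SELECT DISTINCT on the dimension.
accounts = [row[0] for row in conn.execute(
    "SELECT DISTINCT Account FROM Account ORDER BY Account")]

print(total_for_account_w, total_for_wh1, accounts)
```

The fact table carries a single foreign key, and the WH > Account roll-up lives entirely in the dimension rows.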

Improvement on database schema

I'm creating a small pet shop database for a project.
The database needs to have a list of products by supplier that can be grouped by pet type or product category.
Each in-store sale and customer order can have multiple products per order and an employee attached to it. A customer order must have a customer, and an employee must have a position.
http://imgur.com/2Mi7EIU
Here are some random thoughts
I often separate addresses from the thing that has an address. You could make 1-many relationships between Employee, Customer and Supplier to an address table. That would allow you to have different types of addresses per entity, and to change addresses without touching the original table.
If it is possible for prices to change for an item, you would need to account for that somehow. Options there are to create a pricing table or to capture the price on the sales item table.
I don't like the way you handle the sales item table. The different foreign keys based on the type of transaction are not quite correct. One alternative would be to replace SaleID and OrderId on SalesItem with the SalesRecordId. Another, better option would be to merge the fields from InStoreSale, SalesRecord, and CustomerOrders into a single table and add an indicator column to show which type of transaction it was.
You should probably try to be consistent with plurality in your table names. For example, CustomerOrders vs. CustomerOrder.
Putting PositionPay on the EmployeePosition table seems off too. Employees in the same position can typically have different pay.
Is the PetType structured with enough complexity? Can't you have items that apply to more than one pet type? For example, a fishtank can be used for fish or lizards? If so, you will need a many-to-many join table there.
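To make the last point concrete, here is a minimal sketch of a many-to-many join table between products and pet types, using an in-memory SQLite database. The table and column names (Product, PetType, ProductPetType) are guesses, since the actual schema is only available as an image, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE PetType (PetTypeID INTEGER PRIMARY KEY, Name TEXT);
    -- Join table: one row per (product, pet type) pairing.
    CREATE TABLE ProductPetType (
        ProductID INTEGER NOT NULL REFERENCES Product(ProductID),
        PetTypeID INTEGER NOT NULL REFERENCES PetType(PetTypeID),
        PRIMARY KEY (ProductID, PetTypeID)
    );
    INSERT INTO Product VALUES (1, 'Tank'), (2, 'Fish food');
    INSERT INTO PetType VALUES (1, 'Fish'), (2, 'Lizard');
    INSERT INTO ProductPetType VALUES (1, 1), (1, 2), (2, 1);
""")

# Which products suit lizards?
lizard_products = [row[0] for row in conn.execute("""
    SELECT p.Name
    FROM Product p
    JOIN ProductPetType pp ON pp.ProductID = p.ProductID
    JOIN PetType t         ON t.PetTypeID  = pp.PetTypeID
    WHERE t.Name = 'Lizard'
    ORDER BY p.Name
""")]

# Which pet types can use the tank?
tank_pet_types = [row[0] for row in conn.execute("""
    SELECT t.Name
    FROM PetType t
    JOIN ProductPetType pp ON pp.PetTypeID = t.PetTypeID
    JOIN Product p         ON p.ProductID  = pp.ProductID
    WHERE p.Name = 'Tank'
    ORDER BY t.Name
""")]

print(lizard_products, tank_pet_types)
```

The composite primary key on the join table stops the same product and pet type from being paired twice.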
Hope this helps!