I have a data mart which only needs to capture a serial number of a product, the date of the activity, and where the activity took place (which account).
There are five possible activities. The issue I have is this: two of the activities take place at the warehouse level, while the remaining three take place at the account level (WH does not apply). Ultimately, however, every warehouse rolls up to a master account.
So if I had one fact table, I would essentially need two FKs, and you would have to traverse the fact table to build the WH > Account hierarchy, which seems hard to maintain. I'd like one dimension table.
Or is it recommended that I split this into two fact tables, even though the only distinguishing characteristic of each table is whether the activity took place at a warehouse or not?
The reporting will be aimed at the account level, but having the WH information may be useful at some point. I also need to check for duplicates, etc., which is why I was leaning towards the first design, but I don't know how to handle the hierarchies appropriately.
Single Fact Table Design
Item: 1
Account: 14
Warehouse: 2
ActivityType: 3
Date: 20130204
SerialNumber: 123456
Count: 1
Dual Fact Table Design
Table 1
Item: 1
Warehouse: 2
ActivityType: 3
Date: 20130204
SerialNumber: 123456
Count: 1
Table 2
Item: 1
Account: 2
ActivityType: 3
Date: 20130204
SerialNumber: 123456
Count: 1
I've interpreted your situation as:
All activities require an account.
Some activities involve a warehouse.
The selection of a warehouse implies an account.
The accounts mentioned in the points above are of the same type (there is only one account dimension table).
In that case, you should be OK with the single fact table design:
[ACTIVITY_FACT]
SK (Optional; I find unique surrogate PKs useful)
ITEM_SK (Link to your ITEM_DIM table)
ACCOUNT_SK (Link to your ACCOUNT_DIM table)
WAREHOUSE_SK (Link to your WAREHOUSE_DIM table, -1 for no warehouse activities)
ACTIVITY_TYPE_SK (Link to your ACTIVITY_TYPE_DIM table)
ACTIVITY_DATE_SK (Link to your DATE_DIM table)
ITEM_SERIAL_NUMBER
ITEM_COUNT
Have a record in your WAREHOUSE dimension for NONE or NOT APPLICABLE, and allocate it an obvious special-condition SK value such as -1 or -9, or whatever your shop uses for such things.
For activity records that reference a warehouse, put the appropriate warehouse SK AND the account SK that belongs to that warehouse.
For activities that do not involve a warehouse, populate the warehouse SK with the SK of the NONE / NOT APPLICABLE warehouse dimension record, along with the appropriate account SK.
Now your fact table can be joined to your Account and Warehouse dimension tables without having to worry about outer joins or NULL handling. This lets you and your users play about with warehouse dimension data as required, and you're not faffing about managing two tables that contain essentially the same data.
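A minimal sketch of the -1 "NOT APPLICABLE" warehouse idea, using SQLite through Python's sqlite3 module purely for illustration. The table and column names follow the ACTIVITY_FACT layout above, but the sample rows, account names and the NULL account on the -1 row are assumptions. It shows that plain inner joins return every fact row, and that account-level reporting can simply ignore the warehouse dimension.

```python
import sqlite3

# Hypothetical tables modelled on the ACTIVITY_FACT layout above. Only the
# columns needed to show the -1 "NOT APPLICABLE" warehouse row are kept.
SCHEMA = """
CREATE TABLE ACCOUNT_DIM (
    ACCOUNT_SK   INTEGER PRIMARY KEY,
    ACCOUNT_NAME TEXT NOT NULL
);
CREATE TABLE WAREHOUSE_DIM (
    WAREHOUSE_SK   INTEGER PRIMARY KEY,
    WAREHOUSE_NAME TEXT NOT NULL,
    ACCOUNT_SK     INTEGER  -- owning account, NULL for the special -1 row
);
CREATE TABLE ACTIVITY_FACT (
    ITEM_SK            INTEGER,
    ACCOUNT_SK         INTEGER NOT NULL,
    WAREHOUSE_SK       INTEGER NOT NULL,  -- -1 when no warehouse is involved
    ACTIVITY_TYPE_SK   INTEGER,
    ACTIVITY_DATE_SK   INTEGER,
    ITEM_SERIAL_NUMBER TEXT,
    ITEM_COUNT         INTEGER
);
"""


def build_mart() -> sqlite3.Connection:
    """Create an in-memory mart with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ACCOUNT_DIM VALUES (?, ?)",
        [(14, "Account 14"), (20, "Account 20")],
    )
    conn.executemany(
        "INSERT INTO WAREHOUSE_DIM VALUES (?, ?, ?)",
        [(-1, "NOT APPLICABLE", None), (2, "WH 2", 14)],
    )
    conn.executemany(
        "INSERT INTO ACTIVITY_FACT VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 14, 2, 3, 20130204, "123456", 1),   # warehouse-level activity
            (1, 14, -1, 5, 20130205, "123456", 1),  # account-level activity
            (1, 20, -1, 4, 20130206, "654321", 1),  # account-level activity
        ],
    )
    return conn


def activity_by_account_and_warehouse(conn):
    # Plain inner joins: every fact row has a valid warehouse key (possibly -1),
    # so no outer join or NULL handling is needed.
    return conn.execute(
        """
        SELECT a.ACCOUNT_NAME, w.WAREHOUSE_NAME, SUM(f.ITEM_COUNT)
        FROM ACTIVITY_FACT f
        JOIN ACCOUNT_DIM a ON a.ACCOUNT_SK = f.ACCOUNT_SK
        JOIN WAREHOUSE_DIM w ON w.WAREHOUSE_SK = f.WAREHOUSE_SK
        GROUP BY a.ACCOUNT_NAME, w.WAREHOUSE_NAME
        ORDER BY a.ACCOUNT_NAME, w.WAREHOUSE_NAME
        """
    ).fetchall()


def activity_by_account(conn):
    # Account-level reporting simply leaves the warehouse dimension out.
    return conn.execute(
        """
        SELECT a.ACCOUNT_NAME, SUM(f.ITEM_COUNT)
        FROM ACTIVITY_FACT f
        JOIN ACCOUNT_DIM a ON a.ACCOUNT_SK = f.ACCOUNT_SK
        GROUP BY a.ACCOUNT_NAME
        ORDER BY a.ACCOUNT_NAME
        """
    ).fetchall()
```

With this data, activity_by_account_and_warehouse returns one group for the WH 2 activity and one NOT APPLICABLE group per account, while activity_by_account rolls everything up to the account.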
A possibility is to define the hierarchy in a single dimension table. Guessing at what you’re dealing with, I came up with the following.
Outline of dimension table:
TABLE: Account
Account_ID (surrogate key)
Account (account name, identifier)
Warehouse (warehouse name, identifier)
Sample data:
Account_ID  Account  Warehouse
1           A        n/a
2           B        n/a
3           C        n/a
4           W        wh1
5           W        wh2
6           Z        wh3
7           Z        n/a
Account_ID is just a surrogate key, having no intrinsic meaning or value.
Account lists the accounts. Here, I show five: A, B, C, W and Z. Select distinct values to get the list of accounts; joining to a fact table by Account_ID where Account = “W” gets all data for that account (across however many warehouses, if applicable).
Warehouse lists all warehouses and the account each is associated with; here, “W” is the account for two separate warehouses (wh1, wh2); Z is associated with warehouse wh3, but can also be used by fact rows with “no” warehouse (Account_ID 7). Joining to a fact table by Account_ID where Warehouse = “wh1” gets all data for that warehouse.
Using this, with Account_ID in a fact table you could drill down for all entries for any given Account or for a specific warehouse (or for no warehouse, if there is value in that).
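To make the drill-down concrete, here is a small sketch in SQLite (via Python's sqlite3) that loads the sample Account rows above and a hypothetical fact table. The Fact table, its Activity_Count column and its rows are assumptions for illustration only; the filters mirror the "Account = W" and "Warehouse = wh1" examples.

```python
import sqlite3

# Account dimension exactly as in the sample data above; Account_ID is the
# surrogate key that a fact table would carry.
ACCOUNT_ROWS = [
    (1, "A", "n/a"),
    (2, "B", "n/a"),
    (3, "C", "n/a"),
    (4, "W", "wh1"),
    (5, "W", "wh2"),
    (6, "Z", "wh3"),
    (7, "Z", "n/a"),
]

# Hypothetical fact rows: (Account_ID, SerialNumber, Activity_Count).
FACT_ROWS = [
    (4, "S1", 1),
    (5, "S2", 1),
    (5, "S3", 1),
    (6, "S4", 1),
    (7, "S5", 1),
    (1, "S6", 1),
]


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Account (Account_ID INTEGER PRIMARY KEY, Account TEXT, Warehouse TEXT)"
    )
    conn.execute(
        "CREATE TABLE Fact (Account_ID INTEGER, SerialNumber TEXT, Activity_Count INTEGER)"
    )
    conn.executemany("INSERT INTO Account VALUES (?, ?, ?)", ACCOUNT_ROWS)
    conn.executemany("INSERT INTO Fact VALUES (?, ?, ?)", FACT_ROWS)
    return conn


def list_accounts(conn):
    # SELECT DISTINCT collapses the per-warehouse rows into one row per account.
    return [r[0] for r in conn.execute("SELECT DISTINCT Account FROM Account ORDER BY Account")]


def total_count(conn, account=None, warehouse=None):
    """Sum fact rows, optionally filtered by account and/or warehouse."""
    sql = (
        "SELECT COALESCE(SUM(f.Activity_Count), 0) "
        "FROM Fact f JOIN Account a ON a.Account_ID = f.Account_ID WHERE 1 = 1"
    )
    params = []
    if account is not None:
        sql += " AND a.Account = ?"
        params.append(account)
    if warehouse is not None:
        sql += " AND a.Warehouse = ?"
        params.append(warehouse)
    return conn.execute(sql, params).fetchone()[0]
```

For example, total_count(conn, account="W") covers both wh1 and wh2, total_count(conn, warehouse="wh1") narrows to one warehouse, and total_count(conn, account="Z", warehouse="n/a") picks out Z's warehouse-less activity.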
There are lots of variations and permutations possible with this kind of approach.
I am working in Cognos Analytics 11.1.7.
I have a data module with three tables. Tables 1 and 2 contain all the transactions we do: table 1 contains the remitter's part of each transaction and table 2 contains the beneficiary's part. I.e. one transaction is split across two tables. Table 3 holds account numbers.
I want to create a report that shows the remitter's customer ID and account number. However, in some cases the remitter's account is missing. These customers have distinctive customer IDs (Y instead of X). In those cases, I want the beneficiary's customer ID and account number instead. Consider the following three tables:
Table 1: REMITTER
CUSTOMER_ID  ORDER_NR
1X           123
2Y           456
1X           789
Table 2: BENEFICIARY
CUSTOMER_ID  ORDER_NR
4X           123
6X           456
6X           789
Table 3: ACCOUNTS
CUSTOMER_ID  ACCOUNT_NR
1X           1111
2Y
3X           3333
4X           4444
5X           5555
6X           6666
What I want is basically the following report:
REPORT OF ALL TRANSACTIONS TODAY
CUSTOMER_ID  ACCOUNT_NR  ORDER_NR
1            1111        123
6            6666        456
1            1111        789
I have solved the CUSTOMER_ID column with a switch case:
CASE
WHEN REMITTER.CUSTOMER_ID CONTAINS 'Y'
THEN BENEFICIARY.CUSTOMER_ID
ELSE REMITTER.CUSTOMER_ID
END
Now here's the problem: I can't create a join (relationship) between the column created above and the ACCOUNTS table, since my own column lies directly "under" the data module (on the same level as the tables in the index list on the left). However, if I create the column "under" the REMITTER table, I can't use the CASE calculation from above. Cognos gives me the following error:
The expression is not valid.
XQE-MSR-0008 In module "STACKOVERFLOW", the following query subjects are not joined: "REMITTER", "BENEFICIARY".
I have tried to circumvent the error by creating all kinds of joins between REMITTER and BENEFICIARY on ORDER_NR but Cognos keeps giving me this error.
I have also tried to make a "triangle" of joins, where REMITTER and BENEFICIARY are joined on ORDER_NR, REMITTER and ACCOUNTS are joined on CUSTOMER_ID, and BENEFICIARY and ACCOUNTS are joined on CUSTOMER_ID. This doesn't work. However, when I delete either the REMITTER/ACCOUNTS or the BENEFICIARY/ACCOUNTS join, it works with the table I keep joined.
I am slowly losing my sanity here. Thanks!
What is the nature of the relationships between these entities?
That is a question which you should ask for everything in your model.
The pattern of that relationship drives the relationship between the objects in the model, which in turn drives what decisions you need to make in your modelling.
For example, is this a Bridge table situation? If so, you need to be aware of it so you can model appropriately.
In the end it falls back on Kimball:
Identify the facts
Identify the dimensions
I am assuming that the cardinality is beneficiary to remitter to account, or remitter to beneficiary to account.
Put beneficiary and remitter into a view in the module, create a relationship between that view and account, and delete the relationship between the middle table and account (so that the SQL will use the relationship you created).
I think putting the calculation into the table which is in the middle would also do the trick.
I cannot map your described 'triangle' of joined tables to a business purpose, so I could not use that information to understand the entity relationships. This relationship pattern is specifically called out in the Cognos proven practices as one to identify and correct. Because I cannot tell whether there truly is a business purpose for such a triangle, I cannot, and will not, describe the appropriate modelling actions: they depend on the business purpose of the relationships between the entities, which takes us back to St. Ralph.
Use queries.
In a report, create a query to resolve the remitter/beneficiary problem, then join that to another query that gets the accounts data.
If this functionality must be canned -- because many unknown report developers will use it to produce many unknown reports -- you can still do the same thing, but in a data set.
As C'est Moi alludes to, you can also do this in the data module by joining Remitter and Beneficiary into a table that computes the Customer_Id.
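The "resolve first, then join" idea can be sketched in plain SQL. This is SQLite run through Python's sqlite3, not Cognos query syntax; the CTE named resolved plays the role of the first query, and the LIKE '%Y%' test is an assumed stand-in for the CONTAINS 'Y' rule in the question. The data is the three tables from the question.

```python
import sqlite3

REMITTER = [("1X", 123), ("2Y", 456), ("1X", 789)]
BENEFICIARY = [("4X", 123), ("6X", 456), ("6X", 789)]
ACCOUNTS = [("1X", 1111), ("2Y", None), ("3X", 3333),
            ("4X", 4444), ("5X", 5555), ("6X", 6666)]

# Step 1 (resolved) picks the customer to report per order.
# Step 2 joins that result to ACCOUNTS on the resolved CUSTOMER_ID.
REPORT_SQL = """
WITH resolved AS (
    SELECT r.ORDER_NR,
           CASE WHEN r.CUSTOMER_ID LIKE '%Y%'
                THEN b.CUSTOMER_ID
                ELSE r.CUSTOMER_ID
           END AS CUSTOMER_ID
    FROM REMITTER r
    JOIN BENEFICIARY b ON b.ORDER_NR = r.ORDER_NR
)
SELECT res.CUSTOMER_ID, a.ACCOUNT_NR, res.ORDER_NR
FROM resolved res
LEFT JOIN ACCOUNTS a ON a.CUSTOMER_ID = res.CUSTOMER_ID
ORDER BY res.ORDER_NR
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE REMITTER (CUSTOMER_ID TEXT, ORDER_NR INTEGER)")
    conn.execute("CREATE TABLE BENEFICIARY (CUSTOMER_ID TEXT, ORDER_NR INTEGER)")
    conn.execute("CREATE TABLE ACCOUNTS (CUSTOMER_ID TEXT, ACCOUNT_NR INTEGER)")
    conn.executemany("INSERT INTO REMITTER VALUES (?, ?)", REMITTER)
    conn.executemany("INSERT INTO BENEFICIARY VALUES (?, ?)", BENEFICIARY)
    conn.executemany("INSERT INTO ACCOUNTS VALUES (?, ?)", ACCOUNTS)
    return conn


def resolve_report(conn):
    return conn.execute(REPORT_SQL).fetchall()
```

Only one relationship to ACCOUNTS is ever used (from the resolved customer), which is exactly what the triangle of joins could not express. Note that SQLite's LIKE is case-insensitive for ASCII by default.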
I am trying to use this to learn about data warehousing, and I am having trouble understanding the concept of the fact table.
http://www.codeproject.com/Articles/652108/Create-First-Data-WareHouse
What are some queries that I could run to find information in the fact table, and what questions do they answer?
A fact table is used in the dimensional model in data warehouse design. A fact table is found at the center of a star schema or snowflake schema surrounded by dimension tables.
A fact table consists of facts of a particular business process e.g., sales revenue by month by product. Facts are also known as measurements or metrics. A fact table record captures a measurement or a metric.
Example of a fact table:
In the schema below, we have a fact table FACT_SALES that has a grain which gives us a number of units sold by date, by store and by product.
All other tables, such as DIM_DATE, DIM_STORE and DIM_PRODUCT, are dimension tables. This schema is known as a star schema.
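As a sketch of the kind of questions such a star schema answers, here is a tiny FACT_SALES with the grain described above (units sold by date, store and product), built in SQLite through Python's sqlite3. The column names (DATE_KEY, YEAR_MONTH, REGION, UNITS_SOLD, ...) and all rows are assumptions, since the original schema diagram is not reproduced here.

```python
import sqlite3


def build_star() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE DIM_DATE (DATE_KEY INTEGER PRIMARY KEY, FULL_DATE TEXT, YEAR_MONTH TEXT);
    CREATE TABLE DIM_STORE (STORE_KEY INTEGER PRIMARY KEY, STORE_NAME TEXT, REGION TEXT);
    CREATE TABLE DIM_PRODUCT (PRODUCT_KEY INTEGER PRIMARY KEY, PRODUCT_NAME TEXT);
    -- Grain: one row per date, store and product.
    CREATE TABLE FACT_SALES (
        DATE_KEY    INTEGER REFERENCES DIM_DATE(DATE_KEY),
        STORE_KEY   INTEGER REFERENCES DIM_STORE(STORE_KEY),
        PRODUCT_KEY INTEGER REFERENCES DIM_PRODUCT(PRODUCT_KEY),
        UNITS_SOLD  INTEGER
    );
    INSERT INTO DIM_DATE VALUES (20130101, '2013-01-01', '2013-01'),
                                (20130215, '2013-02-15', '2013-02');
    INSERT INTO DIM_STORE VALUES (1, 'Store A', 'North'), (2, 'Store B', 'South');
    INSERT INTO DIM_PRODUCT VALUES (10, 'Apples'), (20, 'Bread');
    INSERT INTO FACT_SALES VALUES (20130101, 1, 10, 5), (20130101, 2, 10, 3),
                                  (20130215, 1, 20, 7), (20130215, 2, 10, 2);
    """)
    return conn


def units_by_month_and_product(conn):
    # "How many units of each product did we sell each month?"
    return conn.execute("""
        SELECT d.YEAR_MONTH, p.PRODUCT_NAME, SUM(f.UNITS_SOLD)
        FROM FACT_SALES f
        JOIN DIM_DATE d ON d.DATE_KEY = f.DATE_KEY
        JOIN DIM_PRODUCT p ON p.PRODUCT_KEY = f.PRODUCT_KEY
        GROUP BY d.YEAR_MONTH, p.PRODUCT_NAME
        ORDER BY d.YEAR_MONTH, p.PRODUCT_NAME
    """).fetchall()


def units_by_region(conn):
    # "Which region sells the most units?"
    return conn.execute("""
        SELECT s.REGION, SUM(f.UNITS_SOLD)
        FROM FACT_SALES f
        JOIN DIM_STORE s ON s.STORE_KEY = f.STORE_KEY
        GROUP BY s.REGION
        ORDER BY s.REGION
    """).fetchall()
```

Every query follows the same shape: join the fact table to the dimensions that give the question its context, filter or group by dimension attributes, and aggregate the numeric measure.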
Let's translate this a bit.
Firstly, in a fact table we usually store numeric values (rarely strings, chars, or other data types).
The purpose of a fact table is to connect the KEYS of dimension tables, other fact tables (more rarely, and not good practice) AND measurements (by measurements I mean numbers that change frequently, such as prices, quantities, etc.).
Let's take an example:
Think of a row in a fact table as a product at a supermarket when it passes through the checkout and gets scanned. What would the corresponding row in your fact table contain? Possibly:
Product_ID | ProductName | CustomerID | CustomerName | InventoryID | StoreID | StaffID | Price | Quantity ... etc.
So all those keys and measurements are brought together in one fact table, which brings big performance and understandability advantages.
A fact table contains the primary keys of its dimension tables, plus measures such as "Sales Amount".
A Fact table is a table that stores your measurements of a business process. Here you would record numeric values that apply to an event like a sale in a store. It is surrounded by dimension tables which give the measurement context (which store? which product? which date?).
Using the dimensions you can ask lots of questions of your facts, like how many of a particular product have been sold each month in a region.
Some further info:
All dimension keys in a fact should be FKs to their dimensions.
If a key is unknown, it should point to a zero key in the dimension that details this (e.g. an "Unknown" member).
Each fact row joins to exactly one row in each dimension (a many-to-one relationship). Bridge tables are a technique for catering to many-to-many relationships, but this is more advanced.
All measurements in a fact are numeric, but they can contain NULLs if unknown (never put 0 to represent unknown).
When joining facts to dimensions there is no need for an outer join, thanks to the FKs applied above.
If you have 999 rows in a fact, no matter which dimensions you join to, you should always get 999 rows back.
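A small sketch of why the zero-key and NULL-measure rules matter, using SQLite through Python's sqlite3. The DIM_CUSTOMER, FACT_GOOD and FACT_NULL_KEY tables and their rows are hypothetical: one fact table follows the rules above, the other stores an unknown customer as a NULL key.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE DIM_CUSTOMER (CUSTOMER_KEY INTEGER PRIMARY KEY, NAME TEXT);
    -- Zero key reserved for the 'Unknown' member.
    INSERT INTO DIM_CUSTOMER VALUES (0, 'Unknown'), (1, 'Alice');

    -- Follows the rules: unknown customer -> key 0, unknown amount -> NULL.
    CREATE TABLE FACT_GOOD (CUSTOMER_KEY INTEGER NOT NULL, AMOUNT REAL);
    INSERT INTO FACT_GOOD VALUES (1, 10.0), (1, 20.0), (0, NULL);

    -- Breaks the rules: unknown customer stored as a NULL key.
    CREATE TABLE FACT_NULL_KEY (CUSTOMER_KEY INTEGER, AMOUNT REAL);
    INSERT INTO FACT_NULL_KEY VALUES (1, 10.0), (1, 20.0), (NULL, 5.0);
    """)
    return conn


def row_counts(conn, fact_table):
    """Return (rows in the fact, rows after an inner join to DIM_CUSTOMER)."""
    # Table names cannot be bound as parameters, so only known tables are allowed.
    if fact_table not in ("FACT_GOOD", "FACT_NULL_KEY"):
        raise ValueError(f"unexpected table {fact_table!r}")
    total = conn.execute(f"SELECT COUNT(*) FROM {fact_table}").fetchone()[0]
    joined = conn.execute(
        f"SELECT COUNT(*) FROM {fact_table} f "
        "JOIN DIM_CUSTOMER c ON c.CUSTOMER_KEY = f.CUSTOMER_KEY"
    ).fetchone()[0]
    return total, joined


def average_known_amount(conn):
    # AVG skips NULLs, so the unknown amount does not drag the average down
    # the way a 0 placeholder would.
    return conn.execute("SELECT AVG(AMOUNT) FROM FACT_GOOD").fetchone()[0]
```

Here row_counts(conn, "FACT_GOOD") keeps all three rows after the join, while the NULL-key table silently loses one; the average over FACT_GOOD is 15.0, whereas storing 0 for the unknown amount would have given 10.0.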
I am working on a data warehouse project where the customer dimension table is larger than the fact table. The dimension and fact tables are created from a CRM system.
The fact table tracks activities such as a letter being sent to a customer or a customer calling for assistance. Half of the customers have no activities, and the remaining customers have very few; most customers who have activities have a single activity.
I am not sure if a star schema is the best solution for this project. Have you worked on similar projects, and what was the solution?
If many of the dimension members are not related to any facts at all, I would suggest filtering out unused dimension members during the ETL process.
So you would do something like:
SELECT Customer_ID, Name FROM DIL.Customers
WHERE Customer_ID IN
(SELECT Customer_ID FROM DIL.Calls)
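To try the filter on toy data, here is a sketch using SQLite through Python's sqlite3. A second in-memory database is attached under the name DIL so the DIL.Customers / DIL.Calls names can be used as written; the Calls columns and all rows are assumptions, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database under the schema name DIL.
    conn.execute("ATTACH DATABASE ':memory:' AS DIL")
    conn.execute("CREATE TABLE DIL.Customers (Customer_ID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE DIL.Calls (Call_ID INTEGER PRIMARY KEY, Customer_ID INTEGER)")
    conn.executemany(
        "INSERT INTO DIL.Customers VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di")],
    )
    # Customer 1 has two calls, customer 3 has one, customers 2 and 4 have none.
    conn.executemany("INSERT INTO DIL.Calls VALUES (?, ?)", [(10, 1), (11, 1), (12, 3)])
    return conn


def customers_with_activity(conn):
    # IN keeps each customer once, however many calls they have.
    return conn.execute(
        "SELECT Customer_ID, Name FROM DIL.Customers "
        "WHERE Customer_ID IN (SELECT Customer_ID FROM DIL.Calls) "
        "ORDER BY Customer_ID"
    ).fetchall()
```

Because IN is a semi-join, customer 1 is returned once even though it has two calls, so the filtered dimension keeps one row per customer.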
I wish to change the default aggregation from SUM to SUM on Distinct ID Values.
This is the current behaviour:
ID  Amount
1   $10
1   $10
2   $20
3   $30
3   $30
Sum Total = $90
By default, I am getting a sum of $90. I wish to do the sum on distinct ids and get a value of $60. How would I modify the default Aggregation Behavior to achieve this result?
Design your data as a many-to-many relationship. Create one table/view (the main fact table) having one record per ID and the amount column from the data shown in your question, and one table/view having one record per row of the data shown in your question (presumably that data has another column; otherwise it would not make sense to have the data as shown). This will be the m2m dimension table. Then create a bridge table/view containing the id of the m2m dimension table and your ID column.
Then create the following AS objects: a measure group from the main fact table, and a dimension on the ID column of the same table (if there is no other column that would make a dimension table meaningful; if there is, you would be better off with a separate dimension table having ID as its primary key). Create a dimension from the m2m dimension table, and a measure group from the bridge table containing only the invisible measure "count". Finally, on the "Dimension Usage" tab of Cube Designer, set the relationship between the m2m dimension and the main measure group to many-to-many via the bridge measure group.
See http://technet.microsoft.com/en-us/library/ms170463.aspx for a tutorial on many-to-many relationships.
Let's assume we have a table USERS like this:
Columns: ID, username
user1
user2
user3
Users can have Bills (user -> has_many -> bill relation).
Table BILLS like:
Columns: ID, user_id
1
2
2
We also have Products, and every Product can be associated with ONLY ONE bill (product -> has_one -> bill relation).
Table PRODUCTS like:
Columns: ID, bill_id
2
3
1
So, as you can see, a user can have many products (through bills).
My question:
Would it be correct, in terms of database normalization, to add a second foreign key named user_id to the PRODUCTS table so that I can quickly select all of a user's products from PRODUCTS? Or is that incorrect, and should I use a JOIN to select all of a user's products?
P.S. Sorry for the rough table drawings :)
I would rather go with the normalized view (where you DO NOT have a user_id in the products table).
The only time I would ever consider this, as a last resort, is if performance REALLY requires it.
It is usually better to normalize data so as not to duplicate information and retain data consistency.
However there are exceptions required in real life systems, often for performance reasons when dealing with huge volumes of data.
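For the normalized option, a user's products come from a single join through BILLS. The sketch below uses SQLite through Python's sqlite3; the ID values 1, 2, 3 are assumed because the question's drawings don't show them, and the indexes on the FK columns are a common (hedged) way to keep such joins fast before reaching for a denormalized user_id.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE USERS (ID INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE BILLS (
        ID INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES USERS(ID)
    );
    CREATE TABLE PRODUCTS (
        ID INTEGER PRIMARY KEY,
        bill_id INTEGER NOT NULL REFERENCES BILLS(ID)
    );
    -- Indexes on the FK columns support the join without denormalizing.
    CREATE INDEX idx_bills_user ON BILLS(user_id);
    CREATE INDEX idx_products_bill ON PRODUCTS(bill_id);
    INSERT INTO USERS VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');
    INSERT INTO BILLS VALUES (1, 1), (2, 2), (3, 2);
    INSERT INTO PRODUCTS VALUES (1, 2), (2, 3), (3, 1);
    """)
    return conn


def product_ids_for_user(conn, user_id):
    # PRODUCTS -> BILLS -> user: no user_id column needed on PRODUCTS.
    rows = conn.execute(
        "SELECT p.ID FROM PRODUCTS p JOIN BILLS b ON b.ID = p.bill_id "
        "WHERE b.user_id = ? ORDER BY p.ID",
        (user_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

With this data, user 2 owns bills 2 and 3 and therefore products 1 and 2, while user 3 has no bills and gets an empty list. Because the user is derived through the bill, it can never disagree with the bill the way a duplicated user_id column could.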