Creation of dimension and fact tables from source tables - sql

I have imported my flat files into SQL Server 2012 and created a few tables (source tables). I need to build a cube in SSAS, but it seems I first need to make "dimension" and "fact" tables with proper PK/FK relationships. Could someone tell me whether I need to:
1. Create empty dimABC, dimXYZ tables manually, with a PK identified?
2. Copy data from the source tables (imported above) into these new dimXXX tables through some SQL query?
3. Then create a new factXXX table and copy the required facts (data) from the source tables above?
Then I would use these tables during the cube build process.
I would appreciate your help in clarifying my steps 1, 2, 3.

You're pretty close on your steps. It sounds like you are new to data warehousing? You might want to check out The Kimball Group's Data Warehouse Toolkit or website to ensure you get your dimensions and facts built correctly.
You have your data in "staging", meaning you have imported your raw data into SQL Server. You will need to create dimension tables with surrogate keys (just auto-incremented identity values) and then create fact tables that use these surrogate keys as foreign keys. You could probably do all of this in straight SQL, but this is what SSIS is for. Once you have your facts and dimensions defined and populated, best practice is to create views over them to use in the DSV for your cube.
Once your views are created and added to the DSV in SSAS, you will build the dimensions and facts and then relate them in the cube. If you define the relationships in the DSV, the relationships will be mostly populated in the Dimension Usage tab for you.
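As a rough illustration of the "straight SQL" route, here is a minimal T-SQL sketch. Every object name in it (StagingSales, DimCustomer, FactSales, the views and all columns) is made up for the example and should be replaced with your own; it also assumes the staging data is clean enough to load directly.

```sql
-- Hypothetical staging table standing in for an imported flat file.
CREATE TABLE dbo.StagingSales (
    CustomerCode VARCHAR(20),
    CustomerName VARCHAR(100),
    OrderDate    DATE,
    Amount       DECIMAL(12, 2)
);

INSERT INTO dbo.StagingSales (CustomerCode, CustomerName, OrderDate, Amount)
VALUES ('C001', 'Acme',   '2015-01-05', 100.00),
       ('C001', 'Acme',   '2015-01-06',  50.00),
       ('C002', 'Globex', '2015-01-05',  75.50);

-- Step 1: dimension table with an auto-incremented surrogate key as PK.
CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    CustomerCode VARCHAR(20)  NOT NULL UNIQUE,  -- business key from the source
    CustomerName VARCHAR(100) NOT NULL
);

-- Step 2: one dimension row per distinct customer found in staging.
INSERT INTO dbo.DimCustomer (CustomerCode, CustomerName)
SELECT CustomerCode, MAX(CustomerName)
FROM dbo.StagingSales
GROUP BY CustomerCode;

-- Step 3: fact table that references the dimension by surrogate key.
CREATE TABLE dbo.FactSales (
    CustomerKey INT NOT NULL
        CONSTRAINT FK_FactSales_DimCustomer
        REFERENCES dbo.DimCustomer (CustomerKey),
    OrderDate   DATE           NOT NULL,
    Amount      DECIMAL(12, 2) NOT NULL
);

-- Look up each surrogate key through the business key while loading facts.
INSERT INTO dbo.FactSales (CustomerKey, OrderDate, Amount)
SELECT d.CustomerKey, s.OrderDate, s.Amount
FROM dbo.StagingSales AS s
JOIN dbo.DimCustomer  AS d ON d.CustomerCode = s.CustomerCode;
GO  -- batch separator (SSMS/sqlcmd); CREATE VIEW must start its own batch

-- Views to reference from the DSV instead of the base tables.
CREATE VIEW dbo.vDimCustomer AS
SELECT CustomerKey, CustomerCode, CustomerName
FROM dbo.DimCustomer;
GO

CREATE VIEW dbo.vFactSales AS
SELECT CustomerKey, OrderDate, Amount
FROM dbo.FactSales;
GO
```

In a real load you would normally add a date dimension as well and handle customers that already exist in the dimension; that is the kind of work SSIS makes easier.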

Related

How can I find the relationship between dimension and fact tables when there is no FK?

I need to create a bus matrix, and in order to do that I need to know which fact table has relationships with which dimension tables.
Unfortunately, in this new project I'm on, there seem to be no FKs (crazy, I know).
My idea is to go through the ETL queries and check the joins between each fact table and the dimension tables.
What I'm worried about is that there might be more relationships that are not included in the ETL queries... any advice?
If foreign keys are declared, you can use the system metadata tables to list the foreign key references. On DB2, for example:
SELECT tbname, pkcolnames, reftbname, fkcolnames, colcount
FROM SYSIBM.SYSRELS B;
If the database does not have properly declared foreign key relationships, then the database does not have the information you are looking for.
Assuming the DB holds no information about the FKs (or information that would help you derive them, like identical column names) then, as you mentioned, examining the ETL code used to load each fact table is probably the only other way of doing this. The ETL must be running a lookup on each dimension to get the PK to insert into the fact record, so the information will be there.
There shouldn't be any relationships involving facts that you couldn't determine with this approach. There may be additional relationships between dimensions (bridge tables, more complex SCD types, etc.), but once you have sorted out the fact relationships, what remains should be a small enough subset to resolve manually (i.e. by intelligent guesswork).
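Once the ETL code (or a matching column name) suggests a candidate relationship, a quick data check can back up the guess. The sketch below is plain SQL against two made-up tables, FactOrders and DimProduct, with no declared foreign key; substitute your own table and column names.

```sql
-- Hypothetical tables with no foreign key declared between them.
CREATE TABLE DimProduct (
    ProductKey  INT NOT NULL PRIMARY KEY,
    ProductName VARCHAR(100)
);

CREATE TABLE FactOrders (
    OrderId    INT NOT NULL PRIMARY KEY,
    ProductKey INT,
    Amount     DECIMAL(12, 2)
);

INSERT INTO DimProduct (ProductKey, ProductName)
VALUES (1, 'Widget'), (2, 'Gadget');

INSERT INTO FactOrders (OrderId, ProductKey, Amount)
VALUES (10, 1, 5.00), (11, 2, 7.50), (12, 2, 3.25), (13, 99, 1.00);

-- Candidate relationship: FactOrders.ProductKey -> DimProduct.ProductKey.
-- COUNT(column) ignores NULLs, so MatchedKeys counts only the fact rows
-- whose key was found in the dimension.
SELECT COUNT(*)            AS FactRows,
       COUNT(f.ProductKey) AS NonNullKeys,
       COUNT(d.ProductKey) AS MatchedKeys
FROM FactOrders AS f
LEFT JOIN DimProduct AS d ON d.ProductKey = f.ProductKey;
```

If MatchedKeys equals NonNullKeys, the data is at least consistent with the relationship. With the sample rows above the key 99 has no match, so MatchedKeys comes out one lower than NonNullKeys. A clean result does not prove the relationship is intended, so treat it as supporting evidence for the bus matrix rather than confirmation.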

Data Warehouse Architecture Modeling

I'm trying to architect a data warehouse using the star schema model... any ideas would be appreciated.
What should I do to create a star schema? Some say that I should have a linking table, with DimProjects going to the fact tables. What about project hours? What is the right approach here, or do I need other tables to link? Employees can work on multiple projects, projects require man-hours, etc.
What is the best approach to modeling this?
So far I have tables:
[CODE]
Dimension Tables    Measure Tables
----------------    --------------
DimEmployee         FactCRM
DimProjects         FactTargets
DimSalesDetails     FactRevenue
DimAccounts
DimTerritories
DimDate
DimTime
[/CODE]
Dimensions in a data warehouse schema are independent entities. For example:
Dim_Employee
EmpId (PK)
Name
Address, etc.
All the other dimensions follow the same pattern.
Each dimension's key is linked to your fact tables. In the case above, FactCRM would include only CRM-related measures and would be linked to the specific dimensions it needs, depending on the requirements.
Without knowing the columns, no one can tell what you actually need. Also remember that linking a dimension to a fact table is already a partial star schema in itself, so that does not cause any issues. The only caveat is that if your dimensions are themselves normalized, the schema becomes a snowflake.
One more point about facts: if you want to derive facts from other existing facts, you will also need to link the fact tables, for example through a unique fact ID. A schema with several related fact tables that share dimensions is known as a fact constellation, so the result would be a star/snowflake schema with a fact constellation.
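For the project hours part of the question, one common option (an assumption on my side, since the actual columns are unknown) is a fact table with one row per employee, per project, per day. The fact table then resolves the many-to-many relationship between employees and projects by itself. FactProjectHours and all column names below are hypothetical; only DimEmployee, DimProjects and DimDate come from the question.

```sql
CREATE TABLE DimEmployee (
    EmployeeKey  INT NOT NULL PRIMARY KEY,
    EmployeeName VARCHAR(100) NOT NULL
);

CREATE TABLE DimProjects (
    ProjectKey  INT NOT NULL PRIMARY KEY,
    ProjectName VARCHAR(100) NOT NULL
);

CREATE TABLE DimDate (
    DateKey      INT  NOT NULL PRIMARY KEY,  -- e.g. 20150105
    CalendarDate DATE NOT NULL
);

-- Grain: one row per employee, per project, per day.
CREATE TABLE FactProjectHours (
    EmployeeKey INT NOT NULL REFERENCES DimEmployee (EmployeeKey),
    ProjectKey  INT NOT NULL REFERENCES DimProjects (ProjectKey),
    DateKey     INT NOT NULL REFERENCES DimDate (DateKey),
    HoursWorked DECIMAL(5, 2) NOT NULL,
    PRIMARY KEY (EmployeeKey, ProjectKey, DateKey)
);

INSERT INTO DimEmployee (EmployeeKey, EmployeeName)
VALUES (1, 'Ann'), (2, 'Bob');

INSERT INTO DimProjects (ProjectKey, ProjectName)
VALUES (1, 'Website'), (2, 'Migration');

INSERT INTO DimDate (DateKey, CalendarDate)
VALUES (20150105, '2015-01-05');

-- Ann splits her day across two projects; Bob works on one.
INSERT INTO FactProjectHours (EmployeeKey, ProjectKey, DateKey, HoursWorked)
VALUES (1, 1, 20150105, 3.00),
       (1, 2, 20150105, 5.00),
       (2, 1, 20150105, 8.00);

-- Total hours per project, across all employees.
SELECT p.ProjectName, SUM(f.HoursWorked) AS TotalHours
FROM FactProjectHours AS f
JOIN DimProjects AS p ON p.ProjectKey = f.ProjectKey
GROUP BY p.ProjectName;
```

Whether daily grain is right depends on how hours are actually recorded in your source system.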

Pentaho with Saiku Plugin Reference same Dimension multiple times

I am new to the BI realm so forgive me for any mistakes in my understanding. I am designing a Cube using Pentaho with Saiku and have created a basic star schema to support it. My fact table consists of a few facts which are numerical values representing hours of work and cost of work and surrogate keys to the dimension tables.
I need to be able to perform sorting, filtering and querying on several dates related to my fact records. I have created a date dimension to accomplish this. The problem I am having is relating my fact table to this dimension multiple times. Using Schema Workbench I managed to create multiple DimensionUsage records for each of my surrogate keys with different names each pointing to my date dimension.
Upon importing this Mondrian file back into Pentaho and creating a new Saiku Query I am presented with my list of measures and the related dimensions. The issue is that all my references to my date dimension are named the same, the name of the dimension table rather than the name I specified in Schema Workbench. I am unable to tell which relation is for which date field. Any ideas of where I may have gone wrong or is this a limitation of the products I am using?
I am using Pentaho CE 7.1.

Star Schema Design / best practice

I am working with a system, which has 4 databases:
Account (Storing bank accounts, transactions, etc)
Client (Client related info)
Credit (getting rates from 3rd party system)
Quality (Further internal calculation)
I want to create four fact tables, one for each database. For example, I will have an Account fact table with ClientAccount, Transaction and Provider as its dimension tables, and three similar fact tables for the other databases.
My question is: does it make sense to create each fact table in its corresponding database, i.e. create the Account fact and dimension tables in the Account database? Or is it better to create a new database for the whole star schema and put all the dimension and fact tables there?
Without knowing too much about the system, I would suggest these are dimension tables rather than fact tables.
A dimension table represents an entity or an object that you can use to construct a fact. Accounts and clients seem like a good fit for this. I'm not sure what Credit and Quality are but they may be dimensions as well.
Your fact table should represent transaction-like records. This could be sales, transactions, phone calls, or whatever your data warehouse is reporting on. This fact table would then have foreign keys to each of the dimension tables.
Regarding a single or multiple databases: I would suggest storing it in a single database. It's easier to use that way, and you don't have to worry about database links when querying your data. Your ETL process for populating these fact and dimension tables can extract the data from these four databases and load it into one database, and from there, you can build the cubes in a single database.
Unless your data volume is very small, your data warehouse should be housed in a separate database from the transactional data. A DW has a different usage pattern (OLTP vs OLAP) and will generally have a different maintenance window.
I would recommend creating all of your Dims and Facts in a single dedicated DW database. I can't think of any benefit to separating them and it would reduce your DBA overhead by not having extra databases to manage/secure/audit/document.
As for Dimensions vs Facts, data from the OLTP Account table would be used to create both a Dim and a Fact. DimAccount at the very least would be a degenerate dimension containing just the account number. You'd have to review your data to determine whether any of the other columns are generic attributes of the Account specifically. FactAccount would contain references to the other Dimensions (DimAccountType, DimCustomer, DimLocation, etc.).
Think of the dimensions as the values from lookup tables/dropdown lists, which exist prior to any events happening. For example, a bank can offer Checking & Savings accounts, even if they do not yet have any accounts.
Facts document an event. When an account is created, the fact record will reference all of the dimensions that describe the event, and record the measurable values associated with the event, if any.
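To make the lookup-versus-event distinction concrete, here is a small sketch in plain SQL. DimAccountType, DimCustomer and FactAccount are the names used above; every column, including OpeningBalance, is an assumption made for the example.

```sql
-- Dimensions exist before any event has happened.
CREATE TABLE DimAccountType (
    AccountTypeKey  INT NOT NULL PRIMARY KEY,
    AccountTypeName VARCHAR(50) NOT NULL
);

CREATE TABLE DimCustomer (
    CustomerKey  INT NOT NULL PRIMARY KEY,
    CustomerName VARCHAR(100) NOT NULL
);

-- The bank offers both account types even with zero accounts opened.
INSERT INTO DimAccountType (AccountTypeKey, AccountTypeName)
VALUES (1, 'Checking'), (2, 'Savings');

INSERT INTO DimCustomer (CustomerKey, CustomerName)
VALUES (1, 'Jane Doe');

-- The fact row documents the event: it references the dimensions that
-- describe it and stores the measurable value (an opening balance here).
CREATE TABLE FactAccount (
    AccountNumber  VARCHAR(20) NOT NULL PRIMARY KEY,
    AccountTypeKey INT NOT NULL REFERENCES DimAccountType (AccountTypeKey),
    CustomerKey    INT NOT NULL REFERENCES DimCustomer (CustomerKey),
    OpeningBalance DECIMAL(12, 2) NOT NULL
);

-- Event: Jane Doe opens a savings account.
INSERT INTO FactAccount (AccountNumber, AccountTypeKey, CustomerKey, OpeningBalance)
VALUES ('ACC-0001', 2, 1, 250.00);

-- Reporting joins the fact back to the descriptive dimensions.
SELECT t.AccountTypeName,
       COUNT(*)              AS AccountsOpened,
       SUM(f.OpeningBalance) AS TotalOpeningBalance
FROM FactAccount AS f
JOIN DimAccountType AS t ON t.AccountTypeKey = f.AccountTypeKey
GROUP BY t.AccountTypeName;
```

Here the account number lives directly on the fact row, which is one way to read the "degenerate dimension" remark above.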

fact tables (Analysis Services)

I was just wondering if there could be any fact table, the keys of which don't belong to any of the dim tables? However, the fact table seems to contain the dim data.
The reason I came up with this question is that I was looking into a package which pulls data from a dim table and a fact table, manipulates it and then dumps it into the fact table. But when I tried to find any dependencies on the fact table, I found none: in the DSV Add/Remove Tables dialog box I added the fact table, and when I clicked on related tables, nothing came up.
And my claim is that the fact table gets some of its data from the dim tables.
Correct me if I am wrong.
Does your Fact table have columns which contain dimension keys, but are not constrained with a foreign key? I assume SSAS uses the foreign keys to identify related tables, so in this case, it wouldn't detect those tables. You can add related tables manually.
Another possibility is that the Fact table contains all the dimensions internally in a denormalized form. Rather than having dimension tables and keys to dimension members, all the data is stored in string form in the Fact table. If this is the case, you can create dimensions from the columns in the Fact table.
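If the fact table really is denormalized like that, the dimension can be derived with a DISTINCT query over the descriptive columns, used either as a named query in the DSV or wrapped in a view. This is only a sketch: FactCalls and its columns are invented for the example.

```sql
-- Hypothetical fact table carrying dimension data as plain strings.
CREATE TABLE FactCalls (
    CallId          INT NOT NULL PRIMARY KEY,
    CallerName      VARCHAR(100) NOT NULL,
    CallerRegion    VARCHAR(50)  NOT NULL,
    DurationSeconds INT          NOT NULL
);

INSERT INTO FactCalls (CallId, CallerName, CallerRegion, DurationSeconds)
VALUES (1, 'Acme',   'North', 120),
       (2, 'Acme',   'North',  45),
       (3, 'Globex', 'South', 300);

-- One row per distinct caller: the members of a "Caller" dimension.
-- With no surrogate key available, CallerName would have to serve as
-- the dimension key, which assumes names are unique per caller.
SELECT DISTINCT CallerName, CallerRegion
FROM FactCalls;
```

The fact table (or a view over it) would then relate to this derived dimension through the CallerName column.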
Is the fact table a table or named query?