I am new to the BI realm, so forgive me for any mistakes in my understanding. I am designing a cube using Pentaho with Saiku and have created a basic star schema to support it. My fact table consists of a few facts, numerical values representing hours of work and cost of work, plus surrogate keys to the dimension tables.
I need to be able to perform sorting, filtering and querying on several dates related to my fact records, and I have created a date dimension to accomplish this. The problem I am having is relating my fact table to this dimension multiple times. Using Schema Workbench I managed to create multiple DimensionUsage records, one for each of my surrogate keys, each with a different name and each pointing to my date dimension.
Upon importing this Mondrian file back into Pentaho and creating a new Saiku query, I am presented with my list of measures and the related dimensions. The issue is that all my references to my date dimension are named the same, after the dimension table rather than the names I specified in Schema Workbench, so I am unable to tell which relation is for which date field. Any idea where I may have gone wrong, or is this a limitation of the products I am using?
I am using Pentaho CE 7.1
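For context, the underlying star schema being described would look something like the sketch below (the table and column names are hypothetical). Each date surrogate key on the fact table references the same date dimension, which is why the cube needs a separate, distinctly named DimensionUsage per key.
[CODE]
-- Sketch only: hypothetical names. Several date surrogate keys on one
-- fact table, all referencing the same shared date dimension.
CREATE TABLE dim_date (
    date_key  INT PRIMARY KEY,   -- e.g. 20170131
    full_date DATE
);

CREATE TABLE fact_work (
    hours_of_work      DECIMAL(10, 2),
    cost_of_work       DECIMAL(12, 2),
    created_date_key   INT NOT NULL REFERENCES dim_date (date_key),
    completed_date_key INT NOT NULL REFERENCES dim_date (date_key)
);
[/CODE]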
Search one table (Encounters Table) for all records - Insert Totals into a Providers Table
The Encounters table lists every hospital encounter over time. Each Encounter record has a Provider; each Provider also appears once as a record in the Providers table (229 providers at present).
The Encounters table lists the date and the provider, so one provider may have 15 to 20 encounter records for a given date.
In the Providers table I have been trying to get counts in different formula fields that list the number of encounters for the provider in a given time frame, for example the current week, the past week, last month, two months ago, etc. The Providers table would primarily have a key field of Provider, which would relate to the Provider in the Encounters table.
The software used is Quickbase, a cloud-based database used by the hospital I work for.
I have been unable to use a one-to-many relationship to pull the Provider from the Encounters table into the Providers table; the column generated is blank. So when I try to pull encounter counts into formula number fields like current_week, last_week, last_month, etc., the counts just say 0.
The Quickbase help tells me that I need to populate the related Provider fields so the names appear, but I'm thinking I can't just use a file import (i.e. a CSV file) to fill in the Encounters table, since it has thousands of records and there are only 229 providers.
Queries can be done, and since SQL handles this type of work, that seems like it may be the best way to handle this. My SQL isn't that good, but I think I could write a formula that handles this with two tables. I am just not certain how to fill the encounter count fields.
This is a bit of a hurdle, and perhaps because I'm using a proprietary software package (Quickbase) it may not be possible.
I do know that queries are used in Quickbase, so the logic of a SQL statement should work in the dialog box used to write formulas in QB.
Any help would be appreciated.
If this is the only place you need to see the number of Encounters per Provider in a time frame, I'd try to make it work with a Summary Field in Quickbase and apply a filter for a date range (I'm assuming there is an existing relationship between Providers and Encounters where Providers is the parent).
You can get some good tips on how to configure that field in the Quickbase Community https://community.quickbase.com/home
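For reference, since the question mentions that the logic of a SQL statement captures what is wanted, the count a summary field with a date-range filter would produce looks roughly like the sketch below. The Providers/Encounters column names are hypothetical and the windows are rolling seven-day ranges rather than calendar weeks; Quickbase itself does not accept SQL, so this is only an illustration of the logic.
[CODE]
-- Sketch only: hypothetical column names, rolling 7-day windows.
-- Not runnable inside Quickbase; T-SQL syntax shown for illustration.
SELECT
    p.provider_id,
    p.provider_name,
    COUNT(CASE WHEN e.encounter_date >= DATEADD(DAY, -7, GETDATE())
               THEN 1 END) AS current_week,
    COUNT(CASE WHEN e.encounter_date >= DATEADD(DAY, -14, GETDATE())
                AND e.encounter_date <  DATEADD(DAY, -7, GETDATE())
               THEN 1 END) AS last_week
FROM Providers p
LEFT JOIN Encounters e
       ON e.provider_id = p.provider_id
GROUP BY p.provider_id, p.provider_name;
[/CODE]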
I'm trying to architect a data warehouse using the star schema model... any ideas would be appreciated.
Any idea what I should do to create a star schema? Some say that I should have a linking table with DimProjects going to the fact tables. What about project hours? What is the right approach to this, or do I need other tables to link? Employees can work on multiple projects, projects require man-hours, etc.
What is the best approach on modeling?
So far I have tables:
[CODE]
Dimension Tables    Measure Tables
----------------    --------------
DimEmployee         FactCRM
DimProjects         FactTargets
DimSalesDetails     FactRevenue
DimAccounts
DimTerritories
DimDate
DimTime
[/CODE]
Dimensions in a data warehouse schema are independent entities, for example:
Dim_Employee
    EmpID (PK)
    Name
    Address
...and likewise for all the other dimensions, with each dimension key linked to your fact table. In the above case, FactCRM would include only CRM-related measures and would be linked to its specific dimensions, depending upon the requirements.
Without knowing the columns, no one can tell exactly what you need. Also remember that linking dimensions to a fact table is essentially a star schema in itself, so that does not lead to any issues. The only thing is that if your dimensions are themselves normalized, the schema becomes a snowflake.
Another thing on the fact side: if you want to derive or manipulate other facts based on some existing facts, then you have to link the fact tables as well with a unique fact ID. This is called a fact constellation, and the schema then becomes a star/snowflake schema with a fact constellation.
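To make that concrete, here is a minimal T-SQL sketch of one dimension and one fact table from the list above. The EmpID/Name/Address columns come from the Dim_Employee example; DateKey and InteractionCount are hypothetical placeholders, since the real columns depend on your requirements.
[CODE]
-- Sketch only: surrogate keys on the dimensions, foreign keys on the fact.
CREATE TABLE DimEmployee (
    EmpID   INT IDENTITY(1,1) PRIMARY KEY,   -- surrogate key
    Name    NVARCHAR(100),
    Address NVARCHAR(200)
);

CREATE TABLE DimDate (
    DateKey  INT PRIMARY KEY,                -- e.g. 20170131
    FullDate DATE
);

-- FactCRM holds only CRM-related measures, linked to its dimensions.
CREATE TABLE FactCRM (
    FactCRMID        INT IDENTITY(1,1) PRIMARY KEY,
    EmpID            INT NOT NULL REFERENCES DimEmployee (EmpID),
    DateKey          INT NOT NULL REFERENCES DimDate (DateKey),
    InteractionCount INT                     -- hypothetical measure
);
[/CODE]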
I need to associate multiple fact tables with a Mondrian cube, but Schema Workbench doesn't allow me to do so. How can I achieve this?
You cannot add multiple fact tables to a single cube. Schema Workbench expects a star schema in which there is just one fact table. If you need to combine information from two fact tables on the same subject, with the same or different granularity, you must create a virtual cube. It is easy and very convenient. You can refer to the following documentation:
https://help.pentaho.com/Documentation/6.0/0N0/020/040/000
I have imported my flat files into SQL Server 2012 and created a few tables (source tables). I need to build a cube in SSAS, but it seems I first need to create "dimension" and "fact" tables with proper PK/FK relations. Could someone tell me whether I need to:
1. create empty dimABC, dimXYZ tables manually with PKs identified?
2. copy data from the source tables (imported above) into these new dimXXX tables through some SQL query?
3. then create a new factXXX table and copy the required facts (data) from the source tables above?
Then I would use these tables during the cube build process.
I appreciate your help in clarifying steps 1, 2 and 3.
You're pretty close on your steps. It sounds like you are new to data warehousing? You might want to check out The Kimball Group's Data Warehouse Toolkit or website to ensure you get your dimensions and facts built correctly.
You have your data in "staging" meaning you have imported your raw data into SQL Server. You will need to create dimension tables with surrogate keys (just auto-incremented identity values) and then create fact tables that use these surrogate keys as foreign keys. You could probably do all of this in straight SQL, but this is what SSIS is for. Once you have your facts and dimensions defined and populated, best practice is to create views to use in the DSV for your cube.
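As a rough illustration of those steps, the straight SQL could look like the sketch below. The dimABC/factXXX names follow the question; the stgSource staging table and its columns are hypothetical, and in practice SSIS packages would usually handle these loads.
[CODE]
-- Step 1: dimension table with an auto-incremented surrogate key.
CREATE TABLE dimABC (
    ABCKey  INT IDENTITY(1,1) PRIMARY KEY,   -- surrogate key
    ABCCode NVARCHAR(20),                    -- business key from the source
    ABCName NVARCHAR(100)
);

-- Step 2: populate the dimension from the imported source table.
INSERT INTO dimABC (ABCCode, ABCName)
SELECT DISTINCT SourceCode, SourceName
FROM stgSource;

-- Step 3: fact table referencing the dimension's surrogate key.
CREATE TABLE factXXX (
    ABCKey INT NOT NULL REFERENCES dimABC (ABCKey),
    Amount DECIMAL(18, 2)
);

INSERT INTO factXXX (ABCKey, Amount)
SELECT d.ABCKey, s.Amount
FROM stgSource s
JOIN dimABC d ON d.ABCCode = s.SourceCode;
GO

-- Expose views for the cube's data source view (DSV).
CREATE VIEW vw_dimABC  AS SELECT ABCKey, ABCCode, ABCName FROM dimABC;
GO
CREATE VIEW vw_factXXX AS SELECT ABCKey, Amount FROM factXXX;
GO
[/CODE]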
Once you have your views populated and in your DSV in SSAS, you will build the dimensions and facts and then relate them in the cube. If you define the relationships in the DSV, the relationships will be mostly populated in the Dimension usage tab for you.
Why is it not necessary to define a hierarchy using the Date attribute from a date/time table?
The Analysis Services project seems to want me to create a hierarchy within my dimension -- the tooltip says "Create hierarchies in non-parent child dimensions". I really had none that came to mind, so I tried adding the PK Date attribute from my Time table and creating a hierarchy with that.
When I do this, I get the error "Errors in the high-level relational engine. The 'dbo_Orders' table that is required for a join cannot be reached based on the relationships in the data source view."
I noticed that the AdventureWorks sample never uses Date in a hierarchy. Why is this?