How to model a table with few rows in star schema manner? - pentaho

I have two dimensions: location and date. There's one fact table (x) consisting of measures with respect to location and date. Now I have a requirement to include target KPI measures for each of the 60 locations in the location dimension table, so each measure in fact table (x) has a benchmark measure (a KPI). I cannot add them to fact table (x) because the KPI values would repeat down the whole depth of the table.
How do I re-model the star schema to incorporate this requirement?

You could have two dimensions (date and location) and two fact tables (X and target KPI).
This is usually done this way because the fact table with the actual measures has more dimensions and rows, whereas the target/forecast typically carries less detail.
E.g. a supermarket chain may set monthly targets for each store, while its sales data is by store, day, and product.
My suggestion is to keep two separate fact tables.
Then, if you will often use measures (from table X) and target KPIs together (showing deltas, delta percentages, etc.), you can consider creating a third fact table (or a view, if performance allows it) to avoid joining the other two fact tables every time you need that.
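The view-based approach can be sketched in plain SQL (here via SQLite for a runnable example): aggregate the detailed fact up to the target's grain (location and month), then join. All table and column names below are illustrative, not from the original post.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Detailed fact: one row per location and day.
CREATE TABLE fact_x (location_id INT, sale_date TEXT, sales REAL);
-- Target fact: one row per location and month (coarser grain).
CREATE TABLE fact_target (location_id INT, month TEXT, target_sales REAL);

INSERT INTO fact_x VALUES (1, '2024-01-05', 100), (1, '2024-01-20', 150);
INSERT INTO fact_target VALUES (1, '2024-01', 300);

-- Roll the detailed fact up to the target's grain, then join once,
-- so reports don't have to repeat this join every time.
CREATE VIEW v_sales_vs_target AS
SELECT t.location_id, t.month,
       SUM(x.sales)                  AS actual_sales,
       t.target_sales,
       SUM(x.sales) - t.target_sales AS delta
FROM fact_target t
JOIN fact_x x
  ON x.location_id = t.location_id
 AND strftime('%Y-%m', x.sale_date) = t.month
GROUP BY t.location_id, t.month, t.target_sales;
""")

row = con.execute(
    "SELECT actual_sales, target_sales, delta FROM v_sales_vs_target"
).fetchone()
print(row)  # (250.0, 300.0, -50.0)
```

The key point is that the join happens at the coarser grain of the target table; the detailed fact is aggregated up to it first.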


Multi-dimensional to Tabular mapping one column to several

I have the following mapping within an old multi-dimensional cube:
Each one of these 3 INVOICE_FACT columns maps to the same sman_key column within the salesman table. I know that in Tabular only one relationship between a given pair of tables can be active at a time.
How can I replicate this relationship within tabular?
The two standard approaches to this issue are to either create multiple relationships between the two tables (only one relationship can be active at a time) or to import the salesman table multiple times with different names (e.g. salesman, supervisor, driver).
Without knowing more about the way that your data model is being used, it is hard to recommend the approach to use, but I tend to favour importing the salesman table multiple times.
If you use inactive relationships, a key function to know is USERELATIONSHIP(), which specifies that a particular calculation should be evaluated using an inactive relationship.
https://www.sqlbi.com/articles/userelationship-in-calculated-columns/ provides some examples of this technique where the relationships are to a date dimension, and you want measures that accumulate sales amounts by Order Date, Due Date, or Ship Date (as per AdventureWorks), e.g.
SalesByDueDate :=
CALCULATE (
    SUM ( FactInternetSales[SalesAmount] ),
    USERELATIONSHIP (
        FactInternetSales[DueDateKey],
        DimDate[DateKey]
    )
)

SSAS - relationship/granularity

I have 2 fact tables with a measure group each: Production and Production Orders. Production holds information at a lower granularity (the component level); Production Orders holds information at a higher level (the order level, with header quantities etc.).
I have created a surrogate key link between the two tables on productionorderid. As soon as I add Prod ID (from productiondetailsdim) to the pivot table, it blanks out the actual qty (from the prod order measure group), and I cannot combine the quantities from the two measure groups.
How can I design the correct relationship between the two? Please see my dim usage diagram. Production Details is the dim that links the two fact tables, at the moment DimProductionDetails is in a fact relationship with Production. I'm not sure what the relationship should be with Production Order (it is currently many to many).
Please see example data between the two tables:
I have to be able to duplicate this behaviour:
Do you want the full actual qty from prod order measure group to repeat next to each product? If so a many-to-many relationship is right. I suspect once I explain how that many-to-many works you will spot the problem.
When you slice full actual qty from the prod order measure group by product from the Production Details dimension, it does a runtime join between the two measure groups on the common dimensions. So, for example, if order 245295 has a date of 1/1/2015 while the production details for order 245295 have dates of 1/8/2015, then the runtime join will lose rows for that order and actual qty will show as null.
Compare all the dimensions used on both measure groups and ensure all rows for the same order have the same dimension keys for those common dimensions. If, for example, the dates differ, then create a named query in the DSV that selects just the dimension columns from the production fact table which match the order fact table. Then create a new measure group off that named query and use it as the intermediate measure group in your many-to-many dimension. (The current many-to-many cell in the Dimension Usage tab should name the new measure group, not the existing Production measure group.)
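The row-loss behaviour of that runtime join can be sketched in plain SQL (SQLite here, with made-up table names): joining on the order alone keeps the row, but joining on all common dimensions, order and date, loses it when the date keys disagree.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Order-level fact and component-level fact carrying different
-- date keys for the same order (the problem case from the answer).
CREATE TABLE fact_order (order_id INT, date_key TEXT, actual_qty INT);
CREATE TABLE fact_production (order_id INT, date_key TEXT, prod_id INT, qty INT);

INSERT INTO fact_order VALUES (245295, '2015-01-01', 10);
INSERT INTO fact_production VALUES (245295, '2015-01-08', 7, 10);
""")

# Joining on the order alone finds the actual qty...
ok = con.execute("""
SELECT o.actual_qty FROM fact_order o
JOIN fact_production p ON p.order_id = o.order_id
""").fetchall()

# ...but joining on every shared dimension (order AND date), which is
# what the many-to-many runtime join does, drops the row entirely.
lost = con.execute("""
SELECT o.actual_qty FROM fact_order o
JOIN fact_production p ON p.order_id = o.order_id
                      AND p.date_key = o.date_key
""").fetchall()

print(ok, lost)  # [(10,)] []
```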
Edit: if you want the actual qty measure to show only at the order level and be null at the product level, then try the following. Change the many-to-many relationship to a regular relationship, and in the dialog where you choose how the fact table joins to the dimension, change the dimension attribute to ProductionOrder_SK (which is not the key of the dimension) and choose the corresponding column in the fact table. Then click the Production Order measure group, go to the Properties window, and set IgnoreUnrelatedDimensions to false. That way, slicing actual qty by work center or by an attribute that is below grain in the Production Details dimension will show null.

Multiple Joins from one Dimension Table to single Fact table

I have a fact table with 4 date columns: CreatedDate, LoginDate, ActiveDate and EngagedDate. I have a dimension table called DimDate whose primary key can serve as the foreign key for all 4 date columns in the fact table. So the model looks like this.
But the problem is when I want to do sub-filtering on measures based on the date columns. For example: count all users who were created last month and are engaged this month. This is not possible with this design, because when I filter the measure by create date, I can't further filter on a different time window for engaged date. Since all four columns are connected to the same dimension, they don't filter independently.
However, if I create a separate date dimension table for each of the columns and join them like this, then it works.
But this looks very cumbersome when I have 20 different date columns in the fact table in a real-world scenario, where I would have to create 20 different dimensions and connect them one by one. Is there any other way to achieve this without creating multiple duplicated date dimensions?
This concept is called a role-playing dimension. You don't have to add the table to the DSV, or build the actual dimension, once for each date column. Instead, add the date dimension once, then go to the Dimension Usage tab. Click Add Cube Dimension and choose the date dim. Right-click and rename it. Then update the relationship to use the correct field.
There's a good article on MSSQLTips.com that covers this topic.
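The relational equivalent of a role-playing dimension is joining one physical date table several times under different aliases (in SSAS the aliasing happens on the Dimension Usage tab instead). A minimal runnable sketch with hypothetical names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- One physical date dimension...
CREATE TABLE DimDate (DateKey INT PRIMARY KEY, MonthName TEXT);
-- ...referenced by two different date columns in the fact.
CREATE TABLE FactUsers (UserId INT, CreatedDateKey INT, EngagedDateKey INT);

INSERT INTO DimDate VALUES (20240101, 'January'), (20240201, 'February');
INSERT INTO FactUsers VALUES (1, 20240101, 20240201);
""")

# Each alias plays a different "role", so created-date and
# engaged-date filters work independently.
row = con.execute("""
SELECT created.MonthName, engaged.MonthName
FROM FactUsers f
JOIN DimDate AS created ON f.CreatedDateKey = created.DateKey
JOIN DimDate AS engaged ON f.EngagedDateKey = engaged.DateKey
""").fetchone()
print(row)  # ('January', 'February')
```

Because only aliases multiply, not tables, twenty date columns still need just one DimDate.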

Customer Dimension as Fact Table in Star Schema

Can a dimension table become a fact table as well? For instance, I have a Customer dimension table with standard attributes such as name, gender, etc.
I need to know how many customers were created today, last month, last year, etc. using SSAS.
I could create a factless fact table with the customer key and date key, or I could use the Customer dimension table itself, because it already has both keys.
Is it normal to use a Customer dimension table as both fact and dimension?
Thanks
Yes, you can use a dimension table as a fact table as well. In your case, you would just have a single measure, the count, assuming there is one record per customer in this table. If you have more than one record per customer, e.g. because you use slowly changing dimension logic, you would use a distinct count instead.
Given your example, it is sufficient to run the query directly against the Customer dimension. There is no need to create another table, such as a fact table, to do that; in fact it would be a bad idea, because you would have to maintain it every day. It is simpler to run the query on the fly, as long as you have time attributes in the customer table itself. In a sense you are using a dimension as a fact, but, after all, data is data and can be queried as needed.
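Counting new customers straight from the dimension is then an ordinary GROUP BY on its created-date attribute. A sketch (SQLite, hypothetical names), assuming the dimension carries a CreatedDate column:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (CustomerKey INT, Name TEXT, CreatedDate TEXT);
INSERT INTO DimCustomer VALUES
  (1, 'Ann', '2024-03-01'),
  (2, 'Bob', '2024-03-01'),
  (3, 'Cid', '2024-02-15');
""")

# No separate fact table needed: the dimension itself answers
# "how many customers were created on each date?"
rows = con.execute("""
SELECT CreatedDate, COUNT(*)
FROM DimCustomer
GROUP BY CreatedDate
ORDER BY CreatedDate
""").fetchall()
print(rows)  # [('2024-02-15', 1), ('2024-03-01', 2)]
```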

sql query performance - wide table with fewer rows or narrow table with lot of rows?

I would like to query a fact table in a star schema.
I need to capture a lot of values (gross sales, net sales, regional sales, etc.) for each combination of a couple of other values that form the PK.
I have two options:
1. One row per PK, with many columns holding the measures (gross sales, regional sales, etc.).
2. Make the measure a dimension, so the PK gets bigger: a Measure column is added to the row, with only one value beside the PK. In other words, I decompose one row with many measures into many rows with one measure each.
What is better for performance, both insert and select?
You will have contention issues if you have a single row with all the values, assuming you have simultaneous inserts/updates and reads.
Having a single wide table also means it's much more difficult to add new measures in the future - it requires changing the table schema which will lock the table and cause other problems.
Your SELECT performance should be similar, unless you are pulling multiple values for the same PK in the same query, in which case the wider table would probably be a little quicker.
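The two layouts being compared can be made concrete (SQLite, illustrative names): a wide table with one column per measure versus a tall table where the measure name becomes part of the key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Option 1: wide -- one row per PK, one column per measure.
-- Adding a new measure later means ALTER TABLE.
CREATE TABLE fact_wide (pk INT, gross_sales REAL, net_sales REAL);
INSERT INTO fact_wide VALUES (1, 100, 80);

-- Option 2: tall -- the measure name is part of the key.
-- Adding a new measure later is just another INSERT.
CREATE TABLE fact_tall (pk INT, measure TEXT, value REAL);
INSERT INTO fact_tall VALUES (1, 'gross_sales', 100), (1, 'net_sales', 80);
""")

# Fetching several measures for one PK: one row from the wide table,
# several rows (pivoted here into a dict) from the tall one.
wide = con.execute(
    "SELECT gross_sales, net_sales FROM fact_wide WHERE pk = 1"
).fetchone()
tall = dict(con.execute(
    "SELECT measure, value FROM fact_tall WHERE pk = 1"
))
print(wide, tall)
```

The tall layout trades a wider primary key and more rows for schema stability; the wide layout answers multi-measure queries in a single row.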
What is your lowest level of granularity in the fact? When you mention things like regional sales and gross/net sales, it makes me think you might be confusing measures with dimensions. For example: region would be a dimension of the sales fact, not a separate measure.