Customer Dimension as Fact Table in Star Schema

Customer Dimension as Fact Table in Star Schema - ssas

Can Dimension Table became a fact table as well? For instance, I have a Customer dimension table with standard attributes such as name, gender, etc.
I need to know how many customers were created today, last month, last year etc. using SSAS.
I could create faceless fact table with customer key and date key or I could use the same customer dimension table because it has both keys already.
Is it normal to use Customer Dimension table as both Fact & Dimension?
Thanks

Yes, you can use a dimension table as fact table as well. In your case, you would just have a single measure which would be the count - assuming there is one record per customer in this customer table. In case you would have more than one record per customer, e. g. as you use a complex slowly changing dimension logic, you would use a distinct count.

Given your example, it is sufficient to run the query directly against the Customer dimension. There is no need to create another table to do that, such as a fact table. In fact it would be a bad idea to do that because you would have to maintain it every day. It is simpler just to run the query on the fly as long as you have time attributes in the customer table itself. In a sense you are using a dimension as a fact but, after all, data is data and can be queried as need be.

Related

I need help counting char occurencies in a row with sql (using firebird server)

I have a table where I have these fields:
id(primary key, auto increment)
car registration number
car model
garage id
and 31 fields for each day of the mont for each row.
In these fields I have char of 1 or 2 characters representing car status on that date. I need to make a query to get number of each possibility for that day, field of any day could have values: D, I, R, TA, RZ, BV and LR.
I need to count in each row, amount of each value in that row.
Like how many I , how many D and so on. And this for every row in table.
What best approach would be here? Also maybe there is better way then having field in database table for each day because it makes over 30 fields obviously.

There is a better way. You should structure the data so you have another table, with rows such as:
CarId
Date
Status
Then your query would simply be:
select status, count(*)
from CarStatuses
where date >= #month_start and date < month_end
group by status;
For your data model, this is much harder to deal with. You can do something like this:
select status, count(*)
from ((select status_01 as status
from t
) union all
(select status_02
from t
) union all
. . .
(select status_31
from t
)
) s
group by status;

You seem to have to start with most basic tutorials about relational databases and SQL design. Some classic works like "Martin Gruber - Understanding SQL" may help. Or others. ATM you miss the basics.
Few hints.
Documents that you print for user or receive from user do not represent your internal data structures. They are created/parsed for that very purpose machine-to-human interface. Inside your program should structure the data for easy of storing/processing.
You have to add a "dictionary table" for the statuses.
ID / abbreviation / human-readable description
You may have a "business rule" that from "R" status you can transition to either "D" status or to "BV" status, but not to any other. In other words you better draft the possible status transitions "directed graph". You would keep it in extra columns of that dictionary table or in one more specialized helper table. Dictionary of transitions for the dictionary of possible statuses.
Your paper blank combines in the same row both totals and per-day detailisation. That is easy for human to look upon, but for computer that in a sense violates single responsibility principle. Row should either be responsible for primary record or for derived total calculation. You better have two tables - one for primary day by day records and another for per-month total summing up.
Bonus point would be that when you would change values in the primary data table you may ask server to automatically recalculate the corresponding month totals. Read about SQL triggers.
Also your triggers may check if the new state properly transits from the previous day state, as described in the "business rules". They would also maybe have to check there is not gaps between day. If there is a record for "march 03" and there is inserted a new the record for "march 05" then a record for "march 04" should exists, or the server would prohibit adding such a row. Well, maybe not, that is dependent upon you business processes. The general idea is that server should reject storing any data that is not valid and server can know it.
you per-date and per-month tables should have proper UNIQUE CONSTRAINTs prohibiting entering duplicate rows. It also means the former should have DATE-type column and the latter should either have month and year INTEGER-type columns or have a DATE-type column with the day part in it always being "1" - you would want a CHECK CONSTRAINT for it.
If your company has some registry of cars (and probably it does, it is not looking like those car were driven in by random one-time customers driving by) you have to introduce a dictionary table of cars. Integer ID (PK), registration plate, engine factory number, vagon factory number, colour and whatever else.
The per-month totals table would not have many columns per every status. It would instead have a special row for every status! The structure would probably be like that: Month / Year / ID of car in the registry / ID of status in the dictionary / count. All columns would be integer type (some may be SmallInt or BigInt, but that is minor nuancing). All the columns together (without count column) should constitute a UNIQUE CONSTRAINT or even better a "compound" Primary Key. Adding a special dedicated PK column here in the totaling table seems redundant to me.
Consequently, your per-day and per-month tables would not have literal (textual and immediate) data for status and car id. Instead they would have integer IDs referencing proper records in the corresponding cars dictionary and status dictionary tables. That you would code as FOREIGN KEY.
Remember the rule of thumb: it is easy to add/delete a row to any table but quite hard to add/delete a column.
With design like yours, column-oriented, what would happen if next year the boss would introduce some more statuses? you would have to redesign the table, the program in many points and so on.
With the rows-oriented design you would just have to add one row in the statuses dictionary and maybe few rows to transition rules dictionary, and the rest works without any change.
That way you would not

SSAS - relationship/granularity

I have 2 fact tables with a measure group each, Production and Production Orders. Production has production information at a lower granularity (at the component level) productionorders has information at a higher level (order level with header quantities etc.).
I have created a surrogate key link between the two tables on productionorderid. As soon as I add Prod ID (from productiondetailsdim) to the pivot table it blats out the actual qty (from prod order measure group) and I cannot combine the qty's from the two measure groups.
How can I design the correct relationship between the two? Please see my dim usage diagram. Production Details is the dim that links the two fact tables, at the moment DimProductionDetails is in a fact relationship with Production. I'm not sure what the relationship should be with Production Order (it is currently many to many).
Please see example data between the two tables:
I have to be able to duplicate this behaviour:

Do you want the full actual qty from prod order measure group to repeat next to each product? If so a many-to-many relationship is right. I suspect once I explain how that many-to-many works you will spot the problem.
When you slice full actual qty from prod order measure group by product from the Production Details dimension it does a runtime join between the two measure groups on the common dimensions. So for example, if for if order 245295 has a date of 1/1/2015 while the production details for order 245295 have dates of 1/8/2015 then the runtime join will lose rows for that order and actual qty will show as null. So compare all the dimensions used on both measure groups and ensure all rows for the same order have the same dimension keys for those common dimensions. If for example dates differ then create a named query in the DSV that selects just the dimension columns from the production fact table which match the order fact table. Then create a new measure group off that named query and use the new measure group as the intermediate measure group in your many to many dimension. (The current many to many cell in the dimension usage tab should say the name of the new measure group not the existing Production measure group.)
Edit: if you want the actual qty measure to only show when you are at the order level and be null at the product level then try the following. Change the many-to-many relationship to a regular relationship and in the dialog where you choose how the fact table joins to the dimension change the dimension attribute to ProductionOrder_SK (which is not the key of the dimension) and choose the corresponding column in the fact table. Then left click on the Production Order measure group and go to the Properties window and set IgnoreUnrelatedRelationships to false. That way slicing actual qty by work center or by an attribute that is below grain in the Production Details dimension will show as null.

Multiple Joins from one Dimension Table to single Fact table

I have a fact table that has 4 date columns CreatedDate, LoginDate, ActiveDate and EngagedDate. I have a dimension table called DimDate whose primary key can be used as foreign key for all the 4 date columns in fact table. So the model looks like this.
But the problem is, when I want to do sub-filtering for the measures based on the date column. For ex: Count all users who were created in the last month and are engaged in this month. This is not possible to do with this design, coz when I filter the measure with create date , I can’t further filter for a different time window for engaged date. Since all the connected to same dimension, they are not working independently.
However, If I create a separate date dimension table for each of the columns, and join them like this then it works.
But this looks very cumbersome when I have 20 different date columns in fact table in real world scenario, where I have to create 20 different dimensions and connect them one by one. Is there any other way I can achieve my scenario w/o creating multiple duplicated date dimensions?

This concept is called a role-playing dimension. You don't have to add the table to the DSV or the actual dimensions one time for each date. Instead add the date once, then go to the dimension usage tab. Click Add Cube Dimension, and then choose the date dim. Right-click and rename it. Then update the relationship to use the correct fields.
There's a good article on MSSQLTips.com that covers this topic.

Creating relationship between 2 tables in SQL

I have these 2 tables and I need to create a relationship between them so that I can import them into SSAS Tabular and run some analysis.
The first table has RollingQuarter(Moving Quarter) data. The second is a basic Date table with Date as PK.
Can anyone suggest ways to create a relationship with these?
Ill be using SQL Server 2012.
I could re-create a new date table also.

I think you may have a rough time finding a relationship with these tables.
Your top data table is derived data. It's an average over three months, reported monthly. The Quantity column applies to that window, not to a particular date like all of the stuff in the second table. So what would any relationship really mean?
If you have the primary data that were used to calculate your moving average, then use those instead. Then you can relate dates between the two tables.
But if your analysis is such that you don't need the primary data for the top table, then just pick the middle of each quarter (March 15th 2001 for the first record) and use that as your independent variable for your time series on the top. Then you can relate them by that.

SSAS Calculated Member

I have a cube with a measure called FactSales, which has entries for each day.
I have three Dimensions, Date, Customer and CustomerType.
Each FactSales row is linked to a date and customer by foreign key. customer is linked to customer type by foreign key.
From this I am able to spit out all sales figure for each customer on each date which is great.
I have multiple types : typeA, typeB, typeC, typeD, typeE.
What I want though is to create two calculated members which have the values aggregated for each customer by typeA, and then by everyother type.
What I have at the moment is something like
Case
When IsEmpty( [Measures].[FactSales] ) or [Customer].[CustomerType].currentmember <> [Customer].[CustomerType].&[typeA]
Then null
ELSE ([Customer].[CustomerType].&[typeA], [Measures].[FactSales] )
END
but I think this is wrong, and i also can't use this same method to get the value of all the other types excluding the typeA. I also dont get the aggregation when i roll it up to a higher level.
Can anyone help? I may not have explained my self well enough so please let me know if you need more info.

Why you are trying to do that? It's not good to store any calculation that is not related to single fact, data in fact table row should correspond only to one system fact (order, job, etc), and with aggregation you will be able to get data based on any dimension.
Your current structure is well and you can easily get calculations by slicing the cube over customer type dimension.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas