Designing a cube

Designing a cube - ssas

I’ve been asked create our analysis cube and have a design question.
We sell ‘widgets’ and ‘parts’ to go with those widgets. Each order has many widgets and sometimes a few parts.
What I’m stuck on is – to me, an order is a fact in a measure. But, what are the widgets? Are they a dimension and each fact in the measure will be an entry for every part and widget for the order.
So, if order 123 had widget 1 and widget 2 and part 5, then there will be 3 facts in the measure for the same order? Is that correct?

At its basic level you can consider most facts to be transactions or transaction line items. So, for example, you may have a 'sales' fact table in which each record represents one line item from that sale. Each fact record would have numeric columns representing metrics and other columns joining to dimension tables. The combination of those dimensions would describe that line item. So, in your case, you likely have something like:
1) A 'date' dimension detailing the date of the transaction
2) A 'widget' dimension detailing the widget sold on that transaction
3) A 'customer' dimension detailing the customer who bought that item (almost certainly the same customer would appear on every line item for this transaction)
4) ... determined by what information you have and what business problem you're trying to solve.
Now, the dimension tables contain further details. For example, your widget dimension table likely contains things like the name of the widget, the color, the manufacturer, etc. Every time your company sells one of these widgets, the record in the fact table links to that same dimension record for that name, color, manufacturer, etc. combination (i.e. you don't create a new dimension record every time you sell the same item - this is a one-to-many relationship - each dimension record may have many related fact records).
You other dimension tables would similarly describe their dimensions. For example, the customer dimension might give the customer's name, their address, ...
So, the short answer to your question is that widget likely is a dimension, items and widgets may (or may not) actually be the same dimension (in a school class I suspect that they are), and that you would have 3 fact records for that one transaction.

This is probably along the same lines as the prior answer but....
If you try and model "many widgets per order" you'll have issues because you end up with a many (order fact) to many (widgets) relationship. In a cube / star schema design, many to many relationship usually need to be moddeled out to be many to one in some way.
So what you do is try and identify what special thing identifies an "order" (as opposed to a bunch of widgets in an order). Usually that is simply stuff like order date, customer, order number, tax
An example way to model this is:
If you have a single order with five widgets, you model that as a fact table with five records that happens to have a repeating widget, customer, date etc. in it
Then you have to work out how you spread an order header tax amount over five records. The two obvious solutions are:
Create a widget that represents tax and add that as another record
Spread the tax over five records, either evenly or weighted by something
Modelling "parts" just takes these concepts further.
It is important to understand what the end user wants to see, why they want to see parts. What do they want to measure by parts, how do you assign higher level values (like tax) down to lower levels like parts.

Related

SQL schema design: two tables vs adding columns to the same table

This question is about design decision, hence might be a bit opinionated:
Imagine you are designing database for a car dealer where they ONLY auction cars. Some cars are for display only, and some cars are to be sold in auction.
I have a Car entity with 10 attributes: ID, Model, Mode, YearMade, IsDisplayOnly....
Now, I want to add selling price and selling notes to those cars that are for sale (i.e. IsDisplayOnly = false)
I image that there are two ways this can be done:
Add Price and PriceNotes columns into the Car table, knowing that they are always null for IsDisplayOnly = true cars, and those that haven't been sold at auction yet.
Add a new table SaleInfo with 3 columns: CarID, Price, PriceNotes where CarID is the PK and also FK pointing to the ID column in the Car table.
Which option would align most with the best schema design practice? Why?

You should have one car for cars and the attributes of cars. You should have a separate table for the cars for auction.
Why? These are different entities. Your problem definition suggests an auction table. That auction table should have a foreign key references to the cars that are available for auction. A separate table ensures that that foreign key reference is valid.
There are some other reasons that are not apparent in your simplified example. Notes and prices might change over time, so they should be going into a history table. Display cars have other attributes, like the period of time when they are on display and how they are ultimately disposed of. This suggests that they too have particular attributes.

My advice would be to use three tables:
-The first to store all the makes and models of the cars. As well as their costs(eg Honda something or other selling for X amount of money)
-The second to store the details of the individual vehicles, containing a foreign key to the primary key of one of the Make/Model stored in the first table, as well as individual details such as the color, VIN no. etc. As well as whether they can be sold or not.
-The third table would contain the details of each individual purchases, linked to the table containing each individual vehicle, this would be linked to the table containing the details of each individual vehicle, with each purchase connected to a single instance. On the table of vehicles.
The advantages for this layout is that you are actually going to end up using less storage space in the long run, as instead of having the same three fields (The make, model and year) repeating for every vehicle, you will only have a single field to represent that data instead of multiple redundant fields. Another advantage will be searching, as if you are searching for details of individual vehicles of the same brand/type, you will be able to search using only one field, the key linked to the table containing the make and model. This would drastically decrease search times and improve the effectiveness of the system overall.

SSAS Cube Modeling

Two part architecture question:
I have employee, job title, and supervisor dimensions. I kind of wanted to keep them in one dimension and have something like site > supervisor > job title > employee. The problem is that these need to be SCD. That is, they have historical associations to relate to the facts. The fact tables have a requirement to be processed every five minutes (dashboard).
1) Should I have these in a single dimension with a surrogate key (or composite for that matter)? The keys/surrogate key would be composed of calendar_id - employee_id.
2) Have the fact tables have maintain a reference to three different dimensions instead?
The requirement to process every 5 minutes (MOLAP SSIS ETL driven processing). Makes me lean toward keeping the time/change in the facts so that I would ease having to process the dimensions along with the fact tables.

I would design it as a single dimension, with the hierarchy you mentioned: site > supervisor > job title > employee.
Let's call this dimension EmployeeAssignment, because its granularity is not Employees, but any combination of site/supervisor/job title that an employee "adopts" during his/her career. (Feel free to come up with a better name).
I don't think you need a calendar_id key in this dimension: a surrogate key based on DISTINCT SiteID,SupervisorID,JobTitleID,EmployeeID would be enough. Adding a calendar_id key would be making the dimension do too much work: over and above slicing the actual facts, this would make the dimension answer questions like
"Where was employeeID 12345 (in the site/supervisor/job title network) on 1 January 2015?" and
"How many employees did supervisorID 98765 supervise on 1st January 2015?"
These questions IMHO are best addressed with a fact, not a dimension. One cube I've worked on addresses with with an EmployeeDay measure: sliced by dimensions "EmployeeAssignment" and Time, this simply has a 1 if the employee is in that "assignment" on that day.
This EmployeeAssignment SCD is actually pretty slowly-changing, especially compared to your 5-minute fact update interval. Employees are not going to move about or get promoted every 5 minutes, so a reprocess of the dimension shouldn't be necessary more often than daily.
If I've misunderstood anything, let me know in the comments.

SSAS 3 fact tables, but only 2 relate to a certain dimension

I have a cube with 3 fact tables and 20 + dimensions that relate easily to all 3 fact tables and everything works fine except for the fact that one of the dimensions (Warehouse) is only related to 2 of the 3 fact tables. My problem I guess is a display issue. When the user is viewing measures from all 3 fact tables then drags over the Warehouse dimension, it simply repeats the grand total of the measure in the 3rd fact table for every possible value of Warehouse. This certainly makes sense to me as there is no relationship set up and it's conceptually behaving almost like a cross-join. Nonetheless, it's confusing to users and I'd like to not have the grand total duplicated for each dimension member in Warehouse. I was thinking one solution was to create a dummy warehouse called "Not Applicable" and then relate every row in the 3rd fact table to that dimension member. I was hoping there's just a setting in SSAS where I could control this behavior so I didn't have to create any new warehouse values. Is there a standard way to handle non-related dimensions with multiple fact tables? Thanks in advance.

You can use the "IgnoreUnrelatedDimensions" property of the measure group not related to Warehouse: set it from the default value true to false. Then, measure values for this measure group will only be shown for the "All" members from the warehouse dimension, and the cells will be null (empty) for non-All members of this dimension.
This is a global setting per measure group, you cannot configure it individually per dimension and measure group. But for your purpose, this should be fine.

Model a relationship between two fact tables

I have a Sales fact table, an Orders fact table (both line level detail), and two date roleplaying dimensions (from the Date dimension) for Order Date and Transaction Date.
I'm trying to get to a point where you can view sales measures by order date and order measures by transaction date.
The Sales table has the key for the related Order line if the sale was from an order and null if it was a non-order sale. The Order table doesn't have any links to the related transaction.
I've been trying to wrap my head around how to model a relationship based on the link between the two fact tables and the only method I can get to work would be to create a dimension based on the Orders table which contains only the key, then use many-to-many relationships... which somehow seems completely wrong, but I'm not sure what would be the "right" approach to this situation.
If at all possible I'd like the non-order sales to show as "unknown" order dates when viewing Sales Measures by Order date, so you can see the complete picture rather than just sales from orders. Using the above approach this isn't happening.
Any suggestions about what needs to be changed to get this to work?

You were on the right track. I would create a view in the relational database or a named query in the DSV containing as the single column the distinct non-null order IDs, maybe call it "DimOrderId". Then build a dimension from it, setting the "Null processing" property (you have to click the "plus" two times for the "Key Columns" property of the attribute in BIDS to access this property) to "UnknownMember".
And then use this dimension for the many-to-many relationship.

You should use the Order ID to lookup the Order Date and put an Order Date dimension key in the Sales Transaction fact table. Since there may be multiple transactions per order, the other way around probably just doesn't make sense. If it is 1:1 you could do the reverse, but it would mean updating order facts once the transaction occurs which could be a load-time complexity and performance hit. Make sure you really NEED order by Transaction Date.

Two or more similar counts on fact table in dimensional modelling

I have designed a fact table that stores the facts for a specific date dimension and an action type such as create, update or cancelled. The facts can be create and cancelled only once, but update many times.
myfact
---------------
date_key
location_key
action_type_key
This will allow me to get a count for all the updates done, all the new ones created for a period and specify a specific region through the location dimension.
Now in addition I also have 2 counts for each fact, i.e. Number of People, Number of Buildings. There is no relation between these. And I would like to query on how many of the facts having a specific count, such as how many have 10 building, how many have 9 etc.
What would be the best table design for these. Basically I see the following options, but am open to hear better solutions.
add the counts as reference info in the fact table as people_count and building_count
add a dimension for each of these that stores the valid options, i.e. people dimension that stores a key and a count and building dimension that stores a key and a count. The main fact will have a people_key and a building_key
add one dimension for the count these is used for both people and building counts, i.e. count dimension that stores a key and a generic count. The main fact will have a people_count_key and a building_count_key

First your counts are essentially "dimensions" in the purest sense (you can think of dimensions as a way to group records for reporting purposes). The question though is whether dimensional modeling is what you want to do. I think you are better off as seeing this as something of an implicit dimension than you are to add dimension tables. What this means essentially is that dimension tables add nothing and they create corner cases of errors I just don't think are very helpful unless you need to track a bunch of information related to numbers.
If it were me I would just add the counts to the fact table, not to other tables.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas