I have a Sales fact table, an Orders fact table (both line level detail), and two date roleplaying dimensions (from the Date dimension) for Order Date and Transaction Date.
I'm trying to get to a point where you can view sales measures by order date and order measures by transaction date.
The Sales table has the key for the related Order line if the sale was from an order and null if it was a non-order sale. The Order table doesn't have any links to the related transaction.
I've been trying to wrap my head around how to model a relationship based on the link between the two fact tables and the only method I can get to work would be to create a dimension based on the Orders table which contains only the key, then use many-to-many relationships... which somehow seems completely wrong, but I'm not sure what would be the "right" approach to this situation.
If at all possible I'd like the non-order sales to show as "unknown" order dates when viewing Sales Measures by Order date, so you can see the complete picture rather than just sales from orders. Using the above approach this isn't happening.
Any suggestions about what needs to be changed to get this to work?
You were on the right track. I would create a view in the relational database or a named query in the DSV containing as the single column the distinct non-null order IDs, maybe call it "DimOrderId". Then build a dimension from it, setting the "Null processing" property (you have to click the "plus" two times for the "Key Columns" property of the attribute in BIDS to access this property) to "UnknownMember".
And then use this dimension for the many-to-many relationship.
You should use the Order ID to look up the Order Date and put an Order Date dimension key in the Sales Transaction fact table. Since there may be multiple transactions per order, the other way around probably doesn't make sense. If it is 1:1 you could do the reverse, but that would mean updating order facts once the transaction occurs, which could add load-time complexity and a performance hit. Make sure you really NEED orders by Transaction Date.
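As a rough sketch of that lookup (not a definitive implementation), here is the idea in SQLite driven from Python. The table and column names (`FactOrders`, `FactSales`, `OrderLineKey`, `OrderDateKey`) and the `-1` "unknown" key are assumptions for illustration only; in SSAS you could instead leave the key NULL and rely on the UnknownMember null processing mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical line-level fact tables; names are illustrative only.
CREATE TABLE FactOrders (OrderLineKey INTEGER PRIMARY KEY,
                         OrderDateKey INTEGER, OrderQty INTEGER);
CREATE TABLE FactSales  (SalesLineKey INTEGER PRIMARY KEY,
                         OrderLineKey INTEGER NULL,      -- NULL = non-order sale
                         TransactionDateKey INTEGER, SalesAmount REAL);

INSERT INTO FactOrders VALUES (1, 20160105, 2), (2, 20160107, 1);
INSERT INTO FactSales VALUES
  (10, 1,    20160110, 50.0),
  (11, 1,    20160112, 25.0),   -- second transaction for the same order line
  (12, 2,    20160115, 40.0),
  (13, NULL, 20160116, 10.0);   -- non-order sale

-- Look up the order date for each sale; non-order sales get the
-- assumed "unknown" key -1 instead of dropping out of the result.
CREATE VIEW vFactSales AS
SELECT s.SalesLineKey,
       s.TransactionDateKey,
       COALESCE(o.OrderDateKey, -1) AS OrderDateKey,
       s.SalesAmount
FROM FactSales s
LEFT JOIN FactOrders o ON o.OrderLineKey = s.OrderLineKey;
""")

sales_by_order_date = conn.execute(
    "SELECT OrderDateKey, SUM(SalesAmount) FROM vFactSales "
    "GROUP BY OrderDateKey ORDER BY OrderDateKey"
).fetchall()
print(sales_by_order_date)
```

The LEFT JOIN is what keeps the non-order sales in the picture; an inner join would silently drop them, which matches the symptom described in the question.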
I have a question about modeling in Visual Studio SQL Server Data Tools. I am currently building a model where I have a fact table that is linked to a Date Dimension table using a Many : One relationship, in that order. The Dimension table has a unique value (PK) that links to the fact table, where the Date is the Foreign Key. For the sales table, there are many sales that happen on the same day. Hence 1:M.
There is no code for this question. However, when I go to refresh the table and bring them into my model, I am noticing that only ONE date shows up in a table for the Date column for the Dimension table. In other words, all dates and transactions show up from the Fact table dataset. However, for the Dimension table, only ONE date shows up: 11/1/2016. I need them all to show up and be linked to the fact table. If you are thinking that I might have the Date table filtered-- I have already checked this. There are NO current and active filters on Date in the Date Dimension table. I am puzzled as to why this is happening.
Also, the data types between tables appear to be consistent. Both are currently set to 'Short Date.' Changing the date type, for both columns, to MM/DD/yyyy does nothing to solve the problem, either.
I’ve been asked to create our analysis cube and have a design question.
We sell ‘widgets’ and ‘parts’ to go with those widgets. Each order has many widgets and sometimes a few parts.
What I’m stuck on is this: to me, an order is a fact in a measure. But what are the widgets? Are they a dimension, with each fact in the measure being an entry for every part and widget on the order?
So, if order 123 had widget 1 and widget 2 and part 5, then there will be 3 facts in the measure for the same order? Is that correct?
At its basic level you can consider most facts to be transactions or transaction line items. So, for example, you may have a 'sales' fact table in which each record represents one line item from that sale. Each fact record would have numeric columns representing metrics and other columns joining to dimension tables. The combination of those dimensions would describe that line item. So, in your case, you likely have something like:
1) A 'date' dimension detailing the date of the transaction
2) A 'widget' dimension detailing the widget sold on that transaction
3) A 'customer' dimension detailing the customer who bought that item (almost certainly the same customer would appear on every line item for this transaction)
4) ... determined by what information you have and what business problem you're trying to solve.
Now, the dimension tables contain further details. For example, your widget dimension table likely contains things like the name of the widget, the color, the manufacturer, etc. Every time your company sells one of these widgets, the record in the fact table links to that same dimension record for that name, color, manufacturer, etc. combination (i.e. you don't create a new dimension record every time you sell the same item - this is a one-to-many relationship - each dimension record may have many related fact records).
Your other dimension tables would similarly describe their dimensions. For example, the customer dimension might give the customer's name, their address, ...
So, the short answer to your question is that widget likely is a dimension, items and widgets may (or may not) actually be the same dimension (in a school class I suspect that they are), and that you would have 3 fact records for that one transaction.
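To make the "3 fact records" point concrete, here is a minimal sketch using SQLite from Python. The names `DimItem` and `FactOrderLine`, the columns, and the amounts are all made up for illustration, and it assumes widgets and parts share one item dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per distinct item, however many times it is sold.
CREATE TABLE DimItem (ItemKey INTEGER PRIMARY KEY, ItemName TEXT, ItemType TEXT);

-- One row per order line item; the order number repeats on every line.
CREATE TABLE FactOrderLine (
    OrderNumber INTEGER,
    DateKey     INTEGER,
    CustomerKey INTEGER,
    ItemKey     INTEGER REFERENCES DimItem(ItemKey),
    Quantity    INTEGER,
    LineAmount  REAL
);

INSERT INTO DimItem VALUES
  (1, 'Widget 1', 'Widget'),
  (2, 'Widget 2', 'Widget'),
  (5, 'Part 5',   'Part');

-- Order 123: widget 1, widget 2 and part 5 become three fact rows.
INSERT INTO FactOrderLine VALUES
  (123, 20160301, 7, 1, 1, 100.0),
  (123, 20160301, 7, 2, 1, 150.0),
  (123, 20160301, 7, 5, 2,  20.0);
""")

fact_rows = conn.execute(
    "SELECT COUNT(*) FROM FactOrderLine WHERE OrderNumber = 123"
).fetchone()[0]

amount_by_type = conn.execute("""
    SELECT d.ItemType, SUM(f.LineAmount)
    FROM FactOrderLine f
    JOIN DimItem d ON d.ItemKey = f.ItemKey
    GROUP BY d.ItemType
    ORDER BY d.ItemType
""").fetchall()
print(fact_rows, amount_by_type)
```

The date and customer keys repeat on each line, which is expected: the dimensions describe the line item, and the measures aggregate over whichever of them you slice by.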
This is probably along the same lines as the prior answer but....
If you try to model "many widgets per order" you'll have issues, because you end up with a many (order fact) to many (widgets) relationship. In a cube / star schema design, many-to-many relationships usually need to be modeled out to be many-to-one in some way.
So what you do is try to identify what special thing identifies an "order" (as opposed to a bunch of widgets in an order). Usually that is simply stuff like order date, customer, order number, tax.
An example way to model this is:
If you have a single order with five widgets, you model that as a fact table with five records that happens to have a repeating widget, customer, date etc. in it
Then you have to work out how you spread an order header tax amount over five records. The two obvious solutions are:
Create a widget that represents tax and add that as another record
Spread the tax over five records, either evenly or weighted by something
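The second option can be sketched roughly as below. This is only an illustration under my own assumptions: the function name `allocate`, rounding to the cent, and dumping the rounding remainder on the last line are choices I made, not something the cube requires. It also assumes the weights do not sum to zero.

```python
from decimal import Decimal, ROUND_HALF_UP

def allocate(total, weights):
    """Spread `total` over the lines in proportion to `weights`, to the cent."""
    total = Decimal(total)
    weights = [Decimal(w) for w in weights]
    weight_sum = sum(weights)          # assumed to be non-zero
    cent = Decimal("0.01")
    shares = [
        (total * w / weight_sum).quantize(cent, rounding=ROUND_HALF_UP)
        for w in weights
    ]
    # Put any rounding remainder on the last line so the shares
    # always add back up to the header amount exactly.
    shares[-1] += total - sum(shares)
    return shares

# Header tax of 30.00 on an order with five lines (made-up amounts).
line_amounts = ["100.00", "150.00", "20.00", "20.00", "10.00"]
weighted = allocate("30.00", line_amounts)   # weighted by line amount
even = allocate("30.00", ["1"] * 5)          # spread evenly
print(weighted, even)
```

Whichever weighting you pick, the important property is that the allocated values sum back to the header amount, otherwise the cube totals will not reconcile with the source system.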
Modelling "parts" just takes these concepts further.
It is important to understand what the end user wants to see and why they want to see parts. What do they want to measure by parts? How do you assign higher-level values (like tax) down to lower levels like parts?
I have a DimPerson table and a DimPersonDecileOutrigger Table which stores decile data. The way the outrigger is structured is that a customer is given a decile for current year and previous year (if they have bought in the period)- which means a customer might have TY and NOT LY and vice versa. Some customers are both.
In SSIS, when I picked the columns in the Dimension Structure tab, I initially picked only columns from DimPerson and not the outrigger. That way the browser showed all the IDs starting from 1. But when I dragged in some columns from the outrigger, the browser no longer showed all PersonIDs. I want to see all customers regardless of whether they have a decile or not.
Pic attached to show what it looks like in dimension structure tab. Also the relationship is between OutriggerID as primary and OutriggerID in person as foreign.
If you just want to solve the problem, you can create a View in your underlying relational database that uses LEFT OUTER JOIN to link the two tables, so that the view will return all rows from DimPerson, even if they don't have a Decile.
Then use the view as the source for your dimension instead of the tables.
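Something along these lines, shown here in SQLite via Python just so it can be run end to end. The view name `vDimPerson` and the decile column names (`DecileTY`, `DecileLY`) are guesses on my part; adapt them to your actual outrigger columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimPersonDecileOutrigger (
    OutriggerID INTEGER PRIMARY KEY,
    DecileTY    INTEGER,   -- hypothetical column: this year's decile
    DecileLY    INTEGER    -- hypothetical column: last year's decile
);
CREATE TABLE DimPerson (
    PersonID    INTEGER PRIMARY KEY,
    PersonName  TEXT,
    OutriggerID INTEGER NULL REFERENCES DimPersonDecileOutrigger(OutriggerID)
);

INSERT INTO DimPersonDecileOutrigger VALUES (1, 3, 4), (2, 5, NULL);
INSERT INTO DimPerson VALUES
  (1, 'Ann', 1),
  (2, 'Bob', 2),
  (3, 'Cy',  NULL);   -- no decile at all

-- LEFT OUTER JOIN keeps every person, with NULL deciles where none exist.
CREATE VIEW vDimPerson AS
SELECT p.PersonID, p.PersonName, o.DecileTY, o.DecileLY
FROM DimPerson p
LEFT OUTER JOIN DimPersonDecileOutrigger o
       ON o.OutriggerID = p.OutriggerID;
""")

people = conn.execute("SELECT * FROM vDimPerson ORDER BY PersonID").fetchall()
print(people)
```

With an inner join, the person without an outrigger row would disappear, which is the behaviour you are seeing in the browser.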
I'm new to this topic. I've got a database with a flat fact table, which contains data like date, product group, product subgroup, product actual name, and some calculations/statistics. All I need to do is create a report using an OLAP cube. I have two ideas for how to build it, but I don't know which draft is better (or whether either is even correct). The original DAILY_REPORT... table does not have a primary key. It's just a data table. In the first concept I have created every table (which will become a dimension) with an ID, and connected product -> family of product -> project -> building in a hierarchy. The other concept is without all the IDs and the hierarchy, with relations created automatically based on names. Can somebody explain which direction I should take?
First idea:
http://imgur.com/iKNfAXF
Second:
http://imgur.com/IZjW1W6
Thanks in advance!
You can follow these steps to create your cube:
Create a separate view for each of the dimensions you want to have. Group similar types of data in one view, e.g. Product Name, Product Group, Product Sub-Group, etc.
Keep the data in your dimension view as DISTINCT data, e.g. SELECT DISTINCT [Product Name], [Product Group], [Product Sub-Group] FROM TABLE
Keep an 'ID' column in each dimension view, for e.g. Product ID in Product view
Create a view for your fact. Include 'ID' column of each dimension in your Fact view. This will help you to create relationship on 'ID' column, which will be a lot faster than relationship created on top of names.
For creating hierarchies in dimension attributes, SSAS provide drag and drop functionality.
If you need more details let me know.
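A rough sketch of the view steps above, run against SQLite from Python. Everything named here is a stand-in: `DailyReport` takes the place of your flat table, the columns are invented, and generating the ID with `ROW_NUMBER()` is just one possible way to get an 'ID' column in a view (it needs SQLite 3.25 or later, and the numbers can shift when new products appear).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the flat source table (no primary key).
CREATE TABLE DailyReport (
    ReportDate TEXT, ProductName TEXT, ProductGroup TEXT,
    ProductSubGroup TEXT, Quantity INTEGER
);
INSERT INTO DailyReport VALUES
  ('2016-01-01', 'Bolt M6', 'Fasteners', 'Bolts', 10),
  ('2016-01-02', 'Bolt M6', 'Fasteners', 'Bolts',  5),
  ('2016-01-01', 'Nut M6',  'Fasteners', 'Nuts',   7);

-- Dimension view: DISTINCT product rows plus a generated ID.
CREATE VIEW vDimProduct AS
SELECT ROW_NUMBER() OVER (
           ORDER BY ProductGroup, ProductSubGroup, ProductName
       ) AS ProductID,
       ProductName, ProductGroup, ProductSubGroup
FROM (SELECT DISTINCT ProductName, ProductGroup, ProductSubGroup
      FROM DailyReport);

-- Fact view: carries the dimension ID instead of the names.
CREATE VIEW vFactDailyReport AS
SELECT r.ReportDate, p.ProductID, r.Quantity
FROM DailyReport r
JOIN vDimProduct p
  ON  p.ProductName     = r.ProductName
  AND p.ProductGroup    = r.ProductGroup
  AND p.ProductSubGroup = r.ProductSubGroup;
""")

dim_rows = conn.execute(
    "SELECT * FROM vDimProduct ORDER BY ProductID").fetchall()
qty_by_product = conn.execute(
    "SELECT ProductID, SUM(Quantity) FROM vFactDailyReport "
    "GROUP BY ProductID ORDER BY ProductID").fetchall()
print(dim_rows, qty_by_product)
```

In the cube you would then relate the fact view to the dimension view on `ProductID` and build the group / sub-group / name hierarchy from the dimension attributes.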
You could construct the dimensions you need from views that are based on distinct queries (i.e. SELECT DISTINCT) over the source data. These can be used to populate the dimensions.
You can make a synthetic date dimension fairly easily.
Then you can create a DSV that joins the views back against the fact table to populate the measure group.
If you need to fake a primary key then you can use a view that annotates the fact table with a column generated from row_number() or some similar means. Note that this is not necessarily stable across runs, so you can't rely on it for incremental loads. However, it would work fine for complete refreshes.
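For illustration, a synthetic date dimension and a faked fact key might look roughly like this in Python with SQLite. The names (`DimDate`, `DailyReport`, `vFactWithKey`, `FactRowID`) and the yyyymmdd integer key are my own assumptions, and `ROW_NUMBER()` needs SQLite 3.25 or later.

```python
import sqlite3
from datetime import date, timedelta

def date_dimension_rows(start, end):
    """Yield (DateKey, FullDate, Year, Month, Quarter) for each day, inclusive."""
    day = start
    while day <= end:
        yield (
            day.year * 10000 + day.month * 100 + day.day,  # e.g. 20160131
            day.isoformat(),
            day.year,
            day.month,
            (day.month - 1) // 3 + 1,
        )
        day += timedelta(days=1)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT, "
    "Year INTEGER, Month INTEGER, Quarter INTEGER)"
)
conn.executemany(
    "INSERT INTO DimDate VALUES (?, ?, ?, ?, ?)",
    date_dimension_rows(date(2016, 1, 1), date(2016, 12, 31)),
)

conn.executescript("""
-- Stand-in for a source table without a primary key.
CREATE TABLE DailyReport (ReportDate TEXT, ProductName TEXT, Quantity INTEGER);
INSERT INTO DailyReport VALUES
  ('2016-01-02', 'Bolt M6', 5),
  ('2016-01-01', 'Nut M6',  7),
  ('2016-01-01', 'Bolt M6', 10);

-- Fake key: only meaningful within one full refresh, because the
-- numbering changes whenever rows are added or the ordering shifts.
CREATE VIEW vFactWithKey AS
SELECT ROW_NUMBER() OVER (ORDER BY ReportDate, ProductName) AS FactRowID,
       ReportDate, ProductName, Quantity
FROM DailyReport;
""")

date_count = conn.execute("SELECT COUNT(*) FROM DimDate").fetchone()[0]
fact_keys = conn.execute(
    "SELECT FactRowID, ReportDate, ProductName FROM vFactWithKey "
    "ORDER BY FactRowID").fetchall()
print(date_count, fact_keys)
```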
Well, I just heard about this today, but I don't get it. So I should not have a Transaction table with a Date column (because several transactions can occur on the same day); instead I should have a Transaction table and a Date table, where Date would have an FK to a transaction. What is the point, then? Instead of a date I will just repeat an FK.
An example: a broker can make a transaction on any date (the transaction then needs to hold broker and date information).
Check out:
http://en.wikipedia.org/wiki/Database_normalization#Normal_forms
Transaction date does not need to be normalized.
But, imagine that Transaction is tied to customers, and customer details also have to be kept - this is a case where normalization helps to reduce data redundancy.
Assuming your date table is like the period table in our data warehouse, it is probably structured something like this:
Field date, datatype date (not datetime) primary key
other fields include fiscal year and holiday information
Then your transaction table might resemble something like this:
broker_id, foreign key to broker
date, foreign key to date
transaction time
other fields as necessary
Your question was, "what's the point?". This sort of database design allows you to easily answer questions like, "give me broker x's stats for the past 5 fiscal years, broken down by fiscal period"
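Here is a small sketch of that kind of query, using SQLite from Python. The table is called `transactions` because TRANSACTION is a reserved word in SQLite, and the `amount` column, the sample rows, and a fiscal year starting on 1 July are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period (
    date          TEXT PRIMARY KEY,   -- one row per calendar date
    fiscal_year   INTEGER,
    fiscal_period INTEGER,
    is_holiday    INTEGER
);
CREATE TABLE broker (broker_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (
    transaction_id   INTEGER PRIMARY KEY,
    broker_id        INTEGER REFERENCES broker(broker_id),
    date             TEXT    REFERENCES period(date),
    transaction_time TEXT,
    amount           REAL                 -- hypothetical measure
);

-- Assumed fiscal calendar: the fiscal year starts on 1 July.
INSERT INTO period VALUES
  ('2016-06-30', 2016, 12, 0),
  ('2016-07-01', 2017,  1, 0),
  ('2016-07-04', 2017,  1, 1);
INSERT INTO broker VALUES (1, 'Broker X'), (2, 'Broker Y');
INSERT INTO transactions VALUES
  (1, 1, '2016-06-30', '09:15', 100.0),
  (2, 1, '2016-07-01', '10:00', 200.0),
  (3, 1, '2016-07-04', '11:30',  50.0),
  (4, 2, '2016-07-01', '12:00', 999.0);
""")

# Broker stats broken down by fiscal year and period: no date math
# in the query, the period table already knows the fiscal calendar.
broker_stats = conn.execute("""
    SELECT p.fiscal_year, p.fiscal_period, COUNT(*), SUM(t.amount)
    FROM transactions t
    JOIN period p ON p.date = t.date
    WHERE t.broker_id = ?
    GROUP BY p.fiscal_year, p.fiscal_period
    ORDER BY p.fiscal_year, p.fiscal_period
""", (1,)).fetchall()
print(broker_stats)
```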
Normalizing scalar values (dates, numbers, etc.) is generally overkill. Just because values repeat doesn't mean they should be normalized out. Only repeating values that aren't directly related to the row's primary key (e.g. an Address) should be candidates for normalization.
The only benefit I can see to normalizing dates is if you want to add different representations of each date (e.g. Month, Quarter, etc.) without having to do the math each time. Otherwise the drawbacks outweigh the advantages, in my opinion.
Moving a date attribute into another table and making it a foreign key in the Transaction table has nothing to do with "normalization".
Consider the example relation:
T{TransactionId,Date}
and dependency
{TransactionId}->{Date}
If TransactionId is a key then T already satisfies 6th Normal Form. Moving Date into another table, replacing it with another attribute and/or making it a foreign key will not make T any "more normalized" than it is already.
Whether or not attribute values "repeat" is irrelevant in normalization. What matter are the functional dependencies and join dependencies you mean to satisfy in your database schema. "Repeating data" is a phrase sometimes used informally to describe what functional dependencies are about but it is an oversimplification to say that decomposition is required simply because data repeats.
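A tiny Python illustration of the point, with made-up sample rows: the Date value repeats, yet the dependency {TransactionId}->{Date} still holds. The helper name `satisfies_fd` is my own, and checking sample data only shows the data is consistent with a dependency; the dependency itself is a rule you decide for the schema.

```python
def satisfies_fd(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = tuple(row[attr] for attr in dependent)
        # setdefault stores the first value seen for this key and
        # returns the stored one on later occurrences.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Sample relation T{TransactionId, Date}; two transactions share a date.
T = [
    {"TransactionId": 1, "Date": "2016-11-01"},
    {"TransactionId": 2, "Date": "2016-11-01"},
    {"TransactionId": 3, "Date": "2016-11-02"},
]

id_determines_date = satisfies_fd(T, ["TransactionId"], ["Date"])
date_determines_id = satisfies_fd(T, ["Date"], ["TransactionId"])
print(id_determines_date, date_determines_id)
```

The repeated date does nothing to the first dependency; it only shows that Date does not determine TransactionId, which nobody claimed in the first place.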
Using the date is not an ideal example. Think instead of customer records, tied to each transaction. You want to store the FK of a customer within each transaction row. You don't want to store the customer's name, address, password repeatedly though!