I have a question about modeling in Visual Studio SQL Server Data Tools. I am currently building a model with a fact table linked to a Date dimension table in a many-to-one relationship, in that order. The dimension table has a unique Date value (PK) that links to the fact table, where Date is the foreign key. For the sales table, many sales happen on the same day, hence 1:M.
There is no code for this question. However, when I refresh the tables and bring them into my model, I notice that only ONE date shows up in the Date column of the dimension table. In other words, all dates and transactions show up in the fact table dataset, but for the dimension table only ONE date appears: 11/1/2016. I need them all to show up and be linked to the fact table. If you are thinking that I might have the Date table filtered: I have already checked this. There are NO current and active filters on Date in the Date dimension table. I am puzzled as to why this is happening.
Also, the data types are consistent between the tables; both columns are currently set to 'Short Date.' Changing the format of both columns to MM/DD/yyyy does nothing to solve the problem, either.
I'm developing a change history table where I'll basically record the old and new values for changes to fields of two types: decimal and datetime.
To keep it simple, I was thinking about creating a string field and converting the values to strings before storing them in the table.
My problem is that later I'll have to create a field in the report to show the difference between the changes (for example, if the date changed from 01/20/2015 to 01/27/2015 the difference will be 7, and so on). I do not want to create a field in the table to record the difference between the fields; I want to do it on the report side.
My question is:
Is there any way to store those two kinds of data (decimal and datetime) that makes it simple to do comparisons later? Because if I store them as strings I'll have to convert them twice: once before creating the record in the DB, and again to see what the difference between them is.
I believe the best approach would be what I like to call the never delete, never update approach.
Basically, you add a column to your source table for the record status, which can be either current, historic or deleted (use a tinyint for that; just be sure to link it to a row status table for readability). Then, instead of deleting a record, you update its status to deleted; and instead of updating it, you change its status to historic and insert a new record with the new data.
Naturally, this approach has its price, since you will have to write an INSTEAD OF UPDATE trigger, but that is a small price to pay compared to other approaches to keeping history data.
Also, if your primary key is not an identity column, you will need to add an identity column to your primary key (and to any other unique constraints you might have) so that multiple versions of the same record can coexist.
You also might want to add a filter to your non-clustered indexes so that they will only index the records where the status is current.
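A minimal T-SQL sketch of what that could look like; all table and column names (Customer, RowStatusId, and so on) are assumptions for illustration:

-- Status lookup table, for readability
CREATE TABLE dbo.RowStatus (
    RowStatusId tinyint     NOT NULL PRIMARY KEY,
    StatusName  varchar(20) NOT NULL
);
INSERT INTO dbo.RowStatus VALUES (1, 'Current'), (2, 'Historic'), (3, 'Deleted');

-- The identity column is part of the primary key so that several
-- versions of the same CustomerId can coexist.
CREATE TABLE dbo.Customer (
    CustomerId   int          NOT NULL,
    VersionId    int          NOT NULL IDENTITY(1,1),
    CustomerName varchar(100) NOT NULL,
    RowStatusId  tinyint      NOT NULL
        REFERENCES dbo.RowStatus (RowStatusId),
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerId, VersionId)
);
GO

-- INSTEAD OF UPDATE: retire the current row, then insert the new version.
-- (SQL Server does not fire an INSTEAD OF trigger recursively for statements
-- it issues against its own table. A 'soft delete' would need similar
-- handling in an INSTEAD OF DELETE trigger.)
CREATE TRIGGER trg_Customer_Update ON dbo.Customer
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE c
    SET    RowStatusId = 2  -- Historic
    FROM   dbo.Customer AS c
    JOIN   deleted      AS d
      ON   c.CustomerId = d.CustomerId AND c.VersionId = d.VersionId;

    INSERT INTO dbo.Customer (CustomerId, CustomerName, RowStatusId)
    SELECT CustomerId, CustomerName, 1  -- Current
    FROM   inserted;
END;
GO

-- Filtered index so queries against live data only touch current rows.
CREATE NONCLUSTERED INDEX IX_Customer_Current
    ON dbo.Customer (CustomerId)
    WHERE RowStatusId = 1;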
I have a Sales fact table, an Orders fact table (both line level detail), and two date roleplaying dimensions (from the Date dimension) for Order Date and Transaction Date.
I'm trying to get to a point where you can view sales measures by order date and order measures by transaction date.
The Sales table has the key for the related Order line if the sale was from an order and null if it was a non-order sale. The Order table doesn't have any links to the related transaction.
I've been trying to wrap my head around how to model a relationship based on the link between the two fact tables. The only method I can get to work is to create a dimension based on the Orders table which contains only the key, and then use many-to-many relationships... which somehow seems completely wrong, but I'm not sure what the "right" approach to this situation would be.
If at all possible I'd like the non-order sales to show as "unknown" order dates when viewing Sales Measures by Order date, so you can see the complete picture rather than just sales from orders. Using the above approach this isn't happening.
Any suggestions about what needs to be changed to get this to work?
You were on the right track. I would create a view in the relational database, or a named query in the DSV, containing as its single column the distinct non-null order IDs; maybe call it "DimOrderId". Then build a dimension from it, setting the "Null processing" property to "UnknownMember" (in BIDS, you have to click the "plus" twice on the attribute's "Key Columns" property to access this property).
And then use this dimension for the many-to-many relationship.
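For the relational view, something like the following could work; the fact table and column names here are assumptions:

-- Distinct, non-null order IDs from the sales fact feed the new dimension.
CREATE VIEW dbo.DimOrderId AS
SELECT DISTINCT OrderId
FROM   dbo.FactSales
WHERE  OrderId IS NOT NULL;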
You should use the Order ID to look up the Order Date and put an Order Date dimension key in the Sales Transaction fact table. Since there may be multiple transactions per order, the other way around probably just doesn't make sense. If it is 1:1 you could do the reverse, but it would mean updating order facts once the transaction occurs, which could be a load-time complexity and performance hit. Make sure you really NEED orders by Transaction Date.
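That lookup could happen during the fact load; a rough sketch, with all names assumed and -1 standing in for a hypothetical "Unknown" date member:

-- Assumed names: FactSales, FactOrders, OrderLineKey, OrderDateKey.
-- Non-order sales (no matching order line) fall back to the Unknown member.
UPDATE fs
SET    fs.OrderDateKey = COALESCE(fo.OrderDateKey, -1)
FROM   dbo.FactSales AS fs
LEFT JOIN dbo.FactOrders AS fo
       ON fo.OrderLineKey = fs.OrderLineKey;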
Well, I have just heard about this today, but I do not get it. So I should not have a Transaction table with a Date column (because multiple transactions can occur on the same day); instead I should have a Transaction table and a Date table, where the Transaction holds a FK to the Date. What is the point, then? Instead of a date I will just repeat the FK.
An example: a broker can make a transaction on any date (the transaction then needs to hold broker and date information).
Check out:
http://en.wikipedia.org/wiki/Database_normalization#Normal_forms
Transaction date does not need to be normalized.
But, imagine that Transaction is tied to customers, and customer details also have to be kept - this is a case where normalization helps to reduce data redundancy.
Assuming your date table is like the period table in our data warehouse, it is probably structured something like this:
Field date, datatype date (not datetime) primary key
other fields include fiscal year and holiday information
Then your transaction table might resemble something like this:
broker_id, foreign key to broker
date, foreign key to date
transaction time
other fields as necessary
Your question was, "what's the point?". This sort of database design allows you to easily answer questions like, "give me broker x's stats for the past 5 fiscal years, broken down by fiscal period"
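As a rough T-SQL sketch of that structure and the kind of query it enables (the fiscal-year filter and all names are assumptions):

-- Date/period table: one row per calendar date, plus fiscal attributes.
CREATE TABLE dbo.PeriodDate (
    [date]        date NOT NULL PRIMARY KEY,
    fiscal_year   int  NOT NULL,
    fiscal_period int  NOT NULL,
    is_holiday    bit  NOT NULL
);

CREATE TABLE dbo.Broker (
    broker_id   int PRIMARY KEY,
    broker_name varchar(100) NOT NULL
);

CREATE TABLE dbo.BrokerTransaction (
    transaction_id   int  IDENTITY PRIMARY KEY,
    broker_id        int  NOT NULL REFERENCES dbo.Broker (broker_id),
    [date]           date NOT NULL REFERENCES dbo.PeriodDate ([date]),
    transaction_time time NOT NULL
);

-- "Broker x's stats for the past 5 fiscal years, broken down by fiscal period"
DECLARE @broker_id int = 42;  -- hypothetical broker
SELECT   p.fiscal_year, p.fiscal_period, COUNT(*) AS transaction_count
FROM     dbo.BrokerTransaction AS t
JOIN     dbo.PeriodDate        AS p ON p.[date] = t.[date]
WHERE    t.broker_id = @broker_id
  AND    p.fiscal_year >= YEAR(GETDATE()) - 5  -- crude stand-in for fiscal logic
GROUP BY p.fiscal_year, p.fiscal_period
ORDER BY p.fiscal_year, p.fiscal_period;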
Normalizing scalar values (dates, numbers, etc.) is generally overkill. Just because values repeat doesn't mean they should be normalized out. Only repeating values that aren't directly related to the row's primary key (e.g. an Address) should be candidates for normalization.
The only benefit I can see to normalizing dates is if you want to add different representations of each date (e.g. Month, Quarter, etc.) without having to do the math each time. Otherwise the drawbacks outweigh the advantages, in my opinion.
Moving a date attribute into another table and making it a foreign key in the Transaction table has nothing to do with "normalization".
Consider the example relation:
T{TransactionId,Date}
and dependency
{TransactionId}->{Date}
If TransactionId is a key then T already satisfies 6th Normal Form. Moving Date into another table, replacing it with another attribute and/or making it a foreign key will not make T any "more normalized" than it is already.
Whether or not attribute values "repeat" is irrelevant to normalization. What matters are the functional dependencies and join dependencies you mean to satisfy in your database schema. "Repeating data" is a phrase sometimes used informally to describe what functional dependencies are about, but it is an oversimplification to say that decomposition is required simply because data repeats.
Using the date is not an ideal example. Think instead of customer records, tied to each transaction. You want to store the FK of a customer within each transaction row. You don't want to store the customer's name, address, password repeatedly though!
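A small sketch of that customer example (all names are illustrative): the transaction row carries only the customer's key, while the details live once in the customer table, and the scalar date stays inline.

CREATE TABLE dbo.Customers (
    customer_id int PRIMARY KEY,
    name        varchar(100) NOT NULL,
    address     varchar(200) NOT NULL
);

CREATE TABLE dbo.Transactions (
    txn_id      int  IDENTITY PRIMARY KEY,
    customer_id int  NOT NULL REFERENCES dbo.Customers (customer_id),
    txn_date    date NOT NULL  -- no separate date table needed
);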
I need to design a history table to keep track of multiple values that were changed on a specific record when edited.
Example:
The user is presented with a page to edit the record.
Title: Mr.
Name: Joe
Tele: 555-1234
DOB: 1900-10-10
If a user changes any of these values I need to keep track of the old values and record the new ones.
I thought of using a table like this:
History---------------
id
modifiedUser
modifiedDate
tableName
recordId
oldValue
newValue
One problem with this is that it will have multiple entries for each edit.
I was thinking about having another table to group them but you still have the same problem.
I was also thinking about keeping a copy of the row in the history table but that doesn't seem efficient either.
Any ideas?
Thanks!
I would recommend that for each table whose history you want to track, you have a second table (e.g. tblCustomer and tblCustomer_History) with the identical format, plus a date column.
Whenever an edit is made, you insert the old record into the history table along with the date/time. This is very easy to do and requires few code changes (usually just a trigger).
This has the benefit of keeping your 'real' tables as small as possible, but gives you a complete history of all the changes that are made.
Ultimately, however, it will come down to how you want to use this data. If it's just for auditing purposes, this method is simple and has little downside beyond the extra disk space, with little or no impact on your main system.
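A minimal sketch of that pattern, assuming a tblCustomer with the four fields from the question:

CREATE TABLE dbo.tblCustomer (
    id    int PRIMARY KEY,
    title varchar(10),
    name  varchar(100),
    tele  varchar(20),
    dob   date
);

-- Identical format, plus a date column.
CREATE TABLE dbo.tblCustomer_History (
    id           int,
    title        varchar(10),
    name         varchar(100),
    tele         varchar(20),
    dob          date,
    archivedDate datetime NOT NULL DEFAULT (GETDATE())
);
GO

-- On every update or delete, copy the pre-change rows into the history table.
CREATE TRIGGER trg_tblCustomer_History ON dbo.tblCustomer
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.tblCustomer_History (id, title, name, tele, dob)
    SELECT id, title, name, tele, dob
    FROM   deleted;
END;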
You should define what type of efficiency you're interested in: you can have efficiency of storage space, efficiency of effort required to record the history (transaction cost), or efficiency of time to query for the history of a record in a specific way.
I notice you have a table name in your proposed history table, this implies an intention to record the history of more than one table, which would rule out the option of storing an exact copy of the record in your history table unless all of the tables you're tracking will always have the same structure.
If you deal with columns separately, i.e. you record only one column value for each history record, you'll have to devise a polymorphic data type that is capable of accurately representing every column value you'll encounter.
If efficiency of storage space is your main concern, then I would break the history into multiple tables. This would mean having a new-column-value table linked to both an edit-event table and a column-definition table. The edit-event table would record the user and timestamp, and the column-definition table would record the table, column, and data type. As #njk noted, you don't need the old column value, because you can always query for the previous edit to get the old value. The main reason this approach would be expected to save space is the assumption that, generally speaking, users will be editing a small subset of the available fields.
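Sketched in T-SQL, that storage-efficient layout might look like this (all names assumed; sql_variant is one option for the polymorphic value mentioned above):

CREATE TABLE dbo.ColumnDefinition (
    columnDefId int IDENTITY PRIMARY KEY,
    tableName   sysname NOT NULL,
    columnName  sysname NOT NULL,
    dataType    sysname NOT NULL
);

CREATE TABLE dbo.EditEvent (
    editEventId  int IDENTITY PRIMARY KEY,
    modifiedUser sysname  NOT NULL,
    modifiedDate datetime NOT NULL DEFAULT (GETDATE())
);

-- One row per changed column per edit; only the new value is stored.
CREATE TABLE dbo.NewColumnValue (
    editEventId int NOT NULL REFERENCES dbo.EditEvent (editEventId),
    columnDefId int NOT NULL REFERENCES dbo.ColumnDefinition (columnDefId),
    recordId    int NOT NULL,     -- key of the edited row
    newValue    sql_variant NULL, -- polymorphic value
    PRIMARY KEY (editEventId, columnDefId, recordId)
);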
If efficiency of querying is your main concern, I would set up a history table for every table you're tracking and add a user and time stamp field to each history table. This should also be efficient in terms of transaction cost for an edit.
You don't need to record both old and new values in a history table. Just record the newest value, author and date. You can then fetch the most recent record for some user_id based on the date of the record. This may not be the best approach if you will be dealing with a lot of data.
user (id, user_id, datetime, author, ...)
Sample data
id user_id datetime author user_title user_name user_tele ...
1 1 2012-11-05 11:05 Bob
2 1 2012-11-07 14:54 Tim
3 1 2012-11-12 10:18 Bob
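To fetch the most recent record per user_id from a table shaped like that (the table name user_history here is an assumption):

WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY [datetime] DESC) AS rn
    FROM   dbo.user_history
)
SELECT id, user_id, [datetime], author
FROM   ranked
WHERE  rn = 1;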
I'm a beginner developer and I have a database which has several different dates.
Created Date
Converted Date
Lost Date
Changed Date
etc.
The data needs to be shown in one application and filtered on all dates. I am coding in QlikView; I could create a date island and use its native set analysis to filter the data, but that has a major impact on performance.
Has anyone coding in QlikView come across a similar scenario?
Set analysis indeed has a major impact on performance. You are better off using the normal 'selection' functionality in QlikView.
For the answer below I am going to assume that you are familiar with the concept of Star Schema development. In short it means separating Dimensions (selection fields) from Fact fields (counter fields, summation fields, etc.) and connecting them via a link table.
There are two possible scenarios:
1. More than one date is related to the same fact.
For example, you have a 'sales transactions' table which has as a fact the amount of money involved in the sale, and there is not only the 'sale date' but also the 'payment date', and you want to select on both. In this case you want to have several independent date selections, since you cannot be sure whether the user wants to select on Converted date, Created date, etc. You need to duplicate your 'date island' with different key names and connect it to your transactions table twice. Both date tables will no longer be islands and are more properly called 'Calendar dimensions'.
2. Different dates are related to different facts.
In this case you can use one 'Calendar dimension' to accommodate all date fields. Simply create one AutoNumber key in your calendar and call it %DateKey. Make this field the connection between your calendar table and your link table. Then, for every fact table with a date that you want to make selectable with the calendar, make sure you connect it to the link table using a key that includes the date in the AutoNumber hash.
Having experienced this same issue, what I would recommend is creating what I call a Key Table, like the example below; it keeps the relationships intact and you don't have to use set analysis as much. Just make sure you include a table with all possible dates as one of the child tables, and a %DateKey as littlegreen suggested.