I have three tables (lots in reality but these three are the ones I have to worry about right now)
A product line table like...
ProductLineId (pk)
Name
Description
Price
Finance Event (FK)
and a finance event table like...
EventId (pk)
Event Description
and a Financial Transaction table like...
TransactionId (pk)
FinanceEventId (fk)
LotsOfSageReferencesAndOtherForeignKeys
When a sale is processed it creates transaction records based on the Finance Events etc.
The question is; if someone in the admin then goes in and changes the Finance event what is the best way of versioning the events table, while preserving the primary key.
In general you have several choices:
First when dealing with temporal data, you should store the actual values not just the ids in the transaction table. Then the other tables serve as lookup tables for the creation of new records, but you always know what the real data was at the time of the transactions, This would include things that change over time like price, customer name, etc. Note you may also need to store the id field in case you need to look up what the current equivalent is.
Or you can disallow updates entirely on your lookup tables and through a trigger, deactivate the current record and insert a new one when someone runs an update statement. Now your child tables hold the values of the record that was active at the time the record was created. What you lose in this scenario is the ability to look up what the current value would be (which is why you might not want to do it for something like customer name or price).
For data where you might not care if it changed over time, but would want to reflect the current information, allow the updates but have an audit or history table maintained through triggers; so you can recreate what the value would have been at the time if you need to, or to see who changed what and when (sometimes a legal requirement).
Which of these options suits your current situation, only you would know.
Related
I have a table called Station with many fields,
one of the fields is - StationPrice.
I want to hold more information about the payment process such as - currency, paymentStatus and etc (somewhere like 10 fields).
My question is if I need to expand the current table - Station with the new fields or to have a field called - StationPriceId that will be a foreign key to another table called StationPrices which will store all the information about the price related to that station.
The answer to your question is 1 of the most popular answers in DB world - it depends. It might look nicer if this info is split into 2 different tables, however you need to understand that if you split it then you'll need to 2 inserts instead of 1, the same with updated and deletes. Moreover you'll need to JOIN this table every time you'll need this data. I would go with single table first and then moved it to the separate column when the specific need come up.
From other side of view, if this data will be rarely accessed and JOIN/DELETE/INSERT overhead will be minimal then it is OK to move it.
My question is if I need to expand the current table - Station with the new fields or to have a field called - StationPriceId that will be a foreign key to another table called StationPrices which will store all the information about the price related to that station.
Yes it is better to use two tables, Based on information you had provided - it is better to use a separate table.
You might want to maintain a price change history
Hence if you maintain a separate table, you can make earlier price as Active = false and enter a new price for the particular station
I am currently going to be designing an app in vb.net to work with an access back-end database. I have been trying to think of ways to reduce down data redundancy
and I have an example scenario below:
Lets imagine, for an example purpose, I have a customers table and need to highlight all customers in WI and send them a letter. The customers table would
contain all the customers and properties associated with customers (Name, Address, Etc) so we would query for where the state is "WI" in the table. Then we would
take the results of that data, and append it into a table with a "completion" indicator (So from 'CUSTOMERS' to say 'WI_LETTERS' table).
Lets assume some processing needs to be done so when its completed, mark a field in that table as 'complete', then allow the letters to be printed with
a mail merge. (SELECT FROM 'WI_LETTERS' WHERE INDICATOR = COMPLETE).
That item is now completed and done. But lets say, that every odd year (2013) we also send a notice to everyone in the table with a state of "WI". We now query the
customers table when the year is odd and the customer's state is "WI". Then append that data into a table called 'notices' with a completion indicator
and it is marked complete.
This seems to keep the data "task-based" as the data is based solely around the task at hand. However, isn't this considered redundant data? This setup means there
can be one transaction type to many accounts (even multiple times to the same account year after year), but shouldn't it be one account to many transactions?
How is the design of this made better?
You certainly don't want to start creating new tables for each individual task you perform. You may want to create several different tables for different types of tasks if the information you need to track (and hence the columns in those tables) will be quite different between the different types of tasks, but those tables should be used for all tasks of that particular type. You can maintain a field in those tables to identify the individual task to which each record applies (e.g., [campaign_id] for Marketing campaign mailouts, or [mail_batch_id], or similar).
You definitely don't want to start creating new tables like [WI_letters] that are segregated by State (or any client attribute). You already have the customers' State in the [Customers] table so the only customer-related attribute you need in your [Letters] table is the [CustomerID]. If you frequently want to see a list of Letters for Customers in Wisconsin then you can always create a saved Query (often called a View in other database systems) named [WI_Letters] that looks like
SELECT * FROM Letters INNER JOIN Customers ON Customers.CustomerID=Letters.CustomerID
WHERE Customers.State="WI"
I need to design a history table to keep track of multiple values that were changed on a specific record when edited.
Example:
The user is presented with a page to edit the record.
Title: Mr.
Name: Joe
Tele: 555-1234
DOB: 1900-10-10
If a user changes any of these values I need to keep track of the old values and record the new ones.
I thought of using a table like this:
History---------------
id
modifiedUser
modifiedDate
tableName
recordId
oldValue
newValue
One problem with this is that it will have multiple entries for each edit.
I was thinking about having another table to group them but you still have the same problem.
I was also thinking about keeping a copy of the row in the history table but that doesn't seem efficient either.
Any ideas?
Thanks!
I would recommend that for each table you want to track history, you have a second table (i.e. tblCustomer and tblCustomer_History) with the identical format - plus a date column.
Whenever an edit is made, you insert the old record to the history table along with the date/time. This is very easy to do and requires little code changes (usually just a trigger)
This has the benefit of keeping your 'real' tables as small as possible, but gives you a complete history of all the changes that are made.
Ultimately however, it will come down to how you want to use this data. If its just for auditing purposes, this method is simple and has little downside except the extra disk space and little or no impact on your main system.
You should define what type of efficiency you're interested in: you can have efficiency of storage space, efficiency of effort required to record the history (transaction cost), or efficiency of time to query for the history of a record in a specific way.
I notice you have a table name in your proposed history table, this implies an intention to record the history of more than one table, which would rule out the option of storing an exact copy of the record in your history table unless all of the tables you're tracking will always have the same structure.
If you deal with columns separately, i.e. you record only one column value for each history record, you'll have to devise a polymorphic data type that is capable of accurately representing every column value you'll encounter.
If efficiency of storage space is your main concern, then I would break the history into multiple tables. This would mean having new column value table linked to both an edit event table and a column definition table. The edit event table would record the user and time stamp, the column definition table would record the table, column, and data type. As #njk noted, you don't need the old column value because you can always query for the previous edit to get the old value. The main reason this approach would be expected to save space is the assumption that, generally speaking, users will be editing a small subset of the available fields.
If efficiency of querying is your main concern, I would set up a history table for every table you're tracking and add a user and time stamp field to each history table. This should also be efficient in terms of transaction cost for an edit.
You don't need to record old and new value in a history table. Just record the newest value, author and date. You can then just fetch the most recent record for some user_id based on the date of the record. This may not be the best approach if you will be dealing with a lot of data.
user (id, user_id, datetime, author, ...)
Sample data
id user_id datetime author user_title user_name user_tele ...
1 1 2012-11-05 11:05 Bob
2 1 2012-11-07 14:54 Tim
3 1 2012-11-12 10:18 Bob
I'm curious on how some other people have handled this.
Imagine a system that simply has Products, Purchase Orders, and Purchase Order Lines. Purchase Orders are the parent in a parent-child relationship to Purchase Order Lines. Purchase Order Lines, reference a single Product.
This works happily until you delete a Product that is referenced by a Purchase Order Line. Suddenly, the Line knows its selling 30 of something...but it doesn't know what.
What's a good way to anticipate the deletion of a referenced piece of data like this? I suppose you could just disallow a product to be deleted if any Purchase Order Lines reference it but that sounds...clunky. I imagine its likely that you would keep the Purchase Order in the database for years, essentially welding the product you want to delete into your database.
The parent entity should NEVER be deleted or the dependent rows cease to make sense, unless you delete them too. While it is "clunky" to display old records to users as valid selections, it is not clunky to have your database continue to make sense.
To address the clunkiness in the UI, some people create an Inactive column that is set to True when an item is no longer active, so that it can be kept out of dropdown lists in the user interface.
If the value is used in a display field (e.g. a readonly field) the inactive value can be styled in a different way (e.g. strike-through) to reflect its no-longer-active status.
I have StartDate and ExpiryDate columns in all entity tables where the entity can become inactive or where the entity will become active at some point in the future (e.g. a promotional discount).
Enforce referential integrity. This basically means creating foreign keys between the tables and making sure that nothing "disappears"
You can also use this to cause referenced items to be deleted when the parent is deleted (cascading deletes).
For example you can create a SQL Server table in such a way that if a PurchaseOrder is deleted it's child PurchaseOrderLines are also deleted.
Here is a good article that goes into that.
It doesn't seem clunky to keep this data (to me at least). If you remove it then your purchase order no longer has the meaning that it did when you created it, which is a bad thing. If you are worried about having old data in there you can always create an archive or warehouse database that contains stuff over a year old or something...
For data like this where parts of it have to be kept for an unknown amount of time while other parts will not, you need to take a different approach.
Your Purchase Order Lines (POL) table needs to have all of the columns that the product table has. When a line item is added to the purchase order, copy all of product data into the POL. This includes the name, price, etc. If the product has options, then you'll have to create a corresponding PurchaseOrderLineOptions table.
This is the only real way of insuring that you can recreate the purchase order on demand at any point. It also means that someone can change the pricing, name, description, and other information about the product at anytime without impacting previous orders.
Yes, you end up with a LOT of duplicate information in your line item table..; but that's okay.
For kicks, you might keep the product id in the POL table for referencing back, but you cannot depend on the product table to have any bearing on the paid for product...
I'm not sure if this type of question has been answered before. In my database I have a product table and specifications table. Each product can have multiple specifications. Here I need to store the revisions of each product in database in order to query them later on for history purposes.
So I need an efficient way to store the products' relations to specifications each time users make changes to these relations. Also the amount of data can become very big. For example, suppose there are 100000 products in database: each product can have 30 specifications and also there are minimum of 20 revisions on each product. So by storing all the data in a single table the amount of data becomes enormously high.
Any suggestions?
If this is purely for 'archival' purposes then maybe a separate table for the revisions is better.
However if you need to treat previous revisions equally to current revisions (for example, if you want to give users the ability to revert a product to a previous revision), then it is probably best to keep a single products table, rather than copying data between tables. If you are worried about performance, this is what indexes are for.
You can create a compound primary key for the product table, e.g. PRIMARY KEY (product_id, revision). Maybe a stored proc to find the current revision—by selecting the row with the highest revision for a particular product_id—will be useful.
I would recommend having a table, exact copy of current table with a HistoryDate column, and store the revisions in this table. This you can do for all 3 tables in question.
By keeping the revision separate from the main tables, you will not incur any performance penalties when querying the main tables.
You can also look at keeping a record to indicate the user that changed the data.