SQL history table design

I need to design a history table to keep track of multiple values that were changed on a specific record when edited.
Example:
The user is presented with a page to edit the record.
Title: Mr.
Name: Joe
Tele: 555-1234
DOB: 1900-10-10
If a user changes any of these values I need to keep track of the old values and record the new ones.
I thought of using a table like this:
History---------------
id
modifiedUser
modifiedDate
tableName
recordId
oldValue
newValue
One problem with this is that it will have multiple entries for each edit.
I was thinking about having another table to group them but you still have the same problem.
I was also thinking about keeping a copy of the row in the history table but that doesn't seem efficient either.
Any ideas?
Thanks!

I would recommend that, for each table whose history you want to track, you have a second table (e.g. tblCustomer and tblCustomer_History) with an identical format, plus a date column.
Whenever an edit is made, you insert the old record into the history table along with the date/time. This is very easy to do and requires few code changes (usually just a trigger).
This has the benefit of keeping your 'real' tables as small as possible, but gives you a complete history of all the changes that are made.
Ultimately, however, it will come down to how you want to use this data. If it's just for auditing purposes, this method is simple and has little downside other than the extra disk space; it has little or no impact on your main system.
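A minimal sketch of the trigger approach described above, using Python's built-in sqlite3 module so it can be run as-is. The column names come from the question's example form; the trigger name and the changed_at column are assumptions made for illustration, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomer (
    id    INTEGER PRIMARY KEY,
    title TEXT,
    name  TEXT,
    tele  TEXT,
    dob   TEXT
);

-- Same columns as tblCustomer, plus the time the old row was archived.
-- changed_at is a hypothetical name for the extra date column.
CREATE TABLE tblCustomer_History (
    id         INTEGER,
    title      TEXT,
    name       TEXT,
    tele       TEXT,
    dob        TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Before each update, copy the row as it was into the history table.
CREATE TRIGGER trg_customer_history
BEFORE UPDATE ON tblCustomer
FOR EACH ROW
BEGIN
    INSERT INTO tblCustomer_History (id, title, name, tele, dob)
    VALUES (OLD.id, OLD.title, OLD.name, OLD.tele, OLD.dob);
END;
""")

conn.execute(
    "INSERT INTO tblCustomer VALUES (1, 'Mr.', 'Joe', '555-1234', '1900-10-10')"
)
# The application only issues a normal UPDATE; the trigger does the archiving.
conn.execute("UPDATE tblCustomer SET tele = '555-9999' WHERE id = 1")

history = conn.execute(
    "SELECT id, name, tele FROM tblCustomer_History"
).fetchall()
current = conn.execute("SELECT tele FROM tblCustomer WHERE id = 1").fetchone()
```

After the update, history holds the old row (with the old telephone number) and tblCustomer holds only the new values.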

You should define what type of efficiency you're interested in: you can have efficiency of storage space, efficiency of effort required to record the history (transaction cost), or efficiency of time to query for the history of a record in a specific way.
I notice you have a table name in your proposed history table. This implies an intention to record the history of more than one table, which would rule out storing an exact copy of the record in your history table unless all of the tables you're tracking will always have the same structure.
If you deal with columns separately, i.e. you record only one column value for each history record, you'll have to devise a polymorphic data type that is capable of accurately representing every column value you'll encounter.
If efficiency of storage space is your main concern, then I would break the history into multiple tables. This would mean having a new column-value table linked to both an edit event table and a column definition table. The edit event table would record the user and time stamp; the column definition table would record the table, column, and data type. As @njk noted, you don't need the old column value because you can always query for the previous edit to get the old value. The main reason this approach would be expected to save space is the assumption that, generally speaking, users will be editing a small subset of the available fields.
If efficiency of querying is your main concern, I would set up a history table for every table you're tracking and add a user and time stamp field to each history table. This should also be efficient in terms of transaction cost for an edit.
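One possible shape for the storage-efficient layout (edit event, column definition, column value), again sketched with Python's sqlite3. All table and column names here are assumptions, as is the choice to put record_id on the edit event table and to store every value as text. The query shows how the old value can be recovered from the previous edit instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE column_definition (
    id          INTEGER PRIMARY KEY,
    table_name  TEXT,
    column_name TEXT,
    data_type   TEXT
);
CREATE TABLE edit_event (
    id            INTEGER PRIMARY KEY,
    record_id     INTEGER,   -- assumed: which row was edited
    modified_user TEXT,
    modified_date TEXT
);
CREATE TABLE column_value (
    edit_event_id        INTEGER REFERENCES edit_event(id),
    column_definition_id INTEGER REFERENCES column_definition(id),
    new_value            TEXT    -- only the new value is stored
);

INSERT INTO column_definition VALUES (1, 'customer', 'tele', 'text');
INSERT INTO edit_event VALUES (1, 1, 'Bob', '2012-11-05 11:05');
INSERT INTO edit_event VALUES (2, 1, 'Tim', '2012-11-07 14:54');
INSERT INTO column_value VALUES (1, 1, '555-1234');
INSERT INTO column_value VALUES (2, 1, '555-9999');
""")

# For each change, look up the value written by the most recent earlier edit
# of the same column on the same record; that is the "old" value.
changes = conn.execute("""
    SELECT e.modified_user,
           v.new_value,
           (SELECT pv.new_value
              FROM column_value pv
              JOIN edit_event pe ON pe.id = pv.edit_event_id
             WHERE pv.column_definition_id = v.column_definition_id
               AND pe.record_id = e.record_id
               AND pe.modified_date < e.modified_date
             ORDER BY pe.modified_date DESC
             LIMIT 1) AS old_value
      FROM column_value v
      JOIN edit_event e ON e.id = v.edit_event_id
     ORDER BY e.modified_date
""").fetchall()
```

An edit that touches several fields produces one edit_event row and one column_value row per changed field, which is where the space saving is expected to come from.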

You don't need to record old and new value in a history table. Just record the newest value, author and date. You can then just fetch the most recent record for some user_id based on the date of the record. This may not be the best approach if you will be dealing with a lot of data.
user (id, user_id, datetime, author, ...)
Sample data
id  user_id  datetime          author  user_title  user_name  user_tele  ...
1   1        2012-11-05 11:05  Bob
2   1        2012-11-07 14:54  Tim
3   1        2012-11-12 10:18  Bob
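A rough sketch of fetching the most recent record from a table like the one above, using Python's sqlite3. The user_tele values are made up for illustration, since the sample data does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER,
    "datetime" TEXT,
    author     TEXT,
    user_tele  TEXT
);
-- user_tele values are hypothetical sample data.
INSERT INTO "user" VALUES (1, 1, '2012-11-05 11:05', 'Bob', '555-1234');
INSERT INTO "user" VALUES (2, 1, '2012-11-07 14:54', 'Tim', '555-2222');
INSERT INTO "user" VALUES (3, 1, '2012-11-12 10:18', 'Bob', '555-3333');
""")

# The current state of user 1 is simply the newest row for that user_id.
latest = conn.execute("""
    SELECT author, "datetime", user_tele
      FROM "user"
     WHERE user_id = ?
     ORDER BY "datetime" DESC
     LIMIT 1
""", (1,)).fetchone()
```

Every read of the current state has to sort or filter by date, which is likely why this may not suit large volumes of data; an index on (user_id, datetime) would be a natural first step.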

Related

history table: 1 vs 2 foreign keys

I have a "controller_variables" table where I save current values of some sensors:
id: the id of the record
controller_id (FK): the id of the controller that provides the data
variable_id (FK): the variable_id
value: the current variable value
created_at: creation date
updated_at: updated date
I also have "history_controller_variables" table where I save "snapshots" of the "controller_variables" table:
id: the id of the record
controller_variable_id (FK): the id of the controller_variables record
value: the "historified" read value
created_at: creation date of the history value
I found myself a few times wondering why I coupled the "history_controller_variables" table to the "controller_variables" table.
If I created the history table as an exact clone of the original table I could:
keep my history in case the referenced "controller_variables" record is deleted.
get history records by directly querying records of a certain controller_id/variable_id.
I can't think of a reason why not to do this change. Are there obvious reasons not to proceed with this change?

You have a fairly big tradeoff here. I don't know which is better, but I can tell you what the advantages of each are.
If the variables for your controllers will not change, then you want to go with one foreign key. This makes it easier to ensure correctness, i.e. that the history record represents a valid value for a given controller. If, on the other hand, these change and you delete records from the controller variable table, you run into a problem that has no easy solution. So in that case, you are better off using two.
Ultimately we never know the future and for that reason I would tend to accept some risk of odd data in exchange for ensuring that operational and historical data is subject to different concerns and that changing the data doesn't mess with history.

This is a good change to do. Have the history table be a clone of the original table, but add a timestamp column to the history table. Any time a variable value changes, create a new record in the history table with the new value, and have the timestamp indicate when the variable was changed to that value. If applicable in your application, you can also include a column in your history table that indicates who (or what) modified the variable to be the new value.
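As an illustration of the cloned history table, here is a small sketch with Python's sqlite3. The record_value helper and the sample readings are hypothetical, and the current table is trimmed to the columns needed for the example. The history table carries controller_id and variable_id directly, so it survives deletion of the current row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE controller_variables (
    id            INTEGER PRIMARY KEY,
    controller_id INTEGER NOT NULL,
    variable_id   INTEGER NOT NULL,
    value         REAL,
    updated_at    TEXT,
    UNIQUE (controller_id, variable_id)
);
-- Clone of the columns that matter, plus the time of each reading.
CREATE TABLE history_controller_variables (
    id            INTEGER PRIMARY KEY,
    controller_id INTEGER NOT NULL,
    variable_id   INTEGER NOT NULL,
    value         REAL,
    created_at    TEXT NOT NULL
);
""")

def record_value(conn, controller_id, variable_id, value, timestamp):
    """Hypothetical helper: update the current value and append to history."""
    with conn:  # both writes commit together or not at all
        cur = conn.execute(
            "UPDATE controller_variables SET value = ?, updated_at = ? "
            "WHERE controller_id = ? AND variable_id = ?",
            (value, timestamp, controller_id, variable_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO controller_variables "
                "(controller_id, variable_id, value, updated_at) "
                "VALUES (?, ?, ?, ?)",
                (controller_id, variable_id, value, timestamp),
            )
        conn.execute(
            "INSERT INTO history_controller_variables "
            "(controller_id, variable_id, value, created_at) "
            "VALUES (?, ?, ?, ?)",
            (controller_id, variable_id, value, timestamp),
        )

record_value(conn, 7, 3, 20.5, "2024-01-01 10:00")
record_value(conn, 7, 3, 21.0, "2024-01-01 11:00")

# Deleting the current row does not touch the history.
conn.execute("DELETE FROM controller_variables WHERE controller_id = 7")
history = conn.execute(
    "SELECT value FROM history_controller_variables "
    "WHERE controller_id = ? AND variable_id = ? ORDER BY created_at",
    (7, 3),
).fetchall()
```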

How to store customer specific key value pairs of data linked to a record

We are developing an application where records are inserted to a database to be viewed at a later date, we'd like a generic solution to this so we don't have to rework the DB every time we deploy this to a customer.
Records have three types of data:
Common Data:
A set of a data items that are common to every record such as record Id, record creation date, and record type.
Extended Data:
Some of this data is record-type specific, so Record Type A would always have Data Elements A, B, C and D, while Record Type B would always have Data Elements E, F, J and K.
Customer specific Data:
This is data that the customer wants each record of a particular type to hold. For example, a customer may require that all records of type A have a first name and a surname, but that records of type B only have a phone number.
The extended data is unlikely to change often and any changes will likely involve a code change as well so having a table with Common Data linked to an extended data as a 1 to 1 relationship is what we'd expect to do.
What we are having trouble visualizing is how to store the customer-specific data in a way that makes sense. My initial impulse was to have a 1-to-many relationship to a table with key/value pairs in it, but given that we expect upwards of 20,000 records to be inserted into the table a day, this would lead to a table that is incredibly slow to search, as it will likely have 10-20 entries per record.
A second solution is to have a table with N columns in it, where N is a number sufficiently large that no customer is likely to need that many custom fields, and then a table that maps customer-specific fields to the column for each type of record.
A third solution is to store the fields in a CLOB, perhaps as JSON or XML. I'm not sure how you would go about doing this and making them searchable, though.
I'm sure this is a problem that has been solved multiple times, and I'd rather not reinvent the wheel let alone create a wheel that turns out to be square.

Changelog for a table

I want to design a changelog for a few tables. Let's call one of them restaurant. Every time a user modifies the list of restaurants, the change should be logged.
Idea 1
My first idea was to create 2 tables. One which contains all the restaurants RESTAURANT_VALUE (restaurantId*, restaurantValueId*, address, phone, ..., username, insertDate). Every time a change is made it creates a new entry. Then a table RESTAURANT (restaurantId*, restaurantValueId) which will link to the current valid restaurantValueId. So one table that holds the current and the previous version.
Idea 2
It starts with 2 tables as well. One of them contains all current restaurants, e.g. RESTAURANT_CURRENT, and a second table contains all changes, RESTAURANT_HISTORY. Therefore both need to have exactly the same columns. Every time a change occurs, the values of the 'current' table are copied into the history table, and the new version is written to the 'current' table.
My opinion
Idea 1 doesn't care whether columns will ever be added, so maintenance and adding columns would be easy. However, as the database grows, wouldn't it slow down? Idea 2 has the advantage that the table with the current values will never hold any 'old' data and will not get crowded.
Theoretically, I think Idea 1 is the one to go with.
What do you think? Would you go for Idea 1 or another one? Are there any other important practical considerations I am not aware of?

The approach strongly depends on your needs. Why would you want a history table?
If it's just for auditing purposes, then make a separate restaurant_history table (idea 2) to keep the history aside. If you want to present the history in the application, then go for a single restaurants table with one of the options below:
seq_no - a record version number that increments with each update. If you need current data, you must search for the highest seq_no for the given restaurant_id(s), so optionally also use a current marker, allowing a straightforward current = true filter
valid_from, valid_to - where valid_to is NULL for the current record
Sometimes there is a need to query efficiently which attributes exactly changed. To do this easily, you can consider a history table at the attribute level: (restaurant_id, attribute, old_value, new_value, change_date, user).
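To make the valid_from / valid_to option concrete, here is a small sketch using Python's sqlite3. The save_restaurant helper, the sample addresses and phone numbers, and the use of plain date strings are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurants (
    restaurant_id INTEGER NOT NULL,
    address       TEXT,
    phone         TEXT,
    username      TEXT,
    valid_from    TEXT NOT NULL,
    valid_to      TEXT            -- NULL marks the current version
);
""")

def save_restaurant(conn, restaurant_id, address, phone, username, now):
    """Hypothetical helper: close the current version, then add the new one."""
    with conn:  # one transaction for both statements
        conn.execute(
            "UPDATE restaurants SET valid_to = ? "
            "WHERE restaurant_id = ? AND valid_to IS NULL",
            (now, restaurant_id),
        )
        conn.execute(
            "INSERT INTO restaurants "
            "(restaurant_id, address, phone, username, valid_from, valid_to) "
            "VALUES (?, ?, ?, ?, ?, NULL)",
            (restaurant_id, address, phone, username, now),
        )

save_restaurant(conn, 1, "1 Main St", "555-0100", "alice", "2024-01-01")
save_restaurant(conn, 1, "1 Main St", "555-0199", "bob", "2024-02-01")

# Current version: the row whose valid_to is still NULL.
current = conn.execute(
    "SELECT phone FROM restaurants "
    "WHERE restaurant_id = 1 AND valid_to IS NULL"
).fetchone()

# Version that was valid on a given date.
as_of = conn.execute(
    "SELECT phone FROM restaurants "
    "WHERE restaurant_id = 1 AND valid_from <= ? "
    "AND (valid_to IS NULL OR valid_to > ?)",
    ("2024-01-15", "2024-01-15"),
).fetchone()
```

The same table answers both "what is it now" and "what was it then", at the cost of every query on current data needing the valid_to IS NULL filter.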

Database Design: Stored Record Edit History (Temporal Data)

I want to store temporal information in a database. I have come up with the design below. Is this the best way to do it?
MasterTable
ID
DetailsTable
ID
MasterTableID
CreatedOn
Title
Content
While it works for my purposes, having a MasterTable with just an ID field does not feel right; however, I can see no other way to link the Details records together.
Is there a cleaner / standard way to do this?

An idea would be to design 2 tables as follows:
Entity table: EntityId - PK, Title, Content
EntityHistory table: EntityId - PK, Version - PK, CreatedOn, Title, Content
Some thoughts about:
Usually you'll need to work only with the current version of your row, so your queries should not have to take previous versions into account when joining data, etc. In the long term, mixing historical rows with current ones could have a huge impact on performance: statistics will not be accurate, data selectivity can negatively affect index selection, etc.
If you often work with both current and historical values, you can define a view as a union of the two tables.
How do you add a new version? Within a transaction, copy the values from Entity into EntityHistory (incrementing the version), then update the Entity row with the new values. Alternatively, you could define a trigger on the Entity table that does this behind the scenes.
Use a rowversion column: http://technet.microsoft.com/en-us/library/ms182776(v=sql.105).aspx
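The "copy, then update, within a transaction" step above could look roughly like this with Python's sqlite3. The update_entity helper and the sample titles are hypothetical, and CreatedOn is taken here to mean the time the old version was archived, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Entity (
    EntityId INTEGER PRIMARY KEY,
    Title    TEXT,
    Content  TEXT
);
CREATE TABLE EntityHistory (
    EntityId  INTEGER NOT NULL,
    Version   INTEGER NOT NULL,
    CreatedOn TEXT NOT NULL,
    Title     TEXT,
    Content   TEXT,
    PRIMARY KEY (EntityId, Version)
);
INSERT INTO Entity VALUES (1, 'Draft', 'first text');
""")

def update_entity(conn, entity_id, title, content, now):
    """Hypothetical helper: archive the current row, then overwrite it."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("""
            INSERT INTO EntityHistory (EntityId, Version, CreatedOn, Title, Content)
            SELECT e.EntityId,
                   (SELECT COALESCE(MAX(h.Version), 0) + 1
                      FROM EntityHistory h
                     WHERE h.EntityId = e.EntityId),
                   ?, e.Title, e.Content
              FROM Entity e
             WHERE e.EntityId = ?
        """, (now, entity_id))
        conn.execute(
            "UPDATE Entity SET Title = ?, Content = ? WHERE EntityId = ?",
            (title, content, entity_id),
        )

update_entity(conn, 1, "Draft", "second text", "2024-01-02")
update_entity(conn, 1, "Final", "third text", "2024-01-03")

versions = conn.execute(
    "SELECT Version, Content FROM EntityHistory "
    "WHERE EntityId = 1 ORDER BY Version"
).fetchall()
current = conn.execute(
    "SELECT Title, Content FROM Entity WHERE EntityId = 1"
).fetchone()
```

Entity always holds exactly one row per entity, so joins against it never see old versions.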

Just leave out the MasterTable.
Rows in your DetailsTable will still be "linked together", as you call it, by having the same ID column value.
Any other kind of useful "linking" you might want to do (e.g. link a row to its immediate successor or predecessor) is not achieved by having that MasterTable anyway. It achieves nothing (Unless you would want to have ID's in it, for which there is no Details, such that the ID never has been created, which seems rather unlikely). Leave it out.

Non-destructive updates, versioning finance data in SQL Server 2008

I have three tables (lots in reality but these three are the ones I have to worry about right now)
A product line table like...
ProductLineId (pk)
Name
Description
Price
Finance Event (FK)
and a finance event table like...
EventId (pk)
Event Description
and a Financial Transaction table like...
TransactionId (pk)
FinanceEventId (fk)
LotsOfSageReferencesAndOtherForeignKeys
When a sale is processed it creates transaction records based on the Finance Events etc.
The question is: if someone in the admin then goes in and changes the Finance event, what is the best way of versioning the events table while preserving the primary key?

In general you have several choices:
First, when dealing with temporal data, you should store the actual values, not just the ids, in the transaction table. Then the other tables serve as lookup tables for the creation of new records, but you always know what the real data was at the time of the transactions. This would include things that change over time like price, customer name, etc. Note you may also need to store the id field in case you need to look up what the current equivalent is.
Or you can disallow updates entirely on your lookup tables and through a trigger, deactivate the current record and insert a new one when someone runs an update statement. Now your child tables hold the values of the record that was active at the time the record was created. What you lose in this scenario is the ability to look up what the current value would be (which is why you might not want to do it for something like customer name or price).
For data where you might not care if it changed over time, but would want to reflect the current information, allow the updates but have an audit or history table maintained through triggers; so you can recreate what the value would have been at the time if you need to, or to see who changed what and when (sometimes a legal requirement).
Which of these options suits your current situation, only you would know.
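For the first option (storing the actual values in the transaction table), a minimal sketch with Python's sqlite3 might look like this. The snake_case table names, the record_transaction helper, and the event descriptions are illustrative assumptions, and sqlite3 stands in for SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE finance_event (
    event_id          INTEGER PRIMARY KEY,
    event_description TEXT
);
CREATE TABLE financial_transaction (
    transaction_id    INTEGER PRIMARY KEY,
    finance_event_id  INTEGER REFERENCES finance_event(event_id),
    event_description TEXT   -- copy of the value at transaction time
);
INSERT INTO finance_event VALUES (1, 'Card sale');
""")

def record_transaction(conn, event_id):
    """Hypothetical helper: snapshot the event's values into the transaction."""
    with conn:
        conn.execute("""
            INSERT INTO financial_transaction (finance_event_id, event_description)
            SELECT event_id, event_description
              FROM finance_event
             WHERE event_id = ?
        """, (event_id,))

record_transaction(conn, 1)

# An admin later edits the event; the primary key stays the same.
conn.execute(
    "UPDATE finance_event SET event_description = 'Card sale (revised)' "
    "WHERE event_id = 1"
)
record_transaction(conn, 1)

transactions = conn.execute(
    "SELECT transaction_id, event_description "
    "FROM financial_transaction ORDER BY transaction_id"
).fetchall()
```

The first transaction keeps the description it was created with, while finance_event_id still lets you look up the current equivalent.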
