Changelog for a table

I want to design a changelog for a few tables. Let's call one of them the restaurant table. Every time a user modifies the list of restaurants, the change should be logged.
Idea 1
My first idea was to create 2 tables. One contains all versions of the restaurants: RESTAURANT_VALUE (restaurantId*, restaurantValueId*, address, phone, ..., username, insertDate). Every change creates a new entry. The other table, RESTAURANT (restaurantId*, restaurantValueId), links to the currently valid restaurantValueId. So a single table holds both the current and the previous versions.
Idea 2
It starts with 2 tables as well. One contains all current restaurants, e.g. RESTAURANT_CURRENT. The second, RESTAURANT_HISTORY, contains all changes. Both therefore need exactly the same columns. Every time a change occurs, the values in the 'current' table are copied into the history table, and the new version is written to the 'current' table.
My opinion
Idea 1 doesn't care whether columns are ever added, so maintenance and adding columns would be easy. However, wouldn't it slow down as the database grows? Idea 2 has the advantage that the table with the current values never holds any 'old' data and doesn't get crowded.
Theoretically, I think Idea 1 is the one to go with.
What do you think? Would you go for Idea 1 or another one? Are there any other important practical thoughts I am not aware of?

The approach strongly depends on your needs. Why would you want a history table?
If it's just for auditing purposes, then make a separate restaurant_history table (idea 2) to keep the history aside. If you want to present the history in the application, then go for a single restaurants table with one of the options below:
seq_no - a record version number that increments with each update. To get current data, you must search for the highest seq_no for the given restaurant_id(s), so optionally add a current marker as well, which allows a straightforward current = true filter
valid_from, valid_to - where valid_to is NULL for the current record
Sometimes you need to query efficiently which attributes exactly changed. To do this easily, consider a history table at attribute level: (restaurant_id, attribute, old_value, new_value, change_date, user).
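To make the valid_from / valid_to option concrete, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for the real database, and the column list, the save_version helper and the sample values are illustrative rather than taken from the question.

```python
import sqlite3

# SQLite stands in for the real database; names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE restaurant (
        restaurant_id INTEGER NOT NULL,
        address       TEXT    NOT NULL,
        username      TEXT    NOT NULL,
        valid_from    TEXT    NOT NULL,
        valid_to      TEXT              -- NULL marks the current version
    )
""")


def save_version(conn, restaurant_id, address, username, now):
    """Close the current version (if any) and insert the new one."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "UPDATE restaurant SET valid_to = ? "
            "WHERE restaurant_id = ? AND valid_to IS NULL",
            (now, restaurant_id),
        )
        conn.execute(
            "INSERT INTO restaurant "
            "(restaurant_id, address, username, valid_from, valid_to) "
            "VALUES (?, ?, ?, ?, NULL)",
            (restaurant_id, address, username, now),
        )


save_version(conn, 1, "1 Main St", "alice", "2024-01-01")
save_version(conn, 1, "2 Oak Ave", "bob", "2024-02-01")

# Current data: filter on valid_to IS NULL.
current = conn.execute(
    "SELECT address FROM restaurant "
    "WHERE restaurant_id = 1 AND valid_to IS NULL"
).fetchall()

# Full history: every version, oldest first.
history = conn.execute(
    "SELECT address, valid_from, valid_to FROM restaurant "
    "WHERE restaurant_id = 1 ORDER BY valid_from"
).fetchall()
```

With this layout, reading current data needs no MAX(seq_no) lookup, at the cost of one extra UPDATE per change.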

Related

SQL Server - 1 to 1 relationship best practice

I have a table called Station with many fields,
one of the fields is - StationPrice.
I want to hold more information about the payment process, such as currency, paymentStatus, etc. (around 10 fields).
My question is if I need to expand the current table - Station with the new fields or to have a field called - StationPriceId that will be a foreign key to another table called StationPrices which will store all the information about the price related to that station.

The answer to your question is one of the most popular answers in the DB world: it depends. It might look nicer if this info is split into 2 tables, but you need to understand that if you split it, you'll need 2 inserts instead of 1, and the same goes for updates and deletes. Moreover, you'll need to JOIN this table every time you need this data. I would go with a single table first and move the data to a separate table when a specific need comes up.
On the other hand, if this data will rarely be accessed and the JOIN/DELETE/INSERT overhead will be minimal, then it is OK to move it.

My question is if I need to expand the current table - Station with the new fields or to have a field called - StationPriceId that will be a foreign key to another table called StationPrices which will store all the information about the price related to that station.
Yes, based on the information you have provided, it is better to use a separate table.
You might want to maintain a price change history.
If you maintain a separate table, you can mark the earlier price as Active = false and enter a new price for that station.
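As a rough sketch of the Active flag idea, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server: the Station and StationPrices names come from the question, while the exact columns, the set_price helper and the sample prices are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server; column choices are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Station (
        StationId INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    );
    CREATE TABLE StationPrices (
        StationPriceId INTEGER PRIMARY KEY,
        StationId      INTEGER NOT NULL REFERENCES Station(StationId),
        Price          REAL    NOT NULL,
        Currency       TEXT    NOT NULL,
        Active         INTEGER NOT NULL DEFAULT 1  -- 1 = current price
    );
    INSERT INTO Station (StationId, Name) VALUES (1, 'Central');
""")


def set_price(conn, station_id, price, currency):
    """Deactivate the station's current price row, then add the new one."""
    with conn:  # both statements run in one transaction
        conn.execute(
            "UPDATE StationPrices SET Active = 0 "
            "WHERE StationId = ? AND Active = 1",
            (station_id,),
        )
        conn.execute(
            "INSERT INTO StationPrices (StationId, Price, Currency, Active) "
            "VALUES (?, ?, ?, 1)",
            (station_id, price, currency),
        )


set_price(conn, 1, 9.5, "USD")
set_price(conn, 1, 11.0, "USD")

active_prices = conn.execute(
    "SELECT Price, Currency FROM StationPrices "
    "WHERE StationId = 1 AND Active = 1"
).fetchall()
all_prices = conn.execute(
    "SELECT Price, Active FROM StationPrices "
    "WHERE StationId = 1 ORDER BY StationPriceId"
).fetchall()
```

Note that this sketch puts StationId on StationPrices rather than a StationPriceId on Station, because a history implies several price rows per station.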

history table: 1 vs 2 foreign keys

I have a "controller_variables" table where I save current values of some sensors:
id: the id of the record
controller_id (FK): the id of the controller that provides the data
variable_id (FK): the variable_id
value: the current variable value
created_at: creation date
updated_at: updated date
I also have "history_controller_variables" table where I save "snapshots" of the "controller_variables" table:
id: the id of the record
controller_variable_id (FK): the id of the controller_variables record
value: the "historified" read value
created_at: creation date of the history value
I found myself a few times wondering why I coupled the "history_controller_variables" table to the "controller_variables" table.
If I created the history table as an exact clone of the original table I could:
keep my history in case the referenced "controller_variables" record is deleted.
get history records by directly querying records of a certain controller_id/variable_id.
I can't think of a reason why not to do this change. Are there obvious reasons not to proceed with this change?

You have a fairly big tradeoff here. I don't know which is better, but I can tell you what the advantages of each are.
If the variables for your controllers will not change, then you want to go with one foreign key. This makes it easier to ensure correctness, i.e. that the history record represents a valid value for a given controller. If, on the other hand, they do change and you delete records from the controller variable table, you run into a problem that has no easy solution. In that case, you are better off using two.
Ultimately we never know the future and for that reason I would tend to accept some risk of odd data in exchange for ensuring that operational and historical data is subject to different concerns and that changing the data doesn't mess with history.

This is a good change to make. Have the history table be a clone of the original table, but add a timestamp column to the history table. Any time a variable value changes, create a new record in the history table with the new value, and have the timestamp indicate when the variable was changed to that value. If applicable in your application, you can also include a column in your history table that indicates who (or what) modified the variable to be the new value.
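A minimal sketch of such a decoupled history table, assuming SQLite via Python's sqlite3: the table names follow the question, while the changed_at column, the trigger names and the sample values are assumptions, and created_at / updated_at are omitted for brevity.

```python
import sqlite3

# SQLite stands in for the real database; trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE controller_variables (
        id            INTEGER PRIMARY KEY,
        controller_id INTEGER NOT NULL,
        variable_id   INTEGER NOT NULL,
        value         REAL    NOT NULL
    );
    -- The history table carries controller_id and variable_id itself
    -- and has no foreign key back to controller_variables.
    CREATE TABLE history_controller_variables (
        id            INTEGER PRIMARY KEY,
        controller_id INTEGER NOT NULL,
        variable_id   INTEGER NOT NULL,
        value         REAL    NOT NULL,
        changed_at    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER trg_cv_insert AFTER INSERT ON controller_variables
    BEGIN
        INSERT INTO history_controller_variables
            (controller_id, variable_id, value)
        VALUES (NEW.controller_id, NEW.variable_id, NEW.value);
    END;
    CREATE TRIGGER trg_cv_update AFTER UPDATE OF value ON controller_variables
    BEGIN
        INSERT INTO history_controller_variables
            (controller_id, variable_id, value)
        VALUES (NEW.controller_id, NEW.variable_id, NEW.value);
    END;
""")

conn.execute(
    "INSERT INTO controller_variables (controller_id, variable_id, value) "
    "VALUES (7, 3, 20.5)"
)
conn.execute(
    "UPDATE controller_variables SET value = 21.0 "
    "WHERE controller_id = 7 AND variable_id = 3"
)
# Deleting the live row leaves the history untouched.
conn.execute(
    "DELETE FROM controller_variables "
    "WHERE controller_id = 7 AND variable_id = 3"
)

history_values = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM history_controller_variables "
        "WHERE controller_id = 7 AND variable_id = 3 ORDER BY id"
    )
]
remaining = conn.execute(
    "SELECT COUNT(*) FROM controller_variables"
).fetchone()[0]
```

The history can be queried directly by controller_id / variable_id and survives the delete, which are the two benefits listed in the question.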

Best table structure for tracking state changes

I'm currently trying to model an aspect of a system whereby components that are stored can change state, e.g. OK, FAILED, REPAIRED etc. On the web side I will need to show the current state, but also the history of previous states (if any).
I'm torn between these two designs. Could anyone shed any light on the best way? (I'm more a software dev than a DBA guy.)
Option one:
statehistory table which tracks each time the state changes, the highest sequence number will be the current state : SQLFiddle example
Option two:
Similar to above, except the current state is stored in the component table, and only past states are in the history table. When state is changed the current state is inserted as the most recent in history then the current is set in the component table: SQLFiddle example
As an aside, use either one or two but without the state lookup table, just store the state text as varchar (my thinking is this makes it easier to report from?): SQLFiddle example
Thanks.
EDIT:
There are several component tables, should the state history table contain the data for all of them, or make a statehistory table per component? Each components table will have hundreds of thousands of entries, making the statehistory table pretty large.
eg:
Table: component_a
Table: component_b
etc..
statehistory (
component_a_id,
component_b_id,
state_id,
...
)

I tend to do a hybrid between the two. I always store all state changes including the current state in the history table. That gives you a central place to query them. You can have a column IsCurrent BIT NOT NULL to make your life a little easier. Create a filtered unique index with filter IsCurrent = 1 to enforce basic integrity rules.
I also store the current state in the main table. Probably not just a copy but as a foreign key to the history table. That makes for very convenient querying. Looking up the current state is often useful. For indexing reasons you can also duplicate the values into the main table, of course. The more duplication you have the more error prone the system.
If you want to avoid duplication but still index on the current status, you can create an indexed view to combine main and history tables. You can then create an index on mixed columns from both tables, e.g. on (StatusHistoryItems.Status, Components.Name), to support queries that ask for components with a specific status and a specific name. Such a query would be resolved as a single index seek on the view's index.
You'd create a view like this:
SELECT *
FROM Components c
JOIN StatusHistoryItems shi on c.ID = shi.ComponentID
AND shi.IsCurrent = 1 --this condition will join exactly one row
And index it. Now you have the current status together with all component data in one efficient index. No duplication, no denormalization at all. Just make sure that there is at least one status row for each component with IsCurrent = 1.
I recommend having a nightly validation job that validates data consistency and alerts you of problems. Denormalized data has a habit of becoming corrupted over time for various reasons.
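Here is a small sketch of the filtered unique index on IsCurrent = 1 and of one check a validation job could run. It assumes SQLite via Python's sqlite3, where a partial index plays the role of SQL Server's filtered index; the index name and sample rows are made up for the example.

```python
import sqlite3

# SQLite stands in for SQL Server; index name and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Components (
        ID   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    CREATE TABLE StatusHistoryItems (
        ID          INTEGER PRIMARY KEY,
        ComponentID INTEGER NOT NULL REFERENCES Components(ID),
        Status      TEXT    NOT NULL,
        IsCurrent   INTEGER NOT NULL
    );
    -- At most one current row per component.
    CREATE UNIQUE INDEX UX_StatusHistoryItems_Current
        ON StatusHistoryItems (ComponentID) WHERE IsCurrent = 1;

    INSERT INTO Components (ID, Name) VALUES (1, 'pump'), (2, 'valve');
    INSERT INTO StatusHistoryItems (ComponentID, Status, IsCurrent)
    VALUES (1, 'OK', 0), (1, 'FAILED', 1);
""")

# A second current row for component 1 violates the unique index.
try:
    conn.execute(
        "INSERT INTO StatusHistoryItems (ComponentID, Status, IsCurrent) "
        "VALUES (1, 'REPAIRED', 1)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Validation check: components that have no current status row at all.
missing_current = [
    row[0]
    for row in conn.execute("""
        SELECT c.ID
        FROM Components c
        LEFT JOIN StatusHistoryItems shi
               ON shi.ComponentID = c.ID AND shi.IsCurrent = 1
        WHERE shi.ID IS NULL
        ORDER BY c.ID
    """)
]
```

The index prevents two current rows per component, but it cannot guarantee that one exists, so the "at least one" rule still needs a check like the last query.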

Database Design: Stored Record Edit History (Temporal Data)

I want to store temporal information in a database. I have come up with the design below. Is this the best way to do it?
MasterTable
ID
DetailsTable
ID
MasterTableID
CreatedOn
Title
Content
While it works for my purposes, having a MasterTable with just an ID field does not feel right. However, I can see no other way to link the Details records together.
Is there a cleaner / standard way to do this?

An idea would be to design 2 tables as follows:
Entity table: EntityId - PK, Title, Content
EntityHistory table: EntityId - PK, Version - PK, CreatedOn, Title, Content
Some thoughts:
Usually you'll work only with the current version of a row, so your queries should not have to take previous versions into account when joining data, etc. In the long term, keeping old versions in the same table can have a huge impact on performance: statistics will be less accurate, reduced data selectivity can negatively affect index selection, etc.
If you often work with both current and historical values, you can define a view as a union of the 2 tables.
How to manage adding a new version? Within a transaction, copy the values from Entity into EntityHistory (incrementing the version), then update the Entity row with the new values. Alternatively, you could define a trigger on the Entity table that does this behind the scenes.
Use a rowversion column: http://technet.microsoft.com/en-us/library/ms182776(v=sql.105).aspx
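The copy-then-update step could look roughly like the following, assuming SQLite via Python's sqlite3. The update_entity helper and the sample rows are hypothetical, and CreatedOn is taken here to mean the time the old version was archived, which is one possible reading of the design.

```python
import sqlite3

# SQLite stands in for the real database; sample data is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entity (
        EntityId INTEGER PRIMARY KEY,
        Title    TEXT NOT NULL,
        Content  TEXT NOT NULL
    );
    CREATE TABLE EntityHistory (
        EntityId  INTEGER NOT NULL,
        Version   INTEGER NOT NULL,
        CreatedOn TEXT    NOT NULL,
        Title     TEXT    NOT NULL,
        Content   TEXT    NOT NULL,
        PRIMARY KEY (EntityId, Version)
    );
    INSERT INTO Entity (EntityId, Title, Content)
    VALUES (1, 'Draft', 'first text');
""")


def update_entity(conn, entity_id, title, content, now):
    """Archive the current row as the next version, then overwrite it."""
    with conn:  # copy and update succeed or fail together
        conn.execute(
            """
            INSERT INTO EntityHistory
                (EntityId, Version, CreatedOn, Title, Content)
            SELECT e.EntityId,
                   (SELECT COALESCE(MAX(h.Version), 0) + 1
                      FROM EntityHistory h
                     WHERE h.EntityId = e.EntityId),
                   ?, e.Title, e.Content
              FROM Entity e
             WHERE e.EntityId = ?
            """,
            (now, entity_id),
        )
        conn.execute(
            "UPDATE Entity SET Title = ?, Content = ? WHERE EntityId = ?",
            (title, content, entity_id),
        )


update_entity(conn, 1, "Final", "second text", "2024-03-01")
update_entity(conn, 1, "Final v2", "third text", "2024-03-02")

current_row = conn.execute(
    "SELECT Title, Content FROM Entity WHERE EntityId = 1"
).fetchone()
versions = conn.execute(
    "SELECT Version, Title FROM EntityHistory "
    "WHERE EntityId = 1 ORDER BY Version"
).fetchall()
```

Entity always holds exactly one row per entity, so ordinary queries and joins never see old versions.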

Just leave out the MasterTable.
Rows in your DetailsTable will still be "linked together", as you call it, by having the same ID column value.
Any other kind of useful "linking" you might want to do (e.g. linking a row to its immediate successor or predecessor) is not achieved by having that MasterTable anyway. It achieves nothing (unless you want it to hold IDs for which no Details rows have ever been created, which seems rather unlikely). Leave it out.

SQL history table design

I need to design a history table to keep track of multiple values that were changed on a specific record when edited.
Example:
The user is presented with a page to edit the record.
Title: Mr.
Name: Joe
Tele: 555-1234
DOB: 1900-10-10
If a user changes any of these values I need to keep track of the old values and record the new ones.
I thought of using a table like this:
History---------------
id
modifiedUser
modifiedDate
tableName
recordId
oldValue
newValue
One problem with this is that it will have multiple entries for each edit.
I was thinking about having another table to group them but you still have the same problem.
I was also thinking about keeping a copy of the row in the history table but that doesn't seem efficient either.
Any ideas?
Thanks!

I would recommend that for each table whose history you want to track, you have a second table (e.g. tblCustomer and tblCustomer_History) with the identical format, plus a date column.
Whenever an edit is made, you insert the old record into the history table along with the date/time. This is very easy to do and requires few code changes (usually just a trigger).
This has the benefit of keeping your 'real' tables as small as possible, but gives you a complete history of all the changes that are made.
Ultimately, however, it will come down to how you want to use this data. If it's just for auditing purposes, this method is simple and has little downside except the extra disk space, with little or no impact on your main system.
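A trigger-based version of this might look like the sketch below, assuming SQLite via Python's sqlite3. The column list mirrors the question's example fields, and the trigger name and archived_at column are assumptions.

```python
import sqlite3

# SQLite stands in for the real database; trigger name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCustomer (
        id    INTEGER PRIMARY KEY,
        title TEXT,
        name  TEXT,
        tele  TEXT
    );
    -- Identical columns plus a date column; id is not unique here.
    CREATE TABLE tblCustomer_History (
        id          INTEGER NOT NULL,
        title       TEXT,
        name        TEXT,
        tele        TEXT,
        archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Copy the old row into the history table on every update.
    CREATE TRIGGER trg_tblCustomer_history AFTER UPDATE ON tblCustomer
    BEGIN
        INSERT INTO tblCustomer_History (id, title, name, tele)
        VALUES (OLD.id, OLD.title, OLD.name, OLD.tele);
    END;

    INSERT INTO tblCustomer (id, title, name, tele)
    VALUES (1, 'Mr.', 'Joe', '555-1234');
""")

# The application only issues a normal UPDATE; the trigger does the rest.
conn.execute("UPDATE tblCustomer SET tele = '555-9999' WHERE id = 1")

live = conn.execute(
    "SELECT title, name, tele FROM tblCustomer WHERE id = 1"
).fetchone()
archived = conn.execute(
    "SELECT title, name, tele FROM tblCustomer_History WHERE id = 1"
).fetchall()
```

In SQL Server the same idea would read from the deleted pseudo-table inside the trigger instead of OLD.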

You should define what type of efficiency you're interested in: you can have efficiency of storage space, efficiency of effort required to record the history (transaction cost), or efficiency of time to query for the history of a record in a specific way.
I notice you have a table name in your proposed history table. This implies an intention to record the history of more than one table, which would rule out storing an exact copy of the record in your history table unless all of the tables you're tracking will always have the same structure.
If you deal with columns separately, i.e. you record only one column value for each history record, you'll have to devise a polymorphic data type that is capable of accurately representing every column value you'll encounter.
If efficiency of storage space is your main concern, then I would break the history into multiple tables. This would mean having a new column value table linked to both an edit event table and a column definition table. The edit event table would record the user and time stamp; the column definition table would record the table, column, and data type. As @njk noted, you don't need the old column value because you can always query for the previous edit to get the old value. The main reason this approach would be expected to save space is the assumption that, generally speaking, users will be editing a small subset of the available fields.
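One possible shape for that three-table layout is sketched below, assuming SQLite via Python's sqlite3. All table and column names are hypothetical, record_id is placed on the edit event table as an assumption, and values are stored as text to sidestep the polymorphic type question. The query shows how the old value can be derived from the previous edit instead of being stored.

```python
import sqlite3

# SQLite stands in for the real database; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE column_definition (
        id          INTEGER PRIMARY KEY,
        table_name  TEXT NOT NULL,
        column_name TEXT NOT NULL,
        data_type   TEXT NOT NULL
    );
    CREATE TABLE edit_event (
        id            INTEGER PRIMARY KEY,
        record_id     INTEGER NOT NULL,  -- assumption: edited row's id
        modified_user TEXT    NOT NULL,
        modified_date TEXT    NOT NULL
    );
    CREATE TABLE column_value (
        edit_event_id        INTEGER NOT NULL REFERENCES edit_event(id),
        column_definition_id INTEGER NOT NULL REFERENCES column_definition(id),
        new_value            TEXT,       -- stored as text for simplicity
        PRIMARY KEY (edit_event_id, column_definition_id)
    );

    INSERT INTO column_definition VALUES
        (1, 'person', 'Name', 'varchar'),
        (2, 'person', 'Tele', 'varchar');
    INSERT INTO edit_event VALUES
        (1, 42, 'bob', '2012-11-05 11:05'),
        (2, 42, 'tim', '2012-11-07 14:54');
    -- The second edit touched only Tele, so it stores a single value row.
    INSERT INTO column_value VALUES
        (1, 1, 'Joe'),
        (1, 2, '555-1234'),
        (2, 2, '555-9999');
""")

# Old value = most recent earlier value of the same column and record.
changes = conn.execute("""
    SELECT e.modified_date,
           d.column_name,
           (SELECT pv.new_value
              FROM column_value pv
              JOIN edit_event pe ON pe.id = pv.edit_event_id
             WHERE pv.column_definition_id = v.column_definition_id
               AND pe.record_id = e.record_id
               AND pe.modified_date < e.modified_date
             ORDER BY pe.modified_date DESC
             LIMIT 1) AS old_value,
           v.new_value
      FROM column_value v
      JOIN edit_event e ON e.id = v.edit_event_id
      JOIN column_definition d ON d.id = v.column_definition_id
     WHERE e.record_id = 42
     ORDER BY e.modified_date, d.column_name
""").fetchall()
```

The storage saving comes from the second edit writing one value row rather than a full copy of the record; the price is the more involved query.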
If efficiency of querying is your main concern, I would set up a history table for every table you're tracking and add a user and time stamp field to each history table. This should also be efficient in terms of transaction cost for an edit.

You don't need to record old and new value in a history table. Just record the newest value, author and date. You can then just fetch the most recent record for some user_id based on the date of the record. This may not be the best approach if you will be dealing with a lot of data.
user (id, user_id, datetime, author, ...)
Sample data
id  user_id  datetime          author  user_title  user_name  user_tele  ...
1   1        2012-11-05 11:05  Bob
2   1        2012-11-07 14:54  Tim
3   1        2012-11-12 10:18  Bob
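Fetching the most recent record for a user_id from a table like this could be sketched as follows, assuming SQLite via Python's sqlite3. Only the four columns that have values in the sample data are modelled; the remaining columns are left out because the sample does not show their contents.

```python
import sqlite3

# SQLite stands in for the real database; only known columns are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        id       INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL,
        datetime TEXT    NOT NULL,
        author   TEXT    NOT NULL
    );
    INSERT INTO user (id, user_id, datetime, author) VALUES
        (1, 1, '2012-11-05 11:05', 'Bob'),
        (2, 1, '2012-11-07 14:54', 'Tim'),
        (3, 1, '2012-11-12 10:18', 'Bob');
""")

# Newest version: sort by date, newest first, and keep one row.
latest = conn.execute(
    "SELECT id, datetime, author FROM user "
    "WHERE user_id = ? ORDER BY datetime DESC LIMIT 1",
    (1,),
).fetchone()

# Previous version: same ordering, skipping the newest row.
previous = conn.execute(
    "SELECT id, datetime, author FROM user "
    "WHERE user_id = ? ORDER BY datetime DESC LIMIT 1 OFFSET 1",
    (1,),
).fetchone()
```

The text comparison works here because the timestamps are written year first with zero padding. An index on (user_id, datetime) would likely help once the table grows.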
