Taking snapshots of SQL tables - sql

I have a set of reference tables with different schemas that we use as reference data when integrating files. The reference data can be modified from the GUI.
The requirement is that I need to create a snapshot of the data whenever it changes. For example, users should be able to see which reference data was used on a particular date.
Option 1: Historize all the tables overnight, every day, with the date. When users want to see the data used on a particular date, we can easily query the corresponding history table. But since users don't change the data every day, this approach will make the database bigger day by day.
Option 2: Historize only the rows that have been modified, with the modified date, and use views to fetch the data for particular days. But this way I need to write many views, as the schema differs from table to table.
If you know of a better way, I would appreciate it if you shared your knowledge.
Thanks,

Not sure if possible but:
Option 3: Create triggers on insert/update/delete that write the new values to a "history table" along with a timestamp.
To get the admin data used on day "X", just use the timestamp.
Another option (again, not sure if possible) is to add start_dt/end_dt columns to the admin tables and have the processes look up only the active data.
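A minimal sketch of the trigger idea, shown with SQLite via Python purely for illustration (the table and column names are invented, not from the original question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ref_data (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE ref_data_history (
    id INTEGER, value TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP,
    operation TEXT
);
-- On update or delete, copy the OLD row into the history table with a timestamp.
CREATE TRIGGER ref_data_au AFTER UPDATE ON ref_data
BEGIN
    INSERT INTO ref_data_history (id, value, operation)
    VALUES (OLD.id, OLD.value, 'U');
END;
CREATE TRIGGER ref_data_ad AFTER DELETE ON ref_data
BEGIN
    INSERT INTO ref_data_history (id, value, operation)
    VALUES (OLD.id, OLD.value, 'D');
END;
""")
conn.execute("INSERT INTO ref_data VALUES (1, 'original')")
conn.execute("UPDATE ref_data SET value = 'changed' WHERE id = 1")
rows = conn.execute("SELECT id, value, operation FROM ref_data_history").fetchall()
print(rows)  # [(1, 'original', 'U')] -- the pre-update value is preserved
```

To answer "what was the value on day X", you would then filter the history table on `changed_at`.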
Sérgio

Related

Is it practical to record table creation/modification events in a separate table?

I'm about to begin designing a database for an MVC web app and am considering using a table called 'changes' or 'events' to keep track of all changes made to the database. It would have a field for the table name, record id, user making the change, the timestamp and whether it was a creation or modification event.
My question is whether this is poor design practice compared to having 'created by', 'created on', 'modified by', 'modified on' fields in each table. I was thinking that, in the parent Model class, I would use a before-save function that records every change. One pitfall I can see is that if many records were updated at once, it might be difficult to get the function to save the changes properly.
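Something like this minimal sketch (SQLite via Python; all table and column names here are illustrative, not a real schema):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE changes (
    table_name TEXT, record_id INTEGER, user TEXT,
    changed_at TEXT, event TEXT)""")

def log_change(conn, table_name, record_id, user, event):
    """Before-save hook: record who changed what, when, and how."""
    conn.execute(
        "INSERT INTO changes VALUES (?, ?, ?, ?, ?)",
        (table_name, record_id, user,
         datetime.now(timezone.utc).isoformat(), event),
    )

log_change(conn, "customers", 42, "alice", "update")
log_change(conn, "customers", 42, "alice", "update")
n = conn.execute("SELECT COUNT(*) FROM changes WHERE record_id = 42").fetchone()[0]
print(n)  # 2 -- both edits are captured, unlike a single modified_on column
```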
This is a matter of weighing up the benefit of having the granular info against the overhead of writing to this table every time something in the database changes and the additional storage required.
If you are concerned about using a function on the model class then an alternative would be database triggers. This would be quite robust, but would be more work to set up as you would need to define triggers for each table unless the database you are using has a facility to log DML changes generically.
Finally, I would also advise considering archiving as this table has the potential to get very big very quickly.
I think your approach is fine. One advantage of having these events in a separate table is you can capture multiple edits. If you just have ModifiedDate/ModifiedBy columns you only see the last edit.
The main thing to be aware of is table size since audit tables can get VERY big. You may also decide to split into multiple audit tables (e.g. use the same table name with an _audit suffix) to improve query performance.
It depends on what you need. The reason for creating and maintaining an events table is to maintain an audit trail of changes.
For some applications, the created / updated fields at the end of a row are a sufficient audit trail.
For more secure applications, you need an events table. You also need to include the actual change (before / after) in your events table.
Also consider a time-related table, where each record has a start and end date, along with a created by. Any change actually sets the end date for the previous record and creates a new record, with a NULL end date.
Your current record is the one with a NULL end date.
Basically, every record has a life-span.
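A minimal sketch of that life-span pattern (SQLite via Python; the customer/address names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer_address (
    customer_id INTEGER, address TEXT,
    start_date TEXT, end_date TEXT)""")

def change_address(conn, customer_id, address, on_date):
    # Close out the current record, then open a new one with a NULL end_date.
    conn.execute(
        "UPDATE customer_address SET end_date = ? "
        "WHERE customer_id = ? AND end_date IS NULL",
        (on_date, customer_id))
    conn.execute(
        "INSERT INTO customer_address VALUES (?, ?, ?, NULL)",
        (customer_id, address, on_date))

change_address(conn, 1, "1 Old St", "2023-01-01")
change_address(conn, 1, "2 New Ave", "2023-06-01")
# The current record is the one with a NULL end_date.
current = conn.execute(
    "SELECT address FROM customer_address "
    "WHERE customer_id = 1 AND end_date IS NULL").fetchone()[0]
print(current)  # 2 New Ave
```

Point-in-time queries then become `WHERE start_date <= X AND (end_date IS NULL OR end_date > X)`.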

How to store daily/monthly snapshots on Google BigQuery?

We need to store daily and monthly snapshots of some of our databases.
It's not backup; we need to store the data so we can analyze it later and see how it evolves over time.
We still don't know exactly what sort of queries we will need in two months. To start, we need to track how our user base evolves, so we will save daily snapshots of users and other related collections.
We are thinking of putting everything on Google BigQuery: it's easy to load data into it and even easier to query that data.
We will create some tables, one for each set of data we need, with all the needed columns, plus an extra one that contains the date on which the extraction was done.
We will use this column to group the data by day, month, and so on.
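Something like this sketch of the extraction-date approach (SQLite via Python here; BigQuery SQL would be analogous, and all names and values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users_snapshot (
    user_id INTEGER, status TEXT, snapshot_date TEXT)""")
conn.executemany(
    "INSERT INTO users_snapshot VALUES (?, ?, ?)",
    [(1, "active", "2023-01-01"), (2, "active", "2023-01-01"),
     (1, "active", "2023-01-02"), (2, "churned", "2023-01-02")])
# How many active users did each daily snapshot capture?
rows = conn.execute("""
    SELECT snapshot_date, COUNT(*) FROM users_snapshot
    WHERE status = 'active'
    GROUP BY snapshot_date ORDER BY snapshot_date""").fetchall()
print(rows)  # [('2023-01-01', 2), ('2023-01-02', 1)]
```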
An alternative approach could be to create a dataset for each... well, set of data, and a new table every time we need a snapshot.
I honestly don't know which of these two is better, or whether there are better options.
It's difficult to say which is best for you since I don't know your needs or cost requirements.
However, with the "create some tables, one for each set of data we need, with all the needed columns, plus an extra one that will contain the date on which the extraction process was done" method, you could run queries that show how your users have changed over time. For example, for a particular time slice, you could compute the average activity of a particular user.
Probably a bit late, but for future readers: you are probably looking for date-partitioned tables. It corresponds exactly to this use case, and there's a straightforward example in the documentation page.
You can now create table snapshots in BigQuery.
You can only use the bq command line tool for now.
See here -> https://cloud.google.com/bigquery/docs/table-snapshots-create#creating_table_snapshots

Best practice for auditing data in SQL Server and retrieving point in time data

I've been building history tables in databases for some time now, but never put much effort or thought into it. I wonder what the best practice out there is.
My main goal is to record any changes to a record for a particular day. If more than one change happens in a day, then only one history record will exist for that day. I need to record the date the record was changed, and when I retrieve data I need to pull the correct record from history as it was at a particular time. For example, I have a customers table and want to pull out what a customer's address was on a particular date. My sprocs, like "get customer details", take an optional date; if no date is passed in, they return the most recent record.
So here's what I was looking for advice on:
Do I keep the history in the same table and use a logical delete flag to hide the historical records? I normally don't do this, as some tables can change a lot and end up with lots of records.
Do I use a separate table that mirrors the main table? I usually do this.
Should I only put changed records into the history table, and not the current one?
And what is the most efficient way, given a date, to pull out the right record at a point in time: get every record for a customer <= the date passed in, then sort by most recent date and take the top?
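For the last question, this is roughly what I have in mind (sketched with SQLite via Python; the customer/address names are just illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer_history (
    customer_id INTEGER, address TEXT, changed_on TEXT)""")
conn.executemany("INSERT INTO customer_history VALUES (?, ?, ?)", [
    (1, "1 Old St", "2023-01-01"),
    (1, "2 New Ave", "2023-03-15"),
    (1, "3 Later Rd", "2023-09-01")])

def address_at(conn, customer_id, as_of=None):
    # No date passed in: return the most recent record.
    # Otherwise: latest record on or before as_of.
    if as_of is None:
        as_of = "9999-12-31"
    row = conn.execute(
        "SELECT address FROM customer_history "
        "WHERE customer_id = ? AND changed_on <= ? "
        "ORDER BY changed_on DESC LIMIT 1",
        (customer_id, as_of)).fetchone()
    return row[0] if row else None

print(address_at(conn, 1, "2023-06-01"))  # 2 New Ave
print(address_at(conn, 1))                # 3 Later Rd
```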
Thanks for all the help... regards M
My suggestion is to use trigger-based auditing and create triggers for all the tables you need to audit.
With triggers you can also meet the requirement of not storing more than one record update per day.
I'd suggest you check out ApexSQL Audit, which generates triggers for you, and try to reverse-engineer what triggers they use, what the storage tables look like, and so on.
This will give you a good start and you can work from there.
Disclaimer: not affiliated with ApexSQL but I do use their tools on a daily basis.
I'm no expert in the field, but a good SQL consultant once told me that a good approach is generally to use the same table if all data can be changed. Otherwise, have the original table contain only core, non-changeable data and the history table contain only the stuff that can be changed.
You should definitely read this article on managing bitemporal data. The nice thing about this approach is that it enables an auditable way of correcting historical data.
I believe this will address your concerns about modifying the history data.
I've always used a modified version of the audit table described in this article. While it does require you to pivot data so that it resembles your table's native structure, it is resilient against changes to the schema.
You can create a UDF that returns a table and accepts a table name (varchar) and point in time (datetime) as parameters. The UDF should rebuild the table using the audit (historical values) and give you the effective values at that date & time.
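A rough sketch of that rebuild logic, written in Python over SQLite rather than as a T-SQL UDF (the column-per-row audit layout and all names here are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One audit row per changed column, as in the article's audit-table style.
conn.execute("""CREATE TABLE audit (
    record_id INTEGER, column_name TEXT,
    new_value TEXT, changed_at TEXT)""")
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?)", [
    (1, "name",  "Alice",   "2023-01-01"),
    (1, "email", "a@x.com", "2023-01-01"),
    (1, "email", "a@y.com", "2023-05-01")])

def record_as_of(conn, record_id, as_of):
    """Pivot the column-per-row audit data back into one record,
    keeping the latest value of each column at or before as_of."""
    rows = conn.execute(
        "SELECT column_name, new_value FROM audit "
        "WHERE record_id = ? AND changed_at <= ? "
        "ORDER BY changed_at",
        (record_id, as_of))
    state = {}
    for col, val in rows:
        state[col] = val  # later rows overwrite earlier ones
    return state

print(record_as_of(conn, 1, "2023-03-01"))  # {'name': 'Alice', 'email': 'a@x.com'}
```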

How would you maintain a history in SQL tables?

I am designing a database to store product information, and I want to store several months of historical (price) data for future reference. However, after a set period, I would like to start overwriting the initial entries, with minimal effort needed to find them. Does anyone have a good idea of how to approach this problem? My initial design is to have a table named historical_data; every day, a job pulls the active data and stores it in the historical table with a timestamp. Does anyone have a better idea, or can you see what is wrong with mine?
First, I'd like to comment on your proposed solution. The weak part, of course, is that there can actually be more than one change between your intervals: the record may be changed three times during the day, but you only archive the last change.
A better solution is possible, but it must be event-driven. If you have a database server that supports events or triggers (like MS SQL Server), you should write trigger code that creates an entry in the history table. If your server does not support triggers, you can add the archiving code to your application (during the Save operation).
You could place a trigger on your price table. That way you can archive the old price in another table on each update or delete event.
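A minimal sketch of such a trigger, using SQLite via Python for illustration (the price table and its columns are invented); note that, unlike a nightly snapshot, it captures every change in the day:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price (product_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE price_archive (product_id INTEGER, amount REAL,
                            archived_at TEXT DEFAULT CURRENT_TIMESTAMP);
-- Archive the old price on every update, not just once a day.
CREATE TRIGGER price_au AFTER UPDATE OF amount ON price
BEGIN
    INSERT INTO price_archive (product_id, amount)
    VALUES (OLD.product_id, OLD.amount);
END;
""")
conn.execute("INSERT INTO price VALUES (1, 9.99)")
for new_amount in (10.99, 11.99, 12.99):   # three changes in one day
    conn.execute("UPDATE price SET amount = ? WHERE product_id = 1",
                 (new_amount,))
count = conn.execute("SELECT COUNT(*) FROM price_archive").fetchone()[0]
print(count)  # 3 -- a nightly snapshot would have kept only the last change
```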
It's a much broader topic than it initially seems. Martin Fowler has a nice narrative about "things that change with time".
IMO your approach seems sound if the history you need is a snapshot of the data at the end of each day - in the past I have used a similar approach, with overnight jobs (SPs) that pick up the day's new data, timestamp it, and then delete all data with a timestamp < today - x, where x is the period of data I want to keep.
If you need to track all history changes, then you need to look at triggers.
I would like to, after a set period, start overwriting initial entries with minimal effort to find the initial entries
We store data in Archive tables, using a trigger, as others have suggested. Our archive table has an additional AuditDate column and stores the "deleted" data - i.e. the previous version of the row. The current data is only stored in the actual table.
We prune the Archive table with a business rule along the lines of: "Delete all archive data more than 3 months old where there exists at least one archive record younger than 3 months; delete all archive data more than 6 months old."
So if there has been no price change in the last 3 months, you would still have a price-change record from the 3-6 month period.
(Ask if you need an example of the self-referencing-join to do the delete, or the Trigger to store changes in the Archive table)
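For illustration, here is one way that pruning rule could look (a sketch in SQLite via Python, using a correlated EXISTS rather than an explicit self-join; all names and dates are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (id INTEGER, price REAL, audit_date TEXT)")
conn.executemany("INSERT INTO archive VALUES (?, ?, ?)", [
    (1, 10.0, "2022-01-01"),   # > 6 months old: always pruned
    (1, 11.0, "2023-02-01"),   # 3-6 months old, but a younger row exists: pruned
    (1, 12.0, "2023-06-01"),   # < 3 months old: kept
    (2, 20.0, "2023-02-01")])  # 3-6 months old, no younger row: kept
three_months_ago = "2023-04-01"   # cutoffs as of "today" = 2023-07-01
six_months_ago = "2023-01-01"
# Keep 3-6 month rows only when no younger row exists for the same id.
conn.execute("""
    DELETE FROM archive
    WHERE audit_date < :six
       OR (audit_date < :three AND EXISTS (
            SELECT 1 FROM archive AS newer
            WHERE newer.id = archive.id
              AND newer.audit_date >= :three))
""", {"six": six_months_ago, "three": three_months_ago})
rows = conn.execute(
    "SELECT id, audit_date FROM archive ORDER BY audit_date").fetchall()
print(rows)  # [(2, '2023-02-01'), (1, '2023-06-01')]
```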

Database history for client usage

I'm trying to figure out what would be the best way to have a history on a database, to track any Insert/Delete/Update that is done. The history data will need to be coded into the front-end since it will be used by the users. Creating "history tables" (a copy of each table used to store history) is not a good way to do this, since the data is spread across multiple tables.
At this point, my best idea is to create a few history tables that reflect the output I want to show to the users. Whenever a change is made to the relevant tables, I would update the history table with the data as well.
I'm trying to figure out what the best way to go about this would be. Any suggestions will be appreciated.
I am using Oracle + VB.NET
I have used, very successfully, a model where every table has an audit copy - the same table with a few additional fields (timestamp, user id, operation type) - and three triggers on the base table for insert/update/delete.
I think this is a very good way of handling this, because tables and triggers can be generated from a model and there is little overhead from a management perspective.
The application can use the tables to show an audit history to the user (read-only).
We've got that requirement in our systems. We added two tables, a header and a detail, called AuditRow and AuditField. AuditRow contains one row per changed row in any other table, and AuditField contains one row per changed column, with the old value and the new value.
We have a trigger on every table that writes a header row (AuditRow) and the needed detail rows (one per changed column) on each insert/update/delete. This system does rely on the fact that we have a GUID on every table that can uniquely represent the row. It doesn't have to be the "business" or "primary" key, but it's a unique identifier for that row, so we can identify it in the audit tables. Works like a champ. Overkill? Perhaps, but we've never had a problem with auditors. :-)
And yes, the Audit tables are by far the largest tables in the system.
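For illustration, the header/detail write could be sketched like this (Python over SQLite; the exact schema here is invented, not the poster's actual one):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_row (audit_id TEXT PRIMARY KEY, table_name TEXT,
                        row_guid TEXT, operation TEXT,
                        changed_at TEXT DEFAULT CURRENT_TIMESTAMP);
CREATE TABLE audit_field (audit_id TEXT, column_name TEXT,
                          old_value TEXT, new_value TEXT);
""")

def audit_update(conn, table_name, row_guid, old, new):
    """One header row per changed record, one detail row per changed column."""
    audit_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO audit_row (audit_id, table_name, row_guid, operation) "
        "VALUES (?, ?, ?, 'U')", (audit_id, table_name, row_guid))
    for col in old:
        if old[col] != new.get(col):
            conn.execute("INSERT INTO audit_field VALUES (?, ?, ?, ?)",
                         (audit_id, col, old[col], new.get(col)))

row_guid = str(uuid.uuid4())
audit_update(conn, "customers", row_guid,
             {"name": "Alice", "email": "a@x.com"},
             {"name": "Alice", "email": "a@y.com"})
details = conn.execute("SELECT column_name, old_value, new_value "
                       "FROM audit_field").fetchall()
print(details)  # [('email', 'a@x.com', 'a@y.com')] -- only the changed column
```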
If you are lucky enough to be on Oracle 11g, you could also use the Flashback Data Archive
Personally, I would stay away from triggers. They can be a nightmare when it comes to debugging and not necessarily the best if you are looking to scale out.
If you are using a PL/SQL API to do the INSERTs/UPDATEs/DELETEs, you could manage this with a simple shift in design, without the need (up front) for history tables.
All you need are two extra columns, DATE_FROM and DATE_THRU. When a record is INSERTed, DATE_THRU is left NULL. If that record is UPDATEd or DELETEd, just "end-date" it by setting DATE_THRU to the current date/time (SYSDATE). Showing the history is as simple as selecting from the table; the one record where DATE_THRU is NULL is your current or active record.
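A minimal sketch of that end-dating API, with Python over SQLite standing in for the PL/SQL layer (names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product (id INTEGER, name TEXT,
                                      date_from TEXT, date_thru TEXT)""")

def api_insert(conn, pid, name, today):
    conn.execute("INSERT INTO product VALUES (?, ?, ?, NULL)",
                 (pid, name, today))

def api_delete(conn, pid, today):
    # No physical delete: just end-date the active record.
    conn.execute("UPDATE product SET date_thru = ? "
                 "WHERE id = ? AND date_thru IS NULL", (today, pid))

api_insert(conn, 1, "Widget", "2023-01-01")
api_delete(conn, 1, "2023-06-01")
active = conn.execute(
    "SELECT COUNT(*) FROM product WHERE date_thru IS NULL").fetchone()[0]
history = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(active, history)  # 0 1 -- the row survives as history after "deletion"
```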
Now if you expect a high volume of changes, writing off the old record to a history table would be preferable, but I still wouldn't manage it with triggers, I'd do it with the API.
Hope that helps.