What is the best way to query deleted records with SQL Server 2016 temporal tables? - sql-server-2016

I'm looking at SQL Server 2016 temporal tables and can't find any efficient way to query for all historical records that are now deleted.
I would prefer not to soft-delete or move to a 'deleted items' table, as I feel that with temporal tables it is redundant.
Can this be achieved with temporal tables in an efficient way?

Temporal tables are intended to give you a point-in-time view of your data, not a state view; they don't actually track state, and nothing is exposed to users to indicate how a row arrived in the temporal history table.
If you did not temporarily pause/stop system versioning on your temporal table then you just need to find the delta between the history table and the active table. All remaining rows in the history table that don't have a corresponding row in the active table are deleted rows.
For example, if you have tblCustCalls and it's enabled for temporal with a tblCustCallsHistory, you'd use something like SELECT * FROM tblCustCallsHistory WHERE ID NOT IN (SELECT ID FROM tblCustCalls). In this example, ID is the primary key. You can optimize the T-SQL if the tables are very large, but the base concept doesn't change.
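If the NOT IN approach gets slow on very large tables, a NOT EXISTS form is a common alternative; here is a minimal sketch against the same hypothetical tblCustCalls / tblCustCallsHistory pair:
-- Every historical version of rows that no longer exist in the active table
SELECT h.*
FROM tblCustCallsHistory AS h
WHERE NOT EXISTS (
    SELECT 1
    FROM tblCustCalls AS c
    WHERE c.ID = h.ID
);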

There is a way to detect it via the ValidTo column of your temporal table.
The latest ValidTo for the record will be less than the current date.
Or, to look at it another way, an undeleted record will have a ValidTo that equals '9999-12-31 18:59:59.9900000'. I don't trust this value enough to hard-code a check for it, so I just look for ValidTo > current date.
Don't forget it's UTC.
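As a rough sketch of that check (reusing the hypothetical tblCustCalls table from the other answer, and assuming the period columns are named ValidFrom / ValidTo), you could group by the key and look for rows whose latest version has already expired:
-- IDs whose most recent version has already ended are deleted rows.
-- ValidTo is stored in UTC, so compare against SYSUTCDATETIME().
SELECT ID, MAX(ValidTo) AS DeletedAtUtc
FROM tblCustCalls FOR SYSTEM_TIME ALL
GROUP BY ID
HAVING MAX(ValidTo) < SYSUTCDATETIME();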
I write in the last updated by user id on the record before I delete it so that essentially becomes a snapshot of who deleted it and when.

You could also add a column [Action] containing the action. This results in the following process:
- Adding a new row: just add the row with [Action] = 'Inserted'
- Updating an existing row: just update the row with [action] = 'Updated'
- Deleting a row: First update the row with [Action] = 'Deleted' and delete the row
This way you can easily find the unchanged rows in your base table (where [Action] = 'Inserted') and the deleted rows in your history table (where [Action] = 'Deleted').
Be aware this will create 2 rows in the history table!! (1 from the update and 1 from the delete statement)
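A rough sketch of the delete step under this scheme, assuming a hypothetical dbo.Customer temporal table with an [Action] column and an @Id parameter:
BEGIN TRANSACTION;
-- The update pushes the previous version into the history table...
UPDATE dbo.Customer
SET [Action] = 'Deleted'
WHERE Id = @Id;
-- ...and the delete pushes the version stamped 'Deleted', hence the two history rows.
DELETE FROM dbo.Customer
WHERE Id = @Id;
COMMIT;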

Related

Find original value for record using temporal and history table

In SQL Server, if you have a temporal table and a history table, how do you find the original values for a record if you do not know the date/time that record was first created?
If the record was never edited, then it won't have a value in the history table and the original record is in the temporal table.
If it was edited, the original record is in the history table but with valid_from and valid_to dates that you don't know.
Is there a parameter in the FOR SYSTEM_TIME clause that returns the record as it appeared when it was first added to the table, regardless of whether it has changed since?
Just one note on the answer. You mention this:
If the record was never edited, then it won't have a value in the history table and the original record is in the temporal table. If it was edited, the original record is in the history table
A huge benefit of using Temporal Tables is that they act as a single table. It is not two tables where you have to do unions to work out which table you should be looking at. If you were creating a history table manually (e.g. a roll-your-own audit solution), then you would have to query across two different tables.
Now onto your solution. The easiest way would be to use the SYSTEM_TIME ALL flag. It works like so:
SELECT TOP 1 *
FROM Person
FOR SYSTEM_TIME ALL
WHERE Id = 1
ORDER BY ValidFrom ASC
This returns ALL historic records for a particular ID. But because we order by ValidFrom and then select the TOP 1, we actually return the very first entry.
And again, this works regardless of whether that first record is in the "current" table or the history table.
More info here : https://dotnetcoretutorials.com/2021/12/11/temporal-tables-in-sql-server/

Incremental load for Updates into Warehouse

I am planning for an incremental load into warehouse (especially for updates of source tables in RDBMS).
I am capturing the updated rows in staging tables from the RDBMS based on the update datetime. But how do I determine which columns of a particular row need to be updated in the target warehouse tables?
Or do I just delete a particular row in the warehouse table (based on the primary key of the row in staging table) and insert the new updated row?
Which is the best way to implement the incremental load between the RDBMS and Warehouse using PL/SQL and SQL coding?
In my opinion, the easiest way to accomplish this is as follows:
Create a stage table identical to your host table. When you do your incremental/net-change load, load all changed records into this table (based on whatever your "last updated" field is)
Delete the records from your actual table based on the primary key. For example, if your primary key is customer, part, the query might look like this:
delete from main_table m
where exists (
    select null
    from stage_table s
    where m.customer = s.customer
      and m.part = s.part
);
Insert the records from the stage to the main table.
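That last step is just an insert-select; a sketch, assuming the stage table has exactly the same column list as the main table:
-- copy the changed rows back in after the delete
insert into main_table
select * from stage_table;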
You could also do an update existing records / insert new records, but either way that's two steps. The advantage of the method I listed is that it will work even if your tables have partitions and the newly updated data violates one of the original partition rules, whereas an update would not accomplish that. Also, the syntax is much simpler as your update would have to list every single field, whereas the delete from / insert into allows you to list only the primary key fields.
Oracle also has a merge clause that will update if it exists or insert if it does not. I honestly don't know how that would be impacted if you had partitions.
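For illustration, a hedged sketch of what such a merge could look like for the customer/part example above (the non-key columns qty and last_updated are invented here):
-- upsert changed rows from the stage table into the main table in one statement
merge into main_table m
using stage_table s
   on (m.customer = s.customer and m.part = s.part)
when matched then
  update set m.qty = s.qty,
             m.last_updated = s.last_updated
when not matched then
  insert (customer, part, qty, last_updated)
  values (s.customer, s.part, s.qty, s.last_updated);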
One major caveat: if your changes include deletes (records that need to be removed from the main table), none of these approaches will handle that, and you will need some other way to deal with it. It may not be necessary, depending on your circumstances, but it's something to consider.

How should I reliably mark the most recent row in SQL Server table?

The existing design for this program is that all changes are written to a changelog table with a timestamp. In order to obtain the current state of an item's attribute we JOIN onto the changelog table and take the row having the most recent timestamp.
This is a messy way to keep track of current values, but we cannot readily change this changelog setup at this time.
I intend to slightly modify the behavior by adding an "IsMostRecent" bit to the changelog table. This would allow me to simply pull the row having that bit set, as opposed to the MAX() aggregation or recursive seek.
What strategy would you employ to make sure that bit is always appropriately set? Or is there some alternative you suggest which doesn't affect the current use of the logging table?
Currently I am considering a trigger approach which, on INSERT, turns the bit off for all other rows and then turns it on for the most recent row.
I've done this before by having a "MostRecentRecorded" table which simply holds the most recently inserted record (Id and entity ID), populated from a trigger.
Having an extra column for this isn't right, and it can get you into problems with transactions and reading existing entries.
In the first version of this it was a simple case of
BEGIN TRANSACTION
INSERT INTO simlog (entityid, logmessage)
VALUES (11, 'test');
UPDATE simlogmostrecent
SET lastid = @@IDENTITY
WHERE simlogentityid = 11
COMMIT
Ensuring that the MostRecent table had an entry for each record in SimLog can be done in the query but ISTR we did it during the creation of the entity that the SimLog referred to (the above is my recollection of the first version - I don't have the code to hand).
However, the simple version caused problems with multiple writers, as it could cause a deadlock or transaction failure, so it was moved into a trigger.
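For what it's worth, a rough sketch of the trigger version, using the simlog / simlogmostrecent names from above and assuming simlog has an identity column named id:
CREATE TRIGGER trg_simlog_mostrecent ON simlog
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Take the highest new id per entity, in case several rows arrive in one insert
    UPDATE m
    SET m.lastid = i.maxid
    FROM simlogmostrecent AS m
    JOIN (SELECT entityid, MAX(id) AS maxid
          FROM inserted
          GROUP BY entityid) AS i
      ON i.entityid = m.simlogentityid;
END;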
Edit: Started this answer before Richard Harrison answered, promise :)
I would suggest another table with the structure similar to below:
VersionID  TableName  UniqueVal  LatestPrimaryKey
1          Orders     209        12548
2          Orders     210        12549
3          Orders     211        12605
4          Orders     212        10694
VersionID -- being the table's key
TableName -- just in case you want to roll out to multiple tables
UniqueVal -- is whatever groups multiple rows into a single item with history (eg Order Number or some other value)
LatestPrimaryKey -- is the identity key of the latest row you want to use.
Then you can simply JOIN to this table to return only the latest rows.
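For example, to pull only the latest Orders rows through this lookup (using the [LatestRowTable] name from the snippet further down, and assuming Orders has an identity key OrderId):
SELECT o.*
FROM Orders AS o
JOIN [LatestRowTable] AS l
  ON l.[LatestPrimaryKey] = o.OrderId
WHERE l.[TableName] = 'Orders';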
If you already have a trigger inserting rows into the changelog table this could be adapted:
INSERT INTO [MyChangelogTable]
([Primary], RowUpdateTime)
VALUES (@PrimaryKey, GETDATE())
-- Add onto it:
UPDATE [LatestRowTable]
SET [LatestPrimaryKey] = @PrimaryKey
WHERE [TableName] = 'Orders'
AND [UniqueVal] = @OrderNo
Alternatively it could be done as a merge to capture inserts as well.
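A hedged sketch of that merge, reusing the @PrimaryKey / @OrderNo variables from the snippet above:
MERGE [LatestRowTable] AS t
USING (SELECT 'Orders' AS TableName, @OrderNo AS UniqueVal, @PrimaryKey AS LatestPrimaryKey) AS s
   ON t.[TableName] = s.TableName AND t.[UniqueVal] = s.UniqueVal
WHEN MATCHED THEN
    UPDATE SET [LatestPrimaryKey] = s.LatestPrimaryKey
WHEN NOT MATCHED THEN
    INSERT ([TableName], [UniqueVal], [LatestPrimaryKey])
    VALUES (s.TableName, s.UniqueVal, s.LatestPrimaryKey);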
One thing that comes to mind is to create a view to do all the messy MAX() queries, etc. behind the scenes. Then you should be able to query against the view. This way you would not have to change your current setup, just move all the messiness to one place.
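A minimal sketch of such a view, with all names invented here and ROW_NUMBER() standing in for the MAX() lookup:
CREATE VIEW dbo.vCurrentChangelog
AS
SELECT EntityId, LogValue, LoggedAt
FROM (
    SELECT EntityId, LogValue, LoggedAt,
           ROW_NUMBER() OVER (PARTITION BY EntityId ORDER BY LoggedAt DESC) AS rn
    FROM dbo.Changelog
) AS latest
WHERE rn = 1;
Queries then just select from dbo.vCurrentChangelog instead of repeating the most-recent-row logic.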

Suggested techniques for storing multiple versions of SQL row data

I am developing an application that is required to store previous versions of database table rows to maintain a history of changes. I am recording the history in the same table but need the most current data to be accessible by a unique identifier that doesn't change with new versions. I have a few ideas on how this could be done and was just looking for some ideas on the best way of doing this or whether there is any reason not to use one of my ideas:
Create a new row for each row version, with a field to indicate which row was the current row. The drawback of this is that the new version has a different primary key and any references to the old version will not return the current version.
When data is updated, the old row version is duplicated to a new row, and the new version replaces the old row. The current row can be accessed by the same primary key.
Add a second table with only a primary key, and add a column to the other table which is a foreign key to the new table's primary key. Use the same method as described in option 1 for storing multiple versions, and create a view which finds the current version by using the new table's primary key.
PeopleSoft uses (used?) "effective dated records". It took a little while to get the hang of it, but it served its purpose. The business key is always extended by an EFFDT column (effective date). So if you had a table EMPLOYEE[EMPLOYEE_ID, SALARY] it would become EMPLOYEE[EMPLOYEE_ID, EFFDT, SALARY].
To retrieve the employee's salary:
SELECT e.salary
FROM employee e
WHERE employee_id = :x
AND effdt = (SELECT MAX(effdt)
             FROM employee
             WHERE employee_id = :x
             AND effdt <= SYSDATE)
An interesting application was future-dating records: you could give every employee a 10% increase effective Jan 1 next year, and pre-populate the table a few months beforehand. When SYSDATE crosses Jan 1, the new salary would come into effect. Also, it was good for running historical reports. Instead of using SYSDATE, you plug in a date from the past in order to see the salaries (or exchange rates or whatever) as they would have been reported if run at that time in the past.
In this case, records are never updated or deleted, you just keep adding records with new effective dates. Makes for more verbose queries, but it works and starts becoming (dare I say) normal. There are lots of pages on this, for example: http://peoplesoft.wikidot.com/effective-dates-sequence-status
#3 is probably best, but if you wanted to keep the data in one table, I suppose you could add a datetime column that has a now() value populated for each new row and then you could at least sort by date desc limit 1.
Overall though, handling multiple versions effectively depends as much on what you want to do with them as on how you do it programmatically... i.e. we need more info on what you want to achieve.
R
Have you considered using AutoAudit?
AutoAudit is a SQL Server (2005, 2008) Code-Gen utility that creates
Audit Trail Triggers with:
Created, CreatedBy, Modified, ModifiedBy, and RowVersion (incrementing INT) columns to table
Insert event logged to Audit table
Updates old and new values logged to Audit table
Delete logs all final values to the Audit table
view to reconstruct deleted rows
UDF to reconstruct Row History
Schema Audit Trigger to track schema changes
Re-code-gens triggers when Alter Table changes the table
For me, history tables are always separate. So I would definitely go with that, but why create some complex versioning scheme where you have to look at the current production record? In reporting, this results in nasty unions that are really unnecessary.
Table has a primary key and who cares what else.
TableHist has these columns: an incrementing int/bigint primary key, history written date/time, history written by, record type (I, U, D for insert, update, delete), the PK from Table as an FK on TableHist, and then all of the remaining columns from Table under the same names.
If you create this history table structure and populate it via triggers on Table, you will have all versions of every row in the tables you care about and can easily determine the original record, every change, and the deletion records as well. AND if you are reporting, you only need to use your historical tables to get all of the information you'd like.
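For concreteness, a hedged sketch of that structure for a hypothetical dbo.Customer(Id, Name) table:
CREATE TABLE dbo.CustomerHist (
    HistId        BIGINT IDENTITY(1,1) PRIMARY KEY,
    HistWrittenAt DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME(),
    HistWrittenBy SYSNAME      NOT NULL DEFAULT SUSER_SNAME(),
    RecordType    CHAR(1)      NOT NULL,   -- I, U or D
    CustomerId    INT          NOT NULL,   -- the PK from dbo.Customer
    Name          VARCHAR(100) NULL        -- remaining columns mirrored from dbo.Customer
);
GO
CREATE TRIGGER trg_Customer_Hist ON dbo.Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Inserts and updates: log the new version
    INSERT INTO dbo.CustomerHist (RecordType, CustomerId, Name)
    SELECT CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END, i.Id, i.Name
    FROM inserted AS i;
    -- Deletes: log the final values
    INSERT INTO dbo.CustomerHist (RecordType, CustomerId, Name)
    SELECT 'D', d.Id, d.Name
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted);
END;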
create table table1 (
Id int identity(1,1) primary key,
[Key] varchar(max),
Data varchar(max)
)
go
create view view1 as
with q as (
select [Key], Data, row_number() over (partition by [Key] order by Id desc) as 'r'
from table1
)
select [Key], Data from q where r=1
go
create trigger trigger1 on view1 instead of update, insert as begin
insert into table1
select [Key], Data
from (select distinct [Key], Data from inserted) a
end
go
insert into view1 values
('key1', 'foo')
,('key1', 'bar')
select * from view1
update view1
set Data='updated'
where [Key]='key1'
select * from view1
select * from table1
drop trigger trigger1
drop table table1
drop view view1
Results:
First select * from view1:
Key   Data
key1  foo
select * from view1 after the update:
Key   Data
key1  updated
select * from table1:
Id  Key   Data
1   key1  bar
2   key1  foo
3   key1  updated
I'm not sure if the distinct is needed.

Audit Triggers: Use INSERTED or DELETED system tables

The topic of how to audit tables has recently sprung up in our discussions... so I'd like your opinion on what's the best way to approach this. We have a mix of both approaches (which is not good) in our database, as each previous DBA did what he/she believed was the right way. So we need to change them to follow one model.
CREATE TABLE dbo.Sample(
Name VARCHAR(20),
...
...
Created_By VARCHAR(20),
Created_On DATETIME,
Modified_By VARCHAR(20),
Modified_On DATETIME
)
CREATE TABLE dbo.Audit_Sample(
Name VARCHAR(20),
...
...
Created_By VARCHAR(20),
Created_On DATETIME,
Modified_By VARCHAR(20),
Modified_On DATETIME,
Audit_Type VARCHAR(1) NOT NULL,
Audited_Created_On DATETIME,
Audit_Created_By VARCHAR(50)
)
Approach 1: Store, in audit tables, only those records that are replaced/deleted from the main table (using system table DELETED). So for each UPDATE and DELETE in the main table, the record that is being replaced is INSERTED into the audit table with the 'Audit_Type' column set to either 'U' (for UPDATE) or 'D' (for DELETE).
INSERTs are not Audited. For current version of any record you always query the main table. And for history you query audit table.
Pros: Seems intuitive, to store the previous versions of records
Cons: If you need to know the history of a particular record, you need to join audit table with main table.
Approach 2: Store, in the audit table, every record that goes into the main table (using system table INSERTED).
Each record that is INSERTED/UPDATED/DELETED in the main table is also stored in the audit table. So when you insert a new record it is also inserted into the audit table. When updated, the new version (from the INSERTED table) is stored in the audit table. When deleted, the old version (from the DELETED table) is stored in the audit table.
Pros: If you need to know the history of a particular record, you have everything in one location.
Though I did not list all of them here, each approach has its pros and cons.
I'd go with:
Approach 2: Store, in the audit table, every record that goes into the main table
(using system table INSERTED).
Is one more row per item really going to kill the DB? This way you have the complete history together.
If you purge out rows (a range, all older than X days) you can still tell if something has changed or not:
- If an audit row exists (not purged) you can see if the row in question changed.
- If no audit rows exist for the item (all were purged) nothing changed (since any change writes to the audit table, including completely new items).
If you go with Approach 1 and purge out a range, it will be hard (you'd need to remember the purge date) to tell new inserts from items where all the rows were purged.
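As a rough sketch (not production code), an Approach 2 style trigger against the dbo.Sample / dbo.Audit_Sample tables above could look like this (only the columns shown above are handled; the elided ones would be listed the same way):
CREATE TRIGGER trg_Sample_Audit ON dbo.Sample
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Inserts and updates: the new version comes from INSERTED ('I' or 'U')
    INSERT INTO dbo.Audit_Sample (Name, Created_By, Created_On, Modified_By, Modified_On,
                                  Audit_Type, Audited_Created_On, Audit_Created_By)
    SELECT i.Name, i.Created_By, i.Created_On, i.Modified_By, i.Modified_On,
           CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END,
           GETDATE(), SUSER_SNAME()
    FROM inserted AS i;
    -- Deletes: the final version comes from DELETED ('D')
    INSERT INTO dbo.Audit_Sample (Name, Created_By, Created_On, Modified_By, Modified_On,
                                  Audit_Type, Audited_Created_On, Audit_Created_By)
    SELECT d.Name, d.Created_By, d.Created_On, d.Modified_By, d.Modified_On,
           'D', GETDATE(), SUSER_SNAME()
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted);
END;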
A third approach we use a lot is to audit only the interesting columns, and save both the 'new' and 'old' value on each row.
So if you have your "name" column, the audit table would have "name_old" and "name_new".
In the INSERT trigger, "name_old" is set to blank/null depending on your preference and "name_new" is set from INSERTED.
In the UPDATE trigger, "name_old" is set from DELETED and "name_new" from INSERTED.
In the DELETE trigger, "name_old" is set from DELETED and "name_new" to blank/null.
(or you use a FULL join and one trigger for all cases)
For VARCHAR fields this might not look like such a good idea, but for INTEGER, DATETIME, etc. it has the benefit that it's very easy to see the difference made by the update.
I.e. if you have a quantity field in your real table and update it from 5 to 7, you'd have this in the audit table:
quantity_old  quantity_new
5             7
You can easily calculate that the quantity was increased by 2 at that specific time.
If you have separate rows in the audit table, you will have to join one row with "the next" to calculate the difference, which can be tricky in some cases...
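A minimal sketch of the UPDATE trigger under that scheme, with the table and column names invented here (dbo.Item with an ItemId key and a quantity column, audited into dbo.Item_Audit):
CREATE TRIGGER trg_Item_Audit_Update ON dbo.Item
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- One audit row per updated item, holding both the old and the new quantity
    INSERT INTO dbo.Item_Audit (ItemId, quantity_old, quantity_new, AuditedAt)
    SELECT d.ItemId, d.quantity, i.quantity, GETDATE()
    FROM deleted AS d
    JOIN inserted AS i ON i.ItemId = d.ItemId;
END;
Computing the change is then just quantity_new - quantity_old on a single row.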