Audit Triggers: Use INSERTED or DELETED system tables - sql-server-2005

The topic of how to audit tables has recently sprung up in our discussions... so I'd like your opinion on what's the best way to approach this. We have a mix of both approaches (which is not good) in our database, as each previous DBA did what he/she believed was the right way. So we need to change them to follow one consistent model.
CREATE TABLE dbo.Sample(
Name VARCHAR(20),
...
...
Created_By VARCHAR(20),
Created_On DATETIME,
Modified_By VARCHAR(20),
Modified_On DATETIME
)
CREATE TABLE dbo.Audit_Sample(
Name VARCHAR(20),
...
...
Created_By VARCHAR(20),
Created_On DATETIME,
Modified_By VARCHAR(20),
Modified_On DATETIME,
Audit_Type VARCHAR(1) NOT NULL,
Audited_Created_On DATETIME,
Audit_Created_By VARCHAR(50)
)
Approach 1: Store, in audit tables, only those records that are replaced/deleted from the main table (using the DELETED system table). So for each UPDATE and DELETE in the main table, the record that is being replaced is INSERTED into the audit table with the 'Audit_Type' column set to either 'U' (for UPDATE) or 'D' (for DELETE).
INSERTs are not audited. For the current version of any record you always query the main table, and for history you query the audit table.
Pros: Seems intuitive to store the previous versions of records.
Cons: If you need to know the full history of a particular record, you need to join the audit table with the main table.
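A minimal sketch of an Approach 1 trigger for the sample tables above (the trigger name and the use of GETDATE()/SUSER_SNAME() for the audit metadata are illustrative, and the column lists are abbreviated to the columns shown):
-- Approach 1: only replaced/deleted versions go to the audit table
CREATE TRIGGER trg_Sample_Audit_UD
ON dbo.Sample
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Audit_Sample
        (Name, Created_By, Created_On, Modified_By, Modified_On,
         Audit_Type, Audited_Created_On, Audit_Created_By)
    SELECT d.Name, d.Created_By, d.Created_On, d.Modified_By, d.Modified_On,
           CASE WHEN EXISTS (SELECT * FROM inserted) THEN 'U' ELSE 'D' END,
           GETDATE(), SUSER_SNAME()
    FROM deleted AS d;
END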
Approach 2: Store, in the audit table, every record that goes into the main table (using the INSERTED system table).
Each record that is INSERTED/UPDATED/DELETED in the main table is also stored in the audit table. So when you insert a new record it is also inserted into the audit table. When you update, the new version (from the INSERTED table) is stored in the audit table. When you delete, the old version (from the DELETED table) is stored in the audit table.
Pros: If you need to know the history of a particular record, you have everything in one location.
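A matching sketch of Approach 2, where every INSERT/UPDATE/DELETE also writes to the audit table (again with illustrative names and abbreviated column lists):
-- Approach 2: inserts/updates store the new version (from inserted),
-- deletes store the old version (from deleted)
CREATE TRIGGER trg_Sample_Audit_IUD
ON dbo.Sample
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.Audit_Sample
        (Name, Created_By, Created_On, Modified_By, Modified_On,
         Audit_Type, Audited_Created_On, Audit_Created_By)
    SELECT i.Name, i.Created_By, i.Created_On, i.Modified_By, i.Modified_On,
           CASE WHEN EXISTS (SELECT * FROM deleted) THEN 'U' ELSE 'I' END,
           GETDATE(), SUSER_SNAME()
    FROM inserted AS i;

    IF NOT EXISTS (SELECT * FROM inserted)   -- pure DELETE
        INSERT INTO dbo.Audit_Sample
            (Name, Created_By, Created_On, Modified_By, Modified_On,
             Audit_Type, Audited_Created_On, Audit_Created_By)
        SELECT d.Name, d.Created_By, d.Created_On, d.Modified_By, d.Modified_On,
               'D', GETDATE(), SUSER_SNAME()
        FROM deleted AS d;
END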
Though I did not list all of them here, each approach has its pros and cons.

I'd go with:
Approach 2: Store, in the audit table, every record that goes into the main table
(using the INSERTED system table).
Is one more row per item really going to kill the DB? This way you have the complete history together.
If you purge out rows (a range, e.g. all rows older than X days) you can still tell whether something has changed or not:
if an audit row exists (not purged), you can see whether the row in question changed.
if no audit rows exist for the item (all were purged), nothing changed (since any change writes to the audit table, including completely new items).
If you go with Approach 1 and purge out a range, it will be hard (you need to remember the purge date) to tell new inserts from items whose audit rows were all purged.

A third approach we use a lot is to audit only the interesting columns, saving both the 'old' and 'new' value on each row.
So if you have your "name" column, the audit table would have "name_old" and "name_new".
In the INSERT trigger, "name_old" is set to blank/null (depending on your preference) and "name_new" is set from INSERTED.
In the UPDATE trigger, "name_old" is set from DELETED and "name_new" from INSERTED.
In the DELETE trigger, "name_old" is set from DELETED and "name_new" to blank/null.
(or you use a FULL join and one trigger for all cases)
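A sketch of that combined trigger, assuming dbo.Sample has an Id primary key (hidden in the "..." above) and a hypothetical old/new audit table:
-- One trigger for all cases: the FULL JOIN leaves the old side NULL on
-- INSERT and the new side NULL on DELETE
CREATE TRIGGER trg_Sample_Audit_OldNew
ON dbo.Sample
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Audit_Sample_OldNew (Id, name_old, name_new, Audited_On, Audited_By)
    SELECT COALESCE(i.Id, d.Id),
           d.Name,          -- NULL for an INSERT
           i.Name,          -- NULL for a DELETE
           GETDATE(), SUSER_SNAME()
    FROM inserted AS i
    FULL JOIN deleted AS d ON d.Id = i.Id;
END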
For VARCHAR fields this might not look like such a good idea, but for INTEGER, DATETIME, etc. it provides the benefit that it's very easy to see the difference made by an update.
For example, if you have a quantity field in your real table and update it from 5 to 7, you'd have in the audit table:
quantity_old quantity_new
5 7
You can easily see that the quantity was increased by 2 at that specific time.
If you have separate rows in the audit table, you will have to join each row with "the next" one to calculate the difference, which can be tricky in some cases...
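For illustration, pairing each audit row with the previous one might look like this on SQL Server 2005 using ROW_NUMBER() (Quantity and Id are hypothetical columns; on 2012+ the LAG() function makes this simpler):
-- Change between consecutive audit rows for each item
WITH ordered AS (
    SELECT Id, Quantity, Audited_Created_On,
           ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Audited_Created_On) AS rn
    FROM dbo.Audit_Sample
)
SELECT cur.Id,
       prev.Quantity AS quantity_old,
       cur.Quantity  AS quantity_new,
       cur.Quantity - prev.Quantity AS difference
FROM ordered AS cur
JOIN ordered AS prev
  ON prev.Id = cur.Id
 AND prev.rn = cur.rn - 1;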

Related

How do I update a SQL table with daily records?

I have a SQL database where some of my tables are updated daily. I want to create another table which is updated daily with records of what tables (table name, modified/updated date) were updated. I also do not want this table to get too big, so I want this table to only keep records for the last 31 days. How would I write the code for this?
I have already created a table (tUpdatedTables), but I would like this table to be updated daily and keep these records for 31 days.
This is how I created the table:
Select *
Into tUpdatedTables
from sys.tables
order by modify_date desc
I have tried writing an UPDATE statement to update the table, but I get an error:
update tUpdatedTables
set [name]
,[object_id]
,[principal_id]
,[schema_id]
,[parent_object_id]
,[type]
,[type_desc]
,[create_date]
,[modify_date]
,[is_ms_shipped]
,[is_published]
,[is_schema_published]
,[lob_data_space_id]
,[filestream_data_space_id]
,[max_column_id_used]
,[lock_on_bulk_load]
,[uses_ansi_nulls]
,[is_replicated]
,[has_replication_filter]
,[is_merge_published]
,[is_sync_tran_subscribed]
,[has_unchecked_assembly_data]
,[text_in_row_limit]
,[large_value_types_out_of_row]
,[is_tracked_by_cdc]
,[lock_escalation]
,[lock_escalation_desc]
,[is_filetable]
,[is_memory_optimized]
,[durability]
,[durability_desc]
,[temporal_type]
,[temporal_type_desc]
,[history_table_id]
,[is_remote_data_archive_enabled]
,[is_external]
--Into tUpdatedTables
from sys.tables
where modify_date >= GETDATE()
order by modify_date desc
Msg 2714, Level 16, State 6, Line 4 There is already an object named
'tUpdatedTables' in the database.
I want to create another table which is updated daily with records of what tables (table name, modified/updated date) were updated.
If this is all you want, I would suggest instead simply doing daily backups. You should be doing that anyway.
Beyond that, what you're looking for is an audit log. Most languages and frameworks have libraries to do this for you. For example, paper_trail.
If you want to do this yourself, follow the basic pattern of paper_trail (a table sketch follows the list below):
id: an autoincrementing primary key
item_type: the table name, or perhaps something more abstract
item_id: the primary key of the item
event: whether you are storing a create, an update, or a delete
bywho: identifies who made the change
object: a JSON field containing a dump of the data
created_at: when this happened (use a default)
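A rough sketch of that table in SQL Server terms might look like this (column types and sizes are assumptions, not part of the original pattern):
-- Generic audit_log table following the paper_trail-style field list above
CREATE TABLE audit_log (
    id          BIGINT IDENTITY(1,1) PRIMARY KEY,
    item_type   NVARCHAR(128) NOT NULL,   -- table name, or something more abstract
    item_id     BIGINT        NOT NULL,   -- primary key of the audited item
    event       NVARCHAR(10)  NOT NULL,   -- 'create', 'update' or 'delete'
    bywho       NVARCHAR(128) NULL,       -- who made the change (user id or name)
    object      NVARCHAR(MAX) NULL,       -- JSON dump of the row (FOR JSON)
    created_at  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);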
Using JSON is key to making this table generic. Rather than trying to store every possible column of every possible table, and having to keep that up to date as the tables change, you store a JSON dump of the row using FOR JSON. This means the audit table doesn't need to change as other tables change. And it will save a lot of disk space as it avoids the audit table having a lot of unused columns.
For example, here's how you'd record creating ID 5 of some_table by user 23. (I might be a bit off as I don't use SQL Server).
insert into audit_log (item_type, item_id, event, bywho, object)
values (
    'some_table', 5, 'create', 23, (
        select * from some_table where id = 5 for json auto
    )
)
Because the audit table doesn't care about the structure of the thing being recorded, you use insert, update, and delete triggers to each table to record their changes in the audit log. Just change the item_type.
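As a sketch, an AFTER INSERT trigger on the hypothetical some_table could record its JSON snapshot like this (FOR JSON PATH is used on the inserted pseudo-table, and SUSER_SNAME() stands in for bywho since a trigger has no application user context):
-- Record inserts into some_table in audit_log as JSON; similar triggers
-- would handle UPDATE ('update', from inserted) and DELETE ('delete', from deleted)
CREATE TRIGGER trg_some_table_audit_insert
ON some_table
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO audit_log (item_type, item_id, event, bywho, object)
    SELECT 'some_table',
           i.id,
           'create',
           SUSER_SNAME(),
           (SELECT * FROM inserted AS x WHERE x.id = i.id FOR JSON PATH)
    FROM inserted AS i;
END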
As for not getting too big, don't worry about it until it's a problem. Proper indexing means it won't be a problem: a composite index on (item_type, item_id) will make listing the changes to a particular thing fast. Indexing bywho will make searches for changes made by a particular thing fast. You shouldn't be referencing this thing in production. If you are, that probably requires a different design.
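For instance, with the column names above (the index names are just illustrative):
-- Composite index for listing the history of one item, plus one for searches by user
CREATE INDEX IX_audit_log_item  ON audit_log (item_type, item_id);
CREATE INDEX IX_audit_log_bywho ON audit_log (bywho);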
Partitioning the table by month could also stave off scaling issues.
And if it does get too big, you can backup the table and use created_at to delete old entries.
delete from audit_log
where created_at < dateadd(day, -31, getdate())

What kind of approach is this in SQL, and does it actually exist? Is it viable / good practice?

One of our teachers gave us the following challenge:
"Make a Database schema with the following principle:
you can't change any values on any table, only add new ones."
I came with the following schema:
CREATE TABLE TRANSACTIONS(ID PRIMARY KEY, TRANSACTION_TYPE_FK, DATE);
CREATE TABLE TRANSACTION_TYPE(ID PRIMARY KEY, NAME);
CREATE TABLE PRODUCTS_TRANSACTIONS(ID_PROD_FK, ID_TRANS_FK, MONEY, QTY);
CREATE TABLE PRODUCTS(ID PRIMARY KEY, NAME, PRICE_FK );
CREATE TABLE PRICES(ID PRIMARY KEY, DATE, DETAILS);
It's just a proof of concept. Basically everything is based on transactions.
Transactions can be Entry, Exit and Move Products and In & Out Money.
I can control my quantities and cash based on transactions.
The PRODUCTS_TRANSACTIONS "MONEY" field is used if a transaction involves money only or there are "discounts" or "taxes" on the transaction.
The Products table has a "child" table called "prices"; it stores all the price changes, and the "details" field is for annotations like "Cost Price" etc.
I made it very quick, I am sorry for any inconsistency.
I liked this kind of approach. I am kind of a newbie with SQL, so I really wanted to know if this approach has a name and whether it is viable performance-wise and good practice.
My idea is to make a view and "update" it whenever a new transaction is made; since nothing needs to be "updated", I only need to add new rows to the view.
I am currently very sick, so I can't go to college to remedy my doubts.
Thanks in advance for any help
Let's take just one table, TRANSACTION_TYPE(ID PRIMARY KEY, NAME), as an example.
Now, if you want to restrict updates on the table, you can achieve that with the following statements:
GRANT SELECT,INSERT,DELETE ON TRANSACTION_TYPE TO Username;
OR
DENY UPDATE ON TRANSACTION_TYPE TO Username;
Now, to maintain a history of insertions and deletions, you can store them in another table by creating a trigger on TRANSACTION_TYPE as follows (Oracle PL/SQL syntax):
CREATE OR REPLACE TRIGGER my_trigger  -- name of trigger
AFTER INSERT OR DELETE
ON TRANSACTION_TYPE
FOR EACH ROW
BEGIN
  IF INSERTING THEN
    INSERT INTO TRANSACTION_INSERT_HISTORY(ID, NAME)  -- table that maintains the history of insertions
    VALUES(:new.ID, :new.NAME);
  ELSIF DELETING THEN
    INSERT INTO TRANSACTION_DELETE_HISTORY(ID, NAME)  -- table that maintains the history of deleted records
    VALUES(:old.ID, :old.NAME);
  END IF;
END;
/
Before creating this trigger, you first have to create two tables:
TRANSACTION_INSERT_HISTORY(ID,NAME) and
TRANSACTION_DELETE_HISTORY(ID,NAME)
I have created two different tables for insertion and deletion for simplicity.
You can do it with one table too.
Hope it helps.
For the table that holds the information, you could grant only INSERT and SELECT permissions on the table, preventing UPDATE.
https://www.mssqltips.com/sqlservertip/1138/giving-and-removing-permissions-in-sql-server/
GRANT INSERT, SELECT ON TableX TO UserY
In a production system, you'd probably design this using a VIEW for selecting the data from the table (to only get the most recent revision of the audit data). With perhaps another VIEW that would allow you to see all the audit history. You'd probably also make use of a Stored Procedure for inserting the data and ensuring the data was being maintained in the audit history way you suggest.

What is the best way to query deleted records with SQL Server 2016 temporal tables?

I'm looking at SQL Server 2016 temporal tables and can't find any efficient way to query for all historical records that are now deleted.
I prefer not to soft-delete or move to a 'deleted items' table, as I feel that with temporal tables it is redundant.
Can this can be achieved with temporal tables in an efficient way?
Temporal tables are intended to give you a point-in-time view of your data, not a state view - it doesn't actually understand state. Nothing is exposed to users to determine how a row arrived in the temporal history table.
If you did not temporarily pause/stop system versioning on your temporal table then you just need to find the delta between the history table and the active table. All remaining rows in the history table that don't have a corresponding row in the active table are deleted rows.
For example, if you have tblCustCalls and it's enabled for temporal with a tblCustCallsHistory, something like SELECT * FROM tblCustCallsHistory WHERE ID NOT IN (SELECT ID FROM tblCustCalls). In this example, ID is the primary key. You can optimize the TSQL if the tables are very large but the base concept doesn't change.
There is a way to detect it via the ValidTo column of your temporal table.
The latest ValidTo for the record will be less than the current date.
Or another way to look at it, an undeleted record will have a ValidTo that equals '9999-12-31 18:59:59.9900000'. I don't trust this value enough to hard code looking for it, so I just look for ValidTo > current date.
Don't forget it's UTC.
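As a sketch, using the tblCustCalls example and assuming the period column is named ValidTo, the deleted records can be listed like this:
-- Rows whose newest ValidTo is already in the past have been deleted
-- (the period columns are stored in UTC, hence SYSUTCDATETIME())
SELECT ID, MAX(ValidTo) AS DeletedAtUtc
FROM tblCustCalls FOR SYSTEM_TIME ALL
GROUP BY ID
HAVING MAX(ValidTo) < SYSUTCDATETIME();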
I write in the last updated by user id on the record before I delete it so that essentially becomes a snapshot of who deleted it and when.
You could also add a column [Action] containing the action. This results in the following process:
- Adding a new row: just add the row with [Action] = 'Inserted'
- Updating an existing row: just update the row with [action] = 'Updated'
- Deleting a row: First update the row with [Action] = 'Deleted' and delete the row
Like this you can easily find the unchanged rows in your base table (where [Action] = 'Inserted') and the deleted rows in your history table (where [Action] = 'Deleted').
Be aware that a delete will create 2 rows in the history table (one for the UPDATE statement and one for the DELETE statement)!

Incremental load for Updates into Warehouse

I am planning for an incremental load into warehouse (especially for updates of source tables in RDBMS).
I am capturing the updated rows in staging tables from the RDBMS based on the update datetime. But how do I determine which columns of a particular row need to be updated in the target warehouse tables?
Or do I just delete the particular row in the warehouse table (based on the primary key of the row in the staging table) and insert the new updated row?
What is the best way to implement the incremental load between the RDBMS and the warehouse using PL/SQL and SQL coding?
In my opinion, the easiest way to accomplish this is as follows:
Create a stage table identical to your host table. When you do your incremental/net-change load, load all changed records into this table (based on whatever your "last updated" field is)
Delete the records from your actual table based on the primary key. For example, if your primary key is customer, part, the query might look like this:
delete from main_table m
where exists (
select null
from stage_table s
where
m.customer = s.customer and
m.part = s.part
);
Insert the records from the stage to the main table.
You could also do an update of existing records / insert of new records, but either way that's two steps. The advantage of the method I listed is that it will work even if your tables have partitions and the newly updated data violates one of the original partition rules, whereas an update would not accomplish that. Also, the syntax is much simpler, as your update would have to list every single field, whereas the delete from / insert into allows you to list only the primary key fields.
Oracle also has a MERGE statement that will update a row if it exists or insert it if it does not. I honestly don't know how that would be impacted if you had partitions.
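A rough sketch of that MERGE, reusing the customer/part key from the example above (qty and last_updated are hypothetical data columns):
-- Oracle-style MERGE: update the main table row if it exists, insert it otherwise
MERGE INTO main_table m
USING stage_table s
   ON (m.customer = s.customer AND m.part = s.part)
WHEN MATCHED THEN
    UPDATE SET m.qty = s.qty,
               m.last_updated = s.last_updated
WHEN NOT MATCHED THEN
    INSERT (customer, part, qty, last_updated)
    VALUES (s.customer, s.part, s.qty, s.last_updated);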
One major caveat: if your updates include deletes (records that need to be deleted from the main table), none of these approaches will handle that, and you will need some other way to deal with it. It may not be necessary, depending on your circumstances, but it's something to consider.

Suggested techniques for storing multiple versions of SQL row data

I am developing an application that is required to store previous versions of database table rows to maintain a history of changes. I am recording the history in the same table but need the most current data to be accessible by a unique identifier that doesn't change with new versions. I have a few ideas on how this could be done and was just looking for some ideas on the best way of doing this or whether there is any reason not to use one of my ideas:
1. Create a new row for each row version, with a field to indicate which row is the current row. The drawback of this is that the new version has a different primary key and any references to the old version will not return the current version.
2. When data is updated, the old row version is duplicated to a new row, and the new version replaces the old row. The current row can be accessed by the same primary key.
3. Add a second table with only a primary key, and add a column to the other table which is a foreign key to the new table's primary key. Use the same method as described in option 1 for storing multiple versions, and create a view which finds the current version by using the new table's primary key.
PeopleSoft uses (used?) "effective dated records". It took a little while to get the hang of it, but it served its purpose. The business key is always extended by an EFFDT column (effective date). So if you had a table EMPLOYEE[EMPLOYEE_ID, SALARY] it would become EMPLOYEE[EMPLOYEE_ID, EFFDT, SALARY].
To retrieve the employee's salary:
SELECT e.salary
FROM employee e
WHERE employee_id = :x
  AND effdt = (SELECT MAX(effdt)
               FROM employee
               WHERE employee_id = :x
                 AND effdt <= SYSDATE)
An interesting application was future-dating records: you could give every employee a 10% increase effective Jan 1 next year, and pre-populate the table a few months beforehand. When SYSDATE crosses Jan 1, the new salary comes into effect. It was also good for running historical reports. Instead of using SYSDATE, you plug in a date from the past in order to see the salaries (or exchange rates or whatever) as they would have been reported if run at that time in the past.
In this case, records are never updated or deleted, you just keep adding records with new effective dates. Makes for more verbose queries, but it works and starts becoming (dare I say) normal. There are lots of pages on this, for example: http://peoplesoft.wikidot.com/effective-dates-sequence-status
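For example, a historical report just binds a past date instead of SYSDATE (here :as_of is a hypothetical bind variable):
-- Salary as it would have been reported on :as_of
SELECT e.salary
FROM employee e
WHERE e.employee_id = :x
  AND e.effdt = (SELECT MAX(effdt)
                 FROM employee
                 WHERE employee_id = :x
                   AND effdt <= :as_of)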
#3 is probably best, but if you wanted to keep the data in one table, I suppose you could add a datetime column that has a now() value populated for each new row and then you could at least sort by date desc limit 1.
Overall, though, handling multiple versions depends on what you want to do, effectively as much as programmatically; i.e. we need more info on what you are trying to achieve.
R
Have you considered using AutoAudit?
AutoAudit is a SQL Server (2005, 2008) Code-Gen utility that creates
Audit Trail Triggers with:
Created, CreatedBy, Modified, ModifiedBy, and RowVersion (incrementing INT) columns to table
Insert event logged to Audit table
Updates old and new values logged to Audit table
Delete logs all final values to the Audit table
view to reconstruct deleted rows
UDF to reconstruct Row History
Schema Audit Trigger to track schema changes
Re-code-gens triggers when Alter Table changes the table
For me, history tables are always separate, so I would definitely go with that. But why create some complex versioning scheme where you need to look at the current production record? In reporting, that results in nasty unions that are really unnecessary.
Table has a primary key and who cares what else.
TableHist has these columns: an incrementing int/bigint primary key, a history-written datetime, who wrote the history row, a record type (I, U, or D for insert, update, delete), the PK from Table as an FK on TableHist, and then all the remaining columns from Table with the same names.
If you create this history table structure and populate it via triggers on Table, you will have all versions of every row in the tables you care about and can easily determine the original record, every change, and the deletion records as well. AND if you are reporting, you only need to use your historical tables to get all of the information you'd like.
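As a sketch, for a hypothetical Table(TableId, Name, Amount) the history table could look like this (names and types are illustrative):
-- History table capturing every version of every row in Table
CREATE TABLE TableHist (
    TableHistId   BIGINT IDENTITY(1,1) PRIMARY KEY,   -- incrementing PK
    HistWrittenOn DATETIME      NOT NULL DEFAULT GETDATE(),
    HistWrittenBy VARCHAR(128)  NOT NULL DEFAULT SUSER_SNAME(),
    RecordType    CHAR(1)       NOT NULL,              -- 'I', 'U' or 'D'
    TableId       INT           NOT NULL,              -- PK of Table, FK here
    Name          VARCHAR(100)  NULL,                  -- remaining columns mirror Table
    Amount        DECIMAL(18,2) NULL
);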
create table table1 (
Id int identity(1,1) primary key,
[Key] varchar(max),
Data varchar(max)
)
go
create view view1 as
with q as (
select [Key], Data, row_number() over (partition by [Key] order by Id desc) as 'r'
from table1
)
select [Key], Data from q where r=1
go
create trigger trigger1 on view1 instead of update, insert as begin
insert into table1
select [Key], Data
from (select distinct [Key], Data from inserted) a
end
go
insert into view1 values
('key1', 'foo')
,('key1', 'bar')
select * from view1
update view1
set Data='updated'
where [Key]='key1'
select * from view1
select * from table1
drop trigger trigger1
drop table table1
drop view view1
Results:
Key Data
key1 foo
Key Data
key1 updated
Id Key Data
1 key1 bar
2 key1 foo
3 key1 updated
I'm not sure if the distinct is needed.