Delete from audit table at runtime - SQL

We are using an Oracle 12.1 database.
We want to create a table which will hold runtime audit data.
Only the last week's data is relevant/used (older records become irrelevant), and we'll delete the older records using a job.
The table holds 3 columns: ID, Date (primary key) and DAY_COUNT.
We want to reset specific records, which can be achieved by updating DAY_COUNT to 0.
But we want to keep the table small, and the old data is irrelevant to us, so we are considering using DELETE instead of UPDATE.
Is it safe to reset records at runtime using DELETE?
There seems to be an undocumented convention against using DELETE, but is it relevant in this case?
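A minimal sketch of the setup described, with the weekly purge as the body of the cleanup job (table and column names here are illustrative only, not the actual schema):

-- Illustrative audit table; the real names and key definition may differ
CREATE TABLE runtime_audit (
    id         NUMBER NOT NULL,
    audit_date DATE   NOT NULL,
    day_count  NUMBER DEFAULT 0 NOT NULL,
    CONSTRAINT runtime_audit_pk PRIMARY KEY (audit_date)
);

-- Purge job body: remove rows older than one week
DELETE FROM runtime_audit
WHERE audit_date < TRUNC(SYSDATE) - 7;
COMMIT;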

Related

Multi-table auditing structure

In this scenario I have the following tables:
HEADER_TABLE
DETAIL_TABLE_1 (FK to HEADER_TABLE)
And
HEADER_TABLE_AUDIT
DETAIL_AUDIT_TABLE_1
What I would like to do is create a single snapshot of the two tables in the audit tables when a change... um, "session"... occurs. So for instance, if the header record AND 3 of 20 associated child records are updated at the same time, or "session", then the triggers on each table will only result in writing one header audit record and 20 detail audit records (as they were before the changes are applied).
I had the idea to attach a change session id, retrieved from a sequence, to all the changes being made within that session (1 header record and 3 child records) and pass it on to the audit tables. This would result in 4 trigger fires between the 2 tables, but should only write 1 set of data (header record and associated detail records). A check for the change session id in the audit tables (either header or detail) would determine whether a new set needs to be created, or whether to skip it if it already exists (1st trigger fires, doesn't find the change session id, creates audit records for all tables; next trigger fires, sees the same change session id already exists, so it skips adding audit records, and so on).
This works OK if I am just updating the header record only. The trouble I am running into is when I am updating child records: how do I select the 20 detail records within the trigger on the detail table (understandably, Oracle doesn't allow this)?
Of course, I am open to other ideas on doing this, as this was the best I could think of to create a snapshot of data from all involved tables prior to it being updated. I have wrestled with this one for a while, so any insight would be greatly appreciated.
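A rough sketch of the change-session-id idea: a sequence plus a package variable shared by all the triggers firing in the same database session (every name here is hypothetical, and the application would still need to reset the variable at the start of each new change session):

-- Hypothetical sequence for change session ids
CREATE SEQUENCE change_session_seq;

CREATE OR REPLACE PACKAGE change_session_pkg AS
  g_session_id NUMBER;
  FUNCTION current_id RETURN NUMBER;
END change_session_pkg;
/

CREATE OR REPLACE PACKAGE BODY change_session_pkg AS
  FUNCTION current_id RETURN NUMBER IS
  BEGIN
    -- first trigger in this change session draws a new id; later triggers reuse it
    IF g_session_id IS NULL THEN
      g_session_id := change_session_seq.NEXTVAL;
    END IF;
    RETURN g_session_id;
  END current_id;
END change_session_pkg;
/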
There is a project I'm working on with a similar requirement. Let me describe how I have done this, it could be helpful for you.
Requirement: Have a copy of the entire record whenever there is a change. Suppose our table is the EMP table from the sample data provided by Oracle.
Implementation:
EMP is no longer a table; it is a view. All the data (the active version AND all the older rows) is stored in a table called EMP_ALL.
EMP_ALL has a couple of additional columns:
version_no: this is the timestamp in the format YYYYMMDDHH24MISS, so it is visible when the change was done.
active_version: contains 1 for the row which is current and 0 for all the audited rows.
emp_key: an id which is the same for every row belonging to one employee - more about this later.
EMP is a view with the definition:
CREATE OR REPLACE VIEW EMP AS
SELECT <all_columns>
FROM EMP_ALL
WHERE active_version = 1;
There is an INSTEAD OF trigger on EMP, so whenever EMP is updated, the old row is copied into EMP_ALL (with active_version = 0) and the current row in EMP_ALL is updated (with active_version = 1). The reason that we update the current row is that we want to keep the primary key of the active row, because that can be used for any foreign key definitions. The foreign key needs to map to the active row, not to an archived row. The column EMP_KEY is used as an identifier for a single employee.
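A rough sketch of what such an INSTEAD OF trigger could look like, assuming a couple of placeholder data columns (ename, job, sal) rather than the project's actual code:

CREATE OR REPLACE TRIGGER emp_io_upd
INSTEAD OF UPDATE ON emp
FOR EACH ROW
BEGIN
  -- archive the old values as a new, inactive row in EMP_ALL
  -- (the BEFORE INSERT trigger on EMP_ALL fills the audit columns)
  INSERT INTO emp_all (emp_key, ename, job, sal, active_version)
  VALUES (:OLD.emp_key, :OLD.ename, :OLD.job, :OLD.sal, 0);

  -- update the active row in place so its primary key (and any FKs pointing to it) stays valid
  UPDATE emp_all
     SET ename = :NEW.ename,
         job   = :NEW.job,
         sal   = :NEW.sal
   WHERE emp_key = :NEW.emp_key
     AND active_version = 1;
END;
/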
EMP_ALL has a BEFORE INSERT trigger to populate the audit columns.
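That trigger might look roughly like this (again a sketch; emp_key_seq and the exact column handling are assumptions):

CREATE OR REPLACE TRIGGER emp_all_bi
BEFORE INSERT ON emp_all
FOR EACH ROW
BEGIN
  -- stamp the change time; default new rows to "active" unless told otherwise
  :NEW.version_no := TO_CHAR(SYSDATE, 'YYYYMMDDHH24MISS');
  IF :NEW.active_version IS NULL THEN
    :NEW.active_version := 1;
  END IF;
  -- assign an emp_key for brand-new employees (hypothetical sequence)
  IF :NEW.emp_key IS NULL THEN
    :NEW.emp_key := emp_key_seq.NEXTVAL;
  END IF;
END;
/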
The same logic can be applied to any child tables. The column "version_no" will indicate when something changed and determines the order of the changes. This is very useful when constructing an audit trail.
Note: you cannot use the RETURNING INTO clause that is used by some applications (like APEX) when doing an insert on the EMP view.
You can get a "session number" from the following:
SELECT sid, serial# FROM v$session;
Use these along with the current timestamp and the PK of the row being updated to generate a unique (? see "what is not handled" below) key for the audit record. Write to the audit tables only the rows being updated (if you update 3 of 20 details, write only those 3); a sketch of such a trigger follows the two lists below. You can then reconstruct an "as of" version of the entire structure whenever necessary. This handles the following:
Header updated, but no details updated.
Detail(s) updated but not the header.
Multiple headers updated in single statement.
Multiple details across multiple headers updated in single statement.
What is not handled:
Multiple updates of a single row within a single transaction (may need a sequence to avoid duplicate generated keys and provide proper sequencing).
Deletes (deleting a header is especially troublesome).
Inserts (if needed).
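As a concrete illustration of the key-generation idea above, here is a rough sketch of a row-level trigger on the detail table (trigger, table, and column names are assumptions, not from the original answer):

CREATE OR REPLACE TRIGGER detail_table_1_aud
BEFORE UPDATE ON detail_table_1
FOR EACH ROW
DECLARE
  v_sid    NUMBER;
  v_serial NUMBER;
BEGIN
  -- identify the current session (needs SELECT privilege on v$session)
  SELECT sid, serial#
    INTO v_sid, v_serial
    FROM v$session
   WHERE sid = SYS_CONTEXT('USERENV', 'SID');

  -- audit key = session sid + serial# + timestamp + PK of the row being changed
  INSERT INTO detail_audit_table_1 (audit_key, detail_id, some_column)
  VALUES (v_sid || '-' || v_serial || '-'
            || TO_CHAR(SYSTIMESTAMP, 'YYYYMMDDHH24MISSFF') || '-' || :OLD.detail_id,
          :OLD.detail_id,
          :OLD.some_column);
END;
/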

Looking to have a copy of another database that automatically updates and has a few additional columns

I'm very new to database/server management. I'm working with a database that I can't add any columns to, since it interfaces directly with another piece of software and therefore must stay in a very specific format. However, I'd like to be able to add DateCreated and CreatedBy columns to the tables in this database to set up some automatic email updates when new entries are made. To do this, I thought I might be able to keep a copy of the original database that automatically updates when changes are made to the original, and simply add the additional columns to the copy. I'm working in Microsoft SQL Server 2017. If anyone could provide any guidance on the best way to accomplish this, your help would be much appreciated.
Create a table extension that consists of the additional columns + the key value from the original table. Each row in Table 1 should have 1 or 0 rows in Table 2. Use a trigger on Table 1 to insert a row in Table 2 on Insert or Update.
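A minimal sketch of that pattern, assuming the original table is dbo.Table1 with an integer key column Id (all names here are placeholders for the actual schema):

-- Extension table: key from the original table plus the extra columns
CREATE TABLE dbo.Table1_Extension
(
    Id          INT       NOT NULL PRIMARY KEY
        REFERENCES dbo.Table1 (Id),
    DateCreated DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    CreatedBy   SYSNAME   NOT NULL DEFAULT SUSER_SNAME()
);
GO

-- Trigger on the original table: add an extension row for any new key
CREATE TRIGGER dbo.trg_Table1_Extension
ON dbo.Table1
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Table1_Extension (Id)
    SELECT i.Id
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Table1_Extension e WHERE e.Id = i.Id);
END;
GO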

Updating a database while keeping relationships

I've got a database which has some tables I want to keep up to date. I've got to write a SQL script that checks whether certain rows exist in the target tables and, if they do not, inserts them. I tried MERGE, but I have 8 tables related to the 9th one, so there are 8 foreign keys and I don't know how to update them all. I don't want to delete everything and insert it again, because I want to preserve my main table rows' IDs.
You can use a trigger that automatically adds inserted rows to the destination tables.
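For illustration only, assuming SQL Server: a trigger on the main table could propagate new rows into one related table roughly like this (all object names are made up; a similar statement would be needed for each related table):

CREATE TRIGGER dbo.trg_MainTable_Sync
ON dbo.MainTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- add a matching row in the child table only if it is not there yet
    INSERT INTO dbo.ChildTable1 (MainTableId, SomeColumn)
    SELECT i.Id, i.SomeColumn
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM dbo.ChildTable1 c WHERE c.MainTableId = i.Id);
END;
GO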

How to create a Primary Key on quasi-unique data keys?

I have a nightly SSIS process that exports a TON of data from an AS400 database system. Due to bugs in the AS400 DB software, occasional duplicate keys are inserted into data tables. Every time a new duplicate is added to an AS400 table, it kills my nightly export process. This issue has moved from being a nuisance to a problem.
What I need is to have an option to insert only unique data. If there are duplicates, select the first encountered row of the duplicate rows. Is there SQL syntax available that could help me do this? I know of the DISTINCT ROW clause, but that doesn't work in my case because, for most of the offending records, the entirety of the data is non-unique except for the fields which comprise the PK.
In my case, it is more important for my primary keys to remain unique in my SQL Server DB cache than to have a full snapshot of the data. Is there something I can do to force this constraint on the export in SSIS/SQL Server without crashing the process?
EDIT
Let me further clarify my request. What I need is to ensure that the data in my exported SQL Server tables maintains the same keys that are maintained in the AS400 data tables. In other words, creating a unique row count identifier wouldn't work, nor would inserting all of the data without a primary key.
If a bug in the AS400 software allows for mistaken, duplicate PKs, I want to either ignore those rows or, preferably, just select one of the rows with the duplicate key but not both of them.
This filtering should probably happen in the SELECT statement in my SSIS project, which connects to the mainframe through an ODBC connection.
I suspect that there may not be a "simple" solution to my problem. I'm hoping, however, that I'm wrong.
Since you are using SSIS, you must be using an OLE DB Source to fetch the data from the AS400 and an OLE DB Destination to insert the data into SQL Server.
Let's assume that you don't have any transformations.
Add a Sort transformation after the OLE DB Source. In the Sort transformation, there is a check box option at the bottom to remove duplicate rows based on a given set of column values. Check all the fields but don't select the primary key that comes from the AS400. This will eliminate the duplicate rows but will insert the data that you still need.
I hope that is what you are looking for.
In SQL Server 2005 and above:
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY almost_unique_field ORDER BY id) AS rn
    FROM import_table
) q
WHERE rn = 1
There are several options.
If you use the IGNORE_DUP_KEY option (http://www.sqlservernation.com/home/creating-indexes-with-ignore_dup_key.html) on your primary key, SQL Server will issue a warning and only the duplicate records will fail.
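For example, recreating the key as a unique index with that option (index, table, and column names here are illustrative):

-- Duplicate keys are discarded with a warning instead of failing the whole batch
CREATE UNIQUE CLUSTERED INDEX IX_import_table_pk
ON dbo.import_table (pk_column)
WITH (IGNORE_DUP_KEY = ON);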
You can also group/roll-up your data but this can get very expensive. What I mean by that is:
SELECT Id, MAX(value1), MAX(value2), MAX(value3), ...
FROM staging_table
GROUP BY Id
Another option is to add an identity column (and cluster on this for an efficient join later) to your staging table and then create a mapping in a temp table. The mapping table would be:
CREATE TABLE #mapping
(
    RowID INT PRIMARY KEY CLUSTERED,
    PKID  INT
)

INSERT INTO #mapping (RowID, PKID)
SELECT MIN(RowID), PKID
FROM staging_table
GROUP BY PKID

INSERT INTO presentation_table
SELECT S.*
FROM staging_table S
INNER JOIN #mapping M
    ON S.RowID = M.RowID
If I understand you correctly, you have duplicated PKs that have different data in the other fields.
First, put the data from the other database into a staging table. I find it easier to research issues with imports (especially large ones) if I do this. Actually I use two staging tables (and for this case I strongly recommend it), one with the raw data and one with only the data I intend to import into my system.
Now you can use an Execute SQL task to grab one of the records for each key (see @Quassnoi's answer for an idea of how to do that; you may need to adjust his query for your situation). Personally, I put an identity column into my staging table so I can identify which is the first or last occurrence of duplicated data. Then put the record you chose for each key into your second staging table. If you are using an exception table, copy the records you are not moving into it, and don't forget a reason code for the exception ("duplicated key", for instance).
Now that you have only one record per key in a staging table, your next task is to decide what to do about the other data that is not unique. If there are two different business addresses for the same customer, which do you choose? This is a matter of business rule definition, not strictly speaking SSIS or SQL code. You must define the business rules for how you choose the data when it needs to be merged between two records (what you are doing is the equivalent of a de-duping process). If you are lucky, there is a date field or some other way to determine which is the newest or oldest data, and that is the data they want you to use. In that case, once you have selected just one record, you are done with the initial transform.
More than likely, though, you may need different rules for each of the other fields to choose the correct one. In this case you write SSIS transforms in a data flow or Execute SQL tasks to pick the correct data and update the staging table.
Once you have the exact record you want to import, run the data flow to move it into the correct production tables.

Excluding a table from a transaction rollback

We have a table and a set of procedures that are used for generating PK ids. The table holds the last id, and the procedure gets the id, increments it, updates the table, and then returns the newly incremented id.
This procedure could potentially be within a transaction. The problem is that if we have a rollback, it could potentially roll back to an id that is before any ids that came into use during the transaction (say, ids generated from a different user or thread). Then, when the id is incremented again, it will cause duplicates.
Is there any way to exclude the id generating table from a parent transaction to prevent this from happening?
To add detail to our current problem...
First, we have a system we are preparing to migrate a lot of data into. The system consists of an MS SQL (2008) database and a textml database. The SQL database houses data less than 3 days old, while the textml db acts as an archive for anything older. The textml db also relies on the SQL db to provide ids for particular fields. These fields are currently Identity PKs and are generated on insertion before publishing to the textml db. We do not want to wash all our migrated data through SQL, since the records would flood the current system, both in terms of traffic and data. But at the same time we have no way of generating these ids, since they are auto-incremented values that SQL Server controls.
Secondly, we have a system requirement which needs us to be able to pull an old asset out of the textml database and insert it back into the SQL database with the original ids. This is done for correction and editing purposes, and if we alter the ids it will break relations downstream on client systems, which we have no control over. Of course, all this is an issue because the id columns are Identity columns.
"procedure gets the id, increments it, updates the table, and then returns the newly incremented id"
This will cause deadlocks. The procedure must increment and return the id in one single, atomic step, e.g. by using the OUTPUT clause in SQL Server:
update ids
set id = id + 1
output inserted.id
where name = @name;
You don't have to worry about concurrency. The fact that you generate ids this way implies that only one transaction can increment an id, because the update will lock the row exclusively. You cannot get duplicates. You do get complete serialization of all operations (i.e. poor performance and low throughput), but that is a different issue. And this is why you should use the built-in mechanisms for generating sequences and identities. These are specific to each platform: AUTO_INCREMENT in MySQL, SEQUENCE in Oracle, IDENTITY and SEQUENCE in SQL Server (SEQUENCE only in Denali), etc.
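For example, a minimal sketch of replacing the ids table with a native sequence in SQL Server 2012 ("Denali") and later; the object name is a placeholder:

CREATE SEQUENCE dbo.RecordIdSeq
    AS BIGINT
    START WITH 1
    INCREMENT BY 1;
GO

-- Sequence values are never rolled back, so concurrent transactions
-- cannot end up reusing or duplicating an id
SELECT NEXT VALUE FOR dbo.RecordIdSeq AS NewId;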
Updated
As I read your edit, the only reason why you want control of the generated identities is to be able to insert back archived records. This is already possible; simply use IDENTITY_INSERT:
Allows explicit values to be inserted into the identity column of a table
Turn it on when you insert back the old record, then turn it back off:
SET IDENTITY_INSERT recordstable ON;
INSERT INTO recordstable (id, ...) values (@oldid, ...);
SET IDENTITY_INSERT recordstable OFF;
As for why manually generated ids serialize all operations: any transaction that generates an id will exclusively lock the row in the ids table. No other transaction can read or write that row until the first transaction commits or rolls back. Therefore there can be only one transaction generating an id on a table at any given moment, i.e. serialization.