I have a Data Factory pipeline that copies data from SQL Server to Table Storage:
Azure Sql Server View --> Table Storage Cache
This step does a REPLACE of the specific row based on a GUID from the SQL Server view.
If a record is deleted from the source view, how do we delete that same record from table storage?
During the Copy Activity, you cannot perform operations on the table data; Data Factory does not support this.
As a test, I created a data pipeline that copies data from my SQL Server to Table Storage. Before your Copy Activity begins, you can delete or insert records as you want, and you can preview the data in Data Factory. Once the Copy Activity is published and has finished, your data has been copied to Table Storage.
If you really need to delete the same record in Table Storage, one way is to log in to the Azure portal and use "Storage Explorer (preview)" to delete the record.
Hope this helps.
Here is a hint, based on code I have run in production for years: have the source mark deleted records with a timestamp (or a simpler flag), and have your MERGE respect it.
CREATE PROCEDURE [DataFactory].[MergeCompany]
(
@Company [DataFactory].[CompanyType] READONLY
)
AS
BEGIN
MERGE INTO [Company] AS TGT
USING @Company AS SRC
ON TGT.pk = SRC.pk
--
WHEN MATCHED AND SRC.aud_delete_date IS NOT NULL
THEN DELETE
--
WHEN MATCHED AND SRC.aud_delete_date IS NULL
THEN UPDATE SET
TGT.comp_number = SRC.comp_number
,TGT.comp_name = SRC.comp_name
,TGT.aud_update_date = SRC.aud_update_date
,TGT.aud_create_date = SRC.aud_create_date
--
WHEN NOT MATCHED BY TARGET AND SRC.aud_delete_date IS NULL -- do not insert rows already deleted at the source
THEN INSERT(
pk
,comp_number
,comp_name
,aud_update_date
,aud_create_date
)
VALUES(
SRC.pk
,SRC.comp_number
,SRC.comp_name
,SRC.aud_update_date
,SRC.aud_create_date
)
; -- Required semicolon
END
GO
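The soft-delete MERGE can be tried without the procedure or the table type. This is a sketch with hypothetical temp tables: #Target stands in for [Company], #Source for the @Company parameter, and only the pk, comp_name and aud_delete_date columns are kept. It assumes a SQL Server 2008+ scratch session. Rows the source has already marked deleted are neither updated nor inserted.

```sql
-- Hypothetical stand-ins: #Target plays [Company], #Source plays the @Company parameter.
IF OBJECT_ID('tempdb..#Target') IS NOT NULL DROP TABLE #Target;
IF OBJECT_ID('tempdb..#Source') IS NOT NULL DROP TABLE #Source;

CREATE TABLE #Target (pk INT PRIMARY KEY, comp_name NVARCHAR(50) NOT NULL);
CREATE TABLE #Source (pk INT PRIMARY KEY, comp_name NVARCHAR(50) NOT NULL,
                      aud_delete_date DATETIME NULL);  -- NULL = live row, non-NULL = deleted at the source

INSERT INTO #Target (pk, comp_name) VALUES (1, N'Alpha'), (2, N'Beta');

INSERT INTO #Source (pk, comp_name, aud_delete_date) VALUES
    (1, N'Alpha Ltd', NULL),        -- changed: should be updated
    (2, N'Beta',      GETDATE()),   -- deleted at source: should be removed from the target
    (3, N'Gamma',     NULL),        -- new: should be inserted
    (4, N'Delta',     GETDATE());   -- new but already deleted: should be ignored

MERGE INTO #Target AS TGT
USING #Source AS SRC
    ON TGT.pk = SRC.pk
WHEN MATCHED AND SRC.aud_delete_date IS NOT NULL
    THEN DELETE
WHEN MATCHED AND SRC.aud_delete_date IS NULL
    THEN UPDATE SET TGT.comp_name = SRC.comp_name
WHEN NOT MATCHED BY TARGET AND SRC.aud_delete_date IS NULL
    THEN INSERT (pk, comp_name) VALUES (SRC.pk, SRC.comp_name);  -- MERGE must end with a semicolon

SELECT pk, comp_name FROM #Target ORDER BY pk;
```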
I have the same table on two different SQL Servers (one is SQL Server 2000 and the other 2008).
I'm using SQL Server Management Studio.
I want each insert into the table on SQL Server 2000 (Table_1) to fire a trigger that also inserts the record into the same table on SQL Server 2008 (also Table_1).
The SQL Server 2008 instance is defined as a linked server, and from the 2000 connection in SQL Server Management Studio I can run queries and perform inserts against the 2008 database.
The trigger is defined as follows:
ALTER TRIGGER [dbo].[trgrTable]
ON [dbo].[Table_1]
AFTER INSERT
AS
BEGIN
INSERT INTO [TLVSQL].[AVI_DEV].[dbo].[Table_1](ID, Value)
SELECT INSERTED.ID AS ID, INSERTED.Value AS Value
FROM INSERTED
END
[TLVSQL].[AVI_DEV] is the linked server and database name of the 2008 instance.
But every time I perform an insert on the 2000 table, I get a message that the insert failed to commit because "sqloledb was unable to begin a distributed transaction linked server ...".
The security delegation of the linked server is properly defined; I explicitly set the security credentials to the user/password of a db_owner.
What am I doing wrong? Is there another way of doing what I'm asking?
Thank you.
Performing inserts from a trigger into a table on a linked server is a bad decision.
It will badly affect insert performance on the source table ([dbo].[Table_1]),
and it also requires a distributed transaction; configuring servers
to support distributed transactions is a nightmare.
One possible solution:
On the source server, create a synchronization queue table. For example:
CREATE TABLE dbo.SyncQueue
(
QueueId INT IDENTITY(1,1),
KeyForSync INT, -- Primary key value of record in dbo.SourceTable
SyncStatus INT -- statuses can be: 0 - New, 1 - Synchronized, 2 - Error
)
Suppose your source table is:
CREATE TABLE dbo.SourceTable
(
[Key] INT, -- primary key of the table (Key is a reserved word, hence the brackets)
Data varchar(xxx)
)
A trigger on dbo.SourceTable can quickly insert the Key of each record that needs synchronizing into dbo.SyncQueue.
A stored procedure that runs periodically can then insert the queued records into the
table on the linked server.
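The queue approach could be sketched roughly as follows. Every object name ending in _Demo is hypothetical, the key column is renamed KeyId to avoid the reserved word, and dbo.SyncTarget_Demo is a local stand-in for the linked-server table (for example [TLVSQL].[AVI_DEV].[dbo].[Table_1]). It is written as an SSMS/sqlcmd script, so GO separates the batches.

```sql
-- Hypothetical demo objects. In production, dbo.SyncTarget_Demo would be the linked-server table.
CREATE TABLE dbo.SourceTable_Demo (KeyId INT PRIMARY KEY, Data VARCHAR(100) NULL);
CREATE TABLE dbo.SyncQueue_Demo
(
    QueueId    INT IDENTITY(1,1) PRIMARY KEY,
    KeyForSync INT NOT NULL,
    SyncStatus INT NOT NULL DEFAULT 0   -- 0 = New, 1 = Synchronized, 2 = Error
);
CREATE TABLE dbo.SyncTarget_Demo (KeyId INT PRIMARY KEY, Data VARCHAR(100) NULL);
GO
-- The trigger only records keys locally: no remote call inside the user's insert.
CREATE TRIGGER dbo.trg_SourceTable_Demo_Queue ON dbo.SourceTable_Demo
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.SyncQueue_Demo (KeyForSync)
    SELECT KeyId FROM inserted;
END;
GO
-- Meant to run periodically, e.g. from a SQL Server Agent job.
CREATE PROCEDURE dbo.usp_DrainSyncQueue_Demo
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @batch TABLE (QueueId INT PRIMARY KEY, KeyForSync INT NOT NULL);

    -- Snapshot pending entries; rows queued while this runs wait for the next run.
    INSERT INTO @batch (QueueId, KeyForSync)
    SELECT QueueId, KeyForSync FROM dbo.SyncQueue_Demo WHERE SyncStatus = 0;

    BEGIN TRY
        -- Copy the queued rows that the target does not have yet.
        INSERT INTO dbo.SyncTarget_Demo (KeyId, Data)
        SELECT s.KeyId, s.Data
        FROM dbo.SourceTable_Demo AS s
        WHERE s.KeyId IN (SELECT KeyForSync FROM @batch)
          AND NOT EXISTS (SELECT 1 FROM dbo.SyncTarget_Demo AS t WHERE t.KeyId = s.KeyId);

        UPDATE q SET SyncStatus = 1
        FROM dbo.SyncQueue_Demo AS q JOIN @batch AS b ON b.QueueId = q.QueueId;
    END TRY
    BEGIN CATCH
        -- Mark the batch as failed instead of raising; these rows can be reset to 0 and retried.
        UPDATE q SET SyncStatus = 2
        FROM dbo.SyncQueue_Demo AS q JOIN @batch AS b ON b.QueueId = q.QueueId;
    END CATCH
END;
GO
INSERT INTO dbo.SourceTable_Demo (KeyId, Data) VALUES (1, 'a'), (2, 'b');
EXEC dbo.usp_DrainSyncQueue_Demo;
```

Because the trigger writes only to a local table, inserts into the source table no longer depend on the linked server being reachable; the remote work happens later, in the procedure.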
We have a large number of databases with the same schema, which each have a table with triggers to sync records with another table in a central database. When the table is updated, inserted into, or deleted from, the table in the central database also has a record updated, inserted, or deleted.
We've been having records mysteriously disappear from the table in the central database. While researching the problem, I found that when the insert/delete trigger fires, there are records in the deleted table that are not from the current delete statement. They aren't even records from the same database. They look like the old-value records of update statements on the same table in another database.
All the information I could find says records in the deleted table should be from the statement that caused the trigger to fire.
Can anyone explain why I'm seeing this behavior instead?
EDIT: This is what the insert/delete trigger looks like:
DECLARE @TenantID INT
SELECT @TenantID = ID FROM [CentralDB]..Tenants WHERE db = DB_Name()
INSERT INTO [CentralDB].[dbo].[TenantUsers]
(..snipped list of columns...)
SELECT
...snipped list of columns...
FROM inserted
WHERE UserNameID NOT IN (0,6)
DELETE FROM [CentralDB]..TenantUsers WHERE UserNameID in
(SELECT UserNameID FROM DELETED WHERE UserNameID NOT IN (0,1,6))
And the update trigger:
DECLARE @TenantID INT
SELECT @TenantID = ID FROM [CentralDB]..Tenants WHERE db = DB_Name()
UPDATE [CentralDB].[dbo].[TenantUsers]
SET ...snipped list of columns...
FROM INSERTED i
WHERE i.UserNameID = TenantUsers.UserNameID
AND i.UserNameID NOT IN (0,6)
You've probably done this, but if records that ought not to be deleted are being deleted, I'd go through each database (or write a script to do it) and check that the triggers containing the delete statements fire only for inserts and deletes. Maybe there is a rogue trigger that fires on update and executes the delete command?
It's a long shot.
Other than this, I would check that no other triggers in the chain can delete from the central database table.
There appear to be no obvious issues with the trigger design.
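The "write a script" check could look something like this sketch, run in each tenant database (it only looks at the current database). It lists every DML trigger with the events it fires on and flags triggers whose definition mentions DELETE before TenantUsers. The text match is a heuristic assumption, not a parser, so a flag means "read this trigger", not proof.

```sql
-- Every DML trigger in the current database, one row per event it fires on.
SELECT
    OBJECT_SCHEMA_NAME(t.parent_id) AS table_schema,
    OBJECT_NAME(t.parent_id)        AS table_name,
    t.name                          AS trigger_name,
    te.type_desc                    AS fires_on,      -- INSERT / UPDATE / DELETE
    t.is_disabled,
    CASE WHEN OBJECT_DEFINITION(t.object_id) LIKE N'%DELETE%TenantUsers%'
         THEN 1 ELSE 0 END          AS deletes_from_tenantusers  -- heuristic text match
FROM sys.triggers AS t
JOIN sys.trigger_events AS te
    ON te.object_id = t.object_id
WHERE t.parent_class = 1            -- 1 = object (table/view) triggers, i.e. DML triggers
ORDER BY table_name, trigger_name, fires_on;
```

A row with fires_on = UPDATE and deletes_from_tenantusers = 1 would be the rogue trigger described above.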
I have a table in SQL Server with many fields, two of which are IDNumber and Resolved (Date). Resolved is NULL in SQL Server. I also have data with IDNumber and Resolved (Date) that I want to import to update the table: it should find the matching IDNumber and fill in the resolved date. How can I do this?
Thanks!
When doing updates, SSIS updates one row at a time, which can be quite costly. I would advise using an Execute SQL Task to create a staging table:
IF OBJECT_ID(N'dbo.tmpIDNumberStaging') IS NOT NULL
DROP TABLE dbo.tmpIDNumberStaging;
CREATE TABLE dbo.tmpIDNumberStaging
( IDNumber INT NOT NULL,
Resolved DATE NOT NULL
);
Then use a Data Flow Task to import your data into this table (I'm not sure what your source is, but your destination would be an OLE DB Destination). Because the staging table is created at run time, you may need to set "DelayValidation" to True on the Data Flow Task and/or "ValidateExternalMetadata" to False on the destination.
Finally, use this staging table to update your main table (and drop the staging table to clean up):
UPDATE t
SET Resolved = st.Resolved
FROM YourMainTable AS t
INNER JOIN dbo.tmpIDNumberStaging st
ON st.IDNumber = t.IDNumber
WHERE t.Resolved IS NULL;
-- CLEAN UP AND DROP STAGING TABLE
IF OBJECT_ID(N'dbo.tmpIDNumberStaging') IS NOT NULL
DROP TABLE dbo.tmpIDNumberStaging;
I have 2 tables, Table-A and Table-A-History.
Table-A contains current data rows.
Table-A-History contains historical data.
I would like to have the most current row of my data in Table-A, and Table-A-History containing historical rows.
I can think of 2 ways to accomplish this:
whenever a new data row is available, move the current row from Table-A to Table-A-History and update the Table-A row with the latest data (via insert into select or select into table)
or
whenever a new data row is available, update Table-A's row and insert a new row into Table-A-History.
In terms of performance, is method 1 or 2 better? Is there a different, better way to accomplish this?
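Method 1 (keep the current row in Table-A and push its previous version into the history table) can be done in a single statement with the OUTPUT clause of UPDATE. This is a sketch with hypothetical temp tables #TableA and #TableAHistory and made-up columns; it is one option, not necessarily faster than the two-statement versions.

```sql
-- Hypothetical schema: one current row per EntityId in #TableA, old versions in #TableAHistory.
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL DROP TABLE #TableA;
IF OBJECT_ID('tempdb..#TableAHistory') IS NOT NULL DROP TABLE #TableAHistory;

CREATE TABLE #TableA        (EntityId INT PRIMARY KEY, Price DECIMAL(10,2) NOT NULL, ChangedAt DATETIME2 NOT NULL);
CREATE TABLE #TableAHistory (EntityId INT NOT NULL,    Price DECIMAL(10,2) NOT NULL, ChangedAt DATETIME2 NOT NULL);

INSERT INTO #TableA (EntityId, Price, ChangedAt) VALUES (1, 10.00, '2024-01-01');

-- A new data row arrives for EntityId 1.
DECLARE @EntityId INT = 1, @NewPrice DECIMAL(10,2) = 12.50, @Now DATETIME2 = '2024-02-01';

-- Overwrite the current row and copy its previous values into history in the same statement.
UPDATE #TableA
SET Price = @NewPrice, ChangedAt = @Now
OUTPUT deleted.EntityId, deleted.Price, deleted.ChangedAt
    INTO #TableAHistory (EntityId, Price, ChangedAt)
WHERE EntityId = @EntityId;

SELECT * FROM #TableA;         -- current version
SELECT * FROM #TableAHistory;  -- previous version
```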
Basically you are looking to track/audit changes to a table while keeping the primary table small in size.
There are several ways to solve this issue. The pros and cons of each are discussed below.
1 - Auditing of the table with triggers.
If you are looking to audit the table (inserts, updates, deletes), look at my "How to prevent unwanted transactions" SQL Saturday slide deck with code: http://craftydba.com/?page_id=880. The trigger that fills the audit table can hold information from multiple tables, if you choose, since the data is saved as XML. Therefore, you can undo an action if necessary by parsing the XML. It tracks who made the change and what made it.
Optionally, you can put the audit table on its own filegroup.
Description:
Table Triggers For (Insert, Update, Delete)
Active table has current records.
Audit (history) table for non-active records.
Pros:
Active table has smaller # of records.
Index in active table is small.
Change is quickly reported in audit table.
Tells you what change was made (ins, del, upd)
Cons:
Have to join two tables to do historical reporting.
Does not track schema changes.
2 - Effective dating the records
If you are never going to purge the data from the audit table, why not mark the row as deleted but keep it forever? Many systems, such as PeopleSoft, use effective dating to show that a record is no longer active. In the BI world this is called a type 2 dimension table (slowly changing dimension). See the data warehouse institute article: http://www.bidw.org/datawarehousing/scd-type-2/. Each record has a begin and end date.
All active records have an end date of NULL.
Description:
Table Triggers For (Insert, Update, Delete)
Main table has both active and historical records.
Pros:
Historical reporting is easy.
Change is quickly shown in main table.
Cons:
Main table has a large # of records.
Index of main table is large.
Both active & history records in same filegroup.
Does not tell you what change was made (ins, del, upd)
Does not track schema changes.
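A minimal sketch of applying an effective-dated change, assuming a hypothetical #Customer table with BeginDate/EndDate columns. Here the close-and-insert is done directly in a transaction rather than from the table triggers the description mentions; a logical delete would be only the UPDATE that sets EndDate.

```sql
-- Hypothetical effective-dated table: the active version of a key has EndDate = NULL.
IF OBJECT_ID('tempdb..#Customer') IS NOT NULL DROP TABLE #Customer;
CREATE TABLE #Customer
(
    CustomerId INT          NOT NULL,
    City       NVARCHAR(50) NOT NULL,
    BeginDate  DATETIME2    NOT NULL,
    EndDate    DATETIME2    NULL      -- NULL = currently active version
);
INSERT INTO #Customer (CustomerId, City, BeginDate, EndDate) VALUES (42, N'Boston', '2024-01-01', NULL);

DECLARE @Id INT = 42, @NewCity NVARCHAR(50) = N'Denver', @Now DATETIME2 = '2024-03-01';

BEGIN TRANSACTION;
    -- Close the currently active version...
    UPDATE #Customer SET EndDate = @Now
    WHERE CustomerId = @Id AND EndDate IS NULL;
    -- ...and add the new version as the active one.
    INSERT INTO #Customer (CustomerId, City, BeginDate, EndDate)
    VALUES (@Id, @NewCity, @Now, NULL);
COMMIT TRANSACTION;

-- Current state: filter on EndDate IS NULL. History: all versions of the key.
SELECT * FROM #Customer WHERE EndDate IS NULL;
SELECT * FROM #Customer WHERE CustomerId = @Id ORDER BY BeginDate;
```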
3 - Change Data Capture (Enterprise Feature).
Microsoft SQL Server 2008 introduced the change data capture (CDC) feature. While it tracks data changes by reading the transaction log after the fact,
it lacks information such as who made the change and what made it. MSDN details: http://technet.microsoft.com/en-us/library/bb522489(v=sql.105).aspx
This solution depends on the CDC jobs running. Any issues with SQL Server Agent will delay data showing up.
See change data capture tables.
http://technet.microsoft.com/en-us/library/bb500353(v=sql.105).aspx
Description:
Enable change data capture
Pros:
Do not need to add triggers or tables to capture data.
Tells you what change was made (ins, del, upd) via the __$operation column in
the cdc.<schema>_<table>_CT change table.
Tracks schema changes.
Cons:
Only available in Enterprise (and Developer) edition in SQL Server 2008/2008 R2; Standard edition gained it in SQL Server 2016 SP1.
Since it reads the log after the fact, there is a delay before data shows up.
The CDC tables do not track who or what made the change.
Disabling CDC removes the tables (not nice)!
You need to decode __$update_mask to figure out which columns changed.
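Enabling CDC and reading the captured rows might look like this sketch. It assumes an edition that supports CDC, sysadmin rights for sp_cdc_enable_db, SQL Server Agent running (the capture job reads the log), and a hypothetical table dbo.TableA_CdcDemo; the capture instance name dbo_TableA_CdcDemo is the default derived from schema and table name.

```sql
-- Hypothetical demo table; CDC on a table needs a primary key for net-change support.
CREATE TABLE dbo.TableA_CdcDemo (Id INT PRIMARY KEY, Payload NVARCHAR(50) NOT NULL);

IF NOT EXISTS (SELECT 1 FROM sys.databases WHERE database_id = DB_ID() AND is_cdc_enabled = 1)
    EXEC sys.sp_cdc_enable_db;              -- creates the cdc schema and metadata tables

EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'TableA_CdcDemo',
     @role_name     = NULL;                 -- NULL: reading change data is not gated by a role
GO
INSERT INTO dbo.TableA_CdcDemo (Id, Payload) VALUES (1, N'first');
UPDATE dbo.TableA_CdcDemo SET Payload = N'second' WHERE Id = 1;
GO
-- Some time later, after the capture job has harvested the log:
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn(N'dbo_TableA_CdcDemo');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

IF @from_lsn > 0x00000000000000000000 AND @to_lsn >= @from_lsn
    SELECT __$operation,    -- 1 = delete, 2 = insert, 3 = update (before), 4 = update (after)
           __$update_mask,  -- one bit per captured column that changed
           Id, Payload
    FROM cdc.fn_cdc_get_all_changes_dbo_TableA_CdcDemo(@from_lsn, @to_lsn, N'all update old');
```

Immediately after the DML, the query may return nothing until the capture job has processed the log, which is the delay mentioned above.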
4 - Change Tracking Feature (All Versions).
Microsoft SQL Server 2008 introduced the change tracking feature. Unlike CDC, it comes with all editions; however, it comes with a bunch of T-SQL functions that you have to call to figure out what happened.
It was designed for synchronizing a data source with SQL Server via an application. There is a whole synchronization framework on TechNet.
http://msdn.microsoft.com/en-us/library/bb933874.aspx
http://msdn.microsoft.com/en-us/library/bb933994.aspx
http://technet.microsoft.com/en-us/library/bb934145(v=sql.105).aspx
You specify how long changes are kept in the database before being purged (the CHANGE_RETENTION setting). Also, inserts and deletes record only the key of the affected row, not its data, and updates record only which columns changed.
Since you are synchronizing a SQL Server source to another target, this works fine.
It is not good for auditing unless you write a periodic job to figure out changes.
You will still have to store that information somewhere.
Description:
Enable change tracking
Cons:
Not a good auditing solution
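A sketch of the change tracking workflow: enable it on the database and table, remember a version number, and later ask CHANGETABLE what changed since then. The table name and the 2-day retention are assumptions; ALTER DATABASE CURRENT needs SQL Server 2012 or later and permission to alter the database.

```sql
-- Enable change tracking on the database (if needed) and on a hypothetical table.
IF NOT EXISTS (SELECT 1 FROM sys.change_tracking_databases WHERE database_id = DB_ID())
    ALTER DATABASE CURRENT SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

CREATE TABLE dbo.TableA_CtDemo (Id INT PRIMARY KEY, Payload NVARCHAR(50) NOT NULL);  -- a primary key is required
ALTER TABLE dbo.TableA_CtDemo ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
GO
-- The synchronizing application would persist this value after each successful sync.
DECLARE @last_sync_version BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

INSERT INTO dbo.TableA_CtDemo (Id, Payload) VALUES (1, N'a'), (2, N'b');
UPDATE dbo.TableA_CtDemo SET Payload = N'b2' WHERE Id = 2;
DELETE FROM dbo.TableA_CtDemo WHERE Id = 1;

-- Net changes since the last sync. Only keys and the operation are tracked, so current
-- values are joined back from the base table (deleted rows come back with NULL Payload).
SELECT ct.Id, ct.SYS_CHANGE_OPERATION, t.Payload
FROM CHANGETABLE(CHANGES dbo.TableA_CtDemo, @last_sync_version) AS ct
LEFT JOIN dbo.TableA_CtDemo AS t ON t.Id = ct.Id
ORDER BY ct.Id;
```

Because only keys, operations, and (optionally) a column mask are kept, any audit detail has to be read from the base table and stored elsewhere, which is why the cons above call it a poor auditing solution.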
The first three solutions will work for your auditing. I like the first solution since I use it extensively in my environment.
Sincerely
John
Code Snippet From Presentation (Autos Database)
--
-- 7 - Auditing data changes (table for DML trigger)
--
-- Delete existing table
IF OBJECT_ID('[AUDIT].[LOG_TABLE_CHANGES]') IS NOT NULL
DROP TABLE [AUDIT].[LOG_TABLE_CHANGES]
GO
-- Add the table
CREATE TABLE [AUDIT].[LOG_TABLE_CHANGES]
(
[CHG_ID] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
[CHG_DATE] [datetime] NOT NULL,
[CHG_TYPE] [varchar](20) NOT NULL,
[CHG_BY] [nvarchar](256) NOT NULL,
[APP_NAME] [nvarchar](128) NOT NULL,
[HOST_NAME] [nvarchar](128) NOT NULL,
[SCHEMA_NAME] [sysname] NOT NULL,
[OBJECT_NAME] [sysname] NOT NULL,
[XML_RECSET] [xml] NULL,
CONSTRAINT [PK_LTC_CHG_ID] PRIMARY KEY CLUSTERED ([CHG_ID] ASC)
) ON [PRIMARY]
GO
-- Add defaults for key information
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_CHG_DATE] DEFAULT (getdate()) FOR [CHG_DATE];
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_CHG_TYPE] DEFAULT ('') FOR [CHG_TYPE];
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_CHG_BY] DEFAULT (coalesce(suser_sname(),'?')) FOR [CHG_BY];
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_APP_NAME] DEFAULT (coalesce(app_name(),'?')) FOR [APP_NAME];
ALTER TABLE [AUDIT].[LOG_TABLE_CHANGES] ADD CONSTRAINT [DF_LTC_HOST_NAME] DEFAULT (coalesce(host_name(),'?')) FOR [HOST_NAME];
GO
--
-- 8 - Make DML trigger to capture changes
--
-- Delete existing trigger
IF OBJECT_ID('[ACTIVE].[TRG_FLUID_DATA]') IS NOT NULL
DROP TRIGGER [ACTIVE].[TRG_FLUID_DATA]
GO
-- Add trigger to log all changes
CREATE TRIGGER [ACTIVE].[TRG_FLUID_DATA] ON [ACTIVE].[CARS_BY_COUNTRY]
FOR INSERT, UPDATE, DELETE AS
BEGIN
-- Detect inserts
IF EXISTS (select * from inserted) AND NOT EXISTS (select * from deleted)
BEGIN
INSERT [AUDIT].[LOG_TABLE_CHANGES] ([CHG_TYPE], [SCHEMA_NAME], [OBJECT_NAME], [XML_RECSET])
SELECT 'INSERT', '[ACTIVE]', '[CARS_BY_COUNTRY]', (SELECT * FROM inserted as Record for xml auto, elements , root('RecordSet'), type)
RETURN;
END
-- Detect deletes
IF EXISTS (select * from deleted) AND NOT EXISTS (select * from inserted)
BEGIN
INSERT [AUDIT].[LOG_TABLE_CHANGES] ([CHG_TYPE], [SCHEMA_NAME], [OBJECT_NAME], [XML_RECSET])
SELECT 'DELETE', '[ACTIVE]', '[CARS_BY_COUNTRY]', (SELECT * FROM deleted as Record for xml auto, elements , root('RecordSet'), type)
RETURN;
END
-- Detect updates (logs the pre-update values from deleted)
IF EXISTS (select * from inserted) AND EXISTS (select * from deleted)
BEGIN
INSERT [AUDIT].[LOG_TABLE_CHANGES] ([CHG_TYPE], [SCHEMA_NAME], [OBJECT_NAME], [XML_RECSET])
SELECT 'UPDATE', '[ACTIVE]', '[CARS_BY_COUNTRY]', (SELECT * FROM deleted as Record for xml auto, elements , root('RecordSet'), type)
RETURN;
END
END;
GO
--
-- 9 - Test DML trigger by updating, deleting and inserting data
--
-- Execute an update
UPDATE [ACTIVE].[CARS_BY_COUNTRY]
SET COUNTRY_NAME = 'Czech Republic'
WHERE COUNTRY_ID = 8
GO
-- Remove all data
DELETE FROM [ACTIVE].[CARS_BY_COUNTRY];
GO
-- Execute the load
EXECUTE [ACTIVE].[USP_LOAD_CARS_BY_COUNTRY];
GO
-- Show the audit trail
SELECT * FROM [AUDIT].[LOG_TABLE_CHANGES]
GO
-- Disable the trigger
ALTER TABLE [ACTIVE].[CARS_BY_COUNTRY] DISABLE TRIGGER [TRG_FLUID_DATA];
Recent versions of SQL Server (2016+ and Azure SQL Database) have temporal tables, which provide exactly the requested functionality as a first-class feature.
https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
Somebody at Microsoft probably read this page. :)
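A minimal temporal-table sketch, assuming SQL Server 2016+ or Azure SQL Database and hypothetical table names. The engine maintains the history table itself, so there is no trigger code to write.

```sql
-- Hypothetical system-versioned table; SQL Server creates dbo.TableA_TemporalHistory.
CREATE TABLE dbo.TableA_Temporal
(
    Id        INT          NOT NULL PRIMARY KEY CLUSTERED,
    Payload   NVARCHAR(50) NOT NULL,
    ValidFrom DATETIME2    GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2    GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.TableA_TemporalHistory));

-- Ordinary DML: the engine moves superseded row versions into the history table.
INSERT INTO dbo.TableA_Temporal (Id, Payload) VALUES (1, N'v1');
UPDATE dbo.TableA_Temporal SET Payload = N'v2' WHERE Id = 1;

-- Current rows only (the question's Table-A):
SELECT Id, Payload FROM dbo.TableA_Temporal;

-- Current plus historical versions (the question's Table-A-History):
SELECT Id, Payload, ValidFrom, ValidTo
FROM dbo.TableA_Temporal FOR SYSTEM_TIME ALL
ORDER BY Id, ValidFrom;
```

To drop the demo, first run ALTER TABLE dbo.TableA_Temporal SET (SYSTEM_VERSIONING = OFF); a system-versioned table cannot be dropped while versioning is on.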
Logging changes is something I've generally done using triggers on a base table to record changes in a log table. The log table has additional columns to record the database user, action and date/time.
create trigger [Table-A_LogDelete] on dbo.[Table-A]
for delete
as
declare @Now as DateTime = GetDate()
set nocount on
insert into [Table-A-History]
select SUser_SName(), 'delete-deleted', @Now, *
from deleted
go
exec sp_settriggerorder @triggername = 'dbo.[Table-A_LogDelete]', @order = 'last', @stmttype = 'delete'
go
create trigger [Table-A_LogInsert] on dbo.[Table-A]
for insert
as
declare @Now as DateTime = GetDate()
set nocount on
insert into [Table-A-History]
select SUser_SName(), 'insert-inserted', @Now, *
from inserted
go
exec sp_settriggerorder @triggername = 'dbo.[Table-A_LogInsert]', @order = 'last', @stmttype = 'insert'
go
create trigger [Table-A_LogUpdate] on dbo.[Table-A]
for update
as
declare @Now as DateTime = GetDate()
set nocount on
insert into [Table-A-History]
select SUser_SName(), 'update-deleted', @Now, *
from deleted
insert into [Table-A-History]
select SUser_SName(), 'update-inserted', @Now, *
from inserted
go
exec sp_settriggerorder @triggername = 'dbo.[Table-A_LogUpdate]', @order = 'last', @stmttype = 'update'
Logging triggers should always be set to fire last. Otherwise, a subsequent trigger may roll back the original transaction, but the log table will have already been updated. This is a confusing state of affairs.
How about method 3: Make Table-A a view against Table-A-History. Insert into Table-A-History and let appropriate filtering logic generate Table-A. That way you're only inserting into one table.
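Method 3 might be sketched like this: the history table is the only table written to, and a view picks the latest row per key with ROW_NUMBER. Table, view, column, and index names are hypothetical, and the index is an assumption meant to support the per-key ordering. GO separates the batches because CREATE VIEW must start its own batch.

```sql
-- Every change is an INSERT into the (hypothetical) history table...
CREATE TABLE dbo.TableAHistory_Demo
(
    HistoryId INT IDENTITY(1,1) PRIMARY KEY,
    EntityId  INT          NOT NULL,
    Payload   NVARCHAR(50) NOT NULL,
    ChangedAt DATETIME2    NOT NULL
);
CREATE INDEX IX_TableAHistory_Demo_Entity
    ON dbo.TableAHistory_Demo (EntityId, ChangedAt DESC, HistoryId DESC);
GO
-- ...and "Table-A" is just the latest row per EntityId.
CREATE VIEW dbo.TableA_Demo
AS
SELECT EntityId, Payload, ChangedAt
FROM (
    SELECT EntityId, Payload, ChangedAt,
           ROW_NUMBER() OVER (PARTITION BY EntityId
                              ORDER BY ChangedAt DESC, HistoryId DESC) AS rn  -- HistoryId breaks ties
    FROM dbo.TableAHistory_Demo
) AS ranked
WHERE rn = 1;
GO
INSERT INTO dbo.TableAHistory_Demo (EntityId, Payload, ChangedAt) VALUES
    (1, N'v1',   '2024-01-01'),
    (1, N'v2',   '2024-02-01'),
    (2, N'only', '2024-01-15');

SELECT * FROM dbo.TableA_Demo ORDER BY EntityId;  -- one current row per entity
```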
Even though it consumes more space, having the history table containing the most recent record as well will save you pain on writing reports and seeing how changes occurred and when. Something worth thinking about in my opinion.
As far as performance goes, I would expect them to be identical. But you certainly wouldn't want to delete the record (option 1's "move") from the non-history table, because you are using referential integrity between the two tables, right?
I would prefer method 1.
In addition, I would also maintain the current record in the history table.
It depends on the need.
Option 1 is OK.
But you have method 4 too :)
Insert the new record into your table.
Move old records to the archive table on a regular basis using a scheduled job (for example the MySQL event scheduler, or a SQL Server Agent job on SQL Server). You can schedule the archiving for a time of minimal load, for example at night.
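The archiving step of method 4 could be written as a single DELETE with an OUTPUT clause, so superseded rows are removed and copied in one statement. This is a sketch with hypothetical temp tables (#TableAVersions, #TableAArchive) and columns; on SQL Server the DELETE would be the body of the scheduled job.

```sql
-- Hypothetical layout: new versions are simply inserted into #TableAVersions; the job moves
-- every version except the newest per EntityId into #TableAArchive.
IF OBJECT_ID('tempdb..#TableAVersions') IS NOT NULL DROP TABLE #TableAVersions;
IF OBJECT_ID('tempdb..#TableAArchive') IS NOT NULL DROP TABLE #TableAArchive;

CREATE TABLE #TableAVersions (EntityId INT NOT NULL, Payload NVARCHAR(50) NOT NULL, ChangedAt DATETIME2 NOT NULL);
CREATE TABLE #TableAArchive  (EntityId INT NOT NULL, Payload NVARCHAR(50) NOT NULL, ChangedAt DATETIME2 NOT NULL);

INSERT INTO #TableAVersions (EntityId, Payload, ChangedAt) VALUES
    (1, N'v1', '2024-01-01'), (1, N'v2', '2024-02-01'), (1, N'v3', '2024-03-01'),
    (2, N'only', '2024-01-15');

-- Archiving step: rank versions per key, delete all but the newest, and capture them.
WITH ranked AS
(
    SELECT EntityId, Payload, ChangedAt,
           ROW_NUMBER() OVER (PARTITION BY EntityId ORDER BY ChangedAt DESC) AS rn
    FROM #TableAVersions
)
DELETE FROM ranked
OUTPUT deleted.EntityId, deleted.Payload, deleted.ChangedAt
    INTO #TableAArchive (EntityId, Payload, ChangedAt)
WHERE rn > 1;

SELECT * FROM #TableAVersions ORDER BY EntityId;            -- newest version per key
SELECT * FROM #TableAArchive  ORDER BY EntityId, ChangedAt; -- superseded versions
```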
You can simply create a procedure or job to handle this, like this:
create procedure [dbo].[sp_LoadNewData]
AS
INSERT INTO [dbo].[Table-A-History]
(
[1.Column Name], [2.Column Name], [3.Column Name], [4.Column Name]
)
SELECT [1.Column Name], [2.Column Name], [3.Column Name], [4.Column Name]
FROM dbo.[Table-A] S
WHERE NOT EXISTS
(
SELECT * FROM [dbo].[Table-A-History] D WHERE D.[1.Column Name] =S.[1.Column Name]
)
Note: [1.Column Name] is the column common to both tables.