How to delete data from a Table Storage cache using Data Factory?

I have a Data Factory pipeline that takes data from SQL Server and copies it to Table Storage:
Azure SQL Server view --> Table Storage cache
This step does a REPLACE of the specific row based on a GUID from the SQL Server view.
If a record is deleted from the source view, how do we delete that same record from Table Storage?

During the Copy Activity you cannot operate on the table data; Data Factory does not support this operation.
I created a test pipeline that copies data from my SQL Server to Table Storage. Before the Copy Activity begins, you can delete or insert records as you wish, and you can preview the data in Data Factory. Once the Copy Activity has been published and has finished, your data has been copied to Table Storage.
If you really need to delete the same record in Table Storage, one option is to sign in to the Azure portal and use "Storage Explorer (preview)" to delete the record:
Hope this helps.

Here is a hint -- some code I have run in production for years. The idea is to have the source mark deleted records with a timestamp (or a simpler flag) and then have your MERGE respect it.
CREATE PROCEDURE [DataFactory].[MergeCompany]
(
    @Company [DataFactory].[CompanyType] READONLY
)
AS
BEGIN
    MERGE INTO [Company] AS TGT
    USING @Company AS SRC
        ON TGT.pk = SRC.pk
    -- Source row is flagged as deleted: remove it from the target
    WHEN MATCHED AND SRC.aud_delete_date IS NOT NULL
        THEN DELETE
    -- Source row still exists: update the target
    WHEN MATCHED AND SRC.aud_delete_date IS NULL
        THEN UPDATE SET
             TGT.comp_number     = SRC.comp_number
            ,TGT.comp_name       = SRC.comp_name
            ,TGT.aud_update_date = SRC.aud_update_date
            ,TGT.aud_create_date = SRC.aud_create_date
    -- New source row: insert it into the target
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (
             pk
            ,comp_number
            ,comp_name
            ,aud_update_date
            ,aud_create_date
        )
        VALUES (
             SRC.pk
            ,SRC.comp_number
            ,SRC.comp_name
            ,SRC.aud_update_date
            ,SRC.aud_create_date
        )
    ; -- MERGE must be terminated with a semicolon
END
GO
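For completeness, the procedure assumes a user-defined table type and a target table along the lines of the following. This is a minimal sketch; the column types and the dbo schema are assumptions, not taken from the original post:
-- Hypothetical definitions matching the column names used above.
CREATE TYPE [DataFactory].[CompanyType] AS TABLE
(
    pk              UNIQUEIDENTIFIER NOT NULL PRIMARY KEY
   ,comp_number     INT              NULL
   ,comp_name       NVARCHAR(200)    NULL
   ,aud_update_date DATETIME2        NULL
   ,aud_create_date DATETIME2        NULL
   ,aud_delete_date DATETIME2        NULL  -- non-NULL marks a soft-deleted source row
);
GO

CREATE TABLE [dbo].[Company]
(
    pk              UNIQUEIDENTIFIER NOT NULL PRIMARY KEY
   ,comp_number     INT              NULL
   ,comp_name       NVARCHAR(200)    NULL
   ,aud_update_date DATETIME2        NULL
   ,aud_create_date DATETIME2        NULL
);
GO
In Data Factory this is typically wired up by pointing the copy activity's SQL sink at the stored procedure and passing each batch of copied rows as the table type parameter.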

Related

Merge Query With Delete From Target Condition

I am working on a task where my source is AWS RDS - SQL Server and my target is Azure SQL Server.
There's a table with 80M records in my source that needs to be merged with my target table.
This merging will happen every 15 minutes, and based on the business key I need to:
Update the target table if the key is updated in the source table.
Insert a new key into the target table.
Mark isDeleted as true in the target if the key is no longer present in the source.
Important note: the source row is hard-deleted and no history is maintained.
Since this merging happens every 15 minutes and the source table is pretty big, I use the lastUpdated column to select only a limited set of records in the source query of my merge.
With this I can handle the "upsert" scenario perfectly, but on delete it removes all the records from the target, which is not desirable.
I have tried the option below:
Read the entire source table into a temp_table every 15 minutes and then merge from temp_table into the target table. But this is very costly in terms of processing and time.
Is there any better way to handle this scenario? I am happy to share more information as needed.
I think you can solve this by adding a new column called SyncStamp. The idea is that every row we insert or update gets the same SyncStamp value, so any row that does not have this value afterwards should be marked as IsDeleted.
I prefer to use an actual timestamp for SyncStamp, but you could use a random number instead.
-- Get a timestamp (seconds since the Unix epoch)
DECLARE @SyncStamp bigint = (SELECT DATEDIFF_BIG(SECOND, '1970-01-01 00:00:00', GETUTCDATE()));

MERGE TargetTable AS Target
USING SourceTable AS Source
    ON Source.BusinessKey = Target.BusinessKey
-- For inserts (include the business key so future syncs can match the row)
WHEN NOT MATCHED BY Target THEN
    INSERT (BusinessKey, ProductID, ProductName, SyncStamp)
    VALUES (Source.BusinessKey, Source.ProductID, Source.ProductName, @SyncStamp)
-- For updates
WHEN MATCHED THEN UPDATE SET
    Target.ProductName = Source.ProductName,
    Target.SyncStamp   = @SyncStamp;

-- Mark rows that were not touched by this sync as deleted
UPDATE TargetTable
SET IsDeleted = 1
WHERE IsDeleted = 0 AND SyncStamp <> @SyncStamp;
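For reference, here is a pair of hypothetical table definitions the snippet above could run against. The table and column names come from the answer; the types, the primary key, and the IsDeleted default are assumptions:
-- Hypothetical schemas for the example above; adjust types to match your real tables.
CREATE TABLE SourceTable
(
    BusinessKey INT           NOT NULL PRIMARY KEY,
    ProductID   INT           NULL,
    ProductName NVARCHAR(200) NULL
);

CREATE TABLE TargetTable
(
    BusinessKey INT           NOT NULL PRIMARY KEY,
    ProductID   INT           NULL,
    ProductName NVARCHAR(200) NULL,
    SyncStamp   BIGINT        NULL,              -- stamp of the last sync that touched the row
    IsDeleted   BIT           NOT NULL DEFAULT 0 -- soft-delete flag maintained by the final UPDATE
);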

Best way to transfer data from source table in one db to destination table in another db daily

What would be the best way to transfer a certain number of records daily from the source to the destination and then remove them from the source?
DB: SQL Server in the cloud.
As the databases are on the same server, you can create a job that transfers the data to the other database.
Because the databases are on the same server you can access both easily, just by prefixing the table name with the database name in the query. Here is the test that I did:
CREATE DATABASE [_Source];
CREATE DATABASE [_Destination];
GO

CREATE TABLE [_Source].dbo.FromTable
(
    some_data varchar(10)
);

CREATE TABLE [_Destination].dbo.ToTable
(
    some_data varchar(10)
);

INSERT INTO [_Source].dbo.FromTable VALUES ('PAULO');

-- THE JOB WOULD BE SOMETHING LIKE THIS:
-- INSERT INTO DESTINATION, GETTING THE DATA FROM THE SOURCE
INSERT INTO [_Destination].dbo.ToTable
SELECT some_data
FROM [_Source].dbo.FromTable;

-- DELETE FROM SOURCE
DELETE FROM [_Source].dbo.FromTable;
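If only a certain number of records should move each day, and the rows removed from the source should be exactly the rows that were copied (so nothing inserted between the two statements is lost), the job could instead capture the deleted rows with an OUTPUT clause inside a transaction. A minimal sketch against the same example tables; the batch size of 1000 is an assumption:
-- Move up to 1000 rows per run, deleting only what was actually copied.
DECLARE @moved TABLE (some_data varchar(10));

BEGIN TRANSACTION;

DELETE TOP (1000) FROM [_Source].dbo.FromTable
OUTPUT DELETED.some_data INTO @moved (some_data);

INSERT INTO [_Destination].dbo.ToTable (some_data)
SELECT some_data FROM @moved;

COMMIT TRANSACTION;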

Can't delete from table after switch from logical to streaming replication

On my DEV server I tested logical replication and then returned to streaming replication.
Now wal_level = replica and I have two slaves:
pid |state |application_name |client_addr|write_lag |flush_lag |replay_lag |sync_priority|sync_state|
-----|----------|--------------------|-----------|---------------|---------------|---------------|-------------|----------|
12811|streaming |db-slave1 |*.*.*.* |00:00:00.000569|00:00:00.001914|00:00:00.001932| 0|async |
25978|streaming |db-slave2 |*.*.*.* |00:00:00.000568|00:00:00.001913|00:00:00.001931| 0|async |
Now I created a new table and inserted one record. For example:
create table test_delete (
id int
);
insert into test_delete values (1);
delete from test_delete where id = 1;
The table was created and replicated to both slaves, but the delete failed with this error:
SQL Error [55000]: ERROR: cannot delete from table "test_delete" because it does not have a replica identity and publishes deletes
Hint: To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.
So I need help restoring the state from before the switch to logical replication, and the ability to delete from tables.
After some investigation I found the solution. Even though wal_level was changed back in postgresql.conf, all the tables still appear in pg_publication_tables.
To check the publication status:
select * from pg_publication_tables;
and to remove the publication:
drop publication <publication_name>;
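For completeness, the hint in the error message points at the other way out: if you actually wanted to keep the table in a publication that publishes deletes, you could give it a replica identity instead of dropping the publication. A sketch against the example table above:
-- Only needed if the table stays published. Prefer a primary key as the identity when one exists;
-- REPLICA IDENTITY FULL logs the entire old row for every UPDATE/DELETE.
ALTER TABLE test_delete REPLICA IDENTITY FULL;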

SQL Merge is not adding to table

I'm trying to merge a temporary table, populated with data from a C# application, with a table in my SQL database.
Having read through some articles it seems that SQL MERGE should be the most appropriate option, and it works great when I'm updating or deleting groups of entries from the database.
My problem is that I'm unable to add a new row of data when its foreign keys don't match what is already in the database table; instead, the merge just removes the existing rows in the database and replaces them with the ones from the temporary table.
I have two foreign keys referenced in the table FirstDBTableID and SecondDBTableID.
This is my SQL so far:
ALTER PROCEDURE [dbo].[SP_UpdateTable]
    @TempTable AS TempTableType READONLY
AS
BEGIN
    MERGE dbo.[Table] AS target
    USING @TempTable AS source
        ON target.FirstDBTableID = source.FirstDBTableID
        AND target.SecondDBTableID = source.SecondDBTableID
    WHEN MATCHED THEN
        UPDATE SET target.Provider = source.Provider,
                   target.Value = source.Value,
                   target.Quantity = source.Quantity
    -- value doesn't exist in the target table, so add it
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (FirstDBTableID, SecondDBTableID, Provider, Value, Quantity)
        VALUES (source.FirstDBTableID, source.SecondDBTableID, source.Provider, source.Value, source.Quantity)
    -- value doesn't exist in the source table, so delete from target
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
END
Have I missed a statement? Maybe SQL MERGE isn't going to work here? If I haven't made it clear, please ask me questions.
Thanks in advance.
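A diagnostic that sometimes helps in this situation (not part of the original post) is to add an OUTPUT clause to the MERGE so you can see which branch each source row actually takes. A sketch of the same statement with the clause added, to run inside the procedure:
-- Diagnostic variant: report what happened to each row (INSERT / UPDATE / DELETE).
MERGE dbo.[Table] AS target
USING @TempTable AS source
    ON target.FirstDBTableID = source.FirstDBTableID
    AND target.SecondDBTableID = source.SecondDBTableID
WHEN MATCHED THEN
    UPDATE SET target.Provider = source.Provider,
               target.Value = source.Value,
               target.Quantity = source.Quantity
WHEN NOT MATCHED BY TARGET THEN
    INSERT (FirstDBTableID, SecondDBTableID, Provider, Value, Quantity)
    VALUES (source.FirstDBTableID, source.SecondDBTableID, source.Provider, source.Value, source.Quantity)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
OUTPUT $action AS merge_action,
       inserted.FirstDBTableID, inserted.SecondDBTableID,
       deleted.FirstDBTableID  AS deleted_FirstDBTableID,
       deleted.SecondDBTableID AS deleted_SecondDBTableID;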

SSIS Update a particular column

I have a table in SQL Server with many fields, two of which are IDNumber and Resolved (a date). Resolved is currently NULL in SQL Server. I have data with IDNumber and Resolved (date) that I want to import and use to update the table. I want it to find the matching IDNumber and fill in the updated Resolved date. How can I do this?
Thanks!
When doing updates, SSIS updates one row at a time, which can be quite costly. I would advise using an Execute SQL Task to create a staging table:
IF OBJECT_ID(N'dbo.tmpIDNumberStaging') IS NOT NULL
    DROP TABLE dbo.tmpIDNumberStaging;

CREATE TABLE dbo.tmpIDNumberStaging
(
    IDNumber INT  NOT NULL,
    Resolved DATE NOT NULL
);
Then use a Data Flow task to import your data into this table (not sure what your source is, but your destination would be an OLE DB Destination). Because the staging table is created at run time, you may need to set "Delay Validation" to true on the Data Flow task and/or "Validate External Metadata" to false on the destination.
Finally, use this staging table to update your main table (and drop the staging table to clean up):
UPDATE t
SET t.Resolved = st.Resolved
FROM YourMainTable t
INNER JOIN dbo.tmpIDNumberStaging st
    ON st.IDNumber = t.IDNumber
WHERE t.Resolved IS NULL;

-- CLEAN UP AND DROP STAGING TABLE
IF OBJECT_ID(N'dbo.tmpIDNumberStaging') IS NOT NULL
    DROP TABLE dbo.tmpIDNumberStaging;
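If the import is large, indexing the staging table on the join column before running the update can speed up the join. A small optional step, using the same staging table as above:
-- Optional: index the staging table on IDNumber before the UPDATE.
CREATE CLUSTERED INDEX IX_tmpIDNumberStaging_IDNumber
    ON dbo.tmpIDNumberStaging (IDNumber);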